What's new for Python in 2025?

Python 3.14 was released on 7th October 2025. Here we summarise some of the more interesting changes and some trends in Python development and data-science over the past year. We will highlight the following:
- the colourful Python command-line interface;
- the project-management tool uv;
- free-threading;
- and a brief summary of other developments.
The Python 3.14 release notes also describe the changes to base Python.
Colourful REPL
At Jumping Rivers we have taught a lot of people to program in Python. Throughout a programming career you get used to making, and learning from, mistakes. The most common mistakes made in introductory programming lessons may still trip you up in 10 years’ time: unmatched parentheses, typos, missing quote symbols, unimported dependencies.
Our Python training courses are presented using Jupyter. Jupyter notebooks have syntax highlighting that makes it easy to identify an unfinished string, or a mis-spelled keyword.
But most Python learners don’t use Jupyter (or other high-level programming tools) on day one; they experiment with Python at the command line. You can type “python” into your shell/terminal window and start programming in the “REPL” (read-evaluate-print loop).
Any effort to make the REPL easier to work with helps beginning programmers, so the introduction of syntax highlighting in the Python 3.14 REPL is particularly welcome.
uv and package development
One of the big trends in Python development in 2025 is the rise of the project-management tool uv. This is a Rust-based command-line tool that can be used to initialise a package or project structure, to specify the development and runtime environment of a project, and to publish a package to PyPI.
At Jumping Rivers, we have used poetry for many of the jobs that uv excels at. Python is used for the data preparation tasks for diffify.com, and we use poetry to ensure that our developers each use precisely the same package versions when working on that project (see our current blog series on Poetry). But poetry doesn’t prevent developers from using different versions of Python. For that, we need a second tool like pyenv (which allows switching between different Python versions), or for each developer to have the same Python version installed on their machine.
uv goes a step further than poetry and allows us to pin Python versions for a project. Let’s use uv to install Python 3.14, so that we can test out features in the new release.
First, follow the instructions for installing uv. Then, at the command line, we will use uv to create a new project where we’ll use Python 3.14.
# [bash]
cd ~/temp
mkdir blog-py3.14
cd blog-py3.14
# Which versions of Python 3.14 are available via uv?
uv python list | grep 3.14
# cpython-3.14.0rc2-linux-x86_64-gnu <download available>
# cpython-3.14.0rc2+freethreaded-linux-x86_64-gnu <download available>
You’ll see something similar regardless of the operating system that you use. The listing shows two builds of Python 3.14: a standard build and one with an optional feature called “Free Threading” (see later). We’ll install both versions of Python:
uv python install cpython-3.14.0rc2-linux-x86_64-gnu
uv python install cpython-3.14.0rc2+freethreaded-linux-x86_64-gnu
Users of pyenv will be able to install Python 3.14 in a similar manner.
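For instance, a rough pyenv equivalent might look like the following sketch (the exact version strings available will depend on how up to date your pyenv installation is):
# [bash]
# sketch only: check `pyenv install --list` for the versions available to you
pyenv install 3.14.0
pyenv local 3.14.0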
We can select between the two different Python versions at the command line. First using the version that does not have free threading:
uv run --python=3.14 python
# Python 3.14.0rc2 (main, Aug 18 2025, 19:19:22) [Clang 20.1.4 ] on linux
# ...
>>> import sys
>>> sys._is_gil_enabled()
# True
Then using the version with free threading (note the t suffix):
uv run --python=3.14t python
# ...
# Python 3.14.0rc2 free-threading build (main, Aug 18 2025, 19:19:12) [Clang 20.1.4 ] on linux
# ...
>>> import sys
>>> sys._is_gil_enabled()
# False
Project creation and management with uv
uv is capable of much more than allowing us to switch between different versions of Python. The following commands initialise a Python project with uv:
# From ~/temp/blog-py3.14
# Indicate the default python version for the project
uv python pin 3.14
# Initialise a project in the current directory
uv init .
# Check the Python version
uv run python --version
# Python 3.14.0rc2
This adds some files for project metadata (pyproject.toml, README.md) and version control:
tree -a -L 1
# .
# ├── .git
# ├── .gitignore
# ├── main.py
# ├── pyproject.toml
# ├── .python-version
# ├── README.md
# ├── uv.lock
# └── .venv
#
# 2 directories, 6 files
Now we can add package dependencies using uv add <packageName> and carry out other standard project-management tasks. But one thing I wanted to highlight is that uv allows us to start a Jupyter notebook, using the project’s Python interpreter, without either adding jupyter as a dependency or explicitly defining a kernel for jupyter:
uv run --with jupyter jupyter lab
In the JupyterLab session that starts, creating a new notebook with the default Python 3 kernel should ensure you are using the currently active Python 3.14 environment.
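Adding project dependencies is just as simple. For example (the package name here is purely illustrative):
# [bash]
# 'requests' is only an example dependency
uv add requests
uv run python -c "import requests; print(requests.__version__)"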
Threading
Python 3.13 introduced an experimental feature, ‘Free-threading’, that is now officially supported as of 3.14.
First though, what is a ‘thread’? When a program runs on your computer, there are lots of different tasks going on, and some of those tasks could run independently of each other. You, as the programmer, may need to explain to the computer which tasks can run independently. A thread is a way of cordoning off one of those tasks: it tells the computer that this task can run separately from those other tasks, and it bundles up the logic for running that task. (Basically.)
Python has allowed developers to define threads for a while. If you have a few tasks that are largely independent of each other, each of these tasks can run in a separate thread. Threads can access the same memory space, meaning that they can access and modify shared variables in a Python session. In general, this also means that a computation in one thread could update a value that is used by another thread, or that two different threads could make conflicting updates to the same variable. This freedom can lead to bugs. The CPython interpreter was originally written with a locking mechanism (the Global Interpreter Lock, GIL) that prevented different threads from running at the same time (even when multiple processors were available) and limited the reach of these bugs.
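As a small illustrative sketch of that risk (this example is ours, not from the release notes), several threads incrementing the same counter can interleave their read-modify-write steps:
# threads_share_state.py -- illustrative sketch only
import threading

counter = 0

def work():
    global counter
    for _ in range(100_000):
        counter += 1  # read, add, store: not a single atomic step

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Updates from different threads can interleave, so the final value may be
# less than 400000 unless the increment is protected by a threading.Lock.
print(counter)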
Traditionally, you would have used threads for “non-CPU-bound tasks” in Python. These are the kinds of tasks that would be unaffected by having more, or faster, processors available to the Python instance: network traffic, file access, waiting for user input. For CPU-bound tasks, like calculations and data-processing, you could use Python’s ‘multiprocessing’ library (although some libraries like ‘numpy’ have their own low-level mechanisms for splitting work across cores). This starts multiple Python instances, each doing a portion of the processing, and allows a workload to be partitioned across multiple processors.
The other main differences between threading and multiprocessing in Python are in memory and data management. With threading, you have one Python instance, and each thread has access to the same memory space. With multiprocessing, you have multiple Python instances that work independently: the instances do not share memory, so to partition a workload using multiprocessing, Python has to send copies of (subsets of) your data to the new instances. This could mean that you need to store two or more copies of a large dataset in memory when processing it with multiprocessing.
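To make the contrast concrete, here is a minimal multiprocessing sketch (again ours, not from the post): the input data is partitioned, pickled, and sent to worker processes, each with its own memory space:
# multiprocessing_sketch.py -- illustrative sketch only
from multiprocessing import Pool, cpu_count

def square(x):
    # runs in a separate Python process; x arrives as a pickled copy
    return x * x

if __name__ == "__main__":
    data = list(range(10))
    with Pool(processes=cpu_count()) as pool:
        # the workload is partitioned across the worker processes
        results = pool.map(square, data)
    print(results)
    # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]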
Simultaneous processing across threads that share memory-space is now possible using the free-threaded build of Python. Many third-party packages have been rewritten to accommodate this new build and you can learn more about free-threading and the progress of the changes in the “Python Free-Threading Guide”.
As a simple-ish example, let’s consider natural language processing. There is a wonderful blog post about parallel processing with the nltk package on the “WZB Data Science Blog”. We will extend that example to use free-threading.
nltk provides access to some of the Project Gutenberg books, and we can access this data as follows:
# main.py
import nltk

def setup():
    nltk.download("gutenberg")
    nltk.download("punkt_tab")
    nltk.download("averaged_perceptron_tagger_eng")
    corpus = {
        f_id: nltk.corpus.gutenberg.raw(f_id)
        for f_id in nltk.corpus.gutenberg.fileids()
    }
    return corpus

corpus = setup()
The key-value pairs in corpus are the abbreviated book-title and contents for 18 books. For example:
corpus["austen-emma.txt"]
# [Emma by Jane Austen 1816]
#
# VOLUME I
#
# CHAPTER I
#
#
# Emma Woodhouse, handsome, clever, and rich, with a comfortable home ...
A standard part of a text-processing workflow is to tokenise and tag the
“parts-of-speech” (POS) in a document. We can do this using two nltk
functions:
# main.py ... continued
def tokenise_and_pos_tag(doc):
    return nltk.pos_tag(nltk.word_tokenize(doc))
A function to sequentially tokenise and POS-tag the contents of a corpus of books can be written:
# main.py ... continued
def tokenise_seq(corpus):
    tokens = {
        f_id: tokenise_and_pos_tag(doc)
        for f_id, doc in corpus.items()
    }
    return tokens
You need to install or build Python in a particular way to make use of “free-threaded” Python. In the above, we installed Python “3.14t” using uv, so we can compare the speed of free-threaded and sequential, single-core, processing.
We will use the timeit module (from the standard library) to analyse processing speed, from the command line.
# Activate the threaded version of Python 3.14
uv python pin 3.14t
# Install the dependencies for our main.py script
# (timeit is part of the standard library, so only nltk needs adding)
uv add nltk
# Time the `tokenise_seq()` function
# -- but do not time any setup code...
PYTHON_GIL=0 \
uv run python -m timeit \
--setup "import main; corpus = main.setup()" \
"main.tokenise_seq(corpus)"
# [lots of output messages]
# 1 loop, best of 5: 53.1 sec per loop
After some initial steps where the nltk datasets were downloaded and the corpus object was created (neither of which was timed, because these steps were part of the timeit --setup block), tokenise_seq(corpus) was run multiple times and the fastest run took around 53 seconds.
A small note: we have used the environment variable PYTHON_GIL=0 here. This makes it explicit that we are using free-threading (turning off the GIL). This wouldn’t normally be necessary to take advantage of free-threading (in Python “3.14t”), but was needed because one of the dependencies of nltk hasn’t been validated for the free-threaded build yet.
To write a threaded version of the same, we introduce two functions. The first is a helper that takes (filename, document-content) pairs and returns (filename, processed-document) pairs:
def tupled_tokeniser(pair):
    file_id, doc = pair
    return file_id, tokenise_and_pos_tag(doc)
The second function creates a thread pool, taking advantage of as many CPUs as are available on my machine (16, counted by multiprocessing.cpu_count()). Each document is processed in a separate thread, and we wait for all of the documents to be processed before returning the results to the caller:
import multiprocessing as mp
from concurrent.futures import ThreadPoolExecutor, wait

# ...

def tokenise_threaded(corpus):
    with ThreadPoolExecutor(max_workers=mp.cpu_count()) as tpe:
        try:
            futures = [
                tpe.submit(tupled_tokeniser, pair)
                for pair in corpus.items()
            ]
            wait(futures)
        finally:
            # output is a list of (file-id, data) pairs
            tokens = [f.result() for f in futures]
    return tokens
# Time the `tokenise_threaded()` function
# -- but do not time any setup code...
PYTHON_GIL=0 \
uv run python -m timeit \
--setup "import main; corpus = main.setup()" \
"main.tokenise_threaded(corpus)"
# [lots of output messages]
# 1 loop, best of 5: 32.5 sec per loop
Using the htop tool on Ubuntu, I could see that every core was used when processing the documents. At points during the run, each of the 16 CPUs was near 100% use (whereas only one or two CPUs were busy at any time during the sequential run).
But, despite using 16 times as many CPUs, the multithreaded version of the processing script only cut the run time by about 40%. There were only 18 books in the dataset, and there was some disparity between the book lengths (the Bible, containing millions of words, was processed much more slowly than the others). Maybe the speed-up would be greater with a larger or more balanced dataset.
In the post on the WZB Data Science blog, there is a multiprocessing implementation of the above. Running their multiprocessing code with 16 CPUs gave a similar speed-up to multithreading (minimum time 31.2 seconds). Indeed, if I were writing this code for a real project, multiprocessing would remain my choice, because the analysis for one book can proceed independently of that for any other book and the data volumes aren’t that big.
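For reference, a multiprocessing version of the tokenising workload can be written along the same lines as the threaded one. The sketch below is in the spirit of the WZB post rather than their exact code:
# main.py ... a possible multiprocessing variant (sketch, not the WZB code)
import multiprocessing as mp

def tokenise_mp(corpus):
    with mp.Pool(processes=mp.cpu_count()) as pool:
        # each (file-id, document) pair is pickled and sent to a worker process
        tokens = pool.map(tupled_tokeniser, list(corpus.items()))
    return tokens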
Other News
Python 3.14 has also introduced some improvements to exception-handling, a new approach to string templating and improvements to the use of concurrent interpreters. See the Python 3.14 release notes for further details.
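As one example, the new template strings (PEP 750) look like f-strings but use a t prefix and evaluate to a Template object whose literal text and interpolated values can be inspected before any rendering happens. A minimal sketch, assuming the string.templatelib API described in the release notes:
# Python 3.14 template strings (PEP 750) -- minimal sketch
from string.templatelib import Template, Interpolation

name = "world"
greeting = t"Hello {name}!"            # a Template object, not a str

print(isinstance(greeting, Template))  # True
for part in greeting:
    # iterating a Template yields literal strings and Interpolation objects
    if isinstance(part, Interpolation):
        print("value:", part.value)    # world
    else:
        print("text:", part)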
In the wider Python Data Science ecosystem, a few other developments have occurred or are due before the end of 2025:
- The first stable release of the Positron IDE was made in August;
- Pandas 3.0 is due before the end of the year, and will introduce strings as a data-type, copy-on-write behaviour, and implicit access to columns in DataFrame-modification code;
- Tools that ingest DataFrames are becoming agnostic to the DataFrame library through the Narwhals project. See the Plotly write-up on this subject.
Python data science progresses at such a speed that we can only really scratch the surface here. Have we missed anything in the wider Python ecosystem (2025 edition) that will make a huge difference to your data work? Let us know on LinkedIn or Bluesky.
