Description
I've seen some effort towards annotating scipy, and that's great!
As it currently stands, (static) type-checkers report many false positives, as they're unable to infer, for example, the signature of a function or its return type.
This can be rather frustrating, especially for typed libraries that use scipy internally. One such example is Lmo, which is fully type-hinted and heavily relies on scipy.integrate.quad, scipy.stats, scipy.special, and scipy.optimize. I can confidently say that the majority of the effort I put into typing Lmo went into dealing with typing issues "caused" by scipy. To illustrate, lmo/typing/scipy.py is a collection of such workarounds, amounting to nearly 1,000 lines of code. But in most cases, workarounds like these aren't even possible, so the only options are either to (mis)use typing.cast or to forcibly ignore the typing errors.
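To make the typing.cast workaround concrete, here is a minimal sketch. Note that untyped_integrate is a hypothetical stand-in I made up for an unannotated library function; it is not a scipy API, and it is annotated with Any only to mimic what a type-checker sees when no stubs are available.

```python
from collections.abc import Callable
from typing import Any, cast


def untyped_integrate(func: Any, a: Any, b: Any) -> Any:
    # Hypothetical stand-in for an unannotated library function (NOT scipy):
    # a midpoint-rule estimate of the integral, plus a dummy error estimate.
    n = 1_000
    h = (b - a) / n
    total = sum(func(a + (i + 0.5) * h) for i in range(n)) * h
    return total, 0.0


def integrate(func: Callable[[float], float], a: float, b: float) -> float:
    # The call below is typed as Any, so the result type is asserted by hand.
    # cast() does nothing at runtime; if the assertion is wrong, nobody notices.
    result = cast("tuple[float, float]", untyped_integrate(func, a, b))
    return result[0]
```

The cast keeps the type-checker quiet, but it is an unchecked promise: if the library changes its return value, the annotation silently becomes a lie. That is why I call it a (mis)use.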
I've had to deal with similar issues in other projects of mine as well.
And that's why I decided to build scipy-stubs.
What's up with scipy-stubs?
By now, this must sound like a bad supervillain story, because after all, I could've contributed those stubs to scipy directly.
So allow me to explain myself here, officer.
There are a couple of important reasons why I chose to go about it this way, and none of them involve world domination.
Experimental stubs should be optional
Adding type hints to scipy could have a huge impact on typed libraries. Taking Lmo as an example again, it would almost certainly require changes to many of its current annotations to maintain compatibility.
Lmo, like many other scientific Python libraries, follows SPEC 0, so it supports a specific range of scipy versions.
But because Lmo wants to have valid type annotations for all of these versions, it would require a lot of effort to remain compatible with each scipy version. Additionally, a lot of work has recently been put into improving the typing of numpy (sorry, my bad). If done absolutely right, then the CI test matrix would become a combinatorial mess (and might actually turn me into a supervillain).
But all of this can be avoided by making the new type annotations opt-in. This way, package maintainers can, for example, require it for specific scipy or numpy versions.
Independent release cycle
New things move fast, and scipy-stubs does too. But old mature projects like scipy don't move as fast.
Giving a young husky to the pope isn't very wise; and neither is adding new stubs to scipy.
So let's at least wait until scipy-stubs is more mature before considering merging it into scipy.
Most new ideas are wrong
When writing stubs, you'll often run into issues that could be solved in several non-trivial ways. If you later find out that you've made a suboptimal choice, you're going to want to improve it. But if the relevant stubs are bundled with scipy, that probably won't be possible, as it would break backward compatibility.
So to illustrate this, take scipy.sparse.linalg.LinearOperator for example:
- Should it be made generic? (spoiler alert: yes)
- Should you add a type parameter for the shape, even if it's usually either 1-d or 2-d?
  - Would you bind it to tuple[int] | tuple[int, int], or tuple[int, ...], or would you use separate type parameters, one for each dimension, e.g. using typing.TypeVarTuple?
- Should the dtype parameter be bound to np.dtype or np.generic? Or perhaps only np.number[Any], maybe including np.bool?
- How will you deal with numpy.object_ scalars, given that type(np.object_("world domination")) is str?
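To give a feel for one of these options, here is a rough sketch of a generic operator with a shape parameter bound to tuple[int, ...]. LinearOperatorSketch, ShapeT and ScalarT are hypothetical names used only for illustration; this is not how scipy or scipy-stubs define LinearOperator. Plain Python lists stand in for arrays, and the scalar parameter is a simplified stand-in for a dtype parameter.

```python
from collections.abc import Callable
from typing import Generic, TypeVar

# One of the options from the list: a single shape parameter bound to tuple[int, ...].
ShapeT = TypeVar("ShapeT", bound=tuple[int, ...])
# Simplified stand-in for the dtype question; real stubs would bind to a numpy type.
ScalarT = TypeVar("ScalarT", bound=complex)


class LinearOperatorSketch(Generic[ShapeT, ScalarT]):
    """Hypothetical operator that is generic over its shape and scalar type."""

    def __init__(
        self,
        shape: ShapeT,
        matvec: Callable[[list[ScalarT]], list[ScalarT]],
    ) -> None:
        self.shape = shape
        self._matvec = matvec

    def matvec(self, x: list[ScalarT]) -> list[ScalarT]:
        # Apply the wrapped function; the scalar type is preserved in the signature.
        return self._matvec(x)


def _double(x: list[float]) -> list[float]:
    return [2.0 * v for v in x]


# A type-checker may infer LinearOperatorSketch[tuple[int, int], float] here.
double = LinearOperatorSketch((2, 2), _double)
```

Changing the bound on ShapeT later, or switching to one type parameter per dimension, would change the type that every downstream annotation has to spell out. That is exactly the kind of decision that is hard to undo once it ships.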
So I’m sure you can imagine that it’s easy to make a decision that you’ll want to rectify later on.
Having a separate scipy-stubs will effectively allow for such a “ctrl+z” option, as it doesn’t require backward compatibility: Package maintainers can pin the scipy-stubs version that they support and adjust their types accordingly.
Linters, type-checkers, stub-testers, test-runners, ...
Properly written type stubs have to be valid and complete. There are tools to help with that, an important one being mypy's stubtest.
In the case of scipy-stubs
, it also uses ruff
and pyright
in its toolchain (at the moment only locally). This toolchain is very different from the one that scipy
uses. So I’m guessing that it would require a significant amount of effort to get them to work together within the CI.
It would also harm the developer experience (DX). For instance, you would have to wait for the runtime stuff to complete in the CI each time you submit a change to the stubs.
But scipy-stubs on its own can just use GitHub Actions, while keeping the workflow runtime below 5 minutes.
I hope that clears things up a bit. And I might’ve said this before, but I wouldn’t mind merging scipy-stubs into scipy once it is mature enough.
My only goal with scipy-stubs is to provide type hints for all of scipy:
The Swiss Army knife of data science deserves to have proper type annotations too.