TYP: scipy-stubs and the path towards typing scipy

I've seen some effort towards annotating scipy, and that's great!

As it currently stands, (static) type-checkers report many false positives, as they're unable to infer, for example, the signature of a function or its return type.

This can be rather frustrating, especially for typed libraries that use scipy internally. One such example is Lmo, which is fully type-hinted and relies heavily on scipy.integrate.quad, scipy.stats, scipy.special, and scipy.optimize. I can confidently say that the majority of the effort I put into typing Lmo went into dealing with typing issues "caused" by scipy. To illustrate, lmo/typing/scipy.py is a collection of such workarounds, amounting to nearly 1,000 lines of code. But in most cases, workarounds like these aren't even possible, so the only options are either to (mis)use typing.cast or to forcibly ignore the typing errors.
I've had to deal with similar issues in other projects of mine as well.
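
To make the typing.cast workaround concrete, here is a minimal sketch. It assumes a hypothetical un-annotated function, `untyped_quad`, that stands in for a library call without type hints; it is not scipy's API, and the `(estimate, evaluations)` return shape is invented for illustration.

```python
from typing import Callable, cast


def untyped_quad(func, a, b, n=1000):
    """Hypothetical stand-in for a library function without annotations.

    Returns an (estimate, evaluations) pair, computed with the midpoint rule.
    """
    h = (b - a) / n
    estimate = h * sum(func(a + (i + 0.5) * h) for i in range(n))
    return estimate, n


def integrate(f: Callable[[float], float], a: float, b: float) -> float:
    # A type checker may see the result of the un-annotated call as `Any`
    # or unknown, so the caller asserts the type by hand.
    # `cast` does nothing at runtime: if the assertion is wrong, nobody notices.
    estimate, _ = cast("tuple[float, int]", untyped_quad(f, a, b))
    return estimate
```

The cast silences the checker, but it also removes the safety net: if the library later changes its return value, the annotation in `integrate` quietly becomes a lie.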

And that's why I decided to build scipy-stubs.

What's up with `scipy-stubs`?

By now, this must sound like a bad supervillain story, because after all, I could've contributed those stubs to scipy directly.

So allow me to explain myself here, ~~officer~~.
There are a couple of important reasons why I chose to go about it this way _(and they don't include world domination)_.

Experimental stubs should be optional

Adding type hints to scipy could have a huge impact on typed libraries. Taking Lmo as an example again, it would almost certainly require changes to many of its current annotations to maintain compatibility.

Lmo, like many other scientific Python libraries, follows SPEC 0, so it supports a specific range of scipy versions.
But because Lmo wants to have valid type annotations for all of these versions, it would require a lot of effort to remain compatible with each scipy version. Additionally, there has also been a lot of work put into improving the typing of numpy recently (sorry, my bad). If done absolutely right, then the CI test matrix would become a combinatorial mess (and might actually turn me into a supervillain).

But all of this can be avoided by making the new type annotations opt-in. This way, package maintainers can, for example, require it for specific scipy or numpy versions.

Independent release cycle

New things move fast, and scipy-stubs does too. But ~~old~~ mature projects like scipy don't move as fast.
Giving a young husky to the pope isn't very wise; and neither is adding new stubs to scipy.

So let's at least wait until scipy-stubs is more mature before considering merging it into scipy.

Most new ideas are wrong

When writing stubs, you'll often run into issues that could be solved in several non-trivial ways. If you later find out that you've made a suboptimal choice, you're going to want to improve it. But if the relevant stubs are bundled with scipy, that probably won't be possible, as it would break backward compatibility.

So to illustrate this, take scipy.sparse.linalg.LinearOperator for example:

- Should it be made generic? (spoiler alert: yes)
- Should you add a type parameter for the shape, even if it's usually either 1-d or 2-d?
  - Would you bind it to tuple[int] | tuple[int, int], or tuple[int, ...], or would you use separate type parameters, one for each dimension, e.g. using typing.TypeVarTuple?
- Should the dtype parameter be bound to np.dtype or np.generic? Or perhaps only np.number[Any], maybe including np.bool?
- How will you deal with numpy.object_ scalars, given that type(np.object_("world domination")) is str?
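
As a rough sketch of what one combination of these choices could look like, consider the class below. Everything in it is hypothetical: `SketchOperator`, `ShapeT`, and `ScalarT` are names made up for this example. It is neither scipy's LinearOperator nor what scipy-stubs actually does, and plain Python scalars stand in for numpy dtypes so that the example has no dependencies.

```python
from typing import Generic, Sequence, TypeVar, Union

# One possible answer to the shape question: only allow 1-d or 2-d shapes.
# The alternative bound, tuple[int, ...], would accept any dimensionality.
Shape1or2 = Union[tuple[int], tuple[int, int]]

ShapeT = TypeVar("ShapeT", bound=Shape1or2)
# Stand-in for the dtype question: constrain the scalar type to a fixed set.
ScalarT = TypeVar("ScalarT", int, float, complex)


class SketchOperator(Generic[ShapeT, ScalarT]):
    """Hypothetical operator that scales a vector by a constant."""

    def __init__(self, shape: ShapeT, scale: ScalarT) -> None:
        self.shape = shape
        self.scale = scale

    def matvec(self, x: Sequence[ScalarT]) -> list[ScalarT]:
        # The scalar type of the input carries over to the output.
        return [self.scale * v for v in x]


op = SketchOperator((2, 2), 2.0)  # a checker can infer the shape and scalar types
result = op.matvec([1.0, 2.0])
```

Once downstream code annotates against `SketchOperator[tuple[int, int], float]`, switching to another bound or adding a type parameter changes the meaning of those annotations, which is exactly the kind of decision that is hard to undo.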

So I’m sure you can imagine that it’s easy to make a decision that you’ll want to rectify later on.

Having a separate scipy-stubs will effectively allow for such a “ctrl+z” option, as it doesn’t require backward compatibility: Package maintainers can pin the scipy-stubs version that they support and adjust their types accordingly.

Linters, type-checkers, stub-testers, test-runners, ...

For type stubs to be useful, they have to be valid and complete. There are tools to help with that, an important one being mypy's stubtest.

In the case of scipy-stubs, it also uses ruff and pyright in its toolchain (at the moment only locally). This toolchain is very different from the one that scipy uses. So I’m guessing that it would require a significant amount of effort to get them to work together within the CI.

It would also harm the developer experience (DX). For instance, you would have to wait for the runtime stuff to complete in the CI each time you submit a change to the stubs.

But scipy-stubs on its own can just use GitHub Actions, while keeping the workflow runtime below 5 minutes.

I hope that clears things up a bit. And I might’ve said this before, but I wouldn’t mind merging scipy-stubs into scipy once it is mature enough.

My only goal with scipy-stubs is to provide type hints for all of scipy:
The Swiss Army knife of data science deserves to have proper type annotations too.
