Add selector user guide #1419

Vincent-Maladiere · 2025-05-27T16:44:16Z

doc/selectors.rst

rcap107 · 2025-06-04T09:53:45Z

doc/selectors.rst

+- Delayed selection: passing a selection rule, to be evaluated later on a
+  dataframe that is not yet available. For example, without selectors
+  it is not possible to instantiate a :class:`~skrub.SelectCols` that selects "all
+  columns that have numerical data types, except the column ``'User ID'``", if the


I am not very convinced by this example. How about using datetime columns that haven't yet been parsed as datetimes? They'd start as strings, so a selector for dates would not recognize them until they have been parsed.

Better? "all columns except those with the suffix 'ID'"

To me, delayed selection matters when there are transformations that add or modify columns, so that the columns differ from the input: for example casting to a different datatype, or generating features with one of the encoders (which is why I mentioned datetimes).

"All columns except those with the suffix 'ID'" looks to me like something that could be done immediately.
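
To make the point about delayed selection concrete, here is a minimal plain-pandas sketch. It is not skrub's implementation and not the example from the PR. The helper `numeric_except` and the columns `price` and `name` are hypothetical; only `'User ID'` comes from the quoted doc text. The rule is built before any dataframe exists, and it gives a different answer once an upstream step has cast a string column to numbers:

```python
import pandas as pd


def numeric_except(excluded):
    """Hypothetical helper: build a selection rule without needing a dataframe.

    The returned function lists the numeric columns of whatever dataframe
    it is eventually given, minus the names in `excluded`.
    """
    def rule(df):
        numeric_cols = df.select_dtypes(include="number").columns
        return [col for col in numeric_cols if col not in excluded]

    return rule


# The rule is created up front; no dataframe is available yet.
rule = numeric_except({"User ID"})

# Raw input: "price" is still stored as strings.
raw = pd.DataFrame(
    {"User ID": [101, 102], "price": ["9.5", "12.0"], "name": ["a", "b"]}
)
selected_before = rule(raw)

# An upstream transformation casts "price" to a numeric dtype.
parsed = raw.assign(price=pd.to_numeric(raw["price"]))
selected_after = rule(parsed)

print(selected_before)  # no numeric column besides the excluded one
print(selected_after)   # "price" is now picked up by the same rule
```

A fixed list of column names computed on `raw` would have missed `price`, which is the case where evaluating the rule late pays off.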

rcap107

Thanks a lot for the PR @Vincent-Maladiere

I made a few comments, mostly to reword a few sentences.

rcap107 · 2025-06-05T15:18:24Z

doc/selectors.rst

+dataframes:
+
+>>> import pandas as pd
+>>> df = pd.DataFrame(


I'm wondering if there should be a line explaining that "this is a dataset about page sizes". I find it a bit unusual, and I did not understand what the numbers meant on the first read-through.

Vincent-Maladiere · 2025-06-05T16:48:57Z

We can revamp this user guide slightly with OnEachColumn
It might be worth adding an example for selectors as well

rcap107 · 2025-06-06T12:35:34Z

> We can revamp this user guide slightly with OnEachColumn
>
> It might be worth adding an example for selectors as well

In this PR or a separate one?

Vincent-Maladiere · 2025-06-06T13:32:59Z

Maybe in a separate one?

rcap107

Sure, we can always iterate in separate PRs.

Vincent-Maladiere · 2025-06-06T13:59:24Z

Waiting for @GaelVaroquaux to have a quick look before merging.

rcap107 · 2025-06-16T11:38:57Z

@GaelVaroquaux mentioned in the weekly meeting that this and other doc PRs should be iterated on, so we can merge it and open new PRs as needed.

add doc selectors

4610cfc

Vincent-Maladiere force-pushed the doc_selectors branch from 2b9de03 to 4610cfc Compare June 2, 2025 13:21