-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[ENH] Add Subsequence Extraction Transformer to sktime.transformations
module
#6967
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions about the design here.
- why is this a panel transformer? Do we need information about the other sequences for one given sequence?
- related: why are we padding at first? Should the algorithm not also work for unequal length series?
- currently, we are carrying out the aggregation in
fit
and remembering indices fortransform
, correct? Should we not instead carry it out intransform
, i nthe first place? - we can carry out minimal parameter validation in
fit
, or in__init__
, to avoid the "else" and "try-except" blocks later.
Further, code formatting checks are faliing, I assume that precommit has not been fully set up by you - here's how to set them up locally: https://www.sktime.net/en/latest/developer_guide/coding_standards.html#setting-up-local-code-quality-checks
Responses to @fkiraly:
Ahh, I was referencing
Remo
8000
ved in this commit. Initially I had it to pass some checks, but found an alternative solution without needing to use
Initially, I defined "fitting" as trying to compute the indices needed to retrieve the correct subsequence from the sequence. Then, I reserved the "transform" step to actually retrieve the subsequence from the input data. Do you think this is a good way to design this transformer or can the fitting definition be improved here?
Refactored in this commit.
Ahh, I forgot that I reconfigured my WSL instance to use Ubuntu 22.04 and forgot to reinstall the pre-commits in that process. Thanks! |
In general, the series seen in
Yes, series-to-series is the correct type in my opinion. |
I see. It does make sense especially if we are using a fitted transformer for inference purposes, so after more thought, I agree with you. Changes are in this commit. Thanks! |
great, restarting the tests. |
Apologies for the failed pre-commit. I believe I resolved it this time. I need to double check if I set my pre-commits correctly on my local machine. Can you restart the tests when you get a chance? Thanks! |
(another go) |
Running into build issue #6772, namely Read the Docs build failed due to timeout. Confirmed that this issue shouldn't block the PR's build. A restart on the build until the docs build step passes should do the trick. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Suggestions for improvement, but not blocking:
- we could allow an arbitrary series-to-primitives transformer (from
sktime
), or a function with specific signature, as theaggregate
algorithm - the docstring should say whether
subsequence_len
is iniloc
orloc
units - the docstring should say what exactly the algorithm is - it is implied but never explicitly stated
- are there any literature references for the use of this algorithm that one should add? It is simple, but perhaps where it is used as a component is helpful.
- Perhaps different naming might be more convenient to the user? Shorten
subsequence_len
, and callmethod
(not very specific),selector
orselect_fun
orselect_method
or similar?
Some updates from the great recommendations provided by @fkiraly (Thanks!):
I was able to add numpy function support for
Addressed here.
Addressed here. Is there a way to check if the LaTeX is correctly rendered and if the hyperlink reference to Wikipedia is also correctly formatted?
Added here. It took me a while to find one but it turns out that this transformer generalizes the maximum sum subarray problem to using other aggregate functions and selector methods. Because of this, I changed the default values of the transformer here.
Changed |
ok, great! Already approved before, but before approving again, we need to make sure that the tests run. |
1. :math:`A(x_{i}, \cdots, x_{i+k-1})` is maximal when `selector = 'max'`, and | ||
2. :math:`A(x_{i}, \cdots, x_{i+k-1})` is minimal when `selector = 'min'`. | ||
|
||
When `aggregate_fn = np.sum` and `selector = 'max'`, the problem degenerates to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to ensure the docstring displays code fonts correctly, use double backticks, not single backticks for code. Single backticks are reserved for rst expressions such as :math:
.
Also, minor point about this sentence, I would lead with the special case: "Maximum sum subarray is a special case and can be obtained by setting ..." - if this is an important subcase, I would also mention it further up.
"handles-missing-data": False, | ||
} | ||
|
||
def __init__(self, subseq_len, aggregate_fn=np.sum, kwargs={}, selector="max"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we must avoid functions and mutable defaults like the dict.
The pattern to deal with this is using None
, and internally a private attribute self._aggregate_fn
(since the attributes of same name should always be identical to init args).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only some small change requests remaining, see above.
Thanks for the review @fkiraly. Hopefully this will be the last batch of code updates. 😄
Addressed here.
Addressed here. Similar to
Because I decided to make
Removed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! left one non-blocking comment about documenting the options.
Addressed merge conflicts in |
…ns` module (sktime#6967) #### Reference Issues/PRs Fixes sktime#6901. #### What does this implement/fix? Explain your changes. Subsequence extraction transformer is a transformer created to extract subsequences of specified length that meet some criterion with respect to an aggregate function. It addresses questions like the following: - Given a sequence $(x_1, x_2, ..., x_n)$, find a $k$ - length subsequence such that the average value of $(x_j, x_{j+1}, ..., x_{j+k-1})$ is maximal.
Reference Issues/PRs
Fixes #6901.
What does this implement/fix? Explain your changes.
Subsequence extraction transformer is a transformer created to extract subsequences of specified length that
meet some criterion with respect to an aggregate function. It addresses questions like the following:
Does your contribution introduce a new dependency? If yes, which one?
Nope.
What should a reviewer concentrate their feedback on?
Did you add any tests for the change?
Nope.
Any other comments?
Excited to add my first PR to sktime. Really looking forward to the reviews!
PR checklist
For all contributions
How to: add yourself to the all-contributors file in the
sktime
root directory (not theCONTRIBUTORS.md
). Common badges:code
- fixing a bug, or adding code logic.doc
- writing or improving documentation or docstrings.bug
- reporting or diagnosing a bug (get this pluscode
if you also fixed the bug in the PR).maintenance
- CI, test framework, release.See here for full badge reference
maintainers
tag - do this if you want to become the owner or maintainer of an estimator you added.See here for further details on the algorithm maintainer role.
For new estimators
docs/source/api_reference/taskname.rst
, follow the pattern.Examples
section.python_dependencies
tag and ensureddependency isolation, see the estimator dependencies guide.