-
Notifications
You must be signed in to change notification settings - Fork 2.2k
fix: RecursiveSplitter
bug in the case when the recursive chunking is triggered
#9316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
RecursiveSplitter
bug in the case where a split_text
is found to be longer than split_length
and recursive chunking is triggered
RecursiveSplitter
bug in the case where a split_text
is found to be longer than split_length
and recursive chunking is triggeredRecursiveSplitter
bug in the case when the recursive chunking is triggered
Pull Request Test Coverage Report for Build 14727388496Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 👍
@@ -426,7 +426,7 @@ def test_run_separator_exists_but_split_length_too_small_fall_back_to_character_ | |||
result = splitter.run(documents=[doc]) | |||
assert len(result["documents"]) == 10 | |||
for doc in result["documents"]: | |||
if re.escape(doc.content) not in ["\ "]: | |||
if re.escape(doc.content) not in ["\\ "]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember seeing a warning about a regex. I assume this helps to get rid of that warning. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's it 👍🏽
Related Issues
Proposed Changes
Removed the line that was making a cumulative count in the case where a
split_text
is found to be longer thansplit_length
and recursive chunking is triggered.A new test case was added to the
RecursiveDocumentSplitter
that verifies its behavior regarding the issue mentioned above.How did you test it?
test_run_complex_text_with_multiple_separators()
to the existing test fileChecklist
fix:
,feat:
,build:
,chore:
,ci:
,docs:
,style:
,refactor:
,perf:
,test:
and added!
in case the PR includes breaking changes.