-
Notifications
You must be signed in to change notification settings - Fork 92
perf: speed up line detection #847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
when the `locator` is not disabled
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #847 +/- ##
=======================================
Coverage 95.12% 95.12%
=======================================
Files 8 8
Lines 2196 2196
Branches 577 577
=======================================
Hits 2089 2089
Misses 107 107 ☔ View full report in Codecov by Sentry. |
@kboshold this one is an actual fix for the performance issue from your initial example, even when not normalizing the input. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two minor things I would change: the test-descriptions on the test you copied and the new one don't describe what's actually being tested.
Other than that it looks really good! I was a little confused at first when I saw your changes to the regex and loop, but after playing around with it for a little bit and testing the outcomes it seems to work exactly as before but without matching the content inbetween line-endings for no reason. Good job dude!
test/dom-parser.test.js
Outdated
@@ -330,7 +331,20 @@ describe('DOMParser', () => { | |||
const source = `<root>${'A'.repeat(50000)}\u2029${'A'.repeat(50000)}\u0085${'A'.repeat(50000)}\u2028${'A'.repeat(50000)}\u2029</root>`; | |||
const doc = parser.parseFromString(source, MIME_TYPE.XML_TEXT); | |||
expect(new XMLSerializer().serializeToString(doc)).toEqual(source.replace(/[\u0085\u2028\u2029]/g, '\n')); | |||
}, 500); | |||
}); | |||
test('should be able to open documents with alternative whitespace without creating a bottleneck and replacing them with \\n', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and replacing them with \n
this description doesn't make sense for this test
test/dom-parser.test.js
Outdated
@@ -330,7 +331,20 @@ describe('DOMParser', () => { | |||
const source = `<root>${'A'.repeat(50000)}\u2029${'A'.repeat(50000)}\u0085${'A'.repeat(50000)}\u2028${'A'.repeat(50000)}\u2029</root>`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without creating a bottleneck
The test description here actually also doesn't make sense, maybe you can rephrase it when you are already at it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, somehow the test I copied is also covering the one that @kboshold added, so I merged them into one and corrected the description.
test/dom-parser.test.js
Outdated
const > | ||
const normalizeLineEndings = jest.fn((source) => source); | ||
const { parser } = getTestParser({ onError, normalizeLineEndings }); | ||
const source = `<root>${'A'.repeat(15000)}\u2029${'A'.repeat(15000)}\u0085${'A'.repeat(15000)}\u2028${'A'.repeat(15000)}\u2029</root>`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make more sense to have a \n
as the first character, since the test currently has \u2029
twice.
Solution:
const source = `<root>${'A'.repeat(15000)}\n${'A'.repeat(15000)}\u0085${'A'.repeat(15000)}\u2028${'A'.repeat(15000)}\u2029</root>`;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's such big of a difference, but hey, I applied your suggestion.
I applied all the suggested changes, so please approve it. |
normalizeLineEndings
, so it actually fails..*
twice, to speed up processing larger files containing certain Unicode chars as demonstrated in Increased processing times when using \u2028 and \u2029 #838