8000 Switch to `score` instead of `worstCaseBacktrackCount` by tjenkinson · Pull Request #637 · tjenkinson/redos-detector · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Switch to score instead of worstCaseBacktrackCount #637

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 19, 2024
Merged

Conversation

tjenkinson
Copy link
Owner
@tjenkinson tjenkinson commented Nov 25, 2024

Switch to score instead of worstCaseBacktrackCount and fix some bugs.

worstCaseBacktrackCount wasn't always accurate if there were any unbound quantifiers given we didn't know the max length of the input string.

So now there's a more abstract score instead.

Breaking changes

Library:

  • maxBacktracks config removed
  • maxScore config added
  • worstCaseBacktrackCount removed
  • score added
  • hitMaxBacktracks error removed
  • hitMaxScore error added
  • BacktrackCount type removed
  • Score type added

CLI:

  • --maxBacktracks removed
  • --maxScore added

How is score calculated?

  • All the different paths an input string could take through the provided pattern are calculated.
  • Then for each candidate path found above, starting from just the first character, up to the complete path, all the other paths that could also match a string that matches the candidate path are found.
  • The score is the highest number found above. The higher the score, the more backtracks an engine will potentially need to take if the input string doesn't match the pattern.
  • If the score is 1 this means no backtracks can occur and for every possible input string the pattern could only match one way.
  • If there are too many different paths it can be too expensive to calculate an accurate score, so it falls back incrementing every time a new path is found.

closes #624
closes #621

@tjenkinson tjenkinson changed the title Switch to score instead of worstCaseBacktracks Switch to score instead of worstCaseBacktrackCount Nov 25, 2024
[/ab/, false, new Set(['missingAnchor'])],
[/(a+)+/, false, new Set(['missingAnchor'])],
[/a*a+/, false, new Set(['missingAnchor'])],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rxrx matches anx in 7 steps.

[/ab/, false, new Set(['missingAnchor'])],
[/(a+)+/, false, new Set(['missingAnchor'])],
[/a*a+/, false, new Set(['missingAnchor'])],
[/a*a{2,}/, false, new Set(['missingAnchor'])],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rxrx matches anx in 9 steps.

[/(a+)+/, false, new Set(['missingAnchor'])],
[/a*a+/, false, new Set(['missingAnchor'])],
[/a*a{2,}/, false, new Set(['missingAnchor'])],
[/a*aa/, false, new Set(['missingAnchor'])],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto 9 steps.

[/(a+)+/, true, new Set(['missingAnchor'])],
[/(a+)+$/, false, new Set(['missingAnchor'])],
[/(a+)+a/, false, new Set(['missingAnchor'])],
[/(a|aa)/, false, new Set(['missingAnchor'])],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rxrx 6 steps.

[/a*aa/, false, new Set(['missingAnchor'])],
[/(a+)+/, true, new Set(['missingAnchor'])],
[/(a+)+$/, false, new Set(['missingAnchor'])],
[/(a+)+a/, false, new Set(['missingAnchor'])],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rxrx matches in 21 steps irrespective of n.

@tjenkinson tjenkinson merged commit 6382d1f into main Dec 19, 2024
8 checks passed
@tjenkinson tjenkinson deleted the switch-to-score branch December 19, 2024 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

#618 regressions and more explicit methodology with 2.4.3 a simple regex (e.g "line.split(/\s+/) ;" for splitting into words) is now unsafe
2 participants
0