8000 Switch to `score` instead of `worstCaseBacktrackCount` by tjenkinson · Pull Request #637 · tjenkinson/redos-detector · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Switch to score instead of worstCaseBacktrackCount #637

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 19, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
144 changes: 79 additions & 65 deletions README.md
8000
Original file line number Diff line number Diff line change
@@ -1,77 +1,41 @@
# redos-detector

A CLI and library which tests with certainty if a regex pattern is safe from ReDoS attacks. Supported in the browser, Node and Deno.
A CLI and library which tests helps score how vulnerable a regex pattern is to ReDoS attacks. Supported in the browser, Node and Deno.

By default it considers a pattern unsafe if there are potentially more than 200 backtracks possible.
Note you should make sure that the input string has a sensible length limit, as the amount of work needed to process the input may increase with the length of the string.

There are some cases where it may report a pattern as unsafe when in reality it's safe, but it should never report a pattern as safe when it is not.
## What does it do?

Note that this tool checks patterns that are expected to run using an engine that implements the [ECMA-262](https://www.ecma-international.org/publications-and-standards/standards/ecma-262/) (JS) standard. If your pattern will be run in a different engine please be aware the result given here could be incorrect due to [differences in engine behaviour](docs/differences-in-engines.md).

## Demo

[https://redosdetector.com/](https://redosdetector.com/) [[Source](https://github.com/tjenkinson/redos-detector-demo)]

## Examples

### Good

```ts
isSafe(/^([a-c]*)d([a-c]*)$/).safe === true;
```

[Demo](https://redosdetector.com/?pattern=%5E%28%5Ba-c%5D*%29d%28%5Ba-c%5D*%29%24)

because for any given input string this can only match in one way, or not match.

### Bad

```ts
isSafe(/^([a-b]*)([a-c]*)$/).safe === false;
```

[Demo](https://redosdetector.com/?pattern=%5E%28%5Ba-b%5D*%29%28%5Ba-c%5D*%29%24)
- It calculates all the different paths an input string could take through the provided pattern.
- Then for each candidate path found above, starting from just the first character, up to the complete path, it finds all the other paths that could also match a string that matches the candidate path.
- The score is the highest number found above. The higher the score, the more backtracks an engine will potentially need to take if the input string doesn't match the pattern.
- If the score is `1` this means no backtracks can occur and for every possible input string the pattern could only match one way.
- If there are too many different paths it can be too expensive to calculate an accurate score, so it falls back incrementing every time a new path is found.

because the input `a` could match in both places. The input `ax` could result in both being tried.
Note that this tool checks patterns that are expected to run using an engine that implements the [ECMA-262](https://www.ecma-international.org/publications-and-standards/standards/ecma-262/) (JS) standard. If your pattern will be run in a different engine please be aware the result given here could be incorrect due to [differences in engine behaviour](docs/differences-in-engines.md).

The CLI would output the following for this:
### Example

```
Regex is not safe. There could be infinite backtracks.
With `^(.|a)(.|b)$` there are 6 candidate paths:

The following trails show how the same input can be matched multiple ways.
2: `[a-b]` | 10: `[a-c]`
========================
2: `[a-b]` | 10: `[a-c]`
10: `[a-c]` | 10: `[a-c]`
=========================
2: `[a-b]` | 10: `[a-c]`
10: `[a-c]` | 10: `[a-c]`
10: `[a-c]` | 10: `[a-c]`
=========================
2: `[a-b]` | 10: `[a-c]`
2: `[a-b]` | 10: `[a-c]`
========================
1. `.`
2. `a`
3. `..`
4. `.b`
5. `a.`
6. `ab`

Hit maximum number of backtracks so there may be more results than shown here.
Note there may be more results than shown here as some infinite loops are detected and removed.
```
Path 1 and 2 will be put into a group because an input string of `a` would match both. This would have a candidate score of 2.

The first result means you could have a match where given the same input string, `[a-c]` takes a character (at position 10), or `[a-b]` takes a character (at position 2).
Paths 3, 4, 5 and 6 will also be put into a group because an input string of `ab` would match them all. Therefore this group would have a candidate score of 4.

_Note this could be made good again by making the first group atomic. [Atomic groups](https://www.regular-expressions.info/atomic.html) are not supported directly right now, but can be inferred using a pattern like `^(?=([a-b]*))\1([a-c]*)$` ([Demo](https://redosdetector.com/?pattern=%5E%28%3F%3D%28%5Ba-b%5D*%29%29%5C1%28%5Ba-c%5D*%29%24))._
Therefore the score that's reported would be 4.

```ts
isSafe(/^(a|a)+$/).safe === false;
```

[Demo](https://redosdetector.com/?pattern=%5E%28a%7Ca%29%2B%24)
Note in reality with the input `ab!` it would backtrack 5 times (3 for `ab`, and then 2 for `a`), but the returned score is just the highest for a given prefix of the input string. This is because we don't know the maximum length the input string could be. E.g. with `^.a*.$` the total number of backtracks will be higher the more `a`'s there are.

is bad because in the group `a` could match on both sides. The input `aaaaaax` could result in many combinations being tried.

## How does it work?
## Demo

This tool will generate all the combinations of input strings that would match the pattern, and will output any input string that could match the pattern in multiple ways.
[https://redosdetector.com/](https://redosdetector.com/) [[Source](https://github.com/tjenkinson/redos-detector-demo)]

## Usage

Expand All @@ -88,11 +52,11 @@ The following is the structure of the result you will get from both `isSafe`, `i
```ts
type Root = {
safe: boolean;
error: null | 'hitMaxBacktracks' | 'hitMaxSteps' | 'timedOut';
error: null | 'hitMaxScore' | 'hitMaxSteps' | 'timedOut';
trails: Trail[];
patternDowngraded: boolean;
pattern: string;
worstCaseBacktrackCount:
score:
| {
infinite: true;
}
Expand Down Expand Up @@ -156,7 +120,7 @@ The following options exist for both the library and CLI:
- `unicode`: Enable unicode mode. _(Default: `false`)_
- `dotAll`: Enable dot-all mode, which allows `.` to match new lines. _(Default: `false`)_
- `multiLine`: Enable multi-line mode, which changes `^` and `$` to match the start or end of any line within the string. _(Default: `false`)_
- `maxBacktracks`: If worst case count of possible backtracks is above this number, the regex will be considered unsafe. _(Default: `200`)_
- `maxScore`: If the score goes above this number, the regex will be considered unsafe. _(Default: `200`)_
- `maxSteps`: The maximum number of steps to make. If this limit is hit `error` will be `hitMaxSteps`. You probably don't need to change this. _(Default: `20000`)_
- `timeout`: The maximum amount of time (ms) to spend processing. Once this time is passed the trails found so far will be returned, and the `error` will be `timeout`. _(Default: `Infinity`)_
- `downgradePattern`: Automatically downgrade the pattern if it's not supported as is. If this happens `patternDowngraded` will be `true` and `pattern` will contain the downgraded version. An exception may be thrown if the pattern needed to be downgraded and it wasn't. _(Default: `true`)_
Expand All @@ -166,7 +130,7 @@ _Note it's possible for there to be a infinite number of results, so you should
### CLI

```sh
$ npx redos-detector check "<regex pattern>" (--caseInsensitive) (--unicode) (--dotAll) (--multiLine) (--maxBacktracks <number>) (--maxSteps <number>) (--timeout <number>) (--alwaysIncludeTrails) (--disableDowngrade) (--resultsLimit <number>) (--json)
$ npx redos-detector check "<regex pattern>" (--caseInsensitive) (--unicode) (--dotAll) (--multiLine) (--maxScore <number>) (--maxSteps <number>) (--timeout <number>) (--alwaysIncludeTrails) (--disableDowngrade) (--resultsLimit <number>) (--json)
```

to run on the fly or
Expand All @@ -191,10 +155,59 @@ $ npm install redos-detector

The following functions are provided:

- `isSafe(regexp: RegExp, options?: { maxBacktracks?: number, maxSteps?: number, timeout?: number, downgradePattern?: boolean })`: This takes a [`RegExp`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp). The `i` and `u` flags are supported.
- `isSafePattern(pattern: string, options?: { maxBacktracks?: number, maxSteps?: number, timeout?: number, downgradePattern?: boolean, caseInsensitive?: boolean, unicode?: boolean, dotAll?: boolean, multiLine?: boolean })`: This takes just the pattern as a string. E.g. `a*`.
- `isSafe(regexp: RegExp, options?: { maxScore?: number, maxSteps?: number, timeout?: number, downgradePattern?: boolean })`: This takes a [`RegExp`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp). The `i` and `u` flags are supported.
- `isSafePattern(pattern: string, options?: { maxScore?: number, maxSteps?: number, timeout?: number, downgradePattern?: boolean, caseInsensitive?: boolean, unicode?: boolean, dotAll?: boolean, multiLine?: boolean })`: This takes just the pattern as a string. E.g. `a*`.
- `downgradePattern(input: { pattern: string, unicode: boolean }`: This downgrades the provided pattern to one which is supported. You won't need to use this unless you set the `downgradePattern` option to `false`.

## Further Examples

### Good

```ts
isSafe(/^([a-c]*)d([a-c]*)$/).safe === true;
```

[Demo](https://redosdetector.com/?pattern=%5E%28%5Ba-c%5D*%29d%28%5Ba-c%5D*%29%24)

because for any given input string this can only match in one way, or not match.

### Bad

```ts
isSafe(/^([a-b]*)([a-c]*)$/).safe === false;
```

[Demo](https://redosdetector.com/?pattern=%5E%28%5Ba-b%5D*%29%28%5Ba-c%5D*%29%24)

because the input `a` could match in both places. The input `ax` could result in both being tried.

The CLI would output the following for this:

```
Regex is not safe. There could be infinite backtracks.

The following trails show how the same input can be matched multiple ways.
2: `[a-b]` | 10: `[a-c]`
10: `[a-c]` | 10: `[a-c]`
=========================
2: `[a-b]` | 2: `[a-b]`
2: `[a-b]` | 10: `[a-c]`
10: `[a-c]` | 10: `[a-c]`
=========================
```

The first result means you could have a match where given the same input string, `[a-c]` takes a character (at position 10), or `[a-b]` takes a character (at position 2).

_Note this could be made good again by making the first group atomic. [Atomic groups](https://www.regular-expressions.info/atomic.html) are not supported directly right now, but can be inferred using a pattern like `^(?=([a-b]*))\1([a-c]*)$` ([Demo](https://redosdetector.com/?pattern=%5E%28%3F%3D%28%5Ba-b%5D*%29%29%5C1%28%5Ba-c%5D*%29%24))._

```ts
isSafe(/^(a|a)+$/).safe === false;
```

[Demo](https://redosdetector.com/?pattern=%5E%28a%7Ca%29%2B%24)

is bad because in the group `a` could match on both sides. The input `aaaaaa!` could result in many combinations being tried.

## Useful Resources

Here are some great resources, which I found super helpful when building this.
Expand All @@ -203,3 +216,4 @@ Here are some great resources, which I found super helpful when building this.
- **[https://regexper.com/](https://regexper.com/)**: Tool for visualising a regex pattern as a chart.
- **[https://regex101.com/](https://regex101.com/)**: Tool for testing a pattern with syntax highlighting and useful information.
- **[https://github.com/jviereck/regjsparser](https://github.com/jviereck/regjsparser)**: Parser used by this project.
- **[Counting in Regexes Considered Harmful: Exposing ReDoS Vulnerability of Nonbacktracking Matchers](https://www.usenix.org/conference/usenixsecurity22/presentation/turonova)**: A paper which demonstrates how simple unbounded patterns can be expensive.
2 changes: 1 addition & 1 deletion package.json
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
{
"name": "redos-detector",
"description": "A CLI and library which tests with certainty if a regex pattern is safe from ReDoS attacks.",
"description": "A CLI and library which tests helps score how vulnerable a regex pattern is to ReDoS attacks. Supported in the browser, Node and Deno.",
"main": "dist/redos-detector.js",
"bin": {
"redos-detector": "bin.js"
Expand Down
Loading
Loading
0