Preliminary "foot in the door" SQLite support by brandur · Pull Request #870 · riverqueue/river · GitHub

Preliminary "foot in the door" SQLite support #870

Merged · brandur merged 1 commit into master from brandur-sqlite on May 14, 2025

Conversation

brandur
Contributor
@brandur brandur commented May 1, 2025

Here, add an implementation for SQLite. The new driver passes the entire
driver test suite (including some newly added tests), and checks out
against a new end-to-end suite of client tests targeting it. I think at
this point we can consider it largely functional, albeit with minimal
real-world vetting: at least alpha quality that we can start putting in
front of users.

The main reason this turned out to be quite an effort is that SQLite,
although nominally supporting a lot of the same syntax Postgres does,
has a lot of unexpected limitations and quirks that each need to be
tracked down and worked around separately. For example:

  • SQLite doesn't really have data types. Everything is either an
    integer, real, text, or blob, and anything more sophisticated has to
    be shoehorned into one of those types. For example, there's no
    boolean. Despite the recent fanfare, there isn't even a jsonb type;
    jsonb values get pushed into a blob.

  • The most annoying missing data type is a timestamp. Date/times are
    stored as either Unix integers or strings, and SQLite provides a
    number of obnoxious, error-prone built-ins to work with them. (A
    sketch of the kind of workaround this forces follows this list.)

  • No arrays.

  • No data modification operations in CTEs. (Although I'm counting my
    lucky stars that CTEs are supported at all.)

  • No listen/notify.
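
To make the timestamp point concrete, here's a minimal sketch (not River's actual driver code) of the kind of workaround this forces: store a Go time.Time as RFC 3339 text via database/sql and the modernc.org/sqlite driver, then parse it back on read. The event table is hypothetical.

    package main

    import (
        "database/sql"
        "fmt"
        "time"

        _ "modernc.org/sqlite" // registers the driver under the name "sqlite"
    )

    func main() {
        db, err := sql.Open("sqlite", ":memory:")
        if err != nil {
            panic(err)
        }
        defer db.Close()

        // No timestamp type, so the column is plain TEXT and values are
        // serialized/parsed at the application layer.
        if _, err := db.Exec(`CREATE TABLE event (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)`); err != nil {
            panic(err)
        }
        if _, err := db.Exec(`INSERT INTO event (created_at) VALUES (?)`,
            time.Now().UTC().Format(time.RFC3339Nano)); err != nil {
            panic(err)
        }

        var raw string
        if err := db.QueryRow(`SELECT created_at FROM event ORDER BY id LIMIT 1`).Scan(&raw); err != nil {
            panic(err)
        }
        createdAt, err := time.Parse(time.RFC3339Nano, raw)
        if err != nil {
            panic(err)
        }
        fmt.Println(createdAt)
    }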

Aside from that, we had to get some testing infrastructure in place, and
our testing has traditionally been quite tied to Postgres, which would've
made a lot of changes necessary. However, the schema additions in #848
help make this quite a bit easier via the driver-based TestSchema and
TestTx functions, and because they raise schemas automatically, they
don't complicate test setup instructions with any extra steps.

A new TestDriverRiverSQLite test has been added to run the full
barrage of driver tests for SQLite. I've also added a new family of
TestClientWithDriver tests that are similar to TestDriver* in that
they run for each of our supported drivers, but in this case they do a
series of end-to-end tests to more fully exercise the entire River
client. I added a respectable set of base client driver tests that walk
each driver through all basic functionality (including features like
JobList that are at higher risk of behaving differently from one
database to the next), but as with most tests, they could stand to be
extended further, which I'm sure we'll get to in the future.

An example test for SQLite has been added, demonstrating the use of
River with a SQLite driver and in-memory SQLite database.
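
For flavor, here's a rough sketch of the shape such usage takes. river.NewClient, river.Config, river.QueueConfig, and river.QueueDefault are River's real API; the exact riversqlite.New constructor and the preconfigured workers value are assumptions based on the pattern the other drivers follow.

    db, err := sql.Open("sqlite", ":memory:") // modernc.org/sqlite imported for side effects elsewhere
    if err != nil {
        log.Fatal(err)
    }

    // riversqlite.New wrapping a *sql.DB is assumed by analogy with the other
    // River drivers; check the driver's own docs for the real constructor.
    client, err := river.NewClient(riversqlite.New(db), &river.Config{
        Queues: map[string]river.QueueConfig{
            river.QueueDefault: {MaxWorkers: 10},
        },
        Workers: workers, // a *river.Workers with job workers already registered
    })
    if err != nil {
        log.Fatal(err)
    }
    if err := client.Start(ctx); err != nil {
        log.Fatal(err)
    }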

A few caveats:

  • I've found that the combination of SQLite's reduced capabilities and
    sqlc's bugginess [1] makes batch insert/update operations basically
    impossible (I tried every workaround I could possibly think of), so
    these are done in loops of individual operations instead; see the
    sketch after this list. I think this is okay for now though. For
    one, sqlc will hopefully get these limitations fixed eventually, and
    for two, SQLite databases will often be running locally, meaning the
    round trip cost per operation is much lower than what we'd see in a
    hosted Postgres somewhere.

  • It's not totally clear that having a SQLite implementation will be
    "worth it enough" in the long run given that it will add some
    difficulty to non-trivial future database-related operations. My hope
    is that as it's in a prerelease state, we can gauge how bad it is to
    keep up to date. If it feels like way more effort than it's worth, we
    can still axe it before it ever becomes a functional driver.
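
As a rough illustration of the loop approach from the first caveat (a sketch only; the column list is simplified and isn't River's real insert query), individual statements can run inside a single transaction, which keeps per-operation overhead tolerable when the database is local:

    // jobs is a hypothetical slice of rows waiting to be inserted.
    tx, err := db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }
    defer tx.Rollback()

    stmt, err := tx.PrepareContext(ctx, `INSERT INTO river_job (kind, args) VALUES (?, ?)`)
    if err != nil {
        return err
    }
    defer stmt.Close()

    // One statement per row rather than a single batch statement.
    for _, job := range jobs {
        if _, err := stmt.ExecContext(ctx, job.Kind, job.Args); err != nil {
            return err
        }
    }
    return tx.Commit()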

[1] sqlc-dev/sqlc#3802 (comment)

@brandur brandur force-pushed the brandur-sqlite branch 6 times, most recently from 0bc580b to ddb3277 on May 1, 2025 05:53
@travisby
travisby commented May 1, 2025

Just to highlight: I'm super stoked for this.

I know that the river recommendation is to spin up postgres for tests, but I really like (external) dependency-less tests. I have some pretty gross/extensive mocks for testing and for a "local development" version of our app.

This is going to let me do sooo much cleanup, thanks! Hoping that

It's not totally clear that having a SQLite implementation will be "worth it enough"

we find enough use cases for that not to be the case! I understand my more niche example is likely not enough reason on its own!

@brandur
Contributor Author
brandur commented May 2, 2025

@travisby Great! Thanks for the feedback. We'll try to get this over the line.

@brandur brandur force-pushed the brandur-sqlite branch 6 times, most recently from 8e8cb6c to bfaa56e on May 3, 2025 21:00
@brandur
Contributor Author
brandur commented May 3, 2025

Alright, I added implementations for all of river_queue as well, along with river_client + river_client_queue, although those aren't actually used and we'll probably delete them. That leaves only river_job.

I ran into a bit of a problem in that I'm pretty sure sqlc + SQLite + use of jsonb is completely broken, and had to implement everything using normal json for now (which TBH, probably isn't that much worse). I opened a discussion about that on sqlc, but given some of the track record of other questions/comments I've left on sqlc issues over the last couple weeks, I'm not super optimistic that it'll go anywhere:

sqlc-dev/sqlc#3953

I did fix at least one existing bug during the implementation too, namely that unpausing a river_queue would also target an already unpaused queue, despite comments in tests clearly indicating that this was unintentional (and missing assertions to verify the no-op). Kind of a nice side benefit.

@brandur
Contributor Author
brandur commented May 6, 2025

@bgentry Hey, how would you feel about me bringing this in? I know it's a little unconventional, but I'm now really close — I got almost all of river_job implemented, and am down to only two queries left. Unfortunately it's the two worst ones, and I'll have to do some work at the client level for sure to make sure we have adequate test coverage, but I'm feeling pretty good that this is going to work.

I ended up adding a bunch of new test cases in the driver tests to help tease out problems with SQLite. Unfortunately that makes the PR more prone to collision, but on the plus side, the added coverage will help tease out any other potential problems that might still exist in #883.

If brought into mainline, I figure I'll be able to keep chipping away at it and probably have the whole thing working within a week or two.

@brandur brandur force-pushed the brandur-sqlite branch 5 times, most recently from 78c1733 to d05bc02 on May 11, 2025 01:21
@@ -715,7 +725,7 @@ func NewClient[TTx any](driver riverdriver.Driver[TTx], config *Config) (*Client
// we're actually going to be working jobs (as opposed to just enqueueing
// them):
if config.willExecuteJobs() {
if !driver.HasPool() {
if !driver.PoolIsSet() {
Contributor Author

I ended up renaming this so that it sorts more closely with PoolSet in the driver files.

@brandur brandur force-pushed the brandur-sqlite branch 3 times, most recently from 5acd3af to a02c9cc on May 11, 2025 01:29
@brandur brandur requested a review from bgentry May 11, 2025 01:32
@brandur brandur force-pushed the brandur-sqlite branch 4 times, most recently from 8515ecc to cec4162 on May 11, 2025 08:25
Contributor
@bgentry bgentry left a comment

made it through again 😅

CHANGELOG.md Outdated
@@ -7,6 +7,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

- Preliminary River driver for SQLite (`riverdriver/riversqlite`). This driver seems to produce good results as judged by the test suite, but so far as minimal real world vetting. Try it and let us know how it works out. [PR #870](https://github.com/riverqueue/river/pull/870).
Contributor

Suggested change
- Preliminary River driver for SQLite (`riverdriver/riversqlite`). This driver seems to produce good results as judged by the test suite, but so far as minimal real world vetting. Try it and let us know how it works out. [PR #870](https://github.com/riverqueue/river/pull/870).
- Preliminary River driver for SQLite (`riverdriver/riversqlite`). This driver seems to produce good results as judged by the test suite, but so far has minimal real world vetting. Try it and let us know how it works out. [PR #870](https://github.com/riverqueue/river/pull/870).

Contributor Author

Thx. Fixed.

@@ -1997,12 +2014,16 @@ func (c *Client[TTx]) JobList(ctx context.Context, params *JobListParams) (*JobL
}
params.schema = c.config.Schema

if c.driver.DatabaseName() == "sqlite" && params.metadataFragment != "" {
return nil, errors.New("JobListResult.Metadata is not supported on SQLite")
Contributor

The UI doesn't have metadata filters yet, but when I add args + metadata filters this will break. We may need to expose a flag in the UI's API for whether these filters should be enabled.

Contributor Author

Let's discuss this more, but I'm kind of thinking we should switch to JSON Path for that stuff. It's more general and can be supported by multiple databases.

@@ -12,16 +12,27 @@ require (
github.com/stretchr/testify v1.10.0
go.uber.org/goleak v1.3.0
golang.org/x/mod v0.24.0
modernc.org/sqlite v1.37.0
Contributor

Double-checking that none of this will get pulled in to a non-sqlite app's deps. Like if I switch the demo project to use this version, does it even download sqlite at all?

Contributor Author

Doh, damn, good catch on that. Yeah, that would've added modernc as a dep everywhere.

Basically what happened is that as I was trying to test things, I'd added an import ( _ "modernc.org/sqlite" ) to riversharedtest. It wasn't needed by the end, but because of the underscore import, goimports can't detect that it's not being used, and therefore left it in.

There is a DBPoolSQLite function in there, but it only needs the sql package. The caller tests must be importing modernc.org/sqlite, but that's in the test assembly.

I took out the import and tidied. This dependency is gone now.
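
For reference, the leftover looked something like this (illustrative): a blank import exists only for its side effect of registering the driver, so goimports has no way to tell it's unneeded and leaves it alone.

    import (
        // Blank import: registers the "sqlite" database/sql driver as a side
        // effect, so tooling won't remove it even when nothing references it.
        _ "modernc.org/sqlite"
    )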

Comment on lines +49 to +52
// TODO(brandur): Remove this before first release including SQLite driver. It's
// needed temporarily because there's no riversqlite tag to target (the one
// referenced above is fake and does not exist).
replace github.com/riverqueue/river/riverdriver/riversqlite => ../river/riverdriver/riversqlite
Contributor

Highlighting just to not forget this before release

Contributor Author

Yeah, +1. I have a big warning banner at the top of my TODO list as a reminder to myself to remove it.

// it back early means that a detach can no longer be run on it. It
// doesn't matter much if an extra schema is attached until the tests
// finish running.
// require.NoError(t, exec.SchemaAttach(ctx, &riverdriver.SchemaAttachParams{Schema: schemaVersion2}))
Contributor

making sure you meant to leave this in here

Contributor Author

Ah nope. That was part of a previous attempt at getting things to work and is no longer needed. I also removed one other similar block in this file that was also commented out.

@brandur brandur force-pushed the brandur-sqlite branch 6 times, most recently from 02e5a41 to d359c14 on May 14, 2025 01:17
@brandur
Contributor Author
brandur commented May 14, 2025

Thanks Blake!

Okay, I changed my mind a bit and am going to merge this in after all. I do still want to try to get an improvement for jsonb in there, but even if I pull that off, the sqlc release cycles are so long that it's going to take some time. I also realized that since json and jsonb are both actually blob in SQLite, there won't be any required DDL for anyone who made a table with json. Given the driver is still pretty alpha, I think I'll just add a note in the changelog with some UPDATE statements that'll convert json to jsonb instead of doing a full migration.
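
As a sketch of what such a conversion might look like (an assumption, not the final changelog wording; it relies on SQLite's jsonb() function, available since 3.45, and the column list is illustrative):

    // One-off conversion of JSON text columns to SQLite's jsonb encoding,
    // instead of a full schema migration. Columns shown are illustrative.
    if _, err := db.ExecContext(ctx,
        `UPDATE river_job SET args = jsonb(args), metadata = jsonb(metadata)`,
    ); err != nil {
        return err
    }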

I also added SQLite support to the River CLI to make sure people trying it out have the ability to run migrations and everything.

@brandur brandur merged commit d04b82a into master May 14, 2025
10 checks passed
@brandur brandur deleted the brandur-sqlite branch May 14, 2025 01:22
brandur added a commit that referenced this pull request May 16, 2025
…ut metadata

This one follows up #870 to add an optimization for job completion where
we separate out the most common case of setting jobs to `completed`
without any metadata required and update all of them in a simplified
batch query, then do the rest of the completions afterwards.

In any non-degenerate queue, most completions will be setting success
states, so this should help with real-world uses, but it also helps us
significantly improve SQLite's benchmarking numbers.

Here's a new benchmark run where throughput is ~4x what it was doing
before and roughly on par with Postgres:

    $ go run ./cmd/river bench --database-url "sqlite://:memory:" --num-total-jobs 1_000_000
    bench: jobs worked [          0 ], inserted [    1000000 ], job/sec [        0.0 ] [0s]
    bench: jobs worked [      88218 ], inserted [          0 ], job/sec [    44109.0 ] [2s]
    bench: jobs worked [      91217 ], inserted [          0 ], job/sec [    45608.5 ] [2s]
    bench: jobs worked [      88858 ], inserted [          0 ], job/sec [    44429.0 ] [2s]
    bench: jobs worked [      77219 ], inserted [          0 ], job/sec [    38609.5 ] [2s]
    bench: jobs worked [      82045 ], inserted [          0 ], job/sec [    41022.5 ] [2s]
    bench: jobs worked [      84052 ], inserted [          0 ], job/sec [    42026.0 ] [2s]
    bench: jobs worked [      72028 ], inserted [          0 ], job/sec [    36014.0 ] [2s]
    bench: jobs worked [      90047 ], inserted [          0 ], job/sec [    45023.5 ] [2s]
    bench: jobs worked [      88875 ], inserted [          0 ], job/sec [    44437.5 ] [2s]
    bench: jobs worked [      89240 ], inserted [          0 ], job/sec [    44620.0 ] [2s]
    bench: jobs worked [      88842 ], inserted [          0 ], job/sec [    44421.0 ] [2s]
    bench: jobs worked [      59359 ], inserted [          0 ], job/sec [    29679.5 ] [2s]
    bench: total jobs worked [    1000000 ], total jobs inserted [    1000000 ], overall job/sec [    42822.8 ], running 23.35203575s

Here's a normal non-memory file-based database:

    $ go run ./cmd/river bench --database-url "sqlite://./sqlite/bench.sqlite3" --num-total-jobs 1_000_000
    bench: jobs worked [          0 ], inserted [    1000000 ], job/sec [        0.0 ] [0s]
    bench: jobs worked [      83657 ], inserted [          0 ], job/sec [    41828.5 ] [2s]
    bench: jobs worked [      76648 ], inserted [          0 ], job/sec [    38324.0 ] [2s]
    bench: jobs worked [      88036 ], inserted [          0 ], job/sec [    44018.0 ] [2s]
    bench: jobs worked [      75473 ], inserted [          0 ], job/sec [    37736.5 ] [2s]
    bench: jobs worked [      82604 ], inserted [          0 ], job/sec [    41302.0 ] [2s]
    bench: jobs worked [      84048 ], inserted [          0 ], job/sec [    42024.0 ] [2s]
    bench: jobs worked [      85508 ], inserted [          0 ], job/sec [    42754.0 ] [2s]
    bench: jobs worked [      90580 ], inserted [          0 ], job/sec [    45290.0 ] [2s]
    bench: jobs worked [      83568 ], inserted [          0 ], job/sec [    41784.0 ] [2s]
    bench: jobs worked [      86062 ], inserted [          0 ], job/sec [    43031.0 ] [2s]
    bench: jobs worked [      88508 ], inserted [          0 ], job/sec [    44254.0 ] [2s]
    bench: jobs worked [      75308 ], inserted [          0 ], job/sec [    37654.0 ] [2s]
    bench: total jobs worked [    1000000 ], total jobs inserted [    1000000 ], overall job/sec [    42331.9 ], running 23.622860125s

The improved benchmarks only work for fixed job burndown mode (with the
`--num-total-jobs` option) because inserting jobs is still pretty slow,
since it's still done one by one.

Once again, I'm pretty sure I'll be able to land some SQLite fixes
that'll make batch operations possible using `json_each`, and then we
should be able to make all normal operations batch-wise. That'll take
some time though, and we can get this optimization out in time for the
initial SQLite release.
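
For context on the json_each idea above, here's a hedged sketch of how a batch insert might eventually look: a single JSON array parameter is expanded server-side, so many rows land in one statement. json_each and json_extract are real SQLite functions, but the column list and payload shape are simplifications, not River's eventual implementation.

    payload, err := json.Marshal([]map[string]any{
        {"kind": "email", "args": map[string]any{"to": "user@example.com"}},
        {"kind": "email", "args": map[string]any{"to": "other@example.com"}},
    })
    if err != nil {
        return err
    }

    // One statement inserts every element of the JSON array.
    if _, err := db.ExecContext(ctx, `
        INSERT INTO river_job (kind, args)
        SELECT json_extract(value, '$.kind'), json_extract(value, '$.args')
        FROM json_each(?)`,
        string(payload),
    ); err != nil {
        return err
    }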