Description
Problem
The go-ipfs dependency closure includes 47 modules under github.com/ipfs. Here are their interdependencies (this does not include libp2p or other PL orgs):
Pain
- Changes must be propagated across many repos, in the right order
- Repos are maintained and kept up-to-date on a best-effort basis, leading to complex dependency graphs as different versions of the same modules float around
- It's difficult to get feedback on whether a change is safe for consumers of the code, since consumers live in different repos with different CI
- In some cases, this discourages experimentation since it can be hard to bubble changes up to end-user applications like go-ipfs
Desirable Properties of the Current Layout
- Experimental code can easily mix-and-match functionality from go-ipfs
- The dependency graph of the consumer does not include every transitive dependency of go-ipfs
Why are repos structured this way?
The intention of the current layout was to encourage flexibility, extensibility, and experimentation. Functionality of IPFS could be reused in other projects without depending on IPFS as a whole.
These repos also predate most modern Go tooling.
How much does a repo cost?
Repo maintenance costs include:
- Keeping dependencies up-to-date
- This is non-trivial, as it often requires chasing down other dependencies in the dependency graph; mostly we don't do this until we have to
- Releasing new versions as necessary
- Making sure CI is still working
- Migrating from Travis/CircleCI to Actions (still in progress)
- Rolling out unified CI
- Backporting changes across major versions as necessary
- Manually testing impact of new code changes on downstream consumers
- Monitoring issue trackers, PRs, etc.
- Updating Go submodules (nested Go modules within a repo)
- Commonly used for testing, example code, etc.
- Often these contain circular module dependencies which complicate propagating breaking changes
Why now? What's changed?
We have an increasing number of:
- Repos
- See maintenance costs above
- Some are in various states of deprecation, which adds to the maintenance costs and the cost of implementing new features
- Some fail their builds due to flaky tests, with not enough incentive to fix them until they become a blocker
- Projects
- Often these result in backwards-incompatible changes, sometimes even new major versions, which then need to be propagated around to all the downstream repos
- Finding those repos can be difficult (e.g. backporting across versions, in-flight work, etc.)
- An increase in the number of in-flight projects means we're more likely to have repos in transient broken states, which blocks or slows the progress of other projects (this happens often)
Also, Go modules now exist, along with module graph pruning. The latter is key to preventing consumers from having an explosion of transitive dependencies when they just want to reuse some small piece of code.
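As a rough sketch of how that helps (the module path and version below are hypothetical, since go-libipfs does not exist yet), a consumer that only wants one small package would declare it in its go.mod; as long as both modules declare go 1.17 or later, module graph pruning keeps the consumer's module graph limited to what its imported packages actually need, rather than the full transitive closure of go-libipfs's dependencies:

```
// go.mod of a hypothetical consumer that reuses one small piece of go-libipfs.
// Because this module (and, we assume, go-libipfs itself) declares go 1.17 or
// later, the module graph is pruned: the consumer does not pull in the full
// transitive closure of go-libipfs's dependencies, only what is needed to
// build the packages it actually imports.
module example.com/my-tool

go 1.17

require github.com/ipfs/go-libipfs v0.1.0 // hypothetical module path and version
```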
How can we consolidate repos? What's the ideal end state?
We want our repo layout to facilitate day-to-day development, while also letting us reuse components and functionality. Code that is commonly changed and built together should be in the same repo (as much as possible), so that it can be tested and released together.
We can leverage some of the new tooling around Go modules to retain the flexibility of separate repos, without having to pay the significant cost.
The ideal repo layout:
- go-libipfs
  - Rolls up most repos that start with github.com/ipfs/go-*
  - Build produces no binaries
  - Contains no Go submodules
  - Includes all supported "official" interfaces and implementations
  - Unsupported and experimental code can live elsewhere; once it "graduates", it is moved into the go-libipfs repo for long-term maintenance
  - High code quality bar
  - Careful consideration of cross-package dependencies
  - Consumes other libs like IPLD, multiformats modules, libp2p, etc.
- go-libdatastore
  - Datastore interfaces and supported implementations
  - This is its own repo to avoid circular dependencies with libp2p
  - TODO: can libp2p be refactored to remove the circular dependency? Also, Go tolerates circular module dependencies, so why specifically are they bad?
    - (list of reasons added by mvdan; a small sketch of the problem follows this layout list)
      - Impossible to require one module without the other, in either direction
      - Updating both modules becomes a trickier dance: modify A, modify B, update A's dependency on B, update B's dependency on A
      - The module dependency graph becomes a "downward spiral" bouncing between A and B, meaning your dependency graph will grow over time
- go-ipfs
  - Thin layer that consumes go-libipfs and produces the ipfs binary
  - Could be some other name for the Go IPFS implementation
- go-ipfs-gateway
  - Experimental gateway implementation that also consumes go-libipfs
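To make the circular-module-dependency problem from the go-libdatastore notes above concrete, here is a minimal sketch with two hypothetical modules that require each other. Neither can be required without the other, and every release of one must be followed by a release of the other that bumps the requirement, so the module graph keeps accumulating old versions of both:

```
// a/go.mod (hypothetical module)
module example.com/a

go 1.17

require example.com/b v1.4.0 // must be bumped after every release of b

// b/go.mod (hypothetical module)
module example.com/b

go 1.17

require example.com/a v1.3.0 // must be bumped after every release of a
```

Keeping the datastore interfaces in their own module is what breaks this cycle with libp2p.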
Other consumers of go-libipfs include libp2p (datastore), Filecoin, IPFS Cluster, ipfs-lite, and the IPFS examples.
What about consumers of repos we want to remove/archive? How do we roll this out?
go-libp2p did something similar a couple of years ago, largely avoiding breaking consumers by shimming out existing repos to point to the consolidated one. Example: https://github.com/libp2p/go-libp2p-protocol/blob/master/protocol.go
We can use this same trick to incrementally consolidate without breaking consumers.
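As a sketch of what that trick could look like here, modeled on the go-libp2p-protocol file linked above (the go-libipfs/namesys import path and the specific re-exported identifiers are assumptions, not a final API): the old repo keeps its module path, but its package body becomes a set of deprecated aliases to the consolidated code.

```go
// Package namesys would remain in the old github.com/ipfs/go-namesys repo as a
// shim. The real implementation is assumed to have moved into go-libipfs;
// existing importers keep compiling and keep getting the same types.
package namesys

import (
	lib "github.com/ipfs/go-libipfs/namesys" // assumed new home of the code
)

// Deprecated: use github.com/ipfs/go-libipfs/namesys.NameSystem instead.
type NameSystem = lib.NameSystem

// Deprecated: use github.com/ipfs/go-libipfs/namesys.Resolver instead.
type Resolver = lib.Resolver

// Deprecated: use github.com/ipfs/go-libipfs/namesys.NewNameSystem instead.
var NewNameSystem = lib.NewNameSystem
```

Because type aliases are identical to the types they name, values flow freely between code importing the old path and code importing the new one, so consumers can migrate at their own pace.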
See e.g. this PoC of moving go-namesys into go-ipfs while preserving backwards compatibility (in reality we'd move it to go-libipfs):
There may be some cases where this isn't possible without breaking changes.