Mounting Barecat archives via FUSE

Background

Barecat is a simple and highly scalable aggregate storage format for storing many (tens of millions and more) small files, with a focus on fast random access and minimal overhead. You can think of it as a filesystem-in-a-file, or as a key-value store. Data is stored sequentially in a flat file (or in multiple shard files), and an SQLite database indexes the data. The index is used to quickly locate a file's data by its path and to provide directory listings, file statistics, and other metadata. Barecat can handle at least tens of millions of files and terabytes of data, including directories that contain more than 100k files. Directory listing produces its results in a streaming fashion, so entries start appearing fairly quickly even in huge directories.
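
To make the "flat data file plus SQLite index" idea concrete, here is a minimal toy sketch in Python. It is not Barecat's actual on-disk format or schema: the table name `files`, the columns `data_offset` and `data_size`, and both function names are assumptions made up for illustration.

```python
import sqlite3
from contextlib import closing


def build_toy_archive(data_path, index_path, files):
    """Append each file's bytes to one flat data file and index it in SQLite.

    `files` maps an archive-internal path to its content as bytes.
    The schema below is hypothetical, not the one Barecat uses.
    """
    with open(data_path, "wb") as data, closing(sqlite3.connect(index_path)) as conn:
        conn.execute(
            "CREATE TABLE files ("
            "path TEXT PRIMARY KEY, data_offset INTEGER, data_size INTEGER)"
        )
        for path, content in files.items():
            offset = data.tell()  # new data always goes at the current end
            data.write(content)
            conn.execute(
                "INSERT INTO files VALUES (?, ?, ?)", (path, offset, len(content))
            )
        conn.commit()


def read_file(data_path, index_path, path):
    """Look up (offset, size) by path, then do a single seek and read."""
    with closing(sqlite3.connect(index_path)) as conn:
        row = conn.execute(
            "SELECT data_offset, data_size FROM files WHERE path = ?", (path,)
        ).fetchone()
    if row is None:
        raise FileNotFoundError(path)
    offset, size = row
    with open(data_path, "rb") as data:
        data.seek(offset)
        return data.read(size)
```

A read costs one indexed lookup plus one seek, regardless of how many files the archive holds, which is the property that makes this layout attractive for random access.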

Barecat archives can be mounted via FUSE, allowing them to be used like a local filesystem. This is useful for browsing the contents of an archive and for reading and writing individual files. Mounting is mainly intended for inspecting the data and making small changes. For the main workload (e.g. training a deep learning model), you should use the Python API, which is more efficient because it accesses the data directly, without the overhead of FUSE.

Installation

sudo apt-get install libfuse-dev libsqlite3-dev  # or equivalent with other package managers
pip install barecat-mount

Usage

# read-only:
barecat-mount mydata.barecat mountpoint/

# read-write:
barecat-mount --writable mydata.barecat mountpoint/

# unmount:
fusermount -u mountpoint/
# or
umount mountpoint/
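
Once mounted, the archive can be inspected with ordinary filesystem calls. The following is a small sketch using only the Python standard library; the helper name `peek_directory` is hypothetical and is not part of barecat-mount. `os.scandir` yields entries lazily, so taking only the first few should pair well with the streaming directory listing described in the Background section, although how early entries arrive in practice also depends on kernel and FUSE buffering.

```python
import itertools
import os


def peek_directory(path, limit=10):
    """Return up to `limit` (name, is_dir, size_in_bytes) tuples from `path`.

    Only the first `limit` entries are consumed, so a directory with a
    very large number of files does not have to be listed in full.
    """
    with os.scandir(path) as entries:
        return [
            (entry.name, entry.is_dir(), entry.stat().st_size)
            for entry in itertools.islice(entries, limit)
        ]
```

For example, `peek_directory("mountpoint/", limit=5)` would return the first five entries at the root of the mounted archive, in whatever order the filesystem reports them.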

A Note on Fragmentation

Since Barecat always adds new files at the end of the archive, many deletions and insertions will lead to fragmentation. The general idea is to write once, read many times, and delete only when you need to fix a mistake. A basic heuristic auto-defragmentation is available and can be enabled as follows:

barecat-mount --writable --enable-defrag mydata.barecat mountpoint/

This way, the filesystem will periodically defragment itself after a significant amount of deletions. You can also defragment an archive manually with:

barecat-defrag mydata.barecat

This goes through the archive in order and moves every file towards the beginning, leaving no gaps. It may take a very long time, since closing even a one-byte gap requires moving all the data that follows it. A quicker option is available with:
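
The cost of full compaction can be illustrated with a toy in-memory model. This is only a sketch of the idea, not the code behind `barecat-defrag`: the function name `full_defrag` and the representation of the archive as a dict of `name -> (offset, size)` are assumptions made for illustration.

```python
def full_defrag(files):
    """Compact a layout so that files sit back to back from offset 0.

    `files` maps a name to its (offset, size) in the archive.
    Returns (new_layout, moved_bytes), where moved_bytes counts the
    payload of every file whose offset had to change.
    """
    layout = {}
    moved_bytes = 0
    cursor = 0  # next free offset in the compacted archive
    # Visit files in the order they appear in the archive.
    for name, (offset, size) in sorted(files.items(), key=lambda item: item[1][0]):
        if offset != cursor:
            moved_bytes += size
        layout[name] = (cursor, size)
        cursor += size
    return layout, moved_bytes
```

With `{"a": (0, 100), "b": (101, 1000), "c": (1101, 5000)}` there is a single one-byte gap after `a`, yet both `b` and `c` must be rewritten, so 6000 bytes are moved to reclaim one byte.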

barecat-defrag --quick mydata.barecat

This proceeds backwards, starting from the end of the archive, and moves each file into the first gap that is large enough for it, searching from the beginning of the archive (first-fit). The algorithm stops as soon as it reaches a file that fits into none of the gaps.
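
The quick strategy described above can be sketched on a toy in-memory layout. This is an illustration based on the description only, not the code behind `barecat-defrag --quick`: the function names and the `name -> (offset, size)` dict are hypothetical, and the restriction to gaps located before the file's current position is an assumption.

```python
def find_gaps(layout):
    """Return (start, size) for every gap between files, scanning from offset 0."""
    gaps = []
    cursor = 0
    for offset, size in sorted(layout.values()):
        if offset > cursor:
            gaps.append((cursor, offset - cursor))
        cursor = max(cursor, offset + size)
    return gaps


def quick_defrag(files):
    """First-fit, back-to-front defragmentation of a toy layout.

    `files` maps a name to its (offset, size). Returns the new layout.
    """
    layout = dict(files)
    # Visit files from the end of the archive towards the beginning.
    for name in sorted(layout, key=lambda n: layout[n][0], reverse=True):
        offset, size = layout[name]
        # First gap, counted from the beginning, that can hold this file.
        # Assumption: only gaps located before the file are considered.
        target = next(
            (start for start, gap_size in find_gaps(layout)
             if gap_size >= size and start < offset),
            None,
        )
        if target is None:
            break  # stop at the first file that fits nowhere
        layout[name] = (target, size)
    return layout
```

Given `{"a": (0, 10), "b": (20, 5), "c": (30, 8), "d": (40, 4)}`, the last file `d` moves into the gap at offset 10, and the run then stops at `c`, because no remaining gap holds 8 bytes. The archive can now end at offset 38 instead of 44, at the cost of moving only 4 bytes, while some gaps remain.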
