Tags: suorcd/bees
Bees v0.7

This is a long overdue maintenance release collecting some years of bug fixes.

There are no bees metadata format changes in this release.

Highlights:

 * Remove 8-CPU thread limit
 * Add kernel bugs reference table to docs
 * Workarounds for btrfs send and balance issues
 * Reduce the number of temporary inodes created
 * Use posix_fadvise to optimize page cache usage
 * Use private namespace for mounts under systemd
 * Assorted bug fixes and small performance improvements
 * SIGTERM handler to save crawl state, hash table, and exit
 * Higher ref limits per extent on kernels with LOGICAL_INO_V2

Build dependency changes:

 * Convert docs to Github Flavored Markdown
 * Updates for new compilers including clang
 * Remove dependencies on libbtrfs-dev and uuid-dev
 * Remove unversioned `libcrucible.so` shared library

Shortlog:

Andrey Brusnik (1):
      fs: Change array syntax to pointer syntax

Jiahao XU (7):
      Add new options MOUNT_OPTIONS
      Modify systemd unit and beesd.in to use private mnt namespace
      Further sandbox beesd using systemd.exec options
      Update comment in beesd@.service.in
      Fix typo when setting default val of MOUNT_OPTIONS in beesd.in
      Update default MOUNT_OPTIONS beesd.in
      Rm MOUNT_OPTIONS for it is of no use and dangerous

Kai Krakow (9):
      Update references to Gentoo
      Makefile: Specify version when building from tarball
      Makefile: Use the jobserver properly
      Makefile: mkdir .depends only when needed
      Makefile: Bring back -O3 in a downstream-compatible way
      crucible: Try repairing a build failure around swap macro
      Makefile: Fix git usage for non-git source archive
      bees-context: Remove confusing log message
      bees: Avoid unused result with -Werror=unused-result

SeerLite (1):
      install.md: Update Arch Linux instructions

Zygo Blaxell (168):
      README: split into sections, reformat for github.io
      Merge remote-tracking branch 'nilninull/master'
      docs: add "what to do when something goes wrong" page
      docs: add coredumpctl
      src: add bees-version.new.c to .gitignore
      hash: reduce hash table extent size to 128KB
      scripts: use multiples (not power) of 128K
      hash: remove pointless copy
      roots: do not allow transid_min to be numeric_limits<uint64_t>::max()
      roots: do not accept 18446744073709551615 as max_transid in beescrawl.dat
      roots: fix subvol scan rollover on subvols with empty transid range
      context: serialize LOGICAL_INO calls
      bees: drop unused member m_uuid
      context: cache result of home_fd()
      roots: simplify BeesRoots::transid_max_nocache
      Revert "roots: simplify BeesRoots::transid_max_nocache"
      roots: reimplement transid_max_nocache using extent tree root
      context: better detection for toxic extents
      scripts: put AL16M back to avoid breaking existing scripts
      hash: remove preloaded toxic hash blacklist
      context: remove limit on the number of references to an extent
      fs: support LOGICAL_INO_V2
      docs: toxic extents and btrfs send
      stats: streamline add_count
      fs: remove thread_local storage
      fs: if search fails, return empty result set
      resolver: don't log hash collision incidents
      workarounds: add workaround for btrfs send
      docs: reorganize options, add workaround for btrfs send
      bees: soft-limit computed thread counts to 8
      docs: working with `btrfs send` is kind of a feature
      main: single BeesContext instance per process
      roots: improve "RO root 6094" message
      docs: derive docs/index.md from README.md
      README: reintroduce new btrfs-send-compatibility workaround
      docs: add instructions for Ubuntu 18.10
      docs: use bash "type -p" because dash isn't useful
      docs: dash more useful than previously believed
      roots: quick fix for task scheduling bug leading to loss of crawl_master
      tempfile: drop the fsync()
      task: add cancel method
      process: ntoa function for signals
      time: separate sleep time calculation from sleep_for method
      bees: handle SIGTERM and SIGINT, force immediate flush and exit
      hash: clean up comments, audit for bugs
      build: make libcrucible a static library
      docs: tested with GCC 6.3.0
      docs: bees can stop now
      process: SIGUNUSED is deprecated
      bees: make exceptions less prominent in log output
      docs: describe expected exceptions and impact of exception handling
      docs: add Gotcha for SIGTERM
      docs: add some notes about interactions with balance
      task: queue and run exactly once per instance
      docs: event counter documentation
      status: report number of active worker threads in status output
      docs: update kernel compatibility page, now recommending 5.0.4
      README: highlight DATA CORRUPTION WARNING
      docs: update btrfs feature interaction status for flushoncommit and SSD caching layers
      docs: tested build with btrfs-progs 4.20.2
      bees: don't try to print si_lower and si_upper
      BtrfsExtentWalker: use a buffer at least as large as a btrfs metadata page to avoid EOVERFLOW
      fs: do not emulate extent-same by clone
      lib: fix non-local lambda expression cannot have a capture-default
      lib: add cityhash function
      hash: prepare for user-selectable hash functions
      process: Fix gettid() ambiguity with glibc >= 2.30
      context: workaround to prevent LOGICAL_INO and btrfs balance from running concurrently
      docs: update known kernel bugs list
      bees: initialize context in the correct order
      bees: replace uncaught_exception(), deprecated in C++17
      docs: use Github Flavored Markdown with table extension
      docs: update kernel bug tracking for October 2020
      docs: fix table formatting for kernel bugs list
      docs: improve send workaround text, add references to backref commits, make grammar more good now
      docs: expand the tree mod log issues
      docs: btrfs-kernel: 4.20 adds 32-bit single convert bug, tree mod log issue Zygo#4
      stats: remove nonsense dedup_unique_bytes stat
      task: make it build with clang
      process: make it build with clang
      bees: make it build with clang
      bees context: make it build with clang
      extentwalker: make it build with clang
      clang: fix struct/class declaration/definition mismatches
      chatter: make it build with clang
      roots: make it build with clang
      bees: move usage message out of source file and fix a few inaccuracies
      roots: report the search parameters on tree search ioctl error
      roots: separate crawl sizes into bytes and items
      fs: always use container's actual size not requested size
      fs: make operator<() for search ioctl inline
      tempfile: remove old comments about fsync and deadlock bugs
      context: move prealloc dedupe to a separate Task
      fs: don't zero-fill btrfs data containers
      string: second argument to stoull is technically a nullptr
      lib: introduce Pool, a class for storing reusable anonymous objects
      context: move TempFile from TLS to Pool and fix some FdCache issues
      tempfile: remove size limit in realign()
      context: fix shutdown log messages identifying the wrong thread
      include: #undef crc32c
      lib: don't rebuild libcrucible unless there is a version change
      test: rebuild the tests if libcrucible.a changes
      cache: clean up pointer mangling and duplicate code
      cache: remove unused #includes
      fd: move relative path string to library
      lib: namedptr: thread-safe reference counted named object store
      src: use correct flags for compiling .c files, fix missing dependencies
      fd: deprecate Resource in favor of NamedPtr
      fs: add support and workarounds for btrfs fs_info v2
      fs: deprecate vector<char>
      fs: remove buffer overrun check in get_struct_ptr for non-copying containers
      lib: introduce Spanner, a pointer and size delimiting a range
      fs: use Spanner to refer to ioctl arg buffer instead of making vector copies
      resolve: add bees.h constants for balance and logical_ino serialization
      bees: remove si_addr_lsb from siginfo debug message to fix FTBFS
      build: include localconf everywhere
      lib: fs: stop using libbtrfs-dev helper functions to re-enable buffer length checks
      docs: remove libbtrfs-dev as a build-time dependency
      docs: btrfs-kernel: add the 5.10 performance regression, the Ctrl-C on balance kernel crash has been fixed
      docs: btrfs-kernel: update recommended kernels list, slow backrefs bug has been backported
      uuid: drop dependency on uuid.h
      docs: drop incomplete build recipe for ubuntu 14.04
      docs: note that FIEMAP is also affected by backref performance issue
      ntoa: fix bits_ntoa formatting and error handling
      ntoa: fix comment disparaging gcc for not implementing C99 compound literals in C++
      context: get rid of all instances of pthread_cancel
      context: get rid of shared_ptr<BeesContext> in every single cached Fd object
      src: bees depends on libcrucible.a
      process: SIGCLD is not portable
      options: remove default 8 CPU thread limit
      pool: use weak_ptr to run destructor earlier
      fd: make the close method on IOHandle private
      crucible: use '#include "crucible/...' everywhere
      test: fd: note when bad cast exception is expected
      chatter: add option to remove log level prefix
      docs: finally concede that the consensus spelling is "dedupe"
      docs: btrfs-kernel: add the extent ref hash bug
      task: serialize Task execution when Tasks block due to mutex contention
      task: track number of Task objects in program and provide report
      task: replace waiting state with run/exec counter
      task: handle thread lifecycle more strictly
      context: report Task instance count
      roots: clean up crawl_master
      cache: emit log messages when clearing FD cache
      bees: use helper function for readahead
      bees: misc comment updates
      context: track record extent reference counts
      roots: split constructor into separate start method
      context: remove unnecessary copies
      bees: use a reserved symbol name in BEESLOG
      bees: increase StringFile size limit
      bees: trace and log improvements during roots and context startup
      roots: add a TRACE for transid_max search and crawl_transid thread
      docs: update kernel bugs table as of 5.12.3
      tracer: annotate both ends of the stack trace
      extentwalker: fix the hole position logic
      extentwalker: fix missing characters
      extentwalker: fix the binary search and add some debug infrastructure
      trace: move BeesTrace and BeesNote into their own translation unit
      trace: current_exception() is not a replacement for uncaught_exception()
      task: set the name of consumer threads so it is not "load_tracker"
      context: stop creating new refs when there are too many already
      fiemap: don't force flush so we can see the delalloc shenanigans
      context: calculate TOTAL RATES correctly
      fs: avoid unaligned access when copying btrfs search headers
      bees: readahead() in the kernel is posix_fadvise(..., POSIX_FADV_WILLNEED)
      hash: use POSIX_FADV_WILLNEED and POSIX_FADV_DONTNEED
      docs: add `readahead_` event group

nilninull (1):
      FIX: The systemd service file is always installed

rsjaffe (1):
      systemd service replace deprecated parameters

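The v0.7 highlights mention using posix_fadvise to optimize page cache usage, and the shortlog names the two advice values involved (POSIX_FADV_WILLNEED and POSIX_FADV_DONTNEED). The following is a minimal sketch of that general technique, not the bees implementation: the helper names `advise` and `read_once` are hypothetical, and the policy of dropping the range right after a single read is an assumption made for illustration.

```cpp
#include <fcntl.h>      // posix_fadvise, POSIX_FADV_*
#include <unistd.h>     // pread
#include <sys/types.h>  // off_t, ssize_t
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the bees source): pass one advice value
// for the byte range [offset, offset + len) to the kernel.
// posix_fadvise returns an error number directly; it does not set errno.
inline int advise(int fd, off_t offset, std::size_t len, int advice)
{
    return posix_fadvise(fd, offset, static_cast<off_t>(len), advice);
}

// Hypothetical helper: read a range that is expected to be needed only once.
// WILLNEED asks the kernel to start readahead before the read;
// DONTNEED afterwards says the cached pages for the range are no longer wanted.
// Advice is only a hint, so its result is deliberately ignored here.
inline ssize_t read_once(int fd, void *buf, std::size_t len, off_t offset)
{
    assert(fd >= 0);
    (void)advise(fd, offset, len, POSIX_FADV_WILLNEED);
    const ssize_t n = pread(fd, buf, len, offset);
    (void)advise(fd, offset, len, POSIX_FADV_DONTNEED);
    return n;
}
```

A caller would use `read_once(fd, buffer, length, offset)` in place of a plain `pread` for data it does not expect to touch again, so that one large scan is less likely to evict other useful pages from the cache.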
bees v0.6.5

Make clang builds work.

Zygo Blaxell (8):
      extentwalker: make it build with clang
      task: make it build with clang
      bees: make it build with clang
      bees context: make it build with clang
      clang: fix struct/class declaration/definition mismatches
      chatter: make it build with clang
      roots: make it build with clang
      build: include localconf everywhere

main: single BeesContext instance per process

After weeks of testing I copied part of a change to main without copying the rest of the change, leading to an immediate segfault on startup.

So here is the rest of the change: limit the number of BeesContexts per process to 1.

This change was discussed at Zygo#54 (comment) but there are more reasons to do it now: the candidates to replace the current hash table format are less forgiving of sharing hash tables, and it may even become necessary to have more than one hash table per BeesContext instance (e.g. to keep datasum and nodatasum data separate).

Signed-off-by: Zygo Blaxell <bees@furryterror.org>

bees v0.6.1

Several bug and build fixes.

Kai Krakow (2):
      Makefile: Specify version when building from tarball
      crucible: Try repairing a build failure around swap macro

Zygo Blaxell (5):
      src: add bees-version.new.c to .gitignore
      roots: do not allow transid_min to be numeric_limits<uint64_t>::max()
      roots: do not accept 18446744073709551615 as max_transid in beescrawl.dat
      roots: fix subvol scan rollover on subvols with empty transid range
      context: cache result of home_fd()

rsjaffe (1):
      systemd service replace deprecated parameters

Bees v0.6

This release brings some significant performance improvements.

This release exists so we can refer to its beeshome file format as "the bees v0.6 format." This format is three years old and is currently blocking further performance improvement. Future bees versions may import data from this format, but no support for downgrades is planned.

Highlights:

 * Fixed a bug in extent mapping that was causing severe performance loss
 * Multi-threaded parallel execution
 * Dynamic thread pool size based on system load average
 * Automatically adjusts scan polling interval to match filesystem update rate
 * Subvol parallel scan modes: lockstep (0), independent (1), and sequential (2)
 * Fixes for ARM, Gentoo, systemd compatibility

Shortlog:

Kai Krakow (75):
      crucible: Allow setting a relative path option for name_fd()
      getopt: Add logic to set relative path from $CWD
      Remove filter path logic from frontend script
      Fix example config for timestamp logging
      Fix indentation/alignment after integration
      Remove process forking from frontend script
      Make clear that options must be supplied in one variable
      Fix a fallthrough error in GCC 7+
      Fix a fallthrough error in GCC 7+
      Don't zap localconf in "make clean"
      Add scripts to "make all" target
      Generalize sed invocation rule
      systemd: Don't start in system-update.target
      systemd: Don't start without essential system services
      systemd: Provide URL and better description
      Makefile: depend install_scripts on scripts
      Makefile: let "make install" install the complete distribution
      Add beesd@.service to gitignore
      Makefile improvement
      Installation: Prepare README
      Installation: Add new section to README
      Makefile: Document Makefile changes
      Installation: Add Arch Linux instructions
      Makefile: Document scripts/beesd
      Installation: Document optional dependency on blkid
      Installation: Introduce DESTDIR into Makefile
      Installation: Improve filesystem layout flexibility
      Installation: Keep version tag in a variable
      Installation: Fix soname QA warning in Gentoo
      Installation: -fPIC should not be used unconditionally
      Installation: Add Gentoo ebuild
      Installation: Remove superfluous cruft from Gentoo ebuild
      Installation: Depend Gentoo ebuild on markdown
      Makefile: Fail gracefully if markdown is not installed
      Makefile: depends.mk is not an optional include
      Makefile: .o already depends on its .h file
      Makefile: generalize .so target
      Makefile: rename OBJS to CRUCIBLE_OBJS
      Makefile: fix dependency generation
      Makefile: Generalize the .version.cc target
      Makefile: do not be verbose about mv
      Makefile: speedup dependency generation
      Logging: Add log levels to output
      Makefile: Some cleanups
      Makefile: Fix some dependencies
      Logging: Improve text layout when discarding log timestamps
      Makefile: -lXXXXX is really a filename parameter
      Makefile: force rebuilding tests when Makefile changed
      Makefile: Get rid of test for-loop
      README: Fix markdown syntax error
      README: Some things are simply no longer true
      Cmdline: Rename "notimestamps" to "no-timestamps"
      Cmdline: Rename "relative-paths" to "strip-paths"
      Cmdline: Fix text alignment
      README: Add notes about packaging
      Makefile: Unclutter "make test" output
      Code style: Fix wrong indentation
      Makefile: remove tests from "make all"
      Makefile: Run install tests only for default target "reallyall"
      Makefile: make installing libs a separate target
      Makefile: Allow installation of fiemap/fiewalk support tools
      Makefile: Auto-detect systemd unit path
      Scripts: Fix systemd unit not being templated
      Installation: Remove USR_PREFIX from Makefile
      Compilation: Let the code know about package config
      Makefile: .version.o is made from a generated file
      Scripts: Don't prefix timestamps when running with systemd
      Makefile: create a template compiler
      Makefile: Due to VPATH, libcrucible links to hard-coded libuuid path
      Makefile: "which" is not portable
      Makefile: Do not force making README.html
      Makefile: Do not force optimizations by default
      Gentoo: Rework Gentoo ebuild into overlay
      beesd: Fix the wrapper not finding any config file
      contrib/gentoo: Update ebuild

Timofey Titovets (5):
      Fix: exec bees - breaks bash trap handling of umount bees workdir
      Fix: exec bees - breaks bash trap handling of umount bees workdir
      Make beesd -h useful
      Update options in sample config
      Rewrite beesd arg parser

Zygo Blaxell (83):
      Merge remote-tracking branches 'kakra/feature/add-relative-path-option' and 'kakra/integration'
      hash: reduce mutex contention using one mutex per hash table extent
      Merge remote-tracking branch 'nefelim4ag/master'
      crucible: add cleanup class
      lockset: drop unused method wait_unlock
      crucible: resource: remove excess locking
      crucible: resource: optimize map cleanup
      roots: remove open_root_cache correctly
      subvol-threads: increase resource and thread limits
      Makefiles: don't append to depends.mk.new
      test: add -lpthread to Makefile
      bees: clean up #if 0 ... fsync ... #endif code
      README: update dependencies and Linux kernel bugs list
      crucible: add Task class
      roots: remove dead code and #if blocks
      crucible: remove unused TimeQueue and WorkQueue classes
      roots: scan in parallel using Tasks
      crawl: implement two crawler algorithms and adjust scheduling parameters
      README: describe the scanning mode (-m option)
      hash: do the mlock after loading the table
      crucible: cache: linked-list LRU implementation
      counters: track pair growing time
      crawl: make logging less verbose
      task: allow external access to Task print function
      BeesNote: if thread name was not set, get it from Task or pthread_getname_np
      logging: get Task names for log messages
      roots: update Task print functions for new usage
      Merge remote-tracking branch 'kakra/proposal/prepare-for-more-libs'
      BeesNote: thread naming fixes
      Task: convert print_fn to a string
      time: drop unused Timer methods
      types: don't throw an exception when it's likely we are already reporting an exception
      BeesBlockData: don't leak file contents in the log
      README: update Linux kernel bugs list (v4.14)
      bees: drop BEESINFO
      roots: comment updates and general cleanup
      time: add RateEstimator, a class for optimally polling irregular external events
      roots: use RateEstimator to track transids
      crawl: combine two messages per crawl cycle into one
      FdCache: clear cache on every new transid / crawl cycle
      roots: use RateEstimator as a transid_max cache and clean up logs
      roots: poll every 10 transids
      scan: fix length mismatch exception for prealloc extents at EOF
      ExtentWalker: increase efficiency for typical btrfs extent sizes
      roots: move common code for creating crawl Tasks into a method
      roots: add scan-mode 2 "oldest crawler first"
      README: add scan-mode 2 and expand descriptions of modes 0 and 1
      resolve: drop support for old-style compressed BeesAddr
      context: improve toxic match logs
      time: add update_monotonic to RateEstimator
      task: allow user access to ID and default constructor
      log: BEESLOGNOTE doesn't do what we think it does
      crawl: don't block a Task waiting for new transids
      roots: determine transid_max without open()ing every subvol root
      crawl: filter extents correctly
      crawl: somebody should set max_transid
      extentwalker: remove wrong constraint check
      roots: get rid of common error messages, add more error counters
      cache: release lock before clearing
      README: clarify that bees is not to be used on old kernels
      README: FD caches are now cleared every 10 transactions
      resolve: break up long intra-extent dedup loops
      crucible: MAP_32BIT is not defined on ARM
      crucible: progress: a progress tracker for worker queues
      BeesBlockData: fix data type issues
      stats: rename "chase_wrong_data" to "chase_no_data"
      fs: fix FTBFS on GCC 8
      tempfile: update comments around bees_sync
      bees: revert TOXIC_INTERVAL back to pre-4.14 levels
      crawl: use custom order instead of (ab)using BeesFileRange::operator<
      README: update known bugs and issues list
      crucible: error: record location of exception in what() message
      crucible: progress: drop the set() method
      README.md: update build-deps
      context: log dedups with single unbroken log message
      bees: configurable log verbosity
      bees: use readahead instead of posix_fadvise
      bees: dynamic thread pool size based on system load average
      roots: if queue is full run again
      README: spell 'available' correctly
      bees: add -G/--thread-min option for minimum thread count
      Merge Zygo#62
      extentwalker: don't fetch absurd numbers of extents just to throw them away

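The v0.6 highlights list a dynamic thread pool size based on system load average, with a minimum thread count set by -G/--thread-min. The release notes do not describe the adjustment rule, so the sketch below is only one plausible policy under stated assumptions: the function names, the one-thread-per-step adjustment, and the `target_load` parameter are all hypothetical and are not taken from the bees source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>   // getloadavg (a glibc/BSD extension declared in <stdlib.h>)

// Hypothetical policy (an assumption, not bees's actual algorithm):
// move the worker count one step toward the load target, then clamp it
// to the configured [min_threads, max_threads] range.
inline std::size_t next_thread_count(std::size_t current, double load,
                                     double target_load,
                                     std::size_t min_threads,
                                     std::size_t max_threads)
{
    assert(min_threads <= max_threads);
    std::size_t next = current;
    if (load > target_load && current > 0) {
        --next;          // system is busier than wanted: shed one worker
    } else if (load < target_load) {
        ++next;          // system has headroom: add one worker
    }
    return std::max(min_threads, std::min(next, max_threads));
}

// Hypothetical wrapper: sample the 1-minute load average and apply the policy.
// If the load average cannot be read, the current count is kept unchanged.
inline std::size_t adjust_from_system(std::size_t current, double target_load,
                                      std::size_t min_threads,
                                      std::size_t max_threads)
{
    double avg[1];
    if (getloadavg(avg, 1) != 1) {
        return current;
    }
    return next_thread_count(current, avg[0], target_load, min_threads, max_threads);
}
```

Keeping the policy in a pure function separate from the `getloadavg` call makes the rule easy to check in isolation; a scheduler loop would call `adjust_from_system` periodically and start or park workers to match the result.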
Bees v0.5

Headlines:

 * ignore files with FS_NOCOW_FL attribute set
 * fix toxic extent detection
 * doc updates for Ubuntu builds and kernel v4.14
 * now builds on GCC 7+
 * more bundled systemd glue
 * minor performance improvements

Coenraad Loubser (1):
      Verbatim Ubuntu build instructions

Kai Krakow (18):
      Skip nocow files to speed up processing
      Enable detect of markdown binary
      Change README.md reflecting nodatacow inode attribute
      Move bees to libexec install dir
      Bees is meant to be run as root only
      Make config example more clear
      Explicitly mark systemd unit as Type=simple
      Adjust service restart and shutdown behavior
      Adjust CPU and IO shares when running under systemd
      Allow custom libexec location
      Fix libexec prefix discrepancy
      Implement getopt options parser
      Add option for prefixing timestamps
      Update README after integrating new features
      Add beesd generated script to gitignore
      Fix naming
      Make service starter accept bees options
      Fix a fallthrough error in GCC 7+

Timofey Titovets (2):
      Makefile add scripts target for correctly packaging
      Update btrfs compression types, add ZSTD, drop LAST

Zygo Blaxell (36):
      crucible: cache: no need to use explicit lock type
      crucible: fs: keep ioctl buffer between runs
      bees: limit FD cache size explicitly
      bees: change formatting for physical bytenr ranges in dedup
      bees: types: improve serialization of byte ranges
      bees: remove file open serialization mutex
      bees: use C++11 syntax for constant initializers
      bees: make a thread note when we read data
      bees: handle trace functions that throw exceptions
      bees: time tmpfile create and copy operations
      bees: drop unused constants
      bees: trace calls to BeesResolver
      crucible: lockset: track lockers and use handle type [bees master branch edition]
      bees: use handle type for hash table extent locks
      Merge branch 'master' of git://github.com/kakra/bees
      crucible: add ioctl_iflags_set to complement ioctl_iflags_get
      bees: use ioctl_iflags_get and ioctl_iflags_set instead of opencoded versions
      roots: move flags check after file identity checks and make error message style consistent
      README: remove stray whitespace
      Makefile: add test to PHONY list
      README: update list of currently known kernel bugs
      bees: drop unused BeesWorkQueue classes
      log: simplify output for dedup and scan
      tmpfiles: note that kernel race condition is not yet fixed
      roots: trace transid_max calculation
      roots: drop open_root_nocache log entry
      Merge remote-tracking branch 'kakra/feature/markdown-detection'
      Merge remote-tracking branch 'kakra/master'
      chatter: use static function to control timestamping behavior
      main: use static function to control timestamps in log output
      makeflags: fix missing -D_FILE_OFFSET_BITS=64 in comment
      scan: insert toxic matched extents into hash table as they are discovered
      error: drop redundant CHECK_CONSTRAINT
      Merge remote-tracking branch 'nefelim4ag/master'
      README: update the state of bees and the kernel for v4.14
      Makefile: if multiple Markdown utilities are present, use the first one

Bees v0.4

Fixed an iterator-invalidation bug in the FD cache which could lead to a crash.

Cached file descriptors are now released periodically to avoid a situation where bees would continue to dedup large files long after they had been deleted. This is the same approach used to prevent bees from indefinitely delaying subvol deletes.

Merged contributions from Github (thanks!).

Paul Jones:

 * Remove reference to *.c files in Makefile
   On Gentoo it errors out because there is no *.c

Timofey Titovets:

 * Scripts: Remove code for short path name in log
 * Add install subcommand to make
 * Add install_scripts subcommand to make
 * Add help section to Makefile
 * Add filter to remove time from bees output
 * Make filters configurable
 * Makefile: make service install compatible with debian systems
 * Check: if disk with UUID are btrfs by blkid
 * Bees: fix [-Werror=implicit-fallthrough=]

Zygo Blaxell:

 * trivial: mass purge of whitespace errors
 * crucible: remove ArgList and drop the unimplemented interpreter classes
 * crucible: remove unused execpipe
 * README: "btrfs: improve delayed refs iterations" has been merged into v4.10-rc1
 * build: move BEES_VERSION to a separate C file to avoid unnecessary building
 * crucible: get rid of DefaultBool, just use C++11 initializer syntax
 * crucible: LockSet: add a maximum size constraint
 * README: update copyright year, remove some obsolete statements
 * main: ArgList would silently drop the first argument
 * main: count arguments correctly
 * crucible: extentwalker: add compressed() and bytenr() methods
 * hash: make thread status message more consistent
 * hash: remove the unused m_prefetch_rate_limit
 * hash: prevent eleventy-gigabyte core dumps
 * bees: fix deadlock in thread status reporting
 * crucible: time: fix uninitialized member
 * crucible: rework the Resource class
 * lib: add a version string
 * src: Update bees-version.c more often
 * crucible: cache: construct return value before releasing lock
 * crucible: fix further instances of copy-after-unlock bug
 * bees: fix further instances of copy-after-unlock bug
 * bees: clean up statistics class
 * crucible: cache: clean up use of iterators
 * context: purge FD cache every COMMIT_INTERVAL

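Several v0.4 entries refer to a "copy-after-unlock" bug and to constructing the return value before releasing the lock. The sketch below illustrates that general class of bug with a hypothetical `SmallCache` class; it is an assumption about the pattern being described and is not the crucible cache code.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical mutex-protected cache used only to illustrate the pattern.
class SmallCache {
    std::mutex m_mutex;
    std::map<int, std::string> m_map;

public:
    void insert(int key, std::string value)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_map[key] = std::move(value);
    }

    // The buggy shape would be:
    //     auto found = m_map.find(key);
    //     lock.unlock();
    //     return found->second;   // copy happens AFTER unlock
    // Another thread could modify or erase the entry between unlock() and
    // the copy. Here the return value is built while the lock is still held.
    std::string get(int key)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        const auto found = m_map.find(key);
        std::string rv;                     // empty string means "not cached"
        if (found != m_map.end()) {
            rv = found->second;             // copy under the lock
        }
        lock.unlock();                      // only now release the lock
        return rv;                          // rv is a private copy, safe to return
    }
};
```

The explicit `unlock()` is kept to mirror the wording of the commit title; in new code a `std::lock_guard` that lives until the function returns gives the same ordering with less room for mistakes.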
Bees v0.3

 * Add bash and systemd wrappers (Timofey Titovets)
 * Optimize CRC64 implementation (Paul Jones)
 * GCC native optimization flags (Paul Jones)
 * Change extent scan order to ensure initial scan completion with rolling snapshots
 * Fix failure to delete unique shared blocks
 * Remove some useless memory initialization
 * Work around btrfs fsync bug
 * Fix support for 32-bit platforms
 * Fix builds on Ubuntu
 * Don't fail at startup if hugepage support not enabled in kernel
 * Remove (failed) experimental shared hash table feature