This repository contains the artifact for evaluating DRACE, a DRAM-Centric metadata service for distributed file systems.
Title: DRACE: A DRAM-Centric Metadata Service for Distributed File System
The metadata service (MDS) is an important component of a distributed file system (DFS). Traditional MDS optimizations typically scale out (i.e., increase the number of MDS nodes) to meet the performance requirements of large-scale computing scenarios. However, we argue that existing MDS designs do not exploit the full potential of each MDS node. The main reason is that they follow a storage-centric design, treating DRAM only as a cache, which introduces bandwidth contention, high data persistence overhead, heavy metadata indexing, and poor NUMA scalability.
In contrast, we observe that all metadata can be stored in DRAM, which leads us to a DRAM-centric design that unleashes the scale-up potential of the MDS. Specifically, we propose DRACE, a DRAM-Centric metadata service for metadata servers equipped with large DRAM and powerful CPUs. First, a DRAM-centric metadata layout is designed for fast metadata indexing and persistence. Second, a lock-free namespace organization ensures high concurrency for metadata access. Finally, a fine-grained NUMA-aware partitioning scheme improves scalability on multi-NUMA architectures.
Our evaluations show that DRACE consistently outperforms state-of-the-art MDS solutions: on file operations, it achieves throughput speedups of 5.54-95.02x for creation, 4.65-28.87x for stat, and 15.91-47.42x for removal. Further experiments show that DRACE scales well and supports different types of storage devices.
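To make the lock-free namespace idea concrete, the sketch below shows one possible approach under our own assumptions (it is not DRACE's implementation): directory entries keyed by (parent inode ID, name) are published into an insert-only open-addressing table with release stores, so lookups need only acquire loads and never take a lock.

```cpp
// Minimal sketch of a lock-free, insert-only namespace index (illustrative,
// not DRACE's code). Entries are published with release stores; lookups use
// only acquire loads, so path resolution never blocks on a lock.
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>

struct Dentry {
    uint64_t parent_id;   // inode ID of the parent directory
    std::string name;     // entry name within the parent
    uint64_t inode_id;    // inode ID this entry resolves to
};

class LockFreeNamespace {
public:
    // Insert an entry; returns false if the name already exists in the parent
    // or the table is full. A CAS claims an empty slot.
    bool Insert(uint64_t parent_id, const std::string& name, uint64_t inode_id) {
        size_t idx = Hash(parent_id, name) % kSlots;
        for (size_t probe = 0; probe < kSlots; ++probe, idx = (idx + 1) % kSlots) {
            Dentry* cur = slots_[idx].load(std::memory_order_acquire);
            if (cur == nullptr) {
                Dentry* fresh = new Dentry{parent_id, name, inode_id};
                Dentry* expected = nullptr;
                if (slots_[idx].compare_exchange_strong(
                        expected, fresh,
                        std::memory_order_release, std::memory_order_acquire)) {
                    return true;
                }
                delete fresh;      // lost the race; re-check the winning entry
                cur = expected;
            }
            if (cur->parent_id == parent_id && cur->name == name) return false;
        }
        return false;  // table full
    }

    // Lock-free lookup: only atomic acquire loads, no mutexes.
    bool Lookup(uint64_t parent_id, const std::string& name, uint64_t* inode_id) const {
        size_t idx = Hash(parent_id, name) % kSlots;
        for (size_t probe = 0; probe < kSlots; ++probe, idx = (idx + 1) % kSlots) {
            Dentry* cur = slots_[idx].load(std::memory_order_acquire);
            if (cur == nullptr) return false;  // empty slot: entry not present
            if (cur->parent_id == parent_id && cur->name == name) {
                *inode_id = cur->inode_id;
                return true;
            }
        }
        return false;
    }

private:
    static constexpr size_t kSlots = 1 << 16;  // illustrative capacity

    static size_t Hash(uint64_t parent_id, const std::string& name) {
        return std::hash<uint64_t>{}(parent_id) ^ (std::hash<std::string>{}(name) << 1);
    }

    std::atomic<Dentry*> slots_[kSlots]{};
};

int main() {
    static LockFreeNamespace ns;  // static: the slot array is ~512 KiB of pointers
    ns.Insert(/*parent_id=*/1, "usr", /*inode_id=*/2);
    uint64_t ino = 0;
    if (ns.Lookup(1, "usr", &ino)) std::cout << "usr -> inode " << ino << "\n";
}
```

A production namespace additionally needs removal, resizing, memory reclamation, and persistence, which this sketch omits.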
For detailed artifact evaluation instructions, please refer to ARTIFACT_EVALUATION.md.
```bash
# Install dependencies (Ubuntu 22.04)
sudo apt update && sudo apt install -y git cmake ninja-build gcc-11 g++-11 \
    liburing-dev libnuma-dev python3-toml

# Clone and build
git clone --recurse-submodules https://github.com/nsccgz-storage/drace.git
cd drace
./scripts/build.sh

# Mount hugepages (required)
sudo ./scripts/mount-hugepage.sh

# Run basic test
./scripts/local/setup-test-servers.sh

# In another terminal:
export LD_PRELOAD=$PWD/build/src/client/libdfsclient.so
echo "Hello DRACE" > /tmp/test.txt
cat /tmp/test.txt
```
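The basic test above relies on LD_PRELOAD to interpose the DFS client between applications and libc. As a rough illustration of how such a shim works (a generic sketch, not DRACE's client code; the intercepted path prefix is an assumption), an interposed open can look like this:

```cpp
// Generic sketch of an LD_PRELOAD interposition shim (illustrative only; the
// real libdfsclient.so is more complete and forwards operations over RPC).
// Build: g++ -shared -fPIC open_shim.cc -o libopen_shim.so -ldl
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1      // for RTLD_NEXT
#endif
#include <dlfcn.h>
#include <fcntl.h>

#include <cstdarg>
#include <cstdio>
#include <cstring>

// Assumed prefix handled by the DFS client; everything else goes to libc.
static const char kDfsPrefix[] = "/tmp/";

using open_fn = int (*)(const char*, int, ...);

extern "C" int open(const char* path, int flags, ...) {
    // Resolve the "next" open in link order (i.e., libc's) exactly once.
    static open_fn real_open =
        reinterpret_cast<open_fn>(dlsym(RTLD_NEXT, "open"));

    mode_t mode = 0;
    if (flags & O_CREAT) {  // the mode argument exists only with O_CREAT
        va_list ap;
        va_start(ap, flags);
        mode = static_cast<mode_t>(va_arg(ap, int));
        va_end(ap);
    }

    if (std::strncmp(path, kDfsPrefix, sizeof(kDfsPrefix) - 1) == 0) {
        // A real client would translate this into a metadata RPC; the sketch
        // only logs the interception and falls through to libc.
        std::fprintf(stderr, "[shim] intercepted open(%s)\n", path);
    }
    return real_open(path, flags, mode);
}
```

DRACE's actual client wraps the broader POSIX file interface and forwards operations over its eRPC-based layer (see src/client/ and src/rpc/ below).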
- DRAM-Centric Design: All metadata is kept in DRAM, avoiding the overheads of treating DRAM as a cache
- Lock-Free Namespace: High-concurrency metadata operations without namespace locks
- NUMA-Aware Partitioning: Optimized for multi-NUMA architectures (see the sketch after this list)
- High Performance: 5.54-95.02x throughput speedup for file creation over state-of-the-art solutions
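The NUMA-aware partitioning feature can be illustrated with a minimal sketch of one possible scheme (an assumption for illustration, not DRACE's actual code, though libnuma-dev does appear in the dependency list): a metadata operation is routed to a partition by hashing its parent directory ID, and the partition's memory is allocated from the owning NUMA node via libnuma.

```cpp
// Minimal sketch of one possible NUMA-aware partitioning scheme (illustrative,
// not DRACE's code): hash the parent directory ID to pick an owning NUMA node,
// then allocate that partition's memory from the node with libnuma.
// Build: g++ -O2 numa_partition.cc -o numa_partition -lnuma
#include <numa.h>

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <functional>

// Map a parent directory ID to the NUMA node that owns its partition.
static int OwnerNode(uint64_t parent_dir_id, int num_nodes) {
    return static_cast<int>(std::hash<uint64_t>{}(parent_dir_id) % num_nodes);
}

int main() {
    if (numa_available() < 0) {
        std::fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }
    int nodes = numa_num_configured_nodes();

    // Example: decide which node serves metadata for directory 42, then
    // reserve a node-local arena for that partition's in-DRAM structures.
    uint64_t dir_id = 42;
    int node = OwnerNode(dir_id, nodes);
    size_t arena_bytes = 64u << 20;  // 64 MiB, illustrative
    void* arena = numa_alloc_onnode(arena_bytes, node);
    if (arena == nullptr) {
        std::fprintf(stderr, "node-local allocation failed\n");
        return 1;
    }
    std::printf("directory %llu -> partition on NUMA node %d (of %d)\n",
                static_cast<unsigned long long>(dir_id), node, nodes);

    numa_free(arena, arena_bytes);
    return 0;
}
```

In a full design, the worker threads serving a partition would also be pinned to the owning node (cf. src/threading/), so metadata accesses stay NUMA-local.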
```
drace/
├── src/                    # Source code
│   ├── client/             # Client-side POSIX wrapper and RPC client
│   ├── server/             # Metadata and data server implementation
│   ├── common/             # Shared utilities (config, logging, metadata structures)
│   ├── rpc/                # eRPC-based communication layer
│   ├── lock/               # Distributed lock implementation
│   └── threading/          # CPU affinity and threading utilities
├── include/dfs/            # Public headers
│   ├── client/             # Client API headers
│   ├── server/             # Server interfaces (abstract_data, metadata, dcache)
│   ├── common/             # Common data structures and utilities
│   └── rpc/                # RPC interface definitions
├── proto/                  # FlatBuffers schema definitions
├── scripts/                # Automation and utility scripts
│   ├── local/              # Single-node testing scripts
│   ├── cluster/            # Multi-node deployment scripts
│   ├── exps/               # Experiment automation scripts
│   ├── utils/              # Data processing utilities
│   └── perf/               # Performance analysis tools
├── conf/                   # TOML configuration files
│   ├── local/              # Single-node configurations
│   ├── two-nodes/          # Two-node test configurations
│   └── nvm-cluster/        # Production cluster configurations
├── test/                   # Unit and integration tests
├── benchmark/              # Performance benchmarks
│   ├── mpi-tile-io.c       # MPI I/O benchmark
│   └── hashmap-bench.cc    # Metadata operation benchmark
├── third_party/            # External dependencies (eRPC, FlatBuffers, etc.)
├── ansible/                # Deployment automation playbooks
└── docs/                   # Documentation
    ├── ABOUT.md            # Project overview
    ├── md-structure.md     # Metadata server design
    └── perf.md             # Performance tuning guide
```
The artifact supports the following experiments:
- Metadata Performance: Throughput of create, stat, and removal operations (a minimal sanity-check sketch follows this list)
- Scalability Analysis: Thread and client scalability tests
- NUMA Efficiency: Performance across NUMA nodes
- Storage Backend Comparison: Performance with different storage devices
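For a quick sanity check of the metadata-performance numbers, independent of the provided experiment scripts, a small client program can time create, stat, and removal directly through POSIX calls. The sketch below is illustrative (the directory, file count, and single-threaded loop are assumptions); run it with the client library preloaded as in the quick start:

```cpp
// Minimal sanity-check sketch (not one of the provided benchmarks): times
// create, stat, and unlink over plain POSIX calls.
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#include <chrono>
#include <cstdio>
#include <string>

using Clock = std::chrono::steady_clock;

static double SecondsBetween(Clock::time_point a, Clock::time_point b) {
    return std::chrono::duration<double>(b - a).count();
}

int main() {
    const std::string dir = "/tmp/drace-sanity";  // illustrative path
    const int kFiles = 10000;                     // illustrative count
    mkdir(dir.c_str(), 0755);

    auto t0 = Clock::now();
    for (int i = 0; i < kFiles; ++i) {
        std::string path = dir + "/f" + std::to_string(i);
        int fd = open(path.c_str(), O_CREAT | O_WRONLY | O_EXCL, 0644);
        if (fd >= 0) close(fd);
    }
    auto t1 = Clock::now();

    struct stat st;
    for (int i = 0; i < kFiles; ++i)
        stat((dir + "/f" + std::to_string(i)).c_str(), &st);
    auto t2 = Clock::now();

    for (int i = 0; i < kFiles; ++i)
        unlink((dir + "/f" + std::to_string(i)).c_str());
    auto t3 = Clock::now();

    std::printf("create: %8.0f ops/s\n", kFiles / SecondsBetween(t0, t1));
    std::printf("stat:   %8.0f ops/s\n", kFiles / SecondsBetween(t1, t2));
    std::printf("unlink: %8.0f ops/s\n", kFiles / SecondsBetween(t2, t3));
    return 0;
}
```

The scripts under scripts/exps/ drive the actual multi-threaded, multi-client experiments reported in the paper.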
- CPU: Multi-core x86_64 processor (preferably multi-NUMA)
- RAM: 16 GB minimum; 128 GB+ recommended for the full evaluation
- Storage: 100 GB of free space for experiments
- Network: Gigabit Ethernet; a 100 Gbps network for the cluster experiments
For questions about the artifact evaluation, please open an issue in this repository or contact the authors.
This project is licensed under the Apache License 2.0. See LICENSE file for details.