Iterate large directories efficiently with Python.
python-getdents is a simple wrapper around the Linux system call getdents64 (see man getdents for details). Here's a study on why ls, os.listdir() and others are so slow when dealing with extremely large directories.
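To make concrete what wrapping getdents64 involves, here is a rough pure-Python sketch using ctypes. It is not how python-getdents itself is implemented. The function name iter_dirents, the syscall-number table (x86_64 and aarch64 only) and the choice to skip "." and ".." are assumptions made for illustration, and the code runs only on Linux.

```python
import ctypes
import os
import platform
import struct

# getdents64 syscall numbers; only these two architectures are covered here.
_SYS_GETDENTS64 = {"x86_64": 217, "aarch64": 61}

# Fixed-size head of struct linux_dirent64: d_ino (u64), d_off (s64),
# d_reclen (u16), d_type (u8). The NUL-terminated d_name follows it.
_HEADER = struct.Struct("=QqHB")


def iter_dirents(path, buff_size=32768):
    """Yield (inode, type, name) for each entry in path (illustrative only)."""
    nr = _SYS_GETDENTS64[platform.machine()]  # KeyError on other architectures
    libc = ctypes.CDLL(None, use_errno=True)
    buf = ctypes.create_string_buffer(buff_size)
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        while True:
            # One system call fills the buffer with as many records as fit.
            nread = libc.syscall(nr, fd, buf, buff_size)
            if nread < 0:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err))
            if nread == 0:  # end of directory
                return
            data = buf.raw[:nread]
            pos = 0
            while pos < nread:
                inode, _off, reclen, d_type = _HEADER.unpack_from(data, pos)
                raw_name = data[pos + _HEADER.size:pos + reclen]
                name = os.fsdecode(raw_name.split(b"\0", 1)[0])
                if name not in (".", ".."):
                    yield inode, d_type, name
                pos += reclen  # d_reclen is the full length of this record
    finally:
        os.close(fd)
```

A larger buff_size should mean fewer system calls for the same directory, which is presumably why the buffer size is exposed as a parameter.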
- Verify that the implementation works on platforms other than x86_64.
pip install getdents
python3 -m venv env
. env/bin/activate
pip install -e .
ulimit -v 33554432 && py.test tests/
Or
ulimit -v 33554432 && ./setup.py test
from getdents import getdents

# Print the name of every entry in /tmp; the second argument is the buffer size in bytes.
for inode, d_type, name in getdents('/tmp', 32768):
    print(name)
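Because the entries arrive as (inode, type, name) tuples, they can be fed straight into other code without building a list first. The sketch below tallies entries per type. The helper name count_by_type and the label table are hypothetical, and the DT_* values are copied from Linux dirent.h so the block runs without the package installed.

```python
from collections import Counter

# d_type values from Linux <dirent.h>, redefined here (an assumption for
# self-containment) instead of being imported from the getdents module.
DT_UNKNOWN, DT_FIFO, DT_CHR, DT_DIR = 0, 1, 2, 4
DT_BLK, DT_REG, DT_LNK, DT_SOCK = 6, 8, 10, 12

TYPE_NAMES = {
    DT_BLK: "blockdev",
    DT_CHR: "chardev",
    DT_DIR: "dir",
    DT_FIFO: "pipe",
    DT_LNK: "symlink",
    DT_REG: "file",
    DT_SOCK: "socket",
    DT_UNKNOWN: "unknown",
}


def count_by_type(entries):
    """Count entries per type from an iterable of (inode, type, name) tuples."""
    counts = Counter()
    for _inode, d_type, _name in entries:
        # Unrecognised type codes are folded into "unknown".
        counts[TYPE_NAMES.get(d_type, "unknown")] += 1
    return counts
```

With the package installed, count_by_type(getdents('/tmp', 32768)) would consume the generator one entry at a time, so memory use stays flat regardless of directory size.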
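import os
from getdents import *

# Fixed-width labels for the d_type constants exported by getdents.
TYPE_LABELS = {
    DT_BLK: 'blockdev',
    DT_CHR: 'chardev ',
    DT_DIR: 'dir     ',
    DT_FIFO: 'pipe    ',
    DT_LNK: 'symlink ',
    DT_REG: 'file    ',
    DT_SOCK: 'socket  ',
    DT_UNKNOWN: 'unknown ',
}

fd = os.open('/tmp', O_GETDENTS)
try:
    for inode, d_type, name in getdents_raw(fd, 2**20):
        # 'd' flags entries whose inode is 0.
        print(TYPE_LABELS[d_type], 'd' if inode == 0 else ' ', name)
finally:
    # Close the descriptor even if iteration raises.
    os.close(fd)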
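When working with raw file descriptors like this, it is easy to leak one if an exception interrupts the loop. One way to guard against that is a small context manager. The name open_dir is hypothetical, and the flags os.O_RDONLY | os.O_DIRECTORY are an assumed stand-in for the module's O_GETDENTS so that the sketch needs only the standard library.

```python
import os
from contextlib import contextmanager


@contextmanager
def open_dir(path):
    """Open path as a read-only directory descriptor and always close it."""
    # Assumption: these flags stand in for the O_GETDENTS constant.
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        yield fd
    finally:
        os.close(fd)
```

Inside a "with open_dir('/tmp') as fd:" block, fd could then be passed to getdents_raw(fd, 2**20) in the same way as above.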