jvirkki/dupd
CLI utility to find duplicate files
dupd is a file duplicate detection CLI utility. The build currently supports Linux (tested on Debian), Solaris (tested on OpenIndiana) and Mac OS X. Other UNIX variants should work with minimal changes.

dupd seeks to be fast and efficient. For most file sets it performs better than popular tools such as 'fdupes'. See this article for some performance comparisons: http://www.virkki.com/jyri/articles/index.php/duplicate-file-detection-performance/

Quick howto:

% dupd scan --path $HOME (or whichever path you want to scan)
% dupd report

dupd takes a slightly different approach to dealing with duplicates than other similar tools I have tried. Other tools tend to fall into one of these two styles:

1. Tools which produce a long text report and leave you on your own. This works ok but can be overwhelming when the number of files is huge.

2. Tools which are eager (sometimes dangerously overeager) to actively delete duplicates. I find this far too risky (dupd will never delete files directly).

While dupd can be used in mode #1 ('dupd report') and it can be coaxed into mode #2 ('dupd rmsh'), dupd is best used in an interactive, exploratory style. I first scan the entire filesystem, which dupd saves into an SQLite database. I then drill down into the subdirectories where duplicates live and, using the ls/dups/uniques/file operations, review where the duplicates are coming from and why. Then I move or remove files as desired and analyze other subdirectories. After a few rounds of such cleanup, I re-run a whole scan to refresh the database.

For more information read the USAGE document or run 'dupd help'.
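The interactive workflow above can be sketched as a short shell session. The subcommand names (scan, report, dups, uniques, file) come from this README, but the exact flag syntax for the dups/uniques/file operations is an assumption here; check 'dupd help' or the USAGE document for the real syntax. Each command is printed before it runs, and a missing dupd binary is tolerated, so the sketch also reads as a dry run:

```shell
#!/bin/sh
# Sketch of one cleanup round, per this README. Flag syntax for
# dups/uniques/file is an assumption -- verify with 'dupd help'.
demo_session() {
    for cmd in \
        "dupd scan --path $HOME" \
        "dupd report" \
        "dupd dups --path $HOME/Photos" \
        "dupd uniques --path $HOME/Photos" \
        "dupd scan --path $HOME"
    do
        echo "+ $cmd"                # show the command being attempted
        $cmd 2>/dev/null || true     # no-op if dupd is not installed
    done
}
demo_session
```

Between the per-directory commands and the final refresh scan is where the actual cleanup happens: moving or removing files by hand after reviewing the output.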