A desktop search engine for your private files.
Everything works offline; it never uses the internet.
```shell
git clone https://github.com/bittorf/ekuku-search.git
cd ekuku-search
# install needed programs (if any):
./ekuku check_deps
# get help:
./ekuku help
# test run on a file, e.g.:
./ekuku mime /path/to/file
# let it index a directory:
./ekuku scan /path/to/dir
```
Scans the given directory, extracts and enriches metadata, stores it
in a database and makes it searchable. It also looks inside archives
and supports arbitrary file and folder names.
It uncompresses archives, even archives inside archives.
For example, a 7-Zip-compressed ISO file contains a ZIP, which
contains a TAR with a LibreOffice document, which contains an MP4,
which contains audio, video and a picture, which has text in it...
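The recursion behind this can be sketched in a few lines of bash. This is a hypothetical illustration, not ekuku's code: the function name `unbox_recursive` is made up, it only knows `tar` and `gzip`, it guesses the type from the file extension (ekuku works from the detected mimetype instead), and it stops at an assumed nesting limit of 8 so that a hostile archive cannot recurse forever.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: recursively unbox nested tar/gzip archives and
# print every innermost (non-archive) file that turns up.
# Assumption: type detection by file extension, not by mimetype.
unbox_recursive() {
  local file="$1" depth="${2:-0}" work inner
  case "$file" in
    *.tar.gz|*.tgz|*.tar|*.gz) ;;          # something we know how to unbox
    *) printf '%s\n' "$file"; return 0 ;;  # leaf: would go to metadata extraction
  esac
  # Assumed safety limit against endlessly nested archives.
  if [ "$depth" -ge 8 ]; then
    echo "nesting limit reached, not unboxing: $file" >&2
    printf '%s\n' "$file"
    return 0
  fi
  work="$(mktemp -d)"                      # temporary extraction directory
  case "$file" in
    *.tar.gz|*.tgz) tar -xzf "$file" -C "$work" ;;
    *.tar)          tar -xf "$file" -C "$work" ;;
    *.gz)           gzip -dc -- "$file" > "$work/$(basename -- "${file%.gz}")" ;;
  esac
  # NUL-separated listing, so arbitrary filenames survive the loop.
  while IFS= read -r -d '' inner; do
    unbox_recursive "$inner" $((depth + 1))
  done < <(find "$work" -type f -print0)
  rm -rf -- "$work"                        # remove the extraction again
}
```

For example, `unbox_recursive backup.tar` prints the paths of the innermost files. Those paths point into scratch directories that are deleted afterwards, so a real indexer would have to extract metadata before the cleanup step.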
It tries to extract and enrich metadata, e.g.:
- audio: extract cover pictures and generate a text transcription
- video: extract subtitles
- images: extract faces, text, location, camera model etc.
- ...and a lot more
- using `find` on a directory, extract and insert into (or update) the database:
  - objecttype (e.g. file or dir)
  - modification time
  - filesize
  - /full/path/and/filename
- extract checksum and mimetype of all files in the database, if not known yet or if the modification time changed
- for images using `magick` and `tesseract`
- for videos using `ffmpeg`
- for audio using e.g. SoX
- for text using `libreoffice`
- for binaries using `binwalk`
- insert into (or update) the database
- e.g. temporarily uncompress, unarchive, and/or loop-mount any filesystem:
  - compressor support for `gzip`, `xz`, `zstd`, `bzip2` and others
  - archive support for `zip`, `tar`, `7z`, `rar`, `lha` and others
  - filesystem support for `iso`, `squashfs`, `ext2/3/4`, `qcow2` and others
  - run Job-1/2/3
  - remove extraction or mount
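A minimal sketch of how a detected mimetype could be routed to one of the steps above. The function name `job_for_mime` and the job labels are hypothetical. The mimetype strings are the ones `file --mime-type -b` typically reports, but the exact strings can differ between versions of `file`, so treat the table as an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: map a mimetype to "unbox it" or "extract metadata
# with tool X". The mapping mirrors the tool list above; it is an
# illustration, not ekuku's actual dispatch table.
job_for_mime() {
  case "$1" in
    # compressors: uncompress into a temporary file
    application/gzip|application/x-xz|application/zstd|application/x-zstd|application/x-bzip2)
      echo "uncompress" ;;
    # archives: unpack into a temporary directory
    application/zip|application/x-tar|application/x-7z-compressed|application/x-rar)
      echo "unarchive" ;;
    # filesystem images: loop-mount read-only
    application/x-iso9660-image)
      echo "loop-mount" ;;
    image/*) echo "metadata:magick+tesseract" ;;
    video/*) echo "metadata:ffmpeg" ;;
    audio/*) echo "metadata:sox" ;;
    text/*|application/vnd.oasis.opendocument.*)
      echo "metadata:libreoffice" ;;
    # everything else is treated as an opaque binary
    *) echo "metadata:binwalk" ;;
  esac
}
```

Usage would look like `job_for_mime "$(file --mime-type -b photo.jpg)"`, which for a JPEG photo (`image/jpeg`) yields `metadata:magick+tesseract`. Keeping the decision in one `case` makes adding another compressor or archive format a one-line change.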
### Naming history:
- rexxbot (initially, around 1993) => "bot" in the name is not nice
  - https://wireless.subsignal.org/index.php?title=Rexxbot
- poormens-desktop-search-engine => too bulky
  - https://wireless.subsignal.org/index.php?title=Poormens_desktop_search_engine.sh
- filebot => already taken: https://www.filebot.net
- file_cabinet => too arbitrary
- ekuku-bot => "bot" in the name is not nice
- ekuku-search => ekuku has been on Wikipedia since ~May 2012; let's use this
### ToDo:
```
# === loop1 | fastscan ===
# 1) scan directory and get 4 values:
#    a) type of object (e.g. file or dir)
#    b) modification time
#    c) filesize
#    d) /full/path/and/filename in [base64] format
#    >files.txt
#
# === loop2 | index only new/changed objects ===
# 2) read files.txt and query database for each line
#    a) is this quadruple known?
#
# 3) if not known, do deeper analysis:
#    a) file?: get sha256
#    b) file?: get mimetype
#    c) write [type,mtime,size,mimetype,dirname,basename,sha256] to database-table OBJECTS
#
# === loop3 | metadata ===
# 1) look up in database which [size+sha256] have missing metadata
#    b) write [size+sha256, json-metadata] to database-table METADATA
#
# === loop4 | archive ISO-unboxing ===
# 1) provide helper
#
# === loop5 | archive unboxing ===
# 2) look up in database which [size+sha256] are unanalysed
#    a) mark [size+sha256] as '{archive:IS-IN-WORK@timestamp}' in database-table METADATA
#    b) unbox archive and
#    c) read each file/dir like loop1
#    d) write [type,mtime,size,dirname,basename,sha256] to database-table UNBOXED
#    e) mark [size+sha256] as '{archive:unboxed}'
#
# === loop6 | web-ui ===
# 1) server connections
#
# === loop7 | web-ui-query-completer? ===
# 1) foo
#
# === loop8 | metadata-API check+update ===
# 1) query metadata-plugins and detect database entries with lower metadata API version
```
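Loop1 and loop2 of the ToDo list can be sketched with GNU `find` and `base64`. This assumes GNU tools (`find -printf`, `base64 -w0`), uses a plain text file of already known lines as a stand-in for the database query, and the function names `fastscan` and `new_or_changed` are hypothetical. Encoding the path as base64 keeps every object on exactly one line, even when its filename contains spaces or newlines.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of loop1 + loop2, assuming GNU find and GNU base64.
# A plain text file of known lines stands in for the database.

# loop1 | fastscan: one line per object:
#   <type> <mtime in whole seconds> <size> <base64 of /full/path>
fastscan() {
  local type mtime size path
  # Each object is emitted as two NUL-terminated records:
  # "type mtime size" and the raw path (which may contain any byte but NUL).
  find "$1" -printf '%y %T@ %s\0%p\0' |
  while IFS=' ' read -r -d '' type mtime size &&
        IFS= read -r -d '' path; do
    printf '%s %s %s %s\n' "$type" "${mtime%.*}" "$size" \
      "$(printf '%s' "$path" | base64 -w0)"
  done
}

# loop2 | index only new/changed objects:
# print every scanned line (stdin) that is not yet in the known list ($1).
new_or_changed() {
  grep -vxF -f "$1"
}
```

For example, `fastscan /path/to/dir | new_or_changed files.txt` prints only the quadruples that are not in `files.txt` yet, i.e. the objects that need the deeper analysis of step 3). With `grep -F -x -f` the whole known list is matched in one pass instead of one query per line; a real database would use an index on the quadruple instead.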
TODO: http://wiki.redump.org/index.php?title=Dumping_Guides
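Loop8 of the ToDo list boils down to a version comparison. A minimal sketch, assuming a hypothetical line format `<size+sha256> <metadata API version>` exported from database-table METADATA and a numeric API version per metadata plugin; the function name `outdated_entries` is made up for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of loop8: read "<key> <api-version>" lines on stdin
# and print the keys whose metadata was produced by an older plugin API
# than the current version ($1), i.e. the entries to re-extract.
outdated_entries() {
  # "+ 0" forces a numeric comparison in awk.
  awk -v current="$1" '$2 + 0 < current + 0 { print $1 }'
}
```

The printed keys could then be fed back into loop3 so that their metadata is extracted again with the newer plugin.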