GitHub - moeC137/video-recutter: Takes a video, generates an .srt with vosk-api, then recuts the video for a given text with videogrep/ffmpeg

Notes:

Options for .srt generation with timestamps for every word:

https://github.com/googleapis/python-speech/blob/master/samples/snippets/transcribe_word_time_offsets.py or https://github.com/alphacep/vosk-api/blob/master/python/example/test_srt.py

autosub uses the Google API but doesn't give a timestamp for every word... maybe fixable: https://github.com/agermanidis/autosub

Recutting the video with: https://github.com/antiboredom/videogrep

extracting plaintext from .srt: https://gist.github.com/ndunn219/62263ce1fb59fda08656be7369ce329b#file-srt_to_txt-py

sort unique words for noun, verb, adjective, pronoun, adverb, article, preposition, numeral or conjunction: https://github.com/moeC137/wiktionary-curl-wordtype-checker

python3 test_srt.py myvideo.mp4 > myvideo.srt

(it's important that the .mp4 and the .srt have THE SAME name, or videogrep won't find the .srt)

python3 srt_to_txt.py myvideo.srt

(this generates the myvideo.txt)
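For orientation, the plaintext extraction can be sketched as follows. This is a minimal illustration and not the code of the linked srt_to_txt.py gist; the function name `srt_to_text` and the sample cues are made up for the example, and it assumes a well-formed .srt with blank lines between cues.

```python
import re

# Matches an SRT timestamp line such as "00:00:00,120 --> 00:00:00,450".
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3} --> \d{2}:\d{2}:\d{2}[,.]\d{3}")


def srt_to_text(srt: str) -> str:
    """Return only the subtitle text, dropping cue numbers and timestamps."""
    parts = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        # Drop the numeric cue index, if present.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        # Drop the timestamp line, if present.
        if lines and TIMESTAMP.match(lines[0]):
            lines = lines[1:]
        parts.extend(lines)
    return " ".join(parts)


# Hypothetical two-cue subtitle with one word per cue.
example = """1
00:00:00,120 --> 00:00:00,450
guten

2
00:00:00,450 --> 00:00:00,900
abend
"""

print(srt_to_text(example))
```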

getting unique words from text with numbers:

("sort -n" for non-reversed order)

getting unique words from text WITHOUT numbers:

cat myvideo.txt | grep -o -E '\w+' | tr '[A-Z]' '[a-z]' | sort | uniq > unique_words.txt
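The same word extraction can be sketched in Python, which also gives per-word counts. This is only an illustration: `word_counts` and the sample text are made up, and the behaviour is not identical to the shell pipeline (Python's `lower()` also lowercases umlauts, while `tr '[A-Z]' '[a-z]'` only handles ASCII letters).

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count lowercase words; \\w+ matches letters, digits and underscore."""
    return Counter(w.lower() for w in re.findall(r"\w+", text))


# Hypothetical content of myvideo.txt.
text = "Guten Abend guten Morgen"

counts = word_counts(text)
unique_words = sorted(counts)        # like: sort | uniq
by_frequency = counts.most_common()  # most frequent words first

print(unique_words)
print(by_frequency)
```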

sorting words into categories:

while read -ra line; do for word in "${line[@]}"; do sh checker.sh -l German -w "$word"; done; done < unique_words.txt

videogrepping for multiple exact words:

videogrep --input input.mp4 --output output.mp4 --search '\bword1\b|\bword2\b|\bword3\b'
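The search pattern for several exact words can be assembled from a word list instead of typing it by hand. The sketch below is an assumption-laden illustration: `exact_word_pattern` is a made-up helper, and the pattern is checked with Python's `re` module on the assumption that videogrep interprets it the same way.

```python
import re


def exact_word_pattern(words):
    """Join words into \\bword1\\b|\\bword2\\b, escaping regex metacharacters."""
    return "|".join(rf"\b{re.escape(w)}\b" for w in words)


pattern = exact_word_pattern(["wetter", "abend"])
print(pattern)

# "\b" makes the match exact: "abend" matches, "abendessen" does not.
print(bool(re.search(pattern, "guten abend")))
print(bool(re.search(pattern, "abendessen")))
```

The printed pattern can be passed to --search in single quotes, as in the command above.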

(use --padding for adding extra ms to the clips)

("\b" regex for exact string matching)

########################################################### outdated ###########################################################

getting single words from the custom text file:

tr -cs 'A-Za-z_' '[\n*]' < custom_text_file.txt

alternative:

while read -ra line; do for word in "${line[@]}"; do echo "$word"; done; done < custom_text_file.txt

(replace "echo "$word";" with the command to repeat)

full command:

while read -ra line; do for word in "${line[@]}"; do videogrep --input input.mp4 --output $word.mp4 --max-clips 1 --search "\b${word}\b"; done; done < text_test_file.txt

######################################################################################################################

create custom_text.txt with ONLY words from unique_words.txt

full command with numeration of outputs:

COUNTER=0; while read -ra line; do for word in "${line[@]}"; do COUNTER=$((COUNTER + 1)); COUNTER_PRINT="$(printf '%03d' "${COUNTER}")"; videogrep --input myvideo.mp4 --output ~/video-recutter/combiner/"${COUNTER_PRINT}${word}".mp4 --max-clips 1 --search "\b${word}\b"; done; done < custom_text.txt

(the counter is zero-padded to three digits, e.g. 001)

stitching everything in the folder together in ffmpeg:

cd combiner && for f in *.mp4; do echo "file '$f'" >> list.txt; done && ffmpeg -f concat -safe 0 -i list.txt -c copy stitched-video.mp4 && rm list.txt

(the double quotes around the echo argument write the single quotes into list.txt literally, so no backslash escapes are needed)

(it does this in alphabetical order, so a counter has to be added to the file names in the videogrep loop)
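Why the counter matters can be shown with a small sketch that builds the numbered clip names and the concat list. `numbered_name`, `concat_list` and the word list are hypothetical names for this example; the quote escaping follows my reading of ffmpeg's quoting rules and should be treated as an assumption.

```python
def numbered_name(index: int, word: str) -> str:
    """Build a clip name like 001guten.mp4 (three-digit zero padding)."""
    return f"{index:03d}{word}.mp4"


def concat_list(filenames) -> str:
    """Build the list.txt content for ffmpeg's concat demuxer, sorted by name."""
    lines = []
    for name in sorted(filenames):
        # Assumption: a single quote inside a name is written as '\''.
        escaped = name.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


# Hypothetical text to speak, in the intended order.
words = ["guten", "abend", "zusammen"]
names = [numbered_name(i, w) for i, w in enumerate(words, start=1)]

# Alphabetical sorting now preserves the intended word order.
print(concat_list(names), end="")
```

Without the numeric prefix, sorting would place abend.mp4 before guten.mp4 and scramble the sentence.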

Workflow:

Start script >> welcome message "paste youtube link here"
Paste link >> video download with youtube-dl
ffmpeg >> extract and convert audio for vosk-api
Generate .srt with a timestamp for every word with vosk-api (use test_srt.py and change "words per line" to 1)
Show all usable words
Write text with usable words
videogrep takes every word, cuts the matching video part and places it in the folder
ffmpeg combines all video snippets from the folder into the final video.
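The "timestamp for every word" step of the workflow can be sketched like this. It is not the code of test_srt.py: `words_to_srt` and `srt_timestamp` are made-up helpers, and the input is assumed to look like the per-word entries in Vosk's JSON result (dictionaries with `word`, `start` and `end` in seconds).

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 00:00:00,120."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words) -> str:
    """Emit one SRT cue per word so every word can be cut on its own."""
    cues = []
    for index, w in enumerate(words, start=1):
        start = srt_timestamp(w["start"])
        end = srt_timestamp(w["end"])
        cues.append(f"{index}\n{start} --> {end}\n{w['word']}\n")
    return "\n".join(cues)


# Hypothetical recognizer output for two words.
sample = [
    {"word": "guten", "start": 0.12, "end": 0.45},
    {"word": "abend", "start": 0.45, "end": 0.9},
]

print(words_to_srt(sample))
```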

command for extracting all keyword timestamps for a word from multiple .srt files: grep -B 1 -H "keyword" * > keywords.txt

combine this with https://github.com/moeC137/youtube-clip-taker to download the keyword from multiple videos.

remove gaps in keywords.txt:

awk 'NR % 3 != 0' keywords.txt > keywordsV2.txt

awk 'NR % 2 != 0' keywordsV2.txt > keywordsV3.txt
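What the two filtering passes do can be sketched in Python. The sketch assumes the grep output comes in groups of three lines (timestamp line, matching line, "--" separator), which holds when the matches are not adjacent; the file name a.srt and the function name are made up.

```python
def keep_timestamp_lines(lines):
    """Mimic the two awk passes: drop every 3rd line, then keep odd lines."""
    # First pass (NR % 3 != 0): removes the "--" group separators.
    without_separators = [ln for n, ln in enumerate(lines, start=1) if n % 3 != 0]
    # Second pass (NR % 2 != 0): removes the lines containing the keyword.
    return [ln for n, ln in enumerate(without_separators, start=1) if n % 2 != 0]


# Hypothetical grep -B 1 -H output for the keyword "wetter".
grep_output = [
    "a.srt-00:00:01,000 --> 00:00:01,400",
    "a.srt:wetter",
    "--",
    "a.srt-00:00:05,000 --> 00:00:05,300",
    "a.srt:wetter",
]

for line in keep_timestamp_lines(grep_output):
    print(line)
```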

parse the timestamp lines from the text document and pass them to the downloader (thx to https://stackoverflow.com/questions/69982768/how-to-repeat-a-command-for-every-line-in-a-textfile-with-given-arguments-from-t?noredirect=1#comment123711347_69982768 ):

#!/bin/bash

infile=$1                                   # input filename
count=1                                     # filename serial number
while IFS= read -r line; do                 # read the input file, assigning each line to "line"
  vid=${line:11:11}                         # video id of youtube
  start=${line:32:12}                       # start time
  stop=${line:49:12}                        # stop time
  file=$(printf "clip%03d.mp4" "$count")    # output filename
  (( count++ ))                             # increment the number
  # echo only prints the command; remove "echo" to run the download
  echo < /dev/null sh download_youtube.sh "https://www.youtube.com/watch?v=$vid" "$start" "$stop" "$file"
done < "$infile"

replace every "," with a "." on the timestamp lines to make it work with milliseconds
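The fixed character offsets and the comma replacement can be illustrated with a short sketch. The sample line is hypothetical and constructed so that the video id starts at character 11 and the timestamps at characters 32 and 49; real file names must have the same layout for the offsets to fit. `parse_clip_line` is a made-up name.

```python
def parse_clip_line(line: str):
    """Slice video id, start and stop out of one timestamp line."""
    video_id = line[11:22]                     # same as ${line:11:11}
    start = line[32:44].replace(",", ".")      # same as ${line:32:12}, "," -> "."
    stop = line[49:61].replace(",", ".")       # same as ${line:49:12}, "," -> "."
    return video_id, start, stop


# Hypothetical line: 11-character prefix, 11-character id, 10-character suffix.
line = "video_name-abcdefghijk.auto.srt-00:01:02,500 --> 00:01:03,100"

video_id, start, stop = parse_clip_line(line)
clip_name = f"clip{1:03d}.mp4"

print(video_id, start, stop, clip_name)
```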

grep for multiple words (-E is needed so that "|" means "or"): grep -E -B 1 -H "wetter|abend" * > multi.txt

remove lines that are too short: sed -r '/^.{,43}$/d' keywords.txt > keywords2.txt

crop to vertical: ffmpeg -i wild.mkv -vf crop=ih*9/16:ih wild_crop.mkv

Use word_slicer.sh to slice a clip into words from a custom list. No numeration.

Repository files: combiner, mycmd.sh, readme.md, srt_to_txt.py, test_srt.py, word_slicer.sh
