- `checkpoints` - weights of the pre-trained model
- `data_files` - input video and target audio to be synced
- `face_detection` - a model to detect faces in a frame (ref. https://github.com/Rudrabha/Wav2Lip)
- `models` - the SOTA model for the lip-sync task, Wav2Lip (ref. https://github.com/Rudrabha/Wav2Lip)
- `requirements.txt` - pinned packages to be installed
- `sync.py` - the actual Python script to be run by the end user (sketched below)
- `utils.py` - a utility script for `sync.py`
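For orientation, here is a minimal sketch of how these pieces typically fit together in a Wav2Lip-style pipeline. It is an illustration only, not the actual contents of `sync.py`; the checkpoint filename and the exact loading steps are assumptions based on the reference Wav2Lip repo.

```python
import torch
from models import Wav2Lip        # generator architecture from the models/ folder
import face_detection             # S3FD-based detector from the face_detection/ folder

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pre-trained generator weights from checkpoints/
# (the filename is an assumption; use whichever checkpoint you downloaded).
checkpoint = torch.load("checkpoints/wav2lip_gan.pth", map_location=device)
state_dict = {k.replace("module.", ""): v for k, v in checkpoint["state_dict"].items()}

model = Wav2Lip()
model.load_state_dict(state_dict)
model = model.to(device).eval()

# Face detector used to locate the mouth region in every video frame.
detector = face_detection.FaceAlignment(face_detection.LandmarksType._2D,
                                        flip_input=False, device=device)
```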
- Create a virtual environment

  ```
  conda create --name listed_1 python=3.6
  ```
- Activate the environment

  ```
  conda activate listed_1
  ```
- Clone the repo

  ```
  git clone https://github.com/adityagandhamal/Assgn1.git
  ```
- Move into the repo

  ```
  cd Assgn1
  ```
- Install the dependencies

  ```
  pip install -r requirements.txt
  ```

  [Note: If the process gets stuck (which usually happens while building dependency wheels), install the packages one by one instead.]
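If that happens, a small helper like the one below (a sketch, not part of the repo) installs each pinned package individually so a single slow or failing wheel does not stall the whole run:

```python
# install_one_by_one.py - run from the Assgn1 directory
import subprocess
import sys

with open("requirements.txt") as f:
    for line in f:
        pkg = line.strip()
        if not pkg or pkg.startswith("#"):
            continue  # skip blank lines and comments
        # check=False: keep going even if one package fails to build
        subprocess.run([sys.executable, "-m", "pip", "install", pkg], check=False)
```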
- Download the face detection model and place it in `face_detection/detection/sfd/` as `s3fd.pth`
- Download the weights of the pre-trained Wav2Lip + GAN model and place the file in `checkpoints/`
- Run the script

  ```
  python sync.py
  ```
- You'll obtain an output file `listed_out.mp4`

[Sample demo videos embedded here: `vid_in_trim2.mp4`, `output10_trim.mp4`, `download.1.mp4`]
The sample above is just a demo to give a notion of the task; the actual output video is attached as a Drive link in the mail. Given the limitations of the pre-trained model and the nature of the input video (the subject frequently disappears from the scene), both the video and the audio were trimmed using a third-party website (the same trim can also be done locally, as sketched below).
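For reference, a local trim with ffmpeg is one alternative to a third-party website. This is only a sketch; the filenames, start time, and duration are placeholders, and it assumes ffmpeg is installed on the system:

```python
# trim_clip.py - cut a short segment out of the input video (ffmpeg must be on PATH)
import subprocess

subprocess.run([
    "ffmpeg",
    "-i", "data_files/input_video.mp4",   # placeholder input path
    "-ss", "00:00:05",                    # start of the segment
    "-t", "20",                           # duration in seconds
    "-c", "copy",                         # copy streams without re-encoding
    "data_files/input_video_trim.mp4",    # placeholder output path
], check=True)
```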
Also, make sure to run the code on a GPU instance, as the process gets killed on a CPU. A screenshot of the killed CPU run is attached below as proof. You can confirm a GPU is visible before running, as sketched below.
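A quick way to check that a CUDA-capable GPU is visible to PyTorch before launching `sync.py` (a generic check, not part of the repo):

```python
import torch

# Abort early with a clear message instead of letting the run be killed on CPU.
if not torch.cuda.is_available():
    raise SystemExit("No CUDA GPU detected - run this on a GPU instance.")

print(f"Using GPU: {torch.cuda.get_device_name(0)}")
```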