Facial Emotion Detection Hackathon Project: create a model and test it on 5- to 10-second videos to detect emotions
Please watch the demonstration video in addition to testing the app; in it, I discuss an alternate approach that I was not able to implement in time. In the video, I used clips from the CREMA-D dataset. CREMA-D is a dataset of 7,442 original clips from 91 actors: 48 male and 43 female actors between the ages of 20 and 74, from a variety of races and ethnicities (African American, Asian, Caucasian, Hispanic, and Unspecified). Actors spoke from a selection of 12 sentences, presented using one of six emotions (Anger, Disgust, Fear, Happy, Neutral, and Sad) and four emotion levels (Low, Medium, High, and Unspecified).
Member: Suhail Ahmed | suhailz13ahmed@outlook.com
Deployed Link: https://fer-dds-mlx.streamlit.app
I implemented a CNN model and trained it on the partial FER-2013 dataset for 75 epochs.
The model is built using the Keras Sequential API. It consists of multiple convolutional layers interleaved with batch normalization, max-pooling, and dropout layers to prevent overfitting. The architecture is designed to progressively extract higher-level features from the input images; a minimal sketch appears after the layer descriptions below.
- Input Layer: The input layer expects images of shape (48, 48, 1), corresponding to 48x48-pixel grayscale images.
- Convolutional Layers: These layers use 3x3 filters to convolve the input and extract features. The activation function used is ReLU (Rectified Linear Unit).
- Batch Normalization: This layer normalizes the outputs of the previous layer to stabilize and accelerate training.
- Max-Pooling Layers: These layers downsample the input by taking the maximum value in each 2x2 pool, reducing the spatial dimensions.
- Dropout Layers: These layers randomly drop a fraction of the units during training to prevent overfitting.
- Flatten Layer: This layer flattens the 3D output of the convolutional layers into a 1D vector, which is fed into the dense (fully connected) layers.
- Dense Layers: These layers perform the final classification. The last dense layer uses a softmax activation function to output probabilities for each of the seven emotion classes.
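A minimal sketch of an architecture along these lines, assuming illustrative filter counts, dropout rates, and block depths (the trained model's exact values may differ):

```python
from keras.models import Sequential
from keras.layers import (Input, Conv2D, BatchNormalization,
                          MaxPooling2D, Dropout, Flatten, Dense)
from keras.regularizers import l2

def build_model(num_classes=7):
    """Sketch of a CNN in the spirit described above; layer sizes are illustrative."""
    model = Sequential([
        Input(shape=(48, 48, 1)),              # 48x48 grayscale input

        # Block 1: 3x3 convolutions, L2-regularized on the first layer
        Conv2D(64, (3, 3), padding='same', activation='relu',
               kernel_regularizer=l2(0.01)),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Dropout(0.25),

        # Block 2: more filters to capture higher-level features
        Conv2D(128, (3, 3), padding='same', activation='relu'),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Dropout(0.25),

        # Classifier head
        Flatten(),
        Dense(256, activation='relu'),
        Dropout(0.5),
        Dense(num_classes, activation='softmax'),
    ])
    return model
```

Stacking further convolution blocks before the dense head follows the same pattern: each block increases the number of filters while the pooling halves the spatial dimensions.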
The model is compiled using the Adam optimizer with the specified learning rate. The loss function used is categorical cross-entropy, which is suitable for multi-class classification problems. Accuracy is used as the evaluation metric.
Three callbacks are used during training to improve performance and prevent overfitting:
- ReduceLROnPlateau: reduces the learning rate when the validation loss plateaus, helping the model converge.
- EarlyStopping: stops training when the validation accuracy does not improve for a specified number of epochs, preventing overfitting.
- ModelCheckpoint: saves the model weights when the validation loss improves, ensuring the best model is kept.
The model is trained using the fit method. The training data is split into training and validation sets, and the model is trained for the specified number of epochs with the specified batch size. The shuffle parameter ensures that the data is shuffled before each epoch.
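A hedged sketch of the compile/train step, continuing from the build_model sketch above; the learning rate, patience values, and batch size are illustrative placeholders, and x_train/y_train/x_val/y_val stand in for the prepared FER-2013 arrays:

```python
from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint

model = build_model(num_classes=7)
model.compile(optimizer=Adam(learning_rate=1e-3),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

callbacks = [
    # Lower the learning rate when validation loss stops improving
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3),
    # Stop training when validation accuracy stalls
    EarlyStopping(monitor='val_accuracy', patience=10,
                  restore_best_weights=True),
    # Keep the weights from the epoch with the lowest validation loss
    ModelCheckpoint('best_model.h5', monitor='val_loss',
                    save_best_only=True),
]

history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=75, batch_size=64,
                    shuffle=True, callbacks=callbacks)
```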
The use of a convolutional neural network (CNN) is appropriate for image classification tasks due to its ability to automatically learn spatial hierarchies of features from input images. The multiple convolutional layers with an increasing number of filters help the model capture complex patterns in the data.
Dropout and batch normalization are used extensively throughout the network to prevent overfitting and improve generalization. The L2 regularization on the first layer also helps reduce overfitting by penalizing large weights.
The Adam optimizer is chosen for its efficiency and adaptive learning rate, which helps in faster convergence compared to traditional stochastic gradient descent.
To run the app locally, navigate to the dds directory and run:
streamlit run app.py
Upon uploading your video, it will take some time to process, depending on the video's resolution and length. After processing completes, you will see all the detected emotions per frame, and the most frequently detected emotion is reported as the predicted emotion.
You can also scroll down and use the slider to view the emotion detected at a particular frame. This feature is especially useful when multiple emotions appear in the same video.
The following libraries are used in this project (an example install command follows the list):
- streamlit for creating the web application.
- cv2 (OpenCV) for image processing and face detection.
- numpy for array manipulation.
- keras for loading the pre-trained emotion detection model.
- tempfile for handling temporary files.
- streamlit_webrtc for handling real-time video processing.
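Assuming a standard Python environment with the TensorFlow backend providing Keras, the dependencies can be installed roughly as follows (package names as published on PyPI):

```
pip install streamlit opencv-python numpy tensorflow streamlit-webrtc
```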
The custom-trained emotion detection model and Haar Cascade face classifier are loaded at the beginning of the script. This ensures that the models are ready for use when processing the video frames.
An emotion count dictionary is initialized to keep track of the number of times each emotion is detected in the video frames.
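A sketch of this setup step; the model filename and the emotion label order are assumptions and must match what was used during training:

```python
import cv2
import numpy as np
from keras.models import load_model

# Load the custom-trained emotion model (filename is a placeholder)
emotion_model = load_model('emotion_model.h5')

# Haar Cascade face detector bundled with OpenCV
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

# Label order is assumed here; it must match the FER-2013 training labels
EMOTIONS = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']

# Running tally of how often each emotion is predicted across frames
emotion_counts = {label: 0 for label in EMOTIONS}
```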
A custom VideoTransformer class is defined to process each video frame (a sketch follows this list). Its transform method:
- Detects faces in the frame.
- Predicts the emotion for each detected face.
- Draws a rectangle around each face and annotates it with the predicted emotion label.
- Updates the emotion counts for each prediction.
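A minimal sketch of such a transform method, assuming the emotion_model, face_cascade, EMOTIONS, and emotion_counts objects from the setup sketch above and the streamlit_webrtc VideoTransformerBase interface:

```python
from streamlit_webrtc import VideoTransformerBase

class EmotionTransformer(VideoTransformerBase):
    def transform(self, frame):
        # Convert the incoming frame to a BGR NumPy array for OpenCV
        img = frame.to_ndarray(format="bgr24")
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

        # Detect faces and classify each one
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
        for (x, y, w, h) in faces:
            roi = cv2.resize(gray[y:y + h, x:x + w], (48, 48)) / 255.0
            preds = emotion_model.predict(roi.reshape(1, 48, 48, 1), verbose=0)
            label = EMOTIONS[int(np.argmax(preds))]
            emotion_counts[label] += 1

            # Draw a rectangle around the face and annotate the prediction
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(img, label, (x, y - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
        return img
```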
The process_video function handles the uploaded video file (sketched after this list). It:
- Saves the uploaded file to a temporary location.
- Reads the video frame-by-frame.
- Processes each frame to detect faces and emotions.
- Stores the processed frames in a list.
- Displays the detected emotions and their counts.
- Provides a slider to navigate through the frames.
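A compressed sketch of how such a function could look; annotate_frame is a hypothetical helper that applies the same face-detection and labeling logic as the transform method above, and widget labels are illustrative:

```python
import tempfile
import streamlit as st

def process_video(uploaded_file):
    # Save the upload to a temporary file so OpenCV can read it
    with tempfile.NamedTemporaryFile(delete=False, suffix='.mp4') as tmp:
        tmp.write(uploaded_file.read())
        video_path = tmp.name

    # Read the video frame-by-frame and annotate each frame
    frames = []
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(annotate_frame(frame))  # hypothetical helper, see above
    cap.release()

    # Show per-emotion counts and the most frequently detected emotion
    st.write(emotion_counts)
    st.success(f"Predicted emotion: {max(emotion_counts, key=emotion_counts.get)}")

    # Slider to inspect the emotion detected at any individual frame
    if frames:
        idx = st.slider("Frame", 0, len(frames) - 1, 0)
        st.image(frames[idx], channels="BGR")
```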
The main function sets up the Streamlit interface. It sets the title of the app, provides a file uploader for the user to upload a video file, and calls the process_video function when a file is uploaded.
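A short sketch of that entry point, assuming the process_video function above; the title text and accepted file types are illustrative:

```python
def main():
    st.title("Facial Emotion Detection")
    uploaded = st.file_uploader("Upload a 5-10 second video",
                                type=["mp4", "avi", "mov"])
    if uploaded is not None:
        process_video(uploaded)

if __name__ == "__main__":
    main()
```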
# Decoding Data Science in partnership with Falcons.ai
Objective: Develop an efficient facial emotion classification system employing OpenCV/TensorFlow to identify facial emotions within video streams. The goal is to achieve a high level of accuracy, low latency, and minimal computational overhead.
Data Source: A video dataset or a combination of image datasets featuring the target objects in states of emotion.
Kaggle: https://www.kaggle.com/datasets/msambare/fer2013
Preprocessing (if needed): Standardize or augment the images/video frames to improve model generalization while preserving the aspect ratio and critical features.
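For example, a light standardization/augmentation pipeline along these lines (values are illustrative) keeps the 48x48 aspect ratio and critical facial features intact:

```python
from keras.models import Sequential
from keras.layers import Rescaling, RandomFlip, RandomRotation

# Rescale pixels to [0, 1], mirror horizontally, and rotate slightly
augmentation = Sequential([
    Rescaling(1.0 / 255),
    RandomFlip("horizontal"),
    RandomRotation(0.05),
])
# Applied to image batches during training, e.g. augmentation(images, training=True)
```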
Model Selection & Training:
- Using the FER dataset (partial).
- Train a custom model using the prepared dataset and analyze the performance.
- Deploy Streamlit and OpenCV to give users a web UI in which to upload a video and have the video frames analyzed by your model.
Expectations
The expectations are the following:
- The code used to train the model.
- The model you trained.
- The code used to run the UI and upload the video for inference.
This problem set provides a clear path to address image analysis issues using OpenCV, with a focus on Facial Emotion Classification in video streams. It allows researchers or students to focus on critical aspects such as data preprocessing, model selection, hyperparameter tuning, performance evaluation, and results interpretation.
-------------- A fully functional Jupyter notebook will be added upon hackathon challenge completion --------------
To use the notebook with relative ease, please follow the steps below:
- Ensure all of the required libraries are installed.
- Load the libraries.
- Run the cells; the cloud images will be generated and saved in the "clouds" directory.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you want, feel free to fork this repository. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/YourFeature)
- Commit your Changes (git commit -m 'Add some YourFeature')
- Push to the Branch (git push origin feature/YourFeature)
- Open a Pull Request
Project Link: https://github.com/Falcons-ai/fer_dds_challenge