
Activity Detection in the Human-Object Interaction (HOI) Problem Domain 🏃‍♀️🏃‍♂️

This project aims to detect activities in videos using the TSU and STEP datasets! As a user, you will work in the Jupyter notebook, which lets you run the code and see the results, but please set up the environment first!

Do note that more documentation is available in the Jupyter notebook!

Set Up 💾

Clone the repository with Submodules

git clone --recurse-submodules https://github.com/ict3104-team14-2022/nvda-ml-activity-detection.git

Setup environment

conda env create --file environment.yml
conda activate activity-detection
jupyter labextension install @jupyter-widgets/jupyterlab-manager
python -m ipykernel install --user --name ipykernel-activity-detection --display-name "Python (Activity Detection)"
jupyter-lab

Install Dependencies

ffmpeg

Reinstall

jupyter kernelspec uninstall ipykernel-activity-detection

Exporting Conda Dependencies

conda env export --from-history > environment.yml

WandB 1️⃣

You can connect the notebook to WandB to view statistics online! Below is a reference on how to connect, but you can also refer to the Jupyter notebook under the Training section to run it!

import wandb

# Link your wandb account.
wandb.login()
# Display your project workspace.
%wandb ict3104-team14-2022/nvda-ml-activity-detection -h 2048
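
If you also want to log your own metrics during training, a minimal sketch is shown below. The project and entity names mirror the workspace above, but the metric keys and training loop are purely illustrative and are not the notebook's actual training code.

import wandb

# Illustrative only: the metric keys and loop below are placeholders,
# not the training code used in the notebook.
run = wandb.init(project="nvda-ml-activity-detection", entity="ict3104-team14-2022")
for epoch in range(10):
    train_loss = 1.0 / (epoch + 1)  # placeholder value
    wandb.log({"epoch": epoch, "train/loss": train_loss})
run.finish()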

Before You Begin - Here's what you need to know 📝

Please request the untrimmed dataset available on the Toyota Smart Home site.

  1. RGB videos in MP4 format are required for feature extraction.
    • Place them in data/TSU/TSU_Videos_mp4/
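
Before running feature extraction, it can help to confirm the videos ended up in the expected folder. The snippet below is only a hypothetical sanity check and is not part of the notebook.

from pathlib import Path

# Hypothetical sanity check: confirm the untrimmed TSU videos are in place.
video_dir = Path("data/TSU/TSU_Videos_mp4")
videos = sorted(video_dir.glob("*.mp4"))
print(f"Found {len(videos)} MP4 files in {video_dir}")
assert videos, "No MP4 files found - request the dataset from the Toyota Smart Home site first."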

The STEP pipeline is more involved and requires more effort to acquire the dataset yourself, as the videos are hosted on YouTube.

  1. Follow the STEP README for installation instructions.
    • Install APEX.
    • Do not clone STEP as it is included in this repository.
    • In a terminal, cd into STEP, and install external packages with python setup.py build develop
  2. Install ffmpeg - Tutorial to add ffmpeg to path in Windows
  3. Download the dataset from YouTube. In the step/custom_utils directory, there are scripts to download the videos.
    1. Install yt-dlp :
      pip install yt-dlp
    2. To get the list of valid videos, run:
      python get_valid_youtube.py
      
    3. To download the videos, run:
      python download_vids.py
      
    4. Some videos may be unavailable for download due to copyright issues.
      • In get_valid_youtube.py, comment and uncomment the specified block of code to remove unavailable videos from the train/val annotations (a sketch of this filtering step appears after this list).
      • To generate new ava_train_v2.1_filter.csv and ava_val_v2.1_filter.csv files, run:
        python get_valid_youtube.py
        
    5. Move videos into step/datasets/ava/videos
    6. Generate labels following the Dataset Preparation instructions:
      python scripts/generate_label.py datasets/ava_val_v2.1_filter.csv
      python scripts/generate_label.py datasets/ava_train_v2.1_filter.csv
      
      • Move generated labels val.pkl and train.pkl into datasets/ava/label
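
The filtering mentioned in step 4 is handled by get_valid_youtube.py; the sketch below only illustrates the idea. It assumes the header-less AVA CSV layout whose first column is the YouTube video ID, and the directory and source CSV names here are placeholders, not the exact paths used by the scripts.

import csv
from pathlib import Path

# Illustrative sketch of the filtering done by get_valid_youtube.py:
# keep only annotation rows whose video was actually downloaded.
# Assumes a header-less AVA CSV (first column = video ID) and that
# downloaded files are named <video_id>.<ext>.
downloaded_dir = Path("path/to/downloaded_videos")  # wherever download_vids.py saved them
downloaded = {p.stem for p in downloaded_dir.iterdir()}

def filter_annotations(src: str, dst: str) -> None:
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            if row and row[0] in downloaded:
                writer.writerow(row)

# Source file names are assumptions; adjust to the annotations shipped with STEP.
filter_annotations("datasets/ava_train_v2.1.csv", "datasets/ava_train_v2.1_filter.csv")
filter_annotations("datasets/ava_val_v2.1.csv", "datasets/ava_val_v2.1_filter.csv")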

After preparing the dataset and selecting a pipeline, in JupyterLab select Kernel > Run Selected Cell and All Below.

And you are done preparing STEP dataset! 🥳

For MS-TCT, refer to the paper for more details.

  1. Prepare the I3D features. Like previous works (e.g. TGM, PDAN), MS-TCT is built on top of pre-trained I3D features, so feature extraction is needed before training the network.

    1. Please download the Charades dataset (24 fps version) from this link.
    2. Follow this repository to extract the snippet-level I3D feature.
  2. Dependencies. Please satisfy the following dependencies to train MS-TCT correctly:

  • pytorch 1.9
  • python 3.8
  • timm 0.4.12
  • pickle5
  • scikit-learn
  • numpy
  3. Quick Start

    1. Change rgb_root in train.py to the extracted feature path (a sanity check for the feature directory is sketched after this list).
    2. Use ./run_MSTCT_Charades.sh for training on Charades-RGB. The best logits will be saved automatically in ./save_logit.
    3. Use python Evaluation.py -pkl_path /best_logit_path/ to evaluate the model with the per-frame mAP and the action-conditional metrics.
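
Before launching training, you may want to verify that rgb_root points at the extracted features. The check below assumes one .npy file per video with shape [num_snippets, feature_dim]; the actual layout depends on the I3D extraction repository you followed, so treat this purely as a sketch with a placeholder path.

import numpy as np
from pathlib import Path

# Hypothetical check of the extracted I3D feature directory.
# Assumes one .npy file per video; adjust to the real format produced
# by the feature-extraction repository.
rgb_root = Path("/path/to/charades_i3d_rgb")  # placeholder path
feature_files = sorted(rgb_root.glob("*.npy"))
print(f"{len(feature_files)} feature files found under {rgb_root}")
if feature_files:
    sample = np.load(feature_files[0])
    print("example feature shape:", sample.shape)  # e.g. (num_snippets, 1024)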

For further reference, please see the MS-TCT README.

And you are done preparing MS-TCT dataset! 🥳

Pipeline Selection 👷

There are three algorithms available in this notebook.

  1. Toyota Smart Home. Toyota Smarthome Untrimmed (TSU) targets the activity detection task in long untrimmed videos; therefore, in TSU, the entire recording is kept while the person is visible. The dataset contains 536 videos with an average duration of 21 minutes and is annotated with 51 activities.

  2. Spatio-Temporal Progressive Learning for Video Action Detection. STEP is a progressive learning framework for spatio-temporal action detection in videos. To learn more, the poster can be found on Google Drive, and the paper can be read on arXiv for a more in-depth discussion. It uses the AVA Actions v2.1 dataset, which is annotated with 80 activities.

  3. Multi-Scale Temporal ConvTransformer for Action Detection. Action detection is a significant and challenging task, especially in densely-labelled datasets of untrimmed videos. Such data consist of complex temporal relations, including composite or co-occurring actions. To detect actions in these complex settings, it is critical to capture both short-term and long-term temporal information efficiently. To this end, the authors propose a novel 'ConvTransformer' network for action detection: MS-TCT.
