This is the official GitHub repository for our paper ChildPlay-Hand: A Dataset of Hand Manipulations in the Wild, published at the ECCV Workshops, 2024.
Authors: Arya Farkhondeh*, Samy Tafasca*, Jean-Marc Odobez
*: Equal contribution.
ChildPlay-Hand is a novel video dataset for modeling Hand-Object Interaction (HOI) in the wild that includes person and object bounding boxes, as well as manipulation actions. ChildPlay-Hand is unique in: (1) providing per-hand annotations; (2) featuring videos in uncontrolled settings with natural interactions; (3) including gaze labels from the ChildPlay-Gaze dataset for joint modeling of manipulations and gaze. We introduce two tasks: object-in-hand (OiH) and manipulation stages (ManiS), and benchmark various spatio-temporal and segmentation networks on these tasks.
See SETUP.md for setup instructions, and data/README.md for how to download and organize the data and annotations.
We provide a demo to test the models on your input video. The demo generates a video with predicted hand actions and, optionally, the extracted 2D skeletons overlaid for all individuals. To run the demo:
python demo.py video_path=path/to/my_video.mp4 output_path="my_video"
To customize the settings, edit the configuration file at configs/demo.yaml.
This project uses PyTorch Lightning for training and Hydra for managing configurations. Each model-task combination has a dedicated config file located at ./configs/{model}_{task}.yaml. You also need to specify the input type explicitly. The available options are:
- model: {poseconv3d, rgbposeconv3d, hiera, sstcn, mstcn}
- task: {object, manipulation}
- input_type: {body, hand}
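Since each model-task pair maps to a {model}_{task}.yaml file, you can quickly check which combinations are available in ./configs. The snippet below is a small helper sketch (not part of the repository) that enumerates the expected config names from the options above:

```python
# Sketch: list the expected {model}_{task}.yaml config files and report
# whether each one is present in ./configs.
from itertools import product
from pathlib import Path

models = ["poseconv3d", "rgbposeconv3d", "hiera", "sstcn", "mstcn"]
tasks = ["object", "manipulation"]

for model, task in product(models, tasks):
    cfg = Path("configs") / f"{model}_{task}.yaml"
    status = "found" if cfg.exists() else "missing"
    print(f"{cfg} [{status}]")
```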
Here is how you can run a training job using the Hiera model with hand input (Hiera-Hand) for the manipulation (ManiS) task:
python train.py --config-name hiera_manipulation.yaml configs.input_type=hand
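For orientation, here is a minimal sketch of what a Hydra + PyTorch Lightning entry point of this kind typically looks like. It is illustrative only: the actual train.py, its model and data classes, and most config fields may differ (cfg.model, cfg.datamodule, and max_epochs below are hypothetical; only the configs.input_type key is taken from the command above).

```python
# Sketch of a Hydra-driven Lightning training entry point (illustrative,
# not the repository's train.py).
import hydra
import pytorch_lightning as pl
from omegaconf import DictConfig


@hydra.main(config_path="configs", config_name="hiera_manipulation", version_base=None)
def main(cfg: DictConfig) -> None:
    # The CLI override `configs.input_type=hand` ends up at cfg.configs.input_type.
    model = hydra.utils.instantiate(cfg.model)          # hypothetical config field
    datamodule = hydra.utils.instantiate(cfg.datamodule)  # hypothetical config field
    trainer = pl.Trainer(max_epochs=cfg.get("max_epochs", 10))
    trainer.fit(model, datamodule=datamodule)


if __name__ == "__main__":
    main()
```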
To run the tests and reproduce the results reported in the paper, execute:
bash test.sh
This will generate the evaluation outputs reported in the paper. To evaluate different models, tasks, and input types, edit the test.sh script.
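If you want to launch the evaluation from a Python workflow, a minimal sketch is shown below (not part of the repository). Note that which models, tasks, and input types are evaluated is still controlled inside test.sh itself.

```python
# Sketch: run test.sh from Python and surface its exit status.
import subprocess

result = subprocess.run(["bash", "test.sh"], check=True)
print(f"test.sh finished with return code {result.returncode}")
```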
If you use our code, models, or data assets, please consider citing us:
@inproceedings{Farkhondeh_ECCVW_2024,
author = {Farkhondeh*, Arya and Tafasca*, Samy and Odobez, Jean-Marc},
title = {ChildPlay-Hand: A Dataset of Hand Manipulations in the Wild},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV) Workshops},
year = {2024},
note = {* Equal contribution}
}
@InProceedings{Tafasca_2023_ICCV,
author = {Tafasca*, Samy and Gupta*, Anshul and Odobez, Jean-Marc},
title = {ChildPlay: A New Benchmark for Understanding Children's Gaze Behaviour},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
pages = {20935-20946},
note = {* Equal contribution}
}
Parts of the code were adapted from the PYSKL and LART repositories. We thank the authors for their contributions.