RAT7M is an animal (rat) pose estimation dataset containing nearly 7 million frames with 2D & 3D keypoints acquired from motion capture across a broad range of rodent poses. The dataset was originally introduced in our paper Geometric deep learning enables 3D kinematic profiling across species and environments (Dunn et al. 2021, Nat Methods), i.e. DANNCE (code release).
Given the massive number of frames in the full dataset, this repository also provides instructions for generating a smaller subset of 2D frames from the full RAT7M release, referred to as "mini-RAT7M". We anticipate that it can serve as a reliable benchmark for developing animal/rodent tracking algorithms, especially under common deep learning frameworks such as PyTorch and TensorFlow.
The scripts below are expected to run in a Python 3 environment, with the following extra dependencies:
- opencv-python
- urllib3
- (optional, only for visualization) matplotlib
Make sure there is at least 300 GB of free disk space.
To download the dataset, run `python download_all.py`, which will yield 3 new folders under the current directory:
- `video_sequences` containing *.mp4 videos (n=2028).
    - These videos were recorded on 5 subjects, denoted as "s1", "s2", "s3", "s4", "s5".
    - Depending on when the recording took place, there are 7 different recordings, denoted as "s1-d1", "s2-d1", "s2-d2", "s3-d1", "s4-d1", "s5-d1" and "s5-d2".
    - In the actual RAT7M, each recording was broken down into a series of video sequences containing at most 3500 frames. Each video sequence is named in the fashion of `{subject_id}-{recording-day}-{camera_id}-{starting_frame_idx}.mp4`.
- `annotations` containing *.mat files with motion capture data and camera parameters (n=7).
    - Each .mat file corresponds to a specific recording (e.g. "s1-d1") and is named as `mocap-{subject-id}-{recording-day}.mat`.
    - The following information is included in each annotation file:
        - `cameras`: extrinsic and intrinsic parameters for each camera (see the projection sketch below).
            - 'IntrinsicMatrix': 3x3 matrix K describing camera internal properties, including focal lengths, principal points and skew.
            - 'RotationMatrix': 3x3 matrix R.
            - 'TranslationVector': 1x3 vector t.
            - R and t are usually referred to together as the "camera extrinsics", describing the position and orientation of the camera in the world coordinate system.
            - 'TangentialDistortion': 1x2 vector describing the distortion caused by the lens and the image plane not being parallel.
            - 'RadialDistortion': 1x2 vector describing the distortion where light rays bend more the farther they are from the optical center.
        - `mocap`: 3D coordinates for 20 body joints from motion capture.
            - The coordinates are given in the following order: 0: "Front Head", 1: "Back of the Head", 2: "Left of the Head", 3: "Anterior Spine", 4: "Medial Spine", 5: "Posterior Spine", 6: "Offset 1 (for stability of motion capture)", 7: "Offset 2 (for stability of motion capture)", 8: "Left Hip", 9: "Right Hip", 10: "Left Elbow", 11: "Left Arm", 12: "Left Shoulder", 13: "Right Shoulder", 14: "Right Elbow", 15: "Right Arm", 16: "Right Knee", 17: "Left Knee", 18: "Left Shin", 19: "Right Shin"
        - `name`: name of the recording ("Subject1-Day1").
    - We recommend using the utility functions `load_cameras()` and `load_mocap()` in `matlab_utils.py` to load the annotations.
- (`zips` containing all original data in zip format; can be deleted.)
If needed, the dataset can also be downloaded manually from Figshare.
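The camera parameters described above are enough to project the 3D mocap keypoints into any of the views. The following is a minimal sketch of such a projection, assuming the MATLAB-style row-vector convention suggested by the 1x3 TranslationVector and ignoring lens distortion; it is an illustration, not code from this repository.

```python
import numpy as np

def project_points(points_3d, K, R, t):
    """Project (n, 3) world-coordinate points to (n, 2) pixel coordinates.

    Assumes MATLAB-style row-vector conventions (x_cam = X @ R + t,
    x_img = x_cam @ K); lens distortion is ignored for brevity.
    """
    cam = points_3d @ R + np.asarray(t).reshape(1, 3)   # world -> camera coordinates
    img = cam @ K                                        # apply intrinsics
    return img[:, :2] / img[:, 2:3]                      # perspective divide
```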
mini-RAT7M contains N=112730 data samples/timesteps extracted from the existing video sequences.
Note that each data sample does NOT correspond to a single image, but to images from 6 synchronized camera views.
- Train: n = 88194
- s1-d1: 17812
- s2-d1: 17441
- s2-d2: 19728
- s3-d1: 19845
- s4-d1: 13368
- Test: n = 24536
- s5-d1: 10445
- s5-d2: 14091
Use the Google Drive link to download `mini_rat7m_train_test_annotation.pkl`. Besides the corresponding annotations from the full RAT7M, it also contains the information needed for extracting the right frames from the video sequences.
    annot_dict = {
        "cameras": dict[subject_idx][day_idx][camera_name]  // dict of camera parameters as introduced above
        "camera_names": numpy.ndarray                       // e.g. "Camera1"
        "table": {
            "subject_idx": numpy.ndarray      // subject ID (1, 2, 3, 4, 5)
            "day_idx": numpy.ndarray          // recording day (1, 2)
            "train_test": numpy.ndarray       // "train" or "test"
            "frame_idx": dict[camera_name]    // video frame index
            "image_path": dict[camera_name]   // relative path to image, e.g. "images_unpacked/s1-d1/camera1/frame_000014.jpg"
            "2D_keypoints": dict[camera_name] // 2D keypoints w.r.t. each camera, with shape [20, 2]
            "2D_com": dict[camera_name]       // 2D center of mass
            "3D_keypoints": numpy.ndarray     // 3D keypoints with shape [20, 3]
        }
    }
"table"
is another nested dictionary where each entry's last level is designated to be a numpy.array of size N=112730, the total number of data samples. This organization makes data fetching easier under common deep learning frameworks.
- Camera parameters for the "s4-d1" recording: `annot_dict['cameras'][4][1]`
- Name of the 4th camera: `annot_dict['camera_names'][3]`
- Find data samples corresponding to s5-d1: `np.logical_and(annot_dict['table']['subject_idx'] == 5, annot_dict['table']['day_idx'] == 1)`
- 2D keypoint coordinates of the i-th data sample from camera 4: `annot_dict['table']['2D_keypoints']['Camera4'][i-1]`
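As a concrete illustration of the layout above, the sketch below loads the pickle and fetches one multi-view sample; the field names follow the description above and the sample index is an arbitrary choice.

```python
import pickle
import numpy as np

with open("mini_rat7m_train_test_annotation.pkl", "rb") as f:
    annot_dict = pickle.load(f)

table = annot_dict["table"]
train_mask = table["train_test"] == "train"            # boolean mask over all N samples
i = int(np.where(train_mask)[0][0])                    # first training sample (arbitrary choice)

sample = {}
for cam in annot_dict["camera_names"]:
    sample[cam] = {
        "image_path": table["image_path"][cam][i],     # relative path under images_unpacked/
        "keypoints_2d": table["2D_keypoints"][cam][i], # (20, 2) per-view pixel coordinates
        "com_2d": table["2D_com"][cam][i],             # 2D center of mass
    }
keypoints_3d = table["3D_keypoints"][i]                # (20, 3), shared across views
```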
Run `python extract_frames.py`, which should yield `images_unpacked` containing all unpacked 2D frames, organized as follows:
    images_unpacked/
        s1-d1/
            camera1/
                frame_00014.jpg
                ...
            camera2/
            camera3/
            camera4/
            camera5/
            camera6/
        s2-d1/
        s2-d2/
        s3-d1/
        s4-d1/
        s5-d1/
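For reference, extracting a single frame with OpenCV might look like the rough sketch below; this is not the repository's `extract_frames.py`, and the file name and frame index are purely illustrative.

```python
import os
import cv2

video_path = "video_sequences/s1-d1-camera1-0.mp4"   # illustrative sequence name
frame_idx = 14                                        # illustrative frame index
out_path = "images_unpacked/s1-d1/camera1/frame_00014.jpg"

cap = cv2.VideoCapture(video_path)
cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)           # seek to the desired frame
ok, frame = cap.read()
cap.release()
if ok:
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    cv2.imwrite(out_path, frame)
```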
Open `visualization.ipynb` and follow the instructions inside. If everything works correctly, you should see a random annotated image in which:
- the 20 body keypoints are marked in red
- the 2D center of mass is marked in blue
- 21 pairs of keypoints are connected according to rat anatomy
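If you prefer plotting outside the notebook, a self-contained matplotlib sketch along the same lines (assuming the frames have already been extracted and the annotation layout described above) might look like:

```python
import pickle
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

with open("mini_rat7m_train_test_annotation.pkl", "rb") as f:
    table = pickle.load(f)["table"]

cam, i = "Camera1", 0                                  # illustrative camera / sample index
img = mpimg.imread(table["image_path"][cam][i])        # requires extract_frames.py to have run
kpts = table["2D_keypoints"][cam][i]                   # (20, 2) pixel coordinates
com = table["2D_com"][cam][i].reshape(-1)              # 2D center of mass

plt.imshow(img)
plt.scatter(kpts[:, 0], kpts[:, 1], c="red", s=10)     # keypoints in red
plt.scatter(com[0], com[1], c="blue", s=25)            # center of mass in blue
plt.axis("off")
plt.show()
```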
- Mean Per-Joint Position Error (MPJPE): mean Euclidean (L2) distance between predicted keypoints and the ground truth.
- Procrustes Analysis MPJPE (PA-MPJPE): MPJPE after rigid alignment of the prediction with the ground-truth skeleton (translation, rotation and scale).
- Normalized MPJPE (N-MPJPE): MPJPE after scale normalization, making the evaluation independent of subject size. (A minimal numpy sketch of these metrics appears at the end of this section.)
- We provide helper functions for computing the above metrics in `metric_utils/metrics.py`.
    - Each function expects two arguments:
        - `predicted`: numpy.ndarray of shape (n_samples, n_joints, 2/3)
        - `target`: numpy.ndarray of shape (n_samples, n_joints, 2/3)
    - and returns the error metric for each sample:
        - numpy.ndarray of shape (n_samples, 1)
- You can also use the script `metric_utils/compute_metrics.py` to compute the error metrics automatically.
    - Run `python compute_metrics.py --pred_path /path/to/your/csv/file`. Check `--help` for other optional arguments.
    - The CSV file should have shape (N=24536, 60) with no header row, where the second dimension comes from flattening the 3D coordinates of the 20 body keypoints, i.e. (keypoint1_x, keypoint1_y, keypoint1_z, keypoint2_x, keypoint2_y, keypoint2_z, ...).
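For orientation, below is a minimal numpy sketch of MPJPE and a Procrustes-aligned variant; it follows the array shapes described above but is not the code in `metric_utils/metrics.py` (reflection handling and N-MPJPE are omitted for brevity).

```python
import numpy as np

def mpjpe(predicted, target):
    """predicted, target: (n_samples, n_joints, 2 or 3) -> (n_samples, 1)."""
    return np.linalg.norm(predicted - target, axis=-1).mean(axis=-1, keepdims=True)

def pa_mpjpe(predicted, target):
    """MPJPE after per-sample alignment for translation, rotation and scale."""
    errors = []
    for p, t in zip(predicted, target):
        p0, t0 = p - p.mean(0), t - t.mean(0)          # remove translation
        u, s, vt = np.linalg.svd(p0.T @ t0)            # SVD of the cross-covariance
        rot = u @ vt                                   # optimal rotation (row-vector form)
        scale = s.sum() / (p0 ** 2).sum()              # optimal isotropic scale
        aligned = scale * p0 @ rot + t.mean(0)         # aligned prediction
        errors.append(np.linalg.norm(aligned - t, axis=-1).mean())
    return np.asarray(errors)[:, None]
```

To produce the (24536, 60) CSV expected by `compute_metrics.py`, a (24536, 20, 3) prediction array can be flattened and written with, e.g., `np.savetxt("predictions.csv", preds.reshape(len(preds), -1), delimiter=",")`.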
If you use this dataset, please kindly cite:
@article{dunn2021geometric,
title={Geometric deep learning enables 3D kinematic profiling across species and environments},
author={Dunn, Timothy W and Marshall, Jesse D and Severson, Kyle S and Aldarondo, Diego E and Hildebrand, David GC and Chettih, Selmaan N and Wang, William L and Gellis, Amanda J and Carlson, David E and Aronov, Dmitriy and others},
journal={Nature methods},
volume={18},
number={5},
pages={564--573},
year={2021},
publisher={Nature Publishing Group}
}