Description
Hello, I am trying to run WiLoR locally on my own dataset. I am using an Intel RealSense D435 camera, and I am running model inference with demo.py.
What I am trying to achieve is a 4x4 homogeneous transformation that represents the pose of the hand w.r.t. the camera. The model outputs global_orient (global orientation) and cam_t (camera translation), so I am placing them in a 4x4 matrix to represent the overall hand pose w.r.t. the camera.
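For reference, this is roughly how I am assembling the matrix (a minimal sketch; I am assuming global_orient comes out as a 3x3 rotation matrix, and converting it first with cv2.Rodrigues if it is an axis-angle 3-vector):

```python
import numpy as np

def hand_pose_matrix(global_orient, cam_t):
    """Assemble a 4x4 homogeneous hand-to-camera transform.

    global_orient: (3, 3) rotation matrix predicted by the model
                   (if yours is an axis-angle 3-vector, convert it with
                   cv2.Rodrigues first).
    cam_t:         (3,) translation of the hand root in the camera frame.
    """
    T = np.eye(4)
    T[:3, :3] = np.asarray(global_orient, dtype=float).reshape(3, 3)
    T[:3, 3] = np.asarray(cam_t, dtype=float).reshape(3)
    return T
```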
Now, I am able to run the inference and save the hand poses, but the cam_t vector I am getting is very wrong. I know for a fact that my hand is 30 cm from the camera along the camera z-axis, yet cam_t's z value is more than 1 m, and I don't know exactly why. I understand I have to modify the intrinsics, focal length, image size, etc., in the model_config.yaml file, but to be honest, apart from the EXTRA.FOCAL_LENGTH parameter I don't see anything else to change there. I am also unsure about the image-size parameter: MODEL.IMAGE_SIZE is set to 256, but my images are 640x480 and I don't know what value to input here.
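My current understanding, for what it is worth: in HaMeR-style pipelines (which I believe WiLoR follows), MODEL.IMAGE_SIZE is the network's crop size (256) and should not be changed for a different camera; the crop-to-full conversion recovers depth as t_z = 2·f/(b·s), so the predicted depth scales linearly with whatever focal length the demo assumed. If that convention holds, the depth can be re-scaled to the real camera's focal length. A sketch, where the EXTRA.FOCAL_LENGTH value of 5000 and the 615 px D435 colour focal length are my own example numbers, not values from the repo:

```python
def rescale_depth(tz_pred, f_assumed, f_real):
    """Re-scale a weak-perspective depth to a different focal length.

    The crop-to-full camera conversion recovers depth as
    t_z = 2 * f / (b * s), so t_z is proportional to the focal
    length f that was assumed when it was computed.
    """
    return tz_pred * f_real / f_assumed

# HaMeR-style "scaled" focal length for a 640x480 image (assumed convention):
FOCAL_LENGTH_CFG = 5000.0   # EXTRA.FOCAL_LENGTH (example value, check yours)
IMAGE_SIZE_CFG = 256.0      # MODEL.IMAGE_SIZE (the crop size; leave at 256)
W, H = 640, 480             # real capture resolution
f_assumed = FOCAL_LENGTH_CFG / IMAGE_SIZE_CFG * max(W, H)

f_real = 615.0              # example RealSense D435 colour fx, in pixels
print(f_assumed, rescale_depth(1.0, f_assumed, f_real))
```

I am not certain this matches exactly what demo.py does internally, so please correct me if the conversion uses a different focal-length convention.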
Can someone please tell me everything I need to change to run inference on my own dataset? I would be eternally grateful if anyone (especially the authors) could help me out here.
The following is my hand pose wrt camera.
And as you can see, the z value (hand_pose_wrt_c[3,3]) is 1.01 m, which is WRONG. When I draw the body frame on my image, I get the image below -
The image looks plausible at first glance, but the numbers are very wrong. Also, I would like to understand: when the model outputs global_orient (global orientation), what is this 3D rotation matrix defined with respect to? That is, where is the origin of this frame? Is it towards the bottom center of the palm (near the wrist), or at the center of the palm (in the middle)?