Pre-trained models do not reproduce paper results #6

una-dinosauria · 2016-09-07T13:46:25Z

Hi!

I'm using the pre-trained models available at https://drive.google.com/open?id=0B7lfjqylzqmMZlI3TUNUUEFQMXc and running generateMotionForecast.py. This produces motion predictions for different activities and models, but I've found that these do not correspond to what is reported in the paper. For reference, here's the figure from the paper that I'm talking about:

But, for example, using the lstm3lr_walking model, the checkpoint.pikforecast_error file contains the following values:

T=0 2.87922000885, 0.053318887949
T=1 3.31045722961, 0.0613047629595
T=2 3.72076749802, 0.0689031034708
T=3 4.20972061157, 0.0779577866197
T=4 4.62205123901, 0.0855935439467
T=5 4.9056763649, 0.0908458605409
T=6 5.15456962585, 0.0954549908638
T=7 5.68943977356, 0.105359993875
T=8 6.14819526672, 0.113855466247
T=9 6.47697734833, 0.119944028556
T=10 6.86927509308, 0.127208799124
T=11 7.25948381424, 0.134434878826
T=12 7.56049823761, 0.140009224415
T=13 7.60584354401, 0.140848949552
T=14 7.81918954849, 0.144799813628
T=15 7.99432945251, 0.148043140769
T=16 8.21197509766, 0.152073606849
T=17 8.22490978241, 0.152313143015
T=18 8.21773910522, 0.152180358768
T=19 8.20940303802, 0.152025982738
T=20 8.21308326721, 0.152094140649
T=21 8.08870410919, 0.14979082346
T=22 7.9909658432, 0.147980853915
T=23 7.93785572052, 0.146997332573
T=24 8.08372688293, 0.149698644876
T=25 8.17058372498, 0.151307106018
T=26 8.29908180237, 0.153686702251
T=27 8.29321861267, 0.15357811749
T=28 8.33865356445, 0.154419511557
T=29 8.29992961884, 0.153702393174
T=30 8.31999206543, 0.154073923826
T=31 8.37398910522, 0.155073866248
T=32 8.47292232513, 0.156905964017
T=33 8.59246826172, 0.159119784832
T=34 8.65988731384, 0.160368278623
T=35 8.66351318359, 0.160435423255
T=36 8.65542507172, 0.160285651684
T=37 8.70272254944, 0.161161527038
T=38 8.90265083313, 0.16486389935
T=39 9.08981990814, 0.168329998851
T=40 9.22410964966, 0.170816838741
T=41 9.25332164764, 0.171357810497
T=42 9.3009595871, 0.172239989042
T=43 9.29813861847, 0.172187745571
T=44 9.26357460022, 0.171547681093
T=45 9.19590568542, 0.170294553041
T=46 9.15723419189, 0.169578418136
T=47 9.24366569519, 0.171178996563
T=48 9.30495262146, 0.172313943505
T=49 9.25953674316, 0.171472907066
T=50 9.24114990234, 0.171132400632
T=51 9.26937294006, 0.171655058861
T=52 9.3104429245, 0.172415614128
T=53 9.19757270813, 0.170325413346
T=54 9.04441356659, 0.167489141226
T=55 8.96823406219, 0.166078403592
T=56 9.00592136383, 0.166776314378
T=57 9.09947776794, 0.168508842587
T=58 9.06608009338, 0.167890369892
T=59 9.1175775528, 0.168844029307
T=60 9.23169708252, 0.170957356691
T=61 9.25059127808, 0.171307250857
T=62 9.23868370056, 0.171086728573
T=63 9.21300506592, 0.170611202717
T=64 9.20988750458, 0.170553475618
T=65 9.30304908752, 0.172278687358
T=66 9.30745029449, 0.17236019671
T=67 9.29339599609, 0.172099933028
T=68 9.21964550018, 0.170734182
T=69 9.22905826569, 0.170908480883
T=70 9.11111068726, 0.168724268675
T=71 9.0918712616, 0.168367981911
T=72 8.92658901215, 0.165307208896
T=73 8.91659736633, 0.165122166276
T=74 8.82111263275, 0.163353934884
T=75 8.90966320038, 0.16499376297
T=76 9.02032756805, 0.167043104768
T=77 9.09782981873, 0.168478325009
T=78 9.22392463684, 0.170813426375
T=79 9.33905029297, 0.172945380211
T=80 9.31301212311, 0.172463193536
T=81 9.44260978699, 0.174863144755
T=82 9.45653438568, 0.17512100935
T=83 9.52670955658, 0.176420554519
T=84 9.64883327484, 0.178682103753
T=85 9.83387374878, 0.182108774781
T=86 9.95151329041, 0.184287279844
T=87 9.91870689392, 0.183679759502
T=88 9.91715335846, 0.18365098536
T=89 10.0150337219, 0.18546359241
T=90 9.95522022247, 0.184355929494
T=91 9.70408630371, 0.179705306888
T=92 9.56737327576, 0.1771735847
T=93 9.58298301697, 0.177462652326
T=94 9.52612495422, 0.176409721375
T=95 9.55842971802, 0.177007958293
T=96 9.53139877319, 0.176507383585
T=97 9.50600910187, 0.176037207246
T=98 9.59951972961, 0.177768886089
T=99 9.80951976776, 0.181657776237

where the left and right columns correspond to skel_err and err_per_dof as computed in forecastTrajectories.py#L124

skel_err = np.mean(np.sqrt(np.sum(np.square((forecasted_motion - trY_forecasting)),axis=2)),axis=1)
err_per_dof = skel_err / trY_forecasting.shape[2]
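
For readers following along, here is a minimal NumPy sketch of the same two quantities on synthetic data. The (time, sequences, features) array layout, the helper name `forecast_errors` and the sizes are assumptions chosen for illustration, not values read from the repository. Note that the two columns listed above differ by a constant factor of about 54 (2.87922 / 0.0533189), which would be consistent with a 54-dimensional feature vector.

```python
import numpy as np

def forecast_errors(forecast, target):
    """Return (skel_err, err_per_dof), each of shape (T,).

    Assumes both arrays are laid out as (T, N, D):
    T time steps, N sequences, D features per frame.
    """
    # Euclidean distance over the feature axis, per time step and sequence.
    dist = np.sqrt(np.sum(np.square(forecast - target), axis=2))
    # Average over sequences to get one number per time step.
    skel_err = np.mean(dist, axis=1)
    # Divide by the number of features (degrees of freedom) per frame.
    err_per_dof = skel_err / target.shape[2]
    return skel_err, err_per_dof

# Synthetic example; the sizes are illustrative only.
rng = np.random.default_rng(0)
target = rng.normal(size=(100, 24, 54))
forecast = target + 0.1 * rng.normal(size=target.shape)
skel_err, err_per_dof = forecast_errors(forecast, target)
print(skel_err.shape, err_per_dof.shape)
```

Both quantities are computed directly on whatever representation the arrays hold, so they are only comparable to published numbers if that representation and its scaling match.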

Compared with the paper, I find one value to be much worse and the other to be about one order of magnitude better. Do you have any pointers as to what I could be doing wrong?


una-dinosauria · 2016-09-15T10:41:07Z

Hey! Sorry for bothering again. I've also made some movies with these models and they definitely do not correspond to what is shown in the official video of the paper -- maybe you didn't upload the final final models?

asheshjain399 · 2016-09-15T14:31:49Z

The final models are here: https://drive.google.com/open?id=0B7lfjqylzqmMZlI3TUNUUEFQMXc (same link as above). The numbers in Table 1 are Euler angle errors, not exponential map errors. I think you are outputting exponential map errors.

The models are trained on the exponential map representation of joints; the output is then converted to the Euler angle representation for visualization and quantitative comparison.
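
To illustrate the kind of conversion being described, here is a minimal sketch of exponential map -> rotation matrix -> Euler angles for a single joint. The ZYX Euler order and the function names are assumptions made for this example; the Matlab scripts in the repository may use a different convention, so treat this as an illustration of the idea rather than a port of that code.

```python
import numpy as np

def expmap_to_rotmat(r):
    """Rodrigues' formula: axis-angle vector (3,) -> rotation matrix (3, 3)."""
    r = np.asarray(r, dtype=float)
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        # No rotation: avoid dividing by a zero angle.
        return np.eye(3)
    k = r / theta
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def rotmat_to_euler_zyx(R):
    """Rotation matrix -> (roll, pitch, yaw), with R = Rz(yaw) @ Ry(pitch) @ Rx(roll).

    ASSUMPTION: ZYX order, chosen for illustration only.
    Gimbal lock (|pitch| = pi/2) is not handled.
    """
    pitch = -np.arcsin(np.clip(R[2, 0], -1.0, 1.0))
    roll = np.arctan2(R[2, 1], R[2, 2])
    yaw = np.arctan2(R[1, 0], R[0, 0])
    return np.array([roll, pitch, yaw])

def expmap_to_euler(r):
    """Chain the two steps for one joint."""
    return rotmat_to_euler_zyx(expmap_to_rotmat(r))

# A rotation of 0.3 rad about the z axis should come back as yaw = 0.3.
angles = expmap_to_euler([0.0, 0.0, 0.3])
print(angles)
```

Because the mapping is nonlinear, an error measured on exponential maps and an error measured on the resulting Euler angles are not expected to agree numerically.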

asheshjain399 · 2016-09-15T14:35:41Z

You should look into the Utils directory. It has some Matlab scripts that do the conversion for you (sorry, Utils is not documented yet).

una-dinosauria · 2016-09-17T13:20:00Z

Thanks a lot. I looked into the utils directory and found this motionGenerationError.m file that computes error with the conversion expmap->rotmat->euler. When I run this on the generated motion of pre-trained models, I get the following errors:

erd walking        [0.93 1.18 1.59 1.97 2.24 ]
lstm3lr walking    [0.77 1.00 1.29 1.74 1.84 ]
srnn walking       [0.81 0.94 1.16 1.48 1.78 ]
erd eating         [1.27 1.45 1.66 1.95 2.02 ]
lstm3lr eating     [0.89 1.09 1.35 1.66 1.97 ]
srnn eating        [0.97 1.14 1.35 1.62 2.09 ]
erd smoking        [1.66 1.95 2.35 2.63 3.61 ]
lstm3lr smoking    [1.34 1.65 2.04 2.30 2.59 ]
srnn smoking       [1.45 1.68 1.94 2.24 2.64 ]
erd discussion     [2.27 2.47 2.68 2.92 3.16 ]
lstm3lr discussion [1.88 2.12 2.25 2.33 2.45 ]
srnn discussion    [1.22 1.49 1.83 2.07 2.24 ]

These results are a bit better than those reported in Table 1 :) -- Do you have an idea of what could be causing the discrepancy? I've noticed that the code ignores the global rotation and translation (e.g. motionGenerationError.m#L35 sets them to zero); I experimented with setting only the global rotation to zero and I get slightly worse results, but still better than those in Table 1. However, if I comment out that line completely (i.e., I include both global rotation and global translation), I get the following results:

erd walking        [4.69 11.71 33.38 62.81 106.05 ]
lstm3lr walking    [4.30 10.24 29.06 51.01 83.57 ]
srnn walking       [6.94 15.12 32.33 57.26 89.95 ]
erd eating         [4.87 10.07 17.09 27.64 38.08 ]
lstm3lr eating     [6.24 12.47 22.04 40.35 82.21 ]
srnn eating        [5.05 9.32 14.75 23.29 35.91 ]
erd smoking        [4.20 7.77 15.42 31.38 51.39 ]
lstm3lr smoking    [3.75 7.26 14.21 21.88 31.83 ]
srnn smoking       [4.44 8.15 14.34 22.08 31.76 ]
erd discussion     [5.95 13.99 33.42 59.98 111.36 ]
lstm3lr discussion [9.55 20.88 42.32 62.46 82.69 ]
srnn discussion    [9.40 19.81 39.03 59.83 99.46 ]

Which are definitely much worse.

Thanks again for getting back to me; we seem to be getting closer to reproducing the results in the paper.

una-dinosauria · 2016-09-17T13:23:59Z

As a side note, I'm assuming that everything is at 25fps, right? Since you have 8x less data than what can be downloaded from human3.6m, and that is sampled at 200fps. Hence, in the error vector I'm using the indices [2,4,8,14,25] which correspond to [80, 160, 320, 560 and 1000] milliseconds.

asheshjain399 · 2016-09-25T06:09:09Z

The errors reported in Table 1 only include the Euler angles, and do not include the global translation and rotation errors. This is similar to Fragkiadaki et al. ICCV'15.

We used mocap data at 100 Hz (downsampled by 2) and not at 25 Hz. The reason you see less data is that we don't use all the data from human3.6. The details on the sequences we used can be found in the experiment section of the paper. Just to reiterate, our experiment settings (to the best of our effort) are very similar to Fragkiadaki et al.

Seleucia · 2016-12-04T21:24:09Z

Hello @asheshjain399, thank you very much for releasing the code and pre-trained models. I'm trying to reproduce your results but I haven't managed to.

I see that you are normalizing the data, so the prediction is also normalized. Are you computing the error over normalized data, or are you unnormalizing your prediction with the data statistics?

I used the motionGenerationError.m file to compute the error, but it seems to expect a 99-dimensional prediction vector, whereas the code produces a 54-dimensional vector. I modified motionGenerationError.m to handle this, but I'm not sure whether that is the correct way.

Another thing: I see that you are computing a direct 2D L2 loss between the angles. Shouldn't it be a 3D loss between joints? With that, your error would be considerably lower.

pvmilk · 2017-07-24T05:04:09Z

@una-dinosauria I am trying to reproduce your result by modifying motionGenerationError.m to calculate error from forecast_N_n and ground_truth_forecast_N_n, n in [0, 23].

lstm3lr walking [0.77 1.00 1.29 1.74 1.84 ]

Note: I believe that this is the same value that appears in your paper (cvpr2017; on human motion prediction using RNN).

However, I get numbers that are very different from your result and from the result in the SRNN paper.

lstm3lr walking [7.7294, 8.7923, 8.7971, 9.2380, 9.1237]

The only modifications I made to motionGenerationError.m are:

- Considering all data (24, instead of 7) | motionGenerationError.m#L18
- Considering only 54 features (instead of 100) | motionGenerationError.m#L31 and motionGenerationError.m#L47
- Filename to read from | motionGenerationError.m#L20, motionGenerationError.m#L24, motionGenerationError.m#L40 and motionGenerationError.m#L43

Other differences could be:

- The code is running on Octave rather than Matlab
  - (only a warning) warning: RNNexp/structural_rnn/CRFProblems/H3.6m/mhmublv/Motion/RotMat2Euler.m: possible Matlab-style short-circuit operator at line 34, column 16
- The prediction was run using Theano 0.9.0

Am I missing something here, e.g. unnormalizing the data, or considering only n in [0, 7]?

@Seleucia Did you manage to solve your issue of reproducing the results from the SRNN paper using a pre-trained model?

Thank you.

una-dinosauria · 2017-07-24T16:48:43Z

I did manage to get the numbers that I reported, and I remember them being reasonably close to what the SRNN paper reports. Have you made movies for your predictions? If I remember correctly, the movie for discussion was exactly what is shown in the official SRNN movie, but I never managed to get the other ones.

I'm currently away at a conference but if you make your branch public I can look at the code once I get back to the lab (and make a diff with my code to see if there's something noticeably different).

Seleucia · 2017-07-24T17:07:32Z

Yes, I managed to get exactly the same numbers as given in the SRNN paper with the pre-trained models. I think the confusion here is related to the subsampling. @una-dinosauria is right that the data given here is at 25 fps, not 100 fps.

pvmilk · 2017-07-25T05:53:56Z

@una-dinosauria No, I haven't made movies for the predictions yet. Let me try a couple of things on my own. If it is still not working, I will ask you for the favor of a diff.

@Seleucia Did you make any changes to the source code other than to motionGenerationError.m as mentioned earlier (99->54)? Can you also elaborate on the subsampling issue?

Thank you.

Seleucia · 2017-07-25T06:05:25Z

I did not make any changes except the one I mentioned here. The subsampling issue is related to the selected frames: the SRNN paper reports the frames [8, 16, 32, 56, 100], not the ones @una-dinosauria mentioned, [2,4,8,14,25]. The SRNN paper assumes that the data is subsampled by 2; @una-dinosauria's paper, on the other hand, assumes that it is subsampled by 8. I think @una-dinosauria is right and the times given in the SRNN paper are wrong.
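
The arithmetic behind the two index sets can be sketched as follows. The helper name `frames_to_ms` is made up for this example, and indices are treated as 1-based (as in the Matlab indexing used in this thread), so that frame k lies k frame periods after the last observed frame.

```python
def frames_to_ms(frame_indices, fps):
    """Convert 1-based frame indices to elapsed milliseconds at a given frame rate."""
    return [idx * 1000.0 / fps for idx in frame_indices]

# Indices used by @una-dinosauria, read at 25 fps.
print(frames_to_ms([2, 4, 8, 14, 25], fps=25))
# → [80.0, 160.0, 320.0, 560.0, 1000.0]

# Indices reported for the SRNN paper, read at 100 fps.
print(frames_to_ms([8, 16, 32, 56, 100], fps=100))
# → [80.0, 160.0, 320.0, 560.0, 1000.0]

# The same SRNN indices, if the data were actually at 25 fps.
print(frames_to_ms([8, 16, 32, 56, 100], fps=25))
# → [320.0, 640.0, 1280.0, 2240.0, 4000.0]
```

Both index sets describe the same nominal horizons only under their respective frame-rate assumptions; which rate the released data actually has is the point under discussion here.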

pvmilk · 2017-07-25T11:40:25Z

@una-dinosauria I tried to duplicate your result as mentioned, but without success.
Could you have a look into it when you have time?

Thank you.

Here is what I did and my result:

1.) I use the srnn branch of both RNNexp (@3ba986b) and NeuralModels (@fb02335).

2.) The only changes I made are those needed to make the program run; they can be found in patch_srnn.txt

3.) I download the data and pre-trained model, then forecast the motion with

$ python generateMotionForecast.py lstm3lr `datapath`/pre-trained/lstm3lr_walking/

4.) I calculate the error using the Matlab script

$ octave

octave:1> merr = motionGenerationError('`datapath`/pre-trained/lstm3lr_walking/');

(I actually use Octave here; I also needed to download the H3.6m visualization code version 1.1 and extract it under the RNNexp/structural_rnn/CRFProblems/H3.6m/h36devkit folder.)

lstm3lr walking

merr([2,4,8,14,25]) = 3.0110 3.9911 4.8584 6.8636 7.1127

Below is the error value for a 100 predicted frames.

pvmilk · 2017-07-26T12:31:02Z

@una-dinosauria I think I've got it. The output of the prediction from generateMotionForecast.py needs to be unnormalised before calculating the error with motionGenerationError.m.

There is an unnormalisation method provided in unNormalizeData.py, but you would need to modify the source code to call it yourself.
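
As a rough idea of what such a step involves, here is a minimal sketch. It is a hypothetical helper, not the function in unNormalizeData.py: the signature, the name `ignored_dims`, and the choice to fill dropped dimensions with their mean are all assumptions made for illustration. It also shows one way a reduced prediction (e.g. the 54 dimensions mentioned earlier in this thread) could be expanded back to the full feature size that the error script expects.

```python
import numpy as np

def unnormalize(normalized, data_mean, data_std, ignored_dims):
    """Map (T, D_used) normalized predictions back to (T, D_full) original units.

    Hypothetical helper for illustration only.
    ASSUMPTION: dimensions listed in ignored_dims were dropped before
    training and are filled with their mean value here.
    """
    normalized = np.asarray(normalized, dtype=float)
    data_mean = np.asarray(data_mean, dtype=float)
    data_std = np.asarray(data_std, dtype=float)
    ignored = set(ignored_dims)
    used_dims = [d for d in range(data_mean.shape[0]) if d not in ignored]
    if normalized.shape[1] != len(used_dims):
        raise ValueError("feature count does not match the kept dimensions")
    # Start from the mean everywhere, then overwrite the kept dimensions.
    full = np.tile(data_mean, (normalized.shape[0], 1))
    # Invert z-score normalization: x = x_norm * std + mean.
    full[:, used_dims] = normalized * data_std[used_dims] + data_mean[used_dims]
    return full

# Toy example: 6 full dimensions, of which dimensions 1 and 4 were dropped.
mean = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
std = np.array([1.0, 2.0, 1.0, 2.0, 1.0, 2.0])
restored = unnormalize(np.ones((3, 4)), mean, std, ignored_dims=[1, 4])
print(restored[0])
```

The statistics must be the ones computed on the training data; using different statistics would shift and rescale the predictions and change the reported error.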

For those who are following the thread, I will provide the patch once I clean up my code.

Thank you.

pvmilk · 2017-07-27T06:06:52Z

As promised, here is the patch: please use srnn_patch.txt in place of the patch in step 2.) above.

With this, you should be able to reproduce the same or similar results as the Structural-RNN paper for the lstm3lr and erd cases.

lstm3lr walking

merr([8,16,32,56,100]) =   1.1697 1.4747 1.6444 1.7967 2.1886

erd walking

merr([8,16,32,56,100]) =  1.3010 1.5636 1.8428 2.005 2.3858

Please note that if I use merr([2,4,8,14,25]), the result is slightly better than the one reported in the baseline paper (cvpr2017; on human motion prediction using RNN).

lstm3lr walking

merr([2,4,8,14,25]) =   0.67755 0.88913 1.16974 1.41097 1.59932

erd walking

merr([2,4,8,14,25]) =   0.85603 1.04604 1.30096 1.52555 1.71511

UPDATE (9 August 2017):
For those who are also trying to duplicate the results for other actions (eating, smoking, discussion), you may need to look into the 'actions' parameter in 'RNNexp/structural_rnn/CRFProblems/H3.6m/processdata.py'

MAtthewGHuser · 2021-02-22T16:48:23Z

Thanks a lot. I looked into the utils directory and found this motionGenerationError.m file that computes error with the conversion expmap->rotmat->euler. When I run this on the generated motion of pre-trained models, I get the following errors:
erd walking        [0.93 1.18 1.59 1.97 2.24 ]
lstm3lr walking    [0.77 1.00 1.29 1.74 1.84 ]
srnn walking       [0.81 0.94 1.16 1.48 1.78 ]
erd eating         [1.27 1.45 1.66 1.95 2.02 ]
lstm3lr eating     [0.89 1.09 1.35 1.66 1.97 ]
srnn eating        [0.97 1.14 1.35 1.62 2.09 ]
erd smoking        [1.66 1.95 2.35 2.63 3.61 ]
lstm3lr smoking    [1.34 1.65 2.04 2.30 2.59 ]
srnn smoking       [1.45 1.68 1.94 2.24 2.64 ]
erd discussion     [2.27 2.47 2.68 2.92 3.16 ]
lstm3lr discussion [1.88 2.12 2.25 2.33 2.45 ]
srnn discussion    [1.22 1.49 1.83 2.07 2.24 ]
These results are a bit better than those reported in Table 1 :) -- Do you have an idea of what could be causing the discrepancy? I've noticed that the code ignores the global rotation and translation (e.g. motionGenerationError.m#L35 sets them to zero); I experimented with setting only the global rotation to zero and I get slightly worse results, but still better than those in Table 1. However, if I comment out that line completely (i.e., I include both global rotation and global translation), I get the following results:
erd walking        [4.69 11.71 33.38 62.81 106.05 ]
lstm3lr walking    [4.30 10.24 29.06 51.01 83.57 ]
srnn walking       [6.94 15.12 32.33 57.26 89.95 ]
erd eating         [4.87 10.07 17.09 27.64 38.08 ]
lstm3lr eating     [6.24 12.47 22.04 40.35 82.21 ]
srnn eating        [5.05 9.32 14.75 23.29 35.91 ]
erd smoking        [4.20 7.77 15.42 31.38 51.39 ]
lstm3lr smoking    [3.75 7.26 14.21 21.88 31.83 ]
srnn smoking       [4.44 8.15 14.34 22.08 31.76 ]
erd discussion     [5.95 13.99 33.42 59.98 111.36 ]
lstm3lr discussion [9.55 20.88 42.32 62.46 82.69 ]
srnn discussion    [9.40 19.81 39.03 59.83 99.46 ]
Which are definitely much worse.

Thanks again for getting back to me; we seem to be getting closer to reproducing the results in the paper.

Hi, I am also trying to reproduce the results, and I use motionGenerationError.m to convert the data expmap->rotmat->euler. But the result I get looks like this:

srnn walking       [4.57 5.12 5.95 6.04 7.43 ]    (skel_err)
                   [0.10 0.11 0.12 0.13 0.15 ]    (err_per_dof)

It's different from your result:

srnn walking       [0.81 0.94 1.16 1.48 1.78 ]

And it is also different from the results in the paper.

So could I maybe have a look at your reproduction code? I would appreciate it very much.

una-dinosauria changed the title ~~Cannot reproduce paper results~~ Pre-trained models do not reproduce paper results Sep 12, 2016

pvmilk mentioned this issue Jul 24, 2017

Sampling rate of the data una-dinosauria/human-motion-prediction#8

Closed

CHELSEA234 mentioned this issue Jun 6, 2018

Doubt on quantitative result table. una-dinosauria/human-motion-prediction#29

Open
