Help on some problems · Issue #2 · adlnlp/doc_gcn · GitHub
Open
samakos opened this issue May 2, 2023 · 7 comments
Comments

@samakos
samakos commented May 2, 2023

Hello, I have spent a lot of time trying to reproduce the experiment on the FUNSD dataset, but there are some disconnections.
First, the function constituency_parsing_extractor(parse_string) is not defined, so I am trying to implement it myself.
Second, I found some errors in the functions text_density, text_number, char_density and char_number, which I think I have solved.
Third, the 3rd notebook reads the pkl file gcn_visual-gcn_char_density-bert_base_cls_test.pkl, which is not generated in the previous notebooks.

I find Doc-GCN a very powerful model and I am trying to resolve these issues so I can use your model on other public datasets.
Is anyone available to help with these issues?

@samakos
Author
samakos commented May 2, 2023

Also: in the final notebook, the dataframe used for training contains the columns gcn_near_char_density, gcn_near_char_number and gcn_near_token_number, which are not produced in the other notebooks.

@yihaoding
Collaborator

Hi Samakos,

  1. We use benepar_en3 to parse the input text contents (a sketch of how the parse string can be produced and consumed is shown after this list).
  2. Please point out the errors in those functions so I can double-check the uploaded version and update them.
  3. The pkl file contains the updated GCN hidden output from the appearance (visual) and density graphs (previous notebook). We use this file to get the original text and syntactic features before feeding them into the GCN. I am more than happy to provide the download link later, but you can use the provided file to get the required features directly.

Thanks for paying attention to our paper! We are happy to answer all your questions and make this repository more accurate and precise. The additional functions and links will be updated this weekend.
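
For reference, here is a minimal sketch of how a benepar_en3 parse string can be produced with spaCy and how a constituency_parsing_extractor might read off the first two levels of constituency labels. The function body is an illustrative reconstruction, not necessarily the exact version used in the notebooks:

```python
import benepar
import spacy
from nltk import Tree

# Assumes benepar, spacy and nltk are installed, en_core_web_sm is downloaded,
# and benepar_en3 has been fetched with: python -c "import benepar; benepar.download('benepar_en3')"
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("benepar", config={"model": "benepar_en3"})

def constituency_parsing_extractor(parse_string):
    """Hypothetical reimplementation: return the level-1 and level-2
    constituent labels of a benepar parse string."""
    tree = Tree.fromstring(parse_string)
    level1 = [child.label() for child in tree if isinstance(child, Tree)]
    level2 = [grandchild.label()
              for child in tree if isinstance(child, Tree)
              for grandchild in child if isinstance(grandchild, Tree)]
    return level1, level2

doc = nlp("Name of the applicant")
parse_string = list(doc.sents)[0]._.parse_string
print(constituency_parsing_extractor(parse_string))
```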

@samakos
Author
samakos commented May 5, 2023

Hello @yihaoding, thank you very much for your willingness to help, and great work guys, amazing paper.
Today I will spend the whole day identifying the issues and errors in the functions, and I will send them to you later today.

Thank you very much

@samakos
Author
samakos commented May 5, 2023

Hello again, thank you for your help, I really appreciate it. Below you can see the problems I ran into. Some of them may not be errors at all; I may have done something wrong. Thank you very much for your help!

Notebook 1: funsd_dataset_preprocessing.ipynb

1) Section: Scene graph generation
for l in train_list_dict:
    for obj in train_list_dict[l]['objects']:
        token_density(train_list_dict[l]['objects'][obj])
        token_number(train_list_dict[l]['objects'][obj])
        char_density(train_list_dict[l]['objects'][obj])
        char_number(train_list_dict[l]['objects'][obj])

The functions token_density and token_number are not defined; I guess this is a typo and the correct names are text_density and text_number. However, with text_density and text_number, some of the functions require two arguments while the code passes only one, and even the one-argument function (like text_number) does not run as written.

In addition, train_list_dict and eval_list_dict are not defined in notebook 1, so I took them from the notebook GCN_Funsd_distance_weighted_based_publicly.ipynb; is that correct?

2) constituency_parsing_extractor(parse)
The function is not defined and the code does not run. Could you please provide the function?

3) /content/funsd_train_bert_cls.pkl
Could you please provide this file? I cannot generate it from the existing code. I modified the code and would like to check whether I get the same results as you.

Notebook 4: Funsd_Object_Detection_best_model.ipynb

4) In df_train.head() I see that the visual features contain a lot of negative numbers (almost all of them), but in the first notebook the extracted visual features had no negative values.

5) In the notebook GCN_Funsd_distance_weighted_based_publicly.ipynb, in train_list_dict, pp['91361993']['objects']['0']['text_density'] is a float, and char_density is also a float. But in the final notebook Funsd_Object_Detection_best_model.ipynb the density is a list with negative numbers. How is this density produced?

6) new_df = df_train[['text', 'label', 'near_visual_feature', 'gcn_near_char_density', 'gcn_near_char_number',
                      'level1_parse_emb', 'level2_parse_emb', 'gcn_near_token_density', 'density', 'visual_feature', 'gcn_bert_predicted']]

gcn_bert_predicted is not among the columns of the dataframe; I guess it corresponds to the gcn_bert_base column, correct?

7) In the notebook funsd_dataset_preprocessing.ipynb, parsing_level1 and parsing_level2 are defined, but the final notebook uses level1_parse_emb and level2_parse_emb. In which notebook are level1_parse_emb and level2_parse_emb defined?

Overall, I think most of the errors occur because I cannot generate text_number, text_density, char_density and char_number.

Thank you very much! Amazing work guys!

@shashank-kit

Thanks @samakos for raising a similar issue. You have done a good consolidation of the potential reproduction issues, mainly due to disconnections and some partial implementations, which I also faced last week.

@adlnlp @yihaoding
Could you please help us by bridging the gaps and addressing the inconsistencies raised by @samakos regarding the pickle file, column names, function definitions and how they are generated? Uploading a sample result file from each notebook would also be helpful for reconciling with our results.

Thank you very much! Indeed an amazing paper with incredible potential.
Looking forward to your new commits with the fixes.

@samakos
Author
samakos commented May 15, 2023

Hello again guys, I hope this message finds you well. I would like to ask whether there is any update on these issues. I am an MSc student and would like to explore Doc-GCN and include it in my thesis, so I would like to ask whether it is feasible to resolve the problems by the end of May. If you are not available this month, no problem :)


@yihaoding
Collaborator

Hi, thank you very much for working on our research project again.

For Q1 and Q2, the URL pointed to our previous version; I have updated it to the current ipynb link, which shows how the corresponding features are generated. Please try the updated ipynb and let me know if there are any issues. You can also open the Colab via here.
For Q3, we use pre-trained bert-base-uncased and extract the [CLS] token as the textual (semantic) representation of each layout component (a minimal sketch of this extraction follows this list).
For Q4, the two visual features are different: the original visual features (2048-d) are extracted from ResNet, while the GCN-enhanced visual features are 768-d (see the second sketch after this list).
For Q5, we convert the density value (a single number) into a high-dimensional vector so it can be integrated with the other aspect representations. The original text density is always positive, but the vectorised text density may contain negative elements.
For Q6, yes, that’s correct since we tested using BERT-large in our model.
For Q7, we use the parent-child relation-based GCN to obtain the level1 and level2 embeddings, respectively.
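
For Q3, a minimal sketch of what such a [CLS] extraction typically looks like with Hugging Face transformers, assuming bert-base-uncased; the exact model name and preprocessing in the notebooks may differ:

```python
import torch
from transformers import BertModel, BertTokenizer

# Sketch: extract the 768-d [CLS] embedding for one layout component's text.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def cls_embedding(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    # last_hidden_state[:, 0, :] is the representation of the [CLS] token.
    return outputs.last_hidden_state[:, 0, :].squeeze(0)

print(cls_embedding("Name of the applicant").shape)  # torch.Size([768])
```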
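For Q4, a rough sketch of how a 2048-d appearance feature can be obtained from a ResNet-50 backbone for a cropped layout component; the actual pipeline in the notebooks may use a different ResNet variant, preprocessing, or crop source (the path and box below are hypothetical):

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Sketch: 2048-d pooled feature from a ResNet-50 backbone for one component crop.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()  # drop the classifier head, keep the 2048-d pooled output
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def visual_feature(crop: Image.Image) -> torch.Tensor:
    with torch.no_grad():
        return backbone(preprocess(crop).unsqueeze(0)).squeeze(0)  # shape: (2048,)

# Hypothetical example: crop one layout component from a FUNSD page image.
page = Image.open("funsd/images/91361993.png").convert("RGB")
print(visual_feature(page.crop((50, 40, 320, 90))).shape)  # torch.Size([2048])
```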
