RuntimeError: received 0 items of ancdata · Issue #27 · allenai/kb · GitHub

RuntimeError: received 0 items of ancdata #27

Open

abdkiwan opened this issue Oct 7, 2020 · 6 comments

abdkiwan commented Oct 7, 2020

Hello,

Could you please tell me the reason for this error and how to solve it?
It appears shortly after starting to pre-train a language model, and then training stops.

Number of workers: 8
Number of corpus files: 8
torch version: 1.2.0

Thanks for your help.
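
For context, "received 0 items of ancdata" usually means PyTorch's multiprocessing data loading ran out of file descriptors while worker processes passed tensors over Unix sockets. Besides raising the ulimit (see below), a commonly suggested workaround is to switch torch's tensor sharing strategy; this is a hedged sketch of that workaround, not a confirmed fix for this repo:

    import torch.multiprocessing

    # The default "file_descriptor" strategy holds one descriptor per tensor
    # shared between loader workers; "file_system" avoids that, at the cost of
    # leaving shared-memory files behind if a worker dies uncleanly.
    torch.multiprocessing.set_sharing_strategy('file_system')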

matt-peters (Contributor) commented:
I haven't seen this error. Can you post a full traceback?

abdkiwan (Author) commented Oct 9, 2020

Hello,

I was able to partially solve the problem by increasing the open-file limit with: ulimit -n NEW_NUMBER_OF_FILES
After doing this, training ran for a few hours, then crashed with another error.
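
If shell limits are awkward to manage (e.g., under a job scheduler), the same bump can be done from inside the training process; a minimal sketch using only the standard library, assuming the hard limit is already high enough:

    import resource

    # Programmatic equivalent of `ulimit -n`: raise the soft open-file limit
    # up to the hard limit before data-loader workers are spawned.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))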
Here is the full traceback:

Traceback (most recent call last):
  File "/home/IAIS/akiwan/relation-extraction/kb/kb/multitask.py", line 135, in __call__
    batch = next(generators[index])
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/IAIS/akiwan/anaconda3/envs/knowbert/bin/allennlp", line 8, in <module>
    sys.exit(run())
  File "/home/IAIS/akiwan/anaconda3/envs/knowbert/lib/python3.6/site-packages/allennlp/run.py", line 18, in run
    main(prog="allennlp")
  File "/home/IAIS/akiwan/anaconda3/envs/knowbert/lib/python3.6/site-packages/allennlp/commands/__init__.py", line 101, in main
    args.func(args)
  File "/home/IAIS/akiwan/anaconda3/envs/knowbert/lib/python3.6/site-packages/allennlp/commands/train.py", line 103, in train_model_from_args
    args.force)
  File "/home/IAIS/akiwan/anaconda3/envs/knowbert/lib/python3.6/site-packages/allennlp/commands/train.py", line 136, in train_model_from_file
    return train_model(params, serialization_dir, file_friendly_logging, recover, force)
  File "/home/IAIS/akiwan/anaconda3/envs/knowbert/lib/python3.6/site-packages/allennlp/commands/train.py", line 204, in train_model
    metrics = trainer.train()
  File "/home/IAIS/akiwan/anaconda3/envs/knowbert/lib/python3.6/site-packages/allennlp/training/trainer.py", line 538, in train
    train_metrics = self._train_epoch(epoch)
  File "/home/IAIS/akiwan/anaconda3/envs/knowbert/lib/python3.6/site-packages/allennlp/training/trainer.py", line 334, in _train_epoch
    for batch_group in train_generator_tqdm:
  File "/home/IAIS/akiwan/anaconda3/envs/knowbert/lib/python3.6/site-packages/tqdm/std.py", line 1174, in __iter__
    for obj in iterable:
  File "/home/IAIS/akiwan/anaconda3/envs/knowbert/lib/python3.6/site-packages/allennlp/common/util.py", line 105, in <lambda>
    return iter(lambda: list(islice(iterator, 0, group_size)), [])
  File "/home/IAIS/akiwan/relation-extraction/kb/kb/multitask.py", line 141, in __call__
    raise ValueError
ValueError
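
The chained exceptions show the shape of the failure: the multitask iterator draws the next batch from a per-task generator, and when that generator is unexpectedly exhausted it escalates the StopIteration to a ValueError. An illustrative sketch of that pattern (hypothetical names, not the actual kb/multitask.py code):

    def draw_batch(generator):
        # The iterator expects its sub-generator to yield forever
        # (iterate_forever=true), so running dry is treated as an error.
        try:
            return next(generator)
        except StopIteration:
            raise ValueError("task generator was exhausted unexpectedly")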

matt-peters (Contributor) commented:
Can you post your config? Are you using the multitask_iterator with only a single dataset/task? If you have only a single dataset, you can use a standard iterator without the multitask wrapper.

abdkiwan (Author) commented:

Hello Matt,

Here is the configuration file (DatasetReader & Iterator):

"dataset_reader": {
"type": "multitask_reader",
"datasets_for_vocab_creation": [],
"dataset_readers": {
"language_modeling": {
"type": "multiprocess",
"base_reader": {
"type": "bert_pre_training",
"tokenizer_and_candidate_generator": {
"type": "bert_tokenizer_and_candidate_generator",
"entity_candidate_generators": {
"wiki": {"type": "wiki"},
},
"entity_indexers": {
"wiki": {
"type": "characters_tokenizer",
"tokenizer": {
"type": "word",
"word_splitter": {"type": "just_spaces"},
},
"namespace": "entity"
}
},
"bert_model_type": "bert-base-uncased",
"do_lower_case": true,
},
"lazy": true,
"mask_candidate_strategy": "full_mask",
},
"num_workers": 8,
},
}
},

"iterator": {
    "type": "multitask_iterator",
    "names_to_index": ["language_modeling"],
    "iterate_forever": true,

    "sampling_rates": [1],

    "iterators": {
        "language_modeling": {
            "type": "multiprocess",
            "base_iterator": {
                "type": "self_attn_bucket",
                "batch_size_schedule": "base-24gb-fp32",
                "iterator": {
                    "type": "bucket",
                    "batch_size": 8,
                    "sorting_keys": [["tokens", "num_tokens"]],
                    "max_instances_in_memory": 2500,
                }
            },
            "num_workers": 8,
        },
    },
},

matt-peters (Contributor) commented:
It isn't necessary to use the multitask_iterator with only one task. Try replacing the iterator section with this:

"iterator": {
    "type": "multiprocess",
    "base_iterator": {
        "type": "self_attn_bucket",
        "batch_size_schedule": "base-24gb-fp32",
        "iterator": {
            "type": "bucket",
            "batch_size": 8,
            "sorting_keys": [["tokens", "num_tokens"]],
            "max_instances_in_memory": 2500,
        }   
    },  
    "num_workers": 8,
},

abdkiwan (Author) commented:

I did exactly what you suggested. However, the model never actually started training.
Here are the log messages:

2020-10-15 01:30:33,083 - INFO - allennlp.training.trainer - Beginning training.
2020-10-15 01:30:33,083 - INFO - allennlp.training.trainer - Epoch 0/0
2020-10-15 01:30:33,083 - INFO - allennlp.training.trainer - Peak CPU memory usage MB: 8793.632
2020-10-15 01:30:33,499 - INFO - allennlp.training.trainer - GPU 0 memory usage MB: 11
2020-10-15 01:30:33,499 - INFO - allennlp.training.trainer - GPU 1 memory usage MB: 1146
2020-10-15 01:30:33,500 - INFO - allennlp.training.trainer - GPU 2 memory usage MB: 11841
2020-10-15 01:30:33,500 - INFO - allennlp.training.trainer - GPU 3 memory usage MB: 11
2020-10-15 01:30:33,506 - INFO - allennlp.training.trainer - Training
0%| | 0/1 [00:00<?, ?it/s]

2020-10-15 01:30:34,213 - INFO - allennlp.training.tensorboard_writer - Training | Validation
2020-10-15 01:30:34,214 - INFO - allennlp.training.tensorboard_writer - gpu_0_memory_MB | 11.000 | N/A
2020-10-15 01:30:34,215 - INFO - allennlp.training.tensorboard_writer - wiki_el_precision | 0.000 | N/A
2020-10-15 01:30:34,216 - INFO - allennlp.training.tensorboard_writer - cpu_memory_MB | 8793.632 | N/A
2020-10-15 01:30:34,217 - INFO - allennlp.training.tensorboard_writer - nsp_loss_ema | 0.000 | N/A
2020-10-15 01:30:34,217 - INFO - allennlp.training.tensorboard_writer - lm_loss_wgt | 0.000 | N/A
2020-10-15 01:30:34,218 - INFO - allennlp.training.tensorboard_writer - wiki_el_f1 | 0.000 | N/A
2020-10-15 01:30:34,219 - INFO - allennlp.training.tensorboard_writer - wiki_span_f1 | 0.000 | N/A
2020-10-15 01:30:34,219 - INFO - allennlp.training.tensorboard_writer - gpu_1_memory_MB | 1146.000 | N/A
2020-10-15 01:30:34,220 - INFO - allennlp.training.tensorboard_writer - nsp_loss | 0.000 | N/A
2020-10-15 01:30:34,221 - INFO - allennlp.training.tensorboard_writer - gpu_2_memory_MB | 11841.000 | N/A
2020-10-15 01:30:34,221 - INFO - allennlp.training.tensorboard_writer - total_loss | 0.000 | N/A
2020-10-15 01:30:34,222 - INFO - allennlp.training.tensorboard_writer - lm_loss_ema | 0.000 | N/A
2020-10-15 01:30:34,223 - INFO - allennlp.training.tensorboard_writer - wiki_span_precision | 0.000 | N/A
2020-10-15 01:30:34,223 - INFO - allennlp.training.tensorboard_writer - lm_loss | 0.000 | N/A
2020-10-15 01:30:34,223 - INFO - allennlp.training.tensorboard_writer - wiki_span_recall | 0.000 | N/A
2020-10-15 01:30:34,224 - INFO - allennlp.training.tensorboard_writer - nsp_accuracy | 0.000 | N/A
2020-10-15 01:30:34,224 - INFO - allennlp.training.tensorboard_writer - gpu_3_memory_MB | 11.000 | N/A
2020-10-15 01:30:34,225 - INFO - allennlp.training.tensorboard_writer - total_loss_ema | 0.000 | N/A
2020-10-15 01:30:34,225 - INFO - allennlp.training.tensorboard_writer - mrr | 0.000 | N/A
2020-10-15 01:30:34,226 - INFO - allennlp.training.tensorboard_writer - loss | 0.000 | N/A
2020-10-15 01:30:34,226 - INFO - allennlp.training.tensorboard_writer - wiki_el_recall | 0.000 | N/A
2020-10-15 01:30:39,877 - INFO - allennlp.training.checkpointer - Best validation performance so far. Copying weights to 'knowbert_bert_rxnorm_lm_corpus_50_epochs_1/best.th'.
2020-10-15 01:30:45,483 - INFO - allennlp.training.trainer - Epoch duration: 00:00:12
2020-10-15 01:30:45,484 - INFO - allennlp.training.checkpointer - loading best weights
2020-10-15 01:30:45,830 - INFO - allennlp.models.archival - archiving weights and vocabulary to knowbert_bert_rxnorm_lm_corpus_50_epochs_1/model.tar.gz
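
The epoch completing in 12 seconds with a 0/1 progress bar suggests the trainer received no batches at all, i.e., the reader produced no instances. A quick way to check the reader in isolation, sketched under the assumption that the kb readers are registered via kb.include_all (as in the repo's README) and with placeholder paths:

    from itertools import islice

    from allennlp.common import Params
    from allennlp.data import DatasetReader

    import kb.include_all  # noqa: F401  (registers "bert_pre_training" etc.)

    # Pull the inner reader out of the same config and confirm it yields
    # instances. 'my_config.jsonnet' and the corpus path are placeholders.
    reader_params = (Params.from_file('my_config.jsonnet')
                     .pop('dataset_reader')
                     .pop('dataset_readers')
                     .pop('language_modeling')
                     .pop('base_reader'))
    reader = DatasetReader.from_params(reader_params)
    for instance in islice(reader.read('one_corpus_file.txt'), 3):
        print(instance)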
