Description
I have been getting errors when running createdb on my full dataset, but they are difficult to debug: it takes about a day to reach the point of failure, and I have not been able to replicate the errors on smaller subsets of my data. I am using unicore version 1.0.1, build h7ef3eeb_0 from bioconda, with the command:

unicore createdb -g --max-len 8000 --afdb-lookup --afdb-local=afdb data db/${current_data}_db weights

The full dataset I am working with has:

Total number of sequences: 1246479
Average sequence length: 416.41404387879777
Number of sequences with length >1000: 84939
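For reference, summary numbers like these can be computed from the input FASTA with a short script. The following is a minimal sketch, not necessarily the exact script used for the figures above; `fasta_stats` is a hypothetical helper name, and the tiny in-memory example stands in for the real file.

```python
from typing import Iterable, Tuple


def fasta_stats(lines: Iterable[str], threshold: int = 1000) -> Tuple[int, float, int]:
    """Return (number of sequences, mean length, number longer than threshold)."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            # Sequence lines may be wrapped, so add up all lines of a record.
            current += len(line)
    if current is not None:
        lengths.append(current)

    total = len(lengths)
    mean = sum(lengths) / total if total else 0.0
    longer = sum(1 for n in lengths if n > threshold)
    return total, mean, longer


# Toy input for illustration only; a real run would pass an open file handle.
example = [">a", "MKV", "LLA", ">b", "M", ">c", "MKVLA"]
print(fasta_stats(example, threshold=4))
```

Passing an open file object instead of the toy list (for example `with open(path) as handle: fasta_stats(handle)`) gives the same three numbers for a real FASTA file.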
The output file is quite long, so here is the last part, ending with the error:
100%|█████████▉| 497470/497471 [22:08:46<00:00, 6.24it/s]
Example: predicted for protein unicore_203705 with length 7993: (array([2, 2, 2, ..., 2, 2, 2], shape=(7993,), dtype=int8), 19)
Traceback (most recent call last):
  File "/project/maizegdb/ltibbs/conda_envs/unicore_gpu_env/etc/predict_3Di_encoderOnly.py", line 410, in <module>
    main()
  File "/project/maizegdb/ltibbs/conda_envs/unicore_gpu_env/etc/predict_3Di_encoderOnly.py", line 397, in main
    get_embeddings(
  File "/project/maizegdb/ltibbs/conda_envs/unicore_gpu_env/etc/predict_3Di_encoderOnly.py", line 295, in get_embeddings
    assert s_len == len(predictions[identifier][0]), print(
TypeError: len() of unsized object
Error: Command exited with code 1
Command: "python" "/project/maizegdb/ltibbs/conda_envs/unicore_gpu_env/etc/predict_3Di_encoderOnly.py" "-i" "/project/90daydata/maizegdb/ltibbs/unicore_eukaryote/db/combined_aa.fasta" "-o" "/project/90daydata/maizegdb/ltibbs/unicore_eukaryote/db/combined_3di.fasta" "--model" "weights" "--half" "0" "--threads" "48"
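As far as I can tell, `TypeError: len() of unsized object` is the message NumPy raises when `len()` is called on a 0-dimensional array. The sketch below reproduces that message outside of unicore. It only illustrates the error type: the use of `squeeze()` to build a 0-dimensional array is an assumption for illustration, and I have not confirmed which sequence or which step in predict_3Di_encoderOnly.py produces such a value.

```python
import numpy as np

# A 1-D prediction array has a length, like the array in the "Example" log line.
sized = np.array([2, 2, 2], dtype=np.int8)
print(len(sized))

# Assumption for illustration only: a 0-dimensional array, made here by
# squeezing a single-element array. This is not taken from the unicore script.
unsized = np.array([2], dtype=np.int8).squeeze()
print(unsized.shape)

# Calling len() on the 0-dimensional array raises the same TypeError message.
try:
    len(unsized)
    message = None
except TypeError as exc:
    message = str(exc)
print(message)
```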
Thank you!