Description
Hello everyone. First of all, I'd like to congratulate you on this incredible tool.
In my current project, I need to create cell x transcript matrices for my single-nucleus human samples, which were generated with 10x technology. I have been trying to process the data with kallisto bus following this tutorial. Unfortunately, none of the indexes I have created seems to be compatible with the 10xv1 technology, and I keep getting the following error:
[gabrielfonseca@login 02_kallisto_bus]$ srun kb count --tcc -i index_HS_98.idx -g t2g.txt -c1 cDNA_t2c.txt -c2 introns_t2c.txt -x 10xv1 -o /sn_ovary/results/gabrielfonseca/02_kallisto_bus/01_Kbc_results/ --filter bustools -t 28 --workflow nucleus /sn_ovary/data/00_raw_data/input/raw_data/
[2024-03-09 12:28:07,487] INFO [count_lamanno] Using index index_HS_98.idx to generate BUS file to /sn_ovary/results/gabrielfonseca/02_kallisto_bus/01_Kbc_results/ from
[2024-03-09 12:28:07,488] INFO [count_lamanno] /sn_ovary/data/00_raw_data/input/raw_data/
[2024-03-09 12:28:08,599] ERROR [count_lamanno]
[bus] Note: Strand option was not specified; setting it to --fr-stranded for specified technology
Error: Number of files (1) does not match number of input files required by technology 10XV1 (3)
kallisto 0.48.0
Generates BUS files for single-cell sequencing
Usage: kallisto bus [arguments] FASTQ-files
Required arguments:
-i, --index=STRING Filename for the kallisto index to be used for
pseudoalignment
-o, --output-dir=STRING Directory to write output to
Optional arguments:
-x, --technology=STRING Single-cell technology used
-l, --list List all single-cell technologies supported
-B, --batch=FILE Process files listed in FILE
-t, --threads=INT Number of threads to use (default: 1)
-b, --bam Input file is a BAM file
-n, --num Output number of read in flag column (incompatible with --bam)
-T, --tag=STRING 5′ tag sequence to identify UMI reads for certain technologies
--fr-stranded Strand specific reads for UMI-tagged reads, first read forward
--rf-stranded Strand specific reads for UMI-tagged reads, first read reverse
--unstranded Treat all read as non-strand-specific
--paired Treat reads as paired
--genomebam Project pseudoalignments to genome sorted BAM file
-g, --gtf GTF file for transcriptome information
(required for --genomebam)
-c, --chromosomes Tab separated file with chromosome names and lengths
(optional for --genomebam, but recommended)
--verbose Print out progress information every 1M proccessed reads
[2024-03-09 12:28:08,599] ERROR [main] An exception occurred
Traceback (most recent call last):
File "/opt/spack/opt/spack/linux-rocky8-x86_64/gcc-12.2.0/py-kb-python-0.27.3-wdlrh6n6qos4wtlxjqcdgtp7ys2z7gon/lib/python3.10/site-packages/kb_python/main.py", line 1305, in main
COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
File "/opt/spack/opt/spack/linux-rocky8-x86_64/gcc-12.2.0/py-kb-python-0.27.3-wdlrh6n6qos4wtlxjqcdgtp7ys2z7gon/lib/python3.10/site-packages/kb_python/main.py", line 491, in parse_count
count_velocity(
File "/opt/spack/opt/spack/linux-rocky8-x86_64/gcc-12.2.0/py-ngs-tools-1.8.1-olf5dpwkrl54xgpw6icmsugxnwpponf6/lib/python3.10/site-packages/ngs_tools/logging.py", line 62, in inner
return func(*args, **kwargs)
File "/opt/spack/opt/spack/linux-rocky8-x86_64/gcc-12.2.0/py-kb-python-0.27.3-wdlrh6n6qos4wtlxjqcdgtp7ys2z7gon/lib/python3.10/site-packages/kb_python/count.py", line 1593, in count_velocity
bus_result = kallisto_bus(
File "/opt/spack/opt/spack/linux-rocky8-x86_64/gcc-12.2.0/py-kb-python-0.27.3-wdlrh6n6qos4wtlxjqcdgtp7ys2z7gon/lib/python3.10/site-packages/kb_python/validate.py", line 116, in inner
results = func(*args, **kwargs)
File "/opt/spack/opt/spack/linux-rocky8-x86_64/gcc-12.2.0/py-kb-python-0.27.3-wdlrh6n6qos4wtlxjqcdgtp7ys2z7gon/lib/python3.10/site-packages/kb_python/count.py", line 150, in kallisto_bus
run_executable(command)
File "/opt/spack/opt/spack/linux-rocky8-x86_64/gcc-12.2.0/py-kb-python-0.27.3-wdlrh6n6qos4wtlxjqcdgtp7ys2z7gon/lib/python3.10/site-packages/kb_python/dry/__init__.py", line 25, in inner
return func(*args, **kwargs)
File "/opt/spack/opt/spack/linux-rocky8-x86_64/gcc-12.2.0/py-kb-python-0.27.3-wdlrh6n6qos4wtlxjqcdgtp7ys2z7gon/lib/python3.10/site-packages/kb_python/utils.py", line 203, in run_executable
raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/opt/spack/opt/spack/linux-rocky8-x86_64/gcc-12.2.0/py-kb-python-0.27.3-wdlrh6n6qos4wtlxjqcdgtp7ys2z7gon/lib/python3.10/site-packages/kb_python/bins/linux/kallisto/kallisto bus -i index_HS_98.idx -o /sn_ovary/results/gabrielfonseca/02_kallisto_bus/01_Kbc_results/ -x 10xv1 -t 28 /sn_ovary/data/00_raw_data/input/raw_data/' returned non-zero exit status 1.
I have created several indexes with different versions of the human GTF and genome files. The last version I tried followed this one from 10x (References - 2020-A (July 7, 2020)), but it didn't work either. I really need to process this data as soon as possible. This is the command I've been using to generate the indexes:
kb ref -i index_HS_98.idx -g t2g.txt -f1 ./cdna.fa -f2 ./introns.fa -c1 cDNA_t2c.txt -c2 introns_t2c.txt --workflow=nucleus /sn_ovary/data/gabrielfonseca/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz.1 /sn_ovary/data/gabrielfonseca/gencode.v32.primary_assembly.annotation.gtf.gz --overwrite
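For reference, the error line in the log above reports a file count: the kb count command passes a single positional path (the raw_data directory), while the message says 10XV1 requires 3 input files. The following is a minimal sketch, not part of kb-python or kallisto, that lists the FASTQ files in a directory and checks whether their number is a multiple of three. The function names, the suffix list, and the assumption that the FASTQ files sit directly inside the directory are all hypothetical; the sketch does not check which order the three files of each set must be given in.

```python
from pathlib import Path

# Taken from the error message: technology 10XV1 requires 3 input files.
FILES_PER_SET = 3

# Assumed FASTQ suffixes; adjust to match the actual file names.
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def collect_fastqs(directory, suffixes=FASTQ_SUFFIXES):
    """Return the FASTQ files directly inside `directory`, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.name.endswith(suffixes)
    )


def check_file_count(paths, files_per_set=FILES_PER_SET):
    """Return True if `paths` is a non-empty list whose length is a
    multiple of `files_per_set`."""
    n = len(paths)
    return n > 0 and n % files_per_set == 0
```

Calling `collect_fastqs("/sn_ovary/data/00_raw_data/input/raw_data/")` and passing the result to `check_file_count` would show whether the directory holds complete sets of three files before they are listed individually on the command line.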