Bgreat2 maps reads or read pairs onto de Bruijn graphs efficiently. The de Bruijn graph must be provided as a set of unitigs; we recommend using Bcalm to build them (https://github.com/GATB/bcalm). A mapping can be reported either as a path in the graph (a list of nodes) or as the corresponding graph sequence. The latter mode has been shown to be able to correct a set of reads (see https://travis-ci.org/Malfoy/BCOOL).
Bgreat2 now indexes anchors from the unitigs and no longer needs a third-party tool to align reads on large unitigs. By default, Bgreat2 indexes all anchors. If memory usage is too high, reduce the proportion of indexed k-mers with the -i option; for example, -i 10 indexes one k-mer out of ten. Bgreat can be used either to correct reads using a DBG (correction mode) or to find where the reads appear in the graph. Bgreat can now map read pairs directly and can read gzipped files.
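The memory/sensitivity trade-off behind -i can be pictured with the following Python sketch. It only illustrates the idea and is not Bgreat's actual index: the function names, the 1-based unitig numbering, and the fixed-stride sampling are assumptions, and reverse complements are ignored for brevity.

```python
from collections import defaultdict


def build_anchor_index(unitigs, anchor_size, stride=1):
    """Map each sampled anchor to the (unitig_id, offset) positions where it occurs.

    stride=1 indexes every anchor; stride=10 keeps one position out of ten,
    in the spirit of -i 10 (hypothetical illustration, not Bgreat's code).
    Unitig ids are 1-based (an assumption). Reverse complements are ignored.
    """
    index = defaultdict(list)
    for unitig_id, seq in enumerate(unitigs, start=1):
        for offset in range(0, len(seq) - anchor_size + 1, stride):
            index[seq[offset:offset + anchor_size]].append((unitig_id, offset))
    return index


def find_seeds(read, index, anchor_size):
    """Return (read_offset, unitig_id, unitig_offset) for each read anchor present in the index."""
    seeds = []
    for i in range(len(read) - anchor_size + 1):
        for unitig_id, offset in index.get(read[i:i + anchor_size], ()):
            seeds.append((i, unitig_id, offset))
    return seeds
```

In this sketch, a larger stride stores fewer positions, and a read still finds a seed as long as it overlaps at least one sampled anchor; short reads on sparsely indexed unitigs are the ones that risk finding none.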
-u: unpaired mapping.
-x: paired mapping; the reads must be interleaved.
-k: value of k used to construct the graph.
-a: size of the anchors used to start the mapping. Useful when k is much larger than 31.
-i: by default, Bgreat indexes all anchors from the graph. With -i 10, it indexes one anchor out of ten, which reduces memory usage.
-g: unitig file in FASTA format. This file can be obtained by running Bcalm (https://github.com/GATB/bcalm) on a reads file.
-m: maximal Hamming distance between a read and its corresponding graph sequence for a mapping to be valid. The default value is 5.
-q: the reads are in FASTQ format. Note that Bgreat ignores quality information.
-c: Bgreat outputs the sequence of the path corresponding to the read in the graph. Intuitively, the read is "corrected" according to the graph sequence; this is the mode used by the Bcool corrector (https://github.com/Malfoy/BCOOL).
The advanced options are experimental, under active development, and should not be used.
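The validity test controlled by -m can be sketched as a Hamming-distance check; in correction mode, a read that passes is replaced by the graph sequence it was compared to. This minimal Python illustration assumes the read and the graph sequence have already been aligned to the same length; the function names are hypothetical and not part of Bgreat.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def is_valid_mapping(read, graph_seq, max_mismatches=5):
    """Accept a mapping if the read and its graph sequence differ in at most
    max_mismatches positions (the default of 5 mirrors the documented -m default)."""
    return hamming(read, graph_seq) <= max_mismatches
```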
In the default mode, the output numbers describe the path of unitigs that a read (or read pair) maps to, for example:
>read1
3;4;-6;
This means that read1 mapped onto unitig 3, then unitig 4, then the reverse complement of unitig 6.
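To consume such path lines in downstream scripts, a minimal Python sketch could look like this. It assumes one path per line, written as ';'-separated signed unitig numbers as in the example above; parse_path is a hypothetical helper, not part of Bgreat.

```python
def parse_path(line):
    """Parse a path line such as "3;4;-6;" into (unitig_id, is_reverse) pairs.

    A negative number denotes the reverse complement of that unitig; the
    trailing ';' produces an empty field, which is skipped.
    """
    steps = []
    for field in line.strip().split(";"):
        if not field:
            continue
        value = int(field)
        steps.append((abs(value), value < 0))
    return steps
```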
To obtain the corresponding sequences, convert the paths with the numbersToSequences tool (warning: this can produce large files, because the sequence of a large unitig is repeated for every read that maps to it).
Usage: ./numbersToSequences unitigs.fa paths 31 > superReads.fa
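For intuition about what this conversion involves, here is a Python sketch that spells a path as a sequence. It assumes, hypothetically, that unitig numbers are 1-based indices into the unitig file in order and that consecutive unitigs overlap by k-1 bases, as in a compacted de Bruijn graph; check numbersToSequences itself for the exact conventions.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def path_to_sequence(steps, unitigs, k):
    """Spell the sequence of a path given as (unitig_id, is_reverse) steps.

    Assumptions (hypothetical): unitig ids are 1-based indices into `unitigs`
    (file order), and consecutive unitigs overlap by k-1 bases.
    """
    sequence = ""
    for unitig_id, is_reverse in steps:
        seq = unitigs[unitig_id - 1]
        if is_reverse:
            seq = reverse_complement(seq)
        if not sequence:
            sequence = seq
            continue
        # Consecutive nodes of a de Bruijn graph path share k-1 bases.
        if sequence[-(k - 1):] != seq[:k - 1]:
            raise ValueError(f"unitig {unitig_id} does not overlap the path by k-1 bases")
        sequence += seq[k - 1:]
    return sequence
```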
In correction mode (-c), the corrected reads are output directly. If the -O option is used, the corrected reads are written in the same order as the input reads.
Map an unpaired reads file onto a low-k DBG, writing to the output file "output_paths":
./bgreat -u reads.fa -g dbg27.fa -k 27 -f output_paths
This also works with gzipped input files:
./bgreat -u reads.fa.gz -g dbg27.fa -k 27 -f output_paths
Map an unpaired reads file onto a low-k DBG, writing to the output file "output_paths", with a maximum of 2 mismatches:
./bgreat -u reads.fa -g dbg27.fa -k 27 -f output_paths -m 2
Map an unpaired reads file in FASTQ format onto a low-k DBG, writing to the output file "output_paths":
./bgreat -u reads.fq -q -g dbg27.fa -k 27 -f output_paths
Map an unpaired reads file onto a low-k DBG, writing to the output file "output_paths", using 8 threads:
./bgreat -u reads.fa -g dbg27.fa -k 27 -f output_paths -t 8
Map a paired reads file (interleaved format) onto a low-k DBG, writing to the output file "output_paths":
./bgreat -x paired_reads.fa -g dbg27.fa -k 27 -f output_paths
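The -x option expects both mates of each pair to follow each other in a single file. If your pairs are stored in two FASTA files, a sketch like the following can interleave them. It assumes both files list mates in the same order; the function and file names are hypothetical, and FASTQ input is not handled.

```python
from itertools import zip_longest

_MISSING = object()


def read_fasta(lines):
    """Yield (header, sequence) records from FASTA lines; multi-line sequences are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def interleave(records_1, records_2):
    """Alternate mates: pair1/1, pair1/2, pair2/1, pair2/2, ..."""
    for mate_1, mate_2 in zip_longest(records_1, records_2, fillvalue=_MISSING):
        if mate_1 is _MISSING or mate_2 is _MISSING:
            raise ValueError("the two mate files contain different numbers of reads")
        yield mate_1
        yield mate_2


def write_interleaved(path_1, path_2, out):
    """Write the interleaved FASTA records of two mate files to the open file `out`."""
    with open(path_1) as f1, open(path_2) as f2:
        for header, seq in interleave(read_fasta(f1), read_fasta(f2)):
            out.write(f"{header}\n{seq}\n")


# Example (hypothetical file names):
# with open("paired_reads.fa", "w") as out:
#     write_interleaved("reads_1.fa", "reads_2.fa", out)
```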
Map a paired reads file (interleaved format) onto a high-k DBG, writing to the output file "output_paths", with an anchor size of 31 (a good value for NGS reads):
./bgreat -x paired_reads.fa -g dbg91.fa -k 91 -f output_paths -a 31
Correct an unpaired reads file using a low-k DBG, writing to the output file "reads_cor.fa":
./bgreat -u reads.fa -g dbg27.fa -k 27 -f reads_cor.fa -c
Correct an unpaired reads file using a low-k DBG, writing to the compressed output file "reads_cor.fa.gz":
./bgreat -u reads.fa -g dbg27.fa -k 27 -f reads_cor.fa.gz -c
Create superReads from a paired reads file using a low-k DBG, writing to the output file "superReads.fa":
./bgreat -x paired_reads.fa -g dbg27.fa -k 27 -f superReads.fa -c