sync #1

starskyzheng · 2022-11-10T12:59:36Z

No description provided.

The main function sets mplp.bsmpl = bam_smpl_init() which allocates memory. This is freed at the end of main_mpileup. Unfortunately there are many ways of exiting this function early which don't do this free. Restructured it slightly so the free happens. It's not a serious problem, but it did prevent make check from working when using -fsanitize=address. This wasn't spotted via the CI as it doesn't have a PTY and hence it avoids running the usage tests.
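The restructuring described above can be sketched as a single exit path that every early return funnels through. This is a minimal illustration, not the bcftools code: `sample_list_t`, `sample_list_init()`, `sample_list_destroy()`, `run_pileup()` and the `live_objects` counter are all hypothetical stand-ins for `bam_smpl_init()` and `main_mpileup`.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the object allocated by bam_smpl_init(). */
typedef struct { int n; } sample_list_t;

/* Counts allocations that have not been freed yet (illustration only). */
static int live_objects = 0;

static sample_list_t *sample_list_init(void)
{
    sample_list_t *s = calloc(1, sizeof(*s));
    if (s) live_objects++;
    return s;
}

static void sample_list_destroy(sample_list_t *s)
{
    if (!s) return;
    free(s);
    live_objects--;
}

/* Every early exit jumps to the same label, so the free always runs. */
static int run_pileup(int show_usage, int bad_option)
{
    int ret = 1;
    sample_list_t *smpl = sample_list_init();
    if (!smpl) return 1;

    if (show_usage) { ret = 0; goto cleanup; }  /* e.g. printing usage */
    if (bad_option) goto cleanup;               /* e.g. argument error */

    ret = 0;                                    /* normal work would go here */

cleanup:
    sample_list_destroy(smpl);
    return ret;
}
```

With this shape, a leak checker such as `-fsanitize=address` sees no outstanding allocation regardless of which branch ended the function.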

If the filtering expression queried keys from a file, for example -i 'INFO/TAG=@strings_expected' the program would segfault if INFO/TAG was not present in the first VCF record. This is now fixed by making sure NULL is not passed to khash_str2int_has_key. Fixes #2111

For an alignment that doesn't have an indel but is aligned against reads that do have an indel, the indel quality comes from the BAM quality. However we already have indelQ assigned, so this avoids changing to BAM qual if indelQ is zero as that is a special case for a read aligning to multiple indel "types" (lengths) with equal score. This avoids excess AD numbers for poorly chosen alignments. Fixes #2113

CONS_CUTOFF_DEL and CONS_CUTOFF_INC are now 35% instead of 40% (also DEL is new as it previously shared the score with SNPs). Plus when computing the ratio we no longer apply this as a percentage to the entire depth, but as a percentage of the alignments overlapping any STRs spanning this region (so meaningful alignments) that aren't already allocated to the "type" being diagnosed. Both of these make it slightly more likely to emit a second consensus sequence potentially containing indel variations. This fixes the specific problem raised in #2117, but like all heuristics it benefits some cases and harms others. We will always have some cases that fall just either side of the thresholds and look odd. Overall it seems a slim benefit and the work to track the number of meaningful spanning alignments may have potential value in subsequent updates. Fixes #2117

This applies where there are zero observed indels in a sample (but they are present in other samples). It's odd that this appears to be a pretty rare change: we'd expect a significant change to FP, but it's tiny. Genotype assignment error changes a lot, but only in one of the 3 samples I tested. On HG002, I see the following differences in calling rates.

    Previous    All   / QUAL>=100
    InDel TP    11823 / 11798
    InDel FP     5374 /  4898
    InDel GT      296 /   291
    InDel FN      115 /   140

    New         All   / QUAL>=100
    InDel TP    11822 / 11795
    InDel FP     5313 /  4805
    InDel GT       80 /    74
    InDel FN      116 /   143

This was HG002 called in conjunction with HG003 and HG004, but not as a trio (so no pedigree supplied). Oddly, despite that, HG002 is much more accurate than HG003 and HG004, whose GT assignment error rates are an order of magnitude higher. This PR makes them a bit higher still (maybe another 20%). I cannot explain either of these, but perhaps it's simply down to the accuracy of the truth set, as HG002 is by far the most widely curated of the three. Either that or my analysis has a flaw somewhere. Fixes #2130.

This didn't work in the past because update_from_fai() opened, read the header from, and then closed the input file, preventing the reheader_* functions from accessing the rest of the file contents when the input is a stream. As all the reheader_* functions read the header into a kstring, it's possible to make streaming work by passing this kstring into update_from_fai() and adjusting it to work directly on that copy of the data. As update_from_fai() no longer needs to write a temporary file, args_t::rm_tmpfile and args_t::tmp_prefix can be removed. The -T option is ignored as it's no longer needed, but is still accepted for compatibility. The init_tmp_prefix() function is still used by some other bcftools subcommands, so is left in place for now.

When setting a value by determining the index from the genotype, we face the problem of how to interpret truncated arrays. Say we have TAG defined as Number=. and

    GT:TAG    1/1:0,1,2    0/0:0

Then when querying we expect the following expressions to evaluate for the second sample as

    -i 'TAG[1:1]="."'     .. true
    -i 'TAG[1:GT]="."'    .. false

The problem is that the implementation truncates the number of fields, usually filling fewer than the original number of per-sample values. This is fixed by adding an exception that makes the code aware of this. Fixes #2133

Using --write-index defaults to --write-index=csi as before, but we can now do --write-index=tbi to get TBI indices instead. This takes precedence over auto-detection based on filename suffix when using the filename##idx##indexname nomenclature. The exception to defaulting to CSI is bcftools isec, which defaults to TBI. This disparity was there before and this PR doesn't change that behaviour. Fixes #2008

Given --write-index now takes an optional argument, and optional long options are --long-opt=arg and short options are -larg, I chose to also accept -l=arg for the sake of consistency and ease of documentation. The standard -larg still works too. If we wish to stick strictly to the normal conventions, then this is a trivial change in version.c (and a search and replace in the documentation). Also fix formatting bug in bcftools merge man page section leading to a lot of underlined text. Fixes #2139
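A minimal sketch of how an optional argument could be normalised so that both `-larg` and `-l=arg` are accepted. It assumes getopt delivers `NULL` when the argument is absent and the raw text after the option letter otherwise. The names `index_fmt` and `parse_index_fmt()` are hypothetical, the CSI default follows the description above, and this is not the code in version.c.

```c
#include <string.h>

enum index_fmt { IDX_CSI, IDX_TBI, IDX_INVALID };

/* arg is the optional argument as handed over by getopt:
 *   NULL    for a bare --write-index
 *   "tbi"   for --write-index=tbi or the short form -ltbi
 *   "=tbi"  for the short form -l=tbi                              */
static enum index_fmt parse_index_fmt(const char *arg)
{
    if (!arg) return IDX_CSI;        /* no argument: default to CSI */
    if (*arg == '=') arg++;          /* tolerate the -l=arg spelling */
    if (strcmp(arg, "csi") == 0) return IDX_CSI;
    if (strcmp(arg, "tbi") == 0) return IDX_TBI;
    return IDX_INVALID;              /* caller reports a usage error */
}
```

Skipping a single leading `=` keeps the standard `-larg` form working unchanged, which is why dropping the extra spelling later would be a small change.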

Red Hat's ExtUtils::Embed returns CFLAGS & LDFLAGS that produce a PIE (position-independent executable). However libhts.a and the rest of bcftools's *.o files are not compiled as PIC, so linking fails as these other objects use relocations that are invalid for PIE. While bcftools could link against libhts.so and compile the rest of its objects with -fpic, the logistics are non-trivial. So it's easier to omit the redhat-hardened-cc1 and redhat-hardened-ld specs that set up for building a PIE executable. Fixes #1322.

Calling bgzf_close() (and hts_close()) frees the file pointer, so the errcode field cannot be accessed afterwards. Report errors as per errno instead as is done in vcfconvert.c. Also report the right file pointer's errcode after bgzf_write() failure.

Prior to C23, declaring a variable after a case label is an error. Introduce a function encapsulating --verbosity parsing to avoid needing a local variable within this option parsing case. Similarly in vcfmerge.c, avoid declaring a variable after a goto label.
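The pattern can be sketched as follows. Before C23 a declaration may not directly follow a label, so `case 'v': char *end; ...` is rejected by strict compilers, while a call to a helper is an ordinary statement. The names `parse_verbosity()` and `handle_option()` are hypothetical and only illustrate the approach; the actual bcftools function may differ.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a non-negative verbosity level.
 * Returns 0 on success, -1 on malformed or out-of-range input. */
static int parse_verbosity(const char *str, int *level)
{
    char *end;
    errno = 0;
    long val = strtol(str, &end, 10);
    if (errno || end == str || *end != '\0' || val < 0 || val > INT_MAX)
        return -1;
    *level = (int)val;
    return 0;
}

/* Option dispatch: the local variables live in the helper, so no
 * declaration has to follow the case label. */
static int handle_option(int opt, const char *optarg_value, int *verbosity)
{
    switch (opt)
    {
        case 'v':
            if (parse_verbosity(optarg_value, verbosity) < 0) return -1;
            break;
        default:
            return -1;
    }
    return 0;
}
```

Wrapping the case body in braces would also compile, but a helper keeps the parsing in one place for every command that accepts the option.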

MINGW x64 works fine, but UCRT-x64 fails one of the setGT tests. Specifically, `-n c:././.` gets silently rewritten before main() so the argv element contains `-n c;.\.\.` instead. This is a pain, and arguably we should provide a less problematic CLI alternative for this as c: was always going to trip up Windows boxes. However for now we can work around it by not having it as a separate argument and letting getopt do the tokenisation of argv into option and optarg for us by removing the space. This doesn't fix the bug obviously, but it makes it pass tests if using the UCRT environment.

This is to allow renaming samples from a list of samples on the command line, rather than from a file of sample names. Unfortunately, the existing option `-s, --samples` conflicts with the rest of bcftools, therefore this is added as

    -n, --samples-list LIST    New sample names given as a comma-separated list
    -N, --samples-file FILE    New sample names in a file, see the man page for details

The old option remains valid but is not advertised in the usage page. Resolves #2383

pd3 force-pushed the develop branch 2 times, most recently from ab5de54 to 2191405 Compare February 13, 2023 10:34

pd3 force-pushed the develop branch from 2ec0dd1 to a1b781d Compare November 1, 2023 10:54

pd3 and others added 27 commits February 12, 2024 14:32

Exit with an informative error message when wrong format given. Resolves

12a6617

#2095

Fix two bugs in vep-split

89a0309

- -i/-e filtering expression containing missing values must not crash when subfields are queried on the fly
- parsing of subfield arrays with missing values now done correctly

Resolves #2098

Fix a silly bug introduced by 12a6617

9bfaa6d

Fix another silly bug

c63329b

Prevent segfault on invalid DP/AD values

1a975b3

When both DP and AD values are present, VAF is calculated as AD/DP. This could result in VAF values bigger than 1 and access beyond array boundaries. Fixes #2102
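To illustrate why a VAF above 1 leads to an out-of-bounds access, consider a hypothetical histogram indexed by VAF. The sketch below shows one plausible guard (clamping). `N_BINS` and `vaf_bin()` are assumptions made for illustration, and the actual fix may handle such records differently.

```c
#define N_BINS 20   /* hypothetical number of bins covering VAF 0..1 */

/* Return the histogram bin for VAF = ad/dp, or -1 when the value
 * cannot be computed. Inconsistent input (ad > dp) is clamped to 1
 * so that the index never exceeds N_BINS - 1. */
static int vaf_bin(int ad, int dp)
{
    if (dp <= 0 || ad < 0) return -1;   /* missing or invalid depth */
    double vaf = (double)ad / dp;
    if (vaf > 1.0) vaf = 1.0;           /* AD larger than DP */
    return (int)(vaf * (N_BINS - 1));
}
```

Without the clamp, `ad = 30, dp = 10` would map to bin 57 of a 20-element array.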

Add new option --force-single to support single-file edge case

d092f00

This also
- adds a check for cases where the --file-list is empty
- adds a synonymous option --force-no-index to --no-index

Resolves #2100

Update NEWS

9b83399

Minor documentation fix

bbab29d

Make bcftools isec --write-index=FMT apply to isec directory output too

9cc4520

Note this is the only subcommand which defaults to writing TBI for VCF.gz. Everything else defaults to CSI. I'm unsure why this is, but I haven't changed it in this PR.

Fix +scatter -n so it honours --write-index

d2105b6

Previously only the last file had the index written.

Fix a bug in expression parsing for type=INDEL with missing quotes.

274040c

This was previously segmentation faulting as not being a string means it didn't set the key field. We now report it as a parse error.

Add documentation on the optional =FMT bit of --write-index.

142fd1a

Fix csq error message for no fasta file.

106f158

It stated "Missing the --fa-ref option", but the option is "--fasta-ref".

Update documentation

78ed055

Merge branch 'develop' of github.com:samtools/bcftools into develop

2a61cd4

Add usage case to demonstrate 78ed055

2abc298

Support for conversion from tags using localized alleles (e.g. LPL, LAD)

466ceae

pd3 and others added 30 commits April 10, 2025 09:58

Add missing tests

06ed647

Remove unused variable to prevent -Werror=unused-variable failure

9cff2f4

Fix a bug, a missing denominator in MEAN calculation

068da7a

Add the misc/vrfs-variances script

790c317

The -r/-R option now merges overlapping regions, preventing duplicate sites

c32a1af

Fix a memmove bug

e9a12b9

Fix a misplaced warning

7811b22

Make the concat -G option work for plain VCFs. Fixes #2392

78099f2

New option to remove or annotate clusters of sites within a window

1719c60

Add tests for mpileup non-ACGT characters, see #2393

0bce7ab

Fix a bug, the -S, --samples-file option is no longer ignored

f85deee

Resolves #2398

Add experimental, and for now hidden, option --hts-verbose

9f4bef6

Make --hts-verbose work, incorrect optkey was used

65cbaea

test update with updated rlen calculation in htslib

34c3edd

Use the highest VCF version when merging headers

05621cf

Requires the htslib update samtools/htslib#1912 Resolves #2395

Add the option -v, --verbosity INT to all bcftools commands and plugins

056ae51

Verbosity values bigger than 3 are passed to the underlying HTSlib library so that the user can investigate network issues and other problems occurring at the library level. Resolves #2235

The -s, --samples option was not working properly

70ca3eb

It now also supports sample negation as advertised in the manual page, e.g. `-s ^sample1,sample2` to include all samples but sample1 and sample2. Resolves #2380
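The negation semantics can be sketched as a small membership test. This is a standalone illustration under the assumption that a leading `^` inverts a comma-separated list. `sample_selected()` is a hypothetical name and not the bcftools implementation.

```c
#include <string.h>

/* Return 1 if `name` should be kept given a -s style list:
 *   "a,b"   keeps only a and b
 *   "^a,b"  keeps everything except a and b                       */
static int sample_selected(const char *list, const char *name)
{
    int negate = (*list == '^');
    if (negate) list++;                       /* skip the negation marker */

    size_t len = strlen(name);
    int found = 0;
    const char *p = list;
    while (*p)
    {
        const char *q = strchr(p, ',');       /* end of the current item */
        size_t n = q ? (size_t)(q - p) : strlen(p);
        if (n == len && strncmp(p, name, n) == 0) { found = 1; break; }
        if (!q) break;                        /* last item checked */
        p = q + 1;
    }
    return negate ? !found : found;
}
```

Comparing lengths before the bytes ensures that `sample` does not match the list entry `sample1`.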

Use correct type in mpileup_get_val()

10853dc

On little-endian hosts, `int` here caused incorrect results at positions beyond the range of int. On big-endian hosts, it caused incorrect results at all positions.
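The endianness dependence can be reproduced with a generic sketch that reads the first four bytes of a 64-bit value, which is effectively what happens when a 64-bit position is accessed through a 32-bit type. The source does not describe how `mpileup_get_val()` passes its value, so `read_as_int32()` and `host_is_little_endian()` below are hypothetical helpers, not bcftools code.

```c
#include <stdint.h>
#include <string.h>

/* Copy the first sizeof(int32_t) bytes of a 64-bit value. */
static int32_t read_as_int32(const int64_t *pos)
{
    int32_t out;
    memcpy(&out, pos, sizeof(out));
    return out;
}

/* Detect byte order by inspecting the first byte of a known value. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a little-endian host the first four bytes are the low half, so a position such as 1000 survives and only values beyond the 32-bit range are corrupted. On a big-endian host the first four bytes are the high half, so even 1000 reads back as 0.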

Fix Makefile dependencies and clean mpileup2/*.o

aff7d42

Include <strings.h> for strncasecmp() declaration [minor]

21b17b5

Merge branch 'develop' of pd3-github:samtools/bcftools into develop

a72744f

Copyright year update and build cleanup

3ead9bf

Merge branch 'develop' of pd3-github:samtools/bcftools into develop

f310e99

Fix a bug for incorrectly formatted gVCF files

283023b

When the gVCF contains overlapping blocks, they would trigger an infinite loop and the program would never finish. Resolves #2410

Release 1.22

0949c4d

Merge version number bump and NEWS file from master

38e0139
