Tags · GenePlus/cnvkit

v0.9.7.b0

Version 0.9.7-beta

This release contains several major enhancements particularly relevant to germline
analysis. If used in production pipelines, further evaluation and benchmarking would be
wise. Highlights:

**Control sample clustering**: To make better use of larger reference sample pools,
`reference --cluster` will correlate the given normal samples' bin-wise coverage depths
to extract clusters to be used as reference profiles. The reference .cnn file produced
this way will then contain the `log2` and `spread` summary statistics for each cluster,
in addition to the global summary stats. Given this "clustered reference" profile, `fix
--cluster` will then correlate each test sample to each clustered `log2` profile in the
reference to choose the most relevant control pool for normalization. The `batch` option
`--cluster` will perform both these steps. Nod to Gambin lab and the authors of
ExomeDepth, CoNVaDING, CLAMMS, and others for inspiration. (etal#308)

Calculation of bin weights has changed. **This will change your segmentation results**,
hopefully for the better. Details below. (etal#429)

The `batch` pipeline now performs some **segmentation post-processing** automatically:
calculating and filtering segmentation calls by 50% confidence intervals of the segment
mean log2 ratios, in order to reduce false positives, followed by separate bin-level
testing to detect small (e.g. exon-size) CNVs that were not caught by segmentation.
The bin- and segment-level results are returned as separate .cns files; deciding whether
and how to combine or use these results together is left as an exercise for the user.
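
As a rough illustration of the confidence-interval filter, the sketch below computes a percentile bootstrap 50% CI for a segment's mean log2 ratio and checks whether that interval excludes zero. The function names, the bootstrap settings, and the toy bin values are assumptions made for this example; this is not the code `batch` runs internally.

```python
import random
import statistics


def bootstrap_ci(values, alpha=0.5, n_boot=1000, seed=0):
    """Percentile bootstrap CI for the mean; alpha=0.5 gives a 50% interval."""
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    lo = means[int(n_boot * alpha / 2)]
    hi = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi


def ci_excludes_zero(values, **kwargs):
    """True if the whole interval lies above or below log2 = 0."""
    lo, hi = bootstrap_ci(values, **kwargs)
    return lo > 0 or hi < 0


# Hypothetical bin-level log2 ratios for two segments.
loss_segment = [-0.7, -0.5, -0.6, -0.65, -0.55, -0.6]
neutral_segment = [-0.2, 0.2, -0.1, 0.1, -0.15, 0.15, 0.05, -0.05]

for name, bins in [("loss", loss_segment), ("neutral", neutral_segment)]:
    print(name, bootstrap_ci(bins), ci_excludes_zero(bins))
```

A segment whose interval spans zero is a candidate false positive under this scheme, while one whose interval sits entirely on one side of zero is retained as a gain or loss.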

We've **dropped Python 2.7 support**. Python version 3.5 or later is now required.

This is a beta release. Please let me know how it works for you via the Issues page. If
this release contains any issues that are blocking your work, try installing one of the
previous stable versions 0.9.6 or 0.9.5::

    conda install cnvkit=0.9.6

Dependencies
------------

- Remove all Python 2.7 compatibility shims.
- Raise minimum pandas version from 0.20.1 to 0.23.3.
- Add scikit-learn (dependency of pomegranate, for HMM segmentation). Remove the older
hmmlearn implementation.

Commands
--------

`batch`:

- Post-process segments with `segmetrics` (50% CI), `call` (filter by CI, but don't call
integer copy number), and `bintest`.
- Return `bintest` result as a separate, independent .cns output.
- Add option `--segment-method`, equivalent to `segment -m`.
- Rename option `--method` to `--seq-method` (but `--method` is still accepted for now).
- Add option `--cluster`, passed to `reference` and `fix` if given. (etal#308)

`bintest`:

- New command superseding `cnv_ztest.py` script.
- Report p-value as a column `p_bintest` (previously `ztest`) in the .cns output.
- Fix probabilities for positive log2 values, i.e. gains, which previously always had
p-value = 1.0. (etal#429)
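
To show why gains and losses should receive comparable p-values, here is a minimal sketch of a symmetric two-sided test on a single bin. It assumes each bin comes with a log2 ratio and a standard error estimate; the function name and the normal approximation are choices made for this example, and the notes above do not state that `bintest` is implemented exactly this way.

```python
import math


def two_sided_p(log2, se):
    """Two-sided normal p-value for H0: the bin's true log2 ratio is 0."""
    z = log2 / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical bins: a gain and a loss of equal magnitude, and a neutral bin.
p_gain = two_sided_p(0.8, 0.2)
p_loss = two_sided_p(-0.8, 0.2)
p_neutral = two_sided_p(0.0, 0.2)
print(p_gain, p_loss, p_neutral)
```

Because the test uses the absolute value of the z-score, a gain and a loss of the same size get the same p-value, whereas a one-sided lower-tail test would push every gain toward 1.0.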

`fix`:

- Change calculation of bin weights to be more consistent with `1-var` meaning,
with more emphasis on reference spread. It is now simpler, more consistent with
`import-rna`, and particularly improves the accuracy of `bintest`. (etal#429)
- Squeeze the range of reference-free weights.
- Drop bins with GC content outside [0.3, 0.7]. The CLAMMS paper shows these bins carry
no useful signal.
- With `--cluster` and a clustered reference input, calculate the test sample's Pearson
correlation versus each cluster's log2, and take the best one for normalization.
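
The cluster selection described in the last item above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the cluster names and the toy bin values are hypothetical, and the sketch uses plain Python lists rather than the reference .cnn file layout.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def pick_reference_cluster(sample_log2, cluster_profiles):
    """Return (name, r) of the cluster profile best correlated with the sample."""
    scores = {name: pearson(sample_log2, profile)
              for name, profile in cluster_profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy bin-wise log2 values; the cluster names are hypothetical.
clusters = {
    "cluster_1": [0.1, -0.2, 0.3, 0.0, -0.1],
    "cluster_2": [-0.3, 0.2, -0.1, 0.4, 0.1],
}
sample = [0.2, -0.3, 0.5, 0.1, -0.2]
best_name, best_r = pick_reference_cluster(sample, clusters)
print(best_name, round(best_r, 3))
```

The test sample would then be normalized against the `log2` and `spread` values of the winning cluster instead of the global summary statistics.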

`reference`:

- With `--cluster`, do k-means clustering of the sample bin-level read depth correlation
matrix, per [Kusmirek et al. 2018](https://doi.org/10.1101/478313).
Parameter k defaults to the cube root of number of samples. Only clusters of at least
4 samples are kept for emitting summary statistics in the reference profile.
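
The two numeric rules above (default k and the minimum cluster size) can be illustrated as follows. The helper names are hypothetical, the cluster labels stand in for the output of a k-means run, and rounding the cube root to the nearest integer is an assumption, since the notes do not say how the value is rounded.

```python
from collections import Counter


def default_k(n_samples):
    """Cube root of the sample count, rounded to the nearest integer (assumed), minimum 1."""
    return max(1, round(n_samples ** (1 / 3)))


def clusters_to_keep(labels, min_size=4):
    """Return the sorted cluster labels that have at least min_size members."""
    sizes = Counter(labels)
    return sorted(c for c, size in sizes.items() if size >= min_size)


# Hypothetical k-means labels for 30 normal samples split into 3 clusters.
labels = [0] * 14 + [1] * 13 + [2] * 3
print(default_k(len(labels)), clusters_to_keep(labels))
```

In this toy case the third cluster has only 3 members, so it would contribute no summary statistics to the reference profile.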

`segment`:

- hmm: Fix pomegranate-based implementation. Use iterative Savitzky-Golay smoothing with
a narrow bandwidth.
- Use HMM for post-TCN segmentation on VCF allele frequencies.
- Add parameter for smoothing before CBS. (thanks @EwaMarek)

`segmetrics`:

- Add 'ttest' option for 1-sample t-test p-value.
- Implement & expose --smooth-bootstrap option. For smoothing, KDE bandwidth is based
on each bin's weight as a proxy for the SD of its log2 ratio values. To reduce the
risk of over-smoothing on larger sample sizes, we use a loose interpretation of
Silverman's Rule to reduce the bandwidth as the number of bins in a segment, k,
increases (bandwidth proportional to k^(-1/4)).
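
A smoothed bootstrap along these lines might look like the sketch below. Several details are assumptions made for illustration: the function names, the Gaussian kernel, and the conversion from weight to SD as sqrt(1 - weight), which follows the `1-var` reading of bin weights but is not confirmed by these notes as what `segmetrics` does.

```python
import math
import random


def sd_from_weight(weight):
    """Assumed proxy: treat the weight as 1 - variance, so SD = sqrt(1 - weight)."""
    return math.sqrt(max(0.0, 1.0 - weight))


def smoothed_bootstrap_means(log2s, weights, n_boot=500, seed=0):
    """Resample bins with replacement, jittering each draw by a shrunken per-bin SD."""
    rng = random.Random(seed)
    k = len(log2s)
    shrink = k ** -0.25  # bandwidth falls off as k^(-1/4)
    sds = [sd_from_weight(w) * shrink for w in weights]
    means = []
    for _ in range(n_boot):
        picks = rng.choices(range(k), k=k)
        draws = [rng.gauss(log2s[i], sds[i]) for i in picks]
        means.append(sum(draws) / k)
    return means


# Hypothetical segment of 16 bins with identical log2 ratios and weights.
boot_means = smoothed_bootstrap_means([0.5] * 16, [0.96] * 16)
print(min(boot_means), max(boot_means))
```

With 16 bins the shrink factor is 0.5, so each draw is jittered by half of the bin's assumed SD; longer segments get proportionally less smoothing.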

API
---

- `do_heatmap`: Add 'ax' parameter (thanks @fbrundu)
- `CNA.residuals()`: Improve speed; keep the index intact in the returned pd.Series.
- smoothing: Linearly roll-off weights in mirrored wings. Affects CNA.smoothed() /
savgol, but not rolling median bias correction.
- Rename `CNA.smoothed()` to `CNA.smooth_log2()`, since it returns the smoothed log2
values, not a new/altered CNA.

Bug fixes
---------

- `batch`: Fix argparse formatting issue (etal#466)
- `import-rna`: Fix a regression in reading 2-column per-gene counts (`-f counts`).
- `reference`: Fix sex inference/usage when creating haploid-x reference (etal#459; thanks
@duartemolha)
- `scatter`: Use a safe matplotlib backend on OS X to avoid crash
- VariantArray: Fix/streamline indexing of variants by bin/segment

Nov 30, 2019
ac14e5a

v0.9.6

Version 0.9.6
=============

Much-needed maintenance and bug fixes, for the most part. Some key dependencies
have changed, though this should be generally painless for you, and one or two
regressions introduced by recent optimizations have been fixed.

This will be the last CNVkit version to run on Python 2.7. The next major
release of pandas (0.25.0) will remove support for Python 2.7, and once that
happens it will become increasingly difficult to install future versions of
CNVkit on Python 2.7 -- so we're not going to try.

The segmentation method `flasso` depends on the R package `cghFLasso`, which is
unmaintained and has been removed from CRAN.  For now, `segment -m flasso` is
still supported if you already have `cghFLasso` installed. But given the above,
`flasso` will be removed from the next CNVkit version in favor of the HMM-based
methods.

Dependencies
------------

- Raised minimum pandas version from 0.18.1 to 0.20.1, and support up to 0.24.2,
  resolving some warnings and an error in pandas 0.22+. (etal#413; thanks @chapmanb)
- The soft dependency on `hmmlearn` is replaced with an explicit dependency on
  `pomegranate` for the HMM-based segmentation methods. This dependency will now
  be pulled in automatically when installing via `pip` or `conda`.
- The R package `cghFLasso` has been removed from CRAN, and therefore is no
  longer a dependency of CNVkit and will not be installed automatically through
  the standard `conda` installation method. (etal#419)

Commands
--------

`antitarget`:

- Be more specific in removing noncanonical chromosomes (e.g. alternate
  contigs, mitochondria) from the binned regions. This avoids skipping
  chromosomes of interest in some non-human genomes with non-numeric contig
  names, like yeast. (etal#388; credit for regexes to @brentp)

`coverage`:

- With `--count-reads`, use query aligned length to handle soft-clipped reads
  properly. Now the results with and without this option should be similar.
  (etal#411; thanks @desnar)
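
To make the soft-clipping point concrete, the sketch below derives the aligned query length from a CIGAR string, skipping clipped bases. The helper names are hypothetical and the CIGAR strings are toy examples. pysam exposes a comparable quantity as `AlignedSegment.query_alignment_length`; the sketch parses CIGAR text directly so that it runs without pysam.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def query_aligned_length(cigar):
    """Read bases inside the aligned portion: M, I, = and X operations only."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MI=X")


def query_length(cigar):
    """All read bases stored in the record, including soft-clipped ones (S)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MIS=X")


for cigar in ["100M", "10S80M10S", "30M2I68M", "50M3D50M"]:
    print(cigar, query_length(cigar), query_aligned_length(cigar))
```

For a read with CIGAR `10S80M10S`, counting the full read length would credit 100 bases to the bin even though only 80 of them align there, which is the kind of discrepancy the change above addresses.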

`segment`:

- For `-m flasso`, partition array by chromosome to avoid edge effects. (etal#409, etal#412; thanks @giladmishne)
- Removed the deprecated option `--rlibpath`; use `--rscript-path` instead.
- Note that the HMM methods are still provisional. A stable, supported version
  of these methods will be provided in the next CNVkit release.

Python API
----------

- `do_scatter` now returns a figure (etal#408; thanks @jeremy9959)

Bug fixes
---------

- `scatter`: Whole chromosomes can once again be specified with `-c`. (In the
  previous release, a chromosome without coordinates would cause an IndexError.)
  (etal#393)
- `import-rna`: Option --max-log2 can now be specified by users. (Previously,
  only the default value of +3.0 worked.)
- VCF I/O (`skgenome.tabio`): Support GATK 4's VCF files that contain records
  with empty ALT alleles, substituting zero if ALT AD is missing. (etal#391; thanks
  @chapmanb)
- Due to a certain versioning-dependent interaction between numpy, pandas,
  cython, and conda (details [here](numpy/numpy#432)),
  CNVkit may have printed spurious RuntimeWarning messages which could be safely
  ignored. The current release attempts to silence these messages if they occur.
  (etal#390).

Mar 21, 2019
1c8d69d

v0.9.5

Minor bug fixes and usability improvements.

`autobin`:

- Ensure targets are non-empty and match BAM chrom names. (closes etal#371)

`segment`:

- Suppress help text for deprecated `--rlibpath`. (etal#317)
- Fix help text display. (etal#380)

Aug 13, 2018
fd35552

v0.9.4

Bump version to 0.9.4

Aug 2, 2018
6a6266b

v0.9.3

Version 0.9.3

This release fixes a single bug that caused the `segmetrics` command to crash
(etal#325).

Specifically, the command would crash unless at least one option from each of
the following option sets was specified:

- Location statistics: --mean, --median, --mode
- Spread statistics: --stdev, --sem, --mad, --mse, --iqr, --bivar
- Interval statistics: --ci, --pi

This bug would not be triggered by calling `cnvlib.do_segmetrics` through the
Python API, which is why it was not caught in automated testing.

Mar 6, 2018
9bdb083
