8000 Extend the .counts method of an Alignment by mdehoon · Pull Request #5011 · biopython/biopython · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Extend the .counts method of an Alignment #5011

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy sta 8000 tement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 176 commits into
base: master
Choose a base branch
from

Conversation

mdehoon
Copy link
Contributor
@mdehoon mdehoon commented Jun 17, 2025

This PR extends the .counts method of the Alignment class with one optional argument, which can be either

  • a wildcard character
  • a substitution matrix
  • a PairwiseAligner object

The wildcard character is ignored when calculating identities and mismatches.

The substitution matrix is used to calculate the number of positives, and the total substitution score.

The PairwiseAligner object is used to also calculate the gap score, the substitution score, and the alignment score.

Printing the AlignmentCounts object gives detailed information about the identities/mismatches/gaps/scores.

  • I hereby agree to dual licence this and any previous contributions under both
    the Biopython License Agreement AND the BSD 3-Clause License.

  • I have read the CONTRIBUTING.rst file, have run pre-commit
    locally, and understand that continuous integration checks will be used to
    confirm the Biopython unit tests and style checks pass with these changes.

  • I have added my name to the alphabetical contributors listings in the files
    NEWS.rst and CONTRIB.rst as part of this pull request, am listed
    already, or do not wish to be listed. (This acknowledgement is optional.)

@mdehoon
Copy link
Contributor Author
mdehoon commented Jun 18, 2025

The Windows test failures on AppVeyor are concerning.

This is fixed now.

Copy link
codecov bot commented Jun 20, 2025

Codecov Report

Attention: Patch coverage is 82.23350% with 35 lines in your changes missing coverage. Please review.

Project coverage is 86.39%. Comparing base (2811549) to head (371c051).
Report is 104 commits behind head on master.

Files with missing lines Patch % Lines
Bio/Align/substitution_matrices/__init__.py 63.15% 21 Missing ⚠️
Bio/Align/__init__.py 88.88% 14 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5011      +/-   ##
==========================================
+ Coverage   85.37%   86.39%   +1.02%     
==========================================
  Files         282      286       +4     
  Lines       59395    60686    +1291     
==========================================
+ Hits        50706    52432    +1726     
+ Misses       8689     8254     -435     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mdehoon
Copy link
Contributor Author
mdehoon commented Jun 20, 2025

Follow up of PR #4221 for issue #3538

@peterjc
Copy link
Member
peterjc commented Jun 21, 2025

This should definitely be a squash-and-merge into a single commit! Debugging a Windows issue this way is painful isn't it?

@mdehoon
Copy link
Contributor Author
mdehoon commented Jun 21, 2025

@peterjc OK to proceed?

@peterjc
Copy link
Member
peterjc commented Jun 21, 2025

I need to read this properly later - bigger than I expected!

- gap_score - the total gap score (calculated only if the
argument is a pairwise aligner, and set to None
otherwise);
- gaps - the total gap length;;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra semi-colon

@@ -3723,32 +3544,30 @@ def substitutions(self):
start1, start2 = end1, end2
return m

def counts(self, substitution_matrix=None, wildcard=None, ignore_sequences=False):
def counts(self, argument=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why drop the explicit three alternative named arguments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The signature in Biopython release 1.85 is def counts(self):, so without the three arguments.

The three possible choices for argument are a wildcard character, a substitution matrix, or a PairwiseAligner object. These are mutually exclusive. For example, it does not make sense to have both a wildcard character and an aligner object.

If we use three explicitly named keywords and e.g. wildcard is the first keyword argument, then users can no longer do counts(aligner) or counts(wildcard), but they have to use the keyword.

@@ -1282,7 +1103,7 @@ def parse_printed_alignment(cls, lines):
nbytes, sequence = parser.feed(line)
sequences.append(sequence)
shape = parser.shape
coordinates = np.empty(shape, np.int64)
coordinates = np.empty(shape, np.intp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type change explains why so many files changed. Should this be transparent to users of the code, or will they need (breaking) changes too?

Copy link
Contributor Author
@mdehoon mdehoon Jun 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using Py_ssize_t variables to represent indices. The correct dtype for that is np.intp. np.int64 will work on most platforms, but strictly speaking it is not correct.

It won't affect any users, except those that are working on an ancient platform where Py_ssize_t is a 32-bit integer. But for those users, having np.in64 here will likely be a bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0