-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Extend the .counts method of an Alignment #5011
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy sta 8000 tement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This is fixed now. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5011 +/- ##
==========================================
+ Coverage 85.37% 86.39% +1.02%
==========================================
Files 282 286 +4
Lines 59395 60686 +1291
==========================================
+ Hits 50706 52432 +1726
+ Misses 8689 8254 -435 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This should definitely be a squash-and-merge into a single commit! Debugging a Windows issue this way is painful isn't it? |
@peterjc OK to proceed? |
I need to read this properly later - bigger than I expected! |
Bio/Align/__init__.py
Outdated
- gap_score - the total gap score (calculated only if the | ||
argument is a pairwise aligner, and set to None | ||
otherwise); | ||
- gaps - the total gap length;; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra semi-colon
@@ -3723,32 +3544,30 @@ def substitutions(self): | |||
start1, start2 = end1, end2 | |||
return m | |||
|
|||
def counts(self, substitution_matrix=None, wildcard=None, ignore_sequences=False): | |||
def counts(self, argument=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why drop the explicit three alternative named arguments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The signature in Biopython release 1.85 is def counts(self):
, so without the three arguments.
The three possible choices for argument
are a wildcard character, a substitution matrix, or a PairwiseAligner
object. These are mutually exclusive. For example, it does not make sense to have both a wildcard character and an aligner object.
If we use three explicitly named keywords and e.g. wildcard
is the first keyword argument, then users can no longer do counts(aligner)
or counts(wildcard)
, but they have to use the keyword.
@@ -1282,7 +1103,7 @@ def parse_printed_alignment(cls, lines): | |||
nbytes, sequence = parser.feed(line) | |||
sequences.append(sequence) | |||
shape = parser.shape | |||
coordinates = np.empty(shape, np.int64) | |||
coordinates = np.empty(shape, np.intp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type change explains why so many files changed. Should this be transparent to users of the code, or will they need (breaking) changes too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are using Py_ssize_t
variables to represent indices. The correct dtype
for that is np.intp
. np.int64
will work on most platforms, but strictly speaking it is not correct.
It won't affect any users, except those that are working on an ancient platform where Py_ssize_t
is a 32-bit integer. But for those users, having np.in64
here will likely be a bug.
This PR extends the
.counts
method of theAlignment
class with one optional argument, which can be eitherPairwiseAligner
objectThe wildcard character is ignored when calculating identities and mismatches.
The substitution matrix is used to calculate the number of positives, and the total substitution score.
The
PairwiseAligner
object is used to also calculate the gap score, the substitution score, and the alignment score.Printing the
AlignmentCounts
object gives detailed information about the identities/mismatches/gaps/scores.I hereby agree to dual licence this and any previous contributions under both
the Biopython License Agreement AND the BSD 3-Clause License.
I have read the
CONTRIBUTING.rst
file, have runpre-commit
locally, and understand that continuous integration checks will be used to
confirm the Biopython unit tests and style checks pass with these changes.
I have added my name to the alphabetical contributors listings in the files
NEWS.rst
andCONTRIB.rst
as part of this pull request, am listedalready, or do not wish to be listed. (This acknowledgement is optional.)