Slicing will no longer drop the NaN rows by xiki-tempula · Pull Request #275 · alchemistry/alchemlyb · GitHub

Merged · 10 commits · Dec 5, 2022
CHANGES (1 change: 1 addition, 0 deletions)
@@ -22,6 +22,7 @@ Fixes
is not found (issue #272, PR #273).
- The regex in the AMBER parser now reads also 'field=value' pairs where
there are no spaces around the equal sign (issue #272, PR #273).
- Pre-processing function slicing will not drop NaN rows (issue #274, PR #275).


10/31/2022 orbeckst, xiki-tempula, DrDomenicoMarson

src/alchemlyb/preprocessing/subsampling.py (6 changes: 3 additions, 3 deletions)
@@ -380,6 +380,9 @@ def slicing(df, lower=None, upper=None, step=None, force=False):
DataFrame
`df` subsampled.


.. versionchanged:: 1.0.1
The rows with NaN values are not dropped by default.
"""
try:
df = df.loc[lower:upper:step]
@@ -391,9 +394,6 @@ def slicing(df, lower=None, upper=None, step=None, force=False):
"to use slicing on DataFrames with unique time values "
"for each row. Use `force=True` to ignore this error.")

# drop any rows that have missing values
df = df.dropna()

return df
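To illustrate the behavioural change, here is a minimal pandas sketch, not alchemlyb code. It assumes a toy, NAMD-like `u_nk` frame in which each row has values only for adjacent lambdas, plus two hypothetical helpers, `slice_old` and `slice_new`, that mimic the `df.loc[lower:upper:step]` slice with and without the removed `dropna()` call. The `force` flag and the `KeyError` handling of the real `slicing` are deliberately omitted.

```python
import numpy as np
import pandas as pd


def slice_old(df, lower=None, upper=None, step=None):
    """Hypothetical helper mimicking the old behaviour: slice, then drop NaN rows."""
    return df.loc[lower:upper:step].dropna()


def slice_new(df, lower=None, upper=None, step=None):
    """Hypothetical helper mimicking the behaviour after PR #275: slice only."""
    return df.loc[lower:upper:step]


# Toy u_nk-like frame (invented data): one column per target lambda, indexed
# by time. As with NAMD output, the non-adjacent lambda (0.0) has no value,
# so every row contains at least one NaN.
time = pd.Index([0.0, 1.0, 2.0, 3.0], name="time")
u_nk = pd.DataFrame(
    {0.0: np.nan, 0.5: 0.0, 1.0: [1.2, 1.1, 1.3, 1.0]},
    index=time,
)

print(len(slice_old(u_nk)))  # 0: dropna() removes every row containing a NaN
print(len(slice_new(u_nk)))  # 4: all rows are kept
print(len(slice_new(u_nk, lower=1.0, upper=2.0)))  # 2: .loc label slices include both ends
```

With the old behaviour every row of such a frame is discarded, because `dropna()` removes any row with at least one missing value; without that call the slice returns all rows in the requested label range.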


src/alchemlyb/tests/test_preprocessing.py (25 changes: 19 additions, 6 deletions)
@@ -1,21 +1,21 @@
"""Tests for preprocessing functions.

"""
import pytest

import alchemtest.gmx
import numpy as np
import pytest
from alchemtest.gmx import load_benzene
from alchemtest.namd import load_idws
from numpy.testing import assert_allclose

import alchemlyb
from alchemlyb.parsing import gmx
from alchemlyb.parsing import gmx, namd
from alchemlyb.parsing.gmx import extract_u_nk, extract_dHdl
from alchemlyb.preprocessing import (slicing, statistical_inefficiency,
equilibrium_detection,
decorrelate_u_nk, decorrelate_dhdl,
u_nk2series, dhdl2series)
from alchemlyb.parsing.gmx import extract_u_nk, extract_dHdl
from alchemtest.gmx import load_benzene, load_ABFE

import alchemtest.gmx

def gmx_benzene_dHdl():
    dataset = alchemtest.gmx.load_benzene()
@@ -86,6 +86,19 @@ def slicer(self, *args, **kwargs):
    def test_basic_slicing(self, data, size):
        assert len(self.slicer(data, lower=1000, upper=34000, step=5)) == size

    def test_unchanged(self):
        # NAMD energy files only have dE for adjacent lambdas, this ensures
        # that the slicer will not drop these rows as they have NaN values.
        file = load_idws().data['forward'][0]
        u_nk = namd.extract_u_nk(file, 298)

        # Do the pre-processing as the u_nk are from all lambdas
        groups = u_nk.groupby('fep-lambda')
        for key, group in groups:
            group = group[~group.index.duplicated(keep='first')]
            df = self.slicer(group, None, None, None)
            assert len(df) == len(group)

@pytest.mark.parametrize(('dataloader', 'lower', 'upper'),
[
('gmx_benzene_dHdl_fixture', 1000, 34000),
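The new `test_unchanged` test splits `u_nk` by `fep-lambda` and keeps only the first occurrence of each repeated index label before slicing. The sketch below reproduces that preprocessing with plain pandas on a small hypothetical frame (invented data, with one deliberately repeated `(time, fep-lambda)` label) instead of the NAMD parser; `dedup_per_lambda` is an illustrative helper, not part of alchemlyb.

```python
import numpy as np
import pandas as pd

# Hypothetical u_nk-like frame: (time, fep-lambda) MultiIndex, one column per
# target lambda. The (1.0, 0.0) label is repeated on purpose, and rows sampled
# at lambda 0.0 have no value for the non-adjacent lambda 1.0 (NaN).
index = pd.MultiIndex.from_tuples(
    [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (0.0, 0.5), (1.0, 0.5)],
    names=["time", "fep-lambda"],
)
u_nk = pd.DataFrame(
    {
        0.0: [0.0, 0.0, 0.0, 0.9, 1.1],
        0.5: [1.2, 1.3, 1.3, 0.0, 0.0],
        1.0: [np.nan, np.nan, np.nan, 1.4, 1.2],
    },
    index=index,
)


def dedup_per_lambda(df):
    """Split by the fep-lambda index level; keep the first copy of each label."""
    groups = {}
    for key, group in df.groupby(level="fep-lambda"):
        groups[key] = group[~group.index.duplicated(keep="first")]
    return groups


for key, group in dedup_per_lambda(u_nk).items():
    # Same call pattern as slicing(group, None, None, None): a full .loc slice.
    sliced = group.loc[None:None:None]
    # With no dropna() afterwards, the lambda 0.0 rows (which contain NaN) survive.
    print(key, len(group), len(sliced))
```

Under the pre-1.0.1 behaviour, the lambda 0.0 group would have been emptied by the trailing `dropna()`; the test's `len(df) == len(group)` check guards against exactly that.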