A collection of models to predict conformation-independent MBIS atom-centred charges for molecules, built on the NAGL package by SimonBoothroyd.
The required dependencies to run these models can be installed using mamba
and the provided environment file:
mamba env create -f devtools/conda-envs/env.yaml
You will then need to install this package from source. First, clone the repository from GitHub:
git clone https://github.com/bismuthadams1/nagl-mbis.git
cd nagl-mbis
With the nagl environment activated, install the models via:
pip install -e . --no-build-isolation
NAGL-MBIS offers a number of pre-trained models to compute conformation-independent MBIS charges. These can be loaded in a script using the following code:
from naglmbis.models import load_charge_model
# load two pre-trained charge models
charge_model = load_charge_model(charge_model="nagl-gas-charge-wb")
# load a model trained to scf dipole and mbis charges
charge_model_2 = load_charge_model(charge_model="nagl-gas-charge-dipole-wb")
A list of the available models can be found in naglmbis/models/models.py.
We can then use these models to predict the corresponding properties for a given openff-toolkit Molecule object or rdkit Chem.Mol.
from openff.toolkit.topology import Molecule
# create ethanol
ethanol = Molecule.from_smiles("CCO")
# predict the charges (in e)
charges = charge_model.compute_properties(ethanol.to_rdkit())["mbis-charges"]
To compute partially polarised charges, we can use the class ComputePartialPolarised:
from openff.toolkit.topology import Molecule
from naglmbis.models.base_model import ComputePartialPolarised
from naglmbis.models import load_charge_model
# load matching gas- and water-phase models
gas_model = load_charge_model(charge_model="nagl-gas-charge-dipole-esp-wb-default")
water_model = load_charge_model(charge_model="nagl-water-charge-dipole-esp-wb-default")
polarised_model = ComputePartialPolarised(
    model_gas=gas_model,
    model_water=water_model,
    alpha=0.5,  # scaling parameter which can be adjusted
)
# create ethanol and predict its partially polarised charges
ethanol = Molecule.from_smiles("CCO")
partial_charges = polarised_model.compute_polarised_charges(ethanol.to_rdkit())
print(partial_charges)
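To build some intuition for alpha, the sketch below mixes two sets of per-atom charges by linear interpolation. This is an assumption about what a scaling parameter of this kind typically does, not a description of how ComputePartialPolarised is implemented. The function name mix_charges and the charge values are made up for illustration.

```python
def mix_charges(gas, water, alpha):
    """Linearly interpolate between gas- and water-phase charges (assumed scheme)."""
    if len(gas) != len(water):
        raise ValueError("gas and water charge lists must have the same length")
    return [g + alpha * (w - g) for g, w in zip(gas, water)]


# made-up charges (in e) for a three-atom fragment, not model output
gas_charges = [-0.60, 0.20, 0.40]
water_charges = [-0.70, 0.25, 0.45]

half_polarised = mix_charges(gas_charges, water_charges, alpha=0.5)
print(half_polarised)
```

Under this assumed scheme, alpha = 0 returns the gas-phase charges and alpha = 1 returns the water-phase charges.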
To use the charges in a simulation, we first assign them to the molecule and then create an Interchange object (following on from above):
from openff.toolkit import Quantity, unit
charges = polarised_model.compute_polarised_charges(ethanol.to_rdkit())
# Convert the charges to a 1D numpy array
charges = charges.detach().numpy().astype(float).squeeze()
# Assign the charges to the molecule and normalise them
ethanol.partial_charges = Quantity(
    charges,
    unit.elementary_charge,
)
ethanol._normalize_partial_charges()
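Predicted charges rarely sum exactly to the molecule's formal charge, which is why the normalisation step matters. A minimal sketch of one common scheme, spreading the residual evenly over all atoms, is given below. It is written in plain Python for illustration, the function name normalise_charges is hypothetical, and it should not be read as the toolkit's exact implementation.

```python
def normalise_charges(charges, total_charge=0.0):
    """Shift every charge by the same offset so the sum equals total_charge."""
    offset = (total_charge - sum(charges)) / len(charges)
    return [q + offset for q in charges]


# made-up charges (in e) whose sum is slightly off from zero
raw = [-0.41, 0.12, 0.13, 0.19]
normalised = normalise_charges(raw, total_charge=0.0)

print(sum(raw))         # small non-zero residual
print(sum(normalised))  # zero to within floating-point error
```

Because every atom receives the same offset, charge differences between atoms are unchanged.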
Now, create the Interchange object. Note that the charge_from_molecules argument is critical; otherwise we'll end up with AM1-BCC charges. You will also need to install openff-interchange, e.g. mamba install -c conda-forge openff-interchange.
from openff.toolkit import ForceField
from openff.interchange import Interchange
force_field = ForceField("openff-2.2.1.offxml")
interchange = Interchange.from_smirnoff(force_field=force_field, topology=[ethanol], charge_from_molecules=[ethanol])
print(ethanol.partial_charges)
You can then run a simulation with your engine of choice, for example with OpenMM as shown here.
This repository includes several partial charge models. The table below summarizes each model’s training objectives, the level of theory used for the training data (see details below), and the phase (gas or water) in which the QM data was calculated. For brevity:
- Q = on-atom charges
- μ = dipole moment
- V = electrostatic potential (ESP)
Model | Training Objective | Level of Theory of Training Set | Phase |
---|---|---|---|
nagl-v1-mbis | Q | HF/6-31G* - MBIS Charges | gas |
nagl-v1-mbis-dipole | Q, μ | HF/6-31G* - MBIS Charges | gas |
nagl-gas-charge-wb | Q | ωB97X-D/def2-TZVPP - MBIS Charges | gas |
nagl-gas-charge-dipole-wb | Q, μ | ωB97X-D/def2-TZVPP - MBIS Charges, QM Dipoles | gas |
nagl-gas-charge-dipole-esp-wb-default | Q, μ, V | ωB97X-D/def2-TZVPP - MBIS Charges, QM Dipoles, ESP rebuilt to 1.4-2.0 | gas |
nagl-water-charge-wb | Q | ωB97X-D/def2-TZVPP - MBIS Charges | water |
nagl-water-charge-dipole-wb | Q, μ | ωB97X-D/def2-TZVPP - MBIS Charges, QM Dipoles | water |
nagl-water-charge-dipole-esp-wb-default | Q, μ, V | ωB97X-D/def2-TZVPP - MBIS Charges, QM Dipoles, ESP rebuilt to 1.4-2.0 | water |
nagl-gas-esp-wb-2A | Q, | ωB97X-D/def2-TZVPP - MBIS Charges, QM Dipoles, ESP rebuilt to 1.4-2.0 | water |
nagl-gas-esp-wb-15A | Q, | ωB97X-D/def2-TZVPP - MBIS Charges, QM Dipoles, ESP rebuilt to 1.4-2.0 | water |
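For reference, the dipole and ESP objectives can both be evaluated from atom-centred charges alone. The sketch below uses the standard point-charge expressions, $\mu = \sum_i q_i r_i$ and $V(r) = \sum_i q_i / |r - r_i|$ (atomic units), on made-up charges and coordinates. The function names are illustrative, and the repository's actual loss functions may be defined differently.

```python
import math


def dipole(charges, coords):
    """Point-charge dipole: mu = sum_i q_i * r_i (e * bohr)."""
    return tuple(
        sum(q * r[axis] for q, r in zip(charges, coords)) for axis in range(3)
    )


def esp(charges, coords, point):
    """Point-charge ESP at `point`: V = sum_i q_i / |point - r_i| (atomic units)."""
    return sum(q / math.dist(point, r) for q, r in zip(charges, coords))


# made-up two-site example: +0.5 e and -0.5 e separated by 2 bohr along z
charges = [0.5, -0.5]
coords = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]

print(dipole(charges, coords))                # (0.0, 0.0, 1.0)
print(esp(charges, coords, (0.0, 0.0, 3.0)))  # 0.5/2 - 0.5/4 = 0.125
```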
This model uses a minimal set of basic atomic features, including:
- one-hot encoded element
- the number of bonds
- ring membership of size 3-8

It uses the following architecture and training hyperparameters:
- n_gcn_layers 5
- n_gcn_hidden_features 128
- n_mbis_layers 2
- n_mbis_hidden_features 64
- learning_rate 0.001
- n_epochs 1000
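As a rough illustration of the atomic features listed above, the sketch below builds a per-atom feature vector by concatenating a one-hot element encoding, a bond count, and ring-membership flags for ring sizes 3 to 8. The element list, the vector layout, and the function name are assumptions made for this example and are not taken from the NAGL featurizers.

```python
# hypothetical element vocabulary; the real models may use a different set
ELEMENTS = ["H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"]
RING_SIZES = range(3, 9)  # ring sizes 3 to 8 inclusive


def atom_features(element, n_bonds, ring_sizes):
    """Concatenate one-hot element, bond count, and ring-membership flags."""
    one_hot = [1.0 if element == e else 0.0 for e in ELEMENTS]
    rings = [1.0 if size in ring_sizes else 0.0 for size in RING_SIZES]
    return one_hot + [float(n_bonds)] + rings


# an aromatic carbon in benzene: 3 bonds, member of a 6-membered ring
features = atom_features("C", n_bonds=3, ring_sizes={6})
print(len(features))  # 10 elements + 1 bond count + 6 ring flags = 17
print(features)
```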
The models in this repo were trained on two sets of QM data.
- The models starting with nagl-v1:
  These models were trained on the OpenFF ESP Fragment Conformers v1.0 dataset, which is available on QCArchive.
  The dataset was computed using HF/6-31G* with PSI4 and was split 80:10:10 using the deepchem maxmin splitter.
- The rest of the models:
  These models were trained on the MLPepper RECAP Optimized Fragments v1.0 and MLPepper-RECAP-Optimized-Fragments-Add-Iodines-v1.0 datasets.
  The datasets were computed using ωB97X-D/def2-TZVPP with PSI4 and were split 80:10:10 using the deepchem maxmin splitter.
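The idea behind a max-min split is to pick held-out items that are as dissimilar from one another as possible, so that the held-out sets cover the data broadly. The toy sketch below applies greedy max-min picking to plain numbers with an absolute-difference distance. It is not deepchem's implementation, which operates on molecular fingerprints, and the function name and data are invented for illustration.

```python
def maxmin_pick(items, n_pick, distance, first=0):
    """Greedily pick indices, each time taking the item farthest from those already picked."""
    picked = [first]
    while len(picked) < n_pick:
        remaining = [i for i in range(len(items)) if i not in picked]
        # score each candidate by its distance to the nearest picked item
        best = max(
            remaining,
            key=lambda i: min(distance(items[i], items[j]) for j in picked),
        )
        picked.append(best)
    return picked


# toy "dataset" of 10 values; hold out 2 diverse items (20% of the data)
data = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 9.8, 9.9, 10.0]
held_out = maxmin_pick(data, n_pick=2, distance=lambda a, b: abs(a - b))
train = [i for i in range(len(data)) if i not in held_out]

print(held_out)    # [0, 9]
print(len(train))  # 8
```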
The training scripts are located in the scripts subfolder in this repo, which is split into further subfolders:
- dataset - this subfolder contains all the scripts to pull down the QM data from QCArchive.