cole-group/nagl-mbis: Testing out GCNN for charge prediction

NAGL-MBIS

License: MIT

A collection of models to predict conformation-independent MBIS atom-centred charges for molecules, built on the NAGL package by SimonBoothroyd.

Installation

The required dependencies to run these models can be installed using mamba and the provided environment file:

mamba env create -f devtools/conda-envs/env.yaml

You will then need to install this package from source. First, clone the repository from GitHub:

git clone https://github.com/bismuthadams1/nagl-mbis.git
cd nagl-mbis

With the nagl environment activated, install the models via:

pip install -e . --no-build-isolation 

Quick start

NAGL-MBIS offers a number of pre-trained models to compute conformation-independent MBIS charges. These can be loaded in a script using the following code:

from naglmbis.models import load_charge_model

# load a pre-trained model fitted to MBIS charges
charge_model = load_charge_model(charge_model="nagl-gas-charge-wb")
# load a pre-trained model fitted to both SCF dipoles and MBIS charges
charge_model_2 = load_charge_model(charge_model="nagl-gas-charge-dipole-wb")

A list of the available models can be found in naglmbis/models/models.py.

We can then use these models to predict the corresponding properties for a given openff-toolkit Molecule object or rdkit Chem.Mol.

from openff.toolkit.topology import Molecule

# create ethanol
ethanol = Molecule.from_smiles("CCO")
# predict the charges (in e)
charges = charge_model.compute_properties(ethanol.to_rdkit())["mbis-charges"]

To compute partially polarised charges, we can use the class ComputePartialPolarised:

from openff.toolkit.topology import Molecule
from naglmbis.models.base_model import ComputePartialPolarised
from naglmbis.models import load_charge_model

gas_model = load_charge_model(charge_model="nagl-gas-charge-dipole-esp-wb-default")
water_model = load_charge_model(charge_model="nagl-water-charge-dipole-esp-wb-default")

polarised_model = ComputePartialPolarised(
    model_gas=gas_model,
    model_water=water_model,
    alpha=0.5,  # scaling parameter which can be adjusted
)

partial_charges = polarised_model.compute_polarised_charges(ethanol.to_rdkit())
print(partial_charges)
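
The README does not state how alpha combines the two sets of charges. As a rough illustration only, the sketch below assumes a linear interpolation between the gas-phase and water-phase charges, with alpha = 0 giving pure gas charges and alpha = 1 pure water charges. Both the formula and its direction are assumptions, and blend_charges is a hypothetical helper that is not part of naglmbis.

```python
from typing import List, Sequence


def blend_charges(gas: Sequence[float], water: Sequence[float], alpha: float) -> List[float]:
    """Interpolate per-atom charges between gas and water values.

    ASSUMPTION: q = (1 - alpha) * q_gas + alpha * q_water.
    ComputePartialPolarised may use a different expression.
    """
    if len(gas) != len(water):
        raise ValueError("gas and water charge lists must have the same length")
    return [(1.0 - alpha) * g + alpha * w for g, w in zip(gas, water)]


# Made-up charges (in e) for a three-atom toy system, not model output
gas_charges = [-0.60, 0.30, 0.30]
water_charges = [-0.80, 0.40, 0.40]
half_polarised = blend_charges(gas_charges, water_charges, alpha=0.5)
print(half_polarised)
```

Under this assumption, adjusting alpha moves every atom's charge along the line between its gas and water values, so the total charge is preserved whenever both inputs share the same total.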

Using the charges in a simulation

To use the charges in a simulation, we first create an Interchange object (following on from above):

from openff.toolkit import Quantity, unit

charges = polarised_model.compute_polarised_charges(ethanol.to_rdkit())

# Convert the charges to a 1D numpy array
charges = charges.detach().numpy().astype(float).squeeze()

# Assign the charges to the molecule and normalise them
ethanol.partial_charges = Quantity(
    charges,
    unit.elementary_charge,
)
ethanol._normalize_partial_charges()
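
Predicted charges need not sum exactly to the molecule's net charge, which is why the snippet above normalises them. The sketch below shows one common way to do this, assuming normalisation means adding the same offset to every atom so that the charges sum to the target total. normalize_charges is a hypothetical stand-alone helper written for illustration; it is not the toolkit's code.

```python
from typing import List, Sequence


def normalize_charges(charges: Sequence[float], total_charge: float = 0.0) -> List[float]:
    """Shift every charge by the same offset so the sum equals total_charge."""
    offset = (total_charge - sum(charges)) / len(charges)
    return [q + offset for q in charges]


# Made-up charges (in e) that sum to -0.01 instead of 0
raw = [-0.41, 0.12, 0.13, 0.15]
normalized = normalize_charges(raw, total_charge=0.0)
print(sum(raw), sum(normalized))
```

A uniform offset keeps the differences between atomic charges unchanged, so only the total is corrected.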

Now create the Interchange object. The charge_from_molecules argument is critical: without it, we end up with AM1-BCC charges. You will also need to install openff-interchange, e.g. mamba install -c conda-forge openff-interchange.

from openff.toolkit import ForceField
from openff.interchange import Interchange

force_field = ForceField("openff-2.2.1.offxml")
interchange = Interchange.from_smirnoff(
    force_field=force_field,
    topology=[ethanol],
    charge_from_molecules=[ethanol],
)
print(ethanol.partial_charges)

You can then run a simulation with your engine of choice, for example with OpenMM as shown here.

Models

Summary of Models

This repository includes several partial charge models. The table below summarizes each model’s training objectives, the level of theory used for the training data (see details below), and the phase (gas or water) in which the QM data was calculated. For brevity:

  • Q = on-atom charges

  • μ = dipole moment

  • V = electrostatic potential (ESP)

| Model | Training Objective | Level of Theory of Training Set | Phase |
|---|---|---|---|
| nagl-v1-mbis | Q | HF/6-31G* - MBIS Charges | gas |
| nagl-v1-mbis-dipole | Q, μ | HF/6-31G* - MBIS Charges | gas |
| nagl-gas-charge-wb | Q | ωB97X-D/def2-TZVPP - MBIS Charges | gas |
| nagl-gas-charge-dipole-wb | Q, μ | ωB97X-D/def2-TZVPP - MBIS Charges, QM Dipoles | gas |
| nagl-gas-charge-dipole-esp-wb-default | Q, μ, V | ωB97X-D/def2-TZVPP - MBIS Charges, QM Dipoles, ESP rebuilt to 1.4-2.0 × VdW with 0.5 Å spacing grid up to MBIS Quadrupole | gas |
| nagl-water-charge-wb | Q | ωB97X-D/def2-TZVPP - MBIS Charges | water |
| nagl-water-charge-dipole-wb | Q, μ | ωB97X-D/def2-TZVPP - MBIS Charges, QM Dipoles | water |
| nagl-water-charge-dipole-esp-wb-default | Q, μ, V | ωB97X-D/def2-TZVPP - MBIS Charges, QM Dipoles, ESP rebuilt to 1.4-2.0 × VdW with 0.5 Å spacing grid up to MBIS Quadrupole | water |
| nagl-gas-esp-wb-2A | Q, μ, V | ωB97X-D/def2-TZVPP - MBIS Charges, QM Dipoles, ESP rebuilt to 1.4-2.0 × VdW with 2 Å spacing grid up to MBIS Quadrupole | water |
| nagl-gas-esp-wb-15A | Q, μ, V | ωB97X-D/def2-TZVPP - MBIS Charges, QM Dipoles, ESP rebuilt to 1.4-2.0 × VdW with 1.5 Å spacing grid up to MBIS Quadrupole | water |
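
When choosing a model programmatically, for instance a matching gas/water pair, the table can be mirrored in a small lookup. The sketch below copies a subset of the rows above into a dictionary. MODELS and find_models are hypothetical names used only for illustration; the authoritative list of models remains naglmbis/models/models.py.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Subset of the summary table: model name -> (training objectives, phase).
# "mu" stands for the dipole moment and "V" for the ESP.
MODELS: Dict[str, Tuple[Set[str], str]] = {
    "nagl-gas-charge-wb": ({"Q"}, "gas"),
    "nagl-gas-charge-dipole-wb": ({"Q", "mu"}, "gas"),
    "nagl-gas-charge-dipole-esp-wb-default": ({"Q", "mu", "V"}, "gas"),
    "nagl-water-charge-wb": ({"Q"}, "water"),
    "nagl-water-charge-dipole-wb": ({"Q", "mu"}, "water"),
    "nagl-water-charge-dipole-esp-wb-default": ({"Q", "mu", "V"}, "water"),
}


def find_models(phase: str, objectives: Iterable[str]) -> List[str]:
    """Return the model names trained in `phase` on exactly `objectives`."""
    wanted = set(objectives)
    return sorted(
        name for name, (objs, ph) in MODELS.items() if ph == phase and objs == wanted
    )


# Look up the gas and water models trained on charges, dipoles and ESP
gas_name = find_models("gas", ["Q", "mu", "V"])[0]
water_name = find_models("water", ["Q", "mu", "V"])[0]
print(gas_name, water_name)
```

The two names returned here are the same pair passed to load_charge_model in the partially polarised example earlier in this README.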

MBISGraphModel

This model uses a minimal set of basic atomic features:

  • one-hot encoded element
  • the number of bonds
  • ring membership of size 3-8

The architecture and training hyperparameters are:

  • n_gcn_layers 5
  • n_gcn_hidden_features 128
  • n_mbis_layers 2
  • n_mbis_hidden_features 64
  • learning_rate 0.001
  • n_epochs 1000
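
To make the three atomic features listed above concrete, here is a minimal sketch of how a per-atom feature vector could be assembled. The element vocabulary, the raw (rather than one-hot) bond count, and the ordering of the entries are all assumptions made for illustration; the actual featurisation is done by NAGL and may differ.

```python
from typing import Iterable, List

# ASSUMPTION: a hypothetical element vocabulary; the trained models may use a different set.
ELEMENTS = ["H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"]
RING_SIZES = range(3, 9)  # ring sizes 3 to 8 inclusive


def atom_features(element: str, n_bonds: int, ring_sizes: Iterable[int]) -> List[float]:
    """Concatenate a one-hot element, the bond count and ring-membership flags."""
    one_hot = [1.0 if element == e else 0.0 for e in ELEMENTS]
    in_rings = set(ring_sizes)
    ring_flags = [1.0 if size in in_rings else 0.0 for size in RING_SIZES]
    # ASSUMPTION: the bond count is kept as a single number rather than one-hot encoded
    return one_hot + [float(n_bonds)] + ring_flags


# An aromatic carbon in benzene: three bonds, member of one six-membered ring
benzene_carbon = atom_features("C", n_bonds=3, ring_sizes=[6])
print(len(benzene_carbon), benzene_carbon)
```

With this layout each atom is described by 17 numbers: 10 for the element, 1 for the bond count and 6 for the ring sizes.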

The models in this repo were trained on two QM datasets.

  1. The models starting with nagl-v1:

These models were trained on the OpenFF ESP Fragment Conformers v1.0 dataset which is on QCArchive.

The training data were computed at HF/6-31G* with PSI4 and split 80:10:10 using the DeepChem MaxMin splitter.

  2. The rest of the models:

These models were trained on the MLPepper RECAP Optimized Fragments v1.0 and MLPepper-RECAP-Optimized-Fragments-Add-Iodines-v1.0 datasets.

The training data were computed at ωB97X-D/def2-TZVPP with PSI4 and split 80:10:10 using the DeepChem MaxMin splitter.
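
For readers unfamiliar with max-min splitting, the sketch below illustrates the general idea: greedily pick items that are as far as possible from everything already picked, hold those out, and train on the rest. This is not DeepChem's implementation. Here plain numbers stand in for molecules, and the starting item, the distance function and which held-out subset receives which picks are all assumptions.

```python
from typing import Callable, List, Tuple


def maxmin_pick(n_items: int, n_pick: int, dist: Callable[[int, int], float], first: int = 0) -> List[int]:
    """Greedy max-min selection: repeatedly add the item farthest from the picked set."""
    picked = [first]
    # Distance from every item to its nearest already-picked item
    nearest = [dist(i, first) for i in range(n_items)]
    while len(picked) < n_pick:
        candidate = max(
            (i for i in range(n_items) if i not in picked), key=lambda i: nearest[i]
        )
        picked.append(candidate)
        for i in range(n_items):
            nearest[i] = min(nearest[i], dist(i, candidate))
    return picked


def split_80_10_10(n_items: int, dist: Callable[[int, int], float]) -> Tuple[List[int], List[int], List[int]]:
    """Hold out 10% + 10% of diverse items; the remaining 80% form the training set."""
    n_valid = n_items // 10
    n_test = n_items // 10
    diverse = maxmin_pick(n_items, n_valid + n_test, dist)
    valid, test = diverse[:n_valid], diverse[n_valid:]
    held_out = set(diverse)
    train = [i for i in range(n_items) if i not in held_out]
    return train, valid, test


# Toy data: 20 points on a line, distance = absolute difference
points = [float(i) for i in range(20)]
train, valid, test = split_80_10_10(len(points), lambda i, j: abs(points[i] - points[j]))
print(len(train), len(valid), len(test))
```

Holding out the most mutually dissimilar items makes the validation and test sets span the dataset broadly instead of clustering around common chemistry.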

Training

The training scripts are located in the scripts subfolder of this repo, which is split into further subfolders:

  1. dataset - this subfolder contains all the scripts to pull down the QM data from QCArchive.
