HongxinXiang/EDBench


EDBench: Large-Scale Electron Density Data for Molecular Modeling

Python 3.7+



📢 News

  • [2025/05/13] Uploaded code for the prediction tasks with X-3D and PointVector.

  • [2025/05/10] Repository initialized!


🧪 1. Summary

Most existing molecular machine learning force fields (MLFFs) focus on atom- or molecule-level properties like energy and forces, while overlooking the foundational role of electron density (ED), denoted as $\rho(r)$. According to the Hohenberg–Kohn theorem, ED uniquely determines all ground-state properties of many-body quantum systems. However, ED is expensive to compute via first-principles methods such as Density Functional Theory (DFT), limiting its large-scale use in MLFFs.

EDBench addresses this gap by providing a large-scale, high-quality dataset of electron densities for over 3.3 million molecules, based on the PCQM4Mv2 standard. To benchmark electronic-scale learning, we introduce a suite of ED-centric tasks covering:

  • Prediction of quantum chemical properties
  • Retrieval across structure and ED modalities
  • Generation of ED from molecular structures

We demonstrate that ML models can learn from ED with high accuracy and also generate high-quality ED, dramatically reducing DFT costs. All data and benchmarks will be made publicly available to support ED-driven research in drug discovery and materials science.

📄 Citation

@misc{xiang2025edbenchlargescaleelectrondensity,
  title        = {EDBench: Large-Scale Electron Density Data for Molecular Modeling},
  author       = {Hongxin Xiang and Ke Li and Mingquan Liu and Zhixiang Cheng and Bin Yao and Wenjie Du and Jun Xia and Li Zeng and Xin Jin and Xiangxiang Zeng},
  year         = {2025},
  eprint       = {2505.09262},
  archivePrefix= {arXiv},
  primaryClass = {physics.chem-ph},
  url          = {https://arxiv.org/abs/2505.09262}
}


🧬 2. EDBench Database

Built on PCQM4Mv2, the EDBench dataset contains accurate DFT-computed EDs for 3.3M+ molecules, enabling deep learning at the electronic scale.


🧪 3. Benchmark Tasks

We design a suite of benchmark tasks centered on electron density (ED):

🔮 3.1 Prediction Tasks

Predict quantum chemical properties from ED representations.

📂 Directory structure

{benchmark_root}
├── ed_energy_5w
│   ├── raw
│   │   ├── ed_energy_5w.csv
│   │   ├── readme.md
│   │   └── psi4_grid0.4_cubes
│   │       └── {mol_index}
│   │           ├── Mol1_Dt.cube
│   │           ├── timer.dat
│   │           ├── Mol1.sdf
│   │           ├── Mol1_ESP.cube
│   │           └── {mol_index}_Psi4.out
│   └── processed
│       └── mol_EDthresh{thresh}_data.pkl
├── ed_homo_lumo_5w
│   ├── raw
│   │   ├── ed_homo_lumo_5w.csv
│   │   ├── readme.md
│   │   └── psi4_grid0.4_cubes
│   └── processed
│       └── mol_EDthresh{thresh}_data.pkl
├── ed_multipole_moments_5w
│   ├── raw
│   │   ├── ed_multipole_moments_5w.csv
│   │   ├── readme.md
│   │   └── psi4_grid0.4_cubes
│   └── processed
│       └── mol_EDthresh{thresh}_data.pkl
└── ed_open_shell_5w
    ├── raw
    │   ├── ed_open_shell_5w.csv
    │   ├── readme.md
    │   └── psi4_grid0.4_cubes
    └── processed
        └── mol_EDthresh{thresh}_data.pkl

| Dataset | Dir Name | Link | Description |
|---|---|---|---|
| ED5-EC | ed_energy_5w | Dataverse | 6 energy components (DF-RKS Final Energy, Nuclear Repulsion Energy, One-Electron Energy, Two-Electron Energy, DFT Exchange-Correlation Energy, Total Energy) |
| ED5-OE | ed_homo_lumo_5w | Dataverse | 7 orbital energies (HOMO-2, HOMO-1, HOMO-0, LUMO+0, LUMO+1, LUMO+2, LUMO+3) |
| ED5-MM | ed_multipole_moments_5w | Dataverse | 4 multipole moments (Dipole X, Dipole Y, Dipole Z, Magnitude) |
| ED5-OCS | ed_open_shell_5w | Dataverse | Binary classification of open-/closed-shell systems |
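
The raw electron densities are stored as `.cube` files (for example `Mol1_Dt.cube`). The following is a minimal sketch of a reader, assuming these files follow the standard Gaussian cube layout (two comment lines, an atom-count/origin line, three grid-axis lines, one line per atom, then the volumetric values) with positive atom and voxel counts. The function names `read_cube` and `integrate`, and the tiny `SAMPLE_CUBE` string, are illustrative and not part of the EDBench code.

```python
def read_cube(text):
    """Parse the text of a cube file into a plain dictionary.

    Assumes the standard cube layout with a positive atom count
    (orbital cubes with an extra index line are not handled).
    """
    lines = text.splitlines()
    # Lines 0 and 1 are free-form comments; line 2 holds atom count and origin.
    header = lines[2].split()
    n_atoms = int(header[0])
    origin = tuple(float(v) for v in header[1:4])

    # Lines 3-5: number of voxels along each axis and the axis step vector.
    shape, axes = [], []
    for line in lines[3:6]:
        parts = line.split()
        shape.append(int(parts[0]))
        axes.append(tuple(float(v) for v in parts[1:4]))

    # One line per atom: atomic number, charge, x, y, z.
    atoms = []
    for line in lines[6:6 + n_atoms]:
        parts = line.split()
        atoms.append((int(parts[0]), tuple(float(v) for v in parts[2:5])))

    # Remaining tokens are the grid values (x outermost, z innermost).
    values = [float(v) for line in lines[6 + n_atoms:] for v in line.split()]
    if len(values) != shape[0] * shape[1] * shape[2]:
        raise ValueError("grid size does not match the number of values")

    return {"origin": origin, "shape": tuple(shape), "axes": axes,
            "atoms": atoms, "values": values}


def integrate(cube):
    """Sum the grid values times the voxel volume (|det| of the axis vectors)."""
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = cube["axes"]
    det = ax * (by * cz - bz * cy) - ay * (bx * cz - bz * cx) + az * (bx * cy - by * cx)
    return sum(cube["values"]) * abs(det)


# Synthetic 2x2x2 example (not EDBench data) to exercise the parser.
SAMPLE_CUBE = """comment line 1
comment line 2
    1    0.000000    0.000000    0.000000
    2    0.500000    0.000000    0.000000
    2    0.000000    0.500000    0.000000
    2    0.000000    0.000000    0.500000
    1    1.000000    0.250000    0.250000    0.250000
  1.0 1.0 1.0 1.0 1.0 1.0
  1.0 1.0
"""

cube = read_cube(SAMPLE_CUBE)
```

For a total-density cube, `integrate(cube)` should come out close to the number of electrons in the molecule, which makes it a quick sanity check after loading a file.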

🔍 3.2 Retrieval Task

Cross-modal retrieval between molecular structures (MS) and electron densities (ED).

📂 Directory structure

{benchmark_root}
├── ed_retrieval_5w/
│   ├── raw/
│   │   ├── ed_retrieval_5w.csv
│   │   ├── readme.md
│   │   └── psi4_grid0.4_cubes/
│   └── processed/
│       └── mol_EDthresh{thresh}_data.pkl

| Dataset | Dir Name | Link | Description |
|---|---|---|---|
| ED5-MER | ed_retrieval_5w | Dataverse | Cross-modal retrieval: MS ↔ ED |

🧬 3.3 Generation Task

Generate ED representations from molecular structures.

📂 Directory structure

{benchmark_root}
├── ed_prediction_5w/
│   ├── raw/
│   │   ├── ed_prediction_5w.csv
│   │   ├── readme.md
│   │   └── psi4_grid0.4_cubes/
│   └── processed/
│       └── mol_EDthresh{thresh}_data.pkl

| Dataset | Dir Name | Link | Description |
|---|---|---|---|
| ED5-EDP | ed_prediction_5w | Dataverse | Predict ED from molecular structures |

📂 3.4 Dataset File Format

Each raw/ directory includes a .csv summary file describing each molecule.

📌 Common Columns

  • index: Molecule index
  • smiles: Original SMILES
  • canonical_smiles: Canonicalized SMILES
  • scaffold_split: Scaffold-based split (80% train / 10% valid / 10% test)
  • random_split: Random split (80% train / 10% valid / 10% test)
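
As a rough illustration of how the split columns might be used, the sketch below groups molecule indices by split with Python's standard `csv` module. It assumes the split columns hold the strings `train`, `valid`, and `test`; the actual encoding is described in each dataset's readme.md and may differ. The helper `group_by_split` and the rows in `SAMPLE_CSV` are made up for this example.

```python
import csv
import io
from collections import defaultdict


def group_by_split(csv_file, split_column="scaffold_split"):
    """Return {split_name: [molecule indices]} for the chosen split column."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row[split_column]].append(int(row["index"]))
    return dict(groups)


# Made-up rows using the common column names; the split values are an assumption.
SAMPLE_CSV = """index,smiles,canonical_smiles,scaffold_split,random_split
0,CCO,CCO,train,test
1,c1ccccc1,c1ccccc1,valid,train
2,CC(=O)O,CC(=O)O,train,train
"""

scaffold_groups = group_by_split(io.StringIO(SAMPLE_CSV))
random_groups = group_by_split(io.StringIO(SAMPLE_CSV), split_column="random_split")
```

With a real file, the `io.StringIO(...)` argument would be replaced by an open file handle, for example `open("ed_energy_5w.csv", newline="")`.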

🧾 Task-Specific Columns

  • Prediction:
    • label: Ground-truth values (space-separated if multi-task)
  • Retrieval:
    • negative_index: Space-separated indices of 10 negative samples
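
Because both task-specific fields are space-separated strings, they need to be split before use. The sketch below shows one way to do this for a row already read as a dictionary (for example via `csv.DictReader`). It assumes prediction labels are numeric and that the row's own `index` is the positive sample for retrieval. The helper names and example values are hypothetical.

```python
def parse_label(field):
    """Split a space-separated label string into a list of floats."""
    return [float(v) for v in field.split()]


def retrieval_candidates(row):
    """Build the candidate list for one retrieval query.

    Assumption: the row's own index is the positive sample, and
    negative_index lists the negatives to rank it against.
    """
    positive = int(row["index"])
    negatives = [int(v) for v in row["negative_index"].split()]
    return [positive] + negatives


# Made-up values, only to show the shapes involved.
labels = parse_label("-154.7 82.3 -391.2")
candidates = retrieval_candidates(
    {"index": "7", "negative_index": "1 2 3 4 5 6 8 9 10 11"}
)
```

A single-task dataset yields a one-element list from `parse_label`, so downstream code can treat single- and multi-task labels uniformly.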

🚀 4. Running Benchmarks

⚛️ 4.1 Prediction Tasks

The code and detailed instructions for running prediction tasks can be found in this 📂 directory.


📬 Contact

Feel free to open an issue or pull request for questions or contributions. For academic inquiries, contact the authors upon paper publication.


📘 License

Released for research use under an open-source MIT license.
