- News
- 1. Summary
- 2. EDBench Database
- 3. Benchmark Tasks
- 4. Running Benchmarks
- Contact
- License
- [2025/05/13] Uploaded code of prediction tasks with X-3D and PointVector.
- [2025/05/10] Repository initialized!
Most existing molecular machine learning force fields (MLFFs) focus on atom- or molecule-level properties such as energy and forces, while overlooking the foundational role of electron density (ED).
EDBench addresses this gap by providing a large-scale, high-quality dataset of electron densities for over 3.3 million molecules, based on the PCQM4Mv2 standard. To benchmark electronic-scale learning, we introduce a suite of ED-centric tasks covering:
- Prediction of quantum chemical properties
- Retrieval across structure and ED modalities
- Generation of ED from molecular structures
We demonstrate that ML models can learn from ED with high accuracy and also generate high-quality ED, dramatically reducing DFT costs. All data and benchmarks will be made publicly available to support ED-driven research in drug discovery and materials science.
Citation
@misc{xiang2025edbenchlargescaleelectrondensity,
  title         = {EDBench: Large-Scale Electron Density Data for Molecular Modeling},
  author        = {Hongxin Xiang and Ke Li and Mingquan Liu and Zhixiang Cheng and Bin Yao and Wenjie Du and Jun Xia and Li Zeng and Xin Jin and Xiangxiang Zeng},
  year          = {2025},
  eprint        = {2505.09262},
  archivePrefix = {arXiv},
  primaryClass  = {physics.chem-ph},
  url           = {https://arxiv.org/abs/2505.09262}
}
Built on PCQM4Mv2, the EDBench dataset contains accurate DFT-computed EDs for 3.3M+ molecules, enabling deep learning at the electronic scale.
We design a suite of benchmark tasks centered on electron density (ED):
Predict quantum chemical properties from ED representations.
Directory structure:
{benchmark_root}
├── ed_energy_5w
│   ├── raw
│   │   ├── ed_energy_5w.csv
│   │   ├── readme.md
│   │   └── psi4_grid0.4_cubes
│   │       └── {mol_index}
│   │           ├── Mol1_Dt.cube
│   │           ├── timer.dat
│   │           ├── Mol1.sdf
│   │           ├── Mol1_ESP.cube
│   │           └── {mol_index}_Psi4.out
│   └── processed
│       └── mol_EDthresh{thresh}_data.pkl
├── ed_homo_lumo_5w
│   ├── raw
│   │   ├── ed_homo_lumo_5w.csv
│   │   ├── readme.md
│   │   └── psi4_grid0.4_cubes
│   └── processed
│       └── mol_EDthresh{thresh}_data.pkl
├── ed_multipole_moments_5w
│   ├── raw
│   │   ├── ed_multipole_moments_5w.csv
│   │   ├── readme.md
│   │   └── psi4_grid0.4_cubes
│   └── processed
│       └── mol_EDthresh{thresh}_data.pkl
└── ed_open_shell_5w
    ├── raw
    │   ├── ed_open_shell_5w.csv
    │   ├── readme.md
    │   └── psi4_grid0.4_cubes
    └── processed
        └── mol_EDthresh{thresh}_data.pkl
Dataset | Dir Name | Link | Description |
---|---|---|---|
ED5-EC | ed_energy_5w | Dataverse | 6 energy components (DF-RKS Final Energy, Nuclear Repulsion Energy, One-Electron Energy, Two-Electron Energy, DFT Exchange-Correlation Energy, Total Energy) |
ED5-OE | ed_homo_lumo_5w | Dataverse | 7 orbital energies (HOMO-2, HOMO-1, HOMO-0, LUMO+0, LUMO+1, LUMO+2, LUMO+3) |
ED5-MM | ed_multipole_moments_5w | Dataverse | 4 multipole moments (Dipole X, Dipole Y, Dipole Z, Magnitude) |
ED5-OCS | ed_open_shell_5w | Dataverse | Binary classification of open-/closed-shell systems |
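The `.cube` files listed in the directory tree can be inspected without any chemistry toolkit. The following is a minimal sketch of a reader, assuming the files follow the common Gaussian cube text layout (two comment lines, an atom-count/origin line, three grid-axis lines, one line per atom, then the volumetric values with the z index varying fastest). The names `Cube`, `parse_cube`, `value_at`, and `SAMPLE_CUBE` are hypothetical helpers for illustration, and the sample grid is made up rather than taken from EDBench.

```python
from typing import List, NamedTuple, Tuple

Vec3 = Tuple[float, float, float]


class Cube(NamedTuple):
    origin: Vec3                                   # grid origin
    shape: Tuple[int, int, int]                    # number of points per axis
    axes: List[Vec3]                               # step vector for each axis
    atoms: List[Tuple[int, float, float, float]]   # (atomic number, x, y, z)
    values: List[float]                            # flat grid, z varies fastest


def parse_cube(text: str) -> Cube:
    """Parse the text of a cube file (assumes a non-negative atom count)."""
    lines = text.splitlines()
    # Lines 0 and 1 are free-form comments; line 2 holds atom count and origin.
    head = lines[2].split()
    n_atoms = int(head[0])
    if n_atoms < 0:
        raise ValueError("orbital-style cube (negative atom count) is not handled")
    origin = (float(head[1]), float(head[2]), float(head[3]))

    counts: List[int] = []
    axes: List[Vec3] = []
    for line in lines[3:6]:
        parts = line.split()
        # The sign of the point count only encodes the length unit.
        counts.append(abs(int(parts[0])))
        axes.append((float(parts[1]), float(parts[2]), float(parts[3])))

    atoms = []
    for line in lines[6:6 + n_atoms]:
        parts = line.split()
        # Column 1 is the nuclear charge, which this sketch skips.
        atoms.append((int(parts[0]), float(parts[2]), float(parts[3]), float(parts[4])))

    values = [float(tok) for line in lines[6 + n_atoms:] for tok in line.split()]
    shape = (counts[0], counts[1], counts[2])
    if len(values) != shape[0] * shape[1] * shape[2]:
        raise ValueError("grid size does not match the number of values")
    return Cube(origin, shape, axes, atoms, values)


def value_at(cube: Cube, i: int, j: int, k: int) -> float:
    """Return the grid value at integer grid indices (i, j, k)."""
    _, ny, nz = cube.shape
    return cube.values[(i * ny + j) * nz + k]


# Made-up 2x2x2 grid with a single atom, for illustration only.
SAMPLE_CUBE = """\
sample cube
illustrative data only
    1    0.000000    0.000000    0.000000
    2    0.400000    0.000000    0.000000
    2    0.000000    0.400000    0.000000
    2    0.000000    0.000000    0.400000
    8    8.000000    0.100000    0.200000    0.300000
 1.0E-01 2.0E-01 3.0E-01 4.0E-01 5.0E-01 6.0E-01
 7.0E-01 8.0E-01
"""
```

To read a real file, pass its contents to the parser, for example `parse_cube(open(path).read())`, where `path` points at one of the `.cube` files under `psi4_grid0.4_cubes`.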
Cross-modal retrieval between molecular structures (MS) and electron densities (ED).
Directory structure:
{benchmark_root}
└── ed_retrieval_5w/
    ├── raw/
    │   ├── ed_retrieval_5w.csv
    │   ├── readme.md
    │   └── psi4_grid0.4_cubes/
    └── processed/
        └── mol_EDthresh{thresh}_data.pkl
Dataset | Dir Name | Link | Description |
---|---|---|---|
ED5-MER | ed_retrieval_5w | Dataverse | Cross-modal retrieval: MS ↔ ED |
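One way to score such a retrieval task is to rank the true partner of a query against its negative samples. The sketch below assumes each molecular structure and each ED has already been encoded as a fixed-length vector by some model, and uses cosine similarity with a Recall@k summary. The similarity measure, the metric, and the function names are illustrative assumptions, not the benchmark's official evaluation protocol.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_of_positive(
    query: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
) -> int:
    """Rank (1 = best) of the positive among the negatives for one query."""
    pos_score = cosine(query, positive)
    # Every negative scoring strictly higher pushes the positive down one place.
    return 1 + sum(1 for neg in negatives if cosine(query, neg) > pos_score)


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose positive is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)
```

For the MS-to-ED direction, `query` would be a structure embedding and `positive` and `negatives` would be ED embeddings. Swapping the roles gives the ED-to-MS direction.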
Generate ED representations from molecular structures.
Directory structure:
{benchmark_root}
└── ed_prediction_5w/
    ├── raw/
    │   ├── ed_prediction_5w.csv
    │   ├── readme.md
    │   └── psi4_grid0.4_cubes/
    └── processed/
        └── mol_EDthresh{thresh}_data.pkl
Dataset | Dir Name | Link | Description |
---|---|---|---|
ED5-EDP | ed_prediction_5w | Dataverse | Predict ED from molecular structures |
Each `raw/` directory includes a `.csv` summary file describing each molecule.
- `index`: Molecule index
- `smiles`: Original SMILES
- `canonical_smiles`: Canonicalized SMILES
- `scaffold_split`: Scaffold-based split (80% train / 10% valid / 10% test)
- `random_split`: Random split (80% train / 10% valid / 10% test)
- Prediction:
  - `label`: Ground-truth values (space-separated if multi-task)
- Retrieval:
  - `negative_index`: Space-separated indices of 10 negative samples
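The columns above can be read with the Python standard library alone. The following is a minimal sketch that groups rows by split and parses the space-separated fields. It assumes the split columns hold the strings `train`, `valid`, and `test`, which the description above does not confirm, and the `load_summary` helper and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def load_summary(handle: TextIO, split_column: str = "scaffold_split") -> Dict[str, List[dict]]:
    """Group the rows of a summary CSV by the value found in `split_column`."""
    splits: Dict[str, List[dict]] = defaultdict(list)
    for row in csv.DictReader(handle):
        record = {
            "index": row["index"],  # kept as a string; the index format is not assumed
            "smiles": row["smiles"],
            "canonical_smiles": row["canonical_smiles"],
        }
        # Prediction tasks: one or more space-separated ground-truth values.
        if row.get("label"):
            record["label"] = [float(v) for v in row["label"].split()]
        # Retrieval task: space-separated indices of the negative samples.
        if row.get("negative_index"):
            record["negative_index"] = row["negative_index"].split()
        splits[row[split_column]].append(record)
    return dict(splits)


# Made-up rows for illustration; the split names are an assumption.
SAMPLE_CSV = """\
index,smiles,canonical_smiles,scaffold_split,random_split,label
0,CCO,CCO,train,test,-1.5 0.25 3.0
1,c1ccccc1,c1ccccc1,valid,train,2.0 0.5 1.0
"""
```

With a real file, open it with `open(path, newline="")` and pass the handle to `load_summary`. Pass `split_column="random_split"` to use the random split instead of the scaffold split.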
The code and detailed instructions for running prediction tasks can be found in this directory.
Feel free to open an issue or pull request for questions or contributions. For academic inquiries, contact the authors upon paper publication.
Released for research use under an open-source MIT license.