8000 adding metadata dictionary to cjsonwriter by amandadumi · Pull Request #1148 · cclib/cclib · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

adding metadata dictionary to cjsonwriter #1148

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

amandadumi
Copy link
Member

Here is a quick addition to the cjson writer which would allow the metadata dictionary to be output as its own metadata object. This is being discussed as a desired feature in both #1105 and #1143. I haven't used cjson in my own work so this solution may not be in the right format. Just wanted to discuss a possible approach to incorporating this.

@codecov
Copy link
codecov bot commented Jul 24, 2022

Codecov Report

Patch coverage: 65.00% and project coverage change: +0.01 🎉

Comparison is base (d533158) 87.79% compared to head (a077cf7) 87.80%.

❗ Current head a077cf7 differs from pull request most recent head a7f2547. Consider uploading reports for the commit a7f2547 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1148      +/-   ##
==========================================
+ Coverage   87.79%   87.80%   +0.01%     
==========================================
  Files          64       69       +5     
  Lines       13435    13535     +100     
==========================================
+ Hits        11795    11885      +90     
- Misses       1640     1650      +10     
Impacted Files Coverage Δ
cclib/parser/data.py 80.80% <ø> (ø)
cclib/io/cjsonwriter.py 89.44% <65.00%> (-4.18%) ⬇️

... and 6 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@berquist berquist added this to the v1.8 milestone Jul 26, 2022
@berquist berquist added the io label Jul 26, 2022
@berquist
Copy link
Member

This looks fine to me, but we (probably long ago) have veered off the CJSON format. I used this snippet on the result of ccwrite cjson data/ORCA/basicORCA5.0/dvb_rocis.out and it looks ok; the metadata block is appropriately ignored. ORCA dvb_ir.out causes it to segfault, though it does that even on output from before this PR.

Unless we can do some sort of standardization with Marcus (who is definitely still using modified CJSON) and Geoff (who is reworking spectroscopy representation among other things I don't recall), we should have even just a comment in docs that this is "cclib-extended Chemical JSON".

@ghutchis
Copy link
Contributor

Let's talk over Slack or on the Avogadro forum. I think adding the metadata block is a good idea.

I'm not sure if Marcus is doing anything with CJSON right now, it seems as if Brookhaven is keeping him occupied, as well as Tomviz.

@ghutchis
Copy link
Contributor

I'd certainly like to see something in which reading in the metadata will help configure the input generator GUI (e.g., load an MP2 calculation, and it shows an MP2 calculation by default).

@ghutchis
Copy link
Contributor

Here's a more recent reference point on the cjson code:
https://github.com/OpenChemistry/avogadro-cclib/blob/main/cclibScript.py

I added some support for metadata here: https://github.com/OpenChemistry/avogadro-cclib/blob/4a2524fb0c17ad72890a9c8fbe915e421ffa0171/cclibScript.py#L198

But IIRC I didn't add anything on the Avogadro side to parse it yet.

Suggestions welcome .. in particular, I'd like to add total charge and spin multiplicity soon, probably under "properties" to properly roundtrip files (e.g., for bond perception, etc.)

@ghutchis
Copy link
Contributor
ghutchis commented Aug 3, 2022

I'm very open to standardization, so there can be something relatively well defined when Avogadro 2.0 is released.

One relevant example from the chemicaljson repository (https://github.com/OpenChemistry/chemicaljson/blob/master/examples/water_psi4_oc.cjson)

has:

"inputParamet
8000
ers": {
    "basis": "6-31g",
    "functional": "b3lyp",
    "task": "energy",
    "theory": "dft"
  },

This seems like an extremely good idea to enable users to "round trip" calculations. A few other useful metadata for the inputParameters

  • dispersion correction type (D30, D3BJ, D4, ..)
  • solvation model(?)

The "task" key is obviously not in metadata but most should be fairly easy to deduce from cclib:

  • energy (single point = default)
  • optimization (multiple optimization steps exist)
  • frequencies (vibfreqs exist)
  • transition state
  • excitations (etenergies exist)

I'm also open to adding more "task" options.

These all seem like they'd be useful to pull out into the "inputParameters" key.. to be used by the Python input generator scripts.

The other metadata .. I'm not sure whether it's better to have something under "properties" or as you've proposed having a new key "metadata."

@amandadumi
Copy link
Member Author

I went through the pull requests (#1143, #1142, #1141) and collected the data that they would want to introduce into the cjson, initially proposed as additions to a metadata section. As @ghutchis mentioned, many of these would fit in inputParameters field. There are a few that I am unclear on where they should be placed, I've indicated this with (?) following the data. Specifically, information about hardware related information.

basis_set (already exists inputParameters)
calc_type (already exists in inputParameters, but would be renamed to theory)
dispersion (add to inputParameters)
functional (inputParameters)
grid (?, this is a string describing the grid type used )
keywords_line (add to inputParameters)
memory (?)
processors (?)
qm_program (maybe inputParameters)
run_date (?)
solvation (could be added to inputParameters)

@ghutchis
Copy link
Contributor

Most of these seem useful for inputParameters. Way back when there was the JSON planning meeting, I suggested that the format needed to be good enough to work both as input and output - since many users might want a format as part of a workflow (e.g., post-calculation analysis, localized MOs, etc.). Besides that, most output formats include the input parameters.

We also discussed that "extensions" were probably useful / needed for specific program options. (For example, I'm not sure that Grid5 from Orca corresponds to anything specific in other programs.)

Let me clean up the CJSON repo, post the draft schema and I'll submit a patch here based on the avogadro-cclib reader.

I think my only question is the run_date which feels much more like metadata to me, not inputParameters or properties.

@amandadumi
Copy link
Member Author

This sounds like a plan, thank you. I'll work on placing these keywords in the right place then.

I do agree that run_date does fit best in metadata.

@@ -123,6 +123,28 @@ def as_dict(self):
cjson_dict['atoms']['elements']['atom count'] = len(self.ccdata.atomnos)
cjson_dict['atoms']['elements']['heavy atom count'] = len([x for x in self.ccdata.atomnos if x > 1])

if hasattr(self.ccdata, 'metadata'):
cjson_dict['inputParameters'] = dict()
if 'run_date' in self.ccdata.metadata:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this if/else block be improved by using a mapping and loop over that?

@jvalegre jvalegre mentioned this pull request Mar 8, 2023
@berquist berquist mentioned this pull request Mar 8, 2023
@amandadumi amandadumi force-pushed the cjson_metadata branch 2 times, most recently from e53866e to b716e11 Compare March 19, 2023 01:04
@berquist berquist requested review from berquist and shivupa March 22, 2023 00:26
Copy link
Member
@berquist berquist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm doing something like this to compare:

#!/bin/bash

set -euo pipefail

for example in dvb_sp_un dvb_gopt; do
    outfile=/home/eric/development/cclib_berquist/data/QChem/basicQChem5.4/${example}.out
    # cclib writer
    ccwrite cjson ${outfile} || ret=$?
    if [[ $ret -eq 0 ]]; then
        < ${example}.cjson jq --indent 4 . > ${example}.cclib.cjson
    fi
    if [[ -f ${example}.cjson ]]; then
        rm ${example}.cjson
    fi
    # Avogadro writer
    python /home/eric/development/forks/python/avogadro-cclib/cclibScript.py --read < ${outfile} | jq --indent 4 . > ${example}.avogadro.cjson
done

cjson_dict['inputParameters'] = dict()
cclib_to_json_keys = {'theory':'methods'}
for i in ['run_date','basis_set','theory','dispersion','functional','grid','memory','processors','qm_program']:
if i in self.ccdata.metadata:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation problem here.

cjson = cclib.io.cjsonwriter.CJSON(data).generate_repr()

json_data = json.loads(cjson)
self.assertTrue("run_date" in json_data["metadata"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not your fault, but when you rebase this PR, can you change these to

Suggested change
self.assertTrue("run_date" in json_data["metadata"])
assert "run_date" in json_data["metadata"]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we update the asserts and then this can be merged IMO

@shivupa
Copy link
Member
shivupa commented May 30, 2023

Hi No pressure just bumping this. Have you already addressed Eric's review locally?

@berquist berquist modified the milestones: v1.8, v1.8.1 Jun 13, 2023
@berquist berquist modified the milestones: v1.8.1, v2.0 Dec 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0