GitHub - NCEAS/metadig-checks: MetaDIG suites and checks for data and metadata improvement and guidance.

metadig-checks: MetaDIG suites and checks for data and metadata improvement and guidance.

Author: Matthew B. Jones, Peter Slaughter (NCEAS), Ted Habermann, Sean Gordon
License: Apache 2
Submit Bugs and feature requests

metadig-checks contains the metadata quality checks that are used by the MetaDIG Quality engine.

A glossary of metadata terms is available on the ESIP Wiki at http://wiki.esipfed.org/index.php/Concepts_Glossary. This glossary is open for editing / additions to the whole ESIP community.

MetaDIG Data Suite Checks

In metadig-checks, data suite quality checks are written in Python. Below is a template to use as a starting point for writing data checks:

def call():
    global output
    global status
    global output_identifiers
    global output_type
    global metadigpy_result

    # Import the libraries required to perform the data check you're writing
    from metadig import StoreManager
    import metadig as md
    import pandas as pd
    ...

    # Get a manager object so that we can retrieve objects
    manager = StoreManager(storeConfiguration)  

    # The variables below are used by the Metadig-Engine, MetacatUI and other clients
    output_identifiers = [] # This is a list of pids that have been checked
    output_data = [] # This array contains the corresponding message for the list of pids checked
    status_data = [] # This array represents the results for each pid checked: 'SUCCESS' or 'FAILURE'
    output_type = [] # This is the type of data found in 'output_data': 'text' or 'markdown'
    metadigpy_result = {} # This dictionary is required for 'MetaDIG-py' to return check results

    # Set appropriate output if dataPids are unavailable
    if len(dataPids) == 0:
        output_data = "No data objects found."

    # Loop over the data pids (the loop body is skipped if the list is empty)
    for pid in dataPids:
        # Record the pid being checked
        output_identifiers.append(pid)

        # Retrieve and validate the object
        try:
            # Retrieve the data object and its system metadata
            obj, sysmeta = manager.get_object(pid)
            # TODO: Perform any desired actions on the retrieved 'obj'
            # If the retrieved object is not valid and should not be checked,
            # you may want to skip it. For example:
            # obj, fname, csv_status = md.get_valid_csv(manager, pid)
            # if csv_status == "SKIP":
            #     output_data.append("Placeholder Text For Invalid Data Object")
            #     output_type.append("text")
            #     status_data.append(csv_status)
            #     continue
        except Exception as e:
            # Record an unexpected issue and move onto checking the next pid
            output_data.append(f"Unexpected Exception: {e}")
            output_type.append("text")
            status_data.append("FAILURE")
            continue

        # Perform the data check on the object
        try:
            # TODO: Code the data check. 'check_passed' and 'filename' are
            # placeholders; set them from your own check logic.
            check_passed = False
            filename = pid
        except Exception as e:
            output_data.append(f"Unexpected Exception: {e}")
            output_type.append("text")
            status_data.append("FAILURE")
            continue
        if check_passed:
            output_data.append(f"{filename} is able to be ...")
            output_type.append("text")
            status_data.append("SUCCESS")
        else:
            output_data.append(f"{filename} cannot be ...")
            output_type.append("text")
            status_data.append("FAILURE")

    # Gather and tally up the results
    successes = sum(x == "SUCCESS" for x in status_data)
    failures = sum(x == "FAILURE" for x in status_data)
    skips = sum(x == "SKIP" for x in status_data)
    output = output_data # Or you can write a custom message
    
    if successes > 0 and failures == 0:
        status = "SUCCESS"
    elif successes == 0 and failures > 0:
        status = "FAILURE"
    else:
        status = "FAILURE" 

    # The dictionary below must be populated in order for the `MetaDIG-py`
    # run_check function to return valid results.
    metadigpy_result["identifiers"] = output_identifiers
    metadigpy_result["output"] = output_data
    metadigpy_result["status"] = status
    return True
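
The tallying step at the end of the template can be tried on its own, outside the MetaDIG engine. The following is a minimal sketch, assuming the per-pid statuses are the plain strings used in the template ('SUCCESS', 'FAILURE', 'SKIP'). The helper name `summarize_status` is hypothetical and is not part of the `metadig` package; it only mirrors the template's rule that the overall status is 'SUCCESS' when at least one pid succeeded and none failed.

```python
from collections import Counter


def summarize_status(status_data):
    """Return (overall_status, counts) for a list of per-pid statuses.

    Hypothetical helper, not part of the metadig package. It mirrors the
    template's rule: "SUCCESS" only if at least one pid succeeded and no
    pid failed; every other combination is reported as "FAILURE".
    """
    counts = Counter(status_data)
    successes = counts["SUCCESS"]
    failures = counts["FAILURE"]
    overall = "SUCCESS" if successes > 0 and failures == 0 else "FAILURE"
    return overall, {
        "SUCCESS": successes,
        "FAILURE": failures,
        "SKIP": counts["SKIP"],
    }


if __name__ == "__main__":
    # One pid passed and one was skipped
    print(summarize_status(["SUCCESS", "SKIP"]))
    # One pid passed and one failed
    print(summarize_status(["SUCCESS", "FAILURE"]))
    # No data objects were checked
    print(summarize_status([]))
```

Under this rule, a run in which every object is skipped, or in which no data objects are found, is reported as 'FAILURE'. Adjust the rule if your check should treat those cases differently.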

License

Copyright [2013] [Regents of the University of California]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Acknowledgements

Work on this package was supported by:

DataONE Network
NSF ACI - DATANET grant #1443062 to T. Habermann and M. B. Jones

Additional support was provided for collaboration by the National Center for Ecological Analysis and Synthesis, a Center funded by the University of California, Santa Barbara, and the State of California.
