namespace conflict introduced when importing/exporting EML generated under older schema · Issue #347 · ropensci/EML · GitHub
namespace conflict introduced when importing/exporting EML generated under older schema #347
Open
@RobLBaker

Description


I ran across an interesting issue with an older EML file. I downloaded the file, imported it with EML::read_eml(), and then wrote it back to .xml with EML::write_eml(). The result was a corrupted EML file with conflicting namespace declarations that nevertheless passes the EML::eml_validate() validation check:

I downloaded a data package from EDI: https://portal.edirepository.org/nis/mapbrowse?packageid=knb-lter-and.4780.4

The file knb-lter-and.4780.4.xml is an EML formatted file. Upon download, the initial eml tag in knb-lter-and.4780.4.xml looks like so:

<eml:eml xmlns:ds="eml://ecoinformatics.org/dataset-2.1.1" xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" xmlns:stmml="http://www.xml-cml.org/schema/stmml-1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" packageId="knb-lter-and.4780.4" system="https://pasta.edirepository.org/" xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1 http://nis.lternet.edu/schemas/EML/eml-2.1.1/eml.xsd">

I then imported to R with EML::read_eml and wrote it back to .xml:

# Read the downloaded EML 2.1.1 file into R
mymeta <- EML::read_eml("knb-lter-and.4780.4.xml", from = "xml")
View(mymeta)
# Write it straight back out without modifying it
EML::write_eml(mymeta, "exportedEML.xml")

And when I open the new "exportedEML.xml" file I see:

<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:stmml="http://www.xml-cml.org/schema/stmml-1.2" xmlns:ds="eml://ecoinformatics.org/dataset-2.1.1" packageId="knb-lter-and.4780.4" xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1 http://nis.lternet.edu/schemas/EML/eml-2.1.1/eml.xsd" system="https://pasta.edirepository.org">

It appears that even though the xmlns:eml attribute is now eml-2.2.0, the schema location (xsi:schemaLocation=) and xmlns:ds both still indicate the original EML 2.1.1.
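The mismatch can be detected mechanically. The following is a minimal sketch in base R, not something the EML package provides: `eml_root_versions()` and `is_consistent()` are hypothetical helpers, and the tags are the two root elements from this report, trimmed to the schema-related attributes. It uses plain text matching on the root tag only, which is an assumption made to keep the example dependency-free.

```r
# Hypothetical helper (not part of the EML package): report the EML version
# that each schema-related attribute of a root <eml:eml ...> tag points at.
eml_root_versions <- function(root_tag) {
  # Split the tag into name="value" attribute pairs
  pairs <- regmatches(root_tag, gregexpr('[A-Za-z:]+="[^"]*"', root_tag))[[1]]
  attr_names <- sub('=.*$', '', pairs)
  attr_values <- sub('^[^=]+="(.*)"$', '\\1', pairs)

  # Keep only the attributes that name an EML schema
  keep <- attr_names %in% c("xmlns:eml", "xmlns:ds", "xsi:schemaLocation")
  attr_names <- attr_names[keep]
  attr_values <- attr_values[keep]

  # Take the first x.y.z version number in each value (NA if there is none)
  m <- regexpr("[0-9]+\\.[0-9]+\\.[0-9]+", attr_values)
  versions <- rep(NA_character_, length(attr_values))
  versions[m > 0] <- regmatches(attr_values, m)
  stats::setNames(versions, attr_names)
}

# A tag is consistent when every schema-related attribute names one version
is_consistent <- function(root_tag) {
  length(unique(eml_root_versions(root_tag))) == 1
}

# Root tags from the downloaded and the re-exported file (trimmed)
original_tag <- paste0(
  '<eml:eml xmlns:ds="eml://ecoinformatics.org/dataset-2.1.1" ',
  'xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" ',
  'xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1 ',
  'http://nis.lternet.edu/schemas/EML/eml-2.1.1/eml.xsd">'
)
exported_tag <- paste0(
  '<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0" ',
  'xmlns:ds="eml://ecoinformatics.org/dataset-2.1.1" ',
  'xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1 ',
  'http://nis.lternet.edu/schemas/EML/eml-2.1.1/eml.xsd">'
)

print(eml_root_versions(exported_tag))
print(is_consistent(original_tag))
print(is_consistent(exported_tag))
```

A more robust check would read the root element with an XML parser rather than regular expressions, but the comparison itself would stay the same: collect the version named by each attribute and require that they agree.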

Both files validate using EML::eml_validate(). I assume this is because the EML package does not actually use the namespace within the EML file to identify the schema to validate against but instead has that namespace hardcoded in elsewhere.

I understand it is possible to tell EML to switch between schema versions, but I still think this qualifies as a potential bug. I can see users generating an EML file under one schema and (perhaps years later) updating it under a second schema. In that scenario, this namespace conflict is easily introduced. If the default is to update everything to the latest schema, that should be done consistently.

On a side note, it would be nice to preserve the evolution of an EML file if it is edited under multiple different schemas during its lifetime (for instance, as a data package is incrementally added to and versioned). But I think there is likely a better place to systematically implement that version history than the eml namespace.
