RDF representation of DCAT-AP for version 3 ? · Issue #414 · SEMICeu/DCAT-AP · GitHub

RDF representation of DCAT-AP for version 3 ? #414


Open
markdoerr opened this issue Feb 12, 2025 · 6 comments

Comments

@markdoerr

Hello everyone,

DCAT-AP 2.2.1 had a very useful RDF representation (not only SHACL).
It can be used, for example, to build terminology boxes (TBoxes) in triple stores and to generate specific application profiles easily.
Will such an RDF representation also be made available for the latest 3.X versions?
We would be very happy about that, thanks.

@bertvannuffelen
Contributor

@markdoerr see issue #315 for the motivation for why it is not present.

@markdoerr
Author

Thanks, @bertvannuffelen, for the fast hint :)

@markdoerr
Author

Hi @bertvannuffelen,
I have several questions about the discussion in #315. Would you prefer to discuss them here, or shall we re-open that issue?

@bertvannuffelen
Contributor

Let's discuss it here.

@markdoerr
Author

(Sorry for the delay, Bert, I had to teach ;) ... but now:

We are a semantic subgroup of NFDI4Chem/Cat (you might have heard of the German Research Data Infrastructure, NFDI), assembling a DCAT Application Profile for chemistry and related fields: DCAT-4C-AP. We are using LinkML as our universal data modelling tool, since it provides a single source of truth for many different applications, such as OWL exports, pydantic data models, and also SHACL. This is a very handy representation.

We are planning to build metadata repositories (as RDF graphs in triple stores) containing metadata about chemistry- and catalysis-related experiments, and then run SPARQL searches and, hopefully, logic reasoning on that data. For that purpose, we also need an RDF representation of our DCAT application profile in the triple store (as a terminology box) from which to derive chemistry-related classes.

Our current approach is to model the DCAT-for-chemistry application profile in LinkML by deriving it from the LinkML representation of DCAT-AP 3.0. We can then simply use the pydantic representation to generate new instances of this application profile, which can be imported directly into our triple store via JSON-LD or Turtle / SPARQL. We currently generate the base DCAT-AP LinkML model from your DCAT-AP 3.0 release with a handwritten SHACL2LinkML converter (not very future-proof). This is the background for my request. Do you see a simpler solution?
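The workflow described above (derive a chemistry class from a base DCAT-AP class, instantiate it, export it as JSON-LD) could look roughly like the following sketch. It uses standard-library dataclasses as a stand-in for the generated pydantic classes, and the `chem` namespace, the `ChemistryDataset` class, and all IRIs are hypothetical placeholders, not part of DCAT-AP or DCAT-4C-AP.

```python
import json
from dataclasses import dataclass

# Prefixes for the JSON-LD context; "chem" is a made-up placeholder namespace.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "chem": "https://example.org/dcat-4c-ap#",
}


@dataclass
class Dataset:
    """Stand-in for a class generated from a DCAT-AP data model."""
    iri: str
    title: str
    rdf_type: str = "dcat:Dataset"

    def to_jsonld(self) -> dict:
        # Serialise the instance as a single JSON-LD node.
        return {
            "@context": CONTEXT,
            "@id": self.iri,
            "@type": self.rdf_type,
            "dct:title": self.title,
        }


@dataclass
class ChemistryDataset(Dataset):
    """Hypothetical chemistry-specific class derived from Dataset."""
    rdf_type: str = "chem:ChemistryDataset"


record = ChemistryDataset(
    iri="https://example.org/dataset/1",  # placeholder IRI
    title="Catalysis run 1",
)
doc = record.to_jsonld()
print(json.dumps(doc, indent=2))
```

In such a setup, the statement that `chem:ChemistryDataset` is a subclass of `dcat:Dataset` would live in the terminology box inside the triple store, which is why an RDF representation of the profile is needed in addition to the instance data.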

@bertvannuffelen
Contributor

@markdoerr for DCAT-AP we use the SEMIC toolchain, which does processing similar to LinkML.
The general objectives for the SEMIC toolchain are found in the SEMIC Style Guide. The whole specification is generated from a single source of information.
As you can see, the toolchain goes beyond publishing the core formal model and also allows other kinds of information to be included.

For chemical substances, you might contact the environment agency of the Belgian region of Flanders, as they publish a codelist with a mapping to other naming conventions. See https://data.omgeving.vlaanderen.be/doc/concept/chemische_stof/AFZSMODLJJCVPP-UHFFFAOYSA-N .

Regarding your profile: there is a difference between a profile of DCAT and a profile of DCAT-AP. DCAT-AP imposes rules that are suitable for sharing metadata in the European context, while DCAT is the global vocabulary.
That means that by adopting DCAT-AP, some key elements are already restricted in their use. For instance, a dataset should have a title, and a catalogue should have a publisher.
Your current model is not very clear about this, but it has a major impact. Consider the area of application and the effort that publishers of datasets have to make to enter a common European Dataspace.
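The two example restrictions named above (a dataset has a title, a catalogue has a publisher) can be illustrated with a small check over JSON-LD-style nodes. This is only a sketch under simplifying assumptions: in practice such rules are expressed as SHACL shapes and checked with a SHACL engine, and the `REQUIRED` table and `missing_properties` function here are hypothetical names covering only those two rules.

```python
# Required properties per class; only the two rules mentioned in the text.
REQUIRED = {
    "dcat:Dataset": ["dct:title"],
    "dcat:Catalog": ["dct:publisher"],
}


def missing_properties(node: dict) -> list:
    """Return the required properties that are absent or empty on a node."""
    types = node.get("@type", [])
    if isinstance(types, str):
        types = [types]  # JSON-LD allows a single type or a list of types
    missing = []
    for rdf_type in types:
        for prop in REQUIRED.get(rdf_type, []):
            if not node.get(prop):
                missing.append(prop)
    return missing


untitled = {"@type": "dcat:Dataset"}
titled = {"@type": "dcat:Dataset", "dct:title": "Catalysis run 1"}
print(missing_properties(untitled))
print(missing_properties(titled))
```

A profile built on plain DCAT would not carry these restrictions, whereas one built on DCAT-AP inherits them, so the same metadata record can pass one and fail the other.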

With the following, I hope to capture your question correctly:
You mentioned "we need an RDF representation", but I think you mean "we need a template RDF representation" that can be used to instantiate data, as the RDF representation can be checked against the SHACL shapes.
However, that template is an implementation effort every system has to make. For instance, if your catalogue does not have Dataset Series, why should the template contain a Dataset Series representation? Likewise, do you encode your licences as embedded objects, or are they referenced by a URI?
All these variations are valid according to DCAT-AP, but require implementers to make decisions.
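To make the licence example concrete, here is a minimal sketch of the two encodings as JSON-LD-style Python dictionaries, plus a helper that reads the licence IRI from either form. The licence IRI, the title, and the helper names `licence_iri` and `is_embedded` are illustrative assumptions; attaching `dct:license` to a distribution is also an assumption about where the licence is stated.

```python
LICENCE = "https://example.org/licence/open"  # placeholder IRI

# Variant 1: the licence is only referenced by its URI.
by_reference = {
    "@type": "dcat:Distribution",
    "dct:license": {"@id": LICENCE},
}

# Variant 2: the licence is embedded as a described object.
embedded = {
    "@type": "dcat:Distribution",
    "dct:license": {
        "@id": LICENCE,
        "@type": "dct:LicenseDocument",
        "dct:title": "Open licence",
    },
}


def licence_iri(distribution: dict):
    """Return the licence IRI for either encoding, or None if absent."""
    value = distribution.get("dct:license")
    if isinstance(value, dict):
        return value.get("@id")
    return value  # a bare string, or None when no licence is given


def is_embedded(distribution: dict) -> bool:
    """True when the licence node carries more than just its @id."""
    value = distribution.get("dct:license")
    return isinstance(value, dict) and bool(set(value) - {"@id"})


print(licence_iri(by_reference) == licence_iri(embedded))
print(is_embedded(by_reference), is_embedded(embedded))
```

Both variants describe the same licence, so a consumer can treat them alike, but a template has to commit to one of them, which is the kind of decision left to each implementer.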

2 participants