Definition of Catalog confusing and does not lead to proper use of this class · Issue #437 · SEMICeu/DCAT-AP · GitHub

Definition of Catalog confusing and does not lead to proper use of this class #437

Open
keestrautwein opened this issue May 27, 2025 · 2 comments

Comments

@keestrautwein
keestrautwein commented May 27, 2025

The current definition of Catalog is very confusing: "A catalogue or repository that hosts the Datasets or Data Services being described."
There are several problems with this description.

  1. The W3C standard gives a completely different definition: "A curated collection of metadata about resources". This implies that a curator collects metadata about resources into a collection. This is a much-needed part of the standard: it is a way to manually build a collection of resources that belong together in a way that may not be possible to construct using other DCAT means, such as search criteria on the properties. In DCAT-AP this option seems to be lost.
  2. The use of the word "hosts" is confusing. It can be read to imply that a catalog is tied to the physical storage of the DCAT description, the "host". This notion is incompatible with RDF. RDF data is typically stored in a distributed database, meaning that data can be fetched from any of multiple physical sources. In a well-designed environment the exact location of the triples is irrelevant to the user. It seems very strange to use a central class of DCAT to designate such an irrelevant fact.
  3. We could interpret the meaning of "hosts" differently. It could, for instance, also be read as referring to different namespaces or other technical differences. Again, it would be strange for these "technical" constructions to be described by Catalogue.
  4. Catalogues are also useful because it is possible to define catalogues inside catalogues. This is especially useful when the total catalog is large. "Nested" catalogues can help maintain structure or make new combinations. But the DCAT-AP definition does not seem to recognize this use, as it only mentions Data Services and Datasets.
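
To make the nested use in point 4 concrete, here is a minimal sketch in plain Python. It assumes the DCAT properties `dcat:dataset`, `dcat:service` and `dcat:catalog` for membership; the `https://example.org/` URIs and the helper names `curated_catalog` and `to_ntriples` are hypothetical and only serve as illustration. It is not taken from the DCAT-AP specification.

```python
# Vocabulary IRIs from the published DCAT and RDF namespaces.
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "https://example.org/"  # hypothetical base URI, illustration only


def curated_catalog(catalog, datasets=(), services=(), subcatalogs=()):
    """Return (subject, predicate, object) triples for a hand-curated catalog."""
    triples = [(catalog, RDF_TYPE, DCAT + "Catalog")]
    triples += [(catalog, DCAT + "dataset", d) for d in datasets]
    triples += [(catalog, DCAT + "service", s) for s in services]
    # dcat:catalog links a catalog to a catalog nested inside it.
    triples += [(catalog, DCAT + "catalog", c) for c in subcatalogs]
    return triples


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# A small nested catalog, curated by hand rather than derived from a query.
mobility = curated_catalog(
    EX + "catalog/mobility",
    datasets=[EX + "dataset/traffic-counts"],
    services=[EX + "service/traffic-api"],
)
# The overall catalog only points at the nested one.
root = curated_catalog(EX + "catalog/all", subcatalogs=[EX + "catalog/mobility"])

print(to_ntriples(root + mobility))
```

Nothing in this sketch depends on where the triples are physically stored: membership is expressed purely as links from the catalog to its members.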

A good definition leads to the proper use of this construct, a bad one to misuse that is not compatible with DCATv3.
I would like the definition to better reflect the use of Catalog. The W3C definition seems appropriate, but could perhaps be improved upon.

@joachimnielandt

In Flanders we are currently investigating the use of the Catalog entity for the purpose of describing a "virtual (sub)catalogue" within the metadata.vlaanderen.be catalogue. This would be used for e.g. "dataspaces" that wish to gather relevant metadata records.

Implementation work is underway in the dcat-ap geonetwork plugin to support this: metadata101/dcat-ap#76.

@keestrautwein
Author
keestrautwein commented May 28, 2025

> In Flanders we are currently investigating the use of the Catalog entity for the purpose of describing a "virtual (sub)catalogue" within the metadata.vlaanderen.be catalogue. This would be used for e.g. "dataspaces" that wish to gather relevant metadata records.
>
> Implementation work is underway in the dcat-ap geonetwork plugin to support this: metadata101/dcat-ap#76.

This is an interesting example of a use case for a Catalogue, because it seems to collect data from different sources that may not know their DCAT is being used in a data space. There are many other examples to be given.

Another example of a use case for making collections that are not described by the original contributors is a Catalogue of datasets and services that support something under investigation. For instance, during the Covid pandemic several Catalogues were constructed containing relevant data for investigation. This way, not every investigator had to find relevant data for themselves: a base set of data could be established.

A Catalogue can also be used to structure a large collection of DCAT "descriptions" for an audience that is known to prefer this order. Another use is to structure the processes of the party that collects the datasets.

Even if there is a way to construct a query that describes a Catalog, it could still be very useful to offer the results of this query as a Catalogue. That way users do not have to construct the query themselves: the catalog is already there.
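
As a rough illustration of offering query results as a Catalogue, the sketch below runs a selection once and stores the outcome as explicit `dcat:dataset` links. The records, the keyword filter, the `https://example.org/` URIs and the helper name `materialize_catalog` are all hypothetical; in practice the selection would more likely be a SPARQL query against a triple store.

```python
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical metadata records standing in for DCAT descriptions.
records = [
    {"uri": "https://example.org/dataset/a", "keywords": {"covid", "hospital"}},
    {"uri": "https://example.org/dataset/b", "keywords": {"traffic"}},
    {"uri": "https://example.org/dataset/c", "keywords": {"covid", "vaccination"}},
]


def materialize_catalog(catalog_uri, records, matches):
    """Run the selection once and record the result as catalog membership."""
    triples = [(catalog_uri, RDF_TYPE, DCAT + "Catalog")]
    for record in records:
        if matches(record):
            triples.append((catalog_uri, DCAT + "dataset", record["uri"]))
    return triples


# The "query" here is a simple keyword test (an assumption for this sketch).
covid_catalog = materialize_catalog(
    "https://example.org/catalog/covid",
    records,
    lambda record: "covid" in record["keywords"],
)

for triple in covid_catalog:
    print(triple)
```

Users of such a catalog only need to follow the membership links; they never have to know or repeat the selection that produced it.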

There are many more examples to be given. I think the definition of Catalogue should try to reflect this, or at a minimum keep these uses possible for people who only read the definition of dcat:Catalog.
