8000 #DVFD9-47 URL format of CKAN resource parser to match URL provided by data.gov.au by WilsonWu2100 · Pull Request #8 · govCMS/ckan_connect · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content
This repository was archived by the owner on Nov 20, 2024. It is now read-only.

#DVFD9-47 URL format of CKAN resource parser to match URL provided by data.gov.au #8

Merged
8000 merged 3 commits into from
Jun 29, 2021

Conversation

WilsonWu2100
Copy link
Contributor

URL format of CKAN resource parser needs to match URL provided by data.gov.au

@thery-dh thery-dh self-requested a review June 28, 2021 05:11
@@ -14,7 +14,7 @@ class CkanResourceUrlParser implements CkanResourceUrlParserInterface {
/**
* Regex to check whether a CKAN resource path is valid.
*/
const CKAN_RESOURCE_PATH_REGEX = '/^\/dataset\/.*\/resource\/[^\/]+$/';
const CKAN_RESOURCE_PATH_REGEX = '/(?:resource\/|resource_id=)([a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})/';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that we should validate the resource id since if it ever changes this will break

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thery-dh I disagree, this is a robust way of ensuring a valid resource id, if it changes (I think this is unlikely) then the regex should be updated to reflect the changes

previously /dataset/fooo/resource/wrong would have indicated that wrong is a valid url + resource id. Now it is enforcing the correct format

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is a good attempt to be thorough, I think it likely to cause issues rather than solving anything.
All this regex should do is make sure that there is a resource id and extract id. If the resource id is incorrect, then the subsequent API query will fail and should consider the resource as "non existent".
If you look at the last part of the regex for instance, it will only grab the last 12 chars of the resource id. Overtime this may very well increments to 13 or more as database grow and then you will get an incorrect id...same goes with additions of non alphanum chars, etc...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe that is correct @thery-dh the resource id is a hash and always formatted the same way with the same number of characters (likely for internal validation reasons). It is not a traditional id that increments.

In a brand new instance of CKAN the first record has a resource ID in that format, as does a recent resource id in a CKAN instance that has hundreds of thousands of resources eg https://data.gov.au/data/dataset/native-title-determination-applications-schedule/resource/d00049f3-fc1a-4536-b23e-d43f53254934

I challenge you to find a resource id that doesn't work with this ;) https://regex101.com/r/84R9UI/1/

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the info Jez.
You are right that the resource_id format is unlikely to change. Looking a bit deeper into it, it is a uuid generated in python: https://github.com/ckan/ckan/blob/master/ckan/model/resource.py#L35
using the uuid4() function: https://github.com/ckan/ckan/blob/5c386b8cd44ad41c8a94d7c7953c863d6e2f64bb/ckan/model/types.py#L18
making the format immutable.

However, I still believe that it should not be up to the connector to 8000 validate the id but instead simply used what is being passed, personal opinion I guess.

@thery-dh thery-dh merged commit cc44ae1 into govCMS:8.x-1.x Jun 29, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0