Extending options for populating Taken Date from file-metadata #4457
…age time being treated as UTC
Thanks for this! I think we should not go too far in trying to read information that is not reliable (these are all incorrect as per spec), and we should always press all suppliers to be as accurate as possible when it comes to date taken (and all metadata). Recording it correctly, with subseconds, time zone and daylight saving, is easiest when the camera settings are correct and when any software that later rewrites the file doesn’t destroy it. Some suppliers also decided (ages ago) to incorrectly use these fields for e.g. date sent or embargo, and they must stop these practices. There are proper, spec-compliant fields such information can be saved into (e.g. Release Date+Time, Date+Time Sent, License Start Date and others). That said, it will never be perfect. I think extending feasibility to 14 hours may make sense, although I will leave this also for others to ponder. To two additional fields:
I think pt. 1 warrants some discussion, as does extending feasibility to 14 hours. I am not entirely against either, but I think that if we do it, we should do it consciously. Let us know if you have any thoughts; we will add ours too. Or, given that it’s dates and time, we may just blindly merge it without any thinking. ;-)
Have removed the use of xmp:CreateDate for setting the taken date - with the revisions to timezone management I think it would hardly have been used. Have retained the 14 hour offset so that Taken Date can be set for images where local time has been forced into UTC because the actual timezone is unknown - this seems to work well and retains meaningful taken date information.
What does this change?
Currently the code sets the Taken Date for an image based on 1 of 3 fields in the file metadata (either read directly from the embedded metadata or formed as a composite).
In some circumstances these fields are not correctly populated and the field can be left unset even though there is data within the embedded metadata that would allow a value to be set. This PR adds an additional field to the list of fields that can be scanned to establish a taken date. The new field is 'iptc:Date Created' - this has no time component and will only be used when no time data is available. This field will be scanned only if the existing 3 fields fail to yield a value and so will not impact any images that have embedded metadata that is already supported.
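The lookup order described above could be sketched roughly as follows. This is an illustration in Python, not the project's code: the three `PRIMARY_FIELDS` names are hypothetical placeholders (the existing fields are not named here), the two string formats are assumed for the example, and mapping a date-only value to midnight UTC is also an assumption.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional

# Hypothetical placeholders for the three fields that are already supported.
PRIMARY_FIELDS = ("primary_field_1", "primary_field_2", "primary_field_3")
# The new date-only fallback field named in this PR.
FALLBACK_FIELD = "iptc:Date Created"


def _parse(value: str, pattern: str) -> Optional[datetime]:
    """Parse value with pattern, returning None instead of raising on a mismatch."""
    try:
        # Assumption: values without zone information are treated as UTC.
        return datetime.strptime(value, pattern).replace(tzinfo=timezone.utc)
    except ValueError:
        return None


def taken_date(metadata: Mapping[str, str]) -> Optional[datetime]:
    """Return the first usable taken date, trying the date-only field last."""
    for field in PRIMARY_FIELDS:
        value = metadata.get(field)
        if value is not None:
            parsed = _parse(value, "%Y-%m-%dT%H:%M:%S")  # assumed format
            if parsed is not None:
                return parsed
    # Only reached when none of the existing fields yielded a value.
    value = metadata.get(FALLBACK_FIELD)
    if value is None:
        return None
    return _parse(value, "%Y-%m-%d")  # assumed format; midnight UTC is an assumption
```

Because the fallback is consulted only after the loop finishes without a result, metadata that already works with the existing fields takes exactly the same path as before.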
This PR also introduces 2 additional date string patterns into the format list. Both are seen in incoming image data but are currently unsupported, which leads to a failure to set the taken date.
The main reason that taken date doesn't get set has now been identified: the taken date has a value after the upload time, and a rule discounts such values as incorrect. We have found that in a large number of cases this rule is too restrictive. The embedded metadata includes taken date information with the time component given in the local timezone where the picture was taken (or the timezone to which the camera is set), but in many cases it doesn't provide any information on the timezone, and under these circumstances we assume UTC - which can be incorrect. As a result, images can appear to have a taken date after the upload time when in fact the discrepancy comes from an incorrect assumption about the timezone of the embedded taken date. This is particularly prevalent for images from Asia and Australia, which are many hours ahead of UTC. To better handle this problem we have changed the rule from 'taken date after upload time' to 'taken date is feasible given upload time'. Timezones that are ahead of UTC can be up to 14 hours ahead, so we allow the 'taken date' to be up to 'upload time + 14 hours'. If it is beyond this value it is still deemed incorrect and rejected, but if it falls within this window it is assumed that it could be correct, subject to timezone anomalies.
With this change in place we have seen a significant reduction in the number of images without a date taken value. Some still persist: mostly PA images that are embargoed, where the taken date is set to the end of the embargo period and can fall outside the allowed time window, and a small number of Shutterstock images which have genuinely infeasible taken date information.
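As a rough illustration of the 'feasible given upload time' rule, here is a minimal Python sketch. The function and constant names are hypothetical and not taken from the codebase; the boundary is treated as inclusive, matching "up to upload time + 14 hours".

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# UTC+14 is the largest offset ahead of UTC, so a local time misread as UTC
# can appear at most 14 hours later than it really was.
MAX_OFFSET_AHEAD_OF_UTC = timedelta(hours=14)


def feasible_taken_date(taken: datetime, uploaded: datetime) -> Optional[datetime]:
    """Keep taken if it is no later than upload time + 14 hours, else reject it."""
    if taken <= uploaded + MAX_OFFSET_AHEAD_OF_UTC:
        return taken
    return None  # still deemed incorrect, so the taken date stays unset


uploaded = datetime(2024, 3, 1, 2, 0, tzinfo=timezone.utc)
# A local wall-clock time from a UTC+10 camera, read as if it were UTC:
# nine hours "after" upload, but inside the window, so it is kept.
kept = feasible_taken_date(datetime(2024, 3, 1, 11, 0, tzinfo=timezone.utc), uploaded)
# Fifteen hours after upload cannot be explained by any timezone, so it is rejected.
rejected = feasible_taken_date(datetime(2024, 3, 1, 17, 0, tzinfo=timezone.utc), uploaded)
```

Note that the window only relaxes the upper bound; the stored value is still the UTC-assumed time, so it may be off by the unknown offset.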
How should a reviewer test this change?
Check that all supported date format patterns are interpreted correctly.
Ensure that when the initial 3 fields are not present in the metadata, the 4th field is considered correctly.
Check that images with feasible taken date values (given possible time offsets from UTC) have the taken date set despite it being after the upload time.
Who should look at this?
Tested? Documented?