Add experimental PDF/A compliance mode by Starfox64 · Pull Request #3269 · dompdf/dompdf · GitHub

Add experimental PDF/A compliance mode #3269

Merged (8 commits) on Jan 14, 2025

Conversation

Starfox64
Contributor
@Starfox64 Starfox64 commented Sep 1, 2023

This PR adds experimental PDF/A compliance mode and closes #1106

I'm calling this experimental for multiple reasons:

  • Only takes care of adding the required metadata
  • Doesn't force font embedding, it's on the user to embed their fonts
  • I can only test so many documents, the user should generally verify actual compliance with a tool like veraPDF (profile PDF/A-3b) and report issues
  • Isn't as robust as Ghostscript

This should still work for most people on non-esoteric documents.

Addendum

Fixes #3442

@bsweeney bsweeney added this to the 2.0.5 milestone Sep 1, 2023
Contributor
@williamdes williamdes left a comment

The diff looks good, though I have some concerns about the licensing of the binary file.

@williamdes
Contributor

@Starfox64 do you think it would solve #3269 ?

@Starfox64
Contributor Author

That's the wrong Issue ID I think.

@williamdes
Contributor
williamdes commented Apr 19, 2024

Indeed, this one: #3442

Could you rebase your work ?
I was about to try it but composer wants 2.1+

🇫🇷 👋🏻

@Starfox64
Contributor Author

I've done a merge recently and it looks like I'm ahead of master.

With regards to #3442 I think this might be fixed by this line https://github.com/dompdf/dompdf/pull/3269/files#diff-f365bcb24d5c081b6bfde4aa59a8eda68824ea9658c362098554fc659dcec1d0R3426

It's funny to see that we are literally trying to do the same thing btw, nothing spells out fun more than trying to embed XML in PDF metadata.

@williamdes
Contributor

> I've done a merge recently and it looks like I'm ahead of master.
>
> With regards to #3442 I think this might be fixed by this line https://github.com/dompdf/dompdf/pull/3269/files#diff-f365bcb24d5c081b6bfde4aa59a8eda68824ea9658c362098554fc659dcec1d0R3426
>
> It's funny to see that we are literally trying to do the same thing btw, nothing spells out fun more than trying to embed XML in PDF metadata.

Okayy, I got it wrong: barryvdh/laravel-dompdf[v2.1.0, ..., v2.1.1] require dompdf/dompdf ^2.0.3
So I will "downbase" your work and test it for 2.0

@bsweeney
Member

FYI I'm re-targeting this to 3.1 (from 3.0.1) since it's a new feature, not a patch. I know there are a lot of issues slated for 3.0.x releases, but I suspect most of those will be pushed out once I have a chance to review them.

@bsweeney bsweeney modified the milestones: 3.0.1, 3.1.0 Apr 19, 2024
Contributor
@williamdes williamdes left a comment

Can you add Fixes: #3442 to this PR ?
It passes nearly all compliance tests except:

Specification: ISO 19005-3:2012, Clause: 6.2.11.4.1, Test number: 1 (https://github.com/veraPDF/veraPDF-validation-profiles/wiki/PDFA-Parts-2-and-3-rules#rule-621141-1)
Rule: The font programs for all fonts used for rendering within a conforming file shall be embedded within that file, as defined in ISO 32000-1:2008, 9.9.
Result: Failed

Let's follow up on #3443

@bsweeney bsweeney linked an issue Apr 19, 2024 that may be closed by this pull request
@williamdes williamdes mentioned this pull request Apr 19, 2024
1 task
@williamdes
Contributor

Would you mind reviewing this first batch @bsweeney ?
This is quite critical for me, as I need to move on to PDF/A + Factur-X.

@bsweeney
Member
bsweeney commented Dec 8, 2024

FYI, I force pushed a rebase for this branch so the merge commit wasn't necessary. I'll be taking a look at what else we need and comment and/or commit additional changes.

@williamdes
Contributor

> FYI, I force pushed a rebase for this branch so the merge commit wasn't necessary. I'll be taking a look at what else we need and comment and/or commit additional changes.

Thank you, let us know if some help is needed


@bsweeney
Member
bsweeney commented Dec 26, 2024

FYI, it's possible to achieve PDF/A compliance with the changes here. I'll add a page to the wiki outlining the requirements and provide sample code.

To address your specific issue regarding fonts, you should set your default font by passing it in as part of the instantiation options and your document should not specify any fonts that would break compliance. For example, if you style your document with a generic font family then Dompdf will translate that into a core PDF font. So, for example, font-family: sans-serif is translated to mean "use the core PDF font Helvetica" which is not an embeddable font.

Something like the following should generate a compliant document:

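<?php
require "vendor/autoload.php"; // assumes dompdf was installed with Composer

use Dompdf\Dompdf;

// Use an embeddable default font and turn on PDF/A mode.
$dompdf = new Dompdf([
    "defaultFont" => "DejaVu Sans",
    "isPdfAEnabled" => true
]);
$html = <<<EOF
<html>
<head>
    <style>
        /* Name an embeddable font; generic families map to core PDF fonts. */
        * { font-family: DejaVu Sans; }
    </style>
</head>
<body>
    <p>This is PDF/A-3b compliant.</p>
</body>
</html>
EOF;
$dompdf->loadHtml($html);
$dompdf->render();
file_put_contents("document.pdf", $dompdf->output()); // output path is illustrative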

I'm still thinking through how to ensure text styled with a generic family uses an embeddable font. Most likely it will require remapping the font families prior to render. I'll include that information in the wiki once I have a solution.
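Until such a remapping exists in Dompdf itself, one possible workaround is to rewrite the stylesheet before it is loaded. The following is only a rough sketch under stated assumptions: `remapGenericFonts` is a hypothetical helper, not a Dompdf API, and the list of generic families is an assumption that may need extending (for example with core font names such as Helvetica).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of Dompdf): replace generic CSS font
 * families with a named, embeddable font before the HTML is loaded.
 */
function remapGenericFonts(string $css, string $embeddable = "DejaVu Sans"): string
{
    // Assumed list of generic families that would resolve to core PDF fonts.
    $generic = ["sans-serif", "serif", "monospace"];

    $result = preg_replace_callback(
        '/font-family\s*:\s*([^;}]+)/i',
        function (array $m) use ($generic, $embeddable): string {
            // Split the declaration into individual family names.
            $families = array_map("trim", explode(",", $m[1]));
            $mapped = array_map(
                fn (string $f): string =>
                    in_array(strtolower($f), $generic, true) ? $embeddable : $f,
                $families
            );
            // Drop duplicates created by the substitution.
            return "font-family: " . implode(", ", array_unique($mapped));
        },
        $css
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $css;
}
```

The rewritten CSS could then be placed in the document's style block before calling loadHtml(). This simple regex only handles plain font-family declarations; the font shorthand and inline styles would need separate handling.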

Co-authored-by: William Desportes <williamdes@wdes.fr>
@williamdes
Contributor

For reference I am posting some documentation I found on mpdf
https://mpdf.github.io/what-else-can-i-do/pdf-a3-xmp-rdf.html

Seems to have been implemented in mpdf/mpdf#558
and rdf data in mpdf/mpdf@4978b70

Maybe we can copy the example and make it for dompdf

Member
@bsweeney bsweeney left a comment

While I do see some room for improvement, I think it's best to put some of these potential changes off to a future release so we can move forward with getting this out there for those who need the functionality now.

I do have some improvements so hold tight just a little longer.

@bsweeney
Member
bsweeney commented Jan 2, 2025

I ran the code through my local test suite and found a few minor compatibility problems. I also enabled PDF/A when using the PDFLib backend (which did not require many changes).

Only thing I haven't been able to address so far is compliance when using CMYK colors/images. I'll look at that issue a bit more, but I suspect addressing CMYK support may have to wait. I think the correct resolution would be to use a consistent color profile in the document, which would require a bit of effort. Plus, the CMYK profiles from ICC are fairly large compared to Dompdf itself (2 - 8 MB for the ones I looked at).

- Sets appropriate flags on link annotations
- Reworks the OutputIntents object as an array of intents.
- Adds line breaks where required around object identifiers
- Sets max codepoint to 255 for binary mode header bytes
MissingWidth: Some monospace fonts do not include the MissingWidth value in the font metrics (see DejaVu Sans Mono). In this scenario use the width of the space character if it exists.

CapHeight: FontLib now retrieves the CapHeight, so use that value if present. Otherwise continue to use the Ascender value.
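The two fallbacks described in these commit notes can be illustrated roughly as follows. This is a sketch, not the code from the PR: the function name, the array keys ("MissingWidth", "CapHeight", "Ascender", "C") and the sample numbers are assumptions made for the example and may not match Dompdf's internal font-metrics layout.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of the metric fallbacks; array keys are assumed.
 */
function resolveFontMetrics(array $font): array
{
    // Width of the space character (code point 32), if the font defines one.
    $spaceWidth = $font["C"][32] ?? null;

    return [
        // Prefer the declared MissingWidth, then the space width.
        "MissingWidth" => $font["MissingWidth"] ?? $spaceWidth,
        // Prefer the CapHeight reported by the font, then the Ascender.
        "CapHeight" => $font["CapHeight"] ?? ($font["Ascender"] ?? null),
    ];
}

// Illustrative metrics for a monospace font with no MissingWidth or CapHeight.
$mono = ["Ascender" => 760, "C" => [32 => 602]];
$resolved = resolveFontMetrics($mono);
```

With these inputs, MissingWidth falls back to the space width and CapHeight falls back to the Ascender.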
@wblessen
wblessen commented Jan 6, 2025

> FYI, it's possible to achieve PDF/A compliance with the changes here.

Hello Brian, this is very good news.

> I'll add a page to the wiki outlining the requirements and provide sample code.

https://github.com/dompdf/dompdf/wiki/PDFA-Support

> To address your specific issue regarding fonts, you should set your default font by passing it in as part of the instantiation options and your document should not specify any fonts that would break compliance. For example, if you style your document with a generic font family then Dompdf will translate that into a core PDF font. So, for example, font-family: sans-serif is translated to mean "use the core PDF font Helvetica" which is not an embeddable font.

I think it's perfectly fine to add a hint to the wiki
that generic font families are not possible in PDF/A mode.

You could even throw an Exception, IMO

@bsweeney
Member
bsweeney commented Jan 6, 2025

Indeed, PDFLib will throw exceptions for non-conforming elements. Soft failure does seem to be unsupportive of the end-user goal without at least a notification that the generated PDF will not be compliant. Perhaps it does make sense to throw an exception if a non-conforming element (e.g., CMYK element or non-embeddable font) is used.

Known non-compliant content includes non-embeddable fonts, CMYK colors, and CMYK images.
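The idea of failing loudly could be prototyped in user code before rendering. The sketch below is an assumption-laden illustration, not behavior that Dompdf provides: `assertPdfACompatibleCss` is a hypothetical function, it only inspects a CSS string, it assumes CMYK colors are written in a `cmyk(...)` notation, and it does not detect CMYK images.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical pre-flight check (not a Dompdf API): throw when a stylesheet
 * uses content listed in this thread as non-compliant.
 */
function assertPdfACompatibleCss(string $css): void
{
    // Assumed list of generic families that map to non-embeddable core fonts.
    $generic = ["sans-serif", "serif", "monospace"];

    preg_match_all('/font-family\s*:\s*([^;}]+)/i', $css, $matches);
    foreach ($matches[1] as $list) {
        foreach (explode(",", $list) as $family) {
            // Compare whole family names so "DejaVu Serif" is not flagged.
            $name = strtolower(trim($family));
            if (in_array($name, $generic, true)) {
                throw new DomainException(
                    "Generic font family \"$name\" maps to a non-embeddable core font"
                );
            }
        }
    }

    // Assumes CMYK colors appear as cmyk(...) in the stylesheet.
    if (preg_match('/\bcmyk\s*\(/i', $css) === 1) {
        throw new DomainException("CMYK colors are not compliant in PDF/A mode");
    }
}
```

Calling this on the document's CSS before loadHtml() would surface the problem early instead of producing a PDF that silently fails validation.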
@bsweeney bsweeney merged commit c8aef1f into dompdf:master Jan 14, 2025
18 checks passed
@bsweeney bsweeney linked an issue Jan 16, 2025 that may be closed by this pull request
@Cruiser13

Awesome, thanks a lot!

Successfully merging this pull request may close these issues.

  • Support for PDF standard A-3b
  • EOL marker shall be immediately followed by a % PDF/A
5 participants