8000 Use alternative libraries for PDF exports by bhennion · Pull Request #6364 · xournalpp/xournalpp · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Use alternative libraries for PDF exports #6364

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

bhennion
Copy link
Collaborator

I played around a bit with alternative libraries for exporting PDF (see #3177)
You can try out two options (one via PoDoFo (you'll need the latest dev version of this library) and one via libmupdf)
They can be enabled via the cmake flags ENABLE_POFODO and ENABLE_MUPDF.

If enabled, you can export via PoDoFo or mupdf from the command line (only there for now) with the options --export-podofo and --export-mupdf (together with foo.xopp -p bar.pdf).

I'm happy to get any feedback on those attempts (especially if someone encounters a pdf/xopp file where the output is not as expected). The TOC, hyperlinks, Pdf-annotations and other data in the PDF should be preserved.

Forwarded message from #3177:

Also, I (maybe wrongly) assumed that the new export backends would "pass through" the background PDF and put annotations on top. But, at least in this case, both cairo and mupdf increase the file size (though not terribly) with mupdf > cairo.

This is exactly what I did. The xopp-annotations are exported to PDF by cairo (without the pdf background) and added to the background pdf as an XObject (some pdf feature). I had made a mistake in the case the background PDF page already had such XObjects.
Now, the PDF file is opened, the annotations are added, and then the pdf is saved again. Under the hood, the library may decompress bit of data, do something on it, and resave it with a different compression level (see e.g. this striking podofo/podofo#163). This may lead to size increases.

Another possible source of (ever so slight) size increase is with the fact that some internal structure is changed on purpose by my code: specifically, the content of a page can be stored in two ways. Either a single compressed stream or an array of streams. In order to add the overlayed xopp-annotations, I chose to always go with an array of streams, even if the original page had a single stream. This way, I can add by hand the stream necessary for displaying the xopp-annotation overlay without ever touching (decompressing) the original stream.
In a similar manner, I had to add some indirections between the page structures and some of their data to minimize size increase in the case where the user duplicated a PDF page (e.g. the xopp shows twice the first page of the PDF background).
All this may add a handful of bytes per page (maybe up to a kB/page, I don't really know).

One last thing is that pages are modified even is they have no annotations on them. This (and more) could/should be optimized out.

@bhennion bhennion added PDF Related to PDF features export Issues related to export labels Mar 31, 2025
@bhennion
Copy link
Collaborator Author

I optimized things a little bit (so pages that have neither been annotated nor duplicated are not modified at all).
@mjg @tattsan if you have time to test this new version, it is supposed to work in all situations: various background PDF's, with pages annotated/duplicated/removed/reordered in Xournal++, or with additional pages with other backgrounds (images or lined paper and so on).

@mjg
Copy link
Contributor
mjg commented Mar 31, 2025

I tried building with podofo-1.0.0-beta1 - do you need even newer (git master)? Seems so, given that some structures/types are off. Ow wait, there's a beta2 .... going back :|

@mjg
Copy link
Contributor
mjg commented Mar 31, 2025

OK, that (beta2) works. Noticed:

  • podofo inserts a "modified date", mupdf does not
  • podofo spits out a bunch of Fixing references in 26 0 R by 25 lines, cairo and mupdf do not

@tattsan
Copy link
Contributor
tattsan commented Mar 31, 2025

Thanks! Cmake passed cleanly. I've only tested mupdf.
There still remains non-exported xopp-annotation issue, in case 'duplicated PDF page with XObjects'.

@bhennion
Copy link
Collaborator Author
bhennion commented Apr 1, 2025

Thanks! Cmake passed cleanly. I've only tested mupdf. There still remains non-exported xopp-annotation issue, in case 'duplicated PDF page with XObjects'.

Fixed!

@tattsan
Copy link
Contributor
tattsan commented Apr 1, 2025

Thanks! I've confirmed that the bug has been fixed.

@rolandlo

This comment was marked as outdated.

@bhennion bhennion linked an issue Apr 2, 2025 that may be closed by this pull request
@bhennion

This comment was marked as resolved.

@rolandlo

This comment was marked as outdated.

@bhennion
Copy link
Collaborator Author
bhennion commented Apr 3, 2025

So, I wasn't very happy with the choices of libraries I made, so I implemented a QPDF-based of the thing (you can try it via --export-qpdf in the CLI).
The three libraries have benefits and drawbacks:

  1. libmupdf: C library with a decent C++ binding. Old and well tested. Main drawback: it is HUGE. (ubuntu package = 80MB once installed, Archlinux is 50MB + 30MB of deps). No good cmake integration
  2. PoDoFo: C++ library. The interface is pretty intuitive, but it is young (still beta) and in active development. We'd have to wait for the first stable release and we'll most likely encounter some bugs on weird PDF files. It won't be readily available on ubuntu or similar LTS. (package size < 6MB). So so cmake integration
  3. QPDF: C++ library, old enough to be available on ubuntu 22.04 LTS. Seemingly well-tested as well. The interface is not very intuitive though (hard to access a page number for instance). As a drawback, it will be harder with this library to expand the idea of this PR to cover issues like PDF annotation interoperability with other apps #2486. (package size ~4MB). Good cmake integration

Not quite certain what the best choice is, but I'd lean towards QPDF for now.

@mjg
Copy link
Contributor
mjg commented Apr 4, 2025

On Fedora, mupdf-cpp-libs+deps weighs in at 55MiB when installed. 6.2MiB are leptonica+tesseract, 46.7MiB mupdf-libs. I think the main reason is embedded fonts (which are trimmed down a bit in Fedora compared to the default) and the overall functionality.

From a packager and user perspective, the ideal solution would be export plugins which can be built as subpackages and installed separately. For a mupdf/zathura/qpdfview/... user, the xournalpp-mupdf-plugin does not pull in any extra dependencies which they haven't installed already. Others may prefer other export plugins. Subpackages would give everyone their choice.

I know this requires extra effort, but since you've abstracted the interface already ... ;-)

@rolandlo

This comment was marked as resolved.

@mjg
Copy link
Contributor
mjg commented Apr 4, 2025
[ 78%] Building CXX object src/core/CMakeFiles/xournalpp-core.dir/pdf/base/QPdfExport.cpp.o
In file included from /usr/include/qpdf/Buffer.hh:24,
                 from /usr/include/qpdf/QPDF.hh:36,
                 from /home/mjg/src/xournalpp/src/core/pdf/base/QPdfExport.cpp:10:
/usr/include/qpdf/PointerHolder.hh:32:3: warning: #warning "POINTERHOLDER_TRANSITION is not defined -- see qpdf/PointerHolder.hh" [-Wcpp]
   32 | # warning "POINTERHOLDER_TRANSITION is not defined -- see qpdf/PointerHolder.hh"
      |   ^~~~~~~

and then

/home/mjg/src/xournalpp/src/core/pdf/base/QPdfExport.cpp: In function ‘void reorderBackgrounds(QPDF&, const std::vector<HybridPdfExport::OutputPageInfo>&)’:
/home/mjg/src/xournalpp/src/core/pdf/base/QPdfExport.cpp:104:49: error: cannot convert ‘__gnu_cxx::__alloc_traits<std::allocator<QPDFPageObjectHelper>, QPDFPageObjectHelper>::value_type’ {aka ‘QPDFPageObjectHelper’} to ‘const QPDFObjectHandle&’
  104 |             if (!ps[nbValidPages].isSameObjectAs(backgroundPages[n])) {
      |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/qpdf/QPDF.hh:42:
/usr/include/qpdf/QPDFObjectHandle.hh:311:25: note: initializing argument 1 of ‘bool QPDFObjectHandle::isSameObjectAs(const QPDFObjectHandle&) const’
  311 |     bool isSameObjectAs(QPDFObjectHandle const&) const;
      |                         ^~~~~~~~~~~~~~~~~~~~~~~

This is with qpdf-devel 11.10.1 on Fedora 42. Maybe differently placed const (value/ref? Or, in fact, ref vs val for ps and backgroundPages?

@bhennion
Copy link
Collaborator Author
bhennion commented Apr 4, 2025

@mjg: would replacing this line by

            if (!ps[nbValidPages].isSameObjectAs(backgroundPages[n].getObjectHandle())) {

work?

I'm guessing they added an implicit conversion since 11.10.1

@mjg
Copy link
Contributor
mjg commented Apr 4, 2025

if (!ps[nbValidPages].isSameObjectAs(backgroundPages[n].getObjectHandle())) {

Yes, this compiles! I had tried with QPDF_FUTURE meanwhile which didn't help.

BTW: You might want to

-#ifdef ENABLE_MUPDF
+#ifdef ENABLE_QPDF

in src/core/pdf/base/QPdfExport.{h,cpp}.

@bhennion
Copy link
Collaborator Author
bhennion commented Apr 4, 2025

And I'm guessing setting: # define POINTERHOLDER_TRANSITION 4 before the QPdf includes should get rid of your warning as well. See
https://github.com/qpdf/qpdf/blob/6434e09dc2f9942388bb03c666635fd3c2394342/include/qpdf/PointerHolder.hh#L41-L60

@mjg
Copy link
Contributor
mjg commented Apr 4, 2025

So, in principle qpdf works. A few things I noticed in comparison:

  • cairo sets creation date to the current one, sets no modification date
  • mupdf, podofo set creation date to that of the PDF background, modification date to current one
  • qpdf sets creation date to that of the PDF background
  • all but qpdf set titel to the xopp filename
  • qpdf comes with smaller default view size (?; using evince)
  • cairo claims PDF-1.7, others PDF-1.4
    Indeed, one can argue either way for copying info from the (first?) PDF background or not, depending on the use case (annotating/filling in or "just background"). I guess we'd want it the same for all extra exporters at least. We've clearly reached the bike-shedding phase :-)

@rolandlo
Copy link
Member

The code for the QPdfExport looks good. I have done some more tests after studying the code and those passed fine as well.
The PoDoFo backend has a problem with PDF background pages duplicated in the xopp-file that contain forms or annotations though. Only the first occurence shows the forms and annotations.

@bhennion

This comment was marked as resolved.

@rolandlo

This comment was marked as resolved.

@bhennion

This comment was marked as resolved.

@rolandlo

This comment was marked as resolved.

@bhennion

This comment was marked as resolved.

@ceztko
Copy link
ceztko commented May 7, 2025

PoDoFo maintainer here. Few replies to your very interesting comments:

See podofo/podofo#253 for the report upstream

Should be fixed in master. See my comment here.

podofo inserts a "modified date", mupdf does not

You can disable metadata updating when saving with PdfSaveOptions::NoMetadataUpdate

  • podofo spits out a bunch of Fixing references in 26 0 R by 25

Oh, yes: this makes no sense at all. I demoted the log to Debug in podofo/podofo@85c6e68 .

and resave it with a different compression level (see e.g. this striking podofo/podofo#163). This may lead to size increases.

This is sad but it will be definitely fixed at some point after 1.0 release. It just didn't escalate in my job so far, as I have other priorities.

PoDoFo: C++ library. The interface is pretty intuitive, but it is young (still beta) and in active development

Actually it is not that young (first commit dates back to 2006). I'm the current maintainer after the original author retired. The stable release is 0.10.4 but 1.0 is coming soon (I hope 1-2 weeks away) and it will feature a robust (if not stable) API.

It won't be readily available on ubuntu or similar LTS.

We are in contact with Debian/Ubuntu devs so 1.0 will receive a fast track and 0.10.x releases skipped.

we'll most likely encounter some bugs on weird PDF files

Some broken PDFs (not standard compliant) are not supported and will require a recovery mode as implemented in other parsers. Unfortunately this is currently not my priority, so again it's something that will come after 1.0.

The PoDoFO library is licensed under LGPL v2 or later, which appears to be compatible with GPL v2 or later.

PoDoFo will be soon relicensed to dual LGPL 2+/MPL2. Probably shortly after 1.0.

@ceztko

This comment was marked as resolved.

@rolandlo

This comment was marked as outdated.

@ceztko

This comment was marked as resolved.

@bhennion

This comment was marked as resolved.

@ceztko

This comment was marked as resolved.

@ceztko

This comment was marked as resolved.

@rolandlo

This comment was marked as resolved.

@bhennion

This comment was marked as resolved.

ceztko added a commit to podofo/podofo that referenced this pull request May 23, 2025
…n.cmake

It emerged from comments in[1][2] that current podofo-config.cmake version is not
really working to . Guide at[3] was followed. Also added public headers to the target
with install target_sources(... FILE_SET ...) as documented in [4], which were absent
from exports anyway in podofo-config.cmake as well. This increases the cmake
requirement to 3.23. Initial export patches were introduced by vcpkg maintainers,
as described here[5]

[1] xournalpp/xournalpp#6364 (comment)
[2] xournalpp/xournalpp#6364 (comment)
[3] https://cmake.org/cmake/help/latest/guide/tutorial/Adding%20Export%20Configuration.html
[4] https://cmake.org/cmake/help/latest/command/target_sources.html#file-sets
[5] microsoft/vcpkg#31584 (comment)
@ceztko
Copy link
ceztko commented May 23, 2025

I greatly improved creation of podofo-config.cmake by adding public header exports and added creation of podofo-config-version.cmake, which was missing. Such file convention is supported as well. I have no time to test them but at least everything should be in place now and PoDoFo should be findable as a package (provided the export config files are reachable) and be usable with just target_link_libraries(target podofo::podofo). 1.0.0 is now very close to be released and there is only one last known critical bug waiting to be fixed.

@bhennion
Copy link
Collaborator Author

I greatly improved creation of podofo-config.cmake by adding public header exports and added creation of podofo-config-version.cmake, which was missing. Such file convention is supported as well. I have no time to test them but at least everything should be in place now and PoDoFo should be findable as a package (provided the export config files are reachable) and be usable with just target_link_libraries(target podofo::podofo). 1.0.0 is now very close to be released and there is only one last known critical bug waiting to be fixed.

That's great! Thanks a lot!
Things like

    find_package(PoDoFo 1.0.0 REQUIRED)
    message(STATUS "Found PoDoFo: ${PoDoFo_VERSION}")

seem to behave exactly as expected.

I encountered an include issue though:

In file included from /usr/include/podofo/auxiliary/StreamDeviceBase.h:10,
                 from /usr/include/podofo/podofo.h:17,
                 from [path to]/xournalpp/src/core/pdf/base/PoDoFoPdfExport.cpp:9:
/usr/include/podofo/auxiliary/basedefs.h:110:10: fatal error: podofo_config.h: No such file or directory
  110 | #include "podofo_config.h"
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.

The point seems to be that podofo_config.h is now installed under podofo/podofo_config.h. Patching the PoDoFo sources with patch.txt seems to make it work (I'm testing by installing PoDoFo via Archlinux's AUR package podofo-git).

ceztko added a commit to podofo/podofo that referenced this pull request May 24, 2025
…nfig.cmake exports improvements

Should fix the issue reported here xournalpp/xournalpp#6364 (comment)
@ceztko
Copy link
ceztko commented May 24, 2025

I encountered an include issue though

@bhennion : Thank you for the feedback. It was a regression in the location where the generated podofo_config.h must be installed. I restored the location as originally intended. BTW: I'm quite candid in admitting I have no idea how this package mechanism is generally enabled. I guess in a linux distribution or using a package manager that integrates with CMake these packages are automatically found because the config files are installed in a known location. What if I want the test the mechanism in windows so find_package(PoDoFo 1.0.0 REQUIRED) will work when installing the library in a arbitrary location?

Please consider that the automatic config file creation was introduced by a random vcpkg patch I found just lurking their issues, which I decided to merge. I didn't know that somehow this wasn't actually working: header files were not exported, so I have no idea how the headers location resolution would have worked.

@bhennion
Copy link
Collaborator Author

What if I want the test the mechanism in windows so find_package(PoDoFo 1.0.0 REQUIRED) will work when installing the library in a arbitrary location?

I'm guessing this won't work. The user will have to add the right path to some environment variable.
Actually, the same is true for Linux if you're library is not properly installed and those files are not in the standard CMake paths

@bhennion
Copy link
Collaborator Author

In any case, it all seems to work fine now. Thanks a lot!

@ceztko
Copy link
ceztko commented May 24, 2025

What if I want the test the mechanism in windows so find_package(PoDoFo 1.0.0 REQUIRED) will work when installing the library in a arbitrary location?

I'm guessing this won't work.

Altering CMAKE_PREFIX_PATH does the trick! With latest changes in master now you can do:

cmake_minimum_required(VERSION 3.23)

project(PoDoFoSample)

set(CMAKE_CXX_STANDARD 17)

# In Windows, or if you installed the package to non
# standard paths
list(APPEND CMAKE_PREFIX_PATH "D:\\Test\\PoDoFoInstallDir")

# Alternative to altering CMAKE_PREFIX_PATH
#set(PoDoFo_DIR "D:\\Test\\PoDoFoInstallDir\\share\\cmake\\podofo")

find_package(PoDoFo 1.0.0 REQUIRED)
message(STATUS "Found PoDoFo: ${PoDoFo_VERSION}")

add_executable(helloworld main.cpp)
target_link_libraries(helloworld podofo::podofo)

It's the first time ever I try to locate libraries this way, I have to say it's an interesting mechanism for package managers. I am now tagging 1.0-rc1. I will inform you here when 1.0 final is released (few days).

@ceztko
Copy link
ceztko commented May 31, 2025

PoDoFo 1.0.0 released (announcement).

@Rehan-MALAK
Copy link

@bhennion

Hi! I just wanted to thank you for your code. I just built your pr/podofo branch with -DENABLE_QPDF=yes, and the exported PDF now includes the links ! I really missed that feature !

A typical use case for me is annotating math courses—mostly generated by LaTeX—with clickable table of contents, links to the bibliography, and links to previous definitions or theorems. This is a huge improvement!

Thanks again for your work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
0