DocBook reader: Preserve callouts and xrefs in code #7389

jtojnar · 2021-06-17T21:16:47Z

When converting DocBook documents like https://github.com/NixOS/nixpkgs/blob/c06cb69ca0dc02b37e27104ac8415c3cab173328/nixos/modules/programs/plotinus.xml#L27, callouts and xrefs inside code need to be dealt with manually, or too much information will be lost.

src/Text/Pandoc/Readers/DocBook.hs

jtojnar · 2021-06-17T21:20:29Z

Ideally, the AST would support rich code blocks, and the dumbing down would be handled in the writers, but until then, let’s preserve the data.

jtojnar · 2021-06-17T21:21:13Z

It might also be useful to control this behaviour with a config option (extension).

jgm · 2021-06-21T00:46:37Z

Can you explain the motivation for this change, and what it does?
There is not even a test to show how the behavior changes.

jtojnar · 2021-06-21T06:51:47Z

In the NixOS manual, we have code like

<programlisting>
<xref linkend="opt-services.xserver.desktopManager.gnome.enable"/> = true;
</programlisting>

docbook-xsl renders it as

services.xserver.desktopManager.gnome.enable = true;

We are trying to switch our documentation from DocBook to CommonMark and are using pandoc for the conversion.

Currently, pandoc parses the code as

 = true;

since xref is an empty element and elementToStr returns an empty string for it.
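
As a rough illustration of that loss, here is an analogy using Python's standard xml.etree.ElementTree. It is not pandoc's Haskell code; `text_only` and `LISTING` are names made up for this sketch. Collecting only the text nodes of the listing drops the empty xref element along with its linkend attribute:

```python
import xml.etree.ElementTree as ET

# The listing from the example above, as a small standalone XML fragment.
LISTING = (
    "<programlisting>\n"
    '<xref linkend="opt-services.xserver.desktopManager.gnome.enable"/> = true;\n'
    "</programlisting>"
)


def text_only(xml_source: str) -> str:
    """Concatenate text nodes only; attributes of empty elements are lost."""
    root = ET.fromstring(xml_source)
    return "".join(root.itertext())


if __name__ == "__main__":
    # The xref contributes no text, so only " = true;" (plus newlines) remains.
    print(repr(text_only(LISTING)))
```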

We would like it to produce the following

<xref linkend="opt-services.xserver.desktopManager.gnome.enable"/> = true;

The xref tag is kept verbatim in the code block, letting us deal with it manually (probably using a pandoc filter).
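
A minimal sketch of what such a filter might look like, written against pandoc's JSON AST with only the Python standard library. It assumes the reader has left the xref tag verbatim in the CodeBlock text, as proposed here. The names `XREF`, `resolve_xref`, `walk` and `main` are hypothetical, and stripping an "opt-" prefix is only an assumption that happens to match the docbook-xsl rendering of this one example; real resolution would need the target's title.

```python
import json
import re
import sys

# Matches an empty xref element left verbatim in code, e.g. <xref linkend="opt-foo"/>.
XREF = re.compile(r'<xref\s+linkend="([^"]+)"\s*/>')


def resolve_xref(match: re.Match) -> str:
    """Hypothetical rule: show the linkend without its "opt-" prefix."""
    target = match.group(1)
    return target[len("opt-"):] if target.startswith("opt-") else target


def walk(node):
    """Recursively rewrite the text of every CodeBlock in a pandoc JSON AST."""
    if isinstance(node, list):
        return [walk(child) for child in node]
    if isinstance(node, dict):
        if node.get("t") == "CodeBlock":
            attr, text = node["c"]  # CodeBlock content is [attributes, text]
            return {"t": "CodeBlock", "c": [attr, XREF.sub(resolve_xref, text)]}
        return {key: walk(value) for key, value in node.items()}
    return node


def main() -> None:
    """Read a pandoc JSON AST from stdin and write the rewritten AST to stdout."""
    json.dump(walk(json.load(sys.stdin)), sys.stdout)
```

Saved as a script that ends with a call to `main()`, this could be passed to pandoc via `--filter`; the call is left out here so the functions can be imported and tried on their own.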

I will add a test but first I would like to know if this behaviour is reasonable or if it should be optional (off-by-default extension).

jgm · 2021-07-04T15:51:24Z

I don't really like the idea of including a string in the code block that actually isn't verbatim code.
Would it be possible for pandoc to resolve the xref? I'm not sure how that happens -- is all the information needed for doing that elsewhere in the docbook document?

jtojnar · 2021-07-04T16:44:43Z

In our use case, the anchors are located in different files (that are XIncluded into the final book along with the file containing the xrefs). So we would need some other mechanism to resolve the links, which would be even more complex than this. Leaving it up to the user (e.g. to resolve them using a filter script) would probably be cleaner. The issue is it requires preserving the data.

But I agree that mixing code with metadata without any way to distinguish them would be ugly. An alternative would be keeping this behind an extension flag, Ext_docbook_preserve_markup_in_codeblocks: when an XML element is detected inside a code block XML element, the reader would set the raw contents of the code block element as the CodeBlock contents (and add a raw_docbook class to the code block to distinguish it).

A much nicer option would be modifying the AST to support rich code blocks; these are supported by languages other than DocBook (HTML, rST). But yeah, it would be a pretty big backwards compatibility break. (Or adding RichCodeBlock, if that is more acceptable.)

jtojnar · 2021-07-04T16:48:02Z

Or maybe an extension that would keep the elements that would cause information loss as RawBlock, to be handled manually. That feels like a kosher solution that would be easier to implement than AST changes.

But I could even do this short term and work on the AST changes long term if you think it is a good idea.

jgm · 2021-07-04T18:27:13Z

A pragmatic solution would be to preprocess, resolving these references, before passing to pandoc.
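
One way such a preprocessing step might look, sketched with Python's standard xml.etree.ElementTree. Everything here is an assumption for illustration: the function names are hypothetical, only xref elements that are direct children of a programlisting are handled, and the label rule (drop an "opt-" prefix) merely mirrors the docbook-xsl output shown earlier in this thread instead of looking up the real target title.

```python
import xml.etree.ElementTree as ET


def local_name(tag) -> str:
    """Return the tag name without a "{namespace}" prefix, or "" for non-elements."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def xref_label(linkend: str) -> str:
    """Hypothetical label rule: drop an "opt-" prefix, otherwise keep the id."""
    return linkend[len("opt-"):] if linkend.startswith("opt-") else linkend


def inline_xrefs(listing: ET.Element) -> None:
    """Replace each direct xref child of `listing` with plain text, in place."""
    previous = None  # last child kept so far; new text is appended to its tail
    for child in list(listing):
        if local_name(child.tag) != "xref":
            previous = child
            continue
        replacement = xref_label(child.get("linkend", "")) + (child.tail or "")
        if previous is None:
            listing.text = (listing.text or "") + replacement
        else:
            previous.tail = (previous.tail or "") + replacement
        listing.remove(child)


def preprocess(xml_source: str) -> str:
    """Resolve xrefs inside every programlisting and return the new XML."""
    root = ET.fromstring(xml_source)
    # Snapshot the element list first, since inline_xrefs mutates the tree.
    for element in list(root.iter()):
        if local_name(element.tag) == "programlisting":
            inline_xrefs(element)
    return ET.tostring(root, encoding="unicode")
```

The result of `preprocess` could then be written to a file and converted with `pandoc -f docbook -t commonmark`. Resolving anchors that live in other XIncluded files would still need a lookup table built from the whole book, which this sketch does not attempt.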

jtojnar marked this pull request as draft June 17, 2021 21:17

jtojnar force-pushed the docbook-code-xref branch from f07969d to 7031a92 Compare June 17, 2021 21:26

jtojnar force-pushed the docbook-code-xref branch from 7031a92 to 88f99ed Compare June 17, 2021 22:23

jtojnar marked this pull request as ready for review June 17, 2021 22:23

blaggacao mentioned this pull request Jun 18, 2021

pandoc: add useful patch for nixos manual transformation to CommonMark NixOS/nixpkgs#127391

Closed

blaggacao added a commit to blaggacao/nixpkgs that referenced this pull request Jun 18, 2021

pandoc: add useful patch for nixos manual transformation to CommonMark

e719316

see: jgm/pandoc#7389 for full details. @jtojnar

jgm force-pushed the main branch from 3ff001b to 3da4d41 Compare December 18, 2024 23:02

jgm force-pushed the main branch from 60c147d to bfcff3e Compare May 12, 2025 00:38
