MarkdownRenderer not rendering HTML as Markdown #367

the-red-herring · 2025-03-21T14:16:13Z

Hi!
I have been trying for a while to convert a string of user-entered text, which may well contain HTML, into Markdown. This is how I have set it up:

        List<Extension> extensions = List.of(TablesExtension.create());
        Parser parser = Parser.builder()
                .extensions(extensions)
                .build();
        MarkdownRenderer renderer = MarkdownRenderer.builder()
                .extensions(extensions)
                .build();

        Node document = parser.parse(stringToConvert);

        return renderer.render(document);

A simple example of stringToConvert is:

<ul><li>asdf</li><li>asdf</li><li>asdf</li><li>asdf</li></ul>

The returned string from the renderer.render(document) is:

<ul><li>asdf</li><li>asdf</li><li>asdf</li><li>asdf</li></ul>

This is what I was expecting to be returned though:

* asdf
* asdf
* asdf
* asdf

Apologies if I am missing something obvious here. The Parser returns a Node that looks correct on the surface, so I suspect I am either doing something wrong or doing something that is not supported. I can get the example usage of MarkdownRenderer from the docs working as it should, but my usage above differs slightly from that example. I don't see an obvious reason why it shouldn't work, though.

(Also see what the reference implementation does: https://spec.commonmark.org/dingus/)


robinst · 2025-03-23T12:53:06Z

The reason for this is that commonmark-java's Parser is a Markdown parser, not an HTML parser. Markdown allows some HTML to be embedded, but when the document is rendered back to HTML or Markdown, that HTML is mostly just passed through. It has no actual semantic meaning for the parser/renderer.

If your input is always HTML, you will want to use an HTML parser, and then convert from the HTML representation (ul, li, etc.) to the commonmark-java representation such as BulletList, ListItem, etc. Rendering that with MarkdownRenderer will then work as you expect.
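To make the "parse the HTML first, then map elements to list items" idea concrete, here is a minimal sketch that uses only the JDK. It is not the approach described above in full: instead of building commonmark-java nodes and calling MarkdownRenderer, it writes the Markdown bullets directly, and it uses the JDK's XML parser as a stand-in for an HTML parser, so it only works on well-formed, lower-case markup like the example in this issue. Real-world HTML would need a proper HTML parser (jsoup is one option). The class name `HtmlListSketch` and the method `toMarkdown` are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of commonmark-java.
final class HtmlListSketch {

    /**
     * Converts a well-formed {@code <ul>} fragment into Markdown bullet lines.
     * Assumption: the input is valid XML with a single lower-case <ul> root.
     */
    static String toMarkdown(String html) {
        Element root;
        try {
            // The JDK XML parser stands in for an HTML parser here.
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(html)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed markup", e);
        }
        if (!"ul".equals(root.getTagName())) {
            throw new IllegalArgumentException("expected <ul>, got <" + root.getTagName() + ">");
        }

        // With commonmark-java, this is where a BulletList node would be created.
        StringBuilder out = new StringBuilder();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element && "li".equals(((Element) child).getTagName())) {
                // With commonmark-java, this would become a ListItem holding a Paragraph.
                out.append("* ").append(child.getTextContent().trim()).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(toMarkdown(
                "<ul><li>asdf</li><li>asdf</li><li>asdf</li><li>asdf</li></ul>"));
    }
}
```

Running `main` on the string from this issue prints four `* asdf` lines. Nested lists, inline formatting inside `li`, and other elements are deliberately left out; handling them is where building a real node tree and letting MarkdownRenderer do the output would pay off.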

the-red-herring · 2025-03-24T11:31:29Z

Hi @robinst,

I appreciate the answer, thank you.

the-red-herring added the bug label Mar 21, 2025

robinst closed this as completed Mar 23, 2025

robinst removed the bug label Mar 23, 2025
