Skip to content

feat: add support for drawio.png#53

Open
Saphnx wants to merge 7 commits intotuunit:mainfrom
Saphnx:main
Open

feat: add support for drawio.png#53
Saphnx wants to merge 7 commits intotuunit:mainfrom
Saphnx:main

Conversation

@Saphnx
Copy link
Copy Markdown

@Saphnx Saphnx commented Mar 13, 2026

Add support for editable png diagrams.

Comment on lines +202 to +203
xml_data = unquote(img.info.get("mxfile"))
diagram_xml = etree.fromstring(xml_data.encode())
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when it is just a png image without the mxfile information? I think it would break and would not display anything, correct?

Copy link
Copy Markdown
Author

@Saphnx Saphnx Mar 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's missing any mxfile data, it'll fail to retrieve the metadata and therefore won't be a valid diagram, causing it to fall back to the existing error handling. I've updated the example to include this case:

image

I've also updated mkdocs-classic, as I'd missed that and noticed it when I was pulling in the latest changes.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is undesirable as it could easily happen that users have named files diagram.drawio.png without it containing the mxdata in the png metadata and this PR would hence be a breaking change for those users of this plugin. Can we introduce a special error handling for png files to force rendering/treating them as normal images if the parsing fails?

Copy link
Copy Markdown
Author

@Saphnx Saphnx Mar 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I hadn't considered that people may do that without intentionally saving it as an editable PNG. In this case, it's a valid flow, so I've shifted things around a little and introduced a warning for when the mxfile data is missing on a .drawio.png file.

The flow now is:

  1. Attempt to extract mxfile xml from file using new retrieve_mxfile method:
    1. If PNG with no mxfile data, log a warning and return None
    2. Else, parse the mxfile data into XML and return it
    3. An exception will be thrown if it fails to read or parse the file
  2. Handle result:
    1. If None, it's a PNG so simply skip this diagram to allow it to render normally
    2. If an exception was thrown, log an error and replace the diagram with "Not a diagram file" message (as before)
    3. If it successfully retrieved the XML, continue
  3. Call substitute_with_file method as before, but this time passing in the XML instead of the path and src:
    1. If successful, return the substitution
    2. If an exception was thrown, as in step 2, log an error and replace the diagram with "Not a diagram file" message (as before)

Does the flow make sense? Or did you have something else in mind? I have little experience with Python, so happy to tweak things if it can be tidied up further.

pyproject.toml Outdated
python = ">=3.8.0,<4.0"
requests = ">=2.0"
Jinja2 = ">=3.0"
Pillow = "<=12.1.1"
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pillow is quite excessive for just retrieving a bit of metadata from a PNG. It introduces a significant amount of sub dependencies and C code compilation which easily fails or is just unnecessary overhead in terms of compute and storage. Therefore I suggest we drop pillow in favour of pypng

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per PNG Spec RFC2083

https://datatracker.ietf.org/doc/html/rfc2083#page-24

This should work for extracting the mxfile contents without the overhead of pillow

import png

r = png.Reader(filename="file.png")
for chunk in r.chunks():
    if chunk[0] == b"tEXt":
        key, value = chunk[1].split(b"\x00", 1)
        if key == b"mxfile":
            mxfile = value.decode("latin-1")

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much better, thanks for your effort =)


xml_data = unquote(mxfile_metadata)
return etree.fromstring(xml_data.encode())
else:
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this else is redundant just put the return here

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well that's quite a blatant case, gave me a good laugh at myself at least! Good spot, and apologies, I must have overlooked it in my moving things around and working out the best flow.


xml_data = unquote(mxfile_metadata)
return etree.fromstring(xml_data.encode())
else:
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this else is redundant just put the return here to reduce nesting

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking about it move the return to the verify top and invert the if statement

if not src.endswith(".png"):
  etree.parse(path.joinpath(src).resolve())

this way we adhere to early return principles and you can get rid of the indentation for the complete png extraction logic

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

furthermore the regex ignores the casing this statement is case sensitive so convert to lower casing first and then check the contains

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking about it move the return to the verify top and invert the if statement

if not src.endswith(".png"):
  etree.parse(path.joinpath(src).resolve())

this way we adhere to early return principles and you can get rid of the indentation for the complete png extraction logic

Absolutely - much nicer.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

furthermore the regex ignores the casing this statement is case sensitive so convert to lower casing first and then check the contains

Ahh good spot, thanks for that. That'd be the sort of thing I'd cover with a unit test normally, but there's none in the project so far. Is that a preference given the size of the plugin? Or would you be happy if I added some? Not done it with Python before, but I see Unittest is built-in?

Also fix uppercase scenario and tidy code flow
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants