XML-invalid characters in Lingbuzz article metadata bleed through to the feed, making it unparseable #2

@philipmgrant

The feed currently contains malformed XML that fails to parse in readers: it passes through the U+0002 character included in the keywords of this LingBuzz article, and U+0002 is not a valid character in XML 1.0. (If you look at the page source, the "whitespace" in the middle of the keyword 'indefinite classifier' is actually U+0002.)

I'm surprised that this isn't handled by the underlying 'rss' or 'xml' Node package, but the invalid character evidently passes straight through.
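
One possible workaround, sketched here and not taken from the project's code, is to sanitise the scraped metadata strings before they reach the feed builder. The function name `stripInvalidXmlChars` and its `replacement` parameter are hypothetical; the set of characters it removes follows the XML 1.0 `Char` production (tab, LF, CR, U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF are allowed; everything else is not).

```typescript
// Hypothetical helper: removes characters that XML 1.0 does not allow.
// The first alternative captures a well-formed surrogate pair (a code point
// in U+10000..U+10FFFF), which is legal and must be kept. The second
// alternative matches C0 controls other than tab/LF/CR, lone surrogates,
// and the non-characters U+FFFE and U+FFFF.
const INVALID_XML_CHARS =
  /([\uD800-\uDBFF][\uDC00-\uDFFF])|[\x00-\x08\x0B\x0C\x0E-\x1F\uD800-\uDFFF\uFFFE\uFFFF]/g;

function stripInvalidXmlChars(input: string, replacement: string = ""): string {
  return input.replace(
    INVALID_XML_CHARS,
    // Keep valid surrogate pairs unchanged; substitute everything else.
    (match: string, pair: string | undefined) => (pair ? pair : replacement)
  );
}

// Example with the keyword from this issue, where U+0002 stands in for a space.
const keyword = stripInvalidXmlChars("indefinite\u0002classifier", " ");
console.log(keyword);
```

Passing a space as the replacement suits this particular keyword, since the stray character sits where whitespace was intended; an empty replacement may be the safer default elsewhere. Where exactly to call such a helper (titles, authors, keywords, abstracts) is an assumption that depends on how the feed items are assembled.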
