Conversation

@ronaldtse
Contributor

A heavy refactor. There are implications for all pending PRs... will see what we can do...

# Format: <?xml version="1.0" encoding="UTF-8"?>
# Both version and encoding are optional in the match
# Use character class excluding '>' to prevent ReDoS
if xml.match(/\A[ \t\r\n]*<\?xml[ \t\r\n]+([^>]+)\?>/)

Check failure

Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data High

This regular expression that depends on a library input may run slow on strings starting with '<?xml\t' and with many repetitions of '\t\t'.
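The alert most likely arises because the whitespace run `[ \t\r\n]+` and the capture `[^>]+` can both match tabs, so the regex engine may try many split points between them before giving up. A minimal sketch of one possible fix, assuming it is acceptable for the captured text to start with a non-whitespace character, is shown below. The constant and method names are hypothetical and are not taken from the PR.

```ruby
# Hypothetical rewrite of the flagged pattern (names are illustrative only).
# The capture must begin with a character that is neither '>' nor whitespace,
# so the preceding whitespace run has exactly one place where it can end.
XML_DECL = /\A[ \t\r\n]*<\?xml[ \t\r\n]+([^> \t\r\n][^>]*)\?>/

# Returns the pseudo-attribute text of the XML declaration, or nil if absent.
def xml_declaration_attributes(xml)
  match = XML_DECL.match(xml)
  match && match[1]
end

puts xml_declaration_attributes(%(<?xml version="1.0" encoding="UTF-8"?><root/>))
```

One behavioral difference to be aware of: unlike the original pattern, this sketch does not match a declaration whose body is whitespace only, such as `<?xml  ?>`.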
@HassanAkbar
Member

@ronaldtse, please let me know if there's anything I can do to help finalize this PR.

@ronaldtse ronaldtse force-pushed the rt-xml-namespace-standards-alignment branch from e775d39 to d01695a Compare January 15, 2026 04:31
ronaldtse and others added 10 commits January 15, 2026 12:51
…data loss

The Nokogiri parser drops everything after an unescaped & character,
causing data loss. For example, 'R&C' becomes 'R' (losing '&C').

This fix pre-processes XML to escape unescaped & characters before parsing,
while preserving all valid entities (XML, HTML, and custom entities).

The regex uses negative lookahead to only escape & that are NOT part of
entity-like patterns (&xxx; where xxx is alphanumeric, #digits, or #xhex).

Fixes data loss issue in malformed XML handling.
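
The pre-processing step described in this commit message could look roughly like the following. This is a sketch based only on the description above, not the code from the PR; the constant and method names are hypothetical.

```ruby
# Hypothetical sketch of the pre-processing step (names are illustrative only).
# Matches '&' only when it is NOT followed by an entity-like body and ';':
#   name (alphanumeric), #digits, or #x followed by hex digits.
UNESCAPED_AMP = /&(?!(?:[A-Za-z0-9]+|#[0-9]+|#x[0-9A-Fa-f]+);)/

# Escapes bare ampersands and leaves entity-like references untouched.
def escape_bare_ampersands(xml)
  xml.gsub(UNESCAPED_AMP, "&amp;")
end

puts escape_bare_ampersands("<name>R&C</name>")
puts escape_bare_ampersands("<name>R&amp;C &#38; &nbsp;</name>")
```

Because the lookahead only checks the shape of the reference, custom entities such as `&foo;` pass through unchanged whether or not they are declared.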
…tion

Add support for boolean value_map to implement the 'presence means true' pattern:
- Serialization: true -> empty element (<Active/>), false -> omitted
- Deserialization: empty element -> true, absent element -> false

Usage:
  map_element "Active", to: :active, value_map: {
    from: { empty: true, omitted: false },
    to: { true: :empty, false: :omitted }
  }
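
To illustrate the 'presence means true' semantics independently of the library, here is a plain-Ruby sketch. It does not use the lutaml-model API; both helper names are hypothetical and the string matching is deliberately simplistic.

```ruby
# Hypothetical helpers illustrating the pattern (not lutaml-model API).

# Serialization: true -> empty element, false -> omitted (nil).
def serialize_flag(name, value)
  value ? "<#{name}/>" : nil
end

# Deserialization: an empty element means true, an absent element means false.
def deserialize_flag(name, xml)
  tag = Regexp.escape(name)
  xml.match?(%r{<#{tag}\s*(?:/>|>\s*</#{tag}>)})
end

puts serialize_flag("Active", true).inspect
puts deserialize_flag("Active", "<Item><Active/></Item>")
puts deserialize_flag("Active", "<Item></Item>")
```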

Changes:
- lib/lutaml/model/xml/transformation.rb: Add boolean handling in serialize_value
  to return empty string for true: :empty mapping and nil for false: :omitted
- lib/lutaml/model/xml/transformation.rb: Add boolean handling in should_skip_value?
  to skip rendering for false: :omitted mapping
- lib/lutaml/model/transform.rb: Add boolean handling in apply_value_map
  to return boolean values directly for Boolean type attributes
- lib/lutaml/model/serialize.rb: Add boolean handling in apply_value_map
  to handle both from/to directions with proper type checking
- lib/lutaml/model/transform/xml_transform.rb: Clean up debug code
- Added form attribute to XmlDataModel::XmlElement
- Created ElementFormOptionRule decision rule
- Fixed DeclarationPlanner to NOT propagate use_prefix: true to children
- Fixed NamespaceCollector to collect children when element is nil
- Fixed serialize.rb to distinguish prefix: false from prefix: nil
- Updated test expectations in namespace_principles_spec.rb and namespace_inheritance_spec.rb

Reduced NOKOGIRI test failures from 144 to 107 (37 failures fixed).

Co-Authored-By: Claude <[email protected]>
- Updated person_spec.rb to expect Type namespace declarations on root element
- Type namespaces are hoisted to parent for efficiency (W3C compliant)

Reduced NOKOGIRI test failures from 107 to 106 (38 failures fixed total).

Co-Authored-By: Claude <[email protected]>
- Updated type_namespace_roundtrip_spec.rb to expect Type namespace hoisting to root
- Updated type_namespace_integration_spec.rb to expect default format behavior

Reduced NOKOGIRI test failures from 106 to 101 (43 failures fixed total).

Co-Authored-By: Claude <[email protected]>
Anonymous classes created with Class.new don't have a name,
which causes NoMethodError when calling split on nil.

This fix uses the safe navigation operator and falls back
to 'anonymous' when the class name is nil.

Fixes namespace_integration_spec.rb:224 test failure
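
The failure mode and the fix can be reproduced in a few lines. This is a sketch of the technique described above rather than the code from the PR; the method name is hypothetical.

```ruby
# Hypothetical helper (name is illustrative only).
# Class#name returns nil for classes created with Class.new, so calling
# split on it directly would raise NoMethodError. Safe navigation
# short-circuits to nil, and the fallback supplies a placeholder.
def short_class_name(klass)
  klass.name&.split("::")&.last || "anonymous"
end

puts short_class_name(File::Stat)
puts short_class_name(Class.new)
```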
The extract_input_namespaces API now includes format information
(:default or :prefix) in the namespace hash. Updated test
expectations to match the new structure.
@ronaldtse ronaldtse force-pushed the rt-xml-namespace-standards-alignment branch from d01695a to 3c2cad3 Compare January 15, 2026 15:58
- Add polymorphic option to map_element with attribute and class_map
- Fixes namespace_spec.rb:376 test for polymorphic collections with namespaces
- Add create_element, add_element, add_text methods to CustomMethodWrapper
- Handle XML string fragments by storing as raw_content for adapter parsing
- Fixes some custom serialization test failures
…liant behavior

Child models maintain their own namespaces with local xmlns declarations,
following W3C compliance. The MathML namespace is now declared on the child
element (math) instead of being hoisted to the parent element (UnitSymbol).

This aligns with the W3C XML namespace specification and the pattern
established in commit 9ea5e85 where child models maintain their own
namespaces.
3 participants