Skip to content

perf: optimize ColorValidator — up to 3x faster on arrays, 14x on scalars#5576

Open
KRRT7 wants to merge 7 commits intoplotly:mainfrom
KRRT7:perf/color-validator-optimization
Open

perf: optimize ColorValidator — up to 3x faster on arrays, 14x on scalars#5576
KRRT7 wants to merge 7 commits intoplotly:mainfrom
KRRT7:perf/color-validator-optimization

Conversation

@KRRT7
Copy link
Copy Markdown
Contributor

@KRRT7 KRRT7 commented Apr 16, 2026

Overview

Optimizes ColorValidator validation with up to 3x speedup on numpy array inputs and fixes a bug where invalid colors in 2D numpy arrays were silently accepted.

Changes

1. ColorValidator optimization (commit 1)

  • Replace custom fullmatch() shim (which called dir(), rebuilt regex strings, and recompiled via re.match() on every call) with compiled .fullmatch() — the Python 3.4 shim is unnecessary since plotly requires ≥3.8
  • Convert named_colors from list to frozenset for O(1) lookups instead of O(n) linear scan through 148 entries
  • Merge validate_coerce loop and find_invalid_els second pass into a single pass
  • Call perform_validate_coerce directly for 1-D numpy array elements, skipping the full type-dispatch per element
  • Reorder checks: named color lookup (now O(1)) before the rare var(--*) ddk regex

2. Bug fix: 2D numpy silent invalid color acceptance (commit 2)

  • 2D numpy arrays with invalid color strings were silently replacing them with None instead of raising ValueError. The equivalent list input correctly raised.
  • Remove dead code (find_invalid_els default arg that was no longer reachable)
  • Add comprehensive tests: 100% line coverage on changed region (lines 1360–1500)

Benchmarks

ColorValidator (1000 color strings, 50 iterations)

Path Before After Speedup
List input 17.71 ms 9.00 ms 1.97x
NumPy input 29.03 ms 9.49 ms 3.06x
Scalar 78.3 µs 5.7 µs 13.7x

Testing

  • All 1324 validator tests pass
  • 100% line coverage on changed region
  • 2D numpy edge case verified (nested arrays dispatch through recursive validate_coerce)
  • ruff format passes
  • CHANGELOG updated

KRRT7 added 7 commits April 16, 2026 07:12
- Replace custom fullmatch() shim (which rebuilt regex strings and
  recompiled on every call via dir() + re.match) with compiled
  pattern .fullmatch() — Python 3.4+ compat shim is no longer needed
- Convert named_colors from list to frozenset for O(1) lookups
  instead of O(n) linear scan through 148 entries
- Merge validate + find_invalid_els into a single pass over arrays,
  eliminating redundant second iteration
- Call perform_validate_coerce directly for 1-D numpy array elements,
  skipping the full validate_coerce type-dispatch per element
- Reorder checks: named color lookup (now O(1)) before rare ddk regex

Benchmarks (1000 color strings, 50 iterations):
  List path:   17.71ms → 9.00ms  (1.97x faster)
  Numpy path:  29.03ms → 9.49ms  (3.06x faster)
  Scalar:      78.3µs  → 5.7µs   (13.7x faster)
2D+ numpy arrays with invalid color strings were silently replacing
them with None instead of raising ValueError. The list path correctly
raised for the same input. This was caused by the multidimensional
numpy fallback not collecting invalid elements from sub-array results.

Also adds comprehensive tests covering all ColorValidator code paths:
- None and typed_array_spec inputs
- 1D numpy with invalid colors (raise path)
- 2D numpy with invalid colors (now raises, was silently accepting)
- 3-level nested lists (find_invalid_els recursion)
- Numeric numpy fast path with numbers_allowed
- Removes dead code (unreachable default arg in find_invalid_els)

100% line coverage on the changed region (lines 1360-1500).
Three changes to the hot path hit by every fig.show(), write_html(),
to_json(), and write_image() call:

1. to_typed_array_spec: replace copy_to_readonly_numpy_array (which
   copies the array, wraps through narwhals, and sets readonly flag)
   with a lightweight np.asarray — the input is already a deepcopy
   from to_dict(), so copying again is pure waste.

2. convert_to_base64: replace is_homogeneous_array (which checks
   numpy, pandas, narwhals, and __array_interface__) with a direct
   isinstance(value, np.ndarray) check. In the to_dict() context,
   data is already validated and stored as numpy arrays.

3. is_skipped_key: replace list scan with frozenset lookup (O(1)).

Profile results (10 traces × 100K points, 20 calls):
  to_typed_array_spec: 1811ms → 1097ms (40% faster)
  copy_to_readonly_numpy_array: 226ms → 0ms (eliminated)
  narwhals from_native: 68ms → 0ms (eliminated)
  is_skipped_key: 41ms → ~0ms (eliminated)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant