perf: optimize ColorValidator — up to 3x faster on arrays, 14x on scalars by KRRT7 · Pull Request #5576 · plotly/plotly.py

KRRT7 · 2026-04-16T12:14:04Z

Overview

Optimizes ColorValidator validation with up to 3x speedup on numpy array inputs and fixes a bug where invalid colors in 2D numpy arrays were silently accepted.

Changes

1. ColorValidator optimization (commit 1)

Replace the custom fullmatch() shim (which called dir(), rebuilt regex strings, and recompiled via re.match() on every call) with the compiled pattern's .fullmatch(). The shim only existed for Python versions that predate re.fullmatch (added in 3.4), so it is unnecessary now that plotly requires ≥3.8
Convert named_colors from list to frozenset for O(1) lookups instead of O(n) linear scan through 148 entries
Merge validate_coerce loop and find_invalid_els second pass into a single pass
Call perform_validate_coerce directly for 1-D numpy array elements, skipping the full type-dispatch per element
Reorder checks: named color lookup (now O(1)) before the rare var(--*) ddk regex
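
The scalar check described in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not plotly's code: the color table is heavily trimmed, both patterns are simplified, and `NAMED_COLORS`, `RE_HEX`, `RE_DDK`, and `is_valid_color` are hypothetical names.

```python
import re

# Trimmed stand-in for the 148-entry named color table (assumption).
NAMED_COLORS = frozenset({"red", "green", "blue", "rebeccapurple"})

# Patterns are compiled once at import time instead of on every call.
# Both are simplified illustrations, not plotly's real regexes.
RE_HEX = re.compile(r"#([0-9a-f]{3}|[0-9a-f]{6})")
RE_DDK = re.compile(r"var\(--.*\)")


def is_valid_color(value):
    """Return True if value looks like a color string (simplified)."""
    if not isinstance(value, str):
        return False
    v = value.replace(" ", "").lower()
    # O(1) set membership first: named colors are the common case.
    if v in NAMED_COLORS:
        return True
    # Pattern.fullmatch only succeeds if the whole string matches.
    if RE_HEX.fullmatch(v):
        return True
    # The rare var(--*) form is checked last.
    return RE_DDK.fullmatch(v) is not None
```

Ordering the checks from cheapest and most common to rarest means most valid inputs return after a single hash lookup.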

2. Bug fix: 2D numpy silent invalid color acceptance (commit 2)

Invalid color strings in 2D numpy arrays were silently replaced with None instead of raising ValueError. The equivalent list input correctly raised.
Remove dead code (find_invalid_els default arg that was no longer reachable)
Add comprehensive tests: 100% line coverage on changed region (lines 1360–1500)
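
The intended behavior can be sketched without numpy. In this simplified, hypothetical model, nested lists stand in for the rows of a 2D array, and `is_valid` and `coerce_nested` are illustrative names rather than plotly functions. The point is only that invalid entries found at any nesting level are collected and reported instead of being dropped.

```python
def is_valid(value):
    # Toy check (assumption): only a few names count as valid colors.
    return value in {"red", "green", "blue"}


def coerce_nested(values):
    """Validate nested lists of colors, collecting invalid leaves."""
    invalid = []

    def walk(node):
        if isinstance(node, list):
            return [walk(child) for child in node]
        if not is_valid(node):
            # Record the bad leaf so the caller can raise afterwards.
            invalid.append(node)
        return node

    result = walk(values)
    if invalid:
        raise ValueError(f"Invalid color(s): {invalid!r}")
    return result
```

Because the `invalid` list is shared across recursive calls, a bad value in an inner row reaches the top-level check, which is the property the fix restores for the multidimensional path.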

Benchmarks

ColorValidator (1000 color strings, 50 iterations)

Path	Before	After	Speedup
List input	17.71 ms	9.00 ms	1.97x
NumPy input	29.03 ms	9.49 ms	3.06x
Scalar	78.3 µs	5.7 µs	13.7x
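
For readers who want to take measurements of this shape, a minimal timing harness might look like the following. It is a sketch under assumptions: `validate` is a hypothetical stand-in for the call under test, the input list is synthetic, and this is not the script that produced the numbers above.

```python
import timeit

# Synthetic input (assumption): 1000 color strings.
COLORS = ["red", "#00ff00", "blue", "rgb(1,2,3)"] * 250


def validate(colors):
    # Hypothetical stand-in for the validator call being measured.
    return [c.lower() for c in colors]


def bench(func, data, iterations=50):
    """Return the mean time per call in milliseconds."""
    total = timeit.timeit(lambda: func(data), number=iterations)
    return total / iterations * 1000.0
```

Calling `bench(validate, COLORS)` before and after a change, on the same machine and input, gives a comparable pair of timings.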

Testing

All 1324 validator tests pass
100% line coverage on changed region
2D numpy edge case verified (nested arrays dispatch through recursive validate_coerce)
ruff format passes
CHANGELOG updated

I have followed the PR Guidelines
I have followed the Community Code of Conduct
I have added an entry to the CHANGELOG

- Replace custom fullmatch() shim (which rebuilt regex strings and recompiled on every call via dir() + re.match) with compiled pattern .fullmatch(); the Python 3.4+ compat shim is no longer needed
- Convert named_colors from list to frozenset for O(1) lookups instead of O(n) linear scan through 148 entries
- Merge validate + find_invalid_els into a single pass over arrays, eliminating redundant second iteration
- Call perform_validate_coerce directly for 1-D numpy array elements, skipping the full validate_coerce type-dispatch per element
- Reorder checks: named color lookup (now O(1)) before rare ddk regex

Benchmarks (1000 color strings, 50 iterations):
List path: 17.71ms → 9.00ms (1.97x faster)
Numpy path: 29.03ms → 9.49ms (3.06x faster)
Scalar: 78.3µs → 5.7µs (13.7x faster)
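
The single-pass merge mentioned in this commit message can be sketched as follows, assuming a hypothetical per-element function `coerce_one` that returns None for invalid input. This is not plotly's implementation; it only illustrates how coercion and invalid-element collection can share one loop instead of two.

```python
def coerce_one(value):
    # Hypothetical per-element coercion: returns None when invalid.
    if isinstance(value, str) and value.lower() in {"red", "green", "blue"}:
        return value
    return None


def validate_single_pass(values):
    """Coerce every element and collect invalid ones in the same loop."""
    coerced = []
    invalid = []
    for v in values:
        c = coerce_one(v)
        if c is None:
            invalid.append(v)
        coerced.append(c)
    if invalid:
        raise ValueError(f"Invalid element(s): {invalid!r}")
    return coerced
```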

2D+ numpy arrays with invalid color strings were silently replacing them with None instead of raising ValueError. The list path correctly raised for the same input. This was caused by the multidimensional numpy fallback not collecting invalid elements from sub-array results.

Also adds comprehensive tests covering all ColorValidator code paths:
- None and typed_array_spec inputs
- 1D numpy with invalid colors (raise path)
- 2D numpy with invalid colors (now raises, was silently accepting)
- 3-level nested lists (find_invalid_els recursion)
- Numeric numpy fast path with numbers_allowed
- Removes dead code (unreachable default arg in find_invalid_els)

100% line coverage on the changed region (lines 1360-1500).

…r-optimization

Three changes to the hot path hit by every fig.show(), write_html(), to_json(), and write_image() call:

1. to_typed_array_spec: replace copy_to_readonly_numpy_array (which copies the array, wraps through narwhals, and sets readonly flag) with a lightweight np.asarray. The input is already a deepcopy from to_dict(), so copying again is pure waste.
2. convert_to_base64: replace is_homogeneous_array (which checks numpy, pandas, narwhals, and __array_interface__) with a direct isinstance(value, np.ndarray) check. In the to_dict() context, data is already validated and stored as numpy arrays.
3. is_skipped_key: replace list scan with frozenset lookup (O(1)).

Profile results (10 traces × 100K points, 20 calls):
to_typed_array_spec: 1811ms → 1097ms (40% faster)
copy_to_readonly_numpy_array: 226ms → 0ms (eliminated)
narwhals from_native: 68ms → 0ms (eliminated)
is_skipped_key: 41ms → ~0ms (eliminated)

KRRT7 added 7 commits April 16, 2026 07:12

Merge remote-tracking branch 'upstream/main' into perf/color-validato…

16a151f

…r-optimization

chore: ruff format and CHANGELOG entry

237044e

chore: update CHANGELOG with to_dict optimization

40561ca

revert: remove to_dict optimization (moved to separate PR)

8734a7e

KRRT7 mentioned this pull request Apr 16, 2026

perf: optimize to_dict() serialization — 40% faster for data-heavy figures #5577

Open

7 tasks

KRRT7 wants to merge 7 commits into plotly:main from KRRT7:perf/color-validator-optimization
