CMS-948: data creation scripts #445

RealOrangeOne · 2025-12-15T15:49:58Z

What is the context of this PR?

We need a way to create test data in bulk for performance testing etc.

https://officefornationalstatistics.atlassian.net/browse/CMS-948

Data creation is seeded, meaning the same seed reliably recreates the same data.

Data is identified by a prefix. To reduce complexity, it's only possible to delete all test data, rather than the data for specific seeds.
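
A minimal sketch of the two ideas above, using only the standard library rather than this PR's factories: a seeded generator yields the same data on every run, and a shared prefix lets all generated data be found and removed together. `make_titles` and `without_test_data` are hypothetical helpers for illustration; only the `Z-RANDOM ` prefix value comes from `SEEDED_DATA_PREFIX` in this PR.

```python
import random

# Value taken from SEEDED_DATA_PREFIX in this PR; everything else is illustrative.
SEEDED_DATA_PREFIX = "Z-RANDOM "


def make_titles(seed: int, count: int) -> list[str]:
    """Generate prefixed titles deterministically from a seed."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    return [f"{SEEDED_DATA_PREFIX}{rng.randrange(10**6):06d}" for _ in range(count)]


def without_test_data(titles: list[str]) -> list[str]:
    """Drop every generated title, whichever seed produced it."""
    return [t for t in titles if not t.startswith(SEEDED_DATA_PREFIX)]


existing = ["Home", "Economy"]
all_titles = existing + make_titles(seed=1, count=3) + make_titles(seed=2, count=3)
```

Because the prefix carries no information about the seed, `without_test_data(all_titles)` removes the output of both seeds at once, which mirrors the "delete everything" trade-off described above.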

How to review

There are 2 new management commands.

`create_test_data` creates data based on a seed.
`delete_test_data` removes all test data from the system (identified by the prefix).

Follow-up Actions

List any follow-up actions (if applicable), like needed documentation updates or additional testing.

MaciekBaron · 2026-01-23T09:46:39Z

.pylintrc

        missing-function-docstring,
        fixme,
        too-many-public-methods,
+        line-too-long,


Maybe a question for @MebinAbraham, but I think it might not be a good idea to disable line-too-long globally. What do you think?

Line length is already verified by ruff. pylint's line-too-long verifies that the entire line (including comments) is under a given size. That means a line can be valid with ruff, but fail with pylint simply because any ignore comments push it over the threshold. That distinction isn't really useful, so this just lets ruff handle it.
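
The distinction described above can be sketched as follows. This is an illustration only, not either tool's implementation; the 40-character limit and the pragma pattern are assumptions chosen for the example.

```python
import re

# Hypothetical limit and pragma pattern, chosen only for this illustration.
MAX_LINE_LENGTH = 40
PRAGMA = re.compile(r"\s*#\s*(?:noqa|type:\s*ignore|pylint:).*$")


def full_length(line: str) -> int:
    """Length of the whole line, trailing comment included."""
    return len(line)


def length_without_pragma(line: str) -> int:
    """Length after dropping a trailing ignore comment."""
    return len(PRAGMA.sub("", line))


line = "result = compute(alpha, beta, gamma)  # noqa: F821"

# The code itself fits the limit, but the ignore comment pushes the line over it.
over_when_counting_comment = full_length(line) > MAX_LINE_LENGTH
over_when_ignoring_comment = length_without_pragma(line) > MAX_LINE_LENGTH
```

Here the first check flags the line and the second does not, which is the kind of disagreement the change to `.pylintrc` avoids.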

MaciekBaron · 2026-01-23T09:56:37Z

cms/test_data/config.py

+
+class PageCreationConfig(ModelCreationConfig):
+    published: FractionalFloat = 0.5
+    revisions: NonNegativeInt | RangeConfig = 0


Unless I misunderstood, aren't all Pages guaranteed to have at least one revision in a normal environment?

I think the revision is the history, so the first draft save of a page won't necessarily have a revision. That's my understanding at least, I may very well be wrong.

MaciekBaron · 2026-01-23T10:03:13Z

cms/test_data/config.py

+
+
+class PageCreationConfig(ModelCreationConfig):
+    published: FractionalFloat = 0.5


I think a better name would be published_ratio or published_probability.

The implementation is probability rather than ratio, so I'll go with that.

Naming things is hard!
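
To illustrate why the name matters, here is a small sketch of the difference between a probability and a ratio. Both functions are hypothetical and are not the code from this PR: with a probability, each page is decided independently, so the published share varies from run to run; with a ratio, the share is fixed.

```python
import random


def publish_by_probability(count: int, probability: float, rng: random.Random) -> list[bool]:
    """Each page is published independently with the given probability."""
    return [rng.random() < probability for _ in range(count)]


def publish_by_ratio(count: int, ratio: float, rng: random.Random) -> list[bool]:
    """Exactly round(count * ratio) pages are published, in random positions."""
    flags = [i < round(count * ratio) for i in range(count)]
    rng.shuffle(flags)
    return flags


rng = random.Random(0)
by_probability = publish_by_probability(10, 0.5, rng)  # published count may vary
by_ratio = publish_by_ratio(10, 0.5, rng)  # always exactly 5 published
```

Only the extremes are guaranteed under the probability reading: `1.0` publishes every page and `0.0` publishes none.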

MaciekBaron · 2026-01-23T10:10:54Z

cms/test_data/tests/test_commands.py

+                    "dataset_manual_links": 1,
+                    "explore_more": 2,
+                    "published": 1,
+                    "revisions": {"min": 1, "max": 3},


Minor: Could we change the range min to 2? This way we test that the range works, and that it doesn't just create one item for any given range (if it always generates 1 item, the tests on lines 70 and 71 will always pass).

MaciekBaron · 2026-01-23T10:34:43Z

cms/test_data/constants.py

@@ -0,0 +1 @@
+SEEDED_DATA_PREFIX = "Z-RANDOM "


Minor: Would it be better for this file to be named constants.py?

Prior art: https://github.com/ONSdigital/dis-wagtail/blob/main/cms/datavis/constants.py

utils.py usually just contains utility functions.

Agreed. I think I'd imagined this module having more "utils", but never ended up writing any.

MaciekBaron added the component: Tooling Makefiles, linters, container instrumentation etc label Jan 2, 2026

RealOrangeOne added 7 commits January 6, 2026 14:07

Start script to create test data

e95b725

Add delete test data command

056cbc8

Run hooks to ensure auto-created child pages are created

087c2c8

Add confirmation prompt to create test data command

b7e3afa

Make logging easier to read when running interactively

e99dd04

Convert some Wagtail hooks into signals

068c90e

This lets the signals be sent when running non-wagtail admin codepaths.

Publish topic pages after creation

584053f

RealOrangeOne force-pushed the CMS-948-data-creation-scripts branch from 0a987a7 to 584053f January 6, 2026 16:54

This was referenced Jan 7, 2026

Convert some Wagtail hooks into signals #455

Merged

Make post topic page child creation safer #456

Closed

RealOrangeOne added 19 commits January 7, 2026 12:37

Better support deleting treebeard models by collecting descendants and reusing stock delete functionality

a2f5333

Extract collector creation to placate ruff

3d9a656

Create "explore more" blocks on topic page

d311a4b

Set faker via factory-boy locale to en_GB

2d0cfb8

Use default site rather than first site

55efcbc

Improve assertions for modified objects

3fe84c5

Populate dataset field on topic

8f73467

Fix typing and linting issues

71ef63b

Pass creation config as file

a1798ce

Use fully fake namespace for dataset factory

2d674c5

Move test generation to its own app

906568d

Validate config with pydantic

4114a1a

Prevent dataset factory from creating duplicate instances

f476e2d

Use correct base manager for topics

3b5ed8c

Allow controlling how often pages are published

be7a422

Remove custom creation implementation and fix factories

d18d78f

The MP_Node factories need `parent` in `django_get_or_create`, and need to use `TreeQuerySet`, otherwise the needed methods don't exist.

Add random revision creation

bc5e717

Test seed

6ed2720

Ensure topic tree is valid after test data deletion

63c7a14

Because the topic tree hides the root node in a number of core methods, it gets corrupted during deletion.

RealOrangeOne added 9 commits January 19, 2026 11:25

Make config an optional argument

fc47beb

Check dataset model

6d8150a

Add helper command to output default config or schema

13b9065

Merge remote-tracking branch 'origin/main' into CMS-948-data-creation-scripts

1a87d02

Extract data creation to its own class

5f13f2f

This keeps the command parts separate

Ensure search and reference indexes are correctly cleaned up

c6241cf

By default, they enqueue tasks when instances are saved, which is unnecessary when many of the instances are about to be deleted.

Run delete and creation inside transactions

c0fa1f4

Restore cursed TopicFactory

f9250e6

This factory is definitely configured incorrectly, but something about the model, dependent tests or factory itself requires that it work this way.

Don't check indexes exist for all models

d875dc7

They may not have been indexed yet

RealOrangeOne marked this pull request as ready for review January 21, 2026 14:52

RealOrangeOne requested a review from a team as a code owner January 21, 2026 14:52

RealOrangeOne added 4 commits January 21, 2026 16:17

Improve code documentation

d3d80f7

Reset factory seed after tests

2ed4995

Query any supported field, rather than maintaining explicit list

ed26fe8

Include full instance names in message before deletion

3ebc665

No need to be less informative when actually deleting.

MaciekBaron requested changes Jan 23, 2026

RealOrangeOne added 5 commits January 23, 2026 12:35

Put constant in constants file

edeb40b

Rename published probability so it's clearer that it's a probability

8f39167

Test multiple revisions are created

59f8998

Increase timeout

f8508d3

Merge remote-tracking branch 'origin/main' into CMS-948-data-creation-scripts

fef3586
