Commit 7a1484f

Add CONTRIBUTING.md, with how-to docs on element-name code generation
This change adds how-to docs for regenerating the element-name hash tables in the ElementName.java file, which you need to do when adding a new HTML element. The docs include steps and info on:

- How to add new element constants
- Which code sections to uncomment for regeneration
- How to compile and run the code generator
- Notes on Gecko vs Java-only builds
- How to update the generated arrays
- Technical details about the hash function and BST structure
1 parent 6904dd1 commit 7a1484f

1 file changed: +95 -0 lines changed

CONTRIBUTING.md

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
# Contributing to htmlparser

## Adding New HTML Elements

When adding new HTML elements to the parser, you must regenerate the element name hash tables in `src/nu/validator/htmlparser/impl/ElementName.java`.

### Step 1: Add the new element constant

Add a new `static final ElementName` constant for your element, following the existing pattern:

```java
public static final ElementName MYNEWELEMENT = new ElementName(
        "mynewelement", "mynewelement",
        // CPPONLY: NS_NewHTMLElement,
        // CPPONLY: NS_NewSVGUnknownElement,
        TreeBuilder.OTHER);
```

The last argument (a group such as `TreeBuilder.OTHER`, optionally combined with flags such as `SPECIAL` or `SCOPING`) depends on how the tree builder should handle the element.

### Step 2: Uncomment the code generation sections

Uncomment three sections in `ElementName.java`:

1. **The imports** near the top (~lines 26-39):
   - `java.io.*`
   - `java.util.*`
   - `java.util.regex.*`

2. **`implements Comparable<ElementName>`** on the class declaration (~line 49)

3. **The code generation block** marked with:
   `"START CODE ONLY USED FOR GENERATING CODE uncomment and run to regenerate"`
   That includes the `main()` method and helper functions (~lines 272-659)

### Step 3: Add case to treeBuilderGroupToName() if needed

If your element uses a new `TreeBuilder` group constant, add a case for it in the `treeBuilderGroupToName()` method within the code generation block.
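As a rough illustration of what adding such a case involves, the sketch below maps a group constant to the identifier a generator could print. It is self-contained, so the nested `Groups` class, its numeric values, and `MY_NEW_GROUP` are placeholders invented here; the real constants live in `TreeBuilder`, and the real method may be shaped differently (for instance in how it treats unknown groups).

```java
final class GroupNameSketch {
    // Hypothetical stand-in for the TreeBuilder group constants.
    // The numeric values are invented for illustration only.
    static final class Groups {
        static final int OTHER = 0;
        static final int MY_NEW_GROUP = 1; // hypothetical new group

        private Groups() {}
    }

    // Sketch of a group-to-name mapping: a generator needs the constant's
    // source-level name so it can print it into regenerated definitions.
    static String treeBuilderGroupToName(int group) {
        switch (group) {
            case Groups.OTHER:
                return "OTHER";
            case Groups.MY_NEW_GROUP: // the case added for the new group
                return "MY_NEW_GROUP";
            default:
                // Assumption: fail loudly so a forgotten case is noticed.
                throw new IllegalArgumentException("No name for group " + group);
        }
    }

    public static void main(String[] args) {
        System.out.println(treeBuilderGroupToName(Groups.MY_NEW_GROUP));
    }
}
```

In this sketch, forgetting the new case surfaces as an exception during generation rather than as a silently wrong name in the output.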

### Step 4: Compile and run

Compile the project:

```bash
mvn compile
```

Run the `ElementName` class with paths to the Gecko tag-list files:

```bash
java -cp target/classes nu.validator.htmlparser.impl.ElementName \
  /path/to/nsHTMLTagList.h \
  /path/to/SVGTagList.h
```

**For Java-only builds** (not Gecko), you can use empty dummy files:

```bash
mkdir -p /tmp/tagfiles
touch /tmp/tagfiles/nsHTMLTagList.h /tmp/tagfiles/SVGTagList.h
java -cp target/classes nu.validator.htmlparser.impl.ElementName \
  /tmp/tagfiles/nsHTMLTagList.h \
  /tmp/tagfiles/SVGTagList.h
```

> **Note:** Using empty files means the `CPPONLY` comments will all show `NS_NewHTMLUnknownElement`. For Gecko builds, use the actual files from mozilla-central:
> - `parser/htmlparser/nsHTMLTagList.h`
> - `dom/svg/SVGTagList.h`

### Step 5: Update the generated arrays

The program outputs:
1. All element constant definitions (with updated `CPPONLY` comments if using real Gecko tag files)
2. The `ELEMENT_NAMES` array in level-order binary search tree order
3. The `ELEMENT_HASHES` array with corresponding hash values

Replace the existing `ELEMENT_NAMES` and `ELEMENT_HASHES` arrays in the file with the generated output. The arrays must stay in sync: the element at position N in `ELEMENT_NAMES` must have its hash at position N in `ELEMENT_HASHES`.
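To make the level-order layout and the sync requirement concrete, here is a minimal, self-contained sketch. The class name, the sample hash values, the placeholder names, and both helper methods are assumptions made for illustration; none of it is copied from `ElementName.java`. It lays sorted hashes out as an implicit binary search tree (children of index `i` at `2i + 1` and `2i + 2`), builds a parallel names array, and looks a hash up.

```java
import java.util.Arrays;

final class LevelOrderSketch {
    // Fills `out` so that an in-order walk of the implicit tree
    // (children of i at 2i+1 and 2i+2) visits `sorted` in ascending order.
    // Returns the index of the next unused element of `sorted`.
    static int fill(int[] sorted, int[] out, int node, int next) {
        if (node >= out.length) {
            return next;
        }
        next = fill(sorted, out, 2 * node + 1, next); // left subtree: smaller
        out[node] = sorted[next++];
        return fill(sorted, out, 2 * node + 2, next); // right subtree: larger
    }

    // Searches the level-order array; returns the matching index or -1.
    static int find(int[] hashes, int key) {
        int i = 0;
        while (i < hashes.length) {
            if (key < hashes[i]) {
                i = 2 * i + 1;
            } else if (key > hashes[i]) {
                i = 2 * i + 2;
            } else {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] sortedHashes = {10, 20, 30, 40, 50};        // made-up hashes
        String[] sortedNames = {"a", "b", "c", "d", "e"}; // placeholder names

        int[] hashes = new int[sortedHashes.length];
        fill(sortedHashes, hashes, 0, 0);

        // Apply the same permutation to the names so position N matches.
        String[] names = new String[hashes.length];
        for (int i = 0; i < hashes.length; i++) {
            names[i] = sortedNames[Arrays.binarySearch(sortedHashes, hashes[i])];
        }

        System.out.println(Arrays.toString(hashes)); // [40, 20, 50, 10, 30]
        System.out.println(names[find(hashes, 30)]); // c
    }
}
```

The lookup only ever consults the hash array, and the index it returns is then used on the names array. That is why a hand-edit that reorders one array but not the other would return the wrong element without any error.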

### Step 6: Re-comment the code generation sections

After regeneration, comment out the sections you uncommented in Step 2 to restore the file to its normal state.

### Step 7: Run tests

Verify your changes work correctly:

```bash
mvn test
```

### Technical Details

The hash function (`bufToHash`) computes an integer for each element name from the name's length and the characters at specific positions. The scheme relies on every known element name hashing to a distinct value. The arrays are organized as a level-order binary search tree, which gives O(log n) lookups.

If two elements end up with the same hash (a collision), the regeneration reports an error. Resolving that would require modifying the hash function, which has not been necessary historically.
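The collision check can be pictured with a small sketch. `toyHash` below is deliberately not the real `bufToHash`: it only imitates the general idea of mixing the name's length with a few character positions, and it collides far more easily. The class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class HashCollisionSketch {
    // Toy hash, NOT the real bufToHash: mixes the length with the first
    // and last characters. Assumes a non-empty name.
    static int toyHash(String name) {
        int len = name.length();
        int first = name.charAt(0);
        int last = name.charAt(len - 1);
        return len + (first << 8) + (last << 16);
    }

    // Returns "previous / current" for the first pair of distinct names
    // that share a hash, or null if every hash is distinct.
    static String findCollision(String[] names) {
        Map<Integer, String> seen = new HashMap<>();
        for (String name : names) {
            String previous = seen.put(toyHash(name), name);
            if (previous != null && !previous.equals(name)) {
                return previous + " / " + name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Distinct length/first/last combinations: no collision.
        System.out.println(findCollision(new String[] {"div", "span", "table"}));
        // Same length, first and last character: the toy hash collides.
        System.out.println(findCollision(new String[] {"table", "title"}));
    }
}
```

With this toy hash, `table` and `title` collide because they agree on length, first character, and last character. A hash that samples more positions makes such clashes less likely, but it still has to be checked against the full list of names.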
