Skip to content

lib: add bf_vector and use it in cgen#487

Open
pzmarzly wants to merge 2 commits intofacebook:mainfrom
pzmarzly:push-ukrnssyykoqn
Open

lib: add bf_vector and use it in cgen#487
pzmarzly wants to merge 2 commits intofacebook:mainfrom
pzmarzly:push-ukrnssyykoqn

Conversation

@pzmarzly
Copy link
Copy Markdown
Contributor

@pzmarzly pzmarzly commented Mar 24, 2026

I initially planned to use bf_vector in bf_hashset, but I got convinced by @yaakov-stein that it's not a good idea. However, we could use bf_vector in other places, especially where we use bf_dynbuf or where we handroll vector-like behavior.

Commits:

  • lib: core: add bf_vector - implementation. Vector doubles in size when it hits the limits. Has a helper to remove elements as well, though shrinking is not implemented.
  • cgen: use bf_vector for img - Instead of managing img, img_size, and img_cap by hand, use bf_vector.

@meta-cla meta-cla bot added the cla signed label Mar 24, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 24, 2026

Claude review of PR #487 (0926237)

Suggestions

  • Overflow guard in bf_vector_addsrc/libbpfilter/core/vector.c:165vec->size + 1 can theoretically wrap; bf_vector_add_many already uses __builtin_add_overflow for the same pattern
  • Missing test for bf_vector_inittests/unit/libbpfilter/core/vector.c — public function only tested indirectly via bf_vector_new
  • Orphaned instruction on fixup failuresrc/libbpfilter/cgen/program.c:511EMIT runs before bf_fixup_new in both bf_program_emit_fixup and bf_program_emit_fixup_elfstub; if fixup creation fails after a successful emit, an un-patched placeholder instruction remains in the image
  • Large initial reservation without rationalesrc/libbpfilter/cgen/program.c:119bf_vector_reserve(&_program->img, 512) is 8x larger than the old default of 64; consider a named constant and/or comment
  • void * pointer arithmetic in bf_vector_foreachsrc/libbpfilter/include/bpfilter/core/vector.h:65 — arithmetic on void * is a GCC extension, not valid ISO C; downstream consumers using -pedantic will get a compilation error

Nits

  • _end naming conventionsrc/libbpfilter/include/bpfilter/core/vector.h:66 — should use __end to match __next/__r/__idx pattern in other macros
  • _BF_VECTOR_MAX_CAP commentsrc/libbpfilter/core/vector.c:16 — says "multiplying by 1.5x" but growth uses cap + cap / 2
  • bf_vector_clean doc omits elem_size behaviorsrc/libbpfilter/include/bpfilter/core/vector.h:89elem_size is preserved after clean; the doc should mention this so callers know bf_vector_add works without re-initialization

CLAUDE.md improvements

  • The style guide says "Only use assert() for pointer values" but the codebase uses assert for non-pointer validation in several places (e.g., bf_vector_default asserts elem_size > 0). Consider documenting when non-pointer asserts are acceptable (e.g., for by-value return APIs).
  • Double-underscore identifiers (__next, __end, __r, __idx) in macros are technically reserved by the C standard (C17 7.1.3). This is an established pattern in the codebase; consider documenting it as an accepted convention.

Workflow run

@pzmarzly pzmarzly marked this pull request as draft March 24, 2026 13:30
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch from 2bbe579 to 419e434 Compare March 24, 2026 14:33
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch from 419e434 to e77f462 Compare March 24, 2026 15:18
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch from e77f462 to 305dd19 Compare March 24, 2026 16:07
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch from 305dd19 to f5a94cd Compare March 24, 2026 16:57
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch from f5a94cd to e9d06a6 Compare March 24, 2026 17:37
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch from e9d06a6 to 4cd032f Compare March 24, 2026 17:53
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch 2 times, most recently from 703e225 to a8759b5 Compare March 24, 2026 18:32
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch from a8759b5 to c828488 Compare March 24, 2026 19:26
(elem) && (elem) < (vec)->data + ((vec)->size * (vec)->elem_size); \
(elem) = (elem) + (vec)->elem_size)

/**
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: bf_vector_foreach performs pointer arithmetic on void * ((elem) + (vec)->elem_size and (vec)->data + ...), which is a GCC extension not valid in standard C.

Since vector.h is a public header under include/bpfilter/core/, any downstream consumer using this macro with -pedantic -std=c17 will get a compile error. The pedantic integration test (tests/integration/pedantic_c.sh) includes all public headers with -pedantic-errors -std=c17 -Werror — the macro definition alone won't trigger an error (unexpanded macros aren't compiled), but any call site under those flags will fail.

For comparison, bf_list_foreach in list.h avoids this by iterating over typed pointers (bf_list_node *). Consider casting through char *:

Suggested change
/**
#define bf_vector_foreach(vec, elem) \
for (void *(elem) = (vec)->data; \
(elem) && (char *)(elem) < (char *)(vec)->data + ((vec)->size * (vec)->elem_size); \
(elem) = (void *)((char *)(elem) + (vec)->elem_size))

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do that, but the codebase is already not C17 compliant in different places as well.

* @brief Iterate over every element of a `bf_vector`.
*
* `elem` is declared as a pointer to the element type and will point to each
* element in turn. Do not add or remove elements during iteration.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not add

This should not be an issue.

remove elements during iteration

Same here, as we don't shrink the memory buffer.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not add

If that addition causes growth/reallocation, void *(elem) = (vec)->data; will point to the old memory address. I could turn the iterator into index int i = 0; and use double-for trick to have void *(elem) = (vec)->data + i * (vec)->elem_size;, but I think it's an overkill?

remove elements during iteration

But we shift the next elements left, i.e. 5th element becomes 4th, 4th becomes 3rd, etc. The foreach will skip an element because of (elem) = (elem) + (vec)->elem_size.

@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch 2 times, most recently from eb242f7 to 9c2bbd0 Compare April 1, 2026 15:36
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch 3 times, most recently from d5af70d to 73c0440 Compare April 1, 2026 16:07
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch from 73c0440 to 8ff783d Compare April 2, 2026 13:04
_program->fixups = bf_list_default(bf_fixup_free, NULL);
_program->handle = handle;

r = bf_vector_reserve(&_program->img, 512);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: The initial reservation of 512 instructions (512 * sizeof(struct bpf_insn) = 4 KiB) is 8x larger than the old _BF_PROGRAM_DEFAULT_IMG_SIZE of 64. This may be an intentional tuning improvement to reduce early reallocations, but the rationale is not documented. Consider defining a named constant (like the removed _BF_PROGRAM_DEFAULT_IMG_SIZE) and adding a brief comment explaining the sizing choice.

/**
* @brief Get a pointer to the n-th element.
*
* @param vec Initialised vector. Must be non-NULL.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: nit: The bf_vector_clean documentation says the vector can be "reused (e.g. by re-assigning via bf_vector_default)" but does not mention that elem_size is preserved. Since bf_vector_add works correctly on a cleaned vector without re-initialization, consider noting that elem_size is retained, so callers know whether they need to call bf_vector_default/bf_vector_init again.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should call init / default nonetheless. We can be permissive in case they forgot, but then we shouldn't document it.

@pzmarzly pzmarzly requested a review from qdeslandes April 7, 2026 15:15
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch from 8ff783d to 0926237 Compare April 8, 2026 08:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants