Description
Hi team
Big thanks for building such a constructive project.
It’s really helped me develop ideas on how to preprocess projects. To produce documentation that’s effective for my own work, I’ve been adding several enhancements in my fork to support integration with my personal projects.
Right now, I’ve set up three feature branches:
- Feature Enhancements: feat: Major performance improvements and architectural enhancements (v0.1.1 - v0.1.6) (e2720pjk/CodeWiki#27)
- Joern CPG Integration (POC): PR: Implemented hybrid AST + Joern CPG analysis (e2720pjk/CodeWiki#12)
- A/B Testing Framework (Demo): CodeWiki A/B testing framework ready (e2720pjk/CodeWiki#10)
Roadmap
My next planned improvement is full-project analysis with adaptive batch processing: introducing dynamic limit calculation and resumable workflows to better handle large-scale repositories.
e2720pjk#26
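To make the roadmap item a bit more concrete, here is a rough sketch of what dynamic limit calculation plus a resumable workflow could look like. It is only an illustration, not code from the fork: `compute_batch_limit`, `plan_batches`, `run_resumable`, the token budget, and the JSON checkpoint format are all hypothetical names and choices.

```python
import json
from pathlib import Path


def compute_batch_limit(token_counts, token_budget, hard_cap=100):
    """Return how many leading files fit within token_budget.

    A single oversized file still forms a batch of one, so progress
    is always possible. hard_cap is an assumed upper bound per batch.
    """
    total = 0
    limit = 0
    for tokens in token_counts[:hard_cap]:
        if limit > 0 and total + tokens > token_budget:
            break
        total += tokens
        limit += 1
    return limit


def plan_batches(files, token_budget, hard_cap=100):
    """Split (path, token_count) pairs into consecutive, dynamically sized batches."""
    batches = []
    remaining = list(files)
    while remaining:
        n = compute_batch_limit([t for _, t in remaining], token_budget, hard_cap)
        batches.append([p for p, _ in remaining[:n]])
        remaining = remaining[n:]
    return batches


def run_resumable(batches, process, checkpoint_path):
    """Process batches in order and record progress after each one.

    On a rerun, batches already recorded in the checkpoint are skipped.
    Returns the number of batches processed in this run.
    """
    checkpoint = Path(checkpoint_path)
    done = 0
    if checkpoint.exists():
        done = json.loads(checkpoint.read_text())["completed_batches"]
    for index in range(done, len(batches)):
        process(batches[index])
        checkpoint.write_text(json.dumps({"completed_batches": index + 1}))
    return len(batches) - done
```

The checkpoint is written only after a batch finishes, so an interrupted run repeats at most the batch that was in flight. A real implementation would probably also need to invalidate the checkpoint when the file list or budget changes.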
Context
These changes were generated with the help of an LLM, so I understand if there are concerns about quality. I’ll continue refining and maintaining them in my fork, and I’d be glad to contribute upstream if they’re considered useful.
Questions for the maintainers
- Would you be open to reviewing PRs for any of these features?
- I noticed that the `tests/` directory is explicitly listed in `.gitignore`, which conflicts with the configuration in `pyproject.toml`. Does this mean tests are not intended to be included in the repository? Should pytest files be excluded from PRs?
Branch Summary
Feature Enhancements
Quick Reference: New CLI Options & Configuration Parameters (v0.1.1 - v0.1.6)
CLI Command Options (generate)
| Option | Type | Default | Description |
|---|---|---|---|
| `--respect-gitignore` | flag | False | Respect `.gitignore` patterns during analysis (v0.1.1) |
| `--max-files` | int | 100 | Maximum number of files to analyze (range: 1-5000) (v0.1.1) |
| `--max-entry-points` | int | 5 | Maximum number of entry points to identify (v0.1.1) |
| `--max-connectivity-files` | int | 10 | Maximum number of high-connectivity files (v0.1.1) |
CLI Command Options (config set)
| Option | Type | Default | Range | Description |
|---|---|---|---|---|
| `--enable-parallel-processing` | flag | True | - | Enable parallel processing for leaf modules (v0.1.1) |
| `--disable-parallel-processing` | flag | - | - | Disable parallel processing (v0.1.1) |
| `--concurrency-limit` | int | 5 | 1-10 | Maximum concurrent API calls (v0.1.1) |
| `--max-tokens-per-module` | int | 36369 | 1000-200000 | Maximum tokens per module (v0.1.1) |
| `--max-tokens-per-leaf` | int | 16000 | 500-100000 | Maximum tokens per leaf module (v0.1.1) |
| `--cache-size` | int | 1000 | 100-10000 | LLM cache size: number of cached prompts (v0.1.1) |
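For reference, the range checks implied by the `config set` table could be sketched roughly as below. The defaults and ranges are copied from the table; the `CONFIG_LIMITS` dictionary and the `validate_option` function are hypothetical names used only for illustration and may not match how the fork actually validates these options.

```python
# (low, high, default) per option, taken from the config set table.
# The snake_case keys are an assumed mapping from the CLI flag names.
CONFIG_LIMITS = {
    "concurrency_limit": (1, 10, 5),
    "max_tokens_per_module": (1000, 200000, 36369),
    "max_tokens_per_leaf": (500, 100000, 16000),
    "cache_size": (100, 10000, 1000),
}


def validate_option(name, value=None):
    """Return the value to use for an option, falling back to its default.

    Raises KeyError for unknown options, TypeError for non-integers,
    and ValueError when the value is outside the documented range.
    """
    if name not in CONFIG_LIMITS:
        raise KeyError(f"unknown option: {name}")
    low, high, default = CONFIG_LIMITS[name]
    if value is None:
        return default
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{name} must be an integer")
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return value
```

For example, `validate_option("concurrency_limit", 8)` returns `8`, while `validate_option("concurrency_limit", 11)` raises `ValueError` because the documented range is 1-10.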
Joern CPG Support (POC)
This branch replaces LLM-dependent clustering and flow analysis with Joern Code Property Graphs (CPGs) for more deterministic control-flow and dependency extraction. The initial POC is functional; I'm looking for feedback on the approach.
A/B Testing (Demo)
Version comparison scripts have been implemented and are functional. However, the evaluation metrics still need refinement, as current reports show anomalies. At this stage, the implementation serves mainly as a showcase of evaluation methods, not a core feature.
Reference reports are available at e2720pjk#9 (comment).