
Discussion: New Branches for Feature Extensions and Joern CPG #28

@e2720pjk


Hi team

Big thanks for building such a constructive project.
It has really helped me work out how to preprocess projects. To produce documentation that is useful for my own work, I’ve added several enhancements in my fork so that it integrates with my personal projects.
Right now, I’ve set up three feature branches:

  1. Feature Enhancements
     feat: Major performance improvements and architectural enhancements (v0.1.1 - v0.1.6) (e2720pjk/CodeWiki#27)
  2. Joern CPG Integration (POC)
     PR: Implemented hybrid AST + Joern CPG analysis (e2720pjk/CodeWiki#12)
  3. A/B Testing Framework (Demo)
     CodeWiki A/B testing framework ready (e2720pjk/CodeWiki#10)

Roadmap

My next planned improvement is full-project analysis with adaptive batch processing: introducing dynamic limit calculation and resumable workflows to better handle large-scale repositories.
e2720pjk#26
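
To make the roadmap item concrete, here is a minimal sketch of what adaptive batching with resume support could look like. It is not code from the fork: `dynamic_limit`, `plan_batches`, `run_resumable`, the JSON checkpoint file, and the token estimates are all hypothetical names and assumptions chosen for illustration.

```python
import json
from pathlib import Path


def dynamic_limit(context_window, reserved_tokens, floor=1000):
    """Derive a per-batch token budget from the model's context window.

    reserved_tokens stands for prompt and response overhead (an assumption).
    """
    return max(floor, context_window - reserved_tokens)


def plan_batches(file_sizes, token_budget):
    """Greedily pack files into batches that stay under token_budget.

    file_sizes maps a file path to an estimated token count. A file larger
    than the budget gets a batch of its own rather than being dropped.
    """
    batches, current, used = [], [], 0
    for path, tokens in sorted(file_sizes.items()):
        if current and used + tokens > token_budget:
            batches.append(current)
            current, used = [], 0
        current.append(path)
        used += tokens
    if current:
        batches.append(current)
    return batches


def run_resumable(batches, process, checkpoint_path):
    """Process batches in order, skipping those recorded in the checkpoint."""
    checkpoint = Path(checkpoint_path)
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    for index, batch in enumerate(batches):
        if index in done:
            continue
        process(batch)
        done.add(index)
        # Persist after every batch so an interrupted run can resume here.
        checkpoint.write_text(json.dumps(sorted(done)))
    return done
```

Writing the checkpoint after each batch trades a little I/O for the guarantee that at most one batch is repeated after a crash. The checkpoint stores batch indices, so it is only valid if the batch plan is recomputed from the same inputs.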

Context

These changes were generated with the help of an LLM, so I understand if there are concerns about quality. I’ll continue refining and maintaining them in my fork, and I’d be glad to contribute upstream if they’re considered useful.

Questions for the maintainers

  1. Would you be open to reviewing PRs for any of these features?
  2. I noticed that the tests/ directory is explicitly listed in .gitignore, which conflicts with the configuration in pyproject.toml. Does this mean tests are not intended to be included in the repository? Should pytest files be excluded from PRs?

Branch Summary

Feature Enhancements

Quick Reference: New CLI Options & Configuration Parameters (v0.1.1 - v0.1.6)

CLI Command Options (generate)

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `--respect-gitignore` | flag | False | Respect .gitignore patterns during analysis (v0.1.1) |
| `--max-files` | int | 100 | Maximum number of files to analyze (range: 1-5000) (v0.1.1) |
| `--max-entry-points` | int | 5 | Maximum number of entry points to identify (v0.1.1) |
| `--max-connectivity-files` | int | 10 | Maximum number of high-connectivity files (v0.1.1) |

CLI Command Options (config set)

| Option | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| `--enable-parallel-processing` | flag | True | - | Enable parallel processing for leaf modules (v0.1.1) |
| `--disable-parallel-processing` | flag | - | - | Disable parallel processing (v0.1.1) |
| `--concurrency-limit` | int | 5 | 1-10 | Maximum concurrent API calls (v0.1.1) |
| `--max-tokens-per-module` | int | 36369 | 1000-200000 | Maximum tokens per module (v0.1.1) |
| `--max-tokens-per-leaf` | int | 16000 | 500-100000 | Maximum tokens per leaf module (v0.1.1) |
| `--cache-size` | int | 1000 | 100-10000 | LLM cache size: number of cached prompts (v0.1.1) |
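
For readers wondering how a setting like `--concurrency-limit` might behave, the sketch below caps in-flight calls with `asyncio.Semaphore` and checks values against the ranges listed for `config set`. This is an illustration only, not the fork's implementation: `RANGES`, `validate`, and `run_limited` are hypothetical names.

```python
import asyncio

# Ranges copied from the `config set` options above; the dict is illustrative.
RANGES = {
    "concurrency_limit": (1, 10),
    "max_tokens_per_module": (1000, 200000),
    "max_tokens_per_leaf": (500, 100000),
    "cache_size": (100, 10000),
}


def validate(name, value):
    """Return value unchanged, or raise if it falls outside the listed range."""
    low, high = RANGES[name]
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside {low}-{high}")
    return value


async def run_limited(jobs, concurrency_limit=5):
    """Run coroutine factories with at most concurrency_limit in flight."""
    limit = validate("concurrency_limit", concurrency_limit)
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        # Each job waits here until one of the `limit` slots is free.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Each entry in `jobs` is a zero-argument callable returning a coroutine, for example a wrapper around one LLM request for a leaf module. `asyncio.gather` returns results in the same order as `jobs`.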

Joern CPG Support (POC)

This branch replaces LLM-dependent clustering and flow analysis with Joern Code Property Graphs (CPGs) to make control-flow and dependency extraction more deterministic. The initial POC is functional; I’m looking for feedback on the approach.
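
To illustrate what "deterministic" means here, the sketch below groups functions by connected components of a call graph. It does not call Joern and is not the POC's algorithm: it assumes the call edges have already been exported from a CPG as `(caller, callee)` name pairs, and `group_by_dependency` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_dependency(edges):
    """Group functions into connected components of the call graph.

    edges is an iterable of (caller, callee) name pairs. Sorting nodes and
    neighbours makes the output identical on every run for the same input.
    """
    graph = defaultdict(set)
    for caller, callee in edges:
        graph[caller].add(callee)
        graph[callee].add(caller)

    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in sorted(graph[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(component))
    return groups
```

Unlike an LLM-based clustering step, the same edge list always yields the same groups regardless of input order, which makes results easy to diff between runs.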

A/B Testing (Demo)

Version comparison scripts have been implemented and are functional. However, the evaluation metrics still need refinement, as current reports show anomalies. At this stage, the implementation serves mainly as a showcase of evaluation methods, not a core feature.
Reference reports are available at e2720pjk#9 (comment).
