@ParameswaranSajeenthiran (Collaborator) commented Jan 20, 2026

Summary

This PR introduces Agentic Retrieval and improves Knowledge Graph (KG) construction and semantic search, with a focus on performance, scalability, and reliability.


1. Agentic Retrieval

  • Introduced session-based agent orchestration initiated from the frontend to manage agentic retrieval workflows per user query
  • Implemented agent-driven query understanding, enabling interpretation of user intent and constraints from raw query text
  • Added semantic beam search plan generation, allowing the agent to construct and rank multi-step retrieval plans
  • Enabled iterative execution of beam search plans, where intermediate results inform subsequent retrieval decisions
  • Integrated agent-controlled response generation, synthesizing grounded responses from semantic beam search outputs
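The retrieval plans above are ranked and executed with semantic beam search, which keeps only the highest-scoring partial paths at each hop. The sketch below is a generic illustration, not the code in `SemanticBeamSearch.cpp`. It assumes each edge carries a precomputed embedding (as with offline edge embedding) and scores a path as the sum of cosine similarities between its edge embeddings and the query embedding. `Edge`, `Path`, `cosine`, and `semanticBeamSearch` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical types: an outgoing edge with a precomputed embedding,
// and a partial path with its accumulated relevance score.
struct Edge {
    int to;
    std::vector<float> embedding;
};

struct Path {
    std::vector<int> nodes;
    double score;
};

// Cosine similarity; returns 0 when either vector has zero norm.
double cosine(const std::vector<float>& a, const std::vector<float>& b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    if (na == 0.0 || nb == 0.0) return 0.0;
    return dot / (std::sqrt(na) * std::sqrt(nb));
}

// Expands up to maxHops hops from the seed nodes, keeping the beamWidth
// highest-scoring acyclic paths after each hop.
std::vector<Path> semanticBeamSearch(const std::vector<std::vector<Edge>>& graph,
                                     const std::vector<float>& query,
                                     const std::vector<int>& seeds,
                                     std::size_t beamWidth, int maxHops) {
    std::vector<Path> beam;
    for (int s : seeds) beam.push_back(Path{{s}, 0.0});
    for (int hop = 0; hop < maxHops; ++hop) {
        std::vector<Path> candidates;
        for (const Path& p : beam) {
            for (const Edge& e : graph[p.nodes.back()]) {
                // Skip nodes already on the path to avoid cycles.
                if (std::find(p.nodes.begin(), p.nodes.end(), e.to) != p.nodes.end()) continue;
                Path next = p;
                next.nodes.push_back(e.to);
                next.score += cosine(e.embedding, query);
                candidates.push_back(std::move(next));
            }
        }
        if (candidates.empty()) break;  // no path can be extended further
        std::stable_sort(candidates.begin(), candidates.end(),
                         [](const Path& x, const Path& y) { return x.score > y.score; });
        if (candidates.size() > beamWidth) candidates.resize(beamWidth);
        beam = std::move(candidates);
    }
    return beam;
}
```

Because scoring only reads stored vectors, no embedding API is called during the search. A real planner would likely add stopping criteria (for example, a score threshold) and keep completed paths that cannot be extended; this sketch drops them once other paths extend further.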

2. Knowledge Graph (KG) Construction

  • Implemented KG construction from PDF documents, supported in both the UI and the server
  • Parallelized graph file persistence and embedding generation to reduce ingestion latency
  • Replaced online edge embedding (embedding API calls at query time) with offline embedding generation and indexing
  • Implemented tuple validation with a retry mechanism for invalid tuples to improve data reliability
  • General code-level optimizations for ingestion and persistence
  • Added and extended integration tests for KG construction and persistence pipeline
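The tuple validation with retry described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the logic in `Pipeline.cpp`: it assumes a tuple is a (subject, predicate, object) triple, treats a triple as valid when all three fields are non-empty, and models the retry as a caller-supplied `repair` callback (for example, re-prompting the extractor for that one tuple). `Triple`, `isValid`, `ValidationResult`, and `validateWithRetry` are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical KG tuple.
struct Triple {
    std::string subject, predicate, object;
};

// Hypothetical validity rule: every field must be non-empty.
bool isValid(const Triple& t) {
    return !t.subject.empty() && !t.predicate.empty() && !t.object.empty();
}

struct ValidationResult {
    std::vector<Triple> accepted;
    std::size_t dropped = 0;  // tuples still invalid after all retries
};

// Accepts valid tuples as-is; retries each invalid tuple through `repair`
// at most `maxRetries` times before dropping it.
ValidationResult validateWithRetry(const std::vector<Triple>& raw,
                                   const std::function<Triple(const Triple&)>& repair,
                                   int maxRetries) {
    ValidationResult result;
    for (const Triple& original : raw) {
        Triple current = original;
        int retries = 0;
        while (!isValid(current) && retries < maxRetries) {
            current = repair(current);
            ++retries;
        }
        if (isValid(current)) {
            result.accepted.push_back(current);
        } else {
            ++result.dropped;
        }
    }
    return result;
}
```

Retrying per tuple, rather than re-running extraction for the whole chunk, avoids duplicating tuples that were already valid, and the `dropped` counter makes data loss visible to tests.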

3. Semantic Beam Search & Graph / Query Optimizations

  • Introduced Semantic Beam Search for improved multi-hop reasoning and retrieval quality
  • Added Fennel-based graph partitioning for better scalability
  • Migrated lightweight queries from processes to threads to enable shared access to large-scale indexes
  • General code-level optimizations for query execution
  • Added integration tests for search correctness and query performance
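Fennel is a one-pass streaming partitioner: each arriving vertex goes to the partition that holds most of its already-placed neighbors, minus a size penalty that keeps partitions balanced. The sketch below follows the greedy rule from the original Fennel paper, using the exact marginal cost of the penalty term α|S|^γ with γ = 1.5, α = m·k^(γ−1)/n^γ, and a load cap of ν·n/k with ν = 1.1. It is not JasmineGraph's partitioner; `fennelPartition` is a hypothetical name and vertices are assumed to stream in index order.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Greedy one-pass Fennel assignment over an undirected adjacency list.
// Vertex v goes to the partition i maximizing
//   (neighbors of v already in S_i) - alpha * ((|S_i| + 1)^gamma - |S_i|^gamma),
// among partitions with room under the cap |S_i| <= nu * n / k.
std::vector<int> fennelPartition(const std::vector<std::vector<int>>& adj, int k,
                                 double gamma = 1.5, double nu = 1.1) {
    const double n = static_cast<double>(adj.size());
    double degreeSum = 0.0;
    for (const auto& nbrs : adj) degreeSum += static_cast<double>(nbrs.size());
    const double m = degreeSum / 2.0;  // each undirected edge is listed twice
    const double alpha = m * std::pow(static_cast<double>(k), gamma - 1.0) / std::pow(n, gamma);
    const double capacity = nu * n / k;

    std::vector<int> part(adj.size(), -1);
    std::vector<double> load(k, 0.0);
    for (std::size_t v = 0; v < adj.size(); ++v) {
        // Count already-placed neighbors in each partition.
        std::vector<int> neighborsIn(k, 0);
        for (int u : adj[v]) {
            if (part[u] >= 0) ++neighborsIn[part[u]];
        }
        int best = -1;
        double bestScore = std::numeric_limits<double>::lowest();
        for (int i = 0; i < k; ++i) {
            if (load[i] + 1.0 > capacity) continue;  // partition is full
            const double marginalCost =
                alpha * (std::pow(load[i] + 1.0, gamma) - std::pow(load[i], gamma));
            const double score = neighborsIn[i] - marginalCost;
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        if (best < 0) {
            // Every partition hit the cap: fall back to the least loaded one.
            best = 0;
            for (int i = 1; i < k; ++i) {
                if (load[i] < load[best]) best = i;
            }
        }
        part[v] = best;
        load[best] += 1.0;
    }
    return part;
}
```

For two triangles joined by a single edge and k = 2, this assigns each triangle to its own partition, so only the bridging edge is cut.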

Impact

  • Faster and more reliable KG ingestion
  • Reduced dependency on runtime embedding APIs
  • Scalable query execution and search
  • Higher-quality semantic search results

Notes for Reviewers

  • Parallelism changes affect the persistence and embedding pipelines; concurrency handling should be reviewed
  • Offline edge embedding migration introduces new indexing logic
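For reviewing the concurrency points above, the following is a minimal sketch of one common pattern, not the code in this PR: persistence and embedding for a batch run as two `std::async` tasks, and the embedding index shared with query threads uses `std::shared_mutex` so many readers can proceed while a writer holds an exclusive lock. `SharedIndex` and `ingestBatch` are hypothetical names, and both stage bodies are stand-ins (the "embedding" is just the label length).

```cpp
#include <cstddef>
#include <future>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical embedding index shared by query threads: concurrent
// readers take a shared lock, writers take an exclusive lock.
class SharedIndex {
public:
    void put(const std::string& key, std::vector<float> vec) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        data_[key] = std::move(vec);
    }
    bool get(const std::string& key, std::vector<float>& out) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        auto it = data_.find(key);
        if (it == data_.end()) return false;
        out = it->second;
        return true;
    }
    std::size_t size() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return data_.size();
    }

private:
    mutable std::shared_mutex mutex_;
    std::unordered_map<std::string, std::vector<float>> data_;
};

// Runs the two ingestion stages for one batch concurrently.
std::size_t ingestBatch(const std::vector<std::string>& edgeLabels,
                        SharedIndex& index, std::vector<std::string>& persisted) {
    // Only this task touches `persisted`, so it needs no lock.
    auto persistTask = std::async(std::launch::async, [&] {
        for (const auto& label : edgeLabels) persisted.push_back(label);  // stand-in for file writes
        return edgeLabels.size();
    });
    // The index is shared with readers, so writes go through its lock.
    auto embedTask = std::async(std::launch::async, [&] {
        for (const auto& label : edgeLabels)
            index.put(label, std::vector<float>{static_cast<float>(label.size())});
    });
    embedTask.get();           // rethrows any exception from the embedding stage
    return persistTask.get();  // number of labels persisted
}
```

The key review questions this pattern raises are which data each task owns exclusively and which structures are shared and therefore need locking.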

@ParameswaranSajeenthiran ParameswaranSajeenthiran changed the title [GraphRAG][v2] Implementation of Agentic Retrieval and Performance enhancement of KG construction and Semantic Beam Search [GraphRAG][v2] Implementation of Agentic Retrieval and Performance enhancement of KG construction & Semantic Beam Search Jan 20, 2026
@sonarqubecloud

Quality Gate failed

Failed conditions
9 Security Hotspots
19.9% Duplication on New Code (required ≤ 3%)
D Security Rating on New Code (required ≥ A)
E Reliability Rating on New Code (required ≥ A)


CURL* curl = curl_easy_init();
if (curl) {
std::string response;
curl_easy_setopt(curl, CURLOPT_SSLVERSION, CURL_SSLVERSION_DEFAULT);

Check failure (Code scanning / SonarCloud): Weak SSL/TLS protocols should not be used (High). Use stronger SSL and TLS versions.
}

// TLS + connection settings
curl_easy_setopt(curl, CURLOPT_SSLVERSION, CURL_SSLVERSION_DEFAULT);

Check failure (Code scanning / SonarCloud): Weak SSL/TLS protocols should not be used (High). Use stronger SSL and TLS versions.
@codecov bot commented Jan 20, 2026

Codecov Report

❌ Patch coverage is 0.65917% with 2562 lines in your changes missing coverage. Please review.
✅ Project coverage is 0.92%. Comparing base (c52c17b) to head (a3ce8dd).

Files with missing lines | Patch % | Lines
src/knowledgegraph/construction/Pipeline.cpp | 0.00% | 691 Missing ⚠️
src/util/Utils.cpp | 4.40% | 321 Missing and 26 partials ⚠️
src/frontend/JasmineGraphFrontEnd.cpp | 0.00% | 318 Missing ⚠️
...ssor/nlp/semanticbeamsearch/SemanticBeamSearch.cpp | 0.00% | 274 Missing ⚠️
.../incremental/JasmineGraphIncrementalLocalStore.cpp | 0.00% | 142 Missing ⚠️
.../frontend/core/executor/impl/AgentPlanExecutor.cpp | 0.00% | 136 Missing ⚠️
.../core/executor/impl/SemanticBeamSearchExecutor.cpp | 0.00% | 95 Missing ⚠️
src/vectorstore/FaissIndex.cpp | 0.00% | 72 Missing ⚠️
src/rag/util/LLMUtils.cpp | 0.00% | 69 Missing ⚠️
...tioner/stream/HDFSMultiThreadedHashPartitioner.cpp | 0.00% | 61 Missing ⚠️
... and 20 more
Additional details and impacted files
@@            Coverage Diff            @@
##           master    #350      +/-   ##
=========================================
- Coverage    0.95%   0.92%   -0.04%     
=========================================
  Files         108     115       +7     
  Lines       26278   28224    +1946     
  Branches    17306   18615    +1309     
=========================================
+ Hits          250     260      +10     
- Misses      25812   27737    +1925     
- Partials      216     227      +11     

Owner

Why do we need clang here?

@@ -0,0 +1,12 @@
BasedOnStyle: Google
Owner

Why do we need to change this file?

message(WARNING "No ANTLR generated files found in /home/ubuntu/software/antlr/")
set(GENERATED_SRC ""
src/frontend/JasmineGraphFrontEnd.cpp
src/localstore/incremental/JasmineGraphIncrementalLocalStore.cpp
Owner

Why is src/localstore/incremental/JasmineGraphIncrementalLocalStore.cpp mentioned three times here?


#This parameter holds the maximum label size of Node Block
- org.jasminegraph.nativestore.max.label.size=43
+ org.jasminegraph.nativestore.max.label.size=256
Owner

Why did we increase this value to 256?

org.jasminegraph.vectorstore.enabled=true
org.jasminegraph.vectorstore.dimension=512
org.jasminegraph.vectorstore.embedding.model=jina/jina-embeddings-v2-small-en
org.jasminegraph.vectorstore.embedding.ollama.endpoint=http://gemma3_container:11441
Owner

Why do we expose this endpoint?

@@ -0,0 +1,121 @@
"""Copyright 2025 JasmineGraph Team
Owner

Suggested change:
- """Copyright 2025 JasmineGraph Team
+ """Copyright 2026 JasmineGraph Team
