
Commit 529bdb7

init

Root commit (0 parents).

50 files changed, with 4786 additions and 0 deletions. Because this is a large commit, some content is hidden and only a subset of the changed files appears here.

.gitignore

Lines changed: 15 additions & 0 deletions
**node_modules
**dist

**/.env
**/.env.development.local
**/.env.test.local
**/.env.production.local
**/.env.local

**.DS_Store

.cursor
.vscode
.claude
*.txt

.prettierignore

Lines changed: 4 additions & 0 deletions
**/*/node_modules/
**/*/dist/
**/*/drizzle/
**/*/*.d.ts

.prettierrc.json

Lines changed: 10 additions & 0 deletions
{
  "semi": false,
  "singleQuote": true,
  "tabWidth": 2,
  "trailingComma": "es5",
  "printWidth": 100,
  "bracketSpacing": true,
  "arrowParens": "avoid",
  "endOfLine": "lf"
}

README.md

Lines changed: 141 additions & 0 deletions
# h-codex

A semantic code search tool for intelligent, cross-repo context retrieval.

## ✨ Features

- **AST-Based Chunking**: Intelligent code parsing using Abstract Syntax Trees for optimal chunk boundaries
- **Embedding & Semantic Search**: Using OpenAI's `text-embedding-3-small` model (support for `voyage-code-3` planned)
- **Vector Database**: PostgreSQL with pgvector extension for efficient similarity search
- **Multi-Language Support**: TypeScript, JavaScript, and extensible for other languages
- **MCP Integration**: Seamlessly connects with AI coding assistants through Model Context Protocol
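
At its core, semantic search ranks stored chunk embeddings by how similar they are to a query embedding. The in-memory sketch below shows that ranking with cosine similarity, a result limit, and a minimum-similarity cut-off. Per the feature list, h-codex delegates this work to PostgreSQL with pgvector. The `StoredChunk` shape, the `search` helper, and the choice of cosine similarity are assumptions made for illustration. The defaults `10` and `0.5` mirror the documented `SEARCH_RESULTS_LIMIT` and `SIMILARITY_THRESHOLD`.

```typescript
// Hypothetical chunk shape for illustration; h-codex's real schema may differ.
interface StoredChunk {
  id: string
  content: string
  embedding: number[]
}

interface SearchHit {
  id: string
  content: string
  similarity: number
}

// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Embedding dimensions differ')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  // Treat zero vectors as dissimilar instead of dividing by zero.
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Brute-force search: score every chunk, drop those below `threshold`,
// then return at most `limit` hits, best first.
function search(query: number[], chunks: StoredChunk[], limit = 10, threshold = 0.5): SearchHit[] {
  return chunks
    .map(chunk => ({
      id: chunk.id,
      content: chunk.content,
      similarity: cosineSimilarity(query, chunk.embedding),
    }))
    .filter(hit => hit.similarity >= threshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit)
}

// Usage with toy 2-dimensional embeddings (real ones have far more dimensions).
const hits = search([1, 0], [
  { id: 'add', content: 'function add(a, b)', embedding: [1, 0] },
  { id: 'parser', content: 'class Parser', embedding: [0, 1] },
  { id: 'sum', content: 'function sum(xs)', embedding: [0.9, 0.1] },
])
console.log(hits.map(hit => hit.id)) // [ 'add', 'sum' ]
```

A brute-force scan like this is linear in the number of chunks. That cost is the reason to push the ranking into a vector database once a repository grows.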

## 🚀 Getting started

h-codex can be integrated with AI assistants through the Model Context Protocol.

### Example with Claude Desktop

Edit your `claude_desktop_config.json` file:

```json
{
  "mcpServers": {
    "h-codex": {
      "command": "npx",
      "args": ["@h-codex/mcp"],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key_here",
        "DB_CONNECTION_STRING": "postgresql://postgres:password@localhost:5432/h-codex"
      }
    }
  }
}
```

## 🛠️ Development

### Prerequisites

- [Node.js](https://nodejs.org/) (v18+)
- [pnpm](https://pnpm.io/) - Package manager
- [Docker](https://www.docker.com/) - For running PostgreSQL with pgvector
- OpenAI API key for embeddings

### Getting Started

1. **Clone the repository**

   ```bash
   git clone https://github.com/hpbyte/h-codex.git
   cd h-codex
   ```

2. **Set up environment variables**

   ```bash
   cp packages/core/.env.example packages/core/.env
   ```

   Edit the `.env` file with your OpenAI API key and other configuration options.

3. **Install dependencies**

   ```bash
   pnpm install
   ```

4. **Start PostgreSQL database**

   ```bash
   cd dev && docker compose up -d && cd ..
   ```

5. **Set up the database**

   ```bash
   pnpm run db:migrate
   ```

6. **Start development server**

   ```bash
   pnpm dev
   ```

## 🔧 Configuration Options

| Environment Variable   | Description                      | Default                                                 |
| ---------------------- | -------------------------------- | ------------------------------------------------------- |
| `OPENAI_API_KEY`       | OpenAI API key for embeddings    | Required                                                |
| `EMBEDDING_MODEL`      | OpenAI model for embeddings      | `text-embedding-3-small`                                |
| `CHUNK_SIZE`           | Maximum chunk size in characters | `1000`                                                  |
| `SEARCH_RESULTS_LIMIT` | Max search results returned      | `10`                                                    |
| `SIMILARITY_THRESHOLD` | Minimum similarity for results   | `0.5`                                                   |
| `DB_CONNECTION_STRING` | PostgreSQL connection string     | `postgresql://postgres:password@localhost:5432/h-codex` |
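
One way to read these variables in TypeScript and apply the defaults from the table is sketched below. `loadConfig`, `CodexConfig`, and the validation behaviour are hypothetical. h-codex's own configuration module is not among the files shown in this commit view.

```typescript
// Hypothetical config shape mirroring the table above; not h-codex's actual module.
interface CodexConfig {
  openaiApiKey: string
  embeddingModel: string
  chunkSize: number
  searchResultsLimit: number
  similarityThreshold: number
  dbConnectionString: string
}

type Env = Record<string, string | undefined>

// Read a numeric variable, falling back to `fallback` when it is unset or blank.
function readNumber(env: Env, key: string, fallback: number): number {
  const raw = env[key]
  if (raw === undefined || raw.trim() === '') return fallback
  const value = Number(raw)
  if (!Number.isFinite(value)) throw new Error(`${key} must be a number, got "${raw}"`)
  return value
}

function loadConfig(env: Env): CodexConfig {
  const openaiApiKey = env.OPENAI_API_KEY
  // OPENAI_API_KEY has no default, so fail fast when it is missing.
  if (!openaiApiKey) throw new Error('OPENAI_API_KEY is required')
  return {
    openaiApiKey,
    embeddingModel: env.EMBEDDING_MODEL || 'text-embedding-3-small',
    chunkSize: readNumber(env, 'CHUNK_SIZE', 1000),
    searchResultsLimit: readNumber(env, 'SEARCH_RESULTS_LIMIT', 10),
    similarityThreshold: readNumber(env, 'SIMILARITY_THRESHOLD', 0.5),
    dbConnectionString:
      env.DB_CONNECTION_STRING || 'postgresql://postgres:password@localhost:5432/h-codex',
  }
}

const config = loadConfig({ OPENAI_API_KEY: 'sk-test', CHUNK_SIZE: '500' })
console.log(config.chunkSize, config.searchResultsLimit) // 500 10
```

In a Node.js process you would pass `process.env`. Rejecting a missing key or a non-numeric value at startup surfaces misconfiguration before the first embedding request is made.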

## 🏗️ Architecture

```mermaid
graph TD
    subgraph "Core Package"
        subgraph "Ingestion Pipeline"
            Explorer["Explorer<br/>(file discovery)"]
            Chunker["Chunker<br/>(AST parsing & chunking)"]
            Embedder["Embedder<br/>(semantic embeddings)"]
            Indexer["Indexer<br/>(orchestration)"]

            Explorer --> Chunker
            Chunker --> Embedder
            Embedder --> Indexer
        end

        subgraph "Storage Layer"
            Repository["Repository"]
        end

        Indexer --> Repository
        Repository --> Database[(PostgreSQL Vector Database)]
    end

    subgraph "MCP Package"
        MCPServer["MCP Server"]
        CodeIndexTool["Code Index Tool"]
        CodeSearchTool["Code Search Tool"]

        MCPServer --> CodeIndexTool
        MCPServer --> CodeSearchTool
    end

    CodeIndexTool --> Indexer
    CodeSearchTool --> Repository
```
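
The ingestion flow in the diagram (Explorer → Chunker → Embedder → Indexer → Repository) can be sketched as a few small interfaces that the Indexer wires together. Every interface name and method signature here is hypothetical. They mirror the diagram, not h-codex's actual classes. In the usage, in-memory stubs stand in for file discovery, AST chunking, the OpenAI embedding call, and PostgreSQL.

```typescript
// Hypothetical types mirroring the architecture diagram, not h-codex's real API.
interface SourceFile {
  path: string
  content: string
}

interface Chunk {
  path: string
  content: string
}

interface EmbeddedChunk extends Chunk {
  embedding: number[]
}

interface Explorer {
  discover(root: string): Promise<SourceFile[]>
}

interface Chunker {
  chunk(file: SourceFile): Chunk[]
}

interface Embedder {
  embed(texts: string[]): Promise<number[][]>
}

interface Repository {
  insert(chunks: EmbeddedChunk[]): Promise<void>
}

// The Indexer only orchestrates: discover files, chunk them, embed the chunks, store results.
class Indexer {
  private readonly explorer: Explorer
  private readonly chunker: Chunker
  private readonly embedder: Embedder
  private readonly repository: Repository

  constructor(explorer: Explorer, chunker: Chunker, embedder: Embedder, repository: Repository) {
    this.explorer = explorer
    this.chunker = chunker
    this.embedder = embedder
    this.repository = repository
  }

  // Returns the number of chunks stored.
  async index(root: string): Promise<number> {
    const files = await this.explorer.discover(root)
    const chunks = files.flatMap(file => this.chunker.chunk(file))
    if (chunks.length === 0) return 0
    // One embedding request for all chunks; a real indexer would likely batch large repos.
    const vectors = await this.embedder.embed(chunks.map(chunk => chunk.content))
    await this.repository.insert(chunks.map((chunk, i) => ({ ...chunk, embedding: vectors[i] })))
    return chunks.length
  }
}

// Usage with in-memory stubs in place of the file system, OpenAI, and PostgreSQL.
const stored: EmbeddedChunk[] = []
const indexer = new Indexer(
  { discover: async () => [{ path: 'a.ts', content: 'export const a = 1' }] },
  { chunk: file => [{ path: file.path, content: file.content }] },
  { embed: async texts => texts.map(text => [text.length]) },
  {
    insert: async chunks => {
      stored.push(...chunks)
    },
  }
)
const run = indexer.index('.')
run.then(count => console.log(`indexed ${count} chunk(s)`)) // indexed 1 chunk(s)
```

Keeping each stage behind an interface lets the same Indexer be tested with stubs like these and later pointed at a different embedding provider, such as the Voyage AI support listed on the roadmap.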

## 🗺️ Roadmap

- Support for additional embedding providers (Voyage AI)
- Enhanced language support with more tree-sitter parsers

## 📄 License

This project is licensed under the MIT License

dev/docker-compose.yml

Lines changed: 27 additions & 0 deletions
name: h-codex

services:
  postgres:
    image: pgvector/pgvector:pg16
    container_name: h-codex-postgres
    env_file:
      - ../packages/core/.env
    environment:
      PGDATA: /var/lib/postgresql/data/pgdata
      POSTGRES_DB: h-codex
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: password
    ports:
      - '5432:5432'
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    healthcheck:
      test: ['CMD-SHELL', 'pg_isready -U postgres -d h-codex']
      interval: 5s
      timeout: 5s
      retries: 5

volumes:
  postgres_data:
    driver: local

dev/init.sql

Lines changed: 6 additions & 0 deletions
-- Initialize database with pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create indexes for better performance
-- These will be created after migrations run, but we ensure the extension is available
SELECT 'Database initialized with pgvector extension' as status;
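
Once the `vector` extension is enabled, a nearest-neighbour lookup can be written in SQL with pgvector's `<=>` operator, which computes cosine distance. The TypeScript sketch below builds such a parameterized query. The `code_chunks` table and its `id`, `content`, and `embedding` columns are hypothetical, because the drizzle schema is not among the files shown. The `$1`-style placeholders assume a node-postgres-style driver.

```typescript
// Hypothetical table/column names; the real h-codex schema may differ.
interface SqlQuery {
  text: string
  values: (string | number)[]
}

// pgvector accepts a vector as a text literal such as '[0.1,0.2,0.3]'.
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length === 0 || embedding.some(x => !Number.isFinite(x))) {
    throw new Error('Embedding must be a non-empty array of finite numbers')
  }
  return `[${embedding.join(',')}]`
}

// `<=>` is pgvector's cosine distance, so cosine similarity = 1 - distance.
// Ordering by the distance expression itself with a LIMIT is the query shape
// that pgvector's HNSW/IVFFlat indexes are designed to serve.
function buildSimilarityQuery(embedding: number[], limit: number, threshold: number): SqlQuery {
  return {
    text: [
      'SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity',
      'FROM code_chunks',
      'WHERE 1 - (embedding <=> $1::vector) >= $2',
      'ORDER BY embedding <=> $1::vector',
      'LIMIT $3',
    ].join('\n'),
    values: [toVectorLiteral(embedding), threshold, limit],
  }
}

const query = buildSimilarityQuery([0.1, 0.2, 0.3], 10, 0.5)
console.log(query.values) // [ '[0.1,0.2,0.3]', 0.5, 10 ]
```

The `{ text, values }` pair matches the query-config object that node-postgres's `client.query` accepts. With an approximate index, pgvector applies the `WHERE` filter after the index scan, so such a query may return fewer than `limit` rows.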

eslint.config.js

Lines changed: 46 additions & 0 deletions
import eslint from '@eslint/js'
import tseslint from '@typescript-eslint/eslint-plugin'
import tsParser from '@typescript-eslint/parser'
import prettierPlugin from 'eslint-plugin-prettier'
import prettierConfig from 'eslint-config-prettier'

export default [
  eslint.configs.recommended,
  {
    files: ['**/*.{ts,tsx}'],
    languageOptions: {
      parser: tsParser,
      parserOptions: {
        ecmaVersion: 'latest',
        sourceType: 'module',
      },
      globals: {
        console: 'readonly',
        process: 'readonly',
        __dirname: 'readonly',
      },
    },
    plugins: {
      '@typescript-eslint': tseslint,
      prettier: prettierPlugin,
    },
    rules: {
      ...tseslint.configs.recommended.rules,
      ...prettierConfig.rules,
      '@typescript-eslint/no-explicit-any': 'warn',
      '@typescript-eslint/no-unused-vars': [
        'warn',
        {
          argsIgnorePattern: '^_',
          varsIgnorePattern: '^_',
        },
      ],
      'no-console': 'off',
      'prettier/prettier': 'error',
      'eol-last': ['error', 'always'],
    },
  },
  {
    ignores: ['node_modules/**', '**/dist/**', 'drizzle/**', '*.d.ts', 'bun.lock', 'packages/core/tests/**'],
  },
]

package.json

Lines changed: 33 additions & 0 deletions
{
  "name": "h-codex",
  "version": "0.1.0",
  "description": "A semantic code search tool for intelligent, cross-repo context retrieval.",
  "author": "Htoo Pyae Lwin",
  "license": "MIT",
  "type": "module",
  "scripts": {
    "build": "pnpm -r run build",
    "dev": "pnpm --filter @h-codex/core run dev",
    "db:generate": "pnpm --filter @h-codex/core run db:generate",
    "db:migrate": "pnpm --filter @h-codex/core run db:migrate",
    "lint:fix": "eslint packages --ext .ts --fix",
    "format": "prettier --write \"packages/**/*.ts\""
  },
  "keywords": [
    "code-intelligence",
    "embeddings",
    "ast",
    "chunking",
    "typescript"
  ],
  "devDependencies": {
    "@eslint/js": "^9.29.0",
    "@typescript-eslint/eslint-plugin": "^8.34.1",
    "@typescript-eslint/parser": "^8.34.1",
    "eslint": "^9.29.0",
    "eslint-config-prettier": "^10.1.5",
    "eslint-plugin-prettier": "^5.5.0",
    "prettier": "^3.5.3",
    "typescript": "^5.8.3"
  }
}

packages/core/.env.example

Lines changed: 10 additions & 0 deletions
OPENAI_API_KEY=your_openai_api_key_here

# db
DB_CONNECTION_STRING=postgresql://postgres:password@localhost:5432/h-codex

EMBEDDING_MODEL=text-embedding-3-small
CHUNK_SIZE=1000

SEARCH_RESULTS_LIMIT=10
SIMILARITY_THRESHOLD=0.5
