# REQ-01-CORE-REPOSITORY.md

## Overview
- **Phase**: 1 of 5
- **Dependencies**: Foundation Layer
- **Deliverables**: Core repository and data structures
- **Estimated Effort**: 3 developer-weeks

## Context & References
- **Architecture**: See [AST_TECH_SPEC.md](./AST_TECH_SPEC.md#1-repository-system)
- **Data Models**: See [AST_TECH_SPEC.md](./AST_TECH_SPEC.md#5-data-models)
- **API Contracts**: See [AST_TECH_SPEC.md](./AST_TECH_SPEC.md#repository-api)
- **Foundation Integration**: Uses `ElixirScope.Storage.DataAccess` patterns

## Functional Requirements

### FR-1.1: Module Data Storage
- **Priority**: MUST
- **Description**: The repository MUST store module data with O(1) lookup performance.

**Acceptance Criteria**:
- [ ] Store `ModuleData.t()` structures in an ETS table keyed by `module_name`
- [ ] Support concurrent reads via the ETS `:read_concurrency` option
- [ ] Implement atomic updates coordinated through the repository GenServer
- [ ] Maintain a `compilation_hash` per module for change detection
- [ ] Store `instrumentation_points` and `correlation_metadata`

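A minimal sketch of FR-1.1. It assumes plain maps stand in for `ModuleData.t()` and that the GenServer owns a `:protected` table. `ModuleStoreSketch` and its functions are hypothetical names, not the planned `core.ex` API. Reads go straight to ETS. Writes are serialized through `GenServer.call/2` and skipped when the stored `compilation_hash` is unchanged.

```elixir
defmodule ModuleStoreSketch do
  # Hypothetical sketch of FR-1.1, not the planned core.ex API.
  use GenServer

  # Unnamed by default; pass name: ... in opts for a registered repository.
  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Callers fetch the table reference once, then read without the GenServer.
  def table(server), do: GenServer.call(server, :table)

  # O(1) average-case read straight from ETS.
  def get_module(table, module_name) do
    case :ets.lookup(table, module_name) do
      [{^module_name, data}] -> {:ok, data}
      [] -> {:error, :not_found}
    end
  end

  # Serialized write; returns :unchanged when the compilation_hash matches.
  def put_module(server, module_name, data),
    do: GenServer.call(server, {:put, module_name, data})

  @impl true
  def init(:ok) do
    # :protected lets any process read, but only this GenServer can write.
    table = :ets.new(:ast_modules, [:set, :protected, {:read_concurrency, true}])
    {:ok, %{table: table}}
  end

  @impl true
  def handle_call(:table, _from, state), do: {:reply, state.table, state}

  def handle_call({:put, name, %{compilation_hash: new_hash} = data}, _from, state) do
    reply =
      case :ets.lookup(state.table, name) do
        # Same hash as the stored entry: nothing recompiled, so skip the write.
        [{^name, %{compilation_hash: ^new_hash}}] ->
          :unchanged

        _ ->
          true = :ets.insert(state.table, {name, data})
          :ok
      end

    {:reply, reply, state}
  end
end
```

Fetching the table reference once and calling `get_module/2` directly keeps reads off the GenServer's message queue. Only writes pay for the serialization.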
### FR-1.2: Function Data Storage
- **Priority**: MUST
- **Description**: The repository MUST store function-level data with efficient retrieval.

**Acceptance Criteria**:
- [ ] Store `FunctionData.t()` structures with `function_key` as the primary key
- [ ] Support lookup by the `{module_name, function_name, arity}` tuple
- [ ] Store AST fragments and complexity scores
- [ ] Maintain slots for runtime correlation data

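This sketch assumes that `function_key` is the `{module_name, function_name, arity}` tuple; the requirements do not say so explicitly. Under that assumption, exact lookups stay O(1) on a `:set` table. The hypothetical `functions_for_module/2` shows the cost of a partially bound key: on a `:set` table, it forces a full scan.

```elixir
defmodule FunctionStoreSketch do
  # Hypothetical: function_key is assumed to be {module_name, function_name, arity}.
  def new_table, do: :ets.new(:ast_functions, [:set, :protected, {:read_concurrency, true}])

  def put_function(table, {m, f, a} = key, data)
      when is_atom(m) and is_atom(f) and is_integer(a) do
    :ets.insert(table, {key, data})
  end

  # Exact MFA lookup: fully bound key, O(1) average case.
  def get_function(table, module_name, function_name, arity) do
    case :ets.lookup(table, {module_name, function_name, arity}) do
      [{_key, data}] -> {:ok, data}
      [] -> {:error, :not_found}
    end
  end

  # All functions of one module. The key is only partially bound, so a :set
  # table is scanned in full (O(n)), unlike the exact lookup above.
  def functions_for_module(table, module_name) do
    table
    |> :ets.match_object({{module_name, :_, :_}, :_})
    |> Enum.map(fn {key, _data} -> key end)
    |> Enum.sort()
  end
end
```

If per-module listing turns out to be hot, an `:ordered_set` keyed the same way lets ETS limit the traversal to the bound module prefix. The trade-off is O(log n) exact lookups instead of O(1).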
### FR-1.3: AST Node Mapping
- **Priority**: MUST
- **Description**: The repository MUST maintain bidirectional AST node mappings.

**Acceptance Criteria**:
- [ ] Store AST nodes with unique `ast_node_id` identifiers
- [ ] Support `correlation_id` to `ast_node_id` mapping
- [ ] Implement instrumentation point storage and retrieval
- [ ] Maintain temporal correlation indices

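The sketch below shows one possible layout for the bidirectional mapping and a temporal index. It uses two `:bag` tables, one per direction, plus an `:ordered_set` keyed by `{timestamp, correlation_id}`. The module name, table names, and integer timestamps are illustrative assumptions, not taken from the spec.

```elixir
defmodule CorrelationIndexSketch do
  # Hypothetical layout: two :bag tables give both directions of the mapping,
  # and an :ordered_set keyed by {timestamp, correlation_id} is the temporal index.
  def new do
    %{
      by_correlation: :ets.new(:correlation_index, [:bag, :protected, {:read_concurrency, true}]),
      by_node: :ets.new(:node_correlations, [:bag, :protected, {:read_concurrency, true}]),
      temporal: :ets.new(:temporal_index, [:ordered_set, :protected])
    }
  end

  def link(idx, correlation_id, ast_node_id, timestamp) do
    :ets.insert(idx.by_correlation, {correlation_id, ast_node_id})
    :ets.insert(idx.by_node, {ast_node_id, correlation_id})
    :ets.insert(idx.temporal, {{timestamp, correlation_id}, ast_node_id})
    :ok
  end

  # correlation_id -> [ast_node_id]
  def nodes_for(idx, correlation_id) do
    for {_corr, node} <- :ets.lookup(idx.by_correlation, correlation_id), do: node
  end

  # ast_node_id -> [correlation_id]
  def correlations_for(idx, ast_node_id) do
    for {_node, corr} <- :ets.lookup(idx.by_node, ast_node_id), do: corr
  end

  # {correlation_id, ast_node_id} pairs with from <= timestamp <= to, in time order.
  def between(idx, from, to) do
    spec = [
      {{{:"$1", :"$2"}, :"$3"}, [{:>=, :"$1", from}, {:"=<", :"$1", to}], [{{:"$2", :"$3"}}]}
    ]

    :ets.select(idx.temporal, spec)
  end
end
```

With the temporal key `{timestamp, correlation_id}`, two links for the same correlation at the same timestamp would overwrite each other. Appending the `ast_node_id` to the key would avoid that.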
### FR-1.4: Repository Lifecycle Management
- **Priority**: MUST
- **Description**: The repository MUST support creation, cleanup, and monitoring.

**Acceptance Criteria**:
- [ ] GenServer-based repository with a configurable name
- [ ] ETS table lifecycle management (creation, cleanup)
- [ ] Repository statistics collection and reporting
- [ ] Health check functionality
- [ ] Graceful shutdown with data preservation options

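This is a sketch of the lifecycle pieces. It assumes statistics come from `:ets.info/2`, where `:size` counts objects and `:memory` is reported in words. `RepositoryLifecycleSketch`, `stats/1`, and `health_check/1` are hypothetical names. Trapping exits makes `terminate/2` run when a supervisor shuts the process down. That callback is where a data preservation option would hook in, for example by dumping tables with `:ets.tab2file/2`.

```elixir
defmodule RepositoryLifecycleSketch do
  # Hypothetical lifecycle sketch for FR-1.4; names and stats fields are assumptions.
  use GenServer

  # The repository name is configurable and required here.
  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: Keyword.fetch!(opts, :name))
  end

  def stats(name), do: GenServer.call(name, :stats)
  def health_check(name), do: GenServer.call(name, :health_check)

  @impl true
  def init(_opts) do
    # Trap exits so terminate/2 also runs when a supervisor shuts us down.
    Process.flag(:trap_exit, true)

    tables = %{
      modules: :ets.new(:ast_modules, [:set, :protected, {:read_concurrency, true}]),
      functions: :ets.new(:ast_functions, [:set, :protected, {:read_concurrency, true}])
    }

    {:ok, %{tables: tables, started_at: System.monotonic_time(:millisecond)}}
  end

  @impl true
  def handle_call(:stats, _from, state) do
    word_size = :erlang.system_info(:wordsize)

    per_table =
      Map.new(state.tables, fn {kind, table} ->
        # :ets.info(table, :memory) reports words, not bytes.
        {kind, %{size: :ets.info(table, :size), memory_bytes: :ets.info(table, :memory) * word_size}}
      end)

    uptime_ms = System.monotonic_time(:millisecond) - state.started_at
    {:reply, %{tables: per_table, uptime_ms: uptime_ms}, state}
  end

  def handle_call(:health_check, _from, state) do
    missing = for {kind, table} <- state.tables, :ets.info(table) == :undefined, do: kind
    reply = if missing == [], do: :healthy, else: {:unhealthy, {:missing_tables, missing}}
    {:reply, reply, state}
  end

  @impl true
  def terminate(_reason, state) do
    # A data-preservation option could dump tables here (e.g. :ets.tab2file/2)
    # before they are deleted; this sketch only cleans up.
    Enum.each(state.tables, fn {_kind, table} -> :ets.delete(table) end)
    :ok
  end
end
```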
### FR-1.5: Configuration Management
- **Priority**: SHOULD
- **Description**: The repository SHOULD support runtime configuration.

**Acceptance Criteria**:
- [ ] Configurable memory limits per table type
- [ ] Adjustable cleanup intervals and TTL settings
- [ ] Enable/disable switch for performance tracking
- [ ] Maximum table size limits with overflow handling

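The following is a hedged sketch of configuration validation. Every key below (`memory_limit_mb`, `cleanup_interval_ms`, `ttl_ms`, `performance_tracking`, `max_table_size`, `overflow`) and every default value is a placeholder. The real keys and values would come from `ElixirScope.Foundation.Config`.

```elixir
defmodule RepositoryConfigSketch do
  # Hypothetical option names and defaults, for illustration only.
  @defaults %{
    memory_limit_mb: %{modules: 10, functions: 20, ast_nodes: 20},
    cleanup_interval_ms: 60_000,
    ttl_ms: 3_600_000,
    performance_tracking: true,
    max_table_size: 100_000,
    # What to do when max_table_size is reached.
    overflow: :evict_lru
  }

  def defaults, do: @defaults

  # Accepts a map or keyword list of overrides; returns {:ok, config} or the first error.
  def build(overrides \\ %{}) do
    config = Map.merge(@defaults, Map.new(overrides))

    with :ok <- check(config.cleanup_interval_ms, &pos_int?/1, :cleanup_interval_ms),
         :ok <- check(config.ttl_ms, &pos_int?/1, :ttl_ms),
         :ok <- check(config.max_table_size, &pos_int?/1, :max_table_size),
         :ok <- check(config.performance_tracking, &is_boolean/1, :performance_tracking),
         :ok <- check(config.overflow, &(&1 in [:evict_lru, :reject]), :overflow),
         :ok <- check(config.memory_limit_mb, &valid_limits?/1, :memory_limit_mb) do
      {:ok, config}
    end
  end

  defp pos_int?(v), do: is_integer(v) and v > 0
  defp valid_limits?(m), do: is_map(m) and Enum.all?(Map.values(m), &pos_int?/1)

  defp check(value, valid?, key) do
    if valid?.(value), do: :ok, else: {:error, {:invalid, key, value}}
  end
end
```

`Map.merge/2` is shallow, so overriding `memory_limit_mb` replaces the whole per-table map rather than a single entry.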
## Non-Functional Requirements

### NFR-1.1: Performance
- **Module lookup**: < 1 ms (99th percentile) with 10K modules
- **Function lookup**: < 1 ms (99th percentile) with 100K functions
- **Correlation mapping**: < 5 ms (99th percentile) with 1M correlations
- **Memory usage**: < 50 MB for the base repository on a typical Elixir project

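Here is a rough way to check the lookup targets. It assumes nearest-rank percentiles computed over per-call timings from `:timer.tc/1`, which reports microseconds. Because it measures a bare `:ets.lookup/2` on synthetic keys, it gives a lower bound. A benchmarking tool such as Benchee would give steadier numbers. `LookupLatencySketch` is a hypothetical name.

```elixir
defmodule LookupLatencySketch do
  # Rough p99 check for the NFR-1.1 lookup targets (synthetic data, lower bound).
  def run(entries, samples) do
    table = :ets.new(:bench_modules, [:set, :protected, {:read_concurrency, true}])
    for i <- 1..entries, do: :ets.insert(table, {{:mod, i}, %{compilation_hash: i}})

    timings_us =
      for _ <- 1..samples do
        key = {:mod, :rand.uniform(entries)}
        # :timer.tc/1 returns {microseconds, result}.
        {us, [_]} = :timer.tc(fn -> :ets.lookup(table, key) end)
        us
      end

    :ets.delete(table)
    percentile(timings_us, 0.99)
  end

  # Nearest-rank percentile: the value at rank ceil(p * n) in sorted order.
  def percentile(values, p) do
    sorted = Enum.sort(values)
    rank = max(ceil(p * length(sorted)), 1)
    Enum.at(sorted, rank - 1)
  end
end
```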
### NFR-1.2: Reliability
- **Availability**: 99.9% uptime during normal operation
- **Error recovery**: Automatic ETS table reconstruction on corruption
- **Memory pressure**: Graceful degradation with LRU eviction
- **Supervision**: OTP supervision tree integration

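ETS has no built-in LRU eviction. The sketch below shows one common layout under stated assumptions: the data table is paired with an `:ordered_set` keyed by a monotonic stamp, so the oldest entry is found with `:ets.first/1`. `LruEvictionSketch` and its API are hypothetical. In the repository, this bookkeeping would run inside the owning GenServer.

```elixir
defmodule LruEvictionSketch do
  # Hypothetical LRU layout: `data` holds {key, value, stamp}; `order` is an
  # :ordered_set of {stamp, key}, so :ets.first/1 returns the oldest stamp.
  def new(max_size) do
    %{data: :ets.new(:lru_data, [:set]), order: :ets.new(:lru_order, [:ordered_set]), max: max_size}
  end

  def put(cache, key, value) do
    forget_stamp(cache, key)
    # Strictly increasing within this runtime, so newer entries sort last.
    stamp = System.unique_integer([:monotonic])
    :ets.insert(cache.data, {key, value, stamp})
    :ets.insert(cache.order, {stamp, key})
    evict(cache)
  end

  def get(cache, key) do
    case :ets.lookup(cache.data, key) do
      [{^key, value, _stamp}] ->
        # A hit refreshes recency by re-inserting with a new stamp.
        put(cache, key, value)
        {:ok, value}

      [] ->
        :miss
    end
  end

  defp forget_stamp(cache, key) do
    case :ets.lookup(cache.data, key) do
      [{^key, _value, old_stamp}] -> :ets.delete(cache.order, old_stamp)
      [] -> true
    end
  end

  # Drop least recently used entries until the table is within max_size.
  defp evict(cache) do
    if :ets.info(cache.data, :size) > cache.max do
      oldest = :ets.first(cache.order)
      [{^oldest, key}] = :ets.lookup(cache.order, oldest)
      :ets.delete(cache.order, oldest)
      :ets.delete(cache.data, key)
      evict(cache)
    else
      :ok
    end
  end
end
```

Refreshing recency on every hit turns each read into a write. That conflicts with direct concurrent reads, so a real design might update stamps lazily or in batches.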
### NFR-1.3: Concurrency
- **Read operations**: Any number of processes read directly from ETS, without a GenServer round-trip, using `:read_concurrency`
- **Write operations**: Serialized through the GenServer, with batching support
- **Lock contention**: < 1% under normal load patterns
- **Process isolation**: Repository failures do not affect the Foundation Layer

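The sketch below shows write batching and assumes writes can be acknowledged asynchronously. Casts accumulate in a buffer, which is flushed when it reaches `:batch_size` or when a `:flush_ms` timer fires; both option names are hypothetical. The buffer passes through a map before each flush, because `:ets.insert/2` on a `:set` table does not define which object wins when the list contains duplicate keys.

```elixir
defmodule BatchedWriterSketch do
  # Hypothetical write-batching sketch; :batch_size and :flush_ms are assumed option names.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)
  def table(server), do: GenServer.call(server, :table)
  def write(server, key, value), do: GenServer.cast(server, {:write, key, value})
  def flush(server), do: GenServer.call(server, :flush)

  @impl true
  def init(opts) do
    state = %{
      table: :ets.new(:ast_modules, [:set, :protected, {:read_concurrency, true}]),
      buffer: [],
      batch_size: Keyword.get(opts, :batch_size, 100),
      flush_ms: Keyword.get(opts, :flush_ms, 50)
    }

    Process.send_after(self(), :tick, state.flush_ms)
    {:ok, state}
  end

  @impl true
  def handle_cast({:write, key, value}, state) do
    state = %{state | buffer: [{key, value} | state.buffer]}

    if length(state.buffer) >= state.batch_size do
      {:noreply, flush_buffer(state)}
    else
      {:noreply, state}
    end
  end

  @impl true
  def handle_call(:table, _from, state), do: {:reply, state.table, state}
  def handle_call(:flush, _from, state), do: {:reply, :ok, flush_buffer(state)}

  @impl true
  def handle_info(:tick, state) do
    Process.send_after(self(), :tick, state.flush_ms)
    {:noreply, flush_buffer(state)}
  end

  defp flush_buffer(%{buffer: []} = state), do: state

  defp flush_buffer(state) do
    # The buffer is newest-first; reversing and building a map keeps the last
    # write per key. A single :ets.insert/2 with a list is atomic and isolated.
    objects = state.buffer |> Enum.reverse() |> Map.new() |> Map.to_list()
    true = :ets.insert(state.table, objects)
    %{state | buffer: []}
  end
end
```

Casts trade read-your-writes consistency for throughput: a reader may not see a write until the next flush. Callers that need confirmation can use `flush/1`.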
### NFR-1.4: Scalability
- **Module capacity**: Support up to 10K modules
- **Function capacity**: Support up to 100K functions
- **AST node capacity**: Support up to 1M instrumented nodes
- **Linear memory growth**: O(n) memory usage with data size

## Technical Implementation Notes

### Files to Implement
- [ ] `lib/elixir_scope/ast/repository/core.ex` (estimated: 500 LOC)
  - GenServer implementation with ETS table management
  - CRUD operations for modules, functions, and AST nodes
  - Correlation index management
  - Statistics collection and health checks

- [ ] `lib/elixir_scope/ast/data/module_data.ex` (estimated: 200 LOC)
  - `ModuleData` struct definition and type specifications
  - Constructor functions and validation
  - Update helpers and data transformation utilities
  - Runtime correlation integration points

- [ ] `lib/elixir_scope/ast/data/function_data.ex` (estimated: 150 LOC)
  - `FunctionData` struct with complexity tracking
  - Call graph data structures
  - Performance metrics slots
  - Runtime correlation hooks

- [ ] `lib/elixir_scope/ast/data/shared_data_structures.ex` (estimated: 100 LOC)
  - Common type definitions and utilities
  - Correlation ID generation and management
  - AST node ID utilities
  - Performance tracking structures

- [ ] `lib/elixir_scope/ast/data/supporting_structures.ex` (estimated: 100 LOC)
  - Instrumentation point definitions
  - Correlation metadata structures
  - Repository configuration types
  - Statistics and monitoring types

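Below is a possible shape for `module_data.ex`. It is named `ModuleDataSketch` so it is not mistaken for the planned module. The fields mirror FR-1.1. The `updated_at` field, the `new/3` options, and hashing with SHA-256 over `:erlang.term_to_binary/1` are all assumptions.

```elixir
defmodule ModuleDataSketch do
  # Hypothetical shape for ModuleData; field names follow FR-1.1.
  @enforce_keys [:module_name, :ast, :compilation_hash]
  defstruct [
    :module_name,
    :ast,
    :compilation_hash,
    instrumentation_points: [],
    correlation_metadata: %{},
    updated_at: nil
  ]

  @type t :: %__MODULE__{
          module_name: module(),
          ast: Macro.t(),
          compilation_hash: String.t(),
          instrumentation_points: [map()],
          correlation_metadata: map(),
          updated_at: DateTime.t() | nil
        }

  @spec new(module(), Macro.t(), keyword()) :: {:ok, t()} | {:error, term()}
  def new(module_name, ast, opts \\ [])

  def new(module_name, ast, opts) when is_atom(module_name) do
    {:ok,
     %__MODULE__{
       module_name: module_name,
       ast: ast,
       compilation_hash: hash_ast(ast),
       instrumentation_points: Keyword.get(opts, :instrumentation_points, []),
       correlation_metadata: Keyword.get(opts, :correlation_metadata, %{}),
       updated_at: DateTime.utc_now()
     }}
  end

  def new(module_name, _ast, _opts), do: {:error, {:invalid_module_name, module_name}}

  # Content hash of the AST (hex SHA-256), used for change detection.
  def hash_ast(ast) do
    :crypto.hash(:sha256, :erlang.term_to_binary(ast)) |> Base.encode16(case: :lower)
  end
end
```

Quoted ASTs carry line metadata, so moving code without changing it also changes this hash. Stripping metadata first would make the hash depend only on structure.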
### Integration Points
- **Foundation Layer**: Uses `ElixirScope.Storage.DataAccess` for ETS patterns
- **Configuration**: Integrates with `ElixirScope.Foundation.Config` for settings
- **Utilities**: Uses `ElixirScope.Utils` for common operations
- **Error Handling**: Follows Foundation Layer error patterns

### ETS Table Design
```elixir
# All tables are :protected: any process can read them directly, but only the
# owning GenServer can write, which enforces serialized writes (NFR-1.3).
# The tables are not :named_table, so several repositories can coexist; the
# GenServer must keep the returned table references in its state.

# Modules table: O(1) average-case lookup by module_name
modules = :ets.new(:ast_modules, [:set, :protected, {:read_concurrency, true}])

# Functions table: O(1) average-case lookup by function_key
functions = :ets.new(:ast_functions, [:set, :protected, {:read_concurrency, true}])

# AST nodes table: ast_node_id -> instrumentation mapping
ast_nodes = :ets.new(:ast_nodes, [:set, :protected, {:read_concurrency, true}])

# Correlation index: runtime correlation; :bag allows several entries per key
correlation_index = :ets.new(:correlation_index, [:bag, :protected, {:read_concurrency, true}])
```

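If writes are meant to be serialized through the GenServer (NFR-1.3), the `:protected` access mode enforces that at the ETS level. Other processes can still read, but a write from any process other than the owner raises `ArgumentError`. The small demo below, under a hypothetical module name, shows both behaviors using `Task`s.

```elixir
defmodule EtsAccessSketch do
  # Demonstrates :protected semantics: any process can read, only the owner can write.
  def demo do
    table = :ets.new(:ast_modules, [:set, :protected, {:read_concurrency, true}])
    true = :ets.insert(table, {MyApp.Foo, %{compilation_hash: "abc"}})

    # Another process reading: allowed.
    reader = Task.async(fn -> :ets.lookup(table, MyApp.Foo) end)

    # Another process writing: rejected with badarg (ArgumentError in Elixir).
    writer =
      Task.async(fn ->
        try do
          :ets.insert(table, {MyApp.Bar, %{}})
          :inserted
        rescue
          ArgumentError -> :denied
        end
      end)

    {Task.await(reader), Task.await(writer)}
  end
end
```

With `:public`, the writer task above would succeed, and serialization would depend on every caller going through the GenServer by convention.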
## Testing Requirements

### Unit Tests
- [ ] Repository GenServer lifecycle (start, stop, restart)
- [ ] Module CRUD operations with validation
- [ ] Function storage and retrieval
- [ ] AST node mapping functionality
- [ ] Correlation index operations
- [ ] Configuration validation and application
- [ ] Error handling for invalid data

### Integration Tests
- [ ] Foundation Layer DataAccess integration
- [ ] Configuration service integration
- [ ] Error handling with Foundation Layer patterns
- [ ] Multi-repository scenarios

### Performance Tests
- [ ] Module lookup benchmarks at 1K, 10K, and 100K scale
- [ ] Function lookup benchmark with various arity distributions
- [ ] Correlation mapping performance under load
- [ ] Memory usage profiling with realistic data sets
- [ ] Concurrent access stress testing (100+ processes)

### Property-Based Tests
- [ ] Repository state consistency under concurrent operations
- [ ] ETS table integrity after random operations
- [ ] Data serialization/deserialization round-trips

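StreamData, with `ExUnitProperties`, is the usual property-testing library in Elixir. The check below is a dependency-free sketch of the "ETS table integrity after random operations" property. It applies random puts and deletes both to an ETS table and to a plain map used as a model, then asserts that the two agree. The module name, key range, and operation counts are arbitrary.

```elixir
defmodule EtsModelCheckSketch do
  # Model-based check: random put/delete ops applied to ETS and to a map must agree.
  def run(seed, ops) do
    :rand.seed(:exsss, {seed, seed, seed})
    table = :ets.new(:model_check, [:set])

    model =
      Enum.reduce(1..ops, %{}, fn _, model ->
        key = :rand.uniform(50)

        case :rand.uniform(2) do
          1 ->
            value = :rand.uniform(1_000)
            :ets.insert(table, {key, value})
            Map.put(model, key, value)

          2 ->
            :ets.delete(table, key)
            Map.delete(model, key)
        end
      end)

    consistent? =
      Map.new(:ets.tab2list(table)) == model and :ets.info(table, :size) == map_size(model)

    :ets.delete(table)
    consistent?
  end
end
```

Seeding `:rand` makes a failing run reproducible from its seed, which is the main thing a property library adds beyond shrinking.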
## Definition of Done
- [ ] All functional requirements implemented and tested
- [ ] Performance benchmarks meet or exceed NFR targets
- [ ] Unit test coverage ≥ 90% for all modules
- [ ] Integration tests pass with Foundation Layer
- [ ] Performance tests validate scalability targets
- [ ] Code review completed with senior developer approval
- [ ] Dialyzer passes with zero warnings
- [ ] Documentation updated in AST_TECH_SPEC.md
- [ ] Ready for Phase 2 dependency integration

## Risk Mitigation
- **ETS Memory Growth**: Implement configurable size limits with LRU eviction
- **Correlation Index Performance**: Monitor and optimize bag table operations
- **GenServer Bottleneck**: Design for future batching and parallel processing
- **Data Consistency**: Implement atomic operations and validation

## Phase 2 Handoff Requirements
- Core repository operational and tested
- Data structures proven with realistic data sets
- API contracts validated and stable
- Performance baselines established
- Integration patterns documented for Phase 2 parsing system