Skip to content

Fix the logic to fetch the rows#13

Merged
severo merged 4 commits intomainfrom
fix-logic
Nov 21, 2025
Merged

Fix the logic to fetch the rows#13
severo merged 4 commits intomainfrom
fix-logic

Conversation

@severo
Copy link
Contributor

@severo severo commented Nov 21, 2025

We now allow the estimated row numbers in different random row ranges to overlap. We only reassign them when the bytes are contiguous and we merge two ranges.
The consequence is that the response is a lot faster.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the CSV row fetching logic to improve performance by allowing row number estimates in different random ranges to overlap. Previously, the system would reassign row numbers across all ranges to prevent overlap, which was slower. Now, row numbers are only reassigned when bytes are contiguous and ranges are merged. The changes also introduce a "prefix rows" fetching strategy to handle partial rows when starting from estimated byte positions.

Key changes:

  • Modified CSVCache.store() to require a firstRow parameter for creating new ranges
  • Refactored getNextMissingRow() to return separate firstByteToFetch and firstByteToStore values along with expected firstRow
  • Reduced default chunk size from 500KB to 100KB for finer-grained fetching when estimating positions

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/cache.ts Added firstRow parameter to store() method; refactored getNextMissingRow() to support prefix fetching with separate fetch/store byte positions
src/dataframe.ts Refactored fetch loop to use new getNextMissingRow() API; added prefix row fetching; changed event emission to per-row instead of per-fetch; reduced default chunk size to 100KB
test/cache.test.ts Updated all store() calls to include required firstRow parameter; updated getNextMissingRow() test expectations for new return structure; removed test for row number estimation across ranges
test/dataframe.test.ts Added extra row to test data; updated expected values to reflect new fetching behavior with prefix rows
src/Welcome.tsx Added new example dataset (19th-century novelists) for testing with large files
package.json Updated dependencies: csv-range to 0.0.10, vitest/coverage to 4.0.12, eslint plugins to latest versions, vite to 7.2.4
package-lock.json Updated lockfile to match package.json dependency changes

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@severo severo merged commit c592277 into main Nov 21, 2025
3 checks passed
@severo severo deleted the fix-logic branch November 21, 2025 12:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant