Project split #17

Open
petr-pokorny-absa wants to merge 2 commits into master from project-split

@petr-pokorny-absa
Collaborator

This pull request refactors the Scala codebase of the big file comparison project and updates the documentation to match. The code is split into api and app modules with a clearer separation of concerns, the instructions for running and testing are updated for the new layout, and new, documented types make analysis results and metrics type-safe.

Documentation improvements:

  • Added a detailed "Project Structure" section to README.md, explaining the split into api and app modules and listing key files and their responsibilities. Updated all example commands and configuration paths to reflect the new structure and module separation.
  • Improved instructions for running, testing, and configuring the project, including updated Spark and Hadoop commands, test resource handling, and Java version requirements.

Codebase modularization and refactoring:

  • Moved core comparison logic and data models to the new api module (Comparator.scala, DatasetComparisonHelper.scala, analysis/*), separating them from CLI and I/O code in the app module.
  • Refactored DatasetComparisonHelper.exclude to remove the unnecessary SparkSession parameter, simplifying its interface.

Typed analysis and metrics:

  • Introduced a new sealed trait AnalysisResult and its case classes to represent explicit outcomes of row-by-row analysis, improving error handling and result clarity.
  • Added a strongly-typed ComparisonMetrics case class for comparison statistics, and a default implementation of ComparisonMetricsCalculator that computes metrics for any dataset pair.
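
For readers unfamiliar with the pattern, here is a minimal sketch of how a sealed result type and a typed metrics class can look in plain Scala. Only the names AnalysisResult and ComparisonMetrics come from this description; every subtype, field, and helper below is a hypothetical illustration and not the code in this PR.

```scala
// Illustrative sketch only. AnalysisResult and ComparisonMetrics are named in
// the PR description; all subtypes, fields and helpers here are assumptions.
object TypedResultsSketch {

  // A sealed trait can only be extended in this file, so the compiler can
  // warn when a pattern match forgets one of the outcomes.
  sealed trait AnalysisResult
  case object NoDifferences extends AnalysisResult
  // Hypothetical: maps a row key to the names of the columns that differ.
  final case class RowDifferences(byRow: Map[String, Seq[String]]) extends AnalysisResult
  // Hypothetical: analysis was not performed, with a human-readable reason.
  final case class AnalysisSkipped(reason: String) extends AnalysisResult

  // Hypothetical field set: row counts of both inputs and the number of rows
  // in each input that have no match in the other.
  final case class ComparisonMetrics(
      rowCountA: Long,
      rowCountB: Long,
      diffCountA: Long,
      diffCountB: Long
  ) {
    // Rows of A that also appear in B.
    def sameRecords: Long = rowCountA - diffCountA
    // True when neither side has unmatched rows.
    def identical: Boolean = diffCountA == 0 && diffCountB == 0
  }

  // Callers handle every outcome explicitly instead of inspecting
  // nulls, empty strings, or exceptions.
  def describe(result: AnalysisResult): String = result match {
    case NoDifferences           => "datasets match"
    case RowDifferences(byRow)   => s"${byRow.size} row(s) differ"
    case AnalysisSkipped(reason) => s"analysis skipped: $reason"
  }
}
```

With a shape like this, adding a new outcome to the sealed hierarchy makes the compiler flag every match that does not yet handle it, which is the kind of result clarity the bullets above describe.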

API documentation and code comments:

  • Added comprehensive Scaladoc comments to all new and refactored classes and methods, clarifying their purpose and usage.

Overview

Release Notes

  • Code refactored and split into app and api modules

Related

Closes #14

@github-actions
github-actions bot commented Mar 3, 2026

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines  Covered  Coverage  Threshold  Status
3      3        100%      70%        🟢

New Files

No new covered files...

Modified Files

No covered modified files...

updated for commit: 54ea7e1 by action🐍

Development

Successfully merging this pull request may close these issues.

Move data comparison logic to a library