Project split by petr-pokorny-absa · Pull Request #17 · AbsaOSS/dataset-comparison

petr-pokorny-absa · 2026-03-03T10:22:14Z

This pull request improves the documentation and refactors the Scala codebase of the big file comparison project. The changes clarify the modular structure, update the instructions for running and testing, and introduce new, documented types for the comparison logic. The code is reorganized into api and app modules, with a clearer separation of concerns and stronger type safety for analysis results and metrics.

Documentation improvements:

Added a detailed "Project Structure" section to README.md, explaining the split into api and app modules and listing key files and their responsibilities. Updated all example commands and configuration paths to reflect the new structure and module separation.
Improved instructions for running, testing, and configuring the project, including updated Spark and Hadoop commands, test resource handling, and Java version requirements.

Codebase modularization and refactoring:

Moved core comparison logic and data models to the new api module (Comparator.scala, DatasetComparisonHelper.scala, analysis/*), separating them from CLI and I/O code in the app module.
Refactored DatasetComparisonHelper.exclude to remove the unnecessary SparkSession parameter, simplifying its interface.

Typed analysis and metrics:

Introduced a new sealed trait AnalysisResult and its case classes to represent explicit outcomes of row-by-row analysis, improving error handling and result clarity.
Added a strongly-typed ComparisonMetrics case class for comparison statistics, and a default implementation of ComparisonMetricsCalculator that computes metrics for any dataset pair.
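
For readers unfamiliar with the pattern, here is a minimal sketch of how a sealed result type and a typed metrics case class can fit together. It is not the code from this PR: only the names AnalysisResult and ComparisonMetrics come from the description above. The case classes, the field names, and the MetricsSketch object are hypothetical, and plain Seq collections stand in for Spark datasets so the example runs without Spark.

```scala
// Sketch only: AnalysisResult and ComparisonMetrics are named in the PR description;
// every case class, field, and helper below is a hypothetical illustration.
sealed trait AnalysisResult

object AnalysisResult {
  // Rows were compared and no column differed (hypothetical variant).
  case object NoDifferences extends AnalysisResult
  // Rows were compared and the listed columns differed (hypothetical variant).
  final case class Differences(columns: Seq[String]) extends AnalysisResult
  // Row-by-row analysis was not performed, with the reason recorded (hypothetical variant).
  final case class Skipped(reason: String) extends AnalysisResult
}

// Hypothetical field set; the real ComparisonMetrics may carry different statistics.
final case class ComparisonMetrics(
  rowsA: Long,
  rowsB: Long,
  sameRows: Long,
  onlyInA: Long,
  onlyInB: Long
)

object MetricsSketch {
  // Computes metrics for any pair of row collections.
  // Seq.diff is a multiset difference, so duplicate rows are counted individually.
  def calculate[A](a: Seq[A], b: Seq[A]): ComparisonMetrics = {
    val onlyInA = a.diff(b)
    val onlyInB = b.diff(a)
    ComparisonMetrics(
      rowsA = a.size.toLong,
      rowsB = b.size.toLong,
      sameRows = (a.size - onlyInA.size).toLong,
      onlyInA = onlyInA.size.toLong,
      onlyInB = onlyInB.size.toLong
    )
  }

  // Because the trait is sealed, the compiler can warn when a case is missing here.
  def describe(result: AnalysisResult): String = result match {
    case AnalysisResult.NoDifferences        => "rows match"
    case AnalysisResult.Differences(columns) => "differs in: " + columns.mkString(", ")
    case AnalysisResult.Skipped(reason)      => "skipped: " + reason
  }
}
```

The benefit of the sealed trait is that callers handle each outcome explicitly instead of inspecting strings or nullable values, and adding a new outcome surfaces every match that needs updating.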

API documentation and code comments:

Added comprehensive Scaladoc comments to all new and refactored classes and methods, clarifying their purpose and usage.

Overview

Release Notes

Code refactored and split into app and api modules

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines	Covered	Coverage	Threshold	Status
3	3	100%	70%	🟢

New Files

No new covered files...

Modified Files

No covered modified files...

updated for commit: 54ea7e1 by action🐍

petr-pokorny-absa added 2 commits March 3, 2026 08:52

refactoring

4421c45

split into api and app

54ea7e1

petr-pokorny-absa requested review from OlivieFranklova and tmikula-dev as code owners March 3, 2026 10:22

petr-pokorny-absa wants to merge 2 commits into master from project-split
