Skip to content

Conversation

@benITo47
Copy link

Description

This PR refactors the OCR and VerticalOCR to improve efficiency.

Previously, our implementation relied on multiple instances of the same detector and recognizer models, each handling a different input size (3x Detector, 4x Recognizer instances). This approach was resource-intensive.

This update introduces a more streamlined approach by using a single detector and a single recognizer model, each with multiple forward_ methods (e.g., forward_800, forward_320). These methods handle different input widths within the same model instance, significantly reducing the number of loaded models and simplifying the API.

This change is a breaking change as it modifies the arguments for useOCR, useVerticalOCR, OCRModule, and VerticalOCRModule

Introduces a breaking change?

  • Yes
  • No

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

Manual sanity checks.

Screenshots

Related issues

#692

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

Refactor of TypeScript interfaces and hooks for OCR and VerticalOCR to support models that expose multiple inference methods for different input sizes.
This commit simplifies current setup by allowing a single detector and recognizer source, rather than requiring separate entries for different input sizes.
…er model

	Adapts the C++ Recognition controllers to handle a single recognizer file that contains multiple inference methods.
	This commit addapts the C++ OCR and VerticalOCR controllers to handle a single detector model  with multiple inference methods
@msluszniak msluszniak added the feature PRs that implement a new feature label Jan 14, 2026
@msluszniak msluszniak linked an issue Jan 14, 2026 that may be closed by this pull request
@msluszniak
Copy link
Member

The cpp code and docs look good, someone needs to review the rest and test it

Copy link
Contributor

@IgorSwat IgorSwat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall a good code.

std::vector<types::DetectorBBox> generate(const cv::Mat &inputImage);
cv::Size getModelImageSize() const noexcept;
std::vector<types::DetectorBBox> generate(const cv::Mat &inputImage,
const int inputWidth);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary const.
Also use int32_t (or any other explicit int type you want) instead of an int.

auto inputShapes = getAllInputShapes(methodName);
if (inputShapes.empty()) {
throw std::runtime_error("Detector model has no input shape for method: " +
methodName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change the error message to something like "Detector model: invalid method name XYZ"


std::pair<std::vector<int32_t>, float>
Recognizer::generate(const cv::Mat &grayImage) {
Recognizer::generate(const cv::Mat &grayImage, int inputWidth) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int32_t please :)

if (shapes.empty()) {
throw std::runtime_error(
"Recognizer model has no input tensors for method " + method_name);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change the error message to something like "Recognizer model: invalid method name XYZ"

std::shared_ptr<react::CallInvoker> callInvoker);
std::pair<std::vector<int32_t>, float> generate(const cv::Mat &grayImage);
std::pair<std::vector<int32_t>, float> generate(const cv::Mat &grayImage,
int inputWidth);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int32_t please :)

modelMediumImageSize =
calculateImageSizeForWidth(constants::kMediumDetectorWidth);
modelLargeImageSize =
calculateImageSizeForWidth(constants::kLargeDetectorWidth);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend using either this-> (as in the line above, ex. this->modelSmallImageSize), or marking the class members with _ at the end (ex. modelSmallImageSize_).

By doing so it's clear which fields are class members, and which one are not.

cv::Size(modelImageSize.width / 2, modelImageSize.height / 2));
float txtThreshold = this->detectSingleCharacters
cv::Size(modelInputSize.width / 2, modelInputSize.height / 2));
float txtThreshold = detectSingleCharacters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this-> removed?


// if this is Narrow Detector, do not group boxes.
if (!this->detectSingleCharacters) {
if (!detectSingleCharacters) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this-> removed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature PRs that implement a new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Re-export OCR to use single weights

4 participants