-
Notifications
You must be signed in to change notification settings - Fork 57
feat: Change OCR to multi-method model #716
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Refactor of TypeScript interfaces and hooks for OCR and VerticalOCR to support models that expose multiple inference methods for different input sizes. This commit simplifies current setup by allowing a single detector and recognizer source, rather than requiring separate entries for different input sizes.
…er model Adapts the C++ Recognition controllers to handle a single recognizer file that contains multiple inference methods.
This commit addapts the C++ OCR and VerticalOCR controllers to handle a single detector model with multiple inference methods
|
The cpp code and docs look good, someone needs to review the rest and test it |
IgorSwat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall a good code.
| std::vector<types::DetectorBBox> generate(const cv::Mat &inputImage); | ||
| cv::Size getModelImageSize() const noexcept; | ||
| std::vector<types::DetectorBBox> generate(const cv::Mat &inputImage, | ||
| const int inputWidth); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary const.
Also use int32_t (or any other explicit int type you want) instead of an int.
| auto inputShapes = getAllInputShapes(methodName); | ||
| if (inputShapes.empty()) { | ||
| throw std::runtime_error("Detector model has no input shape for method: " + | ||
| methodName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would change the error message to something like "Detector model: invalid method name XYZ"
|
|
||
| std::pair<std::vector<int32_t>, float> | ||
| Recognizer::generate(const cv::Mat &grayImage) { | ||
| Recognizer::generate(const cv::Mat &grayImage, int inputWidth) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int32_t please :)
| if (shapes.empty()) { | ||
| throw std::runtime_error( | ||
| "Recognizer model has no input tensors for method " + method_name); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would change the error message to something like "Recognizer model: invalid method name XYZ"
| std::shared_ptr<react::CallInvoker> callInvoker); | ||
| std::pair<std::vector<int32_t>, float> generate(const cv::Mat &grayImage); | ||
| std::pair<std::vector<int32_t>, float> generate(const cv::Mat &grayImage, | ||
| int inputWidth); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int32_t please :)
| modelMediumImageSize = | ||
| calculateImageSizeForWidth(constants::kMediumDetectorWidth); | ||
| modelLargeImageSize = | ||
| calculateImageSizeForWidth(constants::kLargeDetectorWidth); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend using either this-> (as in the line above, ex. this->modelSmallImageSize), or marking the class members with _ at the end (ex. modelSmallImageSize_).
By doing so it's clear which fields are class members, and which one are not.
| cv::Size(modelImageSize.width / 2, modelImageSize.height / 2)); | ||
| float txtThreshold = this->detectSingleCharacters | ||
| cv::Size(modelInputSize.width / 2, modelInputSize.height / 2)); | ||
| float txtThreshold = detectSingleCharacters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this-> removed?
|
|
||
| // if this is Narrow Detector, do not group boxes. | ||
| if (!this->detectSingleCharacters) { | ||
| if (!detectSingleCharacters) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this-> removed?
Description
This PR refactors the OCR and VerticalOCR to improve efficiency.
Previously, our implementation relied on multiple instances of the same detector and recognizer models, each handling a different input size (3x Detector, 4x Recognizer instances). This approach was resource-intensive.
This update introduces a more streamlined approach by using a single detector and a single recognizer model, each with multiple forward_ methods (e.g., forward_800, forward_320). These methods handle different input widths within the same model instance, significantly reducing the number of loaded models and simplifying the API.
This change is a breaking change as it modifies the arguments for useOCR, useVerticalOCR, OCRModule, and VerticalOCRModule
Introduces a breaking change?
Type of change
Tested on
Testing instructions
Manual sanity checks.
Screenshots
Related issues
#692
Checklist
Additional notes