PPOCRv5-Android

High-performance on-device OCR application for Android using LiteRT (formerly TensorFlow Lite). This project demonstrates real-time text detection and recognition using PP-OCRv5 models from PaddleOCR, optimized for mobile deployment.

This project originated as a byproduct while building a larger on-device AI project (also coming soon as open source). I polished it up and decided to share it with the community.

Screenshots

Camera OCR	Gallery OCR

Features

Real-time camera OCR with live text detection overlay
Gallery image OCR with interactive text selection
GPU-accelerated inference via OpenCL
FP16 quantized models for optimal performance/accuracy balance
Pure C++ image preprocessing (no OpenCV dependency)
Support for 18,383 characters (CJK, Latin, symbols...)
Rotated text detection with arbitrary angle support

Architecture

graph TB
    subgraph UI["Jetpack Compose UI"]
        CameraPreview[CameraX Preview]
        Overlay[Canvas Overlay]
        Benchmark[Benchmark Panel]
    end

    subgraph Kotlin["Kotlin Layer"]
        OcrEngine[OcrEngine<br/>JNI Bridge]
        Accelerator[Accelerator Selector]
        Analyzer[ImageAnalyzer<br/>CameraX Callback]
    end

    subgraph Native["Native C++ Layer"]
        LiteRT[LiteRT Runtime]
        Preprocess[Image Preprocess]
        Postprocess[Post-process<br/>DBNet + CTC]
    end

    UI --> Kotlin
    Kotlin -->|JNI| Native
    LiteRT --> GPU[GPU OpenCL]
    LiteRT --> CPU[CPU XNNPACK]

Requirements

Component	Version
Android SDK	28+ (Android 9.0)
NDK	29.0.14206865
CMake	3.22.1
Gradle	9.2.1
AGP	8.13.2
Kotlin	2.3.0 (K2 compiler)
Java	21

Build

# Debug build
./gradlew assembleDebug

# Release build (minified, optimized)
./gradlew assembleRelease

# Run unit tests
./gradlew test

# Run instrumented tests
./gradlew connectedAndroidTest

# Clean build artifacts
./gradlew clean

The release APK is located at app/build/outputs/apk/release/app-release.apk.

Project Structure

graph LR
    subgraph app/src/main
        subgraph java/me/fleey/ppocrv5
            MainActivity.kt
            data
            navigation
            ocr
            ui
            util
        end

        subgraph cpp
            CMakeLists.txt
            ppocrv5_jni.cpp
            ocr_engine.cpp
            text_detector.cpp
            text_recognizer.cpp
            image_utils.cpp
            postprocess.cpp
            litert_cc_sdk
        end

        subgraph assets/models
            ocr_det_fp16.tflite
            ocr_rec_fp16.tflite
            keys_v5.txt
        end

        res
    end

Models

Model	Architecture	Input Shape	Quantization
ocr_det_fp16.tflite	DBNet	[1, 640, 640, 3]	FP16
ocr_rec_fp16.tflite	SVTRv2	[1, 48, W, 3]	FP16
keys_v5.txt	Dictionary	-	-

Models are sourced from PaddleOCR and converted to TFLite format with FP16 quantization.

Model Conversion

The project includes a GitHub Actions workflow for automated model conversion. You can also run the conversion manually.

Conversion Pipeline

graph LR
    A[PaddlePaddle<br/>inference.json] -->|paddle2onnx| B[ONNX]
    B -->|onnxsim + graphsurgeon| C[ONNX Fixed]
    C -->|onnx2tf| D[TFLite FP16]

Automated Conversion (GitHub Actions)

The workflow triggers automatically when app/src/main/assets/models/manifest.json is modified, or can be triggered manually via workflow_dispatch.

Converted models are uploaded as artifacts with 90-day retention.

Manual Conversion

Prerequisites

pip install paddlepaddle paddle2onnx onnxsim huggingface_hub numpy psutil
pip install onnx==1.15.0 onnx_graphsurgeon
pip install tensorflow==2.15.0 tf_keras
pip install sng4onnx onnx2tf==1.22.3

Step 1: Download PP-OCRv5 Models

from huggingface_hub import snapshot_download

snapshot_download(repo_id="PaddlePaddle/PP-OCRv5_mobile_det", local_dir="models_tmp/det")
snapshot_download(repo_id="PaddlePaddle/PP-OCRv5_mobile_rec", local_dir="models_tmp/rec")

Step 2: Convert to ONNX

paddle2onnx --model_dir models_tmp/det \
  --model_filename inference.json \
  --params_filename inference.pdiparams \
  --save_file models_tmp/ocr_det_v5.onnx \
  --opset_version 14

paddle2onnx --model_dir models_tmp/rec \
  --model_filename inference.json \
  --params_filename inference.pdiparams \
  --save_file models_tmp/ocr_rec_v5.onnx \
  --opset_version 14

Step 3: Fix ONNX for GPU Compatibility

The ONNX models require modifications for LiteRT GPU delegate compatibility:

Simplify with static input shapes using onnxsim
Decompose HardSigmoid into primitive ops (Mul, Add, Max, Min)
Change Resize coordinate_transformation_mode from half_pixel to asymmetric

See .github/workflows/convert-models.yml for the complete fix script.

Step 4: Convert to TFLite FP16

# Detection model
onnx2tf -i models_tmp/ocr_det_v5_fixed.onnx -o models_tmp/converted_det \
  -b 1 -ois x:1,3,640,640 -n

# Recognition model
onnx2tf -i models_tmp/ocr_rec_v5_fixed.onnx -o models_tmp/converted_rec \
  -b 1 -ois x:1,3,48,320 -n

The FP16 TFLite files are generated in the output directories.

Step 5: Download Dictionary

wget https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/main/ppocr/utils/dict/ppocrv5_dict.txt \
  -O app/src/main/assets/models/keys_v5.txt

GPU Compatibility Notes

The conversion process ensures compatibility with LiteRT GPU delegate:

Static input shapes (required for GPU acceleration)
No unsupported operators (HardSigmoid decomposed)
Asymmetric resize mode (better GPU support)
FP16 quantization (optimal for mobile GPU)

OCR Pipeline

Detection (DBNet)

Resize input image to 640x640 with aspect ratio preservation
Normalize with ImageNet statistics:
- Mean: [0.485, 0.456, 0.406]
- Std: [0.229, 0.224, 0.225]
Run inference to produce probability map
Threshold binarization (threshold: 0.3)
Contour detection and minimum area rectangle extraction
Filter boxes by confidence (>0.5) and area

Recognition (SVTRv2)

Crop and rotate text region from original image
Resize to height 48, variable width (max 320)
Normalize:
- Mean: [0.5, 0.5, 0.5]
- Std: [0.5, 0.5, 0.5]
Run inference to produce character logits
CTC greedy decoding with dictionary lookup
Output recognized text with confidence score

Hardware Acceleration

The application uses LiteRT with GPU acceleration via OpenCL. The accelerator selection follows this priority:

Priority	Accelerator	Backend	Notes
1	GPU	OpenCL	Recommended for FP16 models
2	CPU	XNNPACK	Universal fallback

GPU acceleration is enabled by default and provides 2-4x speedup over CPU on most devices. The application automatically falls back to CPU if GPU initialization fails.

Native Layer

The native layer is implemented in C++17 using the LiteRT C++ API. Key components:

OcrEngine: Orchestrates detection and recognition pipeline
TextDetector: DBNet inference with post-processing
TextRecognizer: SVTRv2 inference with CTC decoding
ImageUtils: Resize, normalize, crop operations (NEON-optimized)
Postprocess: Contour detection, rotated rectangle extraction

Build optimizations for release:

LTO (Link-Time Optimization)
NEON SIMD vectorization
Function/data section garbage collection
Symbol visibility hiding

Permissions

Permission	Purpose	Required
android.permission.CAMERA	Camera preview and capture	Yes

Supported ABIs

Only arm64-v8a is supported. 32-bit architectures are excluded to reduce APK size and because GPU delegates require 64-bit.

License

Copyright 2025 Fleey

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Acknowledgments

PaddleOCR - PP-OCRv5 models
LiteRT - On-device ML runtime
Abseil - C++ common libraries

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.github		.github
app		app
gradle		gradle
images		images
.editorconfig		.editorconfig
.gitattributes		.gitattributes
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
README_zh.md		README_zh.md
build.gradle.kts		build.gradle.kts
gradle.properties		gradle.properties
gradlew		gradlew
gradlew.bat		gradlew.bat
settings.gradle.kts		settings.gradle.kts

License

iFleey/PPOCRv5-Android

Folders and files

Latest commit

History

Repository files navigation

PPOCRv5-Android

Screenshots

Features

Architecture

Requirements

Build

Project Structure

Models

Model Conversion

Conversion Pipeline

Automated Conversion (GitHub Actions)

Manual Conversion

Prerequisites

Step 1: Download PP-OCRv5 Models

Step 2: Convert to ONNX

Step 3: Fix ONNX for GPU Compatibility

Step 4: Convert to TFLite FP16

Step 5: Download Dictionary

GPU Compatibility Notes

OCR Pipeline

Detection (DBNet)

Recognition (SVTRv2)

Hardware Acceleration

Native Layer

Permissions

Supported ABIs

License

Acknowledgments

About

Topics

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages