This project offers a Rust implementation of ddddocr and ocr_api_server. It provides a binary for CAPTCHA recognition that doesn't rely on the OpenCV library, ensuring cross-platform compatibility. The goal is to deliver a simple OCR API server that's easy to deploy.
This is an easy-to-use, general-purpose CAPTCHA recognition library written in Rust. We encourage users to report bugs and suggest new features.
- Introduction
- Table of Contents
- Supported Environments
- Installation Steps
- Usage Documentation
- OCR API Server Example
- Troubleshooting
| System | CPU | GPU | Notes |
| --- | --- | --- | --- |
| Windows 64-bit | ✔ | ? | Some Windows versions require the VC runtime library. |
| Windows 32-bit | ✔ | ? | Static linking is unsupported; some versions need the VC runtime library. |
| Linux 64 / ARM64 | ✔ | ? | May require upgrading the glibc version; see the glibc upgrade guide. |
| Linux 32-bit | ✘ | ? | |
| macOS X64 | ✔ | ? | For M1/M2/M3 chips, refer to issue #67. |
- The `lib.rs` file implements `ddddocr`.
- The `main.rs` file implements `ocr_api_server`.
- The `model` directory contains models and character sets.
- To include this library in your project, add:

```toml
ddddocr = { git = "https://github.com/86maid/ddddocr.git", branch = "master" }
```
- To enable CUDA support:

```toml
ddddocr = { git = "https://github.com/86maid/ddddocr.git", branch = "master", features = ["cuda"] }
```
- The project supports both static and dynamic linking. By default, it uses static linking and will automatically download the necessary libraries during the build process, so ensure your proxy settings are configured correctly. Note that the CUDA feature does not support static linking and will download dynamic libraries as needed.
- If you prefer not to build from source, precompiled binaries are available in the releases section.
- You can also use the configured GitHub Actions for building the project.
- Designed to recognize single-line text, such as common alphanumeric CAPTCHAs. The project supports Chinese, English (with options for case sensitivity), numbers, and certain special characters.
- Example:
```rust
let image = std::fs::read("target.png").unwrap();
let mut ocr = ddddocr::ddddocr_classification().unwrap();
let res = ocr.classification(image, false).unwrap();
println!("{:?}", res);
```
- To use the previous model:
```rust
let image = std::fs::read("target.png").unwrap();
let mut ocr = ddddocr::ddddocr_classification_old().unwrap();
let res = ocr.classification(image, false).unwrap();
println!("{:?}", res);
```
- For images in transparent black PNG format, use the `png_fix` parameter: `classification(image, true);`
- Example:
```rust
let image = std::fs::read("target.png").unwrap();
let mut det = ddddocr::ddddocr_detection().unwrap();
let res = det.detection(image).unwrap();
println!("{:?}", res);
```
Sample images are available to illustrate object detection capabilities.
The small slider is a separate PNG image with a transparent background, as shown below:
Then, the background contains the slot for the small slider, as shown below:
- Example:
```rust
let target_bytes = std::fs::read("target.png").unwrap();
let background_bytes = std::fs::read("background.png").unwrap();
let res = ddddocr::slide_match(target_bytes, background_bytes).unwrap();
println!("{:?}", res);
```
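Slider matching conceptually works like template matching: try every offset of the small piece against the background and keep the offset where they agree best. Below is a toy, stdlib-only 1-D sketch of that idea; `best_offset` and the brightness profiles are made up for illustration, and the real `slide_match` operates on 2-D image data with its own algorithm.

```rust
// Illustration only: 1-D template matching by minimizing the sum of
// squared differences. The real slide_match operates on 2-D images,
// but the principle is the same.
fn best_offset(signal: &[i32], template: &[i32]) -> usize {
    assert!(template.len() <= signal.len());
    let mut best = (0, i64::MAX);
    for off in 0..=signal.len() - template.len() {
        let score: i64 = template
            .iter()
            .zip(&signal[off..])
            .map(|(t, s)| {
                let d = i64::from(t - s);
                d * d
            })
            .sum();
        if score < best.1 {
            best = (off, score);
        }
    }
    best.0
}

fn main() {
    // Background brightness profile with a slot-shaped dip at offset 5.
    let background = [9, 9, 9, 9, 9, 2, 1, 2, 9, 9, 9, 9];
    // The slider piece's profile resembles the slot.
    let piece = [2, 1, 2];
    println!("best offset: {}", best_offset(&background, &piece)); // prints 5
}
```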
One image contains the slot (as shown below):
Another image is the original (as shown below):
- Example:
```rust
let target_bytes = std::fs::read("target.png").unwrap();
let background_bytes = std::fs::read("background.png").unwrap();
let res = ddddocr::slide_comparison(target_bytes, background_bytes).unwrap();
println!("{:?}", res);
```
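Since one image is the original and the other contains the slot, the core idea is a pixel difference: scan both images and report where the difference first exceeds a threshold. A stdlib-only sketch over a single grayscale row follows; `first_difference` and the data are illustrative, not the crate's implementation.

```rust
// Illustration only: find where two otherwise-identical images differ.
// A single grayscale "row" stands in for a full image; the slot is the
// first column whose brightness deviates beyond a threshold.
fn first_difference(with_slot: &[u8], original: &[u8], threshold: u8) -> Option<usize> {
    with_slot
        .iter()
        .zip(original)
        .position(|(a, b)| a.abs_diff(*b) > threshold)
}

fn main() {
    let original = [200u8, 200, 200, 200, 200, 200, 200, 200];
    let with_slot = [200u8, 200, 200, 40, 40, 40, 200, 200]; // slot darkens columns 3..6
    println!("{:?}", first_difference(&with_slot, &original, 30)); // prints Some(3)
}
```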
- To provide more flexible control over OCR results, the project supports setting character ranges.
- Character Range Parameters:

| Parameter value | Description |
| --- | --- |
| 0 | Digits 0-9 |
| 1 | Lowercase letters a-z |
| 2 | Uppercase letters A-Z |
| 3 | Lowercase and uppercase letters a-z, A-Z |
| 4 | Lowercase letters a-z and digits 0-9 |
| 5 | Uppercase letters A-Z and digits 0-9 |
| 6 | Lowercase and uppercase letters a-z, A-Z, and digits 0-9 |
| 7 | Default character set: lowercase a-z, uppercase A-Z, and digits 0-9 |
- For custom character sets, provide a string without spaces, where each character represents a candidate, e.g., `"0123456789+-x/="`.
Example usage:
```rust
let image = std::fs::read("image.png").unwrap();
let mut ocr = ddddocr::ddddocr_classification().unwrap();

// The number 3 corresponds to the enum CharsetRange::LowercaseUppercase;
// there's no need to specify the enum explicitly.
// ocr.set_ranges(3);

// Custom character set
ocr.set_ranges("0123456789+-x/=");

let result = ocr.classification_probability(image, false).unwrap();

// Note: the output might be extensive; be cautious to avoid performance issues.
println!("Probabilities: {}", result.json());
println!("Recognition result: {}", result.get_text());
```
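Probability recognition yields, for each character position, one probability per candidate in the active charset, and the recognized text is conceptually the most probable candidate at each position. The sketch below shows that argmax step with stdlib types only; `argmax_text` and the probabilities are made up for illustration and are not the crate's actual result types.

```rust
// Illustration only: decode per-position probability rows into text by
// taking the most probable charset candidate at each position.
fn argmax_text(charset: &[char], probabilities: &[Vec<f64>]) -> String {
    probabilities
        .iter()
        .map(|row| {
            // Index of the highest-probability candidate for this position.
            let best = row
                .iter()
                .enumerate()
                .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
                .map(|(i, _)| i)
                .unwrap();
            charset[best]
        })
        .collect()
}

fn main() {
    // Custom charset from the example above.
    let charset: Vec<char> = "0123456789+-x/=".chars().collect();
    // Made-up probabilities for a two-character result: '7' then '+'.
    let mut row1 = vec![0.01; charset.len()];
    row1[7] = 0.9;
    let mut row2 = vec![0.01; charset.len()];
    row2[10] = 0.8;
    println!("{}", argmax_text(&charset, &[row1, row2])); // prints "7+"
}
```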
The project supports importing custom models trained using dddd_trainer.
Example usage:
```rust
use ddddocr::*;

let mut ocr = Ddddocr::with_model_charset(
    "myproject_0.984375_139_13000_2022-02-26-15-34-13.onnx",
    "charsets.json",
)
.unwrap();
let image_bytes = std::fs::read("888e28774f815b01e871d474e5c84ff2.jpg").unwrap();
let res = ocr.classification(&image_bytes).unwrap();
println!("{:?}", res);
```
The `ocr_api_server` provides a simple API server for OCR tasks.
```
Usage: ddddocr.exe [OPTIONS]

Options:
  -a, --address <ADDRESS>
          Listening address [default: 127.0.0.1]
  -p, --port <PORT>
          Listening port [default: 9898]
  -f, --full
          Enable all options
      --jsonp
          Enable cross-origin requests; requires a query parameter specifying the callback
          function name; cannot use file (multipart) to pass parameters,
          e.g., http://127.0.0.1:9898/ocr/b64/text?callback=handle&image=xxx
      --ocr
          Enable content recognition; supports both new and old models
      --old
          Enable old model content recognition; supports both new and old models
      --det
          Enable object detection
      --ocr-probability <OCR_PROBABILITY>
          Enable content probability recognition; supports both new and old models; can only
          use official models; if the parameter is 0 to 7, it corresponds to the built-in
          character sets; if the parameter is an empty string, it indicates the default
          character set; other parameters indicate custom character sets, e.g., "0123456789+-x/="
      --old-probability <OLD_PROBABILITY>
          Enable old model content probability recognition; supports both new and old models;
          can only use official models; if the parameter is 0 to 7, it corresponds to the
          built-in character sets; if the parameter is an empty string, it indicates the
          default character set; other parameters indicate custom character sets,
          e.g., "0123456789+-x/="
      --ocr-path <OCR_PATH>
          Path to content recognition model and character set; uses hash value to determine
          if it's a custom model; using a custom model will disable the old option; path
          model/common corresponds to model/common.onnx and character set model/common.json
          [default: model/common]
      --det-path <DET_PATH>
          Path to object detection model [default: model/common_det.onnx]
      --slide-match
          Enable slider recognition
      --simple-slide-match
          Enable simple slider recognition
      --slide-compare
          Enable slot recognition
  -h, --help
          Print help
```
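To make the `--jsonp` option concrete: instead of a bare JSON body, the client receives the body wrapped in the callback name passed via the query string. A minimal sketch of that wrapping, where `jsonp_wrap` is a hypothetical helper (not part of `ocr_api_server`) and the JSON shape follows the `json` return type:

```rust
// Illustration only: with --jsonp enabled, the server wraps the usual JSON
// body in the callback name supplied via the query string (?callback=handle).
fn jsonp_wrap(callback: &str, json_body: &str) -> String {
    format!("{}({})", callback, json_body)
}

fn main() {
    // Success shape of the `json` ret_type: {"status": 200, "result": ...}
    let body = r#"{"status": 200, "result": "abcd"}"#;
    println!("{}", jsonp_wrap("handle", body));
    // prints: handle({"status": 200, "result": "abcd"})
}
```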
To test if the server is running, send a `GET` or `POST` request to `http://{host}:{port}/ping`. A successful response returns `pong`.
```
http://{host}:{port}/{opt}/{img_type}/{ret_type}

opt:
  ocr              Content recognition
  old              Old model content recognition
  det              Object detection
  ocr_probability  Content probability recognition
  old_probability  Old model content probability recognition
  match            Slider matching
  simple_match     Simple slider matching
  compare          Slot matching

img_type:
  file  File, i.e., multipart/form-data
  b64   Base64, i.e., {"a": encode(bytes), "b": encode(bytes)}

ret_type:
  json  JSON; success: {"status": 200, "result": object},
        failure: {"status": 404, "msg": "failure reason"}
  text  Text; failure returns an empty string
```
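The `b64` image type expects standard base64 of the raw image bytes. As a self-contained illustration of what `encode(bytes)` produces, here is a minimal RFC 4648 encoder; real code should use a maintained base64 crate instead.

```rust
// Illustration only: minimal standard-alphabet base64 (RFC 4648), showing
// how raw image bytes become the `encode(bytes)` value of the b64 body.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

fn b64_encode(data: &[u8]) -> String {
    let mut out = String::new();
    for chunk in data.chunks(3) {
        // Pack up to 3 bytes into a 24-bit group, zero-padded on the right.
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = u32::from(b[0]) << 16 | u32::from(b[1]) << 8 | u32::from(b[2]);
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        // '=' padding when the final group has fewer than 3 input bytes.
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

fn main() {
    // A b64-style request body: {"image": "<base64 of the bytes>"}
    let image_bytes = b"fake image bytes";
    println!("{{\"image\": \"{}\"}}", b64_encode(image_bytes));
}
```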
```python
import requests
import base64

host = "http://127.0.0.1:9898"
file = open('./image/3.png', 'rb').read()

# Test JSONP; can only use b64, not file
api_url = f"{host}/ocr/b64/text"
resp = requests.get(api_url, params={
    "callback": "handle",
    "image": base64.b64encode(file).decode(),
})
print(f"jsonp, api_url={api_url}, resp.text={resp.text}")

# Test OCR
api_url = f"{host}/ocr/file/text"
resp = requests.post(api_url, files={'image': file})
print(f"api_url={api_url}, resp.text={resp.text}")
```
- Ensure both CUDA and cuDNN are properly installed.
- For CUDA 12, cuDNN 9.x is required.
- For CUDA 11, cuDNN 8.x is required.
- It's uncertain whether CUDA 10 is supported.
- **Static Linking**: By default, the project uses static linking and will automatically download the necessary libraries during the build process. Ensure your proxy settings are correctly configured. Note that the `cuda` feature does not support static linking and will download dynamic libraries automatically.
- **Specifying Library Paths**: To specify the path for static libraries, set the `ORT_LIB_LOCATION` environment variable. Once set, the build process will not automatically download the libraries. For example, if your library path is `onnxruntime\build\Windows\Release\Release\onnxruntime.lib`, set `ORT_LIB_LOCATION` to `onnxruntime\build\Windows\Release`.
- **Automatic Library Downloads**: The `download-binaries` feature is enabled by default, which automatically downloads the necessary libraries. These libraries are stored in `C:\Users\<YourUsername>\AppData\ort.pyke.io`.
- **Dynamic Linking**: To enable dynamic linking, use the following configuration:

```toml
ddddocr = { git = "https://github.com/86maid/ddddocr.git", branch = "master", features = ["load-dynamic"] }
```
- After enabling the `load-dynamic` feature, you can specify the path to the `onnxruntime` dynamic library using `Ddddocr::set_onnxruntime_path`.
- **Manual Library Management**: With the `load-dynamic` feature enabled, the build process will not automatically download the `onnxruntime` library. You must manually download it and place it in the program's runtime directory (or a system library directory); in that case there is no need to call `Ddddocr::set_onnxruntime_path`.
- **Windows Static Linking Issues**: If you encounter static linking failures on Windows, consider installing Visual Studio 2022.
- **Linux x86-64 Static Linking Issues**: For static linking failures on Linux x86-64, install `gcc11` and `g++11`. Ensure your Ubuntu version is 20.04 or higher.
- **Linux ARM64 Static Linking Issues**: On Linux ARM64, static linking may require `glibc` version 2.35 or higher (Ubuntu 22.04 or above).
- **macOS Static Linking Issues**: For macOS, static linking requires macOS version 10.15 or higher.
- **CUDA Testing Issues**: When running `cargo test` with CUDA enabled, you might encounter a panic with exit code `0xc000007b`. This occurs because the automatically generated dynamic library is located in the `target/debug` directory. Manually copy it to the `target/debug/deps` directory, as CUDA currently does not support static linking.
- **Dynamic Linking Requirements**: Dynamic linking requires `onnxruntime` version 1.18.x.
For more detailed troubleshooting and information, visit ort.pyke.io.