mirror of
https://github.com/tesseract-ocr/tesseract.git
synced 2025-04-20 03:18:44 +08:00
Unit Testing for Tesseract
Requirements
Files and structure
├── langdata_lstm
│   ├── common.punc
│   ├── common.unicharambigs
│   ├── desired_bigrams.txt
│   ├── eng
│   │   ├── desired_characters
│   │   ├── eng.config
│   │   ├── eng.numbers
│   │   ├── eng.punc
│   │   ├── eng.singles_text
│   │   ├── eng.training_text
│   │   ├── eng.unicharambigs
│   │   ├── eng.wordlist
│   │   └── okfonts.txt
│   ├── extended
│   │   └── extended.config
│   ├── extendedhin
│   │   └── extendedhin.config
│   ├── font_properties
│   ├── forbidden_characters_default
│   ├── hin
│   │   ├── hin.config
│   │   ├── hin.numbers
│   │   ├── hin.punc
│   │   └── hin.wordlist
│   ├── kan
│   │   └── kan.config
│   ├── kor
│   │   └── kor.config
│   ├── osd
│   │   └── osd.unicharset
│   └── radical-stroke.txt
├── tessdata
│   ├── ara.traineddata
│   ├── chi_tra.traineddata
│   ├── eng.traineddata
│   ├── heb.traineddata
│   ├── hin.traineddata
│   ├── jpn.traineddata
│   ├── kmr.traineddata
│   ├── osd.traineddata
│   └── vie.traineddata
├── tessdata_best
│   ├── eng.traineddata
│   ├── fra.traineddata
│   ├── kmr.traineddata
│   └── osd.traineddata
├── tessdata_fast
│   ├── eng.traineddata
│   ├── kmr.traineddata
│   ├── osd.traineddata
│   └── script
│       └── Latin.traineddata
└── tesseract
    ...
    ├── test
    ├── unittest
    │   └── third_party/googletest
    └── VERSION
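The layout above can be checked before building. The following is a minimal sketch, not part of the Tesseract tooling: the function name missing_dirs is hypothetical, and it assumes the four data directories sit next to the tesseract folder as the tree shows.

```shell
#!/bin/sh
# Hypothetical helper: print every expected data directory that is
# missing under the given parent directory (one name per line).
missing_dirs() {
    parent=$1
    for d in langdata_lstm tessdata tessdata_best tessdata_fast; do
        [ -d "$parent/$d" ] || echo "$d"
    done
}

# Run from inside the tesseract folder, so the parent is one level up.
missing=$(missing_dirs ..)
if [ -n "$missing" ]; then
    echo "missing directories next to the tesseract folder:"
    echo "$missing"
else
    echo "data directory layout looks complete"
fi
```

The check only looks at the top-level directories; it does not verify the individual files inside them.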
Fonts
- Microsoft fonts: arialbi.ttf, times.ttf, verdana.ttf - installation guide
- ae_Arab.ttf
- dejavu-fonts: DejaVuSans-ExtraLight.ttf
- Lohit-Hindi.ttf
- UnBatang.ttf
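To confirm that the fonts listed above are in place, a small check along these lines may help. It is only a sketch: the function name missing_fonts is hypothetical, and it assumes the fonts are expected in test/testing, which is where the run instructions copy them.

```shell
#!/bin/sh
# Hypothetical helper: print each font from the list above that is
# not present in the given directory (one file name per line).
missing_fonts() {
    dir=$1
    for f in arialbi.ttf times.ttf verdana.ttf ae_Arab.ttf \
             DejaVuSans-ExtraLight.ttf Lohit-Hindi.ttf UnBatang.ttf; do
        [ -f "$dir/$f" ] || echo "$f"
    done
}

# Run from inside the tesseract folder.
missing=$(missing_fonts test/testing)
if [ -n "$missing" ]; then
    echo "fonts not found in test/testing:"
    echo "$missing"
else
    echo "all listed fonts are present"
fi
```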
Run tests
To run the tests, run the following commands from the tesseract folder:
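# generate the configure script
autoreconf -fiv
# fetch the googletest submodule used by the unit tests
git submodule update --init
# generate the Makefiles (skip if the tree is already configured)
./configure
# clone the repository with the test data and fonts
git clone https://github.com/egorpugin/tessdata tessdata_unittest --depth 1
# copy the fonts to test/testing
cp tessdata_unittest/fonts/* test/testing/
# move the data directories to the parent folder
mv tessdata_unittest/* ../
# point Tesseract at the tessdata directory
export TESSDATA_PREFIX=/prefix/to/path/to/tessdata
# build and run the unit tests
make check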
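TESSDATA_PREFIX has to name a directory that actually contains the traineddata files. As a hedged sketch, the snippet below resolves an absolute path and sets the variable only if eng.traineddata is found. The function name resolve_tessdata is hypothetical, and ../tessdata is an assumption based on the mv step above; adjust it if the data lives elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: print the absolute path of a tessdata directory,
# or fail if it does not contain eng.traineddata.
resolve_tessdata() {
    dir=$1
    [ -f "$dir/eng.traineddata" ] || return 1
    (cd "$dir" && pwd)
}

# Assumption: the data was moved to the parent folder, so tessdata is
# one level above the tesseract folder.
if prefix=$(resolve_tessdata ../tessdata); then
    export TESSDATA_PREFIX="$prefix"
    echo "TESSDATA_PREFIX=$TESSDATA_PREFIX"
else
    echo "../tessdata does not contain eng.traineddata" >&2
fi
```

Run this in the same shell session before make check so that the exported variable is visible to the tests.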