Why Does Pytesseract Raise an Error With the Arabic Language
Solution 1:
I suggest installing the latest version of Tesseract together with the proper language model (ara.traineddata for Arabic):
For Windows 10:
Install tesseract-ocr-w64-setup-v5.0.0-alpha.20200328.exe (64 bit).
To validate the installation, run the following in PowerShell or a cmd terminal:
tesseract -v
It will output something like this: tesseract v5.0.0-alpha.20200328
For Mac OS:
brew install tesseract
To validate the installation, run the following in the terminal:
tesseract -v
It will output something like this, listing the Tesseract version followed by the installed image libraries:
tesseract 4.1.1
 leptonica-1.80.0
  libgif 5.2.1 : libjpeg 9d : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.1.0 : libopenjp2 2.3.1
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
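The same version check can be scripted from Python, which is handy when the OCR code runs on a machine you cannot inspect by hand. This is only a minimal sketch: the helper names parse_tesseract_version and installed_tesseract_version are made up for illustration, and it assumes the tesseract executable is on PATH.

```python
import re
import shutil
import subprocess


def parse_tesseract_version(output):
    """Extract the version from the first line of `tesseract -v` output.

    Hypothetical helper: accepts both "tesseract 4.1.1" and
    "tesseract v5.0.0-alpha.20200328" style first lines.
    """
    stripped = output.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    match = re.match(r"tesseract\s+v?(\S+)", first_line)
    return match.group(1) if match else None


def installed_tesseract_version():
    """Run `tesseract -v` and return its version, or None if it is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run([exe, "-v"], capture_output=True, text=True)
    # Some builds print the version banner to stderr instead of stdout.
    return parse_tesseract_version(result.stdout or result.stderr)
```

If installed_tesseract_version() returns None, Tesseract is either not installed or not on PATH, and pytesseract will fail before the Arabic model is even looked up.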
If you are not sure about the path, simply copy the ara.traineddata file into the same folder as your Python .py file:
import pytesseract
from PIL import Image
import os
os.environ["TESSDATA_PREFIX"] = ""  # Left empty because ara.traineddata was copied into the current directory
print(os.getenv("TESSDATA_PREFIX"))
# ara.traineddata must be in the same directory as this Python script
print(pytesseract.image_to_string(Image.open('cropped.png'), lang="ara"))
For Linux/Ubuntu OS:
sudo apt-get install tesseract-ocr
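The validation command and the Python code are the same as for Mac OS.
Also make sure Tesseract can find the language file, i.e. that TESSDATA_PREFIX (or the current directory) really contains ara.traineddata.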
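One way to check the path before calling Tesseract is to look for the file yourself. The sketch below is not part of pytesseract; find_traineddata and default_search_dirs are hypothetical helpers. It assumes the model is either in the directory named by TESSDATA_PREFIX, in a tessdata subfolder of it (the older layout), or in the current directory.

```python
import os
from pathlib import Path


def find_traineddata(lang, search_dirs):
    """Return the first existing <lang>.traineddata in search_dirs, or None."""
    for directory in search_dirs:
        candidate = Path(directory) / f"{lang}.traineddata"
        if candidate.is_file():
            return candidate
    return None


def default_search_dirs():
    """Candidate directories (an assumption, not Tesseract's exact lookup order)."""
    dirs = []
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix:
        dirs.append(prefix)
        # Older setups expect TESSDATA_PREFIX to be the parent of "tessdata".
        dirs.append(os.path.join(prefix, "tessdata"))
    dirs.append(os.getcwd())
    return dirs
```

Calling find_traineddata("ara", default_search_dirs()) returns the path of the Arabic model if one of these directories contains it, and None otherwise, which usually means the file is missing or TESSDATA_PREFIX points to the wrong place.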
This code works fine if the ara.traineddata file is downloaded successfully:
import pytesseract
from PIL import Image
print(pytesseract.image_to_string(Image.open('cropped.png'), lang="ara"))
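To get a clearer message than Tesseract's own error when the Arabic model is missing, the call can be wrapped in a small check. This is a sketch under assumptions: it relies on pytesseract.get_languages, which is only available in newer pytesseract releases, and the names missing_languages and image_to_text are invented for this example.

```python
def missing_languages(requested, available):
    """Return the codes in `requested` (e.g. "ara+eng") that are not in `available`."""
    return [code for code in requested.split("+") if code not in available]


def image_to_text(image_path, lang="ara"):
    """OCR an image, failing early if a requested language model is not installed."""
    # Imported here so missing_languages can be used without pytesseract installed.
    import pytesseract
    from PIL import Image

    available = pytesseract.get_languages(config="")
    missing = missing_languages(lang, available)
    if missing:
        raise RuntimeError(
            "Missing traineddata for: " + ", ".join(missing)
            + ". Installed languages: " + ", ".join(available)
        )
    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)
```

With this in place, print(image_to_text('cropped.png')) either prints the recognized Arabic text or raises an error that names the missing language file.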
You can follow this tutorial for details. Here is the demo output of this tutorial which uses Arabic language as well.