How can I get the regular English Tesseract language package for Alpine Linux?

I am building a Docker image based on Alpine that depends on Tesseract for OCR. The Tesseract site lists two flavors of English: eng (modern English) and enm (Middle English). However, I am having trouble getting the eng version installed on Alpine.

My Dockerfile has the following:

FROM eclipse-temurin:17-jre-alpine as tesseract-master

RUN apk update && apk add tesseract-ocr
RUN apk update && apk add tesseract-ocr-data-eng

This fails to find the eng language package. During the build, the repository is listed, and it clearly does not contain an eng package.

I am able to install the enm package, but I expect it to cause issues since it is meant for Middle English.

Has anyone had success installing the eng package on Alpine?

Solution:

If you look at the contents of one of those language packages, for example tesseract-ocr-data-enm, you will quickly realise that it contains only one file:

  • /usr/share/tessdata/enm.traineddata

Source: https://pkgs.alpinelinux.org/contents?name=tesseract-ocr-data-enm&branch=v3.17&arch=aarch64

Now, working backwards, you can search for the package that contains the file /usr/share/tessdata/eng.traineddata. Unsurprisingly, it is the default package: tesseract-ocr.

Source: https://pkgs.alpinelinux.org/contents?file=eng.traineddata&branch=v3.17&arch=aarch64

So, your Dockerfile should simply be:

FROM eclipse-temurin:17-jre-alpine as tesseract-master

RUN apk add --no-cache \
      tesseract-ocr
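
To confirm that the English data really is in the image, a small check for the file mentioned above can help. This is only a sketch: has_traineddata is a hypothetical helper name, and the default directory is assumed from the package listing linked above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract or apk): succeeds when
# <lang>.traineddata exists in the tessdata directory.
# Assumption: the default directory is /usr/share/tessdata, the path shown in
# the Alpine package contents listing; pass a second argument to override it.
has_traineddata() {
    lang="$1"
    dir="${2:-/usr/share/tessdata}"
    [ -f "$dir/$lang.traineddata" ]
}
```

Inside the container, has_traineddata eng should then succeed. Running tesseract --list-langs is another way to see which languages Tesseract itself detects.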