Tesseract.js is a JavaScript library that extracts words in almost any language from images. (Demo)
Image Recognition
Video Real-time Recognition
Tesseract.js wraps an emscripten port of the Tesseract OCR Engine.
It works in the browser using webpack or plain script tags with a CDN, and on the server with Node.js.
After you install it, using it is as simple as:
import Tesseract from 'tesseract.js';

Tesseract.recognize(
  'https://tesseract.projectnaptha.com/img/eng_bw.png',
  'eng',
  { logger: m => console.log(m) }
).then(({ data: { text } }) => {
  console.log(text);
});
Or, in a more imperative style:
import { createWorker } from 'tesseract.js';

const worker = createWorker({
  logger: m => console.log(m)
});

(async () => {
  await worker.load();
  await worker.loadLanguage('eng');
  await worker.initialize('eng');
  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
  console.log(text);
  await worker.terminate();
})();
Check out the docs for a full explanation of the API.
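Both snippets above pass a logger callback that prints each message as-is. As a minimal sketch, the helper below turns such a message into a short progress line. It assumes each message carries a `status` string and a `progress` number between 0 and 1; `formatProgress` is a hypothetical name for illustration, not part of the Tesseract.js API.

```javascript
// Format one logger message as a human-readable progress line.
// Assumption: messages look like { status: string, progress: number in [0, 1] }.
function formatProgress(message) {
  const status = message && message.status ? message.status : 'working';
  const progress = Number(message && message.progress);
  if (!Number.isFinite(progress)) {
    // No usable progress value: report the status only.
    return status;
  }
  // Clamp so unexpected values never print as negative or above 100%.
  const percent = Math.round(Math.min(1, Math.max(0, progress)) * 100);
  return `${status}: ${percent}%`;
}

// Possible usage with the examples above:
//   Tesseract.recognize(image, 'eng', { logger: m => console.log(formatProgress(m)) });
```

Keeping the formatting in a separate function means the same callback can be reused with either `Tesseract.recognize` or `createWorker`.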
Major changes in v3
- Significantly faster performance
  - Runtime reduction of 84% for Browser and 96% for Node.js when recognizing the example images
- Upgrade to Tesseract v5.1.0 (using emscripten 3.1.18)
- Added SIMD-enabled build for supported devices
- Added support:
  - Node.js version 18
- Removed support:
  - ASM.js version, any other old versions of Tesseract.js-core (<3.0.0)
  - Node.js versions 10 and 12
Major changes in v2
- Upgrade to Tesseract v4.1.1 (using emscripten 1.39.10 upstream)
- Support multiple languages at the same time, e.g. eng+chi_tra for English and Traditional Chinese
- Supported image formats: png, jpg, bmp, pbm
- Support WebAssembly (falls back to ASM.js when the browser doesn't support it)
- Support TypeScript
Read a story about v2: Why I refactor tesseract.js v2?
Check the support/1.x branch for version 1.
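The v2 notes above mention combining languages with a plus sign, as in eng+chi_tra. A small sketch of building that string from a list of codes follows; `joinLanguages` is a hypothetical helper for illustration, not something the library provides.

```javascript
// Build a combined language string such as 'eng+chi_tra' from a list of codes.
// joinLanguages is an illustrative helper, not part of Tesseract.js.
function joinLanguages(codes) {
  if (!Array.isArray(codes)) {
    throw new TypeError('codes must be an array of language codes');
  }
  // Trim whitespace, drop empty entries, and remove duplicates (order preserved).
  const cleaned = codes.map(code => String(code).trim()).filter(code => code.length > 0);
  const unique = [...new Set(cleaned)];
  if (unique.length === 0) {
    throw new Error('at least one language code is required');
  }
  return unique.join('+');
}

console.log(joinLanguages(['eng', 'chi_tra']));
```

Presumably the resulting string would be passed wherever the earlier examples pass 'eng', for instance to `worker.loadLanguage` and `worker.initialize`.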
Installation
Tesseract.js works with a <script> tag via local copy or CDN, with webpack via npm, and on Node.js with npm/yarn.
CDN
<!-- v2 -->
<script src='https://unpkg.com/tesseract.js@v2.1.0/dist/tesseract.min.js'></script>
<!-- v1 -->
<script src='https://unpkg.com/tesseract.js@1.0.19/src/index.js'></script>
After including the script, the Tesseract variable will be globally available.
Node.js
Tesseract.js v3 requires Node.js v14 or higher.
# For v3
npm install tesseract.js
yarn add tesseract.js
# For v2
npm install tesseract.js@2
yarn add tesseract.js@2
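Since v3 requires Node.js v14 or higher, a project might check the running version before loading the library. The sketch below is one way to do that; `MIN_NODE_MAJOR`, `nodeMajor`, and `supportsTesseractV3` are hypothetical names, and the only platform API used is Node's `process.versions.node`.

```javascript
// Minimum Node.js major version stated for Tesseract.js v3 in this README.
const MIN_NODE_MAJOR = 14;

// Extract the major version number from strings like '18.12.0' or 'v18.12.0'.
function nodeMajor(version) {
  const major = Number.parseInt(String(version).replace(/^v/, '').split('.')[0], 10);
  if (Number.isNaN(major)) {
    throw new Error(`unrecognized Node.js version: ${version}`);
  }
  return major;
}

// Return true when the given (or currently running) Node.js version is new enough.
function supportsTesseractV3(version = process.versions.node) {
  return nodeMajor(version) >= MIN_NODE_MAJOR;
}

if (!supportsTesseractV3()) {
  console.warn('This Node.js version is older than v14; consider tesseract.js@2 instead.');
}
```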
Documentation
Use tesseract.js the way you like!
- Offline Version: https://github.com/jeromewu/tesseract.js-offline
- Electron Version: https://github.com/jeromewu/tesseract.js-electron
- Custom Traineddata: https://github.com/jeromewu/tesseract.js-custom-traineddata
- Chrome Extension #1: https://github.com/jeromewu/tesseract.js-chrome-extension
- Chrome Extension #2: https://github.com/fxnoob/image-to-text
- Firefox Extension: https://github.com/gnonio/korporize
- With Vue: https://github.com/jeromewu/tesseract.js-vue-app
- With Angular: https://github.com/jeromewu/tesseract.js-angular-app
- With React: https://github.com/jeromewu/tesseract.js-react-app
- Typescript: https://github.com/jeromewu/tesseract.js-typescript
- Video Real-time Recognition: https://github.com/jeromewu/tesseract.js-video
Contributing
Development
To run a development copy of Tesseract.js do the following:
# First we clone the repository
git clone https://github.com/naptha/tesseract.js.git
cd tesseract.js
# Then we install the dependencies
npm install
# And finally we start the development server
npm start
The development server will be available at http://localhost:3000/examples/browser/demo.html in your favorite browser.
It will automatically rebuild tesseract.dev.js and worker.dev.js when you change files in the src folder.
Online Setup with a Single Click
You can use Gitpod (a free, online, VS Code-like IDE) to contribute. With a single click, it launches a ready-to-code workspace with the build and start scripts already running. Within a few seconds it spins up the dev server, so you can start contributing right away.
Building Static Files
To build the compiled static files, run:
npm run build
This will output the files into the dist directory.
Contributors
Code Contributors
This project exists thanks to all the people who contribute. [Contribute].
Financial Contributors
Become a financial contributor and help us sustain our community. [Contribute]
Individuals
Organizations
Support this project with your organization. Your logo will show up here with a link to your website. [Contribute]