DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier.
NOTE: This documentation applies to the master branch of DeepSpeech only. If you're using a stable release, you must use the documentation for the corresponding version by using GitHub's branch switcher button above.
To install and use deepspeech, all you have to do is:
    # Create and activate a virtualenv
    virtualenv -p python3 $HOME/tmp/deepspeech-venv/
    source $HOME/tmp/deepspeech-venv/bin/activate

    # Install DeepSpeech
    pip3 install deepspeech

    # Download pre-trained English model and extract
    curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/deepspeech-0.6.0-models.tar.gz
    tar xvf deepspeech-0.6.0-models.tar.gz

    # Download example audio files
    curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/audio-0.6.0.tar.gz
    tar xvf audio-0.6.0.tar.gz

    # Transcribe an audio file
    deepspeech --model deepspeech-0.6.0-models/output_graph.pbmm --lm deepspeech-0.6.0-models/lm.binary --trie deepspeech-0.6.0-models/trie --audio audio/2830-3980-0043.wav
A pre-trained English model is available for use and can be downloaded using the instructions below. Currently, only 16-bit, 16 kHz, mono-channel WAVE audio files are supported in the Python client. A package with some example audio files is available for download in our release notes.
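Because the Python client only accepts 16-bit, 16 kHz, mono WAVE input, it can help to check a file before trying to transcribe it. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav_format` is hypothetical and is not part of the deepspeech package.

```python
import wave


def check_wav_format(path):
    """Return a list of format problems for a WAVE file.

    An empty list means the file is 16-bit, 16 kHz, mono, which is
    the format this README says the Python client supports.
    NOTE: this helper is illustrative and not part of deepspeech.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        sample_rate = wav.getframerate()   # samples per second

    if channels != 1:
        problems.append("expected 1 channel, found %d" % channels)
    if sample_width != 2:
        problems.append("expected 16-bit samples, found %d-bit" % (sample_width * 8))
    if sample_rate != 16000:
        problems.append("expected 16000 Hz, found %d Hz" % sample_rate)
    return problems
```

Calling `check_wav_format("audio/2830-3980-0043.wav")` on one of the example files should return an empty list if the file matches the supported format; any other result lists what would need to be converted first.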
Quicker inference can be performed using a supported NVIDIA GPU on Linux. See the release notes to find which GPUs are supported. To run deepspeech on a GPU, install the GPU-specific package:
    # Create and activate a virtualenv
    virtualenv -p python3 $HOME/tmp/deepspeech-gpu-venv/
    source $HOME/tmp/deepspeech-gpu-venv/bin/activate

    # Install DeepSpeech CUDA enabled package
    pip3 install deepspeech-gpu

    # Transcribe an audio file.
    deepspeech --model deepspeech-0.6.0-models/output_graph.pbmm --lm deepspeech-0.6.0-models/lm.binary --trie deepspeech-0.6.0-models/trie --audio audio/2830-3980-0043.wav
Please ensure you have the required CUDA dependencies.
See the output of deepspeech -h for more information on the use of deepspeech. (If you experience problems running deepspeech, please check the required runtime dependencies.)
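To transcribe several files from a script, the command line shown earlier can be wrapped in a few lines of Python. This is only a sketch under stated assumptions: the helper names `build_deepspeech_command` and `transcribe` are hypothetical, the model directory is assumed to have the layout of the extracted 0.6.0 archive, and the transcript is assumed to be written to standard output. Only the `--model`, `--lm`, `--trie` and `--audio` flags used above are relied on.

```python
import os
import subprocess


def build_deepspeech_command(model_dir, audio_path):
    """Build the argument list for the deepspeech command-line client.

    Assumes model_dir contains output_graph.pbmm, lm.binary and trie,
    as in the extracted deepspeech-0.6.0-models archive.
    """
    return [
        "deepspeech",
        "--model", os.path.join(model_dir, "output_graph.pbmm"),
        "--lm", os.path.join(model_dir, "lm.binary"),
        "--trie", os.path.join(model_dir, "trie"),
        "--audio", audio_path,
    ]


def transcribe(model_dir, audio_path):
    """Run the client and return whatever it printed to standard output.

    Assumption: the transcript is the text written to stdout.
    Raises subprocess.CalledProcessError if the client exits with an error.
    """
    result = subprocess.run(
        build_deepspeech_command(model_dir, audio_path),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()
```

With the virtualenv activated, `transcribe("deepspeech-0.6.0-models", "audio/2830-3980-0043.wav")` would run the same command as above. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.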
Table of Contents
- Using a Pre-trained Model
- Trying out DeepSpeech with examples
- Training your own Model
  - Prerequisites for training a model
  - Getting the training code
  - Installing Python dependencies
  - Common Voice training data
  - Training a model
  - Exporting a model for inference
  - Exporting a model for TFLite
  - Making a mmap-able model for inference
  - Continuing training from a release model
  - Training with Augmentation
- Contribution guidelines
- Contact/Getting Help