Whisper AI is an open-source speech-to-text model from OpenAI that transcribes audio in many languages and can translate speech into English. This guide walks you through the installation step by step.
Step 1: Install Python
Whisper AI requires Python to run.
- Download Python from python.org.
- Install a Python version from 3.8 to 3.11, the range Whisper is expected to support.
- During installation, check the box to add Python to PATH.
- Verify installation by running (on macOS and Linux the command may be python3):
python --version
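The same check can be scripted. The sketch below is a minimal example, assuming the 3.8 to 3.11 range stated above; is_supported_python is a hypothetical helper name and not part of Whisper.

```python
import sys

# Supported range taken from the guide above: Python 3.8 through 3.11.
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 11)


def is_supported_python(version_info=sys.version_info):
    """Return True if the (major, minor) pair falls inside the supported range."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


print("Running Python", ".".join(str(part) for part in sys.version_info[:3]))
print("Within the range this guide lists:", is_supported_python())
```

Run it with the same interpreter you plan to use for Whisper, since a system can have several Python versions installed side by side.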
Step 2: Install PyTorch
Whisper AI runs its neural network on PyTorch, so PyTorch must be installed first.
- Visit pytorch.org and follow the instructions for your system.
- Example installation command (Whisper itself only needs torch; torchvision and torchaudio are optional here):
pip install torch torchvision torchaudio
- Verify installation:
python -c "import torch; print(torch.__version__)"
Step 3: Install Chocolatey (Windows Users Only)
Chocolatey is a package manager for Windows. It is used in the next step to install FFmpeg.
- Open PowerShell as an administrator and run:
Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
- Verify installation:
choco --version
Step 4: Install FFmpeg
Whisper AI calls FFmpeg to decode audio files, so the ffmpeg command must be available on your PATH.
- Windows (via Chocolatey):
choco install ffmpeg
- macOS (via Homebrew):
brew install ffmpeg
- Linux (Ubuntu/Debian):
sudo apt update && sudo apt install ffmpeg
- Verify installation:
ffmpeg -version
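As a rough illustration of the check above, the following sketch looks for ffmpeg on PATH and, if it is missing, suggests the install command from this step for the current platform. The function name ffmpeg_hint and its lookup parameter are hypothetical and exist only for this example; the Linux entry assumes Ubuntu/Debian as in the step above.

```python
import platform
import shutil

# Install commands copied from the steps above, keyed by platform.system() values.
# The Linux entry assumes an Ubuntu/Debian system with apt.
INSTALL_COMMANDS = {
    "Windows": "choco install ffmpeg",
    "Darwin": "brew install ffmpeg",
    "Linux": "sudo apt update && sudo apt install ffmpeg",
}


def ffmpeg_hint(system=None, lookup=shutil.which):
    """Return the ffmpeg path if found, otherwise a suggested install command."""
    path = lookup("ffmpeg")
    if path:
        return f"ffmpeg found at {path}"
    system = system or platform.system()
    command = INSTALL_COMMANDS.get(system)
    if command is None:
        return f"ffmpeg not found; no install command listed for {system}"
    return f"ffmpeg not found; try: {command}"


print(ffmpeg_hint())
```

If ffmpeg was just installed and is still reported as missing, opening a new terminal usually refreshes PATH.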
Step 5: Install Whisper AI
- Run the installation command:
pip install -U openai-whisper
- Alternatively, install directly from GitHub for the latest version:
pip install git+https://github.com/openai/whisper.git
- To update an existing install to the latest version of the GitHub repository:
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
- Verify installation:
whisper --help
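If the whisper command is not found, it can help to confirm that the package was installed into the Python environment you are actually using. The sketch below is one way to do that with the standard library; installed_version is a hypothetical helper name, and "openai-whisper" is the distribution name from the install command above.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string for a pip distribution, or None."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# "openai-whisper" is the pip distribution name used in the install command above.
version = installed_version("openai-whisper")
print("openai-whisper:", version if version else "not installed")
```

A result of "not installed" usually means pip installed the package for a different interpreter than the one running the script.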
Step 6: Run a Test Transcription
To check if Whisper AI is working, run:
whisper example.mp3 --model small
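This prints the transcript of example.mp3 to the terminal and, by default, also writes it to the current directory in several formats (including .txt, .srt, and .vtt). The model weights are downloaded the first time a model is used, so the first run takes longer.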
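Whisper can also be called from Python instead of the command line. The sketch below follows the usage shown in the Whisper repository README (load_model, then transcribe); the wrapper name transcribe_file is hypothetical, and the import is placed inside the function so the file can be loaded even where Whisper is not installed.

```python
def transcribe_file(path, model_name="small"):
    """Transcribe one audio file and return the recognized text."""
    import whisper  # provided by the openai-whisper package installed in Step 5

    model = whisper.load_model(model_name)  # downloads weights on first use
    result = model.transcribe(path)
    return result["text"]


# Example call (requires Whisper, FFmpeg, and an existing audio file):
# print(transcribe_file("example.mp3"))
```

Loading the model once and reusing it for several files avoids paying the load time on every call.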
Additional Features
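- Use Different Models: Whisper supports multiple model sizes (tiny, base, small, medium, large, and the newer turbo). Larger models are generally more accurate but slower and need more memory. Example: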
whisper example.mp3 --model medium
- Transcribe Multiple Files:
whisper file1.mp3 file2.mp3
- Specify Language (detected automatically if omitted):
whisper example.mp3 --language English
- Translate Non-English Speech to English:
whisper example.mp3 --task translate
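The options above can be combined in a single command. As a small sketch, the helper below assembles such a command as an argument list using only the flags shown in this section (--model, --language, --task); build_whisper_command is a hypothetical name for this example, not part of Whisper.

```python
import shlex


def build_whisper_command(files, model="small", language=None, translate=False):
    """Assemble a whisper CLI invocation as an argument list."""
    command = ["whisper", *files, "--model", model]
    if language:
        command += ["--language", language]
    if translate:
        command += ["--task", "translate"]
    return command


command = build_whisper_command(
    ["file1.mp3", "file2.mp3"], model="medium", language="English"
)
# Show the command as it would be typed in a shell.
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) to launch the transcription from a script.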
CUDA Compatibility for GPU Acceleration
If you have an NVIDIA GPU, you can speed up transcription with CUDA:
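- Install a current NVIDIA driver from NVIDIA.
- Install a CUDA-enabled PyTorch build by selecting the matching CUDA version on pytorch.org. The pip packages typically bundle the CUDA runtime, so a separate CUDA toolkit install is usually not needed.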
- Run Whisper using CUDA:
whisper example.mp3 --model large --device cuda
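Before passing --device cuda, it is worth checking that PyTorch can actually see the GPU. The sketch below is a minimal example of that check using torch.cuda.is_available(); pick_device is a hypothetical helper name, and it falls back to "cpu" when PyTorch is missing or no GPU is usable.

```python
import importlib.util


def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is missing, so no GPU can be used
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
# Print the command from this section with the detected device filled in.
print(f"whisper example.mp3 --model large --device {device}")
```

If this reports cpu on a machine with an NVIDIA GPU, the installed PyTorch build is likely CPU-only or the driver is out of date.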
Congratulations! 🎉 You have successfully installed and set up Whisper AI for transcription. For further details, visit the Whisper GitHub repository at github.com/openai/whisper.