# Chaplin

*(Chaplin thumbnail image)*

A visual speech recognition (VSR) tool that reads your lips in real time and types whatever you silently mouth. Runs fully locally.

Relies on a model trained on the Lip Reading Sentences 3 (LRS3) dataset as part of the Auto-AVSR project.

Watch a demo of Chaplin here.

## Setup

1. Clone the repository and `cd` into it:

   ```
   git clone https://github.com/amanvirparhar/chaplin
   cd chaplin
   ```

2. Download the required model components: `LRS3_V_WER19.1` and `lm_en_subword`.

3. Unzip both folders and place them in their respective directories:

   ```
   chaplin/
   ├── benchmarks/
   │   └── LRS3/
   │       ├── language_models/
   │       │   └── lm_en_subword/
   │       └── models/
   │           └── LRS3_V_WER19.1/
   ├── ...
   ```

4. Install and run ollama, and pull the `llama3.2` model (example command below).

5. Install uv (example command below).
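For steps 4 and 5, the commands typically look like the following. This is a sketch that assumes ollama is already installed and uses uv's standard standalone installer for macOS/Linux; check each project's documentation for your platform:

```
# Pull the llama3.2 model (ollama must be installed and running)
ollama pull llama3.2

# Install uv via its official standalone installer (macOS/Linux)
curl -LsSf https://astral.sh/uv/install.sh | sh
```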

## Usage

1. Run the following command:

   ```
   sudo uv run --with-requirements requirements.txt --python 3.12 main.py config_filename=./configs/LRS3_V_WER19.1.ini detector=mediapipe
   ```

2. Once the camera feed is displayed, you can start "recording" by pressing the `option` key (Mac) or the `alt` key (Windows/Linux), and start mouthing words.

3. To stop recording, press the `option` key (Mac) or the `alt` key (Windows/Linux) again. You should see some text being typed out wherever your cursor is.

4. To exit gracefully, focus on the window displaying the camera feed and press `q`.
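The toggle-to-type flow in steps 2 and 3 can be pictured with a minimal sketch. This is a hypothetical illustration, not Chaplin's actual code: it assumes the pynput library for key listening and synthetic typing, with a placeholder string standing in for the VSR model's output.

```
# Hypothetical sketch of the hotkey-toggle flow; Chaplin's real key
# handling and decoding pipeline may differ.
from pynput import keyboard

typer = keyboard.Controller()
recording = False

def on_press(key):
    global recording
    # Option (macOS) / Alt (Windows/Linux) toggles recording on and off.
    if key in (keyboard.Key.alt, keyboard.Key.alt_l, keyboard.Key.alt_r):
        recording = not recording
        if not recording:
            # On stop, the recorded frames would be run through the VSR
            # model; a placeholder stands in for its decoded output here.
            typer.type("decoded text from the lip-reading model")

with keyboard.Listener(on_press=on_press) as listener:
    listener.join()  # block until the listener is stopped
```

Listening for global hotkeys and typing into other applications generally requires elevated input permissions, which is likely why the run command above uses `sudo`.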