We are on a mission to bring fast, cross-platform, GPU-accelerated inference to both native and the browser.
> [!NOTE]
> Ratchet is currently in active development. We are working on the engine, adding more models, and improving compatibility. Please reach out if you'd like to help!
The easiest way to experience Ratchet is to check out our Hugging Face spaces:
To dig deeper, check out the examples.
We welcome contributions from the community. If you have any ideas or suggestions, please feel free to open an issue or pull request.
```typescript
// Asynchronous loading & caching with IndexedDB
let model = await Model.load(AvailableModels.WHISPER_TINY, Quantization.Q8, (p: number) => setProgress(p));
let result = await model.run({ input });
```
Rust crate & CLI coming soon...
We want to give developers a toolkit that makes it easy to integrate performant AI functionality into existing production applications. The following principles help us accomplish this:
- Inference only
- WebGPU/CPU only
- First class quantization support
- Lazy computation
- In-place by default
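To illustrate the lazy-computation principle, here is a minimal sketch of the idea in TypeScript: operations only record work, and nothing executes until the result is explicitly resolved, so the engine can see the whole graph before scheduling anything. The names here (`LazyTensor`, `add`, `mul`, `resolve`) are purely illustrative and not Ratchet's actual API.

```typescript
// Each tensor holds a thunk describing how to produce its data.
type Op = () => number[];

class LazyTensor {
  constructor(private readonly compute: Op) {}

  static fromData(data: number[]): LazyTensor {
    return new LazyTensor(() => data);
  }

  add(other: LazyTensor): LazyTensor {
    // Record the elementwise addition; do not execute it yet.
    return new LazyTensor(() =>
      this.compute().map((v, i) => v + other.compute()[i])
    );
  }

  mul(scalar: number): LazyTensor {
    // Record the scalar multiplication lazily as well.
    return new LazyTensor(() => this.compute().map((v) => v * scalar));
  }

  // Only here does any arithmetic actually run; a real engine would
  // use this point to fuse and dispatch the recorded graph (e.g. on the GPU).
  resolve(): number[] {
    return this.compute();
  }
}

const a = LazyTensor.fromData([1, 2, 3]);
const b = LazyTensor.fromData([4, 5, 6]);
const c = a.add(b).mul(2); // no computation has happened yet
console.log(c.resolve()); // [10, 14, 18]
```

Deferring execution like this is what allows an engine to fuse kernels and reuse buffers (the in-place principle above) instead of eagerly materializing every intermediate result.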
- Whisper
- Phi 2 & 3
- Moondream
- Gemma 2 2B