Voice AI System with Local RAG

On-Premise Dual-Mode AI System • January 2025

3x Faster
100% Local
Dual-Mode
No API
Voice Input: Continuous listening
VAD: Activity detection
Speaker Detection: Verify user
Whisper STT: Fine-tuned model, CTranslate2 (3x faster)
Local LLM: Command extraction
Command Processor: Execute actions
Done!
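The stages above can be sketched as a chain of small functions. This is a minimal illustration only: the page does not say which VAD or speaker model is used, so the energy-gate VAD, the cosine-similarity speaker check, the thresholds, and every function name here (`is_speech`, `cosine_similarity`, `run_pipeline`) are hypothetical stand-ins, and the transcription, command-extraction, and execution stages are passed in as plain callables.

```python
import math
from typing import Callable, Optional, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_speech(frame: Sequence[float], threshold: float = 0.02) -> bool:
    """Energy-gate VAD stand-in (assumption): speech if RMS reaches the threshold."""
    return rms(frame) >= threshold


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def run_pipeline(
    frame: Sequence[float],
    speaker_embedding: Sequence[float],
    enrolled_embedding: Sequence[float],
    transcribe: Callable[[Sequence[float]], str],
    extract_command: Callable[[str], Optional[str]],
    execute: Callable[[str], None],
    min_similarity: float = 0.75,  # hypothetical acceptance threshold
) -> Optional[str]:
    """Run one frame through all stages; return the executed command or None."""
    if not is_speech(frame):
        return None  # VAD: nothing but silence
    if cosine_similarity(speaker_embedding, enrolled_embedding) < min_similarity:
        return None  # Speaker Detection: user not verified
    text = transcribe(frame)          # Whisper STT stage (injected)
    command = extract_command(text)   # Local LLM stage (injected)
    if command is None:
        return None
    execute(command)                  # Command Processor stage (injected)
    return command


# Usage with stub stages in place of the real models.
executed = []
result = run_pipeline(
    frame=[0.1, -0.1] * 80,
    speaker_embedding=[1.0, 0.0],
    enrolled_embedding=[0.9, 0.1],
    transcribe=lambda frame: "start scan",
    extract_command=lambda text: text if text == "start scan" else None,
    execute=executed.append,
)
print(result, executed)
```

Passing the model-backed stages in as callables keeps the control flow testable without loading any model, and makes the early exits (silence, unverified speaker, no command) explicit.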

Model Fine-tuning Process

Base Whisper
Custom Training Data
Fine-tuning (PyTorch)
CTranslate2 Convert
Optimized Model
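The "CTranslate2 Convert" step could look roughly like the following, assuming the fine-tuned checkpoint was saved in Hugging Face Transformers format. The sketch only builds the argument list for CTranslate2's `ct2-transformers-converter` command-line tool; the directory names `whisper-finetuned` and `whisper-finetuned-ct2` and the helper `build_convert_command` are hypothetical, and `int8` is assumed from the 8-bit quantization mentioned on this page.

```python
import shlex
from typing import List


def build_convert_command(
    model_dir: str,
    output_dir: str,
    quantization: str = "int8",  # assumed from the page's "8-bit Quantization"
) -> List[str]:
    """Build the argv for CTranslate2's `ct2-transformers-converter` CLI."""
    return [
        "ct2-transformers-converter",
        "--model", model_dir,
        "--output_dir", output_dir,
        "--quantization", quantization,
    ]


# Hypothetical paths; replace with the real checkpoint and output directories.
cmd = build_convert_command("whisper-finetuned", "whisper-finetuned-ct2")
print(shlex.join(cmd))

# To actually run the conversion (requires the ctranslate2 and transformers
# packages to be installed), uncomment:
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems if the paths contain spaces.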

Example Voice Commands

Start scan
Adjust position
Capture image
Stop process
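To illustrate how a transcript might be mapped onto these commands, here is a small rule-based matcher. It is not the approach described on this page, which uses a local LLM for command extraction; it is a simpler stand-in that accepts a command when all of its words occur, in order, in the transcript. The names `COMMANDS`, `tokenize`, and `match_command` are hypothetical.

```python
import re
from typing import List, Optional

# The example commands listed on this page.
COMMANDS = ["Start scan", "Adjust position", "Capture image", "Stop process"]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def match_command(transcript: str) -> Optional[str]:
    """Return the first command whose words all occur, in order, in the transcript."""
    words = tokenize(transcript)
    for command in COMMANDS:
        remaining = iter(words)
        # `token in remaining` advances the iterator, so word order is enforced.
        if all(token in remaining for token in tokenize(command)):
            return command
    return None


print(match_command("Please start the scan now"))
print(match_command("capture an image, then stop the process"))
print(match_command("scan start"))
```

A matcher like this tolerates filler words but not synonyms or reordered phrasing, which is one reason a language model may be preferred for the extraction step.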

Optimization Results

CTranslate2 + 8-bit Quantization

3x Inference Speed
50% Memory Usage
100% Local
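One way a 50% memory figure can arise with 8-bit quantization is from weight storage alone: an 8-bit weight takes one byte, half the two bytes of a 16-bit float. The calculation below is a rough sketch under stated assumptions. The page does not say which Whisper size or which baseline precision was measured, so the 244M parameter count (roughly the published size of Whisper "small") and the float16 baseline are both assumptions, and the estimate ignores activations, quantization scale factors, and any layers left unquantized.

```python
def weight_memory_mb(num_parameters: int, bytes_per_weight: int) -> float:
    """Approximate weight storage in megabytes (1 MB = 1,000,000 bytes)."""
    return num_parameters * bytes_per_weight / 1_000_000


# Assumption: a model of roughly Whisper "small" size; the page does not
# state which model size was actually fine-tuned.
PARAMS = 244_000_000

fp16_mb = weight_memory_mb(PARAMS, 2)  # assumed baseline: 16-bit floats
int8_mb = weight_memory_mb(PARAMS, 1)  # 8-bit quantized weights
ratio = int8_mb / fp16_mb

print(f"fp16: {fp16_mb:.0f} MB, int8: {int8_mb:.0f} MB, ratio: {ratio:.0%}")
```

Against a float32 baseline the same arithmetic would give 25% instead, so the baseline precision matters when interpreting the figure.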

100% On-Premise: No external API calls, all processing local
Real-time Processing: 3x speedup with CTranslate2 optimization
Fine-tuned Models: Custom training for domain-specific accuracy

Python
PyTorch
Whisper
Transformers
CTranslate2
Local LLM
LlamaIndex
ChromaDB
Docling
PyInstaller
C# WPF