LCA-5
Created: 2025-12-21 • Updated: 2025-12-21
Comments (2)
agent · 2025-12-21
[QA] ✅ VERIFIED - All tests pass (180/180). Audio generation confirmed working: produces valid 24kHz WAV, 2.04s duration, not silent (max amplitude 0.99). Implementation complete with comprehensive test coverage (21 new tests). See /tmp/qa-LCA-5.md for full verification report.
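The three audio checks in this report (sample rate, duration, not silent) can be reproduced with the standard library alone. The following is a minimal sketch, not the project's actual QA script. It assumes the output is 16-bit signed PCM WAV. The helper names `inspect_wav` and `check_tts_output` and the `min_peak` silence threshold are hypothetical. A 32-bit float WAV would need a different decoder.

```python
import array
import sys
import wave


def inspect_wav(path: str) -> dict:
    """Return sample rate, duration (s) and peak amplitude normalized to [0, 1].

    Assumption: 16-bit signed PCM; other sample widths raise ValueError.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        n_frames = wav.getnframes()
        samples = array.array("h")  # signed 16-bit integers
        samples.frombytes(wav.readframes(n_frames))
    # WAV sample data is little-endian; swap bytes on big-endian hosts.
    if sys.byteorder == "big":
        samples.byteswap()
    peak = max((abs(s) for s in samples), default=0) / 32768
    return {"sample_rate": rate, "duration_s": n_frames / rate, "peak": peak}


def check_tts_output(path: str, expected_rate: int = 24_000, min_peak: float = 0.01) -> dict:
    """Hypothetical QA gate: correct sample rate, non-empty, and not silent."""
    info = inspect_wav(path)
    assert info["sample_rate"] == expected_rate, info
    assert info["duration_s"] > 0, info
    assert info["peak"] >= min_peak, "audio appears silent"
    return info
```

Normalizing the peak by 32768 makes it comparable to a float-style "max amplitude" figure such as the 0.99 reported above.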
agent · 2025-12-21
## Implementation Complete
Created `LocalTTSClient` class in `src/librechat_audio/models/tts.py` wrapping mlx-audio Chatterbox for local text-to-speech synthesis.
### Changes Made:
1. **src/librechat_audio/models/tts.py** - New file with the `LocalTTSClient` implementation:
   - Lazy model loading to minimize startup time
   - Support for all Chatterbox variants (standard and turbo models)
   - Configurable parameters: `exaggeration`, `cfg_weight`, `temperature`, `max_tokens`
   - Multiple audio output formats (WAV, FLAC, MP3, OGG)
   - File output via the `synthesize_to_file()` method
   - Streaming support via the `generate_stream()` method
   - An `is_available()` function to check whether mlx-audio is installed
2. **src/librechat_audio/models/__init__.py** - Added exports for `LocalTTSClient` and `LocalTTSModel`
3. **tests/test_local_tts.py** - Comprehensive test suite with 21 tests:
   - Availability detection tests
   - Unit tests with a mocked model
   - Integration tests with real model loading
   - Quality assurance tests (audio is not silent, sample rate is correct)
4. **pyproject.toml** - Added `mlx-audio>=0.2.0` to the local optional dependencies
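Two of the behaviours listed above, lazy model loading and the `is_available()` check, follow a common pattern. The sketch below illustrates that pattern under stated assumptions and does not reproduce the actual `LocalTTSClient`. It assumes mlx-audio is importable as `mlx_audio`. The `loader` callable and the model's `generate(text, **kwargs)` method are hypothetical stand-ins for whatever mlx-audio really exposes, and `LazyTTSClient` is a hypothetical name.

```python
import importlib.util
from typing import Any, Callable, Dict, Optional


def is_available() -> bool:
    """True if the mlx-audio package is installed (module name assumed: mlx_audio)."""
    return importlib.util.find_spec("mlx_audio") is not None


class LazyTTSClient:
    """Hypothetical sketch: defers the expensive model load until first use."""

    def __init__(self, model_id: str, loader: Callable[[str], Any], **gen_kwargs: Any) -> None:
        self.model_id = model_id
        self._loader = loader  # hypothetical: maps a model id to a loaded model
        self._model: Optional[Any] = None
        # Default generation parameters, e.g. exaggeration, cfg_weight, temperature.
        self.gen_kwargs: Dict[str, Any] = gen_kwargs

    @property
    def model(self) -> Any:
        # Load on first access only; construction stays cheap.
        # Not thread-safe: concurrent first calls could load the model twice.
        if self._model is None:
            self._model = self._loader(self.model_id)
        return self._model

    def synthesize(self, text: str, **overrides: Any) -> Any:
        # Per-call overrides take precedence over the constructor defaults.
        params = {**self.gen_kwargs, **overrides}
        return self.model.generate(text, **params)  # hypothetical model API
```

Injecting the loader also makes the "unit tests with a mocked model" style straightforward: a test can pass a fake loader and assert that nothing is loaded until the first `synthesize()` call.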
### Default Model:
Using `mlx-community/chatterbox-8bit` as the default. Note: the turbo models additionally require librosa and numba, which impose NumPy version constraints.
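One way to act on this note is to select a turbo model only when its extra dependencies are importable and otherwise fall back to the default. The following is a minimal sketch under that assumption. `select_model` and `missing_modules` are hypothetical helpers, and the turbo model id is a caller-supplied placeholder, since no turbo id is named here. The check only confirms that librosa and numba are installed. It does not verify that their NumPy requirements are compatible with the installed NumPy.

```python
import importlib.util
from typing import Iterable, List, Optional

DEFAULT_MODEL = "mlx-community/chatterbox-8bit"
TURBO_EXTRA_DEPS = ("librosa", "numba")


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def select_model(turbo_model: Optional[str] = None) -> str:
    """Hypothetical: prefer the given turbo model if its extra deps are installed."""
    if turbo_model and not missing_modules(TURBO_EXTRA_DEPS):
        return turbo_model
    return DEFAULT_MODEL
```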
### Test Results:
All 180 tests pass (including 21 new TTS tests).
### QA Verification:
Synthesizing "Hello world, this is a test." produces a 24 kHz WAV of about 2 seconds; the audio is not silent and plays correctly.