LCA-5
Created: 2025-12-21 Updated: 2025-12-21
Comments (2)
agent · 2025-12-21
[QA] ✅ VERIFIED - All tests pass (180/180). Audio generation confirmed working: produces valid 24kHz WAV, 2.04s duration, not silent (max amplitude 0.99). Implementation complete with comprehensive test coverage (21 new tests). See /tmp/qa-LCA-5.md for full verification report.
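For reference, checks like the ones quoted above (sample rate, duration, peak amplitude) can be reproduced with the Python standard library alone. The following is a minimal sketch, not the QA script actually used for this ticket. It assumes the output is 16-bit PCM WAV, which the report does not state, and `inspect_wav` and `write_tone` are hypothetical helper names introduced for illustration.

```python
import array
import math
import sys
import wave


def inspect_wav(path):
    """Return (sample_rate, duration_seconds, peak) for a 16-bit PCM WAV file.

    peak is the largest absolute sample value scaled to the range 0.0..1.0.
    Assumption: 16-bit PCM; other sample widths are rejected.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        sample_rate = wav.getframerate()
        n_frames = wav.getnframes()
        raw = wav.readframes(n_frames)

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    peak = max((abs(s) for s in samples), default=0) / 32768.0
    return sample_rate, n_frames / sample_rate, peak


def write_tone(path, sample_rate=24000, seconds=1.0, amplitude=0.5, freq=440.0):
    """Write a mono 16-bit sine tone; a stand-in for real TTS output in a demo."""
    n_frames = int(sample_rate * seconds)
    samples = array.array(
        "h",
        (
            int(amplitude * 32767 * math.sin(2 * math.pi * freq * i / sample_rate))
            for i in range(n_frames)
        ),
    )
    if sys.byteorder == "big":
        samples.byteswap()
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(samples.tobytes())
```

A file counts as "not silent" under this sketch when `peak` is clearly above zero; what threshold to use is a judgment call for the test author.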
agent · 2025-12-21
## Implementation Complete

Created `LocalTTSClient` class in `src/librechat_audio/models/tts.py` wrapping mlx-audio Chatterbox for local text-to-speech synthesis.

### Changes Made:

1. **src/librechat_audio/models/tts.py** - New file with LocalTTSClient implementation:
   - Lazy model loading to minimize startup time
   - Support for all Chatterbox variants (standard and turbo models)
   - Configurable parameters: exaggeration, cfg_weight, temperature, max_tokens
   - Multiple audio output formats (WAV, FLAC, MP3, OGG)
   - File output via `synthesize_to_file()` method
   - Streaming support via `generate_stream()` method
   - `is_available()` function to check if mlx-audio is installed
2. **src/librechat_audio/models/__init__.py** - Added exports for LocalTTSClient and LocalTTSModel
3. **tests/test_local_tts.py** - Comprehensive test suite with 21 tests:
   - Availability detection tests
   - Unit tests with mocked model
   - Integration tests with real model loading
   - Quality assurance tests (audio not silent, correct sample rate)
4. **pyproject.toml** - Added mlx-audio>=0.2.0 to local optional dependencies

### Default Model:

Using `mlx-community/chatterbox-8bit` as the default. Note: Turbo models require librosa/numba, which have numpy version constraints.

### Test Results:

All 180 tests pass (including 21 new TTS tests).

### QA Verification:

Audio generation produces 24kHz WAV with ~2 seconds of audio for "Hello world, this is a test." - audio is not silent and plays correctly.
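
### Illustrative Sketch:

The availability check and lazy loading described above follow a common pattern, sketched below with the standard library only. This is not the code in `tts.py`: `LazyModel` and the `loader` callable are hypothetical names, and the import name `mlx_audio` for the mlx-audio package is an assumption.

```python
import importlib.util


def is_available(module_name="mlx_audio"):
    """Return True if a top-level module can be found, without importing it.

    Assumption: the mlx-audio package is importable as `mlx_audio`.
    """
    return importlib.util.find_spec(module_name) is not None


class LazyModel:
    """Defers an expensive load until the model is first accessed."""

    def __init__(self, loader):
        # loader is any zero-argument callable that returns the loaded model.
        self._loader = loader
        self._model = None

    @property
    def model(self):
        # Load on first access only; later accesses reuse the cached object.
        if self._model is None:
            self._model = self._loader()
        return self._model
```

Constructing `LazyModel` costs nothing; the loader runs only when `.model` is first read, which is what keeps startup time low when TTS is never used in a session.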