LCA-21 - trckr


Created: 2025-12-21 • Updated: 2025-12-21


Comments (2)

agent · 2025-12-22

## Voice Sample Generation

Samples were generated using OpenAI's `tts-1-hd` model via:

```bash
OPENAI_API_KEY=sk-xxx python scripts/generate_voice_samples.py
```

**Test phrase used:** A ~30 second phrase designed to capture vocal characteristics.

**Output:** 9 WAV files (~1.3MB each, 24kHz mono) in `voices/` directory

**Voices:** alloy, ash, coral, echo, fable, nova, onyx, sage, shimmer

The samples are used as reference audio for Chatterbox voice cloning, allowing the local TTS to emulate OpenAI voices transparently.
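The stated output format (9 files, 24kHz mono) can be sanity-checked by reading each WAV header with the standard library. This is a minimal sketch, not code from the repository: the function name `check_voice_samples` is hypothetical, and the `openai_<voice>.wav` file naming is assumed from the `voices/openai_*.wav` pattern mentioned in the implementation comment.

```python
import wave
from pathlib import Path

# Voice names as listed in this comment.
VOICES = ["alloy", "ash", "coral", "echo", "fable", "nova", "onyx", "sage", "shimmer"]


def check_voice_samples(voices_dir, expected_rate=24000, expected_channels=1):
    """Return {voice: [problems]}; an empty list means the sample looks fine.

    Hypothetical helper: assumes files are named openai_<voice>.wav.
    """
    report = {}
    for name in VOICES:
        path = Path(voices_dir) / f"openai_{name}.wav"
        issues = []
        if not path.is_file():
            issues.append("missing file")
        else:
            # wave.open reads only the header until frames are requested.
            with wave.open(str(path), "rb") as wav:
                rate = wav.getframerate()
                channels = wav.getnchannels()
            if rate != expected_rate:
                issues.append(f"sample rate {rate} Hz, expected {expected_rate} Hz")
            if channels != expected_channels:
                issues.append(f"{channels} channels, expected {expected_channels}")
        report[name] = issues
    return report
```

Calling `check_voice_samples("voices")` from the repository root would then flag any sample that is missing or was regenerated with a different format.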

agent · 2025-12-21

## Implementation Complete

### What was done:

1. **Generated OpenAI TTS voice samples** for all 9 voices available in tts-1/tts-1-hd models:
   - alloy, ash, coral, echo, fable, nova, onyx, sage, shimmer
   - Samples saved to `voices/openai_*.wav` (~1.3 MB each)
   - Note: Original issue mentioned 8 voices (ballad, verse) but these are only available for gpt-4o-mini-tts model
2. **Added voice cloning support to LocalTTSClient**:
   - New `voice` parameter accepts OpenAI voice names
   - New `ref_audio` parameter accepts custom reference audio path
   - Helper functions: `get_voice_sample_path()`, `load_reference_audio()`
   - Works with `synthesize()` and `generate_stream()` methods
3. **Created voice mapping configuration** (`voices/voice_config.json`):
   - Voice characteristics and descriptions for each voice
   - Recommended Chatterbox parameters (exaggeration, cfg_weight, temperature)
   - Model compatibility info
4. **Added script** `scripts/generate_voice_samples.py` for regenerating samples
5. **Comprehensive test suite** (25 tests in `tests/test_voice_cloning.py`):
   - All voice samples exist and load correctly
   - API accepts voice/ref_audio parameters
   - Configuration file validation

### Files changed:

- `src/librechat_audio/models/tts.py` - Added voice cloning support
- `src/librechat_audio/tts.py` - Updated TTSVoice type to correct voices
- `scripts/generate_voice_samples.py` - New script for sample generation
- `tests/test_voice_cloning.py` - New test suite
- `voices/*.wav` - 9 OpenAI voice reference samples
- `voices/voice_config.json` - Voice configuration

### Usage example:

```python
from librechat_audio.models.tts import LocalTTSClient

client = LocalTTSClient()

# Clone using OpenAI voice name
audio = client.synthesize("Hello!", voice="alloy")

# Or use custom reference audio
audio = client.synthesize("Hello!", ref_audio="path/to/reference.wav")
```

### Test results:

- Voice cloning tests: 25/25 passed
- Full test suite: 573/574 passed (1 pre-existing failure in test_cross_validation.py unrelated to this change)
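To illustrate how a `voice` name or a `ref_audio` path might be turned into a single reference file, here is a minimal sketch. It is not the repository's `get_voice_sample_path()`: the function name `resolve_reference_audio`, its signature, and the rule that an explicit `ref_audio` takes precedence over `voice` are all assumptions made for illustration. Only the voice list and the `voices/openai_*.wav` naming come from this comment.

```python
from pathlib import Path
from typing import Optional

# The 9 voices listed above for tts-1 / tts-1-hd.
OPENAI_VOICES = (
    "alloy", "ash", "coral", "echo", "fable",
    "nova", "onyx", "sage", "shimmer",
)


def resolve_reference_audio(
    voice: Optional[str] = None,
    ref_audio: Optional[str] = None,
    voices_dir: str = "voices",
) -> Optional[Path]:
    """Pick the reference audio path for voice cloning (hypothetical helper).

    Assumed precedence: explicit ref_audio first, then a named OpenAI voice,
    otherwise None (no cloning reference).
    """
    if ref_audio is not None:
        return Path(ref_audio)
    if voice is None:
        return None
    if voice not in OPENAI_VOICES:
        raise ValueError(
            f"Unknown voice {voice!r}; expected one of: {', '.join(OPENAI_VOICES)}"
        )
    return Path(voices_dir) / f"openai_{voice}.wav"
```

Rejecting unknown names early means a voice such as `ballad`, which has no sample in `voices/`, produces a clear error instead of a missing-file failure later in synthesis.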