LCA-21
Created: 2025-12-21 • Updated: 2025-12-21
Comments (2)
agent · 2025-12-22
## Voice Sample Generation
Samples were generated using OpenAI's `tts-1-hd` model via:
```bash
OPENAI_API_KEY=sk-xxx python scripts/generate_voice_samples.py
```
**Test phrase used:** A ~30-second phrase designed to capture vocal characteristics.
**Output:** 9 WAV files (~1.3 MB each, 24 kHz mono) in the `voices/` directory
**Voices:** alloy, ash, coral, echo, fable, nova, onyx, sage, shimmer
The samples are used as reference audio for Chatterbox voice cloning, allowing the local TTS to emulate OpenAI voices transparently.
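As a quick sanity check on the output described above, the following is a minimal sketch that confirms each file in `voices/` is 24 kHz mono. It uses only the standard library `wave` module and assumes the samples are plain PCM WAV files; the function names `describe_wav` and `check_samples` are illustrative and are not part of the repository.

```python
import wave
from pathlib import Path

EXPECTED_RATE_HZ = 24_000  # sample rate stated in the comment above
EXPECTED_CHANNELS = 1      # mono


def describe_wav(path):
    """Return (sample_rate_hz, channels, sample_width_bytes) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth()


def check_samples(voices_dir="voices"):
    """Map each WAV file name in voices_dir to True if it is 24 kHz mono."""
    results = {}
    for path in sorted(Path(voices_dir).glob("*.wav")):
        rate, channels, _ = describe_wav(path)
        results[path.name] = (
            rate == EXPECTED_RATE_HZ and channels == EXPECTED_CHANNELS
        )
    return results


if __name__ == "__main__":
    for name, ok in check_samples().items():
        print(f"{name}: {'ok' if ok else 'unexpected format'}")
```

The check reads only the header fields, so it stays fast even when run over all nine samples.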
agent · 2025-12-21
## Implementation Complete
### What was done:
1. **Generated OpenAI TTS voice samples** for all 9 voices available in the `tts-1`/`tts-1-hd` models:
   - alloy, ash, coral, echo, fable, nova, onyx, sage, shimmer
   - Samples saved to `voices/openai_*.wav` (~1.3 MB each)
   - Note: The original issue mentioned 8 voices, including ballad and verse, but those two are available only with the `gpt-4o-mini-tts` model
2. **Added voice cloning support to `LocalTTSClient`**:
   - New `voice` parameter accepts OpenAI voice names
   - New `ref_audio` parameter accepts a custom reference audio path
   - Helper functions: `get_voice_sample_path()`, `load_reference_audio()`
   - Works with the `synthesize()` and `generate_stream()` methods
3. **Created voice mapping configuration** (`voices/voice_config.json`):
   - Voice characteristics and descriptions for each voice
   - Recommended Chatterbox parameters (`exaggeration`, `cfg_weight`, `temperature`)
   - Model compatibility info
4. **Added script** `scripts/generate_voice_samples.py` for regenerating samples
5. **Added a comprehensive test suite** (25 tests in `tests/test_voice_cloning.py`):
   - All voice samples exist and load correctly
   - API accepts `voice`/`ref_audio` parameters
   - Configuration file validation
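To illustrate how a voice name could be resolved to a reference sample, here is a minimal sketch. The `voices/openai_<name>.wav` naming comes from the list above; the function signatures, the `resolve_reference` helper, and the rule that an explicit `ref_audio` takes precedence over `voice` are assumptions for illustration and may differ from the actual code in `src/librechat_audio/models/tts.py`.

```python
from pathlib import Path

# The nine voices listed above for tts-1 / tts-1-hd.
OPENAI_VOICES = (
    "alloy", "ash", "coral", "echo", "fable",
    "nova", "onyx", "sage", "shimmer",
)
VOICES_DIR = Path("voices")  # assumed to be relative to the repository root


def get_voice_sample_path(voice, voices_dir=VOICES_DIR):
    """Return the reference sample path for an OpenAI voice name (assumed signature)."""
    if voice not in OPENAI_VOICES:
        raise ValueError(
            f"Unknown voice {voice!r}; expected one of {', '.join(OPENAI_VOICES)}"
        )
    return Path(voices_dir) / f"openai_{voice}.wav"


def resolve_reference(voice=None, ref_audio=None, voices_dir=VOICES_DIR):
    """Pick the reference audio path, or None to fall back to the default voice.

    Assumption: an explicit ref_audio path wins over a voice name.
    """
    if ref_audio is not None:
        return Path(ref_audio)
    if voice is not None:
        return get_voice_sample_path(voice, voices_dir)
    return None
```

Rejecting unknown names early (for example `ballad`, which has no sample here) gives a clearer error than a missing-file failure later in synthesis.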
### Files changed:
- `src/librechat_audio/models/tts.py` - Added voice cloning support
- `src/librechat_audio/tts.py` - Updated the `TTSVoice` type to list the correct voices
- `scripts/generate_voice_samples.py` - New script for sample generation
- `tests/test_voice_cloning.py` - New test suite
- `voices/*.wav` - 9 OpenAI voice reference samples
- `voices/voice_config.json` - Voice configuration
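The layout of `voices/voice_config.json` is not shown in this ticket, so the sketch below assumes a hypothetical schema: a top-level `voices` object keyed by voice name, each entry holding a `description` and a `chatterbox` object with the three parameters named above. The numeric values are placeholders, not the recommended settings. It shows one way a configuration validation check might look.

```python
import json

REQUIRED_PARAMS = ("exaggeration", "cfg_weight", "temperature")

# Hypothetical example entry; field names and values are placeholders.
EXAMPLE_CONFIG = json.loads("""
{
  "voices": {
    "alloy": {
      "description": "placeholder description",
      "chatterbox": {"exaggeration": 0.5, "cfg_weight": 0.5, "temperature": 0.8}
    }
  }
}
""")


def validate_voice_config(config, expected_voices):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    voices = config.get("voices", {})
    for name in expected_voices:
        entry = voices.get(name)
        if entry is None:
            problems.append(f"{name}: missing entry")
            continue
        params = entry.get("chatterbox", {})
        for key in REQUIRED_PARAMS:
            value = params.get(key)
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{name}: chatterbox.{key} missing or not numeric")
    return problems


if __name__ == "__main__":
    print(validate_voice_config(EXAMPLE_CONFIG, ["alloy", "ash"]))
```

Returning a list of problems rather than raising on the first one makes it easier to report every misconfigured voice in a single test run.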
### Usage example:
```python
from librechat_audio.models.tts import LocalTTSClient
client = LocalTTSClient()
# Clone using OpenAI voice name
audio = client.synthesize("Hello!", voice="alloy")
# Or use custom reference audio
audio = client.synthesize("Hello!", ref_audio="path/to/reference.wav")
```
### Test results:
- Voice cloning tests: 25/25 passed
- Full test suite: 573/574 passed (1 pre-existing failure in test_cross_validation.py unrelated to this change)