# Pipecat Voice Agent

Real-time conversational AI agent powered by Pipecat, integrated with Veena3 TTS.

## Features

- 🎙️ **Real-time Voice Conversations** - WebRTC support (WebSocket DISABLED)
- 🗣️ **Custom TTS Integration** - Veena3 streaming TTS with zero-buffering
- 🤖 **Multiple LLM Options** - OpenAI, Google Gemini, OpenRouter
- 🎯 **Smart Turn Detection** - Advanced VAD with Silero
- 📊 **RTVI Protocol Support** - Full RTVI compatibility
- 🔄 **User Idle Detection** - Automatic nudges for inactive users
- ⚠️ **WebSocket Disabled** - Use WebRTC transport instead

## Architecture

```
User → WebRTC (Daily.co) → Pipecat → LLM (OpenRouter/OpenAI/Gemini)
                              ↓
                         STT (Deepgram)
                              ↓
                    TTS (Veena3 API - Zero Buffer)
                              ↓
                    Audio Stream (24kHz PCM)
                              ↓
                         Immediate Yield
```

**Note**: WebSocket transport is disabled. All connections use WebRTC.

## Voice Configuration

**Current Voice:** Australian Female (Late 20s)
- Low-pitch, slightly raspy timbre
- Upbeat pacing, approachable tone
- Full of energy and curiosity

**TTS Settings:**
- Endpoint: Veena3 Streaming API
- Sample Rate: 24kHz
- Temperature: 0.4
- Top P: 0.9
- Supports emotion tags: `<excited>`, `<curious>`, `<gasp>`, etc.
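
If the same LLM output is also shown as text (for example in a transcript view), the emotion tags may need to be separated from the spoken words. The following is a minimal sketch of that idea; the helper name `split_emotion_tags` and the tag pattern (lowercase letters and underscores inside angle brackets) are assumptions for illustration, not part of the Veena3 API.

```python
import re

# Assumption: emotion tags look like <excited> or <gasp>, i.e. lowercase
# letters/underscores in angle brackets. Adjust if the real tag set differs.
TAG_PATTERN = re.compile(r"<([a-z_]+)>")


def split_emotion_tags(text):
    """Return (plain_text, tags) for a string that may contain emotion tags."""
    tags = TAG_PATTERN.findall(text)
    plain = TAG_PATTERN.sub("", text)
    # Collapse the whitespace left behind where tags were removed.
    plain = " ".join(plain.split())
    return plain, tags
```

The text sent to the TTS service would keep its tags; only the display copy is stripped.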

## Installation

### Prerequisites
- Python 3.10+
- Virtual environment

### Setup

1. **Clone the repository:**
```bash
git clone https://github.com/MayaResearch/pipecat.git
cd pipecat
```

2. **Create virtual environment:**
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

3. **Install dependencies:**
```bash
pip install -r requirements.txt
```

4. **Configure environment variables:**
Create a `.env` file:
```bash
# TTS API
TTS_API_URL=http://localhost:8000/v1/tts/stream

# STT (Deepgram)
DEEPGRAM_API_KEY=your_deepgram_api_key

# LLM (choose one or multiple)
OPENAI_API_KEY=your_openai_api_key
OPENROUTER_API_KEY=your_openrouter_api_key

# WebRTC (Daily.co)
DAILY_API_KEY=your_daily_api_key

# Server mode
WEBSOCKET_SERVER=fast_api
```
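
A quick check at startup can catch missing settings before the first call fails. The sketch below uses the variable names from the `.env` example above; which of them are strictly required is an assumption, and `missing_settings` is a hypothetical helper, not something shipped in this repository.

```python
import os

# Names come from the .env example; treating these three as mandatory
# is an assumption.
REQUIRED_VARS = ("TTS_API_URL", "DEEPGRAM_API_KEY", "DAILY_API_KEY")
# At least one LLM key is expected ("choose one or multiple").
LLM_VARS = ("OPENAI_API_KEY", "OPENROUTER_API_KEY")


def missing_settings(env=None):
    """Return a list of settings that are unset or empty in `env`."""
    env = os.environ if env is None else env
    problems = [name for name in REQUIRED_VARS if not env.get(name)]
    if not any(env.get(name) for name in LLM_VARS):
        problems.append("one of " + " / ".join(LLM_VARS))
    return problems
```

Calling `missing_settings()` with no argument inspects the real process environment; passing a dict makes the check easy to test.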

## Running the Server

### Standard Mode (FastAPI):
```bash
python server.py
```

The server starts on `http://0.0.0.0:3006`.

### Production Mode (with logs):
```bash
mkdir -p logs
nohup python server.py > logs/server.log 2>&1 &
```

## API Endpoints

### ⚠️ WebSocket NOT Supported
**WS** `/ws`

WebSocket connections are immediately rejected with an error message.

**Response:**
```json
{
  "error": "websocket_not_supported",
  "message": "WebSocket transport is not currently supported. Please use WebRTC transport instead.",
  "supported_transports": ["webrtc"],
  "webrtc_endpoint": "/connect"
}
```
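
A client that still tries `/ws` first could use this payload to fall back to WebRTC. The sketch below only parses the rejection message shown above; `fallback_endpoint` is a hypothetical helper, and how the message is delivered (text frame, close reason, and so on) is not specified here.

```python
import json


def fallback_endpoint(raw_message):
    """Return the advertised WebRTC endpoint from a rejection payload, else None."""
    try:
        payload = json.loads(raw_message)
    except (TypeError, ValueError):
        # Not JSON at all, so not the rejection payload.
        return None
    if not isinstance(payload, dict):
        return None
    if payload.get("error") != "websocket_not_supported":
        return None
    if "webrtc" not in payload.get("supported_transports", []):
        return None
    return payload.get("webrtc_endpoint")
```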

### WebRTC Connection (PRIMARY - Daily.co)
**POST** `/connect`

Creates a Daily.co room and returns connection details.

**Request:**

`language` accepts `"te"`, `"en"`, or `"hi"`.

```json
{
  "language": "te"
}
```

**Response:**
```json
{
  "room_url": "https://maya-c.daily.co/xxxxx",
  "token": "xxxxxxxx"
}
```
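
For reference, here is a minimal client sketch for this endpoint using only the standard library. The base URL, the `Content-Type` header, and the function names are assumptions; the request and response fields follow the examples above.

```python
import json
import urllib.request

SUPPORTED_LANGUAGES = ("te", "en", "hi")


def build_connect_request(base_url, language="en"):
    """Build the POST /connect request without sending it."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    body = json.dumps({"language": language}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/connect",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def connect(base_url, language="en", timeout=10):
    """Send the request and return (room_url, token) from the response."""
    request = build_connect_request(base_url, language)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        details = json.load(response)
    return details["room_url"], details["token"]
```

With the server running locally, `connect("http://localhost:3006", "te")` would return the room URL and token to hand to a Daily.co client.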

## Configuration Files

### Bot Implementations
- `bot_fast_api.py` - FastAPI bot (HTTP mode, no transport)
- `bot_webrtc_daily.py` - WebRTC/Daily.co bot (PRIMARY)
- `bot_websocket_server.py` - ⚠️ **DEPRECATED** - WebSocket not supported

### Core Services
- `custom_tts.py` - Veena3 TTS integration (zero-buffering, immediate streaming)
- `tts_verifier.py` - TTS output verification (stub)
- `server.py` - Main FastAPI server (WebSocket disabled, WebRTC only)

## Customization

### Change Voice
Edit the `voice_description` in `bot_fast_api.py` or `bot_webrtc_daily.py`:

```python
tts = CustomTTSService(
    api_url=os.getenv("TTS_API_URL", "http://localhost:8000/v1/tts/stream"),
    voice_description="Your custom voice description here",
    temperature=0.4,
    # ... other params
)
```

### Change LLM
Uncomment the desired LLM in the bot files:
- OpenAI
- Google Gemini
- OpenRouter (default)

### Adjust System Prompt
Modify the `SYSTEM_INSTRUCTION` in the bot files to change agent behavior.

## Logging

Logs are written to:
- `logs/server.log` - Server startup and connection logs
- `logs/bot.log` - Detailed bot operation logs (DEBUG level)

## Troubleshooting

### TTS 404 Errors
Ensure `TTS_API_URL` in `.env` points to a valid endpoint:
```bash
TTS_API_URL=http://localhost:8000/v1/tts/stream
```

### WebRTC Connection Issues
Verify that `DAILY_API_KEY` is set correctly in `.env`.

### Audio Quality Issues
Check that the sample rate matches between the TTS output (24kHz) and the bot configuration.
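
A rate mismatch usually shows up as audio that is too fast or too slow. The arithmetic below is a small sketch of why, assuming 16-bit mono PCM (the sample width and channel count are not stated in this README); the function names are illustrative.

```python
TTS_SAMPLE_RATE = 24_000  # Hz, from the TTS settings
BYTES_PER_SAMPLE = 2      # assumption: 16-bit mono PCM


def chunk_duration_ms(num_bytes, sample_rate=TTS_SAMPLE_RATE):
    """Duration in milliseconds of a raw PCM chunk of `num_bytes` bytes."""
    return num_bytes * 1000 / (BYTES_PER_SAMPLE * sample_rate)


def playback_speed(tts_rate, output_rate):
    """Speed factor heard when audio made at `tts_rate` is played at `output_rate`."""
    return output_rate / tts_rate
```

For example, 24kHz audio played by a transport configured for 48kHz would run at `playback_speed(24_000, 48_000)`, that is, twice the intended speed.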

## Production Deployment

For production deployment with nginx:
- Use SSL/TLS certificates
- Configure proper CORS settings
- Enable rate limiting
- Set up monitoring and alerting

See `COMPLETE_SETUP.md` for the full production setup guide.

## License

Copyright (c) 2025, Maya Research

## Important Notes

### WebSocket Disabled
WebSocket transport has been permanently disabled in favor of WebRTC for:
- Lower latency
- Better audio quality
- NAT traversal support
- Industry-standard implementation

See `WEBSOCKET_DISABLED.md` for full details on the migration.

### Zero-Buffering TTS
The TTS integration has been optimized for immediate streaming:
- No 20ms frame buffering
- Chunks sent as received from API
- WebRTC handles packetization
- Minimum latency configuration
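
The difference between the two approaches can be sketched with plain async generators. This is an illustration of the idea only, not the code in `custom_tts.py`; the function names are hypothetical, and the 960-byte frame size assumes 20ms of 16-bit mono audio at 24kHz.

```python
import asyncio


async def passthrough(chunks):
    """Zero-buffering: yield each non-empty chunk as soon as it arrives."""
    async for chunk in chunks:
        if chunk:
            yield chunk


async def fixed_frames(chunks, frame_bytes=960):
    """For contrast: hold audio back until a full frame is available."""
    pending = b""
    async for chunk in chunks:
        pending += chunk
        while len(pending) >= frame_bytes:
            yield pending[:frame_bytes]
            pending = pending[frame_bytes:]
    if pending:
        # Flush whatever is left when the stream ends.
        yield pending
```

With `passthrough`, the first audio bytes leave as soon as the API sends them, whereas `fixed_frames` delays a short first chunk until enough data has accumulated to fill a frame.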

## Support

For issues and questions, please open an issue on GitHub.

