# Maya3 Audio Diarization Pipeline

High-throughput, GPU-accelerated audio diarization and segmentation pipeline. It processes podcast audio from R2 storage, performs speaker diarization and music detection, and outputs clean speaker-segmented data.

## Performance

- **361x real-time** throughput (8x RTX 4090)
- **90+ hours** of audio processed in ~15 minutes
- Distributed processing via Supabase queue

## Architecture

```
R2 (Source Audio) → Supabase Queue → GPU Workers → R2 (Output .tar)
                         ↓
                    Status Updates
```

## Pipeline Stages

| Stage | Description |
|-------|-------------|
| 1. Download | Fetch audio from R2 bucket |
| 2. VAD | Voice activity detection (Silero) |
| 3. Chunking | Split into processable chunks |
| 4. Diarization | Speaker identification (PyAnnote) |
| 5. Embeddings | Speaker embedding extraction |
| 6. Clustering | Speaker clustering & refinement |
| 7. Music Detection | INA + PANNs hybrid detection |
| 8. Finalize | Quality filtering & validation |
| 9. Output | Generate segmented JSON |
| 10. Export | Upload .tar to R2 |
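As a rough sketch, the stages above can be modeled as an ordered list of functions threading a shared context dict. The function names, dict keys, and dummy values below are illustrative only, not the actual `pipeline.py` API:

```python
from typing import Callable

# Illustrative stage functions; the real implementations live in src/.
# Each stage reads from and writes to a shared context dict.
def download(ctx: dict) -> dict:
    ctx["audio"] = f"/tmp/{ctx['video_id']}.wav"  # placeholder path
    return ctx

def vad(ctx: dict) -> dict:
    ctx["speech_regions"] = [(0.0, 12.5), (13.1, 40.0)]  # dummy VAD output
    return ctx

# Only the first two stages are sketched; diarization, embeddings,
# clustering, music detection, finalize, output, and export would follow.
STAGES: list[Callable[[dict], dict]] = [download, vad]

def run_pipeline(video_id: str) -> dict:
    ctx = {"video_id": video_id, "completed": []}
    for stage in STAGES:
        ctx = stage(ctx)
        ctx["completed"].append(stage.__name__)
    return ctx
```

Running all stages in a single process per video keeps GPU state (loaded models) warm between stages.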

## Quick Start

### Prerequisites

- NVIDIA GPU with CUDA 12.1+
- Docker with NVIDIA Container Toolkit
- Hugging Face token (for PyAnnote models)
- Supabase project (for queue)
- Cloudflare R2 bucket (for storage)

### Setup

1. Clone and configure:
```bash
cp .env.example .env
# Edit .env with your credentials
```

2. Build container:
```bash
docker-compose build
```

3. Run workers:
```bash
docker-compose up -d
```

### Local Docker Smoke Test

On a machine with Docker + NVIDIA Container Toolkit:

```bash
chmod +x scripts/docker/smoke_test.sh
./scripts/docker/smoke_test.sh
```

Optional end-to-end test (processes one Supabase-queued video on GPU 0 and validates the TAR contents):

```bash
E2E=1 GPU_DEVICE=0 ./scripts/docker/smoke_test.sh
```

### Environment Variables

| Variable | Description |
|----------|-------------|
| `HF_TOKEN` | Hugging Face API token |
| `URL` | Supabase project URL |
| `SUPABASE_ADMIN` | Supabase service role key |
| `ACCOUNT_ID` | Cloudflare account ID |
| `API_TOKEN` | Cloudflare R2 API token |
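A minimal sketch of loading these variables into a typed settings object, assuming all five are required at startup. The `Settings` class is hypothetical; the actual loader lives in `src/config.py`:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    hf_token: str
    supabase_url: str
    supabase_admin: str
    r2_account_id: str
    r2_api_token: str

    @classmethod
    def from_env(cls) -> "Settings":
        def require(name: str) -> str:
            # Fail fast on a missing credential instead of mid-pipeline.
            value = os.environ.get(name)
            if not value:
                raise RuntimeError(f"Missing required environment variable: {name}")
            return value

        return cls(
            hf_token=require("HF_TOKEN"),
            supabase_url=require("URL"),
            supabase_admin=require("SUPABASE_ADMIN"),
            r2_account_id=require("ACCOUNT_ID"),
            r2_api_token=require("API_TOKEN"),
        )
```

Validating up front means a worker with a bad `.env` exits immediately rather than failing after downloading audio.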

## Local Development

```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run with Supabase queue
python massive_process.py --supabase-queue --output ./output
```

## Output Format

Each processed video produces a `.tar` containing:
- `{video_id}_segments.json` - Speaker-segmented data
- Metadata and quality statistics
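A hedged sketch of reading segments back out of an output archive using only the standard library. The `segments` key and the per-segment fields shown are assumptions, since the exact JSON schema isn't documented here:

```python
import io
import json
import tarfile

def read_segments(tar_bytes: bytes, video_id: str) -> list[dict]:
    """Extract the speaker-segment list from a pipeline output .tar."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
        member = tar.extractfile(f"{video_id}_segments.json")
        return json.load(member)["segments"]

def make_demo_tar(video_id: str) -> bytes:
    """Build a tiny in-memory .tar mimicking the assumed output layout."""
    payload = json.dumps({
        "segments": [
            {"speaker": "SPEAKER_00", "start": 0.0, "end": 4.2, "is_music": False},
        ]
    }).encode()
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=f"{video_id}_segments.json")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()
```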

## Scaling

Workers can be deployed across multiple machines. Each worker:
1. Claims videos from Supabase queue
2. Processes independently
3. Reports completion/failure back to Supabase
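The claim/process/report cycle can be sketched like this, with an in-memory queue standing in for the Supabase table (in production the claim and report calls would go through `src/supabase_client.py`):

```python
import queue
from typing import Callable

def worker_loop(jobs: "queue.Queue[str]",
                results: dict,
                process: Callable[[str], None]) -> None:
    """Claim videos one at a time, process, and report until the queue drains.

    get_nowait() stands in for an atomic claim: each video is handed to
    exactly one worker, so multiple workers can drain the same queue safely.
    """
    while True:
        try:
            video_id = jobs.get_nowait()  # atomic claim
        except queue.Empty:
            return  # queue drained; worker exits
        try:
            process(video_id)
            results[video_id] = "completed"
        except Exception as exc:
            # Report failure instead of crashing, so other videos still run.
            results[video_id] = f"failed: {exc}"
```

Because each worker processes independently and only coordinates through the queue, adding machines scales throughput near-linearly until the queue empties.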

## Files

```
├── massive_process.py    # Main entry point
├── pipeline.py           # Core pipeline logic
├── src/
│   ├── config.py          # Configuration
│   ├── download.py        # R2 download
│   ├── vad.py             # Voice activity detection
│   ├── diarization.py     # Speaker diarization
│   ├── embeddings.py      # Speaker embeddings
│   ├── clustering.py      # Speaker clustering
│   ├── music_detection.py # Music detection
│   ├── export.py          # Output generation
│   ├── r2_client.py       # R2 storage client
│   ├── supabase_client.py # Supabase queue client
│   └── logger.py          # Clean logging
├── Dockerfile            # Container build
├── docker-compose.yml    # Container orchestration
└── requirements.txt      # Python dependencies
```

## License

MIT
