# Maya3 Transcription Pipeline

Audio transcription pipeline for Indian languages using Google Gemini AI.

## Production Configuration

```
Model: gemini-3-flash-preview
Temperature: 0.0 (deterministic)
Thinking: low (fast, prevents loops)
Scoring: CTC-based alignment (0.91 mean alignment score on 20 Telugu test segments)
```

## Quick Start

```bash
# Setup
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Transcribe with scoring
python transcription_utils.py audio.flac --language Telugu
```

## Python API

```python
from transcription_utils import transcribe_audio, transcribe_with_scoring

# Basic transcription
result = transcribe_audio("audio.flac", language="Telugu")
print(result['native_transcription'])
print(result['romanized'])

# With alignment scoring (recommended)
result = transcribe_with_scoring("audio.flac", language="Telugu")
print(f"Transcription: {result['native_transcription']}")
print(f"Score: {result['alignment_score']:.2f}")  # 0-1
print(f"Quality: {result['quality']}")  # high/medium/low
```
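
For more than one file, results can be grouped by their `quality` label. The sketch below is a minimal illustration and not part of the package: `group_by_quality` is a hypothetical helper, and it receives the transcription function as an argument so it can be used with `transcribe_with_scoring` or with a stub. It assumes each result is a dict with a `quality` key, as in the example above.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_quality(
    paths: Iterable[str],
    transcribe: Callable[[str], dict],
) -> Dict[str, List[dict]]:
    """Transcribe each path and bucket the results by their 'quality' label.

    Hypothetical helper: `transcribe` is any callable that returns a result
    dict containing a 'quality' key ('high', 'medium', or 'low').
    """
    groups: Dict[str, List[dict]] = defaultdict(list)
    for path in paths:
        result = transcribe(path)
        # Keep the source path next to the result so low-quality items
        # can be traced back to their audio file.
        groups[result["quality"]].append({"path": path, **result})
    return dict(groups)
```

With the real API this might be called as `group_by_quality(files, lambda p: transcribe_with_scoring(p, language="Telugu"))`.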

## Output Format

```json
{
  "native_transcription": "నాకు కొన్ని యాడ్స్ గుర్తుంటాయి",
  "native_with_punctuation": "నాకు కొన్ని యాడ్స్ గుర్తుంటాయి.",
  "code_switch": "నాకు కొన్ని ads గుర్తుంటాయి",
  "romanized": "naaku konni ads gurtuntaayi",
  "alignment_score": 0.8449,
  "quality": "high",
  "_metadata": {
    "model": "gemini-3-flash-preview",
    "processing_time_sec": 2.56
  }
}
```
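
Downstream code may want to check a scored result before storing it. The following is a rough sketch under stated assumptions: `parse_result` and `REQUIRED_KEYS` are hypothetical names, the key list is copied from the example record above, and treating every key as required is an assumption rather than a documented contract.

```python
import json

# Keys taken from the example record above; treating all of them as
# required is an assumption of this sketch.
REQUIRED_KEYS = {
    "native_transcription",
    "native_with_punctuation",
    "code_switch",
    "romanized",
    "alignment_score",
    "quality",
}


def parse_result(raw: str) -> dict:
    """Parse a JSON result string and check its basic shape."""
    record = json.loads(raw)
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    score = record["alignment_score"]
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"alignment_score out of range: {score}")
    if record["quality"] not in {"high", "medium", "low"}:
        raise ValueError(f"unexpected quality label: {record['quality']}")
    return record
```

Results from `transcribe_audio` without scoring would presumably lack `alignment_score` and `quality`, so this check only fits scored output.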

## Alignment Scoring

CTC-based alignment scoring assigns each transcription a score from 0 to 1. The score maps to a quality label and a recommended action:

| Score Range | Quality | Action |
|-------------|---------|--------|
| ≥ 0.8 | High | Accept automatically |
| ≥ 0.7 and < 0.8 | Medium | Accept with flag |
| < 0.7 | Low | Review manually |

**Test Results (20 Telugu segments):**
- Mean Score: **0.91**
- Min Score: **0.84**
- 75% of segments scored ≥ 0.9

```python
# Direct scoring API
from src.validators import AlignmentScorer

scorer = AlignmentScorer(language="te")  # "te" is the language code for Telugu
try:
    result = scorer.score_transcription("audio.flac", "transcription text")
    print(f"Score: {result.alignment_score:.4f}")
    print(f"Word confidences: {[w.confidence for w in result.word_scores]}")
finally:
    # Always call cleanup(), even if scoring raises
    scorer.cleanup()
```
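
The thresholds in the score table above can be written as a small function. This is a minimal sketch: `quality_from_score` and `ACTIONS` are hypothetical names, and whether `AlignmentScorer` applies exactly these boundaries internally is an assumption based on the table.

```python
def quality_from_score(score: float) -> str:
    """Map an alignment score in [0, 1] to a quality label.

    Boundaries follow the table: >= 0.8 is high, 0.7 up to (but not
    including) 0.8 is medium, anything below 0.7 is low.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score >= 0.8:
        return "high"
    if score >= 0.7:
        return "medium"
    return "low"


# Recommended action per label, taken from the "Action" column.
ACTIONS = {
    "high": "accept automatically",
    "medium": "accept with flag",
    "low": "review manually",
}
```

For the example record in the Output Format section, `quality_from_score(0.8449)` returns `"high"`, which matches its `quality` field.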

## Project Structure

```
maya3_transcribe/
├── transcription_utils.py      # Main API (production)
├── pipeline.py                 # Full pipeline orchestrator
├── requirements.txt
│
├── src/
│   ├── backend/                # Core modules
│   │   ├── config.py           # Environment config
│   │   ├── r2_storage.py       # R2 cloud storage
│   │   ├── supabase_client.py  # Supabase DB
│   │   ├── audio_processor.py  # Audio segmentation
│   │   └── gemini_transcriber.py
│   │
│   └── validators/             # Quality validation
│       ├── alignment_scorer.py # Primary: CTC scoring
│       ├── indicmfa_validator.py # Secondary: MFA alignment
│       └── indic_conformer_validator.py # Optional: ASR
│
└── analysis_results/
    ├── final_analysis.json     # Model comparison
    └── scoring_results.json    # Alignment scores
```

## Model Analysis Summary

| Model | Latency (temperature 0.0) | Notes |
|-------|----------------|-------|
| **gemini-3-flash-preview** | 2.8s | ✅ Production choice |
| gemini-2.5-flash | 3.7s | Good alternative |
| gemini-3-pro-preview | 6.8s | Premium quality |

## Environment Variables

```bash
# .env file
GEMINI_KEY=your_gemini_api_key
R2_ENDPOINT_URL=your_r2_endpoint
R2_BUCKET=your_bucket
R2_ACCESS_KEY_ID=your_access_key
R2_SECRET_ACCESS_KEY=your_secret
URL=your_supabase_url
SUPABASE_ADMIN=your_supabase_key
```
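
A quick check for missing variables can catch configuration mistakes before the pipeline starts. The sketch below is illustrative only: `missing_env_vars` and `REQUIRED_VARS` are hypothetical names, the sketch does not read the `.env` file itself, and treating all seven variables as required is an assumption (how `src/backend/config.py` loads them is not shown here).

```python
import os
from typing import List, Mapping, Optional

# Names copied from the .env example above.
REQUIRED_VARS = (
    "GEMINI_KEY",
    "R2_ENDPOINT_URL",
    "R2_BUCKET",
    "R2_ACCESS_KEY_ID",
    "R2_SECRET_ACCESS_KEY",
    "URL",
    "SUPABASE_ADMIN",
)


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or empty.

    Checks os.environ by default; a mapping can be passed in for testing.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

An empty return value means every listed variable is set; otherwise the list names what still needs a value.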

## Dependencies

```bash
pip install boto3 google-genai pydantic supabase pydub
pip install torch torchaudio transformers soundfile
```