# Veena3 Modal TTS - Cleanup & Migration Plan

## Current State Analysis

### Architecture Overview

```
spark/ (current repo)
├── veena3modal/           # Modal TTS application (KEEP - this becomes the new repo)
├── veena3srv/             # Django app (LEGACY - extract needed parts)
├── external/
│   ├── sparktts/          # Spark TTS BiCodec model (EXTERNAL DEPENDENCY)
│   └── AP-BWE/            # Super-resolution model (EXTERNAL DEPENDENCY)
└── models/                # Model weights on Modal Volume (NOT in repo)
```

### Dependencies from veena3srv (What Modal Imports)

| File | Lines | Purpose | Action |
|------|-------|---------|--------|
| **services/model_loader.py** | 192 | vLLM engine init, tokenizer | ✅ MOVE to veena3modal |
| **services/pipeline.py** | 241 | BiCodec token extraction, audio generation | ✅ MOVE to veena3modal |
| **services/bicodec_decoder.py** | 303 | Decode BiCodec tokens to audio | ✅ MOVE to veena3modal |
| **services/streaming_pipeline.py** | 830 | Streaming with sliding window | ✅ MOVE to veena3modal |
| **services/super_resolution.py** | 341 | 16kHz → 48kHz upsampling | ✅ MOVE to veena3modal |
| **services/long_text_processor.py** | 394 | Text chunking + stitching | ✅ MOVE to veena3modal |
| **utils/indic_prompt_builder.py** | 287 | Build Spark TTS prompts | ✅ MOVE to veena3modal |
| **utils/text_normalizer.py** | 869 | Number/date expansion | ✅ MOVE to veena3modal |
| **utils/text_chunker.py** | 428 | Indic-aware sentence splitting | ✅ MOVE to veena3modal |
| **utils/audio_utils.py** | 69 | WAV header creation | ✅ MOVE to veena3modal |
| **utils/audio_fade.py** | 97 | Crossfade for chunk stitching | ✅ MOVE to veena3modal |
| **utils/emotion_normalizer.py** | 178 | `<emotion>` → `[emotion]` | ✅ MOVE to veena3modal |
| **constants.py** | 162 | Speakers, tokens, config | ✅ MOVE to veena3modal |
| **services/snac_decoder.py** | 533 | Legacy SNAC decoder | ❌ DELETE (not used) |
| **utils/audio_encoder.py** | 408 | Opus/MP3/FLAC encoding | ⚠️ OPTIONAL (fallback to WAV) |

**Total Lines to Move: ~4,400 lines (13 files, excluding optional)**
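
As a quick sanity check, the line counts of the rows marked MOVE in the table above can be summed directly. The dictionary below only transcribes the table; the variable names are illustrative.

```python
# Line counts transcribed from the "Dependencies from veena3srv" table (MOVE rows only).
move_line_counts = {
    "services/model_loader.py": 192,
    "services/pipeline.py": 241,
    "services/bicodec_decoder.py": 303,
    "services/streaming_pipeline.py": 830,
    "services/super_resolution.py": 341,
    "services/long_text_processor.py": 394,
    "utils/indic_prompt_builder.py": 287,
    "utils/text_normalizer.py": 869,
    "utils/text_chunker.py": 428,
    "utils/audio_utils.py": 69,
    "utils/audio_fade.py": 97,
    "utils/emotion_normalizer.py": 178,
    "constants.py": 162,
}

total_to_move = sum(move_line_counts.values())
print(f"{len(move_line_counts)} files, {total_to_move} lines to move")
```

The optional `utils/audio_encoder.py` (408 lines) and the deleted `services/snac_decoder.py` are left out of the sum.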

### External Dependencies

| Dependency | Purpose | How to Handle |
|------------|---------|---------------|
| `external/sparktts/` | BiCodec model code | ✅ KEEP (MIT license, copy to new repo) |
| `external/AP-BWE/` | Super-resolution | ✅ KEEP (Apache 2.0, copy to new repo) |

### Processing Pipeline Steps

```
Input Text
    │
    ├─→ [Text Normalization] ─→ Numbers/dates to words (text_normalizer.py)
    │
    ├─→ [Emotion Normalization] ─→ <laugh> → [laughs] (emotion_normalizer.py)
    │
    ├─→ [Text Chunking] ─→ Split long text (text_chunker.py + long_text_processor.py)
    │       └─→ Indic-aware sentence boundaries
    │
    ├─→ [Prompt Building] ─→ Spark TTS format (indic_prompt_builder.py)
    │       └─→ <|task_controllable_tts|><|start_content|>...<|speaker_X|>...
    │
    ├─→ [vLLM Generation] ─→ Generate BiCodec tokens (model_loader.py)
    │
    ├─→ [Token Extraction] ─→ Regex extract semantic/global IDs (pipeline.py)
    │
    ├─→ [BiCodec Decode] ─→ Tokens → 16kHz audio (bicodec_decoder.py)
    │
    ├─→ [Audio Stitching] ─→ Crossfade chunks (audio_fade.py)
    │
    └─→ [Super Resolution] ─→ 16kHz → 48kHz (super_resolution.py) [OPTIONAL]
           └─→ AP-BWE neural upsampling
```
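
The token extraction step is the least self-explanatory part of the diagram, so here is a minimal sketch of it. It assumes the generated text contains tokens of the form `<|bicodec_semantic_N|>` and `<|bicodec_global_N|>`, as in upstream Spark-TTS; the actual patterns in `pipeline.py` may differ, and `extract_bicodec_ids` is a hypothetical name.

```python
import re
from typing import List, Tuple

# Assumed token formats; verify against pipeline.py before relying on them.
SEMANTIC_RE = re.compile(r"<\|bicodec_semantic_(\d+)\|>")
GLOBAL_RE = re.compile(r"<\|bicodec_global_(\d+)\|>")


def extract_bicodec_ids(generated_text: str) -> Tuple[List[int], List[int]]:
    """Return (semantic_ids, global_ids), each in order of appearance."""
    semantic_ids = [int(m) for m in SEMANTIC_RE.findall(generated_text)]
    global_ids = [int(m) for m in GLOBAL_RE.findall(generated_text)]
    return semantic_ids, global_ids


sample = (
    "<|bicodec_global_7|><|bicodec_global_12|>"
    "<|bicodec_semantic_101|><|bicodec_semantic_5|>"
)
semantic, global_ = extract_bicodec_ids(sample)
print("semantic:", semantic)
print("global:", global_)
```

Both ID lists would then be handed to the BiCodec decoder to produce 16kHz audio.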

---

## New Repository Structure

```
veena3-tts/                     # New repo name
├── README.md                   # Project overview, API docs
├── LICENSE
├── pyproject.toml              # Python package config
├── requirements.txt
│
├── veena3modal/
│   ├── __init__.py
│   ├── app.py                  # Modal entrypoint
│   │
│   ├── core/                   # Core TTS components (from veena3srv)
│   │   ├── __init__.py
│   │   ├── constants.py        # Speakers, tokens, config
│   │   ├── model_loader.py     # vLLM engine init
│   │   ├── pipeline.py         # Non-streaming generation
│   │   ├── streaming_pipeline.py # Streaming generation
│   │   ├── bicodec_decoder.py  # Token → audio decoding
│   │   └── super_resolution.py # 16kHz → 48kHz
│   │
│   ├── processing/             # Text processing
│   │   ├── __init__.py
│   │   ├── text_normalizer.py  # Number/date expansion
│   │   ├── text_chunker.py     # Indic-aware chunking
│   │   ├── long_text_processor.py
│   │   ├── emotion_normalizer.py
│   │   └── prompt_builder.py   # Spark TTS prompts
│   │
│   ├── audio/                  # Audio utilities
│   │   ├── __init__.py
│   │   ├── utils.py            # WAV header, format helpers
│   │   ├── crossfade.py        # Chunk stitching
│   │   └── encoder.py          # Opus/MP3/FLAC (optional)
│   │
│   ├── api/                    # FastAPI endpoints
│   │   ├── __init__.py
│   │   ├── fastapi_app.py
│   │   ├── schemas.py
│   │   ├── auth.py
│   │   ├── rate_limiter.py
│   │   ├── error_handlers.py
│   │   └── websocket_handler.py
│   │
│   ├── services/               # Business logic
│   │   ├── __init__.py
│   │   ├── tts_runtime.py      # Singleton runtime
│   │   ├── sentence_store.py   # Supabase logging
│   │   └── credits.py
│   │
│   └── shared/                 # Shared utilities
│       ├── __init__.py
│       ├── logging.py
│       └── metrics.py
│
├── external/                   # Third-party model code
│   ├── sparktts/               # BiCodec implementation
│   │   ├── models/
│   │   │   ├── __init__.py
│   │   │   ├── audio_tokenizer.py
│   │   │   └── bicodec.py
│   │   └── modules/            # Neural network components
│   │
│   └── ap_bwe/                 # Super-resolution
│       ├── models/
│       │   └── model.py
│       ├── datasets/
│       │   └── dataset.py
│       └── env.py
│
└── tests/
    ├── unit/
    ├── integration/
    └── modal_live/             # Live endpoint tests
```

---

## Migration Steps

### Phase 1: Prepare New Structure (Day 1)

1. **Create new directory structure**
   ```bash
   mkdir -p veena3modal/{core,processing,audio}
   ```

2. **Move veena3srv code to veena3modal/core/**
   - `constants.py`
   - `model_loader.py`
   - `pipeline.py`
   - `streaming_pipeline.py`
   - `bicodec_decoder.py`
   - `super_resolution.py`

3. **Move text-processing modules (utils plus `long_text_processor.py` from services) to veena3modal/processing/**
   - `text_normalizer.py`
   - `text_chunker.py`
   - `long_text_processor.py`
   - `emotion_normalizer.py`
   - `indic_prompt_builder.py` → `prompt_builder.py`

4. **Move audio utils to veena3modal/audio/**
   - `audio_utils.py` → `utils.py`
   - `audio_fade.py` → `crossfade.py`
   - `audio_encoder.py` → `encoder.py` (optional)
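
The Phase 1 steps can be scripted so the copy is repeatable. The sketch below assumes the legacy code lives under `veena3srv/apps/inference/` (inferred from the old import paths) and copies rather than moves, so the originals stay as a backup. `FILE_MAP` and `copy_files` are hypothetical names.

```python
import shutil
from pathlib import Path

# Assumed legacy location of the inference code inside veena3srv.
LEGACY_ROOT = Path("veena3srv/apps/inference")
NEW_ROOT = Path("veena3modal")

# Old path (relative to LEGACY_ROOT) -> new path (relative to NEW_ROOT).
FILE_MAP = {
    "constants.py": "core/constants.py",
    "services/model_loader.py": "core/model_loader.py",
    "services/pipeline.py": "core/pipeline.py",
    "services/streaming_pipeline.py": "core/streaming_pipeline.py",
    "services/bicodec_decoder.py": "core/bicodec_decoder.py",
    "services/super_resolution.py": "core/super_resolution.py",
    "services/long_text_processor.py": "processing/long_text_processor.py",
    "utils/text_normalizer.py": "processing/text_normalizer.py",
    "utils/text_chunker.py": "processing/text_chunker.py",
    "utils/emotion_normalizer.py": "processing/emotion_normalizer.py",
    "utils/indic_prompt_builder.py": "processing/prompt_builder.py",
    "utils/audio_utils.py": "audio/utils.py",
    "utils/audio_fade.py": "audio/crossfade.py",
}


def copy_files(legacy_root, new_root, file_map=FILE_MAP, dry_run=True):
    """Copy each mapped file; return the (src, dst) pairs that were found."""
    handled = []
    for old_rel, new_rel in file_map.items():
        src = Path(legacy_root) / old_rel
        dst = Path(new_root) / new_rel
        if not src.is_file():
            print(f"MISSING     {src}")
            continue
        print(f"{'WOULD COPY' if dry_run else 'COPY'}  {src} -> {dst}")
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
        handled.append((src, dst))
    return handled


if __name__ == "__main__":
    # Dry run by default: prints the plan without touching any files.
    copy_files(LEGACY_ROOT, NEW_ROOT, dry_run=True)
```

Running it with `dry_run=True` first shows which source files are missing before anything is written.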

### Phase 2: Update Imports (Day 1-2)

1. **Update all imports** from `apps.inference.*` to the matching `veena3modal.core.*`, `veena3modal.processing.*`, or `veena3modal.audio.*` module (see "Import Path Changes")

2. **Update tts_runtime.py** to use new import paths

3. **Update Modal app.py** to remove veena3srv from PYTHONPATH

4. **Test locally** before deploying
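
Before testing locally, it may help to confirm that no legacy imports survived the update. The following is a minimal sketch that parses each file with `ast` and reports any import of `apps.inference` or its submodules; `find_legacy_imports` and `scan_tree` are hypothetical helper names.

```python
import ast
from pathlib import Path
from typing import Dict, List, Tuple

LEGACY_PREFIX = "apps.inference"


def find_legacy_imports(source: str, prefix: str = LEGACY_PREFIX) -> List[Tuple[int, str]]:
    """Return (line_number, module) for every import of `prefix` or a submodule."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            modules = [node.module]
        elif isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        else:
            continue
        for module in modules:
            if module == prefix or module.startswith(prefix + "."):
                hits.append((node.lineno, module))
    return sorted(hits)


def scan_tree(root: Path) -> Dict[str, List[Tuple[int, str]]]:
    """Map each .py file under `root` to its legacy imports; clean files are omitted."""
    report = {}
    for path in sorted(Path(root).rglob("*.py")):
        hits = find_legacy_imports(path.read_text(encoding="utf-8"))
        if hits:
            report[str(path)] = hits
    return report
```

Calling `scan_tree(Path("veena3modal"))` should return an empty dict once every import has been updated.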

### Phase 3: Clean External Dependencies (Day 2)

1. **Simplify sparktts/**
   - Only keep: `models/audio_tokenizer.py`, `models/bicodec.py`, `modules/`
   - Remove CLI tools, unnecessary files

2. **Simplify AP-BWE/** (becomes `external/ap_bwe/` in the new repo)
   - Only keep: `models/model.py`, `datasets/dataset.py`, `env.py`, `configs/`
   - Remove training code, docs, samples
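
The keep-lists above can be turned into a dry-run report of what would be removed. This is a sketch under the assumption that keep entries are paths relative to each dependency's root; `plan_prune` is a hypothetical helper, and `models/__init__.py` is added to the sparktts list because the new repository tree shows it.

```python
from pathlib import Path
from typing import Iterable, List


def plan_prune(root: Path, keep: Iterable[str]) -> List[Path]:
    """List files under `root` that are not covered by the keep-list.

    Entries ending in "/" keep a whole directory; other entries keep one file.
    """
    keep = list(keep)
    doomed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        kept = any(
            rel.startswith(entry) if entry.endswith("/") else rel == entry
            for entry in keep
        )
        if not kept:
            doomed.append(path)
    return doomed


SPARKTTS_KEEP = [
    "models/__init__.py",
    "models/audio_tokenizer.py",
    "models/bicodec.py",
    "modules/",
]
AP_BWE_KEEP = ["models/model.py", "datasets/dataset.py", "env.py", "configs/"]

if __name__ == "__main__":
    targets = [
        (Path("external/sparktts"), SPARKTTS_KEEP),
        (Path("external/AP-BWE"), AP_BWE_KEEP),
    ]
    for root, keep in targets:
        if root.is_dir():
            for path in plan_prune(root, keep):
                print(f"would remove {path}")
```

The script only prints candidates. Review the list (license files in particular should stay) before deleting anything.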

### Phase 4: Create New Repository (Day 2-3)

1. **Initialize new git repo**
   ```bash
   mkdir -p veena3-tts
   cd veena3-tts
   git init
   git remote add origin <new-repo-url>
   ```

2. **Copy cleaned files** (not git history)

3. **Add proper README, LICENSE**

4. **Update Modal deployment**
   - Point to new repo structure
   - Update PYTHONPATH in image

### Phase 5: Verify & Deploy (Day 3)

1. **Run all tests**
   ```bash
   pytest tests/ -v
   ```

2. **Deploy to Modal**
   ```bash
   modal deploy veena3modal/app.py
   ```

3. **Verify endpoints**
   - Health check
   - Generate (streaming/non-streaming)
   - SR functionality

4. **Update documentation**
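
The health check in step 3 can be automated with the standard library. The sketch below is only illustrative: the `/health` path and the base URL are assumptions, not taken from the actual FastAPI app, and `check_endpoint` is a hypothetical helper.

```python
import urllib.error
import urllib.request


def endpoint_url(base_url: str, path: str = "/health") -> str:
    """Join a deployment base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def check_endpoint(base_url: str, path: str = "/health", timeout: float = 10.0) -> int:
    """GET the endpoint and return its HTTP status code.

    HTTP error responses (4xx/5xx) are returned as codes; connection
    failures raise urllib.error.URLError.
    """
    url = endpoint_url(base_url, path)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

After `modal deploy`, calling `check_endpoint("<deployment-url>")` and expecting `200` gives a quick pass/fail signal before the generation and SR endpoints are tested by hand.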

---

## Files NOT Needed (Can Delete)

### From veena3srv/
- `apps/api/` - Django REST views (replaced by FastAPI)
- `apps/inference/services/snac_decoder.py` - Legacy SNAC (replaced by BiCodec)
- `apps/inference/services/model_singleton.py` - Django singleton pattern
- `apps/inference/services/logits_processor.py` - Not used in Spark TTS
- `apps/inference/utils/request_logger.py` - Django-specific
- `apps/inference/utils/prompt_builder.py` - Replaced by indic_prompt_builder
- `apps/inference/utils/spark_prompt_builder.py` - Duplicate
- `apps/inference/utils/emotion_mapper.py` - Legacy
- All Django config (`settings/`, `manage.py`, `urls.py`, etc.)

### From root/
- `django_test.py` - Django test runner
- `.cursor/` - IDE config
- `test_*.py` files (move to `tests/` rather than delete)
- `models/` - These are on Modal volume, not in repo

---

## Import Path Changes

| Old Import | New Import |
|------------|------------|
| `from apps.inference.services.model_loader import SparkTTSModel` | `from veena3modal.core.model_loader import SparkTTSModel` |
| `from apps.inference.services.pipeline import SparkTTSPipeline` | `from veena3modal.core.pipeline import SparkTTSPipeline` |
| `from apps.inference.services.bicodec_decoder import BiCodecDecoder` | `from veena3modal.core.bicodec_decoder import BiCodecDecoder` |
| `from apps.inference.services.streaming_pipeline import Veena3SlidingWindowPipeline` | `from veena3modal.core.streaming_pipeline import Veena3SlidingWindowPipeline` |
| `from apps.inference.services.super_resolution import SuperResolutionService` | `from veena3modal.core.super_resolution import SuperResolutionService` |
| `from apps.inference.services.long_text_processor import LongTextProcessor` | `from veena3modal.processing.long_text_processor import LongTextProcessor` |
| `from apps.inference.utils.indic_prompt_builder import IndicPromptBuilder` | `from veena3modal.processing.prompt_builder import IndicPromptBuilder` |
| `from apps.inference.utils.text_normalizer import normalize_text` | `from veena3modal.processing.text_normalizer import normalize_text` |
| `from apps.inference.utils.text_chunker import IndicSentenceChunker` | `from veena3modal.processing.text_chunker import IndicSentenceChunker` |
| `from apps.inference.utils.audio_utils import create_wav_header` | `from veena3modal.audio.utils import create_wav_header` |
| `from apps.inference.utils.audio_fade import crossfade_bytes_int16` | `from veena3modal.audio.crossfade import crossfade_bytes_int16` |
| `from apps.inference.constants import *` | `from veena3modal.core.constants import *` |
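
The table above can be applied mechanically. Here is a minimal sketch that rewrites module paths on import lines only, so string literals and comments that mention the old paths are left alone. `MODULE_MAP` transcribes the table; `rewrite_imports` is a hypothetical helper, and multi-line or dynamic imports would still need a manual pass.

```python
import re

# Old module -> new module, transcribed from the import path table.
MODULE_MAP = {
    "apps.inference.services.model_loader": "veena3modal.core.model_loader",
    "apps.inference.services.pipeline": "veena3modal.core.pipeline",
    "apps.inference.services.bicodec_decoder": "veena3modal.core.bicodec_decoder",
    "apps.inference.services.streaming_pipeline": "veena3modal.core.streaming_pipeline",
    "apps.inference.services.super_resolution": "veena3modal.core.super_resolution",
    "apps.inference.services.long_text_processor": "veena3modal.processing.long_text_processor",
    "apps.inference.utils.indic_prompt_builder": "veena3modal.processing.prompt_builder",
    "apps.inference.utils.text_normalizer": "veena3modal.processing.text_normalizer",
    "apps.inference.utils.text_chunker": "veena3modal.processing.text_chunker",
    "apps.inference.utils.audio_utils": "veena3modal.audio.utils",
    "apps.inference.utils.audio_fade": "veena3modal.audio.crossfade",
    "apps.inference.constants": "veena3modal.core.constants",
}

# Longest keys first so a more specific module wins over a shorter prefix.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(MODULE_MAP, key=len, reverse=True))
    + r")\b"
)


def rewrite_imports(source: str) -> str:
    """Rewrite legacy module paths on lines that start with `from` or `import`."""
    out = []
    for line in source.splitlines(keepends=True):
        if line.lstrip().startswith(("from ", "import ")):
            line = _PATTERN.sub(lambda m: MODULE_MAP[m.group(1)], line)
        out.append(line)
    return "".join(out)


before = "from apps.inference.utils.audio_fade import crossfade_bytes_int16\n"
print(rewrite_imports(before), end="")
```

Imported names such as `IndicPromptBuilder` are unchanged by the migration, so only the module path needs rewriting.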

---

## Checklist

### Pre-Migration
- [ ] All 50 tests passing
- [ ] SR working with chunking
- [ ] Document current working state

### Migration
- [ ] Create new directory structure
- [ ] Copy files (not move, to keep backup)
- [ ] Update all import statements
- [ ] Update Modal app.py PYTHONPATH
- [ ] Test locally with `modal serve`

### Post-Migration
- [ ] Run full test suite
- [ ] Deploy to Modal
- [ ] Verify all endpoints work
- [ ] Create new GitHub repo
- [ ] Push clean code
- [ ] Update Modal deployment source

### Cleanup Old Repo
- [ ] Archive old repo
- [ ] Remove veena3srv from new repo
- [ ] Remove unused files
- [ ] Final documentation

---

## Questions to Resolve

1. **New repo name?** `veena3-tts`, `spark-tts-modal`, `maya-tts`?
2. **Keep Django app?** Or fully remove and only keep Modal?
3. **Model weights storage?** Currently on Modal Volume - keep as is?
4. **CI/CD?** GitHub Actions for tests?
5. **Versioning?** Semantic versioning for API?

---

## Timeline Estimate

| Phase | Duration | Tasks |
|-------|----------|-------|
| Phase 1 | 2-3 hours | Create structure, move files |
| Phase 2 | 2-3 hours | Update imports, fix dependencies |
| Phase 3 | 1-2 hours | Clean external dependencies |
| Phase 4 | 1 hour | Create new repo, push |
| Phase 5 | 1-2 hours | Deploy, verify, document |

**Total: ~7-11 hours**

---

*Created: Dec 25, 2025*
*Status: Ready for Review*