---
name: Pipeline Finalization Plan
overview: Assess pipeline readiness, wire in the AudioPolisher, fix two bugs, and identify what's left before production runs. No over-engineering - only changes that directly improve transcription quality.
todos:
  - id: wire-polisher
    content: Wire AudioPolisher into pipeline.py between Step 3 (AudioProcessor) and Step 4 (Transcribe). Add PipelineConfig.polish_audio flag (default True). Log polisher stats.
    status: pending
  - id: fix-english-ratio-bug
    content: "Fix simple_validator.py line 221: remove the english_ratio value that is read from the check_characters result and passed to the ValidationResult constructor. check_characters does not return this key."
    status: pending
  - id: fix-temperature-default
    content: Change TranscriptionConfig.temperature default from 1.0 to 0.0 in gemini_transcriber.py to match pipeline behavior and prevent accidental non-determinism.
    status: pending
  - id: add-resample-16k
    content: Add optional 16kHz resample to AudioPolisher.polish() - downsample from 48kHz before writing output. Cuts file size about 3x and may help Gemini's internal encoder.
    status: pending
  - id: cleanup-test-artifacts
    content: Remove test wav/png files from project root (spectral_speaker_analysis.png, audio_polisher_v2_final.png, wav files, polished_test dirs).
    status: pending
---

# Pipeline Finalization Plan

## Current Pipeline (what exists and works)

```mermaid
flowchart LR
    R2[R2 Download] --> Lang[Supabase Language]
    Lang --> AP[AudioProcessor\nsplit >10s segments]
    AP --> Gemini[Gemini Transcribe\n4 output formats]
    Gemini --> Val[Validator\nchar + CTC alignment]
    Val --> JSON[JSON Output]
```

**All 5 stages work end-to-end.** Tested on video `pF_BQpHaIdU` (426 segments). The core is solid.

## What's missing: AudioPolisher is built but NOT wired in

The polisher ([src/backend/audio_polisher.py](src/backend/audio_polisher.py)) exists and is tested, but [pipeline.py](pipeline.py) never calls it. Audio goes straight from AudioProcessor to Gemini with no boundary cleanup.

**The fix is 10 lines** in `pipeline.py` between Step 3 and Step 4:

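```python
# Step 3.5: Polish audio boundaries (between Step 3 and Step 4)
import os  # both imports belong with the others at the top of pipeline.py

from src.backend.audio_polisher import AudioPolisher

polisher = AudioPolisher()
polished_dir = os.path.join(segments_dir, "polished")
os.makedirs(polished_dir, exist_ok=True)  # harmless if polish() already creates it
for chunk in chunks:
    result = polisher.polish(chunk.file_path, output_dir=polished_dir)
    # Unmodified chunks keep their original file path
    if result.was_modified:
        chunk.file_path = result.output_path
```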
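
The `wire-polisher` todo also asks for a `PipelineConfig.polish_audio` flag and logged polisher stats. Here is a minimal sketch of how the loop could be wrapped to cover both. It assumes `polish()` returns an object with `was_modified` and `output_path`, as in the snippet above. The `polish_chunks` helper, the stats keys, and the one-field `PipelineConfig` are hypothetical stand-ins, not the real code:

```python
import logging
import os
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class PipelineConfig:
    # Only the flag named in this plan; the real config has other fields.
    polish_audio: bool = True


def polish_chunks(chunks, polisher, segments_dir, config):
    """Polish each chunk in place and return simple counters (hypothetical helper)."""
    stats = {"total": len(chunks), "modified": 0}
    if not config.polish_audio:
        logger.info("Polisher disabled, %d chunks left untouched", stats["total"])
        return stats

    polished_dir = os.path.join(segments_dir, "polished")
    os.makedirs(polished_dir, exist_ok=True)
    for chunk in chunks:
        result = polisher.polish(chunk.file_path, output_dir=polished_dir)
        if result.was_modified:
            # Point the chunk at the cleaned file so Step 4 transcribes it.
            chunk.file_path = result.output_path
            stats["modified"] += 1

    logger.info("Polisher modified %d of %d chunks", stats["modified"], stats["total"])
    return stats
```

Returning the counters keeps the log line in one place and lets the final JSON output record how many segments were actually changed, if that turns out to be useful.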

## Two bugs to fix

**Bug 1: `english_ratio` reference in validator** - [src/validators/simple_validator.py](src/validators/simple_validator.py) line 221 reads `char_check["english_ratio"]`, but `check_characters()` never returns that key. Validation therefore crashes whenever a transcription is rejected.
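
A small reproduction of the failure mode and the shape of the fix. This is a sketch that assumes `char_check` is a plain dict, so the missing key surfaces as a `KeyError`. The checker rule, the `ValidationResult` fields, and both `validate_*` functions are hypothetical stand-ins for the real code in `simple_validator.py`:

```python
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    is_valid: bool
    invalid_chars: list = field(default_factory=list)


def check_characters(text: str) -> dict:
    """Stand-in checker. Like the real one, it returns no "english_ratio" key."""
    # Placeholder rule for illustration only: ASCII letters count as invalid.
    invalid = [ch for ch in text if ch.isascii() and ch.isalpha()]
    return {"valid": not invalid, "invalid_chars": invalid}


def validate_buggy(text: str) -> ValidationResult:
    char_check = check_characters(text)
    if not char_check["valid"]:
        # Rejected path: reads a key that was never set, so this raises KeyError.
        _ = char_check["english_ratio"]
        return ValidationResult(False, char_check["invalid_chars"])
    return ValidationResult(True)


def validate_fixed(text: str) -> ValidationResult:
    char_check = check_characters(text)
    # The english_ratio lookup is gone; only keys the checker returns are used.
    return ValidationResult(char_check["valid"], char_check["invalid_chars"])
```

The buggy version only fails on the rejected path, which would explain why accepted transcriptions never exposed the problem.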

**Bug 2: Temperature default mismatch** - [src/backend/gemini_transcriber.py](src/backend/gemini_transcriber.py) line 29 defaults `TranscriptionConfig.temperature` to 1.0, but [pipeline.py](pipeline.py) line 54 sets `PipelineConfig.temperature = 0.0`. A pipeline run correctly passes 0.0, but anyone using `GeminiTranscriber` directly (testing, etc.) silently gets temperature 1.0, which makes output vary from run to run.
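
A sketch of the aligned defaults, with a check that would catch the two configs drifting apart again. Treating both configs as dataclasses is an assumption, the classes are cut down to the one field under discussion, and `build_transcription_config` is a hypothetical helper:

```python
from dataclasses import dataclass


@dataclass
class TranscriptionConfig:
    # Proposed default; the plan reports 1.0 in gemini_transcriber.py today.
    temperature: float = 0.0


@dataclass
class PipelineConfig:
    temperature: float = 0.0


def build_transcription_config(pipeline_config: PipelineConfig) -> TranscriptionConfig:
    """The pipeline path: always forwards its own temperature explicitly."""
    return TranscriptionConfig(temperature=pipeline_config.temperature)


# Direct use (tests, notebooks) now matches what the pipeline passes.
assert TranscriptionConfig().temperature == PipelineConfig().temperature
```

Keeping an assertion like the last line in the test suite is one cheap way to stop the defaults from diverging silently.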

## Honest assessment: What else can we do for audio quality?

**Evidence from existing transcriptions:**
- Gemini 2.5 Flash hallucinated "annamaata" from the cut artifact at the boundary
- Gemini 3 Flash correctly ignored the same artifact and transcribed the real content
- The polisher removes exactly these artifacts, which helps both models

**Worth doing (high impact, low effort):**
- Wire AudioPolisher into pipeline (10 lines, ~30ms/segment overhead, prevents hallucination)
- Resample to 16kHz before sending to Gemini - files are currently 48kHz (3x larger than needed, Gemini's internal audio encoder likely resamples anyway). Saves upload bandwidth and API latency.
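
Going from 48kHz to 16kHz is an exact factor of 3, so the resample reduces to a low-pass filter followed by keeping every third sample. Below is a stdlib-only sketch of that technique, to show what the step involves. The function names, tap count, and 7 kHz cutoff are illustrative assumptions. In the polisher itself a library call such as `scipy.signal.resample_poly(samples, 1, 3)` would probably be the practical choice.

```python
import math


def lowpass_taps(num_taps: int, cutoff_hz: float, sample_rate: int) -> list[float]:
    """Hamming-windowed sinc low-pass filter, normalized to unity gain at DC."""
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]


def decimate_by_3(samples: list[float], sample_rate: int = 48_000) -> list[float]:
    """Filter below the new 8 kHz Nyquist, then keep every third sample."""
    taps = lowpass_taps(num_taps=63, cutoff_hz=7_000, sample_rate=sample_rate)
    half = len(taps) // 2
    out = []
    # Only every third output is computed, so no filtered samples are thrown away.
    for center in range(0, len(samples), 3):
        acc = 0.0
        for k, tap in enumerate(taps):
            idx = center + k - half
            if 0 <= idx < len(samples):  # treat samples past the edges as silence
                acc += tap * samples[idx]
        out.append(acc)
    return out
```

The filter matters: dropping samples without it would fold any content above 8 kHz back into the speech band as aliasing, which is exactly the kind of artifact the polisher is meant to remove.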

**Not worth doing (Gemini already copes, or the risk outweighs the gain):**
- Speaker embedding detection - complex, Gemini 3 already ignores mismatched content
- De-clipping - 87% of segments are clipped but transcriptions are fine
- Noise reduction - risky, may remove speech content
- MFCC-based spectral analysis - interesting for debugging but not actionable for auto-trimming

## What's left before production

1. **Wire polisher** into pipeline.py (this plan)
2. **Fix the two bugs** (this plan)
3. **Prompt tuning** (deferred per your request - separate session)
4. **Production run** on full video with finalized pipeline

The pipeline is functionally complete. These are finishing touches.