Main Speaker Extraction Dashboard

YouTube clip 2:13–30:13 · pyannote.ai precision-2 diarization · SoloSpeech target speaker extraction (default 200 steps × 4 candidates)

Original full clip (28:00)

clip.wav · 16 kHz mono · 1680 s

Clean main speaker (concat, no overlaps)

speaker_00_clean.wav · 12:08 · 278 segments · 16 kHz
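
How a "concat, no overlaps" track might be derived from diarization turns, as a minimal sketch: every stretch where another speaker is active is cut out of the main speaker's turns, and the remaining pieces are what would be concatenated. The `(start, end, label)` turn format, the helper names `subtract` and `clean_segments`, and the sample timestamps are assumptions for illustration, not the pipeline's actual code or data.

```python
from typing import List, Tuple

Turn = Tuple[float, float, str]  # (start_s, end_s, speaker_label), hypothetical format
Span = Tuple[float, float]       # (start_s, end_s)


def subtract(span: Span, cuts: List[Span]) -> List[Span]:
    """Return the parts of `span` not covered by any interval in `cuts`."""
    pieces = [span]
    for c_start, c_end in cuts:
        next_pieces = []
        for p_start, p_end in pieces:
            # No intersection: keep the piece unchanged.
            if c_end <= p_start or c_start >= p_end:
                next_pieces.append((p_start, p_end))
                continue
            # Keep whatever sticks out on the left and right of the cut.
            if c_start > p_start:
                next_pieces.append((p_start, c_start))
            if c_end < p_end:
                next_pieces.append((c_end, p_end))
        pieces = next_pieces
    return pieces


def clean_segments(turns: List[Turn], target: str, min_len: float = 0.0) -> List[Span]:
    """Target-speaker spans with all other-speaker activity removed."""
    others = [(s, e) for s, e, spk in turns if spk != target]
    out: List[Span] = []
    for s, e, spk in turns:
        if spk == target:
            out.extend(p for p in subtract((s, e), others) if p[1] - p[0] > min_len)
    return sorted(out)


# Illustrative turns only (not taken from the real clip).
turns = [
    (0.0, 5.0, "SPEAKER_00"),
    (4.0, 6.0, "SPEAKER_01"),
    (6.0, 10.0, "SPEAKER_00"),
    (7.0, 7.5, "SPEAKER_01"),
]
segments = clean_segments(turns, "SPEAKER_00")
total = sum(e - s for s, e in segments)
print(segments, total)
```

Each surviving span would then be sliced from clip.wav and appended in order. A `min_len` above zero drops fragments too short to be useful.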

Other speaker (for context)

speaker_01_clean.wav · 2:27 · 75 segments

Reference enrollment (used by SoloSpeech)

main_speaker_ref.wav · 9.02 s · longest clean SPEAKER_00 turn @ 201.26 s
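
Choosing the "longest clean turn" as the enrollment reference can be sketched as a maximum over overlap-free spans. The function name `pick_reference` and the list format are assumptions. The 201.26 s start and 9.02 s length come from the caption above; the other spans are made up for illustration.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start_s, end_s)


def pick_reference(clean_spans: List[Span]) -> Span:
    """Return the longest span; on ties, the earliest one in the list wins."""
    if not clean_spans:
        raise ValueError("no clean spans to choose from")
    return max(clean_spans, key=lambda span: span[1] - span[0])


# Only the (201.26, 210.28) span reflects the dashboard; the rest are placeholders.
clean_spans = [
    (12.40, 15.10),
    (201.26, 210.28),
    (950.00, 956.50),
]
ref = pick_reference(clean_spans)
ref_len = round(ref[1] - ref[0], 2)
print(ref, ref_len)
```

Preferring one long uninterrupted turn over several stitched fragments gives the extractor a reference free of splice points, though the pipeline's real selection rule may include further criteria.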

A/B test — leakage fixes (5 sample clips)

Compare 3 approaches for removing residual SPEAKER_01 backchannels. Pick the winner and I'll apply it to all 156 clips.

Overlap regions — original · extracted · FINAL
