Main Speaker Extraction Dashboard
YouTube clip 2:13–30:13 · pyannote.ai precision-2 diarization · SoloSpeech target speaker extraction (default 200 steps × 4 candidates)
Original full clip (28:00)
clip.wav · 16 kHz mono · 1680 s
Clean main speaker (concat, no overlaps)
speaker_00_clean.wav · 12:08 · 278 segments · 16 kHz
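As a rough illustration of what "concat, no overlaps" could mean in practice, the sketch below removes from one speaker's turns any time covered by another speaker's turns. It is not the code used to build this dashboard: the function name `subtract_overlaps` and the `(start, end)` tuple format in seconds are assumptions made for the example.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s); hypothetical format


def subtract_overlaps(target: List[Interval], others: List[Interval]) -> List[Interval]:
    """Return the parts of `target` turns not covered by any turn in `others`."""
    clean: List[Interval] = []
    others = sorted(others)
    for start, end in sorted(target):
        cursor = start  # left edge of the not-yet-covered part of this turn
        for o_start, o_end in others:
            if o_end <= cursor:
                continue  # other turn ends before the uncovered part begins
            if o_start >= end:
                break  # other turns from here on start after this turn ends
            if o_start > cursor:
                clean.append((cursor, o_start))  # keep the gap before the overlap
            cursor = max(cursor, o_end)  # skip past the overlapped stretch
            if cursor >= end:
                break
        if cursor < end:
            clean.append((cursor, end))  # keep the tail after the last overlap
    return clean


# Illustrative values only, not taken from the actual diarization output.
main_turns = [(0.0, 10.0), (12.0, 15.0)]
other_turns = [(3.0, 4.0), (9.0, 13.0)]
clean_turns = subtract_overlaps(main_turns, other_turns)
```

The resulting clean intervals would then be cut from the clip and concatenated in order to form a file like the one above.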
Other speaker (for context)
speaker_01_clean.wav · 2:27 · 75 segments
Reference enrollment (used by SoloSpeech)
main_speaker_ref.wav · 9.02 s · longest clean SPEAKER_00 turn @ 201.26 s
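A minimal sketch of how a "longest clean turn" reference might be chosen, assuming the clean turns are already available as `(start, end)` tuples in seconds. The helper `longest_turn` and the list contents are hypothetical. Only the 201.26 s start and the 9.02 s duration come from the caption above; the end time is derived from those two figures.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s); hypothetical format


def longest_turn(turns: List[Interval]) -> Interval:
    """Return the turn with the greatest duration (the first one on ties)."""
    if not turns:
        raise ValueError("no clean turns to choose from")
    return max(turns, key=lambda t: t[1] - t[0])


# Hypothetical clean SPEAKER_00 turns; only the middle entry reflects the caption.
clean_turns = [(10.0, 12.5), (201.26, 210.28), (300.0, 304.0)]
ref_start, ref_end = longest_turn(clean_turns)
```

Picking the longest uninterrupted turn gives the extractor the largest sample of the target voice with no second speaker in it.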
A/B test — leakage fixes (5 sample clips)
Compare three approaches for removing residual SPEAKER_01 backchannels.
Pick the winner and I'll apply it to all 156 clips.
Overlap regions — original · extracted · FINAL