🎙️ Speaker Identifier
Finds unique speakers across multiple HF audio datasets. Each dataset is assumed to have one speaker (e.g. an audiobook). The app fetches audio directly via the datasets-server API (no full parquet download), downloads in parallel, then embeds clips across 8 parallel workers.
Model: microsoft/wavlm-base-plus-sv, which produces language-agnostic speaker embeddings, so it can be used with audio in any language.
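As a rough illustration of the fetch step, the sketch below builds a datasets-server `/rows` request for a few rows of each dataset and pulls the audio file URLs out of the JSON response, fanning the requests out over a thread pool. The helper names (`build_rows_url`, `extract_audio_urls`, `fetch_all`, `http_get_json`), the `default`/`train` config and split, the `audio` column name, and the worker count are assumptions made for this example; the app's actual code may differ.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode

ROWS_ENDPOINT = "https://datasets-server.huggingface.co/rows"


def build_rows_url(repo_id, config="default", split="train", offset=0, length=3):
    """Build a /rows URL returning `length` rows starting at `offset`.

    config/split defaults are assumptions; real datasets may use other names.
    """
    query = urlencode({"dataset": repo_id, "config": config, "split": split,
                       "offset": offset, "length": length})
    return f"{ROWS_ENDPOINT}?{query}"


def extract_audio_urls(payload, column="audio"):
    """Collect audio file URLs from a /rows JSON response.

    Assumes audio cells are lists of {"src": ..., "type": ...} dicts.
    """
    urls = []
    for item in payload.get("rows", []):
        cell = item["row"].get(column) or []
        urls.extend(part["src"] for part in cell if "src" in part)
    return urls


def http_get_json(url, token=None):
    """GET a URL and decode JSON; the token is only needed for private repos."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_all(repo_ids, fetch_json=http_get_json, max_workers=8):
    """Query every dataset in parallel and map repo ID -> list of audio URLs."""
    def one(repo_id):
        return repo_id, extract_audio_urls(fetch_json(build_rows_url(repo_id)))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(one, repo_ids))
```

Passing `fetch_json` as a parameter keeps the network call swappable, which makes the URL building and response parsing easy to check offline.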
How to use
- Paste dataset repo IDs (one per line, `owner/name` format) into the left box.
- Adjust parameters if needed (defaults work well for audiobooks):
  - Samples per book: audio chunks to average per dataset. More = more robust, slower. 3 is usually enough.
  - Audio length (sec): seconds of each chunk to use for embedding. 5 sec is sufficient for a clear voice.
  - Same-speaker threshold: cosine similarity cutoff. Raise if too many books merge into one speaker; lower if one person gets split across IDs.
  - HF Token: only needed for private repos.
- Click Identify Speakers. Downloads run in parallel, then all clips are embedded in one batch — expect ~20–40 sec for 35 books.
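To make the "Samples per book" and "Same-speaker threshold" parameters concrete, here is a minimal sketch of how chunk embeddings could be averaged per dataset and then grouped by a cosine-similarity cutoff. The greedy centroid-based grouping, the function names, and the `0.8` default are assumptions chosen for illustration; the app may cluster differently.

```python
import math


def mean_embedding(chunks):
    """Average several chunk embeddings into one unit-length vector."""
    dim = len(chunks[0])
    avg = [sum(chunk[i] for chunk in chunks) / len(chunks) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in avg)) or 1.0
    return [x / norm for x in avg]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def assign_speakers(embeddings, threshold=0.8):
    """Greedy grouping: each book joins the most similar existing cluster
    whose centroid similarity is >= threshold, otherwise it opens a new one.

    `embeddings` maps book name -> unit-length vector; returns name -> speaker ID.
    """
    centroids, members, labels = [], [], {}
    for name, emb in embeddings.items():
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(emb)
            members.append([emb])
            labels[name] = len(centroids) - 1
        else:
            members[best].append(emb)
            centroids[best] = mean_embedding(members[best])
            labels[name] = best
    return labels


# Toy 2-D vectors standing in for real speaker embeddings.
books = {
    "book_a": mean_embedding([[1.0, 0.0], [0.9, 0.1]]),  # two chunks averaged
    "book_b": mean_embedding([[1.0, 0.05]]),
    "book_c": mean_embedding([[0.0, 1.0]]),
}
labels = assign_speakers(books, threshold=0.8)
```

In this toy input, `book_a` and `book_b` point in nearly the same direction and share a speaker ID, while `book_c` gets its own. Raising the threshold close to 1.0 splits even the near-identical pair, which mirrors the slider advice above.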
Output columns
| Column | Meaning |
|---|---|
| `dataset` | Repo name (short) |
| `speaker_id` | Cluster label; same ID = same voice |
| `books_with_speaker` | How many books share this speaker |
| `intra_sim` | Avg cosine similarity within cluster (1.0 = only one book; lower = cluster is less tight) |
| `closest_match` | Most similar other book and similarity score |
Tip: Sort by speaker_id to see all books by the same narrator grouped together.
The Errors / Timing box shows per-dataset timing and batch embed stats — useful for diagnosing slow datasets.
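The output columns can be derived from per-book embeddings and cluster labels roughly as in the sketch below. It assumes `intra_sim` is the mean pairwise cosine similarity inside a cluster and that the short repo name is the part after the slash; both are guesses about the app's definitions, and `summarize` is a hypothetical helper name.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a)) or 1.0
    norm_b = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (norm_a * norm_b)


def summarize(embeddings, labels):
    """Build one result row per book, sorted by speaker_id.

    `embeddings` maps repo ID -> vector, `labels` maps repo ID -> speaker ID.
    """
    clusters = defaultdict(list)
    for name, speaker_id in labels.items():
        clusters[speaker_id].append(name)

    def short(repo_id):
        return repo_id.split("/")[-1]

    rows = []
    for name, speaker_id in labels.items():
        peers = clusters[speaker_id]
        pairs = [(a, b) for i, a in enumerate(peers) for b in peers[i + 1:]]
        # A single-book cluster has no pairs, so it is reported as 1.0.
        intra = (sum(cosine(embeddings[a], embeddings[b]) for a, b in pairs)
                 / len(pairs)) if pairs else 1.0
        others = [(cosine(embeddings[name], embeddings[o]), o)
                  for o in embeddings if o != name]
        closest = ""
        if others:
            best_sim, best = max(others)
            closest = f"{short(best)} ({best_sim:.2f})"
        rows.append({
            "dataset": short(name),
            "speaker_id": speaker_id,
            "books_with_speaker": len(peers),
            "intra_sim": intra,
            "closest_match": closest,
        })
    return sorted(rows, key=lambda row: row["speaker_id"])


# Toy vectors standing in for real embeddings.
rows = summarize(
    {"u/a": [1.0, 0.0], "u/b": [1.0, 0.0], "u/c": [0.0, 1.0]},
    {"u/a": 0, "u/b": 0, "u/c": 1},
)
```

Sorting the rows by `speaker_id` puts all books by the same narrator next to each other, as the tip above suggests.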