🎙️ Speaker Identifier

Finds unique speakers across multiple Hugging Face audio datasets. Each dataset is assumed to contain a single speaker (for example, one audiobook). The app fetches audio directly through the datasets-server API, so no full Parquet download is needed. It downloads clips in parallel, then embeds them across 8 parallel workers.
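As a rough illustration of the fetch step, the sketch below builds a datasets-server /rows request for the first few rows of a dataset and runs a fetch function over many URLs in a thread pool. The helper names (rows_url, fetch_json, fetch_all), the "default" config, the "train" split, and the pool size of 8 are assumptions made for this example, not details taken from the app.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ROWS_ENDPOINT = "https://datasets-server.huggingface.co/rows"


def rows_url(dataset, config="default", split="train", offset=0, length=3):
    """Build a /rows URL for a slice of one dataset (config and split are assumed)."""
    query = urlencode({
        "dataset": dataset,
        "config": config,
        "split": split,
        "offset": offset,
        "length": length,
    })
    return f"{ROWS_ENDPOINT}?{query}"


def fetch_json(url, token=None):
    """GET a URL and parse the JSON body; the token is only needed for private repos."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    with urlopen(Request(url, headers=headers), timeout=30) as resp:
        return json.load(resp)


def fetch_all(urls, fetch=fetch_json, max_workers=8):
    """Run `fetch` over all URLs concurrently, keeping results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

A call such as fetch_all([rows_url(repo) for repo in repo_ids]) would then return one JSON payload per dataset. Locating the audio column in each payload and downloading the clip it points to is left out here.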

Model: microsoft/wavlm-base-plus-sv. It produces language-agnostic speaker embeddings, so it can be used with audio in any language.


How to use

  1. Paste dataset repo IDs (one per line, owner/name format) into the left box.
  2. Adjust parameters if needed (defaults work well for audiobooks):
    • Samples per book — audio chunks to average per dataset. More = more robust, slower. 3 is usually enough.
    • Audio length (sec) — seconds of each chunk to use for embedding. 5 sec is sufficient for a clear voice.
    • Same-speaker threshold — cosine similarity cutoff. Raise if too many books merge into one speaker; lower if one person gets split across IDs.
    • HF Token — only needed for private repos.
  3. Click Identify Speakers. Downloads run in parallel, then all clips are embedded in one batch — expect ~20–40 sec for 35 books.
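To make the roles of "Samples per book" and "Same-speaker threshold" concrete, here is a minimal sketch of one way the grouping could work: average the chunk embeddings of each dataset, then link any two datasets whose cosine similarity reaches the threshold. The clustering rule (single linkage via union-find) and the function names are assumptions for illustration; the app may group speakers differently.

```python
import math


def mean_embedding(chunks):
    """Average several chunk embeddings of one dataset and L2-normalize the result."""
    dim = len(chunks[0])
    mean = [sum(chunk[i] for chunk in chunks) / len(chunks) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0
    return [x / norm for x in mean]


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def assign_speaker_ids(embeddings, threshold):
    """Give the same ID to datasets linked by similarity >= threshold (single linkage)."""
    n = len(embeddings)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                parent[find(j)] = find(i)

    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```

Under this rule, raising the threshold removes links and splits clusters, while lowering it merges them, which matches the tuning advice for the threshold parameter.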

Output columns

Column               Meaning
dataset              Repo name (short)
speaker_id           Cluster label (same ID = same voice)
books_with_speaker   How many books share this speaker
intra_sim            Avg cosine similarity within cluster (1.0 = only one book; lower = cluster is less tight)
closest_match        Most similar other book and similarity score
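The sketch below shows how columns like these could be derived from per-book embeddings and cluster labels. It assumes that intra_sim is the mean cosine similarity over all pairs of books in a cluster and that closest_match is searched across every other book regardless of cluster. Both are guesses about the app's behavior, and describe_books is a hypothetical helper name.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def describe_books(names, embeddings, speaker_ids):
    """Build one output row per book, sorted by speaker_id."""
    n = len(names)
    rows = []
    for i in range(n):
        # All books that share this book's cluster label (including itself).
        members = [j for j in range(n) if speaker_ids[j] == speaker_ids[i]]
        # Similarities over every unordered pair inside the cluster.
        sims = [
            cosine(embeddings[a], embeddings[b])
            for k, a in enumerate(members)
            for b in members[k + 1:]
        ]
        intra = sum(sims) / len(sims) if sims else 1.0  # single-book cluster -> 1.0
        # Most similar other book, from any cluster.
        others = [
            (cosine(embeddings[i], embeddings[j]), names[j])
            for j in range(n) if j != i
        ]
        best_sim, best_name = max(others) if others else (None, None)
        rows.append({
            "dataset": names[i],
            "speaker_id": speaker_ids[i],
            "books_with_speaker": len(members),
            "intra_sim": intra,
            "closest_match": (best_name, best_sim),
        })
    rows.sort(key=lambda row: row["speaker_id"])  # stable sort keeps input order per ID
    return rows
```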

Tip: Sort by speaker_id to see all books by the same narrator grouped together.

The Errors / Timing box shows per-dataset timing and batch embed stats — useful for diagnosing slow datasets.



Results — sorted by speaker_id