TLexDR

Methodology

How TLexDR turns a 3-hour conversation into a 5-minute read — and what we do to keep it honest.

The pipeline, in four steps

Every episode goes through the same pipeline. None of it is hand-edited; the operator only intervenes when something fails or a reader reports an issue.

  1. Transcript acquisition. We prefer Lex Fridman's own human-written transcripts when they're published on lexfridman.com — they're speaker-labeled, chapter-anchored, and verbatim. When no published transcript exists, we fall back to YouTube auto-captions and, when neither is available, transcribe the audio ourselves with OpenAI Whisper.
  2. Chunking. Long transcripts are split into ~3,000-token segments along natural chapter boundaries. Each chunk is summarized independently.
  3. Synthesis. A second pass stitches the chunk summaries into the structured JSON that drives the rest of the site — core takeaways, themes, quotes, references, jargon glossary, surprising moments, and a novelty score.
  4. Verification. Before the page renders, we run a filter pass: quotes are checked verbatim against the transcript, references must resolve to a real URL, and "Speaker N" / "Unknown" labels are filtered out of the visible output.
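The chunking step can be illustrated with a minimal sketch. It is not TLexDR's production code. It assumes the transcript arrives as a list of (chapter title, chapter text) pairs, and it approximates token counts by counting whitespace-separated words, where a real pipeline would use the model's own tokenizer. The names `chunk_by_chapters` and `count_tokens` are hypothetical. Whole chapters are packed greedily into a chunk until the next one would exceed the budget; a single chapter longer than the budget is sliced on its own.

```python
from typing import List, Tuple


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one "token" per whitespace-separated word.
    return len(text.split())


def chunk_by_chapters(chapters: List[Tuple[str, str]], max_tokens: int = 3000) -> List[str]:
    """Pack whole chapters into chunks of at most max_tokens (approximate) tokens."""
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0

    def flush() -> None:
        # Close the chunk being built, if there is one.
        nonlocal current, current_tokens
        if current:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0

    for title, text in chapters:
        block = f"[{title}]\n{text}"  # keep the chapter title with its text
        n = count_tokens(block)
        if n > max_tokens:
            # Oversized chapter: finish the open chunk, then slice this chapter by words.
            flush()
            words = block.split()
            for start in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[start:start + max_tokens]))
            continue
        if current_tokens + n > max_tokens:
            flush()  # adding this chapter would overflow the budget
        current.append(block)
        current_tokens += n

    flush()
    return chunks
```

Each returned string would then be summarized independently in the chunk pass.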
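The structured JSON from the synthesis pass might be validated against a typed schema before anything renders. The sketch below is an assumption, not the site's actual schema: the field names are hypothetical renderings of the sections listed in step 3, and the only checks are that every section is present and that the novelty score is numeric (its scale isn't stated here, so none is enforced).

```python
import json
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass
class EpisodeSummary:
    # Hypothetical field names for the sections produced by the synthesis pass.
    core_takeaways: List[str]
    themes: List[str]
    quotes: List[Dict[str, str]]       # e.g. {"speaker": "...", "text": "..."}
    references: List[Dict[str, str]]   # e.g. {"title": "...", "url": "..."}
    glossary: Dict[str, str]           # jargon term -> plain-language definition
    surprising_moments: List[str]
    novelty_score: float


REQUIRED_KEYS = [f.name for f in fields(EpisodeSummary)]


def parse_synthesis(raw: str) -> EpisodeSummary:
    """Parse the model's JSON output, rejecting it if any section is missing."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("synthesis output must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"synthesis output is missing: {missing}")
    score = data["novelty_score"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("novelty_score must be a number")
    values = {key: data[key] for key in REQUIRED_KEYS}
    values["novelty_score"] = float(score)
    return EpisodeSummary(**values)
```

Rejecting incomplete output at parse time means a malformed model response fails loudly instead of rendering a page with silently missing sections.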

What we verify

  • Quotes are verbatim. Every displayed quote is matched back to the source transcript. Approximations get cut.
  • References resolve. Books, papers, links — if we can't find a real source for what the guest cited, we don't print it as a fact.
  • Speakers are people, not placeholders. The Whisper transcription path occasionally emits placeholder labels such as "Speaker 0" and "Speaker 1"; these are filtered out before render.
  • Topic coverage matches the title. If the episode title names a topic, the summary has to cover it. We flag mismatches as quality regressions.
  • Cost is bounded. Per-episode and daily caps stop runaway spend before it lands.
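The quote and speaker checks can be sketched in a few lines. This is a hypothetical illustration, not the production filter. It assumes quotes arrive as dictionaries with `speaker` and `text` keys, treats a quote as verbatim if it appears in the transcript after only whitespace and curly-quote normalization, and assumes a placeholder label is hidden (the quote is shown without attribution) rather than the quote being dropped. The function names and the placeholder pattern are assumptions.

```python
import re
from typing import Dict, List, Optional

# Placeholder labels such as "Speaker 0", "SPEAKER_1" or "Unknown" (assumed pattern).
PLACEHOLDER_SPEAKER = re.compile(r"(speaker[\s_]*\d+|unknown)", re.IGNORECASE)


def normalize(text: str) -> str:
    # Straighten curly quotes and collapse runs of whitespace so transcript line
    # breaks don't cause false mismatches. Wording and case are left untouched.
    for curly, straight in (("\u2018", "'"), ("\u2019", "'"), ("\u201c", '"'), ("\u201d", '"')):
        text = text.replace(curly, straight)
    return " ".join(text.split())


def verify_quotes(quotes: List[Dict[str, str]], transcript: str) -> List[Dict[str, Optional[str]]]:
    """Keep only quotes found verbatim in the transcript; hide placeholder speakers."""
    source = normalize(transcript)
    kept: List[Dict[str, Optional[str]]] = []
    for quote in quotes:
        text = normalize(quote.get("text") or "")
        if not text or text not in source:
            continue  # approximation or empty quote: cut it
        speaker: Optional[str] = (quote.get("speaker") or "").strip() or None
        if speaker and PLACEHOLDER_SPEAKER.fullmatch(speaker):
            speaker = None  # never display "Speaker N" / "Unknown"
        kept.append({**quote, "speaker": speaker})
    return kept
```

Substring matching after light normalization keeps the check strict: a paraphrase that changes even one word fails and is cut.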
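The cost caps can be sketched as a guard that is checked before each model call, under stated assumptions: spend is estimated up front, and the call is refused if it would push either the per-episode or the daily total past its cap. The class name, the cap values in the usage, and the in-memory counters are hypothetical; a deployed version would presumably persist the counters so they survive worker restarts.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict


class BudgetExceeded(RuntimeError):
    """Raised before a call that would push spend past a cap."""


@dataclass
class SpendGuard:
    per_episode_cap: float  # USD per episode (value chosen by the caller)
    daily_cap: float        # USD per calendar day
    _episode_spend: Dict[str, float] = field(default_factory=dict)
    _daily_spend: Dict[date, float] = field(default_factory=dict)

    def charge(self, episode_id: str, estimated_cost: float, day: date) -> None:
        """Record an estimated cost, or raise BudgetExceeded without recording it."""
        episode_total = self._episode_spend.get(episode_id, 0.0) + estimated_cost
        day_total = self._daily_spend.get(day, 0.0) + estimated_cost
        if episode_total > self.per_episode_cap:
            raise BudgetExceeded(f"episode {episode_id} would reach ${episode_total:.2f}")
        if day_total > self.daily_cap:
            raise BudgetExceeded(f"daily spend would reach ${day_total:.2f}")
        # Both caps hold, so commit the spend.
        self._episode_spend[episode_id] = episode_total
        self._daily_spend[day] = day_total
```

A worker could call `guard.charge(episode_id, estimate, date.today())` before each model request and skip the episode when `BudgetExceeded` is raised, so the overrun never happens rather than being detected afterwards.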

What AI can and can't do

Trust: the structure (which topics came up, who said what, what was quoted), the references when they resolve, and the timestamps when chapter markers are anchored to the transcript.

Double-check: the spirit of takeaways. AI sometimes summarizes an exchange in a way that's directionally accurate but loses the speaker's actual nuance. For anything you'd cite, follow the source link back to the transcript.

Don't trust: claims of fact about people or events that you can't independently verify. AI can hallucinate names, dates, and authorship. If something in a summary contradicts what you know, the source transcript wins.

The stack

For the technically curious, the pipeline runs on:

  • OpenAI Whisper (transcription fallback), GPT-4o-mini (chunk pass), GPT-4o (synthesis), text-embedding-3-small (concept embeddings & related-episode recall).
  • Supabase Postgres (with pgvector for vector search) for persistence and auth.
  • Flask + APScheduler on Render for the web + worker processes.
  • Plausible for privacy-friendly analytics. No third-party trackers.
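Related-episode recall with embeddings comes down to a nearest-neighbor ranking by vector similarity. The pure-Python sketch below shows the idea with cosine similarity over a small in-memory dictionary; it assumes each episode has a single embedding vector, and the function names are hypothetical. With pgvector, the same ranking would typically run inside Postgres, ordering by pgvector's `<=>` (cosine distance) operator with a `LIMIT`, rather than in Python.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def related_episodes(
    target_id: str, embeddings: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k episodes whose embeddings are most similar to target_id's."""
    target = embeddings[target_id]
    scored = [
        (episode_id, cosine_similarity(target, vector))
        for episode_id, vector in embeddings.items()
        if episode_id != target_id
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A linear scan like this is fine for a few hundred episodes; pushing the query into Postgres keeps the vectors next to the rest of the episode data and lets pgvector's indexes take over as the catalog grows.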

Reporting errors

Every episode page has a "Report an inaccuracy" link at the bottom of the summary. Submissions go into a human-review queue. We don't auto-correct: an inaccuracy report triggers a manual re-read, not a re-run of the AI. If you're a researcher and want to dig deeper, email support@tlexdr.com.
