New Lex Fridman Insight: Dan Kokotov: Speech Recognition with AI and Humans
Sent June 11, 2026
Key Insights
- Rev's ASR has a 14% word error rate, while human transcription has an error rate of around 2-3%.
- Rev.ai focuses on automatic speech recognition to improve transcription efficiency.
- Machine translation between English and Russian is complex due to structural differences such as inflection and grammatical gender.
- Podcasting is valued for its depth and potential for human connection.
- Joe Rogan's $100M Spotify deal highlights the tension between exclusivity and the open-source spirit of podcasting.
How the conversation moved
The episode opens with Rev's approach to transcription and translation services and the emergence of Rev.ai as a key player in automatic speech recognition (ASR). Dan Kokotov explains that Rev was founded to improve on the Upwork model by simplifying the hiring process for transcription and translation services. This sets up a closer look at the technical challenges Rev faces and the business strategies it uses to stay competitive.
Dan Kokotov delves into the complexities of machine translation, particularly between English and Russian, where structural differences such as inflection and grammatical gender make the task harder. He shares personal anecdotes about his father's work translating poetry, which underscore how difficult it is to capture nuance and cultural context in translation. The conversation then shifts to the importance of optimizing for long-term user happiness rather than short-term engagement, which he presents as a more sustainable approach to business growth.
Despite the rich discussion, there is a noticeable lack of pushback from Lex on some of Kokotov's claims, particularly regarding the superiority of Rev's ASR technology. The conversation could have benefited from a deeper exploration of the limitations and potential biases inherent in ASR systems, as well as a comparison with competitors. Without these counter-arguments, some of Kokotov's statements go unchallenged, and listeners are left without a balanced view of the technology's capabilities.
The episode concludes with a broader reflection on podcasting as a medium for deep human connection and on the philosophical implications of creation. Kokotov emphasizes the value of long-form conversations in podcasting, contrasting them with the superficial nature of clickbait journalism. The discussion also touches on the impact of exclusivity deals in the podcasting industry, using Joe Rogan's Spotify deal as a case study in the tension between monetization and the open-source spirit of podcasting.
Surprising moments
In-depth
Speech Recognition
- Rev.ai focuses on ASR to make transcription more efficient.
- ASR technology still lags behind human transcription in accuracy.
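For readers unfamiliar with the metric behind the accuracy gap, word error rate is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the system's output into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not Rev's actual scoring pipeline. The function name, the lowercasing and whitespace splitting, and the sample sentences are all illustrative assumptions and do not come from the episode.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution ("jumps" -> "jumped") and one
# deletion ("the") against a nine-word reference.
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over lazy dog"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")  # WER: 22.2%
```

On this definition, a 14% word error rate means roughly 14 word-level errors per 100 reference words, compared with about 2 to 3 for human transcription.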
Translation Challenges
- Machine translation between English and Russian is complex due to structural differences.
- Translation of poetry is particularly challenging due to nuances and cultural context.
Podcasting as a Medium
- Podcasting allows for deep, nuanced conversations.
- Exclusivity deals like Joe Rogan's highlight tensions in the podcasting industry.
Notable Quotes
I'm allergic to the word brand.
Still open
- What are the specific limitations of Rev's ASR technology compared to its competitors?
- How can machine translation systems better handle the nuances of poetry and cultural context?