
AssemblyAI integration guide

How to use the AssemblyAI STT plugin for LiveKit Agents.

Overview

AssemblyAI provides a streaming speech-to-text (STT) service with high-accuracy, realtime transcription. You can use the open source AssemblyAI plugin for LiveKit Agents to build voice AI agents with fast, accurate transcription.

Quick reference

This section provides a brief overview of the AssemblyAI STT plugin. For more information, see Additional resources.

Installation

Install the plugin from PyPI:

pip install "livekit-agents[assemblyai]~=1.0"

Authentication

The AssemblyAI plugin requires an AssemblyAI API key.

Set ASSEMBLYAI_API_KEY in your .env file.
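A missing key is easier to diagnose at startup than in the middle of a session. The following is a minimal sketch of such a check using only the standard library. It assumes the variable has already been loaded into the process environment (for example, by whatever loads your .env file). The helper name require_api_key is hypothetical and is not part of the plugin.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the AssemblyAI API key, or raise a clear error if it is missing.

    Hypothetical helper for illustration only; not part of the plugin.
    """
    # Fall back to the real process environment when no mapping is supplied.
    env = os.environ if env is None else env
    key = env.get("ASSEMBLYAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ASSEMBLYAI_API_KEY is not set; add it to your .env file")
    return key
```

Calling require_api_key() once before constructing the session turns a missing key into an immediate, readable error.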

Usage

Use AssemblyAI STT in an AgentSession or as a standalone transcription service. For example, you can use this STT in the Voice AI quickstart.

from livekit.agents import AgentSession
from livekit.plugins import assemblyai

session = AgentSession(
    stt=assemblyai.STT(),
    # ... vad, llm, tts, etc.
)

Parameters

This section describes some of the available parameters. See the plugin reference for a complete list of all available parameters.

format_turns (bool, optional, default: True)

Whether to return formatted final transcripts. If enabled, formatted final transcripts are emitted shortly after end of turn is detected.

end_of_turn_confidence_threshold (float, optional, default: 0.7)

The confidence threshold to use when determining if the end of a turn has been reached.

min_end_of_turn_silence_when_confident (int, optional, default: 160)

The minimum duration of silence, in milliseconds, required to detect end of turn when confident.

max_turn_silence (int, optional, default: 2400)

The maximum duration of silence, in milliseconds, allowed in a turn before end of turn is triggered.
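To see how the three end-of-turn parameters relate to each other, it can help to read them as a simple decision rule. The sketch below is an illustrative model based only on the descriptions above, not AssemblyAI's actual implementation. The names EndOfTurnParams and is_end_of_turn are hypothetical, and the silence values are assumed to be in milliseconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndOfTurnParams:
    """Hypothetical container; defaults mirror the values listed above."""

    end_of_turn_confidence_threshold: float = 0.7
    min_end_of_turn_silence_when_confident: int = 160  # assumed milliseconds
    max_turn_silence: int = 2400  # assumed milliseconds


def is_end_of_turn(
    confidence: float,
    silence_ms: int,
    params: EndOfTurnParams = EndOfTurnParams(),
) -> bool:
    """Illustrative model only; the real decision is made by the STT service."""
    confident = confidence >= params.end_of_turn_confidence_threshold
    # Confident case: only a short silence is needed to end the turn.
    if confident and silence_ms >= params.min_end_of_turn_silence_when_confident:
        return True
    # Fallback: a long enough silence ends the turn regardless of confidence.
    return silence_ms >= params.max_turn_silence
```

Under this reading, raising the confidence threshold or the minimum silence makes the agent wait longer before responding, while lowering max_turn_silence shortens the worst-case wait when the model is unsure.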

Turn detection

AssemblyAI includes a custom phrase endpointing model that uses both audio and linguistic information to detect turn boundaries. To use this model for turn detection, set turn_detection="stt" in the AgentSession constructor. You should also provide a VAD plugin for responsive interruption handling.

from livekit.agents import AgentSession
from livekit.plugins import assemblyai, silero

session = AgentSession(
    turn_detection="stt",
    stt=assemblyai.STT(
        end_of_turn_confidence_threshold=0.7,
        min_end_of_turn_silence_when_confident=160,
        max_turn_silence=2400,
    ),
    vad=silero.VAD.load(),  # Recommended for responsive interruption handling
    # ... llm, tts, etc.
)
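When tuning these values per deployment, one option is to validate them in one place and pass them to the constructor as keyword arguments. The following is a minimal sketch. The helper stt_turn_kwargs is hypothetical, and its range checks are assumptions made for illustration rather than rules enforced by the plugin.

```python
from typing import Any, Dict


def stt_turn_kwargs(
    threshold: float = 0.7,
    min_silence_ms: int = 160,
    max_silence_ms: int = 2400,
) -> Dict[str, Any]:
    """Validate turn-detection settings and return them as keyword arguments.

    Hypothetical helper; the range checks below are assumptions.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    if min_silence_ms < 0 or max_silence_ms < min_silence_ms:
        raise ValueError("expected 0 <= min_silence_ms <= max_silence_ms")
    # Keys match the parameter names documented above.
    return {
        "end_of_turn_confidence_threshold": threshold,
        "min_end_of_turn_silence_when_confident": min_silence_ms,
        "max_turn_silence": max_silence_ms,
    }
```

The result can then be unpacked into the plugin, for example assemblyai.STT(**stt_turn_kwargs(threshold=0.8)).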

Additional resources

The following resources provide more information about using AssemblyAI with LiveKit Agents.