Overview
You can fully customize your agent's behavior at multiple nodes in the processing path. A node is a point in the path where one process transitions to another. Some example customizations include:
- Use a custom STT, LLM, or TTS provider without a plugin.
- Generate a custom greeting when an agent enters a session.
- Modify STT output to remove filler words before sending it to the LLM.
- Modify LLM output before sending it to TTS to customize pronunciation.
- Update the user interface when an agent or user finishes speaking.
The `Agent` supports the following nodes and hooks. Some nodes are only available for STT-LLM-TTS pipeline models, and others are only available for realtime models.
Lifecycle hooks:
- `on_enter()`: Called after the agent becomes the active agent in a session.
- `on_exit()`: Called before the agent gives control to another agent in the same session.
- `on_user_turn_completed()`: Called when the user's turn has ended, before the agent's reply.
STT-LLM-TTS pipeline nodes:
- `stt_node()`: Transcribe input audio to text.
- `llm_node()`: Perform inference and generate a new conversation turn (or tool call).
- `tts_node()`: Synthesize speech from the LLM text output.
Realtime model nodes:
- `realtime_audio_output_node()`: Adjust output audio before publishing to the user.
Transcription node:
- `transcription_node()`: Adjust pipeline or realtime model transcription before sending to the user.
The following diagrams show the processing path for STT-LLM-TTS pipeline models and realtime models.
How to implement
Override the method within a custom `Agent` subclass to customize the behavior of your agent at a specific node in the processing path. To use the default behavior, call `Agent.default.<node-name>()`. For instance, this code overrides the STT node while maintaining the default behavior:
```python
from livekit import rtc
from livekit.agents import Agent, ModelSettings, stt
from typing import AsyncIterable, Optional

async def stt_node(
    self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings
) -> Optional[AsyncIterable[stt.SpeechEvent]]:
    # insert custom before-STT processing here
    events = Agent.default.stt_node(self, audio, model_settings)
    # insert custom after-STT processing here
    return events
```
Lifecycle hooks
The following lifecycle hooks are available for customization.
On enter
The `on_enter` node is called when the agent becomes the active agent in a session. Each session can have only one active agent at a time, which can be read from the `session.agent` property. Change the active agent using Workflows.
For example, to greet the user:
```python
async def on_enter(self):
    await self.session.generate_reply(
        instructions="Greet the user with a warm welcome",
    )
```
On exit
The `on_exit` node is called before the agent gives control to another agent in the same session as part of a workflow. Use it to save data, say goodbye, or perform other actions and cleanup.
For example, to say goodbye:
```python
async def on_exit(self):
    await self.session.generate_reply(
        instructions="Tell the user a friendly goodbye before you exit.",
    )
```
On user turn completed
The `on_user_turn_completed` node is called when the user's turn has ended, before the agent's reply. Override this method to modify the content of the turn, cancel the agent's reply, or perform other actions.

To use the `on_user_turn_completed` node with a realtime model, you must configure turn detection to occur in your agent instead of within the realtime model.
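For example, here is a minimal sketch of such a configuration, assuming the OpenAI Realtime API plugin and Silero VAD (adjust the parameters to your model):

```python
from livekit.agents import AgentSession
from livekit.plugins import openai, silero

# A sketch: disable the realtime model's server-side turn detection and
# let the agent detect turns with VAD instead, so on_user_turn_completed fires.
session = AgentSession(
    llm=openai.realtime.RealtimeModel(turn_detection=None),
    vad=silero.VAD.load(),
    turn_detection="vad",
)
```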
The node receives the following parameters:
- `turn_ctx`: The full `ChatContext`, up to but not including the user's latest message.
- `new_message`: The user's latest message, representing their current turn.
After the node is complete, the `new_message` is added to the chat context.
One common use of this node is retrieval-augmented generation (RAG). You can retrieve context relevant to the newest message and inject it into the chat context for the LLM.
```python
from livekit.agents import ChatContext, ChatMessage

async def on_user_turn_completed(
    self, turn_ctx: ChatContext, new_message: ChatMessage,
) -> None:
    rag_content = await my_rag_lookup(new_message.text_content())
    turn_ctx.add_message(
        role="assistant",
        content=f"Additional information relevant to the user's next message: {rag_content}",
    )
```
Additional messages added in this way are not persisted beyond the current turn. To permanently add messages to the chat history, use the `update_chat_ctx` method:
```python
async def on_user_turn_completed(
    self, turn_ctx: ChatContext, new_message: ChatMessage,
) -> None:
    rag_content = await my_rag_lookup(new_message.text_content())
    turn_ctx.add_message(role="assistant", content=rag_content)
    await self.update_chat_ctx(turn_ctx)
```
You can also edit the `new_message` object to modify the user's message before it's added to the chat context. For example, you can remove offensive content or add additional context. These changes are persisted to the chat history going forward.
```python
async def on_user_turn_completed(
    self, turn_ctx: ChatContext, new_message: ChatMessage,
) -> None:
    new_message.content = ["... modified message ..."]
```
To abort generation entirely, for example in a push-to-talk interface, you can raise `StopResponse`:
```python
from livekit.agents import ChatContext, ChatMessage, StopResponse

async def on_user_turn_completed(
    self, turn_ctx: ChatContext, new_message: ChatMessage,
) -> None:
    if not new_message.text_content:
        # raise StopResponse to stop the agent from generating a reply
        raise StopResponse()
```
For a complete example, see the multi-user agent with push to talk example.
STT-LLM-TTS pipeline nodes
The following nodes are available for STT-LLM-TTS pipeline models.
STT node
The `stt_node` transcribes audio frames into speech events, converting user audio input into text for the LLM. By default, this node uses the Speech-To-Text (STT) capability from the current agent. If the STT implementation doesn't support streaming natively, a Voice Activity Detection (VAD) mechanism wraps the STT.
You can override this node to implement:
- Custom pre-processing of audio frames
- Additional buffering mechanisms
- Alternative STT strategies
- Post-processing of the transcribed text
To use the default implementation, call `Agent.default.stt_node()`.
This example adds a noise filtering step:
```python
from livekit import rtc
from livekit.agents import Agent, ModelSettings, stt
from typing import AsyncIterable, Optional

async def stt_node(
    self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings
) -> Optional[AsyncIterable[stt.SpeechEvent]]:
    async def filtered_audio():
        async for frame in audio:
            # insert custom audio preprocessing here
            yield frame

    async for event in Agent.default.stt_node(self, filtered_audio(), model_settings):
        # insert custom text postprocessing here
        yield event
```
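As a concrete post-processing example, the sketch below strips common filler words from final transcripts before they reach the LLM; the filler list and regex are illustrative assumptions:

```python
import re
from livekit import rtc
from livekit.agents import Agent, ModelSettings, stt
from typing import AsyncIterable, Optional

# illustrative filler-word pattern; extend to suit your use case
FILLERS = re.compile(r"\b(um+|uh+|you know)\b[,\s]*", re.IGNORECASE)

async def stt_node(
    self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings
) -> Optional[AsyncIterable[stt.SpeechEvent]]:
    async for event in Agent.default.stt_node(self, audio, model_settings):
        # rewrite only finalized transcripts; interim results pass through untouched
        if event.type == stt.SpeechEventType.FINAL_TRANSCRIPT and event.alternatives:
            alt = event.alternatives[0]
            alt.text = FILLERS.sub("", alt.text).strip()
        yield event
```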
LLM node
The `llm_node` is responsible for performing inference based on the current chat context and creating the agent's response or tool calls. It may yield plain text (as `str`) for straightforward text generation, or `llm.ChatChunk` objects that can include text and optional tool calls. `ChatChunk` is helpful for capturing more complex outputs such as function calls, usage statistics, or other metadata.
You can override this node to:
- Customize how the LLM is used
- Modify the chat context prior to inference
- Adjust how tool invocations and responses are handled
- Implement a custom LLM provider without a plugin
To use the default implementation, call `Agent.default.llm_node()`.
```python
from livekit.agents import ModelSettings, llm, FunctionTool, Agent
from typing import AsyncIterable

async def llm_node(
    self,
    chat_ctx: llm.ChatContext,
    tools: list[FunctionTool],
    model_settings: ModelSettings,
) -> AsyncIterable[llm.ChatChunk]:
    # insert custom preprocessing here
    async for chunk in Agent.default.llm_node(self, chat_ctx, tools, model_settings):
        # insert custom postprocessing here
        yield chunk
```
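Because the node may yield plain `str` output, you can also bypass the configured LLM plugin entirely. A minimal sketch, where `my_custom_llm_stream` is a hypothetical helper that streams text chunks from your own inference API:

```python
from livekit.agents import ModelSettings, llm, FunctionTool
from typing import AsyncIterable

async def llm_node(
    self,
    chat_ctx: llm.ChatContext,
    tools: list[FunctionTool],
    model_settings: ModelSettings,
) -> AsyncIterable[str]:
    # my_custom_llm_stream is a hypothetical function that calls your own
    # inference endpoint and yields text chunks as they are generated
    async for text_chunk in my_custom_llm_stream(chat_ctx):
        yield text_chunk
```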
TTS node
The `tts_node` synthesizes audio from text segments, converting the LLM output into speech. By default, this node uses the Text-To-Speech (TTS) capability from the current agent. If the TTS implementation doesn't support streaming natively, it uses a sentence tokenizer to split text for incremental synthesis.
You can override this node to:
- Provide different text chunking behavior
- Implement a custom TTS engine
- Add custom pronunciation rules (see the sketch below)
- Adjust the volume of the audio output
- Apply any other specialized audio processing
To use the default implementation, call `Agent.default.tts_node()`.
```python
from livekit import rtc
from livekit.agents import ModelSettings, Agent
from typing import AsyncIterable

async def tts_node(
    self, text: AsyncIterable[str], model_settings: ModelSettings
) -> AsyncIterable[rtc.AudioFrame]:
    # insert custom text processing here
    async for frame in Agent.default.tts_node(self, text, model_settings):
        # insert custom audio processing here
        yield frame
```
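As an example of custom pronunciation rules, this sketch substitutes hard-to-pronounce terms in the text stream before synthesis; the replacement table is an illustrative assumption:

```python
from livekit import rtc
from livekit.agents import ModelSettings, Agent
from typing import AsyncIterable

# illustrative pronunciation substitutions
PRONUNCIATIONS = {"API": "A P I", "SQL": "sequel", "kubectl": "kube control"}

async def tts_node(
    self, text: AsyncIterable[str], model_settings: ModelSettings
) -> AsyncIterable[rtc.AudioFrame]:
    async def adjusted_text():
        async for chunk in text:
            # replace each term before it reaches the TTS engine
            for term, replacement in PRONUNCIATIONS.items():
                chunk = chunk.replace(term, replacement)
            yield chunk

    async for frame in Agent.default.tts_node(self, adjusted_text(), model_settings):
        yield frame
```

Note that simple per-chunk replacement can miss a term split across chunk boundaries; buffering the text by sentence before substitution avoids this.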
Realtime model nodes
The following nodes are available for realtime models.
Realtime audio output node
The `realtime_audio_output_node` is called when a realtime model outputs speech. This allows you to modify the audio output before it's sent to the user, for example to adjust its volume (see the sketch below).
To use the default implementation, call `Agent.default.realtime_audio_output_node()`.
```python
from livekit import rtc
from livekit.agents import ModelSettings, Agent
from typing import AsyncIterable

async def realtime_audio_output_node(
    self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings
) -> AsyncIterable[rtc.AudioFrame]:
    # insert custom audio preprocessing here
    async for frame in Agent.default.realtime_audio_output_node(self, audio, model_settings):
        # insert custom audio postprocessing here
        yield frame
```
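For example, here is a sketch that halves the output volume by scaling each frame's samples, assuming signed 16-bit PCM audio and `numpy` available:

```python
import numpy as np
from livekit import rtc
from livekit.agents import ModelSettings, Agent
from typing import AsyncIterable

async def realtime_audio_output_node(
    self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings
) -> AsyncIterable[rtc.AudioFrame]:
    async for frame in Agent.default.realtime_audio_output_node(self, audio, model_settings):
        # scale the 16-bit PCM samples to halve the volume
        samples = np.frombuffer(frame.data, dtype=np.int16)
        quieter = (samples * 0.5).astype(np.int16)
        yield rtc.AudioFrame(
            data=quieter.tobytes(),
            sample_rate=frame.sample_rate,
            num_channels=frame.num_channels,
            samples_per_channel=frame.samples_per_channel,
        )
```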
Transcription node
The `transcription_node` finalizes transcriptions from text segments. This node is part of the forwarding path for agent transcriptions and can be used to adjust or post-process text coming from an LLM (or any other source) into a final transcribed form.
By default, the node simply passes the transcription to the task that forwards it to the designated output. You can override this node to:
- Clean up formatting
- Fix punctuation
- Strip unwanted characters
- Perform any other text transformations
To use the default implementation, call `Agent.default.transcription_node()`.
```python
from livekit.agents import ModelSettings
from typing import AsyncIterable

async def transcription_node(
    self, text: AsyncIterable[str], model_settings: ModelSettings
) -> AsyncIterable[str]:
    async for delta in text:
        yield delta.replace("😘", "")
```
Examples
The following examples demonstrate advanced usage of nodes and hooks:
- Restaurant Agent: A restaurant front-of-house agent that demonstrates the `on_enter` and `on_exit` lifecycle hooks.
- Structured Output: Handle structured output from the LLM by overriding the `llm_node` and `tts_node`.
- Chain-of-thought agent: Build an agent for chain-of-thought reasoning using the `llm_node` to clean the text before TTS.
- Keyword Detection: Use the `stt_node` to detect keywords in the user's speech.
- LLM Content Filter: Implement content filtering in the `llm_node`.
- Speedup Output Audio: Speed up the output audio of an agent with the `tts_node` or `realtime_audio_output_node`.