# Streaming Responses
Streaming lets you receive LLM responses chunk by chunk as they are generated, reducing perceived latency for users.
## Basic Streaming
Use `broker.generate_stream` to get a stream of chunks:
```rust
use mojentic::prelude::*;
use futures::StreamExt;
use std::io::Write;
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<()> {
    // Connect to a local Ollama instance and wrap it in a broker.
    let gateway = Arc::new(OllamaGateway::new());
    let broker = LlmBroker::new("qwen3:32b", gateway, None);

    let messages = vec![LlmMessage::user("Tell me a story.")];

    // Request a stream instead of waiting for the complete response.
    let mut stream = broker.generate_stream(&messages, None, None).await?;

    while let Some(result) = stream.next().await {
        match result {
            Ok(chunk) => {
                // Print each chunk as it arrives; flush so partial lines
                // show up immediately instead of waiting for a newline.
                print!("{}", chunk);
                std::io::stdout().flush().ok();
            }
            Err(e) => eprintln!("Error: {}", e),
        }
    }
    Ok(())
}
```
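
Each chunk is only a fragment of the final text. If you also need the complete response, for example to append it to the conversation history, you can accumulate the chunks as they arrive. A minimal sketch, assuming each chunk renders to a string via `Display` as the example above suggests:

```rust
// Accumulate streamed fragments into the full response text.
let mut full_response = String::new();
while let Some(result) = stream.next().await {
    let chunk = result?;
    print!("{}", chunk);
    full_response.push_str(&chunk.to_string());
}
println!(); // terminate the streamed output line
```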
## Streaming with Tools
Mojentic supports streaming even when tools are involved. The broker will pause streaming to execute tools and then resume streaming the final response.
```rust
use mojentic::tools::DateResolver;

// Continuing from the basic example above.
let tools: Vec<Arc<dyn LlmTool>> = vec![Arc::new(DateResolver::new())];
let mut stream = broker.generate_stream(&messages, Some(tools), None).await?;

// The stream contains only text chunks; tool execution happens
// transparently while the stream is paused.
while let Some(result) = stream.next().await {
    // ... handle chunks as in the basic example
}
```
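
Error handling is left to the consumer. If you would rather abort on the first error than log it and keep reading, break out of the loop. A minimal sketch, assuming that dropping the stream cancels any in-flight work (this depends on the gateway implementation):

```rust
while let Some(result) = stream.next().await {
    match result {
        Ok(chunk) => print!("{}", chunk),
        Err(e) => {
            eprintln!("stream failed: {}", e);
            break; // stop consuming; the stream is dropped after the loop
        }
    }
}
```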