Tutorial: Extracting Structured Data
Why Use Structured Output?
LLMs are great at generating text, but sometimes you need data in a machine-readable format like JSON. Structured output allows you to define a schema (using Rust structs with Serde) and force the LLM to return data that matches that schema.
This is essential for:
- Data extraction from unstructured text
- Building API integrations
- Populating databases
- ensuring reliable downstream processing
Getting Started
Let’s build an example that extracts user information from a natural language description.
1. Define Your Data Schema
We use serde to define the structure we want.
#![allow(unused)]
fn main() {
use serde::{Deserialize, Serialize};
use schemars::JsonSchema;
#[derive(Debug, Serialize, Deserialize, JsonSchema)]
struct UserInfo {
name: String,
age: u32,
interests: Vec<String>,
}
}
2. Initialize the Broker
#![allow(unused)]
fn main() {
use mojentic::prelude::*;
use std::sync::Arc;
let gateway = Arc::new(OllamaGateway::new());
let broker = LlmBroker::new("qwen3:32b", gateway, None);
}
3. Generate Structured Data
Use broker.generate_structured to request the data.
#![allow(unused)]
fn main() {
let text = "John Doe is a 30-year-old software engineer who loves hiking and reading.";
let user_info: UserInfo = broker.generate_structured(text).await?;
println!("{:?}", user_info);
// UserInfo {
// name: "John Doe",
// age: 30,
// interests: ["hiking", "reading"]
// }
}
How It Works
- Schema Definition: Mojentic uses
schemarsto convert your Rust struct into a JSON schema that the LLM can understand. - Prompt Engineering: The broker automatically appends instructions to the prompt, telling the LLM to output JSON matching the schema.
- Validation: When the response comes back, Mojentic parses the JSON and deserializes it into your struct, performing validation (e.g., ensuring
ageis a number).
Advanced: Nested Schemas
You can also use nested structs for more complex data.
#![allow(unused)]
fn main() {
#[derive(Debug, Serialize, Deserialize, JsonSchema)]
struct Address {
street: String,
city: String,
}
#[derive(Debug, Serialize, Deserialize, JsonSchema)]
struct UserProfile {
name: String,
address: Address,
}
}
Summary
Structured output turns unstructured text into reliable data structures. By defining Rust structs, you can integrate LLM outputs directly into your application’s logic with type safety and validation.