Image Analysis

Vision-capable models can analyze images by passing image file paths along with text prompts. The framework automatically handles reading image files and encoding them as Base64 for transmission to the LLM.

Basic Usage

Attach images to a message using the with_images() method:

#![allow(unused)]
fn main() {
use mojentic::prelude::*;

let message = LlmMessage::user("Describe this image")
    .with_images(vec!["/path/to/image.jpg".to_string()]);
}

You can attach multiple images to a single message:

#![allow(unused)]
fn main() {
let message = LlmMessage::user("Compare these images")
    .with_images(vec![
        "/path/to/image1.jpg".to_string(),
        "/path/to/image2.jpg".to_string(),
    ]);
}

Complete Example

use mojentic::prelude::*;
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<()> {
    // Create gateway and broker with a vision-capable model
    let gateway = OllamaGateway::new();
    let broker = LlmBroker::new("llava:latest", Arc::new(gateway), None);

    // Create a message with image
    let message = LlmMessage::user("What's in this image?")
        .with_images(vec!["path/to/image.jpg".to_string()]);

    // Generate response
    let response = broker.generate(&[message], None, None, None).await?;
    println!("{}", response);

    Ok(())
}

Vision-Capable Models

Common vision-capable models available through Ollama:

llava:latest - General-purpose vision model
bakllava:latest - BakLLaVA vision model
qwen3-vl:30b - Qwen3 vision-language model
gemma3:27b - Gemma 3 with vision support

Pull a model before using:

ollama pull llava:latest

How It Works

When you attach images to a message:

File Reading: The gateway reads the image file from the specified path
Base64 Encoding: The image bytes are encoded as Base64 using the base64 crate
API Transmission: The encoded image is included in the images field of the Ollama API request
Model Processing: The vision-capable model analyzes both the text prompt and image(s)

Error Handling

Image processing can fail if:

The image file doesn’t exist or isn’t readable
The file path is invalid
The model doesn’t support vision

Always handle errors when working with images:

#![allow(unused)]
fn main() {
match broker.generate(&[message], None, None, None).await {
    Ok(response) => println!("Response: {}", response),
    Err(e) => eprintln!("Error analyzing image: {}", e),
}
}

Supported Image Formats

The framework reads raw image bytes and passes them to the model. Supported formats depend on the specific model being used. Most vision models support common formats like JPEG and PNG.

Mojentic (Rust)