Image Analysis
Mojentic supports multimodal input, allowing you to analyze images with vision-capable LLMs. This guide shows you how to send images to an LLM alongside a text prompt and read the response.
Overview
Vision-capable models can analyze images and answer questions about them. Mojentic provides utilities to encode images and send them alongside text prompts in a single message.
Requirements
- A vision-capable model (e.g., llava, bakllava, qwen3-vl, gemma3:27b)
- Image files in common formats (JPEG, PNG, GIF, WebP, BMP, SVG)
Quick Example
import { LlmBroker, OllamaGateway, MessageRole, isOk } from 'mojentic';
import { imageContent, textContent } from 'mojentic/llm/utils/image';
const gateway = new OllamaGateway();
const broker = new LlmBroker('qwen3-vl:30b', gateway);
// Create a multimodal message with text and image
const message = {
  role: MessageRole.User,
  content: [
    textContent('What do you see in this image?'),
    imageContent('./path/to/image.jpg'),
  ],
};
const result = await broker.generate([message]);
if (isOk(result)) {
  console.log(result.value);
}
Image Utilities
imageToDataUri(filePath: string): string
Reads an image file and converts it to a base64 data URI suitable for LLM consumption.
import { imageToDataUri } from 'mojentic/llm/utils/image';
const dataUri = imageToDataUri('./photo.jpg');
// Returns: "data:image/jpeg;base64,/9j/4AAQSkZJRg..."
Supported formats:
- JPEG (.jpg, .jpeg)
- PNG (.png)
- GIF (.gif)
- WebP (.webp)
- BMP (.bmp)
- SVG (.svg)
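The format list above implies a mapping from file extension to MIME type. The following is a minimal sketch of such a lookup, useful if you build data URIs by hand. The names `MIME_TYPES` and `mimeTypeForPath` are hypothetical and are not part of Mojentic; how the library resolves MIME types internally is not documented here.

```typescript
// Hypothetical helper: maps a file extension to an image MIME type.
// The table mirrors the formats listed above; it is not Mojentic's own code.
const MIME_TYPES: Record<string, string> = {
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.png': 'image/png',
  '.gif': 'image/gif',
  '.webp': 'image/webp',
  '.bmp': 'image/bmp',
  '.svg': 'image/svg+xml',
};

function mimeTypeForPath(filePath: string): string | undefined {
  // Take everything from the last dot onward and compare case-insensitively
  const dot = filePath.lastIndexOf('.');
  if (dot < 0) return undefined;
  const ext = filePath.slice(dot).toLowerCase();
  return MIME_TYPES[ext];
}
```

A caller can treat an `undefined` result as "unsupported format" and reject the file before reading it.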
imageContent(filePath: string)
Creates a ContentItem for an image, automatically handling the encoding.
import { imageContent } from 'mojentic/llm/utils/image';
const imageItem = imageContent('./diagram.png');
// Returns: { type: 'image_url', image_url: { url: 'data:image/png;base64,...' } }
textContent(text: string)
Creates a ContentItem for text. Use this when combining text and images in the same message.
import { textContent } from 'mojentic/llm/utils/image';
const textItem = textContent('Describe this diagram:');
// Returns: { type: 'text', text: 'Describe this diagram:' }
Multiple Images
You can include multiple images in a single message:
const message = {
  role: MessageRole.User,
  content: [
    textContent('Compare these two images:'),
    imageContent('./before.jpg'),
    imageContent('./after.jpg'),
    textContent('What are the key differences?'),
  ],
};
Manual Construction
If you need more control, you can construct multimodal messages manually:
import { MessageRole } from 'mojentic';
import { readFileSync } from 'fs';
// Note: the MIME type is hard-coded to JPEG here; change it for other formats
function imageToDataUri(filePath: string): string {
  const imageBuffer = readFileSync(filePath);
  const base64 = imageBuffer.toString('base64');
  return `data:image/jpeg;base64,${base64}`;
}
const message = {
  role: MessageRole.User,
  content: [
    {
      type: 'text' as const,
      text: 'What is in this image?',
    },
    {
      type: 'image_url' as const,
      image_url: {
        url: imageToDataUri('./photo.jpg'),
      },
    },
  ],
};
Vision-Capable Models
Ollama Models
Popular vision models available via Ollama:
- qwen3-vl:30b - Excellent vision understanding and text extraction
- llava:latest - General-purpose vision model
- bakllava:latest - Enhanced LLaVA variant
- gemma3:27b - Google's multimodal model
Pull a model with:
ollama pull qwen3-vl:30b
Complete Example
Here's a complete working example that analyzes an image:
import { LlmBroker, OllamaGateway, MessageRole, isOk } from 'mojentic';
import { imageContent, textContent } from 'mojentic/llm/utils/image';
import { existsSync } from 'fs';
async function analyzeImage(imagePath: string, prompt: string) {
  // Verify image exists
  if (!existsSync(imagePath)) {
    throw new Error(`Image not found: ${imagePath}`);
  }
  // Initialize gateway and broker
  const gateway = new OllamaGateway();
  const broker = new LlmBroker('qwen3-vl:30b', gateway);
  // Create multimodal message
  const message = {
    role: MessageRole.User,
    content: [
      textContent(prompt),
      imageContent(imagePath),
    ],
  };
  console.log(`Analyzing: ${imagePath}`);
  console.log(`Prompt: ${prompt}\n`);
  // Generate response
  const result = await broker.generate([message]);
  if (isOk(result)) {
    console.log('Analysis:', result.value);
    return result.value;
  } else {
    throw result.error;
  }
}
// Usage
analyzeImage(
  './photo.jpg',
  'Describe what you see in this image in detail.'
).catch(console.error);
Use Cases
Document OCR
Extract text from images of documents, receipts, or signs:
const message = {
  role: MessageRole.User,
  content: [
    textContent('Extract all text from this document in markdown format.'),
    imageContent('./receipt.jpg'),
  ],
};
Image Comparison
Compare multiple images to identify differences or similarities:
const message = {
  role: MessageRole.User,
  content: [
    textContent('Compare these product photos and list the differences:'),
    imageContent('./product_v1.jpg'),
    imageContent('./product_v2.jpg'),
  ],
};
Diagram Understanding
Analyze technical diagrams, charts, or visualizations:
const message = {
  role: MessageRole.User,
  content: [
    textContent('Explain the architecture shown in this diagram:'),
    imageContent('./system_architecture.png'),
  ],
};
Accessibility
Generate alt text descriptions for images:
const message = {
  role: MessageRole.User,
  content: [
    textContent('Generate a concise alt text description for this image:'),
    imageContent('./photo.jpg'),
  ],
};
Implementation Notes
Base64 Encoding
Images are automatically converted to base64-encoded data URIs when you use the provided utilities. The Ollama API expects image inputs as base64-encoded data.
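A data URI has the shape `data:<mime type>;base64,<payload>`. The sketch below splits one into its MIME type and raw base64 payload, which can help when debugging or when a downstream API wants only the payload. `ParsedDataUri` and `parseDataUri` are hypothetical names used for illustration and are not part of Mojentic's API.

```typescript
// Hypothetical helper for illustration; not part of Mojentic's API.
interface ParsedDataUri {
  mimeType: string;
  base64: string;
}

function parseDataUri(dataUri: string): ParsedDataUri | null {
  const prefix = 'data:';
  const marker = ';base64,';
  const markerAt = dataUri.indexOf(marker);
  // Reject anything that is not a base64 data URI
  if (dataUri.indexOf(prefix) !== 0 || markerAt < 0) return null;
  return {
    // Text between "data:" and ";base64," is the MIME type
    mimeType: dataUri.slice(prefix.length, markerAt),
    // Everything after the marker is the raw base64 payload
    base64: dataUri.slice(markerAt + marker.length),
  };
}
```

A `null` result signals that the input was not a base64 data URI, for example a plain file path passed by mistake.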
Memory Considerations
Large images consume significant memory when base64-encoded. Consider:
- Resizing images before encoding
- Processing images in batches rather than all at once
- Using appropriate model context windows
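To put numbers on the points above: base64 encodes every 3 input bytes as 4 output characters, so the encoded string is roughly a third larger than the file. The sketch below estimates the encoded size and splits a list of paths into batches. The helper names `base64Length`, `dataUriLength`, and `chunk` are hypothetical and not part of Mojentic.

```typescript
// Hypothetical helpers for illustration; not part of Mojentic's API.

// Length in characters of the padded base64 encoding of byteCount bytes
function base64Length(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3);
}

// Length of the full data URI, including the "data:<mime>;base64," prefix
function dataUriLength(byteCount: number, mimeType: string): number {
  return `data:${mimeType};base64,`.length + base64Length(byteCount);
}

// Split a list into consecutive batches of at most `size` items
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new RangeError('size must be at least 1');
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

For example, a 3,000,000-byte image becomes a 4,000,000-character base64 string, so processing image paths one batch at a time keeps fewer of these strings in memory at once.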
Gateway Support
Currently, multimodal image support is implemented for:
- ✅ OllamaGateway
Support for other gateways (OpenAI, Anthropic) is coming soon.
Error Handling
import { isOk, GatewayError } from 'mojentic';
const result = await broker.generate([message]);
if (!isOk(result)) {
  if (result.error instanceof GatewayError) {
    console.error('Gateway error:', result.error.message);
    console.error('Status code:', result.error.statusCode);
  } else {
    console.error('Unexpected error:', result.error);
  }
}
Best Practices
- Check file existence before attempting to read images
- Use appropriate models - ensure your model supports vision
- Provide clear prompts - be specific about what you want to analyze
- Handle errors gracefully - network issues and missing files can occur
- Consider image size - large images may hit token limits or memory constraints
- Test with multiple models - different models have different strengths
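Several of these practices can be combined into a pre-flight check that runs before any image is encoded. The following is a minimal sketch under stated assumptions: the `FileInfo` shape, the `preflightProblems` function, and the size limit are all hypothetical, and the caller is expected to gather the file facts itself.

```typescript
// Hypothetical types and limits for illustration; not part of Mojentic.
interface FileInfo {
  path: string;
  exists: boolean;
  sizeBytes: number;
}

// Returns one human-readable problem per file that should not be sent
function preflightProblems(files: FileInfo[], maxBytes: number): string[] {
  const problems: string[] = [];
  for (const file of files) {
    if (!file.exists) {
      problems.push(`${file.path}: file not found`);
    } else if (file.sizeBytes > maxBytes) {
      problems.push(`${file.path}: ${file.sizeBytes} bytes exceeds limit of ${maxBytes}`);
    }
  }
  return problems;
}
```

In Node.js, `exists` and `sizeBytes` could be filled from `fs.existsSync(path)` and `fs.statSync(path).size`. An empty result means every file passed the checks.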
See Also
- LLM Broker - Core message handling
- Tool Usage - Combining images with tool calls
- API Reference - Type definitions