Image Analysis

Mojentic supports multimodal input, so you can analyze images with vision-capable LLMs. This guide shows how to send an image to a model alongside a text prompt and read back its response.

Overview

Vision-capable models can analyze images and answer questions about them. Mojentic provides utilities to encode images and send them alongside text prompts in a single message.

Requirements

A vision-capable model (e.g., llava, bakllava, qwen3-vl, gemma3:27b)
Image files in common formats (JPEG, PNG, GIF, WebP, BMP, SVG)

Quick Example

typescript

import { LlmBroker, OllamaGateway, MessageRole, isOk } from 'mojentic';
import { imageContent, textContent } from 'mojentic/llm/utils/image';

const gateway = new OllamaGateway();
const broker = new LlmBroker('qwen3-vl:30b', gateway);

// Create a multimodal message with text and image
const message = {
  role: MessageRole.User,
  content: [
    textContent('What do you see in this image?'),
    imageContent('./path/to/image.jpg'),
  ],
};

const result = await broker.generate([message]);

if (isOk(result)) {
  console.log(result.value);
}

Image Utilities

`imageToDataUri(filePath: string): string`

Reads an image file and converts it to a base64 data URI suitable for LLM consumption.

typescript

import { imageToDataUri } from 'mojentic/llm/utils/image';

const dataUri = imageToDataUri('./photo.jpg');
// Returns: "data:image/jpeg;base64,/9j/4AAQSkZJRg..."

Supported formats:

JPEG (.jpg, .jpeg)
PNG (.png)
GIF (.gif)
WebP (.webp)
BMP (.bmp)
SVG (.svg)
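To make the extension-to-format mapping concrete, here is a minimal sketch of how a file path could be resolved to a MIME type for the formats listed above. The `MIME_TYPES` table and `mimeTypeFor` helper are hypothetical names used for illustration, not part of Mojentic's API, and the library's own lookup may differ.

```typescript
import { extname } from 'node:path';

// Hypothetical lookup table covering the formats listed in this guide.
const MIME_TYPES: Record<string, string> = {
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.png': 'image/png',
  '.gif': 'image/gif',
  '.webp': 'image/webp',
  '.bmp': 'image/bmp',
  '.svg': 'image/svg+xml',
};

// Returns the MIME type for a supported image path, or undefined otherwise.
// The extension is lower-cased so "photo.JPG" and "photo.jpg" match alike.
function mimeTypeFor(filePath: string): string | undefined {
  return MIME_TYPES[extname(filePath).toLowerCase()];
}
```

Returning `undefined` for unknown extensions lets the caller decide whether to reject the file or fall back to a default.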

`imageContent(filePath: string)`

Creates a ContentItem for an image, automatically handling the encoding.

typescript

import { imageContent } from 'mojentic/llm/utils/image';

const imageItem = imageContent('./diagram.png');
// Returns: { type: 'image_url', image_url: { url: 'data:image/png;base64,...' } }

`textContent(text: string)`

Creates a ContentItem for text. Use this when combining text and images in the same message.

typescript

import { textContent } from 'mojentic/llm/utils/image';

const textItem = textContent('Describe this diagram:');
// Returns: { type: 'text', text: 'Describe this diagram:' }

Multiple Images

You can include multiple images in a single message:

typescript

const message = {
  role: MessageRole.User,
  content: [
    textContent('Compare these two images:'),
    imageContent('./before.jpg'),
    imageContent('./after.jpg'),
    textContent('What are the key differences?'),
  ],
};

Manual Construction

If you need more control, you can construct multimodal messages manually:

typescript

import { MessageRole } from 'mojentic';
import { readFileSync } from 'fs';

// Note: the MIME type is hard-coded to image/jpeg, so only pass JPEG files here.
function imageToDataUri(filePath: string): string {
  const imageBuffer = readFileSync(filePath);
  const base64 = imageBuffer.toString('base64');
  return `data:image/jpeg;base64,${base64}`;
}

const message = {
  role: MessageRole.User,
  content: [
    {
      type: 'text' as const,
      text: 'What is in this image?',
    },
    {
      type: 'image_url' as const,
      image_url: {
        url: imageToDataUri('./photo.jpg'),
      },
    },
  ],
};
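The manual helper above always labels its payload as `image/jpeg`. If you construct messages by hand for other formats, one option is to pass the MIME type explicitly. The following is a minimal sketch; `bufferToDataUri` is a hypothetical helper, not something Mojentic exports.

```typescript
// Builds a data URI from raw bytes and an explicit MIME type.
// Hypothetical helper for illustration; not part of Mojentic.
function bufferToDataUri(data: Buffer, mimeType: string): string {
  return `data:${mimeType};base64,${data.toString('base64')}`;
}

// Possible usage with a PNG file (requires readFileSync from 'fs'):
// const url = bufferToDataUri(readFileSync('./diagram.png'), 'image/png');
```

Working on a `Buffer` rather than a file path also covers images that are already in memory, such as a downloaded file.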

Vision-Capable Models

Ollama Models

Popular vision models available via Ollama:

qwen3-vl:30b - Excellent vision understanding and text extraction
llava:latest - General-purpose vision model
bakllava:latest - Enhanced LLaVA variant
gemma3:27b - Google's multimodal model

Pull a model with:

bash

ollama pull qwen3-vl:30b

Complete Example

Here's a complete working example that analyzes an image:

typescript

import { LlmBroker, OllamaGateway, MessageRole, isOk } from 'mojentic';
import { imageContent, textContent } from 'mojentic/llm/utils/image';
import { existsSync } from 'fs';

async function analyzeImage(imagePath: string, prompt: string) {
  // Verify image exists
  if (!existsSync(imagePath)) {
    throw new Error(`Image not found: ${imagePath}`);
  }

  // Initialize gateway and broker
  const gateway = new OllamaGateway();
  const broker = new LlmBroker('qwen3-vl:30b', gateway);

  // Create multimodal message
  const message = {
    role: MessageRole.User,
    content: [
      textContent(prompt),
      imageContent(imagePath),
    ],
  };

  console.log(`Analyzing: ${imagePath}`);
  console.log(`Prompt: ${prompt}\n`);

  // Generate response
  const result = await broker.generate([message]);

  if (isOk(result)) {
    console.log('Analysis:', result.value);
    return result.value;
  } else {
    throw result.error;
  }
}

// Usage
analyzeImage(
  './photo.jpg',
  'Describe what you see in this image in detail.'
).catch(console.error);

Use Cases

Document OCR

Extract text from images of documents, receipts, or signs:

typescript

const message = {
  role: MessageRole.User,
  content: [
    textContent('Extract all text from this document in markdown format.'),
    imageContent('./receipt.jpg'),
  ],
};

Image Comparison

Compare multiple images to identify differences or similarities:

typescript

const message = {
  role: MessageRole.User,
  content: [
    textContent('Compare these product photos and list the differences:'),
    imageContent('./product_v1.jpg'),
    imageContent('./product_v2.jpg'),
  ],
};

Diagram Understanding

Analyze technical diagrams, charts, or visualizations:

typescript

const message = {
  role: MessageRole.User,
  content: [
    textContent('Explain the architecture shown in this diagram:'),
    imageContent('./system_architecture.png'),
  ],
};

Accessibility

Generate alt text descriptions for images:

typescript

const message = {
  role: MessageRole.User,
  content: [
    textContent('Generate a concise alt text description for this image:'),
    imageContent('./photo.jpg'),
  ],
};

Implementation Notes

Base64 Encoding

Images are automatically converted to base64-encoded data URIs when you use the provided utilities. Ollama accepts image inputs as base64-encoded data, so this encoding step is required.
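To illustrate the structure of such a data URI, here is a small sketch that splits one back into its MIME type and base64 payload. `parseDataUri` and `ParsedDataUri` are hypothetical names introduced for this example; they are not Mojentic utilities.

```typescript
// Shape of a parsed data URI (hypothetical type for this example).
interface ParsedDataUri {
  mimeType: string;
  base64: string;
}

// Splits "data:<mime>;base64,<payload>" into its two parts.
// Returns undefined when the input does not follow that shape.
function parseDataUri(dataUri: string): ParsedDataUri | undefined {
  const match = /^data:([^;,]+);base64,([A-Za-z0-9+\/=]*)$/.exec(dataUri);
  if (!match) {
    return undefined;
  }
  return { mimeType: match[1], base64: match[2] };
}
```

This can be handy when debugging, for example to confirm that the MIME type in the URI matches the file you intended to send.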

Memory Considerations

Base64 encoding inflates an image by roughly one third, so large images consume significant memory once encoded. Consider:

Resizing images before encoding
Processing images in batches rather than all at once
Using appropriate model context windows
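As a rough sketch of the first two points, the helpers below estimate the encoded size of an image and split a list of paths into fixed-size batches. Both `base64Length` and `inBatches` are hypothetical names for illustration only.

```typescript
// Length in characters of the padded base64 encoding of `byteCount` bytes.
// Every 3 input bytes become 4 output characters.
function base64Length(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3);
}

// Splits `items` into consecutive batches of at most `batchSize` elements.
function inBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize < 1) {
    throw new RangeError('batchSize must be at least 1');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

Processing one batch at a time means only that batch's encoded images need to be held in memory at once.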

Gateway Support

Currently, multimodal image support is implemented for:

✅ OllamaGateway

Support for other gateways (OpenAI, Anthropic) is coming soon.

Error Handling

typescript

import { isOk, GatewayError } from 'mojentic';

const result = await broker.generate([message]);

if (!isOk(result)) {
  if (result.error instanceof GatewayError) {
    console.error('Gateway error:', result.error.message);
    console.error('Status code:', result.error.statusCode);
  } else {
    console.error('Unexpected error:', result.error);
  }
}

Best Practices

Check file existence before attempting to read images
Use appropriate models - ensure your model supports vision
Provide clear prompts - be specific about what you want to analyze
Handle errors gracefully - network issues and missing files can occur
Consider image size - large images may hit token limits or memory constraints
Test with multiple models - different models have different strengths
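Several of these checks can be done before any image is encoded. The following is a minimal pre-flight sketch, assuming a hypothetical `checkImage` helper and an arbitrary 5 MiB default size limit; neither comes from Mojentic, and the right limit depends on your model and hardware.

```typescript
import { existsSync, statSync } from 'node:fs';
import { extname } from 'node:path';

// Extensions taken from the supported formats listed in this guide.
const SUPPORTED_EXTENSIONS = new Set([
  '.jpg', '.jpeg', '.png', '.gif', '.webp', '.bmp', '.svg',
]);

// Hypothetical pre-flight check: returns a list of problems, empty if none.
// The 5 MiB default is an arbitrary assumption, not a Mojentic limit.
function checkImage(filePath: string, maxBytes: number = 5 * 1024 * 1024): string[] {
  const problems: string[] = [];
  const extension = extname(filePath).toLowerCase();
  if (!SUPPORTED_EXTENSIONS.has(extension)) {
    problems.push(`Unsupported extension: "${extension}"`);
  }
  if (!existsSync(filePath)) {
    problems.push(`Image not found: ${filePath}`);
    return problems; // The size cannot be checked for a missing file.
  }
  const { size } = statSync(filePath);
  if (size > maxBytes) {
    problems.push(`Image is ${size} bytes, above the ${maxBytes}-byte limit`);
  }
  return problems;
}
```

Returning a list rather than throwing on the first problem lets you report every issue with a file in one pass.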
