pub struct TokenizerGateway { /* private fields */ }
Gateway for tokenizing and detokenizing text using tiktoken.
The tokenizer gateway provides encoding and decoding functionality, allowing you to convert text to tokens and back. This is essential for understanding token usage and managing context windows.
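One common context-window task is trimming input to a fixed token budget. A minimal self-contained sketch of that pattern (the token IDs are hard-coded here where `tokenizer.encode` would normally supply them, and `fit_to_budget` is a hypothetical helper, not part of this API):

```rust
// Sketch: trim a token sequence to fit a context budget.
// In practice the IDs would come from `TokenizerGateway::encode`; they are
// hard-coded so the example stands alone.
fn fit_to_budget(tokens: &[usize], budget: usize) -> &[usize] {
    // Keep at most `budget` tokens from the front of the sequence.
    &tokens[..tokens.len().min(budget)]
}

fn main() {
    let tokens = vec![9906, 11, 1917, 0]; // e.g. encode("Hello, world!")
    let trimmed = fit_to_budget(&tokens, 3);
    assert_eq!(trimmed.len(), 3);
}
```

The trimmed IDs can then be passed to `decode` to recover the truncated text.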
§Examples
use mojentic::llm::gateways::TokenizerGateway;
let tokenizer = TokenizerGateway::new("cl100k_base").unwrap();
let text = "Hello, world!";
let tokens = tokenizer.encode(text);
let decoded = tokenizer.decode(&tokens);
assert_eq!(text, decoded);
Implementations§
impl TokenizerGateway
pub fn new(model: &str) -> Result<Self, Box<dyn Error>>
Creates a new TokenizerGateway with the specified encoding model.
§Arguments
model - The encoding model to use. Common options:
- "cl100k_base" - Used by GPT-4 and GPT-3.5-turbo (default)
- "p50k_base" - Used by older GPT-3 models
- "r50k_base" - Used by even older models
§Errors
Returns an error if the specified model is not available.
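A caller might fall back to the default encoding when the requested one fails to load. A self-contained sketch of that pattern, where `try_new` is a stub standing in for `TokenizerGateway::new` (which returns `Result<Self, Box<dyn Error>>`):

```rust
use std::error::Error;

// Stub standing in for `TokenizerGateway::new`: succeeds only for the
// encodings listed in the documentation above.
fn try_new(model: &str) -> Result<String, Box<dyn Error>> {
    match model {
        "cl100k_base" | "p50k_base" | "r50k_base" => Ok(model.to_string()),
        other => Err(format!("unknown encoding: {other}").into()),
    }
}

fn main() {
    // Fall back to the default encoding if the requested one is unavailable.
    let encoding = try_new("nonexistent")
        .unwrap_or_else(|_| "cl100k_base".to_string());
    assert_eq!(encoding, "cl100k_base");
}
```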
§Examples
use mojentic::llm::gateways::TokenizerGateway;
let tokenizer = TokenizerGateway::new("cl100k_base").unwrap();
pub fn encode(&self, text: &str) -> Vec<usize>
Encodes text into tokens.
§Arguments
text - The text to encode
§Returns
A vector of token IDs representing the encoded text.
§Examples
use mojentic::llm::gateways::TokenizerGateway;
let tokenizer = TokenizerGateway::default();
let tokens = tokenizer.encode("Hello, world!");
println!("Token count: {}", tokens.len());
pub fn decode(&self, tokens: &[usize]) -> String
Decodes tokens back into text.
§Arguments
tokens - The slice of token IDs to decode
§Returns
The decoded text.
§Examples
use mojentic::llm::gateways::TokenizerGateway;
let tokenizer = TokenizerGateway::default();
let tokens = vec![9906, 11, 1917, 0];
let text = tokenizer.decode(&tokens);
println!("Decoded: {}", text);
pub fn count_tokens(&self, text: &str) -> usize
Counts the number of tokens in a text string.
This is a convenience method that returns the number of tokens produced by encoding the text, without handing you the intermediate token vector.
§Arguments
text - The text to count tokens for
§Returns
The number of tokens in the text.
§Examples
use mojentic::llm::gateways::TokenizerGateway;
let tokenizer = TokenizerGateway::default();
let count = tokenizer.count_tokens("Hello, world!");
println!("Token count: {}", count);
Trait Implementations§
Auto Trait Implementations§
impl Freeze for TokenizerGateway
impl RefUnwindSafe for TokenizerGateway
impl Send for TokenizerGateway
impl Sync for TokenizerGateway
impl Unpin for TokenizerGateway
impl UnwindSafe for TokenizerGateway
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.