WELCOME TO FUNCTIONGEMMA
Welcome to the tutorial on FunctionGemma. This interactive experience will teach you everything about function calling, tokenization, and prompt engineering through hands-on experimentation.
WHAT YOU'LL LEARN
- How tokenization works in language models
- Why zero-shot function calling fails
- How few-shot examples solve the problem
- Token-level analysis and debugging
- Best practices for prompt engineering
- ONNX model optimization and deployment
ABOUT FUNCTIONGEMMA-270M-IT-ONNX
Model: onnx-community/functiongemma-270m-it-ONNX
Size: 270 million parameters
Purpose: Specialized for function calling tasks
Format: ONNX quantized (q4 for WebGPU, q8 for WASM)
Key Finding: Requires few-shot examples to generate correct function calls!
Architecture: Based on Google's Gemma 3 270M, fine-tuned for function calling
MODEL LOADED SUCCESSFULLY
The model has been automatically loaded and is ready to use. You can now proceed with the lessons!
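If you want to reproduce this setup outside the tutorial, loading looks roughly like the sketch below. It uses Transformers.js and picks the quantization format to match the runtime (q4 on WebGPU, q8 on WASM, as listed above); exact option support can vary between versions, so treat it as a starting point rather than the tutorial's actual loader.

import { AutoModelForCausalLM, AutoTokenizer } from "@huggingface/transformers";

// Minimal loading sketch (assumptions: Transformers.js v3+, browser context).
const modelId = "onnx-community/functiongemma-270m-it-ONNX";
const tokenizer = await AutoTokenizer.from_pretrained(modelId);
const model = await AutoModelForCausalLM.from_pretrained(modelId, {
  device: navigator.gpu ? "webgpu" : "wasm", // WebGPU if available, else WASM
  dtype: navigator.gpu ? "q4" : "q8",        // match the quantization to the backend
});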
TOKENIZATION BASICS
Tokenization is the process of converting text into tokens (numbers) that the model can understand. Let's explore this interactively!
WHAT IS TOKENIZATION?
Language models don't understand words directly. They work with tokens - numeric IDs that represent pieces of text. A token can be a word, part of a word, or even a single character.
TRY IT YOURSELF
// How tokenization works in code:
const text = "call:get_current_temperature";

// Tokenize the text
const tokens = await tokenizer.encode(text);
// Result: [6639, 236787, 828, 236779, 4002, 236779, 27495]
// Each number represents a token ID

// Decode tokens back to text
const decoded = await tokenizer.decode(tokens);
// Result: "call:get_current_temperature"
KEY INSIGHT
Special tokens like <start_function_call> have specific token IDs (e.g., token 48). The model uses these to understand structure.
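You can check this yourself with the tokenizer. A minimal sketch, assuming the same tokenizer object used above; the IDs in the comments are the ones this tutorial reports, not independently re-verified values:

// Special tokens encode to a single ID instead of being split into subwords.
const ids = await tokenizer.encode("<start_function_call>call:");
console.log(ids);
// Expected (per this tutorial): [..., 48, 6639, 236787]
// 48 = <start_function_call>, 6639 = "call", 236787 = ":"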
ZERO-SHOT FUNCTION CALLING (WHY IT FAILS)
Zero-shot means asking the model to do something without showing it an example. Let's see what happens!
THE PROBLEM
Without examples, FunctionGemma generates "error:" instead of "call:" after <start_function_call>.
TEST ZERO-SHOT APPROACH
// Zero-shot approach - NO examples provided
const messages = [
  {
    role: "developer",
    content: "You are a model that can do function calling..."
  },
  { role: "user", content: "What's the temperature in London?" }
  // ❌ No example shown to the model!
];

// Result: Model generates "error:" instead of "call:"
// Token 1899 ("error") is chosen instead of token 6639 ("call")
TOKEN ANALYSIS
After <start_function_call> (token 48), the model's probability distribution favors token 1899 ("error") over token 6639 ("call") when no example is provided.
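One way to observe this directly is to stop at the decision point and read the raw next-token scores. The sketch below renders the zero-shot messages with the chat template, appends <start_function_call> so the model sits right at the decision point, and compares the logits for the two candidate tokens. The tensor-access details (dims, data) are assumptions about the Transformers.js API and may differ between versions:

// Render the prompt text and put the model right after token 48.
const promptText = await tokenizer.apply_chat_template(messages, {
  tools: [weatherFunction],
  tokenize: false,
  add_generation_prompt: true
});
const probe = await tokenizer(promptText + "<start_function_call>");

// One forward pass; logits has shape [batch, seqLen, vocabSize].
const { logits } = await model(probe);
const [, seqLen, vocabSize] = logits.dims;
const offset = (seqLen - 1) * vocabSize; // scores for the next token

console.log("error (1899):", logits.data[offset + 1899]);
console.log("call  (6639):", logits.data[offset + 6639]);
// With zero-shot messages, "error" tends to score higher;
// add a few-shot example and "call" takes over.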
ONE-SHOT FUNCTION CALLING (PARTIAL SUCCESS)
One-shot means showing the model ONE example. Let's see if this helps!
TEST ONE-SHOT APPROACH
// One-shot approach - ONE example provided
const messages = [
  {
    role: "developer",
    content: "You are a model that can do function calling..."
  },
  { role: "user", content: "What's the temperature in Paris?" },
  {
    role: "assistant",
    // ← ONE example showing the correct format
    content: "<start_function_call>call:get_current_temperature{location:<escape>Paris<escape>}<end_function_call>"
  },
  { role: "user", content: "What's the temperature in Tokyo?" }
];
RESULTS MAY VARY
One-shot can work sometimes, but it's not as reliable as few-shot. The model needs more context to consistently generate correct function calls.
FEW-SHOT FUNCTION CALLING (THE SOLUTION!)
Few-shot means showing the model multiple examples. This is the proven solution!
THE SOLUTION
By providing few-shot examples, we shift the model's token probabilities. Token 6639 ("call") becomes more likely than token 1899 ("error").
TEST FEW-SHOT APPROACH
// ✅ FEW-SHOT APPROACH (PROVEN TO WORK):
// Add an example conversation showing the correct format
const messages = [
  {
    role: "developer",
    content: "You are a model that can do function calling with the following functions"
  },
  { role: "user", content: "What's the temperature in Paris?" },
  {
    role: "assistant",
    // ← Example showing the EXACT format we want
    content: "<start_function_call>call:get_current_temperature{location:<escape>Paris<escape>}<end_function_call>"
  },
  { role: "user", content: query } // Your actual query
];

// Apply chat template with tools
const inputs = await tokenizer.apply_chat_template(messages, {
  tools: [weatherFunction],
  tokenize: true,
  add_generation_prompt: true,
  return_dict: true
});

// Generate response
const output = await model.generate({
  ...inputs,
  max_new_tokens: 512,
  do_sample: false,
  temperature: 0.0
});

// ✅ Result: Correct function call generated
// (here for the query "What's the temperature in New York?"):
// <start_function_call>call:get_current_temperature{location:<escape>New York<escape>}<end_function_call>
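Once you have the generated text, you still need to turn it into something executable. Below is a hypothetical parser for the call format shown in this tutorial; parseFunctionCall is our own helper, not part of Transformers.js or FunctionGemma:

// Parse "<start_function_call>call:name{key:<escape>value<escape>}<end_function_call>"
// into { name, args }; returns null if the text doesn't match the format.
function parseFunctionCall(text) {
  const m = text.match(/<start_function_call>call:(\w+)\{(.*)\}<end_function_call>/);
  if (!m) return null;
  const [, name, body] = m;
  const args = {};
  for (const [, key, value] of body.matchAll(/(\w+):<escape>(.*?)<escape>/g)) {
    args[key] = value;
  }
  return { name, args };
}

// parseFunctionCall("<start_function_call>call:get_current_temperature{location:<escape>New York<escape>}<end_function_call>")
// → { name: "get_current_temperature", args: { location: "New York" } }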
WHY FEW-SHOT WORKS
- Shows the model the exact format we expect
- Shifts token probabilities in favor of "call:" instead of "error:"
- Provides context about the task structure
- Works consistently with the quantized ONNX model
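To compare the three approaches without rewriting the message list each time, it helps to build it from a list of examples. A small hypothetical helper (the name and shape are ours, not an API):

// Build a chat message list with 0..N example turns prepended to the real query.
function buildMessages(systemPrompt, examples, query, shots) {
  const messages = [{ role: "developer", content: systemPrompt }];
  for (const ex of examples.slice(0, shots)) {
    messages.push({ role: "user", content: ex.user });
    messages.push({ role: "assistant", content: ex.assistant });
  }
  messages.push({ role: "user", content: query });
  return messages;
}

// shots = 0 → zero-shot, shots = 1 → one-shot, shots >= 2 → few-shot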
TOKEN-LEVEL DEEP DIVE
Let's examine what happens at the token level when the model generates function calls.
TOKEN-LEVEL ANALYSIS
| TOKEN ID | TOKEN TEXT | CONTEXT | PROBABILITY SHIFT |
|---|---|---|---|
| 48 | <start_function_call> | Always correct | N/A |
| 1899 | "error" | Zero-shot (no example) | ↑ High probability |
| 6639 | "call" | Few-shot (with example) | ↑ High probability |
| 236787 | ":" | Always correct | N/A |
// Token-level analysis of generated output
// First generated tokens:
// Token 48:     "<start_function_call>" ✓
// Token 6639:   "call" ✓ (with few-shot) or Token 1899: "error" ✗ (zero-shot)
// Token 236787: ":" ✓
// Token 828:    "get" ✓
// Token 236779: "_" ✓
// Token 4002:   "current" ✓
// Token 236779: "_" ✓
// Token 27495:  "temperature" ✓
// Token 236782: "{" ✓
// Token 7125:   "location" ✓
// Token 236787: ":" ✓
// Token 52:     "<escape>" ✓
// Token 27822:  "London" ✓
// Token 52:     "<escape>" ✓
// Token 236783: "}" ✓
// Token 49:     "<end_function_call>" ✓

// The critical decision point is after token 48:
// - Without example: Token 1899 ("error") is more likely
// - With example: Token 6639 ("call") is more likely
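You can generate a trace like this by decoding the output IDs one at a time. A sketch, assuming the output, inputs, and tokenizer objects from the few-shot lesson; tolist() and dims are assumptions about the Tensor API in your Transformers.js version:

// Strip the prompt tokens, then decode each generated token individually.
const allIds = output.tolist()[0];                      // full sequence of token IDs
const newIds = allIds.slice(inputs.input_ids.dims[1]);  // generated tokens only
for (const id of newIds) {
  const text = await tokenizer.decode([Number(id)]);    // IDs may come back as BigInts
  console.log(`Token ${id}: ${JSON.stringify(text)}`);
}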
HYPOTHESIS
The model was trained on function calling data that included error handling examples. Without context, it defaults to the error generation pattern. Few-shot examples provide the necessary context to trigger the correct generation path.
INTERACTIVE PLAYGROUND
Now it's your turn! Experiment with different queries and see how the model responds. Add your own examples to test zero-shot, one-shot, and few-shot approaches.
CUSTOM EXAMPLES
Add example conversations to use in one-shot and few-shot modes. Each example should show a user query and the expected assistant response with function call.
EXAMPLE FORMAT
User: "What's the temperature in Paris?"
Assistant: "<start_function_call>call:get_current_temperature{location:<escape>Paris<escape>}<end_function_call>"
For few-shot, add multiple examples. For one-shot, only the first example will be used. For zero-shot, no examples are used.
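In code terms, the custom examples are just an ordered list, and the mode decides how many get used. Reusing the hypothetical buildMessages helper from the few-shot lesson (illustrative only, not the playground's actual internals):

const examples = [
  {
    user: "What's the temperature in Paris?",
    assistant: "<start_function_call>call:get_current_temperature{location:<escape>Paris<escape>}<end_function_call>"
  }
  // ...add more pairs for few-shot
];

const zeroShot = buildMessages(systemPrompt, examples, query, 0);
const oneShot  = buildMessages(systemPrompt, examples, query, 1);
const fewShot  = buildMessages(systemPrompt, examples, query, examples.length);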
TIPS FOR EXPERIMENTATION
- Try different cities and locations
- Compare zero-shot vs few-shot results
- Modify the function schema and see what happens
- Add custom examples to test different scenarios
- Watch the token visualization to understand the generation process
- Experiment with different max_new_tokens values
RESOURCES & LINKS
Explore these resources to deepen your understanding of FunctionGemma, ONNX, and function calling.
KEY RESOURCES SUMMARY
FunctionGemma is a specialized 270M parameter model fine-tuned from Google's Gemma 3 for function calling tasks. It's optimized for edge deployment and requires few-shot examples for reliable function call generation.
ONNX (Open Neural Network Exchange) is an open format for representing machine learning models, enabling interoperability between different frameworks and optimized inference across platforms.
The model is available in quantized formats (q4 for WebGPU, q8 for WASM) to enable efficient browser-based inference.