Ever wanted to combine the power of text and images in a single AI agent? In this tutorial, you’ll build an agent that understands both text and images and helps users discover relevant recipes. The approach combines LanceDB’s multimodal capabilities with Pydantic AI for the agentic workflow.

Key Technologies

  • LanceDB: Multimodal vector database for efficient storage and retrieval
  • Pydantic AI: Modern AI agent framework with type safety
  • Sentence Transformers: Text embeddings for semantic search
  • CLIP: Vision-language model for image understanding
  • Streamlit: Interactive web application framework

Tutorial Overview

Option 1: Notebook

The notebook walks through each step: preparing a small sample recipe dataset, generating text and image embeddings, storing them in LanceDB, and building a Pydantic AI agent with custom tools to query the data. You’ll finish by testing the agent against a few example questions to see the full multimodal flow end to end.
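
To make that flow concrete before opening the notebook, here is a minimal, library-free sketch of the retrieval idea. It is not the notebook's code: the hand-written three-dimensional vectors stand in for real Sentence Transformers and CLIP embeddings, an in-memory list stands in for a LanceDB table, and the names `Recipe`, `search_by_text`, and `search_by_image` are hypothetical. The two search functions play the role of the custom tools an agent could call.

```python
import math
from dataclasses import dataclass


@dataclass
class Recipe:
    title: str
    text_vector: list[float]   # stand-in for a text embedding
    image_vector: list[float]  # stand-in for an image embedding


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy "table": in the tutorial this data would be stored in LanceDB instead.
RECIPES = [
    Recipe("Tomato Soup", [0.9, 0.1, 0.0], [0.8, 0.2, 0.1]),
    Recipe("Chocolate Cake", [0.0, 0.2, 0.9], [0.1, 0.1, 0.9]),
    Recipe("Tomato Pasta", [0.8, 0.3, 0.1], [0.7, 0.4, 0.0]),
]


def _search(query_vector: list[float], field: str, limit: int) -> list[str]:
    """Rank recipes by similarity of the chosen vector field to the query."""
    ranked = sorted(
        RECIPES,
        key=lambda r: cosine_similarity(getattr(r, field), query_vector),
        reverse=True,
    )
    return [r.title for r in ranked[:limit]]


def search_by_text(query_vector: list[float], limit: int = 2) -> list[str]:
    """Hypothetical agent tool: find recipes whose text embedding matches."""
    return _search(query_vector, "text_vector", limit)


def search_by_image(query_vector: list[float], limit: int = 2) -> list[str]:
    """Hypothetical agent tool: find recipes whose image embedding matches."""
    return _search(query_vector, "image_vector", limit)


if __name__ == "__main__":
    print(search_by_text([1.0, 0.0, 0.0]))
    print(search_by_image([0.0, 0.0, 1.0], limit=1))
```

Keeping a separate vector per modality means a text question and an uploaded photo can each be matched against the embedding space they belong to, which is the core of the multimodal flow described above.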

Option 2: Demo Application (Local Setup)

The demo application is the full codebase: you’ll download and process a real recipe dataset with thousands of items, run a Streamlit chat interface that supports image upload, and follow a structure that includes production-minded touches like error handling, logging, and monitoring. Everything you need to deploy is included.

Download Tutorial Files

Download the files for the full demo application here.

Dataset Information

  • Source: Kaggle Recipe Dataset
  • Size: Thousands of recipes with images
  • Format: CSV file with recipe data and image references
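
As a rough illustration of working with a CSV that holds recipe data plus image references, the sketch below reads rows and resolves each image reference to a file path. The column names (`Title`, `Ingredients`, `Image_Name`), the `.jpg` extension, and the `data/images` directory are assumptions made for this example; check the header row of the downloaded file and adjust accordingly. The `load_recipes` helper is hypothetical and is not part of the demo application.

```python
import csv
import io
from pathlib import PurePosixPath


def load_recipes(csv_file, image_dir="data/images"):
    """Read recipe rows and attach an image path built from the image reference.

    Column names and the ".jpg" extension are assumptions for this sketch.
    """
    recipes = []
    for row in csv.DictReader(csv_file):
        title = (row.get("Title") or "").strip()
        image_name = (row.get("Image_Name") or "").strip()
        if not title or not image_name:
            continue  # skip rows that lack a title or an image reference
        recipes.append(
            {
                "title": title,
                "ingredients": (row.get("Ingredients") or "").strip(),
                "image_path": str(PurePosixPath(image_dir) / f"{image_name}.jpg"),
            }
        )
    return recipes


# Small in-memory sample so the sketch runs without the real dataset.
SAMPLE_CSV = """Title,Ingredients,Image_Name
Tomato Soup,"tomatoes, onion, basil",tomato-soup
,"flour, sugar",missing-title
Chocolate Cake,"flour, cocoa, eggs",chocolate-cake
"""

if __name__ == "__main__":
    for recipe in load_recipes(io.StringIO(SAMPLE_CSV)):
        print(recipe["title"], recipe["image_path"])
```

Skipping incomplete rows up front keeps later embedding steps from failing on a missing title or image.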

Setup

# 1. Extract the downloaded files to a folder
# 2. Navigate to the folder in terminal
cd multimodal-recipe-agent

# 3. Install dependencies with uv
uv sync

# 4. Download the Kaggle dataset
# Visit: https://www.kaggle.com/datasets/pes12017000148/food-ingredients-and-recipe-dataset-with-images
# Extract recipes.csv to the data/ folder

# 5. Import the dataset
uv run python import.py

# 6. Run the complete Streamlit chat application
uv run streamlit run app.py
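
If the import or the app fails to start, a quick check that the expected files are in place can save time. The following is a small optional sketch, not part of the demo application: the `missing_files` helper is hypothetical, and it only looks for the three paths named in the steps above (`data/recipes.csv`, `import.py`, and `app.py`).

```python
from pathlib import Path

# Paths taken from the setup steps; anything else the project needs is not checked.
EXPECTED_FILES = ("data/recipes.csv", "import.py", "app.py")


def missing_files(project_dir="."):
    """Return the expected files that do not exist under project_dir."""
    root = Path(project_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files found.")
```

Run it from the project folder before step 5; an empty result means the dataset and both scripts are where the commands expect them.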