When to use multimodal inputs
Use multimodal inputs when you need the language model to process and respond to visual content, such as:- Analyzing images
- Extracting information from PDFs
- Generating descriptions or summaries of visual content
- Answering questions about images or documents
Including images in messages
To include an image in your message, use the'image_url'
type within the content list of a message. You can provide either a direct URL to the image or a base64-encoded image string with the appropriate MIME type (e.g., 'data:image/jpeg;base64,...'
).
Using an image URL
Using a base64-encoded image
Including PDFs in messages
Similarly, you can include PDFs using the'image_url'
type with either a PDF URL or a base64-encoded PDF string with the MIME type 'application/pdf'
.