Multimodal

An AI model that can work with more than one type of input or output, such as text, images, audio, and video.

Why it matters

Multimodal models are not limited to text: they can describe a photo, read a chart, or answer questions about a video.

For example, uploading a screenshot and asking an assistant to explain it relies on a multimodal model.