It’s the prediction that models will be able to take any kind of input (text, media, images, video) and analyze it with ...