Given an input image, this model tries to generate captions for the image, basically a description of what's focused in that image.
The dataset used for training is the flicker8k dataset, link :- https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_Dataset.zip