Arabic Image Captioning Using Neural Networks


Abstract

Image captioning, which aims to generate natural-language descriptions of images, is a major challenge at the intersection of computer vision and natural language processing. While significant advances have been made for English, Arabic image captioning remains underexplored, largely due to the lack of dedicated datasets and the complexity of the Arabic language. In this thesis, we present an Arabic image captioning model based on an encoder-decoder architecture, combining the VGG16 convolutional neural network for image feature extraction with a Long Short-Term Memory (LSTM) network for sentence generation. To address the lack of Arabic resources, we constructed a new dataset by translating and refining the English captions of the Flickr8k dataset. The proposed model was trained and evaluated on this dataset, demonstrating its ability to generate relevant and coherent Arabic captions. These results highlight the potential of deep learning approaches to advance Arabic image captioning and to help bridge the resource gap for Arabic in the field of image understanding.
