Arabic Image Captioning Using Neural Networks
Abstract
Image captioning, which aims to generate natural language descriptions for images,
represents a major challenge at the intersection of computer vision and natural language
processing. While significant advances have been made for English, Arabic image
captioning remains underexplored, largely due to the lack of dedicated datasets and the
complexity of the Arabic language.
In this thesis, we present an Arabic image captioning model based on an encoder-
decoder architecture, combining the VGG16 convolutional neural network for image
feature extraction with a Long Short-Term Memory (LSTM) network for sentence
generation. To address the scarcity of Arabic resources, we constructed a new dataset by
translating and refining the captions of the English Flickr8k dataset.
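The generation step of such an encoder-decoder pipeline can be illustrated with a minimal sketch: image features condition a decoder that emits one word at a time until an end token appears. The stub `next_word_scores` below stands in for the trained LSTM decoder (which the abstract describes but does not detail), and the toy vocabulary is purely illustrative; an actual implementation would use VGG16 features and a trained language model.

```python
# Hypothetical sketch of greedy caption decoding in an encoder-decoder
# captioning model. next_word_scores is a stand-in for the trained LSTM
# decoder; it deterministically walks the vocabulary so the loop terminates.

# toy vocabulary; <start>/<end> mark sequence boundaries
vocab = ["<start>", "a", "man", "rides", "horse", "<end>"]

def next_word_scores(features, sequence):
    # stub for the decoder: in a real model this would run the LSTM on
    # the image features plus the partial caption and return a softmax
    idx = min(len(sequence), len(vocab) - 1)
    return [1.0 if i == idx else 0.0 for i in range(len(vocab))]

def generate_caption(features, max_len=10):
    sequence = ["<start>"]
    for _ in range(max_len):
        scores = next_word_scores(features, sequence)
        word = vocab[scores.index(max(scores))]  # greedy: take the argmax word
        if word == "<end>":
            break
        sequence.append(word)
    return " ".join(sequence[1:])  # drop the <start> token

features = [0.0] * 4096  # VGG16 fc-layer features are 4096-dimensional
print(generate_caption(features))
```

The same loop structure applies regardless of decoder: only `next_word_scores` changes when swapping the stub for a real VGG16 + LSTM model.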
The proposed model was trained and evaluated on this dataset, demonstrating its
ability to generate relevant and coherent Arabic captions. These results highlight the
potential of deep learning approaches to advance Arabic image captioning and to help
bridge the resource gap for Arabic in the field of image understanding.
