Speaker
Description
This research presents a novel approach to obstacle detection during navigation using a
combination of Convolutional Neural Networks (CNNs) and Long Short-Term Memory
(LSTM) networks. The primary objective is to generate accurate image captions that describe
the content of images, which is crucial for applications such as autonomous driving and
assistive technologies for the visually impaired. We systematically analyze the architecture of
our model, which consists of three main components: a CNN for feature extraction, an LSTM
for sequence generation, and a mechanism for sentence formulation. By employing transfer
learning with the Inception v3 architecture, we enhance the model's performance while
reducing computational costs. Our experiments utilize the Flickr8k dataset, which comprises
8,000 images, each accompanied by five descriptive sentences. We introduce a simplified
version of Gated Recurrent Units (GRUs) as an alternative to LSTMs, demonstrating
comparable performance with fewer parameters, thus improving training efficiency. The
model's effectiveness is evaluated using the Bilingual Evaluation Understudy (BLEU) score,
which quantifies the quality of generated captions against reference sentences. Results indicate
that our architecture achieves a BLEU score of approximately 80% on the training set and approximately 75%
on the test set, showcasing its capability to produce semantically and grammatically correct
captions. Additionally, we explore the integration of attention mechanisms to enhance the
model's focus on relevant image features during caption generation. The findings suggest that
our approach not only meets the challenges of automatic image captioning but also holds
potential for broader applications in image understanding and navigation systems. Future work
will involve expanding the dataset and refining the model to further improve accuracy and
robustness in diverse scenarios.
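
As an illustration of the BLEU metric mentioned in the description, the following is a minimal sketch of sentence-level BLEU with several reference captions, as in Flickr8k's five captions per image. The abstract does not state which BLEU variant was used (n-gram order, smoothing, sentence-level or corpus-level), so the function names, the `max_n` parameter, and the example captions here are assumptions for illustration and not the authors' evaluation code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram count is capped
    by the highest count of that n-gram in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate, references):
    """Penalize candidates shorter than the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Reference length closest to the candidate length (ties go to the shorter).
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    return 1.0 if c > r else math.exp(1 - r / c)


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed BLEU: brevity penalty times the geometric mean of the
    clipped 1..max_n-gram precisions. Returns a value in [0, 1]."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, one zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)


# Hypothetical captions, for illustration only (not taken from Flickr8k).
references = [
    "a dog runs on the grass".split(),
    "a brown dog is running outside".split(),
]
candidate = "a dog runs across the grass".split()
print(sentence_bleu(candidate, references, max_n=2))
```

Multiplying the returned value by 100 gives the percentage form used in the description. Because unsmoothed BLEU-4 drops to zero whenever a short caption has no matching 4-gram, library implementations usually offer smoothing options, and the sketch above omits them for brevity.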
Session author's bio
I am Yash Mishra. I carried out this research at NITK Surathkal under Professor Kedarnath Senapati.
| In Person Attendance | In-person |
|---|---|
| Please confirm that there are included headshots of all speakers in their profiles | Yes |
| Level of Difficulty | Intermediate |
| Agree to Privacy Policy and Notice | I agree |