ReCap Pro: Caption Correction using Meta Learning

Sakshi Birthi, Sanjana Mahesh, Sanskriti Mathuria, Sarang J Chilkund, Bhaskarjyoti Das

Abstract

This article presents ReCap Pro, a framework that corrects auto-generated captions by fixing likely errors in their nouns and verbs. Caption correction has been attempted before, but the authors observe that it has not previously been approached with meta-learning. ReCap Pro uses few-shot learning, which lets it learn faster from fewer image samples and so addresses a critical limitation of traditional, data-intensive caption generation models. An object detection model trained with Reptile meta-learning detects the correct nouns, and a human-object interaction (HOI) detection model trained with Prototypical Networks detects the verbs in the image. The proposed method can be applied as an additional performance-enhancing layer on top of existing caption generators.
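The two meta-learning techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy two-dimensional embeddings, and the verb labels are all hypothetical, and a real system would obtain embeddings and weights from trained neural networks. The sketch shows only the standard core of each method: Prototypical Networks assign a query to the class whose mean support embedding is nearest, and Reptile moves the meta-weights a fraction of the way toward the weights obtained after adapting to a sampled task.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def class_prototypes(support):
    """Return one prototype per class: the mean of its support embeddings.

    `support` maps a label to a list of equal-length embedding vectors.
    (Hypothetical helper; embeddings would come from a trained encoder.)
    """
    prototypes = {}
    for label, vectors in support.items():
        n = len(vectors)
        # zip(*vectors) iterates over embedding dimensions (columns).
        prototypes[label] = [sum(column) / n for column in zip(*vectors)]
    return prototypes


def classify(query, prototypes):
    """Assign the query embedding to the class with the nearest prototype."""
    return min(prototypes, key=lambda label: dist(query, prototypes[label]))


def reptile_update(meta_weights, task_weights, meta_lr):
    """One Reptile meta-step: theta <- theta + meta_lr * (phi - theta).

    `task_weights` (phi) are assumed to be the weights reached after a few
    ordinary gradient steps on one sampled task, starting from `meta_weights`.
    """
    return [w + meta_lr * (tw - w) for w, tw in zip(meta_weights, task_weights)]


# Toy few-shot episode: two verb classes with two support examples each.
support_set = {
    "ride": [[1.0, 0.0], [0.8, 0.2]],
    "hold": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = class_prototypes(support_set)
predicted_verb = classify([0.7, 0.3], prototypes)

# Toy Reptile step: move halfway from the meta-weights to the adapted weights.
new_meta_weights = reptile_update([0.0, 1.0], [1.0, 3.0], meta_lr=0.5)
```

In this toy episode the query lies closer to the "ride" prototype, so that label is chosen. The few-shot property comes from the fact that a new class needs only a handful of support embeddings to form its prototype, rather than a full retraining run.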
