An LSTM and Conventional, Global, and Object-Based Semantic Feature Fusion Framework for Indoor-Outdoor Scene Classification
Abstract
This article proposes a novel approach that extracts diverse features from a scene image using a variety of local, global, and object-based descriptors. The regional and image-based features are obtained with conventional statistical descriptors, while a deep network, VGG19, extracts the scene's global features. Object-specific features, obtained after segmenting the scene objects with another deep network, YOLOv5m, are concatenated with the regional and global features. While the conventional features represent intensity-based characteristics, the global and object features carry the color depth details of the scene image. A long short-term memory (LSTM) network followed by a fully connected (FC) dense layer is trained on images from four benchmark cross-datasets. Experimental evaluation over 5081 indoor-outdoor scene images shows that the proposed scene classification approach achieves 96% accuracy in identifying the two categories on a 15% test split.
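To make the fusion-and-classification stage concrete, the following is a minimal PyTorch sketch of the pipeline described above: conventional, global (VGG19), and object-based (YOLOv5m) feature vectors are concatenated and passed through an LSTM followed by a fully connected layer that outputs indoor/outdoor logits. The class name, feature dimensions, and hidden size are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class FusionLSTMClassifier(nn.Module):
    """Hypothetical fusion classifier: concatenated features -> LSTM -> FC."""
    def __init__(self, conv_dim=64, global_dim=4096, object_dim=128,
                 hidden_dim=256, num_classes=2):
        super().__init__()
        fused_dim = conv_dim + global_dim + object_dim
        # The fused vector is fed to the LSTM as a length-1 sequence here;
        # the actual framework may split it into several time steps.
        self.lstm = nn.LSTM(input_size=fused_dim, hidden_size=hidden_dim,
                            batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)  # indoor vs. outdoor

    def forward(self, conv_feats, global_feats, object_feats):
        # Concatenate the three feature groups along the feature dimension.
        fused = torch.cat([conv_feats, global_feats, object_feats], dim=1)
        seq = fused.unsqueeze(1)           # (batch, 1, fused_dim)
        _, (h_n, _) = self.lstm(seq)       # final hidden state of the LSTM
        return self.fc(h_n[-1])            # class logits

# Usage with random stand-in features for a batch of 4 images.
model = FusionLSTMClassifier()
logits = model(torch.randn(4, 64), torch.randn(4, 4096), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 2])
```

In practice, the stand-in tensors would be replaced by statistical descriptors computed from the image, VGG19 activations, and per-object features aggregated from YOLOv5m detections.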