
Academic year: 2022

(1)

Enhancing the performance of image classification through features automatically learned from depth-maps

George Ciubotariu, Vlad-Ioan Tomescu, Gabriela Czibula

September 2021

(2)

Contents

Original Contribution

Introduction

Computer Vision and Deep Learning

Data Set

Unsupervised Analysis

Supervised Analysis

Future Enhancements

(3)

Research Questions and Original Contributions

- RQ1: How relevant are depth maps in the context of indoor-outdoor image classification?
  - Unsupervised learning-based analysis on the DIODE dataset for indoor-outdoor classification
  - t-SNE clustering support for further supervised investigations
- RQ2: To what extent does aggregating visual features into more granular sub-images increase the performance of classifiers?
  - Supervised learning-based classification for supporting the unsupervised approach
  - Multilayer Perceptron (MLP) classifier tested to confirm the hypothesis
- RQ3: How correlated are the results of the unsupervised learning-based analysis and the performance of supervised models applied for indoor-outdoor image classification?
  - Comparative analysis of image feature aggregation

(4)

Introduction to the Approached Tasks

- Indoor-Outdoor Classification
  - motivation
- Semantic Segmentation
- Depth Estimation

(5)

Related Work

- A review of indoor-outdoor scene classification, feature extraction methods, classifiers and data sets is given by Tong et al. [TSYW06]
  - multiple remarkable methods
  - mentions good performances between 1998 and 2017
  - features such as color, texture, edge etc.
  - multiple data sets were mentioned
- Cvetkovic et al. [CNI14]
  - color and texture descriptors and an SVM classifier
  - results of 93.71% and 92.36% accuracy on two public data sets
- Tahir et al. [TMR15]
  - computes the GIST descriptor as a feature vector
  - 90.8% accuracy on a public data set
- Raja et al. [RRDR13]
  - uses HSV instead of RGB color encoding
  - extracts color, texture and entropy features
  - features extracted from 100 sub-images
  - lightweight KNN classifier

(6)

Computer Vision (CV) and Deep Learning (DL)

Most recent work implements Convolutional Neural Networks (CNNs) in dense visual tasks such as Semantic Segmentation (SS) or Depth Estimation (DE).

- [LRSK19, RBK21] Dense Prediction Transformers (DPT)
  - model that leverages visual transformers instead of convolutions
  - robust architecture to serve as a backbone in our experiments
  - tested for both SS and DE tasks, achieving great results, therefore offering us the possibility to create a comparative approach

(7)

Vision Transformers for Dense Prediction (DPT)

| Model                                    | Image resolution | #extracted features after encoder | #extracted features after decoder |
| Depth Estimation / Semantic Segmentation | 384×384          | 49152                             | 12582912                          |

Table: DPT architectures details

Figure: DPT architecture

(8)

DIODE (Dense Indoor and Outdoor DEpth)

- Data has been collected with a FARO Focus S350 laser scanner
- It consists of 27858 RGB-D images at 1024×768 resolution
- Photos have been taken both during the day and at night, over several seasons (summer, fall, winter)

Apart from RGB-D images, the DIODE dataset also provides normal maps that could further enhance the learning of depth and vice versa

(9)

DIODE (Dense Indoor and Outdoor DEpth)

Figure: Sample images from the DIODE dataset

(10)

DIODE Structure

Figure: Histogram of depth values frequency (%) for indoor train set

Figure: Histogram of depth values frequency (%) for outdoor train set

(11)

Methodology

- Feature extraction
  - manually engineered features
  - automatically learned features
- Unsupervised learning-based analysis
- Supervised learning-based analysis
  - depth-augmented images

(12)

Automatic Feature Extraction

1. aggregating RGB from sub-images
   - 3·k dimensional vector (k = 1, 4, 16)
   - average RGB values for each sub-image
2. aggregating RGBD from sub-images
   - 4·k dimensional vector (k = 1, 4, 16)
   - average RGBD values for each sub-image
3. features from DPT encoder/decoder
   - trained for SS
   - trained for DE

Figure: Structure of image splits
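The sub-image aggregation in items 1 and 2 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the k sub-images form a regular 2^n × 2^n grid (so k = 4^n for n = 0, 1, 2), and the function name `aggregate_subimage_means` is hypothetical.

```python
import numpy as np


def aggregate_subimage_means(image: np.ndarray, n: int) -> np.ndarray:
    """Average each channel over a 2**n x 2**n grid of sub-images.

    `image` has shape (height, width, channels); channels is 3 for RGB
    and 4 for RGBD. The result has channels * 4**n entries, i.e. 3*k or
    4*k values with k = 1, 4, 16 for n = 0, 1, 2.
    """
    grid = 2 ** n
    height, width, channels = image.shape
    # Assumption: the image is cut into equal tiles, so both sides must
    # be divisible by the grid size (true for 1024 x 768 and n <= 2).
    if height % grid or width % grid:
        raise ValueError("image sides must be divisible by 2**n")
    # Axes after reshape: (tile row, pixel row, tile col, pixel col, channel).
    tiles = image.reshape(grid, height // grid, grid, width // grid, channels)
    # Mean over the pixel axes of every tile, then flatten tile by tile.
    return tiles.mean(axis=(1, 3)).reshape(-1)


# Random stand-in for one 1024 x 768 RGBD image (not DIODE data).
rgbd = np.random.default_rng(0).random((768, 1024, 4))
for n in (0, 1, 2):
    print(n, aggregate_subimage_means(rgbd, n).shape)
```

For an RGBD input the three calls yield vectors of 4, 16 and 64 values, matching the 4·k sizes listed above.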

(13)

Unsupervised Learning for Analysing the Data

- 3D t-SNE unsupervised clustering
  - used for non-linear dimensionality reduction
  - able to uncover more useful patterns in data
  - uses Student t-distribution to better disperse the clusters
- data normalization with the inverse hyperbolic sine (asinh)
  - increased sensitivity to particularly small and large values
- parameters used
  - perplexity of 20
  - learning rate of 3.0
    - for a slower converging but finer learning curve
  - 1000 iterations
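A minimal sketch of this setup, assuming scikit-learn's `TSNE` as the implementation (the slides do not name a library) and assuming the asinh normalization is applied to the feature vectors before the embedding. The helper names `asinh_normalise` and `embed_3d` are hypothetical.

```python
import numpy as np


def asinh_normalise(features: np.ndarray) -> np.ndarray:
    """Apply the inverse hyperbolic sine element-wise.

    asinh(x) = ln(x + sqrt(x**2 + 1)) is roughly linear near zero and
    logarithmic for large |x|, and unlike log it is defined for x <= 0.
    """
    return np.arcsinh(features)


def embed_3d(features: np.ndarray, seed: int = 0) -> np.ndarray:
    """Project feature vectors to 3D with the parameters listed above."""
    # Imported here so the normalisation helper works without scikit-learn.
    from sklearn.manifold import TSNE

    tsne = TSNE(
        n_components=3,
        perplexity=20,
        learning_rate=3.0,
        # The iteration count is left at scikit-learn's default of 1000.
        random_state=seed,
    )
    return tsne.fit_transform(asinh_normalise(features))
```

Calling `embed_3d(X)` on an array `X` of shape (samples, features) returns one 3D point per sample; the number of samples has to exceed the perplexity of 20.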

| Measure | RGBD features (4 splits) | DPT DE learned features | DPT SS learned features | DPT SS depth-augmented features |
| Prec    | 0.769                    | 0.729                   | 0.945                   | 0.957                           |

Table: Prec values for the t-SNE transformations depicted in Figures 6–9.

(14)

Features extracted aggregating RGB and RGBD values

- 4 splits

Figure: t-SNE for RGB with 4 splits

Figure: t-SNE for RGB-D with 4 splits

(15)

Features Extracted from DL models

- DPT trained for Semantic Segmentation

Figure: t-SNE of DPT encoder extracted features for SS

Figure: t-SNE of DPT encoder extracted features for DE

(16)

Supervised Learning Results

| Features | #Splits (n) | Accuracy    | AUC         | Specificity | Sensitivity |
| RGB      | 0           | 0.692±0.077 | 0.525±0.056 | 0.980±0.028 | 0.070±0.121 |
| RGB      | 1           | 0.688±0.064 | 0.517±0.022 | 0.989±0.014 | 0.046±0.049 |
| RGB      | 2           | 0.669±0.049 | 0.545±0.048 | 0.912±0.068 | 0.163±0.136 |
| RGBD     | 0           | 0.880±0.039 | 0.858±0.041 | 0.898±0.058 | 0.817±0.081 |
| RGBD     | 1           | 0.876±0.043 | 0.862±0.044 | 0.894±0.046 | 0.829±0.063 |
| RGBD     | 2           | 0.838±0.044 | 0.826±0.053 | 0.848±0.060 | 0.804±0.099 |
| DPT-DE   | 0           | 0.823±0.131 | 0.831±0.076 | 0.812±0.185 | 0.850±0.069 |
| DPT-SS   | 0           | 0.950±0.027 | 0.942±0.029 | 0.969±0.034 | 0.915±0.053 |
| DPT-SS+D | 0           | 0.961±0.015 | 0.956±0.021 | 0.970±0.019 | 0.941±0.041 |

Table: The results of supervised learning indoor-outdoor classification on DIODE dataset. Confidence intervals of 95% were used in the analysis. Only the features extracted by the DPT encoder are used in the experiments.
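The measures in the table can be computed from a binary confusion matrix, as in the sketch below. Several points are assumptions, since the slides do not specify them: which of the two classes counts as positive, the number of evaluation runs, and the normal-approximation formula (mean ± 1.96·s/√n) for the 95% interval. For a classifier evaluated at a single decision threshold, the area under its two-segment ROC curve equals (sensitivity + specificity)/2, which is close to the AUC column in most rows. The function names and the counts in the example are hypothetical.

```python
import math
from statistics import mean, stdev


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        # AUC of a hard classifier with one threshold (balanced accuracy).
        "auc_single_threshold": (sensitivity + specificity) / 2,
    }


def mean_with_ci95(values: list) -> tuple:
    """Mean and 95% half-width under a normal approximation (assumed)."""
    half_width = 1.96 * stdev(values) / math.sqrt(len(values))
    return mean(values), half_width


# Made-up counts for one evaluation run, not taken from the slides.
run = binary_metrics(tp=90, fp=20, tn=180, fn=10)
print(run)

# Made-up accuracies over several runs, reported as mean and half-width.
print(mean_with_ci95([0.94, 0.96, 0.95, 0.97, 0.93]))
```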

(17)

Comparative Results

Benefits of our method:

- lightweight
  - uses fewer features and parameters compared to other models
  - low memory and computational cost compared to other deep learning methods
- significant increase in performance when adding depth cues
- capable of being optimised using multi-threading
- displays the potential of depth cues for multiple visual tasks

Compared with the results surveyed by Tong et al. [TSYW06], our approach using features extracted with DPT-SS+D (96.1% accuracy) establishes a new state of the art in indoor-outdoor classification. The best performance presented in [TSYW06] is 93.8% accuracy.

(18)

Ongoing Experiments and Future Enhancements

- Identifying features that can be used in both SS and DE
- Identifying other problems that can be solved with adapted DL models
- Architecture Transfer from SS towards DE
- Multitask and Collaborative Learning

(19)

Thank you!

Questions?

(20)

Bibliography I

[CNI14] Stevica Cvetkovic, Sasa Nikolic, and Slobodan Ilic. Effective combining of color and texture descriptors for indoor-outdoor image classification. Facta Universitatis, Series: Electronics and Energetics, 27:399–410, 2014.

[LRSK19] Katrin Lasinger, René Ranftl, Konrad Schindler, and Vladlen Koltun. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. CoRR, abs/1907.01341, 2019.

[RBK21] René Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. Vision transformers for dense prediction. CoRR, abs/2103.13413, 2021.

(21)

Bibliography II

[RRDR13] R. Raja, S. Md. Mansoor Roomi, D. Dharmalakshmi, and S. Rohini. Classification of indoor/outdoor scene. In 2013 IEEE International Conference on Computational Intelligence and Computing Research, pages 1–4, 2013.

[TMR15] Waleed Tahir, Aamir Majeed, and T. Rehman. Indoor/outdoor image classification using gist image features and neural network classifiers. In 12th International Conference on High-capacity Optical Networks and Emerging Technologies, pages 1–5, 2015.

[TSYW06] Zhehang Tong, Dianxi Shi, Bingzheng Yan, and Jing Wei. A review of indoor-outdoor scene classification. In Proceedings of the 2017 2nd International Conference on Control, Automation and Artificial Intelligence (CAAI 2017), pages 469–474. Atlantis Press, 2017.
