Fake Information Classifier Using Random Multi-Model Deep Learning

1Aman Pandey,

SRM Institute of Science and Technology, Kattankulathur, Chennai, India.

Department of Computer Science and Engineering, Email Id: [email protected]

2Devansh Srivastava,

SRM Institute of Science and Technology, Kattankulathur, Chennai, India.

3Dr. S. Thenmalar,

SRM Institute of Science and Technology, Kattankulathur, Chennai, India.

ABSTRACT

Information is a leading factor in determining a nation’s growth and development. Recently, the occurrence of fake and fraudulent data has increased across all platforms. Misleading data has resulted in political polarization, decreased trust in public institutions, and undermined democracy. The amount of information is enormous, and technology giants are trying to process it and eliminate irrelevant information. The complexity of datasets is growing, and new methods are required to process them. Deep learning methods have already surpassed traditional machine learning techniques in terms of accuracy and handling of non-linear data. This paper introduces a new classification strategy focused on ensemble deep learning: through a set of deep learning architectures, Random Multi-Model Deep Learning (RMDL) helps to minimise inaccuracy and provides a robust model that addresses the limitations of traditional deep neural network structures.

Evaluation parameters such as accuracy, recall score, and Micro F1-Score are used to measure its performance.

Keywords - Machine Learning, Deep Learning, Artificial Neural Network, Convolutional Neural Network, Recurrent Neural Network, Long Short-Term Memory

I. INTRODUCTION

Huge datasets with a lot of variety, including images, text, video, and documents, are tough to handle and to derive context from. Techniques like categorization and classification are used to process the data. However, earlier machine learning and deep learning techniques deal with a fixed and specific type of dataset, which is a major drawback. Before discussing deep neural networks, we discuss why they are preferred over machine learning algorithms. Machine learning algorithms learn a mapping from input to output. For parametric models this mapping is expressed through weights, while in classification problems decision boundaries come into play. But machine learning is unable to learn all functions. Also, deep learning provides automated feature engineering, which saves a lot of time and effort. When it comes to nonlinear and complex data, we use deep learning architectures, but the question is which architecture is the best fit for the problem.

Even if the architecture is known, choosing the number of nodes and layers remains a problem. The main architectures usually used are convolutional neural networks (CNN), recurrent neural networks (RNN), and artificial neural networks (ANN), each with different applications and efficiencies.

Figure 1: Voting Algorithm for RMDL Model

We have used a hybrid approach that addresses these problems using a group of deep learning architectures. The method covered in the paper includes all 3 neural networks, allowing it to process any type of complex and varied data.

As input layers, the three deep learning architectures use different feature space representations. The DNN uses term frequency-inverse document frequency (TF-IDF) features to obtain a reduced representation of sequential data. The number of hidden layers and the number of nodes in each hidden layer are determined in RMDL [12] through randomly generated hyperparameters. CNN is most commonly used for image recognition, but it may also be used for other types of data. In the CNN, the feature maps and hidden layers are likewise generated randomly. The model is used to identify better hyperparameters, which are then used to fine-tune it.

RNN techniques are mostly used for text or word classification. For the convolutional models, a 1D (one-dimensional) convolutional layer is used for text and a 2D (two-dimensional) convolutional layer for images [15], [16], [17].

Random Multi-Model Deep Learning covers 2 different approaches, using 2 different Recurrent Neural Network structures: Gated Recurrent Units (GRU) and Long Short-Term Memory (LSTM). Random generation of hyperparameters leads to a varying number of GRU or LSTM units along with hidden layers.

The following are the paper's key contributions:

1. Use of multi-model [13], [15] ensemble techniques resulting in better accuracy and robustness.

2. Use of different optimization techniques giving better outcomes.

3. Multiple feature extraction techniques for different models for better knowledge of feature space.

4. Use of majority voting and multiple dropout layers to reduce overfitting.

5. Finally, the RMDL model helps to process varied forms of data.

II. RELATED WORK

Researchers from different disciplines have done work relevant to this topic, as described below. The proposed model is divided into three categories or sub-processes: 1. Feature extraction, 2. Classification techniques, and 3. Deep learning models for classification.

Feature Extraction: It involves decreasing the number of parameters needed for data processing, that is, choosing the relevant information from a multi-layered pile of data according to the requirements of the model. Feature extraction can also be defined as the method of constructing a combination of variables to solve the problem while still maintaining accuracy. As an example, [1] introduced an efficient procedure for text categorization that counts words to form a structure for statistical learning. [2] proposed a method that modifies the weight of words, and in [7] a frequency count called term frequency-inverse document frequency (TF-IDF) is used to change the weight of terms. The TF-IDF [11] weight of a word is the number of times the word occurs in a document multiplied by the inverse of how common the word is across documents. These earlier methods do not measure the relationship between words in sequences. A later approach employs the principle of embedding [3], [9], [10], mapping each word to a vector based on its context. GloVe [5] is a learned vector-space representation of words that was used to create the model.
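To make the TF-IDF weighting concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes whitespace tokenization, a raw term count as the term frequency, and log(N / df) as the inverse document frequency; the function name `tf_idf` and the toy sentences are illustrative only.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed weighting: tf = raw count of the term in the document,
    idf = log(N / df), where df is the number of documents containing the term.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Toy corpus (made up for illustration).
docs = [
    "fake news spreads fast",
    "real news spreads slowly",
    "fake claims mislead readers",
]
for doc_weights in tf_idf(docs):
    print(doc_weights)
```

A word that occurs in every document receives weight 0 under this variant, while a word unique to one document receives the largest weight, which matches the intent described above.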

Classification Techniques: Classification is a predictive modelling task in which a class label is predicted for the provided input data. A classification model aims to draw inferences from observed values, and classification algorithms are used to recognise objects and assign them to categories. The Naïve Bayes classifier [4], [6] is one such classifier and a typical example of supervised learning for classification. This approach provides efficient text classification and is applied in information retrieval [8], [11]. It takes an input vector of numeric or categorical data values and returns a probability for each potential output label. Because it uses the bag-of-words [10] method for feature extraction, the order of the words in the text is not reflected in the output probability; nevertheless, Naïve Bayes classification [6] is quick and effective for text.
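The bag-of-words Naïve Bayes classifier described above can be sketched in a few lines. This is an illustrative multinomial variant with add-one (Laplace) smoothing, which is an assumption and not a detail taken from the cited works; the function names and the toy training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """samples: list of (text, label). Returns (class counts, per-class word counts, vocabulary)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest log-posterior for the given text."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are ignored
                # Add-one smoothing avoids zero probabilities.
                score += math.log((word_counts[label][tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training data (made up for illustration).
train = [
    ("shocking miracle cure revealed", "fake"),
    ("celebrity secret shocking truth", "fake"),
    ("government publishes budget report", "real"),
    ("scientists publish peer reviewed study", "real"),
]
model = train_nb(train)
print(predict_nb(model, "shocking secret cure"))
print(predict_nb(model, "budget report study"))
```

Because only word counts enter the score, shuffling the words of the input does not change the prediction, which is the loss of sequence information noted above.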

Another popular technique is the Support Vector Machine (SVM) [11], which achieves good accuracy over various types of data. To obtain greater accuracy, latent variables are introduced into the conditional model [8], giving a structural SVM formulation for text-based classification. The drawback of SVM is that it does not handle large amounts of data well, which can be overcome by using Particle Swarm Optimization (PSO) [11].

Deep Learning: It is an artificial intelligence technique that mimics how the human brain processes information and generates patterns for use in higher cognitive processes. It is a form of machine learning that uses neural networks and can learn without supervision from unstructured or unlabelled data. The neural network in deep learning is modelled on the neurons within the human brain. A neural network learns iteratively using backpropagation and an optimizer. DNNs are based on a simple neural network architecture with multiple hidden layers. The three basic deep learning models are the Deep Neural Network (DNN) [14], the Convolutional Neural Network (CNN), and the Recurrent Neural Network (RNN). The approach suggested in this paper employs all of these basic models.

III. DATASET

As our model is intended for all sorts of data, we need to pre-train it on different types of data to obtain a better and more accurate model. The dataset is categorized into 2 parts: text data and image data.

Text Dataset:

• Web of Science (WOS), a dataset with over 7 parent classes for categorization

• Reuters, a large collection with 90 classes, which makes feature engineering an important part

• IMDB (50,000 reviews divided into 2 sets)

• 20 Newsgroups, a large collection of documents with a maximum length of 1,000 words

Image Dataset:

• MNIST (Modified National Institute of Standards and Technology dataset): a set of more than 50,000 small square grayscale images of handwritten single digits ranging from 0 to 9.

• CIFAR

IV. FEATURE EXTRACTION AND PREPROCESSING

The process is divided into 2 parts based on the type of data: images and text. The feature space is quite different for these categories: it is unstructured for text, whereas it is structured for image datasets.

• Text and Sequences: Many text feature extraction techniques are used, including word embeddings and TF-IDF. Word embeddings include the GloVe and Word2Vec techniques. In addition to word vectorization methods, the N-gram model is often used to derive features for deep learning.

• Image and 3D objects: Images or 3-dimensional objects have dimensions of height (h), width (w) and colour (c). For grayscale images the feature space is h × w; otherwise it has a third dimension of size 3 (RGB), giving h × w × c.
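The two kinds of feature space listed above can be illustrated with a short sketch. The helper names `word_ngrams` and `feature_shape` are hypothetical, the tokenizer is a plain whitespace split, and the image sizes in the usage lines are illustrative values, not figures taken from this paper.

```python
def word_ngrams(text, n):
    """Return all contiguous word n-grams of the text as tuples."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def feature_shape(height, width, channels=1):
    """Grayscale images give (h, w); colour images give (h, w, c)."""
    return (height, width) if channels == 1 else (height, width, channels)

print(word_ngrams("fake news spreads fast", 2))
print(feature_shape(28, 28))     # a grayscale image of illustrative size
print(feature_shape(32, 32, 3))  # an RGB image of illustrative size
```

For the bigram case, each tuple pairs a word with its successor, so some local word order is retained that a plain bag-of-words count would discard.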

V. ARCHITECTURE

Figure 2: Architecture Diagram

Proposed Solution

The RMDL [12] technique is used for categorization and classification. It is a novel method and can be used with a wide range of data, including text, videos, and images. The diagram below gives an overview of the techniques used: multiple Deep Neural Networks (DNN) [18, 19, 20], Deep Convolutional Neural Networks (CNN), and Deep Recurrent Neural Networks (RNN).

Figure 3: Multimodal Deep Learning Architecture

The numbers of layers and nodes are generated randomly for each of the models used.

(For example, 9 Random Models in RMDL [12] constructed of 3 CNNs, 3 RNNs, and 3 DNNs, all of which are unique due to random creation)

The model structure includes 3 techniques of deep learning in parallel with the final model combination of d DNNs, r RNNs, and c CNNs.
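The random construction of d DNNs, r RNNs, and c CNNs can be sketched as follows. This only generates architecture descriptions and does not build or train networks. The search ranges, the dictionary layout, and the random choice of optimizer per model are assumptions made for illustration; the paper does not state the ranges it used.

```python
import random

# Hypothetical search ranges; not taken from the paper.
LAYER_RANGE = (1, 5)
NODE_RANGE = (64, 512)

def random_model_spec(kind, rng):
    """Draw a random depth and a random width for each hidden layer."""
    n_layers = rng.randint(*LAYER_RANGE)
    spec = {
        "kind": kind,
        "hidden_units": [rng.randint(*NODE_RANGE) for _ in range(n_layers)],
        "optimizer": rng.choice(["rmsprop", "adam"]),  # assumed to be drawn per model
    }
    if kind == "RNN":
        spec["cell"] = rng.choice(["GRU", "LSTM"])
    return spec

def build_ensemble_specs(d, r, c, seed=0):
    """Return d DNN, r RNN and c CNN specifications, in that order."""
    rng = random.Random(seed)
    kinds = ["DNN"] * d + ["RNN"] * r + ["CNN"] * c
    return [random_model_spec(kind, rng) for kind in kinds]

for spec in build_ensemble_specs(d=3, r=3, c=3):
    print(spec)
```

Fixing the seed makes a particular ensemble reproducible, while changing it yields a different set of 9 architectures.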

Deep Neural Network Model: It is composed of connected layers, each taking input from the previous layer and providing output to the next layer. We use multi-class DNNs [14] in which the learning models are produced randomly, with random combinations of hidden layers and nodes.

Recurrent Neural Network Model: It is a very efficient technique for sequential data because it assigns more weight to previous data points. RNNs can also be used to categorize images. The problems with a basic RNN are:

a.) Vanishing gradients

b.) Exploding gradients

To overcome these problems, we use LSTM, which is a special kind of RNN. The LSTM is made up of a system of several gates that control the amount of information that can enter each node state.

A related gating mechanism is the Gated Recurrent Unit (GRU), which is a simplified version of the LSTM architecture.
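For reference, a commonly used formulation of the GRU update is given below. The symbols are assumptions introduced here for illustration: x_t is the input at step t, h_{t-1} the previous hidden state, z_t the update gate, r_t the reset gate, σ the logistic sigmoid, and ⊙ element-wise multiplication. Some texts swap the roles of z_t and 1 − z_t in the last line.

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh\big(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\big) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

The GRU has two gates and no separate cell state, which is why it is regarded as simpler than the LSTM.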

Convolutional Neural Network: We use our third methodology, CNN, primarily for hierarchical text or image classification. For image processing, an image tensor is convolved with a set of kernels of size d × d; the outputs of these convolution layers are referred to as feature maps. Stacking the layers provides multiple filters on the input, and pooling layers can also be used.

Maximum pooling is one of the most commonly used pooling techniques.
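The convolution and max pooling steps can be illustrated on a small single-channel image. This is a plain-Python sketch under simplifying assumptions (one channel, stride 1, no padding, non-overlapping pooling windows). As in most deep learning libraries, the kernel is applied without flipping, so strictly speaking it computes a cross-correlation. The function names and the toy values are illustrative.

```python
def conv2d_valid(image, kernel):
    """Slide a d x d kernel over a 2D image (stride 1, no padding)."""
    d = len(kernel)
    out_h = len(image) - d + 1
    out_w = len(image[0]) - d + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(d) for b in range(d))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [[max(feature_map[i * size + a][j * size + b]
                 for a in range(size) for b in range(size))
             for j in range(cols)]
            for i in range(rows)]

image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3],
]
kernel = [[1, 0],
          [0, -1]]  # d = 2
print(conv2d_valid(image, kernel))  # 3 x 3 feature map
print(max_pool2d(image, size=2))    # 2 x 2 pooled output
```

A 4 × 4 input with a 2 × 2 kernel yields a 3 × 3 feature map, and 2 × 2 pooling halves each spatial dimension.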

Optimization: Two forms of stochastic gradient optimizers will be used in the model:

• RMSProp

• Adam optimizer

The key advantage of using multiple models with different optimizers is that if one optimizer fails to provide a good fit for a particular case, the RMDL model with n random models will ignore k inefficient models if and only if n > k.
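The way an ensemble can outvote a poorly fitted model is easiest to see with majority voting over the per-model predictions. The sketch below assumes each model outputs one label per sample; the function name, the tie-breaking rule (the label seen first among the tied ones wins), and the toy predictions are illustrative assumptions.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one list of labels per model, all of equal length.

    Returns, for each sample, the label predicted by the most models.
    Ties are resolved in favour of the tied label that appears first in model order.
    """
    voted = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        voted.append(max(counts, key=counts.get))
    return voted

# Toy outputs of n = 3 models on 3 samples (made up for illustration).
model_outputs = [
    ["fake", "real", "fake"],  # model 1
    ["fake", "fake", "real"],  # model 2, wrong on the last two samples
    ["fake", "real", "fake"],  # model 3
]
print(majority_vote(model_outputs))
```

Here the second model disagrees on two samples, but the other two models agree, so its errors do not reach the final prediction.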

VI. EVALUATION METRICS

The parameters used for evaluating the models are mentioned below:

Accuracy Score: It measures how close the predictions are to the true values. The formula for the accuracy score is given below:

$$\mathrm{accuracy}(y, y') = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}(y'_i = y_i)$$

In this equation, y is the true label, y’ is the predicted label, n is the number of samples, and 1(·) is the indicator function. For multilabel classification, the accuracy score gives the subset accuracy: if the entire set of predicted labels for a sample exactly matches the true set of labels, the score for that sample is 1.0; otherwise it is 0.0.

F1-Score: It is a measure of test set accuracy, calculated as the harmonic mean of precision (P) and recall (R). The formula is given below:

$$F_1 = \frac{2 \cdot P \cdot R}{P + R}$$

The maximum value of the F1 score is 1.
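The metrics above can be checked on a small example. The sketch below assumes binary labels with 1 as the positive class; the function names and the toy label vectors are illustrative and are not results from this paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and their harmonic mean (F1) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 = 2PR / (P + R), defined as 0 when both P and R are 0.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy labels (made up for illustration).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(accuracy(y_true, y_pred))
print(precision_recall_f1(y_true, y_pred))
```

In this toy case there are 3 true positives, 1 false positive and 1 false negative, so precision, recall and F1 all equal 0.75.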

Receiver Operating Characteristic Curve: It is one way to evaluate a classification model, based on the False Positive Rate (FPR) and True Positive Rate (TPR). TPR and FPR are computed by shifting the decision threshold of the classifier. TPR lies on the Y-axis and FPR on the X-axis; the ideal point for a classifier is at the top left, where FPR is 0 and TPR is 1. Performance is measured through graphical analysis, i.e., a larger area under the curve indicates better performance.
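The threshold sweep behind the ROC curve can be sketched as follows. The code assumes binary labels (1 is positive), real-valued scores where a higher score means "more likely positive", and at least one positive and one negative sample. The area is computed with the trapezoidal rule; the function names and toy scores are illustrative.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold over each distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy scores (made up for illustration).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
pts = roc_points(y_true, scores)
print(pts)
print(auc(pts))
```

A classifier that ranks every positive above every negative reaches the top-left corner and has an area of 1, while random scoring gives an area near 0.5.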

VII. RESULT AND ANALYSIS

Image Classification:

Table 1: MNIST and CIFAR-10 Dataset

Text Classification:

Table 2: Web of Science and Reuters-21578 Dataset

Table 3: 20NewsGroup and IMDB Dataset

VIII. CONCLUSION

Deep learning is well suited to huge amounts of data, but it comes with many selection problems, such as which architecture to use and how to select hyperparameters. This paper presents a novel solution to this problem that provides a better and optimized deep learning architecture. We have used the random multi-model deep learning technique to overcome the selection problem, which results in better accuracy and lower loss. The datasets used to train and test the model include Web of Science (WOS), MNIST, Reuters, CIFAR, 20NewsGroups and IMDB. The model uses a parallel architecture to incorporate CNN, DNN and RNN, which in turn gives better results than previously used methods like SVM, naïve Bayes, or a single deep learning model. The proposed method has the potential to achieve higher accuracy and scores as more random layers are used on different types of data.

Figure 4: Epoch vs Loss for CIFAR

Figure 5: Epoch vs Loss for MNIST

Figure 6: Epoch vs Loss for Reuters-21578

Figure 7: Epoch vs Loss for WOS-5736

REFERENCES

[1] Lester E Krueger and Ronald G Shapiro. 1979. Letter detection with rapid serial visual presentation: Evidence against word superiority at feature extraction. Journal of Experimental Psychology: Human Perception and Performance 5, 4 (1979), 657.

[2] Hans Peter Luhn. 1957. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of research and development 1, 4 (1957), 309–317.

[3] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv:1301.3781 (2013).

[4] Kevin P Murphy. 2006. Naive bayes classifiers. University of British Columbia (2006).

[5] Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 1532–1543.

[6] Irina Rish. 2001. An empirical study of the naive Bayes classifier. In IJCAI 2001 workshop on empirical methods in artificial intelligence, Vol. 3. IBM, 41–46.

[7] Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information processing & management 24, 5 (1988), 513–523.

[8] Chun-Nam John Yu and Thorsten Joachims. 2009. Learning structural svms with latent variables. In ICML. ACM, 1169–1176.

[9] Ayyoob Imani, Ali Montazer, Azadeh Shakery and Amir Vakili, “Deep Neural Networks for Query Expansion Using Word Embeddings”, Advances in Information Retrieval, 2019.

[10] Dong Qiu, Haihuan Jiang and Shuqiao Chen, “Fuzzy Information Retrieval Based on Continuous Bag-of-Words Model”, Symmetry, 2020.

[11] Kamran Kowsari, Mojtaba Heidarysafa, Donald E Brown, Kiana Jafari Meimandi and Laura E. Barnes, “RMDL: Random Multimodel Deep Learning for Classification”, ICISDM '18: Proceedings of the 2nd International Conference on Information System and Data Mining, 2018.

[12] Yawen Xiao, Jun Wu, Zongli Lin and Xiaodong Zhao, “A deep learning-based multimodel ensemble method for cancer prediction”, Computer Methods and Programs in Biomedicine, 2018.

[13] M. A. Khan, M. Y. Javed, M. Sharif, T. Saba and A. Rehman, "Multi-Model Deep Neural Network based Features Extraction and Optimal Selection Approach for Skin Lesion Classification," 2019 International Conference on Computer and Information Sciences (ICCIS), Sakaka, Saudi Arabia, 2019.

[14] Khamparia, A., Singh, A., Anand, D. et al. A novel deep learning-based multi-model ensemble method for the prediction of neuromuscular disorders. Neural Comput & Applic 32, 11083–11095 (2020). https://doi.org/10.1007/s00521-018-3896-0.

[15] Kundid Vasić M, Papić V. Multimodel Deep Learning for Person Detection in Aerial Images. Electronics. 2020; 9(9):1459. https://doi.org/10.3390/electronics9091459.

[16] Jie Xu, Wei Wang, Hanyuan Wang, Jinhong Guo, Multi-model ensemble with rich spatial information for object detection, Pattern Recognition, Volume 99, 2020, 107098, ISSN 0031-3203, https://doi.org/10.1016/j.patcog.2019.107098.

[17] Melih S. Aslan, Zeyad Hailat, Tarik K. Alafif, Xue-Wen Chen, Multi-channel multimodel feature learning for face recognition, Pattern Recognition Letters, Volume 85, 2017, Pages 79-83, ISSN 0167-8655, https://doi.org/10.1016/j.patrec.2016.11.021.

[18] Zhou, T., Han, G., Xu, X. et al. A Learning-Based Multimodel Integrated Framework for Dynamic Traffic Flow Forecasting. Neural Process Lett 49, 407–430 (2019). https://doi.org/10.1007/s11063-018-9804-x.

[19] Noha Radwan, Wolfram Burgard and Abhinav Valada, “Perspective on Deep Multimodel Robot Learning”, Journal: The International Journal of Robotics Research, 2020, Volume 39, Number 13, Page 1567, DOI: 10.1177/0278364920961809