A Deep Learning Based Algorithm Design for Fake News Detection Framework

M. MadhuBala^a, Ajay Kumar Yadav^b, G. Sucharitha^c, P. Praveen Kumar^d

^a,b,c Department of CSE, Institute of Aeronautical Engineering College, Hyderabad, India.

^d Department of ECE, KLEF, Hyderabad, India.


Abstract: With the proliferation of social media, news spreads faster than ever. Social feedback has also become essential for organizations and governments to gather intelligence and form strategies. However, fake news causes real problems because it has the wherewithal to influence people. Unless fake items are correctly identified and removed from time to time, the news on social media becomes polluted with them, and addressing this is a challenging problem. Many existing fake news detection algorithms perform well. However, there is a need to use advanced Artificial Intelligence, in the form of deep learning, to leverage the state of the art. Towards this end, this paper designs an algorithm that realizes a fake news detection framework. The algorithm, known as Deep CNN based Fake News Detection (DCNN-FND), exploits a deep Convolutional Neural Network (CNN) and a novel pre-processing mechanism based on Natural Language Processing (NLP). The proposed framework is implemented using the Python data science platform, evaluated for its performance, and compared with existing techniques. The results reveal that DCNN-FND performs better than the state of the art.

Keywords – Fake news detection, deep learning, artificial intelligence, convolutional neural networks

1. Introduction

Due to the proliferation of social media, the volume of news items is increasing rapidly, and people are influenced by the news and the information it carries. Platforms that let people share data instantly are valuable. However, fake news items are a problem: unless they are detected and removed, people begin to distrust the news they see. To overcome this problem, many techniques based on machine learning have been developed. More recently, advanced artificial intelligence (AI) has been realized with deep learning techniques, which are used to improve performance in data analytics [22][23][31].

Figure 1: Shows different methods used for fake news detection

Figure 2: General procedure used for fake news detection

As presented in Figure 1, there are many ways to perform fake news detection. In this paper we use a model-oriented approach. The general fake news detection process is illustrated in Figure 2. The literature describes different techniques for predicting fake news. Detection of malicious content in emails is carried out in [1], while deception detection for three kinds of fake news is investigated in [2]. Three different techniques are used to detect malicious advertisements in [3]. Credibility analysis of social media data is made in [4], while deep learning models are used in [7], [13] and [16]. From the literature, it is understood that deep learning models need further optimization and suitable configurations in order to improve accuracy in fake news detection. Towards this end, in this paper, we propose a deep learning based algorithm. Our contributions in this paper are as follows.

1. A fake news detection model known as Deep CNN based Fake News Detection (DCNN-FND) is proposed. It is an advanced CNN model that improves detection accuracy.

2. We performed an empirical study with different layer configurations of the CNN model in order to fix the final (advanced) CNN model.

3. A prototype is built to evaluate the performance of DCNN-FND and compare it with a CNN baseline and the machine learning classifier Naïve Bayes.

The remainder of the paper is structured as follows. Section 2 reviews the literature on fake news detection models. Section 3 presents the proposed fake news detection framework and the DCNN-FND algorithm. Section 4 presents experimental results, and Section 5 concludes the paper.

2. Related Work

This section reviews the literature on fake news detection models. Both machine learning and deep learning methods have been found useful for detecting fake news collected from sources such as social media. Qbeitah et al. [1] explored the detection of malicious content delivered through email, using NLP approaches and machine learning. Rubin et al. [2] studied deception detection and investigated three kinds of fakes in news items on the Internet. Fake information is also spread through malicious advertisements. Masri et al. [3] proposed an automated methodology to detect malicious advertisements. Westerman et al. [4] proposed a technique to assess the credibility of information on social media; it estimates whether given information is credible and with what probability. Chen et al. [5], on the other hand, used news items to investigate the need for an automatic crap detector. Pogue [6] illustrated mechanisms for stamping out fake news so as to filter it. Konagala et al. [7] used deep learning methods with a supervised learning approach for fake news detection.

Pravin Kshirsagar et al. [22][23][24][25] used machine learning techniques to detect attacks that propagate malicious content. Balmas [9] defined methods to distinguish fake news from real news and detected different political attitudes in the news spread by certain quarters. Brewer et al. [10] investigated the impact of real news that talks about fake news.

Aldwairi et al. [11] focused on malicious URLs that are spread over the Internet. Abu-Nimeh et al. [12] investigated spam posts and malicious news on social media and proposed methods to detect such items. Monti et al. [13] proposed a method based on geometric deep learning to detect fake news. Messabi et al. [14] investigated DNS records along with domain name features in order to detect malware propagation. Qawasmeh et al. [16] proposed a deep learning based framework to detect fake news automatically. From the literature, it is understood that deep learning models need further optimization and suitable configurations in order to improve accuracy in fake news detection. We also anticipate that the developed algorithms will find potential applications in biomedical research [26][27][28][29][30], for example to identify tumours in a tissue, and in various other applications [19-21].

3. Proposed Fake News Detection Framework

The proposed fake news detection framework is described here. The framework illustrates the flow of operations in the process of detecting fake news collected from social media. The collected dataset is segmented into 80% training, 10% testing and 10% validation data. The training data, which has class labels, is used to train the machine learning model Naïve Bayes, the deep learning CNN baseline model, and the proposed CNN advanced model. Training produces a model that encodes the knowledge needed for fake news detection. That knowledge is then used to decide whether news items in the testing data are fake or not. An overview of the framework is shown in Figure 3.
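The 80/10/10 segmentation described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `split_dataset`, the fixed seed, and the toy records are assumptions made for the example and are not taken from the paper's implementation.

```python
import random

def split_dataset(records, train_frac=0.8, test_frac=0.1, seed=42):
    """Shuffle records and split them into train, test and validation parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # whatever remains (about 10%)
    return train, test, validation

# Toy stand-in for labelled news items: (text, label) pairs.
records = [(f"news item {i}", i % 2) for i in range(100)]
train, test, validation = split_dataset(records)
print(len(train), len(test), len(validation))
```

Shuffling before slicing avoids any ordering bias in the source file, and giving the remainder to the validation set ensures that no record is dropped when the dataset size is not a multiple of ten.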

Figure 3: Illustrates the overview of methodology

Before the data is used, it is pre-processed with NLP procedures such as stop word removal and stemming. The framework also uses Google's word2vec embeddings model for better vectorization of the data. This makes it easier to extract and compare features in order to determine whether a news item is fake or not. Python's pickle format is used to serialize the model for persistence and later reuse. Figure 4 shows the underlying process in deep learning with convolutional and max pooling layers.
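The pre-processing steps named above (stop word removal, stemming, and serialization with pickle) can be illustrated with a small sketch. The stop-word list and the suffix-stripping stemmer below are deliberately crude, hypothetical stand-ins; the paper does not state which stop-word list or stemmer was used, and a real pipeline would normally rely on an NLP library.

```python
import pickle
import re

# Hypothetical, deliberately tiny stop-word list; real pipelines use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "to", "and"}

def naive_stem(word):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

tokens = preprocess("The senators are spreading claims in the reports")
print(tokens)  # ['senator', 'spread', 'claim', 'report']

# Pickle round-trip, as one might persist a trained model object for reuse.
restored = pickle.loads(pickle.dumps({"vocabulary": tokens}))
```

The same `pickle.dumps` / `pickle.loads` pair (or `pickle.dump` / `pickle.load` with a file) applies to any picklable Python object, which is what makes it convenient for saving a fitted model between runs.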

[Figure 3 diagram elements: Dataset (Fake News); Segmentation; Labelled Data; Unlabelled Data; Naïve Bayes / CNN Baseline model / CNN Advanced Model; Model for Detection of Fake News; outputs: Fake News, Not Fake News]


Figure 4: The architectural overview of CNN

The CNN model takes as input the features produced by pre-processing, which involves NLP and Google's word2vec vectorization. For each filter size, a convolutional layer and a max pool layer are created, with tanh as the activation function. The convolutional layers learn from the data, while the max pool layers carry out subsampling. The outputs are finally passed to fully connected and softmax layers, which produce the detection outcomes used for performance evaluation.
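To make the data flow concrete, the following NumPy sketch runs one forward pass through the kind of architecture described above: one convolution per filter size with tanh activation, max pooling over time, concatenation, a fully connected layer, and softmax. It is an illustration under stated assumptions, not the authors' Keras model: the sizes (12 words, 8-dimensional embeddings, filter widths 3, 4 and 5, four filters per width) are invented, random vectors stand in for word2vec embeddings, the weights are untrained, and dropout is omitted because it only acts during training.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_tanh_maxpool(sentence, filters):
    """Slide each filter over the word axis, apply tanh, then max-pool over time.

    sentence: (num_words, embed_dim); filters: (num_filters, width, embed_dim).
    Returns one pooled feature per filter.
    """
    num_filters, width, _ = filters.shape
    num_windows = sentence.shape[0] - width + 1
    feature_maps = np.empty((num_filters, num_windows))
    for i in range(num_windows):
        window = sentence[i:i + width]                      # (width, embed_dim)
        feature_maps[:, i] = np.tanh(np.sum(filters * window, axis=(1, 2)))
    return feature_maps.max(axis=1)

def softmax(scores):
    """Numerically stable softmax over a 1-D score vector."""
    shifted = np.exp(scores - scores.max())
    return shifted / shifted.sum()

# Assumed sizes: 12 words, 8-dim embeddings, filter widths 3/4/5, 4 filters each.
sentence = rng.normal(size=(12, 8))       # stand-in for word2vec vectors
pooled = np.concatenate([
    conv_tanh_maxpool(sentence, rng.normal(size=(4, w, 8)) * 0.1)
    for w in (3, 4, 5)
])
dense_weights = rng.normal(size=(2, pooled.size)) * 0.1    # fully connected layer
probabilities = softmax(dense_weights @ pooled)             # two class scores
print(pooled.shape, probabilities)
```

Max pooling over time reduces each feature map to a single value, so sentences of different lengths yield a fixed-size vector (here 3 widths x 4 filters = 12 features) for the fully connected layer.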

Algorithm 1: DCNN-FND algorithm

Algorithm: Deep CNN based Fake News Detection (DCNN-FND)
Inputs: Fake news dataset D (from [15])
Output: Detection of fake news (labelling unlabelled training data)
1. Start
2. Initialize output vector R
3. (TrData, TestData) ← Segmentation(D)
4. D’ = PreProcess(TrData)
5. Configure convolutional layers
6. Configure max pool layers
7. Configure activation function
8. Combine outputs
9. Add dropouts
10. Configure softmax layer for final scores
11. R ← GetFinalPredictions(TestData, model)
12. Compute accuracy
13. Compute loss function
14. Return R
15. End

As presented in Algorithm 1, DCNN-FND is defined as a sequence of steps. It takes the Kaggle dataset as input and produces predictions along with performance outcomes. Its layers are configured to realize the CNN advanced model.

Figure 5: Illustrates confusion matrix

Performance evaluation is carried out using the confusion matrix shown in Figure 5, which is the basis for computing model accuracy. Accuracy is computed from the number of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) produced by the prediction models evaluated: Naïve Bayes, CNN, and the CNN advanced model (DCNN-FND). Eq. 1 shows how accuracy is computed.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

The proposed framework is implemented using the Python data science platform, evaluated for its performance, and compared with existing techniques. The results reveal that DCNN-FND performs better than the state of the art. Section 4 presents more details about the experimental results.
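As a small worked example of Eq. 1, the function below computes accuracy from the four confusion matrix counts. The counts used in the call are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the paper.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions, following Eq. 1."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total

# Hypothetical counts, not taken from the paper's experiments.
print(accuracy(tp=45, tn=40, fp=5, fn=10))  # 85 / 100 = 0.85
```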

4. Experimental Results

A prototype is built in Python using packages such as pandas, NumPy and Keras. It uses TensorFlow as the backend and NLP techniques for dealing with textual data.


Figure 6: Few instances in the training data

The dataset, as shown in Figure 6, is obtained from the Kaggle website [15]. The data is subjected to pre-processing, which uses NLP and word embeddings to represent the text better before model creation. The CNN and CNN advanced models are used to detect fake news. Experiments are also carried out with the Naïve Bayes classifier, and the results are compared.

Figure 7: Confusion matrix of CNN (a) and CNN advanced model (b)

Figure 8: Results of baseline model (CNN)

Figure 9: Results of advanced model CNN

Figure 7 shows the confusion matrices of both CNN models, from which their accuracy is derived. Figure 8 and Figure 9 present the experimental results of the CNN and CNN advanced models in terms of model accuracy and model loss. The former indicates how well a model performs, while the latter is related to its error rate. High accuracy together with low model loss indicates good performance in fake news detection. The advanced CNN model is found to perform better than the baseline model. With every epoch, model accuracy improves and model loss falls.

Table 1: Fake news detection performance comparison

Fake News Detection Models                                       Performance (Accuracy)
Fake News Detection Model based on Naïve Bayes                   0.8997
Fake News Detection Model based on CNN Baseline                  0.9150
Fake News Detection Model based on CNN Advanced Approach         0.9835

Figure 10: Fake news detection performance comparison (bar chart titled "Fake News Detection Performance"; axes: Detection Models, Accuracy; values as in Table 1)


Table 1 and Figure 10 compare the performance of the different prediction models in fake news detection. The model based on Naïve Bayes showed the lowest accuracy, 0.8997, while the CNN advanced model showed the highest, 0.9835. The results therefore reveal that the CNN advanced model predicts more accurately. The rationale is that the layer configuration of the advanced CNN model is chosen to meet the requirements of the fake news detection dataset.
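The size of the gaps between the models can be read directly from the reported accuracies. The short sketch below computes, for each model, the difference to DCNN-FND in percentage points; it uses only the three accuracy values reported in Table 1, and the dictionary keys are shortened labels chosen for the example.

```python
reported = {
    "Naive Bayes": 0.8997,
    "CNN baseline": 0.9150,
    "CNN advanced (DCNN-FND)": 0.9835,
}

best = reported["CNN advanced (DCNN-FND)"]
# Gap to the best model, expressed in percentage points of accuracy.
gaps = {name: round((best - acc) * 100, 2) for name, acc in reported.items()}

for name, acc in reported.items():
    print(f"{name}: accuracy {acc:.4f}, gap to DCNN-FND {gaps[name]} points")
```

On these figures, DCNN-FND is 8.38 percentage points above Naïve Bayes and 6.85 percentage points above the CNN baseline.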

5. Conclusion & Future Work

In this paper, a CNN advanced model is proposed and implemented. Its underlying algorithm, Deep CNN based Fake News Detection (DCNN-FND), exploits a deep Convolutional Neural Network (CNN) and a novel pre-processing mechanism based on Natural Language Processing (NLP). The algorithm is compared with a baseline CNN model and with the traditional machine learning model Naïve Bayes. The Kaggle dataset is used to evaluate the proposed model. The news dataset is pre-processed with NLP methods and word embeddings. DCNN-FND, CNN and Naïve Bayes are used as fake news detection models for performance comparison. The results reveal that DCNN-FND performs better than the state of the art: DCNN-FND achieved the highest accuracy, 0.9835, while the CNN baseline model achieved 0.9150 and Naïve Bayes 0.8997. In future work, we intend to use transfer learning methods to further improve the performance of DCNN-FND.

References

1. Qbeitah, M. A., & Aldwairi, M. (2018, April). Dynamic malware analysis of phishing emails. In 2018 9th International Conference on Information and Communication Systems (ICICS) (pp. 18-24). IEEE.

2. Rubin, V. L., Chen, Y., & Conroy, N. K. (2015). Deception detection for news: three types of fakes. Proceedings of the Association for Information Science and Technology, 52(1), 1-4.

3. Masri, R., & Aldwairi, M. (2017, April). Automated malicious advertisement detection using virustotal, urlvoid, and trendmicro. In 2017 8th International Conference on Information and Communication Systems (ICICS) (pp. 336-341). IEEE.

4. Westerman, D., Spence, P. R., & Van Der Heide, B. (2014). Social media as information source: Recency of updates and credibility of information. Journal of computer-mediated communication, 19(2), 171-183.

5. Chen, Y., Conroy, N. K., & Rubin, V. L. (2015). News in an online world: The need for an “automatic crap detector”. Proceedings of the Association for Information Science and Technology, 52(1), 1-4.

6. Pogue, D. (2017). How to Stamp Out Fake News. Scientific American, 316(2), 24-24.

7. Konagala, V., & Bano, S. (2020). Fake News Detection Using Deep Learning: Supervised Fake News Detection Analysis in Social Media With Semantic Similarity Method. In Deep Learning Techniques and Optimization Strategies in Big Data Analytics (pp. 166-177). IGI Global.

8. Aldwairi, M., Hasan, M., & Balbahaith, Z. (2020). Detection of drive-by download attacks using machine learning approach. In Cognitive Analytics: Concepts, Methodologies, Tools, and Applications (pp. 1598-1611). IGI Global.

9. Balmas, M. (2014). When fake news becomes real: Combined exposure to multiple news sources and political attitudes of inefficacy, alienation, and cynicism. Communication research, 41(3), 430-454.

10. Brewer, P. R., Young, D. G., & Morreale, M. (2013). The impact of real news about “fake news”: Intertextual processes and political satire. International Journal of Public Opinion Research, 25(3), 323-343.

11. Kaur, S., Kumar, P., & Kumaraguru, P. (2020). Automating fake news detection system using multi-level voting model. Soft Computing, 24(12), 9049-9069.

12. Abu-Nimeh, S., Chen, T., & Alzubi, O. (2011). Malicious and spam posts in online social networks. Computer, 44(9), 23-28.

13. Monti, F., Frasca, F., Eynard, D., Mannion, D., & Bronstein, M. M. (2019). Fake news detection on social media using geometric deep learning. arXiv preprint arXiv:1902.06673.

14. Messabi, K. A., Aldwairi, M., Yousif, A. A., Thoban, A., & Belqasmi, F. (2018, June). Malware detection using dns records and domain name features. In Proceedings of the 2nd International Conference on Future Networks and Distributed Systems (pp. 1-7).

15. “Kaggle Fake News Dataset”. Retrieved from https://www.kaggle.com/c/fake-news/data

16. Qawasmeh, E., Tawalbeh, M., & Abdullah, M. (2019, October). Automatic identification of fake news using deep learning. In 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS) (pp. 383-388). IEEE.

17. Poola, P. K., Kumaresan, Y., Krasnikov, I., & Seteikin, A. (2020). Terahertz Molecular Imaging and Its Clinical Applications. In Terahertz Biomedical and Healthcare Technologies (pp. 195-213). Elsevier.

18. Poola, P. K., Krasnikov, I., & Seteikin, A. (2020). Waveguides for Terahertz Endoscopy. In Terahertz Biomedical and Healthcare Technologies (pp. 215-224). Elsevier.

19. Gudivada, A. A., Kumar, K. J., Jajula, S. R., Siddani, D. P., Poola, P. K., Vourganti, V., & Panigrahy, A. K. (2020). Design of area-efficient high speed 4×4 Wallace tree multiplier using quantum-dot cellular automata. Materials Today: Proceedings.

20. Pragathi, D., Prasad, D., Padma, T., Reddy, P. R., Kumari, C. U., Poola, P. K., & Panigrahy, A. K. (2020). An extensive survey on reduction of noise coupling in TSV based 3D IC integration. Materials Today: Proceedings.

21. Divya, T. V., & Banik, B. G. (2021). A Walk Through Various Paradigms for Fake News Detection on Social Media. In Proceedings of International Conference on Computational Intelligence and Data Engineering (pp. 173-183). Springer, Singapore.

22. Kshirsagar PR, Akojwar SG, R. Dhanoriya, “Classification of ECG-signals using Artificial Neural Networks”, https://www.researchgate.net/publication/317102153_Classification_of_ECGsignals_using_Artificial_Neural_Networks, International Conference on Electrical, Computer and Communication Technologies, 2017.

23. P. Kshirsagar and S. Akojwar, "Classification & Detection of Neurological Disorders using ICA & AR as Feature Extractor", Int. J. Ser. Eng. Sci. IJSES, vol. 1, no. 1, Jan. 2015.

24. Pravin Kshirsagar and Sudhir Akojwar, “Hybrid Heuristic Optimization for Benchmark Datasets”, International Journal of Computer Application (0975-8887), Vol. 146, No. 7, July 2016.

25. Pravin Kshirsagar and Dr. Sudhir Akojwar, Novel Approach for Classification and Prediction of Non Linear Chaotic Databases, International Conference on Electrical, Electronics, and Optimization Techniques, March 2016.

26. Pravin Kshirsagar and Dr. Sudhir Akojwar, Prediction of Neurological Disorders using Optimized Neural Network, In the proceeding of International Conference on Signal Processing, Communication, Power and Embedded System, October 2016.

27. Sudhir Akojwar, Pravin Kshirsagar, “A Novel Probabilistic-PSO Based Learning Algorithm for Optimization of Neural Networks for Benchmark Problems”, WSEAS International conference on Neural Network-2016, Rome, Italy.

28. Sudhir Akojwar, Pravin Kshirsagar, “Performance Evolution of Optimization Techniques for Mathematical Benchmark Functions”, WSEAS International conference on Neural Network-2016, Rome, Italy.

29. Pravin Kshirsagar, Dr. Sudhir Akojwar, “Classification and Prediction of Epilepsy using FFBPNN with PSO”, IEEE International Conference on Communication Networks, 2015.

30. Sudhir Akojwar, Pravin Kshirsagar, Vijetalaxmi Pai, “Feature Extraction of EEG Signals using Wavelet and Principal Component analysis”, National Conference on Research Trends In Electronics, Computer Science & Information Technology and Doctoral Research Meet, Feb 21st & 22nd, 2014.