View of Ant Colony – Information Gain Based Feature Selection Method For Weather Dataset

(1)

Ant Colony – Information Gain Based Feature Selection Method For Weather Dataset

T. MALATHI ¹, DR. M. MANIMEKALAI²

1research Scholar, Shrimati Indira Gandhi College (Affiliated Bharathidasan University), Tiruchirappalli

2 Director, Department Of Computer Applications, Shrimati Indira Gandhi College (Affiliated Bharathidasan University), Tiruchirappalli

Abstract: Weather Forecasting Is An Emerging Domain That Predicts The Weather Condition At A Location At A Time. Weather Forecasting Is Considered As The Most Sensitive Research Field Which Facing A Lot Of Real- Time Issues Such As Inaccurate Prediction, Lack Of Handling In Huge Data Volume And Inadequate In Technology Advancement. Forecasting Weather Conditions Is Important For, E.G., Operation Of Hydro Power Plants And For Flood Management. Mechanistic Models Are Known To Be Computationally Demanding. Hence, It Is Of Interest To Develop Models That Can Predict Weather Conditions Faster Than Traditional Meteorological Models. The Field Of Machine Learning Has Received Much Interest From The Scientific Community. Due To Its Applicability In A Variety Of Fields, It Is Of Interest To Study Whether An Artificial Neural Network Can Be A Good Candidate For Prediction Of Weather Conditions In Combination With Large Data Sets. The Availability Of Meteorological Data From Multiple Online Sources Is An Advantage. In This Work, An Ant Colony – Information Gain Based Feature Selection Method Is Proposed Using Optimization Technique And Filter-Based Feature Selection Method.

Abstrak: (Abstract In Malysian Language) Gunakan Gaya Yang Diperuntukkan Dalam Dokumen Ini Untuk Menulis Kertas Anda. Anda Akan Mendapati Nama Gaya Antara Kurungan Pada Akhir Sifat Atau Perenggan.

(Normal).

Key Words: Weather Prediction, Feature Selection, Ant Colony Optimization, Information Gain, Classification

1. INTRODUCTION

Weather forecasting has been one of the most scientifically and technologically challenging problems around the world in the last century [1]. This is due mainly to two factors: first, it’s used for many human activities and secondly, due to the opportunism created by the various technological advances that are directly related to this concrete research field, like the evolution of computation and the improvement in measurement systems. To make an accurate prediction is one of the major challenges facing meteorologist all over the world. Since ancient times, weather prediction has been one of the most interesting and fascinating domain. Scientists have tried to forecast meteorological characteristics using a number of methods, some of these methods being more accurate than other [2].

To predict the weather by numerical means, meteorologists have developed atmospheric models that approximate the atmosphere by using mathematical equations to describe how atmospheric temperature, pressure, and moisture will change over time. The equations are programmed into a computer and data on the present atmospheric conditions are fed into the computer. The computer solves the equations to determine how the different atmospheric variables will change over the next few minutes. The computer repeats this procedure again and again using the output from one cycle as the input for the next cycle. For some desired time in the future (12, 24, 36, 48, 72 or 120 hours), the computer prints its calculated information. It then analyzes the data, drawing the lines for the projected position of the various pressure systems. The final computer-drawn forecast chart is called a prognostic chart, or prog [3]. A forecaster uses the progs as a guide to predicting the weather. There are many atmospheric models that represent the atmosphere, with each one interpreting the atmosphere in a slightly different way.

Climate is the long-term effect of the sun's radiation on the rotating earth's varied surface and atmosphere [4]. The Day-by-day variations in a given area constitute the weather, whereas climate is the longterm synthesis of such variations. Weather is measured by thermometers, rain gauges, barometers, and other instruments, but the study of climate relies on statistics. Nowadays, such

(2)

however, is still not a true picture of climate. To obtain this requires the analysis of daily, monthly, and yearly pattern [5].

2. RELATED WORKS

Singh, Nitin, Saurabh Chaturvedi, And Shamim Akhter [6] developed a weather forecasting system that can be used in remote areas is the main motivation of this work. The data analytics and machine learning algorithms, such as random forest classification, are used to predict weather conditions. In this paper, a low-cost and portable solution for weather prediction is devised.

Wang, Bin, et al [7] The authors design a data-driven method augmented by an effective information fusion mechanism to learn from historical data that incorporates prior knowledge from numerical weather prediction (NWP). The authors casted the weather forecasting problem as an end- to-end deep learning problem and solve it by proposing a novel negative log-likelihood error (nle) loss function. The metrics like mean absolute error (mae) and mean squared error (mse) are used in this paper.

Hua, Yuxiu, et al [8] The authors gave a brief introduction to the structure and forward propagation mechanism of LSTM. The authors aimed at reducing the considerable computing cost of lstm, the authors put forward a rclstm model by introducing stochastic connectivity to conventional lstm neurons. Therefore, rclstm exhibits a certain level of sparsity an leads to a decrease in computational complexity. The metrics like root mean squared error (rmse) and prediction accuracy are considered in this paper.

Hewage, Pradeep, et al [9] proposed a novel lightweight data-driven weather forecasting model by exploring temporal modelling approaches of long short-term memory (LSTM) and temporal convolutional networks (TCN). The proposed deep learning networks with LSTM and TCN layers are assessed in two different regressions, namely multi-input multi-output and multi-input single- output. MSE is considered as the evaluation metrics in this paper.

Karevan, Zahra, And Johan AK Suykens [10] The authors proposed Transductive LSTM (T- LSTM) which exploits the local information in time-series prediction. In this study, a quadratic cost function is considered for the regression problem. Localizing the objective function is done by considering a weighted quadratic cost function at which point the samples about the test point have larger weights. MAE and MSE are used as the performance metrics in this paper.

Wan, Renzhuo, et al [11] The authors utilized various deep learning models based on recurrent neural network (RNN) and convolutional neural network (cnn) methods. To improve the prediction accuracy and minimize the multivariate time series data dependence for aperiodic data, the dataset are analyzed by a novel Multivariate Temporal Convolution Network (M-TCN) model. In this model, multi-variable time series prediction is constructed as a sequence-to-sequence scenario for non- periodic datasets. The multichannel residual blocks in parallel with asymmetric structure based on deep convolution neural network is proposed. The metrics like RMSE, RRSE, and Correlation are used in this paper.

Mehrkanoon, Siamak [12] introduced novel data-driven predictive models based on deep convolutional neural networks (CNN) architecture for temperature and wind speed prediction in weather data. In particular, the proposed deep learning framework employs different upgrading versions of the convolutional neural networks i.E. 1d-, 2d- and 3d-cnn. The introduced models exploit the spatio-temporal multivariate weather data for learning shared representations using historical data and forecasting weather elements for a number of user defined weather stations simultaneously in an end-to-end fashion.

Dabrowski, Joel Janek, Yifan Zhang, And Ashfaqur Rahman [13] The authors argued that time invariance can reduce the capacity to perform multi-step-ahead forecasting, where modelling the

(3)

uses a deep feed-forward architecture to provide a time variant model. An additional novelty of forecast net is interleaved outputs, which the authors showed assist in mitigating vanishing gradients.

3. INFORMATION GAIN FEATURE SELECTION METHOD

In this module, An Information Gain [14][15] methodology is used to select the features for improving the business strategy by analyzing the weather dataset.

Entropy has frequently utilized in the information theory measure, which characterizes the purity of an arbitrary collection of examples. It is the foundation of the Information Gain (IG) and Gain Ratio (GR). The entropy measure is considered a measure of the system’s unpredictability. The entropy of Y is

H(Y) = ∑_y∈Yp(y) log₂(p(y)) (1)

The random feature Y, the marginal probability density function is p(y). If the detected values of Y in the training data set S has divided consorting to the values of a next feature X, and the entropy of Y concerning the split induced by X is minimum than the entropy of Y before dividing, then there is an association among features X and Y. Y Later observing X and Y entropy is given by :

H(Y|X) = ∑_{x ∈X}P(x)∑_{y ∈Y}P(y|x)log₂(p(y|x)) (2) The conditional probability of y is p(y|x) for given x

Given the entropy is a condition of impurity in a training set S, it can describe a measure excogitating additional information about Y rendered by X that represents the amount by which the entropy of Y decreases. This measure is known as IG. It is given by

IG = H(Y)-H(Y│X) = H(X)-H(X│Y) (3)

In this work, a new feature selection method is proposed to choose the most ideal feature subset for removing the irrelevant and redundant features from the weather dataset. In this method, ACO is opted for choosing the most ideal feature subsets with Information Gain whereas Information Gain is used as a heuristic function for choosing most appropriate structures. Information Gain in this contribution is used to find the fitness function of ACO or evaluation of the subset.

4. ANT COLONY OPTIMIZATION BASED FEATURE SELECTION METHOD

Ant colony optimization (ACO) [16][17] a population-based probabilistic Meta-Heuristics ACO is based on ants foraging behavior. Foraging behavior of ant is an interesting phenomenon by which ant colonies find the shortest path between food source and nest through indirect communication called stigmergy. Ants, like many other social insects, communicate with each other by dropping a chemical substance on their path. This chemical substance is called pheromone. It provides a positive feedback mechanism to attract other ants. Those paths which have a higher value of pheromone have a high probability of being selected. Whereas the paths that are not selected their pheromone is decreased by an evaporation process.

In ACO each ant constructs a complete solution using two things (1) node transition probability function which is based on the quantity of pheromone spread by ants and heuristic information about the importance and quality of each individual solutions and (2) already traversed solutions memory.

As generations get completed, solutions constructed by each ant are evaluated using some evaluation criteria. After that pheromone evaporation and update, mechanism is also used which evaporates intensity of pheromone from the paths with low fitness value and hence discarded gradually. The ACO algorithm requires specifying the following aspects for implementation:

1) Representation of the problem domain in such a way that it lends itself to incrementally building a solution for the problem, usually in the form of a graph.

(4)

2) Node transition probability rule based on the amount of pheromone value and of the heuristic function we have employed information gain as a heuristic function. Following is the equation for calculating the probability of each node:

P_jⁱ = [τ(i, j)]^α [η(i, j)]^β

∑_{k ϵS}[τ(i, j)]^α [η(i, j)]^β

Where P_jⁱ is a probability of the i^th ant to move from node i to node j at time t. P_jⁱ(t) =0 means that ants are not allowed to move to any node In the neighbour. [τ(i, j)]^α is the amount of pheromone on the edge connecting i and j, where is a constant which is used to control relative importance of pheromone information. After each iteration, this pheromone information is updated by all the ants and in some versions of ACO only best ant can update pheromone. [η(i, j)]^β is the heuristic function that denotes the heuristic value of edge connecting i and j. usually, the heuristic value does not change during execution of the algorithm. In this paper we have used information gain to denote heuristic value. β is a constant which is used to control relative importance of heuristic value.

3) A heuristic evaluation function called fitness function dependent on the problem, which provides a goodness measurement for the different solution components. We have used fitness function is based on Information gain to normalize the biasness of information gain and mutual information towards multi-valued attributes. Following formula being used to compute the value of the selected subset.

Fitness (S) = (F − S) ∗ (∑ⁿ_i=1(IG)) F

Where S is reduced subset selected by ACO, IG is the information gain of feature i in the subset S and F is the total number of features present in the dataset . It will select feature subset with high information gain value and with less number of features.

4) Pheromone evaporation and updating rule which takes into account the evaporation and reinforcement of the paths. Once subsets are evaluated using fitness function, pheromone trails are updated. Firstly using an evaporation rate ρ the pheromone trails on the edges are evaporated or decreased to minimize the effect of a sub-optimal feature to which the ants have previously converged.

Secondly amount of pheromone on the edges is updated with amounts proportional to the fitness of the solution. Some approaches for pheromone updating allowed all the ants to update their paths according to the fitness of their solution and in some approaches only best ant is allowed to update pheromone value on its path.

For the pheromone evaporation and updating following equations are used.

τ = (1 − P) ∗ τ where p is 0.15 τ = τ + (τ ∗ Q)

And Q = [1 − ( ¹

1+Fitness)]

The above equations, used for pheromone evaporation and updating.

5) Where ―”Fitness” is the value of the selected subset through an independent statistical measure.

6) Stopping/convergence criterion that decides when the algorithm terminates usually depends on maximum number of iterations.

(5)

5. PROPOSED ANT COLONY – INFORMATION GAIN BASED FEATURE SELECTION METHOD

In this approach ACO is used for selecting most optimal feature subsets along with Information Gain where Information Gain is used as heuristic function for selecting most relevant features. Fitness function or subset evaluation is also based on Information Gain. It’s a pure filter approach, along with it we have also used classifier ensemble to improve predictive performance of filter approaches comparable to the wrapper approaches.Figure 1 depicts proposed AC-IG feature selection method. It is filter approach which selects features on the basis independent Information Gain measure. So, some features that might be less important in terms of independent relevance to class but for a classifier, such features could be important. Therefore, AC-IG uses classifier ensemble on different convergence threshold values and classification accuracy of subsets is used to provide final feature subset.In proposed approach first dataset is loaded. Once dataset is loaded, information gain of each feature/attribute in data set is computed. Then all the parameters of ant colony optimization algorithm are initialized. Such as number of ants, α and β values of node transition probability function, path convergence threshold value, pheromone evaporation rate ρ. and maximum number of generations.

A search space is constructed that consists of nodes proportional to the number of features in the dataset. Fixed numbers of ants are generated in each iteration where each ant generates a candidate solution. After each generation, generated solutions are evaluated using a subset evaluator. Subset evaluator is based on Information Gain between selected features and the class. After subset evaluation best solution is gained on the basis of maximum fitness value and is preserved. Then termination criteria of the algorithm are checked which is based on two conditions i.e. on a maximum number of generations and convergence threshold. If termination criteria are not met each ant updates its pheromone value to the quality of solution generated by each ant. Otherwise, if any termination/stopping criterion is met algorithm outputs ten best subsets. These subsets are provided to Decision Tree classifier techniques like J48, ID3 and Classification and Regression Tree (CART).

Fig. 1. Proposed Ant Colony – Information Gain based feature Selection method

Then one subset is selected based on the highest average weighted accuracy of classifier ensemble and saved. Then again convergence threshold is checked. If it is less than 500, the whole process is repeated. Otherwise, the algorithm stops and one best subset with the highest accuracy is selected from all saved subsets and is considered as final subset Then new ants are produced and this complete process goes iteratively till highest number of epochs is reached or the algorithm convergence to a solution.

(6)

6. RESULT AND DISCUSSION

In this research paper, the evaluation of the feature selection techniques like IG, ACO, and proposed Ant Colony – Information Gain (AC-IG) Based Feature Selection Method is done with the classification techniques like ID3, CART and J48 for three different weather datasets. These weather datasets are considered and download in Kaggle repository [18]. The evaluation metrics like Accuracy (in %), True Positive Rate (in %), Precision (in %), False Positive Rate (in %), Miss Rate (in %), and False Discovery Rate (in %) are considered in this work to evaluate the performance of the feature selection techniques with different classifiers.

The efficiency of the proposed AC-IG based Feature Selection method is evaluated with the classification techniques like CART, ID3, J48 classical classification techniques with the above- mentioned performance metrics. Table 1 gives number of features obtained by the existing IG, ACO and proposed AC-IG feature selection method. From the table 1, it is clear that the proposed AC-IG feature selection method gives a smaller number of features when it is compared with other existing feature selection techniques.

Table 1: Number of Features Obtained by IG, ACO and proposed AC-IG feature selection method Feature Selection

Techniques

Number of Features obtained

Information Gain (IG) 35 Ant Colony Optimization 33 Proposed AC-IGFS

method

24 2.1. Performance Analysis of the Feature Selection Methods

The classification techniques like ID3, CART and J48 are used to analyse the performance of the feature selection techniques using evaluation metrics. Table 2 gives Classification Accuracy (in %) of the IG, ACO, and proposed IG-AC feature selection using ID3, CART and J48 Classifier. Figure 2 depicts the graphical representation of the Classification Accuracy (in %) of the Original dataset, IG, ACO and proposed IG-AC based Feature Selection method using ID3, CART and J48 classifiers.

From the table 2 and figure 2, the proposed IG-AC based feature selection method with CART classifiers gives more accuracy than other techniques.

(7)

Table 2: Classification Accuracy (in %) of the Original dataset, IG, ACO and proposed IG-AC based Feature Selection method using ID3, CART and J48 classifiers

Feature Selection Techniques Classification Accuracy (in %)

ID3 CART J48

Original Dataset 56.25 58.63 55.95

Information Gain (IG) 78.45 82.54 72.96

Ant Colony Optimization (ACO) 80.12 83.63 73.85

Proposed IG-AC based Feature Selection Method

86.42 87.64 83.36

Figure 2: Graphical representation of the Classification Accuracy (in %) of the Original dataset, IG, ACO and proposed IG-AC based Feature Selection method using ID3, CART and

J48 classifiers

Table 3 gives True Positive Rate (in %) of the IG, ACO, and proposed IG-AC feature selection using ID3, CART and J48 Classifier. Figure 3 depicts the graphical representation of the True Positive Rate (in %) of the Original dataset, IG, ACO and proposed IG-AC based Feature Selection method using ID3, CART and J48 classifiers. From the table 3 and figure 3, the proposed IG-AC based feature selection method with CART classifiers gives more True Positive Rate than other techniques.

Table 3: True Positive Rate (in %) of the Original dataset, IG, ACO and proposed IG-AC based Feature Selection method using ID3, CART and J48 classifiers

Feature Selection Techniques True Positive Rate (in %)

ID3 CART J48

Proposed IG-AC based Feature Selection Method 84.64 85.42 81.58

(8)

Figure 3: Graphical representation of the True Positive Rate (in %) of the Original dataset, IG, ACO and proposed IG-AC based Feature Selection method using ID3, CART and J48

classifiers

Table 4 gives Precision (in %) of the IG, ACO, and proposed IG-AC feature selection using ID3, CART and J48 Classifier. Figure 4 depicts the graphical representation of the Precision (in %) of the Original dataset, IG, ACO and proposed IG-AC based Feature Selection method using ID3, CART and J48 classifiers. From the table 4 and figure 4, the proposed IG-AC based feature selection method with CART classifiers gives more precision than other techniques.

Table 4: Precision (in %) of the Original dataset, IG, ACO and proposed IG-AC based Feature Selection method using ID3, CART and J48 classifiers

Feature Selection Techniques Precision (in %)

ID3 CART J48

(9)

Figure 4: Graphical representation of the Precision (in %) of the Original dataset, IG, ACO and proposed IG-AC based Feature Selection method using ID3, CART and J48 classifiers

Table 5 gives False Positive Rate (in %) of the IG, ACO, and proposed IG-AC feature selection using ID3, CART and J48 Classifier. Figure 5 depicts the graphical representation of the False Positive Rate (in %) of the Original dataset, IG, ACO and proposed IG-AC based Feature Selection method using ID3, CART and J48 classifiers. From the table 5 and figure 5, the proposed IG-AC based feature selection method with CART classifiers gives less False Positive Rate than other techniques.

Table 6 gives Miss Rate (in %) of the IG, ACO, and proposed IG-AC feature selection using ID3, CART and J48 Classifier. Figure 6 depicts the graphical representation of the Miss Rate (in %) of the Original dataset, IG, ACO and proposed IG-AC based Feature Selection method using ID3, CART and J48 classifiers. From the table 6 and figure 6, the proposed IG-AC based feature selection method with CART classifiers gives less Miss Rate than other techniques.

Table 7 gives False Discovery Rate (in %) of the IG, ACO, and proposed IG-AC feature selection using ID3, CART and J48 Classifier. Figure 7 depicts the graphical representation of the False Discovery Rate (in %) of the Original dataset, IG, ACO and proposed IG-AC based Feature Selection method using ID3, CART and J48 classifiers. From the table 7 and figure 7, the proposed IG-AC based feature selection method with CART classifiers gives less False Discovery Rate (in %) than other techniques.

Table 5: False Positive Rate (in %) of the Original dataset, IG, ACO and proposed IG-AC based Feature Selection method using ID3, CART and J48 classifiers

Feature Selection Techniques False Positive Rate (in %)

ID3 CART J48

Proposed IG-AC based Feature Selection Method 21.9 19.32 27.47

(10)

Figure 5: Graphical representation of the False Positive Rate (in %) of the Original dataset, IG, ACO and proposed IG-AC based Feature Selection method using ID3, CART and J48

classifiers

Table 6: Miss Rate (in %) of the Original dataset, IG, ACO and proposed IG-AC based Feature Selection method using ID3, CART and J48 classifiers

Feature Selection Techniques Miss Rate (in %)

ID3 CART J48

Figure 6: Graphical representation of the Miss Rate (in %) of the Original dataset, IG, ACO and proposed IG-AC based Feature Selection method using ID3, CART and J48 classifiers

Table 7: False Discovery Rate (in %) of the Original dataset, IG, ACO and proposed IG-AC based Feature Selection method using ID3, CART and J48 classifiers

(11)

ID3 CART J48

Proposed IG-AC based Feature Selection Method

12.69 11.14 15.42

Figure 7: Graphical representation of the False Discovery Rate (in %) of the Original dataset, IG, ACO and proposed IG-AC based Feature Selection method using ID3, CART and J48

classifiers 7. CONCLUSION

Ant Colony – Information Gain (AC-IG) based Feature Selection method has been proposed in this paper. AC-IG is proposed by combining the IG and ACO with a Classifiers and has been used to optimize the feature subset selection process. Results showed that proposed approach has performed well in terms of dimensionality reduction and classification accuracy as compared to other approaches. The Information Gain is used as a heuristic measure in AC-IG which has normalized the biases of another heuristic measure towards multi-valued attributes and selected features that are highly relevant to the class. Secondly, the classifier ensemble has been used in a novel way with ACO.

It checks the classification accuracy of subsets achieved on different convergence threshold.

Classifiers helps to opt important features that are not selected by independent measure. It has not used classifier for optimizing results rather we have used it only for selecting a subset with the highest accuracy, so the proposed approach is not computationally costly.

REFERENCES

[1] Abdel-Kader, Hatem, Mustafa Abd-El Salam, and Mona Mohamed. "Hybrid Machine Learning Model for Rainfall Forecasting." Journal of Intelligent Systems and Internet of Things 1.1 (2021): 5-12.

[2] Diao, Li, et al. "Short-term weather forecast based on wavelet denoising and catboost." 2019 Chinese control conference (CCC). IEEE, 2019.

[3] Kale, Shivani S., and Preeti S. Patil. "Data mining technology with fuzzy logic, neural networks and machine learning for agriculture." Data management, analytics and innovation. Springer, Singapore, 2019. 79-87.

(12)

[4] Gad, Ibrahim, and Doreswamy Hosahalli. "A comparative study of prediction and classification models on NCDC weather data." International Journal of Computers and Applications (2020):

1-12.

[5] Hill, Aaron J., Gregory R. Herman, and Russ S. Schumacher. "Forecasting severe weather with random forests." Monthly Weather Review 148.5 (2020): 2135-2161.

[6] Singh, Nitin, Saurabh Chaturvedi, And Shamim Akhter. "Weather Forecasting Using Machine Learning Algorithm." 2019 International Conference On Signal Processing And Communication (ICSC). Ieee, 2019.

[7] Wang, Bin, Et Al. "Deep Uncertainty Quantification: A Machine Learning Approach For Weather Forecasting." Proceedings Of The 25th ACM SIGKDD International Conference On Knowledge Discovery & Data Mining. 2019.

[8] Hua, Yuxiu, Et Al. "Deep Learning With Long Short-term Memory For Time Series Prediction."

IEEE Communications Magazine 57.6 (2019): 114-119 .

[9] Hewage, Pradeep, Et Al. "Deep Learning-based Effective Fine-grained Weather Forecasting Model." Pattern Analysis And Applications (2020): 1-24.

[10] Karevan, Zahra, And Johan AK Suykens. "Transductive LSTM For Time-series Prediction: An Application To Weather Forecasting." Neural Networks 125 (2020): 1-9.

[11] Wan, Renzhuo, Et Al. "Multivariate Temporal Convolutional Network: A Deep Neural Networks Approach For Multivariate Time Series Forecasting." Electronics 8.8 (2019): 876.

[12] Mehrkanoon, Siamak. "Deep Shared Representation Learning For Weather Elements Forecasting." Knowledge-based Systems 179 (2019): 120-128.

[13] Dabrowski, Joel Janek, Yifan Zhang, And Ashfaqur Rahman. "Forecastnet: A Time-variant Deep Feed-forward Neural Network Architecture For Multi-step-ahead Time-series Forecasting." Arxiv Preprint Arxiv:2002.04155 (2020).

[14] Ahuja, Ravinder, et al. "Classification and clustering algorithms of machine learning with their applications." Nature-Inspired Computation in Data Mining and Machine Learning. Springer, Cham, 2020. 225-248.

[15] Singer, Gonen, Roee Anuar, and Irad Ben-Gal. "A weighted information-gain measure for ordinal classification trees." Expert Systems with Applications 152 (2020): 113375.

[16] Pan, Mingzhang, et al. "Photovoltaic power forecasting based on a support vector machine with improved ant colony optimization." Journal of Cleaner Production 277 (2020): 123948.

[17] Mehdizadeh, Saeid, et al. "Implementing novel hybrid models to improve indirect measurement of the daily soil temperature: Elman neural network coupled with gravitational search algorithm and ant colony optimization." Measurement 165 (2020): 108127.

[18] https://www.kaggle.com/datasets?search=weather+dataset