HBP-SRF: an intelligent model for prediction of Hormone Binding Proteins using Statistical Moment

Khalid Allehaibi

Department of Information Technology, Faculty of Computing and Information Technology, King Abdul Aziz University, Jeddah, Saudi Arabia

Email: [email protected]

ABSTRACT

Hormone binding proteins (HBPs) are proteins that attach to their target hormones in order to regulate them. Within the cell, they reside in the outer region of the growth hormone receptor and support the growth of cells in organisms. Because of these roles, it is important to understand the molecular mechanism of HBPs. Researchers have developed computational models for the prediction of these proteins and achieved good results, but further improvement is required to reach higher accuracy. The purpose of this study is to develop a computational model that predicts HBPs more accurately than existing models. The model is constructed using a statistical moment feature extraction method together with the Random Forest classification algorithm, and it is assessed using jackknife testing and 10-fold cross validation. The model achieves an accuracy of 94.5% under jackknife testing and 95.14% under 10-fold cross validation. It performs well and obtains remarkable results compared with the previous models available in the literature.

Keywords:

Hormone Binding Proteins, Statistical Moments, K-Fold Cross Validation, Jackknife Testing, Random Forest

Introduction

A binding protein is a protein that acts as an agent to bind two or more molecules together.

Hormone binding proteins (HBPs) belong to a protein family whose members attach to their target hormones in order to regulate them. They were first detected in the cells of pregnant mice and rabbits. Within the cell, an HBP resides in the outer region of the growth hormone receptor and plays an important role in the growth of cells in organisms (Baumann, 2001)(K. Wang et al., 2019)(Ozzola, 2016). HBPs also regulate the circulating steroid hormones that are considered the primary gatekeepers of steroid action (Steroid Binding Protein - an Overview | ScienceDirect Topics, n.d.). Any abnormality in these proteins can result in several diseases (Nawaz et al., 2020)(Rehman & Shahzad, 2018). An analysis of different hormone binding proteins is shown in Table 1. Researchers have tried to characterize the different biological functions of HBPs, but this remains difficult because of their complex structure. It is therefore essential to identify these proteins in order to categorize HBPs accurately and to understand their molecular mechanisms. Traditional methods have been used to identify HBPs, but because of rapid changes in their structure these methods are considered ineffective, time consuming and less accurate (Carnevali et al., 2018)(Sohm et al., 1998)(Y. Zhang & Marchant, 1999). An intelligent model is therefore needed that can make predictions with high accuracy while consuming less time. In the past, many researchers have developed computational models for the identification of these proteins. Tang et al. developed a model using a Support Vector Classifier together with an incremental feature selection method, performed jackknife testing for assessment, and obtained accuracies of 88.6% for HBPs and 81.3% for non-HBPs (Tang et al., 2018). Basith et al. proposed a computational model, iGHBP, for the identification of growth hormone binding proteins, which yielded an accuracy of 84.9% (Basith et al., 2018). Kaiyang et al. (Qu et al., 2017) used mixed feature representation techniques together with a Support Vector Machine classifier for the prediction of DNA binding proteins. Similarly, Jui et al. (Tan et al., 2019) constructed a model for the prediction of HBPs using SVM with 5-fold cross validation. Shahid et al. (Akbar et al., 2020) constructed another computational model using three datasets (S1, S2 and S3) and obtained accuracies of 94.41%, 92.31% and 90.48%, respectively. The computational models mentioned above are used to identify hormone binding proteins and have performed very well, but there is still room for improvement in the identification of HBPs. This study was proposed to develop a more intelligent model. Here we construct a model using statistical moments with the Random Forest classification method, and assess it with 10-fold cross validation and jackknife testing. The model is more accurate than the existing models in the literature.

Table 1. Hormone binding protein analysis (Hormone Binding Proteins Chart - Women's International Pharmacy, n.d.)

Sex Hormone Binding Globulin (SHBG)
- Production: Liver (also brain, uterus, testes, placenta)
- Binds: Dihydrotestosterone, testosterone, estradiol/estrone (DHEA and androstenedione bind almost completely to albumin)
- Hormone affinities: Binds biologically active androgens and estrogens only (4-5x better than albumin)

Thyroid (Thyroxine) Binding Globulin (TBG)
- Production: Liver
- Binds: Thyroid hormone in circulation (primarily T3 and T4, also T1 and T2)
- Hormone affinities: Carries the majority of T4

Cortisol Binding Globulin (CBG, or Transcortin)
- Production: Liver
- Binds: Cortisol, progesterone, aldosterone, 11-deoxycorticosterone (DOC), an aldosterone precursor
- Hormone affinities: Carries >90% of cortisol in plasma; somewhat lower affinity for progesterone

Albumin (Serum)
- Production: Liver
- Transports: Thyroid hormones, fat-soluble hormones, fatty acids to the liver, unconjugated bilirubin, various drugs (influences T1/T2), various minerals
- Hormone affinities: Binds all steroid hormones with the same low affinity; 99% of albumin binding sites remain open

1. Materials and Methods

1.1. Benchmark Dataset

For a computational model to perform well, the selection of an objective and valid dataset is a crucial step, since a valid, high-quality dataset improves the performance of the model. The dataset for this model was constructed by collecting datasets from previous papers. Three sets of data (S1, S2 and S3) are used in this paper. S1 contains 123 HBP sequences and 123 non-HBP sequences and was previously used by Tang et al. (Tang et al., 2018). S2 contains 31 HBP sequences and 31 non-HBP sequences, while S3 contains 46 HBP and 46 non-HBP sequences (92 protein sequences in total). For the training and testing of the proposed model, we combined all these datasets and passed them through the statistical moment method for feature extraction, as sketched below.
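The following is a minimal sketch of how the combined benchmark dataset could be assembled. The FASTA file names and the simple reader are illustrative assumptions, not the authors' actual files or code.

# Hypothetical sketch of assembling the combined benchmark dataset (S1 + S2 + S3).
def read_fasta(path):
    """Return a list of protein sequences from a FASTA file."""
    sequences, current = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current:
                    sequences.append("".join(current))
                    current = []
            elif line:
                current.append(line)
        if current:
            sequences.append("".join(current))
    return sequences

# Positive (HBP) and negative (non-HBP) sequences for each dataset.
positives, negatives = [], []
for name in ("S1", "S2", "S3"):
    positives += read_fasta(f"{name}_hbp.fasta")        # assumed file naming
    negatives += read_fasta(f"{name}_non_hbp.fasta")

sequences = positives + negatives
labels = [1] * len(positives) + [0] * len(negatives)     # 1 = HBP, 0 = non-HBP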

1.2. Feature Extraction Technique

The extraction of features from protein sequences plays an important role in many areas, such as identifying similarity in protein structure, predicting protein-protein interactions, and understanding the functionality of proteins (Mu et al., 2019). Many feature extraction techniques are used for this purpose (Buzuloiu, 1987), such as Amino Acid Composition (AAC) (Li & Wang, 2016)(Nguyen et al., 2017), Pseudo Amino Acid Composition (PseAAC) (Z. Ju, J.J. He, Prediction of Lysine Crotonylation... - Google Scholar, n.d.)(Hajisharifi et al., 2014), Dipeptide Composition (DPC) (L. Wang et al., 2017), Tripeptide Composition (TPC) (Chen et al., 2018), statistical moments (Maros et al., 2020)(Statistical Moment - an Overview | ScienceDirect Topics, n.d.)(Mayer & Brazzell, 1988) and the Position Specific Scoring Matrix (PSSM) (Dehzangi et al., 2017)(Liang & Zhang, 2018). In the proposed study, statistical moments are used for feature extraction from the hormone binding protein sequences.

1.2.1. Statistical Moment

The statistical moment is a commonly used feature extraction technique, applied widely in pattern recognition and for extracting features from biological sequences. The main purpose of the method is to convert sequences into numerical form for better training of the model (Technology et al., 2020). The raw, Hahn and central moments are calculated for the datasets to capture the features of the frequency distribution. Further, to extract hidden patterns and obscure features of the dataset, related quantities such as the Accumulative Absolute Position Incidence Vector (AAPIV), the Reverse Accumulative Absolute Position Incidence Vector (RAAPIV), the frequency matrix, the Position Relative Incidence Matrix (PRIM) and the reverse PRIM are also computed (Butt et al., 2019). An illustrative sketch of this feature extraction idea is given below.
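The sketch below only illustrates the general idea: the sequence is encoded numerically, arranged into a two-dimensional matrix, and raw and central moments are computed together with the amino acid frequency vector and AAPIV. The moment orders, the Hahn moments and the PRIM/RAAPIV terms used by the authors are not reproduced here; everything in this sketch is an assumption.

# Illustrative statistical-moment feature extraction (not the published pipeline).
import math
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}

def sequence_matrix(seq):
    """Arrange the numerically encoded sequence into a roughly square 2-D matrix."""
    codes = [AA_INDEX.get(aa, 0) for aa in seq]
    side = math.ceil(math.sqrt(len(codes)))
    padded = codes + [0] * (side * side - len(codes))
    return np.array(padded, dtype=float).reshape(side, side)

def raw_and_central_moments(mat, order=3):
    """Raw moments M_ij and central moments mu_ij up to the given order."""
    rows, cols = np.indices(mat.shape)
    m00 = mat.sum() or 1.0
    xbar = (rows * mat).sum() / m00
    ybar = (cols * mat).sum() / m00
    feats = []
    for i in range(order + 1):
        for j in range(order + 1):
            if i + j <= order:
                feats.append((rows ** i * cols ** j * mat).sum())                     # raw moment
                feats.append(((rows - xbar) ** i * (cols - ybar) ** j * mat).sum())   # central moment
    return feats

def frequency_and_aapiv(seq):
    """Amino acid frequencies plus the accumulative absolute position incidence vector."""
    freq = [seq.count(aa) for aa in AMINO_ACIDS]
    aapiv = [sum(pos + 1 for pos, ch in enumerate(seq) if ch == aa) for aa in AMINO_ACIDS]
    return freq + aapiv

def extract_features(seq):
    mat = sequence_matrix(seq)
    return raw_and_central_moments(mat) + frequency_and_aapiv(seq)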

1.3. Classification

The classification step is the most important step in developing a computational model. Here, the machine learning model is trained on various input variables so that it becomes capable of predicting which class or category a new input falls into (Classification In Machine Learning | Classification Algorithms | Edureka, n.d.). Various classification algorithms are available, such as the Support Vector Machine (An Introduction to Support Vector Machines (SVM), n.d.), Logistic Regression (Alotaibi, 2019), K-Nearest Neighbors (KNN) (S. Zhang et al., 2018), Naïve Bayes (John & Langley, 2013), Random Forest (Decision Tree vs. Random Forest - Which Algorithm Should You Use?, n.d.) and neural networks (Specht, 1990)(Osman, 2016).

1.3.1. Random Forest

The proposed study uses Random Forest to classify the datasets. Random Forest is a common classification and regression technique in which the algorithm builds a forest of decision trees. The more decision trees are used, the more robust and accurate the resulting classification model becomes. The Random Forest constructs multiple decision trees using the information gain (KENT, 1983) and Gini index (Gini Index - an Overview | ScienceDirect Topics, n.d.) approaches. When predicting a new object, each decision tree gives a prediction for each class. Random Forest chooses the class that receives the greatest number of votes among all the decision trees in the forest, and in the case of regression it takes the average of the outputs of the different trees. The Random Forest algorithm is widely used because of its ability to handle missing values while maintaining accuracy, to avoid overfitting, and to handle large datasets with many dimensions (Decision Tree vs. Random Forest - Which Algorithm Should You Use?, n.d.). A minimal sketch of this classification step is given below.
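The following is a hedged sketch of the classification step using scikit-learn's RandomForestClassifier. The placeholder feature matrix, the number of trees and the other hyperparameters are assumptions for illustration; the paper does not report the exact settings used.

# Sketch of Random Forest classification on statistical-moment features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder feature matrix and labels; in the actual pipeline X would hold the
# statistical-moment features and y the HBP / non-HBP labels for the 400 sequences.
rng = np.random.default_rng(0)
X = rng.random((400, 60))            # 400 sequences, 60 assumed features
y = rng.integers(0, 2, size=400)     # 1 = HBP, 0 = non-HBP

model = RandomForestClassifier(
    n_estimators=200,    # number of decision trees in the forest (assumed value)
    criterion="gini",    # Gini index splitting, as mentioned above
    random_state=42,
)
model.fit(X, y)
print(model.predict(X[:5]))          # each prediction is a majority vote over the trees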

Figure 1. Random Forest classification (The Structure of the Random Forest Classifier | Download Scientific Diagram, n.d.)

2. Evaluation and Testing Methods

2.1. Evaluation Metrics

To evaluate the performance of the proposed model, several metrics are used. Evaluation metrics assess how well the developed model performs. Accuracy, specificity, sensitivity and MCC are the evaluation criteria we have used to measure the performance of this model. These criteria describe the outcomes of the classification model in terms of True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN). These outcomes are shown in the following confusion matrix (Evaluation Metrics Machine Learning, n.d.).

Table 2. Confusion Matrix

Actual Class            Predicted Negative             Predicted Positive
Non-HBPs (Negative)     TN (predicted as non-HBPs)     FP (predicted as HBPs)
HBPs (Positive)         FN (predicted as non-HBPs)     TP (predicted as HBPs)

Accuracy: Accuracy is the degree to which predictions are close to the true values. Over the whole dataset, the proportion of correctly predicted positive and negative instances gives the accuracy (What Is the Difference Between Accuracy and Precision?, n.d.). Mathematically, accuracy is given by:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Sensitivity (Sn): The proportion of positive instances correctly predicted as positive by the classification model is called sensitivity (YERUSHALMY, 1947).

Sn = TP / (TP + FN)

Specificity (Sp): The proportion of negative instances correctly predicted as negative is termed specificity. Mathematically, specificity is given by (Basic Evaluation Measures from the Confusion Matrix – Classifier Evaluation with Imbalanced Datasets, n.d.):

Sp = TN / (TN + FP)

MCC (Matthews Correlation Coefficient): The MCC value is used to evaluate the proposed model by computing the correlation between the actual and predicted classes over the positive and negative datasets. If this correlation is high, the predictions are considered accurate. The generalized equation to compute MCC is given below:

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))
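For completeness, a small helper computing these four metrics from the confusion-matrix counts might look as follows; the counts in the usage example are made up for illustration.

# Helper computing accuracy, sensitivity, specificity and MCC from TP/TN/FP/FN counts.
import math

def evaluation_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ((tp * tn) - (fp * fn)) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc

# Example with made-up counts:
print(evaluation_metrics(tp=95, tn=95, fp=5, fn=5))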

2.2. Testing with K-Fold Cross Validation

K-fold cross validation is a testing method used to measure the performance of machine learning models. The whole dataset is divided into k subgroups; one subgroup is kept as the test set and the remaining subgroups are used to train the model. This is repeated k times so that each subgroup serves as the test set once. In this study we set k to 10, so ten iterations were performed for this model (Cross-Validation Tutorial, n.d.)(10 Fold Cross Validation - Google Search, n.d.). A minimal sketch of this protocol is shown below.
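The sketch below shows a 10-fold cross-validation loop with scikit-learn; the placeholder feature matrix, labels and hyperparameters are assumptions, not the authors' exact setup.

# Sketch of 10-fold cross validation with a Random Forest classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.random((400, 60))                 # placeholder statistical-moment features
y = rng.integers(0, 2, size=400)          # placeholder HBP / non-HBP labels

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(
    RandomForestClassifier(n_estimators=200, random_state=42),
    X, y, cv=cv, scoring="accuracy",
)
print(scores)            # per-fold accuracies
print(scores.mean())     # the final 10-fold CV score is the average over the folds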

The results of 10-fold cross validation are shown in the table below.

Table 3. 10-Fold Cross Validation Results for HBP-SRF

Fold    Accuracy    Specificity    Sensitivity    MCC
K1      66.5        70             70             0.62
K2      100         100            100            1
K3      97.5        100            95             0.95
K4      100         100            100            1
K5      97.5        95             100            0.95
K6      97.5        100            95             0.95
K7      97.5        100            95             0.95
K8      100         100            100            1
K9      97.5        100            95             0.95
K10     97.4        100            94.74          0.94

Final 10-fold CV score = 95.14

Figure 2. ROC curve for cross validation

2.3. Testing with Jackknife Testing

Another evaluation method, jackknife testing, is also used to test the proposed model. The jackknife is a resampling method commonly used for bias and error estimation in statistical problems. For estimating a statistic, it leaves out one observation at a time from the whole dataset, and finally averages over all the estimates to generate the result (Jackknife Test - an Overview | ScienceDirect Topics, n.d.). The model performed very well under jackknife testing and achieved an accuracy of 94.5%. A sketch of this leave-one-out procedure is given below.
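The following is a hedged sketch of jackknife (leave-one-out) testing with scikit-learn: each sequence is held out once, the model is trained on the rest, and the held-out prediction is recorded. The data and hyperparameters are placeholders, and the loop is deliberately simple rather than optimized.

# Sketch of jackknife (leave-one-out) testing for the classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
X = rng.random((400, 60))                 # placeholder features
y = rng.integers(0, 2, size=400)          # placeholder labels

correct = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(X[train_idx], y[train_idx])
    correct += int(clf.predict(X[test_idx])[0] == y[test_idx][0])

print("Jackknife accuracy:", correct / len(y))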

Predictor Name    Accuracy%    Specificity%    Sensitivity%
HBP-SRF           95.1         96.2            95.45

Table 4. Jackknife testing results for HBP-SRF

3. Comparison with Existing Models

The proposed model is compared with existing state-of-the-art models to evaluate its performance. HBP-SRF is compared with the existing model iHBP-DeepPSSM (Akbar et al., 2020), which used three datasets and achieved accuracies of 94.41%, 92.31% and 90.48%. HBP-SRF is also compared with the HBP prediction model of Tang et al. (Tang et al., 2018), with iGHBP (Basith et al., 2018), and with the prediction model of Wang et al., which achieved accuracies of 84.90%, 84.96% and 90.7%, respectively. The proposed HBP-SRF model performs well compared with all these existing models and obtains remarkable results, which are shown in the table below:

Predictor Name    Accuracy%    Specificity%    Sensitivity%
HBP-SRF           95.14        96.5            94.47
iHBP-DeepPSSM     94.41        97.30           94.12
Tang et al.       84.90        88.60           81.30
iGHBP             84.96        88.62           81.30

Table 5. Comparison of HBP-SRF with other models

4. Conclusion

In this work, an intelligent computational model was developed for the prediction of hormone binding proteins using machine learning approaches. The dataset was collected from protein databases and passed through the statistical moment feature extraction method, and the Random Forest algorithm was used as the classifier. Finally, the model was assessed using 10-fold cross validation and jackknife testing, achieving accuracies of 95.14% and 94.5%, respectively. The obtained results are better than those of existing models, so the proposed model can be considered more accurate and efficient than the previous models available in the literature.

REFERENCES

[1] 10 fold cross validation - Google Search. (n.d.). Retrieved March 5, 2021, from https://www.google.com/search?q=10+fold+cross+validation&client=avast&sxsrf=ALeKk01oyrE_fvMvNvQ3YgEuXflGCH_sPw:1614921537270&source=lnms&tbm=isch&sa=X&ved=2ahUKEwimqsbEs5jvAhUbiFwKHUS_CwoQ_AUoAXoECBUQAw&biw=1366&bih=635#imgrc=M0d-gzdf1eyudM

[2] Akbar, S., Khan, S., Ali, F., Hayat, M., Qasim, M., & Gul, S. (2020). iHBP-DeepPSSM: Identifying hormone binding proteins using PsePSSM based evolutionary features and deep learning approach. Chemometrics and Intelligent Laboratory Systems, 204, 104103. https://doi.org/10.1016/j.chemolab.2020.104103

[3] Alotaibi, F. M. (2019). Classifying text-based emotions using logistic regression. VAWKUM Transactions on Computer Sciences, 7(1), 31–37. https://doi.org/10.21015/vtcs.v16i2.551

[4] An Introduction to Support Vector Machines (SVM). (n.d.). Retrieved February 1, 2021, from https://monkeylearn.com/blog/introduction-to-support-vector-machines-svm/

[5] Basic evaluation measures from the confusion matrix – Classifier evaluation with imbalanced datasets. (n.d.). Retrieved April 16, 2021, from https://classeval.wordpress.com/introduction/basic-evaluation-measures/

[6] Basith, S., Manavalan, B., Shin, T. H., & Lee, G. (2018). iGHBP: Computational identification of growth hormone binding proteins from sequences using extremely randomised tree. Computational and Structural Biotechnology Journal, 16, 412–420. https://doi.org/10.1016/j.csbj.2018.10.007

[7] Baumann, G. (2001). Growth hormone binding protein 2001. Journal of Pediatric Endocrinology and Metabolism, 14(4), 355–375. https://doi.org/10.1515/JPEM.2001.14.4.355

[8] Butt, A. H., Rasool, N., & Khan, Y. D. (2019). Prediction of antioxidant proteins by incorporating statistical moments based features into Chou's PseAAC. Journal of Theoretical Biology, 473, 1–8. https://doi.org/10.1016/j.jtbi.2019.04.019

[9] Buzuloiu, V. (1987). Image processing architectures. VFAST Transactions on Software Engineering, 8(1), 138–141. https://doi.org/10.21015/vtse.v13i2.508

[10] Carnevali, O., Yada, T., Kaiya, H., Björnsson, B. T., Einarsdóttir, I. E., Johansson, M., & Gong, N. (2018). The impact of initial energy reserves on growth hormone resistance and plasma growth hormone-binding protein levels in rainbow trout under feeding and fasting conditions. Frontiers in Endocrinology, 9. https://doi.org/10.3389/fendo.2018.00231

[11] Chen, Z., Zhao, P., Li, F., Leier, A., Marquez-Lago, T. T., Wang, Y., Webb, G. I., Smith, A. I., Daly, R. J., Chou, K.-C., & Song, J. (2018). iFeature: A Python package and web server for features extraction and selection from protein and peptide sequences. Bioinformatics, 34(14), 2499–2502. https://doi.org/10.1093/bioinformatics/bty140

[12] Classification In Machine Learning | Classification Algorithms | Edureka. (n.d.). Retrieved April 16, 2021, from https://www.edureka.co/blog/classification-in-machine-learning/

[13] Cross-validation Tutorial. (n.d.). Retrieved March 5, 2021, from https://quantdev.ssri.psu.edu/sites/qdev/files/CV_tutorial.html

[14] Decision Tree vs. Random Forest - Which Algorithm Should you Use? (n.d.). Retrieved March 5, 2021, from https://www.analyticsvidhya.com/blog/2020/05/decision-tree-vs-random-forest-algorithm/

[15] Dehzangi, A., López, Y., Lal, S. P., Taherzadeh, G., Michaelson, J., Sattar, A., Tsunoda, T., & Sharma, A. (2017). PSSM-Suc: Accurately predicting succinylation using position specific scoring matrix into bigram for feature extraction. Journal of Theoretical Biology, 425, 97–102. https://doi.org/10.1016/j.jtbi.2017.05.005

[16] Evaluation Metrics Machine Learning. (n.d.). Retrieved March 5, 2021, from https://www.analyticsvidhya.com/blog/2019/08/11-important-model-evaluation-error-metrics/

[17] Gini Index - an overview | ScienceDirect Topics. (n.d.). Retrieved April 16, 2021, from https://www.sciencedirect.com/topics/mathematics/gini-index

[18] Hajisharifi, Z., Piryaiee, M., Mohammad Beigi, M., Behbahani, M., & Mohabatkar, H. (2014). Predicting anticancer peptides with Chou's pseudo amino acid composition and investigating their mutagenicity via Ames test. Journal of Theoretical Biology, 341, 34–40. https://doi.org/10.1016/j.jtbi.2013.08.037

[19] Hormone Binding Proteins Chart - Women's International Pharmacy. (n.d.). Retrieved April 16, 2021, from https://www.yumpu.com/en/document/view/38053145/hormone-binding-proteins-chart-womens-international-pharmacy

[20] Jackknife Test - an overview | ScienceDirect Topics. (n.d.). Retrieved January 26, 2021, from https://www.sciencedirect.com/topics/nursing-and-health-professions/jackknife-test

[21] John, G. H., & Langley, P. (2013). Estimating continuous distributions in Bayesian classifiers. http://arxiv.org/abs/1302.4964

[22] Kent, J. T. (1983). Information gain and a general measure of correlation. Biometrika, 70(1), 163–173. https://doi.org/10.1093/biomet/70.1.163

[23] Li, F. M., & Wang, X. Q. (2016). Identifying anticancer peptides by using improved hybrid compositions. Scientific Reports, 6(1), 1–6. https://doi.org/10.1038/srep33910

[24] Liang, Y., & Zhang, S. (2018). Identify Gram-negative bacterial secreted protein types by incorporating different modes of PSSM into Chou's general PseAAC via Kullback–Leibler divergence. Journal of Theoretical Biology, 454, 22–29. https://doi.org/10.1016/j.jtbi.2018.05.035

[25] Maros, M. E., Capper, D., Jones, D. T. W., Hovestadt, V., von Deimling, A., Pfister, S. M., Benner, A., Zucknick, M., & Sill, M. (2020). Machine learning workflows to estimate class probabilities for precision cancer diagnostics on DNA methylation microarray data. Nature Protocols, 15(2), 479–512. https://doi.org/10.1038/s41596-019-0251-6

[26] Mayer, P. R., & Brazzell, R. K. (1988). Application of statistical moment theory to pharmacokinetics. The Journal of Clinical Pharmacology, 28(6), 481–483. https://doi.org/10.1002/j.1552-4604.1988.tb03164.x

[27] Mu, Z., Yu, T., Qi, E., Liu, J., & Li, G. (2019). DCGR: Feature extractions from protein sequences based on CGR via remodeling multiple information. BMC Bioinformatics, 20(1), 351. https://doi.org/10.1186/s12859-019-2943-x

[28] Nawaz, M., Paracha, M. A., Majid, A., & Durad, H. (2020). Attack detection from network traffic using machine learning. VFAST Transactions on Software Engineering, 8(1). https://doi.org/10.21015/VTSE.V8I1.571

[29] Nguyen, V. N., Huang, K. Y., Huang, C. H., Lai, K. R., & Lee, T. Y. (2017). A new scheme to characterize and identify protein ubiquitination sites. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 14(2), 393–403. https://doi.org/10.1109/TCBB.2016.2520939

[30] Osman, A. H. (2016). An evaluation model of teaching assistant using artificial neural network. VAWKUM Transactions on Computer Sciences, 11(2), 10. https://doi.org/10.21015/vtcs.v11i2.438

[31] Ozzola, G. (2016). Essay of sex hormone binding protein in internal medicine: A brief review. La Clinica Terapeutica, 167(5), e127–e129. https://doi.org/10.7417/CT.2016.1956

[32] Qu, K., Han, K., Wu, S., Wang, G., & Wei, L. (2017). Identification of DNA-binding proteins using mixed feature representation methods. Molecules, 22(10). https://doi.org/10.3390/molecules22101602

[33] Rehman, S., & Shahzad, I. (2018). Brain tumor detection by using computer vision based on multi-level image filteration. VAWKUM Transactions on Computer Sciences, 15(1), 41. https://doi.org/10.21015/vtcs.v15i1.488

[34] Sohm, F., Manfroid, I., Pezet, A., Rentier-Delrue, F., Rand-Weaver, M., Kelly, P. A., Boeuf, G., Postel-Vinay, M. C., De Luze, A., & Ederyt, M. (1998). Identification and modulation of a growth hormone-binding protein in rainbow trout (Oncorhynchus mykiss) plasma during seawater adaptation. General and Comparative Endocrinology, 111(2), 216–224. https://doi.org/10.1006/gcen.1998.7106

[35] Specht, D. F. (1990). Probabilistic neural networks. Neural Networks, 3(1), 109–118. https://doi.org/10.1016/0893-6080(90)90049-Q

[36] Statistical Moment - an overview | ScienceDirect Topics. (n.d.). Retrieved March 1, 2021, from https://www.sciencedirect.com/topics/engineering/statistical-moment

[37] Steroid Binding Protein - an overview | ScienceDirect Topics. (n.d.). Retrieved April 15, 2021, from https://www.sciencedirect.com/topics/medicine-and-dentistry/steroid-binding-protein

[38] Tan, J. X., Li, S. H., Zhang, Z. M., Chen, C. X., Chen, W., Tang, H., & Lin, H. (2019). Identification of hormone binding proteins based on machine learning methods. Mathematical Biosciences and Engineering, 16(4), 2466–2480. https://doi.org/10.3934/mbe.2019123

[39] Tang, H., Zhao, Y. W., Zou, P., Zhang, C. M., Chen, R., Huang, P., & Lin, H. (2018). HBPred: A tool to identify growth hormone-binding proteins. International Journal of Biological Sciences, 14(8), 957–964. https://doi.org/10.7150/ijbs.24174

[40] Technology, I., Aziz, K. A., & Arabia, S. (2020). Prediction of Saudi Arabia SARS-CoV-2 diversifications in protein strain against China strain. 8(1), 64–73.

[41] The structure of the Random Forest classifier | Download Scientific Diagram. (n.d.). Retrieved April 16, 2021, from https://www.researchgate.net/figure/The-structure-of-the-Random-Forest-classifier_fig3_338162309

[42] Wang, K., Li, S., Wang, Q., & Hou, C. (2019). Identification of hormone-binding proteins using a novel ensemble classifier. Computing, 101, 693–703. https://doi.org/10.1007/s00607-018-0682-x

[43] Wang, L., Zhao, Y., Chen, Y., & Wang, D. (2017). The effect of three novel feature extraction methods on the prediction of the subcellular localization of multi-site virus proteins. https://doi.org/10.1080/21655979.2017.1373536

[44] What is the Difference Between Accuracy and Precision? (n.d.). Retrieved April 16, 2021, from https://www.forecast.app/faqs/what-is-the-difference-between-accuracy-and-precision

[45] Yerushalmy, J. (1947). Statistical problems in assessing methods of medical diagnosis, with special reference to X-ray techniques. Public Health Reports, 62(40), 1432–1449. https://doi.org/10.2307/4586294

[46] Z. Ju, J. J. He, Prediction of lysine crotonylation... - Google Scholar. (n.d.). Retrieved January 5, 2021, from https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=+Z.+Ju%2C+J.J.+He%2C+Prediction+of+lysine+crotonylation+sites+by+incorporating+the+composition+of+k-spaced+amino+acid+pairs+into+Chou%27s+general+PseAAC%2C+J.+Mol.+Graph.+Model.+77+%282017%29+200-204.+&btnG=

[47] Zhang, S., Cheng, D., Deng, Z., Zong, M., & Deng, X. (2018). A novel kNN algorithm with data-driven k parameter computation. Pattern Recognition Letters, 109, 44–54. https://doi.org/10.1016/j.patrec.2017.09.036

[48] Zhang, Y., & Marchant, T. A. (1999). Identification of serum GH-binding proteins in the goldfish (Carassius auratus) and comparison with mammalian GH-binding proteins. Journal of Endocrinology, 161(2), 255–262. https://doi.org/10.1677/joe.0.1610255
