
Academic year: 2022


Detection Of Significant Patterns In Cervical And Breast Cancer Proteins

Pranathi Jalapally1, K. Suvarna Vani2, Sravya Madala3

1,2,3Velagapudi Ramakrishna Siddhartha Engineering College, Autonomous, JNTUK Vijayawada, AP-India


ABSTRACT

Cancers of the cervix and breast kill more women than any other form of cancer throughout the developing world. According to the World Health Organization (WHO), women with human papillomavirus (HPV) infection are vulnerable to developing breast cancer over their lifetime. In this project, the secondary structures and the most significant patterns in both cervical and breast cancer proteins are identified using protein contact maps. A protein contact map is a two-dimensional matrix that represents the contacts among amino acid residues. These ideas are explored in the all-alpha, all-beta, alpha+beta and alpha/beta structural classes of proteins to identify structural elements. Our proposed method obtained an accuracy of 92% for secondary structure prediction, compared with previous related works. The method successfully predicts the secondary structures and the significant patterns formed by off-diagonal interactions in the contact map. In the future, a better understanding of these extracted features will help predict whether a person with cervical cancer is vulnerable to developing breast cancer over her lifetime.

Keywords

Distance Matrix, Breast Cancer, Cervical Cancer, Dynamic Programming, Contact Map.

Introduction

Breast and cervical cancers are the most common cancers affecting women worldwide.

The World Health Organization has reported that the causes of cervical and breast cancer are related. Women with human papillomavirus (HPV) infection are vulnerable to developing breast cancer over their lifetime, the findings of a new study suggest. Of the roughly 100 types of HPV, about 10 are classified as "high-risk" because they can cause abnormal cells that may go on to cause genital or cervical cancers. Among 855 breast cancer cases, the researchers identified 30 low-risk and 20 high-risk HPV types. While not conclusive, the finding strongly supports a growing body of evidence linking HPV infection to breast cancer. Researchers have reported that HPV is transmitted through white blood cells (WBCs) from the cervix and may later spread throughout the body, including to the breast. They also identified a higher risk of developing cancer in women with mutations in BRCA1 or BRCA2.

The proposed work offers an alternative route to classical problems of computational biology, such as identifying protein secondary structure elements and protein fold signatures, and uses dynamic programming to identify the significant patterns in breast and cervical cancer proteins. Our method focuses on the contact maps of the HER2, HPV16 and HPV18 proteins and of BRCA1 and BRCA2 gene-related proteins. As the incidence of these two cancers increases, many questions remain unanswered, and studying these proteins and their contact maps is necessary to address them. We also compare the outcomes of existing methods with those of the proposed system. In this research, we identify the secondary structures and the unexplored regions, referred to as significant patterns, in both breast and cervical cancer proteins by masking the secondary structures in their contact maps. Existing methods concentrated only on the secondary structures, and they have also stated that there exists a


lifetime. The proposed study attempts to shed light on some of these unexplored areas, such as how the two cancers are related to each other, by studying the inter-fold and intra-fold patterns of their proteins.

Literature Review

Khodabandehlou, Mostafaei et al. proposed a method that uses polymerase chain reaction (PCR) and genotyping to measure expression levels in ovarian and breast cancer [6]. The presence of high-risk HPVs (HR-HPVs) in breast cancer, especially types HPV16 and HPV18, has been linked to invasive breast carcinomas. It was also discovered that the HR-HPV type 16 E6/E7 oncoproteins cause non-invasive and non-metastatic breast carcinoma cells to become invasive and metastatic. These findings show that HR-HPVs are present and play a vital role in the development of breast cancer and metastasis [8].

Kashyap, Somani et al. proposed a method for diagnosing cervical cancer cells that involves image segmentation, noise removal and feature extraction, and classifies the images according to the severity of the cancer [2]. Praveen Kumar and Suvarna Vani proposed a SMOTE-based method for protein fold identification using machine learning on contact maps [1]. Through comprehensive testing on noisy and predicted contact maps, they also demonstrate that the method addresses protein fold prediction rather than just protein fold recognition. Saha, Ekbal et al. proposed MEMM and CRF methods for secondary structure prediction with 61% accuracy [4].

Qing, Tulake, Ru, Li, Yuemaier, Lidifu and Abudula used proteomics to identify potential biomarkers common to human papillomavirus infection and cervical carcinoma. They present a profile of 67 proteins that are differentially expressed in HPV16-positive cervical squamous cell carcinoma (SCC) versus HPV-negative NC [3].

Zhang, Li and Lü proposed a deep network method that predicts secondary structures on the CB513 dataset with 71% accuracy [5]. Yaseen and Li proposed a context-based feature method for secondary structure prediction with 95% accuracy [7]. Methods proposed in the past could not establish a clear relationship between breast and cervical cancer and could not predict the development of either cancer. Our proposed method predicts the secondary structures in the HER2, HPV16, HPV18, BRCA1 and BRCA2 proteins of breast and cervical cancer with 92% accuracy. The proposed system also helps identify unexplored regions of the proteins that cause the cancer. It is therefore hoped that a better understanding of the extracted features will help forecast whether a person with cervical cancer is vulnerable to developing breast cancer over her lifetime.

Methodology

The proposed system is shown diagrammatically in Fig 1 and involves the following steps. First, the PDB files of the proteins are retrieved, and the x, y and z coordinates of their Cα atoms are extracted. Second, the Euclidean distances between these atoms are calculated and the distance map is built. Third, the contact map of each protein is constructed and patterns are extracted from it. Finally, the significant patterns in both breast and cervical cancer proteins are identified.


Fig 1. Methodology

A. Extraction of a PDB file

A PDB file is a plain-text file in the Protein Data Bank format that describes the experimentally determined three-dimensional structure of a macromolecule such as a protein. Researchers deposit and exchange protein atomic coordinates through the Protein Data Bank using PDB files.

Historically, coordinates were exchanged on punched cards, so each record is limited to the 80-column card width.

HEADER, TITLE and AUTHOR records, among numerous other record types, describe the structure and the researchers who determined it. Similarly, SEQRES records list the residue sequence of each peptide chain (for example, chains A, B and C); for long chains these records usually span multiple lines. In this project, we retrieved the PDB files of proteins associated with breast and cervical cancer, such as HPV16, HPV18, BRCA1, BRCA2 and HER2.
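The SEQRES records described above can be read with a few lines of code. The sketch below is a minimal illustration, not the parser used in this work: it assumes the standard fixed-column SEQRES layout (chain identifier in column 12, residue names from column 20). The function name `seqres_sequences` and the mapping of non-standard residues to `X` are our own choices.

```python
# Standard three-letter to one-letter codes for the 20 amino acids;
# any other residue name is mapped to "X" in this sketch.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_sequences(pdb_lines):
    """Return {chain_id: one-letter sequence} built from SEQRES records.

    SEQRES is a fixed-column record: the chain ID sits in column 12 and the
    residue names start at column 20, so a long chain spans several lines
    that are concatenated here in file order.
    """
    chains = {}
    for line in pdb_lines:
        if not line.startswith("SEQRES"):
            continue
        chain_id = line[11]             # column 12 (0-based index 11)
        residues = line[19:].split()    # residue names from column 20
        seq = "".join(THREE_TO_ONE.get(r, "X") for r in residues)
        chains[chain_id] = chains.get(chain_id, "") + seq
    return chains
```

For example, with a local copy of an entry saved as `3pxa.pdb` (a hypothetical path), `seqres_sequences(open("3pxa.pdb"))` returns one sequence string per chain.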

B. Extracting the Coordinates

A PDB file contains various types of records, organized in a precise order to describe the structure.

Record types present in a PDB file include ATOM, HETATM, TER, HELIX, SHEET and SSBOND. Each ATOM record is an atomic coordinate record that holds the x, y and z coordinates of one atom, expressed in ångströms (Å). Here, we extracted the x, y and z coordinates of the Cα atoms of peptide chain A of each protein using a PDB parser.
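As a stand-in for the PDB parser mentioned above (the paper does not say which one was used), the following sketch reads Cα coordinates straight from the fixed-column ATOM records. It assumes the standard column layout of the PDB format; keeping only the first model and the first alternate location are our assumptions for NMR and multi-conformer entries, and `ca_coordinates` is a hypothetical helper name.

```python
def ca_coordinates(pdb_lines, chain_id="A"):
    """Return [(res_seq, (x, y, z)), ...] for the C-alpha atoms of one chain.

    Reads the fixed-column ATOM records directly: atom name in columns 13-16,
    alternate location in 17, chain ID in 22, residue number in 23-26 and the
    x, y, z coordinates (in angstroms) in columns 31-38, 39-46 and 47-54.
    """
    coords = []
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # multi-model (e.g. NMR) files: keep only the first model
        if not line.startswith("ATOM  "):
            continue  # skips HETATM, so calcium ions named "CA" are ignored
        if line[12:16].strip() != "CA" or line[21] != chain_id:
            continue
        if line[16] not in (" ", "A"):
            continue  # keep only the first alternate location
        x, y, z = float(line[30:38]), float(line[38:46]), float(line[46:54])
        coords.append((int(line[22:26]), (x, y, z)))
    return coords
```

A library parser such as Biopython's `Bio.PDB.PDBParser` handles more edge cases (insertion codes, hetero residues, malformed files) and would normally be preferred in practice.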

C. Calculating the Euclidean Distance and Distance Map

The Euclidean distance between two points is the Minkowski distance of order 2. Here, we calculated the Euclidean distance using the coordinates extracted from the protein PDB files. The Euclidean distance is given by

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

where n is the number of dimensions (n = 3 for the x, y and z coordinates of an atom). The Euclidean distance combines the numerical differences between the corresponding coordinates of points p and q. Similarly, the distances between residues x and y of the primary sequence chain, numbered from the N to the C terminus, form a two-dimensional symmetric matrix known as the distance map, in which entry (x, y) holds the distance between those two residues.
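The distance map can be computed directly from the Cα coordinates. In this minimal sketch, `distance_map` is a hypothetical helper; it assumes the coordinates are given as (x, y, z) tuples in residue order from the N to the C terminus and uses `math.dist` (Python 3.8 or later) for the Euclidean distance defined above.

```python
import math


def distance_map(coords):
    """Build the symmetric n x n distance map from n C-alpha coordinates.

    Entry [x][y] is the Euclidean distance d(p, q) = sqrt(sum_i (p_i - q_i)^2)
    between residues x and y, numbered from the N to the C terminus.
    """
    n = len(coords)
    dist = [[0.0] * n for _ in range(n)]  # diagonal stays 0: a residue to itself
    for x in range(n):
        for y in range(x + 1, n):
            d = math.dist(coords[x], coords[y])  # Euclidean distance
            dist[x][y] = dist[y][x] = d          # the map is symmetric
    return dist
```

For instance, the points (1, 2, 3) and (4, 6, 3) are √(3² + 4² + 0²) = 5 Å apart, so `distance_map([(1, 2, 3), (4, 6, 3)])` gives `[[0.0, 5.0], [5.0, 0.0]]`. Only the upper half is computed and mirrored, since the map is symmetric.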

D. Construction of Contact Map

A contact map is a binary two-dimensional matrix that records, for every possible pair of amino acid residues, whether the two residues are in contact. Here, we constructed a contact map for each breast and cervical cancer protein from its distance map and a threshold distance T, which is applied to the distance between Cα atoms: residue pairs whose Cα atoms are no farther apart than T are in contact. In this project, we varied the threshold distance from 5 Å to 15 Å and then set it to 7 Å, which gave the best results.

Algorithm 1: Generation of the Contact Map
Input: the distance matrix A[n][n], threshold T
Output: the contact map Contact[n][n]

T ← 7 Å
for i ← 0 to n-1 do
    for j ← 0 to n-1 do
        if A[i][j] ≤ T then Contact[i][j] ← 1
        else Contact[i][j] ← 0
        end if
    end for
end for
return Contact

Algorithm 1 generates the contact map from the distance matrix: every value in the distance matrix that is less than or equal to the threshold (7 Å) is set to 1 in the contact map, and every other value is set to 0.
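Algorithm 1 and the threshold sweep described earlier can be sketched as follows. This is an illustration under assumptions: the distance matrix is a list of lists in ångströms, entries equal to the threshold count as contacts (matching the ≤ test in Algorithm 1), and `contact_map` and `contact_counts` are hypothetical names. Counting contacts per threshold is only one way to compare thresholds; the paper does not state which criterion led to 7 Å.

```python
def contact_map(dist, threshold=7.0):
    """Algorithm 1: 1 where the C-alpha distance is <= threshold (in angstroms), else 0."""
    return [[1 if d <= threshold else 0 for d in row] for row in dist]


def contact_counts(dist, thresholds):
    """Number of contacts (1-entries, self-contacts included) for each candidate threshold."""
    return {t: sum(map(sum, contact_map(dist, t))) for t in thresholds}


# Small example: four residues with consecutive C-alpha atoms 3.8 angstroms apart.
dist = [
    [0.0, 3.8, 6.5, 10.2],
    [3.8, 0.0, 3.8, 6.9],
    [6.5, 3.8, 0.0, 3.8],
    [10.2, 6.9, 3.8, 0.0],
]
print(contact_map(dist))                # [[1, 1, 1, 0], [1, 1, 1, 1], [1, 1, 1, 1], [0, 1, 1, 1]]
print(contact_counts(dist, [5, 7, 11]))  # {5: 10, 7: 14, 11: 16}
```

Diagonal entries (distance 0) always count as contacts, as they do in Algorithm 1.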

Algorithm 2: ExtractPatterns
function Pattern(Contact, i, j)
    H ← Contact.length
    L ← Contact[0].length
    if (i < 0 || j < 0 || i ≥ H || j ≥ L || Contact[i][j] ≠ 1) return
    Contact[i][j] ← 0   // mark the cell as visited
    Pattern(Contact, i-1, j)
    Pattern(Contact, i+1, j)
    Pattern(Contact, i, j-1)
    Pattern(Contact, i, j+1)
end function

E. Extraction of Patterns from the Contact Map

Secondary structures such as helices and beta sheets are extracted along the diagonal of the contact maps of the breast and cervical cancer proteins. The secondary structures are predicted by varying the band width w from 3 to 5 and testing all bands of length l. The prediction accuracy depends on the width w. In this project, we predicted the total number of helices and beta sheets present in the breast and cervical cancer proteins for widths from 3 to 5. We also compared our results with those of an existing method that predicts the total number of helices present in the proteins.
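The band test can be illustrated with a small sketch. We assume, as an interpretation rather than the authors' exact rule, that a helix shows up as a run of residues i for which every contact from i+1 to i+w is present (a fully filled band of width w just above the diagonal), and that a run must cover at least l consecutive start positions to count. The function `diagonal_bands` is hypothetical; beta sheets, which appear as lines away from the main diagonal, are not handled here.

```python
def diagonal_bands(contact, w, min_len):
    """Hypothetical band detector for helix-like segments.

    Position i is 'filled' when contact[i][i + k] == 1 for every k in 1..w,
    i.e. the band of width w just above the diagonal is fully in contact.
    Runs of at least min_len consecutive filled positions are returned as
    inclusive residue spans (first, last).
    """
    n = len(contact)
    filled = [all(contact[i][i + k] == 1 for k in range(1, w + 1))
              for i in range(n - w)]
    segments, start = [], None
    for i, is_filled in enumerate(filled + [False]):  # sentinel closes a final run
        if is_filled and start is None:
            start = i
        elif not is_filled and start is not None:
            if i - start >= min_len:
                # the last filled position, i - 1, reaches residue i - 1 + w
                segments.append((start, i - 1 + w))
            start = None
    return segments
```

The number of helices for a given width is then `len(diagonal_bands(contact, w, l))`, repeated for w = 3, 4 and 5. With a 7 Å threshold this rule is plausible because Cα atoms at i+3 and i+4 lie roughly 5 to 6.5 Å apart in an α-helix but about 10 Å or more apart in an extended strand.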

Algorithm 3: Finding the Number of Significant Patterns
Input: Contact[i][j], 0 ≤ i ≤ m-1, 0 ≤ j ≤ n-1
Output: the number of significant patterns, density

if (Contact == null || Contact.length == 0) return 0
density ← 0
for i ← 0 to m-1 do
    for j ← 0 to n-1 do
        if (Contact[i][j] == 1) then
            density ← density + 1
            Pattern(Contact, i, j)
        end if
    end for
end for
return density
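Algorithms 2 and 3 together amount to counting connected regions of 1s in the contact map (a flood fill). A Python sketch follows; it keeps the 4-neighbour moves of Algorithm 2 but, as our own choices, works on a copy of the map, marks visited cells with 0, and uses an explicit stack instead of recursion so that maps of several hundred residues do not exceed Python's default recursion limit. The name `count_patterns` is hypothetical.

```python
def count_patterns(contact):
    """Algorithms 2 and 3: count 4-connected regions of 1s in a contact map.

    Works on a copy so the caller's map is not modified; visited cells are
    set to 0, and an explicit stack replaces the recursive Pattern calls.
    """
    if not contact or not contact[0]:
        return 0
    grid = [row[:] for row in contact]
    h, l = len(grid), len(grid[0])
    density = 0
    for i in range(h):
        for j in range(l):
            if grid[i][j] != 1:
                continue
            density += 1          # a new, not yet visited pattern
            grid[i][j] = 0        # mark visited
            stack = [(i, j)]
            while stack:
                r, c = stack.pop()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < h and 0 <= nc < l and grid[nr][nc] == 1:
                        grid[nr][nc] = 0
                        stack.append((nr, nc))
    return density
```

On the map `[[1, 1, 0, 0], [0, 1, 0, 1], [0, 0, 0, 1], [1, 0, 1, 0]]` it returns 4, because the cells (2, 3) and (3, 2) touch only diagonally and are therefore separate under 4-connectivity.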

F. Identifying the Significant Patterns using Dynamic Programming

The significant patterns are identified in three steps. First, the secondary structures along the diagonal of the contact map are identified. Second, the upper triangular part of the contact map is masked so that the significant patterns can be extracted from the off-diagonal region. Finally, the significant patterns are extracted from the lower triangular part of the contact map using dynamic programming.

The significant patterns in breast cancer proteins and cervical cancer proteins, identified using dynamic programming, are shown in Fig 5 and Fig 6. As shown in Algorithm 2, the masked contact map is given as input, and the significant patterns are identified using depth-first search (DFS). The number of significant patterns is obtained using Algorithm 3. Each index of the contact map is traversed; if the value at that index is 1, all 8 of its neighbors are checked, and every neighbor that is also 1 is merged (unioned) with that index into the same pattern.
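The masking step and the 8-neighbor union described above can be combined in one sketch. It is an illustration under assumptions, not the authors' code: the upper triangle is masked by keeping only cells with i > j, the secondary-structure band is masked by also dropping cells with i - j ≤ `band` (the band width is our assumption), and patterns are formed by union-find over the 8 neighbors of each cell. Algorithm 2 moves to 4 neighbors while this paragraph describes 8, so the two can give different counts when cells touch only at a corner. The name `significant_patterns` is hypothetical.

```python
def significant_patterns(contact, band):
    """Hypothetical sketch of step F.

    Keeps only lower-triangle cells farther than `band` from the diagonal
    (i - j > band), which masks both the upper triangle and the
    secondary-structure band, then groups 8-connected 1-cells with
    union-find and returns the number of groups.
    """
    n = len(contact)
    cells = {(i, j) for i in range(n) for j in range(i - band)
             if contact[i][j] == 1}
    parent = {c: c for c in cells}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for (i, j) in cells:
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                nb = (i + di, j + dj)
                if nb != (i, j) and nb in cells:
                    parent[find(nb)] = find((i, j))  # union the two patterns
    return len({find(c) for c in cells})
```

For a contact map whose only lower-triangle contacts outside a band of width 2 are (6, 0), (7, 1) and (7, 4), `significant_patterns(contact, 2)` returns 2: the first two cells touch at a corner and merge, while (7, 4) stays separate.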

Results

The proposed algorithms produce the following outputs:

1. Number of Secondary Structures in Breast and Cervical Cancer Proteins.

2. Number of Significant Patterns in Breast and Cervical Cancer Proteins.

3. The Two Dimensional View of the Significant Patterns in Breast and Cervical Cancer Proteins.

In this paper, we found that our proposed system produced more precise and accurate results than the existing system for the extraction of secondary structures. Here, we also


all the values that are reported, the highest accuracy is recorded for a width of four. The proposed method obtained an accuracy of 92% for the prediction of secondary structures, compared with previous related works, as shown in Fig 9. The numbers of significant patterns that the proposed system found in breast and cervical cancer proteins using dynamic programming are shown in Fig 5 and Fig 6. Two-dimensional representations of the significant patterns of breast and cervical cancer proteins are shown in Fig 7 and Fig 8.

Fig 2. Number of Helices in Breast Cancer Proteins

Fig 3. Number of Helices in Cervical Cancer Proteins

Fig 4. Number of Beta Sheets in Cervical Cancer Proteins

Fig 5. Number of Significant Patterns in Breast Cancer Proteins

Fig 6. Number of Significant Patterns in Cervical Cancer Proteins

Fig 7. Patterns in Cervical Cancer Protein (2lzj)

Fig 8. Patterns in Breast Cancer Protein (3pxa)

Fig 9. Performance of the proposed method

Conclusion

Women with human papillomavirus (HPV) infection are vulnerable to developing breast cancer over their lifetime. In this research, we conducted a detailed analysis of contact maps in order to derive features that pertain to fold information in both cervical and breast cancer proteins. The main focus of this project is to extract the significant patterns and secondary structures in both cervical and breast cancer proteins. Our proposed method obtained an accuracy of 92% for the prediction of secondary structures in the HER2, HPV16, HPV18, BRCA1 and BRCA2 proteins of breast and cervical cancer. As future work, the significant patterns of the two cancers' proteins need to be compared with each other. It is hoped that a better understanding of the extracted features will help forecast whether a person with cervical cancer is vulnerable to developing breast cancer over her lifetime.


References

[1]. Suvarna Vani K., Praveen Kumar K.: Protein fold identification using machine learning methods on contact maps. IEEE Conference on CIBCB, 1-6, DOI: 10.1109/CIBCB.2016.7758096. (2016)

[2]. Kashyap D., Somani A., Shekhar J., Bhan A., Dutta M.K., Burget R., Riha K.: Cervical cancer detection and classification using Independent Level sets and multi SVMs. Telecommunications and Signal Processing (TSP), 523-528, DOI: 10.5958/0976-5506.2018.01759.X. (2016)

[3]. Qing S., Tulake W., Ru M., Li X., Yuemaier R., Lidifu D., Abudula A.: Proteomic identification of potential biomarkers for cervical squamous cell carcinoma and human papillomavirus infection. International Society for Oncology and Biomarkers (ISOBM) 39, DOI: 10.1177/1010428317697547. (2017)

[4]. Saha S., Ekbal A., Sharma S., Bandyopadhyay S., Maulik U.: Protein Secondary Structure Prediction Using Machine Learning. In: Abraham A., Thampi S. (eds) Intelligent Informatics. Advances in Intelligent Systems and Computing, vol 182, 57-63. Springer, Berlin, Heidelberg, DOI: 10.1007/978-3-642-32063-7_7. (2013)

[5]. Zhang B., Li J., Lü Q.: Prediction of 8-state protein secondary structures by a novel deep learning architecture. BMC Bioinformatics, 19(1), DOI: 10.1186/s12859-018-2280-5. (2018)

[6]. Khodabandehlou N., Mostafaei S., Etemadi A., Ghasemi A., Payandeh M., Hadifar S., Norooznezhad A.H., Kazemnejad A., Moghoofei M.: Human papilloma virus and breast cancer: the role of inflammation and viral expressed proteins. BMC Cancer, 19(61), 2-11, DOI: 10.1186/s12885-019-5286-0. (2019)

[7]. Yaseen A., Li Y.: Context-based features enhance protein secondary structure prediction accuracy. Journal of Chemical Information and Modeling, 54(3), 992-1002. (2014)

[8]. Alshammari F.D.: Association between HPV, CMV, EBV and HS Viruses and Breast Cancer in Saudi Arabia. Journal of Cancer Prevention & Current Research, 7(3), 00236, DOI: 10.15406/jcpcr.2017.07.00236. (2017)
