(1)

Protein Structure Prediction Using Neural Networks

Martha Mercaldi Kasia Wilamowska

Literature Review December 16, 2003

(2)

The Protein Folding Problem

(3)

Evolution of Neural Networks

Neural networks were originally designed to approximate the connections between neurons in the brain

(4)

Evolution of Neural Networks

(5)

Why use Neural Nets for Protein Folding?

• Successful applications in:

– Secondary structure prediction
– Solvent accessibility

• No “inherent shortcoming” yet found

• Can incorporate evolutionary information via multiple alignments

• Detect previous misclassifications

(6)

Protein Secondary Structure Prediction Based on Denoeux Belief Neural Network

• Purpose

– Using neural nets, effectively predict the secondary structure of proteins.

• The current best for secondary structure prediction is SSpro8, with accuracy in the range of 62-63%

(7)

Protein Secondary Structure Prediction Based on Denoeux Belief Neural Network

• Input to the system

– Can choose to use DNA or amino acid sequences

– SSpro8 uses amino acid sequences

– The authors’ system, UTMPred, uses DNA

• Output – forms consisting of alpha helices, beta sheets, and loops, expanded to eight structure forms

Protein Secondary Structure Forms
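The slide's table of the eight forms is not reproduced here. Later slides refer to forms H, E, and I; assuming the eight expanded forms are the standard DSSP classes (an assumption, since the original table is unavailable), a minimal reference mapping might look like this:

```python
# Assumed labels for the eight expanded secondary-structure forms (the
# standard DSSP classes); the exact label set used by UTMPred is not shown
# in the slides, so this table is illustrative only.
DSSP_FORMS = {
    "H": "alpha helix",
    "G": "3-10 helix",
    "I": "pi helix",
    "E": "extended strand (beta sheet)",
    "B": "isolated beta bridge",
    "T": "hydrogen-bonded turn",
    "S": "bend",
    "C": "coil / loop",
}
```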

(8)

Protein Secondary Structure Prediction Based on Denoeux Belief Neural Network

(9)

x – the input to the network

Protein Secondary Structure Prediction Based on Denoeux Belief Neural Network

(10)

y – number of input nodes

Protein Secondary Structure Prediction Based on Denoeux Belief Neural Network

(11)

p – a prototype, representing one of the k nearest previously trained inputs to the current test input

Protein Secondary Structure Prediction Based on Denoeux Belief Neural Network

(12)

i – number of prototypes in y-dimensional feature space

Protein Secondary Structure Prediction Based on Denoeux Belief Neural Network

(13)

s – the activation function

Protein Secondary Structure Prediction Based on Denoeux Belief Neural Network

(14)

M – number of output classes

Protein Secondary Structure Prediction Based on Denoeux Belief Neural Network

(15)

u_jq – a weight representing the degree of membership of prototype j in output class q

Protein Secondary Structure Prediction Based on Denoeux Belief Neural Network

(16)

m_jq – the BBA mass, the product of the weight u_jq and the activation function s_j

Protein Secondary Structure Prediction Based on Denoeux Belief Neural Network

(17)

h_jq – the conjunctive combination of the BBAs

Protein Secondary Structure Prediction Based on Denoeux Belief Neural Network
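Taken together, the definitions on slides 9-17 describe a Denoeux-style evidence-theoretic classifier. The sketch below shows one plausible forward pass under the standard formulation: a distance-based activation s_j for each prototype, a BBA that assigns mass u_jq * s_j to class q and the remainder 1 - s_j to the whole frame Ω, and an unnormalized conjunctive (Dempster) combination over the prototypes. The parameter names (gamma, alpha) and the Gaussian activation are assumptions drawn from Denoeux's general model, not the authors' implementation.

```python
import numpy as np

def dbnn_forward(x, prototypes, gamma, alpha, u):
    """One forward pass of a Denoeux-style belief neural network (a sketch,
    not the authors' implementation).

    x          : (y,) input vector (y = number of input nodes)
    prototypes : (i, y) matrix of i prototypes in the y-dimensional feature space
    gamma      : (i,) positive scale parameters, one per prototype (assumed)
    alpha      : (i,) membership strengths in [0, 1], one per prototype (assumed)
    u          : (i, M) degrees of membership u[j, q] of prototype j in class q
    Returns the combined masses over the M classes and the frame Omega.
    """
    num_prototypes, num_classes = u.shape

    # s_j: activation of prototype j, decreasing with distance to the input x.
    squared_dist = np.sum((prototypes - x) ** 2, axis=1)
    s = alpha * np.exp(-gamma * squared_dist)

    # m_jq: BBA mass that prototype j assigns to class q; the remaining mass
    # 1 - s_j expresses ignorance and goes to the whole frame Omega.
    m_class = u * s[:, None]          # shape (i, M)
    m_omega = 1.0 - s                 # shape (i,)

    # h_q: conjunctive (unnormalized Dempster) combination of the i BBAs.
    # Mass lost to conflict (the empty set) is not tracked in this sketch.
    mu_class = np.zeros(num_classes)
    mu_omega = 1.0
    for j in range(num_prototypes):
        mu_class = (mu_class * m_class[j]
                    + mu_class * m_omega[j]
                    + mu_omega * m_class[j])
        mu_omega = mu_omega * m_omega[j]
    return mu_class, mu_omega
```

A class decision can then be made by taking the class with the largest combined mass, optionally redistributing the mass left on Ω across the classes.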

(18)

Protein Secondary Structure Prediction Based on Denoeux Belief Neural Network

• The DBNN input consists of DNA sequences converted to binary format prior to use

• The sequences are:

– 88 Escherichia coli proteins
– 25 yeast Saccharomyces cerevisiae proteins
– 166 mammalian proteins (80 of which are human)

(19)

Protein Secondary Structure Prediction Based on Denoeux Belief Neural Network

• The input window size for UTMPred is set to 7 codons, which results in 84 input nodes and 8 output nodes, which represent the expanded structural forms.
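The 84-node figure is consistent with a 4-bit one-hot code per nucleotide: 7 codons x 3 nucleotides x 4 bits = 84. The paper only states that the DNA is converted to "binary format", so the specific encoding below is an assumption used for illustration.

```python
# Assumed 4-bit one-hot code per nucleotide; with a window of 7 codons this
# yields 7 * 3 * 4 = 84 input nodes, matching the slide.
NUCLEOTIDE_BITS = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

def encode_window(codons):
    """Encode a window of codons (e.g. ["ATG", "GCT", ...]) as a flat bit vector."""
    bits = []
    for codon in codons:
        for nucleotide in codon:
            bits.extend(NUCLEOTIDE_BITS[nucleotide])
    return bits

assert len(encode_window(["ATG"] * 7)) == 84  # 7-codon window -> 84 input nodes
```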

(20)

Protein Secondary Structure Prediction Based on Denoeux Belief Neural Network

• UTMPred used 200 prototypes, and after the training was completed, the system was able to predict the H and E forms with accuracy above 75%. At the same time, the system had difficulty predicting form I, due to a small amount of data in the training samples.

(21)

Assignment of Protein Sequence to Functional Family Using Neural Network and Dempster-Shafer Theory

• Purpose

– Using neural networks, efficiently predict protein function

• Using databases such as Prosite, Pfam, and Prints, either query the databases for motifs within a protein in question, or query for the absence or presence of arbitrary combinations of motifs.

(22)

Assignment of Protein Sequence to Functional Family Using Neural Network and Dempster-Shafer Theory

• Given a training set, induce a classifier able to assign novel protein sequences to one of the protein families represented in the training set

• Once trained, the classifier will be able to assign novel proteins to specific functional families based on its knowledge of the training set

(23)

Assignment of Protein Sequence to Functional Family Using Neural Network and Dempster-Shafer Theory

• Input data

– From the Prosite database, containing over 1100 entries. Each entry describes a function shared by some proteins. In the experiment, one Prosite documentation entry corresponded to a protein class, and each protein class could, in turn, be characterized by one or more motif patterns/profiles. Only motifs considered significant matches by profileScan were chosen.

• DBNN was used as the classifier.
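A minimal sketch of the input representation implied above: each protein is reduced to a binary vector indicating which of the selected Prosite motif patterns/profiles it matched. The function name and the example accessions are illustrative, not taken from the paper.

```python
def motif_feature_vector(protein_motif_hits, motif_vocabulary):
    """Binary presence/absence vector over the chosen Prosite motifs.

    protein_motif_hits : set of motif identifiers matched for one protein
                         (e.g. significant profileScan hits)
    motif_vocabulary   : ordered list of all motifs retained for the experiment
    """
    return [1 if motif in protein_motif_hits else 0 for motif in motif_vocabulary]

# Hypothetical usage with illustrative Prosite accessions:
vocabulary = ["PS00018", "PS00028", "PS00142"]
hits = {"PS00018", "PS00142"}
print(motif_feature_vector(hits, vocabulary))  # -> [1, 0, 1]
```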

(24)

Assignment of Protein Sequence to Functional Family Using Neural Network and Dempster-Shafer Theory

• 585 proteins belonging to one of ten classes were used, out of which subsets of varying size were picked randomly to become the training set.

• Once the DBNN was trained, all 585 proteins were used as the test set to determine accuracy.

• With only 10% of the total training samples, a DBNN could be constructed that classified proteins with 95% accuracy.

(25)

Assignment of Protein Sequence to Functional Family Using Neural Network and Dempster-Shafer Theory

• The number of false positives generated by the DBNN was significantly lower than the number resulting from a Prosite search.

• As the size of the training set approaches 100% of the data set, the number of false positives discovered by the DBNN approaches zero.

The number of false positives resulting from the use of the DBNN trained using training sets of different sizes.

(26)

Assignment of Protein Sequence to Functional Family Using Neural Network and Dempster-Shafer Theory

• A second data set of 73 protein sequences drawn from five classes was used to build a DBNN classifier

• Using DBNN classifiers built from randomly sized training sets, classification accuracy exceeded 96% whenever the training set contained 22 or more sequences

• Once the training set contained more than 80% of the data set (58 or more sequences), all sequences were correctly predicted

Result of classifying proteins containing common motifs

(27)

Future Work

• The ultimate solution to “protein folding” will probably be a hybrid

• Neural networks are likely to be included, due to their successful application to related problems:

– Secondary structure
– Solvent accessibility
– Distance between residues in the final structure
– Protein interface recognition

• In addition, neural nets can combine knowledge from multiple sources

(28)

Bibliography

• B. Rost. “Neural networks for protein structure prediction: hype or hit?” Artificial Intelligence and Heuristic Methods for Bioinformatics (2003): 34-50.

• S.N.V. Arjunan, S. Deris, R.M. Illias. “Protein Secondary Structure Prediction Based on Denoeux Belief Neural Network.” ICAIET Proceedings (2002): 554-560.

• N.M. Zaki, S. Deris, S.N.V. Arjunan. “Assignment of Protein Sequence to Functional Family Using Neural Network and Dempster-Shafer Theory.” Journal of Theoretics 5-1 (2003).

Background information

• S.N.V. Arjunan, S. Deris, R.M. Illias. “Prediction of Protein Secondary Structure.” Jurnal Teknologi 35(C) (2001): 81-90.

• T. Wessels, C.W. Omlin. “Refining Hidden Markov Models with Recurrent Neural Networks.”
