ABSTRACT
The annotation of genomic information is a major challenge in biology and bioinformatics. Existing databases of known gene functions are incomplete and prone to errors, and the biomolecular experiments needed to improve these databases are slow and costly. While computational methods are not a substitute for experimental verification, they can help in two ways: algorithms can aid in the curation of gene annotations by automatically flagging likely inaccuracies, and they can predict previously unidentified gene functions, accelerating the rate of gene function discovery. In this work, we develop an algorithm that achieves both goals using deep autoencoder neural networks. With experiments on gene annotation data from the Gene Ontology project, we show that deep autoencoder networks achieve better performance than other standard machine learning methods, including the popular truncated singular value decomposition.
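The two uses described in the abstract (flagging suspect annotations and proposing new ones) can be illustrated with a toy reconstruction model. The sketch below is not the authors' implementation: it assumes a small binary gene-by-term matrix invented for illustration, uses a single hidden layer rather than a deep network, and the names `train_autoencoder` and `suggest`, the threshold of 0.5, and all hyperparameters are hypothetical choices.

```python
import math
import random


def sigmoid(x):
    # Clamp the input so math.exp cannot overflow.
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w1, b1, w2, b2):
    # Encode the annotation row into a hidden code, then decode it back.
    h = [sigmoid(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(w1, b1)]
    y = [sigmoid(sum(w * v for w, v in zip(row, h)) + b) for row, b in zip(w2, b2)]
    return h, y


def train_autoencoder(rows, hidden=2, epochs=500, lr=0.5, seed=0):
    """Train a one-hidden-layer autoencoder with SGD on squared error.

    Assumption: a shallow stand-in for the deep autoencoder in the abstract.
    Returns a function mapping an annotation row to reconstruction scores.
    """
    rng = random.Random(seed)
    n = len(rows[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(n)]
    b2 = [0.0] * n
    for _ in range(epochs):
        for x in rows:
            h, y = forward(x, w1, b1, w2, b2)
            # Backpropagate: output deltas first, then hidden deltas.
            dy = [(y[i] - x[i]) * y[i] * (1.0 - y[i]) for i in range(n)]
            dh = [sum(dy[i] * w2[i][j] for i in range(n)) * h[j] * (1.0 - h[j])
                  for j in range(hidden)]
            for i in range(n):
                for j in range(hidden):
                    w2[i][j] -= lr * dy[i] * h[j]
                b2[i] -= lr * dy[i]
            for j in range(hidden):
                for k in range(n):
                    w1[j][k] -= lr * dh[j] * x[k]
                b1[j] -= lr * dh[j]
    return lambda x: forward(x, w1, b1, w2, b2)[1]


def suggest(rows, reconstruct, threshold=0.5):
    """Compare each known annotation with its reconstruction score.

    predicted: absent annotations (0) that the model scores at or above threshold.
    suspect:   present annotations (1) that the model scores below threshold.
    """
    predicted, suspect = [], []
    for g, x in enumerate(rows):
        for t, score in enumerate(reconstruct(x)):
            if x[t] == 0 and score >= threshold:
                predicted.append((g, t, score))
            elif x[t] == 1 and score < threshold:
                suspect.append((g, t, score))
    return predicted, suspect


# Hypothetical data: rows are genes, columns are ontology terms (1 = annotated).
annotations = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0],
]
model = train_autoencoder(annotations)
predicted, suspect = suggest(annotations, model)
```

Each entry of `predicted` and `suspect` is a `(gene index, term index, score)` triple. In this sketch the two lists are only candidates for a curator to review, in line with the abstract's point that computational methods do not replace experimental verification.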
Index Terms
- Deep autoencoder neural networks for gene ontology annotation predictions