DOI: 10.1145/2649387.2649442
short-paper

Deep autoencoder neural networks for gene ontology annotation predictions

Published: 20 September 2014

ABSTRACT

The annotation of genomic information is a major challenge in biology and bioinformatics. Existing databases of known gene functions are incomplete and prone to errors, and the biomolecular experiments needed to improve these databases are slow and costly. While computational methods are not a substitute for experimental verification, they can help in two ways: algorithms can aid in the curation of gene annotations by automatically suggesting inaccuracies, and they can predict previously unidentified gene functions, accelerating the rate of gene function discovery. In this work, we develop an algorithm that achieves both goals using deep autoencoder neural networks. With experiments on gene annotation data from the Gene Ontology project, we show that deep autoencoder networks achieve better performance than other standard machine learning methods, including the popular truncated singular value decomposition.
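To make the two goals above concrete, the following is a minimal sketch of how an autoencoder can be applied to a binary gene-by-term annotation matrix: the network is trained to reconstruct each gene's annotation vector, and the reconstructed scores are then compared with the known annotations. The abstract does not give the architecture, training procedure, or decision threshold, so the layer sizes, learning rate, epoch count, `threshold`, and the toy `annotations` matrix here are all illustrative assumptions rather than the authors' settings.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_layers(sizes, rng):
    """One (weights, biases) pair per consecutive pair of layer sizes."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((w, [0.0] * n_out))
    return layers

def forward(layers, x):
    """Return the activations of every layer, input included."""
    acts = [x]
    for w, b in layers:
        prev = acts[-1]
        acts.append([sigmoid(sum(wi * p for wi, p in zip(row, prev)) + bj)
                     for row, bj in zip(w, b)])
    return acts

def train_step(layers, data, lr):
    """One full-batch gradient-descent step on half the squared
    reconstruction error, averaged over genes."""
    grads = [([[0.0] * len(w[0]) for _ in w], [0.0] * len(b)) for w, b in layers]
    for x in data:
        acts = forward(layers, x)
        # Error signal at the sigmoid output layer (target = input).
        delta = [(a - t) * a * (1 - a) for a, t in zip(acts[-1], x)]
        for k in range(len(layers) - 1, -1, -1):
            w, _ = layers[k]
            gw, gb = grads[k]
            prev = acts[k]
            for j, d in enumerate(delta):
                gb[j] += d
                for i, p in enumerate(prev):
                    gw[j][i] += d * p
            if k > 0:
                # Propagate the error back through layer k.
                delta = [sum(w[j][i] * delta[j] for j in range(len(delta)))
                         * prev[i] * (1 - prev[i]) for i in range(len(prev))]
    n = len(data)
    for (w, b), (gw, gb) in zip(layers, grads):
        for j in range(len(w)):
            b[j] -= lr * gb[j] / n
            for i in range(len(w[j])):
                w[j][i] -= lr * gw[j][i] / n

def reconstruct(layers, x):
    return forward(layers, x)[-1]

def mse(layers, data):
    total = sum((a - t) ** 2
                for x in data for a, t in zip(reconstruct(layers, x), x))
    return total / (len(data) * len(data[0]))

# Toy data (assumption): rows are genes, columns are ontology terms,
# 1 = known annotation, 0 = no known annotation.
annotations = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
]

rng = random.Random(0)
layers = init_layers([6, 4, 2, 4, 6], rng)  # illustrative layer sizes
initial_error = mse(layers, annotations)
for _ in range(2000):
    train_step(layers, annotations, lr=0.5)
final_error = mse(layers, annotations)

threshold = 0.5  # hypothetical cut-off, not taken from the paper
for g, row in enumerate(annotations):
    for t, (score, known) in enumerate(zip(reconstruct(layers, row), row)):
        if known == 0 and score >= threshold:
            print(f"gene {g}, term {t}: candidate new annotation ({score:.2f})")
        elif known == 1 and score < threshold:
            print(f"gene {g}, term {t}: annotation to review ({score:.2f})")
```

A high score where the matrix holds a 0 is read as a candidate new annotation, and a low score where it holds a 1 as an annotation worth a curator's review. Which entries, if any, are reported depends on initialization and training length, and any such output would still need experimental verification.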

Published in

BCB '14: Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
September 2014
851 pages
ISBN: 9781450328944
DOI: 10.1145/2649387
General Chairs: Pierre Baldi, Wei Wang

Copyright © 2014 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 254 of 885 submissions, 29%
