Sequential Deep Learning for Human Action Recognition

Baccouche, Moez; Mamalet, Franck; Wolf, Christian; Garcia, Christophe; Baskurt, Atilla

doi:10.1007/978-3-642-25446-8_4

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7065)

Included in the following conference series:

International Workshop on Human Behavior Understanding

Abstract

We propose a fully automated deep model that learns to classify human actions without using any prior knowledge. The first step of our scheme extends Convolutional Neural Networks to 3D in order to learn spatio-temporal features automatically. A Recurrent Neural Network is then trained to classify each sequence from the temporal evolution of the features learned at each timestep. Experimental results on the KTH dataset show that the proposed approach outperforms existing deep models and gives results comparable to the best related work.
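The two-stage scheme described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The layer sizes, the spatial-mean pooling, the plain tanh recurrence (rather than any specific recurrent cell), the random untrained weights, and all function names are assumptions made for illustration. The only figure taken from the dataset is the six KTH action classes.

```python
import math
import random

def conv3d_valid(video, kernel):
    """'Valid' 3D cross-correlation of a T x H x W volume, followed by tanh."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                s = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            s += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(math.tanh(s))
            plane.append(row)
        out.append(plane)
    return out

def features_per_timestep(video, kernels):
    """One feature per kernel per output timestep (spatial mean of each map).
    The mean pooling is an illustrative assumption."""
    maps = [conv3d_valid(video, k) for k in kernels]
    steps = len(maps[0])
    return [
        [sum(sum(r) for r in m[t]) / (len(m[t]) * len(m[t][0])) for m in maps]
        for t in range(steps)
    ]

def rnn_classify(features, w_in, w_rec, w_out):
    """Run a simple tanh recurrence over the feature sequence, then apply a
    softmax over class scores computed from the final hidden state."""
    hidden = [0.0] * len(w_rec)
    for x in features:
        hidden = [
            math.tanh(
                sum(w_in[j][i] * x[i] for i in range(len(x)))
                + sum(w_rec[j][k] * hidden[k] for k in range(len(hidden)))
            )
            for j in range(len(hidden))
        ]
    scores = [sum(w_out[c][j] * hidden[j] for j in range(len(hidden)))
              for c in range(len(w_out))]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
N_FRAMES, SIZE, N_KERNELS, N_HIDDEN, N_CLASSES = 9, 8, 3, 4, 6  # 6 = KTH classes

# Hypothetical toy clip: 9 grey-level frames of 8 x 8 pixels.
video = [rand_matrix(SIZE, SIZE) for _ in range(N_FRAMES)]
# Untrained 3 x 3 x 3 kernels; in a real system these would be learned.
kernels = [[rand_matrix(3, 3) for _ in range(3)] for _ in range(N_KERNELS)]

feats = features_per_timestep(video, kernels)
probs = rnn_classify(
    feats,
    rand_matrix(N_HIDDEN, N_KERNELS),
    rand_matrix(N_HIDDEN, N_HIDDEN),
    rand_matrix(N_CLASSES, N_HIDDEN),
)
print(len(feats), "timesteps,", len(probs), "class probabilities")
```

Because the weights are random, the resulting probabilities carry no meaning. The sketch only shows how 3D convolution yields one feature vector per timestep and how a recurrent pass turns that sequence into a single class decision.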

Author information

Authors and Affiliations

Orange Labs, 4 rue du Clos Courtel, 35510, Cesson-Sévigné, France
Moez Baccouche & Franck Mamalet
LIRIS, UMR 5205 CNRS, INSA-Lyon, F-69621, France
Moez Baccouche, Christian Wolf, Christophe Garcia & Atilla Baskurt

Editor information

Editors and Affiliations

Department of Computer Engineering, Bogaziçi University, 34342, Bebek, Istanbul, Turkey
Albert Ali Salah
FBK - Fondazione Bruno Kessler, Via Sommarive 18, 38100, Trento, Italy
Bruno Lepri

About this paper

Cite this paper

Baccouche, M., Mamalet, F., Wolf, C., Garcia, C., Baskurt, A. (2011). Sequential Deep Learning for Human Action Recognition. In: Salah, A.A., Lepri, B. (eds) Human Behavior Understanding. HBU 2011. Lecture Notes in Computer Science, vol 7065. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25446-8_4

DOI: https://doi.org/10.1007/978-3-642-25446-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25445-1
Online ISBN: 978-3-642-25446-8
eBook Packages: Computer Science, Computer Science (R0)