Two-Stream Convolutional Networks for Action Recognition in Videos

Simonyan, Karen; Zisserman, Andrew

Computer Science > Computer Vision and Pattern Recognition

arXiv:1406.2199 (cs)

[Submitted on 9 Jun 2014 (v1), last revised 12 Nov 2014 (this version, v2)]

Abstract: We investigate architectures of discriminatively trained deep Convolutional Networks (ConvNets) for action recognition in video. The challenge is to capture the complementary information on appearance from still frames and motion between frames. We also aim to generalise the best performing hand-crafted features within a data-driven learning framework.
Our contribution is three-fold. First, we propose a two-stream ConvNet architecture which incorporates spatial and temporal networks. Second, we demonstrate that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data. Finally, we show that multi-task learning, applied to two different action classification datasets, can be used to increase the amount of training data and improve the performance on both.
Our architecture is trained and evaluated on the standard video actions benchmarks of UCF-101 and HMDB-51, where it is competitive with the state of the art. It also exceeds by a large margin previous attempts to use deep nets for video classification.
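
To make the two-stream idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the two streams are combined, so averaging their softmax class scores is assumed here as one simple option, and the channel layout used for the multi-frame optical flow input is likewise an assumption. The names `stack_flow`, `fuse_scores`, `predict`, and `temporal_weight` are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stack_flow(flows):
    """Stack flow fields from consecutive frame pairs into input channels.

    `flows` is a list of (dx, dy) pairs, each a 2D grid of displacements.
    The result has two channels per frame pair, ordered dx, dy, dx, dy, ...
    (assumed layout for the temporal stream's input).
    """
    channels = []
    for dx, dy in flows:
        channels.append(dx)
        channels.append(dy)
    return channels


def fuse_scores(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Late fusion: weighted average of the two streams' softmax scores.

    Assumption: averaging is one plausible fusion rule, chosen for brevity.
    """
    spatial = softmax(spatial_logits)
    temporal = softmax(temporal_logits)
    return [
        (1.0 - temporal_weight) * s + temporal_weight * t
        for s, t in zip(spatial, temporal)
    ]


def predict(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Return the index of the class with the highest fused score."""
    fused = fuse_scores(spatial_logits, temporal_logits, temporal_weight)
    return max(range(len(fused)), key=fused.__getitem__)
```

In this sketch the spatial stream would score a single still frame and the temporal stream would score the stacked flow channels; the fusion step only needs each stream's class scores, so either network can be swapped out independently.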

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1406.2199 [cs.CV]
	(or arXiv:1406.2199v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1406.2199

Submission history

From: Karen Simonyan
[v1] Mon, 9 Jun 2014 14:44:14 UTC (654 KB)
[v2] Wed, 12 Nov 2014 20:48:33 UTC (680 KB)
