Generic 3D Representation via Pose Estimation and Matching

Zamir, Amir R.; Wekel, Tilman; Agrawal, Pulkit; Weil, Colin; Malik, Jitendra; Savarese, Silvio

doi:10.1007/978-3-319-46487-9_33

Computer Science > Computer Vision and Pattern Recognition

arXiv:1710.08247 (cs)

[Submitted on 23 Oct 2017]

Title: Generic 3D Representation via Pose Estimation and Matching

Authors: Amir R. Zamir, Tilman Wekel, Pulkit Agrawal, Colin Weil, Jitendra Malik, Silvio Savarese


Abstract: Though a large body of computer vision research has investigated developing generic semantic representations, efforts towards developing a similar representation for 3D have been limited. In this paper, we learn a generic 3D representation by solving a set of foundational proxy 3D tasks: object-centric camera pose estimation and wide baseline feature matching. Our method is based upon the premise that by providing supervision over a set of carefully selected foundational tasks, generalization to novel tasks and abstraction capabilities can be achieved. We empirically show that the internal representation of a multi-task ConvNet trained to solve the above core problems generalizes to novel 3D tasks (e.g., scene layout estimation, object pose estimation, surface normal estimation) without the need for fine-tuning and shows traits of abstraction abilities (e.g., cross-modality pose estimation). In the context of the core supervised tasks, we demonstrate that our representation achieves state-of-the-art wide baseline feature matching results without requiring a priori rectification (unlike SIFT and the majority of learned features). We also show 6DOF camera pose estimation given a pair of local image patches. The accuracy on both supervised tasks is comparable to that of humans. Finally, we contribute a large-scale dataset composed of object-centric street view scenes along with point correspondences and camera pose information, and conclude with a discussion on the learned representation and open research questions.
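To make the multi-task supervision described in the abstract concrete, the following is a minimal sketch of how one joint training objective could combine the two proxy tasks for a single patch pair. The abstract does not state the loss, so everything here is an assumption for illustration: the function names, the Huber penalty for the 6DOF pose residuals, the cross-entropy term for matching, the `pose_weight` parameter, and the choice to skip the pose term for non-matching pairs.

```python
import math

def huber(residual, delta=1.0):
    """Robust penalty (assumed here): quadratic near zero, linear in the tails."""
    a = abs(residual)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)

def binary_cross_entropy(p_match, is_match, eps=1e-12):
    """Matching loss for one patch pair; p_match is the predicted match probability."""
    p = min(max(p_match, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(p) if is_match else -math.log(1.0 - p)

def joint_loss(pred_pose, true_pose, p_match, is_match, pose_weight=1.0):
    """Hypothetical joint objective: matching term plus weighted pose term.

    pred_pose and true_pose are 6-element sequences (a 6DOF relative pose).
    Assumption: relative pose is only supervised for matching pairs, so the
    pose term is dropped when is_match is False.
    """
    match_term = binary_cross_entropy(p_match, is_match)
    if not is_match:
        return match_term
    pose_term = sum(huber(p - t) for p, t in zip(pred_pose, true_pose))
    return match_term + pose_weight * pose_term

# Usage: one matching pair with a small pose error, one non-matching pair.
loss_match = joint_loss([0.1] * 6, [0.0] * 6, p_match=0.9, is_match=True)
loss_nonmatch = joint_loss([5.0] * 6, [0.0] * 6, p_match=0.5, is_match=False)
```

In this sketch a single scalar is produced per pair, so both tasks would push gradients through the same shared representation, which is the property the abstract relies on for generalization.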

Comments:	Published in ECCV16. See the project website this http URL and dataset website this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
Cite as:	arXiv:1710.08247 [cs.CV]
	(or arXiv:1710.08247v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1710.08247
Journal reference:	ECCV 2016 535-553
Related DOI:	https://doi.org/10.1007/978-3-319-46487-9_33

Submission history

From: Tilman Wekel
[v1] Mon, 23 Oct 2017 13:01:05 UTC (7,367 KB)
