Introduction To Robot Vision: Ziv Yaniv Computer Aided Interventions and Medical Robotics, Georgetown University


Introduction to Robot Vision

Ziv Yaniv
Computer Aided Interventions and Medical Robotics,
Georgetown University

The special sense by which the qualities of an object (as color, luminosity,
shape, and size) constituting its appearance are perceived through a process
in which light rays entering the eye are transformed by the retina into
electrical signals that are transmitted to the brain via the optic nerve.
[Merriam-Webster dictionary]
The Sensor


Single Lens Reflex (SLR) Camera

C-arm X-ray
The Sensor

Model: Pin-hole Camera, Perspective Projection


[Figure: pin-hole camera with x, y, z axes, the image plane, the principal (optical) axis, and the focal point]
Machine Vision

Obtain useful information about the 3D world from 2D images.

[Figure: images → regions, textures, corners, lines, … → 3D geometry, object identification, activity detection, … → actions]
Machine Vision

Obtain useful information about the 3D world from 2D images.
• Low level (image processing)
   – image filtering (smoothing, histogram modification, …)
   – feature extraction (corner detection, edge detection, …)
   – stereo vision
   – shape from X (shading, motion, …)
• High level (machine learning/pattern recognition)
   – object detection
   – object recognition
   – clustering
Machine Vision

• How hard can it be?

Robot Vision

1. Simultaneous Localization and Mapping


2. Visual Servoing.
Robot Vision

1. Simultaneous Localization and Mapping (SLAM) – create a 3D map of the world and
localize within this map.

NASA stereo vision image processing, as used by the MER Mars rovers
Robot Vision

1. Simultaneous Localization and Mapping (SLAM) – create a 3D map of the world and
localize within this map.

“Simultaneous Localization and Mapping with Active Stereo Vision”, J. Diebel,
K. Reuterswärd, S. Thrun, J. Davis, R. Gupta, IROS 2004.
Robot Vision

2. Visual Servoing – using visual feedback to control a robot:

a) Image-based systems: desired motion computed directly from the image.

“An image-based visual servoing scheme for following paths with nonholonomic
mobile robots”, A. Cherubini, F. Chaumette, G. Oriolo, ICARCV 2008.
Robot Vision

2. Visual Servoing – using visual feedback to control a robot:

b) Position-based systems: desired motion computed from a 3D reconstruction
estimated from the image.
System Configuration

• The difficulty of similar tasks in different settings varies widely:

How many cameras?

Are the cameras calibrated?

What is the camera-robot configuration?

Is the system calibrated (hand-eye calibration)?
Common configurations:
[Figure: camera and robot coordinate frames (x, y, z axes) for several common camera-robot configurations]
System Characteristics

• The greater the control over the system configuration and environment, the
easier it is to execute a task.

• System accuracy is directly dependent upon model accuracy. What accuracy
does the task require?

• All measurements and derived quantitative values have an associated error.
Stereo Reconstruction

• Compute the 3D location of a point in the stereo rig’s coordinate system, given that:
   – The rigid transformation between the two cameras is known.
   – The cameras are calibrated: given a point in the world coordinate system we
     know how to map it to the image.
   – The same point is localized in the two images.

[Figure: stereo rig with Camera 1 and Camera 2]
Commercial Stereo Vision

Polaris Vicra infra-red system (Northern Digital Inc.)
MicronTracker visible light system (Claron Technology Inc.)
Commercial Stereo Vision

Images acquired by the Polaris Vicra infra-red stereo system:

left image right image

Stereo Reconstruction

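• Wide or short baseline: reconstruction accuracy vs. difficulty of point matching.
A wide baseline improves reconstruction accuracy but makes point matching harder;
a short baseline does the opposite.

[Figure: short-baseline and wide-baseline stereo pairs, each with Camera 1 and Camera 2]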
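The accuracy side of this trade-off can be illustrated with a small numeric sketch. It assumes a rectified (parallel-axis) stereo pair, which the slides do not specify, where depth is Z = f·b/d for focal length f in pixels, baseline b, and disparity d in pixels. A first-order error estimate is then δZ ≈ Z²/(f·b)·δd. The function name and the numbers below are illustrative only.

```python
def depth_error(f_px, baseline, depth, disparity_err_px=1.0):
    """First-order depth uncertainty for a rectified stereo pair.

    Assumes Z = f * b / d, so |dZ| is roughly Z**2 / (f * b) * |dd|.
    f_px is in pixels; baseline and depth share the same metric unit.
    """
    return depth ** 2 / (f_px * baseline) * disparity_err_px

# Same target depth (2 m) and focal length (800 px), two baselines.
short_err = depth_error(f_px=800.0, baseline=0.05, depth=2.0)
wide_err = depth_error(f_px=800.0, baseline=0.50, depth=2.0)
print(f"short baseline: {short_err:.3f} m, wide baseline: {wide_err:.3f} m")
```

Under these assumptions a tenfold wider baseline reduces the depth uncertainty tenfold, while the two views differ more and matching becomes harder.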
Camera Model
• Points P, p, and O, given in the camera coordinate system, are collinear.

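There is a number λ for which O + λP = p. With O at the origin:

\[
\lambda P = p, \qquad \lambda = \frac{f}{Z}, \quad \text{therefore} \quad x = \frac{fX}{Z}, \quad y = \frac{fY}{Z}
\]

In homogeneous coordinates:

\[
\begin{bmatrix} u \\ v \\ w \end{bmatrix} =
\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\]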
Camera Model
Transform the coordinates from the camera coordinate system to the image
coordinate system:
• The image origin (principal point) is at [x0, y0] relative to the camera coordinate system.
• Need to change from metric units to pixels, using scaling factors kx, ky.

\[
\begin{bmatrix} u' \\ v' \\ w' \end{bmatrix} =
\begin{bmatrix} f k_x & 0 & x_0 & 0 \\ 0 & f k_y & y_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\]

• Finally, the image coordinate system may be skewed, resulting in:

\[
\begin{bmatrix} u' \\ v' \\ w' \end{bmatrix} =
\begin{bmatrix} f k_x & s & x_0 & 0 \\ 0 & f k_y & y_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\]
Camera Model

• As our original assumption was that points are given in the camera coordinate
system, a complete projection matrix is of the form:

\[
M_{3\times 4} = K_{3\times 3}\,[\,I_{3\times 3} \mid 0_{3\times 1}\,]
\begin{bmatrix} R & -RC \\ 0^T & 1 \end{bmatrix} = KR\,[\,I \mid -C\,],
\qquad
K = \begin{bmatrix} f k_x & s & x_0 \\ 0 & f k_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}
\]

C – camera origin in the world coordinate system.
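A minimal NumPy sketch of assembling M = KR[I | -C] and projecting a world point follows. The function names and all numeric values (focal length, pixel scale, principal point, camera position) are illustrative assumptions, not values from the slides.

```python
import numpy as np

def intrinsic_matrix(f, kx, ky, s, x0, y0):
    """Upper-triangular K built from focal length, pixel scales, skew, principal point."""
    return np.array([[f * kx, s, x0],
                     [0.0, f * ky, y0],
                     [0.0, 0.0, 1.0]])

def projection_matrix(K, R, C):
    """M = K R [I | -C], with C the camera origin in world coordinates."""
    C = np.asarray(C, dtype=float).reshape(3, 1)
    return K @ R @ np.hstack([np.eye(3), -C])

def project(M, P_world):
    """Project a 3D world point and divide by the homogeneous coordinate w."""
    u, v, w = M @ np.append(np.asarray(P_world, dtype=float), 1.0)
    return np.array([u / w, v / w])

# Hypothetical camera: f = 8 mm, 1e5 pixels per metre, no skew,
# principal point (320, 240), axes aligned with the world, 2 m behind the origin.
K = intrinsic_matrix(f=0.008, kx=1.0e5, ky=1.0e5, s=0.0, x0=320.0, y0=240.0)
M = projection_matrix(K, R=np.eye(3), C=[0.0, 0.0, -2.0])
pixel = project(M, [0.1, 0.05, 0.0])  # approximately [360, 260]
print(pixel)
```

Projecting the camera origin itself gives the zero vector, which is the property used later to recover C from M.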
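• How many degrees of freedom does M have?

\[
M_{3\times 4} =
\begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix}
= \begin{bmatrix} M_1^T \\ M_2^T \\ M_3^T \end{bmatrix}
\]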
Camera Calibration

• Given pairs of points, p_i^T = [x, y, w], P_i^T = [X, Y, Z, W], in homogeneous coordinates we have:

\[ p = MP \]

Our goal is to estimate M.

[Figure: camera coordinate system (x, y, z), image coordinate system with the principal point, and the calibration object/world coordinate system]

• As the points are in homogeneous coordinates, the vectors p and MP are not
necessarily equal; they have the same direction but may differ by a non-zero scale factor:

\[ p \times MP = 0 \]
Camera Calibration

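• After a bit of algebra we have:

\[
\begin{bmatrix}
0^T & -w_i P_i^T & y_i P_i^T \\
w_i P_i^T & 0^T & -x_i P_i^T \\
-y_i P_i^T & x_i P_i^T & 0^T
\end{bmatrix}
\begin{bmatrix} M_1 \\ M_2 \\ M_3 \end{bmatrix} = 0,
\qquad Am = 0
\]

• The three equations are linearly dependent (A_k denotes the k-th row):

\[ A_3 = -\frac{x_i}{w_i} A_1 - \frac{y_i}{w_i} A_2 \]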
• Each point pair contributes two equations.

• Exact solution: M has 11 degrees of freedom, requiring a minimum of n=6 pairs.

• Least squares solution: For n>6 minimize ||Am|| s.t. ||m||=1.
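The least squares step can be sketched with NumPy as follows: stack the first two rows of the system for each point pair, then take the right singular vector of A with the smallest singular value, which minimizes ||Am|| subject to ||m|| = 1. The function name and the synthetic camera are illustrative assumptions. The sketch also assumes w_i = W_i = 1 and omits the data normalization that is usually recommended for noisy measurements.

```python
import numpy as np

def calibrate_dlt(points_2d, points_3d):
    """Estimate the 3x4 matrix M (up to scale) from n >= 6 point pairs."""
    rows = []
    for (x, y), (X, Y, Z) in zip(points_2d, points_3d):
        P = np.array([X, Y, Z, 1.0])
        zero = np.zeros(4)
        # Two independent equations per pair (w_i = 1):
        rows.append(np.hstack([zero, -P, y * P]))   # [0^T, -w P^T,  y P^T]
        rows.append(np.hstack([P, zero, -x * P]))   # [w P^T, 0^T, -x P^T]
    A = np.array(rows)
    # Minimizer of ||A m|| with ||m|| = 1: last right singular vector of A.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)

# Synthetic check with a hypothetical camera and noise-free data.
M_true = np.array([[800.0, 0.0, 320.0, 640.0],
                   [0.0, 800.0, 240.0, 480.0],
                   [0.0, 0.0, 1.0, 2.0]])
rng = np.random.default_rng(0)
points_3d = rng.uniform(-1.0, 1.0, size=(8, 3))        # non-coplanar points
homog = np.hstack([points_3d, np.ones((8, 1))]) @ M_true.T
points_2d = homog[:, :2] / homog[:, 2:3]

M_est = calibrate_dlt(points_2d, points_3d)
M_est = M_est * (M_true[2, 3] / M_est[2, 3])           # remove the scale ambiguity
print(np.allclose(M_est, M_true, atol=1e-6))
```

Because M is only defined up to scale, the estimate is rescaled before comparison with the matrix that generated the data.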

Obtaining the Rays

• The camera location in the calibration object’s coordinate system, C, is
given by the one-dimensional right null space of the matrix M (MC = 0).

• A 3D homogeneous point P = M⁺p, where M⁺ is the pseudo-inverse of M, is on
the ray defined by p and the camera center [it projects onto p: MM⁺p = Ip = p].

• These two points define our ray in the world coordinate system.

• As both cameras were calibrated with respect to the same coordinate system,
the rays will be in the same system too.
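The two points that define a ray can be computed with NumPy along these lines. This is a sketch assuming a finite camera (the last homogeneous coordinate of C is non-zero); the function names and the example matrix M are illustrative.

```python
import numpy as np

def camera_center(M):
    """Camera center C from the right null space of the 3x4 matrix M."""
    _, _, Vt = np.linalg.svd(M)
    C_h = Vt[-1]                  # singular vector for the zero singular value
    return C_h[:3] / C_h[3]       # assumes a finite camera (C_h[3] != 0)

def back_project(M, pixel):
    """Return (C, n): ray origin and unit direction for an image point."""
    p = np.array([pixel[0], pixel[1], 1.0])
    P_h = np.linalg.pinv(M) @ p   # homogeneous point on the ray, P = M^+ p
    C = camera_center(M)
    # Direction from C towards P; this form stays valid when P_h[3] is zero.
    # The sign is arbitrary because homogeneous points have no fixed scale.
    n = P_h[:3] - P_h[3] * C
    return C, n / np.linalg.norm(n)

# Hypothetical camera located at (0, 0, -2), axes aligned with the world.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
C_true = np.array([0.0, 0.0, -2.0])
M = K @ np.hstack([np.eye(3), -C_true.reshape(3, 1)])

P_world = np.array([0.1, 0.05, 0.0])
p_h = M @ np.append(P_world, 1.0)
C, n = back_project(M, p_h[:2] / p_h[2])
# Distance of the original point from the recovered line (should be ~0).
off_ray = np.linalg.norm(np.cross(P_world - C, n))
print(C, off_ray)
```

The check at the end confirms that the world point which produced the pixel lies on the recovered ray.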
Intersecting the Rays

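\[
r_1(t_1) = a_1 + t_1 n_1, \qquad r_2(t_2) = a_2 + t_2 n_2
\]

\[
t_1 = \frac{((a_2 - a_1) \times n_2)^T (n_1 \times n_2)}{\lVert n_1 \times n_2 \rVert^2},
\qquad
t_2 = \frac{((a_2 - a_1) \times n_1)^T (n_1 \times n_2)}{\lVert n_1 \times n_2 \rVert^2}
\]

In practice the two rays rarely intersect exactly, so the reconstructed point is taken as the midpoint of the closest points on the two rays:

\[
\frac{r_1(t_1) + r_2(t_2)}{2}
\]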
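The closest-point formulas translate directly into NumPy. The sketch below returns the midpoint between the closest points on the two rays; the function name and the parallel-ray threshold are assumptions made for illustration.

```python
import numpy as np

def intersect_rays(a1, n1, a2, n2, eps=1e-12):
    """Midpoint of the closest points on r1(t) = a1 + t n1 and r2(t) = a2 + t n2."""
    a1, n1, a2, n2 = (np.asarray(v, dtype=float) for v in (a1, n1, a2, n2))
    c = np.cross(n1, n2)
    denom = c @ c                          # ||n1 x n2||^2
    if denom < eps:
        raise ValueError("rays are (nearly) parallel")
    d = a2 - a1
    t1 = np.cross(d, n2) @ c / denom
    t2 = np.cross(d, n1) @ c / denom
    return 0.5 * ((a1 + t1 * n1) + (a2 + t2 * n2))

# Two rays that meet at (0, 0, 2).
exact = intersect_rays([0, 0, 0], [0, 0, 1], [1, 0, 0], [-1, 0, 2])
# Two skew rays whose closest points are (0, 0, 0) and (0, 0, 1).
skew = intersect_rays([0, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0])
print(exact, skew)
```

For rays that truly intersect the midpoint coincides with the intersection; for skew rays it lies halfway along their common perpendicular.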
World vs. Model
• Actual cameras most often don’t follow the ideal pin-hole model; they usually
exhibit some form of distortion (barrel, pin-cushion, S).

• Sometimes the world changes to fit your model: improvements in camera/lens
quality can improve model performance.

[Figure: old image-intensifier x-ray, replaced by flat panel x-ray: pin-hole]
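As an illustration of barrel and pin-cushion distortion, the sketch below applies a single-coefficient radial model, x_d = x(1 + k1·r²), to normalized image coordinates. This model is one common choice and is an assumption here; the slides do not specify a distortion model, and it does not cover S-shaped distortion.

```python
import numpy as np

def apply_radial_distortion(xy, k1):
    """Apply x_d = x * (1 + k1 * r^2) to normalized image points (N x 2)."""
    xy = np.asarray(xy, dtype=float)
    r2 = np.sum(xy ** 2, axis=-1, keepdims=True)   # squared distance from the center
    return xy * (1.0 + k1 * r2)

ideal = np.array([[0.5, 0.5]])
barrel = apply_radial_distortion(ideal, k1=-0.2)       # points pulled towards the center
pincushion = apply_radial_distortion(ideal, k1=0.2)    # points pushed away from the center
print(barrel, pincushion)
```

With this convention a negative k1 produces barrel distortion and a positive k1 produces pin-cushion distortion.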

Additional Material
• Code:
– Camera Calibration Toolbox for MATLAB (Jean-Yves Bouguet)

• Machine Vision:
– “Multiple View Geometry in Computer Vision”, Hartley and Zisserman,
Cambridge University Press.
– “Machine Vision”, Jain, Kasturi, Schunck, McGraw-Hill.

• Robot Vision:
– “Simultaneous Localization and Mapping: Part I”, H. Durrant-Whyte, T. Bailey,
IEEE Robotics and Automation Magazine, Vol. 13(2), pp. 99-110, 2006.
– “Simultaneous Localization and Mapping (SLAM): Part II”, T. Bailey, H. Durrant-Whyte,
IEEE Robotics and Automation Magazine, Vol. 13(3), pp. 108-117, 2006.
– “Visual Servo Control Part I: Basic Approaches”, F. Chaumette, S. Hutchinson,
IEEE Robotics and Automation Magazine, Vol. 13(4), pp. 82-90, 2006.
– “Visual Servo Control Part II: Advanced Approaches”, F. Chaumette, S. Hutchinson,
IEEE Robotics and Automation Magazine, Vol. 14(1), pp. 109-118, 2007.
