
Introduction to Robot Vision

Ziv Yaniv
Computer Aided Interventions and Medical Robotics,
Georgetown University
Vision

The special sense by which the qualities of an object (as color, luminosity, shape, and size) constituting its appearance are perceived through a process in which light rays entering the eye are transformed by the retina into electrical signals that are transmitted to the brain via the optic nerve.
[Merriam-Webster dictionary]
The Sensor

• endoscope
• Single Lens Reflex (SLR) camera
• webcam
• C-arm X-ray
The Sensor

Model: Pin-hole Camera, Perspective Projection

[Figure: pin-hole camera geometry, showing the image plane, principal point, optical axis, and focal point]
Machine Vision

Goal:
Obtain useful information about the 3D world from 2D images.

Model:
images → regions, textures, corners, lines, … → 3D geometry, object identification, activity detection, … → actions
Machine Vision

Goal:
Obtain useful information about the 3D world from 2D images.

• Low level (image processing):
  • image filtering (smoothing, histogram modification, …)
  • feature extraction (corner detection, edge detection, …)
  • stereo vision
  • shape from X (shading, motion, …)
  • …
• High level (machine learning/pattern recognition):
  • object detection
  • object recognition
  • clustering
  • …
Machine Vision

• How hard can it be?
Robot Vision

1. Simultaneous Localization and Mapping (SLAM)
2. Visual Servoing
Robot Vision

1. Simultaneous Localization and Mapping (SLAM) – create a 3D map of the world and localize within this map.

NASA stereo vision image processing, as used by the MER Mars rovers.
Robot Vision

1. Simultaneous Localization and Mapping (SLAM) – create a 3D map of the world and localize within this map.

"Simultaneous Localization and Mapping with Active Stereo Vision", J. Diebel, K. Reuterswärd, S. Thrun, J. Davis, R. Gupta, IROS 2004.
Robot Vision

2. Visual Servoing – using visual feedback to control a robot:

a) Image-based systems: the desired motion is computed directly from the image.

"An image-based visual servoing scheme for following paths with nonholonomic mobile robots", A. Cherubini, F. Chaumette, G. Oriolo, ICARCV 2008.
Robot Vision

2. Visual Servoing – using visual feedback to control a robot:

b) Position-based systems: the desired motion is computed from a 3D reconstruction estimated from the image.
System Configuration

• The difficulty of similar tasks in different settings varies widely:
  • How many cameras?
  • Are the cameras calibrated?
  • What is the camera-robot configuration?
  • Is the system calibrated (hand-eye calibration)?

Common configurations:
[Figure: several camera-robot configurations, each drawn as a set of x-y-z coordinate frames]
System Characteristics

• The greater the control over the system configuration and environment, the easier it is to execute a task.

• System accuracy is directly dependent upon model accuracy – what accuracy does the task require?

• All measurements and derived quantitative values have an associated error.
Stereo Reconstruction

• Compute the 3D location of a point in the stereo rig's coordinate system:
  • The rigid transformation between the two cameras is known.
  • The cameras are calibrated – given a point in the world coordinate system, we know how to map it to the image.
  • The same point is localized in the two images.

[Figure: camera 1 and camera 2 observing the world, related by the known transformation between them]
Commercial Stereo Vision

• Polaris Vicra infra-red system (Northern Digital Inc.)
• MicronTracker visible light system (Claron Technology Inc.)
Commercial Stereo Vision

Images acquired by the Polaris Vicra infra-red stereo system:

[Figure: left image and right image from the stereo pair]
Stereo Reconstruction

• Wide or short baseline – a trade-off between reconstruction accuracy and the difficulty of point matching.

[Figure: camera 1 and camera 2 shown with a wide baseline and with a short baseline]
Camera Model

• Points P, p, and O, given in the camera coordinate system, are collinear: there is a number $\lambda$ for which $O + \lambda P = p$, and since O is the origin, $\lambda P = p$.
• $\lambda = f/Z$, therefore, with $P = [X, Y, Z]$ and $p = [x, y, f]$:

$$x = f\frac{X}{Z}, \qquad y = f\frac{Y}{Z}$$

[Figure: point P projected through the focal point O onto the image plane at distance f along the optical axis]

In homogeneous coordinates:

$$\begin{bmatrix} u \\ v \\ w \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$
Camera Model

Transform the pixel coordinates from the camera coordinate system to the image coordinate system:
• The image origin (principal point) is at $[x_0, y_0]$ relative to the camera coordinate system.
• We need to change from metric units to pixels, with scaling factors $k_x$, $k_y$:

$$\begin{bmatrix} u' \\ v' \\ w' \end{bmatrix} = \begin{bmatrix} f k_x & 0 & x_0 & 0 \\ 0 & f k_y & y_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

• Finally, the image coordinate system may be skewed (skew factor s), resulting in:

$$\begin{bmatrix} u' \\ v' \\ w' \end{bmatrix} = \begin{bmatrix} f k_x & s & x_0 & 0 \\ 0 & f k_y & y_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$
Camera Model

• As our original assumption was that points are given in the camera coordinate system, a complete projection matrix is of the form:

$$M_{3\times 4} = K_{3\times 3}\,[I_{3\times 3} \mid 0_{3\times 1}]\begin{bmatrix} R & -RC \\ 0 & 1 \end{bmatrix} = KR\,[I \mid -C], \qquad K = \begin{bmatrix} f k_x & s & x_0 \\ 0 & f k_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}$$

where C is the camera origin in the world coordinate system.

• How many degrees of freedom does M have?

$$M_{3\times 4} = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix} = \begin{bmatrix} M_1^T \\ M_2^T \\ M_3^T \end{bmatrix}$$
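As a hedged sketch, the composition can be assembled directly; R and C below are made-up extrinsics:

```python
import numpy as np

# Full projection matrix M = K R [I | -C]: R rotates world axes into the
# camera frame, C is the camera origin in world coordinates.
def projection_matrix(K, R, C):
    return K @ R @ np.hstack([np.eye(3), -C.reshape(3, 1)])

K = np.array([[1000.0,    0.0, 320.0],
              [   0.0, 1000.0, 240.0],
              [   0.0,    0.0,   1.0]])
R = np.eye(3)                   # camera aligned with the world axes
C = np.array([0.0, 0.0, -1.0])  # camera one unit behind the world origin
M = projection_matrix(K, R, C)

# M has 12 entries but is defined only up to scale: 11 degrees of freedom.
assert M.shape == (3, 4)
```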
Camera Calibration

• Given pairs of points $p_i^T = [x, y, w]$, $P_i^T = [X, Y, Z, W]$ in homogeneous coordinates, we have:

$$p \cong MP$$

[Figure: calibration object/world coordinate system, camera coordinate system, image coordinate system, and principal point]

Our goal is to estimate M.

• As the points are in homogeneous coordinates, the vectors p and MP are not necessarily equal: they have the same direction but may differ by a non-zero scale factor. Hence:

$$p \times MP = 0$$
Camera Calibration

• After a bit of algebra we have:

$$\begin{bmatrix} 0^T & -w_i P_i^T & y_i P_i^T \\ w_i P_i^T & 0^T & -x_i P_i^T \\ -y_i P_i^T & x_i P_i^T & 0^T \end{bmatrix}\begin{bmatrix} M_1 \\ M_2 \\ M_3 \end{bmatrix} = 0 \quad\Longleftrightarrow\quad Am = 0$$

• The three equations are linearly dependent: $A_3 = -\frac{x_i}{w_i}A_1 - \frac{y_i}{w_i}A_2$.

• Each point pair contributes two equations.

• Exact solution: M has 11 degrees of freedom, requiring a minimum of n = 6 pairs.

• Least squares solution: for n > 6, minimize $\|Am\|$ s.t. $\|m\| = 1$ (see the sketch below).

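A minimal sketch of that least-squares solution via SVD, one standard way to minimize ||Am|| subject to ||m|| = 1; the function name and array layout are assumptions:

```python
import numpy as np

def calibrate_dlt(image_pts, world_pts):
    """Estimate M from n >= 6 correspondences.
    image_pts: n x 3 rows [x, y, w]; world_pts: n x 4 rows [X, Y, Z, W]."""
    rows = []
    for (x, y, w), P in zip(image_pts, world_pts):
        zero = np.zeros(4)
        rows.append(np.hstack([zero,  -w * P,  y * P]))  # first independent eq.
        rows.append(np.hstack([w * P,  zero,  -x * P]))  # second independent eq.
    A = np.vstack(rows)            # 2n x 12
    _, _, Vt = np.linalg.svd(A)
    m = Vt[-1]                     # unit vector minimizing ||Am||
    return m.reshape(3, 4)         # M, defined up to scale
```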
Obtaining the Rays

• The camera location in the calibration object's coordinate system, C, is given by the one-dimensional right null space of the matrix M (MC = 0).

• A 3D homogeneous point P = M⁺p is on the ray defined by p and the camera center [it projects onto p: MM⁺p = Ip = p].

• These two points define our ray in the world coordinate system.

• As both cameras were calibrated with respect to the same coordinate system, the rays will be in the same system too.
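Both steps can be sketched with numpy's SVD and pseudoinverse; `backproject_ray` is a hypothetical helper name, and M is assumed to come from the calibration sketch above:

```python
import numpy as np

def backproject_ray(M, p):
    """Return (origin C, unit direction n) of the ray through image point p."""
    _, _, Vt = np.linalg.svd(M)
    C = Vt[-1]                  # right null space of M: M C = 0
    C = C[:3] / C[3]            # camera center in world coordinates
    P = np.linalg.pinv(M) @ p   # a point on the ray: M M+ p = p
    P = P[:3] / P[3]            # (assumes P is not a point at infinity)
    n = P - C
    return C, n / np.linalg.norm(n)
```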
Intersecting the Rays

$$r_1(t_1) = a_1 + t_1 n_1, \qquad r_2(t_2) = a_2 + t_2 n_2$$

[Figure: two rays with origins a1, a2 and directions n1, n2]

The closest points on the two rays are given by:

$$t_1 = \frac{\left((a_2 - a_1)\times n_2\right)^T (n_1 \times n_2)}{\|n_1 \times n_2\|^2}, \qquad t_2 = \frac{\left((a_2 - a_1)\times n_1\right)^T (n_1 \times n_2)}{\|n_1 \times n_2\|^2}$$

and the reconstructed point is their midpoint:

$$\frac{r_1(t_1) + r_2(t_2)}{2}$$
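The formulas transcribe directly into code; a small sketch, assuming the two rays are not parallel:

```python
import numpy as np

def triangulate_midpoint(a1, n1, a2, n2):
    """Midpoint of the closest points on rays a1 + t1*n1 and a2 + t2*n2."""
    cross = np.cross(n1, n2)
    denom = np.dot(cross, cross)   # ||n1 x n2||^2, zero for parallel rays
    t1 = np.dot(np.cross(a2 - a1, n2), cross) / denom
    t2 = np.dot(np.cross(a2 - a1, n1), cross) / denom
    return 0.5 * ((a1 + t1 * n1) + (a2 + t2 * n2))
```

Combined with the ray backprojection sketch above, this completes a minimal stereo reconstruction: localize the same point in both images, backproject the two rays, and take the midpoint between them.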
World vs. Model

• Actual cameras most often don't follow the ideal pin-hole model; they usually exhibit some form of distortion (barrel, pin-cushion, S).

• Sometimes the world changes to fit your model: improvements in camera/lens quality can improve model performance.

Old image-intensifier X-ray (pin-hole + distortion) has been replaced by flat-panel X-ray (pin-hole).
Additional Material

• Code:
  – Camera Calibration Toolbox for Matlab (Jean-Yves Bouguet):
    http://www.vision.caltech.edu/bouguetj/calib_doc/

• Machine Vision:
  – "Multiple View Geometry in Computer Vision", Hartley and Zisserman, Cambridge University Press.
  – "Machine Vision", Jain, Kasturi, Schunck, McGraw-Hill.

• Robot Vision:
  – "Simultaneous Localization and Mapping: Part I", H. Durrant-Whyte, T. Bailey, IEEE Robotics and Automation Magazine, Vol. 13(2), pp. 99-110, 2006.
  – "Simultaneous Localization and Mapping (SLAM): Part II", T. Bailey, H. Durrant-Whyte, IEEE Robotics and Automation Magazine, Vol. 13(3), pp. 108-117, 2006.
  – "Visual Servo Control Part I: Basic Approaches", F. Chaumette, S. Hutchinson, IEEE Robotics and Automation Magazine, Vol. 13(4), pp. 82-90, 2006.
  – "Visual Servo Control Part II: Advanced Approaches", F. Chaumette, S. Hutchinson, IEEE Robotics and Automation Magazine, Vol. 14(1), pp. 109-118, 2007.
