The Patch Transform and Its Applications To Image Editing



Taeg Sang Cho∗, Moshe Butman†, Shai Avidan‡, William T. Freeman∗



∗ CSAIL, Massachusetts Institute of Technology
† Bar-Ilan University
‡ Adobe Systems Inc.

Abstract

We introduce the patch transform, where an image is broken into non-overlapping patches, and modifications or constraints are applied in the “patch domain”. A modified image is then reconstructed from the patches, subject to those constraints. When no constraints are given, the reconstruction problem reduces to solving a jigsaw puzzle. Constraints the user may specify include the spatial locations of patches, the size of the output image, or the pool of patches from which an image is reconstructed. We define terms in a Markov network to specify a good image reconstruction from patches: neighboring patches must fit to form a plausible image, and each patch should be used only once. We find an approximate solution to the Markov network using loopy belief propagation, introducing an approximation to handle the combinatorially difficult patch exclusion constraint. The resulting image reconstructions show the original image, modified to respect the user’s changes. We apply the patch transform to various image editing tasks and show that the algorithm performs well on real world images.

1. Introduction

A user may want to make various changes to an image, such as repositioning objects, or adding or removing textures. These image changes can be difficult to make using existing editing tools. Consider repositioning an object. It first must be selected, moved to the new location, and blended into its surroundings; then the hole left where the object was must be filled in through texture synthesis or image inpainting. Even after these steps, the image may not look right: the pixels over which the object is moved are lost, and the repositioned object may not fit well in its new surroundings. In addition, filled-in textures may change the balance of textures from that of the original image.

It would be more convenient to specify only the desired changes, and let the image automatically adjust itself accordingly. To allow this form of editing, we introduce an image “patch transform”. We break the image into small, non-overlapping patches, and manipulate the image in this “patch domain”. We can constrain patch positions, and add or remove patches from this pool. This allows explicit control of how much of each texture is in the image, and where textures and objects appear. From this modified set of patches, we reconstruct an image, requiring that all the patches fit together while respecting the user’s constraints.

This allows many useful image editing operations. The user can select regions of the image and move them to new locations. The patch transform reconstruction will then try to complete the rest of the image with the remaining patches in a visually pleasing manner, allowing the user to generate images with a different layout but content similar to that of the original image. If the user specifies both the position of some patches and the size of the target image, we can perform image retargeting [2], fitting content from the original image into the new size. Alternatively, the user may increase or decrease the number of patches from a particular region (say, the sky or clouds), then reconstruct an image that respects those modifications. The user can also mix patches from multiple images to generate a collage combining elements of the various source images.

To “invert” the patch transform, that is, to reconstruct an image from the constrained patches, we need two ingredients. The patches should all fit together with minimal artifacts, so we need to define a compatibility function that specifies how likely any two patches are to be positioned next to each other. For this, we exploit recent work on the statistics of natural images, and favor patch pairings for which the abutting patch regions are likely to form images. We then need an algorithm to find a good placement for all the patches, while penalizing the use of a patch more than once. For that, we define a probability for all possible configurations of patches, and introduce a tractable approximation for a term requiring that each patch be used only once. We then solve the resulting Markov Random Field for a good set of patch placements using belief propagation (BP).

We describe related work in Section 2 and develop the
patch transform algorithm in Section 3. Section 4 will introduce several applications of the patch transform.

2. Background

The inverse patch transform is closely related to solving jigsaw puzzles. The jigsaw puzzle problem was shown to be NP-complete because it can be reduced to the Set Partition Problem [8]. Nevertheless, attempts to (approximately) solve the jigsaw puzzle abound in various applications: reconstructing archaeological artifacts [15], fitting a protein with known amino acid sequence to a 3D electron density map [26], and reconstructing a text from fragments [19].

Image jigsaw puzzles can be solved by exploiting the shape of patches, their contents, or both. In a shape-based approach, the patches do not take a rectangular shape, but the problem is still NP-complete because finding the correct order of the boundary patches can be reduced to the traveling salesperson problem. The largest jigsaw puzzle solved with a shape-based approach was 204 patches [12]. Chung et al. [6] used both shape and color to reconstruct an image and explored several graph-based assignment techniques.

Our patch transform approach tries to side-step other typical image editing tasks, such as region selection [21] and object placement or blending [18, 22, 28]. The patch transform method allows us to use very coarse image region selection, only to patch accuracy, rather than pixel or sub-pixel accuracy. Simultaneous Matting and Compositing [27] works on a pixel level and was shown to work well only for in-place object scaling, thus avoiding the difficult tasks of hole filling, image re-organization or image retargeting. Using the patch transform, one does not need to be concerned about the pixels over which an object will be moved, since those underlying patches will “get out of the way” and reposition themselves elsewhere in the image during the image reconstruction step. Related functionalities, obtained using a different approach, are described in [25].

Because we seek to place image patches together in a composite, our work relates to larger spatial scale versions of that task, including Auto Collage [24] and panorama stitching [5], although with different goals. Jojic et al. [13] and Kannan et al. [14] have developed “epitomes” and “jigsaws”, where overlapping patches from a smaller source image are used to generate a larger image. These models are applied primarily for image analysis.

Non-parametric texture synthesis algorithms, such as [4], and image filling-in, such as [3, 7, 10], can involve combining smaller image elements and are more closely related to our task. Also related, in terms of goals and techniques, are the patch-based image synthesis methods [7, 9], which also require compatibility measures between patches. Efros and Freeman [9] and Liang et al. [20] used overlapping patches to synthesize a larger texture image. Neighboring patch compatibilities were found through squared difference calculations in the overlap regions. Freeman, Pasztor and Carmichael [11] used similar patch compatibilities, and used loopy belief propagation in an MRF to select image patches from a set of candidates. Kwatra et al. [17], and Komodakis and Tziritas [16] employed related Markov random field models, solved using graph cuts or belief propagation, for texture synthesis and image completion. The squared-difference compatibility measures don’t generalize to new patch combinations as well as our compatibility measures based on image statistics. The most salient difference from all texture synthesis methods is the patch transform’s constraint against multiple uses of a single patch. This allows for the patch transform’s controlled rearrangement of an image.

3. The inverse patch transform

After the user has modified the patch statistics of the original image, or has constrained some patch positions, we want to perform an “inverse patch transform”, piecing together the patches to form a plausible image. To accomplish this, we define a probability for all possible combinations of patches.

In a good placement of patches, (1) adjacent patches should all plausibly fit next to each other, (2) each patch should not be used more than once (in solving the patch placements, we relax this constraint to each patch seldom being used more than once), and (3) the user’s constraints on patch positions should be maintained. Each of these requirements can be enforced by terms in a Markov Random Field (MRF) probability.

Let each node in an MRF represent a spatial position where we will place a patch. The unknown state at the $i$th node is the index of the patch to be placed there, $x_i$. Based on how plausibly one patch fits next to another, we define a compatibility, $\psi$. Each patch has four neighbors (except at the image boundary), and we write the compatibility of patch $k$ with patch $l$, placed at neighboring image positions $i$ and $j$, to be $\psi_{i,j}(k, l)$. (We use the position subscripts $i, j$ in the function $\psi_{i,j}$ only to keep track of which of the four neighbor relationships of $j$ relative to $i$ is being referred to: up, down, left, or right.)

We let $x$ be a vector of the unknown patch indices $x_i$ at each of the $N$ image positions $i$. We include a “patch exclusion” function, $E(x)$, which is zero if any two elements of $x$ are the same (if any patch is used more than once) and otherwise one. The user’s constraints on patch positions are represented by local evidence terms, $\phi_i(x_i)$, and are described in more detail in Section 4.

Combining these terms, we define the probability of an assignment, $x$, of patches to image positions to be

$$P(x) = \frac{1}{Z} \prod_{i} \phi_i(x_i) \prod_{i,\, j \in N(i)} \psi_{i,j}(x_i, x_j)\, E(x) \tag{1}$$
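As a minimal sketch of how the terms of Eq. (1) combine, the following Python snippet evaluates the unnormalized probability of a patch assignment on a small grid. The function names, the grid layout as a list of lists, and the callback signatures for `phi` and `psi` are illustrative assumptions, not part of the original work; the normalization constant $Z$ is omitted.

```python
from typing import Callable, Dict, List, Tuple

# Offsets (row, column) for the four neighbor relationships of j relative to i.
OFFSETS: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
}


def exclusion(x: List[List[int]]) -> int:
    """E(x): 0 if any patch index is used more than once, otherwise 1."""
    flat = [k for row in x for k in row]
    return 1 if len(set(flat)) == len(flat) else 0


def unnormalized_p(
    x: List[List[int]],
    phi: Callable[[int, int, int], float],
    psi: Callable[[str, int, int], float],
) -> float:
    """Eq. (1) without the 1/Z factor.

    x[r][c]      : index of the patch placed at grid position (r, c)
    phi(r, c, k) : local evidence for patch k at position (r, c)  (hypothetical signature)
    psi(d, k, l) : compatibility of patch k with patch l lying in direction d from it
    """
    if exclusion(x) == 0:
        return 0.0
    rows, cols = len(x), len(x[0])
    p = 1.0
    for r in range(rows):
        for c in range(cols):
            p *= phi(r, c, x[r][c])
            # Product over the neighbors j in N(i) that lie inside the image.
            for d, (dr, dc) in OFFSETS.items():
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    p *= psi(d, x[r][c], x[rr][cc])
    return p
```

Any assignment that reuses a patch gets probability zero through `exclusion`, which is the hard constraint that Section 3.2.1 later relaxes into an approximate message.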
Figure 1. $\psi^A_{i,j}$ is computed by convolving the boundary of two patches with filters, and combining the filter outputs with a GSM-FOE model.

We have already defined the user constraints, $\phi$, and the patch exclusion term, $E$. In the next section, we specify the patch-to-patch compatibility term, $\psi$, and then describe how we find patch assignments $x$ that approximately maximize $P(x)$ in Eq. (1).

3.1. Computing the compatibility among patches

We want two patches to have a high compatibility score if, when they are placed next to each other, the pixel values across the seam look like natural image data. We quantify this using two terms, a natural image prior and a color difference prior.

For the natural image prior term, we apply the filters of the Gaussian Scale Mixture Fields of Experts (GSMFOE) model [23, 29] to compute a score, $\psi^A_{i,j}(k, l)$, for patches $k$ and $l$ being in the relative relationship of positions $i$ and $j$, as illustrated in Fig. 1. The compatibility score is computed with Eq. (2):

$$\psi^A_{i,j}(k, l) = \frac{1}{Z} \prod_{l,m} \sum_{q=1}^{J} \frac{\pi_q}{\sigma_q} \exp\left(-w_l^T x_m(k, l)\right) \tag{2}$$

where $x(k, l)$ is the luminance component at the boundary of patches $(k, l)$, $\sigma_q$, $\pi_q$ are GSMFOE parameters, and $w_l$ are the learned filters. $\sigma_q$, $\pi_q$, $w_l$ are available online (footnote 1).

We found improved results if we included an additional term that is sensitive to color differences between the patches. We computed the color compatibility, $\psi^B_{i,j}$, between two patches by exponentiating the sum of squared distances among adjacent pixels at the patch boundaries:

$$\psi^B_{i,j}(k, l) \propto \exp\left(-\frac{(r(k) - r(l))^2}{\sigma_{clr}^2}\right) \tag{3}$$

where $r(\cdot)$ is the color along the corresponding boundary of the argument, and $\sigma_{clr}$ is fixed at 0.2 after cross-validation. The final patch compatibility is then $\psi_{i,j}(k, l) = \psi^A_{i,j}(k, l)\, \psi^B_{i,j}(k, l)$.

Typically, we break the image into patches of 32 × 32 pixels, and for typical image sizes this generates ∼300 non-overlapping patches. We compute the compatibility score for all four possible spatial arrangements of all possible pairs of patches for the image. Fig. 2 shows the resulting patch-patch compatibility matrices for two of the four possible patch spatial relationships.

Figure 2. (a) and (b) show a part of $p_{LR}$, $p_{DU}$ of Fig. 4(a) in matrix form. $p_{LR}(i, j)$ is the probability of placing patch $i$ to the right of patch $j$, whereas $p_{DU}(i, j)$ is the probability of placing patch $i$ on top of patch $j$. The patches are pre-ordered, so the correct matches, which would generate the original image, are the row-shifted diagonal components.

Footnote 1: http://www.cs.huji.ac.il/~yweiss/BRFOE.zip

3.2. Approximate solution by belief propagation

We have now defined all the terms in Eq. (1) for the probability of any assignment $x$ of patches to image positions. Finding the assignment $x$ that maximizes $P(x)$ in the MRF of Eq. (1) is NP-hard, but approximate methods can nonetheless give good results. One such method is belief propagation. Belief propagation is an exact inference algorithm for Markov networks without loops, but can give good results even in some networks with loops [30]. For belief propagation applied in networks with loops, different factorizations of the MRF joint probability can lead to different results. Somewhat counter-intuitively, we found better results for this problem using an alternative factorization of Eq. (1) as a directed graph, which we describe below.

We can express Eq. (1) in terms of conditional probabilities if we define a normalized compatibility,

$$p_{i,j}(x_i | x_j) = \frac{\psi_{i,j}(x_i, x_j)}{\sum_{x_i=1}^{M} \psi_{i,j}(x_i, x_j)} \tag{4}$$

and the local evidence term $p(y_i | x_i) = \phi_i(x_i)$. Then we can express the joint probability of Eq. (1) as

$$P(x) = \frac{1}{Z} \prod_{i=1}^{N} \prod_{j \in N(i)} p(y_i | x_i)\, p_{i,j}(x_j | x_i)\, p(x_i)\, E(x) \tag{5}$$

where $N(i)$ is the set of neighboring indices of $x_i$, $y_i$ is the original patch at location $i$, and $p_{i,j}$ is the appropriate normalized compatibility determined by the relative location of $x_j$ with respect to $x_i$. A similar factorization for an MRF was used in [11]. We can manipulate the patch statistics (Section 4.2) through $p(x_i)$, but in most cases we model $p(x_i)$ as a uniform distribution, which is absorbed into the normalization constant $Z$.
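The color term of Eq. (3) and the normalization of Eq. (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: boundary colors are taken to be single-channel values scaled to [0, 1], the names `color_compat` and `normalize_columns` are hypothetical, and the natural image prior $\psi^A$ of Eq. (2) is left out.

```python
import math
from typing import List, Sequence

SIGMA_CLR = 0.2  # value reported in the text for sigma_clr


def color_compat(boundary_k: Sequence[float], boundary_l: Sequence[float],
                 sigma: float = SIGMA_CLR) -> float:
    """Eq. (3) up to a constant: exp(-(sum of squared differences) / sigma^2).

    boundary_k and boundary_l hold the pixel values along the two abutting
    patch edges (assumed single-channel and scaled to [0, 1]).
    """
    ssd = sum((a - b) ** 2 for a, b in zip(boundary_k, boundary_l))
    return math.exp(-ssd / sigma ** 2)


def normalize_columns(psi: List[List[float]]) -> List[List[float]]:
    """Eq. (4): p[a][b] = psi[a][b] / sum over a' of psi[a'][b].

    Row index a plays the role of x_i and column index b the role of x_j,
    so every column of the result sums to 1.
    """
    m = len(psi)
    col_sums = [sum(psi[a][b] for a in range(m)) for b in range(m)]
    return [[psi[a][b] / col_sums[b] for b in range(m)] for a in range(m)]
```

For a left-right relationship, one would fill `psi[a][b]` with `color_compat(left_edge[a], right_edge[b])` for every pair of patches and normalize the resulting matrix; the other three relationships are handled the same way with the corresponding edges.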
The approximate marginal probability at a node $i$ can be found by iterating the belief propagation message update rules until convergence [30]. Ignoring the exclusivity term $E(x)$ for now, the message update rules for this factorization are as follows. Let us suppose that $x_j$ is an image node to the left of $x_i$. Then the message from $x_j$ to $x_i$ is:

$$m_{ji}(x_i) \propto \sum_{x_j} p_{i,j}(x_i | x_j)\, p(y_j | x_j) \prod_{l \in N(j) \setminus i} m_{lj}(x_j) \tag{6}$$

Messages from nodes that are to the right/top/bottom of $x_i$ are similarly defined with an appropriate $p_{i,\cdot}(x_i | \cdot)$. The patch assignment at node $x_i$ is:

$$\hat{x}_i = \arg\max_{l}\, b_i(x_i = l) \tag{7}$$

where the belief at node $x_i$ is defined as follows:

$$b_i(x_i) = p(y_i | x_i) \prod_{j \in N(i)} m_{ji}(x_i) \tag{8}$$

3.2.1 Handling the patch exclusion term

In most cases, running the above message passing scheme does not result in a visually plausible image because a trivial solution to the above message passing scheme (without any local evidence) is to assign a single bland patch to all nodes $x_i$. To find a more plausible solution, we want to require that each patch be used only once. We call this an exclusivity term.

Since the exclusivity term is a global function involving all $x_i$, we can represent it as a factor node [30] that is connected to every image node $x_i$. The message from $x_i$ to the factor ($m_{if}$) is the same as the belief without the exclusivity term (Eq. (8)), and the message from the factor to the node $x_i$ can be computed as follows:

$$m_{fi}(x_i) = \sum_{\{x_1, \ldots, x_N\} \setminus x_i} \psi_F(x_1, \ldots, x_N | x_i) \prod_{t \in S \setminus i} m_{tf}(x_t) \tag{9}$$

where $S$ is the set of all nodes $x_i$. If any two nodes $(x_l, x_m) \in S$ share the same patch, $\psi_F(\cdot)$ is zero, and it is one otherwise. The message computation involves marginalizing $N - 1$ state variables that can each take on $M$ different values (i.e. $O(M^{N-1})$), which is intractable.

We approximate $\psi_F(\cdot)$ as follows:

$$\psi_F(x_1, \ldots, x_N | x_i) \approx \prod_{t \in S \setminus i} \psi_{F_t}(x_t | x_i) \tag{10}$$

where $\psi_{F_t}(x_t | x_i) = 1 - \delta(x_t - x_i)$. Combining Eq. (9) and Eq. (10),

$$m_{fi}(x_i = l) \approx \prod_{t \in S \setminus i} \sum_{x_t=1}^{M} \psi_{F_t}(x_t | x_i = l)\, m_{tf}(x_t) = \prod_{t \in S \setminus i} \left(1 - m_{tf}(x_t = l)\right) \tag{11}$$

where we have assumed that $m_{tf}$ is normalized to 1. In words, the factor $f$ tells the node $x_i$ to place low probability on $l$ if $l$ has already been claimed by another node with a high probability, which is intuitively satisfying.

The proposed message passing scheme has been tested on the jigsaw puzzle problem (Fig. 3). In most cases, the original image is perfectly reconstructed (Fig. 3(a)), but if a region lacks structure, such as in foggy or clear sky, the algorithm mixes up the order of patches. When one patch is only weakly favored over others, it may lack the power to suppress its re-use in our approximate exclusion term. However, for image editing applications, these two reconstruction shortcomings seldom cause visible artifacts.

Figure 3. (a) The inverse patch transform with the proposed compatibility function reconstructs the original image perfectly. (b) When a simple color and gradient-based compatibility measure is used in the proposed message passing scheme, the algorithm cannot reconstruct the original image perfectly.

4. Image Editing Applications

The patch transform framework offers a new perspective on the way we manipulate images. Applications introduced in this section follow a unified pipeline: the user manipulates the patch statistics of an image, and specifies a number of constraints to be satisfied by the new image. Then the patch transform generates an image that conforms to the request.

The user-specified constraint can be incorporated into the patch transform framework with the local evidence term. If the user has constrained patch $k$ to be at image position $i$, then $p(y_i | x_i = k) = 1$ and $p(y_i | x_i = l) = 0$ for $l \neq k$. At unconstrained nodes, the low-resolution version of the original image can serve as a noisy observation $y_i$:

$$p(y_i | x_i = l) \propto \exp\left(-\frac{(y_i - m(l))^2}{\sigma_{evid}^2}\right) \tag{12}$$

where $m(l)$ is the mean color of patch $l$, and $\sigma_{evid} = 0.4$ is determined through cross-validation. Eq. (12) allows the algorithm to keep the scene structure correct (e.g. sky at the top and grass at the bottom), and is used in all applications described in this section unless specified otherwise. The spatial constraints of patches are shown by the red bounding boxes in the resulting image.
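The approximate exclusion message of Eq. (11) and the local evidence of Eq. (12) are both cheap to compute, as the following Python sketch suggests. It is an illustration rather than the authors' implementation: the function names are hypothetical, the mean color is assumed to be a single scalar per patch, and the local evidence is normalized here for convenience.

```python
import math
from typing import List


def factor_to_node(m_to_factor: List[List[float]], i: int) -> List[float]:
    """Eq. (11): m_fi(x_i = l) = product over t != i of (1 - m_tf(x_t = l)).

    m_to_factor[t][l] is the message from node t to the exclusion factor,
    assumed to be normalized so that it sums to 1 over l.
    """
    n_patches = len(m_to_factor[i])
    out = []
    for l in range(n_patches):
        prod = 1.0
        for t, m_t in enumerate(m_to_factor):
            if t != i:
                prod *= 1.0 - m_t[l]
        out.append(prod)
    return out


def local_evidence(y_i: float, patch_means: List[float],
                   sigma_evid: float = 0.4) -> List[float]:
    """Eq. (12) for an unconstrained node, normalized to sum to 1.

    y_i is the low-resolution observation at node i and patch_means[l]
    is the mean color m(l) of patch l (scalar colors are an assumption).
    """
    w = [math.exp(-((y_i - m) ** 2) / sigma_evid ** 2) for m in patch_means]
    z = sum(w)
    return [v / z for v in w]
```

Computing the factor message for all nodes this way costs on the order of $N^2 M$ operations instead of the $O(M^{N-1})$ of Eq. (9). A patch that some other node already claims with high probability receives a small value in `factor_to_node`, which is the suppression effect described in the text.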
Figure 4. This example illustrates how the patch transform framework can be used to recenter a region or object of interest. (a) The original image. (b) The inverse patch transform result. Notice that the overall structure and context are preserved. (c) Another inverse patch transform result. This figure shows that the proposed framework is insensitive to the size of the bounding box. (d) Using the same constraint as that of (b), a texture synthesis method by Efros and Leung [10] is used to generate a new image.

Figure 5. This example shows that the proposed framework can be used to change the relative position of multiple objects in the image. (a) The original image. (b) The inverse patch transform result with a user-specified constraint that the child should be placed further ahead of his father.

Figure 6. This example verifies that the proposed framework can still work well in the presence of a complex background. (a) The original image. (b) The inverse patch transform result with the user constraint. While the algorithm kept the building fixed, it reshuffled the patches in the garden to accommodate the change in the woman’s position.

Since BP can settle at local minima, we run the patch transform multiple times with random initial seeds, and let the user choose the best-looking image from the resulting candidates. To stabilize BP, the message at iteration $i$ is damped by taking the weighted geometric mean with the message at iteration $i - 1$. Inevitably, in the modified image there will be visible seams between some patches. We suppress these artifacts by using the Poisson equation [22] to reconstruct the image from all its gradients, except those across any patch boundary.

4.1. Reorganizing objects in an image

The user may be interested in moving an object to a new location in the image while keeping the context. An example of re-centering a person is shown in Fig. 4. The user first coarsely selects a region to move, and specifies the location at which the selected region will be placed. A new image is generated satisfying this constraint with an inverse patch transform (Fig. 4(b)). Note that the output image is a visually pleasing reorganization of patches. The overall structure follows the specified local evidence, and the content of the image is preserved (e.g. the mountain background). The algorithm is robust to changes in the size of the bounding box, as shown in Fig. 4(c), as long as a sufficiently distinctive region is selected.

We compared our result with texture synthesis. Fig. 4(d) shows that the Efros and Leung algorithm [10] generates artifacts by propagating the girl’s hair into the bush. Because of the computational cost of [10], we computed Fig. 4(d) at one quarter the resolution of Fig. 4(b).

The user can also reconfigure the relative position of objects in an image. For example, in Fig. 5(a), the user may prefer a composition with the child further ahead of his father. Conventionally, the user would generate a meticulous matte of the child, move him to a new location, blend the child into that new location, and hope to fill in the original region using an image inpainting technique. The patch transform framework provides a simple alternative. The user constraint specifies that the child and his shadow should move to the left, and the inverse patch transform rearranges the image patches to meet that constraint (Fig. 5(b)).

The proposed framework can also work well in the presence of a complex background. In Fig. 6, the user wants to recenter the woman in Fig. 6(a) such that she is aligned with the center of the building. The inverse patch transform generates Fig. 6(b) as the output. The algorithm kept the
building still, and reorganized the flowers in the garden to meet the constraints. There is some bleeding of a faint red color into the building. If that were objectionable, it could be corrected by the user.

4.2. Manipulating the patch statistics of an image

With the patch transform, users can manipulate the patch statistics of an image, where the patch statistics encode how many patches from a certain class (such as sky, cloud, grass, etc.) are used in reconstructing the image. Such a request can be folded into the $p(x_i)$ we modeled as a constant. For example, if a user specified that sky should be reduced (by clicking on a sky patch $x_s$), $p(x_i)$ can be parameterized so that BP tries not to use patches similar to $x_s$:

$$p(x_i; x_s) \propto \exp\left(\frac{(f(x_i) - f(x_s))^2}{\sigma_{sp}^2}\right) \tag{13}$$

where $\sigma_{sp}$ is a specificity parameter, and $f(\cdot)$ is a function that captures the characteristic the user wants to manipulate. In this work, $f(\cdot)$ is the mean color of the argument. Users can specify how strong this constraint should be by changing $\sigma_{sp}$ manually. The statistics manipulation example is shown in Fig. 7, with $\sigma_{sp} = 0.2$. Starting with Fig. 7(a), we have moved the tree to the right, and specified that the sky region or the cloud region should be reduced, respectively. The results for these constraints are shown in Fig. 7(b) and Fig. 7(c). Notice that cloud patches and sky patches are used multiple times in each image: the energy penalty paid for using these patches multiple times is compensated by the energy preference specified with Eq. (13). This example can easily be extended to favor patches from a certain class.

Figure 7. This example shows how the proposed framework can be used to manipulate the patch statistics of an image. The tree is specified to move to the right side of the image. (a) is the original image. (b) is the inverse patch transform result with a constraint to use fewer sky patches. (c) is the inverse patch transform result with a constraint to use fewer cloud patches.

4.3. Resizing an image

The patch transform can be used to change the size of the overall image without changing the size of any patch. This operation is called image retargeting. It can be thought of as solving a jigsaw puzzle on a smaller palette (leaving some patches unused). In retargeting Fig. 8(a), the user specified that the width and height of the output image should be 80% of those of the original image. The reconstructed image with the specified constraints is shown in Fig. 8(b). Interestingly, while the context is preserved, objects within the image have reorganized themselves: a whole row of windows in the building has disappeared to fit the image vertically, and the objects are reorganized laterally as well to fit the image width. What makes retargeting work in the patch transform framework is that while the local compatibility term tries to simply crop the original image, the local evidence term competes against that to contain as much information as possible. The patch transform will balance these competing interests to generate the retargeted image.

Figure 8. In this example, the original image shown in (a) is resized such that the width and height of the output image are 80% of those of the original image. (b) The reconstructed image from the patch transform framework. (c) The retargeting result using Seam Carving [2]. While Seam Carving preserves locally salient structures well, our work preserves the global context of the image through local evidence.

The retargeting result using Seam Carving [2] is shown in Fig. 8(c). While Seam Carving better preserves the salient local structures, the patch transform framework does a better job of preserving the global proportion of regions (such as the sky, the building and the pavement) through local evidence.
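To make the patch-statistics prior of Eq. (13) in Section 4.2 concrete, the short Python sketch below computes a normalized $p(x_i; x_s)$ over a pool of patches. The function name and the use of a scalar mean color for $f(\cdot)$ are assumptions made for illustration only.

```python
import math
from typing import List


def patch_prior(features: List[float], f_s: float,
                sigma_sp: float = 0.2) -> List[float]:
    """Eq. (13), normalized to sum to 1 over the patch pool.

    features[k] is f(.) evaluated on patch k (assumed here to be a scalar
    mean color) and f_s is f(x_s) for the patch the user clicked on.
    The exponent is positive, so patches that resemble x_s get LOW prior.
    """
    w = [math.exp(((f - f_s) ** 2) / sigma_sp ** 2) for f in features]
    z = sum(w)
    return [v / z for v in w]
```

With a small `sigma_sp` the prior falls off sharply around `f_s`, so only patches very close to the selected one are discouraged; a larger value spreads the penalty over a wider range of colors.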
Figure 9. In this example, we collage two images shown in (a) and (b). (c) The inverse patch transform result. The user wants to copy the mountain from (b) into the background of (a). The new, combined image looks visually pleasing (although there is some color bleeding of the foreground snow). (d) This figure shows from which image the algorithm took the patches. The green region denotes patches from (a) and the yellow region denotes patches from (b).

4.4. Adding two images in the patch domain

Here we show that the proposed framework can generate an image that captures the characteristics of two or more images by mixing the patches. In this application, the local evidence is kept uniform for all image nodes other than the nodes within the bounding box to let the algorithm determine the structure of the image. An example is shown in Fig. 9. A photographer may find it hard to capture the person and the desired background at the same time at a given shooting position (Fig. 9(a)). In this case, we can take multiple images (possibly using different lenses) and combine them in the patch domain: Fig. 9(b) is the better view of the mountain using a different lens. The patch transform result is shown in Fig. 9(c). Interestingly, the algorithm tries to stitch together the mountains from both images so that artifacts are minimized. This is similar to the Digital Photomontage work developed by Agarwala et al. [1]. The inverse patch transform finds the optimal way to place patches together to generate a visually pleasing image.

Figure 10. These examples illustrate typical failure cases. In the top example, although the objects on the beach reorganize themselves to accommodate the user constraint, the sky patches propagate into the sea, losing the overall structure of the image. The bottom example shows that some structures cannot be reorganized to generate natural looking structures.

5. Discussions and conclusions

We have demonstrated that the patch transform can be used in several image editing operations. The patch transform provides an alternative to extensive user intervention to generate natural looking edited images.

The user has to specify two inputs to reconstruct an image: the bounding box that contains the object of interest, and the desired location of the patches in the bounding box. As shown in Fig. 4, the algorithm is robust to changes in the size of the bounding box. We found it best to fix as small a region as possible if the user wants to fully explore the space of natural looking images. However, if the user wants to generate a natural-looking image with a small number of BP iterations, it is better to fix a larger region in the image. The algorithm is quite robust to changes in the relative location of bounding boxes, but the user should roughly place the bounding boxes in such a way that a natural looking image can be anticipated. We also learned that the patch transform framework works especially well when the background is textured (e.g. natural scenes) or regular (i.e. grid-type).

With our relatively unoptimized MATLAB implementation on a 2.66GHz CPU, 3GB RAM machine, the compatibility computation takes about 10 minutes with 300 patches, and the BP takes about 3 minutes to run 300 iterations with 300 image nodes. For most of the results shown, we ran BP from 5 different randomized initial conditions and selected the best result. The visually most pleasing image may not always correspond to the most probable image evaluated by Eq. (5) because the user may penalize certain artifacts (such as misaligned edges) more than others, while the algorithm penalizes all artifacts on an equal footing through the natural image and color difference priors.

Although the algorithm performed well on a diverse set of images, it can break down under two circumstances (Figure 10). If the input image lacks structure such that the compatibility matrix is severely non-diagonal, the reconstruction algorithm often assigns the same patch to multiple
nodes, violating the local evidence. Another typical failure case arises when it is not possible to generate a plausible image with the given user constraints and patches. Such a situation arises partly because some structures cannot be reorganized to generate other natural looking structures.

The main limitation of this work is that the control over the patch location is inherently limited by the size of the patch, which can lead to visible artifacts. If patches are too small, the patch assignment algorithm breaks down due to exponential growth in the state dimensionality. A simple extension to address this issue is to represent the image with overlapping patches, and generate the output image by “quilting” these patches [9]. We could define the compatibility using the “seam energy” [2]. Since seams can take arbitrary shapes, fewer artifacts are expected. Another limitation of this work is the large amount of computation. To enable interactive image editing using the patch transform, both the number of BP iterations and the amount of computation per BP iteration should be reduced. The overlapping patch transform framework may help in this regard as well, since larger patches (i.e. fewer patches per image) can be used without degrading the output image quality.

Acknowledgments

This research is partially funded by ONR-MURI grant N00014-06-1-0734 and by Shell Research. The first author is partially supported by Samsung Scholarship Foundation. Authors would like to thank Myung Jin Choi, Ce Liu, Anat Levin, and Hyun Sung Chang for fruitful discussions. Authors would also like to thank Flickr for images.

References

[1] A. Agarwala, M. Dontcheva, M. Agrawala, S. Drucker,
[10] A. A. Efros and T. K. Leung. Texture synthesis by non-parametric sampling. In Proc. IEEE ICCV, 1999.
[11] W. T. Freeman, E. C. Pasztor, and O. T. Carmichael. Learning low-level vision. International Journal of Computer Vision, 40(1):25–47, 2000.
[12] D. Goldberg, C. Malon, and M. Bern. A global approach to automatic solution of jigsaw puzzles. In Proc. Annual Symposium on Computational Geometry, 2002.
[13] N. Jojic, B. J. Frey, and A. Kannan. Epitomic analysis of appearance and shape. In Proc. IEEE ICCV, 2003.
[14] A. Kannan, J. Winn, and C. Rother. Clustering appearance and shape by learning jigsaws. In Advances in Neural Information Processing Systems 19, 2006.
[15] D. Koller and M. Levoy. Computer-aided reconstruction and new matches in the Forma Urbis Romae. In Bullettino Della Commissione Archeologica Comunale di Roma, 2006.
[16] N. Komodakis and G. Tziritas. Image completion using efficient belief propagation via priority scheduling and dynamic pruning. IEEE Trans. Image Processing, 16(11):2649–2661, November 2007.
[17] V. Kwatra, A. Schödl, I. Essa, G. Turk, and A. Bobick. Graphcut textures: image and video synthesis using graph cuts. In ACM SIGGRAPH, 2003.
[18] J.-F. Lalonde, D. Hoiem, A. A. Efros, C. Rother, J. Winn, and A. Criminisi. Photo clip art. ACM SIGGRAPH, 2007.
[19] M. Levison. The computer in literary studies. In A. D. Booth, editor, Machine Translation, pages 173–194. North-Holland, Amsterdam, 1967.
[20] L. Liang, C. Liu, Y.-Q. Xu, B. Guo, and H.-Y. Shum. Real-time texture synthesis by patch-based sampling. ACM Transactions on Graphics, 2001.
[21] E. N. Mortensen and W. A. Barrett. Intelligent scissors for image composition. In ACM SIGGRAPH, 1995.
[22] P. Pérez, M. Gangnet, and A. Blake. Poisson image editing. In ACM SIGGRAPH, 2003.
A. Colburn, B. Curless, D. Salesin, and M. Cohen. Inter- [23] S. Roth and M. Black. A framework for learning image pri-
active digital photomontage. In ACM SIGGRAPH, 2004. 7 ors. In Proc. IEEE CVPR. 3
[2] S. Avidan and A. Shamir. Seam carving for content-aware [24] C. Rother, L. Bordeaux, Y. Hamadi, and A. Blake. Autocol-
image resizing. ACM SIGGRAPH, 2007. 1, 6, 8 lage. In ACM SIGGRAPH, 2006. 2
[3] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester. Image [25] D. Simakov, Y. Caspi, E. Shechtman, and M. Irani. Sum-
inpainting. In ACM SIGGRAPH, 2000. 2 marizing visual data using bidirectional similarity. In Proc.
[4] J. D. Bonet. Multiresolution sampling procedure for analysis IEEE CVPR, 2008. 2
and synthesis of texture images. In ACM SIGGRAPH, 1997. [26] C.-S. Wang. Determining molecular conformation from dis-
2 tance or density data. PhD thesis, Massachusetts Institute of
[5] M. Brown and D. Lowe. Recognising panoramas. In Proc. Technology, 2000. 2
IEEE ICCV, 2003. 2 [27] J. Wang and M. Cohen. Simultaneous matting and composit-
[6] M. G. Chung, M. M. Fleck, and D. A. Forsyth. Jigsaw puzzle ing. In Proc. IEEE CVPR, 2007. 2
solver using shape and color. In Proc. International Confer- [28] J. Wang and M. F. Cohen. An iterative optimization approach
ence on Signal Processing, 1998. 2 for unified image segmentation and matting. In Proc. IEEE
[7] A. Criminisi, P. Pérez, and K. Toyama. Region filling and ICCV, 2005. 2
object removal by exemplar-based image inpainting. IEEE [29] Y. Weiss and W. T. Freeman. What makes a good model of
Transactions on Image Processing, 2004. 2 natural images? In Proc. IEEE CVPR, 2007. 3
[8] E. D. Demaine and M. L. Demaine. Jigsaw puzzles, edge [30] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Understand-
matching, and polyomino packing: Connections and com- ing belief propagation and its generalizations. Exploring ar-
plexity. Graphs and Combinatorics, 23, 2007. 2 tificial intelligence in the new millennium, pages 239–269,
[9] A. A. Efros and W. T. Freeman. Image quilting for texture 2003. 3, 4
synthesis and transfer. In SIGGRAPH, 2001. 2, 8
