The Patch Transform and Its Applications To Image Editing
Figure 1. ψ_{i,j} is computed by convolving the boundary of two patches with filters, and combining the filter outputs with a GSM-FOE model.
We have already defined the user-constraints, φ, and the patch exclusion term, E. In the next section, we specify the patch-to-patch compatibility term, ψ, and then describe how we find patch assignments x that approximately maximize P(x) in Eq. (1).

Figure 2. (a) and (b) show a part of pLR, pDU of Fig. 4(a) in matrix form. pLR(i, j) is the probability of placing patch i to the right of patch j, whereas pDU(i, j) is the probability of placing patch i on top of patch j. The patches are pre-ordered, so the correct matches, which would generate the original image, are the row-shifted diagonal components.
3.1. Computing the compatibility among patches

We want two patches to have a high compatibility score if, when they are placed next to each other, the pixel values across the seam look like natural image data. We quantify this using two terms, a natural image prior and a color difference prior.

For the natural image prior term, we apply the filters of the Gaussian Scale Mixture Fields of Experts (GSMFOE) model [23, 29] to compute a score, ψ^A_{i,j}(k, l), for patches k and l being in the relative relationship of positions i and j, as illustrated in Fig. 1. The compatibility score is computed with Eq. (2):

\psi^{A}_{i,j}(k, l) = \frac{1}{Z} \prod_{l,m} \sum_{q=1}^{J} \frac{\pi_q}{\sigma_q} \exp\!\left( -\frac{(w_l^{\top} x_m(k, l))^2}{2\sigma_q^2} \right) \qquad (2)
where x(k, l) is the luminance component at the boundary of patches (k, l), σ_q, π_q are GSMFOE parameters, and w_l are the learned filters. σ_q, π_q, and w_l are available online¹.
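To make Eq. (2) concrete, the following sketch evaluates the GSMFOE score for one boundary strip. This is a hypothetical NumPy/SciPy illustration, not the authors' MATLAB implementation; the filter format and the omission of the normalization constant Z are assumptions.

```python
import numpy as np
from scipy.signal import convolve2d

def gsmfoe_score(boundary, filters, pi, sigma):
    """Unnormalized psi^A of Eq. (2) for one seam.

    boundary:  2-D array, the luminance strip straddling the seam
               between two patches (x(k, l) in the text).
    filters:   list of 2-D GSMFOE filters w_l.
    pi, sigma: (J,) arrays of GSM mixture weights pi_q and scales sigma_q.
    The partition function Z is omitted; it cancels when the scores
    are later normalized as in Eq. (4).
    """
    log_score = 0.0
    for w in filters:
        resp = convolve2d(boundary, w, mode='valid')        # responses w_l^T x_m
        lik = np.sum((pi / sigma) *
                     np.exp(-resp[..., None] ** 2 / (2.0 * sigma ** 2)),
                     axis=-1)                               # GSM likelihood at each position m
        log_score += np.sum(np.log(lik))                    # accumulate in log domain
    return np.exp(log_score)
```

In practice one would keep the score in the log domain throughout, since the product over filters and boundary positions underflows quickly.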
We found improved results if we included an additional term that is sensitive to color differences between the patches. We computed the color compatibility, ψ^B_{i,j}, between two patches by exponentiating the sum of squared distances among adjacent pixels at the patch boundaries:

\psi^{B}_{i,j}(k, l) \propto \exp\!\left( -\frac{(r(k) - r(l))^2}{\sigma_{clr}^2} \right) \qquad (3)
where r(·) is the color along the corresponding boundary of the argument, and σ_clr is fixed at 0.2 after cross-validation. The final patch compatibility is then ψ_{i,j}(k, l) = ψ^A_{i,j}(k, l) ψ^B_{i,j}(k, l).
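The color term and the final combination are each a one-liner. A minimal sketch (hypothetical code, assuming RGB patches in [0, 1] and the gsmfoe_score sketch above; the left-right seam layout is an assumption):

```python
import numpy as np

def color_compatibility(patch_j, patch_i, sigma_clr=0.2):
    """psi^B of Eq. (3) for placing patch_i directly to the right of
    patch_j: squared color difference between the abutting boundary
    columns of the two patches."""
    seam = patch_j[:, -1, :] - patch_i[:, 0, :]
    return np.exp(-np.sum(seam ** 2) / sigma_clr ** 2)

# Final compatibility, the product of Eq. (2) and Eq. (3):
#   psi = gsmfoe_score(boundary, filters, pi, sigma) * \
#         color_compatibility(patch_j, patch_i)
```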
Typically, we break the image into patches of 32 × 32 pixels, and for typical image sizes this generates ∼300 non-overlapping patches. We compute the compatibility score for all four possible spatial arrangements of all possible pairs of patches for the image. Fig. 2 shows the resulting patch-patch compatibility matrices for two of the four possible patch spatial relationships.

¹http://www.cs.huji.ac.il/~yweiss/BRFOE.zip

3.2. Approximate solution by belief propagation

Now we have defined all the terms in Eq. (1) for the probability of any assignment x of patches to image positions. Finding the assignment x that maximizes P(x) in the MRF of Eq. (1) is NP-hard, but approximate methods can nonetheless give good results. One such method is belief propagation. Belief propagation is an exact inference algorithm for Markov networks without loops, but can give good results even in some networks with loops [30]. When belief propagation is applied in networks with loops, different factorizations of the MRF joint probability can lead to different results. Somewhat counter-intuitively, we found better results for this problem using an alternative factorization of Eq. (1) as a directed graph, which we describe below.

We can express Eq. (1) in terms of conditional probabilities if we define a normalized compatibility,

p_{i,j}(x_i | x_j) = \frac{\psi_{i,j}(x_i, x_j)}{\sum_{x_i=1}^{M} \psi_{i,j}(x_i, x_j)} \qquad (4)

and the local evidence term p(y_i | x_i) = φ_i(x_i). Then we can express the joint probability of Eq. (1) as

P(x) = \frac{1}{Z} \prod_{i=1}^{N} \Big( p(y_i | x_i) \prod_{j \in N(i)} p_{i,j}(x_j | x_i) \, p(x_i) \Big) E(x) \qquad (5)

where N(i) is the set of indices neighboring x_i, y_i is the original patch at location i, and p_{i,j} is the appropriate normalized compatibility determined by the relative location of x_j with respect to x_i. A similar factorization for an MRF was used in [11]. We can manipulate the patch statistics (Section 4.2) through p(x_i), but in most cases we model p(x_i) as a uniform distribution, which is then amortized into the normalization constant Z.
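Putting the pieces together, one M × M table per spatial arrangement can be precomputed and then normalized as in Eq. (4). A hypothetical sketch for the left-right arrangement (pLR of Fig. 2), with the pairwise compatibility function left abstract:

```python
import numpy as np

def build_p_lr(patches, pair_compat):
    """Column-normalized left-right compatibility matrix (Eq. (4)).

    patches:     list of M patch arrays.
    pair_compat: function (patch_j, patch_i) -> psi_{i,j} for patch i
                 placed to the right of patch j (e.g. the product of
                 the GSMFOE and color sketches above).
    Returns p_lr with p_lr[i, j] = p(x_i = i | x_j = j).
    """
    M = len(patches)
    psi = np.empty((M, M))
    for j in range(M):
        for i in range(M):
            psi[i, j] = pair_compat(patches[j], patches[i])
    return psi / psi.sum(axis=0, keepdims=True)   # normalize over i, Eq. (4)
```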
The approximate marginal probability at a node i can be found by iterating the belief propagation message update rules until convergence [30]. Ignoring the exclusivity term E(x) for now, the message update rules for this factorization are as follows. Let us suppose that x_j is an image node to the left of x_i. Then the message from x_j to x_i is:

m_{ji}(x_i) \propto \sum_{x_j} p_{i,j}(x_i | x_j) \, p(y_j | x_j) \prod_{l \in N(j) \setminus i} m_{lj}(x_j) \qquad (6)
Messages from nodes that are to the right/top/bottom of x_i are similarly defined with an appropriate p_{i,·}(x_i | ·). The patch assignment at node x_i is:

\hat{x}_i = \arg\max_l \, b_i(x_i = l) \qquad (7)

where the belief at node x_i is defined as follows:

b_i(x_i) = p(y_i | x_i) \prod_{j \in N(i)} m_{ji}(x_i) \qquad (8)
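For concreteness, the message update of Eq. (6) and the belief of Eq. (8) reduce to a few lines of matrix arithmetic. The sketch below is hypothetical NumPy code (the variable names and data layout are assumptions, not the authors' implementation):

```python
import numpy as np

def message_from_left(p_lr, evidence_j, msgs_to_j):
    """One BP message m_{ji} from a left neighbor j to node i (Eq. (6)).

    p_lr:       (M, M) normalized compatibility, p_lr[xi, xj] = p(xi | xj).
    evidence_j: (M,) local evidence p(y_j | x_j).
    msgs_to_j:  list of (M,) messages m_{lj} from j's other neighbors l
                (the message coming from i itself is excluded).
    """
    prod = evidence_j.copy()
    for m in msgs_to_j:
        prod *= m                     # p(y_j|x_j) * prod_l m_{lj}(x_j)
    m_ji = p_lr @ prod                # sum over x_j of p(x_i|x_j) * (...)
    return m_ji / m_ji.sum()          # Eq. (6) is defined up to a constant

def belief(evidence_i, msgs_to_i):
    """Belief at node i (Eq. (8)); its argmax is the assignment of Eq. (7)."""
    b = evidence_i.copy()
    for m in msgs_to_i:
        b *= m
    return b / b.sum()
```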
3.2.1 Handling the patch exclusion term

In most cases, running the above message passing scheme does not result in a visually plausible image, because a trivial solution (in the absence of local evidence) is to assign a single bland patch to all nodes x_i. To find a more plausible solution, we want to require that each patch be used only once. We call this the exclusivity term.

Since the exclusivity term is a global function involving all x_i, we can represent it as a factor node [30] that is connected to every image node x_i. The message from x_i to the factor (m_{if}) is the same as the belief without the exclusivity term (Eq. (8)), and the message from the factor to the node x_i can be computed as follows:

m_{fi}(x_i) = \sum_{\{x_1, \ldots, x_N\} \setminus x_i} \psi_F(x_1, \ldots, x_N | x_i) \prod_{t \in S \setminus i} m_{tf}(x_t) \qquad (9)

where S is the set of all nodes x_i. If any two nodes (x_l, x_m) ∈ S share the same patch, ψ_F(·) is zero, and it is one otherwise. The message computation involves marginalizing N − 1 state variables, each of which can take on M different values (i.e. O(M^{N−1}) operations), which is intractable.

We approximate ψ_F(·) as follows:

\psi_F(x_1, \ldots, x_N | x_i) \approx \prod_{t \in S \setminus i} \psi_F^t(x_t | x_i) \qquad (10)

where ψ_F^j(x_j | x_i) = 1 − δ(x_j − x_i). Combining Eq. (9) and Eq. (10),

m_{fi}(x_i = l) \approx \prod_{t \in S \setminus i} \sum_{x_t = 1}^{M} \psi_F^t(x_t | x_i = l) \, m_{tf}(x_t) = \prod_{t \in S \setminus i} \big( 1 - m_{tf}(x_t = l) \big) \qquad (11)

where we have assumed that m_{tf} is normalized to 1. In words, the factor f tells the node x_i to place low probability on l if l has already been claimed by another node with high probability, which is intuitively satisfying.

Figure 3. (a) The inverse patch transform with the proposed compatibility function reconstructs the original image perfectly. (b) When a simple color and gradient-based compatibility measure is used in the proposed message passing scheme, the algorithm cannot reconstruct the original image perfectly.

The proposed message passing scheme has been tested on the jigsaw puzzle problem (Fig. 3). In most cases, the original image is perfectly reconstructed (Fig. 3(a)), but if a region lacks structure, such as foggy or clear sky, the algorithm mixes up the order of patches. In addition, when one patch is only weakly favored over others, it may lack the power to suppress its re-use through our approximate exclusion term. However, for image editing applications, these two reconstruction shortcomings seldom cause visible artifacts.
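The approximate update of Eq. (11) is also cheap: each factor-to-node message is a leave-one-out product of terms (1 − m_{tf}). A hypothetical vectorized sketch (assuming the node-to-factor messages are stacked in a single array; not the authors' implementation):

```python
import numpy as np

def exclusion_messages(m_to_factor):
    """Approximate factor-to-node messages of Eq. (11).

    m_to_factor: (N, M) array; row t is the normalized message m_{tf}
    from node t to the exclusivity factor.
    Returns an (N, M) array whose row i holds
    m_{fi}(x_i = l) = prod over t != i of (1 - m_{tf}(x_t = l)).
    """
    # log(1 - m), clipped so fully claimed patches do not produce log(0)
    log_terms = np.log1p(-np.clip(m_to_factor, 0.0, 1.0 - 1e-12))
    total = log_terms.sum(axis=0, keepdims=True)   # sum over all nodes t
    return np.exp(total - log_terms)               # leave-one-out product per row
```

The log-domain leave-one-out trick computes all N messages in O(NM) instead of O(N^2 M).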
4. Image Editing Applications

The patch transform framework offers a new perspective on the way we manipulate images. The applications introduced in this section follow a unified pipeline: the user manipulates the patch statistics of an image and specifies a number of constraints to be satisfied by the new image. Then the patch transform generates an image that conforms to the request.

Figure 4. This example illustrates how the patch transform framework can be used to recenter a region / object of interest. (a) The original image. (b) The inverse patch transform result. Notice that the overall structure and context are preserved. (c) Another inverse patch transform result. This figure shows that the proposed framework is insensitive to the size of the bounding box. (d) Using the same constraint as that of (b), a texture synthesis method by Efros and Leung [10] is used to generate a new image.

The user-specified constraint can be incorporated into the patch transform framework through the local evidence term. If the user has constrained patch k to be at image position i, then p(y_i | x_i = k) = 1 and p(y_i | x_i = l) = 0 for l ≠ k. At unconstrained nodes, the low-resolution version of the original image can serve as a noisy observation y_i:

p(y_i | x_i = l) \propto \exp\!\left( -\frac{(y_i - m(l))^2}{\sigma_{evid}^2} \right) \qquad (12)

where m(l) is the mean color of patch l, and σ_evid = 0.4, determined through cross-validation. Eq. (12) allows the algorithm to keep the scene structure correct (i.e. sky at the top and grass at the bottom), and is used in all applications described in this section unless specified otherwise. The spatial constraints of patches are shown by the red bounding boxes in the resulting images.
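The local evidence of Eq. (12) is simple to sketch. In the hypothetical code below, each observation y_i is reduced to a mean color compared against the patch means m(l); the per-pixel handling of the low-resolution image is an assumption:

```python
import numpy as np

def local_evidence(y_mean, patch_means, constrained=None, sigma_evid=0.4):
    """Local evidence p(y_i | x_i = l) for one node i, following Eq. (12).

    y_mean:      (3,) mean color of the low-resolution observation at node i.
    patch_means: (M, 3) mean color m(l) of every patch l.
    constrained: if the user pinned patch k to this node, pass k and the
                 evidence becomes an indicator on that patch.
    """
    M = patch_means.shape[0]
    if constrained is not None:
        p = np.zeros(M)
        p[constrained] = 1.0          # p(y_i|x_i=k)=1 and 0 otherwise
        return p
    d2 = np.sum((patch_means - y_mean) ** 2, axis=1)
    p = np.exp(-d2 / sigma_evid ** 2)
    return p / p.sum()
```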
Figure 5. This example shows that the proposed framework can be used to change the relative position of multiple objects in the image. (a) The original image. (b) The inverse patch transform result with a user specified constraint that the child should be placed further ahead of his father.

Figure 6. This example verifies that the proposed framework can still work well in the presence of a complex background. (a) The original image. (b) The inverse patch transform result with the user constraint. While the algorithm fixed the building, it reshuffled the patches in the garden to accommodate the change in the woman's position.
Since BP can settle at local minima, we run the patch transform multiple times with random initial seeds, and let the user choose the best-looking image from the resulting candidates. To stabilize BP, the message at iteration i is damped by taking the weighted geometric mean with the message at iteration i − 1. Inevitably, in the modified image there will be visible seams between some patches. We suppress these artifacts by using the Poisson equation [22] to reconstruct the image from all its gradients, except those across any patch boundary.
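The damping step is a one-line operation per message. A minimal sketch, assuming normalized messages; the damping weight alpha is an assumption, as the paper does not report its value:

```python
import numpy as np

def damp_message(m_new, m_old, alpha=0.5):
    """Stabilize BP by damping: weighted geometric mean of the message
    from the current iteration and the one from the previous iteration,
    m = m_new^alpha * m_old^(1 - alpha), renormalized to sum to 1."""
    m = m_new ** alpha * m_old ** (1.0 - alpha)
    return m / m.sum()
```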
4.1. Reorganizing objects in an image

The user may be interested in moving an object to a new location in the image while keeping the context. An example of re-centering a person is shown in Fig. 4. The user first coarsely selects a region to move, and specifies the location at which the selected region will be placed. A new image is generated satisfying this constraint with an inverse patch transform (Fig. 4(b)). Note that the output image is a visually pleasing reorganization of patches. The overall structure follows the specified local evidence, and the content of the image is preserved (e.g. the mountain background). The algorithm is robust to changes in the size of the bounding box, as shown in Fig. 4(c), as long as a sufficiently distinctive region is selected.

We compared our result with texture synthesis. Fig. 4(d) shows that the Efros and Leung algorithm [10] generates artifacts by propagating the girl's hair into the bush. Because of the computational cost of [10], we computed Fig. 4(d) at one quarter the resolution of Fig. 4(b).

The user can also reconfigure the relative position of objects in an image. For example, in Fig. 5(a), the user may prefer a composition with the child further ahead of his father. Conventionally, the user would generate a meticulous matte of the child, move him to a new location, blend the child into that new location, and hope to fill in the original region using an image inpainting technique. The patch transform framework provides a simple alternative. The user constraint specifies that the child and his shadow should move to the left, and the inverse patch transform rearranges the image patches to meet that constraint, Fig. 5(b).

The proposed framework can also work well in the presence of a complex background. In Fig. 6, the user wants to recenter the woman in Fig. 6(a) such that she is aligned with the center of the building. The inverse patch transform generates Fig. 6(b) as the output. The algorithm kept the building fixed, while the patches in the garden were reshuffled to accommodate the change in the woman's position.
Figure 8. In this example, the original image shown in (a) is resized such that the width and height of the output image are 80% of the original image. (b) The reconstructed image from the patch transform framework. (c) The retargeting result using Seam Carving [2]. While Seam Carving preserves locally salient structures well, our work preserves the global context of the image through local evidence.
5. Discussions and conclusions

We have demonstrated that the patch transform can be used in several image editing operations. The patch transform provides an alternative to extensive user intervention in generating natural looking edited images.

The user has to specify two inputs to reconstruct an image: the bounding box that contains the object of interest, and the desired location of the patches in the bounding box. As shown in Fig. 4, the algorithm is robust to changes in the size of the bounding box. We found it best to fix as small a region as possible if the user wants to fully explore the space of natural looking images. However, if the user wants to generate a natural-looking image with a small number of BP iterations, it is better to fix a larger region in the image. The algorithm is quite robust to changes in the relative location of bounding boxes, but the user should roughly place the bounding boxes in such a way that a natural looking image can be anticipated. We also learned that the patch transform framework works especially well when the background is textured (e.g. natural scenes) or regular (i.e. grid-type).

With our relatively unoptimized MATLAB implementation on a machine with a 2.66GHz CPU and 3GB RAM, the compatibility computation takes about 10 minutes with 300 patches, and BP takes about 3 minutes to run 300 iterations with 300 image nodes. For most of the results shown, we ran BP from 5 different randomized initial conditions and selected the best result. The visually most pleasing image may not always correspond to the most probable image evaluated by Eq. (5), because the user may penalize certain artifacts (such as misaligned edges) more than others, while the algorithm penalizes all artifacts on an equal footing under the natural image and color difference priors.
Although the algorithm performed well on a diverse set of images, it can break down under two circumstances (Figure 10). If the input image lacks structure, such that the compatibility matrix is severely non-diagonal, the reconstruction algorithm often assigns the same patch to multiple nodes, violating the local evidence. Another typical failure case arises when it is not possible to generate a plausible image with the given user constraints and patches. Such a situation arises partly because some structures cannot be reorganized to generate other natural looking structures.

The main limitation of this work is that the control over the patch location is inherently limited by the size of the patch, which can lead to visible artifacts. If patches are too small, the patch assignment algorithm breaks down due to exponential growth in the state dimensionality. A simple extension to address this issue is to represent the image with overlapping patches, and generate the output image by "quilting" these patches [9]. We could define the compatibility using the "seam energy" [2]. Since seams can take arbitrary shapes, fewer artifacts are expected. Another limitation of this work is the large amount of computation. To enable interactive image editing using the patch transform, both the number of BP iterations and the amount of computation per BP iteration should be reduced. The overlapping patch transform framework may help in this regard as well, since larger patches (i.e. fewer patches per image) can be used without degrading the output image quality.

Acknowledgments

This research is partially funded by ONR-MURI grant N00014-06-1-0734 and by Shell Research. The first author is partially supported by Samsung Scholarship Foundation. The authors would like to thank Myung Jin Choi, Ce Liu, Anat Levin, and Hyun Sung Chang for fruitful discussions. The authors would also like to thank Flickr for images.
References

[1] A. Agarwala, M. Dontcheva, M. Agrawala, S. Drucker, A. Colburn, B. Curless, D. Salesin, and M. Cohen. Interactive digital photomontage. In ACM SIGGRAPH, 2004.
[2] S. Avidan and A. Shamir. Seam carving for content-aware image resizing. In ACM SIGGRAPH, 2007.
[3] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester. Image inpainting. In ACM SIGGRAPH, 2000.
[4] J. D. Bonet. Multiresolution sampling procedure for analysis and synthesis of texture images. In ACM SIGGRAPH, 1997.
[5] M. Brown and D. Lowe. Recognising panoramas. In Proc. IEEE ICCV, 2003.
[6] M. G. Chung, M. M. Fleck, and D. A. Forsyth. Jigsaw puzzle solver using shape and color. In Proc. International Conference on Signal Processing, 1998.
[7] A. Criminisi, P. Pérez, and K. Toyama. Region filling and object removal by exemplar-based image inpainting. IEEE Transactions on Image Processing, 2004.
[8] E. D. Demaine and M. L. Demaine. Jigsaw puzzles, edge matching, and polyomino packing: Connections and complexity. Graphs and Combinatorics, 23, 2007.
[9] A. A. Efros and W. T. Freeman. Image quilting for texture synthesis and transfer. In ACM SIGGRAPH, 2001.
[10] A. A. Efros and T. K. Leung. Texture synthesis by non-parametric sampling. In Proc. IEEE ICCV, 1999.
[11] W. T. Freeman, E. C. Pasztor, and O. T. Carmichael. Learning low-level vision. International Journal of Computer Vision, 40(1):25–47, 2000.
[12] D. Goldberg, C. Malon, and M. Bern. A global approach to automatic solution of jigsaw puzzles. In Proc. Annual Symposium on Computational Geometry, 2002.
[13] N. Jojic, B. J. Frey, and A. Kannan. Epitomic analysis of appearance and shape. In Proc. IEEE ICCV, 2003.
[14] A. Kannan, J. Winn, and C. Rother. Clustering appearance and shape by learning jigsaws. In Advances in Neural Information Processing Systems 19, 2006.
[15] D. Koller and M. Levoy. Computer-aided reconstruction and new matches in the Forma Urbis Romae. In Bullettino della Commissione Archeologica Comunale di Roma, 2006.
[16] N. Komodakis and G. Tziritas. Image completion using efficient belief propagation via priority scheduling and dynamic pruning. IEEE Trans. Image Processing, 16(11):2649–2661, November 2007.
[17] V. Kwatra, A. Schödl, I. Essa, G. Turk, and A. Bobick. Graphcut textures: image and video synthesis using graph cuts. In ACM SIGGRAPH, 2003.
[18] J.-F. Lalonde, D. Hoiem, A. A. Efros, C. Rother, J. Winn, and A. Criminisi. Photo clip art. In ACM SIGGRAPH, 2007.
[19] M. Levison. The computer in literary studies. In A. D. Booth, editor, Machine Translation, pages 173–194. North-Holland, Amsterdam, 1967.
[20] L. Liang, C. Liu, Y.-Q. Xu, B. Guo, and H.-Y. Shum. Real-time texture synthesis by patch-based sampling. ACM Transactions on Graphics, 2001.
[21] E. N. Mortensen and W. A. Barrett. Intelligent scissors for image composition. In ACM SIGGRAPH, 1995.
[22] P. Pérez, M. Gangnet, and A. Blake. Poisson image editing. In ACM SIGGRAPH, 2003.
[23] S. Roth and M. Black. A framework for learning image priors. In Proc. IEEE CVPR, 2005.
[24] C. Rother, L. Bordeaux, Y. Hamadi, and A. Blake. AutoCollage. In ACM SIGGRAPH, 2006.
[25] D. Simakov, Y. Caspi, E. Shechtman, and M. Irani. Summarizing visual data using bidirectional similarity. In Proc. IEEE CVPR, 2008.
[26] C.-S. Wang. Determining molecular conformation from distance or density data. PhD thesis, Massachusetts Institute of Technology, 2000.
[27] J. Wang and M. Cohen. Simultaneous matting and compositing. In Proc. IEEE CVPR, 2007.
[28] J. Wang and M. F. Cohen. An iterative optimization approach for unified image segmentation and matting. In Proc. IEEE ICCV, 2005.
[29] Y. Weiss and W. T. Freeman. What makes a good model of natural images? In Proc. IEEE CVPR, 2007.
[30] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Understanding belief propagation and its generalizations. In Exploring Artificial Intelligence in the New Millennium, pages 239–269, 2003.