Producing Stylized Videos Using the AnimVideo Rendering Tool

Rafael B. Gomes,¹ Lucas M. Oliveira,¹ Laurindo S. Britto-Neto,¹ Tiago S. Santos,¹ Gilbran S. Andrade,¹ Bruno M. Carvalho,¹ Luiz M. G. Gonçalves²

¹ Departamento de Informática e Matemática Aplicada, Universidade Federal do Rio Grande do Norte, Campus Universitário, S/N, Lagoa Nova, Natal, RN 59.072-970, Brazil

² Departamento de Engenharia de Computação e Automação, Universidade Federal do Rio Grande do Norte, Campus Universitário, S/N, Lagoa Nova, Natal, RN 59.072-970, Brazil

Received 15 August 2008; accepted 2 March 2009

ABSTRACT: Stylized rendering is the process of generating images or videos that can have the visual appeal of pieces of art, expressing the visual and emotional characteristics of artistic styles. A major problem in stylizing videos is the absence of temporal coherence, which results in flickering of the structural drawing elements (such as brush strokes or curves), also known as swimming. This article describes the AnimVideo rendering tool that was developed for stylizing videos with temporal coherence. The temporal coherence is achieved by first fully segmenting the input video with a fast fuzzy segmentation algorithm that uses hybrid color spaces and motion information. The result of the segmentation algorithm is used to constrain the result of an optical flow algorithm, given as dense optical flow maps that are then used to correctly move, remove, or add structural drawing elements. The combination of these two methods is referred to as constrained optical flow, and we also provide the option of initializing the optical flow computation with displacement maps computed by homographies that map objects in adjacent frames. We also briefly describe some stylized rendering methods that were implemented in the tool. Finally, experimental results are shown, including snapshots of the tool's interface and illustrative examples of the produced renderings that validate the proposed techniques. © 2009 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 19, 100–110, 2009; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ima.20180

Key words: stylized rendering; intraobject temporal coherence; fuzzy segmentation; constrained optical flow; hybrid color spaces

I. INTRODUCTION

Stylized rendering is the process of generating images or videos that can have the visual appeal of pieces of art, expressing the visual and emotional characteristics of artistic styles. This class of techniques was originally named Non-Photorealistic Rendering (NPR). As this negative definition points out, NPR is a class of techniques defined by what they do not aim at: the realistic rendering of artificial scenes. Another way of defining stylized rendering or NPR techniques is that they aim to reproduce the renderings of artistic techniques, trying to express feelings and moods in the rendered scenes.

There are many stylized rendering techniques published in the literature, and some are currently being used in application areas such as movies, games, advertisement, and technical illustrations. Some stylized rendering techniques include pen-and-ink drawings (Winkenbach and Salesin, 1994), cartoon shading, mosaics (Di Blasi and Gallo, 2005), impressionist-style rendering (Litwinowicz, 1997), and water-coloring (Bousseau et al., 2006). For comprehensive reviews of several NPR techniques and applications, the reader should refer to (Gooch and Gooch, 2001; Strothotte and Schlechtweg, 2002).

Animation techniques can convey information that cannot be captured by simply shooting a real scene with a video camera. However, this kind of animation is labor intensive and requires a fair amount of artistic skill. On the other hand, one could use stylized rendering techniques and graphical tools to generate highly abstracted animations with little user intervention, thus making it possible for nonartist users to create their own animations with less effort. However, there is a major problem in video stylization, which is the absence of temporal coherence. Temporal incoherence occurs when drawing elements move in undesired directions, producing a distracting effect. This effect happens in the form of flickering of the structural drawing elements (such as brush strokes or curves), and it is also known as swimming. Because of this problem, many animations have been produced with animators working on a single or a few frames at a time, thus increasing the artistic and computational efforts needed to produce the animation.

The AnimVideo rendering tool (http://www.lablic.dimap.ufrn.br/animvideo), initially known as AVP, described in this article employs a fast fuzzy segmentation algorithm for segmenting input videos as 3D volumes, and an optical flow algorithm for enforcing intraobject temporal coherence in the animations. The structure of the article is described next. Section II discusses video stylization techniques published in the literature, whereas Section III describes how the AnimVideo rendering tool works, how the user interacts with it, and some artistic styles implemented in the tool. Section IV shows illustrative experiments, presenting some frames of the animations produced. Finally, Section V presents some conclusions.

Correspondence to: B. M. Carvalho; e-mail: [email protected]. Grant sponsors: This work was partially supported by Universal and PDPG-TI CNPq grants.

II. STYLIZING VIDEOS

The main objective of stylized rendering techniques is the production of animations from real videos using automatic or semi-automatic techniques. Thus, stylized rendering techniques allow a user with little or no artistic training to generate animations with little effort, compared to the task of creating an animation from scratch. Video stylization also offers the choice of mixing real movies with stylized objects, by superimposing them on the video or by segmenting objects and rendering them with an artistic style.

As mentioned in Section I, if the input for the stylized video is a normal video, there is the problem of temporal incoherence, mainly due to brightness variations caused by shadowing, noise, and changes in illumination. This temporal incoherence appears in the form of swimming, where features of the animation move and change their intensities within the rendered animation. This flickering can happen because borders of objects may be blurry and thus detected in the wrong place, or because static areas are rendered differently each time, due to slight brightness changes.

The first technique to address this problem was proposed by Litwinowicz (1997) to maintain temporal coherence in video sequences stylized using an impressionist style. Litwinowicz's technique advocates the use of an optical flow method for tracking movement in the scene and moving, adding, or removing brush strokes from frame to frame. An extension to this approach was proposed by Hertzmann and Perlin (2000): areas of change are detected from frame to frame and painted over, i.e., the brush strokes of the static areas are kept, and intraobject temporal coherence is enforced by warping the brush strokes' control points using the output of an optical flow method.

Wang et al. (2004) proposed a method for creating cartoon animations from video sequences by dividing the problem into two steps: a segmentation step, used to isolate the objects that will be rendered with the chosen style throughout the animation, in this case performed using a mean shift segmentation algorithm for end-to-end video shot segmentation; and a coherence step, where the user selects constraint points on key-frames of the video shot through a graphical interface. The selected points are then used for interpolating the region boundaries between key-frames. Since the key-frames have to be similar for the interpolation process to generate good results (a typical interval between key-frames is between 10 and 15 frames), there is still significant user interaction in the rendering process to ensure temporal coherence of the animation.

Another step toward a system for producing temporally coherent NPR animations, but with less user interaction, was proposed by Collomosse et al. (2005). Their framework, called "Stroke Surfaces," works in a similar way to the technique of Wang et al. (2004), in the sense that it also treats the input video as a 3D spatio-temporal volume. The idea, again, is to segment the input video, achieving an abstract description of the video, in this case called Intermediate Representation (IR). This intermediate representation is processed by a video analysis front end that uses heuristics and user intervention to associate regions into semantic volumes and feed them into the rendering back end, which allows some interaction with the user, including the option of rendering the video using a few different NPR techniques. The intermediate representation also stores local motion estimates for video objects, and these can be used to calculate a homography for recovering internal edge positions. However, since this homography calculation assumes both planar motions and rigid objects, the intraobject temporal coherence can be affected in the case of curved objects or movements and nonrigid transformations. This also limits the use of this technique for achieving intraobject temporal coherence, since any object with high-curvature parts or with nonrigid motion would pose a problem that would probably result in wrong mappings of drawing elements.

So, a technique that enforces temporal coherence is necessary, one that may use information from the user to guarantee that nonrigid objects can also be handled. Ways of interaction between the user and the tool must also be provided, as necessary. Methods developed in this direction, described next, are implemented in the AnimVideo rendering tool proposed in this work.

Similarly to other tools developed to address the same problem, the AnimVideo tool assumes that the video was preacquired, since the goal is to postprocess the video to produce an animation. Because of the huge gamut of choices an artist can make in the course of creating such an animation, the main goal of the AnimVideo tool is to provide powerful software tools for manipulating the input videos and creating animations. In the next section, we describe how the methods implemented in the AnimVideo rendering tool enforce temporal coherence, what information is needed from the user, and how the interaction between the user and the tool takes place.

III. THE ANIMVIDEO RENDERING TOOL

Our method for producing intraobject temporally coherent stylized animations is divided into three parts: the segmentation of the input video, followed by the calculation of the Constrained Optical Flow map, and the rendering of objects using some artistic style. Interactions between the parts mentioned above can be seen in Figure 1 for the mosaic rendering style. For a different style, only the parts related to the rendering and creation/update/deletion of structural drawing primitives (the two boxes on the top and bottom right of Fig. 1) have to be replaced.

The technique implemented in the AnimVideo rendering tool to enforce temporal coherence also treats input videos as spatio-temporal volumes, and segments the volume into space-time objects. The segmentation step is performed by using a semi-automatic region-growing segmentation technique (Carvalho et al., 2005, 2006), where the user interaction is performed by selecting seeds for the objects being segmented. Since the video is treated as a 3D volume, the user can easily select seeds on several different frames, solving the problem of segmenting objects that appear later in the video. The segmentation algorithm (Carvalho et al., 2005, 2006) is designed to be very fast, so the user can execute it, refine the segmentation by adding/removing seeds, run the program again, and still produce a good segmentation in a reasonable time (less than 5 s).

The temporal coherence of object boundaries can be easily obtained by following the boundaries of the segmented objects, whereas intraobject temporal coherence is enforced by computing a constrained optical flow, limited to the area of the segmented object itself. The optical flow is computed individually for each object, using the method of Proesmans et al. (1994), which generates a dense map with good estimates of the optical flow. The results of the optical flow can also be used to generate a motion emphasis effect in a painterly technique or any other stroke-based stylized rendering technique. We now proceed to describe in more detail the techniques used in the AnimVideo rendering tool.

A. Fuzzy Segmentation Using Hybrid Color Spaces and Motion Information. To segment the color video shots, we use a multiobject fuzzy segmentation algorithm (Herman and Carvalho, 2001; Carvalho et al., 2005) based on hybrid color spaces and motion information. This approach extends a previous one (Udupa and Samarasekera, 1996), working with arbitrary digital spaces (Herman, 1998). By definition, a digital space is a pair (V, π), where V is a set and π is a symmetric binary relation on V such that V is connected under π. In the theory presented below, we refer to the elements of the set V as spels, which is short for spatial elements, even though here we deal only with videos that are segmented as 3D volumes. This method simultaneously computes the grade of membership of the spels of a video to a number of objects. This number is between 0 and 1, where 0 indicates that the spel definitely does not belong to the object and 1 indicates that it definitely does.

To compute the grade of membership, we assign, to every ordered pair (c, d) of spels, a real number in the range [0, 1], which is referred to as the fuzzy connectedness of c to d (a concept introduced by Rosenfeld (1979)). In the approach used in our tool, fuzzy connectedness is defined in the following general manner. We call a sequence of spels a chain, and its links are the ordered pairs of consecutive spels in the sequence. The strength of a link is also a fuzzy concept, i.e., for every ordered pair (c, d) of spels, we assign a real number (between 0 and 1), which we define as the strength of the link from c to d. We say that the ψ-strength of a link is the appropriate value of a fuzzy spel affinity function ψ: V² → [0, 1], i.e., a function that assigns a value (between 0 and 1) to every pair of spels in V. A set U (⊆ V) is said to be ψ-connected if, for every pair of spels in U, there is a chain in U of positive ψ-strength from the first to the second spel of the pair. A chain is formed by one or more links, and the ψ-strength of a chain is the ψ-strength of its weakest link; the ψ-strength of a chain with only one spel in it is 1 by definition.
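To make the chain and link definitions concrete, here is a minimal Python sketch; the Gaussian affinity on intensity differences is an illustrative assumption, not the affinity function actually used by the algorithm:

```python
import numpy as np

# A toy fuzzy spel affinity psi: a Gaussian on intensity differences.
# This form is assumed for illustration only.
def affinity(img, c, d, sigma=10.0):
    diff = float(img[c]) - float(img[d])
    return float(np.exp(-diff * diff / (2.0 * sigma * sigma)))

def chain_strength(img, chain):
    """psi-strength of a chain: the psi-strength of its weakest link.
    A chain with only one spel in it has strength 1 by definition."""
    if len(chain) <= 1:
        return 1.0
    return min(affinity(img, c, d) for c, d in zip(chain, chain[1:]))
```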

Since we are dealing with the simultaneous segmentation of multiple objects, we define an M-semisegmentation of V as a function σ that maps each c ∈ V into an (M + 1)-dimensional vector σ^c = (σ_0^c, σ_1^c, ..., σ_M^c), where σ_m^c represents the grade of membership of the spel c in the mth object, and σ_0^c is always equal to max_{1 ≤ m ≤ M} σ_m^c. An M-segmentation is defined as an M-semisegmentation σ where σ_0^c is positive for every spel c. An M-fuzzy graph is then defined as a pair (V, Ψ), where V is a nonempty finite set and Ψ = (ψ_1, ..., ψ_M), with ψ_m (for 1 ≤ m ≤ M) being a fuzzy spel affinity.

Here, we use a property that states that a spel d is associated with an object n if, and only if, there is a chain of maximal strength (located entirely inside the nth object) connecting a seed spel c ∈ V_n to d. We present the proofs associated with this property in a previously published theorem (Carvalho et al., 2005). In that work, we show that there is one, and only one, M-semisegmentation that satisfies the properties stated in it.
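As an illustration of the definition above, a minimal numpy sketch (with hypothetical membership maps as input) assembles the (M + 1)-dimensional vectors of an M-semisegmentation:

```python
import numpy as np

def semisegmentation(memberships):
    """Given grade-of-membership maps sigma_1, ..., sigma_M (one per
    object, values in [0, 1]), prepend sigma_0 = max_m sigma_m, yielding
    the (M + 1)-dimensional vector of the M-semisegmentation per spel.
    It is an M-segmentation when sigma_0 > 0 for every spel."""
    sigma = np.asarray(memberships, dtype=float)  # shape (M, z, y, x)
    sigma0 = sigma.max(axis=0, keepdims=True)     # sigma_0 per spel
    return np.concatenate([sigma0, sigma], axis=0)
```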

The descriptions of the original and the fast multiobject fuzzy segmentation algorithms (MOFS and Fast-MOFS, respectively) can be found in our previous work (Carvalho et al., 2005). The Fast-MOFS assumes that the affinity functions can take only a small number of values without significantly affecting the quality of the segmentations. In the experiments shown there, the affinity functions are rounded to three decimal places, allowing the segmentations to be computed much faster (with speedup factors around seven) without any visible degradation of the results.

To apply the algorithms mentioned above to image segmentation, we still have to define the fuzzy spel affinities ψ_m (for 1 ≤ m ≤ M). Usually this is done by a computer program, based on some minimal information supplied by a user (Udupa and Samarasekera, 1996; Herman, 1998; Carvalho et al., 1999). The idea is that, even though the user probably would not be able to define, mathematically, the characteristics of the objects he/she wants to segment, it is easy for him/her to select points that belong to them. A computer program can then compute some statistics based on the neighborhoods of the selected spels and use these statistics to compute the fuzzy spel affinities. Since we are dealing with color videos, the previous approach (Udupa and Samarasekera, 1996; Herman, 1998; Carvalho et al., 1999) used to create the fuzzy spel affinities ψ_m (for 1 ≤ m ≤ M) is adapted here to incorporate color information.

Figure 1. Diagram showing the interactions between the parts of our method for generating intraobject, temporally coherent, stylized animations.

The idea behind the use of color and motion information to segment color videos is that, in general, segmentation algorithms have problems segmenting objects whose colors are similar to the background. In this case, motion information may help to distinguish the objects. On the other hand, segmenting videos using motion information alone may lead to problems, since motion information can be unreliable on the boundaries of objects.

Khan and Shah (2001) propose a maximum a posteriori probability (MAP) method to segment objects in a video by using several features, such as spatial coordinates, color, and motion. Weights are assigned to each feature according to a group of heuristics, and a clustering algorithm is used to segment the first frame of the video. Then, the segmentation is propagated to the other frames of the video.

Since our previous algorithm (Herman and Carvalho, 2001; Carvalho et al., 2005) is a semiautomatic one, requiring user interaction, the applicability of this methodology is restricted to the segmentation of preacquired video shots. The adaptation of the method consists in the construction of fuzzy affinities that incorporate the hybrid color spaces and the motion information of each particular object. Besides that, the video is treated as a 3D volume, with the frames being z slices, allowing the method to easily handle temporally nonconvex objects or segment several objects with similar characteristics as a single one. This is done by using the high-level knowledge about the objects introduced in the method through the selection of seed spels by the user. The algorithm chosen for this application was the Fast-MOFS described previously (Carvalho et al., 2005), because it allows the user to segment the video shot, evaluate the quality of the segmentation, add and/or remove seed spels, and recalculate the segmentation in a short time.

B. Motion Estimation. There are many published methods for estimating motion from videos. The methods based on the optical flow equation (Horn and Schunck, 1981; Lucas and Kanade, 1981; Singh, 1991) estimate the motion of intensities in a pair of successive frames of video, allowing the recovery of approximate image velocities by measuring spatio-temporal derivatives. According to Horn and Schunck (1981), "optical flow is the distribution of apparent velocities of movement of brightness patterns in an image." A review of earlier methods for computing the optical flow field can be found in the work of Beauchemin and Barron (1995), as well as a taxonomy for the methods, which are classified into differential, frequency-based, correlation-based, multiple motion, and temporal refinement methods. Other surveys of earlier methods can be found in the works of Aggarwal and Nandhakumar (1988), Otte and Nagel (1994), and Stiller and Konrad (1999).

There are several problems that can increase the complexity of the computation of the optical flow, such as occlusion, motion of semitransparent objects, nonrigid objects, nonuniform illumination, and noise. Besides these problems, because optical flow was usually computed at the scale of resolution defined by the visual sensor (Horn and Schunck, 1981), it is not appropriate for estimating large image motions (Beauchemin and Barron, 1995). To circumvent that, several methods were either created (Anandan, 1989) or adapted (McCane et al., 2001) to perform hierarchical computation of optical flow fields.

The idea here is to use motion information to aid the segmentation of objects in videos, especially when one object occludes another object of a similar color. The motion information is incorporated in the fuzzy segmentation method as part of the fuzzy spel affinities. This is done by computing a dense optical flow map between all frames of the video shot. McCane et al. (2001) analyze the behavior of seven optical flow methods when applied to several complex synthetic scenes and controlled real scenes with ground truth, and come to the conclusion that the most accurate method is a multiresolution implementation of the method proposed by Proesmans et al. (1994).

The method of Proesmans et al. (1994) computes the optical flow of a pair of images by employing a set of nonlinear diffusion equations that integrate the traditional differential approach with a correlation method. The optical flow is computed in an iterative dual scheme, with both the forward and backward flow being computed. This is done by computing the optical flow from frame n to frame n + 1 and from frame n + 1 to frame n. During these processes, consistency maps are computed and possibly fed back into the optical flow computation for another iteration. This anisotropic diffusion allows the smoothing of the flow fields, encouraging smoothing within regions but attenuating smoothing across boundaries by using the consistency maps, thus increasing flow stability within objects while maintaining flow discontinuities between objects (Novins et al., 1998). The result of the optical flow computed by the method of Proesmans et al. (1994) is controlled by three parameters: the number of iterations for which the optical flow is computed, the number of hierarchical levels used to compute the optical flow, and the smoothness parameter λ that controls the amount of anisotropic diffusion.
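A rough numpy sketch of such a forward/backward consistency check is shown below; it only illustrates the idea of the consistency maps and makes no attempt to reproduce the diffusion equations of Proesmans et al. (the nearest-neighbor warping is an assumption made for brevity):

```python
import numpy as np

def consistency_map(fwd, bwd, tol=1.0):
    """fwd, bwd: (h, w, 2) flow fields from frame n to n + 1 and back.
    Warp the backward flow to each pixel's forward destination; where
    the two flows roughly cancel, the estimate is marked consistent."""
    h, w = fwd.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    xd = np.clip(np.round(xs + fwd[..., 0]).astype(int), 0, w - 1)
    yd = np.clip(np.round(ys + fwd[..., 1]).astype(int), 0, h - 1)
    residual = fwd + bwd[yd, xd]   # close to zero where flows agree
    return np.linalg.norm(residual, axis=2) < tol
```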

C. Selecting the Color Channels. There are many color spaces used for color image segmentation, but several different studies (Gauch and Hsia, 1992; Liu and Yang, 1994; Cheng et al., 2001) show that there is not a single color model that is the most appropriate for segmenting all kinds of color images. This makes the selection of the color space a very important step in color image segmentation: if we can select the color channels with the aim of maximizing color separation, we can improve the accuracy of the color video segmentation. The heuristic used here is based on the selection of the color channels with the lowest correlation values between them. The reasoning is that the channels selected in this way increase the variety of information (diversification of information) used in the fuzzy affinity functions employed in the fuzzy segmentation algorithm.

According to Hair et al. (2005), Pearson's correlation coefficient measures the intensity or grade of association between two variables, i.e., the linear dependency between two variables. Given two variables, the Pearson correlation coefficient assumes a value between −1 and 1, where the value 1 indicates that the two variables have a perfect positive correlation, i.e., the variables present the same linear distribution. On the other hand, the value −1 indicates a perfect negative correlation, i.e., when the value of one variable increases, the other one decreases. A value of 0 indicates that there is no correlation between the two variables. The Pearson correlation can be defined as follows:

X_{i,j} = \frac{\sum_{t=1}^{n} (p_t^i - \bar{p}^i)(p_t^j - \bar{p}^j)}{\sqrt{\sum_{t=1}^{n} (p_t^i - \bar{p}^i)^2} \, \sqrt{\sum_{t=1}^{n} (p_t^j - \bar{p}^j)^2}},    (1)

where p_t^i and p_t^j are the values of the spel t for the channels i and j, and p̄^i and p̄^j are the means of the values of the spels for the channels i and j, respectively. The value of n is defined by the number of spels in the neighborhood of the seed spels selected by the user, i.e., the correlation is computed only with the spel values in areas selected by the user as being representative of the objects to be segmented. The matrix X contains the correlation values between all k channels analyzed, with no maximum limit on the number of color models analyzed. The heuristic used to select the channels is the following: the first channel selected is the one with the lowest correlation with all the other channels, while the second channel is the one with the lowest correlation value to the first one selected. Finally, the third channel selected is the one with the lowest correlation values to the first two selected. To find the channel with the lowest correlation values to the other channels, we compute, for every channel i, the value Y_i, which indicates the amount of correlation of this channel to the other k − 1 channels, and is given by the following:

Y_i = k - \sum_{j=1}^{k} |X_{i,j}|,    (2)

where X_{i,j} is the correlation value between the channels i and j. High Y_i values indicate that the channel i has low correlation with the other channels. The first selected channel (ch₁) is the one with the highest Y_i value, whereas the second channel (ch₂) is the channel with the lowest X_{i,ch₁} value, for 1 ≤ i ≤ k, i ≠ ch₁; the third channel is the channel that minimizes the value of X_{i,ch₁} + X_{i,ch₂}, for 1 ≤ i ≤ k, i ≠ ch₁ and i ≠ ch₂.
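The whole heuristic fits in a few lines of numpy; the sketch below (the function name is ours) computes the absolute correlation matrix |X_{i,j}| over seed-neighborhood samples and applies the three selection rules above:

```python
import numpy as np

def select_channels(samples):
    """samples: (n, k) array; row t holds the values p_t^1, ..., p_t^k
    of seed-neighborhood spel t in each of the k candidate channels.
    Returns the indices of the three channels picked by the heuristic."""
    k = samples.shape[1]
    X = np.abs(np.corrcoef(samples, rowvar=False))  # |X_{i,j}|, Eq. (1)
    Y = k - X.sum(axis=1)                           # Eq. (2)
    ch1 = int(np.argmax(Y))                         # least correlated overall
    rest = [i for i in range(k) if i != ch1]
    ch2 = min(rest, key=lambda i: X[i, ch1])        # lowest X_{i,ch1}
    rest.remove(ch2)
    ch3 = min(rest, key=lambda i: X[i, ch1] + X[i, ch2])
    return ch1, ch2, ch3
```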

D. Incorporating Motion and Color Information. The fuzzy spel affinities that incorporate motion and color information are built in a manner similar to what was done for gray-level images in our previous work (Carvalho et al., 2005). In this case, the three color channels selected using the heuristic above and two motion channels (ū and v̄, which give the horizontal and vertical components of the optical flow) are used to compose the affinity functions. Thus, the color component of the fuzzy spel affinities is given by the following:

\psi_m(c, d)_{color} = \frac{\sum_{i=1}^{3} \left( \rho_{g_{i,m}, h_{i,m}}(g_i) + \rho_{a_{i,m}, b_{i,m}}(a_i) \right)}{6},    (3)

where g_{i,m} is the mean and h_{i,m} is the standard deviation of the average values of the color channel i, for 1 ≤ i ≤ 3, for all pairs of neighboring spels belonging to V_m, and a_{i,m} is the mean and b_{i,m} is the standard deviation of the absolute difference of the values for all pairs of neighboring spels belonging to V_m for the color channel i. The motion component of the fuzzy spel affinities is based on precomputed optical flow maps that are combined as follows:

\psi_m(c, d)_{motion} = \frac{\rho_{g_{\bar{v},m}, h_{\bar{v},m}}(g_{\bar{v}}) + \rho_{a_{\bar{v},m}, b_{\bar{v},m}}(a_{\bar{v}})}{6} + \frac{\rho_{g_{\bar{u},m}, h_{\bar{u},m}}(g_{\bar{u}}) + \rho_{a_{\bar{u},m}, b_{\bar{u},m}}(a_{\bar{u}})}{6},    (4)

where the functions g, h, a, and b have the same definitions as above, but are computed over the values of the motion components ū and v̄. Depending on the input video and the objects that one wants to segment, the motion may be more or less important in discerning the objects. Thus, weights are assigned to the color and motion components of the fuzzy spel affinities, ψ_m(c, d)_color and ψ_m(c, d)_motion, so the fuzzy spel affinities are given by the following:

\psi_m(c, d) = \begin{cases} w_1 \psi_m(c, d)_{motion} + w_2 \psi_m(c, d)_{color}, & \text{if } (c, d) \in \pi, \\ 0, & \text{otherwise,} \end{cases}    (5)

where w₁ and w₂ are weights such that w₁ + w₂ = 1.0. Comparisons of the results produced by the MOFS algorithm with the original fuzzy affinities and the affinities described here can be seen in (Oliveira, 2007).
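The combination in Eq. (5) is trivial to express in code; a minimal sketch follows, where the Gaussian form assumed for ρ is our illustrative guess, since the tool derives ρ from statistics of the seed neighborhoods:

```python
import numpy as np

def rho(x, mean, std):
    # Illustrative assumption: a Gaussian bump built from the given
    # statistics, scaled to [0, 1].
    return float(np.exp(-((x - mean) ** 2) / (2.0 * std ** 2 + 1e-12)))

def fuzzy_affinity(psi_color, psi_motion, w1, w2, adjacent):
    """Eq. (5): weighted sum of the color and motion components for
    adjacent spels ((c, d) in pi), zero otherwise; w1 + w2 = 1.0."""
    return (w1 * psi_motion + w2 * psi_color) if adjacent else 0.0
```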

E. Optical Flow. Several works, such as (Litwinowicz, 1997; Hertzmann and Perlin, 2000), have proposed the use of an optical flow method for enforcing temporal coherence. However, the local character of optical flow computations and their sensitivity to noise somewhat limit their applicability. To overcome that, we proposed a method (Gomes et al., 2007a) where an optical flow algorithm is used for enforcing temporal coherence, but with the search area for the spel matching restricted by object boundaries obtained from the segmentation algorithm. Thus, the optical flow information can be used to enforce intraobject temporal coherence on these sequences.

F. Constrained Optical Flow. We decided to use a multiresolution implementation of the optical flow algorithm proposed by Proesmans et al. (1994) because it produces a very dense optical flow map (with one motion estimate per spel), and because it was evaluated by McCane et al. (2001) as the best (in accuracy and consistency) among several algorithms when applied to three complex synthetic scenes and one real scene. The algorithm uses a system of six nonlinear diffusion equations that computes forward and backward disparity maps.

The Constrained Optical Flow can then be computed by limiting the search area of the optical flow algorithm to the area occupied by the same object in the next frame. This is defined as follows: given a 3D image I (the video stored as a sequence of frames) and an M-segmentation map σ, the constrained optical flow of the spels belonging to object k is computed over the image I_k, which is defined by the following:

I_k(x, y, z) = \begin{cases} I(x, y, z), & \text{if } \sigma_k^{(x,y,z)} \neq 0, \\ -1, & \text{otherwise,} \end{cases}    (6)

where σ_k^{(x,y,z)} ≠ 0 indicates that the spel (x, y, z) belongs, with grade of membership σ_k^{(x,y,z)}, to object k, and the value −1 is used as a flag to signal the optical flow algorithm that its computation should not include the spel (x, y, z). Thus, the Constrained Optical Flow calculated from two successive frames for the whole frame is given by the union of the nonnull flow vectors of the Constrained Optical Flow calculated for the individual objects, i.e., the individual flow maps computed for each object that is going to be stylized are combined prior to the rendering step. It is important to note that the Constrained Optical Flow computation is needed only for the objects we want to render with intraobject temporal coherence. The limitation of the optical flow search area in the constrained optical flow results in flow maps with much less error than when using the global optical flow counterpart (Gomes et al., 2007a), as can be seen in Figure 2.
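In code, the constraint of Eq. (6) amounts to flagging spels outside the object before the flow computation and merging the per-object maps afterwards; a numpy sketch (with our own function names and a hypothetical per-object membership layout) is given below:

```python
import numpy as np

def constrain_to_object(frame, sigma_k, flag=-1.0):
    """Eq. (6): keep the intensities of spels whose grade of membership
    in object k is nonzero; flag all other spels so the optical flow
    method excludes them from its computation."""
    out = np.full(frame.shape, flag, dtype=np.float32)
    mask = sigma_k != 0
    out[mask] = frame[mask]
    return out

def merge_object_flows(flows, masks, shape):
    """Union of the nonnull per-object flow vectors into one frame-wide
    constrained flow map, combined prior to the rendering step."""
    full = np.zeros(shape, dtype=np.float32)       # shape (h, w, 2)
    for flow, mask in zip(flows, masks):
        full[mask] = flow[mask]
    return full
```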

G. Homography Seeding. A framework for producing temporally coherent stylized videos was proposed in a previous work by Collomosse et al. (2005). The idea is to segment objects treating the video as a 3D image. The object boundaries, stored frame to frame, are used to compute local motion estimates for video objects, which are then used to model the interframe motion as a homography.

A homography is defined in 2D space as a mapping between a point on a ground plane as seen from one camera and the same point on the ground plane as seen from a second camera (Hartley and Zisserman, 2003) (or in the next frame, in our case). This has many practical applications; most notably, it provides a method for composing a scene by pasting 2D or 3D objects into an image or video with the correct pose. More formally, suppose we have two cameras, a and b, looking at points p_i on a plane. The projection ᵇp_i of a point p_i in camera b is mapped to the point ᵃp_i in camera a as follows:

{}^a p_i = K_a H_{ba} K_b^{-1} \, {}^b p_i,    (7)

where H_{ba} is given by the following:

H_{ba} = R - \frac{t n^T}{d}.    (8)

In Eq. (8) above, R is the rotation matrix by which frame b is rotated in relation to frame a; t is the translation vector from a to b; n and d are the normal vector of the plane and the distance to the plane, respectively; and K_a and K_b are the cameras' intrinsic parameter matrices (Hartley and Zisserman, 2003).

When the image region in which the homography is computed is small, or the image has been acquired with a large focal length, an affine homography is a more appropriate model for image displacements. An affine homography is a special type of general homography whose last row of H_{ba} is fixed to h₃₁ = h₃₂ = 0 and h₃₃ = 1. In our case, homogeneous coordinates are used in practice to implement the homography transformation, because matrix multiplication cannot directly perform the division required for perspective projection; the mapping thus becomes an affine transformation plus a division.
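The remark about homogeneous coordinates can be made concrete with a short numpy sketch (a generic illustration, not the tool's actual implementation):

```python
import numpy as np

def apply_homography(H, pts):
    """Map 2D points through a 3x3 homography H: lift (x, y) to
    (x, y, 1), multiply, then divide by the third coordinate; that final
    division is the perspective step a matrix product alone cannot
    express. For an affine homography the last row is (0, 0, 1), so the
    divisor is 1 and the map reduces to a plain affine transform."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```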

By the formalism above, one can note that the homography is not a linear (not even an affine) transformation, but it assumes both planar motion and rigid objects, thus limiting the accuracy of this method when applied to a wide range of objects and motions. Nonrigidity is the case for some objects used in the current work; for example, the object shown in Figure 3 is a frog with nonrigid motion of its articulations. We introduce an improvement over the above-cited algorithms by treating nonrigid objects as a connected set of rigid components, as few as necessary, each one modeled separately. Then, a (rigid) homography can be calculated for each one of these components, allowing the tracking of the object in every frame.

This simple approach allows the tracking of the object in every frame, with some help from the user, who presumably knows the objects to be tracked in the movie. An initialization is necessary, in which the user breaks each nonrigid object into its components. Then, the tracking of each component is performed by using an approach based on correlation measures. This simple approach has proven to be sufficient for the image sequences used in this work. Other approaches for tracking were not tried here, since this one solves our problem with good performance, as shown in the experiments performed. Besides, tracking nonrigid objects in video sequences is a well-known topic in the field of Computer Vision, and several works are available in the literature (Tissainayagam and Suter, 2005).

Figure 2. Application of the multiresolution implementation of the Proesmans et al. optical flow algorithm to two subsequent frames of the Pooh sequence, on the whole image (a) and constrained to the segmented object only (b). By looking at the original sequence, we can see that the Constrained Optical Flow yields better results, especially close to the borders of Winnie the Pooh. Only part of each frame is shown, to emphasize the differences between the results.

The framework developed in our rendering tool uses the homography-based interframe motion estimates to seed the flow map used in the constrained optical flow, observing that the homography is applied to object components instead of globally to each object to be tracked. This approach has two potential advantages over the constrained optical flow described above. First, a good motion estimate speeds up the computation of the optical flow maps. Second, the optical flow maps become smoother, especially close to the borders of the objects, since the homography maps the object shapes in two adjacent frames. The use of the constrained optical flow to process the homography motion estimates allows us to overcome the limitations of the homography method described by Collomosse et al. (2005) when dealing with nonrigid objects and nonplanar motion.

H. FusionFlow. We mentioned before that the main goal of the AnimVideo tool is to provide powerful software tools for manipulating the input videos and creating animations. Thus, we developed the AnimVideo tool in a modular way, making it easy to add other segmentation, tracking, motion estimation, and rendering methods.

There are many other optical flow methods, and the Middlebury optical flow page (http://vision.middlebury.edu/flow) provides a ranking of several state-of-the-art optical flow methods. The data sets and evaluation methods used in the Middlebury optical flow ranking (Baker et al., 2007) emphasize problems associated with nonrigid motion, real sensor noise, complex natural scenes, and motion discontinuities. They achieve that by including data sets with realistic synthetic sequences, nonrigid motion where the ground truth is known, stereo pairs of static scenes, and high frame-rate video used to assess interpolation error. Four quality measures are used for ranking the optical flow methods, two for flow accuracy and two for frame interpolation quality, where the two measures of flow accuracy are the angular error and the end-point error.

The ability to easily make use of different methods for parts of the animation creation process was exercised here by using another optical flow method to generate an animation. The method chosen is the FusionFlow method, proposed by Lempitsky et al. (2008), which is currently ranked as one of the top four methods on the Middlebury optical flow page in all four measures.

The FusionFlow method (Lempitsky et al., 2008) formulates the optical flow computation as a graph cut problem that iteratively fuses a set of candidate solutions (proposals) using minimum cuts, and it models the optical flow estimation using pairwise Markov Random Fields, as was done in (Heitz and Bouthemy, 1993; Black and Anandan, 1996). The energy function used has two terms: a data term, which measures how well the flow field describes the matching between pixels in the two images, and a spatial term, which penalizes changes in horizontal and vertical flow between adjacent pixels. The distance between two pixels in the data term is the Euclidean distance in RGB space, computed after performing a high-pass filtering of the input data to make the data term more robust to illumination and exposure changes.

The proposals were computed using the Lucas-Kanade (LK) method (Lucas and Kanade, 1981), which usually produces accurate estimates for textured regions but not for textureless regions, and the Horn-Schunck (HS) method (Horn and Schunck, 1981), which usually produces accurate estimates for regions with smooth motion but over-smooths areas with motion discontinuities. Apart from those computed proposals, some constant flow fields were also used.

The process starts by randomly choosing one of the LK or HS proposals as an initial solution, then randomly visiting all other proposals and fusing them with the current solution, one by one. The constant flow fields are then computed using clusters of flow vectors of the fused solution produced after this first pass. Then, the fusion process is repeated twice for all proposals, now including the constant flow proposals. It is important to emphasize that the fusions do not increase the energy of the solution, and it was observed by Lempitsky et al. (2008) that the final solution "always has an energy that is much smaller than the energy of the best proposal." After performing this discrete optimization, a standard conjugate gradient method is used to perform a local optimization that produces more accurate flow estimates for areas where the proposal solutions were not diverse enough.
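To illustrate the energy being minimized, the sketch below evaluates a data term and a spatial term for candidate flow fields and keeps whichever proposal scores lower. Note that this whole-field greedy choice is only a stand-in: FusionFlow fuses proposals per pixel with minimum cuts, which is what guarantees that the energy never increases.

```python
import numpy as np

def flow_energy(f0, f1, flow, lam=0.1):
    """Data term: RGB distance between each pixel of frame f0 and the
    pixel of frame f1 its flow vector points to. Spatial term: absolute
    flow changes between horizontally/vertically adjacent pixels."""
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    xd = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    yd = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    data = np.linalg.norm(f0.astype(float) - f1[yd, xd].astype(float),
                          axis=2).sum()
    smooth = (np.abs(np.diff(flow, axis=0)).sum()
              + np.abs(np.diff(flow, axis=1)).sum())
    return data + lam * smooth

def pick_proposal(f0, f1, proposals):
    """Greedy stand-in for the fusion moves: keep the lowest-energy
    candidate among the LK, HS, and constant-flow proposals."""
    return min(proposals, key=lambda p: flow_energy(f0, f1, p))
```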

Frames of two animations generated using the flow maps produced by the FusionFlow method are shown in Section IV.

Figure 3. The Frog, a model with nonrigid motion between frames.

I. Stylized Rendering. As mentioned above, the tool can be used with any stylized rendering technique, as long as it is implemented in the framework of the AnimVideo rendering tool. So, to validate the AnimVideo rendering tool, we have implemented five artistic styles.

The first artistic style implemented is impressionist painting. Similarly to what is done in the original work (Litwinowicz, 1997), we render the brush strokes according to a predetermined size, using the average color of the region where the brush stroke is placed. The difference between our method and the previous one (Litwinowicz, 1997) is that we use the constrained optical flow described in Section III, resulting in much less error in the rendering of brush strokes in our case. We can also use the optical flow information to create a velocity effect, as well as a motion emphasis effect, by using the orientation and magnitude of the flow vectors to determine the size and orientation of the brush strokes, or their life span, respectively.
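As a small sketch of that idea (the function name and the linear scaling are our assumptions), the flow vector at a stroke's position can set its orientation and length:

```python
import numpy as np

def stroke_params(flow, x, y, base_len=8.0, gain=1.5):
    """Orient the brush stroke along the flow vector at (x, y) and let
    its length grow with the motion magnitude, which yields the velocity
    and motion-emphasis effects described above."""
    u, v = flow[y, x]
    angle = float(np.arctan2(v, u))             # stroke orientation (rad)
    length = base_len + gain * float(np.hypot(u, v))
    return angle, length
```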

The second artistic style described here is mosaicing. In a previous work (Gomes et al., 2007b), we implemented techniques for producing mosaics using different initial tile distributions, such as Centroidal Voronoi Diagrams (CVDs), similarly to the works of Hausner (2001) and Faustino and Figueiredo (2005) for static mosaics, or Distance Transform Matrices (DTMs), as done by Di Blasi and Gallo (2005) to distribute quadrilateral tiles, all designed for generating static mosaics. After computing the DTM of an image, we can compute the gradient and the level line matrices, which determine the tile orientations and positions, respectively. However, the method of Di Blasi and Gallo (2005) handles only tiles of the same size. On the other hand, since we segment the input video into disjoint objects, our method can associate different characteristics with them, such as the tile size, emphasizing regions close to borders, as is done by Faustino and Figueiredo (2005).

Figure 4. Stylized renderings produced using the AnimVideo rendering tool, showing the rendering styles of colored sand bottle (a and b), watercolor (c and d), and the combination of watercolor (background) and impressionism (frog) (e and f).

Tile removal or addition becomes necessary when objects move closer to or further away from the camera, or when new parts of the scene appear in the video. This is done to maintain a consistent appearance of the tiles in the animation, i.e., a homogeneously dense animated mosaic. To control the addition/removal of tiles, we defined a threshold that specifies the maximal superposition that two tiles can have; overlapping tiles appear as if one of them had been cut to free space for the other to be placed, something common in real-life mosaics (see the sketch below). By playing with this threshold, we can achieve more or less tightly packed tiles in areas where the video is changing.
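A minimal sketch of such a superposition test, assuming axis-aligned rectangular tiles given as (x0, y0, x1, y1) boxes (the tool's actual tiles and overlap measure may differ):

```python
def overlap_area(a, b):
    """Intersection area of two axis-aligned boxes (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)

def admit_tile(new, tiles, max_superposition=0.3):
    """Accept a new tile only while its superposition with every placed
    tile stays below the threshold controlling mosaic packing."""
    area = (new[2] - new[0]) * (new[3] - new[1])
    return all(overlap_area(new, t) / area <= max_superposition
               for t in tiles)
```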

Another artistic style implemented simulates a typical art craft from the Northeastern region of Brazil that uses colored sand to compose landscape images on the inner surface of transparent glass bottles. Since the visual interaction takes place solely at the inner glass surface of the bottle, we implemented a method for generating 2D procedural sand textures (Britto-Neto and Carvalho, 2007) that can then be combined to compose images similar to the ones produced by the artists. We also implemented two techniques to mimic effects created by the artists using their tools. The images generated can then be texture mapped onto the inner surface of a 3D glass bottle model. (Artists can also create pictures between two flat pieces of glass, producing a "painting" that can be laid on a flat surface.) By implementing these techniques (here called Csand, short for colored sand), we allow users to create not only bottles with images similar to the ones produced by the artists, but also animations using these sand bottles, something close to impossible in real life with the original technique.

A related technique implemented is the one named Sandbox (Britto-Neto, 2007), where movies are shown as if they were playing on a sandbox, and objects inside it push sand around as they move about. This method, inspired by the work of Sumner and co-workers (O'Brien et al., 1999), is used most effectively on movies where the background is static and there are few objects moving on it. After the objects of interest are segmented with the segmentation module of the AnimVideo tool, we generate depth masks for each moving object, followed by the computation of the compression of the sand under the objects and the dislocation of sand at the edges of the objects. Finally, some erosion is performed to smooth out the sand ripples generated by the objects.

The fifth style implemented in the AnimVideo tool is the Watercolor style proposed by Bousseau et al. (2006) to perform image stylization. This method consists of creating a simplified, abstracted version of the input image and applying textures that simulate a watercolor appearance to this abstracted image. This technique was later extended to handle videos in (Bousseau et al., 2007), where the temporal coherence was achieved by using texture advection along lines of optical flow, whereas the video abstraction was performed with 3D morphology filters, treating time (i.e., the frames of the video) as the third dimension, in the same way it is done here.

IV. RESULTS

To validate our tool, we have performed experiments producing stylized renderings. Figure 4 shows examples of stylized animations produced using the AnimVideo rendering tool, with Figures 4a and 4b showing two frames of an animation using the Csand technique, whereas Figures 4c and 4d show two frames of a video rendered using the watercolor style of Bousseau et al. (2006, 2007), and Figures 4e and 4f show two frames of an impressionist rendering of the frog, while the background is rendered as a watercolor. Finally, Figures 5a and 5b show two input frames, whereas Figures 5c, 5d, 6a, and 6b show the directional painting renderings of two frames from two different input videos of the Middlebury evaluation data sets, which were generated using the flow maps produced by the FusionFlow method (Lempitsky et al., 2008).

Figure 5. Two original frames (a and b) and the corresponding stylized frames (c and d), generated using a directional painting style.

On the left of Figure 5d, where the orange ball is located, we can see the effect of wrong flow vectors being used for mapping the brush strokes. This is exactly the case where a segmentation step can help, since a previous segmentation of the ball as a separate object can be used to restrict the search space of the optical flow method to the area where the ball is located in the next frame.

The quality of the stylized videos is very dependent on the quality of the segmentation of the objects. If the segmentation is not good, the rendering module will render the mistakenly segmented parts of an object in an erroneous way. Of course, very noisy videos will affect the quality of the Constrained Optical Flow result, even to the point of making it useless. However, the segmentation method described here has been successfully used to segment very diverse videos, some of which contained several occluded objects and moving shadows (Oliveira, 2007). The speed of the segmentation process is also important, since the user can interact with the program, adding and/or removing seed spels, and reprocess the video. The average time for segmenting the videos mentioned here was about 4 μs per spel on a Pentium 4 3.0 GHz.

Another potential problem is the aperture problem, an underlying limitation of optical flow methods. Possible approaches to tackle it are the conversion of the motion problem into a stereo problem, finding the correspondence between a number of points in the image at time t and the image at time t + dt, or the computation of the optical flow and the use of its geometrical properties to deduce 3D information about the scene and the motion. Here, we adopt an alternative solution to attenuate the aperture problem by using homographies to estimate motion, breaking the image objects into rigid components, and then using these motion estimates as the initial flow field of the optical flow method of Proesmans et al. (1994). Such estimates could also be used as one of the proposals in the fusion process of the FusionFlow method (Lempitsky et al., 2008).

V. CONCLUSIONS

We have proposed a tool for generating stylized renderings of videos. The tool implements a method for enforcing temporal coherence that is based on the full segmentation of the input video shot using a fast fuzzy segmentation algorithm, followed by the computation of optical flow maps produced by restricting the search area of the optical flow method to the corresponding segmented object. The segmented objects can then be used as different layers in the rendering process, thus providing many options in the rendering phase, such as rendering different objects using different artistic styles, even though this is not done here. The approach based on homographies can be used to produce good motion estimates to serve as an initial solution for the iterative optical flow computation. We have also shown that flow maps produced by a different optical flow method can be used to generate animations, due to the modular structure of the tool.

In the experiments, we produced several frames, for several animations, using the AnimVideo rendering modules, with the rendering styles of Mosaic, Csand, Sandbox, Impressionist painting, and Watercolor. The AnimVideo tool was designed to easily allow the addition of plugins containing new rendering styles or segmentation and point tracking techniques (used to compute the homographies), so it can also be used as a framework for the development of well-known and new rendering styles in undergraduate computer graphics classes. Other future work includes the addition of other computer vision techniques to allow more robust processing of highly complex video shots.

ACKNOWLEDGMENTS

A preliminary version of this article (Oliveira et al., 2008) was pre-

sented at the 12th International Workshop on Combinatorial Image

Analysis, which took place in Buffalo, NY, on April 7–9, 2008. The

authors thank S. Roth, V. Lempitsky, and C. Rother for making

available to us the flow maps used to generate Figures 5 and 6.

REFERENCES

J.K. Aggarwal and N. Nandhakumar, On the computation of motion from

sequences of images—A review, Proc IEEE 76 (1988), 917–935.

P. Anandan, A computational framework and an algorithm for the measure-

ment of visual motion, Int J Comput Vis 2 (1989), 283–310.

[Figure 6. Two stylized frames generated using a directional painting style.]

S. Baker, S. Roth, D. Scharstein, M.J. Black, J.P. Lewis, and R. Szeliski, A

database and evaluation methodology for optical flow, Int Conf Comput

Vis’07 1 (2007), 1–8.

S.S. Beauchemin and J.L. Barron, The computation of optical flow, ACM

Comput Surv 27 (1995), 433–466.

M.J. Black and P. Anandan, The robust estimation of multiple motions:

Parametric and piecewise-smooth flow fields, Comput Vis Image Under-

stand 63 (1996), 75–104.

A. Bousseau, M. Kaplan, J. Thollot, and F. Sillion, Interactive color render-

ing with temporal coherence and abstraction, Proc Int Symp Non-Photoreal-

istic Anim Render 1 (2006), 141–149.

A. Bousseau, F. Neyret, J. Thollot, and D. Salesin, Video watercolorization

using bidirectional texture advection, ACM Trans Graphics (Proc SIG-

GRAPH) 26 (2007), 104.

L.S. Britto-Neto, Renderizações não fotorealísticas para estilização de imagens e vídeos usando areia colorida, Master's thesis, Universidade Fed-

eral do Rio Grande do Norte, 2007.

L.S. Britto-Neto and B.M. Carvalho, Message in a bottle: Stylized rendering

of sand movies, Proc XX Braz Symp Comput Graphics Image Process (SIB-

GRAPI’07), IEEE, Los Alamitos, CA, 2007, pp. 11–18.

B.M. Carvalho, C.J. Gau, G.T. Herman, and T.Y. Kong, Algorithms for

fuzzy segmentation, Pattern Anal Appl 2 (1999), 73–81.

B.M. Carvalho, G.T. Herman, and T.Y. Kong, Simultaneous fuzzy segmen-

tation of multiple objects, Discrete Appl Math 151 (2005), 55–77.

B.M. Carvalho, L.M. Oliveira, and G.S. Silva, Fuzzy segmentation of color

video shots, Proc DGCI, Springer-Verlag, London, 2006, Vol. 4245, pp.

402–407.

H.D. Cheng, H. Jiang, Y. Sun, and J.I. Wang, Color image segmentation:

Advances and prospects, Pattern Recognit 34 (2001), 2259–2281.

J.P. Collomosse, D. Rowntree, and P.M. Hall, Stroke surfaces: Temporally

coherent artistic animations from video, IEEE Trans Visual Comput Graph

11 (2005), 540–549.

G. Di Blasi and G. Gallo, Artificial mosaics, Vis Comput 21 (2005), 373–383.

G. Faustino and L. Figueiredo, Simple adaptive mosaic effects, Proc SIB-

GRAPI 1 (2005), 315–322.

J. Gauch and C. Hsia, A comparison of three color image segmentation algorithms in four color spaces, SPIE Vis Commun Image Process'92 1818 (1992),

1168–1181.

R.B. Gomes, T.S. Santos, and B.M. Carvalho, Coerência temporal intra-objeto para NPR utilizando fluxo óptico restrito, Revista Eletrônica de Iniciação Científica 7 (2007a), 2007205.

R.B. Gomes, T.S. Santos, and B.M. Carvalho, Mosaic animations from video

inputs, Proc IEEE Pacific-Rim Symp Image Video Technol, Springer-Ver-

lag, London, 2007b, Vol. 4872, pp. 87–99.

B. Gooch and A. Gooch, Non-photorealistic rendering, AK Peters, Natick,

MA, 2001.

J. Hair, B. Black, B. Babin, R. Anderson, and R. Tatham, Multivariate data

analysis, 6th edition, Prentice Hall, Upper Saddle River, NJ, 2005.

R. Hartley and A. Zisserman, Multiple view geometry in computer vision,

Cambridge University, Cambridge, UK, 2003.

A. Hausner, Simulating decorative mosaics, Proc ACM SIGGRAPH 1

(2001), 207–214.

F. Heitz and P. Bouthemy, Multimodal estimation of discontinuous optical flow using Markov random fields, IEEE Trans Pattern Anal Mach Intell 15 (1993), 1217–1232.

G.T. Herman, Geometry of digital spaces, Springer, Danvers, MA, 1998.

G.T. Herman and B.M. Carvalho, Multiseeded segmentation using fuzzy

connectedness, IEEE Trans Pattern Anal Mach Intell 23 (2001), 460–474.

A. Hertzmann and K. Perlin, Painterly rendering for video and interaction,

Proc NPAR 1 (2000), 7–12.

B. Horn and B. Schunck, Determining optical flow, Artif Intell 17 (1981),

185–203.

S. Khan and M. Shah, Object based segmentation of video using color,

motion and spatial information, Proc IEEE CVPR 2 (2001), 746–751.

V. Lempitsky, S. Roth, and C. Rother, FusionFlow: Discrete-continuous optimization for optical flow estimation, Proc IEEE CVPR 1 (2008), 1–8.

P. Litwinowicz, Processing images and video for an impressionist effect,

Proc ACM SIGGRAPH 1 (1997), 407–414.

J. Liu and Y.-H. Yang, Multiresolution color image segmentation, IEEE

Trans Pattern Anal Mach Intell 16 (1994), 689–700.

B. Lucas and T. Kanade, An iterative image registration technique with an

application to stereo vision, 7th Int Joint Conf Artif Intell 1 (1981), 674–679.

B. McCane, K. Novins, D. Crannitch, and B. Galvin, On benchmarking

optical flow, Comput Vis Image Understand 84 (2001), 126–143.

K. Novins, D. Mason, S. Mills, B. Galvin, and B. McCane, Recovering

motion fields: An evaluation of eight optical flow algorithms, In 9th British

Machine Vision Conference, Southampton, UK, 1998, pp. 195–204.

J.F. O’Brien, R. Sumner, and J.K. Hodgins, Animating sand, mud, and

snow, Comput Graph Forum 18 (1999), 17–26.

L.M. Oliveira, Segmentação fuzzy de imagens e vídeos, Master's thesis,

Universidade Federal do Rio Grande do Norte, Natal, Brazil, 2007.

L.M. Oliveira, L.S. Britto-Neto, R.B. Gomes, T.S. Santos, G.S. Andrade,

and B.M. Carvalho, ‘‘Producing stylized renderings using the AVP render-

ing tool,’’ In Image Analysis—From Theory to Applications, R.P. Barneva

and V.E. Brimkov (Editors), Research Publishing, Singapore, 2008, pp. 55–

64.

M. Otte and H.-H. Nagel, Optical flow estimation: Advances and compari-

sons, Eur Conf Comput Vis 1 (1994), 51–60.

M. Proesmans, L.V. Gool, E. Pauwels, and A. Oosterlinck, Determination of

optical flow and its discontinuities using non-linear diffusion, Proc 3rd

ECCV 2 (1994), 295–304.

A. Rosenfeld, Fuzzy digital topology, Inform Contr 40 (1979), 76–87.

A. Singh, Optic flow computation: A unified perspective, IEEE Computer

Society Press, Los Alamitos, CA, 1991.

C. Stiller and J. Konrad, Estimating motion in image sequences: A tutorial

on modeling and computation of 2D motion, IEEE Signal Process Mag 16

(1999), 70–91.

T. Strothotte and S. Schlechtweg, Non-photorealistic computer graphics:

Modeling, rendering and animation, Morgan Kaufmann, San Francisco, CA,

2002.

P. Tissainayagam and D. Suter, Object tracking in image sequences using

point features, Pattern Recognit 38 (2005), 105–113.

J.K. Udupa and S. Samarasekera, Fuzzy connectedness and object definition:

Theory, algorithms, and applications in image segmentation, Graph Model

Image Process 58 (1996), 246–261.

J. Wang, Y. Xu, H.-Y. Shum, and M.F. Cohen, Video tooning, ACM Trans

Graphics 23 (2004), 574–583.

G. Winkenbach and D.H. Salesin, Computer-generated pen-and-ink illustra-

tion, In Proc SIGGRAPH 1994, ACM SIGGRAPH, Orlando, FL, 1994, pp.

91–100.
