Intelligent Scissors is an interactive image segmentation tool that allows a user to select piece-wise globally optimal contour segments (based on an optimal path search in a graph) that correspond to a desired object boundary. This dissertation uses tobogganing to raise the granularity of the image primitive above the pixel level, producing a region-based basic processing unit that is object-centered rather than device-dependent. The resulting region-based elements form the basis for several contributions to the field of computer vision in general and to Intelligent Scissors in particular. These contributions reduce the human time and effort needed for object selection with Intelligent Scissors while simultaneously increasing the accuracy of boundary definition.

The region-based image primitives resulting from tobogganing form the basis for a graph formulation that is many times smaller than the pixel-based graph used previously by Intelligent Scissors, thus providing faster, more interactively responsive optimal path computations. The object-centered atomic units also provide an efficient and consistent framework in which to compute a 4-parameter edge model, allowing subpixel boundary localization, noise-independent edge blur adjustment, and automatic alpha matte generation and color separation of boundary transition pixels. The increased size of the basic processing unit also facilitates an edge confidence measure that forms the basis for two new techniques, confidence threshold snapping and live-wire path extension, which further reduce the human burden of object boundary definition by automatically finding and following object boundaries. Finally, this dissertation presents a new paradigm for simultaneously interacting with multiple frames from a temporal image sequence by parallelizing both the user input and the interactive visual feedback. This allows a user to interact with a montage of image frames in order to define the boundary of a moving object while adhering to the same interactive style that has proven effective for single-image Intelligent Scissors.


I would like to sincerely thank the many people who, through their encouragement, counsel, support, and advice, have made it possible for me to complete this work. My thanks go to my advisor, Dr. William Barrett, who, by his patient instruction, has guided me along the road to completion; to Dr. Bryan Morse, who has always been willing to discuss ideas; and to Dr. Parris Egbert for his friendship and encouragement.

I would particularly like to thank my dear wife Christina and my four children for their patience, support, and understanding through the years. Finally, my most sincere gratitude goes to God for the many blessings I have received at his hand.


Chapter 1. Introduction

The ability to segment one or more images presented to a vision system (whether biological, optical, electronic, or otherwise) is a key step in image understanding. Image segmentation is the process of defining the regions and/or boundaries of one or more objects of interest in a digital image so that they can be separated from each other and the background. The ultimate goal of many computer vision systems is to “understand” an image, or set of images, by creating a descriptive representation of the image(s). As such, accurate and timely image segmentation is a key requirement of many higher-level vision systems.

This dissertation presents improvements to Intelligent Scissors [110,112], an interactive, general purpose, digital image selection tool that allows rapid and accurate object extraction from arbitrarily complex backgrounds using simple gesture motions with a mouse. The goal of this work is not only to extend the utility for selecting an object in a single image but also to expedite object extraction in temporal image sequences for subsequent image composition operations. In doing so, the desire is to create a single tool that allows a user to define a moving object in a temporal sequence by exploiting the techniques used to extract a still object from a single image. By displaying several frames of a movie clip as a montage, the user is able to use the same interactive style of Intelligent Scissors to simultaneously interact with all the visible frames. As the pointer is moved in an “active” frame, the live-wire is updated and displayed in all frames.

To achieve the necessary interaction with multiple image frames, this dissertation uses tobogganing [49,183] to presegment an image into small, homogeneous regions. These regions become the basic primitive in a weighted graph formulation used for interactive live-wire selection, thereby raising the granularity of the segmentation problem from the pixel level to a fundamental unit that (usually) contains multiple connected pixels. The resulting region-based graph is several times smaller than the previous pixel-based version of Intelligent Scissors, allowing for a more responsive optimal path computation within a multi-image, interactive live-wire environment. In addition, the region-based graph allows for edge costs that incorporate region-based features and more advanced curvature metrics. Further, tobogganing provides a consistent and efficient mechanism for computing a four-parameter edge model that estimates subpixel boundary position, edge blur, and foreground/background color. Finally, the region-based graph also facilitates a “proactive” path extension and a more intelligent cursor snapping in addition to providing a framework for future enhancements such as object selection, free-point sub-tree matching, live-wire feature-cost coupling, etc.
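The presegmentation step described above can be illustrated with a minimal sketch. It assumes the common formulation of tobogganing in which each pixel slides downhill on a gradient-magnitude map until it reaches a local minimum, and all pixels that reach the same minimum form one region. The function name `toboggan`, the 4-neighbor connectivity, and the tie-breaking rule are illustrative assumptions, not details taken from this dissertation.

```python
def toboggan(grad):
    """Label each pixel with the local minimum it slides down to.

    grad: 2-D list of gradient magnitudes (rows of equal length).
    Returns a 2-D list of integer region labels.
    Assumption: 4-connectivity and strict descent; plateau pixels are
    not merged in this sketch, so each one becomes its own minimum.
    """
    rows, cols = len(grad), len(grad[0])

    def downhill(r, c):
        # Lowest-valued position among the pixel and its 4-neighbors.
        # The pixel itself wins ties, so sliding always strictly descends.
        best = (grad[r][c], r, c)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grad[nr][nc] < best[0]:
                best = (grad[nr][nc], nr, nc)
        return best[1], best[2]

    labels = [[None] * cols for _ in range(rows)]
    minima = {}  # (row, col) of a local minimum -> region label
    for r in range(rows):
        for c in range(cols):
            path, pos = [], (r, c)
            # Slide until reaching a labeled pixel or a local minimum.
            while labels[pos[0]][pos[1]] is None:
                path.append(pos)
                nxt = downhill(*pos)
                if nxt == pos:  # local minimum: start a new region
                    minima.setdefault(pos, len(minima))
                    labels[pos[0]][pos[1]] = minima[pos]
                    break
                pos = nxt
            # Every pixel visited on the way inherits the region label.
            label = labels[pos[0]][pos[1]]
            for pr, pc in path:
                labels[pr][pc] = label
    return labels


regions = toboggan([[1, 2, 9, 2, 1]])
```

In this one-row example the high-gradient pixel in the middle separates two basins, so five pixels collapse into two regions. A graph built on such regions has far fewer nodes than one built on individual pixels, which is the source of the speedup claimed above.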

1.1 Statement of the Problem

Fully automatic general image segmentation is an unsolved problem due to the wide variety of image sources, content, and complexity. In fact, any general purpose image segmentation tool (such as the selection tools used in image manipulation packages like Adobe Photoshop [1]) will continue to require some degree of human guidance due to the essential role of the user in identifying which image components are of interest. Thus, intelligent tools that exploit high-level visual expertise but require minimal user interaction become appealing [126].

With the increasing use of digital images and movies in web design, entertainment, medical analysis, virtual environment creation, etc., the demand will continue to grow for fast and easy-to-use segmentation tools that accurately select, extract, measure, and/or define objects of interest. Unfortunately, basic and often tedious tools such as lassoing and magic wand are still widely used when a complex image component must be separated from a nontrivial background. For this reason, research continues to search for better selection tools that can reduce the human burden.

Recently, this author, along with Dr. William Barrett, introduced a unique boundary-based segmentation technique called Intelligent Scissors [110,112], which allows rapid and accurate object extraction using simple gesture motions with a mouse. Based on the live-wire interactive optimal path selection tool [10-11,109,111], Intelligent Scissors performs general purpose, interactive object segmentation by allowing the user to choose a minimum cost contour segment corresponding to a portion of the desired object boundary. As the mouse position comes in proximity to an object edge, a live-wire boundary “snaps” to and wraps around the object of interest. Compared to manual tracing (i.e., lassoing), object selection using Intelligent Scissors is many times faster, more accurate, and more reproducible [112].
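The minimum cost contour selection described above can be sketched as a shortest-path search on a pixel grid. This is a minimal illustration, assuming a precomputed per-pixel cost map in which low values mark likely edges. The names `live_wire_tree` and `trace` and the simple "cost of the pixel entered, scaled by step length" edge weight are hypothetical stand-ins, not the local cost function used by Intelligent Scissors.

```python
import heapq


def live_wire_tree(cost, seed):
    """Dijkstra's algorithm from a seed pixel over an 8-connected grid.

    cost: 2-D list of non-negative per-pixel costs (hypothetical input).
    seed: (row, col) of the seed point.
    Returns a dict mapping each pixel to its predecessor on the
    minimum-cost path back to the seed (the seed maps to None).
    """
    rows, cols = len(cost), len(cost[0])
    dist = {seed: 0.0}
    prev = {seed: None}
    heap = [(0.0, seed)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue  # stale heap entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                    # Assumed edge weight: cost of the pixel entered,
                    # scaled by sqrt(2) for diagonal steps.
                    step = cost[nr][nc] * (2 ** 0.5 if dr and dc else 1.0)
                    nd = d + step
                    if nd < dist.get((nr, nc), float("inf")):
                        dist[(nr, nc)] = nd
                        prev[(nr, nc)] = (r, c)
                        heapq.heappush(heap, (nd, (nr, nc)))
    return prev


def trace(prev, free_point):
    """Follow predecessors from the cursor position back to the seed."""
    path, node = [], free_point
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]  # seed first, cursor last


# Low-cost pixels form an L-shaped "edge" along the left and bottom.
cost = [
    [1, 9, 9],
    [1, 9, 9],
    [1, 1, 1],
]
prev = live_wire_tree(cost, seed=(0, 0))
path = trace(prev, free_point=(2, 2))
```

Because the search from the seed is done once, moving the cursor only requires a new call to `trace`, which is what makes the immediate feedback possible. In this example the path stays on the low-cost pixels and avoids the expensive block in the upper right.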

Nevertheless, while Intelligent Scissors, as presented in [10-11,110-112], reduces (often greatly) the user input needed to define an object boundary compared to previous techniques, it by no means represents the minimum amount of human effort required for object selection. Further, even Intelligent Scissors becomes tedious when trying to define, on a frame-by-frame basis, the boundary of a moving object in a temporal sequence of images. Consequently, this dissertation presents several extensions to the Intelligent Scissors object selection tool with the primary goal of reducing the user input required to accurately define object boundaries and a secondary goal of facilitating image editing operations. While the original objective is, for the most part, to facilitate boundary definition of a moving object within a temporal image sequence for subsequent editing operations (e.g., composition), many of these extensions also reduce (sometimes greatly) the amount of human effort needed to segment an object from a single image. The objectives of this work are to:

1. Extend Intelligent Scissors to easily and accurately extract moving or changing objects from temporal or spatial image sequences.

2. Maintain the advantage Intelligent Scissors has over previous user-guided segmentation techniques, namely, the immediate interactive feedback afforded by live-wire optimal path selection.

3. Provide a tight coupling of an object’s contour properties between neighboring image frames.

4. Increase the potential accuracy of Intelligent Scissors and facilitate more convincing compositions by providing subpixel boundary definition, automatic edge blur estimation, and foreground/background color separation.

5. Improve Intelligent Scissors’ general utility by providing more intelligent cursor snapping and introducing “proactive” live-wire to reduce the effort in placing seed points and the number of seed points needed.

The end goal is to extend Intelligent Scissors in order to extract moving objects from time-varying images with little or no more effort than is needed to select an object in a still image. In doing so, we wanted to adhere to the unique interactive style for which Intelligent Scissors is known: the immediate live-wire feedback. Thus, rather than define an object boundary in one frame and subsequently “batch” process the other frames, multi-frame Intelligent Scissors displays several frames as a montage and allows the user to interact with all visible frames simultaneously. As the user gestures in one frame, a live-wire is computed and displayed in all visible frames. Further, the user can easily jump between frames to make local adjustments during live-wire selection.

In terms of the effort required to define a moving object boundary through a temporal image sequence, a tool that requires a user to manually place seed points in every frame provides little or no benefit over single image Intelligent Scissors. Ideally, a user interacts within a single frame and Intelligent Scissors tracks the object through all frames and immediately displays the resulting boundary in process. Tracking of object boundaries typically requires matching of object features between neighboring images. Thus, another objective of this work is to explore methods of interactively matching free-point features through the sequence. Since matching occurs during boundary definition, information about the object (such as the foreground color and object shape in the vicinity of the point being matched) is not explicitly available. Though point matching techniques abound [172,147,185], many of them fail to choose the correct match when multiple “high confidence” possibilities are available or when local properties change. Without some higher-level knowledge of the object and how it moves, it is difficult to provide reliable matches of edge points in a dynamic system. Fortunately, the interactive nature of multi-frame

