CSE 473/573 Computer Vision and Image Processing (CVIP)

Ifeoma Nwogu [email protected]

Today

• Logistics
• Schedule
• Introductions
• What is computer vision?
• Why is vision so hard?

Prerequisites

• This course is appropriate for students with these essential prerequisites:
  – A good working knowledge of MATLAB programming (or willingness and time to pick it up quickly!)
  – Linear algebra
  – Vector calculus

• The course does not assume prior experience in imaging, computer vision, image processing, or graphics

Text

Strongly recommended

Optional

Matlab

• Problem sets and projects will involve Matlab programming. Matlab runs on all the CSE lab Windows and UNIX systems.

• CSE 473/573 students can use their existing accounts in CSE labs; non-CSE majors can request new CSE accounts

• Any issues with CSE lab machines and accounts should be forwarded to [email protected]

Grading

• There will be four components to the course grade
  – Approximately ten short online quizzes
  – Three programming assignments
  – Comprehensive mid-term exam
  – Final project (including evaluation of final report)

• Class participation is strongly encouraged and can offset weaker performance in any of the above components.

Final project

• Significant implementation of a technique related to the course content

• Teams of 2 encouraged (document each role!)
• CVPR-type review article (no teams)
• Two components (if you implement a project different from what is assigned):
  – proposal document (no more than 2 pages)
  – final write-up with results (no more than 8 pages, CVPR style)

Course goals

• Values of computer vision to society
• Principles of image formation
• Convolution and image pyramids
• Feature detection, matching and alignment
• Motion estimation
• Visual recognition
• Machine learning models in vision
• Temporal models

What is computer vision?

• What does it mean to see? “To know what is where by looking.”
• How to discover from images what is present in the world, where things are, and what actions are taking place.
  – Computing properties of the 3D world from visual data (measurement)
  – Algorithms and representations to allow a machine to recognize objects, people, scenes, and activities (perception and interpretation)

from Marr, 1982

The problem

Real world scene → Sensing device → Interpreting device → Interpretation (a person / a smiling person / Dr. Ifeoma Nwogu / etc.)

• We want to make a computer understand images

• We know it is possible – we do it effortlessly!

NSF Frontiers in computer vision workshop, 2011

Current state of the art

• The next slides show some examples of what current vision systems can do

Earth viewers (3D modeling)

Image from Microsoft’s Virtual Earth (see also: Google Earth)

Photosynth

http://labs.live.com/photosynth/

Based on Photo Tourism technology developed by Noah Snavely, Steve Seitz, and Rick Szeliski

Photo Tourism overview

Input photographs → Scene reconstruction (relative camera positions and orientations, point cloud, sparse correspondence) → Photo Explorer

A system for interactively browsing and exploring large collections of photos of a scene. It computes the viewpoint of each photo as well as a sparse 3D model of the scene.

Optical character recognition (OCR)

Digit recognition, AT&T labs http://www.research.att.com/~yann/

Technology to convert scanned documents to text
• If you have a scanner, it probably came with OCR software

License plate readers http://en.wikipedia.org/wiki/Automatic_number_plate_recognition

Face detection

• Many new digital cameras now detect faces
  – Canon, Sony, Fuji, …

Object recognition (in supermarkets)

LaneHawk by Evolution Robotics

“A smart camera is flush-mounted in the checkout lane, continuously watching for items. When an item is detected and recognized, the cashier verifies the quantity of items that were found under the basket, and continues to close the transaction. The item can remain under the basket, and with LaneHawk, you are assured to get paid for it…”

Face recognition

Who is she?

Vision-based biometrics

“How the Afghan Girl was Identified by Her Iris Patterns” Read the story

Login without a password…

Fingerprint scanners on many new laptops and other devices

Face recognition systems now beginning to appear more widely

http://www.sensiblevision.com/

SnapTell: Amazon acquires SnapTell (WSJ, 2009)

Sports

Sportvision first-down line. Nice explanation on www.howstuffworks.com

Sports

A brief explanation of how Hawk-Eye works can be found here

Smart cars

• Mobileye
  – Vision systems currently in high-end BMW, GM, Volvo models
  – Back-up camera requirement for all new cars and light trucks
  – Video demo

Vision-based interaction (and games)

Nintendo Wii has camera-based IR tracking built in. See Lee’s work at CMU on clever tricks for using it to create a multi-touch display!

Digimask: put your face on a 3D avatar.

“Game turns moviegoers into Human Joysticks”, CNET. Camera tracking a crowd, based on this work.

Medical imaging

Image-guided surgery: Grimson et al., MIT

3D imaging: MRI, CT

Structure of light

Left: scene illuminated with a ceiling lamp. Right: the two images were obtained by illuminating the scene with a laser pointer. In each image, the red arrow indicates the approximate direction of the light beam produced by the pointer.

Why is vision so hard?

The structure of ambient light

The Plenoptic Function

The intensity P can be parameterized as:

P(θ, φ, t, λ, X, Y, Z)

“The complete set of all convergence points constitutes the permanent possibilities of vision.” (Gibson)

Adelson & Bergen, 91
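
As a rough illustration of the parameterization above: θ and φ give the viewing direction, t is time, λ is wavelength, and (X, Y, Z) is the viewing position. Fixing everything except θ and φ yields a single grayscale pinhole image, i.e. a 2D slice of P. The Python sketch below assumes a made-up toy scene; the names `toy_plenoptic` and `sample_image` and all parameter values are hypothetical and do not come from the slides.

```python
import math


def toy_plenoptic(theta, phi, t, lam, x, y, z):
    """Hypothetical stand-in for P(theta, phi, t, lambda, X, Y, Z).

    Assumed toy world: a single distant light source straight ahead
    (theta = 0, phi = 0). Intensity then depends only on viewing direction;
    t, lam, x, y, z are accepted but ignored.
    """
    # Cosine of the angle between the viewing ray and the source direction.
    cos_angle = math.cos(theta) * math.cos(phi)
    # Clamp rays pointing away from the source; the exponent narrows the blob.
    return max(0.0, cos_angle) ** 8


def sample_image(plenoptic, position, t, lam, width, height, fov):
    """Fix viewpoint, time, and wavelength; sweep the two viewing angles."""
    x, y, z = position
    image = []
    for row in range(height):
        # Elevation: top row looks up, bottom row looks down.
        phi = fov * (0.5 - (row + 0.5) / height)
        image.append([
            plenoptic(fov * ((col + 0.5) / width - 0.5), phi, t, lam, x, y, z)
            for col in range(width)
        ])
    return image


# One 5x5 "photo": observer at the origin, a single instant, green light.
image = sample_image(toy_plenoptic, position=(0.0, 0.0, 0.0), t=0.0,
                     lam=550e-9, width=5, height=5, fov=math.pi / 2)
```

Sweeping `t` as well would give a video, and sweeping `position` would give a set of views of the same scene; each is a different slice of the same function.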

Why is vision so hard?

Measuring light vs. measuring scene properties

by Roger Shepard (“Turning the Tables”)

Depth processing is automatic, and we cannot shut it down…

Why is vision so hard?

Some things have strong variations in appearance

Related disciplines

Cognitive science

Algorithms

Image processing

Artificial intelligence

Graphics

Machine learning

Computer vision

Again, what is computer vision?

• Mathematics of geometry of image formation?
• Statistics of the natural world?
• Models for neuroscience?
• Engineering methods for matching images?
• Science fiction?

Answer: all of the above and more…

Figure: annotated photo of an amusement park. Labels: sky, water, Lake Erie, Cedar Point, amusement park, Ferris wheel, carousel deck, ride, maxair, The Wicked Twister, tree, bench, umbrellas, pedestrians, people waiting in line, people sitting on ride, 12 E.

Objects, activities, scenes, locations, text / writing, faces, gestures, motions, emotions…

The goal of computer vision is to write computer programs that can interpret images

Slide Credits

• Trevor Darrell – UC Berkeley

• Antonio Torralba – MIT Vision Group

• Rob Fergus – NYU Vision, Learning and Graphics group

Next class

• Readings for today: Szeliski, Ch. 1
• Overview of linear algebra in the context of optimization techniques

Questions

Physical parameters of image formation

• Geometric
  – Type of projection
  – Camera pose
• Optical
  – Sensor’s lens type
  – Focal length, field of view, aperture
• Photometric
  – Type, direction, intensity of light reaching sensor
  – Surfaces’ reflectance properties
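
To make the geometric and optical parameters above a little more concrete, here is a minimal Python sketch of an ideal pinhole camera. It assumes perspective projection, a camera pose reduced to a position only (no rotation, looking along +Z), and the standard relation between focal length, sensor width, and field of view. The function names and the numbers are illustrative assumptions, not material from the course.

```python
import math


def field_of_view(focal_length, sensor_width):
    """Horizontal field of view (radians) of an ideal pinhole camera.

    Both arguments must use the same unit (for example millimetres).
    """
    return 2.0 * math.atan(sensor_width / (2.0 * focal_length))


def project_pinhole(point_world, camera_position, focal_length):
    """Perspective projection of a 3D point onto the image plane.

    Assumption: the pose is a translation only; the camera looks along +Z
    and its axes are aligned with the world axes.
    """
    # Express the point in camera coordinates.
    xc = point_world[0] - camera_position[0]
    yc = point_world[1] - camera_position[1]
    zc = point_world[2] - camera_position[2]
    if zc <= 0:
        raise ValueError("point is behind the camera")
    # Similar triangles: image coordinates shrink with depth.
    return (focal_length * xc / zc, focal_length * yc / zc)


# Same scene point, two camera positions: moving closer enlarges the image.
far_view = project_pinhole((1.0, 2.0, 4.0), (0.0, 0.0, 0.0), focal_length=2.0)
near_view = project_pinhole((1.0, 2.0, 4.0), (0.0, 0.0, 2.0), focal_length=2.0)

# An 18 mm lens on a 36 mm wide sensor covers a 90 degree field of view.
fov_degrees = math.degrees(field_of_view(18.0, 36.0))
```

A shorter focal length widens the field of view, while a change in camera pose changes where each scene point lands in the image. Photometric effects (light and reflectance) are not modeled in this sketch.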

Next class

• More on image formation
• Readings for today: Szeliski, Ch. 1
• Readings for next lecture: Szeliski 2.1-2.3.1, Forsyth and Ponce 1.1, 1.4 (optional).