Posted on 23-Feb-2023
CSE 473/573 Computer Vision and Image Processing (CVIP)
Ifeoma Nwogu inwogu@buffalo.edu
Prerequisites
• This course is appropriate for students with the following essential prerequisites:
  – A good working knowledge of MATLAB programming (or the willingness and time to pick it up quickly!)
  – Linear algebra
  – Vector calculus
• The course does not assume prior experience with imaging, computer vision, image processing, or graphics.
MATLAB
• Problem sets and projects will involve MATLAB programming. MATLAB runs on all CSE lab Windows and UNIX systems.
• CSE 473/573 students can use their existing CSE lab accounts; non-CSE majors can request new CSE accounts.
• Any issues with CSE lab machines and accounts should be forwarded to cse-consult@cse.buffalo.edu
Grading
• There will be four components to the course grade:
  – Approximately ten short online quizzes
  – Three programming assignments
  – A comprehensive mid-term exam
  – A final project (including evaluation of the final report)
• Class participation is strongly encouraged; it can help offset weak performance in any of the above components.
Final project
• Significant implementation of a technique related to the course content
• Teams of 2 encouraged (document each role!)
• CVPR-type review article (no teams)
• Two components (if you implement a project different from what is assigned):
  – Proposal document (no more than 2 pages)
  – Final write-up with results (no more than 8 pages, CVPR style)
Course goals
• Values of computer vision to society
• Principles of image formation
• Convolution and image pyramids
• Feature detection, matching and alignment
• Motion estimation
• Visual recognition
• Machine learning models in vision
• Temporal models
What is computer vision?
• What does it mean to see? “To know what is where by looking” (Marr, 1982).
• How to discover from images what is present in the world, where things are, and what actions are taking place:
  – Computing properties of the 3D world from visual data (measurement)
  – Algorithms and representations that allow a machine to recognize objects, people, scenes, and activities (perception and interpretation)
The problem
Figure: real-world scene → sensing device → interpreting device → interpretation (e.g., “a person”, “a smiling person”, “Dr. Ifeoma Nwogu”, etc.)
• We want to make a computer understand images
• We know it is possible – we do it effortlessly!
Earth viewers (3D modeling)
Image from Microsoft’s Virtual Earth (see also: Google Earth)
Photosynth
http://labs.live.com/photosynth/
Based on Photo Tourism technology developed by Noah Snavely, Steve Seitz, and Rick Szeliski
Photo Tourism overview
Figure: input photographs → scene reconstruction (sparse correspondence, relative camera positions and orientations, point cloud) → Photo Explorer
A system for interactively browsing and exploring large photo collections of a scene. It computes the viewpoint of each photo as well as a sparse 3D model of the scene.
Optical character recognition (OCR)
Digit recognition, AT&T labs http://www.research.att.com/~yann/
• Technology to convert scanned documents to text
• If you have a scanner, it probably came with OCR software
License plate readers http://en.wikipedia.org/wiki/Automatic_number_plate_recognition
Smile detection?
Sony Cyber-shot® T70 Digital Still Camera
Object recognition (in supermarkets)
LaneHawk by Evolution Robotics: “A smart camera is flush-mounted in the checkout lane, continuously watching for items. When an item is detected and recognized, the cashier verifies the quantity of items that were found under the basket, and continues to close the transaction. The item can remain under the basket, and with LaneHawk, you are assured to get paid for it…”
Vision-based biometrics
“How the Afghan Girl was Identified by Her Iris Patterns” Read the story
Login without a password…
Fingerprint scanners on many new laptops and other devices
Face recognition systems now beginning to appear more widely
http://www.sensiblevision.com/
Object recognition (in mobile phones)
– Microsoft Research
– Point & Find, Nokia
– SnapTell.com (now Amazon)
Sports
Sportvision first-down line. A good explanation is available at www.howstuffworks.com
Sports
A brief explanation of how Hawk-Eye works can be found here
Smart cars
• Mobileye
  – Vision systems currently in high-end BMW, GM, and Volvo models
  – Back-up camera requirement for all new cars and light trucks
  – Video demo
Vision-based interaction (and games)
The Nintendo Wii has camera-based IR tracking built in. See Lee’s work at CMU on clever tricks for using it to create a multi-touch display!
Digimask: put your face on a 3D avatar.
“Game turns moviegoers into Human Joysticks”, CNET. Camera tracking a crowd, based on this work.
Medical imaging
Image-guided surgery: Grimson et al., MIT
3D imaging: MRI, CT
Structure of light
Left: scene illuminated with a ceiling lamp. Right: the two images were obtained by illuminating the scene with a laser pointer. In each image, the red arrow indicates the approximate direction of the light beam produced by the pointer.
The Plenoptic Function
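The intensity P can be parameterized as:
P(θ, φ, t, λ, X, Y, Z)
where (θ, φ) is the viewing direction, t is time, λ is wavelength, and (X, Y, Z) is the viewing position (Adelson & Bergen, 1991).
“The complete set of all convergence points constitutes the permanent possibilities of vision.” (Gibson)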
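To make the seven parameters concrete, here is a minimal Python sketch (not from the original slides) of how an ordinary grayscale photograph is a low-dimensional slice of the plenoptic function: fix the viewpoint (X, Y, Z) and the time t, sweep the viewing direction (θ, φ) over a grid, and sum over wavelength λ. The scene function `toy_plenoptic`, its 550 nm spectral peak, and the helper `pinhole_slice` are hypothetical illustrations; summing over a few sampled wavelengths is only a crude stand-in for a real sensor's spectral response.

```python
import math

def toy_plenoptic(theta, phi, t, lam, X, Y, Z):
    """Hypothetical toy plenoptic function P(theta, phi, t, lam, X, Y, Z).

    Models a static, distant scene: a bright 'sky' above the horizon
    (phi > 0) and a dim 'ground' below it. Because the scene is static
    and distant, the value ignores t and the viewpoint (X, Y, Z).
    """
    # Spectral weighting peaking at 550 nm (an arbitrary illustrative choice).
    spectral = math.exp(-((lam - 550.0) / 100.0) ** 2)
    brightness = 1.0 if phi > 0 else 0.2
    return brightness * spectral

def pinhole_slice(P, X, Y, Z, t, thetas, phis, wavelengths):
    """Sample a grayscale 'image' from P.

    The viewpoint (X, Y, Z) and time t are held fixed, the viewing
    direction sweeps a grid of (theta, phi), and intensities are summed
    over the sampled wavelengths. Rows correspond to phi, columns to theta.
    """
    return [
        [sum(P(theta, phi, t, lam, X, Y, Z) for lam in wavelengths) for theta in thetas]
        for phi in phis
    ]

image = pinhole_slice(toy_plenoptic, 0.0, 0.0, 0.0, 0.0,
                      thetas=[-0.2, 0.0, 0.2], phis=[0.3, -0.3],
                      wavelengths=[550.0])
print(image)  # [[1.0, 1.0, 1.0], [0.2, 0.2, 0.2]]: sky row, then ground row
```

Letting t vary instead of holding it fixed would give a video, and varying (X, Y, Z) would sample the scene from multiple viewpoints; each kind of camera captures a different slice of the same function.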
Measuring light vs. measuring scene properties
Illusion by Roger Shepard (“Turning the Tables”)
Depth processing is automatic, and we cannot shut it down…
Related disciplines
Figure: related disciplines surrounding computer vision (cognitive science, algorithms, image processing, artificial intelligence, graphics, machine learning)
Again, what is computer vision?
• Mathematics of the geometry of image formation?
• Statistics of the natural world?
• Models for neuroscience?
• Engineering methods for matching images?
• Science fiction?
Answer: all of the above, and more…
Figure: an annotated photo of the Cedar Point amusement park on Lake Erie, with labels such as sky, water, trees, Ferris wheel, carousel deck, rides (maxair, The Wicked Twister), umbrellas, bench, pedestrians, people waiting in line, people sitting on a ride, and the text “12 E”.
Things to interpret: objects, activities, scenes, locations, text/writing, faces, gestures, motions, emotions…
The goal of computer vision is to write computer programs that can interpret images.
Slide Credits
• Trevor Darrell – UC Berkeley
• Antonio Torralba – MIT Vision Group
• Rob Fergus – NYU Vision, Learning and Graphics group
Next class
• Readings for today: Szeliski, Ch. 1
• Overview of linear algebra in the context of optimization techniques
Physical parameters of image formation
• Geometric
  – Type of projection
  – Camera pose
• Optical
  – Sensor’s lens type
  – Focal length, field of view, aperture
• Photometric
  – Type, direction, and intensity of light reaching the sensor
  – Surfaces’ reflectance properties
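As an illustration of the geometric and optical parameters, here is a minimal Python sketch, assuming an ideal pinhole (perspective) camera with no lens distortion. The camera pose is a world-to-camera rigid transform X_c = R·X_w + t, and the horizontal field of view follows from the focal length f and sensor width w as FOV = 2·atan(w / (2f)). The function names `project_pinhole` and `horizontal_fov_degrees` are hypothetical; other projection types (e.g., orthographic) would replace the perspective divide with a different mapping.

```python
import math

def project_pinhole(point_world, R, t, f):
    """Ideal pinhole (perspective) projection of one 3D world point.

    Camera pose: X_c = R @ X_w + t, with R a 3x3 rotation given as a list
    of rows and t a 3-vector. f is the focal length, in the same units as
    the returned image-plane coordinates.
    """
    # Transform the point from world coordinates into camera coordinates.
    Xc = [sum(R[i][j] * point_world[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point is not in front of the camera")
    # Perspective divide: image coordinates shrink as depth Z_c grows.
    return (f * Xc[0] / Xc[2], f * Xc[1] / Xc[2])

def horizontal_fov_degrees(sensor_width, f):
    """Horizontal field of view of a pinhole camera: 2 * atan(w / (2 f))."""
    return math.degrees(2.0 * math.atan(sensor_width / (2.0 * f)))

I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Camera at the world origin looking down +Z; a point 10 units away.
print(project_pinhole([1.0, 2.0, 10.0], I, [0.0, 0.0, 0.0], f=50.0))  # (5.0, 10.0)
# A 50 mm lens on a 36 mm-wide sensor gives roughly a 39.6 degree horizontal FOV.
print(round(horizontal_fov_degrees(36.0, 50.0), 1))  # 39.6
```

Doubling the focal length roughly halves the image-plane coordinates' scale-free angular coverage: a longer lens narrows the field of view and magnifies the projected scene, which is why focal length and field of view are listed together among the optical parameters.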