Deep Learning


Transcript of Deep Learning


Selected Topics

– Basic concepts in Machine learning

– Neural Networks

– Convolutional Neural Networks (CNNs)

– Training and Optimization

– Solving CV problems using CNN:

Object detection, image segmentation, object recognition.

– Unsupervised learning using CNN (auto-encoders)

– Learning sequence using Recurrent NN (RNN)

– Generative models (GAN)


Disclaimer

• Most of the course will be dedicated to studying the
main principles and basic concepts of deep learning.

• Modern techniques for image understanding, object
recognition and image generation will be covered in
the second half of the course, as time permits.

• Lots of math – (but math is fun)


Textbooks

Books:

• Deep Learning, by Ian Goodfellow, Yoshua Bengio and Aaron Courville.

– Mostly theoretical

– Free online.

• Neural Networks and Deep Learning, by Michael Nielsen.

– Theory-based deep learning

– Free online

• Deep Learning with PyTorch, by Stevens & Antiga.

– Practitioner’s approach to deep learning

– Using PyTorch


Other Resources

Online Courses

• Stanford CS231n: Convolutional Neural Networks for Visual Recognition

• CS231n on YouTube

• Andrew Ng's Course: Machine learning course given at Coursera

• Andrew Ng's Specialization: A series of 5 courses on Coursera

• Geoffrey Hinton's course: Neural Networks for Machine Learning, given on Coursera and YouTube

Tutorials

• PyTorch tutorial

Machine Learning for

Computer Vision

• Images provide visual information about the world

around us.

• We’re trying to infer facts about the world through visual

information.

• In the last decade – huge improvement using

Deep Learning.

• We are in the midst of a huge REVOLUTION. This

revolution will influence every one of us in the near future.

• This course is all about this revolution.



Classical problems in Computer Vision

[Figure, from Fei-Fei Li: Classification (single object: CAT); Classification + Localization (single object: CAT); Object Detection (multiple objects: CAT, DOG, DUCK); Segmentation (multiple objects: CAT, DOG, DUCK)]

How many categories are there? (Biederman 1987)


Machine Learning: Toy example

• Detect images of beaches taken during the day.

• Let’s extract two features: Brightness & Chrominance.

[Scatter plot: Brightness (dark → light) vs. Chroma (gray → blue)]
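The two-feature idea can be sketched in a few lines. This is a toy illustration only: the feature definitions and the thresholds below are hypothetical, chosen just to make the example run.

```python
import numpy as np

def beach_features(img):
    # img: H x W x 3 RGB array with values in [0, 1]
    brightness = img.mean()                            # overall lightness
    chroma = img[..., 2].mean() - img[..., :2].mean()  # "blueness" vs. red/green
    return brightness, chroma

def is_daytime_beach(img, b_thresh=0.5, c_thresh=0.05):
    # Hypothetical decision rule: bright AND blue-ish -> beach by day
    b, c = beach_features(img)
    return bool(b > b_thresh and c > c_thresh)

# A bright, blue-dominated image vs. a dark one
sunny = np.full((8, 8, 3), 0.5); sunny[..., 2] = 0.9
night = np.full((8, 8, 3), 0.1)
```

In feature space, this rule is exactly a pair of axis-aligned thresholds; a trained classifier would instead learn the separating boundary from labeled examples.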

Learning from examples

[Diagram: image → feature extraction → (Feature 1, Feature 2) → trained machine for image classification]

But features are not always separable

“Good” Features for Machine Learning

• We may use hundreds of features!

• Trying to find features that are separable.

• Challenges:

− How to design "good" features (invariant and robust).

− How to achieve good separation / generalization.

[Diagram: image → feature extraction → classifier → apple / tomato / cow]

Since the 90’s - The search for good features

SIFT, Spin images, HoG, RIFT, Textons, GLOH, Object Bag of 'words'

Example of “features” for object classification

US Presidential Speeches Tag Cloud – http://chir.ag/projects/preztags/

• Bag of words: Represent a document by the rate of

appearance of each word (= an entry in the English dictionary)

Source: Document classification

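The idea can be sketched in a few lines; the tiny vocabulary here is hypothetical, while a real system would use a full dictionary.

```python
import re
from collections import Counter

def bag_of_words(text, vocabulary):
    # Represent a document by the rate of appearance of each vocabulary word
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w in vocabulary)
    total = sum(counts.values()) or 1
    return {w: counts[w] / total for w in vocabulary}

vec = bag_of_words("The cat sat on the mat.", ["the", "cat", "mat", "dog"])
# "the" appears twice out of four vocabulary hits -> rate 0.5; "dog" never appears -> 0.0
```

Note that word order is discarded entirely: the document becomes a fixed-length vector, which is what makes it easy to feed into a classifier.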

Bag of words for images

Represent an image by a collection of "visual words".


Bag of words for images

1. Extract visual words from many images

2. Build a visual vocabulary (using VQ)

3. Represent an image by a histogram of visual words

4. Train a classifier from a set of examples
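Steps 2 and 3 can be sketched with plain k-means as the vector quantizer. This is a minimal NumPy sketch on raw 2-D points; a real system would cluster SIFT-like descriptors extracted from many images.

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=10, seed=0):
    # Step 2: vector quantization (VQ) - k-means finds k "visual words"
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)].astype(float)
    for _ in range(iters):
        dists = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def bow_histogram(descriptors, centers):
    # Step 3: assign each descriptor to its nearest visual word and
    # return the normalized histogram that represents the image
    dists = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
    labels = dists.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

The histograms from step 3 are fixed-length feature vectors, and step 4 trains an ordinary classifier (e.g. an SVM) on them.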

Problems with hand-crafted features

• Needs expert knowledge

• Not optimal

• Time-consuming and expensive

• Does not generalize to other domains


Success story: ILSVRC 2012

ImageNet Large Scale Visual Recognition Challenge

Slide from Boris Ginsburg

ILSVRC: Classification

The Image Classification Challenge:

1000 object classes

1,431,167 images

ImageNet Classifications 2012

• A paper published by a research group from the University of
Toronto showed a dramatic breakthrough in the ImageNet Challenge.

• It reduced the error rate by almost 50%!

• The paper used Deep Learning for classification.

• The architecture used was termed “AlexNet” after Alex

Krizhevsky, the first author of this paper.

• This work is widely considered the beginning of the Deep Learning era.

ImageNet Classifications 2012

“AlexNet”

ImageNet Classifications 2012

Geoffrey Hinton (right), Alex Krizhevsky, and Ilya Sutskever (left)

ImageNet Classifications 2013

Slide from Boris Ginsburg

ImageNet Classifications until 2015

What is the idea behind “deep learning”?

[Diagram: image → hand-crafted feature extraction → classifier → apple / tomato / cow]

What is the idea behind “deep learning”?

[Diagram: image → deep network (features and classifier learned jointly) → apple / tomato / cow]

• Feature extraction and classification are learned
together, end-to-end.

How does it work?

• Multiple layers of computational units called "neurons".

• Information passes from layer to layer through weighted

connections.

• Resembles the brain's structure, hence the name "Neural Network".

[Diagram: input layer → hidden layers → output (CAT / DOG); neurons joined by weighted connections]
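The layered computation can be sketched in a few lines. The weights below are random placeholders; training is what gives them meaning.

```python
import numpy as np

def relu(x):
    # Simple nonlinearity applied by each hidden neuron
    return np.maximum(0.0, x)

def softmax(z):
    # Turn output-layer scores into class probabilities (e.g. CAT vs. DOG)
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, weights, biases):
    # Information passes from layer to layer through weighted connections
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(W @ x + b)
    return softmax(weights[-1] @ x + biases[-1])

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 2]  # input, two hidden layers, two output classes
Ws = [rng.normal(0, 0.5, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(m) for m in sizes[1:]]
probs = forward(rng.normal(size=4), Ws, bs)  # two class probabilities summing to 1
```

Each layer is just a matrix of connection weights followed by a nonlinearity; stacking several of them is what makes the network "deep".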

Neural network – Training

• Searching for the weights that will correctly classify the
example set:

– inference: run the training examples through the network

– weight update: adjust the weights to reduce the classification error
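The inference / weight-update loop can be sketched on a toy problem: a single logistic neuron trained by gradient descent. This is a minimal stand-in for the full backpropagation used in deep networks, on hypothetical synthetic data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy example set: two well-separated 2-D classes
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # inference: run the examples through
    grad_w = X.T @ (p - y) / len(y)         # gradient of the logistic loss
    grad_b = (p - y).mean()
    w -= lr * grad_w                        # weight update
    b -= lr * grad_b

accuracy = ((X @ w + b > 0) == (y == 1)).mean()
```

A deep network repeats exactly this loop, except the gradient is propagated backward through all the layers rather than computed for a single neuron.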

Neural network – Inference

[Diagram: a new image passes through the trained network, which outputs DOG]


Neural Network vs. Brain

• The network structure resembles the structure of

mammal’s brains:


Neural Network vs. Brain

• The layers' receptive fields are very similar to
what is known about mammalian brains.


Neural Network vs. Brain

• David Hubel and Torsten Wiesel received the Nobel Prize in

1981 for their discoveries (in the 60’s) concerning

information processing in the brain.

Hubel and Wiesel's Experiment

Deep Learning – why deep?


Conv Nets are everywhere

• If it's good for images – why not for other problems?

• CNN is a big hammer – we just need the right nail!
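The "hammer" itself is the convolution operation: a small filter slid over the image, computing a weighted sum at each position. A minimal sketch (implemented as cross-correlation, as deep-learning libraries do):

```python
import numpy as np

def conv2d(image, kernel):
    # Valid-mode 2-D convolution (cross-correlation): slide the kernel
    # over the image and take a weighted sum at each location
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

# A [-1, 1] filter fires exactly on the step from 0 to 1 in each row
image = np.array([[0, 0, 1, 1, 1]] * 3, dtype=float)
edges = conv2d(image, np.array([[-1.0, 1.0]]))
# each row of `edges` is [0, 1, 0, 0]
```

In a CNN the kernels are not hand-designed like this edge filter; they are the weights the network learns during training.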

The 2018 Turing Award was given to

"the godfathers of Deep Learning"


Detection [Faster R-CNN: Ren, He, Girshick, Sun 2015]

Segmentation [Farabet et al., 2012]

Face recognition (metric learning) [Taigman et al. 2014]


Self-driving cars


Age and Gender Recognition


Body Pose Estimation


Amazon Go


Amazon Go - Seattle, Jan 2018

Image captioning

[Vinyals et al. 2015]

Other Domains

• DeepMind's AlphaGo was trained to play the game Go.

• It uses Deep Learning.

• It was long believed that a computer program could not beat a top human Go player, as the number of possible game positions grows very fast (exceeding the number of atoms in the universe).

• In March 2016, AlphaGo defeated the world champion Lee Sedol 4–1.

Google’s AlphaGo defeated a world class human player

https://www.youtube.com/watch?v=HT-UZkiOLv8

Creativity – a non-trivial generalization

• In move 37, AlphaGo chose a move that the commentators considered a mistake and even treated with contempt.

• After Lee Sedol understood the move, he needed a 20-minute break to recover.

https://www.youtube.com/watch?v=HT-UZkiOLv8

Self Learning

• In 2017, DeepMind's AlphaZero was trained to play chess.

• The machine trained itself, without access to previous games.

• After 4 hours of training, the system defeated Stockfish, the strongest chess engine at the time.

IBM Deep Blue

• In 1997, IBM's Deep Blue defeated the world champion Garry
Kasparov.

• Deep Blue was developed by ~100 people over 10 years!


Machine learning from machine (GAN)

• Two deep networks:

– Network D is trained to distinguish between real and fake images.

– Network G is trained to deceive network D.

– The two networks learn from each other and improve over time.
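A minimal 1-D sketch of this adversarial game. Both "networks" here are single linear units with hand-derived gradients, purely to show the alternating D/G updates; real GANs use deep networks and automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Real data ~ N(3, 1); generator g(z) = a*z + b starts out producing N(0, 1)
a, b = 1.0, 0.0   # generator G parameters
w, c = 0.0, 0.0   # discriminator D parameters (a logistic unit)
lr = 0.05

for _ in range(500):
    real = rng.normal(3.0, 1.0, 64)
    z = rng.normal(0.0, 1.0, 64)
    fake = a * z + b

    # D step: push D(real) toward 1 and D(fake) toward 0
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += lr * (((1 - d_real) * real).mean() - (d_fake * fake).mean())
    c += lr * ((1 - d_real).mean() - d_fake.mean())

    # G step: push D(fake) toward 1, i.e. deceive D (non-saturating loss)
    d_fake = sigmoid(w * fake + c)
    a += lr * ((1 - d_fake) * w * z).mean()
    b += lr * ((1 - d_fake) * w).mean()
```

As the loop runs, D's gradient pulls the generator's output distribution toward the real one: the two models improve each other exactly as the bullets above describe.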

Computer generated images


Image-to-image translation

• Two deep networks:

– Network D is trained to distinguish between images from domain A and images from domain B.

– Network G receives an image from domain A and is trained to modify it so as to deceive network D.

– The two networks learn from each other and improve over time.


Image-to-image translation

Sometimes it fails…

• Will machines replace humans?

• Are machines close to passing the Turing test?

Semantic understanding

Context Reasoning


Sense of Humor

The New-Yorker Cartoon Caption Contest

• To represent knowledge, including general knowledge, to
plan and use strategy, to solve puzzles.

• Ability to plan.

• Ability to apply self-learning.

• Ability to communicate in a natural language.

• Long way to go…

• But we are (arguably) getting there.

• No doubt, we’re going to have an exciting and fascinating

decade.


Strong Artificial Intelligence

Everybody Dance Now…

https://www.youtube.com/watch?v=PCBTZh41Ris

Wish you all

a fruitful and enjoyable semester
