Introduction to Neural Networks


Neural Networks

Teacher: Elena Marchiori, R4.47, [email protected]

A: Marius Codrea, S4.16, [email protected]


Course Outline

The course is divided into two parts: theory and practice.

1. Theory covers basic topics in neural network theory and their application to supervised and unsupervised learning.

2. Practice deals with the basics of Matlab and the application of NN learning algorithms.


Course Information

• Register for the practicum: send an email to [email protected] with:
  1. Subject: NN practicum
  2. Content: names, study numbers, study directions (AI, BWI, I, other)

• Course information, plan, slides and links to on-line material are available at http://www.cs.vu.nl/~elena/nn.html


Course Evaluation

• Course value: 6 ECTS

• Evaluation is based on the following two parts:
  – theory (weight 0.5): final exam at the end of the course, consisting of questions about the theory part (dates to be announced)
  – practicum (weight 0.5): Matlab programming assignments to be done in pairs (available during the course at http://www.few.vu.nl/~codrea/nn)


What are Neural Networks?

• Simple computational elements forming a large network
  – Emphasis on learning (pattern recognition)
  – Local computation (neurons)

• Definition of NNs is vague
  – Often, but not always, inspired by the biological brain


History

• Roots of work on NN are in:
  – Neurobiological studies (more than one century ago): How do nerves behave when stimulated by different magnitudes of electric current? Is there a minimal threshold needed for nerves to be activated? Given that no single nerve cell is long enough, how do different nerve cells communicate with each other?
  – Psychological studies: How do animals learn, forget, recognize and perform other types of tasks?

• Psycho-physical experiments helped to understand how individual neurons and groups of neurons work.

• McCulloch and Pitts introduced the first mathematical model of a single neuron, widely applied in subsequent work.


History

Prehistory:
• Golgi and Ramon y Cajal study the nervous system and discover neurons (end of 19th century)

History (brief):
• McCulloch and Pitts (1943): the first artificial neural network with binary neurons
• Hebb (1949): learning = neurons that fire together wire together
• Minsky (1954): neural networks for reinforcement learning
• Taylor (1956): associative memory
• Rosenblatt (1958): perceptron, a single neuron for supervised learning


History

• Widrow and Hoff (1960): Adaline
• Minsky and Papert (1969): limitations of single-layer perceptrons (and they erroneously claimed that the limitations hold for multi-layer perceptrons)

Stagnation in the 70's:
• Individual researchers continue laying foundations
• von der Malsburg (1973): competitive learning and self-organization

Big neural-nets boom in the 80's:
• Grossberg: adaptive resonance theory (ART)
• Hopfield: Hopfield network
• Kohonen: self-organising map (SOM)


History

• Oja: neural principal component analysis (PCA)
• Ackley, Hinton and Sejnowski: Boltzmann machine
• Rumelhart, Hinton and Williams: backpropagation

Diversification during the 90's:
• Machine learning: mathematical rigor, Bayesian methods, information theory, support vector machines (now state of the art!), ...
• Computational neurosciences: workings of most subsystems of the brain are understood at some level; research ranges from low-level compartmental models of individual neurons to large-scale brain models


Course Topics: Learning Tasks

Supervised
• Data: labeled examples (input, desired output)
• Tasks: classification, pattern recognition, regression
• NN models: perceptron, adaline, feed-forward NN, radial basis function, support vector machines

Unsupervised
• Data: unlabeled examples (different realizations of the input)
• Tasks: clustering, content addressable memory
• NN models: self-organizing maps (SOM), Hopfield networks


NNs: goal and design

– Knowledge about the learning task is given in the form of a set of examples (dataset) called training examples.

– A NN is specified by:
• an architecture: a set of neurons and links connecting neurons. Each link has a weight,
• a neuron model: the information processing unit of the NN,
• a learning algorithm: used for training the NN by modifying the weights in order to solve the particular learning task correctly on the training examples.

The aim is to obtain a NN that generalizes well, that is, that behaves correctly on new examples of the learning task.


Example: Alvinn

Autonomous driving at 70 mph on a public highway

[Figure: camera image of 30x32 pixels as inputs; 30x32 weights into each of the four hidden units; 4 hidden units; 30 outputs for steering]


Dimensions of a Neural Network

• network architectures
• types of neurons
• learning algorithms
• applications


Network architectures

• Three different classes of network architectures
  – single-layer feed-forward (neurons are organized in acyclic layers)
  – multi-layer feed-forward (neurons are organized in acyclic layers)
  – recurrent

• The architecture of a neural network is linked with the learning algorithm used to train it


Single Layer Feed-forward

[Figure: an input layer of source nodes connected to an output layer of neurons]


Multi layer feed-forward

[Figure: a 3-4-2 network with an input layer, a hidden layer and an output layer]


Recurrent network

Recurrent network with hidden neurons: the unit delay operator $z^{-1}$ is used to model a dynamic system.

[Figure: recurrent network with input, hidden and output nodes and three unit-delay ($z^{-1}$) elements]


The Neuron

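[Figure: model of a neuron. The input values $x_1, x_2, \dots, x_m$ are multiplied by the weights $w_1, w_2, \dots, w_m$ and combined with the bias $b$ by the summing function, giving the local field $v$; the activation function $\varphi(\cdot)$ maps $v$ to the output $y$.]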


Input Signal and Weights

Input signals

An input may be either a raw or preprocessed signal or image. Alternatively, some specific features can also be used. If specific features are used as input, their number and selection are crucial and application dependent.

Weights

Weights connect an input to a summing node and so affect the summing operation. The quality of the network can be seen from the weights. The bias is a constant input with a certain weight. Usually the weights are randomized in the beginning.


The Neuron

• The neuron is the basic information processing unit of a NN. It consists of:

1 A set of links, describing the neuron inputs, with weights $w_1, w_2, \dots, w_m$

2 An adder function (linear combiner) for computing the weighted sum of the inputs (real numbers):

$$u = \sum_{j=1}^{m} w_j x_j$$

3 Activation function (squashing function) $\varphi$ for limiting the amplitude of the neuron output:

$$y = \varphi(u + b)$$
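The three components can be put together in a few lines. The following is a minimal Python sketch rather than course material (the practicum uses Matlab): it assumes the weighted sum u = Σ w_j x_j and the output y = φ(u + b), with a logistic sigmoid standing in for φ. The names `sigmoid` and `neuron_output` and the example numbers are illustrative assumptions.

```python
import math

def sigmoid(v):
    """Logistic squashing function, used here as an example activation."""
    return 1.0 / (1.0 + math.exp(-v))

def neuron_output(weights, inputs, bias, activation=sigmoid):
    """Return y = activation(u + b), where u is the weighted sum of the inputs."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    u = sum(w * x for w, x in zip(weights, inputs))  # adder (linear combiner)
    return activation(u + bias)

# Hypothetical values: u = 0.5*1.0 + (-0.25)*2.0 = 0.0, so y = sigmoid(0.0)
y = neuron_output([0.5, -0.25], [1.0, 2.0], bias=0.0)
print(y)  # 0.5
```

Passing the activation as a parameter keeps the adder and the squashing function separate, which mirrors the three-part description of the neuron.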


Bias of a Neuron

• The bias $b$ has the effect of applying an affine transformation to the weighted sum $u$:

$$v = u + b$$

• $v$ is called the induced field of the neuron

[Figure: the parallel lines $x_1 - x_2 = -1$, $x_1 - x_2 = 0$ and $x_1 - x_2 = 1$ in the $(x_1, x_2)$ plane]


Bias as extra input

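[Figure: model of a neuron with the bias as an extra input. The input signals $x_0 = +1, x_1, x_2, \dots, x_m$ are multiplied by the synaptic weights $w_0, w_1, w_2, \dots, w_m$ and added by the summing function, giving the local field $v$; the activation function $\varphi(\cdot)$ maps $v$ to the output $y$.]

• The bias is an external parameter of the neuron. It can be modeled by adding an extra input:

$$v = \sum_{j=0}^{m} w_j x_j, \qquad w_0 = b$$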
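A quick numerical check shows that the two formulations agree. This is a minimal Python sketch under the assumption that the extra input is x_0 = +1 with weight w_0 = b; the function names and the example values are illustrative, not taken from the course.

```python
def local_field(weights, inputs, bias):
    """v = sum_j w_j * x_j + b, with the bias kept as a separate parameter."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def local_field_augmented(weights, inputs, bias):
    """Same v, computed with the extra input x_0 = +1 and the weight w_0 = b."""
    aug_weights = [bias] + list(weights)  # w_0 = b
    aug_inputs = [1.0] + list(inputs)     # x_0 = +1
    return sum(w * x for w, x in zip(aug_weights, aug_inputs))

w, x, b = [2.0, -1.0], [3.0, 4.0], 0.5
print(local_field(w, x, b), local_field_augmented(w, x, b))  # 2.5 2.5
```

Treating the bias as one more weight means a learning algorithm can update it with the same rule it uses for all other weights.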


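Activation Function

Different activation functions are used in different applications. The most common ones are:

Hard-limiter:

$$\varphi(v) = \begin{cases} 1 & \text{if } v \ge 0 \\ 0 & \text{if } v < 0 \end{cases}$$

Piecewise linear:

$$\varphi(v) = \begin{cases} 1 & \text{if } v \ge \tfrac{1}{2} \\ v & \text{if } \tfrac{1}{2} > v > -\tfrac{1}{2} \\ 0 & \text{if } v \le -\tfrac{1}{2} \end{cases}$$

Sigmoid:

$$\varphi(v) = \frac{1}{1 + \exp(-av)}$$

Hyperbolic tangent:

$$\varphi(v) = \tanh(v)$$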
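The four functions are easy to compare side by side in code. The sketch below is a Python illustration, not course material, and assumes these definitions: the hard-limiter returns 1 for v ≥ 0 and 0 otherwise; the piecewise-linear function returns 1 for v ≥ 1/2, v itself for -1/2 < v < 1/2, and 0 for v ≤ -1/2; the sigmoid is 1/(1 + exp(-a v)) with slope parameter a. The function names are illustrative.

```python
import math

def hard_limiter(v):
    """1 for v >= 0, 0 otherwise."""
    return 1.0 if v >= 0 else 0.0

def piecewise_linear(v):
    """1 above 1/2, 0 below -1/2; the middle branch returns v itself (assumed definition)."""
    if v >= 0.5:
        return 1.0
    if v > -0.5:
        return v
    return 0.0

def sigmoid(v, a=1.0):
    """Logistic function with slope parameter a."""
    return 1.0 / (1.0 + math.exp(-a * v))

def hyperbolic_tangent(v):
    return math.tanh(v)

for v in (-1.0, 0.0, 0.25, 1.0):
    print(v, hard_limiter(v), piecewise_linear(v), sigmoid(v), hyperbolic_tangent(v))
```

The hard-limiter and the sigmoid take values in [0, 1], while the hyperbolic tangent takes values in (-1, 1).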


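Neuron Models

• The choice of $\varphi$ determines the neuron model. Examples:

• step function:

$$\varphi(v) = \begin{cases} a & \text{if } v < c \\ b & \text{if } v > c \end{cases}$$

• ramp function:

$$\varphi(v) = \begin{cases} a & \text{if } v < c \\ b & \text{if } v > d \\ a + \dfrac{(v - c)(b - a)}{d - c} & \text{otherwise} \end{cases}$$

• sigmoid function, with $z, x, y$ parameters:

$$\varphi(v) = z + \frac{1}{1 + \exp(-xv + y)}$$

• Gaussian function:

$$\varphi(v) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2}\left(\frac{v - \mu}{\sigma}\right)^{2}\right)$$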
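To make the parameterized models concrete, here is a minimal Python sketch of the step, ramp and Gaussian functions. It is an illustration under stated assumptions, not course code: the step returns a below the threshold c and b above it, the ramp interpolates linearly from a to b between c and d (with c < d), and the Gaussian has mean `mu` and standard deviation `sigma`. The value returned by the step at exactly v = c is a choice made here.

```python
import math

def step(v, a, b, c):
    """a for v < c, b for v > c; at v == c this sketch returns b (an assumption)."""
    return a if v < c else b

def ramp(v, a, b, c, d):
    """a below c, b above d, linear interpolation from a to b in between (c < d)."""
    if v < c:
        return a
    if v > d:
        return b
    return a + (v - c) * (b - a) / (d - c)

def gaussian(v, mu=0.0, sigma=1.0):
    """Gaussian bell curve with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

print(step(-1.0, 0.0, 1.0, 0.0))       # 0.0
print(ramp(0.5, 0.0, 1.0, 0.0, 1.0))   # 0.5
print(gaussian(0.0))                   # peak value, about 0.3989
```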


Learning Algorithms

Depend on the network architecture:

• Error correcting learning (perceptron)
• Delta rule (AdaLine, Backprop)
• Competitive Learning (Self Organizing Maps)
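As an illustration of the first item, the sketch below trains a single hard-limiter neuron with an error-correction update, w ← w + η (d - y) x. It is a minimal Python example under stated assumptions, not the course's implementation: the learning rate `eta`, the zero initial weights, the epoch limit and the logical-AND toy dataset are all illustrative choices.

```python
def predict(weights, bias, x):
    """Hard-limiter neuron: 1 if the local field is non-negative, else 0."""
    v = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if v >= 0 else 0

def train_perceptron(samples, eta=1.0, max_epochs=100):
    """Error-correction rule: w <- w + eta*(d - y)*x and b <- b + eta*(d - y)."""
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, d in samples:
            e = d - predict(weights, bias, x)  # error: desired minus actual output
            if e != 0:
                weights = [w + eta * e * xi for w, xi in zip(weights, x)]
                bias += eta * e
                errors += 1
        if errors == 0:  # every training example is classified correctly
            break
    return weights, bias

# Logical AND as a toy, linearly separable training set
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_data)
print([predict(w, b, x) for x, _ in and_data])  # [0, 0, 0, 1]
```

The weights change only when an example is misclassified, so training stops as soon as a full pass over the data produces no errors.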


Applications

• Classification:
  – Image recognition
  – Speech recognition
  – Diagnosis
  – Fraud detection
  – …

• Regression:
  – Forecasting (prediction based on past history)
  – …

• Pattern association:
  – Retrieve an image from a corrupted one
  – …

• Clustering:
  – client profiles
  – disease subtypes
  – …


Supervised learning

• Linear classifiers: Perceptron, Adaline
• Non-linear classifiers: Feed-forward networks, Radial basis function networks, Support vector machines

Unsupervised learning

• Clustering: Self-organizing maps, K-means
• Content addressable memories, Optimization: Hopfield networks


Vectors: basics (slides by Hyoungjune Yi: [email protected])

• Ordered set of numbers: (1,2,3,4)

• Example: (x,y,z) coordinates of a point in space.

$$v = (x_1, x_2, \dots, x_n)$$

$$\|v\| = \sqrt{\sum_{i=1}^{n} x_i^{2}}$$

If $\|v\| = 1$, $v$ is a unit vector.
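The length of a vector and the unit-vector condition can be checked numerically. This is a minimal Python sketch, assuming the Euclidean norm (square root of the sum of squared components); the helper names `norm` and `normalize` are illustrative.

```python
import math

def norm(v):
    """Euclidean length: square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    """Scale v to unit length (undefined for the zero vector)."""
    n = norm(v)
    if n == 0:
        raise ValueError("the zero vector cannot be normalized")
    return [x / n for x in v]

v = [3.0, 4.0]
print(norm(v))       # 5.0
print(normalize(v))  # [0.6, 0.8]
```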


Vector Addition

$$v + w = (x_1, x_2) + (y_1, y_2) = (x_1 + y_1,\; x_2 + y_2)$$

[Figure: vectors $v$ and $w$ and their sum $v + w$]


Scalar Product

$$a\,v = a\,(x_1, x_2) = (a x_1,\; a x_2)$$

[Figure: vector $v$ and its scalar multiple $a v$]


Operations on vectors

• sum
• max, min, mean, sort, …
• Pointwise: .^
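For readers without Matlab at hand, the same operations can be sketched in plain Python. The helper names below are illustrative assumptions; `pointwise_power` plays the role of Matlab's `.^` operator.

```python
def add(v, w):
    """Componentwise sum of two vectors of equal length."""
    return [x + y for x, y in zip(v, w)]

def scale(a, v):
    """Multiply every component of v by the scalar a."""
    return [a * x for x in v]

def pointwise_power(v, p):
    """Elementwise power, the counterpart of Matlab's v .^ p."""
    return [x ** p for x in v]

v, w = [1, 2], [3, 5]
print(add(v, w))              # [4, 7]
print(scale(2, v))            # [2, 4]
print(pointwise_power(w, 2))  # [9, 25]
print(sum(w), max(w), min(w), sum(w) / len(w), sorted(w))  # 8 5 3 4.0 [3, 5]
```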


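Inner (dot) Product

[Figure: vectors $v$ and $w$ with the angle $\alpha$ between them]

$$v \cdot w = (x_1, x_2) \cdot (y_1, y_2) = x_1 y_1 + x_2 y_2$$

The inner product is a SCALAR!

$$v \cdot w = (x_1, x_2) \cdot (y_1, y_2) = \|v\|\,\|w\| \cos\alpha$$

$$v \cdot w = 0 \iff v \perp w$$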
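Both forms of the inner product, the componentwise sum x1·y1 + x2·y2 and the geometric form ||v|| ||w|| cos α, can be tied together in a short sketch. This is a Python illustration with hypothetical helper names; `angle` recovers α from the two forms and assumes both vectors are non-zero.

```python
import math

def dot(v, w):
    """Inner product: sum of the componentwise products (a scalar)."""
    return sum(x * y for x, y in zip(v, w))

def angle(v, w):
    """Angle in radians from v.w = ||v|| ||w|| cos(alpha); v and w must be non-zero."""
    cos_alpha = dot(v, w) / (math.sqrt(dot(v, v)) * math.sqrt(dot(w, w)))
    return math.acos(max(-1.0, min(1.0, cos_alpha)))  # clamp against rounding error

print(dot([1, 2], [3, 4]))    # 11
print(dot([1, 0], [0, 1]))    # 0, so the two vectors are orthogonal
print(angle([1, 0], [0, 1]))  # a right angle, i.e. pi/2 radians
```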


Matrices

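$$A_{n \times m} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ a_{31} & a_{32} & \cdots & a_{3m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{pmatrix}$$

Sum:

$$C_{n \times m} = A_{n \times m} + B_{n \times m}, \qquad c_{ij} = a_{ij} + b_{ij}$$

A and B must have the same dimensions.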


Matrices

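Product:

$$C_{n \times p} = A_{n \times m}\, B_{m \times p}, \qquad c_{ij} = \sum_{k=1}^{m} a_{ik}\, b_{kj}$$

A and B must have compatible dimensions.

$$A_{n \times n}\, B_{n \times n} \neq B_{n \times n}\, A_{n \times n} \quad \text{(in general)}$$

Identity Matrix:

$$I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad IA = AI = A$$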
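The product rule c_ij = Σ_k a_ik b_kj, the fact that AB and BA generally differ, and the role of the identity matrix can all be checked on a small example. The following Python sketch uses nested lists as matrices; the helper names and the example matrices are illustrative assumptions.

```python
def matmul(A, B):
    """C = A B with c_ij = sum_k a_ik * b_kj; A is n x m and B is m x p."""
    n, m, p = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("incompatible dimensions")
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def identity(n):
    """n x n matrix with ones on the diagonal and zeros elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(matmul(A, B))  # [[2, 1], [4, 3]]
print(matmul(B, A))  # [[3, 4], [1, 2]]
print(matmul(identity(2), A) == A == matmul(A, identity(2)))  # True
```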


Matrices

Transpose:

$$C_{m \times n} = A^{T}_{n \times m}, \qquad c_{ij} = a_{ji}$$

$$(A + B)^{T} = A^{T} + B^{T}$$

$$(AB)^{T} = B^{T} A^{T}$$

If $A^{T} = A$, A is symmetric.
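The identity (AB)^T = B^T A^T, with the order of the factors reversed, is easy to verify on a small non-square example. This Python sketch uses nested lists and illustrative helper names; the matrices are arbitrary example values.

```python
def transpose(A):
    """Swap rows and columns: c_ij = a_ji."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Plain matrix product of nested lists (A is n x m, B is m x p)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def is_symmetric(A):
    return A == transpose(A)

A = [[1, 2, 3], [4, 5, 6]]    # 2 x 3
B = [[1, 0], [0, 1], [1, 1]]  # 3 x 2
print(transpose(matmul(A, B)))                                     # [[4, 10], [5, 11]]
print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))  # True
print(is_symmetric([[2, 7], [7, 3]]))                              # True
```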


Matrices

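Inverse:

$$A_{n \times n}\, A^{-1}_{n \times n} = A^{-1}_{n \times n}\, A_{n \times n} = I$$

A must be square.

$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}^{-1} = \frac{1}{a_{11} a_{22} - a_{12} a_{21}} \begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}$$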
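The closed-form 2×2 inverse (swap the diagonal entries, negate the off-diagonal ones, divide by the determinant a11·a22 - a12·a21) can be checked by multiplying back. The sketch below is a Python illustration that assumes integer entries so that `fractions.Fraction` keeps the arithmetic exact; the function names and the example matrix are hypothetical.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of a 2 x 2 integer matrix via the determinant formula."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("matrix is singular, no inverse exists")
    return [[Fraction(a22, det), Fraction(-a12, det)],
            [Fraction(-a21, det), Fraction(a11, det)]]

def matmul(A, B):
    """Plain matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[4, 7], [2, 6]]  # determinant 4*6 - 7*2 = 10
A_inv = inverse_2x2(A)
print(A_inv[0][0], A_inv[0][1], A_inv[1][0], A_inv[1][1])  # 3/5 -7/10 -1/5 2/5
print(matmul(A, A_inv) == [[1, 0], [0, 1]])                # True
```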


2D Translation

[Figure: point $P$ moved by the translation vector $t$ to $P'$]


2D Translation Equation

[Figure: point $P = (x, y)$ translated by $t = (t_x, t_y)$ to $P'$]

$$P = (x, y), \qquad t = (t_x, t_y)$$

$$P' = (x + t_x,\; y + t_y) = P + t$$


2D Translation using Matrices

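[Figure: point $P = (x, y)$ translated by $t = (t_x, t_y)$ to $P'$]

$$P = (x, y), \qquad t = (t_x, t_y)$$

$$P' \rightarrow \begin{pmatrix} x + t_x \\ y + t_y \end{pmatrix} = \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$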
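Writing the point with an extra coordinate 1, as (x, y, 1), lets a translation be expressed as a matrix product. The Python sketch below assumes the 2×3 matrix with rows (1, 0, t_x) and (0, 1, t_y); the function name `translate` and the example numbers are illustrative.

```python
def translate(point, t):
    """Translate (x, y) by t = (tx, ty) using a 2 x 3 matrix and the vector (x, y, 1)."""
    x, y = point
    tx, ty = t
    T = [[1, 0, tx],
         [0, 1, ty]]
    p_h = [x, y, 1]  # the point with an extra coordinate equal to 1
    return tuple(sum(T[i][k] * p_h[k] for k in range(3)) for i in range(2))

print(translate((2, 3), (5, -1)))  # (7, 2)
```

The result matches the vector form P + t, which is what the extra column of the matrix encodes.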


Scaling

[Figure: point $P$ scaled to $P'$]


Scaling Equation

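[Figure: point $P = (x, y)$ scaled by the factor $s$ to $P' = (s \cdot x,\; s \cdot y)$]

$$P = (x, y), \qquad P' = (sx, sy), \qquad P' = s \cdot P$$

$$P' \rightarrow \begin{pmatrix} sx \\ sy \end{pmatrix} = \begin{pmatrix} s & 0 \\ 0 & s \end{pmatrix} \cdot \begin{pmatrix} x \\ y \end{pmatrix}$$

$$P' = S \cdot P$$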
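Uniform scaling can be written the same way, as multiplication by a diagonal matrix S with the factor s on the diagonal. This is a minimal Python sketch under that assumption; the function name `scale_point` and the example values are illustrative.

```python
def scale_point(point, s):
    """Scale (x, y) by the factor s using the diagonal matrix S = [[s, 0], [0, s]]."""
    x, y = point
    S = [[s, 0],
         [0, s]]
    return (S[0][0] * x + S[0][1] * y,
            S[1][0] * x + S[1][1] * y)

print(scale_point((2, 3), 2))  # (4, 6)
```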