Who wrote the song “Kiss from a Rose”?
Question Analysis: POS/Parsing/NER
Query Formulation/ Template Extraction
Knowledge Base Search/ Candidate Answer Generation
Answer Type Selection
Evidence Retrieval/ Candidate Scoring
Final Ranking
Seal
Neural Network
External Knowledge
Classifier
Who wrote the song “Kiss from a Rose”?
Seal
Can we replace all of these modules with a single neural network?
Outline
• Briefly: deep learning + NLP basics
• Factoid QA
• Reasoning-based QA
• Visual QA
• Future directions!
Let’s start with words
• Represent words by low-dimensional vectors called embeddings
• e.g., president → [0.23, 1.3, -0.3, 0.43]
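A minimal sketch of an embedding lookup, assuming a plain dictionary as the embedding table. Only the vector for "president" comes from the slide; the other vectors and the `embed` helper are made up for illustration.

```python
# Toy embedding table. The "president" vector is the one on the slide;
# the other entries are hypothetical values used only for illustration.
embeddings = {
    "president": [0.23, 1.3, -0.3, 0.43],
    "who": [0.10, -0.20, 0.05, 0.70],
    "wrote": [-0.40, 0.90, 0.30, -0.10],
}


def embed(tokens, table):
    """Map each token to its embedding vector."""
    return [table[token] for token in tokens]


vectors = embed(["who", "wrote"], embeddings)
print(len(vectors), len(vectors[0]))
```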
Computing Vectors for Questions
• How do we compose word embeddings into vectors that capture the meanings of questions?
g( Who wrote Macbeth ? ) = [question vector]
g: Neural Net!
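Recurrent Neural Networks
Who   directed   Predator   ?
c1    c2         c3         c4
Hidden layer:
$h_1 = f\left(W \begin{bmatrix} \cdots \\ c_1 \end{bmatrix}\right)$
$h_2 = f\left(W \begin{bmatrix} h_1 \\ c_2 \end{bmatrix}\right)$
$h_3 = f\left(W \begin{bmatrix} h_2 \\ c_3 \end{bmatrix}\right)$
$h_4 = f\left(W \begin{bmatrix} h_3 \\ c_4 \end{bmatrix}\right)$
softmax: predict answer
More complex variants: LSTMs, GRUs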
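A rough sketch of the recurrence on this slide, in plain Python. It assumes f = tanh, an all-zero initial hidden state, and random weights; none of these choices are stated on the slide, and `rnn_encode` is a hypothetical helper name.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rnn_encode(word_vectors, W, hidden_size):
    """Apply h_t = tanh(W [h_{t-1}; c_t]) over the sequence; return the last h."""
    h = [0.0] * hidden_size  # assumed all-zero initial state
    for c in word_vectors:
        # h + c is list concatenation, i.e. the stacked vector [h_{t-1}; c_t]
        h = [math.tanh(z) for z in matvec(W, h + c)]
    return h


random.seed(0)
d, n = 4, 3  # embedding size and hidden size (arbitrary)
W = [[random.uniform(-0.5, 0.5) for _ in range(n + d)] for _ in range(n)]
question = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(4)]  # c1..c4
h4 = rnn_encode(question, W, n)
print(h4)
```

The final state h4 is what would be fed to the softmax layer to predict an answer.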
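Recursive Neural Networks
• g can also depend on a parse tree of the input text sequence
Who   directed   Predator   ?
c1    c2         c3         c4
$h_1 = f\left(W \begin{bmatrix} c_2 \\ c_3 \end{bmatrix}\right)$
$h_2 = f\left(W \begin{bmatrix} h_1 \\ c_4 \end{bmatrix}\right)$
$h_3 = f\left(W \begin{bmatrix} c_1 \\ h_2 \end{bmatrix}\right)$
softmax: predict answer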
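A small sketch of the tree-structured composition, assuming f = tanh, a single shared weight matrix W, and a binary tree written as nested tuples. The tree shape follows the three compositions on the slide; the encoding as tuples and the helper names are assumptions.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def compose(left, right, W):
    """Combine two child vectors into a parent vector: tanh(W [left; right])."""
    return [math.tanh(z) for z in matvec(W, left + right)]


def encode_tree(tree, W):
    """A tree is either a leaf vector (list) or a (left, right) tuple."""
    if isinstance(tree, tuple):
        left, right = tree
        return compose(encode_tree(left, W), encode_tree(right, W), W)
    return tree


random.seed(0)
d = 3  # embedding size (arbitrary)
W = [[random.uniform(-0.5, 0.5) for _ in range(2 * d)] for _ in range(d)]
c1, c2, c3, c4 = ([random.uniform(-1, 1) for _ in range(d)] for _ in range(4))

# (c2, c3) -> h1, (h1, c4) -> h2, (c1, h2) -> h3
tree = (c1, ((c2, c3), c4))
h3 = encode_tree(tree, W)
print(h3)
```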
Deep Averaging Networks
Who   directed   Predator   ?
c1    c2         c3         c4
$av = \sum_{i=1}^{4} \frac{c_i}{4}$
$z_1 = f(W_1 \cdot av)$
$z_2 = f(W_2 \cdot z_1)$
softmax: predict answer
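A sketch of the deep averaging computation, assuming f = tanh and randomly initialized weights (both assumptions). It also shows that the result does not depend on word order, since the only thing the layers see is the average.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dan_encode(word_vectors, layers):
    """av = mean(c_i); then z_k = tanh(W_k z_{k-1}) for each layer."""
    z = average(word_vectors)
    for W in layers:
        z = [math.tanh(v) for v in matvec(W, z)]
    return z


random.seed(0)
d, hidden = 4, 3  # arbitrary sizes
W1 = [[random.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
W2 = [[random.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)]
layers = [W1, W2]
question = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(4)]  # c1..c4

z_fwd = dan_encode(question, layers)
z_rev = dan_encode(question[::-1], layers)  # same words, reversed order
print(z_fwd)
```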
Softmax Answer Classification
• Multinomial logistic regression
• Output is a distribution over a finite set of answers
• Later on: a max-margin answer ranking approach can yield better results
$y_p = \mathrm{softmax}(W_{ans} \cdot h_q)$
$\mathrm{softmax}(q) = \frac{\exp q}{\sum_{j=1}^{k} \exp q_j}$
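The softmax formula can be sketched in a few lines. Subtracting the maximum score before exponentiating is a standard numerical-stability trick that leaves the result unchanged; the weight matrix and question vector below are made-up values.

```python
import math


def softmax(q):
    """Turn a list of scores into a probability distribution."""
    m = max(q)  # shift for numerical stability; the ratios are unchanged
    exps = [math.exp(v - m) for v in q]
    total = sum(exps)
    return [e / total for e in exps]


def predict(W_ans, h_q):
    """y_p = softmax(W_ans . h_q): one probability per candidate answer."""
    scores = [sum(w * h for w, h in zip(row, h_q)) for row in W_ans]
    return softmax(scores)


# Hypothetical example: 3 candidate answers, 2-dimensional question vector.
W_ans = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_q = [0.5, 2.0]
y_p = predict(W_ans, h_q)
print(y_p)
```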
How do we train these models?
• Model parameters learned through variants of backpropagation (Rumelhart et al., 1986; Goller and Kuchler, 1996) given QA pairs as input
• In theory, use the chain rule to compute partial derivatives of the error function with respect to every parameter
• In practice, use Theano (or Torch) and never have to compute any derivatives by hand!
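To make the chain-rule point concrete, here is a small sketch for one piece of the model only: the cross-entropy loss on top of a softmax layer. The hand-derived gradient with respect to the scores (predicted probability minus the one-hot target) is checked against a finite-difference estimate. The scores and target index are made up; frameworks such as Theano or Torch automate this derivation.

```python
import math


def softmax(q):
    m = max(q)
    exps = [math.exp(v - m) for v in q]
    total = sum(exps)
    return [e / total for e in exps]


def loss(scores, target):
    """Cross-entropy: negative log-probability of the correct answer."""
    return -math.log(softmax(scores)[target])


def analytic_grad(scores, target):
    """Chain-rule result: d loss / d score_j = p_j - 1[j == target]."""
    p = softmax(scores)
    return [p_j - (1.0 if j == target else 0.0) for j, p_j in enumerate(p)]


def numeric_grad(scores, target, eps=1e-6):
    """Central finite differences, used only to check the formula above."""
    grad = []
    for j in range(len(scores)):
        up = scores[:j] + [scores[j] + eps] + scores[j + 1:]
        down = scores[:j] + [scores[j] - eps] + scores[j + 1:]
        grad.append((loss(up, target) - loss(down, target)) / (2 * eps))
    return grad


scores, target = [0.2, -1.0, 1.5], 2  # hypothetical values
by_hand = analytic_grad(scores, target)
by_diff = numeric_grad(scores, target)
print(by_hand)
print(by_diff)
```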
Factoid QA
• Given a description of an entity, identify the person, place, or thing discussed.
• Neural nets had not previously been applied to this task
• Traditionally approached using information retrieval, querying huge knowledge bases for the answer
Quiz Bowl
This creature has female counterparts named Penny and Gown. This creature appears dressed in Viking armor and carrying an ax when he is used as the mascot of PaX, a least privilege protection patch. This creature’s counterparts include Daemon on the Berkeley Software Distribution, or BSD. For ten points, name this mascot of the Linux operating system, a penguin whose name refers to formal male attire.
Answer: Tux
Two Neural Models
• Dependency-tree recursive neural network (DT-RNN)
• Deep averaging network (DAN)
• Both models are initialized with pretrained word2vec embeddings and have the same hidden layer dimensionality for fair comparison
Experimental Datasets
• History: 4,415 QA pairs with 16,895 sentences and 451 unique answers
• Literature: 5,685 QA pairs with 21,549 sentences and 595 unique answers
Choosing an Error Function
• Answers can appear as part of question text (e.g., a question on World War II might mention the Battle of the Bulge and vice versa)
• Instead of using a softmax output layer, can we take advantage of these co-occurrences by modeling answers and questions in the same vector space?
Max-Margin Objective
• Replace softmax output layer with a contrastive max-margin function
• Given a question q with correct answer a and an incorrect answer b, the loss is
$\max(0,\ 1 - x_a \cdot h_q + x_b \cdot h_q)$
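A direct sketch of this loss for a single (question, correct answer, incorrect answer) triple. The vectors are made-up values; in the model, x_a and x_b would be learned answer vectors and h_q the question representation.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def max_margin_loss(h_q, x_a, x_b, margin=1.0):
    """max(0, margin - x_a . h_q + x_b . h_q) for one incorrect answer b."""
    return max(0.0, margin - dot(x_a, h_q) + dot(x_b, h_q))


h_q = [1.0, 0.0]       # hypothetical question vector
x_good = [2.0, 0.0]    # hypothetical vector for the correct answer
x_bad = [0.5, 0.0]     # hypothetical vector for an incorrect answer

# Correct answer already beats the incorrect one by more than the margin.
print(max_margin_loss(h_q, x_good, x_bad))  # 0.0
# Roles swapped: the "correct" answer scores lower, so the loss is positive.
print(max_margin_loss(h_q, x_bad, x_good))  # 2.5
```

The loss is zero only when the correct answer's inner product with the question exceeds the incorrect answer's by at least the margin.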
Geometric Intuition
Q: name this fascist leader of Italy
Maximize inner product: Benito Mussolini
Minimize inner product: Rihanna, Walt Disney
Experiments
              History                  Literature
Model     Pos 1  Pos 2  Full      Pos 1  Pos 2  Full
BOW-DT    35.4   57.7   60.2      24.4   51.8   55.7
IR        37.5   65.9   71.4      27.4   54.0   61.9
DAN       46.4   70.8   71.8      35.3   67.9   69.0
DT-RNN    47.1   72.1   73.7      36.4   68.2   69.1
DAN is 20 times faster to train than DT-RNN
Learning a Vector Space
[Figure: t-SNE plot (axes TSNE-1, TSNE-2) of history answers such as battle_of_gettysburg, peace_of_westphalia, and genghis_khan, colored by category: wars, rebellions, and battles; U.S. presidents; prime ministers; explorers & emperors; policies; other. A second panel shows only U.S. presidents, e.g., calvin_coolidge, william_mckinley, george_washington, thomas_jefferson.]
Exact String Matches
• One current weakness of neural models
• Practical solution: train a language model on the original source material / Wikipedia and combine it with the output of the neural network
In this poem, the narrator meets a “traveller from an antique land” who tells of a statue with a “wrinkled lip, and sneer of cold command”.
QA: Man vs. Machine
• Scaled up a DAN to handle ~100k Q/A pairs with ~5k unique answers! Also added thousands of Wikipedia sentence/page-title pairs
• To play against humans, we need to decide not only what answer to give but also when we are confident enough to buzz in.
• Another classifier re-ranks the top 200 guesses of the DAN using language model features to decide whether to buzz on any of them or wait for more clues
Code available!
DT-RNN code: cs.umd.edu/~miyyer/qblearn
DAN code: github.com/miyyer/dan
Full quiz bowl system code: github.com/miyyer/qb
Video of Ken Jennings match: youtu.be/kTXJCEvCDYk
1. Mohit Iyyer, Jordan Boyd-Graber, Leonardo Claudino, Richard Socher, and Hal Daumé III. A Neural Network for Factoid Question Answering over Paragraphs. EMNLP 2014.
2. Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Hal Daumé III. Deep Unordered Composition Rivals Syntactic Methods for Text Classification. ACL 2015.
John moved to the bedroom. Mary grabbed the football there. Sandra journeyed to the bedroom. Sandra went back to the hallway. Mary moved to the garden. Mary journeyed to the office.
Where is the football?
Naïve Neural Approach
softmax: predict answer
image: http://smerity.com/articles/2015/keras_qa.html
Problems
• Doesn’t scale to long / complex question types
• RNNs/LSTMs are very bad at remembering facts from the distant past!
• Solution: add an external memory component that learns to store important facts and reason about them
Dynamic Memory Networks
• Collaboration with Richard Socher and colleagues from MetaMind
• Extends simple RNNs with an iterative attention mechanism that focuses on one fact at a time and enables transitive reasoning
Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, and Richard Socher. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. NIPS Deep Learning Symposium, 2015.
1. Compute a vector $s_i$ for every sentence in the input and a vector $q$ for the question using recurrent network A
2. Compute an attention score $a_i$ for every sentence: $a_i = G(s_i, m_{t-1}, q)$
3. Compute an episodic memory $m_t$ by weighting each $s_i$ with its corresponding $a_i$ and passing them through another recurrent network B
4. Repeat until network B outputs a “finished reading” signal
5. Feed the final episodic memory $m$ to a softmax layer to predict the answer
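The loop in steps 2 to 4 can be sketched as follows. This is a heavily simplified stand-in and not the model from the paper: G is replaced by a dot-product score followed by a softmax, network B by a gated recurrence with no trained weights, the memory is initialized to q, and a fixed number of passes replaces the “finished reading” signal. All sentence and question vectors are made up.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(sentences, memory, question):
    """Hypothetical stand-in for a_i = G(s_i, m_{t-1}, q)."""
    return softmax([dot(s, question) + dot(s, memory) for s in sentences])


def read_episode(sentences, weights, memory):
    """Stand-in for network B: a gated recurrence without trained weights."""
    h = list(memory)
    for a_i, s in zip(weights, sentences):
        candidate = [math.tanh(s_k + h_k) for s_k, h_k in zip(s, h)]
        # A high attention score lets sentence i overwrite more of the state.
        h = [a_i * c_k + (1.0 - a_i) * h_k for c_k, h_k in zip(candidate, h)]
    return h


def episodic_memory(sentences, question, passes=3):
    memory = list(question)  # assumed initialization m_0 = q
    history = []
    for _ in range(passes):  # fixed count replaces the stop signal
        weights = attention(sentences, memory, question)
        memory = read_episode(sentences, weights, memory)
        history.append(weights)
    return memory, history


sentences = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # made-up s_i
question = [0.0, 1.0, 0.0]                                        # made-up q
memory, history = episodic_memory(sentences, question)
print(history[0])
```

Because the memory from one pass feeds the attention of the next, later passes can shift focus to sentences that only become relevant once an earlier fact has been read.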
John moved to the bedroom. Mary grabbed the football there.
Sandra journeyed to the bedroom. Sandra went back to the hallway.
Mary moved to the garden. Mary journeyed to the office.
Where is the football?
m1 (attention weights): 0.1 0.1 0.0 0.0 0.8 0.0
m2 (attention weights): 0.2 0.6 0.0 0.0 0.2 0.0
m3 (attention weights): 0.7 0.2 0.0 0.0 0.1 0.0
m3 → softmax: predict answer
Evaluation: FB bAbI
• 20 very simple tasks (e.g., counting, basic deduction, induction, coreference)
• DMNs solve 18 out of 20 tasks with over 95% accuracy, comparable to other baselines that use hand-engineered features (e.g., n-grams, positional features)
• Can also be applied to many other NLP tasks (what is the sentiment of this sentence? what is this sentence’s translation in French?)
• Is this truck considered “vintage”?
• Does the road look new?
• What kind of tree is behind the truck?
VisualQA Dataset
• Collaboration between Virginia Tech and Microsoft Research
• Questions were created and answered by Amazon Turkers (800k questions on 250k images)
“We have built a smart robot. It understands a lot about images. It can recognize and name all the objects, it knows where the objects are, it can recognize the scene (e.g., kitchen, beach), people’s expressions and poses, and properties of objects (e.g., color of objects, their texture). Your task is to stump this smart robot!”
Brief Aside: ConvNets
Convolutional Layers: slide a set of small filters over the image
Pooling Layers: reduce dimensionality of representation
image: https://cs231n.github.io/convolutional-networks/
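A toy sketch of the two layer types, using a single hand-written filter on a tiny single-channel image. Real ConvNets learn many filters over multi-channel inputs; the image, the filter, and the choice of 2x2 max pooling here are assumptions for illustration.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    Like most deep learning libraries, this computes cross-correlation,
    i.e. the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]


# 5x5 image containing a diagonal line, and a 2x2 diagonal-detecting filter.
image = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]
kernel = [[1, 0], [0, 1]]

feature_map = conv2d(image, kernel)  # 4x4 map, strongest along the diagonal
pooled = max_pool(feature_map)       # 2x2 summary of the 4x4 map
print(pooled)  # [[2, 0], [0, 2]]
```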
Naïve VisualQA
• i = ConvNet(image): use an existing network trained for image classification and freeze weights
• q = RNN(question): learn weights
• answer = softmax([i;q])
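A sketch of the last step only, assuming the image vector i and question vector q have already been computed. The concatenation [i; q] is passed through one linear layer (`W_ans`, an assumed name with made-up values) and a softmax over candidate answers.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def naive_vqa(image_features, question_vector, W_ans):
    """answer distribution = softmax(W_ans [i; q])."""
    joint = image_features + question_vector  # list concatenation = [i; q]
    scores = [sum(w * x for w, x in zip(row, joint)) for row in W_ans]
    return softmax(scores)


i = [1.0, 0.0]  # stand-in for frozen ConvNet features
q = [0.0, 1.0]  # stand-in for the RNN question encoding
W_ans = [       # hypothetical weights for two candidate answers
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
]
probs = naive_vqa(i, q, W_ans)
print(probs)
```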
Visual Attention
• Use the question representation q to determine where in the image to look
How many benches are shown?
[Figure: image overlaid with a grid of attention weights; the nonzero weights are 0.3, 0.2, 0.2, 0.1, and four boxes at 0.05, with all other boxes at 0.0]
softmax: predict answer
attention over final convolutional layer in network: 196 boxes, captures color and positional information
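A sketch of attention over the 196 boxes, assuming each box has a feature vector with the same dimensionality as the question vector q so that a plain dot product can score it. That shared dimensionality, the dot-product scoring, and the random features are all assumptions; the slides do not specify how the scores are computed.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(boxes, q):
    """Score every box against q, normalize, and return the weighted sum."""
    weights = softmax([dot(b, q) for b in boxes])
    dim = len(boxes[0])
    context = [sum(w * b[k] for w, b in zip(weights, boxes)) for k in range(dim)]
    return weights, context


random.seed(0)
dim = 8  # arbitrary feature size
boxes = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(196)]
q = [random.uniform(-1, 1) for _ in range(dim)]

weights, context = attend(boxes, q)
print(max(weights), len(context))
```

The attended vector `context` would then be combined with q and passed to the softmax answer layer.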
Issues
• Visual attention is more complicated than textual attention; requires many more QA pairs than are currently available
• Focusing on more than one “box” at a time is difficult for the current model; perhaps an iterative attention mechanism like the DMN’s can solve this problem
• Work in progress, full evaluation coming soon!
Future Trends
• Neural networks with attention mechanisms are cutting-edge models with broad applications!
• With more data and bigger networks, we can begin to answer more complex questions
• Multi-task learning, such as a single model that learns to reason over both text and images
A Major Limitation
• All of these networks generalize very poorly to new facts or information at test time. They would fail at:
xxwf moved to the rfecs. dawas grabbed the gndsa there. gfdg journeyed to the klnmkb. gfdg went back to the aqqs. dawas moved to the mnsh. dawas journeyed to the taaaed.
Where is the gndsa?
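One way this failure can show up, sketched under an assumption the slides do not state: the model uses a fixed training vocabulary with a single unknown-word token. Every unseen name, object, and place then maps to the same id, so the network cannot tell them apart. The `<unk>` token and both helper functions are hypothetical.

```python
UNK = "<unk>"


def tokenize(sentence):
    return sentence.lower().replace(".", "").replace("?", "").split()


def build_vocab(sentences):
    """Assign an integer id to every word seen in training."""
    vocab = {UNK: 0}
    for sentence in sentences:
        for token in tokenize(sentence):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(sentence, vocab):
    """Replace each word by its id; unseen words all collapse to UNK."""
    return [vocab.get(token, vocab[UNK]) for token in tokenize(sentence)]


train = ["John moved to the bedroom.", "Mary grabbed the football there."]
vocab = build_vocab(train)

seen = encode("Mary grabbed the football there.", vocab)
unseen = encode("dawas grabbed the gndsa there.", vocab)
print(seen)    # [6, 7, 4, 8, 9]
print(unseen)  # [0, 7, 4, 0, 9]
```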
Constants vs. Variables
• Currently, every word in a question is represented with an embedding.
• This doesn’t make much sense for numbers, proper names, or other entities
An amusement park sells 2 kinds of tickets. Tickets for children cost $1.50. Adult tickets cost $4. On a certain day, 278 people entered the park. On that same day the admission fees collected totaled $792.
How many children were admitted on that day?
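For reference, the word problem reduces to two linear equations in two unknowns, which is why treating the numbers as opaque word embeddings is a poor fit. A short sketch that solves it exactly by substitution; `solve_tickets` is a hypothetical helper, and `Fraction` is used to avoid floating-point rounding on the prices.

```python
from fractions import Fraction


def solve_tickets(total_people, total_fees, child_price, adult_price):
    """Solve: children + adults = total_people
              child_price * children + adult_price * adults = total_fees
    by substituting adults = total_people - children.
    """
    children = (adult_price * total_people - total_fees) / (adult_price - child_price)
    adults = total_people - children
    return children, adults


children, adults = solve_tickets(
    total_people=278,
    total_fees=Fraction(792),
    child_price=Fraction("1.50"),
    adult_price=Fraction(4),
)
print(children, adults)  # 128 150
```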