Post on 07-Jan-2023
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 1
Can Blog Communication Dynamics be
correlated with Stock Market Activity?
Munmun De Choudhury
Hari Sundaram
Ajita John
Dorée Duncan Seligmann
November 5, 2008
Arts, Media & Engineering
Arizona State University, Tempe
Collaborative Applications Research
Avaya Labs, New Jersey
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 2
Introduction
� Why is it important?
• Provides insights into understanding communication patterns of people.
• The communication dynamics yield correlations with certain external events, justifying their predictive power.
• Useful to corporations interested in identifying the ‘moods’ of people in response to company related events.
A model to: (a) study and analyze communication dynamics in the blogosphere, and (b) use
these dynamics to determine interesting correlations with stock market movement.
November 5, 2008
Figure: Normalized predicted stock movements of four
companies . Red represents negative movement, and blue
positive movement.
Graph for Blog communication Stock Movement data
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 3
Our Approach
� Characterize the communication dynamics in www.engadget.com through several contextual features for a particular company.
� Use these features and stock market movement in an SVM regressor to predict the movement.
� Our technique yields satisfactory results with errors of 22 % and 13 % in predicting the magnitude and direction of movement respectively.
B1 B2 SVR Actual B1 B2 SVR Actual B1 B2 SVR Actual
APPLE GOOGLE MICROSOFT NOKIA
B1 B2 SVR Actual
Positive stock movement
Negative stock movement
Time
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 4
Related Work
� Work on determining correlations between activity in Internet message boards (through frequency counts of relevant messages) and stock volatility and trading volume by Antweiler et al.
� Gruhl’s work on determining if blog data exhibit any recognizable pattern prior to spikes in the ranking of the sales of books on Amazon.com.
� Limitation:• Modeling information roles and communication dynamics in the prior work
have been done in a context-independent manner.
November 5, 2008
Antweiler et al.
Gruhl et al.
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 5
Introduction / Related work
Problem StatementModeling Information (Activity) Roles
Modeling Communication Dynamics
Determining Correlation
Experimental Results
Conclusions
Outline
November 5, 2008 5
• What is online communication
activity?
• Data Model
Posts
Comments
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 6
What is online communication activity?
� Blogs serve as rich sources of active discussion on a wide variety of events.
� In a technically blog community like Engadget, editors post news stories, and only authorized readers can comment, thus creating a high quality information source.
� Communication activity in a blog refers to the communication participation of agroup of users who perform specific actions with respect to certain events.
• In this paper, the events are denoted simply by the posts related to a company name.
� We are interested in observing how these activity dynamics could be predictors of the events they relate to.
Time
Event is supposed to occur Blog communication activity Event occurs
iPhone release
Posts
Comments
Stock movement for ‘Apple’
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 7
Data Model
� The data model can be described by a 3-tuple {V, Σ, Ψ} having the following entities with respect to a query:• V: A set of individuals – bloggers and commenters, V = {i | i = 1 … N},
• Σ : A set of posts Σ; each post p is associated with an author (blogger) i, a time stamp of posting ts and a tag cloud Φp, giving the following tuple, {(i , ts, Φp)} for each p,
• V: A set of comments Ψ on each post p in Σ; each comment has an author (commenter) j and time stamp tm tuple {(j,tm)}.
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 8
Introduction / Related work
Problem Statement
Modeling Information
(Activity) Roles
Modeling Communication Dynamics
Determining Correlation
Experimental Results
Conclusions
Outline
November 5, 2008 8
• Roles due to responsivity
• Roles due to measure of
participation
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 9
What are Information roles due to activity?
� People regularly involved in communication develop certain structures which characterizes their communication behavior. This we call an information role due to activity.
� The information roles of people characterize the nature of communication dynamics in the community.
� We define roles along two dimensions:• Roles due to response behavior
o Early Responders
o Late Trailers
• Roles due to measure of communication activity
o Loyals
o Outliers
People respond quickly A highly responsive
community; showed by short
distance between nodes
People write several
comments frequentlyA highly active community;
showed by thickness of edges
Significance of analysis of information roles of people
Response time
Number of comments
Nu
mb
er
of
pe
op
leN
um
be
r o
f p
eo
ple
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 10
Activity Role due to Responsivity
� Response time of a comment is the time elapsed between the publishing of the original blog and the publishing of the comment.
� The normalized response time of a comment depends on
• the time at which it was published and
• the rank of the comment
� The response time rc(x) of post px is defined as,
� We classify a person P as early responder Eor late trailer T as,
Figure : Importance of rank of a comment.
Response Distribution of a Blog Post
0
0.2
0.4
0.6
0.8
1
1.2
1 4 7 10 13 16 19 22 25 28 31 34 37 40
Comment IDs
Res
pon
se T
ime
( )
( )1 2( ) 1 1
( )
m sc
c
e s
t tr x
t t n x
κθ θ −
= − + − −
Rank of comment cκ
Number of comments on px
nc(x)
Normalizing weightsθ1, θ
2
Time of last comment on px
te
Time of comment ctm
Time of posting of px
ts
If , then
otherwise
r P E
P T
ρ≥ ∈
∈
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 11
Activity Role due to measure of participation
� An activity distribution is constructed per topic where the measure of participation of an individual is the number of comments she wrote over a period of time.
� People are classified into loyals L and outliers O based on a suitably chosen threshold θ over their measure of participation across several weeks in the topic activity distribution.
� Let a person P post C comments over time [0, t].
Activity Distribution
0
20
40
60
80
100
120
osh
ean
Ch
arl
es
2ti
mer
AD
MIN
V
Cla
rio
n
CW
Bri
an
Bh
aal
Bo
zo
Bain
s
An
toin
e
Bre
nd
an
# C
om
men
ts
Figure : Activity distribution.
If , then
otherwise
C P L
P O
θ≥ ∈
∈
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 12
Introduction / Related work
Problem Statement
Modeling Information (Activity) Roles
Modeling Communication
Dynamics
Determining Correlation
Experimental Results
Conclusions
Outline
November 5, 2008 12
• Contextual features of
communication activity
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 13
Modeling Communication Dynamics
� For the set of all features, we assume that that the stock movement yt on a certain weekday depends on the communication activity over a period of time (we consider past week).
• Number of posts
• Number of comments
• Length, response time of comments
o Mean and standard deviations of the length and response times of comments:
ki: total number of posts on day (t – i)
lt-ic(x): mean length of comments on post px at day (t – i)
• Strength of comments
o In Engadget, every comment is marked for its significance by other users: highest ranked,
highly ranked, neutral, low ranked and lowest ranked.
• Size of early responders / late trailers set
• Size of loyals / outliers set
1
1( )
ikl c
t i t i
xi
l xk
µ − −
=
= ∑2
1
1( )
ikl c c
t i t i t i
xi
l x lk
σ−
− − −
=
= −
∑
,
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 14
Introduction / Related work
Problem Statement
Modeling Information (Activity) Roles
Modeling Communication Dynamics
Determining CorrelationExperimental Results
Conclusions
Outline
November 5, 2008 14
• Dataset
• Determining stock movement
of a company
• SVR Prediction of stock
movement
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 15
Dataset
1,94826010718Nokia
4,46943219422Microsoft
2,73134714219Google
5,09257327624Apple
#Comments#Commenters#Posts#EditorsTopics
� The stock movement of a company c at a day t to be the change in stock price from the closing value of the past day, normalized by the return of the past day.
, where φt is the stock price of the company at day t.
� Note: It is important to take into account the effect of the overall stock market sentiment as well.
• For example, a negative movement of the stock prices of a particular company may be attributed due to negative movements in the overall stock market index (e.g. NASDAQ, ISE, S&P500 etc).
� We use NASDAQ index (http://www.finance.google.com) for the four companies.
( )1
1
t tc
t
t
yϕ ϕ
ϕ
−
−
−=
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 16
SVR Prediction of Stock Movement
� Features vectors are determined from the crawled dataset http://www.engadget.com.
� We use these features and stock market movement of the company over N weeks for training an SVM regressor.
� The trained parameters are used to predict the movement at the (N+1)th week.
where X are the input features at each day. The convex optimization problem is given as,
where ε is the allowed precision.
, with , c
ty w x b w b= + ∈ ∈X ℝ
21minimize
2
,subject to
,
i i
i i
w
y w x b
w x b y
ε
ε
− − ≤
+ − ≤
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 17
Introduction / Related work
Problem Statement
Modeling Information (Activity) Roles
Modeling Communication Dynamics
Determining Correlation
Experimental Results
Conclusions
Outline
November 5, 2008 17
• Baseline Methods
• Prediction Results
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 18
Baseline Methods
� Comment Frequency (B1): This method uses the frequency of comments per day. We use a linear regressor to learn the correlation coefficients incrementally based on stock movements and number of comments:
,
where ŷct is the stock movement of company c at day t and fc
t-i is the frequency of comments i days before t.
� Linear Relationship among features (B2): In this method, we assume a linear relationship between the contextual features anduse a linear regressor to learn the correlation coefficients incrementally.
• The regression fit is given as, Y = α ∙ X, where α is the vector of α1, α2, …, αn, the correlation coefficients and Y and X are the vectors of movements and communication features.
^
1 1 2 2 7 7
c
c c c
t t t t t tty f f fα α α− − − − − −= + + +…
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 19
Figure : Visualization of stock movements with time on vertical scale. B1
and B2
are the two baseline techniques. The SVR
prediction is found to follow the movement trend very closely with an error of 13.41 %.
Open Text
Corporation extends
alliance with
Microsoft (Aug 20)
Apple Beatles settles
trademark lawsuit
(Feb 5)
B1 B2 SVR Actual B1 B2 SVR Actual B1 B2 SVR Actual
APPLE GOOGLE MICROSOFT NOKIA
B1 B2 SVR Actual
Ipod classic, ipod
touch (Sep 5)
US STOCKS-
Tech drag trips
Dow, Nasdaq
extends drop (Nov
29)
iPhone release
(Jun 29)
Google outbids
Microsoft for Dell
bundling deal
(May 25)
Google getting
into e-books (Jan
22)
Nokia E61i, E65
(Jun 25)
Microsoft Xbox 360
Arcade release (Oct
23)
Microsoft loses EU
anti-trust appeal (Sep
17)
Nokia NGage gaming
service (Aug 29)
Nokia recalls 46M
batteries (Aug 14)
Microsoft buying
Yahoo rumor (May
4)
Microsoft To Acquire
Parlano (Aug 30)
The FTC to conduct
antitrust review of
Google and
DoubleClick deal
(May 29)
Experimental Results
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 20
Introduction / Related work
Problem Statement
Modeling Information (Activity) Roles
Modeling Communication Dynamics
Determining Correlation
Experimental Results
Conclusions
Outline
November 5, 2008 20
• Summary
• Future Work
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 21
� A simple model
• to study and analyze communication dynamics in blogosphere and
• use those dynamics to determine interesting correlations with stock market movement.
� Characterize the communication dynamics in a blog through several contextual features for a particular company.
� Excellent results on stock movement prediction of four companies using a widely read technology blog dataset www.engadget.com
November 5, 2008 21
Summary
Blog communication + stock movement
correlation
November 5, 2008@ACM HyperText and HyperMedia 2008, Pittsburgh, PA 22
Conclusions
� Consequences
• Blog communication exhibits remarkable predictive power.
� Future Work
• Characterize communication activity at multiple scales.
• Identifying an implicit macro property that underlies communication dynamics on the blogosphere – accountable by a vocal minority or majority.
Individuals Groups Community
Characterizing communication at multiple scales