Social Networks Applied

14
Steffen Staab University of Koblenz-Landau [email protected] 80 1541-1672/05/$20.00 © 2005 IEEE IEEE INTELLIGENT SYSTEMS Published by the IEEE Computer Society Trends & Controversies Mining Social Networks for Viral Marketing Pedro Domingos, University of Washington Traditionally, social-network models have been descrip- tive rather than predictive. They’re built at a very coarse level, typically with only a few global parameters, and aren’t useful for predicting the network’s behavior. In the past, this was due largely to lack of data; the networks available for study were small and few and contained min- imal information about each node. Fortunately, the Internet’s rise has changed this dramati- cally. Massive quantities of data on large social networks are available from blogs, knowledge-sharing sites, collabo- rative-filtering systems, online gaming, social-networking sites, newsgroups, chat rooms, and so on. These networks typically number in the tens of thousands to millions of nodes. They often contain sufficient information to build models of individual nodes, which we can then assemble into models of the networks they’re part of. This gives us an unprecedented level of detail in social-network analysis, along with the potential for new understanding, useful pre- dictions, and their productive use in decision making. My colleagues and I have begun to build social-network models at this scale using data from the Epinions knowl- edge-sharing site, the EachMovie collaborative-filtering system, and others. 1,2 These models let us design viral- marketing plans that maximize positive word of mouth among customers. In our experiments, this has made it pos- sible to achieve much higher profits than if we ignored interactions among customers and the corresponding net- work effects, as traditional marketing does. Customers’ network values Customer value is usually defined as the expected profit from sales to a customer over the lifetime of his or her rela- tionship to the company. Customer value is of critical inter- est to companies because it determines how much is worth spending to acquire a particular customer. However, tradi- tional measures of customer value ignore the fact that in addition to buying products, a customer can influence oth- ers to buy them. For example, if I see a movie and persuade three friends to see it with me, my customer value with respect to that movie has effectively quadrupled, and the movie studio is justified in spending more on marketing the movie to me. Conversely, if I tend to decide what movies to see on the basis of what my friends tell me, marketing to me might be a waste of resources that would be better spent Social Networks Applied Social networks have interesting properties. They influence our lives enormously without us being aware of the implications they raise: How does a kind of fashion become en vogue? How does a virus spread and infect people? How does a research topic become a hot topic? Why are some companies successful and others aren’t? All these questions affect us, and understanding them by building and investigating computational models might give us a powerful tool to improve our health system, increase individual and general wealth, or just increase awareness about how the people around us actually influence our opinions, which we frequently believe that we shape. Pedro Domingos investigates how to exploit our unprecedented wealth of data and how we can mine social networks for purposes such as marketing campaigns. It’s fascinating how such techniques could turn current marketing strategies upside down. Peter Mika considers a particular form of influence: the way that people agree on terminology and this phenomenon’s implications for the way we build ontologies and the Semantic Web. In a nut- shell, he concludes that the Semantic Web will either include social networks’ influence in its architecture or wither away. While Peter and Pedro target social networks as something we can discover from data, Jennifer Golbeck takes a constructivist view: people already provide explicit social network information in formats such as friend-of-a-friend files. If we refine this kind of information, we could offer a wealth of new applications, such as better recom- mendations for restaurants, trustworthy email senders, or (maybe) blind dates. Li Ding, Tim Finin, and Anupam Joshi also take a constructivist view by investigating the richness and difficulty of harvesting FOAF information. Andrzej Nowak and Robin R. Vallacher conclude this issue from the viewpoint of social scientists who’ve studied extensively how information processing is bound to social context. They point us to the intriguing ways that network topology’s definition determines its outcomes. Steffen Staab

Transcript of Social Networks Applied

Steffen StaabUniversity of [email protected]

80 1541-1672/05/$20.00 © 2005 IEEE IEEE INTELLIGENT SYSTEMSPublished by the IEEE Computer Society

T r e n d s & C o n t r o v e r s i e s

Mining Social Networks for ViralMarketingPedro Domingos, University of Washington

Traditionally, social-network models have been descrip-tive rather than predictive. They’re built at a very coarselevel, typically with only a few global parameters, and

aren’t useful for predicting the network’s behavior. In thepast, this was due largely to lack of data; the networksavailable for study were small and few and contained min-imal information about each node.

Fortunately, the Internet’s rise has changed this dramati-cally. Massive quantities of data on large social networksare available from blogs, knowledge-sharing sites, collabo-rative-filtering systems, online gaming, social-networkingsites, newsgroups, chat rooms, and so on. These networkstypically number in the tens of thousands to millions ofnodes. They often contain sufficient information to buildmodels of individual nodes, which we can then assembleinto models of the networks they’re part of. This gives usan unprecedented level of detail in social-network analysis,along with the potential for new understanding, useful pre-dictions, and their productive use in decision making.

My colleagues and I have begun to build social-networkmodels at this scale using data from the Epinions knowl-edge-sharing site, the EachMovie collaborative-filteringsystem, and others.1,2 These models let us design viral-marketing plans that maximize positive word of mouthamong customers. In our experiments, this has made it pos-sible to achieve much higher profits than if we ignoredinteractions among customers and the corresponding net-work effects, as traditional marketing does.

Customers’ network valuesCustomer value is usually defined as the expected profit

from sales to a customer over the lifetime of his or her rela-tionship to the company. Customer value is of critical inter-est to companies because it determines how much is worthspending to acquire a particular customer. However, tradi-tional measures of customer value ignore the fact that inaddition to buying products, a customer can influence oth-ers to buy them. For example, if I see a movie and persuadethree friends to see it with me, my customer value withrespect to that movie has effectively quadrupled, and themovie studio is justified in spending more on marketing themovie to me. Conversely, if I tend to decide what movies tosee on the basis of what my friends tell me, marketing to memight be a waste of resources that would be better spent

Social Networks Applied

Social networks have interesting properties. They influence ourlives enormously without us being aware of the implications theyraise: How does a kind of fashion become en vogue? How does avirus spread and infect people? How does a research topic become ahot topic? Why are some companies successful and others aren’t? Allthese questions affect us, and understanding them by building andinvestigating computational models might give us a powerful tool toimprove our health system, increase individual and general wealth,or just increase awareness about how the people around us actuallyinfluence our opinions, which we frequently believe that we shape.

Pedro Domingos investigates how to exploit our unprecedentedwealth of data and how we can mine social networks for purposessuch as marketing campaigns. It’s fascinating how such techniquescould turn current marketing strategies upside down.

Peter Mika considers a particular form of influence: the way thatpeople agree on terminology and this phenomenon’s implicationsfor the way we build ontologies and the Semantic Web. In a nut-shell, he concludes that the Semantic Web will either include socialnetworks’ influence in its architecture or wither away.

While Peter and Pedro target social networks as something wecan discover from data, Jennifer Golbeck takes a constructivist view:people already provide explicit social network information in formatssuch as friend-of-a-friend files. If we refine this kind of information,we could offer a wealth of new applications, such as better recom-mendations for restaurants, trustworthy email senders, or (maybe)blind dates.

Li Ding, Tim Finin, and Anupam Joshi also take a constructivistview by investigating the richness and difficulty of harvesting FOAFinformation.

Andrzej Nowak and Robin R. Vallacher conclude this issue fromthe viewpoint of social scientists who’ve studied extensively howinformation processing is bound to social context. They point us tothe intriguing ways that network topology’s definition determinesits outcomes. —Steffen Staab

marketing to my friends. A customer’s net-work value is the expected increase in salesto others that results from marketing to thatcustomer.

Clearly, ignoring customers’network val-ues, as traditional direct marketing does, canlead to suboptimal marketing decisions. Butwhile the marketing literature has acknowl-edged network effects’existence, it has gener-ally considered them unquantifiable, particu-larly at the individual-customer level. Thedata sources now available have changed this.

Our models let us measure a customer’snetwork value. We model how likely eachcustomer is to buy some product, as a func-tion of the customer’s and product’s intrin-sic properties and of the influence of thecustomer’s neighbors in the network. Byperforming probabilistic inference over thejoint model of all customers, we can answerquestions such as, “If we market to a partic-ular set of customers, what’s the expectedprofit from the whole network after thosecustomers’ influence has propagated through-out?” Using this capability, we can searchfor the optimal set of customers to marketto—that is, the set that will yield the highestreturn on investment. Intuitively, we can lookfor the customers with the highest networkvalues, market to them, and reap the benefitsof the ensuing wave of word of mouth.

Factors that influence networkvalue

What makes for a customer with highnetwork value? Clearly, high connectivityin the network should help, but our modelidentifies other factors.

Customer opinionFirst, it’s important that the customer like

the product, preferably a lot. Customers whohave high connectivity but dislike a productcan have negative network value, and weshould avoid marketing to them. In our exper-iments with EachMovie, our model took thisinto account, which helped it outperform astandard direct-marketing approach. Thestandard approach assumed that the most ithad to lose by marketing to a customer whodidn’t like the product was the marketing’scost, which is typically small per customer.As a result, the standard approach marketedeven to customers whose chances of likingthe product were relatively low.

Asymmetric influenceAnother key aspect is that to have high

network value, a customer should influencehis or her acquaintances more (ideally muchmore) than they influence that customer. Ifinfluence is symmetric, searching for themost influential customers has no advan-tages. Fortunately, asymmetric influence iswidespread in practice, and our approachexploits it. While various fields have well-known opinion leaders, such as celebrities,our approach lets us identify them at thelocal level.

Chain of influencePerhaps most important, a customer’s

network value doesn’t end with his or herimmediate acquaintances. Those acquain-tances in turn influence other people and so on until they potentially reach the entirenetwork. A customer who’s not widely con-nected might, in fact, have high network

value if an acquaintance is highly connected(for example, an advisor to an opinionleader). In our experiments with the Epin-ions Web site, the most valuable customerhad a network value of over 20,000, mean-ing that marketing to him was as effectiveas marketing to over 20,000 others in theabsence of network effects. However, thecustomer’s number of direct links to othersin the network—that is, people who read hisreviews—was much smaller.

Our model’s consequencesWord-of-mouth marketing might not be

effective in some markets because the req-uisite influence networks aren’t present.Although this is known at a high level forsome market types,3 many startup compa-nies have failed by investing heavily tounleash network effects that never material-ized. Conversely, trials for some products,

such as cash cards and interactive tele-vision, have failed because the companiesdidn’t appreciate that giving the product to a small sample of isolated customersdoesn’t allow network effects to take hold.When the data is available, our models letus measure these effects precisely andmake better decisions.

Another interesting consequence of ourmodel is that it might pay to lose money on some customers, if they’re influentialenough. In traditional direct marketing,customers receive an offer only if theirexpected profits exceed the offer’s cost. Inviral marketing, giving a free product to awell-chosen customer could pay off manytimes in sales to other customers.

Maximizing word of mouthGiven a social-network model, we have

a well-defined optimization problem: choosethe set of customers to market to so as tomaximize net profits—that is, profits fromsales minus marketing’s cost. David Kempe,Jon Kleinberg, and Éva Tardos have shownthis problem to be NP-hard, but approx-imable within 63 percent of the optimalusing a simple hill-climbing search proce-dure.4 We obtained similar results with aneven faster approach: we added each cus-tomer to the current marketing set as longas this improved overall profit.

With careful implementation, the poten-tially prohibitive cost of performing proba-bilistic inference over the whole network ateach search step (necessary to measure theeffect of adding a customer to the marketingset) also turns out not to be a problem. Thisis because the vast majority of customershave low network values; their influencedoesn’t propagate far, so the computationfor them converges quickly. For the fewcustomers with high network values, thecomputation can indeed take substantialtime, but amortized over all search steps, itbecomes quite manageable. We found theoptimal marketing set for a network withtens of thousands of nodes in minutes.

No matter how much data we have, com-pletely capturing the network of social inter-actions among people in the real world willnever be feasible. So, the important questionarises of whether our approach to maximiz-ing word of mouth still works when ourknowledge of the network is incomplete.We’ve tested this by randomly removing avariable number of edges from the networkbefore passing it to the data-mining system.

JANUARY/FEBRUARY 2005 www.computer.org/intelligent 81

What makes for a customer with

high network value? Clearly, high

connectivity in the network should

help, but our model identifies other

factors.

We found the system to be quite robust, with70 percent of the total increase in profitobtained when the system knew only five per-cent of the edges. We can also use our modelto determine the most cost-effective way togather additional knowledge. We’ve foundthat the simple heuristic of iteratively askingthe customers with the highest network valuewho their acquaintances are is quite effective.

ProspectsTraditional marketing is in crisis because

customers are increasingly inured to televi-sion commercials, direct mailings, and soon. At the same time, companies such asAmazon, Google, and Hotmail succeedwith virtually no marketing, solely on thebasis of word of mouth.3 A recent studyfound that positive word of mouth amongcustomers is by far the best predictor of acompany’s growth.5 Word-of-mouth mar-keting has a key advantage: a recommenda-tion from a friend or other trusted sourcehas credibility that advertisements lack.6

Because it leverages customers themselvesto do the marketing, it can also produceunparalleled returns on investment. Untilnow, it’s been somewhat of a black art. Ourgoal is to put it on a firmer foundation, andthe results so far are promising.

Beyond marketing, word-of-mouth opti-mization is potentially applicable in anysetting where we desire a large social out-come with limited resources. Examplesinclude reducing the spread of HIV, com-bating teenage smoking, and grassrootspolitical initiatives. Until recently, sociol-ogy lagged behind other sciences in devel-oping a computational branch. The wealthof social data the Internet provides couldchange this, and we see our work as a stepin this direction.

We’ve only begun to scratch the surface ofthe rich set of possibilities that building pre-dictive social-network models opens up. Realsocial networks evolve and have multipletypes of arcs and nodes. Multiple players’actions affect them, and we can mine themfrom a combination of sources. Because datapoints aren’t independent and identicallydistributed, subtle statistical issues arise.We’re designing a rich language for model-ing these and other aspects of social net-works and developing learning and infer-ence algorithms for it. This language, calledMarkov logic networks, combines the prob-abilistic modeling of Markov random fieldswith first-order logic’s expressiveness.7 In

preliminary experiments, it speeded devel-opment of a complex social-network modeland yielded more accurate predictions thanstandard methods.

We’re all familiar with the notion that abutterfly flapping its wings in Beijing cancause a storm in New York. At the same time,the chances that a given butterfly flapping itswings will indeed cause a storm in New Yorkare very small. Our approach, in a nutshell, isto ask, “If we wanted to cause a storm in NewYork, and could make a few butterflies flaptheir wings, which ones would we choose?”Our experiments so far show that, at least inthe marketing world, this is an effective wayto unleash storms on demand.

References

1. P. Domingos and M. Richardson, “Mining theNetwork Value of Customers,” Proc. 7th ACMSIGKDD Int’l Conf. Knowledge Discoveryand Data Mining, ACM Press, 2001, pp.57–66.

2. M. Richardson and P. Domingos, “MiningKnowledge-Sharing Sites for Viral Market-ing,” Proc. 8th ACM SIGKDD Int’l Conf.Knowledge Discovery and Data Mining,ACM Press, 2002, pp. 61–70.

3. R. Dye, “The Buzz on Buzz,” Harvard Busi-ness Rev., vol. 78, no. 6, 2000, pp. 139–146.

4. D. Kempe, J. Kleinberg, and E. Tardos, “Max-imizing the Spread of Influence through aSocial Network,” Proc. 9th ACM SIGKDDInt’l Conf. Knowledge Discovery and DataMining, ACM Press, 2003, pp. 137–146.

5. F. Reichheld, “The One Number You Need toGrow,” Harvard Business Rev., vol. 81, no.12, 2003, pp. 47–54.

6. S. Jurvetson, “What Exactly Is Viral Market-ing?” Red Herring, no. 78, pp. 110–112; www.redherring.com/Article.aspx?a=5485&hed=From+the+ground+floor%0a.

7. M. Richardson and P. Domingos, Markov LogicNetworks, tech. report, Dept. Computer Scienceand Eng., Univ. of Washington, 2004; www.cs.washington.edu/homes/pedrod/mln.pdf.

Social Networks and theSemantic Web: The NextChallengePeter Mika, Free University Amsterdam

The 2004 International World Wide WebConference heralded the completion of thefirst phase of the World Wide Web Consor-tium’s Semantic Web Activity (www.w3.

org/2001/sw/Activity): laying the SemanticWeb architecture’s foundations. Publicationsand demonstrations at past WWW eventsand the International Semantic Web Confer-ence series have presented an impressivepicture of standardized representation lan-guages, from RDF Schema to variants ofOWL; tools for creating, storing, querying,and reasoning with ontologies; and brows-ing, search, and knowledge-sharing appli-cations, driven by the ontologies under thehood. With Semantic Web Activity Phase 2beginning, the message is that by and large,the Semantic Web is ready for deployment.

Looking more closely, some issues seemto be left for future research—those hotpotatoes Semantic Web researchers havepassed around from the beginning. Two inparticular stick out from the thick proceed-ings volumes: ontology learning and ontol-ogy mapping. Ontology learning or extrac-tion is the attempt to recreate a conceptualmodel from existing knowledge sources, inparticular natural text. Ontology mapping(also known as merging, alignment, and soon) refers to finding and reconciling therelations between two or more conceptualmodels and creating a single model thatcaptures their intentions and the relationshipsbetween them.

The underlying reason that automatingthese tasks is difficult is our machines’ lackof understanding when it comes to ontolo-gies. Interestingly, this is the problem theSemantic Web set out to solve—the ideabeing that we would attach machine-processable descriptions to content that’sotherwise unintelligible to computers. Asthe well-known slide about what it’s like to be a computer illustrates, although themachine doesn’t understand strings ofhuman symbols, it has no problem identify-ing the parts in angled brackets. Many havenoted that the strings in the angled bracketsare just symbols as well, but at least theirlimits are clear. Our machines have no prob-lems applying all kinds of rules to them,generating further symbols along the way(or pointing out to us that the way we usedthem is inconsistent with the rules we setout for them). If the conclusions don’t seemto be intelligent or are surprising to us, weadd more rules to fix it.

What the machine can’t do is access whatwe think those symbols’ interpretations are,and therein lies the problem. Unfortu-nately, this is the crux of ontology learningand mapping.

82 www.computer.org/intelligent IEEE INTELLIGENT SYSTEMS

Semantics are usCreating and reconciling interpretations

is a human-complex problem, as Figure 1illustrates. The picture is a puzzle, althoughan easier one than the computer must face,because we uncovered two of the network’sthree terms. The puzzle imitates the basicstep of creating or recreating interpreta-tions, namely placing or retrieving a newconcept on the basis of the context. Thequestion to the reader is, “What concept ishidden under the code?”

Unfortunately for the computer, this chal-lenge’s answer is a matter of association.What comes to your mind when you look atthe combination of these terms? The answernecessarily depends on your mental schemata;you’ll accept an answer as plausible basedon your own response. Interestingly, we hap-pen to know the answer that came from a setof Edinburgh college students in 1973.

Knowledge engineers collected that year’sEdinburgh Associative Thesaurus by hand-ing a list of words to students and instructingthem to write as quickly as possible next toeach stimulus word the first word it madethem think of. The experiment’s next roundused those words. The engineers repeated thecycle three times; by then, the number ofresponses was so large that they couldn’t allbe reused as stimuli.

According to the EAT, the shortest pathfrom love to money in the students’ mindsruns through the notions of girl and security(see Figure 2). If we consider only lines withweights less than five, the path runs as fol-lows: love, sex, yes, please, thanks, a lot,money. So much for Edinburgh college stu-dents’ love lives in the ’70s.

Deeper network analysis confirms thatmoney was central to these students’ thoughtprocesses (it had the largest inward degreeand betweenness centrality), even more sothan sex.1 Moreover, it appears that the stu-dents also had water and food on their mind,when not preoccupied with the ideas men-tioned earlier.

The EAT is actually an ontology. Theontology-engineering method is particularlyrevealing, because once the initial set ofwords is selected, the only parameter to theprocess is the population chosen. In particu-lar, the knowledge engineer has no otherrole than handing out questionnaires andcollecting responses. Some results are likelyto hold for other communities (such as theoverwhelming tendency for Christians tosay “Noah” when they hear “ark”), but the

experiment’s subjects’ collective mindsetdrives many of the aggregated associations.The well-known dynamics of social net-works create this collective mindset: interac-tion creates similarity and vice versa. (Caveatto the second part: in practice, physical con-straints such as geographic distance oftenlimit interaction.)

Communities and theSemantic Web

Even if we should try to avoid enteringold debates about the nature of knowledgeand learning, we should bow to the socialsciences at this point and consider knowl-edge’s embeddedness in the social context.(For a summary of the different views onknowledge and learning, see Figure 3.) Ibelieve we can learn something here for theSemantic Web. Communities aren’t merelysets of users, but an integral and dynamicpart of the architecture. Without intendingor even realizing, we’ve elevated them tothis place by integrating the notions of on-tology and semantics into the first Web’stechnological framework.

It’s easy to predict what’s going to fail ifwe decide to ignore social-context elements.Communities will start creating their ownontologies that reflect their identities, lan-guages, and collective intelligence, develop-ing them through interaction surrounding apractice or interest. With these ontologies,the communities will annotate their online oroffline content. Communities, ontologies,and content make up the three layers of theSemantic Web (see Figure 4). Islands ofsemantics will arise unless we find a way tomap these ontologies. Mappings, however,just reflect the similarities in the conceptu-alizations the separate communities havedeveloped, possibly through interactionbetween community members.

Further, these conceptualizations change

as communities evolve and learn. Unless wemake communities part of the system, thesystem will have significant difficulties catch-ing up with changing semantics. (Ontologyversioning is really an ontology mappingproblem between a model’s old and newversions.)2 The more unstable knowledge is,the more difficulty we can expect in formal-izing and sharing it on a large scale.3

If all this sounds a bit gloomy, I shouldquickly add that we might have alreadycome across a way toward the solution bypure chance. Created and spread for the solepurpose of having fun (another first for theSemantic Web), the Friend of a Friend pro-ject’s simple ontology lets us identify, de-scribe, and relate users using URIs (uniformresource identifiers), RDF, and a few extraterms, just as we would with conventional,Web-accessible resources such as HTMLpages. On the representation side, it’s asmall step from here to associating FOAF’susers and groups to the Semantic Web’sontologies and metadata.

It will take time and a new, interdiscipli-nary mindset to find out how we couldcharacterize a community’s relationship toan ontology and its content and what this

JANUARY/FEBRUARY 2005 www.computer.org/intelligent 83

Money#^&#)^”/Love

Figure 1. The ontology puzzle.

Money

Girl

Love

99 95

95 99Security

Figure 2. The shortest path between loveand money according to the EAT.

AbstractPassiveCerebral

IndividualTransmission

Context freeMostly explicit

ModelResourceTransfer

SituatedActive

EmbodiedSocial

Reception

ContextualMostly tacit

PracticeProcess

Experience

Learning is ... Knowledge is ... Learning is ...AI and cognitive sciences Social sciences

Knowledge is ...

Figure 3. A comparison of the AI or cognitive science and social science views onknowledge and learning. Based on a presentation by Paul Duguid (Int’l PhD Seminaron Organizational Learning, Networks, and Communities, Amsterdam, 2004).

84 www.computer.org/intelligent IEEE INTELLIGENT SYSTEMS

relationship means for an ontology’s use andthe metadata created with it. How could wemove part of the social process surroundingontologies within the system’s boundariesso that we can give our machines a chance toget a grip on the messy, confused world ofhuman knowledge?

Proof of conceptMany researchers have used a Web key-

word index such as Google for ontology-learning purposes, most recently in PhilippCimiano and his colleagues’ excellent

work.4 In my recent effort, I’ve extendedthis technique with the EAT idea, buildinga conceptual model that reflects a com-munity’s rather than the whole Web’s wis-dom. Besides mining social networks fromthe Web, as in the pioneering ReferralWebproject,5 I queried Google with communitymembers’ names (in my case, SemanticWeb researchers), along with terms fromtheir domains, such as research topics, toolnames, and so on. I’ve measured the asso-ciations among people and concepts by thenumber of pages returned—that is, the num-

ber of pages where the name and the conceptco-occur—after a normalization step (seealso the online demonstration at http://flink.semanticweb.org).6,7

By taking the most highly associatedindividuals as representatives of a conceptor of community members, I measured theassociations between concepts by the num-ber of people shared between the two com-munities, much like the EAT (see Figure 5).Although the method is under evaluation,the first results suggest that an ontologyobtained this way represents the communitybetter than one obtained by mining associa-tions on the basis of all of Google’s indexpages, especially when it comes to termsthat have both a generic and a community-specific interpretation (see Figure 6). Forme, this simple experiment is the proof ofconcept: elusive, fuzzy communities are areal independent variable in learning anontology. If I’d run the same experiment withthe Edinburgh master’s students’names, I’mcertain I would have gotten a quite differentpicture.

As for the Semantic Web Activity’sPhase 2, the message is clear. It’s time tothink about that final layer of the SemanticWeb architecture, the social structures andprocesses that one day will lead us to a sus-tainable worldwide ecosystem of peopleand semantics.

AcknowledgmentsThe Vrije Universiteit Research School for

Business Information Sciences (VUBIS) providedfunding for this work.

References1. A. Ziberna, “Network Analysis of a Word

Association Thesaurus,” Proc. XXIV Int’l Sun-belt Social Network Conf., Univ. of Ljubljana,2004, pp. 131–132.

2. M. Klein et al., “Ontology Versioning andChange Detection on the Web,” Proc. 13thInt’l Conf. Knowledge Eng. and KnowledgeManagement (EKAW02), LNCS 2473, A.Gómez-Pérez and V.R. Benjamins, eds.,Springer-Verlag, 2002, pp. 197–212.

3. L. van Elst and A. Abecker, “Ontologies forInformation Management: Balancing For-mality, Stability, and Sharing Scope,” ExpertSystems with Applications, vol. 23, no. 4, Nov.2002, pp. 357–366.

4. P. Cimiano, S. Handschuh, and S. Staab,“Towards the Self-Annotating Web,” Proc.13th Int’l Conf. World Wide Web, ACM Press,2004, pp. 462–471.

Figure 5. Members of the Semantic Web community and their association to researchinterests.

Web site A

Community ZCommunity Y

Ontology X Ontology ZOntology Y

Web site B

Community X

Figure 4. The Semantic Web’s three layers.

5. H. Kautz, B. Selman, and M. Shah, “The Hid-den Web,” AI Magazine, vol. 18, no. 2, 1997,pp. 27–36.

6. P. Mika, “Bootstrapping the FOAF-Web: AnExperiment in Social Network Mining,”Proc.1st Workshop Friend of a Friend, Social Net-working and the Semantic Web, 2004; www.w3.org/2001/sw/Europe/events/foaf-galway/papers/fp/bootstrapping_the_foaf_web.

7. P. Mika, “Social Networks and the SemanticWeb: An Experiment in Online Social Net-work Analysis,” Proc. IEEE/WIC/ACM Int’lConf. Web Intelligence (WI 2004), IEEE CSPress, 2004, pp. 285–291.

Sharing and Using Links inSocial NetworksJennifer Golbeck, University of Maryland

Social networking has grown dramaticallymore popular since the release of the film SixDegrees of Separation and the now-infamousKevin Bacon Game’s popularization (see theRelated Links sidebar). In the past few years,several Web sites have emerged to supportthe public interest in social networking.Some, such as LinkedIn, focus on buildingbusiness relationships, while others, such asTribe, Orkut, and Friendster, have social andentertainment motivations. Recently, a Website even emerged to extend social network-ing to our dogs, though the people at Dogsterprefer that you call it social petworking.

Data about millions of people and theirconnections is publicly available on theWeb. Users spend a lot of time maintaining

their data, building up networks, and brows-ing links. If it’s possible to integrate thisdistributed data into centralized models, weshould be able to create socially aware,intelligent applications that will let usersbenefit from their participation in onlinesocial networks. To achieve this goal, wemust answer two questions. First, how canwe merge data from separate databases intoa large social network model that applica-tions can access? Second, how can socialnetworks benefit users?

The Friend of a Friend projectThe question of making data accessible

and understandable by applications has astrong foundation of work behind it. TheFriend of a Friend project is one of theSemantic Web’s largest and most popular.It’s essentially a vocabulary for describingpeople and whom they know. Literally mil-

lions of FOAF files exist on the SemanticWeb, some from users who’ve authoredtheir own data and others from Web sitesthat publish data from their databases usingthe FOAF ontology. Because users are begin-ning to accept FOAF as something of astandardized ontology for representingsocial networks on the Semantic Web, it’s agood option for Web sites that want to startsharing some of their data.

However, FOAF isn’t a complete data-sharing solution. The vocabulary has a lim-ited set of properties, and only one type ofrelationship exists between people: the“knows” relationship. In reality, the scale ofknowing someone varies from lifelong bestfriend to Internet acquaintance. A specificapplication might want properties that aren’tpart of FOAF, but on the Semantic Web,that’s not much of a problem. By nature, theSemantic Web lets individual projects make

JANUARY/FEBRUARY 2005 www.computer.org/intelligent 85

DAML-S

Web services

Semantic Web

OWL-S

0.57

0.46

0.82

0.50 0.71

0.40

0.77

0.450.46

0.41

0.400.55

0.45

0.42

0.60

0.52

0.44

0.47

KnowledgeManagement

Foundational Ontology

On-To-Knowledge

Sesame

RQL

DOLCE

ontology

SeRQLRDQL

0.63

0.820.41

0.45

0.48

Jena

F-Logic

agent

RDF

0.47

0.50

0.48

0.40

multimedia

OWLgrid

OilEd

Description

database

Figure 6. The resulting community ontology is the result of folding Figure 5’s bipartite graph and normalizing the weights usinggeometric normalization.

The Oracle of Bacon at Virginia: www.cs.virginia.edu/oracleLinkedIn: www.linkedin.comTribe: www.tribe.netOrkut: www.orkut.comFriendster: www.friendster.comDogster: www.dogster.comFriend of a Friend project: www.foaf-project.orgTrust and Reputation project: http://trust.mindswap.orgFriend of a Friend Relationship module: www.perceive.net/schemas/20021119/relationship

Related Links

86 www.computer.org/intelligent IEEE INTELLIGENT SYSTEMS

their own extensions to the vocabulary whilepreserving FOAF as a core. My own project,for example, studies trust in social networks.FOAF doesn’t define trust, but my simpleontology extends FOAF with trust properties.Another project—the FOAF Relationshipmodule—defines a long list of relationshiptypes.

The benefits of using FOAF as a core aretwofold. A Web site that understands FOAFcan augment its own databases with infor-mation gathered from other FOAF files onthe Web. Also, by publishing a set of datawith a FOAF core on the Semantic Web,Web sites provide a service to users becauseother FOAF-based services can now usethat data. (Paul Mutton describes one tech-nique for spidering FOAF data and creatingapplications with it.1)

ApplicationsAccess to this large data model holds great

promise for improving and developing intel-ligent systems. Based on the time they spenddeveloping networks online, users clearlyfeel that there’s a benefit to building socialnetworks. The question to us as scientists is,“What can we do with the connections inthese networks?” Some of the greatest poten-tial for creating socially intelligent systemslies in being able to compose relationshipswithin the network and integrate that infor-mation into systems.

What can we say about two people con-nected by intermediate friends? With thestandard approach of social networking,which has a single relationship indicatingsome sort of connection, we have too littleinformation to say much. By adding dimen-sion to relationships, more possibilities openup. Once we start refining relationships, wecan ask a series of questions that will affectthe analysis and potential for composition. IfAlice and Chris aren’t directly connected ina social network, and we want to calculate arecommendation about the relationshipbetween them, we need to know about therelationships and intermediate people con-necting them.

Using trust as an example, say Alicehighly trusts Bob, and Bob highly trustsChris. Can we recommend that Aliceshould have some level of trust for Chris?It’s debatable, but in practice, we use thissort of social logic every day. Asking for arecommendation about a mechanic orrestaurant employs exactly this type ofcomputation, taking into account the trust

we have for the person we’re asking andtheir trust of the mechanic or restaurant.Blind dates are another (albeit often unsuc-cessful) application of the same logic.

With algorithms that can accurately inferrelationships between people—be they trustrelationships or other types—using thoseinferences in applications is a natural nextstep. Continuing with the trust example, wecan integrate recommendations we makeabout trust using a social network in manycontexts: using ratings to allow access to per-sonal information on the Web, using them asa filter for Web-based information, or, as inone of my projects, integrating ratings intoan email client. In email, we could use rat-ings as scores for email messages, indicatinghow much the user should trust the sender—even if they’ve never met.2 That essentiallycreates a social email filter, benefiting fromalgorithms applied to large social networks.

Certainly, we can use values in otherways and compose other types of relation-ships. The core point is that we must be ableto bring something useful to systems fromsocial networks. Telling users the numberof relationships and number of friends-of-friends they have might be entertaining, butfalls short of offering any real assistance.The Web-based infrastructure exists thatwill let people and Web sites share socialnetwork data in a distributed way, and thatwill let services and applications aggregatethat large set of data into queriable models.Our work must focus on doing somethingwith the social network, like composingrelationships. In that type of analysis liesthe real ability for creating intelligent,socially aware systems that integrate thesocial and computational.

References1. P. Mutton, IRC Hacks, O’Reilly, 2004.

2. J. Golbeck and J. Hendler, “Reputation Net-work Analysis for Email Filtering,” Proc. 1stConf. Email and Anti-Spam, CEAS, 2004;www.ceas.cc/papers-2004/177.pdf.

Analyzing Social Networks onthe Semantic WebLi Ding, Tim Finin, and Anupam Joshi,University of Maryland, Baltimore County

The past year has seen a dramatic increasein the amount of social information publishedin RDF documents. Our investigations show

that the Friend of a Friend ontology (http://xmlns.com/foaf/0.1) is among the most-used Semantic Web ontologies.1,2 This istrue if we measure the number of SemanticWeb documents (SWDs) that use the FOAFnamespace, as Table 1 shows, or the numberof triples using FOAF terms. The SwoogleOntology Dictionary (http://swoogle.umbc.edu) shows that the class foaf:Person (the quali-fied name, or QName, of http://xmlns.com/foaf/0.1/Person) has nearly one millioninstances spread over about 45,000 Webdocuments. The FOAF ontology isn’t theonly one people use to publish social informa-tion on the Web. For example, Swoogle iden-tifies more than 360 RDF Schema or OWLclasses defined with the local name “person.”

The Semantic Web and social-networkmodels support one another. On one hand,the Semantic Web enables online and explic-itly represented social information; on theother hand, social networks, especiallytrust networks,3 provide a new paradigmfor knowledge management in which users“outsource” knowledge and beliefs viatheir social networks.4 To turn these objec-tives into reality, we need to address manychallenging issues such as

• Knowledge representation. Althoughvarious ontologies capture rich socialconcepts, we don’t need hundreds ofdialectic ontologies defining the sameconcept. How can we coalesce around asmall number of common, comprehen-sive ontologies?

• Knowledge management. Compared tothe entire Web, the Semantic Web isfairly well connected at the RDF-graphlevel but poorly connected at the RDF-document level. The Semantic Web’sopen, distributed nature introduces otherissues. How do we provide efficient, effec-tive mechanisms for accessing knowledge,especially social networks, on the Seman-tic Web?

• Social network extraction, integration,and analysis. Even with well-definedontologies for social concepts, extractingsocial networks correctly from the noisy,incomplete knowledge on the SemanticWeb is difficult. What are good heuris-tics for integrating and fusing socialinformation, and what metrics are usefulfor the results’ credibility and utility?

• Provenance- and trust-aware distributedinference. Provenance associates factswith social entities that are interconnected

JANUARY/FEBRUARY 2005 www.computer.org/intelligent 87

in social networks. We can derive trustamong social entities from social net-works. How do we manage and reducethe complexity of distributed inferencesby using provenance of knowledge in thecontext of a given trust model?

DatasetsTo understand how social networks on

the Semantic Web are being modeled, wecollected two datasets: DS-SWOOGLE andDS-FOAF. (We noticed that these datasetsare the largest among related works.1,5,6)We used Swoogle, a crawler-based index-ing and retrieval system for Semantic Webdocuments, to collect the first dataset,2

which provides a baseline model of theontologies and information encoded inRDF on the Web. The dataset shows thatthe terms in the FOAF ontology, especiallyfoaf:Person, are among the most used andpopulated. (We say that a class or propertyis populated when it has direct instances.)We assume that it’s reasonable to use thefoaf:knows property to connect people formingsocial networks. Therefore, we collected thesecond dataset for the SemDis project1 tofocus on available FOAF documents con-taining instances of foaf:Person. We collectedboth datasets from conventional Web searchengines, user-supplied URLs, and ourSemantic Web crawlers.

DS-SWOOGLE

At the time of this writing, DS-SWOOGLE

represented more than 335,000 valid SWDs(that is, online RDF documents in formatssuch as RDF/XML and N3). The SWDscontain about 47,000,000 RDF triples andare hosted by about 125,000 Web sites.(Swoogle is running continuously, and itsdatabase grows as new SWDs are added tothe Web.) Swoogle samples at most 10,000documents from each Web site to avoidbeing overwhelmed by Web sites with mil-lions of RDF documents. Swoogle Ontol-ogy Dictionary and Swoogle Statistics arebased on this dataset.

DS-FOAF and DS-FOAF-VARThe DS-FOAF dataset has over one mil-

lion valid online FOAF document URLsfrom over 1,800 sites. We consider a FOAFdocument to be any RDF document that hasat least one instance of the foaf:Person class.We count Web sites by DNS (Domain NameSystem) name in DS-SWOOGLE and by IP(Internet Protocol) address in DS-FOAF.

Five major blog sites, which use limitedvocabularies and fixed structures in describ-ing personal profiles, host more than 95 per-cent of the URLs. To reduce these sites’impact, we studied a smaller dataset, DS-FOAF-VAR, which considers Web sites thathost at most 1,000 FOAF documents. Thisdataset has over 7,000 FOAF documentsdrawn from 1,065 Web sites that definenearly 37,000 instances of foaf:Person. Theseinclude 4,158 strict FOAF documents, in-tended to describe one person and his or heracquaintances. Table 2 shows the two data-sets’ detailed statistics.

Building a common socialontology

The Semantic Web provides a powerful

distributed mechanism to represent andpublish social network information. Al-though FOAF terms are widely used toencode social relations, other ontologiesshow up as well. We expect these to coa-lesce and merge as they evolve. In light of the statistical approach to finding com-mon terms,1,5 we studied a particular class:foaf:Person, the most frequently used class fordescribing personal profiles, according toour datasets. The definition of foaf:Personcomes from three sources: its ontology defi-nition, which relates it to other classes; theontological properties that relate to it viardfs:domain relation; and empirical properties,which correlate with it by modifying itsinstances (see Figure 1).

The DS-SWOOGLE dataset includes 17

Table 2. Statistics of DS-FOAF and DS-FOAF-VAR.

DS-FOAF DS-FOAF-VAR

max avg std max avg std

Persons/doc 2,216 30.5 52.3 2,196 5.1 49.4

SeeAlso/doc 2,238 29.3 51.8 2,066 1.9 36.7

Triples/person — — — 3,192 5.5 36.1

Table 1.The seven namespaces most frequently used in RDF documents (from Swoogle).

Namespace URI Number of documents

1 www.w3.org/1999/02/22-rdf-syntax-ns# 200,097 ( 96.9%)

2 http://purl.org/dc/elements/1.1/ 146,923 ( 71.2%)

3 http://purl.org/rss/1.0/ 111,595 ( 54.0%)

4 http://webns.net/mvcb/ 68,330 ( 33.1%)

5 http://xmlns.com/foaf/0.1/ 49,504 ( 24.0%)

6 www.w3.org/2000/01/rdf-schema# 44,656 ( 21.6%)

7 http://purl.org/rss/1.0/modules/content/ 28,607 ( 13.9%)

Ontology definition• rdfs:subClassOf—foaf:Agent• rdfs:label—“Person”

foaf:mbox

owl:Class

foaf:name

“Tim's FOAF File”dc:title

rdfs: domainrdfs: domain

rdfs:subClassOfrdfs:label“Person”

rdf:typerdf:type

foaf:name

SWD1Ontological properties • foaf:mbox • foaf:name

Empirical properties • foaf:name • dc:title

foaf:Agent

SWD2 SWD3

“Tim Finin”

foaf:Person

Figure 1. Three sources of term definition.

88 www.computer.org/intelligent IEEE INTELLIGENT SYSTEMS

ontologies that add to the foaf:Person defini-tion. For example, it is defined both as anowl:Class and as an rdfs:Class and has the namedsuper-classes foaf:Agent, wordnet:Person, geo:Spa-tialThing and con:Person (the geo prefix refers towww.w3.org/2003/01/geo/wgs84_pos#and the con prefix refers to www.w3.org/2000/10/swap/pim/contact#). DS-SWOOGLE

reveals 162 ontological properties of foaf:Per-son, most representing social relations. Sev-enty-four properties exist whose rdfs:domainand rdfs:range are both foaf:Person. DS-SWOOGLE

also finds 558 empirical properties of foaf:Per-son that are populated with instance data.Tables 3 and 4 list the 10 most frequentlyused empirical properties, which suggestthat people publishing personal informationare concerned about privacy. They use theproperty foaf:mobx_sha1sum (which hides thetrue email address) much more frequentlythan foaf:mbox.

The empirical cardinality also shows howusers organize their profiles. The high valuefor maximum cardinality results from an

unusual usage of FOAF vocabulary to builda collection of FOAF documents. In Table4, the properties that documents (but not in-stances) use frequently tend to describe thestrict FOAF documents’ owner.

Extracting social networksExtracting social networks from noisy,

real-world data is challenging, even if theinformation is already encoded in RDFusing well-defined ontologies. The processhas three steps: discovering instances offoaf:Person, deciding which instances denotethe same person and merging their informa-tion, and linking people through varioussocial-relation properties, such as foaf:knows.Determining whether two foaf:Person in-stancesdenote the same person is the critical, but dif-ficult, step. The semantics of FOAF vocabu-lary suggest several heuristics to answer thisquestion, such as

• Named URI. Non-anonymous individualsusing the same URI denote the same person.

• Inverse-functional properties. Inverse-functional properties such as foaf:mboxand foaf:homepage identify unique individ-uals. In practice, we can use one or moreproperties that aren’t strictly inversefunctional, such as foaf:name and foaf:nick,in conjunction with other properties suchas foaf:phone to identify individuals withhigh probability.

• Semantic equality. When two or morevalues of an inverse-functional propertycoexist in an individual’s description,they’re semantically equivalent to identi-fying the same individual.

• rdfs:seeAlso. This property almost alwayslinks to a strict FOAF document wherethe root and referrer persons are thesame.

In our preliminary study of DS-FOAF-VAR, we applied the first three heuristicsand only consider foaf:mbox_sha1sum andfoaf:mbox as inverse-functional properties.We found 18,603 merged people, but only10,247 have unique identifiers. Figure 2shows that the cumulative distribution ofgroup size follows a Zipf distribution.Here, group refers to the collection of indi-viduals being merged as a person.

These heuristics for merging individualscan fail in two ways: inconsistency andseparation. OWL gives one inconsistencycriterion, where cardinality constraintslimit a property’s semantically distinct val-ues. For example, when property P has anowl:cardinality of one when modifying class C,all P’s values for an individual of type Cshould be semantically equivalent. In prac-tice, because most people only have onename, we derive a cardinality constraintover foaf:Person. We can validate a person’ssemantic consistency by checking whetherit has two different names. Separation occurswhen a person’s information remains in twodisjoint groups after merging. This givesrise to a dilemma—applying more mergeheuristics might reduce separation butincrease inconsistency.

Social network analysisSocial network analysis (SNA) is a large,

active research area, so we’ve limited ourwork to studying some of the extractedsocial network’s basic graph features. PeterMika shows more applications of basicSNA measures on a smaller social network(n = 167) extracted from FOAF and otherWeb sources.7

Table 4. Top 10 empirical properties of foaf:Person in DS-FOAF-VAR.

Property usage per document Property usage per instance

1 foaf:name 80% foaf:name 65%

2 foaf:mbox_sha1sum 70% foaf:mbox_sha1sum 60%

3 foaf:nick 51% rdfs:seeAlso 37%

4 foaf:homepage 40% foaf:nick 24%

5 foaf:depiction 35% foaf:homepage 16%

6 foaf:weblog 30% foaf:mbox 14%

7 foaf:knows 28% foaf:weblog 14%

8 foaf:surname 27% foaf:firstName 12%

9 foaf:firstName 27% foaf:surname 12%

10 rdfs:seeAlso 26% foaf:depiction 9%

Table 3. Top 10 empirical properties of foaf:Person in DS-SWOOGLE.

Max. Min. DocumentsProperty cardinality cardinality Number Percent

1 foaf:mbox_sha1sum 12 1 41,403 95%

2 foaf:nick 7 1 36,095 83%

3 foaf:weblog 5 1 35,303 81%

4 rdfs:seeAlso 329 1 27,838 64%

5 foaf:name 4 1 26,749 62%

6 foaf:knows 3,187 1 25,736 59%

7 foaf:homepage 3 1 17,616 41%

8 foaf:dateOfBirth 1 1 12,783 29%

9 foaf:page 3 1 11,255 26%

10 foaf:interest 300 1 10,314 24%

JANUARY/FEBRUARY 2005 www.computer.org/intelligent 89

Degree analysisDegree analysis is an important measure

in analyzing social networks. Our analysisof 14,164 distinct “knows” relations in DS-FOAF-VAR shows that both in-degree andout-degree follow a Zipf distribution (seeFigure 3). We further put person into fourcategories: in only (51.8 percent), out only(5.8 percent), in and out (5.4 percent), andisolated (37.1 percent), according to theirin-degree and out-degree. Such a socialnetwork isn’t well connected because onlya few individuals (in and out) lie betweenothers. Ninety-four percent of “in only”persons are known by only one person.

Patterns of connected componentsWe’ve discovered 834 connected com-

ponents and 6,904 isolated persons. Theconnected components exhibit interestinggraphical patterns: six singletons that link tothemselves, a giant component that has 6,053connected persons, and several stars withmany out-links (the average out-degree forsuch nodes is 6.8). Figure 4 shows a selectionof connected components. We hypothesizethat the FOAF network topology evolves overtime; a FOAF network starts from somedisjointed star-like connected components,which link together to form trees, forests,and eventually a scale-free network.

Our research uses real-world data in anopen, distributed context to provide a data-digest service for efficient data access onthe Semantic Web and reasoning over theknowledge encoded in Semantic Web lan-guages. We are also working on modelingtrust across multiple social networks andbuilding a general architecture for prove-nance- and trust-aware inference in open,distributed, and heterogeneous environ-ments, such as the Web and multiagentsystems.

Figure 5 illustrates our ongoing work onreasoning trust across multiple social net-works and reputation systems. To improvecoverage and connectivity, we integratesocial networks and reputation systems bymapping people to better derive and propa-gate trust relations through social relations.As Figure 5 shows, the gap between thetwo people P. Kolari and A. Sheth is con-nected by mapping the person T. Fininbetween two social networks. The reputa-tion systems might offer default trust tosocial entities.

Our ongoing work focuses on improvingthe efficiency and effectiveness of data-

1 10 100 10,0001,000

Num

ber o

f peo

ple

100,000

10,000

1,000

100

10

1

Group size

OutIn

Figure 3. Cumulative distribution of in-degree and out-degree.

Figure 4. Some connected components in FOAF network.

1 10 100 1,000Nu

mbe

r of i

ndvi

dual

s

100,000

10,000

1,000

100

10

1

Group size

Figure 2. Cumulative distribution of group size.

90 www.computer.org/intelligent IEEE INTELLIGENT SYSTEMS

digest services, social-network extractionand integration, and modeling provenanceand trust for distributed inference services.

AcknowledgmentsDARPA contract F30602-00-0591(DAML) and

NSF awards ITR-IIS-0326460(SPIRE) and ITR-IIS-0325464(SEMDIS) provided partial researchsupport.

References1. L. Ding et al., “How the Semantic Web Is

Being Used: An Analysis of FOAF,” Proc.38th Ann. Hawaii Int’l Conf. System Sciences(HICSS 05), CD-ROM, IEEE CS Press, 2005.

2. L. Ding et al., “Swoogle: A Search and Meta-data Engine for the Semantic Web,” Proc.13th ACM Conf. Information and KnowledgeManagement (CIKM 04), ACM Press, 2004,pp. 652–659.

3. J. Golbeck, B. Parsia, and J. Hendler, “TrustNetworks on the Semantic Web,” Proc.7thInt’l Workshop Cooperative Intelligent Agents(CIA 2003), LNCS 2782, Matthias Klusch etal., eds., Springer-Verlag, 2003, pp. 238–249.

4. L. Ding, L. Zhou, and T. Finin, “Trust BasedKnowledge Outsourcing for Semantic WebAgents,” Proc. IEEE/WIC Int’l Conf. WebIntelligence (WI 03), IEEE CS Press, 2003,pp. 379–387.

5. J.C. Paolillo and E. Wright, “The Challengesof FOAF Characterization,” Proc. 1st Work-shop Friend of a Friend, Social Networkingand the Semantic Web, 2004, www.w3.org/2001/sw/Europe/events/foaf-galway/papers/fp/challenges_of_foaf_characterization.

6. G.A. Grimnes, P. Edwards, and A. Preece,“Learning Meta-descriptions of the FOAFNetwork,” Proc. 3rd Int’l Semantic Web Conf.(ISWC 2004), LNCS 3298, S.A. McIlwraith,D. Plexousakis, and F. van Harmelen, eds.,Springer-Verlag, 2004, pp. 152–165.

7. P. Mika, “Bootstrapping the FOAF-Web: AnExperiment in Social Network Mining,” Proc.1st Workshop Friend of a Friend, Social Net-working and the Semantic Web, 2004, www.w3.org/2001/sw/Europe/events/foaf-galway/papers/fp/bootstrapping_the_foaf_web.

Information and Influence inthe Construction of SharedRealityAndrzej Nowak, Warsaw UniversityRobin R. Vallacher, Florida Atlantic University

Information isn’t hard to come by intoday’s world. Evaluating that information,however, is a daunting task for individuals,groups, and institutions. How do you deter-

mine which information is important? Howdo you resolve conflict among differentpieces of information? How should youassemble information into a meaningfulhigher-order structure? These are difficultenough issues for the individual; their diffi-culty multiplies when a group must coordi-nate individuals’ decisions and act on thebasis of socially validated information.

Research in social psychology suggeststhat individuals interact, in large part, toconstruct a shared reality that consists notonly of shared information, but also ofagreed-upon opinions. In this process, theydon’t simply transmit information. Moreimportantly, they influence one another toarrive at a common interpretation of infor-mation. Most research on social networks isconcerned with information transmission.1–3

We aim to supplement the social-networkperspective by incorporating mechanismsthat govern social influence.

Dynamics of social influenceModels of social influence focus on the

effects that people’s words, actions, or pres-ence have on other people’s thoughts, feel-ings, attitudes, and actions. Numerous exper-iments have shown that three critical factorsdetermine social influence’s impact: thenumber of sources exerting the influence, thesources’ immediacy to the targets, and thesources’ strength. We can describe these vari-ables’ joint effect in terms of a multiplicativefunction.4 Whether the issue is conformity,stage fright, or interest in news events, influ-ence grows approximately as a square root ofthe number of people involved, decreaseswith the square of the distance between thesource and target, and is proportional to thesources’ strength (for example, social statusor credibility). In reality, of course, eachindividual influences and is influenced byother individuals, which provides the basisfor dynamic social impact theory.5 The for-mula below describes the mutual influenceamong individuals who differ in strength andwho occupy different locations in socialspace, where Ii denotes total influence, sj cor-responds to each individual’s strength, and dij

corresponds to the distance between individ-uals i and j.

The formula indicates that individuals

Is

di

N j

ij

= ∑

1 2

2 1 2/

J. Golbeck

knows

knows

knows

H. Chen J. Hendler

F. Perich

Y. Peng

A. Sheth

A. JoshiT. Finin

L. Ding

P. Kolari

Kagal

source

sinkhubisland

T. Finin A. JoshiL. Kagal

H. ChenF. Perich

L. Ding

M.P. Singh

mapTo

coauthor

16

15

28

Golbeck'sTrust Network

Google PageRank

Citeseer Rank

DBLP Coauthor Network

FOAF network Reputation systems

Figure 5. Deriving and propagating trust from multiple sources.

JANUARY/FEBRUARY 2005 www.computer.org/intelligent 91

who are strongest and nearest to the recipi-ent have the greatest impact. With respectto strength, research has established lead-ers’ crucial importance for group-level phe-nomena. Some individuals are clearly moreinfluential than others, either because ofstable characteristics (such as credibility,expertise, or social status) or transientstates (such as motivation to influence).

With respect to locality of influence,research has shown two effects. First, theprobability of interaction (and hence ofinfluence) decreases with the square of thedistance between individuals. In both Chinaand the US, for example, the probabilitythat two individuals will discuss matters ofmutual importance decreases as a square ofthe distance between their residences.6 Sec-ond, information’s impact decreases withthe distance between source and recipient.4

For example, information concerning a traf-fic accident is more likely to attract attentionif it occurs in your town than if it occurs in adistant one.

Individual variations in the influencefunction’s strength and locality are criticalfactors that dictate the dynamics of informa-tion in a social network.7 A model showinghow attitudes combine to form public opin-ion initially demonstrated these factors’importance.5 The social group, modeled in amanner resembling cellular automata, con-sists of n individuals on a 2D grid. Each in-dividual has an opinion (such as pro or conin the simplest case) and a value of persua-sive strength. To simulate influence in socialinteractions, each individual assesses thedegree of support for each position on theissue according to the formula. (In mostsimulations, we also assume a small noise,represented as a small random number thatwe add to a position’s resultant influence.) Ifany opinion’s resultant strength is greater thanthe individual’s current opinion’s strength,his or her opinion changes to match the pre-vailing opinion. The model performs thisprocess for each person in the group until nofurther opinion changes occur.

A typical simulation starts with a clearmajority and a clear minority (such as 60 and40 percent), randomly distributed in the grid.After several rounds of discussion, the groupreaches an equilibrium that typically entails alarger majority and a smaller minority (suchas 90 and 10 percent). Opinions are no longerdistributed randomly; rather, minority opin-ions survive in clusters of like-minded peo-ple, often formed around strong individuals.

Local effectsLocal effects drive the model’s global

dynamics. In the initial random configura-tion, more majority than minority memberssurround the average group member. Thisresults in more minority members convert-ing to the majority opinion than vice versa.Research concerning attitude change ingroups8 and opinion swings on the eve of anelection9 has observed such polarization.The other global property—clustering—reflects the influence function’s locality(that is, neighboring individuals have thestrongest influence). In a clustered society,social interaction provides a biased estimateof your opinions’ prevalence. Opinions inthe global minority thus form a local major-ity. Many social phenomena—from socialattitudes to farming techniques and clothingfashions—display pronounced clustering.

Locality of the influence functionThe influence function’s locality also

shapes social change’s dynamics. Computersocial-change models based on the social-influence formula indicate that transitionsoccur as clusters of new that appear andgrow in the sea of old. From this perspec-tive, interacting groups rather than isolatedindividuals are subject to change. Cluster-ing of change is pervasive across a varietyof social and economic phenomena.10

Research has demonstrated the equivalenceof simulation and empirical data withrespect to the economic and politicaltransformations in Poland following com-munism’s collapse in the late 1980s.11,12 Inthe social-interaction process, individualsdefine and shape their common social real-ity. Because this happens on a local level,it promotes different social realities thatare separated in space. Social transitions

occur as the new reality gains at the oldreality’s expense.

Social-space topologyThe topology of the social space defining

interaction patterns is critical for social-influence dynamics.7,12 Social space’s topol-ogy constrains interactions’ spatial structureand thus dictates cluster formation’s nature,the resultant clusters’ shapes, and the proba-bility of their survival or decay. Locality isessentially absent when the communicationpattern resembles a random structure. Underthese conditions, minority opinion rapidlydecays because minority opinions can’tform clusters.

You can interpret one-dimensional topol-ogy, in which people interact primarily withpeople on their left and right, as a villagestretching along a road. Because of well-pronounced local interactions, this topologyinduces strong clustering but no polarization.In a hierarchical topology, where peopleare divided into groups, subgroups, and soforth, the distance between two individualsdepends on the level at which they belong toa common unit. Two members of the sameresearch unit in a university, for example, arecloser than two members of the same depart-ment, who in turn are closer than two schol-ars from different departments. In hierarchi-cal topology, borders of clusters will usuallycoincide with subgroups at some level. Theopinion structure thus follows social-interac-tion patterns. There is pronounced polariza-tion, the degree of which depends mainly onthe groups’ size on the lowest level, initialproportion of minority, and the distributionof individual differences in strength. Onceformed, clusters in such a topology are quitestable. Clearly, we can envision more elabo-rate social-space geometries, each likely tobe associated with specific dynamics.13

Implications for artificialintelligence

The process by which humans constructsocial reality might prove informative fordesigning rules for interaction among intel-ligent agents. The present model’s primaryimplication is that information isn’t merelyacquired, but also evaluated and negotiatedin a social context. An individual is likelyto receive many pieces of mutually contra-dictory information relevant to almost anyissue. Individuals resolve such conflicts notonly by cognitive means, but also by social-influence mechanisms. Mimicking this

Computer social-change models

based on the social-influence

formula indicate that transitions

occur as clusters of new that

appear and grow in the sea of old.

92 www.computer.org/intelligent IEEE INTELLIGENT SYSTEMS

process might let artificial agents deal withabundant information of varying qualityand degree of conflict. This would requireestablishing mechanisms by which agentsmutually negotiate different informationalelements’ importance and validity. Eachagent evaluates information, and based on this evaluation, decides on its furthertransmission.

The magnitude of influence determineswhich information individuals will adoptand transmit. It’s also conceivable that thesheer volume of information, weighted byits sources’ strength, and the information’scoherent versus contradictory nature mightserve as criteria for evaluating informationflow. Depending on the information’s coher-ence, the agents might interact further toevaluate the information or reach a judgmenton the basis of information that can provide aplatform for action.

In this process, agents differ in theirstrength, which is equivalent to social sta-tus or credibility. To some degree, networkdesigners might predefine strength. Design-ers specify some nodes, for example, to be more important due to their greater pro-

cessing capacity (supercomputers versusdesktop computers) or their superior accessto information (such as more precise sen-sors). Strength can also change in the inter-action process. Nodes broadcasting earlyinformation that turns out to be importantand valid are likely to acquire higher sta-tus than nodes that are slow to broadcast orthat broadcast unimportant or invalidinformation.

In larger networks, the abundance ofinformation (including noise) might proveoverpowering to agents. Moreover, infor-mation’s importance and validity are likelyto vary depending on its location in socialspace. Information might be important andvalid in one location but irrelevant or falsein another. Interactions’ locality enablessocial means of validating informationwhile allowing for diverse decisions in dif-ferent locations. The possibility of localevents in the network depends on the inter-action space’s topology. By choosing theappropriate topology of interaction patternsamong artificial agents, you can choose themost appropriate dynamics to handle dif-ferent tasks. To ensure that all agents pre-

serve safety standards, for example, youmight design a network with no topology,thereby making it impossible for a localgroup of agents to negotiate a lower safetystandard. If an adaptation is crucial for alocal changing environment, however, inter-action based on proximity in that environ-ment might be most beneficial. Such aninteraction structure would not only let dif-ferent subgroups of agents negotiate themost appropriate information for individualand group action, but would also let themtry different adaptation strategies in differ-ent locations. The rest of the network’sagents might then adopt the adaptationstrategies that work out best.

Connections in social networks providelinks for information transmission andsocial influence. For adaptations in the realworld, both processes are vital. Differentnetwork features dictate the dynamics ofinformation transmission and social influ-ence. Social-network formalisms are flexi-ble. In principle, we can accommodate thesocial-influence features we’ve described insocial-network models,14 thereby makingthem similar to attractor neural networks.15

Pedro Domingos is an associate professor inthe University of Washington’s Department ofComputer Science and Engineering. Contacthim at [email protected].

Peter Mika is a PhD student in the Free Univer-sity Amsterdam’s Department of Business Infor-matics and Department of Management andOrganizations. Contact him at [email protected].

Jennifer Golbeck is a PhD candidate at theUniversity of Maryland, College Park. Contacther at [email protected].

Li Ding is a PhD student and a graduate researchassistant at the University of Maryland, Balti-more County. Contact him at [email protected].

Tim Finin is a professor in the University ofMaryland, Baltimore County’s Department ofComputer Science and Electrical Engineering.Contact him at [email protected].

Anupam Joshi is an associate professor ofcomputer science and electrical engineering atthe University of Maryland, Baltimore County.Contact him at [email protected].

Andrzej Nowak is a psychology professor atthe University of Warsaw and a professor at theWarsaw School of Psychology and FloridaAtlantic University. Contact him at [email protected].

Robin R. Vallacher is a psychology professorat Florida Atlantic University and a researchaffiliate at Warsaw University’s Center forComplex Systems. Contact him [email protected].

Although such a description is well suitedto capture social interactions’ existing struc-ture, it provides few clear cues about influ-ence’s dynamics in the network.

Networks of artificial agents, by mimic-king social interaction, should be capableof dealing with real-world information.Social psychology might prove informativein designing interaction rules for such agentsin the same way that cognitive psychologyproved instructive for constructing individ-ual agents.

References

1. A.L. Barabasi, Linked: How Everything IsConnected to Everything Else and What ItMeans, Blume Books, 2003.

2. S. Wasserman, K. Faust, and M.R. Granovet-ter, eds., Social Network Analysis: Methodsand Applications, Cambridge Univ. Press,1994.

3. D.J. Watts, Six Degrees: The Science of aConnected Age, W.W. Norton, 2003.

4. B. Latané, “The Psychology of SocialImpact,” American Psychologist, vol. 36, no.4, 1981, pp. 343–356.

5. A. Nowak, J. Szamrej, and B. Latané, “FromPrivate Attitude to Public Opinion: ADynamic Theory of Social Impact,” Psycho-logical Rev., vol. 97, no. 3, 1990, pp. 362–376.

6. B. Latané et al., “Distance Matters: PhysicalSpace and Social Influence,” Personality andSocial Psychology Bull., vol. 21, no. 8, 1995,pp. 795–805.

7. M. Lewenstein, A. Nowak, and B. Latané,“Statistical Mechanics of Social Impact,”Physical Rev. A, vol. 45, no. 2, 1993, pp.703–716.

8. S. Moscovici and M. Zavalloni, “The Groupas a Polarizer of Attitudes,” J. Personality andSocial Psychology, vol. 12, no. 1969, pp.124–135.

9. E. Noelle-Neumann, The Spiral of Silence:Public Opinion—Our Social Skin, Univ. ofChicago Press, 1984.

10. F. Perroux, “Economic Space: Theory andApplication,” Quarterly J. Economics, vol.

64, Feb. 1950, pp. 89–104.

11. A. Nowak et al., “Simulating the Coordina-tion of Individual Economic Decisions,”Physica A, vol. 297, 2002, pp. 613–630.

12. A. Nowak and M. Lewenstein, “ModelingSocial Change with Cellular Automata,” Mod-eling and Simulation in the Social Sciencesfrom the Philosophy of Science Point of View,K.T. Hegselmann and U. Müller, eds., KluwerAcademic, 1996, pp. 249–285.

13. A. Nowak, B. Latané, and M. Lewenstein,“Social Dilemmas Exist in Space,” SocialDilemmas and Cooperation, U. Schulz, W.Albers, and U. Müller, eds., Springer-Verlag,1994, pp. 114–131.

14. A. Nowak, R.R. Vallacher, and E. Burnstein,“Computational Social Psychology:A NeuralNetwork Approach to Interpersonal Dynam-ics,” Computer Modeling of Social Processes,W. Liebrand, A. Nowak, and R. Hegselman,eds., Sage, 1998, pp. 97–125.

15. J.J. Hopfield, “Neural Networks and Physi-cal Systems with Emergent Collective Com-putational Abilities,” Proc. Nat’l Academy ofSciences, vol. 79, no. 8, 1982, pp. 2554–2558.

JANUARY/FEBRUARY 2005 www.computer.org/intelligent 93

Advertising Sales OfficesMatthew BertholfPhone: +1 785 843 1234 ext. 267Fax: +1 785 843 [email protected]

Sandy Brown10662 Los Vaqueros Circle, Los Alamitos, CA90720-1314; phone +1 714 821 8380; fax +1 714 8214010; [email protected]

Advertiser/Product IndexJanuary/February 2005

For production information, conference and classified

advertising, contact Marian Anderson, IEEE Intelligent

Systems, 10662 Los Vaqueros Circle, Los Alamitos, CA

90720-1314; phone +1 714 821 8380; fax +1 714 821

4010; [email protected]; www.computer.org.

N E X T I S S U E

March/April: Planning with Templates

Despite the promise of intelligent systems technology, interactive intel-ligent systems are still hard and expensive to build, hard to understand,hard to use, and hard to evolve and modify through their use. Onepromising new technology area is agent-based intelligent forms as typ-ified in DARPA’s research and demonstration program in Active Tem-plates. Such forms let people create and use applications with simple,yet smart interfaces for getting the information they need and for cap-turing decisions they make on the basis of this information. This spe-cial issue will present the state of the practice, descriptions of researchand evolving technology, application-focused case studies, and aroadmap for required research. In addition, this issue will documentthe substantial progress that has taken place in the past 10 years. TheFebruary 1995 IEEE Expert (now IEEE Intelligent Systems) describedthe state of the practice in knowledge-based planning and schedulingwith case studies of application in the military context and to user-cen-tered software engineering. Significant progress since then has yieldedpowerful, yet underutilized new capabilities that can improve intelli-gent systems applications in many areas.

Page No.

IEEE MAS&S 2005 4