How Does NetBase Achieve the Best Accuracy for Understanding Consumers Online?

NETBASE SEPTEMBER 2010


The Big Opportunity from Social Media: Understanding Consumers Online

Business professionals around the world are recognizing that social media represents not only a way of communicating with customers but also a strategic data source for making decisions. Consumers are sharing a constant stream of opinions, emotions, and behaviors online, and this chatter represents valuable information about what they are thinking and feeling about your brand.

The social media universe is growing at an astonishing rate. According to Harris Interactive, almost two-thirds of online Americans now use social media [1]. Facebook reports that the average user creates 90 pieces of content each month [2]. As many as 80 percent of consumers report that they do online research before they purchase certain consumer electronics products [3]. Every hour, NetBase’s natural language processing (NLP) engine finds and indexes 500,000+ blog entries, status updates, forum comments, and news articles that consumers have posted [4]. With this volume of data available, those companies that can break through the noise to truly understand their consumers have unique opportunities to:

• Optimize their messaging, advertising, and packaging
• Identify “hot topics,” market trends, and unmet consumer needs
• Measure and monitor their social media brand equity in real time
• Make better decisions across their businesses

Where Is Social Media Analysis Today?

To date, businesses have used social media data primarily to monitor the quantity of conversations about their brand (“buzz”) and to get a quick read on sentiment, that is, whether online posts are positive, negative, or neutral. While useful in concept, automated sentiment analysis has been plagued by questions of data quality [5] because the accuracy of the first generation of technologies has been low, under 50 percent by most estimates. In other words, traditional automated sentiment analysis tools incorrectly interpret the content of an online post nearly half the time, deciding, for example, that a negative is a positive or a positive is a neutral.

In order to realize the strategic value of social media data, businesses need answers to three key questions:

• Can sentiment be a useful measure to capture from social media?
• How accurate is the machine-scored analysis upon which sentiment data relies?
• What actionable information could we extract from social media other than sentiment?

This white paper will provide insights into these questions. We will discuss the challenges of accurately measuring consumer opinions, explore the type of insights that social media analysis can provide above and beyond sentiment, and describe how NetBase achieves the best accuracy on the market for sentiment analysis and beyond.

Accurate Sentiment Analysis Reveals Social Brand Health

Sentiment analysis focuses on analyzing the content of online posts, determining whether they are positive, negative, or neutral, and aggregating the sentiments detected into a single generic score. Sentiment analysis can serve as a practical, cost-effective way to track general brand health over time, assuming you have a way to measure sentiment accurately.

What Makes Sentiment Analysis Inaccurate?

On the surface, it may seem that sentiment analysis would be an easy thing to do. However, two factors have held back automated efforts to date:

• The complexity of language itself
• The volume and variety of content on the social Web

First-generation solutions for automated sentiment analysis typically assess sentiment based on some form of statistical keyword-matching, where the technology looks for the occurrence of “good” and “bad” words in conjunction with the brand being tracked. The problem with this approach is that understanding language is much more complex than simply looking for the presence of positive or negative words. That’s why most studies show that keyword matching is wrong more than 50 percent of the time. Even the newer natural language processing technologies must analyze content at a very deep level in order to be able to infer sentiment accurately.

To understand language, a technology must be able to account for the fact that meaning can change depending on the context in which a word is used. It needs to look not just at the word itself but also at the “connective tissue” surrounding that word in the sentence. For example, the sentence, “The iPhone has never been good,” is actually a negative statement in spite of the fact that it uses the word “good.” An almost identical sentence, “The iPhone has never been this good,” is positive. In the sentence, “I like using my iPhone, but I hate the way that applications work on the Droid,” the words “iPhone” and “hate” occur close together but are not associated with each other.

One Small Word Makes a Big Difference

The iPhone has never been good. The iPhone has never been this good.
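The weakness of keyword matching can be sketched in a few lines. The following is a minimal illustration, not any vendor's actual algorithm; the word lists and the `keyword_sentiment` function are hypothetical and far smaller than a real lexicon would be.

```python
# Hypothetical word lists for illustration only.
POSITIVE_WORDS = {"good", "like", "love"}
NEGATIVE_WORDS = {"bad", "hate", "awful"}


def keyword_sentiment(sentence: str) -> str:
    """Label a sentence by counting positive and negative keywords only."""
    words = [w.strip(".,!?\"'").lower() for w in sentence.split()]
    score = sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


examples = [
    "The iPhone has never been good.",       # actually negative
    "The iPhone has never been this good.",  # actually positive
]
labels = [keyword_sentiment(s) for s in examples]
```

Because the function sees only the word “good,” it gives both sentences the same label, and it cannot tell that “hate” in the Droid example refers to a different product than “like.”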

Furthermore, accurate sentiment analysis requires a new kind of scalability. A technology must be able to process billions of sentences, and it must also be able to address millions of variations in expression within those billions of sentences. For example, the type of language people use in tweets is different from that used in status updates or reviews. Bloggers tend to use much richer language than contributors to other types of social media. Many social media posts are full of misspellings, poor grammar, colloquialisms, forum-specific acronyms, and rapidly evolving “slanguage” or “urban words.” A technology that cannot take all of this variation into account will either work only for a limited set of content types or miss the accuracy mark.

NetBase Introduces a New Level of Accuracy

NetBase’s NLP engine represents a big leap in accurately analyzing content from the social media universe. It is nearly 90 percent accurate in determining not only sentiment but deeper insights like opinions and emotions because it takes a very deep approach to understanding language. In addition, our NLP engine is highly scalable; you can access a massive, always-on index containing 12 months of social media content and get insights in seconds, on an unlimited number of brands, with no setup or professional services required.

NetBase’s NLP engine does not count words or analyze text; it reads sentences, evaluates grammatical sentence patterns, and organizes results to be fully searchable on a wide variety of attributes. NetBase analyzes social media content in two steps: parsing and normalization.

First, NetBase’s NLP engine parses each sentence it captures from social media at a very deep level. This process is similar to the sentence diagramming that students do in a high school English class—it identifies and links the subjects, objects, verbs, adjectives, and other linguistic patterns in the sentence to extract deep and accurate understanding of what is being said. By analyzing this “connective tissue” within each sentence, our NLP engine can account for the complexities in language that make keyword-matching algorithms inaccurate.

Surface Form (Surface Variations):

• I want an iPhone because they’re adorable.
• I covet an iPhone as it is adorable.
• iPhones are wanted by many people due to how adorabel they are.
• I want the iPhone, which is adorable...
• I want the iPhone. It is adorable.
• I want an iPhone as it has lots of apps and is adorable...

Logical Form: Want iPhone adorable

Unrelated Posts:

• I want my iPhone to look adorable.
• I want my iPhones to show you the adorable shoes I want.

Analyzing the “Connective Tissue” of a Sentence

Next, our NLP engine normalizes all the parsed sentences to make them easy to search. It takes sentences (what we call “sound bites”) and stores them, based on the type of insight they reveal, in a single, consistent format, regardless of the structure of the underlying sentences. This allows NetBase to instantly display insights that meet a certain set of search criteria without re-indexing them or re-parsing them. Normalization is a fundamental part of NetBase’s unique value because it allows our tool to display not just positives and negatives but also deeper-level insights embedded within the content.
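One way to picture normalization is as a single record type plus an index keyed by insight type. The sketch below is an assumption made for illustration: the `SoundBite` fields, the `SoundBiteIndex` class, and the hand-written records are hypothetical and do not describe NetBase's actual storage format. Parsing is assumed to have already happened upstream.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SoundBite:
    """Hypothetical normalized record for one parsed sentence."""
    insight_type: str  # e.g. "behavior", "emotion", "like", "dislike"
    brand: str         # normalized subject of the insight
    value: str         # normalized term, e.g. "want" or "adorable"
    source_text: str   # original sentence, kept for drill-down


class SoundBiteIndex:
    """Groups sound bites by (insight_type, brand) so searches need no re-parsing."""

    def __init__(self) -> None:
        self._by_key = defaultdict(list)

    def add(self, bite: SoundBite) -> None:
        self._by_key[(bite.insight_type, bite.brand)].append(bite)

    def search(self, insight_type: str, brand: str) -> list:
        # .get avoids creating empty entries for keys that were never stored.
        return list(self._by_key.get((insight_type, brand), []))


index = SoundBiteIndex()
# Differently worded sentences end up in the same normalized form.
index.add(SoundBite("behavior", "iphone", "want", "I covet an iPhone as it is adorable."))
index.add(SoundBite("behavior", "iphone", "want", "I want the iPhone. It is adorable."))
index.add(SoundBite("like", "iphone", "adorable", "I want the iPhone. It is adorable."))

wants = index.search("behavior", "iphone")
```

The point of the design is that a search touches only the stored records; the original sentences are kept for drill-down but never parsed again.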

For example, suppose a brand manager for a particular car model wants to understand what people like and dislike about her cars. With NetBase, she can see what percentage of people like or dislike the styling, colors, handling, safety features, pricing, and much more, and she can compare these social metrics across competing brands.
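The kind of per-attribute comparison described here amounts to simple counting once the insights are normalized. The following sketch assumes a hypothetical list of (brand, attribute, polarity) records; the brand names and the data are invented for illustration and are not NetBase results.

```python
from collections import Counter

# Hypothetical normalized insights: (brand, attribute, "like" or "dislike").
insights = [
    ("CarA", "styling", "like"),
    ("CarA", "styling", "like"),
    ("CarA", "styling", "dislike"),
    ("CarA", "pricing", "dislike"),
    ("CarB", "styling", "like"),
    ("CarB", "pricing", "like"),
]


def like_share(records, brand):
    """Return {attribute: percent of 'like' mentions} for one brand."""
    likes, totals = Counter(), Counter()
    for rec_brand, attribute, polarity in records:
        if rec_brand != brand:
            continue
        totals[attribute] += 1
        if polarity == "like":
            likes[attribute] += 1
    return {attr: 100.0 * likes[attr] / totals[attr] for attr in totals}


car_a = like_share(insights, "CarA")
car_b = like_share(insights, "CarB")
```

Comparing `car_a` and `car_b` attribute by attribute gives the side-by-side view of competing brands.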

NetBase is the only company that has a massive repository of more than nine billion sound bites of pre-parsed social media content that is searchable in real time. This index uses a very efficient, scalable database so that users can get results in a matter of seconds and quickly drill down to the level of individual posts. Our NLP engine parses and stores new content at Internet scale, handling huge volumes on a daily basis.

Working with many of the strongest brands in the world, we have optimized the NetBase NLP engine specifically for understanding social media and the Web. In addition to standard English, our extensive lexicon includes a wide variety of “urban words” and phrases, alternative spellings, and abbreviations common in social media, as well as common misspellings. We are constantly incorporating new rules into this lexicon based on the work of our internal linguistics experts, ongoing testing that we do using “crowdsourced” human evaluators (described later in this white paper), and on feedback from customers. Because NetBase develops its products using two-week release cycles, our NLP engine adapts very rapidly.

Beyond Sentiment Analysis: Positives and Negatives Are Just One Way of Looking at the World

The NetBase NLP engine makes it possible to read and understand billions of sentences very accurately. However, positives and negatives are just one way of looking at the world. There are other pieces to the consumer insights puzzle beyond sentiment: opinion, emotion, and behavior. The real question that many decision makers ask is, Why do consumers think and feel as they do? NetBase not only makes sentiment data trustworthy but gives you ways to answer these deeper questions.

Many NetBase customers view sentiment analysis as just the first step towards these deeper levels of understanding. It is a departure point from which they dig down into the underlying opinions, emotions, and behaviors expressed in the comments that NetBase classifies and stores.

Consider the case of a large CPG company that used a number of automated online listening tools, including NetBase, to monitor and understand an emerging issue in its customer base. The aggregate sentiment measures provided some insights into the severity of the issue, but the manufacturer needed to move beyond sentiment to take action. Analyzing the opinions and emotions expressed in both positive and negative posts, the manufacturer was able to understand the underlying consumer motivators and rapidly craft a response strategy.

Another phenomenon provides further illustration of the importance of deeper understanding. We have found that even in situations where negative sentiment surges, it’s common for positive sentiment to surge as well, driven by “loyalists” who are singing the praises of their brand in the face of adversity. For example, we used ConsumerBase to look at chatter about the Toyota Prius during the period in early 2010 when the company was facing a firestorm of media criticism over its accelerator pedals’ tendency to stick and its braking problems.* During this same period, loyal Prius owners were filling the social Web with postings that expressed their love and passion for their vehicles. Looking at sentiment alone would not have told Toyota the whole story of its consumers’ feelings.

* Unless otherwise stated, all brand examples are based on publicly available Internet and social media content and do not reflect the usage or opinions of their respective owners.

Social media analysis on Toyota Prius. (August 2009-July 2010)

In another instance, we used ConsumerBase to look at some of the biggest brands in golf: Callaway, Nike, TaylorMade, Titleist, and Wilson. We discovered that consumers like all these brands—sentiment is positive across the board. However, in looking at what they were saying, we discovered that Callaway stands out in that consumers truly “love” its products.

Social media analysis of passion about select golf brands. (June 2010)

These examples show the power of understanding consumers online. For example, Toyota would probably want to know what the specifics are beneath the negative sentiments and see how opinions about “expensive” and “ugly” compare to opinions about “brake problems” or “accelerator.” The brand managers would benefit by tracking how specific reported behaviors change over time, such as “buy” versus “not buy.” They might also want to know how passionate consumers are about the Prius: Do they say they “love” it? “Like” it? “Don’t trust” it? At NetBase, we’re calling these new measures “social metrics.” Social metrics include social purchase intent, social satisfaction, social recommend, and social passion. Together, they provide much more actionable information than sentiment alone.

When evaluating the accuracy of tools for social media analysis, it’s important to look not only at the accuracy of sentiment analysis but also at the tools’ ability to accurately aggregate and organize data for more levels of analysis than “positive” and “negative.”

How Accurate Is NetBase?

At NetBase, we know that our NLP engine provides significant accuracy advantages over traditional sentiment analysis tools because we test its accuracy on an ongoing basis. In addition to internal testing, we do studies that compare a selection of NetBase results to similar assessments conducted by humans. The most recent NetBase accuracy study, conducted between August 23 and September 3, 2010, showed accuracy between 87% and 99% for the four different types of insights that NetBase delivers: sentiments, likes and dislikes, emotions, and behaviors.

NetBase Accuracy Study Methodology

In the study, we constructed a sample of 2,500 results retrieved from ConsumerBase across five products: the Apple iPhone, the Chevrolet Camaro, Taco Bell, Wal-Mart, and fish oil. Then, we created a survey that asked human evaluators to assess the accuracy of ConsumerBase’s categorization of the sentiments, likes and dislikes, emotions, and behaviors being expressed by reviewing the sound bites and reporting what they believed the authors’ opinions to be.

Representative sound bite from the NetBase sample.

Survey item testing sentiment in this sound bite.

A small percentage of the survey items were “gold” items, meaning that we encoded a correct answer to test the reliability of the evaluators.

Finally, we recruited “crowdsourced” workers based in the United States to serve as evaluators. When agreeing to participate in a project like the accuracy study, crowdsourced workers have no knowledge of the purpose of the survey and therefore are able to provide unbiased opinions.

Once the survey was online, the workers could choose to complete as many items as they wished, provided that they were able to answer the “gold” items correctly. “Gold” items were automatically shown frequently towards the beginning of the task and less frequently thereafter. When a worker got a “gold” item wrong, the survey alerted the worker to the correct answer, providing feedback and training on the task. If a worker got less than 60% of “gold” items correct, their judgments were not collected. The surveys remained online until at least three “trustworthy” workers had provided a judgment for each sound bite being tested.
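The worker screening described above can be sketched as two small checks. This is an illustration under stated assumptions, not the survey platform's actual logic: the function names, worker ids, and sample data are hypothetical; only the 60% threshold and the minimum of three trustworthy judgments come from the study description.

```python
GOLD_THRESHOLD = 0.60  # from the study: below 60% correct, judgments are not collected


def trusted_workers(gold_results):
    """gold_results maps worker id -> list of booleans (gold item answered correctly)."""
    trusted = set()
    for worker, outcomes in gold_results.items():
        if outcomes and sum(outcomes) / len(outcomes) >= GOLD_THRESHOLD:
            trusted.add(worker)
    return trusted


def needs_more_judgments(judgments, trusted, minimum=3):
    """Return sound bite ids that still have fewer than `minimum` trusted judgments."""
    pending = []
    for bite_id, by_worker in judgments.items():
        count = sum(1 for worker in by_worker if worker in trusted)
        if count < minimum:
            pending.append(bite_id)
    return pending


# Hypothetical data for illustration.
gold_results = {
    "w1": [True, True, True, False],    # 75% correct
    "w2": [True, False, False, False],  # 25% correct, not trusted
    "w3": [True, True, False],          # about 67% correct
    "w4": [True, True, True],           # 100% correct
}
judgments = {
    "bite-1": {"w1": "positive", "w3": "positive", "w4": "positive"},
    "bite-2": {"w1": "negative", "w2": "negative", "w3": "negative"},
}

trusted = trusted_workers(gold_results)
pending = needs_more_judgments(judgments, trusted)
```

In this example the second sound bite stays open because one of its three judgments came from a worker who failed the gold-item check.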

Because the survey data consisted of high-precision results, the crowdsourced workers agreed on what was being expressed in the majority of the sound bites. In cases where they did not agree among themselves, we eliminated the sound bites from our analysis.
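The agreement filter and the resulting accuracy figure can be expressed as follows. This sketch rests on two assumptions that the study description does not spell out: that “agreement” means all trusted evaluators gave the same label, and that accuracy is the share of agreed-upon sound bites where the engine's label matches the human label. The function name and sample data are hypothetical.

```python
def measured_accuracy(items):
    """items: list of (engine_label, [labels from trusted human evaluators]).

    Sound bites on which the humans disagree are dropped; accuracy is the
    share of the remaining items where the engine matches the human label.
    Returns (accuracy or None, number of items kept).
    """
    agreed = [
        (engine, humans[0])
        for engine, humans in items
        if humans and len(set(humans)) == 1
    ]
    if not agreed:
        return None, 0
    correct = sum(1 for engine, human in agreed if engine == human)
    return correct / len(agreed), len(agreed)


# Hypothetical sample for illustration.
sample = [
    ("positive", ["positive", "positive", "positive"]),
    ("negative", ["negative", "negative", "negative"]),
    ("positive", ["negative", "negative", "negative"]),  # engine differs from humans
    ("positive", ["positive", "negative", "positive"]),  # humans disagree: dropped
]
accuracy, kept = measured_accuracy(sample)
```

Note that dropping the disputed items means the reported figure describes accuracy on sound bites whose meaning humans found unambiguous.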

The study showed the following accuracy levels:

TYPE OF RESULT                      ACCURACY MEASURED
Sentiment (positive or negative)    87%
Likes and Dislikes                  87%
Behaviors                           91%
Emotions                            99%

Conclusion

Automated language understanding will never be 100% accurate even with the best of technologies. There will always be some limitations; for example, sarcasm is often difficult for even the deepest NLP algorithms to identify. (Humans also have problems interpreting sarcasm for many of the same reasons.) Nonetheless, the near-90 percent accuracy that NetBase delivers is a game-changer for businesses. For the first time ever, you can use social media analysis not just to measure buzz but also to drive better business decisions. As you look more deeply at sentiment analysis and other ways you could leverage social media today and in the future, here are three key considerations:

• Make sure you have the smartest possible social media analysis tools. You should know the accuracy of the data that powers your decisions, and you should look for solutions that provide you with the best possible results. NetBase delivers by far the best accuracy in the market. We will continue to improve on our accuracy and make it transparent through studies like the one described above, so that our customers know they can rely upon NetBase data to make business decisions.

• Look beyond sentiment. The real value of social media analysis is in surfacing actionable insights, not just in generating metrics. Sentiment is important, but going beyond sentiment is what will make online listening part of the fabric of your organization.

• Consider the value that accurate social media analysis can provide throughout your organization:
  – Brand management and marketing professionals can track social brand health and make better messaging, packaging, and strategy decisions.
  – Market research professionals can conduct quick social media market research or netnography, or take a “fast first look” prior to starting a lengthy or expensive market research study.
  – Product development professionals can track competitors and find new product ideas.
  – Advertising professionals can improve ad targeting, refine ad copy, and track ad effectiveness.
  – Public relations professionals can augment social listening strategies by spot-checking issues for deeper understanding and historical trending.
  – Sales representatives can tailor their communications to real issues that their prospects face.
  – Customer service teams can uncover new product issues through online chatter, track the importance of issues, and improve support activities.

1. http://www.harrisinteractive.com/NewsRoom/HarrisPolls/tabid/447/mid/1508/articleId/403/ctl/ReadCustom%20Default/Default.aspx

2. http://www.facebook.com/press/info.php?statistics

3. Internal research reported by NetBase customers

4. Statistics from NetBase natural language processing engine

5. See Forrester Research, “Trends 2010: Listening Platforms, Findings From Forrester’s Listening Platforms Wave™ Evaluation,” September 2010.

About NetBase

NetBase provides tools for understanding consumer opinions, emotions, and behaviors as expressed in social media and the Web. Our state-of-the-art natural language processing (NLP) engine reads billions of conversations from more than 75 million social media sources. It automatically organizes up to one year of brand-related chatter to determine not only sentiment but also deep, actionable insights. NetBase NLP is nearly 90 percent accurate, so our customers have the trusted information they need to make better business decisions. Customers like Coca-Cola and Procter & Gamble are turning to NetBase because our tools are smarter, faster, and cheaper than any alternative on the market. Based in the heart of Silicon Valley, NetBase is a privately held company. For more information, visit: www.netbase.com.

©2010 NetBase Solutions, Inc.

NetBase Solutions, Inc.
2087 Landings Drive
Mountain View, CA 94043
P 650.810.2100
F 650.968.4872
www.netbase.com