A Non-Geek's Big Data Cheat Sheet - SAS

A NON-GEEK’S BIG DATA CHEAT SHEET: FIVE QUESTIONS FOR SAVVY TECHNOLOGY LEADERS

TAMARA DULL, DIRECTOR OF EMERGING TECHNOLOGIES

#SASGIS16 @tamaradull

Copyright © 2012, SAS Institute Inc. All rights reserved.


Big data is not new.

Figure: examples of data sources (the graphic is annotated 20% / 80%):
POS data, CRM, trouble tickets, financial data, loyalty card data, email, PDF files, RFID tags, spreadsheets, word processing documents, GPS, web log data, social media data, photos, satellite images, blogs, forums, XML data, clickstream data, videos, mobile data, website content, call center transcripts, RSS feeds, audio files


TODAY’S AGENDA


WHAT’S TRENDING?

PART 1 OF 3


The market is growing.

SOURCE: http://wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_2013-2017


The success rate is okay, but not great.


People issues trump technology issues.


Analytics keep them coming back.


THE 5 QUESTIONS

PART 2 OF 3


THE 5 QUESTIONS

1. What can Hadoop do that my data warehouse can’t?

2. We’re not doing “big” data, so why do we need Hadoop?

3. Is Hadoop enterprise-ready?

4. Isn’t a data lake just the data warehouse revisited?

5. What are some of the pros and cons of a data lake?


QUESTION 1: WHAT CAN HADOOP DO THAT MY DATA WAREHOUSE CAN’T?

1. Store data more cheaply.

2. Process data more quickly (and cheaply).


QUESTION 2: WE’RE NOT DOING “BIG” DATA, SO WHY DO WE NEED HADOOP?

» Stage structured data.
» Process structured data.
» Archive any data.
» Process any data.
» Access any data (via the data warehouse).
» Access any data (via Hadoop).


QUESTION 3: IS HADOOP REALLY ENTERPRISE-READY?

Are we there yet?
For your organization: Maybe.
For all organizations: No.


QUESTION 4: ISN’T A DATA LAKE JUST THE DATA WAREHOUSE REVISITED?

DATA WAREHOUSE vs. DATA LAKE

             | DATA WAREHOUSE                   | DATA LAKE
DATA         | structured, processed            | structured / semi-structured / unstructured, raw
PROCESSING   | schema-on-write                  | schema-on-read
STORAGE      | expensive for large data volumes | designed for low-cost storage
AGILITY      | less agile, fixed configuration  | highly agile, configure and reconfigure as needed
SECURITY     | mature                           | maturing
USERS        | business professionals           | data scientists et al.
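To make the PROCESSING row of the comparison concrete, here is a minimal Python sketch, assuming plain in-memory lists stand in for the warehouse and the lake and JSON strings stand in for incoming records. `SCHEMA`, `apply_schema`, `write_with_schema`, `write_raw`, and `read_with_schema` are hypothetical names for illustration only, not part of any Hadoop or data warehouse API. Schema-on-write enforces the schema at load time, so a nonconforming record is rejected up front; schema-on-read stores every record raw and applies a schema only when someone reads the data.

```python
import json

# Hypothetical target schema: field name -> conversion function.
SCHEMA = {"customer_id": int, "amount": float}


def apply_schema(raw_record, schema):
    """Parse a JSON record and convert each schema field; raises if a field is missing or invalid."""
    record = json.loads(raw_record)
    return {field: convert(record[field]) for field, convert in schema.items()}


def write_with_schema(store, raw_record):
    """Schema-on-write: the schema is enforced before anything is stored."""
    store.append(apply_schema(raw_record, SCHEMA))


def write_raw(store, raw_record):
    """Schema-on-read, load step: keep the record exactly as it arrived."""
    store.append(raw_record)


def read_with_schema(store, schema):
    """Schema-on-read, query step: apply a schema now, skipping records that don't fit it."""
    rows = []
    for raw_record in store:
        try:
            rows.append(apply_schema(raw_record, schema))
        except (KeyError, ValueError, TypeError):
            continue
    return rows


incoming = [
    '{"customer_id": "42", "amount": "19.99"}',
    '{"customer_id": "7", "note": "no amount recorded"}',
]

warehouse, lake = [], []
for raw_record in incoming:
    write_raw(lake, raw_record)  # the lake accepts every record as-is
    try:
        write_with_schema(warehouse, raw_record)
    except (KeyError, ValueError, TypeError):
        pass  # the warehouse rejects the record that has no "amount"

print(warehouse)                                     # [{'customer_id': 42, 'amount': 19.99}]
print(len(lake))                                     # 2
print(read_with_schema(lake, {"customer_id": int}))  # [{'customer_id': 42}, {'customer_id': 7}]
```

The trade-off mirrors the table: the warehouse pays the modeling cost up front and gets clean, predictable rows, while the lake defers that cost, which keeps loading cheap and flexible but shifts data-quality work to whoever reads the data.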


QUESTION 5: WHAT ARE SOME OF THE PROS AND CONS OF A DATA LAKE?

Strengths: lower costs, one-stop data shopping
Weaknesses: data management, security
Opportunities: discovery, advanced analytics
Threats: status quo, skills


A COMPARISON & CONTRAST EXERCISE

PART 3 OF 3


COMPARISON & CONTRAST

A FUNCTIONAL COMPARISON: TRADITIONAL & BIG DATA

Business requirements, compared across Traditional and Big Data:

» Discovery of unexplored business questions
» Clean, transformed, high-quality aggregated data
» Low latency, interactive reports, OLAP
» High volumes of raw, highly granular, unstructured data
» Exploratory analysis of preliminary data


A COST COMPARISON: THE TCOD MODEL

Challenge: Which platform is more cost-effective, EDW or Hadoop?

The Total Cost of Data (TCOD) model:

Calculates the cost of using data over a 5-year period

Includes these costs:

» System and data administration

» Data integration

» Query development

» Procedural program development

» Analytic application development

Free downloads:
Special Report: http://www.wintercorp.com/tcod-report
Spreadsheet: http://www.wintercorp.com/tcod-spreadsheet


TCOD EXAMPLE 1: BUILDING A DATA WAREHOUSE

Requirements:

Large number of data sources, users, complex queries, analyses and analytic applications

Data integration and integrity

Reusability and agility to accommodate rapidly changing business requirements and long data life

Data volume: 500 TB

Source: Special Report – Big Data: What Does It Really Cost?, Wintercorp, 2013


TCOD EXAMPLE 2: BUILDING A DATA REFINERY

Objective: Refine the sensor output of large industrial diesel engines

Requirements:

Rapid, intensive processing of a small number of closely related data sets

Analysis reads the entire dataset

Life of the raw data is relatively short

Small group of experts collaborate on analysis

Data volume: 500 TB

Source: Special Report – Big Data: What Does It Really Cost?, Wintercorp, 2013


A COST COMPARISON: TCOD 5-YEAR SUMMARY (USD MILLIONS)

Example 1: Data Warehouse (Data Warehouse Platform vs. Hadoop). Example 2: Data Refinery (Data Warehouse Appliance vs. Hadoop). DW = data warehouse.

Cost                     | Ex. 1: DW Platform | Ex. 1: Hadoop | Ex. 2: DW Appliance | Ex. 2: Hadoop
System cost              | $44.6              | $1.4          | $22.7               | $1.4
  Initial acquisition    | $10.8              | $0.2          | $5.5                | $0.2
  Upgrades               | $16.4              | $0.3          | $8.4                | $0.3
  Maintenance/support    | $15.9              | $0.2          | $8.2                | $0.2
  Power/space/cooling    | $1.5               | $0.6          | $0.6                | $0.7
Administration           | $7.7               | $8.5          | $0.8                | $0.8
Application development  | $16.5              | $36.0         | $6.6                | $7.2
ETL                      | $18.4              | --            | --                  | --
Complex queries          | $88.7              | $475.0        | --                  | --
Analysis                 | $88.7              | $219.0        | --                  | --
Total Cost of Data       | $265.0             | $740.0        | $30.0               | $9.3

Source: Special Report – Big Data: What Does It Really Cost?, Wintercorp, 2013

Example 1: Hadoop 3x more expensive. Example 2: Hadoop 1/3 the cost.
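As a quick arithmetic check on the TCOD summary table, the sketch below re-adds the top-level line items for each platform and divides the Hadoop total by the warehouse total. The figures are transcribed from the table (USD millions); treating each "--" entry as 0.0 is an assumption, and `total_cost_of_data` and `hadoop_cost_ratio` are hypothetical helper names, not part of the Wintercorp spreadsheet.

```python
# Five-year TCOD line items in USD millions, transcribed from the summary table.
# Order: System cost, Administration, Application development, ETL, Complex queries, Analysis.
# Assumption: each "--" entry in the table is treated as 0.0.
EXAMPLE_1 = {  # Example 1: building a data warehouse
    "Data Warehouse Platform": [44.6, 7.7, 16.5, 18.4, 88.7, 88.7],
    "Hadoop": [1.4, 8.5, 36.0, 0.0, 475.0, 219.0],
}
EXAMPLE_2 = {  # Example 2: building a data refinery
    "Data Warehouse Appliance": [22.7, 0.8, 6.6, 0.0, 0.0, 0.0],
    "Hadoop": [1.4, 0.8, 7.2, 0.0, 0.0, 0.0],
}


def total_cost_of_data(line_items):
    """Sum the top-level line items; System cost already includes its four sub-items."""
    return round(sum(line_items), 1)


def hadoop_cost_ratio(example, warehouse_platform):
    """Hadoop's total cost of data divided by the warehouse platform's total."""
    return total_cost_of_data(example["Hadoop"]) / total_cost_of_data(example[warehouse_platform])


for label, example in (("Example 1", EXAMPLE_1), ("Example 2", EXAMPLE_2)):
    for platform, items in example.items():
        print(f"{label}, {platform}: ${total_cost_of_data(items)} million")

ratio_1 = hadoop_cost_ratio(EXAMPLE_1, "Data Warehouse Platform")
ratio_2 = hadoop_cost_ratio(EXAMPLE_2, "Data Warehouse Appliance")
print(f"Example 1: Hadoop costs {ratio_1:.1f}x the warehouse")  # 2.8x
print(f"Example 2: Hadoop costs {ratio_2:.2f}x the appliance")  # 0.31x
```

The re-added totals ($264.6M and $739.9M for Example 1, $30.1M and $9.4M for Example 2) differ from the published totals by at most $0.4 million, presumably because the source rounds each line item. The resulting ratios, about 2.8x and 0.31x, are consistent with the slide's roughly 3x and roughly one-third conclusions.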


KEY TAKEAWAYS


IT’S A BIG DATA WORLD OUT THERE.

NOW LET’S BE SAFE.

Tamara Dull


@tamaradull