A Non-Geek's Big Data Cheat Sheet - SAS
Copyright © 2012, SAS Institute Inc. All rights reserved.
A NON-GEEK’S BIG DATA CHEAT SHEET:
FIVE QUESTIONS FOR SAVVY TECHNOLOGY LEADERS
TAMARA DULL, DIRECTOR OF EMERGING TECHNOLOGIES
#SASGIS16
@tamaradull
Big data is not new.
[Figure: examples of data sources. POS data, CRM, trouble tickets, financial data, loyalty card data, email, PDF files, RFID tags, spreadsheets, word processing documents, GPS, web log data, social media data, photos, satellite images, blogs, forums, XML data, clickstream data, videos, mobile data, website content, call center transcripts, RSS feeds, audio files. Chart labels: 20%, 80%.]
WHAT’S TRENDING?
PART 1 OF 3
The market is growing.
SOURCE: http://wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_2013-2017
The success rate is okay, but not great.
People issues trump technology issues.
Analytics keep them coming back.
THE 5 QUESTIONS
PART 2 OF 3
THE 5 QUESTIONS
1. What can Hadoop do that my data warehouse can’t?
2. We’re not doing “big” data, so why do we need Hadoop?
3. Is Hadoop enterprise-ready?
4. Isn’t a data lake just the data warehouse revisited?
5. What are some of the pros and cons of a data lake?
QUESTION 1: WHAT CAN HADOOP DO THAT MY DATA WAREHOUSE CAN’T?
1. Store data more cheaply.
2. Process data more quickly (and cheaply).
QUESTION 2: WE’RE NOT DOING “BIG” DATA, SO WHY DO WE NEED HADOOP?
» Stage structured data.
» Process structured data.
» Archive any data.
» Process any data.
» Access any data (via data warehouse).
» Access any data (via Hadoop).
QUESTION 3: IS HADOOP REALLY ENTERPRISE-READY?
Are we there yet?
For all organizations: No
For your organization: Maybe
QUESTION 4: ISN’T A DATA LAKE JUST THE DATA WAREHOUSE REVISITED?
DATA WAREHOUSE vs. DATA LAKE

             DATA WAREHOUSE                      DATA LAKE
DATA         structured, processed               structured / semi-structured / unstructured, raw
PROCESSING   schema-on-write                     schema-on-read
STORAGE      expensive for large data volumes    designed for low-cost storage
AGILITY      less agile, fixed configuration     highly agile, configure and reconfigure as needed
SECURITY     mature                              maturing
USERS        business professionals              data scientists et al.
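
The PROCESSING row (schema-on-write vs. schema-on-read) can be illustrated with a small sketch. This is only an illustration and is not taken from the presentation: the records, field names and function names are hypothetical, and plain Python lists stand in for the warehouse and the lake.

```python
import json

# Hypothetical raw records (assumed for illustration, not from the slides).
RAW_EVENTS = [
    '{"customer_id": 1, "amount": "19.99", "channel": "web"}',
    '{"customer_id": 2, "amount": "5.00"}',
    '{"customer_id": "n/a", "note": "free-form text"}',
]


def validate_on_write(raw):
    """Schema-on-write: force the record into a fixed schema before storing it."""
    record = json.loads(raw)
    return {
        "customer_id": int(record["customer_id"]),
        "amount": float(record["amount"]),
    }


def load_warehouse(raw_events):
    """Store only records that fit the schema; everything else is rejected."""
    stored, rejected = [], []
    for raw in raw_events:
        try:
            stored.append(validate_on_write(raw))
        except (KeyError, ValueError, TypeError):
            rejected.append(raw)
    return stored, rejected


def load_lake(raw_events):
    """Schema-on-read: store every record as-is, with no upfront checks."""
    return list(raw_events)


def read_amounts(lake):
    """Apply one schema at query time: records that carry an amount."""
    parsed = (json.loads(raw) for raw in lake)
    return [float(r["amount"]) for r in parsed if "amount" in r]


def read_notes(lake):
    """Apply a different schema to the same raw data: free-form notes."""
    parsed = (json.loads(raw) for raw in lake)
    return [r["note"] for r in parsed if "note" in r]


if __name__ == "__main__":
    stored, rejected = load_warehouse(RAW_EVENTS)
    lake = load_lake(RAW_EVENTS)
    print(len(stored), "stored in the warehouse,", len(rejected), "rejected")
    print(len(lake), "stored in the lake")
    print("amounts read from the lake:", read_amounts(lake))
    print("notes read from the lake:", read_notes(lake))
```

In this sketch the warehouse path rejects the record that does not fit its schema, while the lake keeps all of the raw records and lets each reader decide later which fields matter. That trade-off is the one the DATA, PROCESSING and AGILITY rows describe.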
QUESTION 5: WHAT ARE SOME OF THE PROS AND CONS OF A DATA LAKE?
Strengths: lower costs, one-stop data shopping
Weaknesses: data management, security
Opportunities: discovery, advanced analytics
Threats: status quo, skills
A COMPARISON & CONTRAST EXERCISE
PART 3 OF 3
COMPARISON & CONTRAST
A FUNCTIONAL COMPARISON: TRADITIONAL & BIG DATA
Business requirements compared (Traditional vs. Big Data):
» Discovery of unexplored business questions
» Clean, transformed, high-quality aggregated data
» Low latency, interactive reports, OLAP
» High volumes of raw, highly granular, unstructured data
» Exploratory analysis of preliminary data
A COST COMPARISON: THE TCOD MODEL
Challenge: Which platform is the most cost-effective – EDW or Hadoop?
The Total Cost of Data (TCOD) model:
Calculates the cost of using data over a 5-year period
Includes these costs:
» System and data administration
» Data integration
» Query development
» Procedural program development
» Analytic application development
Free downloads:
Special Report: http://www.wintercorp.com/tcod-report
Spreadsheet: http://www.wintercorp.com/tcod-spreadsheet
TCOD EXAMPLE 1: BUILDING A DATA WAREHOUSE
Requirements:
Large number of data sources, users, complex queries, analyses and analytic applications
Data integration and integrity
Reusability and agility to accommodate rapidly changing business requirements and long data life
Data volume: 500 TB
Source: Special Report – Big Data: What Does It Really Cost?, Wintercorp, 2013
TCOD EXAMPLE 2: BUILDING A DATA REFINERY
Objective: Refine the sensor output of large industrial diesel engines
Requirements:
Rapid, intensive processing of a small number of closely-related data sets
Analysis reads the entire dataset
Life of the raw data is relatively short
Small group of experts collaborate on analysis
Data volume: 500 TB
Source: Special Report – Big Data: What Does It Really Cost?, Wintercorp, 2013
A COST COMPARISON: TCOD 5-YEAR SUMMARY (IN USD MILLIONS)

                            Example 1: Data Warehouse             Example 2: Data Refinery
Cost                        Data Warehouse Platform    Hadoop     Data Warehouse Appliance    Hadoop
System cost                 $44.6                      $1.4       $22.7                       $1.4
  Initial acquisition       $10.8                      $0.2       $5.5                        $0.2
  Upgrades                  $16.4                      $0.3       $8.4                        $0.3
  Maintenance/support       $15.9                      $0.2       $8.2                        $0.2
  Power/space/cooling       $1.5                       $0.6       $0.6                        $0.7
Administration              $7.7                       $8.5       $0.8                        $0.8
Application development     $16.5                      $36.0      $6.6                        $7.2
ETL                         $18.4                      --         --                          --
Complex queries             $88.7                      $475.0     --                          --
Analysis                    $88.7                      $219.0     --                          --
Total Cost of Data          $265.0                     $740.0     $30.0                       $9.3

Example 1: HADOOP 3X MORE EXPENSIVE
Example 2: HADOOP 1/3rd THE COST
Source: Special Report – Big Data: What Does It Really Cost?, Wintercorp, 2013
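
The arithmetic behind the two verdicts can be checked with a short sketch. It adds up the top-level cost lines from the summary above (in USD millions) and compares Hadoop with the warehouse option in each example. The dictionary layout and function names are assumptions made for illustration, and entries shown as "--" are simply left out. The component sums come out within about half a million dollars of the stated totals, presumably because the published figures are rounded.

```python
# Top-level cost lines in USD millions, copied from the TCOD 5-year summary.
# System cost already includes acquisition, upgrades, maintenance and power,
# so those sub-items are not listed again. Entries shown as "--" are omitted.
COSTS = {
    "Example 1: Data Warehouse": {
        "warehouse": {
            "system": 44.6,
            "administration": 7.7,
            "application development": 16.5,
            "etl": 18.4,
            "complex queries": 88.7,
            "analysis": 88.7,
        },
        "hadoop": {
            "system": 1.4,
            "administration": 8.5,
            "application development": 36.0,
            "complex queries": 475.0,
            "analysis": 219.0,
        },
    },
    "Example 2: Data Refinery": {
        "warehouse": {
            "system": 22.7,
            "administration": 0.8,
            "application development": 6.6,
        },
        "hadoop": {
            "system": 1.4,
            "administration": 0.8,
            "application development": 7.2,
        },
    },
}


def total_cost(components):
    """Sum the cost lines for one platform, rounded to one decimal place."""
    return round(sum(components.values()), 1)


def hadoop_ratio(example):
    """Hadoop total divided by the warehouse total for one example."""
    platforms = COSTS[example]
    return total_cost(platforms["hadoop"]) / total_cost(platforms["warehouse"])


if __name__ == "__main__":
    for example, platforms in COSTS.items():
        print(example)
        for name, components in platforms.items():
            print(f"  {name}: {total_cost(components)} million USD")
        print(f"  Hadoop / warehouse ratio: {hadoop_ratio(example):.2f}")
```

With these inputs the ratio is a little under 3 for the data warehouse example and a little under one third for the data refinery example, which matches the two headline conclusions. In Example 1 the difference comes almost entirely from the complex queries and analysis lines rather than from system cost.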
sas.com
IT’S A BIG DATA WORLD OUT THERE.
NOW LET’S BE SAFE.
Tamara Dull
@tamaradull