Pentaho and NoSQL
JaMU – Jakarta, 7 March 2014
Pentaho and NoSQL
Java Meet Up (JaMU), Jakarta
7th March, 2014
Feris Thia [email protected]
ABOUT ME
Founder
2007 2013
Feris Thia
PHI-Integration
ABOUT ME
Book Author
Feris Thia
November 2013
ABOUT ME
Community Manager
Feris Thia
Excel Indonesia User Group (EIUG)
Pentaho User Group Indonesia (Pentaho-ID)
2008 (~1000 members)
2013 (~5000 members)
ABOUT ME
PHI-Integration Clients
Community Manager
Feris Thia
AGENDA
DATA PREPARATION: What is it and why is it important?
PENTAHO DATA INTEGRATION: Popular Open Source ETL
NOSQL: An Emerging Non-Relational Database Technology
PROBLEMS?
Image source: http://www.huntbigsales.com/winning-in-the-meeting-after-the-meeting/
What caused sales to increase in this area? Did something unusual happen?
We need some time to prepare additional data to answer that.
WHAT?? So we cannot make any decisions until the data is ready?
Yes, sir…
Image Source: http://wrapbootstrap.com/preview/WB0KDM51J/
TYPICAL SOLUTION
SOPHISTICATED REPORTING OR DASHBOARD APPLICATION!
Image Source: http://reallybadboss.com/wp-content/uploads/2012/02/frustration.jpg
PROBLEMS REMAIN…
Time Spent on Data Preparation: 80%
Data Quality: 50%
Extract, Transformation & Load: 30%
DATA PREPARATION IS THE KEY
1. Entry Systems
2. Data Preparation
3. Reporting (Basic Data Presentation)
4. Performance Dashboard (Visualization)
Note: Data preparation is often underestimated.
DATA WAREHOUSE
1. Entry Systems
2. Data Warehouse
3. Business Intelligence
DATA WAREHOUSE
CHALLENGES
EXTRACT
INTEGRATION: of many data sources
INCREMENTAL: extract only changes
DATA SIZE: big data
INFRASTRUCTURE: network failure, high latency, slow I/O, etc.
DATA QUALITY: missing data, conversion, etc.
PROTOCOL: driver availability, reliability, etc.
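The INCREMENTAL challenge (extract only changes) is often handled with a high-water mark: remember the latest change timestamp already extracted and pull only newer rows on the next run. A minimal sketch, assuming every source row carries an `updated_at` timestamp; the row layout and the `extract_changes` helper are hypothetical and are not taken from the talk or from Pentaho Data Integration.

```python
from datetime import datetime

# Hypothetical source rows; the "updated_at" column is an assumption.
source_rows = [
    {"id": 1, "amount": 100, "updated_at": datetime(2014, 3, 1)},
    {"id": 2, "amount": 250, "updated_at": datetime(2014, 3, 5)},
    {"id": 3, "amount": 75, "updated_at": datetime(2014, 3, 7)},
]

def extract_changes(rows, last_watermark):
    """Return the rows changed after last_watermark and the new watermark."""
    changed = [r for r in rows if r["updated_at"] > last_watermark]
    # If nothing changed, the watermark stays where it was.
    new_watermark = max((r["updated_at"] for r in changed), default=last_watermark)
    return changed, new_watermark

changed, watermark = extract_changes(source_rows, datetime(2014, 3, 4))
print([r["id"] for r in changed], watermark.date())
```

Running the extraction again with the returned watermark yields no rows until the source changes, which is the point of the technique.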
TRANSFORM
NORMALIZE
DENORMALIZE
SPLIT / MERGE
DATA REDUCTION (aggregate, etc.)
TRANSPOSE
TEXT PARSING
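Two of the transform types listed here, data reduction by aggregation and transposing, can be sketched in a few lines. This is only an illustration in plain Python, not how a PDI step is implemented; the sales rows and the function names are made up for the example.

```python
from collections import defaultdict

# Hypothetical sales rows used only for illustration.
sales = [
    {"region": "Jakarta", "month": "Jan", "amount": 100},
    {"region": "Jakarta", "month": "Feb", "amount": 150},
    {"region": "Bandung", "month": "Jan", "amount": 80},
    {"region": "Jakarta", "month": "Jan", "amount": 20},
]

def aggregate(rows):
    """Data reduction: sum the amount per (region, month)."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["region"], row["month"])] += row["amount"]
    return dict(totals)

def transpose(totals):
    """Transpose: one entry per region, one column per month."""
    pivot = defaultdict(dict)
    for (region, month), amount in totals.items():
        pivot[region][month] = amount
    return dict(pivot)

totals = aggregate(sales)
pivot = transpose(totals)
print(pivot)
```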
LOAD
PERFORMANCE: of many data sources
CHANGES: structure, data type, column size, etc.
DATA SIZE: big data
INFRASTRUCTURE: network failure, high latency, slow I/O, etc.
DATA MAPPING: sync with correlated data
OUTPUT FORMAT: Excel, PDF, HTML, RDBMS, etc.
DEMO
Data structure changes to increase SQL query performance.
Pentaho Data Integration
Open Source ETL
FEATURES AND BENEFITS
• Open Source
• Cost Efficient
• More than 200 modules
• Multi OS Platform
• Works with emerging Big Data platforms
• Low Learning Curve
DEMO
1. Basic Extract and Transformation
2. More I/O
3. Helper Table (Closure)
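One common reading of "Helper Table (Closure)" is a closure table: a helper table that stores every ancestor/descendant pair of a hierarchy so that queries need no recursion. The sketch below works under that assumption and uses Python's built-in `sqlite3` module; the `node` and `closure` tables and their columns are hypothetical and may differ from what the demo actually showed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER)")
conn.executemany(
    "INSERT INTO node (id, parent_id) VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2)],  # 1 is the root; 4 is a grandchild
)
conn.execute(
    "CREATE TABLE closure (ancestor INTEGER, descendant INTEGER, depth INTEGER)"
)

def build_closure(conn):
    """Fill the closure table by walking from every node up to the root."""
    parents = dict(conn.execute("SELECT id, parent_id FROM node"))
    rows = []
    for node_id in parents:
        ancestor, depth = node_id, 0
        while ancestor is not None:
            rows.append((ancestor, node_id, depth))
            ancestor, depth = parents[ancestor], depth + 1
    conn.executemany("INSERT INTO closure VALUES (?, ?, ?)", rows)

build_closure(conn)

# All descendants of node 1 come from a single non-recursive query.
descendants = [
    row[0]
    for row in conn.execute(
        "SELECT descendant FROM closure "
        "WHERE ancestor = ? AND depth > 0 ORDER BY descendant",
        (1,),
    )
]
print(descendants)
```

The trade-off is extra rows at load time in exchange for simpler and usually faster reads, which fits the deck's theme of preparing data ahead of reporting.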
NoSQL
Not only SQL
TIMELINE
Emergence of open source NoSQL
1998: NoSQL coined
2004: Google's MapReduce paper published
2006: Hadoop started
2007: MongoDB started, Neo4J initial release
2008: Apache HBase, Apache Cassandra
2009: Redis initial release
2012: Google Spanner paper published
NOSQL GROUPS
DOCUMENT: MongoDB, CouchDB, Riak
WIDE COLUMN: Cassandra, HBase, Hypertable
GRAPH: Neo4J, OrientDB
KEY-VALUE <K, V>: Redis, MemcacheDB, SimpleDB
NOSQL VS SQL
Source: http://gigaom.com/2010/07/12/nosql-pioneers-are-driving-the-webs-manifest-destiny/
Data Store Type | Use Cases | Advantages | Disadvantages | Key Products
Key-Value | In-memory cache, web-site analytics, log file analysis | Simple, replication, versioning, locking, transactions, and sorting, web-accessible, schema-less, distributed | Simple, small set of data types, limited transaction support | Redis, Scalaris, Tokyo Cabinet
Tabular or Columnar | Data mining, analytics | Rapid data aggregation, scalable, versioning, locking, web-accessible, schema-less, distributed | Limited transaction support | Google BigTable, HBase or HyperTable, Cassandra
Document Store | Document management, CRM, business continuity | Stores and retrieves unstructured documents, map/reduce, web-accessible, schema-less, distributed | Limited transaction support | CouchDB, MongoDB, Riak
Traditional Relational | Transaction processing, typical corporate workloads | Well documented and supported, mature code, widely implemented in production | Cost, vertical scaling, increased complexity | Oracle, Microsoft SQL Server, MySQL Cluster
NOSQL VS SQL
• Schemas are much more flexible
• Non-relational (no joins)
• Horizontal Scalability
• Master – Slave
• Peer-to-peer
• Data Pipeline
– Expressions
– Functional Programming
• ACID (Atomicity, Consistency, Isolation, Durability)
• BASE (Basically Available, Soft state, Eventual consistency)
• CAP (Consistency, Availability, Partition Tolerance)
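Horizontal scalability usually means spreading records over several machines. One simple way to do that is hash partitioning, sketched below with in-memory dictionaries standing in for servers. This is an illustrative assumption, not a description of how any product named in this deck distributes data; `shard_for`, `put`, and `get` are hypothetical helpers.

```python
import hashlib

SHARD_COUNT = 3
# Each dict stands in for one server holding part of the data.
shards = [dict() for _ in range(SHARD_COUNT)]

def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count

def put(key, value):
    shards[shard_for(key, SHARD_COUNT)][key] = value

def get(key):
    # Only the shard that owns the key is consulted.
    return shards[shard_for(key, SHARD_COUNT)].get(key)

for i in range(10):
    put(f"user:{i}", {"posting": i * 10})

print(get("user:4"), [len(s) for s in shards])
```

Because a lookup touches only one shard, reads and writes by key scale with the number of shards, while joins across shards become expensive, which is one reason these stores tend to avoid them.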
DB-ENGINES.COM DB RANKING
AS OF 7 MARCH 2014
Rank | Last Month | DBMS | Database Model | Score | Changes
1 | 1 | Oracle | Relational DBMS | 1491.8 | -8.43
2 | 2 | MySQL | Relational DBMS | 1290.21 | 1.83
3 | 3 | Microsoft SQL Server | Relational DBMS | 1205.28 | -8.99
4 | 4 | PostgreSQL | Relational DBMS | 235.06 | 4.61
5 | 5 | MongoDB | Document store | 199.99 | 4.81
6 | 6 | DB2 | Relational DBMS | 187.32 | -1.14
7 | 7 | Microsoft Access | Relational DBMS | 146.48 | -6.4
8 | 8 | SQLite | Relational DBMS | 92.98 | -0.03
9 | 9 | Sybase ASE | Relational DBMS | 81.55 | -6.33
10 | 10 | Cassandra | Wide column store | 78.09 | -2.23
MongoDB
Document Oriented Database
• Schemaless
• Distributed
• Auto Sharding
• Map Reduce Capabilities
• Multi Platform
• Structures
– Database
– Collections
– Documents
• Document
– A record is a document
– Similar to JSON Objects
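The database / collection / document structure and the "similar to JSON objects" point can be pictured with plain Python data. This is only a sketch: the collection is a list, each document is a dict, and the second document's fields are an assumption added for illustration. It does not use a MongoDB driver.

```python
import json

# A "collection" sketched as a list; each document is a dict.
koleksi = [
    {"nama": "PHI-Integration", "type": "Company"},
    {"nama": "Feris Thia", "type": "Person", "groups": ["EIUG", "Pentaho-ID"]},
]

# Documents in one collection need not share the same fields (schemaless).
field_sets = [sorted(doc.keys()) for doc in koleksi]

# Each document round-trips through JSON text unchanged.
encoded = [json.dumps(doc) for doc in koleksi]
decoded = [json.loads(text) for text in encoded]
print(field_sets, decoded == koleksi)
```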
MongoDB
Basic Commands & Expressions
• MongoDB Shell
• Insert
db.koleksi.insert({nama: "PHI-Integration", type: "Company"})
• Insert / Update
db.koleksi.update({nama: "PHI-Integration"}, {name: "Lightora"}, {upsert: true})
• Delete
db.koleksi.remove({nama: "PHI-Integration", type: "Company"})
• Read / Query
db.koleksi.find({nama: "PHI-Integration", $and: [{posting: {$gt: 100}}, {posting: {$lt: 200}}]})
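To make the Read / Query expression easier to follow, here is a small Python sketch of how such a filter can be evaluated against documents. It supports only equality, `$gt`, `$lt`, and `$and`, and it is a simplified model written for this example, not MongoDB's query engine. The sample documents and the `matches` function are hypothetical.

```python
import operator

# Supported comparison operators; a small subset for illustration only.
OPERATORS = {"$gt": operator.gt, "$lt": operator.lt}

def matches(doc, query):
    """Return True if doc satisfies every condition in query."""
    for key, condition in query.items():
        if key == "$and":
            # Every sub-query in the list must match.
            if not all(matches(doc, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):
            # Operator conditions such as {"$gt": 100} on one field.
            if key not in doc:
                return False
            if not all(OPERATORS[op](doc[key], arg) for op, arg in condition.items()):
                return False
        elif doc.get(key) != condition:
            # Plain value means equality.
            return False
    return True

docs = [
    {"nama": "PHI-Integration", "posting": 150},
    {"nama": "PHI-Integration", "posting": 250},
    {"nama": "Lightora", "posting": 150},
]
query = {
    "nama": "PHI-Integration",
    "$and": [{"posting": {"$gt": 100}}, {"posting": {"$lt": 200}}],
}
found = [d for d in docs if matches(d, query)]
print(found)
```

Only the first document is returned: the second fails the `$lt` condition and the third fails the equality test on `nama`.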
MONGODB DEMO
1. Basic Commands
2. PDI Extract and Load
3. Aggregation Framework
Neo4j
Graph Database
Node
Relationship
Properties
Cypher
Neo4J
Basic Utility, Commands & Expressions
• Neo4J Web Admin
• Create Node
CREATE (n {property_name: "property_value"})
• Create Relation
CREATE n-[:RELATION]->m
• Where:
– n and m are identifiers
– :RELATION is the relation name
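The two CREATE statements above map onto a very small data model: nodes that carry properties, and named, directed relations between them. The toy class below sketches that model in memory. It is an illustration only and says nothing about how Neo4j stores data; the `Graph` class and its methods are hypothetical.

```python
class Graph:
    """A toy in-memory property graph; not Neo4j's storage model."""

    def __init__(self):
        self.nodes = {}          # node id -> properties dict
        self.relationships = []  # (start id, relation name, end id)
        self._next_id = 0

    def create_node(self, **properties):
        """Rough counterpart of: CREATE (n {property_name: ...})."""
        node_id = self._next_id
        self._next_id += 1
        self.nodes[node_id] = properties
        return node_id

    def create_relation(self, start, name, end):
        """Rough counterpart of: CREATE n-[:RELATION]->m."""
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both nodes must exist before relating them")
        self.relationships.append((start, name, end))

graph = Graph()
n = graph.create_node(property_name="property_value")
m = graph.create_node(property_name="another_value")
graph.create_relation(n, "RELATION", m)
print(graph.nodes[n], graph.relationships)
```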
Neo4J
Basic Commands & Expressions
• Matching and Returning Objects
START emil=node:people(name='Emil')
MATCH emil-[:MARRIED_TO]-madde
RETURN madde
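The MATCH pattern above has no arrowhead, so it matches the MARRIED_TO relation in either direction. The sketch below mimics that behaviour over a plain list of relations. The sample data (including the node names "Madde" and "Johan") and the `match_either_direction` helper are assumptions made for this example.

```python
# Hypothetical data: (start node, relation name, end node).
relationships = [
    ("Emil", "MARRIED_TO", "Madde"),
    ("Emil", "KNOWS", "Johan"),
    ("Madde", "KNOWS", "Johan"),
]

def match_either_direction(start, relation):
    """Nodes joined to start by relation, ignoring the relation's direction."""
    found = []
    for source, name, target in relationships:
        if name != relation:
            continue
        if source == start:
            found.append(target)
        elif target == start:
            found.append(source)
    return found

print(match_either_direction("Emil", "MARRIED_TO"))
print(match_either_direction("Madde", "MARRIED_TO"))
```

Starting from either spouse finds the other, even though the relation was stored in one direction only.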
HIERARCHICAL MODEL
Neo4j Case Demo
Root
Child 1, Child 2, Child 3, Child 4, Child 5
Q&A
CONTACT ME
Universitas Multimedia Nusantara
New Media Tower, Lv. 12
Scientia Boulevard St.
Tangerang, Banten, 15811
+6221-7038-7738 (phone)
+628176-474-525 (mobile)
https://www.facebook.com/feris.thia
@FerisThia
BIG THANK YOU!