Oracle's Machine Learning: Newest Features and Road Map

Charlie Berger, MS Engineering, MBA
Sr. Director Product Management, Machine Learning, AI and Cognitive Analytics
[email protected]
www.twitter.com/CharlieDataMine

Move the Algorithms; Not the Data!

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


Analytics Value vs. Maturity

[Figure: value of analytics (vertical axis) plotted against analytical maturity (horizontal axis); value rises through four stages]

• Reports & Dashboards: What happened?
• Diagnostic Analysis & Reports: Why did it happen?
• Predictive / Machine Learning: What WILL happen?
• “ML Enabled” Applications: Automated ML applications

Oracle Machine Learning’s Differentiators


Oracle’s Data Management + Machine Learning

Market Observations:

• Machine learning & “AI” now Must-Have requirements

• Separate islands for data management and machine learning don’t work

• Enterprises whose data science teams most rapidly extract insights win

Conclusions:

• Must “operationalize” ML insights throughout enterprise to realize results

• Multi-lingual ML: SQL, R, Python, Workflow UI, Notebooks, Embedded ML & Predictive Appls

• Market evolving towards hybrid Data Management + ML that can both manage and “think” about data—a “thinking Database”


Algorithms Operate on Data

ML and AI are just “Algorithms”

Move the Algorithms; Not the Data!; It Changes Everything!


Oracle’s Machine Learning & Advanced Analytics

Fastest Way to Deliver Enterprise-wide Predictive Analytics

Major Benefits

• Data remains in Database & Hadoop
• Model building and scoring occur in-database
• Use R packages with data-parallel invocations
• Leverage investment in Oracle IT
• Eliminate data duplication
• Eliminate separate analytical servers
• Deliver enterprise-wide applications
• GUI for ML/Predictive Analytics & code generation
• R interface leverages database as HPC engine

[Figure: elapsed-time comparison of the two approaches, with the difference labeled “Savings”]

Traditional ML (Hours, Days or Weeks)

• Data Extraction
• Data Prep & Transformation
• Data Mining Model Building
• Data Mining Model “Scoring”
• Data Prep. & Transformation
• Data Import

Oracle’s in-DB Machine Learning (Secs, Mins or Hours)

• Model “Scoring”, Embedded Data Prep
• Data Preparation
• Model Building

Oracle’s Machine Learning & Adv. Analytics Algorithms

CLASSIFICATION
– Naïve Bayes
– Logistic Regression (GLM)
– Decision Tree
– Random Forest
– Neural Network
– Support Vector Machine
– Explicit Semantic Analysis

CLUSTERING
– Hierarchical K-Means
– Hierarchical O-Cluster
– Expectation Maximization (EM)

ANOMALY DETECTION
– One-Class SVM

TIME SERIES
– State of the art forecasting using Exponential Smoothing
– Includes all popular models, e.g. Holt-Winters with trends, seasons, irregularity, missing data

REGRESSION
– Linear Model
– Generalized Linear Model
– Support Vector Machine (SVM)
– Stepwise Linear Regression
– Neural Network
– LASSO *

ATTRIBUTE IMPORTANCE
– Minimum Description Length
– Principal Component Analysis (PCA)
– Unsupervised Pair-wise KL Divergence
– CUR decomposition for row & attribute importance

ASSOCIATION RULES
– A priori / market basket

PREDICTIVE QUERIES
– Predict, cluster, detect, features

SQL ANALYTICS
– SQL Windows, SQL Patterns, SQL Aggregates

FEATURE EXTRACTION
– Principal Component Analysis (PCA)
– Non-negative Matrix Factorization
– Singular Value Decomposition (SVD)
– Explicit Semantic Analysis (ESA)

TEXT MINING SUPPORT
– Algorithms support text
– Tokenization and theme extraction
– Explicit Semantic Analysis (ESA) for document similarity

STATISTICAL FUNCTIONS
– Basic statistics: min, max, median, stdev, t-test, F-test, Pearson’s, Chi-Sq, ANOVA, etc.

R PACKAGES
– CRAN R Algorithm Packages through Embedded R Execution
– Spark MLlib algorithm integration

EXPORTABLE ML MODELS
– REST APIs for deployment

• OAA (Oracle Data Mining + Oracle R Enterprise) and ORAAH combined
• OAA includes support for Partitioned Models, Transactional, Unstructured, Geo-spatial, Graph data, etc.

STATISTICAL FUNCTIONS

– Descriptive statistics (e.g. median, stdev, mode, sum, etc.)

– Hypothesis testing (t-test, F-test, Kolmogorov-Smirnov test, Mann-Whitney test, Wilcoxon Signed Ranks test)

– Correlation analysis (parametric and nonparametric, e.g. Pearson’s test for correlation, Spearman's rho coefficient, Kendall's tau-b correlation coefficient)

– Ranking functions

– Cross Tabulations with Chi-square statistics

– Linear regression

– ANOVA (Analysis of variance)

– Test Distribution fit (e.g. Normal distribution test, Binomial test, Weibull test, Uniform test, Exponential test, Poisson test, etc.)

– Statistical Aggregates (min, max, mean, median, stdev, mode, quantiles, plus x sigma, minus x sigma, top n outliers, bottom n outliers)

Oracle’s Machine Learning & Adv. Analytics Algorithms*

ANALYTICAL SQL

– SQL Windows

– SQL Aggregate functions

– LAG/LEAD functions

– SQL for Pattern Matching

– Additional approximate query processing: APPROX_COUNT, APPROX_SUM, APPROX_RANK

– Regular Expressions


Statistical Functions

Simple SQL Syntax: Statistical Comparisons (t-tests)

Compare average purchase amounts, men vs. women, grouped by INCOME_LEVEL

STATS_T_TEST_INDEPU (SQL) example; p-values < 0.05 show statistically significant differences in the amounts purchased by men vs. women

SELECT SUBSTR(cust_income_level, 1, 22) income_level,
       AVG(DECODE(cust_gender, 'M', amount_sold, null)) sold_to_men,
       AVG(DECODE(cust_gender, 'F', amount_sold, null)) sold_to_women,
       STATS_T_TEST_INDEPU(cust_gender, amount_sold, 'STATISTIC', 'F') t_observed,
       STATS_T_TEST_INDEPU(cust_gender, amount_sold) two_sided_p_value
FROM customers c, sales s
WHERE c.cust_id = s.cust_id
GROUP BY ROLLUP(cust_income_level)
ORDER BY income_level, sold_to_men, sold_to_women, t_observed;
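STATS_T_TEST_INDEPU is the independent-samples t-test with unpooled variances (Welch's t-test). As a rough illustration of what the t_observed column represents, the sketch below computes the same kind of statistic in plain Python. The function name and the sample amounts are made up for illustration; the p-value is omitted because it needs a t-distribution CDF that the standard library does not provide.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Per-group squared standard error: sample variance divided by group size.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical purchase amounts for two groups (illustrative values only).
sold_to_men = [1.0, 2.0, 3.0, 4.0, 5.0]
sold_to_women = [2.0, 4.0, 6.0, 8.0, 10.0]
t_observed, dof = welch_t(sold_to_men, sold_to_women)
```

The in-database function does this per ROLLUP group without moving rows to a client, which is the point of the slide; the Python version only makes the arithmetic visible.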

OAA Model Build and Real-time SQL Apply Prediction

Simple SQL Syntax: Classification Model

ML Model Build (PL/SQL)

BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'BUY_INSUR1',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'CUST_INSUR_LTV',
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'BUY_INSURANCE',
    settings_table_name => 'CUST_INSUR_LTV_SET');
END;
/

Model Apply (SQL query)

SELECT prediction_probability(BUY_INSUR1, 'Yes'
         USING 3500 AS bank_funds, 825 AS checking_amount, 400 AS credit_balance, 22 AS age,
               'Married' AS marital_status, 93 AS MONEY_MONTLY_OVERDRAWN, 1 AS house_ownership)
FROM dual;
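An application that scores one customer at a time would usually generate the apply query rather than hand-write it. The sketch below builds the query text and a dictionary of bind values from a Python dict of attributes. The helper name build_scoring_query is hypothetical, and the sketch assumes the USING clause accepts bind variables in place of literals, which should be verified against the target database before use.

```python
import re

# Conservative pattern for unquoted identifiers (an assumption for this sketch).
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")


def build_scoring_query(model_name, target_class, attributes):
    """Return (sql, binds) for a single-row PREDICTION_PROBABILITY call."""
    # Model and column names are identifiers and cannot be bound, so validate them.
    for name in (model_name, *attributes):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a simple SQL identifier: {name!r}")
    # Inline the class label as a string literal, doubling any single quotes.
    class_literal = "'" + target_class.replace("'", "''") + "'"
    using = ", ".join(f":{name} AS {name}" for name in attributes)
    sql = (
        f"SELECT prediction_probability({model_name}, {class_literal} "
        f"USING {using}) FROM dual"
    )
    return sql, dict(attributes)


sql, binds = build_scoring_query("BUY_INSUR1", "Yes", {"bank_funds": 3500, "age": 22})
```

With a DB API driver such as cx_Oracle, the pair would typically be passed to cursor.execute(sql, binds).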

OAA Model Build and Real-time SQL Apply Prediction

Simple SQL Syntax: Attribute Importance

ML Model Build (PL/SQL)

BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'BUY_INSURANCE_AI',
    mining_function     => DBMS_DATA_MINING.ATTRIBUTE_IMPORTANCE,
    data_table_name     => 'CUST_INSUR_LTV',
    case_id_column_name => 'cust_id',
    target_column_name  => 'BUY_INSURANCE',
    settings_table_name => 'Att_Import_Mode_Settings');
END;
/

Model Results (SQL query)

SELECT attribute_name, explanatory_value, rank
FROM BUY_INSURANCE_AI
ORDER BY rank, attribute_name;

ATTRIBUTE_NAME            RANK  ATTRIBUTE_VALUE
BANK_FUNDS                   1           0.2161
MONEY_MONTLY_OVERDRAWN       2           0.1489
N_TRANS_ATM                  3           0.1463
N_TRANS_TELLER               4           0.1156
T_AMOUNT_AUTOM_PAYMENTS      5           0.1095

ORE Model Build

Simple R Language Syntax: Attribute Importance

ML Model Build (R)

> ore.odmAI(BUY_INSURANCE ~ ., CUST_INSUR_LTV)

Model Results (R)

Call:
ore.odmAI(formula = BUY_INSURANCE ~ ., data = CUST_INSUR_LTV)

Importance:
                          importance  rank
BANK_FUNDS              0.2161187797     1
MONEY_MONTLY_OVERDRAWN  0.1489347141     2
N_TRANS_ATM             0.1463026512     3
N_TRANS_TELLER          0.1155879786     4
T_AMOUNT_AUTOM_PAYMENTS 0.1095178647     5

OML4Py Model Build

Simple Python Language Syntax: Attribute Importance

ML Model Build (Python)

> ai_mod = ai(**setting)  # Create AI model object
> ai_mod = ai_mod.fit(train_x, train_y)

Model Results (Python)

Importance:
variable                  importance  rank
BANK_FUNDS              0.2161187797     1
MONEY_MONTLY_OVERDRAWN  0.1489347141     2
N_TRANS_ATM             0.1463026512     3
N_TRANS_TELLER          0.1155879786     4
T_AMOUNT_AUTOM_PAYMENTS 0.1095178647     5

Oracle Machine Learning

Multiple Languages and UIs Supported for End Users & Apps Development

• Application Developers
• DBAs
• R & Python Data Scientists
• “Citizen” Data Scientists
• Notebook Users & DS Teams

[Figure labels: two of these groups are marked “New!”]

Manage and Analyze All Your Data

Architecturally, Many Options and Flexibility

[Figure labels: Big Data SQL / R; SQL / R / Python; Object Store]

Boil down the Data Lake

“Engineered Features”
– Derived attributes that reflect domain knowledge; key to best models, e.g.:
  • Counts
  • Totals
  • Changes over time
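To make the three kinds of engineered feature concrete, the following sketch derives a count, a total and a change over time per customer from raw transaction rows. The row layout (customer_id, month, amount), the feature names and the values are assumptions chosen for illustration, not taken from the slides.

```python
from collections import defaultdict


def engineer_features(transactions):
    """Summarize (customer_id, month, amount) rows into one feature row per customer."""
    by_customer = defaultdict(list)
    for customer_id, month, amount in transactions:
        by_customer[customer_id].append((month, amount))

    features = {}
    for customer_id, rows in by_customer.items():
        rows.sort()  # chronological order by month
        amounts = [amount for _, amount in rows]
        features[customer_id] = {
            "txn_count": len(amounts),                  # count
            "total_amount": sum(amounts),               # total
            "amount_change": amounts[-1] - amounts[0],  # change over time
        }
    return features


# Hypothetical transactions: (customer_id, month, amount).
transactions = [
    ("C1", 1, 100.0), ("C1", 2, 150.0), ("C1", 3, 90.0),
    ("C2", 1, 20.0), ("C2", 3, 60.0),
]
features = engineer_features(transactions)
```

In the architecture the slides describe, the same aggregation would more likely be written as SQL (GROUP BY with window functions) so that it runs where the data lives.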

OAA Algorithm Performance

Algorithm            Time (sec.)   Number of Attributes
SVM SGD Solver                83                    900
GLM Classification           180                   9000
SVM IPM Solver               811                    900
GLM Regression                59                    900
K-Means                      168                   9000

• In-memory 640 million rows, Airline On-Time dataset

• SPARC M8-2 hardware

• Courtesy of the SPARC Performance Team

Operationalizing and Embedding Analytics for Action

How long does it take to put a defined model into operational use?

[Chart: length of time to put a model into production]

Based on 141 respondents who stated they are doing this today

Example Customer References

Forth Corporation: Extracting Meaningful Information from Customer User Activity

Sources: https://www.youtube.com/watch?v=xHNhghPGy-g&feature=youtu.be

DX Marketing: Enterprise Level Data and Analytics for More Targeted Marketing

Reference: https://www.youtube.com/watch?v=QwkkHunVT5c&feature=youtu.be

UK National Health Service: Combating Healthcare Fraud

Objectives

• Use new insight to help identify cost savings and meet goals
• Identify and prevent healthcare fraud and benefit eligibility errors to save costs
• Leverage existing data to transform business and productivity

Solution

• Identified up to GBP100 million (US$156 million) potentially saved through benefit fraud and error reduction
• Used anomaly detection to uncover fraudulent activity where some dentists split a single course of treatment into multiple parts and presented claims for multiple treatments
• Analyzed billions of records at one time to measure longer-term patient journeys and to analyze drug prescribing patterns to improve patient care

“Oracle Advanced Analytics’ data mining capabilities and Oracle Exalytics’ performance really impressed us. The overall solution is very fast, and our investment very quickly provided value. We can now do so much more with our data, resulting in significant savings for the NHS as a whole”
– Nina Monckton, Head of Information Services, NHS Business Services Authority

• Oracle Exadata Database Machine
• Oracle Advanced Analytics
• Oracle Exalytics In-Memory Machine
• Oracle Endeca Information Discovery
• Oracle Business Intelligence EE

Update: £400M confirmed fraud; £700M additional potential identified

£1 Billion in savings… Moving to Cloud

Valdosta State University: Leveraging ML and Analytics in Higher Education

Objectives

• Customizable, interactive dashboard that identifies the number of “at-risk students” in courses and prioritizes them based on their risk indicators

Solution

• Student Profiles provide important information about students as well as likelihood of success
• Early Alert Triggers through a flagging system that allows faculty to alert advisors, student success personnel, and housing faculty when needed
• Student Support and Resource Workflow that shows when students utilize advice and student services, integrating with the student portal

"Institutions use a variety of technologies, both on premise and cloud. Hybrid environments often lack the ability to provide integrations across all enterprise resources. Choosing the right technology will enable you to continue developing data-driven solutions."
– Brian Haugabrook, Information Technology, Valdosta State Univ.

Excerpted from: https://www.oracle.com/ocom/groups/public/@otn/documents/webcontent/5232486.pdf

Fiserv: Risk Analytics in Electronic Payments

Objectives

• Prevent $200M in losses every year using data to monitor, understand and anticipate fraud

Solution

• We installed OAA analytics for model development during 2014
• When choosing the tools for fraud management, speed is a critical factor
• OAA provided a fast and flexible solution for model building, visualization and integration with production processes

“When choosing the tools for fraud management, speed is a critical factor. Oracle Advanced Analytics provided a fast and flexible solution for model building, visualization and integration with production processes.”
– Miguel Barrera, Director of Risk Analytics, Fiserv Inc.
– Julia Minkowski, Risk Analytics Manager, Fiserv Inc.

3 months to run & deploy Logistic Regression

(using SAS) 1 month to estimate and deploy Trees and GLM

1 week to estimate, 1 week to install rules in online application

1 day to estimate and deploy Trees + GLM models (using Oracle Advanced Analytics)

Oracle Advanced Analytics


Caixa Bank (the leading financial group in Spain)

Caixa Bank Leads Innovation with Big Data Technology

Objectives

• Need to serve 14M customers every day through multi-channels
• Need to find new and better ways to interact with customers in the digital age

Solution

• Caixa Bank chose the Big Data Solution from Oracle
• Exadata, Advanced Analytics, Big Data Appliance, Exalytics
• Migrated all their previous analytics platforms to an Oracle architecture on hardware running Intel processors
• Can now provide personalized offers for an individual vs. previously only by a segment
• Predictive models to help risk management

“In order to meet today’s business needs and better serve our customers and improve their experience, it is important to embrace big data to enable new analytic capabilities and manage more information,”
– Jordi Fontanals, chief operations officer (and former CIO) of CaixaBank in Barcelona, Spain, and winner of the Oracle Excellence Award - CIO of the Year: Europe, Middle East, Africa

Sources: https://www.youtube.com/watch?v=xHNhghPGy-g&feature=youtu.be

Introducing…

ADW’s Oracle Machine Learning SQL Notebooks


ADWC + Oracle Machine Learning Notebooks

Oracle Exadata Cloud Service

Oracle Database Cloud Service

Express Cloud Service

Data Warehouse Services(EDWs, DW, departmental marts and sandboxes)

Autonomous Data Warehouse Cloud

Service Console

Built-in Access Tools

Oracle Machine Learning

Service Management

Autonomous Database Cloud

Oracle SQL Developer

Developer Tools

Oracle Object Storage Cloud: Flat Files and Staging

Data Integration Services

Oracle Data Integration Platform Cloud

3rd Party DI on Oracle Cloud Compute

3rd Party DI On-premises

Analytics

Oracle Analytics Cloud

Oracle Machine Learning

Machine Learning Notebook for Autonomous Data Warehouse Cloud

Key Features

• Collaborative UI for data scientists
– Packaged with Autonomous Data Warehouse Cloud (V1)
– Easy access to shared notebooks, templates, permissions, scheduler, etc.
– SQL ML algorithms API (V1)
– Supports deployment of ML analytics


Oracle Machine Learning Algorithms

CLASSIFICATION
– Naïve Bayes
– Logistic Regression (GLM)
– Decision Tree
– Random Forest
– Neural Network
– Support Vector Machine
– Explicit Semantic Analysis

CLUSTERING
– Hierarchical K-Means
– Hierarchical O-Cluster
– Expectation Maximization (EM)

ANOMALY DETECTION
– One-Class SVM

TIME SERIES
– Holt-Winters, Regular & Irregular, with and w/o trends & seasonal
– Single, Double Exp Smoothing

REGRESSION
– Linear Model
– Generalized Linear Model
– Support Vector Machine (SVM)
– Stepwise Linear Regression
– Neural Network

ATTRIBUTE IMPORTANCE
– Minimum Description Length
– Principal Component Analysis (PCA)
– Unsupervised Pair-wise KL Divergence

ASSOCIATION RULES
– A priori / market basket

PREDICTIVE QUERIES
– Predict, cluster, detect, features

SQL ANALYTICS
– SQL Windows, SQL Patterns, SQL Aggregates

FEATURE EXTRACTION
– Principal Component Analysis (PCA)
– Non-negative Matrix Factorization
– Singular Value Decomposition (SVD)
– Explicit Semantic Analysis (ESA)

STATISTICAL FUNCTIONS
– Basic statistics: min, max, median, stdev, t-test, F-test, Pearson’s, Chi-Sq, ANOVA, etc.

• OAA includes support for Partitioned Models, Transactional data, etc.

Quick OML + OAC Demo:

Targeting Best Likely High Credit Score Customers


Attribute Importance Model


Build Predictive Models to Predict Customers who are Likely to Have Good_Credit


Split Data into Train and Test

Build and Test Classification Model


Try ADWC + OML for FREE! FREE Oracle Cloud Credits - https://go.oracle.com/adwc


Additional ML and AA Product Details


Oracle Data Miner GUI

Drag and Drop, Workflows, Easy to Use UI for “Citizen Data Scientist”

• Easy to use to define analytical methodologies that can be shared

• SQL Developer Extension

• Workflow API and generates SQL code for immediate deployment

Oracle R Enterprise

R Language API Component to OAA Machine Learning Functions and SQL

• R language to SQL “push down”
– Transparency layer for “push down” to equivalent SQL for parallelized in-DB processing

• Direct access to DB data
– ROracle pkg for OCI connectivity
– “Embedded R” call outs to R packages

R: Transparency via function overloading

Invoke in-database aggregation function

> aggdata <- aggregate(ONTIME_S$DEST,
+                      by = list(ONTIME_S$DEST),
+                      FUN = length)

[Figure: ORE Client Packages → Transparency Layer → Oracle SQL → Database Server (Oracle Database with Oracle Advanced Analytics, in-db stats, table ONTIME_S)]

Oracle SQL generated by the transparency layer:

select DEST, count(*)
from ONTIME_S
group by DEST

> class(aggdata)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
> head(aggdata)
  Group.1    x
1     ABE  237
2     ABI   34
3     ABQ 1357
4     ABY   10
5     ACK    3
6     ACT   33
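The idea behind the transparency layer can be sketched in a few lines: the same function name behaves differently depending on whether it receives ordinary in-memory data or a proxy for a database column. The class DbColumn and the function count_by below are hypothetical stand-ins written for this illustration; they are not part of ORE or OML4Py, and the real layers cover far more operations.

```python
from collections import Counter


class DbColumn:
    """Stand-in for a column that lives in a database table, not in client memory."""

    def __init__(self, table, name):
        self.table = table
        self.name = name


def count_by(values):
    """Count occurrences per distinct value; dispatch on the argument type."""
    if isinstance(values, DbColumn):
        # Proxy object: return SQL for the database to run instead of pulling rows.
        return (
            f"select {values.name}, count(*) "
            f"from {values.table} group by {values.name}"
        )
    # Ordinary in-memory sequence: compute locally.
    return dict(Counter(values))


local_result = count_by(["ABE", "ABI", "ABE"])
pushed_down = count_by(DbColumn("ONTIME_S", "DEST"))
```

The caller writes the same expression in both cases; only the proxy path defers the work to the database, which is what keeps the data in place.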

How Oracle R Enterprise Works

[Figure: an R client with Oracle R Enterprise (ORE) packages and other R packages works with Oracle Database 18c in three ways, numbered 1 to 3; results flow back to R]

1. R-> SQL Transparency “Push-Down”
• R-SQL Transparency Layer for “Push Down” to SQL for in-Database execution
• Interactive display of graphical results using R
• All the power and flexibility of open source R

2. In-Database ML & Stats SQL Functions
• Leverage in-Database algorithms (ore.odmSVM, ore.odmDT, etc.)
• Ability to perform analytics using R if that’s the data scientist’s native language
• Leverage database strengths: SQL parallelism, scale to large datasets, security

3. Embedded R Package Callouts
• R Engine(s) spawned by Oracle DB for data parallelism
• ore.groupApply for high performance scoring
• Enables deployment and automation of R scripts

Oracle R Advanced Analytics for Hadoop (ORAAH)

R Language API Component to OAA Machine Learning Functions and SQL

• R & Java interface on BDA/BDA Cloud
– Manipulate data stored in a local File System, HDFS, HIVE, Impala or JDBC sources
– General computation framework where users invoke parallel, distributed MapReduce jobs from R, writing custom mappers and reducers in R while also leveraging open source CRAN packages

• Parallel and distributed Machine Learning algorithms
– Leverage Hadoop nodes and Spark for cluster-scalable, high performance
– Functions use the expressive R formula object for Spark parallel execution
– ORAAH's custom LM/GLM/MLP NN algorithms scale better and run faster than the open-source Spark MLlib; MLlib algorithms are also exposed

• Availability
– On premise: Part of Oracle Big Data Connectors license, Oracle BDA
– Cloud: Part of Oracle Big Data Cloud Service and Oracle Big Data Cloud at Customer
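The mapper and reducer contract mentioned above can be illustrated without a cluster. ORAAH exposes this through R; the Python sketch below is only a single-process analogy, and run_mapreduce, the flight records and the delay values are all invented for the example.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Run a map, shuffle, reduce pass over records in a single process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map: emit (key, value) pairs
            groups[key].append(value)       # shuffle: group values by key
    return {key: reducer(key, values) for key, values in groups.items()}


def mapper(record):
    """Emit the destination as key and the delay as value."""
    dest, delay = record
    yield dest, delay


def reducer(key, values):
    """Average all delays seen for one destination."""
    return sum(values) / len(values)


# Hypothetical (destination, delay in minutes) records.
flights = [("ABE", 10), ("ABQ", 4), ("ABE", 20), ("ABQ", 6)]
mean_delay = run_mapreduce(flights, mapper, reducer)
```

On Hadoop the map and reduce phases run on different nodes and the shuffle moves data between them; the user-supplied functions keep the same shape.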

Oracle R Advanced Analytics for Hadoop (ORAAH)

R Language API Component to OAA Machine Learning Functions and SQL

• New Algorithms:
– ELM (Extreme Learning Machines) and Hierarchical-ELM (Oracle’s MPI/Spark-engines)
– Distributed Stochastic PCA (Oracle’s MPI/Spark-engines)
– Distributed Stochastic SVD (Oracle’s MPI/Spark-engines)
– More statistical distributions, special and aggregation functions
  • Distributions: Beta, Binomial, Cauchy, Chi-squared, Exponential, F-distribution, Gamma, Geometric, Hypergeometric, Log-normal, Normal, Poisson, Student's t, Triangular, Uniform, Weibull and Pareto
  • Special functions: gamma(x), lgamma(x), digamma(x), trigamma(x), lanczos(x), factorial(x), lfactorial(x), lbeta(a, b), lchoose(n, k) (natural logarithm of the binomial coefficient)
  • New Aggregate Functions: avg(colExpr), max(colExpr), min(colExpr), sd(colExpr), sum(colExpr), variance(colExpr), kurtosis(colExpr), skewness(colExpr)

• Spark 2.x support

• Misc ML new features/enhancements

Newest Additional “GA” ML Features:

• Oracle Advanced Analytics 18c & 12.2
• Oracle Data Miner 18.4 UI
• ORE 1.5.1
• ORAAH 2.8

New in 18c: Machine Learning and Advanced Analytics

• Autonomous Data Warehouse Cloud + OML

• In-DB Machine Learning Algorithms (SQL API)
– Scalable Random Forests for Classification
– Scalable Neural Networks for both classification and regression
– CUR decomposition algorithm for attribute and row importance
– Time Series: State of the art forecasting¹ using Exponential Smoothing
  • Includes all popular models, e.g. Holt-Winters
  • Supports trend, damped trend and seasonality
  • Performs automatic aggregations based on user-defined time periods, or handles irregularly spaced data as direct input
  • Handles missing values

• OML Microservices for applications deployment

¹ Gartner paper
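For readers unfamiliar with exponential smoothing, the sketch below shows the two simplest members of the family: simple smoothing (level only) and Holt's double smoothing (level plus additive trend). It is a textbook formulation written for illustration, not Oracle's in-database implementation, and it leaves out seasonality, damping and missing-value handling. The function names and the parameters alpha and beta are the usual textbook symbols.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation."""
    level = series[0]
    levels = [level]
    for value in series[1:]:
        # New level: weighted blend of the new observation and the old level.
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return levels


def holt_forecast(series, alpha, beta, horizon):
    """Double exponential smoothing; forecast `horizon` steps past the series end."""
    level, trend = series[0], series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Extrapolate the last level along the last trend.
    return [level + step * trend for step in range(1, horizon + 1)]


levels = simple_exponential_smoothing([10, 20, 30], alpha=0.5)
forecast = holt_forecast([1, 2, 3, 4], alpha=0.5, beta=0.5, horizon=2)
```

Holt-Winters adds a third smoothed component for seasonality on top of the level and trend updates shown here.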

Oracle Advanced Analytics 12.2

New Oracle Database Features

• Significant Performance Improvements for all Algorithms
– New parallel model build / apply redesigned infrastructure to enable faster new algorithm introduction
– Scale to larger data volumes found in big data and cloud use cases

• Unsupervised Feature Selection
– Unsupervised algorithm for pair-wise correlations (Kullback-Leibler Divergence (KLD)) for numeric & categorical attributes to find highest “information containing” attributes

• Association Rules Enhancements
– Adds calculation of values associated with AR rules such as sales amount to indicate the value of co-occurring items in baskets
– Can filter input items prior to market basket analysis

• Partitioned Models
– Instead of building, naming and referencing 10s or 1000s of models, partitioned models organize and represent multiple models as partitions in a single model entity

[Figure: a single model entity containing Partition 1, Partition 2, … Partition N]
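A partitioned model can be pictured as one object that holds an independent sub-model per partition key and routes each row to the right one. The sketch below uses a deliberately trivial sub-model (the mean of the target within the partition) to keep the routing visible. The class name, column names and data are hypothetical and do not reflect Oracle's API.

```python
from collections import defaultdict
from statistics import mean


class PartitionedMeanModel:
    """One model object holding an independent sub-model per partition key."""

    def __init__(self, partition_column, target_column):
        self.partition_column = partition_column
        self.target_column = target_column
        self.partitions = {}

    def fit(self, rows):
        targets = defaultdict(list)
        for row in rows:
            targets[row[self.partition_column]].append(row[self.target_column])
        # Each "sub-model" here is just the partition's mean target value.
        self.partitions = {key: mean(values) for key, values in targets.items()}
        return self

    def predict(self, row):
        # Route the row to the sub-model for its partition.
        return self.partitions[row[self.partition_column]]


# Hypothetical training rows partitioned by REGION.
rows = [
    {"REGION": "EU", "SALES": 10}, {"REGION": "EU", "SALES": 20},
    {"REGION": "US", "SALES": 100},
]
model = PartitionedMeanModel("REGION", "SALES").fit(rows)
eu_prediction = model.predict({"REGION": "EU"})
```

The caller builds, names and scores one model; the number of partitions behind it can grow without changing the calling code.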

Oracle Advanced Analytics 12.2

New Cognitive Modeling for Text Processing

• Explicit Semantic Analysis (ESA) algorithm
– Useful technique for extracting meaningful, interpretable features; better than LDA
– English Wikipedia is the default text corpus, used to equate tokens with human-identifiable features and concepts
– ESA improves text processing, classification, document similarity and topic identification

• Compare docs that may not mention the same topics, e.g. al-Qa'ida or Osama bin Laden:

Document 1: 'Senior members of the Saudi royal family paid at least $560 million to Osama bin Laden terror group and the Taliban for an agreement his forces would not attack targets in Saudi Arabia, according to court documents. The papers, filed in a $US3000 billion ($5500 billion) lawsuit in the US, allege the deal was made after two secret meetings between Saudi royals and leaders of al-Qa'ida, including bin Laden. The money enabled al-Qa'ida to fund training camps in Afghanistan later attended by the September 11 hijackers. The disclosures will increase tensions between the US and Saudi Arabia.'

Document 2: 'The Saudi Interior Ministry on Sunday confirmed it is holding a 21-year-old Saudi man the FBI is seeking for alleged links to the Sept. 11 hijackers. Authorities are interrogating Saud Abdulaziz Saud al-Rasheed "and if it is proven that he was connected to terrorism, he will be referred to the sharia (Islamic) court," the official Saudi Press Agency quoted an unidentified ministry official as saying.'

– ESA Similarity Score = 0.62

• New Wikipedia-Based Cognitive Model Available for Text Processing blog post
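ESA represents each document as a weighted vector over named concepts, so two documents can score as similar even when they share few words. A common way to compare such vectors is cosine similarity, sketched below. Whether Oracle's score is computed exactly this way is an assumption, and the concept names and weights are invented; they are not the vectors behind the 0.62 score above.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two sparse vectors stored as {concept: weight} dicts."""
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[c] * vec_b[c] for c in shared)
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical concept weights for two documents.
doc1_concepts = {"Saudi Arabia": 0.6, "Al-Qaeda": 0.8}
doc2_concepts = {"Saudi Arabia": 0.8, "September 11 attacks": 0.6}
similarity = cosine_similarity(doc1_concepts, doc2_concepts)
```

Only the shared concept contributes to the dot product, so the score reflects overlap in meaning rather than overlap in vocabulary.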

Oracle R Enterprise 1.5.1

• OREdplyr
– OREdplyr provides a subset of dplyr functionality, extending the Oracle R Enterprise transparency layer

• OAAgraph
– Supports use of a graph representation of data (Oracle PGX) where data entities are nodes and relationships are edges

• ORE support for new in-database algorithms
– Expectation Maximization (EM)
– Explicit Semantic Analysis (ESA)
– Singular Value Decomposition (SVD)

• ORE support for Automated Text Processing

• ORE support for Partitioned Models

• Extensible R Algorithm Models

https://docs.oracle.com/cd/E83411_01/ORERN/toc.htm#ORERN-GUID-3B6ED739-40BC-49AF-A1F9-7E9EC93A2429

Oracle R Advanced Analytics for Hadoop (ORAAH)

New Features in ORAAH 2.8

• ORAAH analytics as standalone Java library

• New MPI/Spark-based Algorithms
– ELM and H-ELM on Spark+MPI
– Distributed Stochastic PCA & SVD

• Updated ORAAH GLM and LM algorithms

• Neural Networks full formula processing in Spark

• Full formula support and summary for orch.lm2() and orch.glm2()

• Support for IMPALA and Spark DataFrames

• Gradient Boosted Trees (Spark MLlib)

ORAAH: Spark + MPI Combined Performance

• Spark Strengths
– Excellent for embarrassingly parallel problems
– Poor at iterative problems

• MPI Strengths
– Excels for highly interconnected problems with thousands of nodes
– Strong scaling for iterative problems

• Spark + MPI Combined Performance
– FASTER! Best of both worlds

New Oracle Machine Learning for Python: OML4Py

(Available Soon on OTN for Download as OAA Add-in Component, like Oracle R Enterprise)

Oracle Machine Learning for Python

Oracle Advanced Analytics Option to 18c+

• Similar architecture to OAA’s Oracle R Enterprise

• OML4Py Transparency Layer
– Use Oracle Database as High Performance Computing environment

• OML4Py OAA Model Build and Apply
– Use OAA parallel and distributed ML algorithms
– Manage Python scripts and Python objects in Oracle Database

• OML4Py Embedded Python
– Make callout to Python packages
– Integrate Python results into applications via SQL

[Figure: a Python client running OML for Python, alongside SQL interfaces (SQL*Plus, SQL Developer, …), connects to Oracle Database on the database server machine, which holds user tables and in-db stats]

cx_Oracle

• Python package enabling scalable and performant connectivity to Oracle Database

– Open source, publicly available on PyPI, OTN, and github

– Oracle is maintainer

• Oracle Database Interface for Python conforming to Python DB API 2.0 specification

– Optimized driver based on OCI

– Execute SQL statements from Python

– Enables transactional behavior for insert, update, and delete
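The DB API 2.0 pattern that cx_Oracle follows can be sketched with Python's built-in sqlite3 module, used here only as a stand-in so the example runs without an Oracle instance. The table name and rows are made up for illustration. With cx_Oracle the connection would instead come from cx_Oracle.connect(...), and bind placeholders would use Oracle's :name or :1 style rather than ?.

```python
import sqlite3

# sqlite3 stands in for cx_Oracle here; both follow the DB API 2.0 shape
# (connect -> cursor -> execute -> fetch -> commit/rollback).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table, for illustration only.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob")],
)
conn.commit()  # make the inserts permanent

# An update that is rolled back leaves the committed data untouched.
cur.execute("UPDATE customers SET name = ? WHERE id = ?", ("Zed", 1))
conn.rollback()

cur.execute("SELECT id, name FROM customers ORDER BY id")
rows = cur.fetchall()
print(rows)
conn.close()
```

The rollback step is the part that corresponds to the "transactional behavior" bullet: changes made since the last commit are discarded as a unit.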


OML4Py Embedded Python

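import oml  # assumes an active OML4Py connection and a train_dat proxy object

def fit(data):
    # Runs in the database-side Python engine; data arrives as a pandas.DataFrame
    from sklearn.svm import LinearSVC
    x = data.drop('TARGET', axis = 1).values
    y = data['TARGET']
    return LinearSVC().fit(x, y)

# Store the function in the script repository, replacing any existing version
oml.script.create('sk_svc_fit', fit, overwrite = True)
oml.script.dir()

# Invoke the stored script by name on the training table
mod = oml.table_apply(train_dat,
                      func = 'sk_svc_fit',
                      oml_input_type = 'pandas.DataFrame')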


[Figure: embedded Python execution flow in four numbered steps, linking the Client Python Engine (OML4Py), the pyq*eval() interface, Oracle Database (user tables), extproc, and the DB Python Engine (OML4Py).]

oml.group_apply – partitioned data flow

[Figure: partitioned data flow in four numbered steps. The Client Python Engine (OML4Py) calls the pyq*eval() interface in Oracle Database, which reads user tables and hands each partition to its own extproc-launched DB Python Engine (OML4Py).]

def build_lm(dat):
    # Fit PETAL_LENGTH as a linear function of PETAL_WIDTH for one partition
    from sklearn import linear_model
    lm = linear_model.LinearRegression()
    X = dat[['PETAL_WIDTH']]
    y = dat[['PETAL_LENGTH']]
    lm.fit(X, y)
    return lm

# Partition by SPECIES and build one model per partition
index = oml.DataFrame(IRIS['SPECIES'])
mods = oml.group_apply(
    IRIS[:, ['PETAL_LENGTH', 'PETAL_WIDTH', 'SPECIES']],
    index,
    func=build_lm)
sorted(mods.pull().items())
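What the group_apply call above does conceptually (split rows by SPECIES, fit one model per partition, return results keyed by group) can be emulated locally in plain Python. This is a minimal sketch, not OML4Py code: group_apply_local and fit_line are hypothetical helpers, the rows are made-up measurements, and a closed-form simple regression replaces scikit-learn's LinearRegression.

```python
from collections import defaultdict

def fit_line(rows):
    """Least-squares fit of PETAL_LENGTH = a + b * PETAL_WIDTH for one group."""
    xs = [r["PETAL_WIDTH"] for r in rows]
    ys = [r["PETAL_LENGTH"] for r in rows]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"intercept": my - slope * mx, "slope": slope}

def group_apply_local(rows, index, func):
    """Partition rows by the index column and call func once per partition."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[index]].append(row)
    return {key: func(part) for key, part in groups.items()}

# Made-up rows for illustration (not the real IRIS data).
rows = [
    {"SPECIES": "setosa", "PETAL_WIDTH": 0.2, "PETAL_LENGTH": 1.4},
    {"SPECIES": "setosa", "PETAL_WIDTH": 0.4, "PETAL_LENGTH": 1.8},
    {"SPECIES": "virginica", "PETAL_WIDTH": 2.0, "PETAL_LENGTH": 5.0},
    {"SPECIES": "virginica", "PETAL_WIDTH": 2.5, "PETAL_LENGTH": 6.0},
]

mods = group_apply_local(rows, "SPECIES", fit_line)
for species, model in sorted(mods.items()):
    print(species, model)
```

In the real call the partitions are processed by database-side Python engines, so the data never leaves the server; the local version only mirrors the split-apply-collect shape.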


Embedded Python Execution functions

• oml.do_eval(func[, func_value, func_owner, . . . ])

– Executes the user-defined Python function at the Oracle Database server machine

• oml.table_apply(data, func[, func_value, . . . ])

– Executes the user-defined Python function at the Oracle Database server machine supplying data pulled from Oracle Database

• oml.row_apply(data, func[, func_value, . . . ])

– Partitions a table or view into row chunks and executes the user-defined Python function on each chunk within one or more Python processes running at the Oracle Database server machine

• oml.group_apply(data, index, func[, . . . ])

– Partitions a table or view by the values in column(s) specified in index and executes the user-defined Python function on those partitions within one or more Python processes running at the Oracle Database server machine

• oml.index_apply(times, func[, func_value, . . . ])

– Executes the user-defined Python function multiple times inside Oracle Database server
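The chunking behaviour of oml.row_apply and the repeated invocation of oml.index_apply can be pictured with a local, single-process emulation. The sketch below assumes nothing about the real OML4Py internals: row_apply_local and index_apply_local are hypothetical stand-ins, chunk_size is a made-up parameter name, passing a 1-based run number to the function is an assumption of this sketch, and everything runs in the client process rather than in database-side Python engines.

```python
def row_apply_local(rows, func, chunk_size):
    """Split rows into consecutive chunks and call func on each chunk."""
    return [
        func(rows[start:start + chunk_size])
        for start in range(0, len(rows), chunk_size)
    ]

def index_apply_local(times, func):
    """Call func once per run, passing the run number 1..times."""
    return [func(i) for i in range(1, times + 1)]

def chunk_total(chunk):
    # User-defined function: sum one column over the rows of a chunk.
    return sum(row["AMOUNT"] for row in chunk)

# Made-up table of 5 rows for illustration.
table = [{"AMOUNT": a} for a in (10, 20, 30, 40, 50)]

chunk_sums = row_apply_local(table, chunk_total, chunk_size=2)
squares = index_apply_local(3, lambda i: i * i)
print(chunk_sums, squares)
```

Because each chunk or run is independent, the calls can be spread over several Python processes, which is what the slide describes for the database server.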


Script repository functions for saving Python scripts in ODB

• oml.script.create(name, func[, is_global, . . . ])

– Creates a Python script, which contains a single function definition, in the Oracle Database Python script repository

• oml.script.dir([name, regex_match, sctype])

– Lists the scripts present in the Oracle Database Python script repository

• oml.script.load(name[, owner])

– Loads the named script from the Oracle Database Python script repository as a callable object

• oml.script.drop(name[, is_global, silent])

– Drops the named script from the Oracle Database Python script repository
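The create/dir/load/drop life cycle listed above can be mimicked with an in-memory dictionary. This is only a mental model: ScriptRepo is a hypothetical class, nothing is persisted to a database, and the overwrite flag is borrowed from the oml.script.create call shown earlier in the deck.

```python
class ScriptRepo:
    """In-memory stand-in for a named script repository (hypothetical)."""

    def __init__(self):
        self._scripts = {}

    def create(self, name, func, overwrite=False):
        if name in self._scripts and not overwrite:
            raise ValueError(f"script {name!r} already exists")
        self._scripts[name] = func

    def dir(self):
        return sorted(self._scripts)

    def load(self, name):
        return self._scripts[name]  # returned object is callable

    def drop(self, name):
        del self._scripts[name]

repo = ScriptRepo()
repo.create("double_it", lambda x: 2 * x)
repo.create("double_it", lambda x: x + x, overwrite=True)  # replace in place
listing = repo.dir()
result = repo.load("double_it")(21)
repo.drop("double_it")
print(listing, result, repo.dir())
```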

New AutoML, initially available in Oracle Machine Learning for Python (OML4Py)

(Available soon on OTN for download as an OAA add-in component, like Oracle R Enterprise)

Model Build & Real-time SQL Apply Prediction

OML4Py Syntax

ML Model Build: Best Model (Python)

ms = ModelSelection(
    mining_function = 'classification',
    score_metric = 'accuracy')
best_model = ms.select(X_train, y_train)

Model Apply (Python)

y_pred = best_model.predict(X_test)
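The idea behind selecting a best model by a score metric can be sketched without any Oracle software: fit each candidate on training data, score it on held-out data, and keep the top scorer. The snippet is a simplified illustration, not the AutoML algorithm described in these slides; the two toy classifiers, the select_best helper, and the data are all made up, and a plain exhaustive loop is used.

```python
from collections import Counter

class MajorityClass:
    """Always predicts the most common training label."""
    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self
    def predict(self, X):
        return [self.label for _ in X]

class NearestNeighbor:
    """1-nearest-neighbour on a single numeric feature."""
    def fit(self, X, y):
        self.points = list(zip(X, y))
        return self
    def predict(self, X):
        return [min(self.points, key=lambda p: abs(p[0] - x))[1] for x in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def select_best(candidates, X_train, y_train, X_val, y_val):
    """Fit every candidate and return (best_model, scores_by_name)."""
    scores = {}
    fitted = {}
    for name, model in candidates.items():
        fitted[name] = model.fit(X_train, y_train)
        scores[name] = accuracy(y_val, fitted[name].predict(X_val))
    best_name = max(scores, key=scores.get)
    return fitted[best_name], scores

# Made-up one-feature data: small values are class 0, large values class 1.
X_train, y_train = [1.0, 2.0, 3.0, 8.0, 9.0], [0, 0, 0, 1, 1]
X_val, y_val = [1.5, 2.5, 8.5, 9.5], [0, 0, 1, 1]

best_model, scores = select_best(
    {"majority": MajorityClass(), "1-nn": NearestNeighbor()},
    X_train, y_train, X_val, y_val,
)
y_pred = best_model.predict([2.2, 8.8])
print(scores, y_pred)
```

The returned object exposes the same predict call as in the slide's apply step, which is what lets the selected model be used without knowing which candidate won.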


Automated Machine Learning (AutoML) Components

[Figure: AutoML pipeline. Training data (features X1 to X5 and target Y) feeds the machine learning pipeline, which produces an ML model used for inference on new X1 to X5 values.]

1. Auto Feature Selection (AutoFS): >50% reduction in features

2. Auto Model Selection (AutoMS): 4x faster than exhaustive search

3. Auto Tuning of Hyperparameters (AutoTune): up to 24% accuracy improvement

AutoTune: Evaluation for OAA Neural Network


8-22% improvement for several datasets

Avg Improvement of 2.86%

[Figure: bar chart "OAA Neural Network - Default vs Tuned Accuracy". Y-axis: leave-out test set accuracy (%), 40% to 100%. Series: Default, Tuned.]

Oracle Applications that Embed Oracle Machine Learning Algorithms

Oracle Machine Learning Applications: GA

• HCM Cloud: Workforce Predictions

• CRM Sales Cloud: Sales Prediction

• Retail GBU: Customer Insights, Customer Segmentation

• Adaptive Intelligent Apps for Manufacturing

• CPQ Cloud: Configure, Price, Quote

• Industry Data Models: Comms, SNA, Utilities, Airlines, Retail, …

• Oracle E-Business Suite: Spend Classification

• Oracle Identity Mgmt: Adaptive Access Mgmt

• FSGBU: Analytical Applications Infrastructure


Oracle Adaptive Intelligent (AI) Apps for Manufacturing

ML Use Cases

• Insights (Patterns and Correlations Analysis)

– Discover key influencers and patterns that affect yield & quality

• Predictive Analytics

– Predict critical outcomes during manufacturing to minimize losses

Reasons for using OAA’s ML

• Easy-to-integrate R & PL/SQL APIs for many ML algorithms

• In-database execution & scalable performance

• Enterprise grade support for OAA ML

GA Q4 FY18

Achieve Manufacturing Operational Excellence using Machine Learning & AI

[Figure: screen shots and conceptual diagram]

New Oracle Machine Learning Microservices

Cognitive Microservices now available internally as hosted ML Services for Oracle Applications

New Oracle Machine Learning Microservices

(Available Now to Internal Oracle Application Teams)

• Services for building, storing, and deploying Oracle Machine Learning models

• Cognitive Services

– Image and Text

• Documented RESTful APIs for application integration

• Containers (Docker images) for portability

• Model repository for storing, versioning, and comparing ML models

• Kubernetes support for container management


OML Image Microservices

• Cognitive APIs for images: tagging, classification, face detection, age, gender, inappropriate content, image similarity, …

– Deep Learning models & Transfer Learning


OML Text Microservices

• Cognitive Text API

– Summary

– Key words

– Topics

– Sentiment

– Similarity

• Uses Explicit Semantic Analysis (ESA) Wikipedia model
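Explicit Semantic Analysis represents a text as a vector of weights over named concepts (Wikipedia articles in the model mentioned above) and compares texts by the cosine of those vectors. The toy sketch below shows only that mechanism: the three-concept word table and its weights are invented for illustration and have nothing to do with the actual Wikipedia model or the OML service.

```python
import math

# Hypothetical word -> {concept: weight} table standing in for a
# Wikipedia-derived index; the values are invented.
CONCEPT_WEIGHTS = {
    "loan":     {"Finance": 0.9, "Sports": 0.0, "Cooking": 0.0},
    "bank":     {"Finance": 0.8, "Sports": 0.0, "Cooking": 0.0},
    "interest": {"Finance": 0.6, "Sports": 0.1, "Cooking": 0.0},
    "goal":     {"Finance": 0.1, "Sports": 0.9, "Cooking": 0.0},
    "match":    {"Finance": 0.0, "Sports": 0.8, "Cooking": 0.1},
}
CONCEPTS = ["Finance", "Sports", "Cooking"]

def concept_vector(text):
    """Sum the concept weights of every known word in the text."""
    vec = [0.0] * len(CONCEPTS)
    for word in text.lower().split():
        weights = CONCEPT_WEIGHTS.get(word, {})
        for i, concept in enumerate(CONCEPTS):
            vec[i] += weights.get(concept, 0.0)
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_concept(text):
    """Return the concept with the largest weight for the text."""
    vec = concept_vector(text)
    return CONCEPTS[vec.index(max(vec))]

sim_related = cosine(concept_vector("bank loan"), concept_vector("loan interest"))
sim_unrelated = cosine(concept_vector("bank loan"), concept_vector("goal match"))
print(top_concept("bank loan interest"), sim_related, sim_unrelated)
```

The same concept vector serves both the "Topics" use (take the highest-weighted concepts) and the "Similarity" use (compare two vectors), which is why one model can back several of the listed text functions.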


Content and Experience Cloud

PRO 4648

Deliver Personalized Experiences and Automate Content Management with AI and ML

Tuesday 10/23, 5:45 p.m.

Marriott Marquis (Yerba Buena Level), Nob Hill C/D

Kiran Bellare

Content and Experience Cloud (CEC)

• Originally Documents Cloud Service, which focused on EFSS (Enterprise File Sync and Share) for simple document management: files, folders, and sharing

• Migrating toward Web Content Management and Customer Experience, with the focus shifting to digital assets, content items, web sites, multi-channel delivery, and the customer experience

Content and Experience Cloud

• Upon instance delivery, CEC is an empty container

• CEC has no specific target business focus

• Customers leverage the toolset to construct their environments: content types, assets, delivery channels, web site pages, custom web components, etc.

• Customers then develop specific content (images, descriptions, blogs, articles, summaries, reviews, etc.)

• Phase 1: Content Development and Curation

• Apply unsupervised learning algorithms, with Wikipedia-based models as a common baseline, especially for smaller repositories

• Text and Image keyword tagging

• Identify inappropriate content

• Auto-classification via similarity match against customer defined categories or hierarchies

• Similar content via similarity recommendations

• Sentiment analysis on visitor comments

• OML Microservices

• Cognitive Image: Classification, NSFW

• Cognitive Text: Topics, Keywords, Similarity, Sentiment

Conclusion

Data Management + Machine Learning: Simpler, More Powerful, Smarter

“Why Oracle?

Because that’s where the data is!”

– Larry Ellison, Executive Chairman and CTO of Oracle Corporation


Getting Started: Oracle ML/AA Resources & Links

Oracle Advanced Analytics Overview Information

• Oracle Machine Learning Newest Features and Road Map.pptx presentation

• Blog post: Simple Guide to Oracle’s Machine Learning and Advanced Analytics

• Oracle Advanced Analytics Public Customer References

• Oracle’s Machine Learning and Advanced Analytics Data Management Platforms white paper on OTN

• Oracle INTERNAL ONLY Oracle ML & AA Product Management Wiki and Beehive Workspace

YouTube recorded Oracle Advanced Analytics Presentations and Demos, White Papers

• Oracle's Machine Learning & Advanced Analytics 12.2 & Oracle Data Miner 4.2 New Features YouTube video

• Library of YouTube Movies on Oracle Advanced Analytics, Data Mining, Machine Learning (7+ “live” demos, e.g. Oracle Data Miner 4.0 New Features, Retail, Fraud, Loyalty, Overview, etc.)

• Overview YouTube video of Oracle’s Advanced Analytics and Machine Learning

Getting Started/Training/Tutorials

• Link to OAA/Oracle Data Miner Workflow GUI Online (free) Tutorial Series on OTN

• Link to OAA/Oracle R Enterprise (free) Tutorial Series on OTN

• Link to Try the Oracle Cloud Now!

• Link to Getting Started w/ ODM blog entry

• Link to New OAA/Oracle Data Mining 2-Day Instructor Led Oracle University course

• Oracle Data Mining Sample Code Examples

Additional Resources, Documentation & OTN Discussion Forums

• Oracle Advanced Analytics Option on OTN page

• OAA/Oracle Data Mining on OTN page, ODM Documentation & ODM Blog

• OAA/Oracle R Enterprise page on OTN page, ORE Documentation & ORE Blog

• Oracle SQL based Basic Statistical functions on OTN

• Oracle R Advanced Analytics for Hadoop (ORAAH) on OTN

Analytics and Data Summit: All Analytics, All Data, No Nonsense. March 12-14, 2019, Redwood Shores, CA

Send email now to [email protected] and you’ll get my “away message” with the links.


Try Oracle Database 18c + Oracle Data Miner for FREE!

1 Navigate to https://cloud.oracle.com/en_US/database

2 Sign up for a Free Oracle Database Trial

3 Download SQL Developer and Configure Oracle Data Miner