Post on 29-Apr-2023
Oracle’s Machine Learning: Newest Features and Road Map
Charlie Berger, MS Engineering, MBA
Sr. Director Product Management, Machine Learning, AI and Cognitive Analytics
charlie.berger@oracle.com
www.twitter.com/CharlieDataMine
Move the Algorithms; Not the Data!
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Analytics Value vs. Maturity
[Figure: Value of Analytics (y-axis) vs. Analytical Maturity (x-axis)]
• Reports & Dashboards: What happened?
• Diagnostic Analysis & Reports: Why did it happen?
• Predictive / Machine Learning: What WILL happen?
• “ML Enabled” Applications: Automated ML applications
Oracle Machine Learning’s Differentiators
Oracle’s Data Management + Machine Learning
Market Observations:
• Machine learning & “AI” now Must-Have requirements
• Separate islands for data management and machine learning don’t work
• Enterprises whose data science teams most rapidly extract insights win
Conclusions:
• Must “operationalize” ML insights throughout enterprise to realize results
• Multi-lingual ML: SQL, R, Python, Workflow UI, Notebooks, Embedded ML & Predictive Appls
• Market evolving towards hybrid Data Management + ML that can both manage and “think” about data—a “thinking Database”
Algorithms Operate on Data
ML and AI are just “Algorithms”
Move the Algorithms; Not the Data! It Changes Everything!
Oracle’s Machine Learning & Advanced Analytics
Fastest Way to Deliver Enterprise-wide Predictive Analytics

Major Benefits
• Data remains in Database & Hadoop
• Model building and scoring occur in-database
• Use R packages with data-parallel invocations
• Leverage investment in Oracle IT
• Eliminate data duplication
• Eliminate separate analytical servers
• Deliver enterprise-wide applications
• GUI for ML/Predictive Analytics & code gen
• R interface leverages database as HPC engine

Traditional ML (hours, days or weeks)
• Data Extraction
• Data Prep & Transformation
• Data Mining Model Building
• Data Mining Model “Scoring”
• Data Prep & Transformation
• Data Import

Oracle’s in-DB Machine Learning (secs, mins or hours)
• Data Preparation
• Model Building
• Model “Scoring”, Embedded Data Prep

[Figure: time savings of in-DB machine learning compared with traditional ML]
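The "model scoring occurs in-database" idea can be sketched in a few lines. This is only an illustration under loud assumptions: Python's built-in sqlite3 stands in for Oracle Database, and the table, column names and coefficients are hypothetical. It contrasts pulling every row to the client with shipping the scoring expression to the database as SQL.

```python
import sqlite3

# Hypothetical linear scoring model: score = b0 + b_age*age + b_balance*balance
COEF = {"b0": -1.0, "age": 0.02, "balance": 0.001}

def make_db():
    """Build a small in-memory table (sqlite3 is a stand-in, not Oracle)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (cust_id INTEGER, age REAL, balance REAL)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, 30.0, 500.0), (2, 45.0, 1500.0), (3, 60.0, 100.0)],
    )
    return conn

def score_client_side(conn):
    """'Move the data': fetch every row, then score in the client."""
    rows = conn.execute("SELECT cust_id, age, balance FROM customers").fetchall()
    return {
        cid: COEF["b0"] + COEF["age"] * age + COEF["balance"] * bal
        for cid, age, bal in rows
    }

def score_in_database(conn):
    """'Move the algorithm': send the scoring expression, get back only scores."""
    sql = "SELECT cust_id, ? + ? * age + ? * balance FROM customers"
    params = (COEF["b0"], COEF["age"], COEF["balance"])
    return dict(conn.execute(sql, params).fetchall())
```

Both functions return the same scores; the difference is where the arithmetic runs and how much data crosses the connection.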
Oracle’s Machine Learning & Adv. Analytics Algorithms

CLASSIFICATION
– Naïve Bayes
– Logistic Regression (GLM)
– Decision Tree
– Random Forest
– Neural Network
– Support Vector Machine
– Explicit Semantic Analysis

CLUSTERING
– Hierarchical K-Means
– Hierarchical O-Cluster
– Expectation Maximization (EM)

ANOMALY DETECTION
– One-Class SVM

TIME SERIES
– State of the art forecasting using Exponential Smoothing
– Includes all popular models, e.g. Holt-Winters with trends, seasons, irregularity, missing data

REGRESSION
– Linear Model
– Generalized Linear Model
– Support Vector Machine (SVM)
– Stepwise Linear regression
– Neural Network
– LASSO *

ATTRIBUTE IMPORTANCE
– Minimum Description Length
– Principal Comp Analysis (PCA)
– Unsupervised Pair-wise KL Div
– CUR decomposition for row & attribute importance (AI)

ASSOCIATION RULES
– A priori / market basket

PREDICTIVE QUERIES
– Predict, cluster, detect, features

SQL ANALYTICS
– SQL Windows, SQL Patterns, SQL Aggregates

FEATURE EXTRACTION
– Principal Comp Analysis (PCA)
– Non-negative Matrix Factorization
– Singular Value Decomposition (SVD)
– Explicit Semantic Analysis (ESA)

TEXT MINING SUPPORT
– Algorithms support text
– Tokenization and theme extraction
– Explicit Semantic Analysis (ESA) for document similarity

STATISTICAL FUNCTIONS
– Basic statistics: min, max, median, stdev, t-test, F-test, Pearson’s, Chi-Sq, ANOVA, etc.

R PACKAGES
– CRAN R Algorithm Packages through Embedded R Execution
– Spark MLlib algorithm integration

EXPORTABLE ML MODELS
– REST APIs for deployment

• OAA (Oracle Data Mining + Oracle R Enterprise) and ORAAH combined
• OAA includes support for Partitioned Models, Transactional, Unstructured, Geo-spatial, Graph data, etc.
STATISTICAL FUNCTIONS
– Descriptive statistics (e.g. median, stdev, mode, sum, etc.)
– Hypothesis testing (t-test, F-test, Kolmogorov-Smirnov test, Mann-Whitney test, Wilcoxon Signed Ranks test)
– Correlation analysis (parametric and nonparametric, e.g. Pearson’s test for correlation, Spearman's rho coefficient, Kendall's tau-b correlation coefficient)
– Ranking functions
– Cross Tabulations with Chi-square statistics
– Linear regression
– ANOVA (Analysis of variance)
– Test Distribution fit (e.g. Normal distribution test, Binomial test, Weibull test, Uniform test, Exponential test, Poisson test, etc.)
– Statistical Aggregates (min, max, mean, median, stdev, mode, quantiles, plus x sigma, minus x sigma, top n outliers, bottom n outliers)
Oracle’s Machine Learning & Adv. Analytics Algorithms*
ANALYTICAL SQL
– SQL Windows
– SQL Aggregate functions
– LAG/LEAD functions
– SQL for Pattern Matching
– Additional approximate query processing: APPROX_COUNT, APPROX_SUM, APPROX_RANK
– Regular Expressions
Statistical Functions
Simple SQL Syntax: Statistical Comparisons (t-tests)
Compare average purchase amounts of men vs. women, grouped by INCOME_LEVEL

SELECT SUBSTR(cust_income_level, 1, 22) income_level,
       AVG(DECODE(cust_gender, 'M', amount_sold, null)) sold_to_men,
       AVG(DECODE(cust_gender, 'F', amount_sold, null)) sold_to_women,
       STATS_T_TEST_INDEPU(cust_gender, amount_sold, 'STATISTIC', 'F') t_observed,
       STATS_T_TEST_INDEPU(cust_gender, amount_sold) two_sided_p_value
  FROM customers c, sales s
 WHERE c.cust_id = s.cust_id
 GROUP BY ROLLUP(cust_income_level)
 ORDER BY income_level, sold_to_men, sold_to_women, t_observed;

STATS_T_TEST_INDEPU (SQL) example: p-values < 0.05 indicate statistically significant differences in the amounts purchased by men vs. women.
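To make the statistic behind this comparison concrete, here is a minimal sketch of a two-sample t statistic with unpooled (unequal) variances, the textbook Welch form. It uses only the Python standard library; it is not Oracle's implementation, and the function name and sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two
    independent samples without assuming equal variances."""
    na, nb = len(sample_a), len(sample_b)
    # Per-group squared standard errors (sample variance / n)
    sa = variance(sample_a) / na
    sb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(sa + sb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df

# Hypothetical purchase amounts for two groups
t_observed, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

A p-value would then be read from the t distribution with `df` degrees of freedom; that step is left out because the standard library has no t distribution.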
OAA Model Build and Real-time SQL Apply Prediction
Simple SQL Syntax: Classification Model

ML Model Build (PL/SQL)
BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'BUY_INSUR1',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'CUST_INSUR_LTV',
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'BUY_INSURANCE',
    settings_table_name => 'CUST_INSUR_LTV_SET');
END;
/

Model Apply (SQL query)
SELECT prediction_probability(BUY_INSUR1, 'Yes'
         USING 3500 AS bank_funds, 825 AS checking_amount, 400 AS credit_balance, 22 AS age,
               'Married' AS marital_status, 93 AS MONEY_MONTLY_OVERDRAWN, 1 AS house_ownership)
  FROM dual;
OAA Model Build and Real-time SQL Apply Prediction
Simple SQL Syntax: Attribute Importance

ML Model Build (PL/SQL)
BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'BUY_INSURANCE_AI',
    mining_function     => DBMS_DATA_MINING.ATTRIBUTE_IMPORTANCE,
    data_table_name     => 'CUST_INSUR_LTV',
    case_id_column_name => 'cust_id',
    target_column_name  => 'BUY_INSURANCE',
    settings_table_name => 'Att_Import_Mode_Settings');
END;
/

Model Results (SQL query)
SELECT attribute_name, explanatory_value, rank
  FROM BUY_INSURANCE_AI
 ORDER BY rank, attribute_name;

ATTRIBUTE_NAME            RANK  ATTRIBUTE_VALUE
BANK_FUNDS                1     0.2161
MONEY_MONTLY_OVERDRAWN    2     0.1489
N_TRANS_ATM               3     0.1463
N_TRANS_TELLER            4     0.1156
T_AMOUNT_AUTOM_PAYMENTS   5     0.1095
ORE Model Build
Simple R Language Syntax: Attribute Importance

ML Model Build (R)
> ore.odmAI(BUY_INSURANCE ~ ., CUST_INSUR_LTV)

Model Results (R)
Call:
ore.odmAI(formula = BUY_INSURANCE ~ ., data = CUST_INSUR_LTV)

Importance:
                          importance  rank
BANK_FUNDS              0.2161187797     1
MONEY_MONTLY_OVERDRAWN  0.1489347141     2
N_TRANS_ATM             0.1463026512     3
N_TRANS_TELLER          0.1155879786     4
T_AMOUNT_AUTOM_PAYMENTS 0.1095178647     5
OML4Py Model Build
Simple Python Language Syntax: Attribute Importance

ML Model Build (Python)
> ai_mod = ai(**setting)  # Create AI model object
> ai_mod = ai_mod.fit(train_x, train_y)

Model Results (Python)
Importance:
variable                  importance  rank
BANK_FUNDS              0.2161187797     1
MONEY_MONTLY_OVERDRAWN  0.1489347141     2
N_TRANS_ATM             0.1463026512     3
N_TRANS_TELLER          0.1155879786     4
T_AMOUNT_AUTOM_PAYMENTS 0.1095178647     5
Oracle Machine Learning
Multiple Languages and UIs Supported for End Users & Apps Development
• Application Developers
• DBAs
• R & Python Data Scientists
• “Citizen” Data Scientists
• Notebook Users & DS Teams
Manage and Analyze All Your Data
Architecturally, Many Options and Flexibility
• Big Data SQL / R
• SQL / R / Python
• Object Store
“Engineered Features”: derived attributes that reflect domain knowledge, key to best models, e.g.:
• Counts
• Totals
• Changes over time
Boil down the Data Lake
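The three kinds of engineered features named on this slide (counts, totals, changes over time) can be illustrated with a short sketch. The transaction layout `(cust_id, month, amount)` and the feature names are hypothetical, chosen only for the example.

```python
from collections import defaultdict

def engineer_features(transactions):
    """Derive per-customer features from raw (cust_id, month, amount) rows."""
    by_customer = defaultdict(list)
    for cust_id, month, amount in transactions:
        by_customer[cust_id].append((month, amount))

    features = {}
    for cust_id, rows in by_customer.items():
        rows.sort()  # order by month so "change over time" is well defined
        amounts = [amount for _, amount in rows]
        features[cust_id] = {
            "n_trans": len(amounts),                          # count
            "total_amount": sum(amounts),                     # total
            "change_first_to_last": amounts[-1] - amounts[0], # change over time
        }
    return features

# Hypothetical raw rows: many transactions boil down to one row per customer
raw = [("A", 1, 10.0), ("A", 2, 15.0), ("A", 3, 30.0), ("B", 1, 5.0)]
customer_features = engineer_features(raw)
```

In a database setting the same derivation would typically be a GROUP BY query; the Python version just shows the shape of the result that feeds a model.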
OAA Algorithm Performance
Algorithm            Time (sec.)  Number of Attributes
SVM SGD Solver       83           900
GLM Classification   180          9000
SVM IPM Solver       811          900
GLM Regression       59           900
K-Means              168          9000
• In-memory 640 million rows, Airline On-Time dataset
• SPARC M8-2 hardware
• Courtesy of the SPARC Performance Team
Operationalizing and Embedding Analytics for Action
How long does it take to put a defined model into operational use?
[Chart: length of time to put a model into production]
Based on 141 respondents who stated they are doing this today
Forth Corporation
Extracting Meaningful Information from Customer User Activity
Sources: https://www.youtube.com/watch?v=xHNhghPGy-g&feature=youtu.be
DX Marketing
Enterprise Level Data and Analytics for More Targeted Marketing
Reference: https://www.youtube.com/watch?v=QwkkHunVT5c&feature=youtu.be
UK National Health Service
Combating Healthcare Fraud
Objectives
Use new insight to help identify cost savings and meet goals
Identify and prevent healthcare fraud and benefit eligibility errors to save costs
Leverage existing data to transform business and productivity
Solution
Identified up to GBP100 million (US$156 million) potentially saved through benefit fraud and error reduction
Used anomaly detection to uncover fraudulent activity where some dentists split a single course of treatment into multiple parts and presented claims for multiple treatments
Analyzed billions of records at one time to measure longer-term patient journeys and to analyze drug prescribing patterns to improve patient care
“Oracle Advanced Analytics’ data mining capabilities and Oracle Exalytics’ performance really impressed us. The overall solution is very fast, and our investment very quickly provided value. We can now do so much more with our data, resulting in significant savings for the NHS as a whole.”
– Nina Monckton, Head of Information Services, NHS Business Services Authority
Oracle Exadata Database Machine
Oracle Advanced Analytics
Oracle Exalytics In-Memory Machine
Oracle Endeca Information Discovery
Oracle Business Intelligence EE
Update:
• £400M confirmed fraud
• £700M additional potential identified
• £1 Billion in savings… Moving to Cloud
Valdosta State University
Leveraging ML and Analytics in Higher Education
Objectives
Customizable, interactive dashboard that identifies the number of “at-risk students” in courses and prioritizes them based on their risk indicators.
Solution
Student Profiles provide important information about students as well as their likelihood of success.
Early Alert Triggers through a flagging system that allows faculty to alert advisors, student success personnel, housing faculty when needed.
Student Support and Resource Workflow that shows when students utilize advice and student services, integrating with the student portal.
"Institutions use a variety of technologies, both on premise and
cloud. Hybrid environments often lack the ability to provide
integrations across all enterprise resources. Choosing the right
technology will enable you to continue developing data-driven
solutions."
– Brian Haugabrook, Information Technology, Valdosta State Univ.
Excerpted from: https://www.oracle.com/ocom/groups/public/@otn/documents/webcontent/5232486.pdf
Fiserv
Risk Analytics in Electronic Payments
Objectives
Prevent $200M in losses every year using data to monitor, understand and anticipate fraud
Solution
We installed OAA analytics for model development during 2014
When choosing the tools for fraud management, speed is a critical factor
OAA provided a fast and flexible solution for model building, visualization and integration with production processes
“When choosing the tools for fraud management, speed is a critical factor. Oracle Advanced Analytics provided a fast and flexible solution for model building, visualization and integration with production processes.”
– Miguel Barrera, Director of Risk Analytics, Fiserv Inc.
– Julia Minkowski, Risk Analytics Manager, Fiserv Inc.
3 months to run & deploy Logistic Regression
(using SAS) 1 month to estimate and deploy Trees and GLM
1 week to estimate, 1 week to install rules in online application
1 day to estimate and deploy Trees + GLM models (using Oracle Advanced Analytics)
Oracle Advanced Analytics
Caixa Bank (the leading financial group in Spain)
Caixa Bank Leads Innovation with Big Data Technology
Objectives
Need to serve 14M customers every day through multi-channels
Need to find new and better ways to interact with customers in the digital age
Solution
Caixa Bank chose the Big Data Solution from Oracle: Exadata, Advanced Analytics, Big Data Appliance, Exalytics
Migrated all their previous analytics platforms to an Oracle architecture on hardware running Intel processors
Can now provide personalized offers for an individual vs. previously only by a segment
Predictive models to help risk management
“In order to meet today’s business needs and better serve our customers and improve their experience, it is important to embrace big data to enable new analytic capabilities and manage more information,”
– Jordi Fontanals, chief operations officer (and former CIO) of CaixaBank in Barcelona, Spain, and winner of the Oracle Excellence Award - CIO of the Year, Europe, Middle East, Africa
Sources: https://www.youtube.com/watch?v=xHNhghPGy-g&feature=youtu.be
Introducing…
ADW’s Oracle Machine Learning SQL Notebooks
ADWC + Oracle Machine Learning Notebooks
Oracle Exadata Cloud Service
Oracle Database Cloud Service
Express Cloud Service
Data Warehouse Services (EDWs, DW, departmental marts and sandboxes)
Autonomous Data Warehouse Cloud
Service Console
Built-in Access Tools
Oracle Machine Learning
Service Management
Autonomous Database Cloud
Oracle SQL Developer
Developer Tools
Oracle Object Storage Cloud
Flat Files and Staging
Data Integration
Services
Oracle Data Integration Platform Cloud
3rd Party DI on Oracle Cloud Compute
3rd Party DI On-premises
Oracle Analytics Cloud
Analytics
Oracle Machine Learning
Machine Learning Notebook for Autonomous Data Warehouse Cloud
Key Features
• Collaborative UI for data scientists
– Packaged with Autonomous Data Warehouse Cloud (V1)
– Easy access to shared notebooks, templates, permissions, scheduler, etc.
– SQL ML algorithms API (V1)
– Supports deployment of ML analytics
Oracle Machine Learning Algorithms

CLASSIFICATION
– Naïve Bayes
– Logistic Regression (GLM)
– Decision Tree
– Random Forest
– Neural Network
– Support Vector Machine
– Explicit Semantic Analysis

CLUSTERING
– Hierarchical K-Means
– Hierarchical O-Cluster
– Expectation Maximization (EM)

ANOMALY DETECTION
– One-Class SVM

TIME SERIES
– Holt-Winters, Regular & Irregular, with and w/o trends & seasonal
– Single, Double Exp Smoothing

REGRESSION
– Linear Model
– Generalized Linear Model
– Support Vector Machine (SVM)
– Stepwise Linear regression
– Neural Network

ATTRIBUTE IMPORTANCE
– Minimum Description Length
– Principal Comp Analysis (PCA)
– Unsupervised Pair-wise KL Div

ASSOCIATION RULES
– A priori / market basket

PREDICTIVE QUERIES
– Predict, cluster, detect, features

SQL ANALYTICS
– SQL Windows, SQL Patterns, SQL Aggregates

FEATURE EXTRACTION
– Principal Comp Analysis (PCA)
– Non-negative Matrix Factorization
– Singular Value Decomposition (SVD)
– Explicit Semantic Analysis (ESA)

STATISTICAL FUNCTIONS
– Basic statistics: min, max, median, stdev, t-test, F-test, Pearson’s, Chi-Sq, ANOVA, etc.

• OAA includes support for Partitioned Models, Transactional data, etc.
Quick OML + OAC Demo:
Targeting Best Likely High Credit Score Customers
Build Predictive Models to Predict Customers who are Likely to Have Good_Credit
Split Data into Train and Test
Build and Test Classification Model
Try ADWC + OML for FREE! FREE Oracle Cloud Credits - https://go.oracle.com/adwc
Additional ML and AA Product Details
Oracle Data Miner GUI
Drag and Drop, Workflows, Easy to Use UI for “Citizen Data Scientist”
• Easy to use to define analytical methodologies that can be shared
• SQL Developer Extension
• Workflow API and generates SQL code for immediate deployment
Oracle R Enterprise
R Language API Component to OAA Machine Learning Functions and SQL
• R language → SQL “push down”
– Transparency layer for “push down” to equivalent SQL for parallelized in-DB processing
• Direct access to DB data
– ROracle pkg for OCI connectivity
– “Embedded R” call outs to R packages
R: Transparency via function overloading
Invoke in-database aggregation function

> aggdata <- aggregate(ONTIME_S$DEST,
+                      by = list(ONTIME_S$DEST),
+                      FUN = length)

The ORE client packages' Transparency Layer translates this call into Oracle SQL that runs against ONTIME_S in Oracle Database (Oracle Advanced Analytics, in-db stats, on the database server):

select DEST, count(*)
  from ONTIME_S
 group by DEST

> class(aggdata)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
> head(aggdata)
  Group.1    x
1     ABE  237
2     ABI   34
3     ABQ 1357
4     ABY   10
5     ACK    3
6     ACT    3
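The function-overloading idea can be sketched with a tiny proxy object: it holds a table name rather than data, and its methods generate SQL that the database executes. This is an illustration only; `TableProxy`, `count_by` and the sample rows are hypothetical, and sqlite3 stands in for Oracle Database.

```python
import sqlite3

class TableProxy:
    """Hypothetical proxy (in the spirit of an ore.frame): no rows are held here."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table

    def count_by_sql(self, column):
        # Identifiers cannot be bound as parameters, so validate before formatting
        if not (column.isidentifier() and self.table.isidentifier()):
            raise ValueError("unexpected identifier")
        return (f"SELECT {column}, COUNT(*) FROM {self.table} "
                f"GROUP BY {column} ORDER BY {column}")

    def count_by(self, column):
        # The aggregation runs inside the database; only group counts come back
        return self.conn.execute(self.count_by_sql(column)).fetchall()

def demo_connection():
    """Small in-memory stand-in for the ONTIME_S table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ONTIME_S (DEST TEXT)")
    conn.executemany("INSERT INTO ONTIME_S VALUES (?)",
                     [("ABE",), ("ABI",), ("ABE",)])
    return conn

ontime = TableProxy(demo_connection(), "ONTIME_S")
counts = ontime.count_by("DEST")
```

The caller works with what looks like a local object, while the only data that moves is the aggregated result.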
How Oracle R Enterprise Works

R → SQL Transparency “Push-Down”
• R-SQL Transparency Layer for “Push Down” to SQL for in-Database execution
• Interactive display of graphical results using R
• All the power and flexibility of open source R

In-Database ML & Stats SQL Functions
• Leverage in-Database algorithms (ore.odmSVM, ore.odmDT, etc.)
• Ability to perform analytics using R if that’s the data scientist’s native language
• Leverage database strengths: SQL parallelism, scale to large datasets, security

Embedded R Package Callouts
• R Engine(s) spawned by Oracle DB for data parallelism
• ore.groupApply for high performance scoring
• Enables deployment and automation of R scripts

[Diagram: R client with Oracle R Enterprise (ORE) packages and other R packages sends R → SQL to Oracle Database 18c and receives results; the database spawns R engines with ORE packages and other R packages]
Oracle R Advanced Analytics for Hadoop (ORAAH)
R Language API Component to OAA Machine Learning Functions and SQL
• R & Java interface on BDA/BDA Cloud
– Manipulate data stored in a local File System, HDFS, HIVE, Impala or JDBC sources
– General computation framework where users invoke parallel, distributed MapReduce jobs from R, writing custom mappers and reducers in R while also leveraging open source CRAN packages
• Parallel and distributed Machine Learning algorithms
• Leverage Hadoop nodes and Spark for cluster-scalable, high-performance execution
• Functions use the expressive R formula object for Spark parallel execution
– ORAAH's custom LM/GLM/MLP NN algorithms scale better and run faster than the open-source Spark MLlib; MLlib algorithms are also exposed
• Availability
– On premise: part of Oracle Big Data Connectors license, Oracle BDA
– Cloud: part of Oracle Big Data Cloud Service and Oracle Big Data Cloud at Customer
Oracle R Advanced Analytics for Hadoop (ORAAH)
R Language API Component to OAA Machine Learning Functions and SQL
• New Algorithms:
– ELM (Extreme Learning Machines) and Hierarchical-ELM (Oracle’s MPI/Spark-engines)
– Distributed Stochastic PCA (Oracle’s MPI/Spark-engines)
– Distributed Stochastic SVD (Oracle’s MPI/Spark-engines)
– More statistical distributions, special and aggregation functions
• Distributions: Beta, Binomial, Cauchy, Chi-squared, Exponential, F-distribution, Gamma, Geometric, Hypergeometric, Log-normal, Normal, Poisson, Student's t, Triangular, Uniform, Weibull and Pareto
• Special functions: gamma(x), lgamma(x), digamma(x), trigamma(x), lanczos(x), factorial(x), lfactorial(x), lbeta(a, b), lchoose(n, k) - natural logarithm of the binomial coefficient
• New Aggregate Functions: avg(colExpr), max(colExpr), min(colExpr), sd(colExpr), sum(colExpr), variance(colExpr), kurtosis(colExpr), skewness(colExpr)
• Spark 2.x support
• Misc ML new features/enhancements
Newest Additional “GA” ML Features:
• Oracle Advanced Analytics 18c & 12.2
• Oracle Data Miner 18.4 UI
• ORE 1.5.1
• ORAAH 2.8
New in 18c: Machine Learning and Advanced Analytics
• Autonomous Data Warehouse Cloud + OML
• In-DB Machine Learning Algorithms (SQL API)
– Scalable Random Forests for Classification
– Scalable Neural Networks for both classification and regression
– CUR decomposition algorithm for attribute and row importance
– Time Series: State of the art forecasting [1] using Exponential Smoothing
• Includes all popular models, e.g. Holt-Winters. Supports trend, damped trend and seasonality. Performs automatic aggregations based on user-defined time periods or handles irregularly spaced data as direct input. Handles missing values.
• OML Microservices for applications deployment
1 Gartner paper
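For readers unfamiliar with the exponential smoothing family, the two simplest members can be written in a few lines. This is the textbook formulation (simple smoothing and Holt's linear trend method), not Oracle's in-database implementation; the function names and the naive initialization are assumptions made for the sketch, and seasonality, damping and missing-value handling are omitted.

```python
def simple_exp_smoothing(series, alpha):
    """Return the final smoothed level, used as the forecast for every future step."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def holt_forecast(series, alpha, beta, horizon):
    """Holt's linear trend method: smooth a level and a trend, then extrapolate."""
    # Naive initialization (an assumption): first value and first difference
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]

# A perfectly linear series is extrapolated along the same line
next_two = holt_forecast([1, 3, 5, 7, 9], alpha=0.5, beta=0.5, horizon=2)
```

Holt-Winters extends the same recursion with a third, seasonal component.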
Oracle Advanced Analytics 12.2
New Oracle Database Features
• Significant Performance Improvements for all Algorithms
– New parallel model build / apply redesigned infrastructure to enable faster new algorithm introduction
– Scale to larger data volumes found in big data and cloud use cases
• Unsupervised Feature Selection
– Unsupervised algorithm for pair-wise correlations (Kullback-Leibler Divergence (KLD)) for numeric & categorical attributes to find highest “information containing” attributes
• Association Rules Enhancements
– Adds calculation of values associated with AR rules such as sales amount to indicate the value of co-occurring items in baskets
– Can filter input items prior to market basket analysis
• Partitioned Models
– Instead of building, naming and referencing 10s or 1000s of models, partitioned models organize and represent multiple models as partitions in a single model entity (Partition 1, Partition 2, …, Partition N)
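The partitioned-model idea, one model entity that internally holds a sub-model per partition and routes each row to the right one, can be sketched as follows. The class, the column names and the deliberately trivial per-partition "model" (a mean) are hypothetical; this shows the structure only, not how OAA builds or stores partitions.

```python
from collections import defaultdict
from statistics import mean

class PartitionedMeanModel:
    """One model object; one trivial sub-model (the target mean) per partition."""

    def __init__(self, partition_key, target):
        self.partition_key = partition_key
        self.target = target
        self.partitions = {}

    def fit(self, rows):
        groups = defaultdict(list)
        for row in rows:
            groups[row[self.partition_key]].append(row[self.target])
        # Each partition gets its own fitted sub-model
        self.partitions = {key: mean(values) for key, values in groups.items()}
        return self

    def predict(self, row):
        # Scoring routes the row to the sub-model for its partition
        return self.partitions[row[self.partition_key]]

rows = [
    {"REGION": "EAST", "SALES": 10.0},
    {"REGION": "EAST", "SALES": 20.0},
    {"REGION": "WEST", "SALES": 100.0},
]
model = PartitionedMeanModel("REGION", "SALES").fit(rows)
```

Callers reference a single name (`model`) however many partitions exist, which is the bookkeeping benefit the slide describes.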
Oracle Advanced Analytics 12.2
New Cognitive Modeling for Text Processing
• Explicit Semantic Analysis (ESA) algorithm
– Useful technique for extracting meaningful, interpretable features; better than LDA
– English Wikipedia is the default text corpus, used to equate tokens with human-identifiable features and concepts
– ESA improves text processing, classification, document similarity and topic identification
• Compare docs that may not mention the same topics, e.g. al-Qa ida or Osama bin Laden:
Document 1: 'Senior members of the Saudi royal family paid at least $560 million to Osama bin Laden terror group and the Taliban for an agreement his forces would not attack targets in Saudi Arabia, according to court documents. The papers, filed in a $US3000 billion ($5500 billion) lawsuit in the US, allege the deal was made after two secret meetings between Saudi royals and leaders of al-Qa ida, including bin Laden. The money enabled al-Qa ida to fund training camps in Afghanistan later attended by the September 11 hijackers. The disclosures will increase tensions between the US and Saudi Arabia.'
Document 2: 'The Saudi Interior Ministry on Sunday confirmed it is holding a 21-year-old Saudi man the FBI is seeking for alleged links to the Sept. 11 hijackers. Authorities are interrogating Saud Abdulaziz Saud al-Rasheed "and if it is proven that he was connected to terrorism, he will be referred to the sharia (Islamic) court," the official Saudi Press Agency quoted an unidentified ministry official as saying.'
– ESA Similarity Score = 0.62
• New Wikipedia-Based Cognitive Model Available for Text Processing blog post
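The document-similarity idea behind ESA can be illustrated with a toy sketch: each document is mapped to a vector of weights over named concepts, and two documents are compared by the cosine of those vectors. The three-concept "knowledge base" and its term weights below are entirely hypothetical (a real ESA model derives them from a corpus such as Wikipedia), so the scores are illustrative and unrelated to the 0.62 quoted above.

```python
from math import sqrt

# Hypothetical toy knowledge base: concept -> term weights
CONCEPTS = {
    "Terrorism": {"hijackers": 1.0, "terror": 1.0, "attack": 0.5},
    "Saudi Arabia": {"saudi": 1.0, "royal": 0.5, "riyadh": 1.0},
    "Banking": {"loan": 1.0, "deposit": 1.0},
}

def concept_vector(text):
    """Map a document to one weight per concept by summing matching term weights."""
    tokens = text.lower().split()
    return {concept: sum(weights.get(tok, 0.0) for tok in tokens)
            for concept, weights in CONCEPTS.items()}

def cosine(u, v):
    dot = sum(u[c] * v[c] for c in u)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similarity(doc_a, doc_b):
    return cosine(concept_vector(doc_a), concept_vector(doc_b))

# The first two documents share no words, yet land on the same concepts
doc1 = "saudi royal family terror group"
doc2 = "riyadh man linked to hijackers"
doc3 = "loan deposit"
```

Because the comparison happens in concept space, `doc1` and `doc2` score as similar even with no token overlap, while `doc3` does not.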
Oracle R Enterprise 1.5.1
• OREdplyr
– OREdplyr provides a subset of dplyr functionality extending the Oracle R Enterprise transparency layer
• OAAgraph
– Supports use of a graph representation of data (Oracle PGX) where data entities are nodes and relationships are edges
• ORE support for new in-database algorithms
– Expectation Maximization (EM)
– Explicit Semantic Analysis (ESA)
– Singular Value Decomposition (SVD)
• ORE support for Automated Text Processing
• ORE support for Partitioned Models
• Extensible R Algorithm Models
https://docs.oracle.com/cd/E83411_01/ORERN/toc.htm#ORERN-GUID-3B6ED739-40BC-49AF-A1F9-7E9EC93A2429
Oracle R Advanced Analytics for Hadoop (ORAAH)
New Features in ORAAH 2.8
• ORAAH analytics as standalone Java library
• New MPI/Spark-based Algorithms
– ELM and H-ELM on Spark+MPI
– Distributed Stochastic PCA & SVD
• Updated ORAAH GLM and LM algorithms
• Neural Networks full formula processing in Spark
• Full formula support and summary for orch.lm2() and orch.glm2()
• Support for IMPALA and Spark DataFrames
• Gradient Boosted Trees (Spark MLlib)
ORAAH: Spark + MPI Combined Performance
• Spark Strengths
– Excellent for embarrassingly parallel problems
– Poor at iterative problems
• MPI Strengths
– Excels for highly interconnected problems with thousands of nodes
– Strong scaling for iterative problems
• Spark + MPI Combined Performance
– FASTER! Best of both worlds
New Oracle Machine Learning for Python (OML4Py)
(Available soon on OTN for download as an OAA add-in component, like Oracle R Enterprise)
Oracle Machine Learning for Python
Oracle Advanced Analytics Option to 18c+
• Similar architecture to OAA’s Oracle R Enterprise
• OML4Py Transparency Layer
– Use Oracle Database as High Performance Computing environment
• OML4Py OAA Model Build and Apply
– Use OAA parallel and distributed ML algorithms
– Manage Python scripts and Python objects in Oracle Database
• OML4Py Embedded Python
– Make callout to Python packages
– Integrate Python results into applications via SQL
[Diagram: Python client with OML for Python, and SQL interfaces (SQL*Plus, SQL Developer, …), connected to Oracle Database on the database server machine (user tables, in-db stats)]
cx_Oracle
• Python package enabling scalable and performant connectivity to Oracle Database
– Open source, publicly available on PyPI, OTN, and github
– Oracle is maintainer
• Oracle Database Interface for Python conforming to Python DB API 2.0 specification
– Optimized driver based on OCI
– Execute SQL statements from Python
– Enables transactional behavior for insert, update, and delete
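Because the interface follows the Python DB API 2.0 pattern (connect, cursor, execute, commit/rollback), the transactional behavior described above can be sketched with any conforming driver. The example below uses the standard library's sqlite3 purely as a stand-in so that it runs without an Oracle instance; the `accounts` table and `transfer` function are hypothetical. One known difference: cx_Oracle uses named or numbered bind variables (`:amount`, `:1`) where sqlite3 uses `?`.

```python
import sqlite3  # stand-in DB API 2.0 driver; not cx_Oracle

def make_accounts():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Apply two updates atomically: both are committed, or neither is."""
    cur = conn.cursor()
    try:
        cur.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        cur.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        cur.execute("SELECT balance FROM accounts WHERE id = ?", (src,))
        if cur.fetchone()[0] < 0:
            raise ValueError("insufficient funds")
        conn.commit()      # make both updates permanent
        return True
    except ValueError:
        conn.rollback()    # undo the partial work
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts").fetchall())
```

With a real Oracle connection, only the `connect` call and the bind-variable style would be expected to change.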
OML4Py Embedded Python

def fit(data):
    from sklearn.svm import LinearSVC
    x = data.drop('TARGET', axis = 1).values
    y = data['TARGET']
    return LinearSVC().fit(x, y)

oml.script.create('sk_svc_fit', fit, overwrite = True)
oml.script.dir()

mod = oml.table_apply(train_dat,
                      func = 'sk_svc_fit',
                      oml_input_type = 'pandas.DataFrame')

[Diagram, steps 1 to 4: client Python engine (OML4Py), pyq*eval() interface, Oracle Database user tables, extproc, DB Python engine (OML4Py)]
oml.group_apply: partitioned data flow

def build_lm(dat):
    from sklearn import linear_model
    lm = linear_model.LinearRegression()
    X = dat[['PETAL_WIDTH']]
    y = dat[['PETAL_LENGTH']]
    lm.fit(X, y)
    return lm

index = oml.DataFrame(IRIS['SPECIES'])
mods = oml.group_apply(
    IRIS[:, ['PETAL_LENGTH', 'PETAL_WIDTH', 'SPECIES']],
    index,
    func = build_lm)
sorted(mods.pull().items())

[Diagram, steps 1 to 4: client Python engine (OML4Py), pyq*eval() interface, Oracle Database user tables, multiple extproc processes each running a DB Python engine (OML4Py)]
Embedded Python Execution functions
• oml.do_eval(func[, func_value, func_owner, . . . ])
– Executes the user-defined Python function at the Oracle Database server machine
• oml.table_apply(data, func[, func_value, . . . ])
– Executes the user-defined Python function at the Oracle Database server machine supplying data pulled from Oracle Database
• oml.row_apply(data, func[, func_value, . . . ])
– Partitions a table or view into row chunks and executes the user-defined Python function on each chunk within one or more Python processes running at the Oracle Database server machine
• oml.group_apply(data, index, func[, . . . ])
– Partitions a table or view by the values in column(s) specified in index and executes the user-defined Python function on those partitions within one or more Python processes running at the Oracle Database server machine
• oml.index_apply(times, func[, func_value, . . . ])
– Executes the user-defined Python function multiple times inside the Oracle Database server
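The partitioning semantics described for the row and group variants can be emulated locally in plain Python. The two helper functions below are hypothetical stand-ins written for this sketch: they are not the `oml` functions, they take simplified arguments, and they run in the client process rather than in Python engines spawned at the database server.

```python
from collections import defaultdict

def local_group_apply(rows, index, func):
    """Partition rows by the value of the `index` column; call func once per partition."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[index]].append(row)
    return {key: func(part) for key, part in groups.items()}

def local_row_apply(rows, func, chunk_size):
    """Split rows into consecutive chunks; call func once per chunk."""
    return [func(rows[i:i + chunk_size]) for i in range(0, len(rows), chunk_size)]

iris_like = [
    {"SPECIES": "setosa", "PETAL_LENGTH": 1.4},
    {"SPECIES": "setosa", "PETAL_LENGTH": 1.3},
    {"SPECIES": "virginica", "PETAL_LENGTH": 6.0},
]

# One result per species, mirroring "one model per group" in the slide's example
counts_by_species = local_group_apply(iris_like, "SPECIES", len)
chunk_sizes = local_row_apply(iris_like, len, chunk_size=2)
```

The user-defined function sees only its own partition or chunk, which is what allows the server-side versions to run partitions in separate Python processes.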
Script repository functions for saving Python scripts in ODB
• oml.script.create(name, func[, is_global, . . . ])
– Creates a Python script, which contains a single function definition, in the Oracle Database Python script repository
• oml.script.dir([name, regex_match, sctype])
– Lists the scripts present in the Oracle Database Python script repository
• oml.script.load(name[, owner])
– Loads the named script from the Oracle Database Python script repository as a callable object
• oml.script.drop(name[, is_global, silent])
– Drops the named script from the Oracle Database Python script repository
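The create / dir / load / drop life cycle listed above can be illustrated with a toy in-memory stand-in. `LocalScriptRepo` is a hypothetical class written for this sketch, not part of OML4Py; it keeps callables in a dictionary, whereas the real repository stores scripts in Oracle Database.

```python
import re

class LocalScriptRepo:
    """Toy stand-in for a script repository: named callables kept in a dict."""

    def __init__(self):
        self._scripts = {}

    def create(self, name, func, overwrite=False):
        """Register func under name; refuse to replace unless overwrite is set."""
        if name in self._scripts and not overwrite:
            raise ValueError(f"script {name!r} already exists")
        self._scripts[name] = func

    def dir(self, name=None, regex_match=False):
        """List script names, optionally filtered by exact name or regex."""
        names = sorted(self._scripts)
        if name is None:
            return names
        if regex_match:
            return [n for n in names if re.search(name, n)]
        return [n for n in names if n == name]

    def load(self, name):
        """Return the stored script as a callable object."""
        return self._scripts[name]

    def drop(self, name, silent=False):
        """Remove a script; a missing name raises unless silent is set."""
        if name not in self._scripts:
            if silent:
                return
            raise KeyError(name)
        del self._scripts[name]

repo = LocalScriptRepo()
repo.create("double_it", lambda x: 2 * x)
repo.create("square_it", lambda x: x * x)
listed = repo.dir()
matched = repo.dir(name="^sq", regex_match=True)
result = repo.load("double_it")(21)
repo.drop("double_it")
remaining = repo.dir()
```

Saving a function under a name is what lets an apply call refer to it by string, as in func = 'sk_svc_fit' on the earlier slide.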
New AutoML, initially available in Oracle Machine Learning for Python (OML4Py)
(Available soon on OTN for download as an OAA add-in component, like Oracle R Enterprise)
Model Build & Real-time SQL Apply Prediction
OML4Py Syntax
ML Model Build: Best Model (Python)
ms = ModelSelection(
    mining_function = 'classification',
    score_metric = 'accuracy')
best_model = ms.select(X_train, y_train)
Model Apply (Python)
y_pred = best_model.predict(X_test)
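The slide does not say how ModelSelection picks its winner, so the following is only a generic sketch of score-based model selection: fit each candidate, score it on held-out data with the chosen metric, and keep the best. The two toy classifiers, the `select_best` helper, and the data are all hypothetical and are not Oracle's algorithm.

```python
from collections import Counter

class MajorityClass:
    """Baseline: always predict the most frequent training label."""
    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

class NearestCentroid:
    """Predict the label whose per-class mean vector is closest."""
    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
        self.centroids_ = {lab: [s / counts[lab] for s in acc]
                           for lab, acc in sums.items()}
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids_, key=lambda lab: dist2(row, self.centroids_[lab]))
                for row in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def select_best(candidates, X_train, y_train, X_valid, y_valid):
    """Fit every candidate and return (name, score, model) of the top scorer."""
    scored = []
    for name, factory in candidates.items():
        model = factory().fit(X_train, y_train)
        scored.append((accuracy(y_valid, model.predict(X_valid)), name, model))
    best_score, best_name, best_model = max(scored, key=lambda t: t[0])
    return best_name, best_score, best_model

X_train = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.8]]
y_train = ["a", "a", "a", "b", "b"]
X_valid = [[1.1, 1.0], [4.9, 5.1], [5.1, 5.0], [0.8, 0.9]]
y_valid = ["a", "b", "b", "a"]

candidates = {"majority_class": MajorityClass, "nearest_centroid": NearestCentroid}
best_name, best_score, best_model = select_best(
    candidates, X_train, y_train, X_valid, y_valid)
y_pred = best_model.predict([[5.0, 4.9]])
```

This exhaustive loop fits every candidate, which is the cost an automated selector would try to reduce.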
Automated Machine Learning (AutoML) Components
[Diagram: Training Data (X1 X2 X3 X4 X5, Y) → Machine Learning Pipeline → ML Model; Inference: X1 X2 X3 X4 X5 → Y]
1. Auto Feature Selection (AutoFS): >50% reduction in features
2. Auto Model Selection (AutoMS): 4x faster than exhaustive search
3. Auto Tuning of Hyperparameters (AutoTune): up to 24% accuracy improvement
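As a rough illustration of two of the stages above, the sketch below does feature selection by ranking columns on absolute correlation with the target, then tunes one hyperparameter (k of a k-nearest-neighbor classifier) by grid search on a validation set. These are textbook techniques swapped in for illustration; the slides do not describe how AutoFS or AutoTune work, and the data, function names, and grid are hypothetical. The model selection stage is fixed to a single model here for brevity.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)

def select_features(X, y, keep):
    """Rank columns by |correlation with y| and keep the top `keep` indices."""
    n_cols = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])

def project(X, cols):
    """Keep only the selected columns."""
    return [[row[j] for j in cols] for row in X]

def knn_predict(X_train, y_train, X_new, k):
    """Binary k-NN with 0/1 labels; predicts 1 on a strict majority of votes."""
    preds = []
    for row in X_new:
        nearest = sorted(
            range(len(X_train)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(row, X_train[i])))[:k]
        votes = sum(y_train[i] for i in nearest)
        preds.append(1 if 2 * votes > k else 0)
    return preds

def tune_k(X_train, y_train, X_valid, y_valid, grid):
    """Grid search: return the k with the best validation accuracy."""
    def acc(k):
        preds = knn_predict(X_train, y_train, X_valid, k)
        return sum(p == t for p, t in zip(preds, y_valid)) / len(y_valid)
    best_k = max(grid, key=acc)
    return best_k, acc(best_k)

# Column 0 is informative, column 1 is noise, column 2 is constant.
# The fourth training row is deliberately noisy (small value, label 1).
X_train = [[0.1, 5.0, 1.0], [0.2, 1.0, 1.0], [0.3, 4.0, 1.0], [0.24, 3.0, 1.0],
           [0.9, 2.0, 1.0], [1.0, 5.0, 1.0], [1.1, 1.0, 1.0]]
y_train = [0, 0, 0, 1, 1, 1, 1]
X_valid = [[0.15, 3.0, 1.0], [0.25, 2.0, 1.0], [0.95, 4.0, 1.0], [1.05, 3.0, 1.0]]
y_valid = [0, 0, 1, 1]

kept = select_features(X_train, y_train, keep=1)
best_k, best_acc = tune_k(project(X_train, kept), y_train,
                          project(X_valid, kept), y_valid, grid=[1, 3, 7])
```

On this toy data the default-like choice k=1 is misled by the noisy row, while the tuned k is not; the percentages quoted on the slide are Oracle's own figures and are not reproduced by this sketch.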
AutoTune: Evaluation for OAA Neural Network
8-22% improvement for several datasets
Avg Improvement of 2.86%
[Chart: OAA Neural Network - Default vs Tuned Accuracy. Y-axis: Leave-out Test Set Accuracy (%), 40% to 100%. Series: Default, Tuned.]
Oracle Applications that Embed Oracle Machine Learning Algorithms
Oracle Machine Learning Applications: GA
• HCM Cloud: Workforce Predictions
• CRM Sales Cloud: Sales Prediction
• Retail GBU: Customer Insights, Customer Segmentation
• Adaptive Intelligent Apps for Manufacturing
• CPQ Cloud: Configure, Price, Quote
• Industry Data Models: Comms, SNA, Utilities, Airlines, Retail, …
• Oracle E-Business Suite: Spend Classification
• Oracle Identity Mgmt: Adaptive Access Mgmt
• FSGBU: Analytical Applications Infrastructure
Oracle Adaptive Intelligent (AI) Apps for Manufacturing
ML Use Cases
• Insights (Patterns and Correlations Analysis)
– Discover key influencers and patterns that affect yield & quality
• Predictive Analytics
– Predict critical outcomes during manufacturing to minimize losses
Reasons for using OAA’s ML
• Easy-to-integrate R & PL/SQL APIs for many ML algorithms
• In-database execution & scalable performance
• Enterprise-grade support for OAA ML
GA: Q4 FY18
Achieve Manufacturing Operational Excellence using Machine Learning & AI
[Screenshots and conceptual diagram]
New Oracle Machine Learning Microservices
Cognitive Microservices now available internally as hosted ML Services for Oracle Applications
New Oracle Machine Learning Microservices
(Available Now to Internal Oracle Application Teams)
• Services for building, storing, and deploying Oracle Machine Learning models
• Cognitive Services
– Image and Text
• Documented RESTful APIs for application integration
• Containers (Docker images) for portability
• Model repository for storing, versioning, and comparing ML models
• Kubernetes support for container management
OML Image Microservices
• Cognitive APIs for images: tagging, classification, face detection, age, gender, inappropriate content, image similarity…
– Deep Learning models & Transfer Learning
OML Text Microservices
• Cognitive Text API
– Summary
– Key words
– Topics
– Sentiment
– Similarity
• Uses Explicit Semantic Analysis (ESA) Wikipedia model
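The idea behind Explicit Semantic Analysis can be sketched in a few lines: each text is mapped to a weighted vector over named concepts (encyclopedia articles in the real model), and two texts are compared by the cosine of their concept vectors. The three-concept mini "encyclopedia" below is invented for illustration and stands in for the Wikipedia model; the TF-IDF weighting shown is one common choice, not necessarily what the OML service uses.

```python
import math
from collections import Counter

# Hypothetical concept corpus standing in for encyclopedia articles.
CONCEPTS = {
    "Database": "database table sql query index storage transaction",
    "Machine learning": "model training prediction algorithm feature data learning",
    "Cooking": "recipe oven bake flour sugar butter",
}

def tokenize(text):
    return text.lower().split()

def build_index(concepts):
    """Map each term to {concept: TF-IDF weight}."""
    n = len(concepts)
    tf = {name: Counter(tokenize(body)) for name, body in concepts.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    index = {}
    for name, counts in tf.items():
        for term, count in counts.items():
            idf = math.log(n / df[term]) + 1.0  # +1 keeps shared terms above zero
            index.setdefault(term, {})[name] = count * idf
    return index

def concept_vector(text, index):
    """Sum the concept weights of every known term in the text."""
    vec = Counter()
    for term in tokenize(text):
        for concept, weight in index.get(term, {}).items():
            vec[concept] += weight
    return vec

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)

index = build_index(CONCEPTS)
a = concept_vector("sql query on a database table", index)
b = concept_vector("index storage for a transaction", index)
c = concept_vector("bake with flour and butter", index)

sim_ab = cosine(a, b)            # no shared words, same concept
sim_ac = cosine(a, c)            # different concepts
top_topic = max(a, key=a.get)    # strongest concept for text a
```

Texts a and b share no words yet score as similar because both map to the same concept, which is why a concept-based model can serve topics, keywords, and similarity from one representation.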
Content and Experience Cloud: PRO 4648
Deliver Personalized Experiences and Automate Content Management with AI and ML
Tuesday 10/23, 5:45 p.m.
Marriott Marquis (Yerba Buena Level), Nob Hill C/D
Kiran Bellare
Content and Experience Cloud (CEC)
• Originally Documents Cloud Service, which focused on EFSS (Enterprise File Sync and Share) for simple document management; the focus was on files, folders, and sharing
• Migrating toward Web Content Management and Customer Experience; the focus is shifting to digital assets, content items, web sites, multi-channel delivery, and the customer experience
Content and Experience Cloud
• Upon instance delivery, CEC is an empty container
• CEC has no specific target business focus
• Customers leverage the toolset to construct their environments: content types, assets, delivery channels, web site pages, custom web components, etc.
• Customers then develop specific content (images, descriptions, blogs, articles, summaries, reviews, etc.)
• Phase 1: Content Development and Curation
– Apply unsupervised learning algorithms, with Wikipedia-based models as a common baseline, especially for smaller repositories
– Text and image keyword tagging
– Identify inappropriate content
– Auto-classification via similarity match against customer-defined categories or hierarchies
– Similar content via similarity recommendations
– Sentiment analysis on visitor comments
• OML Microservices
– Cognitive Image: Classification, NSFW
– Cognitive Text: Topics, Keywords, Similarity, Sentiment
Conclusion: Data Management + Machine Learning: Simpler, More Powerful, Smarter
“Why Oracle?
Because that’s where the data is!”
– Larry Ellison, Executive Chairman and CTO of Oracle Corporation
Getting Started: Oracle ML/AA Resources & Links
Oracle Advanced Analytics Overview Information
• Oracle Machine Learning Newest Features and Road Map.pptx presentation
• Blog post: Simple Guide to Oracle’s Machine Learning and Advanced Analytics
• Oracle Advanced Analytics Public Customer References
• Oracle’s Machine Learning and Advanced Analytics Data Management Platforms white paper on OTN
• Oracle INTERNAL ONLY Oracle ML & AA Product Management Wiki and Beehive Workspace
YouTube Recorded Oracle Advanced Analytics Presentations and Demos, White Papers
• Oracle's Machine Learning & Advanced Analytics 12.2 & Oracle Data Miner 4.2 New Features YouTube video
• Library of YouTube Movies on Oracle Advanced Analytics, Data Mining, Machine Learning (7+ “live” demos, e.g. Oracle Data Miner 4.0 New Features, Retail, Fraud, Loyalty, Overview, etc.)
• Overview YouTube video of Oracle’s Advanced Analytics and Machine Learning
Getting Started/Training/Tutorials
• Link to OAA/Oracle Data Miner Workflow GUI Online (free) Tutorial Series on OTN
• Link to OAA/Oracle R Enterprise (free) Tutorial Series on OTN
• Link to Try the Oracle Cloud Now!
• Link to Getting Started w/ ODM blog entry
• Link to New OAA/Oracle Data Mining 2-Day Instructor Led Oracle University course
• Oracle Data Mining Sample Code Examples
Additional Resources, Documentation & OTN Discussion Forums
• Oracle Advanced Analytics Option on OTN page
• OAA/Oracle Data Mining on OTN page, ODM Documentation & ODM Blog
• OAA/Oracle R Enterprise page on OTN page, ORE Documentation & ORE Blog
• Oracle SQL based Basic Statistical functions on OTN
• Oracle R Advanced Analytics for Hadoop (ORAAH) on OTN
• Analytics and Data Summit: All Analytics, All Data, No Nonsense. March 12-14, 2019, Redwood Shores, CA
Send email now to charlie.berger@oracle.com and you’ll get my “away message” with the links.
Try Oracle Database 18c + Oracle Data Miner for FREE!
1. Navigate to https://cloud.oracle.com/en_US/database
2. Sign up for a Free Oracle Database Trial
3. Download SQL Developer and Configure Oracle Data Miner