Create A Machine Learning Notebook
In this lab we take the role of Lisa, a data manager who has spent most of the past couple of years extracting and preparing data for analysis.
The large volumes of data that need extracting and processing mean Lisa spends most of her time waiting for jobs to finish and very little of her time analyzing the data.
Demands from marketing are forcing a new approach whereby the data remains in the data warehouse and is processed there.
Lisa started taking a look at Oracle and found that the simple SQL commands in ADWC are familiar and execute extremely fast, leveraging all the performance features of the platform. Furthermore, once she is done, she can apply the learning models to incoming data at any time and allow end-user analysts to immediately see mining results.
This drastically reduces the cycle of data preparation, analysis, and publishing. It also means there is no change to the analysis/reporting Data Visualization toolset that users are familiar with.
Lisa has a hunch that weakening sales may be due to the company selling to non-optimal customers: customers who perhaps have poor credit and fail to make the payments for their purchases.
Lisa has over 100 variables to consider, so she wants to first explore her data using simple charts and graphs, and then move on to using Oracle Machine Learning's powerful algorithms to automatically sift through her data to find patterns and new insights, and to make predictions that target her best customers: those who have good credit.
In this lab we will process a small data set of 100,000 records, but we could use a 100M or billion row data set without worrying about processing time.
Copyright © 2020, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Section 1: Explore Oracle Machine Learning
1. From your Console, click the hamburger icon in the top left corner and select Autonomous Data Warehouse.
2. This will return you to your database page where you can select your live database.
3. Click Database Actions > View All Database Actions.
5. Enter the login details of the user you created in the previous section and click Sign In.
6. You will see the Oracle Machine Learning Console and its landing page as below:
7. Click the hamburger menu in the top left and select Notebooks.
9. Enter Predict Credit Scores in the name field of the dialog and Click OK.
10. The notebook server will start.
12. Once the notebook server has started and the notebook has loaded we need to set the interpreter binding. Click on the gear icon.
14. For the remainder of this lab we will build the functionality of the Machine Learning Notebook.
Section 2: Working with a Machine Learning Notebook
Prepare the Data in the notebook and click the run icon to execute. In this step we will create 2 views.
2. To create a new Paragraph, position your mouse at the bottom of the previously created Paragraph; + Add Paragraph should pop up. Click + Add Paragraph to create a new Paragraph.
3. In the newly created Paragraph, delete the %sql text, add the following PL/SQL script and click the Run this paragraph icon:
%script
/* Click on the arrow to the right to execute this.*/
BEGIN
execute immediate 'create or replace view credit_scoring_100k_v as select * from admin.credit_scoring_100k';
execute immediate 'create or replace view credit_scoring_new_cust_v as select * from admin.credit_scoring_100k sample (10)';
end;
4. You should receive a PL/SQL procedure successfully completed message and a new Paragraph will have been created.
NOTE: If you do not receive a PL/SQL procedure successfully completed message then check you have typed the code for the
PL/SQL statement to create 2 views exactly as above.
5. In the newly created paragraph, delete the %script text and add the following text in its place:
6. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon
%sql
/* Click on the arrow in the upper right to run a query that shows the tables and views owned by the current ML user (the
current one we are using).*/
Select table_name from user_tables where table_name not like ('DM$%')
UNION
select view_name from user_views where view_name not like ('DM$%')
7. The SQL should execute, and you should be shown a list of tables and views including the ones we created previously:
8. In the newly created paragraph, delete the %sql and add the following text in its place:
9. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon
%sql
/* Execute this (arrow upper right) */
select column_name from user_tab_cols where table_name='CREDIT_SCORING_100K_V';
11. In the newly created paragraph, delete the %sql and add the following text in its place:
Show a sample of the records in the table. We will use the historical credit scoring data to predict the likelihood of a
customer having good credit in the future
12. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon
%sql
/* This shows the credit scoring data. We will use the historical data to predict the likelihood of a customer having good credit. */
Select * from credit_scoring_100k_v where rownum < 100
13. You should see a sample of the rows in the table returned by the query.
14. In the newly created paragraph, delete the %sql and add the following text in its place:
Good Credit Customers are Hard to Find! Show a Bar Graph of a sample of the records in the table.
15. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon
%sql
select customer_id, credit_score_bin from credit_scoring_100k_v sample (10)
17. To display this data in a graph format, click the Bar Graph icon.
18. You should see a graphical representation of the query. It’s not very informative.
19. To change the settings of the graphical representation, click settings
21. We wish to set the Key to CREDIT_SCORE_BIN and the Values to CUSTOMER_ID COUNT.
c) Add the new key CREDIT_SCORE_BIN by selecting and dragging CREDIT_SCORE_BIN from Available Fields: to Keys
d) Add the new value CUSTOMER_ID by selecting and dragging CUSTOMER_ID from Available Fields: to Values
e) By default, the newly added value field will be aggregated using the SUM function.
f) We wish the newly added value field CUSTOMER_ID SUM to be counted using the COUNT function. To do this click on the
CUSTOMER_ID SUM to display a drop-down list of available functions.
22. a) The Graph should now show a count of Good Credit customers and Other Credit Customers. Hover your mouse over the bars to
see the count.
23. In the newly created paragraph below the bar chart, delete the %sql and add the following text in its place:
24. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon
%sql
select customer_id, age, income, tenure, loan_type, loan_amount, occupation, number_of_current_accounts,
max_cc_spent_amount, mode_job_of_contacts from credit_scoring_100k_v where rownum < 1000
25. You should see a sample of the rows in the table returned by the query.
26. We wish to visualize this in a chart format. To display this data in a graph format, click the Bar Graph icon.
27. You should see a graphical representation of the query. It’s not very informative.
29. The current chart settings will be displayed.
30. We wish to set the Key to MODE_JOB_OF_CONTACTS and the Values to CUSTOMER_ID COUNT.
c) Add the new key MODE_JOB_OF_CONTACTS by selecting and dragging MODE_JOB_OF_CONTACTS from Available Fields:
to Keys
d) Add the new value CUSTOMER_ID by selecting and dragging CUSTOMER_ID from Available Fields : to Values
e) Change the function for the newly added value field CUSTOMER_ID SUM to use the COUNT function. To do this click on the
CUSTOMER_ID SUM to display a drop-down list of available functions and select COUNT.
31. The Graph should now show a count of the job groupings of a set of customers. Hover your mouse over the various bars to see the
count.
33. In the newly created paragraph below the bar chart, delete the % text and add the following text in its place:
34. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon
%sql
35. You should see a sample of the rows in the table returned by the query.
36. We wish to visualize this in a chart format. For this query we wish to use a pie chart, click the Pie Chart icon.
37. You should see a graphical representation of the query. It’s very colorful but not very informative.
39. The current chart settings will be displayed.
40. We wish to set the Key to OCCUPATION and the Values to CUSTOMER_ID COUNT.
c) Add the new key OCCUPATION by selecting and dragging OCCUPATION from Available Fields: to Keys
d) Add the new value CUSTOMER_ID by selecting and dragging CUSTOMER_ID from Available Fields: to Values
e) Change the function for the newly added value field CUSTOMER_ID SUM to use the COUNT function. To do this click on the
CUSTOMER_ID SUM to display a drop-down list of available functions and select COUNT.
41. The Graph should now show a count of the OCCUPATION groupings of a set of customers. Hover your mouse over the various
sections to see the count.
42. In the newly created paragraph below the pie chart, delete the % text and add the following text in its place:
43. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon
%sql
44. You should see a sample of the rows in the table returned by the query.
45. We wish to visualize this in a chart format. For this query we wish to use a pie chart, click the Pie Chart icon.
46. You should see a graphical representation of the query. Again, it’s very colorful but not very informative.
48. The current chart settings will be displayed.
49. We wish to set the Key to LOAN_TYPE and the Values to CUSTOMER_ID COUNT.
c) Add the new key LOAN_TYPE by selecting and dragging LOAN_TYPE from Available Fields: to Keys
d) Add the new value CUSTOMER_ID by selecting and dragging CUSTOMER_ID from Available Fields: to Values
e) Change the function for the newly added value field CUSTOMER_ID SUM to use the COUNT function. To do this click on the
CUSTOMER_ID SUM to display a drop-down list of available functions and select COUNT.
50. The Graph should now show a count of the LOAN_TYPE groupings of a set of customers. Hover your mouse over the various
sections to see the count.
51. In the newly created paragraph below the pie chart, delete the % text and add the following text in its place:
Review Income by Age and Tenure Grouped by loan type in a Scatter Graph format
52. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon
%sql
select customer_id, age, income, tenure, loan_type, loan_amount, occupation, education_level
from credit_scoring_100k_v where rownum < 40
53. You should see a sample of the rows in the table returned by the query.
54. We wish to visualize this in a chart format. For this query we wish to use a Scatter Chart, click the Scatter Chart icon.
55. You should see a graphical representation of the query. It is based on only one field and not very informative.
NOTE: In a scatter chart we do not have keys and values; instead, the data is represented on the X and Y axes.
58. We wish to set the xAxis to AGE, the yAxis to TENURE, the grouping to LOAN_TYPE and the size to CUSTOMER_ID.
b) Add the new xAxis AGE by selecting and dragging AGE from Available Fields: to xAxis
d) Add the new yAxis TENURE by selecting and dragging TENURE from Available Fields: to yAxis
e) Add the new group LOAN_TYPE by selecting and dragging LOAN_TYPE from Available Fields: to group
f) Add the new size CUSTOMER_ID by selecting and dragging CUSTOMER_ID from Available Fields: to size
59. The Scatter Chart should now plot AGE against TENURE, with the points grouped by LOAN_TYPE and sized by CUSTOMER_ID. Hover your mouse over the points to see their values.
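The Keys and Values settings used for the bar and pie charts in this section amount to a group-by aggregation: rows are grouped by the key column, and the value column is then counted (or summed) per group. The following is a minimal Python sketch of that idea, run outside the database. The customer IDs and rows are made-up sample data, and the helper names are hypothetical; this is not how the notebook implements its charts.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows: (customer_id, credit_score_bin).
rows = [
    (101, "Good Credit"),
    (102, "Other Credit"),
    (103, "Other Credit"),
    (104, "Good Credit"),
    (105, "Other Credit"),
]

def count_by_key(rows):
    """COUNT aggregation: number of customer_id values per key."""
    return dict(Counter(key for _customer_id, key in rows))

def sum_by_key(rows):
    """SUM aggregation: adds up the customer_id values per key."""
    totals = defaultdict(int)
    for customer_id, key in rows:
        totals[key] += customer_id
    return dict(totals)

counts = count_by_key(rows)
sums = sum_by_key(rows)
print(counts)
print(sums)
```

Summing an identifier column gives a number with no business meaning, which is why the steps above switch the default SUM function to COUNT.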
Section 3: Bin the Variables
In this section we will execute a procedure which “bins” all the numeric columns for you. Binning is a way to group a number of more or
less continuous values into a smaller number of "bins". For example, if you have data about a group of people, you might want to
arrange their ages into a smaller number of age intervals. By binning the age of the people into a new column, data can be visualized
for the different age groups instead of for each individual.
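To make the idea of equal-width binning concrete, here is a small Python sketch run outside the database. It only illustrates the concept: the function name and the ages are hypothetical, and the exact boundary handling and bin labels produced by the Oracle procedure used in this section may differ.

```python
def equal_width_bins(values, num_bins):
    """Assign each value a bin number from 1 to num_bins using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls in a single bin.
        return [1 for _ in values]
    width = (hi - lo) / num_bins
    bins = []
    for v in values:
        index = int((v - lo) / width) + 1
        # The maximum value would otherwise land in bin num_bins + 1, so clamp it.
        bins.append(min(index, num_bins))
    return bins

ages = [18, 25, 31, 44, 52, 67, 78]  # hypothetical ages
age_bins = equal_width_bins(ages, 5)
print(age_bins)
```

With five bins over the range 18 to 78, each bin is 12 years wide, so the ages 18 and 25 share the first bin while 67 and 78 share the last.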
1. In the newly created paragraph below the scatter chart, delete the % text and add the following text in its place:
Create a view that bins all the numeric columns for you
2. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon
%script
BEGIN
  -- Remove the bin table and view left over from any prior run.
  BEGIN
    EXECUTE IMMEDIATE 'DROP TABLE bin_num_tbl';
  EXCEPTION WHEN OTHERS THEN NULL;
  END;
  BEGIN
    EXECUTE IMMEDIATE 'DROP VIEW mining_data_bin_view';
  EXCEPTION WHEN OTHERS THEN NULL;
  END;
  -- Create the table that holds the bin boundaries.
  dbms_data_mining_transform.create_bin_num(
    bin_table_name => 'bin_num_tbl');
  -- Compute equal-width bins for the numeric columns, excluding CUSTOMER_ID.
  dbms_data_mining_transform.insert_autobin_num_eqwidth(
    bin_table_name => 'bin_num_tbl',
    data_table_name => 'CREDIT_SCORING_100K_V',
    bin_num => 5,
    max_bin_num => 10,
    exclude_list => dbms_data_mining_transform.COLUMN_LIST('CUSTOMER_ID'));
  -- Create a view that applies the bins to the source data.
  dbms_data_mining_transform.xform_bin_num(
    bin_table_name => 'bin_num_tbl',
    data_table_name => 'CREDIT_SCORING_100K_V',
    xform_view_name => 'mining_data_bin_view');
END;
3. You should receive the following success message.
4. We will now create and run a query using the binned data. In the newly created paragraph below the PL/SQL code, delete the %
text and add the following text in its place:
Explore the data to make sure it makes sense and matches the data we believe we are working with.
5. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon
%sql
select customer_id, age, income, tenure, loan_type, loan_amount, occupation, education_level
from mining_data_bin_view;
6. You should see a sample of the rows in the table returned by the query.
7. We wish to visualize this in a chart format. To display this data in a graph format, click the Bar Graph icon.
8. You should see a graphical representation of the query. It’s not very informative.
9. To change the settings of the graphical representation, click settings
11. We wish to set the Key to AGE and the Values to CUSTOMER_ID COUNT.
c) Add the new key AGE by selecting and dragging AGE from Available Fields: to Keys
d) Add the new value CUSTOMER_ID by selecting and dragging CUSTOMER_ID from Available Fields: to Values
e) Change the function for the newly added value field CUSTOMER_ID SUM to use the COUNT function. To do this click on the
CUSTOMER_ID SUM to display a drop-down list of available functions and select COUNT.
12. The Graph should now show a count of the CUSTOMER_ID. Hover your mouse over the various bars to see the count.
As you can see, you can create some pretty good charts in a Machine Learning notebook but what about the Machine Learning?
Let's Run Some OML Machine Learning Algorithms!
Section 4: Machine Learning Algorithms in a ML Notebook
By producing and looking at simple graphs (pie, column and scatter plots) we may determine some patterns and make business decisions based on these patterns, but in most real-world problems, it’s not that easy.
Typically, we have lots of data, both in the number of records and, more interestingly, in the number of possible variables that might have some correlation with or influence on our specific problem.
In this business case, we are looking to discover which variables (also referred to as “attributes”) have the strongest correlation with the business problem: to increase sales by targeting our best customers, the Good Credit customers. Machine Learning techniques and algorithms can be used to discover and analyze these attributes to make better business decisions.
Regression
Description: Technique for predicting a continuous numerical outcome such as customer lifetime value, house value, process yield rates.
Algorithms: Generalized Linear Models Multiple Regression: classic statistical technique but now available inside the Oracle Database as a highly performant, scalable, parallelized implementation. Supports ridge regression, feature creation and feature selection. Supports text and transactional data.
(dense and sparse) data mining problems.
Feature Selection and Extraction
Description: Produces new attributes as linear combination of existing attributes. Applicable for text data, latent semantic analysis, data compression, data decomposition and projection, and pattern recognition.
Algorithms: Non-negative Matrix Factorization: maps the original data into the new set of attributes. Principal Components Analysis (PCA): creates new fewer composite attributes that represent all the attributes.
In this example we will use Oracle Machine Learning's Attribute Importance algorithm, specifically its simplest version, DBMS_PREDICTIVE_ANALYTICS.EXPLAIN. These OML Feature Selection machine learning algorithms automatically sift through all the input variables or attributes looking for those that have the strongest correlation with another key variable.
First, we are looking to find those variables that most influence finding Good Credit customers. While Oracle Machine Learning's algorithms can build predictive models using hundreds or thousands of “big data” variables, often it is extremely useful simply to find the few variables that are most important. Once found, you can use that information to change business practices, create better reports and seek out similar variables that may prove even more informative and useful.
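The idea of ranking attributes by how strongly they relate to a target can be illustrated with a small Python sketch run outside the database. It uses information gain as a stand-in measure; it is not the algorithm Oracle Machine Learning uses internally, and the attribute names and rows below are hypothetical toy data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(attribute_values, target_values):
    """Reduction in target entropy after grouping rows by the attribute value."""
    total = len(target_values)
    groups = {}
    for a, t in zip(attribute_values, target_values):
        groups.setdefault(a, []).append(t)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(target_values) - remainder

# Hypothetical toy data: one target column and two candidate attributes.
target = ["Good", "Good", "Other", "Other", "Good", "Other"]
attributes = {
    "HAS_DEFAULT": ["N", "N", "Y", "Y", "N", "Y"],  # separates the target perfectly
    "REGION": ["E", "W", "E", "W", "E", "W"],       # only weakly related to the target
}

# Rank the attributes from most to least informative.
ranking = sorted(
    ((name, information_gain(vals, target)) for name, vals in attributes.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

An attribute that splits the rows into groups with a single target value each scores highest, while one whose groups look like the overall mix scores near zero.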
1. In the newly created paragraph, delete the % text and add the following text in its place:
Create Attribute Importance Machine Learning Model for Good Credit Customers
2. Create a new Paragraph, enter the following PL/SQL script and click the Run this paragraph icon. In this step we determine the attributes that contribute most to good credit.
%script
/* Find the importance of attributes that independently impact the target attribute: CREDIT_SCORE_BIN */
DECLARE
v_sql varchar2(100);
BEGIN
BEGIN
EXECUTE IMMEDIATE 'DROP TABLE ai_explain_output_credit_score_bin';
EXCEPTION WHEN OTHERS THEN NULL;
END;
BEGIN
DBMS_PREDICTIVE_ANALYTICS.EXPLAIN(
data_table_name => 'CREDIT_SCORING_100K_V',
explain_column_name => 'CREDIT_SCORE_BIN',
result_table_name => 'AI_EXPLAIN_OUTPUT_CREDIT_SCORE_BIN');
END;
END;
4. In the newly created paragraph, delete the % text and add the following text in its place:
Display the Top Attributes for Good Credit Customers (change rownum < value to increase Attribute Scope)
5. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon. In this step we display the attributes that contribute most to good credit.
%sql
Select * from ai_explain_output_CREDIT_SCORE_BIN where rownum < 7;
6. You should see the first six attribute rows of the table returned by the query (rownum < 7 returns at most six rows).
7. We wish to visualize this in a chart format. To display this data in a graph format, click the Bar Chart icon.
8. You should see a graphical representation of the query. It’s not very informative.
11. We wish to leave the Key as ATTRIBUTE_NAME and change the Values to EXPLANATORY_VALUE SUM.
b) Add the new value by selecting and dragging EXPLANATORY_VALUE from Available Fields: to Values
NOTE: EXPLANATORY_VALUE SUM should be displayed by default. If it is not, then click on the EXPLANATORY_VALUE to
display a drop-down list of available functions and select SUM.
12. The Graph should now show the explanatory value of each returned attribute contributing to good credit.
We will now create an Attribute Importance Model for MAX_CC_SPENT_AMOUNT to find the importance of attributes that
independently impact the target attribute: MAX_CC_SPENT_AMOUNT (Max Credit Card Spent Amount).
1. In the newly created paragraph, delete the % text and add the following text in its place:
2. Create a new Paragraph, enter the following PL/SQL script and click the Run this paragraph icon. In this step we determine the attributes that contribute most to MAX_CC_SPENT_AMOUNT.
%script
/* Find the importance of attributes that independently impact the target attribute: MAX_CC_SPENT_AMOUNT */
DECLARE
v_sql varchar2(100);
BEGIN
BEGIN
EXECUTE IMMEDIATE 'DROP TABLE ai_explain_output_MAX_CC_SPENT_AMOUNT';
EXCEPTION WHEN OTHERS THEN NULL;
END;
BEGIN
DBMS_PREDICTIVE_ANALYTICS.EXPLAIN(
data_table_name => 'Credit_Scoring_100k_v',
explain_column_name => 'MAX_CC_SPENT_AMOUNT',
result_table_name => 'ai_explain_output_MAX_CC_SPENT_AMOUNT');
END;
END;
3. You should receive the following success message.
4. In the newly created paragraph, delete the % text and add the following text in its place:
Display the Top Attributes for MAX_CC_SPENT_AMOUNT (change rownum < value to increase Attribute Scope)
5. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon. In this step we display the attributes that contribute most to MAX_CC_SPENT_AMOUNT.
%sql
Select * from ai_explain_output_MAX_CC_SPENT_AMOUNT where rownum < 7;
6. You should see the first six attribute rows of the table returned by the query (rownum < 7 returns at most six rows).
7. We wish to visualize this in a chart format. To display this data in a graph format, click the Bar Graph icon.
8. You should see a graphical representation of the query. It’s not very informative.
9. To change the settings of the graphical representation, click settings
11. We wish to leave the Key as ATTRIBUTE_NAME and change the Values to EXPLANATORY_VALUE SUM.
b) Add the new value by selecting and dragging EXPLANATORY_VALUE from Available Fields: to Values
NOTE: EXPLANATORY_VALUE SUM should be displayed by default. If it is not, then click on the EXPLANATORY_VALUE to
display a drop-down list of available functions and select SUM.
12. The Graph should now show the explanatory value of each returned attribute for MAX_CC_SPENT_AMOUNT.
Section 6: Machine Learning Algorithms – Create a Predictive Model
Now that we have found the key attributes that most influence finding Good Credit customers and making better Maximum Credit Card Amount decisions, we can leverage Oracle Machine Learning's powerful in-database, parallelized algorithms to build predictive models that help better target “the right customers” with the “right offers”.
Data Gathering and Preparation: We have assembled 100K records with 100+ variables about each customer and have created a target field (Good Customer/Other Customer) so we can use OML's supervised algorithms. We will start with a decision tree algorithm.
Model Building and Evaluation: We will create a randomly selected sample from our Credit_Scoring_100k historical data and use 60% as training data for the machine learning model building phase. Then we'll use the remaining 40% as a holdout sample to test our model's accuracy using model evaluation tools such as a “lift chart”.
Knowledge Deployment: Once we are satisfied that we have a useful ML model that can predict with some accuracy which customers we should target (Good Credit customers), we want to apply our OML model to new customer data inside ADWC and then take a deeper look at them. Lastly, we'll jump over to Oracle Analytics Cloud for a more interactive, exploratory data analysis experience, now focusing on our customers of interest (Good Credit customers).
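The 60% training and 40% holdout split described above can be sketched in a few lines of Python, run outside the database. This only illustrates the idea of a random, non-overlapping partition; the function name, the seed and the customer IDs are hypothetical, and in the lab itself the split is performed in the database.

```python
import random

def split_train_test(case_ids, train_fraction=0.6, seed=42):
    """Randomly partition case IDs into disjoint training and holdout lists."""
    shuffled = list(case_ids)
    # A fixed seed makes the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

customer_ids = list(range(1, 1001))  # hypothetical customer IDs
train_ids, test_ids = split_train_test(customer_ids)
print(len(train_ids), len(test_ids))
```

Keeping the holdout rows out of model building is what makes the later accuracy test meaningful: the model is scored on customers it has never seen.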
We will now carry out the Preparatory Steps to Automate the Model Build and Test and Clean up using PL/SQL.
1. In the newly created paragraph, delete the % text and add the following text in its place:
Build a classification model and then generate a lift test result and an apply result
2. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon.
%script
/* Build a classification model and then generate a lift test result and an apply result. Click on the arrow in the upper
right. */
DECLARE
v_sql varchar2(100);
BEGIN
/*
--------------------------------------------
-- REMOVE PRIOR RUNS
--------------------------------------------
*/
BEGIN
v_sql := 'DROP TABLE n1_build_settings PURGE';
EXECUTE IMMEDIATE v_sql;
DBMS_OUTPUT.PUT_LINE (v_sql ||': succeeded');
EXCEPTION
WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE (v_sql ||': drop unnecessary - no table exists');
END;
/* drop model */
BEGIN
v_sql := 'CALL DBMS_DATA_MINING.DROP_MODEL(''N1_CLASS_MODEL'')';
EXECUTE IMMEDIATE v_sql;
DBMS_OUTPUT.PUT_LINE (v_sql ||': succeeded');
EXCEPTION
WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE (v_sql ||': drop unnecessary - no model exists');
END;
WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE (v_sql ||': drop unnecessary - no table exists');
END;
/*
-- Populate and Adjust Model Setting (DT) for Model Build
EXECUTE IMMEDIATE 'INSERT INTO n1_build_settings VALUES (''TREE_TERM_MAX_DEPTH'', 7)';
EXECUTE IMMEDIATE 'INSERT INTO n1_build_settings VALUES (''TREE_TERM_MINREC_SPLIT'', 20)';
EXECUTE IMMEDIATE 'INSERT INTO n1_build_settings VALUES (''TREE_TERM_MINPCT_SPLIT'', .1)';
EXECUTE IMMEDIATE 'INSERT INTO n1_build_settings VALUES (''TREE_TERM_MINREC_NODE'', 10)';
EXECUTE IMMEDIATE 'INSERT INTO n1_build_settings VALUES (''TREE_TERM_MINPCT_NODE'', 0.05)';
*/
EXECUTE IMMEDIATE 'CALL DBMS_DATA_MINING.CREATE_MODEL(''N1_CLASS_MODEL'', ''CLASSIFICATION'', ''N1_TRAIN_DATA'', ''CUSTOMER_ID'', ''CREDIT_SCORE_BIN'', ''n1_build_settings'')';
DBMS_OUTPUT.PUT_LINE ('Created model: N1_CLASS_MODEL ');
/* Test the Model by generating an apply result and then create a lift result */
EXECUTE IMMEDIATE 'CALL DBMS_DATA_MINING.APPLY(''N1_CLASS_MODEL'',''N1_TEST_DATA'',''CUSTOMER_ID'',''N1_APPLY_RESULT'')';
DBMS_OUTPUT.PUT_LINE ('Created apply result: N1_APPLY_RESULT ');
EXECUTE IMMEDIATE 'CALL DBMS_DATA_MINING.COMPUTE_LIFT(''N1_APPLY_RESULT'',''N1_TEST_DATA'',''CUSTOMER_ID'',''CREDIT_SCORE_BIN'',''N1_LIFT_TABLE'',''Good Credit'',''PREDICTION'',''PROBABILITY'',100)';
DBMS_OUTPUT.PUT_LINE ('Created lift result: N1_LIFT_TABLE ');
END;
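After the script completes, it can be useful to confirm that the model exists before moving on. The following is a minimal sketch for a %sql paragraph, not one of the lab's own steps. It assumes the USER_MINING_MODELS data dictionary view is visible to your notebook user.

```sql
/* List the model built by the script; no rows returned means the build did not succeed */
select model_name, mining_function, algorithm, creation_date
from user_mining_models
where model_name = 'N1_CLASS_MODEL'
```

If the build succeeded, the query should return a single row for N1_CLASS_MODEL.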
4. In the newly created paragraph, delete the % text and add the following text in its place:
5. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon.
%sql
6. You should see the cumulative gains in the table returned by the query.
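One way to retrieve the cumulative gains is sketched below. It assumes that N1_LIFT_TABLE, the lift result created by the earlier script, exposes the QUANTILE_NUMBER and GAIN_CUMULATIVE columns that DBMS_DATA_MINING.COMPUTE_LIFT writes. Treat the column names as assumptions and adjust them if your table differs.

```sql
/* Cumulative gain per quantile from the lift result (column names assumed) */
select quantile_number, gain_cumulative
from n1_lift_table
order by quantile_number
```

Ordering by quantile keeps the rows in the sequence a gains chart expects.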
7. To display this data as a chart, click the Bar Graph icon.
8. You should see a graphical representation of the query. This is more informative, but not quite what we want.
11. We wish to leave the Key as QUANTILE_NUMBER and change the Values to GAIN_CUMMULATIVE AVG.
a) Click on the GAIN_CUMMULATIVE SUM to display a drop-down list of available functions and select AVG.
12. The Graph should now show the average cumulative gain.
13. We will now apply the Oracle Machine Learning classification model to new customers. This will show the customers most likely to
have good credit. In the newly created paragraph, delete the % text and add the following text in its place:
Apply the Oracle Machine Learning Classification Model to New Customers and display the top predictive attributes
14. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon.
%sql
/* Show new customers ranked by their probability of good credit, together with their attributes */
select a.customer_id
     , a.prob_Credit_Score_Bin
     , b.age, b.income, b.tenure, b.loan_type, b.loan_amount, b.occupation, b.education_level, b.marital_status
from (select Customer_ID
           , round(prediction_probability(N1_CLASS_MODEL, 'Good Credit' using *) * 100, 2) prob_Credit_Score_Bin
      from credit_scoring_new_cust_v) a
   , credit_scoring_100k_v b
where a.customer_id = b.customer_id
order by a.prob_Credit_Score_Bin desc
15. You should see the customers most likely to have good credit, ordered by probability on a scale of 0 to 100.
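The probability alone does not say what lies behind each score. As a sketch, assuming the PREDICTION and PREDICTION_DETAILS SQL scoring functions can be used with this model, the following returns the predicted class for a handful of new customers, with an XML value describing the details behind each prediction. The aliases and the limit of 10 rows are arbitrary choices for illustration.

```sql
/* Predicted class plus prediction details for a small sample of new customers */
select customer_id
     , prediction(N1_CLASS_MODEL using *) predicted_class
     , prediction_details(N1_CLASS_MODEL using *) details
from credit_scoring_new_cust_v
where rownum <= 10
```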
16. We will now create a new table in order to visualize these results. We will first drop the table created previously. In the newly
created paragraph, delete the % text and add the following text in its place:
17. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon.
%script
BEGIN
  EXECUTE IMMEDIATE 'drop table credit_score_new_predictions';
  DBMS_OUTPUT.PUT_LINE ('succeeded');
EXCEPTION
  WHEN OTHERS THEN
    DBMS_OUTPUT.PUT_LINE ('drop unnecessary - no table exists');
END;
18. You should receive one of two messages, depending on whether the table existed: either succeeded, or a message that no table
exists.
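The WHEN OTHERS handler in the drop script reports every failure as a missing table. A slightly stricter variant, shown here as a sketch and not as part of the lab, treats only ORA-00942 (table or view does not exist) as expected and re-raises anything else.

```sql
BEGIN
  EXECUTE IMMEDIATE 'drop table credit_score_new_predictions';
  DBMS_OUTPUT.PUT_LINE ('succeeded');
EXCEPTION
  WHEN OTHERS THEN
    IF SQLCODE = -942 THEN
      -- ORA-00942: the table was not there, which is fine on a first run
      DBMS_OUTPUT.PUT_LINE ('drop unnecessary - no table exists');
    ELSE
      -- any other error is unexpected, so surface it instead of hiding it
      RAISE;
    END IF;
END;
```

This keeps the first-run behavior the same while making problems such as a locked table visible.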
19. We will now create a new table, CREDIT_SCORE_NEW_PREDICTIONS, in order to visualize the results of the classification
model. In the newly created paragraph, delete the % text and add the following text in its place:
Visualize the Oracle Machine Learning Classification Model applied to New Customers and display the top predictive attributes
20. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon.
%sql
/* create table with results above so we can view in Data Visualization */
create table credit_score_new_predictions as
select a.customer_id
     , a.prob_good_credit
     , b.age, b.income, b.tenure, b.loan_type, b.loan_amount, b.occupation, b.education_level, b.marital_status
from (select Customer_ID
           , round(prediction_probability(N1_CLASS_MODEL, 'Good Credit' using *) * 100, 2) prob_good_credit
      from credit_scoring_new_cust_v) a
   , credit_scoring_100k_v b
where a.customer_id = b.customer_id
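Before charting the new table, a quick sanity check can confirm that it was populated and that the probabilities fall in the expected 0 to 100 range. This is a minimal sketch for a %sql paragraph. The column aliases are illustrative names, not part of the lab.

```sql
/* Row count and probability range of the newly created predictions table */
select count(*) scored_customers
     , min(prob_good_credit) lowest_probability
     , max(prob_good_credit) highest_probability
from credit_score_new_predictions
```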
21. We will now review the CREDIT_SCORE_NEW_PREDICTIONS table and rank good customers based on prediction probability
and other factors. In the newly created paragraph, delete the % text and add the following text in its place:
Score and rank new customers that are a professional, married, and borrowing for education.
22. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon.
%sql
/* Score and rank new customers that are a professional, married, and borrowing for education (in this case). You can
substitute or add other filters. */
23. You should see the new customers that are a professional, married, and borrowing for education with the highest probability of
having good credit.
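A query for this ranking can be sketched by mirroring the lowest-probability query used later in this section, with the sort order reversed. This is an assumption based on that later query, not a statement copied from the lab.

```sql
/* Professional, married customers borrowing for education, highest probability of good credit first */
select * from credit_score_new_predictions
WHERE LOAN_TYPE = 'Education' and MARITAL_STATUS = 'Married' and OCCUPATION = 'Professional'
order by rank() over (order by PROB_GOOD_CREDIT desc)
```

Changing the filter values in the WHERE clause ranks a different customer segment in the same way.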
24. We will now review the CREDIT_SCORE_NEW_PREDICTIONS table and rank the customers most likely to have poor credit, based
on prediction probability and other factors. In the newly created paragraph, delete the % text and add the following text in its
place:
Score and rank new customers that are a professional, married, and borrowing for education.
25. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon.
%sql
/* Score and rank new customers that are a professional, married, and borrowing for education (in this case). You can
substitute or add other filters. */
select * from credit_score_new_predictions
WHERE LOAN_TYPE = 'Education' and MARITAL_STATUS = 'Married' and OCCUPATION = 'Professional'
order by rank() over (order by PROB_GOOD_CREDIT asc)
26. You should see the new customers that are a professional, married, and borrowing for education with the lowest probability of
having good credit.
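To keep only a short list and make each customer's position visible, the rank can be selected as a column and the result limited. The sketch below assumes the row-limiting clause is available (Oracle Database 12c and later). The alias risk_rank and the limit of 10 are illustrative choices.

```sql
/* Ten customers in the segment with the lowest probability of good credit, with their rank shown */
select customer_id
     , prob_good_credit
     , rank() over (order by prob_good_credit asc) risk_rank
from credit_score_new_predictions
where loan_type = 'Education' and marital_status = 'Married' and occupation = 'Professional'
order by risk_rank
fetch first 10 rows only
```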
Congratulations! You have created an ML Notebook from beginning to end in Oracle Autonomous Data Warehouse. As you have
seen, ML Notebooks combine the power of data visualization with powerful ML algorithms, making it possible to run
predictive analysis on very large data sets.