Oracle Cloud Autonomous Data Warehouse - Introduction to Machine Learning (Create a Machine Learning Notebook)


Targeting Likely Good Credit Customers using Oracle Machine Learning's (OML)
Classification Models

In this Lab we will take the role of Lisa, a data manager who has spent most of her time over the past couple of years extracting and preparing data for analysis.

The large volumes of data that need extracting and processing mean Lisa spends most of her time waiting for jobs to finish and very little of it analyzing the data.

Demands from marketing are forcing a new approach whereby the data remains in the data warehouse and is processed there.

Lisa started taking a look at Oracle and found that the simple SQL commands in ADWC are familiar and execute extremely fast, leveraging all the performance features of the platform. Furthermore, once she is done, she can apply the learning models to incoming data at any time and allow end-user analysts to see mining results immediately.

This drastically reduces the cycle of data preparation, analysis, and publishing. It also means there is no change to the analysis, reporting, and Data Visualization toolset that users are familiar with.

The Business Problem:

Increase Sales by Targeting our Best Customers; Good Credit Customers!

Lisa has a hunch that weakening sales may be due to the company selling to non-optimal customers; customers who perhaps have
poor credit and fail to make their payments for their purchases.

Lisa has over 100 variables to consider, so she wants to first explore her data using simple charts and graphs, and then move on to using Oracle Machine Learning's powerful algorithms to automatically sift through her data to find patterns and new insights, and to make predictions that target her best customers: those who have good credit.

In this Lab we will process a small data set of 100,000 records, but the same steps could be run against a 100-million or billion-row data set without worrying about processing time.

Please complete Oracle Cloud Autonomous Data Warehouse - Introduction to Machine Learning (Preparing to use a Machine Learning Notebook) before continuing with this lab.

Copyright © 2020, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Section 1: Explore Oracle Machine Learning

1. From your Console, click the hamburger icon in the top left corner and select Autonomous Data Warehouse.

2. This will return you to your database page where you can select your live database.

3. Click Database Actions > View All Database Actions.

4. Under the Development section click Oracle Machine Learning.

5. Enter the login details of the user you created in the previous section and click Sign In.

6. You will see the Oracle Machine Learning Console and its landing page as below:

7. Click the hamburger menu in the top left and select Notebooks.

8. Click Create to start creating a new Notebook.

9. Enter Predict Credit Scores in the name field of the dialog and click OK.

10. The notebook server will start.

11. You will be presented with the blank notebook.

12. Once the notebook server has started and the notebook has loaded we need to set the interpreter binding. Click on the gear icon.

13. Select the _high interpreter and then Save.

14. For the remainder of this lab we will build the functionality of the Machine Learning Notebook.

Section 2: Working with a Machine Learning Notebook

1. Click in the Edit Window and enter the following text:

Prepare the Data in the notebook and click the run icon to execute. In this step we will create 2 views.

This is a Paragraph. We build text and statements in Paragraphs.

2. To create a new Paragraph, position your mouse at the bottom of the previously created Paragraph; + Add Paragraph should pop up. Click + Add Paragraph to create a new Paragraph.

3. In the newly created Paragraph, delete the %sql text, add the following PL/SQL script and click the Run this paragraph icon:

%script
/* Click on the arrow to the right to execute this. */
BEGIN
  -- View over the full credit scoring table
  EXECUTE IMMEDIATE 'create or replace view credit_scoring_100k_v as select * from admin.credit_scoring_100k';
  -- View over a random sample of approximately 10% of the rows
  EXECUTE IMMEDIATE 'create or replace view credit_scoring_new_cust_v as select * from admin.credit_scoring_100k sample (10)';
END;

4. You should receive a PL/SQL procedure successfully completed message and a new Paragraph will have been created.

NOTE: If you do not receive a PL/SQL procedure successfully completed message then check you have typed the code for the
PL/SQL statement to create 2 views exactly as above.
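
As an optional check, the two views can be counted side by side. This is a minimal sketch and not part of the original lab. It assumes both views were created as above. Because the SAMPLE (10) clause is evaluated each time the view is queried, the second count is expected to be roughly 10% of the first and to vary from run to run.

```sql
%sql
/* Row counts for both views. The sampled view is re-evaluated on every query, so its count is approximate. */
select 'CREDIT_SCORING_100K_V' as view_name, count(*) as row_count from credit_scoring_100k_v
union all
select 'CREDIT_SCORING_NEW_CUST_V' as view_name, count(*) as row_count from credit_scoring_new_cust_v
```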

5. In the newly created paragraph, delete the %script text and add the following text in its place:

Show the tables and views owned by the current ML user.

6. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon

%sql
/* Click on the arrow in the upper right to run a query that shows the tables and views owned by the current ML user (the
current one we are using).*/
Select table_name from user_tables where table_name not like ('DM$%')
UNION
select view_name from user_views where view_name not like ('DM$%')

7. The SQL should execute, and you should be shown a list of tables and views including the ones we created previously:

8. In the newly created paragraph, delete the %sql and add the following text in its place:

Show the columns which can be used for variables

9. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon

%sql
/* Execute this (arrow upper right) */
select column_name from user_tab_cols where table_name='CREDIT_SCORING_100K_V';

10. You should see a list of columns.
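
If you also want to know which columns are numeric (these are the ones that will be binned in Section 3), a variation of the query can return the data type as well. This is an optional sketch using the standard USER_TAB_COLUMNS dictionary view; it is not a step in the original lab.

```sql
%sql
/* List each column of the view with its data type, in column order */
select column_name, data_type
from user_tab_columns
where table_name = 'CREDIT_SCORING_100K_V'
order by column_id
```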

11. In the newly created paragraph, delete the %sql and add the following text in its place:

Show a sample of the records in the table. We will use the historical credit scoring data to predict the likelihood of a
customer having good credit in the future

12. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon

%sql
/* This shows the credit scoring data. We will use this historical data to predict the likelihood of a customer having good credit. */
Select * from credit_scoring_100k_v where rownum < 100

13. You should see a sample of the rows in the table returned by the query.

14. In the newly created paragraph, delete the %sql and add the following text in its place:

Good Credit Customers are Hard to Find! Show a Bar Graph of a sample of the records in the table.

15. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon

%sql
select customer_id, credit_score_bin from credit_scoring_100k_v sample (10)

16. You should see the results of the SQL.

17. To display this data in a graph format, click the Bar Graph icon.

18. You should see a graphical representation of the query. It’s not very informative.

19. To change the settings of the graphical representation, click settings

20. The current chart settings will be displayed.

21. We wish to set the Key to CREDIT_SCORE_BIN and the Values to CUSTOMER_ID COUNT.

a) Clear values by clicking x on CREDIT_SCORE_BIN SUM in Values

b) Clear keys by clicking x on CUSTOMER_ID in Keys

c) Add the new key CREDIT_SCORE_BIN by selecting and dragging CREDIT_SCORE_BIN from Available Fields: to Keys

d) Add the new value CUSTOMER_ID by selecting and dragging CUSTOMER_ID from Available Fields: to Values

e) By default, the newly added value field will be aggregated using the SUM function.

f) We wish the newly added value field CUSTOMER_ID SUM to be counted using the COUNT function. To do this, click on CUSTOMER_ID SUM to display a drop-down list of available functions.

g) From the drop-down list of available functions, select COUNT.

22. a) The Graph should now show a count of Good Credit customers and Other Credit Customers. Hover your mouse over the bars to
see the count.

b) To hide the graph settings and fields click Settings
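
The chart settings above (Key = CREDIT_SCORE_BIN, Values = CUSTOMER_ID COUNT) correspond to a GROUP BY aggregation. As a hedged illustration that is not part of the original lab, the same counts could be computed directly in SQL. Note that this sketch queries the full view rather than the 10% sample used for the chart, so the numbers will be larger.

```sql
%sql
/* Count customers per credit score bin, the SQL equivalent of Key = CREDIT_SCORE_BIN and Values = CUSTOMER_ID COUNT */
select credit_score_bin, count(customer_id) as customer_count
from credit_scoring_100k_v
group by credit_score_bin
order by customer_count desc
```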

23. In the newly created paragraph below the bar chart, delete the %sql and add the following text in its place:

Review Data by Mode of Job Contacts and Income

24. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon

%sql
select customer_id, age, income, tenure, loan_type, loan_amount, occupation, number_of_current_accounts,
max_cc_spent_amount, mode_job_of_contacts from credit_scoring_100k_v where rownum < 1000

25. You should see a sample of the rows in the table returned by the query.

26. We wish to visualize this in a chart format. To display this data in a graph format, click the Bar Graph icon.

27. You should see a graphical representation of the query. It’s not very informative.

28. To change the settings of the graphical representation, click settings

29. The current chart settings will be displayed.

30. We wish to set the Key to MODE_JOB_OF_CONTACTS and the Values to CUSTOMER_ID COUNT.

a) Clear values by clicking x on AGE SUM in Values

b) Clear keys by clicking x on CUSTOMER_ID in Keys

c) Add the new key MODE_JOB_OF_CONTACTS by selecting and dragging MODE_JOB_OF_CONTACTS from Available Fields:
to Keys

d) Add the new value CUSTOMER_ID by selecting and dragging CUSTOMER_ID from Available Fields : to Values

e) Change the function for the newly added value field CUSTOMER_ID SUM to use the COUNT function. To do this click on the
CUSTOMER_ID SUM to display a drop-down list of available functions and select COUNT.

31. The Graph should now show a count of the job groupings of a set of customers. Hover your mouse over the various bars to see the
count.

32. Hide the graph settings by clicking on Settings

33. In the newly created paragraph below the bar chart, delete the % text and add the following text in its place:

Review Data by Occupation in a Pie Chart format

34. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon

%sql
select customer_id, age, income, tenure, loan_type, loan_amount, occupation, marital_status
from credit_scoring_100k_v where rownum < 1000

35. You should see a sample of the rows in the table returned by the query.

36. We wish to visualize this in a chart format. For this query we wish to use a pie chart, click the Pie Chart icon.

37. You should see a graphical representation of the query. It’s very colorful but not very informative.

38. To change the settings of the graphical representation, click settings

39. The current chart settings will be displayed.

40. We wish to set the Key to OCCUPATION and the Values to CUSTOMER_ID COUNT.

a) Clear values by clicking x on AGE SUM in Values

b) Clear keys by clicking x on CUSTOMER_ID in Keys

c) Add the new key OCCUPATION by selecting and dragging OCCUPATION from Available Fields: to Keys

d) Add the new value CUSTOMER_ID by selecting and dragging CUSTOMER_ID from Available Fields: to Values

e) Change the function for the newly added value field CUSTOMER_ID SUM to use the COUNT function. To do this click on the
CUSTOMER_ID SUM to display a drop-down list of available functions and select COUNT.

41. The Graph should now show a count of the OCCUPATION groupings of a set of customers. Hover your mouse over the various
sections to see the count.
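
A pie chart shows each occupation's share of the whole. As an optional sketch that is not part of the original lab, the same shares can be computed in SQL with Oracle's RATIO_TO_REPORT analytic function. This version runs over the full view rather than the first 999 rows used for the chart, so the percentages may differ from what the chart shows.

```sql
%sql
/* Customer count and percentage share per occupation */
select occupation,
       count(*) as customer_count,
       round(100 * ratio_to_report(count(*)) over (), 1) as pct_of_customers
from credit_scoring_100k_v
group by occupation
order by customer_count desc
```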

42. In the newly created paragraph below the pie chart, delete the % text and add the following text in its place:

Review Data Grouped by loan type in a Pie Chart format

43. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon

%sql
select customer_id, age, income, tenure, loan_type, loan_amount, occupation, marital_status
from credit_scoring_100k_v where rownum < 1000

44. You should see a sample of the rows in the table returned by the query.

45. We wish to visualize this in a chart format. For this query we wish to use a pie chart, click the Pie Chart icon.

46. You should see a graphical representation of the query. Again, it’s very colorful but not very informative.

47. To change the settings of the graphical representation, click settings

48. The current chart settings will be displayed.

49. We wish to set the Key to LOAN_TYPE and the Values to CUSTOMER_ID COUNT.

a) Clear values by clicking x on AGE SUM in Values

b) Clear keys by clicking x on CUSTOMER_ID in Keys

c) Add the new key LOAN_TYPE by selecting and dragging LOAN_TYPE from Available Fields: to Keys

d) Add the new value CUSTOMER_ID by selecting and dragging CUSTOMER_ID from Available Fields: to Values

e) Change the function for the newly added value field CUSTOMER_ID SUM to use the COUNT function. To do this click on the
CUSTOMER_ID SUM to display a drop-down list of available functions and select COUNT.

50. The Graph should now show a count of the LOAN_TYPE groupings of a set of customers. Hover your mouse over the various
sections to see the count.

51. In the newly created paragraph below the pie chart, delete the % text and add the following text in its place:

Review Income by Age and Tenure Grouped by loan type in a Scatter Graph format

52. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon

%sql
select customer_id, age, income, tenure, loan_type, loan_amount, occupation, education_level
from credit_scoring_100k_v where rownum < 40

53. You should see a sample of the rows in the table returned by the query.

54. We wish to visualize this in a chart format. For this query we wish to use a Scatter Chart, click the Scatter Chart icon.

55. You should see a graphical representation of the query. It is based on only one field and not very informative.

56. To change the settings of the graphical representation, click settings

57. The current chart settings will be displayed.

NOTE: In a scatter chart we do not have keys and values; instead, data is represented on the X and Y axes.

58. We wish to set the xAxis to AGE, the yAxis to TENURE, the grouping to LOAN_TYPE and the size to CUSTOMER_ID.

a) Clear the xAxis value by clicking x on CUSTOMER_ID

b) Add the new xAxis AGE by selecting and dragging AGE from Available Fields: to xAxis

c) Clear the yAxis value by clicking x on AGE

d) Add the new yAxis TENURE by selecting and dragging TENURE from Available Fields: to yAxis

e) Add the new group LOAN_TYPE by selecting and dragging LOAN_TYPE from Available Fields: to group

f) Add the new size CUSTOMER_ID by selecting and dragging CUSTOMER_ID from Available Fields: to size

59. The Scatter Chart should now plot TENURE against AGE for a set of customers, with the points grouped by LOAN_TYPE and sized by CUSTOMER_ID. Hover your mouse over the various points to see their values.

Section 3: Bin the Variables

In this section we will execute a procedure which “bins” all the numeric columns for you. Binning is a way to group a number of more or
less continuous values into a smaller number of "bins". For example, if you have data about a group of people, you might want to
arrange their ages into a smaller number of age intervals. By binning the age of the people into a new column, data can be visualized
for the different age groups instead of for each individual.
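
To make the idea concrete: if ages ran from 20 to 70 (hypothetical figures), five equal-width bins would each span 10 years. The sketch below illustrates equal-width binning of AGE with Oracle's WIDTH_BUCKET function. It is an illustration only and is not how the DBMS_DATA_MINING_TRANSFORM procedure used later is implemented. WIDTH_BUCKET places values at or above the upper bound in an overflow bucket, so the sketch passes the maximum age plus 1 as the upper bound to keep the largest age in bin 5. This makes the bins marginally wider than a strict minimum-to-maximum split.

```sql
%sql
/* Equal-width binning of AGE into 5 bins, with the observed range and customer count of each bin */
select age_bin,
       min(age) as min_age,
       max(age) as max_age,
       count(*) as customer_count
from (
  select c.age,
         width_bucket(c.age, r.min_age, r.max_age + 1, 5) as age_bin
  from credit_scoring_100k_v c
  cross join (select min(age) as min_age, max(age) as max_age
              from credit_scoring_100k_v) r
)
group by age_bin
order by age_bin
```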

1. In the newly created paragraph below the scatter chart, delete the % text and add the following text in its place:

Create a view that bins all the numeric columns for you

2. Create a new Paragraph, enter the following PL/SQL script and click the Run this paragraph icon:

%script
BEGIN
  -- Drop the bin table and the binned view if they exist from an earlier run
  BEGIN
    EXECUTE IMMEDIATE 'DROP TABLE bin_num_tbl';
  EXCEPTION WHEN OTHERS THEN NULL;
  END;

  BEGIN
    EXECUTE IMMEDIATE 'DROP VIEW mining_data_bin_view';
  EXCEPTION WHEN OTHERS THEN NULL;
  END;

  -- Create the table that holds the bin definitions
  dbms_data_mining_transform.create_bin_num(
    bin_table_name => 'bin_num_tbl');

  -- Compute equal-width bin boundaries for every numeric column except CUSTOMER_ID
  dbms_data_mining_transform.insert_autobin_num_eqwidth(
    bin_table_name  => 'bin_num_tbl',
    data_table_name => 'CREDIT_SCORING_100K_V',
    bin_num         => 5,
    max_bin_num     => 10,
    exclude_list    => dbms_data_mining_transform.COLUMN_LIST('CUSTOMER_ID'));

  -- Create a view that presents the data with the bins applied
  dbms_data_mining_transform.xform_bin_num(
    bin_table_name  => 'bin_num_tbl',
    data_table_name => 'CREDIT_SCORING_100K_V',
    xform_view_name => 'mining_data_bin_view');
END;

3. You should receive the following success message.
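
If you are curious about what the procedure produced, the bin definitions can be inspected directly. This is an optional sketch that is not part of the original lab. It assumes the bin table has the COL, VAL and BIN columns described in the DBMS_DATA_MINING_TRANSFORM documentation, where each row gives a boundary value for one column and the bin label it closes.

```sql
%sql
/* Show the bin boundaries computed for each numeric column */
select col, val, bin
from bin_num_tbl
order by col, val
```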

4. We will now create and run a query using the binned data. In the newly created paragraph below the PL/SQL code, delete the %
text and add the following text in its place:

Explore the data to make sure it makes sense and matches the data we believe we are working with.

5. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon

%sql
select customer_id, age, income, tenure, loan_type, loan_amount, occupation, education_level
from mining_data_bin_view;

6. You should see a sample of the rows in the table returned by the query.

7. We wish to visualize this in a chart format. To display this data in a graph format, click the Bar Graph icon.

8. You should see a graphical representation of the query. It’s not very informative.

9. To change the settings of the graphical representation, click settings

10. The current chart settings will be displayed.

11. We wish to set the Key to AGE and the Values to CUSTOMER_ID COUNT.

a) Clear keys by clicking x on CUSTOMER_ID in Keys

b) Clear values by clicking x on AGE SUM in Values

c) Add the new key AGE by selecting and dragging AGE from Available Fields: to Keys

d) Add the new value CUSTOMER_ID by selecting and dragging CUSTOMER_ID from Available Fields: to Values

e) Change the function for the newly added value field CUSTOMER_ID SUM to use the COUNT function. To do this click on the
CUSTOMER_ID SUM to display a drop-down list of available functions and select COUNT.

12. The Graph should now show a count of customers for each AGE bin. Hover your mouse over the various bars to see the count.
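
The same counts can be read straight from the binned view. This is a hedged sketch that is not part of the original lab. It assumes that AGE in mining_data_bin_view now holds a bin label rather than the original number, so the ORDER BY sorts the labels as text.

```sql
%sql
/* Count customers in each AGE bin of the binned view */
select age, count(customer_id) as customer_count
from mining_data_bin_view
group by age
order by age
```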

As you can see, you can create some pretty good charts in a Machine Learning notebook but what about the Machine Learning?
Let's Run Some OML Machine Learning Algorithms!

Section 4: Machine Learning Algorithms in an ML Notebook

By producing and looking at simple graphs (pie, column and scatter plots) we may determine some patterns and make business
decisions based on these patterns, but in most real-world problems, it’s not that easy.

Typically, we have lots of data, not only in the number of records but, more interestingly, in the number of possible variables that might have some correlation with or influence on our specific problem.

In this business case, we are looking to discover which variables (also referred to as “attributes”) have the strongest correlation with the
business problem to Increase Sales by Targeting our Best Customers; Good Credit Customers. Machine Learning Techniques and
Algorithms can be used to discover and analyze these attributes to better make business decisions.

The following table lists each technique, its applicability, and the available algorithms.

Technique: Classification
Applicability: Most commonly used technique for predicting a specific outcome such as response / no-response, high / medium / low-value customer, likely to buy / not buy.
Algorithms:
  Logistic Regression (Generalized Linear Models): classic statistical technique available inside the Oracle Database in a highly performant, scalable, parallelized implementation (applies to all OAA ML algorithms). Supports text and transactional data (applies to nearly all OAA ML algorithms).
  Naive Bayes: Fast, simple, commonly applicable. Leverages Database's speed in counting.
  Support Vector Machine: Newer generation machine learning algorithm, supports text and wide data.
  Decision Tree: Popular ML algorithm for interpretability. Provides human-readable "rules".

Technique: Regression
Applicability: Technique for predicting a continuous numerical outcome such as customer lifetime value, house value, process yield rates.
Algorithms:
  Multiple Regression (Generalized Linear Models): classic statistical technique but now available inside the Oracle Database as a highly performant, scalable, parallelized implementation. Supports ridge regression, feature creation and feature selection. Supports text and transactional data.
  Support Vector Machine: Newer generation machine learning algorithm, supports text and wide data.

Technique: Attribute Importance
Applicability: Ranks attributes according to strength of relationship with target attribute. Use cases include finding factors most associated with customers who respond to an offer, factors most associated with healthy patients.
Algorithms:
  Minimum Description Length: Considers each attribute as a simple predictive model of the target class and provides relative influence.

Technique: Anomaly Detection
Applicability: Identifies unusual or suspicious cases based on deviation from the norm. Common examples include health care fraud, expense report fraud, and tax compliance.
Algorithms:
  One-Class Support Vector Machine: Trains on "normal" cases and then flags unusual cases.

Technique: Clustering
Applicability: Useful for exploring data and finding natural groupings. Members of a cluster are more like each other than they are like members of a different cluster. Common examples include finding new customer segments, and life sciences discovery.
Algorithms:
  Enhanced K-Means: Supports text mining, hierarchical clustering, distance based.
  Orthogonal Partitioning Clustering: Hierarchical clustering, density based.
  Expectation Maximization: Clustering technique that performs well in mixed data (dense and sparse) data mining problems.

Technique: Association
Applicability: Finds rules associated with frequently co-occurring items, used for market basket analysis, cross-sell, root cause analysis. Useful for product bundling, in-store placement, and defect analysis.
Algorithms:
  Apriori: Industry standard for market basket analysis.

Technique: Feature Selection and Extraction
Applicability: Produces new attributes as linear combination of existing attributes. Applicable for text data, latent semantic analysis, data compression, data decomposition and projection, and pattern recognition.
Algorithms:
  Non-negative Matrix Factorization: Maps the original data into the new set of attributes.
  Principal Components Analysis (PCA): Creates fewer new composite attributes that represent all the attributes.
  Singular Value Decomposition: Established feature extraction method that has a wide range of applications.

In this example we will use Oracle Machine Learning's Attribute Importance algorithm, specifically its simplest version, DBMS_PREDICTIVE_ANALYTICS.EXPLAIN. These OML Feature Selection machine learning algorithms automatically sift through all the input variables or attributes looking for those variables that have the strongest correlation with another key variable.

First, we are looking to find those variables that most influence finding Good Credit customers. While Oracle Machine Learning's
algorithms can build predictive models using hundreds or thousands of “big data” variables, often it is extremely useful simply to find
those fewer variables that are most important. Once found, you can use that information to change business practices, create better
reports and seek out more similar variables that may prove even more informative and useful.

1. In the newly created paragraph, delete the % text and add the following text in its place:

Create Attribute Importance Machine Learning Model for Good Credit Customers

2. Create a new Paragraph, enter the following PL/SQL script and click the Run this paragraph icon. In this step we determine the attributes that contribute most to good credit.

%script
/* Find the importance of attributes that independently impact the target attribute: CREDIT_SCORE_BIN */
BEGIN
  -- Drop the result table if it exists from an earlier run
  BEGIN
    EXECUTE IMMEDIATE 'DROP TABLE ai_explain_output_credit_score_bin';
  EXCEPTION WHEN OTHERS THEN NULL;
  END;

  -- Rank the other columns by how well each explains CREDIT_SCORE_BIN
  DBMS_PREDICTIVE_ANALYTICS.EXPLAIN(
    data_table_name     => 'CREDIT_SCORING_100K_V',
    explain_column_name => 'CREDIT_SCORE_BIN',
    result_table_name   => 'AI_EXPLAIN_OUTPUT_CREDIT_SCORE_BIN');
END;

3. You should receive the following success message.

4. In the newly created paragraph, delete the % text and add the following text in its place:

Display the Top Attributes for Good Credit Customers (change rownum < value to increase Attribute Scope)

5. Create a new Paragraph, enter the following SQL and click the Run this paragraph icon. In this step we display the attributes that contribute most to good credit.

%sql
Select * from ai_explain_output_CREDIT_SCORE_BIN where rownum < 7;

6. You should see six attribute rows returned by the query (rownum < 7 returns at most six rows).
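
Because a ROWNUM filter without an ORDER BY does not guarantee which rows come back, you may prefer to sort explicitly by importance. The following is a hedged alternative and not the lab's own query. It relies on the ATTRIBUTE_NAME and EXPLANATORY_VALUE columns of the result table and on the FETCH FIRST clause, which is available in Oracle Database 12c and later.

```sql
%sql
/* Six most important attributes for CREDIT_SCORE_BIN, highest explanatory value first */
select attribute_name, explanatory_value
from ai_explain_output_credit_score_bin
order by explanatory_value desc
fetch first 6 rows only
```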

7. We wish to visualize this in a chart format. To display this data in a graph format, click the Bar Chart icon.

8. You should see a graphical representation of the query. It’s not very informative.

9. To change the settings of the graphical representation, click settings

10. The current chart settings will be displayed.

11. We wish to leave the Key as ATTRIBUTE_NAME and change the Values to EXPLANATORY_VALUE SUM.

a) Clear values by clicking x on ATTRIBUTE_SUBNAME SUM in Values

b) Add the new value by selecting and dragging EXPLANATORY_VALUE from Available Fields: to Values

NOTE: EXPLANATORY_VALUE SUM should be displayed by default. If it is not, then click on the EXPLANATORY_VALUE to
display a drop-down list of available functions and select SUM.

12. The Graph should now show the explanatory value of each attribute returned, indicating how strongly each one contributes to good credit.

Section 5: Machine Learning Algorithms – Build Importance Models on Single Attributes

We will now create an Attribute Importance Model for MAX_CC_SPENT_AMOUNT to find the importance of attributes that
independently impact the target attribute: MAX_CC_SPENT_AMOUNT (Max Credit Card Spent Amount).

1. In the newly created paragraph, delete the % text and add the following text in its place:

Create Attribute Importance Machine Learning Model for MAX_CC_SPENT_AMOUNT

2. Create a new Paragraph, enter the following PL/SQL script and click the Run this paragraph icon. In this step we determine the attributes that contribute most to MAX_CC_SPENT_AMOUNT.

%script
/* Find the importance of attributes that independently impact the target attribute: MAX_CC_SPENT_AMOUNT */
BEGIN
  -- Drop the result table if it exists from an earlier run
  BEGIN
    EXECUTE IMMEDIATE 'DROP TABLE ai_explain_output_MAX_CC_SPENT_AMOUNT';
  EXCEPTION WHEN OTHERS THEN NULL;
  END;

  -- Rank the other columns by how well each explains MAX_CC_SPENT_AMOUNT
  DBMS_PREDICTIVE_ANALYTICS.EXPLAIN(
    data_table_name     => 'Credit_Scoring_100k_v',
    explain_column_name => 'MAX_CC_SPENT_AMOUNT',
    result_table_name   => 'ai_explain_output_MAX_CC_SPENT_AMOUNT');
END;

3. You should receive the following success message.

4. In the newly created paragraph, delete the % text and add the following text in its place:

Display the Top Attributes for MAX_CC_SPENT_AMOUNT (change the rownum limit to increase Attribute Scope)

5. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon. In this step we display the
attributes which contribute most to MAX_CC_SPENT_AMOUNT.

%sql
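/* Order by explanatory value first, then keep the first 7 rows */
Select * from (select * from ai_explain_output_MAX_CC_SPENT_AMOUNT order by EXPLANATORY_VALUE desc) where rownum <= 7;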

6. You should see the top 7 attribute rows in the table returned by the query.

7. We wish to visualize this in a chart format. To display this data in a graph format, click the Bar Graph icon.

8. You should see a graphical representation of the query. It’s not very informative.

9. To change the settings of the graphical representation, click settings.

10. The current chart settings will be displayed.

11. We wish to leave the Key as ATTRIBUTE_NAME and change the Values to EXPLANATORY_VALUE SUM.

a) Clear values by clicking x on ATTRIBUTE_SUBNAME SUM in Values

b) Add the new value by selecting and dragging EXPLANATORY_VALUE from Available Fields: to Values

NOTE: EXPLANATORY_VALUE SUM should be displayed by default. If it is not, then click on the EXPLANATORY_VALUE to
display a drop-down list of available functions and select SUM.

12. The Graph should now show the explanatory value of each of the top attributes contributing to MAX_CC_SPENT_AMOUNT.
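The "top attributes" selection in this section can be sketched outside the notebook in plain Python. This is only an illustration, not part of the lab: the rows and their values below are invented, and only the column names ATTRIBUTE_NAME and EXPLANATORY_VALUE come from the lab. The point it shows is that ordering must happen before limiting. In Oracle SQL, ROWNUM is assigned as rows are fetched, before an ORDER BY in the same query block is applied, and a filter of the form rownum < n keeps at most n - 1 rows.

```python
# Hypothetical rows shaped like the explain output table; the values are invented.
explain_rows = [
    {"ATTRIBUTE_NAME": "TENURE", "EXPLANATORY_VALUE": 0.02},
    {"ATTRIBUTE_NAME": "INCOME", "EXPLANATORY_VALUE": 0.31},
    {"ATTRIBUTE_NAME": "AGE", "EXPLANATORY_VALUE": 0.12},
    {"ATTRIBUTE_NAME": "LOAN_AMOUNT", "EXPLANATORY_VALUE": 0.25},
]


def top_n(rows, n):
    """Sort by explanatory value (highest first), then keep the first n rows."""
    ranked = sorted(rows, key=lambda r: r["EXPLANATORY_VALUE"], reverse=True)
    return ranked[:n]


def limit_then_sort(rows, n):
    """The pitfall: keep the first n rows in arrival order, then sort what is left."""
    return sorted(rows[:n], key=lambda r: r["EXPLANATORY_VALUE"], reverse=True)


best = [r["ATTRIBUTE_NAME"] for r in top_n(explain_rows, 2)]
naive = [r["ATTRIBUTE_NAME"] for r in limit_then_sort(explain_rows, 2)]
print(best)   # sorted first, so the two strongest attributes are kept
print(naive)  # limited first, so a weak attribute can slip in
```

With this sample data the two approaches return different attributes, which is why a top-N query is usually written with the ordering in an inner query and the row limit in the outer one.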

Section 6: Machine Learning Algorithms – Create a Predictive Model

Now that we have found the key attributes that most influence finding Good Credit customers and also making better Maximum Credit
Card Amount decisions, we can leverage Oracle Machine Learning's powerful in-Database, parallelized algorithms to build predictive
models that help better target “the right customers” with the “right offers”.

The machine learning process:

Problem Definition: Target Good Credit Customers

Data Gathering and Preparation: We have assembled 100K records with 100+ variables about each customer and have created
a target field (Good Customer/Other Customer) so we can use OML's supervised algorithms. We will start with a
decision tree algorithm.

Model Building and Evaluation: We will create a randomly selected sample from our Credit_Scoring_100k historical data and
use 60% as training data for the machine learning model building phase. Then, we'll use the remaining 40% as a holdout sample
to test our model's accuracy using various model evaluation tools such as a “lift chart”.

Knowledge Deployment: Once we are satisfied that we have a useful ML model that can predict with some accuracy which
customers we should target (Good Credit customers), we want to apply our OML model to new customer data inside ADWC and
then take a deeper look at them. Lastly, we'll jump over to Oracle Analytics Cloud for a more interactive, exploratory data
analysis experience but now focusing on our customers of interest (Good Credit customers).

Data Mining and Machine Learning Process
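The 60/40 train and holdout split described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the lab's implementation: the function name split_train_test and the stand-in customer IDs are hypothetical, and the sketch assumes row-level sampling in which each record is selected for training with probability 0.6, so the split is approximately (not exactly) 60/40. A fixed seed makes the split repeatable, and the holdout is simply every record that was not sampled.

```python
import random


def split_train_test(case_ids, train_fraction=0.6, seed=1):
    """Assign each case to the training set with probability train_fraction.

    Everything not selected becomes the holdout (test) set.
    """
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    train, test = [], []
    for case_id in case_ids:
        if rng.random() < train_fraction:
            train.append(case_id)
        else:
            test.append(case_id)
    return train, test


# Hypothetical stand-in for the 100K CUSTOMER_ID values.
customer_ids = list(range(1, 100_001))
train_ids, test_ids = split_train_test(customer_ids)
print(len(train_ids), len(test_ids))
```

Because the two sets are disjoint and together cover every record, accuracy measured on the holdout set is not inflated by records the model has already seen.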

We will now carry out the Preparatory Steps to Automate the Model Build and Test and Clean up using PL/SQL.

1. In the newly created paragraph, delete the % text and add the following text in its place:

Build a classification model and then generate a lift test result and an apply result

2. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon.

%script
/* Build a classification model and then generate a lift test result and an apply result. Click on the arrow in the upper
right. */
DECLARE
v_sql varchar2(100);
BEGIN
/*
--------------------------------------------
-- REMOVE PRIOR RUNS
--------------------------------------------

-- drop build settings

*/

BEGIN
v_sql := 'DROP TABLE n1_build_settings PURGE';
EXECUTE IMMEDIATE v_sql;
DBMS_OUTPUT.PUT_LINE (v_sql ||': succeeded');
EXCEPTION
WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE (v_sql ||': drop unnecessary - no table exists');
END;

/* drop model */
BEGIN
v_sql := 'CALL DBMS_DATA_MINING.DROP_MODEL(''N1_CLASS_MODEL'')';
EXECUTE IMMEDIATE v_sql;
DBMS_OUTPUT.PUT_LINE (v_sql ||': succeeded');
EXCEPTION
WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE (v_sql ||': drop unnecessary - no model exists');
END;

/* drop apply result */


BEGIN
v_sql := 'DROP TABLE N1_APPLY_RESULT PURGE';
EXECUTE IMMEDIATE v_sql;
DBMS_OUTPUT.PUT_LINE (v_sql ||': succeeded');
EXCEPTION

WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE (v_sql ||': drop unnecessary - no table exists');
END;

/* drop lift result */


BEGIN
v_sql := 'DROP TABLE N1_LIFT_TABLE PURGE';
EXECUTE IMMEDIATE v_sql;
DBMS_OUTPUT.PUT_LINE (v_sql ||': succeeded');
EXCEPTION
WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE (v_sql ||': drop unnecessary - no table exists');
END;

/* Split the Data into N1_TRAIN_DATA and N1_TEST_DATA */


EXECUTE IMMEDIATE 'CREATE OR REPLACE VIEW N1_TRAIN_DATA AS SELECT * FROM CREDIT_SCORING_100K_v
SAMPLE (60) SEED (1)';
DBMS_OUTPUT.PUT_LINE ('Created N1_TRAIN_DATA');
EXECUTE IMMEDIATE 'CREATE OR REPLACE VIEW N1_TEST_DATA AS SELECT * FROM CREDIT_SCORING_100K_v
MINUS SELECT * FROM N1_TRAIN_DATA';
DBMS_OUTPUT.PUT_LINE ('Created N1_TEST_DATA');

/* Create a Build Setting (DT) for Model Build */


EXECUTE IMMEDIATE 'CREATE TABLE n1_build_settings (setting_name VARCHAR2(30),setting_value
VARCHAR2(4000))';
EXECUTE IMMEDIATE 'INSERT INTO n1_build_settings (setting_name, setting_value) VALUES (''ALGO_NAME'',
''ALGO_DECISION_TREE'')';
EXECUTE IMMEDIATE 'INSERT INTO n1_build_settings (setting_name, setting_value) VALUES (''PREP_AUTO'', ''ON'')';
DBMS_OUTPUT.PUT_LINE ('Created model build settings table: n1_build_settings ');

/*
-- Populate and Adjust Model Setting (DT) for Model Build
EXECUTE IMMEDIATE 'INSERT INTO n1_build_settings VALUES (''TREE_TERM_MAX_DEPTH'', 7)';
EXECUTE IMMEDIATE 'INSERT INTO n1_build_settings VALUES (''TREE_TERM_MINREC_SPLIT'', 20)';
EXECUTE IMMEDIATE 'INSERT INTO n1_build_settings VALUES (''TREE_TERM_MINPCT_SPLIT'', .1)';
EXECUTE IMMEDIATE 'INSERT INTO n1_build_settings VALUES (''TREE_TERM_MINREC_NODE'', 10)';
EXECUTE IMMEDIATE 'INSERT INTO n1_build_settings VALUES (''TREE_TERM_MINPCT_NODE'', 0.05)';
*/

/* Build a Classification Model */

EXECUTE IMMEDIATE 'CALL DBMS_DATA_MINING.CREATE_MODEL(''N1_CLASS_MODEL'', ''CLASSIFICATION'',
''N1_TRAIN_DATA'', ''CUSTOMER_ID'', ''CREDIT_SCORE_BIN'', ''n1_build_settings'')';
DBMS_OUTPUT.PUT_LINE ('Created model: N1_CLASS_MODEL ');

/* Test the Model by generating an apply result and then create a lift result */
EXECUTE IMMEDIATE 'CALL
DBMS_DATA_MINING.APPLY(''N1_CLASS_MODEL'',''N1_TEST_DATA'',''CUSTOMER_ID'',''N1_APPLY_RESULT'')';
DBMS_OUTPUT.PUT_LINE ('Created apply result: N1_APPLY_RESULT ');
EXECUTE IMMEDIATE 'CALL
DBMS_DATA_MINING.COMPUTE_LIFT(''N1_APPLY_RESULT'',''N1_TEST_DATA'',''CUSTOMER_ID'',''CREDIT_SCORE_BIN'',
''N1_LIFT_TABLE'',''Good Credit'',''PREDICTION'',''PROBABILITY'',100)';
DBMS_OUTPUT.PUT_LINE ('Created lift result: N1_LIFT_TABLE ');
END;

3. You should receive the following success message.

4. In the newly created paragraph, delete the % text and add the following text in its place:

View Model's Cumulative Gains Chart

5. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon.

%sql

SELECT QUANTILE_NUMBER, GAIN_CUMULATIVE from N1_LIFT_TABLE where rownum < 40

6. You should see the cumulative gains in the table returned by the query.

7. We wish to visualize this in a chart format. To display this data in a graph format, click the Bar Graph icon.

8. You should see a graphical representation of the query. Much more informative but not quite what we want.

9. To change the settings of the graphical representation, click settings.

10. The current chart settings will be displayed.

11. We wish to leave the Key as QUANTILE_NUMBER and change the Values to GAIN_CUMULATIVE AVG.

a) Click on the GAIN_CUMULATIVE SUM to display a drop-down list of available functions and select AVG.

12. The Graph should now show the average cumulative gain.
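To make the cumulative gains chart easier to read, here is a minimal sketch in plain Python of how such a curve can be computed. It is an illustration only and not the database's implementation. It assumes that the cumulative gain for a quantile is the share of all positive cases ('Good Credit') found in that quantile and all higher-scored ones. The function name cumulative_gains, the sample rows, and the 'Other Credit' label are hypothetical.

```python
def cumulative_gains(scored, positive, num_quantiles=10):
    """Return (quantile_number, cumulative_gain) pairs.

    scored is a list of (actual_label, probability_of_positive) tuples.
    Cases are ranked by probability, highest first, and cut into equal quantiles.
    """
    ranked = sorted(scored, key=lambda row: row[1], reverse=True)
    total_positives = sum(1 for actual, _ in ranked if actual == positive)
    if total_positives == 0:
        raise ValueError("no positive cases to measure gain against")
    gains = []
    found = 0
    start = 0
    for quantile in range(1, num_quantiles + 1):
        end = quantile * len(ranked) // num_quantiles  # integer quantile boundary
        found += sum(1 for actual, _ in ranked[start:end] if actual == positive)
        gains.append((quantile, found / total_positives))
        start = end
    return gains


# Hypothetical test-set results: (actual class, predicted probability of 'Good Credit').
scored = [
    ("Good Credit", 0.95), ("Good Credit", 0.90), ("Other Credit", 0.80),
    ("Good Credit", 0.70), ("Other Credit", 0.60), ("Good Credit", 0.50),
    ("Other Credit", 0.40), ("Other Credit", 0.30), ("Other Credit", 0.20),
    ("Other Credit", 0.10),
]
gains = cumulative_gains(scored, "Good Credit", num_quantiles=5)
for quantile, gain in gains:
    print(quantile, gain)
```

A curve that climbs steeply in the first few quantiles means the model concentrates the Good Credit customers among its highest-scored cases, which is what makes targeting only the top quantiles worthwhile.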

13. We will now apply the Oracle Machine Learning Classification Model to New Customers. This will Show Customers Most Likely to
Have Good Credit. In the newly created paragraph, delete the % text and add the following text in its place:

Apply the Oracle Machine Learning Classification Model to New Customers and display the top predictive attributes

14. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon.

%sql
/* This will show the new customers most likely to have good credit */
select a.customer_id
, a.prob_Credit_Score_Bin
, b.age, b.income, b.tenure, b.loan_type, b.loan_amount, b.occupation, b.education_level, b.marital_status
from (select * from (select Customer_id, round(prob_Credit_Score_Bin *100,2) prob_Credit_Score_Bin from (select
Customer_ID, prediction_probability(N1_CLASS_MODEL, 'Good Credit' using *) prob_Credit_Score_Bin from
credit_scoring_new_cust_v))) a, credit_scoring_100k_v b
where a.customer_id = b.customer_id
order by a.prob_Credit_Score_Bin desc

15. You should see the customers most likely to have good credit listed first, with the probability shown as a percentage out of 100.

16. We will now create a new table in order to visualize these results. We will first drop the table created previously. In the newly
created paragraph, delete the % text and add the following text in its place:

Drop the credit_score_new_predictions table.

17. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon.

%script
begin
execute immediate 'drop table credit_score_new_predictions';
DBMS_OUTPUT.PUT_LINE ('succeeded');
EXCEPTION
WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE ('drop unnecessary - no table exists');
END;

18. You should receive one of the following messages depending on whether the table exists.

OR

19. We will now create a New Table CREDIT_SCORE_NEW_PREDICTIONS in order to visualize the results of the classification
model. In the newly created paragraph, delete the % text and add the following text in its place:

Visualize the Oracle Machine Learning Classification Model applied to New Customers and display the top predictive attributes

20. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon.

%sql
/* create table with results above so we can view in Data Visualization */
create table credit_score_new_predictions as
select a.customer_id
, a.prob_good_credit
, b.age, b.income, b.tenure, b.loan_type, b.loan_amount, b.occupation, b.education_level, b.marital_status
from (select * from (select Customer_id, round(prob_good_credit *100,2) prob_good_credit from (select Customer_ID,
prediction_probability(N1_CLASS_MODEL, 'Good Credit' using *) prob_good_credit from credit_scoring_new_cust_v))) a
, credit_scoring_100k_v b
where a.customer_id = b.customer_id

21. We will now review the CREDIT_SCORE_NEW_PREDICTIONS Table and rank Good Customers Based on Prediction Probability,
and Other Factors. In the newly created paragraph, delete the % text and add the following text in its place:

Score and rank new customers that are a professional, married, and borrowing for education.

22. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon.

%sql

/* Score and rank new customers that are a professional, married, and borrowing for education (in this case). You can
substitute or add other filters. */

select * from credit_score_new_predictions

WHERE LOAN_TYPE = 'Education' and MARITAL_STATUS = 'Married' and OCCUPATION = 'Professional'

order by rank() over (order by PROB_GOOD_CREDIT desc)

23. You should see the new customers who are professionals, married, and borrowing for education, with those most likely to
have good credit listed first.

24. We will now review the CREDIT_SCORE_NEW_PREDICTIONS Table and rank Customers most likely to have poor credit based
on Prediction Probability, and Other Factors. In the newly created paragraph, delete the % text and add the following text in its
place:

Score and rank new customers that are a professional, married, and borrowing for education.

25. Create a new Paragraph and enter the following SQL and click the Run this paragraph icon.

%sql
/* Score and rank new customers that are a professional, married, and borrowing for education (in this case). You can
substitute or add other filters. */
select * from credit_score_new_predictions
WHERE LOAN_TYPE = 'Education' and MARITAL_STATUS = 'Married' and OCCUPATION = 'Professional'
order by rank() over (order by PROB_GOOD_CREDIT asc)

26. You should see the new customers who are professionals, married, and borrowing for education, with those least likely to
have good credit listed first.
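The filter-and-rank logic of the last two queries can be sketched in plain Python. This is only an illustration of the idea: the function name rank_customers and the sample rows, including the 'Auto' loan type, are hypothetical, while the column names and the three filter values come from the queries above. The same filters are applied in both cases and only the sort direction changes.

```python
def rank_customers(rows, filters, descending=True):
    """Keep rows that match every filter, then order them by PROB_GOOD_CREDIT."""
    matches = [
        row for row in rows
        if all(row.get(column) == value for column, value in filters.items())
    ]
    return sorted(matches, key=lambda row: row["PROB_GOOD_CREDIT"], reverse=descending)


# Hypothetical rows shaped like CREDIT_SCORE_NEW_PREDICTIONS; the values are invented.
new_predictions = [
    {"CUSTOMER_ID": 1, "PROB_GOOD_CREDIT": 91.5, "LOAN_TYPE": "Education",
     "MARITAL_STATUS": "Married", "OCCUPATION": "Professional"},
    {"CUSTOMER_ID": 2, "PROB_GOOD_CREDIT": 12.25, "LOAN_TYPE": "Education",
     "MARITAL_STATUS": "Married", "OCCUPATION": "Professional"},
    {"CUSTOMER_ID": 3, "PROB_GOOD_CREDIT": 99.0, "LOAN_TYPE": "Auto",
     "MARITAL_STATUS": "Married", "OCCUPATION": "Professional"},
    {"CUSTOMER_ID": 4, "PROB_GOOD_CREDIT": 64.0, "LOAN_TYPE": "Education",
     "MARITAL_STATUS": "Married", "OCCUPATION": "Professional"},
]
filters = {"LOAN_TYPE": "Education", "MARITAL_STATUS": "Married", "OCCUPATION": "Professional"}

best_first = rank_customers(new_predictions, filters)
worst_first = rank_customers(new_predictions, filters, descending=False)
print([row["CUSTOMER_ID"] for row in best_first])
print([row["CUSTOMER_ID"] for row in worst_first])
```

Customer 3 has the highest probability in the sample but is excluded by the loan type filter, which shows that the ranking applies only within the filtered segment.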

Congratulations! You have created an ML Notebook from beginning to end in Oracle Autonomous Data Warehouse. As you have
experienced, ML Notebooks allow us to combine the power of data visualization with powerful ML algorithms, which makes it
possible to run predictive analysis on very large data sets.
