Decision Tree Algorithm in PL/SQL
Decision Tree Algorithm in PL/SQL
Author JP Vijaykumar
Date Sept 14th 2019
This script was generated/tested in Oracle 12C environment. This script may not work in other
versions of Oracle, pls modify the script as may be needed using your own discretion/ingenuity.
I had the following data, and want to predict output using decision tree algorithm.
I am using pl/sql scripting to process the data. This can also be processed using oracle data miner.
I can generate the rules with the attribute conditions from the above table using decision tree
algorithm in python. I executed python script from gitbash console.
Can I produce a similar output, on the attribute conditions from the above table using decision tree
algorithm in pl/sql? Tough task! Strain the brain.
Implementing the complex entropy calculations in pl/sql are challenging. Python is rich with advanced
features / libraries, and comparable features/libraries are not available in pl/sql. It took me quite
long time, working off and on, to formulate the logic and implement the Decision tree algorithm in
pl/sql. I used the below mentioned urls, which were of great help, in understanding the complex
algorithm and code it.
There are few other ways in pl/sql to solve this problem. I used entropy method in my script. Besides
scripting, I love literature, this is one of my favourite idioms “leaving no stone unturned”.
Scripting is fun.
export ORACLE_SID=orcl
export ORACLE_HOME=c:/app/oracle/product/12.2.0/dbhome_1
sqlplus
/ as sysdba
connect veeksha/saketh
During the script’s execution, if table or view does not exist error or the following error keeps
occurring, then create temp_vw_jp as shown in the following sql statement.
ORA-06502: PL/SQL: numeric or value error: Bulk Bind: Truncated Bind
create or replace view temp_vw_jp as select outlook cl1,humidity cl2,play_ball cl3 from
xtern_decision_jp;
Run the script from sqlplus prompt as shown below (pls note the script’s location may change as per
your environment/file location).
Formatting the output between python script and pl/sql script is a bit different. But the decision tree
is implemented on the given data using pl/sql.
Pls note: I tested this script on a small dataset in 12c db. On a larger dataset, the script may take
longer time and generate heavy system overhead. Readers’ discretion is advised.
Pls note: This script may not work on datasets with more attributes and needs modifications.
All I was interested is to see, how can I predict outcome from a given dataset using pl/sql.
Happy scripting.
References:
SCRIPT
set serverout on size 1000000 timing on
declare
v_table varchar2(30):='XTERN_DECISION_JP';
v_classifier varchar2(30):='PLAY_BALL';
--v_table varchar2(30):='DECISION_LP_JP';
--v_classifier varchar2(30):='R';
v_view varchar2(30):='TEMP_VW_JP';
v_numrows number(10,4);
v_entropy number(10,4);
v_centropy number(10,4);
v_gain number(10,4):=0;
v_gainm number(10,4):=0;
v_rootnode varchar2(30):='';
v_rootvalue varchar2(30):='';
v_createview varchar2(4000):='create or replace view temp_vw_jp as select ';
v_predicate varchar2(4000);
begin
execute immediate 'select count(*) from '||v_table||' ' into v_numrows;
execute immediate
'select round((-1)*(sum((val)*(log(2,(val))))),4) '||
' from ( '||
' select '||v_classifier||',count(*)/'||v_numrows||' val '||
'from '||v_table||' group by '||v_classifier||')' into v_entropy;
dbms_output.put_line('Entropy '||v_entropy);
execute immediate 'with t1 as (select '||c1.col||' cl1 ,count(*) val from '||v_table||
' group by '||c1.col||'), t2 as (select '||c1.col||' cl1,'||v_classifier||
' cl2,count(*) val from '||v_table||' group by '||c1.cOl||','||v_classifier||
'), t3 as (select t1.cl1,t1.val val1 ,t2.cl2,t2.val val2 from t1,t2 where t1.cl1 = t2.cl1 ),
t4 as (select cl1,(t3.val1/'||v_numrows||')*(sum((-1)*(val2/val1)*(log(2,(val2/val1))))) sum_val
from t3 group by cl1,t3.val1,'||v_numrows||')
select '||v_entropy||' - round(sum(sum_val),4) from t4' into v_gain;
if( v_gain > v_gainm) then
v_gainm:=v_gain;
v_rootnode:=c1.col;
end if;
end loop;
dbms_output.put_line('Root Node '||v_rootnode||' '||v_gainm);
for c32 in (
with t1 as (select count(*) val from temp_vw_jp where cl1=c31.cl1),
t2 as (select cl1,cl2,count(*) val from temp_vw_jp where cl1=c31.cl1 group by cl1,cl2),
t3 as (select cl1,cl2,cl3,count(*) val from temp_vw_jp where cl1=c31.cl1 group by cl1,cl2,cl3),
t4 as (select sum((-1)*(t3.val/t1.val)*(log(2,(t3.val/t2.val)))) val from t1,t2,t3 where t2.cl1=t3.cl1
and t2.cl2=t3.cl2) select t2.cl1,t2.cl2,t3.cl3,t2.val val2,t3.val val3,t4.val val4
from t2,t3,t4 where t2.cl1=t3.cl1 and t2.cl2=t3.cl2) loop
if (c32.val2 = c32.val3) then
dbms_output.put_line(v_rootnode||' -> '||c32.cl1||' '||c3.column_name||
' -> '||c32.cl2||'('||c32.val2||') '||v_classifier||' -> '||c32.cl3||' ('||c32.val3||')');
v_predicate:=v_predicate||' and '||v_rootnode||' not in ('''||c32.cl1||''')';
execute immediate v_createview||v_rootnode||' cl1,'||c3.column_name||' cl2,'||v_classifier||
' cl3 from '||v_table||' where 1=1 '||v_predicate;
end if;
end loop;
end loop;
end loop;
end;
/