Apriori Algorithm in SQL, PL/SQL and Spark SQL
********************************************************************************
This script is provided for educational purposes only.
import time

start_time = time.time()
"""
Previously I presented a document with the title "Apriori Algorithm in pl/sql",
wherein I used for loops and if/then/else logic in pl/sql to fetch frequent
patterns from a transactional dataset.
I was curious how to pull frequent pattern data from transactional data using
only sql commands, without any procedural logic.
This table join logic works fine for finding frequent patterns on smaller
datasets, but it takes more time and is not efficient for large amounts of data.
Using the libraries supplied with python, large amounts of data can be processed
efficiently in less time.
I am sure that in Oracle the join logic can be fine tuned using hints and the
tables' join order.
I created the file apriori_spark.csv with the following data and saved it in my
local folder "f:/data".
tran_id,item
1,curd
1,apple
1,fruit
1,lamp
1,pot
1,rose
1,jam
2,spoon
2,detergent
2,glue
2,hat
2,tape
2,umbrella
2,jam
3,jam
4,curd
4,apple
4,fruit
4,pot
4,rose
4,tape
5,fruit
5,banana
5,spoon
5,hat
5,ink
5,lamp
5,umbrella
6,curd
6,detergent
6,glue
6,hat
6,ink
6,jam
7,fruit
7,glue
7,tape
8,spoon
8,ink
8,rose
8,umbrella
9,curd
9,banana
9,ink
10,apple
10,spoon
10,ink
10,umbrella
10,jam
"""
#spark sql code starts here################################################################
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession
conf = SparkConf().setAppName("Apriori").setMaster("local[*]")
sc = SparkContext(conf=conf)
spark = SparkSession.builder.getOrCreate()  #SparkSession is needed for spark.read and spark.sql below
df = (spark.read.format("csv")
      .option("inferSchema", "false")
      .option("header", "true")
      .option("sep", ",")
      .load("f:/data/apriori_spark.csv"))
print(df.columns)
print(df.describe())
tranCount = df.count()
print(tranCount, len(df.columns))  #row count and column count of the dataframe
df.createOrReplaceTempView("data")
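#NOTE (assumed reconstruction): the pair query below expects an "item_agg" view holding
#one row per transaction with all of its items collected into an array. That statement,
#and a few intermediate ones whose output appears further down, are not shown in this
#excerpt; a sketch that matches the printed item_agg output could be:
spark.sql("select tran_id, sort_array(collect_list(item)) as item from data group by tran_id").createOrReplaceTempView("item_agg")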
#convert array type to str type using concat_ws in spark sql and search for frequent patterns
spark.sql("""with a as (select distinct item item from data order by 1),
t2 as (select t1.item item1,t2.item item2 from a t1, a t2 where
t1.item < t2.item order by 1,2)
select t2.item1,t2.item2,count(*) from t2,item_agg
where concat_ws(' ',item_agg.item) like '%'||t2.item1||'%'
and concat_ws(' ',item_agg.item) like '%'||t2.item2||'%'
group by t2.item1,t2.item2 having count(*)>1 order by 3
desc,1,2""").show(80,False)
print(time.time() - start_time) #check time taken for the script to run
#spark sql code ends here################################################################
"""
When I run the above script with the following command
spark-submit.cmd python/pysparkapriori.py
I get the output as shown below:
['tran_id', 'item']
DataFrame[summary: string, tran_id: string, item: string]
49 2
+-------+---------+
|tran_id| item|
+-------+---------+
| 1| curd|
| 1| apple|
| 1| fruit|
| 1| lamp|
| 1| pot|
| 1| rose|
| 1| jam|
| 2| spoon|
| 2|detergent|
| 2| glue|
| 2| hat|
| 2| tape|
| 2| umbrella|
| 2| jam|
| 3| jam|
| 4| curd|
| 4| apple|
| 4| fruit|
| 4| pot|
| 4| rose|
+-------+---------+
only showing top 20 rows
49
+-------+------------+
|tran_id|num_products|
+-------+------------+
| 1| 7|
| 10| 5|
| 2| 7|
| 4| 6|
| 5| 7|
| 6| 6|
| 7| 3|
| 8| 4|
| 9| 3|
+-------+------------+
+-------+---------+
|tran_id| item|
+-------+---------+
| 1| apple|
| 1| curd|
| 1| fruit|
| 1| jam|
| 1| lamp|
| 1| pot|
| 1| rose|
| 10| apple|
| 10| ink|
| 10| jam|
| 10| spoon|
| 10| umbrella|
| 2|detergent|
| 2| glue|
| 2| hat|
| 2| jam|
| 2| spoon|
| 2| tape|
| 2| umbrella|
| 4| apple|
+-------+---------+
only showing top 20 rows
+--------------------------------------------------+--------+
|item |count(1)|
+--------------------------------------------------+--------+
|[apple, ink, jam, spoon, umbrella] |1 |
|[banana, fruit, hat, ink, lamp, spoon, umbrella] |1 |
|[apple, curd, fruit, pot, rose, tape] |1 |
|[banana, curd, ink] |1 |
|[apple, curd, fruit, jam, lamp, pot, rose] |1 |
|[detergent, glue, hat, jam, spoon, tape, umbrella]|1 |
|[ink, rose, spoon, umbrella] |1 |
|[fruit, glue, tape] |1 |
|[curd, detergent, glue, hat, ink, jam] |1 |
+--------------------------------------------------+--------+
+---------+--------+--------+
|item1 |item2 |count(1)|
+---------+--------+--------+
|spoon |umbrella|4 |
|ink |spoon |3 |
|ink |umbrella|3 |
|apple |curd |2 |
|apple |fruit |2 |
|apple |jam |2 |
|apple |pot |2 |
|apple |rose |2 |
|banana |ink |2 |
|curd |fruit |2 |
|curd |ink |2 |
|curd |jam |2 |
|curd |pot |2 |
|curd |rose |2 |
|detergent|glue |2 |
|detergent|hat |2 |
|detergent|jam |2 |
|fruit |lamp |2 |
|fruit |pot |2 |
|fruit |rose |2 |
|fruit |tape |2 |
|glue |hat |2 |
|glue |jam |2 |
|glue |tape |2 |
|hat |ink |2 |
|hat |jam |2 |
|hat |spoon |2 |
|hat |umbrella|2 |
|ink |jam |2 |
|jam |spoon |2 |
|jam |umbrella|2 |
|pot |rose |2 |
+---------+--------+--------+
Here I will generate the frequent patterns with the Apriori algorithm in sql, using table joins.
connect veeksha/saketh
The csv file can be read into an Oracle external table to process the data.
Instead, I preferred to insert the data into a table using insert commands.
You may also load the csv file via SQL*Loader.
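The insert statements themselves are not reproduced in this excerpt; a minimal sketch of
the table and a couple of the inserts, assuming the same "apriori" table referenced in the
queries below, would look like this:
create table apriori (tran_id number, item varchar2(20));
insert into apriori values (1, 'curd');
insert into apriori values (1, 'apple');
-- ... one insert per (tran_id, item) row of apriori_spark.csv ...
commit;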
49 rows selected.
32 rows selected.
I verified that the final results from the spark-sql script output and the sql query
output match.
Here I used the 19c feature of creating a function in the "with" clause of my select
query and pulled the data:
WITH
FUNCTION get_str(id IN NUMBER) RETURN varchar2 IS
v_str varchar2(200):='';
BEGIN
for c1 in (select item from apriori where tran_id = id order by 1) loop
v_str:=v_str||' '||c1.item;
end loop;
RETURN v_str;
END;
t1 as (SELECT get_str(tran_id) item
FROM (select distinct tran_id from apriori order by tran_id)),
t2 as (select distinct item item from apriori order by item),
t3 as (select a1.item item1,a2.item item2 from t2 a1,t2 a2 where a1.item < a2.item)
select t3.item1, t3.item2,count(*) from t3,t1
where t1.item like '%'||t3.item1||'%'||t3.item2||'%'
group by t3.item1,t3.item2 having count(*)>1 order by 3 desc,1,2;
32 rows selected.
I don't like using complex aggregate functions or the 19c WITH clause FUNCTION feature
to process the transaction data and find frequent patterns with the apriori algorithm.
Can I just use the transactional table's data, as it is, and produce the frequent
patterns?
The following is the simplest solution for producing frequent patterns with the apriori
algorithm.
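The query itself is omitted from this excerpt; a plain self-join of the apriori table
against itself, along the following lines (a sketch, not necessarily the original query),
yields the same item pairs:
select a.item item1, b.item item2, count(*)
from apriori a, apriori b
where a.tran_id = b.tran_id
and a.item < b.item
group by a.item, b.item
having count(*) > 1
order by 3 desc, 1, 2;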
32 rows selected.
Please note that the three sql queries above produce identical results.
17 rows selected.
Sql is very powerful; you can write the most complex joins and solve complicated logic with it.
6 rows selected.
This simplified table join is used to find the frequent patterns of 4 items sold
together.
The same join can be used in a package to find the frequent patterns of n items sold
together in a dataset.
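The 4-item query is not reproduced in this excerpt; a sketch extending the pair
self-join above to four copies of the table (hypothetical, not the original query)
illustrates the idea:
select a.item item1, b.item item2, c.item item3, d.item item4, count(*)
from apriori a, apriori b, apriori c, apriori d
where a.tran_id = b.tran_id
and b.tran_id = c.tran_id
and c.tran_id = d.tran_id
and a.item < b.item
and b.item < c.item
and c.item < d.item
group by a.item, b.item, c.item, d.item
having count(*) > 1
order by 5 desc, 1, 2, 3, 4;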
end if;
v_nut:=c1.tran_id;
v_stt:=c1.item;
v_str:=c1.item;
end if;
end loop;
end if;
if (v_num > 1) then
dbms_output.put_line(v_num||' -- '||c0.item1||' '||c0.item2);
end if;
end loop;
end;
Statement processed.
2 -- apple curd
2 -- apple fruit
2 -- apple jam
2 -- apple pot
2 -- apple rose
2 -- banana ink
2 -- curd fruit
2 -- curd ink
2 -- curd jam
2 -- curd pot
2 -- curd rose
2 -- detergent glue
2 -- detergent hat
2 -- detergent jam
2 -- fruit lamp
2 -- fruit pot
2 -- fruit rose
2 -- fruit tape
2 -- glue hat
2 -- glue jam
2 -- glue tape
2 -- hat ink
2 -- hat jam
2 -- hat spoon
2 -- hat umbrella
2 -- ink jam
3 -- ink spoon
3 -- ink umbrella
2 -- jam spoon
2 -- jam umbrella
2 -- pot rose
4 -- spoon umbrella
Please note that if we are mining for frequent patterns of three or more items together,
some of the above scripts' logic changes.
Most of these scripts hold good for a frequent pattern search of two items in a given
dataset.
Please note that I chose a transactional table that stores item details in multiple
rows, one item per row.
If we need to process data that holds multiple items in a single row, the scripts need
to be modified accordingly, as sketched below.
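For example, if each row held a whole basket as a comma-separated string, a Spark SQL
reshaping step along these lines (assuming a hypothetical raw_baskets table with columns
tran_id and items) would bring the data back to one item per row, after which the
queries above apply unchanged:
select tran_id, explode(split(items, ',')) as item
from raw_baskets;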
Next steps:
I tested these scripts with a smaller dataset.
Benchmark all the available options on a larger dataset and choose the best method.
Fine tune the sql queries to perform better, using hints, modifying the tables' join
order, etc.
Implement code changes to generate the support and confidence numbers (see the sketch
after this list).
I implemented the sql query with the listagg option in spark sql; next, try the other
query options in spark sql and generate the output.
Using pl/sql in spark sql is not supported; convert the pl/sql logic into hpl/sql and
use it alongside spark sql to generate the output.
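As a hedged sketch of that support/confidence step (not implemented in this document),
both measures could be derived from the same pair self-join, dividing the pair count by
the number of distinct transactions and by the item1 transaction count respectively:
with pairs as
 (select a.item item1, b.item item2, count(*) cnt
    from apriori a, apriori b
   where a.tran_id = b.tran_id and a.item < b.item
   group by a.item, b.item),
singles as
 (select item, count(distinct tran_id) cnt from apriori group by item),
trans as
 (select count(distinct tran_id) n from apriori)
select p.item1, p.item2,
       round(p.cnt / t.n, 2) support,      -- share of all transactions containing both items
       round(p.cnt / s.cnt, 2) confidence  -- confidence of the rule item1 => item2
  from pairs p, singles s, trans t
 where s.item = p.item1
 order by 3 desc, 4 desc, 1, 2;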
Happy scripting.
References:
http://www.orafaq.com/node/3138
http://www.orafaq.com/node/3222 (to install and set up spark on my desktop)
"""