A Simple Approach To Weather Predictions by Using Naive Bayes Classifiers
Agnieszka Lutecka¹, Zuzanna Radosz¹
¹ Silesian University Of Technology, Faculty of Applied Mathematics, Kaszubska 23, 44-100 Gliwice, Poland
Abstract
This article presents and explains how we used a Naive Bayes Classifier and a database to predict the weather. We compare how the accuracy depends on the choice of probability distribution for various types of data.
Keywords
Naive Bayes Classifier, probability distribution, Python
ISSN 1613-0073
Agnieszka Lutecka et al. CEUR Workshop Proceedings 64–75
The probability density function plot of this distribution is a bell-shaped curve (the so-called bell curve). The graph of the density function is as follows:

3.3. Log Normal Distribution

It is the continuous probability distribution of a positive random variable whose logarithm is normally distributed. Pattern:

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x} \exp\left(\frac{-(\ln x - \mu)^2}{2\sigma^2}\right) \cdot \mathbf{1}_{(0,\infty)}(x)    (5)

Where 𝜇 is the mean value and 𝜎 is the standard deviation.

Figure 4: Graph of the probability density function. Source: Wikipedia.org

3.5. Uniform distribution

It is a continuous probability distribution for which the probability density in the range from a to b is constant
and not equal to zero, and otherwise equal to zero. We can see it in the formula below:

f(x) = \begin{cases} 0 & \text{for } x < \mu - \sqrt{3}\sigma \\ \frac{1}{2\sqrt{3}\sigma} & \text{for } \mu - \sqrt{3}\sigma \le x \le \mu + \sqrt{3}\sigma \\ 0 & \text{for } x > \mu + \sqrt{3}\sigma \end{cases}    (7)

Where 𝜇 is the mean value and 𝜎 is the standard deviation. The function graph is as follows:

column j, sr[j] - the mean value of column j, and std[j] - the standard deviation of the values from column j. These methods process the input data through the formulas for the probability distributions and return the value of the probability density function at the point sample[j], that is, the probability of sample[j] occurring under the conditions sr[j] and std[j]. Finally, the algorithm returns the name of the weather most likely to occur for the sample input.

Each probability distribution has a differently defined density function; therefore, the distributions may differ in their results. Below we present the pseudocode of the NaiveClassifier class methods, with an emphasis on processing the input data by the probability distributions.
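To make the formulas concrete, the density functions can be sketched in plain Python. This is our own minimal illustration (the function names are ours), not the paper's code:

```python
import math

def normal_pdf(x, mu, sigma):
    # Bell-shaped curve of the normal distribution
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def lognormal_pdf(x, mu, sigma):
    # Formula (5): defined only for positive x, via the indicator 1(0, inf)
    if x <= 0:
        return 0.0
    return math.exp(-((math.log(x) - mu) ** 2) / (2 * sigma ** 2)) / (x * sigma * math.sqrt(2 * math.pi))

def uniform_pdf(x, mu, sigma):
    # Formula (7): uniform density parameterized by mean and standard deviation
    a = mu - math.sqrt(3) * sigma
    b = mu + math.sqrt(3) * sigma
    return 1.0 / (2 * math.sqrt(3) * sigma) if a <= x <= b else 0.0
```

Each function takes the same arguments (the point, the column mean, and the column standard deviation), which is what lets the classifier swap distributions by name.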
In the attached photo we can see how the temperature value changes for a given weather; for example, for "Moderate rain" the maximum temperature ranges from 5 to 20 degrees.

5.2. Database modification

DateTime, SunRise, SunSet, MoonRise and MoonSet will not be used in our project, so we can get rid of them.

data.drop('DateTime', axis=1, inplace=True)
data.drop('SunRise', axis=1, inplace=True)
data.drop('SunSet', axis=1, inplace=True)
data.drop('MoonRise', axis=1, inplace=True)
data.drop('MoonSet', axis=1, inplace=True)

To normalize the data to the range 0-1, we changed int64 to float64.

6. Implementation

6.1. ProcessingData class

Our project consists of two files: the file containing the program code, "Pogoda.ipnb", and the database, "Istanbul Weather Data.csv". After analyzing the data from the database, we moved on to the "ProcessingData" class, in which we created 3 static methods: shuffle, splitSet and normalize.

6.2. Shuffle method

It takes base as input, i.e. our database. We use a for loop to go through it, selecting records and swapping them.
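As a toy illustration of the database modification described above (the frame here is a made-up stand-in for the Istanbul data, not the real file):

```python
import pandas as pd

# Hypothetical mini-frame standing in for "Istanbul Weather Data.csv"
data = pd.DataFrame({"DateTime": ["01.01.2019"], "MaxTemp": [5], "SunRise": ["07:30"]})

# Drop the unused columns, then cast remaining int64 columns to float64
# so that min-max normalization can store fractional values
for col in ["DateTime", "SunRise"]:
    data.drop(col, axis=1, inplace=True)
for col in data.select_dtypes(include="int64").columns:
    data[col] = data[col].astype("float64")

print(data.columns.tolist(), data["MaxTemp"].dtype)  # ['MaxTemp'] float64
```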
Program code

@staticmethod
def shuffle(base):
    for i in range(len(base) - 1, -1, -1):
        # pick one random partner index per step so the swap is a true swap
        j = rd.randint(0, i)
        base.iloc[i], base.iloc[j] = base.iloc[j].copy(), base.iloc[i].copy()
    return base

6.3. SplitSet method

It takes as input x - the database - and k - the division ratio of the set. In the variable n we store the length of the set x multiplied by k, so that we know where to divide the set. Then we write the data from the database up to index n into the variable xTrain, creating the training set, and all data after index n into the variable xVal, creating the validation set. Finally, we return both of these sets.

Program code

def splitSet(x, k):
    n = int(len(x) * k)
    xTrain = x[:n]
    xVal = x[n:]
    return xTrain, xVal

6.4. Normalize method

Takes x, which is a database whose records have been scrambled using the shuffle method. At the beginning, we put all data from the database into the variable values, except for the string values, and the column names into the variable columnNames. We loop through all the columns, and for each column we take all of its rows and store them in the variable data. The variables max1 and min1 are assigned the maximum and minimum values from data. Using the next loop, we go through all the rows and assign to the variable val the min-max normalization formula: we subtract min1 from the database record at [row, column], and then divide this difference by the difference between max1 and min1. Finally, we write the value after normalization back to the database. The method returns a normalized database.
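A quick usage sketch of the shuffle and splitSet methods on a made-up frame (our own test; the functions are repeated as plain functions, assuming `import random as rd` and pandas as in the paper's code):

```python
import random as rd
import pandas as pd

def shuffle(base):
    # swap each row with a randomly chosen earlier row (Fisher-Yates style)
    for i in range(len(base) - 1, -1, -1):
        j = rd.randint(0, i)
        base.iloc[i], base.iloc[j] = base.iloc[j].copy(), base.iloc[i].copy()
    return base

def splitSet(x, k):
    # first n rows become the training set, the rest the validation set
    n = int(len(x) * k)
    return x[:n], x[n:]

df = pd.DataFrame({"MaxTemp": list(range(10))})
shuffled = shuffle(df)
train, val = splitSet(shuffled, 0.8)
print(len(train), len(val))  # 8 2
```

With k = 0.8 a ten-row frame splits 8:2, and shuffling only reorders rows, so no record is lost or duplicated.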
Program code
def normalize(x):
    # select all data from the database except object (i.e. string) columns
    values = x.select_dtypes(exclude="object")
    columnNames = values.columns.tolist()
    for column in columnNames:
        # take all rows in the given column
        data = x.loc[:, column]
        max1 = max(data)
        min1 = min(data)
        # go through all the rows
        for row in range(0, len(x)):
            val = (x.at[row, column] - min1) / (max1 - min1)
            x.at[row, column] = val
    return x
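Applied to a made-up two-column frame, the method behaves as follows (our own usage sketch; the function is repeated so the snippet is self-contained):

```python
import pandas as pd

def normalize(x):
    # min-max normalize every non-string column in place
    values = x.select_dtypes(exclude="object")
    for column in values.columns.tolist():
        data = x.loc[:, column]
        max1, min1 = max(data), min(data)
        for row in range(len(x)):
            x.at[row, column] = (x.at[row, column] - min1) / (max1 - min1)
    return x

df = pd.DataFrame({"MaxTemp": [5.0, 10.0, 20.0], "Condition": ["Rain", "Sun", "Rain"]})
out = normalize(df)
print(out["MaxTemp"].tolist())  # [0.0, 0.3333333333333333, 1.0]
```

The string column "Condition" is skipped by select_dtypes, while every numeric column is rescaled so its minimum maps to 0 and its maximum to 1.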
The next step is to loop through all the values of the names list in sequence. Then we create auxiliary lists: tr[] - to store the 6 probability values that correspond to the successive database columns, sr[] - to store the mean values from each column, and std[] - to store the standard deviations for each column. The next loop passes through all the columns one by one. For each column we calculate the mean and the standard deviation. The next step is the conditions that guard against invalid values in sr[] and std[]. Then come the distributions: depending on the input data "name", we add to the list tr[] the result of the given distribution's density function. After going through the inner loop, we compute the value from Bayes' theorem: based on the formula for conditional probability, we multiply the values in the tr list, then multiply that product by the count of the class names[i], and divide the whole thing by the length of the "names" list. We add the obtained result to the "values" list. After going through both loops, we determine the index of the highest value in the values list. Finally, we return the name of the weather with that index from the stringnames list.

Program code

class AnalizingData:

    @staticmethod
    def analize(Train, Val, name):
        correct = 0
        for i in range(len(Val)):
            if NaiveClassifier.classify(Train, Val.iloc[i], name) == Val.iloc[i].Condition:
                correct += 1
        accuracy = correct / len(Val) * 100
        return accuracy

7. Tests

We started our tests by checking the algorithm's operation using various samples. As you can see in the attached picture, the algorithm
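The classification step described above can be condensed into the following sketch. This is our own plain-Python reconstruction of the Gaussian variant (the paper's implementation works on pandas DataFrames and six weather columns; the toy training data here is made up):

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def classify(train, sample):
    # train: list of (feature list, weather name); sample: feature list.
    # For each weather name, multiply the per-column densities and the
    # class prior (Bayes' rule), then return the name with the highest score.
    names = sorted({c for _, c in train})
    values = []
    for name in names:
        rows = [f for f, c in train if c == name]
        score = len(rows) / len(train)  # class prior
        for j in range(len(sample)):
            col = [r[j] for r in rows]
            mu = sum(col) / len(col)
            sigma = (sum((v - mu) ** 2 for v in col) / len(col)) ** 0.5 or 1e-9  # guard zero std
            score *= gaussian_pdf(sample[j], mu, sigma)
        values.append(score)
    return names[values.index(max(values))]

train = [([5.0], "Rain"), ([6.0], "Rain"), ([25.0], "Sunny"), ([27.0], "Sunny")]
print(classify(train, [5.5]))   # Rain
print(classify(train, [26.0]))  # Sunny
```

Swapping gaussian_pdf for another density function is what produces the different variants compared in the tests.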
time was only 1.467 minutes. However, the first calculations, where the division was in the ratio of 1:9, took as long as 9.517 minutes. This is because by reducing the training set we increase the number of records in the validation set. As a result, the algorithm is called more times, and its most time-consuming elements, such as extracting the records with a given weather or the loops, are executed many times.

Analyzing the above, we can see that the accuracy of the Gaussian distribution is superior to all the others, and its value is practically unchanged across the splits. The Laplace distribution is in second place, almost reaching 60%. The value of the uniform distribution ranges from 50-55%; it achieves the most on the last chart, where the training set is 0.9. The triangular and log normal distributions reach much lower values than the previously mentioned distributions; the gap is quite big, around 30%. The log normal distribution only slightly exceeds the triangular distribution once, in the third graph. Nevertheless, the accuracy values of both distributions never exceed 20%.

9. Conclusion

We can conclude from this that the Gaussian distribution is the best probability distribution for our database. The algorithm with this distribution, with each modification, correctly determines about 60% of weather names, which is a good but unsatisfactory value. This is due to the way the data is distributed in the database. With more distinct values for the different weather conditions, this algorithm could become much more efficient.