Iot Domain Analyst-Ece3502: Data Analytics Using Weka For Water Quality Related Data

1) The document describes an experiment using the Weka machine learning software to analyze a water quality dataset. 2) Weka was used to apply various classification techniques like J48 decision trees, random forests, linear regression, and Gaussian processes to predict water quality. 3) The random tree classification achieved the highest accuracy of 91.61% while random forest was the lowest at 57.27% based on 10-fold cross validation.

Uploaded by

sai manikanta

Name: M. SAIMANIKANTA Reg no: 18BEC1314


SLOT: L37+L38 FACULTY: - DR VELMATHI G

EXPERIMENT NUMBER: - 5 DATE: - 04/03/2021

IOT DOMAIN ANALYST- ECE3502

AIM: Data Analytics using Weka for water quality related data.

THEORY:
WEKA

The workbench for machine learning

Weka is tried and tested open source machine learning software that can be
accessed through a graphical user interface, standard terminal applications, or a
Java API. It is widely used for teaching, research, and industrial applications,
contains a plethora of built-in tools for standard machine learning tasks, and
additionally gives transparent access to well-known toolboxes such as scikit-
learn, R, and Deeplearning4j.
WEKA: Weka (Waikato Environment for Knowledge Analysis) is a popular suite of
machine learning software written in Java, developed at the University of
Waikato, New Zealand. Weka is free software available under the GNU General
Public License. The Weka workbench contains a collection of visualization tools
and algorithms for data analysis and predictive modeling, together with graphical
user interfaces for easy access to this functionality.
Weka is a collection of machine learning algorithms for solving real-world data
mining problems. It is written in Java and runs on almost any platform. The
algorithms can either be applied directly to a dataset or called from your own Java
code.
The original non-Java version of Weka was a TCL/TK front-end to (mostly third-
party) modeling algorithms implemented in other programming languages, plus
data preprocessing utilities in C, and a Makefile-based system for running
machine learning experiments. This original version was primarily designed as a
tool for analyzing data from agricultural domains, but the more recent fully Java-
based version (Weka 3), for which development started in 1997, is now used in
many different application areas, in particular for educational purposes and
research.
Advantages of Weka include:
• Free availability under the GNU General Public License
• Portability, since it is fully implemented in the Java programming language
and thus runs on almost any modern computing platform
• A comprehensive collection of data preprocessing and modeling techniques
• Ease of use due to its graphical user interfaces
DESIGN AND PROCEDURE:
1) Download and install the Weka software on a laptop and open it.

2) Open a new Explorer window in Weka.


3) Download the dataset from the internet.

4) Here we take a CSV file and make changes to it, such as:

We deleted the unwanted rows from the dataset.

We can also reorder the rows of the dataset, etc.

5) Open the dataset CSV file in Notepad and put it into the correct format for Weka (ARFF).

(This is the basic format)

% 1. Title: Iris Plants Database
% 2. Sources:

@RELATION iris

@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

The Data of the ARFF file looks like the following:

@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa

After that, save the file in the .arff format.
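The manual conversion in step 5 could also be scripted. The following is a minimal Python sketch, not part of the original experiment: the function name csv_to_arff is hypothetical, the sample rows are taken from the Iris example above, and the rule "a column is NUMERIC if every value parses as a number, otherwise nominal" is a simplifying assumption. Missing values are not handled.

```python
import csv
import io


def csv_to_arff(csv_text, relation):
    """Convert CSV text (header row plus data rows) into ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    lines = [f"@RELATION {relation}", ""]
    for col, name in enumerate(header):
        values = [row[col] for row in data]
        if all(is_number(v) for v in values):
            # Assumption: a column of parseable numbers is declared NUMERIC.
            lines.append(f"@ATTRIBUTE {name} NUMERIC")
        else:
            # Nominal attribute: list the distinct labels in first-seen order.
            labels = list(dict.fromkeys(values))
            lines.append(f"@ATTRIBUTE {name} {{{','.join(labels)}}}")
    lines += ["", "@DATA"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Two rows from the Iris example, with a header row added for the CSV form.
sample = """sepallength,sepalwidth,petallength,petalwidth,class
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
"""
arff = csv_to_arff(sample, "iris")
print(arff)
```

Because only the labels that actually occur in the data are listed, a nominal attribute declared this way may be narrower than the full set of classes; for a real dataset the label list should be checked by hand.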

6) The following graph shows the station code attribute of the water quality
dataset: the range is 11 to 3330, the mean is 2052, and the standard deviation is 755.
7) The following graph shows the water temperature at the different locations
in the water dataset: the range is 0 to 33.8, the mean is 25, and the standard
deviation is 4.2.
8) The following graph shows the pH of the water in the water dataset: the
range is 6.3 to 17.7, the mean is 7.7, and the standard deviation is 0.68.
9) The following graph shows the nitrate_n attribute of the water dataset: the
range is 0 to 45.4, the mean is 1.3, the standard deviation is 2.8, and the mode
of nitrate_n is 0.
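The range, mean, standard deviation, and mode reported in steps 6 to 9 can be reproduced for any numeric attribute with a few lines of Python. This is only a sketch: the function name summarize and the readings below are hypothetical and are not taken from the dataset, and the sample (n - 1) form of the standard deviation is assumed here.

```python
import statistics


def summarize(values):
    """Return the summary figures reported for a numeric attribute."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        # Assumption: sample standard deviation (divides by n - 1).
        "stddev": statistics.stdev(values),
        "mode": statistics.mode(values),
    }


# Hypothetical nitrate_n readings, for illustration only.
nitrate_n = [0.0, 0.0, 0.5, 1.0, 3.5]
summary = summarize(nitrate_n)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```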
OUTPUT:-
1) The following shows the trees.J48 classifier, evaluated with 10-fold
cross-validation, used to learn from the dataset and to predict the quality of
water. The accuracy of this classifier is 64.34%, which is moderately good.
2) The following shows the trees.RandomTree classifier, evaluated with 10-fold
cross-validation, used to learn from the dataset and to predict the quality of
water. The accuracy of this classifier is 91.61%, which is good.
3) The following shows the trees.RandomForest classifier, evaluated with 10-fold
cross-validation, used to learn from the dataset and to predict the quality of
water. The accuracy of this classifier is 57.27%, which is not good.
4) The following shows functions.LinearRegression, evaluated with 10-fold
cross-validation, used to learn from the dataset and to predict the quality of
water. The reported accuracy is 79.48%, which is moderately good.
5) The following shows functions.GaussianProcesses, evaluated with 10-fold
cross-validation, used to learn from the dataset and to predict the quality of
water. The reported accuracy is 85.46%, which is good.
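To make the evaluation method concrete, the sketch below shows how an accuracy figure is obtained with 10-fold cross-validation. It is not the Weka implementation: the function names are hypothetical, the labels are invented for illustration, the folds are not stratified, and a trivial majority-class baseline stands in for the classifiers listed above.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=1):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(labels, k=10, seed=1):
    """Estimate the accuracy (in percent) of a majority-class baseline."""
    folds = k_fold_indices(len(labels), k, seed)
    correct = 0
    for test_fold in folds:
        held_out = set(test_fold)
        train = [labels[i] for i in range(len(labels)) if i not in held_out]
        # "Training" the baseline: remember the most common training label.
        prediction = Counter(train).most_common(1)[0][0]
        # Test only on the held-out fold, which was not seen in training.
        correct += sum(1 for i in test_fold if labels[i] == prediction)
    return 100.0 * correct / len(labels)


# Hypothetical water-quality labels, for illustration only.
labels = ["good"] * 70 + ["poor"] * 30
accuracy = cross_validate(labels)
print(f"{accuracy:.2f}%")  # prints 70.00%
```

Each instance is used for testing exactly once and for training in the other nine rounds, so the final percentage is computed over the whole dataset without testing a model on data it was trained on.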

Result:
The water analysis dataset was analyzed, and different classification
techniques were studied for machine learning with the help of Weka.
