IoT Domain Analyst-ECE3502: Data Analytics Using Weka for Water Quality Related Data
AIM: Data Analytics using Weka for water quality related data.
THEORY:
WEKA
Weka is tried and tested open source machine learning software that can be
accessed through a graphical user interface, standard terminal applications, or a
Java API. It is widely used for teaching, research, and industrial applications,
contains a plethora of built-in tools for standard machine learning tasks, and
additionally gives transparent access to well-known toolboxes such as scikit-
learn, R, and Deeplearning4j.
WEKA: Weka (Waikato Environment for Knowledge Analysis) is a popular suite of
machine learning software written in Java, developed at the University of
Waikato, New Zealand. Weka is free software available under the GNU General
Public License. The Weka workbench contains a collection of visualization tools
and algorithms for data analysis and predictive modeling, together with graphical
user interfaces for easy access to this functionality.
Weka is a collection of machine learning algorithms for solving real-world data
mining problems. It is written in Java and runs on almost any platform. The
algorithms can either be applied directly to a dataset or called from your own Java
code.
The original non-Java version of Weka was a TCL/TK front-end to (mostly third-
party) modeling algorithms implemented in other programming languages, plus
data preprocessing utilities in C, and a Makefile-based system for running
machine learning experiments. This original version was primarily designed as a
tool for analyzing data from agricultural domains, but the more recent fully Java-
based version (Weka 3), for which development started in 1997, is now used in
many different application areas, in particular for educational purposes and
research.
Advantages of Weka include:
- Free availability under the GNU General Public License.
- Portability, since it is fully implemented in the Java programming language
  and thus runs on almost any modern computing platform.
- A comprehensive collection of data preprocessing and modeling techniques.
- Ease of use due to its graphical user interfaces.
DESIGN AND PROCEDURE:
1) Download and install the Weka software on a laptop and open it.
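% Excerpt of the iris sample dataset in ARFF format.
% Attribute declarations as in the iris.arff sample file distributed with Weka.
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA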
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
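The ARFF layout shown above (a relation name, attribute declarations, then comma-separated data rows) can be illustrated with a minimal reader. This is only a sketch in plain Python, not Weka's own loader: the function name parse_arff is hypothetical, the header and its attribute declarations are assumed to follow the iris sample file shipped with Weka, and quoted values, sparse rows, and missing values are not handled.

```python
from typing import List, Tuple

# Small ARFF sample; the attribute lines are assumed to match Weka's iris.arff.
ARFF_TEXT = """\
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
"""


def parse_arff(text: str) -> Tuple[str, List[Tuple[str, str]], List[List[str]]]:
    """Return (relation name, [(attribute name, type)], data rows as strings)."""
    relation = ""
    attributes: List[Tuple[str, str]] = []
    rows: List[List[str]] = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and ARFF comments, which start with '%'.
        if not line or line.startswith("%"):
            continue
        keyword = line.lower()
        if in_data:
            # After @DATA every line is one comma-separated instance.
            rows.append([value.strip() for value in line.split(",")])
        elif keyword.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif keyword.startswith("@attribute"):
            # Split into keyword, attribute name, and the type specification.
            _, name, type_spec = line.split(None, 2)
            attributes.append((name, type_spec))
        elif keyword.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation, len(attributes), len(rows))
```

The last declared attribute (class) is the one Weka treats as the prediction target by default in the Explorer, which is why the class label appears as the final field of every data row.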
6) The following graph shows the station code attribute of the water quality
dataset. The range is 11 to 3330, the mean is 2052, and the standard deviation
is 755.
7) The following graph shows the water temperature at different areas in the
water dataset. The range is 0 to 33.8, the mean is 25, and the standard
deviation is 4.2.
8) The following graph shows the pH of the water in the water dataset. The
range is 6.3 to 17.7, the mean is 7.7, and the standard deviation is 0.68.
9) The following graph shows the nitrate_n attribute of the water dataset. The
range is 0 to 45.4, the mean is 1.3, the standard deviation is 2.8, and the
mode of nitrate_n is 0.
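The per-attribute figures quoted in the steps above (range, mean, standard deviation, mode) are ordinary summary statistics. The sketch below shows how they can be computed with Python's standard library. The function name summarize is hypothetical, the readings are made-up illustrative values and not taken from the water quality dataset, and the sample (n-1) form of the standard deviation is an assumption about what Weka reports.

```python
import statistics
from typing import Dict, List


def summarize(values: List[float]) -> Dict[str, float]:
    """Compute the summary statistics discussed for a numeric attribute."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        # Sample standard deviation (divides by n - 1); assumed convention.
        "stddev": statistics.stdev(values),
        # Most frequent value; the first one seen wins if there is a tie.
        "mode": statistics.mode(values),
    }


# Hypothetical readings for illustration only, NOT from the water dataset.
readings = [0.0, 0.0, 1.0, 2.0, 7.0]
summary = summarize(readings)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A mode of 0 combined with a mean well above 0, as reported for nitrate_n, indicates a distribution with many zero readings and a long tail of larger values.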
OUTPUT:
1) The following shows the trees.J48 classifier evaluated with 10-fold
cross-validation, used to study the dataset for machine learning and to predict
the quality of water. The accuracy of this classification is 64.34%, which is
moderately good.
2) The following shows the trees.RandomTree classifier evaluated with 10-fold
cross-validation, used to study the dataset for machine learning and to predict
the quality of water. The accuracy of this classification is 91.61%, which is
good.
3) The following shows the trees.RandomForest classifier evaluated with 10-fold
cross-validation, used to study the dataset for machine learning and to predict
the quality of water. The accuracy of this classification is 57.27%, which is
not good.
4) The following shows the functions.LinearRegression scheme evaluated with
10-fold cross-validation, used to study the dataset for machine learning and to
predict the quality of water. The accuracy of this classification is 79.48%,
which is moderately good.
5) The following shows the functions.GaussianProcesses scheme evaluated with
10-fold cross-validation, used to study the dataset for machine learning and to
predict the quality of water. The accuracy of this classification is 85.46%,
which is good.
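The accuracy figures above all come from 10-fold cross-validation: the data is split into 10 parts, each part is held out once for testing while a model is built on the other nine, and accuracy is the share of held-out instances predicted correctly. The sketch below illustrates that procedure in plain Python. It is not Weka's implementation: the function names and the labels are hypothetical, the folds are shuffled but not stratified, and a majority-class baseline (comparable in spirit to Weka's rules.ZeroR) stands in for the tree and function learners.

```python
import random
from collections import Counter
from typing import List


def k_fold_indices(n: int, k: int, seed: int = 1) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and deal them into k near-equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def majority_class(labels: List[str]) -> str:
    """Baseline 'model': always predict the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


def cross_validated_accuracy(labels: List[str], k: int = 10, seed: int = 1) -> float:
    """Percentage of instances predicted correctly across all k test folds."""
    folds = k_fold_indices(len(labels), k, seed)
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        # Train only on instances outside the current test fold.
        train_labels = [labels[i] for i in range(len(labels)) if i not in held_out]
        prediction = majority_class(train_labels)
        correct += sum(1 for i in test_idx if labels[i] == prediction)
    return 100.0 * correct / len(labels)


# Hypothetical class labels for illustration, NOT the water quality data.
labels = ["good"] * 70 + ["poor"] * 30
accuracy = cross_validated_accuracy(labels)
print(f"10-fold cross-validated accuracy: {accuracy:.2f}%")
```

A baseline of this kind gives a useful reference point: a classifier is only informative if its cross-validated accuracy clearly exceeds the share of the most common class in the dataset.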
RESULT:
The water quality dataset was analyzed, and different classification techniques
were studied for the machine learning process with the help of Weka.