DHW Lab (Ex1 To 3)
Found only on the islands of New Zealand, the Weka is a flightless bird with an
inquisitive nature.
Weka is open source software issued under the GNU General Public License.
Knowledge Flow: This environment supports essentially the same functions as the
Explorer but with a drag and drop interface. One advantage is that it supports
incremental learning. Simple CLI: Provides a simple command line interface that
allows direct execution of WEKA commands for operating systems that do not
provide their own command line interface.
Navigate the options available in WEKA (e.g., the Preprocess panel, Classify
panel, Cluster panel, Associate panel, Select attributes panel, and Visualize
panel).
When the Explorer is first started only the first tab is active. This is because it is
necessary to open a data set before starting to explore the data.
The tabs are as follows:
Loading Data: The first four buttons at the top of the preprocess section enable you
to load data into WEKA:
Open file.... Brings up a dialog box allowing you to browse for the data file
on the local file system.
Open URL.... Asks for a Uniform Resource Locator address for where the data
is stored.
Open DB.... Reads data from a database.
Generate.... Enables you to generate artificial data from a variety of Data
Generators.
Using the Open file ... button you can read files in a variety of formats:
WEKA's ARFF format, CSV format, C4.5 format, or serialized Instances format.
ARFF files typically have a .arff extension, CSV files a .csv extension, C4.5 files a
.data and .names extension, and serialized Instances objects a .bsi extension.
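To make the ARFF layout concrete, here is a minimal Python sketch of a reader for the three core parts of an ARFF file (@relation, @attribute, @data). It is illustrative only, not Weka's own parser, and ignores ARFF features such as quoting and sparse data.

```python
# Minimal, illustrative ARFF reader: collects the relation name,
# the attribute names, and the raw data rows. Not Weka's parser.
def parse_arff(text):
    relation, attributes, data = None, [], []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            attributes.append(line.split(None, 2)[1])
        elif lower.startswith("@data"):
            in_data = True
        elif in_data:
            data.append([v.strip() for v in line.split(",")])
    return relation, attributes, data

sample = """@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@data
sunny, 85
overcast, 83"""
```

Running `parse_arff(sample)` returns the relation name, the two attribute names, and two data rows.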
2. Classification:
Selecting a Classifier: At the top of the classify section is the Classifier box. This
box has a text field that gives the name of the currently selected classifier and its
options. Clicking on the text box with the left mouse button brings up a Generic
Object Editor dialog box, the same as for filters, which you can use to configure
the options of the current classifier. With a right click (or Alt+Shift+left click) you
can once again copy the setup string to the clipboard or display the properties in a
Generic Object Editor dialog box. The Choose button allows you to choose one of
the classifiers that are available in WEKA.
Test Options: The result of applying the chosen classifier will be tested according
to the options that are set by clicking in the Test options box.
3. Clustering:
Cluster Modes: The Cluster mode box is used to choose what to cluster and how to
evaluate the results. The first three options are the same as for classification: Use
training set, Supplied test set and Percentage split.
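The Percentage split option can be pictured as a shuffled hold-out: train on one fraction of the instances and evaluate on the rest. A minimal Python sketch (the 66% figure mirrors Weka's default split; the seed is an assumption):

```python
import random

# Illustrative "Percentage split" evaluation mode: shuffle the
# instances, train on train_pct percent, hold out the remainder.
def percentage_split(instances, train_pct=66, seed=1):
    rng = random.Random(seed)
    shuffled = instances[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_pct / 100)
    return shuffled[:cut], shuffled[cut:]

train, test = percentage_split(list(range(100)), train_pct=66)
```

With 100 instances and a 66% split, 66 go to training and 34 are held out for evaluation.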
4. Associating:
Setting Up: This panel contains schemes for learning association rules, and the
learners are chosen and configured in the same way as the clusterers, filters, and
classifiers in the other panels.
5. Selecting Attributes:
Searching and Evaluating: Attribute selection involves searching through all
possible combinations of attributes in the data to find which subset of attributes
works best for prediction. To do this, two objects must be set up: an attribute
evaluator and a search method. The evaluator determines what method is used to
assign a worth to each subset of attributes. The search method determines what style
of search is performed.
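The evaluator/search-method split described above can be sketched in a few lines: a greedy forward search (one search style Weka offers, comparable to BestFirst in the forward direction) repeatedly asks an evaluator to score candidate subsets. The toy evaluator below is an assumption purely for illustration.

```python
# Greedy forward attribute selection: at each step, add the attribute
# whose inclusion most improves the evaluator's score for the subset.
def forward_search(attributes, evaluate):
    selected, best_score = [], float("-inf")
    improved = True
    while improved:
        improved = False
        best_attr = None
        for a in attributes:
            if a in selected:
                continue
            score = evaluate(selected + [a])
            if score > best_score:
                best_score, best_attr = score, a
                improved = True
        if best_attr is not None:
            selected.append(best_attr)
    return selected, best_score

# Toy evaluator (an assumption): reward overlap with a known-useful
# subset, with a small penalty per attribute to discourage large sets.
target = {"outlook", "humidity"}
evaluate = lambda subset: len(set(subset) & target) - 0.1 * len(subset)
selected, score = forward_search(
    ["outlook", "temperature", "humidity", "windy"], evaluate)
```

Here the search settles on exactly the two rewarded attributes, since adding any third one lowers the score.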
6. Visualizing:
Description:
Data preprocessing is a critical step in data integration and exploration. You may need to clean,
transform, and preprocess your data to make it suitable for machine learning. Here are some
common preprocessing tasks:
Feature Selection: WEKA provides various feature selection methods to choose the most
relevant features for your machine learning model.
Data Transformation: You can use filters like "Normalize" or "Standardize" to scale your
features. This ensures that all features are on the same scale, which can be important for many
machine learning algorithms.
Data Discretization: If you have continuous variables, you may want to discretize them into
bins using filters like "Discretize."
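Equal-width binning, the usual default of Weka's unsupervised Discretize filter, can be sketched as follows (the bin count here is an assumption):

```python
# Equal-width discretization: split the value range [min, max] into
# `bins` intervals of equal width and map each value to its bin index.
def discretize_equal_width(values, bins=3):
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero on constant data
    out = []
    for v in values:
        idx = int((v - lo) / width)
        out.append(min(idx, bins - 1))  # clamp the max value into the last bin
    return out
```

For example, `discretize_equal_width([1, 2, 3, 10], bins=3)` places the three small values in bin 0 and the outlier 10 in the last bin.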
Step 4: Integration
Integration often involves combining data from multiple sources. In WEKA, you can load
multiple datasets and merge or append them using the "Merge Two Files" filter, or use external
tools to combine the data before loading it into WEKA.
Example Scenario:
Let's consider two datasets, StudentDetails.csv and WeatherDetails.csv, and integrate them.
Preprocess each dataset separately to handle missing values, feature selection, and transformation.
Then use the "Merge Two Files" and "Append Two Files" filters in WEKA. Once the datasets are
integrated, proceed with clustering or classification tasks to segment the data.
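The integration step in this scenario amounts to joining two tables on a shared key. A minimal Python sketch of that join (the column names are hypothetical; the lab itself uses Weka's filters instead):

```python
import csv
import io

# Join two CSV datasets on a shared key column: for each row of the
# first dataset, attach the matching row of the second (an inner join).
def merge_on_key(csv_a, csv_b, key):
    rows_a = list(csv.DictReader(io.StringIO(csv_a)))
    rows_b = {r[key]: r for r in csv.DictReader(io.StringIO(csv_b))}
    merged = []
    for r in rows_a:
        other = rows_b.get(r[key])
        if other is not None:
            combined = dict(r)
            combined.update(other)
            merged.append(combined)
    return merged

# Hypothetical sample data standing in for the two lab files.
csv_a = "id,name\n1,Asha\n2,Ravi\n"
csv_b = "id,outlook\n1,sunny\n2,rainy\n"
merged = merge_on_key(csv_a, csv_b, "id")
```

Each merged row carries the columns of both source files, keyed on `id`.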
PROCEDURE 1:
1. Open the Weka tool
5. Select the outlook attribute and observe the attributes and the charts
PROCEDURE 2:
STEP 1: Create a new dataset Sample1.arff
@relation sample1
@attribute col1 numeric
@attribute col2 numeric
@attribute result {Yes, No}
@data
30, 40, Yes
40, 50, No
RESULT:
Thus, data exploration and integration using Weka has been completed successfully.
Ex. No: 2 Apply Weka tool for data validation
Aim:
Description
Procedure 1:
Procedure 2:
8. Click start button and compare the training and test results.
Result:
Thus Data validation using Weka tool has been completed successfully.
Plan the architecture for a real-time application
AIM:
To plan the Web Services based Real time Data Warehouse Architecture
Procedure:
Data Sources: These are the systems or applications where the raw data originates.
They could include operational databases, external APIs, logs, etc.
Web Service Clients (WS Client): These components are responsible for extracting
data changes from the data sources using techniques such as Change Data Capture
(CDC) and sending them to the web service provider. They make use of web service
calls to transmit data.
Web Service Provider: The web service provider receives data from the clients and
processes them for further integration into the real-time data warehouse. It
decomposes the received data, performs necessary transformations, generates SQL
statements, and interacts with the data warehouse for insertion.
This is a web service that receives data from the WS Client and adds it to the Real-
Time Partition. It decomposes the received Data Transfer Object into data and
metadata. It then uses metadata to generate SQL via an SQL-Generator to insert the
data into RTDW log tables and executes the generated SQL on the RTDW database.
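The SQL-Generator step above can be sketched as metadata-driven statement building: the table and column names below are hypothetical, and a real RTDW would execute the generated statement through its database driver.

```python
# Metadata-driven SQL generation: the metadata describes the target
# log table and its columns; the generator produces a parameterised
# INSERT statement plus the parameter tuple for one received row.
def generate_insert(metadata, row):
    cols = metadata["columns"]
    placeholders = ", ".join("?" for _ in cols)
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        metadata["table"], ", ".join(cols), placeholders)
    params = tuple(row[c] for c in cols)
    return sql, params

# Hypothetical metadata and row, as might be decomposed from a
# Data Transfer Object received by the Web Service Provider.
meta = {"table": "sales_log", "columns": ["id", "amount"]}
sql, params = generate_insert(meta, {"id": 1, "amount": 9.5})
```

Using placeholders rather than string-interpolated values keeps the generated SQL safe to execute against the RTDW database.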
Metadata: Metadata describes the structure and characteristics of the data. In this
context, it is used by the Web Service Provider to generate SQL for inserting data
into RTDW log tables. In a web services-based architecture, metadata plays a crucial
role in understanding data formats, schemas, and transformations. It is often
managed centrally to ensure consistency across the system.
ETL (Extract, Transform, Load): ETL processes are employed to collect data
from various sources, transform it into a consistent format, and load it into the data
warehouse. In a real-time context, this process may involve continuous or near real-
time transformations to ensure that data is available for analysis without significant
delays.
Query Interface: Users interact with the system through a query interface, which
could be a web-based dashboard, API endpoints, or other client applications. The
query interface allows users to retrieve and analyze data stored in the data
warehouse, including both historical and real-time data.
Result:
Thus the web services based real-time data warehouse application has been studied successfully.