
Assign 3 Datamining


Datamining and Datasets

Assignment 3A Date 19/7/24

Using the data and code given in Example_1 as a template, find the multivariate multioutput regression
coefficients (weights) for the 'insects' and 'weather' data. Convert all categorical variables to one-hot
vectors and then run the required regression. This is a coding assignment. Report the training
accuracy.
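
The one-hot conversion asked for above can be sketched as follows. This is a minimal illustration, not the prescribed method: the variable `outlook` and its labels are made up and do not come from the assignment data.

```matlab
% Hypothetical categorical column; these labels are made up for illustration
outlook = {'sunny'; 'rainy'; 'overcast'; 'sunny'};

% unique returns the sorted category names and, for each row,
% the index of the category that row belongs to
[cats, ~, idx] = unique(outlook);

% Row k of the identity matrix is the one-hot code of category k
I = eye(numel(cats));
onehot = I(idx, :)
```

Each categorical column can be encoded this way and the resulting blocks placed side by side with the numeric columns to form the data matrix.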

You can expect this type of coding in forthcoming online tests.

If you need any clarification, contact your Maths/ML faculty.

Submit on 22.7.24 in MATLAB Live Script format.

Example_1.

Datamining: seek regularity/patterns in data

1. For all patients whose 'BP' is 'High', Drug 'A' is assigned.

2. For all patients whose 'BP' is 'Low', Drug 'B' is assigned.

3. For patients whose 'BP' is 'Normal', both Drug 'A' and Drug 'B' are assigned, but

a) if 'BP' is 'Normal' and 'Age' is less than 40, Drug 'A' is assigned

b) if 'BP' is 'Normal' and 'Age' is greater than 40, Drug 'B' is assigned
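
The three rules above amount to a small decision tree. A minimal sketch, assuming a made-up patient record (`bp` and `age` are hypothetical values) and assuming that an age of exactly 40, which the rules do not cover, falls in the Drug 'B' branch:

```matlab
% Hypothetical patient record, made up for illustration
bp  = 'Normal';
age = 35;

if strcmp(bp, 'High')
    drug = 'A';     % rule 1: high BP
elseif strcmp(bp, 'Low')
    drug = 'B';     % rule 2: low BP
elseif age < 40
    drug = 'A';     % rule 3a: normal BP, age below 40
else
    drug = 'B';     % rule 3b: normal BP, age 40 or above (boundary is an assumption)
end
disp(drug)
```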

Replacing decision trees with multivariate multioutput regression

% Medical data with one-hot representation of categorical variables
% Columns of A: [Male Female Age BP_Normal BP_High BP_Low]
% 'Gender' Male = [1 0]; Female = [0 1]
% 'BP' Normal = [1 0 0], High = [0 1 0], Low = [0 0 1]
% Target variable Drug A = [1 0]; Drug B = [0 1]
% 'Age' is entered in years and scaled to lie between 0 and 1 further below
A = [1 0 20 1 0 0;
     0 1 73 1 0 0;
     1 0 37 1 1 0;   % note: BP columns [1 1 0] are not a valid one-hot code
     1 0 33 0 0 1;
     0 1 48 1 0 0;
     1 0 29 1 0 0;
     0 1 52 1 0 0;
     1 0 42 0 0 1;
     1 0 61 1 0 0;
     0 1 30 1 0 0;
     0 1 26 0 0 1;
     1 0 54 1 0 0]

A = 12×6
1 0 20 1 0 0
0 1 73 1 0 0
1 0 37 1 1 0
1 0 33 0 0 1
0 1 48 1 0 0
1 0 29 1 0 0
0 1 52 1 0 0
1 0 42 0 0 1
1 0 61 1 0 0
0 1 30 1 0 0
0 1 26 0 0 1
1 0 54 1 0 0

y = [1 0; 0 1; 1 0; 0 1; 1 0; 1 0; 0 1; 0 1; 0 1; 1 0; 0 1; 1 0];
% yd = target class given as the position of the 1 in each row of y (1 = Drug A, 2 = Drug B).
% This makes comparison with the predicted output easy.
yd = [1 2 1 2 1 1 2 2 2 1 2 1];
x = A(:,3);
xmin = 15;
xmax = 80;
x = (x - xmin)./xmax; % scale age to lie between 0 and 1 (note: divides by xmax, not by xmax - xmin)
A(:,3) = x;
% learned weights = regression coefficients; one column per output (Drug A, Drug B)
w = pinv(A)*y

w = 6×2
0.4451 0.0549
0.3445 0.1555
-1.5679 1.5679
0.8353 -0.3353
0.1508 -0.1508
-0.0457 0.5457
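
Why `pinv(A)*y` gives regression weights can be checked numerically: a least-squares solution satisfies the normal equations A'*A*w = A'*y. The sketch below repeats the Example_1 data so that it runs on its own; the names `normal_eq_residual` and `r` are introduced here for illustration. In this data the two gender columns sum to one in every row, and so do BP columns 4 and 6, so A does not have full column rank and `pinv` returns the minimum-norm least-squares solution.

```matlab
% Data copied from Example_1 (age in years before scaling)
A = [1 0 20 1 0 0; 0 1 73 1 0 0; 1 0 37 1 1 0; 1 0 33 0 0 1;
     0 1 48 1 0 0; 1 0 29 1 0 0; 0 1 52 1 0 0; 1 0 42 0 0 1;
     1 0 61 1 0 0; 0 1 30 1 0 0; 0 1 26 0 0 1; 1 0 54 1 0 0];
y = [1 0; 0 1; 1 0; 0 1; 1 0; 1 0; 0 1; 0 1; 0 1; 1 0; 0 1; 1 0];
A(:,3) = (A(:,3) - 15)./80;   % same age scaling as in Example_1

w = pinv(A)*y;

% Residual of the normal equations; close to zero for a least-squares solution
normal_eq_residual = norm(A'*A*w - A'*y)

% A rank below the number of columns means the least-squares solution is not unique
r = rank(A)
```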

D = A*pinv(A)*y; % predicted outputs, same as A*w
[val, index] = max(D'); % max operates column-wise, hence the transpose
% check whether the model is working or not:
% 'index' gives the predicted class label
index

index = 1×12
1 2 1 2 1 1 2 2 2 1 2 1

yd

yd = 1×12
1 2 1 2 1 1 2 2 2 1 2 1
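
Since the assignment asks for the training accuracy, one way to report it is the percentage of rows whose predicted label matches the target label. A minimal sketch using the `index` and `yd` values displayed above (the name `accuracy` is introduced here for illustration):

```matlab
% Predicted and target labels as displayed in Example_1
index = [1 2 1 2 1 1 2 2 2 1 2 1];
yd    = [1 2 1 2 1 1 2 2 2 1 2 1];

% Percentage of rows whose predicted label equals the target label
accuracy = 100*mean(index == yd)
```

For Example_1 every prediction matches its target, so the training accuracy is 100%.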

Data on Insects

Weather data

Dataset for Practice
https://waikato.github.io/weka-wiki/datasets/

• A gzip'ed tar containing ordinal, real-world datasets donated by Professor Arie Ben David
(datasets-arie_ben_david.tar.gz, 11,348 Bytes)

• A zip file containing 19 multi-class (1-of-n) text datasets donated by Dr George Forman
(19MclassTextWc.zip, 14,084,828 Bytes)

• A bzip'ed tar file containing the Reuters21578 dataset split into separate files according to the ModApte split
(reuters21578-ModApte.tar.bz2, 81,745,032 Bytes)

• A zip file containing 41 drug design datasets formed using the Adriana.Code software donated by Dr
Mehmet Fatih Amasyali (Drug-datasets.zip, 11,376,153 Bytes)

• A zip file containing 80 artificial datasets generated from the Friedman function donated by Dr. M.
Fatih Amasyali (Yildiz Technical University) (Friedman-datasets.zip, 5,802,204 Bytes)

• A zip file containing a new, image-based version of the classic iris data, with 50 images for each of the
three species of iris. The images have size 600x600. Please see the ARFF file for further information
(iris_reloaded.zip, 92,267,000 Bytes)

After expanding into a directory using your jar utility (or an archive program that handles tar archives/zip
files in the case of the gzip'ed tars/zip files), these datasets may be used with Weka.

• Protein datasets made available by Associate Professor Shuiwang Ji when he was a PhD student at
Louisiana State University.

• Kent Ridge Biomedical Data Set Repository, which was put together by Professor Jinyan Li and Dr Huiqing
Liu while they were at the Institute for Infocomm Research, Singapore.

• Repository for Epitope Datasets (RED), maintained by Professor Yasser El-Manzalawy when he was at
Iowa State University.

Video Classes on Datamining using WEKA


Weka is a collection of machine learning algorithms for solving real-world data mining problems. It is written
in Java and runs on almost any platform. The algorithms can either be applied directly to a dataset or called
from your own Java code.

https://www.youtube.com/watch?time_continue=542&v=TF1yh5PKaqI&feature=emb_logo

Matlab and WEKA

https://in.mathworks.com/matlabcentral/fileexchange/33128-parallel-distributed-processing-of-weka-algorithms-in-matlab

https://in.mathworks.com/matlabcentral/fileexchange/58675-wekalab-bridging-weka-and-matlab?s_tid=FX_rc1_behav

https://forum.image.sc/t/running-trainable-weka-segmentation-from-matlab-using-imagej-matlab/3766/2

https://github.com/NicholasMcCarthy/wekalab

https://e-archivo.uc3m.es/rest/api/core/bitstreams/7e742952-cad2-4681-b0cc-cb86b14c9ae1/content

MLC Toolbox: A MATLAB/OCTAVE Library for Multi-Label Classification https://arxiv.org/pdf/1704.02592

https://medium.com/@j622amilah/exploring-the-java-weka-machine-learning-library-48e842b88307

https://blogs.mathworks.com/pick/2017/11/20/getelevations/

Class for freshers

C:\Users\soman\Desktop\General AI

Next

LAEigenSymbolic--Book References for DS

PinvforML -IIIT kottayam

LApart1
