Data Mining in Bioinformatics

The document discusses using data mining techniques like association rule mining to analyze biomedical data. It describes loading biomarker data into the WEKA tool and performing preprocessing steps like discretization and attribute filtering. The Apriori algorithm is then used to generate association rules between biomarkers and extract patterns from the data.

Uploaded by

keerthanpai


DATA MINING IN BIOINFORMATICS


INDEX

I. Abstract
II. Introduction
III. Overview
IV. Implementation Using WEKA
V. Conclusion
VI. Acknowledgement

M.S.R.I.T Deptt. Of Information Science & Engineering


ABSTRACT

The field of bioinformatics generates a huge amount of data. Studying such volumes of data requires data mining techniques. The major work in this field falls into the following categories: (1) understanding gene sequencing, i.e. comparing differing genes of the same species; (2) investigating data analysis approaches with the purpose of identifying promising methods pertinent to human health; and (3) studying the different diseases associated with humans and their characteristics.


INTRODUCTION

Bioinformatics is the application of computer science and information technology to the fields of biology and medicine. It draws on algorithms, databases and information systems, web technologies, artificial intelligence and soft computing, information and computation theory, software engineering, data mining, image processing, modeling and simulation, signal processing, discrete mathematics, control and system theory, circuit theory, and statistics to generate new knowledge of biology and medicine, and to improve and discover new models of computation (e.g. DNA computing, neural computing, evolutionary computing, immuno-computing, swarm computing, cellular computing).

A biological database is a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system. A simple database might be a single file containing many records, each of which includes the same set of information. For example, a record in a nucleotide sequence database typically contains information such as a contact name, the input sequence with a description of the type of molecule, the scientific name of the source organism from which it was isolated, and often literature citations associated with the sequence.

For researchers to benefit from the data stored in a database, two additional requirements must be met: (1) easy access to the information; and (2) a method for extracting only the information needed to answer a specific biological question.
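As a rough illustration of the record structure and the two requirements described above, the following Python sketch models a nucleotide sequence record and a simple query over a list of records. The class name, field names, helper function and sample values are all hypothetical and chosen only for this example; they do not correspond to any real database schema.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceRecord:
    """One record of a simple (hypothetical) nucleotide sequence database."""
    contact_name: str
    sequence: str
    molecule_type: str
    organism: str
    citations: list = field(default_factory=list)


def find_by_organism(records, organism):
    """Extract only the records needed to answer one specific question."""
    wanted = organism.lower()
    return [r for r in records if r.organism.lower() == wanted]


# Made-up sample data, for illustration only.
records = [
    SequenceRecord("Contact A", "ATGGCCTTA", "DNA", "Homo sapiens",
                   ["placeholder citation"]),
    SequenceRecord("Contact B", "AUGGCCUUA", "mRNA", "Mus musculus"),
    SequenceRecord("Contact C", "ATGAAATGA", "DNA", "Homo sapiens"),
]

human_records = find_by_organism(records, "homo sapiens")
print(len(human_records), "matching records")
```

Every record carries the same set of fields, and the query returns only the subset relevant to the question being asked.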


OVERVIEW

Data mining is the process of extracting patterns from data. Modern businesses see it as an increasingly important tool for transforming data into business intelligence, giving an informational advantage. It is currently used in a wide range of profiling practices, such as marketing, surveillance, fraud detection, and scientific discovery. The related terms data dredging, data fishing and data snooping refer to the use of data mining techniques to sample portions of the larger population data set that are (or may be) too small for reliable statistical inferences to be made about the validity of any patterns discovered (see also data-snooping bias). These techniques can, however, be used to create new hypotheses to test against the larger data populations.

DATA MINING ARCHITECTURE

The data mining process consists of several interrelated, interactive stages. The main stages are: (1) domain understanding; (2) data selection; (3) cleaning and preprocessing; (4) discovering patterns; (5) interpretation; and (6) reporting and using the discovered knowledge.

Implementation using WEKA Tool

Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. WEKA is free software available under the GNU General Public License. Weka supports several standard data mining tasks: data preprocessing, clustering, classification, regression, visualization, and feature selection. All of Weka's techniques are predicated on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes.

Problem Statement:
We have considered a single multimedia schema describing a video library. The application should allow multiple movie makers working simultaneously to store, remove and manipulate different kinds of multimedia data. We assume that some material is gathered from a database. This application helps the manager of a video library to group the customers according to the purchase language, rating, cast, etc.

Working

1. Loading the Data

After Weka 3.6 has been installed, we launch the Explorer application of Weka. We then load the dataset we have created, which is saved with a .csv extension. Click on choose file, change the file type to .csv, browse to the desired location and select the file. It is as shown below in the figure.
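Outside the GUI, loading a flat CSV file amounts to reading a header row of attribute names followed by one row per instance. The sketch below shows this with Python's standard csv module. The column names are taken from the attribute list in this report, but the values are made up, and the helper name load_csv is an assumption for illustration only; this is not how WEKA itself reads the file.

```python
import csv
import io

# Hypothetical CSV content: real attribute names, made-up values.
SAMPLE_CSV = """lymphatics,by_pass,no_of_nodes_in,class
2,1,3,2
3,2,1,3
2,1,2,2
"""


def load_csv(handle):
    """Return (attribute_names, rows); each row is a dict of strings."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    return reader.fieldnames, rows


# A file on disk would be opened with open(path, newline="") instead.
attributes, rows = load_csv(io.StringIO(SAMPLE_CSV))
print(attributes)
print(len(rows), "instances")
```

Each instance is described by the same fixed set of attributes, which matches the single flat relation that Weka expects.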


Our dataset contains the following attributes:

1. lymphatics
2. block_of_affere
3. bl_of_lymph_c
4. bl_of_lymph_s
5. by_pass
6. extravasates
7. regeneration_of
8. early_uptake_in
9. lym_nodes_dimin
10. lym_nodes_enlar
11. changes_in_lym
12. defect_in_node
13. changes_in_node
14. changes_in_stru
15. special_forms
16. dislocation_of
17. exclusion_of_no
18. no_of_nodes_in
19. class


2. Basic Statistics

Once the data set has been loaded, Weka recognizes the attributes and, while scanning the data, computes some basic statistics on each attribute. The left panel in the figure below shows the list of recognized attributes, while the top panels indicate the names of the base relation (or table) and the current working relation.

Clicking on any attribute in the left panel shows the basic statistics for that attribute. For categorical attributes, the frequency of each attribute value is shown, while for continuous attributes we can obtain the min, max, mean, standard deviation, etc. The figure below illustrates this. It shows the type of the attribute, be it numeric, nominal, etc. For the nominal attribute Lead shown below, it tells us the number of distinct values and lists each value along with its number of occurrences.
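The per-attribute summary described above can be approximated in a few lines of Python. This is a minimal sketch, not WEKA's implementation: the function name summarize, the sample values, and the choice of the sample (n - 1) standard deviation are assumptions made for this example.

```python
import statistics
from collections import Counter


def summarize(values, nominal):
    """Summarize one attribute column, roughly as described above."""
    if nominal:
        # Categorical attribute: frequency of each distinct value.
        counts = Counter(values)
        return {"type": "nominal", "distinct": len(counts),
                "counts": dict(counts)}
    # Continuous attribute: min, max, mean and standard deviation.
    numbers = [float(v) for v in values]
    return {
        "type": "numeric",
        "min": min(numbers),
        "max": max(numbers),
        "mean": statistics.mean(numbers),
        # Sample standard deviation (assumption); needs two or more values.
        "stdev": statistics.stdev(numbers) if len(numbers) > 1 else 0.0,
    }


# Made-up column values, for illustration only.
by_pass = ["no", "yes", "no", "no"]
no_of_nodes = [1, 3, 2, 2]

nominal_summary = summarize(by_pass, nominal=True)
numeric_summary = summarize(no_of_nodes, nominal=False)
print(nominal_summary)
print(numeric_summary)
```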


The visualization graphs shown below are for all attributes.


3. Selecting and Filtering Data

In our sample data file, each record is uniquely identified by a Flight no. We need to remove this attribute before the data mining step. We can do this by using the attribute filters in WEKA. In the "Filter" panel, click on the "Choose" button. This shows a popup window with a list of available filters. Expand filters, then unsupervised, then attribute, and select the NumericToNominal filter. It is as shown below in the figure.


After this, click on the text box immediately to the right of the "Choose" button. The filter will convert the data to nominal, as required to run the Apriori algorithm. Then click "OK". It is as illustrated below in the figure.

3. Discretization

Some techniques, such as association rule mining, can only be performed on categorical data. This requires discretizing numeric or continuous attributes. There are many such attributes in this data set; for example, no_of_nodes is an integer, so we discretize it. In this case, we have opted to keep all of these values in the data. This means we can discretize simply by removing the keyword integer as the type of the "no_of_nodes" attribute in the ARFF file and replacing it with the set of discrete values. We do this directly by opening the lymph.arff file in Gedit, as shown below in the figure.
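The manual edit described above replaces an integer type declaration with an explicit set of values. The following sketch automates the same idea on a tiny, made-up ARFF fragment. The function name integer_to_nominal and the sample content are hypothetical; the attribute name no_of_nodes_in follows the attribute list in this report. The sketch assumes simple comma-separated data rows without quoting or sparse format.

```python
import re

# Made-up ARFF fragment, for illustration only.
SAMPLE_ARFF = """@relation lymph_sample
@attribute by_pass {no,yes}
@attribute no_of_nodes_in integer
@data
no,1
yes,3
no,2
no,1
"""


def integer_to_nominal(arff_text, attribute):
    """Redeclare an integer attribute as nominal, listing its observed values."""
    lines = arff_text.splitlines()
    names, data_start = [], len(lines)
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.lower().startswith("@attribute"):
            names.append(stripped.split()[1])
        elif stripped.lower() == "@data":
            data_start = i + 1
            break
    column = names.index(attribute)
    # Collect the distinct values; skip blanks, comments and missing ("?").
    values = set()
    for row in lines[data_start:]:
        if row.strip() and not row.lstrip().startswith("%"):
            value = row.split(",")[column].strip()
            if value != "?":
                values.add(value)
    declaration = "@attribute {} {{{}}}".format(
        attribute, ",".join(sorted(values, key=int)))
    pattern = re.compile(
        r"^@attribute\s+{}\s+(integer|numeric)$".format(re.escape(attribute)),
        re.IGNORECASE)
    return "\n".join(
        declaration if pattern.match(l.strip()) else l for l in lines)


converted = integer_to_nominal(SAMPLE_ARFF, "no_of_nodes_in")
print(converted)
```

Only the declaration line changes; the data rows are left untouched, which is why all original values are kept.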


Then select Discretize from the filters menu


Now select the attributes to be discretized, which here are 1, 2, 4, 5, 6 and 9, and set bins = 4. The discretized values are then shown.
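To give an idea of what discretizing into four bins means, here is a minimal sketch of equal-width binning in Python. It assumes equal-width intervals and a particular handling of interval boundaries; both are assumptions for illustration and may differ from what WEKA's Discretize filter does with the settings used here. The function name and sample values are hypothetical.

```python
def equal_width_bins(values, bins=4):
    """Map each numeric value to a bin index in the range 0 .. bins-1."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    if width == 0:
        # All values are identical, so everything falls into one bin.
        return [0] * len(values)
    labels = []
    for v in values:
        index = int((v - low) / width)
        # The maximum value would land in bin `bins`; clamp it to the last bin.
        labels.append(min(index, bins - 1))
    return labels


# Made-up numeric column, for illustration only.
column = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(equal_width_bins(column, bins=4))
```

With this column the range 1 to 9 is split into four intervals of width 2, and each value is replaced by the index of the interval it falls into.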

4. Association Mining

Now that all the attributes have been discretized, we can perform association mining on the dataset. The most commonly used algorithm is Apriori, which we also use here. Go to the Associate tab in WEKA, click on Choose and select Apriori as the associator. Then click on the text box next to the Choose button. A dialog box appears. Here we change the default number of rules to 20, which means that the program will report no more than the top 20 rules. The upper bound for minimum support is set to 1.0 (100%) and the lower bound to 0.1 (10%). Apriori in WEKA starts with the upper bound for support and decreases it incrementally (by delta, which defaults to 0.05, or 5%). The algorithm halts when either the specified number of rules has been generated or the lower bound for minimum support is reached. The significance testing option is only applicable in the case of confidence and is not used by default (-1.0). The figure below shows the final dialog box for Apriori. Set car to true and classIndex to 1, and specify the number of rules in numRules.
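The support-lowering loop described above can be sketched in plain Python. This is a simplified illustration, not WEKA's implementation: it mines general association rules rather than class association rules (so it does not reproduce the car setting), the function names are hypothetical, the minimum confidence of 0.9 is an assumed parameter, and the transactions are made up.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search; returns {itemset: support}."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    while current:
        level = {}
        for c in current:
            s = support(c, transactions)
            if s >= min_support - 1e-9:
                level[c] = s
        frequent.update(level)
        keys = list(level)
        size = len(keys[0]) + 1 if keys else 0
        # Join step: merge pairs of itemsets that differ in one item.
        candidates = {a | b for a, b in combinations(keys, 2)
                      if len(a | b) == size}
        # Prune step: keep candidates whose subsets are all frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level
                          for s in combinations(c, size - 1))]
    return frequent


def rules(frequent, min_confidence):
    """Rules (antecedent, consequent, support, confidence) above a threshold."""
    found = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                confidence = s / frequent[a]
                if confidence >= min_confidence:
                    found.append((a, itemset - a, s, confidence))
    return found


def mine(transactions, num_rules=20, upper=1.0, lower=0.1, delta=0.05,
         min_confidence=0.9):
    """Start at the upper support bound and lower it by delta each round."""
    step = 0
    while True:
        min_support = upper - step * delta
        found = rules(frequent_itemsets(transactions, min_support),
                      min_confidence)
        next_support = upper - (step + 1) * delta
        # Halt once enough rules exist or the lower bound would be passed.
        if len(found) >= num_rules or next_support < lower - 1e-9:
            break
        step += 1
    found.sort(key=lambda rule: rule[3], reverse=True)
    return found[:num_rules], min_support


# Made-up transactions, for illustration only.
transactions = [
    frozenset({"by_pass=no", "extravasates=no", "class=metastases"}),
    frozenset({"by_pass=no", "extravasates=yes", "class=malign_lymph"}),
    frozenset({"by_pass=no", "extravasates=no", "class=metastases"}),
    frozenset({"by_pass=yes", "extravasates=yes", "class=malign_lymph"}),
    frozenset({"by_pass=no", "extravasates=no", "class=metastases"}),
]

top_rules, reached_support = mine(transactions, num_rules=2)
for antecedent, consequent, s, confidence in top_rules:
    print(sorted(antecedent), "=>", sorted(consequent), s, confidence)
```

In this toy data no rule qualifies until the minimum support has dropped far enough for itemsets of two or more items to become frequent, which is the reason for starting high and stepping down.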


Now we click OK and then click on Start. The results are displayed as shown below.


Visualization

The relationship between attributes can be shown as graphs by plotting, on the X and Y axes, the attributes whose relation is to be visualized. This is done using the Visualize button in the top panel of the Weka Explorer. Pressing the Visualize button gives the following screen.

Suppose we need to visualize the relation between Status and Departure time. Select the corresponding plot by clicking on the red square as shown above. The following screen appears, showing the relation between Status and Departure time.


Conclusion

Data mining can be of great help in aviation. Various predictions can be made on the basis of the data collected, which in turn can make the control and monitoring of air traffic and other related data easy and convenient.


ACKNOWLEDGEMENT

We would like to sincerely thank our teacher Prof. Sumanna, our HOD and all the faculty members. We would also like to thank our classmates and friends, without whom this would not have been possible.


