BioInformatics Syllabus
The course in Bioinformatics (BI) is being introduced at various levels for the benefit of students aspiring
to become Bioinformatics Professionals. The Society has formulated the Scheme with the active
involvement of leading experts from reputed institutions and organisations. The following levels of courses are
available under the Scheme:
Bioinformatics “O” level (BI-O): Assistant Programmers, Technical Assistants, Information Assistants
Bioinformatics “B” level (BI-B): Senior Programmers, Systems Analysts, Scientists, Research Associates
Bioinformatics “C” level (BI-C): Research Scientist, Strategy Manager, Database Architect, Consultant, Design Specialist
Bioinformatics “A” level (BI-A) Syllabus:
Semester-I
Semester-II
Exemptions: Students who have qualified BI-O-R0 would get exemption in the following subjects:-
BI-A1-R0
BI-A2-R0
BI-A3-R0
BI-A4-R0, If the candidate has qualified equivalent paper at BI-O level
(Programming and Problem solving through C)
PR-1
Bioinformatics “B” level (BI-B) Syllabus:
Semester II
6. B2.1 Introduction to Database and Web enabling technologies
7. B2.2 PERL/PYTHON Programming and applications to Bioinformatics
8. B2.3 Introduction to Object Oriented programming through JAVA
9. B2.4 Elements of protein Sequence, Structure and modelling
10. B2.5 Basics of Genomics and Proteomics
Semester III
11. B3.1 Computer Organization and Distributed computing
12. B3.2 Probability and Information theory
13. B3.3 Computational methods in Biomolecular sequence analysis
14. B3.4 Discrete Mathematics
Semester IV
15. B4.1 Statistical methods in Bioinformatics
16. B4.2 Biomolecular Structure and Dynamics
17. B4.3 Data Structure and Algorithms
18. B4.4 Computational Genomics
Semester V
19. B5.1 Optimisation, Machine Learning and Computational Intelligence
20. B5.2 Object Technology for Bioinformatics
21. B5.3 Computational Proteomics and Gene Expression studies
Optional Course
22. B5.4.1 Computer aided Molecular Modeling and Drug Discovery
(OR)
B5.4.2 Chemoinformatics
Semester VI Project
Practicals:
Each course module has a practical component and the same will be carried out with software
recommended by DOEACC from time to time.
SEMESTER – I
BI-M3/A3/B1.1-R0: IT TOOLS AND APPLICATIONS
This course has been designed to provide an introduction to Information Technology and IT Tools. The
student will become IT literate, and will understand the basic IT terminology. The students will be able to
understand the role of Information Technology and more specifically computers, communication
technology and software in the present social and economic scenario.
Outline of Course
S. No. Topic Minimum No. of Hours
1. Computer Appreciation 04
2. Computer Organization 13
3. Operating System 13
4. Word Processing 10
5. Spreadsheet Package 10
6. Presentation Package 06
7. Information Technology and Society 04
Lectures = 60
Practical / Tutorials = 60
Total = 120
Detailed Syllabus
Characteristics of Computers, Input, Output, Storage units, CPU, Computer System, Binary
Number System Binary to Decimal Conversion, Decimal to Binary Conversion, Binary Coded
Decimal (BCD) Code, ASCII Code
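As an illustration of the number-system topics listed above, the conversions might be practised with a short program such as the following sketch. It is written in Python for brevity; the function names are hypothetical and are not prescribed by the syllabus, and only non-negative integers are handled.

```python
def decimal_to_binary(n: int) -> str:
    """Convert a non-negative integer to a binary string by repeated division by 2."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))  # the remainder is the next least-significant bit
        n //= 2
    return "".join(reversed(bits))


def binary_to_decimal(bits: str) -> int:
    """Convert a binary string to an integer by accumulating powers of 2."""
    value = 0
    for ch in bits:
        if ch not in "01":
            raise ValueError(f"not a binary digit: {ch!r}")
        value = value * 2 + int(ch)
    return value


def to_bcd(n: int) -> str:
    """Encode each decimal digit of a non-negative integer as its own 4-bit group."""
    return " ".join(format(int(d), "04b") for d in str(n))


print(decimal_to_binary(13))      # 1101
print(binary_to_decimal("1101"))  # 13
print(to_bcd(59))                 # 0101 1001
print(ord("A"))                   # 65, the ASCII code of 'A'
```

Note that BCD encodes each decimal digit separately, so the BCD form of 59 differs from its plain binary form.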
Main Memory: Storage Evaluation Criteria, Memory Organization, capacity, RAM, Read
Only Memories. Secondary Storage Devices: Magnetic Disks, Floppy and Hard Disks,
Optical Disks CD-ROM, Mass Storage Devices.
2.3 Input Devices 2 Hrs.
Keyboard, Mouse, trackball, joystick, Scanner, OMR, Bar-code reader, MICR, Digitizer,
Card Reader, Voice Recognition, web cam, video cameras.
Monitors, Printers – Dot matrix, inkjet, laser, plotters, Computer output Micro-Film (COM),
Multimedia Projector, speech synthesizer, dumb, smart and intelligent terminal.
Microsoft Windows
Linux
An overview of Linux, Basic Linux elements: System Features, Software Features, File
Structure, File handling in Linux, Installation of Linux: H/W, S/W requirements, Preliminary steps
before installation, specifics on Hard drive repartitioning and booting a Linux System.
Word Processing concepts: Saving, Closing, Opening an existing document, Selecting text,
Editing text, Finding and replacing text, printing documents, Creating and Printing Merged
documents, Character and Paragraph Formatting, Page Design and Layout.
Editing and Proofing Tools: Checking and correcting spellings. Handling Graphics. Creating
Tables and Charts. Document Templates and Wizards*.
Spreadsheet Concepts. Creating, Saving and Editing a workbook, Inserting, Deleting
worksheets, entering data in a cell/formula. Copying and Moving data from selected cells, Handling
operators in Formulae, Functions: Mathematical, Logical, Statistical, Text, Financial, Date and
Time Functions, Using Function wizard.
Creating, Opening and Saving Presentations, Creating the Look of Your Presentation, Working in
Different Views, Working with Slides, Adding and Formatting Text, Formatting Paragraphs,
Checking Spelling and Correcting Typing Mistakes, Making Notes Pages and Handouts, Drawing
and Working with Objects, Adding Clip Art and Other Pictures, Designing Slide Shows, Running
and Controlling a Slide Show, Printing Presentations*.
RECOMMENDED BOOKS
MAIN READING
1. P. K. Sinha and P. Sinha (2002), “Foundations of Computing”, First Edition, BPB Publication.
2. S. Sagman, “Microsoft Office 2000 for Windows”, Second Indian Print, 2000, Pearson
Education.
SUPPLEMENTARY READING
1. Turban, Mclean and Wetherbe, “Information Technology and Management”, Second Edition,
2001, John Wiley & Sons.
BI-A5/B1.2-R0: BASIC MATHEMATICS, PROBABILITY AND STATISTICS
Mathematical and statistical frameworks are being increasingly employed to understand and investigate
biological processes. These frameworks help in analyzing the vast datasets generated from
genome and related projects. It is thus essential to introduce basic concepts of mathematics, probability
and statistics early within the Bioinformatics curriculum. This course will enable students to understand
and appreciate computational problems in proper perspective. In addition, this course will provide a
foundation for pursuing higher level courses in Computational Biology.
Outlines of Course
Lectures = 60
Practicals/ Tutorials = 60
Total = 120
Detailed Syllabus
3. Limits 4 Hours
Sequences
Limits of sequences
Series
Limits of functions
4. Binomial Theorem 2 Hours
Expanding (x+y)^n
Binomial Coefficients, Binomial Theorem
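For reference, the expansion named in this unit can be written out as follows. This is the standard statement of the Binomial Theorem for a non-negative integer n, with the case n = 3 as a worked example.

```latex
(x+y)^n = \sum_{k=0}^{n} \binom{n}{k}\, x^{n-k} y^{k},
\qquad
\binom{n}{k} = \frac{n!}{k!\,(n-k)!}

% Worked example for n = 3:
(x+y)^3 = x^3 + 3x^2 y + 3x y^2 + y^3
```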
5. Calculus 12 Hours
a) Representation of data :
Discrete Data, Continuous data
Histogram, Polygons, Frequency Curves
b) The Mean,
Variability of data –
The standard deviation
c) Median, quantiles, percentile
d) Skewness
e) Box and Whisker diagrams (box plots)
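The descriptive measures listed above can be tried out with a few lines of code. The following is a minimal sketch using Python's standard `statistics` module; the data values are made up for illustration, and the quartiles use the module's default method, which is only one of several conventions found in textbooks.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, not from the syllabus

mean = statistics.mean(data)
std_dev = statistics.pstdev(data)             # population standard deviation
median = statistics.median(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles, as used in a box plot

print("mean:", mean)
print("standard deviation:", std_dev)
print("median:", median)
print("quartiles:", q1, q2, q3)
print("interquartile range:", q3 - q1)
```

The three quartiles and the minimum and maximum of the data are the five values drawn in a box and whisker diagram.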
10. Probability 8 hours
RECOMMENDED BOOKS
MAIN READING
1. H. Neill and D. Quadling, ‘Pure Mathematics (Advanced Level Mathematics)’, Vol. 1, 2, 3,
Cambridge University Press, 2002.
2. Edward Batschelet, ‘Introduction to Mathematics for Life Scientists’, 3rd Edition, Springer-Verlag,
1992
3. J. Crawshaw and J. Chambers, ‘Advanced Level Statistics’, 4th Edition, Nelson Thornes, 2002
SUPPLEMENTARY READING
1. Sheldon Ross, ‘A first Course in Probability’, Sixth Edition, Pearson Education Asia, 2002
2. S. Dobbs and J. Miller, ‘Statistics (Advanced Level Mathematics)’, Cambridge University Press
2002
BI-M4.1/A4/B1.3-R0: PROGRAMMING AND PROBLEM SOLVING THROUGH ‘C’ LANGUAGE
The objectives of this course are to make the student understand programming language, programming,
concepts of Loops, reading a set of Data, stepwise refinement, Functions, Control structure, Arrays. After
completion of this course the student is expected to analyse the real life problem and write a program in
‘C’ language to solve the problem. The main emphasis of the course will be on problem solving aspect,
i.e. developing proper algorithms.
Outline of Course
S. No. Topic Minimum No. of Hours
1. Introduction to Programming 04
2. Algorithms for Problem solving 12
3. Introduction to ‘C’ language 04
4. Conditionals and loops 08
5. Arrays 06
6. Functions 06
7. Structures and Unions 06
8. Pointers 06
9. Self Referential Structures and Linked Lists 04
10. File Processing 04
Lectures = 60
Practicals/ Tutorials = 60
Total = 120
Detailed Syllabus
Exchanging values of two variables, summation of a set of numbers, Decimal Base to Binary
Base conversion, Reversing digits of an Integer, GCD (Greatest Common Divisor) of two
numbers, Test whether a number is prime, Organise numbers in ascending order, Find square
root of a number, factorial computation, Fibonacci sequence, Evaluate ‘sin x’ as sum of a series,
Reverse order of elements of an array, find Largest number in an array, Print elements of upper
triangular matrix, multiplication of two matrices, Evaluate a Polynomial.
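A few of the problems listed above are sketched below to show the level of algorithm intended. The sketch is in Python only for brevity (the course itself implements these in ‘C’), and the function names are hypothetical.

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm: replace (a, b) by (b, a mod b) until b is 0."""
    a, b = abs(a), abs(b)
    while b:
        a, b = b, a % b
    return a


def is_prime(n: int) -> bool:
    """Trial division by odd numbers up to the square root of n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def reverse_digits(n: int) -> int:
    """Reverse the decimal digits of a non-negative integer."""
    result = 0
    while n > 0:
        result = result * 10 + n % 10
        n //= 10
    return result


def fibonacci(count: int) -> list:
    """Return the first `count` Fibonacci numbers, starting 0, 1."""
    seq = []
    a, b = 0, 1
    for _ in range(count):
        seq.append(a)
        a, b = b, a + b
    return seq


print(gcd(48, 18))           # 6
print(is_prime(97))          # True
print(reverse_digits(1234))  # 4321
print(fibonacci(8))          # [0, 1, 1, 2, 3, 5, 8, 13]
```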
3. Introduction to ‘C’ language 4 hours
3.1 Character set, Variable and Identifiers, Built-in-Data Types, Variable Definition
3.2 Arithmetic operators and Expressions, Constants and Literals
3.3 Simple assignment statement, Basic input/output statement
3.4 Simple ‘C’ programs
5. Arrays 6 hours
One dimensional arrays: Array manipulation; Searching, Insertion, Deletion of an element from an
array; Finding the largest /smallest element in an array; Two dimensional arrays, Addition/
Multiplication of two matrices, Transpose of a square matrix; Null terminated strings as array of
characters, Representation of sparse matrices
6. Functions 6 hours
Top down approach of problem solving, Modular programming and functions, Standard Library of
C functions, Prototype of a function: Formal parameter list, Return Type, Function call, Block
structure, Passing arguments to a function: call by reference, call by value, Recursive Functions,
arrays as function arguments
Structure variables, initialization, structure assignment, nested structure, structures and functions,
structures and arrays: arrays of structures, structures containing arrays, unions
8. Pointers 6 hours
Address operators, pointer type declaration, pointer assignment, pointer initialization, pointer
arithmetic, functions and pointers, Arrays and Pointers, pointer arrays
Creation of a singly linked list, traversing a linked list, Insertion into a linked list, Deletion
from a linked list
Concepts of Files, File opening in various modes and closing of a file, Reading from a file, Writing
onto a file
RECOMMENDED BOOKS
MAIN READING
2. R. G. Dromey, “How to solve it by Computer”, Seventh Edition, 2001, Prentice Hall of India
SUPPLEMENTARY READING
2. A. Kamthane, “Programming with ANSI & Turbo C”, First Edition, 2001, Pearson Education
3. Venugopal and Prasad, “Programming with ‘C’”, First Edition, 1997, Tata McGraw-Hill
4. B. W. Kernighan & D. M. Ritchie, “The C Programming Language”, Second Edition 2001, Prentice
Hall of India
On completion of this course, students will have basic knowledge of sources of sequences and protein
structure data, an understanding of the relevance and importance of this data, and some exposure to
basic algorithms used for processing this data.
Outline of Course
Detailed Syllabus
Genome Sequences
ORFs, Genes, Introns, Exons, Splice Variants
DNA/RNA Secondary Structure Triplet Coding
Protein Sequences
Protein Structure: Secondary, Tertiary, Quaternary
The notion of Homology
EMBL
GENBANK
Entrez
Unigene
Understanding the structure of each source and using it on the web
PDB
SwissProt
TrEMBL
Understanding the structure of each source and using it on the web
RECOMMENDED BOOKS
MAIN READING
Theory
1. D. Baxevanis and F. Ouellette (2002), “Bioinformatics: A Practical Guide to the Analysis of Genes
and Proteins”, Wiley Indian Edition
2. Cynthia Gibas and Per Jambeck (2001), “Developing Bioinformatics Computer Skills”, O’Reilly
Press, Shroff Publishers and Distributors Pvt. Ltd., Mumbai.
3. Bryan Bergeron MD (2003), “Bioinformatics Computing”, Prentice Hall India (Economy Edition)
4. Stuart Brown (2000), “Bioinformatics: A Biologist’s Guide to Biocomputing and the Internet”, Eaton
Publishing
Practical
1. Jean-Michel Claverie and Cedric Notredame (2003), “Bioinformatics: A Beginner’s Guide”, Wiley
Dreamtech India Pvt. Ltd.
SUPPLEMENTARY READING
Theory
2. Bioinformatics: Sequence and Genome Analysis. D. W. Mount (2001) Cold Spring Harbor
Laboratory Press.
The objective of this foundation course is to familiarize students with the basic terminology and concepts of the
subject. Unifying concepts and theories are pointed out and stressed wherever applicable.
Outline of Course
Detailed Syllabus
The nature of life, definition of life, characteristics of life, differences between animals and plants,
principal divisions in biology, importance of biology
Definition of Cell – fundamental cell types, difference between prokaryotic and eukaryotic cell
types, cell structure, cell wall, plasma membrane, different organelles and their function. Cell
division stages, chromosome structure, human chromosomes, sex chromosome.
Types of tissue in humans: connective tissue, supporting tissue, fluid tissue, muscular tissue,
nerve tissue, secretory tissue.
Human respiration : respiratory organs, mechanism of breathing in man, composition of different
respiratory gases, compartment of lung, gas exchange in lungs, transport of gases during
respiration, diseases of respiratory system. Human nutrition: Salivary glands, function of saliva,
swallowing process, digestion in stomach, secretory cells of stomach, digestion in small intestine,
pancreas, liver, bile, role of bile in fat digestion.
Human circulation : heart, blood flow control, diseases of heart and circulatory system. Human
excretion: Function of kidney, large intestine, summary of excretory substances, diseases related
to excretory system. Movement and locomotion: Organs responsible for movement, joints, bones
and muscles involved in walking, different sequences during locomotion.
The diversity of Microorganisms
Mendel’s work and experiments, gene bearer of heredity character, chemical basis of heredity.
Stages of evolution, origin of life, evidences of evolution from plant and animal kingdom. Genome
sequencing techniques.
Chemical structure, hybridization, double helical structures, replication, concepts of gene and
genetic code, transcription and translation, mutations and its implications, Polymerase Chain
Reaction.
Amino acid structure, chemical nature of residues, polymeric chain and folding, concepts of pH,
pKa, buffer, aqueous medium.
RECOMMENDED BOOKS
MAIN READING
SUPPLEMENTARY READING
1. DNA Structure and Function, R. R. Sinden, Academic Press, San Diego (1994)
SEMESTER – II
BI-A6/B2.1-R0: INTRODUCTION TO DATABASE AND WEB ENABLING TECHNOLOGIES
On completion of this course students will have adequate knowledge of different data models, basic
principles of data base management and basic web enabling technologies. Students will learn how to
design a database system, various normalization techniques and some useful query language. Students
will be able to develop web based applications to retrieve information from data bases. They should also
be able to design their own web sites, install web server, and do the relevant administrative tasks.
Outline of Course
Detailed Syllabus
5. Query language and query optimization 10 hours
Domain types in SQL, Schema definition in SQL, Types of SQL commands, SQL operators,
tables, views, indexes, aggregate functions, insert, delete and update operations, join, union,
intersection, minus etc. in SQL, queries, sub-queries, equivalence of queries.
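The SQL topics listed above (schema definition, insert and update, views, joins, aggregate functions and sub-queries) can be exercised without installing a database server. The following is a minimal sketch using Python's built-in `sqlite3` module; the table names, column names and rows are hypothetical and chosen only for illustration, and SQLite's dialect differs in places from other database systems (for example, it uses EXCEPT rather than MINUS).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Schema definition with domain types and a foreign key (hypothetical tables)
cur.execute("CREATE TABLE organism (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute(
    "CREATE TABLE gene ("
    " id INTEGER PRIMARY KEY,"
    " symbol TEXT NOT NULL,"
    " length_bp INTEGER,"
    " organism_id INTEGER REFERENCES organism(id))"
)

# Insert and update operations (made-up rows)
cur.executemany("INSERT INTO organism VALUES (?, ?)", [(1, "E. coli"), (2, "H. sapiens")])
cur.executemany(
    "INSERT INTO gene VALUES (?, ?, ?, ?)",
    [(1, "geneA", 900, 1), (2, "geneB", 1500, 1), (3, "geneC", 3000, 2)],
)
cur.execute("UPDATE gene SET length_bp = 1200 WHERE symbol = 'geneA'")

# A view over the base table
cur.execute("CREATE VIEW long_gene AS SELECT * FROM gene WHERE length_bp > 1000")

# Join with aggregate functions
rows = cur.execute(
    "SELECT o.name, COUNT(*), AVG(g.length_bp)"
    " FROM gene AS g JOIN organism AS o ON g.organism_id = o.id"
    " GROUP BY o.name ORDER BY o.name"
).fetchall()

# Sub-query: genes longer than the overall average length
above_avg = cur.execute(
    "SELECT symbol FROM gene WHERE length_bp > (SELECT AVG(length_bp) FROM gene)"
).fetchall()

print(rows)
print(above_avg)
conn.close()
```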
Dynamic HTML, cascading style sheets, creation of background images, HTML object model,
dynamic positioning, direct animation path control.
RECOMMENDED BOOKS
MAIN READING
1. H. M. Deitel, P. J. Deitel and T. R. Nieto, Internet and World Wide Web: How to Program, Pearson
Education India
2. A. Silberschatz, H. F. Korth and S. Sudarshan, Database System Concepts, McGraw Hill
International
3. Zoé Lacroix and Terence Critchlow, Bioinformatics: Managing Scientific Data, Morgan Kaufmann Publishers,
2004.
SUPPLEMENTARY READINGS
BI-A7/B2.2-R0: PERL PROGRAMMING AND APPLICATION TO BIOINFORMATICS
Perl is a popular programming language that is extensively used in Bioinformatics. The goal of this course
is to provide a practice-oriented introduction to Perl programming language. The goal is two-fold: to teach
programming skills and to apply them to the solution of interesting Bioinformatics problems. The
course covers programming concepts such as data types, arithmetic and logical operators, conditionals and loops,
input/output, regular expressions and pattern matching, functions and sub-routines, and applications such as reading
protein files, finding motifs, simulating DNA, parsing PDB records, BLAST and annotations in GenBank.
After completion of the course a student will have a solid understanding of Perl basics and will be able
to write programs for the tasks described above.
Outlines of Course
Detailed Syllabus
1. Introduction: 2 hours
Introduction to Perl, Downloading and installation from Website, Writing and Running a Perl
Program, Editing, Advantages
6. Regular Expressions and Pattern Matching 12 hours
Regular Expression, Pattern Matching, Meta Character, Simple Pattern, Matching Group of
Characters, Matching multiple instances of Characters, Pattern Building, Pattern and Variable,
Pattern and Loops, Using Pattern for Search and Replace, Matching Pattern over multiple Lines
etc.
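The pattern-matching ideas in this unit carry over directly between languages. As a hedged illustration, the sketch below uses Python's `re` module (Python being the alternative language offered for this module) on a made-up DNA string. It shows a simple pattern, a character class with repetition, search and replace, and matching across line breaks; GAATTC is the EcoRI recognition site.

```python
import re

dna = "ATGCGAATTCGTTAGGAATTCATGCC"  # made-up sequence for illustration

# Simple pattern: every position where the EcoRI site GAATTC occurs
site_positions = [m.start() for m in re.finditer("GAATTC", dna)]

# Character class and repetition: a start codon followed by one or more codons
codon_run = re.match(r"ATG(?:[ACGT]{3})+", dna)

# Search and replace: transcribe DNA to RNA by replacing T with U
rna = re.sub("T", "U", dna)

# Pattern over multiple lines: remove line breaks before matching
fasta_body = "ATGCGA\nATTCGT\n"
joined = re.sub(r"\s+", "", fasta_body)

print(site_positions)                        # [4, 15]
print(len(codon_run.group()) // 3, "codons")
print(rna)
print("GAATTC" in fasta_body, "GAATTC" in joined)
```

The last line shows why sequence files are usually joined into one string first: the site spans a line break and is found only after the whitespace is removed.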
RECOMMENDED BOOKS
MAIN READING
1. James Tisdall, “Beginning Perl for Bioinformatics”, O’Reilly & Associates, 2001
2. James Tisdall, “Mastering Perl for Bioinformatics”, O’Reilly, 2003
SUPPLEMENTARY BOOKS
1. Cynthia Gibas & Per Jambeck, “Developing Bioinformatics Computer Skills”, O’Reilly &
Associates, 2000.
2. Rex A. Dwyer, “Genomic Perl”, Cambridge University Press
3. Randal L. Schwartz and Tom Phoenix, “Learning Perl”, 3rd Edition, O’Reilly
BI-A7/B2.2-R0: PYTHON PROGRAMMING AND APPLICATIONS TO BIOINFORMATICS
The course is designed for students who already have some programming knowledge in other languages
such as Perl or C. This course focuses on bioinformatics applications and examples and also covers the
necessary programming features of Python. Students who undertake this course will be able to write
code for simple applications and will be motivated to develop modules for some specialized applications.
Outlines of Course
Detailed Syllabus
Basic data types – Strings, Lists, Tuples – Lists and Tuples – Strings and Unicode strings
Buffers – Dictionaries – Numbers – Type conversions – Files
5. Functions 6 hours
6. Exceptions 4 hours
8. Classes 6 hours
9. Biopython 10 hours
Introduction – Bio.Seq and Bio.SeqRecord modules – Using Seq class – Sequences reading and
writing – Bio classes for sequences – Bio.SwissProt.SProt and Bio.WWW.ExPASy – Reading
entries – Regular expressions in Python – Prosite - Bio.GenBank – Reading entries – Running
Blast and Clustalw – Blast - Clustalw - Running other bioinformatics programs under Pise
Advanced Parsers – Introduction – building parsing classes for Enzyme – Iterator using the
parsers classes – Building parsing classes for phylogenetic trees working with PDB – Study of
disulfide bonds – Graphics in Python.
RECOMMENDED BOOKS
MAIN READING
1. Mark Lutz, David Ascher (2003) Learning Python. O’Reilly & Associates
2. Alan Gauld (2000) Learn To program Using Python Addison –Wesley
ADDITIONAL READING
RESOURCES
1. URL : http://www.python.org
2. URL : http://www.biopython.org
BI-A8/B2.3-R0: INTRODUCTION TO OBJECT ORIENTED PROGRAMMING THROUGH JAVA
The course is designed to impart knowledge and skills required to solve the real world problems using
object – oriented approach utilizing JAVA language constructs. This course covers the subject in two
parts, viz., Java Language and Java Library.
After completion of the course students are expected to understand the following:
Outlines of Course
Detailed Syllabus
The JAVA class libraries, Declaring a variable, Dynamic initialization, The scope and lifetime of
variable. Type conversion and casting.
Arithmetic operators, The Bitwise operators, Relational operators, Boolean logical operators, the
assignment operator, The ? Operator, Operator precedence
1.4 Control statements 03 hours
Inheritance basics, Member access and inheritance, Using super class, Creating a multilevel
hierarchy, Method overriding, Dynamic method dispatch, Using abstract classes, Using final with
inheritance, The object class
The JAVA thread model, The main thread, Creating a thread, isAlive() and join(), suspend() and
resume(), Thread priorities, Synchronization, Interthread communication
I/O Basics: Streams, The stream classes, The predefined streams, Reading console input,
Writing console output, Reading and writing files, Applet fundamentals, The transient and volatile
modifiers, Using instanceof, native methods
The string constructor, Special string operations, Character extraction, String Searching &
Comparison, Data conversion using valueOf(), StringBuffer
Simple type wrappers, Runtime Memory management, arraycopy(), Object, clone() and the
Cloneable interface, Class and ClassLoader
2.3 The utility classes 03 hours
File Formats
Image Fundamentals: Creating, Loading and Displaying
Image Observer: Double Buffering, Media Tracker
RECOMMENDED BOOKS
MAIN READING
1. H. Schildt, (2001), “The Complete Reference – Java 2”, Fourth Edition, Tata McGraw Hill
2. Deitel and Deitel (2001), “Java: How to Program Java 2”, Second Edition, Pearson Education
SUPPLEMENTARY READING
The course is formulated keeping in view first year Bachelor’s students with no biology
background. The essential focus is on a qualitative and elementary exposition of the idea that
proteins are important biological polymer molecules made of amino acid units, that proteins
possess unique structures which in turn dictate their function inside the cell and that it is
possible to store, retrieve and represent this information on computers. Other biological
macromolecules are introduced towards the end of the course. After attending the course, the
student is expected to be acquainted with protein sequences, structures and their role inside the
cells, their relation to diseases and drug targets, and how computers may help in understanding
structure and function of proteins.
Outline of Course
Lectures = 60
Practicals/Tutorials = 60
Total = 120
Detailed Syllabus
b> Protein synthesis
c> Isolation, purification, characterization & estimation
8. Introductory Proteomics 2 hrs.
Proteomics : Comparative genome analysis using proteomics –
Differential expression on 2D- gel
9. Protein as drug targets 6 hrs.
Some case studies
10. Structure and function of Nucleic acid, lipids & carbohydrates 10 hrs.
Introduction to Nucleic acids, lipids & carbohydrates.
RECOMMENDED BOOKS
MAIN READING
1. C. Branden & J. Tooze, “Introduction to Protein Structure”, Garland Publishing Inc., New
York, 1991.
2. T.E. Creighton, “Proteins – Structures and Molecular Properties”, W.H. Freeman & Co.,
New York, 1993.
SUPPLEMENTARY BOOKS
BI-A10/B2.5-R0 : BASICS OF GENOMICS AND PROTEOMICS
The primary objective of this course is to impart advanced knowledge to students on a self
learning mode such that students become familiar with the methods of searching for information
and analyzing preliminary data.
After completion of this course the student should be able to
• Understand state-of-the-art molecular biology techniques
• Be conceptually familiar with metabolism
• Interpret preliminary gene expression and protein expression studies
• Be familiar with the various genome projects completed and ongoing
Outline of Course
Lectures = 60
Practicals / Tutorials = 60
Total = 120
Detailed Syllabus
4. Micro array for gene expression 10 hrs.
Target selection, customized microarray design, image processing and quantification,
normalization and filtering, statistical analysis, public microarray data sources.
5. Proteomics 10 hrs.
Basics of Protein structure, Introduction to basic Proteomics technology, Bio-informatics in
Proteomics, Basics of Proteome Analysis, Concepts in Enzyme Catalysis.
6. Genome Project – The unfolding story 12 hrs.
Introduction to the concepts cloning and mapping, Construction of Physical maps, Basics
of radiation hybrid maps, Sequencing : Related discoveries and technology development,
Implications of the Human Genome Project, Basic Human Inheritance Patterns, Basics of
Single Nucleotide Polymorphism detection and its implication, Practical Application of
medical Genetics Technology.
RECOMMENDED BOOKS
MAIN READING
1. Genomes. Author : T.A. Brown, John Wiley and Sons, Bios Scientific Publishers
2. Discovering Genomics, Proteomics and Bioinformatics Campbell AM and Heyer LJ
Pearson Education (Low Priced Edition), 2003.
SUPPLEMENTARY BOOKS
1. Principles of Genome Analysis and Genomics (Third Edition) Primrose and Twyman
Blackwell Publishing (2003)
2. Reiner Westermeier and Tom Naven. Proteomics in Practice. Wiley-VCH. 3rd Edition 2002
3. Instant Notes-Bioinformatics Authors : D.R. Westhead, J.H. Parish and R.M. Twyman.
Publisher : Viva Books Pvt. Ltd. Indian Edition 2003.
SEMESTER – III
BI-B3.1-R0 : COMPUTER ORGANIZATION AND DISTRIBUTED SYSTEMS
The objective of the course is to familiarize students with computer hardware design and
distributed systems, basic structure and behavior of the various functional modules of the
computer and how they interact to provide the processing needs of the user.
This subject mainly focuses on the computer fundamentals, network and distributed systems,
architectural models, fundamental models, networking and internetworking, operating systems &
its supports for distributed systems. Students will be exposed to internet principles, different
types of networking topologies. OSI (Open System Interconnection) seven layer model. The
course also deals with memory management and methodologies for enhancing processor
throughput, besides process control and interprocess communication.
Outline of Course
Lectures = 60
Practicals / Tutorials = 60
Total = 120
Detailed Syllabus
1. Fundamentals 11 hrs.
Binary number systems, Two’s complement, floating point representation, addition,
subtraction, overflow.
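To make the two's complement and overflow topics concrete, the following sketch encodes and decodes signed integers in a fixed width and flags signed overflow on addition. It is a minimal illustration in Python with hypothetical function names and an assumed default width of 8 bits; real hardware does the same arithmetic in registers.

```python
def to_twos_complement(value: int, bits: int = 8) -> str:
    """Return the `bits`-wide two's-complement bit pattern of `value`."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise OverflowError(f"{value} does not fit in {bits} bits")
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(pattern: str) -> int:
    """Interpret a bit string as a signed two's-complement integer."""
    raw = int(pattern, 2)
    if pattern[0] == "1":  # sign bit set: subtract 2**width
        raw -= 1 << len(pattern)
    return raw


def add_with_overflow(a: int, b: int, bits: int = 8):
    """Add two signed values in `bits`-wide arithmetic and report signed overflow."""
    mask = (1 << bits) - 1
    result = from_twos_complement(format((a + b) & mask, f"0{bits}b"))
    overflow = result != a + b  # the wrapped result no longer equals the true sum
    return result, overflow


print(to_twos_complement(-5))            # 11111011
print(from_twos_complement("11111011"))  # -5
print(add_with_overflow(100, 50))        # (-106, True)
print(add_with_overflow(100, -50))       # (50, False)
```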
Logic gates, flip-flops, encoders, decoders, multiplexers, registers, counters, RAM, ROM.
3. Introduction to Networks and Distributed Systems 5 hrs.
The motivation and goals of network and distributed system, network topologies, layers in
a network, OSI (Open Systems interconnection) seven layer model (introduction, e.g.,
Chapter 1 of Tanenbaum), characteristics of a distributed computing system, issues such
as heterogeneity, openness, security, scalability, concurrency.
RECOMMENDED BOOKS
MAIN READING
1. George Coulouris, Jean Dollimore and Tim Kindberg, Distributed Systems : Concepts and
Design, Pearson Education, 2001.
2. M.M. Mano : Computer Systems Architecture, 3rd Edition, Pearson Education, 2000.
3. D.M. Dhamdhere, Systems Programming and Operating Systems, Tata McGraw Hill,
1996.
4. A. Tanenbaum & M. van Steen, Distributed Systems: Principles and Paradigms, 1st Edition,
2002, Prentice Hall
SUPPLEMENTARY READING
3. John D. Carpinelli, Computer Systems Organization & Architecture Pearson Low priced
Edition 2004.
BI-B3.2-R0 : PROBABILITY AND INFORMATION THEORY
Biological processes are inherently random. A rational description requires a probabilistic
framework. Analysis of the vast amount of biological data requires a thorough
understanding of probability theory. For quantifying uncertainty embedded in a biological system,
tools from information theory are useful. For simulating biological phenomena, Monte Carlo methods
are generally employed. This course will provide foundational support to the field of Bioinformatics.
Outline of Course
Lectures = 60
Practicals / Tutorials = 60
Total = 120
Detailed Syllabus
2. 12 hrs.
a> Random Variables
Discrete Random variables : - Bernoulli, Binomial, Geometric and Poisson
distributions.
Continuous Random variables : - Uniform, Exponential, Normal, Gamma, and Chi-
square distributions.
Expectation of random variables.
Chebyshev’s inequality.
Jointly distributed random variables.
b> Functions of Random Variables
Minimum, Maximum and sum of Random Variables
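For reference, two of the results named in this unit are written out below in their standard textbook form: the expectation of a discrete and of a continuous random variable, and Chebyshev's inequality for a random variable X with mean \mu and finite variance \sigma^2.

```latex
% Expectation (discrete with mass function p, continuous with density f)
E[X] = \sum_{x} x\, p(x), \qquad E[X] = \int_{-\infty}^{\infty} x\, f(x)\, dx

% Chebyshev's inequality, for any k > 0
P\bigl(|X - \mu| \ge k\bigr) \le \frac{\sigma^2}{k^2}
```

Taking k = 2\sigma, for example, shows that at most one quarter of the probability lies two or more standard deviations from the mean, whatever the distribution.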
Discrete case, Continuous case, Computing Expectation by conditioning, Computing
probabilities by conditioning.
RECOMMENDED BOOKS
MAIN READING
SUPPLEMENTARY READING
2. S.M. Ross, “Probability Models”, 7th Edition, Harcourt India Private Limited, 2001
The aim of the course is to introduce the computational tools and techniques needed to analyze primary DNA
and protein sequences, and to give students the skills to use them. This will enable them
to understand the significance of protein structure and function. Some of the major computational
issues and solutions will be covered with suitable examples. The focus will be on introducing a few well
known algorithms and methods that are widely used in sequence analysis. Basic knowledge of
Mathematics and Statistics is a prerequisite. Students who undertake this course will get the
necessary understanding, exposure and foundation to take up small or large scale software
development projects in the area of Bioinformatics.
At the end of this module the students will:
1. Be able to use and apply state of the art tools pertaining to sequence analysis
2. Be able to analyze primary DNA and protein sequences and interpret results
3. Understand the methodology or algorithm employed in sequence analysis
tools
4. Be in a position to undertake software development projects related to sequence analysis
Outline of Course
5. Database homology Search 10
6. Statistical Modeling of sequences 4
7. Pattern recognition and motif analysis 6
8. Secondary structure prediction 6
9. Phylogenetic analysis 6
Lectures = 60
Practicals / Tutorials = 60
Total = 120
Detailed Syllabus
4.2 Multiple Sequence alignment (6 hours)
4.2.1 SP (Sum of Pairs) measure
4.2.2 Star alignments
4.2.3 Tree alignments
4.2.4 Motifs and Profile
4.2.5 Alignment Representation and Applications
4.2.6 ClustalW, ClustalX, HMMER and T-Coffee
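The SP (Sum of Pairs) measure in 4.2.1 scores a multiple alignment by adding up the pairwise scores of every pair of rows in every column. The sketch below shows the idea in Python. The scoring values (match +1, mismatch -1, gap -2, gap against gap 0) and the three-row alignment are assumptions chosen only for illustration; practical tools use substitution matrices and more elaborate gap penalties.

```python
from itertools import combinations

# Assumed scoring scheme (hypothetical, for illustration only)
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a: str, b: str) -> int:
    """Score one pair of aligned symbols; a gap aligned with a gap scores 0."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sum_of_pairs(alignment: list) -> int:
    """Sum the pair scores over every column and every pair of rows."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total


alignment = ["ATG-C", "ATGAC", "A-GAC"]  # made-up alignment
print(sum_of_pairs(alignment))  # 3
```

In this example the three fully conserved columns contribute +3 each and the two columns containing a gap contribute -3 each, giving a total of 3.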
RECOMMENDED BOOKS
MAIN READING
1. Setubal Joao and Meidanis Joao, “Introduction to Computational Molecular Biology”, PWS
Publishing Company (An International Thomson Publishing Company), 1997, Indian low
priced edition.
SUPPLEMENTARY READING
4. Ian Korf, Mark Yandell & Joseph Bedell, “BLAST”, O’Reilly Press, 2003.
BI-B3.4-R0 : DISCRETE MATHEMATICS
The objective is to provide the necessary mathematical knowledge so that students can understand
computational issues relating to practical problems. The paper intends to equip the students with the
necessary mathematical ideas, notations and techniques that are required to formulate an algorithm
for a problem, and to reason about its time and space complexity. Many problems of bioinformatics
are combinatorial in nature. This course will help to solve such problems also.
Outline of Course
Lectures = 60
Practicals / Tutorials = 60
Total = 120
Detailed Syllabus
3. Recurrences and generating functions 10 hours
Boolean expression and function, formal definition of Boolean algebra, identities, combinatorial
circuits, Karnaugh maps.
Graph representation scheme, breadth first search, depth first search and their properties,
decomposing a directed graph into its strongly connected components, Euler tour, minimum
spanning tree, Prim’s algorithm, Kruskal’s algorithm, shortest path problem, Dijkstra’s algorithm
and Bellman-Ford algorithm for the single source shortest paths problem.
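As one worked instance of the graph algorithms listed above, the following is a minimal sketch of Dijkstra's algorithm for single source shortest paths, written in Python with a binary heap as the priority queue. The adjacency-dictionary representation and the small example graph are assumptions made for illustration; the algorithm requires non-negative edge weights, which is where the Bellman-Ford algorithm differs.

```python
import heapq


def dijkstra(graph: dict, source: str) -> dict:
    """Return shortest path lengths from `source`; edge weights must be non-negative."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry; a shorter path to u was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical weighted directed graph as an adjacency dictionary
graph = {
    "A": {"B": 4, "C": 1},
    "C": {"B": 2, "D": 5},
    "B": {"D": 1},
    "D": {},
}
distances = dijkstra(graph, "A")
print(distances)
```

Here the direct edge from A to B costs 4, but the route through C costs 3, so the algorithm settles on the indirect path.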
Terminals, non-terminals, grammar, language, types of grammar – context sensitive and context
free grammar, parse tree, finite state automata, non-deterministic finite state automata, regular
expression, regular sets, Kleene’s theorem, Turing machine.
7. NP-completeness 5 hours
Introduction to complexity class P, verification algorithm, complexity class NP, reduction function,
reduction algorithm, NP-completeness
RECOMMENDED BOOKS
MAIN READING
SEMESTER – IV
BI-B4.1-R0: STATISTICAL METHODS IN BIOINFORMATICS
Statistics as a discipline deals with the collection and analysis of data. The collection of data involves
sampling techniques, while the analysis involves inference and estimation based on sampled
data. In Bioinformatics the same principles of scientific data collection and analysis hold. To
investigate large scale data sets generated from various genome, proteome and metabolome projects and
related sources, it is necessary to have a proper grounding in statistical techniques.
Outline of Course
S. No.  Topic                                          Minimum No. of Hours
1.      Review                                         04
2.      Sampling Distributions                         05
3.      Estimation Theory                              08
4.      Maximum Likelihood Estimation                  04
5.      Inference – Tests of Hypotheses                16
6.      Simple Linear Regression and Correlation       06
7.      Analysis of Variance                           06
8.      Statistical methods for Stochastic Processes   05
9.      Statistical Analysis of Stochastic Processes   06
Lectures = 60
Practical / Tutorials = 60
Total = 120
Detailed Syllabus
1. Review 4 hours
Statistic
Distribution of sample mean
Sample variance – (application of Central limit theorem)
4. Maximum Likelihood Estimation 4 hours
Testing of independence
Testing of Markov property
Association test
Long Repeat
Scan Statistics
Analysis of Patterns
RECOMMENDED BOOKS
MAIN READING
BI-B4.2-R0: BIOMOLECULAR STRUCTURE & DYNAMICS
This course is formulated keeping in view first year Master’s students with ‘A’ level (DOEACC)
background in biology. The focus is on imparting a sound knowledge of structure and dynamics of
proteins and nucleic acids and modelling them in silico. After attending the course, the students are
expected to handle research projects in biomolecular structure and function.
Outline of Course
S. No.  Topic                                                       Minimum No. of Hours
1.      Protein Structure, Classification of protein Structure,     10
        Membrane proteins, Viral Proteins, Proteins in immune
        system, Proteins as drug targets
2.      Manipulating protein structures on computer, Cartesian      10
        and internal coordinate representations, ribbon diagrams,
        space filling models, topology diagram, searching for
        sequences & folds
3.      Nucleic acid structures – A review                          10
4.      Database analyses of internal degrees of freedom of         5
        nucleic acid structures
5.      Forces stabilizing Protein & Nucleic Acid structures        5
6.      Energy minimization and Molecular Dynamics methods          15
7.      Structure and function of Carbohydrates and Lipids          5
Lectures = 60
Practical / Tutorials = 60
Total = 120
Detailed Syllabus
3. Nucleic acid structures – A review 10 hours
a) Right handed (A, B) and left handed (Z) double helical DNA
b) Structures of t-RNA and r-RNA
c) DNA bending
d) DNA supercoiling
e) Cruciforms, triplexes and quadruplexes
f) Nucleic acids as drug targets
RECOMMENDED BOOKS
MAIN READING
1. C. Branden & J. Tooze, “Introduction to Protein Structure”, Garland Publishing Inc., New York,
1991
2. R. R. Sinden, “DNA Structure & Function”, Academic Press, San Diego, 1994
SUPPLEMENTARY READING
1. Lehninger Principles of Biochemistry, David Nelson and Michael M. Cox, Freeman Publishing,
4th Edition, 2004
2. A. R. Leach, “Molecular Modelling – Principles & Applications” Addison Wesley Longman, Essex,
1996
BI-B4.3-R0: DATA STRUCTURES AND ALGORITHMS
Preamble
“Data Structures and Algorithms” is a classic, core topic of computer science. Data
structures and algorithms are central to the development of good quality programs. Their study is
critical for a sound understanding and efficient implementation of Bioinformatics algorithms and programs.
(1) To gain a sound understanding of different types of data structures and their role in algorithm
design in computer science (in general) and in bioinformatics (in particular).
(2) To be able to understand any typical algorithm design problem in computer science and
bioinformatics and be able to visualize appropriate data structures for efficiently implementing the
algorithm.
(3) To be able to prove why a particular data structure is best suited for a particular problem, by
analyzing the algorithm in a sound and systematic way.
Outline of course
S. No.  Topic                          Minimum No. of Hours
1.      Introduction                   2
2.      Computational Complexity       6
3.      Essential Preliminaries        5
4.      Dictionaries                   12
5.      Priority Queues                4
6.      Graphs                         12
7.      Sorting Methods                6
8.      String Matching                4
9.      Algorithm Design Paradigms     9
Lectures = 60
Practical / Tutorials = 60
Total = 120
Detailed Syllabus
1. INTRODUCTION 2 hours
Complexity of algorithms: worst case, average case and amortized complexity, notation for
algorithm complexity, Algorithm analysis, Recurrence relations, Introduction to NP-completeness
3. ESSENTIAL PRELIMINARIES 5 hours
4. DICTIONARIES 12 hours
Dictionaries: Hash tables, Binary search trees, splay trees, Balanced Trees, AVL trees, 2-3 trees,
B-Trees
6. GRAPHS 12 hours
Graphs: Shortest path algorithms (Dijkstra, Bellman-Ford, Floyd-Warshall), minimum spanning tree
algorithms (Prim, Kruskal), depth-first search, breadth-first search, and their applications.
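As a minimal illustration of the single source shortest path topic listed above, the following Python sketch implements Dijkstra's algorithm with a binary heap. The function name `dijkstra` and the adjacency-list format (a dictionary mapping each node to a list of `(neighbour, weight)` pairs) are assumptions made for this example. Edge weights are assumed to be non-negative.

```python
import heapq


def dijkstra(graph, source):
    """Return a dict of shortest distances from `source` to every reachable node.

    graph: dict mapping node -> list of (neighbour, non-negative weight) pairs.
    """
    dist = {source: 0}
    heap = [(0, source)]  # (distance found so far, node)
    while heap:
        d, u = heapq.heappop(heap)
        # Skip stale heap entries that were superseded by a shorter path.
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

With a binary heap the running time is O((V + E) log V). Bellman-Ford would be needed instead if negative edge weights were allowed.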
Sorting: sorting methods and their analysis (shell sort, quicksort, merge sort, heap sort, radix
sort), lower bound on complexity
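One of the sorting methods named above, merge sort, can be sketched in a few lines of Python. This is an illustrative version that returns a new list and is not tuned for performance. The function name `merge_sort` is chosen for this example.

```python
def merge_sort(items):
    """Return a new sorted list using top-down merge sort (O(n log n) comparisons)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Merge the two sorted halves; `<=` keeps equal elements in order (stable).
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

Its O(n log n) worst case matches the lower bound for comparison-based sorting mentioned in the topic list.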
Algorithm Design Paradigms: Greedy methods, divide and conquer, dynamic programming,
backtracking, local search methods, branch and bound technique. Travelling salesman problem.
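To illustrate the dynamic programming paradigm listed above, the sketch below computes the length of the longest common subsequence of two sequences, a problem chosen here as an assumption because of its relevance to sequence comparison. The function name `lcs_length` is hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b.

    Uses the standard recurrence, keeping only the previous table row,
    so memory is O(len(b)) and time is O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                # Extend the best alignment of the two shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop the last symbol of one prefix or the other.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

Each table entry is computed once from previously solved subproblems, which is the defining feature of the paradigm.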
RECOMMENDED BOOKS
MAIN READING
1. A. V. Aho, J. E. Hopcroft, and J. D. Ullman, Data Structures and Algorithms, Addison Wesley,
Reading Massachusetts, USA, 1983 (Available in Indian Edition)
2. M. A. Weiss, Data Structures and Algorithm Analysis in C++, Benjamin/Cummings, Redwood
City, California, USA, 1994 (Available in Indian Edition)
SUPPLEMENTARY BOOKS
BI-B4.4-R0: COMPUTATIONAL GENOMICS
Computational approaches are adopted at every stage from genome sequencing to sequence characterization.
This course module will provide the necessary basics and an overview of the computational issues related to
Genomics. Application of in silico methods to some complex biological problems, such as
Eukaryotic gene finding, will be highlighted. Relevant and popular software tools and techniques employed
in this domain will be covered with suitable examples. At the end of the course, students will be able to
undertake a pilot software project in related topics.
Outline of Course
Detailed Syllabus
Shotgun sequencing
Clone by Clone approach
Genome markers and maps
Sequencing errors
Base calling
Fragment assembly
Overlap and layout consensus
Homology based methods
Performance measures
Statistical methods
Compositional bias
Non-randomness
Machine learning and probabilistic methods
Artificial Neural Network
Markov Chain
Hidden Markov Model
EST comparison
Blast analysis
Gene order
Horizontal gene transfer
Transposable elements
Clusters of orthologous genes
RECOMMENDED BOOKS
MAIN READING
1. Mount (2001), “Bioinformatics: Sequence and Genome Analysis”, Cold Spring Harbor Laboratory
Press
4. Cecilia Saccone and Graziano Pesole, Handbook of Comparative Genomics: Principles and
Methodology
SUPPLEMENTARY READING
1. R. Durbin, S. Eddy, A Krogh and G. Mitchison, “Biological sequence analysis: Probabilistic
models of proteins and nucleic acids”, Cambridge University Press, 1998
3. Tao Jiang, Ying Xu and Michael Q. Zhang (Editors), Current Topics in Computational Molecular
Biology. Ane Books, New Delhi, 2004.
4. Pierre Baldi and Soren Brunak (2003) “Bioinformatics: The machine learning approach”, East
West Press, Low priced Edition, Delhi
SEMESTER – V
BI-B5.1-R0 : OPTIMIZATION, MACHINE LEARNING AND COMPUTATIONAL INTELLIGENCE
The course provides the basic concepts and terminology of machine learning and gives an overview of the main
approaches to machine learning, along with an understanding of the differences, advantages and
problems of the various approaches. On completion of the course, students should be able to understand
and apply various clustering and classification algorithms, in both classical and computational intelligence
frameworks, to bioinformatics problems. The intention is also to provide the necessary theoretical background
so that students can avoid blind use of such tools on practical problems. It is a self-contained course
which will also equip the students with the necessary knowledge of optimization theory.
Outline of Course
Detailed Syllabus
Concepts of local and global optimum, constrained and unconstrained optimization, gradient,
Hessian of a function, first order and second order necessary conditions for local minimizers,
method of steepest descent, steepest descent for quadratic function, Newton’s method,
Levenberg-Marquardt algorithm, and Lagrange multipliers methods.
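The topic "steepest descent for quadratic function" can be illustrated with a minimal Python sketch. It assumes the objective f(x) = 0.5 * x^T Q x - b^T x with Q symmetric positive definite, so the gradient is g = Qx - b and the exact line-search step is alpha = (g^T g) / (g^T Q g). The function name and the plain-list matrix representation are choices made for this example.

```python
def steepest_descent_quadratic(Q, b, x0, tol=1e-10, max_iter=1000):
    """Minimise 0.5 * x^T Q x - b^T x by steepest descent with exact line search.

    Q is a symmetric positive definite matrix given as a list of rows;
    b and x0 are lists of floats. Returns the approximate minimiser.
    """
    def matvec(M, v):
        return [sum(m * vj for m, vj in zip(row, v)) for row in M]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = list(x0)
    for _ in range(max_iter):
        g = [qi - bi for qi, bi in zip(matvec(Q, x), b)]  # gradient Qx - b
        gg = dot(g, g)
        if gg < tol * tol:  # stop when the gradient norm falls below tol
            break
        alpha = gg / dot(g, matvec(Q, g))  # exact minimiser along -g
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x
```

The minimiser satisfies Qx = b, so for Q = [[4, 1], [1, 3]] and b = [1, 2] the iterates approach (1/11, 7/11). Newton's method would reach the same point in a single step on a quadratic, at the cost of solving a linear system.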
Introduction to Genetic algorithms, exploitation and exploration, general structure of the basic
genetic algorithm, different coding schemes, fitness function, chromosome, population, basic
genetic operators, different selection mechanism, mutation, single point and multi-point
crossover, termination criteria, elitist strategy.
Introduction to classifier design, linear discriminant analysis – two category and multi category
cases, perceptron criterion and its minimization, decision trees (ID3, C4.5), impurity functions,
pruning methods, rule extraction from decision trees, nearest neighbour classifier, k-nearest
neighbour classifier, Bayes decision rule, loss function, minimum error rate classification, Bayes
classifier with multivariate normal density.
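The k-nearest neighbour classifier listed above can be sketched as follows. This is a minimal illustration assuming Euclidean distance and simple majority voting. The function name `knn_predict` and the training-data layout are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (feature_vector, label) pairs; vectors are equal-length
           sequences of numbers. Distance is Euclidean.
    """
    # Sort training examples by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Setting k = 1 gives the plain nearest neighbour classifier. Sorting the whole training set costs O(n log n) per query, which is adequate for small data sets.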
Introduction to neural networks, introduction to biological neural network, motivation for artificial
neural network (ANN), significance of massive parallelism and characteristics of ANN, various
types of architectures.
Layered networks, perceptron and motivation for multilayered perceptron (MLP), back-propagation
learning, use of Levenberg-Marquardt method in training of MLPs, online vs batch
learning, issues relating to initialization, termination, choice of architecture. Radial basis function
networks and related training issues, properties of MLP and RBF.
Self-organizing feature map, recurrent neural networks, Hopfield network and its applications,
simulated annealing.
RECOMMENDED BOOKS
MAIN READING
1. R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, John Wiley & Sons, Second Edition,
2000
3. Christopher M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995
4. E. K. P. Chong and S. H. Zak, An introduction to optimization, John Wiley & Sons, 2001
ADDITIONAL BOOKS
3. S. Haykin, Neural networks – a comprehensive foundation, Prentice Hall, 1998
4. Vojislav Kecman, Learning and soft computing, Pearson Low priced Edition, 2004
BI-B5.2-R0 : OBJECT TECHNOLOGY FOR BIOINFORMATICS
Preamble
Object technology has emerged as a sound paradigm for modelling, analysis, design, and implementation
of complex software intensive systems. Object technology is indeed an attractive methodology to use in
Bioinformatics in the areas of software modelling, software design and database design. Emerging
software technologies such as component technology and distributed object technology, which are quite
important for Bioinformatics, are all founded on the object oriented paradigm. What “Data Structures and
Algorithms” is to Bioinformatics programming, “Object Oriented Modelling and Design” is to Bioinformatics
software development.
The objective of the course is to provide a sound technical exposure to the concepts, principles, methods,
and best practices in object oriented modelling and design as applied to Bioinformatics.
The vision of the course is to produce “Object Technologists” with sound knowledge and superior
competence in building robust, scalable, and reliable software intensive systems in Bioinformatics. They
would have a clear appreciation of the role of abstraction, modelling and design patterns in the
development of a Bioinformatics product. They would be able to make optimal design choices and employ
the most relevant methods, best practices, and technologies for building a software intensive
bioinformatics product, regardless of its complexity and scale.
Outline of Course
Detailed Syllabus
Meaning of the following terms: model, notation, method, methodology, analysis, design, process
with examples, Evolution of object technology, difference between structured methodology and
OO methodology
2. UNIFIED MODELING LANGUAGE (UML) 10 hours
Need for UML. UML Diagrams: Class diagrams, object diagrams, use case diagrams, component
diagrams, deployment diagrams, state chart diagrams, activity diagrams, sequence diagrams,
collaboration diagrams, other essential UML features. Examples and case studies
Use case analysis, Analysis level UML diagrams, Analysis best practices, Role of aggregation,
composition, association, inheritance, delegation, interfaces, Examples and case studies.
Design best practices, Design patterns, Creational patterns, Structural patterns, Behavioural
patterns, Role of patterns in object oriented design, Examples and case studies
Distributed object computing, Concept of object request broker, Interface definition language
(IDL). Common object request broker architecture (CORBA). Component technology, Reusing
components, Bioinformatics components, Object oriented frameworks.
Limitations of flat files, Advantages and disadvantages of relational databases, Need for object
oriented databases, Object relational databases, Mapping from objects to relational or object
relational databases, Examples and case studies in bioinformatics.
RECOMMENDED BOOKS
MAIN READING
SUPPLEMENTARY READING
1. E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns: Elements of Reusable Object
Oriented Software. Addison-Wesley, 1995 (Available in Indian Edition)
2. G. Booch, Object Oriented Analysis and Design with Applications, Second Edition, Benjamin
Cummings, 1994 (Available in Indian Edition)
3. Jim Conallen, Building Web Applications with UML. Addison-Wesley, 2000 (available in Indian
Edition)
BI-B5.3-R0 : COMPUTATIONAL PROTEOMICS AND GENE EXPRESSION STUDIES
On completion of this course, students will have been exposed to computational problems which arise in
the analysis of data from high throughput proteomic and genomic platforms.
Outline of Course
Detailed Syllabus
Sequencing Algorithms
5. Image Analysis Basics 15 hours
Normal Distribution
T and F Distributions
T-Test and ANOVA
Permutation Testing
Distance Metrics
K-Means, Hierarchical Clustering
Visualizing Clustering Results
Application of Clustering
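The K-Means topic listed above can be illustrated with a minimal Python sketch of Lloyd's iteration on points such as expression profiles. It assumes Euclidean distance as the distance metric. For reproducibility it takes the first k points as initial centroids, which is a simplification made for this example (random or other initialisation schemes are usual in practice). The function name `k_means` is hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def k_means(points, k, iterations=100):
    """Cluster `points` (equal-length tuples of floats) into k groups.

    Returns (centroids, assignment), where assignment[i] is the cluster
    index of points[i]. Initial centroids are simply the first k points.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment
```

The result depends on the initial centroids, so in practice the procedure is typically run from several starting points and the best clustering is kept.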
RECOMMENDED BOOKS
MAIN READING
1. Microarray Gene Expression Data Analysis: A Beginner’s Guide, Helen C. Causton, Blackwell
Publishers, 2003
3. DNA Microarrays and Gene Expression: From Experiments to Data Analysis and Modeling,
Pierre Baldi and G. Wesley Hatfield, Cambridge University
Press, 2002
SUPPLEMENTARY READING
1. Microarrays for an Integrative Genomics, Isaac S. Kohane, Alvin T. Kho and Atul J. Butte, MIT
Press, reprinted by Ane Books, New Delhi, 2004
BI-B5.4.1-R0: COMPUTER AIDED MOLECULAR MODELLING AND DRUG DISCOVERY
The course is designed keeping in view an undergraduate student with plus two level knowledge in
Physics, Chemistry and Mathematics, and elementary knowledge in Biology. The student also needs to
have some exposure to use of computers. The objective of the course is to familiarize the student with
basic concepts in analog and target structure based (rational) drug designing. The focus is on
comparisons of structural and electronic parameters amongst different molecules and scoring methods to
rank ligands, making use of the “principle of similarity” with other known ligands or “complementarity” with the
active site of the target molecule. To achieve this, the student would be familiarized with building small
molecules, computation of structure based and electronic parameters, docking a small molecule in the
active cavity and setting up of QSAR (quantitative structure activity relationship) method.
Learning outcome
At the end of the course, the student will be equipped with different skills involved in computer aided drug
designing. He would be able to do molecular modelling, quantum chemical calculations, compute
electronic indices, identify active sites of target molecules and dock a ligand to the target molecule. He
would also be able to optimise a ligand using different scoring methods.
Outline of Course
Detailed Syllabus
c. Cambridge Structural Data base
d. Building small molecules using chemical information
e. Use of Builders and Sketchers (e.g. ISIS Draw, HyperChem)
a. Use of classical (Empirical potential energy / function and grid search) methods
b. Use of Quantum mechanical approach
c. Use of molecular mechanics methods
5. Basic concepts in quantitative structure activity relationship (QSAR) 5 hours
a. Objective of QSAR
b. Development of Hansch QSAR equation
c. QSAR Descriptors
d. Regression analysis
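The regression step of a QSAR study can be illustrated with a minimal Python sketch. It assumes the simplest linear Hansch-type relationship, log(1/C) = a * logP + b, with a single descriptor. The parabolic and electronic terms of fuller Hansch equations are omitted here. The function name `fit_line` is hypothetical, and the numbers in the usage note are synthetic values chosen for illustration, not measured data.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept.

    x: descriptor values (e.g. logP); y: activities (e.g. log(1/C)).
    Returns (slope, intercept).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and cross-product of x and y about their means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

For the synthetic descriptor values [0.0, 1.0, 2.0, 3.0] and activities [1.0, 1.5, 2.0, 2.5], the fit recovers a slope of 0.5 and an intercept of 1.0. A real study would use several descriptors and multiple regression, and would report goodness-of-fit statistics.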
6. Basic principles of target structure based (rational) drug designing 15 hours
Methodology
RECOMMENDED BOOKS
MAIN READING
1. Molecular Modeling: Basic Principles and Applications. Höltje HD, Sippl W, Rognan D and Folkers
G. Wiley-VCH, 2nd Edition (2003)
2. Quantum Biology: S. P. Gupta, New Age publishers
3. Molecular Modelling and Drug Design. Andrew Vinter and Mark Gardner, CRC
Press, Boca Raton, 1994
4. Molecular Similarity in drug design, Dean PM, Chapman and Hall, 1995
SUPPLEMENTARY READING
1. Computer Assisted Lead Finding and Optimization: Current Tools for Medicinal Chemistry. Han van
de Waterbeemd (Ed.), Wiley-VCH, 1997
2. Chemical Applications of Molecular Modelling, Goodman JM, Royal Society of Chemistry, 1998
3. Molecular Modelling: Principles and Applications by Andrew R. Leach, Longman, 1996
4. Computer Aided Drug Design: Methods and Applications. Ed. by Thomas J. Perun and CL Propst,
Marcel Dekker, 1989
5. 3D QSAR in Drug Design: Theory, Methods and Applications. Hugo Kubinyi, Kluwer, 1993
6. Advances in Drug Discovery Techniques. Alan L. Harvey (Ed.), John Wiley & Sons, 1998
BI-B5.4.2-R0: CHEMOINFORMATICS
The course aims to introduce the fundamentals of Chemoinformatics which are required in computer
aided design. The students will also be exposed to issues related to chemical data processing.
Outline of Course
Detailed Syllabus
Web based chemoinformatics : Molecular data generated in modern drug discovery efforts – Data
– structured numerical annotation /text – graphical overview of the architecture, database access,
similarity and subtractive searches, Library design and reactant solution and filtering
3. Molecular similarity measures 6 hours
Relevance to chemical reasoning and analysis: Molecular representations and their similarity
measures, chemical graphs, discrete and continuous- valued feature vectors, Field based
functions – representations of Molecular fields, chemistry spaces.
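A similarity measure on discrete feature vectors can be illustrated with a minimal Python sketch. It assumes the Tanimoto (Jaccard) coefficient, a widely used measure for binary molecular fingerprints that the syllabus does not name explicitly. Fingerprints are represented here as sets of "on" bit positions, and the function name `tanimoto` is hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two binary fingerprints.

    fp_a, fp_b: iterables of the bit positions that are set in each
    fingerprint. Returns |A and B| / |A or B|, a value in [0, 1].
    """
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return len(a & b) / len(a | b)
```

For fingerprints with bits {1, 2, 3, 4} and {3, 4, 5} set, two bits are shared out of five set in total, giving a coefficient of 0.4. Continuous-valued feature vectors need a generalised form of the coefficient or a different measure.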
Practical aspects, clustering and similarity, recursive partitioning, rule based evaluation,
pharmacophore modelling, data sharing issues associated with NCI dataset
Information compound sets – High throughput screening, Methods, techniques and strategies
implemented at Pharmacia, computational methods, custom prioritization software, score
components – q score, molecular complexity estimated stability, scoring strategy Biological score,
important features of scores and statistics
Design of compounds for medicinal use, predicted bioactivity, QSAR, virtual high throughput
screening (VHTS), bioactive data, chiral centres, force fields and partial charges, molecular
conformations, systematic conformation feature – molecular alignment.
RECOMMENDED BOOKS
MAIN READING
1. Chemoinformatics (Methods in Molecular Biology Vol. 275). Ed. by Jürgen Bajorath. Humana
Press, 2004
2. Structural Bioinformatics. Ed. By P. E. Bourne and H. Weissig. Wiley-Liss 2003
3. J. Gasteiger “Chemoinformatics: A text book” John Wiley and Sons 2003
SEMESTER – VI
Project