BIF-101 Assignment No.2

The document is an assignment on bioinformatics, specifically focusing on differentiating between key concepts such as Pairwise and Multiple Sequence Alignment, Identity and Similarity, BLAST and FASTA, and Database and Data Warehouse. Each section provides definitions, purposes, complexities, and use cases for the terms discussed. The assignment is due on July 7, 2025, and includes links to join relevant expert groups for further collaboration.

Uploaded by

Hijab Queen41

Assignment No.

By Yasin Abbas Chaudhary

BIF-101 Introduction to Bioinformatics

Total Marks: 8

Due Date: 07-07-2025


Questions:
Differentiate between the following:
a) Pairwise and Multiple Sequence Alignment
b) Identity and Similarity
c) BLAST and FASTA
d) Database and Data Warehouse

Solution:
a. Pairwise vs. Multiple Sequence Alignment
1. Definition:
• Pairwise Sequence Alignment compares two sequences (DNA, RNA, or proteins) to find regions
of similarity.
• Multiple Sequence Alignment (MSA) aligns three or more sequences simultaneously to identify
conserved regions.

2. Purpose:
• Pairwise Alignment is primarily used to compare two sequences for similarity or evolutionary
relationships.
• MSA helps analyze conserved domains, motifs, and evolutionary trends among a group of
sequences.

3. Complexity:
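• Pairwise alignment is computationally cheaper. Dynamic programming algorithms such as
Needleman-Wunsch (global) or Smith-Waterman (local) find an optimal alignment in time
proportional to the product of the two sequence lengths.
• MSA is computationally intensive. Finding an exact optimal alignment becomes impractical as the
number of sequences grows, so tools like ClustalW or MUSCLE rely on heuristic (progressive or
iterative) methods.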
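
The cost of pairwise alignment can be seen in a minimal Python sketch of the Needleman-Wunsch scoring step. The function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration, not values prescribed by the assignment. The table has one cell per pair of positions, which is where the product-of-lengths running time comes from.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    The scoring scheme is an illustrative assumption (linear gap penalty).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty sequence costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = score[i - 1][j - 1] + pair   # align a[i-1] with b[j-1]
            up = score[i - 1][j] + gap              # gap in b
            left = score[i][j - 1] + gap            # gap in a
            score[i][j] = max(diagonal, up, left)

    return score[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))  # three matches and one gap
```

Only the score is returned here; recovering the alignment itself would require a traceback through the table, which is omitted to keep the sketch short.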

4. Output Information:
• Pairwise Alignment gives information on similarity, identity, and possible homology between just
two sequences.
• MSA shows alignment across many sequences, revealing conserved residues and structural or
functional motifs.
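
As a rough illustration of how conserved residues are read off an MSA, here is a small Python sketch. It assumes the sequences have already been aligned to equal length with "-" marking gaps. The function name conserved_columns and the toy alignment are hypothetical.

```python
def conserved_columns(alignment):
    """Return the 0-based indices of columns where every sequence
    has the same residue and no sequence has a gap."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    conserved = []
    for col in range(width):
        residues = {row[col] for row in alignment}
        # A single distinct symbol that is not a gap means full conservation.
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col)
    return conserved


if __name__ == "__main__":
    toy_alignment = ["MKV-LD", "MRV-LD", "MKVALD"]  # made-up sequences
    print(conserved_columns(toy_alignment))
```

Real MSA viewers usually report graded conservation scores rather than this all-or-nothing test.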

5. Use Cases:
• Pairwise is used in tasks like checking a new gene sequence against a known sequence.
• MSA is used for building phylogenetic trees, identifying conserved sequence patterns, and
understanding protein families.

b. Identity vs. Similarity


1. Definition:
• Identity refers to the exact match of characters (nucleotides or amino acids) at corresponding
positions in aligned sequences.
• Similarity refers to the degree to which sequences are alike based on both identical matches and
conservative substitutions.

2. Measurement:
• Identity is expressed as a percentage of exact matches over the alignment length.
• Similarity includes both exact matches and functionally similar (but not identical) residues using
substitution matrices like PAM or BLOSUM.
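
The difference between the two measurements can be sketched in Python as follows. This is a simplified illustration. The SIMILAR_GROUPS table is a hypothetical stand-in for a real substitution matrix such as BLOSUM62, and both percentages are taken over the full alignment length (gap columns included), which is one of several conventions in use.

```python
# Hypothetical groups of amino acids treated as conservative substitutions.
# Real tools decide this from a substitution matrix such as PAM or BLOSUM.
SIMILAR_GROUPS = [set("ILVM"), set("FWY"), set("KRH"),
                  set("DE"), set("ST"), set("NQ")]


def percent_identity_similarity(aln_a, aln_b):
    """Return (percent identity, percent similarity) for two aligned
    sequences of equal length, using '-' as the gap character."""
    if len(aln_a) != len(aln_b) or not aln_a:
        raise ValueError("aligned sequences must be non-empty and equal length")

    identical = similar = 0
    for x, y in zip(aln_a, aln_b):
        if x == "-" or y == "-":
            continue  # gap columns count toward length but never match
        if x == y:
            identical += 1
            similar += 1  # an exact match also counts as similar
        elif any(x in group and y in group for group in SIMILAR_GROUPS):
            similar += 1

    length = len(aln_a)
    return 100 * identical / length, 100 * similar / length


if __name__ == "__main__":
    identity, similarity = percent_identity_similarity("MKVLDE", "MRILD-")
    print(f"identity {identity:.1f}%, similarity {similarity:.1f}%")
    # identity 50.0%, similarity 83.3%
```

In the toy pair, K/R and V/I are not identical but fall in the same group, so similarity exceeds identity.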

3. Implication:
• High identity usually indicates a strong evolutionary relationship or conserved function.
• High similarity can still suggest a functional or structural relationship even if identity is low.

4. Tools Used:
• Identity can be calculated using tools like BLAST or EMBOSS pairwise alignment.
• Similarity considers evolutionary models and is assessed through scoring matrices in tools like
Clustal, BLAST, etc.

5. Use in Bioinformatics:
• Identity is often used in threshold-based sequence filtering (e.g., 90% identity cutoffs).
• Similarity helps in detecting distant homologs and understanding structural/functional
relationships.

c. BLAST vs. FASTA


1. Definition:
• BLAST (Basic Local Alignment Search Tool) finds regions of local similarity between
sequences using a heuristic approach.
• FASTA is an older algorithm that also performs local sequence alignments but uses different
scoring and searching methods.

2. Speed and Accuracy:


• BLAST is faster and more efficient for large database searches due to optimized heuristics.
• FASTA is slightly slower but may be more sensitive in some cases, especially for shorter
sequences.

3. Algorithm Design:
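• BLAST breaks the query into short words (k-mers), finds matching words in the database, then
extends those hits into longer alignments.
• FASTA searches for exact matches (k-tuples), groups them along diagonals, and then rescores the
best diagonal regions to build the alignments.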
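
The seeding idea shared by both tools can be sketched in Python. This is a simplified illustration and not the actual BLAST or FASTA implementation: it uses exact word matches only, leaves out the extension and rescoring stages, and all function names are hypothetical. Hits that lie on the same diagonal (subject position minus query position) point to a gap-free stretch of similarity.

```python
from collections import defaultdict


def index_kmers(seq, k):
    """Map every k-mer in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def seed_hits(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs for every exact shared k-mer."""
    index = index_kmers(subject, k)
    hits = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            hits.append((q, s))
    return hits


def hits_per_diagonal(hits):
    """Count seed hits on each diagonal (subject_pos - query_pos)."""
    diagonals = defaultdict(int)
    for q, s in hits:
        diagonals[s - q] += 1
    return dict(diagonals)


if __name__ == "__main__":
    hits = seed_hits("ACGTAC", "TTACGTACGG", k=3)
    print(hits_per_diagonal(hits))  # diagonal 2 collects the most hits
```

In the toy example the query occurs in the subject at offset 2, so diagonal 2 collects four of the six seed hits; a real search would extend or rescore that region.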

4. Output Format:
• BLAST, particularly through the NCBI web interface, provides a more interactive and detailed
output, including graphical displays, scores, E-values, and alignment segments.
• FASTA gives a textual output showing alignments, scores, and statistical significance.

5. Use in Bioinformatics:
• BLAST is more commonly used today due to its speed and NCBI integration. It’s widely used for
gene annotation, database searching, and homology detection.
• FASTA is still used for teaching, smaller searches, and specific tasks where its sensitivity is
beneficial.

d. Database vs. Data Warehouse


1. Definition:
• Database is a collection of data organized for quick search, retrieval, and updating, typically used
in real-time applications.
• Data Warehouse is a centralized repository that stores historical and current data from multiple
sources for analysis and reporting.

2. Data Nature:
• Database contains real-time, operational data such as daily transactions, user activities, etc.
• Data Warehouse contains historical, integrated, and often large-scale data optimized for query
and analysis.

3. Usage Purpose:
• Database is used for everyday operations, e.g., booking systems, hospital management, bank
transactions.
• Data Warehouse is used for business intelligence, trend analysis, and decision-making processes.

4. Structure and Design:


• Database is designed for high-speed inserts, updates, and deletions. It follows ER models and
normalization rules.
• Data Warehouse is designed for reading and analyzing large datasets. It follows star or snowflake
schemas and may be denormalized.
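
A star schema can be illustrated with a small Python sketch using the standard sqlite3 module. SQLite is used here only because it runs in memory without setup; it is not a data warehouse product. The table names, column names, and numbers are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One central fact table referencing two small dimension tables:
# the "star" shape. All names and values are hypothetical.
conn.executescript("""
CREATE TABLE dim_organism (organism_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_year     (year_id     INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_submissions (
    organism_id         INTEGER REFERENCES dim_organism(organism_id),
    year_id             INTEGER REFERENCES dim_year(year_id),
    sequences_submitted INTEGER
);
INSERT INTO dim_organism VALUES (1, 'E. coli'), (2, 'H. sapiens');
INSERT INTO dim_year     VALUES (1, 2023), (2, 2024);
INSERT INTO fact_submissions VALUES
    (1, 1, 120), (1, 2, 150), (2, 1, 300), (2, 2, 280);
""")

# A typical analytical (read-only) query: aggregate the facts by a dimension.
rows = conn.execute("""
SELECT o.name, SUM(f.sequences_submitted)
FROM fact_submissions AS f
JOIN dim_organism AS o ON o.organism_id = f.organism_id
GROUP BY o.name
ORDER BY o.name
""").fetchall()

print(rows)
```

The query reads and summarizes many fact rows at once, in contrast to the single-row inserts and updates an operational database is tuned for.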

5. Tools and Examples:


• Databases use systems like MySQL, Oracle, or PostgreSQL for transactional tasks.
• Data Warehouses use tools like Amazon Redshift, Google BigQuery, or Snowflake for analytical
queries.

For upcoming activities join


VU Expert B.S Zoologists (Solution Group)

https://chat.whatsapp.com/LF8UFWqhEzxJHJzQ9bpn46

VU Expert B.S Zoologists (All semester)

https://chat.whatsapp.com/IOBrX1fPvjMLRyi64DP6XW

BIF-101 Expert Zoologists

https://chat.whatsapp.com/JHYW7fjvJRI2J62WpVk2UG

Regards:
VU Expert Zoologists
