Introduction To Bioinformatics
24 – 07 – 2023
23-07-2024
29-07-2024
What is Bioinformatics
• Bioinformatics is a multidisciplinary field that uses computer
programming, machine learning, algorithms, statistics, and
other computational tools to organize and analyze large
volumes of biological data.
• Bioinformatics tools are software programs designed to:
  • extract meaningful information from the mass of molecular
  biology data held in biological databases
  • carry out sequence or structural analysis.
>NM_001384529.1 Homo sapiens GATA zinc finger domain containing 2A (GATAD2A), transcript variant 21, mRNA
AGTGTGAGACTGAGCCGCGAGACTGAGCTGCGGCTCCGAGCGCTGCGCGGCGGCTCCTCCCGCCCAGGGT
CAGCGCCCCGGCGCGCGCACGCGCACCCCCGCCGCCCGAGCGCGCCCCGCGCCGCCCGCGCAGTCGGTCG
GTCGGTCGTCTGTCCTGTCGCCGCTGCCGCCGCCGCCACAGCGGCCGCCGCGGGCGCCACCTGAGGGAGT
CGCCTCCGCGGGACGCCACAAGACCTGACCGGACTGCGCCGCCCGAGGCCGTCGGCCGCCGTCAGCGAGG
GCGCCGAGCAACTTCGGAGCAACAGATTTGGATGAAACCCTGAGGATCCCAGAGCTGAAAGTGAGTTTGA
AGTGTCCGATCCAGTCCTTCAACTCAGAGCACTCCTATCTGTGACACCTCTGCCCACGCATCCAGTATGT
GCAGCACACCTGCTCTGTGACTGACACTCTTGCAGAAGTGGGGCCACTTCAGGGACATGGACAAGGTGTT
GTACCTGCTGTCACAGAGCCTGTTATCTGAATGACCGAAGAAGCATGCCGAACACGGAGTCAGAAACGAG
CGCTTGAACGGGACCCAACAGAGGACGATGTGGAGAGCAAGAAAATAAAAATGGAGAGAGGATTGTTGGC
TTCAGATTTAAACACTGACGGAGACATGAGGGTGACACCTGAGCCGGGAGCAGGTCCAACCCAAGGATTG
CTGAGGGCAACAGAGGCCACGGCCATGGCCATGGGCAGAGGCGAAGGGCTGGTGGGCGATGGGCCCGTGG
ACATGCGCACCTCACACAGTGACATGAAGTCCGAGAGGAGACCCCCCTCACCTGACGTGATTGTGCTCTC
CGACAACGAGCAGCCCTCGAGCCCGAGAGTGAATGGGCTGACCACGGTGGCCTTGAAGGAGACTAGCACC
GAGGCCCTCATGAAAAGCAGTCCTGAAGAACGAGAAAGGATGATCAAGCAGCTGAAGGAAGAATTGAGGT
TAGAAGAAGCAAAACTCGTGTTGTTGAAAAAGTTGCGGCAGAGTCAAATACAAAAGGAAGCCACCGCCCA
GAAGCCCACAGGTTCTGTTGGGAGCACCGTGACCACCCCTCCCCCGCTTGTTCGGGGCACTCAGAACATT
CCTGCTGGCAAGCCATCACTCCAGACCTCTTCAGCTCGGATGCCCGGCAGTGTCATACCCCCGCCCCTGG
TCCGAGGTGGGCAGCAGGCGTCCTCGAAGCTGGGGCCACAGGCGAGCTCACAGGTCGTCATGCCCCCACT
CGTCAGGGGGGCTCAGCAAATCCACAGCATTAGGCAACATTCCAGCACAGGGCCACCGCCCCTCCTCCTG
GCCCCCCGGGCGTCGGTGCCCAGTGTGCAGATTCAGGGACAGAGGATCATCCAGCAGGGCCTCATCCGCG
TCGCCAATGTTCCCAACACCAGCCTGCTCGTCAACATCCCACAGCCCACCCCAGCATCACTGAAGGGGAC
AACAGCCACCTCCGCTCAGGCCAACTCCACCCCCACTAGTGTGGCCTCTGTGGTCACCTCTGCCGAGTCT
CCAGCAAGCCGACAGGCGGCCGCCAAGCTGGCGCTGCGCAAACAGCTGGAGAAGACGCTACTCGAGATCC
CCCCACCCAAGCCCCCAGCCCCAGAGATGAACTTCCTGCCCAGCGCCGCCAACAACGAGTTCATCTACCT
GGTCGGCCTGGAGGAGGTGGTGCAGAACCTACTGGAGACACAAGGCAGGATGTCGGCCGCCACTGTGCTG
TCCCGGGAGCCCTACATGTGTGCACAGTGCAAGACGGACTTCACGTGCCGCTGGCGGGAGGAGAAGAGCG
GCGCCATCATGTGTGAGAACTGCATGACAACCAACCAGAAGAAGGCGCTCAAGGTGGAGCACACCAGCCG
GCTGAAGGCCGCCTTTGTGAAGGCGCTGCAGCAGGAACAGGAGATTGAGCAGCGGCTCCTGCAGCAGGGC
ACGGCCCCTGCACAGGCCAAGGCCGAGCCCACCGCTGCCCCACACCCCGTGCTGAAGCAGGTCATAAAAC
CCCGGCGTAAGTTGGCGTTCCGCTCAGGAGAGGCCCGCGACTGGAGTAACGGGGCTGTGCTACAGGCCTC
CAGCCAGCTGTCCCGGGGTTCGGCCACGACGCCCCGAGGTGTCCTGCACACGTTCAGTCCGTCACCCAAA
CTGCAGAACTCAGCCTCGGCCACAGCCCTGGTCAGCAGGACCGGCAGACATTCTGAGAGAACCGTGAGCG
CCGGCAAGGGCAGCGCCACCTCCAACTGGAAGAAGACGCCCCTCAGCACAGGCGGGACCCTTGCGTTTGT
CAGCCCAAGCCTGGCGGTGCACAAGAGCTCCTCGGCCGTGGACCGCCAGCGAGAGTACCTCCTGGACATG
ATCCCACCCCGCTCCATCCCCCAGTCAGCCACGTGGAAATAGTGCGAGCCAGGCCCCGTGGAAGACGGGC
TCCCTCCTCCCCCACCTGGCCCCTGGTCTAGAAGGACCCACTGCACCACCCTCCGCTGGCTCGGGAAGAC
ACCGTGCCCGCCCCAAGAGCAAGCACCGGCCATGCTGCAGAGGCAAGACCTCAATTCTTGGCTGCAAAGT
TTCATCAGGGCTAGGGGGCTGGTGCCGCCTCATAGGCAGACGAGGATCATCGCTGGGGGACCTTTCCCGT
GGGCTTTCTTCCTTTCTCTCTTTGCCTTTAGTTTGCCCGACACCAGCAGAAAAGTGGACCTTGGGGGCTG
GTTCTGCTCCTGGCCCCCTTGTTCAGCCCCTGCCGGCACACGGGCGGCTCACCCTGGACACTGTGATGCG CAT
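A record like the one above is in FASTA format: a header line starting with ">" followed by wrapped sequence lines. The following is a minimal sketch, not a reference implementation, of how such text could be read and summarized in plain Python. The function names `parse_fasta` and `gc_content` are illustrative assumptions, and the demo uses only the header and the first sequence line of the record shown on the slide.

```python
from typing import Iterator, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are wrapped; join them into one string.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


# Excerpt only: header plus the first sequence line of the slide's record.
example = """>NM_001384529.1 Homo sapiens GATAD2A, transcript variant 21, mRNA (excerpt)
AGTGTGAGACTGAGCCGCGAGACTGAGCTGCGGCTCCGAGCGCTGCGCGGCGGCTCCTCCCGCCCAGGGT
"""

for header, seq in parse_fasta(example):
    accession = header.split()[0]
    print(accession, len(seq), round(gc_content(seq), 3))
```

In practice, an established library such as Biopython would usually be preferred over a hand-written parser; the sketch is only meant to show what the file format contains.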
Protein sequence
>AAA40590.1 insulin [Octodon degus]
MAPWMHLLTVLALLALWGPNSVQAYSSQHLCGSNLVEALYMTCGRSGFYRP
HDRRELEDLQVEQAELGLEAGGLQPSALEMILQKRGIVDQCCNNICTFNQLQ
NYCNVP
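As a hedged illustration of simple protein sequence analysis, the sketch below joins the wrapped lines of the insulin record above and counts how often each amino acid occurs. The helper names (`read_single_fasta`, `residue_composition`, `nonstandard_residues`) are assumptions made for this example and do not belong to any particular library.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def read_single_fasta(text):
    """Return (header, sequence) for text holding exactly one FASTA record."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    return lines[0][1:], "".join(lines[1:]).upper()


def residue_composition(seq):
    """Count occurrences of each residue letter in the sequence."""
    return Counter(seq)


def nonstandard_residues(seq):
    """Return the set of letters that are not standard amino-acid codes."""
    return set(seq) - STANDARD_AMINO_ACIDS


insulin_fasta = """>AAA40590.1 insulin [Octodon degus]
MAPWMHLLTVLALLALWGPNSVQAYSSQHLCGSNLVEALYMTCGRSGFYRP
HDRRELEDLQVEQAELGLEAGGLQPSALEMILQKRGIVDQCCNNICTFNQLQ
NYCNVP
"""

header, sequence = read_single_fasta(insulin_fasta)
print(header)
print("length:", len(sequence))
print("most common residues:", residue_composition(sequence).most_common(3))
print("non-standard letters:", nonstandard_residues(sequence))
```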
Sequencing data
• The central role of bioinformatics in modern biological investigation based on the ‘omics’
sciences.
• An organism's transcriptome is the sum of all of its RNA transcripts.
• Proteomics is the large-scale study of proteins.
Abdalla, H.B. A brief survey on big data: technologies, terminologies and data-intensive
applications. J Big Data 9, 107 (2022). https://doi.org/10.1186/s40537-022-00659-3
3. Data Analysis:
• This feature of big data allows researchers to analyze the data obtained from
experiments.
• GATK (Genome Analysis Toolkit) is a MapReduce-based programming framework
used for large-scale DNA sequence analysis.
• BlueSNP is an R package for highly scalable genome-wide association studies on Hadoop
clusters.
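To make the MapReduce idea mentioned above concrete, here is a small sketch in plain Python that counts k-mers (substrings of length k) across a set of reads. It does not use the GATK or Hadoop APIs; the function names and the toy reads are assumptions chosen for illustration. The "map" step processes each read independently, and the "reduce" step merges the partial results, which is what allows such work to be spread across many machines.

```python
from collections import Counter
from functools import reduce


def map_kmers(read, k):
    """Map step: count the k-mers found in a single read."""
    # Reads shorter than k produce an empty range and therefore an empty Counter.
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def reduce_counts(left, right):
    """Reduce step: merge two partial k-mer counts."""
    return left + right


def count_kmers(reads, k):
    """Apply the map step to every read, then fold the results together."""
    partial_counts = map(lambda read: map_kmers(read, k), reads)
    return reduce(reduce_counts, partial_counts, Counter())


# Toy reads, invented for this example.
reads = ["ATGAT", "GAT", "TGATG"]
totals = count_kmers(reads, 3)
for kmer, count in sorted(totals.items()):
    print(kmer, count)
```

Because each read is mapped independently and the merge is associative, the same two functions could in principle be run on separate partitions of the data and combined afterwards.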
And then….
• DNA sequencing analysts help develop models
and algorithms to manage and analyze Sanger sequencing and
NGS data.
• A genome analyst can:
  • provide insights into the risk factors for genetic disease in a specific individual
  • find new targets for drugs
  • help develop personalized medicine.
• A genome analyst can also help with study design, such as the
selection of patients for clinical trials based on their genetic
makeup.
• Bioinformatician
• Bioinformatics Scientist
• Computational Analyst/Biologist
• Genome Analyst
• Genomic Data Analyst
• Rare Disease Analyst/Cancer Analyst
• https://omicstutorials.com/bioinformatics-tools-softwares-programmes/