Bioinformatics

The document discusses the field of bioinformatics, including its origins, key areas, and applications. Bioinformatics uses computational tools to analyze and interpret biological data, and it has many uses like structural genomics, molecular medicine, personalized medicine, and microbial genome applications such as waste cleanup and climate change studies.

Bioinformatics- Introduction and Applications

 With a large number of prokaryotic and eukaryotic genomes completely
sequenced and more forthcoming, access to the genomic information and
synthesizing it for the discovery of new knowledge have become central themes
of modern biological research.
 Mining the genomic information requires the use of sophisticated computational
tools.
 It is therefore imperative for the new generation of biologists to become
familiar with a field of study concerned with the careful storage,
organization and indexing of information in order to tackle the new challenges
of the genomic era.
 Information science has been applied to biology to produce a field called
bioinformatics.
 It is concerned with the state-of-the-art computational tools available to solve
biological research problems.
 The term bioinformatics was coined by Paulien Hogeweg and Ben Hesper to
describe “the study of informatic processes in biotic systems” and it found early
use when the first biological sequence data began to be shared.
 Bioinformatics is an interdisciplinary field that develops methods and
software tools for understanding biological data.
 The development of bioinformatics as a field is the result of advances in both
molecular biology and computer science over the past 30–40 years.
 As an interdisciplinary field of science, bioinformatics combines biology,
computer science, information engineering, mathematics and statistics to
analyze and interpret biological data.
 The key areas of bioinformatics include biological databases, sequence
alignment, gene and promoter prediction, molecular phylogenetics, structural
bioinformatics, genomics, and proteomics.
Applications of Bioinformatics
Bioinformatics has not only become essential for basic genomic and molecular
biology research, but is having a major impact on many areas of biotechnology and
biomedical sciences. The main uses of bioinformatics include:
 Bioinformatics plays a vital role in the areas of structural genomics, functional
genomics, and nutritional genomics.
 It covers emerging scientific research and the exploration of proteomes from the
overall level of intracellular protein composition (protein profiles), protein
structure, protein-protein interaction, and unique activity patterns (e.g. post-
translational modifications).
 Bioinformatics is used for transcriptome analysis where mRNA expression levels
can be determined.
 Bioinformatics is used to identify and structurally modify a natural product, to
design a compound with the desired properties and to assess its therapeutic
effects, theoretically.
 Cheminformatics includes analyses such as similarity searching,
clustering, QSAR modeling, and virtual screening.
 Bioinformatics is playing an increasingly important role in almost all aspects of
drug discovery and drug development.
 Bioinformatics tools are very effective in prediction, analysis and interpretation
of clinical and preclinical findings.
Molecular medicine
 The human genome will have profound effects on the fields of biomedical
research and clinical medicine.
 The completion of the human genome and the use of bioinformatic tools means
that we can search for the genes directly associated with different diseases and
begin to understand the molecular basis of these diseases more clearly.
 This new knowledge of the molecular mechanisms of disease will enable better
treatments, cures and even preventative tests to be developed.
Personalised medicine
 Clinical medicine will become more personalised with the development of the
field of pharmacogenomics.
 This is the study of how an individual’s genetic inheritance affects the body’s
response to drugs.
 Today, doctors have to use trial and error to find the best drug to treat a
particular patient as those with the same clinical symptoms can show a wide
range of responses to the same treatment.
 In the future, doctors will be able to analyse a patient’s genetic profile and
prescribe the best available drug therapy and dosage from the beginning.
Preventative medicine
 With the specific details of the genetic mechanisms of diseases being unravelled,
the development of diagnostic tests to measure a person’s susceptibility to
different diseases may become a distinct reality.
Gene therapy
 In the not too distant future, with the use of bioinformatics tools, the potential
for using genes themselves to treat disease may become a reality.
 Gene therapy is the approach used to treat, cure or even prevent disease by
changing the expression of a person’s genes.
Drug development
 At present all drugs on the market target only about 500 proteins.
 With an improved understanding of disease mechanisms and using
computational tools to identify and validate new drug targets, more specific
medicines that act on the cause, not merely the symptoms, of the disease can be
developed.
 These highly specific drugs promise to have fewer side effects than many of
today’s medicines.
Microbial genome applications
 The arrival of the complete genome sequences and their potential to provide a
greater insight into the microbial world and its capacities could have broad and
far reaching implications for environment, health, energy and industrial
applications.
 For these reasons, in 1994, the US Department of Energy (DOE) initiated the
MGP (Microbial Genome Project) to sequence genomes of bacteria useful in
energy production, environmental cleanup, industrial processing and toxic waste
reduction.
 By studying the genetic material of these organisms, scientists can begin to
understand these microbes at a very fundamental level and isolate the genes
that give them their unique abilities to survive under extreme conditions.
Waste cleanup
 Deinococcus radiodurans is known as the world’s toughest bacterium and it is the
most radiation-resistant organism known.
 Scientists are interested in this organism because of its potential usefulness in
cleaning up waste sites that contain radiation and toxic chemicals.
Climate change studies
 Increasing levels of carbon dioxide emission, mainly through the expanding use
of fossil fuels for energy, are thought to contribute to global climate change.
 Recently, the DOE (Department of Energy, USA) launched a program to decrease
atmospheric carbon dioxide levels.
 One method of doing so is to study the genomes of microbes that use carbon
dioxide as their sole carbon source.
Alternative energy sources
 Scientists are studying the genome of the microbe Chlorobium tepidum, which
has an unusual capacity for generating energy from light.
Biotechnology
 The archaeon Archaeoglobus fulgidus and the bacterium Thermotoga
maritima have potential for practical applications in industry and government-
funded environmental remediation.
 These microorganisms thrive at very high water temperatures, close to the boiling
point, and therefore may provide the DOE, the Department of Defence, and private
companies with heat-stable enzymes suitable for use in industrial processes.
 Other industrially useful microbes include Corynebacterium glutamicum, which is
of high industrial interest as a research object because it is used by the chemical
industry for the biotechnological production of the amino acid lysine.
 The substance is employed as a source of protein in animal nutrition.
 Biotechnologically produced lysine is added to feed concentrates as a source of
protein, and is an alternative to soybeans or meat and bonemeal.
 Lactococcus lactis is one of the most important micro-organisms involved in the
dairy industry.
 Researchers anticipate that understanding the physiology and genetic make-up
of this bacterium will prove invaluable for food manufacturers as well as the
pharmaceutical industry, which is exploring the capacity of L. lactis to serve as a
vehicle for delivering drugs.
Antibiotic resistance
 Scientists have been examining the genome of Enterococcus faecalis, a leading
cause of bacterial infection among hospital patients.
 They have discovered a virulence region made up of a number of antibiotic-
resistant genes that may contribute to the bacterium’s transformation from a
harmless gut bacterium to a menacing invader.
 The discovery of the region, known as a pathogenicity island, could provide
useful markers for detecting pathogenic strains and help to establish controls to
prevent the spread of infection in wards.
Forensic analysis of microbes
 Scientists used their genomic tools to help distinguish the strain
of Bacillus anthracis that was used in the 2001 terrorist attack in
Florida from closely related anthrax strains.
The reality of bioweapon creation
 Scientists have recently built poliovirus, the virus that causes poliomyelitis,
using entirely artificial means.
 They did this using genomic data available on the Internet and materials from a
mail-order chemical supply.
 The research was financed by the US Department of Defence as part of a
biowarfare response program to prove to the world the reality of bioweapons.
 The researchers also hope their work will discourage officials from ever relaxing
programs of immunisation.
 This project has been met with very mixed feelings.
Evolutionary studies
 The sequencing of genomes from all three domains of life (eukaryota, bacteria
and archaea) means that evolutionary studies can be performed in a quest to
determine the tree of life and the last universal common ancestor.
Crop improvement
 Comparative genetics of the plant genomes has shown that the organisation of
their genes has remained more conserved over evolutionary time than was
previously believed.
 These findings suggest that information obtained from the model crop systems
can be used to suggest improvements to other food crops.
 At present the complete genomes of Arabidopsis thaliana (thale cress) and
Oryza sativa (rice) are available.
Insect resistance
 Genes from Bacillus thuringiensis that can control a number of serious pests
have been successfully transferred to cotton, maize and potatoes.
 This new ability of the plants to resist insect attack means that the amount of
insecticides being used can be reduced and hence the nutritional quality of the
crops is increased.
Improve nutritional quality
 Scientists have recently succeeded in transferring genes into rice to increase
levels of Vitamin A, iron and other micronutrients.
 This work could have a profound impact in reducing occurrences of blindness
and anaemia caused by deficiencies in Vitamin A and iron respectively.
 Scientists have inserted a gene from yeast into the tomato, and the result is a
plant whose fruit stays longer on the vine and has an extended shelf life.
Development of drought-resistant varieties
 Progress has been made in developing cereal varieties that have a greater
tolerance for soil alkalinity, free aluminium and iron toxicities.
 These varieties will allow agriculture to succeed in poorer soil areas, thus adding
more land to the global production base.
 Research is also in progress to produce crop varieties capable of tolerating
reduced water conditions.
Veterinary Science
 Sequencing projects of many farm animals including cows, pigs and sheep are
now well under way in the hope that a better understanding of the biology of
these organisms will have huge impacts for improving the production and health
of livestock and ultimately have benefits for human nutrition.
Comparative Studies
 Analysing and comparing the genetic material of different species is an
important method for studying the functions of genes, the mechanisms of
inherited diseases and species evolution.
 Bioinformatics tools can be used to make comparisons between the numbers,
locations and biochemical functions of genes in different organisms.
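One simple form of such a comparison is the percentage of identical positions between two aligned sequences. The sketch below is illustrative only: the function name `percent_identity` and the three short sequences are made up for this example and do not come from any real genome.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Return the percentage of aligned positions at which two sequences match.

    Assumes the sequences are already aligned (equal length, '-' for gaps).
    A gap never counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b and a != "-"
    )
    return 100.0 * matches / len(seq_a)


# Made-up gene fragments from three hypothetical species.
species_a = "ATGGTGCACCTGACT"
species_b = "ATGGTGCATCTGACT"
species_c = "ATGGTTCATTTGACT"

for name, other in (("species_b", species_b), ("species_c", species_c)):
    print(f"species_a vs {name}: {percent_identity(species_a, other):.1f}% identity")
```

In this toy data species_b differs from species_a at one position and species_c at three, so species_b scores as the closer relative. Real comparative studies first align the sequences with a dedicated alignment tool.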
Bioinformatics
Bioinformatics is an emerging field of science that deals with the application of
computers to the collection, organization, analysis, manipulation, presentation,
and sharing of biological data.
 Bioinformatics is an interdisciplinary field directly involving molecular biology,
genetics, computer science, mathematics, and statistics.
 The central component of bioinformatics is the study of the best ways to
design and operate biological databases.
 As a large amount of nucleotide and protein sequence data are obtained via
various research techniques, along with other types of information stored in
primary and secondary biological databases, scientists started to use
computers to obtain and analyze biological data in their daily research with
bioinformatics tools.
 To help the biologists access the databases effectively and use the analysis
tools efficiently, bioinformatics has eventually become a vital part of biological
education.
 Bioinformatics is an evolving discipline, and complex software programs are
now being used for retrieving, sorting out, analyzing, predicting, and storing
DNA and protein sequence data.
 One of the fundamental activities in bioinformatics is the sequence analysis of
DNA and proteins using various programs and databases available on the
world wide web.
 Large commercial enterprises such as pharmaceutical companies employ
bioinformaticians to perform and maintain the large scale and complex
bioinformatics needs of these industries.
 Apart from the analysis of genome sequence data, bioinformatics is now
being used for a vast array of other vital tasks, including the analysis of gene
variation and expression and the analysis and prediction of gene and protein
structure and function.
 Bioinformatics is also important in tasks such as the prediction and
detection of gene regulation networks and the presentation and analysis of
molecular pathways in order to understand gene-disease interactions.
 Bioinformatics even has clinical applications: whole-genome sequencing
allows a complete list of human gene products to be produced, which may
provide new drugs, and gene therapy for single-gene diseases may become routine.
 Bioinformatics can also be applied to many other areas of biology, across
different groups of living organisms.
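As a concrete illustration of basic sequence analysis, the sketch below computes GC content, the reverse complement, and a protein translation for a DNA string. It uses only the Python standard library and the standard genetic code; the function names and the short example sequence are assumptions made for this illustration, not part of any particular bioinformatics package.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(dna: str) -> float:
    """Fraction of bases that are G or C."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


def reverse_complement(dna: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


example = "ATGGCCATTGTAATGGGCCGCTGA"  # made-up coding sequence
print(round(gc_content(example), 3), reverse_complement(example), translate(example))
```

The sketch assumes an unambiguous A/C/G/T sequence; ambiguity codes such as N would need extra handling.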
Biological Databases
Biological databases are archives of biological data, including genetic and protein
sequences, annotations, pathways, and disease information. These databases are
used for the storage and organization of data in a way that allows easy retrieval of
information.
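The idea of storing and organizing records so that they can be retrieved easily can be sketched with a tiny in-memory database. This example uses Python's built-in `sqlite3` module; the table layout, the accession identifiers (`DEMO0001` and so on), and the organism names are all hypothetical and far simpler than the schema of any real biological database.

```python
import sqlite3


def build_database(records):
    """Create an in-memory table of sequence records keyed by accession."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequences ("
        "accession TEXT PRIMARY KEY, organism TEXT NOT NULL, sequence TEXT NOT NULL)"
    )
    # An index on organism makes lookups by species fast as the table grows.
    conn.execute("CREATE INDEX idx_organism ON sequences (organism)")
    conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def fetch_by_accession(conn, accession):
    """Return (accession, organism, sequence) or None if not found."""
    return conn.execute(
        "SELECT accession, organism, sequence FROM sequences WHERE accession = ?",
        (accession,),
    ).fetchone()


def accessions_for_organism(conn, organism):
    """Return all accessions stored for one organism, sorted alphabetically."""
    rows = conn.execute(
        "SELECT accession FROM sequences WHERE organism = ? ORDER BY accession",
        (organism,),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical records: (accession, organism, sequence).
demo_records = [
    ("DEMO0001", "Hypothetical species A", "ATGGTGCACCTGACT"),
    ("DEMO0002", "Hypothetical species B", "ATGGTGCATCTGACT"),
    ("DEMO0003", "Hypothetical species A", "ATGGCCATTGTAATG"),
]
conn = build_database(demo_records)
print(fetch_by_accession(conn, "DEMO0002"))
print(accessions_for_organism(conn, "Hypothetical species A"))
```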
Types of Biological Databases
Biological databases can be classified into the following three types based on their
contents:
Primary Databases
Primary databases are collections of unprocessed biological data, consisting of raw
sequences or structural information. These databases are repositories of original
information and are not modified in any way. Examples of primary databases include
GenBank, PDB, and DDBJ.
Secondary Databases
Secondary databases contain information that has been processed or curated using
computational or manual methods. The information in these databases is based on
the original data from primary databases. Examples of secondary databases include
PIR, SWISS-PROT, and Pfam.
Specialized Databases
Specialized databases are databases that are designed to serve a specific research
interest. These databases are created with a particular focus on a specific organism or
type of data. Examples of specialized databases include Flybase, the HIV sequence
database, and the Ribosomal Database Project.
Some of the most popular biological databases are discussed below:
 GenBank is a comprehensive and well-annotated collection of nucleic acid
sequence data developed by the National Center for Biotechnology Information
(NCBI). It contains data for nearly all types of organisms.
 EMBL (European Molecular Biology Laboratory) is a nucleotide sequence
database managed by the European Bioinformatics Institute (EBI). It is an
extensive repository of primary nucleotide sequences that stores data on DNA
and RNA, gene expression, protein, structure, pathways, and literature.
 DDBJ (DNA Data Bank of Japan) is a nucleotide sequence database that collects
and maintains nucleotide sequence data from researchers. It is operated by the
National Institute of Genetics in Japan, collaborating with the National Center for
Biotechnology Information (NCBI) and the European Molecular Biology
Laboratory (EMBL).
 PDB (Protein Data Bank) is a biological database that contains structural data of
biological macromolecules. PDB stores the three-dimensional structural data for
large biological molecules such as proteins, DNA, and RNA, determined by
experimental methods such as X-ray crystallography and NMR spectroscopy.
 PIR (Protein Information Resource) is a publicly accessible database of protein
informatics. PIR maintains three other databases: the Protein Sequence Database
(PSD), the Non-redundant Reference (NREF) database, and the integrated
Protein Classification (iProClass) database.
 PROSITE is a protein database that contains a large collection of protein
patterns or profiles. These patterns are linked to documentation providing useful
biological information on the protein family, domain, or functional site.
 Pfam is a database of protein families and domains represented by
multiple sequence alignments, profile hidden Markov models (HMMs), and
annotations. The database is accessible online and is used by researchers
worldwide for various applications, including genome annotation, protein
classification, and protein structure prediction.
 KEGG (Kyoto Encyclopedia of Genes and Genomes) is a biological database that
contains genomic, chemical, and systemic functional information used to study
molecular-level information about various cellular processes, including
metabolism, signaling, and diseases.
 OMIM (Online Mendelian Inheritance in Man) is a freely available
database of human genes and genetic disorders that contains detailed and
referenced overviews of all known Mendelian genetic disorders and over 16,000
genes.
Importance of Biological Databases
 Biological databases allow for the organization of vast amounts of biological
data in a structured manner.
 Biological databases are important resources for researchers that can aid in their
research.
 Biological databases can be used to develop new bioinformatics tools and
methods to drive further research.
 Biological databases also enable collaboration between researchers and facilitate
the sharing of data and resources.
Primary Databases- Definition, Types, Examples, Uses
Modern genomic research generates vast amounts of raw sequence data, which has
created the need for biological databases to store and organize this enormous
data. Biological databases are collections of biological data that are used for the
storage and organization of data in a way that facilitates easy retrieval of
information.
What are Primary Databases?
Primary databases are a type of biological database that contain original and
unprocessed biological data. These databases typically consist of raw sequences,
such as nucleotide or protein sequences, or structural information, such as molecular
structures.
There are several primary sequence databases available that are widely used in the
field of bioinformatics. The three main primary databases are GenBank at the
National Center for Biotechnology Information (NCBI), the DNA Data Bank of Japan
(DDBJ), and the European Molecular Biology Laboratory (EMBL). Other examples of
primary databases include Protein Data Bank (PDB), Gene Expression Omnibus (GEO),
and ArrayExpress.
GenBank
 GenBank is a primary biological database managed by the National Center for
Biotechnology Information (NCBI). It is an annotated collection of publicly
available sequences, which includes information about genes, proteins, and
other genetic elements.
 GenBank is part of the International Nucleotide Sequence Database
Collaboration (INSDC), which is a joint effort between three primary databases:
GenBank, DDBJ, and EMBL. These organizations work collaboratively to share
sequence data from around the world on a daily basis and ensure that the data
in each database is up-to-date and accurate.
 The GenBank flat file format is used to represent the sequence data and
annotations in the database.
 GenBank accepts mRNA or genomic sequence data with proper source organism
information and annotation provided by the submitter. However, the database
does not accept noncontiguous sequences, primer sequences, protein
sequences without underlying nucleotide submission, mixed genomic and
mRNA sequences, consensus sequences, or sequences with lengths of less than
200 nucleotides.
 To submit sequences to this database, there are several submission tools
available, including BankIt, Sequin, and tbl2asn.
 BankIt is a web-based submission tool that allows users to submit gene
sequences to the GenBank database. It allows for the submission of sets of
sequences.
 Sequin submission tool is used for more complex submissions, such as those
containing long sequences, multiple annotations, or gapped sequences. Sequin
is a stand-alone submission tool provided by NCBI that can be downloaded from
the FTP site for use on Mac, PC, and UNIX platforms. To ensure maximum
performance, each Sequin file should have fewer than 10,000 sequences.
 For even larger submissions, the tbl2asn submission tool should be used. Like
Sequin, tbl2asn is a stand-alone tool that can be downloaded from the FTP site.
The submitter can work offline to prepare the submission and then submit it
using tbl2asn.
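To give a feel for the flat file format and the minimum-length rule mentioned above, here is a minimal sketch that reads a heavily simplified, made-up record and checks its sequence against the 200-nucleotide threshold. The record contents, the accession `DEMO0001`, and the function names are hypothetical; real GenBank records carry many more fields (for example multi-line definitions and a feature table) that this sketch ignores.

```python
# A simplified, made-up record using a few flat file keywords.
EXAMPLE_RECORD = """\
LOCUS       DEMO0001  24 bp  DNA
DEFINITION  Made-up example sequence for illustration.
ACCESSION   DEMO0001
ORIGIN
        1 atggccattg taatgggccg ctga
//
"""


def parse_flat_file(text: str) -> dict:
    """Extract accession, single-line definition, and sequence from one record."""
    record = {"accession": None, "definition": None, "sequence": ""}
    in_sequence = False
    chunks = []
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if in_sequence:
            # Sequence lines mix position numbers, spaces, and bases; keep letters.
            chunks.append("".join(ch for ch in line if ch.isalpha()))
        elif line.startswith("DEFINITION"):
            record["definition"] = line[len("DEFINITION"):].strip()
        elif line.startswith("ACCESSION"):
            record["accession"] = line.split()[1]
        elif line.startswith("ORIGIN"):
            in_sequence = True
    record["sequence"] = "".join(chunks).upper()
    return record


def meets_minimum_length(sequence: str, minimum: int = 200) -> bool:
    """True if the sequence is at least `minimum` nucleotides long."""
    return len(sequence) >= minimum


record = parse_flat_file(EXAMPLE_RECORD)
print(record["accession"], len(record["sequence"]), meets_minimum_length(record["sequence"]))
```

The toy record is only 24 nucleotides long, so it would fall below the length threshold described in the text.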
European Molecular Biology Laboratory (EMBL)
 EMBL (European Molecular Biology Laboratory) is a collection of nucleotide
sequence data that is maintained by the European Bioinformatics Institute (EBI).
It is also a part of INSDC along with the GenBank and DDBJ databases.
 EMBL’s main focus is on the storage and distribution of nucleotide and protein
sequences, as well as providing tools and resources for researchers to analyze
and interpret this data.
 Like other primary databases, EMBL collects and archives data from various
sources, including scientific publications and direct submissions from
researchers.
 One of the main features of EMBL is its user-friendly interface, which allows
researchers to easily search for and retrieve data.
 EMBL also offers a range of tools and resources for sequence analysis, including
alignment tools, phylogenetic trees, and protein structure prediction software.
 EMBL uses a sequence submission tool called Webin. This tool is web-based and
can be accessed through EMBL’s website. With Webin, researchers can submit
single sequences, multiple sequences, or a large number of sequences.

DNA Data Bank of Japan (DDBJ)
 DDBJ (DNA Data Bank of Japan) is a primary database that collects and stores
genetic information, mainly from Japanese researchers. It also accepts submissions
from researchers in other countries and assigns accession numbers to their sequences.
 DDBJ is also a member of INSDC and regularly exchanges collected data with
EMBL and GenBank.
 Its main activities include collecting and exchanging nucleotide sequence data,
managing bioinformatics tools for data submission and retrieval, developing
tools for biological data analysis, and organizing Bioinformatics Training Courses
in Japanese to teach people how to analyze biological data.
 DDBJ uses a web-based tool called the Nucleotide Sequence Submission System
(NSSS) for sequence submissions. The NSSS replaced Sakura, which had been used
for sequence submission since 1995, beginning in November 2012. In cases where
the sequences are very long or numerous, DDBJ recommends using its Mass
Submission System (MSS).
Protein Data Bank (PDB)
 PDB (Protein Data Bank) is a global database that stores information about the
structure of biological macromolecules.
 It is managed by the Research Collaboratory for Structural Bioinformatics (RCSB)
and provides many services to help researchers access and analyze the structural
data.
 It collects and archives the 3D-atomic level structural models of these
macromolecules obtained through three commonly used experimental
techniques: crystallography, nuclear magnetic resonance spectroscopy (NMR),
and electron microscopy (3DEM).
 The database entries are mostly structures of proteins, although there are also
entries for nucleic acids, carbohydrates, and theoretical models.
 In addition to the structural models, PDB also archives experimental data,
associated metadata, and other details about the molecules.
Gene Expression Omnibus (GEO)
 GEO (Gene Expression Omnibus) is a public database that stores high-
throughput gene expression and functional genomics data.
 It was created in 2000 as a resource for gene expression studies but has since
expanded to include other types of data such as genome methylation and
chromatin structure.
 The database requires that researchers provide raw data, processed data, and
descriptive metadata.
 The original submitter-supplied GEO records are of 3 types: Platform, Sample,
and Series. Platform describes the array or sequencer used, Sample describes the
source and analysis of the sample, and Series links related Samples and
describes a whole study.
 These submitted records are curated into two further record types: DataSet and
Profile. A DataSet is a curated collection of comparable Samples that share a
common set of array elements. A Profile consists of expression measurements for a
gene across all Samples in a DataSet.
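The relationships between these record types can be sketched as a small data model. The classes below are a loose illustration of the description above, not GEO's actual schema or file format; every field name, identifier, and expression value is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Platform:
    accession: str
    description: str  # the array or sequencer used


@dataclass
class Sample:
    accession: str
    platform: Platform
    source: str
    values: Dict[str, float]  # measurement per array element (e.g. per gene)


@dataclass
class Series:
    accession: str
    title: str
    samples: List[Sample] = field(default_factory=list)  # links related Samples


@dataclass
class DataSet:
    accession: str
    samples: List[Sample]

    def elements(self) -> List[str]:
        """Array elements measured in every Sample of the DataSet."""
        common = set(self.samples[0].values)
        for sample in self.samples[1:]:
            common &= set(sample.values)
        return sorted(common)

    def profile(self, element: str) -> Dict[str, float]:
        """Expression measurements for one element across all Samples."""
        return {s.accession: s.values[element] for s in self.samples}


# Hypothetical records for illustration.
array = Platform("platform_demo", "Made-up expression array")
control = Sample("sample_1", array, "control tissue", {"geneA": 3.0, "geneB": 11.0})
treated = Sample("sample_2", array, "treated tissue", {"geneA": 15.0, "geneB": 10.0})
study = Series("series_demo", "Made-up treatment study", [control, treated])
dataset = DataSet("dataset_demo", study.samples)

print(dataset.elements())
print(dataset.profile("geneA"))
```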

Applications of Primary Databases
 Primary databases such as GenBank and EMBL can be used as a reference for
genome analysis and comparison.
 The primary database PDB can be used for protein structure identification.
 Primary databases such as Gene Expression Omnibus (GEO) contain
transcriptome data that can be analyzed to identify differentially expressed
genes and to understand gene expression.
 Databases such as KEGG can be used to obtain information on metabolic
and signaling pathways in various organisms.
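A very simple way to flag candidate differentially expressed genes in transcriptome data is the log2 fold change between two conditions. The sketch below is an assumption-laden illustration: the gene names and expression values are made up, the pseudocount of 1 and the threshold of 1 (a two-fold change) are arbitrary choices, and no statistical test is applied, which a real analysis would require.

```python
import math
from statistics import mean
from typing import Dict, Sequence, Tuple


def log2_fold_change(
    control: Sequence[float], treated: Sequence[float], pseudocount: float = 1.0
) -> float:
    """log2 of the ratio of mean expression (treated over control).

    The pseudocount avoids division by zero for unexpressed genes.
    """
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


def differentially_expressed(
    expression: Dict[str, Tuple[Sequence[float], Sequence[float]]],
    threshold: float = 1.0,
) -> Dict[str, float]:
    """Return genes whose absolute log2 fold change reaches the threshold."""
    hits = {}
    for gene, (control, treated) in expression.items():
        change = log2_fold_change(control, treated)
        if abs(change) >= threshold:
            hits[gene] = change
    return hits


# Made-up expression values: gene -> (control replicates, treated replicates).
expression = {
    "geneA": ([3, 3, 3], [15, 15, 15]),
    "geneB": ([10, 12, 11], [11, 10, 12]),
    "geneC": ([31, 31, 31], [7, 7, 7]),
}
print(differentially_expressed(expression))
```

In this toy data geneA is up-regulated, geneC is down-regulated, and geneB is unchanged, so only the first two are reported.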
Secondary Databases- Definition, Types, Examples, Uses
Secondary databases are databases derived from primary databases; they contain
manually curated or computationally processed information.
The amount of computational processing work in secondary databases varies greatly,
depending on the level of information they provide. Some secondary databases may
simply archive translated sequence data, while others may provide extensive
annotations and information on structure and function.
There are different secondary databases available that contain information on
biological sequences and their attributes, such as expression, structure, function, and
interactions. Some examples of secondary databases are SWISS-PROT, PROSITE,
Pfam, PRINTS, and BLOCKS.
1. SWISS-PROT
 SWISS-PROT is a well-known and widely used secondary database of protein
sequences that provides detailed annotation, including information on structure,
function, and protein family assignment.
 The sequence data is primarily derived from the TrEMBL database, which stores
translated nucleic acid sequences.
 SWISS-PROT stands out from other protein databases for its detailed
annotations, minimal redundancy, and integration with other databases.
 Annotations in SWISS-PROT provide detailed information on protein function,
post-translational modifications, domains and sites, secondary and quaternary
structure, similarities to other proteins, diseases associated with deficiencies in
the protein, sequence conflicts, and variants.
 SWISS-PROT is popular for its low redundancy and high level of integration with
other databases.
2. PROSITE
 PROSITE is a database of protein families, domains, and functional sites that
contains manually curated information on amino acid patterns and profiles of
proteins.
 It is a secondary protein database that provides tools for the analysis of protein
sequences and the identification of motifs.
 The database contains a large collection of signature patterns or profiles that
hold biological importance. Each signature is associated with important
biological information such as protein family, domain, or functional site.
 PROSITE uses two types of signatures, patterns and generalized profiles, to
identify conserved regions.
 These signatures can be used to predict the function and structure of proteins
and help in the annotation of new protein sequences.
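Pattern signatures of this kind can be searched with ordinary regular expressions. The sketch below converts a subset of the PROSITE pattern syntax (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats, `<` and `>` for sequence ends) into a Python regex. It is a simplified illustration rather than the official PROSITE scanning software; the function names and the test sequence are made up, and rarer syntax such as `>` inside square brackets is not handled.

```python
import re
from typing import List, Tuple

# One pattern element, optionally followed by a repeat count like (2) or (2,4).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern (supported subset) to a regex string."""
    pattern = pattern.strip().rstrip(".")
    anchored_start = pattern.startswith("<")
    anchored_end = pattern.endswith(">")
    pattern = pattern.lstrip("<").rstrip(">")
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            piece = "."  # any residue
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"  # any residue except these
        else:
            piece = core  # a single residue or a [..] set
        if low is not None:
            piece += "{" + low + ("," + high if high else "") + "}"
        parts.append(piece)
    return ("^" if anchored_start else "") + "".join(parts) + ("$" if anchored_end else "")


def find_sites(sequence: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


# The N-glycosylation site pattern, scanned against a made-up sequence.
print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_sites("MNGSAANPTA", "N-{P}-[ST]-{P}"))
```

A lookahead is used in `find_sites` so that overlapping occurrences of a motif are all reported.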
3. Pfam
 Pfam is another secondary database of protein families and domains that are
represented by multiple sequence alignments, profile hidden Markov models
(HMMs), and annotations.
 The database is accessible online and is used by researchers worldwide for a
variety of applications, including genome annotation, protein classification, and
protein structure prediction.
 Pfam has two components. Pfam-A stores manually curated high-quality entries.
Pfam-B stores automatically generated lower-quality entries.
 Pfam provides a platform for the analysis of protein sequence data, which allows
researchers to search for related proteins in the database based on the presence
of specific protein domains.
4. PRINTS
 The PRINTS database contains protein family fingerprints, which are groups of
conserved motifs.
 PRINTS is one of several widely used pattern databases, including PROSITE,
BLOCKS, and Pfam, each with different strengths and weaknesses.
 PRINTS uses a fingerprinting method that detects distant relatives of large and
highly divergent protein superfamilies by exploiting conserved regions within
sequence alignments.
5. BLOCKS
 BLOCKS is a collection of ungapped multiple alignments of segments of related
protein sequences, called blocks, that represent the most conserved regions of
proteins.
 It contains blocks for a wide variety of protein families, including enzymes,
receptors, transporters, and structural proteins.
 Each block is assigned a unique identifier and annotated with information about
the proteins it represents, including their names, functions, and structures.
 The database is widely used as a tool for protein family classification, protein
structure prediction, and functional annotation.
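Because a block is an ungapped alignment, each column can be summarized directly. The sketch below counts residues per column, derives a consensus, and reports how conserved each position is. The five-residue block is invented for illustration, and the conservation measure (frequency of the most common residue) is one simple choice among many; it is not the scoring scheme used by the BLOCKS database itself.

```python
from collections import Counter
from typing import List


def column_counts(block: List[str]) -> List[Counter]:
    """Count residues in each column of an ungapped alignment block."""
    if len({len(segment) for segment in block}) != 1:
        raise ValueError("block must be non-empty with equal-length segments")
    return [Counter(column) for column in zip(*block)]


def consensus(block: List[str]) -> str:
    """Most common residue at each position."""
    return "".join(counts.most_common(1)[0][0] for counts in column_counts(block))


def conservation(block: List[str]) -> List[float]:
    """Fraction of segments that carry the most common residue, per column."""
    return [
        counts.most_common(1)[0][1] / len(block) for counts in column_counts(block)
    ]


# A made-up block of four aligned protein segments.
block = ["GKSTL", "GKTTL", "GKSSL", "AKSTL"]
print(consensus(block))
print(conservation(block))
```

Columns with a conservation of 1.0 are identical in every segment, which is the kind of position that often marks a functionally important residue.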
Applications of Secondary Databases
 Secondary databases can be used to predict the structure and function of
proteins by identifying homologous proteins with known structures.
 Secondary databases contain functional annotation information which helps to
better understand the roles of proteins in different organisms.
 Secondary databases also help to identify conserved regions within a sequence,
which can help to identify important functional domains and motifs.
 Secondary databases also help in evolutionary analysis by comparing protein
sequences across different species to study the evolution of proteins.
 Secondary databases can also be used to identify potential drug targets by
analyzing protein families and identifying conserved motifs that are essential for
protein function.
What is Internet? Definition, Uses, Working, Advantages and Disadvantages
The Internet is one of the most important tools and resources in use today, relied
on by people across the globe. It connects millions of computers, webpages,
websites, and servers. Using the Internet we can send emails, photos, videos, and
messages to our loved ones. In other words, the Internet is a widespread
interconnected network of computers and other Internet-capable electronic devices.
It creates a communication medium for sharing and retrieving information online.
Only when your device is connected to the Internet can you access online
applications, websites, social media apps, and many other services. The Internet is
now considered the fastest medium for sending and receiving information.
History of the Internet
The Internet began in the 1960s with the creation of its first working model,
ARPANET (Advanced Research Projects Agency Network). It allowed multiple
computers to work on a single network, which was its biggest achievement at that
time. ARPANET used packet switching to let multiple computer systems communicate
over a single network. In October 1969, the first message was transferred from one
computer to another over ARPANET. The technology has continued to grow ever since.
How is the Internet Set Up?
The Internet is set up with the help of physical data transmission media, such as
optical fiber cables and copper wires, which link networks of different scales
(LAN, MAN, WAN, etc.). Even 2G, 3G, and 4G services and Wi-Fi depend on these
physical cable setups to reach the Internet. An organization named ICANN (Internet
Corporation for Assigned Names and Numbers), located in the USA, coordinates the
Internet's naming and numbering systems, such as domain names and IP addresses.
How Does the Internet Work?
The Internet works with the help of clients and servers. Here the client is a
device, such as a laptop, that requests information over the Internet, and servers
are the large computers that store websites and answer those requests. Servers are
connected to the Internet with the help of ISPs (Internet Service Providers) and
are identified by their IP addresses.
Each website has its own domain name, as it is difficult for any person to always
remember long numbers or strings. Computers, however, locate each other by IP
address and cannot use the domain name directly. So, whenever you enter a domain
name in the address bar of the browser, the browser first asks for the matching IP
address in a huge phone directory that in networking is known as a DNS server
(Domain Name System server). It is as simple as looking up a person's Aadhaar
number in a long directory when we know their name.
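The phone-directory analogy above can be sketched in a few lines of Python. This is only an illustration: the directory entries, the `.test` domain names, and the function names are hypothetical, and a real lookup is delegated to the operating system's resolver, here through the standard `socket.gethostbyname` call.

```python
import socket

# Hypothetical directory entries, invented for illustration only.
TOY_DIRECTORY = {
    "example.test": "192.0.2.10",
    "library.test": "192.0.2.20",
}


def lookup(domain, directory=TOY_DIRECTORY):
    """Return the IP address recorded for a domain, or None if it is unknown."""
    # Domain names are case-insensitive, so normalise before searching.
    return directory.get(domain.lower())


def lookup_real(domain):
    """Ask the operating system's resolver instead (needs network access)."""
    return socket.gethostbyname(domain)


if __name__ == "__main__":
    print(lookup("example.test"))
    print(lookup("unknown.test"))
```

The toy directory keeps the example runnable offline; `lookup_real` shows where an actual DNS query would take its place.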
After getting the IP address, the browser passes the request on to the respective
server, and the server processes the request and returns the content of the website
that the client wants. If you are using a wireless medium such as 3G, 4G, or other
mobile data, the data flows through optical cables to cell towers, and from there
the signals reach your cell phones and PCs as electromagnetic waves. If you are
using a router, the optical fiber connected to it carries light signals, which are
converted into electrical signals, and Ethernet cables then bring the Internet, and
hence the required information, to your computers.
What is an IP Address?
IP address stands for Internet Protocol address. Every PC or local machine has an
IP address, which is provided by the Internet Service Provider (ISP). The Internet
Protocol itself is the set of rules that governs the flow of data whenever a device
is connected to the Internet, and the IP address is what differentiates computers,
websites, and routers from one another, just as identification documents such as
Aadhaar cards or PAN cards identify people. Every laptop and desktop has its own
IP address for identification, which makes it an important part of Internet
technology. An IP address (in IPv4) is displayed as a set of four numbers separated
by dots, like 192.154.3.29. Each number in the set ranges from 0 to 255. Hence, IP
addresses range from 0.0.0.0 to 255.255.255.255.
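As a minimal sketch of the format described above, the following Python splits a dotted address into its four numbers, checks that each lies between 0 and 255, and packs them into the single 32-bit value they represent. The function names are hypothetical and chosen only for this example.

```python
def parse_ipv4(text):
    """Split a dotted address into four integers, each between 0 and 255."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("an IPv4 address needs exactly four numbers")
    octets = []
    for part in parts:
        # Reject empty fields, signs, and non-ASCII digits.
        if not (part.isascii() and part.isdigit()):
            raise ValueError(f"not a number: {part!r}")
        value = int(part)
        if value > 255:
            raise ValueError(f"out of range 0-255: {value}")
        octets.append(value)
    return tuple(octets)


def to_integer(octets):
    """Pack the four numbers into the 32-bit value they stand for."""
    a, b, c, d = octets
    return (a << 24) | (b << 16) | (c << 8) | d


if __name__ == "__main__":
    print(parse_ipv4("192.154.3.29"))
    print(to_integer(parse_ipv4("255.255.255.255")) == 2**32 - 1)
```

Because each of the four numbers takes 8 bits, the whole range from 0.0.0.0 to 255.255.255.255 covers 2^32 possible addresses.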
You can check the IP address of your laptop or desktop by opening the Windows
Start menu, right-clicking and going to Network, then going to Status and then
Properties, where you can see the IP address. There are four different types of IP
addresses:
1. Static IP Address
2. Dynamic IP Address
3. Private IP Address
4. Public IP Address
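Of the four types listed above, the private/public distinction can be read from the address itself, while static versus dynamic depends on how the address is assigned and cannot be told from the number. A small sketch using Python's standard `ipaddress` module follows; the function name `describe` and the sample addresses other than 192.154.3.29 are assumptions made for illustration.

```python
import ipaddress


def describe(address):
    """Label an address as 'private' or 'public' using the reserved ranges."""
    ip = ipaddress.ip_address(address)
    return "private" if ip.is_private else "public"


if __name__ == "__main__":
    # 192.168.x.x is a range reserved for private (home or office) networks.
    print("192.168.1.10", describe("192.168.1.10"))
    # The sample address from the text falls outside the reserved ranges.
    print("192.154.3.29", describe("192.154.3.29"))
```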
World Wide Web (WWW)
The World Wide Web is the collection of all the web pages and web documents that
you can see on the Internet by looking up their URLs (Uniform Resource Locators).
For example, www.geeksforgeeks.org is the URL of the GFG website, and all the
content of this site, such as webpages and web documents, is part of the World Wide
Web. In other words, the World Wide Web is an information retrieval service that
runs on the Internet. It provides users with a huge array of documents that are
connected to each other by means of hypertext or hypermedia links. Here, hyperlinks
are electronic connections that link related data so that users can easily access
related information. Hypertext allows the user to pick a word or phrase from the
text and use it to access other documents that contain additional information
related to that word or phrase. The World Wide Web is a project created by Tim
Berners-Lee in 1989 so that researchers at CERN could work together effectively. An
organization named the World Wide Web Consortium (W3C) was later set up for the
further development of the web.
Difference Between World Wide Web and the Internet
The main differences between the World Wide Web and the Internet are:
World Wide Web | Internet
All the web pages and web documents are stored on the World Wide Web, and each website has a specific URL for finding them. | The Internet is a global network of computers that is accessed by the World Wide Web.
The World Wide Web is a service. | The Internet is an infrastructure.
The World Wide Web is a subset of the Internet. | The Internet is the superset of the World Wide Web.
The World Wide Web is software-oriented. | The Internet is hardware-oriented.
The World Wide Web uses HTTP. | The Internet uses IP addresses.
The World Wide Web can be considered a book on different topics inside a library. | The Internet can be considered a library.
Uses of the Internet
Some of the important uses of the Internet are:
 Online Businesses (E-commerce): Online shopping websites have made our life
easier. E-commerce sites like Amazon, Flipkart, and Myntra provide their services
with just one click, and this is a great use of the Internet.
 Cashless Transactions: Merchandising companies let their customers pay their
bills online via various digital payment apps like Paytm, Google Pay, etc. Use of
the UPI payment gateway is also increasing day by day. The digital payment industry
is growing at a rate of 50% every year, also because of the Internet.
 Education: The Internet provides a whole range of educational material to
everyone through servers across the web. Those who are unable to attend physical
classes can choose any course from the Internet and study it in detail just by
sitting at home. Highly qualified faculty teach online on digital platforms and
provide quality education to students with the help of the Internet.
 Social Networking: The purpose of social networking sites and apps is to connect
people all over the world. With the help of social networking sites, we can talk
and share videos and images with our loved ones when they are far away from us.
We can also create groups for discussion or for meetings.
 Entertainment: The Internet is also used for entertainment. There are numerous
entertainment options available on the Internet, like watching movies, playing
games, listening to music, etc. You can also easily download movies, games, songs,
TV serials, etc., from the Internet.
Security and the Internet
A huge amount of data is handled across the Internet almost all the time, which
leads to the risk of data breaches and many other security issues. Both hackers
and crackers can disrupt the network and steal important information such as login
credentials and banking credentials.
Steps to Protect Online Privacy
 Install Antivirus or Antimalware.
 Create random and complex passwords, so that they are difficult to guess.
 Use a private browsing window or VPN for using the Internet.
 Try to use HTTPS only for better protection.
 Try to make your Social Media Account Private.
 If you are not using any application, which requires GPS, then you can turn GPS
off.
 Do not simply close the tab, first log out from that account, then close the tab.
 Try to avoid accessing public Wifi or hotspots.
 Try to avoid opening or downloading content from unknown sources.
There is a part of the Internet called the Dark Web, which is not accessible from
standard browsers. To keep our data safe, we can use anonymity networks such as Tor
and I2P, which help keep our activity anonymous, and that in turn helps protect
user security and reduce exposure to cybercrime.
Social Impact of the Internet
The social impact of the Internet can be seen in both ways. Some say it has a
positive impact, as it helps increase civic engagement, whereas others say it has a
negative impact, as it increases the risk of being fooled by someone over the
Internet, of withdrawing from society, etc.
Whatever the impact of social media, one thing is certain: it has changed the way
we connect and interact with others in society. The number of people on social
media platforms is increasing day by day, which helps build new relationships and
new communities formed around people's interests. Platforms like Facebook,
Instagram, and LinkedIn are among the most used social media platforms for both
individual and business purposes, where we can communicate with others and carry
out our tasks.
Advantages of the Internet
 Online Banking and Transaction: The Internet allows us to transfer money
online through the net banking system. Money can be debited from one account and
credited to another.
 Education, Online Jobs, Freelancing: Through the Internet, we are able to find
more jobs via online platforms like LinkedIn and to reach more job providers.
Freelancing, on the other hand, has helped the youth to earn a side income, and the
best part is that all this can be done via the Internet.
 Entertainment: There are numerous options for entertainment online. We can
listen to music, play games, watch movies and web series, and listen to podcasts.
YouTube itself is a hub of knowledge as well as entertainment.
 New Job Roles: The Internet has given us access to social media and digital
products, so there are numerous new job opportunities like digital marketing and
social media marketing. Online businesses are earning huge amounts of money
because the Internet is the medium that makes this possible.
 Best Communication Medium: The Internet has removed the communication barrier.
You can send messages via email, WhatsApp, and Facebook. Voice chatting and video
conferencing are also available to help you hold important meetings online.
 Comfort to humans: Without any physical effort you can do many things, like
shopping online for anything from stationery to clothes, books to personal items,
etc. You can also book train and plane tickets online.
 GPS Tracking and Google Maps: Yet another advantage of the Internet is that
you are able to find any road in any direction, and areas with less traffic, with
the help of GPS on your mobile.
Disadvantages of the Internet
 Time Wastage: Spending too much time on the Internet surfing social media apps
and doing nothing decreases your productivity. Rather than wasting time scrolling
through social media apps, one should use that time to do something skillful and
more productive.
 Bad Impacts on Health: Spending too much time on the Internet has bad impacts
on your health. The body needs outdoor games, exercise, and many other activities.
Looking at the screen for long periods seriously affects the eyes.
 Cyber Crimes: Cyberbullying, spam, viruses, hacking, and data theft are some
of the crimes that are on the rise these days. A system that contains confidential
data can be hacked by cybercriminals.
 Effects on Children: Small children can become heavily addicted to the Internet.
Watching movies and playing games all the time is not good for their overall
personality or their social development.
 Bullying and Spreading Negativity: The Internet has given a free tool, in the
form of social media apps, to people who try to spread negativity with revolting
and shameful messages and to bully each other, which is wrong.
Overview
WWW stands for World Wide Web. A technical definition of the World Wide Web is:
all the resources and users on the Internet that are using the Hypertext Transfer
Protocol (HTTP).
A broader definition comes from the organization that Web inventor Tim Berners-
Lee helped found, the World Wide Web Consortium (W3C).
The World Wide Web is the universe of network-accessible information, an
embodiment of human knowledge.
In simple terms, the World Wide Web is a way of exchanging information between
computers on the Internet, tying them together into a vast collection of interactive
multimedia resources.
The Internet and the Web are not the same thing: the Web uses the Internet to pass
information.
Evolution
The World Wide Web was created by Tim Berners-Lee in 1989 at CERN in Geneva. It
came into existence as his proposal to allow researchers at CERN to work together
effectively and efficiently, and that proposal eventually became the World Wide
Web.
The following diagram briefly shows the evolution of the World Wide Web:
[Diagram: evolution of the World Wide Web]
WWW Architecture
The WWW architecture is divided into several layers as shown in the following diagram:
[Diagram: layers of the WWW architecture]
Identifiers and Character Set
Uniform Resource Identifier (URI) is used to uniquely identify resources on the
web, and UNICODE makes it possible to build web pages that can be read and written
in human languages.
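To make the two ideas in this layer concrete, here is a minimal Python sketch that splits a URI into its parts with the standard `urllib.parse` module and shows how Unicode text becomes bytes in the UTF-8 encoding. The helper names and the `/index.htm` path are assumptions added for illustration.

```python
from urllib.parse import urlparse


def describe_uri(uri):
    """Break a URI into the scheme, host, and path that identify a resource."""
    parts = urlparse(uri)
    return {"scheme": parts.scheme, "host": parts.netloc, "path": parts.path}


def utf8_length(text):
    """Count the bytes needed to store Unicode text in the UTF-8 encoding."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    # The path "/index.htm" is a made-up example.
    print(describe_uri("http://www.tutorialspoint.com/index.htm"))
    # An accented letter needs more than one byte in UTF-8.
    print(utf8_length("A"), utf8_length("\u00e9"))
```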
Syntax
XML (Extensible Markup Language) helps to define common syntax in semantic
web.
Data Interchange
The Resource Description Framework (RDF) helps in defining the core representation
of data for the web. RDF represents data about resources in graph form.
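The graph form mentioned above is usually written as subject-predicate-object triples. The sketch below models such triples as plain Python tuples rather than using an RDF library; the `ex:` names and the `match` helper are hypothetical and serve only to show the shape of the data.

```python
# Each triple is (subject, predicate, object): two graph nodes and a labelled edge.
# The "ex:" identifiers are invented for this example.
TRIPLES = [
    ("ex:TimBernersLee", "ex:invented", "ex:WorldWideWeb"),
    ("ex:WorldWideWeb", "ex:createdAt", "ex:CERN"),
    ("ex:WorldWideWeb", "ex:uses", "ex:HTTP"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


if __name__ == "__main__":
    # Everything the graph says about the World Wide Web resource.
    for triple in match(TRIPLES, subject="ex:WorldWideWeb"):
        print(triple)
```

Leaving a field as `None` acts as a wildcard, which is roughly the idea behind the pattern matching that RDF query languages perform.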
Taxonomies
RDF Schema (RDFS) allows more standardized description of taxonomies and
other ontological constructs.
Ontologies
Web Ontology Language (OWL) offers more constructs than RDFS. It comes in the
following three versions:
 OWL Lite for taxonomies and simple constraints.
 OWL DL for full description logic support.
 OWL Full for more syntactic freedom of RDF.
Rules
RIF and SWRL offer rules beyond the constructs that are available from RDFS and
OWL. SPARQL Protocol and RDF Query Language (SPARQL) is an SQL-like language used
for querying RDF data and OWL ontologies.
Proof
All the semantics and rules executed at the layers below Proof, together with their
results, are used to prove deductions.
Cryptography
Cryptographic means, such as digital signatures, are used to verify the origin of
sources.
User Interface and Applications
On top of these layers, the User Interface and Applications layer is built for user
interaction.
WWW Operation
The WWW works on a client-server approach. The following steps explain how the web
works:
1. The user enters the URL (say, http://www.tutorialspoint.com) of the web page
in the address bar of the web browser.
2. The browser then asks the Domain Name Server for the IP address
corresponding to www.tutorialspoint.com.
3. After receiving the IP address, the browser sends the request for the web page
to the web server using the HTTP protocol, which specifies the way the browser and
web server communicate.
4. The web server receives the request over HTTP and searches for the requested
web page. If found, it returns the page to the web browser and closes the HTTP
connection.
5. The web browser receives the web page, interprets it, and displays the
contents of the web page in the browser window.
Future
There has been rapid development in the field of the web. It has had an impact in
almost every area, such as education, research, technology, commerce, and
marketing, so the future of the web is almost unpredictable.
Apart from the huge development in the field of the WWW, there are also some
technical issues that the W3 Consortium has to cope with.
User Interface
Work on higher-quality presentation of 3-D information is under development. The
W3 Consortium is also looking to enhance the web to fulfill the requirements of
global communities, which would include all regional languages and writing systems.
Technology
Work on privacy and security is under way. This would include hiding information,
accounting, access control, integrity and risk management.
Architecture
There has been huge growth in the field of the web, which may overload the Internet
and degrade its performance. Hence, better protocols need to be developed.
History
Historically, the term bioinformatics did not mean what it means today. Paulien
Hogeweg and Ben Hesper coined it in 1970 to refer to the study of information
processes in biotic systems. This definition placed bioinformatics as a field
parallel to biophysics (the study of physical processes in biological systems) or
biochemistry (the study of chemical processes in biological systems).
Since Mendel, bioinformatics and genetic record keeping have come a long way. The
understanding of genetics has advanced remarkably in the last thirty years. In
1972, Paul Berg made the first recombinant DNA molecule using ligase. In that same
year, Stanley Cohen, Annie Chang and Herbert Boyer produced the first recombinant
DNA organism. In 1973, two important things happened in the field of genomics:
1. Joseph Sambrook led a team that refined DNA electrophoresis using agarose gel,
and
2. Herbert Boyer and Stanley Cohen invented DNA cloning.
By 1977, a method for sequencing DNA was discovered and the first genetic
engineering company, Genentech, was founded.
By 1981, 579 human genes had been mapped and mapping by in situ hybridization had
become a standard method. Marvin Caruthers and Leroy Hood made a huge leap in
bioinformatics when they invented a method for automated DNA sequencing. In 1988,
the Human Genome Organization (HUGO) was founded. This is an international
organization of scientists involved in the Human Genome Project. In 1990, the Human
Genome Project was started. By 1991, a total of 1879 human genes had been mapped.
In 1993, Genethon, a human genome research center in France, produced a physical
map of the human genome. In 1995, the first complete genome map of a bacterium,
Haemophilus influenzae, was published. In 1996, Genethon published the final
version of the Human Genetic Map. This concluded the first phase of the Human
Genome Project.
In the mid-1970s, it would take a laboratory at least two months to sequence 150
nucleotides. Ten years ago, the only way to track genes was to scour large, well
documented family trees of relatively inbred populations, such as the Ashkenazi
Jews from Europe. These types of genealogical searches [...] 11 million nucleotides
a day for its corporate clients and company research.
Bioinformatics was fuelled by the need to create huge databases, such as GenBank,
EMBL and the DNA Database of Japan, to store and compare the DNA sequence data
erupting from the human genome and other genome sequencing projects. Today,
bioinformatics embraces protein structure analysis, gene and protein functional
information, data from patients, pre-clinical and clinical trials, and the
metabolic pathways of numerous species.
A Chronological History of Bioinformatics
• 1953 - Watson & Crick proposed the double helix model for DNA based on X-ray data
obtained by Franklin & Wilkins.
• 1954 - Perutz's group develops heavy atom methods to solve the phase problem in
protein crystallography.
• 1955 - The sequence of the first protein to be analysed, bovine insulin, is
announced by F. Sanger.
• 1969 - The ARPANET is created by linking computers at Stanford and UCLA.
• 1970 - The details of the Needleman-Wunsch algorithm for sequence comparison are
published.
• 1972 - The first recombinant DNA molecule is created by Paul Berg and his group.
• 1973 - The Brookhaven Protein Data Bank is announced (Acta Cryst. B, 1973,
29:1764). Robert Metcalfe receives his Ph.D. from Harvard University. His thesis
describes Ethernet.
• 1974 - Vint Cerf and Robert Kahn develop the concept of connecting networks of
computers into an "internet" and develop the Transmission Control Protocol (TCP).
• 1975 - Microsoft Corporation is founded by Bill Gates and Paul Allen. Two-
dimensional electrophoresis, where separation of proteins on SDS polyacrylamide gel
is combined with separation according to isoelectric points, is announced by
P. H. O'Farrell.
• 1988 - The National Center for Biotechnology Information (NCBI) is established at
the National Library of Medicine. The Human Genome Initiative is started
(Commission on Life Sciences, National Research Council. Mapping and Sequencing
the Human Genome, National Academy Press: Washington, D.C., 1988). The FASTA
algorithm for sequence comparison is published by Pearson and Lipman. A new
program, an Internet computer virus designed by a student, infects 6,000 military
computers in the US.
• 1989 - The Genetics Computer Group (GCG) becomes a private company. Oxford
Molecular Group, Ltd. (OMG) is founded in the UK by Anthony Marchington, David
Ricketts, James Hiddleston, Anthony Rees, and W. Graham Richards. Primary products:
Anaconda, Asp, Cameleon and others (molecular modeling, drug design, protein
design).
• 1990 - The BLAST program (Altschul et al.) is implemented. Molecular Applications
Group is founded in California by Michael Levitt and Chris Lee. Their primary
products are Look and SegMod, which are used for molecular modeling and protein
design. InforMax is founded in Bethesda, MD. The company's products address
sequence analysis, database and data management, searching, publication graphics,
clone construction, mapping and primer design.
• 1991 - The research institute in Geneva (CERN) announces the creation of the
protocols which make up the World Wide Web. The creation and use of expressed
sequence tags (ESTs) is described. Incyte Pharmaceuticals, a genomics company
headquartered in Palo Alto, California, is formed. Myriad Genetics, Inc. is founded
in Utah. The company's goal is to lead in the discovery of major common human
disease genes and their related pathways. The company has discovered and sequenced,
with its academic collaborators, the following major genes: BRCA1, BRCA2, CHD1,
MMAC1, MMSC1, MMSC2, CtIP, p16, p19 and MTS2.
• 1993 - CuraGen Corporation is formed in New Haven, CT. Affymetrix begins
independent operations in Santa Clara, California.
• 1994 - Netscape Communications Corporation is founded and releases Navigator, the
commercial version of NCSA's Mosaic. Gene Logic is formed in Maryland. The PRINTS
database of protein motifs is published by Attwood and Beck. Oxford Molecular Group
acquires IntelliGenetics.
• 1995 - The Haemophilus influenzae genome (1.8 Mb) is sequenced. The Mycoplasma
genitalium genome is sequenced.
• 1996 - The genome for Saccharomyces cerevisiae (baker's yeast, 12.1 Mb) is
sequenced. The PROSITE database is reported by Bairoch et al. Affymetrix produces
the first commercial DNA chips.
• 1997 - The genome for E. coli (4.7 Mbp) is published. Oxford Molecular Group
acquires the Genetics Computer Group. LION bioscience AG is founded as an
integrated genomics company with a strong focus on bioinformatics. The company is
built from IP out of the European Molecular Biology Laboratory (EMBL), the European
Bioinformatics Institute (EBI), the German Cancer Research Center (DKFZ), and the
University of Heidelberg. Paradigm Genetics Inc., a company focussed on the
application of genomic technologies to enhance worldwide food and fiber production,
is founded in Research Triangle Park, NC. deCode genetics publishes a paper that
described the location of the FET1 gene, which is responsible for familial
essential tremor, on chromosome 13 (Nature Genetics).
• 1998 - The genomes for Caenorhabditis elegans and baker's yeast are published.
The Swiss Institute of Bioinformatics is established as a non-profit foundation.
Craig Venter forms Celera in Rockville, Maryland. PE Informatics was formed as a
center of excellence within PE Biosystems. This center brings together and
leverages the complementary expertise of PE Nelson and Molecular Informatics, to
further complement the genetic instrumentation expertise of Applied Biosystems.
Inpharmatica, a new genomics and bioinformatics company, is established by
University College London, the Wolfson Institute for Biomedical Research, five
leading scientists from major British academic centres and Unibio Limited.
GeneFormatics, a company dedicated to the analysis and prediction of protein
structure and function, is formed in San Diego. Molecular Simulations Inc. is
acquired by Pharmacopeia.
• 1999 - deCode genetics maps the gene linked to pre-eclampsia as a locus on
chromosome 2p13.
• 2000 - The genome for Pseudomonas aeruginosa (6.3 Mbp) is published. The
A. thaliana genome (100 Mb) is sequenced. The D. melanogaster genome (180 Mb) is
sequenced. Pharmacopeia acquires Oxford Molecular Group.
• 2001 - The human genome (3,000 Mbp)