
RAJU

Uploaded by Pathak Payhak
Copyright © All Rights Reserved

WHAT IS BIOINFORMATICS?

Bioinformatics is a special field that combines biology (the study of
living things) and computer science. It uses computers to handle big
tasks like storing, finding, and analyzing information about important
molecules in living things, such as DNA, RNA, and proteins. These
molecules are like the building blocks of life.

For example:

● Computers help scientists study huge amounts of DNA to find
patterns or important parts, like genes.
● They are also used to predict how proteins are shaped and what
they do in the body.

Without computers, it would take a long time for humans to do all the
repetitive or complicated math needed for these tasks!
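As a small illustration of the first example, here is a minimal sketch of one kind of pattern search: scanning DNA for open reading frames (ORFs), stretches that begin with the ATG start codon and run to an in-frame stop codon. It assumes the standard stop codons (TAA, TAG, TGA) and looks only at the forward strand; the function name `find_orfs` and the `min_codons` cutoff are illustrative choices, and real gene finders use far more evidence than this.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of ORFs on the forward strand.

    Each ORF starts at an ATG and ends just after the first in-frame stop
    codon, so dna[start:end] includes both. min_codons counts the start
    and stop codons too.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # the three forward reading frames
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon until an in-frame stop codon appears.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # resume after the stop (nested ATGs are skipped)
                        break
                else:
                    break  # no stop codon left in this frame
            i += 3
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14)]
```

The match at (2, 14) is the substring ATG AAA TTT TAG: a start codon, two codons, and a stop codon.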

GOALS

The ultimate goal of bioinformatics is to understand how living cells
function at the molecular level by analyzing data from DNA, RNA,
and proteins. This is rooted in the central dogma of biology, where
DNA is transcribed into RNA, which is then translated into
proteins, the main molecules responsible for cellular functions.
Proteins' roles are determined by their sequences, so studying these
sequences (and sometimes their structures) helps scientists uncover
how cells work, predict protein behavior, and understand diseases
caused by mutations. Bioinformatics provides a "big picture" view,
enabling new discoveries and deeper insights into life processes.
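The two steps of the central dogma can be sketched in a few lines. This minimal version assumes the input is the coding strand of DNA (so the mRNA is the same sequence with U in place of T), uses the standard genetic code (NCBI translation table 1), and starts translating at the first base; the function names are illustrative.

```python
# Standard genetic code, listed in the order of codons built from the
# bases U, C, A, G (UUU, UUC, UUA, UUG, UCU, ...). "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """DNA -> RNA: the mRNA has the coding-strand sequence with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """RNA -> protein: read codons from the first base until a stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate(transcribe("ATGTTTTGGTGA")))  # MFW
```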

Scope of Bioinformatics
Bioinformatics has two main areas: developing computational tools
and databases and applying them to study living systems. These
two fields work together to advance our understanding of biology.
Tool development includes creating software for analyzing molecular
sequences, structures, and functions, as well as building and
managing biological databases. These tools are applied in three
research areas: sequence analysis, structural analysis, and
functional analysis.

Sequence analysis involves tasks like aligning sequences, finding
genes, and comparing genomes. Structural analysis focuses on
studying and predicting the shapes of proteins and nucleic acids.
Functional analysis explores activities like gene expression, protein
interactions, and metabolic pathways. These areas often overlap—for
example, predicting a protein’s structure may rely on sequence
alignment, and understanding gene function may require data from all
three types of analysis. Together, these approaches help solve
complex biological problems and drive the development of better
computational tools.
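Sequence alignment, the first task listed above, is commonly done with dynamic programming. Below is a minimal sketch of global alignment scoring in the style of the Needleman-Wunsch algorithm; the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration, and the function returns only the best score, not the alignment itself.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of a and b (Needleman-Wunsch style)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal,
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    return score[-1][-1]


print(global_alignment_score("ACGT", "AGT"))  # 1: A/A, C/-, G/G, T/T
```

Adding a traceback step over the same table would recover the aligned sequences as well as the score.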

APPLICATIONS

Bioinformatics is very useful in many areas, such as medicine,
crime-solving, and farming. It speeds up the discovery of new
medicines by helping scientists design drugs that bind to the right
proteins in the body; such drugs can work better and cause fewer side
effects than those found by older trial-and-error methods. In
crime-solving, DNA analysis with advanced computer tools can now
establish a person's identity, and the results are even used in courts!

In healthcare, bioinformatics helps create personalized medicine,
where doctors can study a patient’s DNA to find and treat diseases
early. For example, they might detect harmful mutations quickly and
suggest the best treatments.

Limitations of Bioinformatics

While bioinformatics is a powerful tool, it has limitations and should
not be relied on too heavily. Like military intelligence on a
battlefield, it provides valuable guidance but depends on accurate
data. Errors in raw sequence data or flaws in algorithms can lead to
incorrect predictions, which may misguide research. Bioinformatics
predictions are not proof; they need to be validated with
experimental biology.

The accuracy of results depends on data quality, algorithm
sophistication, and computing power. High-speed algorithms may
sacrifice precision for efficiency, creating a trade-off. Since
bioinformatics is still a developing field, its tools and predictions
often fail to fully reflect biological reality. To minimize errors, it’s
good practice to compare outputs from multiple tools and seek
consensus for more reliable predictions.
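Seeking consensus can be as simple as a per-position majority vote. The sketch below assumes each tool returns a prediction string of the same length (for example, hypothetical secondary-structure labels H, E, and C); positions where no label wins a strict majority are marked "?" so that disagreement stays visible instead of being hidden.

```python
from collections import Counter


def consensus(predictions: list[str]) -> str:
    """Majority vote at each position across several tools' predictions."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one prediction, all of the same length")
    result = []
    for column in zip(*predictions):  # the labels all tools gave one position
        label, count = Counter(column).most_common(1)[0]
        # Accept a label only if more than half of the tools agree on it.
        result.append(label if count > len(predictions) / 2 else "?")
    return "".join(result)


print(consensus(["HHHEEC", "HHCEEC", "HECEEC"]))  # HHCEEC
```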

What is a Database?

A database is like a digital filing cabinet that helps store and organize
information so you can find it easily. Imagine you have a box full of
cards with names, phone numbers, and addresses. If you want to find
someone’s phone number, you’d search through the cards. A database
does this faster with the help of computers.

Each piece of information in a database is stored in a record (like a
card). A record has different fields, such as "Name," "Phone
Number," or "Address," where the actual details are kept. To find
something, you make a query: you tell the database what you’re
looking for (e.g., "Find the person named Alex"), and it gives you the
matching record.
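The record, field, and query ideas map directly onto basic data structures. In this minimal sketch, each record is a Python dictionary whose keys are the field names; the records themselves and the `query` helper are hypothetical, and a real database would also index the fields instead of checking every record.

```python
# Hypothetical records: each dict is one "card", and its keys are the fields.
records = [
    {"Name": "Alex", "Phone Number": "555-0101", "Address": "12 Oak Street"},
    {"Name": "Priya", "Phone Number": "555-0102", "Address": "4 Elm Road"},
]


def query(records: list[dict[str, str]], field: str, value: str) -> list[dict[str, str]]:
    """Return every record whose `field` holds exactly `value`."""
    return [record for record in records if record.get(field) == value]


print(query(records, "Name", "Alex"))
```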

In biology, databases do even more! They don’t just store data like
DNA sequences but also help find patterns, similarities, or special
features in the data. For example, they can compare DNA sequences
to see if they are related or find parts of sequences that haven’t been
studied yet. This helps scientists discover new things about living
organisms.

Types of Databases

Databases were first simple flat files, like long text files with data
entries separated by characters (e.g., commas). These were fine for
small data but became inefficient for large, complex datasets since
searching required reading the entire file.
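The weakness of a flat file shows up in a simple search: with no index, the program has to read every line, because any of them might match. This sketch uses a tiny hypothetical comma-separated file held in memory and reports how many rows were read.

```python
import csv
import io

# A tiny comma-separated flat file (hypothetical data).
FLAT_FILE = """id,name,state
1,Alex,Texas
2,Maria,Ohio
3,Sam,Texas
"""


def scan_flat_file(text: str, field: str, value: str):
    """Return matching rows and how many rows had to be read to find them."""
    matches, rows_read = [], 0
    for row in csv.DictReader(io.StringIO(text)):
        rows_read += 1  # every row is read: any of them might match
        if row[field] == value:
            matches.append(row)
    return matches, rows_read


matches, rows_read = scan_flat_file(FLAT_FILE, "state", "Texas")
print([row["name"] for row in matches], rows_read)  # ['Alex', 'Sam'] 3
```

With millions of entries, this full scan is what makes flat files slow to search.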

To fix this, Database Management Systems (DBMS) were created
to organize data better. These systems can find information faster and
connect related data automatically. Two main types are:

1. Relational Databases: Organize data into tables linked by
relationships (e.g., matching IDs).
2. Object-Oriented Databases: Store data as objects (like in
programming), making them better for complex data.

Relational Databases

In relational databases, data is organized into multiple tables instead
of a single file. Each table has columns (fields) and rows (records),
with columns indexed by common features called attributes. These
tables can be linked together by shared attributes, making it easy to
retrieve related information from different tables.

For example, to find out which courses students from Texas are
taking, the database first checks a table to find students from Texas,
then links to another table to find their student IDs, and finally
connects to a third table to list the corresponding courses. This
method is much faster than reading through a single flat file,
especially for large databases, because relational databases can
efficiently link data across tables.

Relational databases use SQL (Structured Query Language) to
create, modify, and search data. They are designed to allow easy
additions of new data categories without disrupting existing tables.
This makes retrieving and reporting information straightforward and
efficient, particularly when compared to flat file databases.
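The Texas example can be sketched with SQL using Python's built-in `sqlite3` module. The table layout (students, enrollments, courses linked by shared ID columns) and the data are assumptions made for illustration; the JOIN clauses do the linking between tables that the text describes.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create three linked tables in memory (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT, state TEXT);
        CREATE TABLE courses (course_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE enrollments (
            student_id INTEGER REFERENCES students(student_id),
            course_id INTEGER REFERENCES courses(course_id)
        );
        INSERT INTO students VALUES (1, 'Alex', 'Texas'), (2, 'Maria', 'Ohio'), (3, 'Sam', 'Texas');
        INSERT INTO courses VALUES (10, 'Genetics'), (20, 'Databases'), (30, 'Statistics');
        INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 30), (3, 20);
    """)
    return conn


def courses_for_state(conn: sqlite3.Connection, state: str) -> list[tuple[str, str]]:
    """Link students -> enrollments -> courses through the shared ID columns."""
    return conn.execute(
        """
        SELECT s.name, c.title
        FROM students AS s
        JOIN enrollments AS e ON e.student_id = s.student_id
        JOIN courses AS c ON c.course_id = e.course_id
        WHERE s.state = ?
        ORDER BY s.name, c.title
        """,
        (state,),
    ).fetchall()


conn = build_demo_db()
print(courses_for_state(conn, "Texas"))
# [('Alex', 'Databases'), ('Alex', 'Genetics'), ('Sam', 'Databases')]
```

Because each table holds one kind of information, a new data category can usually be added as a new table or column without rewriting the existing ones.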

Object-Oriented Databases

Object-oriented databases are designed to handle complex
relationships between data more naturally than relational databases.
Instead of using tables, these databases store data as objects, which
are units that contain both data and the functions (or routines) that
act on the data. This allows for more flexible organization, especially
for complex, hierarchical data like multimedia (images, audio, video).

In these databases, objects are connected using pointers, which define
the relationships between them. To search the database, you navigate
through these objects by following the pointers that link related data.
This structure is especially useful for handling data with complex
relationships, making programming tasks easier.

However, object-oriented databases don't have the same solid
mathematical structure as relational databases, which can sometimes
lead to incorrect representation of relationships between objects.
Some modern systems combine both object-oriented and relational
features to benefit from both approaches, creating what's called an
object-relational database.

For example, to find out which courses students from Texas are
taking, an object-oriented database uses pointers starting from the
Texas object, which leads to student objects, and then points to the
courses they are enrolled in, making data retrieval quick and intuitive.
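The same query can be pictured in object form. In this sketch, plain Python object references stand in for the database's pointers, and the class names and data are hypothetical; the `enrolled_courses` method shows how a routine can be stored together with the data it acts on.

```python
from dataclasses import dataclass, field


@dataclass
class Course:
    title: str


@dataclass
class Student:
    name: str
    courses: list[Course] = field(default_factory=list)  # references to Course objects


@dataclass
class State:
    name: str
    students: list[Student] = field(default_factory=list)  # references to Student objects

    def enrolled_courses(self) -> list[tuple[str, str]]:
        """A routine stored with the data: follow state -> student -> course links."""
        return [(s.name, c.title) for s in self.students for c in s.courses]


genetics, databases = Course("Genetics"), Course("Databases")
texas = State("Texas", [Student("Alex", [genetics, databases]),
                        Student("Sam", [databases])])
print(texas.enrolled_courses())
# [('Alex', 'Genetics'), ('Alex', 'Databases'), ('Sam', 'Databases')]
```

Both students point at the same `databases` object, so the course is stored once and reached by following links rather than by matching IDs across tables.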

2. BIOLOGICAL DATABASES

Biological databases are classified into three main categories:
primary, secondary, and specialized databases. Primary databases
store original biological data, such as GenBank for DNA sequences
and the Protein Data Bank (PDB) for protein structures. Secondary
databases contain curated or processed data based on primary
sources, like SWISS-PROT and PIR, which provide functional
annotations for proteins. Specialized databases focus on specific
research areas or organisms, such as FlyBase for fruit flies, the HIV
sequence database, and the Ribosomal Database Project for
ribosomal RNA. These databases use various structures, such as flat
files, relational, and object-oriented formats, depending on the needs
of each database.

2.1 Primary Databases

Primary databases store raw biological data and are essential for research. The three
major public sequence databases—GenBank, the European
Molecular Biology Laboratory (EMBL) database, and the DNA
Data Bank of Japan (DDBJ)—are freely accessible online and hold
nucleic acid sequence data contributed by researchers worldwide.
These databases collaborate and exchange data daily through the
International Nucleotide Sequence Database Collaboration,
ensuring that the same sequence data is available across all three
platforms. Although they contain identical raw data, the format in
which the data is represented may differ slightly between them. For
protein structures, the Protein Data Bank (PDB) is the centralized
resource, storing atomic coordinates for macromolecules obtained
through techniques like X-ray crystallography and NMR, and
provides tools for visualizing these structures.

2.2 Secondary Databases

Secondary databases make raw data from primary databases more
useful. For example, SWISS-PROT adds extra details to protein data,
like what the protein does and what it looks like, and experts check all
this information. UniProt combines SWISS-PROT with related protein
databases (TrEMBL and PIR) into a single, larger resource. Some
databases, like Pfam and Blocks, help sort proteins into groups based
on how they work or what they look like. DALI helps scientists
understand protein shapes and find connections between them. It's
like taking a puzzle piece and adding more clues to make it easier to
understand.

2.3 Specialized Databases

Specialized databases focus on specific types of research or particular
organisms. They often have data similar to what’s found in primary
databases but may also include unique information and expert-curated
annotations. For example, FlyBase focuses on data related to fruit
flies, while WormBase covers nematode worms such as Caenorhabditis
elegans. These databases might contain sequences or data from
experiments, like the gene expression information in GenBank's EST
database or the Microarray Gene Expression Database. These databases
are valuable for researchers studying specific species or topics.

Interconnection between Biological Databases

Biological databases often need to work together because information
in one database is not always enough to complete a task. Primary
databases store raw biological data, while secondary and specialized
databases process or provide more specific data. To make it easier for
researchers to access all the information they need, databases are
often linked together. However, different databases use different
structures (like flat files, relational databases, or object-oriented
databases), which makes it hard for them to communicate. To solve
this problem, technologies like CORBA (Common Object Request
Broker Architecture) and XML (Extensible Markup Language)
are used to allow databases to connect and share information without
needing to understand each other’s structure. These protocols help
improve data exchange and make it easier to gather data from
multiple sources for analysis.
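The XML idea can be sketched with Python's standard `xml.etree.ElementTree`. The sender wraps each value in a self-describing tag, and the receiver reads the tags back without knowing anything about how the sender stores its data internally. The tag names (`sequenceRecord`, `accession`, and so on) and the record are hypothetical, not a published exchange schema.

```python
import xml.etree.ElementTree as ET


def record_to_xml(accession: str, organism: str, sequence: str) -> str:
    """Serialize a record using hypothetical, self-describing tag names."""
    root = ET.Element("sequenceRecord")
    ET.SubElement(root, "accession").text = accession
    ET.SubElement(root, "organism").text = organism
    ET.SubElement(root, "sequence").text = sequence
    return ET.tostring(root, encoding="unicode")


def xml_to_record(xml_text: str) -> dict[str, str]:
    """Read the record back: the receiver only needs to know the tag names."""
    return {child.tag: child.text for child in ET.fromstring(xml_text)}


message = record_to_xml("TOY0001", "Homo sapiens", "ACGT")
print(message)
print(xml_to_record(message))
```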

PITFALLS OF BIOLOGICAL DATABASES

Biological databases can have problems that make the information
unreliable. One issue is that there are often errors in the DNA
sequences stored, such as mistakes made during sequencing or
contamination from other sources. Older sequences are more likely to
have errors, so caution is needed when using them. Another problem
is redundancy, where the same information is repeated multiple times
in the database, making it difficult to find what you need. To fix this,
some databases, like RefSeq, remove duplicates and organize
sequences better. There are also mistakes in the descriptions
(annotations) of genes and proteins, which can lead to confusion. To
address this, researchers use controlled vocabularies, like Gene
Ontology, to make sure gene and protein functions are described
correctly and consistently.
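The simplest form of redundancy, identical sequences stored under different accession numbers, can be collapsed by grouping on the sequence itself. This is only a minimal sketch with hypothetical accessions; curated resources such as RefSeq rely on much more than exact string matching, including expert review of near-identical entries.

```python
def collapse_identical(records: list[tuple[str, str]]) -> list[tuple[str, str, list[str]]]:
    """Group accessions whose sequences are identical (ignoring case).

    Returns (representative accession, sequence, other accessions) tuples,
    keeping the first accession seen as the representative.
    """
    groups: dict[str, list[str]] = {}
    for accession, sequence in records:
        groups.setdefault(sequence.upper(), []).append(accession)
    return [(accessions[0], sequence, accessions[1:])
            for sequence, accessions in groups.items()]


print(collapse_identical([("A1", "ACGT"), ("A2", "acgt"), ("B1", "GGCC")]))
# [('A1', 'ACGT', ['A2']), ('B1', 'GGCC', [])]
```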

5. INFORMATION RETRIEVAL FROM BIOLOGICAL DATABASES

Biological databases have systems that help users easily find and
retrieve data. Two of the most popular systems are Entrez and
Sequence Retrieval Systems (SRS), which allow users to search
across multiple databases at once and get combined results. When
searching, users often need to perform complex queries using
Boolean operators like AND, OR, and NOT. These operators help
combine search terms in logical ways: AND means both terms must
be included, OR means either term can be included, and NOT
excludes results containing certain terms. Parentheses can group terms
to define a concept, and quotes can be used to search for exact
phrases. Most search engines in biological databases use this type of
Boolean logic.
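One way to picture what AND, OR, NOT, and parentheses do is with sets of matching entries. The sketch below assumes a hypothetical index mapping each search term to the IDs of entries that contain it; it illustrates the logic only and does not describe how Entrez or SRS are implemented internally.

```python
# Hypothetical index: search term -> IDs of the database entries containing it.
index = {
    "kinase": {"P1", "P2", "P3"},
    "human": {"P1", "P3", "P4"},
    "membrane": {"P3"},
}
kinase, human, membrane = index["kinase"], index["human"], index["membrane"]

kinase_and_human = kinase & human        # kinase AND human -> both terms required
kinase_or_human = kinase | human         # kinase OR human -> either term is enough
kinase_not_membrane = kinase - membrane  # kinase NOT membrane -> exclude membrane hits
grouped = (kinase | membrane) & human    # (kinase OR membrane) AND human

print(sorted(kinase_and_human), sorted(grouped))  # ['P1', 'P3'] ['P1', 'P3']
```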

5.1 ENTREZ

Entrez, developed and maintained by NCBI, is a powerful tool for
retrieving biological data. It allows users to perform text-based
searches across multiple types of data, including genetic sequences,
structural information, and biomedical literature. A key feature of
Entrez is its ability to integrate data by cross-referencing entries
across different NCBI databases. This means users can access related
information without visiting separate databases. For example, a
nucleotide sequence page might link to the translated protein
sequence, genome mapping data, related literature, and even protein
structures.

Entrez offers several search options to narrow results. The "Limits"
feature lets users focus on specific subsets of data, like restricting the
search to a certain database or data type. "Preview/Index" connects
searches with Boolean operators and keywords, while "History"
allows users to review and combine previous searches. Users can also
store results in the "Clipboard" for later use.

5.2 GenBank

GenBank is a comprehensive database that contains annotated nucleic
acid sequence data from a wide range of organisms. It includes various
types of sequence data, such as genomic DNA, mRNA, cDNA, ESTs
(Expressed Sequence Tags), and high-throughput raw sequence data.
GenBank also holds sequence polymorphisms. In addition to nucleic
acid sequences, there is a related database, GenPept, which contains
protein sequences, most of which are conceptual translations of DNA
sequences, although some are derived from peptide sequencing
techniques.

There are two main ways to search GenBank. One method is using
text-based keywords, similar to searching PubMed. The other
method involves searching for sequence similarity using a tool like
BLAST (Basic Local Alignment Search Tool), which allows users to
find sequences that are similar to a given input sequence.

GenBank Sequence Format is the structure used to organize and
display nucleotide sequence data. When you retrieve sequence data
from GenBank, it is provided as a flat file with three main sections:
Header, Features, and Sequence.
1. Header Section:

● LOCUS: A unique identifier for the sequence, followed by the
sequence length and molecule type (DNA, RNA).
● DEFINITION: Provides a summary of the sequence, including
the name of the sequence, the source organism, and whether the
sequence is complete or partial.
● ACCESSION: The accession number, a unique identifier assigned
when the sequence is first submitted to GenBank. This number
should be cited in publications.
● Taxonomy: Information about the organism's classification,
linked to the NCBI taxonomy database.
● REFERENCE: Details about the publication related to the
sequence, including authors, title, and citation.
● Contact Information: Information about the submitter of the
sequence.

2. Features Section:

● Source: Details of the organism, including the length of the
sequence and its taxonomy identification number.
● Gene Information: Includes gene name, nucleotide coding
sequence (CDS), and other significant biological features like
exons and translated protein sequences.

3. Sequence Section:

● The actual sequence of DNA or protein is displayed, starting
with the label "ORIGIN".
● Base Count: Shows the count of each nucleotide (A, G, C, T) in
DNA sequences.
● The sequence ends with "//".
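Because the flat file is plain text with keyword-labeled lines, a few of its parts can be pulled out with simple line handling. This sketch uses a hypothetical toy record in a simplified GenBank layout (not a real entry, and not the exact column layout of a real file); it reads the LOCUS, DEFINITION, and ACCESSION lines, collects the sequence between ORIGIN and "//", and computes a base count. It ignores continuation lines and the FEATURES table, so for real records a dedicated parser is the safer choice.

```python
from collections import Counter

# A toy record in a simplified GenBank layout (hypothetical, not a real entry).
TOY_RECORD = """\
LOCUS       TOY0001                   24 bp    DNA     linear
DEFINITION  Hypothetical toy sequence for illustration.
ACCESSION   TOY0001
FEATURES             Location/Qualifiers
     source          1..24
ORIGIN
        1 atggcgtacg ttagccatgc ttaa
//
"""


def parse_genbank(text: str) -> dict[str, str]:
    """Pull a few header fields and the sequence out of one flat-file record."""
    record: dict[str, str] = {}
    sequence_parts: list[str] = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # end of the record
            break
        if in_sequence:
            # Drop the leading position number; keep the 10-base blocks.
            sequence_parts.extend(line.split()[1:])
        elif line.startswith("ORIGIN"):
            in_sequence = True
        elif line.startswith(("LOCUS", "DEFINITION", "ACCESSION")):
            keyword, _, value = line.partition(" ")
            record[keyword] = value.strip()
    record["SEQUENCE"] = "".join(sequence_parts).upper()
    return record


parsed = parse_genbank(TOY_RECORD)
print(parsed["ACCESSION"], len(parsed["SEQUENCE"]))  # TOY0001 24
print(Counter(parsed["SEQUENCE"]))  # base count
```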

Search and Retrieval:

You can search GenBank using various qualifiers to narrow down the
data, such as:

● [GENE]: for gene name.
● [AUTH]: for author name.
● [ORGN]: for organism name.

Alternative Sequence Formats

Abstract Syntax Notation One (ASN.1) is a standard notation for
describing structured data; NCBI uses it to represent sequence
records in a form designed for accessing relational databases. The key
feature of ASN.1 is that each piece of information in a sequence record
is separated by tags, making it easier to add data to relational tables
and later retrieve it. While ASN.1 is more difficult for humans to read,
it offers significant advantages for computer processing, such as
efficient filtering, parsing, and the ability to transmit and integrate
data between databases.

Sequence Format Conversion

In sequence analysis and phylogenetic studies, there is often a need to
convert data between various sequence formats. A popular tool for
sequence format conversion is Readseq, developed by Don Gilbert at
Indiana University. This tool can read sequences in nearly any format and
convert them into a new desired format. It is available via a web interface
at Readseq Web Interface.
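To show what a format conversion involves, here is a minimal sketch of writing and reading FASTA, one of the simplest and most widely used sequence formats: a header line starting with ">" followed by the sequence wrapped at a fixed width. The 60-character width and the function names are assumptions; this does not reproduce Readseq's behavior, which handles many formats and edge cases.

```python
def to_fasta(identifier: str, description: str, sequence: str, width: int = 60) -> str:
    """Write one record in FASTA format: a '>' header line, then wrapped sequence."""
    header = f">{identifier} {description}".rstrip()
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header, *lines]) + "\n"


def parse_fasta(text: str) -> dict[str, str]:
    """Read FASTA text back into {identifier: sequence}."""
    records: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            current = (line[1:].split() or [""])[0]  # identifier = first word
            records[current] = []
        elif line and current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


fasta = to_fasta("TOY0001", "Hypothetical toy sequence", "ACGT" * 40)
print(fasta)
print(parse_fasta(fasta)["TOY0001"] == "ACGT" * 40)  # True
```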

SRS (Sequence Retrieval System)

The SRS is a retrieval system maintained by the European
Bioinformatics Institute (EBI) and is similar to NCBI Entrez. SRS
allows users to query multiple databases simultaneously, offering a
form of database integration. While it is less integrated than Entrez,
it still provides useful features, such as:

● Quick Text Search: Allows basic querying with a single input box.
● Standard Query Form: Offers a more structured search with four
criteria (fields) connected by Boolean operators.
● Extended Query Form: Provides the flexibility to use more
diverse criteria and fields for more complex searches.

4.1. Crop Production

In agricultural crop production, the goal is to create sustainable
farming systems that use fewer resources like water, soil, and energy
while maintaining crop yields and improving farm income. The
European Union's Sustainable Use of Pesticides strategy promotes
Integrated Pest Management (IPM), which combines biocontrol
agents, plant genetics, cultural methods, and limited pesticide use to
reduce environmental impact. Successful implementation of IPM
requires careful planning, record-keeping, and the integration of new
technologies. Decision Support Systems (DSS) help farmers make
better decisions by collecting and analyzing data, offering
recommendations for both short-term and long-term management.
However, many DSS tools are not widely adopted because they often
address specific problems rather than the broad range of challenges
farmers face.

4.1.1. Structure of a DSS

A Decision Support System (DSS) is a complex tool designed to
assist farmers in making informed decisions by integrating and
analyzing data related to crop production. Key decisions, especially
pre-cultivation and cultivation choices, are crucial as they involve
substantial resource allocation and long-term impacts. These
decisions are often difficult due to complexity and uncertainty, such
as unpredictable weather. The DSS should be accessible via the web,
eliminating the need for software installation and enabling continuous
updates. It uses both static site profiles (e.g., soil type, previous
crops) and dynamic, site-specific data (e.g., weather, crop status)
collected from sensors or human reports. The system analyzes this
data with expert knowledge to support decision-making, but the final
decision remains with the user. The process is cyclical, with data
flowing from the environment to the system and back, helping to
refine decisions and actions over time. However, the success of a DSS
depends on clear user support, as it is intended to assist, not replace,
the decision maker.

Actors and Infrastructures of the DSS: Outlines the roles of the DSS
provider (managing weather stations and crop data), client enterprises
(farms or organizations), and crop managers (technicians or advisors who
input data and interpret the output).
Monitoring the Crop Environment: Details how agro-meteorological
stations collect weather data, while reference crops are used to monitor the
field conditions, which help improve DSS interpretation.
Management of Data Fluxes: Explains how weather and crop data are
automatically collected and stored in databases, with weather data
transmitted via gateways and crop data input by crop managers through the
DSS interface.
Data Analysis: Describes how weather and crop data are analyzed to
provide decision support for various aspects of crop cultivation, including
sowing, fertilization, and pest control. The analysis follows a structured
problem-solving process.
Decision Supports: Highlights the outputs of the DSS, which include
detailed decision support tools for crop management, like dashboards
showing current conditions and risk assessments for diseases, fertilization,
and other factors.
Technological Infrastructure: Outlines the four components of the DSS:
Weather (data collection), Crop (data storage), Analyze (decision support
calculations), and Access (user management and system interface), all
integrated into a web portal for easy access and interaction.
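As one illustration of the "Data Analysis" step, the sketch below turns daily weather data into a decision hint using growing degree days (GDD), a heat-accumulation measure often used in crop and pest models. This technique is swapped in for illustration: the text does not say which models a particular DSS uses, and the base temperature of 10 °C and the alert threshold are hypothetical values. In keeping with the DSS principle above, the function only suggests an action; the decision stays with the user.

```python
def growing_degree_days(daily_temps: list[tuple[float, float]], base_temp: float = 10.0) -> float:
    """Sum of daily mean temperature above base_temp (simple averaging method)."""
    return sum(max(0.0, (t_min + t_max) / 2 - base_temp) for t_min, t_max in daily_temps)


def advise(daily_temps: list[tuple[float, float]], threshold: float,
           base_temp: float = 10.0) -> str:
    """Turn accumulated heat units into a suggestion; the user makes the decision."""
    gdd = growing_degree_days(daily_temps, base_temp)
    if gdd >= threshold:
        return f"{gdd:.1f} degree-days: threshold reached, consider scouting the field"
    return f"{gdd:.1f} degree-days: below threshold, no action suggested"


# Hypothetical (min, max) daily temperatures in degrees Celsius from a weather station.
weather = [(8.0, 20.0), (12.0, 24.0), (5.0, 11.0)]
print(advise(weather, threshold=10.0))
```

Here the daily means are 14, 18, and 8 °C, which add 4, 8, and 0 degree-days, for a total of 12.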

Increased Automation: DSS will become more automated, requiring
less human intervention for data collection, analysis, and
decision-making, which will make it easier for farmers to manage
crops effectively.
Integration with IoT: With the growth of the Internet of Things
(IoT), DSS will integrate more sensor data (e.g., soil moisture,
temperature) for real-time decision-making, allowing for more precise
and timely actions.
AI and Machine Learning: Artificial Intelligence (AI) and machine
learning will play a larger role in DSS, improving the accuracy of
predictions and automating complex decision processes, such as pest
control and disease management.
Mobile Accessibility: DSS will increasingly be accessible through
mobile apps, enabling farmers to access decision support tools
anywhere, anytime, on their smartphones.
Cloud Computing: More DSS will be hosted on the cloud, allowing
for scalable, real-time data analysis and easier collaboration between
farmers, advisors, and researchers.
Better Data Visualization: Future DSS will offer advanced data
visualization tools, such as interactive maps and dashboards, to make
complex data more understandable and actionable for users.

Chapter 12: Expert Systems

In the early 1970s, researchers were interested in understanding how
experts make decisions and whether these decisions could be modeled
in a computer. This led to the creation of expert systems, which were
computer programs designed to imitate the decision-making of human
experts, like doctors. The idea was that these programs could carry the
knowledge of experts and be used around the world.

By the 1980s and 1990s, the term "expert system" had given way to
"knowledge-based system". This shift reduced the pressure to create
programs that behaved like real experts and instead focused on
systems that contained a lot of useful knowledge but did not need to
be as capable as a human expert. This change represented a move
towards better and more advanced problem-solving technologies.

5. Characteristics of Expert System Applications

5.1 Structured vs. Unstructured Tasks

Expert systems are most effective for structured tasks, where
problems can be clearly defined and broken down into formal rules.
These systems rely on structured data and logical rules to solve
problems, making them ideal for situations where the task can be
easily defined, even if statistical or optimization methods can't be
applied.
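A structured task broken into formal rules can be handled by a small inference engine. The sketch below uses forward chaining, one common reasoning strategy in rule-based systems: it keeps applying "if all these facts hold, conclude this" rules until nothing new can be derived. The rules and fact names are hypothetical examples, not taken from any real expert system.

```python
Rule = tuple[frozenset[str], str]  # (conditions that must all hold, conclusion)

# Hypothetical rules for illustration only.
RULES: list[Rule] = [
    (frozenset({"leaf_spots", "humid_weather"}), "fungal_infection_likely"),
    (frozenset({"fungal_infection_likely"}), "review_fungicide_options"),
    (frozenset({"wilting", "dry_soil"}), "water_stress_likely"),
]


def forward_chain(facts: set[str], rules: list[Rule]) -> set[str]:
    """Apply rules repeatedly until no new conclusion can be added."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)  # a derived fact can trigger further rules
                changed = True
    return known


print(sorted(forward_chain({"leaf_spots", "humid_weather"}, RULES)))
# ['fungal_infection_likely', 'humid_weather', 'leaf_spots', 'review_fungicide_options']
```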

5.2 Support vs. Replace

While expert systems were originally designed to replace human
experts, in many cases, they are better suited for supporting
decision-makers rather than replacing them. For example, in
accounting systems like ExperTAX, the expert system helps
accountants by providing advice, not by fully replacing their
expertise.

5.3 Available Time

The amount of time available to make a decision is crucial. In
real-time situations where quick decisions are needed, support from
expert systems may not be enough. However, if the system is used to
provide valuable insights quickly, it can assist experts by saving time
searching for information.

10. Expert System Strengths and Limitations

10.1 Strengths
Expert systems excel in solving real-world problems by manipulating
both syntactic (structure) and semantic (meaning) information,
allowing computers to handle tasks previously thought to require
human expertise. They can be highly effective for rule-based
problems, offering potential solutions based on pre-defined rules.
Additionally, expert systems can be integrated with other computer
technologies, such as in financial systems where they analyze data
like financial ratios, saving time and effort.
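A rule-based check of a financial ratio might look like the sketch below, which applies rules to the current ratio (current assets divided by current liabilities). The thresholds of 1.0 and 1.5 are hypothetical, not accounting guidance. The explicit scope check at the top shows one way to address the limitation discussed next: without it, a rule-based system would happily give advice on inputs its rules were never meant to cover.

```python
def assess_current_ratio(current_assets: float, current_liabilities: float) -> str:
    """Rule-based advice on the current ratio (current assets / current liabilities)."""
    # Scope check: outside these inputs the rules below have nothing valid to say.
    if current_assets < 0 or current_liabilities <= 0:
        return "outside scope: ratio undefined for these inputs"
    ratio = current_assets / current_liabilities
    if ratio < 1.0:  # hypothetical threshold
        return f"warning: current ratio {ratio:.2f} is below 1.0"
    if ratio < 1.5:  # hypothetical threshold
        return f"caution: current ratio {ratio:.2f} is between 1.0 and 1.5"
    return f"ok: current ratio {ratio:.2f}"


print(assess_current_ratio(120.0, 100.0))  # caution: current ratio 1.20 is between 1.0 and 1.5
```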

10.2 Limitations
Despite their advantages, expert systems have limitations. One major
issue is their narrow scope—they are typically designed to solve
specific problems and don't know when a problem is outside their
scope. They also face challenges in keeping their knowledge up to
date, especially in areas where rules change frequently, like taxation
or technology. Modifying the rule set can also be difficult for
non-experts, making it hard to adapt or improve the system.
Furthermore, expert systems can capture tangible knowledge, but they
often fail to account for "intangibles"—like human intuition or
judgment—that can be crucial in decision-making.

Chapter 10: ICT in Quality Management

1. Introduction
With the rise of global food markets, ensuring food safety and quality
has become a top priority. Issues like mad cow disease and food
contamination have made consumers more concerned. As a result,
there are stricter rules for food quality control from farm to table, and
retailers are now more focused on food safety, pushing producers and
regulators worldwide to improve their quality systems.

1.1. Food Quality and Safety

Food safety and quality are mainly the responsibility of producers,
processors, and traders. Public authorities help by ensuring proper risk
assessments. Everyone in the food supply chain, from farmers to
consumers, must follow guidelines to ensure food is safe and of good
quality.

1.2. Actors in the Food Supply Chain

● Farmers: Handle crops and keep records.
● Processors: Manage raw materials and control the production
process.
● Consumers: Store and prepare food correctly.
● Wholesalers and Retailers: Store and distribute food properly.

Supporters include service providers like transporters and researchers,
while enablers like regulators and food safety agencies ensure
everything meets standards.

1.3. World Trade Organization (WTO)

The WTO’s Doha round aimed to help developing countries improve
access to markets, but talks stalled because of disagreements between
the EU and the US on agricultural subsidies and market access,
leaving developing countries concerned about unfair competition
from subsidized products.
