U1
Data is a collection of raw, unorganized facts that must be processed to become meaningful. Data is always interpreted, by a
human or machine, to derive meaning; on its own, raw data carries little meaning. Data can come in the form of text, observations, figures,
images, numbers, graphs, or symbols. For example, data might include individual prices, weights, addresses, ages, names,
temperatures, dates, or distances.
• Data helps in making better decisions.
• Data helps one to evaluate the performance.
• Data helps one understand consumers and the market.
• "Data" is the plural of the Latin word datum, which originally meant "something given." Its early usage in English
dates back to the 1600s. Over time, "data" has also come to be used as a singular mass noun.
• Example: Each student's test score is one piece of data.
There are two main types of data:
•Quantitative data is provided in numerical form, like the weight, volume, or cost of an item.
•Qualitative data is descriptive, but non-numerical, like the name, sex, or eye color of a person.
Information
When data is processed, organized, structured or presented in a given context so as to make it useful, it is called
information. Information is defined as knowledge gained through study, communication, research, or instruction.
Essentially, information is the result of analyzing and interpreting pieces of data.
• "Information" is an older word that dates back to the 1300s and has Old French and Middle English origins. It
has always referred to "the act of informing," usually in regard to education, instruction, or other knowledge
communication.
• The average score of a class or of the entire school is information that can be derived from the given data.
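As a minimal sketch of the difference, the test scores below are raw data, and the class average computed from them is information. The student names and values are made up for illustration.

```python
from statistics import mean

# Raw data: individual test scores (hypothetical values for illustration).
test_scores = {"Asha": 72, "Ben": 85, "Chitra": 91, "Dev": 64}

# Processing the data yields information: a summary that supports a decision.
class_average = mean(test_scores.values())
top_student = max(test_scores, key=test_scores.get)

print(f"Class average: {class_average}")
print(f"Highest score: {top_student}")
```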
Data Vs Information
Database
A database is an organized collection of structured information, or data, typically stored electronically in a
computer system. A database is usually controlled by a database management system (DBMS).
The Database is an essential part of our life. We encounter several activities that involve our interaction with databases, for
example in the bank, in the railway station, in school, in a grocery store, etc. These are the instances where we need to store
a large amount of data in one place and fetch these data easily.
DBMS
A Database Management System (DBMS) is software used to manage the data in a database. Some popular DBMSs
are MySQL, Oracle, and MongoDB. A DBMS provides many operations, e.g. creating a database, storing data in the database,
updating existing data, and deleting data from the database. A DBMS enables you to store, modify and retrieve
data in an organized way. It also provides security to the database.
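The operations listed above can be sketched with Python's built-in sqlite3 module. SQLite is used here only because it needs no installation; the table and column names are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for the DBMS (an assumption of this sketch).
conn = sqlite3.connect(":memory:")

# Create: define a table inside the database.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")

# Store: insert rows.
conn.executemany(
    "INSERT INTO student (id, name, marks) VALUES (?, ?, ?)",
    [(1, "Asha", 72), (2, "Ben", 85)],
)

# Update: modify an existing row.
conn.execute("UPDATE student SET marks = ? WHERE id = ?", (75, 1))

# Delete: remove a row.
conn.execute("DELETE FROM student WHERE id = ?", (2,))

# Retrieve: read back what is left.
rows = conn.execute("SELECT id, name, marks FROM student ORDER BY id").fetchall()
print(rows)
conn.close()
```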
RDBMS
A relational database is a collection of information that organizes data in predefined relationships where data is stored in
one or more tables (or "relations") of columns and rows, making it easy to see and understand how different data structures
relate to each other. A relationship is a logical connection between different tables, established on the basis of interaction
among these tables. The relational model was proposed by E. F. Codd at IBM in 1970.
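A small sketch of two related tables follows, again using SQLite through Python. The department and employee tables and their columns are illustrative assumptions; the relationship is expressed by a foreign key and followed with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE employee (
           emp_id   INTEGER PRIMARY KEY,
           emp_name TEXT NOT NULL,
           dept_id  INTEGER NOT NULL REFERENCES department(dept_id)
       )"""
)
conn.executemany("INSERT INTO department VALUES (?, ?)", [(10, "Physics"), (20, "Maths")])
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Asha", 10), (2, "Ben", 20), (3, "Chitra", 10)],
)

# The join follows the relationship from each employee row to its department row.
staff = conn.execute(
    """SELECT e.emp_name, d.dept_name
       FROM employee AS e JOIN department AS d ON e.dept_id = d.dept_id
       ORDER BY e.emp_id"""
).fetchall()
print(staff)
```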
Requirement/Need of DataBase
A good database is crucial to any company or organisation. This is because the database
stores all the pertinent details about the company, such as employee records, transactional
records, salary details, etc.
1.Manages large amounts of data
A database stores and manages a large amount of data on a daily basis. Other tools, such as spreadsheets,
do not scale well to this volume of data.
2.Accurate
A database is accurate because it has built-in constraints, checks, etc. This
means that the information available in a database is likely to be correct in most
cases.
3.Easy to update data
In a database, it is easy to update data using a Data Manipulation Language (DML). SQL, for example,
includes DML statements such as INSERT, UPDATE and DELETE.
4.Security of data
Databases have various methods to ensure security of data. User logins are required before accessing a
database, and access privileges control what each user may do. These allow only authorised users to access the database.
5.Data integrity
This is ensured in databases by using various constraints for data. Data integrity in databases makes sure that the
data is accurate and consistent in a database.
6.Easy to research data
It is very easy to access and research data in a database. This is done using Data Query
Languages (DQL) which allow searching of any data in the database and performing
computations on it.
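Two of the points above, constraints that protect integrity and queries that search and compute, can be sketched as follows. The salary table and its CHECK rule are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE salary (
           emp_id   INTEGER PRIMARY KEY,
           emp_name TEXT NOT NULL,
           amount   INTEGER NOT NULL CHECK (amount >= 0)
       )"""
)
conn.executemany("INSERT INTO salary VALUES (?, ?, ?)", [(1, "Asha", 40000), (2, "Ben", 55000)])

# A row that violates the CHECK constraint is rejected by the database itself.
rejected = False
try:
    conn.execute("INSERT INTO salary VALUES (?, ?, ?)", (3, "Chitra", -100))
except sqlite3.IntegrityError:
    rejected = True

# A query both searches the data (WHERE) and computes on it (SUM).
(total,) = conn.execute(
    "SELECT SUM(amount) FROM salary WHERE amount > ?", (30000,)
).fetchone()
print(rejected, total)
```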
Characteristics of a Database
1.We should be able to store all kinds of data that exist in this real world. Since we need to work with all kinds of data
and requirements, the database should be strong enough to store all kinds of data that are present around us.
2.We should be able to relate the entities/tables in the database by means of relationships, i.e. related tables should be
linked. Let us say an employee works for a department. This implies that an Employee is related to a particular
department. We should be able to define such a relationship between any two entities in the database. Ideally, no table
should be left without any mapping to the rest of the database.
3.There should not be any duplication of data in the database. Data should be stored in such a way that it
is not repeated in multiple tables. If repeated, it would waste database space unnecessarily, and
maintaining such data becomes chaotic.
4.A DBMS has a strong query language. Once the database is designed, this helps the user to retrieve and
manipulate the data. If a particular user wants to see any specific data, they can apply as many filtering
conditions as they want and pull the data that they need.
5.Multiple users should be able to access the same database without affecting one another. For example, if several teachers
want to update a student’s marks in the Results table at the same time, then they should be allowed to update
the marks for their subjects, without modifying other subject marks. A good database should support this
feature.
6.It supports multiple views for users, depending on their role. In a school database, students will be able to see
only their reports and their access would be read-only. At the same time, teachers will have access to all the
students with modification rights. But the database is the same. Hence a single database provides different
views to different users.
7.The database should also provide security, i.e. when multiple users are accessing the database,
each user will have their own level of rights to see the database. Some of them will be allowed to see the
whole database, and some will have only partial rights. For example, an instructor who teaches Physics will
have access to see and update the marks of that subject, but will not have access to other subjects. The HOD,
however, will have full access to all the subjects.
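The idea of role-dependent views can be sketched with an SQL view. The results table and the physics_results view are hypothetical. SQLite has no user accounts, so this sketch shows only how a view restricts what is visible; in a server DBMS the view would be combined with per-user privileges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (student TEXT, subject TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [("Asha", "Physics", 72), ("Asha", "Maths", 88), ("Ben", "Physics", 65)],
)

# The Physics instructor's view exposes only the Physics rows of the same table.
conn.execute(
    "CREATE VIEW physics_results AS "
    "SELECT student, marks FROM results WHERE subject = 'Physics'"
)

physics_view = conn.execute(
    "SELECT student, marks FROM physics_results ORDER BY student"
).fetchall()

# The HOD works with the full table and sees every subject.
hod_row_count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(physics_view, hod_row_count)
```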
Advantages of Database
1. Improves data sharing and employees’ productivity
As all data is centralized in a database management system, it creates an environment in which employees
have greater access to a variety of data in one place.
Thus, users can respond fast to changes in their environment and make better decisions.
For example, users like marketers will have access to more customer data, enabling them to better
understand customers and react fast to the changes in customers’ needs or market preferences.
2. Ensures data backup and recovery
Databases provide tools for regular backups and recovery procedures, reducing the risk of data
loss due to hardware failures and other disasters.
3. Shows you the big picture
Rather than checking one piece of software to look at the number of sales, another location to
look at the number of digital marketing leads, then another to review the financial figures, DBMS
lets you view all of this in one “single” place.
4. Supports data analytics
Databases can be integrated with analytics tools, allowing organisations to gain insights from their data for better
decision making.
5. Raises your ability to increase profits
It helps you better understand how your business processes are working. Are they working together to build
your success? Or are they operating independently, not aiding each other?
You may be making mistakes that cost a lot of money and effort.
For example, let’s say you see that your business has high sales transactions but you are not gaining money.
Instead, you are losing money.
Looking at your DBMS, you notice that you are selling too many low-end products. But as you are a small
business, you can’t sell them in mass quantity (like Amazon) to gain a real profit.
6. Ensures consistency of data
In a database, each data item is held only once. It eliminates the threat of the item being updated on one
system and not on another. This ensures data consistency.
7. Improves your overall marketing and ensures a better customer experience
DBMS provides a clean and centralized record of each of your customers (within a Single Customer View
database).
It stores customer info such as names, addresses, emails, age, gender, locations, etc.
It also records information on how customers interact with your business (including your brands, websites,
products, applications, etc.)
In other words, the database allows you to obtain a full picture of your customers, that you can use for
further data analytics in marketing.
Using this info, you can optimize your overall marketing efforts such as digital customer experience,
personalization, retention, lead generation, etc.
DBMS helps the data-driven marketers understand customer behavior at scale and personalize offers for
every buyer.
Thus, your business can serve your customers better while delivering a remarkable experience.
Disadvantages of Database
1. Increased costs
As database management systems require advanced hardware, software, and skilled employees, it is often
associated with higher costs.
The cost of maintaining the resources to operate a DBMS can include training, licensing, regulatory
compliance, etc.
A DBMS also requires a high-speed processor and a large memory size to store data safely and securely.
These can be expensive too.
2. Complexity
To cover a lot of requirements and solve many data problems, the DBMS has complex functionality that
makes it a complicated software.
It is critical for developers, designers, and database users to have an appropriate skill set in order to use
the database successfully and unlock its power.
If they don’t understand the DBMS, then it may lead to loss of data or database failure.
3. Higher impact of a failure
The fact that the DBMS is a central place for all of your data increases the vulnerability of the system.
Since all users rely on one centralized place, the failure of any component can have a severe negative
impact on operations or permanent damage to the database.
4. Performance overhead
Databases introduce a performance overhead due to the need to read and write data to
storage and perform complex queries. Poorly designed databases can suffer from slow performance.
5. Difficult to learn
Learning to design and query databases effectively requires time and effort, making it a potential
challenge for newcomers.
6. Data migration
Moving data between different database systems or versions can be challenging and time-consuming.
7. Resource consumption
Databases can consume a significant amount of system resources, including CPU, memory,
and disk space.
8. Security concerns
While databases offer security features, they are still vulnerable to security breaches if not
configured and maintained properly.
Data model Schema and Instance
Schema
Schema is the overall description or design of the database. The basic structure of how the data will be stored in the
database is called schema.
It is the skeleton, structure, or blueprint of how our data will look. It does not hold data.
Schema is of three types: Logical Schema, Physical Schema and View Schema.
•Logical/Conceptual Schema – It describes the database design at the logical level. It specifies all the logical constraints
applied to the data stored in a table or set of tables. This is done by a programmer.
•Physical Schema – It describes the database design at the physical level: how the data is actually stored as blocks on disk
storage (KB, MB, GB, TB). All this information is hidden from the programmer.
•View Schema – It defines the design of the database at the view level. It specifies how users will interact with the
system, for example through a GUI (graphical user interface).
Example:
Let’s say our database, named school, has a table teacher. The teacher table requires the name, dob (date of birth), and doj (date of joining) of each teacher, so we design a
structure as:
Teacher table
name: String
doj: date
dob: date
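As a sketch, the teacher structure above could be declared in SQLite as shown below. The TEXT and DATE type names are an assumed mapping of "String" and "date". Note that creating the schema stores no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for the "school" database
conn.execute("CREATE TABLE teacher (name TEXT, doj DATE, dob DATE)")

# The schema describes the columns of the table...
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(teacher)")]

# ...but the table itself holds no data yet.
row_count = conn.execute("SELECT COUNT(*) FROM teacher").fetchone()[0]
print(columns, row_count)
```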
Instance
The data stored in the database at a particular moment in time is called an instance of the database. The instance
can be changed by certain CRUD operations, such as the addition, update and deletion of data. Note that a
search query does not make any changes to the instance.
Example:
Let’s say our database, named School, has a table teacher. Suppose the table has 50 records today, so the instance of the
database has 50 records for now. If we add another fifty records tomorrow, then tomorrow the instance will have a
total of 100 records. The data held at each of these moments is an instance.
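The 50-then-100 record example can be sketched as follows. The teacher names and dates are placeholders. The schema stays fixed while the instance changes, and the search query in the middle leaves the instance untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teacher (name TEXT, doj DATE, dob DATE)")


def count_rows() -> int:
    """Size of the current instance of the teacher table."""
    return conn.execute("SELECT COUNT(*) FROM teacher").fetchone()[0]


# Today's instance: 50 records (placeholder names and dates).
conn.executemany(
    "INSERT INTO teacher VALUES (?, ?, ?)",
    [(f"Teacher {i}", "2020-01-01", "1985-01-01") for i in range(50)],
)
today = count_rows()

# A search query reads the instance without changing it.
conn.execute("SELECT name FROM teacher WHERE name = ?", ("Teacher 7",)).fetchall()
after_search = count_rows()

# Tomorrow's instance: 50 more records are added.
conn.executemany(
    "INSERT INTO teacher VALUES (?, ?, ?)",
    [(f"Teacher {i}", "2021-01-01", "1990-01-01") for i in range(50, 100)],
)
tomorrow = count_rows()
print(today, after_search, tomorrow)
```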
Database Architecture
A Database Architecture is a representation of DBMS design. It helps to design, develop, implement, and
maintain the database management system. A DBMS architecture allows dividing the database system into
individual components that can be independently modified, replaced, and altered. We choose a database
architecture depending on several factors like the size of the database, number of users, and relationships between the
users. There are two types of database models that we generally use, logical model and physical model.
Types of DBMS Architecture
There are several types of DBMS Architecture that we use according to the usage requirements.
•1-Tier Architecture
•2-Tier Architecture
•3-Tier Architecture
1-Tier Architecture
1 Tier Architecture in DBMS is the simplest architecture of Database in which the client, server, and Database
all reside on the same machine. A simple one tier architecture example would be anytime you install a
Database in your system and access it to practice SQL queries. But such architecture is rarely used in
production.
In 1-Tier Architecture the database is directly available to the user, who works on the DBMS directly.
For example, to learn SQL we set up an SQL server and the database on the local system. This enables us to directly
interact with the relational database and execute operations. Industry rarely uses this architecture in production and generally
goes for 2-Tier or 3-Tier Architecture.
3-Tier Architecture
In 3-Tier Architecture, the application layer resides between the user and the DBMS. It is responsible for communicating the
user’s request to the DBMS and sending the response from the DBMS back to the user. The application
layer (business logic layer) also processes functional logic, constraints, and rules before passing data to the user
or down to the DBMS.
Example: any large website on the internet.
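The role of the application layer can be sketched in a single Python process. In a real deployment the three tiers usually run on separate machines; here they are plain functions, and the account table, the withdraw rule, and all names are hypothetical.

```python
import sqlite3

# Data tier: the DBMS (an in-memory SQLite database in this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (owner TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES ('asha', 100)")


# Application tier: applies business rules before touching the database.
def withdraw(owner: str, amount: int) -> str:
    if amount <= 0:
        return "rejected: amount must be positive"
    row = conn.execute("SELECT balance FROM account WHERE owner = ?", (owner,)).fetchone()
    if row is None:
        return "rejected: unknown account"
    if row[0] < amount:
        return "rejected: insufficient funds"
    conn.execute("UPDATE account SET balance = balance - ? WHERE owner = ?", (amount, owner))
    return f"ok: new balance {row[0] - amount}"


# Presentation tier: the client only calls the application tier, never the database.
def client_request(owner: str, amount: int) -> None:
    print(withdraw(owner, amount))
```

Calling `client_request("asha", 30)` sends the request through the application tier, which validates it and only then updates the data tier.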
Advantages of 3-Tier Architecture
•Enhanced scalability: Scalability is enhanced due to the distributed deployment of application servers. Now, individual
connections need not be made between the client and server.
•Data Integrity: 3-Tier Architecture maintains Data Integrity. Since there is a middle layer between the client and the server,
data corruption can be avoided/removed.
•Security: 3-Tier Architecture Improves Security. This type of model prevents direct interaction of the client with the server
thereby reducing access to unauthorized data.
Disadvantages of 3-Tier Architecture
•More Complex: 3-Tier Architecture is more complex in comparison to 2-Tier Architecture. Communication Points are also
doubled in 3-Tier Architecture.
•Difficult to Interact: Direct interaction between the client and the database becomes more difficult because of the middle layer.
Data Independence
o Data independence refers to the ability to modify the schema at one level of the database
system without altering the schema at the next higher level.
o In addition to the data entered by users, database systems store metadata (data about data), which makes it
easier to find and retrieve data. Once a set of metadata is stored in the database, it is difficult to change or
update. However, as a database management system (DBMS) matures, it must evolve to meet the needs of
users. Updating the schema or data can be a time-consuming and complex task if all the data is interdependent.
o To solve the problem of updating metadata, it is organized in a layered structure so that changing the data at
one level does not affect the data at another. The levels are independent of one another, yet all of them are
related. Therefore, data independence helps to decouple the data from the application that uses it.
Example: Consider a database system that initially stores data in a file system. If the storage technology needs
to be upgraded to a more efficient system, such as a relational database management system (RDBMS), the
applications using the database should not be impacted by this change. The data should remain accessible and
usable without requiring modifications to the application code.
There are two types of data independence:
1. Logical Data Independence
•Logical data independence refers to the ability to change the conceptual schema without having
to change the external schema.
•Logical data independence is used to separate the external level from the conceptual view.
•If we make any changes to the conceptual view of the data, the user view of the data is not affected.
•Logical data independence occurs at the user interface level.
Techniques for Achieving Logical Data Independence
o Data Normalization
o Creating Database Abstraction Layer
o Stored Procedures
o Creating Views
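One of the listed techniques, creating views, can be sketched as follows. The student tables and the student_view name are assumptions. The conceptual schema changes (the name is split into two columns), but the application's query against the view stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema, version 1: one table holding the full name.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Asha Rao')")

# External schema: the view that applications query.
conn.execute("CREATE VIEW student_view AS SELECT id, full_name FROM student")

APP_QUERY = "SELECT id, full_name FROM student_view"
before = conn.execute(APP_QUERY).fetchall()

# Conceptual schema, version 2: the name is split into two columns.
conn.execute("DROP VIEW student_view")
conn.execute(
    "CREATE TABLE student_v2 (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.execute("INSERT INTO student_v2 VALUES (1, 'Asha', 'Rao')")
conn.execute("DROP TABLE student")

# The view is redefined over the new table; its name and columns stay the same.
conn.execute(
    "CREATE VIEW student_view AS "
    "SELECT id, first_name || ' ' || last_name AS full_name FROM student_v2"
)
after = conn.execute(APP_QUERY).fetchall()
print(before, after)
```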
2. Physical Data Independence
•Physical data independence can be defined as the capacity to change the internal schema without having to change the
conceptual schema.
•If we make any changes to the storage size of the database system server, the conceptual structure of the database is
not affected.
•Physical data independence is used to separate conceptual levels from the internal levels.
•Physical data independence occurs at the logical interface level.
Techniques to Achieve Physical Data Independence
o Storage Configuration and Optimization
o Data Replication and Partitioning
o Secure Storage
o Password Management
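Physical data independence can be sketched with an index, which is a storage-level structure. The table and index names below are hypothetical. Adding the index changes how the data is accessed internally, but the table definition, the query text, and the result stay the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ben", "Delhi"), (3, "Chitra", "Pune")],
)

QUERY = "SELECT name FROM student WHERE city = ? ORDER BY name"
before = conn.execute(QUERY, ("Pune",)).fetchall()

# Physical change: add an index on city. The conceptual schema and the query are untouched.
conn.execute("CREATE INDEX idx_student_city ON student (city)")
after = conn.execute(QUERY, ("Pune",)).fetchall()
print(before, after)
```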
Importance of Data Independence
Data independence holds significant importance in the field of database management for several reasons:
1. Flexibility
Data independence allows for changes to be made to the database structure, storage technology, or performance
optimizations without impacting the applications that rely on the data. It provides the flexibility to adapt and evolve
the database system over time, accommodating changing requirements and advancements in technology.
2. Application Independence
With data independence, applications can remain unaffected by modifications made to the database schema. This
reduces the maintenance effort required when changes are made to the database. It allows applications to be
developed and modified independently, without requiring extensive modifications to accommodate changes in the
underlying data structure.
3. Scalability
Data independence enables the seamless scaling of database systems. It allows for the addition of new data sources,
modification of storage structures, or redistribution of data without disrupting the applications or users. This
scalability is particularly crucial in environments where data volumes and user demands are constantly increasing.
4. Simplified Development
By separating the logical and physical aspects of data storage, data independence simplifies the development
process. Developers can focus on application logic without being concerned about the underlying database structure.
This separation allows for modular development, where changes to one component, such as the logical schema, do
not require modifications in other components, improving development efficiency and reducing
time-to-market.
5. Data Integrity and Consistency
Data independence helps maintain data integrity and consistency. Changes made to the database structure or
organization can be implemented without compromising the integrity of the existing data or violating predefined
constraints. The logical and physical separation provided by data independence ensures that changes made at one
level do not impact the integrity of data stored at other levels.
6. Database Evolution
Over time, databases need to evolve to accommodate new requirements, technologies, or business processes.
Data independence allows for smooth database evolution by providing a clear separation between the logical and
physical layers. This separation facilitates the migration of data between different database management systems
or storage technologies, making it easier to adopt new technologies or platforms while preserving data and
application functionality.
Database System Environment
The database system environment is comprised of the components that are meant for defining and managing the data that
we collect, store, manage and use in the database environment.
1. Hardware
The hardware component of the database system environment includes all the physical devices that comprise the
database system. It includes storage devices, processors, input and output devices, printers, network devices and
many more.
2. Software
The software component of the database environment includes all the software that we require to access, store
and regulate the database, such as the operating system, the DBMS, and application programs and utilities. The operating
system manages the computer hardware and lets other software run. The DBMS software controls and regulates the
database. Application programs and utilities access the database and, if required, manipulate it.
3. People
The people component includes all the people who are related to the database. There may be
a group of people who access the database just to resolve their queries, i.e. end users, and there may be people
who are involved in designing the database, i.e. database designers.
Some may be involved in designing the applications that provide an interface through which data entry is
possible, i.e. database programmers and analysts, and some may monitor the database, i.e. database
administrators.
4. Procedures
The procedure component of the database environment consists of the instructions and rules that regulate and control
the use of the database.
5. Data
The data component is a collection of related data: known facts that can be recorded and that have an
implicit meaning in the database.
Database System Utilities
Database system utilities are the tools that can be used by the database system administrator to control and
manage the database system.
1. Loading Utility
The loading utility helps in loading existing data files into the database. It reformats the current
format of the data files to the format required by the destination database file structure. Some loading
programs or tools are specially designed for loading data from one DBMS to another.
If you provide the source and target database storage descriptions to these loading tools,
they automatically reformat the data files to the target database storage description.
2. Backup Utility
The backup utility in the database environment helps in creating a backup copy of the entire database. Generally,
the entire data of the database is copied to mass storage and we refer to it as a backup copy. This backup copy
can be used when there is a system failure or storage of your system is corrupted.
You can also choose incremental backups, which record only the changes since the previous backup. Though an
incremental backup requires a more complex algorithm, it saves space compared to a full backup.
3. Database Storage Reorganization Utility
Sometimes we need to relocate a set of database files to a different location. The database storage
reorganization utility helps to relocate and organize the database files at the new location, and it also produces
a new access path to access the files from their new location.
4. Performance Monitoring Utility
The performance monitoring utility monitors the usage of the database by its users and provides usage statistics
to the database administrator (DBA). These statistics help the DBA to decide whether
the data files need to be reorganized, whether new indexes should be added, and whether some indexes
must be dropped to improve the performance of the database system.
There are more utilities in the database environment that help in sorting the database file on some basis, handling
data compression on the large databases, monitoring the user’s access to the database and many more.
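As one concrete illustration of a backup utility, Python's sqlite3 module offers Connection.backup (Python 3.7 or later), which copies a whole database into another connection. The sketch below takes a full backup, not an incremental one, and uses in-memory databases as an assumption; in practice the target would be a file on separate storage.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE teacher (name TEXT)")
source.executemany("INSERT INTO teacher VALUES (?)", [("Asha",), ("Ben",)])
source.commit()

# Full backup: copy every page of the source database into the target.
backup_copy = sqlite3.connect(":memory:")  # a file path would be used in practice
source.backup(backup_copy)

# Simulate a failure: the data is lost from the source.
source.execute("DELETE FROM teacher")
source.commit()

# The backup copy still holds the data as it was when the backup was taken.
restored = backup_copy.execute("SELECT name FROM teacher ORDER BY name").fetchall()
remaining = source.execute("SELECT COUNT(*) FROM teacher").fetchone()[0]
print(restored, remaining)
```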
Types And Classification Of Database Management System
1. Based on the data model
2. Based on the number of users
3. Based on the sites over which the database is distributed
4. Based on the cost
5. Based on the usage
6. Based on the flow control
1. Based on Data Model
The data model defines the physical and logical structure of a database which involves the data types, the
relationship among the data, constraints applied on the data and even the basic operations specifying retrieval
and update of data in the database. Depending upon how the data is structured, data models are further
classified into:
a. Relational Data Model
In the relational data model, we use tables to represent data and the relationship among that data. Each of the
tables in the relational data model has a unique name.
A table has multiple columns, where each column name is unique. A table holds records, each of which has a value for each
column of the table.
We also refer to the relational data model as a record-based data model, as it holds records of fixed format.
The relational data model is currently the most widely used data model.
b. Entity-Relationship Model
The Entity-Relationship model (E-R data model) represents data using objects and the relationship among these
objects. These objects are referred to as entities that represent the real ‘thing’ or ‘object’ in the real world.
c. Object-Based Data Model
Nowadays, object-oriented programming languages such as Java and C++ are widely used to develop software.
This motivated the development of an object-based data model. The object-based data model is an extension of
the E-R model which also includes notions of encapsulation and methods. There is also an object-relational data model,
which is a combination of the object-oriented data model and the relational data model.
d. Semistructured Data Model
The semistructured data model differs from the models above. In the semistructured data model,
data items or objects of the same kind may have different sets of attributes. The Extensible Markup
Language (XML) is widely used to represent semistructured data.
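A small sketch of semistructured data in the Extensible Markup Language follows. The element names and values are invented for illustration. Both items are of the same kind (person), yet they carry different sets of attributes.

```python
import xml.etree.ElementTree as ET

# Two items of the same kind with different attributes (hypothetical data).
document = """
<people>
  <person><name>Asha</name><email>asha@example.com</email></person>
  <person><name>Ben</name><phone>555-0100</phone><city>Pune</city></person>
</people>
""".strip()

root = ET.fromstring(document)

# Collect the attribute names (child element tags) that each person actually has.
attribute_sets = [
    sorted(child.tag for child in person) for person in root.findall("person")
]
print(attribute_sets)
```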
2. Classification Based on Number of Users
A database management system can also be classified on the basis of the number of users it supports: a DBMS can be used by
a single user or by multiple users.
A database system that can be used by only one user at a time is referred to as a single-user system.
A database system that can be used by multiple users at a time is referred to as a multi-user system.
3. Based on Database Distribution
Depending on the distribution of the database over numerous sites we can classify the database as:
a. Centralized DBMS
In a centralized DBMS, the entire database is stored at a single computer site. Though a centralized database
can support multiple users, the DBMS software and the data are both stored at a single computer site.
b. Distributed DBMS
In the distributed DBMS (DDBMS) the database and the DBMS software are distributed over many computer sites.
These computer sites are connected via a computer network.
4. Based on Cost of Database
It is quite difficult to classify databases on the basis of cost, as nowadays free open
source DBMS products such as MySQL and PostgreSQL are available, while a personal version of an RDBMS can cost up to
$100.
Some systems with single-user versions such as Microsoft Access are sold per copy or included in the
configuration of your desktop or laptop.
You may also have to pay millions of dollars for the installation and maintenance of a large database system.
5. Classification Based on Usage
On the basis of usage, a DBMS can be classified as a general-purpose
DBMS or a special-purpose DBMS.
A special-purpose DBMS is designed for a specific application and cannot be used for
another application without major changes. Many such systems fall into the category of online transaction processing
(OLTP) systems.
The general-purpose DBMS is the one that is designed to meet the need of as many applications as possible.
6. Based on Flow Control
Based upon the flow of control from application to DBMS, database management systems are broadly classified
into two types: active database management systems and passive database management systems.
With a passive database management system, the user must issue a query against the current state of the
database to retrieve the desired information.
Active database management systems, on the other hand, are data-driven or event-driven
systems, where the control flow between the application and the DBMS is based on the occurrence of an event.
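The contrast can be sketched with an SQL trigger, one common way a DBMS reacts to events. The stock and reorder_log tables and the threshold of 10 are assumptions. In the passive style the application must ask; in the active style the update itself causes the DBMS to act.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, quantity INTEGER)")
conn.execute("CREATE TABLE reorder_log (item TEXT, quantity INTEGER)")

# Event: an update of quantity. Condition: quantity falls below 10. Action: log a reorder.
conn.execute(
    """CREATE TRIGGER low_stock AFTER UPDATE OF quantity ON stock
       WHEN NEW.quantity < 10
       BEGIN
           INSERT INTO reorder_log VALUES (NEW.item, NEW.quantity);
       END"""
)
conn.execute("INSERT INTO stock VALUES ('pen', 50)")

# Passive style: nothing happens until the application queries the current state.
passive_answer = conn.execute(
    "SELECT quantity FROM stock WHERE item = 'pen'"
).fetchone()[0]

# Active style: the update fires the trigger inside the DBMS.
conn.execute("UPDATE stock SET quantity = 4 WHERE item = 'pen'")
log = conn.execute("SELECT item, quantity FROM reorder_log").fetchall()
print(passive_answer, log)
```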