Unit 1-1

The document provides an overview of data and information, defining data as raw facts and information as processed data that is meaningful. It discusses database management systems (DBMS), their types, characteristics, applications, advantages, and disadvantages, as well as data languages such as DDL, DML, DCL, and TCL. Additionally, it emphasizes the importance of data quality and its characteristics, which are crucial for making informed decisions.

Uploaded by

jayeshborase781
Copyright
© All Rights Reserved

Unit 1 - Basics

What is Data?
The term data refers to raw, unstructured facts that need to be processed to become meaningful.
Data stays simple and unorganized until it is structured. It usually consists of facts,
numbers, symbols, images, observations, perceptions, characters, etc.

Data must be interpreted by a machine or a human to derive meaning; on its own, it is meaningless.
Data comprises statements, characters and numbers in raw form. Examples of data: the number of
visitors to a website by country, or the history of temperature readings around the globe for the
past 100 years.

What is Information?
The term information refers to a set of data that has been processed in a meaningful way according to a
given requirement. To be useful and meaningful, information must be processed, structured and
presented in a given context.

Information is produced by processing data and possesses context, purpose and relevance. Producing it
involves manipulating the raw data.

Optimization
o The query optimizer (also known simply as the optimizer) is the database software component that
identifies the most efficient way (for example, the one that takes the least time) for a SQL statement
to access data.
o Database optimization involves maximizing the speed and efficiency with which data is retrieved.
o The process of selecting an efficient execution plan for processing a query is known as query
optimization.
o Query optimization lets the database be accessed and modified in the most efficient way possible.
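
The effect of the optimizer choosing an execution plan can be observed directly. The following is a minimal sketch, assuming Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement; the table name student, the index name idx_student_roll and the helper plan() are hypothetical names chosen for illustration. Other database systems expose their plans through different commands.

```python
import sqlite3

# Hypothetical table and index names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER, name TEXT, class TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(i, f"name{i}", "FY") for i in range(1000)],
)


def plan(sql):
    """Return the optimizer's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The fourth column of each row holds the human-readable plan detail.
    return " | ".join(row[3] for row in rows)


query = "SELECT name FROM student WHERE roll_no = 42"

# No index exists yet, so the only available plan is a full table scan.
plan_without_index = plan(query)

# Once an index on roll_no exists, the optimizer can pick an index search.
conn.execute("CREATE INDEX idx_student_roll ON student (roll_no)")
plan_with_index = plan(query)

print(plan_without_index)
print(plan_with_index)
```

The query text is identical in both calls; only the plan selected by the optimizer changes, which is the point of query optimization.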

Data Preprocessing
Data preprocessing is a process of preparing the raw data and making it suitable for a machine learning
model. It is the first and crucial step while creating a machine learning model.
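
As a rough illustration of what preparing raw data can involve, the sketch below applies three common preprocessing steps in plain Python: filling a missing value, scaling a numeric field, and encoding a text category. The sample records and the choice of these particular steps are assumptions made for the example; the notes do not prescribe specific steps.

```python
# Hypothetical raw records; None marks a missing value.
raw = [
    {"age": 20, "class": "FY"},
    {"age": None, "class": "SY"},
    {"age": 24, "class": "TY"},
]

# Step 1: fill missing ages with the mean of the known ages.
known = [r["age"] for r in raw if r["age"] is not None]
mean_age = sum(known) / len(known)
ages = [r["age"] if r["age"] is not None else mean_age for r in raw]

# Step 2: min-max scale ages into the range [0, 1].
# (Assumes the ages are not all equal, otherwise hi - lo would be zero.)
lo, hi = min(ages), max(ages)
scaled = [(a - lo) / (hi - lo) for a in ages]

# Step 3: encode each text category as an integer code.
codes = {label: i for i, label in enumerate(sorted({r["class"] for r in raw}))}

# Each record becomes a purely numeric feature row.
features = [[s, codes[r["class"]]] for s, r in zip(scaled, raw)]
print(features)
```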

Query processing refers to the range of activities involved in extracting data from a database: the
system processes the query and generates the result.

Figure: Query (SQL) -> Query Processing -> Result

Before processing a query (here, a SQL query), the system must translate it into a usable form
(a representation the system can understand).

What is a Database?
A database is a collection of inter-related data organized so that the data can be retrieved, inserted and
deleted efficiently. The data is organized in the form of tables, schemas, views, reports, etc.

For example: a college database organizes the data about the admin, staff, students, faculty, etc.

Using the database, you can easily retrieve, insert, and delete the information.
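
To make the idea of inter-related data concrete, here is a small sketch of a college database using Python's built-in sqlite3 module. The department and student tables, their columns and the sample rows are hypothetical; the point is only that rows in one table refer to rows in another and can be retrieved together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two related tables: each student row points at a department row.
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department (dept_id)
    );
    INSERT INTO department VALUES (1, 'Computer Science'), (2, 'Physics');
    INSERT INTO student VALUES (1, 'Ram', 1), (2, 'Jai', 2), (3, 'Om', 1);
""")

# Retrieve each student together with the name of the related department.
rows = conn.execute("""
    SELECT s.name, d.dept_name
    FROM student AS s
    JOIN department AS d ON d.dept_id = s.dept_id
    ORDER BY s.roll_no
""").fetchall()
print(rows)
```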

Database Management System (DBMS)

o A database management system is software used to manage databases. For
example: MySQL and Oracle are very popular DBMSs used in different applications.
o In the early 1960s, Charles Bachman designed the first DBMS, the Integrated Data Store (IDS).

o The term combines two parts:

Database + Management System = DBMS

o A DBMS provides an interface to perform various operations such as creating a database, storing data in it,
updating data, creating tables in the database and a lot more.
o A database management system is a collection of programs that enables users to create and maintain a
database.

Figure: Application -> DBMS -> Operating system (OS) -> Database

o A DBMS can also be defined as an interface between the application and the operating
system through which the database is accessed.
o It provides protection and security for the database.

Types of DBMS
o Relational DBMS
o Non-Relational DBMS

Relational DBMS :- In this DBMS, data is stored in table format.

Roll No   Name   Class
1         Ram    FY
2         Jai    TY
3         Om     SY
4         Sai    FY

For example: MySQL, Oracle

Non-Relational DBMS :- In this DBMS, data is stored as key-value pairs.
{ Roll No: 1,
  Name: 'Om',
  Class: 'FY' }
For example: MongoDB
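
The difference between the two storage styles can be sketched in a few lines of Python. The relational half uses the built-in sqlite3 module; the non-relational half uses a plain dictionary of JSON strings as a stand-in for a key-value or document store, which is an assumption for illustration and not how MongoDB is actually accessed. The key "student:1" and the extra Hobbies field are hypothetical.

```python
import json
import sqlite3

# Relational: every row has the same fixed columns defined by the table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER, name TEXT, class TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Ram', 'FY')")
row = conn.execute(
    "SELECT name, class FROM student WHERE roll_no = 1"
).fetchone()

# Non-relational (stand-in): each value is a self-describing document,
# so one record may carry fields that other records do not have.
store = {}
store["student:1"] = json.dumps(
    {"Roll No": 1, "Name": "Om", "Class": "FY", "Hobbies": ["chess"]}
)
doc = json.loads(store["student:1"])

print(row)
print(doc)
```

In the table, adding a Hobbies column would change the structure for every row; in the document, the extra field affects only the record that carries it.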

Characteristics of DBMS
o It uses a digital repository established on a server to store and manage the information.
o It can provide a clear and logical view of the process that manipulates data.
o DBMS contains automatic backup and recovery procedures.
o It can simplify complex relationships between data.
o It supports the manipulation and processing of data.
o It provides data security.
o It lets users view the database from different viewpoints according to their requirements.

Applications of DBMS
o Banking - For maintaining customer information, accounts, loans and banking transactions.

o Universities - For maintaining student information, records, courses, registrations and grades.

o Railway Reservation - For checking the availability of seats and tickets on different trains.

o Airlines - For reservation and schedule information.

o Telecommunication - For keeping records of calls made, generating monthly bills, etc.

o Finance - For storing information about holdings, sales and purchases of financial instruments.

o Sales - For customer, product and purchase information.


Advantages of DBMS
o Controls database redundancy: It can control data redundancy because all the data is stored in
one single database, so the same data does not have to be recorded in several places.

o Data sharing: In a DBMS, authorized users of an organization can share the data with one another.
o Easy maintenance: It is easy to maintain due to the centralized nature of the database system.
o Reduced time: It reduces development time and maintenance needs.
o Backup: It provides backup and recovery subsystems which automatically back up data to protect
against hardware and software failures and restore the data if required.

o Multiple user interfaces: It provides different types of user interfaces, such as graphical user
interfaces and application program interfaces.

Disadvantages of DBMS
o Cost of hardware and software: It requires a high-speed processor and a large memory
size to run DBMS software.

o Size: It occupies a large amount of disk space and memory to run efficiently.
o Complexity: A database system creates additional complexity and requirements.
o Higher impact of failure: A failure has a high impact because, in most organizations,
all the data is stored in a single database; if the database is damaged by a power failure or database
corruption, the data may be lost forever.

There are four types of data languages:

1. Data Definition Language (DDL)
2. Data Manipulation Language (DML)
3. Data Control Language (DCL)
4. Transactional Control Language (TCL)

DDL is short for Data Definition Language, which deals with database schemas and
descriptions of how the data should reside in the database.

o CREATE: creates a database and its objects (tables, indexes, views, stored
procedures, functions, and triggers)
o ALTER: alters the structure of the existing database
o DROP: deletes objects from the database
o TRUNCATE: removes all records from a table and releases the space allocated for
them
o COMMENT: adds comments to the data dictionary
o RENAME: renames an object
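
A few of these DDL commands can be tried with Python's built-in sqlite3 module, as in the sketch below. The table names student and learner and the helper tables() are hypothetical. SQLite is used only because it ships with Python; it has no TRUNCATE or COMMENT statement and spells RENAME as ALTER TABLE ... RENAME TO, so only CREATE, ALTER, RENAME and DROP are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def tables():
    """List the tables currently defined in the database schema."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [r[0] for r in rows]


# CREATE: define a new table.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")

# ALTER: change the structure of the existing table by adding a column.
conn.execute("ALTER TABLE student ADD COLUMN class TEXT")
columns = [r[1] for r in conn.execute("PRAGMA table_info(student)")]

# RENAME: give the table a new name (SQLite syntax).
conn.execute("ALTER TABLE student RENAME TO learner")
after_rename = tables()

# DROP: remove the table definition and its data from the database.
conn.execute("DROP TABLE learner")
after_drop = tables()

print(columns, after_rename, after_drop)
```

None of these statements touch individual records; they change only the definitions that describe how data is stored.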

DML is short for Data Manipulation Language, which deals with data manipulation
and includes the most common SQL statements, such as SELECT, INSERT, UPDATE and DELETE.
It is used to store, modify, retrieve, delete and update data in a database.

o SELECT: retrieves data from a database
o INSERT: inserts data into a table
o UPDATE: updates existing data within a table
o DELETE: deletes records from a table (all records when no condition is given)
o MERGE: UPSERT operation (insert or update)
o CALL: calls a PL/SQL or Java subprogram
o EXPLAIN PLAN: shows the data access path chosen for a statement
o LOCK TABLE: concurrency control
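
The four core DML statements can be sketched with Python's built-in sqlite3 module as follows. The student table and its rows are hypothetical sample data. The last step shows that DELETE without a condition removes every record while the table itself remains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, class TEXT)"
)

# INSERT: add new records.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ram", "FY"), (2, "Jai", "TY"), (3, "Om", "SY")],
)

# UPDATE: change existing data in the matching record.
conn.execute("UPDATE student SET class = 'SY' WHERE roll_no = 1")

# DELETE with a condition: remove only the matching record.
conn.execute("DELETE FROM student WHERE roll_no = 2")

# SELECT: retrieve the current contents of the table.
rows = conn.execute(
    "SELECT roll_no, name, class FROM student ORDER BY roll_no"
).fetchall()

# DELETE without a condition: remove every record; the table still exists.
conn.execute("DELETE FROM student")
count_after = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(rows, count_after)
```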


DCL is short for Data Control Language, which acts as an access specifier for the database
(basically, it grants and revokes the permissions of users in the database).

o GRANT: grants a user permission to run DML commands (SELECT, INSERT, DELETE,…) on a table
o REVOKE: revokes a user's permission to run DML commands (SELECT, INSERT, DELETE,…) on the
specified table
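
The idea behind GRANT and REVOKE can be illustrated with a toy model in Python. This is not a real DBMS API: the class AccessControl, its methods and the user name clerk are hypothetical, and a real database enforces privileges internally. The comments show the SQL statement each call is meant to mirror.

```python
class AccessControl:
    """Toy in-memory model of GRANT and REVOKE; not a real DBMS API."""

    def __init__(self):
        # Each entry is a (user, privilege, table) triple that is allowed.
        self._grants = set()

    def grant(self, privilege, table, user):
        self._grants.add((user, privilege.upper(), table))

    def revoke(self, privilege, table, user):
        # discard() does nothing if the privilege was never granted.
        self._grants.discard((user, privilege.upper(), table))

    def allowed(self, user, privilege, table):
        return (user, privilege.upper(), table) in self._grants


acl = AccessControl()
acl.grant("SELECT", "student", "clerk")    # GRANT SELECT ON student TO clerk;
acl.grant("INSERT", "student", "clerk")    # GRANT INSERT ON student TO clerk;
acl.revoke("INSERT", "student", "clerk")   # REVOKE INSERT ON student FROM clerk;

can_select = acl.allowed("clerk", "SELECT", "student")
can_insert = acl.allowed("clerk", "INSERT", "student")
print(can_select, can_insert)
```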

TCL is short for Transactional Control Language, which acts as a manager for all types of
transactional data and all transactions. Some of the TCL commands are:

o ROLLBACK: cancels or undoes changes made in the current transaction
o COMMIT: applies the changes permanently to the database
o SAVEPOINT: marks a point within a transaction to which later changes can be rolled back
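
The three TCL commands can be seen working together in the sketch below, which uses Python's built-in sqlite3 module. The account table, the savepoint name before_bonus and the helper balance() are hypothetical. The connection is opened with isolation_level=None so that the module does not start transactions on its own and the BEGIN, SAVEPOINT, ROLLBACK and COMMIT statements run exactly as written.

```python
import sqlite3

# isolation_level=None: no implicit BEGIN is issued by the sqlite3 module.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")


def balance():
    """Read the current balance of account 1."""
    return conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]


# Transaction 1: keep the first update, undo the second, then commit.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = 150 WHERE id = 1")
conn.execute("SAVEPOINT before_bonus")
conn.execute("UPDATE account SET balance = 999 WHERE id = 1")
conn.execute("ROLLBACK TO before_bonus")  # undo only work done after the savepoint
conn.execute("COMMIT")                    # make the remaining change permanent
after_commit = balance()

# Transaction 2: a full ROLLBACK discards everything since BEGIN.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = 0 WHERE id = 1")
conn.execute("ROLLBACK")
after_rollback = balance()

print(after_commit, after_rollback)
```

A savepoint therefore does not store data temporarily; it marks a position inside a transaction so that a partial rollback is possible.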

Database Management System: The software used to manage
databases is called a Database Management System (DBMS). For example, MySQL, Oracle,
etc. are popular DBMSs used in different applications.

A DBMS allows users to perform the following tasks:

o Data Definition: It helps in the creation, modification, and removal of the definitions
that define the organization of data in the database.

o Data Update: It helps in the insertion, modification, and deletion of the actual
data in the database.

o Data Retrieval: It helps in the retrieval of data from the database which can be
used by applications for various purposes.

o User Administration: It helps in registering and monitoring users, enforcing data
security, monitoring performance, maintaining data integrity, dealing with concurrency
control, and recovering information corrupted by unexpected failure.

What is Data Quality?
Data quality is defined as the degree to which data meets a company’s expectations of accuracy, validity,
completeness, and consistency.
By tracking data quality, a business can pinpoint potential issues harming quality
and ensure that shared data is fit to be used for a given purpose.
When collected data fails to meet the company’s expectations of accuracy,
validity, completeness, and consistency, it can have massive negative impacts on
customer service, employee productivity, and key strategies.

Why Is Data Quality Important?


Quality data is key to making accurate, informed decisions. And while all data
has some level of “quality,” a variety of characteristics and factors determines the
degree of data quality (high-quality versus low-quality). Furthermore, different
data quality characteristics will likely be more important to various stakeholders
across the organization.

Popular data quality characteristics and dimensions include:

o Accuracy
o Completeness
o Consistency
o Integrity
o Reasonability
o Timeliness
o Uniqueness/Deduplication
o Validity
o Accessibility
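
Some of these characteristics can be tracked as simple ratios. The sketch below scores three of them (completeness, uniqueness and validity) over a handful of hypothetical records in plain Python. The sample data, the email pattern and the exact way each score is computed are assumptions made for illustration; organizations define these measures according to their own expectations.

```python
import re

# Hypothetical sample records with deliberate quality problems.
records = [
    {"roll_no": 1, "name": "Ram", "email": "ram@example.com"},
    {"roll_no": 2, "name": "",    "email": "jai@example.com"},   # missing name
    {"roll_no": 2, "name": "Jai", "email": "jai@example.com"},   # duplicate roll_no
    {"roll_no": 4, "name": "Sai", "email": "not-an-email"},      # invalid email
]

# A deliberately simple pattern: something@something.something
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

total = len(records)

# Completeness: share of records in which every field has a non-empty value.
# (Note: this simple check would also treat a numeric 0 as missing.)
completeness = sum(1 for r in records if all(r.values())) / total

# Uniqueness: share of distinct roll numbers among all records.
uniqueness = len({r["roll_no"] for r in records}) / total

# Validity: share of records whose email matches the expected format.
validity = sum(1 for r in records if EMAIL.match(r["email"])) / total

print(completeness, uniqueness, validity)
```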
