Advanced Databases Unit 2

The document provides an overview of advanced databases, including modern databases, NoSQL, NewSQL, and RDBMS, explaining their structures, use cases, and differences. It also covers various database management tools, ETL processes, and the distinctions between OLTP and OLAP systems. Additionally, it discusses data preparation and cleaning techniques essential for ensuring data accuracy and usability.

Uploaded by

18-TYCM-I-Gaurav Gangurde

ADVANCED DATABASES

Unit 2 ( Module 1 )

 Introduction To Modern Databases

 What is a Database?
A database is like a digital storage room where data is kept. Imagine a huge,
organized file cabinet. It stores all kinds of information like customer details, product
information, or transaction records.
 Modern Databases:
Modern databases are designed to store, manage, and quickly retrieve large amounts of
data, even as that data grows rapidly. They keep the data organized, easy to access,
and secure. They are essential for everything from small apps to massive
companies with millions of users.

Examples of Where Modern Databases Are Used:

 E-commerce websites use databases to store product information, customer
orders, and payments.
 Social media platforms use databases to store user profiles, posts, and comments.
 Banks use databases to track account information and transactions.

 NoSQL, NewSQL
1. NoSQL:
NoSQL is commonly read as "Not Only SQL". It is a category of databases designed for
storing and managing large amounts of data that may not fit well into traditional
relational databases.
NoSQL databases can spread huge amounts of data across many servers.
They can store data in different formats, such as key-value pairs, documents,
wide columns, or graphs.

 Types of NoSQL Databases:

o Document-Based: Stores data in documents (e.g., JSON or BSON format).
Example: MongoDB.
o Key-Value Stores: Stores data as key-value pairs. Example: Redis.
o Column-Based (wide-column): Stores data in column families, so each row can
hold its own set of columns. Example: Cassandra.
o Graph-Based: Designed for relationships between data (e.g., social networks).
Example: Neo4j.
 When to Use: NoSQL is ideal for projects that need to handle:
o Large amounts of unstructured or semi-structured data.
o Quick scalability and flexibility.
o Real-time data, like social media or IoT (Internet of Things) data.
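
The four NoSQL data models listed above can be pictured with plain Python structures. This is only a rough sketch: the keys, names, and the helper followers_of are made up for illustration, and real systems such as Redis, MongoDB, Cassandra, and Neo4j each have their own storage engines and query APIs.

```python
import json

# Key-value: an opaque value looked up by a single key (Redis-style).
key_value_store = {"user:1:name": "Asha", "user:1:city": "Pune"}

# Document: one self-describing, nested record (MongoDB-style JSON).
document = {"_id": 1, "name": "Asha", "city": "Pune",
            "orders": [{"item": "pen", "qty": 2}]}

# Wide-column: each row key maps to its own set of columns (Cassandra-style).
wide_column = {"user:1": {"name": "Asha", "city": "Pune"},
               "user:2": {"name": "Ravi"}}  # this row has no "city" column

# Graph: nodes plus labelled edges between them (Neo4j-style).
nodes = {1: {"name": "Asha"}, 2: {"name": "Ravi"}}
edges = [(1, "FOLLOWS", 2)]


def followers_of(node_id):
    """Return names of nodes that have a FOLLOWS edge pointing at node_id."""
    return [nodes[src]["name"]
            for src, label, dst in edges
            if label == "FOLLOWS" and dst == node_id]


print(json.dumps(document))
print(followers_of(2))
```

Note how the wide-column rows do not need the same columns, which is the kind of schema flexibility the list above refers to.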

2. NewSQL:
NewSQL is a newer category of databases that aims to provide the advantages of SQL
(structured data and relational models) together with the scalability and performance
features that NoSQL databases offer.
It is designed to scale horizontally, which means it can handle increased traffic and
large amounts of data by adding servers (just like NoSQL), while still supporting
transactional processing (as needed in banking systems).

 What it is: NewSQL combines both approaches: it supports traditional
SQL (structured queries, transactions) but can also handle large-scale data and
distributed architectures like NoSQL.
 Popular NewSQL Databases:
o Google Spanner: A distributed relational database that can scale horizontally
while maintaining strong consistency guarantees.
o CockroachDB: A distributed SQL database that is easy to scale while
maintaining SQL features.
o VoltDB: A high-performance NewSQL database designed for fast transactions.
 When to Use: NewSQL is useful when you need:
o Relational data that must also scale to handle high traffic.
o Strong consistency and ACID transactions at a large scale.
o High availability with minimal downtime.

 RDBMS Databases
RDBMS (Relational Database Management System):
An RDBMS is a type of database that stores data in an organized way, using tables
that are related to each other. It's like a digital spreadsheet where the data is
structured into rows and columns.
Example:

StudentID  First_Name  Last_Name  Age  Major
1          John        Doe        20   Computer Science
2          Jane        Smith      22   Mathematics

This is a simple example of an RDBMS table where:

 The columns represent attributes (like name, age, major).

 Each row represents a single student.

 Examples: MySQL, PostgreSQL, Oracle, SQL Server.
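
As a minimal sketch, the student table above can be created and queried with SQL. The example uses SQLite through Python's built-in sqlite3 module as a stand-in for the servers listed above; the table name students and the snake_case column names are assumptions chosen for the illustration.

```python
import sqlite3

# In-memory SQLite database, used here only as a convenient relational engine.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        age        INTEGER,
        major      TEXT
    )
""")

# Each tuple becomes one row; each position matches one column (attribute).
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?, ?)",
    [(1, "John", "Doe", 20, "Computer Science"),
     (2, "Jane", "Smith", 22, "Mathematics")],
)

# Select two columns from the rows where age is above 21.
older = conn.execute(
    "SELECT first_name, major FROM students WHERE age > ? ORDER BY student_id",
    (21,),
).fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
print(older, row_count)
```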

 NoSQL Vs RDBMS Databases

Feature       NoSQL                                                        RDBMS (SQL)
Data Model    Flexible (documents, key-value, graphs, etc.)                Structured (tables with rows and columns)
Schema        No fixed schema (can change over time)                       Fixed schema (predefined structure)
Scaling       Horizontal scaling (across many servers)                     Vertical scaling (requires stronger hardware)
Transactions  Not always ACID-compliant (eventual consistency)             ACID-compliant (strong consistency and reliability)
Performance   High performance, especially for large datasets              Optimized for complex queries and transactions
Use Cases     Big data, real-time apps, flexible data (social media, IoT)  Financial systems, CRMs, inventory systems, reporting
Examples      MongoDB, Cassandra, Redis, Neo4j                             MySQL, PostgreSQL, Oracle, SQL Server

 Tools
1. Database Management Systems (DBMS):

These are the core tools used to create, manage, and interact with databases. They allow
users to store, retrieve, and manipulate data.

 Examples:

o MySQL: A popular open-source relational database system.

o PostgreSQL: Another open-source database system known for its advanced features.

o MongoDB: A NoSQL document database used for flexible, JSON-like data storage.

2. ETL Tools (Extract, Transform, Load):

ETL tools are used to move and manipulate data from different sources and load it into a
data warehouse or database.

 Extract: Getting data from various sources.

 Transform: Cleaning or converting the data into a suitable format.

 Load: Putting the data into the final destination (like a data warehouse).

 Examples:

o Informatica: A powerful tool used for data integration.

o Talend: An open-source ETL tool that helps in connecting and transforming data.

o Apache NiFi: A tool for automating the flow of data between systems.
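
The three ETL steps can be sketched in a few lines of Python. This is a toy pipeline, not how Informatica, Talend, or NiFi work internally: the CSV text RAW_CSV, the sales table, and the function names are all invented for the example, and SQLite stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data with messy names and one missing amount.
RAW_CSV = """name,amount
 alice ,10.5
BOB,
carol,7
"""


def extract(text):
    """Extract: read raw rows from a source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and title-case names, drop rows without an amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append((row["name"].strip().title(), float(amount)))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)
```

Keeping the three steps as separate functions mirrors the Extract, Transform, Load split and makes each stage easy to test on its own.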

3. Data Warehousing Tools:

These are used to store and manage large amounts of historical data that come from
various sources, making it easier for businesses to run reports and analyze trends.

 Examples:

o Amazon Redshift: A cloud-based data warehouse that can handle large datasets.

o Google BigQuery: A tool for running fast, SQL-like queries on massive
amounts of data in the cloud.

4. Database Performance Tuning Tools:

These tools help optimize and monitor how well a database is running. They make sure the
database is fast, efficient, and can handle a lot of queries.

 Examples:

o Oracle Enterprise Manager: Helps monitor and manage Oracle databases.

o SQL Server Profiler: Monitors and analyzes SQL queries to identify
slow parts of the database.

o pgAdmin: A tool for managing PostgreSQL databases and optimizing their performance.

5. Backup and Recovery Tools:

These tools ensure that your data is safe and can be restored if something goes wrong, like a
system failure or human error.

 Examples:

o Veeam: A backup and recovery tool for both databases and virtual
environments.

o RMAN (Recovery Manager): A tool for backing up and recovering Oracle databases.

6. Data Migration Tools:

These tools help you move data from one system or format to another, such as moving data
between different databases or to the cloud.

 Examples:

o AWS Database Migration Service: Helps you move databases to the cloud
with minimal downtime.

o Microsoft Data Migration Assistant: Used to migrate databases to SQL Server.

7. NoSQL Database Tools:

These tools help manage and interact with NoSQL databases that store data in ways other
than traditional tables (e.g., key-value pairs, documents, or graphs).

 Examples:

o MongoDB Compass: A GUI tool for MongoDB that helps visualize and analyze
data.

o Cassandra Query Language (CQL): The SQL-like query language used to interact
with Apache Cassandra (a NoSQL database), typically through the cqlsh shell.

8. Database Security Tools:

These tools ensure that the data is protected and only authorized users can access or modify
it.

 Examples:

o IBM Guardium: Monitors and protects sensitive data in databases.

o Oracle Audit Vault: A tool for monitoring database security and compliance.

9. Data Visualization and Reporting Tools:

These tools help create reports and visualizations of the data stored in databases, making it
easier to analyze trends and make decisions.

 Examples:

o Tableau: A popular tool for creating visualizations and dashboards from
database data.

o Power BI: A Microsoft tool that connects to various databases and creates
interactive reports and dashboards.

 OLTP & OLAP

OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) are two types
of database systems used for different purposes.

1. OLTP (Online Transaction Processing):

o It's designed for handling everyday transactions and operations.

o Example: When you make a purchase online, check your bank account
balance, or update your contact details, these are all OLTP activities.

o Focus: Speed, accuracy, and handling many small transactions at once (like
inserting, updating, or deleting records).

o Databases are usually highly normalized (organized to minimize redundancy).

Example: An e-commerce website where every time a customer buys something, the system
records the transaction, updates the inventory, and adjusts the customer's order history.

2. OLAP (Online Analytical Processing):

o It's designed for complex data analysis and reporting, often using historical
data.
o Example: Looking at business trends over the past year, running reports on
sales performance by region, or analyzing data for decision-making.

o Focus: Complex queries, aggregations, and summarizations of large datasets,
often for decision-making.

o Databases are usually denormalized (some redundancy is accepted so that
analytical queries need fewer joins and run faster).

Example: A company’s manager might run an OLAP query to find out how sales have
changed over the last 5 years in different regions.

Key Differences:

 OLTP is about fast and efficient handling of transactions, while OLAP is about
analyzing large amounts of data for patterns and trends.

 OLTP databases have lots of small updates, inserts, and deletions, whereas OLAP
databases focus on large read-heavy operations, like summarizing and analyzing
data.
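
The contrast can be sketched on a single small SQLite database. In practice OLTP and OLAP usually run on separate systems, so this is only an illustration; the inventory and orders tables and the place_order function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (product TEXT PRIMARY KEY, stock INTEGER NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT,
                         region TEXT, qty INTEGER);
    INSERT INTO inventory VALUES ('pen', 100), ('book', 50);
""")


def place_order(product, region, qty):
    """OLTP-style work: one short write transaction touching a few rows."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO orders (product, region, qty) VALUES (?, ?, ?)",
            (product, region, qty),
        )
        conn.execute(
            "UPDATE inventory SET stock = stock - ? WHERE product = ?",
            (qty, product),
        )


for args in [("pen", "North", 3), ("pen", "South", 5), ("book", "North", 2)]:
    place_order(*args)

# OLAP-style work: a read-only aggregate that scans many rows.
sales_by_region = conn.execute(
    "SELECT region, SUM(qty) FROM orders GROUP BY region ORDER BY region"
).fetchall()
pen_stock = conn.execute(
    "SELECT stock FROM inventory WHERE product = 'pen'"
).fetchone()[0]
print(sales_by_region, pen_stock)
```

Each call to place_order is a small insert plus update, while the final query only reads and summarizes, which matches the write-heavy versus read-heavy split described above.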

 Data Preparation & Cleaning Techniques

In an advanced database context, data preparation and cleaning techniques are all
about making sure the data you work with is accurate, consistent, and usable for
analysis or further processing. Here are the most common techniques:
1. Handling Missing Data
 Why?: Missing data can mess up your analysis, so it's important to deal with it.
 How?:
o Remove Missing Data: If only a small share of the data is missing, you can simply
remove the rows or columns that contain it.
o Fill with Defaults: You can replace missing values with common replacements
like the mean, median, or the most frequent value.
o Prediction: Use algorithms to predict what the missing values should be
based on other data.
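
The first two options can be sketched with the standard library, assuming missing values are represented as None. The list ages and both function names are illustrative; the prediction approach is left out because it depends on the model chosen.

```python
from statistics import mean, median

ages = [20, None, 22, 24, None, 30]  # None marks a missing value


def drop_missing(values):
    """Remove missing entries entirely."""
    return [v for v in values if v is not None]


def fill_missing(values, strategy=mean):
    """Replace missing entries with a statistic of the known values."""
    fill = strategy(drop_missing(values))
    return [fill if v is None else v for v in values]


print(drop_missing(ages))
print(fill_missing(ages))          # fill with the mean
print(fill_missing(ages, median))  # fill with the median
```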

2. Removing Duplicates
 Why?: Duplicate data can distort your results, making them inaccurate.
 How?: Find and remove rows that are exactly the same to ensure that each record is
unique.
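
A minimal sketch of exact-duplicate removal, assuming each record is a dictionary with hashable values (the function name remove_duplicates and the sample fields are made up). The first occurrence of each record is kept and the original order is preserved.

```python
def remove_duplicates(rows):
    """Keep the first occurrence of each exactly identical record."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


customers = [
    {"id": 1, "name": "Asha"},
    {"id": 2, "name": "Ravi"},
    {"name": "Asha", "id": 1},  # same record, keys in a different order
]
print(remove_duplicates(customers))
```
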
3. Standardizing Data
 Why?: Data may come from different sources with different formats (like dates in
various formats), which can cause confusion.
 How?:
o Consistent Formats: Make sure everything is in the same format (e.g., dates
should all be in YYYY-MM-DD).
o Scaling: If you're working with numbers, sometimes you need to normalize or
standardize them (scaling to a specific range or making them comparable).
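
Both ideas can be sketched as follows. The list KNOWN_FORMATS is an assumption about which input formats occur (day-first dates in particular), and min-max scaling to the range 0 to 1 is just one common scaling choice.

```python
from datetime import datetime

# Assumed input formats; day-first is a guess that must match the real data.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")


def to_iso_date(text):
    """Convert a date string in any known format to YYYY-MM-DD."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognised date format: {text!r}")


def min_max_scale(values):
    """Rescale numbers linearly so the smallest is 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal: nothing to spread
    return [(v - lo) / (hi - lo) for v in values]


print(to_iso_date("31/01/2024"))
print(min_max_scale([10, 20, 30]))
```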

4. Handling Outliers
 Why?: Outliers (data points far from the norm) can skew your analysis and make
results unreliable.
 How?: Identify and either remove outliers or transform them to be in line with other
data, depending on their significance.
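
The text does not name a detection method, so the sketch below uses the interquartile range (IQR) rule as one common choice: values more than k times the IQR below the first quartile or above the third quartile are treated as outliers. The function names, the sample readings, and k = 1.5 are assumptions.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (low, high) limits of the IQR rule for the given data."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR limits."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


readings = [10, 12, 11, 13, 12, 11, 95]
print(remove_outliers(readings))
```

Whether an outlier should be removed, capped, or kept depends on whether it is an error or a genuine extreme value.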

5. Dealing with Categorical Data

 Why?: Many machine learning algorithms can't work with categories like "yes", "no",
"red", "blue" directly.
 How?: Convert these categories into numbers or one-hot encode them (creating
separate columns for each category).
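
One-hot encoding can be sketched without any libraries. The function name one_hot and the is_ prefix for the new columns are choices made for this example.

```python
def one_hot(values):
    """Turn each category into a dict of 0/1 flags, one flag per category."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


colours = ["red", "blue", "red"]
for row in one_hot(colours):
    print(row)
```

Each input value produces exactly one flag set to 1, so no artificial ordering between categories is introduced.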

6. Text Data Cleaning

 Why?: If you're working with text data (like customer reviews or tweets), it might
contain extra or irrelevant information.
 How?:
o Remove unwanted characters (like punctuation or special symbols).
o Lowercase everything to make it uniform.
o Remove common words (like "the", "is", "and") that don’t add much
meaning.
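
The three steps can be combined in one small function. The stop-word set here is a tiny hand-picked sample rather than a standard list, and the function name clean_text is illustrative.

```python
import re

# A small sample of common words; real stop-word lists are much longer.
STOP_WORDS = {"the", "is", "and", "a", "an", "of", "to"}


def clean_text(text):
    """Lowercase, strip punctuation and symbols, and drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # keep letters, digits, spaces
    return [word for word in text.split() if word not in STOP_WORDS]


print(clean_text("The product is GREAT, and the delivery was fast!!!"))
```
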

7. Fixing Inconsistent Data

 Why?: Sometimes data entries aren’t consistent (e.g., "USA" vs "U.S.A." or "NY" vs
"New York").
 How?: Standardize the way things are written, making sure they all follow the same
naming rules.
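
A simple way to do this is a lookup table from known variants to one canonical spelling. The mapping CANONICAL below is a made-up sample; a real one would be built from the variants actually found in the data.

```python
# Hypothetical mapping from lowercase variants to the canonical form.
CANONICAL = {
    "usa": "USA",
    "u.s.a.": "USA",
    "united states": "USA",
    "ny": "New York",
    "new york": "New York",
}


def standardize(value):
    """Map a known variant to its canonical form; leave other values as-is."""
    trimmed = value.strip()
    return CANONICAL.get(trimmed.lower(), trimmed)


print([standardize(v) for v in ["USA", " U.S.A. ", "NY", "Texas"]])
```
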

8. Converting Data Types

 Why?: Data may be stored incorrectly (e.g., numbers stored as text or dates stored as
plain text), making it hard to work with.
 How?: Convert data into the right type (e.g., turning a string of numbers into actual
numeric values).
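
A minimal sketch of type conversion, assuming numbers may contain thousands separators and dates are stored as YYYY-MM-DD text. Both function names are illustrative, and values that cannot be converted raise ValueError so they can be reviewed rather than silently changed.

```python
from datetime import date


def to_number(text):
    """Convert numeric text such as '1,250' or ' 3.5 ' to int or float."""
    cleaned = text.replace(",", "").strip()
    try:
        return int(cleaned)
    except ValueError:
        return float(cleaned)  # raises ValueError if it is not a number


def to_date(text):
    """Convert 'YYYY-MM-DD' text to a date object."""
    return date.fromisoformat(text.strip())


print(to_number("1,250"), to_number(" 3.5 "), to_date("2024-01-31"))
```
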
9. Data Transformation
 Why?: Sometimes data needs to be changed to make it more useful for analysis.
 How?:
o Log Transformation: For very large numbers, taking the logarithm can make
the data easier to analyze.
o Feature Engineering: Create new columns from existing data, like splitting a
"date" column into "day", "month", and "year".

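Both transformations can be sketched with the standard library. Base 10 is used for the logarithm here as one possible choice, and the function names are made up for the example; note that a logarithm is only defined for positive numbers.

```python
import math
from datetime import date


def log_transform(values):
    """Compress a wide range of positive numbers with a base-10 logarithm."""
    return [math.log10(v) for v in values]


def split_date(d):
    """Feature engineering: derive day, month, and year from one date."""
    return {"day": d.day, "month": d.month, "year": d.year}


print(log_transform([1, 10, 1000]))
print(split_date(date(2024, 1, 31)))
```
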
10. Data Consistency Checks

 Why?: You need to make sure your data is valid and follows the rules you expect
(e.g., no negative values for ages or prices).
 How?: Verify that the data follows the expected rules and correct or flag any
violations (for example, a negative price).
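
A rule check can be sketched as a function that reports violations instead of changing values, so that someone can decide how to fix them. The field names age and price and the upper age limit of 120 are assumptions for this example.

```python
def find_violations(rows):
    """Return (row index, field, value) for every rule that is broken."""
    problems = []
    for i, row in enumerate(rows):
        if not 0 <= row["age"] <= 120:  # assumed plausible age range
            problems.append((i, "age", row["age"]))
        if row["price"] < 0:
            problems.append((i, "price", row["price"]))
    return problems


records = [
    {"age": 30, "price": 9.99},
    {"age": -4, "price": 5.00},
    {"age": 41, "price": -1.50},
]
print(find_violations(records))
```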

11. Data Aggregation

 Why?: Sometimes, you need to combine data into a simpler form to make it more
useful for analysis.
 How?: You might combine data from different rows or columns into a single
summary, like calculating the total sales from individual product sales.
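
The total-sales example can be sketched as a simple group-and-sum. The function name total_sales and the sample rows are illustrative; in a database the same result would typically come from a GROUP BY query.

```python
from collections import defaultdict


def total_sales(rows):
    """Sum the sale amounts per product."""
    totals = defaultdict(float)
    for product, amount in rows:
        totals[product] += amount
    return dict(totals)


sales = [("pen", 2.5), ("book", 10.0), ("pen", 1.5)]
print(total_sales(sales))
```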

By applying these techniques, you make sure that the data in your advanced
database is clean, consistent, and ready for more complex analysis, like generating
reports, building models, or making predictions.
