
Advanced Databases Unit 2

The document provides an overview of advanced databases, including modern databases, NoSQL, NewSQL, and RDBMS, explaining their structures, use cases, and differences. It also covers various database management tools, ETL processes, and the distinctions between OLTP and OLAP systems. Additionally, it discusses data preparation and cleaning techniques essential for ensuring data accuracy and usability.

ADVANCED DATABASES

Unit 2 (Module 1)

➢ Introduction to Modern Databases

➢ What is a Database?
A database is an organized collection of data stored electronically. Think of it as a
digital storage room or a huge, well-organized file cabinet. It stores all kinds of
information, such as customer details, product information, or transaction records.
➢ Modern Databases:
Modern databases are more advanced and powerful than older systems. They are
designed to store, manage, and quickly retrieve large amounts of data, even as that
data grows rapidly, and they use advanced technologies to keep the data organized,
easy to access, and secure. They are essential for everything from small apps to
massive companies with millions of users.

Examples of Where Modern Databases Are Used:

➢ E-commerce websites use databases to store product information, customer
orders, and payments.
➢ Social media platforms use databases to store user profiles, posts, and comments.
➢ Banks use databases to track account information and transactions.

➢ NoSQL, NewSQL
1. NoSQL:
NoSQL stands for "Not Only SQL". It is a category of databases designed for storing
and managing large amounts of data that may not fit well into traditional relational
databases. NoSQL databases can spread huge amounts of data across many servers.
They can store data in different formats, such as key-value pairs, documents,
wide-column stores, or graphs.

➢ Types of NoSQL Databases:
o Document-Based: Stores data in documents (e.g., JSON or BSON format).
Example: MongoDB.
o Key-Value Stores: Stores data as key-value pairs. Example: Redis.
o Wide-Column Stores: Data is grouped into column families, and each row can hold
its own (sparse) set of columns instead of the fixed columns of a relational table.
Example: Cassandra.
o Graph-Based: Designed for relationships between data (e.g., social networks).
Example: Neo4j.
➢ When to Use: NoSQL is ideal for projects that need to handle:
o Large amounts of unstructured or semi-structured data.
o Quick scalability and flexibility.
o Real-time data, like social media or IoT (Internet of Things) data.
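To make the difference between these data models concrete, here is a minimal Python sketch that uses plain dictionaries as stand-ins for stored records. It is not the MongoDB or Redis API; the collection contents, field names, and the cache key are hypothetical. It shows two documents in one collection that do not share the same fields, and a key-value lookup that works only by exact key.

```python
import json

# Two hypothetical documents in one "students" collection. Unlike rows in a
# relational table, they do not have to share the same fields (no fixed schema).
students = [
    {"_id": 1, "name": "John Doe", "age": 20,
     "courses": ["Databases", "Networks"]},   # nested array inside the document
    {"_id": 2, "name": "Jane Smith", "age": 22,
     "email": "jane@example.com"},            # extra field, no courses
]

# Documents are usually exchanged as JSON (MongoDB stores a binary form, BSON).
print(json.dumps(students[0]))

# A simple "query": names of documents that contain an email field.
with_email = [doc["name"] for doc in students if "email" in doc]
print(with_email)  # ['Jane Smith']

# A key-value store, by contrast, can only look a value up by its exact key.
cache = {"session:abc123": "user_id=1"}
print(cache.get("session:abc123"))  # user_id=1
```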

2. NewSQL:
NewSQL is a newer category of databases that aims to combine the advantages of SQL
(structured data and relational models) with the scalability and performance
features that NoSQL databases offer.
It is designed to scale horizontally (by adding more servers), which means it can handle
increased traffic and large amounts of data more easily, just like NoSQL, while still
supporting ACID transactional processing (as needed by banking systems).

➢ What it is: NewSQL is built to combine the best of both worlds: it supports traditional
SQL (structured queries, transactions) but can handle large-scale data and
distributed architectures like NoSQL.
➢ Popular NewSQL Databases:
o Google Spanner: A distributed relational database that scales horizontally
while providing strong consistency guarantees.
o CockroachDB: A distributed SQL database that is easy to scale while
maintaining SQL features.
o VoltDB: A high-performance, in-memory NewSQL database designed for fast transactions.
➢ When to Use: NewSQL is useful when you need:
o Relational data that must also scale to handle high traffic.
o Strong consistency and ACID transactions at a large scale.
o High availability with minimal downtime.

➢ RDBMS Databases
RDBMS (Relational Database Management System):
An RDBMS is a type of database that stores data in an organized way, using tables
that are related to each other through shared key columns. Each table is like a digital
spreadsheet where the data is structured into rows and columns.
Example:

StudentID  First_Name  Last_Name  Age  Major
1          John        Doe        20   Computer Science
2          Jane        Smith      22   Mathematics

This is a simple example of an RDBMS table where:

➢ The columns represent attributes (like first name, last name, age, and major).

➢ Each row represents a single student.

➢ Examples: MySQL, PostgreSQL, Oracle, SQL Server.
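The table above can be recreated with Python's built-in sqlite3 module, used here only as a lightweight stand-in for MySQL, PostgreSQL, and similar systems. The sketch creates the Students table, inserts the two example rows, and queries them. The second table, Enrollments, is a hypothetical addition that shows how tables are related through a key column and combined with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Columns = attributes; StudentID is the primary key that identifies each row.
conn.execute(
    """CREATE TABLE Students (
           StudentID  INTEGER PRIMARY KEY,
           First_Name TEXT NOT NULL,
           Last_Name  TEXT NOT NULL,
           Age        INTEGER,
           Major      TEXT
       )"""
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [(1, "John", "Doe", 20, "Computer Science"),
     (2, "Jane", "Smith", 22, "Mathematics")],
)

# Hypothetical related table: each enrollment points to a student via StudentID.
conn.execute(
    """CREATE TABLE Enrollments (
           StudentID INTEGER REFERENCES Students(StudentID),
           Course    TEXT
       )"""
)
conn.executemany(
    "INSERT INTO Enrollments VALUES (?, ?)",
    [(1, "Databases"), (2, "Statistics")],
)

# Query one table, then join the two related tables on the shared key.
rows = conn.execute(
    "SELECT First_Name, Major FROM Students ORDER BY StudentID"
).fetchall()
joined = conn.execute(
    """SELECT s.First_Name, e.Course
       FROM Students AS s JOIN Enrollments AS e ON s.StudentID = e.StudentID
       ORDER BY s.StudentID"""
).fetchall()
print(rows)    # [('John', 'Computer Science'), ('Jane', 'Mathematics')]
print(joined)  # [('John', 'Databases'), ('Jane', 'Statistics')]
```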

➢ NoSQL vs. RDBMS Databases

Feature      | NoSQL                                                        | RDBMS (SQL)
Data Model   | Flexible (documents, key-value, graphs, etc.)                | Structured (tables with rows and columns)
Schema       | No fixed schema (can change over time)                       | Fixed schema (predefined structure)
Scaling      | Horizontal scaling (across many servers)                     | Mainly vertical scaling (requires stronger hardware)
Transactions | Not always ACID-compliant (often eventual consistency)       | ACID-compliant (strong consistency and reliability)
Performance  | High performance, especially for large datasets              | Optimized for complex queries and transactions
Use Cases    | Big data, real-time apps, flexible data (social media, IoT)  | Financial systems, CRMs, inventory systems, reporting
Examples     | MongoDB, Cassandra, Redis, Neo4j                             | MySQL, PostgreSQL, Oracle, SQL Server
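The "Transactions" row is the easiest one to see in code. The sketch below again uses Python's sqlite3 module as a stand-in RDBMS; the accounts table and the transfer() helper are hypothetical. A transfer that would overdraw an account violates a CHECK constraint, so the whole transaction is rolled back, including the credit that had already been applied. That all-or-nothing behavior is the "A" (atomicity) in ACID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CHECK constraint: a balance may never go negative.
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically: both updates are committed, or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))  # credit first
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))  # debit may violate the CHECK constraint
        return True
    except sqlite3.IntegrityError:
        return False

ok_small = transfer(conn, 1, 2, 30)   # succeeds: balances become 70 and 80
ok_large = transfer(conn, 1, 2, 500)  # fails: debit rejected, credit undone too
balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
print(ok_small, ok_large, balances)   # True False [(1, 70), (2, 80)]
```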

➢ Tools
1. Database Management Systems (DBMS):

These are the core tools used to create, manage, and interact with databases. They allow
users to store, retrieve, and manipulate data.

➢ Examples:

o MySQL: A popular open-source relational database system.

o PostgreSQL: Another open-source relational database system known for its advanced
features.

o MongoDB: A NoSQL document database used for flexible data storage.

2. ETL Tools (Extract, Transform, Load):

ETL tools are used to extract data from different sources, transform it, and load it into a
data warehouse or database.

➢ Extract: Getting data from various sources.

➢ Transform: Cleaning or converting the data into a suitable format.

➢ Load: Putting the data into the final destination (like a data warehouse).

➢ Examples:

o Informatica: A powerful tool used for data integration.

o Talend: An open-source ETL tool that helps in connecting and transforming data.

o Apache NiFi: A tool for automating the flow of data between systems.
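The three ETL steps can be sketched in a few lines of Python. This is a toy pipeline, not how Informatica, Talend, or NiFi work internally: an in-memory CSV string stands in for the source system, an in-memory SQLite database stands in for the warehouse, and the column names and cleaning rules are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data: inconsistent casing, stray spaces, a missing amount.
RAW_CSV = """order_id,region,amount,order_date
1,north, 120.50 ,2024-01-05
2,South,80,2024-01-06
3,north,,2024-01-07
"""

def extract(text):
    """Extract: read raw records from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean and convert each record; skip rows that cannot be repaired."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:  # missing amount: dropped in this sketch
            continue
        cleaned.append((int(row["order_id"]),
                        row["region"].strip().title(),  # "north" -> "North"
                        float(amount),
                        row["order_date"]))
    return cleaned

def load(conn, rows):
    """Load: write the cleaned rows into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL, order_date TEXT)")
    with conn:
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(RAW_CSV)))
loaded = warehouse.execute("SELECT * FROM sales ORDER BY order_id").fetchall()
print(loaded)  # [(1, 'North', 120.5, '2024-01-05'), (2, 'South', 80.0, '2024-01-06')]
```

Keeping each step in its own function mirrors the E, T, and L stages, so any one of them can be swapped (a real file, an API, a different warehouse) without touching the others.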

3. Data Warehousing Tools:

These are used to store and manage large amounts of historical data that come from
various sources, making it easier for businesses to run reports and analyze trends.

➢ Examples:

o Amazon Redshift: A cloud-based data warehouse that can handle large datasets.

o Google BigQuery: A serverless cloud data warehouse for running fast SQL queries on
massive amounts of data.

4. Database Performance Tuning Tools:

These tools help optimize and monitor how well a database is running. They make sure the
database is fast, efficient, and can handle a lot of queries.

➢ Examples:

o Oracle Enterprise Manager: Helps monitor and manage Oracle databases.

o SQL Server Profiler: Traces and analyzes SQL Server queries to identify
slow parts of the workload.

o pgAdmin: A tool for managing PostgreSQL databases and analyzing their
performance.

5. Backup and Recovery Tools:

These tools ensure that your data is safe and can be restored if something goes wrong, like a
system failure or human error.

➢ Examples:

o Veeam: A backup and recovery tool for both databases and virtual
environments.

o RMAN (Recovery Manager): A tool for backing up and recovering Oracle databases.

6. Data Migration Tools:

These tools help you move data from one system or format to another, such as moving data
between different databases or to the cloud.

➢ Examples:

o AWS Database Migration Service: Helps you move databases to the cloud
with minimal downtime.

o Microsoft Data Migration Assistant: Used to migrate databases to SQL Server.

7. NoSQL Database Tools:

These tools help manage and interact with NoSQL databases that store data in ways other
than traditional tables (e.g., key-value pairs, documents, or graphs).

➢ Examples:

o MongoDB Compass: A GUI tool for MongoDB that helps visualize and analyze
data.

o cqlsh: The command-line shell for running Cassandra Query Language (CQL)
statements against Apache Cassandra (a NoSQL database).

8. Database Security Tools:

These tools ensure that the data is protected and only authorized users can access or modify
it.

➢ Examples:

o IBM Guardium: Monitors and protects sensitive data in databases.

o Oracle Audit Vault: A tool for monitoring database security and compliance.

9. Data Visualization and Reporting Tools:

These tools help create reports and visualizations of the data stored in databases, making it
easier to analyze trends and make decisions.

➢ Examples:

o Tableau: A popular tool for creating visualizations and dashboards from
database data.

o Power BI: A Microsoft tool that connects to various databases and creates
interactive reports and dashboards.

➢ OLTP & OLAP

OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) are two types
of database systems used for different purposes.

1. OLTP (Online Transaction Processing):

o It's designed for handling everyday transactions and operations.

o Example: When you make a purchase online, check your bank account
balance, or update your contact details, these are all OLTP activities.

o Focus: Speed, accuracy, and handling many small transactions at once (like
inserting, updating, or deleting records).

o Databases are usually highly normalized (organized to minimize redundancy).

Example: An e-commerce website where every time a customer buys something, the system
records the transaction, updates the inventory, and adjusts the customer's order history.

2. OLAP (Online Analytical Processing):

o It's designed for complex data analysis and reporting, often using historical
data.
o Example: Looking at business trends over the past year, running reports on
sales performance by region, or analyzing data for decision-making.

o Focus: Complex queries, aggregations, and summarizations of large datasets,
often for decision-making.

o Databases are usually denormalized (data is stored with some redundancy, for example
in star schemas, so that analytical queries need fewer joins and run faster).

Example: A company’s manager might run an OLAP query to find out how sales have
changed over the last 5 years in different regions.

Key Differences:

➢ OLTP is about fast and efficient handling of transactions, while OLAP is about
analyzing large amounts of data for patterns and trends.

➢ OLTP databases have lots of small updates, inserts, and deletions, whereas OLAP
databases focus on large read-heavy operations, like summarizing and analyzing
data.
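To contrast the two workloads, the sketch below runs both against one hypothetical orders table in SQLite. In practice the OLTP database and the OLAP warehouse are usually separate systems; the point here is only the shape of the queries: small single-row writes versus one read-only query that aggregates many rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders "
             "(id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)")

# OLTP-style work: many small writes, each touching one row.
insert = "INSERT INTO orders (region, year, amount) VALUES (?, ?, ?)"
with conn:
    conn.execute(insert, ("North", 2023, 100.0))
    conn.execute(insert, ("South", 2023, 40.0))
    conn.execute(insert, ("North", 2024, 150.0))
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (60.0, 2))  # fix one order

# OLAP-style work: a read-only query that scans and summarizes many rows.
report = conn.execute(
    """SELECT region, year, SUM(amount) AS total_sales
       FROM orders
       GROUP BY region, year
       ORDER BY region, year"""
).fetchall()
print(report)  # [('North', 2023, 100.0), ('North', 2024, 150.0), ('South', 2023, 60.0)]
```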

➢ Data Preparation & Cleaning Techniques

In an advanced database context, data preparation and cleaning techniques are all
about making sure the data you work with is accurate, consistent, and usable for
analysis or further processing. Here are the most common techniques:
1. Handling Missing Data
➢ Why?: Missing data can bias or break your analysis, so it's important to deal with it.
➢ How?:
o Remove Missing Data: If only a small amount of data is missing, you can simply
remove the rows or columns that contain it.
o Fill with Substitutes (Imputation): You can replace missing values with the mean,
median, or most frequent value of that column.
o Prediction: Use algorithms to predict what the missing values should be
based on other data.
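The first two options can be sketched with Python's standard statistics module (model-based prediction is out of scope here). The ages list is hypothetical, and None marks a missing value.

```python
import statistics

ages = [20, None, 22, 35, None, 21]  # None = missing value

# Option 1: remove missing values.
dropped = [a for a in ages if a is not None]

# Option 2: impute with the mean or the median of the known values.
mean_age = statistics.mean(dropped)      # 24.5
median_age = statistics.median(dropped)  # 21.5
filled_mean = [a if a is not None else mean_age for a in ages]
filled_median = [a if a is not None else median_age for a in ages]

print(dropped)      # [20, 22, 35, 21]
print(filled_mean)  # [20, 24.5, 22, 35, 24.5, 21]
```

The median is less affected by the unusually large value (35) than the mean, which is one reason it is often preferred for skewed columns.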

2. Removing Duplicates
➢ Why?: Duplicate data can distort your results, making them inaccurate.
➢ How?: Find and remove rows that are exactly the same to ensure that each record is
unique.
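A minimal, order-preserving way to remove exact duplicates in Python is to remember which rows have already been seen. The records are hypothetical; in SQL, SELECT DISTINCT gives the same effect for query results.

```python
records = [
    ("John", "Doe", 20),
    ("Jane", "Smith", 22),
    ("John", "Doe", 20),  # exact duplicate of the first row
]

def remove_duplicates(rows):
    """Keep the first occurrence of each exact row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:  # tuples are hashable, so they can go in a set
            seen.add(row)
            unique.append(row)
    return unique

print(remove_duplicates(records))  # [('John', 'Doe', 20), ('Jane', 'Smith', 22)]
```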
3. Standardizing Data
➢ Why?: Data may come from different sources with different formats (like dates in
various formats), which can cause confusion.
➢ How?:
o Consistent Formats: Make sure everything is in the same format (e.g., dates
should all be in YYYY-MM-DD).
o Scaling: Numeric values sometimes need to be normalized (rescaled to a fixed range,
such as 0 to 1) or standardized (rescaled to a mean of 0 and a standard deviation of 1)
so that they are comparable.
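Both ideas are sketched below in plain Python. The list of accepted date formats is an assumption for illustration (note that "05/03/2024" is read day-first here, which must match how the actual source writes dates). min_max_scale() performs normalization to the range 0 to 1, and z_score() performs standardization.

```python
import statistics
from datetime import datetime

# Hypothetical formats seen in the sources (assumes day-first for dd/mm/yyyy).
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y"]

def to_iso_date(text):
    """Parse a date written in any known format and return it as YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognized date format: {text!r}")

def min_max_scale(values):
    """Normalize: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal: avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Standardize: shift and scale values to mean 0 and standard deviation 1."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]

print([to_iso_date(d) for d in ["2024-03-05", "05/03/2024", "March 5, 2024"]])
# ['2024-03-05', '2024-03-05', '2024-03-05']
print(min_max_scale([10, 20, 30]))       # [0.0, 0.5, 1.0]
print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```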

4. Handling Outliers
➢ Why?: Outliers (data points far from the norm) can skew your analysis and make
results unreliable.
➢ How?: Identify and either remove outliers or transform them to be in line with other
data, depending on their significance.
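One common way to identify outliers, used here as an example rather than taken from these notes, is the interquartile range (IQR) rule: values more than 1.5 times the IQR below the first quartile or above the third quartile are flagged. The sketch shows both removing and capping the flagged values; the prices are hypothetical.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return (low, high) fences; points outside them are treated as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

prices = [10, 12, 11, 13, 12, 11, 250]
low, high = iqr_bounds(prices)  # (8.0, 16.0) for this data

outliers = [p for p in prices if p < low or p > high]
kept = [p for p in prices if low <= p <= high]     # option 1: remove outliers
capped = [min(max(p, low), high) for p in prices]  # option 2: cap them at the fences

print(outliers)  # [250]
print(capped)    # [10, 12, 11, 13, 12, 11, 16.0]
```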

5. Dealing with Categorical Data
➢ Why?: Many machine learning algorithms can't work with categories like "yes", "no",
"red", "blue" directly.
➢ How?: Convert these categories into numbers or one-hot encode them (creating
separate columns for each category).
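The sketch below one-hot encodes a hypothetical colors column by hand and maps a yes/no column to 1/0. Libraries such as pandas (get_dummies) or scikit-learn (OneHotEncoder) provide the same operation; plain Python is used here to show what it actually does.

```python
def one_hot(values):
    """One-hot encode: one 0/1 column per category (columns in sorted order)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

colors = ["red", "blue", "red", "green"]
columns, encoded = one_hot(colors)
print(columns)  # ['blue', 'green', 'red']
print(encoded)  # [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]

# For a two-valued category such as yes/no, a simple label mapping is often enough.
answers = ["yes", "no", "yes"]
labels = [{"no": 0, "yes": 1}[a] for a in answers]
print(labels)  # [1, 0, 1]
```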

6. Text Data Cleaning
➢ Why?: If you're working with text data (like customer reviews or tweets), it might
contain extra or irrelevant information.
➢ How?:
o Remove unwanted characters (like punctuation or special symbols).
o Lowercase everything to make it uniform.
o Remove stop words (common words like "the", "is", "and") that don’t add much
meaning.
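The three cleaning steps above can be combined into a small function. The stop-word list here is a short, hypothetical sample (real projects usually take one from an NLP library), and the regular expression keeps only ASCII letters and digits, an assumption that would need changing for non-English text.

```python
import re

# Short, hypothetical stop-word list for illustration only.
STOP_WORDS = {"the", "is", "and", "a", "an", "of", "to"}

def clean_text(text):
    """Lowercase, strip punctuation/symbols, split into words, drop stop words."""
    text = text.lower()                       # make case uniform
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # replace punctuation and symbols with spaces
    tokens = text.split()                     # split on any run of whitespace
    return [t for t in tokens if t not in STOP_WORDS]

print(clean_text("The product is GREAT!!! Fast delivery, and good price :)"))
# ['product', 'great', 'fast', 'delivery', 'good', 'price']
```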

7. Fixing Inconsistent Data
➢ Why?: Sometimes data entries aren’t consistent (e.g., "USA" vs "U.S.A." or "NY" vs
"New York").
➢ How?: Standardize the way things are written, making sure they all follow the same
naming rules.
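A simple way to standardize such values is a lookup table of known variants. The CANONICAL mapping below is hypothetical and would have to be built from the variants actually found in the data; unknown values are left unchanged rather than guessed.

```python
# Hypothetical mapping from known variants (lowercased) to one canonical spelling.
CANONICAL = {
    "usa": "USA", "u.s.a.": "USA", "united states": "USA",
    "ny": "New York", "new york": "New York",
}

def standardize(value):
    """Return the canonical spelling of value, or the trimmed value if unknown."""
    key = " ".join(value.strip().lower().split())  # trim, lowercase, collapse spaces
    return CANONICAL.get(key, value.strip())

print([standardize(v) for v in ["U.S.A.", " usa", "NY", "New  York", "Canada"]])
# ['USA', 'USA', 'New York', 'New York', 'Canada']
```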

8. Converting Data Types
➢ Why?: Data may be stored incorrectly (e.g., numbers stored as text or dates stored as
plain text), making it hard to work with.
➢ How?: Convert data into the right type (e.g., turning a string of numbers into actual
numeric values).
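The sketch below converts a hypothetical raw record, where every field arrived as text, into proper Python types. It assumes commas are thousands separators (as in "1,299.50"); data that uses a comma as the decimal mark would need a different rule.

```python
from datetime import date

# Hypothetical record where every field was stored as text.
raw = {"price": "1,299.50", "quantity": "3", "in_stock": "yes", "added": "2024-03-05"}

def to_number(text):
    """Convert numeric text (commas assumed to be thousands separators) to int or float."""
    cleaned = text.replace(",", "").strip()
    return float(cleaned) if "." in cleaned else int(cleaned)

converted = {
    "price": to_number(raw["price"]),        # 1299.5 (float)
    "quantity": to_number(raw["quantity"]),  # 3 (int)
    "in_stock": raw["in_stock"].strip().lower() in {"yes", "true", "1"},  # bool
    "added": date.fromisoformat(raw["added"]),  # datetime.date
}

# Arithmetic now works on real numbers instead of strings.
print(converted["price"] * converted["quantity"])  # 3898.5
```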
9. Data Transformation
➢ Why?: Sometimes data needs to be changed to make it more useful for analysis.
➢ How?:
o Log Transformation: For skewed data whose values span several orders of magnitude
(e.g., revenues or incomes), taking the logarithm compresses the large values and can
make the data easier to analyze.
o Feature Engineering: Create new columns from existing data, like splitting a
"date" column into "day", "month", and "year".

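Both transformations are sketched below. The revenue figures and order dates are hypothetical. math.log10 requires positive values, so data containing zeros is often transformed with math.log1p (the natural logarithm of 1 + x) instead.

```python
import math
from datetime import date

# Log transformation: values spanning several orders of magnitude.
revenues = [120, 1_500, 98_000, 2_300_000]
log_revenues = [round(math.log10(r), 2) for r in revenues]
print(log_revenues)  # [2.08, 3.18, 4.99, 6.36]

# Feature engineering: split a date column into year, month, and day columns.
orders = [{"order_date": "2024-03-05"}, {"order_date": "2023-12-31"}]
for row in orders:
    d = date.fromisoformat(row["order_date"])
    row["year"], row["month"], row["day"] = d.year, d.month, d.day
print(orders[0])  # {'order_date': '2024-03-05', 'year': 2024, 'month': 3, 'day': 5}
```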
10. Data Consistency Checks
➢ Why?: You need to make sure your data is valid and follows the rules you expect
(e.g., no negative values for ages or prices).
➢ How?: Verify that the data follows proper rules and fix any errors (like changing a
negative price to a valid value).
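Rules like these can be written as a table of checks and applied to every record. The RULES dictionary below is hypothetical; it only finds violations, since how each one should be fixed depends on the data. Inside the database itself, the same rules are usually enforced with constraints such as CHECK (price >= 0).

```python
# Hypothetical validation rules: each field maps to a predicate it must satisfy.
RULES = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 130,
    "price": lambda v: isinstance(v, (int, float)) and v >= 0,
}

def find_violations(rows):
    """Return (row_index, field, value) for every value that breaks a rule."""
    problems = []
    for i, row in enumerate(rows):
        for field, is_valid in RULES.items():
            if field in row and not is_valid(row[field]):
                problems.append((i, field, row[field]))
    return problems

rows = [
    {"age": 20, "price": 9.99},
    {"age": -3, "price": 5.0},   # invalid age
    {"age": 40, "price": -1.0},  # invalid price
]
print(find_violations(rows))  # [(1, 'age', -3), (2, 'price', -1.0)]
```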

11. Data Aggregation
➢ Why?: Sometimes, you need to combine data into a simpler form to make it more
useful for analysis.
➢ How?: You might combine data from different rows or columns into a single
summary, like calculating the total sales from individual product sales.
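The total-sales example can be sketched with a dictionary of running sums. The sales rows are hypothetical; in SQL the same summary would be SELECT product, SUM(quantity * unit_price) FROM sales GROUP BY product.

```python
from collections import defaultdict

# Hypothetical sales rows: (product, quantity, unit_price)
sales = [
    ("Laptop", 2, 800.0),
    ("Mouse", 10, 15.0),
    ("Laptop", 1, 800.0),
]

totals = defaultdict(float)  # missing keys start at 0.0
for product, qty, price in sales:
    totals[product] += qty * price  # per-product running total

print(dict(totals))          # {'Laptop': 2400.0, 'Mouse': 150.0}
print(sum(totals.values()))  # 2550.0
```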

By applying these techniques, you make sure that the data in your advanced
database is clean, consistent, and ready for more complex analysis, like generating
reports, building models, or making predictions.
