Advanced Databases Unit 2
Unit 2 ( Module 1 )
NoSQL, NewSQL
1. NoSQL:
NoSQL stands for "Not Only SQL". It is a category of databases designed for storing
and managing large amounts of data that may not fit well into traditional relational
databases.
NoSQL databases can spread huge amounts of data across many servers.
They can store data in different formats, such as key-value pairs, documents,
wide columns, or graphs.
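To make the four formats concrete, the sketch below writes one made-up user record in each shape using plain Python structures. The keys, names, and the followed_by helper are illustrative assumptions and do not come from any particular NoSQL product.

```python
import json

# Key-value: an opaque value looked up by a single key.
key_value_store = {"user:42": json.dumps({"name": "Asha", "city": "Pune"})}

# Document: a self-describing record that can nest fields and lists.
document = {
    "_id": 42,
    "name": "Asha",
    "address": {"city": "Pune"},
    "interests": ["databases", "chess"],
}

# Wide-column: a row key maps to column families, and rows may hold different columns.
wide_column = {
    "user:42": {
        "profile": {"name": "Asha", "city": "Pune"},
        "activity": {"last_login": "2024-01-15"},
    },
    "user:43": {"profile": {"name": "Ravi"}},
}

# Graph: nodes plus edges that describe relationships between them.
nodes = {42: {"name": "Asha"}, 43: {"name": "Ravi"}}
edges = [(42, "FOLLOWS", 43)]


def followed_by(user_id):
    """Return the names of the users that user_id follows."""
    return [
        nodes[dst]["name"]
        for src, rel, dst in edges
        if src == user_id and rel == "FOLLOWS"
    ]


print(json.loads(key_value_store["user:42"])["name"])
print(document["address"]["city"])
print(sorted(wide_column["user:43"]["profile"]))
print(followed_by(42))
```

The same facts appear in every shape; what changes is how they are looked up (by key, by nested field, by row and column family, or by following an edge).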
2. NewSQL:
NewSQL is a newer category of databases that aims to combine the advantages of SQL
(structured data and relational models) with the scalability and performance
features that NoSQL databases offer.
It is designed to scale horizontally, which means it can handle increased traffic and
large amounts of data more easily (just like NoSQL).
It also supports transactional processing (as needed by banking systems, for example).
What it is: NewSQL is built to combine the best of both worlds: it supports traditional
SQL (structured queries, transactions) but can handle large-scale data and
distributed architectures like NoSQL.
Popular NewSQL Databases:
o Google Spanner: A distributed relational database that can scale horizontally
while maintaining strong consistency guarantees.
o CockroachDB: A distributed SQL database that is easy to scale while
maintaining SQL features.
o VoltDB: A high-performance NewSQL database designed for fast transactions.
When to Use: NewSQL is useful when you need:
o Relational data that must also scale to handle high traffic.
o Strong consistency and ACID transactions at a large scale.
o High availability with minimal downtime.
RDBMS Databases
RDBMS (Relational Database Management System):
An RDBMS is a type of database that stores data in an organized way, using tables
that are related to each other. It's like a digital spreadsheet where the data is
structured into rows and columns.
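As a minimal sketch of "tables that are related to each other", the snippet below uses Python's built-in sqlite3 module with an in-memory database. The customers and orders tables and their columns are made up for illustration; the point is that a foreign key links the two tables and a JOIN reads them together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Two tables: each order row points at a customer row through customer_id.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    )
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0)],
)

# The JOIN follows the relationship to report order count and total per customer.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.name
""").fetchall()
print(rows)
```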
Comparison of NoSQL and RDBMS:
o Performance: NoSQL offers high performance, especially for large datasets; an
RDBMS is optimized for complex queries and transactions.
o Use Cases: NoSQL suits big data, real-time apps, and flexible data (social media,
IoT); an RDBMS suits financial systems, CRMs, inventory systems, and reporting.
Tools
1. Database Management Systems (DBMS):
These are the core tools used to create, manage, and interact with databases. They allow
users to store, retrieve, and manipulate data.
Examples:
o MongoDB: A NoSQL database used for flexible data storage (documents, key-
value pairs, etc.).
ETL (Extract, Transform, Load) tools are used to pull data from different sources,
reshape it, and load it into a data warehouse or database.
Load: Putting the data into the final destination (like a data warehouse).
Examples:
o Apache NiFi: A tool for automating the flow of data between systems.
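The three ETL stages can be sketched in a few lines of standard-library Python. This is only an illustration under assumed inputs: the CSV text, the signups table, and the cleaning rules (trim and title-case names, skip rows without a name) are invented here, and a real pipeline would normally use a dedicated ETL tool.

```python
import csv
import io
import sqlite3

# Assumed source data: inconsistent name casing and one row with no name.
RAW_CSV = """name,signup_date,amount
 asha ,2024-01-15,250.50
RAVI,2024-02-01,75
,2024-02-03,10
"""


def extract(text):
    """Extract: read rows from the source (a CSV string in this sketch)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, cast amounts to float, drop rows without a name."""
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()
        if not name:
            continue
        cleaned.append((name, row["signup_date"], float(row["amount"])))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS signups (name TEXT, signup_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO signups VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(transform(extract(RAW_CSV)), warehouse)
loaded = warehouse.execute("SELECT name, amount FROM signups ORDER BY name").fetchall()
print(loaded)
```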
Data warehouses are used to store and manage large amounts of historical data that come
from various sources, making it easier for businesses to run reports and analyze trends.
Examples:
Examples (performance monitoring tools):
o SQL Profiler (for SQL Server): Monitors and analyzes SQL queries to identify
slow parts of the database.
Backup and recovery tools ensure that your data is safe and can be restored if something
goes wrong, such as a system failure or human error.
Examples:
o Veeam: A backup and recovery tool for both databases and virtual
environments.
Data migration tools help you move data from one system or format to another, such as
moving data between different databases or to the cloud.
Examples:
o AWS Database Migration Service: Helps you move databases to the cloud
with minimal downtime.
NoSQL management tools help you manage and interact with NoSQL databases that store data
in ways other than traditional tables (e.g., key-value pairs, documents, or graphs).
Examples:
o MongoDB Compass: A GUI tool for MongoDB that helps visualize and analyze
data.
Examples (database security and auditing tools):
o Oracle Audit Vault: A tool for monitoring database security and compliance.
Reporting and business intelligence tools help create reports and visualizations of the
data stored in databases, making it easier to analyze trends and make decisions.
Examples:
o Power BI: A Microsoft tool that connects to various databases and creates
interactive reports and dashboards.
o Example: Making a purchase online, checking your bank account balance, and
updating your contact details are all OLTP (Online Transaction Processing) activities.
o Focus: Speed, accuracy, and handling many small transactions at once (like
inserting, updating, or deleting records).
Example: An e-commerce website where every time a customer buys something, the system
records the transaction, updates the inventory, and adjusts the customer's order history.
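The e-commerce example can be sketched as one small transaction. The snippet below uses SQLite through Python's sqlite3 module as a stand-in for a production OLTP database; the inventory and orders tables and the place_order helper are hypothetical names chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory ("
    "product TEXT PRIMARY KEY, stock INTEGER NOT NULL CHECK (stock >= 0))"
)
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, product TEXT, qty INTEGER)"
)
conn.execute("INSERT INTO inventory VALUES ('keyboard', 2)")
conn.commit()


def place_order(customer, product, qty):
    """Record the order and reduce stock as one transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "INSERT INTO orders (customer, product, qty) VALUES (?, ?, ?)",
                (customer, product, qty),
            )
            conn.execute(
                "UPDATE inventory SET stock = stock - ? WHERE product = ?",
                (qty, product),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected negative stock, so neither change was kept.
        return False


first = place_order("asha", "keyboard", 1)   # enough stock
second = place_order("ravi", "keyboard", 5)  # would drive stock below zero
stock = conn.execute("SELECT stock FROM inventory WHERE product = 'keyboard'").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(first, second, stock, order_count)
```

Each call touches only a few rows and either fully succeeds or leaves no trace, which is the pattern OLTP systems are tuned for.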
o OLAP (Online Analytical Processing) is designed for complex data analysis and
reporting, often using historical data.
o Example: Looking at business trends over the past year, running reports on
sales performance by region, or analyzing data for decision-making.
o Databases are usually denormalized (storing related data together so that analysis
needs fewer joins and runs faster).
Example: A company’s manager might run an OLAP query to find out how sales have
changed over the last 5 years in different regions.
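A query of that kind might look like the sketch below, again using SQLite via Python's sqlite3 module purely for illustration. The sales table and its figures are invented; note that the region is stored directly in each row (a denormalized layout), so the summary needs no joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (2022, "North", 120.0),
        (2022, "South", 80.0),
        (2023, "North", 150.0),
        (2023, "South", 70.0),
        (2023, "North", 30.0),
    ],
)

# Read-heavy analytical query: scan all rows and summarize by region and year.
summary = conn.execute("""
    SELECT region, year, SUM(amount) AS total
    FROM sales
    GROUP BY region, year
    ORDER BY region, year
""").fetchall()

for region, year, total in summary:
    print(region, year, total)
```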
Key Differences:
OLTP is about fast and efficient handling of transactions, while OLAP is about
analyzing large amounts of data for patterns and trends.
OLTP databases have lots of small updates, inserts, and deletions, whereas OLAP
databases focus on large read-heavy operations, like summarizing and analyzing
data.
2. Removing Duplicates
Why?: Duplicate data can distort your results, making them inaccurate.
How?: Find and remove rows that are exactly the same to ensure that each record is
unique.
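A minimal sketch of exact-duplicate removal in plain Python follows. The sample records and the drop_exact_duplicates helper are assumptions made for illustration; rows are treated as duplicates only when every field matches.

```python
records = [
    ("asha@example.com", "Asha", "Pune"),
    ("ravi@example.com", "Ravi", "Delhi"),
    ("asha@example.com", "Asha", "Pune"),    # exact duplicate of the first row
    ("asha@example.com", "Asha", "Mumbai"),  # same email, different city: kept
]


def drop_exact_duplicates(rows):
    """Keep the first occurrence of each identical row, preserving order."""
    # dict keys are unique and remember insertion order.
    return list(dict.fromkeys(rows))


unique = drop_exact_duplicates(records)
print(len(records), "->", len(unique))
```

Rows that differ in any field, such as the Mumbai record, are not exact duplicates and need a separate decision about which version to trust.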
3. Standardizing Data
Why?: Data may come from different sources with different formats (like dates in
various formats), which can cause confusion.
How?:
o Consistent Formats: Make sure everything is in the same format (e.g., dates
should all be in YYYY-MM-DD).
o Scaling: If you're working with numbers, you sometimes need to normalize them
(rescale to a fixed range such as 0 to 1) or standardize them (rescale to zero mean
and unit variance) so that they are comparable.
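Both ideas can be sketched with the standard library. The list of accepted date formats is an assumption (slash-separated dates are read as day first), as are the helper names; min-max scaling and z-scores are shown as two common ways to rescale numbers.

```python
import statistics
from datetime import datetime

# Assumed input formats, tried in order; slash dates are taken to be day/month/year.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y"]


def to_iso_date(text):
    """Return the date as YYYY-MM-DD, trying each known input format in turn."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def min_max_scale(values):
    """Rescale values linearly into the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_scores(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


dates = ["2024-01-15", "15/01/2024", "Jan 15, 2024"]
print([to_iso_date(d) for d in dates])
print(min_max_scale([10, 20, 30]))
print(z_scores([10, 20, 30]))
```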
4. Handling Outliers
Why?: Outliers (data points far from the norm) can skew your analysis and make
results unreliable.
How?: Identify and either remove outliers or transform them to be in line with other
data, depending on their significance.
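One common way to identify outliers is the interquartile range (IQR) rule, sketched below. The notes do not prescribe a specific method, so the 1.5 x IQR threshold, the sample readings, and the helper names are assumptions; the sketch shows both options mentioned above, removing outliers or clipping them to the nearest bound.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) limits at k times the IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, k=1.5):
    """Separate values into those inside the limits and those outside."""
    low, high = iqr_bounds(values, k)
    inliers = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inliers, outliers


def clip_outliers(values, k=1.5):
    """Transform outliers by pulling them back to the nearest limit."""
    low, high = iqr_bounds(values, k)
    return [min(max(v, low), high) for v in values]


readings = [12, 13, 14, 15, 16, 17, 18, 95]
inliers, outliers = split_outliers(readings)
clipped = clip_outliers(readings)
print("outliers:", outliers)
print("clipped max:", max(clipped))
```

Whether to remove or clip depends on whether the extreme value is an error or a genuine but rare observation.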
By applying these techniques, you make sure that the data in your advanced
database is clean, consistent, and ready for more complex analysis, like generating
reports, building models, or making predictions.