Unit 5 DMS

The document provides an overview of database backup processes, types of failures, and backup strategies, emphasizing the importance of backups for disaster recovery. It also discusses advanced database concepts such as data warehouses and data lakes, highlighting their characteristics, use cases, and differences. Additionally, it covers data mining techniques, big data characteristics, and features of NoSQL databases like MongoDB and DynamoDB.


Database Backup:

Types of Failures, Causes of Failure, and Backup Overview


Database Backup Overview: A database backup is the process of creating a copy of the data from a database so that it
can be restored in case of data loss or corruption. Backup is a critical part of a database's disaster recovery plan,
ensuring data integrity and availability in case of failure.

Types of Failures in Database Systems


1. Hardware Failures:
o Cause: Physical damage to the storage device (e.g., hard disk failure).
o Impact: Data loss or unavailability due to the inability to access stored data.
2. Software Failures:
o Cause: Bugs, application crashes, or system malfunction.
o Impact: Data corruption or failure to process database queries.
3. Human Errors:
o Cause: Accidental deletion of data, wrong configuration changes, or improper data entry.
o Impact: Loss of critical data or misconfigured databases, leading to system malfunctions.
4. Data Corruption:
o Cause: Bugs in the database management system (DBMS), network faults, or power outages.
o Impact: Inaccessibility or incorrect retrieval of data.
5. Security Breaches:
o Cause: Hacking, ransomware, or malicious attacks.
o Impact: Unauthorized access, data theft, or corruption.
6. Natural Disasters:
o Cause: Fires, floods, earthquakes, etc.
o Impact: Physical damage to data storage facilities and servers.

Causes of Database Failures


1. Hardware Failure: This could include disk crashes, memory failures, or server downtime.
2. Software Bugs: Issues within the DBMS or database applications that cause crashes or malfunctions.
3. Power Outages: Unexpected power losses that disrupt database operations, possibly leading to data
corruption.
4. Network Failures: Issues like slow network speeds or connection drops that may disrupt access to the database.
5. Human Mistakes: Manual errors, such as accidental deletion or incorrect updates, can lead to critical data loss.
6. Security Threats: Attacks like SQL injection, ransomware, and data breaches can compromise data integrity and
availability.
7. Improper Maintenance: Lack of regular updates or failure to monitor the system can lead to performance
degradation or vulnerabilities.

Types of Database Backups


1. Full Backup:
o A complete copy of the entire database, including all data, schema, and settings.
o Pros: Simplifies restoration because all data is in one backup.
o Cons: Takes longer to complete and requires more storage space.
2. Incremental Backup:
o Only the changes made since the last backup (either full or incremental) are backed up.
o Pros: Faster and requires less storage than full backups.
o Cons: Restoration is slower since multiple backup sets may be needed.
3. Differential Backup:
o Backs up all changes made since the last full backup.
o Pros: Faster restoration compared to incremental backups.
o Cons: Takes more storage space and time than incremental backups.
4. Transaction Log Backup:
o Backs up the transaction log records written since the last log backup.
o Pros: Enables point-in-time recovery.
o Cons: Requires the database to run in the full or bulk-logged recovery model (a backup-scripting sketch follows this list).
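
A minimal sketch of scripting a full backup, assuming a PostgreSQL database and the pg_dump utility on the PATH; the database name and backup directory below are hypothetical. Incremental, differential, and log backups are usually driven by engine-specific tooling (e.g., WAL archiving in PostgreSQL) rather than a single command.

```python
import subprocess
from datetime import datetime, timezone

# Hypothetical settings: adjust for your environment.
DB_NAME = "inventory_db"
BACKUP_DIR = "/var/backups/postgres"

def full_backup() -> str:
    """Take a full logical backup of the database with pg_dump."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    outfile = f"{BACKUP_DIR}/{DB_NAME}_full_{stamp}.dump"
    # -Fc = custom (compressed) format, restorable later with pg_restore
    subprocess.run(["pg_dump", "-Fc", "--file", outfile, DB_NAME], check=True)
    return outfile

if __name__ == "__main__":
    print("Backup written to", full_backup())
```

The resulting .dump file is restored with pg_restore, which is why a full backup is the simplest to recover from, as noted above.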

Backup Strategies
1. Local Backup:
o Backups are stored on the same server or nearby local storage devices.
o Pros: Fast access for restoring data.
o Cons: Vulnerable to local disasters (e.g., fire, theft).
2. Remote Backup:
o Backups are stored on external servers or cloud storage.
o Pros: Provides protection against local failures and disasters.
o Cons: Slower recovery times and a dependency on internet connectivity.
3. Cloud Backup:
o Backups are stored in cloud services like AWS, Google Cloud, or Azure.
o Pros: Off-site storage, automated backups, and scalability.
o Cons: Reliant on internet access and cloud service provider reliability (an upload sketch follows this list).
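
As a sketch of the remote/cloud options above, the snippet below copies a finished dump file to Amazon S3 with boto3. It assumes AWS credentials are already configured; the bucket name and file path are hypothetical.

```python
import boto3  # AWS SDK for Python

# Hypothetical names: replace the bucket and dump path with your own.
BUCKET = "example-db-backups"
LOCAL_DUMP = "/var/backups/postgres/inventory_db_full_20250101T000000Z.dump"

s3 = boto3.client("s3")
# Upload the local backup; the object key mirrors the file name.
s3.upload_file(LOCAL_DUMP, BUCKET, LOCAL_DUMP.rsplit("/", 1)[-1])
```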

Advanced Database Concepts: Data Warehouses and Data Lakes

Data Warehouse:
 Definition: A Data Warehouse is a centralized repository used for storing and analyzing large volumes of
structured data from multiple sources. It’s specifically designed for business intelligence (BI) tasks like reporting,
querying, and data analysis.
 Characteristics:
o Structured Data: Primarily stores structured data (e.g., relational data from operational systems).
o OLAP (Online Analytical Processing): Optimized for complex queries and analytical workloads rather
than transactional processing.
o Data Integration: Integrates data from various sources like transactional databases, external data
sources, and other systems.
o ETL Process: Data is extracted, transformed, and loaded (ETL) into the warehouse for analysis (a toy ETL sketch follows this section).
 Use Cases:
o Business intelligence (BI) reporting
o Trend analysis and decision-making support
o Historical data analysis
 Examples:
o Amazon Redshift, Google BigQuery, Microsoft Azure Synapse Analytics.
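
A toy illustration of the ETL process listed above, with pandas for extract/transform and sqlite3 standing in for a real warehouse such as Redshift or BigQuery; the file, column, and table names are invented.

```python
import sqlite3
import pandas as pd

# Extract: read raw sales records from a hypothetical CSV export.
raw = pd.read_csv("daily_sales.csv")  # assumed columns: order_id, amount, order_date

# Transform: clean and aggregate before loading (schema-on-write).
raw["order_date"] = pd.to_datetime(raw["order_date"])
daily = raw.groupby(raw["order_date"].dt.date)["amount"].sum().reset_index()
daily.columns = ["sale_date", "total_amount"]
daily["sale_date"] = daily["sale_date"].astype(str)  # store dates as ISO strings

# Load: write the transformed table into the warehouse.
with sqlite3.connect("warehouse.db") as conn:
    daily.to_sql("fact_daily_sales", conn, if_exists="replace", index=False)
```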

Data Lake:
 Definition: A Data Lake is a large, centralized repository that stores vast amounts of raw data in structured,
semi-structured, and unstructured forms. It allows data to be stored in its native format until it is needed for
analysis.
 Characteristics:
o Raw and Unstructured Data: Can store unstructured data like text, images, videos, logs, and social
media data, alongside structured data.
o Scalability: Highly scalable, capable of storing petabytes of data, often used in big data applications.
o Schema-on-Read: Data is stored without a predefined schema; the schema is applied when the data
is read or analyzed, making it more flexible for future analysis (illustrated after this section).
o Data Variety: Capable of storing diverse data types (e.g., JSON, XML, images, text).
 Use Cases:
o Big data analytics
o Machine learning and data mining
o Real-time analytics and data exploration
 Examples:
o Amazon S3 (with analytics tools like AWS Lake Formation), Microsoft Azure Data Lake Storage, Google
Cloud Storage.
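
A small sketch of the schema-on-read idea above: raw JSON events are dumped into the lake as-is, and structure is only imposed when pandas reads them back. The directory and field names are invented.

```python
import json
import os
import pandas as pd

os.makedirs("lake", exist_ok=True)

# Write: events land in the lake as raw JSON lines; no schema is enforced.
events = [
    {"user": "u1", "action": "click", "ts": "2025-01-01T10:00:00"},
    {"user": "u2", "action": "view"},  # fields may vary record to record
]
with open("lake/events.jsonl", "w") as f:
    for event in events:
        f.write(json.dumps(event) + "\n")

# Read: a schema is applied only now, at analysis time (schema-on-read).
df = pd.read_json("lake/events.jsonl", lines=True)
print(df["action"].value_counts())
```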

Difference Between Data Warehouse and Data Lake

Feature        | Data Warehouse                                         | Data Lake
---------------|--------------------------------------------------------|----------------------------------------------------------------
Data Type      | Primarily structured data                              | Structured, semi-structured, and unstructured data
Storage Format | Data is processed and structured before storing        | Raw data stored in its native format
Schema         | Schema-on-write (predefined schema)                    | Schema-on-read (schema applied during analysis)
Use Case       | Business intelligence, reporting, historical analysis  | Big data analytics, machine learning, real-time processing
Processing     | Optimized for complex queries and reporting (OLAP)     | Optimized for flexible and scalable data storage and processing

1. Data Mining
 Definition: Data mining is the process of discovering patterns, correlations, trends, and useful information from
large datasets using statistical, mathematical, and computational techniques. It transforms raw data into
valuable insights.
 Techniques:
o Classification: Assigning items to predefined categories (e.g., spam vs. non-spam emails).
o Clustering: Grouping similar items without predefined labels (e.g., customer segmentation; a sketch follows this section).
o Association Rule Mining: Finding interesting relationships between variables (e.g., market basket
analysis).
o Regression: Predicting a continuous value based on data (e.g., forecasting sales).
o Anomaly Detection: Identifying outliers or unusual patterns in data (e.g., fraud detection).
 Applications:
o Market basket analysis
o Fraud detection
o Customer segmentation
o Predictive analytics (e.g., stock market forecasting)
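
As an illustration of the clustering technique listed above, here is a minimal scikit-learn sketch that segments customers on two made-up features (annual spend and monthly visits); the data is invented.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer features: [annual_spend, visits_per_month]
X = np.array([
    [200, 1], [220, 2], [250, 1],    # low-spend, infrequent shoppers
    [900, 10], [950, 12], [880, 9],  # high-spend, frequent shoppers
])

# Clustering: group similar customers without predefined labels.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(model.labels_)  # e.g. [0 0 0 1 1 1]: two customer segments
```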

2. Big Data
 Definition: Big Data refers to extremely large datasets that are too complex or voluminous to be processed by
traditional database management systems (DBMS) or computing tools. Big data typically involves the 3 Vs:
o Volume: Large amounts of data (petabytes, exabytes).
o Velocity: The speed at which data is generated, processed, and analyzed (real-time or near real-time).
o Variety: Different types of data (structured, semi-structured, unstructured).
 Characteristics:
o Scale: Big data can include vast amounts of data that come from diverse sources like social media,
sensors, logs, and more.
o Data Processing: Requires distributed computing systems, such as Hadoop and Spark, to process data
efficiently (a minimal Spark job is sketched after this section).
o Advanced Analytics: Often used for advanced analytics, predictive modeling, and machine learning.
 Applications:
o Real-time analytics (e.g., stock market analysis)
o Internet of Things (IoT)
o Social media analysis
o Healthcare, genomics, and scientific research
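
A minimal PySpark sketch of the distributed processing mentioned above; it assumes a local Spark installation, and the input file and column name are hypothetical.

```python
from pyspark.sql import SparkSession

# Start a Spark session (local mode unless a cluster is configured).
spark = SparkSession.builder.appName("log-levels").getOrCreate()

# Read a large JSON log file; Spark infers the schema and distributes the work.
logs = spark.read.json("server_logs.json")
logs.groupBy("level").count().show()  # distributed aggregation

spark.stop()
```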

3. MongoDB
 Definition: MongoDB is an open-source, NoSQL database that stores data in a flexible, document-oriented
format using JSON-like documents (BSON - Binary JSON). MongoDB is designed to handle large-scale,
unstructured, or semi-structured data.
 Key Features:
o Schema-less Design: Data is stored in JSON-like documents, allowing flexibility in the structure (no
predefined schema).
o Scalability: MongoDB is horizontally scalable, meaning it can distribute data across many servers
(sharding).
o High Availability: Supports replication, where data is duplicated across multiple nodes to ensure
availability.
o Aggregation Framework: Provides a powerful way to perform data analysis and aggregation (a pipeline sketch follows this section).

 Advantages:
o Flexibility and scalability
o High availability and fault tolerance
o Suitable for unstructured data
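
A short pymongo sketch of the schema-less storage and aggregation framework described above, assuming a MongoDB server on localhost; the database, collection, and field names are hypothetical.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client["shop"]["orders"]

# Schema-less inserts: documents in one collection may differ in structure.
orders.insert_one({"customer": "u1", "items": ["pen", "book"], "total": 12.5})
orders.insert_one({"customer": "u2", "total": 40.0, "coupon": "SAVE10"})

# Aggregation framework: total revenue per customer.
pipeline = [{"$group": {"_id": "$customer", "revenue": {"$sum": "$total"}}}]
for row in orders.aggregate(pipeline):
    print(row)
```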

4. DynamoDB

 Definition: DynamoDB is a fully managed NoSQL database service provided by Amazon Web Services (AWS). It
is designed for high availability, scalability, and low-latency performance. DynamoDB is optimized for
applications that require consistent, single-digit millisecond response times.
 Features:
o Key-Value and Document Data Model: Supports both key-value pairs and document-based structures,
allowing for flexible data representation (a key-value sketch follows this section).
o Scalable and Fast: DynamoDB can automatically scale up or down to handle increasing traffic without
manual intervention.
o Fully Managed: Amazon handles infrastructure, backups, replication, and scaling automatically.
o Global Replication: Allows for data replication across multiple regions for low-latency access.
 Advantages:
o Automatic scaling and management
o Low-latency reads and writes
o Built-in fault tolerance and replication
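
A brief boto3 sketch of DynamoDB's key-value model, assuming AWS credentials are configured and that a hypothetical "Users" table with partition key "user_id" already exists.

```python
import boto3

# "Users" and "user_id" are hypothetical; DynamoDB tables are created
# separately (console, CLI, or infrastructure-as-code).
table = boto3.resource("dynamodb").Table("Users")

# Key-value write: put_item upserts the item for this primary key.
table.put_item(Item={"user_id": "u1", "name": "Ada", "plan": "pro"})

# Low-latency point read by primary key.
resp = table.get_item(Key={"user_id": "u1"})
print(resp.get("Item"))
```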
