Chapter 6 - Generated Flashcards and Questions
o In traditional file systems, each application keeps its own files. This leads to data
redundancy (the same data duplicated across files) and data inconsistency (copies of the same
data that hold different values).
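A minimal Python sketch of how duplicated files drift apart. The two dictionaries stand in for files owned by hypothetical billing and shipping applications; the names and data are illustrative only.

```python
# Hypothetical example: two applications each keep their own copy of customer data.
# Each "file" is modeled as a dict mapping customer ID to a mailing address.
billing_file = {101: "12 Oak St"}
shipping_file = {101: "12 Oak St"}  # redundant copy of the same fact

# The customer moves; only the billing application updates its own file.
billing_file[101] = "48 Elm Ave"

def find_inconsistencies(file_a, file_b):
    """Return the IDs whose values differ between two copies of the same data."""
    return sorted(k for k in file_a.keys() & file_b.keys() if file_a[k] != file_b[k])

print(find_inconsistencies(billing_file, shipping_file))  # [101]
```

Because nothing links the two copies, the file system cannot detect or prevent this drift. A DBMS avoids it by storing the address once and sharing it.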
DBMS Purpose:
o A DBMS reduces program-data dependence, provides flexible access to data, and improves
data security.
Components of a DBMS:
o Data Definition Language (DDL): Specifies the structure of the data in the database.
o Data Dictionary: Stores the definitions of data elements and their relationships.
o Structured Query Language (SQL): A standard language used to interact with databases.
o Normalization: A database design process that reduces data redundancy by dividing data
into smaller tables and linking them through relationships.
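The Data Definition Language (DDL), which defines tables, and the data dictionary, which records those definitions, can be sketched with Python's standard-library sqlite3 module. A CREATE TABLE statement defines the structure. SQLite's own catalog (the sqlite_master table and PRAGMA table_info) is then queried as a stand-in for a data dictionary. The customer table is hypothetical, and full DBMS products expose richer dictionaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of a (hypothetical) table.
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT
    )
""")

# SQLite's catalog acts like a data dictionary: it records every table definition.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info yields (cid, name, type, notnull, default, pk) for each column.
columns = [(name, col_type, bool(notnull), bool(pk))
           for _, name, col_type, notnull, _, pk in conn.execute("PRAGMA table_info(customer)")]

print(tables)  # ['customer']
for col in columns:
    print(col)
```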
Capabilities of a DBMS:
o Data Manipulation Language (DML): Facilitates data retrieval, insertion, deletion, and
updates. SQL is the most commonly used DML.
o Query Capabilities: Allows users to retrieve specific data by writing queries that filter, sort, and combine records.
o Reporting and Analytics: Tools to create reports and dashboards based on the
database.
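The four DML operations listed above (insertion, update, deletion, and retrieval) can be sketched with Python's built-in sqlite3 module. The product table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT, price REAL)")

# Insertion: add several rows using parameter placeholders.
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [("A1", "Stapler", 8.5), ("B2", "Lamp", 30.0), ("C3", "Desk", 120.0)])

# Update: change a value in an existing row.
conn.execute("UPDATE product SET price = price - 5.0 WHERE sku = ?", ("B2",))

# Deletion: remove rows that match a condition.
conn.execute("DELETE FROM product WHERE price > ?", (100,))
conn.commit()

# Retrieval: query the remaining rows.
rows = conn.execute("SELECT sku, name, price FROM product ORDER BY sku").fetchall()
print(rows)  # [('A1', 'Stapler', 8.5), ('B2', 'Lamp', 25.0)]
```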
Non-relational Databases (NoSQL):
o Used in Big Data applications, these databases offer scalability and flexibility for large volumes of unstructured data.
Data Warehouses:
o A data warehouse consolidates data from different operational systems for analysis and
reporting. It stores historical data.
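Data Marts:
o A data mart is a smaller, more focused subset of a data warehouse, designed for a specific
department or function such as marketing or sales.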
Big Data:
o Refers to datasets too large for traditional databases, characterized by high volume,
velocity, and variety.
o Data Mining: Identifies hidden patterns and relationships in large datasets to make
predictions.
Information Policy:
o Refers to the rules and guidelines that govern how data is collected, used, and
maintained in an organization.
Data Governance:
o Ensures data is accurate, consistent, and secure. Involves policies for data management
and user access.
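Data Quality:
o Data cleansing detects and corrects inaccuracies and inconsistencies in data to improve its
quality.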
Flashcard 1
Q: What are the major problems with traditional file systems?
A: Data redundancy, inconsistency, lack of flexibility, poor security, and program-data dependence.
Flashcard 2
Q: What are the key components of a Database Management System (DBMS)?
A: Data Definition Language (DDL), Data Dictionary, Structured Query Language (SQL).
Flashcard 3
Q: What is normalization in a relational database?
A: Normalization is the process of organizing data to minimize redundancy and dependency by dividing
data into smaller tables and linking them through relationships.
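Here is a small sketch of normalization. It assumes a hypothetical order list in which customer details repeat on every row. The customer facts move into their own table, and each order keeps only a customer_id foreign key. A join then rebuilds the combined view without storing the duplicates.

```python
import sqlite3

# Hypothetical unnormalized rows: (order_id, customer_id, name, city, item).
flat_orders = [
    (1, "C1", "Ana", "Lisbon", "Pen"),
    (2, "C1", "Ana", "Lisbon", "Ink"),
    (3, "C2", "Raj", "Pune", "Pad"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id TEXT PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id TEXT REFERENCES customer(customer_id),
        item        TEXT
    );
""")

# Store each customer fact exactly once; orders reference it by key.
customers = {(cid, name, city) for _, cid, name, city, _ in flat_orders}
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", sorted(customers))
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(oid, cid, item) for oid, cid, _, _, item in flat_orders])

# A join reconstructs the original combined view.
rows = conn.execute("""
    SELECT o.order_id, c.name, c.city, o.item
    FROM orders o JOIN customer c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

print(conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0])  # 2
print(rows)
```

Ana's city is now stored once. Changing it updates every order that references her.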
Flashcard 4
Q: What is a Data Warehouse?
A: A Data Warehouse is a centralized repository that stores current and historical data for analysis and
reporting.
Flashcard 5
Q: What is OLAP?
A: OLAP (Online Analytical Processing) is a technology that lets users analyze data along multiple
dimensions (e.g., time, geography, product), helping in decision-making.
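OLAP-style analysis can be imitated on a tiny scale in plain Python. The sketch below uses hypothetical sales records to show two operations. A roll-up aggregates along the chosen dimensions, and a slice fixes one dimension to a single value. Real OLAP tools perform these operations over multidimensional cubes at much larger scale.

```python
from collections import defaultdict

# Hypothetical fact records: (year, region, product, sales amount).
sales = [
    (2023, "North", "Laptop", 100),
    (2023, "South", "Laptop", 80),
    (2023, "North", "Phone", 50),
    (2024, "North", "Laptop", 120),
    (2024, "South", "Phone", 70),
]
DIMENSIONS = ("year", "region", "product")

def roll_up(facts, *dims):
    """Total the sales amount grouped by the chosen dimensions."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for *keys, amount in facts:
        totals[tuple(keys[i] for i in idx)] += amount
    return dict(totals)

def slice_(facts, dim, value):
    """Keep only the facts where one dimension equals a fixed value."""
    i = DIMENSIONS.index(dim)
    return [f for f in facts if f[i] == value]

print(roll_up(sales, "year"))                                # {(2023,): 230, (2024,): 190}
print(roll_up(slice_(sales, "region", "North"), "product"))  # {('Laptop',): 220, ('Phone',): 50}
```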
Flashcard 6
Q: What are Big Data and Hadoop?
A: Big Data refers to large, complex datasets that traditional databases can’t handle. Hadoop is an open-
source framework used to process and store big data across distributed computer clusters.
Flashcard 7
Q: What is the purpose of data governance?
A: Data governance ensures the integrity, accuracy, security, and quality of data through the
establishment of policies and standards.
Flashcard 8
Q: What is data cleansing?
A: Data cleansing involves detecting and correcting inaccuracies and inconsistencies in data to improve
its quality.
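Here is a simplified data-cleansing pass. It assumes hypothetical customer records and an illustrative country-code mapping. The pass standardizes formats, rejects values that fail basic checks, and drops duplicates.

```python
# Hypothetical raw records with inconsistent formatting, a duplicate, and a bad value.
raw = [
    {"email": " Ana@Example.com ", "country": "usa"},
    {"email": "ana@example.com", "country": "USA"},
    {"email": "raj@example.com", "country": "U.S."},
    {"email": "not-an-email", "country": "India"},
]

# Assumed mapping from country spellings to one standard code (illustrative only).
COUNTRY_CODES = {"usa": "US", "u.s.": "US", "us": "US", "india": "IN"}

def cleanse(records):
    """Return (cleaned, rejected) lists of records."""
    cleaned, seen, rejected = [], set(), []
    for rec in records:
        email = rec["email"].strip().lower()                    # standardize format
        country = COUNTRY_CODES.get(rec["country"].strip().lower())
        if "@" not in email or country is None:                 # detect invalid values
            rejected.append(rec)
            continue
        if email in seen:                                       # drop duplicates
            continue
        seen.add(email)
        cleaned.append({"email": email, "country": country})
    return cleaned, rejected

good, bad = cleanse(raw)
print(good)
print(len(bad))  # 1
```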
Q1: Discuss the major problems associated with traditional file environments. How do database
management systems (DBMS) address these problems?
Answer:
Traditional file environments suffer from problems like data redundancy (duplicate data across multiple
systems), data inconsistency (inaccurate or out-of-sync data), lack of flexibility (difficulty in accessing or
updating data), poor security, and program-data dependence (the need to modify application programs
if the data format changes).
A DBMS addresses these issues by providing a centralized system to manage data. It minimizes
redundancy, enforces consistency, and makes data accessible to multiple users and applications. It
separates the data from the application logic, allowing greater flexibility and control over data integrity
and security.
Q2: What are the key capabilities of a DBMS and how do they help in managing data?
Answer:
The key capabilities of a DBMS include:
Data Definition Language (DDL): Specifies the structure of the data in the database.
Data Manipulation Language (DML): Facilitates data access, retrieval, and modification through
SQL.
Query Processing: Allows users to query data for analysis and reporting.
Reporting and Analytics: Supports decision-making by providing insights into data patterns.
These capabilities make it easier to store, retrieve, and manipulate large amounts of data while
ensuring data consistency, integrity, and security.
Q3: Explain how data warehouses and data marts are used in organizations.
Answer:
Data warehouses are large repositories that store historical data from different sources for analytical
purposes. They support decision-making by providing a unified view of the organization’s data. Data
marts, on the other hand, are smaller, more focused subsets of the data warehouse designed for
specific departments or functions like marketing or sales. Both enable businesses to analyze trends,
monitor performance, and make data-driven decisions.
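One simple way to picture a data mart is as a department-specific slice of the warehouse. The sqlite3 sketch below models it as a SQL view over a hypothetical warehouse table. In practice, a data mart is often a separate physical database loaded from the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical enterprise-wide warehouse table.
    CREATE TABLE sales_fact (sale_date TEXT, department TEXT, product TEXT, amount REAL);
    INSERT INTO sales_fact VALUES
        ('2024-01-05', 'Marketing', 'Ads',    500),
        ('2024-01-09', 'Sales',     'Laptop', 1200),
        ('2024-02-11', 'Marketing', 'Events', 300),
        ('2024-02-20', 'Sales',     'Phone',  800);

    -- Marketing data mart, modeled here as a view summarizing monthly spend.
    CREATE VIEW marketing_mart AS
        SELECT substr(sale_date, 1, 7) AS month, SUM(amount) AS spend
        FROM sales_fact
        WHERE department = 'Marketing'
        GROUP BY month;
""")

mart = conn.execute("SELECT month, spend FROM marketing_mart ORDER BY month").fetchall()
print(mart)  # [('2024-01', 500.0), ('2024-02', 300.0)]
```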
Q4: How does OLAP differ from data mining, and how are both used in business intelligence?
Answer:
OLAP (Online Analytical Processing) allows users to perform multi-dimensional analysis on large
datasets, viewing data from different perspectives (e.g., time, geography, product). It is ideal for
interactive, fast queries and comparisons.
Data mining is used to uncover hidden patterns, relationships, and trends in large datasets. It can
predict future behavior by analyzing historical data. While OLAP helps in summarizing and analyzing
data, data mining goes deeper by identifying underlying patterns not immediately visible.
Q5: What are the key components of an effective data governance policy?
Answer:
An effective data governance policy should include:
Data Security: Establishing controls to protect data from unauthorized access and breaches.
Data Access: Defining who can access and modify data based on roles and responsibilities.
Compliance: Ensuring that data management practices comply with legal and regulatory
requirements.
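The Data Access component can be expressed as a role-based rule table. This minimal sketch uses hypothetical roles and dataset names, and it denies any action the policy does not explicitly grant.

```python
# Hypothetical governance policy: role -> dataset -> set of permitted actions.
POLICY = {
    "analyst": {"sales": {"read"}},
    "data_steward": {"sales": {"read", "update"}, "customers": {"read", "update"}},
}

def is_allowed(role, dataset, action):
    """Return True only if the policy explicitly grants the action (deny by default)."""
    return action in POLICY.get(role, {}).get(dataset, set())

print(is_allowed("analyst", "sales", "read"))    # True
print(is_allowed("analyst", "sales", "update"))  # False
```

Denying by default means that roles or datasets omitted from the policy are protected automatically.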
Flashcard 9
Q: What are non-relational databases, and why are they used?
A: Non-relational databases, or NoSQL databases, store data without the fixed table structure of the
relational model and are designed to handle large volumes of unstructured and semi-structured data.
They are used when scalability, flexibility, and speed matter more than a strict relational structure, as in
Big Data and real-time web applications.
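Many NoSQL document databases use flexible, schema-less storage. The sketch below illustrates this by keeping JSON-style documents with differing fields in one collection. It is a toy in-memory model with hypothetical insert and find helpers, not the API of any real NoSQL product.

```python
import json

# Toy document collection: documents need not share the same fields.
collection = []

def insert(doc):
    """Store a JSON-compatible copy of the document."""
    collection.append(json.loads(json.dumps(doc)))

def find(**criteria):
    """Return the documents whose fields equal every given criterion."""
    return [d for d in collection if all(d.get(k) == v for k, v in criteria.items())]

insert({"user": "ana", "type": "click", "page": "/home"})
insert({"user": "raj", "type": "purchase", "items": ["pen", "ink"], "total": 7.5})
insert({"user": "ana", "type": "purchase", "items": ["pad"], "total": 3.0})

print(len(find(user="ana")))                          # 2
print(find(type="purchase", user="raj")[0]["items"])  # ['pen', 'ink']
```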
Flashcard 10
Q: What is the purpose of a data dictionary?
A: A data dictionary is a metadata repository that defines the structure of the database, listing data
elements, their types, relationships, and constraints. It is used by the DBMS to manage data effectively
and enforce integrity rules.
Flashcard 11
Q: What is a database schema?
A: A database schema is the structure or blueprint of the database that defines how data is organized,
including tables, fields, relationships, and constraints.
Flashcard 12
Q: What are primary keys and foreign keys in relational databases?
A: A primary key is a unique identifier for each record in a table, while a foreign key is a field in one table
that links to the primary key of another table, establishing a relationship between the two tables.
Flashcard 13
Q: What is data mining, and how does it benefit organizations?
A: Data mining is the process of analyzing large datasets to discover patterns and relationships. It helps
organizations predict trends, make informed decisions, and personalize services.
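One classic data-mining technique is association-rule discovery (e.g., "customers who buy bread also buy milk"). The sketch below is a simplified, pairs-only version that runs over hypothetical shopping baskets. It computes support (the share of baskets that contain the pair) and confidence (how often the second item appears when the first does). This is not a full Apriori implementation, and the thresholds are arbitrary.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def association_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules for item pairs meeting both thresholds."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

rules = association_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```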
Flashcard 14
Q: What is the difference between structured, semi-structured, and unstructured data?
A: Structured data is organized in a predefined format (e.g., relational databases), semi-structured data
has some organizational properties but is not fully structured (e.g., JSON files), and unstructured data
lacks a specific structure (e.g., text documents, videos).
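Python's standard library can contrast the three kinds of data. CSV rows with fixed columns are structured. A JSON document with nested and optional fields is semi-structured. Free text that can only be searched or analyzed is unstructured. All sample values are hypothetical.

```python
import csv
import io
import json

# Structured: every row has the same fixed columns (CSV as a stand-in for a table).
structured = io.StringIO("id,name,age\n1,Ana,34\n2,Raj,29\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing keys, nesting, and optional fields (JSON).
semi = json.loads('{"id": 3, "name": "Lee", "tags": ["vip"], "address": {"city": "Oslo"}}')

# Unstructured: free text with no schema; meaning must be extracted, here by simple search.
unstructured = "Customer Lee called to say the Oslo delivery arrived late."
mentions_late = "late" in unstructured.lower()

print(rows[1]["name"], semi["address"]["city"], mentions_late)  # Raj Oslo True
```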
Flashcard 15
Q: What are the characteristics of Big Data?
A: Big Data is characterized by high volume (large data sets), velocity (speed of data generation), and
variety (different types of data).
Flashcard 16
Q: How does the Hadoop framework support Big Data?
A: Hadoop allows for the distributed processing of large data sets across clusters of computers using a
simple programming model. It provides fault tolerance, scalability, and parallel processing capabilities.
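Hadoop's programming model is MapReduce. The pure-Python sketch below imitates its map, shuffle, and reduce phases for a word count over hypothetical input splits. It runs sequentially on one machine and does not use Hadoop's actual Java API.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input, split into chunks as if stored on different nodes.
splits = ["big data big clusters", "data nodes fail", "big nodes"]

def map_phase(text):
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word, 1) for word in text.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: combine all values for one key."""
    return key, sum(values)

# In Hadoop, many mappers and reducers run in parallel across a cluster; here they run in sequence.
intermediate = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts)  # {'big': 3, 'data': 2, 'clusters': 1, 'nodes': 2, 'fail': 1}
```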
Flashcard 17
Q: What are the main types of database models?
A: The main types include hierarchical, network, relational, and object-oriented models. The relational
model is the most widely used, organizing data in tables with rows and columns.
Flashcard 18
Q: What is the difference between OLTP and OLAP?
A: OLTP (Online Transaction Processing) supports daily transaction processing (e.g., banking, retail),
while OLAP (Online Analytical Processing) supports complex analysis and decision-making using
historical data.
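OLTP systems rely on transactions that either complete fully or not at all. This minimal sqlite3 sketch assumes a hypothetical account table whose CHECK constraint forbids negative balances. A transfer that would overdraw an account is rolled back, leaving both balances unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()

def transfer(conn, src, dst, amount):
    """One unit of work: both updates are kept together, or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

print(transfer(conn, "A", "B", 30), balances(conn))   # True {'A': 70, 'B': 50}
print(transfer(conn, "B", "A", 500), balances(conn))  # False {'A': 70, 'B': 50}
```

In the failed transfer, the credit to A has already run when the debit to B violates the constraint. The rollback undoes that credit. By contrast, OLAP workloads mostly read large volumes of historical data rather than making many small, concurrent updates like these.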
Q6: What are the key advantages of using a DBMS compared to traditional file systems?
Answer:
A DBMS provides several advantages over traditional file systems:
1. Data Integrity and Consistency: A DBMS ensures that data is consistent and accurate by
enforcing rules and constraints.
2. Reduced Data Redundancy: In a DBMS, data is stored centrally and shared across applications,
reducing duplication.
3. Improved Data Security: DBMSs provide centralized control over who can access and modify the
data.
4. Flexibility: DBMSs allow users to quickly access and manipulate data in a variety of ways,
without having to rewrite application programs.
5. Scalability: DBMSs are better suited for handling large datasets as organizations grow.
Q7: Explain the role of Hadoop in managing Big Data and its significance in modern organizations.
Answer:
Hadoop is an open-source framework that allows for the distributed processing of large datasets across
clusters of computers. It is significant in modern organizations because it:
1. Fault Tolerance: Provides data redundancy across nodes, ensuring that the system continues
functioning even if individual nodes fail.
2. Cost-Effective: It runs on commodity hardware, making it an affordable option for Big Data
processing.
3. Flexibility: Supports both structured and unstructured data, allowing businesses to analyze
different types of information, from log files to images and videos.
Q8: What are the different types of data integrity constraints in a relational database, and why are they
important?
Answer:
Data integrity constraints ensure that the data entered into a database is accurate and consistent. The
types include:
1. Entity Integrity: Ensures that each table has a primary key and that every row has a non-null
primary key value that no other row shares.
2. Referential Integrity: Ensures that foreign key values in one table match primary key values in
another, maintaining the relationships between tables.
3. Domain Integrity: Ensures that each column contains only valid data types or values, such as
dates or numbers.
4. User-Defined Integrity: Custom rules defined by the user to meet specific business
requirements.
These constraints are crucial for maintaining data accuracy and ensuring the reliability of the database in
decision-making.
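SQL can declare all four constraint types. This sqlite3 sketch assumes hypothetical department and employee tables. The date-format GLOB check and the salary range are illustrative rules, not standards. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been run on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY                                  -- entity integrity
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,                                 -- entity integrity
        dept_id INTEGER NOT NULL REFERENCES department(dept_id),     -- referential integrity
        hired   TEXT NOT NULL
                CHECK (hired GLOB '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]'),  -- domain integrity
        salary  INTEGER CHECK (salary BETWEEN 30000 AND 250000)      -- user-defined business rule
    );
    INSERT INTO department VALUES (10);
""")

def try_insert(row):
    """Insert an employee row; return 'ok' or the constraint error message."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

print(try_insert((1, 10, "2024-03-01", 50000)))  # valid row
print(try_insert((1, 10, "2024-03-02", 60000)))  # duplicate primary key
print(try_insert((2, 99, "2024-03-01", 50000)))  # department 99 does not exist
print(try_insert((3, 10, "March 1", 50000)))     # date not in YYYY-MM-DD form
print(try_insert((4, 10, "2024-03-01", 10)))     # salary outside the business rule
```

The exact wording of each error message depends on the SQLite version.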
Q9: How do data warehouses differ from operational databases, and why are they essential for business
intelligence?
Answer:
Operational databases are designed for routine transaction processing and store real-time data that
supports day-to-day operations. Data warehouses, on the other hand, store historical data consolidated
from multiple sources for analysis and reporting purposes. Data warehouses are essential for business
intelligence because they:
1. Support complex queries that require analyzing data across time periods.
2. Improve decision-making by allowing managers to access integrated, clean, and consistent data
for analysis.
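An extract-transform-load (ETL) step usually consolidates data from multiple sources. The sketch below assumes two hypothetical operational sources that record the same kind of sale in different formats. It converts both into one consistent format and then queries the combined data.

```python
# Hypothetical operational systems that record the same kind of fact differently.
store_pos = [{"date": "2024-01-05", "sku": "A1", "qty": 2, "unit_price": 10.0}]
web_orders = [{"order_ts": "05/01/2024 14:30", "product": "a1", "quantity": 1, "price": 10.0}]

def from_pos(r):
    """Transform a point-of-sale record into the warehouse format."""
    return {"date": r["date"], "sku": r["sku"], "qty": r["qty"],
            "revenue": r["qty"] * r["unit_price"], "channel": "store"}

def from_web(r):
    """Transform a web order; assumes timestamps are DD/MM/YYYY HH:MM."""
    day, month, year = r["order_ts"].split()[0].split("/")
    return {"date": f"{year}-{month}-{day}", "sku": r["product"].upper(),
            "qty": r["quantity"], "revenue": r["quantity"] * r["price"], "channel": "web"}

# Extract from each source, transform to one format, load into the warehouse list.
warehouse = [from_pos(r) for r in store_pos] + [from_web(r) for r in web_orders]

def revenue_by(field):
    """Total revenue grouped by one warehouse field."""
    totals = {}
    for row in warehouse:
        totals[row[field]] = totals.get(row[field], 0) + row["revenue"]
    return totals

print(revenue_by("sku"))      # {'A1': 30.0}
print(revenue_by("channel"))  # {'store': 20.0, 'web': 10.0}
```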
Q10: Discuss the role of data governance in organizations and how it impacts data quality.
Answer:
Data governance refers to the overall management of the availability, usability, integrity, and security of
the data used in an organization. It involves defining policies, processes, and responsibilities for data
management. Impact on data quality:
1. Data Consistency: Governance ensures that data is consistent across different departments and
systems.
2. Accuracy: Data governance sets standards to ensure the accuracy of the data used for decision-
making.
3. Compliance: Ensures that data management complies with regulatory requirements, such as
GDPR or HIPAA.
4. Security: Establishes protocols for who can access data and how it is protected from
unauthorized access, reducing the risk of data breaches.