Week 5 - Database System and Big Data Analytics

This document covers the fundamentals of database systems and big data analytics, detailing the hierarchy of data, advantages of database management, and key considerations for database design. It explains various data models, the roles of database administrators, and the challenges and opportunities presented by big data. Additionally, it discusses data management practices, including data warehouses, data marts, and NoSQL databases, as well as the Hadoop framework for processing large datasets.

Uploaded by ishmalali13
INF101 – Business Information Systems

Week 5 - Database System and Big Data Analytics (Chapter 5)

Objectives
After completing this chapter, you will be able to:
Identify and briefly describe the members of the hierarchy of data
Identify the advantages of the database approach to data
management
Identify the key factors that must be considered when designing a
database
Identify the various types of data models and explain how they
are useful in planning a database

Objectives
After completing this chapter, you will be able to (cont’d):
Describe the relational database model
Define the role of the database schema, data definition language,
and data manipulation language
Discuss the role of a database administrator and data
administrator
Identify the common functions performed by all database
management systems
Define the term big data
Explain why big data represents a challenge and an opportunity

Objectives
After completing this chapter, you will be able to (cont’d):
Define the term data management
Define the terms data warehouse, data mart, and data lake and
explain how they are different
Outline the extract, transform, load process
Explain how a NoSQL database is different from an SQL database
Discuss the Hadoop computing environment as a whole
Define the term in-memory database and explain its advantages in
processing big data

Introduction

• Database: an organized collection of data
• A database management system (DBMS) is a group of programs that:
• Manipulate the database
• Provide an interface between the database and its users and other application
programs

Data Fundamentals

• Without data and the ability to process it, an organization could not successfully complete most business activities
• Data consists of raw facts
• Data must be organized in a meaningful way to transform it into useful
information

Hierarchy of Data

• A bit (binary digit) represents a circuit that is either on or off
• A byte is made up of eight bits
• In a simple character encoding such as ASCII, each byte represents one character
• Field: a name, number, or combination of characters that describes an aspect of
a business object or activity
• Record: a collection of related data fields
• File: a collection of related records

Hierarchy of Data

• Database: a collection of integrated and related files
• Hierarchy of data: bits, characters, fields, records, files, and databases
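The hierarchy can be traced from the bottom up in a few lines of Python. This is an illustrative sketch only: it assumes ASCII encoding (so one character fits in one byte), and the customer and order fields, sample values, and the use of Python lists and dicts as stand-ins for files and a database are all hypothetical.

```python
# Walk up the hierarchy of data: bit -> byte/character -> field -> record -> file -> database.

# A character, encoded in ASCII, occupies one byte (eight bits).
char = "A"
byte = char.encode("ascii")      # b'A' -> exactly one byte
bits = format(byte[0], "08b")    # each of the eight bits is on (1) or off (0)

# A field is a named value describing one aspect of a business object.
# A record groups related fields; here, one hypothetical customer.
record = {"customer_id": 1001, "name": "Ada Lopez", "city": "Austin"}

# A file is a collection of related records.
customer_file = [
    record,
    {"customer_id": 1002, "name": "Ben Okafor", "city": "Denver"},
]

# A database integrates related files (here, two hypothetical files).
order_file = [{"order_id": 1, "customer_id": 1001, "amount": 250.0}]
database = {"customers": customer_file, "orders": order_file}

print(char, bits, len(byte))                        # A 01000001 1
print(len(database["customers"]), "customer records")  # 2 customer records
```

The `customer_id` value appearing in both files is what lets the two files be related, which is the point of integrating them into one database.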

Data Entities, Attributes, and Keys

• Entity: a person, place, or thing for which data is collected, stored, and
maintained
• Attribute: a characteristic of an entity
• Data item: the specific value of an attribute
• Primary key: a field or set of fields that uniquely identifies the record
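A DBMS typically enforces a primary key by refusing any record whose key value is already in use. The following is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `employee` entity, its attributes, and the sample values are hypothetical.

```python
import sqlite3

# Entity: Employee. Each attribute becomes a column; employee_id is the primary key.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,   -- uniquely identifies each record
        last_name   TEXT NOT NULL,
        department  TEXT
    )
""")
# The values 1, 'Nguyen', 'Sales' are data items: specific values of the attributes.
conn.execute("INSERT INTO employee VALUES (1, 'Nguyen', 'Sales')")

try:
    # A second record with the same key value would make the key non-unique, so it is rejected.
    conn.execute("INSERT INTO employee VALUES (1, 'Patel', 'Finance')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```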

The Database Approach

• Traditional approach to data management
• Each distinct operational system used data files dedicated to that system
• Database approach to data management
• Information systems share a pool of related data
• Offers the ability to share data and information resources
• A database management system (DBMS) is required

Data Modeling and Database Characteristics

• Considerations when building a database
• Content: what data should be collected, and at what cost?
• Access: what data should be provided to which users, and when?
• Logical structure: how should data be arranged so that it makes sense?
• Physical organization: where should data be physically located?
• Archiving: how long should data be stored?
• Security: how can data be protected?

Data Modeling

• Data model: a diagram of data entities and their relationships
• Enterprise data modeling: data modeling done at the level of the entire
enterprise
• Entity-relationship (ER) diagrams: data models that use basic graphical symbols
to show the organization of and relationships between data
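An ER diagram is drawn rather than coded, but when the model is turned into relational tables, each relationship usually becomes a foreign key. The sketch below assumes a hypothetical one-to-many relationship in which one customer places many orders, and it uses Python's `sqlite3` module. SQLite leaves foreign-key enforcement off until `PRAGMA foreign_keys = ON` is issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not check foreign keys by default

# Two entities from a hypothetical ER diagram: Customer (1) --< (many) Order.
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- the relationship
        total       REAL
    )
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme Ltd')")
conn.execute("INSERT INTO customer_order VALUES (10, 1, 99.5)")  # valid: customer 1 exists

try:
    conn.execute("INSERT INTO customer_order VALUES (11, 42, 5.0)")  # no customer 42
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```

The table is named `customer_order` because `ORDER` is a reserved word in SQL.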

Relational Database Model

• Relational model: a simple but highly useful way to organize data into
collections of two-dimensional tables called relations
• Each row in the table represents an entity
• Each column represents an attribute of that entity
• Domain: range of allowable values for a data attribute
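A relation can be written as an SQL table in which each row is one entity and each column is one attribute. SQLite does not support the `CREATE DOMAIN` statement that standard SQL and some other DBMSs provide. In SQLite, an attribute's domain can instead be approximated with a `CHECK` constraint. The `student` table, the 0.0 to 4.0 GPA scale, and the allowed status values in this sketch are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Relation 'student': each row is one student, each column one attribute.
# The CHECK constraints approximate each attribute's domain of allowable values.
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        gpa        REAL CHECK (gpa BETWEEN 0.0 AND 4.0),
        status     TEXT CHECK (status IN ('full-time', 'part-time'))
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'Sara', 3.6, 'full-time')")

try:
    conn.execute("INSERT INTO student VALUES (2, 'Omar', 5.2, 'full-time')")  # GPA outside domain
    out_of_domain_rejected = False
except sqlite3.IntegrityError:
    out_of_domain_rejected = True

print(out_of_domain_rejected)  # True
```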

Manipulating Data
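Data in a relational database is commonly manipulated with three basic operations:

* selecting rows that meet a condition,
* projecting (keeping) only certain columns, and
* joining tables on an attribute they share.

The following is a minimal sketch of all three in SQL, run through Python's `sqlite3` module. The `project` and `department` tables and their data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (project_id INTEGER PRIMARY KEY, title TEXT, dept_id INTEGER);
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    INSERT INTO department VALUES (1, 'Marketing'), (2, 'IT');
    INSERT INTO project VALUES (100, 'Rebrand', 1), (101, 'ERP upgrade', 2), (102, 'Web store', 2);
""")

# Select: keep only the rows that satisfy a condition.
it_rows = conn.execute("SELECT * FROM project WHERE dept_id = 2").fetchall()

# Project: keep only some columns.
titles = conn.execute("SELECT title FROM project ORDER BY project_id").fetchall()

# Join: combine rows from two tables that share a common attribute (dept_id).
joined = conn.execute("""
    SELECT p.title, d.dept_name
    FROM project AS p JOIN department AS d ON p.dept_id = d.dept_id
    ORDER BY p.project_id
""").fetchall()

print(len(it_rows))  # 2
print(joined[0])     # ('Rebrand', 'Marketing')
```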

Data Cleansing

• Also called data cleaning or data scrubbing
• The process of detecting and then correcting or deleting incomplete, incorrect, inaccurate, or irrelevant records that reside in a database
• The cost of performing data cleansing can be quite high
• Different from data validation, which involves identifying “bad data” and rejecting it at the time of data entry
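The difference between validation and cleansing can be sketched in Python:

* Validation checks a record as it is entered and rejects bad data on the spot.
* Cleansing works through records that are already stored, corrects the ones it can fix, and deletes the rest.

The email rule, the record layout, and the function names `validate_on_entry` and `cleanse` are hypothetical. A real cleansing job would apply many rules, not just one.

```python
import re

# Hypothetical rule: a usable email has a local part, one "@", and a dotted domain.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_on_entry(record):
    """Data validation: reject bad data at the moment it is entered."""
    if not EMAIL.match(record.get("email", "")):
        raise ValueError(f"invalid email: {record.get('email')!r}")
    return record

def cleanse(records):
    """Data cleansing: correct or delete bad records that are already stored."""
    cleaned = []
    for r in records:
        email = r.get("email", "").strip().lower()  # correct fixable problems
        if EMAIL.match(email):
            cleaned.append({**r, "email": email})
        # records that still fail the rule are deleted (not copied over)
    return cleaned

stored = [
    {"id": 1, "email": "  Ana@Example.com "},  # fixable: stray spaces and mixed case
    {"id": 2, "email": "not-an-email"},        # not fixable: dropped
    {"id": 3, "email": "li@example.org"},
]
print(cleanse(stored))
```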

SQL Databases

• SQL: a special-purpose programming language for accessing and manipulating data stored in a relational database
• SQL databases conform to ACID properties:
• Atomicity, consistency, isolation, and durability
• 1986: SQL was adopted by ANSI as the standard query language for relational
databases
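Atomicity, the "A" in ACID, means that a transaction's changes are applied either all together or not at all. The sketch below uses Python's `sqlite3` module. It attempts a funds transfer that fails halfway through, and the whole transaction is rolled back. The `account` table, the account IDs, the amounts, and the `transfer` helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance REAL CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES ('A', 100.0), ('B', 50.0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between two accounts as one transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK (balance >= 0) failed; the credit to dst was undone as well

ok = transfer(conn, "A", "B", 500.0)  # A cannot go negative, so the whole transfer fails
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, balances)  # False {'A': 100.0, 'B': 50.0}
```

The credit to B runs first and succeeds on its own. Because both updates belong to one transaction, the rollback removes that credit too, and neither balance changes.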

Database Activities

• Providing a user view of the database
• Adding and modifying data
• Storing and retrieving data
• Manipulating the data and generating reports

Providing a User View

• Schema: a description of the entire database
• A schema can be part of the database or a separate schema file
• The DBMS can reference a schema to find where to access the requested data
in relation to another piece of data
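A schema describes the whole database, but a particular group of users often needs only part of it. In SQL, a view is one common way to give such a group its own user view of the data. SQLite also keeps the definition of every table and view in its `sqlite_master` catalog, which the DBMS consults much as the slide describes. The table, the columns, and the idea of a "staff directory" application in this sketch are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL
    );
    INSERT INTO employee VALUES (1, 'Kim', 'Sales', 52000), (2, 'Ravi', 'IT', 61000);

    -- A user view for a directory application: the salary column is not exposed.
    CREATE VIEW staff_directory AS SELECT emp_id, name, dept FROM employee;
""")

directory = conn.execute("SELECT * FROM staff_directory ORDER BY emp_id").fetchall()
print(directory)  # [(1, 'Kim', 'Sales'), (2, 'Ravi', 'IT')]

# The DBMS stores the schema itself; SQLite exposes it through sqlite_master.
objects = conn.execute("SELECT type, name FROM sqlite_master ORDER BY name").fetchall()
print(objects)    # [('table', 'employee'), ('view', 'staff_directory')]
```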

Creating and Modifying the Database

• Data definition language (DDL)
• A collection of instructions and commands used to define and describe data and
relationships in a specific database
• Allows the database’s creator to describe data and relationships that are to be
contained in the schema
• Data dictionary: a detailed description of all the data used in the database
• Can also include a description of data flows, information about the way records are
organized, and the data-processing requirements
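DDL statements such as `CREATE TABLE` and `ALTER TABLE` define the structure of the data rather than its contents. The DBMS records those definitions, and they can be read back to produce a very basic data dictionary.

The sketch below uses SQLite's `PRAGMA table_info`, which returns each column's position, name, declared type, NOT NULL flag, default value, and primary-key flag. The `product` table is hypothetical. A real data dictionary usually holds much more, such as descriptions, data flows, and processing requirements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of the data, not its contents.
conn.execute("""
    CREATE TABLE product (
        sku        TEXT PRIMARY KEY NOT NULL,
        name       TEXT NOT NULL,
        unit_price REAL DEFAULT 0.0
    )
""")
conn.execute("ALTER TABLE product ADD COLUMN supplier TEXT")  # DDL can also change a definition

# A minimal data dictionary built from the DBMS's own catalog.
# Each PRAGMA table_info row is: cid, name, type, notnull, dflt_value, pk
dictionary = [
    {"column": name, "type": col_type, "required": bool(notnull), "primary_key": bool(pk)}
    for _cid, name, col_type, notnull, _default, pk in conn.execute("PRAGMA table_info(product)")
]
for entry in dictionary:
    print(entry)
```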

Storing and Retrieving Data

• When an application program needs data, it requests the data through the
DBMS
• Concurrency control deals with the situation in which two or more users or
applications need to access the same record at the same time
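Without concurrency control, two programs can each read the same record, change it, and write it back, so that one overwrites the other's work (a "lost update"). Locking is one mechanism DBMSs use to prevent this.

The sketch below imitates the idea with Python threads and a lock around a shared in-memory record. It is a simplified stand-in that shows the concept only; it does not reflect how any particular DBMS implements concurrency control. The inventory record and the `add_stock` function are hypothetical.

```python
import threading

# A hypothetical inventory record shared by several "applications" (threads).
record = {"item": "widget", "quantity": 0}
record_lock = threading.Lock()

def add_stock(times):
    for _ in range(times):
        # Holding the lock makes read-modify-write a single indivisible step,
        # so no other thread can act on the old value in the middle of an update.
        with record_lock:
            current = record["quantity"]
            record["quantity"] = current + 1

threads = [threading.Thread(target=add_stock, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(record["quantity"])  # 40000
```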

Manipulating Data and Generating Reports
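Reports are often produced by summarizing stored detail records with a query. The following is a sketch of a simple sales-by-region report that uses `GROUP BY` in SQLite through Python's `sqlite3` module. The `sale` table and its figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO sale (region, amount) VALUES
        ('North', 120.0), ('South', 80.0), ('North', 200.0), ('West', 50.0);
""")

# Summarize the detail rows into one line per region, largest total first.
report = conn.execute("""
    SELECT region, COUNT(*) AS num_sales, SUM(amount) AS total
    FROM sale
    GROUP BY region
    ORDER BY total DESC
""").fetchall()

print(f"{'Region':<8}{'Sales':>6}{'Total':>10}")
for region, num_sales, total in report:
    print(f"{region:<8}{num_sales:>6}{total:>10.2f}")
```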

Database Administration

• Database administrators (DBAs): skilled and trained IS professionals who:
• Work with users to define their data needs
• Apply database programming languages to craft a set of databases to meet those needs
• Test and evaluate databases
• Implement changes to improve their databases’ performance
• Ensure that data is secure from unauthorized access

Database Administration

• Data administrator: a nontechnical position responsible for defining and implementing consistent principles for a variety of data issues
• Including setting data standards and data definitions that apply across all the
databases in an organization
• The data administrator can be a high-level position reporting to top-level
managers

Popular Database Management Systems

• Database as a Service (DaaS)
• The database is stored on a service provider’s servers
• The database is accessed by the client over a network, typically the Internet
• Database administration is handled by the service provider
• Example of DaaS: Amazon Relational Database Service (Amazon RDS)

Using Databases with Other Software

• DBMSs can act as front-end or back-end applications
• Front-end applications interact directly with people
• Back-end applications interact with other programs or applications
• Example:
• The Library of Congress (LOC) provides a back-end application that allows Web access
to its databases, which include references to books and digital media in the LOC
collection

Big Data

• Extremely large and complex data collections
• Traditional data management software, hardware, and analysis processes are
incapable of dealing with them
• Three characteristics of big data
• Volume
• Velocity
• Variety

Sources of Big Data

Big Data Uses

• Examples:
• Retail organizations monitor social networks to engage brand advocates and identify brand adversaries
• Advertising and marketing agencies track comments on social media
• Hospitals analyze medical data and patient records
• Consumer product companies monitor social networks to gain insight into consumer
behavior
• Financial service organizations use data to identify customers who are likely to be
attracted to increasingly targeted and sophisticated offers

Challenges of Big Data

• How to choose what subset of the data to store
• Where and how to store the data
• How to find the nuggets of data that are relevant to the decision making at
hand
• How to derive value from the relevant data
• How to identify which data needs to be protected from unauthorized access

Data Management

• Data management
• An integrated set of functions that defines the processes by which data is obtained,
certified fit for use, stored, secured, and processed in such a way as to ensure that the
accessibility, reliability, and timeliness of the data meet the needs of the data users
within an organization
• Data governance
• Defines the roles, responsibilities, and processes for ensuring that data can be trusted
and used by an entire organization

Data Management

• Data lifecycle management (DLM)
• A policy-based approach to managing the flow of an enterprise’s data

Data Warehouses, Data Marts, and Data Lakes

• Data warehouse: a large database that collects business information from many
sources in the enterprise in support of management decision making
• ETL process
• Extract
• Transform
• Load
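The three ETL steps can be sketched end to end in Python. Every specific below is hypothetical:

* The CSV text stands in for an operational source system.
* The transformations are examples only: trimming names, normalizing country codes, converting amounts to numbers, and dropping rows that cannot be used.
* An in-memory SQLite table stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Extract: read raw rows from a source system (here, a CSV export held in memory).
SOURCE_CSV = """order_id,customer,country,amount
1, Ana Ruiz ,us,120.50
2,Bo Chen,US,80
3,Cy Diaz,ca,
"""
raw_rows = list(csv.DictReader(io.StringIO(SOURCE_CSV)))

# Transform: clean and standardize so data from different sources fits one format.
def transform(row):
    if not row["amount"].strip():
        return None                      # unusable row: missing amount
    return (
        int(row["order_id"]),
        row["customer"].strip(),
        row["country"].strip().upper(),  # 'us' and 'US' become the same code
        float(row["amount"]),
    )

clean_rows = [t for t in (transform(r) for r in raw_rows) if t is not None]

# Load: write the transformed rows into the warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_order (order_id INTEGER PRIMARY KEY, customer TEXT, country TEXT, amount REAL)"
)
warehouse.executemany("INSERT INTO fact_order VALUES (?, ?, ?, ?)", clean_rows)
warehouse.commit()

print(warehouse.execute("SELECT COUNT(*), SUM(amount) FROM fact_order").fetchone())
```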

Data Warehouses, Data Marts, and Data Lakes

• Data mart: a subset of a data warehouse that is used by small- and medium-
sized businesses and departments within large companies to support decision
making
• A specific area in the data mart might contain more detailed data than the data warehouse
• Data lake: takes a “store everything” approach to big data, saving all the data in
its raw and unaltered form
• Also called an enterprise data hub
• Raw data is available when users decide just how they want to use the data
• Only when the data is accessed for a specific analysis is it extracted from the data lake
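A data lake's "store everything, structure later" approach is often called schema-on-read. Raw records are saved exactly as they arrive, and structure is imposed only when a particular analysis reads them.

The sketch below assumes that the records are JSON events kept as raw text lines in a Python list, which stands in for lake storage. The event fields and the two analysis functions are hypothetical.

```python
import json

# Store everything: raw, unaltered records are appended as they arrive,
# even though they do not all share the same fields.
data_lake = [
    '{"type": "click", "user": "u1", "page": "/home"}',
    '{"type": "purchase", "user": "u2", "amount": 35.0}',
    '{"type": "click", "user": "u2", "page": "/cart", "device": "mobile"}',
]

# Only when a specific analysis runs is the data extracted and given structure.
def clicks_by_page(lake):
    counts = {}
    for line in lake:
        event = json.loads(line)  # parsed at read time, not at write time
        if event.get("type") == "click":
            counts[event["page"]] = counts.get(event["page"], 0) + 1
    return counts

def revenue(lake):
    return sum(json.loads(line).get("amount", 0.0) for line in lake)

print(clicks_by_page(data_lake))  # {'/home': 1, '/cart': 1}
print(revenue(data_lake))         # 35.0
```

Each analysis decides for itself which fields it cares about. In a warehouse, by contrast, the structure is fixed during ETL, before the data is loaded.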

NoSQL Databases

• NoSQL database
• Provides a means to store and retrieve data that is modeled using some means other
than the simple two-dimensional tabular relations used in relational databases
• Advantages:
• Ability to spread data over multiple servers so that each server contains only a subset
of the total data
• No need for a predefined schema
• Data structures are more flexible and can provide improved access speed and
redundancy
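Two of the advantages listed above can be sketched in plain Python: spreading data across servers, and not needing a predefined schema. This is a toy model, not the API of any real NoSQL product:

* Each "server" is a Python dict.
* Documents are schemaless dicts.
* A document's server is chosen by hashing its key. This is simple hash-based sharding, one of several partitioning schemes used in practice.

The class name `ShardedDocumentStore` and the sample documents are hypothetical.

```python
import hashlib

class ShardedDocumentStore:
    """Toy document store spread over several in-memory 'servers'."""

    def __init__(self, num_servers):
        self.servers = [{} for _ in range(num_servers)]

    def _server_for(self, key):
        # A stable hash of the key decides which server holds the document,
        # so each server stores only a subset of the total data.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.servers[int(digest, 16) % len(self.servers)]

    def put(self, key, document):
        self._server_for(key)[key] = document  # no schema check: any fields are allowed

    def get(self, key):
        return self._server_for(key).get(key)

store = ShardedDocumentStore(num_servers=3)
store.put("user:1", {"name": "Lena", "email": "lena@example.com"})
store.put("user:2", {"name": "Tom", "tags": ["vip"], "last_login": "2024-01-05"})  # different fields
store.put("user:3", {"name": "Ivy"})

print(store.get("user:2")["tags"])                # ['vip']
print([len(server) for server in store.servers])  # how the three documents were spread
```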

Hadoop

• Hadoop
• An open-source software framework that includes several software modules that
provide a means for storing and processing extremely large data sets
• Has two primary components:
• A data processing component (MapReduce)
• A distributed file system (Hadoop Distributed File System, HDFS)
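MapReduce works in two main steps:

1. The map step turns each input record into key-value pairs.
2. The reduce step combines all the values that share a key.

Between the two, the framework groups (shuffles) the pairs by key. On a real cluster, many map and reduce tasks run in parallel, typically reading their input from HDFS.

The classic word-count example below is written in plain, single-process Python to show only the data flow. It does not use Hadoop's actual Java API, and it does not run on a cluster. The function names and the two input lines are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    # Emit (word, 1) for each word in one input record.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group all emitted values by key, as the framework does between the two phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Combine every value for one key into a single result.
    return key, sum(values)

documents = ["Big data needs big tools", "Hadoop processes big data"]

mapped = [pair for line in documents for pair in map_phase(line)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

print(word_counts["big"], word_counts["data"], word_counts["hadoop"])  # 3 2 1
```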

Summary

• The database approach to data management has become broadly accepted
• Data modeling is a key aspect of organizing data and information
• A well-designed and well-managed database is an extremely valuable tool in supporting decision making
• We have entered an era in which organizations are grappling with tremendous growth in the amount of data available and struggling with how to manage and make use of it
• A number of available tools and technologies allow organizations to take
advantage of the opportunities offered by big data
