Data For Business Analytics Unit 2

The document discusses data management concepts including data collection, data quality, data security, big data characteristics, structured and unstructured data, business intelligence, and techniques for dealing with missing data such as imputation and removing data. Common data sources both internal and external to an organization are also covered.

Uploaded by

iemhardik

Data For Business Analytics

Unit 2
• Data: Numerical or textual facts and figures that
have been collected through some type of
measurement process.

• Information: The result of analyzing data, that is,
extracting meaning from data to support
evaluation and decision making.
Data source – Internal/External
Internal
Internal data can easily be found within the organization, such as
market records, sales records, transactions, customer data, accounting
resources, etc. Obtaining data from internal sources costs less and
takes less time.

• Financial Statements
• Sales Reports
• Retailer/Distributor/Dealer Feedback
• Customer Personal Information (e.g., name, address, age, contact info)
External
• External: Data that cannot be found within the organization
and is obtained through external third-party resources is
external source data. Examples: government publications, news
publications, Registrar General of India, Planning Commission.
• Business Journals
• Government Records (e.g., census, tax records, Social Security info)
• Trade/Business Magazines
• The internet
• Sensor data: With the advancement of IoT devices, the sensors of
these devices collect data which can be used for sensor data
analytics to track the performance and usage of products.
• Satellite data: Satellites collect terabytes of images and data
on a daily basis through surveillance cameras, which can
be used to extract useful information.
• Web traffic: Thanks to fast and cheap internet access, data in many
formats that users upload to different platforms can be
collected, with their permission, for data analysis.
Search engines also provide data on the keywords and
queries that are searched most often.
Types of Data
• Quantitative/Qualitative Data
• Discrete/Continuous Data
• Nominal
• Ordinal
• Interval
• Ratio
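The four measurement levels listed above differ in which summaries are meaningful. The following is a minimal sketch, not part of the original notes, using only the Python standard library; the variable names and all sample values are invented for illustration.

```python
from statistics import mean, median, mode

# Hypothetical sample values, invented purely for illustration.
cities = ["Pune", "Delhi", "Pune", "Mumbai", "Pune"]   # nominal: labels only
satisfaction = [1, 3, 2, 3, 2]                         # ordinal: 1=low, 2=medium, 3=high
temps_c = [30.0, 35.0, 25.0, 40.0, 20.0]               # interval: differences are meaningful
spend = [100.0, 200.0, 50.0, 400.0, 250.0]             # ratio: true zero, ratios are meaningful

# Nominal data supports counting, so the mode is a sensible summary.
nominal_summary = mode(cities)

# Ordinal data has an order but no fixed spacing, so use the median.
ordinal_summary = median(satisfaction)

# Interval data supports addition and subtraction, so the mean is meaningful.
interval_summary = mean(temps_c)

# Ratio data additionally supports statements such as "8 times as much".
ratio_summary = max(spend) / min(spend)

print(nominal_summary, ordinal_summary, interval_summary, ratio_summary)
```

Each level allows every summary of the levels before it, which is why the mean is shown only from the interval level onward.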
Data Collection
• Data collection, a step in the process of big data
analysis, is the process of acquiring,
collecting, extracting, and storing the
voluminous amount of data, which may be in
structured or unstructured form such as text,
video, audio, XML files, records, or
image files, for use in later stages of data
analysis.
• Types of Data Collection: Primary and
Secondary.
Data Management
• Data management refers to the professional
practice of constructing and maintaining a
framework for ingesting, storing, mining, and
archiving the data integral to a modern
business.
• Examples: competitive exams; organization
data from different departments; purchase history
data used to segment customers for
the future.
• ERP
Benefits of Data Management System
• Data management provides businesses with a way of measuring
the amount of data in play.
• Data management gives managers a big-picture look at business
processes, which helps with both perspective and planning.
• Once data is under management, it can be mined for
informational gold: business intelligence. This helps business users
across the organization in a variety of ways, including the
following:
• Smart advertising that targets customers according to their
interests and interaction.
• Holistic security that safeguards critical information
• Alignment with relevant compliance standards, saving time and
money
Data Management Challenges
• The amount of data can be (at least temporarily) overwhelming.
• The development team may work from one data set, the sales team
from another, operations from another, finance from yet another, and so on.
• The journey from unstructured data to structured data can be steep.
• By making team members aware of the benefits of data management
(and the potential pitfalls of ignoring it) and fostering the skills of
using data correctly, managers engage team members as essential
pieces of the information process.
Data Management
• Master Data Management: Master data management (MDM) is the process of
ensuring the organization is always working with — and making business decisions
based on — a single version of current, reliable information.
• Data quality management: Quality management is responsible for combing
through collected data for underlying problems like duplicate records, inconsistent
versions, and more. Data quality managers support the defined data management
system.
• Data security: One of the most important aspects of data management today is
security. Though emergent practices like DevSecOps incorporate security
considerations at every level of application development and data exchange,
security specialists are still tasked with encryption management, preventing
unauthorized access, guarding against accidental movement or deletion, and other
frontline concerns.
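The data quality checks mentioned above (duplicate records, inconsistent versions) can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the field names `email` and `city`, and the rule "same normalized email means same customer" are all hypothetical, not something the notes prescribe.

```python
from collections import defaultdict

# Hypothetical customer records; field names and values are invented.
records = [
    {"id": 1, "email": "asha@example.com", "city": "Pune"},
    {"id": 2, "email": "Asha@Example.com ", "city": "Pune"},    # duplicate of id 1
    {"id": 3, "email": "ravi@example.com", "city": "Delhi"},
    {"id": 4, "email": "ravi@example.com", "city": "Mumbai"},   # conflicts with id 3
]

def normalize(email: str) -> str:
    """Treat emails that differ only in case or surrounding spaces as equal."""
    return email.strip().lower()

def find_problems(rows):
    """Return (duplicates, conflicts) as lists of normalized emails."""
    groups = defaultdict(list)
    for row in rows:
        groups[normalize(row["email"])].append(row)

    duplicates, conflicts = [], []
    for email, group in groups.items():
        if len(group) < 2:
            continue
        cities = {row["city"] for row in group}
        # Same key and same attributes: a plain duplicate.
        # Same key but different attributes: inconsistent versions.
        (duplicates if len(cities) == 1 else conflicts).append(email)
    return duplicates, conflicts

duplicates, conflicts = find_problems(records)
print("duplicates:", duplicates)
print("conflicts:", conflicts)
```

In a master data management setting, the conflicting records are the ones that need a decision about which version is current and reliable.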
Data Quality /Security
• Data governance sets the law for an enterprise’s state of
information. A data governance framework is like a
constitution that clearly outlines policies for the intake, flow,
and protection of institutional information.
• Data governors oversee their network of stewards, quality
management professionals, security teams, and other people
and data management processes in pursuit of a governance
policy that serves a master data management approach.
• Data stewardship: A data steward does not develop
information management policies but rather deploys and
enforces them across the enterprise.
Big Data
• Big data consists of huge amounts of information
that cannot be stored or processed using traditional
data storage mechanisms or processing techniques.

• Big data refers to massive amounts of business data
from a wide variety of sources, much of which is
available in real time, and much of which is uncertain
or unpredictable. IBM calls these characteristics
volume, variety, velocity, and veracity.
Characteristics of Big Data Management
• Volume: This trait refers to the immense amounts of
information generated every second via social media, cell
phones, cars, transactions, connected sensors, images, video,
and text. In petabytes, terabytes, or even zettabytes, these
volumes can only be managed by big data technologies.
• Variety: To the existing landscape of transactional and
demographic data such as phone numbers and addresses,
information in the form of photographs, audio streams, video,
and a host of other formats now contributes to a multiplicity of
data types — about 80% of which are completely unstructured.
• Velocity: Refers to the speed with which big data can be
processed and analyzed to extract the insights and patterns it
contains. These days, that speed is often real-time.
• Veracity: This is the degree of reliability and truth that
big data has to offer in terms of its relevance,
cleanliness, and accuracy.
• Value: Since the primary aim of big data gathering and
analysis is to discover insights that can inform
decision-making and other processes, this
characteristic explores the benefit or otherwise that
information and analytics can ultimately produce.
• Structured data (as its name suggests) has a well-defined
structure and follows a consistent order. This kind of information is
designed so that it can be easily accessed and used by a person or
computer. Structured data is usually stored in the well-defined
rows and columns of a table (such as a spreadsheet) and
databases — particularly relational database management
systems, or RDBMS.

• Semi-structured data exhibits a few of the same properties as
structured data, but for the most part, this kind of information has
no definite structure and cannot conform to the formal rules of
data models such as an RDBMS.

• Unstructured data possesses no consistent structure across its
various forms and does not obey conventional data models' formal
structural rules. In very few instances, it may have information
related to date and time.
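The three categories above can be contrasted with a small sketch. The choice of CSV for structured data, JSON for semi-structured data, and a free-text sentence for unstructured data is an assumption made for illustration, and every value is invented.

```python
import csv
import io
import json

# Structured: every row has the same named columns, as in a table.
structured_text = "order_id,customer,amount\n1,Asha,250\n2,Ravi,100\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))
total = sum(int(row["amount"]) for row in rows)

# Semi-structured: records carry their own field names,
# but different records may have different fields.
semi_text = '[{"order_id": 1, "tags": ["gift"]}, {"order_id": 2, "note": "call first"}]'
docs = json.loads(semi_text)
field_sets = [sorted(doc.keys()) for doc in docs]

# Unstructured: free text with no fields at all; only generic
# operations such as splitting into words apply directly.
unstructured_text = "Ravi called on Monday and said the parcel arrived late."
words = unstructured_text.split()

print(total)        # column arithmetic works because the schema is fixed
print(field_sets)   # the two records do not share the same fields
print(len(words))
```

The structured rows can be summed by column name immediately, whereas the other two forms need extra interpretation before any analysis.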
How is Big Data Collected
1. Asking for it: The majority of firms prefer asking
users directly to share their personal information,
such as username and email.
2. Cookies: They provide basic statistics about how a
website is used.
3. Email tracking: An email tracker detects when
an email was opened. Both Google and Yahoo use
this method to learn their users' behavioural
patterns and provide personalized advertising.
Business Intelligence
• It uses business process data to create charts
and tables that summarize business
performance.
• Its main purpose is to analyze business data to
create summarized information periodically.
• Techniques involved:
summarization, visualization, charting, etc.
• Software used: Business Objects, SAP/BI,
Pentaho.
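The summarization and charting techniques listed above can be illustrated without any BI product. The sketch below is a hedged stand-in, not how Business Objects, SAP/BI, or Pentaho work internally; the sales figures, the function names, and the text-based bar chart are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical sales transactions: (region, month, amount).
sales = [
    ("North", "2024-01", 1200.0),
    ("North", "2024-02", 800.0),
    ("South", "2024-01", 500.0),
    ("South", "2024-02", 1000.0),
]

def summarize_by_region(rows):
    """Summarization: total sales amount per region."""
    totals = defaultdict(float)
    for region, _month, amount in rows:
        totals[region] += amount
    return dict(totals)

def text_bar_chart(totals, unit=500.0):
    """Charting: one '#' per `unit` of sales, one line per region."""
    lines = []
    for region, total in sorted(totals.items()):
        bar = "#" * int(total // unit)
        lines.append(f"{region:<6}{bar} {total:.0f}")
    return lines

region_totals = summarize_by_region(sales)
chart = text_bar_chart(region_totals)
print("\n".join(chart))
```

Running the same summary on a schedule (for example monthly) corresponds to the periodic reporting described in the bullets.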
Dealing with missing data or incomplete data
• Missing data is data that is not captured for a
variable for the observation in question. Missing
data reduces the statistical power of the analysis
and can distort the validity of the results.
Technique for missing data
• The imputation method develops reasonable guesses
for missing data. It’s most useful when the percentage
of missing data is low. If the portion of missing data is
too high, the results lack natural variation that could
result in an effective model.
• The other option is to remove data. When dealing with
data that is missing at random, related data can be
deleted to reduce bias. Removing data may not be the
best option if there are not enough observations to
result in a reliable analysis. In some situations,
observation of specific events or factors may be
required.
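Both options above can be tried on a toy variable. The sketch below uses mean imputation as one simple form of "reasonable guess" (the notes do not name a specific imputation method, so this choice is an assumption) and shows why a high share of imputed values reduces natural variation. All values are invented.

```python
from statistics import mean, variance

# Hypothetical variable with half of its values missing (None).
ages = [20.0, None, 30.0, None, 40.0, None]

# Option 1: remove the missing observations.
observed = [a for a in ages if a is not None]

# Option 2: impute each missing value with the mean of the observed ones.
fill = mean(observed)
imputed = [fill if a is None else a for a in ages]

# Every imputed value sits exactly at the mean, so the sample
# variance of the imputed series is smaller than that of the observed data.
var_observed = variance(observed)
var_imputed = variance(imputed)

print(f"fill value: {fill}")
print(f"variance after removal:    {var_observed}")
print(f"variance after imputation: {var_imputed}")
```

Removal keeps the variation but leaves only three observations, which is the trade-off the second bullet describes.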
Reason for Missing Data
• Missing at Random (MAR): The data is not missing
across all observations but only within
sub-samples of the data. The missing data can be
predicted based on the complete observed data.
• Missing Completely at Random (MCAR): In the MCAR
situation, the data is missing across all
observations regardless of the expected value or
other variables. Data scientists can compare two
sets of data, one with missing observations and
one without. Using a t-test, if there is no
difference between the two data sets, the data is
characterized as MCAR.
• Missing Not at Random (MNAR): The MNAR category
applies when the missing data has a structure to it.
In other words, there appear to be reasons the data
is missing. In a survey, perhaps a specific group of
people, say women ages 45 to 55, did not answer a
question. Unlike MAR, the data cannot be determined
by the observed data, because the missing
information is unknown. Data scientists must
model the missing data to develop an unbiased
estimate. Simply removing observations with missing
data could result in a model with bias.
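The t-test comparison described for MCAR can be sketched as follows. This is a minimal illustration under assumptions: the rows, the variables `age` and `income`, and the use of Welch's form of the t statistic are choices made here, not taken from the notes. It computes only the statistic; a real check would also derive a p-value at a chosen significance level, for example with `scipy.stats.ttest_ind`.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical survey rows: (age, income); None marks a missing income.
rows = [
    (25, 30000), (32, None), (41, 52000), (29, None),
    (38, 47000), (45, None), (27, 31000), (36, 45000),
    (33, None), (40, 50000),
]

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Split a fully observed variable (age) by whether income is missing.
ages_missing = [age for age, income in rows if income is None]
ages_observed = [age for age, income in rows if income is not None]

t_stat = welch_t(ages_missing, ages_observed)
print(f"t statistic: {t_stat:.3f}")
```

A statistic close to zero means the two groups look alike on age, which is consistent with MCAR; a large absolute value would suggest the missingness depends on age.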
• Listwise deletion
• In this method, all data for an observation that has one or more missing
values are deleted. The analysis is run only on observations that have a
complete set of data. If the data set is small, it may be the most efficient
method to eliminate those cases from the analysis. However, in most
cases, the data are not missing completely at random (MCAR). Deleting
the instances with missing observations can result in biased parameters
and estimates and reduce the statistical power of the analysis.
• Pairwise deletion
• Pairwise deletion also assumes data are missing completely at random
(MCAR), but every case is used in the analysis wherever it has data for the
variables involved, even if it is missing values for other variables.
Pairwise deletion allows data scientists to use more of the
data. However, the resulting statistics may vary because they are based on
different data sets. The results may be impossible to duplicate with a
complete set of data.
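The difference between the two deletion methods shows up in how many observations each analysis can use. The sketch below is illustrative only; the records, variable names, and helper functions are hypothetical.

```python
from itertools import combinations

# Hypothetical observations; None marks a missing value.
data = [
    {"age": 25, "income": 30000, "score": 7},
    {"age": 32, "income": None,  "score": 6},
    {"age": 41, "income": 52000, "score": None},
    {"age": 29, "income": 41000, "score": 8},
]
variables = ["age", "income", "score"]

def listwise(rows):
    """Keep only observations with no missing value in any variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]

def pairwise(rows, x, y):
    """Keep observations that have values for both x and y."""
    return [r for r in rows if r[x] is not None and r[y] is not None]

complete = listwise(data)
pair_counts = {
    (x, y): len(pairwise(data, x, y)) for x, y in combinations(variables, 2)
}

print("listwise n:", len(complete))
for pair, n in pair_counts.items():
    print(pair, "n:", n)
```

Listwise deletion leaves one fixed subset for every analysis, while pairwise deletion uses a different subset per pair of variables, which is why its statistics can be hard to reproduce on a single complete data set.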
