
Big Data in Management Unit - I: Session 1-5

This document provides an overview of big data in management. It discusses key topics such as the origins of big data analytics, different data sources, skill sets needed, integrating big data into corporate culture, and best practices for data analysis. The document also covers data storage, processing power, security and compliance issues, and how big data can drive both short-term and long-term changes within an organization. Examples are given of how big data is used across various industries like healthcare, government, transportation and media.

Big Data in Management

UNIT - I
Session 1-5
Content
• 1. Origins of Big Data Analytics
• 2. Different Types of Data Sources
• 3. Skill Sets Needed
• 4. How to integrate Big Data into a corporate culture
• 5. Public and Private sources of data
• 6. Storage, Processing Power, Platforms
• 7. Security, Compliance and Auditing
• 8. Short - term and Long-term Changes
• 9. Best Practices for data analysis
• 10. Data Pipeline, Value creation
Big Data – Introduction (1)
• Data
– Can be anything from “something recorded” to “everything under the sun”
– Recording and preserving that data has always been the challenge, and technology has limited the ability to capture and preserve data
– Compliance Regulations, Backup Strategy
Big Data – Introduction (2)
• Big data is often described as extremely large data sets that have
grown beyond the ability to manage and analyse them with
traditional data processing tools.

• The primary difficulties are the acquisition, storage, searching, sharing, analytics and visualization of data.
Big Data – Introduction (3)
• The Arrival of Analytics
• Researchers started to incorporate related data sets,
unstructured data, archival data, and real-time data into the
process, which in turn gave birth to what we now call Big Data.

• This data comes from everywhere:
– Sensors used to gather climate information
– Posts to social media sites
– Digital pictures and videos posted online
– Transaction records of online purchases
– Cell phone GPS signals
Uses of Big Data in different areas
• National Oceanic and Atmospheric Administration (NOAA) – uses Big Data approaches to aid in climate, ecosystem, and weather research
• NASA – uses Big Data for aeronautical research
• Pharmaceutical companies – use BD for drug testing
• Energy companies – use BD for geophysical analysis
• Media – uses BD for text analysis and web mining
• Walt Disney – uses BD to analyse customer behaviour across all its stores and theme parks
BD and relevant Umbrella Technologies
• Business Intelligence (BI) – provides historical, current and predictive views of business operations.
• Data Mining – normally used with data at rest or with archival data. It focuses on modelling and knowledge discovery for predictive rather than descriptive purposes.
• Statistical Applications – surveys, sampling, empirical data, experimental reporting, etc.
Caution: Obstacles Ahead
• Must know about
– Business Intelligence (BI)
– ETL (Extract, Transform, Load)
– Fault Tolerant Clustered Architecture
– Parallel Computing
• BD Software
– Data Management: HDFS (Hadoop Distributed File System) and HBase
– Processing Framework: MapReduce, Oozie
– Development Framework: Pig, Hive
– Business Intelligence: Pentaho
• Google Data Centre - https://www.youtube.com/watch?v=XZmGGAbHqa0
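The MapReduce processing framework listed above can be illustrated in miniature. The sketch below is plain Python, not Hadoop code: it only mimics the map, shuffle and reduce steps on a single machine, and the input strings are hypothetical stand-ins for file blocks that would normally live on HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run the three steps in sequence over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical input: each string stands in for one block of a large file.
documents = ["big data is big", "data at rest"]
counts = word_count(documents)
```

In a real cluster the map and reduce functions run in parallel on many nodes and the framework handles the shuffle and any node failures, which is why fault-tolerant clustering and parallel computing appear in the list of prerequisites.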
Obstacles Remain
• Purity of the data, Analytical Knowledge
• Understanding of Statistics and Others
• Gathering the data is usually half the battle in the analytics game.
• Example tools: 80Legs, Needlebase, etc.
• Data sources include multimedia, social media, instant messaging, e-mail, POS, etc.
• Bio-Informatics
– Processing time is reduced from years to days
– Individual human genome – $1000
– Personalized medicine
Data and Data Analytics are Getting More Complex
• Behavioural analytics is a process that determines patterns of behaviour from human-to-human and human-to-system interaction data.
• The behavioural pattern can provide insight into which series of actions led to an event (e.g. a customer sale or a product switch).
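One way to picture "which series of actions led to an event" is to count the action sequences that precede each sale in an event log. The following is a minimal sketch under stated assumptions: the `(user_id, action)` log format, the action names and the three-step window are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def paths_to_event(events, target, window=3):
    """Count the action sequences (up to `window` long) that precede `target`.

    `events` is a time-ordered list of (user_id, action) tuples.
    """
    history = {}       # user_id -> actions seen since that user's last target event
    paths = Counter()  # sequence of actions -> number of times it led to the target
    for user, action in events:
        seen = history.setdefault(user, [])
        if action == target:
            paths[tuple(seen[-window:])] += 1
            seen.clear()  # start a fresh path after the event
        else:
            seen.append(action)
    return paths


# Hypothetical human-to-system interaction log for three users.
events = [
    ("u1", "search"), ("u2", "search"), ("u1", "view_product"),
    ("u1", "add_to_cart"), ("u2", "view_product"), ("u1", "sale"),
    ("u2", "add_to_cart"), ("u2", "sale"), ("u3", "view_product"),
    ("u3", "sale"),
]
paths = paths_to_event(events, target="sale")
most_common_path, occurrences = paths.most_common(1)[0]
```

Here the most frequent path to a sale is search, then product view, then add to cart. Real behavioural analytics would work on far larger logs and account for timing and sessions.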
Few concepts in BDM
• Four Ms in Big Data
– Make Me More Money (or) Make Me More Efficient
• BI
– Focuses on Reporting
– What Happened? (Descriptive Analytics)
– Operates with Schema on Load
• Data Science
– What is Likely to Happen?
– Deals with Schema on Query
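The difference between schema on load and schema on query can be sketched in a few lines. This is an illustrative sketch only: the JSON records, the field names and both helper functions are hypothetical, not taken from any particular BI or data science tool.

```python
import json

# Hypothetical raw records as they might arrive from different source systems.
RAW_RECORDS = [
    '{"customer": "A17", "amount": "120.50", "channel": "web"}',
    '{"customer": "B02", "amount": "80", "note": "phoned support twice"}',
]


def load_with_schema(raw_records):
    """Schema on load: enforce a fixed structure when data enters the store."""
    table = []
    for raw in raw_records:
        record = json.loads(raw)
        table.append({
            "customer": str(record["customer"]),
            "amount": float(record["amount"]),
        })  # fields outside the schema (channel, note) are dropped here
    return table


def query_raw(raw_records, field, cast=str):
    """Schema on query: keep raw records and interpret them only when asked."""
    values = []
    for raw in raw_records:
        record = json.loads(raw)
        if field in record:
            values.append(cast(record[field]))
    return values


warehouse_rows = load_with_schema(RAW_RECORDS)
total_sales = sum(row["amount"] for row in warehouse_rows)

# Questions nobody planned for can still be asked of the raw data.
notes = query_raw(RAW_RECORDS, "note")
amounts = query_raw(RAW_RECORDS, "amount", cast=float)
```

The schema-on-load path gives clean, predictable rows for reporting but loses the `note` field. The schema-on-query path keeps everything and defers the interpretation cost to query time.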
Evolution of the Business Questions
What Happened? (Descriptive Analytics) | What Will Happen? (Predictive Analytics) | What Should I Do? (Prescriptive Analytics)
How many widgets did I sell last month? | How many widgets will I sell next month? | Order [15,000] units of component Z to support widget sales for next month
How many of product X were returned last month? | How many of product X will be returned next month? | Set aside $125K in financial reserve to cover product X returns
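The three kinds of question can be traced through the widget example with a small calculation. The sketch below uses invented numbers throughout: the sales history, the two components per widget and the stock level are assumptions, and the forecast is a deliberately naive trend extension rather than a real predictive model.

```python
# All figures below are hypothetical and exist only to make the example run.
monthly_widget_sales = [900, 1000, 1100, 1200]  # oldest month first
COMPONENTS_PER_WIDGET = 2                       # assumed bill of materials
COMPONENT_Z_IN_STOCK = 500                      # assumed current inventory


def descriptive(sales):
    """What happened? Report last month's sales."""
    return sales[-1]


def predictive(sales):
    """What will happen? Extend the average month-over-month change."""
    changes = [later - earlier for earlier, later in zip(sales, sales[1:])]
    return sales[-1] + sum(changes) / len(changes)


def prescriptive(forecast, per_widget, in_stock):
    """What should I do? Order enough of component Z to cover the forecast."""
    return max(0, round(forecast * per_widget) - in_stock)


sold_last_month = descriptive(monthly_widget_sales)
expected_next_month = predictive(monthly_widget_sales)
units_to_order = prescriptive(
    expected_next_month, COMPONENTS_PER_WIDGET, COMPONENT_Z_IN_STOCK
)
```

Each step builds on the previous one: the prescriptive answer is only as good as the prediction, which in turn depends on accurate descriptive data.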
BD Business Model Maturity Index
• The index provides a roadmap to help organizations accelerate the integration of data and analytics into their business models
• 5 Phases
• 1. Business Monitoring
– DWH, BI, also called Business Performance Management
• 2. Business Insights
– Coupling internal data (e.g. consumer comments, e-mail conversations, technician notes) with external data (social media, weather, traffic, data.gov)
• 3. Business Optimization
– Prescriptive analytics to optimize key business processes
– Example – recommendations, scores, rules
– Provides opportunities for organizations to push analytic insights to their customers in order to influence customer behaviours
• 4. Data Monetization
– Organizations seek to create new sources of revenue
– Selling data or insights into new markets
• 5. Business Metamorphosis
– A major shift in the organization's core business model (e.g. from selling products to selling “Business as a Service”)
Business Analyst vs Data Scientist
Area | BI Analyst | Data Scientist
Focus | Reports, KPIs, Trends | Patterns, correlations, models
Process | Static, comparative | Exploratory, experimentation
Data Sources | Pre-planned, added slowly | On the fly, as needed
Transform | Up front, carefully planned | In-database, on-demand, enrichment
Data Quality | Single version of truth | “Good Enough”, probabilities
Data Model | Schema on load | Schema on query
Analysis | Retrospective, descriptive | Predictive, prescriptive
Data Science Algorithms
• 1. Fundamental Exploratory Analytics Algorithms
– Trend Analysis, Boxplots, Geography (Spatial) Analysis, Pairs Plot, Time Series Decomposition
• 2. Advanced Analytics Algorithms
– Cluster Analysis, Normal Curve Equivalent Analysis, Assumption Analysis, Graph Analysis, Text Mining, Sentiment Analysis, Traverse Pattern Analysis, Decision Tree
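As a taste of the exploratory group, the numbers behind a boxplot can be computed with the Python standard library alone. This is a minimal sketch: the daily order counts are invented, and the 1.5 × IQR fence is the common boxplot convention rather than anything specified in these slides.

```python
import statistics


def boxplot_summary(values):
    """Return the quartiles, fences and outliers that a boxplot would draw."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in ordered if v < low_fence or v > high_fence]
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
        "outliers": outliers,
    }


# Hypothetical daily order counts with one unusual day.
daily_orders = [12, 15, 14, 13, 16, 15, 14, 60, 13]
summary = boxplot_summary(daily_orders)
```

The single day with 60 orders falls outside the upper fence, which is the kind of anomaly exploratory analysis is meant to surface before any advanced algorithm is applied.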
Data Lake
• The data lake was born out of the “economics of BD”, which allow organizations to store, manage and analyze massive amounts of data at a cost that can be 20 to 50 times cheaper than with a traditional DWH.
• This is possible because of the agile underlying Hadoop/HDFS architecture that typically supports the data lake.
• Organizations can store structured (tables, CSV), semi-structured (web logs, sensor logs) and unstructured data (text files, social media posts).
Big Data Sources
Step 1 : Identify the Usability - CONSIDERATIONS
• Structure of the data
- structured, unstructured, semi-structured, table based, proprietary
• Source of the data
- internal, external, private, public
• Value of the data
- generic, unique, specialized
• Quality of the data
- verified, static, streaming
• Storage of the data
- remotely accessed, shared, dedicated platforms, portable
• Relationship of the data
- superset, subset, correlated
Step 2: Import the data into an appropriate platform
• Data have to be transformed into something accessible, queryable and relatable.
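Step 2 is essentially an extract, transform, load (ETL) job. The sketch below shows one possible shape for it using only the Python standard library: the CSV layout, the cleaning rules and the `sales` table are hypothetical, and an in-memory SQLite database stands in for whatever platform an organization actually chooses.

```python
import csv
import io
import sqlite3

# Hypothetical export from a point-of-sale system, with typical quality issues.
RAW_CSV = """store,sold_on,amount
Chennai, 2024-01-05 ,1200.50
chennai,2024-01-06,
Mumbai,2024-01-05,980
"""


def extract(text):
    """Extract: read the raw export into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise names, trim whitespace, drop rows with no amount."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # incomplete record, skipped in this sketch
        clean.append(
            (row["store"].strip().title(), row["sold_on"].strip(), float(amount))
        )
    return clean


def load(rows, connection):
    """Load: write the cleaned rows into a queryable table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (store TEXT, sold_on TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)
totals = connection.execute(
    "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
).fetchall()
```

Once loaded, the data is accessible and queryable with ordinary SQL, and the shared `store` column makes it relatable to other tables.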
Hunting for BIG DATA
• Where do I get the data from?
• Finding data for BDA is part science, part investigative work and part assumption.
• Some of the most obvious sources for data are electronic transactions, web-site logs, and sensor information.
• Determine what BDA is going to be used for.
– Example: Is the business looking to analyse marketing trends, predict web traffic, gauge customer satisfaction, or achieve some other lofty goal that can be accomplished with the current technologies?
SETTING THE GOAL FOR BD
• Every project usually starts out with a goal and with objectives to reach the goal.
• Example: Retail Organization
– The goal for BDA may be to increase sales, a chore that spans several business ideologies and departments, including marketing, pricing, inventory, advertising, and customer relations.
– Meeting that goal means gathering information from a multitude of sources:
- some internal and others external
- some purchased and others in the public domain (free)
- some structured and others unstructured (call centre and support logs, customer feedback e-mails, surveys, sensors)
Big Data Sources Growing (1)
• Many industries fall under the umbrella of new data creation
and digitization of existing data, and most are becoming
appropriate sources for BD resources.
– 1. Transportation, Logistics, Retail, utilities, and telecommunications
• Sensor data from GPS transceivers, RFID tag readers, smart meters & cell
phones
– 2. Health Care
• The Health Care industry is quickly moving to electronic medical records and
images, which it wants to use for short-term public health monitoring and
long-term epidemiological research programs.
– 3. Government
• Many government agencies are digitizing public records, such as census
information, energy usage, budgets, RTI, electoral data, and law enforcement
reporting
Big Data Sources Growing (2)
• 4. Entertainment Media
– The entertainment industry has moved to digital recording, production, and delivery in the past 5 years and is now collecting large amounts of rich content and user viewing behaviours.
• 5. Life Sciences
– Low-cost gene sequencing generates tens of terabytes of information that must be analysed to look for genetic variations and potential treatment effectiveness.
• 6. Video Surveillance
– Organizations want to analyse video footage for behavioural patterns.
Diving Deeper into Big Data Sources
• Additional data points are gathered from existing
systems or with the installation of new sensors that
deliver more pieces of information.
– Example 1 – Financial Transactions
• Thanks to the consolidation of global trading environments and the increased use of programmed trading, the volume of transactions being collected and analysed is doubling or tripling.
– Example 2 – Smart Instrumentation
• The use of smart meters in energy grid systems shifts meter readings from monthly to every 15 minutes.
• Smart instrumentation extends beyond power usage and can measure heating, cooling and other loads.
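The jump from monthly to 15-minute readings is easy to quantify. The short calculation below assumes a 30-day month and, purely for illustration, a hypothetical utility with one million meters.

```python
MINUTES_PER_DAY = 24 * 60


def readings_per_month(interval_minutes, days_in_month=30):
    """Number of readings one meter produces in a month at a given interval."""
    return (MINUTES_PER_DAY // interval_minutes) * days_in_month


readings_before = 1                       # one manual reading per month
readings_after = readings_per_month(15)   # one reading every 15 minutes
growth_factor = readings_after // readings_before

ASSUMED_METER_COUNT = 1_000_000           # hypothetical utility size
monthly_rows = readings_after * ASSUMED_METER_COUNT
```

A single meter goes from 1 reading to 2,880 readings a month, so the assumed million-meter utility would collect about 2.88 billion readings monthly before any heating or cooling measurements are added.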
Google Demo
• https://about.google/intl/en/products/?tab=wh
