Data Analytics For IOT

Uploaded by

sushma-icb
Copyright
© © All Rights Reserved

Data Analytics for IOT

Course Code : BC0456A


What is Data?
● "Data" refers to raw, unprocessed facts, figures, or observations typically collected or
stored in digital form.

● It can take the form of numbers, text, images, audio, video, or other types of content.

● Data can be structured, semi-structured, or unstructured, depending on how it is
organized and represented.
Data Vs Information
Data:

● Data refers to raw, unorganized facts or figures, typically in the form of numbers, text, or symbols.
● Data lacks context and relevance until it is processed and analyzed.
● Examples of data include individual temperature readings, stock prices, or a list of names and ages.

Information:

● Information is the result of processing and analyzing data to give it context, meaning, and relevance.
● Information provides insights, understanding, and actionable knowledge.
● It answers questions, solves problems, or helps in decision-making.
● For example, if we analyze the temperature readings over a week and present the average temperature, highest and
lowest temperatures, and trends, we convert the raw data into meaningful information that helps understand the
weather patterns.
Structured, semi-structured, and unstructured data
1. Structured Data:
● Structured data is highly organized and follows a strict, predefined data model. It is
typically stored in tabular formats with rows and columns.
● Example: A customer database in a relational database management system (RDBMS) is
an example of structured data. Each row represents a customer record with columns
such as customer ID, name, email, phone number, and address.
● The data is organized into a clear structure, making it easy to query and analyze using
SQL queries.
2. Semi-Structured Data:
● Semi-structured data does not adhere to a strict schema but has some organizational
properties. It may contain tags, keys, or attributes that provide a level of structure.
● Example: JSON (JavaScript Object Notation) data is a common example of
semi-structured data. It consists of key-value pairs and hierarchical structures, allowing
for flexible data representation.
3. Unstructured Data:
● Unstructured data lacks a predefined structure or format and does not fit neatly into
traditional databases. It often consists of text-heavy content, images, videos, or audio files.
● Example: Social media posts, such as tweets or Facebook updates, are prime examples of
unstructured data. These posts can contain free-form text, multimedia content, hashtags,
mentions, and emojis, making them challenging to organize and analyze using traditional
database techniques. However, advanced text processing and natural language processing
(NLP) techniques can extract insights from unstructured social media data.
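
The difference between the three kinds of data can be seen in how a program reads them. The following is a minimal sketch using only the Python standard library. The customer table, the sensor record, and the social media post are made-up sample values, not data from this course.

```python
import csv
import io
import json
import re

# Structured: fixed columns, every row has the same fields (hypothetical customer table).
csv_text = "customer_id,name,email\n1,Asha,asha@example.com\n2,Ravi,ravi@example.com\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: key-value pairs with nesting; fields may differ between entries.
json_text = '{"device": "sensor-7", "readings": [{"t": 21.5}, {"t": 22.0, "unit": "C"}]}'
record = json.loads(json_text)

# Unstructured: free-form text; any structure has to be extracted, here with a simple regex.
post = "Loving the new #smartwatch from @acme, battery lasts days! #IoT"
hashtags = re.findall(r"#(\w+)", post)

print(rows[1]["name"])                # column lookup by name
print(record["readings"][1]["unit"])  # navigate the hierarchy by key and index
print(hashtags)                       # tags pulled out of free text
```

Note that the second reading carries a "unit" key while the first does not, which a strict table schema would not allow.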
Classification of Data

Data is classified into two types:

● Qualitative data describes qualities or characteristics and cannot be measured numerically. It provides
insights into the attributes, opinions, or behaviors of individuals or entities.

Example: Customer feedback on a product can be qualitative data. Responses such as "the product is user-
friendly" or "the customer service was excellent" convey subjective opinions and perceptions rather than
numerical values.

● Quantitative data consists of numerical measurements or quantities that can be counted or measured. It is
typically objective and allows for statistical analysis.

Example: Sales figures for a particular product over a specific period represent quantitative data. For
instance, "100 units sold in January" or "revenue of $10,000 in Q2" are quantitative measurements that
provide precise numerical information about sales performance.
What is a Dataset?

● A dataset is a structured collection of data organized and stored together for analysis or processing.

● A dataset can include many different types of data, from numerical values to text, images or audio recordings. The data
within a dataset can typically be accessed individually, in combination or managed as a whole entity.

● Datasets are a fundamental tool in data analytics, data analysis and machine learning (ML), providing the data from
which analysts draw insights and trends. They are essential to ML because selecting a suitable dataset for an ML
project is one of the most crucial initial steps in successfully training and deploying an ML model.
Public DataSet Providers
The Situation
Data Analytics
● Data analytics is the process of examining raw datasets to discover patterns, extract
meaningful insights, and draw conclusions about the information they hold, so that
informed decisions can be made.

● It involves applying various statistical, mathematical, and computational techniques
to analyze datasets, often with the aid of specialized software tools and algorithms.
Some Examples for Data Analytics
1) Customer Segmentation for Marketing: By analyzing customer data, a retail company discovers that it has distinct
customer segments based on purchasing behavior, demographics, and preferences. This insight allows the company to
tailor its marketing strategies and product offerings to better meet the needs of each segment, leading to increased
customer satisfaction and sales.

2) Predictive Maintenance for Manufacturing: A manufacturing plant analyzes sensor data from its equipment and
identifies patterns indicating when machines are likely to fail. By predicting maintenance needs in advance, the
company can proactively schedule repairs, minimize downtime, and avoid costly production interruptions.

3) Fraud Detection for Financial Services: A bank analyzes transaction data and identifies suspicious patterns indicative
of fraudulent activity, such as unusual spending behavior or unauthorized account access. By detecting and preventing
fraud early, the bank protects its customers and minimizes financial losses.
Steps of data analytics
Observations found in the given situation:

Step 1: Assess the Data

First, you need to understand what data you have. Given the description, it seems like you have a large amount of sensor data. Here’s what you can do:

1. Categorize the Data: Identify different types of data you have. This could include timestamps, sensor readings, location data, etc.
2. Sample the Data: Take a small sample of the data to analyze its structure and quality. Look for patterns, missing values, and anomalies.

Step 2: Define Objectives

You need to convert the executives’ vague ideas into specific, actionable goals. Set up a meeting with your boss and key stakeholders to define clear objectives
for the data. Some questions to address could include:

1. What are the key business questions we want to answer with this data?
2. What metrics will indicate success?
3. Are there specific processes we want to optimize or automate?

Step 3: Basic Data Analysis

Use your existing skills with Excel and databases to perform initial analyses:

1. Data Cleaning: Ensure the data is clean and consistent.
2. Descriptive Statistics: Calculate basic statistics (mean, median, mode, standard deviation) to understand the data's central tendencies and dispersion.
3. Visualization: Create charts and graphs to visualize trends and patterns.
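
The cleaning and descriptive statistics steps above can be sketched in a few lines of Python. This is only an illustration using the standard `statistics` module. The temperature readings are invented sample values, and dropping missing values is just one possible cleaning rule.

```python
import statistics

# Hypothetical temperature readings in degrees Celsius; None marks a missing value.
raw_readings = [21.5, 22.0, None, 21.5, 23.1, 22.4]

# Data cleaning: drop missing values before computing any statistics.
readings = [r for r in raw_readings if r is not None]

# Descriptive statistics: central tendency (mean, median, mode) and dispersion (stdev).
summary = {
    "mean": statistics.mean(readings),
    "median": statistics.median(readings),
    "mode": statistics.mode(readings),
    "stdev": statistics.stdev(readings),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The same figures can be produced in Excel with AVERAGE, MEDIAN, MODE, and STDEV; the point of the sketch is only to show what each statistic is computed from.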
Step 4: Learn and Implement Basic Big Data Tools

Since external training is not an option, leverage free online resources:

1. Online Courses: Platforms like Coursera, edX, and Khan Academy offer free courses on Hadoop, machine learning, and data science.
2. Documentation and Tutorials: Utilize free resources like the Apache Hadoop documentation and online tutorials to get started with Hadoop and related
tools.

Step 5: Prototype Simple Machine Learning Models

Start with simple models that can provide quick wins:

1. Regression Analysis: If you have continuous data, use linear regression to predict future values based on historical data.
2. Classification: If your data can be categorized, use classification algorithms like logistic regression or decision trees.
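
As an illustration of the regression item above, the following sketch fits a straight line by ordinary least squares in plain Python and uses it to predict the next value. The function names `fit_line` and `predict` and the hourly sample values are assumptions made for this example; in practice a library such as Scikit-learn would usually be used instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Hypothetical historical data: hour index and a sensor value that grows over time.
hours = [1, 2, 3, 4, 5]
values = [2.0, 4.1, 5.9, 8.2, 9.8]

slope, intercept = fit_line(hours, values)
next_value = predict(slope, intercept, 6)  # forecast for the next hour
print(f"slope={slope:.2f}, intercept={intercept:.2f}, forecast={next_value:.2f}")
```

A straight line is the simplest possible model. It is a reasonable first prototype, but it will not capture seasonal or daily cycles in the data.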
Step 6: Leverage Open Source Tools

There are many powerful, free tools available:

1. Hadoop Ecosystem: Start with Hadoop for distributed storage and processing. Tools like Hive and Pig can help with querying and data manipulation.
2. Machine Learning Libraries: Libraries like Scikit-learn in Python are great for implementing machine learning models.

Step 7: Small Pilot Projects

Implement small pilot projects to demonstrate value:

1. Predictive Maintenance: Use sensor data to predict equipment failures and schedule maintenance proactively.
2. Anomaly Detection: Identify unusual patterns in the data that could indicate issues or opportunities.
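
One simple way to prototype the anomaly detection pilot is a z-score check, which flags readings that lie far from the mean in units of standard deviation. This is a sketch under stated assumptions: the function name `find_anomalies`, the threshold of 2.0, and the vibration readings are all hypothetical, and real sensor data may need a more robust method.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mean) / stdev > threshold
    ]


# Hypothetical vibration readings from one machine; one reading is far off.
vibration = [0.51, 0.49, 0.50, 0.52, 0.48, 0.50, 1.90, 0.51]

anomalies = find_anomalies(vibration)
print(anomalies)  # the reading at index 6 is flagged
```

Because a large outlier also inflates the mean and standard deviation, a low threshold is used here for a short series. Longer series usually allow a stricter threshold.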
Benefits of Data Analytics

1. Decision making improves
2. Marketing becomes more effective
3. Customer service improves
4. The efficiency of operations increases
Data analytics is the practice of examining data to answer questions, identify trends, and extract insights.
When data analytics is used in business, it’s often called business analytics.

You can use tools, frameworks, and software to analyze data, such as Microsoft Excel and Power BI, Google
Charts, Datawrapper, Infogram, Tableau, and Zoho Analytics. These can help you examine data from
different angles and create visualizations that illuminate the story you’re trying to tell.
Who Needs Data Analytics?

Professionals who can benefit from data analytics skills include:

● Marketers, who utilize customer data, industry trends, and performance data from past campaigns to plan marketing
strategies
● Product managers, who analyze market, industry, and user data to improve their companies’ products
● Finance professionals, who use historical performance data and industry trends to forecast their companies’ financial
trajectories
● Human resources and diversity, equity, and inclusion professionals, who gain insights into employees’ opinions,
motivations, and behaviors and pair them with industry trend data to make meaningful changes within their organizations
Tom Davenport and Jeanne Harris created a scale, which they called Analytics Maturity.
Companies progress to higher levels in the scale as their use of analytics matures, and
they begin to compete with other companies by leveraging it.
An analytics maturity model is a sequence of steps or stages that represent the evolution of
the company in its ability to manage its internal and external data and use this data to inform
business decisions.

These models assess and describe how effectively companies use their resources to get value
out of data. They also serve as a guide in the analytics transformation process.
Types of Data Analytics

The path that companies follow in their analytical development can be broken down into five stages:

● No analytics refers to companies with no analytical processes whatsoever.

● Descriptive analytics lets us know what happened, gathering and visualizing historical data.

● Diagnostic analytics identifies patterns and dependencies in available data, explaining why something happened.

● Predictive analytics creates probable forecasts of what will happen in the future, using machine learning techniques
to process large volumes of data.

● Prescriptive analytics provides optimization options, decision support, and insights on how to get the desired result.
Descriptive Analytics

1. It is the simplest type of analytics and the foundation the other types are built on. It allows you to pull trends from raw
data and succinctly describe what happened or is currently happening.

2. Descriptive analytics answers the question, “What happened?”

3. For example, imagine you’re analyzing your company’s data and find there’s a seasonal surge in sales for one of your
products: a video game console. Here, descriptive analytics can tell you, “This video game console experiences an
increase in sales in October, November, and early December each year.”

4. Data Visualization is a natural fit for communicating descriptive analysis because charts, graphs, and maps can show
trends in data—as well as dips and spikes—in a clear, easily understandable way.
Diagnostic Analytics

1. It addresses the next logical question, “Why did this happen?”

2. Taking the analysis a step further, this type includes comparing coexisting trends or movement, uncovering correlations
between variables, and determining causal relationships where possible.

3. Continuing the aforementioned example, you may dig into video game console users’ demographic data and find that
they’re between the ages of eight and 18. The customers, however, tend to be between the ages of 35 and 55. Analysis
of customer survey data reveals that one primary motivator for customers to purchase the video game console is to
gift it to their children. The spike in sales in the fall and early winter months may be due to the holidays that include
gift-giving.

4. Diagnostic analytics is useful for getting at the root of an organizational issue.


Predictive Analytics

1. It is used to make predictions about future trends or events and answers the question, “What might happen in the
future?”

2. By analyzing historical data in tandem with industry trends, you can make informed predictions about what the future
could hold for your company.

3. For instance, knowing that video game console sales have spiked in October, November, and early December every
year for the past decade provides you with ample data to predict that the same trend will occur next year.
Backed by upward trends in the video game industry as a whole, this is a reasonable prediction to make.

4. Making predictions for the future can help your organization formulate strategies based on likely scenarios.
Prescriptive Analytics

1. Finally, prescriptive analytics answers the question, “What should we do next?”

2. Prescriptive analytics takes into account all possible factors in a scenario and suggests actionable takeaways. This type
of analytics can be especially useful when making data-driven decisions.

3. Rounding out the video game example: What should your team decide to do given the predicted trend in seasonality
due to winter gift-giving? Perhaps you decide to run an A/B test with two ads: one that caters to product end-users
(children) and one targeted to customers (their parents). The data from that test can inform how to capitalize on
the seasonal spike and its supposed cause even further. Or, maybe you decide to increase marketing efforts in
September with holiday-themed messaging to try to extend the spike into another month.
The idea of a company not being analytically mature unless it is actively employing
optimization models at every turn can be dangerous. This puts pressure on a company to
focus time and resources where there may not be a return on investment (ROI) for them.

Return on Investment (ROI) is a financial metric used to evaluate the profitability of an
investment relative to its cost. In the context of a company, ROI measures the return
generated from investments made in various aspects of the business compared to the
initial investment or cost of those investments.
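
ROI is commonly expressed as net return divided by cost, times 100. The sketch below uses that common form. The function name `roi_percent` and the figures for a hypothetical analytics project are assumptions for illustration only.

```python
def roi_percent(total_return, cost):
    """ROI as a percentage: net return relative to the cost of the investment."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (total_return - cost) / cost * 100


# Hypothetical analytics project: costs 10,000 and returns 12,500 in savings.
print(roi_percent(12_500, 10_000))  # 25.0
```

A negative result means the investment returned less than it cost, which is the situation the paragraph above warns about when analytics is applied where it does not pay off.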
Defining the Internet of Things
IoT devices are hardware that connect wirelessly to the internet or a local network hub. These remotely controlled devices
have the ability to transmit and receive data from other devices. Some everyday examples include a smart car, smart doorbell
or even a smart refrigerator.

Many IoT devices gather information via their sensors and then use software to analyze it and determine what decisions to
make based on the data.

Examples: smart TVs, smart watches, and other smart devices.


The definition from Forrest Stroud on Webopedia:

The IoT refers to the ever-growing network of physical objects that feature an IP
address for internet connectivity, and the communication that occurs between these
objects and other Internet-enabled devices and systems.
The sudden surge in IoT (Internet of Things) hype can be attributed to several factors:

● Advancements in Technology
● Increased Connectivity
● Big Data and Analytics
● Smart Devices and Consumer Adoption
● Industry 4.0 and Digital Transformation
● Emerging Applications
● Media Attention
The dramatic decrease in sensor and bandwidth costs, the spread of cellular coverage,
and the rise of cloud computing all combine to create fertile conditions for easily
connecting devices over the internet.
IOT analytics challenges
● The limited battery power, bandwidth, and hardware capability of IoT devices have to
be considered in their design.
● IoT data comes with some special challenges. It is created by devices operating
remotely, sometimes in widely varying environmental conditions that can change from
day to day.
● The devices are often widely distributed geographically.
● The data is communicated over long distances, often across different networking
technologies.
Five Key Challenges of IoT data collection and management

1. Data Security

2. Data Privacy

3. Data Volume

4. Problems with Time

5. Problems with Space


Data Security

Some IoT devices collect highly sensitive information. In the healthcare industry, the data
collected by IoT devices include protected health information (PHI). Internet-connected
cameras, voice assistants, and similar tools can monitor people’s activities and
conversations. IoT devices used in manufacturing have access to sensitive information
about manufacturing processes and procedures.
Securing this data is a common challenge for IoT devices. These devices are frequently designed to be accessible
from the public Internet because they need to send data to cloud-based servers for processing and are managed from
mobile devices and web-based portals. This exposure, combined with weak built-in protections, gives them
notoriously poor security. Some common IoT security issues that can endanger the sensitive data they contain include:

● Poor Password (or Other Credential) Security: IoT devices are often deployed with default, weak,
or hardcoded passwords, keys, or secrets.

● Unpatched Vulnerabilities: IoT manufacturers are largely unregulated and often have poor secure
development practices, leading to vulnerable products being shipped. IoT devices are commonly
deployed on a “set it and forget it” basis, without patches applied for newly discovered vulnerabilities. As a
result, many IoT devices contain vulnerabilities that an attacker can exploit.
Data Privacy

Much of the information collected and processed by IoT devices may be protected under various data privacy laws.

IoT device manufacturers and users must protect this information as the applicable laws require. Some important considerations include:

1. Consent to Collection: Under the GDPR and similar laws, data subjects generally must give consent before their personal data is
collected, unless another legal basis applies.
2. Consent to Processing: In addition to consent to data collection, the GDPR and other laws generally require consent from data
subjects for their data to be processed, unless another legal basis applies.
3. Encryption: Data protection laws generally expect data to be encrypted at rest and in transit to protect against unauthorized access
and misuse.
4. Distributed Processing: IoT devices are designed to be distributed and have their data processed on cloud servers, making it more
difficult to track and control access.

Note

The General Data Protection Regulation (GDPR) is a comprehensive data protection law enacted by the European Union (EU) that came into
effect on May 25, 2018. It aims to protect the personal data and privacy of individuals in the EU and to harmonize data protection laws
across the EU member states.
Data Volume

A company can easily have thousands to millions of IoT devices with several sensors on
each unit, each sensor reporting values on a regular basis. The inflow of data can grow
quite large very quickly. Since IoT devices send data on an ongoing basis, the volume of
data in total can increase much faster than many companies are used to.
[Figure: chart showing data storage needs for a production snapshot of 200 KB and 1,000 units per month, with five years of
production data kept.]
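
The figures in the caption above can be turned into a rough storage estimate. The sketch below assumes that the 200 KB snapshot is stored once per unit produced and that 1 GB is counted as 1,000,000 KB. Both are assumptions, and the function name `production_storage_gb` is hypothetical. Ongoing sensor data from devices in the field would come on top of this.

```python
def production_storage_gb(units_per_month, snapshot_kb, months_kept):
    """Storage needed for one snapshot per produced unit over the retention period."""
    total_kb = units_per_month * snapshot_kb * months_kept
    return total_kb / 1_000_000  # decimal gigabytes (1 GB = 1,000,000 KB)


# 1,000 units per month, 200 KB per snapshot, five years (60 months) retained.
storage = production_storage_gb(1_000, 200, 60)
print(f"{storage} GB")  # 12.0 GB
```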
Problems with time
● When data used for analytics is recorded at headquarters or a manufacturing
plant, everything happens in the same place and time zone. IoT devices, by contrast, are
spread out across the globe, so events that happen at the same absolute time do not
happen at the same local time.

● When IoT devices communicate sensor data, time may be captured using the local
time. If it is not clear whether local time or UTC was recorded, analytics results can
be dramatically affected.

● There can also be issues with clock synchronization.
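
The effect of local time versus UTC can be illustrated with the standard `datetime` and `zoneinfo` modules (Python 3.9 or later). This is a sketch only: the date and the two device locations are hypothetical examples, not values from the course material.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# One absolute instant, expressed in UTC.
event_utc = datetime(2024, 1, 15, 14, 0, tzinfo=timezone.utc)

# The same instant as seen by devices in two hypothetical deployment locations.
local_calgary = event_utc.astimezone(ZoneInfo("America/Edmonton"))
local_tokyo = event_utc.astimezone(ZoneInfo("Asia/Tokyo"))
print(local_calgary.hour, local_tokyo.hour)  # 7 23

# A timestamp recorded without a time zone is ambiguous.
naive = datetime(2024, 1, 15, 7, 0)
as_utc = naive.replace(tzinfo=timezone.utc)
as_local = naive.replace(tzinfo=ZoneInfo("America/Edmonton"))

# Interpreting it as UTC or as Calgary local time shifts the event by hours.
gap_hours = (as_local - as_utc).total_seconds() / 3600
print(gap_hours)  # 7.0
```

A seven-hour shift is enough to move a morning reading into the middle of the night, which is why the recording convention has to be known before any time-based analysis.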


● Consider an analyst working at a company that makes parking spot occupancy
detection sensors. She is tasked with creating predictive models to estimate future
parking lot fill rates. The time of day is likely to be a very predictive data point, so
how this time is recorded makes a big difference to her. If the recording convention is
unclear, even determining whether it is night or day at the sensor location will be difficult.
Factors Affecting Parking Fill Rates

1. Time of Day: As mentioned earlier, different times of the day exhibit different patterns of parking lot usage.
2. Day of the Week: Weekdays and weekends can have significantly different occupancy rates.
3. Seasonal Variations: Holidays, seasonal events, and weather conditions can affect parking behavior.
4. Location: The type of facility (e.g., office building, shopping mall, residential area) influences occupancy patterns.
5. Special Events: Concerts, sports events, and other gatherings can lead to temporary spikes in parking lot usage.

Challenges in Recording Time

1. Time Zones: The sensors may be deployed in various geographic locations with different time zones. The recorded time must be
standardized (e.g., using UTC) and then converted to the local time zone for accurate predictions.
2. Daylight Saving Time (DST): In regions observing DST, the recorded times can shift, complicating the data consistency. This needs to be
accounted for to ensure the accuracy of historical data and predictions.
3. Night and Day Classification: Determining whether it is night or day at the sensor location adds complexity. Simply relying on the
recorded time isn't enough; geographic location and seasonal variations must be considered.
1. Predictive Value: The time of day is a crucial factor in predicting parking lot occupancy. Different times of the day typically show
different patterns of parking lot usage. For example:
○ Morning Rush Hour: Higher occupancy as people arrive at work.
○ Afternoon: Possible reduction as people leave for lunch or finish early.
○ Evening: Increase again as people return home or go out for evening activities.
○ Night: Lower occupancy as most businesses are closed.
2. Time-Dependent Behavior: Parking lots at different types of locations (e.g., office buildings, shopping malls, residential areas) have
unique patterns influenced by the time of day.

● Parking Fill Rate: The ratio or percentage of occupied parking spaces to the total number of parking spaces in a parking lot.

Formula

Parking Fill Rate = (Number of Occupied Parking Spaces / Total Number of Parking Spaces) × 100
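
The fill rate formula above translates directly into code. The function name `parking_fill_rate` and the sample counts are assumptions for illustration. The input checks are an added safeguard, not part of the formula.

```python
def parking_fill_rate(occupied, total):
    """Percentage of parking spaces that are occupied."""
    if total <= 0:
        raise ValueError("total number of spaces must be positive")
    if not 0 <= occupied <= total:
        raise ValueError("occupied spaces must be between 0 and total")
    return occupied / total * 100


# Hypothetical lot with 60 spaces, 45 of them occupied.
print(parking_fill_rate(45, 60))  # 75.0
```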
Solutions and Considerations

1. Standardization: Store all timestamps in a standardized format (e.g., UTC) and convert them to local time zones when needed. This
ensures consistency and allows for correct interpretation across different locations.

2. Feature Engineering:
a. Hour of the Day: Include the hour as a feature, to capture the daily periodicity.
b. Day of the Week: Parking patterns often vary between weekdays and weekends.
c. Public Holidays: Mark these as they can significantly alter typical patterns.

3. Environmental Factors: Use external data sources to determine sunrise and sunset times for each location, enabling accurate classification
of day and night.

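4. Handling DST: Ensure the model accounts for DST changes, for example by deriving local time from UTC with a time zone database
instead of applying a fixed offset.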
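
The standardization, feature engineering, and DST points above can be combined in one small sketch: timestamps are kept in UTC and converted to the sensor's local time zone only when features are derived. The function name `time_features`, the `HOLIDAYS` set, and the sensor location are hypothetical. Sunrise and sunset lookup is left out because it needs an external data source.

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical public holiday calendar for the sensor's region.
HOLIDAYS = {date(2024, 12, 25)}


def time_features(ts_utc, tz_name):
    """Derive model features from a UTC timestamp and the sensor's time zone."""
    local = ts_utc.astimezone(ZoneInfo(tz_name))  # time zone database handles DST
    return {
        "hour": local.hour,                  # daily periodicity
        "day_of_week": local.weekday(),      # Monday = 0
        "is_weekend": local.weekday() >= 5,
        "is_holiday": local.date() in HOLIDAYS,
        "is_dst": bool(local.dst()),         # True while DST is in effect
    }


# Same UTC hour in winter and summer for a sensor assumed to be in Calgary.
winter = time_features(datetime(2024, 1, 15, 14, 0, tzinfo=timezone.utc), "America/Edmonton")
summer = time_features(datetime(2024, 7, 15, 14, 0, tzinfo=timezone.utc), "America/Edmonton")
print(winter["hour"], summer["hour"])  # 7 8
```

The same UTC hour maps to a different local hour in summer because of DST, which a fixed offset would miss.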


Problems with space
● IoT devices are located in multiple geographic locations. Different areas of the world have
different environmental conditions.

● Temperature variations can affect sensor accuracy. You could have less accurate readings in
Calgary, Canada than in Cancun, Mexico, if cold impacts your device.

● Remote locations may have weaker network access. The higher data loss could cause data
values for those locations to be underrepresented in the resulting analytics.

● Many IoT devices are solar powered, so the available battery charge can affect the
frequency of data reporting.
