Data Models (Module - II)
Dr Shola UshaRani
Associate Professor
SCOPE, VIT Chennai.
What is IoT Data Management?
It enables users to track, monitor and manage
devices to ensure that they work
properly and securely after deployment.
Why is Data Management needed?
● IoT sensors interact with people, homes, cities, farms,
factories, workplaces, vehicles, wearables and medical
devices, and beyond.
● IoT is changing our lives, from managing home appliances
to vehicles. Devices can now advise us
on what to do, when to do it and where to go.
● Industrial IoT applications assist us in managing the data
for processes, and predicting faults and disasters.
● The IoT platforms help set and maintain parameters to refine
and store data accordingly.
Data Management Process
• The process of taking the overall available data and refining it down to important
information.
• Different devices from different applications send large volumes and varieties of information. Managing all
this IoT data means developing and executing architectures, policies, practices and procedures that
can meet the full data lifecycle needs.
• Smart devices automate tasks, saving us time. Intelligent things
can collect, transmit and understand information, but a tool is
required to aggregate the data and draw out inferences, trends and patterns.
IoT data management requirements
• We need to design a data management framework
compatible with all the software and hardware that play a
role in collecting, managing and distributing data.
• The design needs to be efficient to accelerate time-to-
market of the end-product.
• A good IoT data management solution will be able to
filter out erroneous records coming from the IoT
systems, such as negative temperature readings,
before ingesting them into the data lake.
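As a minimal sketch of such a filter, assuming this particular application only expects non-negative Celsius readings (the field name `temp_c`, the valid range and the sample records are all invented for illustration):

```python
def filter_erroneous(readings, min_temp=0.0, max_temp=85.0):
    """Keep only readings whose temperature lies in the assumed valid range."""
    return [r for r in readings if min_temp <= r["temp_c"] <= max_temp]

readings = [
    {"device": "s1", "temp_c": 21.5},
    {"device": "s2", "temp_c": -999.0},  # a failed read reported as a negative sentinel
    {"device": "s3", "temp_c": 22.1},
]
clean = filter_erroneous(readings)  # the -999.0 record never reaches the data lake
```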
IoT data management techniques
Cloud computing
● Data processing happens in a centralised
data storage location.
● Sensors and devices can connect indirectly
through the cloud, where data is centrally
managed, or send data directly to other
devices to locally collect, store and analyse
the data, and then share selected findings or
information with the cloud.
Edge computing
● Data is processed near the data source or at
the edge of the network. By processing some
data locally, the IoT system saves storage
space, processes information faster and
meets security challenges.
● Sensors produce a large amount of data for
edge gateway devices so that these can make
decisions by analysing the data.
● Edge devices for data management help
secure the most valuable data and reduce
bandwidth cost.
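The edge idea above can be sketched briefly: aggregate a window of raw samples on the device and send only a small summary to the cloud. The field names and sample values are illustrative assumptions:

```python
import statistics

def edge_summarise(window):
    """Reduce a window of raw sensor samples to one summary record for the cloud."""
    return {"count": len(window), "mean": statistics.mean(window), "max": max(window)}

window = [20.1, 20.3, 20.2, 20.4]  # raw samples stay on the edge device
summary = edge_summarise(window)   # only this small dict crosses the network
```

Shipping the summary instead of every sample is what saves storage and bandwidth.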
Data Management challenges
• Space optimization: the number of IoT devices will increase, raising the
challenge of processing and analysing data in real time to reduce the time
data spends in storage.
• Identification of tools: functions such as adaptive maintenance, predictive
repair, security monitoring and process optimization rely on real-time data.
Selecting the right tools is a challenge because integration between different
sensors must be proven and compatibilities confirmed. When there is no
connection, devices must still gain insights, make decisions and prepare for data
distribution.
• Data security: data must be protected from unauthorized access and tampering.
Organizations also need to comply with national rules and regulations on
securing data.
• Secure gateway device: having many different devices connected directly to
cloud services presents a huge attack surface, which can be mitigated by
channeling data through a secure gateway device.
Data Engineering and Data
Exploration
Data Exploration
• The initial step in data analysis, in which data analysts
use data visualization and statistical techniques to
describe dataset characteristics, such as size,
quantity, and accuracy, in order to better understand
the nature of the data.
• Two techniques
– Manual analysis
– Automated analysis
• through exploration software solutions.
• visually explore and identify relationships between different data
variables, the structure of the dataset, the presence of outliers,
and the distribution of data values in order to reveal patterns and
points of interest, enabling data analysts to gain greater insight
into the raw data.
Why do we need Data Exploration?
• Data is often gathered in large, unstructured volumes
from various sources and data analysts must first
understand and develop a comprehensive view of the
data before extracting relevant data for further
analysis, such as univariate, bivariate, multivariate,
and principal components analysis.
• Humans process visual data better than numerical
data.
• Challenge: deriving meaning from thousands of rows and
columns of data points and communicating it without any
visual components.
Data Exploration Tools
• Manual data exploration methods
– writing scripts to analyze raw data or manually filtering data into
spreadsheets.
– A popular tool for manual data exploration is Microsoft Excel spreadsheets
• To identify the correlation between two continuous variables in Excel, use the function
CORREL() to return the correlation
• Automated data exploration tools
– data visualization software, help data scientists easily monitor data sources
and perform big data exploration on otherwise overwhelmingly large datasets
– Graphical displays of data, such as bar charts and scatter plots, are valuable
tools in visual data exploration.
– variety of proprietary automated data exploration solutions,
including business intelligence tools, data visualization software, data
preparation software vendors, and data exploration platforms
– open source data exploration tools that include regression capabilities and
visualization features, which can help businesses integrate diverse data
sources
Data Engineering
Business Intelligence
• Collects, integrates, analyses data using
reports and dashboards to support decision
making
Roles required in Data Management
Process
• Data Analyst
• Data Engineer
• Data Scientist
Data Analyst or Data Integration
• Analyses all kinds of data and helps the
organization understand it in plain English
• Helps in making better business decisions for
upper management
• Responsibilities :
– Collection
– Correlation
– Analysis
– Reporting
Data Engineer
• Preparing data for analytics and operational
usage
• Develops, constructs, tests and maintains the
complete architecture of the large-scale
processing system.
• Preparing a data pipeline
– To pull in and integrate information from different
sources.
– The integrated data is consolidated, then cleaned
and structured for further analytics.
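Such a pipeline can be sketched as three stages, extract → transform → load; the row fields and the in-memory "warehouse" below are assumptions for illustration, not a real system:

```python
def extract(sources):
    """Pull rows from every source into one list."""
    rows = []
    for src in sources:
        rows.extend(src)
    return rows

def transform(rows):
    """Clean the pulled rows and structure them for analytics."""
    return [
        {"device": r["device"].strip().lower(), "temp_c": float(r["temp_c"])}
        for r in rows
        if r.get("temp_c") is not None
    ]

def load(rows, store):
    """Consolidate the cleaned rows into the analytics store."""
    store.extend(rows)
    return store

warehouse = []
sources = [
    [{"device": " S1 ", "temp_c": "21.5"}],  # source A: messy strings
    [{"device": "s2", "temp_c": None}],      # source B: a missing reading
]
load(transform(extract(sources)), warehouse)
```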
Data Scientist
• Analyses and interprets complex digital
data
– e.g. the statistics of a website
• A professional who deals with an enormous
mass of structured/unstructured data and uses
skills in math, statistics, programming,
machine learning, etc.
• Uses these techniques to implement
strategic plans
Skills needed
Data Analyst Data engineer Data Scientist
Source: https://serokell.io/blog/data-preprocessing
Pre-processing 1 : Data Cleaning
What is Data Cleaning?
• Applying different techniques based on the
problem and the data type.
• Incorrect data is either removed,
corrected, or imputed.
Different Steps performed in Data
Cleaning
• Duplicates,
• Type conversion,
• Syntax errors,
• Standardize,
• Scaling / Transformation,
• Normalization,
• Missing Values,
• Outlier Treatment,
• Irrelevant Data.
Duplicates (step 1 of 9)
• Duplicates are data points that are
repeated in your dataset.
• It often happens when, for example, data
are combined from different sources.
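A small sketch of de-duplication in plain Python (pandas users would reach for `DataFrame.drop_duplicates`); the sample rows are invented:

```python
def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # a hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Two sources reported the same id-1 reading:
merged = [{"id": 1, "temp": 21.5}, {"id": 2, "temp": 19.0}, {"id": 1, "temp": 21.5}]
deduped = drop_duplicates(merged)  # the repeated row is kept only once
```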
Type Conversion (step 2 of 9)
• Make sure numbers are stored as
numerical data types.
• A date should be stored as a date object,
or a Unix timestamp (number of seconds),
and so on.
• Categorical values can be converted into
and from numbers if needed.
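A minimal sketch of these conversions using only the Python standard library (the raw record is invented):

```python
from datetime import datetime, timezone

raw = {"temp": "21.5", "count": "3", "ts": 1700000000}  # everything arrived as text or raw ints

record = {
    "temp": float(raw["temp"]),  # numeric string -> float
    "count": int(raw["count"]),  # numeric string -> int
    "ts": datetime.fromtimestamp(raw["ts"], tz=timezone.utc),  # Unix seconds -> date object
}
```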
Syntax Errors
(step 3 of 9)
• Remove white spaces: Extra white spaces
at the beginning or the end of a string should
be removed.
• Pad strings: Strings can be padded with
spaces or other characters to a certain width.
– For example, some numerical codes are often
represented with prepending zeros to ensure they
always have the same number of digits.
• Fix typos: strings can be entered in many
different ways and, unsurprisingly, can contain
mistakes.
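The three fixes above can be sketched with built-in string methods; the typo lookup table is an invented example of one common approach:

```python
code = "  42 "
cleaned = code.strip()     # remove leading/trailing white space -> "42"
padded = cleaned.zfill(5)  # pad with prepended zeros to a fixed width -> "00042"

# Typos are often corrected against a small lookup of known variants (invented here):
known_typos = {"femle": "female", "fem.": "female"}
value = known_typos.get("femle", "femle")  # -> "female"
```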
Standardize (step 4 of 9)
• Our duty is to not only recognize the typos
but also put each value in the same
standardized format.
• For strings, make sure all values are
either in lower or upper case.
• For numerical values, make sure all
values have a certain measurement unit.
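Both kinds of standardisation can be sketched briefly; the city names and the unit conversion table are illustrative assumptions:

```python
# Case standardisation: make all string values comparable
cities = ["Chennai", "CHENNAI", "chennai"]
standardised = [c.lower() for c in cities]  # every entry is now "chennai"

# Unit standardisation: convert mixed length units to metres
def to_metres(value, unit):
    factors = {"m": 1.0, "cm": 0.01, "km": 1000.0}
    return value * factors[unit]

lengths = [(5.0, "m"), (250.0, "cm"), (0.002, "km")]
in_metres = [to_metres(v, u) for v, u in lengths]
```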
Scaling &
Transformation (step 5 of 9)
• Scaling means to transform
your data so that it fits within
a specific scale, such as 0–
100 or 0–1.
– For example, exam scores of
a student can be re-scaled to
be percentages (0–100)
instead of GPA (0–10).
• It can also help in making
certain types of data easier
to plot.
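The GPA example can be sketched with min-max scaling. Note this sketch takes the bounds from the data itself; a real re-scaler might instead use the fixed 0–10 GPA range:

```python
def rescale(values, new_min=0.0, new_max=100.0):
    """Min-max scale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

gpas = [6.0, 8.0, 10.0]
percent = rescale(gpas)  # -> [0.0, 50.0, 100.0]
```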
Normalization (step 6 of 9)
• In most cases, we
normalize the data
if we’re going to be
using statistical
methods that rely
on normally
distributed data.
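A common form of this is z-score normalisation, which shifts and scales values to mean 0 and standard deviation 1; a minimal sketch with invented scores:

```python
import statistics

def z_normalise(values):
    """Return z-scores: (value - mean) / standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [(v - mu) / sigma for v in values]

scores = [10.0, 20.0, 30.0]
z = z_normalise(scores)  # -> [-1.0, 0.0, 1.0]
```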
Missing Values (step 7 of 9)
• Ways to resolve missing values include:
– Dropping the rows or columns that contain them
– Imputing them with a statistic such as the mean, median or mode
– Flagging them so later analysis can treat them separately
Source: https://www.geeksforgeeks.org
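Two of the usual resolutions, dropping and mean imputation, sketched on an invented sensor series:

```python
import statistics

temps = [21.5, None, 22.1, None, 20.9]  # None marks a missing reading

# Option 1: drop the missing entries
observed = [t for t in temps if t is not None]

# Option 2: impute each gap with the mean of the observed values
mean = statistics.mean(observed)
imputed = [t if t is not None else mean for t in temps]
```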
Applications of Predictive
Analysis
• Health care: predictive analysis can be used to examine a
patient's history and thereby determine risks.
• Financial modelling: financial modelling is another aspect
where predictive analysis plays a major role, finding
trending stocks and helping the business in the decision-making
process.
• Customer Relationship Management: predictive analysis
helps firms create marketing campaigns and customer
services based on the analysis produced by the predictive
algorithms.
• Risk Analysis: while forecasting campaigns, predictive
analysis can estimate profit and also helps in
evaluating the risks.