Data Analytics in IoT: CS578: Internet of Things
• Structured data :
– data follows a model/schema
– defines data representation
– e.g. Relational Database
– easily formatted, stored, queried, and processed
• Structured data has been the core type of data used for making business decisions
• A wide array of data analytics tools is available for structured data
• Unstructured data:
– lacks a logical schema
– Doesn’t fit into predefined data model
– e.g. text, speech, images, video
• Semi-structured data:
– hybrid of structured and unstructured data
– not relational, but contains some schema
– e.g., email: the header fields are well defined, but the body and attachments are unstructured
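The three data types can be told apart by how much of each record you can query directly. The sketch below is illustrative only: the `readings` table, the sensor IDs, and the email addresses are hypothetical. A relational table can be queried by field right away. An email's header fields are just as easy to address, but its body is free-form text that would need separate (unstructured) analysis.

```python
import sqlite3
from email import message_from_string

# Structured data: every row follows the same schema, so SQL can query it directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id TEXT, temp_c REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("s1", 21.5), ("s2", 23.0), ("s1", 22.5)],
)
avg_s1 = conn.execute(
    "SELECT AVG(temp_c) FROM readings WHERE sensor_id = ?", ("s1",)
).fetchone()[0]

# Semi-structured data: an email has well-defined header fields,
# but its body is free-form text.
raw = (
    "From: ops@example.com\n"
    "To: tech@example.com\n"
    "Subject: Engine alert\n"
    "\n"
    "Truck 12 reported unusual vibration this morning.\n"
)
msg = message_from_string(raw)
subject = msg["Subject"]   # structured part: addressable by field name
body = msg.get_payload()   # unstructured part: plain text, no schema

print(avg_s1)   # 22.0
print(subject)  # Engine alert
```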
• Predictive
– It aims to foretell problems or issues
before they occur.
• e.g., it could provide an estimate of the remaining life of the truck engine.
• Prescriptive
– It goes a step beyond predictive analysis and recommends solutions for upcoming problems.
• e.g., it might calculate various alternatives to cost-effectively maintain our truck.
[Figure label: "Most data analysis space in IoT"]
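The truck example can be made concrete with a small sketch of the difference between predictive and prescriptive analysis. It assumes hypothetical wear measurements and a hypothetical list of maintenance options with costs. The predictive step fits a straight-line wear trend and estimates the remaining engine hours. The prescriptive step then picks the cheapest option that covers the planned usage. Real systems would use richer models; the linear trend is only a stand-in.

```python
# Hypothetical wear history: (engine_hours, wear_percent); 100 % = end of life.
history = [(0, 0), (1000, 10), (2000, 20), (3000, 30)]


def fit_line(points):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def remaining_hours(points, now_hours, limit=100.0):
    """Predictive: hours left until the fitted wear trend reaches `limit`."""
    a, b = fit_line(points)
    return (limit - a) / b - now_hours


def recommend(options, remaining, planned_hours):
    """Prescriptive: cheapest option whose added life covers the shortfall.

    Raises ValueError (from min) if no option is sufficient.
    """
    shortfall = max(0.0, planned_hours - remaining)
    feasible = [o for o in options if o["adds_hours"] >= shortfall]
    return min(feasible, key=lambda o: o["cost"])


# Hypothetical maintenance alternatives.
options = [
    {"name": "do nothing", "cost": 0, "adds_hours": 0},
    {"name": "replace filters", "cost": 300, "adds_hours": 1500},
    {"name": "overhaul injectors", "cost": 1200, "adds_hours": 3000},
    {"name": "replace engine", "cost": 9000, "adds_hours": 20000},
]

left = remaining_hours(history, now_hours=3000)        # about 7000 hours
choice = recommend(options, left, planned_hours=9000)  # shortfall of about 2000 hours
print(round(left), choice["name"])                     # 7000 overhaul injectors
```

The prediction alone only says the engine falls about 2000 hours short of the plan. The prescription turns that into an action by weighing the cost of each alternative.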
• In more complex cases, static rules cannot simply be inserted into the program
– because the program requires parameters that can change.
– e.g., a dictation program does not know your accent, tone, speed, and so on, so you need to record a set of predetermined sentences to help the tool. This process is called machine learning.
– In regression, the learning process is not about classifying inputs into two or more categories but about finding a correct numeric value.
– regression predicts numeric values, whereas classification predicts categories.
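A tiny sketch of that distinction, using hypothetical engine data. In both cases "learning" means deriving parameters from recorded examples, as the dictation tool does with the recorded sentences. A least-squares line is trained to output a number (regression). A nearest-centroid rule, chosen here only for its simplicity, is trained to output a category (classification).

```python
from statistics import mean

# Regression: hypothetical samples of engine temperature (deg C) vs. fuel use (L/h).
temps = [60.0, 70.0, 80.0, 90.0]
fuel = [10.0, 12.0, 14.0, 16.0]


def train_regression(xs, ys):
    """Learn slope and intercept of a least-squares line from examples."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


slope, intercept = train_regression(temps, fuel)
predicted_fuel = slope * 85.0 + intercept  # a numeric value: regression

# Classification: hypothetical labelled vibration readings (mm/s).
labelled = {"normal": [1.0, 1.2, 0.8], "faulty": [4.0, 4.5, 5.0]}
centroids = {label: mean(vals) for label, vals in labelled.items()}


def classify(reading):
    """Return the label whose centroid is closest: a category, not a number."""
    return min(centroids, key=lambda label: abs(reading - centroids[label]))


print(round(predicted_fuel, 2))     # 15.0
print(classify(1.1), classify(3.9)) # normal faulty
```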
• Variety refers to different types of data.
– Transactional data
• data from sources that produce data from transactions; high in volume and structured.
– Social data
• typically high in volume and structured.
– Enterprise data
• lower in volume and very structured.
• The NameNode coordinates where the data is stored and maintains a map of where each block of data is stored and where it is replicated.
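The NameNode's role as a metadata map can be sketched as a dictionary from (file, block index) to the DataNodes holding each replica. This is a hypothetical, heavily simplified model: `ToyNameNode` is not a Hadoop API. It places replicas round-robin, whereas real HDFS uses a rack-aware placement policy. The replication factor of 3 mirrors the HDFS default, and the block size is shrunk to bytes for readability.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical, simplified NameNode: stores metadata only, never data bytes."""
    datanodes: list
    replication: int = 3
    block_map: dict = field(default_factory=dict)  # (file, block index) -> [DataNodes]
    _cursor: int = 0                               # round-robin start position

    def __post_init__(self):
        if self.replication > len(self.datanodes):
            raise ValueError("replication factor exceeds number of DataNodes")

    def add_file(self, name, size_bytes, block_size):
        """Split a file into blocks and record where each block's replicas live."""
        n_blocks = -(-size_bytes // block_size)  # ceiling division
        for i in range(n_blocks):
            replicas = [
                self.datanodes[(self._cursor + k) % len(self.datanodes)]
                for k in range(self.replication)
            ]
            self.block_map[(name, i)] = replicas
            self._cursor += 1
        return n_blocks

    def locate(self, name, block_index):
        """Answer 'where is this block stored and replicated?'"""
        return self.block_map[(name, block_index)]


nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
n = nn.add_file("sensor.log", size_bytes=300, block_size=128)  # 3 blocks
print(n, nn.locate("sensor.log", 0))  # 3 ['dn1', 'dn2', 'dn3']
```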
• Big data tools like Hadoop and MapReduce are not suited for real-time analysis
– because of their distance from the IoT endpoints and the network bandwidth required to send the data to them.
• Streaming analytics allows you to continually monitor and assess data in real time so that you can adjust or fine-tune your predictions as events unfold (e.g., as a race progresses).
• In IoT, streaming analytics is performed at the edge (either at the sensors themselves or very close to them, such as at a gateway).
• The edge isn’t in just one place. The edge is highly distributed.
• Does streaming analytics replace big data analytics in the cloud?
– Not at all.
– Big data analytics focuses on large quantities of data at rest, whereas edge analytics continually processes streaming flows of data in motion.
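A minimal sketch of processing data in motion at the edge, assuming a hypothetical sensor stream and a hypothetical alert rule (a reading that jumps more than `max_jump` away from the recent average). Each reading is handled once, as it arrives. Only a fixed-size sliding window is kept, never the full history, which is the main contrast with analyzing data at rest.

```python
from collections import deque


class SlidingWindow:
    """Keep the last `size` readings and a running sum (O(1) work per update)."""

    def __init__(self, size):
        self.size = size
        self.values = deque()
        self.total = 0.0

    def push(self, x):
        self.values.append(x)
        self.total += x
        if len(self.values) > self.size:
            self.total -= self.values.popleft()  # drop the oldest reading
        return self.total / len(self.values)


def monitor(stream, size=3, max_jump=5.0):
    """Yield (reading, window_average, alert) for each reading as it arrives.

    The alert rule is a hypothetical example: flag a reading that differs
    from the previous window average by more than `max_jump`.
    """
    window = SlidingWindow(size)
    for x in stream:
        prev_avg = window.total / len(window.values) if window.values else x
        avg = window.push(x)
        yield x, avg, abs(x - prev_avg) > max_jump


results = list(monitor([20, 21, 22, 35, 23]))
for reading, avg, alert in results:
    print(reading, round(avg, 2), alert)
```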
• Time sensitivity
– When a timely response to data is required, sending data to the cloud for later processing introduces unacceptable latency.
• Output streams
– The data that is output is organized into insightful streams and passed on for storage and further
processing in the cloud.
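The two points above can be combined in one small sketch. An edge node reacts to urgent readings locally, with no round trip to the cloud. It forwards only compact per-batch summaries, the "output stream", for storage and further processing. The threshold, batch size, `Summary` fields, and the `act_locally` callback are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Compact record forwarded upstream instead of every raw reading."""
    count: int
    minimum: float
    maximum: float
    mean: float


def summarize(batch):
    return Summary(len(batch), min(batch), max(batch), sum(batch) / len(batch))


def edge_process(readings, limit, batch_size, act_locally):
    """Act immediately on readings above `limit`; return summaries for the cloud."""
    summaries = []
    batch = []
    for r in readings:
        if r > limit:
            act_locally(r)  # time-sensitive path: handled at the edge
        batch.append(r)
        if len(batch) == batch_size:
            summaries.append(summarize(batch))
            batch = []
    if batch:  # flush a partial final batch
        summaries.append(summarize(batch))
    return summaries  # the output stream passed on to the cloud


actions = []
out = edge_process([10, 12, 95, 11, 13], limit=90, batch_size=2,
                   act_locally=actions.append)
print(actions)  # [95]
print(out[1])   # Summary(count=2, minimum=11, maximum=95, mean=53.0)
```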
– Network analytics is network-based analytics
– it provides the power to analyze details of the communication patterns made by protocols, which supports:
• Capacity planning
• Security analysis
• Accounting
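Flow analytics, in the spirit of NetFlow/IPFIX, starts by grouping packets into flows keyed by the 5-tuple (source and destination address, source and destination port, protocol), then counting packets and bytes per flow. The sketch below is conceptual: the `Packet` record and the addresses are hypothetical, and real flow records are exported by routers rather than computed in Python. Ports 1883 (MQTT) and 5683 (CoAP) are used only as recognizable IoT examples. Because the key relies on Layer 3 and Layer 4 header fields, this kind of analysis is only possible where those headers are visible.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical packet header summary."""
    src: str
    dst: str
    sport: int
    dport: int
    proto: str
    length: int  # bytes


def aggregate_flows(packets):
    """Group packets into flows keyed by the classic 5-tuple."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p.length
    return dict(flows)


def top_talkers(flows, n=1):
    """Accounting / capacity planning: sources ranked by bytes sent."""
    per_src = defaultdict(int)
    for (src, *_), stats in flows.items():
        per_src[src] += stats["bytes"]
    return sorted(per_src.items(), key=lambda kv: kv[1], reverse=True)[:n]


packets = [
    Packet("10.0.0.5", "10.0.0.1", 50000, 1883, "TCP", 100),
    Packet("10.0.0.5", "10.0.0.1", 50000, 1883, "TCP", 200),
    Packet("10.0.0.6", "10.0.0.1", 40000, 5683, "UDP", 50),
]
flows = aggregate_flows(packets)
print(top_talkers(flows))  # [('10.0.0.5', 300)]
```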
• Flow analysis at the gateway is not possible with all IoT systems
– LoRaWAN gateways simply forward MAC-layer sensor traffic to the centralized
LoRaWAN network server, which means flow analysis (based on Layer 3) is not
possible at this point.
– A similar problem is encountered when using an MQTT server that sends data through an IoT broker.
• Traffic flows are processed in places that might not support flow analytics,
and visibility is thus lost.
• IPv4 and IPv6 native interfaces sometimes need to inspect inside VPN
tunnels, which may impact the router’s performance.