IoT M4
VTU short notes by YouTuber Afnan Marquee. The notes are in a simple format for quick
learning. For video explanations, check out my YouTube channel!
While relational databases are still used for certain data types and applications, they often
struggle with the nature of IoT data. IoT data places two specific challenges on a relational
database:
Machine learning (ML) is concerned with any process in which a computer receives a set of data
and processes it to help perform a task more efficiently.
Supervised Learning
In supervised learning, the machine is trained with input for which there is a known correct
answer. For example, suppose that you are training a system to recognize when there is a
human in a mine tunnel.
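A minimal sketch of the idea, assuming each sensor reading is reduced to two made-up features
(heat level and motion level) and that every training reading comes with its known correct
answer. The feature names, the values, and the nearest-centroid method are illustrative
assumptions, not a prescribed technique.

```python
from collections import defaultdict
from math import dist


def train(samples):
    """Compute one centroid (mean feature vector) per known label."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the new reading."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical labeled readings: (heat_level, motion_level) -> known correct answer
training_data = [
    ((0.9, 0.8), "human"),
    ((0.8, 0.7), "human"),
    ((0.2, 0.1), "no_human"),
    ((0.1, 0.3), "no_human"),
]

model = train(training_data)
print(predict(model, (0.85, 0.9)))
print(predict(model, (0.15, 0.2)))
```

The labels in the training data are what make this supervised: the machine learns from inputs
whose answers are already known and then applies that to new readings.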
Unsupervised learning uses machine learning algorithms to analyze and cluster unlabeled
datasets. These algorithms discover hidden patterns or data groupings without the need for
human intervention.
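Clustering can be sketched with a tiny one-dimensional k-means that splits unlabeled values
into two groups. The temperature readings and the choice of starting the two centers at the
minimum and maximum are assumptions made only for illustration.

```python
def kmeans_1d(values, iterations=10):
    """Split values into two clusters; centers start at the min and max value."""
    centers = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for value in values:
            # Assign each value to the nearer of the two centers.
            nearest = 0 if abs(value - centers[0]) <= abs(value - centers[1]) else 1
            clusters[nearest].append(value)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical unlabeled temperature readings from one sensor
readings = [20.1, 19.8, 20.5, 35.2, 34.9, 20.0, 35.5]
centers, clusters = kmeans_1d(readings)
print(centers)
print(clusters)
```

No labels are supplied here; the two groups (normal and unusually hot readings) emerge from
the data alone.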
The three V’s of Big Data are: Volume, Velocity, and Variety.
Massively parallel processing (MPP) databases were built on the concept of relational data
warehouses but are designed to be much faster and more efficient and to support reduced query
times.
NoSQL (“not only SQL”) is a class of databases that support semi-structured and unstructured
data. It includes the following database types: document store, object store, key-value store,
row-column store, and graph store.
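To make two of these types concrete, here is a small sketch that uses plain Python dicts and
lists as in-memory stand-ins. It is not the API of any real NoSQL product; the key format,
the device names, and the find helper are hypothetical.

```python
# Key-value store stand-in: an opaque value looked up by a single key.
kv_store = {}
kv_store["sensor:42:last_temp"] = "21.5"

# Document store stand-in: each document is semi-structured, so two documents
# in the same collection do not need to share the same fields.
documents = [
    {"device": "thermo-1", "temp_c": 21.5},
    {"device": "cam-7", "motion": True, "zone": "tunnel-3"},
]


def find(docs, **criteria):
    """Return the documents whose fields match every given criterion."""
    return [
        doc for doc in docs
        if all(doc.get(field) == wanted for field, wanted in criteria.items())
    ]


print(kv_store["sensor:42:last_temp"])
print(find(documents, device="cam-7"))
```

The point of the sketch is schema flexibility: a new kind of device can add new fields
without any table definition being changed first.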
Hadoop is the most recent entrant into the data management market, but it is arguably the most
popular choice as a data repository and processing engine.
Apache Spark
Apache Spark is an in-memory distributed data analytics platform designed to accelerate
processes in the Hadoop ecosystem.
Apache Storm and Apache Flink are other Hadoop ecosystem projects designed for distributed
stream processing and are commonly deployed for IoT use cases.
Storm can pull data from Kafka and process it in a near-real-time fashion, and so can Apache
Flink. This space is rapidly evolving, and projects will continue to gain and lose popularity as
they evolve.
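Near-real-time stream processing can be sketched without any of these frameworks. The
following is a plain-Python stand-in, not Storm, Flink, or Kafka code: it averages a
time-ordered stream of (timestamp, value) events over fixed windows. The event format and
the 10-second window are assumptions.

```python
def tumbling_average(events, window_seconds):
    """Yield (window_start, average) per fixed-size window; events must be time-ordered."""
    current_start = None
    values = []
    for timestamp, value in events:
        start = timestamp - (timestamp % window_seconds)
        if current_start is not None and start != current_start:
            # The window has closed, so emit its result before starting the next one.
            yield current_start, sum(values) / len(values)
            values = []
        current_start = start
        values.append(value)
    if values:
        yield current_start, sum(values) / len(values)


# Hypothetical sensor events: (seconds since start, temperature)
readings = [(0, 20.0), (3, 22.0), (10, 30.0), (14, 34.0), (21, 25.0)]
results = list(tumbling_average(readings, 10))
print(results)
```

Each result is produced as soon as its window closes, rather than after all the data has
been stored, which is the difference between processing data in motion and data at rest.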
Lambda Architecture
Ultimately, the key elements of a data infrastructure that supports many IoT use cases involve
the collection, processing, and storage of data using multiple technologies. Querying both data
in motion (streaming) and data at rest (batch processing) requires a combination of the Hadoop
ecosystem projects discussed.
One architecture that is currently being leveraged for this functionality is the Lambda
Architecture. Lambda is a data management system that consists of two layers for ingesting
data (Batch and Stream) and one layer for providing the combined data (Serving). These layers
allow for the packages discussed previously, like Spark and MapReduce, to operate on the data
independently, focusing on the key attributes for which they are designed and optimized.
Stream layer: This layer is responsible for near-real-time processing of events.
Batch layer: The Batch layer consists of a batch-processing engine and data store.
Serving layer: The Serving layer is a data store and mediator that decides which of the ingest
layers to query based on the expected result or view into the data.
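The three layers can be sketched in a few lines of Python. This is a simplified illustration
under stated assumptions, not a reference implementation: the class and method names are
hypothetical, the "view" is just a running total per device, and the serving step simply
combines the batch and stream views instead of choosing between them.

```python
from collections import Counter


class LambdaSketch:
    def __init__(self):
        self.master = []                # Batch layer data store: every event, append-only
        self.batch_view = Counter()     # Result of the most recent batch run
        self.realtime_view = Counter()  # Stream layer: events seen since that batch run

    def ingest(self, device, value):
        """Send each event to both ingest layers."""
        self.master.append((device, value))
        self.realtime_view[device] += value  # near-real-time update

    def run_batch(self):
        """Batch layer: recompute the view from all stored events."""
        view = Counter()
        for device, value in self.master:
            view[device] += value
        self.batch_view = view
        self.realtime_view.clear()  # the batch view now covers these events

    def query(self, device):
        """Serving layer: answer from the batch view plus recent stream updates."""
        return self.batch_view[device] + self.realtime_view[device]


pipeline = LambdaSketch()
pipeline.ingest("pump-1", 5)
pipeline.ingest("pump-1", 7)
before_batch = pipeline.query("pump-1")  # answered by the stream layer only
pipeline.run_batch()
pipeline.ingest("pump-1", 3)
after_batch = pipeline.query("pump-1")   # batch view plus one newer event
print(before_batch, after_batch)
```

Queries stay up to date between batch runs because the stream layer covers whatever the
batch layer has not yet processed.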