Module 5 IoE
Defining IoT Analytics, IoT Analytics challenges, IoT analytics for the cloud,
Strategies to organize Data for IoT Analytics, Linked Analytics Data Sets,
Managing Data lakes.
The data retention strategy, visualization and dashboarding: designing visual
analysis for IoT data, creating a dashboard, creating and visualizing alerts.
PYQs:
1. IoT data analytics: importance and strategies
2. Role of dashboards in data visualization
3. Role of data refineries in preventing data lakes from turning into data swamps
4. Data retention strategies
5. IoT data analytics vs. network analytics
Data refineries play a crucial role in preventing data lakes from turning into data
swamps by transforming raw data into usable, high-quality, organized data that can
be effectively leveraged for analytics and decision-making.
• Data Lake: A large storage repository that holds vast amounts of raw data in its
native format, whether structured, semi-structured, or unstructured. It allows for
flexible data storage, supporting many types of data, including logs, sensor data,
and text. The value of a data lake lies in its ability to store large volumes of
data at low cost.
• Data Swamp: When a data lake becomes unmanaged, disorganized, and filled
with irrelevant or low-quality data, it turns into a "data swamp." In this case,
retrieving valuable insights becomes difficult, and the data becomes essentially
useless.
Data refineries prevent this issue by adding structure, quality checks, and
transformations to raw data before it is used for analytics. They process incoming data
to ensure it is clean, consistent, and ready for use.
Here’s how data refineries prevent data lakes from turning into data swamps:
1. Data Cleansing:
• Data refineries perform automatic data validation and cleansing operations,
such as removing duplicates, handling missing values, and correcting
errors. This ensures that only high-quality, reliable data flows into the data
lake.
2. Data Organization:
• Raw data is often unstructured or semi-structured. Data refineries
categorize and label data, making it easier to search and retrieve.
• Tagging systems and metadata management are implemented to help
organize data, preventing chaos in data lakes.
3. Data Governance:
• Data refineries implement data governance policies to control data flow,
maintain standards, and ensure compliance.
• This includes applying rules for data access, ensuring data security, and
managing user roles, thereby preventing unauthorized changes or access.
4. Automated Processes:
• By automating data transformation tasks, refineries ensure consistency in
data quality. Automated pipelines enable continuous cleaning,
formatting, and filtering of incoming data, keeping the data lake clean and
functional.
5. Enhanced Analytics:
• With properly processed and structured data, data lakes become valuable
repositories that support advanced analytics, such as machine learning or
business intelligence, thereby maximizing the potential of the stored data.
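The cleansing, organization, and automation steps listed above can be combined into one small refinery stage that runs on every incoming batch before it lands in the lake. This is a minimal sketch, not any particular product's API: it assumes readings arrive as Python dicts with hypothetical keys `device_id`, `ts`, and `temp_c`, and an assumed valid temperature range of -40 to 85 °C. It drops records with missing fields, removes duplicates, rejects out-of-range values, and tags the survivors with metadata.

```python
from typing import Iterable

# Assumed plausible range for the hypothetical temp_c field (illustrative only).
VALID_RANGE = (-40.0, 85.0)


def refine(batch: Iterable[dict], source: str) -> list[dict]:
    """Clean, deduplicate, and tag a batch of raw sensor readings."""
    lo, hi = VALID_RANGE
    seen = set()
    refined = []
    for rec in batch:
        key = (rec.get("device_id"), rec.get("ts"))
        # Cleansing: drop records missing required fields.
        if None in key or rec.get("temp_c") is None:
            continue
        # Cleansing: remove duplicates (same device and timestamp).
        if key in seen:
            continue
        # Validation: reject physically implausible values.
        if not lo <= rec["temp_c"] <= hi:
            continue
        seen.add(key)
        # Organization: attach metadata tags so the record is easy to find later.
        refined.append({**rec, "source": source, "quality": "validated"})
    return refined
```

Here rejected records are simply discarded; routing them to a separate quarantine area instead would keep them available for auditing.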
• The business requirements and the use case for the data.
Storing data that no longer has any usefulness can have detrimental effects on your
organization, and these effects compound as the data inventory grows. A clear
retention strategy counters them, for example:
• Reduce Storage Costs: By retaining only essential data and setting specific
timelines for data archiving and deletion, businesses can reduce their storage
infrastructure needs. Unnecessary data is systematically identified and
removed, minimizing the burden on storage systems and reducing costs
associated with data maintenance. Cost-effective storage solutions, like tiered
storage systems or cloud-based archiving, can further help in managing these
expenses.
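Tiered storage combined with deletion timelines can be expressed as a simple age-based rule. In this sketch the tier names and the 30-day and 365-day thresholds are hypothetical placeholders; real values would come from the business requirements, the use case, and any regulatory obligations.

```python
from datetime import datetime, timedelta

# Hypothetical retention thresholds; tune to business and legal requirements.
HOT_DAYS = 30
ARCHIVE_DAYS = 365


def retention_tier(created_at: datetime, now: datetime) -> str:
    """Return the storage action for a record based on its age."""
    age = now - created_at
    if age <= timedelta(days=HOT_DAYS):
        return "hot"      # fast, more expensive storage for recent data
    if age <= timedelta(days=ARCHIVE_DAYS):
        return "archive"  # cheaper cold storage, e.g. a cloud archive tier
    return "delete"       # past the retention period: remove
```

A scheduled job could apply this rule to every record and move or delete data accordingly.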
Examples:
1. Healthcare: A hospital dashboard could display real-time patient data from
IoT-enabled devices (e.g., heart rate monitors, blood pressure sensors). Doctors
can quickly visualize trends and intervene when necessary to prevent medical
emergencies.
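Behind such a dashboard, an alert is usually a rule evaluated on each incoming reading. A minimal sketch, assuming hypothetical heart-rate limits of 50 and 120 beats per minute and requiring the limit to be breached for several consecutive readings so that a single noisy sample does not trigger an alert; these numbers are illustrative, not clinical guidance.

```python
from collections import deque


class HeartRateAlert:
    """Raise an alert when heart rate stays outside a range for N readings."""

    def __init__(self, low: float = 50, high: float = 120, consecutive: int = 3):
        self.low, self.high = low, high
        # Sliding window of "was this reading out of range?" flags.
        self.recent = deque(maxlen=consecutive)

    def update(self, bpm: float) -> bool:
        self.recent.append(bpm < self.low or bpm > self.high)
        # Alert only when the window is full and every reading breached the limit.
        return len(self.recent) == self.recent.maxlen and all(self.recent)
```

Each `True` returned by `update` could be shown on the dashboard as an alert marker on the patient's trend line.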
• Simplicity: Keep designs clean and simple to avoid overwhelming users with too
much information. Limit the number of visual elements to the most critical ones.
• Color Schemes: Use color schemes that are easy to interpret and accessible,
considering color blindness and contrast issues. Employ colors meaningfully
(e.g., red for alerts, green for optimal performance).
• Interactivity: Incorporate interactive features like filters, drill-down capabilities,
and tooltips to allow users to explore data in depth.
• User Testing: Conduct user testing sessions to gather feedback on the usability
and effectiveness of the visualizations. Observe how users interact with the
visual analysis.
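The color-scheme guideline can be made concrete with a status function that returns both a text label and a color, so the meaning never relies on color alone, which helps users with color blindness. The threshold parameters and hex colors are hypothetical choices, and the sketch assumes higher metric values are worse.

```python
def status_style(value: float, warn: float, critical: float) -> tuple[str, str]:
    """Map a metric value to (label, hex color); higher values are assumed worse."""
    if value >= critical:
        return ("CRITICAL", "#d62728")  # red for alerts
    if value >= warn:
        return ("WARNING", "#ff7f0e")   # orange for caution
    return ("OK", "#2ca02c")            # green for optimal performance
```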
Example: GPS positional data from an IoT device, where each instance in the dataset is
a combination of time and location.
2. List all the variations, categories, calculations, and descriptions that are
added to or derived from the data:
• Latitude
• Longitude
• Speed
• Daytime or nighttime
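The derived fields in step 2 can be computed from consecutive GPS fixes. This sketch assumes each fix is a tuple `(utc_time, latitude, longitude)` sorted by time, computes speed from the great-circle (haversine) distance between successive fixes, and uses a deliberately crude daytime rule (local hour between 06:00 and 18:00 for a fixed, hypothetical UTC offset) in place of a real sunrise/sunset calculation.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def derive_fields(fixes: list[tuple[datetime, float, float]],
                  utc_offset_hours: float = 0) -> list[dict]:
    """Add speed (km/h) and a daytime flag to time-ordered GPS fixes."""
    rows = []
    for i, (t, lat, lon) in enumerate(fixes):
        speed = None  # the first fix has no predecessor to measure against
        if i > 0:
            t0, lat0, lon0 = fixes[i - 1]
            hours = (t - t0).total_seconds() / 3600
            if hours > 0:
                speed = haversine_km(lat0, lon0, lat, lon) / hours
        local_hour = (t + timedelta(hours=utc_offset_hours)).hour
        rows.append({"lat": lat, "lon": lon, "speed_kmh": speed,
                     "daytime": 6 <= local_hour < 18})
    return rows
```

For instance, two fixes one hour apart and one degree of longitude apart on the equator give a speed of roughly 111 km/h.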
3. Review each item for how often they are likely to be used in analysis, ML
modeling, or reporting:
For example, the exact local time and the day of the week were eliminated.
This is based on the expected frequency of use versus the storage and
computational costs of keeping the information.
This is a balancing act, and your decision may shift over time as different fields
become more or less valuable to your business.
4. Create data transformation code that automatically creates and maintains the
information in one table:
• The goal is to do this in an automated fashion so that the data scientists do not
have to recreate it each time they need it.
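Step 4 could be automated with a job that writes the derived fields into a single table, so every analyst queries the same prepared data. This sketch uses Python's built-in `sqlite3` module purely for illustration; the table name `gps_enriched` and its columns are hypothetical. Keying the table on the device identifier plus UTC time and using `INSERT OR REPLACE` makes re-running the job safe, because a reprocessed record overwrites its earlier copy instead of duplicating it.

```python
import sqlite3

# Hypothetical single table holding the retained and derived fields.
SCHEMA = """
CREATE TABLE IF NOT EXISTS gps_enriched (
    device_id TEXT NOT NULL,
    utc_time  TEXT NOT NULL,
    lat       REAL,
    lon       REAL,
    speed_kmh REAL,
    daytime   INTEGER,
    PRIMARY KEY (device_id, utc_time)
)
"""


def load_enriched(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Create the table if needed and upsert derived rows into it."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT OR REPLACE INTO gps_enriched "
        "VALUES (:device_id, :utc_time, :lat, :lon, :speed_kmh, :daytime)",
        rows,
    )
    conn.commit()
```

Run on a schedule, this keeps the table current without anyone rebuilding it by hand.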
5. Create a unique identifier for each record, if it does not already exist:
• Here a record is identified by the exact UTC time combined with the unique device
identifier, so a separate ID field needs to be created for that combination.
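A single ID field for step 5 can be derived deterministically from the device identifier and the exact UTC time, so the same reading always receives the same ID. This sketch assumes timestamps are timezone-aware `datetime` objects; it normalizes them to UTC in ISO 8601 form and hashes the combination with SHA-256. A plain concatenated string would also work, at the cost of longer keys.

```python
import hashlib
from datetime import datetime, timezone


def record_id(device_id: str, ts: datetime) -> str:
    """Deterministic record ID built from the device ID and exact UTC time."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # Normalize to UTC so the same instant always yields the same ID.
    utc_iso = ts.astimezone(timezone.utc).isoformat()
    key = f"{device_id}|{utc_iso}"
    # 32 hex characters (128 bits) of SHA-256: collisions are extremely unlikely.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:32]
```

Because the ID depends only on the device and the instant, the same reading reported with different time-zone offsets maps to one ID.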
10. IoT Data Analytics vs. Network Analytics