DTA First Lecture
[Figure: chart of yearly values for 2012 through 2021-2022; plotted values range from 340 to 775]
Data Generation Platform: Social Media Platforms
IoT Devices
General Website Statistics
According to Forbes:
• There were about 1.09 billion websites on the internet in 2024.
• About 252,000 new sites are created every day.
• Among these, 192,888,216 sites are active.
• A new website is built every three seconds.
• 71% of businesses had a website in 2023.
• 29% of business is conducted online.
• Google is the most visited website, with 85.1 billion visitors.
• YouTube is the second most visited site, with over 33 billion visitors.
• In North America, 45% of web traffic comes from mobile devices.
Welcome to the World of Big Data
Characteristics of Big Data
▪ Volume: This refers to the huge amount of data generated by various sources.
– The internet generates 2.5 quintillion bytes of data daily (SG Analytics, 2020).
– IoT devices generate 500 zettabytes of data annually.
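To put the volume figures above on a common scale, the following is a minimal sketch that converts them using decimal (SI) prefixes. The variable and function names are illustrative, and the 365-day year is an assumption; only the two quoted figures come from the slide.

```python
# SI (decimal) byte prefixes. "Quintillion" is taken as 10**18 (short scale).
EXA = 10**18
ZETTA = 10**21

# Figures quoted on the slide.
internet_bytes_per_day = 25 * 10**17      # 2.5 quintillion bytes per day
iot_bytes_per_year = 500 * ZETTA          # 500 zettabytes per year


def to_unit(n_bytes, unit):
    """Express a byte count in the given unit (e.g. EXA or ZETTA)."""
    return n_bytes / unit


# 2.5 quintillion bytes is the same quantity as 2.5 exabytes.
internet_eb_per_day = to_unit(internet_bytes_per_day, EXA)

# Spread the annual IoT figure over an assumed 365-day year.
iot_zb_per_day = to_unit(iot_bytes_per_year, ZETTA) / 365

print(f"Internet: {internet_eb_per_day} EB per day")
print(f"IoT: {iot_zb_per_day:.2f} ZB per day")
```

Expressing both figures per day makes it easier to compare sources that report volume over different time spans.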
Characteristics of Big Data
• Velocity: This refers to the speed at which data is generated from
these sources.
• Variety: This refers to the different types of data generated from
these sources.
• Veracity: This refers to the accuracy and reliability of the data
generated from these sources.
• Value: This refers to the usefulness and relevance of the data
generated from these sources.
Challenges Posed by Big Data
▪ Data Management
▪ Data Privacy and Security
▪ Data Integration
▪ Scalability
▪ Data Processing Speed
▪ Data Governance
▪ Skills Gap
Categories of Data
▪ Unstructured data: data that do not have a predefined format or structure.
▪ Examples of unstructured data include emails, social media posts, audio recordings, and text documents.
Categories of Data
▪ Semi-structured data: data that have been partially organized but do not yet follow a well-defined format.
▪ Examples of semi-structured data include text in XML files, JSON files, and NoSQL databases.
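As an illustration of what "partially organized" means in practice, here is a minimal sketch that reads the same made-up sensor reading from a JSON string and an XML string using only the Python standard library. The sample records and field names are hypothetical; the point is that the tags and keys describe the data, even though no fixed table schema is enforced.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample records (not from the lecture).
json_text = '{"device": "sensor-1", "tags": ["iot"], "reading": {"temp_c": 21.5}}'
xml_text = '<reading device="sensor-1"><temp unit="C">21.5</temp></reading>'

# JSON: keys and nesting carry the structure; fields may vary per record.
record = json.loads(json_text)
json_temp = record["reading"]["temp_c"]

# XML: element names and attributes carry the structure.
root = ET.fromstring(xml_text)
xml_device = root.get("device")
xml_temp = float(root.find("temp").text)

print(record["device"], json_temp)
print(xml_device, xml_temp)
```

Both formats can be navigated by name, which is what separates them from unstructured text, but neither guarantees that every record has the same fields.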
Categories of Data
i. Descriptive Analytics