Big Data Analytics

Download as pdf or txt
Download as pdf or txt
You are on page 1of 7
At a glance
Powered by AI
The key takeaways are that big data refers to large and complex datasets that are difficult to process using traditional methods. It discusses the need for big data analytics and covers topics like definitions, applications, tools, techniques and challenges.

The characteristics of big data discussed are volume, velocity, variety, veracity and value. Volume refers to the large amount of data. Velocity is the speed at which data is generated and processed. Variety means data can be structured or unstructured. Veracity is the uncertainty of the data. Value is the insights that can be derived.

Some of the tools and techniques discussed are machine learning, natural language processing, data visualization, Hadoop, Spark and Samza platforms.

Big Data Analytics

Suyash Pandey, Mohd Junaid


Department of Electronics and Communication Engineering,
Indian Institute of Information Technology Kota, Rajasthan, India

Email: [email protected], [email protected]

Abstract— This research paper is a day. It has been proven by a lot of surveys that the
comprehensive review of the current and emerging world has generated more than 90% of data in the last
aspects of big data analytics. The term "big data" 2 years . Even after having access to a plethora of
is used to describe data sets that are exceptionally information, people often feel less informed. This
large and complex, requiring more advanced abundance of data has had a significant impact on
processing methods than traditional techniques. businesses, as they must manage and analyze vast
The paper delves into the factors driving the amounts of online content, including social media
posts, videos, and mobile phone records. This type of
growth of big data, including the increased use of
data is commonly known as Big Data.
computerized devices and the popularity of social
media. It also discusses the characteristics of big
data, such as volume, velocity, variety, and veracity, The term "Big Data" refers to large and complex data
and the challenges involved in processing and sets that are generated from various sources such as
analyzing such vast amounts of data. The paper social media, e-commerce transactions, sensors, and
covers various tools and techniques used in big data other digital devices. One of the primary challenges
analytics, including machine learning, natural that we face with Big Data is how to manage and
language processing, and data visualization. analyze it effectively. Primitive data processing tools
Furthermore, it explores the role of big data and techniques are often not enough to handle such
platforms such as Hadoop, Spark, and Samza in massive amounts of data, and businesses need to adopt
enabling organizations to process and analyze these new and efficient technologies and methods to analyze
complex data sets. Additionally, the paper provides and manage it. This requires investment in powerful
an overview of the applications of big data analytics computing systems, data storage, and advanced
in healthcare, education, and sports. It highlights analytics tools that can process and extract valuable
how big data analytics is driving innovation, insights from the data.
improving decision-making, and enhancing
performance in these fields, and provides examples
of successful applications of big data analytics. B. WHAT IS BIG DATA?

Overall, this research paper provides a The concept of Big Data refers to datasets that have
comprehensive overview of big data analytics, grown to such an extent that they become challenging
covering a range of topics related to this rapidly to manage using traditional database management
evolving field. It highlights the importance of big concepts and tools. The complexity can arise from
data analytics in enabling organizations to gain various aspects, such as data capture, storage, search,
valuable insights from large and complex data sets, sharing, analytics, and visualization. Essentially, Big
and provides insights into the tools, techniques, and Data represents a large volume of structured and
platforms that are being used to achieve this goal. unstructured data that surpasses the capacity of
The objective of this review paper is to assist conventional data processing techniques to handle
individuals, such as students and researchers, who effectively.
may have little or no prior knowledge in the field of The three dimensions of Big Data are commonly
Big Data Analytics but are interested in exploring known as the 3Vs: volume, velocity, and variety.
this area of research. By addressing the challenges Recently, two additional dimensions have been added:
discussed in this paper, readers can gain a better veracity and value.
understanding of Big Data Analytics and progress
1. Volume: Refers to the massive amount of data
towards conducting their own research with ease.
generated and collected, which can be challenging to
store and process.
A. INTRODUCTION
2. Velocity: Refers to the speed at which data is
generated and collected, often in real-time, requiring
According to IBM, an enormous amount of data,
real-time processing and analytics.
approximately 2.5 quintillion bytes, is produced every
3. Variety: Refers to the diverse nature of Big Data,
including structured, semi-structured, and unstructured
The utilization of Big Data analytics empowers
data.
businesses to make better-informed decisions based on
real-time data, enabling them to swiftly respond to
changes in the market and improve their competitive
edge by monitoring customer behavior. In summary,
Big Data analytics has become crucial for businesses
to stay competitive in the present data-centric world.

D. CHARACTERISTICS OF BIG DATA


The Characteristics of Big Data is as follows:
• Comprehensive: Big Data consists of a vast amount
of information from various sources and formats.
• Enterprise-ready: Large businesses and organizations
can use big data to meet their needs.
Fig 1. The Five V’s of Big Data • Integrated: It is possible to integrate big data tools
and technologies with other platforms and
programmes.
4. Veracity: Refers to the quality and accuracy of the
data, which is crucial in ensuring reliable analysis. • Open source based: There are numerous open source
and cost-free big data technologies.
5. Value: Refers to the usefulness and relevance of the
insights derived from Big Data, which can help • Low latency reads and updates: Even with massive
organizations make informed decisions and gain a datasets, big data systems enable quick read and write
competitive advantage. performance.
• Scalability: Big Data has the ability to scale both
horizontally and vertically to handle expanding
C. WHY BIG DATA? datasets.
According to a study by the IDC, data is expected • Extensible: Big Data systems can be expanded with
to grow 50-fold by 2020, largely due to the increased new features and capabilities.
use of embedded systems like sensors in clothing,
medical devices, buildings, and bridges. The study • Allows AD HOC queries: Ad-hoc querying is made
found that unstructured data, such as files, email, and possible by big data platforms so that specific insights
video, will make up 90% of all data created over the and data can be extracted.
next decade. However, while the volume of data is • Minimal maintenance: Big Data platforms are made
growing exponentially, the number of IT professionals to need little upkeep and management.
available to manage it will only increase by 1.5 times
today's levels.
E. APPLICATIONS OF BIG DATA ANALYTICS
The digital universe currently comprises 1.8 trillion
gigabytes of data stored in 500 quadrillion files. In this paper, we explore the application of big data
Additionally, the size of the digital universe doubles analytics in research and its potential to transform the
every two years. Comparing the digital universe to our research landscape. We identify key applications of big
physical universe, there are almost as many bits of data analytics in various fields, including healthcare,
information in the digital universe as there are stars in business, social sciences and engineering. Overall, this
our physical universe. paper aims to provide a broad overview of the power
and potential of big data analytics in and highlights its
strengths and limitations, thereby portraying its diverse
Big Data analytics is crucial because conventional applications in different fields.
methods of data processing are not facilitated to handle
extraordinarily large amounts of data and information
in the present world. In today’s world, big firms have I. Big Data in the Education Industry:
access to an ample amount of sensitive information
such as the trends in the market, needs of the customer, The educational system maintains extensive databases
their behavior, etc. of information about students, teachers, courses, etc.
Better decisions can be made in many different fields data-driven decisions to protect citizens and manage
with the help of big data analysis, which will lead to crises.
efficient operations and boost overall performance.
Several areas where big data has aided in change
include: III. Big Data in the Media and Entertainment
Industry:

● Personalized Learning: We now understand that social media is one of the


Big data analytics enables personalized learning biggest and most significant platforms for producing
experiences based on student data analysis. big data, and that it also makes extensive use of big
It helps educators understand how students learn and data analytics. Among the ways that big data analysis
adjust the teaching method accordingly to improve is beneficial are:
academic outcomes.
● Audience Analytics:
Big data analytics enables media and entertainment
● Resource Allocation:
companies to understand their audience better.
Big data analytics helps schools and universities
It helps companies analyze data from various sources
allocate resources efficiently to maximize student
to identify audience preferences, behaviors, and trends,
success. It allows educational institutions to monitor
and create content that resonates with them.
and adjust budgets, staffing, and classroom resources
based on data-driven insights.
● Content Personalization:
Big data analytics enables media and entertainment
II. Big Data in the Government sector: companies to personalize their content to individual
viewers.
Since it is the national government, it has records of It helps companies use data analysis to create
almost everything. This is extremely widespread and personalized recommendations for their audience,
beneficial in a variety of ways, some of which are enhancing their viewing experience and improving
listed below: retention.

● Policy and Decision Making:


Big data analytics enables governments to make more
informed and data-driven policy and decision-making
processes.
It helps governments analyze large amounts of data
from various sources to identify trends, patterns, and
insights that inform policy decisions.

● Citizen Services:
Big data analytics enables governments to improve
citizen services by analyzing data on citizen needs and
preferences. Fig 2. Applications of Big Data Analytics
It helps governments provide more personalized and
efficient services, such as healthcare, public safety, and IV. Big Data in Business Intelligence:
transportation.
Big data has had a significant impact on the field of
● Emergency Management: business intelligence. It has enabled companies to
Big data analytics enables governments to respond to analyze large and complex data sets, including
emergency situations more efficiently and effectively. unstructured data sources such as social media and
It allows governments to analyze large amounts of data sensor data. Big data technologies like Hadoop and
in real-time to identify critical information and make Spark have allowed companies to store and process
data at scale, and real-time analytics have become Big data analytics enables sports teams to monitor
possible, enabling companies to respond to changes in player health and prevent injuries by analyzing data
the market quickly. from wearable devices and other health monitoring
tools.
● Data Integration: It helps coaches and team managers identify potential
Big data analytics enables businesses to integrate data injury risks and take preventative measures, such as
from multiple sources, including structured and adjusting training routines and rest schedules.
unstructured data.
It helps businesses create a unified view of their data ● Fan Engagement:
and gain insights that would be difficult to obtain Big data analytics enables sports teams to engage with
using traditional BI tools. their fans by analyzing data from social media, website
traffic, and other sources.
● Predictive Analytics: It helps teams create targeted marketing campaigns,
Big data analytics enables businesses to use predictive improve fan experiences, and optimize revenue
models to identify trends, patterns, and opportunities. streams by delivering personalized messaging and
It helps businesses use data analysis to make informed offerings.
predictions about future outcomes, such as customer
behavior, market trends, and sales. VI. Big Data Analytics in Healthcare:

● Real-Time Reporting: Big Data has the potential to revolutionize the


Big data analytics enables businesses to generate healthcare industry by improving patient outcomes and
real-time reports and dashboards based on up-to-date reducing costs. With the immense amount of data
data. generated from electronic health records, medical
It helps businesses make faster, data-driven decisions, devices, and patient monitoring systems, healthcare
identify opportunities and threats as they arise, and providers can gain valuable insights into patient health
respond quickly to changing market conditions. patterns, treatment effectiveness, and disease
prevention.
V. Big Data Analytics in Sports:
● Improving patient outcomes:
Big data analytics has emerged as a game-changer in By analyzing large amounts of patient data, including
the field of sports, providing teams and athletes with medical history, clinical notes, and imaging data,
valuable insights to improve performance, make healthcare providers can gain insights into patient
strategic decisions, and gain a competitive edge. By health and make more informed treatment decisions,
analyzing large volumes of data generated from leading to better outcomes.
various sources, including wearable devices, cameras,
and sensors, teams and athletes can gain insights into ● Enhancing population health management:
performance metrics such as speed, endurance, and Big data analytics can help healthcare providers
technique. identify patterns and trends in patient health, enabling
them to identify high-risk populations and develop
● Player Performance Analysis: targeted prevention and intervention programs.
Big data analytics enables sports teams to analyze ● Reducing costs:
player performance using a range of metrics, including By analyzing data on patient care and outcomes,
physical performance and gameplay data. healthcare providers can identify areas for
It helps coaches and team managers make data-driven improvement and implement cost-saving measures,
decisions about player recruitment, team strategy, and such as reducing hospital readmissions and optimizing
game tactics. medication use.

● Injury Prevention and Management:


F. PLATFORMS USED FOR BIG DATA features and benefits of the Spark platform for Big
ANALYTICS Data Analytics:

In the realm of big data, Hadoop, Spark, and Samza ● Speed and Efficiency:
are three widely used platforms for data analysis. Big Spark is designed for speed and efficiency, with
data platforms are software frameworks designed to in-memory processing capabilities that can
process and analyze large and complex data sets. significantly improve the performance of data
processing tasks.
I. Hadoop Platform: It also supports parallel processing, which allows it to
distribute workloads across multiple nodes in a cluster.
Hadoop is an open-source, distributed data processing
system that enables distributed and parallel processing ● Scalability:
of large amounts of data across clusters of computers. Spark is highly scalable, meaning it can handle large
Overall, Hadoop provides low cost, high efficiency, datasets and can be easily scaled up or down as data
high scalability, and high fault tolerance for processing volumes change.
large datasets. It also integrates with other big data technologies such
as Hadoop, making it a versatile platform for
managing and processing big data.

Fig 3. Infrastructure of Hadoop Platform

Hadoop consists of the HDFS distributed file system,


MapReduce for processing data, and several
general-purpose tools.
MapReduce is a parallel programming paradigm for Fig 4. Infrastructure of Spark Platform
processing large datasets across computer clusters.
HDFS is a distributed, scalable and portable file III. Samza Platform
system for storing large amounts of unstructured data.
Hive is a data warehouse infrastructure built on top of Apache Samza is an open-source distributed stream
Hadoop, providing summarization, query and analysis processing framework that is built to handle
of big datasets using SQL-like language. high-volume real-time data streams. Here are some
Mahout is an open-source machine learning and data features and benefits of the Samza platform for Big
mining algorithm set based on Hadoop. Data Analytics:

II. Spark Platform: ● Scalability:


Samza is designed to scale horizontally across a cluster
Apache Spark is an open-source distributed computing of machines, enabling it to handle large volumes of
system that can process large volumes of data in data in real-time.
parallel across a cluster of computers. Here are some
It can also integrate with other big data technologies, It can support sub-second latency for critical
including Apache Kafka and Apache Hadoop, for applications, making it ideal for use cases such as
added scalability and flexibility. real-time fraud detection and monitoring.

● Fault Tolerance: G. TECHNIQUES USED IN BIG DATA


Samza is built to handle failures in the data pipeline, ANALYTICS
allowing it to automatically recover from system
failures and continue processing data without data loss Techniques for big data analytics involve various
or corruption. methods and algorithms that help process and analyze
It uses checkpointing and state management large and complex datasets. Some commonly used
techniques to ensure data consistency and fault techniques include:
tolerance. 1. Machine learning: This involves training models to
recognize patterns in data and make predictions or
● Stream Processing: classifications.
Samza is designed for stream processing, meaning it
can handle real-time data streams and process them as 2. Natural language processing (NLP): This involves
they arrive. processing and analyzing human language data to
It supports complex event processing, filtering, derive insights and make predictions.
aggregation, and transformations, enabling users to
build a wide range of stream processing applications. 3. Data mining: This involves using statistical and
machine learning algorithms to identify patterns and
relationships in large datasets.

4. Text analytics: This involves extracting insights


from unstructured text data, such as social media posts,
customer reviews, and news articles.

5. Predictive analytics: This involves using historical


data to make predictions about future events or trends.

6. Visualization: This involves creating graphical


representations of data to help analysts identify
patterns and trends.

Fig 5. Infrastructure of Samza Platform H. CHALLENGES IN BIG DATA ANALYTICS

● Ease of Use: Big Data refers to a large volume of data that is


Samza is designed to be developer-friendly, with an difficult to manage and process. To address this issue,
easy-to-use API and simple deployment and Big Data Analytics techniques are used to analyze the
management tools. data and make it more relevant by eliminating
It also supports a range of programming languages, redundant and irrelevant data. Although there are
including Java, Scala, and Python, making it accessible several benefits of using these techniques, they also
to a broad range of developers. come with various challenges, such as security and
privacy concerns, complex coding, and
● Low Latency: implementation difficulties. It is essential for
Samza is built for low-latency processing, enabling it researchers to focus on developing optimal solutions to
to analyze and process data in real-time with minimal overcome these challenges.
delay.
Data Challenges : The primary challenge in dealing decision-making, improved customer satisfaction, and
with Big Data is managing its storage, processing, and enhanced operational efficiency. However, there are
identifying hidden patterns. The key areas to focus on also challenges such as data security and privacy that
for overcoming these data challenges are data storage, need to be addressed. With the right tools and
velocity of data, variety of data, computational power, techniques, these challenges can be overcome, and
understanding of data, quality of data, and data organizations can fully realize the benefits of big data
representation. analytics.

Management Challenges: Currently, many businesses


have integrated Big Data technologies, making it K. REFERENCES
crucial for them to select the optimal Big Data [1] Tanya Garg and Surbhi Khullar, “ Big Data
Analytics techniques to maximize their benefits. These Analytics: Applications, Challenges & Future
organizations must also manage changes effectively Directions ” 2020 8th International Conference on
Reliability, Infocom Technologies and Optimization
and in a timely manner. Some of the challenges they (Trends and Future Directions) (ICRITO) Amity
face include leadership, talent management, University, Noida, India. June 4-5, 2020
technology management, decision-making ability, and [2] Papoj Thamjaroenporn and Tiranee Achalakul, “
the overall company environment. Big Data Analytics Framework for Digital
Government ” 2020 1st International Conference on
Security Issues: Data security is a major concern in big Big Data Analytics and Practices (IBDAP)
data analytics. The sheer volume of data and the [3] KENNETH LI-MINN ANG, FENG LU GE and
complex nature of the analysis makes it vulnerable to AND KAH PHOOI SENG, “ Big Educational Data &
security breaches. This can result in unauthorized Analytics: Survey, Architecture and Challenges ”
Received April 10, 2020, accepted April 25, 2020, date
access, data leaks, and other malicious activities. of publication May 14, 2020, date of current version
Protecting sensitive information and ensuring data July 2, 2020
privacy is critical, and requires robust security
[4] Sachchidanand Singh and Nirmala Singh, “
measures such as access control, encryption, and Nirmala Singh ” 2012 International Conference on
monitoring. Communication, Information & Computing
Technology (ICCICT), Oct. 19-20, Mumbai, India
I. ACKNOWLEDGEMENT [5] Uma Narayanan, Dr. Varghese Paul and Dr.
Varghese Paul, “ Different Analytical Techniques for
We would like to express our sincere gratitude to our Big Data Analysis: A Review ” International
Conference on Energy, Communication, Data
mentors, Dr. Vinita Tiwari and Dr. Amit Kumar Garg,
Analytics and Soft Computing (ICECDS-2017)
for their guidance and support throughout the
development of this research paper. Their unwavering [6] Alka Londhe and PVRD Prasada Rao, “ PVRD
Prasada Rao ” International Conference on Energy,
support and encouragement have motivated us to Communication, Data Analytics and Soft Computing
explore new avenues of research and to strive for (ICECDS-2017)
excellence in all aspects of our work. Their mentorship
[7] Isaac Kofi Nti, Juanita Ahia Quarcoo, Justice
has been a source of inspiration and motivation, and Aning, and Godfred Kusi Fosu, “ Isaac Kofi Nti,
we are honored to have had the opportunity to work Juanita Ahia Quarcoo, Justice Aning, and Godfred
under them. Kusi Fosu ” BIG DATA MINING AND ANALYTICS
ISSN 2096 -0654 01/06 pp 81 – 97 Volume 5 ,
Number 2 , June 2022 DOI : 10.26599 / BDMA . 2021
J. CONCLUSION . 9020028
The outcome of our research paper on big data
analytics covers various topics, including definitions,
the need for it, its applications, tools, and techniques.
In conclusion, big data analytics plays a critical role in
addressing the challenges of processing and analyzing
large and complex data sets. The implementation of
big data analytics in various industries has led to better

You might also like