Big Data Report
Micro-Project Report On
“Big Data”
Submitted By
Polytechnic Rajkot
October - 2024
CERTIFICATE
Place : Rajkot
INDEX
Introduction to the Topic
Big data refers to extremely large datasets that are too complex or vast for traditional data processing tools to handle.
This data can come from various sources, like social media, sensors, transactions, and more.
The key characteristics of big data are often referred to as the three V's: volume (the amount of data), velocity (the speed at which data is generated), and variety (the different types of data).
The goal of big data analytics is to extract meaningful insights and information from these massive datasets.
It involves using advanced tools and techniques like machine learning, data mining, and statistical analysis to process and analyze the data.
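To make the volume and velocity characteristics more concrete, here is a minimal Python sketch of single-pass (streaming) aggregation. It assumes readings arrive one at a time from a hypothetical `sensor_stream` source with simulated values. Instead of loading the whole dataset into memory, it updates a running mean and variance with Welford's online algorithm, so memory use stays constant no matter how many readings arrive. The function names and data are illustrative only.

```python
import random
from typing import Iterable, Iterator, Tuple


def sensor_stream(n: int, seed: int = 0) -> Iterator[float]:
    """Hypothetical data source: yields n simulated temperature readings one at a time."""
    rng = random.Random(seed)
    for _ in range(n):
        yield 20.0 + rng.gauss(0.0, 2.0)


def running_stats(values: Iterable[float]) -> Tuple[int, float, float]:
    """Single-pass count, mean and sample variance (Welford's online algorithm).

    Only three numbers are kept in memory, however many values arrive.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


if __name__ == "__main__":
    # The stream is consumed lazily; the readings are never stored as a list.
    n, mean, var = running_stats(sensor_stream(100_000))
    print(f"readings={n} mean={mean:.2f} variance={var:.2f}")
```

Frameworks such as Apache Spark or Apache Flink spread this kind of computation across many machines; the sketch runs on one.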
Here is a brief history of big data technology:
1940s: The concept of the "information explosion" emerged, highlighting the rapid growth of data [1].
1960s-1970s: Mainframe computers came into widespread use, significantly increasing data storage capacities [2].
1980s: The development of relational databases allowed for more efficient data organization and retrieval.
1990s: The internet boom led to an exponential increase in data generation and a need for better data management tools.
2000s: Hadoop, an open-source framework for distributed storage and processing of large datasets, was developed [2].
2010s: Big data analytics became mainstream, with advances in machine learning, AI, and cloud computing.
2020s: Real-time data processing and the Internet of Things (IoT) have further expanded the scope and capabilities of big data technologies.
How Big Data Works
Data Collection: Gathering large amounts of data from various sources like social media, sensors, transactions, and more.
Data Storage: Using scalable storage solutions like the Hadoop Distributed File System (HDFS) or cloud storage to store the vast datasets.
Data Processing: Leveraging tools like Apache Spark, Hadoop MapReduce, or real-time processing frameworks like Apache Flink to process and manage the data.
Data Analysis: Applying analytical techniques and machine learning algorithms to uncover patterns, correlations, and insights. This can be done using tools like R, Python, or RapidMiner.
Data Visualization: Presenting the data in an understandable format using visualization tools like Tableau, Power BI, or Sisense. This helps in making data-driven decisions.
These steps form a cycle where the insights gained can influence further data collection and analysis, continuously improving the decision-making process.
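The processing step can be illustrated with the classic word-count example, written here as a plain-Python imitation of the MapReduce programming model. This is a sketch under simplifying assumptions, not the Hadoop or Spark API: the map, shuffle and reduce phases run sequentially in one process, whereas a real framework runs them in parallel across a cluster and performs the shuffle itself. The input "posts" are made-up sample records.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (the framework does this in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    # Collection: `records` stands in for data gathered from any source.
    mapped = (pair for line in records for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    posts = [
        "big data needs big tools",
        "data drives decisions",
    ]
    counts = word_count(posts)
    # Analysis: rank words by frequency (ties broken alphabetically).
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
    print(top)  # [('big', 2), ('data', 2), ('decisions', 1)]
```

Because each map call looks at one record and each reduce call looks at one key, both phases can be split across machines independently, which is what lets the model scale to very large datasets.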
Uses of Big Data
1. Healthcare: Analyzing large datasets from electronic health records to improve patient care, predict disease outbreaks, and optimize treatment plans.
Advantages of Big Data
1. Better Decision Making: Data-driven insights help organizations make more informed and accurate decisions.
Limitations and Disadvantages
1. Data Quality: Ensuring the accuracy and reliability of massive datasets can be challenging.
4. Cost: Infrastructure and tools for storing and processing big data can be expensive.
5. Data Overload: The sheer volume of data can make it difficult to extract meaningful insights without proper techniques.
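The data-quality challenge in item 1 can be illustrated with a minimal sketch of rule-based record validation in Python. It assumes a hypothetical health-record schema with `patient_id` and `heart_rate` fields and a made-up plausibility range; a real system would define its own schema and rules. Each record is checked for missing fields, non-numeric values, out-of-range values and duplicate IDs, and the results are tallied in a single pass.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Optional, Set

REQUIRED_FIELDS = ("patient_id", "heart_rate")  # hypothetical schema
HEART_RATE_RANGE = (20, 250)  # hypothetical plausibility bounds, beats per minute


def first_problem(record: Dict[str, Any], seen_ids: Set[str]) -> Optional[str]:
    """Return the first rule the record breaks, or None if it passes every check."""
    for name in REQUIRED_FIELDS:
        if record.get(name) is None:
            return "missing field"
    rate = record["heart_rate"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(rate, bool) or not isinstance(rate, (int, float)):
        return "non-numeric value"
    low, high = HEART_RATE_RANGE
    if not low <= rate <= high:
        return "out of range"
    if record["patient_id"] in seen_ids:
        return "duplicate id"
    seen_ids.add(record["patient_id"])
    return None


def quality_report(records: Iterable[Dict[str, Any]]) -> Counter:
    """Tally valid records and each kind of problem in a single pass."""
    tally: Counter = Counter()
    seen_ids: Set[str] = set()
    for record in records:
        tally[first_problem(record, seen_ids) or "valid"] += 1
    return tally


if __name__ == "__main__":
    sample = [
        {"patient_id": "p1", "heart_rate": 72},
        {"patient_id": "p2", "heart_rate": None},
        {"patient_id": "p3", "heart_rate": 900},
        {"patient_id": "p1", "heart_rate": 80},
        {"patient_id": "p4", "heart_rate": "fast"},
    ]
    print(quality_report(sample))
```

At big data scale the same checks would typically run inside a distributed job rather than a single loop, but the rules themselves would look much the same.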
Future Expansion in Big Data
1. Artificial Intelligence (AI) and Machine Learning (ML): AI and ML will play an increasingly important role in big data analysis, helping businesses make sense of vast amounts of data quickly and accurately [1].