Unit 06 Assignment 1 Frontsheet
Student declaration
I certify that the assignment submission is entirely my own work and I fully understand the consequences of plagiarism. I
understand that making a false declaration is a form of malpractice.
Table of Contents
A. Introduction
B. Content
I. P1 Demonstrate qualitative and quantitative research methods to generate relevant primary data for an identified theme
1. What is a primary source?
2. Primary sources in big data
II. P2 Examine secondary sources to collect relevant secondary data and information for an identified theme
1) What are secondary sources? (Valcheva, 2023)
2) Secondary sources in big data
III. P3 Discuss the features and operational areas of a business in an identified sector
IV. P4 Discuss the role of stakeholders and their impact on the success of a
C. Conclusion
References
Figure 1: What is big data
Figure 2: The 5 V's
Figure 3: Volume
Figure 4: Velocity
Figure 5: Variety
Figure 6: Veracity
Figure 7: Value
Figure 8: Primary source
Figure 9: Secondary data
Figure 10: Hadoop
Figure 11: Apache Spark
Figure 12: NoSQL
Figure 13: Machine Learning and AI
Figure 14: Cloud Computing
Figure 15: Data Warehouses
Figure 16: Data Integration Tools
Figure 17: Data Visualization Tools
Figure 18: Blockchain
Figure 19: Edge Computing
A. Introduction
Big data is a term for data sets so large and complex that traditional data processing applications cannot handle them.
Its challenges include analysis, collection, data monitoring, search, sharing, storage, transmission, visualization,
querying, and privacy. The term often refers simply to the use of predictive analytics, user behavior analytics, or other
advanced data analytics methods that extract value from data, and seldom to a particular size of data set.[2] “There is
little doubt that the amount of data available today is truly large, but that is not the most relevant characterization of
this new data ecosystem.”[3]
Big data typically consists of data collections whose size far exceeds the ability of conventional software tools to collect,
display, manage, and process the data in an acceptable time. The size that counts as big data is a constantly moving target.
As of 2012, it ranged from a few dozen terabytes to many petabytes of data. Big data requires a new set of integrated
techniques and technologies to mine insights from diverse, complex, and large-scale data sets. (bigdata, 2023)
The 5 V's of big data (velocity, volume, value, variety and veracity) are the five main and innate characteristics of big data.
Knowing the 5 V's allows data scientists to derive more value from their data while also allowing the scientists' organization to
become more customer-centric. (the 5V bigdata, 2023)
Figure 2: The 5 V's
Volume:
o Volume, the first of the 5 V's of big data, refers to the amount of data that exists. Volume is like the base of big data, as
it is the initial size and amount of data that is collected. If the volume of data is large enough, it can be considered big
data. What is considered to be big data is relative, though, and will change depending on the available computing
power that's on the market.
Figure 3: Volume
Velocity:
o The next of the 5 V's of big data is velocity. It refers to how quickly data is generated and how quickly that data moves.
This matters for companies that need their data to flow quickly, so that it is available at the right times to
make the best business decisions possible.
Figure 4: Velocity
Variety:
o The next V in the 5 V's of big data is variety. Variety refers to the diversity of data types. An organization might
obtain data from a number of different data sources, which may vary in value. Data can come from sources both inside
and outside an enterprise. The challenge in variety concerns the standardization and distribution of all data being
collected.
Figure 5: Variety
Veracity:
o Veracity is the fourth V in the 5 V's of big data. It refers to the quality and accuracy of data. Gathered data could have
missing pieces, may be inaccurate or may not be able to provide real, valuable insight. Veracity, overall, refers to the
level of trust there is in the collected data.
Figure 6: Veracity
Value:
o The last V in the 5 V's of big data is value. This refers to the value that big data can provide, and it relates directly to
what organizations can do with that collected data. Being able to pull value from big data is a requirement, as the value
of big data increases significantly depending on the insights that can be gained from it.
Figure 7: Value
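Veracity can be made concrete with a simple measurement. The following is a minimal Python sketch and not a method taken from the cited sources: the sample records and the function name completeness_ratio are hypothetical, and the function measures only one facet of veracity, namely the share of records that have no missing required fields.

```python
from typing import Any, Dict, List


def completeness_ratio(records: List[Dict[str, Any]], required: List[str]) -> float:
    """Return the fraction of records in which every required field is present and not None."""
    if not records:
        return 0.0
    complete = sum(
        1
        for rec in records
        if all(rec.get(field) is not None for field in required)
    )
    return complete / len(records)


# Hypothetical sample data: record "B" is missing its amount.
sample = [
    {"customer": "A", "amount": 120.0},
    {"customer": "B", "amount": None},
    {"customer": "C", "amount": 75.5},
    {"customer": "D", "amount": 10.0},
]

ratio = completeness_ratio(sample, ["customer", "amount"])
print(f"{ratio:.2f}")  # 0.75
```

A low ratio signals that the collected data may need cleaning before it can be trusted for analysis.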
B. Content
I. P1 Demonstrate qualitative and quantitative research methods to generate
relevant primary data for an identified theme.
1. What is a primary source?
Primary sources are the most direct evidence of a time or event because they were created by people or things that were
present at that time or event. These sources offer original thought and have not been modified by interpretation. Primary
sources are original materials, regardless of format. (Chrisite, 2021)
o Apache Spark: Spark originated at UC Berkeley in 2009 and became a top-level Apache project in 2014. It quickly gained
popularity as a faster big data processing platform than MapReduce, largely because it can keep intermediate data in
memory. It supports near-real-time stream processing and machine learning.
o NoSQL Databases: NoSQL database systems began as a reaction against traditional relational databases. These systems
have evolved since the early 2000s and provide large-scale data storage with flexible schemas. Systems like
MongoDB and Cassandra have become popular for storing and querying big data.
Figure 12: NoSQL
o Machine Learning and AI: Machine learning and artificial intelligence have been integrated into many big data tools
and platforms. Machine learning algorithms and deep learning capabilities continue to advance, helping to build
predictive and analytical models from big data.
o Cloud Computing: Cloud services have grown rapidly since the late 2000s, with providers such as Amazon, Microsoft
and Google providing cloud platforms for big data storage and processing. Services like Amazon Web Services (AWS)
have opened up the possibility of using computing and storage resources on demand.
Figure 14: Cloud Computing
o Data Warehouses: Data warehouses have evolved from traditional online analytical processing (OLAP) systems to
high-performance cloud systems like Amazon Redshift and Google BigQuery. They allow large data sets to be queried at
high speed.
o Data Integration Tools: Data integration tools have evolved to support the aggregation and movement of data from
multiple sources to large data storage and processing systems. Tools like Apache NiFi, Talend and Apache Camel have
become important in this process.
Figure 16: Data Integration Tools
o Data Visualization Tools: Data visualization tools have evolved to help chart and display information from big data in
an easy-to-understand manner. Tools like Tableau and Power BI have become popular in data visualization.
o Blockchain: Blockchain technology has been used to secure and verify transactions in big data-based systems,
especially in industries such as finance and supply chains.
Figure 18: Blockchain
o Edge Computing: Edge computing is evolving to support data processing close to its source, helping to reduce latency
in real-time data response and analysis.
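MapReduce, which Spark is compared with in the list above, processes data in a map step, a grouping (shuffle) step, and a reduce step. The sketch below imitates those three phases in plain single-process Python to show the idea only. It does not use Hadoop or Spark, and the function names and sample lines are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted values by key, as the framework would between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the grouped values to get one count per word."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input; a real cluster would split the lines across many machines.
lines = ["big data needs big tools", "data moves fast"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["big"], counts["data"])  # 2 2
```

In a real cluster the map and reduce functions run in parallel on separate partitions of the data, which is what lets this pattern scale to large volumes.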
IV. P4 Discuss the role of stakeholders and their impact on the success of a
Implementing a big data solution is challenging, so it helps to understand the main challenges and how each can be addressed.
The challenges and their corresponding solutions are:
Managing massive amounts of data:
o Challenge: The growing volume of data can overwhelm traditional data centers.
o Solution: Companies are migrating to cloud storage solutions that can dynamically scale as data storage needs
increase. Big data software is optimized for storing and querying large data volumes efficiently.
Integrating data from multiple sources:
o Challenge: Diverse data from various sources must be integrated for meaningful insights.
o Solution: Employ data integration software, ETL processes, and business intelligence tools to unify data from different
sources into a common structure for accurate reporting and analysis.
Ensuring data quality:
o Challenge: Accurate, clean data is essential for valid insights, but ensuring data quality can be challenging as data
sources and types increase.
o Solution: Use data governance applications to organize, secure, and validate data sources, and implement data quality
software to clean and validate data before processing.
Keeping data secure:
o Challenge: Protecting sensitive data from breaches is critical.
o Solution: Employ cybersecurity professionals to ensure data security, implement encryption, identity and access
controls, endpoint protection, and real-time monitoring to safeguard data.
Selecting the right big data tools:
o Challenge: A plethora of big data tools can be overwhelming.
o Solution: Hire a consultant to help select the right tools based on your business needs. They can choose tools that align
with your current and future requirements, such as enterprise data streaming and ETL solutions.
Scaling systems and costs efficiently:
o Challenge: Inefficient resource utilization can lead to unnecessary costs.
o Solution: Start with clear goals, data strategies, and data types, and create policies for purging obsolete data. This
ensures efficient data processing and cost management.
Lack of skilled data professionals:
o Challenge: Existing staff may lack big data expertise.
o Solution: Hire a big data specialist to manage and train your team or offer training to existing staff. Alternatively,
consider self-service analytics or business intelligence solutions for those without a data science background.
Organizational resistance:
o Challenge: Resistance to change and skepticism about the value of big data.
o Solution: Begin with smaller, demonstrable projects to prove the value of big data. Gradually transition to becoming a
data-driven organization, and consider placing big data experts in leadership roles to guide the transformation.
Addressing these challenges and implementing these solutions will help organizations effectively leverage big data for insights and
business improvements.
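The data integration and data quality solutions above can be illustrated with a small extract, transform, and validate step. This is a minimal Python sketch under stated assumptions: the two source layouts, the field names, and the validation rule are hypothetical and are not taken from any specific ETL or data quality product.

```python
from typing import Any, Dict, List

# Hypothetical records from two sources that use different field names.
crm_rows = [{"id": 1, "full_name": "An Nguyen", "email": "an@example.com"}]
shop_rows = [
    {"customer_id": 2, "name": "Binh Tran", "mail": "binh@example.com"},
    {"customer_id": 3, "name": "", "mail": None},
]


def from_crm(row: Dict[str, Any]) -> Dict[str, Any]:
    """Map a CRM row onto the common structure."""
    return {"customer_id": row["id"], "name": row["full_name"], "email": row["email"]}


def from_shop(row: Dict[str, Any]) -> Dict[str, Any]:
    """Map a web shop row onto the common structure."""
    return {"customer_id": row["customer_id"], "name": row["name"], "email": row["mail"]}


def is_valid(record: Dict[str, Any]) -> bool:
    """Simple quality rule: name and email must be non-empty and the email must contain '@'."""
    return bool(record["name"]) and bool(record["email"]) and "@" in record["email"]


# Integrate: bring both sources into one common structure.
unified: List[Dict[str, Any]] = [from_crm(r) for r in crm_rows] + [from_shop(r) for r in shop_rows]

# Validate: keep clean records and set aside the rest for review.
clean = [r for r in unified if is_valid(r)]
rejected = [r for r in unified if not is_valid(r)]
print(len(clean), len(rejected))  # 2 1
```

Keeping rejected records in a separate list, rather than silently dropping them, makes it possible to trace quality problems back to the source that produced them.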
C. Conclusion
In the 21st century, marked by the age of digitalization and the proliferation of information, Big Data has emerged as a pivotal force
influencing both the economy and society. This comprehensive report delves into the realm of Big Data, exploring its origins,
applications, challenges, and promising prospects.
First and foremost, it's essential to recognize that Big Data is more than just the processing of vast datasets; it entails the discovery,
analysis, and strategic utilization of information within data to generate value. As mentioned, countless businesses and organizations
have harnessed Big Data to enhance operational efficiency, foster innovation, and devise novel solutions for intricate issues.
The potential of Big Data is equally remarkable. With the ascent of artificial intelligence and machine learning, our capacity to dissect
and comprehend data is constantly expanding. Consequently, Big Data is poised to extend its influence further in the future, spanning
domains like healthcare, education, and the management of smart cities.
However, for Big Data to deliver maximum societal benefits, a well-prepared workforce and the right supporting policies are
imperative. Adequate training and education are instrumental in equipping individuals with the skills to access, interpret, and
effectively utilize data. Concurrently, supportive policies can create a conducive business environment for entities leveraging Big
Data.
In summation, Big Data stands as a decisive driver of modern-world development. By comprehending and capitalizing on its
advantages, we can unlock new opportunities and mold a more promising future. Yet, this also demands a collective sense of
awareness and responsibility to ensure the appropriate and beneficial use of Big Data by all.
References
bigdata, 2023. [Online]
Available at: https://vi.wikipedia.org/wiki/D%E1%BB%AF_li%E1%BB%87u_l%E1%BB%9Bn