Unit 06 Assignment 1 Frontsheet


ASSIGNMENT 1 FRONT SHEET

Qualification: BTEC Level 5 HND Diploma in Computing

Unit number and title: Unit 06: Planning a computing project

Submission date:
Date received (1st submission):

Re-submission date:
Date received (2nd submission):

Student name: Nguyen Chi Thanh
Student ID: BH00887

Class: SE06205
Assessor name: Nguyen Trong Hung

Student declaration

I certify that the assignment submission is entirely my own work and I fully understand the consequences of plagiarism. I understand that making a false declaration is a form of malpractice.

Student’s signature: Thanh

Grading grid

P1 P2 P3 P4 M1 M2 D1
❒ Summative Feedback: ❒ Resubmission Feedback:

Grade: Assessor Signature: Date:

IV Signature:
Table of Contents
A. Introduction
B. Content
I. P1 Demonstrate qualitative and quantitative research methods to generate relevant primary data for an identified theme
1. What is a primary source?
2. Primary sources in big data
II. P2 Examine secondary sources to collect relevant secondary data and information for an identified theme
1) What are secondary sources? (Valcheva, 2023)
2) Secondary sources in big data
III. P3 Discuss the features and operational areas of a business in an identified sector
IV. P4 Discuss the role of stakeholders and their impact on the success of a
C. Conclusion
References

Figure 1: What is big data
Figure 2: The 5 V's
Figure 3: Volume
Figure 4: Velocity
Figure 5: Variety
Figure 6: Veracity
Figure 7: Value
Figure 8: Primary source
Figure 9: Secondary data
Figure 10: Hadoop
Figure 11: Apache Spark
Figure 12: NoSQL
Figure 13: Machine Learning and AI
Figure 14: Cloud Computing
Figure 15: Data Warehouses
Figure 16: Data Integration Tools
Figure 17: Data Visualization Tools
Figure 18: Blockchain
Figure 19: Edge Computing
A. Introduction
• Big data is a term for data sets so large and complex that traditional data processing applications cannot handle them. Its challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, querying, and privacy. The term often refers simply to the use of predictive analytics, user behavior analytics, or other advanced analytics methods that extract value from data, and only rarely to a particular size of data set. “There is little doubt that the amount of data available today is truly large, but that is not the most relevant characterization of this new data ecosystem.”
• Big data typically consists of data collections whose size far exceeds the ability of conventional software tools to collect, curate, manage, and process the data in an acceptable time. Big data size is a constantly moving target: as of 2012, it ranged from a few dozen terabytes to many petabytes of data. Big data requires a new set of integrated techniques and technologies to extract insight from diverse, complex, large-scale data sets. (bigdata, 2023)

Figure 1: What is big data

• The 5 V's of big data (velocity, volume, value, variety and veracity) are the five main and innate characteristics of big data. Knowing the 5 V's allows data scientists to derive more value from their data while also allowing their organization to become more customer-centric. (the 5V bigdata, 2023)
Figure 2: The 5 V's

• Volume:
◦ Volume, the first of the 5 V's of big data, refers to the amount of data that exists. Volume is the base of big data, as it is the initial size and amount of data that is collected. If the volume of data is large enough, it can be considered big data. What counts as big data is relative, though, and will change with the computing power available on the market.
Figure 3: Volume

• Velocity:
◦ The next of the 5 V's of big data is velocity. It refers to how quickly data is generated and how quickly that data moves. This matters for companies that need their data to flow quickly, so that it is available at the right times to make the best possible business decisions.
Figure 4: Velocity

• Variety:
◦ The next of the 5 V's of big data is variety. Variety refers to the diversity of data types. An organization might obtain data from a number of different data sources, which may vary in value. Data can come from sources both inside and outside an enterprise. The challenge in variety concerns the standardization and distribution of all the data being collected.
Figure 5: Variety

• Veracity:
◦ Veracity is the fourth of the 5 V's of big data. It refers to the quality and accuracy of data. Gathered data could have missing pieces, may be inaccurate, or may not be able to provide real, valuable insight. Veracity, overall, refers to the level of trust there is in the collected data.
Figure 6: Veracity

• Value:
◦ The last of the 5 V's of big data is value. This refers to the value that big data can provide, and it relates directly to what organizations can do with the collected data. Being able to pull value from big data is a requirement, as the value of big data increases significantly with the insights that can be gained from it.

Figure 7: Value
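
As a rough illustration of how three of these characteristics could be measured on a small batch of records, the sketch below counts the records (volume), the distinct source types (variety), and the share of records with no missing required fields (veracity). The record layout, the field names `source`, `value` and `timestamp`, and the function `profile_batch` are assumptions made for this example only; they do not come from the cited sources or from any particular tool.

```python
from typing import Any


def profile_batch(records: list[dict[str, Any]], required: tuple[str, ...]) -> dict[str, Any]:
    """Return simple volume, variety and veracity indicators for a batch.

    All names here are hypothetical and chosen for illustration.
    """
    # A record counts as complete when every required field is present and not None.
    complete = [r for r in records if all(r.get(f) is not None for f in required)]
    return {
        "volume": len(records),                               # how much data
        "variety": len({r.get("source") for r in records}),   # how many source types
        "veracity": len(complete) / len(records) if records else 0.0,  # share of complete records
    }


# Made-up sample batch: two of the four records have a missing field.
batch = [
    {"source": "sensor", "value": 21.5, "timestamp": 1},
    {"source": "social", "value": None, "timestamp": 2},
    {"source": "sensor", "value": 19.0, "timestamp": 3},
    {"source": "web", "value": 7.0, "timestamp": None},
]

profile = profile_batch(batch, required=("source", "value", "timestamp"))
print(profile)
```

A real system would measure these properties over far larger data sets and with richer quality rules; the point here is only that each "V" can be turned into a concrete, checkable indicator.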
B. Content
I. P1 Demonstrate qualitative and quantitative research methods to generate relevant primary data for an identified theme.
1. What is a primary source?
• Primary sources provide the most direct evidence of a period or event because they were created by people or things that were present at the time. These sources offer original thought and have not been modified by interpretation. Primary sources are original materials, regardless of format. (Chrisite, 2021)

Figure 8: Primary source

2. Primary sources in big data


• IoT sensors: Data from sensors on Internet of Things (IoT) devices such as weather sensors, environmental sensors, and industrial sensors.
• Social media: Data from activities on social media platforms such as Facebook, Twitter, Instagram, and LinkedIn.
• Websites and online applications: Data from user access and interactions on websites and online applications.
• Enterprise databases: Data from enterprise databases such as customer data, transactions, and business information.
• Scientific instruments and measurement tools: Data from scientific and measurement instruments in fields such as the physical sciences, medicine, and many other fields of study.
• Other direct sources: Data collected directly from sources such as surveys, tests, and online transactions.
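
To make the distinction between quantitative and qualitative handling of primary data more concrete, here is a minimal sketch that summarises a handful of survey responses. It computes the average rating (quantitative) and counts how many comments mention each theme (qualitative, in a very simplified keyword form). The sample responses, the `THEMES` list, and the `summarise` function are invented for this illustration and are not data collected for this report.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey responses: a 1-5 rating plus a free-text comment.
responses = [
    {"rating": 4, "comment": "Fast delivery but high price"},
    {"rating": 2, "comment": "Price too high"},
    {"rating": 5, "comment": "Fast and friendly service"},
]

# Hypothetical themes to look for in the comments.
THEMES = ("price", "fast", "service")


def summarise(responses, themes):
    """Return the average rating and the number of comments mentioning each theme."""
    average = mean(r["rating"] for r in responses)  # quantitative summary
    counts = Counter()
    for r in responses:
        words = r["comment"].lower().split()
        for theme in themes:
            if theme in words:  # simple keyword match, counted once per comment
                counts[theme] += 1
    return average, dict(counts)


average_rating, theme_counts = summarise(responses, THEMES)
print(round(average_rating, 2), theme_counts)
```

Keyword matching is only a crude stand-in for real qualitative analysis, which normally involves reading and coding the responses by hand.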

II. P2 Examine secondary sources to collect relevant secondary data and information for an identified theme.
1) What are secondary sources? (Valcheva, 2023)
• Secondary data is data that was collected for another purpose but has some relevance to your current research needs.
• In other words, it was previously collected by someone else, and you can now reuse it.
• Secondary data is existing information: you are not the first to use it, which is why it is called secondary.

Figure 9: Secondary data

2) Secondary sources in big data
• Research reports and materials: Data from research reports, reference works, and published research, such as scientific studies, market reports, and academic documents.
• Data from enterprise systems: Existing records held in enterprise systems, including transaction, customer, and financial data. Customer relationship management (CRM) systems and financial systems are examples.
• Social and media data: Data from social networks, websites, and media applications that has been grouped and analyzed to create secondary data. This can include analyses of social media data, data from media projects, and data from websites.
• Statistical and financial data: Statistics from government statistical agencies, financial institutions, and other financial sources, such as economic data, financial information, and price data.
• Data from research projects: Data from scientific research projects and social statistics, including results from surveys and experiments, which can be reused as secondary data.
• Data from open sources and online communities: Data from open sources and online communities, such as open source projects and online forums, which can be reused as secondary data.

III. P3 Discuss the features and operational areas of a business in an identified sector.
• Here are details on some of the key technologies that have contributed to the growth of big data and how they have evolved:
◦ Hadoop: Hadoop emerged in 2006 as an Apache open-source project, with much of its early development carried out at Yahoo!. It uses the Hadoop Distributed File System (HDFS) for big data storage and MapReduce for data processing. Since then, Hadoop has grown to become one of the most popular distributed platforms for big data. The wider Hadoop ecosystem now includes related Apache projects such as Hive, Pig, and Spark, which support many different applications.
Figure 10: Hadoop

◦ Apache Spark: Spark began as a research project at UC Berkeley in 2009 and became a top-level Apache project in 2014. It gained popularity quickly as a faster big data processing engine than MapReduce, largely because it keeps intermediate data in memory. It also supports near-real-time stream processing and machine learning.

Figure 11: Apache Spark

◦ NoSQL Databases: NoSQL database systems began as a reaction against traditional relational databases. These systems have evolved since the early 2000s and provide large data storage capabilities with a flexible structure. Systems like MongoDB and Cassandra have become popular for storing and querying big data.
Figure 12: NoSQL

◦ Machine Learning and AI: Machine learning and artificial intelligence have evolved continuously and have been integrated into many big data tools and platforms. Machine learning algorithms and deep learning capabilities continue to improve, helping to build predictive and analytical models from big data.

Figure 13: Machine Learning and AI

◦ Cloud Computing: Cloud services have grown rapidly since the late 2000s, with providers such as Amazon, Microsoft and Google offering cloud platforms for big data storage and processing. Services like Amazon Web Services (AWS) have opened up the possibility of using computing and storage resources on demand.
Figure 14: Cloud Computing

◦ Data Warehouses: Data warehouses have evolved from traditional online analytical processing (OLAP) systems to high-performance cloud services like Amazon Redshift and Google BigQuery. They allow large data sets to be queried at high speed.

Figure 15: Data Warehouses

◦ Data Integration Tools: Data integration tools have evolved to support the aggregation and movement of data from multiple sources into large data storage and processing systems. Tools like Apache NiFi, Talend and Apache Camel have become important in this process.
Figure 16: Data Integration Tools

◦ Data Visualization Tools: Data visualization tools have evolved to help chart and display information from big data in an easy-to-understand manner. Tools like Tableau and Power BI have become popular in data visualization.

Figure 17: Data Visualization Tools

◦ Blockchain: Blockchain technology has been used to secure and verify transactions in big data-based systems, especially in industries such as finance and supply chains.
Figure 18: Blockchain

◦ Edge Computing: Edge computing is evolving to support data processing close to its source, helping to reduce latency in real-time data response and analysis.

Figure 19: Edge Computing
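
The MapReduce model that Hadoop uses for data processing can be illustrated with the classic word count. The sketch below is a single-process imitation written in plain Python: it does not use Hadoop or any Hadoop API, and the function names `map_phase`, `shuffle` and `reduce_phase` are chosen for this example. In a real cluster the map and reduce steps would run in parallel on many machines over data stored in HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: combine the values of each key into a single total."""
    return {key: sum(values) for key, values in groups.items()}


# Made-up input lines standing in for a large distributed file.
lines = ["big data needs big storage", "data moves fast"]

mapped = chain.from_iterable(map_phase(line) for line in lines)
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

The value of the model is that each map call only needs one line and each reduce call only needs one key, so the work can be split across machines without changing the logic.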

IV. P4 Discuss the role of stakeholders and their impact on the success of a

Implementing a big data solution is indeed challenging, but understanding the challenges and their solutions is crucial. Let's review the
challenges and their corresponding solutions:
• Managing massive amounts of data:
◦ Challenge: The growing volume of data can overwhelm traditional data centers.
◦ Solution: Companies are migrating to cloud storage solutions that can dynamically scale as data storage needs increase. Big data software is optimized for storing and querying large data volumes efficiently.
• Integrating data from multiple sources:
◦ Challenge: Diverse data from various sources must be integrated for meaningful insights.
◦ Solution: Employ data integration software, ETL processes, and business intelligence tools to unify data from different sources into a common structure for accurate reporting and analysis.
• Ensuring data quality:
◦ Challenge: Accurate, clean data is essential for valid insights, but ensuring data quality becomes harder as data sources and types increase.
◦ Solution: Use data governance applications to organize, secure, and validate data sources, and implement data quality software to clean and validate data before processing.
• Keeping data secure:
◦ Challenge: Protecting sensitive data from breaches is critical.
◦ Solution: Employ cybersecurity professionals to ensure data security, and implement encryption, identity and access controls, endpoint protection, and real-time monitoring to safeguard data.
• Selecting the right big data tools:
◦ Challenge: The sheer number of big data tools can be overwhelming.
◦ Solution: Hire a consultant to help select the right tools based on your business needs. They can choose tools that align with your current and future requirements, such as enterprise data streaming and ETL solutions.
• Scaling systems and costs efficiently:
◦ Challenge: Inefficient resource utilization can lead to unnecessary costs.
◦ Solution: Start with clear goals, data strategies, and data types, and create policies for purging obsolete data. This ensures efficient data processing and cost management.
• Lack of skilled data professionals:
◦ Challenge: Existing staff may lack big data expertise.
◦ Solution: Hire a big data specialist to manage and train your team, or offer training to existing staff. Alternatively, consider self-service analytics or business intelligence solutions for those without a data science background.
• Organizational resistance:
◦ Challenge: Resistance to change and skepticism about the value of big data.
◦ Solution: Begin with smaller, demonstrable projects to prove the value of big data. Gradually transition to becoming a data-driven organization, and consider placing big data experts in leadership roles to guide the transformation.

Addressing these challenges and implementing these solutions will help organizations effectively leverage big data for insights and
business improvements.
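
The integration and data quality solutions above can be sketched as a very small extract-transform-load (ETL) step. In this example, records from two sources with different field names are mapped into one common structure and then validated before use. The source names, field names, validation rule, and sample rows are all assumptions made for illustration; a real project would rely on dedicated integration and data quality tools.

```python
def from_crm(row):
    """Transform a row from a hypothetical CRM export into the common structure."""
    return {"customer_id": str(row["id"]), "email": row["mail"].strip().lower()}


def from_webshop(row):
    """Transform a row from a hypothetical web shop export into the common structure."""
    return {"customer_id": str(row["customer"]), "email": row["email_address"].strip().lower()}


def is_valid(record):
    """Very simple quality rule: an ID must exist and the email must contain '@'."""
    return bool(record["customer_id"]) and "@" in record["email"]


# Made-up sample rows from the two sources.
crm_rows = [{"id": 1, "mail": " Ann@Example.com "}, {"id": 2, "mail": "no-email"}]
webshop_rows = [{"customer": "3", "email_address": "bob@example.com"}]

# Integrate: bring both sources into one list with identical fields.
unified = [from_crm(r) for r in crm_rows] + [from_webshop(r) for r in webshop_rows]

# Validate: keep clean records and set aside the rest for review.
clean = [r for r in unified if is_valid(r)]
rejected = [r for r in unified if not is_valid(r)]
print(clean, rejected)
```

Keeping rejected records instead of silently dropping them makes it possible to trace quality problems back to the source that produced them.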

C. Conclusion
In the 21st century, marked by the age of digitalization and the proliferation of information, Big Data has emerged as a pivotal force
influencing both the economy and society. This comprehensive report delves into the realm of Big Data, exploring its origins,
applications, challenges, and promising prospects.

First and foremost, it's essential to recognize that Big Data is more than just the processing of vast datasets; it entails the discovery,
analysis, and strategic utilization of information within data to generate value. As mentioned, countless businesses and organizations
have harnessed Big Data to enhance operational efficiency, foster innovation, and devise novel solutions for intricate issues.

The potential of Big Data is equally remarkable. With the ascent of artificial intelligence and machine learning, our capacity to dissect
and comprehend data is constantly expanding. Consequently, Big Data is poised to extend its influence further in the future, spanning
domains like healthcare, education, and the management of smart cities.

However, for Big Data to deliver maximum societal benefits, a well-prepared workforce and the right supporting policies are
imperative. Adequate training and education are instrumental in equipping individuals with the skills to access, interpret, and
effectively utilize data. Concurrently, supportive policies can create a conducive business environment for entities leveraging Big
Data.

In summation, Big Data stands as a decisive driver of modern-world development. By comprehending and capitalizing on its
advantages, we can unlock new opportunities and mold a more promising future. Yet, this also demands a collective sense of
awareness and responsibility to ensure the appropriate and beneficial use of Big Data by all.
References
bigdata, 2023. [Online]
Available at: https://vi.wikipedia.org/wiki/D%E1%BB%AF_li%E1%BB%87u_l%E1%BB%9Bn

Chrisite, Q., 2021. [Online]
Available at: https://library.shu.edu/primarysources

the 5V bigdata, 2023. [Online]
Available at: https://www.techtarget.com/searchdatamanagement/definition/5-Vs-of-big-data

Valcheva, S., 2023. [Online]
Available at: https://www.intellspot.com/secondary-data/
