A5 Data Project Brief
Objectives
This Achievement will introduce a variety of new concepts and processes, many of which fall
within the realm of advanced data analytics. Don’t worry if you don’t get everything right
away, as you’ll revisit some of these more complex topics in Achievement 6. Over the course
of this Achievement, you’ll learn about:
● The characteristics of big data, how data analysts use big data, and the challenges of
extracting knowledge from big data
● The impact of data bias and ethics on how data is used, shared, collected, and
protected
● The fundamentals of data mining, including techniques for data mining and how it
drives decision-making
● Predictive analytics and predictive models, such as linear regression
● Time-series analysis and time-series forecasting
● The basics of GitHub and how you can use it to refine your skills, collaborate with
colleagues, and display a portfolio of work
● What to include in your portfolio when applying for jobs
Context
Data analytics is an exciting profession if you’re curious, enjoy problem solving, and want to
make a difference. Indeed, the satisfaction of being able to measure your impact and drive
decision-making is what attracts many to the profession. The amount of data that’s being
analyzed is minuscule compared to the amount of data that’s being produced. We’ve not
even begun to scratch the surface of big data, which is growing rapidly. While machine
learning can certainly help in this area, vast amounts of knowledge and insights are still
waiting to be found by those with the right skill set and know-how.
Technology and an interconnected world are vastly increasing the amount of data collected.
Everything from how much it rained in Seattle last year to how many text messages were
sent worldwide or which movies were most watched on Netflix becomes part of big data.
However, the ways in which data is collected, used, and shared can be harmful to both
individuals and society. Data collected on individuals in particular comes with responsibility.
For these reasons, the data analyst should be guided by a strong ethical foundation and be
able to discuss ethical concerns with their coworkers and employer.
Besides being aware of data ethics and knowing how to raise ethical concerns with
stakeholders, the data analyst needs to know how to derive useful information from big data.
This is where data mining, predictive analytics, and time-series analysis and forecasting
come into play.
Finally, the data analyst needs to understand GitHub. Not only is GitHub a standard platform
for data analyst portfolios, it’s also a great way for analysts to collaborate on projects and
learn from others in the industry. What’s more, when you apply for jobs, displaying your SQL
and Python work on GitHub will impress future employers and serve as proof that you have
the skills they require.
Exercise 1: Intro to Big Data
● Describe the characteristics of structured and unstructured data
● Identify the applications and limitations of big data
● Research software tools for handling big data
Exercise 7: Using GitHub as an Analyst
● Create a GitHub account and repositories
● Host your SQL and Python work from Achievements 3 and 4 on GitHub