FIT1043 - Lecture 2 - 2024 Slides
Mahsa Salehi
Semester 2, 2024
Additional resources
Job Advertisements:
► communication skills and domain expertise are rated highly
► different jobs require different toolset skills
► see Adzuna’s CV upload page for an interesting application!
Unit Overview in Our Standard Value Chain
Collection: getting the data
Engineering: storage and computational resources
Governance: overall management of data
Wrangling: data preprocessing, cleaning
Analysis: discovery (learning, visualisation, etc.)
Presentation: arguing that results are significant and useful
Operationalisation: putting the results to work
[Figure: value chain stages annotated with unit weeks: Weeks 9-10, Week 3, Week 4, Weeks 5-7, Week 11, Week 12; Weeks 2 & 8 cover tools for data science]
Assessments Overview
Assessments:
• Assignment 1 (Weeks 2, 3, 4)
• Assignment 2 (Weeks 2-7)
• Assignment 3 (Weeks 8, 9, 10)
• Final Exam (Weeks 1-12)
Unit Schedule
Week Activities Assignments
1 Overview of data science
6 Regression analysis
science
► Explain and interpret given Python code
► Comprehend the concept of a dataframe
► Work with data using data pre-processing commands such as aggregating
Introduction to Python for Data Science
From Python Data Science Handbook by J. VanderPlas
The 2023 Top Programming Languages
A desktop graphical user interface (GUI) to use Anaconda
Poll
What is .ipynb?
► Integers
► Floating-Point Numbers
► Boolean
► True/False
► Strings
Integers (int)
int x;
>>> x = 10
>>> print(type(x))
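A minimal sketch of the basic types listed above, checked with type(). The variable names and values are illustrative only and do not come from the slides.

```python
# Illustrative values for each basic Python type (names are hypothetical).
count = 10           # int
price = 7.25         # float
is_valid = True      # bool
label = "data"       # str

# type() reports the type that Python inferred from the assigned value.
for value in (count, price, is_valid, label):
    print(value, type(value).__name__)
```

Note that no type declaration is needed: the type follows from the value assigned.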
Tuple
► Tuples are identical to lists in all aspects except that the contents are immutable (fixed).
► Tuples are defined by round
Dictionary
► Dictionary is similar to a list in that it is a collection of objects.
► Only difference is that list is ordered and indexed by their
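A short sketch of the contrast between the three collections, assuming made-up values. A list can be changed in place, a tuple cannot, and a dictionary is looked up by key.

```python
# Hypothetical example values, not taken from the slides.
ages_list = [38, 22, 26]
ages_tuple = (38, 22, 26)

# Lists are mutable: an element can be replaced in place.
ages_list[0] = 40

# Tuples are immutable: item assignment raises TypeError.
try:
    ages_tuple[0] = 40
except TypeError as err:
    tuple_error = str(err)
    print("Tuple not changed:", tuple_error)

# Dictionaries map keys to values and are accessed by key.
ages_by_name = {"alice": 38, "bob": 22}
ages_by_name["carol"] = 26
print(ages_by_name["alice"])
```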
Conditions
if <expr>:
    <statement>
elif <expr>:
    <statement(s)>
elif <expr>:
    <statement(s)>
else:
    <statement(s)>
Iterations
while <expr>:
    <statement(s)>
Python for loops link
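The templates above can be sketched with concrete code. The function name classify and the age thresholds are assumptions chosen purely for illustration.

```python
def classify(age):
    """Return a label for an age using an if/elif/else chain."""
    if age < 13:
        return "child"
    elif age < 20:
        return "teenager"
    elif age < 65:
        return "adult"
    else:
        return "senior"

# while loop: repeats as long as the condition holds.
countdown = []
n = 3
while n > 0:
    countdown.append(n)
    n -= 1

# for loop: iterates over each item of a collection.
labels = []
for age in (5, 16, 40, 70):
    labels.append(classify(age))

print(countdown)
print(labels)
```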
>>> X = data[["Age"]]
>>> print(X)
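A small sketch of why the double brackets matter, assuming a made-up three-row data frame named data. Selecting with a list of column names returns a DataFrame, while a single name returns a Series.

```python
import pandas as pd

# Hypothetical data; the column names mirror those used in the slides.
data = pd.DataFrame({"Age": [22, 38, 26],
                     "Sex": ["male", "female", "female"]})

X = data[["Age"]]   # list of columns -> DataFrame with one column
y = data["Age"]     # single column name -> Series

print(type(X).__name__, X.shape)
print(type(y).__name__, y.shape)
```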
Usual 1st Step upon Obtaining Data
► A description or a summary of it.
>>> df = pd.DataFrame(data)
>>> print(df)
>>> df.describe()
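A self-contained sketch of the same steps, assuming a small made-up dictionary in place of the real data. describe() returns count, mean, std, min, quartiles and max for each numeric column.

```python
import pandas as pd

# Hypothetical values standing in for the real dataset.
data = {"Age": [22, 38, 26, 35],
        "Fare": [7.25, 71.28, 7.92, 53.10]}

df = pd.DataFrame(data)

# Summary statistics for every numeric column.
summary = df.describe()
print(summary)
```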
Working with DataFrames (Basic)
Save the Data
► Suppose you want to analyse only part of the data and save the resulting data frame to a CSV file.
>>> df.loc[df['Pclass'] == 1]
>>> df.loc[df['Survived'] == 1]
>>> df[:5]
>>> df[3:10]
► If we only want certain columns, e.g. Age, Name, Sex, Survived
>>> df.loc[:, ['Age', 'Name', 'Sex', 'Survived']]
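One way to finish the step described on this slide is to filter, then write the result with to_csv. This is a sketch under assumptions: the rows, the file name first_class.csv and the temporary folder are all made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical rows; column names follow the slides.
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [38, 22, 26],
    "Sex": ["female", "male", "female"],
    "Pclass": [1, 3, 1],
    "Survived": [1, 0, 1],
})

# Keep first-class passengers and only the columns of interest.
first_class = df.loc[df["Pclass"] == 1, ["Age", "Name", "Sex", "Survived"]]

# Save without the row index, then read back to check the file.
out_path = os.path.join(tempfile.mkdtemp(), "first_class.csv")
first_class.to_csv(out_path, index=False)
reloaded = pd.read_csv(out_path)
print(reloaded)
```

Passing index=False keeps the row labels out of the file, so reading it back yields the same columns.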
Aggregating
>>> df['Fare'].sum()
4385.095600000001
>>> df['Age'].mean()
28.141507936507935
>>> df.groupby('Sex')['Age'].mean()
Sex
female 24.468085
male 30.326962
Name: Age, dtype: float64
Aggregation and groupby
[Figure: split-apply-combine. Input rows (Gender, Age) are split by Gender, mean is applied to each group, and the results are combined into the average age per class, e.g. female ages 38, 35 and 26 give 33.]
>>> df.loc[df['Survived'] == 1].groupby('Sex')['Age'].mean()
Sex
female 26.265625
male 23.314444
Name: Age, dtype: float64
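A minimal split-apply-combine sketch using the four ages from the diagram. The Survived values are assumptions added only so the filtered version can run.

```python
import pandas as pd

# Ages follow the diagram; the Survived column is hypothetical.
df = pd.DataFrame({
    "Sex": ["female", "male", "female", "female"],
    "Age": [38, 22, 26, 35],
    "Survived": [1, 0, 1, 1],
})

# Split by Sex, apply mean to Age, combine into one Series.
mean_age = df.groupby("Sex")["Age"].mean()
print(mean_age)

# Filter first, then aggregate: mean age of survivors per group.
survivors_mean_age = df.loc[df["Survived"] == 1].groupby("Sex")["Age"].mean()
print(survivors_mean_age)
```

Groups with no rows after filtering do not appear in the result at all.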
A. An array.
B. A list.
C. A theory about data.
D. A structure that stores tabular data
Learning Outcomes (Recap)
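► Explain and interpret given Python code
► Comprehend the concept of a dataframe
► Work with data using data pre-processing commands such as aggregating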
Next few weeks