(Student View) Python For DS - Week 1 Lecture

This document provides an overview of an introductory course on Python for data science. It introduces the instructors and TA crew for the course. It then covers the course agenda for the day, including an overview of the course, introductions among classmates, a discussion on the learning community, and an introduction to the first week's lecture and this week's project on data wrangling with an Airbnb dataset.

Welcome to Python for Data Science!

Instructor: Samir Sen

Amazing TA Crew: Thankgod Egbe and Amlan Patnaik

Course Manager: Barbara Kaplan-Marnas

We are… Samir Sen, Thankgod, Amlan, Barbara


Today’s agenda:

1 Course overview

2 Our class community

3 Breakout

4 Welcome to CoRise

5 Week 1 Lecture

6 This week’s project


Welcome to the Course!

● Hi, I’m Samir! 👋

● I’m building Flair Labs

● I’ve previously worked on research teams at Apple and Microsoft

● I like going on runs, watching the World Cup, and playing chess!

● Ping me on Slack / LinkedIn / Twitter if you have questions


Python for Data Science
Why Python for Data Science?
● Every tech company uses ML to power core products

● Python is the go-to language for ML!

● Key for becoming a Data Analyst, Data Scientist or Machine Learning Engineer
Course overview

Week 1: Numpy Foundations

● File Input/Output
● Computation On Arrays
● Aggregation
● Slicing, Indexing, Arrays

Week 2: Data Analysis with Pandas

● Pandas Data Structures
● Operating on Data in Pandas
● Handling Missing Data
● Combining Datasets
Motivations

● I want to increase my skills for my current work: 55%

● I want to transition roles into data and analytics / other: 32%
Companies
Roles

● Data Analyst
● Data Engineer
● BI Analyst
● Software Engineer
● Product Manager
● Financial Analyst
● Support Specialist
● Manager
● Systems Engineer

… and a range of other cool roles:

● Developer
● CEO
● Biologist
● Consultant
● Architect
● Student
● Safety Engineer
● Doctor
● Operations Specialist
● Project Coordinator
● Journalist
● Professor
● Recruiter
Meet your classmates!

● Break into your groups

● Share
○ Where are you joining us from?
○ What is your role in your organization?
○ What do you hope to get out of this course?

● Screenshot
○ Take a screenshot of your group and post in #introductions
What’s special about CoRise?

● Industry-leading instructors: Learn from experts in the field with real-world experience applying the skills they are teaching you

● Global community: Our CoRise community brings together professionals from top companies around the world

● Personalized support: Our live sessions, dedicated teaching staff, and individualized support will help you complete every course you take

● Real world applied projects: Projects are designed to simulate real world challenges so you can easily apply the skills you learn at CoRise to the work you’re doing
Values

● Generosity: Share your expertise with others and support them along the way.

● Bravery: Dare to try. Be willing to take risks with your learning, knowing that our community will support you along the way.

● Perseverance: When (not if) times get tough, don’t give up. Remember why you are doing this, and dig deep.

● Joy: Celebrate successes and failures, they’re both a part of learning! 🎉 Give emoji reactions and #shoutouts for your classmates.
Making a useful Slack workspace

Hey…

I have some issues. Anyone cranking on the project right now?

Yo yo. I’m here if you need me.

Cool cool.

I’m trying to figure out what’s wrong with my event aggregation macro. Hmmm…
Slack threading!
Community

Teaching is the best way to learn.


From this point onwards, you can be:
● Discussion starter & question answerer
● Mentor (or mentee)
● Study group leader
● Helpful classmate

Review your classmates’ projects to further your learning 🧠
Logistics check
● Course calendar - corise.com/course/python-for-data-science/calendar

● CoRise platform / content

● Project Environment

● Slack - INVITE LINK HERE

○ #py-for-ds-introductions
○ #py-for-ds-shoutouts
○ #py-for-ds-questions
○ #py-for-ds-projects
○ #py-for-ds-tips-and-tricks
○ #py-for-ds-feedback
○ #py-for-ds-announcements
You are already winning!

● Deciding to invest in your learning and career: signing up!

● Making the time to show up today

● .... When you attend Jumpstart you’ll be well on your way to completing Project 1
Let’s do it!
What’s Data Science All About?

● Create powerful insights with data

● Store, manage, extract, analyze, and visualize data

● Python is a versatile language with great libraries to do this!


What’s Numpy?
● Data representation library!

● Represent data as arrays or matrices


Why Numpy?
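● Python lists are much slower and more memory-hungry than numpy arrays for numerical work

● Numpy applies an operation to many elements at once in compiled code, instead of looping in Python

● Numpy is the underlying package used by much of the Python data science and machine learning stack (e.g. pandas, SciPy, scikit-learn)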
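
A minimal sketch of the difference, assuming NumPy is installed and imported as np. The variable names and sample values are illustrative only and are not from the lecture. It doubles every value, first with a plain Python list and then with a NumPy array.

```python
import numpy as np

prices = [100.0, 85.5, 120.0, 60.0]  # illustrative values

# Plain Python: visit the elements one at a time.
doubled_list = [p * 2 for p in prices]

# NumPy: one expression applies the operation to the whole array.
prices_arr = np.array(prices)
doubled_arr = prices_arr * 2

print(doubled_list)
print(doubled_arr)
```

Note that writing prices * 2 on the list itself would repeat the list rather than double each value, which is one reason arrays are more convenient for numerical work.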
Python Lists vs Numpy

[Figure: side-by-side comparison of Lists and Numpy]
How is Numpy faster?

1. Data stored in NumPy arrays has a fixed type

2. Data processed by NumPy operations is stored in contiguous memory, so the operations can be parallelized
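NumPy vs Lists (Fixed Type)

NumPy: each value is stored directly in a fixed type, e.g. Int16: 00000000 00001001

Lists: each element stores Size (Int16), Reference Count (Int32), Object Type (Int32), Object Value (Int64)

[Figure: example array with rows 8 12 2 3 / 7 5 11 9 / 18 10 4 6]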
NumPy vs Lists (Contiguous Memory)

[Figure: memory layout of a List vs. a Numpy array]

* Operations can be computed in parallel on all of these values
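
The two points above (fixed type, contiguous memory) can be checked on a real array. This is a small sketch using attributes that NumPy arrays expose; the sample values are taken from the first row of the example grid, and the exact size printed for the Python int depends on the interpreter.

```python
import sys
import numpy as np

a = np.array([8, 12, 2, 3], dtype=np.int16)

print(a.dtype)                  # every element shares one fixed type
print(a.itemsize)               # bytes used per element
print(a.nbytes)                 # bytes used by all elements together
print(a.flags["C_CONTIGUOUS"])  # True when elements sit side by side in memory

# A Python int is a full object, so it needs more room than one int16 element.
print(sys.getsizeof(9))
```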
NumPy is the go-to for numerical computation on a set of data
Let’s get started with numpy!

● Let’s take a look at initializing numpy arrays

import numpy as np

a = np.array([1,2,3], dtype='int32')  # 1D array of 32-bit integers

b = np.array([[9.0,8.0,7.0],[6.0,5.0,4.0]])  # 2D array of floats
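
Once an array exists, its basic properties can be inspected. A short sketch, reusing the two arrays from the slide:

```python
import numpy as np

a = np.array([1, 2, 3], dtype='int32')
b = np.array([[9.0, 8.0, 7.0], [6.0, 5.0, 4.0]])

# Number of dimensions, shape, and element type of each array.
print(a.ndim, a.shape, a.dtype)
print(b.ndim, b.shape, b.dtype)
```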
Other NumPy initialization methods
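
A few commonly used constructors, as a sketch. This selection is an assumption and may not match the ones shown in lecture; the shapes and values are illustrative.

```python
import numpy as np

zeros = np.zeros((2, 3))                # 2x3 array of 0.0
ones = np.ones((2, 3), dtype='int32')   # 2x3 array of integer 1s
sevens = np.full((2, 2), 7)             # 2x2 array filled with 7
identity = np.identity(3)               # 3x3 identity matrix
steps = np.arange(0, 10, 2)             # 0, 2, 4, 6, 8
grid = np.linspace(0.0, 1.0, 5)         # 5 evenly spaced values from 0 to 1

rng = np.random.default_rng(seed=0)     # seeded so runs are repeatable
random_vals = rng.random((2, 2))        # uniform values in [0, 1)
```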
3D Matrix in Numpy
Image Representation with Numpy
● An image can be represented as a matrix of pixel values

● Each pixel value is between 0 and 255

● Each color is represented by pixel values for each of the red, green and blue components (R, G, B)
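
As a sketch of this idea, a tiny color image can be built by hand as a 3D array. The 2x2 size and the chosen colors are illustrative assumptions; the (height, width, channel) ordering is a common convention rather than something stated in the slides.

```python
import numpy as np

# A tiny 2x2 RGB "image": shape is (height, width, 3 color channels).
image = np.zeros((2, 2, 3), dtype=np.uint8)  # uint8 holds values 0 to 255

image[0, 0] = [255, 0, 0]      # top-left pixel: pure red
image[0, 1] = [0, 255, 0]      # top-right pixel: pure green
image[1, 0] = [0, 0, 255]      # bottom-left pixel: pure blue
image[1, 1] = [255, 255, 255]  # bottom-right pixel: white

red_channel = image[:, :, 0]   # 2D array holding only the red values
print(image.shape)
print(red_channel)
```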
Numpy Indexing

● list[i]

○ Slicing: list[i:i+5]

● matrix_2d[i, j]

○ Slicing: matrix_2d[i:i+5, j:j+5]

● matrix_3d[i, j, k]

○ Slicing: matrix_3d[i:i+5, j:j+5, k:k+5]

Access row of elements: matrix_2d[i, :]

Access column of elements: matrix_2d[:, j]
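
A short sketch of these indexing patterns on a small 2D array. The name matrix_2d and the values are illustrative; a Python name cannot start with a digit, so the array is not called 2d_matrix here.

```python
import numpy as np

# 3 rows x 4 columns, filled with 0..11 so positions are easy to follow.
matrix_2d = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

element = matrix_2d[1, 2]    # row 1, column 2
row = matrix_2d[1, :]        # all of row 1
col = matrix_2d[:, 2]        # all of column 2
block = matrix_2d[0:2, 1:3]  # rows 0-1, columns 1-2
```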
Load in Data with Numpy

np.genfromtxt(filename, delimiter=",")

e.g. filename = "WK1_Airbnb_Amsterdam_listings_1.csv"
Matrix Reshaping

● Sometimes data is not in the right shape to perform necessary operations

● arr.reshape(tuple of ints), or np.reshape(arr, tuple of ints)

● arr.flatten() => turns data into a flat 1D array of shape (num_elems,)

● ^ These are your friends!

● Make sure the reshape dimensions keep the same total number of elements
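
A minimal sketch of reshaping and flattening, including what happens when the element counts do not match. The array contents are illustrative.

```python
import numpy as np

data = np.arange(6)          # shape (6,): 0, 1, 2, 3, 4, 5
grid = data.reshape((2, 3))  # 2 rows x 3 columns, still 6 elements
flat = grid.flatten()        # back to a 1D copy of shape (6,)

# A target shape with a different element count is rejected.
try:
    data.reshape((4, 2))     # 4 * 2 = 8 elements, but data has 6
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(grid.shape, flat.shape, reshape_failed)
```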
Merging Matrices

● What happens when we need to combine data from different datasets?

● np.concatenate()

● np.stack()
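
A sketch of how the two functions differ: np.concatenate joins arrays along an axis that already exists, while np.stack adds a new axis. The arrays are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])  # shape (2, 2)
b = np.array([[5, 6], [7, 8]])  # shape (2, 2)

rows = np.concatenate([a, b], axis=0)  # b's rows go below a's: shape (4, 2)
cols = np.concatenate([a, b], axis=1)  # b's columns go beside a's: shape (2, 4)
stacked = np.stack([a, b])             # new leading axis: shape (2, 2, 2)

print(rows.shape, cols.shape, stacked.shape)
```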
Broadcasting
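● With numpy, you can apply a transformation to multiple elements at once

● Broadcasting is how numpy stretches a scalar or smaller array to match a larger one, so the transformation needs no explicit loop

● Vectorized operations & broadcasting in numpy make it MUCH faster than looping in plain python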
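
Two small broadcasting cases, as a sketch with made-up numbers: a scalar combined with a 2D array, and a 1D array combined with each row of a 2D array.

```python
import numpy as np

prices = np.array([[100.0, 200.0],
                   [50.0, 80.0]])  # shape (2, 2)

# Scalar broadcast: 1.1 is applied to every element.
with_tax = prices * 1.1

# Row broadcast: the shape (2,) array is reused for each row of prices.
discount = np.array([10.0, 20.0])
discounted = prices - discount

print(with_tax)
print(discounted)
```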
Neat References

● Numpy Cheat Sheet

● Data Science Handbook


This week’s project - Data Wrangling with Airbnb Dataset

● Clean and investigate the dataset with numpy

● Find information from cleaned dataset

● Make transformations using numpy tools discussed in class on dataset

● Reach out on #questions in Slack with any issues!
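
A rough sketch of what the clean / investigate steps could look like with numpy. The data and the column meanings (price, rating) are hypothetical and are not the actual project dataset or its solution.

```python
import numpy as np

# Hypothetical columns: price, rating. nan marks a missing value.
listings = np.array([[120.0, 4.5],
                     [85.0, np.nan],
                     [200.0, 4.9]])

# Clean: keep only rows with no missing values.
complete_rows = ~np.isnan(listings).any(axis=1)
cleaned = listings[complete_rows]

# Investigate: simple aggregations over the cleaned price column.
mean_price = cleaned[:, 0].mean()
max_price = cleaned[:, 0].max()

print(cleaned.shape, mean_price, max_price)
```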


👀 Reminders
● Project session Wednesday

● Project submission due Sunday End of Day

○ Weekly Survey (2 min)

● Project code review due Monday End of Day


Q&A
[Fin]
