Introduction

Uploaded by

kanishkisrani01

ECE 20875

Python for Data Science

Qiang Qiu, Yi Ding and Aristides Carrillo

(Adapted from material developed by Profs. Milind Kulkarni, Stanley Chan, Chris
Brinton, David Inouye, and Qiang Qiu)
what is data?
lots of different definitions

humans have used data forever

The oldest known mathematical artifacts (tally stick or lunar calendar?)
why do we use data?

• Analyzing data helps us make decisions and take actions
what has changed?

• There’s a lot more data

• Machines can also collect (and in turn
use) it
• And we’re trying to do more with it

• Google processes 3.5 billion search queries per day.

• Instagram users post 54,000 photos each minute.
• Twitter users post 3,000 tweets every second.
a parable of Purdue professors
• Prof. Philip E. Paré (ECE) develops models and algorithms for predicting and mitigating viral spread in networks using data
• Prof. Mahsa Ghasemi (ECE) studies efficient and reliable use of data in sequential decision-making problems
• Prof. Qiang Qiu (ECE) studies computer vision and machine learning
• Prof. Murat Kocaoglu (ECE) develops algorithms for learning causal structures to derive actionable insights from data
• Jennifer Neville (CS) builds new machine learning tools to study graphs and networks
• Prof. Chris Brinton (ECE) develops algorithms for optimizing social and communication networks from data
• Prof. Milind Kulkarni (ECE) builds systems to make data analyses run faster
what is data science?
• Collecting data from a wide variety of sources and putting
them into a consistent format?
• Making observations about patterns in data?
• Visualizing trends in data?
• Identifying similarities between data points?
• Making predictions about what will happen in the future?
• Prescribing courses of action to take based on forecasts?
• Developing new machine learning and data mining
algorithms?
• Accelerating analysis algorithms?
data science is a lot of things
• making predictions from data
• identifying patterns in data
• building systems for data analysis
• dealing with privacy concerns
• visualizing data
• analyzing data
• collecting/organizing data
• interpreting data
• ethics
• writing data analyses
what industries has it impacted?
• Hard to think of one that is not being impacted
by data science!
• Medicine: Analytics from wearable trackers,
studying disease patterns, …
• Retail: Analyzing consumer behavior, predicting
customer satisfaction, …
• Transportation: Assisted/autonomous
navigation, predicting equipment failures, …
• Education: Tracking student engagement,
personalizing learning content, …
what about Python?
• General purpose programming language,
first appeared in the 90s
• Easily recognized by use of whitespace
indentation rather than { } brackets to
enhance readability
• Becoming the industry standard for data
science (displacing R?)
• Many useful, open-source libraries: numpy,
pandas, matplotlib, pytorch
• And standard control structures (e.g., loops)
familiar from lower-level languages to help
structure programs
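A minimal sketch of the indentation and loop points above. The variable names and values are made up for illustration.

```python
# Indentation, not braces, marks which statements belong to the loop and the if.
scores = [760, 680, 800, 750, 780]  # hypothetical values for illustration

total = 0
high_scores = []
for s in scores:
    total += s                   # inside the loop body
    if s >= 760:
        high_scores.append(s)    # inside the if, which is inside the loop

mean_score = total / len(scores)  # back at the top level: runs once, after the loop
print(mean_score, high_scores)
```

Removing the indentation on the `if` line would change which statements repeat, so the layout is part of the program's meaning rather than just style.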
landscape
• This is an introductory programming and
statistics course that emphasizes data
science problems with some math
• Other data science courses in ECE, e.g.,
• ECE 30010 - Introduction to Machine
Learning and Pattern Recognition
• ECE 47300 - Introduction to Artificial
Intelligence
• ECE 57000 - Artificial Intelligence
• ECE 59500 - Machine Learning I
• But data science is a Purdue-wide initiative!
syllabus break!

some data analysis examples

data analysis in “practice”
• Let’s say we have a data set of applicants to Purdue

Name          High school GPA   SAT Math   SAT R/W   Residence
Jane Doe      4.7               760        700       Indiana
Purdue Pete   3.5               680        620       Indiana
B. O. Iler    3.0               800        650       Michigan
Engy Neer     4.2               750        590       North Carolina
Mark Faller   3.8               780        550       New Jersey
…             …                 …          …         …

• What might we want to learn about them?
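As a rough sketch, the five visible rows of the table could be held in Python like this. The field names are illustrative choices, not from the slides, and only the standard library is used to keep the example self-contained (a library such as pandas would be a common alternative).

```python
from collections import Counter
from statistics import mean

# Rows copied from the applicant table; field names are assumptions for this sketch.
applicants = [
    {"name": "Jane Doe", "hs_gpa": 4.7, "sat_math": 760, "sat_rw": 700, "residence": "Indiana"},
    {"name": "Purdue Pete", "hs_gpa": 3.5, "sat_math": 680, "sat_rw": 620, "residence": "Indiana"},
    {"name": "B. O. Iler", "hs_gpa": 3.0, "sat_math": 800, "sat_rw": 650, "residence": "Michigan"},
    {"name": "Engy Neer", "hs_gpa": 4.2, "sat_math": 750, "sat_rw": 590, "residence": "North Carolina"},
    {"name": "Mark Faller", "hs_gpa": 3.8, "sat_math": 780, "sat_rw": 550, "residence": "New Jersey"},
]

# Two simple questions we might ask of the data.
by_state = Counter(a["residence"] for a in applicants)   # applicants per state
mean_sat_math = mean(a["sat_math"] for a in applicants)  # average SAT Math score

print(by_state)
print(mean_sat_math)
```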

descriptive statistics
• Which students come from which states?
• What is the distribution of GPAs? SAT scores?
• GPAs may need to be normalized to a consistent range across all schools
• Can build histograms, e.g., for the GPAs
• But how do we know how big to make the buckets?
[Figure: histogram of GPAs with buckets 2.5–3.0, 3.0–3.5, 3.5–4.0, and 4.0+; vertical axis from 0 to 50]
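The histogram idea can be sketched as below, using the five GPAs from the applicant table and the bucket edges shown in the figure. The function names are hypothetical. Sturges' rule is included only as one common heuristic for choosing the number of buckets; the slides do not say which rule, if any, the course uses.

```python
import math
from collections import Counter

def bucket_counts(values, low, width, n_buckets):
    """Count values per fixed-width bucket; out-of-range values are clamped
    into the first or last bucket (so the last bucket acts like '4.0+')."""
    counts = Counter()
    for v in values:
        idx = int((v - low) // width)
        idx = max(0, min(idx, n_buckets - 1))
        counts[idx] += 1
    return [counts[i] for i in range(n_buckets)]

def sturges_bucket_count(n):
    """Sturges' rule: suggested number of buckets for n data points."""
    return math.ceil(math.log2(n)) + 1

gpas = [4.7, 3.5, 3.0, 4.2, 3.8]  # from the applicant table
# Buckets: 2.5-3.0, 3.0-3.5, 3.5-4.0, 4.0+
counts = bucket_counts(gpas, low=2.5, width=0.5, n_buckets=4)
print(counts)
print(sturges_bucket_count(1000))
```

Buckets that are too wide hide the shape of the distribution, while buckets that are too narrow leave most of them nearly empty, which is why a rule of thumb tied to the sample size is often a reasonable starting point.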
reasoning about data
• How do Purdue applicants compare to the national average?
• Mean GPA of applicants: 3.6
• Is this high or low?
• Can sample GPA of all high school students
• Suppose we collect 1000 GPAs and find a mean of 3.4
• Does this mean Purdue students have a higher GPA on average?
• Need more information! In particular …
• Was the sampling method we used unbiased?
• What is the variance of the sample collected (i.e., the spread of GPAs)?
• What confidence interval can be built for the population mean (i.e., what is the likely range of
the true mean GPA)?
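A minimal sketch of the confidence-interval question, assuming the sample is unbiased and large enough for the normal approximation (z = 1.96 for roughly 95%). The 1000 GPAs here are synthetic, drawn from a made-up distribution purely so the code runs; they are not real data.

```python
import math
import random
from statistics import mean, stdev

def mean_confidence_interval(sample, z=1.96):
    """Approximate confidence interval for the population mean
    using the normal approximation (z=1.96 gives about 95%)."""
    n = len(sample)
    m = mean(sample)
    se = stdev(sample) / math.sqrt(n)  # standard error of the mean
    return m - z * se, m + z * se

rng = random.Random(0)  # fixed seed so the run is repeatable
# Synthetic stand-in for 1000 collected GPAs (hypothetical mean 3.4, spread 0.5).
sample = [rng.gauss(3.4, 0.5) for _ in range(1000)]

low, high = mean_confidence_interval(sample)
print(f"sample mean = {mean(sample):.3f}, approx. 95% CI = ({low:.3f}, {high:.3f})")
```

If the whole interval sits below the applicants' mean of 3.6, that supports the claim that applicants have a higher GPA on average, but only to the extent that the sampling really was unbiased.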

making predictions
• Can we predict how successful a particular applicant
might be at Purdue?
• How do we define success? GPA?
• Idea: Look at the application statistics of the current
seniors and see if there is a relationship between these
statistics and their current GPA
• One way to find a relationship is using linear regression
• Might tell you something like: “a Purdue student’s
GPA can be predicted mostly by their high school
GPA, with their SAT score having a lighter influence”
• Many other prediction algorithms exist too
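The linear regression idea can be sketched with a single predictor using ordinary least squares. The GPA pairs below are invented for illustration, and `fit_line` is a hypothetical helper name. Handling high school GPA and SAT score together would need a multi-variable fit, which this sketch does not attempt.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept for one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical (high school GPA, current Purdue GPA) pairs for current seniors.
hs_gpa = [3.0, 3.5, 3.8, 4.2, 4.7]
purdue_gpa = [2.8, 3.1, 3.3, 3.6, 3.7]

slope, intercept = fit_line(hs_gpa, purdue_gpa)
predicted = slope * 4.0 + intercept  # predicted Purdue GPA for a 4.0 high school GPA
print(slope, intercept, predicted)
```

The size of the fitted slope is what would back a statement like "high school GPA predicts Purdue GPA", though a real analysis would also check how well the line fits.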
classification
• Can we make admissions decisions quicker through
automation?
• Idea: Compare each applicant’s statistics to past applicants
that were admitted, and to those that were rejected
• Train a classifier to analyze these past applicants and
maximize the ability to predict whether a student would be
accepted or not
• For example, a k-nearest neighbor classifier would assess
whether a given applicant is more similar to the pool of
admitted applicants or to the rejected applicants
• Why might we run into trouble here?
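A minimal sketch of a k-nearest neighbor classifier along the lines described above. The past applicants, their decisions, and the choice to divide SAT Math by 200 (so both features have a similar scale) are all assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points
    (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical past applicants: (high school GPA, SAT Math / 200) and the decision.
past = [
    ((4.5, 3.9), "admitted"),
    ((4.2, 3.8), "admitted"),
    ((3.9, 3.7), "admitted"),
    ((3.0, 3.0), "rejected"),
    ((2.8, 3.2), "rejected"),
    ((3.2, 2.9), "rejected"),
]

decision = knn_predict(past, (4.1, 3.75), k=3)
print(decision)
```

The answer depends on the choice of k and on how the features are scaled, and the classifier can only echo whatever patterns are present in the past decisions it is given.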

clustering
• What if we want to identify groups of students
beyond “admitted” vs. “rejected”?
• Idea: See if students cluster together according
to some measure of distance
• Some students look more like “nearby”
students than students that are “far away”
• Important question: What features of students
should be considered for the clustering?
• E.g., maybe don’t consider something like hair
color!
• With k-means clustering, k groups of students
would be extracted based on “closeness”
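A minimal sketch of k-means clustering (Lloyd's algorithm) on made-up student features. Starting from the first k points and running a fixed number of iterations are simplifying assumptions; practical implementations usually pick starting centers at random and stop once the assignments no longer change.

```python
import math

def k_means(points, k, iterations=20):
    """Lloyd's algorithm: alternate between assigning each point to its
    nearest center and moving each center to the mean of its group."""
    centers = list(points[:k])  # simple deterministic start (an assumption)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Re-average each group; keep the old center if a group ends up empty.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, groups

# Hypothetical students: (high school GPA, SAT Math / 200).
students = [(3.0, 3.0), (4.5, 3.9), (2.8, 3.2), (4.2, 3.8), (3.2, 2.9), (3.9, 3.7)]
centers, groups = k_means(students, k=2)
print(centers)
print(groups)
```

Which features go into each tuple decides what "closeness" means, so the feature-selection question above directly shapes the groups that come out.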
