Lecture 2 - Collecting, Analyzing, and Visualizing Data with Python Part I
Useful Reading:
• Chapter 4. NumPy Basics: Arrays and Vectorized Computation, Python for Data Analysis, by Wes McKinney
• Chapter 2. Introduction to NumPy, Python Data Science Handbook, by Jake VanderPlas
WORKING WITH PANDAS & DATAFRAMES
Pandas
Pros:
• Provides flexible and expressive data structures
• Easy to handle missing data
• Columns can easily be added and deleted
Cons:
• Practical only up to several gigabytes of data
• Mostly single-threaded
• Complex group-by operations are awkward and slow
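The pros listed above, and the group-by point among the cons, can be tried on a tiny table. This is a minimal sketch: the column names and values are made up for illustration, and it uses only standard pandas calls (isna, fillna, dropna, drop, and groupby(...).agg).

```python
import numpy as np
import pandas as pd

# A small made-up table; NaN marks a missing temperature reading.
df = pd.DataFrame({
    "city": ["Boston", "Boston", "Denver", "Denver"],
    "temp": [21.5, np.nan, 18.0, 19.5],
})

# Missing data: count it, fill it, or drop the rows that contain it.
n_missing = df["temp"].isna().sum()               # number of NaN values
filled = df["temp"].fillna(df["temp"].mean())     # replace NaN with the column mean
dropped = df.dropna()                             # keep only complete rows

# Columns can be added and deleted in one line each.
df["temp_f"] = df["temp"] * 9 / 5 + 32            # add a derived column
df = df.drop(columns=["temp_f"])                  # delete it again

# A group-by with several aggregations at once (NaN is skipped by mean and count).
summary = df.groupby("city")["temp"].agg(["mean", "count"])
print(summary)
```

Simple group-bys like this one are concise; the "complex group-by" con refers to heavier cases, such as many custom aggregation functions over large data, where pandas tends to be slower and harder to express.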
“My rule of thumb for pandas is that you should have 5 to 10 times as much RAM as the size of your dataset”
Wes McKinney, 2017
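One rough way to apply this rule of thumb is to measure how much memory a DataFrame actually occupies once loaded and multiply by 5 and by 10. The helper estimate_ram_needed below is hypothetical, written only for this illustration; it relies on DataFrame.memory_usage(deep=True), which also counts the Python string objects held in object columns. Measuring after loading matters because the in-memory size can differ considerably from the file size on disk.

```python
import numpy as np
import pandas as pd


def estimate_ram_needed(df: pd.DataFrame, low: int = 5, high: int = 10) -> tuple[int, int]:
    """Hypothetical helper: (low, high) RAM estimate in bytes via the 5-10x rule of thumb."""
    # deep=True inspects object columns element by element; the index is included by default.
    size = int(df.memory_usage(deep=True).sum())
    return size * low, size * high


# One million float64 values take 8 bytes each, i.e. about 8 MB of data.
df = pd.DataFrame({"x": np.zeros(1_000_000)})
low, high = estimate_ram_needed(df)
print(f"DataFrame: {df.memory_usage(deep=True).sum() / 1e6:.1f} MB, "
      f"suggested RAM: {low / 1e6:.0f} to {high / 1e6:.0f} MB")
```

The multipliers are parameters because the quote gives a range rather than a single factor; operations that create intermediate copies push real usage toward the high end.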
Pandas Objects
[Diagram: DataFrame → Column (Series) → Values (NumPy array)]
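The layering of pandas objects can be checked directly: selecting a single column of a DataFrame returns a Series, and to_numpy() exposes that Series' values as a NumPy array. A minimal sketch with a made-up two-column table:

```python
import numpy as np
import pandas as pd

# A made-up DataFrame with two columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

col = df["a"]              # one column of the DataFrame is a Series (named "a")
values = col.to_numpy()    # the Series' values as a NumPy array

print(type(df).__name__, type(col).__name__, type(values).__name__)
# DataFrame Series ndarray
```

Because the values are a NumPy array, the vectorized operations covered in the NumPy reading (for example, values * 2) apply to them directly.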
Let’s move to the Notebook