Session3 - Analytics For Programming II - Siryani - 090524
Session 3
Thursday September 5th, 2024
Introduction to NumPy
❖ NumPy, short for Numerical Python, is the fundamental package required for high performance scientific computing and data analysis
❖ ndarray, a fast and space-efficient multidimensional array providing vectorized arithmetic operations and sophisticated broadcasting capabilities
❖ Standard mathematical functions for fast operations on entire arrays of data without having to write loops
❖ Tools for reading / writing array data to disk and working with memory-mapped files
❖ Linear algebra, random number generation, and Fourier transform capabilities
❖ Tools for integrating code written in C, C++, and Fortran
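As a minimal sketch of the vectorized arithmetic and broadcasting mentioned above, the example below uses small made-up arrays (the names `prices`, `matrix`, and `offsets` are illustrative only, not part of the lecture material):

```python
import numpy as np

# Vectorized arithmetic: the multiplication applies to every element,
# with no explicit Python loop.
prices = np.array([10.0, 20.0, 30.0])
discounted = prices * 0.9

# Broadcasting: a (2, 3) array combined with a (3,) array.
# The one-dimensional array is applied to each row of the matrix.
matrix = np.array([[1, 2, 3], [4, 5, 6]])
offsets = np.array([10, 20, 30])
shifted = matrix + offsets

# A standard mathematical function applied to a whole array at once.
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))

print(discounted)
print(shifted)
print(roots)
```

Each result is itself an ndarray, so these operations can be chained without converting back to Python lists.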
Pandas Series
❖ A pandas Series is a one-dimensional array-like object containing an array of data (of any NumPy data type) and an associated array of data labels, called its index
❖ The simplest Series is formed from only an array of data
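The two bullets above can be sketched as follows. The values and labels are made up for illustration; the first Series is built from data only, so pandas assigns a default integer index, while the second supplies its own labels:

```python
import pandas as pd

# Simplest case: only an array of data, so the index defaults to 0, 1, 2, ...
simple = pd.Series([4, 7, -5, 3])

# Same data with an explicit index of labels.
labeled = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

print(simple.index)     # the default integer index
print(labeled.index)    # the labels supplied above
print(labeled["a"])     # look up a value by its label
```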
Pandas Dataframes
❖ The DataFrame has both a row and a column index; it can be thought of as a dictionary of Series, all sharing the same index
• Dictionaries are a fundamental data type in the Python programming language
• Dictionaries store a mapping of unique keys to values. Basic operations on a dictionary include:
• Adding a new key/value pair
• Retrieving the value corresponding to a particular key
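A minimal sketch of the two dictionary operations listed above, followed by a DataFrame built from a dictionary of Series. All names and numbers here (`population`, `area`, the state values) are hypothetical and chosen only for illustration:

```python
import pandas as pd

# A plain Python dictionary: unique keys mapped to values (hypothetical numbers).
population = {"Ohio": 11.8}

# Adding a new key/value pair.
population["Texas"] = 30.0

# Retrieving the value corresponding to a particular key.
texas = population["Texas"]

# A DataFrame viewed as a dictionary of Series sharing the same index:
# each dictionary key becomes a column, the Series labels become the row index.
frame = pd.DataFrame({
    "population": pd.Series(population),
    "area": pd.Series({"Ohio": 44.8, "Texas": 268.6}),
})

print(texas)
print(frame)
print(frame["area"])   # selecting a column returns a Series
```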
Pandas
2 Lecture 3
Present Lecture 3
Agenda - Session 3
1. Lecture 3
2. Hands-On
Lecture 3
Data Science Project Stages (Reference: book, Data Analysis and Visualization Using Python)
Lecture 3 – Data Visualization in Analytics
❖ Data visualization is the process of interpreting data and presenting it in a pictorial or graphical format
❖ Data visualization helps people understand the significance of data by presenting it in a simple and easy-to-understand format
❖ People generally understand data better through pictures than by reading numbers in rows and columns
❖ When data is presented in a graphical format, people can find correlations more easily
Seaborn
▪ Seaborn is a Python data visualization library based on Matplotlib that provides a high-level interface for drawing attractive and informative statistical graphics
▪ Seaborn: http://stanford.edu/~mwaskom/software/seaborn
ggplot
▪ Python port: http://ggplot.yhathq.com
Lecture 3 – Python Visualization Libraries
Bokeh (live plots in your browser)
▪ Bokeh: http://bokeh.pydata.org/en/latest
Plotly
▪ The Plotly Python graphing library produces interactive, publication-quality graphs online
▪ Dynamic graphs in different formats can be generated online or offline
Geoplotlib
▪ Geoplotlib is a toolbox for creating a variety of map types and plotting geographical data
▪ Geoplotlib requires Pyglet, which provides an object-oriented programming interface
Pandas
▪ Pandas is a Python library written for data manipulation and analysis
Lecture 3 – Data Visualization using Python Matplotlib
"A picture is worth a thousand words." - Fred R. Barnard
Lecture 3 – Anatomy of a Matplotlib figure
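As a rough sketch of the main parts of a Matplotlib figure, the example below builds a Figure containing a single Axes and labels its common elements (title, axis labels, line, legend, grid). The plotted data and all label text are made up for illustration, and the non-interactive "Agg" backend is assumed so the code runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# Figure: the whole canvas. Axes: one plotting area inside the figure.
fig, ax = plt.subplots(figsize=(6, 4))

# Line: the plotted data, with a label that the legend will pick up.
(line,) = ax.plot(x, np.sin(x), label="sin(x)")

# Title and axis labels belong to the Axes.
ax.set_title("Anatomy of a figure")
ax.set_xlabel("x (radians)")
ax.set_ylabel("sin(x)")

# Legend and grid are optional decorations on the same Axes.
ax.legend()
ax.grid(True)
```

In an interactive session, `plt.show()` would display the figure, and `fig.savefig(...)` would write it to a file.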
3 Hands-On
Let the Hands-On begin!
Python Hands-on
josephsiryani