
Session3 - Analytics For Programming II - Siryani - 090524


Programming for Analytics II

Session 3
Thursday September 5th, 2024

Joseph Siryani, Ph.D.


Adjunct Professor, Department of Decision Sciences
George Washington University – School of Business
Agenda

1 Session 2 Recap
Recap of session 2 key topics, and address any questions related to lecture or hands-on.

2 Lecture 3
Present Lecture 3.

3 Hands-On
Carry out instructor-led Python hands-on work.

4 Q & A’s
Questions & Answers!
1 Session 2 Recap
NumPy Basics: Arrays and Vectorized Computation

Introduction to NumPy
❖ NumPy, short for Numerical Python, is the fundamental package required for high-performance scientific computing and data analysis
❖ ndarray, a fast and space-efficient multidimensional array providing vectorized arithmetic operations and sophisticated broadcasting capabilities
❖ Standard mathematical functions for fast operations on entire arrays of data without having to write loops
❖ Tools for reading / writing array data to disk and working with memory-mapped files
❖ Linear algebra, random number generation, and Fourier transform capabilities
❖ Tools for integrating code written in C, C++, and Fortran
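The vectorized arithmetic and broadcasting mentioned above can be sketched as follows. This is a minimal illustration, not part of the original slides; the array names and the sample values are made up for the example.

```python
import numpy as np

# Hypothetical sample data: prices of three items on two days.
prices = np.array([[10.0, 20.0, 30.0],
                   [12.0, 18.0, 33.0]])

# Vectorized arithmetic: the multiplication applies to every element, no loop.
discounted = prices * 0.9

# Broadcasting: the 1-D array of per-item fees is applied to both rows.
fees = np.array([1.0, 2.0, 3.0])
totals = prices + fees

# A standard mathematical function applied to the whole array at once.
roots = np.sqrt(prices)

print(discounted)
print(totals)
print(roots)
```

The same results could be computed with nested Python loops, but the array expressions are shorter and typically much faster on large data.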

NumPy arrays have three fundamental properties:


❖ shape: The shape attribute of an array describes its size along each of its dimensions
❖ ndim: The number of dimensions (often also called directions or axes)
❖ dtype: The data type of the array elements
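As a quick check of these three properties, the sketch below builds a small array and inspects it. The array name `grid` and its contents are hypothetical, chosen only for illustration.

```python
import numpy as np

# A 3-by-4 array of floats; the dtype is set explicitly so it is the same
# on every platform.
grid = np.arange(12, dtype=np.float64).reshape(3, 4)

print(grid.shape)  # size along each dimension: (3, 4)
print(grid.ndim)   # number of dimensions: 2
print(grid.dtype)  # element type: float64
```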
Pandas
Introduction to Pandas
❖ pandas is a Python library of high-level data structures and tools created to help Python programmers perform powerful data analysis. The ultimate purpose of pandas is to help you quickly discover information in data, with information being defined as an underlying meaning.

Pandas are Fun! What is Pandas?


❖ Panel Data System
❖ Key components: Series and DataFrames
❖ Series: a sequence of data, like a list in basic Python or a 1D NumPy array
❖ DataFrame: a tabular, spreadsheet-like data structure containing an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.)

Pandas Series
❖ A pandas Series is a one-dimensional array-like object containing an array of data (of any NumPy data type) and an associated array of data labels, called its index
❖ The simplest Series is formed from only an array of data
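A minimal sketch of both cases follows: a Series built from only an array of data, and one with explicit labels. The values and labels are invented for the example.

```python
import numpy as np
import pandas as pd

values = np.array([4, 7, -5, 3])

# Simplest form: only data is given, so pandas assigns the default index 0..3.
s = pd.Series(values)
print(s.index)

# The same data with an explicit index of labels.
labeled = pd.Series(values, index=["d", "b", "a", "c"])
print(labeled["a"])  # look up a value by its label
```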

Pandas DataFrames
❖ The DataFrame has both a row and column index; it can be thought of as a dictionary of Series, all sharing the same index
• Dictionaries are a fundamental data type in the Python programming language
• Dictionaries store a mapping of unique keys to values. Basic operations on a dictionary include:
• Adding a new key/value pair
• Retrieving the value corresponding to a particular key
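The two dictionary operations above, and the idea of a DataFrame as a dictionary of Series, can be sketched like this. The item names and numbers are hypothetical and serve only to show columns of different types sharing one index.

```python
import pandas as pd

# A plain Python dictionary mapping unique keys to values (made-up data).
units_sold = {"apple": 12, "pear": 7}

units_sold["plum"] = 3          # adding a new key/value pair
pears = units_sold["pear"]      # retrieving the value for a particular key

# A DataFrame built from a dictionary of Series that share the same index.
frame = pd.DataFrame({
    "units_sold": pd.Series(units_sold),
    "in_stock": pd.Series({"apple": True, "pear": False, "plum": True}),
})

print(pears)
print(frame)
print(frame.loc["plum", "units_sold"])  # row label, then column label
```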
2 Lecture 3
Present Lecture 3
Agenda - Session 3

1. Lecture 3

2. Hands-On
Lecture 3

Data Science Project Stages
Reference: Book: Data Analysis and Visualization Using Python
Lecture 3 – Data Visualization in Analytics

❖ Data visualization is the process of interpreting data and presenting it in a pictorial or graphical format

❖ Data visualization helps people understand the significance of data by presenting it in a simple and easy-to-understand format

❖ Data visualization helps communicate information clearly and effectively


Lecture 3 – Why Is Data Visualization Important?

❖ A picture is worth a thousand words, as they say

❖ Humans understand data better through pictures than by reading numbers in rows and columns

❖ If the data is presented in a graphical format, people are better able to find correlations

❖ Data visualization helps the business to achieve numerous goals:


✔ Converting the business data into interactive graphs for dynamic interpretation to serve the business goals
✔ Transforming data into visually appealing, interactive dashboards of various data sources to serve the business
✔ Creating more attractive and informative dashboards of various graphical data representations
✔ Making appropriate decisions by drilling into the data and finding the insights
✔ Figuring out the patterns, trends, and correlations in the data being analyzed
✔ Making better, quick, and informed decisions with data visualization
Lecture 3 – Loading Python Data Visualization Libraries

python -m pip install -U pip setuptools


python -m pip install matplotlib
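After installation, a short check like the one below can confirm that Matplotlib imports correctly. This is a sketch, not part of the original slides; the `Agg` backend line is an assumption for running outside a notebook and can be dropped in Jupyter.

```python
import matplotlib

# Assumption: select a non-interactive backend so the check also works
# on machines without a display. Not needed inside a Jupyter notebook.
matplotlib.use("Agg")

import matplotlib.pyplot as plt

# Print the installed version to confirm the import succeeded.
print(matplotlib.__version__)
```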
Lecture 3 – Python Visualization Libraries

Reference: Book: Data Analysis and Visualization Using Python


Matplotlib
▪ Matplotlib is a Python 2D plotting library for data visualization built on NumPy arrays and designed to work with the broader SciPy stack
▪ It produces publication-quality figures in a variety of formats and interactive environments across platforms
▪ Matplotlib: http://matplotlib.org
▪ Gallery: http://matplotlib.org/gallery.html
▪ Frequently used commands: http://matplotlib.org/api/pyplot_summary.html

Seaborn
▪ Seaborn is a Python data visualization library based on Matplotlib that provides a high-level interface for drawing attractive and informative
statistical graphics
▪ Seaborn: http://stanford.edu/~mwaskom/software/seaborn

ggplot
▪ Python port: http://ggplot.yhathq.com
Bokeh (live plots in your browser)
▪ Bokeh: http://bokeh.pydata.org/en/latest

Plotly
▪ The Plotly Python graphing library makes interactive, publication-quality graphs online
▪ Different dynamic graph formats can be generated online or offline

Geoplotlib
▪ Geoplotlib is a toolbox for creating a variety of map types and plotting geographical data
▪ Geoplotlib requires Pyglet, which it uses as an object-oriented programming interface

Pandas
▪ Pandas is a Python library written for data manipulation and analysis
Lecture 3 – Data Visualization using Python Matplotlib
"A picture is worth a thousand words." - Fred R. Barnard
Lecture 3 – Anatomy of a Matplotlib figure
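The main parts of a Matplotlib figure (the figure itself, an axes, lines, title, axis labels, legend, and grid) can be sketched in code as below. This is an illustrative example with made-up curves, not a reproduction of the slide's diagram; the `Agg` backend is assumed only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; omit this line in Jupyter

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# Figure: the whole canvas. Axes: one plotting area inside the figure.
fig, ax = plt.subplots(figsize=(6, 4))

# Lines: each plot call adds one line to the axes.
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), linestyle="--", label="cos(x)")

# Title and axis labels.
ax.set_title("Anatomy sketch")
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")

# Legend (built from the line labels) and grid.
ax.legend()
ax.grid(True)

fig.tight_layout()
```

In a notebook the figure is displayed automatically at the end of the cell; in a script, `plt.show()` or `fig.savefig(...)` would be the usual next step.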
3 Hands-On
Let the Hands-On Begin!
Python Hands-on

Work through various datasets and Jupyter notebooks:


❖ Data Visualization using Python (matplotlib)
▪ Data: Wine Review
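As a warm-up for this kind of exercise, the sketch below plots an average score per country with pandas and Matplotlib. It does not use the actual Wine Review file: the rows are invented, and the column names `country` and `points` are assumptions about what such a dataset might contain.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; omit this line in Jupyter

import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for a wine review table (not real data).
reviews = pd.DataFrame({
    "country": ["Italy", "France", "Italy", "US", "France", "US"],
    "points": [87, 90, 85, 88, 92, 86],
})

# Average score per country, sorted from lowest to highest.
mean_points = reviews.groupby("country")["points"].mean().sort_values()

# One bar per country.
fig, ax = plt.subplots()
ax.bar(mean_points.index, mean_points.values)
ax.set_title("Average review points by country (toy data)")
ax.set_xlabel("country")
ax.set_ylabel("mean points")

print(mean_points)
```

With the real dataset, the hand-built DataFrame would be replaced by a call such as `pd.read_csv` on the course file, and the column names adjusted to match it.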
4 Q & A’s
Questions & Answers!
Thank You! Questions?

[email protected]

josephsiryani
