
DATA ANALYSIS WITH PYTHON

BY IBM ON COURSERA

WHAT IS DATA ANALYSIS?

- Data analysis/data science helps us answer questions from data


- Unlock the information and insights from raw data to answer our questions.
- Roles:
o Discovering useful information
o Answering questions
o Predicting the future or the unknown

Example: Setting a price for the new car before selling

What we should consider as a data analyst:

a. Is there data on the prices of other cars and their characteristics?


b. What features of cars affect their prices?
 Color? Brand? Horsepower? Something else?
1. Understanding the data

Sample dataset

This is CSV format: each value is separated by commas. The first row is a header that
contains a column name for each column.

The target (label) is the column we want to predict.
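For illustration, a few rows of such a CSV might look like this (the column names and values are made up here, not taken from the actual imports-85 data):

symboling,fuel-type,horsepower,price
3,gas,111,13495
1,diesel,102,16500
2,gas,154,16845

Each row after the header is one data point, and price would be the target (label) to predict.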


2. Python Packages for Data Science
A Python library is a collection of functions and methods that allows you to perform many actions
without writing the code yourself. Libraries usually contain built-in modules providing different
functionalities which can be used directly.
a. Scientific Computing Libraries:
o Pandas: offers data structures and tools for effective data manipulation and analysis.
It provides fast access to structured data. The primary instrument of Pandas is the
two-dimensional table consisting of column and row labels, which is called a
dataframe. It is designed to provide easy indexing functionality.
o NumPy: uses arrays for its inputs and outputs. It can be extended to objects for
matrices, and with minor coding changes, developers can perform fast array processing.
o SciPy: includes functions for some advanced math problems, as well as data
visualization and optimization.

b. Visualization Libraries: show meaningful results of an analysis; they allow you to create graphs,
charts, and maps.
o Matplotlib (plots & graphs, most popular): great for making graphs and plots; graphs
are also highly customizable.
o Seaborn (plots: heat maps, time series, violin plots)

c. Algorithmic Libraries: machine learning algorithms; we develop a model using our dataset and
obtain predictions.
o Scikit-learn (Machine Learning: regression, classification, …): contains tools for statistical
modeling, including regression, classification, clustering, and so on.
o Statsmodels (explore data, estimate statistical models, perform statistical tests)
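As a quick reference, a minimal sketch of how these packages are usually imported (the aliases pd, np, plt, and sns are common conventions, not requirements):

import pandas as pd                                  # dataframes and data manipulation
import numpy as np                                   # fast array processing
from scipy import optimize                           # advanced math, e.g. optimization routines
import matplotlib.pyplot as plt                      # plots and graphs
import seaborn as sns                                # heat maps, time series, violin plots
from sklearn.linear_model import LinearRegression    # machine learning: regression models
import statsmodels.api as sm                         # statistical models and tests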

IMPORTING AND EXPORTING DATA IN PYTHON

1. Importing Data:
- Process of loading and reading data into Python from various sources
- Two important properties:
o Format: the way data is encoded
e.g. various formats: .csv, .json, .xlsx, .hdf
o File Path of dataset: where the data is stored.
Computer: /Desktop/mydata.csv
Internet: a URL, e.g.
https://archive.ics.uci.edu/autos/imports-85.data
Each row is one data point.

2. Importing a CSV into Python using Pandas:


In pandas, the read_csv method reads files with columns separated by commas into
a pandas dataframe.
a. Importing a CSV into Python using Pandas:
import pandas as pd (import pandas)
url = "https://archive.ics.uci.edu/autos/imports-85.data" (define a file variable with a file
path)
df = pd.read_csv(url) (use the read_csv method to read the data)
 read_csv assumes the data contains a header
b. Importing a CSV without a header
import pandas as pd
url = "https://archive.ics.uci.edu/autos/imports-85.data"
df = pd.read_csv(url, header=None)
c. Printing the dataframe in Python
o df prints the entire dataframe (not recommended for large datasets)
o df.head(n) shows the first n rows of the dataframe
o df.tail(n) shows the bottom n rows of the dataframe

Print the first 5 rows of the dataset

df.head()

With header=None, pandas automatically sets the column headers to a list of integers.
d. Adding headers
o Replace the default header (by df.columns = headers)
o We can assign column names in pandas;
headers = ["symboling", "fuel-type", "height", "weight"]

df.columns = headers  to replace the default integer headers with the list
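Putting these steps together, a minimal sketch assuming a hypothetical four-column file cars_no_header.csv with no header row (the column names below are illustrative, not the full imports-85 header list):

import pandas as pd

# read a file that has no header row; pandas labels the columns 0, 1, 2, ...
df = pd.read_csv("cars_no_header.csv", header=None)

# replace the default integer headers with descriptive names
# (the list length must match the number of columns in the file)
headers = ["symboling", "fuel-type", "height", "price"]
df.columns = headers

print(df.head())  # first 5 rows, now with the new column names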
e. Exporting a Pandas dataframe to CSV
o Preserve progress at any time by saving the modified dataset with to_csv
o Export the pandas dataframe to a new CSV file

e.g.

path = "C:/Windows/../automobile.csv" (specify a path)

df.to_csv(path) (use to_csv)

f. Exporting to different formats in Python
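As a quick reference, the usual pandas reader/writer pairs for common formats (a sketch; read_sql and to_sql additionally need a database connection object rather than a plain file path):

Data Format    Read                        Save
csv            pd.read_csv(path)           df.to_csv(path)
json           pd.read_json(path)          df.to_json(path)
excel          pd.read_excel(path)         df.to_excel(path)
hdf            pd.read_hdf(path, key)      df.to_hdf(path, key)
sql            pd.read_sql(query, conn)    df.to_sql(table_name, conn)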

GETTING STARTED ANALYZING DATA IN PYTHON

1. Basic insights from the data


- Understand the data before analysis
- Checklist:
o Data Types
o Data Distribution
- Locate potential issues with the data
a. Data types:

Pandas Type vs Native Python Type

Why check data types?

o Potential info and type mismatch: pandas automatically assigns types based on the
encoding it detects from the original data source; these assignments can be wrong
o Compatibility with Python methods
b. Check data types:
o In pandas, we use dataframe.dtypes to check data types
df.dtypes
this will return the data type of each column
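A minimal sketch of what this looks like (the dataframe here is made up for illustration; note how pandas types map back to native Python types: object for strings, int64 for integers, float64 for floats):

import pandas as pd

df = pd.DataFrame({
    "make": ["audi", "bmw"],        # strings  -> dtype object
    "horsepower": [111, 102],       # integers -> dtype int64
    "price": [13495.0, 16500.0],    # floats   -> dtype float64
})

print(df.dtypes)
# make           object
# horsepower      int64
# price         float64
# dtype: object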
c. dataframe.describe()
o Returns a statistical summary
df.describe()

The statistical metrics can tell the data scientist if there are mathematical issues in the
data, such as extreme outliers and large deviations.

The dataframe.describe() function skips rows and columns that do not contain numbers.

d. dataframe.describe(include = "all")
o Provides a full summary of statistics by adding the argument include = "all"
df.describe(include="all")
Notes:

- unique: is the number of distinct objects in the column


- top: is the most frequently occurring object
- freq: is the number of times the top object appears in the column.
- mean, std (standard deviation), maximum/minimum value, and the boundary of each of the
quartiles are reported for numeric columns.
- NaN: not a number
e. dataframe.info()
dataframe.info() provides a concise summary of the dataframe
It lists each column with its number of non-null entries and data type, along with the index type and memory usage.

ACCESSING DATABASES WITH PYTHON

This is the typical way to access a database from a Jupyter notebook: the Python program communicates with the
DBMS, and the Python code connects to the database using API calls.

DBMS: database management system

1. What is a SQL API?


- API: an Application Programming Interface is a set of functions that can be called to get
access to some type of service.
- The SQL API consists of library function calls as an application programming interface, API,
for the DBMS. To pass SQL statements to the DBMS, an application program calls
functions in the API, and it calls other functions to retrieve query results and status
information from the DBMS.
- The basic operation of a typical SQL API is as follows.
- The application program begins its database access with one or more API calls that connect
the program to the DBMS. To send SQL statements to the DBMS, the program builds
the statement as a text string in a buffer and then makes an API call to pass the buffer
contents to the DBMS.
- The application program makes API calls to check the status of its DBMS request and to
handle errors.
- The application program ends its database access with an API call that disconnects it from
the database.
2. What is DB-API?

DB-API is Python’s standard API for accessing relational databases. It is a standard that allows
writing a single program that works with multiple kinds of relational databases instead of writing
a separate program for each one.
a. Concepts of the Python DB-API
Two main concepts in the Python DB-API:
o Connection Objects
o Cursor (query) Objects

Cursor Objects:

 Run Database Queries

Connection Objects:

 Connect to a database
 Manage transactions
3. What are Connection methods?
Methods used with connection objects
 cursor(): returns a new cursor object using the connection
 commit(): is used to commit any pending transaction to the database
 rollback(): causes the database to roll back to the start of any pending transaction
 close(): is used to close a database connection
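A minimal sketch of these methods using sqlite3, a DB-API implementation that ships with Python (the database file, table, and values are made up for illustration):

import sqlite3

conn = sqlite3.connect("example.db")    # connection object
cur = conn.cursor()                     # cursor() returns a new cursor using the connection

try:
    cur.execute("CREATE TABLE IF NOT EXISTS cars (make TEXT, price REAL)")
    cur.execute("INSERT INTO cars VALUES (?, ?)", ("audi", 13495.0))
    conn.commit()                       # commit the pending transaction to the database
except sqlite3.Error:
    conn.rollback()                     # roll back to the start of the pending transaction
finally:
    conn.close()                        # close the database connection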
4. Writing code using DB-API

- First, import the database module and use the connect API from that module.
- To open a connection to the database, use the connect function and pass in the
parameters, which are the database name, username, and password. The connect function
returns a connection object.
- Create a cursor object on the connection object. The cursor is used to run queries and
fetch results.
- After running the queries using the cursor, we also use the cursor to fetch the results of
the query.
- Finally, when the system is done running the queries, it frees all resources by closing the
connection.
- Note: it is always important to close connections to avoid unused connections taking up
resources
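A minimal end-to-end sketch of this workflow, again using sqlite3 (any DB-API-compliant module such as ibm_db_dbi or mysql.connector follows the same pattern; the database file and the cars table from the previous sketch are assumed to exist):

import sqlite3                               # import the database module

conn = sqlite3.connect("example.db")         # open a connection (a file-based database, so no username/password)
cur = conn.cursor()                          # create a cursor object on the connection

cur.execute("SELECT make, price FROM cars")  # run a query using the cursor
rows = cur.fetchall()                        # fetch the results of the query
for make, price in rows:
    print(make, price)

cur.close()                                  # release the cursor
conn.close()                                 # close the connection to free resources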
