Data Analysis With Python
BY IBM ON COURSERA
Sample dataset
This is CSV format: each value is separated by a comma. The first row is a header
that contains the name of each column.
b. Visualization Libraries: show meaningful results of analysis and allow you to create graphs, charts
and maps.
o Matplotlib (plots & graphs, most popular): great for making graphs and plots, which
are also highly customizable.
o Seaborn (plots: heat maps, time series, violin plots)
c. Algorithmic Libraries: machine learning algorithms; we develop a model using our dataset and
obtain predictions.
o Scikit-learn (Machine Learning: regression, classification,…): contains tools for statistical
modeling, including regression, classification, clustering and so on.
o Statsmodels (Explore data, estimate statistical models, perform statistical tests)
1. Importing Data:
- Process of loading and reading data into Python from various resources
- Two important properties:
o Format: the way data is encoded
e.g. .csv, .json, .xlsx, .hdf
o File Path of dataset: where the data is stored.
Computer: /Desktop/mydata.csv
Internet: URL, e.g.
https://archive.ics.uci.edu/autos/imports-85.data
Each row is one data point.
df.head()
With header=None, pandas automatically sets the column headers to a list of integers.
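As a minimal sketch of this step, the following reads headerless CSV text with pandas. The two sample rows and their values are made up for illustration and stand in for the real file; with an actual dataset you would pass its file path or URL to read_csv instead of the in-memory text.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for a headerless CSV file.
sample_csv = "3,gas,48.8,2548\n1,diesel,52.4,2823\n"

# header=None tells pandas that the first row is data, not column names,
# so the columns are labelled with integers starting at 0.
df = pd.read_csv(io.StringIO(sample_csv), header=None)

print(df.head())         # first rows of the dataframe (up to 5 by default)
print(list(df.columns))  # the integer column labels
```

Reading from io.StringIO keeps the sketch self-contained; read_csv accepts a path or URL in the same position.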
d. Adding headers
o Replace the default header (by df.columns = headers)
o We can assign column names in pandas:
headers = ["symboling", "fuel-type", "height", "weight"]
df.columns = headers
This replaces the default integer headers with the names in the list.
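The header replacement can be sketched as follows. Only the four column names come from the notes; the two data rows are hypothetical values used to give the dataframe something to hold.

```python
import pandas as pd

# Hypothetical rows; a dataframe built this way starts with integer column labels.
df = pd.DataFrame([[3, "gas", 48.8, 2548], [1, "diesel", 52.4, 2823]])

headers = ["symboling", "fuel-type", "height", "weight"]

# Assigning to df.columns replaces the default labels 0..3 with the names.
# The list must have exactly one name per column.
df.columns = headers

print(df.head())
```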
e. Exporting a Pandas dataframe to CSV
o Preserve progress at any time by saving the modified dataset
o Export a pandas dataframe to a new CSV file with the dataframe's to_csv() method
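A minimal sketch of exporting with to_csv follows. The file name automobile.csv, the temporary directory and the sample values are all assumptions made for illustration; in practice you would pass whatever path you want to save to.

```python
import os
import tempfile

import pandas as pd

# Hypothetical dataframe standing in for the modified dataset.
df = pd.DataFrame({"symboling": [3, 1], "fuel-type": ["gas", "diesel"]})

# Hypothetical output location inside a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "automobile.csv")

# index=False leaves the row index out of the exported file.
df.to_csv(out_path, index=False)

# Read the file back to confirm that the export round-trips.
restored = pd.read_csv(out_path)
print(restored)
```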
o Potential info and type mismatch: pandas automatically assigns types based on the
encoding it detects in the original data, and these assignments can be wrong
o Compatibility with Python methods
b. Check data types:
o In pandas, we use dataframe.dtypes to check data types
df.dtypes
This returns the data type of each column.
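To illustrate why the check matters, here is a small sketch with a hypothetical dataframe in which a price column was read as text. The column values, including the "?" placeholder, are invented for the example.

```python
import pandas as pd

# Hypothetical data: "price" holds numbers as text because one entry is "?".
df = pd.DataFrame({
    "symboling": [3, 1],
    "fuel-type": ["gas", "diesel"],
    "height": [48.8, 52.4],
    "price": ["13495", "?"],
})

# One entry per column, giving the type pandas assigned to it.
print(df.dtypes)

# A numeric-looking column that is not numeric signals a type mismatch.
price_is_numeric = pd.api.types.is_numeric_dtype(df["price"])
print(price_is_numeric)
```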
c. dataframe.describe()
o Return a statistical summary
df.describe()
The statistical metrics can tell the data scientist if there are mathematical issues that may
exist, such as extreme outliers and large deviations.
By default, the dataframe.describe() function skips rows and columns that do not contain numbers.
d. dataframe.describe(include="all")
o Provides full summary statistics by adding the argument include="all"
df.describe(include="all")
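The difference between the two calls can be sketched on a tiny, made-up dataframe with one text column and one numeric column:

```python
import pandas as pd

# Hypothetical data with one text column and one numeric column.
df = pd.DataFrame({
    "fuel-type": ["gas", "diesel", "gas"],
    "height": [48.8, 52.4, 50.0],
})

# Default call: only the numeric column is summarized.
numeric_summary = df.describe()

# include="all" also summarizes the text column (unique, top, freq).
full_summary = df.describe(include="all")

print(numeric_summary)
print(full_summary)
```

In the full summary, statistics that do not apply to a column (for example the mean of a text column) appear as NaN.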
Notes:
A typical way to access a database is from a Jupyter notebook: the Python program communicates with the DBMS, and the
Python code connects to the database using API calls.
DB-API is Python’s standard API for accessing relational databases. It is a standard that allows
writing a single program that works with multiple kinds of relational databases instead of writing
a separate program for each one.
a. Concepts of the Python DB-API
Two main concepts in the Python DB-API:
o Connection Objects
o Query Objects (cursor objects)
Connection Objects:
Connect to a database
Manage transactions
Cursor Objects:
Run queries and fetch results
3. What are Connection methods?
Methods used with connection objects
cursor(): returns a new cursor object using the connection
commit(): is used to commit any pending transaction to the database
rollback(): causes the database to roll back to the start of any pending transaction
close(): is used to close a database connection
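The four connection methods can be sketched with sqlite3, the DB-API module that ships with Python. Using SQLite here is an assumption made to keep the example self-contained; the cars table and its values are hypothetical.

```python
import sqlite3

# In-memory SQLite database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

cur = conn.cursor()  # cursor(): a new cursor object on this connection

cur.execute("CREATE TABLE cars (make TEXT, price REAL)")
cur.execute("INSERT INTO cars VALUES ('audi', 13950.0)")
conn.commit()        # commit(): make the pending insert permanent

cur.execute("INSERT INTO cars VALUES ('bmw', 16430.0)")
conn.rollback()      # rollback(): discard the uncommitted insert

cur.execute("SELECT make FROM cars")
remaining = cur.fetchall()
print(remaining)     # only the committed row is left

conn.close()         # close(): release the database connection
```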
4. Writing code using DB-API
- First, import the database module in order to use the connect API from that module.
- To open a connection to the database, use the connect function and pass in the
parameters, that is, the database name, username and password. The connect function
returns a connection object.
- Create a cursor object on the connection object. The cursor is used to run queries and
fetch results.
- After running the queries using the cursor, we also use the cursor to fetch the results of
the query.
- Finally, when the system is done running the queries, it frees all resources by closing the
connection.
- Note: it is always important to close connections to avoid unused connections taking up
resources
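The steps above can be sketched end to end with sqlite3 from the standard library. This is an illustration under assumptions, not the course's own code: SQLite needs only a database name (here an in-memory database), whereas a server-based DB-API module would typically also take a username and password, and the cars table is hypothetical.

```python
import sqlite3
from contextlib import closing

# 1. Import the database module and open a connection with its connect function.
conn = sqlite3.connect(":memory:")

try:
    # 2. Create a cursor object on the connection object.
    with closing(conn.cursor()) as cur:
        # 3. Run queries using the cursor.
        cur.execute("CREATE TABLE cars (make TEXT, doors INTEGER)")
        cur.executemany(
            "INSERT INTO cars VALUES (?, ?)",
            [("audi", 4), ("bmw", 2)],
        )
        conn.commit()

        # 4. Use the same cursor to fetch the results of a query.
        cur.execute("SELECT make, doors FROM cars ORDER BY make")
        rows = cur.fetchall()
finally:
    # 5. Close the connection so that it stops taking up resources.
    conn.close()

print(rows)
```

Closing the connection in a finally block means it is released even if one of the queries raises an error.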