
#1 - Skill Builds - Data Analysis With Python

The document discusses how to analyze and summarize a dataset containing automobile data. It reads the data from an online CSV file and stores it in a dataframe. It then previews the data, sets column headers, drops missing values, and checks the data types and distributions of columns. Functions like .head(), .tail(), .dtypes, .describe() and .info() are used to analyze various properties of the dataframe. The document also discusses how to save the dataframe to other file formats like JSON, Excel etc.

Uploaded by

Gregory


# Import pandas library
import pandas as pd

# Read the online file from the URL below and assign it to the variable "df"
other_path = "https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DA0101EN/auto.csv"
df = pd.read_csv(other_path, header=None)

# show the first 5 rows using dataframe.head() method
print("The first 5 rows of the dataframe")
df.head(5)

# show the last 10 rows using dataframe.tail() method
print("The last 10 rows of the dataframe\n")
df.tail(10)

# create headers list
headers = ["symboling", "normalized-losses", "make", "fuel-type", "aspiration",
           "num-of-doors", "body-style", "drive-wheels", "engine-location",
           "wheel-base", "length", "width", "height", "curb-weight", "engine-type",
           "num-of-cylinders", "engine-size", "fuel-system", "bore", "stroke",
           "compression-ratio", "horsepower", "peak-rpm", "city-mpg", "highway-mpg",
           "price"]
print("headers\n", headers)

#We replace headers and recheck our data frame

df.columns = headers
df.head(10)
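
As an alternative to assigning df.columns after loading, the column names can be passed to pd.read_csv() through its names parameter. The sketch below is only an illustration: it reads a hypothetical three-column sample from an in-memory string instead of the real file, and sample_csv, sample_headers and sample_df are made-up names.

```python
import io
import pandas as pd

# Hypothetical three-column sample standing in for the real CSV file
sample_csv = io.StringIO("3,?,alfa-romero\n1,164,audi\n")
sample_headers = ["symboling", "normalized-losses", "make"]

# header=None: the file has no header row; names=: labels to use instead
sample_df = pd.read_csv(sample_csv, header=None, names=sample_headers)
print(sample_df.columns.tolist())
```

Either approach ends with the same labelled dataframe; setting df.columns afterwards is convenient when the names are only decided after a first look at the data.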

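# We can drop rows with missing values in the column "price" as follows.
# Note: dropna returns a new dataframe; df itself is unchanged unless the result is assigned back.
df.dropna(subset=["price"], axis=0)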
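
dropna only removes values that pandas recognizes as missing (NaN). The sketch below assumes a file that marks missing entries with the placeholder "?". That is an assumption made for illustration, and toy and toy_clean are hypothetical names on made-up data. In that case the placeholder has to be converted to NaN first, and the result has to be assigned to a variable to be kept.

```python
import numpy as np
import pandas as pd

# Made-up data; "?" is assumed to be the missing-value placeholder
toy = pd.DataFrame({"make": ["audi", "bmw", "mazda"],
                    "price": ["13950", "?", "6795"]})

# Convert the placeholder to NaN, then drop rows with a missing price
toy_clean = toy.replace("?", np.nan).dropna(subset=["price"], axis=0)

# Renumber the rows after dropping
toy_clean = toy_clean.reset_index(drop=True)
print(len(toy), len(toy_clean))
```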

# Write your code below and press Shift+Enter to execute
print(df.columns)

# To save the dataframe df as automobile.csv on your local machine,
# you may use the syntax below:
df.to_csv("automobile.csv", index=False)

We can also read and save other file formats using functions similar to pd.read_csv() and
df.to_csv(). The functions are listed in the following table:

Data Format   Read              Save
csv           pd.read_csv()     df.to_csv()
json          pd.read_json()    df.to_json()
excel         pd.read_excel()   df.to_excel()
hdf           pd.read_hdf()     df.to_hdf()
sql           pd.read_sql()     df.to_sql()
...           ...               ...
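
To show how the read and save functions pair up, here is a minimal round-trip sketch for two of the formats. It works on text held in memory rather than on files, and the dataframe toy with its values is made up for illustration.

```python
import io
import pandas as pd

# Made-up data for illustration
toy = pd.DataFrame({"make": ["audi", "bmw"], "city-mpg": [24, 23]})

# With no path argument, to_csv and to_json return the text instead of writing a file
csv_text = toy.to_csv(index=False)
json_text = toy.to_json(orient="records")

# Read the text back with the matching read functions
from_csv = pd.read_csv(io.StringIO(csv_text))
from_json = pd.read_json(io.StringIO(json_text), orient="records")

print(from_csv)
print(from_json)
```

The Excel, HDF and SQL functions follow the same pattern but need extra dependencies or a database connection, so they are left out of this sketch.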

Data Types
Data has a variety of types.
The main types stored in Pandas dataframes are object, float, int, bool and datetime64. In order
to better learn about each attribute, it is always good for us to know the data type of each
column. In Pandas:

df.dtypes

returns a Series with the data type of each column.

# check the data type of data frame "df" by .dtypes
print(df.dtypes)
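
A column that looks numeric can still be stored as text if some entries cannot be parsed as numbers. The sketch below assumes a "?" placeholder in a hypothetical price column (made-up data, not the real file) and uses pd.to_numeric to convert it.

```python
import pandas as pd

# Made-up data; the "?" entry forces "price" to be stored as text
toy = pd.DataFrame({"make": ["audi", "bmw"],
                    "price": ["13950", "?"],
                    "city-mpg": [24, 23]})
print(toy.dtypes)

# errors="coerce" turns entries that cannot be parsed, such as "?", into NaN
toy["price"] = pd.to_numeric(toy["price"], errors="coerce")
print(toy.dtypes)
```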

# To get a statistical summary of each column, such as count, mean,
# standard deviation, etc., use the describe method:
df.describe()

By default, describe() summarizes only the numeric columns. To include the remaining columns as well, add the argument include = "all" inside the parentheses. Let's try it again.

# describe all the columns in "df"
df.describe(include = "all")
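
The difference between the two calls can be seen on a small made-up dataframe. The names toy, numeric_summary and full_summary are hypothetical and used only for this sketch.

```python
import pandas as pd

# Made-up data: one text column and one numeric column
toy = pd.DataFrame({"make": ["audi", "bmw", "audi"],
                    "length": [176.6, 176.8, 192.7]})

# Default: only the numeric column is summarized
numeric_summary = toy.describe()

# include="all": text columns get count, unique, top and freq as well
full_summary = toy.describe(include="all")

print(numeric_summary)
print(full_summary)
```

In the full summary, statistics that do not apply to a column (for example the mean of a text column) are shown as NaN.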

You can select columns of a data frame by listing the name of each column. For example,
you can select three columns as follows:

dataframe[['column 1', 'column 2', 'column 3']]

Where "column" is the name of the column. You can apply the method ".describe()" to get the
statistics of those columns as follows:

dataframe[['column 1', 'column 2', 'column 3']].describe()

df[['length', 'compression-ratio']].describe()

Another method you can use to check your dataset is:

# look at the info of "df"
df.info()
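
info is a method, so it needs parentheses to run. It prints its report rather than returning it. The sketch below, on a made-up dataframe with hypothetical names, captures that report in a text buffer through the buf parameter so it can be inspected or saved.

```python
import io
import pandas as pd

# Made-up data with one missing value in "make"
toy = pd.DataFrame({"make": ["audi", None], "length": [176.6, 176.8]})

# info() writes to the buffer given by buf= instead of the screen
buffer = io.StringIO()
toy.info(buf=buffer)
report = buffer.getvalue()
print(report)
```

The report lists each column with its non-null count and data type, which makes missing values easy to spot.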
