#1 - Skill Builds - Data Analysis With Python
# Import pandas library
import pandas as pd
# Read the online file from the URL below and assign it to the variable "df"
other_path = "https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DA0101EN/auto.csv"
df = pd.read_csv(other_path, header=None)
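The effect of header=None can be checked without a network connection. The sketch below is only an illustration: it reads two made-up rows from an in-memory string (the names sample_csv and sample are hypothetical, and the rows are not the real file contents) and shows that pandas then labels the columns with integers.

```python
import io
import pandas as pd

# Two made-up rows standing in for a headerless CSV file (illustration only).
sample_csv = io.StringIO("3,?,alfa-romero,gas\n1,164,audi,gas\n")

# header=None tells pandas that the first line is data, not column names.
sample = pd.read_csv(sample_csv, header=None)

print(list(sample.columns))  # [0, 1, 2, 3]
print(sample.shape)          # (2, 4)
```

Because the columns are only numbered at this point, descriptive names have to be assigned separately before a column can be referenced by name.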
# show the first 5 rows using dataframe.head() method
print("The first 5 rows of the dataframe")
df.head(5)
# show the last 10 rows using dataframe.tail() method
print("The last 10 rows of the dataframe\n")
df.tail(10)
# create headers list
headers = ["symboling", "normalized-losses", "make", "fuel-type", "aspiration", "num-of-doors", "body-style",
           "drive-wheels", "engine-location", "wheel-base", "length", "width", "height", "curb-weight", "engine-type",
           "num-of-cylinders", "engine-size", "fuel-system", "bore", "stroke", "compression-ratio", "horsepower",
           "peak-rpm", "city-mpg", "highway-mpg", "price"]
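print("headers\n", headers)
# assign the headers as the dataframe's column names, so that columns such as "price" can be referenced by name
df.columns = headers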
# We can drop the rows with missing values in the "price" column as follows
# (this returns a new dataframe; "df" itself is unchanged)
df.dropna(subset=["price"], axis=0)
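Since dropna() returns a new dataframe rather than changing the original, the result has to be assigned to keep it. The following is a minimal sketch on a made-up three-row dataframe (toy and dropped are hypothetical names, and the values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Made-up data with one missing price.
toy = pd.DataFrame({"make": ["audi", "bmw", "dodge"],
                    "price": [13950.0, np.nan, 6229.0]})

# axis=0 drops rows; subset limits the check to the "price" column.
dropped = toy.dropna(subset=["price"], axis=0)
print(len(toy), len(dropped))  # 3 2

# The surviving rows keep their old labels (0 and 2), so renumber them.
dropped = dropped.reset_index(drop=True)
print(list(dropped.index))  # [0, 1]
```

Note that dropna() only removes true missing values (NaN or None); a placeholder string in the data would have to be converted to NaN first.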
# Print the column names of the dataframe
print(df.columns)
Data Types
Data has a variety of types.
The main types stored in pandas dataframes are object, float, int, bool and datetime64. To
understand each attribute better, it helps to know the data type of each column, because the
type determines which operations make sense for it. In pandas:
df.dtypes
# check the data type of data frame "df" by .dtypes
print(df.dtypes)
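Checking the types matters because a column of numbers read as text is not treated as numeric. As a rough sketch, assuming a made-up two-row dataframe (toy is a hypothetical name) in which the price was stored as strings:

```python
import pandas as pd

toy = pd.DataFrame({
    "make": ["audi", "bmw"],
    "length": [176.6, 189.0],
    "city-mpg": [24, 23],
    "price": ["13950", "16430"],  # digits stored as text
})

# One dtype per column; "price" is reported as a text type, not a number.
print(toy.dtypes)
was_numeric = pd.api.types.is_numeric_dtype(toy["price"])

# Convert the text column to float so it can be used in calculations.
toy["price"] = toy["price"].astype("float")
print(toy["price"].mean())  # 15190.0
```

After the conversion, the column behaves like any other float column.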
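By default, df.describe() summarizes only the numeric columns. You can pass the argument include = "all" inside the parentheses to summarize the object-typed columns as well. Let's try it again.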
# describe all the columns in "df"
df.describe(include = "all")
You can select columns of a data frame by passing a list of column names, for example,
you can select the two columns 'length' and 'compression-ratio' and describe them as follows:
df[['length', 'compression-ratio']].describe()
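The three ways of calling describe() can be compared side by side. This is a small sketch on invented data (toy, numeric_only, everything and subset are hypothetical names), not output from the real dataset:

```python
import pandas as pd

toy = pd.DataFrame({
    "make": ["audi", "audi", "bmw"],
    "length": [176.6, 176.6, 189.0],
    "compression-ratio": [10.0, 8.0, 8.8],
})

# Default: only numeric columns are summarized.
numeric_only = toy.describe()
print(list(numeric_only.columns))  # ['length', 'compression-ratio']

# include="all" adds text columns, with rows such as unique, top and freq.
everything = toy.describe(include="all")
print(everything.loc["top", "make"])  # audi

# Double brackets select a list of columns before describing them.
subset = toy[["length", "compression-ratio"]].describe()
print(subset.loc["max", "compression-ratio"])  # 10.0
```

In the include="all" result, statistics that do not apply to a column (for example the mean of a text column) are shown as NaN.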
# look at the info of "df"
df.info()
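info is a method, so it only produces its summary when called with parentheses. It prints the summary rather than returning it; the sketch below, on a made-up two-row dataframe (toy, buffer and summary are hypothetical names), captures that text through the buf parameter so it can be inspected.

```python
import io
import pandas as pd

toy = pd.DataFrame({"make": ["audi", None], "length": [176.6, 189.0]})

# info() writes to standard output by default; buf redirects the text.
buffer = io.StringIO()
toy.info(buf=buffer)
summary = buffer.getvalue()

# The summary lists each column with its non-null count and dtype.
print(summary)
```

The non-null counts give a quick view of which columns contain missing values.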