Notes On Pandas.
The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was
created by Wes McKinney in 2008.
Pandas can clean messy data sets and make them readable and relevant.
Pandas can also delete rows that are not relevant or that contain wrong values, such as empty or
NULL values. This is called cleaning the data.
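As a minimal sketch of what cleaning can look like, the example below drops every row that contains a missing value. The DataFrame named raw and its values are made up for illustration; dropna() is only one of several ways to clean data.

```python
import pandas as pd

# Hypothetical data: the second row has a missing (None) duration.
raw = pd.DataFrame({
    "calories": [420, 380, 390],
    "duration": [50, None, 45],
})

# dropna() returns a new DataFrame without the rows that contain missing values.
cleaned = raw.dropna()
print(cleaned)
```

The original DataFrame is left unchanged; only the returned copy has the row removed.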
What is a Series?
A Pandas Series is like a column in a table.
import pandas as pd
a = [1, 7, 2]
data = pd.Series(a)
print(data)
Labels
If nothing else is specified, the values are labeled with their index number. The first value has
index 0, the second value has index 1, and so on.
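A short sketch of how a default label can be used to read a single value back out of the Series; the variable name first is only illustrative.

```python
import pandas as pd

a = [1, 7, 2]
data = pd.Series(a)

# With the default labels, each label is the position of the value.
first = data[0]
print(first)
```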
Create Labels
With the index argument, you can name your own labels.
Create your own labels:
import pandas as pd
a = [1, 7, 2]
data = pd.Series(a, index=["x", "y", "z"])
print(data)
Output:
x 1
y 7
z 2
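Once a Series has named labels, a value can be looked up by its label. The sketch below assumes the same three values and the labels x, y and z.

```python
import pandas as pd

a = [1, 7, 2]
data = pd.Series(a, index=["x", "y", "z"])

# Look up a single value by its label.
value = data["y"]
print(value)
```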
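A Series can also be created from a key/value object such as a dictionary. The keys of the
dictionary become the labels.
import pandas as pd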
calories = {"day1": 420, "day2": 380, "day3": 390}
data = pd.Series(calories)
print(data)
Output:
day1 420
day2 380
day3 390
To select only some of the items in the dictionary, use the index argument and specify only the
items you want to include in the Series.
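import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
data = pd.Series(calories, index=["day1", "day2"])
print(data)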
Output:
day1 420
day2 380
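One detail worth knowing: if the index argument names a key that is not in the dictionary, Pandas does not raise an error but fills that entry with NaN. The key "day4" below is a made-up label used only to show this.

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# "day4" is not a key in the dictionary, so its value becomes NaN.
data = pd.Series(calories, index=["day1", "day4"])
print(data)
```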
DataFrames
Data sets in Pandas are usually multi-dimensional tables, called DataFrames.
A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table
with rows and columns.
import pandas as pd
data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data)
print(df)
Output:
   calories  duration
0       420        50
1       380        40
2       390        45
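To make the "rows and columns" structure concrete, the sketch below inspects the shape and column names of the same table and pulls out one column, which comes back as a Series.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

# shape is a (rows, columns) tuple.
print(df.shape)

# columns holds the column names.
print(list(df.columns))

# Selecting one column returns a Series.
calories = df["calories"]
print(calories)
```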
Locate Row
As you can see from the result above, the DataFrame is like a table with rows and columns.
Pandas uses the loc attribute to return one or more specified rows.
Example 1
print(df.loc[0])
Output:
calories 420
duration 50
Example 2
print(df.loc[[0, 1]])
Output:
   calories  duration
0       420        50
1       380        40
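The two examples return different kinds of objects: a single label gives a Series, while a list of labels gives a DataFrame. The sketch below checks this on the same data; the names one_row and two_rows are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

# A single label returns the row as a Series.
one_row = df.loc[0]
print(type(one_row).__name__)

# A list of labels returns the rows as a DataFrame.
two_rows = df.loc[[0, 1]]
print(type(two_rows).__name__)
```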
Named Indexes
With the index argument, you can name your own indexes.
Example
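import pandas as pd
data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])
print(df)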
Output:
      calories  duration
day1       420        50
day2       380        40
day3       390        45
Example
Return "day2":
print(df.loc["day2"])
Output:
calories 380
duration 40
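loc also accepts a row label and a column label together, which returns a single cell. This is a small sketch on the same named-index table; the variable name cell is only illustrative.

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# Row label first, then column label.
cell = df.loc["day2", "calories"]
print(cell)
```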
CSV files contain plain text and are a well-known format that can be read by most tools, including
Pandas. The read_csv function loads a CSV file into a DataFrame.
Example
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
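The example above needs a file named data.csv on disk. To try read_csv without a file, the sketch below reads the same kind of content from an in-memory string; the CSV text is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content: a header row followed by three data rows.
csv_text = "calories,duration\n420,50\n380,40\n390,45\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```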