
Pandas - Series and DataFrame

There is a close connection between Pandas DataFrames and Series. A DataFrame
can be seen as a concatenation of Series, with each Series sharing the same index, i.e. the
index of the DataFrame.

We will demonstrate this in the following example.

We define the following three Series:

import pandas as pd

years = range(2014, 2018)
shop1 = pd.Series([2409.14, 2941.01, 3496.83, 3119.55], index=years)
shop2 = pd.Series([1203.45, 3441.62, 3007.83, 3619.53], index=years)
shop3 = pd.Series([3412.12, 3491.16, 3457.19, 1963.10], index=years)

What happens if we concatenate these "shop" Series? Pandas provides a concat function
for this purpose:

pd.concat([shop1, shop2, shop3])

The result is a single long Series with a repeated year index. If we concatenate
along axis=1 instead, each Series becomes a column of a DataFrame:

shops_df = pd.concat([shop1, shop2, shop3], axis=1)
shops_df

Let's do some fine-tuning by giving names to the columns:

cities = ["Zürich", "Winterthur", "Freiburg"]
shops_df.columns = cities
print(shops_df)

# alternative way: give names to the Series:
shop1.name = "Zürich"
shop2.name = "Winterthur"
shop3.name = "Freiburg"
print("------")
shops_df2 = pd.concat([shop1, shop2, shop3], axis=1)
print(shops_df2)

print(type(shops_df))

This means we can arrange or concatenate Series into DataFrames!

DataFrames from Dictionaries


A DataFrame has a row and column index; it's like a dict of Series with a
common index.

cities = {"name": ["London", "Berlin", "Madrid", "Rome",
                   "Paris", "Vienna", "Bucharest", "Hamburg",
                   "Budapest", "Warsaw", "Barcelona",
                   "Munich", "Milan"],
          "population": [8615246, 3562166, 3165235, 2874038,
                         2273305, 1805681, 1803425, 1760433,
                         1754000, 1740119, 1602386, 1493900,
                         1350680],
          "country": ["England", "Germany", "Spain", "Italy",
                      "France", "Austria", "Romania",
                      "Germany", "Hungary", "Poland", "Spain",
                      "Germany", "Italy"]}
city_frame = pd.DataFrame(cities)
city_frame

Retrieving the Column Names


It's possible to get the names of the columns as an array:

city_frame.columns.values
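
Analogously, the row labels are available through the index attribute; a quick sketch:

city_frame.index.values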

Custom Index
We can see that an index (0,1,2, ...) has been automatically assigned to
the DataFrame. We can also assign a custom index to the DataFrame
object:

ordinals = ["first", "second", "third", "fourth",
            "fifth", "sixth", "seventh", "eighth",
            "ninth", "tenth", "eleventh", "twelfth",
            "thirteenth"]
city_frame = pd.DataFrame(cities, index=ordinals)
city_frame

Rearranging the Order of Columns


We can also define and rearrange the order of the columns at the time of
creation of the DataFrame. This also makes sure that we get a defined
column ordering when we create the DataFrame from a dictionary. (Plain
dictionaries were unordered in Python versions before 3.7; passing the
columns explicitly is still the clearest way to fix the order.)

city_frame = pd.DataFrame(cities,
                          columns=["name", "country", "population"])
city_frame

But what if you want to change the column names and the ordering of an
existing DataFrame?

# reindex returns a new DataFrame; the columns parameter reorders the columns:
city_frame = city_frame.reindex(columns=["country", "name", "population"])
city_frame

Now we want to rename our columns. For this purpose, we will use the
DataFrame method 'rename'. This method supports two calling conventions:

(index=index_mapper, columns=columns_mapper, ...)
(mapper, axis={'index', 'columns'}, ...)

We will rename the columns of our DataFrame into Romanian names in
the following example. We set the parameter inplace to True so that our
DataFrame is changed in place instead of a new DataFrame being
returned (inplace=False is the default!).

city_frame.rename(columns={"name": "Nume",
                           "country": "țară",
                           "population": "populație"},
                  inplace=True)
city_frame
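
The second calling convention works analogously: a single mapper plus the
axis parameter. A minimal sketch that renames the columns back to their
English names:

city_frame.rename({"Nume": "name", "țară": "country", "populație": "population"},
                  axis="columns", inplace=True)
city_frame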

Existing Column as the Index of a DataFrame


We want to create a more useful index in the following example. We will
use the country name as the index, i.e. the list associated with the key
"country" of our cities dictionary:

city_frame = pd.DataFrame(cities,
                          columns=["name", "population"],
                          index=cities["country"])
city_frame

Alternatively, we can change an existing DataFrame. We can use the
method set_index to turn a column into an index. "set_index" does not
work in-place; it returns a new DataFrame with the chosen column as the
index:

city_frame = pd.DataFrame(cities)
city_frame2 = city_frame.set_index("country")
print(city_frame2)

We saw in the previous example that the set_index method returns a
new DataFrame object and doesn't change the original DataFrame. If we
set the optional parameter "inplace" to True, the DataFrame will be
changed in place, i.e. no new object will be created:

city_frame = pd.DataFrame(cities)
city_frame.set_index("country", inplace=True)
print(city_frame)
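
The inverse operation is reset_index, which turns the index back into an
ordinary column; a quick sketch:

city_frame.reset_index(inplace=True)  # 'country' becomes a regular column again
print(city_frame.head())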

Label-Indexing on the Rows


So far we have indexed DataFrames via the columns. We will now
demonstrate how we can access rows from DataFrames via the
locators 'loc' and 'iloc'. (The older 'ix' locator is deprecated and has
been removed in current Pandas versions.)

city_frame = pd.DataFrame(cities,
                          columns=("name", "population"),
                          index=cities["country"])
print(city_frame.loc["Germany"])

print(city_frame.loc[["Germany", "France"]])
print(city_frame.loc[city_frame.population > 2000000])
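
Since only 'loc' is demonstrated above, here is a small sketch of positional
row access with 'iloc':

print(city_frame.iloc[0])        # first row, regardless of its label
print(city_frame.iloc[[0, 2]])   # first and third row as a DataFrame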

Sum and Cumulative Sum


We can calculate the sum of all the columns of a DataFrame or the sum of
certain columns:

print(city_frame.sum())

city_frame["population"].sum()

x = city_frame["population"].cumsum()
print(x)

Assigning New Values to Columns

x is a Pandas Series. We can reassign the previously calculated cumulative
sums to the population column:

city_frame["population"] = x
print(city_frame)

Instead of replacing the values of the population column with the
cumulative sum, we want to add the cumulative population sum as a new
column with the name "cum_population".

city_frame = pd.DataFrame(cities,
                          columns=["country", "population", "cum_population"],
                          index=cities["name"])
city_frame

We can see that the column "cum_population" is set to NaN, as we
haven't provided any data for it. We will now assign the cumulative
sums to this column:

city_frame["cum_population"] = city_frame["population"].cumsum()
city_frame

We can also include a column name which is not contained in the
dictionary when we create the DataFrame from the dictionary. In this
case, all the values of this column will be set to NaN:

city_frame = pd.DataFrame(cities,
                          columns=["country", "area", "population"],
                          index=cities["name"])
print(city_frame)

Accessing the Columns of a DataFrame


There are two ways to access a column of a DataFrame. The result is in
both cases a Series:

# in a dictionary-like way:
print(city_frame["population"])

# as an attribute
print(city_frame.population)

print(type(city_frame.population))

p = city_frame.population
p

From the previous example, we can see that we have not copied the
population column: "p" is a view on the data of city_frame.

Assigning New Values to a Column

The column area is still not defined. We can set all elements of the
column to the same value:

city_frame["area"] = 1572
print(city_frame)

In this case, it is definitely better to assign the exact area to each
city. The list with the area values needs to have the same length as the
number of rows in our DataFrame.

# area in square km:
area = [1572, 891.85, 605.77, 1285,
        105.4, 414.6, 228, 755,
        525.2, 517, 101.9, 310.4,
        181.8]
# area could also have been passed as a Series, an array or a scalar
city_frame["area"] = area
print(city_frame)

Sorting DataFrames

Let's sort our DataFrame according to the city area:

city_frame = city_frame.sort_values(by="area", ascending=False)
print(city_frame)
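
sort_values also accepts a list of column names, which gives a multi-key
sort; a small sketch:

# sort by country first, then by area (descending) within each country:
print(city_frame.sort_values(by=["country", "area"],
                             ascending=[True, False]))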

Let's assume we have only the areas of London, Hamburg and Milan. The
areas are in a Series with the correct indices. We can assign this Series as
well:

city_frame = pd.DataFrame(cities,
                          columns=["country", "area", "population"],
                          index=cities["name"])
some_areas = pd.Series([1572, 755, 181.8],
                       index=['London', 'Hamburg', 'Milan'])
city_frame['area'] = some_areas
print(city_frame)

Inserting New Columns into Existing DataFrames


In the previous example we have added the column area at creation time.
Quite often it will be necessary to add or insert columns into existing
DataFrames.

city_frame = pd.DataFrame(cities,
                          columns=["country", "population"],
                          index=cities["name"])
city_frame

idx = 1
city_frame.insert(loc=idx, column='area', value=area)
city_frame

DataFrame from Nested Dictionaries


A nested dictionary of dicts can be passed to a DataFrame as well. The
keys of the outer dictionary are taken as the columns, and the inner
keys, i.e. the keys of the nested dictionaries, are used as the row indices:

growth = {"Switzerland": {"2010": 3.0, "2011": 1.8, "2012": 1.1, "2013": 1.9},
          "Germany": {"2010": 4.1, "2011": 3.6, "2012": 0.4, "2013": 0.1},
          "France": {"2010": 2.0, "2011": 2.1, "2012": 0.3, "2013": 0.3},
          "Greece": {"2010": -5.4, "2011": -8.9, "2012": -6.6, "2013": -3.3},
          "Italy": {"2010": 1.7, "2011": 0.6, "2012": -2.3, "2013": -1.9}}

growth_frame = pd.DataFrame(growth)
growth_frame

Would you like to have the years in the columns and the countries in the rows?
No problem, you can transpose the data:

growth_frame.T

growth_frame = growth_frame.T
growth_frame2 = growth_frame.reindex(["Switzerland", "Italy",
                                      "Germany", "Greece"])
print(growth_frame2)

Filling a DataFrame with random values:

import numpy as np

names = ['Frank', 'Eve', 'Stella', 'Guido', 'Lara']
index = ["January", "February", "March",
         "April", "May", "June",
         "July", "August", "September",
         "October", "November", "December"]
df = pd.DataFrame((np.random.randn(12, 5) * 1000).round(2),
                  columns=names,
                  index=index)
df
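
If reproducible "random" values are needed, NumPy's global generator can be
seeded before creating the DataFrame; a quick sketch:

np.random.seed(42)  # fixes the pseudo-random sequence for repeatable runs
df = pd.DataFrame((np.random.randn(12, 5) * 1000).round(2),
                  columns=names, index=index)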

Reading CSV and DSV Files

Pandas offers two ways to read in CSV files, or DSV files to be precise:

DataFrame.from_csv
read_csv

There is no big difference between those two functions, but they have
different default values in some cases and read_csv has more parameters.
We will focus on read_csv, because DataFrame.from_csv was kept inside
Pandas only for reasons of backwards compatibility (it is deprecated and
has been removed in recent Pandas versions).

import pandas as pd

exchange_rates = pd.read_csv("/home/madhu/Desktop/datasets/dollar_euro.txt",
                             sep="\t")
print(exchange_rates)

As we can see, read_csv automatically used the first line as the names for
the columns. It is possible to give other names to the columns. For this
purpose, we have to skip the first line by setting the parameter "header"
to 0 and assign a list with the column names to the parameter "names":

import pandas as pd

exchange_rates = pd.read_csv("/home/madhu/Desktop/datasets/dollar_euro.txt",
                             sep="\t",
                             header=0,
                             names=["year", "min", "max", "days"])
print(exchange_rates)

Exercise
The file "countries_population.csv" is a csv file containing the population
numbers of all countries (July 2014). The delimiter of the file is a space,
and commas are used to separate groups of thousands in the numbers.
The method 'head(n)' of a DataFrame can be used to output only the
first n rows or lines. Read the file into a DataFrame:

pop = pd.read_csv("/home/madhu/Desktop/datasets/countries_population.csv",
                  header=None,
                  names=["Country", "Population"],
                  index_col=0,
                  quotechar="'",
                  sep=" ",
                  thousands=",")
print(pop.head(5))

",
header=None,
names=["Country", "Population"],
index_col=0,
quotechar="'",
sep=" ",
thousands=",")
print(pop.head(5))
Writing CSV Files
We can create csv (or dsv) files with the method "to_csv". Before we do
this, we will prepare some data to output, which we will write to a file. We
have two csv files with population data for various
countries: countries_male_population.csv contains the figures of the male
populations and countries_female_population.csv correspondingly the
numbers for the female populations. We will create a new csv file with the
sum:

column_names = ["Country"] + list(range(2002, 2013))
male_pop = pd.read_csv("/home/madhu/Desktop/datasets/countries_male_population.csv",
                       header=None,
                       index_col=0,
                       names=column_names)
female_pop = pd.read_csv("/home/madhu/Desktop/datasets/countries_female_population.csv",
                         header=None,
                         index_col=0,
                         names=column_names)
population = male_pop + female_pop
population

population.to_csv("/home/madhu/Desktop/datasets/countries_total_population.csv")

We want to create a new DataFrame with all the information, i.e. female,
male and complete population. This means that we have to introduce a
hierarchical index. Before we do it on our DataFrame, we will introduce
this problem in a simple example:

import pandas as pd

shop1 = {"foo": {2010: 23, 2011: 25}, "bar": {2010: 13, 2011: 29}}
shop2 = {"foo": {2010: 223, 2011: 225}, "bar": {2010: 213, 2011: 229}}
shop1 = pd.DataFrame(shop1)
shop2 = pd.DataFrame(shop2)
both_shops = shop1 + shop2
print("Sales of shop1:\n", shop1)
print("\nSales of both shops\n", both_shops)

shops = pd.concat([shop1, shop2], keys=["one", "two"])
shops
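
With the hierarchical index in place, rows can be addressed by the outer key
alone or by an (outer, inner) tuple; a quick sketch:

print(shops.loc["one"])          # all rows belonging to the first shop
print(shops.loc[("one", 2010)])  # a single row, returned as a Series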

We want to swap the hierarchical indices. For this we will use 'swaplevel'.
Note that swaplevel returns a new DataFrame, so we have to assign the
result back:

shops = shops.swaplevel()
shops.sort_index(inplace=True)
shops
We will go back to our initial problem with the population
figures. We will apply the same steps to those
DataFrames:

pop_complete = pd.concat([population.T, male_pop.T, female_pop.T],
                         keys=["total", "male", "female"])
df = pop_complete.swaplevel()
df.sort_index(inplace=True)
df[["Austria", "Australia", "France"]]

df.to_csv("/home/madhu/Desktop/datasets/countries_total_population.csv")

Exercise
Read in the dsv file (csv) bundeslaender.txt. Create a new
file with the columns 'land', 'area', 'female', 'male',
'population' and 'density' (inhabitants per square
kilometre).
Print out the rows where the area is greater than
30000 and the population is greater than 10000.
Print the rows where the density is greater than 300.

lands = pd.read_csv('/home/madhu/Desktop/datasets/bundeslaender.txt', sep=" ")
print(lands.columns.values)
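
A minimal sketch of the remaining steps, assuming the file provides the
columns 'land', 'area', 'female' and 'male' (the exact names depend on the
file's header):

# derive the missing columns:
lands["population"] = lands["female"] + lands["male"]
lands["density"] = lands["population"] / lands["area"]

# rows with area > 30000 and population > 10000:
print(lands[(lands["area"] > 30000) & (lands["population"] > 10000)])

# rows with density greater than 300:
print(lands[lands["density"] > 300])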
'nan' in Python
Python knows NaN values as well. We can create them with "float":

n1 = float("nan")
n2 = float("Nan")
n3 = float("NaN")
n4 = float("NAN")
print(n1, n2, n3, n4)
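
A useful property to keep in mind: NaN compares unequal to everything,
including itself, which is exactly why helper functions like math.isnan
exist. A quick check:

print(n1 == n1)   # False: NaN is not even equal to itself
print(n1 != n1)   # True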

"nan" is also part of the math module since Python 3.5:

import math
n1 = math.nan
print(n1)
print(math.isnan(n1))

nan
True

NaN in Pandas
Example without NaNs
Before we work with NaN data, we will process a file without any NaN
values. The data file temperatures.csv contains the temperature data of
six sensors, taken every 15 minutes between 6:00 and 19:15.

Reading in the data file can be done with the read_csv function:

import pandas as pd

df = pd.read_csv("/home/madhu/Desktop/datasets/temperatures.csv",
                 sep=";",
                 decimal=",")
df.loc[:3]

We want to calculate the average temperatures per measuring point over
all the sensors. We can use the DataFrame method 'mean'. If we use
'mean' without parameters, it averages each column over all the rows,
which isn't what we want, but it may be interesting as well:

df.mean()

average_temp_series = df.mean(axis=1)
print(average_temp_series[:8])

sensors = df.columns.values[1:]
# all columns except the time column will be removed:
df = df.drop(sensors, axis=1)
print(df[:5])

We will now assign the average temperature values as a new column
'temperature':

# best practice:
df = df.assign(temperature=average_temp_series)
# (an inplace option is not available for assign)
# alternatively:
# df.loc[:, "temperature"] = average_temp_series

df[:3]

Example with NaNs

We will now use a data file similar to the previous temperature csv, but
this time we will have to cope with NaN data, from times when the sensors
malfunctioned.

We will create a temperature DataFrame in which some data is not
defined, i.e. NaN.

We will use and change the data from the temperatures.csv file:

temp_df = pd.read_csv("/home/madhu/Desktop/datasets/temperatures.csv",
                      sep=";",
                      index_col=0,
                      decimal=",")

We will randomly assign some NaN values to the data frame. For this
purpose, we will use the where method of DataFrame. If we apply
where to a DataFrame object df, i.e. df.where(cond, other_df), it will
return an object of the same shape as df, whose entries are taken
from df where the corresponding element of cond is True and from
other_df otherwise.

Before we continue with our task, we will demonstrate how where works
with some simple examples:

s = pd.Series(range(5))
s.where(s > 0)

import numpy as np

A = np.random.randint(1, 30, (4, 2))
df = pd.DataFrame(A, columns=['Foo', 'Bar'])
m = df % 2 == 0
# keep the even values, negate the odd ones:
df.where(m, -df, inplace=True)
df

For our task, we need to create a DataFrame 'nan_df', which consists
purely of NaN values and has the same shape as our temperature
DataFrame 'temp_df'. We will use this DataFrame in 'where'. We also
need a DataFrame 'df_bool' with the conditions as boolean values. For
this purpose we will create a DataFrame with random values between 0
and 1; by applying 'random_df < 0.8' we get the df_bool DataFrame, in
which about 80 % of the values are True, so about 20 % of the
measurements will be replaced by NaN:

random_df = pd.DataFrame(np.random.random(size=(54, 6)),
                         columns=temp_df.columns.values,
                         index=temp_df.index)
nan_df = pd.DataFrame(np.nan,
                      columns=temp_df.columns.values,
                      index=temp_df.index)
df_bool = random_df < 0.8
df_bool[:5]

Finally, we have everything together to create our DataFrame with
disturbed measurements:

disturbed_data = temp_df.where(df_bool, nan_df)
disturbed_data.to_csv("data1/temperatures_with_NaN.csv")
disturbed_data[:10]

Using dropna on the DataFrame

'dropna' is a DataFrame method. If we call this method without
arguments, it will return an object in which every row with missing
data, i.e. at least one NaN value, is omitted:

df = disturbed_data.dropna()
df
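
dropna also accepts a 'how' parameter; with how='all' a row is only dropped
if all of its values are NaN. A small sketch:

# keep every row in which at least one sensor delivered a value:
df_all = disturbed_data.dropna(how="all")
df_all[:5]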

'dropna' can also be used to drop all columns in which some values are
NaN. This can be achieved by assigning 1 to the axis parameter; the
default value is 0 (rows), as we have seen in our previous example. As
every sensor column contains NaN values, they will all disappear:

df = disturbed_data.dropna(axis=1)
df[:5]

Let us change our task: we only want to get rid of the rows which
contain more than one NaN value. The parameter 'thresh' is ideal for this
task. It is set to an integer value which defines the minimum number of
non-NaN values a row needs in order to be kept. We have six temperature
values in every row, so setting 'thresh' to 5 makes sure that we will have
at least 5 valid floats in every remaining row:

cleansed_df = disturbed_data.dropna(thresh=5, axis=0)
cleansed_df[:7]

Now we will calculate the mean values again, but this time on the
DataFrame 'cleansed_df', i.e. after taking out all the rows where more
than one NaN value occurred.

average_temp_series = cleansed_df.mean(axis=1)
sensors = cleansed_df.columns.values
df = cleansed_df.drop(sensors, axis=1)
# best practice:
df = df.assign(temperature=average_temp_series)
# (an inplace option is not available for assign)
df[:6]
