pandas data frame

The document provides an overview of the Pandas DataFrame, explaining its structure and how to create, read, and manipulate data within it using Python. It covers various functionalities such as loading data from CSV files, checking dimensions, retrieving subsets, descriptive statistics, data manipulation, and saving DataFrames back to CSV. Real-life examples are used throughout to illustrate the practical applications of these concepts.

Uploaded by

ANE: Thundres
Copyright
© All Rights Reserved

### **2.1 Pandas DataFrame**
**What is it?**
A DataFrame is a table with rows and columns, like a spreadsheet, where you can store and manage data.

**Real-life Example:**
Your grocery list is a table with items, quantities, and prices.
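
As a minimal sketch of this idea, the grocery list can be written as a small DataFrame. The item names, quantities, and prices below are made-up illustration values.

```python
import pandas as pd

# Hypothetical grocery list: one row per item, one column per detail
df = pd.DataFrame({
    "Item": ["Apples", "Bread", "Milk"],
    "Quantity": [5, 2, 1],
    "Price": [1, 2, 3],
})

print(df)                # the whole table
print(list(df.columns))  # the column names
print(len(df))           # the number of rows
```

Each dictionary key becomes a column name, and each list holds that column's values from top to bottom.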

---

### **2.2 Creation of Pandas DataFrame**

**What is it?**
You can create a DataFrame from a dictionary, list, or other data structures.

**Real-life Example:**
You write your grocery list in a notebook, and now you want to put it into a program.

**Python Example:**
```python
import pandas as pd

# Creating from a dictionary
data = {"Item": ["Eggs", "Cheese"], "Quantity": [12, 1], "Price": [4, 5]}
df = pd.DataFrame(data)
print(df)
```
**Output:**
```
     Item  Quantity  Price
0    Eggs        12      4
1  Cheese         1      5
```
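
A dictionary is not the only starting point. The sketch below builds the same small table from a list of dictionaries and from a list of lists; the variable names are just for illustration.

```python
import pandas as pd

# From a list of dictionaries: each dictionary is one row
rows = [
    {"Item": "Eggs", "Quantity": 12, "Price": 4},
    {"Item": "Cheese", "Quantity": 1, "Price": 5},
]
df_from_dicts = pd.DataFrame(rows)

# From a list of lists: each inner list is one row,
# and the column names are supplied separately
df_from_lists = pd.DataFrame(
    [["Eggs", 12, 4], ["Cheese", 1, 5]],
    columns=["Item", "Quantity", "Price"],
)

print(df_from_dicts)
print(df_from_lists)
```

Both calls should give a table with the same columns and values, so you can pick whichever shape your data already has.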

### **2.3 Reading from CSV File**


**What is it?**
Load data from a CSV (comma-separated values) file into a DataFrame.

**Real-life Example:**
Your friend emails you a CSV file with their grocery list, and you load it into Python.

**Python Example:**
If you’ve made a CSV file in Notepad or any other app, I’ll show you how to bring it into Python step-by-step
with an easy explanation and example.

How to Import a CSV File into Python


What You Need
• A CSV file you made (e.g., in Notepad, Excel, or any text editor).
• Python with the pandas library (it’s like a superhero for handling tables).
Steps
1. Make Your CSV File
o Open Notepad (or any app).
o Write your data with commas separating values and hit Enter for new rows.
o Save it with a .csv name, like my_list.csv.
Example in Notepad:
Python Example
Here’s how you’d import that my_list.csv file:
```python
import pandas as pd  # Load the pandas tool

# Tell Python where your file is (use the file path)
df = pd.read_csv("my_list.csv")  # Replace with your file's path if it's not in the same folder

# Show the table
print(df)
```

How to Find the File Path


• If your file is in the same folder as your Python code, just use the file name (like "my_list.csv").
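• If it’s somewhere else (e.g., Desktop), use the full path:
o Windows: "C:/Users/YourName/Desktop/my_list.csv"
o Mac: "/Users/YourName/Desktop/my_list.csv"
o Linux: "/home/YourName/Desktop/my_list.csv"
Tip: Right-click the file and check its "Properties" (Windows) or "Get Info" (Mac) to copy the path. On Windows, either use forward slashes (/) or put an r in front of the string, as in r"C:\Users\YourName\Desktop\my_list.csv", so that Python does not treat the backslashes (\) as escape characters.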

Real-Life Example
Imagine you wrote a shopping list in Notepad:
Item,Price
Cake,5
Juice,2

Saved it as shopping.csv on your Desktop.


In Python:
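
A minimal sketch of this step. To keep it runnable anywhere, it first writes the same shopping list into a temporary folder (a stand-in for your Desktop) and then loads it with pd.read_csv(). On your own computer you would skip the writing part and pass the real path to shopping.csv.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the Desktop: a temporary folder (an assumption for this sketch)
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "shopping.csv")

    # Write the same text you would type in Notepad
    with open(path, "w") as f:
        f.write("Item,Price\nCake,5\nJuice,2\n")

    # Load the file into a DataFrame
    df = pd.read_csv(path)

# Show the table
print(df)
```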
Quick Tips
• First Row is Special: pandas assumes the first line in your CSV holds the column names (like "Item,Price"). If it doesn’t, pass header=None, as in pd.read_csv("file.csv", header=None), and pandas will number the columns instead.
• Check the File: Open your CSV in Notepad to make sure it’s comma-separated and looks right.
• Errors? If Python raises FileNotFoundError, double-check the file name or path.
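
The tips above can be tried out with a short sketch. Here io.StringIO stands in for a real file so the example runs on its own, and the file name no_such_file_here.csv is a made-up name that is assumed not to exist.

```python
import io

import pandas as pd

# The shopping list without a header line
csv_text = "Cake,5\nJuice,2\n"

# header=None: pandas numbers the columns 0, 1, ...
numbered = pd.read_csv(io.StringIO(csv_text), header=None)
print(numbered)

# names=[...] lets you supply your own column names instead
named = pd.read_csv(io.StringIO(csv_text), header=None, names=["Item", "Price"])
print(named)

# A wrong file name or path raises FileNotFoundError, which you can catch
try:
    pd.read_csv("no_such_file_here.csv")
except FileNotFoundError as err:
    print("File not found:", err)
```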

Why This Works


Python’s pandas reads your CSV like a librarian opening a book—it turns plain text into a neat table
(DataFrame) you can play with. It’s like handing your scribbled list to a friend who organizes it for you!

Why Use CSV in Python?


1. Easy Way to Store and Share Data
• Real-life Example: Imagine you’ve made a grocery list in Python using a Pandas DataFrame. You want
to send it to your friend who uses Excel. Saving it as a CSV file (like grocery.csv) lets them open it easily
without needing Python.
• Why it matters: CSV is a universal format—almost every tool or app can read it. It’s like writing your list
on a piece of paper that anyone can pick up and read.
2. Works Well with Pandas for Data Handling
• Real-life Example: Your friend sends you their grocery list as a CSV file. You load it into Python with
pd.read_csv() and instantly get a DataFrame to analyze—like checking the total cost or adding more
items.
• Why it matters: Python (with Pandas) can quickly read CSV files and turn them into tables you can
work with. It’s like getting a pre-organized notebook you can edit or study.
3. Simple and Lightweight
• Real-life Example: You don’t need fancy software to create a CSV. You could even type your grocery list
in Notepad (Item,Quantity,Price) and save it as list.csv. Python can still read it!
• Why it matters: Unlike big Excel files with formatting, CSV files are just text, so they’re small, fast to
load, and don’t waste space.
4. Great for Storing Large Datasets
• Real-life Example: Suppose you’re tracking grocery purchases for a whole year (hundreds of items). A
CSV file can hold all that data, and Pandas can load and process a file of that size quickly.
• Why it matters: CSV files scale from 3 rows to millions. By default Pandas loads the whole file into
memory, so for very large files you can read it in pieces with the chunksize parameter of pd.read_csv().
5. Perfect for Saving Your Work
• Real-life Example: You update your grocery DataFrame in Python (e.g., change Apples from 5 to 10).
You save it as a CSV with df.to_csv() so you don’t lose your changes.
• Why it matters: It’s like saving your game progress—you can close Python and pick up where you left
off later by reloading the CSV.
6. Connects Python to the Real World
• Real-life Example: Websites, apps, or sensors (like a store’s inventory system) often export data as
CSV. You can grab that file, load it into Python, and analyze it—like finding out which items sell the
most.
• Why it matters: CSV acts as a bridge between Python and other systems, making it super practical for
real-world tasks.
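
For a file too big to load comfortably in one go, one option is to read it in pieces. This is a minimal sketch using the chunksize parameter of pd.read_csv(); the ten-row text below is a made-up stand-in for a large file.

```python
import io

import pandas as pd

# Tiny stand-in for a large file: 10 purchases at a price of 2 each
csv_text = "Item,Price\n" + "Apples,2\n" * 10

total = 0
# chunksize=4 makes read_csv hand back DataFrames of at most 4 rows each
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk["Price"].sum()

print("Total spent:", total)
```

Only one chunk is held in memory at a time, which is why this pattern helps when the full file would not fit.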

### **2.4 Dimensions of a DataFrame**


**What is it?**
Check the size of your DataFrame (number of rows and columns).

**Real-life Example:**
You want to know how many items and details are in your grocery list.

**Python Example:**
```python
import pandas as pd

data = {"Item": ["Apples", "Bread"], "Quantity": [5, 2], "Price": [1, 2]}
df = pd.DataFrame(data)
print("Dimensions:", df.shape) # (rows, columns)
```
**Output:**
```
Dimensions: (2, 3)
```
(2 rows for Apples and Bread, 3 columns for Item, Quantity, Price)
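
If you need the two numbers separately, the shape tuple can be unpacked. A short sketch using the same two-item list; the names n_rows and n_cols are only for illustration.

```python
import pandas as pd

data = {"Item": ["Apples", "Bread"], "Quantity": [5, 2], "Price": [1, 2]}
df = pd.DataFrame(data)

n_rows, n_cols = df.shape    # unpack the (rows, columns) tuple
print("Rows:", n_rows)
print("Columns:", n_cols)
print("len(df):", len(df))   # len() counts rows
print("Cells:", df.size)     # rows * columns
```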

---

### **2.5 Summary Information about a DataFrame**


**What is it?**
Get a quick overview—column names, data types, and if anything’s missing.
**Real-life Example:**
You check your grocery list to see what columns you wrote and if you forgot any prices.

**Python Example:**
```python
import pandas as pd

data = {"Item": ["Apples", "Bread"], "Quantity": [5, 2], "Price": [1, None]}
df = pd.DataFrame(data)
df.info()  # info() prints the summary itself and returns None, so no print() is needed
```
**Output:**
```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Item      2 non-null      object
 1   Quantity  2 non-null      int64
 2   Price     1 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 176.0+ bytes
```
(Notice Price has 1 missing value!)
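
To count the missing values directly instead of reading them off the summary, one option is isna() followed by sum(). A minimal sketch with the same data:

```python
import pandas as pd

data = {"Item": ["Apples", "Bread"], "Quantity": [5, 2], "Price": [1, None]}
df = pd.DataFrame(data)

# isna() marks each missing cell as True; sum() counts the Trues per column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Show only the rows where the price is missing
print(df[df["Price"].isna()])
```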

---

### **2.6 Retrieving Subset of Data - Indexing and Slicing**


**What is it?**
Pick specific rows, columns, or parts of your DataFrame.

**Real-life Example:**
You want only the “Quantity” column or just the row for “Bread.”

**Python Example:**
```python
import pandas as pd

data = {"Item": ["Apples", "Bread", "Milk"], "Quantity": [5, 2, 1], "Price": [1, 2, 3]}
df = pd.DataFrame(data)

# Get just the "Quantity" column
print("Quantities:")
print(df["Quantity"])

# Get the row for "Bread" (position 1)
print("\nBread row:")
print(df.iloc[1])
```
**Output:**
```
Quantities:
0    5
1    2
2    1
Name: Quantity, dtype: int64

Bread row:
Item        Bread
Quantity        2
Price           2
Name: 1, dtype: object
```
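
A few more ways to pick out data, sketched with the same three-item list: several columns at once, a slice of rows, rows chosen by value, and a single cell. The variable names are only for illustration.

```python
import pandas as pd

data = {"Item": ["Apples", "Bread", "Milk"], "Quantity": [5, 2, 1], "Price": [1, 2, 3]}
df = pd.DataFrame(data)

# Several columns at once: pass a list of names
item_and_price = df[["Item", "Price"]]

# Slicing by position: the first two rows (the end position is not included)
first_two = df.iloc[0:2]

# Selecting by value: rows where Item equals "Bread"
bread = df[df["Item"] == "Bread"]

# One cell, by row label and column name
milk_price = df.loc[2, "Price"]

print(item_and_price, first_two, bread, milk_price, sep="\n\n")
```

Selecting by value is handy when you know the item's name but not its row number.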

---

### **2.7 Descriptive Statistics**


**What is it?**
Summarize numbers—like average, total, min, or max.

**Real-life Example:**
You calculate the average price of your groceries or the total quantity.

**Python Example:**
```python
import pandas as pd

data = {"Item": ["Apples", "Bread", "Milk"], "Quantity": [5, 2, 1], "Price": [1, 2, 3]}
df = pd.DataFrame(data)

# Summary statistics for every numeric column
print("Summary Stats:")
print(df.describe())

print("\nAverage Price:", df["Price"].mean())
```
**Output:**
```
Summary Stats:
       Quantity  Price
count  3.000000    3.0
mean   2.666667    2.0
std    2.081666    1.0
min    1.000000    1.0
25%    1.500000    1.5
50%    2.000000    2.0
75%    3.500000    2.5
max    5.000000    3.0

Average Price: 2.0
```
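
Totals, minimums, and maximums can also be asked for one at a time. A minimal sketch with the same list; the total cost line assumes that Price means the price of a single unit.

```python
import pandas as pd

data = {"Item": ["Apples", "Bread", "Milk"], "Quantity": [5, 2, 1], "Price": [1, 2, 3]}
df = pd.DataFrame(data)

total_quantity = df["Quantity"].sum()
cheapest = df["Price"].min()
most_expensive = df["Price"].max()

# Assumption: Price is per unit, so the cost of a row is Quantity * Price
total_cost = (df["Quantity"] * df["Price"]).sum()

print("Total quantity:", total_quantity)
print("Cheapest price:", cheapest)
print("Highest price:", most_expensive)
print("Total cost:", total_cost)
```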

---

### **2.8 Data Manipulation**


**What is it?**
Change your DataFrame—add rows, update values, or delete columns.

**Real-life Example:**
You decide to buy 10 apples instead of 5 and add “Eggs” to your list.

**Python Example:**
```python
import pandas as pd

data = {"Item": ["Apples", "Bread"], "Quantity": [5, 2], "Price": [1, 2]}
df = pd.DataFrame(data)

# Update Quantity of Apples to 10
df.loc[0, "Quantity"] = 10

# Add a new row for Eggs
new_row = pd.DataFrame({"Item": ["Eggs"], "Quantity": [12], "Price": [4]})
df = pd.concat([df, new_row], ignore_index=True)

print(df)
```
**Output:**
```
     Item  Quantity  Price
0  Apples        10      1
1   Bread         2      2
2    Eggs        12      4
```
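
Adding a column and deleting a column or row work in a similar way. A minimal sketch; the "Cost" column is a made-up addition and assumes Price is the price of a single unit.

```python
import pandas as pd

data = {"Item": ["Apples", "Bread"], "Quantity": [5, 2], "Price": [1, 2]}
df = pd.DataFrame(data)

# Add a column calculated from existing ones
df["Cost"] = df["Quantity"] * df["Price"]

# Delete a column: drop() returns a new DataFrame and leaves df unchanged
without_price = df.drop(columns=["Price"])

# Delete a row by its index label
without_bread = df.drop(index=1)

print(without_price)
print(without_bread)
```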

---

### **2.9 Writing to CSV File**


**What is it?**
Save your DataFrame as a CSV file.

**Real-life Example:**
After updating your grocery list, you save it to share with your friend.

**Python Example:**
```python
import pandas as pd

data = {"Item": ["Apples", "Bread"], "Quantity": [5, 2], "Price": [1, 2]}
df = pd.DataFrame(data)

# Save to a CSV file (index=False leaves out the row numbers)
df.to_csv("my_grocery_list.csv", index=False)
```
**Result:** A file `my_grocery_list.csv` is created with:
```
Item,Quantity,Price
Apples,5,1
Bread,2,2
```
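
To see what index=False changes, you can call to_csv() without a file name, in which case it returns the CSV text instead of writing a file. A short sketch:

```python
import pandas as pd

data = {"Item": ["Apples", "Bread"], "Quantity": [5, 2], "Price": [1, 2]}
df = pd.DataFrame(data)

# No file name given: the CSV text comes back as a string
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)     # first column holds the row numbers
print(without_index)  # only the data columns
```

With the index kept, reloading the file later gives you an extra unnamed column, which is why index=False is the usual choice for a simple list like this.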

---

### **2.10 Grouping and Aggregation**


**What is it?**
Group data by a column and summarize it (e.g., sum, average).

**Real-life Example:**
You track groceries by week and want the total cost per week.

**Python Example:**
```python
import pandas as pd

data = {
    "Week": ["Week1", "Week1", "Week2", "Week2"],
    "Item": ["Apples", "Bread", "Milk", "Apples"],
    "Price": [1, 2, 3, 1],
}
df = pd.DataFrame(data)

# Group by Week and sum the Price
grouped = df.groupby("Week")["Price"].sum()
print(grouped)
```
**Output:**
```
Week
Week1    3
Week2    4
Name: Price, dtype: int64
```
(Week1 total = $3, Week2 total = $4)
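
Several summaries can be calculated in one step with agg(). A minimal sketch on the same data, asking for the total, the average, and the number of items per week:

```python
import pandas as pd

data = {
    "Week": ["Week1", "Week1", "Week2", "Week2"],
    "Item": ["Apples", "Bread", "Milk", "Apples"],
    "Price": [1, 2, 3, 1],
}
df = pd.DataFrame(data)

# One row per week, one column per summary
summary = df.groupby("Week")["Price"].agg(["sum", "mean", "count"])
print(summary)
```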

---

### **Putting It All Together**


Imagine you’re managing your grocery list:
1. You create a DataFrame (2.1, 2.2).
2. Load it from a CSV if someone sent it (2.3).
3. Check its size (2.4) and summary (2.5).
4. Pick out specific items (2.6), summarize costs (2.7), and update it (2.8).
5. Save it back to a file (2.9) and group by weeks if needed (2.10).
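
The steps above can be sketched as one short script. The data is made up, io.StringIO stands in for a CSV file someone sent you, and the result is shown as text rather than written to disk so the example runs on its own.

```python
import io

import pandas as pd

# 2.3: load the list (io.StringIO stands in for a real file)
csv_text = (
    "Week,Item,Quantity,Price\n"
    "Week1,Apples,5,1\n"
    "Week1,Bread,2,2\n"
    "Week2,Milk,1,3\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# 2.4 and 2.5: check the size and the summary
print(df.shape)
df.info()

# 2.6: pick out one column
print(df["Item"])

# 2.7: average price
print(df["Price"].mean())

# 2.8: buy 10 apples instead of 5
df.loc[df["Item"] == "Apples", "Quantity"] = 10

# 2.10: total price per week
per_week = df.groupby("Week")["Price"].sum()
print(per_week)

# 2.9: save the result (pass a file name to write a real file)
print(df.to_csv(index=False))
```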
