DRA Lab Exp8

The document outlines a Python program using pandas to manipulate a DataFrame containing individual data, including names, ages, cities, and incomes. It demonstrates loading data from a CSV file, checking for missing values, filtering based on age, and adding a new salary column. The program successfully saves the filtered data to a new CSV file and confirms the successful execution of the DataFrame operations.

Uploaded by

bborigarla

DATE:

EXPERIMENT-8
DATAFRAME OBJECTS
AIM: To develop a Python program that works with pandas DataFrame objects.

SOURCE CODE:

import pandas as pd

# Step 1: Load data from the CSV file (saved as 'dataset.csv', see DATASET DESCRIPTION)
df = pd.read_csv('dataset.csv')

# Step 2: Display the first few rows of the dataset
print("Original dataset:")
print(df.head())

# Step 3: Get summary information about the dataset
# df.info() prints its report itself and returns None, so it is not wrapped in print()
print("\nDataset information:")
df.info()

# Step 4: Check for missing values in each column
print("\nMissing values in each column:")
print(df.isnull().sum())

# Step 5: Display summary statistics of the numeric columns
print("\nSummary statistics of the dataset:")
print(df.describe())

# Step 6: Select specific columns (adjust column names as per your dataset)
selected_columns = df[['Age', 'Income']]
print("\nSelected columns:")
print(selected_columns.head())

# Step 7: Filter rows based on a condition
# Example: only include rows where the 'Age' column is greater than 25
# (adjust the column name and condition based on your dataset)
filtered_df = df[df['Age'] > 25]
print("\nFiltered data where Age > 25:")
print(filtered_df.head())

# Step 8: Add a new column with calculated or default values (optional)
# Example: adding a 'Salary' column with a default value of 50000
df['Salary'] = 50000  # Replace with a calculation or value as required
print("\nData with new 'Salary' column:")
print(df.head())

# Step 9: Save the filtered data to a new CSV file
filtered_df.to_csv('filtered_data.csv', index=False)
print("\nFiltered data saved to 'filtered_data.csv'")
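Step 8 notes that the default Salary value can be replaced with a calculation. The following is a minimal sketch of one way to do that, assuming an illustrative 'MonthlyIncome' column (not part of the original program) derived from Income. The rows are typed in directly so the example runs without a CSV file. It also takes an explicit copy of the filtered rows: in the program above, the Salary column is added to df only after filtered_df has been created, so it does not appear in filtered_data.csv.

```python
import pandas as pd

# Same five rows as the lab dataset, typed in so no CSV file is needed
df = pd.DataFrame({
    "Name": ["Vignesh", "Bhanu", "Gowtham", "Abhi", "Balaji"],
    "Age": [24, 27, 22, 32, 29],
    "City": ["New York", "Los Angeles", "Chicago", "Houston", "Phoenix"],
    "Income": [72000, 82000, 65000, 91000, 60000],
})

# .copy() gives an independent DataFrame, so adding a column to it
# neither changes df nor raises a SettingWithCopyWarning
older = df[df["Age"] > 25].copy()

# 'MonthlyIncome' is a hypothetical calculated column used only for illustration
older["MonthlyIncome"] = older["Income"] / 12

print(older)
```

Because the new column is added to the filtered copy, it would be included if that copy were written out with to_csv.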

DATASET DESCRIPTION:

This dataset contains basic information about five individuals: their names, ages, cities, and
annual incomes. Each row in the dataset represents one person.
Columns:
1. Name: The person’s first name (e.g., Vignesh, Bhanu).
2. Age: The person’s age in years (e.g., 24, 27).
3. City: The city where the person lives (e.g., New York, Chicago).
4. Income: The person’s annual income in USD (e.g., 72000, 82000).
The data is given below:

Name     Age  City         Income
Vignesh  24   New York     72000
Bhanu    27   Los Angeles  82000
Gowtham  22   Chicago      65000
Abhi     32   Houston      91000
Balaji   29   Phoenix      60000

Save the dataset as ‘dataset.csv’.
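If the file is not typed by hand, it can also be produced from Python. The following is a minimal sketch, assuming the file is written to the current working directory; the variable names are illustrative.

```python
import pandas as pd

# Rows copied from the table in the dataset description
records = [
    {"Name": "Vignesh", "Age": 24, "City": "New York", "Income": 72000},
    {"Name": "Bhanu", "Age": 27, "City": "Los Angeles", "Income": 82000},
    {"Name": "Gowtham", "Age": 22, "City": "Chicago", "Income": 65000},
    {"Name": "Abhi", "Age": 32, "City": "Houston", "Income": 91000},
    {"Name": "Balaji", "Age": 29, "City": "Phoenix", "Income": 60000},
]

people = pd.DataFrame(records, columns=["Name", "Age", "City", "Income"])

# index=False keeps the row numbers out of the file, so it has exactly four columns
people.to_csv("dataset.csv", index=False)

# Read the file back to confirm it has the expected shape
reloaded = pd.read_csv("dataset.csv")
print(reloaded.shape)
```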

OUTPUT:

Original dataset:
      Name  Age         City  Income
0  Vignesh   24     New York   72000
1    Bhanu   27  Los Angeles   82000
2  Gowtham   22      Chicago   65000
3     Abhi   32      Houston   91000
4   Balaji   29      Phoenix   60000

Dataset information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    5 non-null      object
 1   Age     5 non-null      int64
 2   City    5 non-null      object
 3   Income  5 non-null      int64
dtypes: int64(2), object(2)
memory usage: 136.0 bytes

Missing values in each column:
Name      0
Age       0
City      0
Income    0
dtype: int64

Summary statistics of the dataset:
             Age        Income
count   5.000000      5.000000
mean   26.800000  74000.000000
std     3.962323  12589.678312
min    22.000000  60000.000000
25%    24.000000  65000.000000
50%    27.000000  72000.000000
75%    29.000000  82000.000000
max    32.000000  91000.000000
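The figures reported by describe() can be cross-checked by hand. The following is a small sketch for the Age column, using the five ages from the dataset. It relies on the fact that pandas reports the sample standard deviation (dividing by n - 1 rather than n).

```python
import pandas as pd

# Ages from the dataset, in row order
ages = pd.Series([24, 27, 22, 32, 29], name="Age")

n = len(ages)
mean_age = ages.sum() / n

# Sample variance: squared deviations from the mean, divided by n - 1
sample_var = ((ages - mean_age) ** 2).sum() / (n - 1)
sample_std = sample_var ** 0.5

# Compare the hand-computed values with what describe() reports
summary = ages.describe()
print("mean:", mean_age, "describe():", summary["mean"])
print("std: ", sample_std, "describe():", summary["std"])
```

The same check applies to Income by swapping in that column's values.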

Selected columns:
   Age  Income
0   24   72000
1   27   82000
2   22   65000
3   32   91000
4   29   60000

Filtered data where Age > 25:
     Name  Age         City  Income
1   Bhanu   27  Los Angeles   82000
3    Abhi   32      Houston   91000
4  Balaji   29      Phoenix   60000

Data with new 'Salary' column:
      Name  Age         City  Income  Salary
0  Vignesh   24     New York   72000   50000
1    Bhanu   27  Los Angeles   82000   50000
2  Gowtham   22      Chicago   65000   50000
3     Abhi   32      Houston   91000   50000
4   Balaji   29      Phoenix   60000   50000

Filtered data saved to 'filtered_data.csv'


RESULT:
The Python program for DataFrame objects, using the dataset with Name, Age, City, and Income columns, was executed successfully.
