Geo Python Doc (1) 7,8 Bavesh

The document outlines a Python program using Pandas to create a DataFrame and perform various operations including data selection, indexing, and handling missing data. It also describes a separate program using Matplotlib to implement different types of plots such as line, scatter, box, and histogram. Both programs are successfully executed as per the provided algorithms.

Uploaded by

BAVESSH Chaman

EX NO : 7 CREATE A DATAFRAME USING PANDAS

DATE:

AIM:
To create a DataFrame using pandas and perform the following
operations:
• Data selection
• Data indexing
• Handling missing data in nominal attributes
• Handling missing data in numeric attributes
• Grouping operations

ALGORITHM:
Step 1: Start
Step 2: Import the required libraries (NumPy and Pandas)
Step 3: Create a DataFrame containing Name, Department, Salary and
Experience from a dictionary ({ })
Step 4: Select specific columns by name and specific rows by using
integer-location indexing (iloc)
Step 5: Set a column as the new index using set_index() and handle missing
data in nominal and numeric attributes using fillna()
Step 6: Group the rows by Department using groupby() and find the group
averages using mean()
Step 7: Print the result
Step 8: Stop
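
Step 4 relies on integer-location indexing. As a minimal sketch of how iloc differs from label-based selection with loc, the example below uses a small made-up frame (the values and the variable names by_position, by_label and high_paid are assumptions for illustration, not part of the lab program):

```python
import pandas as pd

# Small illustrative frame; the values are made up for this sketch
df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Salary': [100, 200, 300, 400],
    'Experience': [1, 2, 3, 4],
})

# iloc is position-based: rows 0 and 1, columns 0 and 1 (end is exclusive)
by_position = df.iloc[0:2, 0:2]

# loc is label-based: row labels 0 to 1 (end is inclusive), named columns
by_label = df.loc[0:1, ['Name', 'Salary']]

# A boolean mask selects rows by condition instead of by position
high_paid = df[df['Salary'] > 250]

print(by_position)
print(by_label)
print(high_paid)
```

With the default integer index both selections return the same two rows; they would differ once a column such as Name is set as the index.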

PROGRAM:
import pandas as pd
import numpy as np

# 1. Creating a DataFrame
data = {
    'Name': ['SARO', 'MAITHU', 'JITHESH', 'GOKUL', np.nan],
    'Department': ['HR', 'IT', 'HR', np.nan, 'Finance'],
    'Salary': [50000, 60000, np.nan, 65000, 70000],
    'Experience': [2, 4, 5, 3, np.nan]
}

df = pd.DataFrame(data)
print("Original DataFrame:\n", df)

# 2. Data Selection (selecting specific columns and rows)
print("\nSelected Columns (Name and Salary):\n", df[['Name', 'Salary']])
print("\nSelected Rows (first two rows):\n", df.iloc[0:2])

# 3. Data Indexing (setting 'Name' as the index)
df_indexed = df.set_index('Name')
print("\nDataFrame with 'Name' as index:\n", df_indexed)

# 4. Handling missing data in nominal (categorical) attributes
# Filling missing 'Department' with mode (most frequent value)
df['Department'] = df['Department'].fillna(df['Department'].mode()[0])

# Filling missing 'Name' with a placeholder
df['Name'] = df['Name'].fillna('Unknown')
print("\nDataFrame after handling missing nominal data:\n", df)

# 5. Handling missing data in numeric attributes
# Filling missing 'Salary' and 'Experience' with mean values
df['Salary'] = df['Salary'].fillna(df['Salary'].mean())
df['Experience'] = df['Experience'].fillna(df['Experience'].mean())
print("\nDataFrame after handling missing numeric data:\n", df)

# 6. Grouping operations (group by Department and calculate
# average Salary and Experience)
grouped = df.groupby('Department')[['Salary', 'Experience']].mean()
print("\nGrouped by Department (average Salary and Experience):\n", grouped)
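
The program fills each column in a separate statement. As a hedged alternative sketch, the missing values can first be counted with isna().sum() and then filled in one call by passing a dictionary to fillna(). The data is taken from the sample output of this exercise; the names missing_before, fill_values, filled and missing_after are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the sample output of this exercise
df = pd.DataFrame({
    'Name': ['SARO', 'MAITHU', 'JITHESH', 'GOKUL', np.nan],
    'Department': ['HR', 'IT', 'HR', np.nan, 'Finance'],
    'Salary': [50000, 60000, np.nan, 65000, 70000],
    'Experience': [2, 4, 5, 3, np.nan],
})

# Count the missing values in every column before filling
missing_before = df.isna().sum()

# One fill value per column: placeholder, mode, mean, mean
fill_values = {
    'Name': 'Unknown',
    'Department': df['Department'].mode()[0],
    'Salary': df['Salary'].mean(),
    'Experience': df['Experience'].mean(),
}

# fillna accepts a dict mapping column name to fill value
# and returns a new DataFrame, leaving df unchanged
filled = df.fillna(value=fill_values)
missing_after = filled.isna().sum()

print(missing_before)
print(filled)
print(missing_after)
```

Keeping the fill values in one dictionary makes the chosen strategy for each column visible in a single place, and the original frame stays available for comparison.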

SAMPLE INPUT AND OUTPUT:

Original DataFrame:
      Name Department   Salary  Experience
0     SARO         HR  50000.0         2.0
1   MAITHU         IT  60000.0         4.0
2  JITHESH         HR      NaN         5.0
3    GOKUL        NaN  65000.0         3.0
4      NaN    Finance  70000.0         NaN

Selected Columns (Name and Salary):
      Name   Salary
0     SARO  50000.0
1   MAITHU  60000.0
2  JITHESH      NaN
3    GOKUL  65000.0
4      NaN  70000.0

Selected Rows (first two rows):
     Name Department   Salary  Experience
0    SARO         HR  50000.0         2.0
1  MAITHU         IT  60000.0         4.0

DataFrame with 'Name' as index:
        Department   Salary  Experience
Name
SARO            HR  50000.0         2.0
MAITHU          IT  60000.0         4.0
JITHESH         HR      NaN         5.0
GOKUL          NaN  65000.0         3.0
NaN        Finance  70000.0         NaN

DataFrame after handling missing nominal data:
      Name Department   Salary  Experience
0     SARO         HR  50000.0         2.0
1   MAITHU         IT  60000.0         4.0
2  JITHESH         HR      NaN         5.0
3    GOKUL         HR  65000.0         3.0
4  Unknown    Finance  70000.0         NaN

DataFrame after handling missing numeric data:
      Name Department   Salary  Experience
0     SARO         HR  50000.0         2.0
1   MAITHU         IT  60000.0         4.0
2  JITHESH         HR  61250.0         5.0
3    GOKUL         HR  65000.0         3.0
4  Unknown    Finance  70000.0         3.5

Grouped by Department (average Salary and Experience):
             Salary  Experience
Department
Finance     70000.0    3.500000
HR          58750.0    3.333333
IT          60000.0    4.000000
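
The grouped averages can be cross-checked by also reporting how many rows fall into each department. The sketch below is one possible way to do this with named aggregation in agg(); it starts from the filled values shown in the output above, and the result column names employees, mean_salary and mean_experience are assumptions chosen for illustration.

```python
import pandas as pd

# Values as they appear after the missing data has been filled
df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'HR', 'Finance'],
    'Salary': [50000.0, 60000.0, 61250.0, 65000.0, 70000.0],
    'Experience': [2.0, 4.0, 5.0, 3.0, 3.5],
})

# Named aggregation: result_column=(source_column, function)
summary = df.groupby('Department').agg(
    employees=('Salary', 'count'),
    mean_salary=('Salary', 'mean'),
    mean_experience=('Experience', 'mean'),
)

print(summary)
```

HR has three rows, so its average salary is (50000 + 61250 + 65000) / 3 = 58750, which agrees with the grouped output.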

RESULT:
Thus the Python program for creating a DataFrame using pandas and
performing various operations was implemented and executed successfully.

Register number: 2127240501036 Page:


EX NO : 8 IMPLEMENTATION OF MATPLOTLIB

DATE:

AIM:
To write a Python program to implement the following plots using
Matplotlib:
i. Line plot
ii. Scatter plot
iii. Density plot
iv. Box plot
v. Histogram

ALGORITHM:
Step 1: Start
Step 2: Import the libraries matplotlib and numpy
Step 3: Create the x values and y values as NumPy arrays using np.array()
Step 4: Create the line plot, scatter plot, density plot (histogram) and
box plot using plot(), scatter(), hist() and boxplot()
Step 5: Display each plot using plt.show()
Step 6: Stop
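
The steps above draw one figure per plot. As an optional sketch, the same four plot calls can be placed in a single figure with plt.subplots(); the 2 x 2 layout, the figure size and the short titles are assumptions for illustration and are not part of the lab program.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same small set of points used in this exercise
x_values = np.array([0, 3, 6, 9])
y_values = np.array([0, 1, -1, 0])

# One figure holding a 2 x 2 grid of axes
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].plot(x_values, y_values, marker='o', color='b')
axes[0, 0].set_title('Line plot')

axes[0, 1].scatter(x_values, y_values, marker='x', color='r')
axes[0, 1].set_title('Scatter plot')

axes[1, 0].boxplot(y_values)
axes[1, 0].set_title('Box plot')

axes[1, 1].hist(y_values, bins=4, color='orange', edgecolor='black')
axes[1, 1].set_title('Histogram')

# Avoid overlapping titles and tick labels, then display once
fig.tight_layout()
plt.show()
```

Each axes object has its own plot(), scatter(), boxplot() and hist() methods, so a single plt.show() displays all four plots together.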

PROGRAM:

import numpy as np
import matplotlib.pyplot as plt

# Defining a small set of points
x_values = np.array([0, 3, 6, 9])
y_values = np.array([0, 1, -1, 0])

# 1. Line Plot
plt.plot(x_values, y_values, marker='o',
         linestyle='-', color='b', label='Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Line Plot with Limited Points')
plt.xlim(-1, 10)
plt.ylim(-2, 2)
plt.legend()
plt.show()

# 2. Scatter Plot
plt.scatter(x_values, y_values, color='r', marker='x',
label='Scatter Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot with Limited Points')
plt.xlim(-1, 10)
plt.ylim(-2, 2)
plt.legend()
plt.show()

# 3. Box Plot (Using limited values)
plt.boxplot(y_values, vert=True, patch_artist=True,
            boxprops=dict(facecolor="purple"))
plt.xlabel('Data')
plt.title('Box Plot with Limited Points')
plt.ylim(-2, 2)
plt.show()

# 4. Histogram (Using y-values)
plt.hist(y_values, bins=4, color='orange', edgecolor='black',
         alpha=0.7)
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Histogram with Limited Points')
plt.xlim(-2, 2)
plt.show()
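
The AIM lists a density plot, but the program draws a histogram rather than a smoothed density curve. The following is a minimal sketch of one way to draw such a curve, using a Gaussian kernel density estimate written directly in NumPy. The helper name gaussian_kde_1d, the bandwidth of 0.5 and the grid range are assumptions chosen for illustration; they do not come from the lab program.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same y-values used for the box plot and histogram
y_values = np.array([0, 1, -1, 0])


def gaussian_kde_1d(samples, grid, bandwidth):
    """Hypothetical helper: average of Gaussian bumps centred on each sample."""
    samples = np.asarray(samples, dtype=float)
    # Scaled distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    # Averaging the kernels keeps the total area equal to 1
    return kernels.mean(axis=1)


# Evaluation grid and bandwidth are assumed values for this sketch
grid = np.linspace(-4, 4, 801)
density = gaussian_kde_1d(y_values, grid, bandwidth=0.5)

# 5. Density Plot (smoothed estimate of the y-values)
plt.plot(grid, density, color='g', label='Density Plot')
plt.fill_between(grid, density, color='g', alpha=0.3)
plt.xlabel('Values')
plt.ylabel('Density')
plt.title('Density Plot with Limited Points')
plt.legend()
plt.show()
```

A smaller bandwidth gives a spikier curve and a larger one a smoother curve; with only four points the shape depends heavily on that choice.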

Sample Input and Output:

RESULT:
Thus the Python program to implement the above plots using Matplotlib was
implemented and executed successfully.
