EDAP

The document contains a series of Python programs demonstrating various operations using
NumPy, including finding dimensions, shape, and size, reshaping, flattening, transposing
arrays, and performing slicing. It also covers stacking and concatenating arrays, as well
as broadcasting for element-wise operations. Additionally, it includes tasks to create and
manipulate structured datasets using pandas to represent employee information.

Week 2:------------------------------------------------

Write a program to find the dimensions (number of axes or ranks) of a given NumPy
array.
[[1, 2, 3], [4, 5, 6]]
Dimensions: 2

Code:------
import numpy as np

# Given array
arr = np.array([[1, 2, 3], [4, 5, 6]])

# Find the number of dimensions
dimensions = arr.ndim

print("Array:")
print(arr)
print("Dimensions:", dimensions)

Write a program to find the shape of a given NumPy array.
[[1, 2, 3], [4, 5, 6]]
Shape: (2, 3)

import numpy as np

# Given array
arr = np.array([[1, 2, 3], [4, 5, 6]])

# Get the shape of the array
shape = arr.shape

print("Array:")
print(arr)
print("Shape:", shape)

Write a program to find the size (total number of elements) of a given NumPy array.
[[1, 2, 3], [4, 5, 6]]
Size: 6

import numpy as np

# Given array
arr = np.array([[1, 2, 3], [4, 5, 6]])

# Get the size of the array
size = arr.size

print("Array:")
print(arr)
print("Size:", size)

Write a program to reshape a given NumPy array into a different shape.
[1, 2, 3, 4, 5, 6]
Reshaped Array: [[1 2 3] [4 5 6]]

import numpy as np

# Given 1D array
arr = np.array([1, 2, 3, 4, 5, 6])
# Reshape into a 2D array with 2 rows and 3 columns
reshaped_arr = arr.reshape(2, 3)

print("Original Array:")
print(arr)
print("\nReshaped Array (2x3):")
print(reshaped_arr)

Write a program to flatten a given NumPy array, converting it into a 1D array.
[[1, 2, 3], [4, 5, 6]]
Flattened Array: [1 2 3 4 5 6]

import numpy as np

# Given 2D array
arr = np.array([[1, 2, 3], [4, 5, 6]])

# Flatten the array (returns a new copy)
flattened_arr = arr.flatten()

print("Original Array:")
print(arr)
print("\nFlattened Array:")
print(flattened_arr)
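
A related detail worth knowing: flatten() always returns a copy, while ravel() returns a
view of the original data whenever possible, so writes through it can modify the source
array. A minimal sketch of the difference:

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = arr.flatten()  # always a copy
flat_view = arr.ravel()    # a view when possible

flat_copy[0] = 99  # leaves arr untouched
flat_view[0] = 99  # writes through to arr

print(arr)  # first element is now 99 via the ravel() view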

Write a program to find the transpose of a given NumPy array.
[[1, 2, 3], [4, 5, 6]]
Transpose of the Array: [[1 4] [2 5] [3 6]]

import numpy as np

# Given 2D array
arr = np.array([[1, 2, 3], [4, 5, 6]])

# Transpose the array
transposed_arr = arr.T

print("Original Array:")
print(arr)
print("\nTransposed Array:")
print(transposed_arr)

Week 3:------------------------------------------------
Write a program to expand a given NumPy array by adding a new axis, turning a 1D
array into a 2D array.
[1, 2, 3, 4]
Expanded Array: [[1] [2] [3] [4]]

import numpy as np

# Given 1D array
arr = np.array([1, 2, 3, 4])

# Reshape to a column vector (shape 4x1)
expanded_arr = arr.reshape(-1, 1)

print("Original Array (shape {}):".format(arr.shape))
print(arr)
print("\nExpanded Array (shape {}):".format(expanded_arr.shape))
print(expanded_arr)
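
The exercise title mentions adding a new axis; np.newaxis and np.expand_dims do exactly
that and give the same 4x1 result as the reshape above. A short sketch:

import numpy as np

arr = np.array([1, 2, 3, 4])

# Both are equivalent to arr.reshape(-1, 1)
col1 = arr[:, np.newaxis]
col2 = np.expand_dims(arr, axis=1)

print(col1.shape, col2.shape)  # (4, 1) (4, 1)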

Write a program to squeeze a given NumPy array by removing any axes of length 1.
[[[1], [2], [3], [4]]]
Squeezed Array: [1 2 3 4]

import numpy as np

# Given array with unnecessary length-1 axes
arr = np.array([[[1], [2], [3], [4]]])

# Squeeze the array (removes all length-1 dimensions)
squeezed_arr = np.squeeze(arr)

print("Original Array (shape {}):".format(arr.shape))
print(arr)
print("\nSqueezed Array (shape {}):".format(squeezed_arr.shape))
print(squeezed_arr)

Write a program to sort a given NumPy array in ascending order.
[5, 3, 6, 2, 4, 1]
Sorted Array: [1 2 3 4 5 6]

import numpy as np

# Given array
arr = np.array([5, 3, 6, 2, 4, 1])

# Sort the array (returns a new array)
sorted_arr = np.sort(arr)

print("Original Array:")
print(arr)
print("\nSorted Array (ascending):")
print(sorted_arr)
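
Note that np.sort returns a new array, while the ndarray.sort method sorts in place;
slicing with a negative step then gives descending order. A small sketch, assuming the
same input:

import numpy as np

arr = np.array([5, 3, 6, 2, 4, 1])

arr.sort()        # in-place: arr itself is now sorted
print(arr)        # [1 2 3 4 5 6]
print(arr[::-1])  # [6 5 4 3 2 1] (descending view)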

Week 4:------------------------------------------------
Write a program to slice a 1D NumPy array to extract a specific portion of the
array.
[1, 2, 3, 4, 5, 6]
1:4
Sliced Array: [2 3 4]

import numpy as np

# Given array
arr = np.array([1, 2, 3, 4, 5, 6])

# Slice from index 1 to 4 (exclusive)
sliced_arr = arr[1:4]

print("Original Array:")
print(arr)
print("\nSliced Array (indices 1:4):")
print(sliced_arr)

Write a program to slice a 2D NumPy array and extract a specific portion of the
array.
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
[1:3, 1:3]
Sliced Array: [[2 3] [5 6]]

import numpy as np

# Given 2D array
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Slice rows 1:3 and columns 1:3
sliced_arr = arr[1:3, 1:3]

print("Original Array:")
print(arr)
print("\nSliced Array (rows 1:3, columns 1:3):")
print(sliced_arr)

Write a program to slice a 3D NumPy array and extract a specific portion of the
array.
[[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]
[0:2, 0:2, 1:3]
Sliced Array: [[[2 3] [5 6]] [[8 9] [11 12]]]

import numpy as np

# Given 3D array (shape 2x2x3)
arr = np.array([
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]]
])

# Perform the slice [0:2, 0:2, 1:3]
sliced_arr = arr[0:2, 0:2, 1:3]

print("Original Array (shape: {}):".format(arr.shape))
print(arr)
print("\nSliced Array [0:2, 0:2, 1:3] (shape: {}):".format(sliced_arr.shape))
print(sliced_arr)

Write a program to perform negative slicing on a NumPy array to extract a portion
of the array starting from the end.
[1, 2, 3, 4, 5, 6]
[-3:]
Sliced Array: [4 5 6]

import numpy as np

# Given 1D array
arr = np.array([1, 2, 3, 4, 5, 6])

# Negative slicing to get the last 3 elements
sliced_arr = arr[-3:]

print("Original Array:")
print(arr)
print("\nSliced Array (last 3 elements using -3:):")
print(sliced_arr)
Week 5:------------------------------------------------
You are given multiple NumPy ndarrays of varying dimensions. Your task is to stack
these arrays along a specified axis and return the resulting stacked ndarray.

Input:
A list of NumPy ndarrays, all with the same shape (np.stack joins along a new axis,
so the inputs must match exactly).

Output:
A single ndarray, which is the result of stacking the input ndarrays along the
specified axis.

Constraints:
The input arrays must have compatible shapes for stacking: np.stack creates a
brand-new axis, so every input array must have exactly the same shape. A shape
mismatch raises a ValueError, whatever the axis (see the sketch after the code below).

Input - 1:
n = [12,27,13,54,75]
m = [66,74,81,79,90]

Output - 1
[[ 12 66]
[ 27 74]
[ 13 81]
[ 54 79]
[ 75 90]]

import numpy as np

def stack_arrays(arrays, axis=0):
    """
    Stack multiple NumPy arrays along a specified axis.

    Parameters:
        arrays (list of ndarray): Arrays to stack
        axis (int): Axis along which to stack (default 0)

    Returns:
        ndarray: Stacked array
    """
    return np.stack(arrays, axis=axis)

# Input arrays
n = np.array([12, 27, 13, 54, 75])
m = np.array([66, 74, 81, 79, 90])

# Stack along axis=1 (as columns)
stacked = stack_arrays([n, m], axis=1)

print("Stacked Array (axis=1):")
print(stacked)
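
To make the shape constraint concrete: np.stack joins arrays along a brand-new axis, so
every input must have exactly the same shape, and a mismatch raises a ValueError. A quick
sketch:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5])     # different length
c = np.array([4, 5, 6])  # same length as a

try:
    np.stack([a, b], axis=0)
except ValueError as e:
    print("Stacking failed:", e)

print(np.stack([a, c], axis=0).shape)  # (2, 3) - stacked as rows
print(np.stack([a, c], axis=1).shape)  # (3, 2) - stacked as columns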

You are given multiple 1-dimensional or 2-dimensional NumPy ndarrays. Your task is
to concatenate these ndarrays along a specified axis (either rows or columns) and
return the resultant ndarray.

Input:
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

Output:
[[1 2]
[3 4]
[5 6]
[7 8]]

import numpy as np

def concatenate_arrays(arrays, axis=0):
    """
    Concatenate multiple NumPy arrays along a specified axis.

    Parameters:
        arrays (list of ndarray): Arrays to concatenate
        axis (int): Axis along which to concatenate (0 for rows, 1 for columns)

    Returns:
        ndarray: Concatenated array
    """
    return np.concatenate(arrays, axis=axis)

# Input arrays
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Concatenate along rows (axis=0)
concatenated = concatenate_arrays([a, b], axis=0)

print("Concatenated Array (axis=0):")
print(concatenated)

You are tasked with performing broadcasting on NumPy arrays to carry out element-wise
operations, such as addition, subtraction, multiplication, and division. Broadcasting
allows arrays of different shapes to be used together in arithmetic operations, as long
as they are compatible according to broadcasting rules.

Input:
n = [[12, 22], [13, 44]]
m = [19, 32]

Output:
[[31 54]
[32 76]]

[[ -7 -10]
[ -6 12]]

[[ 228 704]
[ 247 1408]]

[[0.63157895 0.6875 ]
[0.68421053 1.375 ]]

import numpy as np

# Input arrays
n = np.array([[12, 22], [13, 44]])
m = np.array([19, 32])

# Broadcasting operations
addition = n + m
subtraction = n - m
multiplication = n * m
division = n / m

print("Addition:")
print(addition)
print("\nSubtraction:")
print(subtraction)
print("\nMultiplication:")
print(multiplication)
print("\nDivision:")
print(division)
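
For intuition about why these shapes are compatible: NumPy aligns shapes from the
trailing dimension, so m with shape (2,) is stretched across each row of n with shape
(2, 2). np.broadcast_shapes (available in NumPy 1.20+) checks compatibility without doing
any arithmetic; a sketch:

import numpy as np

# Trailing dimensions match (2 vs 2), so the result is (2, 2)
print(np.broadcast_shapes((2, 2), (2,)))  # (2, 2)

# Mismatched trailing dimensions are rejected
try:
    np.broadcast_shapes((2, 2), (3,))
except ValueError as e:
    print("Not broadcastable:", e)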

Week 6:---------------------------------------------------------
Problem1:
Create a structured dataset to analyze and manipulate information effectively using
Python's pandas library. Design a pandas DataFrame that represents the following
real-world data scenarios.

Input:
| EMPN | ENAME | JOB | HIREDATE | SALARY | DEPTNO |
|-------|--------|------------|------------|--------|--------|
| 7369 | SMITH | CLERK | 17-DEC-80 | 800 | 20 |
| 7499 | ALLEN | SALESMAN | 20-FEB-81 | 1600 | 30 |
| 7521 | WARD | SALESMAN | 22-FEB-81 | 1250 | 30 |
| 7566 | JONES | MANAGER | 02-APR-81 | 2975 | 20 |
| 7654 | MARTIN | SALESMAN | 28-SEP-81 | 1250 | 30 |
| 7698 | BLAKE | MANAGER | 01-MAY-81 | 2850 | 30 |
| 7782 | CLARK | MANAGER | 09-JUN-81 | 2450 | 10 |
| 7788 | SCOTT | ANALYST | 19-APR-87 | 3000 | 20 |
| 7839 | KING | PRESIDENT | 17-NOV-81 | 5000 | 10 |
| 7844 | TURNER | SALESMAN | 08-SEP-81 | 1500 | 30 |
| 7876 | ADAMS | CLERK | 23-MAY-87 | 1100 | 20 |
| 7900 | JAMES | CLERK | 03-DEC-81 | 950 | 30 |
| 7902 | FORD | ANALYST | 03-DEC-81 | 3000 | 20 |
| 7934 | MILLER | CLERK | 23-JAN-82 | 1300 | 10 |

Output:
EMPN ENAME JOB HIREDATE SALARY DEPTNO
0 7369 SMITH CLERK 17-DEC-80 800 20
1 7499 ALLEN SALESMAN 20-FEB-81 1600 30
2 7521 WARD SALESMAN 22-FEB-81 1250 30
3 7566 JONES MANAGER 02-APR-81 2975 20
4 7654 MARTIN SALESMAN 28-SEP-81 1250 30
5 7698 BLAKE MANAGER 01-MAY-81 2850 30
6 7782 CLARK MANAGER 09-JUN-81 2450 10
7 7788 SCOTT ANALYST 19-APR-87 3000 20
8 7839 KING PRESIDENT 17-NOV-81 5000 10
9 7844 TURNER SALESMAN 08-SEP-81 1500 30
10 7876 ADAMS CLERK 23-MAY-87 1100 20
11 7900 JAMES CLERK 03-DEC-81 950 30
12 7902 FORD ANALYST 03-DEC-81 3000 20
13 7934 MILLER CLERK 23-JAN-82 1300 10
Code:
import pandas as pd

# Step 1: Define the input data
data = {
    'EMPN': [7369, 7499, 7521, 7566, 7654, 7698, 7782, 7788, 7839, 7844,
             7876, 7900, 7902, 7934],
    'ENAME': ['SMITH', 'ALLEN', 'WARD', 'JONES', 'MARTIN', 'BLAKE', 'CLARK',
              'SCOTT', 'KING', 'TURNER', 'ADAMS', 'JAMES', 'FORD', 'MILLER'],
    'JOB': ['CLERK', 'SALESMAN', 'SALESMAN', 'MANAGER', 'SALESMAN', 'MANAGER',
            'MANAGER', 'ANALYST', 'PRESIDENT', 'SALESMAN', 'CLERK', 'CLERK',
            'ANALYST', 'CLERK'],
    'HIREDATE': ['17-DEC-80', '20-FEB-81', '22-FEB-81', '02-APR-81', '28-SEP-81',
                 '01-MAY-81', '09-JUN-81', '19-APR-87', '17-NOV-81', '08-SEP-81',
                 '23-MAY-87', '03-DEC-81', '03-DEC-81', '23-JAN-82'],
    'SALARY': [800, 1600, 1250, 2975, 1250, 2850, 2450, 3000, 5000, 1500,
               1100, 950, 3000, 1300],
    'DEPTNO': [20, 30, 30, 20, 30, 30, 10, 20, 10, 30, 20, 30, 20, 10]
}

# Step 2: Create the DataFrame
df = pd.DataFrame(data)

# Step 3: Display the DataFrame
print(df)

Problem2:
Problem statement

Write a Python program using pandas to demonstrate the concat() function. The task
involves creating and combining multiple DataFrames containing employee and
department data. Perform the following operations:

Create two DataFrames:

Employee Data: Containing employee details such as EMPN, ENAME, JOB, and SALARY.
Department Data: Containing department details such as DEPTNO, DEPT_NAME, and
LOCATION.
Concatenate the two DataFrames:

Row-wise concatenation: Stack one DataFrame on top of the other; columns that do not
appear in both frames are filled with NaN.
Column-wise concatenation: Place the DataFrames side by side, aligning rows by index.
Handle scenarios:

Reset the row index after stacking using ignore_index=True.
Merge DataFrames with different column names and demonstrate how concat() handles
missing data (NaN values).

Input:
Table 1 - Employee Data
| EMPN | ENAME | JOB | SALARY |
|-------|--------|------------|--------|
| 7369 | SMITH | CLERK | 800 |
| 7499 | ALLEN | SALESMAN | 1600 |
| 7521 | WARD | SALESMAN | 1250 |
Table 2 - Department Data
| DEPTNO | DEPT_NAME | LOCATION |
|--------|-----------|-----------|
| 10 | HR | New York |
| 20 | Finance | London |
| 30 | Sales | Tokyo |

Output:
Row-wise Concatenation:
EMPN ENAME JOB SALARY DEPTNO DEPT_NAME LOCATION
0 7369.0 SMITH CLERK 800.0 NaN NaN NaN
1 7499.0 ALLEN SALESMAN 1600.0 NaN NaN NaN
2 7521.0 WARD SALESMAN 1250.0 NaN NaN NaN
3 NaN NaN NaN NaN 10.0 HR New York
4 NaN NaN NaN NaN 20.0 Finance London
5 NaN NaN NaN NaN 30.0 Sales Tokyo

Column-wise Concatenation:
EMPN ENAME JOB SALARY DEPTNO DEPT_NAME LOCATION
0 7369 SMITH CLERK 800 10 HR New York
1 7499 ALLEN SALESMAN 1600 20 Finance London
2 7521 WARD SALESMAN 1250 30 Sales Tokyo

Code:
import pandas as pd

# Step 1: Create the Employee Data DataFrame
employee_data = {
    'EMPN': [7369, 7499, 7521],
    'ENAME': ['SMITH', 'ALLEN', 'WARD'],
    'JOB': ['CLERK', 'SALESMAN', 'SALESMAN'],
    'SALARY': [800, 1600, 1250]
}
df_employee = pd.DataFrame(employee_data)

# Step 2: Create the Department Data DataFrame
department_data = {
    'DEPTNO': [10, 20, 30],
    'DEPT_NAME': ['HR', 'Finance', 'Sales'],
    'LOCATION': ['New York', 'London', 'Tokyo']
}
df_department = pd.DataFrame(department_data)

# Step 3: Row-wise concatenation (stack the DataFrames vertically)
row_concat = pd.concat([df_employee, df_department], axis=0, ignore_index=True)

# Step 4: Column-wise concatenation (place the DataFrames side by side)
column_concat = pd.concat([df_employee, df_department], axis=1)

# Step 5: Display the results
print("Row-wise Concatenation:")
print(row_concat)
print("\nColumn-wise Concatenation:")
print(column_concat)
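
The "keep only matching" scenario from the problem statement is handled by concat()'s
join parameter, not by ignore_index: join='inner' keeps only the columns shared by all
inputs when stacking row-wise. The two frames above share no column names, so here is a
sketch with hypothetical, partially overlapping frames:

import pandas as pd

df1 = pd.DataFrame({'EMPN': [1, 2], 'ENAME': ['A', 'B'], 'SALARY': [100, 200]})
df2 = pd.DataFrame({'EMPN': [3, 4], 'ENAME': ['C', 'D'], 'DEPTNO': [10, 20]})

# Default join='outer' keeps all columns and fills gaps with NaN;
# join='inner' keeps only the columns common to both frames
outer = pd.concat([df1, df2], axis=0, ignore_index=True)
inner = pd.concat([df1, df2], axis=0, join='inner', ignore_index=True)

print(outer)  # SALARY and DEPTNO present, with NaN gaps
print(inner)  # only EMPN and ENAME survive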

Problem3:
Given an employee data set in pandas, filter employees based on specific conditions.
Import Necessary Libraries
Create a Sample DataFrame
Set Conditions
Combine Multiple Conditions

Code:
import pandas as pd

# Step 1: Import necessary libraries (pandas is imported above)

# Step 2: Create a sample DataFrame
data = {
    'EMPN': [7369, 7499, 7521, 7566, 7654, 7698, 7782, 7788, 7839, 7844,
             7876, 7900, 7902, 7934],
    'ENAME': ['SMITH', 'ALLEN', 'WARD', 'JONES', 'MARTIN', 'BLAKE', 'CLARK',
              'SCOTT', 'KING', 'TURNER', 'ADAMS', 'JAMES', 'FORD', 'MILLER'],
    'JOB': ['CLERK', 'SALESMAN', 'SALESMAN', 'MANAGER', 'SALESMAN', 'MANAGER',
            'MANAGER', 'ANALYST', 'PRESIDENT', 'SALESMAN', 'CLERK', 'CLERK',
            'ANALYST', 'CLERK'],
    'SALARY': [800, 1600, 1250, 2975, 1250, 2850, 2450, 3000, 5000, 1500,
               1100, 950, 3000, 1300],
    'DEPTNO': [20, 30, 30, 20, 30, 30, 10, 20, 10, 30, 20, 30, 20, 10]
}
df = pd.DataFrame(data)

# Step 3: Set conditions

# Condition 1: Employees with salary greater than 2000
condition1 = df['SALARY'] > 2000

# Condition 2: Employees in department 20
condition2 = df['DEPTNO'] == 20

# Condition 3: Employees who are managers
condition3 = df['JOB'] == 'MANAGER'

# Step 4: Combine multiple conditions

# Example 1: Employees with salary > 2000 AND in department 20
combined_condition1 = condition1 & condition2

# Example 2: Employees who are managers OR have a salary > 2000
combined_condition2 = condition1 | condition3

# Example 3: Employees in department 10 OR department 20
combined_condition3 = (df['DEPTNO'] == 10) | (df['DEPTNO'] == 20)

# Step 5: Apply the conditions and filter the DataFrame
filtered_df1 = df[combined_condition1]
filtered_df2 = df[combined_condition2]
filtered_df3 = df[combined_condition3]

# Step 6: Display the filtered DataFrames
print("Employees with salary > 2000 AND in department 20:")
print(filtered_df1)
print("\nEmployees who are managers OR have a salary > 2000:")
print(filtered_df2)
print("\nEmployees in department 10 OR department 20:")
print(filtered_df3)
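
For comparison, the same filters can be written with DataFrame.query(), which parses a
boolean expression string; a short sketch reusing the df created above:

# Equivalent filters expressed with DataFrame.query()
print(df.query("SALARY > 2000 and DEPTNO == 20"))
print(df.query("JOB == 'MANAGER' or SALARY > 2000"))
print(df.query("DEPTNO in [10, 20]"))
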
Problem4:
You are working with an employee data set in Pandas. You need to add a new column
based on some calculations or conditions.

Import Necessary Libraries
Create a Sample DataFrame
Add a New Column
Add Multiple Columns at Once

Code:
import pandas as pd

# Step 1: Import necessary libraries (pandas is imported above)

# Step 2: Create a sample DataFrame
data = {
    'EMPN': [7369, 7499, 7521, 7566, 7654, 7698, 7782, 7788, 7839, 7844,
             7876, 7900, 7902, 7934],
    'ENAME': ['SMITH', 'ALLEN', 'WARD', 'JONES', 'MARTIN', 'BLAKE', 'CLARK',
              'SCOTT', 'KING', 'TURNER', 'ADAMS', 'JAMES', 'FORD', 'MILLER'],
    'JOB': ['CLERK', 'SALESMAN', 'SALESMAN', 'MANAGER', 'SALESMAN', 'MANAGER',
            'MANAGER', 'ANALYST', 'PRESIDENT', 'SALESMAN', 'CLERK', 'CLERK',
            'ANALYST', 'CLERK'],
    'SALARY': [800, 1600, 1250, 2975, 1250, 2850, 2450, 3000, 5000, 1500,
               1100, 950, 3000, 1300],
    'DEPTNO': [20, 30, 30, 20, 30, 30, 10, 20, 10, 30, 20, 30, 20, 10]
}
df = pd.DataFrame(data)

# Step 3: Add a new column based on a calculation or condition

# Example: Add a column for bonus (10% of salary)
df['BONUS'] = df['SALARY'] * 0.10

# Example: Add a column to indicate high earners (salary > 2000)
df['HIGH_EARNER'] = df['SALARY'] > 2000

# Step 4: Add multiple columns at once

# Example: Add columns for tax (5% of salary) and net salary (salary - tax)
df['TAX'] = df['SALARY'] * 0.05
df['NET_SALARY'] = df['SALARY'] - df['TAX']

# Step 5: Display the updated DataFrame
print("Updated DataFrame with new columns:")
print(df)
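
When the new column needs a label rather than a True/False flag, numpy.where is a common
idiom; a small sketch building on the df above (the PAY_BAND column name is purely
illustrative):

import numpy as np

# Label each employee by a salary threshold
df['PAY_BAND'] = np.where(df['SALARY'] > 2000, 'HIGH', 'STANDARD')
print(df[['ENAME', 'SALARY', 'PAY_BAND']])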

Week 7:----------------------------------------------
Problem statement:---------
You have a DataFrame that contains missing values (NaN) in certain columns. You
need to replace these NaN values with a specific string.

Import Necessary Libraries
Create a Sample DataFrame with NaN Values
Fill NaN Values with a String
Evaluate the result

Code:-----
# 1. Import Necessary Libraries
import pandas as pd
import numpy as np

# 2. Create a Sample DataFrame with NaN Values
data = {
    'Name': ['Alice', 'Bob', np.nan, 'David'],
    'Age': [25, np.nan, 30, np.nan],
    'City': ['NY', 'LA', np.nan, 'Chicago']
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# 3. Fill NaN Values with a String
replacement_string = "MISSING"
df_filled = df.fillna(replacement_string)

# 4. Evaluate the result
print("\nDataFrame after filling NaN values:")
print(df_filled)
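
One caveat: filling the numeric Age column with the string "MISSING" converts it to
object dtype. fillna() also accepts a per-column dict, so each column can get a
type-appropriate replacement; a sketch over the same df:

# Fill each column with a value that matches its dtype
df_filled_typed = df.fillna({
    'Name': 'UNKNOWN',
    'Age': df['Age'].mean(),  # keeps Age numeric
    'City': 'UNKNOWN'
})
print(df_filled_typed)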

Problem statement:------
You have a DataFrame containing employee data, and you want to sort the DataFrame
based on various columns

Import Necessary Libraries
Create a Sample DataFrame
Sorting by Column Values
Sorting by Multiple Columns
Integrate the results

Code:-----
# 1. Import Necessary Libraries
import pandas as pd
import numpy as np

# 2. Create a Sample DataFrame with Employee Data
data = {
    'EmployeeID': [102, 101, 104, 103],
    'Name': ['John Doe', 'Jane Smith', 'Bob Johnson', 'Alice Brown'],
    'Department': ['IT', 'HR', 'IT', 'Finance'],
    'Salary': [75000, 65000, 80000, 72000],
    'JoinDate': pd.to_datetime(['2020-01-15', '2019-05-20', '2021-03-10',
                                '2020-11-05'])
}

df = pd.DataFrame(data)
print("Original Employee DataFrame:")
print(df)

# 3. Sorting by Column Values (Single Column)

# Sort by EmployeeID (ascending order)
df_sorted_id = df.sort_values('EmployeeID')
print("\nDataFrame sorted by EmployeeID:")
print(df_sorted_id)

# Sort by Salary (descending order)
df_sorted_salary = df.sort_values('Salary', ascending=False)
print("\nDataFrame sorted by Salary (descending):")
print(df_sorted_salary)

# 4. Sorting by Multiple Columns

# Sort by Department (ascending) then Salary (descending)
df_sorted_multi = df.sort_values(['Department', 'Salary'], ascending=[True, False])
print("\nDataFrame sorted by Department (A-Z) then Salary (High-Low):")
print(df_sorted_multi)

# 5. Integrate the results

# Create a comprehensive sorted view
final_sorted_df = df.sort_values(
    ['Department', 'Salary', 'JoinDate'],
    ascending=[True, False, True]  # Department A-Z, Salary High-Low, JoinDate Old-New
)
print("\nFinal Integrated Sort (Department, Salary, JoinDate):")
print(final_sorted_df)

Problem statement:-----------
You have a DataFrame containing employee data, and you need to perform various
aggregation operations grouped by specific columns.

Import Necessary Libraries
Create a Sample DataFrame
Using groupby() for Aggregation
Using Multiple Aggregation Functions
Grouping by Multiple Columns
Evaluate the results

Code:-----------
# 1. Import Necessary Libraries
import pandas as pd
import numpy as np

# 2. Create a Sample DataFrame with Employee Data
data = {
    'EmployeeID': [101, 102, 103, 104, 105, 106, 107, 108],
    'Name': ['John Doe', 'Jane Smith', 'Bob Johnson', 'Alice Brown',
             'Mike Davis', 'Sarah Wilson', 'Tom Taylor', 'Emily Clark'],
    'Department': ['IT', 'HR', 'IT', 'Finance', 'IT', 'HR', 'Finance', 'IT'],
    'Salary': [75000, 65000, 80000, 72000, 82000, 68000, 76000, 78000],
    'Experience': [5, 3, 7, 4, 6, 2, 5, 4],
    'Location': ['NY', 'LA', 'NY', 'Chicago', 'Chicago', 'LA', 'NY', 'Chicago']
}

df = pd.DataFrame(data)
print("Original Employee DataFrame:")
print(df)

# 3. Using groupby() for Aggregation

# Basic groupby - average salary by department
dept_salary = df.groupby('Department')['Salary'].mean()
print("\nAverage Salary by Department:")
print(dept_salary)

# 4. Using Multiple Aggregation Functions

# Multiple aggregations on salary by department
dept_stats = df.groupby('Department')['Salary'].agg(['mean', 'median', 'min',
                                                     'max', 'count'])
print("\nSalary Statistics by Department:")
print(dept_stats)

# 5. Grouping by Multiple Columns

# Average salary by department and location
dept_loc_stats = df.groupby(['Department', 'Location'])['Salary'].mean()
print("\nAverage Salary by Department and Location:")
print(dept_loc_stats)

# Multiple aggregations on multiple columns
comprehensive_stats = df.groupby(['Department', 'Location']).agg({
'Salary': ['mean', 'max', 'count'],
'Experience': ['mean', 'sum']
})
print("\nComprehensive Statistics by Department and Location:")
print(comprehensive_stats)

# 6. Evaluate the results

# Analyze the most interesting findings
print("\nKey Findings:")
print(f"- Highest average salary: {dept_stats['mean'].idxmax()} department (${dept_stats['mean'].max():,.2f})")
print(f"- Most employees work in: {dept_stats['count'].idxmax()} department ({dept_stats['count'].max()} employees)")
print("- Salary range by location:")
print(dept_loc_stats.unstack())
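
As a variation, named aggregation (pandas 0.25+) yields flat, self-describing column
names instead of the MultiIndex produced by the dict form above; a sketch over the same
df:

# Named aggregation: each keyword argument becomes an output column
named_stats = df.groupby('Department').agg(
    avg_salary=('Salary', 'mean'),
    max_salary=('Salary', 'max'),
    headcount=('EmployeeID', 'count')
)
print(named_stats)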

Week 8:-------------------------------------------
Problem statement:-----
Read the following Text file using Pandas.

Note:
Step 1: Create a text file named Text1.txt
Step 2: Enter the below content into your text file

Situation goes like this: There is one hungry Tiger not eaten for many days.
A. Save the Deer
B. Save the Tiger
C. Wait and see, what will happen
D. Do Nothing
E. None of the above

Step 3: After entering the content, save the file.
Step 4: Finally, read and close the file.

Code:-----
# File path to the text file on the D: drive
file_path = r"D:\Text1.txt"

# The with-block reads the file and closes it automatically
with open(file_path, 'r') as file:
    text_content = file.read()

print("\nRaw text content:")
print(text_content)
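
Since the problem statement mentions pandas, the lines just read can also be placed in a
DataFrame for further analysis; a minimal sketch reusing text_content from above:

import pandas as pd

# One row per line of the text file
df_text = pd.DataFrame({'line': text_content.splitlines()})
print(df_text)
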
Problem statement:--------
Read the following CSV file using pandas.

Note:
Step 1: Create a CSV file with the name Animal_category.csv
Step 2: Enter the below data into your CSV file
| Index | Animals | Gender | Homly | Types |
|-------|---------|--------|-------|-------|
| 1 | Cat | Male | Yes | A |
| 2 | Dog | Male | Yes | B |
| 3 | Mouse | Male | Yes | C |
| 4 | Mouse | Male | Yes | C |
| 5 | Dog | Female | Yes | A |
| 6 | Cat | Female | Yes | B |
| 7 | Lion | Female | Yes | D |
| 8 | Goat | Female | Yes | E |
| 9 | Cat | Female | Yes | A |
| 10 | Dog | Male | Yes | NaN |
| 11 | Dog | Male | Yes | B |
| 12 | Lion | Male | No | D |
| 13 | Lion | Male | No | D |
| 14 | Lion | Male | No | D |
| 15 | Cat | Female | Yes | A |
| 16 | Lion | Female | No | D |
| 17 | Lion | Female | No | D |
| 18 | Cat | Female | Yes | A |
| 19 | Goat | Male | No | E |
| 20 | Goat | Female | No | E |
| 21 | Goat | Male | No | E |
| 22 | Goat | Female | No | E |
| 23 | Lion | Male | No | D |
| 24 | Lion | Female | No | D |
| 25 | Lion | Male | No | D |
| 26 | Lion | Female | No | D |
| 27 | Dog | Male | Yes | B |
| 28 | Lion | Female | No | D |
| 29 | Cat | Male | Yes | A |
| 30 | Lion | Female | No | D |
Step 3: Finally, read the CSV file and close it.

Code:----
# Import pandas library
import pandas as pd

# Specify your file path
file_path = "D:/Animal_category.csv"

# Read the CSV file
animal_data = pd.read_csv(file_path)

# Display the first few rows
print("First 5 rows of the data:")
print(animal_data.head())

# Display basic information about the DataFrame
# (info() prints its report directly, so no print() wrapper is needed)
print("\nDataFrame Info:")
animal_data.info()
Problem statement:--------
Read the following Excel file using pandas.

Note:
Step 1: Create an Excel file and name it University_Clustering.xlsx
Step 2: Enter the below data into your Excel file
| UnivID | Univ         | State | SAT    | Top10 | Accept | SFRatio | Expenses | GradRate |
|--------|--------------|-------|--------|-------|--------|---------|----------|----------|
| 1      | Brown        | RI    | 1310.0 | 89    | 22     | 13.0    | 22704    | 94.0     |
| 2      | CalTech      | CA    | 1415.0 | 100   | 25     | 6.0     | 63575    | 81.0     |
| 3      | CMU          | PA    | 1260.0 | 62    | 59     | 9.0     | 25026    | 72.0     |
| 4      | Columbia     | NY    | 1310.0 | 76    | 24     | 12.0    | 31510    | NaN      |
| 5      | Cornell      | NY    | 1280.0 | 83    | 33     | 13.0    | 21864    | 90.0     |
| 6      | Dartmouth    | NH    | 1340.0 | 89    | 23     | 10.0    | 32162    | 95.0     |
| 7      | Duke         | NC    | 1315.0 | 90    | 30     | 12.0    | 31585    | 95.0     |
| 8      | Georgetown   | DC    | NaN    | 74    | 24     | 12.0    | 20126    | 92.0     |
| 9      | Harvard      | MA    | 1400.0 | 91    | 14     | 11.0    | 39525    | 97.0     |
| 10     | JohnsHopkins | MD    | 1305.0 | 75    | 4      | 7.0     | 58691    | 87.0     |
| 11     | MIT          | MA    | 1380.0 | 94    | 30     | 10.0    | 34870    | 91.0     |
| 12     | Northwestern | IL    | 1260.0 | 85    | 39     | 11.0    | 28052    | 89.0     |
| 13     | NotreDame    | IN    | 1255.0 | 81    | 42     | 13.0    | 15122    | 94.0     |
| 14     | PennState    | PA    | 1081.0 | 80    | 54     | 18.0    | 10185    | 80.0     |
| 15     | Princeton    | NJ    | 1375.0 | 94    | 10     | 8.0     | 30222    | 97.0     |
| 16     | Purdue       | IN    | 1005.0 | 28    | 90     | 19.0    | 20126    | 75.0     |
| 17     | Stanford     | CA    | 1360.0 | 90    | 20     | 12.0    | 36450    | 93.0     |
| 18     | TexasA&M     | TX    | 1075.0 | 49    | 67     | 25.0    | 8704     | 67.0     |
| 19     | UCBerkeley   | CA    | 1240.0 | 95    | 40     | 17.0    | 15140    | 78.0     |
| 20     | UChicago     | IL    | 1290.0 | 75    | 50     | NaN     | 38380    | 87.0     |
| 21     | UMichigan    | MI    | 1280.0 | 65    | 68     | 16.0    | 15470    | 86.0     |
| 22     | UPenn        | PA    | 1285.0 | 80    | 36     | 11.0    | 27553    | 92.0     |
| 23     | UVA          | VA    | 1225.0 | 77    | 44     | 12.0    | 13349    | 92.0     |
| 24     | UWisconsin   | WI    | 1085.0 | 40    | 69     | 15.0    | 11857    | 71.0     |
| 25     | Yale         | CT    | 1375.0 | 95    | 19     | 11.0    | 43514    | 97.0     |
Step 3: Read and close the Excel file

Code:-----
import pandas as pd

file_path = r"D:\University_Clustering.xlsx" # Replace with your actual path

try:
    # Reading .xlsx files requires the openpyxl package (pip install openpyxl)
    df = pd.read_excel(file_path)
    print(df.head())  # Display first 5 rows
except FileNotFoundError:
    print(f"Error: File not found at {file_path}")
except Exception as e:
    print(f"An error occurred: {str(e)}")

Problem statement:-----
Read the following JSON file using pandas.

Note:
Create a JSON file with the name Sample1.json

Code:-----
import pandas as pd

# Read the JSON file named in the note above
df = pd.read_json('Sample1.json')
print(df.head())
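
The note does not show the contents of Sample1.json, so here is a minimal, hypothetical
record-oriented file that pd.read_json() can parse, written and read back from Python
(the field names are illustrative only):

import json
import pandas as pd

# Hypothetical contents for Sample1.json; the problem does not specify them
records = [
    {"id": 1, "name": "Alice", "score": 85},
    {"id": 2, "name": "Bob", "score": 90}
]
with open('Sample1.json', 'w') as f:
    json.dump(records, f)

df = pd.read_json('Sample1.json')
print(df)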

Week 9:---------------------------------------------------------------
Problem1:--------
Code:------
import pandas as pd
import os

# 1. Create and save a sample DataFrame
data = {
    'Product': ['Laptop', 'Phone', 'Tablet', 'Monitor'],
    'Price': [999, 699, 349, 249],
    'Stock': [45, 102, 28, 15]
}
df = pd.DataFrame(data)
df.to_pickle('products.pkl')

# 2. Load and analyze the pickle file
try:
    # Load the pickle file
    loaded_df = pd.read_pickle('products.pkl')

    # Basic analysis
    print("=== Product Data Analysis ===")
    print(f"\nTotal products: {len(loaded_df)}")
    print(f"\nMost expensive product: {loaded_df.loc[loaded_df['Price'].idxmax(), 'Product']}")
    print("\nLow stock items (<20):")
    print(loaded_df[loaded_df['Stock'] < 20][['Product', 'Stock']])

    # Advanced: Add a new calculated column
    loaded_df['Value'] = loaded_df['Price'] * loaded_df['Stock']
    print(f"\nTotal inventory value: ${loaded_df['Value'].sum():,}")

except FileNotFoundError:
    print("Error: The pickle file was not found")
except pd.errors.EmptyDataError:
    print("Error: The pickle file is empty")
except Exception as e:
    print(f"Unexpected error: {str(e)}")
finally:
    # Clean up (remove the sample file)
    if os.path.exists('products.pkl'):
        os.remove('products.pkl')

Problem2:--------
Code:------
from PIL import Image
import matplotlib.pyplot as plt
import os  # needed for os.path.basename() below

def process_image(image_path):
    try:
        # Open image
        img = Image.open(image_path)
        print(f"\nProcessing: {image_path}")
        print(f"Original size: {img.size}, Mode: {img.mode}")

        # Display original
        plt.figure(figsize=(10, 5))
        plt.subplot(1, 2, 1)
        plt.imshow(img)
        plt.title("Original")
        plt.axis('off')

        # Process image (convert to grayscale and resize)
        gray_img = img.convert('L')
        resized_img = gray_img.resize((300, 300), Image.Resampling.LANCZOS)

        # Display processed
        plt.subplot(1, 2, 2)
        plt.imshow(resized_img, cmap='gray')
        plt.title("Processed (Grayscale & Resized)")
        plt.axis('off')
        plt.show()

        # Save processed image
        output_path = "processed_" + os.path.basename(image_path)
        resized_img.save(output_path)
        print(f"Saved processed image as: {output_path}")

    except Exception as e:
        print(f"Error processing {image_path}: {str(e)}")

# Example usage
process_image("sample.jpg")

Problem3:--------
Code:------
import glob
import pandas as pd

def process_files(directory='data', patterns=('*.csv', '*.txt')):
    """Process all files matching patterns in directory and subdirectories."""
    all_files = []

    # Collect all matching files
    for pattern in patterns:
        search_path = f"{directory}/**/{pattern}" if directory else f"**/{pattern}"
        all_files.extend(glob.glob(search_path, recursive=True))

    # Process each file
    for file_path in all_files:
        try:
            if file_path.endswith('.csv'):
                df = pd.read_csv(file_path)
                print(f"\nCSV data from {file_path}:")
                print(df.head())

            elif file_path.endswith('.txt'):
                with open(file_path, 'r') as file:
                    print(f"\nText content from {file_path}:")
                    print(file.read())

        except Exception as e:
            print(f"Error processing {file_path}: {str(e)}")

# Example usage
process_files(directory='data', patterns=['*.csv', '*.txt'])

Problem4:-----------
Install Required Libraries: pip install pandas mysql-connector-python

Code:----
import pandas as pd
import mysql.connector
from mysql.connector import Error

connection = None  # so the finally block is safe if connect() fails

try:
    # Establish connection
    connection = mysql.connector.connect(
        host='localhost',
        database='company_db',
        user='root',
        password='securepassword'
    )

    if connection.is_connected():
        print("Connected to MySQL database")

        # Query data with a parameterized query
        query = "SELECT * FROM employees WHERE department = %s"
        df = pd.read_sql(query, connection, params=['Marketing'])

        # Process data
        print(df.head())

except Error as e:
    print(f"Error while connecting to MySQL: {e}")

finally:
    # Close connection
    if connection is not None and connection.is_connected():
        connection.close()
        print("MySQL connection is closed")
