Python Class XII AI

This document provides an overview of Python programming with a focus on libraries like NumPy and Pandas for data manipulation and analysis. It covers creating and managing data structures, handling missing values, and importing/exporting CSV files, along with practical code examples. Additionally, it includes exercises and case studies to reinforce learning about data handling techniques in Python.

Uploaded by Neha Makhija
Copyright © All Rights Reserved

UNIT 1: PYTHON PROGRAMMING – II

1.1 Python Libraries


Explanation:
Python has a rich set of libraries (like toolkits) that save time and effort by providing ready-to-use functions for tasks like data analysis, mathematical operations, and machine learning.

1.1.1 NumPy Library


• NumPy = Numerical Python
• It allows creation of multi-dimensional arrays and offers fast mathematical operations.
• NumPy arrays are faster and use less memory than Python lists for numerical work, and they support element-wise arithmetic directly.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
✅ Explanation: This creates a 2D array (2 rows × 3 columns) using np.array(). NumPy arrays support element-wise operations such as addition and multiplication.
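To make "element-wise" concrete, here is a small sketch reusing the same 2×3 array. The plain list at the end is only there for contrast: with a Python list, * 2 repeats the list instead of doubling the numbers.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.shape)   # (2, 3): 2 rows, 3 columns
print(arr + 10)    # adds 10 to every element
print(arr * 2)     # doubles every element
print(arr * arr)   # multiplies matching elements of two same-shaped arrays

# A plain Python list behaves differently: * 2 repeats the list
lst = [1, 2, 3]
print(lst * 2)     # [1, 2, 3, 1, 2, 3]
```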

1.1.2 Pandas Library


• Pandas is used for data manipulation and analysis.
• It provides two main structures:
o Series: 1D labeled array
o DataFrame: 2D labeled table (like an Excel sheet)

Creating Series
import pandas as pd
series = pd.Series([10, 20, 30])
print(series)
✅ Explanation: This creates a simple 1D labeled array with automatic index values starting from 0.
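The labels do not have to be the default 0, 1, 2. A minimal sketch, using hypothetical marks, passes its own labels through the index parameter and then reads a value back by label.

```python
import pandas as pd

# Hypothetical marks for one student, labeled by subject
marks = pd.Series([85, 92, 78], index=['Maths', 'Science', 'AI'])
print(marks)
print(marks['Science'])       # access by label: 92
print(marks.index.tolist())   # ['Maths', 'Science', 'AI']
print(marks.sum())            # 255
```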

Creating DataFrame from NumPy Arrays


import numpy as np
import pandas as pd

array1 = np.array([90, 100, 110, 120])
array2 = np.array([50, 60, 70, 80])
array3 = np.array([10, 20, 30, 40])

marksDF = pd.DataFrame([array1, array2, array3], columns=['A', 'B', 'C', 'D'])
print(marksDF)
✅ Explanation: Each array is treated as a row. The columns are named A, B, C, D.
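Because each array becomes a row, the rows get the default labels 0, 1, 2 unless index= is passed. A sketch of that, where the row labels Test1, Test2 and Test3 are hypothetical:

```python
import numpy as np
import pandas as pd

array1 = np.array([90, 100, 110, 120])
array2 = np.array([50, 60, 70, 80])
array3 = np.array([10, 20, 30, 40])

# index= names the rows instead of 0, 1, 2 (labels are hypothetical)
marksDF = pd.DataFrame([array1, array2, array3],
                       columns=['A', 'B', 'C', 'D'],
                       index=['Test1', 'Test2', 'Test3'])
print(marksDF)
print(marksDF.shape)          # (3, 4): 3 arrays -> 3 rows, 4 values each -> 4 columns
print(marksDF.loc['Test2'])   # the second array, now a labeled row
```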

Creating DataFrame from Dictionary of Lists


data = {'Name': ['Varun', 'Ganesh', 'Joseph', 'Abdul', 'Reena'],
'Age': [37, 30, 38, 39, 40]}
df = pd.DataFrame(data)
print(df)
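✅ Explanation: Each key becomes a column (Name, Age), and the values at the same position in each list form one row, so 'Varun' and 37 make up row 0. All lists must have the same length. This is the most common way of creating structured data in Pandas.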
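A short sketch of two consequences of this layout: each column can be used on its own (for example, to take an average), and lists of different lengths are rejected with a ValueError. The two-row mismatched dictionary is a hypothetical example.

```python
import pandas as pd

data = {'Name': ['Varun', 'Ganesh', 'Joseph', 'Abdul', 'Reena'],
        'Age': [37, 30, 38, 39, 40]}
df = pd.DataFrame(data)
print(df.shape)          # (5, 2)
print(df['Age'].mean())  # 36.8

# Lists of different lengths cannot be lined up into rows
try:
    pd.DataFrame({'Name': ['Varun', 'Ganesh'], 'Age': [37]})
except ValueError as err:
    print('ValueError:', err)
```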

DataFrame from List of Dictionaries


listDict = [{'a': 10, 'b': 20}, {'a': 5, 'b': 10, 'c': 20}]
a = pd.DataFrame(listDict)
print(a)
✅ Explanation: Each dictionary is a row. Missing values (like no 'c' in 1st row) are filled with NaN (Not a Number).
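A quick sketch for checking that behaviour in code: the columns are the union of all keys, and the missing 'c' in the first row shows up as NaN. Because NaN is a floating-point value, the 'c' column is stored as float64.

```python
import pandas as pd

listDict = [{'a': 10, 'b': 20}, {'a': 5, 'b': 10, 'c': 20}]
a = pd.DataFrame(listDict)

print(a.columns.tolist())         # ['a', 'b', 'c']: union of all keys
print(a['c'].isnull().tolist())   # [True, False]: row 0 had no 'c'
print(a['c'].dtype)               # float64
```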

1.1.2.2 Row and Column Operations


Adding a New Column
Result['Fathima'] = [89, 78, 76]
✅ Explanation: A new column named 'Fathima' is added to the existing DataFrame. The list must contain exactly one value per row (here, one mark per subject); otherwise pandas raises a ValueError.

Adding a New Row


Result.loc['English'] = [90, 92, 89, 80, 90, 88]
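✅ Explanation: .loc accesses rows and columns by label. Because 'English' is not yet a row label, assigning to it adds a new subject row; the list must supply one value per column (here, one mark per student).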

Updating an Existing Row


Result.loc['Science'] = [92, 84, 90, 72, 96, 88]
✅ Explanation: Overwrites the entire row for 'Science' with new values.
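The Result DataFrame itself is not defined in this section, so the sketch below builds a stand-in with hypothetical marks: subjects as row labels and five students as columns. Rajat, Meenakshi and Karthika are names used later in this section; Student4 and Student5 are placeholder names. With it, the three statements above can be run in order.

```python
import pandas as pd

# Hypothetical stand-in for Result: rows are subjects, columns are students
Result = pd.DataFrame(
    {'Rajat':     [90, 91, 97],
     'Meenakshi': [92, 81, 96],
     'Karthika':  [89, 91, 88],
     'Student4':  [81, 71, 67],
     'Student5':  [94, 95, 99]},
    index=['Maths', 'Science', 'Hindi'])

Result['Fathima'] = [89, 78, 76]                  # new column: one mark per subject (row)
Result.loc['English'] = [90, 92, 89, 80, 90, 88]  # new label: a row is appended, one mark per student
Result.loc['Science'] = [92, 84, 90, 72, 96, 88]  # existing label: the whole row is overwritten
print(Result)
print(Result.shape)  # (4, 6)
```

If a list has the wrong number of values, pandas raises a ValueError rather than guessing where the values belong.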
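
1.1.2.3 Deleting Rows or Columns

Deleting a Row
Result = Result.drop('Hindi', axis=0)
✅ Explanation: Deletes the row labeled 'Hindi'. axis=0 (the default) tells drop() to look for the label among the row labels. drop() returns a new DataFrame, so the result is assigned back to Result.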

Deleting Multiple Columns


Result = Result.drop(['Rajat', 'Meenakshi', 'Karthika'], axis=1)
✅ Explanation: Deletes the specified student columns. axis=1 refers to column-wise deletion.
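A runnable sketch of both deletions, again on a hypothetical stand-in for Result (subjects as rows, students as columns, made-up marks):

```python
import pandas as pd

# Hypothetical stand-in for Result (subjects x students)
Result = pd.DataFrame(
    {'Rajat': [90, 92, 97, 90], 'Meenakshi': [92, 84, 96, 92],
     'Karthika': [89, 90, 88, 89], 'Fathima': [89, 88, 76, 88]},
    index=['Maths', 'Science', 'Hindi', 'English'])

Result = Result.drop('Hindi', axis=0)                             # remove one row
Result = Result.drop(['Rajat', 'Meenakshi', 'Karthika'], axis=1)  # remove three columns
print(Result)

# Equivalent keyword forms that avoid remembering axis numbers:
# Result.drop(index='Hindi') and Result.drop(columns=['Rajat', 'Meenakshi', 'Karthika'])
```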

1.1.2.4 DataFrame Attributes and Head/Tail


print(df.index) # Returns row labels (Index)
print(df.columns) # Returns column labels
print(df.shape) # (rows, columns)
print(df.head(2)) # First 2 rows
print(df.tail(2)) # Last 2 rows
✅ Explanation:
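• index, columns, shape: metadata of the DataFrame (row labels, column labels, and its size as a (rows, columns) tuple).
• head() and tail() are useful for quickly viewing a few entries; called without an argument, they show 5 rows.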
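A self-contained sketch of these attributes on the Name/Age DataFrame from earlier in this unit, so the outputs can be checked directly:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Varun', 'Ganesh', 'Joseph', 'Abdul', 'Reena'],
                   'Age': [37, 30, 38, 39, 40]})

print(df.index)      # RangeIndex(start=0, stop=5, step=1)
print(df.columns)    # the labels 'Name' and 'Age'
print(df.shape)      # (5, 2)
print(df.head(2))    # rows 0-1: Varun, Ganesh
print(df.tail(2))    # rows 3-4: Abdul, Reena
```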

1.2 Importing and Exporting CSV Files


Importing CSV File
df = pd.read_csv("studentsmarks.csv")
✅ Explanation:
Loads data from a CSV file into a DataFrame. CSV = Comma-Separated Values.

Exporting DataFrame to CSV


df.to_csv('resultout.csv', index=False)
✅ Explanation: Saves the DataFrame into a CSV file. index=False avoids saving row numbers as a separate column.
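A small round-trip sketch, assuming the script may write files in its current working directory (with_index.csv is a hypothetical file name): write a DataFrame out and read it back, once with index=False and once without, to see the extra column the index produces.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Varun', 'Ganesh'], 'Age': [37, 30]})

df.to_csv('resultout.csv', index=False)   # only Name and Age are written
back = pd.read_csv('resultout.csv')
print(back.columns.tolist())              # ['Name', 'Age']

df.to_csv('with_index.csv')               # row numbers are written as a first, unnamed column
again = pd.read_csv('with_index.csv')
print(again.columns.tolist())             # ['Unnamed: 0', 'Name', 'Age']
```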

1.3 Handling Missing Values


Check for Missing Data
df.isnull() # Shows True/False where values are missing
df['Science'].isnull().any() # Checks if 'Science' column has any NaNs
df.isnull().sum().sum() # Total NaN values in DataFrame
✅ Explanation: These functions help find where data is missing (very common in real-life datasets).
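A sketch of these checks on a small hypothetical DataFrame with two gaps. Summing True/False values counts each True as 1, which is why isnull().sum() gives the number of missing values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical marks with two gaps
df = pd.DataFrame({'Maths': [90, 91, 97], 'Science': [92, np.nan, 88],
                   'AI': [np.nan, 95, 99]},
                  index=['Heena', 'Shefali', 'Meera'])

print(df.isnull())                    # True where a value is missing
print(df['Science'].isnull().any())   # True
print(df.isnull().sum())              # missing count per column: 0, 1, 1
print(df.isnull().sum().sum())        # 2
```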

Drop Rows with Missing Values


df = df.dropna()
✅ Explanation: Removes any row that has at least one missing value.

Fill Missing Values with 0


df = df.fillna(0)
✅ Explanation: Replaces all NaN values with 0. Unlike dropna(), no rows are lost; keep in mind, though, that the 0s are then treated as real values in later calculations such as averages.
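A sketch comparing the two strategies on the same small hypothetical DataFrame. The last line shows why filling with 0 needs care: mean() skips NaN, but it counts a filled-in 0 as a real mark.

```python
import numpy as np
import pandas as pd

# Hypothetical marks with two gaps
df = pd.DataFrame({'Maths': [90, 91, 97], 'Science': [92, np.nan, 88],
                   'AI': [np.nan, 95, 99]},
                  index=['Heena', 'Shefali', 'Meera'])

dropped = df.dropna()    # only Meera's row has no NaN
filled = df.fillna(0)    # all 3 rows kept, NaN -> 0.0
print(dropped)
print(filled)
print(df['AI'].mean(), filled['AI'].mean())   # 97.0 vs about 64.67
```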

1.4 Case Study – Handling Missing Marks


import pandas as pd
import numpy as np

students = ['Heena', 'Shefali', 'Meera', 'Joseph', 'Suhana', 'Bismeet']
ResultSheet = {
    'Maths':   pd.Series([90, 91, 97, 89, 65, 93], index=students),
    'Science': pd.Series([92, 81, np.nan, 87, 50, 88], index=students),
    'English': pd.Series([89, 91, 88, 78, 77, 82], index=students),
    'Hindi':   pd.Series([81, 71, 67, 82, np.nan, 89], index=students),
    'AI':      pd.Series([94, 95, 99, np.nan, 96, 99], index=students)
}
marks = pd.DataFrame(ResultSheet)
✅ Explanation:
• Creates a full DataFrame with students as rows and subjects as columns.
• Some entries (like Science for Meera) are missing (np.nan).
print(marks.isnull()) # Shows where data is missing
print(marks['Science'].isnull()) # Check NaNs in Science only
print(marks.isnull().sum().sum()) # Count of total missing entries
drop = marks.dropna()
print(drop)
✅ Drops all rows with missing values.
fillZero = marks.fillna(0)
print(fillZero)
✅ Replaces all missing values with 0, so the data can still be used.
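For reference, this is what the case-study checks give. Counting from the data above: Meera has no Science mark, Joseph no AI mark and Suhana no Hindi mark, so there are 3 missing entries and dropna() keeps only Heena, Shefali and Bismeet. The sketch re-creates marks from a dictionary of lists (equivalent to the Series version) so it runs on its own.

```python
import numpy as np
import pandas as pd

students = ['Heena', 'Shefali', 'Meera', 'Joseph', 'Suhana', 'Bismeet']
marks = pd.DataFrame({
    'Maths':   [90, 91, 97, 89, 65, 93],
    'Science': [92, 81, np.nan, 87, 50, 88],
    'English': [89, 91, 88, 78, 77, 82],
    'Hindi':   [81, 71, 67, 82, np.nan, 89],
    'AI':      [94, 95, 99, np.nan, 96, 99],
}, index=students)

print(marks.isnull().sum().sum())      # 3 missing entries in total
print(marks.dropna().index.tolist())   # ['Heena', 'Shefali', 'Bismeet']
print(marks.fillna(0).shape)           # (6, 5): nothing removed
```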

Chapter Back Exercise


Unit 1: Python Programming - II - Exercises

A. Objective Type Questions


1. Which of the following is a primary data structure in Pandas?
Answer: c) Series

2. What does the fillna(0) function do in Pandas?


Answer: b) Fills missing values with zeros

3. In Linear Regression, which library is typically used for importing and managing data?
Answer: b) Pandas

4. What is the correct syntax to read a CSV file into a Pandas DataFrame?
Answer: b) pd.read_csv("filename.csv")

5. What does the df.shape attribute return?


Answer: b) The number of rows and columns in the DataFrame, as a tuple (rows, columns)

6. Which function can be used to export a DataFrame to a CSV file?


Answer: c) to_csv()

B. Short Answer Questions


1. What is a DataFrame in Pandas?
Answer: A DataFrame is a 2-dimensional labeled data structure in Pandas, similar to a table in a database or an Excel spreadsheet.

2. How do you create a Pandas Series from a dictionary?


Answer: By using the command: pd.Series({'a': 1, 'b': 2, 'c': 3})

3. Name two strategies to handle missing values in a DataFrame.


Answer: 1. Using fillna() to replace them with a specific value.
2. Using dropna() to remove rows or columns with missing values.

4. What does the head(n) function do in a DataFrame?


Answer: It returns the first 'n' rows of the DataFrame.

5. What is the role of NumPy in Python programming?


Answer: NumPy provides support for arrays, mathematical functions, and linear algebra operations.

6. Explain the use of the isnull() function in Pandas.


Answer: The isnull() function is used to detect missing (NaN) values in a DataFrame or Series.

C. Long Answer Questions


1. Describe the steps to import and export data using Pandas.
Answer: To import data: use df = pd.read_csv('filename.csv')
To export data: use df.to_csv('filename.csv', index=False) (index=False stops the row index from being saved as an extra column)

2. Explain the concept of handling missing values in a DataFrame with examples.


Answer: Handling missing values can be done using:
- df.fillna(value): Fill with a specific value
- df.dropna(): Remove missing values
Example:
df['column'] = df['column'].fillna(df['column'].mean())  # fillna() returns a new Series, so assign it back
3. What is Linear Regression, and how is it implemented in Python?
Answer: Linear Regression is a statistical method to model the relationship between dependent and independent variables.
Implemented using scikit-learn:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X, y)
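A minimal end-to-end sketch, assuming scikit-learn is installed and using hypothetical data that follows y = 2x + 1 exactly, so the fitted slope and intercept can be checked by eye. Note that X must be 2-D (one row per sample, one column per feature).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data that follows y = 2x + 1 exactly
X = np.array([[1], [2], [3], [4]])   # features must be 2-D: one row per sample
y = np.array([3, 5, 7, 9])

model = LinearRegression()
model.fit(X, y)
print(model.coef_, model.intercept_)    # slope close to 2, intercept close to 1
print(model.predict(np.array([[5]])))   # close to 11
```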

4. Compare NumPy arrays and Pandas DataFrames.


Answer: NumPy arrays are homogeneous (every element has the same data type) and efficient for numerical computation.
Pandas DataFrames are labeled (named rows and columns), can hold a different data type in each column, and provide rich data manipulation tools.

5. How can we add new rows and columns to an existing DataFrame? Explain with code examples.
Answer: To add a column:
df['new_col'] = [val1, val2, val3]
To add a row (assuming the default integer index 0, 1, 2 and so on, so len(df) is the next free label):
df.loc[len(df)] = [val1, val2, val3]
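A concrete sketch of both answers, filling the placeholders with hypothetical values (the City column and its entries are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Varun', 'Ganesh'], 'Age': [37, 30]})

df['City'] = ['Delhi', 'Pune']            # new column: one value per existing row
df.loc[len(df)] = ['Reena', 40, 'Agra']   # new row at the next integer label (2)
print(df)
```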

6. What are the attributes of a DataFrame? Provide examples.


Answer: Attributes include:
- df.shape: (rows, columns)
- df.columns: Column labels
- df.index: Row labels
- df.dtypes: Data types

D. Case Study
1. A dataset of student marks contains missing values for some subjects. Write Python code to handle these missing values by replacing them with the mean of the respective columns.
Answer:
import pandas as pd
df = pd.read_csv('student_marks.csv')
df.fillna(df.mean(numeric_only=True), inplace=True)  # numeric_only=True skips text columns such as student names
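Since student_marks.csv is not provided, this sketch uses a small hypothetical DataFrame with a text Name column. It shows that the column means are computed only for numeric columns, and that fillna() then fills each column's gaps with that column's own mean, leaving Name untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for student_marks.csv
df = pd.DataFrame({'Name': ['Heena', 'Shefali', 'Meera'],
                   'Maths': [90, np.nan, 80],
                   'Science': [70, 60, np.nan]})

col_means = df.mean(numeric_only=True)   # Maths 85.0, Science 65.0
df = df.fillna(col_means)                # each NaN gets its own column's mean
print(df)
```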

2. Write Python code to load the file into a Pandas DataFrame, calculate the total sales for each product, and save the results into a new CSV file.
Answer:
import pandas as pd
df = pd.read_csv('sales.csv')
total_sales = df.groupby('product')['sales'].sum()
total_sales.to_csv('total_sales.csv')
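A sketch with hypothetical in-memory data standing in for sales.csv, using the 'product' and 'sales' column names assumed in the answer above. groupby() sorts the product names by default, and saving the resulting Series writes two columns: the product name and its total.

```python
import pandas as pd

# Hypothetical stand-in for sales.csv
df = pd.DataFrame({'product': ['Pen', 'Book', 'Pen', 'Bag', 'Book'],
                   'sales':   [10, 25, 15, 40, 5]})

total_sales = df.groupby('product')['sales'].sum()
print(total_sales)                       # Bag 40, Book 30, Pen 25
total_sales.to_csv('total_sales.csv')    # header row: product,sales
```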

3. In a marketing dataset, analyze the performance of campaigns using Pandas. Describe steps to group data by campaign type and calculate average sales and engagement metrics.
Answer:
import pandas as pd
df = pd.read_csv('marketing.csv')
avg_metrics = df.groupby('campaign_type')[['sales', 'engagement']].mean()
print(avg_metrics)
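A sketch with hypothetical in-memory data in place of marketing.csv (campaign types and numbers are made up). After grouping, sorting by average sales is one simple way to compare campaign types.

```python
import pandas as pd

# Hypothetical stand-in for marketing.csv
df = pd.DataFrame({'campaign_type': ['Email', 'Social', 'Email', 'Social', 'TV'],
                   'sales':         [100, 200, 300, 400, 250],
                   'engagement':    [0.2, 0.5, 0.4, 0.7, 0.3]})

avg_metrics = df.groupby('campaign_type')[['sales', 'engagement']].mean()
print(avg_metrics)

# Rank campaign types by average sales: Social, TV, Email
print(avg_metrics.sort_values('sales', ascending=False))
```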

4. A company has collected data on employee performance. Some values are missing, and certain columns are irrelevant. Explain how to clean and preprocess this data for analysis using Pandas.
Answer:
1. Remove irrelevant columns: df.drop(['col1', 'col2'], axis=1, inplace=True)
2. Handle missing values: df.ffill(inplace=True) (forward fill; the older fillna(method='ffill') form is deprecated in recent pandas versions)
3. Convert datatypes if needed: df['col'] = df['col'].astype('int')
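A sketch of the three steps on a small hypothetical employee table; the column names here are made up in place of the 'col1'/'col2'/'col' placeholders above.

```python
import numpy as np
import pandas as pd

# Hypothetical employee data
df = pd.DataFrame({'emp_id':   [1, 2, 3, 4],
                   'score':    [80, np.nan, 75, np.nan],
                   'notes':    ['a', 'b', 'c', 'd'],   # irrelevant for analysis
                   'temp_col': [0, 0, 0, 0]})          # irrelevant for analysis

df = df.drop(['notes', 'temp_col'], axis=1)   # 1. remove irrelevant columns
df = df.ffill()                               # 2. forward fill: copy the last known value down
df['score'] = df['score'].astype('int')       # 3. safe now that no NaN is left
print(df)
```

Forward fill has no earlier value to copy for a NaN in the very first row, so such a value stays NaN and astype('int') would then fail. Check with isnull().sum() before converting.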
