Python Libraries for Data Analysis

This document provides an overview of Python programming with a focus on libraries like NumPy and Pandas for data manipulation and analysis. It covers creating and managing data structures, handling missing values, and importing/exporting CSV files, along with practical code examples. Additionally, it includes exercises and case studies to reinforce learning about data handling techniques in Python.

Uploaded by

Neha Makhija

UNIT 1: PYTHON PROGRAMMING – II

1.1 Python Libraries

Explanation:
Python has a rich set of libraries (like toolkits) that save time and effort by providing ready-to-use functions for tasks like data analysis, mathematical operations, and machine learning.

1.1.1 NumPy Library

• NumPy = Numerical Python
• It allows creation of multi-dimensional arrays and offers fast mathematical operations.
• NumPy arrays are faster and more memory-efficient than Python lists for numerical tasks.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
✅ Explanation: This creates a 2D array (2 rows × 3 columns) using np.array(). NumPy arrays support element-wise operations like addition, multiplication, etc.
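The element-wise behaviour mentioned above can be sketched as follows. This is a minimal illustration that reuses the same array; the names doubled and summed are introduced here only for the example.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Arithmetic is applied to every element, with no explicit loop.
doubled = arr * 2      # each element multiplied by 2
summed = arr + arr     # matching positions added together

print(arr.shape)       # (2, 3) -> 2 rows, 3 columns
print(doubled)
print(summed)
```

With a plain Python list, `[1, 2, 3] * 2` would repeat the list instead of doubling each value, which is one reason arrays are preferred for numerical work.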

1.1.2 Pandas Library

• Pandas is for data manipulation and analysis.
• It provides two main structures:
  o Series: 1D labeled array
  o DataFrame: 2D labeled table (like an Excel sheet)

Creating Series
import pandas as pd
series = pd.Series([10, 20, 30])
print(series)
✅ Explanation: This creates a simple 1D labeled array with automatic index values starting from 0.
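As a small sketch of what "labeled" means here, the example below compares the automatic index with a custom one. The labels 'a', 'b', 'c' and the name labelled are assumptions made up for illustration.

```python
import pandas as pd

# Default index: 0, 1, 2
series = pd.Series([10, 20, 30])

# Custom labels (hypothetical) supplied through the index argument
labelled = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(series.index.tolist())   # [0, 1, 2]
print(labelled['b'])           # 20
```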

Creating DataFrame from NumPy Arrays

import numpy as np
import pandas as pd

array1 = np.array([90, 100, 110, 120])
array2 = np.array([50, 60, 70, 80])
array3 = np.array([10, 20, 30, 40])

marksDF = pd.DataFrame([array1, array2, array3], columns=['A', 'B', 'C', 'D'])

print(marksDF)
✅ Explanation: Each array is treated as a row. The columns are named A, B, C, D.

Creating DataFrame from Dictionary of Lists

data = {'Name': ['Varun', 'Ganesh', 'Joseph', 'Abdul', 'Reena'],
        'Age': [37, 30, 38, 39, 40]}
df = pd.DataFrame(data)
print(df)
✅ Explanation: Each key becomes a column (Name, Age). Each list element becomes a row. This is the most common way of creating structured data in Pandas.

DataFrame from List of Dictionaries

listDict = [{'a': 10, 'b': 20}, {'a': 5, 'b': 10, 'c': 20}]
a = pd.DataFrame(listDict)
print(a)
✅ Explanation: Each dictionary is a row. Missing values (like no 'c' in 1st row) are filled with NaN (Not a Number).
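A runnable sketch of the same example is given below, with imports included. It also shows one side effect worth knowing: because NaN is a floating-point value, a column that contains it is stored as float.

```python
import pandas as pd

listDict = [{'a': 10, 'b': 20}, {'a': 5, 'b': 10, 'c': 20}]
a = pd.DataFrame(listDict)

print(a)
# The first dictionary has no 'c' key, so that cell is NaN.
print(a['c'].isnull().tolist())   # [True, False]
# Column 'c' holds NaN, so pandas stores it as a float column.
print(a.dtypes)
```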

Row and Column Operations

Adding a New Column
Result['Fathima'] = [89, 78, 76]
✅ Explanation: A new column named 'Fathima' is added to an existing DataFrame called Result, in which each row is a subject and each column is a student. The list must contain exactly one value per row (subject).

Adding a New Row

Result.loc['English'] = [90, 92, 89, 80, 90, 88]
✅ Explanation: .loc is used to add or access data using labels. A new subject 'English' is added as a row.

Updating an Existing Row

Result.loc['Science'] = [92, 84, 90, 72, 96, 88]
✅ Explanation: Overwrites the entire row for 'Science' with new values.
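The notes do not show how Result is built, so the sketch below assumes a starting table of 3 subjects (rows) and 5 students (columns). The starting marks and the student names 'Amit' and 'Divya' are hypothetical; the three operations are the ones described above.

```python
import pandas as pd

# Hypothetical starting data: rows are subjects, columns are students.
Result = pd.DataFrame(
    {
        'Rajat': [90, 91, 97],
        'Meenakshi': [92, 81, 96],
        'Karthika': [89, 91, 88],
        'Amit': [81, 71, 67],      # hypothetical student
        'Divya': [94, 95, 99],     # hypothetical student
    },
    index=['Maths', 'Science', 'Hindi'],
)

# New column: one value per existing row (3 subjects).
Result['Fathima'] = [89, 78, 76]

# New row: one value per column (now 6 students).
Result.loc['English'] = [90, 92, 89, 80, 90, 88]

# Existing label: the whole 'Science' row is overwritten.
Result.loc['Science'] = [92, 84, 90, 72, 96, 88]

print(Result)
```

The same .loc assignment either adds or updates a row, depending on whether the label already exists.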

Deleting Rows or Columns

Deleting a Row
Result = Result.drop('Hindi', axis=0)
✅ Explanation: Deletes the row labeled 'Hindi'. axis=0 means row-wise deletion. drop() returns a new DataFrame, so the result is assigned back to Result.

Deleting Multiple Columns

Result = Result.drop(['Rajat', 'Meenakshi', 'Karthika'], axis=1)
✅ Explanation: Deletes the specified student columns. axis=1 refers to column-wise deletion.
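A self-contained sketch of both deletions follows. The marks are hypothetical, and the results are stored under new names (without_hindi, only_fathima) to show that drop() leaves the original DataFrame untouched unless it is reassigned.

```python
import pandas as pd

# Hypothetical marks: rows are subjects, columns are students.
Result = pd.DataFrame(
    {
        'Rajat': [90, 91, 97],
        'Meenakshi': [92, 81, 96],
        'Karthika': [89, 91, 88],
        'Fathima': [89, 78, 76],
    },
    index=['Maths', 'Science', 'Hindi'],
)

# axis=0 removes a row by its label.
without_hindi = Result.drop('Hindi', axis=0)

# columns=[...] is an equivalent, more readable spelling of axis=1.
only_fathima = Result.drop(columns=['Rajat', 'Meenakshi', 'Karthika'])

print(Result.shape)          # (3, 4): the original is unchanged
print(without_hindi.index.tolist())
print(only_fathima.columns.tolist())
```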

DataFrame Attributes and Head/Tail

print(Result.index) # Returns row labels (Index)
print(Result.columns) # Returns column labels
print(Result.shape) # (rows, columns)
print(Result.head(2)) # First 2 rows
print(Result.tail(2)) # Last 2 rows
✅ Explanation:
• index, columns, shape: metadata of the DataFrame.
• head() and tail() are useful for quickly viewing a few entries.
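For a version that runs on its own, the sketch below applies the same attributes and methods to the Name/Age table from the dictionary-of-lists example earlier in these notes.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Varun', 'Ganesh', 'Joseph', 'Abdul', 'Reena'],
    'Age': [37, 30, 38, 39, 40],
})

print(df.index)              # row labels 0..4
print(df.columns.tolist())   # ['Name', 'Age']
print(df.shape)              # (5, 2)
print(df.head(2))            # rows for Varun and Ganesh
print(df.tail(2))            # rows for Abdul and Reena
```

Note that index, columns, and shape are attributes (no parentheses), while head() and tail() are methods.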

1.2 Importing and Exporting CSV Files

Importing CSV File
df = pd.read_csv("[Link]")
✅ Explanation:
Loads data from a CSV file into a DataFrame. CSV = Comma-Separated Values.

Exporting DataFrame to CSV

df.to_csv('[Link]', index=False)
✅ Explanation: Saves the DataFrame into a CSV file. index=False avoids saving row numbers as a separate column.
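The export and import steps can be checked together with a round trip. This is a sketch only: the file name people.csv is hypothetical, and a temporary folder is used so that nothing is left on disk.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Varun', 'Ganesh'], 'Age': [37, 30]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'people.csv')   # hypothetical file name
    df.to_csv(path, index=False)             # export without the row index
    loaded = pd.read_csv(path)               # import it back

print(loaded)
print(loaded.columns.tolist())   # ['Name', 'Age']
```

If index=False were left out, the saved file would contain an extra unnamed column holding the row numbers, and it would reappear on import.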

1.3 Handling Missing Values

Check for Missing Data
df.isnull() # Shows True/False where values are missing
df['Science'].isnull().any() # Checks if 'Science' column has any NaNs
df.isnull().sum().sum() # Total NaN values in DataFrame
✅ Explanation: These functions help find where data is missing (very common in real-life datasets).

Drop Rows with Missing Values

df = df.dropna()
✅ Explanation: Removes any row that has at least one missing value.

Fill Missing Values with 0

df = df.fillna(0)
✅ Explanation: Replaces all NaN values with 0. This is useful when you don’t want to lose data due to missing entries.
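The three steps (check, drop, fill) can be compared side by side on a tiny table. The marks below are hypothetical; the point of the sketch is that each choice has a cost: dropping loses rows, and filling with 0 pulls averages down.

```python
import numpy as np
import pandas as pd

# Hypothetical marks with two missing entries.
df = pd.DataFrame({
    'Maths': [90, np.nan, 70],
    'Science': [80, 60, np.nan],
})

total_missing = int(df.isnull().sum().sum())
dropped = df.dropna()        # keeps only fully filled rows
zero_filled = df.fillna(0)   # keeps every row, NaN becomes 0

print(total_missing)                 # 2
print(len(dropped))                  # 1
print(df['Maths'].mean())            # 80.0 (NaN is skipped)
print(zero_filled['Maths'].mean())   # lower, because 0 counts as a mark
```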

1.4 Case Study – Handling Missing Marks

import pandas as pd
import numpy as np

# Row labels shared by every subject
students = ['Heena', 'Shefali', 'Meera', 'Joseph', 'Suhana', 'Bismeet']

ResultSheet = {
    'Maths': pd.Series([90, 91, 97, 89, 65, 93], index=students),
    'Science': pd.Series([92, 81, np.nan, 87, 50, 88], index=students),
    'English': pd.Series([89, 91, 88, 78, 77, 82], index=students),
    'Hindi': pd.Series([81, 71, 67, 82, np.nan, 89], index=students),
    'AI': pd.Series([94, 95, 99, np.nan, 96, 99], index=students)
}
marks = pd.DataFrame(ResultSheet)
✅ Explanation:
• Creates a full DataFrame with students as rows and subjects as columns
• Some entries (like Science for Meera) are missing (np.nan)
print(marks.isnull()) # Shows where data is missing
print(marks['Science'].isnull()) # Check NaNs in Science only
print(marks.isnull().sum().sum()) # Count of total missing entries
drop = marks.dropna()
print(drop)
✅ Drops all rows with missing values.
fillZero = marks.fillna(0)
print(fillZero)
✅ Replaces all missing values with 0, so the data can still be used.
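As a third option alongside dropping and zero-filling, each gap can be filled with the mean of its own subject. The sketch below uses the same marks as the case study; the name fillMean is introduced here for illustration and is not part of the original notes.

```python
import numpy as np
import pandas as pd

students = ['Heena', 'Shefali', 'Meera', 'Joseph', 'Suhana', 'Bismeet']
marks = pd.DataFrame(
    {
        'Maths': [90, 91, 97, 89, 65, 93],
        'Science': [92, 81, np.nan, 87, 50, 88],
        'English': [89, 91, 88, 78, 77, 82],
        'Hindi': [81, 71, 67, 82, np.nan, 89],
        'AI': [94, 95, 99, np.nan, 96, 99],
    },
    index=students,
)

# marks.mean() gives one mean per column, ignoring NaN;
# fillna() then fills each column's gaps with that column's mean.
fillMean = marks.fillna(marks.mean())

print(int(marks.isnull().sum().sum()))   # 3 missing entries
print(len(marks.dropna()))               # 3 students remain after dropna()
print(fillMean.loc['Meera', 'Science'])  # mean of the other five Science marks
```

Mean-filling keeps all six students and avoids treating a missing mark as a zero score.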

Chapter Back Exercise

Unit 1: Python Programming - II - Exercises

A. Objective Type Questions

1. Which of the following is a primary data structure in Pandas?
Answer: c) Series

2. What does the fillna(0) function do in Pandas?

Answer: b) Fills missing values with zeros

3. In Linear Regression, which library is typically used for importing and managing data?
Answer: b) Pandas

4. What is the correct syntax to read a CSV file into a Pandas DataFrame?
Answer: b) pd.read_csv("[Link]")

5. What is the result of the df.shape attribute?

Answer: b) Number of rows and columns in the DataFrame

6. Which function can be used to export a DataFrame to a CSV file?

Answer: c) to_csv()

B. Short Answer Questions

1. What is a DataFrame in Pandas?
Answer: A DataFrame is a 2-dimensional labeled data structure in Pandas, similar to a table in a database or an Excel spreadsheet.

2. How do you create a Pandas Series from a dictionary?

Answer: By using the command: pd.Series({'a': 1, 'b': 2, 'c': 3})

3. Name two strategies to handle missing values in a DataFrame.

Answer: 1. Using fillna() to replace them with a specific value.
2. Using dropna() to remove rows or columns with missing values.

4. What does the head(n) function do in a DataFrame?

Answer: It returns the first 'n' rows of the DataFrame.

5. What is the role of NumPy in Python programming?

Answer: NumPy provides support for arrays, mathematical functions, and linear algebra operations.

6. Explain the use of the isnull() function in Pandas.

Answer: The isnull() function is used to detect missing (NaN) values in a DataFrame or Series.

C. Long Answer Questions

1. Describe the steps to import and export data using Pandas.
Answer: To import data: Use pd.read_csv('[Link]')
To export data: Use df.to_csv('[Link]')

2. Explain the concept of handling missing values in a DataFrame with examples.

Answer: Handling missing values can be done using:
- df.fillna(value): Fill with a specific value
- df.dropna(): Remove rows with missing values
Example:
df['column'] = df['column'].fillna(df['column'].mean())
3. What is Linear Regression, and how is it implemented in Python?
Answer: Linear Regression is a statistical method to model the relationship between a dependent variable and one or more independent variables.
Implemented using scikit-learn:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X, y)  # X: 2D array of features, y: target values
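To show what "fitting" produces without requiring scikit-learn, the sketch below uses NumPy's polyfit as a stand-in: for a single feature, a degree-1 polynomial fit is a least-squares straight line. The data points are made up so that the true line is y = 2x + 1.

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x + 1.
X = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * X + 1.0

# Degree 1 -> straight line; returns the slope first, then the intercept.
slope, intercept = np.polyfit(X, y, 1)

# Use the fitted line to predict y for a new x value.
predicted = slope * 5.0 + intercept

print(round(slope, 6), round(intercept, 6))
print(round(predicted, 6))
```

In the scikit-learn answer above, the same two numbers would be available after fitting as model.coef_ and model.intercept_.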

4. Compare NumPy arrays and Pandas DataFrames.

Answer: NumPy arrays are homogeneous and efficient for numerical computation.
Pandas DataFrames are heterogeneous, labeled, and provide rich data manipulation tools.
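The difference between homogeneous and heterogeneous storage can be seen by inspecting data types. This is a small sketch with made-up values.

```python
import numpy as np
import pandas as pd

# A NumPy array has a single dtype for all elements.
arr = np.array([1, 2, 3])
# Mixing types forces NumPy to convert everything to one type (here, text).
mixed = np.array([1, 'two', 3.0])

# A DataFrame keeps a separate dtype per column.
df = pd.DataFrame({'id': [1, 2, 3], 'name': ['a', 'b', 'c']})

print(arr.dtype)     # an integer dtype
print(mixed.dtype)   # a Unicode string dtype
print(df.dtypes)     # one dtype per column
```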

5. How can we add new rows and columns to an existing DataFrame? Explain with code examples.
Answer: To add a column:
df['new_col'] = [val1, val2, val3]
To add a row:
df.loc[len(df)] = [val1, val2, val3]
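A concrete version of this answer is sketched below. The 'City' column and its values are hypothetical. Note that df.loc[len(df)] appends safely only when the DataFrame uses the default 0, 1, 2, ... index; with other labels, len(df) might match an existing row and overwrite it.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Varun', 'Ganesh'], 'Age': [37, 30]})

# Add a column: one value per existing row.
df['City'] = ['Delhi', 'Pune']             # hypothetical values

# Add a row: len(df) is 2, the next unused label in a default index.
df.loc[len(df)] = ['Joseph', 38, 'Kochi']  # one value per column

print(df)
print(df.shape)   # (3, 3)
```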

6. What are the attributes of a DataFrame? Provide examples.

Answer: Attributes include:
- df.shape: (rows, columns)
- df.columns: Column labels
- df.index: Row labels
- df.dtypes: Data types

D. Case Study
1. A dataset of student marks contains missing values for some subjects. Write Python code to handle these missing values by replacing them with the mean of the respective columns.
Answer:
import pandas as pd
df = pd.read_csv('student_marks.csv')
# numeric_only=True skips text columns such as student names
df.fillna(df.mean(numeric_only=True), inplace=True)

2. Write Python code to load the file into a Pandas DataFrame, calculate the total sales for each product, and save the results into a new CSV file.
Answer:
import pandas as pd
df = pd.read_csv('[Link]')
total_sales = df.groupby('product')['sales'].sum()
total_sales.to_csv('total_sales.csv')
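The grouping step can be tried without an input file by building a small table in memory. The product names and sales figures below are hypothetical; only the column names 'product' and 'sales' come from the answer above.

```python
import pandas as pd

# Hypothetical sales records: several rows per product.
df = pd.DataFrame({
    'product': ['pen', 'book', 'pen', 'book', 'bag'],
    'sales': [10, 50, 15, 40, 100],
})

# One total per product; the product names become the index.
total_sales = df.groupby('product')['sales'].sum()

# Calling to_csv() without a path returns the CSV text instead of writing a file.
csv_text = total_sales.to_csv()

print(total_sales)
print(csv_text)
```

Because the product names are the index of total_sales, the index is kept when saving; passing index=False here would drop the product column from the output.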

3. In a marketing dataset, analyze the performance of campaigns using Pandas. Describe steps to group data by campaign type and calculate average sales and engagement metrics.
Answer:
import pandas as pd
df = pd.read_csv('[Link]')
avg_metrics = df.groupby('campaign_type')[['sales', 'engagement']].mean()

4. A company has collected data on employee performance. Some values are missing, and certain columns are irrelevant. Explain how to clean and preprocess this data for analysis using Pandas.
Answer:
1. Remove irrelevant columns: df.drop(['col1', 'col2'], axis=1, inplace=True)
2. Handle missing values: df.ffill(inplace=True) (fills each gap with the previous row's value)
3. Convert datatypes if needed: df['col'] = df['col'].astype('int')
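The three cleaning steps can be sketched end to end on a small table. The column names emp_id, score, and notes and all values are hypothetical, chosen only to illustrate the sequence.

```python
import numpy as np
import pandas as pd

# Hypothetical employee data with gaps and one irrelevant column.
df = pd.DataFrame({
    'emp_id': [1, 2, 3, 4],
    'score': [7.0, np.nan, 9.0, np.nan],
    'notes': ['x', 'y', 'z', 'w'],   # treated as irrelevant here
})

# 1. Remove the irrelevant column.
df = df.drop(columns=['notes'])

# 2. Forward-fill: each NaN takes the value from the row above it.
df['score'] = df['score'].ffill()

# 3. Convert to integers once no NaN remains in the column.
df['score'] = df['score'].astype('int')

print(df)
```

The order matters: astype('int') raises an error if the column still contains NaN, and forward-fill cannot fill a gap in the very first row because there is no earlier value to copy.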
