Assignment 1

This assignment introduces the Pandas library for data manipulation and analysis in Python, covering basic functions such as reading data from various formats, identifying missing values, and sorting data. It emphasizes data cleaning and preprocessing techniques to enhance skills in handling structured data. The assignment concludes that proficiency in Pandas will facilitate more advanced data analysis projects.

Name: Pathan Firdos Maheraj

Roll no: 281073

Batch: A3
Assignment 1
Statement:

Q. Perform the following operations using R/Python on suitable data sets:

a) Read data from different formats (like CSV, XLS)
b) Find Shape of Data
c) Find Missing Values
d) Find Data Type of Each Column
e) Finding Out Zeros
f) Indexing and Selecting Data, Sort Data
g) Describe Attributes of Data, Checking Data Types of Each Column
h) Counting Unique Values of Data, Format of Each Column, Converting Variable Data Type (e.g.,
from long to short, vice versa)

Objective:

1. Introduce the Pandas library and its basic functions, including those for reading different
file formats such as CSV and Excel.
2. Become familiar with data cleaning and preprocessing techniques.
3. Improve our skills in handling data in various formats, and our proficiency in data
analysis and manipulation.

Resources used:

1. Software used: Google Colab

2. Library used: Pandas

Introduction to Pandas:

1. Pandas is a powerful and widely used open-source Python library for data manipulation and
analysis.
2. It provides easy-to-use data structures and functions, making it an essential tool for working
with structured data.
3. At the core of Pandas are two main data structures: Series and DataFrame.
4. A Series is a one-dimensional labeled array capable of holding any data type.
5. A DataFrame is a two-dimensional labeled data structure with columns of potentially
different types.
6. These data structures allow users to perform a wide range of operations on data, including
loading data from various file formats (such as CSV, Excel, SQL databases), manipulating
data (e.g., sorting, filtering, grouping), and performing statistical and analytical tasks.
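The two data structures described above can be illustrated with a minimal sketch. The names, labels, and values below are hypothetical and were not part of the original program.

```python
import pandas as pd

# A Series: one-dimensional labeled array (labels "a", "b", "c" are made up).
marks = pd.Series([72, 85, 90], index=["a", "b", "c"], name="marks")

# A DataFrame: two-dimensional, and each column may have a different type.
students = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena"],   # text column
    "marks": [72, 85, 90],               # integer column
    "passed": [True, True, True],        # boolean column
})

print(marks["b"])               # label-based access to one Series element
print(students.shape)           # (rows, columns)
print(students.dtypes)          # one data type per column
print(type(students["marks"]))  # each DataFrame column is itself a Series
```

Selecting a single column from a DataFrame gives back a Series, which is why Series methods such as unique() are called on one column at a time.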

Some basic functions that we used in the program:

1. pd.read_csv(): Reads data from a CSV file into a DataFrame.
2. shape: Attribute that returns the number of rows and columns in the dataset as a tuple.
3. isnull().sum(): Counts the missing values in each column of the dataset.
4. dtypes: Returns the data type of each column in the dataset.
5. (df == 0).sum(): Counts the zeros in each column.
6. sort_values(): Sorts the DataFrame by the values of a specified column, in ascending order
by default (pass ascending=False for descending order).
7. describe(): Generates descriptive statistics for numerical columns in the DataFrame, such as
count, mean, standard deviation, minimum, quartiles, and maximum values.
8. unique(): Returns an array of the unique values in a single column (a Series), useful for
identifying distinct categories or groups in categorical data.
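The functions listed above can be tried together on a small dataset. This is only a sketch: the CSV text, column names, and values are hypothetical, and io.StringIO stands in for a real file path. Zeros are counted on the numeric columns only, which is a small variation on (df == 0).sum() that avoids comparing text with a number.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """name,age,score,city
Asha,21,88,Pune
Ravi,,0,Mumbai
Meena,23,75,Pune
Kiran,22,0,
"""

df = pd.read_csv(io.StringIO(csv_text))   # read CSV data into a DataFrame

print(df.shape)                           # number of rows and columns

missing = df.isnull().sum()               # missing values per column
print(missing)

print(df.dtypes)                          # data type of each column

numeric = df.select_dtypes(include="number")
zeros = (numeric == 0).sum()              # zeros per numeric column
print(zeros)

# Indexing and selecting: rows with a non-zero score, two columns only.
nonzero = df.loc[df["score"] > 0, ["name", "score"]]
print(nonzero)

by_score = df.sort_values("score", ascending=False)  # highest score first
print(by_score)

print(df.describe())                      # summary statistics, numeric columns
print(df["city"].unique())                # distinct values (includes NaN)
print(df["city"].nunique())               # count of distinct non-missing values
```

For Excel files, pd.read_excel() plays the same role as pd.read_csv(), although it needs an additional Excel engine package to be installed.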

Methodology:

1. Data Collection and Exploration:

o Collect Data: Obtain a relevant dataset ensuring it contains key features.
o Explore Data: Load the dataset into a Pandas DataFrame and analyze its structure,
including the number of samples, features, data types, and any missing or erroneous
values.
2. Data Preprocessing:
o Handle Missing Values: Identify and manage missing values using techniques such
as imputation or removal.
o Data Cleaning: Remove duplicates, correct erroneous entries, and ensure consistency
in data formatting.
3. Feature Engineering:
o Feature Selection: Select relevant features using domain knowledge and statistical
techniques.
o Feature Encoding: Convert categorical variables into numerical format using one-hot
encoding or label encoding for better processing.
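The preprocessing and feature engineering steps above could look roughly like the following. The data is hypothetical, and mean imputation and one-hot encoding are just one choice among the techniques the methodology mentions, not necessarily the ones used in the original program.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one missing age and one duplicated row.
raw = pd.DataFrame({
    "age": [21.0, np.nan, 23.0, 23.0],
    "city": ["Pune", "Mumbai", "Pune", "Pune"],
    "score": [88, 70, 75, 75],
})

# Data cleaning: remove exact duplicate rows and renumber the index.
clean = raw.drop_duplicates().reset_index(drop=True)

# Handle missing values: impute age with the column mean.
# (clean.dropna() would be the removal alternative.)
clean["age"] = clean["age"].fillna(clean["age"].mean())

# Feature selection: keep the columns judged relevant (an assumption here).
features = clean[["age", "city"]]

# Feature encoding: one-hot encode the categorical column.
encoded = pd.get_dummies(features, columns=["city"])
print(encoded)
```

One-hot encoding creates one indicator column per category, so it suits columns with few distinct values. Label encoding keeps a single column but imposes an arbitrary numeric order on the categories.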

Advantages:

1. Pandas is an easy-to-use library, making it widely popular.

2. It provides powerful data structures like Series and DataFrame.
3. It offers extensive functionality for data manipulation.

Disadvantages:

1. Pandas loads data into memory, so it can consume significant memory when working with
large datasets.
2. It is tightly integrated with the Python ecosystem, which limits interoperability with other
programming languages.
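The memory concern can sometimes be reduced by converting a column to a smaller data type, which also illustrates the "long to short" conversion named in the problem statement. The sketch below uses a hypothetical column whose values are known to fit in 16 bits; converting values that do not fit would corrupt them, so the range should be checked first.

```python
import numpy as np
import pandas as pd

# Hypothetical column of 1000 integers stored as 64-bit ("long") values.
df = pd.DataFrame({"count": np.arange(1000, dtype="int64")})
before = df["count"].memory_usage(index=False)   # bytes used by the values

# Values 0..999 fit in a 16-bit ("short") integer, so the conversion is safe.
df["count"] = df["count"].astype("int16")
after = df["count"].memory_usage(index=False)

print(before, after)        # 8 bytes per value versus 2 bytes per value
print(df["count"].dtype)

# The reverse conversion ("short" back to "long") works the same way.
restored = df["count"].astype("int64")
print(restored.dtype)
```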

Conclusion:

In summary, this assignment provided an introduction to the Pandas library, a crucial tool for data
manipulation and analysis in Python. We explored its basic functions, such as reading various data
formats, organizing and describing data, and handling missing values. Through practical exercises,
we gained a better understanding of how Pandas can simplify complex data tasks, making data
analysis more accessible and efficient. These foundational skills with Pandas will serve as a strong
base for more advanced data analysis projects in the future.
