AM19 EDA Assignment1

Assignment on EDA

Uploaded by

Swapnil Chaudhari

am19-eda-assignment1

November 28, 2024

Name: Swapnil Chaudhari

PRN: 2122000238
Roll No.: AM19
Assignment No. 1
[1]: import pandas as pd

[3]: df=pd.read_excel('Titanic-Dataset.xlsx')
df

[3]: PassengerId Survived Pclass \

0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3
.. … … …
886 887 0 2
887 888 1 1
888 889 0 3
889 890 1 1
890 891 0 3

Name Sex Age SibSp \

0 Braund, Mr. Owen Harris male 22.0 1
1 Cumings, Mrs. John Bradley (Florence Briggs Th… female 38.0 1
2 Heikkinen, Miss. Laina female 26.0 0
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1
4 Allen, Mr. William Henry male 35.0 0
.. … … … …
886 Montvila, Rev. Juozas male 27.0 0
887 Graham, Miss. Margaret Edith female 19.0 0
888 Johnston, Miss. Catherine Helen "Carrie" female NaN 1
889 Behr, Mr. Karl Howell male 26.0 0
890 Dooley, Mr. Patrick male 32.0 0

Parch Ticket Fare Cabin Embarked
0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S
.. … … … … …
886 0 211536 13.0000 NaN S
887 0 112053 30.0000 B42 S
888 2 W./C. 6607 23.4500 NaN S
889 0 111369 30.0000 C148 C
890 0 370376 7.7500 NaN Q

[891 rows x 12 columns]

[5]: #1.Create a dataframe

data = {"Name": ["Ram", "Shyam", "Gita", "Sita", "Druv", "Om", "Radhika"],
        "Age": [20, 22, 19, 18, 20, 21, 20],
        "Gender": ["Male", "Male", "Female", "Female", "Male", "Male", "Female"],
        "Salary": [12000, 23000, 20000, 19000, 10000, 25000, 40000]}

[6]: df_d = pd.DataFrame(data)

[7]: df_d

[7]: Name Age Gender Salary

0 Ram 20 Male 12000
1 Shyam 22 Male 23000
2 Gita 19 Female 20000
3 Sita 18 Female 19000
4 Druv 20 Male 10000
5 Om 21 Male 25000
6 Radhika 20 Female 40000

[8]: #2.Find shape of the data

df.shape

[8]: (891, 12)

[9]: #3.Find size of the data

df.size

[9]: 10692

[10]: #4.Find dimensions of the data

df.ndim

[10]: 2
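The three attributes above are related: `size` is the product of the two entries of `shape` (for the Titanic frame, 891 × 12 = 10692), and `ndim` is the length of `shape`. A minimal sketch on the small frame from cell [5], rebuilt here so the snippet runs on its own:

```python
import pandas as pd

# Same toy data as cell [5] of this notebook.
data = {"Name": ["Ram", "Shyam", "Gita", "Sita", "Druv", "Om", "Radhika"],
        "Age": [20, 22, 19, 18, 20, 21, 20],
        "Gender": ["Male", "Male", "Female", "Female", "Male", "Male", "Female"],
        "Salary": [12000, 23000, 20000, 19000, 10000, 25000, 40000]}
df_d = pd.DataFrame(data)

n_rows, n_cols = df_d.shape      # (rows, columns)
print(df_d.shape)                # (7, 4)
print(df_d.size)                 # rows * columns = 28
print(df_d.ndim)                 # a DataFrame is 2-dimensional
print(df_d["Age"].ndim)          # a single column (Series) is 1-dimensional
```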

[11]: #5.List all columns in df

df.columns

[11]: Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',

'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
dtype='object')

[12]: #6.Find datatypes of each column

df.dtypes

[12]: PassengerId int64

Survived int64
Pclass int64
Name object
Sex object
Age float64
SibSp int64
Parch int64
Ticket object
Fare float64
Cabin object
Embarked object
dtype: object

[13]: #7.Find axes of the df

df.axes

[13]: [RangeIndex(start=0, stop=891, step=1),

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
dtype='object')]

[14]: #8.Find index of the df

df.index

[14]: RangeIndex(start=0, stop=891, step=1)

[15]: #9.Find all values of df

df.values

[15]: array([[1, 0, 3, …, 7.25, nan, 'S'],
       [2, 1, 1, …, 71.2833, 'C85', 'C'],
       [3, 1, 3, …, 7.925, nan, 'S'],
       …,
       [889, 0, 3, …, 23.45, nan, 'S'],
       [890, 1, 1, …, 30.0, 'C148', 'C'],
       [891, 0, 3, …, 7.75, nan, 'Q']], dtype=object)

[16]: #10.Check whether the df is empty

df.empty

[16]: False

[17]: #11.Transpose the df

# df.T
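The transpose call is commented out in the cell above, probably because the transposed Titanic frame (12 rows × 891 columns) is awkward to display. A small sketch on the toy frame from cell [5] shows what `.T` does; the variable name `df_t` is only illustrative:

```python
import pandas as pd

# Same toy data as cell [5] of this notebook.
data = {"Name": ["Ram", "Shyam", "Gita", "Sita", "Druv", "Om", "Radhika"],
        "Age": [20, 22, 19, 18, 20, 21, 20],
        "Gender": ["Male", "Male", "Female", "Female", "Male", "Male", "Female"],
        "Salary": [12000, 23000, 20000, 19000, 10000, 25000, 40000]}
df_d = pd.DataFrame(data)

# .T swaps rows and columns: column names become the index,
# and the old row labels 0..6 become the column labels.
df_t = df_d.T
print(df_t.shape)   # (4, 7)
print(df_t)
```

Because each transposed column now mixes strings and numbers, the numeric dtypes of the original columns are generally not preserved.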

[18]: #12.Find detailed info of the df

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 714 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Cabin 204 non-null object
11 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
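The non-null counts above imply the number of missing values per column: 891 − 714 = 177 for Age, 891 − 204 = 687 for Cabin, and 891 − 889 = 2 for Embarked. The same counts can be read directly with `isna().sum()`. The sketch below uses a tiny made-up frame (`df_small` and its values are hypothetical, not the Titanic file) so that it runs without the dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical 4-row frame with gaps, used only for illustration.
df_small = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": ["C85", None, None, None],
})

missing = df_small.isna().sum()     # missing values per column
non_null = df_small.notna().sum()   # the "Non-Null Count" that info() reports
print(missing)
print(non_null)
```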

[19]: #13.Display top n records from the df (n defaults to 5)

df.head()

[19]: PassengerId Survived Pclass \

0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3

Name Sex Age SibSp \

0 Braund, Mr. Owen Harris male 22.0 1
1 Cumings, Mrs. John Bradley (Florence Briggs Th… female 38.0 1

2 Heikkinen, Miss. Laina female 26.0 0
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1
4 Allen, Mr. William Henry male 35.0 0

Parch Ticket Fare Cabin Embarked

0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S

[20]: #14.Display bottom n records from the df (n defaults to 5)

df.tail()

[20]: PassengerId Survived Pclass Name \

886 887 0 2 Montvila, Rev. Juozas
887 888 1 1 Graham, Miss. Margaret Edith
888 889 0 3 Johnston, Miss. Catherine Helen "Carrie"
889 890 1 1 Behr, Mr. Karl Howell
890 891 0 3 Dooley, Mr. Patrick

Sex Age SibSp Parch Ticket Fare Cabin Embarked

886 male 27.0 0 0 211536 13.00 NaN S
887 female 19.0 0 0 112053 30.00 B42 S
888 female NaN 1 2 W./C. 6607 23.45 NaN S
889 male 26.0 0 0 111369 30.00 C148 C
890 male 32.0 0 0 370376 7.75 NaN Q
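Both calls above rely on the default of five rows. To get the "n" that the task asks for, pass it explicitly. A short sketch on the toy frame from cell [5]:

```python
import pandas as pd

# Same toy data as cell [5] of this notebook.
data = {"Name": ["Ram", "Shyam", "Gita", "Sita", "Druv", "Om", "Radhika"],
        "Age": [20, 22, 19, 18, 20, 21, 20],
        "Gender": ["Male", "Male", "Female", "Female", "Male", "Male", "Female"],
        "Salary": [12000, 23000, 20000, 19000, 10000, 25000, 40000]}
df_d = pd.DataFrame(data)

top3 = df_d.head(3)      # first 3 rows
bottom2 = df_d.tail(2)   # last 2 rows
print(top3)
print(bottom2)
```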

[21]: #15.Display descriptive statistics for the numerical columns from the df
df.describe()

[21]: PassengerId Survived Pclass Age SibSp \

count 891.000000 891.000000 891.000000 714.000000 891.000000
mean 446.000000 0.383838 2.308642 29.699118 0.523008
std 257.353842 0.486592 0.836071 14.526497 1.102743
min 1.000000 0.000000 1.000000 0.420000 0.000000
25% 223.500000 0.000000 2.000000 20.125000 0.000000
50% 446.000000 0.000000 3.000000 28.000000 0.000000
75% 668.500000 1.000000 3.000000 38.000000 1.000000
max 891.000000 1.000000 3.000000 80.000000 8.000000

Parch Fare
count 891.000000 891.000000
mean 0.381594 32.204208
std 0.806057 49.693429
min 0.000000 0.000000
25% 0.000000 7.910400

50% 0.000000 14.454200
75% 0.000000 31.000000
max 6.000000 512.329200
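Note that `describe()` summarised only the seven numeric columns; the text columns (Name, Sex, Ticket, Cabin, Embarked) are skipped by default. The sketch below repeats the call on the toy frame from cell [5] and reads individual statistics out of the result by row label; `stats` is just an illustrative name:

```python
import pandas as pd

# Same toy data as cell [5] of this notebook.
data = {"Name": ["Ram", "Shyam", "Gita", "Sita", "Druv", "Om", "Radhika"],
        "Age": [20, 22, 19, 18, 20, 21, 20],
        "Gender": ["Male", "Male", "Female", "Female", "Male", "Male", "Female"],
        "Salary": [12000, 23000, 20000, 19000, 10000, 25000, 40000]}
df_d = pd.DataFrame(data)

stats = df_d.describe()             # numeric columns only: Age and Salary
print(stats)

# The result is itself a DataFrame, indexed by statistic name.
print(stats.loc["mean", "Salary"])  # 149000 / 7
print(stats.loc["max", "Age"])
```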

[22]: #16. Find mean, median, mode, std values for numerical columns in df
df['Age'].mean()

[22]: 29.69911764705882

[23]: df['Age'].median()

[23]: 28.0

[24]: df['Age'].mode()

[24]: 0 24.0
Name: Age, dtype: float64

[25]: df['Age'].min()

[25]: 0.42

[26]: df['Age'].max()

[26]: 80.0

[27]: df['Age'].std()

[27]: 14.526497332334042
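Two details of the cells above are easy to miss. `mode()` returns a Series rather than a scalar, because a column can have several equally frequent values. And `std()` is the sample standard deviation (divisor n − 1) unless `ddof=0` is passed. A sketch on the Age values of the toy frame from cell [5], computing several statistics in one call with `agg`:

```python
import pandas as pd

# Age column of the toy data from cell [5].
age = pd.Series([20, 22, 19, 18, 20, 21, 20], name="Age")

# Several statistics in one call; the result is a Series keyed by name.
summary = age.agg(["mean", "median", "min", "max", "std"])
print(summary)

modes = age.mode()             # Series: there may be more than one mode
print(modes.tolist())          # [20]

sample_std = age.std()         # divides by n - 1 (pandas default, ddof=1)
population_std = age.std(ddof=0)   # divides by n
print(sample_std, population_std)
```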

[28]: #17.Return a random sample from df

df.sample()

[28]: PassengerId Survived Pclass Name Sex Age SibSp \

470 471 0 3 Keefe, Mr. Arthur male NaN 0

Parch Ticket Fare Cabin Embarked

470 0 323592 7.25 NaN S
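With no arguments, `sample()` returns one random row, so the output above will differ on every run. Passing `n` controls the number of rows and `random_state` makes the draw repeatable. A sketch on the toy frame from cell [5]; the seed value 42 is arbitrary:

```python
import pandas as pd

# Same toy data as cell [5] of this notebook.
data = {"Name": ["Ram", "Shyam", "Gita", "Sita", "Druv", "Om", "Radhika"],
        "Age": [20, 22, 19, 18, 20, 21, 20],
        "Gender": ["Male", "Male", "Female", "Female", "Male", "Male", "Female"],
        "Salary": [12000, 23000, 20000, 19000, 10000, 25000, 40000]}
df_d = pd.DataFrame(data)

# Three rows drawn without replacement; same seed gives the same rows.
s1 = df_d.sample(n=3, random_state=42)
s2 = df_d.sample(n=3, random_state=42)
print(s1)
print(s1.equals(s2))
```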

[29]: #18.Find unique values for the categorical columns

df['Sex'].unique()

[29]: array(['male', 'female'], dtype=object)

[30]: #19.Find number of unique values for the categorical columns

df['Sex'].nunique()

[30]: 2
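`unique()` and `nunique()` list and count the distinct values, but not how often each occurs. `value_counts()` fills that gap. A sketch on the Gender column of the toy frame from cell [5]:

```python
import pandas as pd

# Same toy data as cell [5] of this notebook.
data = {"Name": ["Ram", "Shyam", "Gita", "Sita", "Druv", "Om", "Radhika"],
        "Age": [20, 22, 19, 18, 20, 21, 20],
        "Gender": ["Male", "Male", "Female", "Female", "Male", "Male", "Female"],
        "Salary": [12000, 23000, 20000, 19000, 10000, 25000, 40000]}
df_d = pd.DataFrame(data)

counts = df_d["Gender"].value_counts()                # frequency of each value
shares = df_d["Gender"].value_counts(normalize=True)  # same, as fractions
print(counts)
print(shares)

# nunique() also works on several columns at once.
print(df_d[["Name", "Gender"]].nunique())
```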

[31]: #20.Locate first row in the df using loc

df.loc[0]

[31]: PassengerId 1
Survived 0
Pclass 3
Name Braund, Mr. Owen Harris
Sex male
Age 22.0
SibSp 1
Parch 0
Ticket A/5 21171
Fare 7.25
Cabin NaN
Embarked S
Name: 0, dtype: object

[32]: #21.Locate nth row in the df using loc

df.loc[4]

[32]: PassengerId 5
Survived 0
Pclass 3
Name Allen, Mr. William Henry
Sex male
Age 35.0
SibSp 0
Parch 0
Ticket 373450
Fare 8.05
Cabin NaN
Embarked S
Name: 4, dtype: object

[33]: #22.Locate last row in the df using loc

df.loc[df.index[-1]]

[33]: PassengerId 891

Survived 0
Pclass 3
Name Dooley, Mr. Patrick
Sex male
Age 32.0
SibSp 0
Parch 0

Ticket 370376
Fare 7.75
Cabin NaN
Embarked Q
Name: 890, dtype: object
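The cell above uses `df.index[-1]` because `loc` works with index labels, not positions: with the default `RangeIndex`, there is no label `-1`, so `df.loc[-1]` would raise a `KeyError`. A sketch on the toy frame from cell [5] that demonstrates both points:

```python
import pandas as pd

# Same toy data as cell [5] of this notebook.
data = {"Name": ["Ram", "Shyam", "Gita", "Sita", "Druv", "Om", "Radhika"],
        "Age": [20, 22, 19, 18, 20, 21, 20],
        "Gender": ["Male", "Male", "Female", "Female", "Male", "Male", "Female"],
        "Salary": [12000, 23000, 20000, 19000, 10000, 25000, 40000]}
df_d = pd.DataFrame(data)

last_by_label = df_d.loc[df_d.index[-1]]   # look up the last label (6)
last_by_pos = df_d.iloc[-1]                # positional: last row
print(last_by_label.equals(last_by_pos))

# -1 is not a label in the index 0..6, so loc rejects it.
raised = False
try:
    df_d.loc[-1]
except KeyError:
    raised = True
print("loc[-1] raised KeyError:", raised)
```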

[34]: #23.Locate all values in a range in df using loc

df.loc[1:5]

[34]: PassengerId Survived Pclass \

1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3
5 6 0 3

Name Sex Age SibSp \

1 Cumings, Mrs. John Bradley (Florence Briggs Th… female 38.0 1
2 Heikkinen, Miss. Laina female 26.0 0
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1
4 Allen, Mr. William Henry male 35.0 0
5 Moran, Mr. James male NaN 0

Parch Ticket Fare Cabin Embarked

1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S
5 0 330877 8.4583 NaN Q

[35]: #24.Locate all the rows in df with specific criteria using loc
df.loc[df['Age']>20]

[35]: PassengerId Survived Pclass \

0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3
.. … … …
884 885 0 3
885 886 0 3
886 887 0 2
889 890 1 1
890 891 0 3

Name Sex Age SibSp \

0 Braund, Mr. Owen Harris male 22.0 1
1 Cumings, Mrs. John Bradley (Florence Briggs Th… female 38.0 1
2 Heikkinen, Miss. Laina female 26.0 0
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1
4 Allen, Mr. William Henry male 35.0 0
.. … … … …
884 Sutehall, Mr. Henry Jr male 25.0 0
885 Rice, Mrs. William (Margaret Norton) female 39.0 0
886 Montvila, Rev. Juozas male 27.0 0
889 Behr, Mr. Karl Howell male 26.0 0
890 Dooley, Mr. Patrick male 32.0 0

Parch Ticket Fare Cabin Embarked

0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S
.. … … … … …
884 0 SOTON/OQ 392076 7.0500 NaN S
885 5 382652 29.1250 NaN Q
886 0 211536 13.0000 NaN S
889 0 111369 30.0000 C148 C
890 0 370376 7.7500 NaN Q

[535 rows x 12 columns]

[36]: #25.Locate all the rows in df with specific criteria with logical AND operator using loc

df.loc[(df['Age']>15)&(df['Age']<20)]

[36]: PassengerId Survived Pclass \

27 28 0 1
38 39 0 3
44 45 1 3
49 50 0 3
67 68 0 3
.. … … …
844 845 0 3
853 854 1 1
855 856 1 3
877 878 0 3
887 888 1 1

Name Sex Age SibSp \

27 Fortune, Mr. Charles Alexander male 19.0 3
38 Vander Planke, Miss. Augusta Maria female 18.0 2

44 Devaney, Miss. Margaret Delia female 19.0 0
49 Arnold-Franchi, Mrs. Josef (Josefine Franchi) female 18.0 1
67 Crease, Mr. Ernest James male 19.0 0
.. … … … …
844 Culumovic, Mr. Jeso male 17.0 0
853 Lines, Miss. Mary Conover female 16.0 0
855 Aks, Mrs. Sam (Leah Rosen) female 18.0 0
877 Petroff, Mr. Nedelio male 19.0 0
887 Graham, Miss. Margaret Edith female 19.0 0

Parch Ticket Fare Cabin Embarked

27 2 19950 263.0000 C23 C25 C27 S
38 0 345764 18.0000 NaN S
44 0 330958 7.8792 NaN Q
49 0 349237 17.8000 NaN S
67 0 S.P. 3464 8.1583 NaN S
.. … … … … …
844 0 315090 8.6625 NaN S
853 1 PC 17592 39.4000 D28 S
855 1 392091 9.3500 NaN S
877 0 349212 7.8958 NaN S
887 0 112053 30.0000 B42 S

[81 rows x 12 columns]
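Each condition in the filter above needs its own parentheses, because `&` binds more tightly than `>` and `<`. The same pattern extends to OR (`|`) and NOT (`~`), and a two-sided range can also be written with `between`. A sketch on the toy frame from cell [5]; the thresholds are arbitrary, and the `inclusive="neither"` spelling assumes a reasonably recent pandas (1.3 or later):

```python
import pandas as pd

# Same toy data as cell [5] of this notebook.
data = {"Name": ["Ram", "Shyam", "Gita", "Sita", "Druv", "Om", "Radhika"],
        "Age": [20, 22, 19, 18, 20, 21, 20],
        "Gender": ["Male", "Male", "Female", "Female", "Male", "Male", "Female"],
        "Salary": [12000, 23000, 20000, 19000, 10000, 25000, 40000]}
df_d = pd.DataFrame(data)

# AND: strictly between 18 and 21.
in_range = df_d.loc[(df_d["Age"] > 18) & (df_d["Age"] < 21)]
# Same filter with between(); both bounds excluded.
same = df_d.loc[df_d["Age"].between(18, 21, inclusive="neither")]
# OR: younger than 19, or salary above 30000.
either = df_d.loc[(df_d["Age"] < 19) | (df_d["Salary"] > 30000)]
# NOT: everyone who is not "Male".
not_male = df_d.loc[~(df_d["Gender"] == "Male")]

print(in_range, either, not_male, sep="\n\n")
```

Rows whose value is NaN fail every comparison, which is why the Titanic filters above silently drop passengers with a missing Age.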

[37]: #26.Locate first row in the df using iloc

df.iloc[0]

[37]: PassengerId 1
Survived 0
Pclass 3
Name Braund, Mr. Owen Harris
Sex male
Age 22.0
SibSp 1
Parch 0
Ticket A/5 21171
Fare 7.25
Cabin NaN
Embarked S
Name: 0, dtype: object

[38]: #27.Locate nth row in the df using iloc

df.iloc[4]

[38]: PassengerId 5
Survived 0

Pclass 3
Name Allen, Mr. William Henry
Sex male
Age 35.0
SibSp 0
Parch 0
Ticket 373450
Fare 8.05
Cabin NaN
Embarked S
Name: 4, dtype: object

[39]: #28.Locate last row in the df using iloc

df.iloc[-1]

[39]: PassengerId 891

Survived 0
Pclass 3
Name Dooley, Mr. Patrick
Sex male
Age 32.0
SibSp 0
Parch 0
Ticket 370376
Fare 7.75
Cabin NaN
Embarked Q
Name: 890, dtype: object

[40]: #29.Locate first, nth, last col in df using iloc (last col shown here)

df.iloc[:,-1]

[40]: 0 S
1 C
2 S
3 S
4 S
..
886 S
887 S
888 S
889 C
890 Q
Name: Embarked, Length: 891, dtype: object
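The cell above shows only the last column. The first and nth columns follow the same pattern, with the column position after the comma. A sketch on the toy frame from cell [5], where position 2 stands in for "nth":

```python
import pandas as pd

# Same toy data as cell [5] of this notebook.
data = {"Name": ["Ram", "Shyam", "Gita", "Sita", "Druv", "Om", "Radhika"],
        "Age": [20, 22, 19, 18, 20, 21, 20],
        "Gender": ["Male", "Male", "Female", "Female", "Male", "Male", "Female"],
        "Salary": [12000, 23000, 20000, 19000, 10000, 25000, 40000]}
df_d = pd.DataFrame(data)

first_col = df_d.iloc[:, 0]    # all rows, first column
nth_col = df_d.iloc[:, 2]      # all rows, third column (position 2)
last_col = df_d.iloc[:, -1]    # all rows, last column

print(first_col.name, nth_col.name, last_col.name)
```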

[41]: #30 locate all the rows in df within specific range of row index
df.iloc[0:5]

[41]: PassengerId Survived Pclass \
0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3

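Name Sex Age SibSp \
0 Braund, Mr. Owen Harris male 22.0 1
1 Cumings, Mrs. John Bradley (Florence Briggs Th… female 38.0 1
2 Heikkinen, Miss. Laina female 26.0 0
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1
4 Allen, Mr. William Henry male 35.0 0
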
Parch Ticket Fare Cabin Embarked

0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S
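Compare this output with cell [34]: `df.loc[1:5]` returned rows 1 through 5 (five rows, end label included), whereas `df.iloc[0:5]` returns positions 0 through 4 (end position excluded, like ordinary Python slicing). A sketch on the toy frame from cell [5] using the same slice bounds for both:

```python
import pandas as pd

# Same toy data as cell [5] of this notebook.
data = {"Name": ["Ram", "Shyam", "Gita", "Sita", "Druv", "Om", "Radhika"],
        "Age": [20, 22, 19, 18, 20, 21, 20],
        "Gender": ["Male", "Male", "Female", "Female", "Male", "Male", "Female"],
        "Salary": [12000, 23000, 20000, 19000, 10000, 25000, 40000]}
df_d = pd.DataFrame(data)

by_label = df_d.loc[1:5]      # labels 1..5, end INCLUDED -> 5 rows
by_position = df_d.iloc[1:5]  # positions 1..4, end EXCLUDED -> 4 rows

print(list(by_label.index))     # [1, 2, 3, 4, 5]
print(list(by_position.index))  # [1, 2, 3, 4]
```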

[42]: #31 locate all cols within a range of col index

df.iloc[:,0:2]

[42]: PassengerId Survived

0 1 0
1 2 1
2 3 1
3 4 1
4 5 0
.. … …
886 887 0
887 888 1
888 889 0
889 890 1
890 891 0

[891 rows x 2 columns]

[43]: #32 locate second through third row and first 2 cols
df.iloc[1:3,0:2]

[43]: PassengerId Survived

1 2 1
2 3 1

[44]: #33 locate 1st and 6th rows, 1st and 4th cols
df.iloc[[0,5],[0,3]]

[44]: PassengerId Name

0 1 Braund, Mr. Owen Harris
5 6 Moran, Mr. James
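Passing lists to `iloc` picks arbitrary, non-adjacent rows and columns by position. The label-based equivalent with `loc` takes the row labels and the column names instead. A sketch on the toy frame from cell [5], where positions 0 and 3 correspond to the columns Name and Salary:

```python
import pandas as pd

# Same toy data as cell [5] of this notebook.
data = {"Name": ["Ram", "Shyam", "Gita", "Sita", "Druv", "Om", "Radhika"],
        "Age": [20, 22, 19, 18, 20, 21, 20],
        "Gender": ["Male", "Male", "Female", "Female", "Male", "Male", "Female"],
        "Salary": [12000, 23000, 20000, 19000, 10000, 25000, 40000]}
df_d = pd.DataFrame(data)

by_pos = df_d.iloc[[0, 5], [0, 3]]                 # positions
by_label = df_d.loc[[0, 5], ["Name", "Salary"]]    # labels and column names

print(by_pos)
print(by_pos.equals(by_label))
```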

[46]: #34. save created data in csv, excel file and reload and check
df_d.to_csv('EDA_data.csv')

[47]: df_d.to_excel('EDA_data.xlsx')

[48]: df_csv = pd.read_csv('EDA_data.csv')

df_excel = pd.read_excel('EDA_data.xlsx')
print(df_csv)
print(df_excel)

Unnamed: 0 Name Age Gender Salary

0 0 Ram 20 Male 12000
1 1 Shyam 22 Male 23000
2 2 Gita 19 Female 20000
3 3 Sita 18 Female 19000
4 4 Druv 20 Male 10000
5 5 Om 21 Male 25000
6 6 Radhika 20 Female 40000
Unnamed: 0 Name Age Gender Salary
0 0 Ram 20 Male 12000
1 1 Shyam 22 Male 23000
2 2 Gita 19 Female 20000
3 3 Sita 18 Female 19000
4 4 Druv 20 Male 10000
5 5 Om 21 Male 25000
6 6 Radhika 20 Female 40000
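The extra `Unnamed: 0` column in both reloaded frames is the row index, which `to_csv` and `to_excel` write by default. Two common ways to avoid it are to skip the index when saving (`index=False`) or to tell the reader which column holds it (`index_col=0`). The sketch below shows both for CSV, writing to an in-memory buffer instead of `EDA_data.csv` so that it leaves no files behind; the buffer names are illustrative:

```python
import io
import pandas as pd

# Same toy data as cell [5] of this notebook.
data = {"Name": ["Ram", "Shyam", "Gita", "Sita", "Druv", "Om", "Radhika"],
        "Age": [20, 22, 19, 18, 20, 21, 20],
        "Gender": ["Male", "Male", "Female", "Female", "Male", "Male", "Female"],
        "Salary": [12000, 23000, 20000, 19000, 10000, 25000, 40000]}
df_d = pd.DataFrame(data)

# Default behaviour: the index is written and comes back as "Unnamed: 0".
buf_default = io.StringIO()
df_d.to_csv(buf_default)
buf_default.seek(0)
with_index = pd.read_csv(buf_default)

# Option 1: do not write the index at all.
buf_clean = io.StringIO()
df_d.to_csv(buf_clean, index=False)
buf_clean.seek(0)
clean = pd.read_csv(buf_clean)

# Option 2: keep the file as is, but read the first column back as the index.
buf_default.seek(0)
restored = pd.read_csv(buf_default, index_col=0)

print(list(with_index.columns))
print(list(clean.columns))
print(list(restored.columns))
```

The same `index=False` and `index_col=0` arguments are accepted by `to_excel` and `read_excel`.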
