
2.basic Statistics - Jupyter Notebook

This notebook demonstrates basic descriptive statistics and data manipulation with the Pandas library in Python. It first explores an MBA dataset (mba.csv): inspecting its shape, columns, and rows, then computing the mean, median, mode, variance, standard deviation, range, skewness, and kurtosis of the gmat column. It then loads breast cancer data (wbcd.csv), deletes the unnecessary id column, tabulates the frequency of each diagnosis type, and replaces text labels and a column name. The goal is to prepare and understand the data for further analysis.

Uploaded by

venkatesh m


In [3]: import pandas as pd

In [4]: import numpy as np

In [3]: mba = pd.read_csv("D:\\Course PPTS\\R Codes\\1 Basis staistics\\mba.csv")

#C:\\Users\\Rohit\\Desktop\\Course PPTS\\R Codes\\1 Basis staistics\\mba.csv

In [4]: mba

...

In [5]: # number of Rows

len(mba)

...

In [6]: # check the number of columns

len(mba.columns)

...

In [7]: mba.shape

...
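The three size checks above should agree with one another. A minimal sketch on a small hypothetical frame (the values below are invented stand-ins, since the contents of mba.csv are not shown here):

```python
import pandas as pd

# Hypothetical stand-in for mba.csv; only the column names come from the notebook.
demo = pd.DataFrame({
    "Datasrno": [1, 2, 3, 4],
    "workex": [21, 107, 57, 99],
    "gmat": [720, 640, 740, 690],
})

n_rows = len(demo)           # number of rows
n_cols = len(demo.columns)   # number of columns

# shape returns both at once as a (rows, columns) tuple
print(n_rows, n_cols, demo.shape)
```

In this sketch `demo.shape` is simply `(n_rows, n_cols)`, so either form can be used.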

In [8]: # column Names

mba.columns

#in R mba$gmat
#in python mba['gmat']

...

In [9]: # Top rows

mba.head() # will give top 5 rows

...

In [10]: # mention number of rows to display

mba.head(10)

...

In [11]: # tail function shows the bottom rows

mba.tail(6)

...
In [12]: # column information dataset structure
mba.info()

...

In [13]: # Get stats on the columns

mba.describe()
#summary(mba)
#summary(mba$gmat)

...

In [14]: mba.describe().transpose()

...

In [15]: mba

# del mba['Datasrno'] removes the column with the del statement
# mba.drop(0) removes the row whose index label is 0; use mba.drop(columns=['Datasrno']) to remove a column

...
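To make the row-versus-column distinction in `drop` concrete, here is a small sketch on a hypothetical frame (the values are invented; only the column names follow the notebook):

```python
import pandas as pd

# Hypothetical data, not the real mba.csv contents.
demo = pd.DataFrame({
    "Datasrno": [1, 2, 3],
    "workex": [21, 107, 57],
    "gmat": [720, 640, 740],
})

without_row = demo.drop(0)                      # drops the ROW labelled 0
without_col = demo.drop(columns=["Datasrno"])   # drops the COLUMN

# Both calls return new frames; demo itself is unchanged.
print(without_row.index.tolist())
print(without_col.columns.tolist())
```

Unlike `del`, `drop` leaves the original frame untouched unless the result is assigned back.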

In [16]: # Rows information

mba[23:27]

...
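Slicing a frame with `[start:stop]` selects rows by position and excludes the stop position. A short sketch on a hypothetical 30-row frame (the values are invented), comparing it with the explicit `iloc` form:

```python
import pandas as pd

# Hypothetical frame with 30 rows and the default 0..29 index.
demo = pd.DataFrame({"gmat": list(range(600, 630))})

by_slice = demo[23:27]       # rows at positions 23, 24, 25, 26
by_iloc = demo.iloc[23:27]   # same selection, written explicitly

print(by_slice.index.tolist())
```

Using `iloc` states the positional intent outright, which may be easier to read once the index is no longer the default one.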

In [17]: #mba$workex

mba['workex']

...

In [18]: mba1 = mba[['workex','gmat']]

mba1

...
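The two selections above return different types: a single label gives a Series, while a list of labels gives a DataFrame. A minimal sketch with invented values:

```python
import pandas as pd

# Hypothetical data; only the column names follow the notebook.
demo = pd.DataFrame({"workex": [21, 107], "gmat": [720, 640]})

one_col = demo["workex"]             # single label -> Series
two_cols = demo[["workex", "gmat"]]  # list of labels -> DataFrame

print(type(one_col).__name__, type(two_cols).__name__)
```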

In [19]:
del mba['Datasrno']

mba

...

In [20]: #In R mean(mba)

mba.mean()

...
In [21]: mba.std()

...

In [22]: mba.describe()

...

In [1]: # in R mean(mba$gmat)

mba['gmat'].mean()

...

In [24]: mba['gmat'].median()

#mba['gmat'].mean()
#mba['gmat'].mode()
#mba['gmat'].var()
#mba['gmat'].std()
#mba['gmat'].max()
#mba['gmat'].min()
# Range = mba['gmat'].max() - mba['gmat'].min()
...

In [10]: mba['workex'].mode()

...

In [11]: mba['gmat'].var()

...

In [12]: mba['gmat'].std()

...
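One detail worth knowing about `var()` and `std()`: pandas divides by n - 1 (sample statistics) by default, whereas NumPy divides by n. A small sketch with invented scores shows the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical scores, not taken from mba.csv.
gmat = pd.Series([700, 720, 680, 740, 660])

sample_var = gmat.var()                    # pandas default: ddof=1
population_var = np.var(gmat.to_numpy())   # NumPy default: ddof=0
sample_std = gmat.std()                    # square root of sample_var

print(sample_var, population_var, sample_std)
```

Passing `ddof=0` to the pandas methods, or `ddof=1` to the NumPy functions, makes the two libraries agree.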

In [17]: max(mba['gmat'])

...

In [18]: min(mba['gmat'])
...

In [19]: gmat_range = max(mba['gmat'])-min(mba['gmat'])  # separate name avoids shadowing the built-in range()

gmat_range

...
In [13]: # In R, skewness and kurtosis come from the e1071 package, which we installed

from scipy.stats import skew

gmat_skew = skew(mba['gmat'])  # separate name so the imported skew() is not overwritten
gmat_skew
#print("skewness value of gmat:", gmat_skew)

...

In [13]: from scipy.stats import kurtosis

kurtosis(mba['gmat'])

...
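The SciPy defaults are worth spelling out, because they differ from the pandas equivalents: `skew` and `kurtosis` use the biased estimators by default, and `kurtosis` reports excess kurtosis (a normal distribution gives 0). A sketch on invented scores:

```python
import pandas as pd
from scipy.stats import kurtosis, skew

# Hypothetical scores, not taken from mba.csv.
gmat = pd.Series([600, 650, 700, 710, 720, 730, 780])

biased_skew = skew(gmat)                 # SciPy default: bias=True
adjusted_skew = gmat.skew()              # pandas: bias-corrected
matching_skew = skew(gmat, bias=False)   # agrees with the pandas value

excess_kurt = kurtosis(gmat)                  # Fisher definition: normal -> 0
pearson_kurt = kurtosis(gmat, fisher=False)   # Pearson definition: normal -> 3

print(biased_skew, adjusted_skew, excess_kurt, pearson_kurt)
```

When comparing results across tools, it helps to check which definition and which bias correction each one applies.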

In [14]: from scipy.stats import mode

mode(mba['gmat'])

...
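The pandas and SciPy modes behave differently when values tie: `Series.mode()` returns every tied value, while `scipy.stats.mode` reports only the smallest one together with its count. A sketch with invented scores (the `np.ravel` call is there only because the shape of the SciPy result has varied between SciPy versions):

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical scores in which 700 and 710 each occur twice.
gmat = pd.Series([700, 710, 700, 710, 650])

pandas_modes = gmat.mode()                 # Series holding all tied modes, sorted
scipy_mode = stats.mode(gmat.to_numpy())   # smallest mode and its count

smallest_mode = int(np.ravel(scipy_mode.mode)[0])
mode_count = int(np.ravel(scipy_mode.count)[0])

print(pandas_modes.tolist(), smallest_mode, mode_count)
```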

In [1]: from scipy import stats

In [ ]:

Categorical Analysis
In [5]: import pandas as pd
In [6]: wbcd = pd.read_csv("D:\\Course\\Python\\Datasets\\wbcd.csv")
wbcd

Out[6]:
            id  diagnosis  radius_mean  texture_mean  perimeter_mean  area_mean  smoothness_m
0     87139402          B        12.32         12.39           78.85      464.1           0.1
1      8910251          B        10.60         18.95           69.28      346.4           0.0
2       905520          B        11.04         16.83           70.92      373.2           0.1
3       868871          B        11.28         13.39           73.00      384.8           0.1
4      9012568          B        15.19         13.21           97.65      711.8           0.0
...        ...        ...          ...           ...             ...        ...
564  911320502          B        13.17         18.22           84.28      537.3           0.0
565     898677          B        10.26         14.71           66.20      321.6           0.0
566     873885          M        15.28         22.41           98.92      710.6           0.0
567     911201          B        14.53         13.98           93.86      644.2           0.1
568    9012795          M        21.37         15.10          141.30     1386.0           0.1

569 rows × 32 columns

In [3]: wbcd

del wbcd['id']

In [4]: wbcd

Out[4]:
     diagnosis  radius_mean  texture_mean  perimeter_mean  area_mean  smoothness_mean  comp
0            B        12.32         12.39           78.85      464.1          0.10280
1            B        10.60         18.95           69.28      346.4          0.09688
2            B        11.04         16.83           70.92      373.2          0.10770
3            B        11.28         13.39           73.00      384.8          0.11640
4            B        15.19         13.21           97.65      711.8          0.07963
...        ...          ...           ...             ...        ...              ...
564          B        13.17         18.22           84.28      537.3          0.07466
565          B        10.26         14.71           66.20      321.6          0.09882
566          M        15.28         22.41           98.92      710.6          0.09057
567          B        14.53         13.98           93.86      644.2          0.10990
568          M        21.37         15.10          141.30     1386.0          0.10010

569 rows × 31 columns

In [27]: wbcd['diagnosis'].value_counts()

Out[27]:
B    357
M    212
Name: diagnosis, dtype: int64

In [28]: freq = pd.crosstab(index=wbcd['diagnosis'], columns="count")  # Make a crosstab
freq

# Number of B is 357
# Number of M is 212

Out[28]: col_0 count

diagnosis

B 357

M 212

In [29]: freq/freq.sum()

# proportion of each diagnosis (multiply by 100 for a percentage)

# B = 357 / (357 + 212) = 357/569 ≈ 0.627, about 63%
# M = 212 / (357 + 212) = 212/569 ≈ 0.373, about 37%

Out[29]: col_0 count

diagnosis

B 0.627417

M 0.372583

In [ ]: # About 63% of cases have a benign diagnosis

# About 37% of cases have a malignant diagnosis
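The same counts and proportions can also be obtained without a crosstab. A minimal sketch, using a hypothetical Series built only to reproduce the counts reported above (357 B and 212 M) rather than the real wbcd.csv:

```python
import pandas as pd

# Hypothetical labels constructed to match the reported counts.
diagnosis = pd.Series(["B"] * 357 + ["M"] * 212, name="diagnosis")

counts = diagnosis.value_counts()                 # absolute frequencies
shares = diagnosis.value_counts(normalize=True)   # proportions summing to 1
percent = (shares * 100).round(1)                 # percentages, one decimal

print(counts.to_dict())
print(percent.to_dict())
```

`normalize=True` performs the division by the total, so no separate `freq.sum()` step is needed.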

In [30]: # replace function used to change the label names in the rows

wbcd['diagnosis'] = wbcd['diagnosis'].replace({"B":"Benign","M":"Malignant"})

In [31]: wbcd

...

In [7]: # To replace column names in the dataset

wbcd.rename(columns={'radius_mean':'Mean Radius'},inplace=True)

In [38]: wbcd

Out[38]:
     diagnosis  Mean Radius  texture_mean  perimeter_mean  area_mean  smoothness_mean  compactnes
0       Benign        12.32         12.39           78.85      464.1          0.10280
1       Benign        10.60         18.95           69.28      346.4          0.09688
2       Benign        11.04         16.83           70.92      373.2          0.10770
3       Benign        11.28         13.39           73.00      384.8          0.11640
4       Benign        15.19         13.21           97.65      711.8          0.07963
...        ...          ...           ...             ...        ...              ...
564     Benign        13.17         18.22           84.28      537.3          0.07466
565     Benign        10.26         14.71           66.20      321.6          0.09882
566  Malignant        15.28         22.41           98.92      710.6          0.09057
567     Benign        14.53         13.98           93.86      644.2          0.10990
568  Malignant        21.37         15.10          141.30     1386.0          0.10010

569 rows × 31 columns
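The relabel-and-rename steps can be sketched on a tiny hypothetical frame (three invented rows, not the real wbcd.csv). The sketch assigns the results back rather than relying on `inplace=True`, on the assumption that an in-place `replace` on a selected column may not update the parent frame in newer pandas versions:

```python
import pandas as pd

# Hypothetical three-row stand-in for the wbcd data.
demo = pd.DataFrame({
    "diagnosis": ["B", "M", "B"],
    "radius_mean": [12.32, 15.28, 10.60],
})

# Map the short codes to readable labels and assign the result back.
demo["diagnosis"] = demo["diagnosis"].replace({"B": "Benign", "M": "Malignant"})

# Rename one column; rename returns a new frame unless inplace=True is used.
demo = demo.rename(columns={"radius_mean": "Mean Radius"})

print(demo)
```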

In [ ]:

