Notebooks pp2

The document describes cleaning and preparing a dataset for analysis. It loads necessary packages and reads in a CSV file. It then checks the shape of the data, looks at the top rows, checks the data types of each parameter, fills in any null values, rechecks for null values, and checks for duplicate values. The goal is to clean the raw data file and produce a final cleaned dataset ready for analysis.

notebooks-pp2

March 29, 2023

[1]: #@title
#Read the data set, clean the data and prepare the final dataset to be used for analysis.

import numpy as np #Loading necessary packages


import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
from PIL import Image
import io
from google.colab import files

[2]: uploaded = files.upload()

for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))

<IPython.core.display.HTML object>
Saving data_faults_dup.csv to data_faults_dup.csv

User uploaded file "data_faults_dup.csv" with length 11324465 bytes

[3]: BB = pd.read_csv('data_faults_dup.csv')
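
The notebook imports `io` but reads the file back from the working directory. As a minimal sketch, the uploaded bytes can also be parsed directly, assuming (as in Colab) that `uploaded` maps each file name to its raw bytes. The helper name `read_uploaded_csv` and the two-row sample are hypothetical and only stand in for the real upload.

```python
import io

import pandas as pd


def read_uploaded_csv(uploaded, name):
    """Parse one uploaded CSV from a {filename: bytes} mapping."""
    return pd.read_csv(io.BytesIO(uploaded[name]))


# Stand-in for the dict returned by files.upload(); made-up bytes, not the real file
fake_upload = {"demo.csv": b"Vs1,state\n22.21,0\n9.25,57\n"}
demo = read_uploaded_csv(fake_upload, "demo.csv")
print(demo.shape)
```

In the notebook itself the call would be `read_uploaded_csv(uploaded, 'data_faults_dup.csv')`.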

Shape of the training data set:


[4]: BB.shape #Finding the no.of.entries

[4]: (34000, 31)

First 5 entries of the dataset:


[5]: BB.head() #Listing the top data

[5]: Vs1 Vs2 Vs3 Is1 Is2 Is3 Ipv \
0 22.213547 22.213547 22.213547 1.542505 1.542505 1.542505 4.622393
1 22.238960 22.238960 22.238960 1.542353 1.542353 1.542353 4.621720
2 22.294014 22.294014 22.294014 1.541992 1.541992 1.541992 4.620301
3 22.294085 22.294085 22.294085 1.541991 1.541991 1.541991 4.620299
4 22.299598 22.299598 22.299598 1.541955 1.541955 1.541955 4.620160

Vpv Vbdc Ibdc … Vb Vc Ia \


0 22.213547 28.538426 -8.364281 … 14.231665 28.514607 0.118882
1 22.238960 28.515030 -7.634220 … 14.219998 28.491230 0.118785
2 22.294014 28.474557 -6.351888 … 14.199815 28.450791 0.118616
3 22.294085 28.474513 -6.350457 … 14.199792 28.450747 0.118616
4 22.299598 28.471093 -6.240673 … 14.198087 28.447330 0.118602

Ib Ic Ivsiin Vab Vbc Vca state


0 -0.000142 -0.118740 0.119453 14.265850 -0.017092 -14.248758 0
1 -0.000142 -0.118642 0.119355 14.254154 -0.017078 -14.237076 0
2 -0.000142 -0.118474 0.119185 14.233923 -0.017054 -14.216869 0
3 -0.000142 -0.118474 0.119185 14.233900 -0.017054 -14.216846 0
4 -0.000142 -0.118459 0.119171 14.232191 -0.017052 -14.215139 0

[5 rows x 31 columns]

Information on data types of each parameter of the dataset:


[6]: BB.info() #datatype and parameter information

<class 'pandas.core.frame.DataFrame'>

RangeIndex: 34000 entries, 0 to 33999

Data columns (total 31 columns):

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 Vs1 34000 non-null float64

1 Vs2 34000 non-null float64

2 Vs3 34000 non-null float64

3 Is1 34000 non-null float64

4 Is2 34000 non-null float64

5 Is3 34000 non-null float64

6 Ipv 34000 non-null float64

7 Vpv 34000 non-null float64

8 Vbdc 34000 non-null float64

9 Ibdc 34000 non-null float64

10 Vs2a 34000 non-null float64

11 Is2a 34000 non-null float64

12 Vs2b 34000 non-null float64

13 Is2b 34000 non-null float64

14 Iin 34000 non-null float64

15 Iout 34000 non-null float64

16 Vin 34000 non-null float64

17 Vout 34000 non-null float64

18 Iswitch1 34000 non-null float64

19 Vswitch1 34000 non-null float64

20 Va 34000 non-null float64

21 Vb 34000 non-null float64

22 Vc 34000 non-null float64

23 Ia 34000 non-null float64

24 Ib 34000 non-null float64

25 Ic 34000 non-null float64

26 Ivsiin 34000 non-null float64

27 Vab 34000 non-null float64

28 Vbc 34000 non-null float64

29 Vca 34000 non-null float64

30 state 34000 non-null int64

dtypes: float64(30), int64(1)

memory usage: 8.0 MB

[7]: BB.isna().sum() #Check for null values


BB_new = BB.fillna("") #Replace any null values with an empty string (a no-op here, since BB.info() reports no nulls)
print(BB_new)

Vs1 Vs2 Vs3 Is1 Is2 Is3 \

0 22.213547 22.213547 22.213547 1.542505 1.542505 1.542505

1 22.238960 22.238960 22.238960 1.542353 1.542353 1.542353

2 22.294014 22.294014 22.294014 1.541992 1.541992 1.541992

3 22.294085 22.294085 22.294085 1.541991 1.541991 1.541991

4 22.299598 22.299598 22.299598 1.541955 1.541955 1.541955

… … … … … … …

33995 9.246019 9.246019 9.246019 1.612734 1.612734 1.612734

33996 9.246019 9.246019 9.246019 1.612734 1.612734 1.612734

33997 9.246021 9.246021 9.246021 1.612734 1.612734 1.612734

33998 9.246038 9.246038 9.246038 1.612734 1.612734 1.612734

33999 9.246299 9.246299 9.246299 1.612733 1.612733 1.612733

Ipv Vpv Vbdc Ibdc … Vb Vc \

0 4.622393 22.213547 28.538426 -8.364281 … 14.231665 28.514607

1 4.621720 22.238960 28.515030 -7.634220 … 14.219998 28.491230

2 4.620301 22.294014 28.474557 -6.351888 … 14.199815 28.450791

3 4.620299 22.294085 28.474513 -6.350457 … 14.199792 28.450747

4 4.620160 22.299598 28.471093 -6.240673 … 14.198087 28.447330

… … … … … … … …

33995 4.838129 9.246019 8.358975 45.522989 … 0.000000 0.000000

33996 4.838129 9.246019 8.358930 45.536640 … 0.000000 0.000000

33997 4.838133 9.246021 8.354621 46.792753 … 0.000000 0.000000

33998 4.838144 9.246038 8.346060 48.067163 … 0.000000 0.000000

33999 4.838144 9.246299 8.274699 46.790538 … 0.000000 0.000000

Ia Ib Ic Ivsiin Vab Vbc \

0 0.118882 -0.000142 -0.118740 0.119453 14.265850 -0.017092

1 0.118785 -0.000142 -0.118642 0.119355 14.254154 -0.017078

2 0.118616 -0.000142 -0.118474 0.119185 14.233923 -0.017054

3 0.118616 -0.000142 -0.118474 0.119185 14.233900 -0.017054

4 0.118602 -0.000142 -0.118459 0.119171 14.232191 -0.017052

… … … … … … …

33995 -27.863249 -27.863249 55.726498 55.726637 0.000000 0.000000

33996 -27.863098 -27.863098 55.726197 55.726336 0.000000 0.000000

33997 -27.848738 -27.848738 55.697476 55.697616 0.000000 0.000000

33998 -27.820200 -27.820200 55.640400 55.640539 0.000000 0.000000

33999 -27.582330 -27.582330 55.164659 55.164797 0.000000 0.000000

Vca state

0 -14.248758 0

1 -14.237076 0

2 -14.216869 0

3 -14.216846 0

4 -14.215139 0

… … …

33995 0.000000 57

33996 0.000000 57

33997 0.000000 57

33998 0.000000 57

33999 0.000000 57

[34000 rows x 31 columns]
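
Filling with an empty string is harmless here only because no values are missing. The sketch below, on a made-up three-row frame, shows what would happen if a numeric column did contain a null: the string fill turns the column into `object` dtype, while a numeric fill keeps it as `float64`. The median fill is an illustrative alternative, not something the notebook does.

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing value (the real dataset has none)
demo = pd.DataFrame({"Vs1": [1.0, np.nan, 3.0], "state": [0, 0, 57]})

as_text = demo.fillna("")                                # same call as in the cell above
as_number = demo.fillna(demo.median(numeric_only=True))  # hypothetical numeric alternative

print(as_text.dtypes["Vs1"])    # column now mixes floats and a string
print(as_number.dtypes["Vs1"])  # column stays numeric
```

An `object` column would later break numeric steps such as `describe()` or `corr()`, which is why a numeric fill (or dropping the rows) is usually preferable for sensor data.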


Null values in the dataset:
[8]: BB_new.isna().sum() #ReCheck for null values

[8]: Vs1 0
Vs2 0
Vs3 0
Is1 0
Is2 0
Is3 0
Ipv 0
Vpv 0
Vbdc 0
Ibdc 0
Vs2a 0
Is2a 0
Vs2b 0
Is2b 0
Iin 0
Iout 0
Vin 0
Vout 0
Iswitch1 0
Vswitch1 0
Va 0
Vb 0
Vc 0

Ia 0
Ib 0
Ic 0
Ivsiin 0
Vab 0
Vbc 0
Vca 0
state 0
dtype: int64

[9]: dupes = BB_new.duplicated() #Check for duplicate values


sum(dupes)

[9]: 87
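
The check above counts 87 duplicated rows, but the notebook never removes them. A minimal sketch of that step follows, using a made-up three-row frame; the helper name `drop_exact_duplicates` is hypothetical.

```python
import pandas as pd


def drop_exact_duplicates(df):
    """Return (frame without repeated rows, number of rows that were dropped)."""
    n_dupes = int(df.duplicated().sum())
    cleaned = df.drop_duplicates(keep="first").reset_index(drop=True)
    return cleaned, n_dupes


# Made-up data: the second row repeats the first
demo = pd.DataFrame({"Vs1": [22.2, 22.2, 9.2], "state": [0, 0, 57]})
cleaned, n_dupes = drop_exact_duplicates(demo)
print(n_dupes, len(cleaned))
```

Applied to `BB_new`, this would be expected to leave 34000 - 87 rows, since `duplicated()` marks every repeat after the first occurrence.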

Statistical Information on the parameter values:


[10]: BB_new.describe(include="all")

[10]: Vs1 Vs2 Vs3 Is1 Is2 \


count 34000.000000 34000.000000 34000.000000 34000.000000 3.400000e+04
mean 24.424958 24.424958 24.424958 0.712736 9.921109e-01
std 15.471733 15.471733 15.471733 1.060335 5.777355e-01
min -0.386826 -0.386826 -0.386826 -3.141043 -1.510000e-07
25% 9.028140 9.028140 9.028140 0.427885 5.257975e-01
50% 26.156690 26.156690 26.156690 0.646582 7.818494e-01
75% 39.978182 39.978182 39.978182 1.568042 1.580762e+00
max 40.974575 40.974575 40.974575 1.660987 1.660987e+00

Is3 Ipv Vpv Vbdc Ibdc \


count 3.400000e+04 34000.000000 34000.000000 34000.000000 34000.000000
mean 9.947535e-01 2.538181 24.424958 80.014397 8.735896
std 5.800126e-01 1.833794 15.471733 52.793676 30.544479
min -1.510000e-07 -0.000045 -0.386826 0.000000 -20.294517
25% 5.258062e-01 1.282527 9.028140 24.215171 0.140394
50% 7.818491e-01 1.813026 26.156690 109.874332 0.611993
75% 1.580762e+00 4.644787 39.978182 112.410481 1.980305
max 1.660987e+00 4.984099 40.974575 228.253079 532.031976

… Vb Vc Ia Ib \
count … 34000.000000 34000.000000 34000.000000 34000.000000
mean … -43.323263 -63.969280 -2.512685 -0.172071
std … 51.962692 52.912132 8.563583 11.691416
min … -117.194296 -117.200626 -56.682101 -160.194876
25% … -109.794774 -111.401093 -0.486577 -0.305050
50% … -18.680238 -106.619207 -0.304989 -0.041296
75% … 0.000000 -7.674458 -0.037460 0.000583

max … 58.462856 57.679924 70.442792 31.484645

Ic Ivsiin Vab Vbc Vca \


count 34000.000000 34000.000000 34000.000000 3.400000e+04 34000.000000
mean 2.721836 5.363582 -26.605957 -5.959940e+00 37.363323
std 11.955415 23.163806 27.562749 2.885601e+01 33.828264
min -0.320444 0.000000 -78.045962 -5.863195e+01 -38.453282
25% 0.079161 0.148836 -38.955613 -3.659622e+01 2.581737
50% 0.311909 0.613157 -34.516884 -5.750000e-08 36.592932
75% 0.610209 0.627341 0.000000 1.099240e-04 73.194801
max 160.328261 561.948902 19.692111 1.099769e+02 110.156129

state
count 34000.000000
mean 27.852941
std 17.215384
min 0.000000
25% 16.000000
50% 26.500000
75% 46.000000
max 57.000000

[8 rows x 31 columns]

[11]: BB_new.tail() #Listing the last data

[11]: Vs1 Vs2 Vs3 Is1 Is2 Is3 Ipv \


33995 9.246019 9.246019 9.246019 1.612734 1.612734 1.612734 4.838129
33996 9.246019 9.246019 9.246019 1.612734 1.612734 1.612734 4.838129
33997 9.246021 9.246021 9.246021 1.612734 1.612734 1.612734 4.838133
33998 9.246038 9.246038 9.246038 1.612734 1.612734 1.612734 4.838144
33999 9.246299 9.246299 9.246299 1.612733 1.612733 1.612733 4.838144

Vpv Vbdc Ibdc … Vb Vc Ia Ib \


33995 9.246019 8.358975 45.522989 … 0.0 0.0 -27.863249 -27.863249
33996 9.246019 8.358930 45.536640 … 0.0 0.0 -27.863098 -27.863098
33997 9.246021 8.354621 46.792753 … 0.0 0.0 -27.848738 -27.848738
33998 9.246038 8.346060 48.067163 … 0.0 0.0 -27.820200 -27.820200
33999 9.246299 8.274699 46.790538 … 0.0 0.0 -27.582330 -27.582330

Ic Ivsiin Vab Vbc Vca state


33995 55.726498 55.726637 0.0 0.0 0.0 57
33996 55.726197 55.726336 0.0 0.0 0.0 57
33997 55.697476 55.697616 0.0 0.0 0.0 57
33998 55.640400 55.640539 0.0 0.0 0.0 57
33999 55.164659 55.164797 0.0 0.0 0.0 57

[5 rows x 31 columns]

The last five rows all belong to state 57. In these rows the VSI phase voltages Vb and Vc and the
line voltages Vab, Vbc and Vca are exactly zero.
[12]: BB_new.tail()

[12]: Vs1 Vs2 Vs3 Is1 Is2 Is3 Ipv \


33995 9.246019 9.246019 9.246019 1.612734 1.612734 1.612734 4.838129
33996 9.246019 9.246019 9.246019 1.612734 1.612734 1.612734 4.838129
33997 9.246021 9.246021 9.246021 1.612734 1.612734 1.612734 4.838133
33998 9.246038 9.246038 9.246038 1.612734 1.612734 1.612734 4.838144
33999 9.246299 9.246299 9.246299 1.612733 1.612733 1.612733 4.838144

Vpv Vbdc Ibdc … Vb Vc Ia Ib \


33995 9.246019 8.358975 45.522989 … 0.0 0.0 -27.863249 -27.863249
33996 9.246019 8.358930 45.536640 … 0.0 0.0 -27.863098 -27.863098
33997 9.246021 8.354621 46.792753 … 0.0 0.0 -27.848738 -27.848738
33998 9.246038 8.346060 48.067163 … 0.0 0.0 -27.820200 -27.820200
33999 9.246299 8.274699 46.790538 … 0.0 0.0 -27.582330 -27.582330

Ic Ivsiin Vab Vbc Vca state


33995 55.726498 55.726637 0.0 0.0 0.0 57
33996 55.726197 55.726336 0.0 0.0 0.0 57
33997 55.697476 55.697616 0.0 0.0 0.0 57
33998 55.640400 55.640539 0.0 0.0 0.0 57
33999 55.164659 55.164797 0.0 0.0 0.0 57

[5 rows x 31 columns]

[13]: #Perform detailed statistical analysis and EDA using univariate, bi-variate and
#multivariate EDA techniques to get data driven insights on recommending
#which teams they can approach which will be a deal win for them. Also as a
#data and statistics expert you have to develop a detailed performance report
#using this data. [10 Marks]

#Hint: Use statistical techniques and visualisation techniques to come up with
#useful metrics and reporting. Find out the best performing team, oldest
#team, team with highest goals, team with lowest performance etc. and many
#more. These are just random examples. Please use your best analytical
#approach to build this report. You can mix match columns to create new ones
#which can be used for better analysis. Create your own features if required.
#Be highly experimental and analytical here to find hidden patterns. Use
#graphical interactive libraries to enable you to publish interactive plots
#in python.

#Univariate Analysis = One column of data is considered at a time.

BB_new.describe(include="all")

[13]: Vs1 Vs2 Vs3 Is1 Is2 \


count 34000.000000 34000.000000 34000.000000 34000.000000 3.400000e+04
mean 24.424958 24.424958 24.424958 0.712736 9.921109e-01
std 15.471733 15.471733 15.471733 1.060335 5.777355e-01
min -0.386826 -0.386826 -0.386826 -3.141043 -1.510000e-07
25% 9.028140 9.028140 9.028140 0.427885 5.257975e-01
50% 26.156690 26.156690 26.156690 0.646582 7.818494e-01
75% 39.978182 39.978182 39.978182 1.568042 1.580762e+00
max 40.974575 40.974575 40.974575 1.660987 1.660987e+00

Is3 Ipv Vpv Vbdc Ibdc \


count 3.400000e+04 34000.000000 34000.000000 34000.000000 34000.000000
mean 9.947535e-01 2.538181 24.424958 80.014397 8.735896
std 5.800126e-01 1.833794 15.471733 52.793676 30.544479
min -1.510000e-07 -0.000045 -0.386826 0.000000 -20.294517
25% 5.258062e-01 1.282527 9.028140 24.215171 0.140394
50% 7.818491e-01 1.813026 26.156690 109.874332 0.611993
75% 1.580762e+00 4.644787 39.978182 112.410481 1.980305
max 1.660987e+00 4.984099 40.974575 228.253079 532.031976

… Vb Vc Ia Ib \
count … 34000.000000 34000.000000 34000.000000 34000.000000
mean … -43.323263 -63.969280 -2.512685 -0.172071
std … 51.962692 52.912132 8.563583 11.691416
min … -117.194296 -117.200626 -56.682101 -160.194876
25% … -109.794774 -111.401093 -0.486577 -0.305050
50% … -18.680238 -106.619207 -0.304989 -0.041296
75% … 0.000000 -7.674458 -0.037460 0.000583
max … 58.462856 57.679924 70.442792 31.484645

Ic Ivsiin Vab Vbc Vca \


count 34000.000000 34000.000000 34000.000000 3.400000e+04 34000.000000
mean 2.721836 5.363582 -26.605957 -5.959940e+00 37.363323
std 11.955415 23.163806 27.562749 2.885601e+01 33.828264
min -0.320444 0.000000 -78.045962 -5.863195e+01 -38.453282
25% 0.079161 0.148836 -38.955613 -3.659622e+01 2.581737
50% 0.311909 0.613157 -34.516884 -5.750000e-08 36.592932
75% 0.610209 0.627341 0.000000 1.099240e-04 73.194801
max 160.328261 561.948902 19.692111 1.099769e+02 110.156129

state
count 34000.000000
mean 27.852941
std 17.215384
min 0.000000

25% 16.000000
50% 26.500000
75% 46.000000
max 57.000000

[8 rows x 31 columns]

[14]: plt.hist(BB_new.Vs1,density = True,color='y', label='Vs1 - PV') #take all params


plt.legend()
plt.show()
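
The `#take all params` note suggests repeating this histogram for every column. One possible sketch is a grid of density histograms drawn in a loop. The helper name `plot_all_histograms`, the grid size and the random demo data are assumptions for illustration only.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_all_histograms(df, ncols=3, bins=30):
    """Draw one density histogram per column of df on a shared grid."""
    nrows = math.ceil(len(df.columns) / ncols)
    fig, axes = plt.subplots(nrows, ncols, figsize=(5 * ncols, 3 * nrows), squeeze=False)
    flat = axes.ravel()
    for ax, col in zip(flat, df.columns):
        ax.hist(df[col], bins=bins, density=True, color="y")
        ax.set_title(col)
    for ax in flat[len(df.columns):]:
        ax.set_visible(False)  # hide unused cells of the grid
    fig.tight_layout()
    return fig


# Random stand-in data; in the notebook the argument would be BB_new
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 4)), columns=["Vs1", "Is1", "Vbdc", "Ibdc"])
fig = plot_all_histograms(demo)
```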

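The histogram shows the distribution of Vs1. The summary statistics above give a mean (24.42) below
the median (26.16) and an upper quartile (39.98) close to the maximum (40.97), which suggests that
the values are concentrated toward the upper end of the range, so the distribution looks negatively
rather than positively skewed.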
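
A skewness statement like the one above can be checked numerically rather than read off the plot. The sketch below uses two made-up series to show the sign convention of pandas' `Series.skew()`: a long right tail gives a positive value and a long left tail a negative one.

```python
import pandas as pd

# Made-up samples chosen only to illustrate the sign of the statistic
right_tailed = pd.Series([1, 1, 1, 2, 10])      # a few large values stretch the right tail
left_tailed = pd.Series([-10, -2, -1, -1, -1])  # mirror image: long left tail

skew_right = right_tailed.skew()
skew_left = left_tailed.skew()
print(skew_right > 0, skew_left < 0)
```

In the notebook, `BB_new.Vs1.skew()` would give the corresponding value for the first PV string voltage.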
Photovoltaic Array Distribution Plot
[15]: sns.distplot(BB_new.Vs1, color='r', axlabel = 'PV String Voltage 1')

[15]: <Axes: xlabel='PV String Voltage 1', ylabel='Density'>

[16]: sns.distplot(BB_new.Vs2, color='y', axlabel = 'PV String Voltage 2')

[16]: <Axes: xlabel='PV String Voltage 2', ylabel='Density'>

[17]: sns.distplot(BB_new.Vs3, color='g', axlabel = 'PV String Voltage 3')

[17]: <Axes: xlabel='PV String Voltage 3', ylabel='Density'>

[18]: sns.distplot(BB_new.Is1, axlabel = 'PV - String 1 Current')

[18]: <Axes: xlabel='PV - String 1 Current', ylabel='Density'>

[19]: sns.distplot(BB_new.Is2, axlabel = 'PV - String 2 Current')

[19]: <Axes: xlabel='PV - String 2 Current', ylabel='Density'>

[20]: sns.distplot(BB_new.Is3, axlabel = 'PV - String 3 Current')

[20]: <Axes: xlabel='PV - String 3 Current', ylabel='Density'>

Battery and Bidirectional Converter Distribution Plot
[21]: sns.distplot(BB_new.Vbdc, axlabel = 'Battery & BDC - Output Voltage')

[21]: <Axes: xlabel='Battery & BDC - Output Voltage', ylabel='Density'>

[22]: sns.distplot(BB_new.Ibdc, axlabel = 'Battery & BDC - Output Current')

[22]: <Axes: xlabel='Battery & BDC - Output Current', ylabel='Density'>

[23]: sns.distplot(BB_new.Vs2a, axlabel = 'Battery & BDC - Switch 1 Voltage')

[23]: <Axes: xlabel='Battery & BDC - Switch 1 Voltage', ylabel='Density'>

[24]: sns.distplot(BB_new.Is2a, axlabel = 'Battery & BDC - Switch 1 Current')

[24]: <Axes: xlabel='Battery & BDC - Switch 1 Current', ylabel='Density'>

[25]: sns.distplot(BB_new.Vs2b, axlabel = 'Battery & BDC - Switch 2 Voltage')

[25]: <Axes: xlabel='Battery & BDC - Switch 2 Voltage', ylabel='Density'>

[26]: sns.distplot(BB_new.Is2b, axlabel = 'Battery & BDC - Switch 2 Current')

[26]: <Axes: xlabel='Battery & BDC - Switch 2 Current', ylabel='Density'>

Boost Converter Distribution Plot
[27]: sns.distplot(BB_new.Iin, axlabel = 'Boost Converter - Input Current')

[27]: <Axes: xlabel='Boost Converter - Input Current', ylabel='Density'>

[28]: sns.distplot(BB_new.Iout, axlabel = 'Boost Converter - Output Current')

[28]: <Axes: xlabel='Boost Converter - Output Current', ylabel='Density'>

[29]: sns.distplot(BB_new.Vin, axlabel = 'Boost Converter - Input Voltage')

[29]: <Axes: xlabel='Boost Converter - Input Voltage', ylabel='Density'>

[30]: sns.distplot(BB_new.Vout, axlabel = 'Boost Converter - Output Voltage')

[30]: <Axes: xlabel='Boost Converter - Output Voltage', ylabel='Density'>

[31]: sns.distplot(BB_new.Iswitch1, axlabel = 'Boost Converter - Switch Current')

[31]: <Axes: xlabel='Boost Converter - Switch Current', ylabel='Density'>

[32]: sns.distplot(BB_new.Vswitch1, axlabel = 'Boost Converter - Switch Voltage')

[32]: <Axes: xlabel='Boost Converter - Switch Voltage', ylabel='Density'>

VSI Distribution Plot
[33]: sns.distplot(BB_new.Va, label = 'VSI - Phase A Voltage')
sns.distplot(BB_new.Vb, label = 'VSI - Phase B Voltage')
sns.distplot(BB_new.Vc, axlabel = 'VSI - Phase Voltages')

[33]: <Axes: xlabel='VSI - Phase Voltages', ylabel='Density'>

[34]: sns.distplot(BB_new.Ia, label = 'VSI - Phase A Current')
sns.distplot(BB_new.Ib, label = 'VSI - Phase B Current')
sns.distplot(BB_new.Ic, axlabel = 'VSI - Phase Currents')

[34]: <Axes: xlabel='VSI - Phase Currents', ylabel='Density'>

[35]: sns.distplot(BB_new.Ivsiin, axlabel = 'VSI - Input Current')

[35]: <Axes: xlabel='VSI - Input Current', ylabel='Density'>

[36]: sns.distplot(BB_new.Vab, label = 'VSI - Line AB Voltage')
sns.distplot(BB_new.Vbc, label = 'VSI - Line BC Voltage')
sns.distplot(BB_new.Vca, axlabel = 'VSI - Line Voltages')

[36]: <Axes: xlabel='VSI - Line Voltages', ylabel='Density'>

[37]: #Multivariate Analysis = Correlation among all data given

BB_new.columns #List the column names

[37]: Index(['Vs1', 'Vs2', 'Vs3', 'Is1', 'Is2', 'Is3', 'Ipv', 'Vpv', 'Vbdc', 'Ibdc',
'Vs2a', 'Is2a', 'Vs2b', 'Is2b', 'Iin', 'Iout', 'Vin', 'Vout',
'Iswitch1', 'Vswitch1', 'Va', 'Vb', 'Vc', 'Ia', 'Ib', 'Ic', 'Ivsiin',
'Vab', 'Vbc', 'Vca', 'state'],
dtype='object')

[38]: #BB_new[['Vs1', 'Vs2', 'Vs3', 'Is1', 'Is2', 'Is3', 'Ipv', 'Vpv', 'Vbdc', 'Ibdc',
#        'Vs2a', 'Is2a', 'Vs2b', 'Is2b', 'Iin', 'Iout', 'Vin', 'Vout',
#        'Iswitch1', 'Vswitch1', 'Va', 'Vb', 'Vc', 'Ia', 'Ib', 'Ic', 'Ivsiin',
#        'Vab', 'Vbc', 'Vca', 'state']] = BB_new[['Vs1', 'Vs2', 'Vs3', 'Is1', 'Is2', 'Is3',
#        'Ipv', 'Vpv', 'Vbdc', 'Ibdc', 'Vs2a', 'Is2a', 'Vs2b', 'Is2b', 'Iin', 'Iout',
#        'Vin', 'Vout', 'Iswitch1', 'Vswitch1', 'Va', 'Vb', 'Vc', 'Ia', 'Ib', 'Ic',
#        'Ivsiin', 'Vab', 'Vbc', 'Vca', 'state']].apply(pd.to_numeric)

#Column datatype change

[39]: BB_new.corr()

[39]: Vs1 Vs2 Vs3 Is1 Is2 Is3 \
Vs1 1.000000 1.000000 1.000000 -0.222136 -0.425274 -0.421767
Vs2 1.000000 1.000000 1.000000 -0.222136 -0.425274 -0.421767
Vs3 1.000000 1.000000 1.000000 -0.222136 -0.425274 -0.421767
Is1 -0.222136 -0.222136 -0.222136 1.000000 0.331634 0.316308
Is2 -0.425274 -0.425274 -0.425274 0.331634 1.000000 0.999650
Is3 -0.421767 -0.421767 -0.421767 0.316308 0.999650 1.000000
Ipv -0.256572 -0.256572 -0.256572 0.715287 0.741474 0.733328
Vpv 1.000000 1.000000 1.000000 -0.222136 -0.425274 -0.421767
Vbdc 0.570540 0.570540 0.570540 -0.568707 -0.775157 -0.769451
Ibdc -0.300296 -0.300296 -0.300296 0.234425 0.299579 0.297140
Vs2a 1.000000 1.000000 1.000000 -0.222136 -0.425274 -0.421767
Is2a 0.395968 0.395968 0.395968 -0.468229 -0.594697 -0.589115
Vs2b -0.216776 -0.216776 -0.216776 0.142318 0.218698 0.217549
Is2b 0.008733 0.008733 0.008733 -0.000695 -0.005412 -0.005432
Iin -0.256572 -0.256572 -0.256572 0.715287 0.741474 0.733328
Iout -0.063080 -0.063080 -0.063080 0.088527 0.094359 0.093522
Vin 1.000000 1.000000 1.000000 -0.222136 -0.425274 -0.421767
Vout 0.570540 0.570540 0.570540 -0.568707 -0.775157 -0.769451
Iswitch1 0.386823 0.386823 0.386823 0.027771 -0.146941 -0.148303
Vswitch1 0.321114 0.321114 0.321114 -0.177838 -0.108459 -0.104098
Va -0.062936 -0.062936 -0.062936 0.208144 0.285212 0.283321
Vb -0.288099 -0.288099 -0.288099 0.405080 0.403851 0.398759
Vc -0.335123 -0.335123 -0.335123 0.570426 0.633133 0.626564
Ia 0.265307 0.265307 0.265307 -0.214842 -0.274168 -0.271961
Ib -0.026940 -0.026940 -0.026940 -0.003209 -0.004562 -0.004536
Ic -0.160642 -0.160642 -0.160642 0.157019 0.199329 0.197716
Ivsiin -0.140149 -0.140149 -0.140149 0.168650 0.210296 0.208523
Vab -0.197408 -0.197408 -0.197408 0.478764 0.534482 0.529097
Vbc -0.092856 -0.092856 -0.092856 0.140790 0.076815 0.074548
Vca 0.363334 0.363334 0.363334 -0.502137 -0.554820 -0.548933
state -0.284426 -0.284426 -0.284426 0.506898 0.593021 0.588150

Ipv Vpv Vbdc Ibdc … Vb Vc \


Vs1 -0.256572 1.000000 0.570540 -0.300296 … -0.288099 -0.335123
Vs2 -0.256572 1.000000 0.570540 -0.300296 … -0.288099 -0.335123
Vs3 -0.256572 1.000000 0.570540 -0.300296 … -0.288099 -0.335123
Is1 0.715287 -0.222136 -0.568707 0.234425 … 0.405080 0.570426
Is2 0.741474 -0.425274 -0.775157 0.299579 … 0.403851 0.633133
Is3 0.733328 -0.421767 -0.769451 0.297140 … 0.398759 0.626564
Ipv 1.000000 -0.256572 -0.865313 0.347413 … 0.512197 0.802789
Vpv -0.256572 1.000000 0.570540 -0.300296 … -0.288099 -0.335123
Vbdc -0.865313 0.570540 1.000000 -0.381123 … -0.412217 -0.634540
Ibdc 0.347413 -0.300296 -0.381123 1.000000 … 0.249969 0.348687
Vs2a -0.256572 1.000000 0.570540 -0.300296 … -0.288099 -0.335123
Is2a -0.693771 0.395968 0.775670 -0.278868 … -0.334989 -0.490237
Vs2b 0.221312 -0.216776 -0.254039 0.103959 … 0.175929 0.200330

Is2b -0.003111 0.008733 0.005434 -0.002380 … -0.011376 -0.007848
Iin 1.000000 -0.256572 -0.865313 0.347413 … 0.512197 0.802789
Iout 0.127235 -0.063080 -0.133530 0.587055 … 0.090308 0.122731
Vin -0.256572 1.000000 0.570540 -0.300296 … -0.288099 -0.335123
Vout -0.865313 0.570540 1.000000 -0.381123 … -0.412217 -0.634540
Iswitch1 -0.023391 0.386823 0.162562 -0.043608 … -0.075535 -0.108376
Vswitch1 -0.120710 0.321114 0.230018 -0.132886 … -0.096395 -0.095113
Va 0.361486 -0.062936 -0.277002 0.124446 … -0.400395 0.436096
Vb 0.512197 -0.288099 -0.412217 0.249969 … 1.000000 0.650007
Vc 0.802789 -0.335123 -0.634540 0.348687 … 0.650007 1.000000
Ia -0.317983 0.265307 0.357445 -0.209785 … -0.207230 -0.290866
Ib -0.006048 -0.026940 -0.004089 -0.153685 … 0.000152 -0.006420
Ic 0.232994 -0.160642 -0.250278 0.299741 … 0.147904 0.211933
Ivsiin 0.248086 -0.140149 -0.258811 0.623958 … 0.198864 0.268734
Vab 0.698620 -0.197408 -0.527270 0.272085 … 0.196381 0.827976
Vbc 0.117611 -0.092856 -0.082412 0.070651 … 0.796446 0.127712
Vca -0.686449 0.363334 0.562898 -0.323704 … -0.856693 -0.889517
state 0.731417 -0.284426 -0.644080 0.162105 … 0.377284 0.604051

Ia Ib Ic Ivsiin Vab Vbc \


Vs1 0.265307 -0.026940 -0.160642 -0.140149 -0.197408 -0.092856
Vs2 0.265307 -0.026940 -0.160642 -0.140149 -0.197408 -0.092856
Vs3 0.265307 -0.026940 -0.160642 -0.140149 -0.197408 -0.092856
Is1 -0.214842 -0.003209 0.157019 0.168650 0.478764 0.140790
Is2 -0.274168 -0.004562 0.199329 0.210296 0.534482 0.076815
Is3 -0.271961 -0.004536 0.197716 0.208523 0.529097 0.074548
Ipv -0.317983 -0.006048 0.232994 0.248086 0.698620 0.117611
Vpv 0.265307 -0.026940 -0.160642 -0.140149 -0.197408 -0.092856
Vbdc 0.357445 -0.004089 -0.250278 -0.258811 -0.527270 -0.082412
Ibdc -0.209785 -0.153685 0.299741 0.623958 0.272085 0.070651
Vs2a 0.265307 -0.026940 -0.160642 -0.140149 -0.197408 -0.092856
Is2a 0.251037 0.005514 -0.183134 -0.197910 -0.384767 -0.071829
Vs2b -0.054963 -0.027986 0.066883 0.039317 0.148986 0.091778
Is2b 0.002337 -0.000131 -0.001573 -0.001797 -0.003409 -0.009351
Iin -0.317983 -0.006048 0.232994 0.248086 0.698620 0.117611
Iout -0.241400 -0.203688 0.372139 0.764676 0.094939 0.028261
Vin 0.265307 -0.026940 -0.160642 -0.140149 -0.197408 -0.092856
Vout 0.357445 -0.004089 -0.250278 -0.258811 -0.527270 -0.082412
Iswitch1 0.024976 -0.024471 0.008386 0.027620 -0.052555 0.012504
Vswitch1 0.109987 -0.013515 -0.064604 -0.059042 -0.045149 -0.042304
Va -0.105335 -0.007922 0.080408 0.088554 0.765844 -0.789145
Vb -0.207230 0.000152 0.147904 0.198864 0.196381 0.796446
Vc -0.290866 -0.006420 0.211933 0.268734 0.827976 0.127712
Ia 1.000000 -0.334133 -0.388628 -0.372604 -0.226065 -0.055754
Ib -0.334133 1.000000 -0.738402 -0.241991 0.002513 0.014447
Ic -0.388628 -0.738402 1.000000 0.503012 0.162465 0.032911
Ivsiin -0.372604 -0.241991 0.503012 1.000000 0.205705 0.061825

Vab -0.226065 0.002513 0.162465 0.205705 1.000000 -0.209410
Vbc -0.055754 0.014447 0.032911 0.061825 -0.209410 1.000000
Vca 0.270761 0.012090 -0.199119 -0.252732 -0.480286 -0.370384
state -0.374515 -0.054020 0.320756 0.264323 0.536627 0.084349

Vca state
Vs1 0.363334 -0.284426
Vs2 0.363334 -0.284426
Vs3 0.363334 -0.284426
Is1 -0.502137 0.506898
Is2 -0.554820 0.593021
Is3 -0.548933 0.588150
Ipv -0.686449 0.731417
Vpv 0.363334 -0.284426
Vbdc 0.562898 -0.644080
Ibdc -0.323704 0.162105
Vs2a 0.363334 -0.284426
Is2a 0.453297 -0.509955
Vs2b -0.191952 0.075907
Is2b 0.009498 -0.003494
Iin -0.686449 0.731417
Iout -0.114613 0.102628
Vin 0.363334 -0.284426
Vout 0.562898 -0.644080
Iswitch1 0.126694 -0.082309
Vswitch1 0.111984 -0.114393
Va -0.058117 0.281606
Vb -0.856693 0.377284
Vc -0.889517 0.604051
Ia 0.270761 -0.374515
Ib 0.012090 -0.054020
Ic -0.199119 0.320756
Ivsiin -0.252732 0.264323
Vab -0.480286 0.536627
Vbc -0.370384 0.084349
Vca 1.000000 -0.507585
state -0.507585 1.000000

[31 rows x 31 columns]

[40]: plt.figure(figsize=(40,20))
htmat = BB_new.corr(method='pearson')
sns.heatmap(htmat, cmap="YlGnBu", fmt='0.2f',annot=True)
plt.show();

Several groups of parameters are perfectly correlated (correlation of 1.000000): Vs1, Vs2, Vs3,
Vpv, Vs2a and Vin; Ipv and Iin; and Vbdc and Vout. Is2 and Is3 are almost perfectly correlated
(0.999650). Is2b shows practically no correlation with the columns displayed (absolute values
below 0.012). Vca has a strong negative correlation with Vb (-0.856693) and Vc (-0.889517). The
state label correlates most strongly with Ipv and Iin (0.731417) and most negatively with Vbdc
and Vout (-0.644080).
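
Reading strongly correlated pairs off a 31 x 31 heatmap is error-prone. A minimal sketch for listing them programmatically is given below; it keeps only the upper triangle of the correlation matrix so each pair appears once. The helper name `high_corr_pairs`, the 0.9 threshold and the four-row demo frame are assumptions.

```python
import numpy as np
import pandas as pd


def high_corr_pairs(df, threshold=0.9):
    """Return column pairs whose absolute Pearson correlation is at least threshold."""
    corr = df.corr(method="pearson")
    # Boolean mask for the strict upper triangle: skips the diagonal and mirrored pairs
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    pairs = pairs[pairs.abs() >= threshold]
    return pairs.sort_values(key=np.abs, ascending=False)


# Made-up data: Vpv duplicates Vs1, Ib is uncorrelated with both
demo = pd.DataFrame({
    "Vs1": [1.0, 2.0, 3.0, 4.0],
    "Vpv": [1.0, 2.0, 3.0, 4.0],
    "Ib": [1.0, -1.0, -1.0, 1.0],
})
pairs = high_corr_pairs(demo)
print(pairs)
```

Columns that come out with a correlation of exactly 1 carry the same information, so one of each pair is a natural candidate to drop before modelling.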
[59]: #Box plot of the three PV string voltages
fig = plt.figure(figsize =(20, 10))

# Creating axes instance
ax = fig.add_axes([0, 0, 1, 1])

# Creating plot on the axes created above
bp = sns.boxplot(data = BB_new[['Vs1', 'Vs2','Vs3']], ax = ax)

# show plot
plt.show()

[60]: data_4 = BB_new.Is1
data_5 = BB_new.Is2
data_6 = BB_new.Is3
data = [data_4, data_5, data_6]

fig = plt.figure(figsize =(20, 10))

# Creating axes instance


ax = fig.add_axes([0, 0, 1, 1])

# Creating plot
bp = sns.boxplot(data = BB_new[['Is1', 'Is2','Is3']])

# show plot
plt.show()

[61]: data = [BB_new.Vbdc, BB_new.Vs2a, BB_new.Vs2b]

fig = plt.figure(figsize =(20, 10))

# Creating axes instance


ax = fig.add_axes([0, 0, 1, 1])

# Creating plot
bp = sns.boxplot(data = BB_new[['Vbdc', 'Vs2a','Vs2b']])

# show plot
plt.show()

[62]: data = [BB_new.Ibdc, BB_new.Is2a, BB_new.Is2b]

fig = plt.figure(figsize =(20, 10))

# Creating axes instance


ax = fig.add_axes([0, 0, 1, 1])

# Creating plot
bp = sns.boxplot(data = BB_new[['Ibdc', 'Is2a','Is2b']])

# show plot
plt.show()

[63]: data = [BB_new.Vin, BB_new.Vout, BB_new.Vswitch1]

fig = plt.figure(figsize =(20, 10))

# Creating axes instance


ax = fig.add_axes([0, 0, 1, 1])

# Creating plot
bp = sns.boxplot(data = BB_new[['Vin', 'Vout','Vswitch1']])

# show plot
plt.show()

[64]: data = [BB_new.Iin, BB_new.Iout, BB_new.Iswitch1]

fig = plt.figure(figsize =(20, 10))

# Creating axes instance


ax = fig.add_axes([0, 0, 1, 1])

# Creating plot
bp = sns.boxplot(data = BB_new[['Iin', 'Iout','Iswitch1']])

# show plot
plt.show()

[65]: data = [BB_new.Va, BB_new.Vb, BB_new.Vc]

fig = plt.figure(figsize =(20, 10))

# Creating axes instance


ax = fig.add_axes([0, 0, 1, 1])

# Creating plot
bp = sns.boxplot(data = BB_new[['Va', 'Vb','Vc']])

# show plot
plt.show()

[66]: data = [BB_new.Ia, BB_new.Ib, BB_new.Ic]

fig = plt.figure(figsize =(20, 10))

# Creating axes instance


ax = fig.add_axes([0, 0, 1, 1])

# Creating plot
bp = sns.boxplot(data = BB_new[['Ia', 'Ib','Ic']])

# show plot
plt.show()

[68]: data = [BB_new.Vab, BB_new.Vbc, BB_new.Vca]

fig = plt.figure(figsize =(20, 10))

# Creating axes instance


ax = fig.add_axes([0, 0, 1, 1])

# Creating plot
bp = sns.boxplot(data = BB_new[['Vab', 'Vbc','Vca']])

# show plot
plt.show()

[69]: data = [BB_new.Ivsiin]

fig = plt.figure(figsize =(20, 10))

# Creating axes instance


ax = fig.add_axes([0, 0, 1, 1])

# Creating plot
bp = sns.boxplot(data = BB_new[['Ivsiin']])

# show plot
plt.show()
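
The box plot cells above repeat the same steps for each group of columns. As a sketch of a more compact form, the groups can be listed once and plotted in a loop. It uses plain matplotlib instead of seaborn, the helper name `boxplot_groups` is hypothetical, and the random data only stands in for `BB_new`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Column groups taken from the box plot cells above
GROUPS = [
    ["Vs1", "Vs2", "Vs3"], ["Is1", "Is2", "Is3"],
    ["Vbdc", "Vs2a", "Vs2b"], ["Ibdc", "Is2a", "Is2b"],
    ["Vin", "Vout", "Vswitch1"], ["Iin", "Iout", "Iswitch1"],
    ["Va", "Vb", "Vc"], ["Ia", "Ib", "Ic"],
    ["Vab", "Vbc", "Vca"], ["Ivsiin"],
]


def boxplot_groups(df, groups):
    """Create one figure per group, with one box per column in that group."""
    figs = []
    for cols in groups:
        fig, ax = plt.subplots(figsize=(10, 5))
        ax.boxplot([df[c].to_numpy() for c in cols])
        ax.set_xticks(range(1, len(cols) + 1))
        ax.set_xticklabels(cols)
        figs.append(fig)
    return figs


# Random stand-in data with the same column names
all_cols = [c for cols in GROUPS for c in cols]
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(100, len(all_cols))), columns=all_cols)
figs = boxplot_groups(demo, GROUPS)
print(len(figs))
```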

[ ]: #@title
pairplot = sns.pairplot(BB_new.sample(15000))  #pairplot creates its own figure
plt.show()

The pair plot above shows how each pair of parameters is correlated.
