Notebooks pp2
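[1]: #@title
# Read the data set, clean the data and prepare the final dataset to be used for analysis.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from google.colab import files

uploaded = files.upload()
for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))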
Saving data_faults_dup.csv to data_faults_dup.csv
[3]: BB = pd.read_csv('data_faults_dup.csv')
[5]: Vs1 Vs2 Vs3 Is1 Is2 Is3 Ipv \
0 22.213547 22.213547 22.213547 1.542505 1.542505 1.542505 4.622393
1 22.238960 22.238960 22.238960 1.542353 1.542353 1.542353 4.621720
2 22.294014 22.294014 22.294014 1.541992 1.541992 1.541992 4.620301
3 22.294085 22.294085 22.294085 1.541991 1.541991 1.541991 4.620299
4 22.299598 22.299598 22.299598 1.541955 1.541955 1.541955 4.620160
[5 rows x 31 columns]
<class 'pandas.core.frame.DataFrame'>
6 Ipv 34000 non-null float64
30 state 34000 non-null int64
… … … … … … …
… … … … … … … …
… … … … … … …
Vca state
0 -14.248758 0
1 -14.237076 0
2 -14.216869 0
3 -14.216846 0
4 -14.215139 0
… … …
33995 0.000000 57
33996 0.000000 57
33997 0.000000 57
33998 0.000000 57
33999 0.000000 57
[8]: Vs1 0
Vs2 0
Vs3 0
Is1 0
Is2 0
Is3 0
Ipv 0
Vpv 0
Vbdc 0
Ibdc 0
Vs2a 0
Is2a 0
Vs2b 0
Is2b 0
Iin 0
Iout 0
Vin 0
Vout 0
Iswitch1 0
Vswitch1 0
Va 0
Vb 0
Vc 0
Ia 0
Ib 0
Ic 0
Ivsiin 0
Vab 0
Vbc 0
Vca 0
state 0
dtype: int64
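A minimal sketch of the same missing-value check plus a count of exact duplicate rows, run on a small hypothetical frame (the values and the name `df_clean` are illustrative only). The notebook does not show whether duplicates are dropped when `BB_new` is built, so this is not a reconstruction of that step.

```python
import pandas as pd

# Hypothetical toy frame standing in for BB; column names borrowed from the data set
df = pd.DataFrame({
    "Vs1": [22.21, 22.24, 22.24, None],
    "Is1": [1.54, 1.54, 1.54, 1.53],
    "state": [0, 0, 0, 57],
})

# Missing values per column (same idea as the null check above)
null_counts = df.isnull().sum()

# Number of rows that exactly repeat an earlier row
n_duplicates = int(df.duplicated().sum())

# Drop exact duplicates and reset the index for a clean working copy
df_clean = df.drop_duplicates().reset_index(drop=True)

print(null_counts)
print("duplicate rows:", n_duplicates)
print("rows after dropping duplicates:", len(df_clean))
```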
[9]: 87
… Vb Vc Ia Ib \
count … 34000.000000 34000.000000 34000.000000 34000.000000
mean … -43.323263 -63.969280 -2.512685 -0.172071
std … 51.962692 52.912132 8.563583 11.691416
min … -117.194296 -117.200626 -56.682101 -160.194876
25% … -109.794774 -111.401093 -0.486577 -0.305050
50% … -18.680238 -106.619207 -0.304989 -0.041296
75% … 0.000000 -7.674458 -0.037460 0.000583
max … 58.462856 57.679924 70.442792 31.484645
state
count 34000.000000
mean 27.852941
std 17.215384
min 0.000000
25% 16.000000
50% 26.500000
75% 46.000000
max 57.000000
[8 rows x 31 columns]
[5 rows x 31 columns]
The missing-value check above reports 0 missing values in every one of the 31 columns, so no rows need to be dropped or imputed for missing data.
[12]: BB_new.tail()
[5 rows x 31 columns]
[13]: # Perform detailed statistical analysis and EDA using univariate, bivariate and
# multivariate techniques to get data-driven insights. Columns can be combined
# to create new features where that helps the analysis.
BB_new.describe(include="all")
… Vb Vc Ia Ib \
count … 34000.000000 34000.000000 34000.000000 34000.000000
mean … -43.323263 -63.969280 -2.512685 -0.172071
std … 51.962692 52.912132 8.563583 11.691416
min … -117.194296 -117.200626 -56.682101 -160.194876
25% … -109.794774 -111.401093 -0.486577 -0.305050
50% … -18.680238 -106.619207 -0.304989 -0.041296
75% … 0.000000 -7.674458 -0.037460 0.000583
max … 58.462856 57.679924 70.442792 31.484645
state
count 34000.000000
mean 27.852941
std 17.215384
min 0.000000
25% 16.000000
50% 26.500000
75% 46.000000
max 57.000000
[8 rows x 31 columns]
The plot shows the skewness of the data: the most frequent values lie at the low end of the range with a long tail to the right, so the distribution is positively skewed.
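To put a number on the skew rather than reading it off the histogram, pandas' `Series.skew()` returns the sample skewness; a positive value indicates a long right tail. The series below is hypothetical; on the real data, `BB_new.skew()` would give one value per column.

```python
import pandas as pd

# Hypothetical right-skewed sample: many small values, a few large ones
s = pd.Series([1, 1, 1, 2, 2, 3, 10])

# pandas returns the adjusted Fisher-Pearson skewness; > 0 means a long right tail
skewness = s.skew()
print(round(skewness, 3))

# A mean pulled above the median is another sign of positive skew
print(s.mean() > s.median())
```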
Photovoltaic Array Distribution Plot
[15]: sns.distplot(BB_new.Vs1, color='r', axlabel = 'PV String Voltage 1')
[16]: sns.distplot(BB_new.Vs2, color='y', axlabel = 'PV String Voltage 2')
[17]: sns.distplot(BB_new.Vs3, color='g', axlabel = 'PV String Voltage 3')
[18]: sns.distplot(BB_new.Is1, axlabel = 'PV - String 1 Current')
[19]: sns.distplot(BB_new.Is2, axlabel = 'PV - String 2 Current')
[20]: sns.distplot(BB_new.Is3, axlabel = 'PV - String 3 Current')
Battery and Bidirectional Converter Distribution Plot
[21]: sns.distplot(BB_new.Vbdc, axlabel = 'Battery & BDC - Output Voltage')
[22]: sns.distplot(BB_new.Ibdc, axlabel = 'Battery & BDC - Output Current')
[23]: sns.distplot(BB_new.Vs2a, axlabel = 'Battery & BDC - Switch 1 Voltage')
[24]: sns.distplot(BB_new.Is2a, axlabel = 'Battery & BDC - Switch 1 Current')
[25]: sns.distplot(BB_new.Vs2b, axlabel = 'Battery & BDC - Switch 2 Voltage')
[26]: sns.distplot(BB_new.Is2b, axlabel = 'Battery & BDC - Switch 2 Current')
Boost Converter Distribution Plot
[27]: sns.distplot(BB_new.Iin, axlabel = 'Boost Converter - Input Current')
[28]: sns.distplot(BB_new.Iout, axlabel = 'Boost Converter - Output Current')
[29]: sns.distplot(BB_new.Vin, axlabel = 'Boost Converter - Input Voltage')
[30]: sns.distplot(BB_new.Vout, axlabel = 'Boost Converter - Output Voltage')
[31]: sns.distplot(BB_new.Iswitch1, axlabel = 'Boost Converter - Switch Current')
[32]: sns.distplot(BB_new.Vswitch1, axlabel = 'Boost Converter - Switch Voltage')
VSI Distribution Plot
[33]: sns.distplot(BB_new.Va, label = 'VSI - Phase A Voltage')
sns.distplot(BB_new.Vb, label = 'VSI - Phase B Voltage')
sns.distplot(BB_new.Vc, label = 'VSI - Phase C Voltage', axlabel = 'VSI - Phase Voltages')
# The labels only appear once a legend is drawn
plt.legend()
[34]: sns.distplot(BB_new.Ia, label = 'VSI - Phase A Current')
sns.distplot(BB_new.Ib, label = 'VSI - Phase B Current')
sns.distplot(BB_new.Ic, label = 'VSI - Phase C Current', axlabel = 'VSI - Phase Currents')
# The labels only appear once a legend is drawn
plt.legend()
[35]: sns.distplot(BB_new.Ivsiin, axlabel = 'VSI - Input Current')
[36]: sns.distplot(BB_new.Vab, label = 'VSI - Line AB Voltage')
sns.distplot(BB_new.Vbc, label = 'VSI - Line BC Voltage')
sns.distplot(BB_new.Vca, label = 'VSI - Line CA Voltage', axlabel = 'VSI - Line Voltages')
# The labels only appear once a legend is drawn
plt.legend()
[37]: # Multivariate analysis: correlation among all columns
BB_new.columns
[37]: Index(['Vs1', 'Vs2', 'Vs3', 'Is1', 'Is2', 'Is3', 'Ipv', 'Vpv', 'Vbdc', 'Ibdc',
'Vs2a', 'Is2a', 'Vs2b', 'Is2b', 'Iin', 'Iout', 'Vin', 'Vout',
'Iswitch1', 'Vswitch1', 'Va', 'Vb', 'Vc', 'Ia', 'Ib', 'Ic', 'Ivsiin',
'Vab', 'Vbc', 'Vca', 'state'],
dtype='object')
[38]: #BB_new[['Vs1', 'Vs2', 'Vs3', 'Is1', 'Is2', 'Is3', 'Ipv', 'Vpv', 'Vbdc',
#'Ibdc', 'Vs2a', 'Is2a', 'Vs2b', 'Is2b', 'Iin', 'Iout', 'Vin', 'Vout',
#'Ipv', 'Vpv', 'Vbdc', 'Ibdc', 'Vs2a', 'Is2a', 'Vs2b', 'Is2b', 'Iin', 'Iout',
#'Vin', 'Vout', 'Iswitch1', 'Vswitch1', 'Va', 'Vb', 'Vc', 'Ia', 'Ib', 'Ic',
[39]: BB_new.corr()
[39]: Vs1 Vs2 Vs3 Is1 Is2 Is3 \
Vs1 1.000000 1.000000 1.000000 -0.222136 -0.425274 -0.421767
Vs2 1.000000 1.000000 1.000000 -0.222136 -0.425274 -0.421767
Vs3 1.000000 1.000000 1.000000 -0.222136 -0.425274 -0.421767
Is1 -0.222136 -0.222136 -0.222136 1.000000 0.331634 0.316308
Is2 -0.425274 -0.425274 -0.425274 0.331634 1.000000 0.999650
Is3 -0.421767 -0.421767 -0.421767 0.316308 0.999650 1.000000
Ipv -0.256572 -0.256572 -0.256572 0.715287 0.741474 0.733328
Vpv 1.000000 1.000000 1.000000 -0.222136 -0.425274 -0.421767
Vbdc 0.570540 0.570540 0.570540 -0.568707 -0.775157 -0.769451
Ibdc -0.300296 -0.300296 -0.300296 0.234425 0.299579 0.297140
Vs2a 1.000000 1.000000 1.000000 -0.222136 -0.425274 -0.421767
Is2a 0.395968 0.395968 0.395968 -0.468229 -0.594697 -0.589115
Vs2b -0.216776 -0.216776 -0.216776 0.142318 0.218698 0.217549
Is2b 0.008733 0.008733 0.008733 -0.000695 -0.005412 -0.005432
Iin -0.256572 -0.256572 -0.256572 0.715287 0.741474 0.733328
Iout -0.063080 -0.063080 -0.063080 0.088527 0.094359 0.093522
Vin 1.000000 1.000000 1.000000 -0.222136 -0.425274 -0.421767
Vout 0.570540 0.570540 0.570540 -0.568707 -0.775157 -0.769451
Iswitch1 0.386823 0.386823 0.386823 0.027771 -0.146941 -0.148303
Vswitch1 0.321114 0.321114 0.321114 -0.177838 -0.108459 -0.104098
Va -0.062936 -0.062936 -0.062936 0.208144 0.285212 0.283321
Vb -0.288099 -0.288099 -0.288099 0.405080 0.403851 0.398759
Vc -0.335123 -0.335123 -0.335123 0.570426 0.633133 0.626564
Ia 0.265307 0.265307 0.265307 -0.214842 -0.274168 -0.271961
Ib -0.026940 -0.026940 -0.026940 -0.003209 -0.004562 -0.004536
Ic -0.160642 -0.160642 -0.160642 0.157019 0.199329 0.197716
Ivsiin -0.140149 -0.140149 -0.140149 0.168650 0.210296 0.208523
Vab -0.197408 -0.197408 -0.197408 0.478764 0.534482 0.529097
Vbc -0.092856 -0.092856 -0.092856 0.140790 0.076815 0.074548
Vca 0.363334 0.363334 0.363334 -0.502137 -0.554820 -0.548933
state -0.284426 -0.284426 -0.284426 0.506898 0.593021 0.588150
Is2b -0.003111 0.008733 0.005434 -0.002380 … -0.011376 -0.007848
Iin 1.000000 -0.256572 -0.865313 0.347413 … 0.512197 0.802789
Iout 0.127235 -0.063080 -0.133530 0.587055 … 0.090308 0.122731
Vin -0.256572 1.000000 0.570540 -0.300296 … -0.288099 -0.335123
Vout -0.865313 0.570540 1.000000 -0.381123 … -0.412217 -0.634540
Iswitch1 -0.023391 0.386823 0.162562 -0.043608 … -0.075535 -0.108376
Vswitch1 -0.120710 0.321114 0.230018 -0.132886 … -0.096395 -0.095113
Va 0.361486 -0.062936 -0.277002 0.124446 … -0.400395 0.436096
Vb 0.512197 -0.288099 -0.412217 0.249969 … 1.000000 0.650007
Vc 0.802789 -0.335123 -0.634540 0.348687 … 0.650007 1.000000
Ia -0.317983 0.265307 0.357445 -0.209785 … -0.207230 -0.290866
Ib -0.006048 -0.026940 -0.004089 -0.153685 … 0.000152 -0.006420
Ic 0.232994 -0.160642 -0.250278 0.299741 … 0.147904 0.211933
Ivsiin 0.248086 -0.140149 -0.258811 0.623958 … 0.198864 0.268734
Vab 0.698620 -0.197408 -0.527270 0.272085 … 0.196381 0.827976
Vbc 0.117611 -0.092856 -0.082412 0.070651 … 0.796446 0.127712
Vca -0.686449 0.363334 0.562898 -0.323704 … -0.856693 -0.889517
state 0.731417 -0.284426 -0.644080 0.162105 … 0.377284 0.604051
Vab -0.226065 0.002513 0.162465 0.205705 1.000000 -0.209410
Vbc -0.055754 0.014447 0.032911 0.061825 -0.209410 1.000000
Vca 0.270761 0.012090 -0.199119 -0.252732 -0.480286 -0.370384
state -0.374515 -0.054020 0.320756 0.264323 0.536627 0.084349
Vca state
Vs1 0.363334 -0.284426
Vs2 0.363334 -0.284426
Vs3 0.363334 -0.284426
Is1 -0.502137 0.506898
Is2 -0.554820 0.593021
Is3 -0.548933 0.588150
Ipv -0.686449 0.731417
Vpv 0.363334 -0.284426
Vbdc 0.562898 -0.644080
Ibdc -0.323704 0.162105
Vs2a 0.363334 -0.284426
Is2a 0.453297 -0.509955
Vs2b -0.191952 0.075907
Is2b 0.009498 -0.003494
Iin -0.686449 0.731417
Iout -0.114613 0.102628
Vin 0.363334 -0.284426
Vout 0.562898 -0.644080
Iswitch1 0.126694 -0.082309
Vswitch1 0.111984 -0.114393
Va -0.058117 0.281606
Vb -0.856693 0.377284
Vc -0.889517 0.604051
Ia 0.270761 -0.374515
Ib 0.012090 -0.054020
Ic -0.199119 0.320756
Ivsiin -0.252732 0.264323
Vab -0.480286 0.536627
Vbc -0.370384 0.084349
Vca 1.000000 -0.507585
state -0.507585 1.000000
[40]: plt.figure(figsize=(40,20))
htmat = BB_new.corr(method='pearson')
sns.heatmap(htmat, cmap="YlGnBu", fmt='0.2f',annot=True)
plt.show();
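With 31 columns the annotated heatmap is dense, and its upper triangle repeats the lower one. One optional refinement, sketched on a small hypothetical frame, masks the upper triangle (including the diagonal) with `np.triu`:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical frame: x and y are exact copies, z is independent noise
rng = np.random.default_rng(1)
x = rng.normal(size=200)
df = pd.DataFrame({"x": x, "y": x.copy(), "z": rng.normal(size=200)})

corr = df.corr(method="pearson")

# True marks cells to hide: the upper triangle mirrors the lower one
mask = np.triu(np.ones_like(corr, dtype=bool))

plt.figure(figsize=(6, 5))
ax = sns.heatmap(corr, mask=mask, cmap="YlGnBu", annot=True, fmt=".2f", vmin=-1, vmax=1)
plt.show()
```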
The correlation table and the 31-column heatmap show several groups of perfectly correlated parameters (r = 1.00): Vs1, Vs2, Vs3, Vpv, Vs2a and Vin; Ipv and Iin; and Vbdc and Vout. Is2 and Is3 are also almost perfectly correlated (0.9997). The fault label state correlates most strongly with Ipv and Iin (0.73), Vbdc and Vout (-0.64) and Vc (0.60), while Vca is strongly negatively correlated with Vc (-0.89) and Vb (-0.86).
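Reading groups of redundant parameters off a 31 x 31 matrix is error-prone, so a small helper can list them directly. The function name `highly_correlated_pairs` and the 0.9 threshold are illustrative choices, not part of the notebook:

```python
from itertools import combinations

import numpy as np
import pandas as pd


def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Return each column pair whose absolute Pearson correlation is >= threshold."""
    corr = df.corr(method="pearson")
    # combinations() visits each unordered pair once and skips the diagonal
    rows = [
        (a, b, corr.loc[a, b])
        for a, b in combinations(corr.columns, 2)
        if abs(corr.loc[a, b]) >= threshold
    ]
    pairs = pd.DataFrame(rows, columns=["col_a", "col_b", "r"])
    # Strongest relationships first, regardless of sign
    return pairs.sort_values("r", key=lambda s: s.abs(), ascending=False).reset_index(drop=True)


# Hypothetical toy frame: Vin duplicates Vs1 exactly, Is1 is unrelated noise
rng = np.random.default_rng(2)
vs1 = rng.normal(22, 0.05, size=100)
toy = pd.DataFrame({"Vs1": vs1, "Vin": vs1.copy(), "Is1": rng.normal(1.5, 0.01, size=100)})
result = highly_correlated_pairs(toy)
print(result)
```

Applied to `BB_new`, the correlation table above suggests it would return, among others, every pair within the Vs1/Vs2/Vs3/Vpv/Vs2a/Vin group, since those columns carry the same information.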
[59]: # Box plots of the PV string voltages
plt.figure(figsize=(20,10))
bp = sns.boxplot(data = BB_new[['Vs1', 'Vs2','Vs3']])
plt.show()
[60]: # Box plots of the PV string currents
bp = sns.boxplot(data = BB_new[['Is1', 'Is2','Is3']])
plt.show()
[61]: # Box plots of the battery and BDC voltages
bp = sns.boxplot(data = BB_new[['Vbdc', 'Vs2a','Vs2b']])
plt.show()
[62]: # Box plots of the battery and BDC currents
bp = sns.boxplot(data = BB_new[['Ibdc', 'Is2a','Is2b']])
plt.show()
# Box plots of the boost converter voltages
bp = sns.boxplot(data = BB_new[['Vin', 'Vout','Vswitch1']])
plt.show()
[64]: # Box plots of the boost converter currents
bp = sns.boxplot(data = BB_new[['Iin', 'Iout','Iswitch1']])
plt.show()
[65]: # Box plots of the VSI phase voltages
bp = sns.boxplot(data = BB_new[['Va', 'Vb','Vc']])
plt.show()
# Box plots of the VSI phase currents
bp = sns.boxplot(data = BB_new[['Ia', 'Ib','Ic']])
plt.show()
[68]: # Box plots of the VSI line voltages
bp = sns.boxplot(data = BB_new[['Vab', 'Vbc','Vca']])
plt.show()
[69]: # Box plot of the VSI input current
bp = sns.boxplot(data = BB_new[['Ivsiin']])
plt.show()
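Seaborn's box plots draw whiskers at 1.5 times the interquartile range (IQR) by default and plot anything beyond them as individual points. A short sketch that counts those points per column; the helper name `iqr_outlier_counts` and the toy frame are hypothetical:

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] per column (box-plot whisker rule)."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # lower/upper are indexed by column name, so the comparisons align column-wise
    outside = (df < lower) | (df > upper)
    return outside.sum()


toy = pd.DataFrame({"Ia": [1, 2, 3, 4, 100], "Ib": [5, 5, 6, 6, 7]})
counts = iqr_outlier_counts(toy)
print(counts)
```

On the real data, `iqr_outlier_counts(BB_new[['Ia', 'Ib', 'Ic']])` would quantify what the corresponding box plot shows visually.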
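[ ]: #@title
# sns.pairplot creates its own figure, so a separate plt.figure() call is not needed;
# random_state makes the 15,000-row sample reproducible
pairplot = sns.pairplot(BB_new.sample(15000, random_state=42))
plt.show()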
The pair plot above shows the pairwise scatter plots between the parameters, indicating the direction and strength of their correlations.