Data Pre Processing Using Python
Umer Saeed
Copyright © 2021 Umer Saeed
Licensed under the Creative Commons Attribution-NonCommercial 3.0 Unported License (the “License”).
You may not use this file except in compliance with the License. You may obtain a copy of the License at
http://creativecommons.org/licenses/by-nc/3.0. Unless required by applicable law or agreed to in
writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES
OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language
governing permissions and limitations under the License.
Preface
Python is an amazing language with a strong and friendly community of programmers. However, there is a
lack of documentation on what to learn after you have got the basics of Python down. Through this
book I aim to solve this problem. I will give you bits of information about some interesting topics which
you can explore further.
The topics discussed in this book open up your mind to some nice corners of the Python language.
This book is an outcome of my desire to have something like this when I was beginning to learn Python.
Whether you are a beginner, an intermediate or even an advanced programmer, there is something for you in this book.
Please note that this book is not a tutorial and does not teach you Python. The topics are not explained in
depth; instead, only the minimum required information is given.
I love Python. Pandas: the new era of Excel!
Microsoft Excel is the industry-leading spreadsheet program and a powerful data visualization and
analysis tool, but it is not suitable for processing large amounts of data. So I am sharing some common tasks that
a lot of people do in Excel, such as VLOOKUP, filtering data or pivot tables, but using Python’s pandas package.
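To illustrate that mapping, here is a minimal sketch on small made-up frames. The `cells` and `sites` tables and their columns are hypothetical. A VLOOKUP becomes `merge`, an AutoFilter becomes a boolean mask, and a PivotTable becomes `pivot_table`:

```python
import pandas as pd

# Hypothetical data: cell-level traffic and a site lookup table
cells = pd.DataFrame({
    "Cell": ["A1", "A2", "B1", "C1"],
    "Site": ["A", "A", "B", "C"],
    "Traffic": [12.5, 7.0, 20.0, 3.5],
})
sites = pd.DataFrame({"Site": ["A", "B", "C"], "Region": ["North", "North", "South"]})

# VLOOKUP: bring Region into each cell row (a left join keeps every cell)
merged = cells.merge(sites, on="Site", how="left")

# Filter: keep only rows with traffic above 5
busy = merged[merged["Traffic"] > 5]

# Pivot table: total traffic per region
pivot = merged.pivot_table(index="Region", values="Traffic", aggfunc="sum")
print(pivot)
```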
Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working
with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building
block for doing practical, real-world data analysis in Python.
This book is a continuous work in progress. If you find anything that can be improved (I know you
will find a lot of stuff), kindly submit a pull request!
I am sure you are as excited as I am, so let’s start!
Contents
15 Traffic Analysis
15.1 Traffic Analysis
15.1.1 Input File Format
15.1.2 Import required Libraries
15.1.3 Working Path
15.1.4 Import Excel Sheets
15.1.5 Add additional columns
15.1.6 Concat data frames
15.1.7 map cluster name to Region
15.1.8 Pivot_table (re-shape data set)
15.1.9 Pre-processing on the header
15.1.10 Export Final Data Set
15.1.11 Output File Format
26 Frequency Export
26.1 Frequency Export For IFOS
26.1.1 Input File Format
26.1.2 Import required Libraries
34 2G RF Utilization
34.1 Calculation For 2G RF Utilization Cell and Network Level
34.1.1 Input File Format
34.1.2 Import Libraries
34.1.3 Set Working Path
34.1.4 Import Erlang B Table
34.1.5 Unzip Files
34.1.6 List the Files in the Path
34.1.7 Concat All the csv Files
34.1.8 Delete csv File from the Path
34.1.9 Calculate FR and HR Traffic Share
34.1.10 Convert K3015 Counter from float to integer
34.1.11 Calculate Offer Traffic Per Cell/Hour
34.1.12 Calculate 2G RF Utilization (Cell Hourly)
34.1.13 Calculate 2G RF Utilization (Cell Busy Hour)
34.1.14 Sum Network Level Traffic and Offer Traffic
34.1.15 Calculation 2G RF Utilization (Network Level Hourly)
34.1.16 Calculation 2G RF Utilization (Network Level Busy Hour)
34.1.17 Export Final Data Set
34.1.18 SLA Target Values
34.1.19 Re-shape Cell Busy Hour Data
34.1.20 Compare KPIs with Target Values
34.1.21 Conditional Pivot table
34.1.22 Export Summary
34.1.23 Output File Format
1. Merge GSM Worst Cells from PRS
[1]: import os
import zipfile
import pandas as pd
from glob import glob
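Only the imports of this chapter survive here. The following is a minimal sketch of the merge step it describes. It assumes each PRS export is a .zip holding a CSV with the layout used in later chapters: column header on the fourth line (`header=3`), one footer line (`skipfooter=1`), and `NIL` or `/0` for missing values. `merge_prs_exports` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path

import pandas as pd


def merge_prs_exports(folder):
    """Extract every .zip in `folder`, then stack all CSV reports found there.

    Assumed layout: header on the 4th line, last line is a footer,
    'NIL' and '/0' mark missing values.
    """
    folder = Path(folder)
    for archive in folder.glob("*.zip"):
        if zipfile.is_zipfile(archive):
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(folder)  # unzip next to the archive

    frames = [
        pd.read_csv(csv, header=3, skipfooter=1, engine="python",
                    na_values=["NIL", "/0"])
        for csv in sorted(folder.glob("*.csv"))
    ]
    return pd.concat(frames, ignore_index=True)
```

The python parsing engine is used because the C engine does not support `skipfooter`.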
2. Merge UMTS Worst Cells from PRS
[1]: import os
import zipfile
import pandas as pd
from glob import glob
3. LAC TAC Convert Hexadecimal to Decimal
[1]: import os
import pandas as pd
[3]: df=pd.read_csv('TACLAC.txt')
[5]: df.to_csv('Final_Values.csv',index=False)
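The cell that performs the conversion is not shown. One way to do it uses Python's built-in `int(value, 16)`. This sketch assumes `TACLAC.txt` has hexadecimal `LAC` and `TAC` columns; those column names and the values are hypothetical.

```python
import pandas as pd

# Hypothetical input: LAC/TAC stored as hexadecimal strings
df = pd.DataFrame({"LAC": ["1A2B", "00FF"], "TAC": ["0x3E8", "7d0"]})

for col in ["LAC", "TAC"]:
    # int(..., 16) accepts upper/lower case digits and an optional '0x' prefix
    df[f"{col}_Decimal"] = df[col].astype(str).str.strip().apply(lambda h: int(h, 16))

print(df)
```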
4. Cell on Cluster Busy Hour Filtering
The following PRS report is used to prepare the Cell on Cluster Busy Hour data:
[1]: import os
import zipfile
import pandas as pd
from glob import glob
import dask.dataframe as dd
[3]: %%time
for file in os.listdir(working_directory):    # get the list of files
    if zipfile.is_zipfile(file):             # if it is a zip file, extract it
        with zipfile.ZipFile(file) as item:  # treat the file as a zip
            item.extractall()                # extract it in the working directory
[4]: %%time
cell = dd.read_csv('*.csv',\
skiprows=[0,1,2,3,4,5],\
skipfooter=1,
engine='python',\
na_values=['NIL','/0'],
parse_dates=["Date"],assume_missing=True)
[6]: %%time
for file in os.listdir(folder_path):          # get the list of files
    if zipfile.is_zipfile(file):             # if it is a zip file, extract it
        with zipfile.ZipFile(file) as item:  # treat the file as a zip
            item.extractall()                # extract it in the working directory
[8]: %%time
ccbh = dd.merge(cell,cluster,on=['Date','Time','Location'])
[9]: %%time
ccbh = ccbh.compute()
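The merge keeps a cell-hour row only when its (Date, Time, Location) combination is the cluster's busy hour, because an inner join drops non-matching rows. Here is the same idea in plain pandas on hypothetical toy data. It assumes `Location` holds the cluster name in both tables.

```python
import pandas as pd

# Hourly cell KPIs; Location = cluster the cell belongs to (hypothetical data)
cell = pd.DataFrame({
    "Date": ["2021-05-01"] * 4,
    "Time": ["20:00", "21:00", "20:00", "21:00"],
    "Location": ["C1", "C1", "C2", "C2"],
    "Cell": ["A", "A", "B", "B"],
    "Traffic": [5.0, 7.0, 4.0, 2.0],
})
# One row per cluster per day: that cluster's busy hour
cluster = pd.DataFrame({
    "Date": ["2021-05-01", "2021-05-01"],
    "Time": ["21:00", "20:00"],
    "Location": ["C1", "C2"],
})

# Inner join: only cell rows that fall in their cluster's busy hour survive
ccbh = pd.merge(cell, cluster, on=["Date", "Time", "Location"], how="inner")
print(ccbh)
```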
5. UMTS IP-Pool KPIs
[1]: import os
import pandas as pd
from glob import glob
[3]: df=pd.DataFrame({
'KPI':['VS.IPPOOL.ADJNODE.PING.MeanDELAY(ms)',\
'VS.IPPOOL.ADJNODE.PING.MeanJITTER(ms)',\
'VS.IPPOOL.ADJNODE.PING.MeanLOST(%)'],
'Target Value':[20,2,0.1]})
[5]: df2=pd.melt(df1,\
id_vars=['Date', 'Time', 'RNC','Adjacent Node ID'],\
var_name="KPI", value_name='KPI-Value')
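After the melt, each row holds a single KPI value. It can therefore be joined to the target table on `KPI` and flagged. This sketch assumes all three IP-pool KPIs are "lower is better" (delay, jitter, loss); the sample values in `df1` are made up.

```python
import pandas as pd

targets = pd.DataFrame({
    "KPI": ["VS.IPPOOL.ADJNODE.PING.MeanDELAY(ms)",
            "VS.IPPOOL.ADJNODE.PING.MeanJITTER(ms)",
            "VS.IPPOOL.ADJNODE.PING.MeanLOST(%)"],
    "Target Value": [20, 2, 0.1],
})
# Hypothetical wide counter export: one column per KPI
df1 = pd.DataFrame({
    "Date": ["2021-05-01", "2021-05-01"],
    "Time": ["10:00", "11:00"],
    "RNC": ["RNC1", "RNC1"],
    "Adjacent Node ID": [7, 7],
    "VS.IPPOOL.ADJNODE.PING.MeanDELAY(ms)": [18.0, 25.0],
    "VS.IPPOOL.ADJNODE.PING.MeanJITTER(ms)": [1.0, 1.5],
    "VS.IPPOOL.ADJNODE.PING.MeanLOST(%)": [0.0, 0.2],
})

# Wide -> long: one row per (Date, Time, RNC, node, KPI)
long = pd.melt(df1, id_vars=["Date", "Time", "RNC", "Adjacent Node ID"],
               var_name="KPI", value_name="KPI-Value")

# Attach the target for each KPI and flag breaches (a higher value is worse)
long = long.merge(targets, on="KPI", how="left")
long["Breach"] = long["KPI-Value"] > long["Target Value"]

# Number of breaching intervals per RNC/node/KPI
summary = (long.groupby(["RNC", "Adjacent Node ID", "KPI"])["Breach"]
               .sum().reset_index())
print(summary)
```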
[13]: df7.to_csv('Output.csv',index=False)
6. UMTS IPPM KPIs
[1]: import os
import pandas as pd
from glob import glob
[5]: df2=pd.melt(df1,\
id_vars=['Date', 'Time', 'RNC','Adjacent Node ID'],\
var_name="KPI", value_name='KPI-Value')
[13]: df7.to_csv('Output.csv',index=False)
7. GSM IoI KPI
[1]: import os
import zipfile
import numpy as np
import pandas as pd
from glob import glob
[3]: da = sorted(glob('*.csv'))
df_2g_da=pd.concat((pd.read_csv(file,header=3,\
skipfooter=1,engine='python',na_values=['NIL','/0'],\
parse_dates=["Date"]) for file in da))\
.sort_values('Date').set_index(['Date']).last('10D').reset_index()
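`DataFrame.last`, used above with `'10D'`, is deprecated as of pandas 2.1. The same selection can be written as a plain date mask. This is a minimal sketch with a hypothetical `keep_last_days` helper. It keeps dates strictly after `max_date - days`, which is what `.last('10D')` selects on a sorted date index.

```python
import pandas as pd


def keep_last_days(df, days=10, date_col="Date"):
    """Keep rows whose date falls within the last `days` days of the data."""
    cutoff = df[date_col].max() - pd.Timedelta(days=days)
    return df[df[date_col] > cutoff].sort_values(date_col).reset_index(drop=True)


daily = pd.DataFrame({
    "Date": pd.date_range("2021-05-01", periods=15, freq="D"),
    "Cell CI": 101,
})
recent = keep_last_days(daily)
print(recent["Date"].min().date(), recent["Date"].max().date())  # 2021-05-06 2021-05-15
```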
7.1.6 2G Hourly - IOI: Calculate Max IOI Per Cell Per Day
[10]: # count the hourly intervals with IOI >= 10% per cell
df_2g_hr_gt_10_interval=df_2g_hr.groupby(['GBSC','Cell CI'])\
['Interference Band Proportion (4~5)(%)'].\
apply(lambda x: (x.ge(10)).sum())\
.reset_index(name='Total Interval IOI>10')
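The `apply(lambda x: x.ge(10).sum())` pattern can also be written as a vectorised flag plus a named aggregation. That approach produces the daily maximum and the number of intervals at or above the threshold in one pass. Here is a sketch on made-up hourly values, grouped per cell per day (the `Date` grouping key is an assumption):

```python
import pandas as pd

ioi_col = "Interference Band Proportion (4~5)(%)"
df_2g_hr = pd.DataFrame({
    "Date": ["2021-05-01"] * 3 + ["2021-05-02"] * 3,
    "GBSC": ["BSC1"] * 6,
    "Cell CI": [11] * 6,
    ioi_col: [4.0, 12.0, 10.0, 3.0, 2.0, 9.5],
})

# Boolean flag per interval; summing booleans counts the True values
df_2g_hr["IOI>=10"] = df_2g_hr[ioi_col].ge(10)

daily = (df_2g_hr
         .groupby(["Date", "GBSC", "Cell CI"], as_index=False)
         .agg(Max_IOI=(ioi_col, "max"),
              Intervals_IOI_ge_10=("IOI>=10", "sum")))
print(daily)
```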
[16]: df_2g_da.Date.unique()
[17]: df_2g_hr.Date.unique()
8. UMTS RTWP KPI
[1]: import os
import zipfile
import numpy as np
import pandas as pd
from glob import glob
[3]: da = sorted(glob('*.csv'))
df_3g_da=pd.concat((pd.read_csv(file,header=3,\
skipfooter=1,engine='python',na_values=['NIL','/0'],\
parse_dates=["Date"]) for file in da)).\
sort_values('Date').set_index(['Date']).\
last('10D').reset_index()
8.1.6 3G Hourly - RTWP: Calculate Max RTWP Per Cell Per Day
8.1.7 3G Hourly - RTWP: Calculate Number of Intervals with RTWP >= -95 dBm (U900) and >= -98 dBm (U2100)
[16]: df_3g_da.Date.unique()
[17]: df_3g_hr.Date.unique()
9. LTE UL Interference KPI
[1]: import os
import zipfile
import numpy as np
import pandas as pd
from glob import glob
[3]: da = sorted(glob('*.csv'))
df_4g_da=pd.concat((pd.read_csv(file,header=3,\
skipfooter=1,engine='python',na_values=['NIL','/0'],\
parse_dates=["Date"]) for file in da)).\
sort_values('Date').set_index(['Date']).\
last('10D').reset_index()
9.1.6 4G Hourly - Interference: Calculate Max UL Interference Per Cell Per Day
[10]: # count the hourly intervals with UL interference >= -108 dBm per cell
df_4g_hr_gt_n108_interval=df_4g_hr.groupby(['Cell Name'])\
['Average UL Interference per Non Shared PRB for GL6MHz (dBm)'].\
apply(lambda x: (x.ge(-108)).sum()).\
reset_index(name='Total Interval IOI>-100')
[16]: df_4g_da.Date.unique()
[17]: df_4g_hr.Date.unique()
10. GSM BSS Issues
[1]: import os
import zipfile
import numpy as np
import pandas as pd
from glob import glob
[8]: df5=cell_da.pivot_table\
(index=["GBSC",'Cell CI','Cell Name'],\
columns="Date").reset_index()
[11]: cell_da.Date.unique()
11. Calculate Cluster Busy Hour
[1]: import os
import numpy as np
import pandas as pd
from glob import glob
11.1.6 Export
[5]: df_d_bh.to_csv('cluster_bh.csv',index=False)
11.2 Case-2: If Date and Time in Same Column
[10]: df_d_bh=df_d_bh.iloc[:,:-2]
[11]: df_d_bh.to_csv('cluster_bh.csv',index=False)
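The busy-hour calculation itself is not shown. A common approach uses `groupby(...).idxmax()` followed by `.loc`. This sketch assumes the busy hour is the hour with the highest traffic per cluster per day, and the column names are hypothetical. For Case-2, the combined timestamp is split into Date and Time first.

```python
import pandas as pd

# Case-2 style input: date and time in one column (hypothetical data)
df = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2021-05-01 20:00", "2021-05-01 21:00",
                                 "2021-05-01 20:00", "2021-05-01 21:00"]),
    "Cluster": ["C1", "C1", "C2", "C2"],
    "Traffic": [30.0, 42.0, 18.0, 11.0],
})

# Split the timestamp into separate Date and Time columns
df["Date"] = df["Timestamp"].dt.date
df["Time"] = df["Timestamp"].dt.strftime("%H:%M")

# Row label of the max-traffic hour for each cluster/day, then select those rows
idx = df.groupby(["Date", "Cluster"])["Traffic"].idxmax()
df_d_bh = df.loc[idx, ["Date", "Time", "Cluster", "Traffic"]].reset_index(drop=True)
print(df_d_bh)
```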
12. Daily SLA Target Identification
[1]: import os
import zipfile
import numpy as np
import pandas as pd
from glob import glob
from collections import ChainMap
[7]: d = ChainMap(dict.fromkeys(['GUJRANWALA_CLUSTER_01_Rural',
'GUJRANWALA_CLUSTER_01_Urban',
'GUJRANWALA_CLUSTER_02_Rural','GUJRANWALA_CLUSTER_02_Urban',
'GUJRANWALA_CLUSTER_03_Rural','GUJRANWALA_CLUSTER_03_Urban',
'GUJRANWALA_CLUSTER_04_Rural','GUJRANWALA_CLUSTER_04_Urban',
'GUJRANWALA_CLUSTER_05_Rural','GUJRANWALA_CLUSTER_05_Urban',
'GUJRANWALA_CLUSTER_06_Rural','KASUR_CLUSTER_01_Rural',
'KASUR_CLUSTER_02_Rural','KASUR_CLUSTER_03_Rural',
'KASUR_CLUSTER_03_Urban','LAHORE_CLUSTER_01_Rural',
'LAHORE_CLUSTER_01_Urban',
'LAHORE_CLUSTER_02_Rural','LAHORE_CLUSTER_02_Urban',
'LAHORE_CLUSTER_03_Rural','LAHORE_CLUSTER_03_Urban',
'LAHORE_CLUSTER_04_Urban','LAHORE_CLUSTER_05_Rural',
'LAHORE_CLUSTER_05_Urban',
'LAHORE_CLUSTER_06_Rural','LAHORE_CLUSTER_06_Urban',
'LAHORE_CLUSTER_07_Rural','LAHORE_CLUSTER_07_Urban',
'LAHORE_CLUSTER_08_Rural','LAHORE_CLUSTER_08_Urban',
'LAHORE_CLUSTER_09_Rural','LAHORE_CLUSTER_09_Urban',
'LAHORE_CLUSTER_10_Urban','LAHORE_CLUSTER_11_Rural',
'LAHORE_CLUSTER_11_Urban','LAHORE_CLUSTER_12_Urban',
'LAHORE_CLUSTER_13_Urban','LAHORE_CLUSTER_14_Urban',
'SIALKOT_CLUSTER_01_Rural','SIALKOT_CLUSTER_01_Urban',
'SIALKOT_CLUSTER_02_Rural','SIALKOT_CLUSTER_02_Urban',
'SIALKOT_CLUSTER_03_Rural','SIALKOT_CLUSTER_03_Urban',
'SIALKOT_CLUSTER_04_Rural','SIALKOT_CLUSTER_05_Rural',
'SIALKOT_CLUSTER_05_Urban','SIALKOT_CLUSTER_06_Rural',
'SIALKOT_CLUSTER_06_Urban','SIALKOT_CLUSTER_07_Rural',
'SIALKOT_CLUSTER_07_Urban'], 'Center-1'),
dict.fromkeys(['DG_KHAN_CLUSTER_01_Rural',
'DG_KHAN_CLUSTER_02_Rural','DG_KHAN_CLUSTER_02_Urban',
'DI_KHAN_CLUSTER_01_Rural','DI_KHAN_CLUSTER_01_Urban',
'DI_KHAN_CLUSTER_02_Rural',
'DI_KHAN_CLUSTER_02_Urban','DI_KHAN_CLUSTER_03_Rural',
'FAISALABAD_CLUSTER_01_Rural',
'FAISALABAD_CLUSTER_02_Rural','FAISALABAD_CLUSTER_03_Rural',
'FAISALABAD_CLUSTER_04_Rural',
'FAISALABAD_CLUSTER_04_Urban','FAISALABAD_CLUSTER_05_Rural',
'FAISALABAD_CLUSTER_05_Urban',
'FAISALABAD_CLUSTER_06_Rural','FAISALABAD_CLUSTER_06_Urban',
'JHUNG_CLUSTER_01_Rural',
'JHUNG_CLUSTER_01_Urban','JHUNG_CLUSTER_02_Rural',
'JHUNG_CLUSTER_02_Urban',
'JHUNG_CLUSTER_03_Rural','JHUNG_CLUSTER_03_Urban',
'JHUNG_CLUSTER_04_Rural',
'JHUNG_CLUSTER_04_Urban','JHUNG_CLUSTER_05_Rural',
'JHUNG_CLUSTER_05_Urban',
'SAHIWAL_CLUSTER_01_Rural','SAHIWAL_CLUSTER_01_Urban',
'SAHIWAL_CLUSTER_02_Rural',
'SAHIWAL_CLUSTER_02_Urban'], 'Center-2'),
dict.fromkeys(['JAMPUR_CLUSTER_01_Urban',
'RAJANPUR_CLUSTER_01_Rural','RAJANPUR_CLUSTER_01_Urban',
'JAMPUR_CLUSTER_01_Rural','DG_KHAN_CLUSTER_03_Rural',
'DG_KHAN_CLUSTER_03_Urban',
'DG_KHAN_CLUSTER_04_Rural','DG_KHAN_CLUSTER_04_Urban',
'SAHIWAL_CLUSTER_03_Rural',
'SAHIWAL_CLUSTER_03_Urban','KHANPUR_CLUSTER_01_Rural',
'KHANPUR_CLUSTER_01_Urban',
'RAHIMYARKHAN_CLUSTER_01_Rural','RAHIMYARKHAN_CLUSTER_01_Urban',
'AHMEDPUREAST_CLUSTER_01_Rural','AHMEDPUREAST_CLUSTER_01_Urban',
'ALIPUR_CLUSTER_01_Rural','ALIPUR_CLUSTER_01_Urban',
'BAHAWALPUR_CLUSTER_01_Rural','BAHAWALPUR_CLUSTER_01_Urban',
'BAHAWALPUR_CLUSTER_02_Rural','SAHIWAL_CLUSTER_04_Rural',
'SAHIWAL_CLUSTER_04_Urban','MULTAN_CLUSTER_01_Rural',
'MULTAN_CLUSTER_01_Urban','MULTAN_CLUSTER_02_Rural',
'MULTAN_CLUSTER_02_Urban',
'MULTAN_CLUSTER_03_Rural','MULTAN_CLUSTER_03_Urban',
'RYK DESERT_Cluster_Rural',
'SADIQABAD_CLUSTER_01_Rural','SAHIWAL_CLUSTER_05_Rural',
'SAHIWAL_CLUSTER_05_Urban',
'SAHIWAL_CLUSTER_06_Rural','SAHIWAL_CLUSTER_06_Urban',
'SAHIWAL_CLUSTER_07_Rural',
'SAHIWAL_CLUSTER_07_Urban'], 'Center-3'))
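`ChainMap` works here as one merged lookup. Each `dict.fromkeys(names, center)` maps every cluster name to its center, and a lookup searches the three dicts in order. Below is a trimmed-down sketch of how such a map can tag rows; the short cluster lists and the `Location` column are illustrative. Converting the map to a plain `dict` before `Series.map` keeps the call simple.

```python
from collections import ChainMap

import pandas as pd

d = ChainMap(
    dict.fromkeys(["LAHORE_CLUSTER_01_Urban", "KASUR_CLUSTER_01_Rural"], "Center-1"),
    dict.fromkeys(["JHUNG_CLUSTER_01_Urban"], "Center-2"),
    dict.fromkeys(["MULTAN_CLUSTER_01_Rural"], "Center-3"),
)

cluster_bh = pd.DataFrame({"Location": ["KASUR_CLUSTER_01_Rural",
                                        "MULTAN_CLUSTER_01_Rural",
                                        "UNKNOWN_CLUSTER"]})
# Unknown cluster names become NaN, which makes gaps in the mapping easy to spot
cluster_bh["Region"] = cluster_bh["Location"].map(dict(d))
print(cluster_bh)
```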
[8]: qformat=cluster_bh.pivot(index=['Date','Region','Location'],\
columns='Cluster Type',\
values=['CSSR_Non Blocking',\
'HSR (Incoming & Outgoing)', 'DCR', 'GOS-SDCCH(%)',\
'CallSetup TCH GOS(%)', 'Mobility TCH GOS(%)', 'RxQual Index DL(%)',\
'RxQual Index UL(%)']).\
fillna('N/A').\
sort_index(level=[0,1],\
axis=1,ascending=[True,False])
[9]: qformat=qformat.reset_index()
[10]: # Export
qformat.to_excel("SLA Target.xlsx",engine='openpyxl',na_rep='N/A')
[11]: #import
aa=pd.read_excel('SLA Target.xlsx',header=[0,1])
[12]: bb=aa.style\
.applymap(lambda x: 'color: black' if pd.isnull(x) else
'background-color: %s' % 'green'
if x>=99.50 else 'background-color: %s' % 'red'
,subset=[('CSSR_Non Blocking','Urban')])\
.applymap(lambda x: 'color: black' if pd.isnull(x) else
'background-color: %s' % 'green'
if x>=99.00 else 'background-color: %s' % 'red',
subset=[('CSSR_Non Blocking','Rural')])\
.applymap(lambda x: 'color: black' if pd.isnull(x)
else 'background-color: %s' % 'green'
13. Quarterly SLA Target Identification
[1]: import os
import zipfile
import numpy as np
import pandas as pd
from glob import glob
from collections import ChainMap
[6]: d = ChainMap(dict.fromkeys(['GUJRANWALA_CLUSTER_01_Rural',
'GUJRANWALA_CLUSTER_01_Urban',
'GUJRANWALA_CLUSTER_02_Rural','GUJRANWALA_CLUSTER_02_Urban',
'GUJRANWALA_CLUSTER_03_Rural','GUJRANWALA_CLUSTER_03_Urban',
'GUJRANWALA_CLUSTER_04_Rural','GUJRANWALA_CLUSTER_04_Urban',
'GUJRANWALA_CLUSTER_05_Rural','GUJRANWALA_CLUSTER_05_Urban',
'GUJRANWALA_CLUSTER_06_Rural','KASUR_CLUSTER_01_Rural',
'KASUR_CLUSTER_02_Rural','KASUR_CLUSTER_03_Rural',
'KASUR_CLUSTER_03_Urban','LAHORE_CLUSTER_01_Rural',
'LAHORE_CLUSTER_01_Urban',
'LAHORE_CLUSTER_02_Rural','LAHORE_CLUSTER_02_Urban',
'LAHORE_CLUSTER_03_Rural','LAHORE_CLUSTER_03_Urban',
'LAHORE_CLUSTER_04_Urban','LAHORE_CLUSTER_05_Rural',
'LAHORE_CLUSTER_05_Urban',
'LAHORE_CLUSTER_06_Rural','LAHORE_CLUSTER_06_Urban',
'LAHORE_CLUSTER_07_Rural','LAHORE_CLUSTER_07_Urban',
'LAHORE_CLUSTER_08_Rural','LAHORE_CLUSTER_08_Urban',
'LAHORE_CLUSTER_09_Rural','LAHORE_CLUSTER_09_Urban',
'LAHORE_CLUSTER_10_Urban','LAHORE_CLUSTER_11_Rural',
'LAHORE_CLUSTER_11_Urban','LAHORE_CLUSTER_12_Urban',
'LAHORE_CLUSTER_13_Urban','LAHORE_CLUSTER_14_Urban',
'SIALKOT_CLUSTER_01_Rural','SIALKOT_CLUSTER_01_Urban',
'SIALKOT_CLUSTER_02_Rural','SIALKOT_CLUSTER_02_Urban',
'SIALKOT_CLUSTER_03_Rural','SIALKOT_CLUSTER_03_Urban',
'SIALKOT_CLUSTER_04_Rural','SIALKOT_CLUSTER_05_Rural',
'SIALKOT_CLUSTER_05_Urban','SIALKOT_CLUSTER_06_Rural',
'SIALKOT_CLUSTER_06_Urban','SIALKOT_CLUSTER_07_Rural',
'SIALKOT_CLUSTER_07_Urban'], 'Center-1'),
dict.fromkeys(['DG_KHAN_CLUSTER_01_Rural',
'DG_KHAN_CLUSTER_02_Rural','DG_KHAN_CLUSTER_02_Urban',
'DI_KHAN_CLUSTER_01_Rural','DI_KHAN_CLUSTER_01_Urban',
'DI_KHAN_CLUSTER_02_Rural',
'DI_KHAN_CLUSTER_02_Urban','DI_KHAN_CLUSTER_03_Rural',
'FAISALABAD_CLUSTER_01_Rural',
'FAISALABAD_CLUSTER_02_Rural','FAISALABAD_CLUSTER_03_Rural',
'FAISALABAD_CLUSTER_04_Rural',
'FAISALABAD_CLUSTER_04_Urban','FAISALABAD_CLUSTER_05_Rural',
'FAISALABAD_CLUSTER_05_Urban',
'FAISALABAD_CLUSTER_06_Rural','FAISALABAD_CLUSTER_06_Urban',
'JHUNG_CLUSTER_01_Rural',
'JHUNG_CLUSTER_01_Urban','JHUNG_CLUSTER_02_Rural',
'JHUNG_CLUSTER_02_Urban',
'JHUNG_CLUSTER_03_Rural','JHUNG_CLUSTER_03_Urban',
'JHUNG_CLUSTER_04_Rural',
'JHUNG_CLUSTER_04_Urban','JHUNG_CLUSTER_05_Rural',
'JHUNG_CLUSTER_05_Urban',
'SAHIWAL_CLUSTER_01_Rural','SAHIWAL_CLUSTER_01_Urban',
'SAHIWAL_CLUSTER_02_Rural',
'SAHIWAL_CLUSTER_02_Urban'], 'Center-2'),
dict.fromkeys(['JAMPUR_CLUSTER_01_Urban','RAJANPUR_CLUSTER_01_Rural',
'RAJANPUR_CLUSTER_01_Urban',
'JAMPUR_CLUSTER_01_Rural','DG_KHAN_CLUSTER_03_Rural',
'DG_KHAN_CLUSTER_03_Urban',
'DG_KHAN_CLUSTER_04_Rural','DG_KHAN_CLUSTER_04_Urban',
'SAHIWAL_CLUSTER_03_Rural',
'SAHIWAL_CLUSTER_03_Urban','KHANPUR_CLUSTER_01_Rural',
'KHANPUR_CLUSTER_01_Urban',
'RAHIMYARKHAN_CLUSTER_01_Rural','RAHIMYARKHAN_CLUSTER_01_Urban',
'AHMEDPUREAST_CLUSTER_01_Rural','AHMEDPUREAST_CLUSTER_01_Urban',
'ALIPUR_CLUSTER_01_Rural','ALIPUR_CLUSTER_01_Urban',
'BAHAWALPUR_CLUSTER_01_Rural','BAHAWALPUR_CLUSTER_01_Urban',
'BAHAWALPUR_CLUSTER_02_Rural','SAHIWAL_CLUSTER_04_Rural',
'SAHIWAL_CLUSTER_04_Urban','MULTAN_CLUSTER_01_Rural',
'MULTAN_CLUSTER_01_Urban','MULTAN_CLUSTER_02_Rural',
'MULTAN_CLUSTER_02_Urban',
'MULTAN_CLUSTER_03_Rural','MULTAN_CLUSTER_03_Urban',
'RYK DESERT_Cluster_Rural',
'SADIQABAD_CLUSTER_01_Rural','SAHIWAL_CLUSTER_05_Rural',
'SAHIWAL_CLUSTER_05_Urban',
'SAHIWAL_CLUSTER_06_Rural','SAHIWAL_CLUSTER_06_Urban',
'SAHIWAL_CLUSTER_07_Rural',
'SAHIWAL_CLUSTER_07_Urban'], 'Center-3'))
[12]: cluster_bh_rq_cs['DCR']=(cluster_bh_rq_cs['_DCR_N']/
cluster_bh_rq_cs['_DCR_D'])*100
[13]: cluster_bh_rq_cs['HSR']=(cluster_bh_rq_cs['_HSR%_N']/
cluster_bh_rq_cs['_HSR%_D'])*100
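DCR and HSR are computed here from summed numerator and denominator counters (`_DCR_N`/`_DCR_D`, `_HSR%_N`/`_HSR%_D`), not by averaging daily percentages. The small made-up example below shows why the two results differ when the denominators vary.

```python
import pandas as pd

# Two days for one cluster: drops (N) and calls (D), hypothetical counters
days = pd.DataFrame({
    "Location": ["C1", "C1"],
    "_DCR_N": [1, 30],
    "_DCR_D": [100, 2000],
})

# Mean of the daily ratios: (1% + 1.5%) / 2
mean_of_ratios = (days["_DCR_N"] / days["_DCR_D"] * 100).mean()

# Ratio of the summed counters: 31 / 2100
sums = days.groupby("Location")[["_DCR_N", "_DCR_D"]].sum()
sums["DCR"] = sums["_DCR_N"] / sums["_DCR_D"] * 100

print(round(mean_of_ratios, 3), round(sums.loc["C1", "DCR"], 3))  # 1.25 1.476
```

The ratio of sums weights each day by its call volume, which is usually what a period-level KPI is meant to express.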
[20]: cluster_bh_rq_cs_rs=pd.DataFrame(pd.melt(cluster_bh_rq_cs,id_vars=['Region','Location','Cluster Type'],\
[21]: sla=pd.DataFrame({
'KPI':['CSSR','CSSR','DCR','DCR','HSR','HSR','SDCCH GoS','SDCCH GoS','TCH GoS',\
'TCH GoS','MoB GoS','MoB GoS','DL RQI','DL RQI','UL RQI','UL RQI'],
'Cluster Type':['Urban','Rural','Urban','Rural','Urban','Rural','Urban','Rural',\
'Urban','Rural','Urban','Rural','Urban','Rural','Urban','Rural'],
'Target Value':[99.5,99,0.6,1,97.5,96,0.1,0.1,2,2,4,4,98.4,97,98.2,97.7]
})
# Transpose SLA Target
sla1 = sla.set_index(['KPI','Cluster Type']).T
[24]: non_sla_kpis=cluster_bh_rq_cs_rs_t\
[cluster_bh_rq_cs_rs_t.Comments=='Non Conformance']
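The comparison step that fills the `Comments` column is not shown. For these KPIs the direction of the comparison matters. CSSR, HSR and the RxQual indices should be at or above target, while DCR and the GoS KPIs should be at or below it. This split is inferred from the KPI definitions and the colour thresholds used in this chapter. The sketch below uses a few made-up rows.

```python
import pandas as pd

LOWER_IS_BETTER = {"DCR", "SDCCH GoS", "TCH GoS", "MoB GoS"}

sla = pd.DataFrame({
    "KPI": ["CSSR", "CSSR", "DCR", "DCR"],
    "Cluster Type": ["Urban", "Rural", "Urban", "Rural"],
    "Target Value": [99.5, 99, 0.6, 1],
})
# Long-format KPI values per cluster (hypothetical numbers)
kpis = pd.DataFrame({
    "Location": ["L1", "L2", "L1", "L2"],
    "KPI": ["CSSR", "CSSR", "DCR", "DCR"],
    "Cluster Type": ["Urban", "Rural", "Urban", "Rural"],
    "KPI-Value": [99.6, 98.7, 0.7, 0.9],
})

df = kpis.merge(sla, on=["KPI", "Cluster Type"], how="left")
lower = df["KPI"].isin(LOWER_IS_BETTER)
meets = ((lower & (df["KPI-Value"] <= df["Target Value"]))
         | (~lower & (df["KPI-Value"] >= df["Target Value"])))
df["Comments"] = meets.map({True: "Conformance", False: "Non Conformance"})
print(df)
```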
13.1.18 Summary
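[26]: kp3=kp3.iloc[:-1,:]
[27]: gg3=pd.DataFrame(kp3.stack()).reset_index()
gg4 = gg3.pivot_table(index=['Region','Cluster Type'],\
columns='KPI', aggfunc='sum').fillna(0)
# sub total across all KPI columns (sum(level=...) is not available in pandas 2.x;
# the pivot has a single value column, so a plain row-wise sum gives the same total)
gg4['Total NC KPIs']= gg4.sum(axis=1)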
13.1.19 HQ Format
[29]: qformat=cluster_bh_rq_cs.pivot_table(index=['Region','Location'],\
columns='Cluster Type',\
values=['CSSR','DCR','HSR',\
'SDCCH GoS','TCH GoS','MoB GoS','DL RQI','UL RQI'],\
aggfunc='sum').\
fillna('N/A').\
sort_index(level=[0,1],axis=1,ascending=[True,False])
[31]: qformat=qformat.reset_index()
[32]: #export
qformat.to_excel('Quarter_Conformanc.xlsx',engine='openpyxl',na_rep='N/A')
[33]: #import
aa=pd.read_excel('Quarter_Conformanc.xlsx',header=[0,1])
[34]: # Formatting
bb=aa.style\
.applymap(lambda x: 'color: black'
if pd.isnull(x)
else 'background-color: %s' % 'green'
if x>=99.50 else 'background-color: %s' % 'red'
,subset=[('CSSR','Urban')])\
.applymap(lambda x: 'color: black' if pd.isnull(x)
else 'background-color: %s' % 'green'
if x>=99.00
else 'background-color: %s' % 'red'
,subset=[('CSSR','Rural')])\
.applymap(lambda x: 'color: black'
if pd.isnull(x)
else 'background-color: %s' % 'green'
if x<=0.60 else 'background-color: %s' % 'red'
,subset=[('DCR','Urban')])\
.applymap(lambda x: 'color: black'
if pd.isnull(x)
,subset=[('DL RQI','Urban')])\
.applymap(lambda x: 'color: black'
if pd.isnull(x)
else 'background-color: %s' % 'green'
if x>=97.00 else 'background-color: %s' % 'red'
,subset=[('DL RQI','Rural')])\
.applymap(lambda x: 'color: black'
if pd.isnull(x)
else 'background-color: %s' % 'green'
if x>=98.20 else 'background-color: %s' % 'red'
,subset=[('UL RQI','Urban')])\
.applymap(lambda x: 'color: black'
if pd.isnull(x)
else 'background-color: %s' % 'green'
if x>=97.70 else 'background-color: %s' % 'red'
,subset=[('UL RQI','Rural')])
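The chained `applymap` calls differ only in threshold and direction, so they can be generated from a small helper. This is a minimal sketch that assumes the same colour rules as above: green when the target is met, red otherwise, and plain black text for missing values. `sla_style`, `RULES` and `style_sla` are hypothetical names, and the targets are copied from the `sla` table in cell [21]. Note that `Styler.applymap` was renamed `Styler.map` in pandas 2.1.

```python
import pandas as pd


def sla_style(target, higher_is_better=True):
    """Return a cell-styling function for Styler.map / Styler.applymap."""
    def style(x):
        if pd.isnull(x) or isinstance(x, str):  # NaN or an 'N/A' placeholder
            return "color: black"
        ok = x >= target if higher_is_better else x <= target
        return "background-color: %s" % ("green" if ok else "red")
    return style


# (KPI, Cluster Type) -> (target, higher_is_better)
RULES = {
    ("CSSR", "Urban"): (99.5, True), ("CSSR", "Rural"): (99.0, True),
    ("DCR", "Urban"): (0.6, False), ("DCR", "Rural"): (1.0, False),
    ("HSR", "Urban"): (97.5, True), ("HSR", "Rural"): (96.0, True),
    ("SDCCH GoS", "Urban"): (0.1, False), ("SDCCH GoS", "Rural"): (0.1, False),
    ("TCH GoS", "Urban"): (2.0, False), ("TCH GoS", "Rural"): (2.0, False),
    ("MoB GoS", "Urban"): (4.0, False), ("MoB GoS", "Rural"): (4.0, False),
    ("DL RQI", "Urban"): (98.4, True), ("DL RQI", "Rural"): (97.0, True),
    ("UL RQI", "Urban"): (98.2, True), ("UL RQI", "Rural"): (97.7, True),
}


def style_sla(styler):
    """Apply every rule in RULES to a Styler built from the two-level header frame."""
    for column, (target, higher) in RULES.items():
        styler = styler.map(sla_style(target, higher), subset=[column])
    return styler
```

With the frame read back from Excel, this would be used as `bb = style_sla(aa.style)`. On pandas older than 2.1, replace `.map` with `.applymap` inside `style_sla`.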
14. GSM Quarterly Data Reshape
[1]: import os
import pandas as pd
from glob import glob
[4]: df=pd.melt(concatdf,\
id_vars=[('Region', 'Unnamed: 0_level_1'),\
('Cell Group', 'Unnamed: 1_level_1'),\
('Quatr','')],
var_name=["KPI-Name",'Cluster-Sub-Zone'],\
value_name='KPI-Value')
[7]: df1.to_csv('2G_Quatrly_Data_Reshape.csv',index=False)
15. Traffic Analysis
[1]: import os
import numpy as np
import pandas as pd
[3]: df = pd.read_excel('Center_Traffic.xlsx',\
sheet_name='2G DA',\
converters={'Integrity': lambda value: '{:,.0f}%'.format(value * 100)},\
parse_dates=['Date'])\
.rename(columns={'GCell Group':'Cluster',\
'Global Traffic':'CS Traffic',\
'Payload(GB)':'PS Traffic'})
[7]: df2=pd.concat([df,df0,df1])
[9]: df3=df2.pivot_table(index=["Date",'Region'],\
columns="Tech",\
values=['CS Traffic','PS Traffic'],\
margins=True,\
aggfunc='sum').\
fillna(0).reset_index().\
iloc[:-1, :]   # drop the 'All' grand-total row added by margins=True
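The whole reshaping can be reproduced on small in-memory frames. This sketch assumes the three sheets share `Date`, `Region`, `CS Traffic` and `PS Traffic` columns; the values are made up. `assign` adds the `Tech` column, `concat` stacks the frames, and `pivot_table(..., margins=True)` adds `All` totals. The trailing `iloc[:-1]` drops the `All` row but keeps the `All` columns.

```python
import pandas as pd


def sheet(tech, cs, ps):
    # One hypothetical day/region row per technology
    return pd.DataFrame({"Date": ["2021-05-01"], "Region": ["Center-1"],
                         "CS Traffic": [cs], "PS Traffic": [ps]}).assign(Tech=tech)


df2 = pd.concat([sheet("2G", 100.0, 5.0), sheet("3G", 40.0, 80.0),
                 sheet("4G", 0.0, 900.0)], ignore_index=True)

df3 = (df2.pivot_table(index=["Date", "Region"], columns="Tech",
                       values=["CS Traffic", "PS Traffic"],
                       margins=True, aggfunc="sum")
          .fillna(0)
          .reset_index()
          .iloc[:-1, :])  # drop the grand-total row, keep the per-row 'All' columns
print(df3)
```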
[11]: df3.to_csv('GUL_Daily.csv',index=False)
16. Transpose All the Tabs in Excel Sheet
[1]: import os
import pandas as pd
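The notebook cells for this chapter are not shown. In outline, `pd.read_excel(..., sheet_name=None)` reads every tab into a dict keyed by sheet name. Each frame can then be transposed and written back with `pd.ExcelWriter`. This sketch assumes the first column of each sheet holds the labels that should become the new header row; the helper names are hypothetical.

```python
import pandas as pd


def transpose_sheets(sheets):
    """Transpose every DataFrame in a {sheet_name: DataFrame} dict.

    The first column is used as the index so that its values become
    the header of the transposed sheet (an assumption about the layout).
    """
    return {name: frame.set_index(frame.columns[0]).T for name, frame in sheets.items()}


def transpose_workbook(src, dst):
    # sheet_name=None reads every tab into a dict keyed by sheet name
    sheets = pd.read_excel(src, sheet_name=None)
    with pd.ExcelWriter(dst) as writer:
        for name, frame in transpose_sheets(sheets).items():
            frame.to_excel(writer, sheet_name=name)
```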
17. LTE High Utilize Cells
[1]: import os
import pandas as pd
from glob import glob
[7]: hucellcount['Total_NW_High_Utilize_Cells']=hucellcount\
['Central_High_Utilize_Cells']+\
hucellcount['North_High_Utilize_Cells']+\
hucellcount['South_High_Utilize_Cells']
[14]: rfe['%age_of_North_High_Utilize_Cells']=\
(rfe['North_High_Utilize_Cells']/rfe['North_Total_Cells'])*100
[15]: rfe['%age_of_South_High_Utilize_Cells']=\
(rfe['South_High_Utilize_Cells']/rfe['South_Total_Cells'])*100
[16]: rfe['%age_of_NW_High_Utilize_Cells']=\
(rfe['Total_NW_High_Utilize_Cells']/rfe['Total_NW_Cells'])*100
18. Genex Cloud ACP UMTS Engineering Parameters
[1]: import os
import numpy as np
import pandas as pd
from glob import glob
[6]: df = pd.read_csv('Counter.csv',header=3,\
skipfooter=1,engine='python',\
parse_dates=["Date"])
[14]: df4.to_csv('GENEXCloud_Platform_ACP_UMTS_Engineer_Parameter.csv',index=False)
19. GSM CGID Logs Analysis
[1]: import os
import glob
import zipfile
import pandas as pd
[3]: # get the file names in the directory and all its subdirectories
nicFiles = list()
for (root, dirnames, filenames) in os.walk(path):
    nicFiles += [os.path.join(root, file) for file in filenames]
# filter the required files
matchers = ['.log_']
matching = [s for s in nicFiles if any(xs in s for xs in matchers)]
[4]: # read all the CGID log files, keeping only columns 0 (Time) and 12 (Cell Index)
df_from_each_file = (pd.read_csv(f,header=None,usecols=[0,12],\
names=['Time','Cell Index'])\
.assign(File=f.split('.')[0]) for f in matching)
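Here is the walk-filter-read-concat sequence collected into one hypothetical helper. Two design choices differ from the cells above and are assumptions. First, it matches the marker against the file name only and tags rows with the base name, so dots in directory names cannot affect the `File` value. Second, it renames the integer columns after reading instead of passing `names=`.

```python
import os

import pandas as pd


def load_cgid_logs(root, marker=".log_"):
    """Read every file under `root` whose name contains `marker`.

    Keeps column 0 (Time) and column 12 (Cell Index) and records the
    file name (without extension) in a 'File' column.
    """
    paths = [os.path.join(folder, name)
             for folder, _dirs, names in os.walk(root)
             for name in names if marker in name]
    frames = (pd.read_csv(p, header=None, usecols=[0, 12])
                .rename(columns={0: "Time", 12: "Cell Index"})
                .assign(File=os.path.basename(p).split(".")[0])
              for p in sorted(paths))
    return pd.concat(frames, ignore_index=True)
```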
[13]: cgid_log_analysis=dff1.pivot_table\
(index=['NE Name','Cell Index','BTS Name','Cell Name']\
,columns="Time",values='File',aggfunc='count').\
fillna(0).reset_index()
[14]: cgid_log_analysis.to_csv('CGID_Analysis_File.csv',index=False)
20. External Interference Tracker Parser
[1]: import os
import numpy as np
import pandas as pd
[3]: df = pd.read_excel('input.xlsx',sheet_name=0)
df.to_csv('External_Interference.txt',index=False)
df1 = pd.read_csv('External_Interference.txt')
21. External Interference Tracker Final Output
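[4]: # count the distinct cells (CIs) listed in each RCA row
df1['Count']=df1['CI'].str.split(';').\
apply(set).str.len()
# normalise ';' separators to '_' (substring replace), then split each CI list into a set
df.loc[:,'June_CI'] = df['June_CI'].\
fillna('').astype(str).\
str.replace(';','_',regex=False).\
str.split('_').\
apply(set)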
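Once each row holds a set of CIs, comparing two trackers is plain set arithmetic. This sketch assumes hypothetical `June_CI` and `July_CI` columns with made-up values. The differences are joined back into `;`-separated strings so that they export cleanly to CSV.

```python
import pandas as pd


def to_ci_set(series):
    """Turn ';'- or '_'-separated CI strings into sets (empty set for missing values)."""
    return (series.fillna("").astype(str)
                  .str.replace(";", "_", regex=False)
                  .str.split("_")
                  .apply(lambda parts: {p.strip() for p in parts if p.strip()}))


df = pd.DataFrame({"Site": ["S1", "S2"],
                   "June_CI": ["101;102", None],
                   "July_CI": ["102_103", "201"]})
june, july = to_ci_set(df["June_CI"]), to_ci_set(df["July_CI"])

# Set differences per row, joined back into ';'-separated strings for export
df["New_CI"] = [";".join(sorted(b - a)) for a, b in zip(june, july)]      # only in July
df["Cleared_CI"] = [";".join(sorted(a - b)) for a, b in zip(june, july)]  # only in June
print(df)
```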
[8]: #export
df1.to_csv('Compare_results.csv',index=False)
22. UMTS Timers Merge w.r.t. Column
• 3G Timers
[6]: import os
import pandas as pd
[11]: finaldf.to_csv('3G_Timers_Ouput.csv',index=False)
23. GSM DSP Formatting
[1]: import os
import zipfile
import numpy as np
import pandas as pd
from glob import glob
[7]: # strip, then turn runs of 2+ whitespace characters into ';'
df['License Identifier_License Item_Allocated_Usage']= \
df['License Identifier_License Item_Allocated_Usage']\
.str.strip().str.replace(r'\s\s+', ';', regex=True)
[8]: #split
df[['License Identifier', 'License Item', 'Allocated','Usage']] = \
df['License Identifier_License Item_Allocated_Usage']\
.str.split(';', expand=True)
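In pandas 2.0 and later, `Series.str.replace` treats the pattern as a literal string unless `regex=True` is passed. The strip, replace and split sequence is shown here on a single made-up DSP line. Runs of two or more spaces act as column separators, while single spaces inside item names survive.

```python
import pandas as pd

col = "License Identifier_License Item_Allocated_Usage"
df = pd.DataFrame({col: ["  LIC001   GSM TRX Number    120    96  "]})

# Collapse runs of 2+ whitespace characters into ';'
df[col] = df[col].str.strip().str.replace(r"\s\s+", ";", regex=True)
# Split on ';' into four new columns
df[["License Identifier", "License Item", "Allocated", "Usage"]] = \
    df[col].str.split(";", expand=True)
print(df.iloc[0].tolist())
```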
[15]: # export
df0.to_csv('DSP_2G_output.csv',index=False)
24. UMTS NodeB DSP Formatting
[1]: import os
import zipfile
import numpy as np
import pandas as pd
from glob import glob
[3]: df=pd.read_fwf('DSP_NodeB.txt',\
colspecs = [(0,500),(0,15),(16,30),(31,50),(51,220)],\
names=['NodeName',\
'Operator Index',\
'Operator Name',\
'License Identifier',\
'License Item_Allocated_Expiration Date'],\
comment='+++')
[9]: df.to_csv('DSP_3GNodeB_output.csv',index=False)
25. UMTS RNC DSP Formatting
[1]: import os
import zipfile
import numpy as np
import pandas as pd
from glob import glob
[3]: df=pd.read_fwf('DSP_RNC_3G.txt',\
colspecs = [(0,20),(0,20),(20,35),(35,55),(55,300)],\
names=['RNCName','Cn Operator Index', \
'Operator Name',\
'License Identifier',\
'License Item_Allocated_Usage'],\
comment='+++')
[6]: # strip, then turn runs of 2+ whitespace characters into ';'
df['License Item_Allocated_Usage']= df['License Item_Allocated_Usage']\
.str.strip().str.replace(r'\s\s+', ';', regex=True)
[9]: df.to_csv('DSP_3GRNC_output.csv',index=False)
26. Frequency Export
[1]: import os
import zipfile
import numpy as np
import pandas as pd
from glob import glob
[15]: df2=df2[list(df2.columns[0:2])+list(df2.columns[7:])]
df5.loc[:,'MAL'] = df5.loc[:,'MAL'].\
fillna('-1').astype(str).\
str.split(';').\
apply(lambda x: {int(i) for i in x})   # set of MAL frequencies as integers
[23]: df5.to_csv('Frequency_Export2.csv',index=False)
27. GSM RF Export Parameter Utilization
[1]: import os
import pandas as pd
[7]: df_value_counts.to_csv('2G_RF_Export_GCELL_Audit.csv',index=False)
[11]: fds.to_csv('Discrepancy_Cell_List.csv',index=False)
28. Pre Post GSM RF Export Audit
[1]: import os
import pandas as pd
29. UMTS RF Export Audit
[1]: import os
import pandas as pd
[9]: df_value_counts.to_csv('3G_RF_Export_GCELL_Audit.csv',index=False)
30. Merge ZTE UMTS RF Exports
• 3G RF Export
[1]: import os
import glob
import pandas as pd
[3]: %%time
sheets = pd.ExcelFile(all_csv[0]).sheet_names
dfs = {s: pd.concat(pd.read_excel(f, sheet_name=s,header=0,skiprows=[1,2,3,4]) \
for f in all_csv) for s in sheets}
31. Miscellaneous Operations
[1]: import os
import pandas as pd
[3]: df = pd.read_csv('MIMO.csv',header=5)
[4]: df.dtypes
31.1.6 The ‘RANK=2 Ratio’ column has data type object, so it has to be converted to float
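A column ends up as `object` when some of its cells hold text, such as a percent sign or a placeholder like `NIL`. This sketch of the conversion assumes those are the only non-numeric contents; the sample values are made up. It strips a trailing `%` and then lets `pd.to_numeric(..., errors='coerce')` turn anything unparseable into NaN.

```python
import pandas as pd

df = pd.DataFrame({"RANK=2 Ratio": ["45.2%", "NIL", "38", None]})
print(df.dtypes["RANK=2 Ratio"])  # object

df["RANK=2 Ratio"] = pd.to_numeric(
    df["RANK=2 Ratio"].str.rstrip("%"),  # drop a trailing '%'
    errors="coerce",                     # 'NIL' (and missing values) -> NaN
)
print(df.dtypes["RANK=2 Ratio"])  # float64
```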
[7]: df0.to_csv('MIMO_Output.csv',index=False)
31.2 Conditional Filtering in a Python List Using Regex
[8]: import os
import re
import pandas as pd
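Filtering a list with a regular expression usually means compiling the pattern once and keeping the items it matches. This sketch uses made-up KPI column names, and the pattern itself is only illustrative.

```python
import re

columns = ["Date", "Cell Name", "CSSR(%)", "DCR(%)", "TCH Traffic (Erl)",
           "SDCCH GOS(%)", "TCH GOS(%)"]

# Keep every percentage KPI whose name contains 'GOS' (case-insensitive)
pattern = re.compile(r"gos.*\(%\)$", re.IGNORECASE)
gos_kpis = [c for c in columns if pattern.search(c)]

# Everything ending in '(%)' except the GoS KPIs
other_pct = [c for c in columns if c.endswith("(%)") and c not in gos_kpis]
print(gos_kpis)   # ['SDCCH GOS(%)', 'TCH GOS(%)']
print(other_pct)  # ['CSSR(%)', 'DCR(%)']
```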
[15]: deg.to_csv('degraded_kpis_output.csv',index=False)
32. BH KPIs Month Level
The following PRS report is used to prepare the Cell on Cluster Busy Hour data:
• Cluster BH Report
• The input file must be in .zip and .csv format, with Date and Time in separate columns
[1]: import os
import zipfile
import numpy as np
import pandas as pd
from glob import glob
from collections import ChainMap
df_sum['SDCCH_GOS'] = (df_sum['_GOS-SDCCH(%)_N']/df_sum['_GOS-SDCCH(%)_D'])*100
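At month or week level, the KPI is recomputed from summed counters, in the same way as `SDCCH_GOS` above. This sketch of the grouping step assumes a datetime `Date` column and uses hypothetical counter values. `dt.to_period('M')` gives calendar-month buckets, and `'W'` would give weekly ones.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-04-29", "2021-04-30", "2021-05-01", "2021-05-02"]),
    "Location": ["C1"] * 4,
    "_GOS-SDCCH(%)_N": [1, 3, 0, 2],
    "_GOS-SDCCH(%)_D": [1000, 1000, 2000, 2000],
})

df["Month"] = df["Date"].dt.to_period("M")  # use "W" for weekly buckets
df_sum = (df.groupby(["Month", "Location"], as_index=False)
            [["_GOS-SDCCH(%)_N", "_GOS-SDCCH(%)_D"]].sum())
df_sum["SDCCH_GOS"] = (df_sum["_GOS-SDCCH(%)_N"] / df_sum["_GOS-SDCCH(%)_D"]) * 100
print(df_sum)
```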
[12]: dff.to_csv('BH_Monthly_Level_KPIs.csv',index=False)
33. DA KPIs Month Level
The following PRS report is used to prepare the Cell on Cluster Busy Hour data:
• Cluster DA Report
• The input file must be in .zip and .csv format, with Date and Time in separate columns
[1]: import os
import zipfile
import numpy as np
import pandas as pd
from glob import glob
from collections import ChainMap
(1-(df_sum['CSSR_Non Blocking_2_N']/df_sum['CSSR_Non Blocking_2_D']))*100
[12]: dff.to_csv('DA_Monthly_Level_KPIs.csv',index=False)
34. 2G RF Utilization
The following PRS report is used to prepare the Cell on Cluster Busy Hour data:
[1]: import os
import zipfile
import numpy as np
import pandas as pd
from glob import glob
[3]: df = pd.read_html('https://github.com/Umersaeed81/KPI_Data_Set_For_Python_Scripts/blob/main/erlang_b.csv')
df=df[0]
df=df.iloc[:,1:].\
astype({'No of Trunks (N)':'str'}).\
rename(columns={"with 2% Blocking":"Offer Traffic"})
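Scraping the rendered GitHub page depends on GitHub's HTML layout. As an alternative, the offered traffic at 2% blocking can be computed directly from the Erlang B formula. This sketch uses the standard recursion B(0) = 1, B(n) = A·B(n-1) / (n + A·B(n-1)) and a bisection on the traffic A. The function names are illustrative, and the resulting values should be checked against the table used above before being substituted.

```python
import pandas as pd


def erlang_b(traffic, trunks):
    """Erlang B blocking probability for `traffic` Erlang offered to `trunks` circuits."""
    b = 1.0  # B(0) = 1
    for n in range(1, trunks + 1):
        b = traffic * b / (n + traffic * b)
    return b


def offered_traffic(trunks, blocking=0.02, tol=1e-9):
    """Largest offered traffic (Erlang) whose Erlang B blocking stays <= `blocking`."""
    lo, hi = 0.0, 1.0
    while erlang_b(hi, trunks) < blocking:  # widen the bracket until it overshoots
        hi *= 2
    while hi - lo > tol:                    # blocking increases with traffic: bisect
        mid = (lo + hi) / 2
        if erlang_b(mid, trunks) <= blocking:
            lo = mid
        else:
            hi = mid
    return lo


# Same two columns as the scraped table: trunks as text, traffic at 2% blocking
erlang_table = pd.DataFrame({"No of Trunks (N)": [str(n) for n in range(1, 101)]})
erlang_table["Offer Traffic"] = [round(offered_traffic(n), 2) for n in range(1, 101)]
print(erlang_table.head())
```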
[15]: df_n_bh=df1.loc[df1.groupby(['Date'])\
['GlobelTraffic'].idxmax()]
[16]: df0.to_csv('2G_Cell_Hourly_RF_Utilization.csv',index=False)
df_cell_bh.to_csv('2G_Cell_Busy_Hourly_RF_Utilization.csv',index=False)
df1.to_csv('2G_Network_Hourly_RF_Utilization.csv',index=False)
df_n_bh.to_csv('2G_Network_BH_RF_Utilization.csv',index=False)
[17]: df_sla=pd.DataFrame({
'KPI':['GOS-SDCCH(%)','CallSetup TCH GOS(%)','Mobility TCH GOS(%)'],
'Target Value':[0.1,2,2]})
[21]: kp3.to_excel('Summary.xlsx')