Cyber Security Coding in Python

Uploaded by rasheedmumuni

# This Python 3 environment comes with many helpful analytics libraries installed.
# It is defined by the Python Docker image.
# For example, here are several helpful packages to load:

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O
import matplotlib.pyplot as plt  # data visualization
import seaborn as sns  # data visualization

# Input data files are available in the read-only "/input/" directory.
# For example, running this will list all files under the input directory:
import os
for dirname, _, filenames in os.walk('/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/opt/conda/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.5
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
/input/cyber-security-breaches/CyberSecurityBreaches.csv
In [2]:
# Load the dataset
health_data = pd.read_csv('/input/cyber-security-breaches/CyberSecurityBreaches.csv')
# Display the first 5 rows
health_data.head()
Out[2]:

| | S/n | Name | Type | Individuals | Date | Breach | Location | Associate | Description |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Brooke Army Medical Center | Healthcare Provider | 1000 | 2009-10-21 | Theft | Paper/Films | False | A binder containing the protected health information |
| 1 | 2 | Kidney Stone Association, LLC | Healthcare Provider | 1000 | 2009-10-28 | Theft | Network Server | False | Five desktop computers containing unencrypted |
| 2 | 3 | Department of Health and Social Services | Healthcare Provider | 501 | 2009-10-30 | Theft | Other Portable Electronic Device | False | \N |
| 3 | 4 | Health Services for Children with Special Need | Health Plan | 3800 | 2009-11-17 | Loss | Laptop | False | A laptop was lost by an employee while in transit |
| 4 | 5 | Douglas Carlson | Healthcare Provider | 5257 | 2009-11-20 | Theft | Desktop Computer | False | A shared Computer that was used for backup was hacked |

In [3]:
# Display the number of rows and columns of the DataFrame
health_data.shape
Out[3]:
(1151, 10)
In [4]:
# Display the summary of the DataFrame
health_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1151 entries, 0 to 1150
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unnamed: 0 1151 non-null int64
1 Name.of.Covered.Entity 1151 non-null object
2 State 1151 non-null object
3 Covered.Entity.Type 1151 non-null object
4 Individuals.Affected 1151 non-null int64
5 Breach.Submission.Date 1151 non-null object
6 Type.of.Breach 1151 non-null object
7 Location.of.Breached.Information 1151 non-null object
8 Business.Associate.Present 1151 non-null bool
9 Web.Description 1101 non-null object
dtypes: bool(1), int64(2), object(7)
memory usage: 82.2+ KB

 ### Data Cleaning

In [5]:
# Drop unwanted columns
health_data.drop(columns=['Unnamed: 0'], inplace=True)
In [6]:
# Replace '.' with ' ' in column names
health_data.columns = health_data.columns.str.replace('.', ' ', regex=False)
In [7]:
# Check for missing values
health_data.isna().sum()
Out[7]:
Name of Covered Entity 0
State 0
Covered Entity Type 0
Individuals Affected 0
Breach Submission Date 0
Type of Breach 0
Location of Breached Information 0
Business Associate Present 0
Web Description 50
dtype: int64
In [8]:
# Handle missing values in the 'Web Description' column by filling them with 'Not Available'
health_data['Web Description'] = health_data['Web Description'].fillna('Not Available')

# Replace the run of problematic '\x1a' (SUB) characters with a single apostrophe "'"
health_data['Web Description'] = health_data['Web Description'].str.replace(
    '\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a\x1a', "'", regex=False)

# Remove any literal '\n\\' sequences (the backslashes are escaped in the pattern)
health_data['Web Description'] = health_data['Web Description'].str.replace(
    '\\n\\\\', '', regex=False)
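The cleaning steps above can also be collected into one reusable function, which makes them easy to re-run on a fresh copy of the CSV. This is a minimal sketch, assuming the raw column names reported by `info()` earlier; the function name `clean_breach_data` is hypothetical, and the small DataFrame in the usage example is synthetic, not taken from the dataset.

```python
import pandas as pd


def clean_breach_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the notebook's cleaning steps to a copy of the raw breach data (hypothetical helper)."""
    df = raw.drop(columns=['Unnamed: 0'])  # returns a new frame; raw is left unchanged
    # Turn dotted column names such as 'Web.Description' into 'Web Description'
    df.columns = df.columns.str.replace('.', ' ', regex=False)
    desc = df['Web Description'].fillna('Not Available')
    # Collapse the run of nine SUB characters into a single apostrophe
    desc = desc.str.replace('\x1a' * 9, "'", regex=False)
    # Drop literal backslash-n-backslash-backslash sequences
    desc = desc.str.replace('\\n\\\\', '', regex=False)
    df['Web Description'] = desc
    return df


# Usage on a tiny synthetic frame
raw = pd.DataFrame({
    'Unnamed: 0': [1, 2, 3],
    'Name.of.Covered.Entity': ['Clinic A', 'Clinic B', 'Clinic C'],
    'Web.Description': ['The provider' + '\x1a' * 9 + 's laptop', None,
                        'Line one\\n\\\\Line two'],
})
cleaned = clean_breach_data(raw)
print(cleaned)
```

Returning a new DataFrame instead of using `inplace=True` keeps the raw data available for comparison and avoids chained-assignment warnings in recent pandas versions.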
In [9]:
# Remove duplicate rows from the DataFrame
health_data_no_duplicates = health_data.drop_duplicates()

# Print the shape of the DataFrame before and after removing duplicates
print("Shape of DataFrame before removing duplicates:", health_data.shape)
print("Shape of DataFrame after removing duplicates:",
health_data_no_duplicates.shape)
Shape of DataFrame before removing duplicates: (1151, 9)
Shape of DataFrame after removing duplicates: (1151, 9)
In [10]:
unique_location = health_data['Location of Breached Information'].value_counts()
print(unique_location)
Paper/Films 254
Laptop 222
Other 132
Network Server 127
Desktop Computer 108
Other Portable Electronic Device 68
Email 66
Other, Other Portable Electronic Device 44
Electronic Medical Record 30
Laptop, Other Portable Electronic Device 11
Desktop Computer, Laptop 10
Desktop Computer, Network Server 8
Laptop, Paper/Films 6
Desktop Computer, Email, Laptop, Network Server 6
Desktop Computer, Paper/Films 5
Other, Paper/Films 4
Desktop Computer, Laptop, Other Portable Electronic Device 4
Email, Network Server 4
Electronic Medical Record, Other 4
Electronic Medical Record, Paper/Films 4
Desktop Computer, Electronic Medical Record 2
Electronic Medical Record, Laptop 2
Email, Other 2
Desktop Computer, Other Portable Electronic Device 2
Laptop, Network Server 2
Email, Other Portable Electronic Device 2
Electronic Medical Record, Other, Other Portable Electronic Device 2
Electronic Medical Record, Network Server 1
Email, Laptop 1
Desktop Computer, Network Server, Paper/Films 1
Desktop Computer, Email, Laptop, Network Server, Other, Other Portable Electronic Device 1
Desktop Computer, Email 1
Laptop, Other 1
Desktop Computer, Laptop, Network Server 1
Email, Laptop, Other Portable Electronic Device 1
Email, Laptop, Network Server 1
Desktop Computer, Network Server, Other, Other Portable Electronic Device 1
Desktop Computer, Other 1
Desktop Computer, Electronic Medical Record, Email, Network Server, Paper/Films 1
Network Server, Other 1
Desktop Computer, Electronic Medical Record, Email, Laptop, Network Server, Other, Other Portable Electronic Device, Paper/Films 1
Laptop, Other Portable Electronic Device, Paper/Films 1
Desktop Computer, Electronic Medical Record, Email, Laptop, Network Server, Other, Other Portable Electronic Device 1
Desktop Computer, Other, Other Portable Electronic Device 1
Desktop Computer, Laptop, Other, Other Portable Electronic Device 1
Desktop Computer, Electronic Medical Record, Network Server 1
Other Portable Electronic Device, Paper/Films 1
Name: Location of Breached Information, dtype: int64
In [11]:
location_mapping = {
'Paper/Films': 'Physical',
'Laptop': 'Electronic',
'Network Server': 'Electronic',
'Desktop Computer': 'Electronic',
'Other Portable Electronic Device': 'Electronic',
'Email': 'Electronic',
'Electronic Medical Record': 'Electronic',
'Other': 'Other'
}
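# Note: combined entries such as 'Desktop Computer, Laptop' are not keys in the mapping,
# so .map() leaves them as NaN
health_data['Location Group'] = health_data['Location of Breached Information'].map(location_mapping)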
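Because `.map()` only matches exact keys, rows that list several locations (for example `'Desktop Computer, Laptop'`) receive no group and end up as missing values in `Location Group`. One possible refinement, sketched below under the assumption that multiple locations are always separated by commas, is to map each part separately and label rows that span more than one group as `'Mixed'`. The helper name `group_locations` and the `'Mixed'` and `'Unknown'` labels are hypothetical choices, not part of the original analysis; the sample Series is synthetic.

```python
import pandas as pd

location_mapping = {
    'Paper/Films': 'Physical',
    'Laptop': 'Electronic',
    'Network Server': 'Electronic',
    'Desktop Computer': 'Electronic',
    'Other Portable Electronic Device': 'Electronic',
    'Email': 'Electronic',
    'Electronic Medical Record': 'Electronic',
    'Other': 'Other',
}


def group_locations(value: str) -> str:
    """Map a possibly comma-separated location string to a single group (hypothetical helper)."""
    parts = [part.strip() for part in value.split(',')]
    # Parts missing from the mapping are skipped rather than raising an error
    groups = {location_mapping[p] for p in parts if p in location_mapping}
    if not groups:
        return 'Unknown'
    if len(groups) == 1:
        return groups.pop()
    return 'Mixed'  # the record spans more than one location group


locations = pd.Series([
    'Laptop',
    'Desktop Computer, Laptop',
    'Laptop, Paper/Films',
    'Other, Other Portable Electronic Device',
])
print(locations.map(group_locations).tolist())
```

Applied to the notebook, this would be `health_data['Location of Breached Information'].map(group_locations)`, which assigns a group to every row instead of leaving the multi-location records empty.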
In [12]:
unique_breach_types = health_data['Type of Breach'].value_counts()
print(unique_breach_types)
Theft 577
Unauthorized Access/Disclosure 183
Other 89
Loss 79
Hacking/IT Incident 77
Improper Disposal 42
Theft, Unauthorized Access/Disclosure 24
Loss, Theft 15
Hacking/IT Incident, Unauthorized Access/Disclosure 10
Unknown 10
Other, Unauthorized Access/Disclosure 7
Other, Theft 5
Loss, Unauthorized Access/Disclosure 5
Improper Disposal, Loss, Theft 3
Hacking/IT Incident, Theft, Unauthorized Access/Disclosure 3
Improper Disposal, Loss 3
Loss, Other 2
Improper Disposal, Unauthorized Access/Disclosure 2
Other, Theft, Unauthorized Access/Disclosure 2
Loss, Unknown 2
Other, Unknown 2
Hacking/IT Incident, Other 2
Loss, Unauthorized Access/Disclosure, Unknown 1
Hacking/IT Incident, Other, Unauthorized Access/Disclosure 1
Hacking/IT Incident, Theft 1
Loss, Other, Theft 1
Improper Disposal, Theft, Unauthorized Access/Disclosure 1
Unauthorized Access/Disclosure 1
Theft, Unauthorized Access/Disclosure, Unknown 1
Name: Type of Breach, dtype: int64
In [13]:
# Define a dictionary to map breach types to groups
breach_type_mapping = {
'Hacking/IT Incident': 'IT Incident',
'Improper Disposal': 'Physical Loss',
'Loss': 'Physical Loss',
'Theft': 'Physical Theft',
'Unauthorized Access/Disclosure': 'Unauthorized Access',
'Unknown': 'Unknown',
'Other': 'Other',
}

# Create a new column 'Breach Type Group' based on the mapping

# Combined entries such as 'Loss, Theft' are not keys in the mapping, so they become NaN
health_data['Breach Type Group'] = health_data['Type of Breach'].map(breach_type_mapping)

# Check the unique values in the new 'Breach Type Group' column
unique_breach_type_groups = health_data['Breach Type Group'].unique()
print(unique_breach_type_groups)
['Physical Theft' 'Physical Loss' 'Other' 'Unauthorized Access'
'IT Incident' nan 'Unknown']
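The `nan` in this output comes from combined entries such as `'Loss, Theft'`; the second `'Unauthorized Access/Disclosure'` line in the earlier counts likely differs only by surrounding whitespace. If the goal is to count how often each breach type occurs at all, one alternative is to split each entry, strip whitespace, and `explode` the result so that a record listing two types contributes to both. This is a sketch on a synthetic Series; note that it counts type mentions rather than records, so the total can exceed the number of rows.

```python
import pandas as pd

breach_type_mapping = {
    'Hacking/IT Incident': 'IT Incident',
    'Improper Disposal': 'Physical Loss',
    'Loss': 'Physical Loss',
    'Theft': 'Physical Theft',
    'Unauthorized Access/Disclosure': 'Unauthorized Access',
    'Unknown': 'Unknown',
    'Other': 'Other',
}

breach_types = pd.Series([
    'Theft',
    'Loss, Theft',
    'Unauthorized Access/Disclosure ',  # trailing space, as suspected in the data
    'Hacking/IT Incident, Unauthorized Access/Disclosure',
])

# One row per (record, breach type) pair, with surrounding whitespace removed
exploded = breach_types.str.split(',').explode().str.strip()
group_counts = exploded.map(breach_type_mapping).value_counts()
print(group_counts)
```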
In [14]:
health_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1151 entries, 0 to 1150
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Name of Covered Entity 1151 non-null object
1 State 1151 non-null object
2 Covered Entity Type 1151 non-null object
3 Individuals Affected 1151 non-null int64
4 Breach Submission Date 1151 non-null object
5 Type of Breach 1151 non-null object
6 Location of Breached Information 1151 non-null object
7 Business Associate Present 1151 non-null bool
8 Web Description 1151 non-null object
9 Location Group 1007 non-null object
10 Breach Type Group 1057 non-null object
dtypes: bool(1), int64(1), object(9)
memory usage: 91.2+ KB

 ### Data Visualization

 #### Univariate Visualization

 ##### Bar Chart
 ##### Pie Chart

In [15]:
# Group data by breach type and count the number of breaches in each category
breach_type_counts = health_data['Breach Type Group'].value_counts()

# Create a bar chart

plt.figure(figsize=(10, 6))
breach_type_counts.plot(kind='bar', color='skyblue')
plt.xlabel('Breach Type')
plt.ylabel('Number of Breaches')
plt.title('Distribution of Breach Types')
plt.xticks(rotation=90)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

In [16]:
# Group data by covered entity type and count the number of breaches in each category
covered_entity_counts = health_data['Covered Entity Type'].value_counts()

# Create a pie chart
plt.figure(figsize=(8, 8))
plt.pie(covered_entity_counts, labels=covered_entity_counts.index,
        autopct='%1.1f%%', startangle=140, colors=sns.color_palette('pastel'))
plt.axis('equal')  # Equal aspect ratio ensures that the pie is drawn as a circle
plt.title('Distribution of Breaches by Covered Entity Type')
plt.show()

 #### Bivariate Visualization
 ##### Scatter Plot
 ##### Stacked Bar Chart

In [17]:
# Convert the submission date to datetime and extract the year
health_data['Breach Submission Date'] = pd.to_datetime(health_data['Breach Submission Date'])
health_data['Year'] = health_data['Breach Submission Date'].dt.year

# Create a scatter plot
plt.figure(figsize=(12, 6))
sns.scatterplot(x='Year', y='Individuals Affected', data=health_data,
                alpha=0.5, color='purple')
plt.ylabel('Number of Individuals Affected')
plt.xlabel('Year')
plt.title('Trends in Cybersecurity Breaches Impact Over Time')
plt.axhline(y=500, color='red', linestyle='--', label='Threshold for Reporting (500)')
plt.legend()

# Explanatory labels
plt.text(2012, 2000000, 'More Individuals\nAffected', fontsize=12, color='red')
plt.text(2014, 1000, 'Fewer Individuals\nAffected', fontsize=12, color='green')
plt.text(2010, 1000, 'Reporting\nThreshold', fontsize=12, color='blue')

# Context annotation
plt.annotate(
    'Cybersecurity breaches reported\nsince 2009\nThreshold for reporting breaches: 500',
    xy=(2010, 500),
    xytext=(2010, 2000000),
    fontsize=12,
    arrowprops=dict(arrowstyle='->', color='gray')
)
plt.tight_layout()
plt.show()
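Because a few breaches affect millions of people, the linear y-axis compresses most points near zero. A numeric companion to the scatter plot, sketched below, summarizes each year by the number of breaches and the median and total individuals affected; the median is less sensitive to the very large incidents than the mean. The function name `yearly_impact` is hypothetical and the demo frame is synthetic; it assumes the `Year` column created in the cell above.

```python
import pandas as pd


def yearly_impact(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize breaches per year: count, median and total individuals affected (hypothetical helper)."""
    return (
        df.groupby('Year')['Individuals Affected']
        .agg(breaches='count', median_affected='median', total_affected='sum')
    )


demo = pd.DataFrame({
    'Year': [2009, 2009, 2010, 2010, 2010],
    'Individuals Affected': [1000, 501, 3800, 5257, 1_000_000],
})
summary = yearly_impact(demo)
print(summary)
```

On the plot itself, adding `plt.yscale('log')` before `plt.show()` is another way to spread out the smaller breaches so they are not all drawn on top of the 500 threshold line.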

In [18]:
# Group data by year and breach type group, and count the number of breaches in each category
grouped_data = health_data.groupby(['Year', 'Breach Type Group']).size().unstack().fillna(0)

# Create a stacked bar chart with the 'GnBu' colormap.
# figsize is passed to DataFrame.plot directly: DataFrame.plot opens its own figure,
# so calling plt.figure() first would leave an extra, empty figure behind.
grouped_data.plot(kind='bar', stacked=True, colormap='GnBu', figsize=(12, 6))
plt.xlabel('Year')
plt.ylabel('Number of Breaches')
plt.title('Breach Type vs. Year')
plt.xticks(rotation=90)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.legend(title='Breach Type Group', loc='upper left')
plt.tight_layout()
plt.show()
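With raw counts, years with many reports dominate the stacked chart. To compare the mix of breach types across years instead, the counts can be normalized so that each year sums to 1. This is a sketch using `pd.crosstab(..., normalize='index')` on synthetic data; plotting the result with the same `plot(kind='bar', stacked=True)` call gives a 100% stacked bar chart.

```python
import pandas as pd

demo = pd.DataFrame({
    'Year': [2009, 2009, 2009, 2010, 2010],
    'Breach Type Group': ['Physical Theft', 'Physical Theft', 'Physical Loss',
                          'IT Incident', 'Physical Theft'],
})

# Share of each breach type group within each year (each row sums to 1)
shares = pd.crosstab(demo['Year'], demo['Breach Type Group'], normalize='index')
print(shares.round(2))
```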
