Assignment 3

Uploaded by

Jiawei Huang

Assignment on Spark and Cloud Data Platform

Due by July 7

Note:

Submit a compressed archive (zip, tar, etc.) of your code, along with the output files and
CLI screenshots (output/input commands with results). Please include a PDF document
with answers to the below questions.

Contact your TA for any questions related to this assignment or post clarification questions
to the Piazza platform.

Part A:
Input Data - kddcup.data_10_percent.gz, the 10% subset (2.1M compressed; 75M uncompressed), from
http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html

Please read the paper provided with your assignment on Quercus and answer the
following question.

1. [Marks: 15] What is an Intrusion Detection System? Is it possible to implement an
Intrusion Detection System on this dataset? Explain the workflow described in the
paper for implementing the Intrusion Detection System.

The remaining questions in Part A must be completed using PySpark or Spark SQL in Databricks.

2. [Marks: 5] Use the Python urllib library to download the KDD Cup 99 data from its web
repository, store it in a temporary location, and then move it to the Databricks filesystem
(DBFS) so that the data is easy to access for analysis. (Hint: you can use the following
commands in Databricks to get your data.)

import urllib.request
# Download the 10% subset to the driver's local temporary directory.
urllib.request.urlretrieve(
    "http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz",
    "/tmp/kddcup_data.gz",
)
# Move the file from local disk into DBFS, then list the target directory.
dbutils.fs.mv("file:/tmp/kddcup_data.gz", "dbfs:/kdd/kddcup_data.gz")
display(dbutils.fs.ls("dbfs:/kdd"))
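
Before moving a downloaded archive, it can be useful to confirm that it really is readable gzip text. The following is a minimal standard-library sketch of such a check, not part of the assignment hint. The helper name preview_gzip_lines and the small demo file are assumptions made for illustration; in Databricks the path would be the local download location.

```python
import gzip
import os
import tempfile


def preview_gzip_lines(path, n=10):
    """Return the first n decoded lines of a gzip-compressed text file."""
    lines = []
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            lines.append(line.rstrip("\n"))
            if len(lines) >= n:
                break
    return lines


# Demo on a tiny made-up archive so the sketch runs anywhere.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as handle:
    handle.write("0,tcp,http,SF\n1,udp,domain_u,SF\n")

preview = preview_gzip_lines(demo_path, n=10)
print(preview)
```

If the file is not valid gzip, gzip.open raises an error when the first line is read, which surfaces a failed or truncated download early.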

3. [Marks: 5] After storing the data in the Databricks filesystem, load it from
disk into a Spark RDD. Print 10 values from the RDD and verify the type of the
data structure (RDD).

4. [Marks: 5] Split the data. (Each entry in your RDD is a comma-separated line of data,
which you first need to split before you can parse it and build your data frame.) Show
the total number of features (columns) and print the results. See this link for more details:
http://kdd.ics.uci.edu/databases/kddcup99/task.html
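
The splitting step can be sketched in plain Python as follows. This is only an illustration of the per-line logic; in PySpark a function like this would typically be passed to the RDD's map method. The function name parse_line and the sample record are made up for the example (the record is synthetic, not copied from the dataset). It assumes the layout given in the task description: 41 features followed by the label, so 42 comma-separated fields per line.

```python
def parse_line(line):
    """Split one comma-separated connection record into a list of string fields."""
    return line.strip().split(",")


# Synthetic record with the assumed layout: 41 feature values plus a label.
sample_fields = ["0", "tcp", "http", "SF", "181", "5450"] + ["0"] * 35 + ["normal."]
sample_line = ",".join(sample_fields)

parsed = parse_line(sample_line)
num_fields = len(parsed)
print(num_fields)
```

Counting the fields of the first parsed record is a quick way to report the number of columns, although checking that every record has the same length would be a stronger verification.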

5. [Marks: 5] Now extract these 6 feature columns (duration, protocol_type, service,
src_bytes, dst_bytes, flag) and the label column from your dataset. Build a new RDD
and data frame. Print the schema and display 10 values.
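
One possible way to pick out the columns named in question 5 is to map each split record to a named row. The sketch below is plain Python and assumes the column order from the task description (duration, protocol_type, service, flag, src_bytes, dst_bytes at positions 0 to 5, and the label last at position 41). The names Connection and to_connection are hypothetical, and the sample record is synthetic.

```python
from collections import namedtuple

# Row type holding only the selected columns.
Connection = namedtuple(
    "Connection",
    ["duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes", "label"],
)


def to_connection(fields):
    """Build a Connection from a full split record (assumed to have 42 fields)."""
    return Connection(
        duration=int(fields[0]),
        protocol_type=fields[1],
        service=fields[2],
        flag=fields[3],
        src_bytes=int(fields[4]),
        dst_bytes=int(fields[5]),
        label=fields[41],
    )


# Synthetic record: 41 feature values followed by the label.
fields = ["0", "tcp", "http", "SF", "181", "5450"] + ["0"] * 35 + ["normal."]
row = to_connection(fields)
print(row)
```

Converting the numeric fields to int at this stage means a data frame built from these rows would get numeric column types rather than strings.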

6. [Marks: 5] Get the total number of connections by protocol_type and by service.
Show the results in ascending order. Plot a bar graph for each.
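
The counting and ordering asked for in question 6 can be illustrated in plain Python. This is a sketch of the logic only; in PySpark the same result would usually come from a group-by and count on the data frame, ordered by the count column. The helper name counts_ascending and the short list of protocol values are made up for the example.

```python
from collections import Counter


def counts_ascending(values):
    """Count occurrences of each value and return (value, count) pairs, smallest count first."""
    counts = Counter(values)
    # Sort by count, then by name so that ties are ordered deterministically.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


# Made-up protocol_type values standing in for one column of the dataset.
protocols = ["tcp", "udp", "tcp", "icmp", "tcp", "icmp"]
result = counts_ascending(protocols)
print(result)
```

The resulting pairs can be passed directly to a bar chart, with the values on one axis and the counts on the other.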

7. [Marks: 15] Perform further exploratory data analysis that includes other columns of
this dataset. Plot at least 3 different charts and explain them.

8. [Marks: 20] Look at the label column, in particular the rows where label == ‘normal’.
Create a new label column that keeps the value ‘normal’ for those rows and marks every
other row as an ‘attack’. Split your data (train/test) and, using the new label column,
build a simple machine learning model for intrusion detection (you may use a few selected
columns for your model rather than all of them). Explain which algorithm you selected
and why. Show the results with some success metrics.
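
The relabelling and the success metrics in question 8 can be sketched in plain Python, independently of whichever model is chosen. The function names to_binary_label and binary_metrics are hypothetical, and the actual and predicted lists are made-up values, not model output. The sketch assumes that raw labels in the data file may end with a period (for example "normal."), so the period is stripped before comparing.

```python
def to_binary_label(label):
    """Map a raw label to 'normal' or 'attack', ignoring a trailing period."""
    return "normal" if label.strip().rstrip(".") == "normal" else "attack"


def binary_metrics(actual, predicted, positive="attack"):
    """Compute accuracy, precision and recall for a two-class problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up raw labels and predictions, for illustration only.
raw_labels = ["smurf.", "neptune.", "normal.", "normal.", "smurf."]
actual = [to_binary_label(label) for label in raw_labels]
predicted = ["attack", "normal", "normal", "attack", "attack"]
metrics = binary_metrics(actual, predicted)
print(metrics)
```

Treating "attack" as the positive class makes recall the share of attacks that were detected, which is often the metric of most interest for intrusion detection; accuracy alone can be misleading when one class dominates the data.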

Part B:

1. [Marks: 5] Read the statements below, choose the correct answer for each, and provide
explanations. You can get more information by visiting this link:
https://azure.microsoft.com/en-us/overview/what-is-paas/

Statements                                                   Yes   No

1. A platform as a service (PaaS) solution that hosts
   web apps in Azure provides professional
   development services to continuously add
   features to custom applications.

2. A platform as a service (PaaS) database offering in
   Azure provides built-in high availability.

2. [Marks: 5] Read the statement below, choose the correct answer, and provide
explanations.

A relational database must be used when:

a. A dynamic schema is required
b. Data will be stored as key/value pairs
c. Storing large images and videos
d. Strong consistency guarantees are required

3. [Marks: 5] Read the statement below, choose the correct answer, and provide
explanations.

When you are implementing a Software as a Service solution, you are responsible for:

a. Configuring high availability
b. Defining scalability rules
c. Installing the SaaS solution
d. Configuring the SaaS solution

4. [Marks: 5] Read the statements below, choose the correct answer for each, and provide
explanations.

Statements                                                   Yes   No

1. To achieve a hybrid cloud model, a company
   must always migrate from a private cloud model

2. A company can extend the capacity of its internal
   network by using a public cloud

3. In a public cloud model, only guest users at your
   company can access the resources in the cloud

5. [Marks: 5] Read the statements below, complete each one with the correct term from the
list that follows, and provide explanations.

a. A cloud service that remains available after a failure occurs ______________
b. A cloud service that can be recovered after a failure occurs _______________
c. A cloud service that performs quickly when demand increases ____________
d. A cloud service that can be accessed quickly from the internet _____________

Terms: Disaster recovery, Fault Tolerance, Low Latency, Dynamic Scalability
