Assignment 2

The document provides instructions to simulate an ETL pipeline that: 1. Filters order data from a local file for pending payments, saves it to staging. 2. Moves the filtered data to HDFS landing and runs validation checks. 3. "Processes" the data by moving it to HDFS staging and creating a sample results file. 4. Brings the results file back locally, renames it, and cleans up temporary files and folders.


Assignment - Week 2

1. Log in to your gateway node and open a terminal.

2. Write a command to find out what your home directory is on the gateway node.
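
A minimal sketch of what would work here; either command prints the home directory:

    echo $HOME
    # or, right after logging in and before changing directories:
    pwd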

3. There is a third-party service that will drop a file named orders.csv in the
landing folder under your home directory.

You then need to filter for all the orders where the status is PENDING_PAYMENT,
create a new file named orders_filtered.csv, and put it in the staging folder.

Then take this file and put it into the landing folder in your hdfs,

and do a couple more things...

So, to simulate this:

1. Create two folders named landing and staging in your home directory.
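
One way to do this from anywhere on the node:

    mkdir -p ~/landing ~/staging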

2. Copy the file present under the /data/retail_db/orders folder to the landing
folder in your home directory.
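
A sketch of the copy; the exact file name inside /data/retail_db/orders may differ
(it is often a part file rather than orders.csv), so adjust the path as needed:

    cp /data/retail_db/orders/* ~/landing/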

3. Apply the grep command to filter for all orders with PENDING_PAYMENT
status.

4. Create a new file named orders_filtered.csv under your staging folder with the
filtered results.
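
Steps 3 and 4 can be combined into a single command. The input file name is
assumed to be orders.csv as in the scenario; use whatever name actually landed in
your landing folder:

    grep PENDING_PAYMENT ~/landing/orders.csv > ~/staging/orders_filtered.csv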

5. Create a folder hierarchy named data/landing in your hdfs home directory.
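
A sketch, assuming relative paths resolve to your hdfs home directory (the default):

    hdfs dfs -mkdir -p data/landing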

6. Copy this orders_filtered.csv file from your staging folder on local to the
data/landing folder in your hdfs.
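
For example (hdfs dfs -copyFromLocal behaves the same way):

    hdfs dfs -put ~/staging/orders_filtered.csv data/landing/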

7. Run a command to check the number of records in the orders_filtered.csv file
under the data/landing folder.
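
One way to count the records is to stream the hdfs file through wc:

    hdfs dfs -cat data/landing/orders_filtered.csv | wc -l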

8. Write a command to list the files in the data/landing folder of hdfs.


9. Rewrite this command so that you can see the file size in KB.
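
Sketches for steps 8 and 9; the -h flag prints human-readable sizes (K/M/G) rather
than raw bytes:

    hdfs dfs -ls data/landing
    hdfs dfs -ls -h data/landing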

10. Change the permissions of this file:

give read, write and execute to the owner,
read and write to the group,
and read to others.
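
Those permissions correspond to the octal mode 764 (rwx for owner, rw for group,
r for others):

    hdfs dfs -chmod 764 data/landing/orders_filtered.csv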

11. Create a new folder data/staging in your hdfs and move orders_filtered.csv
from data/landing to data/staging.
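
A sketch of both parts of this step:

    hdfs dfs -mkdir -p data/staging
    hdfs dfs -mv data/landing/orders_filtered.csv data/staging/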

12. Now let's assume a Spark program has run on your staging folder to do
some processing, and let's say the processed results give you just 2 lines as
output:
3617,2013-08-15 00:00:00.0,8889,PENDING_PAYMENT
68714,2013-09-06 00:00:00.0,8889,PENDING_PAYMENT

To simulate this, create a new file called orders_result.csv in the home directory
of your local gateway node using the vi editor and put the above 2 records in it.
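
A rough outline of the vi session (keystrokes shown as comments):

    vi ~/orders_result.csv
    # press i to enter insert mode, type the two records shown above,
    # then press Esc and type :wq to save and quit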

13. Move orders_result.csv from local to hdfs under a new directory called
data/results (think of it as if the Spark program has run and created this file).
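
A sketch; -moveFromLocal copies the file and then removes the local copy, which
matches the "move" wording:

    hdfs dfs -mkdir -p data/results
    hdfs dfs -moveFromLocal ~/orders_result.csv data/results/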

14. Now we want to bring the processed results back to local, under a folder
data/results in your local home directory. So run a command to bring the file
from hdfs to local.
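
For example, assuming the local folder does not exist yet:

    mkdir -p ~/data/results
    hdfs dfs -get data/results/orders_result.csv ~/data/results/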

15. Rename the file orders_result.csv under the data/results folder in your local to
final_results.csv.
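
For example:

    mv ~/data/results/orders_result.csv ~/data/results/final_results.csv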

16. Now we are done, so delete all the directories that you have created in your
local as well as in hdfs.
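
A cleanup sketch; the exact list depends on what you created, so double-check the
paths before deleting:

    rm -r ~/landing ~/staging ~/data
    hdfs dfs -rm -r data    # removes data/landing, data/staging and data/results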
