
GTN (Gas Transmission Northwest LLC) Pipeline Data – 12871

⚠️ If the resource you are scraping requires you to agree to any Terms & Conditions,
please do not proceed and notify your contract manager immediately. Under no
circumstances should you create a false account or fake identity.

Description:

Please write a scraper tool that enters a Post Date (from) and Post Date (to) range covering the 90 days before the day the scrape is run (i.e., today going back 90 days). The query is limited to a maximum range of 90 days.
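For reference, the date window can be computed at run time with the standard library; a minimal sketch (the MM/DD/YYYY format is an assumption about what the form's date fields accept):

from datetime import date, timedelta

def post_date_range(days_back: int = 90) -> tuple[str, str]:
    # Return (from_date, to_date) strings covering the last days_back days.
    # The site caps a query at 90 days, so the default sits at that limit.
    to_date = date.today()
    from_date = to_date - timedelta(days=days_back)
    fmt = "%m/%d/%Y"  # assumed format; match the form's actual fields
    return from_date.strftime(fmt), to_date.strftime(fmt)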

• Parameters Setup (see the form-driving sketch after this list):
o Enter the From Date
o Enter the End Date
o Click the Retrieve button; the grid of data will then appear

• Click each row in the table to open the details for each contract:
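A sketch of driving the query form with Selenium. The element locators (the By.ID values) are hypothetical placeholders; inspect the live page for the real ones.

from selenium import webdriver
from selenium.webdriver.common.by import By

def retrieve_grid(from_date: str, to_date: str) -> str:
    # Open the page, fill in the date range, click Retrieve, and return
    # the resulting page HTML (including the summary grid) for parsing.
    driver = webdriver.Chrome()
    try:
        driver.get("http://tcplus.com/GTN/ContractRouteRate/Interruptible")
        driver.find_element(By.ID, "fromDate").send_keys(from_date)  # hypothetical ID
        driver.find_element(By.ID, "toDate").send_keys(to_date)      # hypothetical ID
        driver.find_element(By.ID, "retrieve").click()               # hypothetical ID
        return driver.page_source
    finally:
        driver.quit()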
Scraping Description

There should be one output dataset:

• Summary Dataset
o Scrape all data displayed in the summary grid (the blue section in the first illustration). You can either:
▪ manually scrape the data from the HTML, or
▪ use the [Download] button at the top right, which will give you a CSV.
o Add an additional link column to the dataset.
▪ For each row, the link is the HTML link pointing to the detail page.
o Add an additional scrape_time column to indicate the scrape time (see the sketch after this list).
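A minimal sketch of the two added columns, assuming the grid has already been parsed into a pandas DataFrame and the per-row detail URLs collected separately:

from datetime import datetime, timezone
import pandas as pd

def finalize_summary(df: pd.DataFrame, detail_links: list[str]) -> pd.DataFrame:
    # detail_links must hold one detail-page URL per grid row, in row order.
    df = df.copy()
    df["link"] = detail_links
    # One timestamp for the whole run; UTC ISO 8601 is an assumption, since
    # the spec only says the column should indicate the scrape time.
    df["scrape_time"] = datetime.now(timezone.utc).isoformat()
    return df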

The desired schema is listed below

Root URL:

http://tcplus.com/GTN/ContractRouteRate/Interruptible

Job Frequency:

Realtime (every minute)


Output Columns:

File One: summary.csv

Column Name            Original Columns   Type      Example
scrape_time            -                  datetime  -
post_datetime          Post Date / Time   datetime  20230223 12:10:45
k_holder_name          K Holder Name      str       Castleton Commodities Merchant Trading L.P.
k_holder               K Holder           str       118638852
svc_req_k              Svc Req K          str       20578
rate_sch               Rate Sch           str       PAL
it_qty_k               IT Qty – K         int       30000
k_stat                 K Stat             str       N
disc_beg_date          Disc Beg Date      datetime  20230224
disc_end_date          Disc End Date      datetime  20230224
receipt_loc            Loc                str       370672
receipt_loc_name       Loc Name           str       MALIN MC
receipt_loc_qti_desc   Loc/QTI Desc       str       Rec Qty
delivery_loc           Loc                int       0
delivery_loc_name      Loc Name           str       MALIN MC
delivery_loc_qti_desc  Loc/QTI Desc       str       -
loc_ind                Loc Ind            str       I
rate_chgd              Rate Chgd          float     0.2
max_trf_rate           Max Trf Rate       float     0.204356
ngtd_rate_ind          Ngtd Rate Ind      str       N
rate_id_desc           Rate ID Desc       str       Loan Chrg-Bal
affil                  Affil              str       None
terms_notes            Terms/Notes        str       N
link                   -                  str       -

• Please note that the original table uses the same column names for the receipt and delivery locations. Treat the first three location columns as receipt and the last three as delivery.
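Because pandas suffixes duplicated headers (Loc, Loc.1, Loc.2, ...) when reading the downloaded CSV, renaming the columns positionally is the safer route. A sketch, assuming the grid's column order matches the schema table above:

import pandas as pd

# Output schema in grid order (scrape_time and link are appended later).
# The first Loc / Loc Name / Loc-QTI Desc group is receipt, the second delivery.
OUTPUT_COLUMNS = [
    "post_datetime", "k_holder_name", "k_holder", "svc_req_k", "rate_sch",
    "it_qty_k", "k_stat", "disc_beg_date", "disc_end_date",
    "receipt_loc", "receipt_loc_name", "receipt_loc_qti_desc",
    "delivery_loc", "delivery_loc_name", "delivery_loc_qti_desc",
    "loc_ind", "rate_chgd", "max_trf_rate", "ngtd_rate_ind",
    "rate_id_desc", "affil", "terms_notes",
]

def rename_columns(df: pd.DataFrame) -> pd.DataFrame:
    # Fail loudly if the grid layout ever drifts from the documented schema.
    if len(df.columns) != len(OUTPUT_COLUMNS):
        raise ValueError(
            f"expected {len(OUTPUT_COLUMNS)} columns, got {len(df.columns)}"
        )
    df = df.copy()
    df.columns = OUTPUT_COLUMNS
    return df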

Timeline:

You may complete this job at any time and submit the required files to the linked GitHub repository within one week of accepting the job.

Please submit your code here: https://github.com/international-data-repository-cpd/scrape-12871

Submission Files:
Sample.csv for sample data

A requirement.txt

scrape/ - containing all of the source code

Main file: scrape.py, which will be run with an output $filename.

Job Schema/Output Format:

You should save the output CSV from a pandas DataFrame using these settings:

encoding="utf-8",
line_terminator="\n",
quotechar='"',
quoting=csv.QUOTE_ALL,
index=False
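Put together, a save helper might look like the following. Note that pandas renamed line_terminator to lineterminator in version 2.0, so keep the spelling that matches your pinned pandas version:

import csv
import pandas as pd

def save_output(df: pd.DataFrame, filename: str) -> None:
    # Exactly the settings required above; QUOTE_ALL wraps every field.
    df.to_csv(
        filename,
        encoding="utf-8",
        line_terminator="\n",  # lineterminator on pandas >= 2.0
        quotechar='"',
        quoting=csv.QUOTE_ALL,
        index=False,
    )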

Runtime Environment:

Your code will be copied from the repository root to /usr/src/scrape.

You should feel free to modify the requirements as you need; however, you must keep the awscli dependency.

You may also upload additional binaries into the repository root and reference them
there.

Please do not change the Dockerfile or shell scripts in the repository, as this will cause automated test failures.

Your script will be invoked as:

python scrape.py $filename
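A minimal entry point matching that invocation, reusing the helper sketches above (parse_grid and collect_detail_links are hypothetical placeholders for whatever grid parser you write):

import sys

def main() -> None:
    if len(sys.argv) != 2:
        sys.exit("usage: python scrape.py <output_filename>")
    filename = sys.argv[1]
    from_date, to_date = post_date_range()
    html = retrieve_grid(from_date, to_date)
    df = parse_grid(html)               # hypothetical grid-HTML parser
    links = collect_detail_links(html)  # hypothetical: one URL per grid row
    df = finalize_summary(rename_columns(df), links)
    save_output(df, filename)

if __name__ == "__main__":
    main()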

Page access limitations (max requests / day):

10% of website traffic max

If you encounter a captcha during your scrape job, please contact the job poster before continuing.
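To stay safely under that ceiling, a fixed delay between page fetches is the simplest safeguard; a sketch (the 2-second delay is an assumed starting point, tune it against the site's actual traffic):

import time
import requests

def polite_get(session: requests.Session, url: str, delay: float = 2.0) -> requests.Response:
    # Sleep before every request so the scraper never bursts.
    time.sleep(delay)
    response = session.get(url, timeout=30)
    response.raise_for_status()
    return response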
