Upwork Scraping Job 12871
⚠️ If the resource you are scraping requires you to agree to any Terms & Conditions,
please do not proceed and notify your contract manager immediately. Under no
circumstances should you create a false account or fake identity.
Description:
Please write a scraper tool that enters a Post Date (from) and Post Date (to) range covering the 90 days
before the day the scrape is run (i.e. from today going back 90 days). The query is limited to 90 days.
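The date window described above can be computed as in the following minimal sketch. The function names and the MM/DD/YYYY form format are assumptions, not taken from the site, so please confirm the format the date fields actually expect.

```python
from datetime import date, timedelta

MAX_WINDOW_DAYS = 90  # query limit stated in the job description


def post_date_range(today=None, days_back=MAX_WINDOW_DAYS):
    """Return (from_date, to_date), where to_date is the day the scrape runs."""
    if days_back > MAX_WINDOW_DAYS:
        raise ValueError("the query is limited to 90 days")
    to_date = today or date.today()
    # Assumption: "90 days back" means to_date minus 90 calendar days.
    from_date = to_date - timedelta(days=days_back)
    return from_date, to_date


def format_for_form(d, fmt="%m/%d/%Y"):
    """Format a date for the From/End Date inputs (format is an assumption)."""
    return d.strftime(fmt)


if __name__ == "__main__":
    start, end = post_date_range()
    print(format_for_form(start), format_for_form(end))
```

Passing `today` explicitly makes the range reproducible when testing.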
• Parameters Setup
o Enter the From Date
o Enter the End Date
o Click the Retrieve button; the grid of data will then appear
• Click each row in the table to open the details page for that contract.
Scraping Description
• Summary Dataset
o Scrape all data displayed in the summary grid (blue section in the first illustration). You
can either
▪ Manually scrape the data from the HTML, or
▪ Use the [Download] button on the top right, which will give you a CSV
o Add an additional link column in the dataset.
▪ For each row, the link is the URL of that row's detail page.
o Add an additional scrape_time column to indicate the scrape time
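The two extra columns can be added as in the sketch below, assuming each summary row is a dict and the detail-page href for each row has already been collected. The function name, the `base_url` parameter, and the example href are hypothetical. The scrape time is written as an ISO 8601 UTC string, which is also an assumption.

```python
from datetime import datetime, timezone
from urllib.parse import urljoin


def add_link_and_scrape_time(rows, hrefs, base_url, scrape_time=None):
    """Return copies of the rows with 'link' and 'scrape_time' columns added.

    rows  : list of dicts, one per summary-grid row
    hrefs : detail-page hrefs in the same order as rows (may be relative)
    """
    if len(rows) != len(hrefs):
        raise ValueError("rows and hrefs must have the same length")
    # One timestamp is shared by every row of the same run.
    stamp = (scrape_time or datetime.now(timezone.utc)).isoformat()
    enriched = []
    for row, href in zip(rows, hrefs):
        new_row = dict(row)  # leave the input rows untouched
        new_row["link"] = urljoin(base_url, href)  # resolve relative hrefs
        new_row["scrape_time"] = stamp
        enriched.append(new_row)
    return enriched
```

Resolving each href against the page URL with `urljoin` keeps the link column absolute even if the table uses relative links.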
Root URL:
http://tcplus.com/GTN/ContractRouteRate/Interruptible
Job Frequency:
• Please note that the original table uses the same column names for the receipt and
delivery locations. Treat the first three of these columns as receipt and the last three as
delivery.
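One way to separate the repeated headers is sketched below. The `receipt_` and `delivery_` prefixes and the header names in the usage example are hypothetical, since the actual column names are not listed here. The sketch assumes each repeated name appears exactly twice, first for receipt and then for delivery.

```python
from collections import Counter


def label_receipt_delivery(columns):
    """Prefix names that occur twice: first one receipt, second one delivery."""
    totals = Counter(columns)
    seen = Counter()
    renamed = []
    for name in columns:
        if totals[name] == 2:
            prefix = "receipt_" if seen[name] == 0 else "delivery_"
            renamed.append(prefix + name)
            seen[name] += 1
        else:
            renamed.append(name)  # unique names are left unchanged
    return renamed


if __name__ == "__main__":
    # Hypothetical headers, for illustration only.
    header = ["Contract", "Location", "Zone", "Name",
              "Location", "Zone", "Name", "Rate"]
    print(label_receipt_delivery(header))
```

Apply the renaming to the raw header row before building a DataFrame, so that duplicate names are not altered automatically when the data is loaded.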
Timeline:
You may complete this job at any time, but you must submit any required files to the linked GitHub
repository within one week of accepting the job.
Submission Files:
Sample.csv for sample data
A requirement.txt
You should save the output csv using these settings from a pandas DataFrame:
encoding="utf-8",
line_terminator="\n",
quotechar='"',
quoting=csv.QUOTE_ALL,
index=False
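The settings above can be applied as in this sketch. The function name `save_output` is hypothetical. Note that pandas 1.5 renamed the `line_terminator` keyword to `lineterminator` and pandas 2.0 removed the old spelling, so the sketch picks whichever one the installed version accepts.

```python
import csv
import inspect

import pandas as pd


def save_output(df, path):
    """Write df to path using the CSV settings required for submission."""
    params = inspect.signature(pd.DataFrame.to_csv).parameters
    # Use the keyword spelling that the installed pandas version supports.
    terminator_kw = (
        "lineterminator" if "lineterminator" in params else "line_terminator"
    )
    df.to_csv(
        path,
        encoding="utf-8",
        quotechar='"',
        quoting=csv.QUOTE_ALL,
        index=False,
        **{terminator_kw: "\n"},
    )


if __name__ == "__main__":
    save_output(pd.DataFrame({"a": [1], "b": ["x"]}), "Sample.csv")
```

With `csv.QUOTE_ALL`, every field is quoted, including numeric values and the header row.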
Runtime Environment:
Feel free to modify the requirements as you need. However, you must keep the
awscli dependency.
You may also upload additional binaries into the repository root and reference them
there.
Please do not change the Dockerfile or shell scripts in the repository, as this will cause
the automated tests to fail.
If you encounter a captcha during your scrape job, please contact the job poster before continuing.