4 Tasks_Python Developer_

Uploaded by

yash sontakke
Copyright
© All Rights Reserved

I. Ycombinator Scraping

Link: https://www.ycombinator.com/companies?tags=SaaS

Scrape the following fields for each company:

Name
Category
Description
Team Size
Job openings (count)
LinkedIn URLs
Location

Store the output in a .csv file and share it. Also share the working file. Name the file
<Ycombinator_your name>.
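
A minimal sketch of the output step only, assuming each scraped company has already been collected into a dictionary keyed by the field names listed above. The function name `write_companies_csv` and the sample row are hypothetical, and the scraping itself is not shown.

```python
import csv
import io

# Column order taken from the field list in this task.
YC_FIELDS = [
    "Name", "Category", "Description", "Team Size",
    "Job openings (count)", "LinkedIn URLs", "Location",
]


def write_companies_csv(rows, out):
    """Write company dicts to a file-like object as CSV.

    Missing fields are left blank. Keys that are not in YC_FIELDS make
    csv.DictWriter raise ValueError, so a misspelled field is noticed early.
    """
    writer = csv.DictWriter(out, fieldnames=YC_FIELDS, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Hypothetical sample row, for illustration only.
sample = [{"Name": "ExampleCo", "Category": "SaaS", "Team Size": 12}]
buffer = io.StringIO()
write_companies_csv(sample, buffer)
print(buffer.getvalue())
```

When writing to a real file instead of a buffer, open it with `newline=""`, as the `csv` module documentation recommends, to avoid blank lines between rows on some platforms.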

II. Directory

The directory shared with you is the USA Aerospace Directory. Scrape pages 14-18. Each page has 3
columns, and you have to scrape all the contents of each page. Roughly 120-150 companies should be
available in total.
The output file should contain the following information:
Category Name
Company Name
Phone Number
Fax
Persona
Designation
Address
Licensing
Company Brief
Email
Website
Store the output in a .csv file and share it. Also share the working file. Name the file
<directory_your name>.
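
A small sketch of how the Email and Website fields might be pulled out of one directory entry, assuming the text of each entry has already been extracted into a plain string. The layout of the directory is not described here, so the sample entry, the function name `extract_contacts`, and both regular expressions are illustrative assumptions and are deliberately loose.

```python
import re

# Loose patterns; real entries may need tighter or different rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
WEBSITE_RE = re.compile(r"(?:https?://|www\.)[^\s,;]+", re.IGNORECASE)


def extract_contacts(entry_text):
    """Return the first email and website found in an entry, or "" if absent."""
    email = EMAIL_RE.search(entry_text)
    website = WEBSITE_RE.search(entry_text)
    return {
        "Email": email.group(0) if email else "",
        # Drop a trailing full stop that belongs to the sentence, not the URL.
        "Website": website.group(0).rstrip(".") if website else "",
    }


# Hypothetical entry text, for illustration only.
entry = "Example Aero Inc. 123 Main St. info@example.com www.example.com"
print(extract_contacts(entry))
```

Fields such as Phone Number and Fax could be handled the same way once the actual formatting in the directory is known.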

III. G2 Scraping

Link: CRM Software / Sales Tools


There are 875 listings under the link above. Scrape all 875 listings and fetch the fields below for
each one.

For each listing, the output file should have the following fields.

From the listing page, fetch the following:

1. Product Name (Example: Salesforce Sales Cloud)
2. Rating

From the product page (opened by clicking a listing, e.g. Salesforce Sales Cloud), fetch the following:

3. Product Description
4. Website
5. Seller Name
6. HQ Location
7. Total Revenue
8. Ownership
9. Twitter (Social Media Handle)

The output should be stored as a .csv file and shared. Also share the working file. Name the file as
<g2 scraping_your name>.
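
The two-stage structure of this task (fields 1-2 from the listing pages, fields 3-9 from each product page) can be sketched as below. This is an outline under assumptions, not a working G2 scraper: `listing_pages` and `fetch_product_details` are hypothetical placeholders for the real page-parsing code, and the `url` key is an assumed way of linking a listing to its product page.

```python
def scrape_listings(listing_pages, fetch_product_details):
    """Combine listing-page fields with product-page fields into flat rows.

    listing_pages: iterable of pages, each a list of dicts holding
        "Product Name", "Rating" and a "url" for the product page.
    fetch_product_details: callable(url) -> dict of product-page fields
        (Product Description, Website, Seller Name, and so on).
    """
    rows = []
    seen = set()
    for page in listing_pages:
        for item in page:
            if item["url"] in seen:  # skip a product repeated across pages
                continue
            seen.add(item["url"])
            row = {"Product Name": item["Product Name"], "Rating": item["Rating"]}
            row.update(fetch_product_details(item["url"]))
            rows.append(row)
    return rows


# Hypothetical stand-ins so the sketch runs without any network access.
fake_pages = [
    [{"Product Name": "Product A", "Rating": "4.5", "url": "/products/a"}],
    [{"Product Name": "Product A", "Rating": "4.5", "url": "/products/a"},
     {"Product Name": "Product B", "Rating": "4.1", "url": "/products/b"}],
]


def fake_details(url):
    return {"Website": "https://example.com" + url}


result = scrape_listings(fake_pages, fake_details)
print(len(result), result[0])
```

Passing the fetcher in as a function keeps the merge logic testable offline and makes it easy to add rate limiting or retries around the real requests later.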

IV. Populate the tech stack (using the keywords given below) for the websites given to you in the
csv file.

"CRM Software": ["Salesforce", "HubSpot", "Zoho CRM", "Pipedrive"],
"ERP Software": ["SAP", "Oracle", "Microsoft Dynamics"],
"Marketing Automation Software": ["Marketo", "Pardot", "Eloqua", "HubSpot"],
"Front-End Technologies": ["React.js", "AngularJS", "Vue.js"],
"Back-End Technologies": ["Node.js", "Ruby on Rails", "Django"],
"Database Systems": ["MySQL", "PostgreSQL", "MongoDB"],
"Version Control Systems": ["Git (GitHub)", "GitLab", "Bitbucket"],
"Cloud Service Providers": ["Amazon Web Services (AWS)", "Microsoft Azure", "Google Cloud Platform"],
"Web Servers": ["Apache", "Nginx"],
"Ads": ["Google Ads", "Bing Ads"],
"Container Orchestration": ["Kubernetes", "Docker Swarm"],
"Security Software": ["McAfee", "Symantec", "Kaspersky"],
"Identity Management": ["Okta", "OneLogin"],
"Communication Tools": ["Slack", "Microsoft Teams"],
"Project Management Tools": ["Jira", "Trello", "Asana"],
"Analytics and Data Visualization Tools": ["Google Analytics", "Tableau", "Looker", "Power BI", "Clarity"],
"Content Management Systems (CMS)": ["WordPress", "Drupal"],
"E-commerce Platforms": ["Shopify", "Magento", "Wix"],
"Payment Gateways": ["Stripe", "PayPal", "Chargebee"],
"CDN": ["Cloudflare", "CloudFront", "Fastly", "Google Cloud CDN"],
"Customer Support Tools": ["Zendesk", "Freshdesk", "Intercom", "Qualified"],
"Email Provider": ["Gmail", "Outlook"]

The output file should be in csv format, something like this:

The output should be stored as a .csv file and shared. Also share the working file. Name the file as
<techstack_your name>.
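
One possible approach to this task is a case-insensitive keyword search over each website's page source, sketched below. It assumes the page source has already been downloaded into a string. The function name `detect_tech_stack`, the abbreviated `TECH_STACK` subset, and the sample HTML are illustrative only.

```python
import re

# Abbreviated subset of the keyword mapping given in this task.
TECH_STACK = {
    "CRM Software": ["Salesforce", "HubSpot", "Zoho CRM", "Pipedrive"],
    "Web Servers": ["Apache", "Nginx"],
    "Payment Gateways": ["Stripe", "PayPal", "Chargebee"],
}


def detect_tech_stack(page_source, tech_stack=TECH_STACK):
    """Return {category: [keywords found]} for one page's source text."""
    found = {}
    for category, keywords in tech_stack.items():
        found[category] = [
            kw for kw in keywords
            # Match the keyword only when it is not embedded in a longer word.
            if re.search(r"(?<!\w)" + re.escape(kw) + r"(?!\w)",
                         page_source, re.IGNORECASE)
        ]
    return found


# Hypothetical page source, for illustration only.
html = '<script src="https://js.stripe.com/v3/"></script><!-- powered by nginx -->'
print(detect_tech_stack(html))
```

Keyword matching is only a heuristic: a name appearing in the page text does not prove the technology is in use, and several of the categories above (databases, version control, back-end frameworks) are often not visible in page source at all.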
