Assessment Task - Carbon38
This is a data extraction assignment that must be completed using the Scrapy framework.
The objective of this task is to evaluate the candidate's learning ability, technical skills, and
other skills related to a programming environment.
For the given website, the candidate must follow the guidelines below to extract the data and
store it in the required format.
1. The candidate must develop a Scrapy project for the website provided, extract a
   minimum of 1000 data items from the website, and store the data in a database.
2. The project should be well structured and modularized.
3. The code must go through three important steps:
   a. Crawling
      i. Go through each of the URLs found from the URL provided, and through
         every page of results as well (proper pagination).
   b. Parsing
      i. Parsing is done at the last depth, where the product/person/property
         details are found. All of the required fields mentioned below are to be
         collected using XPath.
   c. Cleaning & Data Structuring
      i. The extracted data should be cleaned properly and structured in the
         specified format, as explained below.
Prerequisite:
B. Parsing
Example URL: https://www.carbon38.com/product/tessa-top-primary-stripe
{
    "breadcrumbs": [
        "Home",
        "Designers",
        "Beach Riot",
        "Tessa Top"
    ],
    "image_url": "https://www.carbon38.com/media/catalog/product/s/u/sund-su214em40-blusky-biker-shorts-tile-2452.jpg",
    "brand": "BEACH RIOT",
    "product_name": "Tessa Top",
    "price": "$98",
    "reviews": "0 Reviews",
    "colour": "PRIMARY STRIPE",
    "sizes": [
        "XS",
        "S",
        "M",
        "L",
        "XL"
    ],
    "description": "The Tessa Top from Beach Riot is a cropped, tight-fitting active tank done in the brand's signature ultra-soft ribbed fabric. This scoop-neck top features thick straps for extra support and brightly colored side stripes. Pair with the matching Megan legging for a playful, spring-ready active set.",
    "sku": "BEAC-BR00309SX-COLBLK",
    "product_id": "170378"
}
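Once a product page yields a record shaped like the one above, the cleaning and database steps might look like the sketch below. It uses only the Python standard library. The class exposes the `open_spider`, `process_item`, and `close_spider` methods that Scrapy item pipelines conventionally define. The table layout, the `carbon38.db` file name, and the helper names `clean_text` and `clean_item` are assumptions made for illustration; the task does not prescribe a database or schema.

```python
import json
import sqlite3


def clean_text(value):
    """Collapse runs of whitespace and strip the ends; non-strings pass through."""
    if not isinstance(value, str):
        return value
    return " ".join(value.split())


def clean_item(raw):
    """Return a cleaned copy of a scraped record, keeping the same keys."""
    item = {}
    for key, value in raw.items():
        if isinstance(value, list):
            # Clean each entry and drop the ones that end up empty.
            cleaned = [clean_text(v) for v in value]
            item[key] = [v for v in cleaned if v]
        else:
            item[key] = clean_text(value)
    return item


class SQLitePipeline:
    """Pipeline-shaped class that stores each cleaned record as a JSON row."""

    def __init__(self, path="carbon38.db"):  # hypothetical database file
        self.path = path

    def open_spider(self, spider):
        self.conn = sqlite3.connect(self.path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS products ("
            "product_id TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )

    def process_item(self, item, spider):
        cleaned = clean_item(dict(item))
        # INSERT OR REPLACE keeps one row per product_id if a page is seen twice.
        self.conn.execute(
            "INSERT OR REPLACE INTO products VALUES (?, ?)",
            (cleaned.get("product_id"), json.dumps(cleaned)),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()
```

Storing the whole record as JSON keeps list fields such as `breadcrumbs` and `sizes` intact. A normalized schema with separate columns or tables would be an equally valid choice if the data needs to be queried by field.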