
Assignment

Web Scraping and Data Transformation using Python

Website: FCRA Online Services

This is a dataset of foreign contributions to Indian organizations and the organizations’ returns, maintained under the Ministry of Home Affairs.

The objective is to scrape this data for all years, states, and districts, and to create a second script that cleans and consolidates all the data into a single file.

The first Python script should visit each year, state, and district, fetch the tabular data present on the website, and store it as a CSV file under raw_data in the folder structure shown below (a small saving sketch follows the layout):

1. raw_data
a. State
i. District
1. Year
a. Output.csv
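
As a loose sketch only (the function and argument names are placeholders, not part of the assignment), a helper along these lines would write each scraped table into that layout:

    import os
    import pandas as pd

    # Hypothetical helper: write one scraped table (a pandas DataFrame)
    # into raw_data/<State>/<District>/<Year>/Output.csv.
    def save_raw(df, state, district, year):
        folder = os.path.join("raw_data", state, district, year)
        os.makedirs(folder, exist_ok=True)  # create the nested folders as needed
        df.to_csv(os.path.join(folder, "Output.csv"), index=False)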

You may use Python libraries such as requests, urllib3, Beautiful Soup, or Scrapy, together with pandas, to achieve this task. Automating the upload of the data to your cloud storage (Google Drive or OneDrive) is not mandatory, but it is great to have.
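
By way of illustration, a fetch step with requests and pandas might look like the sketch below. The URL and query parameters are hypothetical placeholders; the real FCRA pages use their own form fields, which you would need to inspect in a browser:

    import io
    import requests
    import pandas as pd

    # Hypothetical fetch: url and params are placeholders for whatever
    # the FCRA site actually expects for a given year/state/district.
    def fetch_table(url, params):
        resp = requests.get(url, params=params, timeout=30)
        resp.raise_for_status()  # stop on HTTP errors instead of saving bad data
        tables = pd.read_html(io.StringIO(resp.text))  # parse every <table> on the page
        return tables[0]  # assumes the first table holds the contributions data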

The second Python script should read all the CSV files from the raw_data folder using pandas and clean the columns. You should add the state name and district name to each file and create a consolidated data file covering all the years, states, and districts.
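
One way to sketch this, assuming the raw_data layout above and the final column names listed below (the output file name is illustrative):

    import glob
    import os
    import pandas as pd

    frames = []
    # Walk raw_data/<State>/<District>/<Year>/*.csv and tag each file with
    # the state, district, and year recovered from its folder path.
    for path in glob.glob(os.path.join("raw_data", "*", "*", "*", "*.csv")):
        state, district, year = path.split(os.sep)[1:4]
        df = pd.read_csv(path)
        df["state_name"] = state
        df["district_name"] = district
        df["year"] = year
        frames.append(df)

    # Stack everything into one consolidated CSV file.
    pd.concat(frames, ignore_index=True).to_csv("consolidated_data.csv", index=False)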

The final dataset should be a CSV file with the following columns:

year | state_name | district_name | registration_no | association_name | address | amount_of_FC_received

Submission:

1. You need to create a zip folder in your cloud storage (Google Drive/OneDrive) named Name_JDC_IDP
as the folder name and share it with us.
2. Under the assignment folder, we expect you to provide us with the following:
a. Python script to scrape the raw data
b. Draft on the steps followed during the data scraping process
c. Python script to clean and consolidate (data transformation) the data
d. Draft on the steps followed during the data transformation process
e. Exploratory analysis report on final dataset
f. Final Dataset

Note:

Please refrain from sending us IPython notebooks; you can always convert your Jupyter notebook to
a script.
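
For reference, running jupyter nbconvert --to script notebook.ipynb converts a notebook into a plain Python script (notebook.ipynb here stands for your own file).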

Please adhere to good coding standards, such as naming conventions and script comments, in all of your scripts, with the aim that anyone can run the scripts, get the raw data, and transform the data.

Timeline:

You should complete these exercises within 2 working days of receiving the email. If you fail to
send the assignment on time, the interview process will be voided. In case of an emergency, please write to
us.
