Itr Project

This industrial training report details Neha Navale's experience with web scraping and SQL database management at Dcodetech, highlighting the integration of these techniques for data collection and analysis. The project emphasizes the use of Python libraries for web scraping and the application of SQL for data storage and retrieval. Overall, the training was a valuable opportunity for skill development and practical experience in the IT field.

Uploaded by

Neha Navale
Copyright
© All Rights Reserved

MAHARASHTRA STATE BOARD OF TECHNICAL EDUCATION

DILKAP RESEARCH INSTITUTE OF ENGINEERING &
MANAGEMENT STUDIES (POLYTECHNIC)

Industrial Training Report


On
Web Scraping
AT
(Dcodetech)

Submitted By:-
NEHA NAVALE
Enrollment No:-2217480037

Under the guidance of

Industrial Supervisor: Shweta Salunke
Mentor: Swati Shelar
MAHARASHTRA STATE BOARD OF
TECHNICAL EDUCATION
Certificate
This is to certify that Roll No of 5th Semester of Diploma in Computer
Engineering of Institute Dilkap Institute of Engineering and Management
Studies (Code: 22057) has completed the Industrial Training satisfactorily in
Web Scraping for the academic year 2024-2025 as prescribed in the curriculum.

Place: Neral Enrollment No:


Date:

Mentor Head of the Department Principal

Seal of
Institution
Certificate of Industrial Training
ITR PROJECT

Index

Sr. No. Title

1. Acknowledgement
2. Abstract
3. Introduction to Industry
4. Project 1: Web Scraping
4.1. Basics of Python
4.2. About Web Scraping
4.3. Source Code
4.4. Output
5. Project 2: SQL
5.1. About SQL
5.2. Source Code
5.3. Output
6. Conclusion
7. Experience of Industrial Training

(Dilkap Research Institute of Engineering and Management Studies)



Acknowledgement

I am immensely grateful for the valuable industrial training opportunity provided by
Dcodetech. Being a part of this experience has been truly enriching, and I consider myself
fortunate to have had this chance for learning and growth.

I extend my heartfelt appreciation to Industries Continentals and MSBTE for giving us this
exceptional opportunity to participate in this course. It has been a significant milestone in my
career development, and I am determined to apply the skills and knowledge gained to achieve
my career objectives.

I would like to express my deepest thanks to Miss Reshma Salunke for her invaluable
involvement in making crucial decisions, providing guidance, and arranging facilities that
facilitated our learning journey. Her contribution has been instrumental in shaping our
experience positively.

Moving forward, I look forward to continuing our cooperation and collaboration with all of
you in the future. Lastly, I extend my sincere gratitude to the industrial training coordinators
and faculty members for their support and dedication throughout this journey.


Abstract
This project explores the integration of web scraping techniques with SQL database management to
efficiently collect, store, and analyze online data. Web scraping, a process used to extract
information from websites, is coupled with SQL (Structured Query Language) to handle data
storage and retrieval effectively. The primary objectives of this project are to demonstrate the
process of scraping data from multiple web sources, to develop a systematic approach for storing
this data in an SQL database, and to showcase the potential of SQL queries in analyzing and
visualizing the scraped data.

The project begins with the identification of relevant web sources and the design of scraping scripts
using Python libraries such as BeautifulSoup and Scrapy. These scripts are responsible for
extracting data from structured and unstructured web pages. The extracted data is then cleaned and
formatted before being imported into a relational database managed by SQL. The SQL database
schema is designed to optimize data integrity and query performance, incorporating tables and
relationships that reflect the data’s structure.

Subsequent stages involve the development of SQL queries to perform various analyses, such as
data aggregation, filtering, and reporting. The effectiveness of the integration is evaluated through
case studies and performance metrics, highlighting the benefits of combining web scraping with
SQL in terms of data accessibility and analytical capability.

Overall, this project demonstrates the practical application of web scraping and SQL in
transforming raw web data into structured, actionable insights, offering a valuable toolset for data-
driven decision-making across diverse domains.


Introduction to Industry

About Dcodetech

Dcodetech was established in 2016. It is a career and educational network for
professionals and professional development. We offer a quality learning experience in the
areas of IT training. Dcodetech’s focus is on providing advanced training and certifications in
all technologies. We provide training for PYTHON, DATA SCIENCE, MACHINE LEARNING,
DATA ANALYTICS, ARTIFICIAL INTELLIGENCE, ORACLE, IOT, DIGITAL MARKETING, .NET,
JAVA, PHP, IOS, ANDROID etc. in Thane, with certified professionals who have strong
experience and skills in their particular languages. As a leading IT training provider, Dcodetech
combines training methodologies with real-time learning experience to deliver integrated
learning solutions. Our institute also offers a wide range of IT career courses.


Project 1: Web Scraping

Basics of Python

• Python

o Python is a high-level programming language.

o It can be used to develop software, games, web applications, automation scripts and AI systems.

o Python was first released in 1991.

o It is an open-source programming language.

o More than 5 lakh (500,000) libraries are available for use through the Python Package Index (PyPI).

o Python is a case-sensitive language.

• Data Type

integer (int) : stores whole numbers. The value can be either positive
or negative.

float (float) : stores numbers that have a decimal point. The value can
be positive or negative.

string (str) : a sequence of characters, which may include letters, digits and special characters.
It is always enclosed in either single quotes or double quotes.

boolean (bool) : stores only two values, True or False.

complex (complex) : stores complex numbers, which have a real part and an imaginary part.


Variables
A variable is a human-readable name that refers to the memory location where a
specific value is stored. Rules for naming a variable:
• It should not start with a number.
• It should not contain any spaces.
• The only special character allowed is the underscore (_); a hyphen (-) is not allowed.
• It can combine letters and numbers.
• It should not be a reserved word.
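
The naming rules can be checked directly in Python. The following is a minimal sketch, not part of the training material: the helper name is_valid_variable_name and the candidate names are hypothetical, while str.isidentifier() and keyword.iskeyword() are standard-library calls.

```python
import keyword


def is_valid_variable_name(name: str) -> bool:
    """Return True if `name` can be used as a Python variable name."""
    # str.isidentifier() rejects names that start with a digit or that
    # contain spaces, hyphens or other special characters.
    # keyword.iskeyword() rejects reserved words such as `class` or `for`.
    return name.isidentifier() and not keyword.iskeyword(name)


# Hypothetical candidate names, chosen to exercise each rule.
candidates = ["total_marks", "batch2", "2batch", "total marks", "total-marks", "class"]
for candidate in candidates:
    print(f"{candidate!r}: {is_valid_variable_name(candidate)}")
```

Only the first two candidates pass: the others start with a digit, contain a space, contain a hyphen, or are a reserved word.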

Data Types Example


1. String

2. Integer

3. Float

4. Boolean
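
As an illustrative sketch of these data types (the variable names and sample values below are hypothetical, not taken from the training exercises), the built-in type() function reports the type of each value:

```python
# Hypothetical sample values, one per data type.
language = "Python"       # str: text enclosed in single or double quotes
release_year = 1991       # int: whole number
version = 3.12            # float: number with a decimal point
is_open_source = True     # bool: either True or False
point = 2 + 3j            # complex: real part 2, imaginary part 3

samples = [language, release_year, version, is_open_source, point]

# type(value).__name__ gives the short type name, e.g. "str" or "int".
type_names = [type(value).__name__ for value in samples]

for value, type_name in zip(samples, type_names):
    print(f"{value!r} is of type {type_name}")
```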


Operators in Python
1. Arithmetic operators
a) + (addition)
b) - (subtraction)
c) * (multiplication)
d) / (division)
e) // (floor division)
f) ** (exponentiation)

2. Logical operators
a) and
b) or
c) not

3. Conditional (comparison) operators
1. > (greater than)
2. < (less than)
3. >= (greater than or equal to)
4. <= (less than or equal to)
5. == (equal to)
6. != (not equal to)

4. Bitwise operators
1. & (bitwise AND)
2. | (bitwise OR)
3. ^ (bitwise XOR)
4. ~ (bitwise NOT)

5. Membership operators
1. in
2. not in
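
The short sketch below applies one operator from each group to two hypothetical operands, a = 7 and b = 2. It is only an illustration of how the operators behave, not code from the training.

```python
a, b = 7, 2  # hypothetical sample operands

# Arithmetic operators
arithmetic = {
    "a + b": a + b,    # addition
    "a - b": a - b,    # subtraction
    "a * b": a * b,    # multiplication
    "a / b": a / b,    # division, always gives a float
    "a // b": a // b,  # floor division, drops the fractional part
    "a ** b": a ** b,  # exponentiation
}

# Conditional (comparison) operators give True or False.
comparison = (a > b, a < b, a >= b, a <= b, a == b, a != b)

# Logical operators combine boolean expressions.
logical = (a > 5 and b > 5, a > 5 or b > 5, not a > 5)

# Bitwise operators work on the binary form: 7 is 0b111, 2 is 0b010.
bitwise = (a & b, a | b, a ^ b, ~a)

# Membership operators test whether a value is inside a collection.
membership = (b in [1, 2, 3], b not in [1, 2, 3])

for label, result in arithmetic.items():
    print(label, "=", result)
print(comparison, logical, bitwise, membership)
```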


Data Structures
• List
o it is mutable (its items can be updated)
o it can store values of any data type
o you do not have to assign indexes yourself; each item gets one automatically
o indexing starts from position 0
o it can store duplicate values

• Tuple
o it is immutable (its items cannot be updated)
o it can store duplicate values
o indexing starts from position 0

• Dictionary
o it stores data as key-value pairs, where the key acts as the index
o it is mutable (its items can be updated)
o values can be duplicated, but keys cannot
o it is written with curly braces {}
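
A minimal sketch of the three data structures follows. All names and values are hypothetical sample data, used only to show mutability, duplicates and indexing.

```python
# List: mutable, allows duplicates, indexed from 0.
marks = [72, 85, 85]
marks.append(90)   # add a new item at the end
marks[0] = 75      # update the item at index 0

# Tuple: allows duplicates and indexing, but cannot be changed.
weekdays = ("Mon", "Tue", "Mon")
try:
    weekdays[0] = "Sun"
    tuple_was_changed = True
except TypeError:
    # Assigning to a tuple item raises TypeError.
    tuple_was_changed = False

# Dictionary: key-value pairs; values may repeat, keys may not.
student = {"course": "Python", "favourite": "Python"}
student["course"] = "SQL"   # existing key: the value is replaced
student["batch"] = "A"      # new key: a new pair is added

print(marks, weekdays[1], tuple_was_changed, student)
```

Assigning to an existing dictionary key replaces its value instead of creating a second entry, which is why keys stay unique.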


About Web Scraping

Web scraping is a technique used to extract data from websites. It involves retrieving web pages
and extracting useful information from them, usually with programming languages and
tools designed to interact with web content. Python is particularly popular for web scraping due to
its robust libraries and ease of use. Here’s a
basic guide on how to get started with web scraping in Python:

1. Libraries for Web Scraping

• Requests: For sending HTTP requests to fetch the webpage.


• BeautifulSoup: For parsing HTML and extracting data.
• lxml: For faster parsing of HTML and XML (can be used with BeautifulSoup).
• Selenium: For web scraping dynamic content that is loaded with JavaScript.

2. Basic Web Scraping with Requests and BeautifulSoup

• Step 1: Install Libraries

Install the libraries with pip if you haven’t already, for example: pip install requests beautifulsoup4

• Step 2: Fetch a Web Page

Use the requests library to download the web page.
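
The two steps can be sketched as follows. This is a minimal illustration rather than the project's actual source code: it assumes the requests and beautifulsoup4 packages are installed, and the function names fetch_html and extract_links, the SAMPLE_HTML string and the example.com address are all hypothetical placeholders. The parsing step runs on the inline sample so that it works without a network connection.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a web page and return its HTML as text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an error for 4xx/5xx responses
    return response.text


def extract_links(html: str) -> list[tuple[str, str]]:
    """Return (link text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (anchor.get_text(strip=True), anchor["href"])
        for anchor in soup.find_all("a", href=True)
    ]


# Hypothetical sample page used instead of a live website.
SAMPLE_HTML = """
<html><head><title>Course List</title></head>
<body>
  <a href="/python">Python</a>
  <a href="/sql">SQL</a>
  <a>No link here</a>
</body></html>
"""

links = extract_links(SAMPLE_HTML)
print(links)

# With network access, a live page could be fetched in the same way
# (example.com is a placeholder address):
# links = extract_links(fetch_html("https://example.com"))
```

Keeping the download and the parsing in separate functions lets the parsing logic be tried on saved HTML before it is pointed at a real site.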


Source Code


Output


Project 2: SQL

About SQL

SQL stands for Structured Query Language.

Types of Relationship
o One to One
o One to Many (one trainer has multiple batches)
o Many to One (multiple batches have one trainer)
o Many to Many

• Database: A database is a collection of multiple tables.

• Table: A table is a collection of rows and columns.

SQL Queries Examples
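
As a minimal illustrative sketch (not the queries written during the training), the one-to-many relationship between a trainer and batches can be modelled with Python's built-in sqlite3 module. The table names, column names and sample rows are hypothetical, and SQLite stands in for whichever SQL database was actually used.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the trainer reference

# Two tables: each batch row points to exactly one trainer (one to many).
conn.executescript("""
    CREATE TABLE trainer (
        trainer_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE batch (
        batch_id   INTEGER PRIMARY KEY,
        title      TEXT NOT NULL,
        trainer_id INTEGER NOT NULL REFERENCES trainer(trainer_id)
    );
""")

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO trainer (trainer_id, name) VALUES (?, ?)",
    [(1, "Trainer A"), (2, "Trainer B")],
)
conn.executemany(
    "INSERT INTO batch (batch_id, title, trainer_id) VALUES (?, ?, ?)",
    [(10, "Python Morning", 1), (11, "Python Evening", 1), (12, "SQL Weekend", 2)],
)
conn.commit()

# Join the tables and count how many batches each trainer has.
rows = conn.execute("""
    SELECT t.name, COUNT(b.batch_id) AS batch_count
    FROM trainer AS t
    JOIN batch AS b ON b.trainer_id = t.trainer_id
    GROUP BY t.trainer_id, t.name
    ORDER BY t.trainer_id
""").fetchall()

print(rows)
conn.close()
```

The trainer_id column in the batch table is what links the two tables: many batch rows can carry the same trainer_id, but each batch row carries only one.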


Source Code


Output


Conclusion

The projects we tackled during our training period were incredibly valuable, greatly enhancing
our skills and knowledge. As I embarked on a 6-week industrial training at the beginning of
my 3rd year, I not only engaged in academic pursuits but also gained hands-on industrial
experience. Throughout this training stint, I successfully completed all assigned tasks and even
took charge of planning and executing projects.

Working on these projects was truly enjoyable, and the skills we acquired will undoubtedly
prove beneficial in the future. The lessons in team management that we learned are bound to
come in handy for upcoming projects. The projects themselves were captivating, providing us
with opportunities to explore new domains of study and broaden our horizons. It was a privilege
to be granted the chance to actively contribute to and learn from these projects.


Experience of Industrial Training

Embarking on this journey proved to be a fresh and demanding encounter for our team. With
determination, we successfully fulfilled all assigned tasks within the allocated timeframe. The
initial week was primarily devoted to the setup and introduction of Tableau Desktop, acquainting us
with its array of functions. While comprehensible to each of us, the true trial emerged when
constructing dashboards. Nevertheless, as we dedicated more time to practicing with dashboards,
their creation gradually became more manageable.

The project bestowed upon us further intensified the challenge, yet it became a catalyst for gaining
practical insights. Despite its difficulty, the project significantly enriched our knowledge and
fortified our proficiency in utilizing Tableau Desktop. This immersive industrial training experience
has been an enlightening voyage, enriching us with a myriad of newfound skills and understanding.

As the culmination of this training, I am now equipped with a wealth of valuable insights. The
exposure to real-world scenarios and hands-on practice has broadened my horizons and deepened
my understanding of data visualization and analysis. I am grateful for the opportunity to learn and
grow in such a dynamic environment, and I am excited to apply these newfound skills to future
endeavors.
