Web Scraping With Python - A Simple Guide (IMDB Movies)

The document provides a guide to web scraping movie data from IMDB using Python. It outlines 6 steps: 1) importing libraries like Requests and BeautifulSoup, 2) sending an HTTP request to IMDB, 3) parsing the HTML response, 4) extracting title and year data for each movie, 5) cleaning the data, and 6) saving the results to a text file. The goal is to automatically collect and save a list of the top rated movies and their associated years from IMDB.

Uploaded by

Kane Smith

MJ ANALYTICS

Manoj Kumar

WEB SCRAPING WITH PYTHON: A SIMPLE GUIDE (IMDB MOVIES)

https://www.linkedin.com/in/mk-analytics

Web scraping is a way to automatically copy and collect information from websites.

Think of it like using a highlighter to select and collect the information you want.

We're going to do this with the Top Rated Movies on IMDB using Python's BeautifulSoup and requests libraries.

Step 1: Import Required Libraries

Before we start, we need the right tools.

Python's requests library is like our browser: it sends a request to a website and brings back the response.

BeautifulSoup is our data organizer: it makes sense of the information we receive.
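A minimal sketch of the imports for this step. It assumes both packages are already installed (for example with `pip install requests beautifulsoup4`); the install command is not part of the original slides.

```python
# Third-party packages, assumed to be installed beforehand:
#   pip install requests beautifulsoup4
import requests                  # sends HTTP requests, like a browser would
from bs4 import BeautifulSoup    # parses HTML into a searchable structure
```

Note that the package is installed as `beautifulsoup4` but imported as `bs4`.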

Step 2: Send HTTP Request

Now, we tell requests to get IMDB's Top Rated Movies page.

It's like typing the URL into your browser and hitting enter.
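One way this step might look in code. The URL, the `User-Agent` value, and the helper name `fetch_page` are assumptions for illustration; the original slides do not show them. Some sites reject requests without a browser-like header, so the sketch sends one.

```python
import requests

# Assumed address of the Top Rated Movies chart (not given in the original).
TOP_RATED_URL = "https://www.imdb.com/chart/top/"

# Illustrative header; some sites refuse requests that lack a User-Agent.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; imdb-tutorial/0.1)"}


def fetch_page(url=TOP_RATED_URL, timeout=10):
    """Return the page's HTML as text, raising an error on HTTP failures."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return response.text
```

Calling `html = fetch_page()` would then hold the raw HTML of the page, provided the site allows the request. Checking a site's terms of use before scraping it is a sensible habit.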

Step 3: Parse HTML Content

Here's where BeautifulSoup comes in.

It takes the 'soup' of HTML code we got from the request and organizes it so we can find what we're looking for.
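A small sketch of the parsing step. To keep it runnable without a network connection, it parses a short stand-in HTML string rather than the real response; that string is made up for illustration.

```python
from bs4 import BeautifulSoup

# Stand-in for the HTML text returned by the request (not real IMDB markup).
html = "<html><head><title>Top Movies</title></head><body><h1>Chart</h1></body></html>"

# "html.parser" is Python's built-in parser, so nothing extra needs installing.
soup = BeautifulSoup(html, "html.parser")

print(soup.title.string)   # text inside the <title> tag
print(soup.h1.get_text())  # text inside the first <h1> tag
```

In the real script, the `html` variable would hold the text of the response from Step 2 instead.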

Step 4: Extract Data


Now that the data is organized, we ask BeautifulSoup
to find the specific pieces we're interested in.

In this case, it's the title and year of each movie.
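A hedged sketch of the extraction. The sample markup, the `titleColumn` and `secondaryInfo` class names, and the helper `extract_movies` are assumptions loosely modeled on an older chart layout; the live page may use different tags and change over time, so inspect it in your browser first.

```python
from bs4 import BeautifulSoup

# Sample markup for illustration only; the real page's structure may differ.
SAMPLE_HTML = """
<table>
  <tr><td class="titleColumn">1. <a href="#">The Shawshank Redemption</a>
      <span class="secondaryInfo">(1994)</span></td></tr>
  <tr><td class="titleColumn">2. <a href="#">The Godfather</a>
      <span class="secondaryInfo">(1972)</span></td></tr>
</table>
"""


def extract_movies(html):
    """Return a list of (raw_title, raw_year) pairs found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    for cell in soup.find_all("td", class_="titleColumn"):
        link = cell.find("a")                              # holds the title
        year = cell.find("span", class_="secondaryInfo")   # holds "(year)"
        if link is None or year is None:
            continue  # skip cells that do not match the assumed layout
        movies.append((link.get_text(), year.get_text()))
    return movies


print(extract_movies(SAMPLE_HTML))
```

The years still carry their parentheses at this point; tidying them is a separate step.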

Step 5: Clean and Print Data

Then we tidy up the data.

We're removing any unwanted characters from the title and year.

After cleaning, we print out the result.
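A sketch of what the cleaning might involve, assuming the unwanted characters are stray whitespace around the title and parentheses around the year. The helper `clean_movie`, the sample input, and the `Title (Year)` print format are illustrative choices, not taken from the original.

```python
def clean_movie(raw_title, raw_year):
    """Tidy one movie: normalize spaces in the title, drop "()" from the year."""
    title = " ".join(raw_title.split())   # collapse stray spaces and newlines
    year = raw_year.strip().strip("()")   # "(1994)" becomes "1994"
    return title, year


# Illustrative raw values of the kind an extraction step might produce.
raw_movies = [
    ("  The Shawshank Redemption\n", "(1994)"),
    ("The Godfather", " (1972) "),
]

for raw_title, raw_year in raw_movies:
    title, year = clean_movie(raw_title, raw_year)
    print(f"{title} ({year})")
```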

Output

Step 6: Save Data

Finally, we write the same movie titles and years to a file.

The output is a file named 'imdb_movies.txt' in your current directory.

If you open this file, you'll see the same list of movie titles and years.
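A minimal sketch of the save step. The file name `imdb_movies.txt` comes from the guide; the helper `save_movies`, the sample list, and the one-movie-per-line `Title (Year)` format are assumptions.

```python
def save_movies(movies, path="imdb_movies.txt"):
    """Write one 'Title (Year)' line per movie to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as f:
        for title, year in movies:
            f.write(f"{title} ({year})\n")


# Illustrative cleaned data; in the real script this comes from the earlier steps.
movies = [("The Shawshank Redemption", "1994"), ("The Godfather", "1972")]
save_movies(movies)
```

Opening the file with `"w"` overwrites any earlier copy, so each run leaves a fresh list. Setting `encoding="utf-8"` avoids errors on titles with accented characters.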

There you have it!

You've just learned the basics of web scraping using Python.

We can use this technique on other websites too.

Happy scraping!😄

LOOKING FOR REAL-WORLD EXPERIENCE IN DATA ANALYTICS?

DM ME ON LINKEDIN TO KNOW MORE, OR USE THE 'BOOK A 1:1 CALL' LINK IN MY PROFILE BIO

https://www.linkedin.com/in/mk-analytics