Scrape LinkedIn Using Selenium And Beautiful Soup in Python
Last Updated: 23 Jul, 2025
In this article, we are going to scrape LinkedIn using the Selenium and Beautiful Soup libraries in Python.
First of all, we need to install some libraries. Execute the following commands in the terminal.
pip install selenium
pip install beautifulsoup4
pip install lxml
The lxml package is needed because we will pass 'lxml' as the parser when creating Beautiful Soup objects later in this article.
In order to use Selenium, we also need a web driver. You can download a web driver for Internet Explorer, Firefox, or Chrome. In this article, we will be using the Chrome web driver.
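Note that if you are using Selenium 4.6 or newer, you may not need to download a driver manually at all: the bundled Selenium Manager can fetch a matching driver automatically. A minimal sketch:
Python3
from selenium import webdriver

# With Selenium 4.6+, Selenium Manager downloads a matching
# chromedriver automatically when no driver path is given
driver = webdriver.Chrome()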
Note: While following along with this article, if you get an error, there are most likely two possible reasons for it.
- The webpage took too long to load (probably because of a slow internet connection). In this case, use the time.sleep() function to give the page extra time to load, specifying the number of seconds to sleep as per your need. (A more robust alternative, using Selenium's explicit waits, is sketched right after this list.)
- The HTML of the webpage has changed from the one that existed when this article was written. If so, you will have to manually select the required webpage elements instead of copying the element names written below; how to find the element names is explained below. Additionally, don't decrease the window height and width from the defaults, as that also changes the HTML of the webpage.
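Instead of a fixed time.sleep(), Selenium's explicit waits can pause until an element actually appears. Here is a minimal sketch, using the same "username" ID that appears in the login code later in this article:
Python3
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.linkedin.com/login")

# Wait up to 10 seconds for the username field to appear,
# instead of sleeping for a fixed number of seconds
username = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "username"))
)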
Logging in to LinkedIn
Here we will write the code to log in to LinkedIn. First, we need to initiate the web driver using Selenium and send a GET request to the login page URL. Then, in the HTML document, we identify the input tags that accept the username/email and password, and the button tag for the sign-in button.
LinkedIn Login Page
Code:
Python3
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

# Creating a webdriver instance
# (in Selenium 4, the driver path is passed via a Service object)
driver = webdriver.Chrome(service=Service("Enter-Location-Of-Your-Web-Driver"))
# This instance will be used to log into LinkedIn
# Opening LinkedIn's login page
driver.get("https://www.linkedin.com/login")
# waiting for the page to load
time.sleep(5)
# entering username
username = driver.find_element(By.ID, "username")
# In case of an error, try changing the element
# tag used here.
# Enter Your Email Address
username.send_keys("User_email")
# entering password
pword = driver.find_element(By.ID, "password")
# In case of an error, try changing the element
# tag used here.
# Enter Your Password
pword.send_keys("User_pass")
# Clicking on the log in button
# Format (syntax) of writing XPath -->
# //tagname[@attribute='value']
driver.find_element(By.XPATH, "//button[@type='submit']").click()
# In case of an error, try changing the
# XPath used here.
After executing the above command, you will be logged into your LinkedIn profile. Here is what it would look like.
Part 1 Code Execution
Extracting Data From a LinkedIn Profile
Here is the video of the execution of the complete code.
Part 2 Code Execution
2.A) Opening a Profile and Scrolling to the Bottom
Let us say that you want to extract data from Kunal Shah's LinkedIn profile. First of all, we need to open his profile using the URL of his profile. Then we have to scroll to the bottom of the web page so that the complete data gets loaded.
Python3
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

# Creating an instance
# (in Selenium 4, the driver path is passed via a Service object)
driver = webdriver.Chrome(service=Service("Enter-Location-Of-Your-Web-Driver"))
# Logging into LinkedIn
driver.get("https://www.linkedin.com/login")
time.sleep(5)
username = driver.find_element(By.ID, "username")
username.send_keys("") # Enter Your Email Address
pword = driver.find_element(By.ID, "password")
pword.send_keys("") # Enter Your Password
driver.find_element(By.XPATH, "//button[@type='submit']").click()
# Opening Kunal's Profile
# paste the URL of Kunal's profile here
profile_url = "https://www.linkedin.com/in/kunalshah1/"
driver.get(profile_url) # this will open the link
Output:
Kunal Shah - LinkedIn Profile
Now, we need to scroll to the bottom. Here is the code to do that:
Python3
start = time.time()

# will be used in the while loop
initialScroll = 0
finalScroll = 1000

while True:
    # this command scrolls the window from the pixel value
    # stored in the initialScroll variable to the pixel
    # value stored in the finalScroll variable
    driver.execute_script(
        f"window.scrollTo({initialScroll}, {finalScroll})")

    initialScroll = finalScroll
    finalScroll += 1000

    # we will stop the script for 3 seconds so that
    # the data can load; change it as per your needs
    # and internet speed
    time.sleep(3)

    end = time.time()

    # We will scroll for 20 seconds.
    # You can change it as per your needs and internet speed
    if round(end - start) > 20:
        break
The page is now scrolled to the bottom. As the page is completely loaded, we will scrape the data we want.
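As an alternative to scrolling for a fixed 20 seconds, you can keep scrolling until the page height stops growing. This is a sketch of that idea, not part of the original approach:
Python3
# Alternative: scroll until the page height stops changing
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(3)  # give newly loaded content time to appear
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break  # nothing new loaded, we are at the bottom
    last_height = new_height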
Extracting Data from the Profile
To extract data, firstly, store the source code of the web page in a variable. Then, use this source code to create a Beautiful Soup object.
Python3
src = driver.page_source
# Now using beautiful soup
soup = BeautifulSoup(src, 'lxml')
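If you would rather not install lxml, Beautiful Soup's built-in parser is a drop-in alternative:
Python3
# Python's built-in HTML parser works in place of lxml
soup = BeautifulSoup(src, 'html.parser')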
Extracting Profile Introduction:
To extract the profile introduction, i.e., the name, the company name, and the location, we need to find the source code of each element. First, we will find the source code of the div tag that contains the profile introduction.
Chrome - Inspect Elements
Now, we will use Beautiful Soup to extract this div tag in Python.
Python3
# Extracting the HTML of the complete introduction box
# that contains the name, company name, and the location
intro = soup.find('div', {'class': 'pv-text-details__left-panel'})
print(intro)
Output:
(Scribbled) Introduction HTML
We now have the required HTML to extract the name, company name, and location. Let's extract the information now:
Python3
# In case of an error, try changing the tags used here.
name_loc = intro.find("h1")
# Extracting the Name
name = name_loc.get_text().strip()
# strip() is used to remove any extra blank spaces
works_at_loc = intro.find("div", {'class': 'text-body-medium'})
# this gives us the HTML of the tag in which the Company Name is present
# Extracting the Company Name
works_at = works_at_loc.get_text().strip()
location_loc = intro.find_all("span", {'class': 'text-body-small'})

# Extracting the Location
# The first element in location_loc has the location
location = location_loc[0].get_text().strip()
print("Name -->", name,
"\nWorks At -->", works_at,
"\nLocation -->", location)
Output:
Name --> Kunal Shah
Works At --> Founder : CRED
Location --> Bengaluru, Karnataka, India
Extracting Data from the Experience Section
Next, we will extract the Experience from the profile.
HTML of Experience Section
Python3
# Getting the HTML of the Experience section in the profile
experience = soup.find("section", {"id": "experience-section"}).find('ul')
print(experience)
Output:
Experience HTML Output
We have to go inside the HTML tags until we find our desired information. In the above image, we can see the HTML to extract the current job title and the name of the company. We now need to go inside each tag to extract the data.
Scrape Job Title, company name and experience:
Python3
# In case of an error, try changing the tags used here.
li_tags = experience.find('div')
a_tags = li_tags.find("a")
job_title = a_tags.find("h3").get_text().strip()
print(job_title)
company_name = a_tags.find_all("p")[1].get_text().strip()
print(company_name)
joining_date = a_tags.find_all("h4")[0].find_all("span")[1].get_text().strip()

employment_duration = a_tags.find_all("h4")[1].find_all(
    "span")[1].get_text().strip()

print(joining_date + ", " + employment_duration)
Output:
'Founder'
'CRED'
Apr 2018 – Present, 3 yrs 6 mos
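The code above reads only the first entry in the Experience section. Here is a hedged sketch for iterating over every entry, assuming each one sits in its own li tag with the same inner structure as the first:
Python3
# Assumption: each experience entry is an li tag whose anchor
# contains the same h3 (job title) structure used above
for entry in experience.find_all("li"):
    a_tag = entry.find("a")
    if a_tag is None:  # skip list items that are not job entries
        continue
    title_tag = a_tag.find("h3")
    if title_tag is not None:
        print(title_tag.get_text().strip())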
Extracting Job Search Data
We will use Selenium to open the jobs page.
Python3
jobs = driver.find_element(By.XPATH, "//a[@data-link-to='jobs']/span")
# In case of an error, try changing the XPath.
jobs.click()
Now that the jobs page is open, we will create a BeautifulSoup object to scrape the data.
Python3
job_src = driver.page_source
soup = BeautifulSoup(job_src, 'lxml')
Scrape Job Title:
First of all, we will scrape the Job Titles.
HTML of Job Title
On skimming through the HTML of this page, we will find that each Job Title has the class name "job-card-list__title". We will use this class name to extract the job titles.
Python3
jobs_html = soup.find_all('a', {'class': 'job-card-list__title'})
# In case of an error, try changing the class name used here.

job_titles = []
for title in jobs_html:
    job_titles.append(title.text.strip())

print(job_titles)
Output:
Job Titles List
Scrape Company Name:
Next, we will extract the Company Name.
HTML of Company Name
We will use the class name to extract the names of the companies:
Python3
company_name_html = soup.find_all(
    'div', {'class': 'job-card-container__company-name'})

company_names = []
for name in company_name_html:
    company_names.append(name.text.strip())

print(company_names)
Output:
Company Names List
Scrape Job Location:
Finally, we will extract the Job Location.
HTML of Job Location
Once again, we will use the class name to extract the location.
Python3
import re # for removing the extra blank spaces
location_html = soup.find_all(
    'ul', {'class': 'job-card-container__metadata-wrapper'})

location_list = []
for loc in location_html:
    res = re.sub(r'\n\n +', ' ', loc.text.strip())
    location_list.append(res)

print(location_list)
Output:
Job Locations List
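Since job_titles, company_names, and location_list are parallel lists, you can also zip them together into one record per job posting. Here is a small sketch, assuming the three lists scraped above have the same length:
Python3
# Assumption: the three lists are parallel and equally long
job_postings = [
    {"title": title, "company": company, "location": location}
    for title, company, location in zip(job_titles,
                                        company_names,
                                        location_list)
]
print(job_postings)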