
Project 2: Using Selenium + Python (schools)

This document outlines a project that uses Selenium and Python to scrape school data from a specified website. It covers the necessary setup, including module imports, selectors, wait strategies, and the final code that extracts the data and saves it to a CSV file. Key steps include handling dynamic elements, managing browser visibility, and extracting the data reliably.

Demo URL: https://directory.ntschools.net/#/schools

Data to be extracted:
• Name of schools
• Telephone number
• Email Address
• Physical and Postal Address
METHOD:
• pip install selenium

• Use https://selenium-python.readthedocs.io/ for further documentation
Step 1: Import the necessary modules: webdriver, Chrome, Service, By, Keys, time
• The By module is used for locating elements, mainly via:
– By.XPATH
– By.CSS_SELECTOR
• Only if the CSS selector is not working, fall back to:
– By.ID
– By.CLASS_NAME
• The Keys module is used for sending key events
• The time module is used to pause execution so the browser has time to produce the desired result; without a pause, the driver may quit before the page has finished loading.

from selenium import webdriver
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time
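
Before the next question, it helps to know which Selenium version is installed, since executable_path was removed in Selenium 4. A quick check:

import selenium
print(selenium.__version__)  # 4.x removed the executable_path argument; use Service instead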
Ques: Nowadays executable_path is not working. What to do?

• Ans:
• # driver=Chrome(executable_path="D:/DataScientist/WebScraperPractical/chromedriver-win64/chromedriver.exe")
• # The executable_path argument has been removed, so use Service instead:
• s=Service("D:/DataScientist/WebScraperPractical/chromedriver-win64/chromedriver.exe")
• driver=webdriver.Chrome(service=s)

• CHECK DOCUMENTATION: https://selenium-python.readthedocs.io/
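
Note: in Selenium 4.6 and later, Selenium Manager can download a matching chromedriver automatically, so the Service path becomes optional. A minimal sketch, assuming Selenium >= 4.6:

from selenium import webdriver

# Selenium Manager resolves the driver binary automatically (Selenium 4.6+)
driver = webdriver.Chrome()
driver.get("https://directory.ntschools.net/#/schools")
print(driver.title)
driver.quit()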


Step 2: Add the SelectorGadget extension in Chrome

We use it to grab CSS selectors quickly, so that we don't have to repeatedly dig through the Inspect tab.
Step 3:
• Now go to your desired website and click on the extension. Then click on the element you want: it turns YELLOW. Click on anything you don't want: it turns RED.
• In the bottom-right corner, the extension shows the generated selector and a match count (a screenshot of this appeared in the original slides).
• 273 here is the total number of matched items; you can verify the count in Selenium, as shown below.
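
A small sanity check, continuing from the driver set up earlier: the count printed should match the number SelectorGadget reported.

# count how many elements the SelectorGadget CSS selector matches
selector = "#search-panel-container .nav-link"
links = driver.find_elements(By.CSS_SELECTOR, selector)
print(len(links))  # should match the extension's count (e.g. 273)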


Step 4: There are two types of WAITS:

1. Implicit wait – tells the driver to keep trying for up to the given number of seconds while locating an element.
e.g. driver.implicitly_wait(10)

# implicit wait
driver.implicitly_wait(20)

selector="#search-panel-container .nav-link"
links=driver.find_elements(By.CSS_SELECTOR, selector)

2. Explicit wait – makes WebDriver wait for a certain condition to occur before proceeding further with execution. (A hard pause like time.sleep(10) also stops the script, but it always waits the full duration; the WebDriverWait call below is the proper explicit wait.)

# NOW an optional explicit wait (needs two extra imports)
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

links=WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector)))

Remember, there is a chance of an error when you run the line above: until() takes exactly one argument, and the expected condition itself takes a single locator tuple, so the parentheses nest. Check that there are three closing parentheses at the end, and pass selector without quotes (it is already a string variable).
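
One way to avoid miscounting the parentheses is to pull the locator tuple out into its own variable; a small sketch:

# the locator is a single (strategy, value) tuple
locator = (By.CSS_SELECTOR, selector)
links = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located(locator)
)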
Step 5: Now we have to click on the links

1. Write links[i].click() (click a single element from the list, not the list itself)
2. Click on a school name, open the INSPECT tab, and copy the school title, which is a class tag
3. Press CTRL+F in the Elements panel and paste it with a dot prefix: .school-title h1
4. Check that it is the only element of its kind; if so, this is our selector for the name
Ques: Whenever you get a "stale element reference" exception

Ans: It means the element reference you are holding is no longer attached to the current page.

Whenever we click a link, the browser moves to another page, and the previously located elements (held in memory) become stale.

So we have to re-locate the links inside the loop:

for i in range(2):
    # re-locate the links on each iteration so the references are fresh
    links=WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector)))
    links[i].click()
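
An alternative sketch that sidesteps stale references entirely: collect the link URLs up front, then navigate with driver.get(). This assumes each .nav-link exposes an href attribute, which may not hold on every site:

# grab the URLs first so we never hold on to stale element references
urls = [link.get_attribute("href") for link in links]
for url in urls[:2]:
    driver.get(url)
    # ... extract the details here ...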
Step 6: Here,

driver.find_element(By.XPATH,'//div[text()="Physical Address"]/following-sibling::div')

Here we are being specific: the div element whose text is "Physical Address", then the sibling div that follows it.

Or

driver.find_element(By.XPATH,'//*[text()="Physical Address"]/following-sibling::*')

where * means any element. A helper built on this pattern is sketched below.
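
Since the same label/value pattern repeats for every field, it can be wrapped in a small helper. A sketch: field_value is a hypothetical name, and the Phone field in the final code uses a slightly different XPath, so it is handled separately.

def field_value(driver, label):
    # return the text of the div that immediately follows the given label div
    xpath = f'//div[text()="{label}"]/following-sibling::div'
    return driver.find_element(By.XPATH, xpath).text

# usage:
# physical = field_value(driver, "Physical Address")
# postal = field_value(driver, "Postal Address")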
Step 7: Making the Chrome browser headless

# used to hide Chrome's visibility (headless mode)
options=ChromeOptions()
options.add_argument('--headless')

Note: options.headless=True is deprecated in recent Selenium versions; use options.add_argument('--headless') instead, as the final code does. ChromeOptions is imported from selenium.webdriver.
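
A minimal end-to-end sketch of the headless setup, reusing the chromedriver path from earlier:

from selenium import webdriver
from selenium.webdriver import ChromeOptions
from selenium.webdriver.chrome.service import Service

options = ChromeOptions()
options.add_argument('--headless')     # no visible browser window
options.add_argument('--disable-gpu')  # recommended on Windows

s = Service("D:/DataScientist/WebScraperPractical/chromedriver-win64/chromedriver.exe")
driver = webdriver.Chrome(service=s, options=options)
driver.get("https://directory.ntschools.net/#/schools")
print(driver.title)  # confirms the page loaded even without a window
driver.quit()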

Step 8: To save the output

with open('ntschools_data.csv','w', newline='', encoding='utf-8') as f:
    writer=csv.DictWriter(f, fieldnames=['name','physical_add','postal_add','phone_no'])
    writer.writeheader()
    writer.writerows(results)

Here ntschools_data.csv is the output file; 'w' means write mode; newline='' stops the csv module from inserting a blank row between records on Windows. This needs import csv.
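
To confirm the file was written correctly, it can be read back with the standard csv module; a quick sketch:

import csv

with open('ntschools_data.csv', newline='', encoding='utf-8') as f:
    for row in csv.DictReader(f):
        print(row['name'], row['phone_no'])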
FINAL CODE:
from selenium import webdriver
from selenium.webdriver import Chrome, ChromeOptions
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import csv

# used to hide chrome visibility (headless mode)
options=ChromeOptions()
options.add_argument('--headless')
# options.headless=True  # deprecated; use add_argument instead
options.add_argument('--disable-gpu')

s=Service("D:/DataScientist/WebScraperPractical/chromedriver-win64/chromedriver.exe")
driver=webdriver.Chrome(service=s, options=options)
driver.get("https://directory.ntschools.net/#/schools")

selector="#search-panel-container .nav-link"
# explicit wait for the school links to appear
links=WebDriverWait(driver,30).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector)))

school_name_selector=".school-title h1"
results=[]
# for i in range(len(links)):  # full run; 3 schools used here for testing
for i in range(3):
    # re-locate the links each time to avoid stale element references
    links=WebDriverWait(driver,30).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector)))
    links[i].click()

    name_e=WebDriverWait(driver,30).until(EC.presence_of_element_located((By.CSS_SELECTOR, school_name_selector)))
    # print(name_e.text)
    details={
        'name':name_e.text,
        'physical_add':driver.find_element(By.XPATH,'//div[text()="Physical Address"]/following-sibling::div').text,
        'postal_add':driver.find_element(By.XPATH,'//div[text()="Postal Address"]/following-sibling::div').text,
        'phone_no':driver.find_element(By.XPATH,'//div[text()="Phone"]/following-sibling::*/a').text,
    }
    results.append(details)
    driver.back()  # goes one step back in browser history

# print(results)
with open('ntschools_data1.csv','w', newline='', encoding='utf-8') as f:
    writer=csv.DictWriter(f, fieldnames=['name','physical_add','postal_add','phone_no'])
    writer.writeheader()
    writer.writerows(results)

driver.quit()
