Project 2 EmailbySeleniumSameProject
Data to be extracted:
• Name of schools
• Telephone number
• Email Address
• Physical and Postal Address
METHOD:
• pip install selenium
• # Old style, no longer works: the executable_path argument has been removed from recent Selenium 4 releases
• # driver=Chrome(executable_path="D:/DataScientist/WebScraperPractical/chromedriver-win64/chromedriver.exe")
• # Use a Service object instead
• s=Service("D:/DataScientist/WebScraperPractical/chromedriver-win64/chromedriver.exe")
• driver=webdriver.Chrome(service=s)
We use a browser extension to generate selectors quickly, so that we don’t have to keep going back to the INSPECT tab.
Step 3:
• Now go to your desired website and click on the extension. Then click on the link you want (it turns YELLOW) and double-click on anything you don’t want (it turns RED).
• In the bottom-right corner, the extension shows the selector it has generated.
1. Implicit wait: tells the driver to wait up to the given number of seconds while it tries to locate an element
Eg. driver.implicitly_wait(10)
# implicit wait
driver.implicitly_wait(20)
selector="#search-panel-container .nav-link"
links=driver.find_elements(By.CSS_SELECTOR, selector)
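2. Explicit wait: makes WebDriver wait for a certain condition to occur before proceeding further with execution.
Unlike time.sleep(10), which always pauses for the full 10 seconds, an explicit wait returns as soon as the condition is met.
Eg:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
links=WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector)))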
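Remember, this line often raises an error because of its parentheses. until() takes one argument (the condition), and presence_of_all_elements_located() also takes one argument: a (By, selector) tuple, which is why it has double parentheses. If the error says two arguments were given, the tuple is missing its own parentheses, so check that the line ends with three closing parentheses. Also, selector is a variable here, so it is written without quotes.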
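What until() does with its single argument can be pictured as a polling loop. The sketch below is a simplified, pure-Python illustration and not Selenium's actual implementation: the names wait_until and WaitTimeout are hypothetical, and the real until() also passes the driver to the condition.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value as soon as it appears, or raises WaitTimeout
    once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result          # condition met: stop waiting immediately
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)           # short pause before the next check
```

The difference from time.sleep(10) is visible here: the loop returns on the first successful check instead of always waiting out the whole timeout.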
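Step 5: Now we have to click on the links
1. links is a list, so click one element at a time: links[i].click()
2. On the page that opens, click on the school name in the INSPECT tab and copy the class of the school title (school-title), which is a class attribute
3. Press CTRL+F in the INSPECT tab and paste it with a leading dot, followed by the tag: .school-title h1
4. Check that it matches exactly one element. If it does, this is our selector for the school name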
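Ques: What to do whenever you get a “stale-element-reference-exception”?
Ans: The elements stored in links become stale once the page changes, so locate the links again inside the loop and click them by index:
for i in range(2):
    # print(links[i].text)
    links=WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector)))
    links[i].click()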
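The re-locate-by-index pattern above can be written as a small reusable function. This is only a sketch under assumptions: visit_each and its parameters (locate, scrape, go_back, limit) are hypothetical names, and the function is written against plain callables so it does not depend on Selenium itself.

```python
def visit_each(locate, scrape, go_back, limit=None):
    """Click every located element in turn, re-locating before each click.

    locate  -- callable returning a fresh list of clickable elements
    scrape  -- callable returning the data from the page that opened
    go_back -- callable returning to the list page
    limit   -- optional cap on how many elements to visit
    """
    count = len(locate())
    if limit is not None:
        count = min(count, limit)

    results = []
    for i in range(count):
        elements = locate()      # fresh references: the old ones are stale
        elements[i].click()
        results.append(scrape())
        go_back()
    return results
```

With Selenium, locate would wrap the WebDriverWait(...).until(...) call from the notes and go_back would be driver.back.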
Step 6: Here, we find each detail through the label that sits next to it
driver.find_element(By.XPATH,'//div[text()="Physical Address"]/following-sibling::div')
Here we are specific: we look for the div element whose text is "Physical Address" and then take the div sibling that follows it
Or
driver.find_element(By.XPATH,'//*[text()="Physical Address"]/following-sibling::*')
Where * means any element
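Since the same XPath shape is needed for several labels, it can be built by a tiny helper. This is a sketch only: sibling_xpath is a hypothetical name, and it assumes the label text contains no double quotes.

```python
def sibling_xpath(label, label_tag="div", sibling_tag="div"):
    """Build an XPath for the sibling that follows an element with the given text.

    Assumes `label` contains no double-quote characters.
    """
    return f'//{label_tag}[text()="{label}"]/following-sibling::{sibling_tag}'


# The two variants from the notes:
print(sibling_xpath("Physical Address"))
print(sibling_xpath("Physical Address", "*", "*"))
```

The resulting string is passed to driver.find_element(By.XPATH, ...) exactly as in the lines above.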
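Step 7: Making the Chrome browser headless and saving the results
Create a ChromeOptions object, add the headless argument to it, and pass it to the driver as options=options, so that Chrome runs without opening a visible window.
When writing the CSV, the first argument to open() (e.g. ntschools_data.csv) is the output file and ‘w’ means writing mode. newline='' stops Python from translating line endings, which otherwise produces blank rows between records when the file is opened in Excel on Windows.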
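The CSV step can be tried on its own, without a browser. The sketch below writes to an in-memory buffer instead of a file; write_results is a hypothetical helper name and the row contains placeholder values, not real school data.

```python
import csv
import io

FIELDS = ['name', 'physical_add', 'postal_add', 'phone_no']


def write_results(f, rows, fieldnames=FIELDS):
    """Write a header line and one CSV line per dict in `rows` to file object f."""
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Placeholder row for illustration only
sample = [{
    'name': 'Example School',
    'physical_add': '1 Example Street',
    'postal_add': 'PO Box 0',
    'phone_no': '00 0000 0000',
}]

buffer = io.StringIO(newline='')   # newline='' as with open(): no translation
write_results(buffer, sample)
print(buffer.getvalue())
```

For a real file, the same function can be called inside with open('ntschools_data.csv', 'w', newline='', encoding='utf-8') as f.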
FINAL CODE:
import csv
from selenium import webdriver
from selenium.webdriver import Chrome, ChromeOptions
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# run Chrome headless (no visible browser window)
options=ChromeOptions()
options.add_argument("--headless=new")
s=Service("D:/DataScientist/WebScraperPractical/chromedriver-win64/chromedriver.exe")
driver=webdriver.Chrome(service=s, options=options)
driver.get("https://directory.ntschools.net/#/schools")
selector="#search-panel-container .nav-link"
# explicit wait until the school links are present
links=WebDriverWait(driver,30).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector)))
school_name_selector=".school-title h1"
results=[]
# for i in range(len(links)):
for i in range(3):
    # locate the links again on every pass to avoid a stale element reference
    links=WebDriverWait(driver,30).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector)))
    links[i].click()
    name_e=WebDriverWait(driver,30).until(EC.presence_of_element_located((By.CSS_SELECTOR, school_name_selector)))
    # print(name_e.text)
    details={
        'name':name_e.text,
        'physical_add':driver.find_element(By.XPATH,'//div[text()="Physical Address"]/following-sibling::div').text,
        'postal_add':driver.find_element(By.XPATH,'//div[text()="Postal Address"]/following-sibling::div').text,
        'phone_no':driver.find_element(By.XPATH,'//div[text()="Phone"]/following-sibling::*/a').text,
    }
    results.append(details)
    driver.back() #goes one step back in browser history
# print(results)
with open('ntschools_data1.csv','w', newline='', encoding='utf-8') as f:
    writer=csv.DictWriter(f, fieldnames=['name','physical_add','postal_add','phone_no'])
    writer.writeheader()
    writer.writerows(results)
driver.quit()