
Extract Text from a Web Page Using Selenium and Save as Text File
We can extract text from a web page with Selenium WebDriver using the getText method and then save it to a text file. The getText method returns the text of an element only if the element is displayed (that is, not hidden by CSS).
We first have to locate the element on the page using any of the locators: id, class name, name, xpath, css selector, tag name, link text or partial link text. Once the text is obtained, we write it to a file using the File class together with the FileUtils.writeStringToFile method from Apache Commons IO.
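The file-writing step does not depend on Selenium, so it can be tried on its own. The following is a minimal sketch, assuming the text has already been captured into a String and using the standard java.nio.file API in place of Apache Commons IO. The class name SaveTextSketch, the save helper and the sample string are illustrative and not part of the original example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SaveTextSketch {

    // Writes the given text to the target file as UTF-8.
    // Files.write creates the file if it is missing and
    // overwrites any existing content by default.
    public static void save(Path target, String text) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the value returned by WebElement.getText()
        String s = "You are browsing the best resource for Online Education";

        // Relative path: resolved against the working directory
        Path target = Paths.get("savetxt.txt");
        save(target, s);

        System.out.println("Saved text to " + target.toAbsolutePath());
    }
}
```

Fixing the charset explicitly, rather than relying on the platform default, keeps the saved file identical across machines.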
Let us obtain the text "You are browsing the best resource for Online Education" from the page that the following example opens.
Example
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import java.util.concurrent.TimeUnit;
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
import java.nio.charset.Charset;

public class GetTxtSaveFile {
   public static void main(String[] args) {
      // path to geckodriver (backslashes must be escaped in a Java string)
      System.setProperty("webdriver.gecko.driver",
         "C:\\Users\\ghs6kor\\Desktop\\Java\\geckodriver.exe");
      WebDriver driver = new FirefoxDriver();
      // implicit wait: element lookups are retried for up to 5 seconds
      driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);
      // URL launch
      driver.get("https://www.tutorialspoint.com/index.htm");
      // identify element: the first h4 element on the page
      WebElement e = driver.findElement(By.tagName("h4"));
      // obtain the visible text of the element
      String s = e.getText();
      // write text to file, using the platform default charset
      File f = new File("savetxt.txt");
      try {
         FileUtils.writeStringToFile(f, s, Charset.defaultCharset());
      } catch (IOException exc) {
         exc.printStackTrace();
      }
      // close the browser and end the session
      driver.quit();
   }
}
Output
The savetxt.txt file is generated in the project directory and contains the text captured from the page.
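To confirm what was saved, the file can be read back and printed. This is a small sketch, assuming savetxt.txt sits in the working directory and was written with the platform default charset, as in the example above. The class name ReadSavedText and the read helper are illustrative.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadSavedText {

    // Reads the whole file into a String using the platform default
    // charset, matching the charset used when the file was written.
    public static String read(Path source) throws IOException {
        return new String(Files.readAllBytes(source), Charset.defaultCharset());
    }

    public static void main(String[] args) throws IOException {
        Path source = Paths.get("savetxt.txt");
        if (Files.exists(source)) {
            // Print the captured text so it can be compared with the page
            System.out.println(read(source));
        } else {
            System.out.println("File not found: " + source.toAbsolutePath());
        }
    }
}
```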