Java Program to Extract an HTML Tag from a String using RegEx
Last Updated: 31 Jan, 2023
In this article, we will extract the content of an HTML tag from a string with the help of regular expressions. A regular expression (regex, also called a rational expression) is a character sequence that specifies a search pattern in text. It can be a single character or a complex sequence of characters. Java has no literal syntax for regular expressions; they are written as ordinary strings and handled through the java.util.regex package. The following classes are present in the java.util.regex package:
- Pattern Class: An object of the Pattern class is the compiled form of a regular expression. Pattern has no public constructor, so a Pattern object is created with the static compile() method.
- Matcher Class: As the name suggests, an object of this class matches an input string against a compiled Pattern. Like Pattern, the Matcher class has no public constructor, so a Matcher object is obtained by calling the matcher() method of a Pattern object.
We are using the above classes to find and extract the content within the HTML tags.
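As a minimal sketch of how the two classes work together, the snippet below compiles a pattern and asks a Matcher for the first match. The class name PatternMatcherDemo, the helper firstNumber, and the sample inputs are illustrative assumptions and are not part of the original article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternMatcherDemo {
    // Hypothetical helper: returns the first run of digits in the input,
    // or null if the input contains no digits.
    static String firstNumber(String input) {
        // Pattern has no public constructor; compile() is the static factory.
        Pattern pattern = Pattern.compile("\\d+");
        // Matcher has no public constructor either; it comes from the Pattern.
        Matcher matcher = pattern.matcher(input);
        // find() scans the input for the next subsequence matching the pattern.
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("Order 42 shipped")); // prints 42
        System.out.println(firstNumber("no digits here"));   // prints null
    }
}
```

Note that the backslash in a regex such as \d must be doubled inside a Java string literal, which is why the pattern is written as "\\d+".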
Steps to Be Followed
Step 1:
Import the necessary classes: java.util.regex.Matcher and java.util.regex.Pattern.
Step 2:
Create an object of the Pattern class by passing the regular expression that represents the required HTML tag as a parameter to the compile() function.
Step 3:
Using the matcher() function, create a Matcher for the string from which we want to detect the HTML tags.
Step 4:
Use the find method under the Matcher class to see if any instance of the HTML tag is present inside the provided string or not.
Step 5:
If an instance of the HTML tag is present inside the given string, use the group() function of the Matcher class to retrieve the matched text.
Code
Java
// importing the necessary classes
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args)
    {
        // the heading must be closed with </h1> for the pattern to match
        String str = "Learning from <h1>GeeksforGeeks</h1>";

        // pattern object creation; \\S+ captures one or more
        // non-whitespace characters between the tags
        Pattern pattern = Pattern.compile("<h1>(\\S+)</h1>");

        // now the above compiled pattern will be checked
        // for its availability in target string
        Matcher matcher = pattern.matcher(str);
        if (matcher.find()) {
            // group(1) is the text captured by the parentheses
            String instr = matcher.group(1);
            System.out.println(instr);
        }
    }
}
Output:
GeeksforGeeks
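The example above retrieves a single heading whose text contains no whitespace. As a hedged sketch of one possible extension, the version below uses the reluctant quantifier (.*?) so the captured text may contain spaces, and it calls find() in a loop to collect every match. The class name TagExtractor, the helper extractAll, and the sample input are assumptions made for illustration and do not come from the original article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagExtractor {
    // Reluctant (.*?) stops at the first closing tag, so each
    // <h1> element is captured separately.
    private static final Pattern H1 = Pattern.compile("<h1>(.*?)</h1>");

    // Hypothetical helper: collects the text of every <h1> element in the input.
    static List<String> extractAll(String input) {
        List<String> contents = new ArrayList<>();
        Matcher matcher = H1.matcher(input);
        // Each call to find() resumes where the previous match ended.
        while (matcher.find()) {
            contents.add(matcher.group(1));
        }
        return contents;
    }

    public static void main(String[] args) {
        String str = "<h1>Geeks for Geeks</h1> and <h1>Java RegEx</h1>";
        System.out.println(extractAll(str)); // prints [Geeks for Geeks, Java RegEx]
    }
}
```

By default the dot does not match line terminators, so a heading that spans several lines would need the Pattern.DOTALL flag passed to compile().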
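Time complexity: roughly O(n) for typical inputs such as this one, where n is the length of the input string "str". Inputs that force the regex engine to backtrack repeatedly can take longer.
Space complexity: O(1), as the auxiliary space used by the program does not grow with the size of the input string.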