Remove URLs from string in Python
Last Updated: 24 Jan, 2024
A regular expression (regex) is a sequence of characters that defines a search pattern in text. To remove URLs from a string in Python, you can either match them with a regular expression using the built-in re module, or check individual words with the standard-library urllib.parse module. In this article, we will see how to remove URLs from a string in Python using both approaches.
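To see what the pattern used in the examples below actually matches, here is a small sketch that runs it with findall() on a few sample strings. The samples and the URL_PATTERN name are made up for this demonstration. The first branch, https?://\S+, matches anything starting with http:// or https://; the second, www\.\S+, catches bare www. addresses. Because \S+ stops only at whitespace, trailing punctuation ends up inside the match.

```python
import re

# Same pattern as in the article: "http://" or "https://" followed by
# non-whitespace, OR "www." followed by non-whitespace.
URL_PATTERN = re.compile(r'https?://\S+|www\.\S+')

# Illustrative inputs only
samples = [
    "Docs: https://docs.python.org/3/library/re.html",
    "Plain host: www.example.com",
    "No URL here, just text.",
    "Trailing punctuation: see https://example.com/page.",
]

for sample in samples:
    # findall() returns every non-overlapping match as a list of strings
    print(URL_PATTERN.findall(sample))
```

The last sample prints ['https://example.com/page.'], with the sentence-ending period included in the URL.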
Python Remove URLs from a String
Below are the ways by which we can remove URLs from a string in Python:
- Using the re.sub() function
- Using the re.findall() function
- Using the re.search() function
- Using the urllib.parse module
Python Remove URLs from String Using re.sub() function
In this example, the code defines a function 'remove_urls' to find URLs in text and replace them with a placeholder [URL REMOVED], using regular expressions for pattern matching and the re.sub() method for substitution.
Python3
import re
def remove_urls(text, replacement_text="[URL REMOVED]"):
    # Define a regex pattern to match URLs
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
    # Use the sub() method to replace URLs with the specified replacement text
    text_without_urls = url_pattern.sub(replacement_text, text)
    return text_without_urls
# Example:
input_text = "Visit on GeeksforGeeks Website: https://www.geeksforgeeks.org/"
output_text = remove_urls(input_text)
print("Original Text:")
print(input_text)
print("\nText with URLs Removed:")
print(output_text)
Output:
Original Text:
Visit on GeeksforGeeks Website: https://www.geeksforgeeks.org/
Text with URLs Removed:
Visit on GeeksforGeeks Website: [URL REMOVED]
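One limitation of this pattern is that a URL at the end of a sentence swallows the following period or comma. A possible refinement, sketched below, passes a function instead of a string to sub(); the function strips trailing punctuation from each match and re-appends it after the placeholder. This assumes characters such as . , ; : ! ? and closing brackets or quotes never end a real URL. The function name remove_urls_keep_punctuation and the TRAILING_PUNCTUATION set are hypothetical choices for this sketch.

```python
import re

URL_PATTERN = re.compile(r'https?://\S+|www\.\S+')

# Assumption: these characters are sentence punctuation, not part of a URL.
# Adjust the set for your own data.
TRAILING_PUNCTUATION = '.,;:!?)]}\'"'

def remove_urls_keep_punctuation(text, replacement_text="[URL REMOVED]"):
    def _replace(match):
        url = match.group(0)
        stripped = url.rstrip(TRAILING_PUNCTUATION)
        # Put back whatever punctuation was stripped from the end of the match
        return replacement_text + url[len(stripped):]
    # Pattern.sub() accepts a function that receives each match object
    return URL_PATTERN.sub(_replace, text)

print(remove_urls_keep_punctuation("Read https://example.com/page. Then visit www.example.org!"))
```

This prints "Read [URL REMOVED]. Then visit [URL REMOVED]!". The trade-off is that a URL which legitimately ends in ')' (some Wikipedia links do) leaves a stray ')' after the placeholder; remove ')' from the set if that matters for your data.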
Remove URLs from String Using re.findall() function
In this example, the Python code defines a function 'remove_urls_findall' that uses re.findall() to collect every URL in the given text and then replaces each one with the replacement text "[URL REMOVED]".
Python3
import re
def remove_urls_findall(text, replacement_text="[URL REMOVED]"):
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
    urls = url_pattern.findall(text)
    # Replace longer URLs first, so that a URL which is a prefix of another
    # (e.g. ".../a" and ".../a/b") cannot leave a fragment like "/b" behind
    for url in sorted(set(urls), key=len, reverse=True):
        text = text.replace(url, replacement_text)
    return text
# Example:
input_text = "Check out the latest Python tutorials on GeeksforGeeks: https://www.geeksforgeeks.org/category/python/"
output_text_findall = remove_urls_findall(input_text)
print("\nUsing re.findall():")
print("Original Text:")
print(input_text)
print("\nText with URLs Removed:")
print(output_text_findall)
Output:
Using re.findall():
Original Text:
Check out the latest Python tutorials on GeeksforGeeks: https://www.geeksforgeeks.org/category/python/
Text with URLs Removed:
Check out the latest Python tutorials on GeeksforGeeks: [URL REMOVED]
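If you also need to know which URLs were removed (for logging, say), findall() and sub() can be combined: findall() returns the matched strings, and sub() performs the replacement in a single left-to-right pass. The helper name remove_and_collect_urls is hypothetical; this is one possible sketch rather than the only way to do it.

```python
import re

URL_PATTERN = re.compile(r'https?://\S+|www\.\S+')

def remove_and_collect_urls(text, replacement_text="[URL REMOVED]"):
    # findall() lists the URLs in the order they appear in the text
    found = URL_PATTERN.findall(text)
    # sub() replaces every match in one left-to-right pass, so a URL that is
    # a prefix of another URL cannot corrupt the result
    cleaned = URL_PATTERN.sub(replacement_text, text)
    return cleaned, found

cleaned, found = remove_and_collect_urls(
    "Old: https://example.com/a New: https://example.com/a/b"
)
print(cleaned)
print(found)
```

If only the number of removed URLs is needed, Pattern.subn() returns a (new_string, count) tuple instead of the string alone.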
Remove URLs from String in Python Using re.search() function
In this example, the Python code defines a function 'remove_urls_search' using regular expressions and re.search() to find and replace URLs in a given text with a replacement text "[URL REMOVED]".
Python3
import re
def remove_urls_search(text, replacement_text="[URL REMOVED]"):
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
    pos = 0
    while True:
        # Search only from pos onward, so a replacement text that itself
        # looks like a URL is never matched again (no infinite loop)
        match = url_pattern.search(text, pos)
        if not match:
            break
        text = text[:match.start()] + replacement_text + text[match.end():]
        pos = match.start() + len(replacement_text)
    return text
# Example:
input_text = "Visit our website at https://geeksforgeeks.org/ for more information. Follow us on Twitter: @geeksforgeeks"
output_text_search = remove_urls_search(input_text)
print("\nUsing re.search():")
print("Original Text:")
print(input_text)
print("\nText with URLs Removed:")
print(output_text_search)
Output:
Using re.search():
Original Text:
Visit our website at https://geeksforgeeks.org/ for more information. Follow us on Twitter: @geeksforgeeks
Text with URLs Removed:
Visit our website at [URL REMOVED] for more information. Follow us on Twitter: @geeksforgeeks
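Rebuilding the string after every search() call copies the whole text each time. A single-pass alternative, sketched below with the same pattern, uses finditer(), which yields each non-overlapping match from left to right, collects the untouched pieces of text between matches, and joins them with the placeholder at the end. The function name remove_urls_finditer is illustrative only.

```python
import re

URL_PATTERN = re.compile(r'https?://\S+|www\.\S+')

def remove_urls_finditer(text, replacement_text="[URL REMOVED]"):
    pieces = []
    last_end = 0
    # finditer() yields match objects left to right across the original text
    for match in URL_PATTERN.finditer(text):
        pieces.append(text[last_end:match.start()])  # text before this URL
        pieces.append(replacement_text)
        last_end = match.end()
    pieces.append(text[last_end:])  # text after the last URL
    return ''.join(pieces)

print(remove_urls_finditer("See https://a.example/x and www.b.example for details."))
```

This gives the same result as Pattern.sub() with a string replacement; it is mainly useful when you want to inspect each match object while building the output.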
Remove URLs from String Using urllib.parse
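In this example, the Python code defines a function 'remove_urls_urllib' that splits the text into words, parses each word with urlparse() from urllib.parse, and replaces any word that has both a scheme (such as https) and a network location (such as www.geeksforgeeks.org) with the replacement text "[URL REMOVED]". A bare address such as www.geeksforgeeks.org, which has no scheme, is therefore left unchanged.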
Python3
# Using urllib.parse
from urllib.parse import urlparse
def remove_urls_urllib(text, replacement_text="[URL REMOVED]"):
    # Note: split() and ' '.join() collapse runs of whitespace (including newlines) into single spaces
    words = text.split()
    for i, word in enumerate(words):
        try:
            parsed_url = urlparse(word)
        except ValueError:
            # urlparse() rejects some malformed input, e.g. "http://[::1"
            continue
        # Treat a word as a URL only if it has both a scheme and a network location
        if parsed_url.scheme and parsed_url.netloc:
            words[i] = replacement_text
    return ' '.join(words)
# Example:
input_text = "Check out the GeeksforGeeks website at https://www.geeksforgeeks.org/ for programming tutorials."
output_text_urllib = remove_urls_urllib(input_text)
print("Using urllib.parse:")
print("Text with URLs Removed:")
print(output_text_urllib)
Output:
Using urllib.parse:
Text with URLs Removed:
Check out the GeeksforGeeks website at [URL REMOVED] for programming tutorials.
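Because text.split() and ' '.join() normalize whitespace, the urllib.parse approach turns newlines and repeated spaces into single spaces. One way around this, sketched below, is to split with re.split() on a capturing group so the whitespace runs are kept as separate tokens and restored on join. The function name remove_urls_urllib_keep_spacing is hypothetical, and like the function above it only replaces tokens that have both a scheme and a network location, so a bare www. address is left alone.

```python
import re
from urllib.parse import urlparse

def remove_urls_urllib_keep_spacing(text, replacement_text="[URL REMOVED]"):
    # Splitting on a *capturing* group keeps the whitespace runs in the list,
    # so newlines and repeated spaces survive the round trip
    tokens = re.split(r'(\s+)', text)
    for i, token in enumerate(tokens):
        if not token or token.isspace():
            continue
        try:
            parsed = urlparse(token)
        except ValueError:
            # urlparse() raises ValueError on malformed input such as "http://[::1"
            continue
        if parsed.scheme and parsed.netloc:
            tokens[i] = replacement_text
    return ''.join(tokens)

sample = "Line one: https://www.example.com/\nLine two:  www.example.org"
print(remove_urls_urllib_keep_spacing(sample))
```

The newline and the double space in the sample are preserved, and only the https:// address is replaced.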