Web Scraping

Uploaded by

Zahabiya
Copyright © All Rights Reserved
Web Scraping

Suppose you want some information from a website, let's say a paragraph on a topic. What do you do? Well, you can copy and paste the information from Wikipedia into your own file. But what if you want to get large amounts of information from a website as quickly as possible, such as the large amounts of data needed to train a machine learning algorithm? In such a situation, copying and pasting will not work! That's when you'll need to use web scraping.

Web scraping is an automated method of obtaining large amounts of data from websites. It can collect thousands or even millions of records in far less time than manual copying. Most of this data is unstructured HTML, which is then converted into structured data in a spreadsheet or a database so that it can be used in various applications.

There are many different ways to perform web scraping, including online services, dedicated APIs, or writing your own scraping code from scratch. Many large websites, like Google, Twitter, Facebook, and StackOverflow, have APIs that allow you to access their data in a structured format. This is the best option when it is available, but some sites don't allow users to access large amounts of data in a structured form, or they are simply not that technologically advanced. In that situation, it's best to use web scraping to collect data from the website.

The basics of web scraping

Web scraping consists of two parts: a web crawler and a web scraper. In simple words, the web crawler is the horse and the scraper is the chariot: the crawler leads the scraper, which extracts the requested data. Let's look at these two components of web scraping:

The crawler
A web crawler is generally called a "spider." It is an automated program that browses the internet, following the given links to index and search for content.
It searches for the relevant information requested by the programmer.

The scraper
A web scraper is a dedicated tool designed to extract data from websites quickly and effectively. Web scrapers vary widely in design and complexity, depending on the project.

How does Web Scraping work?
The following steps are used to perform web scraping:

Step 1: Find the URL that you want to scrape
First, you should understand the data requirements of your project. A webpage or website contains a large amount of information, so scrape only the relevant information. In simple words, the developer should be familiar with the data requirements.

Step 2: Inspect the page
The data is extracted in raw HTML format, which must be carefully parsed to reduce the noise in the raw data. In some cases, the data can be as simple as a name and address, or as complex as high-dimensional weather and stock market data.

Step 3: Write the code
Write code that extracts the relevant information, then run it.

Step 4: Store the data in a file
Store the extracted information in the required file format, such as CSV, XML, or JSON.

Why Web Scraping?
As discussed above, web scraping is used to extract data from websites, but we also need to know how to use that raw data. It can be used in various fields. Let's look at some common uses of web scraping:

o Dynamic Price Monitoring
Web scraping is widely used to collect data from several online shopping sites, compare product prices, and make profitable pricing decisions. Price monitoring with scraped data lets companies understand market conditions and supports dynamic pricing, helping them stay ahead of competitors.

o Market Research
Web scraping is well suited to market trend analysis and to gaining insights into a particular market.
Large organizations require a great deal of data, and web scraping can provide that data reliably and accurately.

o Email Gathering
Many companies use personal email data for email marketing, which lets them target a specific audience.

o News and Content Monitoring
A single news cycle can have an outstanding effect on your business or pose a genuine threat to it. If your company depends on news analysis, or if your organization frequently appears in the news, web scraping provides an effective way to monitor and parse the most critical stories. News articles and social media platforms can directly influence the stock market.

o Social Media Scraping
Web scraping plays an essential role in extracting data from social media websites such as Twitter, Facebook, and Instagram to find trending topics.

o Research and Development
Large sets of data, such as general information, statistics, and temperature readings, are scraped from websites, then analyzed and used to carry out surveys or research and development.

Why use Python for Web Scraping?
There are other popular programming languages, so why choose Python for web scraping? Below is a list of Python's features that make it one of the most useful programming languages for web scraping.

o Dynamically Typed
In Python, we don't need to declare data types for variables; we can use a variable directly wherever it is required. This saves time and makes tasks faster to complete. Python determines the type of each value at runtime.

o Vast Collection of Libraries
Python has an extensive range of libraries, such as NumPy, Matplotlib, Pandas, and SciPy, that provide the flexibility to work for various purposes. It is suited for almost every emerging field, including web scraping, where it is used for extracting and manipulating data.

o Less Code
The purpose of web scraping is to save time.
But what if you spend more time writing the code? That's why we use Python: it can perform a task in a few lines of code.

Libraries used for Web Scraping
As we know, Python has various applications, and there are different libraries for different purposes. In our further demonstration, we will be using the following libraries:
* Selenium: Selenium is a web testing library. It is used to automate browser activities.
* BeautifulSoup: Beautiful Soup is a Python package for parsing HTML and XML documents. It creates parse trees that make it easy to extract data.
* Pandas: Pandas is a library used for data manipulation and analysis. It is used to extract the data and store it in the desired format.
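The four steps described earlier (fetch a page, parse the HTML, extract the relevant fields, store them) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the demonstration with the libraries listed above. So that it runs without extra installs, it swaps in Python's standard-library html.parser in place of BeautifulSoup and the csv module in place of Pandas. The sample HTML, the "product" and "price" class names, and the helper names fetch, scrape, and save_csv are all hypothetical.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def fetch(url):
    """Step 1 (hypothetical helper, not called below): download a page's raw HTML."""
    req = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class ProductParser(HTMLParser):
    """Steps 2-3: pull product names, prices and links out of raw HTML.

    Assumes (hypothetically) that names sit in elements with class="product"
    and prices in elements with class="price".
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []       # absolute URLs a crawler could visit next
        self.names = []       # text of class="product" elements
        self.prices = []      # text of class="price" elements
        self._target = None   # list that should receive the next text chunk

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Crawler part: collect links, resolving relative ones against the page URL.
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        # Scraper part: remember which field the upcoming text belongs to.
        classes = (attrs.get("class") or "").split()
        if "product" in classes:
            self._target = self.names
        elif "price" in classes:
            self._target = self.prices

    def handle_data(self, data):
        text = data.strip()
        if self._target is not None and text:
            self._target.append(text)
            self._target = None  # only the first non-empty text chunk is kept


def scrape(html, base_url):
    """Parse one page and return (rows, links); rows are dicts ready to store."""
    parser = ProductParser(base_url)
    parser.feed(html)
    parser.close()
    rows = [{"name": n, "price": p} for n, p in zip(parser.names, parser.prices)]
    return rows, parser.links


def save_csv(rows, file):
    """Step 4: write the structured rows as CSV to any writable text file object."""
    writer = csv.DictWriter(file, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical sample page, used instead of a live download so the sketch runs offline.
SAMPLE_HTML = """
<ul>
  <li><span class="product">Pen</span> <span class="price">1.50</span></li>
  <li><span class="product">Notebook</span> <span class="price">3.25</span></li>
</ul>
<a href="/page/2">Next page</a>
"""

if __name__ == "__main__":
    rows, links = scrape(SAMPLE_HTML, "https://example.com/shop")
    buffer = io.StringIO()
    save_csv(rows, buffer)
    print(buffer.getvalue())
    print(links)  # ['https://example.com/page/2']
```

To write a real file instead of an in-memory buffer, pass save_csv a file opened with open("products.csv", "w", newline=""), as the csv module documentation recommends. The links list plays the crawler's role: feeding each URL back into fetch and then scrape is how a crawler would lead the scraper from page to page.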

You might also like