
How to Scrape Multiple URLs using Web Scraper

Web scraping is the process of extracting content and data from websites using bots. It is an effective way of pulling content from any site. There are primarily two methods used for scraping:

  1. Manual Scraping
  2. Automated Scraping

Manual scraping involves copying and pasting content from web pages, which takes a lot of effort and is highly repetitive. Automated web scraping saves cost and time because it cuts down the work involved in the data extraction task. There are many tools and plugins available for web scraping. Some of them are ParseHub, Scrapy, OctoParse, Scraper API, Mozenda, Webhose.io, Content Grabber, Common Crawl, etc.

Most of them are costly.

But I use the Web Scraper plugin for scraping-related tasks. It's free and works like a premium tool.

Web Scraper

Web Scraper utilizes a modular structure that is made of selectors, which instruct the scraper on how to traverse the target site and what data to extract.
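To picture how selectors chain together, here is a rough Python sketch of two selectors and the order in which they would be visited. The selector ids, the CSS selectors, and the field names ("id", "type", "parentSelectors", "selector", "multiple") are my assumptions based on what exported sitemaps typically look like, so treat it as an illustration and not as the plugin's actual code.

```python
# Hypothetical selectors: the ids, CSS selectors and field names are assumed
# for illustration and are not taken from the plugin's documentation.
selectors = [
    {"id": "post", "type": "SelectorElement", "parentSelectors": ["_root"],
     "selector": "div.post-body", "multiple": False},
    {"id": "site-link", "type": "SelectorLink", "parentSelectors": ["post"],
     "selector": "a", "multiple": True},
]


def children_of(selectors, parent_id):
    """Return the ids of selectors that sit directly under parent_id."""
    return [s["id"] for s in selectors if parent_id in s["parentSelectors"]]


def traversal_order(selectors, parent_id="_root"):
    """List selector ids depth-first, starting from the page root."""
    order = []
    for child in children_of(selectors, parent_id):
        order.append(child)
        order.extend(traversal_order(selectors, child))
    return order


print(traversal_order(selectors))
```

Each selector names its parent, so the scraper can start at the page root and work its way down to the data it should extract.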

Benefits of using Web Scraper

  1. Simple to install and easy to use as a browser extension.
  2. Totally free of cost.
  3. A big user community where you can find answers to all your queries (https://forum.webscraper.io/).

How to Scrape Elements of a Page through the Web Scraper Tool

Suppose I want to extract all the websites listed on this page into an Excel sheet. (https://sitesalike.blogspot.com/2014/04/free-infographics-submission-sites-list.html)



How to Scrape Multiple URLs using Web Scraper

You can do this by adding multiple URLs to the sitemap file.

Here is how a basic sitemap file looks. You just have to paste the additional URLs after the first one in the right format: each URL in double quotes, separated by commas.

Sample Sitemap file:

  
  {
    "_id": "sitemap",
    "startUrl": [
      "https://yourscrapingwebsiteurl.com/"
    ],
    "selectors": []
  }
  
  

Added multiple URLs:



  {
    "_id": "sitemapfile",
    "startUrl": [
      "https://www.yourscrapingwebsiteurl.com/pageurl1",
      "https://www.yourscrapingwebsiteurl.com/pageurl2",
      "https://www.yourscrapingwebsiteurl.com/pageurl3",
      "https://www.yourscrapingwebsiteurl.com/pageurl4",
      "https://www.yourscrapingwebsiteurl.com/pageurl5"
    ],
    "selectors": []
  }
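If you have a long list of URLs, typing them in by hand gets error-prone (a missing comma or quote breaks the file). As a minimal sketch, a small Python script can generate the same sitemap structure for you. The function name build_sitemap and the clean-up step that drops blank and duplicate URLs are my own additions for illustration; they are not part of Web Scraper.

```python
import json


def build_sitemap(sitemap_id, urls, selectors=None):
    """Build a sitemap dict with the same keys as the sample above.

    build_sitemap is a hypothetical helper, not a Web Scraper API.
    """
    seen = set()
    start_urls = []
    for url in urls:
        url = url.strip()
        # Skip blank entries and duplicates, keeping the original order.
        if url and url not in seen:
            seen.add(url)
            start_urls.append(url)
    return {"_id": sitemap_id, "startUrl": start_urls, "selectors": selectors or []}


urls = [
    "https://www.yourscrapingwebsiteurl.com/pageurl1",
    "https://www.yourscrapingwebsiteurl.com/pageurl2",
    "https://www.yourscrapingwebsiteurl.com/pageurl3",
]

# json.dumps takes care of the quotes and commas between the URLs.
sitemap_json = json.dumps(build_sitemap("sitemapfile", urls), indent=2)
print(sitemap_json)
```

You can then copy the printed text and use it as your sitemap, adding your selectors in place of the empty list.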

That's it. If you face any issue with the code, please let me know in the comments below and I will help you!