Web Scraping: Scraping Multiple URLs
This tutorial shows how to perform web scraping on multiple URLs in a single script, although you might well have figured it out yourself in the hour of need.
Some of you might have already guessed it: yes, we will use a for loop.
We will start by creating a list to store the URLs:
## list holding URL values
urls = ['https://xyz.com/page/1', 'https://xyz.com/page/2']
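If the pages follow a numbered pattern, you can also generate the list instead of typing every URL by hand. A minimal sketch, assuming the hypothetical https://xyz.com/page/N pattern used above:

```python
## build the URL list from a numbered page pattern
base = 'https://xyz.com/page/{}'
urls = [base.format(n) for n in range(1, 6)]  ## pages 1 to 5
print(urls[0])  ## https://xyz.com/page/1
```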
You can have as many URLs in the list as you need. A word of advice, though: do not include any URL unnecessarily, because every request we make costs the website owner an additional hit on their server.
And never run a web scraping script in an infinite loop.
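It is also polite to pause between requests so you do not hammer the server. A minimal sketch of the idea; the fetch function here is a stand-in for the real requests.get call, and the one-second delay is an assumption you should tune to the site:

```python
import time

urls = ['https://xyz.com/page/1', 'https://xyz.com/page/2']

def fetch(url):
    ## placeholder for requests.get(url); just returns the URL for illustration
    return url

results = []
for url in urls:
    results.append(fetch(url))
    time.sleep(1)  ## pause between requests to be gentle on the server
```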
Once you have created the list, loop over it and do everything inside the loop:
## importing bs4, requests, fake_useragent and csv modules
import bs4
import requests
from fake_useragent import UserAgent
import csv
## create a list of URLs
urls = ['https://xyz.com/page/1', 'https://xyz.com/page/2']
## initializing the UserAgent object
user_agent = UserAgent()
## starting the loop
for url in urls:
    ## getting the response from the page using the get method of the requests module
    page = requests.get(url, headers={"user-agent": user_agent.chrome})
    ## storing the content of the page in a variable
    html = page.content
    ## creating a BeautifulSoup object
    soup = bs4.BeautifulSoup(html, "html.parser")
    ## then parse the HTML, extract the data you need,
    ## and write it to a file
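The extraction step depends entirely on the page's markup. As an illustration only, here is how you might pull a title out of the soup object; the HTML snippet and the h2/title selector are invented for this sketch:

```python
import bs4

## sample HTML standing in for page.content; use the real response in practice
html = '<html><body><h2 class="title">Sample Post</h2></body></html>'
soup = bs4.BeautifulSoup(html, "html.parser")

## extract the text of the first h2 element with class "title"
title = soup.find('h2', class_='title').get_text()
print(title)  ## Sample Post
```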
When you run multiple URLs in a script and also want to write the scraped data to a file, store each record as a tuple and then write the tuples to the file.
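For example, using the csv module that is already imported above, you could collect one tuple per record and write them all at once. The rows, column names, and output filename here are made up for illustration:

```python
import csv

## one tuple per scraped record (hypothetical data)
rows = [('Title One', 'https://xyz.com/page/1'),
        ('Title Two', 'https://xyz.com/page/2')]

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(('title', 'url'))  ## header row
    writer.writerows(rows)             ## one CSV row per tuple
```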
The next tutorial is a simple exercise where you will run a web scraping script on Studytonight's website. Excited? Head to the next tutorial.