Web Scraping: Writing data to a File
In the last tutorial, we successfully scraped data from a website and printed it on the console.
But generally, when we collect (extract) so much data from a website, we don't just want to print it on the console; rather, we want to write it to a file or maybe insert it into a database.
In this tutorial, we will extend the last tutorial and write the list of products fetched from the Consumer Reports website into a file.
We will write the data in Excel-compatible CSV format using the csv module of Python.
Writing Data to CSV File
The first step is to import the csv module into our code before we start using it:
## importing csv module
import csv
If you want, you can create a CSV file named product_data.csv yourself and we will write the extracted data to that file; otherwise, the code below will create the file for us:
## then we open a csv file in append mode
with open("product_data.csv", "a") as csv_file:
    writer = csv.writer(csv_file)
    ## now we will write data to it,
    ## using writer.writerow() method
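For example, here is a minimal, self-contained sketch of how writer.writerow() writes one row per call; the file name sample.csv and the row values are just placeholders for illustration, and the newline="" argument follows the csv module's documented recommendation so that extra blank lines don't appear between rows on Windows:
## a minimal sketch: the file name and rows below are placeholders
import csv
with open("sample.csv", "a", newline="") as csv_file:
    writer = csv.writer(csv_file)
    ## each call to writerow() writes one comma-separated line to the file
    writer.writerow(["name", "link"])
    writer.writerow(["Example Product", "https://example.com/product"])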
This is how the complete code will look. The part where we extract the data is explained in the previous tutorial, after which we have added the code to write the extracted data into a CSV file:
## importing bs4, requests, fake_useragent and csv modules
import bs4
import requests
from fake_useragent import UserAgent
import csv
## initializing the UserAgent object
user_agent = UserAgent()
url = "https://www.consumerreports.org/cro/a-to-z-index/products/index.htm"
## getting the response from the page using the get method of the requests module
page = requests.get(url, headers={"user-agent": user_agent.chrome})
## storing the content of the page in a variable
html = page.content
## creating BeautifulSoup object
soup = bs4.BeautifulSoup(html, "html.parser")
## div tags with crux-body-copy class
div_class = "crux-body-copy"
## getting all the divs with class 'crux-body-copy'
div_tags = soup.find_all("div", class_=div_class)
## then we open a csv file in append mode
with open("product_data.csv", "a") as csv_file:
    writer = csv.writer(csv_file)
    ## extracting the names and links from the div tags
    for tag in div_tags:
        name = tag.a.text.strip()
        link = tag.a['href']
        ## now we will write data to the file
        writer.writerow([name, link])
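If you want to confirm that the rows were actually written, a small sketch like the one below reads product_data.csv (the same file used above) back with csv.reader and prints each row:
## a minimal sketch: reading the CSV file back to verify the output
import csv
with open("product_data.csv", "r") as csv_file:
    reader = csv.reader(csv_file)
    for row in reader:
        ## each row comes back as a list of strings, e.g. [name, link]
        print(row)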
Try running this code on your machine, and if you face any issues you can post your questions here: Studytonight Q & A Forum