Content Aggregator Python Project
What is a Content Aggregator, and How Does it Work?
A content aggregator is like a restaurant menu: it offers a variety of articles and press releases in one place, so you don't have to dig through many sources to find what you need. Content aggregators collect articles, press releases, blog posts, news stories, product descriptions, and other media into one convenient location and make it all available for viewing, so you no longer have to hop from site to site looking for news worth reading.
Requirements for Content Aggregator Python Project
- Django
- BeautifulSoup
- Requests
Note: We'll build this whole project in Django, a Python framework designed for web development. I'll explain what I'm doing as we go, but it helps if you're already familiar with it.
Getting the Project Started
To begin, install the Django framework by running:
pip install django
Run the following command to begin the project:
django-admin startproject content_aggregator
After running the preceding command, navigate to the project directory and run the following command to create a Django application:
cd content_aggregator #you can go to the project directory using this command
python manage.py startapp aggregator
Using an IDE, go to the content_aggregator folder, open content_aggregator/settings.py, and add the application's name to the INSTALLED_APPS list, as seen below:
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'aggregator', # <-- here
]
To make your templates work correctly, point the TEMPLATES setting at a templates directory as well. (This uses os.path.join, so make sure os is imported at the top of settings.py.)
TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [os.path.join(BASE_DIR, 'templates')], # <-- here
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]
Make the following modifications to the content_aggregator/urls.py file:
from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path('admin/', admin.site.urls),
    path('', include('aggregator.urls')),
]
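The include above points at aggregator.urls, which doesn't exist yet. Create a file called aggregator/urls.py that routes the app's root URL to the index view we'll write shortly; a minimal version looks like this:

from django.urls import path
from . import views

urlpatterns = [
    # Route the app's root URL to the index view defined in views.py
    path('', views.index, name='index'),
]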
Scraping the Web
To gather the data for aggregation, we'll build our own scraping mechanism. Extracting data from an existing website is known as web scraping. We'll use the requests and BeautifulSoup modules, which are quite useful for crawling and scraping websites. In this case, we'll use these two Python modules to extract headlines from the Times of India and The Onion.
We can begin with The Onion, or any other website you choose; the technique is the same.
To proceed, follow these steps:
- Open the website and go to the developer tools by pressing F12 or through the browser menu. The developer tools should appear on the right side or at the bottom of the browser.
- Press Ctrl (or Cmd) + Shift + C, or click the arrow-shaped button in the upper-left corner.
- Navigate to the article's container, which in most cases will be a div, and click on it. It will be highlighted on the right side, where you can see its tag and class; you can test what you find with the quick standalone script below.
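Before touching Django, it helps to verify the tag you found in a quick standalone script, using the requests and BeautifulSoup modules installed in the next section. Here is a minimal sketch, assuming the h2 tag that works for the Times of India briefs page later in this tutorial (the tag and any slicing will differ for other sites):

import requests
from bs4 import BeautifulSoup

# Fetch the page and parse it with Python's built-in html.parser
response = requests.get("https://timesofindia.indiatimes.com/briefs")
soup = BeautifulSoup(response.content, "html.parser")

# Print the text of every matching element to check the selector
for heading in soup.find_all('h2'):
    print(heading.text)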
Writing the Views
This is where most of the code lives; we'll import our modules, process the scraped data, and set up how the project works.
Run the following commands to install the two Python modules we discussed before, requests and BeautifulSoup:
pip install bs4
pip install requests
With both packages installed, we can begin working on the views. Note that the scraping runs inside the index view itself, so every page load (including the Refresh News button in the template we'll write next) fetches fresh headlines:
import requests
from django.shortcuts import render
from bs4 import BeautifulSoup

def index(req):
    # Getting news from the Times of India briefs page
    toi_r = requests.get("https://timesofindia.indiatimes.com/briefs")
    toi_soup = BeautifulSoup(toi_r.content, "html.parser")
    toi_headings = toi_soup.find_all('h2')
    toi_headings = toi_headings[0:-13]  # removing footers
    toi_news = []
    for th in toi_headings:
        toi_news.append(th.text)

    # Getting news from theonion
    ht_r = requests.get("https://www.theonion.com/")
    ht_soup = BeautifulSoup(ht_r.content, "html.parser")
    ht_headings = ht_soup.find_all('h4')
    ht_headings = ht_headings[2:]  # skipping non-headline matches
    ht_news = []
    for hth in ht_headings:
        ht_news.append(hth.text)

    return render(req, 'index.html', {'toi_news': toi_news, 'ht_news': ht_news})
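In practice, network requests can fail or hang, so you may want a timeout and some basic error handling around each fetch. Here is a minimal sketch of such a helper; the function name fetch_headlines and its parameters are illustrative, not part of the tutorial code above:

import requests
from bs4 import BeautifulSoup

def fetch_headlines(url, tag):
    """Return the text of all `tag` elements at `url`, or [] on failure."""
    try:
        # Give up after 10 seconds instead of hanging indefinitely
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # raise on 4xx/5xx status codes
    except requests.RequestException:
        return []  # fall back to an empty list so the page still renders
    soup = BeautifulSoup(response.content, "html.parser")
    return [heading.text for heading in soup.find_all(tag)]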
Writing the Templates
The next step is to create a templates directory and a file called index.html, which should look like this:
<!DOCTYPE html>
<html>
<head>
    <title>Content Aggregator</title>
    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css" integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous">
</head>
<body>
    <div class="jumbotron">
        <center>
            <h1>Content Aggregator</h1>
            <a href="/" class="btn btn-danger">Refresh News</a>
        </center>
    </div>
    <div class="container">
        <div class="row">
            <div class="col-6">
                <h3 class="text-center">News from Times of India</h3>
                {% for n in toi_news %}
                <h5> - {{n}} </h5>
                <hr>
                {% endfor %}
                <br>
            </div>
            <div class="col-6">
                <h3 class="text-center">News from The Onion</h3>
                {% for htn in ht_news %}
                <h5> - {{htn}} </h5>
                <hr>
                {% endfor %}
                <br>
            </div>
        </div>
    </div>
    <script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.12.9/umd/popper.min.js" integrity="sha384-ApNbgh9B+Y1QKtv3Rn7W3mgPxhU9K/ScQsAP7hUibX39j7fakFPskvXusvfa0b4Q" crossorigin="anonymous"></script>
    <script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/js/bootstrap.min.js" integrity="sha384-JZR6Spejh4U02d8jOt6vLEHfe/JQGiRRSQQxSfFWpi1MquVdAyjUar5+76PVCmYl" crossorigin="anonymous"></script>
</body>
</html>
Output
Now that everything is in place, we can run the project, but first we need to set up the database. Apply the migrations by running both of the following commands:
python manage.py makemigrations
python manage.py migrate
We can now start the development server:
python manage.py runserver
Open http://127.0.0.1:8000/ in your browser, and you should see the aggregated headlines from both sites.
Final Thoughts
That's all; you now have a basic Python news aggregator that combines the headlines of two websites on one page. By altering the code, you can add your own websites to the list, and you can enhance the functionality by scraping additional data, such as article URLs and photos, as sketched below. Building on this project will improve your skills and deepen your understanding of how the web works.
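As one example of that extension, here is a minimal sketch that pulls article links alongside the headline text. It assumes each headline sits inside (or contains) an a tag, which varies from site to site, so inspect your target page first:

import requests
from bs4 import BeautifulSoup

response = requests.get("https://timesofindia.indiatimes.com/briefs")
soup = BeautifulSoup(response.content, "html.parser")

articles = []
for heading in soup.find_all('h2'):
    # The link may wrap the heading or be nested inside it
    link = heading.find_parent('a') or heading.find('a')
    if link and link.get('href'):
        articles.append({'title': heading.text, 'url': link['href']})

for article in articles:
    print(article['title'], '->', article['url'])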