
Content Aggregator Python Project

What is a Content Aggregator, and How Does it Work?

A content aggregator is like a menu at a restaurant: it offers a variety of articles and press releases in one place, so you don't have to go through many sources of information to get what you need. Content aggregators gather articles, press releases, and other media such as blog posts, news stories, and product descriptions into one convenient location and make them all available for viewing, so you no longer have to go from site to site looking for news worth reading.

Requirements for Content Aggregator Python Project

  • Django
  • BeautifulSoup
  • Requests

Note: We'll be building this whole project in Django, a Python framework designed for web development. While I'll explain what I'm doing as we go, it helps if you're familiar with it beforehand.

Getting the Project Started

To begin, we'll need to install the Django framework, which can be done by running:

pip install django

Run the following command to begin the project:

django-admin startproject content_aggregator

After you've run the preceding command, navigate to the project directory and enter the following command to create a Django application:

cd content_aggregator  # move into the project directory
python manage.py startapp aggregator

The first command takes you into the project directory, and the second creates a Django application named aggregator.
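At this point your folder layout should look roughly like this (the exact files vary slightly between Django versions):

content_aggregator/
├── manage.py
├── content_aggregator/
│   ├── __init__.py
│   ├── settings.py
│   ├── urls.py
│   ├── asgi.py
│   └── wsgi.py
└── aggregator/
    ├── admin.py
    ├── apps.py
    ├── models.py
    ├── views.py
    └── migrations/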

Using an IDE, go to the content_aggregator folder, open content_aggregator/settings.py, and add the application's name to the INSTALLED_APPS list, as seen below:

INSTALLED_APPS = [
  'django.contrib.admin',
  'django.contrib.auth',
  'django.contrib.contenttypes',
  'django.contrib.sessions',
  'django.contrib.messages',
  'django.contrib.staticfiles',
  'aggregator'  #<-- here
]

To make your templates work correctly, point the TEMPLATES setting at a project-level templates directory as well.

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [os.path.join(BASE_DIR,'templates')],  #<-- here
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]
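Note that recent Django versions generate settings.py without importing os (BASE_DIR is defined with pathlib), so using os.path.join as above raises a NameError unless you add the import yourself. A minimal sketch of the relevant lines at the top of settings.py, assuming a default Django 3.1+ project:

import os
from pathlib import Path

# BASE_DIR is generated by startproject; os.path.join accepts a Path object just fine
BASE_DIR = Path(__file__).resolve().parent.parent

Also remember to create the templates folder itself at the project root so that 'DIRS' points at an existing directory.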

Make the following modifications to the content_aggregator/urls.py file:

from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path('admin/', admin.site.urls),
    path('', include('aggregator.urls')),
]
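Note that include('aggregator.urls') expects a urls.py file inside the aggregator app, which the startapp command does not create for you. A minimal sketch of aggregator/urls.py, wired to the index view we'll write below:

from django.urls import path
from . import views

urlpatterns = [
    path('', views.index, name='index'),  # serve the aggregated headlines at the site root
]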

Scraping the internet

To gather the data for aggregation, we built our own web scraping mechanism. Web scraping means extracting data from an existing website. We use the requests and beautifulsoup modules to scrape the webpages; these modules are very useful for crawling or scraping websites for information. In this case, we'll use them to extract headlines from the Times of India and The Onion.

We can begin with The Onion, or any other website you choose; the technique will be the same.

To proceed, follow these steps:

  • Open the website and go to developer tools by hitting F12 or through the browser menu. Developer tools should appear on the right side or at the bottom of the browser.
  • Press Ctrl (or Cmd) + Shift + C, or click the arrow-shaped button in the upper left corner of the developer tools panel.
  • Navigate to the article's container, which in most cases will be a div, and click on it. It will be highlighted on the right, where you can see its tag and class (you can confirm your selector with the quick check after this list).
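Before wiring the selector into Django, it's worth confirming it in a plain Python shell. A quick sketch, assuming the h2 tag used for the Times of India briefs page (the exact tag and class depend on the site's current markup):

import requests
from bs4 import BeautifulSoup

# fetch and parse the page exactly the way the view will
response = requests.get("https://timesofindia.indiatimes.com/briefs")
soup = BeautifulSoup(response.content, "html.parser")

# see how many elements the selector matches and what the first one looks like
headings = soup.find_all('h2')
print(len(headings))
if headings:
    print(headings[0].text)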

Writing the Views

This is where most of the code lives; we'll import our modules, fetch the data, and set up how things will work in this project.

You can run the following commands to install the two Python modules we discussed earlier, requests and beautifulsoup:

pip install bs4
pip install requests

After installing both packages, we can start working on the views (aggregator/views.py):

import requests
from django.shortcuts import render
from bs4 import BeautifulSoup

# Getting news from the Times of India
toi_r = requests.get("https://timesofindia.indiatimes.com/briefs")
toi_soup = BeautifulSoup(toi_r.content, "html.parser")

toi_headings = toi_soup.find_all('h2')
toi_headings = toi_headings[0:-13]  # drop the last 13 h2 tags, which are footer entries, not headlines

toi_news = []
for th in toi_headings:
    toi_news.append(th.text)

# Getting news from The Onion
ht_r = requests.get("https://www.theonion.com/")
ht_soup = BeautifulSoup(ht_r.content, "html.parser")

ht_headings = ht_soup.find_all('h4')
ht_headings = ht_headings[2:]  # skip the first two h4 tags, which are not headlines

ht_news = []
for hth in ht_headings:
    ht_news.append(hth.text)


def index(req):
    # pass both headline lists to the template
    return render(req, 'index.html', {'toi_news': toi_news, 'ht_news': ht_news})
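One caveat with the code above: because the scraping runs at module level, Django executes it only once when views.py is first imported, so the Refresh News button will keep showing the same headlines until the server restarts. If you want each request to re-fetch the pages, a sketch of the same logic moved inside the view looks like this (same URLs and selectors, just wrapped in the function):

def index(req):
    # fetch both pages on every request so the headlines stay current
    toi_soup = BeautifulSoup(
        requests.get("https://timesofindia.indiatimes.com/briefs").content,
        "html.parser")
    ht_soup = BeautifulSoup(
        requests.get("https://www.theonion.com/").content,
        "html.parser")

    toi_news = [h.text for h in toi_soup.find_all('h2')[0:-13]]  # drop footer entries
    ht_news = [h.text for h in ht_soup.find_all('h4')[2:]]       # skip non-headline h4 tags

    return render(req, 'index.html', {'toi_news': toi_news, 'ht_news': ht_news})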

Writing the Templates

The next step is to create a templates directory at the project root (the one we pointed DIRS to) and a file called index.html inside it, which should look like this.

<!DOCTYPE html>
<html>
<head>
    <title></title>
    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css" integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous">
</head>
<body>
    <div class="jumbotron">
        <center><h1>Content Aggregator</h1>
          <a href="/" class="btn btn-danger">Refresh News</a>
        </form>
    </center>
    </div>
    <div class="container">
        <div class="row">
            <div class="col-6">
                    <h3 class="text-centre"> News from Times of india</h3>
                    {% for n in toi_news %}
                    <h5> -  {{n}} </h5>
                    <hr>
                    {% endfor %}
                    <br>
            </div>
            <div class="col-6">
                    <h3 class="text-centre">News from theonion </h3>
                    {% for htn in ht_news %}
                    <h5> - {{htn}} </h5>
                    <hr>
                    {% endfor %}
                    <br>
            </div>
        </div>


</div>
    <script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.12.9/umd/popper.min.js" integrity="sha384-ApNbgh9B+Y1QKtv3Rn7W3mgPxhU9K/ScQsAP7hUibX39j7fakFPskvXusvfa0b4Q" crossorigin="anonymous"></script>
    <script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/js/bootstrap.min.js" integrity="sha384-JZR6Spejh4U02d8jOt6vLEHfe/JQGiRRSQQxSfFWpi1MquVdAyjUar5+76PVCmYl" crossorigin="anonymous"></script>
</body>
</html>

Output

Now that everything is in place, we can run the project, but first we need to apply the database migrations. Run both of the following commands:

python manage.py makemigrations
python manage.py migrate

We can now start the development server:

python manage.py runserver

And you should get something like this as a response:

(Screenshot: the Content Aggregator output page)

Final Thoughts

That's all; you now have a basic Python news aggregator site pulling from two sources. It can easily be adjusted to combine the headlines of any two websites on one page, and by altering the code you can add your own websites to the list. You can also enhance the functionality by scraping additional data, such as URLs and photos. This will help you improve your skills and understand how the web works.
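For example, here is a hedged sketch of collecting the link along with each Times of India headline; it assumes each h2 either contains or sits inside an anchor tag, which is common but depends on the site's current markup:

# build (title, url) pairs instead of bare headline strings;
# the <a> lookup is an assumption about the page structure and may need adjusting
toi_items = []
for th in toi_soup.find_all('h2')[0:-13]:
    link = th.find('a') or th.find_parent('a')
    toi_items.append({'title': th.text, 'url': link.get('href') if link else None})

In the template, you would then loop over toi_items and render {{ item.title }} inside an anchor tag pointing at {{ item.url }} instead of a bare heading.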



About the author:
Adarsh Kumar Singh is a technology writer with a passion for coding and programming. With years of experience in the technical field, he has established a reputation as a knowledgeable and insightful writer on a range of technical topics.