How to Scrape Web Pages with Beautiful Soup and Python

What is Web Scraping?

Web scraping is when a script pretends to be a browser and retrieves web pages in order to extract information from them. In short, web scraping refers to the extraction of data from a website.
Search engines such as Google also retrieve web pages automatically, but that is usually called “web crawling”.

In Python, we can use Beautiful Soup for web scraping. It is a package for parsing HTML and XML documents.
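
To make the idea of parsing concrete, here is a minimal sketch that parses a small HTML string instead of a live page. The snippet in sample_html and the variable names are made up for illustration; it assumes Beautiful Soup is already installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only for illustration.
sample_html = """
<html>
  <head><title>Example page</title></head>
  <body>
    <p class="intro">Hello, <b>world</b>!</p>
    <a href="/about">About</a>
  </body>
</html>
"""

# Parse the string with Python's built-in HTML parser.
soup = BeautifulSoup(sample_html, "html.parser")

# Text inside the <title> tag.
page_title = soup.title.string
# Text of the paragraph with class "intro", with nested tags stripped.
intro_text = soup.find("p", class_="intro").get_text()
# Value of the href attribute of the first <a> tag.
first_link = soup.find("a")["href"]

print(page_title)
print(intro_text)
print(first_link)
```

Once the document is parsed, tags can be reached by name (soup.title) or searched for with find(), and attributes are read like dictionary keys.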

What is PIP?

PIP is a package manager for Python packages. If you have Python version 3.4 or later, PIP is included by default.

How to install Beautiful Soup?

Beautiful Soup is not part of the Python standard library, so we need to install it before using it.
To install Beautiful Soup, run this:

pip install beautifulsoup4
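
One way to check that the installation worked is to import the package and print its version. Note that the package is installed as beautifulsoup4 but imported as bs4. This is only a quick sanity check, and the variable name installed_version is chosen here for illustration.

```python
import bs4

# If the import succeeds, the package is installed.
# __version__ holds the installed Beautiful Soup version as a string.
installed_version = bs4.__version__
print(installed_version)
```

If the import fails with ModuleNotFoundError, the package was most likely installed for a different Python interpreter than the one running the script.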

Web Page Scraper with BeautifulSoup examples

Get all links from a web page

# Get all links from a web page
import ssl
import urllib.request

# Import the BeautifulSoup class from the bs4 package
from bs4 import BeautifulSoup

# Ignore SSL certificate errors (insecure: use only for testing)
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter url: ')
# urlopen() returns a response object; read() returns the whole document as bytes
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')

# Get all anchor tags; soup('a') is shorthand for soup.find_all('a')
tags = soup('a')
for tag in tags:
    # get() returns None when the tag has no href attribute
    print(tag.get('href'))
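
The script above prints each href exactly as it appears in the page, so relative links such as /docs stay relative and anchors without an href print None. As a possible variation, the sketch below skips anchors that have no href and turns relative links into absolute ones with urljoin from the standard library. The function name extract_links, the sample HTML, and the example.com URLs are assumptions made for illustration. It works on an HTML string, so it can be tried without a network connection.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True matches only anchors that actually carry an href attribute.
    for tag in soup.find_all("a", href=True):
        # urljoin resolves relative links against the page URL
        # and leaves absolute links unchanged.
        links.append(urljoin(base_url, tag["href"]))
    return links


# Hypothetical page content and page URL, used only for illustration.
sample_html = (
    '<a href="/docs">Docs</a>'
    '<a name="top">Top</a>'
    '<a href="https://example.org/x">X</a>'
)
links = extract_links(sample_html, "https://example.com/start/")
print(links)
```

To use it with a real page, pass the html value returned by read() and the url entered by the user.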

