What is Web Scraping?
Web scraping is when a script retrieves web pages, often while identifying itself as a browser, and extracts information from them. In short, it is the automated extraction of data from a website.
Search engines such as Google also retrieve web pages automatically, but that activity is usually called “web crawling”.
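In practice, "pretending to be a browser" usually means sending a browser-like User-Agent header with the request. Here is a minimal sketch using only the standard library. The URL and the User-Agent string are placeholder values chosen for illustration, and the helper name build_browser_like_request is hypothetical.

```python
import urllib.request

# Placeholder values for illustration only; neither comes from this post.
EXAMPLE_URL = "https://example.com/"
BROWSER_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleScraper/1.0"


def build_browser_like_request(url, user_agent=BROWSER_USER_AGENT):
    """Return a Request that announces itself with a browser-style User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = build_browser_like_request(EXAMPLE_URL)

# urllib stores header names capitalized, so the key is "User-agent".
print(request.get_header("User-agent"))

# Uncomment to actually download the page (needs network access):
# html = urllib.request.urlopen(request).read()
```

Building the request is separate from sending it, so you can inspect the headers before any network traffic happens.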
In Python, we can use Beautiful Soup for web scraping. It is a package for parsing HTML and XML documents.
Beautiful Soup (HTML parser)
What is PIP?
PIP is the package manager for Python packages. If you have Python version 3.4 or later, PIP is normally included by default.
How to install Beautiful Soup?
Beautiful Soup is not part of the Python standard library, so we need to install it before using it.
To install Beautiful Soup, run this:
pip install beautifulsoup4
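To confirm the installation worked, you can check whether Python can find the package. This is a small sketch using the standard library; the helper name is_installed is hypothetical. Note that the package is installed as beautifulsoup4 but imported as bs4.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located (it is not imported)."""
    return importlib.util.find_spec(module_name) is not None


# The package is installed as "beautifulsoup4" but imported as "bs4".
if is_installed("bs4"):
    print("Beautiful Soup is available")
else:
    print("Beautiful Soup is missing; run: pip install beautifulsoup4")
```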
Web Page Scraper with BeautifulSoup examples
Get all links from a web page
#Get all links from a web page
import ssl
import urllib.request

#Import the BeautifulSoup library
from bs4 import BeautifulSoup

#Ignore SSL certificate errors
#(this turns off certificate verification, so use it only for testing)
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter url: ')

#read() returns the whole document at once, as bytes
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')

#Get all anchor tags
tags = soup('a')
for tag in tags:
    #Prints None for anchors that have no href attribute
    print(tag.get('href', None))
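Many pages use relative links such as /about, and the loop above prints them exactly as written. A possible variation, sketched below, resolves each link against the page URL with urljoin and skips anchors without an href. The function name extract_absolute_links, the sample HTML, and the base URL are all made up for illustration, and the example parses a string so that it runs without network access.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_absolute_links(html, base_url):
    """Return every href in the document, resolved against base_url."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips anchors that have no href attribute at all.
    for tag in soup.find_all("a", href=True):
        links.append(urljoin(base_url, tag["href"]))
    return links


# Made-up sample page and base URL, for illustration only.
SAMPLE_HTML = """
<a href="/about">About</a>
<a href="post-2">Next post</a>
<a href="https://example.org/x">External</a>
<a name="top">No href here</a>
"""

links = extract_absolute_links(SAMPLE_HTML, "https://example.com/blog/")
for link in links:
    print(link)
```

To use this with a live page, pass the html and url values from the example above in place of the sample string and base URL.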
Hello there! I hope you find this post useful!
I'm Mihai, a programmer and online marketing specialist. I'm passionate about everything related to online marketing, with a focus on eCommerce.
If you have a collaboration proposal or need help with your projects, feel free to contact me. I will always be glad to help you!