What is Web Scraping?

Web scraping is a way to automatically extract data from websites. You can even use it in your job search to collect job advertisements 😉.

Is web scraping legal?

The rules for web scraping are often unclear, but yes, web scraping is generally legal as long as you don't break the website's Terms & Conditions (T&C).

For example, in 2019 a US federal appeals court ruled in favor of hiQ Labs in its dispute with LinkedIn, which had demanded that hiQ stop scraping public LinkedIn profiles.

Commonly used tools & libraries for web scraping

BeautifulSoup4
Selenium (we will be using this)
Scrapy

We will use Selenium because it is easy to learn and, since it drives a real browser, it can also fetch links from dynamic sites whose content is rendered by JavaScript.

Prerequisites for this project

Basic knowledge of Python

What will we create?

We will create a basic scraper that fetches the links of all images from a list of webpages and saves them to a .txt file.

Let's Start

Setting up libraries and the driver

We will need the selenium library, which you can install with pip install selenium.

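For this project we will be using ChromeDriver, the WebDriver for Chrome. Download the ChromeDriver version that matches your installed Chrome. We will also need Chrome itself, so install it too. (Selenium 4.6 and later can also download a matching driver automatically through Selenium Manager.)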

Setting up files

We need to create two files.

scrapper.py -> will contain our scraper code
links.txt -> will store the scraped links

Importing libraries

Open scrapper.py and add the code below:

The first line of the code above imports webdriver, which will control our browser. The second line imports Chrome's options class as Options, which we use to configure the browser.

Next, add the following lines to our code

We first create an Options instance for our webdriver, named options. Next, we set the browser window size with the argument window-size=x,y, where x and y are the width and height in pixels.

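Now create the driver, attaching these options through the options argument and the path to ChromeDriver through executable_path. Note that executable_path was removed from webdriver.Chrome in Selenium 4.10; in current versions you wrap the path in a Service object and pass it as service= instead.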

We are adding a test site to our code. Please don't scrape or spam any website without first reading its Terms & Conditions. Create a list of pages to scrape in a variable named sources, and open links.txt for writing as writer.
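A minimal sketch of this setup. The URL is a placeholder (example.com is a domain reserved for documentation), not the test site from the original code, so swap in pages you are allowed to scrape:

```python
# Pages to scrape; placeholder URL, replace with sites whose T&C allow scraping
sources = [
    "https://example.com/",
]

# Writer for links.txt: "w" creates the file, or empties it if it already exists
writer = open("links.txt", "w", encoding="utf-8")
```

Because the file is opened without a with block, it has to be closed explicitly once scraping finishes; closing it makes sure every buffered link is written to disk.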

Next, create a for loop that fetches the image links from every page in sources.

The loop works like this:

Pick a link from the sources list as sourceLink.
Load that page with driver.get().
Find all img tags and save them to elem_link.
For each element in elem_link, write its src link to links.txt.
Finally, quit the driver and close links.txt to save it.

Our scrapper.py should now look like this:

Done! Now run it with python scrapper.py, then open your links.txt file to view the image links.

A Quick guide to Web Scraping with Python and Selenium.