
A Quick Guide to Web Scraping with Python and Selenium
18 November, 2022
What is Web Scraping?
Web scraping is a way to automatically extract data from websites. You can even use it in your job search to collect job advertisements 😉.
Is web scraping legal?
The rules around web scraping are not always clear, but scraping publicly available data is generally legal as long as you don't break the site's Terms & Conditions (T&C).
For example, in 2019 a US appeals court sided with hiQ in its dispute with LinkedIn, which had demanded that hiQ stop scraping public LinkedIn profiles.
Commonly used tools & libraries for web scraping
- BeautifulSoup4
- Selenium (we will be using this)
- Scrapy
We will use Selenium because it is easy to learn and can fetch links from dynamic sites, where the content is rendered with JavaScript.
Prerequisites for this project
- Basic knowledge of Python
What will we create?
We will create a basic scraper that fetches the links of all images from a list of webpages and saves them in a text file.
Let's Start
Setting up libraries and Driver
We will need the selenium library for this project.
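Selenium is usually installed from PyPI with pip install selenium. As a quick sanity check before going further, the sketch below reports whether your Python interpreter can find the package. The helper name is_installed is only illustrative and is not part of Selenium.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the top-level package can be found by this interpreter."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("selenium"):
    print("selenium is installed")
else:
    print("selenium is missing; install it with: pip install selenium")
```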
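For this project we will drive the browser with ChromeDriver, the WebDriver for Chrome and Chromium. Download the ChromeDriver build that matches your Chrome version. You will also need Chrome itself, so install it if you don't have it.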
Setting up files
We need to create the following files:
- scrapper.py
- links.txt: will be used to store the scraped links
Importing libraries
Open scrapper.py and add two imports at the top. The first imports webdriver from the selenium package, which will control our browser. The second imports the options for Chrome, Options, from selenium.webdriver.chrome.options.
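Next, configure the browser. We first initialize Options for our webdriver as options. Then we set the window size by adding the argument window-size=x,y, where x and y are the width and height in pixels.
Now attach these options to your driver through the options parameter, and pass the driver path with executable_path. Note that Selenium 4 deprecates executable_path in favor of passing the path through a Service object.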
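The setup described above could be sketched as follows. This is a minimal sketch that assumes Selenium 4, so it passes the driver path through a Service object rather than executable_path. The helper names window_size_argument and build_driver, and the default 1200x800 window, are illustrative choices and not taken from the original code.

```python
def window_size_argument(width, height):
    """Build the Chrome argument that sets the window size."""
    return f"window-size={width},{height}"


def build_driver(driver_path, width=1200, height=800):
    """Start Chrome with the given window size (assumes Selenium 4)."""
    # In scrapper.py these imports would normally sit at the top of the file.
    # They are placed here only so the helper above works without selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    options.add_argument(window_size_argument(width, height))

    # driver_path is the location of the ChromeDriver binary you downloaded.
    return webdriver.Chrome(service=Service(driver_path), options=options)
```

Calling build_driver with the path to your downloaded driver should open a Chrome window of the requested size.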
We are using a test site in our code. Please don't scrape or spam any website without first reading its Terms & Conditions.
Create a list of pages to scrape in a variable named sources. Also, create a writer for links.txt.
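A minimal sketch of this step is shown below. The URL is only a placeholder, since the test site used by the original code is not shown here, and the variable name writer is an assumption based on the description above.

```python
# Placeholder URL: replace it with pages you are allowed to scrape.
sources = ["https://example.com/"]

# Opening in "w" mode creates links.txt, or empties it if it already exists.
# The file stays open until the scraping is finished.
writer = open("links.txt", "w", encoding="utf-8")
```

Remember to call writer.close() once all links have been written, otherwise some of them may never reach the file.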
Next, create a for-loop for fetching all the images.
The loop works like this:
- Pick a link from the sources list as sourceLink.
- Fetch the page with driver.get().
- Find every img tag and save the results to elem_link.
- For each element in elem_link, add its src link to links.txt.
- Now quit the driver and save links.txt.
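The steps above can be sketched as a small function. It is written against Selenium 4, where elements are located with find_elements, so it may differ from the original code, which could have used the older find_elements_by_tag_name helper. The function name save_image_links and the returned count are illustrative additions.

```python
def save_image_links(driver, sources, writer):
    """Visit each page in sources and write every image URL to writer."""
    saved = 0
    for sourceLink in sources:
        driver.get(sourceLink)  # load the page in the browser

        # "tag name" is the string value of selenium's By.TAG_NAME locator.
        elem_link = driver.find_elements("tag name", "img")

        for elem in elem_link:
            src = elem.get_attribute("src")
            if src:  # skip img tags that have no src attribute
                writer.write(src + "\n")
                saved += 1
    return saved
```

After the function returns, call driver.quit() to close the browser and writer.close() to save links.txt.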
Putting all the steps together completes scrapper.py.
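For reference, an assembled script might look roughly like the sketch below. It assumes Selenium 4 and takes the ChromeDriver path from the command line instead of hard-coding it. The function names, the window size, and the example.com URL are placeholders, not the original code.

```python
import sys


def save_image_links(driver, sources, writer):
    """Visit each page in sources and write every image URL to writer."""
    saved = 0
    for sourceLink in sources:
        driver.get(sourceLink)
        # "tag name" is the string value of selenium's By.TAG_NAME locator.
        elem_link = driver.find_elements("tag name", "img")
        for elem in elem_link:
            src = elem.get_attribute("src")
            if src:
                writer.write(src + "\n")
                saved += 1
    return saved


def run(driver_path, sources, output_path="links.txt"):
    """Open Chrome, scrape all sources, then quit the driver and save the file."""
    # Imported here so the file can be loaded before selenium is installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    options.add_argument("window-size=1200,800")
    driver = webdriver.Chrome(service=Service(driver_path), options=options)
    try:
        with open(output_path, "w", encoding="utf-8") as writer:
            return save_image_links(driver, sources, writer)
    finally:
        driver.quit()  # always close the browser, even after an error


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # Placeholder URL: replace it with pages you are allowed to scrape.
        count = run(sys.argv[1], ["https://example.com/"])
        print(f"saved {count} image links to links.txt")
    else:
        print("usage: python scrapper.py PATH_TO_CHROMEDRIVER")
```

Wrapping the scraping in try/finally means the browser window is closed even if a page fails to load.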
Done! Now run it.
Open your links.txt file to view the image links. Like 💖 if you found this helpful.