Introduction
Monitoring pages, analysing site performance, ensuring the site is accessible to consumers with disabilities, and searching for optimization possibilities are all solid reasons for e-commerce business owners and managers to crawl their websites.
Dedicated tools, web crawlers, and services exist to help you monitor your site for each of these needs.
While these solutions can be useful, you can build your own web crawler and monitoring system with only a little coding effort.
Getting Started with Scraping
First of all, make sure that you have Scrapy installed on your system. If not, execute the following command in the terminal to install it:
pip install scrapy
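To confirm that the installation succeeded, you can print the installed version with Scrapy's standard CLI command:
scrapy version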
Now, to use Scrapy, we need to start a Scrapy project. Navigate to a suitable directory and execute the following command in the terminal:
scrapy startproject ecomScraping
This command will create a scrapy project with some files and boilerplate code.
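The generated layout looks roughly like this (exact files can vary slightly between Scrapy versions):
ecomScraping/
    scrapy.cfg            # deploy configuration
    ecomScraping/
        __init__.py
        items.py          # item definitions
        middlewares.py    # spider and downloader middlewares
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/          # our spider code goes here
            __init__.py
We will only need the spiders folder for this tutorial.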
For our project, we will be scraping the ShopClues website. More specifically, we will scrape the following information about their headphones and watches (an optional item definition for these fields is sketched after the list):
- Image link of the product
- Name of the product
- Discount of the product
- Discounted price of the product
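If you prefer structured items over plain dictionaries, a minimal sketch of a Scrapy Item covering these four fields could look like the following (the class name ProductItem is our own choice, not something Scrapy prescribes):
import scrapy

class ProductItem(scrapy.Item):
    # One field per piece of information we plan to scrape
    title = scrapy.Field()     # name of the product
    price = scrapy.Field()     # discounted price
    image = scrapy.Field()     # image link
    discount = scrapy.Field()  # discount label
The spider below simply yields plain dictionaries, which works just as well for a project of this size.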
Scrapy Shell
To understand how to extract the required information, we need to dig deeper into the HTML of the page.
To explore the HTML, execute the command below in the terminal to enter the Scrapy shell.
scrapy shell www.shopclues.com/mobiles-featured-store-4g-smartphone.html
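Once the shell loads, the fetched page is available as the response object, and a couple of standard shell helpers are useful while exploring:
response.status    # HTTP status code of the fetched page
response.url       # the URL that was actually fetched
view(response)     # open the downloaded page in your browser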
First, we want to get the name of the product. Inspecting the page's HTML shows that the product name is stored in the title attribute of the product image. So, we can get a list of titles by executing the following command:
In [1]: response.css("img::attr(title)").extract()
The output is a Python list containing the title string of every image on the page.
Now we want to get the discount on the product. We can do so by executing the following command:
In [3]: response.css('.prd_discount::text').extract()
This returns a list of the discount labels shown on the product cards.
We can do the same to get the price and the image link of the product.
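For example, still inside the shell, the same pattern works for the remaining fields (these selectors come from inspecting the ShopClues product cards and are the ones we will use in the spider below):
In [4]: response.css('.p_price::text').extract()       # discounted prices
In [5]: response.css('img::attr(data-img)').extract()  # image links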
Now, let's get back to our Scrapy project.
Navigate to the spiders folder in the project and create a file named fetch.py (you can name it anything).
We need to fetch the search results pages for headphones and watches.
We start by creating the spider class ExtractProduct and setting its name to extractProduct. We then define the start_requests() method with our two URLs.
import scrapy

class ExtractProduct(scrapy.Spider):
    name = "extractProduct"

    # request function
    def start_requests(self):
        urls = [
            'https://www.shopclues.com/search?q=watch&sc_z=2222&z=0&count=9&user_id=&user_segment=default',
            'https://www.shopclues.com/search?q=Headphones&z=0&user_id=&user_segment=default&trend=1'
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)
From our exploration in the Scrapy shell, we know the selectors for extracting each piece of information from the response:
title = response.css('img::attr(title)').extract()
discount = response.css('.prd_discount::text').extract()
image = response.css('img::attr(data-img)').extract()
price = response.css('.p_price::text').extract()
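As an aside, .extract() still works, but newer Scrapy releases recommend the equivalent .getall() (and .get() in place of .extract_first()), so the same selectors can also be written as:
title = response.css('img::attr(title)').getall()
discount = response.css('.prd_discount::text').getall()
image = response.css('img::attr(data-img)').getall()
price = response.css('.p_price::text').getall()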
Hence, after adding the parse() method, the final code is:
import scrapy

class ExtractProduct(scrapy.Spider):
    name = "extractProduct"

    def start_requests(self):
        urls = [
            'https://www.shopclues.com/search?q=watch&sc_z=2222&z=0&count=9&user_id=&user_segment=default',
            'https://www.shopclues.com/search?q=Headphones&z=0&user_id=&user_segment=default&trend=1'
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Extract each field as a list, in page order
        discount = response.css('.prd_discount::text').extract()
        price = response.css('.p_price::text').extract()
        image = response.css('img::attr(data-img)').extract()
        title = response.css('img::attr(title)').extract()

        # zip() pairs the lists element by element, so each tuple
        # corresponds to one product card on the page
        for item in zip(title, price, image, discount):
            product_information = {
                'title': item[0],
                'price': item[1],
                'image': item[2],
                'discount': item[3]
            }
            yield product_information
We can run the spider and store the scraped information in a CSV file using the following command:
scrapy crawl extractProduct -o a.csv
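Scrapy's feed exports infer the output format from the file extension, so the same crawl can write JSON or XML instead; in Scrapy 2.x, -O (capital) overwrites the file rather than appending to it:
scrapy crawl extractProduct -o a.json
scrapy crawl extractProduct -O a.xml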
Output:
The generated a.csv contains one row per product, with the title, price, image, and discount fields scraped from each listing page.