Scrapy Beginners Series Part 3: Storing Data With Scrapy
In Part 1 and Part 2 of this Python Scrapy 5-Part Beginner Series we learned how to build a basic
Scrapy spider and get it to scrape some data from a website, as well as how to clean up data as it was
being scraped.

In Part 3 we will be exploring how to save the data into files/formats which would work for most
common use cases. We'll be looking at how to save the data to a CSV or JSON file as well as how to save
the data to a database or S3 bucket.

Python Scrapy 5-Part Beginner Series

Part 1: Basic Scrapy Spider - We will go over the basics of Scrapy, and build our first Scrapy spider.
(Part 1)

Part 2: Cleaning Dirty Data & Dealing With Edge Cases - Web data can be messy, unstructured,
and have lots of edge cases. In this tutorial we will make our spider robust to these edge cases,
using Items, Item Loaders and Item Pipelines. (Part 2)

Part 3: Storing Our Data - There are many different ways we can store scraped data, from CSV and
JSON files to databases and S3 buckets. We will explore several of these options and talk about their
pros, cons and the situations in which you would use them. (This Tutorial)

Part 4: User Agents & Proxies - Make our spider production ready by managing our user agents &
IPs so we don't get blocked. (Part 4)


Part 5: Deployment, Scheduling & Running Jobs - Deploying our spider on a server, and monitoring
and scheduling jobs via ScrapeOps. (Part 5)

The code for this project is available on Github here!

In this tutorial, Part 3: Storing Data With Scrapy, we're going to cover:

Using Feed Exporters
Saving Data to a JSON or CSV file
Saving Data to Amazon S3 Storage
Saving Data to a Database

With the intro out of the way let's get down to business.

Using Feed Exporters


Scrapy already has built-in ways to save the data to several different formats. Scrapy calls these
ready-to-go export methods Feed Exporters.

Out of the box, Scrapy provides the following formats to save/export the scraped data:

JSON file format
CSV file format
XML file format
Python's pickle format

The files which are generated can then be saved to the following places using a Feed Exporter:

The machine Scrapy is running on (obviously)
To a remote machine using FTP (file transfer protocol)
To Amazon S3 Storage
To Google Cloud Storage
Standard output

In this guide we're going to give examples of how you can use Feed Exporters to store your data in
different file formats and locations. However, there are many more ways you can store data with
Scrapy.

Saving Data to a JSON or CSV File


We've already quickly looked at how to export the data to JSON and CSV in Part 1 of this series, but
we'll quickly go over how to store the data to a JSON file and a CSV file one more time. Feel free to skip
ahead if you know how to do this already!

To get the data saved in the simplest way for a one-off job, we can use the following commands:

Saving in JSON format


To save to a JSON file simply add the flag -o to the scrapy crawl command along with the file path
you want to save the file to:

scrapy crawl chocolatespider -o my_scraped_chocolate_data.json

You can also define an absolute path like this:

scrapy crawl chocolatespider -O file:///path/to/my/project/my_scraped_chocolate_data.json:json

Saving in CSV format


To save to a CSV file add the flag -o to the scrapy crawl command along with the file path you
want to save the file to:


scrapy crawl chocolatespider -o my_scraped_chocolate_data.csv

You can also define an absolute path like this:

scrapy crawl chocolatespider -O file:///path/to/my/project/my_scraped_chocolate_data.csv:csv

You can also decide whether to overwrite or append the data to the output file.

For example, when using the crawl or runspider commands, you can use the -O option instead of -o
to overwrite the output file. (Be sure to remember the difference as this might be confusing!)
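
If you don't want to pass the output file on the command line every time, you can also configure these exports in your project's settings.py using Scrapy's built-in FEEDS setting (available in recent Scrapy versions). Here is a minimal sketch, where the file paths are just example names:

FEEDS = {
    "data/chocolate_data.json": {
        "format": "json",
        "overwrite": True,   # behaves like the -O flag
    },
    "data/chocolate_data.csv": {
        "format": "csv",
        "overwrite": False,  # behaves like the -o flag
    },
}

With this in place, a plain scrapy crawl chocolatespider will write both files on every run.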

Saving Data to Amazon S3 Storage


Now that we have saved the data to a CSV file, let's save the created CSV files straight to an Amazon S3
bucket (you need to already have one set up).

You can check out how to set up an S3 bucket with Amazon here:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/setting-up-s3.html

OK, first we need to install botocore, an external Python library created by Amazon to help with
connecting to S3.

pip3 install botocore

Now that we have that installed we can save the file to S3 by specifying the URI to your Amazon S3
bucket:

scrapy crawl chocolatespider -O s3://aws_key:aws_secret@mybucket/path/to/myscrapeddata.csv:csv

Obviously you will need to replace the aws_key & aws_secret with your own Amazon key & secret,
as well as put in your own bucket name and file path. We need the :csv at the end to specify the
format, but this could be :json or :xml .

You can also save the aws_key & aws_secret in your project settings file:

AWS_ACCESS_KEY_ID = 'myaccesskeyhere'
AWS_SECRET_ACCESS_KEY = 'mysecretkeyhere'

Note: When saving data with this method the AWS S3 Feed Exporter uses delayed file delivery. This
means that the file is first temporarily saved locally to the machine the scraper is running on and then
it's uploaded to AWS once the spider has completed the job.
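
The same S3 export can also be configured in your settings.py instead of being passed on the command line each time, again using Scrapy's FEEDS setting. A minimal sketch, where the bucket name and file path are placeholders you would replace with your own:

AWS_ACCESS_KEY_ID = 'myaccesskeyhere'
AWS_SECRET_ACCESS_KEY = 'mysecretkeyhere'

FEEDS = {
    's3://mybucket/path/to/myscrapeddata.csv': {
        'format': 'csv',
    },
}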

Saving Data to MySQL and PostgreSQL Databases


Here we'll show you how to save the data to MySQL and PostgreSQL databases. To do this we'll be using
Item Pipelines again.

For this we are presuming that you already have a database set up called chocolate_scraping .

For more information on setting up a MySQL or Postgres database check out the following resources:

Windows: MySQL - Postgres

Mac: MySQL - Postgres

Ubuntu: MySQL - Postgres

Saving data to a MySQL database


We are assuming you already have a database set up and a table called chocolate_products in
your DB. If not, you can log in to your database and run the following command to create the table:

CREATE TABLE IF NOT EXISTS chocolate_products (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255),
    price VARCHAR(255),
    url TEXT
);

To save the data to the databases we're again going to be using the Item Pipelines. If you don't know
what they are please check out part 2 of this series where we go through how to use Scrapy Item
Pipelines!

The first step in our new Item Pipeline class, as you may expect, is to connect to our MySQL database
and the table in which we will be storing our scraped data.

We are going to need to install the mysql package for Python.

pip install mysql

If you already have mysql installed on your computer - you might only need the connection package.

pip install mysql-connector-python

Then create an Item Pipeline in our pipelines.py file that will connect with the database.

import mysql.connector

class SavingToMySQLPipeline(object):

    def __init__(self):
        self.create_connection()

    def create_connection(self):
        self.conn = mysql.connector.connect(
            host = 'localhost',
            user = 'root',
            password = '123456',
            database = 'chocolate_scraping'
        )
        self.curr = self.conn.cursor()

Now that we are connected to the database, we need to save each chocolate product we scrape into
our database, item by item, as they are processed by Scrapy.

To do that we will use Scrapy's process_item() function (which runs after each item is scraped)
and then create a new function called store_db in which we will run the MySQL command to store
the Item data in our chocolate_products table.

import mysql.connector

class SavingToMySQLPipeline(object):

    def __init__(self):
        self.create_connection()

    def create_connection(self):
        self.connection = mysql.connector.connect(
            host = 'localhost',
            user = 'root',
            password = '123456',
            database = 'chocolate_scraping'
        )
        self.curr = self.connection.cursor()

    def process_item(self, item, spider):
        self.store_db(item)
        # we need to return the item below as Scrapy expects us to!
        return item

    def store_db(self, item):
        self.curr.execute(""" insert into chocolate_products (name, price, url) values (%s,%s,%s)""", (
            item["name"],
            item["price"],
            item["url"]
        ))
        self.connection.commit()

Before trying to run our pipeline we mustn't forget to add the pipeline to our ITEM_PIPELINES in our
project settings.py file.

ITEM_PIPELINES = {
    'chocolatescraper.pipelines.PriceToUSDPipeline': 100,
    'chocolatescraper.pipelines.DuplicatesPipeline': 200,
    'chocolatescraper.pipelines.SavingToMySQLPipeline': 300,
}
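
One small improvement worth considering: Scrapy will also call a close_spider() method on a pipeline (if one is defined) when the spider finishes, which is a natural place to close the cursor and the database connection. A minimal sketch you could add to the SavingToMySQLPipeline class above (the same idea works for the Postgres pipeline below):

    def close_spider(self, spider):
        # Called automatically when the spider finishes - tidy up the database connection
        self.curr.close()
        self.connection.close()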


Saving data to a PostgreSQL database


As in the above section, we are assuming you already have a Postgres database set up and you
have created a table called chocolate_products in your DB. If not, you can log in to your Postgres
database and run the following command to create the table:

CREATE TABLE IF NOT EXISTS chocolate_products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    price VARCHAR(255),
    url TEXT
);

To save the data to a PostgreSQL database, the main thing we need to do is update how the
connection is created. To do so we will install the Python package psycopg2 .

pip install psycopg2

And update the connection library in our function.

import psycopg2

class SavingToPostgresPipeline(object):

    def __init__(self):
        self.create_connection()

    def create_connection(self):
        self.connection = psycopg2.connect(
            host="localhost",
            database="chocolate_scraping",
            user="root",
            password="123456")

        self.curr = self.connection.cursor()

    def process_item(self, item, spider):
        self.store_db(item)
        # we need to return the item below as Scrapy expects us to!
        return item

    def store_db(self, item):
        try:
            self.curr.execute(""" insert into chocolate_products (name, price, url)
                values (%s, %s, %s)""", (
                item["name"],
                item["price"],
                item["url"]
            ))
            # commit inside the try block so it only runs when the insert succeeds
            self.connection.commit()
        except BaseException as e:
            print(e)
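
As an optional refinement, rather than hard-coding the database credentials in the pipeline, you could read them from your Scrapy settings using the from_crawler() hook that Scrapy calls when it builds the pipeline. A rough sketch, assuming you add hypothetical POSTGRES_HOST, POSTGRES_DB, POSTGRES_USER and POSTGRES_PASSWORD values to your settings.py (the process_item() and store_db() methods stay the same as above):

import psycopg2

class SavingToPostgresPipeline(object):

    def __init__(self, host, database, user, password):
        # Connection details now come from settings.py instead of being hard-coded
        self.connection = psycopg2.connect(
            host=host, database=database, user=user, password=password)
        self.curr = self.connection.cursor()

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this when creating the pipeline, giving us access to the project settings
        settings = crawler.settings
        return cls(
            host=settings.get('POSTGRES_HOST', 'localhost'),
            database=settings.get('POSTGRES_DB', 'chocolate_scraping'),
            user=settings.get('POSTGRES_USER', 'root'),
            password=settings.get('POSTGRES_PASSWORD', ''),
        )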

Again before trying to run our pipeline we mustn't forget to add the pipeline to our ITEM_PIPELINES in
our project settings.py file.

ITEM_PIPELINES = {
    'chocolatescraper.pipelines.PriceToUSDPipeline': 100,
    'chocolatescraper.pipelines.DuplicatesPipeline': 200,
    'chocolatescraper.pipelines.SavingToPostgresPipeline': 300,
}

After running our spider again, we should be able to see the data in our database if we run a simple
select command like the following (after logging into our database!):

select * from chocolate_products;
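
If you'd prefer to check the results from Python rather than from the database shell, here is a quick sketch using psycopg2 with the same example credentials as the pipeline above:

import psycopg2

# Quick sanity check: print the first few scraped rows from the table
connection = psycopg2.connect(
    host="localhost", database="chocolate_scraping", user="root", password="123456")
curr = connection.cursor()
curr.execute("SELECT name, price, url FROM chocolate_products LIMIT 5;")
for row in curr.fetchall():
    print(row)
connection.close()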

Next Steps
We hope you now have a good understanding of how to save the data you've scraped into the file or
database you need! If you have any questions leave them in the comments below and we'll do our best
to help out!

If you would like the code from this example please check it out on Github.

The next tutorial covers how to make our spider production ready by managing our user agents & IPs so
we don't get blocked. (Part 4)

Need a Free Proxy? Then check out our Proxy Comparison Tool that allows you to compare the pricing,
features and limits of every proxy provider on the market so you can find the one that best suits your
needs. Including the best free plans.

