How to set Scrapy configuration values

In a nutshell

The Public.Law Open-gov spiders set the secrets in production via Zyte’s Spider Settings UI. Then this code at the end of the Scrapy settings file enables local development mode:

import os

# In development mode only, set the sensitive and environment-
# dependent configuration values via env variables. On development
# machines, set `SCRAPY_DEVELOPMENT_MODE` to make this work. This
# isn't necessary, however, to develop and run the spiders.
if "SCRAPY_DEVELOPMENT_MODE" in os.environ:
    LOG_LEVEL = os.environ["SCRAPY_LOG_LEVEL"]
    USER_AGENT = os.environ["SCRAPY_USER_AGENT"]

The details: env vars in dev, Scrapy configs in production.

Python Scrapy is a batteries-included, complete ecosystem for scraping data: libraries, commercial hosting, open source hosting, and active communities.

One funny, tricky thing for me has been configuring the code in the modern, secure style. No API keys or secrets are committed with the code, and each environment (development and production) picks up its proper settings. Finally, for the Public.Law scrapers, external open-source devs should be able to work with the code quickly without setting up a bunch of APIs.

It turns out that Zyte’s Spider Settings UI does not set OS environment variables. Instead, it directly sets Scrapy spider settings.

So, in order to run spiders locally as well as on a host like Zyte, two methods of setting the configs must be supported:

  • In production, do not use os.environ. Instead, simply set the configs in the web UI and the Scrapy library will pick them up.
  • In development, we do want to use os.environ. In this case, we read the env vars, set constants in the settings file, and the library will pick them up from there.
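The two cases above can be sketched as one small function. This is only an illustration under stated assumptions: the helper name `development_overrides` is hypothetical, and unlike the snippet at the top of this post it skips a missing variable instead of raising `KeyError`.

```python
import os


# Hypothetical helper, not part of Scrapy or the Public.Law code.
def development_overrides(environ):
    """Return Scrapy setting overrides read from env vars, or {} outside dev mode."""
    if "SCRAPY_DEVELOPMENT_MODE" not in environ:
        # Production: leave everything to the host's settings UI.
        return {}
    overrides = {}
    # Copy only the variables that are actually set, so a missing one
    # is skipped rather than raising KeyError.
    for setting in ("LOG_LEVEL", "USER_AGENT"):
        env_name = "SCRAPY_" + setting
        if env_name in environ:
            overrides[setting] = environ[env_name]
    return overrides


# In a settings module, this turns each override into a module-level constant.
globals().update(development_overrides(os.environ))
```

Taking the environment as a parameter keeps the logic easy to check with a plain dict, without touching the real process environment.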

The final trick is figuring out whether we’re in production or dev. I couldn’t find anything in the docs or debug output. So on my dev laptop, I simply set a flag environment variable, SCRAPY_DEVELOPMENT_MODE, in my shell (I use fish).


I check for that env var in a safe way, so the development-only code is skipped in production:

if "SCRAPY_DEVELOPMENT_MODE" in os.environ:
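One detail of this check is worth spelling out: it tests only whether the variable exists and never reads its value. The short sketch below shows the consequence, using a hypothetical helper name (`in_development_mode`) and plain dicts standing in for the environment.

```python
import os


# Hypothetical helper; it mirrors the membership check shown above.
def in_development_mode(environ=os.environ):
    return "SCRAPY_DEVELOPMENT_MODE" in environ


# The value is never inspected, so any value enables development mode.
print(in_development_mode({"SCRAPY_DEVELOPMENT_MODE": ""}))   # True
print(in_development_mode({"SCRAPY_DEVELOPMENT_MODE": "0"}))  # True
# Only a completely absent variable counts as production.
print(in_development_mode({}))                                # False
```

So to switch development mode off on a laptop, the variable has to be unset entirely; assigning it an empty or "false" value is not enough.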