Merge pull request scrapy#6048 from wRAR/relnotes-2.11
Release notes for 2.11.0
wRAR authored Sep 18, 2023
2 parents 3f34a5b + 528911d commit efc594b
Showing 4 changed files with 164 additions and 59 deletions.
109 changes: 105 additions & 4 deletions docs/news.rst
@@ -8,17 +8,118 @@ Release notes
Scrapy 2.11.0 (to be released)
------------------------------

Highlights:

- Spiders can now modify :ref:`settings <topics-settings>` in their
:meth:`~scrapy.Spider.from_crawler` methods, e.g. based on :ref:`spider
arguments <spiderargs>`.

- Periodic logging of stats.


Backward-incompatible changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Most of the initialization of :class:`scrapy.crawler.Crawler` instances is
  now done in :meth:`~scrapy.crawler.Crawler.crawl`, so the state of an
  instance before that method is called now differs from that in older
  Scrapy versions. We do not recommend using
  :class:`~scrapy.crawler.Crawler` instances before
  :meth:`~scrapy.crawler.Crawler.crawl` is called. (:issue:`6038`)

- :meth:`scrapy.Spider.from_crawler` is now called before the initialization
of various components previously initialized in
:meth:`scrapy.crawler.Crawler.__init__` and before the settings are
finalized and frozen. This change was needed to allow changing the settings
in :meth:`scrapy.Spider.from_crawler`. If you want to access the final
  setting values in the spider code as early as possible, you can do so in
  :meth:`~scrapy.Spider.start_requests`. (:issue:`6038`)

- The :meth:`TextResponse.json <scrapy.http.TextResponse.json>` method now
  requires the response to be in a valid JSON encoding (UTF-8, UTF-16, or
  UTF-32). If you need to deal with JSON documents in an invalid encoding,
  use ``json.loads(response.text)`` instead. (:issue:`6016`)
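
  A minimal sketch of that workaround in a spider callback, assuming a
  hypothetical endpoint that serves Latin-1-encoded JSON::

      import json

      import scrapy


      class Latin1JsonSpider(scrapy.Spider):  # hypothetical example spider
          name = "latin1_json"
          start_urls = ["https://example.com/data.json"]

          def parse(self, response):
              # response.json() expects UTF-8/16/32-encoded JSON; for other
              # encodings, decode through response.text, which uses the
              # encoding Scrapy detected for the response.
              data = json.loads(response.text)
              yield {"item_count": len(data)}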

Deprecation removals
~~~~~~~~~~~~~~~~~~~~

- Removed the binary export mode of
:class:`~scrapy.exporters.PythonItemExporter`, deprecated in Scrapy 1.1.0.
(:issue:`6006`, :issue:`6007`)

.. note:: If you are using this Scrapy version on Scrapy Cloud with a stack
that includes an older Scrapy version and get a "TypeError:
Unexpected options: binary" error, you may need to add
``scrapinghub-entrypoint-scrapy >= 0.14.1`` to your project
requirements or switch to a stack that includes Scrapy 2.11.

- Removed the ``CrawlerRunner.spiders`` attribute, deprecated in Scrapy
  1.0.0; use :attr:`CrawlerRunner.spider_loader
  <scrapy.crawler.CrawlerRunner.spider_loader>` instead. (:issue:`6010`)

Deprecations
~~~~~~~~~~~~

- Running :meth:`~scrapy.crawler.Crawler.crawl` more than once on the same
:class:`scrapy.crawler.Crawler` instance is now deprecated. (:issue:`1587`,
:issue:`6040`)

New features
~~~~~~~~~~~~

- Spiders can now modify settings in their
  :meth:`~scrapy.Spider.from_crawler` method, e.g. based on :ref:`spider
  arguments <spiderargs>`; see the sketch after this list. (:issue:`1305`,
  :issue:`1580`, :issue:`2392`, :issue:`3663`, :issue:`6038`)

- Added the :class:`~scrapy.extensions.periodic_log.PeriodicLog` extension
which can be enabled to log stats and/or their differences periodically.
(:issue:`5926`)

- Optimized the memory usage in :meth:`TextResponse.json
<scrapy.http.TextResponse.json>` by removing unnecessary body decoding.
(:issue:`5968`, :issue:`6016`)

- Links to ``.webp`` files are now ignored by :ref:`link extractors
<topics-link-extractors>`. (:issue:`6021`)
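
A minimal sketch of the settings-modification feature mentioned above,
assuming a hypothetical ``throttle`` spider argument::

    import scrapy

    class MySpider(scrapy.Spider):
        name = "example"

        @classmethod
        def from_crawler(cls, crawler, *args, **kwargs):
            spider = super().from_crawler(crawler, *args, **kwargs)
            # The settings are not yet frozen at this point, so they can
            # still be changed, e.g. based on a spider argument.
            if kwargs.get("throttle"):
                crawler.settings.set("DOWNLOAD_DELAY", 5.0, priority="spider")
            return spider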

Bug fixes
~~~~~~~~~

- Fixed the logging of enabled add-ons. (:issue:`6036`)

- Fixed :class:`~scrapy.mail.MailSender` producing invalid message bodies
when the ``charset`` argument is passed to
:meth:`~scrapy.mail.MailSender.send`. (:issue:`5096`, :issue:`5118`)

- Fixed an exception when accessing ``self.EXCEPTIONS_TO_RETRY`` from a
  subclass of :class:`~scrapy.downloadermiddlewares.retry.RetryMiddleware`;
  see the sketch after this list. (:issue:`6049`, :issue:`6050`)

- :meth:`scrapy.settings.BaseSettings.getdictorlist`, used to parse
:setting:`FEED_EXPORT_FIELDS`, now handles tuple values. (:issue:`6011`,
:issue:`6013`)

- Calls to ``datetime.utcnow()``, which is no longer recommended, have been
  replaced with calls to ``datetime.now()`` with an explicit timezone.
  (:issue:`6014`)
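
The retry-middleware pattern from the fix above, as a sketch — assuming the
tuple-style class attribute, with the extra exception type purely
illustrative::

    from scrapy.downloadermiddlewares.retry import RetryMiddleware

    class MyRetryMiddleware(RetryMiddleware):
        # Reading the parent class attribute from a subclass raised an
        # exception before this fix; extending it (here with an arbitrary
        # extra exception type) works again.
        EXCEPTIONS_TO_RETRY = RetryMiddleware.EXCEPTIONS_TO_RETRY + (ValueError,)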

Documentation
~~~~~~~~~~~~~

- Updated a deprecated function call in a pipeline example. (:issue:`6008`,
:issue:`6009`)

Quality assurance
~~~~~~~~~~~~~~~~~

- Extended typing hints. (:issue:`6003`, :issue:`6005`, :issue:`6031`,
:issue:`6034`)

- Pinned brotli_ to 1.0.9 for the PyPy tests as 1.1.0 breaks them.
(:issue:`6044`, :issue:`6045`)

- Other CI and pre-commit improvements. (:issue:`6002`, :issue:`6013`,
:issue:`6046`)

.. _release-2.10.1:

110 changes: 57 additions & 53 deletions docs/topics/extensions.rst
@@ -350,52 +350,8 @@ full list of parameters, including examples on how to instantiate
.. module:: scrapy.extensions.periodic_log
:synopsis: Periodic stats logging

Periodic log extension
~~~~~~~~~~~~~~~~~~~~~~
@@ -441,10 +397,10 @@ This extension periodically logs rich stat data as a JSON object::

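An illustrative sketch of such a log entry — the stat names shown are common
Scrapy stats; exact keys, values, and the logger prefix vary by crawl and
configuration::

    2023-09-18 12:00:00 [scrapy.extensions.periodic_log] INFO: {
        "delta": {
            "downloader/request_bytes": 55582,
            "downloader/response_count": 162
        },
        "stats": {
            "downloader/request_bytes": 229559,
            "downloader/response_count": 534,
            "response_received_count": 516
        },
        "time": {
            "elapsed": 120.5
        }
    }
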
This extension logs the following configurable sections:

- ``"delta"`` shows how some numeric stats have changed since the last stats
- ``"delta"`` shows how some numeric stats have changed since the last stats
log message.
The :setting:`PERIODIC_LOG_DELTA` setting determines the target stats. They

The :setting:`PERIODIC_LOG_DELTA` setting determines the target stats. They
must have ``int`` or ``float`` values.

- ``"stats"`` shows the current value of some stats.
@@ -453,11 +409,11 @@ This extension logs the following configurable sections:

- ``"time"`` shows detailed timing data.

  The :setting:`PERIODIC_LOG_TIMING_ENABLED` setting determines whether or
  not to show this section.

This extension logs data at the start, then on a fixed time interval
configurable through the :setting:`LOGSTATS_INTERVAL` setting, and finally
right before the crawl ends.


@@ -507,4 +463,52 @@ PERIODIC_LOG_TIMING_ENABLED

Default: ``False``

``True`` enables logging of timing data (i.e. the ``"time"`` section).
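
A sketch of enabling the extension with all three sections turned on, e.g.
in ``settings.py`` (the extension order value ``0`` is arbitrary)::

    EXTENSIONS = {
        "scrapy.extensions.periodic_log.PeriodicLog": 0,
    }
    PERIODIC_LOG_STATS = True
    PERIODIC_LOG_DELTA = True
    PERIODIC_LOG_TIMING_ENABLED = True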


.. module:: scrapy.extensions.debug
   :synopsis: Extensions for debugging Scrapy

Debugging extensions
--------------------

Stack trace dump extension
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. class:: StackTraceDump

Dumps information about the running process when a `SIGQUIT`_ or `SIGUSR2`_
signal is received. The information dumped is the following:

1. engine status (using ``scrapy.utils.engine.get_engine_status()``)
2. live references (see :ref:`topics-leaks-trackrefs`)
3. stack trace of all threads

After the stack trace and engine status is dumped, the Scrapy process continues
running normally.

This extension only works on POSIX-compliant platforms (i.e. not Windows),
because the `SIGQUIT`_ and `SIGUSR2`_ signals are not available on Windows.

There are at least two ways to send Scrapy the `SIGQUIT`_ signal:

1. By pressing Ctrl-\ while a Scrapy process is running (Linux only?)
2. By running this command (assuming ``<pid>`` is the process id of the Scrapy
process)::

kill -QUIT <pid>

.. _SIGUSR2: https://en.wikipedia.org/wiki/SIGUSR1_and_SIGUSR2
.. _SIGQUIT: https://en.wikipedia.org/wiki/SIGQUIT

Debugger extension
~~~~~~~~~~~~~~~~~~

.. class:: Debugger

Invokes a :doc:`Python debugger <library/pdb>` inside a running Scrapy process when a `SIGUSR2`_
signal is received. After the debugger is exited, the Scrapy process continues
running normally.

For more info see `Debugging in Python`_.

This extension only works on POSIX-compliant platforms (i.e. not Windows).

.. _Debugging in Python: https://pythonconquerstheuniverse.wordpress.com/2009/09/10/debugging-in-python/
2 changes: 1 addition & 1 deletion docs/topics/settings.rst
@@ -98,7 +98,7 @@ and settings set there should use the "spider" priority explicitly:
    @classmethod
    def update_settings(cls, settings):
        super().update_settings(settings)
        settings.set("SOME_SETTING", "some value", priority="spider")

.. versionadded:: 2.11

It's also possible to modify the settings in the
:meth:`~scrapy.Spider.from_crawler` method, e.g. based on :ref:`spider
arguments <spiderargs>`.
2 changes: 1 addition & 1 deletion docs/topics/spiders.rst
@@ -136,7 +136,7 @@ scrapy.Spider
attributes in the new instance so they can be accessed later inside the
spider's code.

.. versionchanged:: 2.11

   The settings in ``crawler.settings`` can now be modified in this
   method, which is handy if you want to modify them based on, e.g.,
   :ref:`spider arguments <spiderargs>`.
