.. _news:

Release notes
=============

Scrapy 1.4.0 (2017-05-18)
-------------------------

Scrapy 1.4 does not bring that many breathtaking new features
but quite a few handy improvements nonetheless.

Scrapy now supports anonymous FTP sessions with customizable user and
password via the new :setting:`FTP_USER` and :setting:`FTP_PASSWORD`
settings. And if you're using Twisted version 17.1.0 or above, FTP is now
available with Python 3.

There's a new :meth:`response.follow <scrapy.http.TextResponse.follow>`
method for creating requests; **it is now a recommended way to create
Requests in Scrapy spiders**. This method makes it easier to write correct
spiders; ``response.follow`` has several advantages over creating
``scrapy.Request`` objects directly:

* it handles relative URLs;
* it works properly with non-ASCII URLs on non-UTF8 pages;
* in addition to absolute and relative URLs it supports Selectors;
  for ``<a>`` elements it can also extract their href values.

For example, instead of this::

    for href in response.css('li.page a::attr(href)').extract():
        url = response.urljoin(href)
        yield scrapy.Request(url, self.parse,
                             encoding=response.encoding)

One can now write this::

    for a in response.css('li.page a'):
        yield response.follow(a, self.parse)

Link extractors are also improved. They work similarly to what a regular
modern browser would do: leading and trailing whitespace is removed from
attributes (think ``href=" http://example.com"``) when building ``Link``
objects. This whitespace-stripping also happens for ``action`` attributes
with ``FormRequest``.

**Please also note that link extractors do not canonicalize URLs by default
anymore.** This was puzzling users every now and then, and it's not what
browsers actually do, so we removed that extra transformation on extracted
links.

For those of you wanting more control on the ``Referer:`` header that Scrapy
sends when following links, you can set your own ``Referrer Policy``.
Prior to Scrapy 1.4, the default ``RefererMiddleware`` would simply and
blindly set it to the URL of the response that generated the HTTP request
(which could leak information on your URL seeds).
By default, Scrapy now behaves much like your regular browser does,
and this policy is fully customizable with W3C standard values
(or with something really custom of your own if you wish).
See :setting:`REFERRER_POLICY` for details.

To make Scrapy spiders easier to debug, Scrapy logs more stats by default
in 1.4: memory usage stats, detailed retry stats, and detailed HTTP error
code stats. A similar change is that the HTTP cache path is now also
visible in logs.

Last but not least, Scrapy now has the option to make JSON and XML items
more human-readable, with newlines between items and even custom indenting
offset, using the new :setting:`FEED_EXPORT_INDENT` setting.
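To give a concrete feel for these new knobs, here is a minimal
``settings.py`` sketch combining them; the values are illustrative
examples, not Scrapy defaults::

    # Anonymous FTP credentials used by the FTP download handler
    FTP_USER = 'anonymous'
    FTP_PASSWORD = 'guest@example.com'

    # One of the standard W3C referrer policy values
    REFERRER_POLICY = 'same-origin'

    # Pretty-print JSON/XML feed exports with a 4-space indent
    FEED_EXPORT_INDENT = 4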
Enjoy! (Or read on for the rest of changes in this release.)

Deprecations and Backwards Incompatible Changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Default to ``canonicalize=False`` in
  :class:`scrapy.linkextractors.LinkExtractor`
  (:issue:`2537`, fixes :issue:`1941` and :issue:`1982`):
  **warning, this is technically backwards-incompatible**
- Enable memusage extension by default (:issue:`2539`, fixes :issue:`2187`);
  **this is technically backwards-incompatible** so please check if you have
  any non-default ``MEMUSAGE_***`` options set.
- ``EDITOR`` environment variable now takes precedence over ``EDITOR``
  option defined in settings.py (:issue:`1829`); Scrapy default settings
  no longer depend on environment variables. **This is technically a
  backwards incompatible change**.
- ``Spider.make_requests_from_url`` is deprecated
  (:issue:`1728`, fixes :issue:`1495`).

New Features
~~~~~~~~~~~~

- Accept proxy credentials in :reqmeta:`proxy` request meta key
  (:issue:`2526`)
- Support `brotli`_-compressed content; requires optional `brotlipy`_
  (:issue:`2535`)
- New ``response.follow`` shortcut for creating requests (:issue:`1940`)
- Added ``flags`` argument and attribute to
  :class:`Request <scrapy.http.Request>` objects (:issue:`2047`)
- Support Anonymous FTP (:issue:`2342`)
- Added ``retry/count``, ``retry/max_reached`` and
  ``retry/reason_count/<reason>`` stats to
  :class:`RetryMiddleware <scrapy.downloadermiddlewares.retry.RetryMiddleware>`
  (:issue:`2543`)
- Added ``httperror/response_ignored_count`` and
  ``httperror/response_ignored_status_count/<status>`` stats to
  :class:`HttpErrorMiddleware <scrapy.spidermiddlewares.httperror.HttpErrorMiddleware>`
  (:issue:`2566`)
- Customizable :setting:`Referrer policy <REFERRER_POLICY>` in
  :class:`RefererMiddleware <scrapy.spidermiddlewares.referer.RefererMiddleware>`
  (:issue:`2306`)
- New ``data:`` URI download handler (:issue:`2334`, fixes :issue:`2156`)
- Log cache directory when HTTP Cache is used
  (:issue:`2611`, fixes :issue:`2604`)
- Warn users when project contains duplicate spider names
  (fixes :issue:`2181`)
- :class:`CaselessDict` now accepts ``Mapping`` instances and not only dicts
  (:issue:`2646`)
- :ref:`Media downloads <topics-media-pipeline>`, with
  :class:`FilesPipeline` or :class:`ImagesPipeline`, can now optionally
  handle HTTP redirects using the new :setting:`MEDIA_ALLOW_REDIRECTS`
  setting (:issue:`2616`, fixes :issue:`2004`)
- Accept non-complete responses from websites using a new
  :setting:`DOWNLOAD_FAIL_ON_DATALOSS` setting
  (:issue:`2590`, fixes :issue:`2586`)
- Optional pretty-printing of JSON and XML items via
  :setting:`FEED_EXPORT_INDENT` setting (:issue:`2456`, fixes :issue:`1327`)
- Allow dropping fields in ``FormRequest.from_response`` formdata when
  ``None`` value is passed (:issue:`667`)
- Per-request retry times with the new :reqmeta:`max_retry_times` meta key
  (:issue:`2642`; see the sketch below)
- ``python -m scrapy`` as a more explicit alternative to ``scrapy`` command
  (:issue:`2740`)

.. _brotli: https://github.com/google/brotli
.. _brotlipy: https://github.com/python-hyper/brotlipy/
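As a quick sketch of the new request ``meta`` keys and the ``flags``
argument (the spider name, URL and proxy credentials below are
hypothetical)::

    import scrapy

    class ExampleSpider(scrapy.Spider):
        name = 'example'

        def start_requests(self):
            yield scrapy.Request(
                'http://www.example.com/',
                callback=self.parse,
                flags=['seed'],  # new flags argument, e.g. to label requests
                meta={
                    # proxy credentials are now accepted in the proxy meta key
                    'proxy': 'http://user:pass@proxy.example.com:8050',
                    # per-request cap overriding the RETRY_TIMES setting
                    'max_retry_times': 8,
                },
            )

        def parse(self, response):
            pass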
Bug fixes
~~~~~~~~~

- LinkExtractor now strips leading and trailing whitespaces from attributes
  (:issue:`2547`, fixes :issue:`1614`)
- Properly handle whitespaces in action attribute in :class:`FormRequest`
  (:issue:`2548`)
- Buffer CONNECT response bytes from proxy until all HTTP headers are
  received (:issue:`2495`, fixes :issue:`2491`)
- FTP downloader now works on Python 3, provided you use Twisted>=17.1
  (:issue:`2599`)
- Use body to choose response type after decompressing content
  (:issue:`2393`, fixes :issue:`2145`)
- Always decompress ``Content-Encoding: gzip`` at
  :class:`HttpCompressionMiddleware <scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware>`
  stage (:issue:`2391`)
- Respect custom log level in ``Spider.custom_settings``
  (:issue:`2581`, fixes :issue:`1612`)
- ``make htmlview`` fix for macOS (:issue:`2661`)
- Remove "commands" from the command list (:issue:`2695`)
- Fix duplicate Content-Length header for POST requests with empty body
  (:issue:`2677`)
- Properly cancel large downloads, i.e. above :setting:`DOWNLOAD_MAXSIZE`
  (:issue:`1616`)
- ImagesPipeline: fixed processing of transparent PNG images with palette
  (:issue:`2675`)

Cleanups & Refactoring
~~~~~~~~~~~~~~~~~~~~~~

- Tests: remove temp files and folders (:issue:`2570`),
  fixed ProjectUtilsTest on OS X (:issue:`2569`),
  use portable pypy for Linux on Travis CI (:issue:`2710`)
- Separate building request from ``_requests_to_follow`` in CrawlSpider
  (:issue:`2562`)
- Remove "Python 3 progress" badge (:issue:`2567`)
- Add a couple more lines to ``.gitignore`` (:issue:`2557`)
- Remove bumpversion prerelease configuration (:issue:`2159`)
- Add codecov.yml file (:issue:`2750`)
- Set context factory implementation based on Twisted version
  (:issue:`2577`, fixes :issue:`2560`)
- Add omitted ``self`` arguments in default project middleware template
  (:issue:`2595`)
- Remove redundant ``slot.add_request()`` call in ExecutionEngine
  (:issue:`2617`)
- Catch more specific ``os.error`` exception in :class:`FSFilesStore`
  (:issue:`2644`)
- Change "localhost" test server certificate (:issue:`2720`)
- Remove unused ``MEMUSAGE_REPORT`` setting (:issue:`2576`)

Documentation
~~~~~~~~~~~~~

- Binary mode is required for exporters (:issue:`2564`, fixes :issue:`2553`)
- Mention issue with
  :meth:`FormRequest.from_response <scrapy.http.FormRequest.from_response>`
  due to bug in lxml (:issue:`2572`)
- Use single quotes uniformly in templates (:issue:`2596`)
- Document :reqmeta:`ftp_user` and :reqmeta:`ftp_password` meta keys
  (:issue:`2587`)
- Removed section on deprecated ``contrib/`` (:issue:`2636`)
- Recommend Anaconda when installing Scrapy on Windows
  (:issue:`2477`, fixes :issue:`2475`)
- FAQ: rewrite note on Python 3 support on Windows (:issue:`2690`)
- Rearrange selector sections (:issue:`2705`)
- Remove ``__nonzero__`` from :class:`SelectorList` docs (:issue:`2683`)
- Mention how to disable request filtering in documentation of
  :setting:`DUPEFILTER_CLASS` setting (:issue:`2714`)
- Add sphinx_rtd_theme to docs setup readme (:issue:`2668`)
- Open file in text mode in JSON item writer example (:issue:`2729`)
- Clarify ``allowed_domains`` example (:issue:`2670`)

Scrapy 1.3.3 (2017-03-10)
-------------------------

Bug fixes
~~~~~~~~~

- Make ``SpiderLoader`` raise ``ImportError`` again by default for missing
  dependencies and wrong :setting:`SPIDER_MODULES`.
  These exceptions were silenced as warnings since 1.3.0.
  A new setting is introduced to toggle between warning and exception if
  needed; see :setting:`SPIDER_LOADER_WARN_ONLY` for details.
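If you prefer to keep the warning-only behavior, a one-line sketch (whether
this fits your project is your call)::

    # settings.py: report import problems as warnings instead of raising
    SPIDER_LOADER_WARN_ONLY = True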
Scrapy 1.3.2 (2017-02-13)
-------------------------

Bug fixes
~~~~~~~~~

- Preserve request class when converting to/from dicts (utils.reqser)
  (:issue:`2510`).
- Use consistent selectors for author field in tutorial (:issue:`2551`).
- Fix TLS compatibility in Twisted 17+ (:issue:`2558`)

Scrapy 1.3.1 (2017-02-08)
-------------------------

New features
~~~~~~~~~~~~

- Support ``'True'`` and ``'False'`` string values for boolean settings
  (:issue:`2519`); you can now do something like
  ``scrapy crawl myspider -s REDIRECT_ENABLED=False``.
- Support kwargs with ``response.xpath()`` to use XPath variables and
  ad-hoc namespaces declarations; this requires at least Parsel v1.1
  (:issue:`2457`). See the sketch after this list.
- Add support for Python 3.6 (:issue:`2485`).
- Run tests on PyPy (warning: some tests still fail, so PyPy is not
  supported yet).
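A short sketch of the new kwargs support (the selectors, variable name and
namespace below are illustrative)::

    # XPath variables instead of string formatting
    response.xpath('//a[contains(@class, $cls)]/@href', cls='pager').extract()

    # ad-hoc namespace declaration
    response.xpath(
        '//atom:link/@href',
        namespaces={'atom': 'http://www.w3.org/2005/Atom'},
    ).extract()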
Bug fixes
~~~~~~~~~

- Enforce ``DNS_TIMEOUT`` setting (:issue:`2496`).
- Fix :command:`view` command; it was a regression in v1.3.0
  (:issue:`2503`).
- Fix tests regarding ``*_EXPIRES`` settings with Files/Images pipelines
  (:issue:`2460`).
- Fix name of generated pipeline class when using basic project template
  (:issue:`2466`).
- Fix compatibility with Twisted 17+ (:issue:`2496`, :issue:`2528`).
- Fix ``scrapy.Item`` inheritance on Python 3.6 (:issue:`2511`).
- Enforce numeric values for components order in ``SPIDER_MIDDLEWARES``,
  ``DOWNLOADER_MIDDLEWARES``, ``EXTENSIONS`` and ``SPIDER_CONTRACTS``
  (:issue:`2420`).

Documentation
~~~~~~~~~~~~~

- Reword Code of Conduct section and upgrade to Contributor Covenant v1.4
  (:issue:`2469`).
- Clarify that passing spider arguments converts them to spider attributes
  (:issue:`2483`).
- Document ``formid`` argument on ``FormRequest.from_response()``
  (:issue:`2497`).
- Add .rst extension to README files (:issue:`2507`).
- Mention LevelDB cache storage backend (:issue:`2525`).
- Use ``yield`` in sample callback code (:issue:`2533`).
- Add note about HTML entities decoding with ``.re()/.re_first()``
  (:issue:`1704`).
- Typos (:issue:`2512`, :issue:`2534`, :issue:`2531`).

Cleanups
~~~~~~~~

- Remove redundant check in ``MetaRefreshMiddleware`` (:issue:`2542`).
- Faster checks in ``LinkExtractor`` for allow/deny patterns
  (:issue:`2538`).
- Remove dead code supporting old Twisted versions (:issue:`2544`).

Scrapy 1.3.0 (2016-12-21)
-------------------------

This release comes rather soon after 1.2.2 for one main reason: it was found
out that releases since 0.18 up to 1.2.2 (included) use some backported code
from Twisted (``scrapy.xlib.tx.*``), even if newer Twisted modules are
available. Scrapy now uses ``twisted.web.client`` and
``twisted.internet.endpoints`` directly. (See also cleanups below.)

As it is a major change, we wanted to get the bug fix out quickly
while not breaking any projects using the 1.2 series.

New Features
~~~~~~~~~~~~

- ``MailSender`` now accepts single strings as values for ``to`` and ``cc``
  arguments (:issue:`2272`); see the sketch after this list.
- ``scrapy fetch url``, ``scrapy shell url`` and ``fetch(url)`` inside
  Scrapy shell now follow HTTP redirections by default (:issue:`2290`);
  see :command:`fetch` and :command:`shell` for details.
- ``HttpErrorMiddleware`` now logs errors with ``INFO`` level instead of
  ``DEBUG``; this is technically **backwards incompatible** so please check
  your log parsers.
- By default, logger names now use a long-form path,
  e.g. ``[scrapy.extensions.logstats]``, instead of the shorter "top-level"
  variant of prior releases (e.g. ``[scrapy]``); this is
  **backwards incompatible** if you have log parsers expecting the short
  logger name part. You can switch back to short logger names by setting
  :setting:`LOG_SHORT_NAMES` to ``True``.
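A minimal ``MailSender`` sketch with the new single-string arguments (the
host and addresses are hypothetical)::

    from scrapy.mail import MailSender

    mailer = MailSender(smtphost='localhost', mailfrom='bot@example.com')
    # 'to' and 'cc' now accept a single string, not only a list of strings
    mailer.send(to='admin@example.com', subject='Crawl finished', body='Done.')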
Dependencies & Cleanups
~~~~~~~~~~~~~~~~~~~~~~~

- Scrapy now requires Twisted >= 13.1, which is the case for many Linux
  distributions already.
- As a consequence, we got rid of the ``scrapy.xlib.tx.*`` modules, which
  copied some Twisted code for users stuck with an "old" Twisted version.
- ``ChunkedTransferMiddleware`` is deprecated and removed from the default
  downloader middlewares.

Scrapy 1.2.3 (2017-03-03)
-------------------------

- Packaging fix: disallow unsupported Twisted versions in setup.py

Scrapy 1.2.2 (2016-12-06)
-------------------------

Bug fixes
~~~~~~~~~

- Fix a cryptic traceback when a pipeline fails on ``open_spider()``
  (:issue:`2011`)
- Fix embedded IPython shell variables (fixing :issue:`396` that
  re-appeared in 1.2.0, fixed in :issue:`2418`)
- A couple of patches when dealing with robots.txt:

  - handle (non-standard) relative sitemap URLs (:issue:`2390`)
  - handle non-ASCII URLs and User-Agents in Python 2 (:issue:`2373`)

Documentation
~~~~~~~~~~~~~

- Document ``"download_latency"`` key in ``Request``'s ``meta`` dict
  (:issue:`2033`)
- Remove page on (deprecated & unsupported) Ubuntu packages from ToC
  (:issue:`2335`)
- A few fixed typos (:issue:`2346`, :issue:`2369`, :issue:`2380`)
  and clarifications (:issue:`2354`, :issue:`2325`, :issue:`2414`)

Other changes
~~~~~~~~~~~~~

- Advertise `conda-forge`_ as Scrapy's official conda channel
  (:issue:`2387`)
- More helpful error messages when trying to use ``.css()`` or ``.xpath()``
  on non-Text Responses (:issue:`2264`)
- ``startproject`` command now generates a sample ``middlewares.py`` file
  (:issue:`2335`)
- Add more dependencies' version info in ``scrapy version`` verbose output
  (:issue:`2404`)
- Remove all ``*.pyc`` files from source distribution (:issue:`2386`)

.. _conda-forge: https://anaconda.org/conda-forge/scrapy

Scrapy 1.2.1 (2016-10-21)
-------------------------

Bug fixes
~~~~~~~~~

- Include OpenSSL's more permissive default ciphers when establishing
  TLS/SSL connections (:issue:`2314`).
- Fix "Location" HTTP header decoding on non-ASCII URL redirects
  (:issue:`2321`).

Documentation
~~~~~~~~~~~~~

- Fix JsonWriterPipeline example (:issue:`2302`).
- Various notes: :issue:`2330` on spider names, :issue:`2329` on middleware
  methods processing order, :issue:`2327` on getting multi-valued HTTP
  headers as lists.

Other changes
~~~~~~~~~~~~~

- Removed ``www.`` from ``start_urls`` in built-in spider templates
  (:issue:`2299`).

Scrapy 1.2.0 (2016-10-03)
-------------------------

New Features
~~~~~~~~~~~~

- New :setting:`FEED_EXPORT_ENCODING` setting to customize the encoding
  used when writing items to a file. This can be used to turn off
  ``\uXXXX`` escapes in JSON output. This is also useful for those wanting
  something else than UTF-8 for XML or CSV output (:issue:`2034`);
  see the sketch after this list.
- ``startproject`` command now supports an optional destination directory
  to override the default one based on the project name (:issue:`2005`).
- New :setting:`SCHEDULER_DEBUG` setting to log requests serialization
  failures (:issue:`1610`).
- JSON encoder now supports serialization of ``set`` instances
  (:issue:`2058`).
- Interpret ``application/json-amazonui-streaming`` as ``TextResponse``
  (:issue:`1503`).
- ``scrapy`` is imported by default when using shell tools
  (:command:`shell`, ``inspect_response``) (:issue:`2248`).
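A minimal sketch, assuming you want human-readable UTF-8 output instead of
``\uXXXX`` escapes in your JSON feeds::

    # settings.py
    FEED_EXPORT_ENCODING = 'utf-8'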
Bug fixes
~~~~~~~~~

- DefaultRequestHeaders middleware now runs before UserAgent middleware
  (:issue:`2088`). **Warning: this is technically backwards incompatible**,
  though we consider this a bug fix.
- HTTP cache extension and plugins that use the ``.scrapy`` data directory
  now work outside projects (:issue:`1581`). **Warning: this is technically
  backwards incompatible**, though we consider this a bug fix.
- ``Selector`` does not allow passing both ``response`` and ``text``
  anymore (:issue:`2153`).
- Fixed logging of wrong callback name with ``scrapy parse``
  (:issue:`2169`).
- Fix for an odd gzip decompression bug (:issue:`1606`).
- Fix for selected callbacks when using ``CrawlSpider`` with
  :command:`scrapy parse <parse>` (:issue:`2225`).
- Fix for invalid JSON and XML files when spider yields no items
  (:issue:`872`).
- Implement ``flush()`` for ``StreamLogger``, avoiding a warning in logs
  (:issue:`2125`).

Refactoring
~~~~~~~~~~~

- ``canonicalize_url`` has been moved to `w3lib.url`_ (:issue:`2168`).

.. _w3lib.url: https://w3lib.readthedocs.io/en/latest/w3lib.html#w3lib.url.canonicalize_url

Tests & Requirements
~~~~~~~~~~~~~~~~~~~~

Scrapy's new requirements baseline is Debian 8 "Jessie". It was previously
Ubuntu 12.04 Precise. What this means in practice is that we run continuous
integration tests with these (main) packages versions at a minimum:
Twisted 14.0, pyOpenSSL 0.14, lxml 3.4.

Scrapy may very well work with older versions of these packages (the code
base still has switches for older Twisted versions for example) but it is
not guaranteed (because it's not tested anymore).

Documentation
~~~~~~~~~~~~~

- Grammar fixes: :issue:`2128`, :issue:`1566`.
- Download stats badge removed from README (:issue:`2160`).
- New Scrapy :ref:`architecture diagram <topics-architecture>`
  (:issue:`2165`).
- Updated ``Response`` parameters documentation (:issue:`2197`).
- Reworded misleading :setting:`RANDOMIZE_DOWNLOAD_DELAY` description
  (:issue:`2190`).
- Add StackOverflow as a support channel (:issue:`2257`).

Scrapy 1.1.4 (2017-03-03)
-------------------------

- Packaging fix: disallow unsupported Twisted versions in setup.py

Scrapy 1.1.3 (2016-09-22)
-------------------------

Bug fixes
~~~~~~~~~

- Class attributes for subclasses of ``ImagesPipeline`` and
  ``FilesPipeline`` work as they did before 1.1.1
  (:issue:`2243`, fixes :issue:`2198`)

Documentation
~~~~~~~~~~~~~

- :ref:`Overview <intro-overview>` and :ref:`tutorial <intro-tutorial>`
  rewritten to use http://toscrape.com websites
  (:issue:`2236`, :issue:`2249`, :issue:`2252`).

Scrapy 1.1.2 (2016-08-18)
-------------------------

Bug fixes
~~~~~~~~~

- Introduce a missing :setting:`IMAGES_STORE_S3_ACL` setting to override
  the default ACL policy in ``ImagesPipeline`` when uploading images to S3
  (note that the default ACL policy is "private" -- instead of
  "public-read" -- since Scrapy 1.1.0)
- :setting:`IMAGES_EXPIRES` default value set back to 90 (the regression
  was introduced in 1.1.1)

Scrapy 1.1.1 (2016-07-13)
-------------------------

Bug fixes
~~~~~~~~~

- Add "Host" header in CONNECT requests to HTTPS proxies (:issue:`2069`)
- Use response ``body`` when choosing response class
  (:issue:`2001`, fixes :issue:`2000`)
- Do not fail on canonicalizing URLs with wrong netlocs
  (:issue:`2038`, fixes :issue:`2010`)
- A few fixes for ``HttpCompressionMiddleware`` (and ``SitemapSpider``):

  - Do not decode HEAD responses (:issue:`2008`, fixes :issue:`1899`)
  - Handle charset parameter in gzip Content-Type header
    (:issue:`2050`, fixes :issue:`2049`)
  - Do not decompress gzip octet-stream responses
    (:issue:`2065`, fixes :issue:`2063`)

- Catch (and ignore with a warning) exception when verifying certificate
  against IP-address hosts (:issue:`2094`, fixes :issue:`2092`)
- Make ``FilesPipeline`` and ``ImagesPipeline`` backward compatible again
  regarding the use of legacy class attributes for customization
  (:issue:`1989`, fixes :issue:`1985`)

New features
~~~~~~~~~~~~

- Enable genspider command outside project folder (:issue:`2052`)
- Retry HTTPS CONNECT ``TunnelError`` by default (:issue:`1974`)

Documentation
~~~~~~~~~~~~~

- ``FEED_TEMPDIR`` setting at lexicographical position (:commit:`9b3c72c`)
- Use idiomatic ``.extract_first()`` in overview (:issue:`1994`)
- Update years in copyright notice (:commit:`c2c8036`)
- Add information and example on errbacks (:issue:`1995`)
- Use "url" variable in downloader middleware example (:issue:`2015`)
- Grammar fixes (:issue:`2054`, :issue:`2120`)
- New FAQ entry on using BeautifulSoup in spider callbacks (:issue:`2048`)
- Add notes about Scrapy not working on Windows with Python 3
  (:issue:`2060`)
- Encourage complete titles in pull requests (:issue:`2026`)

Tests
~~~~~

- Upgrade py.test requirement on Travis CI and pin pytest-cov to 2.2.1
  (:issue:`2095`)

Scrapy 1.1.0 (2016-05-11)
-------------------------

This 1.1 release brings a lot of interesting features and bug fixes:

- Scrapy 1.1 has beta Python 3 support (requires Twisted >= 15.5). See
  :ref:`news_betapy3` for more details and some limitations.
- Hot new features:

  - Item loaders now support nested loaders (:issue:`1467`).
  - ``FormRequest.from_response`` improvements
    (:issue:`1382`, :issue:`1137`).
  - Added setting :setting:`AUTOTHROTTLE_TARGET_CONCURRENCY` and improved
    AutoThrottle docs (:issue:`1324`).
  - Added ``response.text`` to get body as unicode (:issue:`1730`).
  - Anonymous S3 connections (:issue:`1358`).
  - Deferreds in downloader middlewares (:issue:`1473`). This enables
    better robots.txt handling (:issue:`1471`).
  - HTTP caching now follows RFC2616 more closely, added settings
    :setting:`HTTPCACHE_ALWAYS_STORE` and
    :setting:`HTTPCACHE_IGNORE_RESPONSE_CACHE_CONTROLS` (:issue:`1151`).
  - Selectors were extracted to the parsel_ library (:issue:`1409`). This
    means you can use Scrapy Selectors without Scrapy and also upgrade the
    selectors engine without needing to upgrade Scrapy.
  - HTTPS downloader now does TLS protocol negotiation by default, instead
    of forcing TLS 1.0. You can also set the SSL/TLS method using the new
    :setting:`DOWNLOADER_CLIENT_TLS_METHOD`.

- These bug fixes may require your attention:

  - Don't retry bad requests (HTTP 400) by default (:issue:`1289`).
    If you need the old behavior, add ``400`` to
    :setting:`RETRY_HTTP_CODES` (see the sketch after this list).
  - Fix shell files argument handling (:issue:`1710`, :issue:`1550`).
    If you try ``scrapy shell index.html`` it will try to load the URL
    http://index.html; use ``scrapy shell ./index.html`` to load a local
    file.
  - Robots.txt compliance is now enabled by default for newly-created
    projects (:issue:`1724`). Scrapy will also wait for robots.txt to be
    downloaded before proceeding with the crawl (:issue:`1735`). If you
    want to disable this behavior, update :setting:`ROBOTSTXT_OBEY` in the
    ``settings.py`` file after creating a new project.
  - Exporters now work on unicode, instead of bytes by default
    (:issue:`1080`). If you use ``PythonItemExporter``, you may want to
    update your code to disable binary mode, which is now deprecated.
  - Accept XML node names containing dots as valid (:issue:`1533`).
  - When uploading files or images to S3 (with ``FilesPipeline`` or
    ``ImagesPipeline``), the default ACL policy is now "private" instead
    of "public". **Warning: backwards incompatible!** You can use
    :setting:`FILES_STORE_S3_ACL` to change it.
  - We've reimplemented ``canonicalize_url()`` for more correct output,
    especially for URLs with non-ASCII characters (:issue:`1947`). This
    could change link extractors' output compared to previous Scrapy
    versions. This may also invalidate some cache entries you could still
    have from pre-1.1 runs. **Warning: backwards incompatible!**
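As referenced in the first item above, a sketch of restoring retries for
HTTP 400; the other codes listed are, to the best of our knowledge, the
usual defaults of that era, so double-check them against your version::

    # settings.py: add 400 back to the retried HTTP status codes
    RETRY_HTTP_CODES = [500, 502, 503, 504, 408, 400]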
Keep reading for more details on other improvements and bug fixes.

.. _news_betapy3:

Beta Python 3 Support
~~~~~~~~~~~~~~~~~~~~~

We have been hard at work to make Scrapy run on Python 3. As a result, now
you can run spiders on Python 3.3, 3.4 and 3.5 (Twisted >= 15.5 required).
Some features are still missing (and some may never be ported).

Almost all builtin extensions/middlewares are expected to work. However,
we are aware of some limitations in Python 3:

- Scrapy does not work on Windows with Python 3
- Sending emails is not supported
- FTP download handler is not supported
- Telnet console is not supported

Additional New Features and Enhancements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Scrapy now has a `Code of Conduct`_ (:issue:`1681`).
- Command line tool now has completion for zsh (:issue:`934`).
- Improvements to ``scrapy shell``:

  - Support for bpython and configuring the preferred Python shell via
    ``SCRAPY_PYTHON_SHELL`` (:issue:`1100`, :issue:`1444`).
  - Support URLs without scheme (:issue:`1498`).
    **Warning: backwards incompatible!**
  - Bring back support for relative file paths
    (:issue:`1710`, :issue:`1550`).

- Added :setting:`MEMUSAGE_CHECK_INTERVAL_SECONDS` setting to change
  default check interval (:issue:`1282`).
- Download handlers are now lazy-loaded on first request using their
  scheme (:issue:`1390`, :issue:`1421`).
- HTTPS download handlers do not force TLS 1.0 anymore; instead, OpenSSL's
  ``SSLv23_method()/TLS_method()`` is used, allowing it to try negotiating
  the highest TLS protocol version the remote host supports
  (:issue:`1794`, :issue:`1629`).
- ``RedirectMiddleware`` now skips the status codes from
  ``handle_httpstatus_list``, whether set as a spider attribute or in
  ``Request``'s ``meta`` key (:issue:`1334`, :issue:`1364`, :issue:`1447`).
- Form submission:

  - now works with ``