Installing Scrapy

Today I tried to install Scrapy, the Python web scraping framework, on Ubuntu and ran into a few problems.

Failed attempts

First I tried to install Scrapy with pip:

pip install Scrapy

This failed with the following error:

Complete output from command python setup.py egg_info:
Couldn't find index page for 'incremental' (maybe misspelled?)
No local packages or download links found for incremental>=16.10.1
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-build-wIgIGQ/Twisted/setup.py", line 21, in <module>
    setuptools.setup(**_setup["getSetupArgs"]())
  File "/usr/lib/python2.7/distutils/core.py", line 111, in setup
    _setup_distribution = dist = klass(attrs)
  File "build/bdist.linux-x86_64/egg/setuptools/dist.py", line 260, in __init__
  File "build/bdist.linux-x86_64/egg/setuptools/dist.py", line 284, in fetch_build_eggs
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 563, in resolve
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 799, in best_match
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 811, in obtain
  File "build/bdist.linux-x86_64/egg/setuptools/dist.py", line 327, in fetch_build_egg
  File "build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line 434, in easy_install
  File "build/bdist.linux-x86_64/egg/setuptools/package_index.py", line 475, in fetch_distribution
AttributeError: 'NoneType' object has no attribute 'clone'

----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-wIgIGQ/Twisted/

I spent a long time searching Google and Baidu without finding a proper solution.

Following a blog post that described a very similar problem, I ran sudo -H pip install --upgrade setuptools,

which failed with the following error:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/pip/basecommand.py", line 209, in main
    status = self.run(options, args)
  File "/usr/lib/python2.7/dist-packages/pip/commands/install.py", line 328, in run
    wb.build(autobuilding=True)
  File "/usr/lib/python2.7/dist-packages/pip/wheel.py", line 748, in build
    self.requirement_set.prepare_files(self.finder)
  File "/usr/lib/python2.7/dist-packages/pip/req/req_set.py", line 360, in prepare_files
    ignore_dependencies=self.ignore_dependencies))
  File "/usr/lib/python2.7/dist-packages/pip/req/req_set.py", line 448, in _prepare_file
    req_to_install, finder)
  File "/usr/lib/python2.7/dist-packages/pip/req/req_set.py", line 397, in _check_skip_installed
    finder.find_requirement(req_to_install, self.upgrade)
  File "/usr/lib/python2.7/dist-packages/pip/index.py", line 442, in find_requirement
    all_candidates = self.find_all_candidates(req.name)
  File "/usr/lib/python2.7/dist-packages/pip/index.py", line 400, in find_all_candidates
    for page in self._get_pages(url_locations, project_name):
  File "/usr/lib/python2.7/dist-packages/pip/index.py", line 545, in _get_pages
    page = self._get_page(location)
  File "/usr/lib/python2.7/dist-packages/pip/index.py", line 648, in _get_page
    return HTMLPage.get_page(link, session=self.session)
  File "/usr/lib/python2.7/dist-packages/pip/index.py", line 757, in get_page
    "Cache-Control": "max-age=600",
  File "/usr/share/python-wheels/requests-2.9.1-py2.py3-none-any.whl/requests/sessions.py", line 480, in get
    return self.request('GET', url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pip/download.py", line 378, in request
    return super(PipSession, self).request(method, url, *args, **kwargs)
  File "/usr/share/python-wheels/requests-2.9.1-py2.py3-none-any.whl/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/share/python-wheels/requests-2.9.1-py2.py3-none-any.whl/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/share/python-wheels/CacheControl-0.11.5-py2.py3-none-any.whl/cachecontrol/adapter.py", line 46, in send
    resp = super(CacheControlAdapter, self).send(request, **kw)
  File "/usr/share/python-wheels/requests-2.9.1-py2.py3-none-any.whl/requests/adapters.py", line 376, in send
    timeout=timeout
  File "/usr/share/python-wheels/urllib3-1.13.1-py2.py3-none-any.whl/urllib3/connectionpool.py", line 610, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/share/python-wheels/urllib3-1.13.1-py2.py3-none-any.whl/urllib3/util/retry.py", line 228, in increment
    total -= 1
TypeError: unsupported operand type(s) for -=: 'Retry' and 'int'

At that point I gave up on installing with pip.

What worked

The official documentation describes another installation method: conda.

To use conda you need either Anaconda or Miniconda.
Miniconda is a slimmed-down version of Anaconda, so for convenience I installed Miniconda (installation guide: https://conda.io/docs/user-guide/install/linux.html).
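
For reference, a typical Miniconda install on Linux looks roughly like the sketch below; the installer URL and filename are assumptions that depend on the Python version and the current release, so check the download page before running them:

# download the Miniconda installer (URL/filename may differ for your release)
wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
# run the installer and follow the prompts (license, install location, PATH setup)
bash Miniconda2-latest-Linux-x86_64.sh
# open a new shell (or source ~/.bashrc) so conda is on the PATH, then verify
conda --version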

Once conda is set up, run conda install -c conda-forge scrapy to install Scrapy.
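
If you prefer to keep Scrapy out of the base environment, the same install also works inside a dedicated conda environment; the environment name scrapy_env below is just an arbitrary example. conda-forge ships Scrapy and its dependencies (including Twisted) as prebuilt packages, which is what avoids the build errors pip ran into above:

# optional: install Scrapy into its own environment (name is arbitrary)
conda create -n scrapy_env python=2.7
source activate scrapy_env    # on newer conda versions: conda activate scrapy_env
conda install -c conda-forge scrapy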

The installation succeeded:

$ scrapy
Scrapy 1.5.0 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command

References

  1. Installation guide
  2. Anaconda
  3. Miniconda
  4. Conda: Installing on Linux
  5. Installing Scrapy on Ubuntu 16.04: a bug that took me 5 hours to finally solve
  6. Managing Python environments with conda