NameError: name 'download' is not defined
When running the sitemap crawler from section 1.4.2 of 《用python写网络爬虫》 (Web Scraping with Python), Python reports "NameError: name 'download' is not defined".
Steps
- 01
This is the code from the book:

    import re
    #from crawling import download

    def crawl_sitemap(url):
        #download the sitemap file
        sitemap = download(url)
        #extract the sitemap links
        links = re.findall('<loc>(.*?)</loc>', str(sitemap))
        #download each link
        for link in links:
            html = download(link)
            #scrape html here
            #...
- 02
This example depends on the one in section 1.4.1. The name download is undefined because it comes from the 1.4.1 code, which I saved as crawling.py. The code is as follows:

    import urllib.request
    import urllib.error

    def download(url, user_agent='wswp', num_retries=2):
        print('Downloading:', url)
        headers = {'User-agent': user_agent}
        request = urllib.request.Request(url, headers=headers)
        try:
            #open the Request object so the User-agent header is actually sent
            html = urllib.request.urlopen(request).read()
        except urllib.error.URLError as e:
            print('Downloading error:', e.reason)
            html = None
            if num_retries > 0:
                if hasattr(e, 'code') and 500 <= e.code < 600:
                    #recursively retry 5xx HTTP errors, keeping the same user agent
                    return download(url, user_agent, num_retries - 1)
        return html
- 03
So the code should be written as follows, with crawling.py saved in the same directory as this script so that the import can find it:

    import re
    from crawling import download

    def crawl_sitemap(url):
        #download the sitemap file
        sitemap = download(url)
        #extract the sitemap links
        links = re.findall('<loc>(.*?)</loc>', str(sitemap))
        #download each link
        for link in links:
            html = download(link)
            #scrape html here
            #...

    crawl_sitemap('http://example.webscraping.com/sitemap.xml')
- 04
The output is as follows:
- 05
The script now runs successfully. If this helped you, please give it a vote.
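
If you want to check the link-extraction step on its own, without touching the network, something like the following minimal sketch may help. It applies the same `<loc>` regular expression as crawl_sitemap to a small in-memory sitemap. The helper name extract_links and the sample URLs are hypothetical and not from the book. The sketch also assumes the sitemap is UTF-8 encoded and decodes the bytes instead of calling str() on them.

```python
import re

# Hypothetical sample data standing in for the bytes returned by download().
SAMPLE_SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>http://example.webscraping.com/view/1</loc></url>
<url><loc>http://example.webscraping.com/view/2</loc></url>
</urlset>"""


def extract_links(sitemap):
    """Return the URLs found inside <loc> tags.

    Accepts bytes, str, or None (download() returns None on failure).
    """
    if sitemap is None:
        return []
    if isinstance(sitemap, bytes):
        # Assumption: the sitemap is UTF-8 encoded.
        sitemap = sitemap.decode("utf-8")
    return re.findall(r"<loc>(.*?)</loc>", sitemap)


links = extract_links(SAMPLE_SITEMAP)
for link in links:
    print(link)
```

Handling None explicitly means a failed download yields an empty list of links rather than relying on a search through the string 'None'.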