
Two ways to run Scrapy spiders in batch:

1. Using CrawlerProcess

https://doc.scrapy.org/en/latest/topics/practices.html
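A minimal sketch of the first approach, along the lines of the practices page linked above. The function names crawl_all and run_project_spiders are hypothetical, and the sketch assumes it is called from inside a Scrapy project so that the project settings and spiders can be found.

```python
def crawl_all(process, **spider_kwargs):
    """Schedule every spider known to `process`, then start crawling once."""
    names = list(process.spider_loader.list())
    for name in names:
        # crawl() only schedules the spider; nothing runs until start()
        process.crawl(name, **spider_kwargs)
    process.start()  # blocks until all scheduled crawls have finished
    return names


def run_project_spiders():
    """Build a CrawlerProcess from the project settings and run all spiders."""
    # Imported here so the helper above stays usable without Scrapy installed
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    return crawl_all(CrawlerProcess(get_project_settings()))
```

Calling run_project_spiders() from a script in the project root should run all spiders in the same process. Note that start() can only be called once per process, so all spiders have to be scheduled before it.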

2. Modifying the crawl command's source code and registering a custom command

(1) Opening the scrapy/commands/crawl.py file, we can see:

    def run(self, args, opts):
        if len(args) < 1:
            raise UsageError()
        elif len(args) > 1:
            raise UsageError("running 'scrapy crawl' with more than one spider is no longer supported")
        spname = args[0]
        self.crawler_process.crawl(spname, **opts.spargs)
        self.crawler_process.start()

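This is the run() method in crawl.py, which determines which spider is run. It accepts exactly one spider name, so running all spiders requires changing this method.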

Inside run(), the call crawler_process.crawl(spname, **opts.spargs) schedules the spider, where spname is the spider's name. To run several spiders, we first need the names of all spiders in the project, which crawler_process.spider_loader.list() returns.

(2) Implementation:

a. Create a folder named mycmd, at the same level as the spiders directory, to hold the command's source code, and create a file named mycrawl.py inside it;

b. Copy the code from crawl.py into mycrawl.py, then modify it:

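    # Modified run() method
    def run(self, args, opts):
        # Get the names of all spiders in the project
        spd_loader_list = self.crawler_process.spider_loader.list()
        # Schedule each spider in turn
        for spname in spd_loader_list or args:
            self.crawler_process.crawl(spname, **opts.spargs)
            print("Starting spider: " + spname)
        # Start crawling once every spider has been scheduled
        self.crawler_process.start()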
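One detail of the loop above is worth spelling out: the expression spd_loader_list or args evaluates to the full spider list whenever the project contains at least one spider, so any names passed on the command line are ignored. If you would rather run only the named spiders when names are given, a small helper can make that choice explicit. This is a sketch under that assumption, not part of Scrapy; select_spiders and the spider names are hypothetical.

```python
def select_spiders(available, requested):
    """Return the spider names to run.

    With no requested names, every available spider is selected; otherwise
    only the requested ones, after checking that each of them exists.
    """
    if not requested:
        return list(available)
    unknown = [name for name in requested if name not in available]
    if unknown:
        raise ValueError("unknown spider(s): " + ", ".join(unknown))
    return list(requested)


available = ["news", "blog"]  # hypothetical spider names
requested = ["blog"]

print(available or requested)                # ['news', 'blog'] (requested is ignored)
print(select_spiders(available, requested))  # ['blog']
```

Inside run(), the loop header could then read: for spname in select_spiders(self.crawler_process.spider_loader.list(), args).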