I've written a script in Scrapy to grab names and links from several pages of a website and write the parsed items to a csv file. When I run the script, I get the expected results and end up with a csv file full of data. However, I'm using Python 3.5, and when I use Scrapy's built-in command to export the data to a csv file, I get a csv file with a blank line in every alternate row. Eventually I tried the approach below to achieve flawless output, with no blank lines in between, and it now produces a csv file without the blank-line issue. I hope I did it the right way; however, if there is anything I can or should do to make it more robust, I'm happy to hear it.
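For reference, here is the fix in isolation. On Python 3, the csv module writes an extra \r on platforms that use \r\n line endings unless the file is opened with newline="", and that extra \r is what shows up as a blank row between records. A minimal standalone demonstration (the file name and the sample row are just placeholders, not part of my spider):

import csv

# Opening with newline="" is what prevents the blank line after every record;
# without it, each \r\n row terminator gets translated to \r\r\n on write.
with open("demo.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Link"])
    writer.writerow(["Some Exhibitor", "http://www.example.com"])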
This is my script, which gives me flawless output in a csv file:
import csv

import scrapy
from scrapy.crawler import CrawlerProcess


class GetInfoSpider(scrapy.Spider):
    name = "infrarail"
    start_urls = ['http://www.infrarail.com/2018/exhibitor-profile/?e={}'.format(page) for page in range(65, 70)]

    def __init__(self):
        # newline="" keeps the csv module from producing a blank line after every row
        self.infile = open("output.csv", "w", newline="")

    def parse(self, response):
        for q in response.css("article.contentslim"):
            name = q.css("h1::text").extract_first()
            link = q.css("p a::attr(href)").extract_first()
            yield {'Name': name, 'Link': link}

            # write the same item to the csv file by hand
            writer = csv.writer(self.infile)
            writer.writerow([name, link])


c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
})

c.crawl(GetInfoSpider)
c.start()
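One detail I'm unsure about is that the file opened in __init__ is never explicitly closed. If that is worth fixing, this is a sketch of what I had in mind, relying on Scrapy calling a spider's closed() method when the crawl finishes:

# inside GetInfoSpider:
def closed(self, reason):
    # Scrapy calls closed() when the spider shuts down, so closing the
    # file handle here makes sure everything is flushed to disk
    self.infile.close()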
Btw, I used CrawlerProcess() so that I can run my spider from the Sublime Text editor.