
I'm trying to create and write to a file. I have the following code:

```python
from urllib2 import urlopen

def crawler(seed_url):
    to_crawl = [seed_url]
    crawled = []
    while to_crawl:
        page = to_crawl.pop()
        page_source = urlopen(page)
        s = page_source.read()
        with open(str(page) + ".txt", "a+") as f:
            f.write(s)
            f.close()
    return crawled

if __name__ == "__main__":
    crawler('http://www.yelp.com/')
```

However, it raises this error:

```
Traceback (most recent call last):
  File "/Users/adamg/PycharmProjects/NLP-HW1/scrape-test.py", line 29, in <module>
    crawler('http://www.yelp.com/')
  File "/Users/adamg/PycharmProjects/NLP-HW1/scrape-test.py", line 14, in crawler
    with open("./"+str(page)+".txt","a+") as f:
IOError: [Errno 2] No such file or directory: 'http://www.yelp.com/.txt'
```

I thought open(file, "a+") was supposed to create the file and write to it. What am I doing wrong?
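The "a+" mode does create a missing file, but it never creates missing directories. A minimal Python 3 sketch (not the asker's code; the names page.txt and missing_dir are made up) shows both cases inside a temporary directory:

```python
import errno
import os
import tempfile

# Work inside a throwaway directory so the demo leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    # Case 1: the parent directory exists, so "a+" creates the file.
    ok_path = os.path.join(tmp, "page.txt")
    with open(ok_path, "a+") as f:
        f.write("hello")
    created = os.path.exists(ok_path)

    # Case 2: the hypothetical parent directory "missing_dir" does not exist.
    # open() can create a file, but never the directories leading to it.
    bad_path = os.path.join(tmp, "missing_dir", "page.txt")
    error = None
    try:
        with open(bad_path, "a+") as f:
            f.write("hello")
    except FileNotFoundError as exc:  # Python 2 reports this as IOError
        error = exc

print(created)                      # True
print(type(error).__name__)         # FileNotFoundError
print(error.errno == errno.ENOENT)  # True (errno 2, as in the traceback)
```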

  • "No such file or directory: 'yelp.com/.txt'" You mean this directory doesn't exist?
    – Mathemats
    Commented Mar 5, 2015 at 1:04
  • Ugh, is the slash making a directory?
    – Adam_G
    Commented Mar 5, 2015 at 1:05
  • Yep, it's because of the forward slash.
    Commented Mar 5, 2015 at 1:08
  • I knew it was something dumb. Thank you.
    – Adam_G
    Commented Mar 5, 2015 at 1:08
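To see why the slashes matter, a small Python 3 sketch can parse the file name the question's code builds as a POSIX path (PurePosixPath is used so the result does not depend on the OS running it). Every "/" acts as a separator, and repeated slashes collapse into one:

```python
from pathlib import PurePosixPath

# The file name the question's code builds from the URL.
target = PurePosixPath("http://www.yelp.com/" + ".txt")

# This names a file ".txt" inside a directory "www.yelp.com"
# inside a directory "http:", none of which exist.
print(target.parts)   # ('http:', 'www.yelp.com', '.txt')
print(target.parent)  # http:/www.yelp.com
print(target.name)    # .txt
```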

1 Answer


If you want to use the URL as the basis for the file name, you should encode the URL first. That way, slashes (among other characters) are converted to percent-escape sequences that won't interfere with the file system or shell.

The quote_plus function in Python 2's urllib module can do this (in Python 3 it lives in urllib.parse).

So, for example:

```
>>> import urllib
>>> urllib.quote_plus('http://www.yelp.com/')
'http%3A%2F%2Fwww.yelp.com%2F'
```
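Applying this to the question's write step might look like the following minimal sketch. It assumes Python 3, where quote_plus lives in urllib.parse and urlopen(...).read() returns bytes (hence the "ab" mode). The helper names url_to_filename and save_page are hypothetical, and the page content is a placeholder so the example runs without network access:

```python
import os
import tempfile
from urllib.parse import quote_plus, unquote_plus


def url_to_filename(url):
    """Build a file name from a URL (hypothetical helper)."""
    # quote_plus() defaults to safe="", so "/" becomes "%2F" and ":" becomes "%3A".
    return quote_plus(url) + ".txt"


def save_page(url, content, out_dir):
    """Append page bytes to a file named after the encoded URL (hypothetical helper)."""
    path = os.path.join(out_dir, url_to_filename(url))
    # "ab" creates the file if needed and appends bytes; out_dir must already exist.
    with open(path, "ab") as f:
        f.write(content)
    return path


if __name__ == "__main__":
    url = "http://www.yelp.com/"
    name = url_to_filename(url)
    print(name)  # http%3A%2F%2Fwww.yelp.com%2F.txt
    # The encoding is reversible, so the original URL can be recovered.
    print(unquote_plus(name[:-len(".txt")]))  # http://www.yelp.com/
    with tempfile.TemporaryDirectory() as tmp:
        # Placeholder bytes; a real crawler would use urllib.request.urlopen(url).read().
        path = save_page(url, b"<html></html>", tmp)
        print(os.path.basename(path))  # http%3A%2F%2Fwww.yelp.com%2F.txt
```

Encoding removes the slashes, but very long URLs can still exceed the file system's file-name length limit (commonly 255 bytes), so a larger crawl may need to hash or shorten the names.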
