I am trying to scrape running routes, to geoprocess in R, from the following site: http://runkeeper.com/user/127244964/route/1149604

I am trying to do that with this code:

    from bs4 import BeautifulSoup
    import urllib2
    import csv
    import os
    import requests

    page1 = urllib2.urlopen("http://runkeeper.com/user/212579518/route/513771")
    soup = BeautifulSoup(page1)
    print(soup)

When I print the results, I see that the data I need is inside a text/javascript script block:


var routePoints = [{"latitude":38.918704,"longitude":-77.036478,"deltaDistance":0,"type":"StartPoint","altitude":40,"deltaPause":0}

I need to scrape the variables inside the dictionary. Any suggestions on how to do this?

Thanks.

    2 Answers

    1

    This searches the text of the soup with a regular expression for the routePoints assignment and loads the matched JSON into a Python object that you can work with.

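    import re
    import json

    # Capture everything between "routePoints =" and the last ";" on that line.
    point_re = re.compile(r'.*routePoints =(.*);')
    point_json = point_re.search(str(soup)).group(1)
    # Parse the captured JavaScript array literal as JSON.
    point_data = json.loads(point_json)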
    • Thanks, this seems to get all the points that I need. If I wanted to save this to a CSV file, what would you suggest? Also, if you can recommend a good BeautifulSoup tutorial or book, I would appreciate it.
      – asado23
      Commented Feb 25, 2014 at 3:42
    • You could use the csv module (docs.python.org/2/library/csv.html), but it is just as easy to open a file and write the lines yourself. As long as you are only dumping numeric values, it will be straightforward.
      Commented Feb 25, 2014 at 3:46
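For the CSV question in the comments above, here is a minimal sketch using `csv.DictWriter`. It assumes `point_data` is a list of dictionaries with the keys shown in the question. The `write_points` helper, the `FIELDS` column order, and the one-element sample list are illustrative assumptions, not part of the original answer.

```python
import csv
import io

# Stand-in for the parsed point_data; this single point is the one quoted in the question.
point_data = [
    {"latitude": 38.918704, "longitude": -77.036478, "deltaDistance": 0,
     "type": "StartPoint", "altitude": 40, "deltaPause": 0},
]

# Column order for the output file (an arbitrary choice for this sketch).
FIELDS = ["latitude", "longitude", "altitude", "deltaDistance", "deltaPause", "type"]


def write_points(points, handle):
    """Write one CSV row per route point to an open text handle."""
    # extrasaction="ignore" skips any keys that are not listed in FIELDS.
    writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(points)


# Write to an in-memory buffer here so the sketch has no side effects.
buffer = io.StringIO()
write_points(point_data, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, the same helper can be given a real file handle, for example one opened with `open("route_points.csv", "w", newline="")` in Python 3 (the filename is hypothetical). The resulting file should be readable in R with `read.csv`.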
    0

    Use a regular expression to strip everything outside the square brackets (or, alternatively, to select only the content of the outermost brackets), then call json.loads on the bracketed text.
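One way to sketch the "select only the outermost brackets" idea: a regular expression locates the start of the array, and `json.JSONDecoder.raw_decode` finds where it ends, which is a swap for using a regex on both sides. The `extract_route_points` helper and `SAMPLE_HTML` are hypothetical. The first point is taken from the question, and the second is made-up sample data so that the array closes.

```python
import json
import re

# Hypothetical stand-in for the page source; the real page is not fetched here.
SAMPLE_HTML = """
<script type="text/javascript">
var routePoints = [{"latitude":38.918704,"longitude":-77.036478,"deltaDistance":0,"type":"StartPoint","altitude":40,"deltaPause":0},
{"latitude":38.9189,"longitude":-77.0366,"deltaDistance":25.1,"type":"EndPoint","altitude":41,"deltaPause":0}];
var other = [1, 2, 3];
</script>
"""


def extract_route_points(html):
    """Return the list assigned to `routePoints` in the page source."""
    # Find the position of the opening "[" right after "routePoints =".
    match = re.search(r"routePoints\s*=\s*(?=\[)", html)
    if match is None:
        raise ValueError("routePoints assignment not found")
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it, so trailing JavaScript is not a problem.
    points, _end = json.JSONDecoder().raw_decode(html, match.end())
    return points


points = extract_route_points(SAMPLE_HTML)
print(len(points), points[0]["latitude"], points[0]["longitude"])
```

Letting the JSON decoder find the closing bracket avoids guessing where the array ends when the page contains other brackets or semicolons later on. This only works if the JavaScript literal is also valid JSON, as it appears to be in the snippet from the question.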
