Scraping javascript using python

Question

I am trying to scrape running routes, to geoprocess in R, from the following site: http://runkeeper.com/user/127244964/route/1149604

I am trying to do that with this code:

    from bs4 import BeautifulSoup
    import urllib2
    import csv
    import os
    import requests

    page1 = urllib2.urlopen("http://runkeeper.com/user/212579518/route/513771")
    soup = BeautifulSoup(page1)
    print(soup)

When I print the results, I see that the data I need is inside a text/javascript script block:

var routePoints = [{"latitude":38.918704,"longitude":-77.036478,"deltaDistance":0,"type":"StartPoint","altitude":40,"deltaPause":0}

I need to scrape the variables inside the dictionary. Any suggestions on how to do this?

Thanks.

Brad Culberson · Accepted Answer · 2014-02-25 03:32:51Z


This searches the text of the soup with a regex and loads the matched JSON into a Python object you can work with.

    import re
    import json

    point_re = re.compile('.*routePoints =(.*);')
    point_json = point_re.search(str(soup)).group(1)
    point_data = json.loads(point_json)


Thanks, this seems to get all the points that I need. If I wanted to save this to a CSV file, what would you suggest? Also, if you know of a good BeautifulSoup tutorial or book, I would appreciate it.
– asado23
Commented Feb 25, 2014 at 3:42
You could use docs.python.org/2/library/csv.html, but it is just as easy to open a file and write the lines you want. As long as you are just dumping numerics, it will be pretty easy.
– Brad Culberson
Commented Feb 25, 2014 at 3:46
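
For the CSV question in the comments above, here is a minimal sketch using the standard csv module. It assumes point_data is a list of dictionaries shaped like the routePoints entry shown in the question. The sample record, the FIELDS column order, and the write_points helper are illustrative choices, not part of the original answer.

```python
import csv
import io

# One record copied from the routePoints sample in the question.
point_data = [
    {"latitude": 38.918704, "longitude": -77.036478, "deltaDistance": 0,
     "type": "StartPoint", "altitude": 40, "deltaPause": 0},
]

# Column order is an arbitrary choice made for this sketch.
FIELDS = ["latitude", "longitude", "altitude",
          "deltaDistance", "deltaPause", "type"]


def write_points(points, out_file):
    """Write one CSV row per route point to an open text file object."""
    # extrasaction="ignore" skips any keys that are not listed in FIELDS.
    writer = csv.DictWriter(out_file, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(points)


# Write to an in-memory buffer here so the sketch has no side effects.
buffer = io.StringIO()
write_points(point_data, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk on Python 3, pass a file opened with open("route.csv", "w", newline="") instead of the buffer (the file name is only an example). On Python 2, as used in the question, the csv module expects the file to be opened in "wb" mode.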


Amadan · Accepted Answer · 2014-02-25 02:54:44Z

Use a regexp to strip everything outside the square brackets (or, alternatively, to select only the content of the outermost brackets), then call json.loads on the bracketed text.
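
One way this suggestion might look in practice is sketched below. The page_text string is a made-up stand-in for str(soup): it reuses the single point from the question, and the closing `];` and the `var other` line are assumed, since the question shows only the start of the array. The pattern also assumes that the array contains no nested arrays and that `];` does not occur inside any string value.

```python
import json
import re

# Hypothetical stand-in for str(soup); only the first point comes from
# the question, and the surrounding script lines are assumed.
page_text = '''
<script type="text/javascript">
var routePoints = [{"latitude":38.918704,"longitude":-77.036478,"deltaDistance":0,"type":"StartPoint","altitude":40,"deltaPause":0}];
var other = 1;
</script>
'''

# Capture from the "[" that follows "routePoints =" up to the first "]"
# that is followed by ";". re.DOTALL lets the array span several lines.
match = re.search(r'routePoints\s*=\s*(\[.*?\])\s*;', page_text, re.DOTALL)

# Fall back to an empty list when the variable is not present in the page.
point_data = json.loads(match.group(1)) if match else []

for point in point_data:
    print(point["latitude"], point["longitude"])
```

Capturing the brackets themselves means the captured text is already a complete JSON array, so json.loads needs no further trimming.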
