2

I'm looking for a way to post content from a web scraper external to our Wordpress site to post content its scraped into our site. Right now, I have the scraped content formatted into JSON. Could I somehow use the JSON API to post that json formatted data into a WP site, or would I need to use the XML-RPC approach?

Edit: I'm actually trying to help a scattershot writing community by bringing together various poems and stories published over twenty different websites run by the writers individually. They would like a site showcasing their work as a whole. No spamming is taking place here....

The scraper, which is written in Python (using a framework called Scrapy), can output what it scrapes in JSON. Let's say, I have a title and description and those are outputted as title: Story Name and description: This story is about this. I want to then post those two bits of data into Wordpress, on a different server, as a post type of Story. I'm asking, since my research shows that the XML-RPC methodology may be old, if the newer JSON API supports this and where I could find a good example. I've been looking for examples but so far haven't found any.

Edit: So it seems that coupled with the downvotes and the fact that I failed to identify myself as a non-spammer, this thread might be going the way of many threads in the open source community...however, for the sake of anyone trying to go down the same path as me, who are simply looking for working examples of how to use the JSON API to post content, I did find this discussion. So far I've been able to create a nonce, which is hopeful. now I think I need to write a controller for my next step so I can actually post content of type post:

http://wordpress.org/support/topic/plugin-json-api-how-to-add-a-comment-or-post

This is the url I tried after generating a nonce for myself... I got an error:

http://mysite.com/?json=post.create_posts&nonce='5d3f89d00e'&title='testingpost'&content='this%20is%20mypost%20stuff'&status=publish

Which currently returns: {"status":"error","error":"Unknown controller 'post'."}

5
  • 1
    You realize that "scraping" content is often, though not always, in the same class of behavior as "spamming", right? Unless you can explain your project in such a way as to convince me that what you are doing isn't unseemly I, personally, will be unwilling to help with this question. I can't speak for anyone else, or on behalf of the site and its polices.
    – s_ha_dum
    CommentedJul 1, 2013 at 17:39
  • I'm actually trying to help a scattershot writing community by bringing together various poems and stories published over twenty different websites run by the writers individually. They would like a site showcasing their work as a whole. No spamming is taking place here....
    – yoyodyne
    CommentedJul 1, 2013 at 18:12
  • Good enough. I don't consider the question to be detailed enough to provide a good answer though. Does your WordPress site request the data from the third party? Do you save it an upload over FTP? How does this work? What does your JSON look like?
    – s_ha_dum
    CommentedJul 1, 2013 at 18:18
  • The scraper, which is written in Python (using a framework called Scrapy), can output what it scrapes in JSON. Let's say, I have a title and description and those are outputted as title: Story Name and description: This story is about this. I want to then post those two bits of data into Wordpress, on a different server, as a post type of Story. I'm asking, since my research shows that the XML-RPC methodology may be old, if the newer JSON API supports this and where I could find a good example. I've been looking for examples but so far haven't found any.
    – yoyodyne
    CommentedJul 1, 2013 at 18:54
  • Edit that ^^ into the question, as well as information addressing the rest of my questions. And the explanation about why you are not an unpleasant content scraper.
    – s_ha_dum
    CommentedJul 1, 2013 at 18:56

2 Answers 2

1

You can either use the JSON API plugin, or wait for the JSON REST API project that is now running through the Google Summer of Code project. This is intended to eventually go into WordPress core, but will work as a standalone plugin for the time being.

2
  • I looked at the JSON REST API and it seems to have a lot of good things. Unfortunately, it's clearly not ready for primetime. I tried installing it only to be met with a fatal error.
    – yoyodyne
    CommentedJul 1, 2013 at 19:21
  • I ended up using the JSON API to use the create_post part of the API. My blockage was that I didn't see the JSON API settings page after I activated the plugin that in turn allows me to enable the create_post bit.
    – yoyodyne
    CommentedJul 9, 2013 at 22:09
2

Since you're already using Python, I recommend grabbing a Python XML-RPC library and building a request into your scraper. We actually added support for creating new custom post type entries in WordPress 3.4.

If it helps, the documentation for Python's XML-RPC library is freely available. As is the documentation for the WordPress wp.newPost rpc call.

Just specify a post_type when you build your post's content struct and it will be created as you expect it to within WordPress.

1
  • Oh THANK YOU!!! This helps I think. I'll try it tomorrow and update this thread accordingly so it don't just leave off for future coders.
    – yoyodyne
    CommentedJul 2, 2013 at 6:51

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.