Wordpress RSS Parser

Question

This is an RSS Parser intended only to parse the feed Wordpress provides (and even then, only what I actually want to display on my website). System.Linq.Xml and System.ServiceModel.Syndication are unavailable because I'm targeting .Net Core instead of the full framework.

I think I've achieved a more idiomatic implementation this time. Where can I improve? Are there opportunities to compose higher order functions that I'm missing?

Rss.fs

module Rss open System type Item = { Title: string Link: Uri Comments: Uri PublishDate: DateTimeOffset Description: string } type Channel = { Title: string Link: Uri Description: string LastBuildDate: DateTimeOffset Items: seq<Item> } type RssFeed = { Version: double Channel: Channel } open System.Xml let parse input = // Helper Functions let innerText (elementName: string) (node: XmlNode) = let element = node.Item elementName element.InnerText let titleText node = node |> innerText "title" let descriptionText node = node |> innerText "description" let elementAsDateTimeOffset elementName node = node |> innerText elementName |> DateTimeOffset.Parse let elementAsUri elementName node = node |> innerText elementName |> Uri let linkUri node = node |> elementAsUri "link" let itemNodeToItem node = { Title = node |> titleText Link = node |> linkUri Comments = node |> elementAsUri "comments" PublishDate = node |> elementAsDateTimeOffset "pubDate" Description = node |> descriptionText } // Load the document and begin parsing let document = XmlDocument() document.LoadXml input let rss = document.DocumentElement let channelNode = rss.FirstChild let items = channelNode.ChildNodes |> Seq.cast<XmlNode> |> Seq.filter (fun node -> node.Name.Equals "item") |> Seq.map itemNodeToItem let channel = { Title = channelNode |> titleText Link = channelNode |> linkUri Description = channelNode |> descriptionText LastBuildDate = channelNode |> elementAsDateTimeOffset "lastBuildDate" Items = items } let version = (rss.Attributes.ItemOf "version").Value |> Double.Parse let rssFeed = {Version = version; Channel = channel} rssFeed

Controllers/BlogController.fs

namespace theupsyde.Controllers open System open System.Collections.Generic open System.Linq open System.Threading.Tasks open Microsoft.AspNetCore.Mvc open Rss open System.Net type BlogController () = inherit Controller() let fetchUrl url = async { let req = WebRequest.Create(Uri(url)) use! resp = req.GetResponseAsync() |> Async.AwaitTask use stream = resp.GetResponseStream() use reader = new IO.StreamReader(stream) return reader.ReadToEnd() } let rssFeed = fetchUrl "https://christopherjmcclellan.wordpress.com/feed/" |> Async.RunSynchronously |> Rss.parse let recent = rssFeed.Channel.Items member this.Index () = this.View(rssFeed.Channel) member this.Recent() = this.PartialView("BlogPosts", recent) member this.Latest () = let latest = recent |> Seq.head this.PartialView("BlogPost", latest)

TheQuickBrownFox · Accepted Answer · 2017-08-14 14:47:02Z

The Rss helper functions are inside the parse function, even though they don't use any of the state inside that scope. Consider moving these up to the module level, even though they are not used anywhere else. By doing this the main function is shorter and easier to follow. You could also make those functions private when you pull them out.
Seq.cast can throw an exception. Even though you know that every item will be an XmlNode in this case, it can be safer to filter out objects that are not of the required type while casting. Unfortunately, F# doesn't have a function for this in the Seq module but you can always use the one already in .NET: channelNode.ChildNodes |> System.Linq.Enumerable.OfType<XmlNode>
You fetch the RSS feed in the constructor of BlogController. This will happen once for each request. Using Async.RunSynchronously will block a thread, so you lose the benefit of using Async in the first place. You should avoid Async.RunSynchronously whenever you can. You can pull out the RSS fetching and parsing into another function that returns an async. Call this function from each controller method, do more async processing if necessary and then use Async.StartAsTask to return a task from the method. ASP.NET will then be able to run your code without unnecessary blocking.

EDIT:

I'm not too familiar with ASP.NET MVC, but based on this issue I think something like this should work:

let fetchRss() = async { let! content = fetchUrl "https://christopherjmcclellan.wordpress.com/feed/" return Rss.parse content } type BlogController() = inherit Controller() member this.Index() = async { let! rss = fetchRss() return this.View(rss.Channel) }

Thanks for the review. I was struggling with the async and I had a feeling I was blocking... Could I trouble you to show me what you mean? A small example of how to do it right would really help me out. — RubberDuck, CommentedAug 14, 2017 at 11:01
Thanks. That was close. It needs to be something like member this.Index () = Async.StartAsTask(async {...}). It's a bit ugly, but properly converts the Async<'a> into a Task<T> and runs everything asynchronously. — RubberDuck, CommentedAug 16, 2017 at 2:02

Der Kommissar · Accepted Answer · 2017-08-15 21:17:01Z

So TheQuickBrownFox is spot on when they mention the potential exception with Seq.Cast<'T>, and I want to talk about that and introduce a couple new operators to you.

The potential problem lies in the fact that it could be possible for an item to not be castable to XmlNode, and then we throw exceptions which is usually frowned upon in F#, we would instead want to return an error state.

One of the many operators in F# is the :? operator, which tests if a value is castable to the specified type. (So x :? XmlNode will test if x is an XmlNode.) We can thing of this as analogous to the is operator in C#.

So I want to write a castSafe method which will cast our object list to a type, without throwing an exception. We intend to use this as channelNode.ChildNodes |> Seq.castSafe<XmlNode>.

The first step is to define the type-structure of our function, which I'll define as IEnumerable -> 'T option seq. We also want to define what happens if an invalid cast is attempted? I'm going to return a 'T option, that defines the result. If it's Some 'T, then we casted successfully, if it's None, then we didn't.

A basic, lazy implementation might look as follows:

let castSafe<'T> (items:IEnumerable) : 'T option seq = items |> Seq.cast<obj> |> Seq.map (fun node -> match node with | :? 'T as result -> Some result | _ -> None)

So we're immediately casting to obj to allow the type-system to make it's next move. Since XmlNodeList doesn't implement IEnumerable<T> we can't enumerate each item as an XmlNode: we have to enumerate as an obj. Then we match on the type: :? 'T says "Is node a 'T type?" If it's a yes, then we return Some result which is the node casted as 'T, if not we return a None.

Of course this means on the client-side we would have to test for Some node or None, which we don't want to do. So we'll also build a castSafeFilter which will cast to 'T and remove the None entires. A naive implementation could look as follows:

let castSafeFilter<'T> (items:IEnumerable) : 'T seq = items |> castSafe |> Seq.filter (fun node -> match node with | Some x -> true | None -> false) |> Seq.map (fun node -> node.Value)

But we don't want to do that: we're casting then testing for Some/None, but there's a more effective manner: we can filter first. Enter the :?> operator. This will do a direct cast to a type lower in the type hierarchy, that is: this will do a runtime cast to a sub-type. (So if we cast XmlNode -> obj, this let's us go back to XmlNode.) I wish this were synonymous with the as operator in C#, but it's more like an explicit cast: (T)obj.

Our new function looks something like this:

let castSafeFilter<'T> (items:IEnumerable) : 'T seq = items |> Seq.cast<obj> |> Seq.filter (fun node -> node :? 'T) |> Seq.map (fun node -> node :?> 'T)

Do the <obj> cast immediately, but then filter out all non-'T items then cast.

By doing it like this we eliminate the double-map, and do a single map to 'Tif and only if it is a 'T. More interesting: we can add these two methods directly to the Seq module just as if they were extension methods:

module Seq open System.Collections let castSafe<'T> (items:IEnumerable) : 'T option seq = items |> Seq.cast<obj> |> Seq.map (fun node -> match node with | :? 'T as result -> Some result | _ -> None) let castSafeFilter<'T> (items:IEnumerable) : 'T seq = items |> Seq.cast<obj> |> Seq.filter (fun node -> node :? 'T) |> Seq.map (fun node -> node :?> 'T)

Usually I would omit type-annotations, but when building a generic, multi-purpose module I always keep them. We could define an fsi file instead that would contain all the signatures, but that's beyond the scope of what I wish to explain here.

Finally, we're back to our code, you had:

let items = channelNode.ChildNodes |> Seq.cast<XmlNode> |> Seq.filter (fun node -> node.Name.Equals "item") |> Seq.map itemNodeToItem

We're going to change that to:

let items = channelNode.ChildNodes |> Seq.castSafeFilter<XmlNode> |> Seq.filter (fun node -> node.Name.Equals "item") |> Seq.map itemNodeToItem

And bam, now we have safely casted the items to an XmlNode, and allowed ourselves a great deal of reuse in the future.

And finally, with some extra cleverness, we could even devise a castFilter method, that would test if our node is a 'T, and if so do the filtering at that level to avoid redundant calls. That is:

let castFilter<'T> (filter:'T -> bool) (items:IEnumerable) : 'T seq = items |> Seq.cast<obj> |> Seq.filter (fun node -> match node with | :? 'T as result -> result |> filter | _ -> false) |> Seq.map (fun node -> node :?> 'T)

And interestingly if we work with result on the filter at that point, the :? 'T as result tests if node is a 'T, and does the node :?> 'T cast for us, so we've finally achieved the same functionality as the C# as keyword.

And as mentioned in comments, using Seq.choose is another great alternative, which opens us up to a little bit of extraction:

let private castItemSafe<'T> (item:obj) : 'T option = match item with | :? 'T as result -> Some result | _ -> None let castSafe<'T> (items:IEnumerable) : 'T option seq = items |> Seq.cast<obj> |> Seq.map castItemSafe let castSafeFilter<'T> (items:IEnumerable) : 'T seq = items |> Seq.cast<obj> |> Seq.choose castItemSafe

Because of the nature of our casting, we really only need to store the function which casts an individual item in one spot, since map will return all items and choose will naturally filter them out.

I previously touched on some basic composition stuff, and I want to continue that discussion with some new examples here.

We have opportunity to use some function composition in this code, which lends us to partial application, and allow us to not care about the node parameter to our functions.

For example, we can take your titleText and descriptionText functions and simply omit the node parameter and our functions still work the same, thus leaving them agnostic to what's applied even moreso than they were. But what if we want to omit node from elementAsDateTimeOffset? Well we can't really do that as it's written, because the pipe-right (|>) operator expects a value, not a function. Now we can use the compose-right (>>) operator to eliminate that requirement:

let titleText = innerText "title" let descriptionText = innerText "description" let elementAsDateTimeOffset elementName = innerText elementName >> DateTimeOffset.Parse let elementAsUri elementName = innerText elementName >> Uri let linkUri = elementAsUri "link"

Now here is the part I really like: none of our signatures changed, but we no longer care about the parameter itself in these functions. We've partially applied the functions. For example, on titleText it is simply applying the "title" string to the innerText function (which expects two parameters) and returning a function that just needs an XmlNode applied to it.

I like to use this more and more in my code as it seems a bit clearer to me in the end. The net result is the same, and there's no significant advantage (that I know of - do note I'm no expert) to using one over the other, but it becomes a preference on the developer's end. (This also lends itself very easily towards the introduction of currying: that is, every function has one and only one parameter, even those with multiple parameters.)

An alternative implementation: let castSafeFilter<'T> = Seq.map box >> Seq.choose (function :? 'T as x -> Some x | _ -> None). But even better: let castSafeFilter<'T> = System.Linq.Enumerable.OfType<'T>. I think it's more idiomatic to use an existing .NET method instead of re-writing it :) — TheQuickBrownFox, CommentedAug 14, 2017 at 17:16
I'm unlikely to use the safe cast here, as I'm guaranteed that XmlNodeList only contains XmlNodes, but it was highly educational none the less. Thank you. I'll see if I can grok your partial application suggestions later. — RubberDuck, CommentedAug 14, 2017 at 18:20
@RubberDuck I don't blame you there - I'm not saying this is the better idea, all I wanted to point at is that it is possible to build something like this that feels natural, allowing us to expand on what the language already does. :) — Der Kommissar, CommentedAug 15, 2017 at 21:11
The partial application thing worked great. It was neat to see how it didn't change the signature of the function. Now I just need to figure out how/when I can leverage partial application in my code in general. You're a great teacher. Thanks again! — RubberDuck, CommentedAug 16, 2017 at 22:43

Stack Exchange Network

Wordpress RSS Parser

Rss.fs

Controllers/BlogController.fs

2 Answers 2

Linked

Hot Network Questions

Wordpress RSS Parser

Rss.fs

Controllers/BlogController.fs

2 Answers 2

Linked

Related

Hot Network Questions