7

I have a series of systems that need to be distributed across a variable number of nodes (at least two, but more likely 8-10). For performance reasons, each piece of state needs to be held in memory and replicated to at least two nodes, although as demand increases the manager may direct that it be replicated to more. Unfortunately, we cannot buy a commercial data grid, even though those products are already very good at this.

The question: Assuming the state is immutable, what patterns and/or algorithms can I use to facilitate this process? I've supplied a sample to help the discussion, but I'm not tied to it. Really, I'm wide open to picking a new direction to meet these requirements. If you could point me to pictures or code samples, that would be much appreciated.

As it stands, one of the nodes will self-promote to manager and orchestrate who should replicate and when.
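A minimal sketch of one way the self-promotion could work, written in Python for brevity (it should port directly to C#). The rule used here, "the lowest node id heard from recently is the manager", is an assumption for illustration and is not part of the sample above; `Membership`, `HEARTBEAT_TIMEOUT`, and the node ids are hypothetical names. Heartbeats would arrive over multicast, but the transport is left out.

```python
import time

HEARTBEAT_TIMEOUT = 3.0  # seconds; hypothetical value, tune for the network


class Membership:
    """Tracks which peers were heard from recently (e.g. via multicast heartbeats)."""

    def __init__(self, self_id, timeout=HEARTBEAT_TIMEOUT, clock=time.monotonic):
        self.self_id = self_id
        self.timeout = timeout
        self.clock = clock
        self.last_seen = {}  # node id -> time of the last heartbeat received

    def on_heartbeat(self, node_id):
        # Called whenever a heartbeat from node_id arrives.
        self.last_seen[node_id] = self.clock()

    def live_nodes(self):
        now = self.clock()
        alive = {n for n, t in self.last_seen.items() if now - t <= self.timeout}
        alive.add(self.self_id)  # a node always counts itself as alive
        return sorted(alive)

    def manager(self):
        # Assumed rule: the lowest live node id is the manager.
        return self.live_nodes()[0]

    def is_manager(self):
        return self.manager() == self.self_id
```

Each node runs the same rule locally, so no extra election messages are needed. The trade-off is that two nodes with different views of who is alive can briefly both believe they are the manager, which is tolerable only because eventual consistency is acceptable here.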

Other assumptions:

  • Systems will communicate via TCP/IP and Multicast
  • No databases
  • No file transfer... just byte arrays
  • All components are currently C#, and I can change almost anything
  • Eventual consistency will be fine

Note: I realize this sample does not handle fail cases. At this point I thought it would over-complicate things.

Possible sequence

3
  • Thanks. I figured a visual aid would be needed to move the conversation along. Trying to create the correct mental image through text descriptions would have made this post a lot longer.
    – JoeGeeky
    Commented Dec 19, 2011 at 13:59
  • "I realize this sample does not handle fail cases. At this point I thought it would over-complicate things" Fail cases in distributed systems like this can sometimes account for up to 80% of the complexity.
    – maple_shaft
    Commented Dec 19, 2011 at 15:14
  • @maple_shaft I can appreciate that, I just meant, complicate things relative to this post. The fail cases are certainly something that will have to be addressed, but not in this question. Thank you though.
    – JoeGeeky
    Commented Dec 19, 2011 at 17:26

3 Answers

2

I assume you do not need ACID transactions and that you are aware of BASE and the CAP theorem. What I understand from your diagrams is that the Manager X scans the source Y and the target Z (this may increase) and facilitates replication between the two. In that case, you can use the excellent ZMQ library and its C# bindings. It would solve almost all of your problems.
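To make the manager's role concrete, here is a rough sketch (in Python for brevity; the idea carries over to C#) of how a manager might turn its view of the cluster into copy instructions. Everything named here is hypothetical: `plan_replication`, the `holdings` map, and the "least-loaded node first" placement rule are assumptions for illustration, not something taken from the diagrams. The messaging layer (ZMQ or plain TCP) that would carry the instructions is left out.

```python
def plan_replication(holdings, nodes, replicas=2):
    """Return a list of (key, source, target) copy instructions.

    holdings: dict mapping key -> set of node ids that already hold it.
    nodes:    iterable of all live node ids.
    replicas: desired number of copies per key (hypothetical parameter).
    """
    # Count how many keys each live node already holds.
    load = {n: 0 for n in nodes}
    for holders in holdings.values():
        for n in holders:
            if n in load:
                load[n] += 1

    plan = []
    for key in sorted(holdings):
        holders = {n for n in holdings[key] if n in load}
        if not holders:
            continue  # no live copy to replicate from
        # The state is immutable, so any live holder is a valid source.
        source = min(holders)
        missing = max(replicas - len(holders), 0)
        # Prefer the least-loaded nodes that do not hold the key yet.
        candidates = sorted(
            (n for n in load if n not in holders),
            key=lambda n: (load[n], n),
        )
        for target in candidates[:missing]:
            plan.append((key, source, target))
            load[target] += 1
    return plan
```

For example, `plan_replication({"a": {"X"}, "c": {"Y"}}, ["X", "Y", "Z"])` asks X to copy `a` to Z and Y to copy `c` to Z. Raising `replicas` for a hot key is how the manager would spread it to more nodes as demand grows.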

Otherwise, you could use MongoDB, as it is easy to replicate (MongoDB is a document-oriented datastore).

Things may get interesting if you want to be transaction safe.

4
  • Unfortunately, bringing products like MongoDB into this problem-set will not be an option. But, ZMQ... maybe, I am reading up on this now. Thanks.
    – JoeGeeky
    Commented Dec 19, 2011 at 17:28
  • I was not aware of BASE and CAP theorems. Looking them up now. Thanks.
    – JoeGeeky
    Commented Dec 19, 2011 at 17:45
  • Found an interesting intro to the CAP theorem at julianbrowne.com/article/viewer/brewers-cap-theorem. This will take a little to digest. For reasons that are starting to become clear to me, these concepts are wrapped up in database technologies as well.
    – JoeGeeky
    Commented Dec 19, 2011 at 18:27
  • I preferred MongoDB since you are going to save a stream of data. A stream of data is usually saved as a file, and MongoDB has GridFS to serve files. The major advantage is easy replication. Regarding CAP, traditional RDBMSs cannot support it since it goes against their fundamentals.
    Commented Dec 20, 2011 at 4:07
1

I think it is worth considering a distributed cache approach.

Microsoft: http://msdn.microsoft.com/en-us/library/ff383731.aspx

Open source memcached: http://memcached.org/

What is the size of the data you want to replicate?

1
  • Due to a number of requirements, this capability needs to be organic to the product. Consequently, I won't be able to use these. That is why I'm looking into the underlying patterns that products like these employ.
    – JoeGeeky
    Commented Dec 19, 2011 at 17:32
0

Paxos is an algorithm that can be used for active-active replication between multiple nodes. In the (patented) version we use, it has been enhanced to work well in a WAN environment.

Implementing it is non-trivial, but it appears to meet the criteria you describe.
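To give a feel for the core of the algorithm, below is a minimal in-memory sketch of basic single-decree Paxos, in Python for brevity. It is not the patented WAN variant mentioned above, and it leaves out networking, retries, failures, and durable storage; the `Acceptor` class and `propose` function are illustrative names, and ballot numbers are assumed to be unique per proposer.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0      # highest ballot promised so far
        self.accepted = None   # (ballot, value) last accepted, if any

    def prepare(self, ballot):
        """Phase 1b: promise to ignore lower ballots; report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot, value):
        """Phase 2b: accept unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors, ballot, value):
    """Run one proposer round; return the chosen value, or None if the round failed."""
    majority = len(acceptors) // 2 + 1

    # Phase 1: collect promises from the acceptors.
    promises = [acc for ok, acc in (a.prepare(ballot) for a in acceptors) if ok]
    if len(promises) < majority:
        return None

    # If any acceptor already accepted a value, adopt the one with the highest ballot.
    prior = [p for p in promises if p is not None]
    if prior:
        value = max(prior, key=lambda p: p[0])[1]

    # Phase 2: ask the acceptors to accept the value.
    acks = sum(1 for a in acceptors if a.accept(ballot, value))
    return value if acks >= majority else None
```

The key property is that once a majority has accepted a value, any later proposer with a higher ballot learns that value in phase 1 and re-proposes it instead of its own, so all nodes converge on one choice.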
