niho edited this page Sep 30, 2011 · 10 revisions

Welcome to the Related Wiki

Philosophy

The design of Related is highly opinionated, and Related is not intended to be a hammer for every nail. The goal is specifically to be an easy-to-use yet powerful tool for building social applications. Quality over quantity! If you're doing DNA sequencing or trying to figure out who is a terrorist, you should probably look at another tool. If you're building the next Facebook or a cool new semantic web product, you've probably come to the right place.

Sharding

Related is intended to be fairly easy to shard efficiently. It can't shard at all right now, but the ambition is to support sharding in the 1.0 release. Any system design has to make trade-offs to shard in a useful way. Related relies heavily on the very efficient set operations that Redis provides, with the obvious downside that those operations can't be supported efficiently in a sharded environment in the general case, because a single Redis command can only operate on keys stored on the same server.

There are currently two proposed solutions to this:

  1. Do set operations on the server side in Redis when possible (when the involved keys happen to live on the same server) and emulate them on the client side when not. For many applications, such as most social networks, this will be more than efficient enough. The drawback, of course, is that it becomes less efficient the more shards you add, since more operations require transferring whole sets to the client. For networks with a very large number of relationships per node (a Twitter clone, for example) it might not work well at all at scale.

  2. Store all node and relationship properties ("entities" in Related parlance) in a partitioned key space (using Redis::Distributed), which lets you store as many nodes and relationships as you want and scale out across as many servers as you need. The set keys that define the "links" between nodes, however, are stored on a single master server (or a replicated master-slave setup). In such a setup the only limitation is how much data you can store on the master server, and since in most applications the sets are fairly compact compared to the entity properties, that should work fine most of the time. All set operations and graph traversals go to the master server, and everything else hits the sharded servers.
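Strategy 1 can be sketched roughly as follows. This is not Related's implementation but a minimal, self-contained Python simulation under stated assumptions: each `FakeRedisShard` is a hypothetical stand-in for one Redis server, its `sinter` method plays the role of Redis's `SINTER` command, and both the CRC32-modulo key placement and the `node:<id>:friends` key names are made up for illustration.

```python
import zlib


class FakeRedisShard:
    """Hypothetical stand-in for one Redis server (only the calls used here)."""

    def __init__(self):
        self.sets = {}
        self.sinter_calls = 0  # how many intersections ran "server-side"

    def sadd(self, key, *members):
        self.sets.setdefault(key, set()).update(members)

    def smembers(self, key):
        # Returns a copy, as fetching the set over the network would.
        return set(self.sets.get(key, set()))

    def sinter(self, *keys):
        # Server-side intersection: valid only because every key lives here.
        self.sinter_calls += 1
        result = self.smembers(keys[0])
        for key in keys[1:]:
            result &= self.sets.get(key, set())
        return result


class ShardedSets:
    def __init__(self, shards):
        self.shards = shards

    def shard_for(self, key):
        # Hypothetical placement: CRC32 of the key modulo the shard count.
        return self.shards[zlib.crc32(key.encode()) % len(self.shards)]

    def sadd(self, key, *members):
        self.shard_for(key).sadd(key, *members)

    def sinter(self, *keys):
        owners = [self.shard_for(key) for key in keys]
        if all(owner is owners[0] for owner in owners):
            # Every key is on one server: let that server intersect them.
            return owners[0].sinter(*keys)
        # Keys span servers: fetch each whole set and intersect on the client.
        result = owners[0].smembers(keys[0])
        for owner, key in zip(owners[1:], keys[1:]):
            result &= owner.smembers(key)
        return result


# Usage: "mutual friends" of two nodes, wherever their sets happen to live.
graph = ShardedSets([FakeRedisShard() for _ in range(3)])
graph.sadd("node:1:friends", "2", "3", "4")
graph.sadd("node:5:friends", "3", "4", "6")
mutual = graph.sinter("node:1:friends", "node:5:friends")  # mutual == {"3", "4"}
```

The answer is the same on either path; what changes is the cost. On the client-side path every set is transferred whole before it is combined, which is why this approach degrades for nodes with very many relationships.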

The intention is to implement both strategies so that you can choose the one that makes the most sense for your application.
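The routing rule behind strategy 2 can be sketched like this. It is a hypothetical Python simulation, not Related's code: plain dicts stand in for Redis servers, a simple CRC32-modulo scheme stands in for Redis::Distributed's own key distribution, and the `SplitStore` class and `<id>:<label>` link-set key names are invented for illustration.

```python
import zlib


class SplitStore:
    """Hypothetical sketch: entities are partitioned, link sets stay on a master."""

    def __init__(self, num_entity_servers):
        # Each "server" is a plain dict standing in for a Redis instance.
        self.entity_servers = [{} for _ in range(num_entity_servers)]
        self.master = {}  # holds every link set

    def _entity_server(self, node_id):
        # Hypothetical partitioning: CRC32 of the ID modulo the server count.
        index = zlib.crc32(node_id.encode()) % len(self.entity_servers)
        return self.entity_servers[index]

    def save_node(self, node_id, properties):
        # Entity properties go to the partitioned key space.
        self._entity_server(node_id)[node_id] = dict(properties)

    def load_node(self, node_id):
        return self._entity_server(node_id).get(node_id)

    def add_link(self, from_id, label, to_id):
        # Link sets always go to the master, whatever the node IDs are.
        self.master.setdefault(f"{from_id}:{label}", set()).add(to_id)

    def common_links(self, label, a, b):
        # Set operations only touch the master, so they never cross servers.
        return self.master.get(f"{a}:{label}", set()) & self.master.get(f"{b}:{label}", set())


# Usage: which nodes do both node 1 and node 2 follow?
store = SplitStore(num_entity_servers=4)
for node_id, name in [("1", "Alice"), ("2", "Bob"), ("3", "Carol")]:
    store.save_node(node_id, {"name": name})
store.add_link("1", "follows", "3")
store.add_link("1", "follows", "2")
store.add_link("2", "follows", "3")
shared = store.common_links("follows", "1", "2")  # shared == {"3"}
```

Adding entity servers grows capacity for properties without affecting set operations, while the master's memory remains the single limit on how many links can be stored.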

Indexing

Related does not implement any kind of indexing of the data you store in a node or relationship. This means the only way to access that data is to know the ID of a node or relationship and then query the graph from there to get to the data. There are no plans to implement any index functionality in Related.

The reason is that most useful indexing is either too application-specific or too heavy, complex or inefficient to be added to Related. Some applications will need full-text search, some will need geospatial indices, and some will not need any indices at all. Some will need to index the data in real time, while others would rather index it in offline background jobs. Supporting all of those use cases optimally is not realistic. Related is a graph database, not a full-text search solution. Some people might not like it, and I know it might be controversial, but in my opinion indexing does not belong in the database.

Another strong reason is that indexing inside the database does not play very well with caching. If you cache an object in memcached, for example, you need its ID every time you want to look it up there. If the only way to get that ID is to query an index that is part of your database, then every lookup still starts with a database query, and some of the benefit of having a caching layer in front of the database is lost. So the recommendation is: "Use the right tool for the right job, and everything will work much better!"
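The caching argument can be illustrated with a cache-aside lookup keyed by ID. This is a hypothetical Python sketch, not part of Related: a dict stands in for memcached, `CountingStore` stands in for the database, and `external_index` stands in for whatever separate search tool an application might choose.

```python
class CountingStore:
    """Hypothetical stand-in for the database; counts how often it is queried."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.queries = 0

    def get(self, node_id):
        self.queries += 1
        return self.nodes.get(node_id)


def fetch_node(node_id, cache, store):
    # Cache-aside: the ID is the cache key, so a hit never touches the database.
    node = cache.get(node_id)
    if node is None:
        node = store.get(node_id)
        if node is not None:
            cache[node_id] = node
    return node


def search(term, external_index, cache, store):
    # The index lives outside the database and only maps terms to IDs,
    # which are then resolved through the cache.
    return [fetch_node(node_id, cache, store) for node_id in external_index.get(term, [])]


# Usage: the second search is answered entirely from the cache.
store = CountingStore({"1": {"name": "Alice"}, "2": {"name": "Bob"}})
external_index = {"alice": ["1"]}
cache = {}  # stands in for memcached
search("alice", external_index, cache, store)
search("alice", external_index, cache, store)  # store.queries is still 1
```

Because the index returns only IDs, repeated lookups skip the database entirely. If the index lived inside the database, each search would still begin with a database query before the cache could help.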
