-
-
Notifications
You must be signed in to change notification settings - Fork 107
Home
Developing distributed applications at scale is not a simple task and feels unnatural as it hides inherent risks and complexities. Microservices is not a silver bullet, it involves dealing with cross-cutting concerns that are inherent to the architecture - one of the main concerns that come with microservices architecture is the stability of the overall system teams will usually prefer a natural growth as stability evolves rather than scaling from day one.
Developing debugging and testing microservices at scale might become an almost impossible task for smaller development teams, usually, its a practice saved to the big guys. one alternative is to give-up on microservices architecture at first and refactor later which slows down the team productivity. with scalecube you don't have to give up on microservices! scalecube allows you to enjoy both worlds. it separates between the logical entity of a service and its physical attributes. you can now develop the system on same JVM as you would with a monolith and re-package its modules when you feel ready or required to scale. at that point with scalecube breaking the monolith becomes a packaging task rather than a refactoring task.
Scalecube microservices is a service built based on scalecube cluster which provides a built-in service discovery. the discovery uses SWIM protocol and gossip that scales better and have inherent failure detection and superior coherent understanding of the cluster state and cluster membership taking part in a swarm of services. such approach provides a true service-mesh and lower latency and more resilient technique as routing and service lookup is a local operation. this simplify the deployments dramatically reduces the need for configuration management of large scale microservices cluster.
ScaleCube is the embeddable microservices library for the rapid development of distributed, resilient, reactive applications that scales. It connects distributed microservices in a way that resembles a fabric when viewed collectively. It greatly simplifies and streamlines asynchronous programming and provides a tool-set for managing microservices architecture. ScaleCube has succeeded to find a way to achieve ease of development, performance, stability, and flexibility without a compromise.
Let’s say that you asked to build a distributed database similar to Cassandra. Your storage system will store and process large amounts of data running on a huge number of commodity servers. In other words, your system will rely on the power of 100s of nodes to manage data.
At this scale, failures will be the norm rather than the exception. Even if we assume that one node lasts for 1000 days (roughly 3 years), in a cluster of 500 nodes there will be a failure once every 2 days.
To deal with this scenario, you would require a failure detection service, which apart from detecting faulty nodes also keeps all non-faulty processes in sync about the processes that are alive. In this blog post, we’ll cover one such protocol called SWIM and understand its inner workings.
The SWIM or the Scalable Weakly-consistent Infection-style process group Membership protocol is a protocol used for maintaining membership amongst processes in a distributed system.
A membership protocol provides each process of the group with a locally maintained list, called a membership list, of other non-faulty processes in the group. The protocol, hence, carries out two important tasks -
detecting failures i.e. how to identify which process has failed and disseminating information i.e. how to notify other processes in the system about these failures. It goes without saying that a membership protocol should be scalable, reliable and fast in detecting failures. The scalability and efficiency of membership protocols is primarily determined by the following properties
Completeness: Does every failed process eventually get detected? Speed of failure detection: What is the average time interval between a failure and its detection by a non-faulty process? Accuracy: How often are processes classified as failed, actually non-faulty (called the false positive rate)? Message Load: How much network load is generated per node and is it equally distributed? Ideally, one would want a protocol that is strongly complete 100% accurate which means that every faulty process is detected, with no false positives. However, like other trade-offs in distributed systems there exists an impossibility result which states that 100% completeness and accuracy cannot be guaranteed over an asynchronous network. Hence, most membership protocols (including SWIM) trade in accuracy for completeness and try to keep the false positive rate as low as possible.
The SWIM failure detector uses two parameters - a protocol period T and an integer kwhich is the size of failure detection sub-groups.
SWIM Failure Detection
The figure above shows how the protocol works. After every T time units, a process Miselects a random process from its membership list - say Mj - and sends a ping to it. It then waits from an ack from Mj. If it does not receive the ack within the pre-specified timeout, Miindirectly probes Mj by randomly selecting k targets and uses them to send a ping to Mj. Each of these k targets then sends a ping to Mj on behalf of Mi and on receiving an acknotifies Mi. If, for some reason, none of these processes receive an ack, Mi declares Mj as failed and hands off the update to the dissemination component (discussed below).
The key differentiating factor between SWIM and other heart-beating / gossip protocols is how SWIM uses other targets to reach Mj so as to avoid any congestion on the network path between Mi and Mj.
The dissemination component simply multicasts failure updates to the rest of the group. All members receiving this message delete Mj from its local membership list. Information about new members or voluntarily leaving is multicast members in a similar manner. Information about newly joined members or voluntarily leaving members are multicast in a similar manner.
Infection-Style Dissemination - In a more robust version of SWIM, instead of relying on the unreliable and inefficient multicast, the dissemination component piggybacks membership updates on the ping and ack messages sent by the failure detector protocol. This approach is called the infection-style (as this is similar to the spread of gossip, or epidemic in a population) of dissemination which has the benefits of lower packet loss and better latency.
Suspicion Mechanism - Even though the SWIM protocol guards against the scenario where there’s congestion between two nodes by pinging k nodes, there is still a possibility where a perfectly healthy process Mj becomes slow (high load) or becomes temporarily unavailable due to a network partition around itself and hence is marked failed by the protocol.
SWIM mitigates this by running a subprotocol called the Suspicion subprotocol whenever a failure is detected by the basic SWIM. In this protocol, when Mi finds Mj to be non-responsive (directly and indirectly) it marks Mj as a suspect instead of marking it as failed. It then uses the dissemination component to send this message Mj: suspect to other nodes (infection-style). Any process that later finds Mj responding to ping un-marks the suspicion and infects the system with the Mj: alive message.
Time-bounded Completeness - The basic SWIM protocol detects failures in an average constant number of protocol periods. While every faulty process is guaranteed to be detected eventually, there is a small probability that due to the random selection of target nodes there might be a considerable delay before a ping is sent to faulty node.
A simple improvement suggested by SWIM to mitigate this is by maintaining an array of known members and selecting ping targets in a round-robin fashion. After the array is completely traversed, it is randomly shuffled and the process continues. This provides a finite upper bound on the time units between successive selections of the same target.
- Provision and interconnect microservices as a unified system (cluster)
- Async RPC with java-8 CompleteableFutures support
- Reactive Streams support with RxJava.
- No single-point-of-failure or single-point-of-bottleneck
- Cluster aware and distributed
- Modular, flexible deployment models and topology
- Zero configuration, automatic peer-to-peer service discovery using gossip
- Simple non-blocking, asynchronous programming model
- Resilient due to failure detection, fault tolerance, and elasticity
- Routing and balancing strategies for both stateless and stateful services
- Low latency and high throughput
- Takes advantage of the JVM and scales over available cores
- Embeddable to existing Java applications
- Message Driven based on google-protocol-buffers
- Natural Circuit-Breaker due to tight integration with scalecube-cluster failure detector.
- Support Service instance tagging.
Microservices architecture is not a silver bullet and introduces cross-cutting-concerns scale-cube was developed to deal with some of the issues introduced by micro-services architecture:
Developer tools/IDEs are oriented on building monolithic applications and don’t provide explicit support for developing distributed applications and Testing is more difficult.
with scalecube its possible to develop / tests / simulate distributed applications within a single JVM instance scalecube so it's still possible the regular IDE and existing development tools.
ScaleCube supports inter-service communication in a transparent way in the sense that developers code has very little boilerplate for inter-service communicating.
ScaleCube allows you to scale your teams as your micro-services architecture grows. managing the services contracts is very similar to the monolithic approach. in fact, it's possible to develop a monolithic first approach and only scale on demand. the decision to distribute a service is a complete packaging issue with no code changes required.
provides a fault-tolerant decentralized peer-to-peer based cluster management service with no single point of failure or single point of bottleneck. It is based on efficient and scalable implementations of cluster membership, gossip protocol and failure detector algorithms and consists of the same name components.
provides a method to communicate between cluster members and exchange bi-directional communications. The transport component is based on netty that provides better throughput and lower latency communication.