When multiple instances of Interlok start in failover mode, only one single instance, known as the master, will fully start up ready to process messages. All other Interlok instances in the cluster will sit and wait in a dormant state (slave) until they are promoted to the master.
In failover mode all instances are started as slaves.
Upon start-up of multiple instances in failover mode, if you have not configured the slave position for each, they will decide the order themselves.
The instance nominated as slave at position 1 will be the next instance to promote itself to master. If there is currently no master instance running this slave will promote itself immediately.
Should the master stop communicating with the other instances it is then assumed the master instance has gone down, in which case as soon as each slave recognises the lack of communication from the master, the slave in position 1 will promote itself to master. Each additional slave in positions 2 and above will each promote themselves up one position.
If you start new slaves at any time, all slaves in the cluster may decide to re-order themselves to accomodate the new slave.
Each instance in the cluster constantly communicates with every other instance in the cluster via multicast or direct TCP, dpending on your configuration, specified below.
Please note, some environments do not enable multicast broadcasting by default, therefore you may require additional networking configuration before Interlok failover clusters can be configured using Multicast mode.
For each Interlok instance you wish to join the failover cluster, you will need only one optional component;
Simply copy the jar files from the optional component directory into the lib directory of each Interlok instance in the failover cluster.
Currently you can choose between Multicast failover or Direct TCP failover.
Direct TCP mode only since Interlok version 3.6.4
The main benfit of multicast failover is that you can add new failover peers as and when you want. Simply start a new instance of Interlok and the new instance will be added to the failover cluster automatically.
If multicast is not available on your environment then you can configure Direct TCP failover instead. The main difference here is that you must define each machines host and port for each instance in the failover cluster group in either the bootstrap.proprties, or via java system properties.
Since Interlok version 3.7.4, you may now add new failover peers while using the Direct TCP mode without having to reconfigure the current peers. Simply make sure your new instance has been configured with each of the host and port numbers of the current failover members, from there the existing members will update their own peer lists as the new instance comes online.
Configuring Basic Interlok Failover
Simple configuration is required as detailed below.
As of 3.7 starting Interlok in failover mode can be done using interlok-boot with
--failover argument and passing the name of the chosen bootstrap.properties file as a parameter.
A short example of a windows start script; the last line setting the argument and the bootstrap.properties parameter;
set ADAPTRIS_HOME=C:\Adaptris\Interlok set JAVA_HOME=C:\Java\jdk1.8\bin cd %ADAPTRIS_HOME% %JAVA_HOME%\bin\java -jar lib\interlok-boot.jar --failover bootstrap.properties
Alternatively starting Interlok in failover mode simply requires changing the java main class and passing the name of the chosen bootstrap.properties file as the single parameter.
NOTE: Make sure your chosen bootstrap properties file is on the classpath.
If you choose to create your own Interlok startup script make sure that the following class is set as the main class in the java command;
A short example of a windows start script; the last line setting the main class and the bootstrap.properties parameter;
set CLASSPATH=. set ADAPTRIS_HOME=C:\Adaptris\Interlok3.3 set JAVA_HOME=C:\Java\jdk1.8.0_60\bin set CLASSPATH=%CLASSPATH%;%ADAPTRIS_HOME%\lib\interlok-core.jar;%ADAPTRIS_HOME%\config for /R %ADAPTRIS_HOME%\lib %%H in (*.jar) do set CLASSPATH=!CLASSPATH!;..\Interlok3.3\lib\%%~nxH %JAVA_HOME%\java -cp %CLASSPATH% com.adaptris.failover.FailoverBootstrap bootstrap.properties
Alternatively you can modify the Interlok LAX file to set the main class like so;
And then adding the location of the bootstrap.properties file as the single parameter like so;
If you are using Direct TCP failover mode as discussed above then you have the option of setting the failover peers in either the bootstrap.properties (shown below) or as system properties.
If you choose system properties, then the following will need to be added to you start script/lax file;
See the discussion below for a more detailed explaination of both properties.
|failover.multicast.group||must be defined when working in multicast mode|
|failover.multicast.port||must be defined when working in multicast mode|
|failover.tcp.port||must be defined if direct TCP mode is enabled|
|failover.tcp.peers||must be defined if direct TCP mode is enabled, and is a
|failover.tcp.host||optional property that specifies the host name or IP address of the current Interlok instance. If not specified we will try to determine the local IP address.|
|failover.slave.position||if you wish to preconfigure the slave position in the hierarchy then define this, otherwise one will be assigned|
|failover.ping.interval.seconds||How often each instance will attempt to communicate with each other, defaults to 3 seconds|
|interval.instance.timeout.seconds||How long before an instance is deemed as no longer available, defaults to 20 seconds|
failover.tcp.peers may either be set in the bootstrap.proprties or as java system proprties.
Assumes the local instance can listen for TCP failover events on port 15555 and that there are two further instances in the failover cluster who happen to be running on the same machine (localhost) and are listening on ports 15556 and 15557 respectively.
Example Java system properties
Assumes the local instance can listen for TCP failover events on port 4444 and that there are two further instances in the failover cluster who happen to be running on the same machine (localhost) and are listening on ports 15556 and 15557 respectively.
java -Dfailover.tcp.port=15555 -Dfailover.tcp.peers=localhost:15556;localhost:15557 -jar lib/interlok-boot.jar -failover
Should you wish to stop the master instance running; either through the JMX API or the Web UI, the slave at position 1 is expected to promote itself to master.
To enable this functionality you must change the default event handler in your Interlok configuration. This is simply changed like this (notice the failover-event-handler);
<adapter> <unique-id>MyInterlokInstance</unique-id> <event-handler class="failover-event-handler"/>
NOTE: If you manually force a failover, you will not be able to re-start the master instance through the JMX API or Web UI. The original master instance will need a full restart.
Once you have put it all together; then when you start your Interlok instances, you will get additional logging in the log file.
All Interlok instances in the failover cluster will include the following log lines;
INFO [main] [FailoverBootstrap] Starting Interlok instance in failover mode as a slave. INFO [main] [FailoverBootstrap] Slave position 1
If you do not specify the slave positions, then you will see logging like this;
INFO [main] [FailoverBootstrap] Starting Interlok instance in failover mode as a slave. INFO [main] [FailoverBootstrap] No slave position has been set, one will be allocated. INFO [Failover Monitor Thread] [FailoverManager] Assigning myself slave position 1
Only the master instance (slave at position 1, after it has promoted itself) will additionally show the following lines in the log file;
TRACE [Failover Monitor Thread] [FailoverManager] Master not available, promoting myself to master. INFO [Failover Monitor Thread] [FailoverBootstrap] Promoting to MASTER
If you set the JVM parameter interlok.failover.debug to true, then periodically each instance in the failover cluster will report it’s state and information (at TRACE level logging) about all known other online instances.
Example below shows the logging from the slave at position 1, where 4 instances are in the cluster; a master and 3 slaves.
TRACE [Failover Monitor Thread] [com.adaptris.failover.FailoverManager] com.adaptris.failover.FailoverManager@7ce58397[ Self=OnlineInstance[ID=1f0eab39-e917-4b02-b987-bf37d9620c3d,Type=slave,Position=1,last=Thu Jan 01 01:00:00 GMT 1970] master=OnlineInstance[ID=10cc907e-f4ca-4cb4-b96f-9547035028c4,Type=master,Position=0,last=Mon Jul 23 12:34:39 BST 2018] slaves=[OnlineInstance[ID=9c9fc821-f257-420c-bffa-6838b4baecbc,Type=slave,Position=2,last=Mon Jul 23 12:34:40 BST 2018], OnlineInstance[ID=b22c7478-dafd-4d95-af1d-3b13c8cc9080,Type=slave,Position=3,last=Mon Jul 23 12:34:41 BST 2018]] ]