
    Issues with JGroups and IPv6 on Linux

     

    In a nutshell: IPv6 works with JGroups on Windows (but TCP_NIO doesn't work).

     

    On Windows (XP), you have to enable IPv6 by clicking on any connection (e.g. "Local Area Connection") --> Properties --> Install --> Protocol --> IPv6.

    If you run ipconfig, you will see the new IPv6 addresses. By default, WinXP seems to prefer IPv6.

     

    On Linux, IPv6 doesn't work (see below) due to a bug in the JDKs up to and including JDK 5. Mustang (JDK 6) fixed this, but I could not verify that it works on Red Hat Fedora Core 5. So, on Linux with Sun's VM, you have to use -Djava.net.preferIPv4Stack=true until you switch to JDK 6. I haven't tried this on JRockit or IBM's VM.

     

     

    Here is the Jira issue: http://jira.jboss.com/jira/browse/JGRP-47

     

    IP_MULTICAST_IF:

     

    java.net.SocketException: bad argument for IP_MULTICAST_IF: address not bound to any interface
            at java.net.PlainDatagramSocketImpl.socketSetOption(Native Method)
            at java.net.PlainDatagramSocketImpl.setOption(PlainDatagramSocketImpl.java:295)     
    

     

    There was an earlier post mentioning the same symptoms. Assuming that you are running JBoss on Linux, if your Linux kernel has IPv6 support, you will have this problem. If you don't need IPv6 support, you can disable it from the kernel and the problem will go away.
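
    When chasing the IP_MULTICAST_IF error above, it can help to see which addresses the JVM believes are bound to which interfaces. The following is a minimal diagnostic sketch, not part of JGroups: the class name InterfaceReport and the helper family() are made up for illustration, and isUp() / supportsMulticast() assume JDK 6 or later.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;

// Hypothetical diagnostic helper, not part of JGroups.
public class InterfaceReport {

    // Classifies an address by its Java type: Inet6Address means IPv6, anything else IPv4.
    public static String family(InetAddress addr) {
        return (addr instanceof Inet6Address) ? "IPv6" : "IPv4";
    }

    public static void main(String[] args) throws SocketException {
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            System.out.println("no network interfaces found");
            return;
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            // One header line per interface, then one line per bound address.
            System.out.println(nic.getName() + " (up=" + nic.isUp()
                    + ", multicast=" + nic.supportsMulticast() + ")");
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                System.out.println("  " + family(addr) + "  " + addr.getHostAddress());
            }
        }
    }
}
```

    If the address you configured as bind address does not show up under any interface in this listing, the "address not bound to any interface" message is at least consistent with what the JVM sees.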

     

    A different symptom of IPv6 problems

     

    Another possible error: if using JGroups for features such as JBossCache, the following error can be encountered at startup time:

    --- MBEANS THAT ARE THE ROOT CAUSE OF THE PROBLEM ---
    ObjectName: JBoss.cache:service=ClusteredService
      State: FAILED
      Reason: org.jgroups.ChannelException: failed to start protocol stack
      I Depend On:
        jboss:service=Naming
        jboss:service=TransactionManager
    

    Certain versions of Red Hat EL exhibit the error (e.g. Red Hat Enterprise Linux ES release 4 (Nahant Update 2)), while others somehow avoid the issue (e.g. Red Hat Enterprise Linux AS release 3 (Taroon Update 2)). Windows XP has also been reported to throw this error.

     

    Yet another symptom

     

    Our QE lab admins and some JBoss developers have reported problems with JBoss AS trunk builds when servers are started by the AS testsuite. Startup / shutdown of the AS is very slow, with messages similar to the following logged:

     

    2007-11-12 16:49:06,932 WARN  [org.jboss.ha.framework.server.JChannelFactory] Flush failed at 127.0.0.1:32822 ......

     

    Investigation shows that this occurs when the node is bound to localhost.  The node is unable to receive its own multicast messages.  So, in this case an IPv6 problem did not result in a catastrophic failure, but rather a partially broken channel.

     

    Again, the solution was to modify our testsuite to use -Djava.net.preferIPv4Stack=true when starting servers.
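
    As an illustration of what "use -Djava.net.preferIPv4Stack=true when starting servers" can look like from Java code, here is a small sketch that builds the command line for a child JVM. It is not the actual AS testsuite code: the class name Ipv4ServerLauncher and the main class passed in are placeholders.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher sketch; not taken from the JBoss AS testsuite.
public class Ipv4ServerLauncher {

    // Builds the command line for a child JVM that is forced onto the IPv4 stack.
    // JVM options must come before the main class name.
    public static List<String> buildCommand(String mainClass) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<String>();
        cmd.add(javaBin);
        cmd.add("-Djava.net.preferIPv4Stack=true");
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: Ipv4ServerLauncher <main-class>");
            return;
        }
        // Start the child JVM, share this process's stdin/stdout/stderr, wait for it.
        Process p = new ProcessBuilder(buildCommand(args[0])).inheritIO().start();
        System.exit(p.waitFor());
    }
}
```

    The flag is passed on the command line rather than set inside the child, so it is in place before any networking class is loaded.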

     

    Turn off IPv6 in the Linux kernel

     

    To disable IPv6 in the kernel (or really, to disable automatic loading of the IPv6 module - see here for IPv6 and Linux details), add the following line to your modules config file:

      alias net-pf-10 off   # disable automatic loading of the IPv6 module on demand
    

    For 2.4 kernels, this would be in  or .  For 2.6 kernels, this would be in  or  (for SLES).  You need to reboot the Linux box.  I'm not sure of another way to disable this without a reboot.  If you know of any, please update this page or add a link to some further docs on the subject.

     

    Also, once you disable this in the kernel, you can still enable IPv6 on an interface by following these instructions (for RedHat at least).

     

     

    Turn off IPv6 in the JVM

     

    Another source of problems might be the use of IPv6, and/or a misconfigured /etc/hosts. If you communicate between an IPv4 and an IPv6 host and they are not able to find each other, try the java.net.preferIPv4Stack property, e.g. -Djava.net.preferIPv4Stack=true.

     

     

     

    JDK 1.4.1 uses IPv6 by default, although it has a dual stack, that is, it also supports IPv4. Here are more details on the subject.
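
    To confirm what a given JVM actually ended up with, a small check like the one below can be run with and without the java.net.preferIPv4Stack property. It is only a sketch: the class name StackCheck and the helper countIPv6() are invented for illustration, and the result depends on the local resolver and /etc/hosts setup.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical check, not part of JGroups.
public class StackCheck {

    // Counts how many of the given addresses are IPv6 addresses.
    public static int countIPv6(InetAddress[] addrs) {
        int n = 0;
        for (InetAddress a : addrs) {
            if (a instanceof Inet6Address) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Boolean.getBoolean is true only if the system property is set to "true".
        System.out.println("java.net.preferIPv4Stack = "
                + Boolean.getBoolean("java.net.preferIPv4Stack"));

        // Resolve the local host name the same way other code in this JVM would.
        InetAddress local = InetAddress.getLocalHost();
        InetAddress[] all = InetAddress.getAllByName(local.getHostName());
        System.out.println(local.getHostName() + " resolves to " + all.length
                + " address(es), " + countIPv6(all) + " of them IPv6");
        for (InetAddress a : all) {
            System.out.println("  " + a.getHostAddress());
        }
    }
}
```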

     

    IPv6 and IP Bonding

    Looks like turning IPv6 off also has the nice side effect that IP Bonding works (it doesn't work with IPv6 turned on).

    This is anecdotal (from a customer support case), and we need to verify this in the 2.3/2.4 time frame.

     

    IPv6 support

    Due to a bug in the JDK on Linux, a socket cannot be bound to an IPv6 address. This is the reason why -Djava.net.preferIPv4Stack=true has to be used. This bug is supposedly fixed in Mustang (JDK 6).

    JGroups supports IPv6 starting from version 2.3. The main change from previous versions was how we marshalled IpAddresses. Actually, previous versions (starting from 2.2.8) worked too: what we changed was externalization, which didn't work, but which wasn't used anyway from 2.2.8 on because we used Streamable.

     

     

    IPv6 and scoped link-local addresses

     

    Scoped link-local IPv6 addresses look like this: fe80::216:cbff:fea9:c3b5%en0 or fe80::216:cbff:fea9:c3b5%3. The %X suffix refers to the interface to which the link-local address is assigned. Because multiple interfaces can have the same link-local address assigned, the suffix might be needed to differentiate the interfaces [1].

     

    In JGroups, a bind_addr can be defined with or without the scope-id suffix. If the bind_addr is link-local, JGroups tries to attach a scope-id, just to be on the safe side. This is on the receiver side.
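
    The idea of attaching a scope-id to a link-local bind address can be sketched with plain JDK calls as follows. This is an assumption about how such a step could look, not the JGroups implementation; the class name ScopeIdAttacher and the method attachScopeId() are hypothetical.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;

// Hypothetical sketch, not the JGroups code.
public class ScopeIdAttacher {

    // Returns a scoped copy of addr if it is an unscoped link-local IPv6 address
    // that is assigned to a local interface; otherwise returns addr unchanged.
    public static InetAddress attachScopeId(InetAddress addr) {
        if (!(addr instanceof Inet6Address) || !addr.isLinkLocalAddress()) {
            return addr; // only link-local IPv6 addresses need a scope-id
        }
        Inet6Address v6 = (Inet6Address) addr;
        if (v6.getScopedInterface() != null || v6.getScopeId() != 0) {
            return addr; // already carries a scope
        }
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return addr;
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                for (InetAddress candidate : Collections.list(nic.getInetAddresses())) {
                    // Compare raw bytes so that the candidate's own scope is ignored.
                    if (Arrays.equals(candidate.getAddress(), addr.getAddress())) {
                        return Inet6Address.getByAddress(null, addr.getAddress(), nic);
                    }
                }
            }
        } catch (SocketException | UnknownHostException e) {
            // Could not inspect the interfaces; fall back to the unscoped address.
        }
        return addr;
    }
}
```

    If the same link-local address is assigned to several interfaces, this sketch simply picks the first match; in that situation an explicit scope-id in bind_addr is the safer choice.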

     

    Now, on the connecting side, there is a problem: if we get a scoped link-local address (e.g. through MPING or TCPPING), Socket.connect(ADDR) will not work, because the scope-id refers to an interface on the host that sent the address and doesn't mean anything on the client!

     

    The solution is to strip away the scope-id from ADDR on Socket.connect(ADDR); this is done automatically for all socket connects in JGroups. Note that this is not necessary when sending datagrams.
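
    Stripping the scope-id can be illustrated with the JDK alone: rebuilding an address from its 16 raw bytes yields an Inet6Address without a scope. The sketch below shows this under that assumption; it is not the JGroups code, and the names ScopeIdStripper and stripScopeId() are made up.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch, not the JGroups code.
public class ScopeIdStripper {

    // Returns addr without its scope-id; non-IPv6 addresses are returned unchanged.
    public static InetAddress stripScopeId(InetAddress addr) {
        if (!(addr instanceof Inet6Address)) {
            return addr;
        }
        try {
            // The raw bytes do not contain the scope, so the rebuilt address is unscoped.
            return InetAddress.getByAddress(addr.getAddress());
        } catch (UnknownHostException e) {
            // Only thrown for an illegal length; getAddress() of an Inet6Address is 16 bytes.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        byte[] raw = InetAddress.getByName("fe80::216:cbff:fea9:c3b5").getAddress();
        // Numeric scope-ids are not checked against the local interfaces.
        InetAddress scoped = Inet6Address.getByAddress(null, raw, 3);
        System.out.println("scoped:   " + scoped.getHostAddress());
        System.out.println("stripped: " + stripScopeId(scoped).getHostAddress());
    }
}
```

    A caller would then pass the stripped address to Socket.connect() instead of the one received from the discovery protocol.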

     

    [1] https://jira.jboss.org/jira/browse/JGRP-976

     
