Blogs About RHQ

RHQ

Creating a delegating login module (for JBoss EAP 6.1 )


[ If you only want to see code, just scroll down ]

Motivation


In RHQ we had a need for a security domain that can be used to secure the REST-api and its web-app via container managed security. In the past I had just used the classical DatabaseServerLoginModule to authenticate against the database.

RHQ now also allows users to be kept in LDAP directories, which the above module did not cover. I had two options to start with:
  • Copy the LDAP login modules into the security domain for REST
  • Use the security domain for the REST-api that is already used for UI and CLI


The latter option was of course favorable to prevent code duplication, so I went that route. And failed.

I failed because RHQ was dropping and re-creating the security domain on startup; the server detected this and complained that the security domain referenced from rhq-rest.war was all of a sudden gone.

So next try: don't re-create the domain on startup and only add/remove the ldap-login modules (I am talking about modules, because we actually have two that we need).

This also did not work as expected:
  • The underlying AS sometimes went into reload needed mode and did not apply the changes
  • When the ldap modules were removed, the principals from them were still cached
  • Flushing the cache did not work and the server went into reload-needed mode


So what I did now is to implement a login module for the rest-security-domain that just delegates to another one for authentication and then adds roles on success.

This way the rhq-rest.war has a fixed reference to that rest-security-domain and the other security domain could just be handled as before.

Implementation



Let's start with the snippet from the standalone.xml file that describes the security domain and parametrizes the module:


<security-domain name="RHQRESTSecurityDomain" cache-type="default">
<authentication>
<login-module code="org.rhq.enterprise.server.core.jaas.DelegatingLoginModule" flag="sufficient">
<module-option name="delegateTo" value="RHQUserSecurityDomain"/>
<module-option name="roles" value="rest-user"/>
</login-module>
</authentication>
</security-domain>


So this definition sets up a security domain RHQRESTSecurityDomain which uses the DelegatingLoginModule that I will describe in a moment. There are two parameters passed:
  • delegateTo: name of the other domain to authenticate the user
  • roles: a comma-separated list of roles to add to the principal (these are the roles needed in the security-constraint section of web.xml)
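
How the roles option ends up as a list is not shown in this post. A minimal sketch of one way to do it, assuming the value arrives as a plain string in the options map; the class RolesOptionParser and the method parseRoles are hypothetical names and not part of the RHQ code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: turns the "roles" module option into a list of role names. */
class RolesOptionParser {

    static List<String> parseRoles(Map<String, ?> options) {
        List<String> roles = new ArrayList<String>();
        Object raw = options.get("roles");
        if (raw == null) {
            return roles; // option not set: no extra roles are added
        }
        for (String role : raw.toString().split(",")) {
            String trimmed = role.trim();
            if (!trimmed.isEmpty()) { // ignore empty entries such as "a,,b"
                roles.add(trimmed);
            }
        }
        return roles;
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<String, String>();
        options.put("roles", "rest-user, admin");
        System.out.println(parseRoles(options)); // prints [rest-user, admin]
    }
}
```

Trimming each entry means that "rest-user, admin" and "rest-user,admin" in standalone.xml behave the same.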


For the code I don't show the full listing; you can find it in git.

To make our lives easier we don't implement all functionality on our own, but extend
the already existing UsernamePasswordLoginModule and only override
certain methods.


public class DelegatingLoginModule extends UsernamePasswordLoginModule {


First we initialize the module with the passed options and create a new LoginContext with
the domain we delegate to:

@Override
public void initialize(Subject subject, CallbackHandler callbackHandler,
Map<String, ?> sharedState,
Map<String, ?> options)
{
super.initialize(subject, callbackHandler, sharedState, options);

/* This is the login context (=security domain) we want to delegate to */
String delegateTo = (String) options.get("delegateTo");

/* Now create the context for later use */
try {
loginContext = new LoginContext(delegateTo, new DelegateCallbackHandler());
} catch (LoginException e) {
log.warn("Initialize failed : " + e.getMessage());
}



The interesting part is the login() method where we get the username / password and store it for later, then we try to log into the delegate domain and if that succeeded we tell super that we had success, so that it can do its magic.


@Override
public boolean login() throws LoginException {
try {
// Get the username / password the user entered and save it for later use
usernamePassword = super.getUsernameAndPassword();

// Try to log in via the delegate
loginContext.login();

// login was success, so we can continue
identity = createIdentity(usernamePassword[0]);
useFirstPass=true;

// This next flag is important. Without it the principal will not be
// propagated
loginOk = true;

The loginOk flag is needed here so that the superclass will call LoginModule.commit() and pick up the principal along with the roles.
If it is not set to true, login() succeeds but no principal is attached.


if (debugEnabled) {
log.debug("Login ok for " + usernamePassword[0]);
}

return true;
} catch (Exception e) {
if (debugEnabled) {
log.debug("Login failed for : " + usernamePassword[0] + ": " + e.getMessage());
}
loginOk = false;
return false;
}
}


After success, super will call into the next two methods to obtain the principal and its roles:

@Override
protected Principal getIdentity() {
return identity;
}


@Override
protected Group[] getRoleSets() throws LoginException {

SimpleGroup roles = new SimpleGroup("Roles");

for (String role : rolesList ) {
roles.addMember( new SimplePrincipal(role));
}
Group[] roleSets = { roles };
return roleSets;
}


And now the last part is the Callback handler that the other domain that we delegate to will use to obtain the credentials from us. It is the classical JAAS login callback handler. One thing that first totally confused me was that this handler was called several times during login and I thought it is buggy. But in fact the number of times it is called corresponds to the number of login modules configured in the RHQUserSecurityDomain.


private class DelegateCallbackHandler implements CallbackHandler {
@Override
public void handle(Callback[] callbacks) throws IOException, UnsupportedCallbackException {

for (Callback cb : callbacks) {
if (cb instanceof NameCallback) {
NameCallback nc = (NameCallback) cb;
nc.setName(usernamePassword[0]);
}
else if (cb instanceof PasswordCallback) {
PasswordCallback pc = (PasswordCallback) cb;
pc.setPassword(usernamePassword[1].toCharArray());
}
else {
throw new UnsupportedCallbackException(cb,"Callback " + cb + " not supported");
}
}
}
}
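
To see the callback mechanics outside the container, here is a minimal standalone sketch that uses only the JDK's javax.security.auth.callback classes. FixedCredentialsHandler and askForCredentials are hypothetical names chosen for illustration; askForCredentials plays the role of one login module in the delegate domain asking the handler for the credentials.

```java
import java.io.IOException;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.callback.UnsupportedCallbackException;

/** Hypothetical handler that hands out one fixed username / password pair. */
class FixedCredentialsHandler implements CallbackHandler {
    private final String username;
    private final char[] password;

    FixedCredentialsHandler(String username, String password) {
        this.username = username;
        this.password = password.toCharArray();
    }

    @Override
    public void handle(Callback[] callbacks) throws IOException, UnsupportedCallbackException {
        for (Callback cb : callbacks) {
            if (cb instanceof NameCallback) {
                ((NameCallback) cb).setName(username);
            } else if (cb instanceof PasswordCallback) {
                ((PasswordCallback) cb).setPassword(password);
            } else {
                throw new UnsupportedCallbackException(cb, "Callback " + cb + " not supported");
            }
        }
    }

    /** Stands in for a login module: asks the handler for name and password once. */
    static String[] askForCredentials(CallbackHandler handler) {
        NameCallback nc = new NameCallback("user: ");
        PasswordCallback pc = new PasswordCallback("password: ", false);
        try {
            handler.handle(new Callback[] { nc, pc });
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (UnsupportedCallbackException e) {
            throw new IllegalStateException(e);
        }
        return new String[] { nc.getName(), new String(pc.getPassword()) };
    }
}
```

Each login module in the delegate domain would make a call like askForCredentials on its own, which fits the observation that the handler is invoked once per configured module.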



Again, the full code is available in the RHQ git repository.

Debugging (in EAP 6.1 alpha or later )



If you write such a login module and it does not work, you want to debug it. I started with the usual means and found that my login() method was working as expected, but the login still failed. I added print statements etc. and found that the getRoleSets() method was never called. But still everything looked ok. I did some googling and found this good wiki page. It is possible to tell a web app to do audit logging:


<jboss-web>
<context-root>rest</context-root>
<security-domain>RHQRESTSecurityDomain</security-domain>
<disable-audit>false</disable-audit>


This flag alone is not enough, as you also need an appropriate logger set up, which is explained on
the wiki page. After enabling this, I saw entries like


16:33:33,918 TRACE [org.jboss.security.audit] (http-/0.0.0.0:7080-1) [Failure]Source=org.jboss.as.web.security.JBossWebRealm;
principal=null;request=[/rest:….

So it became obvious that the login module did not set the principal. Looking at the code in the superclasses then led me to the loginOk flag mentioned above.

Now with everything correctly set up the audit log looks like


22:48:16,889 TRACE [org.jboss.security.audit] (http-/0.0.0.0:7080-1)
[Success]Source=org.jboss.as.web.security.JBossWebRealm;Step=hasRole;
principal=GenericPrincipal[rhqadmin(rest-user,)];
request=[/rest:cookies=null:headers=authorization=user-agent=curl/7.29.0,host=localhost:7080,accept=*/*,][parameters=][attributes=];

So here you see that the principal rhqadmin has logged in and got the role rest-user assigned, which is the one matching in the security-constraint element in web.xml.

Further viewing



I've presented the above as a Hangout on Air. Unfortunately G+ muted me from time to time when I was typing while explaining :-(

After the video was done I got a few more questions that at the end made me rethink the startup phase for the case that the user has a previous version of RHQ installed with LDAP enabled. In this case, the installer will still install the initial DB-only RHQUserSecurityDomain and then in the startup bean
we check if a) LDAP is enabled in system settings and b) if the login-modules are actually present.
If a) matches and they are not present we install them.

This Bugzilla entry also contains more information about this whole story.
Creating Https Connection Without javax.net.ssl.trustStore Property
Question: How can you use the simple Java API call java.net.URL.openConnection() to obtain a secure HTTP connection without having to set or use the global system property "javax.net.ssl.trustStore"? How can you make a secure HTTP connection and not even need a truststore?

I will show you how you can do both below.

First, some background. Java has a basic API to make a simple HTTP connection to any URL via URL.openConnection(). If your URL uses the "http" protocol, it is very simple to use this to make basic HTTP connections.

Problems creep in when you want a secure connection over SSL (via the "https" protocol). You can still use that API - URL.openConnection() will return a HttpsURLConnection if the URL uses the https protocol - however, you must ensure your JVM can find and access your truststore in order to authenticate the remote server's certificate.

[note: I won't discuss how you get your trusted certificates and how you put them in your truststore - I'll assume you know, or can find out, how to do this.]

You tell your JVM where your truststore is by setting the system property "javax.net.ssl.trustStore" and you tell your JVM how to access your truststore by giving your JVM the password via the system property "javax.net.ssl.trustStorePassword".

The problem is these are global settings (you often see instructions telling you to set these values via the -D command line arguments when starting your Java process) so everything running in your JVM must use that truststore. And you can't alter those system properties during runtime and expect those changes to take effect. Once you ask the JVM to make a secure connection, those system property values appear to be cached in the JVM and are used thereafter for the life of the JVM (I don't know exactly where in the JRE code these values are cached, but my experience shows me that they are). Changing those system properties later on in the lifetime of the JVM has no effect; the original values are forever used.

Another problem that some people run into is having the need for a truststore in the first place. Sometimes you don't have a requirement to authenticate the server endpoint; however, you would still like to send your data encrypted over the wire. You can't do this readily since the connection you obtain from URL.openConnection() will, by default, expect to use your truststore located at the path pointed to by the system property javax.net.ssl.trustStore.

To allow me to use different truststores for different connections, or to allow me to encrypt a connection but not authenticate the endpoint, I wrote a Java utility object that allows you to do just this.

The main constructor is this:

public SecureConnector(String secureSocketProtocol,
                       File   truststoreFile,
                       String truststorePassword,
                       String truststoreType,
                       String truststoreAlgorithm)

You pass it a secure socket protocol (such as "TLS") and your truststore file location. If the truststore file is null, the SecureConnector object will assume you do not want to authenticate the remote server endpoint and you only want to encrypt your over-the-wire traffic. If you do provide a truststore file, you need to provide its password, its type (e.g. "JKS"), and its algorithm (e.g. "SunX509") - if you pass in null for type and/or algorithm, the JVM defaults are used.

Once you create the object, just obtain a secure connection to any URL via a call to SecureConnector.openSecureConnection(URL). This expects your URL to have a protocol of "https". If successful, an HttpsURLConnection object is returned and you can use it like any other connection object. You do not need to set javax.net.ssl.trustStore (or any other javax.net.ssl system property) and, as explained above, you don't even need to provide a truststore at all (assuming you don't need to do any authentication).

The code for this is found inside of RHQ's agent - you can read its javadoc and look through SecureConnector code here.

The core code is found in openSecureConnection. I'll break it down:

First, it simply obtains the HTTPS connection object from the URL itself:

HttpsURLConnection connection = (HttpsURLConnection) url.openConnection();

Then it prepares a custom SSLContext object using the given secure socket protocol:

TrustManager[] trustManagers;
SSLContext sslContext = SSLContext.getInstance(getSecureSocketProtocol());

If no truststore file was provided, it will build its own "no-op" trust manager and "no-op" hostname verifier. These "no-op" objects always accept all certificates and hostnames, so they always allow the SSL communications to flow. This is how the authentication is bypassed:

if (getTruststoreFile() == null) {
    // configured to not care about authenticating server, encrypt but don't worry about certificates
    trustManagers = new TrustManager[] { NO_OP_TRUST_MANAGER };
    connection.setHostnameVerifier(NO_OP_HOSTNAME_VERIFIER);

If a truststore file was provided, then it will be loaded in memory and stored in a KeyStore instance:

} else {
    // need to configure SSL connection with truststore so we can authenticate the server.
    // First, create a KeyStore, but load it with our truststore entries.
    KeyStore keyStore = KeyStore.getInstance(getTruststoreType());
    keyStore.load(new FileInputStream(getTruststoreFile()), getTruststorePassword().toCharArray());

The truststore file's content (now stored in a KeyStore object) is used to initialize a trust manager. Unlike the "no-op" trust manager that was created above (if a truststore file was not provided), this trust manager really does perform authentication and it uses the provided truststore's certificates to authenticate the server being communicated with. This is why we no longer need to worry about the system properties "javax.net.ssl.trustStore" and "javax.net.ssl.trustStorePassword" - this builds its own trust manager using the data provided by the caller:

    // create truststore manager and initialize it with KeyStore we created with all truststore entries
    TrustManagerFactory tmf = TrustManagerFactory.getInstance(getTruststoreAlgorithm());
    tmf.init(keyStore);
    trustManagers = tmf.getTrustManagers();
}

Finally, the SSL context is initialized with the trust manager that was created earlier (either the "no-op" trust manager, or the trust manager that was initialized with the truststore's certificates). That SSL context is handed off to the SSL connection so the connection can use the context when it needs to perform authentication:

sslContext.init(null, trustManagers, null);
connection.setSSLSocketFactory(sslContext.getSocketFactory());

The connection is finally returned to the caller, fully configured and ready to be used.

return connection;

This is helpful for certain use cases. First, it is helpful when you have multiple truststores that you need to choose from when connecting to different servers as well as being able to switch truststores at runtime (remember, the system property values of javax.net.ssl.trustStore, et al. are fixed for the lifetime of the JVM - this helps bypass that restriction). This is also helpful in local testing, debugging and demo scenarios when you don't really need or care about setting up truststores and certificates but you do want to connect over https.
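
The same idea can be tried without the RHQ classes. The following is a minimal sketch using only JDK APIs, not the SecureConnector code itself; PerConnectionSslContext and its create method are hypothetical names. It assumes the caller has already loaded the truststore into a KeyStore, and it uses the JVM's default trust manager algorithm. Passing null selects a trust-all manager, which is only appropriate for testing and demos.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

/** Hypothetical helper that builds an SSLContext without touching javax.net.ssl.* system properties. */
class PerConnectionSslContext {

    /** Accepts every certificate chain: encryption without authentication. Testing only. */
    private static final X509TrustManager TRUST_ALL = new X509TrustManager() {
        @Override public void checkClientTrusted(X509Certificate[] chain, String authType) { }
        @Override public void checkServerTrusted(X509Certificate[] chain, String authType) { }
        @Override public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
    };

    static SSLContext create(String protocol, KeyStore trustStore) {
        try {
            TrustManager[] trustManagers;
            if (trustStore == null) {
                // no truststore: do not authenticate the server at all
                trustManagers = new TrustManager[] { TRUST_ALL };
            } else {
                // authenticate the server against the caller's own truststore
                TrustManagerFactory tmf =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
                tmf.init(trustStore);
                trustManagers = tmf.getTrustManagers();
            }
            SSLContext context = SSLContext.getInstance(protocol);
            context.init(null, trustManagers, null);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Cannot build SSL context", e);
        }
    }
}
```

A caller would pass create("TLS", keyStore).getSocketFactory() to HttpsURLConnection.setSSLSocketFactory() on each connection, so two connections in the same JVM can use different truststores. In the trust-all case the connection also needs a permissive hostname verifier, as in the code above.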

RHQ 4.7 released


RHQ 4.7 has been released. One of the two major features in this release is
the new charts, which have finally replaced the years-old charts that we had since the
start of the RHQ project:


Screenshot with new charts


The new charts are implemented with the awesome D3.js toolkit as I've written before.

The other big change is an upgrade of the underlying app server to JBoss EAP 6.1 alpha 1.

As always there have been many more smaller improvements and bug fixes.

Please check the full release notes for details. They also contain a list of commits.

RHQ is an extensible tool to monitor your infrastructure of machines and applications, alert operators on user-defined conditions, configure resources and run operations on them from a central web-based UI. Other ways of communicating with RHQ include a command line interface and a REST-api.

You can download the release from SourceForge.


As mentioned above, the old installer is gone, so make sure to read
the wiki document describing how to use the new installer.

Maven artifacts are available from the JBoss Nexus repository and should also be available from Central.



Please give us feedback, be it in Bugzilla, Mailing lists or the forum. Or just join us on IRC at irc.freenode.net/#rhq.

Deleting RHQ Agent Made Easier
In the past, if you wanted to remove an RHQ Agent from your RHQ environment, the simple answer was "just uninventory the platform" which would, under the covers, also remove the agent record completely.

However, in some cases, users found it difficult to remove their agent. Usually, what happens is they try to install the RHQ Agent, run into problems, and then get their RHQ system in a state that causes their RHQ Agent to not be able to register with the RHQ Server. For example, this could happen when a person runs the agent as a different user from before or with the -L command line option - both of which essentially purge the agent's security token and could cause the RHQ Server to reject future registration requests from the agent.

If a person does not understand the linkage between platform and agent, or if the agent's platform was never committed to inventory, it became difficult to understand how to get out of the quagmire.

This has now been addressed as an enhancement as requested by BZ 849711. Now, the answer is simple - regardless of whether the platform is in inventory or not, and even if the agent's resources do not yet show up in the discovery queue - you have a way to quickly purge your agent from the system. This will allow you to get back to a clean slate and attempt to re-install your agent.

You do this by going to the top Administration page and selecting the "Agents" item. From here, you see the list of all the agents currently registered in your RHQ environment. If you select one or more of them, you now have the option of pressing the new "Delete" button at the bottom. This will do a few things. First, if the agent's platform is already in inventory, it will uninventory that platform. This means that platform and all its child servers and services will be removed (so be careful and make sure you really want to do this - you will lose all manageability and all audit history for all resources previously managed by that agent). Once that is done, all resources will disappear from the inventory and you won't even see any resources for that agent in the Discovery Queue. Finally, the agent's record itself is removed - so the Administration>Agents page will show that the agents have disappeared.

 
With the agent and its resources completely removed, you have the option to attempt to re-install the agent if you wish to bring it back.

You can also use this feature if your managed infrastructure has changed and you no longer want to manage a machine. Just select the agent that was responsible for managing that machine and delete it.

Note that if your agent is still running, it will attempt to re-register itself! So if you no longer wish to manage a machine, make sure you shut down the agent as well (you'll obviously want to do this anyway, since you won't want an RHQ Agent consuming resources on a machine that you no longer want managed by RHQ).
Awesome new graphs in RHQ - based on d3.js


Yesterday Mike Thompson presented the latest and greatest version of the new graphs for RHQ in a video on YouTube. Shortly afterwards he committed the results of his huge work into the RHQ master branch.


While this work is not yet finished, it is the result of the work started by Denis Krusko in last year's Google Summer of Code. At the moment both the old and new graphs can be looked at in the UI, so that you can compare them and potentially report mismatches.


Here are some screenshots to whet your appetite:


Popup chart for a single metric

Individual metric on the monitoring tab


As the subject already says, those graphs are made with the help of the awesome D3.js framework - I will let Mike chime in to describe in more detail what he and Denis had to do to get this to work inside GWT+SmartGWT.

I've uploaded a snapshot from master as of this morning (my time) of this from our CI server onto SourceForge for you to try. THIS IS NOT FOR PRODUCTION.


There is a known issue where the red bar shows "..global exception.."; this is harmless and we will fix it. Also, the graph portlets in the dashboard don't honor the column width yet.


Please do not forget to report bugs (if there are any :-)

Sorry Miss Jackson -- or how I loved to do custom Json serialization in AS7 with RestEasy
Who doesn't remember the awesome "Sorry Miss Jackson" video ?

Actually that doesn't have anything to do with what I am talking about here -- except that RestEasy inside JBossAS7 internally uses the Jackson json processing library. But let me start from the beginning.

Inside the RHQ REST api we have exposed links (in json representation) like this:

{ 
"rel":"self",
"href":"http://...
}

which is the natural format when a Pojo

class Link {
String rel;
String href;
}

is serialized (with the Jackson provider in RestEasy).

Now while this is pretty cool, there is the disadvantage that if you need the link for a rel of 'edit', you have to load the list of links, iterate over them, check for each link whether its rel is 'edit' and then take the value of the href.

And there is this Bugzilla entry.

I started investigating this and thought, I will use a MessageBodyWriter and all is well. Unfortunately MessageBodyWriters do not work recursively and thus cannot format Link objects as part of another object in a special way.

Luckily I have done custom serialization with Jackson in the as7-plugin before, so I tried this, but the Serializer was not even instantiated. More trying and fiddling and a question on StackOverflow led me to a solution by copying the jackson libraries (core, mapper and jaxrs) into the lib directory of the RHQ ear and then all of a sudden the new serialization worked. The output is now

{ "self":
{ "href" : "http://..." }
}


So far so nice.
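
The difference for a client can be sketched in a few lines of plain Java. This is only an illustration of the two lookup styles, not RHQ or Jackson code; LinkLookupDemo and its methods are hypothetical names and the example.invalid URLs are placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical demo of looking up a link by its rel in both representations. */
class LinkLookupDemo {

    static class Link {
        final String rel;
        final String href;

        Link(String rel, String href) {
            this.rel = rel;
            this.href = href;
        }
    }

    /** Old format: a list of {rel, href} objects that has to be scanned. */
    static String findHref(List<Link> links, String rel) {
        for (Link link : links) {
            if (link.rel.equals(rel)) {
                return link.href;
            }
        }
        return null; // no link with that rel
    }

    /** New format: the rel is the key, so the client does a direct lookup. */
    static Map<String, String> byRel(List<Link> links) {
        Map<String, String> result = new LinkedHashMap<String, String>();
        for (Link link : links) {
            result.put(link.rel, link.href);
        }
        return result;
    }

    static List<Link> sampleLinks() {
        List<Link> links = new ArrayList<Link>();
        links.add(new Link("self", "http://example.invalid/resource/1"));
        links.add(new Link("edit", "http://example.invalid/resource/1/edit"));
        return links;
    }

    public static void main(String[] args) {
        System.out.println(findHref(sampleLinks(), "edit"));
        System.out.println(byRel(sampleLinks()).get("edit"));
    }
}
```

Both calls in main return the same href; the second form is what the rel-keyed JSON output gives a client for free.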

Now the final riddle was how to use the jackson libraries that are already in use by the resteasy-jackson-provider. This was solved by adding respective entries to jboss-deployment-structure.xml, which ends up in the META-INF directory of the rhq.ear directory.

The content in this jboss-deployment-structure.xml then looks like this (abbreviated):


<sub-deployment name="rhq-enterprise-server-ejb3.jar">
<dependencies>
....
<module name="org.codehaus.jackson.jackson-core-asl" export="true"/>
<module name="org.codehaus.jackson.jackson-jaxrs" export="true"/>
<module name="org.codehaus.jackson.jackson-mapper-asl" export="true"/>
</dependencies>
</sub-deployment>


AS7 is still complaining a bit at startup:


16:48:58,375 WARN [org.jboss.as.dependency.private] (MSC service thread 1-3) JBAS018567: Deployment "deployment.rhq.ear.rhq-enterprise-server-ejb3.jar" is using a private module ("org.codehaus.jackson.jackson-core-asl:main") which may be changed or removed in future versions without notice.


but for the moment the result is good enough - and as we do not regularly change the underlying container for RHQ, this is tolerable.

And of course I would be interested in an "official" solution to that -- other than just doubling those jars into my application (and actually I think this may even complicate the situation, as (re-)using the jackson jars that RestEasy inside as7 uses, guarantees that the version is compatible).

RHQ 4.6 released

As I've mentioned before, the RHQ team has been very busy since RHQ 4.5.1 (and actually already before that) and has switched the application server it uses to JBoss AS 7.1. Directly after the switchover we have posted a first alpha version.


Now after more work and fixes, we are happy to provide the release 4.6 of RHQ, that has all the issues resolved that arose from the switch. Features of this release are:

  • The internal app server is now JBossAS 7.1.1
  • GWT has been upgraded to version 2.5
  • There is a new installer (this has also changed since the 4.6 alpha release)
  • The REST-Api has been enhanced
  • Korean translations have been added (contributed by SungUk Jeon)
  • Webservices have been removed
  • Building RHQ now requires Java7, but it will still run on Java6

See the full release notes for details. They also contain a list of commits.

You can download the release from SourceForge.


As mentioned above, the old installer is gone, so make sure to read
the wiki document describing how to use the new installer.

Maven artifacts are available from the JBoss Nexus repository and should soon also be available from Central.

We would also like to say thank you to our contributors for this release:

  • Jürgen Hoffmann
  • Richard Hensman
  • SungUk Jeon


Please give us feedback, be it in Bugzilla, Mailing lists or the forum. Or just join us on IRC at irc.freenode.net/#rhq.
Best practice for paging in RESTful APIs (updated)?
In the RHQ-REST api, we return individual objects and also collections of objects. Some of those collections are rather small (number of platforms), while others can grow a lot (number of resources in total or number of alerts fired). For the latter it is advised to do some paging and not return the full result set in one go. Some of the reasons are:
  • Memory consumption in server and client
  • Bandwidth needed to transfer the data
  • Latency to transfer huge amounts of data over slow networks


Inside RHQ we have the concept of a PageList<?> where an internal PageControl object defines the page size and other criteria like sorting. The PageList then only contains the objects from a certain page. I think this is a pretty common setup.

And here is where my question comes:

What is the "best-practice" to represent such a PageList in a RESTful api? So far I have seen two major ways:
  1. Add a Link: header that contains the prev and next relations. This is what RFC 5988 suggests and what projects like AeroGear use. The advantage here is that the body still contains the "raw" data and not meta data. And for both cases of 'single' object and 'collection' the data is at the 'root' of the body. Also paging is available for HEAD requests.

    On the other hand, it may get a bit harder for some client code (JavaScript, jQuery) to access the header and make use of the paging links
  2. Put the prev and next relations in the body of the response next to the collection. This has the advantage that there is no need to parse the http header. The disadvantage is that the real payload is now shifted "one level down" for collections.


    I sort of see the paging links as meta-data and think that this should not be mixed with the payload. Now a colleague of mine said: "Isn't that a state change link for the collection like the 'rel=edit' for a single object?". This sounds odd, but can't be denied.

Actually I have also seen mentioning the use of cookies to send the paging information, but that looks very non-transparent to me, so I am not considering this at all.

Just to be clear: I am explicitly talking about paging of collections and not about affordances of individual objects.

So are there established best practices? How do others do it?

If going for the Link: header: would people rather like to see multiple Link headers (see RFC 2616), one for each relation:

Link: <http://foo/?page=3>; rel='next'
Link: <http://foo/?page=1>; rel='prev'

or rather the combined way:

Link: <http://foo/?page=3>; rel='next', <http://foo/?page=1>; rel='prev'

that is listed in RFC 5988?

[update]

I just saw that URLConnection.getHeaderField(name) does not support the multiple Link: headers as it only returns the last occurrence:

If called on a connection that sets the same header multiple times with possibly different values, only the last value is returned.


While there may be other ways to access all the Link: headers, this is too obvious a pitfall, and it can be prevented by not using that style.
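
To make the two styles concrete, here is a minimal sketch that writes the combined form and reads either form back into a map of rel to URL. LinkHeaders, format and parse are hypothetical names, and the parser is deliberately simplified: it assumes one rel value per link and ignores other link parameters. On the client side, URLConnection.getHeaderFields() returns every value of a header as a list, which could be fed to parse.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for paging links carried in Link: headers. */
class LinkHeaders {

    // <url>; rel=value, with the value optionally in single or double quotes
    private static final Pattern LINK =
        Pattern.compile("<([^>]*)>\\s*;\\s*rel\\s*=\\s*[\"']?([^\"',;\\s]+)[\"']?");

    /** Builds the value of one combined Link header from a rel -> URL map. */
    static String format(Map<String, String> relToUrl) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : relToUrl.entrySet()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append('<').append(entry.getValue()).append(">; rel=\"")
              .append(entry.getKey()).append('"');
        }
        return sb.toString();
    }

    /** Reads one combined value or several separate values into a rel -> URL map. */
    static Map<String, String> parse(List<String> headerValues) {
        Map<String, String> relToUrl = new LinkedHashMap<String, String>();
        for (String value : headerValues) {
            Matcher m = LINK.matcher(value);
            while (m.find()) {
                relToUrl.put(m.group(2), m.group(1));
            }
        }
        return relToUrl;
    }
}
```

Because parse accepts a list, a client written this way does not care which of the two styles the server picks.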

"Nested Transactions" and Timeouts
While coding up some EJB3 SLSB methods, the following question came up:

If a thread is already associated with a transaction, and that thread calls another EJB3 SLSB method annotated with "REQUIRES_NEW", the thread gets a new transaction (a "pseudo-nested" one; for ease of writing I'll simply call it the "nested" transaction, even though it is not really "nested" in the true sense of the term). But what happens to the parent transaction while the "nested" transaction is active? Does the parent transaction's timeout keep counting down, or is it suspended, to resume counting down only when the "nested" transaction completes?

For example, suppose I enter a transaction context by calling method A and this method has a transaction timeout of 1 minute. Method A then immediately calls a REQUIRES_NEW method B, which itself has a 5 minute timeout. Now suppose method B takes 3 minutes to complete. That is within B's allotted 5 minute timeout so it returns normally to A. A then immediately returns.

But A's timeout is 1 minute! B took 3 minutes on its own. Even though the amount of time A took within itself was well below its allotted 1 minute timeout, its call to B took 3 minutes.

What happens? Does A's timer "suspend" while its "nested" transaction (created from B) is still active?  Or does A's timer keep counting down, regardless of whether or not B's "nested" transaction is being counted down at the same time (and hence A will abort with a timeout)?

Here's some code to illustrate the use-case (this is what I actually used to test this):

@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
@TransactionTimeout(value = 5, unit = TimeUnit.SECONDS)
public void testNewTransaction() throws InterruptedException {
   log.warn("~~~~~ Starting new transaction with 5s timeout...");
   LookupUtil.getTest().testNewTransaction2();
   log.warn("~~~~~ Finishing new transaction with 5s timeout...");
}
@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
@TransactionTimeout(value = 10, unit = TimeUnit.SECONDS)
public void testNewTransaction2() throws InterruptedException {
   log.warn("~~~~~ Starting new transaction with 10s timeout...sleeping for 8s");
   Thread.sleep(8000);
   log.warn("~~~~~ Finishing new transaction with 10s timeout...");
}
I don't know what any of the EE specs say about this, but it doesn't matter - all I need to know is how JBossAS7 behaves :) So I ran this test on JBossAS 7.1.1.Final and here's what the log messages say:
17:51:22,935 ~~~~~ Starting new transaction with 5s timeout...
17:51:22,947 ~~~~~ Starting new transaction with 10s timeout...sleeping for 8s
17:51:27,932 WARN  ARJUNA012117: TransactionReaper::check timeout for TX 0:ffffc0a80102:-751071d8:5115811c:449 in state  RUN
17:51:27,936 WARN  ARJUNA012121: TransactionReaper::doCancellations worker Thread[Transaction Reaper Worker 0,5,main] successfully canceled TX 0:ffffc0a80102:-751071d8:5115811c:449
17:51:30,948 ~~~~~ Finishing new transaction with 10s timeout...
17:51:30,949 ~~~~~ Finishing new transaction with 5s timeout...
17:51:30,950 WARN  ARJUNA012077: Abort called on already aborted atomic action 0:ffffc0a80102:-751071d8:5115811c:449
17:51:30,951 ERROR JBAS014134: EJB Invocation failed on component TestBean for method public abstract void org.rhq.enterprise.server.test.TestLocal.testNewTransaction() throws java.lang.InterruptedException: javax.ejb.EJBTransactionRolledbackException: Transaction rolled back
...
Caused by: javax.transaction.RollbackException: ARJUNA016063: The transaction is not active!
   at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1155) [jbossjts-4.16.2.Final.jar:]
   at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:117) [jbossjts-4.16.2.Final.jar:]
   at com.arjuna.ats.jbossatx.BaseTransactionManagerDelegate.commit(BaseTransactionManagerDelegate.java:75)
   at org.jboss.as.ejb3.tx.CMTTxInterceptor.endTransaction(CMTTxInterceptor.java:92) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
So it is clear that, at least for JBossAS7's transaction manager, the parent transaction's timer is not suspended, even if a "nested" transaction is activated. You see I enter the first method (which activates my first transaction) at 17:51:22, and immediately enter the second method (which activates my second, "nested", transaction at the same time of 17:51:22). My first transaction has a timeout of 5 seconds, my second "nested" transaction has a timeout of 10 seconds. My second method sleeps for 8 seconds, so it should finish at 17:51:30 (and it does if you look at the log messages at that time). BUT! Prior to that, my first transaction is aborted by the transaction manager at 17:51:27 - exactly 5 seconds after my first transaction was started. So, clearly my first transaction's timer was not suspended and was continually counting down even as my "nested" transaction was active.

So, in short, the answer is (for JBossAS7 at least) - a transaction timeout is always counting down and starts as soon as the transaction activates. It never pauses nor suspends due to "nested" transactions.
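The observed behavior can be modeled in a few lines of plain Java. This is only a sketch of the timing rule described above, not code that talks to a transaction manager; the class and method names are made up for illustration, and it assumes the outer timer simply runs from the outer start without ever pausing.

```java
// Hypothetical model of the timing rule observed on JBossAS 7.1.1.Final:
// the outer transaction's timer keeps running while the inner
// REQUIRES_NEW transaction is active.
class NestedTimeoutSketch {

    /**
     * Returns the number of seconds after the outer start at which the outer
     * transaction is cancelled, or -1 if it completes within its timeout.
     */
    static long outerCancelledAt(long outerTimeoutSec, long outerOwnWorkSec, long innerDurationSec) {
        // Time spent inside the inner transaction counts fully against the outer timer.
        long outerElapsed = outerOwnWorkSec + innerDurationSec;
        return outerElapsed > outerTimeoutSec ? outerTimeoutSec : -1;
    }

    public static void main(String[] args) {
        // The test above: 5s outer timeout, inner method sleeps for 8s.
        System.out.println(outerCancelledAt(5, 0, 8));    // prints 5
        // The A/B example: 1 minute outer timeout, B takes 3 minutes.
        System.out.println(outerCancelledAt(60, 0, 180)); // prints 60
        // An outer timeout that is long enough to cover the inner call.
        System.out.println(outerCancelledAt(10, 1, 8));   // prints -1
    }
}
```

The first call matches the log above, where the reaper cancelled the outer transaction 5 seconds after it started.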

When an annotation processor does not process...
In RHQ we have the rest-docs-generator that takes annotations from the code and turns them into xml. As you may have guessed from the last sentence this is implemented as a Java annotation processor and used to work quite well.

Yesterday I wanted to run it on the latest version of the code (we don't run it on every build, as it takes some time with all the backend processing) and it failed. Looking at the processor itself and its test runs did not reveal anything, as they continued to work.

As we use the maven processor plugin, I thought this may fail because we now use Java7 to build (but then I used that before too) and upgraded the plugin, but that did not help. In the end I switched to the maven compiler plugin, which spit out a ton of errors and stacktraces. It turned out that one of the classes on the classpath had an unsatisfied dependency and the annotation processor just "died" silently before.

After adding the dependency, the errors were gone, but the Processor.init() method was still not called and no processing happened. Looking through tons of output I found this:


Processor <hidden to protect the innocent> matches
[javax.persistence.PersistenceContext,
[……]
javax.ws.rs.Consumes,
javax.interceptor.Interceptors, com.wordnik.swagger.annotations.Api,
javax.ejb.Startup] and returns true.

This "returns true" together with the list of annotations I am interested in means that this other processor that is now in the classpath (probably pulled in when we switched from as4 to as7) swallows all those annotations, so that they are not passed to our processor.
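To illustrate what "returns true" means here: the return value of Processor.process() tells the compiler whether the processor claims the annotation types it was handed. The following is a minimal sketch of a well-behaved processor that only looks at annotations and does not claim them. The class name and the counter are hypothetical; this is neither the rest-docs-generator nor the other processor.

```java
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.TypeElement;

// Hypothetical processor that inspects javax.ws.rs.Consumes without claiming it.
@SupportedAnnotationTypes("javax.ws.rs.Consumes")
class NonClaimingProcessor extends AbstractProcessor {

    // Number of annotation types seen so far, only kept for illustration.
    int annotationTypesSeen = 0;

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latestSupported();
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        annotationTypesSeen += annotations.size();
        // Returning false leaves the annotation types unclaimed, so other
        // processors still get to see them. Returning true would claim them
        // and subsequent processors would not be asked to process them.
        return false;
    }
}
```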

The solution in my case was to explicitly name the rest-docs-generator in the compiler plugin configuration and not to rely on the auto-discovery (I did that in the processor plugin already, but it looks like this had no effect in my case) :


<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.0</version>
<configuration>
<annotationProcessors>
<processor>org.rhq.helpers.rest_docs_generator.ClassLevelProcessor</processor>
</annotationProcessors>

<proc>only</proc>
<compilerArguments>
<AtargetDirectory>${project.build.directory}/docs/xml</AtargetDirectory>
<AmodelPkg>org.rhq.enterprise.server.rest.domain</AmodelPkg>
<!-- enable the next line to have the output of the processor shown on console -->
<!--<Averbose>true</Averbose>-->
</compilerArguments>
<!-- set the next to true to enable verbose output of the compiler plugin -->
<!--<verbose>false</verbose>-->
</configuration>
[…]


Note that to find the above "Processor .. matches .." output, the compiler plugin must be set to verbose.

I meanwhile also heard that a newer version of that "rogue" processor now only claims its own annotations.
Recap of RHQ @ LJCs first Meet a Project event

Last Thursday I was in London at the London Java Community's (LJC) first "Meet a project" event.

Getting there started with an aborted take-off in Stuttgart. The plane accelerated on the runway and then all of a sudden did a full braking. We exited onto the movement area, parked there for some minutes and then circled back on to the runway to finally take off. But anyway, I arrived safe and early enough in London.

The "Meet a Project" (MaP) event was held at the University College London campus. When I arrived a few attendees were already there and soon after Barry, the organizer joined too. We started in one room to explain "the rules" and then split into three rooms.


Myself talking at the back table in the last of the six sessions.


To explain how MaP works, one can think of "speed dating for projects". There were six projects present and thus six groups were formed, each sitting around a table. The project ambassadors (like myself) then spent 15 mins per table to present the project, explain about open source in general and give hints where and how the attendees may get involved and help the project.

As I did not know what to expect (and as this was the first incarnation of the MaP, no one really did), I created a small slide show to explain about RHQ, and had a handout prepared to give to interested attendees. For the individual sessions I always took the full 15mins. Almost all attendees were very interested and I distributed over 20 handouts.

After the sessions were over at around 8:30pm we went to a hotel bar to do some socializing, and then Manik, Davide, Sanne, Richard Warburton from jClarity and myself went to an Indian restaurant to finally have dinner.

I cannot yet tell if the event was a success in the sense that the RHQ project really got any new contributors. What I can tell is that the format of "speed dating for projects" felt really good to me, as with the small groups one could have an intensive session with direct feedback on whether the concepts were clear. With around 50 attendees I am happy to have given away 20 handouts. And while socializing, a few attendees told me that they had never heard of RHQ before and that it had been good to hear about it. And one lady switched tables to be able to listen to me before she had to leave early :)
Monitoring the monster

The classical RHQ setup assumes an agent and agent plugins being present on a machine ("platform" in RHQ-speak). Plugins then communicate with the managed resource (e.g. an AS7 server); ask it for metric values or run operations (e.g. "reboot").

This article shows an alternative way of monitoring applications, using the Ticket Monster application from the JBoss Developer Framework as an example.


The communication protocol between the plugin and the managed resource is dependent on the capabilities of that resource. If the resource is a java process, JMX is often used. In the case of JBoss AS 7, we use the DMR over http protocol. For other kinds of resources this could also be file access or jdbc in case of databases. The next picture shows this setup.

RHQ classic setup


The agent plugin talks to the managed resource and pulls the metrics from it. The agent collects the metrics from multiple resources and then sends them as batches to the RHQ server, where they are stored, processed for alerting and can be viewed in the UI or retrieved via CLI and REST-api.

Extending

The above scenario is of course not limited to infrastructure and can also be used to monitor applications that sit inside e.g. AS7. You can write a plugin that uses the underlying connection to talk to the resource and gather statistics from there (if you build on top of the jmx or as7 plugin, you don't necessarily need to write java-code).

This also means that you need to add hooks to your application that export metrics and make them available in the management model (the MBean-Server in classical jre; the DMR model in AS7), so that the plugin can retrieve them.

Pushing from the application

Another way to monitor application data is to have the application push data to the RHQ-server directly. In this case you still need to have a plugin descriptor in order to define the meta data in the RHQ server (what kinds of resources and metrics exist, what units do the metrics have etc.). In this case you only need to define the descriptor and don't write java code for the plugin. This works by inheriting from the No-op plugin. In addition to that, you can also just deploy that descriptor as jar-less plugin.

The next graphic shows the setup:

RHQ with push from TicketMonster


In this scenario you can still have an agent with plugins on the platform, but this is not required (but recommended for basic infrastructure monitoring). On the server side we deploy the ticket-monster plugin descriptor.

The TicketMonster application has been augmented to push each booking to the RHQ server as two metrics for total number of tickets sold and the total price of the booking (BookingService.createBooking()).


@Stateless
public class BookingService extends BaseEntityService<Booking> {

@Inject
private RhqClient rhqClient;

public Response createBooking(BookingRequest bookingRequest) {
[…]
// Persist the booking, including cascaded relationships
booking.setPerformance(performance);
booking.setCancellationCode("abc");
getEntityManager().persist(booking);
newBookingEvent.fire(booking);
rhqClient.reportBooking(booking);
return Response.ok().entity(booking)
.type(MediaType.APPLICATION_JSON_TYPE).build();


This push happens over a http connection to the REST-api of the RHQ server, which is defined inside the RhqClient singleton bean.

In this RhqClient bean we read the rhq.properties file on startup to determine if there should be any reporting at all and how to access the server. If reporting is enabled we try to find the platform we are running on and if the RHQ server does not know about it, create it. On top of the platform we create the TicketMonster instance. This is safe to do multiple times, as would be the platform creation - I am looking for an existing platform where an agent might perhaps already monitor the basic data like cpu usage or disk utilization.

The reporting of the metrics then looks like this:


@Asynchronous
public void reportBooking(Booking booking) {

if (reportTo && initialized) {
List<Metric> metrics = new ArrayList<Metric>(2);

Metric m = new Metric("tickets",
System.currentTimeMillis(),
(double) booking.getTickets().size());
metrics.add(m);

m = new Metric("price",
System.currentTimeMillis(),
(double) booking.getTotalTicketPrice());
metrics.add(m);

sendMetrics(metrics, ticketMonsterServerId);
}
}


Basically we construct two Metric objects and then send them to the RHQ-Server. The second parameter of sendMetrics() is the resource id of the TicketMonster server resource in the RHQ-server, which we obtained from the create-request I've mentioned above.
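The body of sendMetrics() is not shown here. As a rough idea of what gets pushed, the following sketch turns a list of metrics into a JSON array. It is an assumption-laden illustration: the Metric class is a stand-in with the same three constructor arguments as above, the JSON field names ("metric", "timestamp", "value") are guesses and not taken from the RHQ REST-api, and metric names are assumed to need no JSON escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the payload-building part of sendMetrics().
class MetricPayloadSketch {

    // Minimal value class mirroring new Metric(name, timestamp, value).
    static final class Metric {
        final String name;
        final long timestamp;
        final double value;

        Metric(String name, long timestamp, double value) {
            this.name = name;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Serializes the metrics by hand; the field names are assumptions.
    static String toJson(List<Metric> metrics) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < metrics.size(); i++) {
            Metric m = metrics.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"metric\":\"").append(m.name).append("\",")
              .append("\"timestamp\":").append(m.timestamp).append(',')
              .append("\"value\":").append(m.value).append('}');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        List<Metric> metrics = new ArrayList<>();
        metrics.add(new Metric("tickets", 1000L, 3.0));
        metrics.add(new Metric("price", 1000L, 59.5));
        System.out.println(toJson(metrics));
    }
}
```

In a real client one would rather let a JSON library do the serialization; the hand-rolled version just keeps the sketch free of dependencies.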

A difference to the classical setup, where the MeasurementData objects inside RHQ always have a so-called schedule id associated, is that in the above case we pass the metric name as it appears in the deployment descriptor and let the RHQ server sort out the schedule id.


<metric property="tickets" dataType="measurement"
displayType="summary" description="Total number tickets sold"/>
<metric property="price"
displayType="summary" description="Total selling price"/>


And voilà, this is what the sales look like that are created from the Bot inside TicketMonster:

Bookings in the RHQ-UI


The display interval has been set to "last 12 minutes". If you see a bar, that means that within the timeslot of 12mins/60 slots = 12sec, there were multiple bookings. In this case the bar shows the max and min value, while the little dot inside shows the average (via Rest-Api it is still possible to see the individual values for the last 7 days).
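The slot arithmetic and the min/avg/max display can be sketched in plain Java. This is not the RHQ charting code, just an illustration of the bucketing idea under the assumption that each value lands in the slot given by its offset from the start of the display interval; all names are made up.

```java
import java.util.DoubleSummaryStatistics;

// Hypothetical sketch: bucket raw values into display slots and keep min/avg/max.
class SlotAggregationSketch {

    // 12 minutes split into 60 slots gives 12 seconds per slot.
    static long slotWidthMillis(long intervalMillis, int slots) {
        return intervalMillis / slots;
    }

    // Returns one statistics object per slot; slots without data stay null.
    static DoubleSummaryStatistics[] aggregate(long start, long intervalMillis, int slots,
                                               long[] timestamps, double[] values) {
        long width = slotWidthMillis(intervalMillis, slots);
        DoubleSummaryStatistics[] result = new DoubleSummaryStatistics[slots];
        for (int i = 0; i < timestamps.length; i++) {
            long offset = timestamps[i] - start;
            if (offset < 0 || offset >= intervalMillis) {
                continue; // outside of the display interval
            }
            // Guard against a rounding overflow when the interval is not evenly divisible.
            int slot = (int) Math.min(offset / width, slots - 1);
            if (result[slot] == null) {
                result[slot] = new DoubleSummaryStatistics();
            }
            result[slot].accept(values[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        long interval = 12 * 60 * 1000L;
        System.out.println(slotWidthMillis(interval, 60)); // prints 12000
        DoubleSummaryStatistics[] stats = aggregate(0L, interval, 60,
                new long[] {0L, 5000L, 13000L}, new double[] {2.0, 4.0, 3.0});
        // Two bookings fall into the first slot, so it would be drawn as a bar.
        System.out.println(stats[0].getMin() + " " + stats[0].getAverage() + " " + stats[0].getMax());
    }
}
```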

Why would I want to do that?

The question here is of course why would I want to send my business metrics to the RHQ server, that is normally used for my infrastructure monitoring?

Because we can! :-)

Seriously, such business metrics are also able to indicate issues. If e.g. the number of ticket bookings is unusually high or low, this can also be a source of concern and warrants an alert. Take the example of E-Shops that sell electronics and where it happened that someone made a typo and offered laptops that are normally sold at €1300 and are now selling at €130. That news is quickly spread via social networks and sales triple over their normal numbers. Here monitoring the number of laptops sold can be helpful.

The other reason is that RHQ with its user concept allows to set up special users that only have (read) access to the TicketMonster resources, but not to other resources inside RHQ. This way it is possible to give the business people access to the metrics from monitoring the ticket sales.

Resource tree all resources


On the left you see the resource tree below the platform "snert" with all the resources as e.g. the 'rhqadmin' user sees it. On the right side, you see the tree as a user that only has the right to see the TicketMonster server ("TM").

TODOs

The above is a proof of concept to get this started. There are still some things left to do:

  • Create sub-resources for performances and report their booking separately
  • Regularly report availability of the TicketMonster server
  • Better separate out the internal code that still lives in the RhqClient class
  • Get the above incorporated into TicketMonster proper - currently it lives in my private github repo
  • Decide how to better handle an RHQ server that is temporarily unavailable
  • Get Forge to create the "send metric / … " code automatically when a class or field has some special annotation for this purpose. Same for the creation of new items like Performances in the TM case.


If you want to try it, you need a current copy of RHQ master -- the upcoming RHQ 4.6 release will have some of the changes on RHQ side that are needed. The RHQ 4.6 beta does not yet have them.
RHQ 4.6 beta released

The RHQ team has been very busy since RHQ 4.5.1 (and actually already before that) and has switched the application server it uses to JBoss AS 7.1. Directly after the switchover we have posted a first alpha version.


Now after more work and fixes, we are happy to provide a beta version of RHQ 4.6, that has all the issues resolved that arose from the switch. Features of this release are

  • The internal app server is now JBossAS 7.1
  • GWT has been upgraded to version 2.5
  • There is a new installer (this has also changed since the 4.6 alpha release)
  • The REST-Api has been enhanced
  • Korean translations have been added (contributed by SungUk Jeon)


You can download the release from SourceForge.


This wiki document describes how to use the new installer.

The first version of the download did unfortunately not contain the Korean locale -- that is now fixed. If you already have downloaded the zip and do not need the Korean locale, then you don't need to re-download.

Please try the release and give us feedback, be it in Bugzilla, Mailing lists or the forum.

AlertDefinitions in the RHQ REST-Api

I have in the last few days added the support for alert definitions to the REST-Api in RHQ master.
Although this will make it into RHQ 4.6, it is not the final state of affairs.

On top of the API implementation I have also written 27 tests (for the alert part; at the time of writing this posting) that use Rest Assured to test the api.

Please try the API, give feedback and report errors; if possible as Rest Assured tests, to increase the
test coverage and as an easy way to reproduce your issues.

I think it would also be very cool if someone could write a script in whatever language that exports definitions and applies them to a resource hierarchy on a different server (e.g. from test to production).
Korean translations contributed to RHQ
Login screen


Thanks to SungUk Jeon we now have Korean translations of the RHQ UI. Those will first appear in the upcoming RHQ 4.6 release.

Dashboard
Individual resource


If Korean is not your default locale, you can switch to it by appending ?locale=ko to the url of the RHQ-ui, just like http://localhost:7080/coregui?locale=ko.

Thanks a lot, SungUk
Testing REST-apis with Rest Assured
The REST-Api in RHQ is evolving and I had long ago started writing some integration tests against it.
I did not want to do that with pure http calls, so I was looking for a testing framework and found one that I used for some time. I tried to enhance it a bit to better suit my needs, but didn't really get it to work.

I started searching again and this time found Rest Assured, which is almost perfect. Let's have a look at a very simple example:


expect()
.statusCode(200)
.log().ifError()
.when()
.get("/status/server");


As you can see, this is a fluent API that is very expressive, so I don't really need to explain what the above is supposed to do.

In the next example I'll add some authentication


given()
.auth().basic("user","name23")
.expect()
.statusCode(401)
.when()
.get("/rest/status");


Here we add "input parameters" to the call, which are in our case the information for basic authentication, and expect the call to fail with a "bad auth" response.

Now it is tedious to always provide the authentication bits throughout all tests, so it is possible to tell Rest Assured to always deliver a default set of credentials, which can still be overwritten as just shown:


@Before
public void setUp() {
RestAssured.authentication = basic("rhqadmin","rhqadmin");
}


There are a lot more options to set as default, like the base URI, port, basePath and so on.

Now let's have a look on how we can supply other parameters


AlertDefinition alertDefinition = new AlertDefinition(….);

AlertDefinition result =
given()
.contentType(ContentType.JSON)
.header(acceptJson)
.body(alertDefinition)
.log().everything()
.queryParam("resourceId", 10001)
.expect()
.statusCode(201)
.log().ifError()
.when()
.post("/alert/definitions")
.as(AlertDefinition.class);


We start with creating a Java object AlertDefinition that we use for the body of the POST request. We define that it should be sent as JSON and that we expect JSON back. For the URL, a
query parameter with the name 'resourceId' and value '10001' should be appended.
We also expect that the call returns a 201 - created and would like to know the details if this is not the case.
Last but not least we tell RestAssured, that it should convert the answer back into an object of type AlertDefinition which we can then use to check constraints or further work with it.

Rest Assured offers another interesting and built-in way to check constraints with the help of XPath or its JSON-counterpart JsonPath:


expect()
.statusCode(200)
.body("name",is("discovery"))
.when()
.get("/operation/definition");


In this (shortened) example we expect that the GET-call returns OK and an object that has a body field with the name 'name' and the value 'discovery'.

Conclusion

Rest Assured is a very powerful framework to write tests against a REST/hypermedia api. With its fluent approach and the expressive method names it allows to easily understand what a certain call is supposed to do and to return.

The Rest Assured web site has more examples and documentation. The RHQ code base now also has >70 tests using that framework.
A small blurb of what I am currently working on
I have not yet committed and pushed this to the repo and it is still fragile and most likely to change - and still I want to share it with you:


$ curl -i -u rhqadmin:rhqadmin -X POST \
http://localhost:7080/rest/alert/definitions?resourceId=10001 \
-d @/tmp/foo -HContent-type:application/json
HTTP/1.1 201 Created
Server: Apache-Coyote/1.1
Location: http://localhost:7080/rest/alert/definition/10682
Content-Type: application/json
Transfer-Encoding: chunked
Date: Wed, 09 Jan 2013 21:41:10 GMT

{
"id":10682,
"name":"-x-test-full-definition",
"enabled":false,
"priority":"HIGH",
"recoveryId":0,
"conditionMode":"ANY",
"conditions":[
{"name":"AVAIL_GOES_DOWN",
"category":"AVAILABILITY",
"id":10242,
"threshold":null,
"option":null,
"triggerId":null}
],
"notifications":[
{"id":10432,
"senderName":"Direct Emails",
"config":{
"emailAddress":"enoch@root.org"}
}
]}


In the UI this looks like:

General view
Conditions
Notifications


Other features like dampening or recovery are not yet implemented.

To be complete, the content of /tmp/foo looks like this:


{
"id": 0,
"name": "-x-test-full-definition",
"enabled": false,
"priority": "HIGH",
"recoveryId": 0,
"conditionMode": "ANY",
"conditions": [
{
"id": 0,
"name": "AVAIL_GOES_DOWN",
"category": "AVAILABILITY",
"threshold": null,
"option": null,
"triggerId": null
}
],
"notifications": [
{
"id": 0,
"senderName": "Direct Emails",
"config": {
"emailAddress": "enoch@root.org"
}
}
]
}
Monitoring IP Endpoints Via RHQ
RHQ has a little known plugin called the "netservices" plugin. Someone was asking about how RHQ can monitor HTTP endpoints on the #rhq freenode chat room - you use this netservices plugin to do this.

There are actually two different resource types that plugin provides - one allows you to monitor an HTTP URL endpoint (e.g. so you can monitor for different HTTP status codes). The other is a very basic resource type that just verifies that you can ping a specific IP or hostname.

Here's the RHQ 4.5.1 version of that plugin and its source:

http://mirrors.ibiblio.org/maven2/org/rhq/rhq-netservices-plugin/4.5.1/

(note: if anyone in the community is interested in doing some work on RHQ and looking for something simple to do - here's a perfect opportunity. It would be nice to have some documentation written for that plugin, but more importantly, I think we could beef up this plugin. For example, it would be nice to have another resource type (similar to that basic PingService) that is able to confirm that a particular port on a particular IP/host can be connected to. We also could use some testing done on this plugin to make sure it still works as expected.)
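For anyone picking this up, the core of such a port check is small. The following is only a sketch of the idea in plain Java and has nothing to do with the existing netservices plugin code; the class name, method name and the timeout handling are my own assumptions, and an actual resource type would still need the usual plugin descriptor and component classes around it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: can a TCP connection to host:port be established?
class PortCheckSketch {

    static boolean isPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // connect() throws if the port is closed, filtered or the timeout expires
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 7080 is the RHQ server port used elsewhere in these posts
        System.out.println(isPortOpen("localhost", 7080, 1000));
    }
}
```

The boolean result would map naturally onto the UP/DOWN availability of the resource.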
JAX-RS 2 client in RHQ (plugins)
When my fellow Red Hatter Bill Burke recently wrote about the new features in JAX-RS 2.0, I started to consider using the new Client api to use in tests and plugins. Luckily Bill provided a 3.0-beta-1 release of RESTEasy that I started with.

To get this going, I started with the plugin case and wrote my few lines of code in the editor. All red, as the classes were not known yet. After many iterations in the editor and the standalone-pc, I ended up with this list of dependencies, which all need to be included in the final plugin jar:

<artifactItem>
<groupId>org.jboss.resteasy</groupId>
<artifactId>resteasy-client</artifactId>
</artifactItem>
<artifactItem>
<groupId>org.jboss.resteasy</groupId>
<artifactId>jaxrs-api</artifactId>
</artifactItem>
<artifactItem>
<groupId>org.jboss.resteasy</groupId>
<artifactId>resteasy-jaxrs</artifactId>
</artifactItem>
<artifactItem>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
</artifactItem>
<artifactItem>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpcore</artifactId>
</artifactItem>
<artifactItem>
<groupId>org.jboss.resteasy</groupId>
<artifactId>resteasy-jackson-provider</artifactId>
</artifactItem>
<artifactItem>
<groupId>org.codehaus.jackson</groupId>
<artifactId>jackson-jaxrs</artifactId>
</artifactItem>
<artifactItem>
<groupId>org.codehaus.jackson</groupId>
<artifactId>jackson-core-asl</artifactId>
</artifactItem>
<artifactItem>
<groupId>org.codehaus.jackson</groupId>
<artifactId>jackson-mapper-asl</artifactId>
</artifactItem>
</artifactItems>


Still I got errors that there is no provider for a media type of application/json available. It turned out that the RHQ plugin container's classloader does not honor the META-INF/services/javax.ws.rs.ext.Providers entry (or any of the services there), so I had to explicitly enable them in code:


javax.ws.rs.client.Client client = ClientFactory.newClient();
client.configuration().register(org.jboss.resteasy.plugins
.providers.jackson.ResteasyJacksonProvider.class);
client.configuration().register(org.jboss.resteasy.plugins
.providers.jackson.JacksonJsonpInterceptor.class);


Now the client works as intended. As there is quite a number of jars involved, I am thinking of providing an abstract REST plugin that provides the jars and the basic connection functionality, so that other plugins can build on top, just like we have done with the jmx-plugin.
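The general pattern behind the workaround above (try discovery first, register explicitly when the classloader does not expose the META-INF/services entries) can be shown without RESTEasy. This is a sketch using java.util.ServiceLoader from the JDK; the JsonProvider interface and its implementation are hypothetical and only stand in for the real JAX-RS providers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical sketch: service discovery with an explicit fallback.
class ProviderLookupSketch {

    // Stand-in for a provider interface such as a JSON message body reader.
    interface JsonProvider {
        String mediaType();
    }

    // Stand-in for the provider that gets registered by hand.
    static final class ExplicitJsonProvider implements JsonProvider {
        @Override
        public String mediaType() {
            return "application/json";
        }
    }

    static List<JsonProvider> findProviders(ClassLoader loader) {
        List<JsonProvider> found = new ArrayList<>();
        // Discovery only works if the loader can see a matching META-INF/services file.
        for (JsonProvider provider : ServiceLoader.load(JsonProvider.class, loader)) {
            found.add(provider);
        }
        if (found.isEmpty()) {
            // Nothing discovered: fall back to explicit registration.
            found.add(new ExplicitJsonProvider());
        }
        return found;
    }

    public static void main(String[] args) {
        List<JsonProvider> providers = findProviders(ProviderLookupSketch.class.getClassLoader());
        System.out.println(providers.get(0).mediaType());
    }
}
```

Since there is no services file for JsonProvider in this sketch, the lookup always ends up in the fallback branch, which is the situation I was in with the plugin container.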

I was also trying to get this to work with Jersey, but failed somewhere in the maven world, as it failed for me after downloading half of the Glassfish distribution.

One interesting question to me is about the target group of this JAX-RS 2.0 client framework, as the Jersey and RESTEasy implementations seem to grow rather large (which does not matter too much for servers or on the desktop), so that Android apps are less likely to use them and will rather fall back to home grown solutions. With the huge number of mobile phones, this could in the end also mean that nobody will touch the Client framework and that people fall back to those smaller homegrown solutions on mobile and desktop.
REST/JAX-RS documentation generation


As I may have mentioned before :-) we have added a REST api to RHQ. And one thing we have clearly found out during development and from feedback by users is that the pure linking feature of REST may not be enough to work with a RESTful api and that some additional documentation is needed. Our Mark Little posted an article on InfoQ recently that touches the same subject.

Take this example:


@PUT
@Path("/schedule/{id}")
@Consumes({MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML})
Response updateSchedule(int scheduleId,
MetricSchedule in,
@Context HttpHeaders headers);


So this is a PUT request on some schedule with an id identifier where a certain MetricSchedule thing should be supplied. And then we get some Response back, which is just a generic
JAX-RS "placeholder".

Now there is a cool project called Swagger with a pretty amazing interactive UI to run REST requests in the browser against e.g. the pet store.

Still this has some limitations
  • You need to deploy Swagger, which not all enterprises may want
  • The actual objects are not described as I've mentioned above

Metadata via annotations


The metadata in Swagger to define the semantics of the operations and the simple parameters is defined on the Java source as annotations on the restful methods like this:


@PUT
@Path("/schedule/{id}")
@Consumes({MediaType.APPLICATION_JSON,MediaType.APPLICATION_XML})
@ApiOperation(value = "Update the schedule (enabled, interval) ",
responseClass = "MetricSchedule")
@ApiError(code = 404, reason = NO_SCHEDULE_FOR_ID)
Response updateSchedule(
@ApiParam("Id of the schedule to query") @PathParam("id") int scheduleId,
@ApiParam(value = "New schedule data", required = true) MetricSchedule in,
@Context HttpHeaders headers);


which is already better, as we now know what the operation is supposed to do, that it will return an
object of type MetricSchedule and for the two parameters that are passed we also get
a description of their semantics.

REST docs for RHQ



I was looking on how to document the stuff for some time and after finding the Swagger stuff it became clear to me that I do not need to re-invent the annotations, but should (re)use what is there. Unfortunately the annotations were deep in the swagger-core module.

So I started contributing to Swagger - first by splitting off the annotations into
their own maven module so that they do not have any dependency onto other modules, which makes
it much easier to re-use them in other projects like RHQ.

Still with the above approach the data objects like said MetricSchedule are not documented. In order to do that as well, I've now added a @ApiClass annotation to swagger-annotations, that allows to also document the data classes (a little bit like JavaDoc, but accessible from an annotation processor). So you can now do:


@ApiClass(value = "This is the Foo object",
description = "A Foo object is a place holder for this blog post")
public class Foo {
int id;
@ApiProperty("Return the ID of Foo")
public int getId() { return id; }


to describe the data classes.

The annotations themselves are defined in the following maven artifact:


<dependency>
<groupId>com.wordnik</groupId>
<artifactId>swagger-annotations_2.9.1</artifactId>
<version>1.1.1-SNAPSHOT</version>
</dependency>

which currently (as I write this) is only available from the snapshots repo


<repository>
<!-- TODO temporary for the swagger annotations -->
<id>sonatype-oss-snapshot</id>
<name>Sonatype OSS Snapshot repository</name>
<url>https://oss.sonatype.org/content/repositories/snapshots</url>
</repository>

The generator



Based on the annotations I have written an annotation processor that analyzes the files and creates an XML document which can then be transformed via XSLT into HTML or DocBookXML (which can then be transformed into HTML or PDF with the 'normal' Docbook tool chain).

Tool chain


You can find the source for the rest-docs-generator in the RHQ-helpers project on GitHub.

The pom.xml file also shows how to use the generator to create the intermediate XML file.
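To give a feel for the "annotations in, XML out" idea without pulling in the generator, here is a small sketch. Note that it is not how the rest-docs-generator works: the real thing is an annotation processor that runs at compile time, while this sketch reads a made-up @Doc annotation via runtime reflection. The annotation, the interface and the XML element names are all hypothetical, and no XML escaping is done.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical sketch: turn method-level annotations into a small XML fragment.
class DocSketch {

    // Made-up stand-in for an annotation like @ApiOperation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Doc {
        String value();
    }

    // Made-up REST interface with one documented method.
    interface ScheduleHandler {
        @Doc("Update the schedule (enabled, interval)")
        void updateSchedule(int scheduleId);
    }

    // Emits one <method> element per annotated method of the given type.
    static String toXml(Class<?> type) {
        StringBuilder sb = new StringBuilder("<class name=\"" + type.getSimpleName() + "\">");
        for (Method method : type.getDeclaredMethods()) {
            Doc doc = method.getAnnotation(Doc.class);
            if (doc != null) {
                sb.append("<method name=\"").append(method.getName())
                  .append("\" description=\"").append(doc.value()).append("\"/>");
            }
        }
        return sb.append("</class>").toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml(ScheduleHandler.class));
    }
}
```

The resulting XML is the kind of intermediate document that an XSLT step can then turn into HTML or DocBook.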

Examples



You can see some examples of the generated output (for some slightly older version of the generator) in the RHQ 4.5.1 release files on SourceForge, as well as in the documentation for JBoss ON, where our docs team just took the input and fed it into their DocBook tool chain almost without any change.

Postprocessing…



If you are interested to see how the toolchain is used in RHQ, you can look at the pom.xml file from the server/jar module ( search for 'REST-API' ).


Project independent



One thing I want to emphasize here is that the generator with its toolchain is completely independent from RHQ and can be reused for other projects.


Why I am ready to move to CQL for Cassandra application development
Earlier this year, I started learning about Cassandra as it seemed like it might be a good fit as a replacement data store for metrics and other time series data in RHQ. I developed a prototype for RHQ. I used the client library Hector for accessing Cassandra from within RHQ. I defined my schema using a Cassandra CLI script. I recall when I first read about CQL. I spent some time deliberating over whether to define the schema using a CLI script or using a CQL script. I was intrigued, but ultimately decided against using CQL. As the CLI and the Thrift interface were more mature, it seemed like the safer bet. While I decided not to invest any time in CQL, I did make a mental note to revisit it at a later point since there was clearly a big emphasis within the Cassandra community on improving CQL. That later point is now, and I have decided to start making extensive use of CQL.

After a thorough comparative analysis, the RHQ team decided to move forward with using Cassandra for metric data storage. We are making heavy use of dynamic column families and wide rows. Consider for example the raw_metrics column family in figure 1,

Figure 1. raw_metrics column family

The metrics schedule id is the row key. Each data point is stored in a separate column where the metric timestamp is the column name and the metric value is the column value. This design supports fast writes as well as fast reads and works particularly well for the various date range queries in RHQ. This is considered a dynamic column family because the number of columns per row will vary and because column names are not defined up front. I was quick to rule out using CQL due to a couple misconceptions about CQL's support for dynamic column families and wide rows. First, I did not think it was possible to define a dynamic table with wide rows using CQL. Secondly, I did not think it was possible to execute range queries on wide rows.

A couple weeks ago I came across this thread on the cassandra-users mailing list which points out that you can in fact create dynamic tables/column families with wide rows. And conveniently after coming across this thread, I happened to stumble on the same information in the docs. Specifically the DataStax docs state that wide rows are supported using composite column names. The primary key can have multiple components, but there must be at least one column that is not part of the primary key. Using CQL I would then define the raw_metrics column family as follows,

This CREATE TABLE statement is straightforward, and it does allow for wide rows with dynamic columns. The underlying column family representation of the data is slightly different from the one in figure 1 though.

Figure 2. CQL version of raw_metrics column family
Each column name is now a composite that consists of the metric timestamp along with the string literal, value. There is additional overhead on reads and writes as the column comparator now has to compare the string in addition to the timestamp. Although I have yet to do any of my own benchmarking, I am not overly concerned by the additional string comparison. I was however concerned about the additional overhead in terms of disk space. I have done some preliminary analysis and concluded that the difference with just storing the timestamp in the column name is negligible due to compression of SSTables which is enabled by default.

My second misconception about executing range queries is really predicated on the first misconception. It is true that you can only query named columns in CQL; consequently, it is not possible to perform a date range query against the column family in figure 1. It is possible though to execute a date range query against the column family in figure 2.
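To make the layout in figure 2 and the date range query more concrete, here is a minimal Python sketch. It is a toy in-memory model, not Cassandra or driver code. The column name `value` comes from the description above; the names `schedule_id` and `time`, and the helper functions, are assumptions made only for this illustration.

```python
from bisect import bisect_left

# Toy model of one wide row per schedule id. This is not a Cassandra client;
# it only mimics how CQL rows map onto composite column names.
# "schedule_id" and "time" are assumed names, "value" is taken from the text.

def insert(storage, schedule_id, time, value):
    """Store a CQL-style row (schedule_id, time, value) in a wide row."""
    row = storage.setdefault(schedule_id, {})
    # Composite column name: (timestamp, name of the non-key column).
    row[(time, "value")] = value

def range_query(storage, schedule_id, start, end):
    """Return (time, value) pairs with start <= time < end, ordered by time."""
    row = storage.get(schedule_id, {})
    names = sorted(row)  # stands in for the sorted on-disk column order
    lo = bisect_left(names, (start, ""))
    hi = bisect_left(names, (end, ""))
    return [(name[0], row[name]) for name in names[lo:hi]]

storage = {}
for t, v in [(1000, 1.5), (2000, 2.5), (3000, 3.5)]:
    insert(storage, 123, t, v)

print(range_query(storage, 123, 1000, 3000))
```

Because the timestamp is the first component of the composite name, the columns of a row stay ordered by time, which is what makes a slice over a date range cheap.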

RHQ supports multiple upgrade paths. This means that in order to upgrade to the latest release (which happens to be 4.5.0 at the time of this writing), I do not have to first upgrade to the previous release (which would be 4.4.0). I can upgrade from 4.2.0 for instance. Supporting multiple upgrade paths requires a tool for managing schema changes. There are plenty of such tools for relational databases, but I am not aware of any for Cassandra. But because we can leverage CQL and because there is a JDBC driver, we can look at using an existing tool instead of writing something from scratch. I have done just that and am working on adding support for Cassandra to Liquibase. I will have more on that in a future post. Using CQL allows us to reuse existing solutions, which in turn is going to save a lot of development and testing effort.

The most compelling reason to use CQL is the familiar, easy to use syntax. I have been nothing short of pleased with Hector. It is well designed, the online documentation is solid, and the community is great. Whenever I post a question on the mailing list, I get responses very quickly. With all that said, contrast the following two, equivalent queries against the raw_metrics column family.

RHQ developers can look at the CQL version and immediately understand it. Using CQL will result in less code that is easier to maintain. We can also leverage ad hoc queries with cqlsh during development and testing. The JDBC driver also lends itself nicely to applications that run in an application server, as RHQ does.

Things are still evolving both with CQL and with the JDBC driver. Collections support is coming in Cassandra 1.2. The JDBC driver does not yet support batch statements. This is due to the lack of support for it on the server side. The functionality is there in the Cassandra trunk/master branch, and I expect to see it in the 1.2 release. The driver also currently lacks support for connection pooling. These and other critical features will surely make their way into the driver. With the enhancements and improvements to CQL and to the JDBC driver, adding Cassandra support to Hibernate OGM becomes that much more feasible.

The flexibility, tooling, and ease of use make CQL a very attractive option for working with Cassandra. I doubt the Thrift API is going away any time soon, and we will continue to leverage the Thrift API through Hector in RHQ in various places. But I am ready to make CQL a first class citizen in RHQ and look forward to watching it continue to mature into a great technology.
Another nice JBoss OneDayTalk is over (updated)


Yesterday I was at the OneDayTalk conference organized by the Munich JBoss User Group.

And as in the last two years this was a nice conference with an interested audience and it was good
to meet with colleagues again.

This year I was talking about "RHQ and its interfaces" - and unlike some other speakers, I enjoyed being able to give my talk in German :)
My slides are available as PDF.

Update: A recording of my talk (which I held in German) is now available:

RHQ und seine Schnittstellen from Heiko W. Rupp on Vimeo.



Thanks to the Munich JBug for organizing this nice conference and for allowing me to present there.
RHQ 4.5.1 released


I am pleased to announce the immediate availability of RHQ 4.5.1.
RHQ is a system for management and monitoring of resources like application servers
or databases and can be extended by writing plugins.

Actually I wanted to announce 4.5.0 a week ago, but a first user report showed an
error in the upgrade path from a previous version, so we have pulled that release
and fixed the bug along with another one and have now created a fresh 4.5.1 release.

Notable changes are:

  • Python support in the Command Line Interface (CLI)
  • Support for importing of scripts in the CLI
  • Enhancements in the JBossAS7 plugin
  • Enhancements in the REST API
  • Events tab allows filtering by date range
  • Postgres 9.2 is now supported as backend database
  • The Sigar library has been updated.


Special thanks go to Elias Ross and Richard Hensman for their contributions.

Maven artifacts have been uploaded to the JBoss Nexus repo and should show up on maven central soon.

You can find the full release notes, which also contain a download link, on the RHQ wiki.

This time we have included the full output from git shortlog for the commits of the release. Please tell us if this is useful for you.

Heiko on behalf of the RHQ team
Fedorahosted GIT links have changed (again)
Well, fedorahosted did it again.

All of my links to RHQ source code in all of my blog entries (and everywhere else I used them - bugzilla entries, wiki pages, etc) will be invalid now, because fedorahosted once again changed their git URLs and AFAICS the old URLs no longer work and do not redirect to the new URLs.

This is at least the second time that I know of that they did this - it may have happened more. Previously, I went through all of my blog entries and updated my source links, but I am not going to do it this time. It is too time consuming and I am not convinced this won't happen again.

So, if you click a link to source code in my blogs and get an error message saying the link is invalid, you know why. :-/

(Heiko is right, we need to move to github :-)
Introducing the No-op plugin for RHQ

“A no-op plugin, does that make sense?” you will ask - and indeed it makes sense when you look at recent developments inside RHQ with the REST-api and also the support for jar-less plugins.

The no-op plugin is meant to support jar-less plugins that are written to define ResourceTypes on the server along with their metrics in order to be usable via the REST api, but where no resources are supposed to be found via the classic java-agent.

A concrete use case is the work our GSoC student Krzysztof is doing by bridging monitoring data from CIMmons deployed e.g. in RHEL servers to RHQ resources. Here we have the (possible) need to define resource types that will only be fed via the REST api.

How do I use this?

Basically you write a plugin descriptor (either the classical way, putting it into a plugin jar, or as a standalone file named *-rhq-plugin.xml) and then deploy it into the server. The contents of this descriptor then refer to the No-op plugin - first in the <plugin> element's package declaration:

<plugin name="bar-test"
version="4.5.1"
displayName="Jar less plugin test"
description="Just testing"
package="org.rhq.plugins.noop"
xmlns="urn:xmlns:rhq-plugin"
xmlns:c="urn:xmlns:rhq-configuration">

The next part is then to declare a dependency on the No-op plugin to be able to use the classes:

   <depends plugin="No-op" useClasses="true"/>

And finally to use the NoopComponent for the discovery and component classes of <server>s and <service>s:

    <server name="test"
discovery="NoopComponent"
class="NoopComponent"
>

</server>
</plugin>

A plugin defined like this then defines metrics etc. just like every other plugin. This is the plugin info:

Plugin info


And the metric definition templates:

Metric definition templates
Jar-less plugins for RHQ

Mazz has written a few times about the Custom-JMX plugin (funnily enough I did not find a newer post to link to), which is basically a plugin descriptor, that re-uses the classes from the base JMX plugin.

While this is very powerful, it has the small usability drawback that the descriptor must be wrapped in a jar file to be recognized by the server (and the agents).

I have now implemented BZ 741682, which allows deploying bare plugin descriptors. For this to work

  • the file needs to be called *-rhq-plugin.xml (e.g. foo-rhq-plugin.xml)
  • the classes of the plugin need to already be present on the agent

The latter can be achieved by putting a depends directive into the descriptor:

<depends plugin="JMX" useClasses="true"/>

In the following example you see the bar-rhq-plugin.xml file picked up from the $SERVER/plugins directory
(the "drop-box") and placed into the plugins directory inside the app:

09:18:55,160 INFO  [PluginDeploymentScanner] Found plugin descriptor at 
    [/im/dev-container/plugins/bar-rhq-plugin.xml] and placed it at 
    [/im/dev-container/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/bar-rhq-plugin.xml]

Next is the transformation of this plugin descriptor into a jar file - if this is successful, the now obsolete plugin descriptor is removed.

09:18:55,403 INFO  [AgentPluginScanner] Found a plugin-descriptor at 
    [/im/dev-container/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/bar-rhq-plugin.xml],
    creating a jar from it to be deployed at the next scan
09:18:55,411 INFO  [AgentPluginScanner] Deleted the now obsolete plugin descriptor: true

At the next scan of the deployment scanner, the scanner will pick up this generated jar and deploy it like any other plugin:

09:19:26,470 INFO  [ProductPluginDeployer] Discovered agent plugin [bar-test]
09:19:26,474 INFO  [ProductPluginDeployer] Deploying [1] new or updated agent plugins: [bar-test]
09:19:26,656 INFO  [ResourceMetadataManagerBean] Persisting new ResourceType [bar-test:test(id=0)]...
09:19:29,344 INFO  [ProductPluginDeployer] Plugin metadata updates are complete for [1] plugins: [bar-test]

The last step would then be to run plugins update on the agent to get this new plugin from the server and to deploy it into the agent.
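The descriptor-to-jar step from the log output above can be sketched in a few lines of Python. This is only an illustration of the idea and not the code of RHQ's AgentPluginScanner. The jar naming scheme and the helper names are assumptions; the entry name `META-INF/rhq-plugin.xml` is the usual descriptor location inside an RHQ plugin jar.

```python
import fnmatch
import os
import tempfile
import zipfile

# Illustration only; this is not RHQ's scanner implementation.
DESCRIPTOR_PATTERN = "*-rhq-plugin.xml"

def is_bare_descriptor(path):
    """True if the file name follows the *-rhq-plugin.xml convention."""
    return fnmatch.fnmatch(os.path.basename(path), DESCRIPTOR_PATTERN)

def wrap_descriptor(descriptor_path, target_dir):
    """Create a jar holding the descriptor, then delete the descriptor."""
    base = os.path.basename(descriptor_path)
    jar_name = base[: -len(".xml")] + ".jar"  # hypothetical naming scheme
    jar_path = os.path.join(target_dir, jar_name)
    with zipfile.ZipFile(jar_path, "w") as jar:
        jar.write(descriptor_path, arcname="META-INF/rhq-plugin.xml")
    os.remove(descriptor_path)  # the descriptor is obsolete once wrapped
    return jar_path

# Small demo in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    descriptor = os.path.join(tmp, "bar-rhq-plugin.xml")
    with open(descriptor, "w") as f:
        f.write("<plugin name='bar-test'/>")
    jar_path = wrap_descriptor(descriptor, tmp)
    with zipfile.ZipFile(jar_path) as jar:
        entries = jar.namelist()
    descriptor_removed = not os.path.exists(descriptor)

print(entries, descriptor_removed)
```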

RHQ's New Availability Checking
The latest RHQ release has introduced a few new things related to its availability scheduling and checking. (side note: you'll notice that link points to RHQ's new documentation home! The wiki moved to jboss.org)

It used to be that RHQ's availability scanning was pretty rigid. RHQ scanned every resource on a fixed 5 minute schedule. Though this was configurable, it was only configurable on an agent-wide scale (meaning, you could not say that you want to check resource A every 5 minutes, but resource B every 1 minute).

In addition, when the agent would go down (for whatever reason), it took a long time (around 15 minutes by default) for the RHQ Server to consider the agent to be "missing in action" and mark all of the resources that agent was managing as "down". Prior to those 15 minutes (what RHQ calls the "Agent Max Quiet Time Allowed"), you really had no indication there was a problem and some users found this to be too long before alerts started firing. And even though this Agent Max Quiet Time Allowed setting is configurable, under many situations, setting it to something appreciably lower than 15 minutes would cause the RHQ Server to think agents were down when they weren't, triggering false alerts.

RHQ has since addressed these issues. Now, you will notice all server and service resources have a common "Availability" metric in their Monitoring>Schedules subtab in the GUI. This works just like other metric schedules - you can individually configure these as you wish just like normal metrics. By default, all server resources will have their availability checked every 1 minute (for example, see the server resource snapshot in figure 1). By default, all service resources will have their availability checked every 10 minutes (for example, see the service resource snapshot in figure 2).

Figure 1 - A server resource and its Availability metric, with default interval of 1 minute
Figure 2 - A service resource and its Availability metric, with default interval of 10 minutes
The reason for the difference in defaults is this - we felt most times it didn't pay to scan all services at the same rate as servers. That just added more load on the agent and more load on the managed resources for no appreciable gain in the default case. Because normally, if a service is down, we could have already detected that by seeing that its server is down. And if a server is up, generally speaking, all of its services are normally up as well. So we found checking the server resources more frequently, combined with checking all of their services less frequently, helps detect the common failures (those being, crashed servers) more quickly while lowering the total amount of work performed. Obviously, these defaults are geared toward the general case; if these assumptions don't match your expectations, you are free to alter your own resources' Availability schedules to fit your own needs. But we felt that, out-of-box, we wanted to reduce the load the agent puts on the machine and the resources it is managing while at the same time we wanted to be able to detect downed servers more quickly. And this does a good job of that.

Another change that was made is that when an agent gracefully shuts down, it sends one final message to the RHQ Server to indicate it is shutting down and at that point the RHQ Server immediately marks that agent's platform as "down" rather than wait for the agent to go quiet for a period of 15 minutes (the Agent Max Quiet Time Allowed period) before being marked "down". Additionally, although the platform is marked "down", that platform's child resources (all of its servers and services) are now marked as "unknown". This is because, technically, RHQ doesn't really know what's happening on that box now that the agent is no longer reporting in. Those managed resources that the agent was monitoring (the servers and services) could be down, but RHQ can't make that determination - they could very well still be up. Therefore, the RHQ Server places all the servers and services in the new availability state known as "unknown" as indicated by the question-mark-within-a-gray-circle icon. See figure 3 as an example.

Figure 3 - Availability data for a service that has been flagged as being in an "unknown" state
As part of these infrastructure changes, the Agent Max Quiet Time Allowed default setting has been lowered to 5 minutes. This means that, in the rare case that the agent actually crashes and was unable to gracefully notify the RHQ Server, the RHQ Server will consider the agent "missing in action" faster than before (now 5 minutes instead of 15 minutes). Thus, alerts can be fired in a more timely manner when agents go down unexpectedly.
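The two rules described above (graceful shutdown handling and the quiet-time check) can be summarized in a short Python sketch. It is a simplified model and not RHQ server code; the function names, the dictionary used to hold availability states, and the strict "greater than" comparison are assumptions for illustration.

```python
from enum import Enum

class Avail(Enum):
    UP = "up"
    DOWN = "down"
    UNKNOWN = "unknown"

# New default for "Agent Max Quiet Time Allowed" as described above.
AGENT_MAX_QUIET_TIME_SECS = 5 * 60

def on_agent_shutdown(avail, platform, children):
    """Graceful shutdown: platform goes DOWN, its servers/services UNKNOWN."""
    avail[platform] = Avail.DOWN
    for child in children:
        avail[child] = Avail.UNKNOWN

def agent_is_missing(last_report_secs, now_secs,
                     max_quiet_secs=AGENT_MAX_QUIET_TIME_SECS):
    """True once the agent has been quiet for longer than the allowed time."""
    return now_secs - last_report_secs > max_quiet_secs

avail = {"platform": Avail.UP, "server-1": Avail.UP, "service-1": Avail.UP}
on_agent_shutdown(avail, "platform", ["server-1", "service-1"])
print(avail)
print(agent_is_missing(last_report_secs=0, now_secs=6 * 60))
```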

Setting up a local Cassandra cluster using RHQ
As part of my ongoing research into using Cassandra with RHQ, I did some work to automate setting up a Cassandra cluster (for RHQ) on a single machine for development and testing. I put together a short demo showing what is involved. Check it out at http://bit.ly/N3jbT8.
RHQPocket updated


I have worked over the last days on improving RHQPocket, an Android application that serves as an example client for RHQ. Most of the features of RHQPocket will work (best) with a current version of the RHQ server (from current master).

The following is a recording I made that shows the current state.

Update on RHQPocket from Heiko W. Rupp on Vimeo.



The video has been recorded with QuickTime player with the running emulator. I first tried this with the "classical" emulator, but this used 100% cpu (1 core) and with recording turned on, this was so slow that simple actions that are normally immediate, would take several seconds.

My next try was to film my real handset, but too much ambient light and reflections made this a bad experience. After I heard that others had successfully installed the hardware accelerated GPU and the virtualized x86 image for the emulator, I tried that: the CPU usage of the emulator went from 100% before to around 15-20% (home screen), so I used this setup for recording.

You can find the full source code on GitHub - contributions and feedback are very welcome.
If you don't want to build from source, you can install the APK from here.
EJB Calltime Monitoring
I wanted to give a brief illustration of RHQ's calltime monitoring feature by showing how it is used to collect EJB calltime statistics. The idea here is that I have an EJB application (in my example, I am using the RHQ Server itself as the monitored application!) and I want to see calltime statistics for the EJB method calls being made. For example, in my EJB application being monitored, I have many EJB Stateless Session beans (SLSBs). I want to see how my EJB SLSBs are behaving by looking at how many times my EJB SLSB methods are called and how efficient they are (that is, I want to see the maximum, minimum and average time each EJB SLSB method took).

So, first what I do is go to my EJB SLSB resources that are in inventory and I enable the "Method Invocation Time" calltime metric. You can do this on an individual EJB resource or you can do a bulk change of all your EJB's schedules by navigating to your EJB autogroup and changing the schedule in the autogroup's Schedules subtab:



At this point, the RHQ Agent will collect the information and send it up to the RHQ Server just like any other metric collection. Over time, I can now examine my EJB SLSB calltime statistics by traversing to my EJB resource's Calltime subtab:



I can even look at an aggregate view of all my EJBs via the EJB autogroup. In the RHQ GUI, you will see in the left hand tree all of my EJB SLSBs are children of the autogroup node "EJB3 Session Beans". If I select that autogroup node and navigate to its Calltime subtab, I can see all of the measurements for all of my EJB SLSBs:



Note that this calltime measurement collection feature is not specific to EJBs, or the JBossAS 4 plugin. This is a generic subsystem supported by any plugin that wants to enable this feature. If you want to write your own custom plugin that wants to monitor calltime-like statistics (say, for HTTP URL endpoints or any other type of infrastructure that has a "duration" type metric that can be collected) you can utilize the calltime subsystem to collect your information and let the RHQ Server store it and display it in this Calltime subtab for your resource.
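As a rough illustration of what calltime statistics boil down to, here is a small Python sketch that groups individual call durations by method and computes the count, minimum, maximum and average. It is not the RHQ calltime subsystem; the function name, the sample method names and the millisecond unit are assumptions.

```python
from collections import defaultdict

def summarize_calltimes(samples):
    """samples: iterable of (method_name, duration_ms) pairs."""
    grouped = defaultdict(list)
    for method, duration in samples:
        grouped[method].append(duration)
    return {
        method: {
            "count": len(durations),
            "min": min(durations),
            "max": max(durations),
            "avg": sum(durations) / len(durations),
        }
        for method, durations in grouped.items()
    }

# Hypothetical method names and durations.
samples = [("findById", 10), ("findById", 30), ("save", 5)]
stats = summarize_calltimes(samples)
print(stats["findById"])
```

The same grouping works for any "duration" type data, for example HTTP URL endpoints instead of EJB methods.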
Aggregating Metric Data with Cassandra

Introduction

I successfully performed metric data aggregation in RHQ using a Cassandra back end for the first time recently. Data roll up or aggregation is done by the data purge job which is a Quartz job that runs hourly. This job is also responsible for purging old metric data as well as data from other parts of the system. The data purge job invokes a number of different stateless session EJBs (SLSBs) that do all the heavy lifting. While there is still a lot of work that lies ahead, this is a big first step forward that is ripe for discussion.

Integration

JPA and EJB are the predominant technologies used to implement and manage persistence and business logic. Those technologies however, are not really applicable to Cassandra. JPA is for relational databases and one of the central features of EJB is declarative, container-managed transactions. Cassandra is neither a relational nor a transactional data store. For the prototype, I am using server plugins to integrate Cassandra with RHQ.

Server plugins are used in a number of areas in RHQ already. Pluggable alert notification senders are one of the best examples. A key feature of server plugins is the encapsulation made possible by the class loader isolation that is also present with agent plugins. So let's say that Hector, the Cassandra client library, requires a different version of a library that is already used by RHQ. I can safely use the version required by Hector in my plugin without compromising the RHQ server. In addition to the encapsulation, I can dynamically reload my plugin without having to restart the whole server. This can help speed up iterative development.

Cassandra Server Plugin Configuration
You can define a configuration in the plugin descriptor of a server plugin. The above screenshot shows the configuration of the Cassandra plugin. The nice thing about this is that it provides a consistent, familiar interface in the form of the configuration editor that is used extensively throughout RHQ. There is one more screenshot that I want to share.

System Settings
This is a screenshot of the system settings view. It provides details about the RHQ server itself like the database used, the RHQ version, and build number. There are several configurable settings, like the retention period for alerts and drift files and settings for integrating with an LDAP server for authentication. At the bottom there is a property named Active Metrics Server Plugin. There are currently two values from which to choose. The first is the default, which uses the existing RHQ database. The second is for the new Cassandra back end. The server plugin approach affords us a pluggable persistence solution that can be really useful for prototyping among other things. Pluggable persistence with server plugins is a really interesting topic in and of itself. I will have more to say on that in a future post.

Implementation

The Cassandra implementation thus far uses the same buckets and time slices as the existing implementation. The buckets and retention periods are as follows:

Metrics Data Bucket    Data Retention Period
raw data               7 days
one hour data          2 weeks
6 hour data            1 month
1 day data             1 year

Unlike the existing implementation, purging old data is accomplished simply by setting the TTL (time to live) on each column. Cassandra takes care of purging expired columns. The schema is pretty straightforward. Here is the column family definition for raw data specified as a CLI script:


The row key is the metric schedule id. The column names are timestamps and column values are doubles. And here is the column family definition for one hour data:


As with the raw data, the schedule id is the row key. Unlike the raw data however, we use composite columns here. All the buckets with the exception of the raw data, store computed aggregates. RHQ calculates and stores the min, max, and average for each (numeric) metric schedule. The column name consists of a timestamp and an integer. The integer identifies whether the value is the max, min, or average. Here is some sample (Cassandra) CLI output for one hour data:


Each row in the output reads like a tuple. The first entry is the column name with a colon delimiter. The timestamp is listed first followed by the integer code to identify the aggregate type. Next is the column value, which is the value of the aggregate calculation. Then we have a timestamp. Every column in Cassandra has a timestamp. It is used for conflict resolution on writes. Lastly, we have the ttl. The schema for the remaining buckets is similar to the one_hour_metric_data column family so I will not list them here.
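To illustrate the composite columns for the one hour bucket, here is a minimal Python sketch that rolls raw data points of one hour up into max, min and average, keyed by (timestamp, aggregate type). It is not the RHQ implementation. The integer codes are hypothetical, since the actual codes are not given here, and timestamps are assumed to be in seconds. The TTL constant simply restates the two week retention period from the table above.

```python
from statistics import mean

# Hypothetical integer codes for the aggregate type.
MAX, MIN, AVG = 0, 1, 2

HOUR_SECS = 3600
ONE_HOUR_TTL_SECS = 14 * 24 * 3600  # two week retention for one hour data

def aggregate_hour(raw_columns, hour_start):
    """raw_columns: iterable of (timestamp_secs, value) for one schedule.

    Returns a dict mapping composite column names (hour_start, type code)
    to the aggregate value, or an empty dict if the hour has no data.
    """
    values = [v for ts, v in raw_columns
              if hour_start <= ts < hour_start + HOUR_SECS]
    if not values:
        return {}
    return {
        (hour_start, MAX): max(values),
        (hour_start, MIN): min(values),
        (hour_start, AVG): mean(values),
    }

raw = [(10, 1.0), (1800, 2.0), (3599, 3.0), (3600, 100.0)]
columns = aggregate_hour(raw, 0)
print(columns, ONE_HOUR_TTL_SECS)
```

In the real store, each of these columns would be written with the bucket's TTL so that Cassandra expires it without a separate purge step.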

The last implementation detail I want to discuss is querying. When the data purge job runs, it has to determine what data is ready to be aggregated. With the existing implementation that uses the RHQ database, queries are fast and efficient using indexes. The following column family definition serves as an index to make queries fast for the Cassandra implementation as well:


The row key is the metric data column family name, e.g., one_hour_metric_data. The column name is a composite that consists of a timestamp and a schedule id. Currently the column value is an integer that is always set to zero because only the column name is needed. At some point I will likely refactor the data type of the column  value to something that occupies less space. Here is a brief explanation of how the index is used. Let's start with writes. Whenever data for a schedule is written into one bucket, we update the index for the next bucket. For example, suppose data for schedule id 123 is written into the raw_metrics column family at 09:15. We will write into the "one_hour_metric_data" row of the index with a column name of 09:00:123. The timestamp in which the write occurred is rounded down to the start of the time slice of the next bucket. Further suppose that additional data for schedule 123 is written into the raw_metrics column family at times 09:20, 09:25, and 09:30. Because each of those timestamps gets rounded down to 09:00 when writing to the index, we do not wind up with any additional columns for that schedule id. This means that the index will contain at most one column per schedule for a given time slice in each row.
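The write side of the index can be sketched as follows. This is a toy in-memory model in Python, not the actual implementation; the function names and the use of seconds since midnight as timestamps are assumptions made for the example. It replays the 09:15 to 09:30 scenario from above.

```python
HOUR = 3600  # length of the one hour time slice in seconds

def slice_start(ts, slice_secs=HOUR):
    """Round a timestamp (in seconds) down to the start of its time slice."""
    return ts - ts % slice_secs

def update_index(index, next_bucket, schedule_id, ts, slice_secs=HOUR):
    """Record that schedule_id has data to aggregate into next_bucket."""
    row = index.setdefault(next_bucket, {})
    # Composite column name (slice start, schedule id); the value is unused.
    row[(slice_start(ts, slice_secs), schedule_id)] = 0

def at(hh, mm):
    """Seconds since midnight, used here as a stand-in timestamp."""
    return hh * 3600 + mm * 60

index = {}
for hh, mm in [(9, 15), (9, 20), (9, 25), (9, 30)]:
    update_index(index, "one_hour_metric_data", 123, at(hh, mm))

print(index)
```

All four writes collapse into the single column (09:00, 123), which is why the index holds at most one column per schedule and time slice.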

Reads occur to determine what data if any needs to be aggregated. Each row in the index is queried. After a column is read and the data for the corresponding schedule is aggregated into the next bucket, that column is then deleted. This index is a lot like a job queue. Reads in the existing implementation that use a relational database should be fast; however, there is still work that has to be done to determine what data if any needs to be aggregated when the data purge job runs. With the Cassandra implementation, the presence of a column in a row of the metrics_aggregates_index column family indicates that data for the corresponding schedule needs to be aggregated.
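The job queue behavior of the index on the read side can be sketched like this. Again this is a simplified Python model and not the real code; the function name `drain_index` and the callback signature are assumptions.

```python
def drain_index(index, bucket, aggregate):
    """Aggregate every schedule listed in the index row, then remove it.

    index:     dict mapping bucket name -> {(slice_start, schedule_id): 0}
    aggregate: callback taking (schedule_id, slice_start)
    Returns the number of columns processed.
    """
    row = index.get(bucket, {})
    processed = 0
    for column in sorted(row):  # sorted() copies the keys, so deleting is safe
        start, schedule_id = column
        aggregate(schedule_id, start)
        del row[column]  # done, so remove the entry from the "queue"
        processed += 1
    return processed

index = {"one_hour_metric_data": {(32400, 123): 0, (32400, 456): 0}}
seen = []
count = drain_index(index, "one_hour_metric_data",
                    lambda schedule_id, start: seen.append((schedule_id, start)))
print(count, seen, index)
```

An empty row after the run means there is nothing left to aggregate for that bucket.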

Testing

I have pretty good unit test coverage, but I have only done some preliminary integration testing. So far it has been limited to manual testing. This includes inspecting values in the database via the CLI or with CQL and setting break points to inspect values. As I look to automate the integration testing, I have been giving some thought to how metric data is pushed to the server. Relying on the agent to push data to the server is suboptimal for a couple of reasons. First, the agent sends measurement reports to the server once a minute. I need better control of how frequently and when data is pushed to the server.

The other issue with using the agent is that it gets difficult to simulate older metric data that has been reported over a specified duration, be it an hour, a day, or a week. Simulating older data is needed for testing that data is aggregated into 6 hour and 24 hour buckets and that data is purged at appropriate times.

RHQ's REST interface is a better fit for the integration testing I want to do. It already provides the ability to push metric data to the server. I may wind up extending the API, even if just for testing, to allow for kicking off the aggregation that runs during the data purge job. I can then use the REST API to query the server and verify that it returns the expected values.

Next Steps

There is still plenty of work ahead. I have to investigate what consistency levels are most appropriate for different operations. There is still a large portion of the metrics APIs that needs to be implemented, some of the more important ones being query operations used to render metrics graphs and tables. The data purge job is not the best approach going forward for doing the aggregation. Only a single instance of the job runs each hour, and it does not exploit any of the opportunities that exist for parallelism. Lastly and maybe most importantly, I have yet to start thinking about how to effectively manage the Cassandra cluster with RHQ. As I delve into these other areas I will continue sharing my thoughts and experiences.
RHQ REST api: Support for JSONP (updated)


I have just committed support for JSONP to the RHQ REST-api.

Update: In the first version a special media type was required. This is now removed, as jQuery seems to have issues sending this type. Also the default name for the callback parameter has been renamed to jsonp.



To use it you need to pass a parameter for the callback. Let's look at an example:


$ curl -i -u rhqadmin:rhqadmin \
http://localhost:7080/rest/1/alert?jsonp=foo \
-Haccept:application/json
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Pragma: No-cache
Cache-Control: no-cache
Expires: Thu, 01 Jan 1970 01:00:00 CET
X-Powered-By: Servlet 2.4; JBoss-4.2.0.CR2
Content-Type: application/javascript
Transfer-Encoding: chunked
Date: Thu, 14 Jun 2012 09:25:09 GMT

foo([{"name":"test","resource":{"typeName":null, ……..])


The name of the callback parameter is jsonp and the name of the callback function to return is
foo. In the output of the server you see how the json-data is then wrapped inside foo().
The wrapping will only happen when both the right content-type is requested and the callback parameter is present.

The content type returned is then application/javascript.
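The wrapping rule can be expressed in a few lines. The following Python sketch is not the code of the JsonPFilter; the function name and its parameters are assumptions, and only the behavior (wrap when JSON is requested and the callback parameter is present, then return application/javascript) follows the description above.

```python
def maybe_wrap_jsonp(body, content_type, accept, query_params,
                     callback_param="jsonp"):
    """Return (content_type, body), wrapped as JSONP when both conditions hold.

    body:          the JSON text produced by the REST endpoint
    content_type:  the content type of the unwrapped response
    accept:        value of the request's Accept header
    query_params:  dict of query parameters of the request
    """
    callback = query_params.get(callback_param)
    wants_json = "application/json" in accept
    if callback and wants_json:
        return "application/javascript", f"{callback}({body})"
    return content_type, body

print(maybe_wrap_jsonp('[{"name":"test"}]', "application/json",
                       "application/json", {"jsonp": "foo"}))
```

Without the callback parameter, or with a different Accept header, the response passes through unchanged.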



Setting the name of the callback parameter

The name of the callback parameter (jsonp in the above example) can be set in web.xml:


<filter>
<filter-name>JsonPFilter</filter-name>
<filter-class>org.rhq.enterprise.rest.JsonPFilter</filter-class>
<init-param>
<param-name>filter.jsonp.callback</param-name>
<param-value>jsonp</param-value>
<description>Name of the callback to use for JsonP</description>
</init-param>
</filter>
Modeling Metric Data in Cassandra
RHQ supports three types of metric data - numeric, traits, and call time. Numeric metrics include things like the amount of free memory on a system or the number of transactions per minute. Traits are strings that track information about a resource and typically change in value much less frequently than numeric metrics. Some examples of traits include server start time and server version. Call time metrics capture the execution time of requests against a resource. An example of call time metrics is EJB method execution time.

I have read several times that with Cassandra it is best to let your queries dictate your schema design. I recently spent some time thinking about RHQ's data model for metrics and how it might look in Cassandra. I decided to focus only on traits for the time being, but much of what I discuss applies to the other metric types as well.

I will provide a little background on the existing data model to make it easier to understand some of the things I touch on. All metric data in RHQ belongs to resources. A particular resource might support metrics like those in the examples above, or it might support something entirely different. A resource has a type, and the resource type defines which types of metrics it supports. We refer to these as measurement definitions. These measurement definitions, along with other metadata associated with the resource type, are defined in the plugin descriptor of the plugin that is responsible for managing the resource. You can think of a resource type as an abstraction, and a resource as a realization of that abstraction. Similarly, a measurement definition is an abstraction, and a measurement schedule is a realization of a measurement definition. A resource can have multiple measurement schedules, and each schedule is associated with a measurement definition. The schedule has a number of attributes like the collection interval, an enabled flag, and the value. When the agent reports metric data to the RHQ server, the data is associated with a particular schedule. To tie it all together, here is a snippet of some of the relevant parts of the measurement classes:

To review, for a given measurement schedule, we can potentially add an increasing number of rows in the RHQ_MEASUREMENT_DATA_TRAIT table over time. There are a lot of fields included in the snippet for MeasurementDefinition. I chose to include most of them because they are pertinent to the discussion.

For the Cassandra integration, I am interested primarily in the MeasurementDataTrait class. All of the other types are managed by the RHQ database. Initially when I started thinking about what column families I would need, I felt overcome with writer's block. Then I reminded myself to think about trait queries and try to let those guide my design. I decided to focus on some resource-level queries and leave others like group-level queries for a later exercise. Here is a screenshot of one of the resource-level views where the queries are used:


Let me talk a little about this view. There are a few things to point out in order to understand the approach I took with the Cassandra schema. First, this is a list view of all the resource's traits. Second, the view shows only the latest value for each trait. Finally, the fields required by this query span multiple tables and include resource id, schedule id, definition id, display name, value, and time stamp. Because the fields span multiple tables, one or more joins are required for this query. There are two things I want to accomplish with the column family design in Cassandra. I want to be able to fetch all of the data with a single read, and I want to be able to fetch all of the traits for a resource in that read. Cassandra of course does not support joins, so some denormalization is needed to meet my requirements. I have two column families for storing trait data. Here is the first one, which supports the above list view, as a Cassandra CLI script:
create column family resource_traits
with comparator = 'CompositeType(DateType, Int32Type, Int32Type, BooleanType, UTF8Type, UTF8Type)' and
default_validation_class = UTF8Type and
key_validation_class = Int32Type;
The row key is the resource id. The column names are a composite type that consists of the time stamp, schedule id, definition id, enabled flag, display type, and display name. The column value is a string and is the latest known value of the trait. This design allows for the latest values of all traits to be fetched in a single read. It also gives me the flexibility to perform additional filtering. For example, I can query for all traits that are enabled or disabled. Or I can query for all traits whose values last changed after a certain date/time. Before I talk about the ramifications of the denormalization I want to introduce the other column family that tracks the historical data. Here is the CLI script for it:
create column family traits
with comparator = DateType and
default_validation_class = UTF8Type and
key_validation_class = Int32Type;
This column family is pretty straightforward. The row key is the schedule id. The column name is the time stamp, and the column value is the trait value. In the relational design, we only store a new row in the trait table if the value has changed. I have only done some preliminary investigation, and I am not yet sure how to replicate that behavior with a single write. I may need to use a custom comparator. It is something I have to revisit.

I want to talk a little bit about the denormalization. As far as this example goes, the system of record for everything except the trait data is the RHQ database. Suppose a schedule is disabled. That will now require a write to both the RHQ database and Cassandra. When a new trait value is persisted, two writes have to be made to Cassandra - one to add a column to the traits column family and one to update the resource_traits column family.
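
The two writes can be sketched with in-memory dictionaries standing in for the two column families. Everything here is illustrative: TraitStore and its methods are hypothetical names, not RHQ or Cassandra APIs. Because the time stamp is part of the composite column name, the sketch removes the old column for a schedule before adding the new one; whether a real implementation would do it this way is an assumption.

```python
from collections import defaultdict


class TraitStore:
    """In-memory stand-in for the two column families (illustration only)."""

    def __init__(self):
        # traits: row key = schedule id, column name = time stamp
        self.traits = defaultdict(dict)
        # resource_traits: row key = resource id, column name = composite tuple
        self.resource_traits = defaultdict(dict)

    def store_trait(self, resource_id, schedule_id, definition_id, enabled,
                    display_type, display_name, timestamp, value):
        # Write 1: add a column to the history of this schedule
        self.traits[schedule_id][timestamp] = value

        # Write 2: replace the latest-value column for this schedule
        row = self.resource_traits[resource_id]
        for name in [n for n in row if n[1] == schedule_id]:
            del row[name]
        composite = (timestamp, schedule_id, definition_id, enabled,
                     display_type, display_name)
        row[composite] = value

    def latest_traits(self, resource_id):
        # One "read" of a single row, columns sorted by composite name
        return sorted(self.resource_traits[resource_id].items())
```

Storing two values for the same schedule leaves two columns in traits but only one, the newest, in resource_traits.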

The last thing I will mention about the design is that I could have opted for a more row-based approach where each column in resource_traits is stored in a separate row. With that approach, I would use statically named columns like scheduleId, and the corresponding value would be something like 1234. The primary reason I decided against this is that the RandomPartitioner is used for the partitioning strategy, which happens to be the default. RandomPartitioner is strongly recommended for most cases to allow for even key distribution across nodes. Without going into detail, ordered range scans, i.e., row-based scans over a range of keys, are not practical when using the RandomPartitioner because rows are placed by a hash of the key. Additionally, Cassandra is designed to perform better with slice queries, i.e., column-based queries, than with range queries.

The design may change as I get further along in the implementation, but it is a good starting point. The denormalization allows for efficient querying of a resource's traits and offers the flexibility for additional filtering. There are some trade-offs that have to be made, but at this point, I feel that they are worthwhile. One thing is for certain. Studying the existing (SQL/JPA) queries and understanding what data is involved, and how, helped flesh out the column family design.
Working with Cassandra
RHQ provides a rich feature set in terms of its monitoring capabilities. In addition to collecting and storing metric data, RHQ automatically generates baselines, allows you to view graphs of data points over different time intervals, and gives you the ability to alert on metric data. RHQ uses a single database for storing all of its data. This includes everything from the inventory, to plugin meta data, to metric data. This presents an architectural challenge for the measurement subsystem particularly in terms of scale. As the number of managed resources grows and the volume of metrics being collected increases, database performance starts to degrade. Despite various optimizations that have been made, the database remains a performance bottleneck. The reality is that the relational database simply is not the best tool for write-intensive applications like time-series data.

This architectural challenge has in large part motivated me to start learning about Cassandra. There are plenty of other non-relational database systems that I think could address the performance problems with our measurement subsystem. There are a couple of things about Cassandra that intrigued me enough to invest time in learning about it.

The first point of intrigue is that Cassandra is a distributed, peer-to-peer system with no single point of failure. Any node in the cluster can serve read and write requests. Nodes can be added to and removed from the cluster at any point, making it easier to meet demands around scalability. This design is largely inspired by Amazon's Dynamo.

The second point of intrigue for me is that running a node involves running only a single Java process. For the purposes of RHQ and JBoss Operations Network (JON), this is much more important to me than the first point about single points of failure. The fewer the moving parts, the better. It simplifies management, which will go a long way towards the goal of having a self-contained solution.

Cassandra could be a great fit for RHQ, and the time I have spent thus far learning it is definitely time well spent. There are some learning curves and hurdles one has to overcome though. I find the project documentation to be lacking. For example, it took some time to wrap my head around super columns. It was only after I started understanding super columns to the point where I could begin thinking about how to leverage them with RHQ's data model that I then discovered that composite columns should be favored over super columns. Apparently composite columns do not have the performance and memory overhead inherent to super columns. And composite columns allow for an arbitrary level of nesting whereas super columns do not. Fortunately DataStax's docs help fill in a lot of the gaps.

One thing that was somewhat counter-intuitive initially is how the sorting works. With a relational database, you first define the schema, and then queries are defined later on. Sorting is done on column values and is specified at query time. With Cassandra, sorting is based on column names and is specified at the time of schema creation. This might seem really strange if you are thinking in terms of a relational database, but Cassandra is a distributed key-value store. If you think about it more along the lines of say, java.util.TreeMap, then it makes a lot more sense. With a TreeMap, sorting is done on keys. When I want to use a TreeMap or another ordered collection, I have to decide in advance how the elements of the collection should be ordered. This aspect of Cassandra is a good thing. It contributes to the high performance read/writes for which Cassandra is known. It also lends itself very well to working with time-series data.
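
To make the TreeMap analogy concrete, here is a small sketch of a row whose columns are kept sorted by name, so a time-range "slice" is just a lookup of two positions. The SortedRow class is a made-up illustration, not a Cassandra API.

```python
import bisect


class SortedRow:
    """A row whose columns are kept sorted by name, like a TreeMap."""

    def __init__(self):
        self._names = []    # column names, always in sorted order
        self._values = {}   # column name -> column value

    def insert(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)  # keep the names ordered on write
        self._values[name] = value

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name <= end."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


row = SortedRow()
for timestamp, value in [(300, "c"), (100, "a"), (200, "b")]:
    row.insert(timestamp, value)
print(row.slice(150, 300))  # [(200, 'b'), (300, 'c')]
```

The ordering is decided when the row is defined (here: natural ordering of the names), not when it is queried, which is the point of the comparison.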

DataStax posted a great blog the other day about how they use Cassandra as a metrics database. The algorithm described sounds similar to what we do in RHQ; however, there are a few differences (aside from the obvious one of using different database systems). One difference is in bucket sizes. They use bucket sizes of one minute, five minutes, two hours, and twenty-four hours. RHQ uses bucket sizes of one hour, six hours, and twenty-four hours. I will briefly explain what this means. RHQ writes raw data points into a set of round-robin tables. Every hour a job runs to perform aggregation. The latest hour of data points is aggregated into the one hour table or bucket. RHQ calculates the max, min, and average for each metric collection schedule. When the one hour table has six hours' worth of data, it is aggregated and written into the six hour table.
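
As a rough sketch of the first aggregation step, the following groups raw data points into one hour buckets per schedule and computes min, max, and average. The function name and the tuple layout are assumptions made for the example; it does not show how RHQ's job is actually written, and it only covers raw data, not the six hour or twenty-four hour rollups.

```python
from collections import defaultdict

HOUR_MS = 60 * 60 * 1000


def aggregate(points, bucket_ms=HOUR_MS):
    """Aggregate raw points into fixed-size time buckets.

    points: iterable of (schedule_id, timestamp_ms, value)
    returns: {(schedule_id, bucket_start_ms): (min, max, avg)}
    """
    buckets = defaultdict(list)
    for schedule_id, timestamp, value in points:
        bucket_start = timestamp - (timestamp % bucket_ms)
        buckets[(schedule_id, bucket_start)].append(value)
    return {
        key: (min(values), max(values), sum(values) / len(values))
        for key, values in buckets.items()
    }


raw = [(1, 0, 2.0), (1, 1000, 4.0), (1, HOUR_MS, 10.0)]
print(aggregate(raw))
```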

Disk space is cheap, but it is not infinite. There needs to be a purge mechanism in place to prevent unbounded growth. For RHQ, the hourly job that does the aggregation also handles the purging. Data in the six hour bucket for instance, is kept for 31 days. With Cassandra, DataStax simply relies on Cassandra's built-in TTL (time to live) feature. When data is written into a column, the TTL is set on it so that it will expire after the specified duration.
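
The TTL idea can be illustrated with a tiny in-memory model: each column carries its own expiry time, and reads simply skip expired columns, so no separate purge job is needed. TtlColumn and live_columns are hypothetical names for this sketch; Cassandra handles expiry internally.

```python
import time

SIX_HOUR_RETENTION = 31 * 24 * 60 * 60  # 31 days, in seconds


class TtlColumn:
    """A column value that expires ttl_seconds after it was written."""

    def __init__(self, value, ttl_seconds, now=None):
        written_at = time.time() if now is None else now
        self.value = value
        self.expires_at = written_at + ttl_seconds

    def is_live(self, now=None):
        current = time.time() if now is None else now
        return current < self.expires_at


def live_columns(row, now=None):
    """Return only the columns of a row that have not expired yet."""
    return {name: col.value for name, col in row.items() if col.is_live(now)}
```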

So far it has been a good learning experience. Cassandra is clearly an excellent fit for storing RHQ's metric data, but I am starting to see how it could also be a good fit for other parts of the data model as well.
RHQ REST api: added support for Group Definitions


I've just added some support for GroupDefinitions (aka "DynaGroups") to RHQ.

The following shows some examples:

List all definitions:

$ curl -i --user rhqadmin:rhqadmin http://localhost:7080/rest/1/group/definitions
[{"name":"group1",
"id":10001,
"description":"just some random test",
"expression":["groupby resource.type.plugin","groupby resource.type.name"],
"recursive":false,
"recalcInterval":180000,
"generatedGroupIds":[10082,10091]},
{"name":"platforms",
"id":10002,
"description":"",
"expression":["resource.type.category = PLATFORM","groupby resource.name"],
"recursive":false,
"recalcInterval":0,
"generatedGroupIds":[10152]}
]
Get a single definition by id:

$ curl -i --user rhqadmin:rhqadmin http://localhost:7080/rest/1/group/definition/10002
{"name":"platforms",
"id":10002,
"description":"",
"expression":["resource.type.category = PLATFORM","groupby resource.name"],
"recursive":false,
"recalcInterval":0,
"generatedGroupIds":[10152]
}


You see in the above examples that the actual expression is encoded as a list with each line being an item in the list. The recalculation interval needs to be given in milliseconds.
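
A client could build such a payload from a multi-line expression roughly as follows. The helper name is made up, the field names are taken from the output above, and whether recalcInterval is accepted on every request is an assumption of this sketch.

```python
import json


def group_definition_payload(name, expression, description="", recalc_minutes=0):
    """Build the JSON body for a group definition.

    expression is the multi-line expression text; each non-empty line
    becomes one item of the list. The interval is converted to milliseconds.
    """
    lines = [line.strip() for line in expression.splitlines() if line.strip()]
    return json.dumps({
        "name": name,
        "description": description,
        "expression": lines,
        "recalcInterval": recalc_minutes * 60 * 1000,
    })


text = """
groupby resource.type.plugin
groupby resource.type.name
"""
print(group_definition_payload("group1", text, "just some random test", 3))
```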

Delete a definition (by id):

$ curl -i --user rhqadmin:rhqadmin http://localhost:7080/rest/1/group/definition/10031 -X DELETE
Create a new definition:

$ curl -i --user rhqadmin:rhqadmin http://localhost:7080/rest/1/group/definitions \
-HContent-Type:application/json -HAccept:application/json -X POST \
-d '{"name":"test1","description":"Hello","expression":["groupBy resource.name"]}'
HTTP/1.1 201 Created
Location: http://localhost:7080/rest/1/group/definition/10041


For creation a name is required. The location of the created group definition is returned in the header of the response.

And finally to update a definition:

$ curl -i --user rhqadmin:rhqadmin http://localhost:7080/rest/1/group/definition/10041?recalculate=true \
-HContent-Type:application/json -HAccept:application/json -X PUT \
-d '{"name":"test4","description":"Hello","expression":["groupBy resource.name"]}'


By passing the query-param recalculate=true we can trigger a re-calculation of the groups defined by this group definition.

Monitoring Custom Data from DB Queries
There is an interesting little feature in RHQ that I thought I would quickly mention here. Specifically, it's a feature of the Postgres plugin that lets you track metrics from any generic query you specify.

Suppose you have data in your database and you want to expose that data as a metric. For example, suppose you want to track the total number of users that are currently logged into your application and that information is tucked away in some database table that you can query.

Import your Postgres database into RHQ and manually add a "Query" resource under your Postgres Database Server resource (see the image below where the "Import" menu provides you with the names of the resource types you can manually add as a child to the database server resource - in this case, the only option is the Query resource type).



When you "import" this Query resource through the manual add feature, you will be asked for, among other things, the query that you want to execute that extracts your metric data.



Once you do, you'll have a new Query resource in your RHQ inventory that is now tracking your metric value like any other metric (e.g. you will be able to see the historical values of your data in the graph on the Monitoring tab; you'll be able to alert on those values; etc.)



The one quirky thing about this is the query needs to return a single row of two columns - the first column must have a value of "metricColumn" (literally) and the second column must be a numeric value. To follow the earlier example (tracking the number of users currently logged in), it could be something like:

SELECT 'metricColumn', count(id) FROM my_application_user WHERE is_logged_in = true
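
If you want to check the shape of your query before handing it to RHQ, something like the following sketch can help. It uses an in-memory SQLite database as a stand-in for Postgres (so the boolean is written as 1 instead of true), and read_metric is a made-up helper, not part of the plugin.

```python
import sqlite3


def read_metric(conn, query):
    """Run the query and return its numeric value, enforcing the expected shape."""
    rows = conn.execute(query).fetchall()
    if len(rows) != 1 or len(rows[0]) != 2:
        raise ValueError("query must return exactly one row with two columns")
    label, value = rows[0]
    if label != "metricColumn":
        raise ValueError("first column must be the literal 'metricColumn'")
    return float(value)


# Hypothetical test data: three users, two of them logged in
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_application_user (id INTEGER, is_logged_in INTEGER)")
conn.executemany("INSERT INTO my_application_user VALUES (?, ?)",
                 [(1, 1), (2, 0), (3, 1)])

QUERY = ("SELECT 'metricColumn', count(id) "
         "FROM my_application_user WHERE is_logged_in = 1")
print(read_metric(conn, QUERY))  # 2.0
```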

That's it. A pretty simple feature, but it seems like this could have a wide range of uses. Hopefully, this little tidbit can spark an idea in your head about how you can use this feature while monitoring your systems.
Sending XMPP Notifications through RHQ
This post is for those who work day to day with server and platform monitoring tools such as RHQ. Typically, these tools can send notifications when some kind of alert occurs: insufficient memory, processing overload, connection limits, etc. In most cases, the notification is sent [...]
Sending RHQ Alerts over XMPP
Rafael has created a very cool server plugin to allow RHQ to send alerts to his Google account over XMPP. Not only that, but he was able to use that same XMPP channel to send commands to the RHQ Server, like another kind of CLI.

Watch his demo to see how it works. Very awesome!

This is exactly the kind of innovation we envisioned the community being able to do via the server plugins, as I mentioned in my earlier blog entry titled "RHQ Server Plugins - Innovation Made Easy"

Nice job, Rafael!
Managing Compliance Across Multiple Resources
In the latest RHQ project release, and Red Hat's JBoss Operations Network product, a new feature called "drift monitoring" has been introduced.

If you've seen my previous blog, it (along with its demo) described how you can monitor changes in files belonging to a single resource. If a resource's files deviate from a trusted snapshot, that resource will be considered out of compliance and the UI will indicate when this happens.

This pinning/compliance feature within the drift subsystem can be combined with drift templating to allow you to pin a single snapshot of content to multiple resources. This allows you to have a single snapshot which all resources can share. In other words, if you have a cluster of app servers and they all have the same web application deployed, you can pin a snapshot of the in-compliant web application to a drift template that all of your app servers use when they scan for drift. Thus, for an entire cluster, if one of the servers in that cluster drifts away from that shared, in-compliant snapshot, that server will be flagged as "out of compliance".
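
Conceptually, checking a cluster against one pinned snapshot boils down to comparing each server's files with the same reference. The sketch below models snapshots as dictionaries from file path to content hash; compliance_report and the data layout are illustrative assumptions, not how RHQ stores or compares snapshots.

```python
def compliance_report(pinned, cluster):
    """Compare every server's files against one shared, pinned snapshot.

    pinned:  {path: content_hash} for the snapshot pinned to the template
    cluster: {server_name: {path: content_hash}} as last reported per server
    returns: {server_name: [drifted paths]} for servers out of compliance
    """
    report = {}
    for server, files in cluster.items():
        drifted = sorted(
            path for path in pinned.keys() | files.keys()
            if pinned.get(path) != files.get(path)
        )
        if drifted:
            report[server] = drifted
    return report


pinned = {"deploy/app.war": "abc123"}
cluster = {
    "node1": {"deploy/app.war": "abc123"},
    "node2": {"deploy/app.war": "ffff00"},  # drifted from the pinned snapshot
}
print(compliance_report(pinned, cluster))  # {'node2': ['deploy/app.war']}
```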

To see how this works, see my "Managing Compliance Across Multiple Resources" demo.
Managing Compliance with Drift Pinning
In the latest RHQ project release, and Red Hat's JBoss Operations Network product, a new feature called "drift monitoring" has been introduced.

Drift monitoring provides the ability to monitor changes in files and to determine if those files are in or out of compliance with a desired state. In other words, if I installed an application and someone changes files in that installation, I can be told when those changes occurred and I can analyze those changes.

I put together another demo for my "drift demo series" that illustrates this concept:

How it works is this - suppose you have a set of files on your machine and you don't want those files changed. In other words, the files you have now are "in compliance" with what you want and this set of in compliance files should not be touched. In RHQ/JON, you would create a new drift definition and pin that definition to your current snapshot of files. This pinning effectively marks that snapshot of files as the "in compliant" version. Any changes now made to those files being tracked will be considered drift and out of compliance. In the graphical user interface, you can see what has gone out of compliance and you can drill down to see what files drifted and even what parts of those files drifted.

This pinning/compliance feature within the drift subsystem can be combined with drift templating to allow you to pin a single snapshot of content to multiple resources allowing you to have a single snapshot which all resources can share. In other words, if I have a cluster of app servers and they all have the same web application deployed, I can pin a snapshot of the in-compliant web application to a drift template that all my app servers use when they scan for drift. Thus, for my entire cluster, if one of my servers in that cluster drifts away from that shared, in-compliant snapshot, that server will be flagged as "out of compliance". I will be posting another demo to illustrate this concept in the near future.
JBoss ON 3.0 Released and a Drift Demo
Red Hat has released the next version of JBoss Operations Network. This is version 3.0 and incorporates the new look and feel of the GWT user interface that RHQ introduced recently. The bundle UI has been enhanced a bit as well (it was the only GWT-based portion of the UI that was in JBoss ON 2.4.1).

A new feature introduced in this JBoss ON 3.0 release is drift monitoring. If you have been following the progress of the RHQ upstream project, you'll know that this drift monitoring feature provides administrators with the ability to keep on the lookout for unintended changes (either accidental or malicious) to their installed software and other file content.

I posted a demo showing the basics of the new drift feature:
I plan on posting some more demos on drift in the near future.
RHQ Sending Notifications to Nagios
Hey everyone. Many users have adopted RHQ (JON/JOPR) as a monitoring tool, not only for JBoss instances but also for other resources such as operating systems, databases, web servers, etc. However, many want to centralize the visibility of the alerts generated by RHQ together with the alerts from other monitoring tools. Typically, this centralization happens through [...]
Drift Management Coming to RHQ
Introduction
I am excited to share that we are very close to releasing a beta of RHQ 4.1.0. I have been working on Drift Management, one of the new features going into the release. I have been meaning to write a little bit about what this new feature is all about, and now is as good a time as any. I will try to provide a high level overview and save getting into more specific, detailed topics for future posts.

What is Drift?
The first thing we need to do is define what exactly is meant by the term Drift Management. Let's start with the first part. Conceptually, we can define drift as an unplanned or unintended change to a managed resource. Let's consider a couple of examples to illustrate the concept.

We have an EAP server that is configured for production use. That is, things like the JVM heap size, data source definitions, etc. are configured with production values. At some point suppose the heap settings for the EAP server are changed such that they are no longer consistent with what is expected for production use. This constitutes drift.

Now let's consider another example involving application deployment. Suppose we have a cluster of EAP servers that is running our business application. We deploy an updated version of the application. For some reason, one of the cluster nodes does not get updated with the newer version of the application while the others do. We now have a cluster node that does not have the content that is expected to be deployed on it. This constitutes drift.

Why Do We Care about Drift?
Now that we have looked at some examples to illustrate the concept of drift, there is a perfectly reasonable question to ask. Why should we care? Unplanned or unintended changes frequently lead to problems. Those problems can manifest themselves as production failures, defects, outages, etc. Even with planned, intended changes, problems arise. It is not a question of if but rather when. A production server going down can result in a significant loss of time and money among other things. Anything you can do to be proactive in handling issues when they occur could help save your organization time, money, and resources.

How Will RHQ Manage Drift?
What can RHQ do to deal with drift? First and foremost, it can monitor resources for unintended or unplanned changes. RHQ allows you to specify which resources or which parts of resources you want to monitor for drift. The agent can periodically scan the file system looking for changes. When the agent detects a change, it notifies the server with the details of what has changed.
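
The scan-and-compare idea can be sketched as follows. This is a simplified illustration, not the agent's code: the function names are made up, and using SHA-256 content hashes to detect changes is an assumption of the sketch.

```python
import hashlib
import os


def snapshot(root):
    """Map each file path (relative to root) to the SHA-256 of its content."""
    result = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            result[os.path.relpath(path, root)] = digest
    return result


def detect_drift(old, new):
    """Compare two snapshots and report added, removed, and changed files."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```

A periodic scan would call snapshot() on the monitored directory, compare the result with the previous snapshot via detect_drift(), and report any non-empty change set to the server.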

The server maintains a history of the changes it receives from the agent. This makes it possible for example to compare the state of a resource today versus its state two weeks ago. One of the many interesting and challenging problems we are tackling is how to present that history in meaningful ways so that users can quickly and easily identify changes of interest.

An integral aspect of RHQ's monitoring capabilities is its alerting system. RHQ allows you to define different rules which can result in alerts being triggered. For example, we can create a rule that will trigger an alert whenever an EAP server goes down. Similarly, RHQ could (and will) give you the ability to have alerts triggered whenever drift is detected on any of your managed EAP servers.

Another key aspect of RHQ's drift management functionality is remediation. Some platforms and products provide automatic remediation. Consider the earlier example of the changed heap settings on the EAP server. With automatic remediation, those settings might be reverted to their original values as soon as the change is detected.

Then there is also manual remediation. Think merge conflicts in a version control system. There are lots of visual editors for viewing diffs and resolving conflicts. A couple that I use are diffmerge and meld. RHQ will provide interfaces and tools for generating and viewing diffs and for performing remediation much in the same way you might with a visual diff editor.

What's Next?
Here is a quick run down of drift management features that will be in the beta:

  • Enable drift management for individual resources
    • This involves defining the drift configuration or rules which specify what files to monitor for drift and how often monitoring should be done
  • Perform drift monitoring (done by the agent)
  • View change history in the UI
  • Execute commands from the CLI to:
    • Query for change history
    • Generate snapshots
      • A snapshot provides a point in time view of a resource for a specified set of changes
    • Diff snapshots (This is not a file diff)

Here are some notable features that will not be available in the beta:
  • Define filters that specify which files to include/exclude in drift monitoring (Note that you actually can define the filters. They just are not handled by the agent yet)
  • Perform manual remediation (i.e., visual diff editor)
  • Support for golden images (more on this in a future post)
  • Generate/view snapshots in the UI
  • Alerts integration
It goes without saying that there will be bugs, some of which are known, and that functionality in the beta is subject to change in ways that will likely break compatibility with future releases. More information will be provided in the release notes as soon as they are available. Stay tuned!
Telling GWT To Ignore Certain Classes
We recently hit a problem in RHQ that required us to learn about a feature in GWT (specifically the GWT compiler) that was very useful and I thought I would blog about it.

First, some background. In RHQ, we have split up the source code into several "modules". Each module is built by Maven and artifacts are produced (for example, some modules will output a jar file after the build completes). Dependent artifacts are built first, then modules that depend on other modules' artifacts are built afterwards - Maven knows how to maintain the proper dependency hierarchy so it can build modules in the proper order. After our entire suite of modules is built, the build system assembles all the module artifacts into the RHQ distribution. As you can see, there is nothing special here; any complex application needs a build system that does the same thing.

One of RHQ's modules is what we call the "domain" module. This is simply the module that contains the source code for all of our domain objects that map to our data model (in other words, the domain objects simply represent the set of all of our database entities, such as Users, or Roles, or Resources, or Alerts, etc.)

These domain objects are essentially the basic building blocks on which all of RHQ is built. These domain objects are used all throughout the RHQ codebase - in the agent, in the server, in the remote CLI and in the GWT client. All of their modules depend on the domain module - the domain module is one of the first ones built.

Since our GWT client needs these domain objects, the domain module needs to not only be passed through the Java compiler, it needs a second pass through the GWT compiler.

Here is where the problem comes in. Some of our domain objects require that they import certain Java classes that GWT doesn't support. Because the domain objects are used everywhere, they end up needing to provide functionality that is required by the server and agent, not just the GWT client. But this functionality sometimes requires the use of certain JRE features that are not emulated, and thus not available, in the GWT runtime (things such as java.io or java.sql classes).

But how can we do this? If we import these GWT-unsupported JRE classes into some of the domain classes, the GWT client cannot load the domain jar and the GWT client becomes dead. But without those JRE classes, the functionality that the domain module needs to provide to the other modules (like the server or agent) becomes broken. It is a catch-22. So, the question becomes, how can we provide all of the domain module functionality to all dependent modules, but remove only portions of it that are not GWT compatible from the GWT client module?

One idea was to create another module - a second domain module - that was compatible with the GWT client (essentially it would be the original domain module, minus the GWT-incompatible code). We could possibly do this by refactoring the original domain objects and creatively using inheritance. But it would give us two domain jars. Adding yet another module to the build isn't something we wanted to do.

As it turns out, after being perplexed at how best to design around this problem, we found out that GWT provides a very, very easy solution to this. You can tell the GWT compiler to filter out and exclude certain classes when it compiles the Java source into GWT Javascript. Since those classes don't get compiled into GWT Javascript, the GWT client never sees them or tries to load them. Without having to split our domain module into two separate modules, we get the effect of having two different domain libraries - one for the GWT client and one for the rest of the RHQ system.

Now we can use normal Java inheritance to share common code while at the same time we ensure that only certain subclasses will use the non-GWT-supported features of the JRE. We can then exclude those subclasses from the GWT compilation thus allowing the GWT client to load the domain module, minus those classes it couldn't use anyway.

To use this feature, we had to add the <exclude> XML element to our GWT module's XML file in order to tell the GWT compiler which Java classes to ignore:


<entry-point class="org.rhq.core.client.RHQDomain" />
<source path="domain">
<exclude name="**/DriftFileBits.*" />
</source>


This essentially tells the GWT compiler to compile all of our domain module's classes except for the DriftFileBits class. If, in the future, we have to introduce other classes that are not supported by GWT, we can add more <exclude> elements to filter out those additional Java packages or individual classes.

One negative aspect of this is that we have to be careful to not "leak" references to those excluded classes into our GWT API or GWT implementation classes. But even if we do, it is caught rather quickly when the GWT client attempts to load the application.
Deploying RHQ Bundles To Non-Platform Resources
I recently completed the initial implementation of a feature that folks have been asking for lately, so I decided to put together a quick demo to show it.

First, some background. The initial RHQ Bundle Provisioning feature (which I have blogged about in the past) supported the ability to deploy bundle distributions (which are, essentially, just files of generic content) to a group of platforms. That is, given a set of files bundled together, RHQ can ship out those files to a group of platform resources where the agents on those platforms will unbundle the files to the root file system.

This new feature (submitted as enhancement request BZ 644328) now allows you to target your bundle deployments to a group of non-platform resources if you wish. For example (as the demo shows), we can now target bundle deployments to JBossAS Server resources. This means you can bundle up a WAR or an EAR, and push that bundle to a group of JBoss AS Servers such that the WAR or EAR gets deployed directly to the deploy/ directory (and hence gets deployed into your application servers).

The demo is 10 minutes long and shows what a simple workflow looks like to deploy a WAR bundle to a group of JBoss EAP application servers.

View the demo here.

[Note: unlike my previous demos which were built on a Windows laptop using Wink and formatted as Flash, I decided to try Istanbul on my Fedora desktop. So the demo is formatted in .ogg format, as opposed to Flash. Hopefully, this doesn't limit the audience that is able to view the demo.]
Manually Add Resources to Inventory from CLI
Resources in RHQ are typically added to the inventory through discovery scans that run on the agent. The plugin container (running inside the agent) invokes plugin components to discover resources. RHQ also allows you to manually add resources into inventory. There may be times when discovery scans fail to find a resource you want to manage. The other day I was asked whether or not you can manually add a resource to inventory via the CLI. Here is a small CLI script that demonstrates manually adding a Tomcat server into inventory.


The findResourceType and findPlatform functions are pretty straightforward. The interesting work happens in createTomcatConnectionProps and in manuallyAddTomcat. The key to it all though is on line 44. DiscoveryBoss provides methods for importing resources from the discovery queue as well as for manually adding resources. manuallyAddResources expects as arguments a resource type id, a parent resource id, and the plugin configuration (i.e., connection properties).

Determining the connection properties that you need to specify might not be entirely intuitive. I looked at the plugin descriptor as well as the TomcatDiscoveryComponent class from the tomcat plugin to determine the minimum, required connection properties that need to be included.

Here is how the script could be used from the CLI shell:

rhqadmin@localhost:7080$ login rhqadmin rhqadmin
rhqadmin@localhost:7080$ exec -f manual_add.js
rhqadmin@localhost:7080$ hostname = '127.0.0.1'
rhqadmin@localhost:7080$ tomcatDir = '/home/jsanda/Development/tomcat6'
rhqadmin@localhost:7080$ manuallyAddTomcat(hostname, tomcatDir)
Resource:
id: 12071
name: 127.0.0.1:8080
version: 6.0.24.0
resourceType: Tomcat Server

rhqadmin@localhost:7080$

This effectively adds the Tomcat server to the inventory of managed resources. This same approach can be used with other resource types. The key is knowing what connection properties you need to specify so that the plugin (in which the resource type is defined) knows how to connect to and manage the resource.
Making Links Work Right in a SmartGWT App on IE
The GUI of RHQ 4.0 and later is built upon SmartGWT. In many places in the SmartGWT app, we embedded raw HTML to render fragment links (e.g. #Inventory/Servers) to other places in the app. We used raw HTML, rather than widgets, for a few reasons:

  1. SmartGWT does not provide a link widget. GWT provides the HyperLink and InlineHyperLink widgets, but we try to avoid using non-SmartGWT widgets when possible to prevent layout issues or CSS issues caused by straying from the SmartGWT framework. A SmartGWT Label or HTMLFlow can be extended to simulate a link using a ClickHandler but it will not be rendered as an 'a' tag and so will not inherit the CSS styles used for 'a' tags and will not display the link's URL in the browser status bar when the user hovers over the link.
  2. Many of our links are inside ListGrid cells. There is no straightforward reliable way to embed arbitrary widgets in ListGrid cells. I tried using the mechanism described here (http://www.smartclient.com/smartgwt/showcase/#grid_cell_widgets) and encountered overflow and wrapping issues, which I was unable to overcome. The CellFormatter interface only supports returning a String, but that String can include HTML, so that's what we ended up using for cells that need to contain a link.
  3. For FormItems that need to contain links, CanvasItem can be extended in order to embed GWT HyperLink widgets, but using a StaticTextItem with HTML embedded in its value is more straightforward.
Unfortunately, we noticed that clicking on any of our raw HTML fragment links in IE caused a full page refresh, which is not at all desirable in a GWT app, which is intended to be pure AJAX. Further investigation revealed that this is a longstanding quirk (aka bug) in IE; rather than simply generating a history event for the URL with the updated fragment, it sends an unnecessary request for the URL to the server. If you use the GWT HyperLink widget, GWT uses some JavaScript fanciness involving iframes to circumvent the IE bug and make fragment links work properly. However, since we were using raw HTML for all the links in the RHQ GUI, this magic was not there for us. Converting all our HTML links would be a ton of work and was simply not a viable option for links in ListGrid cells for the reasons described above, so we needed to find a way to execute GWT's magic when any of our raw HTML links were clicked. The answer ended up being to add a native preview event handler that intercepts browser click events and executes the magic if the click was on one of our 'a' tags. We did this by making our EntryPoint class implement the GWT Event.NativePreviewHandler interface as follows:
public void onPreviewNativeEvent(Event.NativePreviewEvent event) {
    if (SC.isIE() && event.getTypeInt() == Event.ONCLICK) {
        NativeEvent nativeEvent = event.getNativeEvent();
        EventTarget target = nativeEvent.getEventTarget();
        if (Element.is(target)) {
            Element element = Element.as(target);
            if ("a".equalsIgnoreCase(element.getTagName())) {
                // make sure it's not a hyperlink that GWT already
                // handles
                if (element.getPropertyString("__listener") == null) {
                    String url = element.getAttribute("href");
                    String historyToken = getHistoryToken(url);
                    if (historyToken != null) {
                        GWT.log("Forcing History.newItem(\"" +
                            historyToken + "\")...");
                        History.newItem(historyToken);
                        nativeEvent.preventDefault();
                    }
                }
            }
        }
    }
}

private static String getHistoryToken(String url) {
    String token;
    if (url.startsWith("#")) {
        token = url.substring(1);
    } else if (url.startsWith("/#")) {
        token = url.substring(2);
    } else if (url.contains(Location.getHost()) && url.indexOf('#') > 0) {
        token = url.substring(url.indexOf('#') + 1);
    } else {
        token = null;
    }
    return token;
}
We then add the native preview handler at app load time by adding the following line to our EntryPoint class:
Event.addNativePreviewHandler(this);
This solution is working great. However, we still might eventually go back and switch over to using GWT HyperLinks, rather than raw HTML, in places where it is feasible, such as FormItems, since it is generally better to use widgets rather than raw HTML to keep things object-oriented and leave the generation of HTML, CSS, and JavaScript to the framework.
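
To see which links the handler would intercept, here is a small standalone sketch of the same token-extraction logic. It is an approximation for illustration only: the host is passed in as a parameter instead of being read from GWT's Location.getHost(), and the host name and URLs in the demo are made up.

```java
class HistoryTokenSketch {
    // Same branching as getHistoryToken in the handler, but with the host
    // supplied by the caller so it runs outside a GWT app.
    static String getHistoryToken(String url, String host) {
        if (url.startsWith("#")) {
            return url.substring(1);
        }
        if (url.startsWith("/#")) {
            return url.substring(2);
        }
        int hash = url.indexOf('#');
        if (url.contains(host) && hash > 0) {
            return url.substring(hash + 1);
        }
        return null; // not a fragment link into our own app
    }

    public static void main(String[] args) {
        String host = "localhost:7080"; // assumed host, for illustration only
        System.out.println(getHistoryToken("#Inventory/Servers", host));
        System.out.println(getHistoryToken("/#Inventory/Servers", host));
        System.out.println(getHistoryToken("http://localhost:7080/app/#Inventory/Servers", host));
        System.out.println(getHistoryToken("http://example.org/#Elsewhere", host));
    }
}
```

The first three calls yield the token Inventory/Servers. The last returns null, so a link to another site is left alone and the browser handles it normally.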
Detaching Hibernate Objects to pass to GWT
One thing we encountered pretty quickly when we started our GWT work was the fact that you can't serialize objects over the wire from server to GWT client if those objects were obtained via a Hibernate/JPA entity manager.

If you've ever worked with Hibernate/JPA, you'll know that when you get back entity POJOs whose fields are not loaded (i.e. marked for lazy loading and you didn't ask for the data to be loaded), your entity POJO instance will have Hibernate proxies where you would expect a "null" object to be (this is to allow you to load the data later, if your object is still attached to the entity manager session).

Having these proxies even after leaving a JPA entity manager session is a problem in the GWT world because the GWT client sitting in your browser doesn't have Hibernate classes available to it! Trying to send these entity POJO instances that have references to Hibernate proxies causes serialization errors and your GWT client will fail to operate properly.

This is a known issue and is discussed here.

We pretty quickly decided against using DTOs. As that page above mentioned, "if you have many Hibernate objects that need to be translated, the DTO / copy method creation process can be quite a hassle". We have a lot of domain objects that are used server side in RHQ. There was no reason why we shouldn't be able to reuse our domain objects both server side and client side - introducing DTOs just so we could workaround this serialization issue seemed ill-advised. It would have just added bloat and unnecessary complexity.

I can't remember how mature the Gilead project was at the time we started our GWT work, or maybe we just didn't realize it existed. Gilead does require you to have your domain objects and server side impl classes extend certain Java classes (LightEntity for example), so it has a slight downside that it requires you to modify all your domain objects. In any event, we do not use Gilead to do this detaching of hibernate proxies.

RHQ's solution was to write our own "Hibernate Detach Utility". This is a single static utility that you use to process your objects just prior to sending them over the wire to your GWT client. Essentially it scrubs your object of all Hibernate proxies, cleaning it such that it can be serialized over the wire successfully.

We also used this when we originally developed a web services interface to the RHQ remote API.

Here is the HibernateDetachUtility source code in case you are interested in seeing how we do it - maybe you could use this in your own GWT/Hibernate application. I think it is reusable - not much custom RHQ stuff is going on in here.
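
The scrubbing idea can be sketched roughly as follows. This is not the HibernateDetachUtility itself: the `LazyProxy` marker interface is a hypothetical stand-in for Hibernate's proxy types (so the example runs without Hibernate on the classpath), and the sketch only looks at an object's own fields rather than walking a whole object graph.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class DetachSketch {
    /** Hypothetical stand-in for a Hibernate proxy type; not a Hibernate API. */
    interface LazyProxy {}

    /** Replaces every field value that is a (stand-in) proxy with null. */
    static void scrub(Object entity) {
        for (Class<?> c = entity.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.getType().isPrimitive()) {
                    continue; // nothing to scrub here
                }
                try {
                    field.setAccessible(true);
                    if (field.get(entity) instanceof LazyProxy) {
                        field.set(entity, null); // unloaded data becomes a plain null
                    }
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("cannot scrub " + field, e);
                }
            }
        }
    }

    // Tiny made-up domain object for the demo.
    static class Resource {
        String name;
        Resource parent;

        Resource(String name, Resource parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    // Plays the role of the proxy Hibernate would leave in a lazy field.
    static class ResourceProxy extends Resource implements LazyProxy {
        ResourceProxy() {
            super(null, null);
        }
    }

    public static void main(String[] args) {
        Resource tomcat = new Resource("tomcat", new ResourceProxy());
        scrub(tomcat);
        System.out.println(tomcat.name + " parent=" + tomcat.parent);
    }
}
```

After `scrub`, the object contains only ordinary values and nulls, which is the state it needs to be in before being serialized to the GWT client.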
RHQ 4 Has Been Released
It has been a very long road for this one, but we managed to release RHQ 4.

We managed to standardize on GWT as our user interface framework. Here's an example of what the new GWT based UI looks like:


There are still a few JSP pages around, but for the most part, the RHQ GUI is now a GWT application with SmartGWT components. One of the fun parts of this job is to learn new and exciting technologies (though I guess "new" is relative - but I still consider GWT a "new" GUI technology, compared to things like Struts and JSP).

Take a look, give it a test drive and see what you think.
A REPL for the RHQ Plugin Container
Overview
RHQ plugins run inside of a plugin container that provides different services and manages the life cycles of plugins. The plugin container in turn runs inside of the RHQ agent. If you are not familiar with the agent, it is deployed to each machine you want RHQ to manage. You can read more about it here and here. While the plugin container runs inside of the agent, it is not coupled to the agent. In fact, it is used quite a bit outside of the agent. It is used in Embedded Jopr which was intended to be a replacement for the JMX web console in JBoss AS. The plugin container is also used a lot during development in automated tests.

My teammate, Heiko Rupp, has developed a cool wrapper application for the plugin container. It defines a handful of commands for working with the plugin container interactively. What is nice about this is that it can really speed up plugin development. Heiko has written several articles about the standalone container including Working on a standalone PluginContainer wrapper and I love the Standalone container in Jopr (updated). After reading some of his posts I got to thinking that a REPL for the plugin container would be really nice, though not just any REPL. I was thinking specifically about Clojure's REPL.

I have spent some time exploring the different ways Clojure could be effectively integrated with RHQ. There is little doubt in my mind that this is one of them. I recently started working on some Clojure functions to make working with the plugin container easier. I am utilizing Clojure's immutable and persistent data structures as well as some of the other great language features such as first class functions and multimethods. I am trying to make these functions easy enough to use so that someone who might not be a very experienced Clojure programmer might still find them useful during plugin development and testing.

Getting the Code
The project is available on github at https://github.com/jsanda/clj-rhq. It is built with leiningen, so you will need to have it installed. I typically run a swank server and connect from Emacs, but you can also start a REPL session directly if you are not an Emacs user. The project pulls in the necessary dependencies so that you can work with plugin container-related classes as you will see in the following sections.

Running the Code
These steps assume that you already have leiningen installed. First, clone the project:

git clone https://jsanda@github.com/jsanda/clj-rhq.git

Next, download project dependencies with:

lein deps

Some plugins rely on native code provided by the Sigar library which you should find at clj-rhq/lib/sigar-dist-1.6.5.132.zip. Create a directory in lib named native and unzip sigar-dist-1.6.5.132.zip there. The project is configured to look for native libraries in lib/native.

Finally, if you are using Emacs run lein swank to start a swank server; otherwise, run lein repl to start a REPL session on the command line.

Starting/Stopping the Plugin Container

The first thing I do is call require to load the rhq.plugin-container namespace. Then I call the start function. The plugin container emits a line of output, and then the function returns nil. Next I verify that the plugin container has started up by calling running?. Then I call the stop function to shut down the plugin container and finally call running? again to verify that the plugin container has indeed shut down.

Executing Discovery Scans
So far we have looked at starting and stopping the PC. One of the nice things about working interactively in the REPL is that you are not limited to a pre-defined set of functions. If rhq.plugin-container did not offer any functions for executing a discovery scan, you could write something like the following:


The pc function simply returns the plugin container singleton which gives us access to the InventoryManager. We call InventoryManager's executeServerScanImmediately method and store the InventoryReport object that it returns in a variable named inventory-report. Alternatively you can use the discover function.


On the first call to discover we pass the keyword :SERVER as an argument. This results in a server scan being run. On the second call, we pass :SERVICE which results in a service scan being run. If you invoke discover with no arguments, a server scan is executed followed by a service scan. The two inventory reports from those scans are returned in a vector. The use of the count function to see how many resources were discovered is a good example that demonstrates how you can easily use functions defined outside of the rhq.plugin-container namespace to provide additional capabilities and functionality.

Searching the Local Inventory
Once you have the plugin container running and are able to execute discovery scans, you need a way to query the inventory for resources with which you want to work. The inventory function does just that. It can be invoked in one of two ways. In its simpler form which takes no arguments, it returns the platform Resource object. In its more complex form, it takes a map of pre-defined filters and returns a lazy sequence of those resources that match the filters.


inventory is invoked on line 1 without any arguments, and then a string version of it is returned with a call to str. The type is Mac OS X indicating that the object is in fact the platform resource. On line 5 we invoke inventory with a single filter to include resources that are available.  That call shows that there are 62 resources in inventory that are up.  On line 7 we query for resources that are a service and see that there are 60 in inventory. On line 9 we specify multiple filters that will return down services. When multiple filters are specified, a resource must match each one in order to be included in the results. On line 10 we query for webapps from the JBossAS plugin. On line 13 we specify a custom filter in the form of an anonymous function with the :fn key. This filter finds resources that define at least two metrics.

Conclusion
We have looked at a number of functions to make working with the plugin container from the REPL a bit easier. Each function should also include a useful docstring as in the following example,


We have only scratched the surface with the functions in the rhq.plugin-container namespace. In some future posts we will explore invoking resource operations, updating resource configurations, and deploying resources like EARs and WARs.
Remote Streams in RHQ
The agent/server communication layer in RHQ provides rich, bi-directional communication that is highly configurable, performant, and fault tolerant. And as a developer it has another feature of which I am quite fond - I rarely have to think about it. It just works.

Recently I started working on a new feature that involves streaming potentially large files from agent to server. This work has led me to look under the hood of the comm layer to an extent. The comm layer allows for high-level APIs between server and agent. Consider the following example:

public interface ConfigurationServerService {
...
@Asynchronous(guaranteedDelivery = true)
void persistUpdatedResourceConfiguration(int resourceId, Configuration resourceConfiguration);
}

The agent calls persistUpdatedResourceConfiguration when it has detected a resource configuration change that has occurred outside of RHQ. The @Asynchronous annotation tells the communication layer that the remote method call from agent to server can be performed asynchronously. There are no special stubs or proxies that I have to worry about to use this remote API. It is all nicely tucked away in the communication layer.

Several posts could be devoted to discussing RHQ's communication layer but back to my current work of streaming large files. I needed to put in place a remote API on the server so that the agent can upload files. You might consider something like the following as an initial approach:

// Remote API exposed by RHQ server to stream files from agent to server
void uploadFile(byte[] data);

The problem with this approach is that it involves loading the entire file contents into memory. Files could easily exceed several hundred megabytes, resulting in memory usage that would be impractical. The RHQ agent is finely tuned to keep a low footprint in terms of memory usage as well as CPU utilization. When reading the contents of a large file that is too big to fit into memory, java.io.InputStream is commonly used. With the RHQ communication layer, I am able to expose an API like the following,

// Remote API exposed by RHQ server to stream files from agent to server
void uploadFile(InputStream stream);

With this API, the agent passes an InputStream object to the server. Keep in mind though that none of Java's standard InputStream classes implement Serializable which is a requirement for using objects with a remote invocation framework like RMI or JBoss Remoting. Fortunately for me RHQ provides the RemoteInputStream class which extends java.io.InputStream. The Javadocs from that class state,

This is an input stream that actually pulls down the stream data from a remote server. Note that this extends InputStream so it can be used as any normal stream object; however, all methods are overridden to actually delegate the methods to the remote stream.

When the agent wants to upload a file, it calls uploadFile passing a RemoteInputStream object. The server can then read from the input stream just as it would any other input stream unbeknownst to it that the bytes are being streamed over the wire.
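
To make the memory argument concrete, here is a minimal sketch of how a receiver might consume such a stream with a fixed-size buffer. It uses only standard java.io classes. The `copy` helper and the in-memory streams in the demo are illustrative assumptions, not RHQ code, and a ByteArrayInputStream stands in for the RemoteInputStream the server would actually receive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamUploadSketch {
    // Copies the stream in 8 KB chunks, so memory use stays constant
    // no matter how large the uploaded file is.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[20_000]; // pretend this is a large file
        try (InputStream in = new ByteArrayInputStream(payload);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            System.out.println("copied " + copy(in, out) + " bytes");
        }
    }
}
```

Because the reading code depends only on the InputStream contract, it should not need to change when the stream's bytes are pulled over the wire instead of from local memory.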

While I find myself impressed with RemoteInputStream, it gets even better. I wanted to read from the stream asynchronously. When the agent calls uploadFile, instead of reading from the stream in the thread handling the request, I fire off a message to a JMS queue to free up the thread to service other agent requests. I am able to pass the RemoteInputStream object in a JMS message and have a Message Driven Bean then read from the stream to upload the file from the agent.

This level of abstraction along with the performance, fault tolerance, and stability characteristics of the agent/server communication layer makes it one of those hidden gems you do not really appreciate until you have to look under the hood so to speak. And rarely if ever do I find myself having to look under the hood because... it just works. Lastly, I should point out that there is a RemoteOutputStream class that complements the RemoteInputStream class.
RHQ Bundle Recipe for Deploying JBoss Server
My colleague mazz wrote an excellent blog post that describes in detail the provisioning feature of RHQ. The post links to a nice Flash demo he put together to illustrate the various things he discusses in his article. Taking what I learned from his post, I put together a simple recipe to deploy a JBoss EAP server and then start the server after it has been laid down on the destination file system. Here is the recipe:


The bundle declaration itself on lines 4 - 11 is pretty straightforward. If this part is not clear, read through the docs on Ant bundles. Where things became a little less than straightforward is with the <exec> task starting on line 18. The first problem I encountered was Ant saying that it could not find run.sh. I think this is because it was looking for it on my PATH. Adding resolveexecutable="true" on line 21 took care of this problem. This tells Ant to look for the executable in the specified execution directory.

On line 22 I specify arguments to run.sh. -b 0.0.0.0 tells JBoss to bind to all available addresses. Initially I had line 22 written as:

<arg value="-b 0.0.0.0"/>

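That did not get parsed correctly and resulted in JBoss throwing an exception with an error message saying that an invalid bind address was specified. With the value attribute, Ant passes the whole string, embedded space included, as a single argument. Specifying the line attribute instead, which splits the string on whitespace into separate arguments, fixed the problem.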

The last problem I encountered was Ant complaining that it did not have the necessary permissions to execute run.sh. It turned out that when the EAP distro was unpacked, the scripts in the bin directory were not executable. This is why I added the <chmod> call on line 17. It seems that the executable file mode bits are getting lost somewhere along the way in the deployment process. I went ahead and filed a bug for this. You can view the ticket here.

After working through these issues, I was able to successfully deploy my JBoss server and have it start up without error. Now I can easily deploy my bundle to a single machine, a cluster of RHEL servers that might serve as a QA or staging environment, or even a group of heterogeneous machines that could consist of Windows, Fedora (or other Linux distros), and Mac OS X. Very cool! Provisioning is still a relatively new feature in RHQ. It adds tremendous value to the platform, and fortunately I think it can add even more value. One of the things I would like to see is more support for common tasks like starting/stopping a server, whether it is in the form of custom Ant tasks or something else.
Alerting and Remote Script Execution
RHQ has the ability to invoke operations on any resource when it triggers alerts. When you combine this with the ability of the Script plugin to run remote scripts via its "Execute" operation, you have a very powerful mechanism to integrate your own processes and rules to help correct or workaround abnormal conditions that occur in your managed environment.

Because I've heard several people ask if RHQ has this ability, I put together a flash demo that shows how you do it. The demo shows how you can execute any script on a remote machine when an alert is triggered by RHQ. For example, if you set up an alert that detects when your app server is using an abnormally large number of threads (a possible indication of heavy load), you can have RHQ execute a custom script on your app server machine to help alleviate problems that might occur due to that condition (such a script could be one that reconfigures a load balancer to help redirect load away from your app server).

The use-cases for this feature are virtually endless. Any set of alert conditions on any managed resource can trigger the execution of any script you have. And as the demo illustrates, it is really easy to set this up within RHQ.
Bundle Provisioning Via RHQ
A number of enhancements and UI clean-ups were added to the RHQ Bundle/Provisioning feature. So, I figure now would be a good time to re-introduce it in a new blog entry. I also put together a flash demo if you would like to see the UI in action.

Let me recap what this RHQ Bundle/Provisioning feature is all about. RHQ allows you to bundle up a set of files and push them out to remote machines. You can install and upgrade these sets of files as well as revert back to a previous version of the files or purge the bundle files completely. You sometimes see this mentioned as the "Provisioning" feature, and other times you will see it referred to as the "Bundle" subsystem. (I prefer the term "Bundle" since that is the term that the RHQ user interface uses).

There are a few concepts you must know in order to understand how RHQ does its thing. This is covered on the wiki, but I'll try to explain it briefly here, too.

First is the concept of "bundle". A "bundle" is a logical concept and basically refers to an application ("Pet Store Application" or "My Wiki Server"). A bundle has one or more "versions". A "bundle version" refers to an actual set of files that you want to push out to a set of remote machines. Think of it as your application distribution. Each bundle version has its own "recipe" which tells RHQ what files exist in the bundle, and how to configure and provision those files to the remote machines. Developers or application packagers are responsible for writing the recipe and 'bundling up' the application's files (hence the name 'bundle') with the recipe into an RHQ "bundle version".

What you do with a bundle version brings us to the next set of concepts. A bundle "destination" is associated with a specific bundle and is simply a place (or places) where you want to deploy your bundle. A destination specifies two things - a group (which contains one or more remote machines) and a destination directory (which specifies where on the remote machines' filesystems the bundle files should go). Once you have a "bundle destination" in place, you can begin to deploy one or more of its bundle's versions to that destination. A bundle "deployment" represents one deployment of a bundle version to a destination.

It may make more sense if I give an example. Suppose I have a web application (call it "My Application") that runs inside a JBoss application server. This is my "bundle". I actually have two versions of my application, 1.0 and 2.0. These are my "bundle versions".

Now suppose that I have a QA environment that consists of two machines - a Windows machine and a Linux machine. I want to test my application on my two QA machines. So I need to install my application on both of them. I want to install my application on each machine's "/home/mazz/opt/myapp" directory. This group of two QA machines, along with the destination directory, is a "bundle destination" for my application bundle (call it the "QA destination"). I also have a group of three Linux machines that make up my production environment. After my application passes all tests, I want to deploy my application to that production environment in the "/opt" directory. That group of three production machines, along with the "/opt" directory specification, is another bundle destination associated with my application bundle - call it the "production destination".

Once I tell RHQ to deploy the "1.0" bundle version of my application to my "QA destination", I will have a "bundle deployment". This bundle deployment will be considered the "live" deployment because it's the last one I pushed out. I can then test that version while it is on my QA machines. Suppose I find that I want to upgrade my QA environment with the newer "2.0" version of my application. I simply deploy that bundle version to the "QA destination" and now I have a second "bundle deployment". This second deployment is now considered live. If I find that I do not like this newer "2.0" version of my application, I can ask RHQ to revert back to the last live deployment (which was my "1.0" bundle version) - this revert becomes yet another "bundle deployment" (the third) but it reverts back to the "1.0" bundle version content. Once I pass all of my QA tests, I can then deploy whatever bundle version I deem appropriate to my "production destination".
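
The deploy/upgrade/revert sequence above can be modeled with a toy sketch. This is purely an illustration of the concepts as described here, not RHQ's domain model or API: the class and method names are invented, and each deployment is reduced to the bundle version string it carries.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one bundle destination and its deployment history.
class BundleDestinationSketch {
    private final String name;
    // Bundle version carried by each deployment, oldest first.
    private final List<String> deployments = new ArrayList<>();

    BundleDestinationSketch(String name) {
        this.name = name;
    }

    // Every deploy adds a new deployment, which becomes the live one.
    void deploy(String bundleVersion) {
        deployments.add(bundleVersion);
    }

    // A revert is itself a new deployment that carries the content of the
    // deployment that was live before the current one.
    void revert() {
        if (deployments.size() < 2) {
            throw new IllegalStateException("no previous deployment to revert to");
        }
        deploy(deployments.get(deployments.size() - 2));
    }

    String liveVersion() {
        return deployments.isEmpty() ? null : deployments.get(deployments.size() - 1);
    }

    int deploymentCount() {
        return deployments.size();
    }

    public static void main(String[] args) {
        BundleDestinationSketch qa = new BundleDestinationSketch("QA destination");
        qa.deploy("1.0");  // first deployment, now live
        qa.deploy("2.0");  // second deployment, now live
        qa.revert();       // third deployment, with 1.0 content again
        System.out.println(qa.name + ": live=" + qa.liveVersion()
            + ", deployments=" + qa.deploymentCount());
    }
}
```

The detail the sketch is meant to highlight is that a revert does not delete history: it is recorded as a third deployment whose content happens to be the "1.0" bundle version.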

Most of what I describe above is actually demonstrated in my flash demo. The only thing I do not show in the demo is the use of a second "production destination", but it is the same effort to deploy to a second destination as it is to deploy to the first destination.

One new feature that has been introduced to RHQ is the ability to "purge" a destination of all bundle content. If, for example, you want to remove all bundle files completely from the QA destination, you can ask RHQ to purge that destination. What RHQ will do is remove all bundle content from the remote machines that were associated with that destination.

Another new feature that has been added is the ability for RHQ to deploy a bundle into an already existing deployment directory that may have other non-managed content that should be left alone. Such would be the case if you want to deploy an EAR or WAR to a JBossAS deploy/ directory (which obviously has other files inside of it). This deserves some additional explanation.

Typically, you will want to deploy a set of application files into its own directory on some file system. For example, if you have a JBoss application server, you want to install it in something like "/opt/my-jboss". All of your application server files are in that directory, but no other files are in there. If you want to remove your JBoss application server installation, it is as simple as "rm -rf /opt/my-jboss".

However, what if you deployed a bundle version in that directory already, but you then upgrade that bundle deployment with a new bundle version? In this case, you will already have files in /opt/my-jboss (the original bundle version content). RHQ will actually overwrite, backup or ignore conflicting files that it finds following strict upgrade rules. If, for example, RHQ finds files in that /opt/my-jboss directory that don't belong to the new bundle version, they will be removed. RHQ calls this "managing the root directory".

This is usually what you want if you are deploying a standalone software product. If there are any unknown files in the deployment directory, RHQ has to remove them to make sure the bundle deployment directory is exactly in the state the new bundle version recipe wants it to be. However, this is not what you want if you desire to deploy an EAR or WAR to an already existing JBossAS's deploy/ directory. That's because we already know there will be unrelated files in this deploy/ directory that must remain intact and in place. RHQ must leave any files it finds in that destination directory alone - even though they aren't part of our bundle deployment. In other words, we do not want RHQ to manage this root deployment directory.

RHQ now supports this by allowing the Ant recipe author to specify the manageRootDir="false" attribute in the rhq:deployment-unit task. This new feature is documented in Bugzilla 659142 and this new attribute is documented on the RHQ wiki.
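
To make the difference concrete, here is a deliberately simplified model of the decision described above, written in JavaScript. It is not RHQ's implementation and it ignores the real upgrade rules (backups, conflict handling); the function name and the file names are hypothetical and only illustrate what happens to unknown files when the root directory is managed versus when it is not.

```javascript
// Simplified model, NOT RHQ code: which unknown files would be removed
// from the destination directory during a bundle deployment?
// unknownFilesToRemove and all file names below are hypothetical.
function unknownFilesToRemove(existingFiles, bundleFiles, manageRootDir) {
  if (!manageRootDir) {
    // manageRootDir="false": files unrelated to the bundle are left alone
    return [];
  }
  // manageRootDir="true": anything the bundle does not know about goes away
  return existingFiles.filter(function (file) {
    return bundleFiles.indexOf(file) === -1;
  });
}

var deployDir = ["my-app.war", "other-app.ear", "jmx-console.war"];
var bundle = ["my-app.war"];

var managed = unknownFilesToRemove(deployDir, bundle, true);
var unmanaged = unknownFilesToRemove(deployDir, bundle, false);

console.log(managed);   // the two files that are not part of the bundle
console.log(unmanaged); // nothing is removed
```

With a shared deploy/ directory, the second call is the behavior you want: the bundle's own files are deployed and everything else stays in place.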

You can read more about the Provisioning/Bundle feature on the RHQ wiki.
Writing an RHQ Plugin in Clojure
Clojure is a new, exciting language. My biggest problem with it is that I do not find enough time to work with it. One of the ways I am trying to increase my exposure to Clojure is by exploring ways of integrating it into RHQ. RHQ is well-suited for integrating non-Java, JVM languages because it was designed and built to be extended. In previous posts I have talked about various extension points including agent plugins, server plugins, and remote clients.

I decided to write an agent plugin in Clojure. If you are not familiar with RHQ plugins or what is involved with implementing one, check out this excellent tutorial from my colleague Heiko. Right now, I am just doing exploratory work. I have a few goals in mind though as I go down this path.

First, I have no desire to wind up writing Java in Clojure. By that I mean that I do not want to get bogged down dealing with mutable objects. One of the big draws to Clojure for me is that it is a purely functional language with immutable data structures; so, as I continue my exploration of integrating Clojure with RHQ, I want to write idiomatic Clojure code to the greatest extent possible.

Secondly, I want to preserve what I like to think of as the Clojure development experience. Clojure is a very dynamic language in which functions and name spaces can be loaded and reloaded on the fly. The REPL is an invaluable tool. It provides instant feedback. In my experience Test-Driven Development usually results in short, quick development iterations. TDD + REPL produces extremely fast development iterations.

Lastly, I want to build on the aforementioned goals in order to create a framework for writing RHQ plugins in Clojure. For instance, I want to be able to run and test my plugin in a running plugin container directly from the REPL. And then when I make a change to some function in my plugin, I want to be able to just reload that code without having to rebuild or redeploy the plugin.

Now that I have provided a little background on where I hope to go, let's take a look at where I am currently. Here is the first cut at my Clojure plugin.


I am using gen-class to generate the plugin component classes. As you can see, this is just a skeleton implementation. Here is the plugin descriptor.


I have run into some problems though when I deploy the plugin. When the plugin container attempts to instantiate the plugin component to perform a discovery scan, the following error is thrown:


I was not entirely surprised to see such an error because I have heard about some of the complexities involved with trying to run Clojure in an OSGi container, and the RHQ plugin container shares some similarities with OSGi. There is a lot of class loader magic that goes on with the plugin container. For instance, each plugin has its own class loader, plugins can be reloaded at runtime, and the container limits the visibility of certain classes and packages. I came across this Clojure Jira ticket, CLJ-260, which talks about setting the context class loader. Unfortunately this did not help my situation because the context class loader is already set to the plugin class loader.

After spinning my wheels a bit, I decided to try a different approach. I implemented my plugin component class in Java, and it delegates to a Clojure script. Here is the code for it.


And here is the Clojure script,


This version deploys without error. I have not fully grokked the class loading issues, but at least for now, I am going to stick with a thin Java layer that delegates to my Clojure code. Up until now, I have been using leiningen to build my code, but now that I am looking at a mixed code base, I may consider switching over to Maven. I use Emacs and Par Edit for Clojure, but I use IntelliJ for Java. The IDE support for Maven will come in handy when I am working on the Java side of things.
Server-Side Scripting in RHQ
Introduction
The RHQ platform can be extended in several ways, most notably through plugins that run on agents. There also exists the capability to extend the platform's functionality with scripting via the CLI. The CLI is a remote client, and even scripts that are run on the same machine on which the RHQ server resides are still a form of client-side scripting because they run in a separate process and operate on a set of remote APIs exposed by the server.

In this post I am going to introduce a way to do server-side scripting. That is, the scripts are run in the same JVM in which the RHQ server is running. This form of scripting is in no way mutually exclusive to writing CLI scripts; rather, it is complementary. While a large number of remote APIs are exposed through the CLI, they do not encompass all of the functionality internal to the RHQ server. Server-side scripts however, have full and complete access to the internal APIs of the RHQ server.


Server Plugins
RHQ 3.0.0 introduced server plugins which are distinct from agent plugins. Server plugins run directly in the RHQ server inside a server plugin container. Unlike agent plugins, they do not perform any resource discovery. The article, RHQ Server Plugins - Innovation Made Easy, provides a great introduction to server plugins. Similar to agent plugins, server plugins can expose operations which can be invoked from the UI. They can also be configured to run as scheduled jobs. Server plugins have full access to the internal APIs of the RHQ server. Reference documentation for server plugins can be found here. The server-side scripting capability we are going to look at is provided by a server plugin.

Groovy Script Server
Groovy Script Server is a plugin that allows you to dynamically execute Groovy scripts directly on the RHQ server. Documentation for the plugin can be found here. The plugin currently provides a handful of features including,
  • Customizable classpath per script
  • Easy access to RHQ EJBs through dynamic properties
  • An expressive DSL for generating criteria queries

An Example
Now that we have introduced server plugins and the Groovy Script Server, it is time for an example. A while back, I wrote a post on a way to auto-import resources into inventory using the CLI. We will revisit that script, written as a server-side script.

resourceIds = []

criteria(Resource) {
filters = [inventoryStatus: InventoryStatus.NEW]
}.exec(SubjectManager.overlord) { resourceIds << it.id }

DiscoveryBoss.importResources(SubjectManager.overlord, (resourceIds as int[]))

On line three we call the criteria method which is available to all scripts. This method provides our criteria query DSL. Notice that the method takes a single parameter - the class for which the criteria query is being generated. Filters are specified as a map of property names to property values.
Property names are derived from the various addFilterXXX methods exposed by the criteria object being built. In this instance, the filter corresponds to the method ResourceCriteria.addFilterInventoryStatus.
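
The naming convention can be illustrated with a few lines of JavaScript. This is only a sketch of the mapping between a filter property name and the addFilterXXX method name; the helper function is hypothetical and says nothing about how the Groovy Script Server actually implements the DSL.

```javascript
// Hypothetical helper: derive the addFilterXXX method name that a filter
// property such as "inventoryStatus" corresponds to.
function filterMethodName(propertyName) {
  return "addFilter" +
    propertyName.charAt(0).toUpperCase() +
    propertyName.slice(1);
}

var methodName = filterMethodName("inventoryStatus");
console.log(methodName);
```

Read in the other direction, dropping the addFilter prefix from a criteria method and lower-casing the first letter gives the property name to use in the filters map.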

The criteria method returns a criteria object that corresponds to the class argument. In this example, a ResourceCriteria object is returned. Notice that exec is called on the generated ResourceCriteria object. This method is dynamically added to each generated criteria object. It takes care of calling the appropriate manager which in this case is ResourceManager. exec takes two arguments - a Subject and a closure. Most stateless session bean methods in RHQ go through a security layer to ensure that the user specified by the Subject has the necessary permissions to perform the requested operation. In the CLI, you may have noticed that you do not have to pass a Subject to the various manager methods. This is because the CLI implicitly passes the Subject corresponding to the logged in user. The second argument, a closure, is called once for each entity in the results returned from exec.

Let's look at a second example that builds off of the previous one. Instead of auto-importing everything in the discovery queue, suppose we only want to import JBoss AS 5 or AS 6 instances.

resourceIds = []

criteria(Resource) {
filters = [
inventoryStatus: InventoryStatus.NEW,
resourceTypeName: 'JBossAS Server',
pluginName: 'JBossAS5'
]
}.exec(SubjectManager.overlord) { resourceIds << it.id }

DiscoveryBoss.importResources(SubjectManager.overlord, (resourceIds as int[]))

Here we add two additional filters for the resource type name and the plugin. If we did not filter on the plugin name in addition to the resource type name, then our results could include JBoss AS 4 instances which we do not want.

Future Work
The Groovy Script Server, as well as server plugins in general, are relatively new to RHQ. There are some enhancements that I already have planned. The first is adding support for running scripts as scheduled jobs. This is one of the big features of server plugins. With support for scheduled jobs, we could configure the auto-inventory script to run periodically, freeing us from manually having to log into the server to execute the script. The CLI version of the script could be wrapped in a cron job. If we did that with the CLI script though, we might want to include some error handling logic in case the server is down or otherwise unavailable. With the server-side scheduled job, we do not need that kind of error handling logic.

The second thing I have planned is to put together additional documentation and examples. With the work that has already been done, the server-side scripting capability opens up a lot of interesting possibilities. I would love to hear feedback on how you might utilize the script server as well as any enhancements that you might like to see.
RHQ: Deleting Agent Plugins
Introduction
RHQ is an extensible management platform; however, the platform itself does not provide the management capabilities. For example, there is nothing built into the platform for managing a JBoss AS cluster. The platform is agnostic of the resources and types of resources it manages, like the JBoss AS cluster. The management capabilities for resources like JBoss AS are provided through plugins. RHQ's plugin architecture allows the platform to be extended in ways such that it can manage virtually any type of resource.

Plugin JAR files can be deployed and installed on an RHQ server (or cluster of servers), they can be upgraded, and they can even be disabled. They cannot however be deleted. In this post, we spend a little bit of time exploring plugin management, from the perspectives of installing and upgrading to disabling them. Then we consider my recent work for deleting plugins.

Installing Plugins
Plugins can be installed in one of two ways. The first involves copying the plugin JAR file to /jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins. And starting with RHQ 3.0.0, you can alternatively copy the plugin JAR file to /plugins, which is arguably easier given the much shorter path. The RHQ server will periodically scan these directories for new plugin files. When a new or updated plugin is detected, the server will deploy the plugin. This approach is particularly convenient during development when the RHQ server is running on the same machine on which I am developing. In fact, RHQ's Maven build is set up to copy plugins to a development server as part of the build process.

The second approach to installing a plugin involves uploading the plugin file through the web UI. The screenshot below shows the UI for plugin file upload.



Deploying plugins through the web UI is particularly useful when the plugin is on a different file system than the one on which the RHQ server is running. It is worth noting that there currently is no API exposed for installing plugins through the CLI.

Upgrading Plugins
The platform not only supports deploying new plugins that previously have not been installed in the system, but it also supports upgrading existing plugins. From a user's perspective there really is no difference in upgrading a plugin versus installing one for the first time. The steps are the same. And the RHQ server, for the most part, treats both scenarios the same as well.

Installing a new or upgraded plugin does not affect any agents that are currently running. Agents have to be explicitly updated in one of a number of ways including,
  • Restarting the agent
  • Restarting the plugin container
  • Issuing the plugins update command from the agent prompt
  • Issuing a resource operation for one of the above. This can be done from the UI or from the CLI
  • Issuing a resource operation for one of the above from a server script.
Disabling Plugins
Installed plugins can be disabled. Disabling a plugin results in agents ignoring that plugin once the agent is restarted (or more precisely, when the plugin container running inside the agent is restarted). The plugin container will not load that plugin, which means resource components, discovery components, and plugin classloaders are not loaded. This results in a reduced memory footprint of the agent. It also reduces overall CPU utilization since the agent's plugin container is performing fewer discovery and availability scans.

Plugins can be disabled on a per-agent basis allowing for a more heterogeneous deployment of agents. For instance, I might have a web server that is only running Apache and the agent that is monitoring it, while on another machine I have a JBoss AS instance running.  I could disable the JBoss-related plugins on the Apache box freeing up memory and CPU cycles. Likewise, I can disable the Apache plugins on the box running JBoss AS.

When a plugin is disabled, nothing is removed from the database. Any resources already in inventory from the disabled plugin remain in inventory. Type definitions from the disabled plugin also remain in the system.

Deleting Plugins
Recently I have been working on adding support for deleting plugins. Deleting a plugin not only deletes the actual plugin from the system, but also everything associated with it including all type definitions and all resources of the types defined in the plugin. When disabling a plugin, the plugin container has to be explicitly restarted in order for it to pick up the changes. This is not the case though with deleting plugins. Agents periodically send inventory reports to the server. If the report contains a resource of a type that has been deleted, the server rejects the report and tells the agent that it contains stale resource types. The agent in turn recycles its plugin container, purging its local inventory of any stale types and updating its plugins to match what is on the server. No type definitions, discovery components, or resource components from the plugin will be loaded.

Use Cases for Plugin Deletion
There are a number of motivating use cases for supporting plugin deletion. The most important of these might be the added ability to downgrade a plugin. But we will also see the benefits plugin deletion brings to the plugin developer.

Downgrading Plugins
We have already mentioned that RHQ supports upgrading plugins. It does not however support downgrading a plugin. Deleting a plugin effectively provides a way to rollback to a previous version of a plugin. There may be times in a production deployment for example when a plugin does not behave as expected or as desired. Users currently do not have the capability to downgrade to a previous version of that plugin. Plugin deletion now makes this possible.

Working with Experimental Plugins
Working with an experimental plugin or one that might not be ready for production use carries with it certain risks. Some of those risks can be mitigated with the ability to disable a plugin; however, the plugin still exists in the system. Resources remain in inventory. Granted those resources can be deleted easily enough, but there is still some margin for error in so far as failing to delete all of the resources from the plugin or accidentally deleting the wrong resources. And there exists no way to remove type definitions such as metric definitions and operation definitions without direct database access. Having the ability to delete a plugin along with all of its type definitions and all instances of those type definitions completely eliminates these risks.

Simplifying Plugin Development
A typical work flow during plugin development includes incremental deployments to an RHQ server as changes are introduced to the plugin. Many if not all plugin developers have run into situations in which they have to blow away their database due to changes made in the plugin (this normally involves changes to type definitions in the plugin descriptor). This slows down development, sometimes considerably. Deleting a plugin should prove much less disruptive to a developer's work flow than having to start with a fresh database installation, particularly when a substantial amount of test data has been built up in the database. To that extent, I can really see the utility in a Maven plugin for RHQ plugin development that deploys the RHQ plugin to a development server. The Maven plugin could provide the option to delete the RHQ plugin if it already exists in the system before deploying the new version.

Conclusion
Development for the plugin deletion functionality is still ongoing, but I am confident that it will make it into the next major RHQ release. If you are interested in tracking the progress or experimenting with this new functionality, take a look at the delete-agent-plugin branch in the RHQ Git repo. This is where all of the work is currently being done. You can also check out this design document which provides a high level overview of the work involved.
Using Byteman To Analyze the RHQ Agent
I recently had a need to run some quick and dirty performance testing over a few method calls in one of the RHQ plugins that was deployed in my agent. I didn't have a profiler installed and ready to go so I used Byteman. I quickly wrote up a couple of Byteman rules (thanks to Jay S. for correcting some problems I had with those rules), ran the RHQ Agent with Byteman installed and got some really good data.

It didn't take me more than 15 minutes to get the agent and Byteman integrated and running with an initial set of rules, and it was very simple to explain it to Jay so he could run with the Agent/Byteman combo for his own testing. It took us longer to determine what rules we needed, since we were using different rules to probe around the agent runtime to see different method timings before we narrowed down a potential problem area.

Another nice thing about this is how easy it is to share rules in order to collaborate with others during testing. When Jay came up with a set of rules that produced an interesting set of data - he just sent his rules to me over IM, and I quickly installed them in my agent's Byteman instance. This allowed me to attempt to replicate the same behavior Jay was seeing with his own agent. Within a few minutes we were able to confirm the behavior in his environment was the same as in mine thanks to the data emitted by his rules.

I realized that this is really a handy tool for not only RHQ core developers, but also for plugin developers out in the community and those that have RHQ deployed and running in a managed environment. If a support engineer needs to get some data from deep within the agent or one of its plugins and log files can't provide this data, using Byteman is an easy, fast and effective way to do it - all without asking the user to perform some heavyweight install of a third-party profiler or to redeploy a "debug version" of the agent or plugin jars.

I bundled up the one required Byteman jar, a sample rules file and a script that you can use to run the RHQ Agent with Byteman installed and preconfigured to run the rules in the rules file. All you have to do is edit the rules file with your own rules and use the provided script to start the agent. No additional third party downloads/installs are needed, you can use the same RHQ Agent you have installed already and no additional reconfiguration of the RHQ Agent is required (except in the case where you have configured a custom RHQ_AGENT_ADDITIONAL_JAVA_OPTS value in rhq-agent-env.sh - if you do, see the comments in rhq-agent-with-byteman.sh to see what you need to do).

Read the wiki page on this for more information if you are interested. You could even tweak the rhq-agent-with-byteman.sh script to have it start your own Java app if you want to do this kind of thing outside of an RHQ environment.
Dealing with Asynchronous Workflows in the CLI
Introduction
There is constant, ongoing communication between agents and servers in RHQ. Agents, for example, send inventory and availability reports up to the server at regularly scheduled intervals. The server sends down resource-related requests such as updating a configuration or executing a resource operation. Examples of these include updating the connection pool setting for a JDBC data source and starting a JBoss AS server. Some of these work flows are performed in a synchronous manner while others are carried out in an asynchronous fashion. A really good example of an asynchronous work flow is scheduling a resource operation to execute at some point in the future. There is a common pattern used in implementing these asynchronous work flows. We will explore this pattern in some detail and then consider the impacts on remote clients like the CLI.

The Pattern
The asynchronous work flows are most prevalent in requests that produce mutative actions against resources. Let's go through the pattern.
  • A request is made on the server to take some action against a resource (e.g., invoke an operation, update connection properties, update configuration, deploy content, etc.)
  • The server logs the request on the audit trail
  • The server sends the request to the agent
    • Note that control is returned to the server immediately after sending the request to the agent. This means that the call to the agent will likely return before the requested action has actually been carried out.
  • The plugin container (running in the agent) invokes the appropriate resource component
  • The resource component carries out the request and reports the results back to the plugin container
  • The agent sends the response back to the server. The response will indicate success or failure.
  • The server updates the audit trail indicating that the request has completed and also whether it succeeded or failed.
    • Note that it is the same request that was originally logged on the original audit trail that is updated
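
The steps above can be sketched as a small, self-contained JavaScript simulation. Everything here is a stand-in: the in-memory auditTrail array, the fake agent, and the function names are hypothetical and exist only to show that the request is logged first, that control returns before the work is done, and that the same audit trail entry is updated once the agent reports back.

```javascript
// Hypothetical in-memory stand-in for the server's audit trail.
var auditTrail = [];

// Server side: log the request, hand it to the agent, return immediately.
function submitRequest(resourceId, action, agent) {
  var entry = {
    id: auditTrail.length + 1,
    resourceId: resourceId,
    action: action,
    status: "INPROGRESS",
    errorMessage: null
  };
  auditTrail.push(entry);          // the request is logged on the audit trail
  agent.enqueue(entry.id, action); // the request is sent; nothing has run yet
  return entry;
}

// Server side: the agent's response updates the entry logged earlier.
function reportResult(entryId, succeeded, errorMessage) {
  var entry = auditTrail[entryId - 1];
  entry.status = succeeded ? "SUCCESS" : "FAILURE";
  entry.errorMessage = succeeded ? null : errorMessage;
}

// Fake agent: queues work and carries it out later.
var agent = {
  pending: [],
  enqueue: function (entryId, action) {
    this.pending.push({ entryId: entryId, action: action });
  },
  runPending: function () {
    while (this.pending.length > 0) {
      var job = this.pending.shift();
      // a real resource component would do the work here
      reportResult(job.entryId, true, null);
    }
  }
};

var entry = submitRequest(42, "start", agent);
var statusAfterSubmit = entry.status; // still in progress
agent.runPending();
var statusAfterReport = entry.status; // updated on the same entry
console.log(statusAfterSubmit, statusAfterReport);
```

The point to notice is that submitRequest returns while the status is still in progress; the caller only learns the outcome by looking at the audit trail entry again later.
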
Let's revisit the earlier example of scheduling an operation to start a JBoss server. Suppose I schedule the operation to execute immediately. Then I navigate to the operation history page for the JBoss server. I will see the operation request listed in the history. The history page is a view of the audit trail. The operation shows a status of In Progress. We could continually refresh the page until we see the status change. Eventually it will change to Success or Failure. The status does not necessarily change immediately after the operation completes. It changes after the agent reports the results back to the server and the audit trail is updated.

As previously stated, this pattern is very common throughout RHQ. Consider making a resource configuration update which is performed asynchronously as well. Once I submit the configuration update request, I can navigate to the configuration history page to check the status of the request. The status of the update request will show in progress until the agent reports back to the server that the update has completed. When the agent reports back to the server, the corresponding audit trail entry is updated with the results. The same pattern can also be observed when manually adding a new resource into the inventory.

Understanding the Impact to the CLI
So what does this asynchronous work flow mean for remote clients, notably CLI scripts? First and foremost, you need to understand when and where requests are carried out asynchronously to avoid unpredictable, unexpected results. We will discuss a number of things that can potentially impact how you think about and how you write CLI scripts.

A method that returns without error does not necessarily mean that the operation succeeded
Let's say we have a requirement to write a script that performs a couple of resource configuration updates, but we only want to perform the second update if the first one succeeds. We might be inclined to implement this as follows,

ConfigurationManager.updateResourceConfiguration(resourceId, firstConfig);
ConfigurationManager.updateResourceConfiguration(resourceId, secondConfig);

Provided we are logged in as a user having the necessary permissions to update the resource configuration and provided the agent is online and available, the first call to updateResourceConfiguration will return without error. We proceed to submit the second configuration change, but the first update might have actually failed. With the code as is we could easily wind up violating the requirement of applying the second update only if the first succeeds. What we need to do here essentially is to block until the first configuration update finishes so that we can verify that it did in fact succeed. This can be implemented by polling the ResourceConfigurationUpdate object that is returned from the call to updateResourceConfiguration.

ConfigurationManager.updateResourceConfiguration(resourceId, firstConfig);
// Fetch the audit trail entry (a ResourceConfigurationUpdate) for the request
var update = ConfigurationManager.getLatestResourceConfigurationUpdate(resourceId);
while (update.status == ConfigurationUpdateStatus.INPROGRESS) {
    java.lang.Thread.sleep(2000); // sleep for 2 seconds
    // Re-fetch the entry to pick up any status change
    update = ConfigurationManager.getLatestResourceConfigurationUpdate(resourceId);
}
// Only apply the second update if the first one succeeded
if (update.status == ConfigurationUpdateStatus.SUCCESS) {
    ConfigurationManager.updateResourceConfiguration(resourceId, secondConfig);
}

The ResourceConfigurationUpdate object is our audit trail entry. The object's status will change once the resource component (running in the plugin container) finishes applying the update and the agent sends the response back to the server.

Resource proxies offer some polling support
Resource proxies greatly simplify working with a number of the RHQ APIs. Invoking resource operations is one of those enhanced areas. With a resource proxy, operations defined in the plugin descriptor appear as first class methods on the proxy object. This allows us to invoke a resource operation in a much more concise and intuitive fashion. Here is a brief example.

var jbossServerId1 = // look up resource id of JBoss server 1
var jbossServerId2 = // look up resource id of JBoss server 2
server1 = ProxyFactory.getResource(jbossServerId1);
server2 = ProxyFactory.getResource(jbossServerId2);
server1.start();
server2.start();

The call to server1.start() does not immediately return. It polls the status of the operation, waiting for it to complete. The proxy sleeps for a short delay and then fetches the ResourceOperationHistory object that was logged for the request. If a history object is found and if its status is something other than in progress, then the proxy returns the operation's results. If the history object indicates that the operation has not yet completed, the proxy will continue polling.

Resource proxies provide some great abstractions that simplify working in the CLI. The polling that is done behind the scenes for resource operations is yet another useful abstraction in that it makes a resource operation request look like a regular, synchronous method call. The polling however, is somewhat limited. We will take a closer look at some of the implementation details to better understand how it all works.

The delay or sleep interval is fixed
The thread in which the proxy is running sleeps for one second before it polls the history object. There is currently no way to specify a different delay or sleep interval. In many cases the one second delay should be suitable, but there might be situations in which a shorter or longer delay is preferred.

The number of polling intervals is fixed
The proxy will poll the ResourceOperationHistory at most ten times. There is currently no way to specify a different number of intervals. If after ten times, the history still has a status of in progress, the proxy simply returns the incomplete results. Or if no history is available, null is returned. In many cases the polling delays and intervals may be sufficient for operations to complete, but there is no guarantee.

The proxy will not poll indefinitely
This is really an extension of the last point about not being able to specify the number of polling intervals. There may be times when you want to block indefinitely until the operation completes. Resource proxies currently do not offer this behavior.
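
A script that needs more control than the proxy offers can do its own polling. The following is a minimal sketch of such a loop with a configurable delay and a configurable number of polls, not the proxy's actual code. The names pollHistory, fetchHistory and fakeOperation are hypothetical; in a CLI script fetchHistory would wrap whatever call retrieves the ResourceOperationHistory, and sleep would be function (ms) { java.lang.Thread.sleep(ms); }. The status is compared as a string so that the same code works with Java enum values and with plain test objects.

```javascript
// Poll until the history entry leaves the in-progress state or the poll
// budget is used up. Pass maxPolls: Infinity to block until completion.
function pollHistory(fetchHistory, options) {
  var history = null;
  for (var i = 0; i < options.maxPolls; i++) {
    options.sleep(options.delayMs); // sleep first, then look at the entry
    history = fetchHistory();
    if (history !== null && String(history.status) !== "INPROGRESS") {
      return history; // finished, successfully or not
    }
  }
  return history; // null (no entry found) or an entry still in progress
}

// Test double: an operation that completes on the Nth poll.
function fakeOperation(pollsUntilDone) {
  var calls = 0;
  return function () {
    calls++;
    return { status: calls >= pollsUntilDone ? "SUCCESS" : "INPROGRESS" };
  };
}

var slept = [];
function recordSleep(ms) { slept.push(ms); } // stands in for Thread.sleep

var limited = pollHistory(fakeOperation(4),
  { delayMs: 500, maxPolls: 2, sleep: recordSleep });
var unlimited = pollHistory(fakeOperation(4),
  { delayMs: 500, maxPolls: Infinity, sleep: recordSleep });

console.log(limited.status, unlimited.status);
```

The first call gives up after two polls and hands back the incomplete entry, which mirrors what the proxy does after its fixed number of attempts. The second call keeps polling until the operation reports a final status.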

Polling cannot be performed asynchronously
Let's say we want to start ten JBoss servers in succession. We want to know whether or not they start up successfully, but we are not concerned with the order in which they start. In this example some form of asynchronous polling would be appropriate. Let's further assume that each proxy winds up polling the maximum of ten intervals. Each call to server.start() will take a minimum of ten seconds plus whatever time it takes to retrieve and check the status of the ResourceOperationHistory. We can then conclude that it will take at least 100 seconds (ten servers at ten seconds each) to invoke the start operation on all of the JBoss servers. This could turn out to be very inefficient. In all likelihood, it would be faster to schedule the start operation, have control return back to the script immediately, and then schedule each subsequent operation. Then the script could block until all of the operations have completed.
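
The schedule-first, wait-afterwards approach might look something like the sketch below. It is an illustration under assumptions rather than working CLI code: startAll, scheduleStart and fakeServer are hypothetical names, and scheduleStart stands in for whatever CLI call schedules the operation and returns a way to look up its history entry. The sleep function is injected so the example runs anywhere; in the CLI it would wrap java.lang.Thread.sleep.

```javascript
// Phase 1 schedules every operation without waiting.
// Phase 2 polls all outstanding operations once per sleep interval.
function startAll(servers, options) {
  var pending = servers.map(function (server) {
    return { name: server.name, fetchHistory: server.scheduleStart(), history: null };
  });
  var finished = [];
  for (var round = 0; round < options.maxRounds && pending.length > 0; round++) {
    options.sleep(options.delayMs); // one sleep covers all pending operations
    pending = pending.filter(function (job) {
      job.history = job.fetchHistory();
      var done = job.history !== null &&
        String(job.history.status) !== "INPROGRESS";
      if (done) { finished.push(job); }
      return !done; // keep only the jobs that still need polling
    });
  }
  return { finished: finished, unfinished: pending };
}

// Test double: a server whose start operation completes on the Nth poll.
function fakeServer(name, pollsUntilDone) {
  return {
    name: name,
    scheduleStart: function () {
      var calls = 0;
      return function () {
        calls++;
        return { status: calls >= pollsUntilDone ? "SUCCESS" : "INPROGRESS" };
      };
    }
  };
}

var servers = [];
for (var i = 1; i <= 10; i++) {
  servers.push(fakeServer("jboss-" + i, 3));
}

var sleepCount = 0;
var result = startAll(servers, {
  delayMs: 1000,
  maxRounds: 10,
  sleep: function () { sleepCount++; } // stands in for Thread.sleep
});

console.log(result.finished.length, result.unfinished.length, sleepCount);
```

In this simulation each of the ten operations needs three polls. Polling them one after another would cost thirty sleep intervals, while polling them together costs three, because every interval is shared by all of the outstanding operations.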

As an aside, the previous example might better be solved by creating a resource group for the JBoss servers and then invoking the operation once on the entire group. The problems however, still manifest themselves with resource groups. Suppose we want to call operation O1 on resource group G1, followed by a call to operation O2 on group G2, followed by O3 on G3, etc. We are essentially faced with the same problems but now on a larger scale.

There is no uniform Audit Trail API
Scheduling a resource operation, submitting a resource configuration update, deploying content, etc. are generically speaking all operations that involve submitting a request to an agent (or multiple agents in the case of a group operation) for some mutative change to be applied to one or more resources. In each of the different scenarios, an entry is persisted on the respective audit trails. For example, with a resource operation, a ResourceOperationHistory object is persisted. When deploying a new resource (e.g., a WAR file), a CreateResourceHistory object is persisted. With a resource configuration change, a ResourceConfigurationUpdate is persisted. Each of these objects exposes a status property that indicates whether the request is in progress, has succeeded, or has failed. Each of them also exposes an error message property that is populated if the request fails or an unexpected error occurs.

Unfortunately, there is no common base class shared among these audit trail classes in which the status and error message properties are defined. This makes writing a generic polling solution more challenging, at least if the solution is to be implemented in Java. A solution in a dynamic language, like JavaScript, might prove easier since we can rely on duck typing. We could implement a generic solution that works with a status property, without regard to an object's type.
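
As a rough illustration of the duck typing idea, the sketch below classifies any audit trail entry by looking only at its status and error message, without caring which class it is. The function names are hypothetical, and so are the details it relies on: the error message is assumed to be exposed as errorMessage, the in-progress status is assumed to be spelled either INPROGRESS or IN_PROGRESS depending on the class, and every status other than in progress or success is lumped together as a failure, which is a simplification.

```javascript
// Classify an audit trail entry using nothing but its status property.
function outcomeOf(entry) {
  if (entry === null || entry === undefined) {
    return "UNKNOWN"; // no entry was found
  }
  // Normalize so that INPROGRESS and IN_PROGRESS are treated alike
  var status = String(entry.status).replace(/_/g, "").toUpperCase();
  if (status === "INPROGRESS") { return "IN_PROGRESS"; }
  if (status === "SUCCESS") { return "SUCCESS"; }
  return "FAILED"; // simplification: every other status counts as a failure
}

// Produce a short, human-readable summary of an entry.
function describe(entry) {
  var outcome = outcomeOf(entry);
  if (outcome === "FAILED" && entry.errorMessage) {
    return outcome + ": " + entry.errorMessage;
  }
  return outcome;
}

// Plain objects standing in for the different audit trail classes
var operationHistory = { status: "INPROGRESS", errorMessage: null };
var createHistory = { status: "IN_PROGRESS", errorMessage: null };
var configUpdate = { status: "FAILURE", errorMessage: "agent timed out" };

console.log(describe(operationHistory));
console.log(describe(createHistory));
console.log(describe(configUpdate));
```

A generic polling loop could then be written once against outcomeOf and reused for operations, configuration updates and resource creation alike.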

Conclusion
It is important to understand the work flows and communication patterns described here, as well as the current limitations in resource proxies, in order to write effective CLI scripts that have consistent behavior and predictable results. Consistent behavior and predictable results certainly do not mean that the same results are produced every time a script is run. They do mean, though, that given certain conditions we can make valid assumptions that hold true. For example, if we execute a resource operation and then block until the ResourceOperationHistory status has changed to SUCCESS, then we can reasonably assume that the operation did in fact complete successfully.

Many of the work flows in RHQ are necessarily asynchronous, and this has to be taken into account when working with a remote client like the CLI. Fortunately, there are many ways we can look to encapsulate much of this, shielding developers from the underlying complexities while at the same time not limiting developers in how they choose to deal with these issues.
Updating Metric Collection Schedules
One of the primary features of RHQ is monitoring. Metrics can be collected for resources across the inventory; that metric data is aggregated and then made available for viewing in graphs and tables. Measurements are collected at scheduled intervals. These collection schedules are configurable on a per-resource or per-group basis. Collection intervals can be increased or decreased, and metric collections can be turned on or off as well. There are any number of reasons why you might want to adjust metric collection schedules. One reason might be that collections are occurring too frequently and in turn causing performance degradation on the host machine.

RHQ exposes APIs for updating measurement schedules through the CLI. I find that some of the APIs exposed through the CLI are not the most intuitive or require a thorough understanding of the RHQ domain classes and APIs. Some of the APIs for dealing with measurements fall into this category. I have started working on putting together some utility scripts that can serve as higher-level building blocks for CLI scripts. My aim is to simplify various, common tasks when and where possible. I put together a JavaScript class that offers a more data-driven approach for updating measurement schedules. Let's look at an example.


The first thing we do is create an instance of MeasurementModule. This class exposes properties and methods for working with measurements, most notably the method updateSchedules. We call this method on line 15. updateSchedules takes a single object which specifies the measurement schedule changes. That object is defined on lines 5 - 13. It has to define three properties - context, id, and schedules.

context accepts only two values, 'Resource' or 'Group'. This property declares whether the update is for an individual resource or for a resource group.

The value of context determines how id is interpreted. It refers either to a resource id or to a resource group id.

schedules is essentially a map that declares which schedules are to be updated. The keys are the measurement display names as you see them in the RHQ UI. Each value can be one of three things: the string 'enabled' or the string 'disabled', indicating that the measurement collection should be enabled or disabled respectively, or the collection interval specified as an integer. The collection interval is stored in milliseconds. Most collection intervals are on the order of minutes though.

It is a lot easier to read and write 30 minutes instead of 1800000 milliseconds. To address this, MeasurementModule exposes the interval method, to which we declare a reference on line 2, to make calculating the interval more readable. We use MeasurementModule.time in conjunction with this method. time has three properties - seconds, minutes, hours. We see these in use on lines 9 and 11.
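
The arithmetic behind such a helper is simple. The sketch below is a stand-in, not the code from measurement_utils.js: the names time and interval follow the description above, but the argument order of interval, the values stored in time, and the measurement names are all assumptions made for illustration.

```javascript
// Milliseconds per unit (assumed representation of the `time` properties).
var time = {
    seconds: 1000,
    minutes: 60 * 1000,
    hours: 60 * 60 * 1000
};

// Convert an amount in the given unit into milliseconds.
function interval(amount, unit) {
    return amount * unit;
}

// A schedules map using the three kinds of values described above.
// The measurement names are placeholders.
var schedules = {
    'Measurement A': interval(30, time.minutes),  // 1800000 milliseconds
    'Measurement B': 'disabled',
    'Measurement C': 'enabled'
};
```

Writing interval(30, time.minutes) keeps the unit visible at the call site, which is the readability gain the interval method is after.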

I think (hope) MeasurementModule offers a fairly straightforward approach for updating measurement schedules. It should allow you to make programmatic updates without having an in-depth understanding of the underlying APIs.

There is at least one additional enhancement that I already intend to make. I want to provide a way to specify an update that applies to multiple schedules. Maybe something along these lines,


In this example we are updating schedules for a compatible group. We specify a collection interval of one hour for Measurement A, disable Measurement B, and all of the rest of the measurements are set to a collection interval of 30 minutes. I see this as a useful feature and will write another post when I have it implemented.

There is one last annoying detail that needs to be discussed before you start using MeasurementModule. The class uses some of the functions described in Utility Functions for RHQ CLI. Those functions are defined in another source file. When using the CLI in interactive mode, you can use the exec command to execute a script. That command (or an equivalent method/function), however, is not available in non-interactive mode. I filed a bug for this a little while back. You can track the progress here if you are interested. This means that for now you need to run in interactive mode. Let's walk through a final example tying all of this together. Assume we have already logged in through the CLI.

rhqadmin@localhost:7080$ exec -f path/to/util.js
rhqadmin@localhost:7080$ exec -f path/to/measurement_utils.js
rhqadmin@localhost:7080$ measurementModule = new MeasurementModule()
rhqadmin@localhost:7080$ // do some stuff with measurementModule

MeasurementModule is defined in the file measurement_utils.js. A few scripts, including measurement_utils.js, are shipped in the latest release of RHQ, which can be downloaded from here. They are packaged in the CLI in the samples directory.
Building an RHQ Client
RHQ exposes a set of APIs that can be used to build a remote client. The CLI is one such consumer of those APIs. The documentation for building your own remote client says to get the needed libraries from the CLI distribution. If you are using a build tool such as Maven, Gradle, Buildr, etc. that provides dependency handling, it would be easier to be provided with the dependency information needed to integrate into your existing build.

Last week I started building my own client and hit a couple snags. The CLI pulls in dependencies from the module rhq-remoting-client-api; so, I hoped that declaring a dependency on that module was all that I needed.



<dependency>
  <groupId>org.rhq</groupId>
  <artifactId>rhq-remoting-client-api</artifactId>
  <version>3.0.0</version>
</dependency>



This pulls in a number of libraries. When I started my simple client though, I got the following error,

java.lang.NoClassDefFoundError: EDU/oswego/cs/dl/util/concurrent/SynchronizedLong (NO_SOURCE_FILE:3)

After a little digging I realized that I needed to add the following dependency.


<dependency>
  <groupId>oswego-concurrent</groupId>
  <artifactId>concurrent</artifactId>
  <version>1.3.4-jboss-update1</version>
</dependency>


After rebuilding and restarting my client, I got a different exception.

org.jboss.remoting.CannotConnectException: Can not connect http client invoker. Response: OK/200.
at org.jboss.remoting.transport.http.HTTPClientInvoker.useHttpURLConnection(HTTPClientInvoker.java:348)
at org.jboss.remoting.transport.http.HTTPClientInvoker.transport(HTTPClientInvoker.java:137)
at org.jboss.remoting.MicroRemoteClientInvoker.invoke(MicroRemoteClientInvoker.java:122)
at org.jboss.remoting.Client.invoke(Client.java:1634)
at org.jboss.remoting.Client.invoke(Client.java:548)
at org.jboss.remoting.Client.invoke(Client.java:536)
at org.rhq.enterprise.client.RemoteClientProxy.invoke(RemoteClientProxy.java:201)
at $Proxy0.login(Unknown Source)

I started comparing the dependencies that I was pulling in versus what the CLI used. I was definitely pulling in some additional libraries that are not included in the CLI. While the above exception does not convey a whole lot of information, I knew that it was occurring because I had classes on my local classpath that I should be getting from the server. When I managed to get my dependencies to match up with the CLI, I got past the exceptions.

For other folks who want to build their own RHQ client, this dependency situation can be a bit of a mess. Last night I pushed a new module into the RHQ git repo that alleviates this problem. Now you only need the following dependency,


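<dependency>
  <groupId>org.rhq</groupId>
  <artifactId>remote-client-deps</artifactId>
  <version>4.0.0-SNAPSHOT</version>
  <type>pom</type>
</dependency>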


With this, you will pull in only those dependencies that are needed to build your own RHQ client. You do not need to declare any additional RHQ dependencies. You can view the source for the remote-client-deps module here.
Utility Functions for RHQ CLI
I have done a fair amount of programming in Groovy, and one of the things that I have quickly gotten accustomed to is closures. Among other things, closures provide an elegant solution for things like iteration by encapsulating control flow. The following example illustrates this.

[1, 2, 3].each { println it }

When working in the CLI however, I have to fall back to a for loop. In the case of a JavaScript array, I do not have to worry about a loop counter.

var array = [1, 2, 3];
for (i in array) println(array[i]);

A lot of the remote APIs, particularly query methods, return a java.util.List. In these situations, I have to use a loop counter.

var list = new java.util.ArrayList();
list.add(1);
list.add(2);
list.add(3);
for (i = 0; i < list.size(); i++) println(list.get(i));
Recently I had to write a script in which I was executing a number of criteria queries and then iterating over the results. Needless to say, I quickly found myself missing the methods in languages like Groovy and Ruby that provide control flow with closures; so, I wrote a few utility functions to make things a bit easier.
// Iterate over a JS array
var array = [1, 2, 3];
foreach(array, function(number) { println(number); });

// Iterate over a java.util.Collection
var list = new java.util.ArrayList();
list.add(1);
list.add(2);
list.add(3);
foreach(list, function(number) { println(number); });

// Iterate over a java.util.Map
var map = new java.util.HashMap();
map.put(1, "ONE");
map.put(2, "TWO");
map.put(3, "THREE");
foreach(map, function(key, value) { println(key + " --> " + value); });

// Iterate over an object
var obj = {x: "123", y: "456", z: "678"};
foreach(obj, function(name, value) { println(name + ": " + value); });
The foreach function is fairly robust in that it provides iteration over several different types including JavaScript arrays, Java collections and maps, and generic objects. In the case of an array or collection, the callback function takes a single argument. That argument will be each of the elements in the array or collection. In the case of a map or object, the callback function is passed two arguments. For the map the callback is passed the key and the value of each entry. For a generic object, the callback is passed each of the object's properties' names and values.
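
To make the dispatch concrete, here is a minimal sketch of how such a function could be structured. It is not the implementation from util.js: the name foreachSketch is made up, lists are recognized by duck typing on size() and get(i), and java.util.Map handling is left out entirely.

```javascript
// Dispatch on the shape of the argument:
//  - JavaScript arrays: callback(element)
//  - list-like objects exposing size() and get(i), as java.util.List does:
//    callback(element)
//  - any other object: callback(name, value) for each own property
function foreachSketch(obj, callback) {
    var i;
    if (obj instanceof Array) {
        for (i = 0; i < obj.length; i++) callback(obj[i]);
    } else if (typeof obj.size === 'function' && typeof obj.get === 'function') {
        for (i = 0; i < obj.size(); i++) callback(obj.get(i));
    } else {
        for (var name in obj) {
            if (Object.prototype.hasOwnProperty.call(obj, name)) {
                callback(name, obj[name]);
            }
        }
    }
}
```

The caller never sees the loop counter or the type check, which is the point of pushing the control flow into a single function.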

The find function iterates over an array, collection, map, or object in the same way that foreach does. The callback function though is a predicate that should evaluate to true or false. Iteration will stop when the first match is found, that is when the predicate returns true, and that value will be returned. Here are a couple examples to illustrate its usage.
// Find first number less than 3
var array = [1, 2, 3];
// prints "found: 1"
println("found: " + find(array, function(number) { return number < 3; }));

// Find map entry with a value of 'TWO'
var map = new java.util.HashMap();
map.put(1, "ONE");
map.put(2, "TWO");
map.put(3, "THREE");
var match = find(map, function(key, value) { return value == 'TWO'; });
// prints "found: 2.0 --> TWO"
println("found: " + match.key + " --> " + match.value);

When find is iterating over a generic object or map, the first match will be returned as an object with two properties - key and value. Lastly, there is a findAll function that is similar to find except that it returns all matches in a java.util.List.
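
The behaviour described here can be sketched for JavaScript arrays and plain objects as follows. This is an illustration only, not the code from util.js: the names findSketch and findAllSketch are made up, returning null when nothing matches is an assumption, and the all-matches variant returns a JavaScript array rather than a java.util.List.

```javascript
// Return the first array element, or the first {key, value} pair of an
// object, for which the predicate returns true; null if nothing matches.
function findSketch(obj, predicate) {
    if (obj instanceof Array) {
        for (var i = 0; i < obj.length; i++) {
            if (predicate(obj[i])) return obj[i];
        }
        return null;
    }
    for (var key in obj) {
        if (Object.prototype.hasOwnProperty.call(obj, key) && predicate(key, obj[key])) {
            return { key: key, value: obj[key] };
        }
    }
    return null;
}

// Like findSketch, but keep iterating and collect every match.
function findAllSketch(obj, predicate) {
    var matches = [];
    if (obj instanceof Array) {
        for (var i = 0; i < obj.length; i++) {
            if (predicate(obj[i])) matches.push(obj[i]);
        }
        return matches;
    }
    for (var key in obj) {
        if (Object.prototype.hasOwnProperty.call(obj, key) && predicate(key, obj[key])) {
            matches.push({ key: key, value: obj[key] });
        }
    }
    return matches;
}
```

Note that the predicate has to return its result explicitly; a callback whose body is just an expression statement returns undefined and never matches.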

These functions can be found in the RHQ git repo at rhq/etc/cli-scripts/util.js. These functions are neither part of nor distributed with the CLI; however, they may be included in a future release so that the functions would be available to any CLI script.
Auto Import Resources into Inventory
There is no way currently in RHQ through the UI to auto-import resources. You have to go to the discovery queue to explicitly select resources to import. Here is a short CLI script for auto-importing resources.

// auto_import.js
rhq.login('rhqadmin', 'rhqadmin');

var resources = findUncommittedResources();
var resourceIds = getIds(resources);
DiscoveryBoss.importResources(resourceIds);

rhq.logout();

// returns a java.util.List of Resource objects
// that have not yet been committed into inventory
function findUncommittedResources() {
    var criteria = new ResourceCriteria();
    criteria.addFilterInventoryStatus(InventoryStatus.NEW);

    return ResourceManager.findResourcesByCriteria(criteria);
}

// returns an array of ids for a given list
// of Resource objects. Note the resources argument
// can actually be any Collection that contains
// elements having an id property.
function getIds(resources) {
    var ids = [];
    for (var i = 0; i < resources.size(); i++) {
        ids[i] = resources.get(i).id;
    }
    return ids;
}

In the function findUncommittedResources() we query for Resource objects having an inventory status of NEW. This results in a query that retrieves discovered resources that have been "registered" with the RHQ server (i.e., stored in the database) but not yet committed into inventory.

DiscoveryBoss is one of the remote EJBs exposed by the RHQ server to the CLI. It provides a handful of inventory-related operations. On line six we call DiscoveryBoss.importResources() which takes an array of resource ids.

 In a follow-up post we will use some additional CLI features to parametrize this script so that we have more control of what gets auto-imported.
Remotely Installing An Agent
One new feature coming soon to RHQ is the ability to remotely install an RHQ Agent on to any machine in your network (with the caveat that the remote machine must be accessible via SSH).

What this now means is you do not have to manually log onto a remote machine and do the tasks of downloading the agent update binary distribution, installing it and running it. RHQ can now do this for you all remotely - all you need to provide to the RHQ GUI is the machine name, your SSH credentials and the location where you want to install the agent.

I think this could still use some enhancements - there is no way to customize each agent's configuration (i.e. providing custom answers to the startup setup questions). But that's for another day. If you want to customize the new agent, you can do so by just importing the RHQ Agent resource into inventory and going to its Configuration tab and adjusting its configuration (don't forget to restart the agent so it can pick up the changed configuration; you can restart the agent using this new RHQ GUI page! I'll talk about that next).

This mechanism will also allow you to stop and start an existing agent that is already installed. This is very useful if you have not started the agent yet but wish to bring it back online, or for those cases where you have not committed the RHQ Agent resource in inventory yet but still need a way to shut down or restart an agent.

I put together a flash demo that shows this stuff in action.
RHQ’s Powerful New Search Facility

Search was developed to enable users to gain deeper insight, more quickly, into their enterprise by supporting a sophisticated method of querying system state. Some of the notable features that this powerful facility brings are:

  • Arbitrarily Complex Search Expressions
  • Search Suggestions / Auto-Completion / Search Assist
  • Results Matching / Highlighting
  • User Saved Searches

Take a look at the end-user docs (with screen shots) here. Or, if you want to interact with the Search facilities up close & personal, you can download the latest RHQ binaries here.

Considering the SearchBar was written with the primary purpose of being extensible, it will surely become a much more pervasive concept across RHQ in the future. So, let me know what you liked and/or what you would improve, especially if you can think of other features you’d like to see added. You can either post back here or subscribe to the RHQ developer mailing list.


Tagged: rhq
GWT Compilation Performance

A few weeks ago I noticed that the coregui module of RHQ, which is the next generation web interface written in GWT (SmartGWT to be precise), started to have its compilation slow down…noticeably. Not only did the overall time to compile the module increase, but during the build my box seemed to be more or less locked up. Even mouse gestures were choppy. So…I decided to investigate.

I started by writing a script that would compile the coregui module for me. It was important to gauge how the different arguments to the maven gwt:compile goal (http://mojo.codehaus.org/gwt-maven-plugin/compile-mojo.html) would affect the system during the build. After reading through the documentation, I decided that ‘localWorkers’ and ‘extraJvmArgs’ were the two most important parameters to test different permutations for. Here is the script I came up with:


#!/bin/bash

# INTERVAL - the minimum amount of memory to use for gwt:compile
# STEPS    - the number of increments of interval
#            e.g. 4 steps of 256 interval = tests of 256, 512, 768, 1024
#            e.g. 5 steps of 200 interval = tests of 200, 400, 600, 800, 1000
# RUNS     - number of iterations at each (INTERVAL,STEP)

INTERVAL=256
STEPS=4
RUNS=10

declare -a AVERAGES
declare -a VARIANCES

function build
{
    workers=$1
    memory=$2

    dirName="workers-${workers}-memory-${memory}"
    vmArgs="-Xms${memory}M -Xmx${memory}M"

    mvnArgs="-Pdev -DskipTests -o"
    gwtWorkers="-Dgwt-plugin.localWorkers=\"$workers\"" 
    gwtJvmArgs="-Dgwt-plugin.extraJvmArgs=\"$vmArgs\""
    gwtArgs="$gwtWorkers $gwtJvmArgs"

    cmd="mvn $mvnArgs $gwtArgs gwt:clean install"
    echo "Executing $cmd"

    total_seconds=0
    rm -rf "target/runs/$dirName"
    mkdir -p "target/runs/$dirName"
    declare -a raw
    for run in `seq $RUNS`
    do
        outputFile="target/runs/$dirName/output.${run}.log"

        before=$(date +%s)
        eval "$cmd" > $outputFile
        after=$(date +%s)

        elapsed_seconds=$(($after - $before))
        raw[$run]=$elapsed_seconds
        total_seconds=$(($total_seconds + $elapsed_seconds))

        echo "Run $run Took $elapsed_seconds seconds"
        echo "Execution details written to $outputFile"
    done 

    average_seconds=$(expr $total_seconds / $RUNS)

    let "index = (($workers - 1) * 4) + ($memory / $INTERVAL) - 1"
    AVERAGES[$index]=$average_seconds

    sum_of_square_deltas=0
    for run in `seq $RUNS`
    do
       let "sum_of_square_deltas += (${raw[$run]} - $average_seconds)**2"
    done
    let "variance = $sum_of_square_deltas / ($RUNS - 1)"
    VARIANCES[$index]=$variance

    echo "Run Total: $total_seconds seconds"
    echo "Run Average: $average_seconds seconds"
    echo "Run Variance: $variance"
}

function run
{
    for workers in `seq 4`
    do
        for offset in `seq $STEPS`
        do
            memory=$(($INTERVAL * $offset))
            build $workers $memory
        done
    done
}

function print
{
    echo "Results"
    printf "          "
    for headerIndex in `seq 4`
    do 
        printf "% 12d" $headerIndex
    done
    echo ""

    # rows are memory settings, columns are localWorkers
    for step in `seq $STEPS`
    do
        let "memory = $step * $INTERVAL"
        printf "% 10d" $memory
        for workers in `seq 4`
        do
            let "index = ($workers - 1) * 4 + ($step - 1)"
            printf "% 6d" ${AVERAGES[index]}
            printf "% s" "("
            printf "% 4d" ${VARIANCES[index]}
            printf "% s" ")"
        done
        echo ""
    done
}

if [[ $# -ne 1 ]]
then
    echo "Usage: $0 <rhq_repository_path>"
    exit 127
fi

rhq_repository_path=$1
cd $rhq_repository_path/modules/enterprise/gui/coregui

echo "Performing timings..."
run
print
echo "Performance capture complete"


In short, this will build the coregui module over and over again, passing different parameters to it for memory (JVM arguments Xms/Xmx) and threads (GWT compiler argument ‘localWorkers’). And here are the results:

Memory (MB)   1 worker    2 workers   3 workers   4 workers
256           229(6008)   124(4)      124(27)     124(12)
512           141(193)    111(76)     113(82)     114(25)
768           201(5154)   115(98)     123(57)     195(2317)
1024          200(2352)   125(83)     199(499)    270(298)

The columns refer to the number of ‘localWorkers’, effectively the number of concurrent compilation jobs (one for each browser/locale). The rows refer to the amount of Xms and Xmx, in megabytes, given to each job. Each cell represents statistics for the build being executed ten times using the corresponding number of localWorkers and memory parameters. The primary number represents the average time in seconds it took to compile the entire maven module. The parenthetical element represents the variance (the square of the standard deviation).
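
For readers who want to check the arithmetic behind each cell, here is a small JavaScript sketch of the same calculation the shell script performs: an integer average, then the sum of squared deltas divided by RUNS - 1. The function name summarize is made up, and the sample timings in the usage note are invented numbers, not measurements from these runs.

```javascript
// Integer average and sample variance, mirroring the shell arithmetic:
// average = total / RUNS, variance = sum((x - average)^2) / (RUNS - 1).
// Math.floor stands in for the shell's integer division.
function summarize(seconds) {
    var total = 0;
    for (var i = 0; i < seconds.length; i++) total += seconds[i];
    var average = Math.floor(total / seconds.length);

    var sumOfSquareDeltas = 0;
    for (var j = 0; j < seconds.length; j++) {
        var delta = seconds[j] - average;
        sumOfSquareDeltas += delta * delta;
    }
    var variance = Math.floor(sumOfSquareDeltas / (seconds.length - 1));
    return { average: average, variance: variance };
}
```

For example, three hypothetical runs of 100, 110, and 120 seconds give an average of 110 and a variance of 100, that is, a standard deviation of 10 seconds.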

So what does this grid tell us? Actually, a lot:

  • End-to-end compilation time suffers when using a single localWorker (i.e., only one permutation of browser/locale is compiled at a time). Not only does it suffer, but it has a high variance, meaning that sometimes the build is fast, and sometimes it isn’t. This makes sense because the compilation processes are relatively independent and generally don’t contend for shared resources. This implies the compilation is a natural candidate for a concurrent/threaded solution, and thus forcing serialized semantics can only slow it down.
  • The variance for 2 or more localWorkers is generally low, except for the lower right-hand portion of the grid, where it starts to increase again. This also makes sense because, for the case of 4 threads and 1GB memory each, these concurrent jobs are exhausting the available system memory. This box only had 4GB RAM, and so all of the physical memory was used, which starts to spill over to swap, which in turn makes the disk thrash (this was verified using ‘free’ and ‘iostat’). Regardless of what other activities were occurring on the box at the time as well as how the 4 threads are interleaved, it has an effect on the build because approximately ½ GB was being swapped to my 7200rpm disk during the compilation. (Granted, I could mitigate this variance by replacing my spindles with an SSD drive (which I may end up doing) but it is nonetheless important to be mindful of the amount of memory you’re giving to the entire compilation process (all worker threads) relative to the amount of available physical RAM.)
  • The 512MB row has values which are relative minimums with respect to the other timings for the same number of localWorkers given different memory parameters. This indicates that 512 is the sweet spot in terms of how much memory is needed to compile the modules. I would surmise that the cost of allocating more memory (256MB or 512MB for each of 2, 3, or 4 worker threads) and/or the cost of GC largely accounts for the slightly decreased performance with other memory settings. And then, as mentioned above, with lots of threads and very high memory, swapping to disk starts to dominate the performance bottleneck.
  • Aside from end-to-end running time, it’s also important on a developer system to be able to continue working while things are building in the background. It should be noted that each worker thread was pegging the core it ran on. For that reason, I’m going to avoid using 3 or 4 workers on my quad-core box because my entire system crawls due to all cores being pegged simultaneously…which is exacerbated in the case of high total memory with 3 or 4 workers causing swapping and disk thrashing alongside all my cores being pegged.

Conclusions:

On my quad-core box with 4 gigs of RAM, the ideal combination is 2 localWorkers with 512MB. The end-to-end build will consistently (recall the low variance) complete just as quickly as any other combination (recall the low average running time), and it won’t peg my box because only half of my cores will be used for the GWT compilation, leaving the other half for other processes I have running…eclipse, thunderbird, chrome, etc.

So what happens now if I take and move this script over to a larger box? This new box is also a quad core, but has a faster processor, and more than twice the amount of RAM (9 gigs). From everything deduced thus far, can you make an intelligent guess as to what might happen?

….

I anticipated that with more than twice the amount of RAM, the worst-case permutation (4 localWorkers with 1GB of Xms/Xmx each) would cause little to no swapping, and that with a more powerful processor the average running times would come down. That is exactly what happened:

Memory (MB)   1 worker    2 workers   3 workers   4 workers
256           153(24)     98(0)       96(1)       95(1)
512           150(23)     96(1)       96(2)       95(1)
768           149(32)     97(1)       96(1)       95(0)
1024          149(24)     96(0)       96(0)       95(1)

With nearly non-existent variance, it’s easy to see the net effect of having plenty of physical resources. When the processes never have to go to swap, they can execute everything within physical RAM, and they have much more consistent performance profiles.

As you can see, adding more cores does not necessarily yield large decreases for the end-to-end compilation time. This can be explained because (at the time of this writing) RHQ is only compiling 6 different browser/locale permutations. As more localizations are added in the future, the number of permutations will naturally increase, which will make the effect of using more cores that much more dramatic (in terms of decreasing end-to-end compilation times).

Unfortunately, the faster processor on the larger box still got pegged when compiling coregui. So since 3 or 4 localWorkers doesn’t result in dramatically improved end-to-end compilation times, it’s still best to use 2 localWorkers on this larger box. I could, however, get away with dropping the memory requirements to 256MB for each worker, since the faster hardware seems to even out the performance profile of workers given different memory parameters when holding the number of localWorkers constant.

Lessons learned:

  • In any moderately sized GWT-based web application, the compilation may take a considerable portion of each core it executes on, perhaps even pegging them at 100% for a portion of the compile. Thus, if you want your development box to remain interactive and responsive during the compilation, make sure to set the localWorkers parameter to something less than the number of cores in your system.
  • Don’t throw oodles of memory at the GWT compilation process and expect to get a speedup. Too much Xms/Xmx will cause the worker threads to spill out of main memory and into swap, which can have a detrimental effect on the compilation process itself as well as other tasks on the box since I/O requests explode during that time. Modify the script presented here to work with your build, and obtain reasonable timings for your GWT-based web application at different memory levels. Then, only use as much RAM as is required to make the end-to-end time reasonable while avoiding swap.
  • Provide reasonably conservative default values for localWorkers and Xms/Xmx so as not to give a negative impression to your project’s contributors. Err on the side of a slow build rather than a build that crushes the machine it runs on.
  • Parameterize your builds so that each developer can easily override (in a local configuration file) his or her preferred values for localWorkers and Xms/Xmx. Provide a modified version of the script outlined here, to allow developers to determine the ideal parameter values on each of their development boxes.

Other tips / tricks to keep in mind:

  • Only compile your GWT application for the browser/locale permutation you’re currently developing and testing on. There’s no need to compile to other languages/locales if you’re only going to be testing one of them at a time. And there’s no need to compile the applications for all browser flavors if the majority of early testing is only going to be performed against one of them.

Recall at the start of this article I was investigating why my box seemed to be non-responsive during the compilation of coregui. Well it turns out that one developer inadvertently committed very high memory settings, which as you’ve now seen can cause large amounts of swapping when localWorkers is also high. The fix was to lower the default localWorkers to 2, and reduce the memory defaults to 512MB. As suggested above, both values are parameterized in the current RHQ build, and so developers can easily override either or both to their liking.


Tagged: gwt, java
RHQ 3.0.0 Has Been Released
The RHQ team is proud to announce the immediate availability of RHQ 3.0.0.

This release features several months of hard work by the development team and external contributors. Many of the changes have been already made available in the past through seven community releases.

You can browse the full release notes on the RHQ wiki.

The release can be downloaded via the RHQ web site or directly from SourceForge.

Download it and try it out. The documentation wiki has full install instructions.
RHQ and Customizable Dashboards

For the second in the series I've uploaded a demo of the latest prototype dashboard system for the next generation RHQ user interface. This new UI is being built upon the GWT and SmartGWT projects to provide a richer experience and quicker access to the vast amount of information gathered by RHQ. While we are currently finalizing the RHQ 3.0 release this new user interface will make its debut in the 4.0 series that will start into milestone releases shortly after 3.0.

Exposing domain objects to GWT

To continue the discussion on our use of GWT for RHQ I wanted to explain our success in directly utilizing the domain objects of RHQ in our client side code. With GWT allowing us to write our UI view code directly in Java there's even more advantage in direct access than with some of our other client technologies. We were able to get our domain module to GWT compile reasonably easily, making it possible to use these objects in the generated client-side JavaScript and to serialize them between the tiers. A little bit of maven setup and we're off developing code that will run in the browser that is as direct and rich as Swing development.

Because we're using JPA and Hibernate, we still need to prepare our entities for serialization. We've been doing this with a custom-built utility for a while now, and I've more recently discovered the Gilead project's more robust solution to this. We also have JAXB bindings for our domain objects, allowing JAX-WS and JAX-RS style integration. This lets our server use domain objects to interact with 1) our agent, 2) our CLI / Java client, 3) web services clients over JAXB / JAX-WS, 4) RESTful clients with JAX-RS, and now 5) the web browser with GWT.

Additionally, we've built a pattern for data queries that delivers a domain specific solution similar to Hibernate's criteria query concept that layers in our domain data security model. This greatly simplifies our client programming model by reducing the number of custom use APIs that are developed across the UI construction. In porting our UI to GWT we're often able to change the data access model from our older custom UI APIs over to criteria queries and simplify the code at the same time. All the while giving us new opportunities to expose our data in new ways... Such as a prototype configuration comparison tool.

New RHQ UI Technology
For the next version of RHQ we have been prototyping with the GWT UI technologies that I hope will improve the capability and usability of the RHQ project. The development is still in early stages and we're planning to roll out the new UI technology over the course of several releases along with the other major feature development. New features will likely be the first to be released in the new technology.

I've put together a short screencast of this new UI and I plan to show some more work as it happens and share some experiences with the technology. My impression so far is that this will open up some new possibilities in our designs for complex workflows and information presentation.

While nothing is set in stone, I'm seeing good results with the use of SmartGWT for a component library. Being able to directly utilize and manipulate our domain objects in the browser makes for some nice reusability and quick development.

Git – Local Tracking Branches

Whenever I clone a new copy of the RHQ repository from git, I’m placed in a local tracking branch of master. However, I rarely (if ever) want to work directly in master. Instead, I want to work in one of the numerous feature branches we have for the project.

I could create local tracking branches for the lines of development I’m interested in following, but that would be tedious considering there are nearly 3 dozen to choose from these days. Not only that, but I would need to repeat those commands each time I clone the repository (which I sometimes do so I can work in two different branches simultaneously as opposed to using ‘git stash’ and flipping back and forth between them), not to mention that I’d need to repeat those commands on all machines I work with (my home desktop, my work desktop, and my work laptop).

So I figured I would automate this by writing a quick shell script that would interrogate git, gather the lists of remote branches, and create local tracking branches automatically. I wanted to write the script such that it would set up local tracking branches for any newly created remote branches since the last time the script was run. Also, to keep things simple, I wanted the name of my local tracking branches to mirror the names of the remote branches. With those things in mind, I came up with:

#!/bin/bash

if [[ $# -ne 1 ]]
then
   echo "Usage: $0 <repository-directory>"
   exit 127
fi

# quote the argument so paths with spaces work, and stop if the directory is missing
cd "$1" || exit 1

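# get all remotes except HEAD
remotes=`git branch -r | grep -v HEAD`
# list local branch names without the '*' current-branch marker, which the shell would glob-expand
locals=`git for-each-ref --format='%(refname:short)' refs/heads`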

locals_tracked=0;
total_remotes=0;
for remote_branch in $remotes 
do
   let "total_remotes++"

   # strip 'origin/' off of the remote_branch URL form
   local_branch=${remote_branch##origin/}
   
   already_had_local_branch=0
   for existing_local_branch in $locals 
   do
      if  [[ $existing_local_branch == $local_branch ]] 
      then
         already_had_local_branch=1
         break
      fi
   done

   if [[ $already_had_local_branch == 0 ]] 
   then
      git branch --track $local_branch $remote_branch
      let "locals_tracked++"
   else
      echo "Already had local branch '$local_branch'"
   fi
done

echo "Found $total_remotes remote branches"
echo "Created $locals_tracked local tracking branches"

And here is how I use it. First, I clone the RHQ repository:

[joseph@marques-redhat] git clone ssh://git.fedorahosted.org/git/rhq/rhq.git rhq-repo

Next I run my script, passing as an argument the name of the directory I just created to store my local copy of the repository:

[joseph@marques-redhat] ./git-setup-locals.sh rhq-repo

The script will show you output along the lines of:

Branch agentPlugin set up to track remote branch agentPlugin from origin.
...
Already had local branch 'master'
...

And finally print some summary information at the bottom:

Found 34 remote branches
Created 33 local tracking branches

So what is this good for? Well, it enables me to quickly bounce around multiple lines of development with ease (using ‘git checkout <branch>’) and without having to worry about whether or not I have a local tracking branch set up for that feature-branch yet.

—–

In the grand scheme of things, this single issue isn’t a huge win. However, if you’re not mindful of the little things, enough of them can add up over time and start to have a real effect on productivity. So I tend to automate these things sooner rather than later so that I can forget about them and more easily focus on the code itself or the business problem at hand.


Tagged: scm
Web Development Tips – automate the little things

I recall a colleague of mine mentioning several weeks ago that it’s annoying to have to log into RHQ every time you redeploy UI code that causes portal-war’s web context to reload. I completely agreed at the time, but it wasn’t until today that I finally got annoyed enough to look for a workaround myself. Here’s the solution I ended up with:

1) Use Firefox
2) Download and install GreaseMonkey
3) Install the AutoLogin script
4) Log into “http://localhost:7080/Login.do” and make sure to tell FF to remember your password
5) Test that auto-login is working properly by logging out of the application…you should be forwarded to the login page, which FF will automatically fill in with your saved credentials, and the GreaseMonkey script will perform the login for you

This should also work when you get logged out due to session expiry. The expiry handler will redirect you back to /Login.do, which will now automatically log you back in and – on a best effort basis – redirect you back to the last “valid” page you were on. RHQ has a mechanism for recording the last couple of pages you visited (see WebUserTrackingFilter) and will try them in most-recently-visited order until it finds a page that doesn’t blow up with JSF’s “classic” ViewExpiredException. I discuss the details of how this mechanism works in my other post.

Note: if you ever want to log into localhost with a different user, all you have to do is click the GreaseMonkey icon (on the far right-hand side of the status bar at the bottom of your browser) and you’ll temporarily disable the AutoLogin script from executing.

How would you solve this? How have you solved this? I’m eager to read your post backs.


Tagged: webdev
Autocomplete and the RHQ CLI

Autocomplete is a feature that I cannot live without. It must be the curse of using better and better IDEs and phones. Not that I'm too lazy to type out the rest of a command, but something drives me nuts about the inefficiency of having to do it when I could use that time to do something more useful. So when it came time to write the command line interface for RHQ last year I knew it was a feature we had to get in there somehow.

The general concept of the RHQ CLI is that we're exposing the remote APIs for the server directly into a Java 6 scripting engine context. This was the best way I found to provide full interactivity on the command line. It allows us to import classes, put utility classes in the context and then let the script writer build and keep Java objects as state during a session, all while providing the dynamic, weakly-typed environment of a scripting language. In our case, we went with JavaScript because it ships as a default implementation of JSR 223 in JDK6+ and had a good interactivity model that made it suitable for line-by-line execution or executing as a full script. We also support executing individual commands or expressions straight from the shell so the CLI can be called from bash, etc.

We'll start by logging in to the CLI, which opens an interactive prompt:

bin/rhq-cli.sh -u rhqadmin -p rhqadmin
RHQ - RHQ Enterprise Remote CLI 1.4.0-SNAPSHOT
Remote server version is: 1.4.0-SNAPSHOT(0)
Login successful
rhqadmin@localhost:7080$

First thing we'll do is hit tab to see the default completions. This will display all the objects in the scripting context. To start with, the RHQ CLI will put the service proxies for each of the RHQ remote interfaces into the context. This makes it easy to see the available services. You'll notice a few other objects in context that will help in your use of the CLI. There's a pretty printer, some CSV export utilities, etc. Another thing you'll notice is that the tab completions will show up in multiple columns that properly utilize the width of your terminal window. This is thanks to the use of the excellent JLine library which also provides the infrastructure on which I built the auto completion technology.

Next we'll take a look at one of the services. We'll start with the ResourceManager that will help us look up and find Resources that are being managed and monitored by RHQ. Typing "ResourceM" <tab> will complete out the service name and a second <tab> will add a period for calling a method on that interface. One final <tab> will list out the methods that are available on that service.

 findChildResources            findResourceComposites        findResourceLineage
findResourcesByCriteria       getLiveResourceAvailability   getParentResource
getResource                   liveResourceAvailability      parentResource
resource                      toString                      uninventoryResources

If we hit <tab> again it will go to an alternative completion mode and display full method syntaxes.

       ResourceAvailability getLiveResourceAvailability(int resourceId)
             List findResourceLineage(int resourceId)
         PageList findResourcesByCriteria(ResourceCriteria criteria)
                       void uninventoryResources(int[] resourceIds)
         PageList findChildResources(int resourceId, PageControl pageControl)
                     String toString()
                   Resource getResource(int resourceId)
                   Resource getParentResource(int resourceId)
PageList findResourceComposites(ResourceCategory category, String typeName, int parentResourceId, String searchString, PageControl pageControl)
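Listings like the two above can be produced with plain reflection. Here is a minimal sketch, not the actual RHQ completer: the class name ReflectionCompleter and both of its methods are hypothetical, and it prints parameter types only, since reflection does not expose parameter names unless the classes were compiled to retain them.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper for illustration; not the RHQ implementation.
class ReflectionCompleter {

    // Sorted, de-duplicated names of public methods that start with the typed prefix.
    static List<String> methodNames(Object target, String prefix) {
        TreeSet<String> names = new TreeSet<>();
        for (Method method : target.getClass().getMethods()) {
            if (method.getName().startsWith(prefix)) {
                names.add(method.getName());
            }
        }
        return new ArrayList<>(names);
    }

    // Signatures of the form "ReturnType name(ParamType, ...)" for every overload of one method.
    static List<String> signatures(Object target, String methodName) {
        TreeSet<String> lines = new TreeSet<>();
        for (Method method : target.getClass().getMethods()) {
            if (!method.getName().equals(methodName)) {
                continue;
            }
            StringBuilder line = new StringBuilder();
            line.append(method.getReturnType().getSimpleName());
            line.append(' ').append(method.getName()).append('(');
            Class<?>[] params = method.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) {
                    line.append(", ");
                }
                line.append(params[i].getSimpleName());
            }
            lines.add(line.append(')').toString());
        }
        return new ArrayList<>(lines);
    }

    public static void main(String[] args) {
        // Any object in the script context can be introspected the same way.
        Object service = new ArrayList<String>();
        System.out.println(methodNames(service, "remove"));
        System.out.println(signatures(service, "indexOf"));
    }
}
```

The first method corresponds to the short multi-column listing, the second to the alternative mode that shows full signatures.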

All of this is accomplished through some basic syntax parsing and the use of reflection. Since we have the objects in the context we can directly introspect them and build up syntax and signatures quite easily. Next we'll try out a criteria-based search. These criteria searches are a pattern we've built up that provides some capabilities similar to Hibernate criteria lookups, but in a way that is more focused on the problem and that enforces and layers in the RHQ security model. We build the criteria and set some options.

 
rhqadmin@localhost:7080$ var criteria = new ResourceCriteria();
rhqadmin@localhost:7080$ criteria.addFilterName('ghinkle')
rhqadmin@localhost:7080$ criteria.fetchParentResource(true);

Now when we go to call the service interface's findResourcesByCriteria method you'll notice that you can tab complete the criteria parameter to the method without typing anything. Since we have the criteria object in the script context and can introspect the method's signature we can do the fancy bit of telling you what variables fit the parameters to a method. It's the feature that first got me hooked on the IntelliJ IDEA IDE.
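The idea of offering only the variables that fit a parameter can be sketched in a few lines. This is an illustration under assumptions, not the RHQ code: ParameterCompleter, candidates and find are hypothetical names, the script context is modeled as a plain Map, and primitive parameters are not handled.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; the script context is modeled as a Map.
class ParameterCompleter {

    // Names of context variables whose runtime type fits the parameter at paramIndex.
    // Primitive parameter types never match here because context values are objects.
    static List<String> candidates(Map<String, Object> context, Method method, int paramIndex) {
        Class<?> wanted = method.getParameterTypes()[paramIndex];
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Object> entry : context.entrySet()) {
            Object value = entry.getValue();
            if (value != null && wanted.isInstance(value)) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    // Convenience lookup that turns the checked exception into an unchecked one.
    static Method find(Class<?> type, String name, Class<?>... parameterTypes) {
        try {
            return type.getMethod(name, parameterTypes);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> context = new LinkedHashMap<>();
        context.put("name", "ghinkle");
        context.put("count", Integer.valueOf(3));
        context.put("buffer", new StringBuilder());
        // StringBuilder.append(CharSequence) stands in for a service method here.
        Method append = find(StringBuilder.class, "append", CharSequence.class);
        System.out.println(candidates(context, append, 0));
    }
}
```

In the example, both the String and the StringBuilder fit a CharSequence parameter, while the Integer is filtered out.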

Now on to the advanced bits. The service interfaces are powerful, but also somewhat low level. We want to make the CLI easy to use for interacting with the management inventory. In order to do so I decided to build a proxy object for a resource that could expose management capabilities that are specific to that Resource. This lets us request a proxy object and then directly look up statistics or run operations while behind the scenes the proper service interfaces are being used and security checks are being properly done. First we build the resource proxy given a resource's ID that we can find from the previous search.

rhqadmin@localhost:7080$ var platform = ProxyFactory.getResource(10001);

Now that we have the proxy we can <tab> autocomplete against it to see what features it has including metrics and operations.

rhqadmin@localhost:7080$ platform.

OSName                OSVersion             architecture          children              contentTypes
createdDate           description           freeMemory            freeSwapSpace         getChild
getMeasurement        hostname              id                    idle                  manualAutodiscovery
measurements          modifiedDate          name                  operations            resourceType
systemLoad            toString              totalMemory           totalSwapSpace        usedMemory
usedSwapSpace         userLoad              version               viewProcessList       waitLoad

In order to make the autocomplete work we haven't hacked it to utilize the resource metadata; instead I've dynamically constructed the proxy instance's class using Javassist. This system actually generates a real implementation of the dynamically built proxy class that can be interacted with in the script environment like any other Java object. This means no hacking of the script execution engine to fake these calls or having to use map lookup methods when what you really want is a proper interface. It also lets the autocomplete operate using reflection like everywhere else. The other feature that we have in the script environment is an automatic pretty printer that will pretty print the results of any line. Just typing "platform" here will print out a nice property format of the object and its values.

The final bit of autocomplete fanciness is some specific support for certain classes that lets us take advantage of the fact that this is a live environment. For example when you have a java.util.Map and want to look up a specific key, the autocompletion system can offer the valid map keys as inputs to the lookup as the autocomplete below demonstrates.

rhqadmin@localhost:7080$ var map = new java.util.HashMap()

rhqadmin@localhost:7080$ map.put("a","alpha")

rhqadmin@localhost:7080$ map.put("b","beta");

rhqadmin@localhost:7080$ map.get('

'a'   'b'
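Because the map is a live object, offering its keys is a matter of reading them at completion time. A minimal sketch, assuming only String keys are offered and that they are shown in single quotes as in the output above; MapKeyCompleter and keyCompletions are hypothetical names, not the RHQ classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the RHQ implementation.
class MapKeyCompleter {

    // Quoted String keys of the live map that start with what has been typed so far.
    static List<String> keyCompletions(Map<?, ?> map, String typed) {
        List<String> completions = new ArrayList<>();
        for (Object key : map.keySet()) {
            if (key instanceof String && ((String) key).startsWith(typed)) {
                completions.add("'" + key + "'");
            }
        }
        Collections.sort(completions);
        return completions;
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("a", "alpha");
        map.put("b", "beta");
        // Nothing typed after the opening quote yet, so every key is offered.
        System.out.println(keyCompletions(map, ""));
    }
}
```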

The power of an interactive script execution environment combined with autocompletion and simple remote API calling makes the RHQ command line interface a powerful means for automation, integration and custom features. You can take a look at our autocompletor class to see how it does the job, and at the custom class generator and the proxy implementation class.

JON 2.2 Installation Hints

I thought it would be a good idea to explain the new distribution model for the JBoss Operations Network. With the 2.2 version we have shifted to a base JON server distribution that includes the agent and separate plugin packs that enable support for our platform products such as EWS (tomcat), EAP (JBossAS) and SOA-P. The old model just had separate Server and Agent installs with the plugins included, but with the addition of more products supported through JON we want to give the ability to download the specific desired plugin support and install that alone.

One key to this model is that those expecting an agent download will no longer see one available. Once the server is installed the agent can be downloaded from the administration menu within the JON web ui. For those looking to use wget to download via CLI you can use something like "wget http://<server-host-name>:7080/agentupdate/download".

Plugin packs should be installed according to the readme that accompanies them. This amounts to copying the downloaded plugins to <server-installation>/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins. From there the plugins will be downloaded by agents the next time they start up or by running the "plugins update" command from the agent prompt or the "Update Plugins" operational command from within the web UI. This feature can also be used with groups of agents, allowing you to update all your agents at once.

New Beta of Jopr Management Platform with Tomcat management

Earlier this week, the Jopr/RHQ team put out a new beta of our upcoming 2.2 release with some substantial new features. We have available a quick visual tour of the upcoming changes. The release includes bug fixes, performance enhancements, many new features and UI improvements. The release can be downloaded from our Release Page.

A list of some of the major items follows.

  • Standalone Apache Tomcat Management
  • New user interface with rich application menus
  • New Tree oriented navigation of managed resources
  • Cluster oriented tree views for managing groups of resources
  • Group configuration editing
  • Subsystem views for cross-system views of critical information
  • New metrics analysis system for quick discovery of abnormal conditions
  • DynaGroups query enhancements
  • Configuration change detection for auditing external edits
  • Availability history and metrics display
  • Agent auto-upgrades will allow future upgrades to be automatic for agents

Try it and give us feedback on our forums.

Jopr and the angry demo gods

To those who were disappointed that the Jopr plugin development demonstration was cut short I wanted to apologize. It seems like my laptop might have overheated (wouldn't reboot until it cooled a few minutes)... but at least Jopr recorded the temp climbing and climbing during the presentation until it gave out. Not sure if Fedora10 and this video driver are getting along all that well.


One note, there's a bug in this experimental plugin that seems to be reading out Fahrenheit when it should be Celsius... gotta fix that.

For those of you interested in learning more about plugin development for Jopr and RHQ I'm going to list out some useful links and sources of information. Also, I'm going to try to put together a video demonstration of what I was going to cover since it has some useful visual demonstrations of how plugin code and configuration fit together.

Webinar on Management with the Jopr project

I'll be delivering a Webinar tomorrow at 2PM EST on the use and extension of the RHQ / Jopr projects. You can register at: http://inquiries.redhat.com/go/redhat/JOPRWebinar. I'll be presenting on using these technologies to monitor your environment and applications and on how easy it is to get started with your first custom plugin.

Visualization Prototype

This rather low quality flash recording shows some of the features of the prototype I've been experimenting with for richer browsing of monitoring data exposed through RHQ and Jopr. Just a start.

Jopr: Open source JBoss management

I am pleased to announce the release of a new open source enterprise management solution for JBoss Application Server called Jopr. This project is the open source form of our JBoss Operations Network product built on the RHQ platform that I've written about several times. Jopr will act as the upstream, open source project to the Operations Network and provide a place for the community to collaborate on the use and development of this technology for implementing middleware management.

The RHQ project infrastructure provides a base management platform for discovering and tracking managed enterprise elements in a shared inventory. The project provides a base for the development of different aspects of management such as monitoring, configuration, content deployment, alerting and operational control. By building these solutions in a shared infrastructure you can correlate the performance and problems between varying solutions in an environment. Jopr and the JBoss Operations Network build on this base to provide JBoss Application Server management on top of the basic OS and project monitoring provided in the base platform, adding in elements like app server discovery, Hibernate query monitoring, EJB method timings and JBoss Messaging statistics. They also benefit from the platform's centralized inventory model, fine grained security, advanced event and auditing models and extensibility.

Jopr provides JBoss AS auto-discovery, monitoring, configuration, log tracking and other elements of Operations Network, in an unsupported form for the community. This will allow JBoss users to work in that community to build administrative solutions for the JBoss product suite while the Operations Network continues to provide a supported, certified solution to management in critical environments as well as the added value of the managed deployment of cumulative patches. Best of all, this is all built on an open source platform that can be fully extended to other types of services, down to application and business specific monitoring and management, or to entirely new resources to be managed. Plugins written to the RHQ platform specification will work against Jopr and ON.

Jopr is being released initially as version 2.1 to identify the fact that it is based off technology that has been under development and release for many years as a commercial product. It also currently maps to the recently released JBoss Operations Network 2.1 release and the RHQ 1.1 release all executed wonderfully by a dedicated team of developers. In the future I'll write more about the roadmap and direction for these three projects, but these releases are a great milestone in providing the necessary tools for administrators of open source middleware.

Embedded Jopr - An Open Source Admin Console for JBossAS
Embedded Jopr (pronounced "jopper") is an exciting new open source project that has just been released by JBoss. In a nutshell, it's a web application that can be dropped into the deploy directory of a JBossAS 4.2.x instance to provide a user-friendly front end for administering the app server. Version 1.0 includes support for:
  • monitoring metrics for the server itself and for webapps
  • monitoring metrics and editing configurations for datasources, connection factories, JMS topics and queues, and the host JVM
  • creating and deleting datasources, connection factories, and JMS topics and queues
  • executing scripts that reside within the JBossAS bin directory

Under the hood, Embedded Jopr is based on the Jopr project, which is in turn based on the RHQ project. RHQ is an open source enterprise management infrastructure framework, which includes a simple yet full-featured API for writing management plugins for managing just about anything under the sun. RHQ also bundles a set of plugins for managing popular open source products such as PostgreSQL and Apache Web Server. The Jopr project consists of the RHQ platform, along with a set of additional plugins for managing JBossAS and other JBoss projects - Hibernate, JBossWeb, etc.

To understand how Embedded Jopr leverages RHQ and Jopr, let's first take a step back and look at the architecture of Jopr. It consists of a single Server with multiple Agents reporting into it. There is one Agent running on each machine that is hosting applications to be discovered and managed by Jopr. Each Agent contains a Plugin Container that in turn contains plugins for each of the supported managed products. The Agent essentially wraps the Plugin Container in a standalone process that can communicate with the remote Jopr Server. The Server stores all data reported by the Agents in a central database and provides a web-based GUI for viewing and updating that data.

The goals of Embedded Jopr were considerably less complex than those of its older sibling Jopr. Embedded Jopr sought only to manage a single JBossAS instance. Because we did not need to manage multiple products running in separate OS processes, and because the JBossAS instance being managed was a Java-based application server, we could embed the Jopr Plugin Container inside an application deployed to the JBossAS instance itself - no need for a separate Agent process.

As for what type of application in which to embed the Plugin Container, we had a couple requirements:
  1. provide a way to bootstrap the embedded Plugin Container when the JBossAS instance starts up
  2. provide a web-based interface to the management data collected by the Plugin Container - real-time monitoring only

Since we only needed to provide real-time monitoring, there was no need to persist any of the collected management data, which meant we didn't require a database or Hibernate, EJB, etc. The Servlet spec provided the necessary APIs for running our bootstrap code at webapp deployment time, and JSF (which is bundled with JBossAS 4.2 and later) provided a nice framework for building a GUI. So the most sensible type of application in which to embed the Plugin Container was a webapp. We were also able to leverage Facelets, Seam, and RichFaces - extensions to JSF that allowed us to create a robust and maintainable web application.

Jopr includes RHQ plugins for managing a number of different products - JBossAS, Apache Web Server, PostgreSQL, Oracle, etc. Embedded Jopr did not need to include all of these plugins - it only needed the plugins required to manage JBossAS, its deployed services, and its host JVM. This allowed us to keep Embedded Jopr relatively lean (about 9 MB).

With such powerful foundational technologies, Embedded Jopr promises to soon become one of the nicest administration consoles out there. We hope to see lots of participation from the community, since this is one project that will greatly benefit all JBossAS users. The main goal for Embedded Jopr 1.1 is to add support for JBossAS 5.
Hardware Monitoring with RHQ

In the realm of extremely early prototypes I've started an RHQ plugin that is meant to provide monitoring for some useful aspects of hardware. It only does anything useful on Linux with IBM hardware or if you have smartmontools installed. The idea will be to collect some standard hardware info in a common model that isn't specific to certain hardware, hopefully letting you compare CPU temperatures across your entire data center. It also utilizes smolt to gather a bit of other hardware info.

Chassis monitoring

Smart based disk monitoring

Hudson Monitoring Plugin for RHQ

I've gone ahead and committed a new plugin for RHQ that provides monitoring of a Hudson continuous integration server. It's not a particularly complex plugin, but it gives a decent example of using JSON within a plugin. You'll manually inventory the server by pointing at its install URI and it'll discover and track the build statuses of all the projects running on that server. There are more metrics it could be exposing, but it's a start.

Viewing lots of monitoring data

In trying to find better ways to view large amounts of data I stumbled across a project called TreeViz that has some nice ways to render hierarchical data. I hooked a few of these up to RHQ to help show the availability of thousands of resources in a single snapshot. One I really like is the sunburst display.

With this view you can click on and drill into sub-trees that then show on the outer rim while still showing the selected context across the entire inventory in the inner rim. The image below shows the same information in a hyperbolic tree view that can be investigated by dragging.

I'll be experimenting with some of these alternative views in my spare time to see if we can include a few out of the box in a useful form.

A couple more new RHQ plugins

I've checked in two more new RHQ plugins to publish. These are also in the experimental stage right now. There's an Oracle plugin with some basic monitoring and an OpenSSH plugin that also experiments with augeas for configuration management.

Also, I'll be at Red Hat Summit next week talking about RHQ and the JBoss Operations Network and some of the cool stuff we are working on.

A couple new RHQ plugins

I finally got a chance to check in a couple more plugins that I had been working on. The virt plugin uses libvirt to do a bit of Xen monitoring and management. Just the basics so far... inventory and monitoring for the host and guests and the ability to create and configure certain things for existing guests, plus virtual device and network monitoring. It uses the very cool JNA project to make calls from the Java plugin to libvirt.

I also committed a simple Jira monitoring plugin, though it doesn't do a whole lot as the Jira SOAP API is a bit weak.

Any ideas for other useful plugins? We're taking requests.

JON at JavaOne

JavaOne was a pretty good time this year. I got to demo the newly released versions of JON (2.0) and RHQ (1.0) to nice crowds and met plenty of people interested in the platform. The JBoss party was decent too even if the marketing guys did put our demo on a huge screen. Somehow things get funny on a 12 foot screen. We did manage to turn it into a drinking game at least. We also have JON pint glasses now! Time to recover, get back to the east coast and get back to work.

Advanced Plugins?

The RHQ plugin system is called AMPS or Advanced Management Plugin System. Don't blame me over the name... Mazz came up with it. Anyway, why is it advanced? On the outside it seems pretty plain vanilla. You've got an XML file to define the metadata and you write some code to do the work. Pretty typical. And it may not seem all that sexy, but I do think it demonstrates some new enhancements to a field that's had some pretty tough extension models to date. It's not perfect and we have plans to enhance, tune, simplify, etc. but I'll use a couple posts to cover some of the high-level details and hopefully some of you will find it useful in building your own RHQ plugins.

To start with, the plugin system removes the need for many types of boilerplate code. A very basic example is that the user can completely customize the intervals at which a metric is collected. For example, there are a couple dozen metrics that can be collected on PostgreSQL tables. The user could have hundreds or thousands of these in many databases. The centralized server system lets the user choose how often to collect each of these metrics down to a specific resource... but also simplifies setup by shipping with defaults and letting the admin specify a global template interval or specify an interval on an arbitrary group of tables. Combined with the advanced rules-based grouping system we could easily turn on table-size monitoring every 5 minutes for all tables that contain the string "measurement" in their name and that are running on databases in production.

So why is this good for the plugin system? Well, the point here is the integration. We don't make users set up cron, a collection configuration file, or run their own threads for collection... but further than that, we can automatically group the collection of metrics to allow the plugin to optimize retrieval. Instead of making a round trip to the database for every metric we can use the API to pass the plugin a list of requested metrics. Through the infrastructure, we're able to do automated grouping. Let's say you've got a metric collected every minute, one collected every two and then five collected every five. We'll be able to group those into patterns where each minute only one round-trip collection is made to the database for just the right set of metrics and all without the plugin having to worry about caching or stale data or other miscellaneous junk. It's part of our model to make writing plugins fun and enjoyable.
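To make the grouping concrete, here is a minimal sketch of how the metrics due in a given minute could be batched into a single request. It is an illustration under assumptions, not the RHQ scheduler: CollectionBatcher, dueAt, exampleSchedule and the metric names are all hypothetical, and intervals are simplified to whole minutes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the RHQ scheduler.
class CollectionBatcher {

    // Metrics whose interval (in minutes) divides the current minute are due together,
    // so the plugin can fetch them all in one round trip.
    static List<String> dueAt(Map<String, Integer> intervalMinutes, int minute) {
        List<String> due = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : intervalMinutes.entrySet()) {
            if (minute % entry.getValue() == 0) {
                due.add(entry.getKey());
            }
        }
        return due;
    }

    // The schedule from the text: one metric every minute, one every two, five every five.
    // The metric names are made up for the example.
    static Map<String, Integer> exampleSchedule() {
        Map<String, Integer> schedule = new LinkedHashMap<>();
        schedule.put("everyMinute", 1);
        schedule.put("everyTwo", 2);
        for (int i = 1; i <= 5; i++) {
            schedule.put("everyFive" + i, 5);
        }
        return schedule;
    }

    public static void main(String[] args) {
        Map<String, Integer> schedule = exampleSchedule();
        for (int minute = 1; minute <= 10; minute++) {
            // One list per minute means one collection call per minute.
            System.out.println("minute " + minute + ": " + dueAt(schedule, minute));
        }
    }
}
```

With this schedule most minutes request a single metric, minute 5 requests six, and minute 10 requests all seven, each time in one call.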

Our plugin system covers metadata definition of arbitrarily deep trees of managed resources. For those resources it covers the cross-platform auto-discovery of those resources, metric measurement, configuration reading and editing, operations with structured parameters and return values, arbitrary events and log watching and software and content discovery, installation and deployment.

This is just a very small example, but I'll get into more in future posts and introduce some of the plugins I'm working on in spare time.

Performance modeling

How to test framework systems that have varying usage is something we've hit with the RHQ project. Plugins to the platform could expose many types of data at varying frequencies and scales and so we've developed a flexible framework for performance testing. The nice thing about this system is its relative simplicity. We were able to create a standard RHQ plugin that can simulate large complex inventories of resources with just a few settings. How many server resources? How many child resources for each of those and what kinds of metrics and other data do they expose.

Additionally, we've got a system for launching copies of our agents that each look like entirely separate systems to the server. In this way we can simulate the data flow of hundreds or thousands of boxes with a relatively small setup. To the server each has separately managed connections, but we can run many more agents this way than starting separate VMs. The performance and agent launching systems are completely configurable allowing us to dial up and down the load and then we can utilize the instrumentation built into RHQ itself to let us see how critical subsystems are performing. How long is it taking to persist large data sets? How long is it taking to check alert conditions and dampening filters?

With a relatively small level of effort we've been able to increase our confidence in scalability by orders of magnitude. The instrumentation also helps us quickly identify conditions that could become a problem. All good practices if you're building systems that need high performance.

More EJB3/Hibernate Utilities

Following on from the previous entry on browsing around your EJB3 entity model, I've put together a little utility JSP page that will help you test and understand your JPQL queries and how they translate into database interaction. A common issue on larger EJB3 projects is losing track of what is lazy vs. eager loaded and what the real database impact of any given query is. It can be difficult to map these back to the actual SQL that runs against the database and cumbersome to utilize debug logging to find what you are looking for.

The hibernate.jsp is a standalone page that will look up the entity manager from JNDI and use some Hibernate APIs to translate the query into SQL and execute it. Along the way it will give you parameter entry boxes for any query parameters, count and track the total number of database round trips involved in a particular query, and time the whole thing. You can also enter and execute named queries.

EJB3 Drive Through

The RHQ Project utilizes EJB3 for persistence of around 100 different entities in our model. As a generic management solution, it stores information about the things you're managing {Resources, Users, Groups, Configurations and Measurement data} as well as the large amounts of metadata it takes to provide an extensible platform. Our model centers around two areas: a hierarchical resource model and a hierarchical type model that is delivered by plugins. Some of our objects have many connected pieces of information, which makes finding data quickly during development rather difficult with normal database queries. Particularly at times when we were still completing the user interfaces to some of these systems, it was useful to have an alternate way to browse around the theoretical JPA object graph.

To that end I put together a JSP and a stateless session bean that gives you that view. This code is written to access our beans, but it should be fairly easy to construct a more generic version.

There are two real views in the browser. First the list view shows you a list of all the resources of a specific entity type. There is a quick link on the landing page to the list of the most important entity types that will display one of these lists. I haven't gotten around to pagination support so you'll only see the first 100.

The other view shows a specific entity instance with a top table showing all the simple attributes as their toString values or *-to-one links as a link to that entity's detail view.

The bottom of the detail page shows the list of *-to-many references, each showing the relations on the other side as links. Through these links it is possible to browse all the way around an EJB3 domain model, seeing the field values on specific entities or getting an idea of how they are linked. It can also be useful as a tool for learning the model. It required no changes to the entities and merely uses JPQL queries, reflection and a quick Hibernate call to ensure the data is loaded.
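
As a rough illustration of the reflection part only, the sketch below sorts the fields of an object into simple attributes, *-to-one links and *-to-many references, much like the two halves of the detail view. It is not the actual browser code: the Resource and Agent classes are hypothetical stand-ins for entities, the classification by field type (rather than by JPA annotations) is an assumption made to keep the example free of dependencies, and the JPQL and Hibernate loading steps are left out.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical sketch: not the RHQ browser, just the reflection idea behind it.
class EntityBrowserSketch {

    // Made-up stand-ins for entity classes.
    static class Agent {
        String name = "agent-1";
    }

    static class Resource {
        int id = 42;
        String name = "db-server";
        Agent agent = new Agent();                    // *-to-one
        List<Resource> children = new ArrayList<>();  // *-to-many
    }

    /** Classifies a field by its declared type (an assumption; real code could read JPA annotations). */
    static String kindOf(Field field) {
        Class<?> type = field.getType();
        if (Collection.class.isAssignableFrom(type)) {
            return "to-many";
        }
        if (type.isPrimitive() || type.getName().startsWith("java.")) {
            return "simple";
        }
        return "to-one";
    }

    /** Builds one display row per instance field of the given object. */
    static List<String> describe(Object entity) {
        List<String> rows = new ArrayList<>();
        for (Field field : entity.getClass().getDeclaredFields()) {
            if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
                continue;
            }
            field.setAccessible(true);
            Object value;
            try {
                value = field.get(entity);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            String kind = kindOf(field);
            String label = field.getName() + " [" + kind + "]";
            if (kind.equals("simple") || value == null) {
                rows.add(label + " = " + value);              // shown via toString
            } else if (kind.equals("to-one")) {
                rows.add(label + " -> " + value.getClass().getSimpleName());
            } else {
                rows.add(label + " -> " + ((Collection<?>) value).size() + " element(s)");
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : describe(new Resource())) {
            System.out.println(row);
        }
    }
}
```

In a real page the "->" rows would become links to the detail or list view of the referenced entity.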

So what's next? Well, it was a quick tool hacked together for a project several years ago that turned out to be useful to me. It is very minimalist (ugly), doesn't support pagination or querying and isn't particularly smart about data display, but it may be useful to you on your JPA project as a development helper.

Information Visualization

I stumbled across this timeline visualization project from MIT recently and thought it would make for a pretty neat way to combine the many types of events collected by RHQ. Managed resources can have configuration change events, executed operations, collected events with alerts against them, and collected monitoring data.

This example shows a Windows box with some alerts being fired at different severities, some operations being executed successfully and some events collected from the Windows Event Log. The Simile timeline allows the user to see a timeline in several bands of different granularity and then to click and drag across them to see different time periods. Dragging across the coarser bands gets you there faster. It also lets me set up different icons for the events and severities so that you can see the information at a glance, and you can even filter and highlight events by content.

Being open

Oddly enough, this blog went quiet when I went to work for an open source company (JBoss). In my prior life in the consulting world, I worked on open source tools that were generally orthogonal to the business goals, so it was pretty easy to work on and talk publicly about MC4J or minor projects. At JBoss, I would say "I used to be the only open source guy at closed-source companies and now I'm the only closed source guy at an open source company". It's not that I couldn't blog, but I wouldn't have been able to blog about what interested me, as what interested me was the closed source project I was working on.

I'm happy to say that I'm finally being paid to work on open source. Red Hat announced its cooperation on the RHQ project, which constitutes much of what I've been involved with for the past three years. RHQ is an infrastructure project for building management technology and a pretty good one at that. This is just the first step now that we're finally out in the open, but I'll save some for future posts. For now, RHQ is our shot at Management Middleware. (Note: that's not middleware management, though that of course is a big part of what we do with this.)