Phantom reads and data paging

Paging through the results is easy, right?

The client only needs to supply the number of rows to skip and the maximum number of rows it wants returned (aka the page number and the page size). The server then returns the data along with the total number of results available. Et voilà, you have all the information you need. The number of rows to skip together with the page size tells you which page you’re showing, and the page size together with the total number of rows gives you the total number of pages available. Nothing too difficult or complex.
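To make that arithmetic concrete, here is a minimal sketch in Java. The class and method names are made up for illustration, and it assumes a zero-based page index and a positive page size:

```java
public class PageMath {
    /** Zero-based index of the page that starts after skipping {@code skip} rows. */
    static int pageIndex(int skip, int pageSize) {
        return skip / pageSize;
    }

    /** Number of pages needed to show {@code total} rows (rounded up). */
    static int totalPages(int total, int pageSize) {
        return (total + pageSize - 1) / pageSize;
    }
}
```

For example, skipping 20 rows with a page size of 10 puts you on the third page (index 2), and 25 rows in total need 3 pages.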

But there’s a catch. On the server, one needs to perform (at least) two queries: one query to get the data for the requested page and a second query to fetch the total number of rows. Now, most databases set the default transaction isolation level to READ_COMMITTED, and for very good reasons. But this transaction isolation level allows for phantom reads, i.e. 2 queries in the same transaction might “see” a different number of rows if another transaction committed in between and added or deleted rows that the queries would return.

So, it may happen that you will:

  • return “rows 5 to 10 out of 2 total”,
  • say “there are no available results on the first page, while the total number of rows is 5”,
  • etc.

All that info acquired within one transaction.

What can you do about such situations? The obvious solution is to just admit that these things can happen 😉 Another option is to try to detect that such a situation might have occurred and retry.

I’ve come up with the following rules for consistency of the results:

N is the actual number of elements on the page, P is the maximum number of elements on the page (i.e. the page size), I is the number of rows to skip and T is the total number of results. The results are consistent if one of the following holds:

  • T < I && N == 0. This means we’re trying to show a page that is past the total number of results. We therefore expect the collection to be empty.
  • T - I > P && N == P. If we are not showing the last page, the number of elements in the collection should be equal to the page size.
  • T - I <= P && N == T - I. If showing the last page, the collection should have all the remaining elements in it.

These are kind of obvious assumptions, but a phantom read can easily break them, and therefore one should check them if one wants to be serious about returning meaningful results to the user.
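The rules above, together with the detect-and-retry idea, can be sketched in Java as follows. This is only an illustration and not the actual RHQ code: the Page holder, the loader callback and the maxAttempts limit are hypothetical names introduced here.

```java
import java.util.function.Supplier;

public class PageConsistency {
    /** Hypothetical holder: rows actually returned (N) and the reported total (T). */
    static final class Page {
        final int rowCount;
        final long total;

        Page(int rowCount, long total) {
            this.rowCount = rowCount;
            this.total = total;
        }
    }

    /** Checks the three rules: n = N, p = P (page size), i = I (rows skipped), t = T. */
    static boolean isConsistent(int n, int p, int i, long t) {
        if (t < i) {
            return n == 0;          // page lies past the end of the results
        }
        long remaining = t - i;
        if (remaining > p) {
            return n == p;          // not the last page: it must be full
        }
        return n == remaining;      // last page: all remaining rows
    }

    /**
     * Re-runs the (hypothetical) loader until the result is consistent or the
     * attempts run out; the last result is returned either way.
     */
    static Page fetchConsistent(Supplier<Page> loader, int p, int i, int maxAttempts) {
        Page page = loader.get();
        for (int attempt = 1;
             attempt < maxAttempts && !isConsistent(page.rowCount, p, i, page.total);
             attempt++) {
            page = loader.get();
        }
        return page;
    }
}
```

Note that the sketch gives up after a fixed number of attempts and returns whatever it got last, which is the “just admit that these things can happen” fallback.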

So while paging is simple in principle, there are a couple of interesting corner cases that one needs to handle if one reads data out of a dynamic data set. It took us a good couple of years in RHQ to get to the bottom of this but hopefully now our paging is more robust than it was before.


Scripting News in RHQ 4.5.0

RHQ 4.5.0 (which we released today) contains a great deal of scripting enhancements that I think are worth talking about in more detail. In my eyes, the changes make the scripting in RHQ ready for serious use.

CommonJS support

This, I think, is huge. In previous versions of RHQ, the only way of reusing functionality from another script was to use the exec -f command in the interactive CLI shell (in other words, this was NOT available in batch mode, which is how the majority of people use the CLI). So if you needed to implement something bigger and needed to split your code into several files (as any sane person would do), you only had 1 option: before executing the “script”, you needed to concatenate all the scripts together.

This sucked big time and we knew it 😉 But we didn’t want to just add functionality to “include files”, because that would be too easy 😉 At the same time it wouldn’t solve the problem, really. The problem with just “including” the files into the current “scope” of the script is that each and every variable or function in those files would have to be uniquely named, because JavaScript lacks any sort of namespace/package resolution. Fortunately, the CommonJS spec solves this problem.

Here’s how you use a module. Notice that you assign the module to a variable and that’s how you prevent the “pollution” of your scope. The loaded module can have methods and variables with the same name as your script and they won’t influence each other:

var myfuncs = require("modules:/myfuncs");
myfuncs.helloworld();

You may wonder what that "modules:/myfuncs" identifier means. It is a URI that the underlying library uses to locate the script to load. This “sourcing” mechanism is pluggable and I will talk about it more in the next section. To see some examples of the modules, you can look at the samples/modules directory of your CLI deployment and you can also read some documentation about this on our wiki.

Locating the scripts

With the ability to load scripts comes the problem of locating them. For the standalone CLI, the obvious location for them is the filesystem, but what about alert notification scripts on the RHQ server? These scripts are stored in RHQ repositories, which don’t have a filesystem location. The solution is not to tie the scripts to the filesystem but to have a generic way of locating them using URIs and a pluggable way of resolving those URIs and loading the scripts from various locations. This means that you can, for example, load a script from an RHQ repository in your standalone CLI installation, or define 1 central location for your local CLI scripts and use the “modules” URIs to refer to them. Or you can easily implement your own “source provider” and, for example, load the scripts from your internal git repo or ftp or whatnot. RHQ comes with a small set of predefined source providers, documented here.
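To illustrate the idea of resolving URIs through pluggable providers, here is a minimal Java sketch that picks a provider by URI scheme. It is not RHQ’s actual source provider API; the SourceResolver class, its methods and the idea of returning the script text as a String are assumptions made for the example.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class SourceResolver {
    // One provider per URI scheme; each turns a URI into the script's source text.
    private final Map<String, Function<URI, String>> providers = new HashMap<>();

    /** Plugs in a provider for the given URI scheme (e.g. "modules" or "file"). */
    void register(String scheme, Function<URI, String> provider) {
        providers.put(scheme, provider);
    }

    /** Finds the provider responsible for the URI's scheme and lets it load the script. */
    String load(URI location) {
        Function<URI, String> provider = providers.get(location.getScheme());
        if (provider == null) {
            throw new IllegalArgumentException("No source provider for " + location);
        }
        return provider.apply(location);
    }
}
```

With a design like this, the script that calls require("modules:/myfuncs") never needs to know whether the module comes from the filesystem, a repository or somewhere else entirely.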

With this ability at hand, you can make an RHQ repository a central place for your scripts that you will then be able to use universally – both in the standalone CLI installations and also in the alert notification scripts.

Python

In previous versions, our scripting was tied to Javascript. Thanks to quite a bit of refactoring, the RHQ scripting integration is now language independent and language support is pluggable (see my previous post where I detail how this was done in case of Python).

What this all means is that you can now write your CLI scripts in Python and still use the same API as you were able to use before from Javascript only. I.e. you will find the ResourceManager, AlertManager and all the other predefined variables that define the RHQ API available in Python, too. The only thing that this initial implementation doesn’t support is code-completion in the interactive CLI shell.

Last but not least, the ability to load the scripts from various locations is available in Python, too, using an automatically installed path_hook. You can read about how to use it on our wiki. This also means that you can now write your alert notification scripts in Python, too.

When running an alert notification script (i.e. an alert notification of the type “CLI Script”), the language of the script is determined from the script file extension: “.py” for Python and “.js” for JavaScript. When you start the CLI shell, you pass your language of choice using the --language command-line parameter; “javascript” and “python” are the obvious possible values for it.
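The extension-to-language rule is simple enough to sketch in a few lines of Java. This is an illustration only, not the RHQ implementation; the case-insensitive matching and the exception for unknown extensions are assumptions.

```java
import java.util.Locale;

public class ScriptLanguages {
    /** Maps a script file name to the language name, based on its extension. */
    static String languageFor(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".py")) {
            return "python";
        }
        if (name.endsWith(".js")) {
            return "javascript";
        }
        // Assumption: anything else is rejected rather than guessed.
        throw new IllegalArgumentException("Unsupported script type: " + fileName);
    }
}
```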

Conclusion

In my opinion, these changes are great and will allow our users to really start building useful tools using our scripting support. If you feel like you’ve come up with a script module you would like to share with the RHQ community, why don’t you just send a pull request to our github repo with sample scripts?

RHQ meets Arquillian

Historically, RHQ has had a little bit of a problem with test coverage of its various (agent) plugins. There is a multitude of problems with testing these but the following two are, IMHO, the main ones:

Managed Resources

You somehow need to have the managed resource available for the plugin to connect to (i.e. you need to have the JBoss AS, Postgres or whatever your plugin manages). This is always a problem for a clean quick unit test. You either somehow need to mock the managed resource (try that with Postgres) or you need to have a way of configuring your test to get at or start the managed resource. This is where Arquillian certainly can come to the rescue with its ability to manage the lifecycle of its “containers” (for managed resources that have an Arquillian extension, like JBoss AS) but generally this needs to be in the “hands” of the tests for each plugin. There are a million ways the plugins talk to their managed resources and so trying to come up with a generic solution to start, stop and configure them would IMHO create more problems than it would solve.

Setting up Agent Environment

While not even too hard, running your test in RHQ’s plugin container requires a little bit of setup. It is important to realize that if you want your tests to be run inside a real plugin container (i.e. “almost an RHQ agent”), it is not enough to have your dependencies on your test classpath. The thing is that the plugin container is a container of its own – it has its own deployment requirements and classloading policies. It is best to think about deploying a plugin into RHQ agent as deploying a webapp into Tomcat – you wouldn’t expect to be able to test the webapp in Tomcat just by virtue of having them both on the classpath and starting Tomcat.

So to put it straight, you need to jump through some maven and antrun hoops to package your plugin (and any other plugin it depends on) and put them in defined locations, where the plugin container can then pick them from. Also, if you want to take advantage of our native APIs to obtain running processes, etc., you need to use another bucket of antrun incantations in your POM to set that up.

Previous Attempts

The two problems outlined above have kept the test coverage of our plugins rather low. We always knew this sucked and there have been attempts to change that.

A ComponentTest class used in some of our plugins is an attempt at testing the plugins out-of-container, bootstrapping them with some required input. The advantage of this approach is that you don’t need to care about the plugin container and its intricacies; the disadvantage is that you don’t get to test your plugin in the environment it will be deployed to. Also, you need to implement support for bootstrapping the parameters for any plugin facet your plugin implements, so in the end you’d end up reimplementing large parts of the plugin container just for the testing needs.

Another attempt was the @PluginContainerSetup annotation that took care of the configuration and lifecycle of the plugin container. The advantage was that you got access to a real plugin container running with your plugins, disadvantage being that you still were required to perform some maven and antrun artistry so that the plugin container could find all the plugins and libraries you’d need.

Enter Arquillian

As I already hinted at above, the RHQ agent shares a lot of similarities with EE/Servlet containers from the deployment point of view. Arquillian was therefore an obvious choice to try and solve our plugin test problems once and for all (well, this is a lie – the problem with having to have a managed resource available for the test is a problem that cannot be reasonably solved using a single solution).

So what is this integration about? It certainly won’t help you, as the plugin developer, with connecting to a managed resource you’re creating your plugin for. But it does bring you a lot of convenience over the previous state of things if you want to test your plugin in container.

Most importantly, no maven and/or antrun setup is required any more to test your plugin in-container. You just define your plugin in the Arquillian way using the @Deployment annotation (and you can “attach” to it any other plugins it depends on by instructing Arquillian to use the maven resolver). Using arquillian.xml (yes, a configuration file, but an order of magnitude shorter and much more focused and simple than pom.xml), you can configure your container to use RHQ’s native APIs by flipping one config property to true. You can declaratively say you want to run discovery of managed resources (using, surprise, a @RunDiscovery annotation) and you get the results of such discovery injected into a field in your test class. You can even set the container up so that it thinks it is connected to an RHQ server and provide your own ServerServices implementation (i.e. the RHQ server facade interface); there is a default implementation ready that uses Mockito to mock your server side. There’s still more; you can read all about the supported features and see some examples on this wiki page.

Conclusion

While not a panacea for all the problems that testing RHQ plugins brings about, Arquillian let us cut the setup needed to run a plugin in-container by 90% and introduce a number of convenience annotations through which you can get a variety of data injected into your unit tests. This is still just a beginning, though. The next step is to start actually using this integration and come up with other useful annotations and/or helper methods/classes that will make working with and retrieving information from the plugin container as easy as possible.

RHQ CLI over XMPP

I watched the great demo of the XMPP server plugin for RHQ from Rafael Chies. Rafael is using a custom DSL to query the RHQ server for information but I thought that that really shouldn’t be necessary – it should be possible to use an ordinary CLI session behind this. Granted – the “query language” of our remote API is more complicated than the simple DSL Rafael is using but at the same time, the API we use in the CLI is much more feature rich and I wouldn’t have to reimplement any of it if I was able to “blend” the CLI session with the XMPP chat.

So I forked Rafael’s code on github and went off to work. During the course of reimplementing Rafael’s code I discovered 2 bugs in RHQ itself (BZ 786106 and BZ 786194) which I fixed immediately (well, it took me a couple of hours to figure out what the hell was going on there 😉 ). After that, it wasn’t really that hard to integrate XMPP and the CLI’s script engine and here’s a short video to prove that it actually works 🙂 :

RHQ CLI over XMPP on Vimeo.

For the interested, all the important code is included in this class.

Securing Rhino in Java6

In RHQ we let the users provide scripts that can be run when an alert fires. This is great for automation because the script can do anything the users can do with our remote API. But the users of course can write a script like this:

java.lang.System.exit(1); 

This would shut down the whole RHQ server, which, of course, is not so nice.

The solution to this problem is to run the Rhino script engine in a custom access control context. One has to define the set of Java permissions that the scripts are allowed and specifically NOT include the “exitVM” RuntimePermission in the set. After that a custom AccessControlContext can be created with the set of permissions.
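As a rough illustration of such a permission set, here is a minimal Java sketch. The particular permissions granted are only an example and not the set RHQ uses; the point is merely that no RuntimePermission for “exitVM” is ever added.

```java
import java.net.SocketPermission;
import java.security.Permissions;
import java.util.PropertyPermission;

public class ScriptPermissions {
    /** Builds an example permission set for scripts; "exitVM" is deliberately absent. */
    static Permissions forScripts() {
        Permissions permissions = new Permissions();
        // Illustrative grants only: read system properties, open outgoing connections.
        permissions.add(new PropertyPermission("*", "read"));
        permissions.add(new SocketPermission("*", "connect,resolve"));
        // No RuntimePermission("exitVM") is added, so System.exit() is not allowed
        // for code that runs with only these permissions.
        return permissions;
    }
}
```

You can verify the intent with permissions.implies(new RuntimePermission("exitVM")), which returns false for this set.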

But now comes the fun part. In Java6 update 28, the way the Rhino script engine can be secured changed because of a security vulnerability that had been found. So in Java6 update 27 patched with this patch, or in Java6 update 28 and later, Rhino runs the scripts with the access control context that the engine itself was created with. In the unpatched Java6 u27 and earlier, the scripts were run with the access control context active at the time the script was evaluated.

So what does that mean for you, my dear readers, that want to reliably secure your application and allow custom scripts to be executed in it at the same time? Well, of course, you need to secure your script engine twice (or refuse to run on anything older than Java6 u28).

Let me show you how it is done in RHQ:

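// src is the CodeSource and permissions the PermissionCollection granted to scripts;
// permissions must NOT contain the "exitVM" RuntimePermission.
ProtectionDomain scriptDomain = new ProtectionDomain(src, permissions);
AccessControlContext ctx = new AccessControlContext(new ProtectionDomain[] { scriptDomain });
try {
    // Create the engine inside the restricted context so that Java6 u28+ picks it up.
    return AccessController.doPrivileged(new PrivilegedExceptionAction<ScriptEngine>() {
        @Override
        public ScriptEngine run() throws Exception {
            ScriptEngineManager engineManager = new ScriptEngineManager();
            ScriptEngine engine = engineManager.getEngineByName("JavaScript");
            // The decorator re-applies the permissions around every eval() (u27 and earlier).
            return new SandboxedScriptEngine(engine, permissions);
        }
    }, ctx);
} catch (PrivilegedActionException e) {
    ...
}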

What do you actually see in the code above? The privileged block is there to ensure that the script engine is created using the desired access control context (so that it can use it in Java6 u28 and later). The script engine itself (created by the call to getEngineByName) is then wrapped in a SandboxedScriptEngine, which is a special decorator that wraps all the eval() invocations in an access control context with the specified permissions. That will ensure that the access control context is enforced in the unpatched Java6 u27 and earlier.
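To give an idea of what wrapping an eval() call in an access control context can look like, here is a minimal Java sketch. It is not the actual SandboxedScriptEngine from RHQ; the RestrictedEval class and its methods are hypothetical, and a real decorator would implement the whole ScriptEngine interface. Also keep in mind that the permissions are only enforced when a SecurityManager is installed, and that the access control API used here is deprecated in recent Java versions.

```java
import java.security.AccessControlContext;
import java.security.AccessController;
import java.security.PermissionCollection;
import java.security.PrivilegedActionException;
import java.security.PrivilegedExceptionAction;
import java.security.ProtectionDomain;
import javax.script.ScriptEngine;
import javax.script.ScriptException;

public class RestrictedEval {
    /** Builds a context whose only protection domain carries the given permissions. */
    static AccessControlContext restrictedContext(PermissionCollection permissions) {
        // A null CodeSource is allowed; only the permissions matter for this sketch.
        ProtectionDomain domain = new ProtectionDomain(null, permissions);
        return new AccessControlContext(new ProtectionDomain[] { domain });
    }

    /** Evaluates the script with the restricted context active during eval(). */
    static Object eval(final ScriptEngine engine, final String script,
                       PermissionCollection permissions) throws ScriptException {
        try {
            return AccessController.doPrivileged(new PrivilegedExceptionAction<Object>() {
                @Override
                public Object run() throws ScriptException {
                    return engine.eval(script);
                }
            }, restrictedContext(permissions));
        } catch (PrivilegedActionException e) {
            // doPrivileged wraps checked exceptions; unwrap the original one.
            Exception cause = e.getException();
            if (cause instanceof ScriptException) {
                throw (ScriptException) cause;
            }
            throw new ScriptException(cause);
        }
    }
}
```

Applying the same context both when the engine is created and around every evaluation is what covers the patched and the unpatched Java6 behaviour at the same time.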
