Upgrading Resources In RHQ

Humans make mistakes. That’s because they learn: by making mistakes and experimenting they perfect their skills. Computers, on the other hand, only do as they are told. When a human interacts with a computer, she expects it to be human-like to the extent that it can recover from or react to the mistakes she makes while learning the rules of interaction with it.

In the case of RHQ, one kind of user is the plugin developer. An RHQ (agent) plugin is a “thing” that talks to some other piece of software and can configure and monitor it. The other kind of RHQ user is the system administrator who uses RHQ and its plugins to manage their IT infrastructure. For the administrator, the plugin becomes part of the “computer”. But the plugin is made by humans, and humans make mistakes.

One of the mistakes a plugin developer can make is to assign a wrong “resource key” during the discovery of resources. A resource key is something that uniquely identifies the particular “resource” the plugin can talk to. You can dive much deeper into the details of this here, but in a nutshell, the resource key is extracted from the data the plugin can gather about a resource in such a way that if, at some later point in time, the plugin is told to rediscover the resources it can manage, the resource key will remain the same for the same resources. Usually the resource key is the file-system location of some significant configuration file, an installation directory, a port a service is listening on, a CPU id, a mount point, etc. Whatever fits the needs of the particular plugin.
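To make the idea a bit more tangible, here is a tiny, purely illustrative sketch (not the actual RHQ plugin API; the class and method names are made up) of how a plugin might derive a resource key during discovery:

import java.io.File;
import java.io.IOException;

// Hypothetical example, not the RHQ plugin API: the point is only that the key is
// derived from data that stays the same every time the same server is rediscovered.
public class FooServerDiscovery {

    /** Derive a key that stays stable across rediscoveries of the same server. */
    public String resourceKeyFor(File installDir, int port) {
        try {
            // The canonical install path is stable across agent restarts; adding the
            // port distinguishes two servers installed in the same directory.
            return installDir.getCanonicalPath() + ":" + port;
        } catch (IOException e) {
            return installDir.getAbsolutePath() + ":" + port;
        }
    }
}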

If the plugin developer realized she made a mistake in the way the resource key is generated, and that it, for example, doesn’t identify the resource in a completely unique way, the only thing she could do was to create a new version of the plugin with the fix and distribute it to her customers. Easy.

But the consequences for the customers (i.e. the system administrators) were quite severe. Because the resource key algorithm changed, the resources that they already had in their inventories and that they managed, collected stats on, had alerts defined for, etc. suddenly became defunct, and a new resource (or even more resources, with the new resource keys) appeared in their place. The administrator would then have to go and define their alerts, add the new resource(s) to the groups they had defined for them, etc. all over again. Not to mention that the historical data would from then on be split between the “old” and “new” resource, so if the admins wanted to retain the historical data, they could not just delete the defunct resource. They’d have to keep it in their inventory, where the resource would forever stay in the unavailable state, showing red icons signifying a problem where there wasn’t any (well, the problem was on the plugin developer’s side, but the admin would suffer for it).

But all that changed. Yesterday I merged in a new feature that enables the plugin developers to fix their past mistakes without the system admins being punished for them.

I called the process “resource upgrade” because it enables the plugin developers to change the data of existing resources, to “upgrade” them so that they conform to the latest version of the plugin code. For now the feature is quite rudimentary and only supports upgrading the resource key, resource name and description. Obviously, the other big candidates for upgrade would be the plugin configuration (aka connection settings) and the resource configuration. When we were thinking about this, though, we realized not only that implementing configuration upgrades would be quite complex, because we would basically have to break the fundamental principle of RHQ configuration, which is that the RHQ server is the authoritative source of configuration data, but also that we couldn’t find a plugin that would benefit from such a capability.
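To give a rough idea of what the plugin author deals with, here is a purely illustrative sketch of the shape of the upgrade callback. The type names are made up for this example; the authoritative interfaces are described on the wiki mentioned at the end of this post.

import java.io.File;
import java.io.IOException;

// Illustrative sketch only, not the actual RHQ plugin API. The idea: given what the
// server currently has in the inventory, return the corrected data (or null to keep it).
public class FooServerUpgrade {

    /** What is currently stored in the inventory for one resource (hypothetical type). */
    public static class ExistingResource {
        public String resourceKey;
        public File installDir; // e.g. read back from the connection settings
    }

    /** The corrected data; a null field means "keep the old value" (hypothetical type). */
    public static class UpgradeReport {
        public String newResourceKey;
        public String newName;
        public String newDescription;
    }

    public UpgradeReport upgrade(ExistingResource existing) throws IOException {
        String correctKey = existing.installDir.getCanonicalPath();
        if (correctKey.equals(existing.resourceKey)) {
            return null; // the key already follows the fixed algorithm, nothing to do
        }
        UpgradeReport report = new UpgradeReport();
        report.newResourceKey = correctKey;
        return report;
    }
}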

If you happen to have a custom RHQ plugin for your piece of software and say to yourself, “damn, why oh why didn’t they think about my use case, I’d love to be able to upgrade the configuration of existing resources”, please leave a comment here. I’d love to hear about your needs so that we can think about supporting them in the future.

When I said that the resource upgrade supports upgrading the resource name and description, I partially lied though. The actual update of these is guarded by a server configuration setting, because both the resource name and description can be updated by the users and we wouldn’t want to overwrite their updates without consent. For now there is not even a way to enable that setting in the UI, even though it’d be trivial to add. The problem with this is that for this feature to be useful to our current community and customers, we’d have to update all the plugins with code that would implement it, which becomes a substantial amount of work with our 30+ plugins at the moment. But if you feel like you absolutely need this, contact our team. Anything is possible when there’s enough backing 😉

You can read more about the technicalities of the implementation on our wiki.


Database setup for TestNG tests

In my previous post I talked about the approach I took to export data from a database using a JPA model. I also mentioned that that was a part of a larger effort to support performance testing that we are currently implementing for RHQ. This post is a follow-up on that theme. This time we’re going to take a look at how to use the exported data in TestNG based tests.

The problem at hand is basically restoring the database to the exact state it was in when the data for the test was exported. This gets non-trivial in an evolving project like RHQ, where we constantly change the DB schema to either add new features or make performance enhancements. Before each test, we therefore need to do the following:

  1. Recreate the database to the minimum supported version.
  2. Upgrade the database schema to the version from which the data for the test was exported from.
  3. Import the test data.
  4. Upgrade the schema (now with the correct data) to the latest database version.
  5. Run the test.

TestNG is all about annotations, so all this should ideally happen transparently to the test, just by annotating the methods somehow. As far as I know there is no easy way to add a new custom annotation to the TestNG core, but fortunately TestNG 5.12 added support for the @Listeners annotation, which can be used to add any TestNG-defined listener to a test. By implementing IInvokedMethodListener, we can check for the presence of our new annotations on the tests and thus effectively implement a new TestNG “managed” annotation.

With @Listeners and IInvokedMethodListener, the implementation is quite easy. We can define a simple annotation to be used on the test methods that provides the configuration for restoring the database state, and implement the setup in our method listener.

Let’s take a look at the actual database state annotation copied from our code base:

/**
 * An annotation to associate a test method with a required state of the database.
 * 
 * @author Lukas Krejci
 */
@Retention(value = RetentionPolicy.RUNTIME)
@Target(value = { ElementType.METHOD })
public @interface DatabaseState {

    /**
     * The location of the database state export file.
     */
    String url();

    /**
     * The version of the RHQ database the export file is generated from.
     * Before the data from the export file are imported into the database, the database
     * is freshly created and upgraded to this version. After that, the export file
     * is imported to it and the database is then upgraded to the latest version.
     */
    String dbVersion();
    
    /**
     * Where is the export file accessible from (defaults to {@link DatabaseStateStorage#CLASSLOADER}).
     */
    DatabaseStateStorage storage() default DatabaseStateStorage.CLASSLOADER;
    
    /**
     * The format of the export file (defaults to zipped xml).
     */
    FileFormat format() default FileFormat.ZIPPED_XML;
    
    /**
     * The name of the method to provide a JDBC connection object.
     * If the method is not specified, the value of the {@link JdbcConnectionProviderMethod} annotation
     * is used.
     */
    String connectionProviderMethod() default "";
}

A test class that would use these would look something like this:

@Listeners(DatabaseSetupInterceptor.class)
public class MyDbTests {

    @Test
    @DatabaseState(url = "my-exported-data.xml.zip", dbVersion = "2.94")
    public void test1() {
        ...
    }
}

I think that most of that is pretty self-explanatory. The only thing that needs further explanation is dbVersion and how we are dealing with setting up and upgrading the database schema.

In RHQ we have been using our home-grown dbutils, which use one XML file to store the “current” database schema definition and another XML file (db-upgrade.xml) to describe the individual upgrade steps that evolve the schema (each such step is considered a schema “version”). The first XML file is used for clean installations and the other is used to upgrade a schema from a previous version to the current one. The dbVersion therefore specifies a version from db-upgrade.xml.

And that’s basically it. You can check out the implementation of the DatabaseSetupInterceptor, which performs exactly steps 1 to 4 mentioned above.
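For the curious, a heavily stripped-down sketch of what such an interceptor can look like follows. The real DatabaseSetupInterceptor does much more (the storage and format options, connection handling, error reporting), the helper methods below are just stand-ins for the actual dbutils/dbUnit calls, and I’m assuming a TestNG version where ITestNGMethod exposes getConstructorOrMethod().

import java.lang.reflect.Method;

import org.testng.IInvokedMethod;
import org.testng.IInvokedMethodListener;
import org.testng.ITestResult;

// A sketch of the listener idea, not the real implementation.
public class DatabaseSetupInterceptorSketch implements IInvokedMethodListener {

    public void beforeInvocation(IInvokedMethod invokedMethod, ITestResult testResult) {
        Method method = invokedMethod.getTestMethod().getConstructorOrMethod().getMethod();
        DatabaseState state = method.getAnnotation(DatabaseState.class);
        if (state == null) {
            return; // the test method is not annotated, nothing to set up
        }
        // Steps 1 to 4 from the list above:
        recreateFreshSchema();                    // 1. recreate the DB at the baseline version
        upgradeSchemaTo(state.dbVersion());       // 2. upgrade to the version the export came from
        importData(state.url(), state.format());  // 3. import the exported test data
        upgradeSchemaToLatest();                  // 4. upgrade schema (and data) to the latest version
    }

    public void afterInvocation(IInvokedMethod invokedMethod, ITestResult testResult) {
        // nothing to clean up in this sketch
    }

    // Stand-ins for the actual dbutils/dbUnit driven work.
    private void recreateFreshSchema() { /* ... */ }
    private void upgradeSchemaTo(String version) { /* ... */ }
    private void importData(String url, FileFormat format) { /* ... */ }
    private void upgradeSchemaToLatest() { /* ... */ }
}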

As a final, slightly unrelated, note: we are currently thinking about migrating from our own database setup/upgrade tool to Liquibase. I think that the above approach should be easily transferable to it by changing the dbVersion attribute to Liquibase’s changeset id/author/file combo, but I’m no expert in Liquibase. If you happen to know Liquibase and think otherwise, please leave a comment here and we’ll get in touch 😉

As with the export tool described in the previous post, I tried to implement this in a way that isn’t tied to RHQ, so it could potentially be used in other projects (well, this time you’d either have to adopt our dbutils or Liquibase, but I think even that could be made configurable).


How to export data from a DB using JPA model

In RHQ, we are currently contemplating implementing a series of automated performance tests. For those tests to make any sense, we have to provide them with some initial data to work with.

So the goal is quite simple: export some defined dataset from an existing database, store it away and import it back again before a test is run. Easy. When I started researching the export part of the problem, I thought there was bound to be something out there already that would do the job. And I was right. The dbUnit project is exactly what I was looking for. It supports extracting the data from the database and can even follow the foreign key relationships (in both directions) to export the data necessary to keep referential integrity. Great.
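To illustrate how little code a naive full export takes, here is roughly what it looks like with plain dbUnit (the JDBC URL and credentials below are placeholders, not our actual settings):

import java.io.FileOutputStream;
import java.sql.Connection;
import java.sql.DriverManager;

import org.dbunit.database.DatabaseConnection;
import org.dbunit.database.IDatabaseConnection;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSet;

// A naive "export everything" with plain dbUnit; connection details are placeholders.
public class FullExport {
    public static void main(String[] args) throws Exception {
        Connection jdbc = DriverManager.getConnection("jdbc:postgresql://localhost/rhq", "user", "password");
        IDatabaseConnection connection = new DatabaseConnection(jdbc);

        IDataSet fullDataSet = connection.createDataSet(); // every table, every row
        FlatXmlDataSet.write(fullDataSet, new FileOutputStream("full-export.xml"));
    }
}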

But wait. Our data model isn’t that simple. I certainly want all the data that my core dataset depends on to be included in the export, but I also want some of the data that depends on my dataset.

OK, that didn’t make much sense, so let me introduce a little example to illustrate the problem. First, let’s look at the class diagram, which shows the relationships between the different entities in the model.

These entities are mapped to these tables:

Now let’s say I wanted to export all the resources with their configurations but I’m not interested in the alert definitions. Obviously this is going to require some kind of configuration.

I could stay at the database level and, for example, create a configuration where I would specifically state something like "I want data from this table" or "I’m interested in this table and all its dependencies except this particular foreign key" and implement a dbUnit search based on this configuration. But I’m a Java developer, and even though I can write my SQL statements and design a (more or less) reasonable database schema, I certainly don’t love that job. To find out the relationships between tables, looking at the JPA-annotated Java code is much quicker and more pleasant to me than looking at table and foreign key definitions.

Before I dive into more details let me show you the configuration file that will achieve the above goal:

<graph includeExplicitDependentsImplicitly="true" 
       packagePrefix="org.rhq.core.domain.resource">
    <entity name="Resource" root="true">
        <filter>SELECT ID FROM RESOURCE WHERE NAME='myResource'</filter>
        <rel field="configuration"/>
    </entity>
    <entity name="ResourceType" includeAllFields="true">
        <rel field="resources" exclude="true"/>
    </entity>
</graph>

This is still a bit of a mouthful, but at the same time it’s very powerful. What I’m basically saying there is that I want to export the resource with the name "myResource" and I only want to include its configuration in the export (of course, the simple properties of the resource are implicitly exported, but the configuration is the only relationship that gets exported along with it). Further, I’m telling the exporter that it’s free to export all the data of the ResourceType entities my Resource depends upon, but I don’t want to include the resources of the ResourceType in the export. This is to prevent the other resources from "leaking" into the export through the explicit relationship between the ResourceType and its "child" Resource entities. The mysterious includeExplicitDependentsImplicitly attribute tells the exporter to include all dependents of the entities it encounters unless configured otherwise.

I want the above configuration to cause the exporter to include the following in the export (look at the above class diagram to get a better understanding of why I need the below):

  1. "myResource" resource
  2. Its configuration and all its properties
  3. The resource type of the resource
  4. The configuration definition associated with that resource type
  5. All the property definitions of the configuration definition

Details

To achieve the above functionality, I needed to create a bridge that would look at the JPA annotations in my domain layer classes and translate the relationships expressed there into SQL terms. Once I have the SQL representation of the domain model relationships, I can feed that into dbUnit and use it to export and import the data as well (I also let dbUnit figure out the proper insertion order to keep referential integrity, but more on that later).

The code turned out to be fairly simple and basically consists of creating an entity dependency graph, where nodes represent the JPA entities and edges represent individual relationships (i.e. a directed, cyclic, multiply connected graph). The JPA annotations contain all the information needed to translate the entities and their relationships into the terms of SQL tables and columns; the translation is only slightly complicated by the possibility of relation tables (e.g. a relation table describing a @ManyToMany relationship) (the code is here).
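To illustrate the kind of information the annotations give you, here is a toy example (not the RHQ code) of pulling table and foreign key column names out of the JPA annotations with plain reflection:

import java.lang.reflect.Field;

import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;
import javax.persistence.Table;

// A toy illustration, not the RHQ exporter: the JPA annotations already carry the
// SQL-level names needed to build the entity dependency graph.
public class JpaToSqlSketch {

    /** The table an entity maps to, as declared by @Table (naively defaulting to the class name). */
    public static String tableOf(Class<?> entity) {
        Table table = entity.getAnnotation(Table.class);
        return (table != null && table.name().length() > 0)
            ? table.name() : entity.getSimpleName().toUpperCase();
    }

    /** For every many-to-one relationship, report "this FK column points at that table". */
    public static void printForeignKeys(Class<?> entity) {
        for (Field field : entity.getDeclaredFields()) {
            if (field.isAnnotationPresent(ManyToOne.class)
                && field.isAnnotationPresent(JoinColumn.class)) {
                JoinColumn joinColumn = field.getAnnotation(JoinColumn.class);
                System.out.println(tableOf(entity) + "." + joinColumn.name()
                    + " -> " + tableOf(field.getType()));
            }
        }
    }
}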

With the SQL mapping at hand, I could start linking the code I had with the functionality defined in dbUnit. I chose to implement it as an ITableFilter. By inheriting from DatabaseSequenceFilter I got the correct table order in the export for free, and by retaining the insertion order in the sets of allowed PKs while traversing the entity dependency graph, I was also able to retain the correct insertion order even in cases where a table has a foreign key to itself. My EntityRelationshipFilter can use the above mentioned configuration to restrict the traversal of the entity dependency graph and therefore restrict the resulting export (relying on an inclusion resolver to tell it what to do). You can take a look at the code here.

Conclusion

Relying on dbUnit to do the "low-level" data export and import for me, I could create a "Java developer friendly" data export tool in just a little bit more than a week’s time. The good thing is that it is completely generic, so it could easily be used in projects other than RHQ. Of course, more work would be required on the tool in that case, because the translation from JPA to SQL isn’t completely implemented. For example, it’s missing handling of the implicit values of the JPA annotations (e.g. the table name derived from the class name if the @Table annotation doesn’t explicitly specify a name), and I’m sure I missed some corner cases in handling the relationships as well. But it seems to work for RHQ at the moment, which means that it’s already quite capable, because our domain model isn’t a trivial one. If there was interest, I’d be more than happy to help create a standalone full-featured tool out of this and take it out of the RHQ source code. You can read even more about the tool on our wiki here.


In the next blog entry, I’ll take a look at the "import and test" part of the solution, namely the integration of the database setup and data import with TestNG.


Measuring UI Performance

RHQ is not built to be used by thousands of users at the same time. We rather add features to each page so that it contains the maximum information and context and the users can make the right decisions about their infrastructure. But even then we do care about a responsive and reasonably performing UI (and the system as a whole, of course). Recently I’ve been tasked with researching the performance of our UI layer.

Obviously there are a thousand factors influencing the responsiveness of the UI, but for the web app developer there are only a few s/he can do something about. Those, in a nutshell, are the CPU and memory usage of the web app itself and the efficiency of the communication with whatever back-end the web app is using to pull its data from (usually a DB, but in the case of RHQ also the agents). We did quite a bit of testing on the database and agent communication, but we lacked the performance data for the UI layer of RHQ. There obviously are some candidate pages that one might suspect of needing performance enhancements, but which ones to pick?

The first thing to decide was how to actually measure the performance of the system. One of the obvious metrics to use is the response time of the HTTP requests. This would be a good start because it’d give me a basic understanding of where the problems might lie; I’d have a starting point for my search for performance bottlenecks in the URLs that exhibit slow response times. On the other hand, the results could be skewed by environmental parameters out of my control, like network lag and the like. But since I had access to the server I wanted to test on, I could do better by measuring metrics on the server itself. On the server side I have a much broader choice of what and how I want to measure. If I wanted to, I could even insert “probes” into the server to collect stats that are otherwise unobtainable from outside the server’s JVM.

I needed to test several areas of the UI using one common testing “algorithm”: simulate a number of users logging in to the UI and visiting a number of pages from the “same area” (or rather the same page with different query parameters). This simple scenario would give me the least performing areas of the UI that I could then focus on. To summarize, here is what I was after:

  • don’t bother with response times on the client; I can get the same and more information on the server side
  • look for memory hogs
  • look for CPU intensive tasks
  • ideally, know more than just the URL at which a bottleneck might be

Measuring the memory can be done either by asking the ps command or by having the JVM itself provide a heap dump or summary. Measuring CPU is best done just with ps. The JVM can also provide a thread dump on demand. Neither the heap summary nor the CPU usage nor the thread dump can be collected from within the JVM in a simple way (if at all), so I couldn’t track each request directly using some kind of probe (i.e. adding a filter to the RHQ web application that would collect the data). All I could do was to track the request times, either by configuring the embedded Tomcat to produce an access log or, better, by configuring the RHQ server itself to collect response time information about itself (http://wiki.rhq-project.org/display/RHQ/Response+Time+Filter), and periodically collect the stats using an external script.
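For completeness, the response time filter idea boils down to a plain servlet filter that times every request. A minimal sketch of the idea (not RHQ’s actual filter) could look like this:

import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

// A minimal illustration of the response time filter idea, not RHQ's implementation:
// time every request so that an external script can harvest the numbers later.
public class ResponseTimeLoggingFilter implements Filter {

    public void init(FilterConfig config) {
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
        throws IOException, ServletException {
        long start = System.currentTimeMillis();
        try {
            chain.doFilter(request, response);
        } finally {
            long duration = System.currentTimeMillis() - start;
            String uri = ((HttpServletRequest) request).getRequestURI();
            // In a real setup this would go to a log file instead of stdout.
            System.out.println(uri + " took " + duration + " ms");
        }
    }

    public void destroy() {
    }
}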

Technologies Used

For generating the load on the server I used JMeter. The nice thing about this tool is that it is easily parametrized by data either on the command line or in CSV files. Check out the very simple JMeter script that I used to generate the load I needed on the server. The script for starting and stopping the RHQ server and JMeter and collecting the stats on memory and CPU usage was simply written in Bash. I used an R script to generate graphs out of the CSV files that the Bash script produces from the collected data. You can find all the source code at the end of this blog entry if you are interested in trying it out yourself.

Interpreting The Results

The script collects 3 kinds of statistics: the per-thread CPU usage, the JVM heap summary and the JVM thread dump. The scripts run N iterations, collect the data for each stat in each iteration and store it off in a file. After the script has finished collecting the data, it creates CSV files from the CPU usage and heap summary files for easier consumption of that data. Finally, if R is installed, the CSV files are converted into graphs (yet easier to digest). The JVM thread dump is collected so that one can get a basic idea about what each of the threads in the graph has been doing during the iterations (obviously this is not precise because of the time elapsed between the CPU usage and thread dump collections). Let’s take a look at an example graph of the CPU usage.

In there, you can see that one of the threads dominates the CPU usage in the later iterations. This obviously is a sign of a problem. Taking note of the thread id (in the legend of the graph) and comparing it with the "tid" of the threads in the thread dumps in various iterations reveals that it is the VM Thread doing garbage collection. Looking at the heap summary graph

one can easily observe that the application was consuming just too much memory and that the GC, even though it tried really hard, couldn’t handle the load. From that point on, finding the offending code was as easy as taking a full heap dump before and after the test (using the jmap tool that comes with the JDK) and finding out which classes contributed the most to the increased memory usage. Eclipse MAT is a great tool for such tasks and finding the code that caused this issue was a breeze.

Tests Layout

If you have read all the way down here, you are probably interested in how this is all put together and how the script obtains all that data. You can find the link to the complete source code at the end of this entry. The zip file you can download contains the bash scripts necessary to run the tests, along with an example “test suite” containing the JMeter test file, example input data for it and example tests.

  • testsuite-runner is the bash script that will start the testsuite in the background
  • testsuite-run examines the testsuite directory and spawns the individual tests inside it
  • test-run runs a single test (i.e. starts the RHQ server, starts JMeter with the test file, collects stats, stops JMeter, stops the RHQ server and produces the outputs)
  • example-testsuite contains the testsuite files
    • input is a folder containing the input data used in the tests. You will have to modify these files in order to make the tests work with your RHQ installation.
    • tests contains the individual test directories

An example invocation of the script would look like the following:

testsuite-runner path/to/rhq/server/install path/to/jmeter/install NUMBER_OF_STATS_COLLECTIONS path/to/the/testsuite/directory

This command would start the testsuite in the background. For each test in the testsuite, an RHQ server would be started, then a JMeter instance would be fired up with the test file for the given test, and the provided number of stats measurements would be taken at 10-second intervals. After that, JMeter and the RHQ server would be taken down and the next test in the testsuite would be started.

Future work

Obviously, these scripts are just a quick and dirty solution to my requirements, and much would have to be added to them for them to become truly automated and useful. For starters, the tests do not connect to the RHQ database, which makes them tied to a particular RHQ inventory (at a defined point in time), because the inputs of the tests hardcode resource ids. The first enhancement would therefore be to rewrite the scripts in a more capable (scripting) language and make them database agnostic.


The source files and an example testsuite can be downloaded from here.

System.println(“Hello world!”);

Well, the time has come for me to reinvent the wheel. This is going to be a blog about stuff that I find interesting as I make my way as an open-source developer.
