Chapter 7. Sequencing content

As we've mentioned before, JBoss DNA is able to work with existing JCR repositories. Your client applications make changes to the information in those repositories, and JBoss DNA automatically uses its sequencers to extract additional information from the uploaded files.

This chapter discusses the sequencing features of JBoss DNA and the components that are involved.

7.1. Sequencing Service

The JBoss DNA sequencing service is the component that manages the sequencers, reacting to changes in JCR repositories and then running the appropriate sequencers. This involves processing the changes on a node, determining which (if any) sequencers should be run on that node, and for each sequencer constructing the execution environment, calling the sequencer, and saving the information generated by the sequencer.

Note

Configuring JBoss DNA services is a bit more manual than is ideal. As you'll see, JBoss DNA uses dependency injection to allow a great deal of flexibility in how it can be configured and customized. But this flexibility makes it more difficult for you to use. We understand this, and will soon provide a much easier way to set up and manage JBoss DNA. Current plans are to use the JBoss Microcontainer along with a configuration repository.

To set up the sequencing service, an instance is created, and dependent components are injected into the object. These include, among other things:

An execution context that defines the context in which the service runs, including a factory for JCR sessions given names of the repository and workspace. This factory must be configured, and is how JBoss DNA knows about your JCR repositories and how to connect to them. More on this a bit later.
An optional factory for class loaders used to load sequencers. If no factory is supplied, the service uses the current thread's context class loader (or if that is null, the class loader that loaded the sequencing service class).
An ExecutorService used to execute the sequencing activities. If none is supplied, a new single-threaded executor is created by calling Executors.newSingleThreadExecutor(). (This can easily be changed by subclassing and overriding the SequencingService.createDefaultExecutorService() method.)
Filters for sequencers and events. By default, all sequencers are considered for "node added", "property added" and "property changed" events.
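The class loader fallback described in the list above can be illustrated with plain JDK calls. This is only a sketch of the lookup order, not the SequencingService code; the class name ClassLoaderFallbackSketch and the resolve method are hypothetical.

```java
class ClassLoaderFallbackSketch {

    /**
     * Picks a class loader in the order the text describes: the current
     * thread's context class loader, or else the loader of the owner class.
     */
    static ClassLoader resolve(Class<?> owner) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = owner.getClassLoader();
        }
        return loader;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        ClassLoader loader = resolve(ClassLoaderFallbackSketch.class);
        // Load a class by name through the chosen loader, much as a service
        // would do with a configured sequencer class name.
        Class<?> loaded = Class.forName("java.util.ArrayList", true, loader);
        System.out.println(loaded.getName());
    }
}
```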

As mentioned above, the ExecutionContext provides access to a SessionFactory that is used by JBoss DNA to establish sessions to your JCR repositories. Two implementations are available:

The JndiSessionFactory looks up JCR Repository instances in JNDI using names that are supplied when creating sessions. This implementation also has methods to set the JCR Credentials for a given workspace name.
The SimpleSessionFactory has methods to register the JCR Repository instances with names, as well as methods to set the JCR Credentials for a given workspace name.

You can use the BasicJcrExecutionContext implementation of JcrExecutionContext and supply a SessionFactory instance, or you can provide your own implementation.

Here's an example of how to instantiate and configure the SequencingService:

SimpleSessionFactory sessionFactory = new SimpleSessionFactory();
sessionFactory.registerRepository("Main Repository", this.repository);
Credentials credentials = new SimpleCredentials("jsmith", "secret".toCharArray());
sessionFactory.registerCredentials("Main Repository/Workspace1", credentials);
ExecutionContext executionContext = new BasicJcrExecutionContext(sessionFactory);

// Create the sequencing service, passing in the execution context ...
SequencingService sequencingService = new SequencingService();
sequencingService.setExecutionContext(executionContext);

After the sequencing service is created and configured, it must be started. The SequencingService has an administration object (an instance of ServiceAdministrator) with start(), pause(), and shutdown() methods. The shutdown() method closes the queue for sequencing, but allows sequencing operations that are already running to complete normally. To wait until all sequencing operations have completed, call the awaitTermination method and pass it the maximum amount of time you want to wait.

sequencingService.getAdministrator().start();

The JBoss DNA services use resources and threads that must be released before your application shuts down. The safe way to do this is to obtain the ServiceAdministrator for each service (via the getAdministrator() method) and call shutdown(). As previously mentioned, the shutdown method simply prevents new work from being processed and does not wait for existing work to be completed. If you want to wait until the service completes all its work, you must wait until the service terminates. Here's an example that shows how this is done:

// Shut down the service and wait until it's all shut down ...
sequencingService.getAdministrator().shutdown();
sequencingService.getAdministrator().awaitTermination(5, TimeUnit.SECONDS);

// Shut down the observation service ...
observationService.getAdministrator().shutdown();
observationService.getAdministrator().awaitTermination(5, TimeUnit.SECONDS);
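The difference between shutting down and waiting for termination can be tried out with the JDK's own ExecutorService, whose shutdown() and awaitTermination() calls behave in a comparable way. The sketch below uses only java.util.concurrent and does not touch the ServiceAdministrator; the class ShutdownSketch and its method are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ShutdownSketch {

    /** Submits some work, shuts down, and returns how many tasks completed. */
    static int runAndShutDown(int tasks) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            executor.execute(completed::incrementAndGet);
        }
        // Stop accepting new work; tasks submitted earlier still run.
        executor.shutdown();
        try {
            executor.execute(completed::incrementAndGet);
        } catch (RejectedExecutionException expected) {
            // New work is refused once shutdown() has been called.
        }
        try {
            // Block until the queued work finishes or the timeout elapses.
            if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
                executor.shutdownNow(); // give up on anything still running
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            executor.shutdownNow();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("Completed tasks: " + runAndShutDown(3));
    }
}
```

Calling shutdown() alone returns immediately, so an application that exits right afterwards may cut work short; pairing it with a bounded awaitTermination() avoids that without risking an indefinite hang.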

7.2. Sequencer Configurations

The sequencing service must also be configured with the sequencers that it will use. This is done using the addSequencer(SequencerConfig) method and passing a SequencerConfig instance that you create. Here's the code that defines three sequencer configurations: one that places image metadata into "/images/<filename>", another that places MP3 metadata into "/mp3s/<filename>", and a third that places a structure that represents the classes, methods, and attributes found within Java source into "/java/<filename>".

String name = "Image Sequencer";
String desc = "Sequences image files to extract the characteristics of the image";
String classname = "org.jboss.dna.sequencer.images.ImageMetadataSequencer";
String[] classpath = null; // Use the current classpath
String[] pathExpressions = {"//(*.(jpg|jpeg|gif|bmp|pcx|png)[*])/jcr:content[@jcr:data] => /images/$1"};
SequencerConfig imageSequencerConfig = new SequencerConfig(name, desc, classname, 
                                                           classpath, pathExpressions);
sequencingService.addSequencer(imageSequencerConfig);

name = "MP3 Sequencer";
desc = "Sequences MP3 files to extract the ID3 tags from the audio file";
classname = "org.jboss.dna.sequencer.mp3.Mp3MetadataSequencer";
pathExpressions = new String[] {"//(*.mp3[*])/jcr:content[@jcr:data] => /mp3s/$1"};
SequencerConfig mp3SequencerConfig = new SequencerConfig(name, desc, classname, 
                                                         classpath, pathExpressions);
sequencingService.addSequencer(mp3SequencerConfig);

name = "Java Sequencer";
desc = "Sequences java files to extract the characteristics of the Java source";
classname = "org.jboss.dna.sequencer.java.JavaMetadataSequencer";
pathExpressions = new String[] {"//(*.java[*])/jcr:content[@jcr:data] => /java/$1"};
SequencerConfig javaSequencerConfig = new SequencerConfig(name, desc, classname, 
                                                          classpath, pathExpressions);
sequencingService.addSequencer(javaSequencerConfig);

Each configuration defines several things, including the name, description, and sequencer implementation class. The configuration also defines the classpath information, which can be passed to the ClassLoaderFactory to get a Java ClassLoader with which the sequencer class can be loaded. (If no classpath information is provided, as is done in the code above, the application class loader is used.) The configuration also specifies the path expressions that identify the nodes that should be sequenced with the sequencer and where to store the output generated by the sequencer. Path expressions are pretty straightforward but are quite powerful, so before we go any further with the example, let's dive into path expressions in more detail.

7.2.1. Path Expressions

Path expressions consist of two parts: a selection criterion (or an input path) and an output path:

  inputPath => outputPath

The inputPath part defines an expression for the path of a node that is to be sequenced. Input paths consist of '/' separated segments, where each segment represents a pattern for a single node's name (including the same-name-sibling indexes) and '@' signifies a property name.

Let's first look at some simple examples:

Table 7.1. Simple Input Path Examples

Input Path	Description
/a/b	Match node "`b`" that is a child of the top level node "`a`". Neither node may have any same-name-siblings.
/a/*	Match any child node of the top level node "`a`".
/a/*.txt	Match any child node of the top level node "`a`" that also has a name ending in "`.txt`".
/a/b@c	Match the property "`c`" of node "`/a/b`".
/a/b[2]	The second child named "`b`" below the top level node "`a`".
/a/b[2,3,4]	The second, third or fourth child named "`b`" below the top level node "`a`".
/a/b[*]	Any (and every) child named "`b`" below the top level node "`a`".
//a/b	Any node named "`b`" that exists below a node named "`a`", regardless of where node "`a`" occurs. Again, neither node may have any same-name-siblings.

With these simple examples, you can probably discern the most important rules. First, the '*' is a wildcard character that matches any character or sequence of characters in a node's name (or index if appearing in between square brackets), and can be used in conjunction with other characters (e.g., "*.txt").

Second, square brackets (i.e., '[' and ']') are used to match a node's same-name-sibling index. You can put a single non-negative number or a comma-separated list of non-negative numbers. Use '0' to match a node that has no same-name-siblings, or any positive number to match the specific same-name-sibling.

Third, combining two delimiters (e.g., "//") matches any sequence of nodes, regardless of what their names are or how many nodes there are. This is often used with other patterns to identify nodes at any level that match those patterns. Three or more sequential slash characters are treated as two.
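To make the wildcard rule concrete, here is a minimal sketch that turns a single name pattern such as "*.txt" into a regular expression. It is not the JBoss DNA matcher: the class SegmentWildcardSketch is hypothetical, it handles only '*' within one path segment, and it assumes that '*' may also match an empty sequence of characters.

```java
import java.util.regex.Pattern;

class SegmentWildcardSketch {

    /** Converts a name pattern such as "*.txt" into an equivalent regex. */
    static Pattern toRegex(String namePattern) {
        StringBuilder regex = new StringBuilder();
        // Split on '*' and keep empty pieces so leading/trailing '*' survive.
        String[] literals = namePattern.split("\\*", -1);
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append("[^/]*"); // '*' never crosses a '/' boundary
            }
            regex.append(Pattern.quote(literals[i])); // match the rest literally
        }
        return Pattern.compile(regex.toString());
    }

    /** Returns true if the whole node name matches the name pattern. */
    static boolean matches(String namePattern, String nodeName) {
        return toRegex(namePattern).matcher(nodeName).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("*.txt", "notes.txt"));
        System.out.println(matches("*.txt", "notes.txt.bak"));
    }
}
```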

Many input paths can be created using just these simple rules. However, input paths can be more complicated. Here are some more examples:

Table 7.2. More Complex Input Path Examples

Input Path	Description
/a/(b|c|d)	Match children of the top level node "`a`" that are named "`b`", "`c`" or "`d`". None of the nodes may have same-name-sibling indexes.
/a/b[c/d]	Match node "`b`" child of the top level node "`a`", when node "`b`" has a child named "`c`", and "`c`" has a child named "`d`". Node "`b`" is the selected node, while nodes "`c`" and "`d`" are used as criteria but are not selected.
/a(/(b|c|d|)/e)[f/g/@something]	Match node "`/a/b/e`", "`/a/c/e`", "`/a/d/e`", or "`/a/e`" when they also have a child "`f`" that itself has a child "`g`" with property "`something`". None of the nodes may have same-name-sibling indexes.

These examples show a few more advanced rules. Parentheses (i.e., '(' and ')') can be used to define a set of options for names, as shown in the first and third examples. Whatever part of the selected node's path appears between the parentheses is captured for use within the output path. Thus, the first input path in the previous table would match node "/a/b", and "b" would be captured and could be used within the output path using "$1", where the number used in the output path identifies the parentheses.

Square brackets can also be used to specify criteria on a node's properties or children. Whatever appears in between the square brackets does not appear in the selected node.

Let's go back to the previous code fragment and look at the first path expression:

  //(*.(jpg|jpeg|gif|bmp|pcx|png)[*])/jcr:content[@jcr:data] => /images/$1

This matches a node named "jcr:content" with property "jcr:data" but no siblings with the same name, and that is a child of a node whose name ends with ".jpg", ".jpeg", ".gif", ".bmp", ".pcx", or ".png" that may have any same-name-sibling index. These nodes can appear at any level in the repository. Note how the input path captures the filename (the segment containing the file extension), including any same-name-sibling index. This filename is then used in the output path, which is where the sequenced content is placed.
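The capture-and-substitute behavior of this expression can be approximated with a hand-written regular expression. The sketch below is an illustration only and is not how JBoss DNA evaluates path expressions: the class ImagePathExpressionSketch and its regex are hypothetical, and the "[@jcr:data]" criterion is ignored because it cannot be checked from a path string alone.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImagePathExpressionSketch {

    // Hand-written stand-in for the input path
    //   //(*.(jpg|jpeg|gif|bmp|pcx|png)[*])/jcr:content
    // Group 1 is the file name with any index; group 2 is the extension,
    // mirroring how the parentheses are numbered in the path expression.
    private static final Pattern INPUT = Pattern.compile(
        "^(?:/[^/]+)*/([^/\\[\\]]*\\.(jpg|jpeg|gif|bmp|pcx|png)(?:\\[\\d+\\])?)/jcr:content$");

    /** Returns the output path for a matching node path, or null if no match. */
    static String outputPathFor(String nodePath) {
        Matcher matcher = INPUT.matcher(nodePath);
        // "$1" in the output path becomes the text captured by group 1.
        return matcher.matches() ? "/images/" + matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(outputPathFor("/docs/photo.jpg/jcr:content"));
        System.out.println(outputPathFor("/docs/readme.txt/jcr:content"));
    }
}
```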

7.3. JBoss DNA Sequencers

JBoss DNA includes a number of sequencers "out of the box". These sequencers can be used within your application to sequence a variety of common file formats. To use them, the only thing you have to do is define the appropriate sequencer configurations and include the appropriate JAR files.

7.3.1. Image sequencer

A sequencer that extracts metadata from JPEG, GIF, BMP, PCX, PNG, IFF, RAS, PBM, PGM, PPM and PSD image files. This sequencer extracts the file format, image resolution, number of bits per pixel and optionally number of images, comments and physical resolution, and then writes this information into the repository using the following structure:

image:metadata node of type image:metadata

This structure could be extended in the future to add EXIF and IPTC metadata as child nodes. For example, EXIF metadata is structured as tags in directories, where the directories form something like namespaces, and which are used by different camera vendors to store custom metadata. This structure could be mapped with each directory (e.g. "EXIF" or "Nikon Makernote" or "IPTC") as the name of a child node, with the EXIF tag values stored as either properties or child nodes.

To use this sequencer, simply include the dna-sequencer-images JAR in your application and configure the Sequencing Service to use this sequencer using something similar to:

String name = "Image Sequencer";
String desc = "Sequences image files to extract the characteristics of the image";
String classname = "org.jboss.dna.sequencer.images.ImageMetadataSequencer";
String[] classpath = null; // Use the current classpath
String[] pathExpressions = {"//(*.(jpg|jpeg|gif|bmp|pcx|png|iff|ras|pbm|pgm|ppm|psd)[*])/jcr:content[@jcr:data] => /images/$1"};
SequencerConfig sequencerConfig = new SequencerConfig(name, desc, classname, 
                                                      classpath, pathExpressions);
sequencingService.addSequencer(sequencerConfig);

7.3.2. Microsoft Office document sequencer

This sequencer is included in JBoss DNA and processes Microsoft Office documents, including Excel spreadsheets and PowerPoint presentations. With presentations, the sequencer extracts the slides, titles, text and slide thumbnails. With spreadsheets, the sequencer extracts the names of the sheets. For all files, the sequencer also extracts the general file information, including the name of the author, title, keywords, subject, comments, and various dates.

Note

Currently, Word documents are not supported. For more information and the latest status, see DNA-153.

To use this sequencer, simply include the dna-sequencer-msoffice JAR and all of the POI JARs in your application and configure the Sequencing Service to use this sequencer using something similar to:

String name = "Microsoft Office Document Sequencer";
String desc = "Sequences MS Office documents, including spreadsheets and presentations";
String classname = "org.jboss.dna.sequencer.msoffice.MSOfficeMetadataSequencer";
String[] classpath = null; // Use the current classpath
String[] pathExpressions = {"//(*.(doc|docx|ppt|pps|xls)[*])/jcr:content[@jcr:data] => /msoffice/$1"};
SequencerConfig sequencerConfig = new SequencerConfig(name, desc, classname, 
                                                      classpath, pathExpressions);
sequencingService.addSequencer(sequencerConfig);

7.3.3. ZIP archive sequencer

The ZIP file sequencer is included in JBoss DNA. It extracts the files and folders contained in a ZIP archive and writes them into the repository using JCR's nt:file and nt:folder node types.

To use this sequencer, simply include the dna-sequencer-zip JAR in your application and configure the Sequencing Service to use this sequencer using something similar to:

String name = "ZIP Sequencer";
String desc = "Sequences ZIP archives to extract the files and folders";
String classname = "org.jboss.dna.sequencer.zip.ZipSequencer";
String[] classpath = null; // Use the current classpath
String[] pathExpressions = {"//(*.zip[*])/jcr:content[@jcr:data] => /zips/$1"};
SequencerConfig sequencerConfig = new SequencerConfig(name, desc, classname, 
                                                      classpath, pathExpressions);
this.sequencingService.addSequencer(sequencerConfig);

7.3.4. Java source sequencer

One of the sequencers included in JBoss DNA is the dna-sequencer-java subproject. This sequencer parses Java source code added to the repository and extracts the basic structure of the classes and enumerations defined in the code. This structure includes: the package structures, class declarations, class and member attribute declarations, class and member method declarations with signature (but not implementation logic), enumerations with each enumeration literal value, annotations, and JavaDoc information for all of the above. After extracting this information from the source code, the sequencer then writes this structure into the repository, where it can be further processed, analyzed, searched, navigated, or referenced.

To use this sequencer, simply include the dna-sequencer-java JAR (plus all of the JARs that it is dependent upon) in your application and configure the Sequencing Service to use this sequencer using something similar to:

String name = "Java Sequencer";
String desc = "Sequences java files to extract the characteristics of the Java source";
String classname = "org.jboss.dna.sequencer.java.JavaMetadataSequencer";
String[] classpath = null; // Use the current classpath
String[] pathExpressions = {"//(*.java[*])/jcr:content[@jcr:data] => /java/$1"};
SequencerConfig sequencerConfig = new SequencerConfig(name, desc, classname, 
                                                      classpath, pathExpressions);
this.sequencingService.addSequencer(sequencerConfig);

7.3.5. MP3 audio file sequencer

Another sequencer that is included in JBoss DNA is the dna-sequencer-mp3 sequencer project. This sequencer processes MP3 audio files added to a repository and extracts the ID3 metadata for the file, including the track's title, author, album name, year, and comment. After extracting this information from the audio files, the sequencer then writes this structure into the repository, where it can be further processed, analyzed, searched, navigated, or referenced.

To use this sequencer, simply include the dna-sequencer-mp3 JAR and the JAudioTagger library in your application and configure the Sequencing Service to use this sequencer using something similar to:

String name = "MP3 Sequencer";
String desc = "Sequences MP3 files to extract the ID3 tags of the audio file";
String classname = "org.jboss.dna.sequencer.mp3.Mp3MetadataSequencer";
String[] classpath = null; // Use the current classpath
String[] pathExpressions = {"//(*.mp3[*])/jcr:content[@jcr:data] => /mp3s/$1"};
SequencerConfig sequencerConfig = new SequencerConfig(name, desc, classname, 
                                                      classpath, pathExpressions);
this.sequencingService.addSequencer(sequencerConfig);

7.3.6. JCR Compact Node Definition (CND) file sequencer

This sequencer is incomplete and is not currently usable. The purpose is to sequence JCR Compact Node Definition (CND) files to extract the node definitions with their property definitions, and to insert these into the repository using JCR standard notation.

7.4. Creating custom sequencers

The current release of JBoss DNA comes with six sequencers. However, it's very easy to create your own sequencers and to then configure JBoss DNA to use them in your own application.

Creating a custom sequencer involves the following steps:

Create a Maven 2 project for your sequencer;
Implement the StreamSequencer interface with your own implementation, and create unit tests to verify the functionality and expected behavior;
Add the sequencer configuration to the JBoss DNA SequencingService in your application as described earlier in this chapter; and
Deploy the JAR file with your implementation (as well as any dependencies), and make them available to JBoss DNA in your application.

It's that simple.

7.4.1. Creating the Maven 2 project

The first step is to create the Maven 2 project that you can use to compile your code and build the JARs. Maven 2 automates a lot of the work, and since you're already set up to use Maven, using Maven for your project will save you a lot of time and effort. Of course, you don't have to use Maven 2, but then you'll have to get the required libraries and manage the compiling and building process yourself.

Note

JBoss DNA may provide in the future a Maven archetype for creating sequencer projects. If you'd find this useful and would like to help create it, please join the community.

In lieu of a Maven archetype, you may find it easier to start with a small existing sequencer project. The dna-sequencer-images project is a small, self-contained sequencer implementation that has only the minimal dependencies. See the subversion repository: http://anonsvn.jboss.org/repos/dna/trunk/sequencers/dna-sequencer-images/

You can create your Maven project any way you'd like. For examples, see the Maven 2 documentation. Once you've done that, just add the dependencies in your project's pom.xml dependencies section:




<dependency>
  <groupId>org.jboss.dna</groupId>
  <artifactId>dna-common</artifactId>
  <version>0.1</version>
</dependency>
<dependency>
  <groupId>org.jboss.dna</groupId>
  <artifactId>dna-graph</artifactId>
  <version>0.1</version>
</dependency>
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
</dependency>

These are the minimum dependencies required for compiling a sequencer. Of course, you'll have to add other dependencies that your sequencer needs.

As for testing, you probably will want to add more dependencies, such as those listed here:




<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.4</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.hamcrest</groupId>
  <artifactId>hamcrest-library</artifactId>
  <version>1.1</version>
  <scope>test</scope>
</dependency>
<!-- Logging with Log4J -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <version>1.4.3</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>log4j</groupId>
  <artifactId>log4j</artifactId>
  <version>1.2.14</version>
  <scope>test</scope>
</dependency>

Testing JBoss DNA sequencers does not require a JCR repository or the JBoss DNA services. (For more detail, see the testing section.) However, if you want to do integration testing with a JCR repository and the JBoss DNA services, you'll need additional dependencies for these libraries.




<dependency>
  <groupId>org.jboss.dna</groupId>
  <artifactId>dna-repository</artifactId>
  <version>0.1</version>
  <scope>test</scope>
</dependency>
<!-- Java Content Repository API -->
<dependency>
  <groupId>javax.jcr</groupId>
  <artifactId>jcr</artifactId>
  <version>1.0.1</version>
  <scope>test</scope>
</dependency>
<!-- Apache Jackrabbit (JCR Implementation) -->
<dependency>
  <groupId>org.apache.jackrabbit</groupId>
  <artifactId>jackrabbit-api</artifactId>
  <version>1.3.3</version>
  <scope>test</scope>
  <!-- Exclude these since they are included in JDK 1.5 -->
  <exclusions>
    <exclusion>
      <groupId>xml-apis</groupId>
      <artifactId>xml-apis</artifactId>
    </exclusion>
    <exclusion>
      <groupId>xerces</groupId>
      <artifactId>xercesImpl</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.jackrabbit</groupId>
  <artifactId>jackrabbit-core</artifactId>
  <version>1.3.3</version>
  <scope>test</scope>
  <!-- Exclude these since they are included in JDK 1.5 -->
  <exclusions>
    <exclusion>
      <groupId>xml-apis</groupId>
      <artifactId>xml-apis</artifactId>
    </exclusion>
    <exclusion>
      <groupId>xerces</groupId>
      <artifactId>xercesImpl</artifactId>
    </exclusion>
  </exclusions>
</dependency>

At this point, your project should be set up correctly, and you're ready to move on to writing the Java implementation for your sequencer.

7.4.2. Implementing the StreamSequencer interface

After creating the project and setting up the dependencies, the next step is to create a Java class that implements the StreamSequencer interface. This interface is very straightforward and involves a single method:

public interface StreamSequencer {

    /**
     * Sequence the data found in the supplied stream, placing the output 
     * information into the supplied map.
     *
     * @param stream the stream with the data to be sequenced; never null
     * @param output the output from the sequencing operation; never null
     * @param progressMonitor the progress monitor that should be kept 
     *   updated with the sequencer's progress and that should be
     *   frequently consulted as to whether this operation has been cancelled.
     */
    void sequence( InputStream stream, SequencerOutput output, ProgressMonitor progressMonitor );
}

The job of a stream sequencer is to process the data in the supplied stream, and place into the SequencerOutput any information that is to go into the JCR repository. JBoss DNA figures out when your sequencer should be called (of course, using the sequencing configuration you'll add in a bit), and then makes sure the generated information is saved in the correct place in the repository.

The SequencerOutput class is fairly easy to use. There are basically two methods you need to call. One method sets the property values, while the other sets references to other nodes in the repository. Use these methods to describe the properties of the nodes you want to create, using relative paths for the nodes and valid JCR property names for properties and references. JBoss DNA will ensure that nodes are created or updated whenever they're needed.

public interface SequencerOutput {

  /**
   * Set the supplied property on the supplied node.  The allowable
   * values are any of the following:
   *   - primitives (which will be autoboxed)
   *   - String instances
   *   - String arrays
   *   - byte arrays
   *   - InputStream instances
   *   - Calendar instances
   *
   * @param nodePath the path to the node containing the property; 
   * may not be null
   * @param property the name of the property to be set
   * @param values the value(s) for the property; may be empty if 
   * any existing property is to be removed
   */
  void setProperty( String nodePath, String property, Object... values );

  /**
   * Set the supplied reference on the supplied node.
   *
   * @param nodePath the path to the node containing the property; 
   * may not be null
   * @param property the name of the property to be set
   * @param paths the paths to the referenced property, which may be
   * absolute paths or relative to the sequencer output node;
   * may be empty if any existing property is to be removed
   */
  void setReference( String nodePath, String property, String... paths );
}

JBoss DNA will create nodes of type nt:unstructured unless you specify the value for the jcr:primaryType property. You can also specify the values for the jcr:mixinTypes property if you want to add mixins to any node.
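As a rough illustration of describing a node through setProperty calls with relative paths, the sketch below records the calls in a map so they can be inspected. To keep it compilable on its own, it declares a local Output interface that copies only the setProperty signature shown above; that interface, the RecordingOutput class, and the "text:metadata" and "text:lineCount" names are all hypothetical and are not part of JBoss DNA.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SequencerOutputSketch {

    /** Local stand-in that copies the setProperty signature shown above. */
    interface Output {
        void setProperty(String nodePath, String property, Object... values);
    }

    /** Records every property so the result can be inspected afterwards. */
    static final class RecordingOutput implements Output {
        final Map<String, List<Object>> properties = new LinkedHashMap<>();

        @Override
        public void setProperty(String nodePath, String property, Object... values) {
            // Key each entry by "<relative node path>@<property name>".
            properties.put(nodePath + "@" + property, Arrays.asList(values));
        }
    }

    /** Describes one node using a relative path, as a sequencer would. */
    static void describeText(Output output, String text) {
        String node = "text:metadata"; // hypothetical node name
        // Setting jcr:primaryType explicitly; the text says nt:unstructured
        // is used when this property is not supplied.
        output.setProperty(node, "jcr:primaryType", "nt:unstructured");
        // The int primitive is autoboxed into the varargs array.
        output.setProperty(node, "text:lineCount", text.split("\n", -1).length);
    }

    public static void main(String[] args) {
        RecordingOutput output = new RecordingOutput();
        describeText(output, "first line\nsecond line");
        System.out.println(output.properties);
    }
}
```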

For a complete example of a sequencer, let's look at the ImageMetadataSequencer implementation:

public class ImageMetadataSequencer implements StreamSequencer {

    public static final String METADATA_NODE = "image:metadata";
    public static final String IMAGE_PRIMARY_TYPE = "jcr:primaryType";
    public static final String IMAGE_MIXINS = "jcr:mixinTypes";
    public static final String IMAGE_MIME_TYPE = "jcr:mimeType";
    public static final String IMAGE_ENCODING = "jcr:encoding";
    public static final String IMAGE_FORMAT_NAME = "image:formatName";
    public static final String IMAGE_WIDTH = "image:width";
    public static final String IMAGE_HEIGHT = "image:height";
    public static final String IMAGE_BITS_PER_PIXEL = "image:bitsPerPixel";
    public static final String IMAGE_PROGRESSIVE = "image:progressive";
    public static final String IMAGE_NUMBER_OF_IMAGES = "image:numberOfImages";
    public static final String IMAGE_PHYSICAL_WIDTH_DPI = "image:physicalWidthDpi";
    public static final String IMAGE_PHYSICAL_HEIGHT_DPI = "image:physicalHeightDpi";
    public static final String IMAGE_PHYSICAL_WIDTH_INCHES = "image:physicalWidthInches";
    public static final String IMAGE_PHYSICAL_HEIGHT_INCHES = "image:physicalHeightInches";

    /**
     * {@inheritDoc}
     */
    public void sequence( InputStream stream, SequencerOutput output, 
                          ProgressMonitor progressMonitor ) {
        progressMonitor.beginTask(10, ImageSequencerI18n.sequencerTaskName);

        ImageMetadata metadata = new ImageMetadata();
        metadata.setInput(stream);
        metadata.setDetermineImageNumber(true);
        metadata.setCollectComments(true);

        // Process the image stream and extract the metadata ...
        if (!metadata.check()) {
            metadata = null;
        }
        progressMonitor.worked(5);
        if (progressMonitor.isCancelled()) return;

        // Generate the output graph if we found useful metadata ...
        if (metadata != null) {
            // Place the image metadata into the output map ...
            output.setProperty(METADATA_NODE, IMAGE_PRIMARY_TYPE, "image:metadata");
            // output.setProperty(METADATA_NODE, IMAGE_MIXINS, "");
            output.setProperty(METADATA_NODE, IMAGE_MIME_TYPE, metadata.getMimeType());
            // output.setProperty(METADATA_NODE, IMAGE_ENCODING, "");
            output.setProperty(METADATA_NODE, IMAGE_FORMAT_NAME, metadata.getFormatName());
            output.setProperty(METADATA_NODE, IMAGE_WIDTH, metadata.getWidth());
            output.setProperty(METADATA_NODE, IMAGE_HEIGHT, metadata.getHeight());
            output.setProperty(METADATA_NODE, IMAGE_BITS_PER_PIXEL, metadata.getBitsPerPixel());
            output.setProperty(METADATA_NODE, IMAGE_PROGRESSIVE, metadata.isProgressive());
            output.setProperty(METADATA_NODE, IMAGE_NUMBER_OF_IMAGES, metadata.getNumberOfImages());
            output.setProperty(METADATA_NODE, IMAGE_PHYSICAL_WIDTH_DPI, metadata.getPhysicalWidthDpi());
            output.setProperty(METADATA_NODE, IMAGE_PHYSICAL_HEIGHT_DPI, metadata.getPhysicalHeightDpi());
            output.setProperty(METADATA_NODE, IMAGE_PHYSICAL_WIDTH_INCHES, metadata.getPhysicalWidthInch());
            output.setProperty(METADATA_NODE, IMAGE_PHYSICAL_HEIGHT_INCHES, metadata.getPhysicalHeightInch());
        }

        progressMonitor.done();
    }
}

Notice how the image metadata is extracted and the output graph is generated. A single node is created with the name image:metadata and with the image:metadata node type. No mixins are defined for the node, but several properties are set on the node using the values obtained from the image metadata. After this method returns, the constructed graph will be saved to the repository in all of the places defined by its configuration. (This is why only relative paths are used in the sequencer.)

Also note how the progress monitor is used. Reporting progress through the supplied ProgressMonitor is very easy, and it ensures that JBoss DNA can accurately monitor and report the status of sequencing activities to the users. At the beginning of the operation, call beginTask(...) with a meaningful message describing the operation and a total for the amount of work that will be done by this sequencer. Then perform the sequencing work, periodically reporting work by specifying the incremental amount of work with the worked(double) method, or by creating a subtask with the createSubtask(double) method and reporting work against that subtask monitor.

Your method should periodically call the ProgressMonitor's isCancelled() method to check whether the operation has been cancelled. If this method returns true, the implementation should abort all work as soon as possible and close any resources that were acquired or opened.

Finally, when your sequencing operation is completed, it should call done() on the progress monitor.
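
The calling pattern described above can be sketched as follows. This is a minimal, self-contained illustration rather than JBoss DNA code: the Monitor interface and the RecordingMonitor class are hypothetical stand-ins that only mirror the method names mentioned in the text (beginTask, worked, isCancelled, done), and the real ProgressMonitor signatures may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProgressPatternSketch {

    /** Hypothetical stand-in for a progress monitor; NOT the JBoss DNA interface. */
    interface Monitor {
        void beginTask(double totalWork, String message);
        void worked(double amount);
        boolean isCancelled();
        void done();
    }

    /** Hypothetical monitor that records calls and reports cancellation after a given amount of work. */
    static class RecordingMonitor implements Monitor {
        final List<String> calls = new ArrayList<>();
        private final double cancelAfter;
        private double total;

        RecordingMonitor(double cancelAfter) { this.cancelAfter = cancelAfter; }

        public void beginTask(double totalWork, String message) { calls.add("beginTask"); }
        public void worked(double amount) { total += amount; calls.add("worked"); }
        public boolean isCancelled() { return total >= cancelAfter; }
        public void done() { calls.add("done"); }
    }

    /** Processes items one by one and returns how many were handled before completion or cancellation. */
    static int process(List<String> items, Monitor monitor) {
        // Announce the task and the total amount of work up front.
        monitor.beginTask(items.size(), "Processing items");
        int processed = 0;
        try {
            for (String item : items) {
                // Check for cancellation before starting each unit of work.
                if (monitor.isCancelled()) break;
                processed++;          // the real work on 'item' would happen here
                monitor.worked(1);    // report one unit of work as finished
            }
        } finally {
            // Always signal completion, even if the work was cancelled or failed.
            monitor.done();
        }
        return processed;
    }

    public static void main(String[] args) {
        RecordingMonitor monitor = new RecordingMonitor(2);
        int processed = process(Arrays.asList("a", "b", "c"), monitor);
        System.out.println(processed + " " + monitor.calls);
    }
}
```

Placing done() in a finally block is a choice made for this sketch so that completion is reported even when the loop exits early; the sequencer shown earlier simply calls done() as its last statement.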

7.4.3. Testing custom sequencers

The sequencing framework was designed to make testing sequencers much easier. In particular, the StreamSequencer interface does not make use of the JCR API. So instead of requiring a fully-configured JCR repository and JBoss DNA system, unit tests for a sequencer can focus on testing that the content is processed correctly and the desired output graph is generated.

Note

For a complete example of a sequencer unit test, see the ImageMetadataSequencerTest unit test in the org.jboss.dna.sequencer.images package of the dna-sequencers-image project.

The following code fragment shows one way of testing a sequencer, using JUnit 4.4 assertions and some of the classes made available by JBoss DNA. Of course, this example code does not do any error handling and does not make all the assertions a real test would.

StreamSequencer sequencer = new ImageMetadataSequencer();
MockSequencerOutput output = new MockSequencerOutput();
ProgressMonitor progress = new SimpleProgressMonitor("Test activity");
InputStream stream = null;
try {
    stream = this.getClass().getClassLoader().getResource("caution.gif").openStream();
    sequencer.sequence(stream, output, progress);   // writes to 'output'
    assertThat(output.getPropertyValues("image:metadata", "jcr:primaryType"),
               is(new Object[] {"image:metadata"}));
    assertThat(output.getPropertyValues("image:metadata", "jcr:mimeType"),
               is(new Object[] {"image/gif"}));
    // ... make more assertions here
    assertThat(output.hasReferences(), is(false));
} finally {
    if (stream != null) stream.close();   // stream is still null if the resource could not be opened
}

It's also useful to test that a sequencer produces no output for something it should not understand:

StreamSequencer sequencer = new ImageMetadataSequencer();
MockSequencerOutput output = new MockSequencerOutput();
ProgressMonitor progress = new SimpleProgressMonitor("Test activity");
InputStream stream = null;
try {
    stream = this.getClass().getClassLoader().getResource("caution.pict").openStream();
    sequencer.sequence(stream, output, progress);   // should write nothing to 'output'
    assertThat(output.hasProperties(), is(false));
    assertThat(output.hasReferences(), is(false));
} finally {
    if (stream != null) stream.close();   // stream is still null if the resource could not be opened
}

These are just two simple tests that show ways of testing a sequencer. Some tests may get quite involved, especially if a lot of output data is produced.

It may also be useful to create some integration tests that configure JBoss DNA to use a custom sequencer, and to then upload content using the JCR API, verifying that the custom sequencer did run. However, remember that JBoss DNA runs sequencers asynchronously in the background, and you must synchronize your tests to ensure that the sequencers have a chance to run before checking the results. One way of doing this (although, granted, not always reliable) is to wait for a second after uploading your content, shut down the SequencingService and await its termination, and then check that the sequencer output has been saved to the JCR repository. For an example of this technique, see the SequencingClientTest unit test in the example application.
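
The "shut down and await termination" idea can be illustrated with the standard java.util.concurrent classes. The sketch below is an assumption-laden stand-in, not the SequencingService API: a plain single-threaded ExecutorService plays the role of the background sequencing work, and the class and method names (AwaitBackgroundWorkSketch, runAndAwait) are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AwaitBackgroundWorkSketch {

    /**
     * Submits the given number of background tasks, then shuts the executor down and waits
     * for it to terminate. Returns the number of completed tasks, or -1 if the wait timed
     * out or was interrupted.
     */
    static int runAndAwait(int tasks, long timeoutSeconds) {
        AtomicInteger completed = new AtomicInteger();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        for (int i = 0; i < tasks; i++) {
            // Each task stands in for one asynchronous sequencing activity.
            executor.execute(completed::incrementAndGet);
        }
        // Stop accepting new work; tasks that were already submitted still run.
        executor.shutdown();
        try {
            // Block until all submitted tasks have finished or the timeout elapses.
            if (!executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                executor.shutdownNow();
                return -1;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt status
            executor.shutdownNow();
            return -1;
        }
        // Only now is it safe to inspect the results of the background work.
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed tasks: " + runAndAwait(5, 10));
    }
}
```

In a test, the assertions on the repository content would go where the sketch reads the counter, after termination has been confirmed. Waiting on termination with a timeout avoids guessing how long a fixed sleep needs to be.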

7.4.4. Deploying custom sequencers

The first step in deploying a sequencer is to add or change the sequencer configuration (e.g., SequencerConfig) in the SequencingService. This was covered in the previous chapter.

The second step is to make the sequencer implementation available to JBoss DNA. At this time, the JAR containing your new sequencer, as well as any JARs that your sequencer depends on, should be placed on your application classpath.
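
A quick way to confirm that a sequencer class is actually visible on the classpath is to try loading it by name. The following sketch uses only standard Java APIs; the class ClasspathCheckSketch, the method isLoadable, and the class name com.example.MissingSequencer are hypothetical and exist only for illustration. It looks the class up through the current thread's context class loader, falling back to the loader of the sketch class when no context loader is set.

```java
public class ClasspathCheckSketch {

    /** Returns true if the named class can be found by the class loader in effect for this thread. */
    static boolean isLoadable(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the loader that loaded this class.
            loader = ClasspathCheckSketch.class.getClassLoader();
        }
        try {
            // 'false' means the class is located and loaded but not initialized.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class itself or one of its dependencies is missing or incompatible.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isLoadable("java.lang.String"));
        System.out.println(isLoadable("com.example.MissingSequencer"));   // hypothetical name
    }
}
```

A check like this only shows that the class can be found; it does not verify that the sequencer is configured or that it runs.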

Note

A future goal of JBoss DNA is to allow sequencers, connectors, and other extensions to be easily deployed into a runtime repository. This process will not only be much simpler, but it will also provide JBoss DNA with the information necessary to update configurations and create the appropriate class loaders for each extension. Having separate class loaders for each extension helps prevent the pollution of the common classpath, facilitates an isolated runtime environment to eliminate any dependency conflicts, and may potentially enable hot redeployment of newer extension versions.