2.3. Sequencing content

The current JBoss DNA release contains a sequencing framework that is designed to sequence data (typically files) stored in a JCR repository to automatically extract meaningful and useful information. This additional information is then saved back into the repository, where it can be accessed and used.

In other words, you can just upload various kinds of files into a JCR repository, and DNA automatically processes those files to extract meaningful structured information. For example, load DDL files into the repository, and let sequencers extract the structure and metadata for the database schema. Load Hibernate configuration files into the repository, and let sequencers extract the schema and mapping information. Load Java source into the repository, and let sequencers extract the class structure, JavaDoc, and annotations. Load a PNG, JPEG, or other image into the repository, and let sequencers extract the metadata from the image and save it in the repository. The same with XSDs, WSDL, WS policies, UML, MetaMatrix models, etc.

JBoss DNA sequencers sit on top of existing JCR repositories (including federated repositories) - they basically extract more useful information from what's already stored in the repository. And they use the existing JCR versioning system. Each sequencer typically processes a single kind of file format or a single kind of content.

The following sequencers are included in JBoss DNA:

Image sequencer - A sequencer that processes the binary content of an image file, extracts the metadata for the image, and then writes that image metadata to the repository. It gets the file format, image resolution, number of bits per pixel (and optionally number of images), comments and physical resolution from JPEG, GIF, BMP, PCX, PNG, IFF, RAS, PBM, PGM, PPM, and PSD files. (This sequencer may be improved in the future to also extract EXIF metadata from JPEG files; see DNA-26 .)
MP3 sequencer - A sequencer that processes the contents of an MP3 audio file, extracts the metadata for the file, and then writes that image metadata to the repository. It gets the title, author, album, year, and comment. (This sequencer may be improved in the future to also extract other ID3 metadata from other audio file formats; see DNA-26 .)

As the community develops additional sequencers, they will also be included in JBoss DNA. Some of those that have been identified as being useful include:

XML Schema Document (XSD) Sequencer - Process XSD files and extract the various elements, attributes, complex types, simple types, groups, and other information. (See DNA-32 )
Web Service Definition Language (WSDL) Sequencer - Process WSDL files and extract the services, bindings, ports, operations, parameters, and other information. (See DNA-33 )
Hibernate File Sequencer - Process Hibernate configuration (cfg.xml) and mapping (hbm.xml) files to extract the configuration and mapping information. (See DNA-61 )
XML Metadata Interchange (XMI) Sequencer - Process XMI documents that contain UML models or models using another metamodel, extracting the model structure into the repository. (See DNA-31 )
ZIP Archive Sequencer - Process ZIP archive files to extract (explode) the contents into the repository. (See DNA-63 )
Java Archive (JAR) Sequencer - Process JAR files to extract (explode) the contents into the classes and file resources. (See DNA-64 )
Java Class File Sequencer - Process Java class files (bytecode) to extract the class structure (including annotations) into the repository. (See DNA-62 )
Java Source File Sequencer - Process Java source files to extract the class structure (including annotations) into the repository. (See DNA-51 )
PDF Sequencer - Process PDF files to extract the document metadata, including table of contents. (See DNA-50 )
Maven 2 POM Sequencer - Process Maven 2 Project Object Model (POM) files to extract the project information, dependencies, plugins, and other content. (See DNA-24 )
Data Definition Language (DDL) Sequencer - Process various dialects of DDL, including that from Oracle, SQL Server, MySQL, PostgreSQL, and others. May need to be split up into a different sequencer for each dialect. (See DNA-26 )
MP3 and MP4 Sequencer - Process MP3 and MP4 audio files to extract the name of the song, artist, album, track number, and other metadata. (See DNA-30 )

The examples in this book go into more detail about how sequencers are managed and used, and Chapter 5 goes into detail about how to write custom sequencers.