About the Chemical Resource Kit (v2)
The Chemical Resource Kit is a collection of several packages
which, in combination, aspire to offer a graphical interface for archiving,
entering and presenting chemical information, as well as useful
tools for data modelling; a modern ab initio computational module for
calculating optimised chemical structures and properties; a scheduling
methodology for setting up a "compute farm"; a dual file-based and 3-tier
data storage configuration enabling seamless network centralisation; and
a thin-client, browser-based access pathway.
All packages are developed in a Linux/i386 environment, and are released as pure copylefted GNUware, which means, in summary, that anybody may take the source code and do whatever they wish with it, with no liability and one limitation: all derivatives must remain similarly open and free.
History, and Objectives
Earlier incarnations of the Chemical Resource Kit had a number of goals, among them ways to articulate chemical information in a structured format, and to provide presentation methods for displaying this data. Features were then, as now, implemented on a priority basis as required for the author's day-to-day tasks. Display of 3D chemical structures was developed for use with X-ray crystallography data, with a particular emphasis on preparing spectacular-quality output for various presentations; 2D structures were implemented for formatting the author's list of synthesised molecules for his PhD thesis; spectroscopy data handling and manipulation were implemented as an alternative to trying to use inflexible off-the-shelf software for data from a variety of instruments; molecular modelling and computation features were added in response to immediate research needs. You get the idea.
Version 2 of these packages is a clean-room rewrite. There are several reasons for this, mainly revolving around correcting architectural inadequacies exposed by changing requirements over time. The new version uses XML for all of its file formats and communication protocols, which is itself a significant overhaul; the format of these files is also significantly less regimented and provides for far greater extensibility. Perhaps more importantly, the new set of packages is no longer file-centric, and conducts all of its data storage and retrieval functions through a network protocol. This means that files stored in the user's home directory are transparently equivalent to files stored in a database and accessed via a web service. More on this later.
As always, because this project consists entirely of the spare time of a programmer chemist who has an otherwise relatively fulfilling life, development consists mostly of bottlenecks, and although the scope of the project is broad, the features most relevant to the author's immediate interests get written first.
The author is currently interested in building a large library of computed structures, so features relating to this are under active development, whilst others languish for the moment. The 3D structure editing, molecular modelling, ab initio computation, analysis of computed data, and efficient database utilisation are the priority features. Since the author is using these himself on a regular basis, they are being refined with considerable care. Other features which will be filled in later include: a 2D diagram editor, capable of importing and exporting common formats (such as the nefarious yet popular MOL-files), which is already moderately usable; spectrum viewing and manipulation; management of common textual chemical properties; and reimplementation of many fancy presentation features that were available in Version 1, including being able to print anything to PostScript or vector metafiles, and to make impressive POV-Ray raytraced pictures. Also, a functional thin-client browser-based access route for data will likely be expanded upon. Consult the roadmap for more up-to-date specifics.
Package Overview
Xykron
Also known affectionately as the "Fat Client", Xykron is a large, monolithic, graphical application for the X Window System, written in C++ using the Qt toolkit (the same toolkit used to construct KDE, although this is not itself a "K" application). Xykron is designed to be as easy to install as possible (i.e. just run the binary, and play around), to be familiar and straightforward to anyone who has ever used any kind of graphical program, and to be visually appealing. It provides access to CRK data files on the local filesystem, as well as through a 3-tier implementation (see Xortoth below). The various editing, presentation and data organisation features of the suite are all intended to be enabled through this client package (although as all of us Unix nerds know well, there is no such thing as a GUI that does everything!).
Xentark
Xentark is the computation arm of the project. 95+% of its source code is actually from an impressive project called the Massively Parallel Quantum Chemistry program (MPQC). This is a C++ implementation of the latest ab initio computational methods, and Xentark draws upon it in order to add Hartree-Fock and Density Functional methods to its own repertoire. Xentark seeks to build upon the core features provided by the underlying library, expanding and adding new features whenever possible. MPQC is also GNU open source. Unfortunately MPQC is currently slightly challenging to install, albeit rewarding. It is very portable between different Unixes, from the lowliest Linux box to the biggest supercomputer. It is also built in the form of a library, which is how Xentark displaces the simple control program that comes with the package, and directs the computational code to its own ends. Earlier incarnations of Xentark in Version 1 actually had their own graphical interfaces, with niceties such as displaying the structure as it was optimised.
The Version 2 implementation runs as more of a background task, which is less fun to watch, but significantly more sensible.
Xortoth
Initially, in the early stages of Version 2 of the Chemical Resource Kit, the network storage model was going to be a 2-tier client/server model, and Xortoth was in fact a C++ program that ran as a daemon. This model was discarded (i.e. taken outside and shot) on account of being cumbersome, unreliable, and above all, really too much trouble to program. Xykron and Xortoth initially communicated with each other via an asynchronous message-based protocol, which involved sending short XML documents to each other. This protocol was retained; when the client/server model was dismantled, Xykron got slightly larger, as it absorbed the code for maintaining datafiles in the user's home directory and "pretending" to be somewhere else on the network, responding to XML "questions" instead of just opening and saving the files. At the same time, Xykron can also maintain "connections" to any number of configurable Xortoth "realms", such that the user does not necessarily need to know which realm is local and which is on the network, nor where exactly for that matter; the message protocol is the same.
So what is Xortoth now? It is a PHP4 page with several include files (I can't even remember what PHP stands for, but it is essentially the open source equivalent of Active Server Pages, and quite similar to Perl), which accesses a MySQL database, which must in turn have several tables with half a dozen fields each, as described in a simple configuration text document. It really is quite simple, although in order to run it, the administrator must know how to install MySQL and PHP4 on an appropriate Apache server, which admittedly not everybody knows how to do without reading a HOWTO or two. A thin-client access pathway for Xortoth is also under development, but is not currently useful.
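
The idea that a local realm and a network realm answer the same XML "questions" can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the `store` and `fetch` message names, the `reply` element, and the `LocalRealm` and `TableRealm` classes are hypothetical and are not taken from the actual CRK protocol. The sketch is also synchronous, whereas the real protocol is described as asynchronous.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


class LocalRealm:
    """Answers XML questions from files in a directory (stand-in for the home directory)."""

    def __init__(self, root):
        self.root = Path(root)

    def ask(self, question_xml):
        q = ET.fromstring(question_xml)
        name = q.get("name")
        # Keep only the final path component so a question stays inside the realm directory.
        path = self.root / Path(name).name
        reply = ET.Element("reply", name=name)
        if q.tag == "store":
            path.write_text(q.text or "", encoding="utf-8")
            reply.set("status", "ok")
        elif q.tag == "fetch" and path.exists():
            reply.set("status", "ok")
            reply.text = path.read_text(encoding="utf-8")
        else:
            reply.set("status", "error")
        return ET.tostring(reply, encoding="unicode")


class TableRealm:
    """Answers the same questions from a dict (stand-in for a database behind a web page)."""

    def __init__(self):
        self.rows = {}

    def ask(self, question_xml):
        q = ET.fromstring(question_xml)
        name = q.get("name")
        reply = ET.Element("reply", name=name)
        if q.tag == "store":
            self.rows[name] = q.text or ""
            reply.set("status", "ok")
        elif q.tag == "fetch" and name in self.rows:
            reply.set("status", "ok")
            reply.text = self.rows[name]
        else:
            reply.set("status", "error")
        return ET.tostring(reply, encoding="unicode")


def save_and_reload(realm, name, content):
    """Client-side code: identical for every realm, whether local or remote."""
    store = ET.Element("store", name=name)
    store.text = content
    realm.ask(ET.tostring(store, encoding="unicode"))
    fetch = ET.Element("fetch", name=name)
    reply = ET.fromstring(realm.ask(ET.tostring(fetch, encoding="unicode")))
    return reply.get("status"), reply.text


def demo():
    with tempfile.TemporaryDirectory() as home:
        realms = {"local": LocalRealm(home), "remote": TableRealm()}
        return {label: save_and_reload(realm, "benzene.xml", "C6H6")
                for label, realm in realms.items()}
```

The point of the design is that `save_and_reload` never inspects which kind of realm it was given; a realm that forwarded the same XML over HTTP to a server-side page could be dropped in without touching the client code.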
Version 1 actually included some Java applets for thin-client display of the complex datatypes, and this avenue of opportunity may be reopened sometime in the future.
Xuru
The most recent addition to the team, Xuru bases itself on the 3-tier implementation (via Xortoth) and runs scheduled tasks. Such tasks can be configured using a client feature (part of Xykron), and usually involve running the computational program (Xentark), but can also involve external programs. With the addition of this package, it is possible to build a "compute farm", i.e. a centralised server retains all the data, and also a list of computations to be performed. One or more computers on the network regularly or sporadically offer to take up one or more of these tasks at any given time, and report back when they are finished.
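
The compute farm arrangement described above (a central task list, workers that offer to take tasks, and a report when each is finished) might be sketched in Python roughly as follows. This is an in-memory illustration only: `Task`, `TaskBoard`, the status names and the command strings are all hypothetical, and nothing here reflects how Xuru or Xortoth actually store or exchange tasks.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    task_id: int
    command: str                  # hypothetical description of the computation
    status: str = "pending"       # pending -> running -> done
    worker: Optional[str] = None
    result: Optional[str] = None


@dataclass
class TaskBoard:
    """Central list of computations, as the server in a compute farm might keep."""
    tasks: dict = field(default_factory=dict)

    def submit(self, command):
        task_id = len(self.tasks) + 1
        self.tasks[task_id] = Task(task_id, command)
        return task_id

    def offer(self, worker, capacity=1):
        """A worker offers to take up to `capacity` pending tasks."""
        claimed = []
        for task in self.tasks.values():
            if len(claimed) == capacity:
                break
            if task.status == "pending":
                task.status, task.worker = "running", worker
                claimed.append(task)
        return claimed

    def report(self, worker, task_id, result):
        """A worker reports back; only the worker that claimed the task may finish it."""
        task = self.tasks[task_id]
        if task.status != "running" or task.worker != worker:
            raise ValueError(f"task {task_id} is not running on {worker}")
        task.status, task.result = "done", result


def demo():
    board = TaskBoard()
    for molecule in ("water", "ammonia", "methane"):
        board.submit(f"optimise {molecule}")
    first = board.offer("node-a", capacity=2)   # claims the first two pending tasks
    second = board.offer("node-b")              # claims the remaining task
    for task in first:
        board.report("node-a", task.task_id, "converged")
    return ([t.task_id for t in first],
            [t.task_id for t in second],
            [t.status for t in board.tasks.values()])
```

In a networked setting the claim step in `offer` would have to be made atomic on the server, so that two machines offering at the same moment cannot be handed the same task.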
Platforms & Technology
The Chemical Resource Kit is Linux based, but has strong cross-platform leanings.
Java applets may be reinstated whenever development of the thin-client version of Xortoth resumes, or PHP becomes too unwieldy for the server-side tasks.
The Chemical Resource Kit website is hosted by SourceForge, which
we all know is doing a great service for the open source movement.
About the Author
Dr. Alex M. Clark grew up in Auckland, New Zealand, where he attended the University of Auckland and was bestowed with BSc, MSc and PhD degrees. His postgraduate research centred on organometallic synthesis. Following that, he spent a couple of years as a postdoc at the University of California, Riverside, working for Professor Chris Reed, preparing novel "buckyball" materials, among other things.
For the last two years, he has been writing chemistry software for a living, in an idyllic mountain town on the West Coast of the United States. The Chemical Resource Kit has been his pet project for some time, and continues to serve as an important outlet for keeping his scientific programming skills sharp. Although once a Windows programmer by choice, he saw the light and joined the Rebel Alliance to fight the Evil Empire by becoming a full-time Linux user. He is attempting to make a contribution toward the range of quality open source software. In particular, he finds it a little disturbing that even the scientific community seems to be a bit less computer literate every year, and happens to opine that scientists have much to gain by appreciating and understanding software, as it affects all areas of research these days.