Christian Borgelt's Webpages

MoSS - Molecular Substructure Miner

(aka MoFa - Molecular Fragment Miner)

Download

moss.jar executable Java archive (135 kb)
moss.zip Java sources, version 5.13, 2009.05.07 (778 kb)
example1.smiles example input file in SMILES format (1 kb)
example2.smiles example input file in SMILES format (1 kb)
steroids.smiles example input file in SMILES format (1 kb)

More example input files are contained in the source archive, in the directory moss/data.

The source package also contains some basic user documentation in HTML format in the directory moss/doc/user, and javadoc documentation in the directory moss/doc/java.

Description

MoSS is a program for finding frequent molecular substructures and discriminative fragments in a database of molecule descriptions. Its search scheme is based on the Eclat algorithm for frequent item set mining. In addition to the default MoSS/MoFa algorithm, the program contains the gSpan algorithm [Yan and Han 2002] (or rather its extension CloseGraph [Yan and Han 2003]) as a special processing mode.
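For readers unfamiliar with Eclat, the following is a minimal sketch of that algorithm on plain item sets. It is not taken from the MoSS sources and does not mine molecular graphs. It only illustrates the depth-first search over a vertical (transaction id list) representation that the description refers to. The class name EclatSketch, the method names, and the toy database are assumptions made for this illustration.

```java
import java.util.*;

// Illustrative Eclat sketch on item sets (hypothetical class, not part of MoSS).
public class EclatSketch {

    /** Returns every item set contained in at least minSupport transactions, with its support. */
    public static Map<Set<String>, Integer> mine(List<Set<String>> transactions, int minSupport) {
        // Vertical layout: each item maps to the ids of the transactions that contain it.
        Map<String, BitSet> tidLists = new TreeMap<>();
        for (int tid = 0; tid < transactions.size(); tid++) {
            for (String item : transactions.get(tid)) {
                tidLists.computeIfAbsent(item, k -> new BitSet()).set(tid);
            }
        }
        // Keep only the frequent single items, in a fixed (alphabetical) order.
        List<String> items = new ArrayList<>();
        List<BitSet> tids = new ArrayList<>();
        for (Map.Entry<String, BitSet> e : tidLists.entrySet()) {
            if (e.getValue().cardinality() >= minSupport) {
                items.add(e.getKey());
                tids.add(e.getValue());
            }
        }
        Map<Set<String>, Integer> result = new LinkedHashMap<>();
        extend(new ArrayList<>(), items, tids, minSupport, result);
        return result;
    }

    // Depth-first extension: grow the prefix by one item and intersect the id lists.
    private static void extend(List<String> prefix, List<String> items, List<BitSet> tids,
                               int minSupport, Map<Set<String>, Integer> result) {
        for (int i = 0; i < items.size(); i++) {
            List<String> current = new ArrayList<>(prefix);
            current.add(items.get(i));
            result.put(new TreeSet<>(current), tids.get(i).cardinality());

            // Only items after position i are tried, so each set is generated once.
            List<String> nextItems = new ArrayList<>();
            List<BitSet> nextTids = new ArrayList<>();
            for (int j = i + 1; j < items.size(); j++) {
                BitSet common = (BitSet) tids.get(i).clone();
                common.and(tids.get(j));
                if (common.cardinality() >= minSupport) {
                    nextItems.add(items.get(j));
                    nextTids.add(common);
                }
            }
            if (!nextItems.isEmpty()) {
                extend(current, nextItems, nextTids, minSupport, result);
            }
        }
    }

    public static void main(String[] args) {
        // Toy database of four transactions (made up for this sketch).
        List<Set<String>> db = List.of(
            Set.of("a", "b", "c"),
            Set.of("a", "b"),
            Set.of("a", "c"),
            Set.of("b", "c"));
        mine(db, 2).forEach((set, support) -> System.out.println(set + " : " + support));
    }
}
```

In substructure mining the items are replaced by graph fragments and the id lists by lists of molecules (or embeddings) containing a fragment, which makes extension and duplicate avoidance considerably more involved than in this sketch.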

Call the program without any arguments to get a list of options. See the shell script run (included in the source package) for examples of how to invoke the program. The example input files made available above (also contained in the data directory in the source package) show the input format.

Full description of this program (included in the source package).

Sister page with some more explanations and a worked example at the ALTANA Chair of Applied Computer Science (M.R. Berthold) of the University of Konstanz.

The first version of this program was developed in cooperation with Tripos, Inc., Data Analysis Research Lab, South San Francisco, CA, USA.

Details about the application and the algorithm can be found in these papers:

Note that this version of the program does not support wildcard atoms and has no graphical user interface, unlike the version described in two of the above papers. The version supporting these features is the property of Tripos, Inc.