Assemble 2.0 Tutorial

Brief Description | First Example | Fundamental difference between non-overlapping fragments and substructure constraints | Atom Tags | Assemble as a tool in structure elucidation | Ranking | Postprocessing




Assemble as a tool in structure elucidation

One of Assemble's classical fields of application is the structure elucidation of organic compounds by spectroscopic methods. The following simple example illustrates how a structure generator can enhance productivity and reliability in structure elucidation. Five spectra of an unknown compound are given: the mass spectrum, the infrared spectrum, the H-1 nmr spectrum, and the broadband decoupled and off-resonance decoupled C-13 nmr spectra.

Before Assemble can be used at all, the molecular formula must be known. For small molecules such as this one, it is easily determined. All spectra are examined briefly, and only the most obvious and trivial information is interpreted.

The signal at the highest mass in the mass spectrum appears at m/e = 114. Although this signal is weak, it is assumed to be the molecular ion.

The infrared spectrum (liquid film) shows a broad band from 3200 to 3700 cm-1. This is attributed to traces of moisture; the compound is apparently hygroscopic. The region around 3000 cm-1, the domain of the C-H stretching vibrations, is normally not very informative, as these bands are almost always present. The high intensity at lower frequencies, around 2800 cm-1, indicates the presence of hetero atoms, particularly oxygen. The only interpretable band outside the fingerprint region is the C=C stretching mode at 1650 cm-1, indicating a C=C double bond. The most valuable information in an IR spectrum is usually the absence of certain functional groups. In particular, there is no evidence of hydrogen bonded to hetero atoms, and no carbonyl group is present in the molecule.

The C-13 nmr spectrum shows 6 signals. The multiplicities obtained from the off-resonance decoupled spectrum are indicated by a small uppercase letter at the top of each signal. All carbons are bonded to hydrogen. The molecule contains at least 2 CH groups and 4 CH2 groups; these are minimum numbers, because symmetry-equivalent carbons would share a single signal. The chemical shifts of the triplets around 70 ppm again indicate the presence of hetero atoms, most likely oxygen.

The H-1 nmr spectrum shows a number of complex signals with integral ratios 1:2:2:1:1:1:1:1, adding up to 10. This agrees with the lower limit of 10 hydrogen atoms derived from the C-13 nmr spectra.
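The hydrogen bookkeeping in the two paragraphs above can be written out as a small Python sketch. It assumes the usual off-resonance convention (s, d, t, q for zero to three attached hydrogens) and treats the H-1 integrals as relative values reduced to lowest terms; the function names are illustrative and not part of Assemble.

```python
from functools import reduce
from math import gcd

# Off-resonance multiplicity -> number of attached hydrogens (assumed convention).
H_PER_MULTIPLICITY = {"s": 0, "d": 1, "t": 2, "q": 3}


def hydrogens_on_carbon(multiplicities):
    """Sum of attached H over all C-13 signals.

    A lower bound only: symmetry-equivalent carbons share a single signal.
    """
    return sum(H_PER_MULTIPLICITY[m] for m in multiplicities)


def smallest_integral_total(ratios):
    """Smallest whole-number H count consistent with relative H-1 integrals."""
    common = reduce(gcd, ratios)
    return sum(r // common for r in ratios)


c13_signals = ["d", "d", "t", "t", "t", "t"]  # 2 CH + 4 CH2, as read above
h1_integrals = [1, 2, 2, 1, 1, 1, 1, 1]

print(hydrogens_on_carbon(c13_signals), smallest_integral_total(h1_integrals))  # 10 10
```

Both routes give 10 as the smallest possible hydrogen count, which is why the two spectra are said to agree.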

We have found 6 C and 10 H atoms, and indications of hetero atoms, most likely oxygen. The C6H10 part of the molecular formula contributes 82 amu. If the signal at m/e = 114 in the mass spectrum indeed corresponds to the molecular ion, a mass of 32 remains to be made up without C and H atoms. It could be accounted for by one sulfur atom or by two oxygen atoms. Assuming the latter leads to the molecular formula C6H10O2, which corresponds to two double bond equivalents.
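The mass arithmetic can be checked the same way. This minimal sketch uses nominal integer masses (C 12, H 1, N 14, O 16, S 32) and the standard formula for double bond equivalents, C - H/2 + N/2 + 1; it also tries nitrogen, only to show that one S or two O are the sole fits for the missing 32 amu. All names are hypothetical.

```python
from itertools import product

# Nominal (integer) masses of the most abundant isotopes.
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16, "S": 32}


def nominal_mass(formula):
    """Nominal mass of a formula given as {element: count}."""
    return sum(NOMINAL_MASS[el] * n for el, n in formula.items())


def heteroatom_combinations(missing, max_count=4):
    """All O/S/N combinations (no C, no H) whose nominal mass equals `missing`."""
    fits = []
    for o, s, n in product(range(max_count + 1), repeat=3):
        combo = {"O": o, "S": s, "N": n}
        if any(combo.values()) and nominal_mass(combo) == missing:
            fits.append(combo)
    return fits


def double_bond_equivalents(c, h, n=0):
    """Rings plus double bonds: C - H/2 + N/2 + 1 (O and S do not count)."""
    return c - h / 2 + n / 2 + 1


hydrocarbon = nominal_mass({"C": 6, "H": 10})  # 82
missing = 114 - hydrocarbon                     # 32
print(heteroatom_combinations(missing))  # [{'O': 0, 'S': 1, 'N': 0}, {'O': 2, 'S': 0, 'N': 0}]
print(double_bond_equivalents(c=6, h=10))  # 2.0 for C6H10O2
```

The choice between S and O2 is not made by the arithmetic; it rests on the IR and C-13 evidence for oxygen described above.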

With the molecular formula known, Assemble can be run for the first time. An earlier example may still be set up. You can either shut down the program and restart it, or remove the old entries. To delete the substructures, move the mouse pointer into the "Fragments" field and press the right mouse button. From the pop-up menu, choose "Remove All Fragments":

Main Input Window: Remove All Fragments

The atom constraints and cycle size constraints are removed in the same way. Delete all entries directly accessible in the main window and enter the new molecular formula. If nothing but the molecular formula is given, Assemble generates 4869 constitutions, some of them quite weird-looking.

Project Second: 4846 structures calculated

It is instructive to see how quickly the number of candidates decreases when even the most trivial information is given. The C-13 nmr spectra yield the number of hydrogen atoms directly bonded to each carbon atom. In addition, the signals at 135.0 ppm and 116.8 ppm can be assigned to sp2 hybridized carbon atoms, while the others correspond to sp3 hybridized atoms. This information is readily entered as atom constraints.

Main Input Window: C6H10O2, Atoms
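The atom constraints can be pictured as one record per carbon signal. The `CarbonConstraint` class below is a hypothetical stand-in, not Assemble's input format, and assigning the CH to 135.0 ppm and the CH2 to 116.8 ppm is an assumption based on typical vinyl shifts. The sketch also spells out a consequence the constraints imply: with no carbonyl group, the two sp2 carbons must form the single C=C bond, so the second double bond equivalent has to be a ring.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CarbonConstraint:
    """One C-13 signal turned into an atom constraint (hypothetical format)."""
    shift_ppm: Optional[float]  # None where no exact shift is quoted
    hydrogens: int              # from the off-resonance multiplicity
    hybridization: str          # "sp2" or "sp3", judged from the shift


# Assumption: 135.0 ppm is the CH and 116.8 ppm the CH2 of the C=C.
carbons = [
    CarbonConstraint(135.0, 1, "sp2"),
    CarbonConstraint(116.8, 2, "sp2"),
    CarbonConstraint(None, 1, "sp3"),
    CarbonConstraint(None, 2, "sp3"),
    CarbonConstraint(None, 2, "sp3"),
    CarbonConstraint(None, 2, "sp3"),
]

total_h = sum(c.hydrogens for c in carbons)
sp2_carbons = sum(c.hybridization == "sp2" for c in carbons)
# No C=O (IR), so the only possible double bonds are C=C: sp2 carbons pair up.
c_double_bonds = sp2_carbons // 2
rings = 2 - c_double_bonds  # C6H10O2 has 2 double bond equivalents
print(len(carbons), total_h, c_double_bonds, rings)  # 6 10 1 1
```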

Only 26 structures are compatible with this information, which is already a manageable number, so it is worthwhile to look at the candidates.

Project Second: 26 structures calculated

Some of the candidates are peroxides. As Assemble does not consider chemical behavior of any kind, chemically unstable molecules are generated as well. There is no spectroscopic evidence for the absence of a peroxide linkage; it is excluded on pure chemical intuition. To forbid peroxides, the O-O group is set up as a substructure constraint with minimum and maximum occurrence both set to zero.

Main Input Window: C6H10O2, Atoms, no peroxides

There are 17 structures left.

Project Second: 17 structures calculated
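A substructure constraint with minimum and maximum occurrence both set to zero simply rejects every candidate in which the group occurs. The sketch below illustrates that logic on toy heavy-atom graphs; the `Molecule` class, `count_bonds` and the two example molecules are hypothetical and say nothing about how Assemble matches substructures internally.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    """Toy heavy-atom graph (hypothetical format): element symbols plus bonded index pairs."""
    elements: list[str]
    bonds: list[tuple[int, int]]  # bond order is irrelevant for this check


def count_bonds(mol, a, b):
    """Number of bonds joining an atom of element a to an atom of element b."""
    target = sorted((a, b))
    return sum(
        1 for i, j in mol.bonds if sorted((mol.elements[i], mol.elements[j])) == target
    )


def satisfies(count, minimum, maximum):
    """Occurrence limits of a substructure constraint."""
    return minimum <= count <= maximum


# Illustrative molecules only: CH3-O-O-CH3 and CH3-O-CH2-CH2-O-CH3.
peroxide = Molecule(["C", "O", "O", "C"], [(0, 1), (1, 2), (2, 3)])
ether = Molecule(
    ["C", "O", "C", "C", "O", "C"], [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
)

# "No peroxides": the O-O group may occur at least 0 and at most 0 times.
kept = [m for m in (peroxide, ether) if satisfies(count_bonds(m, "O", "O"), 0, 0)]
print(len(kept))  # 1 (only the ether survives)
```

Raising the limits, for example to minimum 1 and maximum 1, would instead demand exactly one occurrence; the zero/zero setting is just the special case used here.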

Working with Assemble adds flexibility to the structure elucidation process. Often the spectroscopist fails to find a clue simply for lack of ideas. With candidate structures at hand, one can instead look for discrepancies between the spectral information and individual candidates. The first structure generated is obviously inconsistent with the spectra: it contains a CH2-CH2-CH2 linkage. The chemical shift of the central CH2 group is expected below 2 ppm in the H-1 nmr spectrum and below 30 ppm in the C-13 nmr spectrum, both in contradiction with the experiment. Some other candidates also contain a CH2 group between two carbon atoms, which is not in accord with the spectra either. It therefore seems worthwhile to forbid this linkage, so the C-CH2-C group is set up as a substructure constraint.

Main Input Window: Second, no_CCH2C

There are only 5 structures left.

Project Second: 5 structures calculated
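The C-CH2-C test can be sketched as a check on the heavy-atom neighbors of each CH2 group. As before, the graph format, function name and toy molecules are hypothetical illustrations rather than Assemble's implementation; called with two oxygen neighbors instead of two carbons, the same function detects the O-CH2-O group discussed later in this section.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    """Toy heavy-atom graph with explicit hydrogen counts (hypothetical format)."""
    elements: list[str]
    hydrogens: list[int]
    bonds: list[tuple[int, int]]

    def neighbor_elements(self, i):
        """Element symbols of the heavy atoms bonded to atom i."""
        out = []
        for a, b in self.bonds:
            if a == i:
                out.append(self.elements[b])
            elif b == i:
                out.append(self.elements[a])
        return out


def count_ch2_between(mol, x, y):
    """Number of CH2 groups whose two heavy-atom neighbors are elements x and y."""
    return sum(
        1
        for i, element in enumerate(mol.elements)
        if element == "C"
        and mol.hydrogens[i] == 2
        and sorted(mol.neighbor_elements(i)) == sorted([x, y])
    )


# Toy examples: CH3-CH2-CH2-O-CH3, CH3-CH2-O-CH3 and CH3-O-CH2-O-CH3.
propyl_methyl_ether = Molecule(
    ["C", "C", "C", "O", "C"], [3, 2, 2, 0, 3], [(0, 1), (1, 2), (2, 3), (3, 4)]
)
ethyl_methyl_ether = Molecule(["C", "C", "O", "C"], [3, 2, 0, 3], [(0, 1), (1, 2), (2, 3)])
dimethoxymethane = Molecule(
    ["C", "O", "C", "O", "C"], [3, 0, 2, 0, 3], [(0, 1), (1, 2), (2, 3), (3, 4)]
)

print(count_ch2_between(propyl_methyl_ether, "C", "C"))  # 1 -> rejected
print(count_ch2_between(ethyl_methyl_ether, "C", "C"))   # 0 -> kept
print(count_ch2_between(dimethoxymethane, "O", "O"))     # 1 (O-CH2-O)
```

A terminal =CH2 has only one heavy-atom neighbor, so it is never counted, which matches the intent of forbidding a CH2 flanked by two carbons.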

Further reducing the number of candidates is not really necessary, as 5 structures are easily examined individually. For practice, however, you may want to add some more restrictions. The last structure has an oxygen atom bonded to the C=C double bond, which contradicts the nmr spectra. The double bond would then form an isolated spin-spin coupling system, giving a less complex signal pattern than observed experimentally. In addition, the chemical shift of the CH group next to the oxygen is expected to be substantially higher than observed, in both the C-13 and the H-1 nmr spectra. One way to pass this information to Assemble is to set up the double bond as a fragment and use the neighboring atom tag for the CH group:

Main Input Window: Second, no_CH2=CH-O

The output no longer includes the offending structure. As no other candidate showed this feature, it is the only one excluded.

Project Second: 4 structures calculated
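The vinyl ether motif removed above, an oxygen singly bonded to a C=C carbon, can be detected in a toy graph that also records bond orders. Assemble expresses this through a fragment and the neighboring atom tag; the function below is only a hypothetical equivalent test, and the two example molecules are illustrative, not candidates from this project.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    """Toy heavy-atom graph; bonds are (i, j, order) with order 1 or 2 (hypothetical)."""
    elements: list[str]
    bonds: list[tuple[int, int, int]]


def alkene_carbons(mol):
    """Indices of carbon atoms that take part in a C=C double bond."""
    found = set()
    for i, j, order in mol.bonds:
        if order == 2 and mol.elements[i] == "C" and mol.elements[j] == "C":
            found.update((i, j))
    return found


def count_vinyl_ether_bonds(mol):
    """Single bonds linking an oxygen atom to a C=C carbon (the O-CH=CH2 motif)."""
    alkene = alkene_carbons(mol)
    count = 0
    for i, j, order in mol.bonds:
        if order == 1 and {mol.elements[i], mol.elements[j]} == {"C", "O"}:
            # Exactly one end is carbon; the oxygen end is never in `alkene`.
            if i in alkene or j in alkene:
                count += 1
    return count


# CH2=CH-O-CH3 and CH2=CH-CH2-O-CH3
methyl_vinyl_ether = Molecule(["C", "C", "O", "C"], [(0, 1, 2), (1, 2, 1), (2, 3, 1)])
allyl_methyl_ether = Molecule(
    ["C", "C", "C", "O", "C"], [(0, 1, 2), (1, 2, 1), (2, 3, 1), (3, 4, 1)]
)

print(count_vinyl_ether_bonds(methyl_vinyl_ether))  # 1 -> excluded
print(count_vinyl_ether_bonds(allyl_methyl_ether))  # 0 -> kept
```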

The last structure has a CH2 group between two oxygen atoms. The corresponding signal is expected around 100 ppm in the C-13 nmr spectrum, some 30 ppm away from the experimental values. Again, no other candidate shows this feature, so forbidding the group removes only that particular candidate. Do it anyway for practice. The offending group is set up as a substructure constraint and excluded by setting both minimum and maximum occurrence to zero.

Main Input Window: Second, no_O-CH2-O

Again the structure is excluded. There are 3 candidates remaining.

Project Second: 3 structures calculated

The chemical shifts in epoxides are rather special. Normally a neighboring oxygen pushes the shift above 3 ppm in the H-1 nmr spectrum. Not so in epoxides, where the influence of the oxygen is reduced. As all sp3 hybridized carbon atoms have an oxygen atom as a neighbor, the signals below 3 ppm are compatible only with the third structure, which is indeed the correct constitution.


