Thursday, January 21, 2010

Converting SMILES structures to PNGs

Now that I've used VLOOKUP() to get the chemical structures for the plate-wells I'm interested in, I would like to view these structures.  If the data set is small enough, one could simply copy and paste each SMILES string into a chemical drawing program such as XDrawChem, BKChem, or ChemDraw.  However, I generally work with larger data sets.

If one is willing to risk Excel crashes for the rest of the life of their computer, they could install the ChemDraw plug-in and create the structures directly within the spreadsheet, but alas, I run GNU/Linux and need my spreadsheet apps to remain reliable.  Further, as stated before, I work with large data sets, and performing the ChemDraw plug-in trick on more than a few hundred structures can really lock up a system.  Therefore, I need an alternative way to turn SMILES strings into 2-D structures.

My suggestion on how to do this depends on whether you simply need to view the structures or perform manipulations on them, or whether you need actual image files, such as PNGs.

In the first case, one can simply install Bioclipse, create a .smi file of all the relevant SMILES strings, and import it.  For help with installing Bioclipse, see my last post.  To create a .smi file, copy the column of relevant SMILES strings into a text file and change the file's extension from .txt to .smi.
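If you would rather not do the copy-and-paste by hand, the same .smi file can be written by a few lines of Python.  This is only a minimal sketch: the function name write_smi and the file name structures.smi are made up for illustration, and it assumes the SMILES column has already been read into a list of strings, one per cell.

```python
def write_smi(smiles_column, path):
    """Write one SMILES string per line to `path` and return how many were written.

    `smiles_column` is assumed to be a list of strings copied out of the
    spreadsheet column; empty cells and surrounding whitespace are dropped.
    """
    cleaned = [cell.strip() for cell in smiles_column if cell and cell.strip()]
    with open(path, "w") as handle:
        for smiles in cleaned:
            handle.write(smiles + "\n")
    return len(cleaned)


if __name__ == "__main__":
    # Hypothetical example column: ethanol, an empty cell, benzene.
    count = write_smi(["CCO", "", "c1ccccc1"], "structures.smi")
    print(count, "structures written")
```

The resulting file is plain text with one structure per line, which is the same thing you get from renaming the .txt file.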

However, in my case, the point of the exercise was to let others see the structures and bin them however they please.  Thus, my best option is to create a bunch of PNGs.  Unfortunately, this wasn't as straightforward as I would have liked, and it required shell scripting to get working.  Further, since I'm just learning how to write shell scripts, I also had to do some spreadsheet scripting where I would otherwise use a for-loop.

The key program for getting PNGs from SMILES structures is Dingo.  Dingo accepts .mol, .rxn, and .smi files as input and will output .png, .svg, or .pdf.  Creating a command for Dingo is rather simple:
dingo-render infile.{mol,rxn,smi} outfile.{png,svg,pdf}
Getting the command to work turned out to be not so simple.

When I first tried it, Dingo's input was limited to a .smi file containing a single structure and no descriptive information.  As of yesterday this changed, making it quite easy for me to convert some 20,000 SMILES structures to flawless 2D PNGs in a matter of minutes!!
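For anyone stuck with the single-structure limitation, here is a rough sketch of one way to work around it: split the big .smi file into one file per structure and build a dingo-render command for each, using the infile/outfile form shown above.  This is not the script I actually ran.  The function names and the structure_00001 naming scheme are invented for illustration, and the sketch assumes dingo-render is on your PATH and that any description follows the SMILES string after whitespace.

```python
import subprocess
from pathlib import Path


def split_and_build_commands(smi_path, out_dir):
    """Write one single-structure .smi file per input line.

    Returns a list of dingo-render commands (one per structure) without
    running any of them, so the list can be inspected first.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(smi_path) as handle:
        lines = [line.strip() for line in handle if line.strip()]
    commands = []
    for index, line in enumerate(lines, start=1):
        # Keep only the SMILES string; drop any name or description after it.
        smiles = line.split()[0]
        single = out_dir / f"structure_{index:05d}.smi"
        single.write_text(smiles + "\n")
        png = single.with_suffix(".png")
        commands.append(["dingo-render", str(single), str(png)])
    return commands


def render_all(commands):
    """Run each command in turn; stops at the first failure.

    Assumes dingo-render is installed and on the PATH.
    """
    for command in commands:
        subprocess.run(command, check=True)
```

Calling split_and_build_commands("structures.smi", "pngs") and passing the result to render_all would then render everything in one go.  Building the command list separately from running it makes it easy to eyeball a few commands before committing to 20,000 of them.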
