Perturbing Biochemistry

Thursday, January 21, 2010

Converting SMILES structures to pngs

Now that I've used VLOOKUP() to get the chemical structures for the plate-wells I'm interested in, I would like to view these structures. If the data set is small enough, one could simply copy and paste each SMILES string into a chemical drawing program such as XDrawChem, BKChem, or ChemDraw. However, I generally work with larger data sets.

If one is willing to risk Excel crashes for the rest of the life of their computer, they could install the ChemDraw plug-in and create the structures directly within the spreadsheet, but alas, I run GNU/Linux and need my spreadsheet apps to remain reliable. Further, as stated before, I work with large data sets, and performing the ChemDraw plug-in trick on more than a few hundred structures can really lock up a system. Therefore, I need an alternative way to parse SMILES strings into 2-D structures.

How to do this depends on whether you simply need to view the structures and perform manipulations on them, or whether you need actual image files, such as PNGs.

In the first case, one can simply install Bioclipse, create a .smi file of all the relevant SMILES strings, and import it. For help with installation of Bioclipse, see my last post. To create a .smi file, copy the column of relevant SMILES strings into a text file and change the extension of the file from .txt to .smi.
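For anyone who would rather script the copy-and-rename step, here is a minimal Python sketch. It assumes the SMILES column has already been read into a list of strings; the function name `write_smi` is my own invention for illustration, not part of Bioclipse or any other tool.

```python
from pathlib import Path


def write_smi(smiles_column, path):
    """Write one SMILES string per line to `path`, skipping blank cells.

    Returns the number of structures written.
    """
    # Strip the stray whitespace that often comes along from a spreadsheet,
    # and drop cells that are empty or None.
    cleaned = [s.strip() for s in smiles_column if s and s.strip()]
    Path(path).write_text("".join(s + "\n" for s in cleaned))
    return len(cleaned)
```

Calling `write_smi(column, "structures.smi")` on the copied column should give a file that can be imported in the same way as the hand-made one.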

However, in my case, the point of the exercise was to let others see the structures and bin them as they please. Thus, my best option is to create a bunch of PNGs. Unfortunately, this wasn't as straightforward as I would have liked, and required shell scripting to get working. Further, since I'm just learning shell scripting, I also had to do some spreadsheet scripting where I would otherwise use a for-loop.

The key program for getting PNGs from SMILES structures is Dingo. Dingo accepts .mol, .rxn, and .smi files as input and will output .png, .svg, or .pdf. Creating a command for dingo is rather simple:
dingo-render infile.{mol,rxn,smi} outfile.{png,svg,pdf}
Getting the command to work turned out to be not so simple.

At first, dingo's input was limited to a .smi file containing a single structure and no descriptive information. As of yesterday this changed, making it quite easy for me to convert some 20,000 SMILES structures to flawless 2D PNGs in a matter of minutes!!
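For the record, the one-structure-per-file workaround can be sketched in a few lines of Python instead of shell plus spreadsheet formulas. This is only a sketch under assumptions: it uses nothing but the `dingo-render infile outfile` command form shown above, it assumes `dingo-render` is installed and on the PATH, and the function names and the `structure_00001` file naming are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_jobs(smiles_list, workdir):
    """Write one single-structure .smi file per SMILES string and return
    the matching dingo-render command lines (nothing is executed here)."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    jobs = []
    for i, smiles in enumerate(smiles_list, start=1):
        # Hypothetical naming scheme: structure_00001.smi -> structure_00001.png
        infile = workdir / f"structure_{i:05d}.smi"
        outfile = infile.with_suffix(".png")
        infile.write_text(smiles.strip() + "\n")
        # Command shape taken from the post: dingo-render infile outfile
        jobs.append(["dingo-render", str(infile), str(outfile)])
    return jobs


def render_all(jobs):
    """Run each command in turn; assumes dingo-render is on the PATH."""
    for cmd in jobs:
        subprocess.run(cmd, check=True)
```

Building the command list separately from running it makes it easy to eyeball the first few commands before letting it loose on 20,000 structures.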

Installing Bioclipse

While learning how to convert SMILES structures to .png files, I tried converting a .smi file to a .sdf or .mol file using Open Babel, to no avail. It seems to only want to read the first structure in the .smi list.

I came across an open source Q&A forum that recommended installing dingo, but I was unable to get it working at the time. (However, as of yesterday, dingo was updated and now works flawlessly.)

I've tried several other programs on both Ubuntu and OpenSUSE, but each only seems to let me work with a single SMILES string at a time, and many won't take this string from the command line.

In comes Bioclipse.

I had some trouble getting Bioclipse up and running at first. I failed in Ubuntu and attributed it to having the default version 6 JRE, but couldn't seem to find version 5 in the repository. As it turns out, Bioclipse will start in OpenSUSE with Sun JRE version 6. Therefore the source of the JRE, not its version, was the problem, and Ubuntu users should be sure to install the Sun Java package if they wish to run Bioclipse.

So far in OpenSUSE I've found one hitch, related to a XULRunner error:

Error while booting Bioclipse: SWTErrorXPCOM error -2147467262

org.eclipse.swt.SWTError: XPCOM error -2147467262
        at org.eclipse.swt.browser.Mozilla.error(Mozilla.java:1638)
        at org.eclipse.swt.browser.Mozilla.setText(Mozilla.java:1861)
        at org.eclipse.swt.browser.Browser.setText(Browser.java:737)

As reported here, this can be solved by editing bioclipse.ini, adding the line:

-Dorg.eclipse.swt.browser.XULRunnerPath=/usr/lib/xulrunner

Thursday, January 14, 2010

Cross referencing items in a spreadsheet using VLOOKUP()

I'm in the midst of a typical cheminformatics problem: how to convert plate-well data into something more useful. Normally our founding programmer would handle things like this, but he's stretched thin, and therefore it may take weeks to see a solution on his end.

The first step is to aggregate one's data: a mapping of all chemical structures to their respective plate-well locations, and a list of interesting plate-well locations. Once acquired, one must choose a way to cross-reference the data. My background is in Java, which isn't quite as nice as Python or Perl for performing database manipulations. Therefore, staying in a spreadsheet like Calc (OpenOffice) or Excel is my best option.

When cross-referencing data in a spreadsheet, one uses the functions LOOKUP(), HLOOKUP(), or VLOOKUP(). As my data generally need to be referenced row-to-row, I will focus on VLOOKUP(). VLOOKUP() takes the following arguments:

VLOOKUP(Search Criterion, Array, Index, Sort Order)

"Search Criterion" - is the data (likely in the current row of the current spreadsheet) that you want to index against.

"Array" is the block of data containing the Search Criterion as the first column, and the data you want to import in a later column.

For instance, if the feature I'm referencing spans column B from rows 2-51 and the data I want to pull in is in the same rows of column D, the array would be structured:

B2:D51

Most likely this data is contained in another sheet, or in an entirely different file. In this case one must prefix the above block with the exact path to the workbook, the # symbol, and the name of the sheet of interest followed by a period. For instance, if the data is contained in a sheet called "sheet1" in a separate file called "referenceFile.ods", in a folder called Documents in my home directory, one would put the following before the cell block specified above.

'file:///home/myusername/Documents/referenceFile.ods'#sheet1.

Finally, as we probably don't want the range of the block specified in the first part of this example to move, we need to add a '$' to each part of the range. Thus the entire "Array" argument would look like:

'file:///home/myusername/Documents/referenceFile.ods'#$sheet1.$B$2:$D$51

The "Index" argument is the number of the column within the "Array" that holds the data to import, counting the first column of the array as 1.

In the example above, column B (my referencing column) is column 1 of the array, so column D (the data I want to import) is column 3, thus the index is 3.
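To make the argument order concrete, here is a rough Python sketch of what an exact-match VLOOKUP() does. It assumes the usual Calc/Excel convention that Index counts from 1 at the first column of the Array, and it only mimics the exact-match behaviour (Sort Order set to 0 or FALSE). The plate-well names and the middle column are hypothetical stand-ins for columns B, C and D of the reference sheet.

```python
def vlookup(search_criterion, array, index):
    """Exact-match VLOOKUP: find the first row whose first column equals
    search_criterion and return the value in column `index` (1-based)."""
    if index < 1:
        raise ValueError("index counts from 1")
    for row in array:
        if row[0] == search_criterion:
            return row[index - 1]
    return None  # a spreadsheet would show #N/A here


# Hypothetical rows standing in for B2:D51 (plate-well, unrelated column, SMILES).
reference = [
    ("P1-A01", "batch 7", "CCO"),
    ("P1-A02", "batch 7", "c1ccccc1"),
]

# Column D is the 3rd column of the B:D block, so the index is 3.
hit = vlookup("P1-A02", reference, 3)
```

Here `hit` ends up holding the SMILES string stored next to plate-well P1-A02, which is what the spreadsheet formula would place in the cell.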

A blog about ChemBio & tech

I'm a chemical biologist. On a regular basis I come across some pretty aggravating problems that I would prefer not to have to solve twice, or some literature that I'd like to bookmark for future reference. Moreover, many of these complications are computer related - but a bit too specific to place in my general computer use blog. Therefore, rather than keep a separate journal, and since many of these thoughts and issues are non-proprietary, I've decided to start this web-log.