Friday, December 16, 2011

SQLite and Moving

First off, after successfully adapting Google Refine to my company's needs, I'm now coming full circle and adapting the software I wrote for the company to be open-source and usable by all. This task has led to me implementing server-side SQLite using the GWT framework (and the sqlitejdbc JAR). It's going surprisingly well. It's probably not the most secure idea in the world, but at least where I am, it'll be behind a log-in screen.
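
To give a flavor of what that looks like, the server side is essentially an ordinary GWT RemoteServiceServlet that opens a JDBC connection to a SQLite file. Here's a minimal sketch of the idea, assuming a hypothetical QueryService interface, table, and database path (not the actual company code); using a PreparedStatement at least keeps SQL injection off the worry list:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

import com.google.gwt.user.client.rpc.RemoteService;
import com.google.gwt.user.server.rpc.RemoteServiceServlet;

// Hypothetical synchronous service interface; the GWT client would pair
// this with a QueryServiceAsync counterpart, as GWT RPC requires.
interface QueryService extends RemoteService {
    List<String> lookup(String name);
}

public class QueryServiceImpl extends RemoteServiceServlet implements QueryService {

    static {
        try {
            // sqlitejdbc registers its JDBC driver under this class name
            Class.forName("org.sqlite.JDBC");
        } catch (ClassNotFoundException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    @Override
    public List<String> lookup(String name) {
        List<String> results = new ArrayList<String>();
        try {
            // Database path is a stand-in; point it at your .db file.
            Connection conn = DriverManager.getConnection("jdbc:sqlite:data/terms.db");
            try {
                PreparedStatement ps = conn.prepareStatement(
                        "SELECT label FROM terms WHERE label LIKE ?");
                ps.setString(1, "%" + name + "%");
                ResultSet rs = ps.executeQuery();
                while (rs.next()) {
                    results.add(rs.getString("label"));
                }
            } finally {
                conn.close();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return results;
    }
}
```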

After this I may get involved in contributing to Google Refine's development. It depends on my company's needs, of course. All in all, this semester has been one of the most productive of my life.

Wednesday, December 7, 2011

Rebranding Google Refine

Google Refine is cool with you modifying their code and redistributing it... as long as you don't call it "Google Refine." So, I'm going through the process of rebranding Google Refine as BIORefine.

The first thing I did was whip up a nifty little replacement for the Google Refine logo.

[BIORefine logo image]

It was created at two different heights, 30px and 40px. It goes in the grefine/main/webapp/modules/core/images folder.

grefine/main/webapp/modules/core/index.vt is edited to change Google Refine to BIORefine.

There are a handful of other files to be changed (mostly the .html files in grefine/main/webapp/modules/core/scripts/index/ such as create-project-ui-source-selection.html).
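Rather than hunting for stragglers by hand, a quick throwaway scan can flag any file that still mentions the old name. A rough Java sketch (the root path and extension list are just assumptions about where the strings tend to live):

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Throwaway helper: recursively list files that still contain "Google Refine".
public class FindOldBrand {
    public static void main(String[] args) throws FileNotFoundException {
        scan(new File("grefine/main/webapp"));
    }

    static void scan(File f) throws FileNotFoundException {
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children == null) {
                return;
            }
            for (File child : children) {
                scan(child);
            }
        } else if (f.getName().matches(".*\\.(html|vt|js)")) {
            Scanner s = new Scanner(f);
            if (s.findWithinHorizon("Google Refine", 0) != null) {
                System.out.println(f.getPath());
            }
            s.close();
        }
    }
}
```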

Edit: I'll probably update this guide as I find more files that I've missed (like about.html - although there I left all the original text in and just added a note that BIORefine is a modified version of Google Refine).

Friday, December 2, 2011

Update

So, it looks like my proposal for my project will finally be approved soon. :-) Also, the project is pretty much done. As far as Richard XXXX is concerned, it's good to go at generating sensible matches. Part of me would like to play around and implement some sort of Needleman-Wunsch alignment instead of plain edit distance, but... it's not a huge deal, and I don't think I'd get that much more out of it.
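
For context, the edit distance in question is the classic Levenshtein dynamic program; a minimal sketch (not the exact code in my project) looks like this:

```java
// Classic Levenshtein edit distance: the minimum number of single-character
// insertions, deletions, and substitutions needed to turn a into b.
public class EditDistance {
    public static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete all of a's prefix
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert all of b's prefix
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(Math.min(
                        d[i - 1][j] + 1,         // deletion
                        d[i][j - 1] + 1),        // insertion
                        d[i - 1][j - 1] + cost); // substitution
            }
        }
        return d[a.length()][b.length()];
    }
}
```

Needleman-Wunsch is essentially the same dynamic program with configurable match, mismatch, and gap scores, which is why I suspect swapping it in wouldn't buy me much for short names like these.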

I look forward to seeing how it performs on Wednesday, when I'm on campus and don't have to go through the VPN. It should perform much better - the cell names took the longest, but that's because there are thousands of them, and they're much more likely to have weird punctuation issues. Two hours to check a couple thousand is slow, but it still beats checking them all by hand by a huge margin.

I took out the suggestion request code; it's just not something I see as valuable for the service to do. There's going to be enough overlap between tissue types and diseases that suggestions could be confusing, and the user should know what they're looking for anyway.