I've finished open-sourcing the reconciliation service. It now has a front page that lets users specify where they want data pulled from. That data is stored in a SQLite database, which the servlet reads to populate the service metadata and to load the vocabulary that incoming requests are reconciled against.
It will go up at http://code.google.com/p/open-reconcile/ once it looks less jury-rigged.
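The post doesn't include the servlet code. As a rough sketch of what answering a batch of Refine reconciliation queries involves, here is a JavaScript version that assumes the vocabulary has already been read out of SQLite into an in-memory array. The `vocabulary` rows, the `/vocab/term` type, the `answerQueries` name, and the exact-match/prefix scoring are all hypothetical. Only the overall request and response shape follows the Refine reconciliation API: a JSON object of keyed queries, each answered with a `result` list of `id`, `name`, `type`, `score`, and `match` candidates.

```javascript
// Hypothetical in-memory vocabulary, standing in for rows read from SQLite.
const vocabulary = [
  { id: "T1", name: "HeLa" },
  { id: "T2", name: "HEK 293" },
  { id: "T3", name: "Jurkat" },
];

// Placeholder type attached to every candidate (an assumption).
const TERM_TYPE = [{ id: "/vocab/term", name: "Term" }];

// Placeholder scoring: 100 for a case-insensitive exact match,
// 50 if the term starts with the query, otherwise 0.
function score(query, name) {
  const q = query.toLowerCase();
  const n = name.toLowerCase();
  if (q === n) return 100;
  if (n.startsWith(q)) return 50;
  return 0;
}

// Answer a Refine-style batch such as '{"q0":{"query":"hela"}}'.
function answerQueries(queriesJson) {
  const queries = JSON.parse(queriesJson);
  const response = {};
  for (const [key, q] of Object.entries(queries)) {
    const limit = q.limit || 3;
    const result = vocabulary
      .map((term) => ({
        id: term.id,
        name: term.name,
        type: TERM_TYPE,
        score: score(q.query, term.name),
        match: false,
      }))
      .filter((c) => c.score > 0)
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
    // Only flag an automatic match when the top candidate is exact.
    if (result.length > 0 && result[0].score === 100) result[0].match = true;
    response[key] = { result };
  }
  return response;
}

console.log(JSON.stringify(answerQueries('{"q0":{"query":"hela"},"q1":{"query":"HEK"}}')));
```

Setting `match` only for exact hits is a cautious choice: Refine treats candidates flagged as matches as automatically reconciled, so a looser rule would push near misses through without review.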
Friday, January 6, 2012
Friday, December 16, 2011
SQLite and Moving
First off, after successfully adapting Google Refine to my company's needs, I've now come full circle: I'm adapting the software I wrote for the company to be open-source and usable by all. This has led me to implement server-side SQLite using the GWT framework (and the sqlitejdbc JAR). It's going surprisingly well. It's probably not the most secure idea in the world, but at least where I am, it'll be behind a log-in screen.
After this I may get involved in contributing to Google Refine's development, depending on my company's needs, of course. All in all, this semester has been one of the most productive of my life.
Wednesday, December 7, 2011
Rebranding Google Refine
Google Refine is cool with you modifying their code and redistributing it... as long as you don't call it "Google Refine." So, I'm going through the process of rebranding Google Refine as BIORefine.
The first thing I did was whip up a nifty little replacement for the Google Refine logo.
It comes in two heights, 30px and 40px, and goes in the grefine/main/webapp/modules/core/images folder.
I edited grefine/main/webapp/modules/core/index.vt to change Google Refine to BIORefine.
There are a handful of other files to be changed (mostly the .html files in grefine/main/webapp/modules/core/scripts/index/ such as create-project-ui-source-selection.html).
Edit: I'll probably update this guide as I find more that I've missed (like about.html, where I left all the original text in and just added a note that BIORefine is a modified version of Google Refine).
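These edits were done by hand. For anyone repeating them, a small Node.js script can handle the bulk text replacement. This is a hypothetical helper, not part of Refine, and it assumes every literal "Google Refine" in `.html` and `.vt` files should become "BIORefine". It skips `about.html`, where the original text was kept.

```javascript
const fs = require("fs");
const path = require("path");

// about.html keeps its original wording (only a note is added by hand),
// so it is excluded from the automatic replacement.
const SKIP = new Set(["about.html"]);

// Recursively collect .html and .vt files under `dir`.
function findTemplates(dir) {
  const found = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      found.push(...findTemplates(full));
    } else if (/\.(html|vt)$/.test(entry.name) && !SKIP.has(entry.name)) {
      found.push(full);
    }
  }
  return found;
}

// Replace every "Google Refine" with "BIORefine"; return the files changed.
function rebrand(rootDir) {
  const changed = [];
  for (const file of findTemplates(rootDir)) {
    const text = fs.readFileSync(file, "utf8");
    const updated = text.split("Google Refine").join("BIORefine");
    if (updated !== text) {
      fs.writeFileSync(file, updated, "utf8");
      changed.push(file);
    }
  }
  return changed;
}
```

Something like `rebrand("grefine/main/webapp/modules/core")` would cover index.vt and the scripts/index/ templates. Review the diff afterwards, since a blind replace can also hit credits or license wording that should still name Google Refine.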
Friday, December 2, 2011
Update
So, it looks like my project proposal will finally be approved soon. :-) The project itself is also pretty much done: as far as Richard XXXX is concerned, it's good to go at generating sensible matches. Part of me would like to play around and implement some sort of Needleman-Wunsch algorithm instead of edit distance, but... it's not a huge deal, and I don't think I'd get that much more out of it.
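For reference, the edit distance mentioned here is usually the Levenshtein distance: the fewest single-character insertions, deletions, and substitutions that turn one string into another. Here is a standard dynamic-programming sketch in JavaScript; it is a generic version, not the project's code.

```javascript
// Levenshtein distance computed with two rolling rows of the DP table.
function editDistance(a, b) {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution, or free match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(editDistance("kitten", "sitting")); // 3
```

Needleman-Wunsch fills a similar table but maximizes an alignment score built from match, mismatch, and gap values, so costs can be tuned per character pair or per gap rather than fixed at 1.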
I look forward to seeing how it performs on Wednesday, when I'm on campus and don't have to go through the VPN. It should perform much better then. The cell names took the longest, but that's because there are thousands of them, and they're much more likely to have weird punctuation issues. It took two hours to check a couple thousand, but then again, that beats checking them all by hand by a huge margin.
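One way to blunt the punctuation problem (an assumption; the post doesn't say how it's handled) is to compare a normalized key rather than the raw cell text. This hypothetical `matchKey` helper lower-cases a term and drops everything that isn't a letter or digit, so spacing and punctuation variants collapse together before any edit-distance scoring.

```javascript
// Hypothetical normalization: lower-case, then remove every character that
// is not a Unicode letter or number (so Greek letters in names survive).
function matchKey(term) {
  return term.toLowerCase().replace(/[^\p{L}\p{N}]+/gu, "");
}

console.log(matchKey("MCF-7 (breast)")); // mcf7breast
```

The trade-off is that distinct names differing only in punctuation also collapse to the same key, so a key match is best treated as a strong candidate rather than an automatic accept.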
I took out the suggestion request code; it's just not something I see as valuable for the service to do. There's going to be enough overlap between tissue type and disease that suggestions may be confusing, and the user should know what they're looking for.
Tuesday, November 29, 2011
Updates and more thoughts
I added a nearest-neighbor algorithm to find misspellings. I think I'm going to take out some of the pattern matching and just use the nearest-neighbor scoring. It seems to give better results, and it'll work better when terms are very short. (A mistyped ID like "C2" getting matched to a hundred different, much longer terms isn't that useful.)
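The post doesn't specify the distance behind the nearest-neighbor scoring. A minimal sketch, assuming Levenshtein distance normalized by the longer string's length and a hypothetical 0.7 cutoff, shows why normalization helps with short IDs: against long terms, "C2" needs almost as many edits as the longer term has characters, so its similarity stays near zero and nothing is returned.

```javascript
// Levenshtein distance (standard two-row dynamic programming).
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]: 1 - distance / length of the longer string.
// Dividing by the longer length keeps short IDs from scoring well
// against long, unrelated terms.
function similarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - editDistance(a, b) / longest;
}

// Return the closest vocabulary term, or null if none reaches the
// (hypothetical) minimum similarity.
function nearestNeighbor(query, vocabulary, minSimilarity = 0.7) {
  let best = null;
  let bestScore = -1;
  for (const term of vocabulary) {
    const s = similarity(query.toLowerCase(), term.toLowerCase());
    if (s > bestScore) {
      best = term;
      bestScore = s;
    }
  }
  return bestScore >= minSimilarity ? { term: best, score: bestScore } : null;
}
```

For example, with a vocabulary of `["Jurkat", "HeLa", "Lymphoblastoid cell line"]`, the misspelling "Jurkkat" resolves to "Jurkat" (similarity 6/7), while "C2" returns `null`.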
I want to find out if I can have the reconciliation happen in an autogenerated new column, for the purpose of building a synonym table.
And I really hate mold; I feel so sick.
Monday, November 21, 2011
Farewell to Freebase in Google Refine
To remove Freebase, I:
- deleted extensions/freebase folder.
- edited extensions/build.xml to remove <ant dir="freebase/" target="build" />
- edited main/webapp/modules/core/scripts/reconciliation/recon-manager.js and removed
ReconciliationManager.customServices.push({
"name" : "Freebase Query-based Reconciliation",
"ui" : { "handler" : "ReconFreebaseQueryPanel" }
});
and changed:
ReconciliationManager.registerStandardService(
"http://4.standard-reconcile.dfhuynh.user.dev.freebaseapps.com/reconcile");
}
to:
ReconciliationManager.registerStandardService(
"http://my service/reconcile");
}
- edited main/webapp/modules/core/scripts/project/exporters.js to remove the MQL and TripleWriter options.
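Whatever URL replaces the Freebase one has to answer Refine's initial request for service metadata. Refine 2.x appears to fetch that URL via JSONP, so the service should honor a `callback` parameter. Here is a minimal JavaScript sketch of building that response. The metadata values (the name, the example.org namespaces, the `/vocab/term` type) are placeholders, and `metadataResponse` is a hypothetical helper rather than part of any library.

```javascript
// Placeholder metadata; the field names follow the reconciliation API's
// service metadata document, the values are invented for illustration.
const METADATA = {
  name: "Open Reconcile (example)",
  identifierSpace: "http://example.org/ns/id",
  schemaSpace: "http://example.org/ns/type",
  defaultTypes: [{ id: "/vocab/term", name: "Term" }],
};

// Build the body for a request to /reconcile that carries no `queries`.
// With a JSONP `callback`, wrap the JSON in a call to it; the name is
// validated so arbitrary script cannot be injected into the response.
function metadataResponse(callback) {
  const json = JSON.stringify(METADATA);
  if (callback === undefined) {
    return { contentType: "application/json", body: json };
  }
  if (!/^[A-Za-z_$][\w$.]*$/.test(callback)) {
    throw new Error("invalid JSONP callback name");
  }
  return { contentType: "text/javascript", body: `${callback}(${json});` };
}
```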
Friday, November 18, 2011
Project is progressing well!
I've disentangled Freebase from Google Refine as much as possible for now. I still want to explore a few last questions I have about some of the Freebase references that are hard-coded into various classes.
I'm working on a reconciliation service configuration site that will allow users to add sources of data to the reconciliation service. It should be fun.