I added a nearest neighbor algorithm to find misspellings. I think I'm going to take out some of the pattern matching and just use the nearest neighbor scoring. It seems to give better results, and it should work better when terms are very short. (A wrongly given ID name of "C2" getting matched to a hundred different things that are very long isn't that useful.)
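To make the idea concrete, here is a minimal sketch of one way nearest neighbor scoring for misspellings could work. It is not the service's actual code: the class name, the method names, and the choice of Levenshtein edit distance normalized by the longer string are all assumptions made for illustration.

```java
import java.util.List;

// Hypothetical sketch: names and scoring formula are assumptions,
// not the reconciliation service's real implementation.
class NearestNeighborSketch {

    // Classic Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1], ignoring case; 1.0 means identical strings.
    // Dividing by the longer length keeps a short query from scoring
    // well against a much longer candidate.
    static double score(String query, String candidate) {
        String q = query.toLowerCase();
        String c = candidate.toLowerCase();
        int longest = Math.max(q.length(), c.length());
        if (longest == 0) {
            return 1.0;
        }
        return 1.0 - (double) editDistance(q, c) / longest;
    }

    // Returns the best-scoring candidate, or null if the list is empty.
    static String nearest(String query, List<String> candidates) {
        String best = null;
        double bestScore = -1.0;
        for (String c : candidates) {
            double s = score(query, c);
            if (s > bestScore) {
                bestScore = s;
                best = c;
            }
        }
        return best;
    }
}
```

With this kind of scoring, a two-character term like "C2" gets a very low score against a long name that merely contains it, which is the behavior substring-style pattern matching doesn't give.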
I want to find out if I can have the reconciliation happen in an autogenerated new column, for the purpose of building a synonym table.
And I really hate mold, I feel so sick.
Tuesday, November 29, 2011
Monday, November 21, 2011
Farewell to Freebase in Google Refine
To remove Freebase, I:
- deleted the extensions/freebase folder.
- edited extensions/build.xml to remove <ant dir="freebase/" target="build" />
- edited main/webapp/module/core/scripts/reconciliation/recon-manager.js and removed
    ReconciliationManager.customServices.push({
        "name" : "Freebase Query-based Reconciliation",
        "ui" : { "handler" : "ReconFreebaseQueryPanel" }
    });
  and changed:
    ReconciliationManager.registerStandardService(
        "http://4.standard-reconcile.dfhuynh.user.dev.freebaseapps.com/reconcile");
    }
  to:
    ReconciliationManager.registerStandardService(
        "http://my service/reconcile");
    }
- edited main/webapp/module/core/scripts/project/exporters.js to remove the MSQL and TripleWriter options.
Friday, November 18, 2011
Project is progressing well!
I've gotten Freebase disintegrated from Google Refine as much as possible for now. I want to explore a few last questions I have about some of the Freebase references that are coded into various classes.
I'm working on a reconciliation service configuration site that will allow users to add sources of data to the reconciliation service. It should be fun.
Wednesday, November 16, 2011
Milestone
I've reached a huge milestone. The service works with Google Refine now. This is excellent and fantastic and everything peachy because it enables me to run test cases so I can ensure it behaves as it should. Right now there's only one additional feature I want to add. Other than that, it's done as far as I'm concerned.
Next: Take away Freebase extension, see if I can hard-code it to look for the company's service only.
Friday, November 11, 2011
Progress!
The single query works well. The multiple query is... just being difficult, but it should resolve itself soon. Next to do:
Add type-based help for matching.
Take out Freebase stuff. There's no reason why anyone needs to use the Freebase features at my company.
I should have this all done in a little less than a month's time. :-)
Wednesday, November 9, 2011
Reconciliation Service Tomcat Installation Notes
Note to self: class12.jar needs to be in the lib directory. After it's put in there, Tomcat needs to be restarted so that the jar gets added to the classpath.
Now to work on matching/scoring for the single item query.
Also - remember that multiple queries aren't working yet, so stop trying to run that!!!
I'm running on maybe 1 hr of sleep, so I'm a little loopy today. Yay, end-of-semester time combined with bad health that makes working on stuff harder (who doesn't love having a repetitive stress injury that just gets worse from doing anything related to everything I need to do?).
Monday, November 7, 2011
Progress and To-Do
I have the GSON for the results working for the single query, kind of. Right now I just have it able to match a single result. I need to create a class that is an array of them.
I need to figure out how I want to handle multiple-query results (these are much more common in practical use).
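A rough sketch of how the result classes could be laid out, assuming the response shapes Refine expects: a single query answers with {"result": [...]}, and a multiple query answers with one such object per query key ("q0", "q1", ...). The class names, the answerAll helper, and the lookup function are hypothetical. Gson serializes objects field by field, so the field names below would become the JSON keys.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: class and method names are assumptions.
class ReconResultSketch {

    // One candidate match for a query.
    static class Candidate {
        String id;
        String name;
        List<String> type = new ArrayList<>();
        double score;
        boolean match;

        Candidate(String id, String name, double score, boolean match) {
            this.id = id;
            this.name = name;
            this.score = score;
            this.match = match;
        }
    }

    // The answer to one query: serializes as {"result": [ ... ]}.
    static class QueryResult {
        List<Candidate> result = new ArrayList<>();
    }

    // Multiple-query mode: one QueryResult per query key, kept in request
    // order. The lookup function stands in for the real search.
    static Map<String, QueryResult> answerAll(
            Map<String, String> queries,
            Function<String, List<Candidate>> lookup) {
        Map<String, QueryResult> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : queries.entrySet()) {
            QueryResult qr = new QueryResult();
            qr.result.addAll(lookup.apply(e.getValue()));
            out.put(e.getKey(), qr);
        }
        return out;
    }
}
```

The single-query case is then just one QueryResult, and the multiple-query case is a map of them, so the same matching code can serve both.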
Friday, November 4, 2011
JSON versus JSONP with a GSON example
JSON is a data format used for passing data between webapps.
An example of JSON:
{
  "name" : "Freebase Reconciliation Service",
  "identifierSpace" : "http://rdf.freebase.com/ns/type.object.mid",
  "schemaSpace" : "http://rdf.freebase.com/ns/type.object.id",
  "view" : {
    "url" : "http://www.freebase.com/view{{id}}"
  },
  "preview" : {
    "url" : "http://www.freebase.com/widget/topic{{id}}?mode=content",
    "width" : 430,
    "height" : 300
  },
  "suggest" : {
    "property" : "http://standard-reconcile.freebaseapps.com/suggest_property"
  },
  "defaultTypes" : [
    {
      "id" : "/people/person",
      "name" : "Person"
    },
    {
      "id" : "/location/location",
      "name" : "Location"
    }
  ]
}
Browsers apply a same-origin policy to JSON fetched by script: a page can normally only request it (via XMLHttpRequest) from the domain the page itself came from. The restriction is there to guard against cross-site attacks, like what was exploited like crazy when I was a young one just learning Java. But that's not always convenient. Sometimes you just need to get JSON-based data from a 3rd-party site (after all, JSON has advantages over other methods), so what to do?
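JSONP was developed to solve this. JSONP looks almost exactly like JSON, even in name: it is the same data wrapped in a call to a function that the requester names. Because the response is loaded as a script rather than fetched as data, the same-origin restriction doesn't apply. An example of JSONP where "foo" is passed in the request as the callback value
(e.g. http://www.mydomain.com/mywebapp?callback=foo):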
foo({
  "name" : "Freebase Reconciliation Service",
  "identifierSpace" : "http://rdf.freebase.com/ns/type.object.mid",
  "schemaSpace" : "http://rdf.freebase.com/ns/type.object.id",
  "view" : {
    "url" : "http://www.freebase.com/view{{id}}"
  },
  "preview" : {
    "url" : "http://www.freebase.com/widget/topic{{id}}?mode=content",
    "width" : 430,
    "height" : 300
  },
  "suggest" : {
    "property" : "http://standard-reconcile.freebaseapps.com/suggest_property"
  },
  "defaultTypes" : [
    {
      "id" : "/people/person",
      "name" : "Person"
    },
    {
      "id" : "/location/location",
      "name" : "Location"
    }
  ]
})
So if you're like me and think GSON is neat and want to use it, but need the data packaged in JSONP format, it's rather easy to do:
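// Needs: com.google.gson.Gson, com.google.gson.GsonBuilder, java.io.PrintWriter
// Okay, it doesn't have to have the setPrettyPrinting,
// but it helps make it readable when troubleshooting!
Gson gson = new GsonBuilder().setPrettyPrinting().create();
String JSONstr = gson.toJson(object2beJSONized);
// If the request named a callback, wrap the JSON in a call to it (JSONP).
String callback = request.getParameter("callback");
if (callback != null) {
    JSONstr = callback + "(" + JSONstr + ");";
}
response.setCharacterEncoding("UTF-8");
// JSONP is JavaScript, so label it as such; plain JSON stays application/json.
response.setContentType(callback != null ? "application/javascript" : "application/json");
response.setStatus(200);
PrintWriter out = response.getWriter();
out.print(JSONstr);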
Ta-da~!
I'm posting this because I really wish there had been an explanation this easy somewhere online.
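One caution with the wrapping step: the callback value comes straight from the request, so whatever is sent gets echoed back as script. Below is a minimal sketch of a safer wrapper that only accepts plain (optionally dotted) JavaScript identifiers and otherwise falls back to plain JSON. The JsonpSketch class and its method names are made up for illustration, and the identifier pattern is an assumption about what callers send.

```java
import java.util.regex.Pattern;

// Hypothetical helper: class and method names are assumptions.
class JsonpSketch {

    // Accept only plain JavaScript identifiers, optionally dotted,
    // e.g. "foo" or "jQuery.cb123".
    private static final Pattern SAFE_CALLBACK = Pattern.compile(
            "[A-Za-z_$][A-Za-z0-9_$]*(\\.[A-Za-z_$][A-Za-z0-9_$]*)*");

    static boolean isSafeCallback(String callback) {
        return callback != null && SAFE_CALLBACK.matcher(callback).matches();
    }

    // Returns JSONP when a safe callback is supplied,
    // otherwise the JSON string unchanged.
    static String wrap(String json, String callback) {
        if (!isSafeCallback(callback)) {
            return json;
        }
        return callback + "(" + json + ");";
    }

    // Content type that matches what wrap() produces for this callback.
    static String contentType(String callback) {
        return isSafeCallback(callback)
                ? "application/javascript"
                : "application/json";
    }
}
```

In the servlet, the string from gson.toJson(...) would go through wrap(...) before being printed, and contentType(...) would feed response.setContentType(...).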
To do for today
Get reconciliation service working - apparently Google Refine makes POST requests... I had programmed it around GET, oops!
It looks like callbacks may be the issue; see how the vitro system handles them.
Google Refine 2.5 is nice, and I have the code checked out for it. It allows data to be pasted in from the clipboard, and for some odd reason I think that's a plus.
Wednesday, November 2, 2011
More thoughts...
To do:
Get rid of namespace reconciliation feature - it's entirely Freebase-based.
Strip down features to what is useful per Rich (the difference between cell operations and column operations is confusing).
Good news!
- Reconciliation service on tomcat is able to spit out metadata all happy-like. It shouldn't be too bad to extend, but I want to wait until everything else is done first.