Tuesday, November 29, 2011

Updates and more thoughts

I added a nearest neighbor algorithm to find misspellings. I think I'm going to take out some of the pattern matching and just use the nearest neighbor scoring. It seems to give better results, and it should work better when terms are very short. (A mistakenly given id name of "C2" getting matched to a hundred different things that are all very long isn't that useful.)
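
For illustration, here is a minimal sketch of what nearest neighbor scoring for misspellings can look like. It is not the actual code: it assumes plain Levenshtein edit distance as the metric, normalized by the longer string's length, and the names (edit_distance, similarity, nearest_neighbors) and the k and min_score parameters are all made up for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep a match)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Score in [0, 1]; 1.0 means identical (ignoring case).

    Dividing by the longer length is an assumption of this sketch.
    It is what keeps a short term from scoring well against a
    much longer candidate.
    """
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def nearest_neighbors(term, candidates, k=3, min_score=0.5):
    """Return up to k (candidate, score) pairs, best score first."""
    scored = [(similarity(term, c), c) for c in candidates]
    kept = [(s, c) for s, c in scored if s >= min_score]
    kept.sort(key=lambda sc: (-sc[0], sc[1]))  # ties broken alphabetically
    return [(c, s) for s, c in kept[:k]]


if __name__ == "__main__":
    # Hypothetical candidate names, only for demonstration.
    names = ["C2", "C3", "C2 domain-containing protein kinase"]
    print(nearest_neighbors("C2", names))
```

With the length normalization, a short term like "C2" only pairs up with other short candidates, and the long names that merely contain it fall below the cutoff.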

I also want to find out whether I can have the reconciliation happen in a new, autogenerated column, for the purpose of building a synonym table.

And I really hate mold. I feel so sick.
