Computers: To Do

2004-01-13

Frustrated with the lack of open source recommendation systems (there are none, really), and having had experience with NetPerceptions, a high-end, super-expensive ($25,000 per year) commercial recommendation system, I'm about to begin writing my own. I've already written a sort of bare-bones recommendation system for my job, but it had no concept of affinity groups; it just queried the database for whatever item was currently being viewed and built a list of what else the people interested in that item had expressed interest in. Like I said, very basic, but it works well.
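To make the bare-bones approach concrete, here's a rough in-memory sketch of that "people interested in this were also interested in..." lookup. The real thing was a database query; `InterestLog`, `add`, and `also_liked` are made-up names for illustration only, and the two maps simply stand in for the database table.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

using Id = std::uint32_t;  // hypothetical: people and items are just numbers

// Hypothetical in-memory stand-in for a table of (person, item) interests.
struct InterestLog {
    std::unordered_map<Id, std::unordered_set<Id>> items_by_user;
    std::unordered_map<Id, std::unordered_set<Id>> users_by_item;

    void add(Id user, Id item) {
        items_by_user[user].insert(item);
        users_by_item[item].insert(user);
    }

    // Everything else that people interested in `item` were interested in,
    // as (item, number of those people), most common first.
    std::vector<std::pair<Id, int>> also_liked(Id item) const {
        std::unordered_map<Id, int> counts;
        auto fans = users_by_item.find(item);
        if (fans != users_by_item.end()) {
            for (Id user : fans->second) {
                auto owned = items_by_user.find(user);
                assert(owned != items_by_user.end());  // add() keeps both maps in sync
                for (Id other : owned->second) {
                    if (other != item) ++counts[other];
                }
            }
        }
        std::vector<std::pair<Id, int>> ranked(counts.begin(), counts.end());
        std::sort(ranked.begin(), ranked.end(),
                  [](const std::pair<Id, int>& a, const std::pair<Id, int>& b) {
                      // Higher count first; break ties by smaller id.
                      return a.second != b.second ? a.second > b.second
                                                  : a.first < b.first;
                  });
        return ranked;
    }
};
```

Note that this only counts co-occurrences. It has no idea which people resemble each other, which is exactly the missing "affinity group" piece.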

This one, however, will have to be in C++: I have no intention of writing something in C and dealing with its lack of objects, plus I haven't written C in at least 5 years. Ironically, I've never written a large-scale C++ project. Heh.

The guts of the project are pretty straightforward: given a whole lot of data, figure out the correlations in it. Fortunately, all of this can be abstracted fairly well -- the application need not know it's dealing with users and products, since everything can be keyed on some arbitrary number, which lets the whole thing be used for pretty much any purpose.
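One way the "everything is just a number" idea might look in C++ is sketched below. The caller maps whatever it has (usernames, SKUs, song titles) to small integers at the edge, and the engine itself only ever sees those integers. `IdRegistry`, `intern`, and `name_of` are hypothetical names, not part of any existing design.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using EntityId = std::uint32_t;  // the only thing the engine would ever see

// Hypothetical helper that hands out one number per distinct external name.
class IdRegistry {
public:
    // Return the existing id for `name`, or assign the next free one.
    EntityId intern(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        EntityId id = static_cast<EntityId>(names_.size());
        ids_.emplace(name, id);
        names_.push_back(name);
        return id;
    }

    // Translate an id back into the caller's name when showing results.
    const std::string& name_of(EntityId id) const {
        assert(id < names_.size());
        return names_[id];
    }

private:
    std::unordered_map<std::string, EntityId> ids_;
    std::vector<std::string> names_;
};
```

With something like this, one registry could hold "users" and another "products", or bands and listeners, without the correlation code changing at all.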

OK... so the end result must:

Handle "users" and "products". I'll use 'users' and 'products' to describe things, but note that this won't be limited strictly to users and products: use whatever you'd like as its data.
For a given 'user', record his/her likes and dislikes on a binary scale: either the user likes it or they don't like it.
For a given 'user', record his/her likes and dislikes on a Likert scale: 0=hate, 1=mildly despise, 2=it's OK, 3=average, 4=pretty cool, 5=OMG I'M CREAMING MY PANTS THINKING ABOUT IT
For a given 'product', figure out who likes it
For a given 'product', figure out who dislikes it
For a given 'user', find out what other 'users' share similar interests (affinity groups)
For a given 'user', find out what else they may like that they have not yet expressed any interest in (affinity groups)
For a given 'product', find out what 'users' may be interested in it (affinity groups)
Log all results to a database, but keep the results in memory as well. The key to NP (NetPerceptions) was that it was blindingly fast, mostly because it did all of this in memory
listen() on a given port, accept queries (SELECTs, INSERTs and UPDATEs), and return results
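As a starting point for the in-memory half of that list, here's a minimal sketch. Everything in it is an assumption rather than a settled design: the `RatingStore` name and its methods are invented, binary likes and dislikes are mapped onto the ends of the 0-5 scale, scores of 4-5 count as "likes" and 0-1 as "dislikes", and similarity is a plain agreement count (+1 per product two users rate the same way, -1 per disagreement), not a proper correlation measure and not whatever NetPerceptions does. The database logging, the network listener, and the "who might want this product" query are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Id = std::uint32_t;           // 'users' and 'products' are just numbers
using Scored = std::pair<Id, int>;  // (id, score)

class RatingStore {
public:
    // Likert rating: 0 (hate) through 5 (love).
    void rate(Id user, Id product, int score) {
        assert(score >= 0 && score <= 5);
        by_user_[user][product] = score;
        by_product_[product][user] = score;
    }
    // Binary ratings, mapped onto the ends of the scale (an assumption).
    void like(Id user, Id product) { rate(user, product, 5); }
    void dislike(Id user, Id product) { rate(user, product, 0); }

    std::vector<Id> fans(Id product) const { return raters(product, +1); }
    std::vector<Id> detractors(Id product) const { return raters(product, -1); }

    // Other users with a positive agreement score, best match first.
    std::vector<Scored> similar_users(Id user) const {
        std::map<Id, int> agree;
        auto mine = by_user_.find(user);
        if (mine == by_user_.end()) return {};
        for (const auto& rating : mine->second) {  // product -> my score
            int my_side = side(rating.second);
            if (my_side == 0) continue;
            for (const auto& other : by_product_.at(rating.first)) {
                if (other.first != user)
                    agree[other.first] += my_side * side(other.second);
            }
        }
        return ranked(agree);
    }

    // Products the user has not rated that similar users like,
    // weighted by how similar those users are.
    std::vector<Scored> recommend(Id user) const {
        std::map<Id, int> weight;
        auto mine = by_user_.find(user);
        if (mine == by_user_.end()) return {};
        for (const Scored& peer : similar_users(user)) {
            for (const auto& rating : by_user_.at(peer.first)) {
                bool unseen = mine->second.count(rating.first) == 0;
                if (unseen && side(rating.second) > 0)
                    weight[rating.first] += peer.second;
            }
        }
        return ranked(weight);
    }

private:
    // +1 = likes, -1 = dislikes, 0 = neutral. The cut-offs are an assumption.
    static int side(int score) { return score >= 4 ? 1 : (score <= 1 ? -1 : 0); }

    // Users whose rating of `product` falls on the wanted side, in id order.
    std::vector<Id> raters(Id product, int wanted) const {
        std::vector<Id> out;
        auto it = by_product_.find(product);
        if (it == by_product_.end()) return out;
        for (const auto& r : it->second)
            if (side(r.second) == wanted) out.push_back(r.first);
        return out;
    }

    // Keep positive scores only; sort by score descending, then id ascending.
    static std::vector<Scored> ranked(const std::map<Id, int>& scores) {
        std::vector<Scored> out;
        for (const auto& s : scores)
            if (s.second > 0) out.push_back(Scored(s.first, s.second));
        std::sort(out.begin(), out.end(), [](const Scored& a, const Scored& b) {
            return a.second != b.second ? a.second > b.second : a.first < b.first;
        });
        return out;
    }

    std::map<Id, std::map<Id, int>> by_user_;     // user -> product -> score
    std::map<Id, std::map<Id, int>> by_product_;  // product -> user -> score
};
```

Keeping two indexes (by user and by product) doubles the memory but makes both "who likes this?" and "what does this user like?" a direct lookup, which seems in the spirit of doing everything in memory for speed.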

Pretty ambitious, eh? Yeah, I think so too.