
2004-01-13

Frustrated with the lack of open source recommendation systems (there are none, really), and having had experience with NetPerceptions, a high-end, super-expensive ($25,000 per year) commercial recommendation system, I'm about to begin writing my own. I've already written a bare-bones recommendation system for my job, but it had no concept of affinity groups; it just queried the database and built a list of what else had interested the people who had expressed interest in the current item. Like I said, very basic, but it works well.
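For the curious, here is roughly what that kind of bare-bones lookup could look like once the data is in memory instead of behind a database query. This is only a sketch, not the code from my job: the `Id` and `Interests` types and the `alsoInterestedIn` function are names made up for illustration, and it assumes interest is a simple yes/no per user and item.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

using Id = std::uint64_t;  // hypothetical opaque ID type

// Assumed layout: interests[user] = set of items that user expressed interest in.
using Interests = std::unordered_map<Id, std::unordered_set<Id>>;

// For every user interested in `current`, count the other items they were
// interested in. Returns (item, count) pairs, highest count first.
std::vector<std::pair<Id, int>> alsoInterestedIn(const Interests& interests, Id current) {
    std::unordered_map<Id, int> counts;
    for (const auto& entry : interests) {
        const std::unordered_set<Id>& items = entry.second;
        if (items.count(current) == 0) continue;  // this user never touched `current`
        for (Id item : items) {
            if (item != current) ++counts[item];
        }
    }
    std::vector<std::pair<Id, int>> ranked(counts.begin(), counts.end());
    // Sort by count descending; break ties by item ID so the order is stable.
    std::sort(ranked.begin(), ranked.end(),
              [](const std::pair<Id, int>& a, const std::pair<Id, int>& b) {
                  return a.second != b.second ? a.second > b.second : a.first < b.first;
              });
    return ranked;
}
```

Note that this treats every user's interest equally, which is exactly the "no affinity groups" limitation described above.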

This one, however, will have to be in C++: I have no intention of writing something in C and dealing with its lack of objects, plus I haven't written C for at least 5 years now. Ironically, I've never written a large-scale C++ project. Heh.

The guts of the project are pretty straightforward: based on a whole lot of data, figure out correlations in said data. Fortunately, all of this can be fairly well abstracted -- the application need not know you're dealing with users and products, since everything can be keyed on some arbitrary number, thus allowing the whole thing to be used for pretty much any purpose.
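One way the "arbitrary number" idea might play out: the engine only ever sees integers, and whoever calls it keeps a small table that maps its own keys (usernames, SKUs, whatever) to those integers and back. A minimal sketch, assuming string keys on the caller's side; the `IdMap` class is hypothetical and not part of any existing design.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical caller-side helper: hands out one stable number per distinct key.
class IdMap {
public:
    // Returns the existing ID for `key`, or assigns the next free one.
    std::uint64_t idFor(const std::string& key) {
        auto it = ids_.find(key);
        if (it != ids_.end()) return it->second;
        std::uint64_t id = names_.size();
        ids_.emplace(key, id);
        names_.push_back(key);
        return id;
    }

    // Reverse lookup, for turning engine results back into caller keys.
    // Throws std::out_of_range if `id` was never assigned.
    const std::string& nameOf(std::uint64_t id) const { return names_.at(id); }

    std::size_t size() const { return names_.size(); }

private:
    std::unordered_map<std::string, std::uint64_t> ids_;
    std::vector<std::string> names_;
};
```

With something like this in front of it, the engine itself never needs to know whether ID 7 is a person, a product, or a recipe.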

OK... so the end result must:
  • Handle "users" and "products". I'll use 'users' and 'products' to describe things, but note that this won't be limited strictly to users and products: use whatever you'd like as its data.
  • For a given 'user', record his/her likes and dislikes on a binary scale: either the user likes it or they don't like it.
  • For a given 'user', record his/her likes and dislikes on a Likert scale: 0=hate, 1=mildly despise, 2=it's OK, 3=average, 4=pretty cool, 5=OMG I'M CREAMING MY PANTS THINKING ABOUT IT
  • For a given 'product', figure out who likes it
  • For a given 'product', figure out who dislikes it
  • For a given 'user', find out what other 'users' share similar interests (affinity groups)
  • For a given 'user', find out what else the 'user' may like (affinity groups) that they have not expressed any interest in
  • For a given 'product', find out what 'users' may be interested in it (affinity groups)
  • Log all results to a database, but keep the results in memory as well. The key advantage of NP (NetPerceptions) was that it was blindingly fast, mostly because it did all this in memory
  • listen() on a given port and accept queries (SELECTs and INSERTs and UPDATEs), return results
Pretty ambitious, eh? Yeah, I think so too.
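To make the rating and affinity items in that list a bit more concrete, here is a rough in-memory sketch. Nothing here is settled: the `Ratings` class and its method names are invented for illustration, the "like" cutoff of 3 is an assumption, and the similarity measure (cosine over co-rated products, with scores centered on the scale midpoint of 2.5) is just one plausible choice, not necessarily what NetPerceptions or the final project would use. Database logging and the network listener are left out entirely.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Id = std::uint64_t;   // hypothetical opaque ID for users and products
const double kMid = 2.5;    // midpoint of the 0..5 scale (assumed centering point)

class Ratings {
public:
    // Store a score, clamped to 0..5. A binary like/dislike could be
    // recorded as 5 or 0 (an assumption, not a decided design).
    void rate(Id user, Id product, int score) {
        byUser_[user][product] = std::max(0, std::min(5, score));
    }

    // Users who like (score >= 3) or dislike (score < 3) `product`.
    std::vector<Id> usersWho(Id product, bool like) const {
        std::vector<Id> out;
        for (const auto& u : byUser_) {
            auto it = u.second.find(product);
            if (it != u.second.end() && (it->second >= 3) == like) out.push_back(u.first);
        }
        return out;
    }

    // Centered cosine similarity over products both users rated; range -1..1.
    // Returns 0 when either user is unknown or they share no rated products.
    double similarity(Id a, Id b) const {
        auto ia = byUser_.find(a), ib = byUser_.find(b);
        if (ia == byUser_.end() || ib == byUser_.end()) return 0.0;
        double dot = 0, na = 0, nb = 0;
        for (const auto& pr : ia->second) {
            auto it = ib->second.find(pr.first);
            if (it == ib->second.end()) continue;
            double x = pr.second - kMid, y = it->second - kMid;
            dot += x * y; na += x * x; nb += y * y;
        }
        return na > 0 ? dot / std::sqrt(na * nb) : 0.0;
    }

    // Products `user` has not rated, weighted by what similar users thought
    // of them. Only positively similar users count; best candidates first.
    std::vector<Id> recommend(Id user) const {
        std::map<Id, double> score;
        auto me = byUser_.find(user);
        for (const auto& other : byUser_) {
            if (other.first == user) continue;
            double sim = similarity(user, other.first);
            if (sim <= 0) continue;
            for (const auto& pr : other.second) {
                bool seen = me != byUser_.end() && me->second.count(pr.first) > 0;
                if (!seen) score[pr.first] += sim * (pr.second - kMid);
            }
        }
        std::vector<std::pair<double, Id>> ranked;
        for (const auto& s : score)
            if (s.second > 0) ranked.push_back(std::make_pair(-s.second, s.first));
        std::sort(ranked.begin(), ranked.end());  // negated score: ascending = best first
        std::vector<Id> out;
        for (const auto& r : ranked) out.push_back(r.second);
        return out;
    }

private:
    std::map<Id, std::map<Id, int>> byUser_;  // user -> (product -> score)
};
```

Finding which users might want a given product would be the same idea turned sideways (compare products by who rated them). Everything above walks every user on each call, so it is nowhere near "blindingly fast" yet; it only shows the shape of the queries.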


© 1996-2010.