So, it’s a research project, then? What’s it do?

October 24th, 2009

I have mentioned in a couple of places (like, say, the main Mondegreen page) that Mondegreen is a research project. I’m not sure I ever explained what I am looking into, so I’d like to take a moment to elaborate on that, answering a couple of seemingly unrelated questions in the process.

First and foremost, since my school has the words ‘social psychology’ in its name, it was necessary for me to, you know, do a bit of that. Therefore, although Mondegreen looks like it’s a research project on bots (or, at the very least, chatterbots), this is not entirely true. I mean, I find the whole concept fascinating and I’m gonna devote some of my time to that in the future (I have an initial draft prepared of something that, for now, I’m going to keep calling ‘iteration2’), but that’s not the meat of the matter. What I want to find out is basically this: if I allow people – random people from the Internet, not specifically trained for the task, mind you – unrestricted means of teaching the bot everything they please, what will they do with that power? While I hate quoting myself, the question boils down to this: “Will you mold it into a courteous gentleman, an unlikeable, meme-spewing monstrosity, or something in between?” I am not going to harm the experiment by telling you what the answer seems to be, so you will just have to talk to it and find out for yourself.

As for the data I’m gathering, I’ve found that, however small the database is (basically just questions, responses, timestamps, ratings and who said what), it allows me to do some pretty robust things with what’s in it. I am currently considering making the stats available to everyone, but, again, I’m not sure how that is going to impact the experiment. I’m gonna have to clear that up come Tuesday.

Also, the current issue with the Mondegreen dataset is that there are many answers, but most of them have not been rated often. I am working on a solution to fix that, and I hope to have it ready by the end of this week, or at the start of the next one. The tool is gonna be pretty simple and – I hope – straightforward; it’s just that I have to divide my time between Mondegreen and other things. I am optimistic about this, however, and I will let you know as soon as this goes live. You will probably notice that first, though.
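To give a rough idea of what "not rated often" means in practice, here's a minimal Python sketch. It is not the actual Mondegreen code: the `Pair` record, its field names and the `least_rated` helper are all made up for illustration, loosely following the fields listed earlier (questions, responses, timestamps, ratings and who said what). The idea is simply to surface the pairs with the fewest ratings so they can be shown to people first.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Pair:
    """One question-response pair. All field names are hypothetical."""
    question: str
    response: str
    author: str        # who said what
    created: datetime  # timestamp of when the pair was taught
    ratings: list = field(default_factory=list)  # e.g. +1 / -1 votes


def least_rated(pairs, limit=3):
    """Return up to `limit` pairs with the fewest ratings.

    Ties are broken by age, oldest first, so long-neglected
    answers get looked at before fresh ones.
    """
    ranked = sorted(pairs, key=lambda p: (len(p.ratings), p.created))
    return ranked[:limit]
```

Feeding it the whole dataset and asking for a handful of pairs at a time would be one way to decide which answers to put in front of raters next; whether ties should go to the oldest pairs is a design choice, not something the experiment depends on.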

Until then, keep chatting! We’ve reached over 1 thousand users and over 20 thousand unique question-response pairs! I thank you for your support and hope you’ll stick around to see for yourself where the experiment takes us :)

Introduction

October 16th, 2009

So. Mondegreen.

Where do I begin? I assume all of you readers have already checked it out (if you haven’t, do check Mondegreen out; I mean, it’s not even gonna take long, I’ll wait right here) and realized it’s one of those programs that pretend to have an intelligent conversation with you, at least until you ask it something the author clearly did not intend, and the illusion breaks. There are plenty of those, some better, some worse; there’s even a Loebner Prize that’s awarded each year to the chatterbots that do the best job of pretending to be human. So, why make more if they’re still gonna screw up every time some joker asks them about their preference of mudkips?

The thing, as cliche as it sounds, is that our robots are different. And I suppose this is as good a time as any to tell you about the advantages and disadvantages of how Mondegreen operates.

The core idea behind Mondegreen is crowdsourcing. Bots such as ALICE, and others based around AIML, rely on an intricately woven system of templates, pattern recognition, programmed-in knowledge of what can be substituted for what, etc. Obviously, you can’t just tell Joe Random “hey, go ahead and talk to the bot, just remember to use your <srai>* tags where appropriate! Also, here’s the database of nouns, if you could help fill that, that’d be peachy. Okay have fun now!”; you have to simplify, and the simpler the better.

Simply put, Mondegreen is, on one level, much dumber than ALICE, since it doesn’t process context AT ALL but, to compensate for that, it learns much, much faster than any AIML bot, and by “much faster”, I mean “it actually learns anything at all on its own”. The unfortunate side effect of that rapid and uncritical learning process is that, depending on how tame a community gathers around Mondegreen, you’re just as likely to end up with a bemonocled scholar as with a raging idiot.
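If you’d like to see the general idea in code, here’s a minimal Python sketch of a context-free, crowd-taught bot. To be clear, this is an illustration under my own assumptions and not how Mondegreen is actually implemented: the `CrowdBot` class, its method names, the `normalize` helper and the simple vote-sum scoring are all hypothetical. It remembers whatever responses people teach it for a given question, lets them vote, and answers with one of the top-scored responses, paying no attention to anything said earlier in the conversation.

```python
import random
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivially different inputs match."""
    return " ".join(text.lower().split())


class CrowdBot:
    """Hypothetical context-free bot that learns only what users teach it."""

    def __init__(self):
        # question -> {response: summed votes}
        self.memory = defaultdict(dict)

    def teach(self, question, response):
        """Store a response uncritically; it starts with a score of 0."""
        self.memory[normalize(question)].setdefault(response, 0)

    def rate(self, question, response, vote):
        """Add a vote (e.g. +1 or -1) to a response the bot already knows."""
        known = self.memory.get(normalize(question), {})
        if response in known:
            known[response] += vote

    def reply(self, question):
        """Pick one of the best-scored responses, or None if nothing was taught."""
        known = self.memory.get(normalize(question))
        if not known:
            return None
        best = max(known.values())
        return random.choice([r for r, score in known.items() if score == best])
```

Note how little stands between the crowd and the bot’s mouth here: anything taught is immediately eligible as a reply, and only ratings push the better answers to the top. That is exactly why the character of the community matters so much.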

So, which one is it gonna be? That is entirely up to you, the denizens of the Internet. What are you going to do with it?

*This is even more unfortunate when you are either Polish, or know any language in which “srai” sounds EXACTLY like a command: “take a dump”