Thursday, November 29, 2012

Can the flaws in the MCMC algorithm give some insight on stereotyping?

Stereotyping = guessing based on prior assumptions. Tends to be more accurate than guessing at random.
Why bad? People become less likely to recognize individual differences and more likely to focus on group differences, which may be small in magnitude. Humans are limited: focus on group differences => less attention to individual differences. (Side note: is less attention to individual differences necessarily bad? Or only in individualistic cultures like the US?)

Flaw of MCMC algorithm: when features are highly correlated, successive samples are highly correlated too, so the chain explores slowly and any finite run is biased toward wherever it started. Positive feedback loop, occasionally broken by chance. Slow convergence. Similar to stereotyping? Less accurate, and tends to "jump to conclusions".
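
To make the "slow convergence" point concrete, here is a minimal Python sketch, assuming the simplest correlated case: Gibbs sampling from a standard bivariate normal with correlation `rho` (the function names are illustrative, not from any library). For this target the x-chain behaves like an AR(1) process with coefficient rho², so the lag-1 autocorrelation printed below should come out near rho²: close to 1 when the features are highly correlated, meaning each sample barely moves from the previous one.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_steps, seed=0):
    """Gibbs-sample (x, y) from a standard bivariate normal with correlation rho.

    The conditionals are x | y ~ N(rho * y, 1 - rho**2) and y | x ~ N(rho * x, 1 - rho**2).
    Returns the sequence of x values.
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x, y = 3.0, 3.0  # deliberately start away from the mean (0, 0)
    xs = []
    for _ in range(n_steps):
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        xs.append(x)
    return xs


def lag1_autocorrelation(xs):
    """How strongly each sample resembles the one right before it."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((v - mean) ** 2 for v in xs) / n
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1)) / n
    return cov / var


for rho in (0.1, 0.9, 0.99):
    xs = gibbs_bivariate_normal(rho, 20000)
    print(f"rho={rho}: lag-1 autocorrelation of x = {lag1_autocorrelation(xs):.3f}")
```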

Wednesday, November 28, 2012

random thoughts

Random ideas:

Is it feasible to scan entire books without flipping any pages?
- Approach: Powerful beam of focused light that passes through the pages. Focal length can be changed to provide more data points. Page boundaries can be discerned by the difference in absorbance of paper, laminate and ink.
- Difficulties: Thick hard-cover books would be quite opaque. Difficult for sensor to discern print on pages that face each other. Noise due to paper grain.
- Easier workarounds: Mechanical page flipper. In fact, it probably already exists.
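
A back-of-the-envelope sketch of why thick books are a problem: if each page lets some fixed fraction of the beam through unscattered, the fractions multiply, so the usable signal drops off exponentially with page count (a Beer-Lambert-style model). The per-page transmittance of 0.5 below is a made-up number for illustration, not a measured property of paper.

```python
def transmitted_fraction(per_page_transmittance, n_pages):
    """Fraction of the beam that passes straight through n identical pages.

    Assumes each page independently transmits the same fraction (Beer-Lambert style),
    ignoring light that scatters and later re-enters the beam.
    """
    return per_page_transmittance ** n_pages


# Hypothetical: each page lets half of the light through unscattered.
for pages in (1, 10, 100, 300):
    print(pages, transmitted_fraction(0.5, pages))
```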

VCG mechanism (which is truthful as a direct revelation mechanism (DRM) in the domain of non-negative valuations, non-positive external contributions and quasi-linear utilities) is not collusion-proof. It's truthful because telling the truth is a (weakly) dominant strategy, and hence a Nash equilibrium. However, a Nash equilibrium need not be subgame perfect. But interestingly, it seems that Nash equilibrium, if we assume for simplicity that the action set is continuous and differentiable, only considers the first and second derivatives of the multi-dimensional mapping between utilities and utilities (for each outcome rule, assuming a dominant strategy exists for all players and they play it). Wait, it's not a derivative of a function - it's a derivative of many functions. Darn. I'm wondering if there is a many-to-many mapping equivalent of a Hessian determinant. If there is, it would probably be completely useless, but still kinda cool. The real solution might not even give any insight, since non-integral indices don't mean anything at all in this scenario.
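
For reference, a minimal Python sketch of VCG with the Clarke pivot rule over a small finite set of outcomes, assuming quasi-linear utilities (`vcg` and `utility_of_bid` are hypothetical names). With a single item, VCG reduces to a second-price (Vickrey) auction, and a brute-force check over integer bids shows that no misreport beats the truth. It does not model collusion.

```python
def vcg(outcomes, valuations):
    """Choose the welfare-maximizing outcome and charge Clarke pivot payments.

    valuations[i][o] is agent i's *reported* value for outcome o.
    Agent i pays the harm it imposes on the others:
    (others' best possible welfare without i) - (others' welfare at the chosen outcome).
    """
    agents = range(len(valuations))

    def welfare(o, group):
        return sum(valuations[j][o] for j in group)

    chosen = max(outcomes, key=lambda o: welfare(o, agents))
    payments = []
    for i in agents:
        others = [j for j in agents if j != i]
        best_without_i = max(welfare(o, others) for o in outcomes)
        payments.append(best_without_i - welfare(chosen, others))
    return chosen, payments


def utility_of_bid(true_value, bid, other_bids):
    """Single-item auction: outcome o means 'agent o gets the item'. Agent 0 is us."""
    bids = [bid] + list(other_bids)
    outcomes = list(range(len(bids)))
    valuations = [[b if o == i else 0 for o in outcomes] for i, b in enumerate(bids)]
    chosen, payments = vcg(outcomes, valuations)
    return (true_value if chosen == 0 else 0) - payments[0]


print(vcg([0, 1, 2], [[10, 0, 0], [0, 7, 0], [0, 0, 3]]))  # (0, [7, 0, 0])
truthful = utility_of_bid(10, 10, [7, 3])
print(all(utility_of_bid(10, b, [7, 3]) <= truthful for b in range(21)))  # True
```

With bids 10, 7 and 3, the first bidder wins and pays 7, the second-highest bid, and no other bid from 0 to 20 gives that bidder more than the truthful utility of 3.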

My first-world problem: watching a YouTube video on developing countries when I have a slow internet connection, and feeling guilty and annoyed at the same time when it stops to buffer.

Also, reading the Communist Manifesto while listening to "Do You Hear the People Sing?" is pretty awesome.

I vaguely remember that I would once have used words other than "a mapping from the set of combinations of clothes to days" in a dinner conversation. Nowadays it feels like a completely normal and natural way of saying something that everyone can understand. Thanks, Charter Club. (I think the words would have been "a choice of outfit", but I'm not sure my mouth has ever uttered those words either.)

Quora is amazing. But I think the problem of people repeating arguments that already have a set of good counter-arguments still hasn't been adequately solved. I want to learn more about what it takes to build a large information system, and it's looking rather intimidating. There is the computational component, the incentive component, and the business component. I mean, holy crap, it's a hard problem. But I still think it's absolutely worth solving. Branch is doing nicely at letting people debate issues that branch out, and Stack Exchange is doing well on the community-edited answers front. In fact, Stack Overflow is fantastic at search - if the top thread on Google isn't the question you're asking, one of the "related questions" probably is. It's amazing and worth emulating. Stack Overflow is solving one of the problems of forums. But debates aren't handled very well, and the available solutions still disappoint me. What I would like to see is more scoring of responses to arguments. Quora allows embedded quotes. That is fantastic - I would like the whole internet to allow embedded quotes, so that attribution is easy.

Seriously though, can we have embedded quotes for the whole internet? What would it take? First problem: webpages change. We can't archive everything - too much space needed. A browser tool for generating a citation on the fly? That would be nice, but seems a bit intrusive. Yeah, attributing a source in a user-friendly yet informative and structured way is hard. Reproducing the source wholesale could probably do it. Sounds like a job that Turnitin has already figured out, though, so the infrastructure definitely already exists.
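
One middle ground between archiving everything and archiving nothing is to store just the quoted text with a hash and a timestamp, and later check whether the quote still appears on the live page. The sketch below is hypothetical (`Citation`, `make_citation` and `still_matches` are illustrative names, and the URL is a placeholder) and uses only the Python standard library; fetching pages is left out, so page text is passed in directly.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Citation:
    url: str
    retrieved_at: str  # ISO 8601 timestamp of when the quote was taken
    quote: str         # the exact quoted text (far smaller than archiving the page)
    digest: str        # SHA-256 of the quote, so tampered copies can be detected


def make_citation(url, page_text, quote):
    """Record a quote, refusing quotes that don't actually appear on the page."""
    if quote not in page_text:
        raise ValueError("quote does not appear in the page")
    digest = hashlib.sha256(quote.encode("utf-8")).hexdigest()
    return Citation(url, datetime.now(timezone.utc).isoformat(), quote, digest)


def still_matches(citation, current_page_text):
    """True if the stored quote is unaltered and still appears on the (possibly edited) page."""
    intact = hashlib.sha256(citation.quote.encode("utf-8")).hexdigest() == citation.digest
    return intact and citation.quote in current_page_text


c = make_citation("https://example.com/post", "The quick brown fox jumps.", "brown fox")
print(still_matches(c, "A brown fox sleeps."))  # True: page changed, quote survived
print(still_matches(c, "A red fox sleeps."))    # False: the quoted text is gone
```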

"Ideas" are difficult to handle - they're so context-dependent. A probabilistic approach is good at figuring out the topic, but is it good at figuring out the meaning?

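One concrete reason topic is easier than meaning: many probabilistic topic models, such as LDA, treat a document as a bag of words and ignore word order. This tiny Python sketch (`bag_of_words` is an illustrative name) shows two sentences with opposite meanings that look identical under that representation.

```python
from collections import Counter


def bag_of_words(sentence):
    """Word counts only: order, and with it much of the meaning, is thrown away."""
    return Counter(sentence.lower().split())


print(bag_of_words("Dog bites man") == bag_of_words("Man bites dog"))  # True
```
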
I wonder what hardwired parameters our brains have. What is the "corpus" of a human being? It would be interesting if we could get hold of all the sensory input a child receives from birth onwards. What does the baby know about the world when it's born, and how much of it was learnt from experiences in the womb? (Alas, hard to know.)

Implementing an MCMC algorithm for Bayesian reasoning is pretty interesting. There was one case where all the nodes reinforced each other, and the algorithm took a really long time to converge, because the chance of the algorithm "changing its mind" about the central node's value was very small. It makes me wonder if that is somehow related to religious beliefs - strongly reinforcing nodes. But that requires a certain model of reasoning.
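
A minimal Python sketch of the "strongly reinforcing nodes" situation, assuming a toy model rather than the actual network from the assignment: a star of ±1 nodes whose joint weight is exp(coupling × center × sum(leaves)), sampled with Gibbs updates (`count_center_flips` is a hypothetical name). As the coupling grows, the center node should change its value far less often.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def count_center_flips(coupling, n_leaves=10, n_sweeps=5000, seed=0):
    """Gibbs-sample a star of +/-1 nodes in which every leaf 'agrees' with the center.

    Joint weight: exp(coupling * center * sum(leaves)); larger coupling = stronger reinforcement.
    Returns how many times the center node changed its value.
    """
    rng = random.Random(seed)
    center = 1
    leaves = [1] * n_leaves
    flips = 0
    for _ in range(n_sweeps):
        # Center given its neighbours: P(center = +1) = sigmoid(2 * coupling * sum(leaves)).
        p_plus = sigmoid(2.0 * coupling * sum(leaves))
        new_center = 1 if rng.random() < p_plus else -1
        flips += new_center != center
        center = new_center
        # Leaves are independent given the center: P(leaf = +1) = sigmoid(2 * coupling * center).
        p_leaf = sigmoid(2.0 * coupling * center)
        leaves = [1 if rng.random() < p_leaf else -1 for _ in leaves]
    return flips


for coupling in (0.1, 0.5, 1.0):
    print(coupling, count_center_flips(coupling))
```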

Thoughts on a model of reasoning (I'm sure the cog-psy peeps have figured out a large part of it, but I guess I can always ramble). People can hold on to inconsistent beliefs. People only reason some of the time, and believe what they're told most of the time.

Managed to explain the concept of virtual memory to someone who has never programmed before, and also managed to explain Bayesian networks to a Hellenistic Studies major. I feel somewhat accomplished.

Monday, November 19, 2012

Warm fuzzy

I'm really moved today.

I'm in Charter Club (one of the eating clubs at Princeton), and honestly, I haven't been super "on" about it (I miss saying "on" and having people understand that it also means zealous). But today, we voted on changing the gendered lyrics of the Charter song. To me, using gendered terms isn't offensive, but it sounds unnecessarily antiquated. A portion of the club, about 150 of us, sat down and participated in an open moderated debate for and against changing the lyrics of the song. Each point was well made, good considerations were brought up, and important clarifications were raised. Having 150 people discuss a controversial issue openly, calmly and respectfully is not something I see all the time, and I'm super proud of everyone in the club for making this possible.

Rodrigo, the club president (soon to hand over to the next president), cried during his last speech. His term was marked by changes to the member selection mechanism (between a subjective bicker system and an open points system). He cried when he made the point about how our openness and respect make us special, and even though the speech was short and to the point, it's one of the most moving speeches I've ever heard.

-----------------

Also, man, Facebook is so cool, I really hope I can get in!!!

Thursday, November 15, 2012

O hai

So many people have actually serious blogs where they blog about one particular thing. In fact, almost everyone I physically meet now who has a blog keeps one where they talk about a topic others might be interested in, like programming or design or startups or law or non-profits. Hopefully people find whatever random thoughts I have to be interesting enough, because I can't be counted on to write anything else here.
-------------------------------------------------

With the obligatory meta-blogging out of the way, here goes:

Thoughts on virtualization:
So I'm currently learning about operating systems (OS). One very important thing the OS does is give each program* the illusion that it is the only program running on the computer: each program "thinks" it has all 4 GB (or however much) of RAM and the entire CPU to itself, even though it's actually sharing them with all the other programs (other than the OS kernel) running on the computer. How does the OS do it? Well, it turns out that at any one time, the state of a program's execution (called the context) can be stored in quite a small amount of space. Every once in a while, the OS can "freeze" the program, save its context, "unfreeze" another program by restoring that program's saved context, and repeat. In this way, each program gets to run some of the time, and is never aware of anything that happens while it is not running.
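
Here is a toy Python simulation of the freeze/save/unfreeze cycle, assuming a made-up "CPU" with one register and a program counter; real kernels save far more state (all registers, the stack pointer, memory mappings), and `Context`, `Process` and `run_round_robin` are hypothetical names. Each program ends up with the same result it would have computed running alone, which is exactly the illusion described above.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Context:
    """The saved state of a paused program: here just a program counter and one register."""
    pc: int = 0
    acc: int = 0


@dataclass
class Process:
    name: str
    program: list  # each "instruction" is a number to add to the accumulator
    context: Context = field(default_factory=Context)

    def done(self):
        return self.context.pc >= len(self.program)


def run_round_robin(processes, time_slice):
    """Run each process for up to `time_slice` instructions, save its context, then switch."""
    trace = []
    ready = deque(processes)
    while ready:
        proc = ready.popleft()
        # "Unfreeze": load the saved context into the simulated CPU registers.
        pc, acc = proc.context.pc, proc.context.acc
        for _ in range(time_slice):
            if pc >= len(proc.program):
                break
            acc += proc.program[pc]
            pc += 1
            trace.append(proc.name)
        # "Freeze": copy the registers back into the process's saved context.
        proc.context = Context(pc, acc)
        if not proc.done():
            ready.append(proc)
    return trace


a = Process("A", [1, 1, 1, 1, 1])
b = Process("B", [10, 10, 10])
print("".join(run_round_robin([a, b], time_slice=2)))  # AABBAABA
print(a.context.acc, b.context.acc)                    # 5 30
```

With a time slice of 2, the two programs are interleaved as AABBAABA, yet A still sums to 5 and B to 30, just as if each had had the CPU to itself.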

But if we stop there, other programs could still leave stuff lying around in memory, which would break the illusion that each program is the only one running. So the OS kernel does something sneaky: it never exposes the actual memory to the program. Each program gets a virtual address space, so that whenever it looks at or changes something in its space, the kernel (with help from the hardware) translates the location the program asked for into a real location and shows the program what that real location contains. To the program, whatever it is looking for is exactly where it thinks it is. The OS kernel has to do this very carefully; otherwise the program would get confused, or even worse, the kernel might accidentally allow the program to touch something it is not supposed to touch.
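
The translation step can be sketched with a single-level page table, assuming 4 KiB pages and a plain dictionary standing in for the hardware MMU and the kernel's real data structures (`translate` and `PageFault` are hypothetical names). Two processes can use the same virtual address and land on different physical frames, and an access the page table doesn't permit raises a fault instead of touching someone else's memory.

```python
PAGE_SIZE = 4096  # bytes; assumed 4 KiB pages


class PageFault(Exception):
    """Raised when a process touches memory its page table doesn't allow."""


def translate(page_table, virtual_address, write=False):
    """Map a virtual address to a physical one using a per-process page table.

    page_table maps virtual page number -> (physical frame number, writable flag).
    """
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    if vpn not in page_table:
        raise PageFault(f"page {vpn} is not mapped for this process")
    frame, writable = page_table[vpn]
    if write and not writable:
        raise PageFault(f"page {vpn} is read-only")
    return frame * PAGE_SIZE + offset


proc_a = {1: (7, True)}  # virtual page 1 -> physical frame 7
proc_b = {1: (3, True)}  # same virtual page, different physical frame
print(translate(proc_a, 0x1010))  # 28688 (frame 7, offset 16)
print(translate(proc_b, 0x1010))  # 12304 (frame 3, offset 16)
```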

The nice thing about this world is that if the OS is done right, no program "feels" that its requests take a long time, even though fulfilling a request might actually take a million times longer than the program has ever spent running (I think about that when I'm lining up at the DMV). The bad thing is that you're all alone, and the only way to talk to anyone else is by making a phone call to "the system".

So suppose you could stop time and never die: you could actually give everyone on earth the illusion that they are alone! Freeze time for everyone, let one person move once in a while, keep everyone else packed in a small place, and swap them around from time to time.

Then truly, "Only through our love and friendship, er, system calls, can we create the illusion for the moment that we're not alone".

* Technically, it's a process/thread, but the details aren't that important here.


Some notes on entropy, and an interesting consequence

Entropy is a measure of "disorder", but it's also useful to think of it as a measure of how little we know about a system - these are equivalent ways of looking at the same thing.

Suppose you have a room full of stuff. The usual thing to say is that a messy room has more entropy than a neat room, because the messy room is more "disordered". But we can also look at this from the information perspective. Imagine you have a robot that can reconstruct your room item by item. Now you tell your robot, "Construct a messy room." (Assume that the robot also understands what "messy" and "neat" mean. This will become important later.) How much would you know about the room the robot has built if you are not allowed to look at it?

Well, not very much, because the robot has a great deal of freedom in putting things in different places. You have no way to know where any particular thing is going to be, and even after you learn where one thing is, you still don't know where all the other things are. You are, in fact, so clueless about the messy room that the robot would need to give you a lot of information before you really knew where everything is. In other words, saying that a room is messy doesn't tell you very much about it - that's what high entropy means (at least from an information perspective). In contrast, if you told the robot to construct a neat room, you would already know, before even looking, that the books are on the shelves and the tables and chairs sit at angles not too far from right angles - saying that a room is neat tells you quite a bit about it. That's what makes it low entropy. Another way of saying it is that entropy measures (the logarithm of) the number of ways you can make up something that agrees with a particular description. "Messy room" is less descriptive than "neat room", so there are more ways to make a room messy than there are to make it neat.
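
To put numbers on "more ways to make a room messy", here is a small Python sketch assuming a toy room: 10 items, 100 possible spots, items allowed to share spots, and a "neat" room where each item has 5 acceptable spots (all made-up figures). When every consistent arrangement is equally likely, the information needed to pin one down is log2 of the number of arrangements, the bit-valued analogue of Boltzmann's S = k_B ln Ω.

```python
import math


def entropy_bits(num_arrangements):
    """Bits needed to single out one arrangement among equally likely ones: log2(count)."""
    return math.log2(num_arrangements)


# Hypothetical toy room: 10 items, 100 possible spots (items may share a spot).
n_items, n_spots = 10, 100
messy_ways = n_spots ** n_items    # "messy": any item can be anywhere
neat_spots_per_item = 5            # "neat": each item has ~5 acceptable spots
neat_ways = neat_spots_per_item ** n_items

print(f"messy: {entropy_bits(messy_ways):.1f} bits")  # 10 * log2(100) ≈ 66.4
print(f"neat:  {entropy_bits(neat_ways):.1f} bits")   # 10 * log2(5)   ≈ 23.2
```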

This ties two fundamental ideas in physics and computer science together. It turns out that it sets absolute physical limits on the information density of a chip, or of anything at all (though in practice, this density limit is about as relevant to chip designers today as the speed-of-light limit is to spacecraft builders). Here's the argument: suppose you make a chip of some given mass and size that holds a lot of information; let's call it an "uber-chip". When I say, "Here's an uber-chip," I am allowing for many ways to make up an uber-chip. Perhaps I flip the last bit, perhaps I flip the middle two, or perhaps I alternate the bits... and so on. This means that chips of arbitrarily high information density also have arbitrarily high entropy. Now, when you drop an object into a black hole, the black hole's entropy increases by an amount set by the object's mass and size, not by what information it holds. Suppose you drop an uber-chip into a black hole. You've started with something of arbitrarily high entropy and converted it into a fixed, finite amount of entropy - decreasing the total entropy of the universe in the process. The second law of thermodynamics says you can't do that, so no uber-chip is possible.
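
The argument above is essentially Bekenstein's, and the resulting limit is known as the Bekenstein bound: a sphere of radius R containing total energy E holds at most S ≤ 2πk_B R E/(ħc) of entropy, or 2πRE/(ħc ln 2) bits. A short Python sketch evaluating it, with the constants hardcoded from CODATA (the function name is illustrative):

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J*s (CODATA 2018)
C = 299_792_458.0       # speed of light in vacuum, m/s (exact)


def bekenstein_bound_bits(mass_kg, radius_m):
    """Maximum information (in bits) inside a sphere of the given radius and mass.

    Bekenstein bound: S <= 2*pi*k_B*R*E / (hbar*c) with E = m*c**2;
    dividing by k_B*ln(2) converts entropy to bits.
    """
    energy = mass_kg * C ** 2
    return 2 * math.pi * radius_m * energy / (HBAR * C * math.log(2))


print(f"{bekenstein_bound_bits(1.0, 1.0):.3e} bits")  # about 2.577e+43
```

For 1 kg inside a 1 m radius this comes to roughly 2.6 × 10^43 bits, many orders of magnitude beyond any real storage device, which is why chip designers don't need to worry about it yet.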

And I think that's pretty cool.