Monday, March 5, 2007

Orkut friends list – a few thoughts

The orkut phenomenon has really caught on with the number of its users reaching a whopping 45 million. The other day, Tanmay said its an innovation in communication (or did he say a new communication technology?) – that might be stretching it a bit too far, but then again, for an application with 45 million users I couldn’t dare to disagree (at least not there and then- but I might return with an interesting argument later :-) , the clichéd argumentative Indian that I am...).

Orkut represents an interesting aspect of the general “ambivertedness” of the human nature. Standing right in the middle of our desire of privacy in communication (which e-mail services provide us with – if I mail a person A, nobody else needs to know about that) and our desire for recognition. Orkut is an in-between since (a) you usually do not expect a complete stranger to scrap you (for the ladies- I used the word usually) – so there IS some amount of privacy (b) if an acquaintance happens to drop by, you want a uber-chic profile – the desire for recognition. Also you don’t mind a friend reading the scraps written to you by another friend.

The friends-list is a source for some interesting ideas. But to talk about things in a precise way lets talk about graphs first. (You can skip this paragraph if you know what they are). All we need to know for now is that a graph is a diagram like this:

Dots and lines. Graphs are of profound importance in mathematics since a lot of situations can be modeled as graphs (let the dots be cities, the lines be roads – so the above diagram can be believed to represent cities and the interconnecting roads etc.), and any theorem that’s true for the graphs can be carried to the original situations to gain deeper insight. We will call the dots and lines, nodes and edges respectively. Also, we say a graph is connected if you can find a path of lines/edges between any two nodes. The above graph is connected. The following one isn’t:

since there is no path between b and a.

Back to orkut now. The friends list can be modeled as a graph with nodes being people and the edges (though they do not exist materially) as “friendships”. We will call this graph the “friend-graph”.

Thoughts on the friend-graph:

[1] Orkut claims that you are connected to 45 million people (you see this message when you log in). Connected means if you pick up any 2 people amongst the users, you will always be able to find a chain of friends between them (or, a path in the friend-graph), right? Now 45 million is a big number and I was not too sure that all the users are connected. So I created a second profile only to have orkut claim that I was connected to 45 million people through 0 friends. Clearly not true – I was expecting to be told that I was connected to zero people. Or rather, let me put it in this way: orkuts’ definition of connectedness is not the same as ours. What Orkut means is you are potentially connected to 45 million users.

[2]Someone mentioned that friends list is a tree. Its not. A tree is a special kind of graph that looks like this:

It’s precisely defined as a graph that doesn’t have circuits (circuit is a path that leads to the starting point). Orkut has circuits. Noticed that “common friends” list when you visit a profile (say a friend named x)- which is supposed to show the friends that are common to the profile and you? Let’s assume a common friend is y. Visualize the orkut friend-graph now – there is an edge between you and x. And there is path with the edges: you->y, y -> x (since y is friend of x too). So we have the circuit: you->x->y->you. Thus the orkut friends list is definitely not a tree.

[3] The friend graph is also a dynamic graph i.e. one in which the number of nodes change (new users, deleted profiles) and the number of edges change (new friends)

[4] When a new user registers, the count of the total number of users does not go up immediately. Orkut changes the count periodically, not immediately.

[5] It would be interesting to know how many people you are really connected to, through chains of friends i.e. the number of nodes in he connected component of the graph you are a part of. But orkut doesn’t disclose this data :(

[6] I think it would be quite interesting to find out what is the minimum number of those crucial profiles in your connected component, which if deleted, breaks the component into two disconnected parts (and do what?)

[7] Anyone for finding out the degrees of separation? The famous Milgram experiment claimed that any 2 people in the US are separated by six acquaintances in general. That was in 1967. I would love to know what the average dergree of separation on Orkut is.

[Gaurav said he wanted something more from the article. Hence this message to the non-geeks : ignore this paragraph and the following one.


Parankush and I were actually planning to find out the degrees of separation on Orkut. The quick and dirty implementation plan was to use a screen-scraping package like PERL Mechanize, scrape all the friend-graph related data onto the local disk, and then do whatever we wish to do with the data ( like boast about it, run algos to find out the minimal cut-sets, the average degree of separation etc etc). Some interesting challenges were (a) we would need a way to merge findings from my program and his (since we were planning to run the codes over our machines, separately, and we didn’t want redundant work. (b) savepoints: to protect ourselves against power failures, and proxy server downtimes we needed a way to commit a savepoint periodically (or, detect when the laptop switches to the battery backup, so that a save point is registered automatically then) (c) how do we pack such a volume of data on our hard drives . The biggest problem however was time : we had absolutely no idea how long we would have to run our codes (days? Weeks?) to scrape all the data using our not-so-reliable net connections and other interrupting factors. Larger the amount of time required to run the algos, more a dynamic graph changes – a fact which prompted Parankush to ask me the Tao of all questions: “Why are we doing this?” As a reply to which we retired to our monotonous lives.]