Monday, March 5, 2007

Orkut friends list – a few thoughts

The Orkut phenomenon has really caught on, with the number of its users reaching a whopping 45 million. The other day, Tanmay said it’s an innovation in communication (or did he say a new communication technology?). That might be stretching it a bit too far, but then again, for an application with 45 million users I didn’t dare disagree (at least not there and then, but I might return with an interesting argument later :-), the clichéd argumentative Indian that I am...).

Orkut represents an interesting aspect of the general “ambivertedness” of human nature. It stands right in the middle, between our desire for privacy in communication (which e-mail services provide – if I mail a person A, nobody else needs to know about it) and our desire for recognition. Orkut is an in-between since (a) you usually do not expect a complete stranger to scrap you (for the ladies: I used the word usually), so there IS some amount of privacy, and (b) if an acquaintance happens to drop by, you want an uber-chic profile, which is the desire for recognition. Also, you don’t mind a friend reading the scraps written to you by another friend.

The friends list is a source of some interesting ideas. But to talk about things in a precise way, let’s talk about graphs first. (You can skip this paragraph if you know what they are.) All we need to know for now is that a graph is a diagram like this:

Dots and lines. Graphs are of profound importance in mathematics since a lot of situations can be modeled as graphs (let the dots be cities and the lines be roads, so the above diagram can be read as cities and the roads connecting them), and any theorem that’s true for graphs can be carried over to the original situation to gain deeper insight. We will call the dots nodes and the lines edges. Also, we say a graph is connected if you can find a path of edges between any two nodes. The above graph is connected. The following one isn’t:

since there is no path between b and a.
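For readers who prefer code to diagrams, here is a minimal sketch of the connectedness check in Python. The two graphs below are made-up examples written as adjacency lists (they are not the graphs from the diagrams above), and `is_connected` is just a name chosen for illustration.

```python
from collections import deque

def is_connected(graph):
    """Return True if every node can be reached from every other node."""
    if not graph:
        return True
    start = next(iter(graph))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    # Connected means the walk from one node reached all of them.
    return len(seen) == len(graph)

# Hypothetical example graphs: each node maps to the nodes it shares an edge with.
connected = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
disconnected = {"a": ["c"], "c": ["a"], "b": ["d"], "d": ["b"]}

print(is_connected(connected))     # True
print(is_connected(disconnected))  # False: no path between b and a
```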

Back to Orkut now. The friends list can be modeled as a graph with people as the nodes and “friendships” as the edges (though the edges do not exist materially). We will call this graph the “friend-graph”.

Thoughts on the friend-graph:

[1] Orkut claims that you are connected to 45 million people (you see this message when you log in). Connected should mean that if you pick any two people from amongst the users, you will always be able to find a chain of friends between them (or, a path in the friend-graph), right? Now 45 million is a big number and I was not too sure that all the users are connected. So I created a second profile, only to have Orkut claim that I was connected to 45 million people through 0 friends. Clearly not true – I was expecting to be told that I was connected to zero people. Or rather, let me put it this way: Orkut’s definition of connectedness is not the same as ours. What Orkut means is that you are potentially connected to 45 million users.

[2] Someone mentioned that the friends list is a tree. It’s not. A tree is a special kind of graph that looks like this:

It’s precisely defined as a connected graph that doesn’t have circuits (a circuit is a path that leads back to its starting point). Orkut has circuits. Noticed the “common friends” list when you visit a profile (say, of a friend named x), which is supposed to show the friends that you and that profile have in common? Let’s assume a common friend is y. Visualize the Orkut friend-graph now: there is an edge between you and x. And there is a path with the edges you->y, y->x (since y is a friend of x too). So we have the circuit: you->x->y->you. Thus the Orkut friends list is definitely not a tree.
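The circuit argument can be checked mechanically. Below is a rough sketch, assuming an undirected graph stored as adjacency lists with no duplicate edges; the function name `has_circuit` and the tiny graphs are made up for illustration.

```python
def has_circuit(graph):
    """Detect a circuit in an undirected graph with a depth-first search."""
    visited = set()
    for root in graph:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, None)]          # (node, the node we arrived from)
        while stack:
            node, parent = stack.pop()
            for neighbour in graph[node]:
                if neighbour == parent:
                    continue            # the edge we just walked along
                if neighbour in visited:
                    return True         # reached a seen node by a second route
                visited.add(neighbour)
                stack.append((neighbour, node))
    return False

# You, your friend x, and the common friend y.
friends = {"you": ["x", "y"], "x": ["you", "y"], "y": ["you", "x"]}
# A small tree for comparison.
tree = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}

print(has_circuit(friends))  # True: you->x->y->you
print(has_circuit(tree))     # False
```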

[3] The friend-graph is also a dynamic graph, i.e. one in which the number of nodes changes (new users, deleted profiles) and the number of edges changes (new friends).

[4] When a new user registers, the count of the total number of users does not go up immediately. Orkut changes the count periodically, not immediately.

[5] It would be interesting to know how many people you are really connected to through chains of friends, i.e. the number of nodes in the connected component of the graph you are a part of. But Orkut doesn’t disclose this data :(
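If the friend-graph data were available, counting the people you are really connected to would be a short walk over the graph. A minimal sketch, with an entirely made-up friend-graph and hypothetical names:

```python
from collections import deque

def component_of(graph, start):
    """Return the set of nodes reachable from start, including start itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

# Made-up data: "new_profile" has no friends yet, like my second profile.
friends = {
    "me": ["asha"],
    "asha": ["me", "ravi"],
    "ravi": ["asha"],
    "new_profile": [],
    "p": ["q"],
    "q": ["p"],
}

# Subtract 1 so that you do not count yourself.
print(len(component_of(friends, "me")) - 1)           # 2
print(len(component_of(friends, "new_profile")) - 1)  # 0
```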

[6] I think it would be quite interesting to find out the minimum number of crucial profiles in your connected component which, if deleted, would break the component into two disconnected parts (and do what?).
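Graph theory calls such a set of nodes a vertex cut. The sketch below finds a smallest one by brute force: it tries every single node, then every pair, and so on. It assumes the input graph is connected to begin with, and it is only usable on toy graphs since the number of subsets explodes. The names and the example graph are made up.

```python
from itertools import combinations

def reachable(graph, start, removed):
    """Nodes reachable from start when the removed nodes are deleted."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in graph[node]:
            if neighbour not in seen and neighbour not in removed:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen

def smallest_cut(graph):
    """Return a smallest set of nodes whose deletion disconnects the graph."""
    nodes = list(graph)
    # Leave at least two nodes behind, otherwise nothing can be disconnected.
    for size in range(1, len(nodes) - 1):
        for candidate in combinations(nodes, size):
            removed = set(candidate)
            remaining = [n for n in nodes if n not in removed]
            if len(reachable(graph, remaining[0], removed)) < len(remaining):
                return removed
    return None  # e.g. everyone is friends with everyone else

# Two groups of friends held together by a single profile, "hub".
friends = {
    "a": ["b", "hub"], "b": ["a", "hub"],
    "hub": ["a", "b", "c", "d"],
    "c": ["hub", "d"], "d": ["hub", "c"],
}
print(smallest_cut(friends))  # {'hub'}
```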

[7] Anyone for finding out the degrees of separation? Milgram’s famous small-world experiment suggested that any two people in the US are separated by about six acquaintances on average. That was in 1967. I would love to know what the average degree of separation on Orkut is.
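Given the friend-graph, one way to compute this is a breadth-first search from every profile, averaging the shortest path lengths. The sketch below counts separation as the number of edges on the shortest chain and ignores pairs with no chain at all; both choices are my assumptions, and the names and graph are made up.

```python
from collections import deque

def distances_from(graph, start):
    """Shortest number of edges from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def average_separation(graph):
    """Average shortest-path length over all pairs that are connected."""
    total = pairs = 0
    for node in graph:
        for other, d in distances_from(graph, node).items():
            if other != node:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# A chain of four friends: a - b - c - d
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(average_separation(chain))
```

For the chain, the six pairs are at distances 1, 2, 3, 1, 2 and 1, so the average is 10/6, a little under 1.67. On 45 million profiles an all-pairs search like this would be far too slow, and sampling a few thousand starting profiles would be the more realistic plan.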

[Gaurav said he wanted something more from the article. Hence this message to the non-geeks: ignore this paragraph and the following one.


Parankush and I were actually planning to find out the degrees of separation on Orkut. The quick and dirty implementation plan was to use a screen-scraping package like Perl’s WWW::Mechanize, scrape all the friend-graph related data onto the local disk, and then do whatever we wished with the data (like boast about it, run algos to find the minimal cut-sets, the average degree of separation, etc.). Some interesting challenges were: (a) merging: we would need a way to merge findings from my program and his, since we were planning to run the code on our machines separately and didn’t want redundant work; (b) savepoints: to protect ourselves against power failures and proxy server downtimes, we needed a way to commit a savepoint periodically (or detect when the laptop switches to battery backup, so that a savepoint is registered automatically then); (c) storage: how do we pack such a volume of data onto our hard drives? The biggest problem, however, was time: we had absolutely no idea how long we would have to run our code (days? weeks?) to scrape all the data using our not-so-reliable net connections, with other interruptions thrown in. The longer the algos take to run, the more a dynamic graph changes in the meantime – a fact which prompted Parankush to ask me the Tao of all questions: “Why are we doing this?” In reply to which we retired to our monotonous lives.]
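We never wrote any of this, but the savepoint and merge ideas could look roughly like the sketch below. It is in Python rather than Perl, it does no actual scraping, and everything in it is hypothetical: `fetch_friends` stands in for whatever would pull one profile's friends list off the site, and the JSON file layout is made up.

```python
import json
import os
import tempfile

def save_checkpoint(path, visited, frontier, edges):
    """Write to a temp file first, then rename, so a crash mid-write
    is unlikely to leave a half-written savepoint behind."""
    state = {"visited": sorted(visited), "frontier": list(frontier),
             "edges": sorted(map(list, edges))}
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    with open(path) as f:
        state = json.load(f)
    return (set(state["visited"]), state["frontier"],
            {tuple(e) for e in state["edges"]})

def crawl(fetch_friends, start, path, save_every=100):
    """Walk the friend-graph, resuming from the savepoint at path if one exists."""
    if os.path.exists(path):
        visited, frontier, edges = load_checkpoint(path)
    else:
        visited, frontier, edges = set(), [start], set()
    processed = 0
    while frontier:
        person = frontier.pop()
        if person in visited:
            continue
        visited.add(person)
        for friend in fetch_friends(person):
            edges.add(tuple(sorted((person, friend))))  # store each edge once
            if friend not in visited:
                frontier.append(friend)
        processed += 1
        if processed % save_every == 0:
            save_checkpoint(path, visited, frontier, edges)
    save_checkpoint(path, visited, frontier, edges)
    return edges

def merge(edges_a, edges_b):
    """Combine the findings of two separate runs; duplicates collapse."""
    return edges_a | edges_b

# A fake "site" so the sketch can run without touching the network.
fake_site = {"me": ["x", "y"], "x": ["me", "y"], "y": ["me", "x", "z"], "z": ["y"]}
with tempfile.TemporaryDirectory() as folder:
    found = crawl(fake_site.__getitem__, "me", os.path.join(folder, "save.json"))
print(sorted(found))
```

Storing every edge as a sorted pair is what makes the merge trivial: two runs that both discovered the same friendship produce the same tuple, and the set union drops the duplicate.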


Tuesday, February 27, 2007

Lines from The Rubaiyat

Some of my favourite lines, from the Rubaiyat:


We are no other than a moving row
Of Magic Shadow-shapes that come and go
Round with the Sun-illumined Lantern held
In Midnight by the Master of the Show;

But helpless Pieces of the Game He plays
Upon this Chequer-board of Nights and Days;
Hither and thither moves, and checks, and slays,
And one by one back in the Closet lays.

--- Omar Khayyam, The Rubaiyat

Return of the Commandline

Here is a search phrase Google accepts:

intext:(google+recruitment+india) intext:(bangalore) filetype:html

This translates to the following more readable form:
“Find pages containing the words Google, recruitment, India in their texts. From amongst these, find the ones that contain the word Bangalore. Finally, from amongst these, get me the pages of type html.”

Though complicated, the above phrase achieves quite a lot in a few words, much like the command line statements UNIX users are accustomed to. In the years before Graphical User Interfaces (GUIs) flooded the software market, this was all we had – plain command lines. With the likes of Google and quite a few other applications encouraging them again (another notable example is yubnub.org), it suddenly seems the command line is slated for a comeback.

GUIs, undoubtedly, have been the dominant form of interface for quite a while. Their greatest advantage over command lines is their ease of use. They are not only easy to learn, but if you have worked on one GUI, you would probably figure out the nuts and bolts of another quite easily. On the other hand, to use command lines efficiently you need prior knowledge of (probably complicated and cryptic) commands (we are talking about a serious learning curve here), with a relatively weaker guarantee that two command line interfaces are going to be similar. Yet the enormous power of manipulation they offer makes them a formidable contemporary of the GUI and a favorite of developers and administrators (and something that Google hackers swear by).

Fundamentally, what GUIs do is tell users which commands to the system are relevant in the current context of work (apart from graphically displaying the context). For example, if you are on your Windows desktop and right-click the mouse, the GUI lists the commands that are relevant there (e.g. “Refresh”, “Arrange Icons”, “New” etc.). When you are reading an MS Word document, the GUI presents a different host of options to choose from (file related options: “Open”, “Save As” etc.; help related options: “Check For Updates” etc.). And when you right-click over an open MS Word document, the GUI does not list the “Arrange Icons” command this time; it treats the command as irrelevant in that particular context. Thus a GUI:


(1) is all about clearly defining a context, with a set of user friendly commands (and the user does not need to know the exact syntax of these commands beforehand)
(2) allows the user to execute a command without actually having him type it in – the command is already out there, on the screen – an effortless mouse click is all that’s needed to execute it. So, life’s good.

It is interesting to think about whether a command line interface also defines a context. It’s very hard to see how, when all you can see is a prompt. What I am driving at is that the GUI was probably the first widely popular paradigm that introduced (or at least popularized) the idea of clearly defining contexts. Of course, we take it for granted now. A command line interface is not bound to a context and thus, in this sense, is freer. It has a no-nonsense attitude that basically says you get what you type (and if you mess up, you got what you deserved!).

There are some drawbacks to GUIs though. Scalability is a big one: a GUI is good when there is a small number of commands for the user to choose from. The bigger the number, the more cumbersome the GUI becomes, up to a point where you realize that a command line substitute might be neater. The nature of this challenge can be guessed when one thinks of implementing the Google search phrase above with a GUI (and this isn’t just about one search phrase, but the whole family of such phrases that Google accepts).

To take another example, let’s look at the following situation and decide whether it would be wiser to opt for a GUI or a command line. I subscribe to a particular newsletter (via mail). I haven’t checked my mail for the past few days, and I plan to store the contents of the newsletters received in that period on my local disk. With a GUI, I would have to mechanically sift through these mails (after I have searched for them and got them listed), open each of them and copy the contents into a local file. And to crown it all, if I am using a trickling dial-up connection, this job can be quite exacting in terms of time, money and patience. I don’t need much convincing to realise that this is where I would be right at home with a good command line interface that I can put inside a script or a pipe or manipulate somehow (or, if this is something that I do regularly, I might even consider writing a screen-scraping script for it – anything to not depend on the GUI). Of course, we might have a GUI to do this for us, but then that would be a different GUI, with specialized functions, which only supports my argument.

Apart from the problem of scalability, the commands displayed by a GUI cannot be used as elements to build “bigger” commands or commands with more functionality (okay, there are macros, but they are more like defining a sequence of actions – more on that in a while). That’s to say, the commands cannot be combined, which restricts a user to the set of commands already provided. Contrast this with the freedom a UNIX shell (the command line interface) provides. Ideas like piping (streaming the output of one command to the input of another, much like the two “intext” criteria in the sample search phrase above, where the pages output by the first “intext” criterion are treated as input to the second) and command substitution (plugging a command’s output into another command) remarkably expand the scope of what existing commands can do. What’s more, users can have these commands executed one after another by listing them in a script, without any intervention needed (returning to macros again, they come close to this behavior, but there’s a lot they cannot do: for example, they usually work only within one application, can’t easily be made to handle errors, etc.). Then there’s the question of exercising your creativity too: piping and command substitution let you “create” your own version of a solution to a problem, in a trouble-free way, which, I feel, adds to the excitement of using them.
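To make the piping idea concrete, here is a small sketch that mimics the search phrase from the beginning of the post as a chain of filters, each one consuming the output of the previous one. This is only my reading of the phrase, not a claim about how Google evaluates it. The `intext` and `filetype` functions and the three pages are made up.

```python
def intext(pages, *words):
    """Pass along only the pages whose text contains every given word."""
    for page in pages:
        text = page["text"].lower()
        if all(word.lower() in text for word in words):
            yield page

def filetype(pages, extension):
    """Pass along only the pages whose URL ends with the given extension."""
    for page in pages:
        if page["url"].endswith("." + extension):
            yield page

# Made-up pages standing in for the web.
pages = [
    {"url": "http://example.com/jobs.html",
     "text": "Google recruitment drive in India: Bangalore office"},
    {"url": "http://example.com/jobs.pdf",
     "text": "Google recruitment India Bangalore"},
    {"url": "http://example.com/news.html",
     "text": "Google recruitment in India, Hyderabad"},
]

# Each stage reads what the previous stage produced, like commands joined by a pipe.
stage1 = intext(pages, "google", "recruitment", "india")
stage2 = intext(stage1, "bangalore")
hits = list(filetype(stage2, "html"))

print([page["url"] for page in hits])  # ['http://example.com/jobs.html']
```

The point is that `intext` is written once and used twice with different arguments. A GUI would need a separate widget for every such combination, while here the combinations come for free.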


Summing it up, the GUI as an innovative interface medium has certainly more than fared well. But the GUI revolution has been here for some time, and now that we have toyed with it enough to discover its not-so-convenient aspects, it’s probably time we analyzed the command line (again) for its strengths and looked for a more promising interface (which would probably be the right combination of the two?).

P.S.:

(a) There are a few other factors with respect to which GUIs and command lines may be compared. Some are:


(1) Resource intensiveness: A command line is not as resource intensive (in terms of memory) as a GUI.
(2) Better Control of OS: Someone who has worked with both the Linux command line and its GUI will tell you that command line provides better control of the OS.
(3) Multitasking: Both command lines and GUIs can multitask. But GUIs enable surprisingly easy monitoring of the tasks (Your web browser is downloading a file, while you are happily typing away into your blog listening to your favorite band playing – and the great part is you can see all of this happening.)

My thoughts in the post are primarily motivated by the usability of an interface.


(b) For those interested, here’s another article on command line vs. GUIs. It mentions a project called Enso at www.humanized.com that aims at integrating a command line into your GUI. That is putting it very vaguely, and to do full justice to the wonderful project you must watch the demo video on their home page. Enso is trying to blur the lines between contexts in a GUI – rather, it tries to define a super-context that’s a bit of both command line and GUI.

(c) A small aside: are tools making us dumber? An interesting post here.
(d) I recently came across this - a desktop UI system called Oberon - quite interesting to look at since it seems to have taken an evolution path different from most contemporary systems.