Paul Buchheit

Saturday, September 12, 2009

Evaluating risk and opportunity (as a human)

Our lives are full of decisions that force us to balance risk and opportunity: should you take that new job, buy that house, invest in that company, swallow that pill, jump off that cliff, etc. How do we decide which risks are smart, and which are dumb? Once we've made our choices, are we willing to accept the consequences?

I think the most common technique is to ask ourselves, "What is the most likely outcome?", and if that outcome is good, then we do it (to the extent that people actually reason through decisions at all). That works well enough for many decisions -- for example, you might believe that the most likely outcome of going to school is that you can get a better job later on, and therefore choose that path. That's the reasoning most people use when going to school, getting a job, buying a house, or making most other "normal" decisions. Since it focuses on the "expected" outcome, people using it often ignore the possible bad outcomes, and when something bad does happen, they may feel bitter or cheated ("I have a degree, now where's my job!?"). For example, most people buying houses a couple of years ago weren't considering the possibility that their new house would lose 20% of its value, and that they would end up owing more than the house was worth.

When advising on startups, I often tell people that they should start with the assumption that the startup will fail and all of their equity will become worthless. Many people have a hard time accepting that fact, and say that they would be unable to stay motivated if they believed such a thing. It seems unfortunate that these people feel the need to lie to themselves in order to stay motivated, but recently I realized that I'm just using a different method of evaluating risks and opportunities.

Instead of asking, "What's the most likely outcome?", I like to ask "What's the worst that could happen?" and "Could it be awesome?". Essentially, instead of evaluating the median outcome, I like to look at the 0.01 percentile and 95th percentile outcomes. In the case of a startup, the worst case outcome is generally that you will lose your entire investment (but learn a lot), and the best case is that you make a large pile of money, create something cool, and learn a lot. (see "Why I'd rather be wrong" for more on this)

Thinking about the best-case outcomes is easy and people do it a lot, which is part of the reason it's often disrespected ("dreamer" isn't usually a compliment). However, too many people ignore the worst case scenario because thinking about bad things is uncomfortable. This is a mistake. This is why we see people killing themselves over investment losses (part of the reason, anyway). They were not planning for the worst case. Thinking about the worst case not only protects us from making dumb mistakes, it also provides an emotional buffer. If I'm comfortable with the worst-case outcome, then I can move without fear and focus my attention on the opportunity.

Considering only the best and worst case outcomes is not perfect of course -- lottery tickets have an acceptable worst case (you lose $1) and a great best case (you win millions), yet they are generally a bad deal. Ideally we would also consider the "expected value" of our decisions, but in practice that's impossible for most real decisions because the world is too complicated and math is hard. If the expected value is available (as it is for lottery tickets), then use it (and don't buy lottery tickets), but otherwise we need some heuristics. Here are some of mine:

Will I learn a lot from the experience? (failure can be very educational)
Will it make my life more interesting? (a predictable life is a boring life)
Is it good for the world? (even if I don't benefit, maybe someone else will)

These things all raise the expected value (in my mind at least), so if they are mostly true, and I'm excited about the best-case outcome, and I'm comfortable with the worst-case outcome, then it's probably a good gamble. (note: I should also point out that when considering the worst-case scenario, it's important to also think about the impact on others. For example, even if you're ok with dying, that outcome may cause unacceptable harm to other people in your life.)

I've been told that I'm extremely cynical. I've also been told that I'm unreasonably optimistic. Upon reflection, I think I'm ok with being a cynical optimist :)

By the way, here's why I chose the 0.01 percentile outcome when evaluating the worst case: Last year there were 37,261 motor vehicle fatalities in the United States. The population of the United States is 304,059,724, so my odds of getting killed in a car accident are very roughly 1/10,000 per year (of course many of those people were teenagers and alcoholics, so my odds are probably a little better than that, but as a rough estimate it's good). Using this logic, I can largely ignore obscure 1/1,000,000 risks, which are too numerous and difficult to protect against anyway.
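The arithmetic behind that estimate can be checked in a few lines of Python. This is only a sketch using the two figures quoted above; the variable names are made up for illustration.

```python
# Figures quoted in the post.
fatalities = 37_261
population = 304_059_724

annual_risk = fatalities / population   # probability per person per year
one_in = population / fatalities        # the same risk in "1 in N" form
percentile = annual_risk * 100          # the same risk as a percentage

print(f"annual risk: about 1 in {one_in:,.0f}")
print(f"as a percentile: {percentile:.4f}")
```

The result is a little worse than 1 in 10,000, which is consistent with calling 1/10,000 a "very rough" estimate and with using roughly the 0.01 percentile as the worst case worth planning for.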

Also see The other half of the story

Friday, April 17, 2009

Make your site faster and cheaper to operate in one easy step

Is your web server using gzip encoding? Surprisingly, many are not. I just wrote a little script to fetch the 30 external links off news.yc and check if they are using gzip encoding. Only 18 were, which means that the other 12 sites are needlessly slow, and also wasting money on bandwidth.
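For illustration, a check like this might look roughly as follows in Python. This is not the script mentioned above, just a minimal sketch using the standard library; the function names are hypothetical. It advertises gzip support in the request and then looks at the Content-Encoding header of the reply.

```python
import urllib.request


def is_gzip_encoded(content_encoding):
    """Return True if a Content-Encoding header value lists gzip."""
    if not content_encoding:
        return False
    tokens = [token.strip().lower() for token in content_encoding.split(",")]
    return "gzip" in tokens


def uses_gzip(url, timeout=10):
    """Request url while advertising gzip support; report whether the reply was gzipped."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_gzip_encoded(response.headers.get("Content-Encoding"))


# Example (makes a real network request, so it is left commented out):
# print(uses_gzip("http://example.com/"))
```

A server that ignores the Accept-Encoding header simply returns the page uncompressed, which is the case being counted as "not using gzip".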

Check your site here.

Some people think gzip is "too slow". It's not. Here's an example (run on my laptop) using data from one of the links on news.ycombinator.com:
$ cat < /tmp/sd.html | wc -c
146117
$ gzip < /tmp/sd.html | wc -c
35481
$ time gzip < /tmp/sd.html >/dev/null
real    0m0.009s
user    0m0.004s
sys    0m0.004s

It took 9ms to compress 146,117 bytes of html (and that includes process creation time, etc), and the compressed data was only about 24% the size of the input. At that rate, compressing 1GB of data would require about 66 seconds of cpu time. Repeating the test with a much larger file yields about 42 sec/GB, so 66 sec is not an unreasonable estimate.
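The same measurement and extrapolation can be sketched in Python. The sample data below is a synthetic stand-in (an assumption, not the page used above), and repetitive synthetic HTML compresses far better than a real page, so only the method carries over, not the numbers. The 66 second figure works out if 1GB is taken as 2^30 bytes.

```python
import gzip
import time


def measure_gzip(data):
    """Return (compressed size / original size, estimated seconds per GB) for one pass."""
    start = time.perf_counter()
    # Level 6 matches the default of the gzip command line tool.
    compressed = gzip.compress(data, compresslevel=6)
    elapsed = time.perf_counter() - start
    ratio = len(compressed) / len(data)
    seconds_per_gb = elapsed * (2**30 / len(data))
    return ratio, seconds_per_gb


# The numbers from the shell session, extrapolated the same way.
post_ratio = 35_481 / 146_117
post_seconds_per_gb = 0.009 * 2**30 / 146_117

# Synthetic stand-in for a page (hypothetical data, far more repetitive than real HTML).
sample = b"<div class='item'><a href='/story'>Example headline</a></div>\n" * 2000
ratio, seconds_per_gb = measure_gzip(sample)

print(f"post:   ratio {post_ratio:.2f}, about {post_seconds_per_gb:.0f} sec/GB")
print(f"sample: ratio {ratio:.3f}, about {seconds_per_gb:.0f} sec/GB")
```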

Inevitably, someone will argue that they can't spare a few ms per page to compress the data, even though it will make their site much more responsive. However, it occurred to me today that thanks to Amazon, it's very easy to compare CPU vs Bandwidth. According to their pricing page, a "small" (single core) instance costs $0.10 / hour, and data transfer out costs $0.17 / GB (though it goes down to $0.10 / GB if you use over 150 TB / month, which you probably don't).

Using these numbers, we can estimate that it would cost $1.88 to gzip 1TB of data on Amazon EC2, and $174 to transfer 1TB of data. If you instead compress your data (and get 4-to-1 compression, which is not unusual for html), the bandwidth will only cost $43.52.

Summary:
with gzip: $1.88 for cpu + $43.52 for bandwidth = $45.40 + happier users

without gzip: $174.00 for bandwidth = $128.60 wasted + less happy users
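The summary can be reproduced with a short calculation. This sketch uses only the figures quoted above (66 sec/GB, $0.10 per instance hour, $0.17 per GB, 4-to-1 compression) and assumes 1TB = 1024GB, which is what the $174 figure implies. The constant names are made up for illustration.

```python
CPU_DOLLARS_PER_HOUR = 0.10       # "small" instance price quoted above
BANDWIDTH_DOLLARS_PER_GB = 0.17   # data transfer out price quoted above
GZIP_SECONDS_PER_GB = 66          # estimate from the timing test
COMPRESSION_FACTOR = 4            # 4-to-1 compression
GB_PER_TB = 1024                  # assumption implied by the $174 figure

cpu_cost = GZIP_SECONDS_PER_GB * GB_PER_TB / 3600 * CPU_DOLLARS_PER_HOUR
raw_bandwidth_cost = GB_PER_TB * BANDWIDTH_DOLLARS_PER_GB
gzip_bandwidth_cost = raw_bandwidth_cost / COMPRESSION_FACTOR
total_with_gzip = cpu_cost + gzip_bandwidth_cost

print(f"cpu to gzip 1TB:        ${cpu_cost:.2f}")
print(f"bandwidth without gzip: ${raw_bandwidth_cost:.2f}")
print(f"bandwidth with gzip:    ${gzip_bandwidth_cost:.2f}")
print(f"total with gzip:        ${total_with_gzip:.2f}")
print(f"wasted without gzip:    ${raw_bandwidth_cost - total_with_gzip:.2f}")
```

The unrounded bandwidth cost is $174.08 rather than $174.00, so the computed waste comes out a few cents higher than the $128.60 in the summary.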

The other excuse for not gzipping content is that your webserver doesn't support it for some reason. Fortunately, there's a simple solution: put nginx in front of your servers. That's what we do at FriendFeed, and it works very well (we use a custom, epoll-based python server). Nginx acts as a proxy -- outside requests connect to nginx, and nginx connects to whatever webserver you are already using (and along the way it will compress your response, and do other good stuff).

Thursday, January 22, 2009

Communicating with code

Some people can sell their ideas with a brilliant speech or a slick powerpoint presentation.

I can't.

Maybe that's why I'm skeptical of ideas that are sold via brilliant speeches and slick powerpoints. Or maybe it's because it's too easy to overlook the messy details, or to get caught up in details that seem very important, but aren't. I also get very bored by endless debate.

We did a lot of things wrong during the 2.5 years of pre-launch Gmail development, but one thing we did very right was to always have live code. The first version of Gmail was literally written in a day. It wasn't very impressive -- all I did was take the Google Groups (Usenet search) code (my previous project) and stuff my email into it -- but it was live and people could use it (to search my mail...). From that day until launch, every new feature went live immediately, and most new ideas were implemented as soon as possible. This resulted in a lot of churn -- we re-wrote the frontend about six times and the backend three times by launch -- but it meant that we had direct experience with all of the features. A lot of features seemed like great ideas, until we tried them. Other things seemed like they would be big problems or very confusing, but once they were in we forgot all about the theoretical problems.

The great thing about this process was that I didn't need to sell anyone on my ideas. I would just write the code, release the feature, and watch the response. Usually, everyone (including me) would end up hating whatever it was (especially my ideas), but we always learned something from the experience, and we were able to quickly move on to other ideas.

The most dramatic example of this process was the creation of content targeted ads (now known as "AdSense", or maybe "AdSense for Content"). The idea of targeting our keyword based ads to arbitrary content on the web had been floating around the company for a long time -- it was "obvious". However, it was also "obviously bad". Most people believed that it would require some kind of fancy artificial intelligence to understand the content well enough to target ads, and even if we had that, nobody would click on the ads. I thought they were probably right.

However, we needed a way for Gmail to make money, and Sanjeev Singh kept talking about using relevant ads, even though it was obviously a "bad idea". I remained skeptical, but thought that it might be a fun experiment, so I connected to that ads database (I assure you, random engineers can no longer do this!), copied out all of the ads+keywords, and did a little bit of sorting and filtering with some unix shell commands. I then hacked up the "adult content" classifier that Matt Cutts and I had written for safe-search, linked that into the Gmail prototype, and then loaded the ads data into the classifier. My change to the classifier (which completely broke its original functionality, but this was a separate code branch) changed it from classifying pages as "adult", to classifying them according to which ad was most relevant. The resulting ad was then displayed in a little box on our Gmail prototype ui. The code was rather ugly and hackish, but more importantly, it only took a few hours to write!

I then released the feature on our unsuspecting userbase of about 100 Googlers, and then went home and went to sleep. The response when I returned the next day was not what I would classify as "positive". Someone may have used the word "blasphemous". I liked the ads though -- they were amusing and often relevant. An email from someone looking for their lost sunglasses got an ad for new sunglasses. The lunch menu had an ad for balsamic vinegar.

More importantly, I wasn't the only one who found the ads surprisingly relevant. Suddenly, content targeted ads switched from being a lowest-priority project (unstaffed, will not do) to being a top priority project, an extremely talented team was formed to build the project, and within maybe six months a live beta was launched. Google's content targeted ads are now a big business with billions of dollars in revenue (I think).

Of course none of the code from my prototype ever made it near the real product (thankfully), but that code did something that fancy arguments couldn't do (at least not my fancy arguments), it showed that the idea and product had real potential.

The point of this story, I think, is that you should consider spending less time talking, and more time prototyping, especially if you're not very good at talking or powerpoint. Your code can be a very persuasive argument.

The other point is that it's important to make prototyping new ideas, especially bad ideas, as fast and easy as possible. This can be especially difficult as a product grows. It was easy for me to stuff random broken features into Gmail when there were only about 100 users and they all worked for Google, but it's not so simple when there are 100 million users.

Fortunately for Gmail, they've recently found a rather clever solution that enables the thousands of Google engineers to add new ui features: Gmail Labs. This is also where Google's "20% time" comes in -- if you want innovation, it's critical that people are able to work on ideas that are unapproved and generally thought to be stupid. The real value of "20%" is not the time, but rather the "license" it gives to work on things that "aren't important". (perhaps I should do a post on "20% time" at some point...)

One of the best ways to enable prototyping and innovation on an established product is through an API. Twitter is possibly the best example of how well this can work. There are thousands of different Twitter clients, with new ones being written every day, and I believe a majority of Twitter messages are entered through one of these third-party clients.

Public APIs enable everyone to experiment with new ideas and create new ways of using your product. This is incredibly powerful because no matter how brilliant you and your coworkers are, there are always going to be smarter people outside of your company.

At FriendFeed, we discovered that our API does more than enable great apps, it also reveals great app developers. Gary and Ben were both writing FriendFeed apps using our API before we hired them. When hiring, you don't have to guess which people are "smart and gets things done", you can simply observe it in the wild :)

In my previous post, I asked people to describe their "ideal FriendFeed". Since then, I've been thinking about ideas for my "ideal FriendFeed". Unfortunately, it's very difficult for me to know how much I like an idea based only on words or mockups -- I really need to try it out. So in the spirit of prototyping, I've used my spare time to write a simple FriendFeed interface that prototypes some of the things I've been thinking about. This interface isn't the "future of FriendFeed", it's just a collection of ideas, some that I like, and some that I don't. One thing that's kind of cool about it (from a prototyping perspective) is that it's written entirely in Javascript running in the web browser -- it's just a single web page that uses FriendFeed's JSON APIs to fetch data. This also means that it's relatively easy for other people to copy and change -- you don't even need a server!

If you'd like to try it out, you can see everyone that I'm subscribed to (assuming their feed is public), or if you are a FriendFeed user, you can see all of your public subscriptions by going to http://paulbuchheit.github.com/xfeed.html#YOUR_NICKNAME_GOES_HERE. The complete source code (which is just several hundred lines of HTML and JS) is here. In this prototype, I'm experimenting with treating entries, comments, and likes all as simple "messages", only showing comments from the user's friends (which can be a little confusing), and putting it all in reverse-chronological order. As I mentioned, this interface isn't the "future of FriendFeed", it's just a collection of ideas that I'm playing with.

If you're interested in prototyping something, feel free to take this code and have your way with it. As always, I'd love to see your prototypes in action!

Tuesday, January 06, 2009

If you're the kind of person who likes to vote...

Now is your opportunity!

FriendFeed was nominated for three "Crunchies". Please vote for us in all three categories:

I can't promise that your vote will end the war, fix the economy, or save the environment (that one is here), but I can promise that your vote might be counted.

Sunday, January 04, 2009

Overnight success takes a long time

For some reason, this weekend has seen a lot of talk about what FriendFeed is/isn't/should be doing (see Louis Gray and others). One person even predicted that we will fail.

I considered writing my own list of complaints about FriendFeed. I think and care about it a lot more than most people, so my list of FriendFeed issues would be a lot longer. I may still do that, but there's something else also worth discussing...

One of the benefits of experience is that it gives some degree of perspective. Of course there's a huge risk of overgeneralizing (someone took a picture!), but with that in mind...

We started working on Gmail in August (or September?) 2001. For a long time, almost everyone disliked it. Some people used it anyway because of the search, but they had endless complaints. Quite a few people thought that we should kill the project, or perhaps "reboot" it as an enterprise product with native client software, not this crazy Javascript stuff. Even when we got to the point of launching it on April 1, 2004 (two and a half years after starting work on it), many people inside of Google were predicting doom. The product was too weird, and nobody wants to change email services. I was told that we would never get a million users.

Once we launched, the response was surprisingly positive, except from the people who hated it for a variety of reasons. Nevertheless, it was frequently described as "niche", and "not used by real people outside of silicon valley".

Now, almost 7 and a half years after we started working on Gmail, I see things like this:

Yahoo and Microsoft have more than 250m users each worldwide for their webmail, according to the comScore research firm, compared to close to 100m for Gmail. But Google's younger service, launched in 2004, has been gaining ground in the US over the past year, with users growing by more than 40 per cent, compared to 2 per cent for Yahoo and a 7 per cent fall in users of Microsoft's webmail.

And that probably isn't counting all of the "Apps for your domain" users. I still have a huge list of complaints about Gmail, by the way.

It would be a huge mistake for me to assume that just because Gmail did eventually take off, then the same thing will happen to FriendFeed. They are very different products, and maybe we just got lucky with Gmail.

However, it does give some perspective. Creating an important new product generally takes time. FriendFeed needs to continue changing and improving, just as Gmail did six years ago (there are some screenshots around if you don't believe me). FriendFeed shows a lot of promise, but it's still a "work in progress".

My expectation is that big success takes years, and there aren't many counter-examples (other than YouTube, and they didn't actually get to the point of making piles of money just yet). Facebook grew very fast, but it's almost 5 years old at this point. Larry and Sergey started working on Google in 1996 -- when I started there in 1999, few people had heard of it yet.

This notion of overnight success is very misleading, and rather harmful. If you're starting something new, expect a long journey. That's no excuse to move slow though. To the contrary, you must move very fast, otherwise you will never arrive, because it's a long journey! This is also why it's important to be frugal -- you don't want to starve to death half the way up the mountain.

Getting back to FriendFeed, I'm always concerned when I hear complaints about the service. However, I'm also encouraged by the complaints, because it means that people care about the product. In fact, they care so much that they write long blog posts about what we should do differently. It's clear that our product isn't quite right and needs to evolve, but the fact that people are giving it so much thought tells me that we are at least headed in roughly the right direction. I would be much more concerned if there were silence and nobody cared about what we are doing -- it would mean that we are "off in the weeds", as they say. Getting this kind of valuable feedback is one of the major benefits of launching early.

If you'd like to contribute (and I hope you do), I'd love to read more of your visions of "the perfect FriendFeed". Describe what would make FriendFeed perfect for YOU, and post it on your blog (or email [email protected] if you don't have a blog -- they create them automatically). Feel free to drop or change features in any way you like. Yes, technically you're doing my work for me, but it's mutually beneficial because we'll do our best to create a product that you like, and even if we don't, maybe someone else will (since the concepts are out there for everyone).

Saturday, January 03, 2009

The question is wrong

On "Coding Horror", Jeff Atwood asked this question:

Let's say, hypothetically speaking, you met someone who told you they had two children, and one of them is a girl. What are the odds that person has a boy and a girl?

He then argues that our intuition leads us to the "wrong" answer (50%) instead of the "correct" answer (2/3 or 67%).

However, the question does not include enough information to determine which of these answers is actually correct, so the only truly correct answer is, "I don't know" or "it depends". I skimmed through the comments on the post (there are about a million), and didn't see anyone addressing this issue (though someone probably did). They mostly argued about BG vs GB for some reason.

The reason that this question is wrong is because it doesn't specify the "algorithm" for posing the question.

If we assume that boys and girls are born with equal probability (50/50, like flipping a coin), then families with two children will have two girls 25% of the time, two boys 25% of the time, and a boy and a girl 50% of the time.

If the algorithm for posing the question is:

1. Choose a random parent that has exactly two children
2. If the parent has two boys, eliminate him and choose another random parent
3. Ask about the odds that the parent has both a boy and a girl

Then we can see that step two eliminated the "two boy" possibility, which leaves the 25% probability of two girls and the 50% probability of both a boy and a girl. Of course probabilities should add up to 100%, so the final probabilities are 25/75 (1/3) for two girls and 50/75 (2/3) for both a boy and girl. This is the "correct" answer described by Jeff, and it occurs because of the elimination performed at step two.
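The elimination can be spelled out by enumerating the four equally likely families. This is a small sketch assuming boys and girls are equally likely, as stated above; the variable names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# The four equally likely two-child families: BB, BG, GB, GG.
families = list(product("BG", repeat=2))

# Step two: families with two boys are eliminated.
kept = [family for family in families if "G" in family]

# Of the families that remain, which have both a boy and a girl?
mixed = [family for family in kept if set(family) == {"B", "G"}]

p_mixed = Fraction(len(mixed), len(kept))
print(p_mixed)  # 2/3
```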

However, if the algorithm for posing the question was instead:

1. Choose a random parent that has exactly two children
2. Arbitrarily announce the gender of one of the children
3. Ask about the odds that the parent has both a boy and a girl

Now we're back to having a 50% probability of there being both a boy and a girl. The difference is that there was no elimination at step two, and simply announcing the gender of one of the children does not affect the gender of the other child or change the probability distribution.
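The same enumeration works for this second procedure, with one added assumption that the text leaves open: "arbitrarily" is read here as picking either child with probability 1/2 and announcing that child's gender. The sketch conditions on the announcement being "girl".

```python
from fractions import Fraction
from itertools import product

announced_girl = Fraction(0)
announced_girl_and_mixed = Fraction(0)

for family in product("BG", repeat=2):   # each family has probability 1/4
    for child in family:                 # assumption: either child is picked with probability 1/2
        weight = Fraction(1, 4) * Fraction(1, 2)
        if child == "G":                 # the announcement was "girl"
            announced_girl += weight
            if set(family) == {"B", "G"}:
                announced_girl_and_mixed += weight

p_mixed_given_girl = announced_girl_and_mixed / announced_girl
print(p_mixed_given_girl)  # 1/2
```

Under this reading, a boy+girl family only announces "girl" half of the time, while a two-girl family always does. That halving of the boy+girl cases is what brings the answer back to 1/2.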

The problem with the question as originally posed was that it didn't specify which of these algorithms was being used. Were we arbitrarily told about the girl, or was a selective process applied?

By the way, if we're applying a selective process, then 100% is also a possibly correct answer, because at step two we could have eliminated all parents that don't have both a boy and a girl. Likewise, all other probabilities are also potentially correct depending on the algorithm applied.

Update: Surprisingly, some people are still thinking that my second algorithm yields 2/3 instead of 1/2 (see the confused discussion on news.yc). I think part of the reason is that I was somewhat imprecise with the concept of "elimination". The second algorithm does not eliminate any of the families, but if I announce that there is a boy, that does eliminate the possibility of two girls. This is where some people are getting lost and thinking that the boy+girl probability has become 2/3. The catch is that announcing the boy also reduced the boy+girl probability by an equal amount, so the result is still the same (it eliminated either BG or GB, I don't know which, but it doesn't matter).

Tuesday, December 30, 2008

communication and collaboration - the big upgrade

The responses to my blog are always a little surprising to me. Yesterday's post didn't have a whole lot of substance, but it did include one good product idea, which is to somehow let other people edit my posts.

Someone on news.yc was not impressed by my idea though:

I was going to disagree with those negative comments below but then read the blog and damn; this guy has a freaking ego to think people would want to edit his ramblings for him in any other way than comical..

Next, I looked at the comments that were on my blog directly. One of the first ones was from Paul Graham, and it said simply:

deniable -> deniability :-)

Well, apparently Paul Graham wants to edit my ramblings, and in a way that would make me look smarter too... I'm pretty sure that he would have just made the correction himself had it been as easy and obvious as leaving a comment, but unfortunately no blog software seems to do that, most especially not Blogger.com.

Last year, someone translated one of my posts into Chinese (and I had Google translate it back).

This all reminds me of one of the blog posts that has been trapped in my head for a long time...

It starts off with something about ants, because my house must have been built on top of a giant anthill or something, because they are continually staging giant invasions and I'm always having to set them on fire or vacuum them up or something. So I'm always thinking about ants, and ants are kind of interesting because, more so than a lot of animals, the individuals are not really viable, and the hive (or colony or whatever) is kind of like a creature of its own (yeah, I know, I'm not the first person to notice this). It even has a short term memory in the form of pheromone trails left on my floor, and I erase those memories with a paper towel and some soapy water. So the ant colony is fairly sophisticated, but each ant's behavior is relatively simple -- they are just following some simple rules and don't really comprehend why or how the colony works. They don't see the "big picture".

And that reminds me of our brains, which are built out of relatively simple neurons. Each neuron simply sums up its inputs, and then generates an output that gets passed along to some other neurons (or something like that, I'm sure it's a huge simplification, but you get the point). Certainly no individual neuron can possibly comprehend what it's doing -- it just cranks along summing up inputs and generating outputs. The magic is in the wires, the connections among the neurons.

Individual humans aren't terribly viable animals either. They almost went extinct not that long ago (100,000 years?). However, since then we've managed to pretty much take over the entire planet and build all kinds of amazing things like airplanes, computers, and burritos. Humans started out kind of similar to other animals (but weaker and less numerous) and then became something fundamentally different. That transition occurred because we are able to communicate and collaborate like nothing else. We can communicate through both time and space. We learn from people who died thousands of years ago on the other side of the planet. Even a survivalist hunter who goes off into the wilderness alone is still relying on all the training and knowledge that was passed on before the journey began.

So in many ways, the human society (or human superorganism) is kind of like the human brain -- the magic is in the connections. Significant advances have occurred when we upgraded the wiring that connects everyone. The inventions of spoken language, written language, and the printing press were all revolutionary because they enabled more sophisticated communication and collaboration.

And now I can ramble on about ants and neurons and stuff, and people all over the world can read it, and digest it, and tell me I'm an idiot, and make their own ideas, and pass them on to other people, and it all happens in a matter of minutes. As much hype and excitement as there has been around the Internet, I think that people may still be misunderestimating its importance. We are literally upgrading the wiring that drives human society.

This is also why I'm excited about things like FriendFeed. The flow of information and influence is rather fundamental to the way our world works. In the past much of that information flow was slow and hierarchical. It had to pass through one of a relatively small number of tightly controlled networks and publishers. But suddenly, the information can come from anywhere, and go anywhere, and it doesn't need anyone's approval. If it's completely random, it won't work any better than a bunch of randomly wired neurons (which I assume isn't very good), but with the right wiring, everyone starts to get the right information for them, and maybe we can stop being so stupid. I'm not yet sure what this new human architecture looks like, but that's what makes it an interesting (and extremely important) problem.

I sometimes think of FriendFeed as a kind of "distributed broadcast channel", but that's just part of the picture. Better collaboration, like having other people edit my blog posts, is another part. It enables each of us to do what we do best, which improves the overall system efficiency and intelligence (and more importantly, I can avoid things that I don't like doing).

Keeping with the brain analogy, it's very likely that we can't even comprehend what's going on. I certainly don't. I'm just a little neuron, summing up my inputs, and then passing the result along to you.

Monday, December 29, 2008

blog, v2

I haven't posted anything here in about eight months, mostly because I've just been very busy, but also because:

Blogging is too hard
I post a lot of things over on FriendFeed, which is easier, and I'm lazy (and you really should subscribe to my FriendFeed if you find anything I post here at all interesting)
I got tired of my blog posts. When I read them, there's something I don't like.

Nevertheless, I sometimes want to share something more than a FriendFeed message or interesting link, so I'm going to give it a second try.

However, I've decided that in the future my posts will be more rambling, and more pointless. I think part of what I don't like about the older posts is that they are sometimes arguing a point or something, but my real point (or my intention, at least) is just to share some kind of idea or thought, not convince anyone of anything. Also, I think this will be a lot easier to write because I can just type a bunch of words and they don't have to fit together in any particular way, and it's also a good excuse to not bother with any editing, so I should be able to crank these things out really fast.

I also have this idea to outsource the writing of my blog posts to someone, ideally everyone. The idea is that I'd write a bunch of stuff and then someone else (maybe wiki-style) would turn it into something coherent and readable. That would save me a lot of time and also provide plausible deniable when I write something that turns out to be especially stupid or offensive. But that's in the future. For now, it will just be a bunch of words that keep going until I get bored or distracted, and then I'll hit "send" :) (I'm also writing these things in Gmail since the blogger interface upsets me)

Wednesday, April 23, 2008

The power of links and the value of global knowledge

Long, long ago, before Google, search engines evaluated and ranked web pages by considering each page in isolation, examining the size of the fonts, the contents of the meta tags, etc. In some cases, it was even possible to "hijack" another site's listings by simply cloning their HTML. Perhaps a few search engines attempted to improve on this with simple tactics such as counting the number of links to a page, but that was generally useless since it's so easy to create "fake" links in order to boost your count.

With PageRank, Google took a very different approach. Instead of considering each page in isolation, they examined the link structure of the entire web and computed a global evaluation of that structure. In other words, they began looking at the entire forest instead of just the individual trees. Google did other things too -- PageRank is just one of many factors, but this general approach of evaluating information in a global context is fundamental to many of the algorithms. These algorithms made it easier for Google to spot which web sites were actually important, and which were just pretenders. Of course Google isn't perfect, and people can still manipulate rankings to some extent, but it was substantially better than the old way, and good enough to form the foundation of what is now a $174 billion company.
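To make the "forest instead of trees" idea concrete, here is a minimal sketch of the textbook power-iteration form of PageRank. It is not Google's actual code: the `pagerank` function name, the 0.85 damping factor, the 50 iterations, and the tiny four-link web are all assumptions chosen for illustration, and it assumes every page has at least one outbound link.

```bash
#!/bin/bash
# Toy PageRank by power iteration -- an illustrative sketch, not Google's code.
# Input (stdin): one "source target" link per line.
# Assumes every page has at least one outbound link (no dangling pages).
pagerank() {
    local damping="${1:-0.85}" iterations="${2:-50}"
    awk -v d="$damping" -v iters="$iterations" '
    { src[NR] = $1; dst[NR] = $2; out[$1]++; pages[$1] = 1; pages[$2] = 1 }
    END {
        n = 0
        for (p in pages) n++
        for (p in pages) rank[p] = 1 / n          # start with equal rank
        for (i = 0; i < iters; i++) {
            for (p in pages) incoming[p] = (1 - d) / n
            # each page splits its current rank evenly among its outbound links
            for (k = 1; k <= NR; k++)
                incoming[dst[k]] += d * rank[src[k]] / out[src[k]]
            for (p in pages) rank[p] = incoming[p]
        }
        for (p in pages) printf "%s %.4f\n", p, rank[p]
    }' | LC_ALL=C sort -k2,2nr
}

# Hypothetical four-link web: a -> b, a -> c, b -> c, c -> a
printf 'a b\na c\nb c\nc a\n' | pagerank
```

In this toy web, c comes out on top (about 0.40), followed by a (about 0.39) and b (about 0.21). Pages a and b each have exactly one inbound link, so a plain link count can't tell them apart, but a's single link comes from the highest-ranked page, and that only shows up when the whole graph is evaluated at once.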

Last week I wrote about Facebook gathering similar information about people. By collecting information about people and the links between them, they can start to get a global view of the human "forest". Unfortunately, based on many of the responses, that post wasn't very well written. A lot of people focused on how annoying Facebook applications are (true), how search results limited to your friends would be useless (also true), or other things completely unrelated to my point. A few people mentioned that Facebook hasn't done anything useful with this data, which is actually a good point, but I think that has more to do with Facebook and the newness of the data than it does with the value of the data. After all, the web was around for many years before Google came along and started profitably mining the link structure.

Will Facebook ever do anything useful with the human link data? I have no idea, and it's not particularly important to me. However, I'm confident that SOMEONE will begin mining this data, and that it could ultimately be more valuable than the link data from the web. Facebook is a convenient example because they happen to have a head start on collecting the data, but others might be the first to actually profit from it. Google, in particular, is much better at data mining and also has quite a bit of human link data (from Gmail and Orkut). Microsoft+Yahoo will also have a nice data set, though I doubt that they will know what to do with it. Of course none of this data is perfectly clean and noise-free, but real data never is -- the web certainly isn't.

Thursday, April 17, 2008

Facebook knows who you are, and that's worth more than you think

It's very fashionable to declare that Facebook is an over-hyped fad and will never make any real money, certainly not enough to justify its insane $15 billion valuation. At first glance, it's easy to understand why some people might think it's a toy -- most of the activity there seems to involve biting, poking, and joining groups with funny names.

However, I think that assessment misses out on something very interesting: Facebook is capturing everyone's identity and relationships. Of course there's some noise caused by random friending, but by examining the larger graph as well as other details such as location, affiliations, interactions, and of course explicitly entered relationship details ("how do you know Paul?"), they can get a pretty good idea of which people are actual friends and acquaintances.

The lack of reliable identity information has always been an issue on the web. It's the reason why we don't have a useful directory of email addresses -- everyone in the directory would get bombarded by spam or other unwanted messages, and even if it did exist, how would you know which of the thousands of Adam Smiths is the one that you are looking for? Facebook has already solved this problem for a large fraction of people. It's easy to search for a name and then pick out the right person based on their picture, location, or friends. I get a lot of messages on Facebook, but unlike email, I have yet to receive any spam. That's pretty remarkable.

Perhaps a people directory doesn't seem terribly valuable, but if you can't imagine how to make money from knowing everyone's identity and trust networks, then you aren't being very imaginative. Spam and fraud are two of the biggest problems on the internet, and they are very difficult to stop because it's so easy to create new identities, and we have no good way of differentiating between real identities and fake ones. Even in "real" life, people are able to skip from town to town, defrauding people again and again because to the people in the new town, they have a new and unknown identity.

One of the best examples of this problem on the internet is eBay. If you try to buy or sell something on eBay (especially computers or electronics, apparently), there is a very good chance that someone will try to rip you off -- just search Google for "ebay scammers" and you will find pages such as "How scammers run rings round eBay" and "eBay Forums: Today's Scams In Progress". eBay has had a relatively solid lock on the auction market due to network effects, but with billions of dollars in profits, a $42 billion market cap, and 10 years of not innovating, I'm willing to bet that won't last. With reliable identity information, most of these fraud schemes would become impractical, which would obviously be a real advantage for an eBay competitor.

What else is highly profitable on the internet? Search. I doubt that anyone will ever beat Google at Google-style search, certainly not Microsoft or Yahoo, even if they do tie their horses together. The only way anyone will create something significantly better than today's Google is if they add a new and important ingredient to the mix. Many people have suggested that demographic information, or perhaps knowing what your friends have searched for will help, but I doubt it. What could work is actual, direct, human involvement by the users. In fact, it's already helping in a very limited form -- Wikipedia pages are written and edited by random people on the internet and they frequently occupy the top spots on Google (and I always click on them). Of course the problem with letting random users edit or reorder the search results is that you will quickly be overwhelmed by spam and fraud. But what if you knew who the users were and which ones you could trust?

Those are just the first few things that come to mind -- the uses of identity information are endless. Of course there's no guarantee that Facebook will actually realize any of this potential -- there were many search engines before Google, and they all fumbled the opportunity they had, but it's important to at least understand the potential for big things.

Update: This post was supposed to be about data more so than Facebook (Facebook just happens to have the data). See this post for a (hopefully) better explanation.

Sunday, March 30, 2008

Ideas vs Judgment and Execution: Climbing the Mountain

How much is an idea worth? Many normal people assume that ideas are valuable, and that if only they could think of one, they might be able to sell it for millions of dollars, like the Pet Rock. On the other hand, many engineers, VCs, and successful entrepreneurs claim that ideas are worthless. Paul Graham provides a sort of "proof" that ideas are worthless:

Startup ideas are not million dollar ideas, and here's an experiment you can try to prove it: just try to sell one. Nothing evolves faster than markets. The fact that there's no market for startup ideas suggests there's no demand. Which means, in the narrow sense of the word, that startup ideas are worthless.

People in the "ideas are worthless" camp usually claim that it's all about execution -- they have plenty of great ideas that just need great teams to execute on them.

I have ideas all of the time, many more than I have time for, and so I tend towards the "ideas are worthless" camp. However, there's a nagging inconsistency -- something isn't quite right.

Quoting yet again from Marc Andreessen's "Guide to Startups, part 4: The only thing that matters":

I'll assert that market is the most important factor in a startup's success or failure.
...
The product doesn't need to be great; it just has to basically work. And, the market doesn't care how good the team is, as long as the team can produce that viable product.
...
Conversely, in a terrible market, you can have the best product in the world and an absolutely killer team, and it doesn't matter -- you're going to fail.

In other words, you just need to build the right product. A mediocre team building the right product will succeed and a brilliant team building the wrong product will fail.

Isn't that a little bit like saying that having the right idea DOES matter? And if ideas are so plentiful, then why do we see great teams executing perfectly on bad ideas?

I've thought about this for a bit and realized that both camps ("ideas are valuable" and "ideas are worthless") are wrong, at least when stated so simply.

Imagine that products are mountains. To build a product, you will need to climb that mountain. Some mountains have a big pot of gold at the top, and some do not. In order to make money, you will need to pick the right mountain and then successfully climb to the top and gather up the gold. You can fail by choosing a mountain that has little or no gold at the top, or by dying on the way up.

Taking this metaphor a little further, there are also multiple paths up the mountain. According to Wikipedia, Mount Everest has fifteen recognized routes to the top. Some routes are easier than others.

Successfully executing a trip to the top of the mountain requires a certain level of technical ability -- how much will depend on the mountain and route. It also requires good judgment in order to choose the right route, or to change course when you realize that the current path isn't working out.

Judgment isn't talked about as much as execution, but it's obviously very important. A technically brilliant team, upon encountering a sheer cliff, may excitedly think to themselves, "this is the perfect opportunity to use Erlang!" (or some other fancy tech -- Erlang is just a funny example). A team with better judgment would notice that there's an easier route that goes around the other side.

Judgment also plays a critical role in choosing which mountain to climb. Our landscape of product-mountains has millions of different mountains, many of which have never been climbed. Other mountains have been attempted in the past, but the team froze on the way up, or there was no gold when they got to the top (apparently the gold flows intermittently in this analogy).

There are also people wandering around in the flat lands near the mountains. Many of these people have ideas about which mountains have gold at the top, and some of them have even drawn crude maps showing what they believe to be an easy route to the top. Inevitably, they try to sell their ideas and maps to the mountain climbers, but the climbers just brush them off and say that their ideas and plans are worthless.

Eventually, a team of climbers will discover a huge cache of gold on one of the mountains. Naturally, the people who were hanging around at the base trying to sell their ideas and plans will say, "I had that idea first! They stole my idea! I knew there was gold at the top of the mountain!"

And it's true that they had the idea, as did many other people. Ideas are plentiful. The problem is that most ideas are bad -- either there's no gold at the top of the mountain, or the ascent is too difficult with today's technology. What's valuable is the judgment to know which mountains have the gold, and the team that can get to it.

So are ideas worthless? Not quite. If a skilled climber who has successfully chosen the right mountains in the past thinks he knows the location of another gold-rich mountain, people will listen. The idea has value because it comes from someone who has a history of being right.

If the exact same idea were presented by a random person with no experience and no ability to execute, it would probably be ignored -- there's just not enough evidence that it's a good idea. If that person truly believes in their idea, they will have to prove it on their own. (The beauty of our system is that they often can, even if everyone else thinks it's a bad idea)

If someone with a history of being right also has a capable team of climbers who have demonstrated the technical skill and judgment to climb other mountains, then that is very valuable, and they will have no trouble getting their idea funded.

Summary:
Idea * Judgment * Ability * Determination * Luck = $$$

Thursday, March 27, 2008

FriendFeed from the command line

Sometimes, it's faster and easier to just use the command line. Thanks to the new FriendFeed API, I was able to write a little script that connects my command line to my FriendFeed.

This probably would have been easier to write in Python, but bash is so awkward that it makes for a somewhat more interesting challenge. (most of this code is just dealing with image files -- the real work is done by curl)

Here you go:

#!/bin/bash
# Replace with your nickname:remote-key
# Go to http://friendfeed.com/account/api to get your remote key
USER="paulapitest:buggy696hoist"

function usage {
    echo "Usage: $0 [-t title] [-l link] [-m maxsize] [-u nickname:remotekey] [images ...]"
    exit 1
}

MAXSIZE=""
while getopts m:u:t:l: opt ; do
    case "$opt" in
        t)  TITLE="$OPTARG";;
        l)  LINK="$OPTARG";;
        u)  USER="$OPTARG";;
        m)  MAXSIZE="$OPTARG";;
        \?) usage;;
    esac
done
shift $((OPTIND - 1))

TITLE="${TITLE:-$LINK}"
TITLE="${TITLE:-$1}"

[ "$TITLE" = "" ] && usage

ARGS=("-F" "title=$TITLE" "-F" "link=$LINK" "-u" "$USER")
FILES=("$@")

for F in "${FILES[@]}" ; do
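    # Optionally downscale the image with sips (Mac OS X) before uploading
    if [ -n "$MAXSIZE" ] && [ -x /usr/bin/sips ] ; then
        T=$(mktemp /tmp/ffshare.XXXXXX)
        sips --resampleHeightWidthMax "$MAXSIZE" --out "$T" "$F" 2>/dev/null
        # Upload the resized temp file, but keep the original filename
        F="$T;filename=$F"
    fi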
    N="${#ARGS[@]}"
    ARGS[N]="-F"
    ARGS[N+1]="img$RANDOM=@$F"
done

# -s hides curl's progress meter; only the HTTP status code is captured
CODE=$(curl -s -o /dev/null -w "%{http_code}" "${ARGS[@]}" http://friendfeed.com/api/share)
if [ "$CODE" = "200" ] ; then
    echo "Shared on http://friendfeed.com/${USER%%:*}"
else
    echo "Failed: HTTP response $CODE" >&2
    exit 1
fi
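The title handling in the script is easy to miss: it falls back from -t to -l to the first image name. Here is that fallback in isolation, as a small sketch. The `pick_title` helper, the `./ffshare` filename, and the example URL and image names are all hypothetical and exist only for illustration.

```bash
#!/bin/bash
# pick_title is a hypothetical helper that mirrors the script's
# TITLE="${TITLE:-$LINK}" / TITLE="${TITLE:-$1}" fallback chain.
pick_title() {
    local title="$1" link="$2" first_file="$3"
    title="${title:-$link}"        # no -t given: fall back to the link
    title="${title:-$first_file}"  # no link either: use the first image name
    echo "$title"
}

pick_title "" "http://example.com/" "photo1.jpg"   # falls back to the link

# Assuming the script is saved as ./ffshare (a hypothetical name):
#   ./ffshare -t "Hello from bash"
#   ./ffshare -l http://example.com/ -t "Worth reading"
#   ./ffshare -m 800 photo1.jpg photo2.jpg
```

If all three values are empty, the title stays empty, which is the case where the script prints its usage message and exits.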

Monday, March 17, 2008

Is fragmentation bad?

Imagine that you've just finished watching a movie and are in the mood to talk about it. How are you going to do that? You could chat with random, semi-anonymous people in the movie theater lobby (assume you went to a theater). You could find a community of people who are big fans of the director or the book that the movie was based on. Or, if you saw the movie with friends or family, maybe you'll discuss it with them.

Which of these options you choose will probably depend on your situation. Sometimes it's fun to hear what "random" people think. If the movie is a little more niche and you're somewhat of a connoisseur, you may not care what random people, or even your friends, think. On the other hand, going to movies is often more about shared experience than it is about the movie itself. We enjoy spending time with our friends and the movie is just something interesting to discuss.

Ultimately, a single movie may spawn millions of separate discussions among millions of different people, all in different situations and contexts.

However, there's a question that no one is asking: Isn't all that fragmentation bad? Instead of having millions of separate discussions, shouldn't we have a single, unified discussion, preferably under the control and ownership of the movie studio?

No?

I enjoy our fragmented movie discussions, and I suspect that I would hate the single, unified, shouting match that would occur if we tried to unify all of those separate discussions. This issue of unified discussion may seem a little silly, but I keep seeing it repeated in the context of blogs and other online content.

People sometimes complain that specialized communities such as news.ycombinator.com are taking the conversation away from the sites that they link to, but I go to news.yc in large part because it has an intelligent and well behaved community. That community is kind of niche -- they mostly talk about programming and startups -- but I'm interested in those same things, so I like it.

On the other hand, I occasionally read the comments on YouTube, but I would never comment there myself. It's too random and belligerent for me.

Most recently, this issue of fragmentation has been brought up a lot when debating FriendFeed. One of the things that people really love about FriendFeed is the comments -- it's the only place on the web where I can easily share and discuss things with my actual friends (to see what this looks like, view the things I've shared or the things that I've liked or commented on).

Although comments are one of our most popular features, they are also our most controversial feature. If you believe that there should only be a single, unified discussion, then the extra fragmentation caused by FriendFeed will seem like a step in the wrong direction. In fact, not only is there a separate discussion on FriendFeed, there may be hundreds of separate discussions within FriendFeed on the very same topic or link (because different people are sharing the link, and different people have different friend groups).

I, for one, enjoy the fragmentation. It's important to understand that FriendFeed isn't trying to replace the specialized communities on places such as news.yc, or the screaming hordes on YouTube. We're creating a third option: discussion with friends. It may not be for everyone, and that's fine, but many people really like it, including people who would never participate in broader forums such as TechCrunch or YouTube.

Monday, February 25, 2008

Good news, everyone!

FriendFeed is officially launching! (and also announcing our funding)

See Louis Gray, VentureBeat ("Friendfeed, the best software for conversations"), and TechCrunch for more detailed reviews.

Sunday, February 17, 2008

The most important thing to understand about new products and startups

First, a quote from Marc Andreessen's "Guide to Startups, part 4: The only thing that matters"

If you ask entrepreneurs or VCs which of team, product, or market is most important, many will say team.
...
Personally, I'll take the third position -- I'll assert that market is the most important factor in a startup's success or failure.

Why?

In a great market -- a market with lots of real potential customers -- the market pulls product out of the startup.

The market needs to be fulfilled and the market will be fulfilled, by the first viable product that comes along.

The product doesn't need to be great; it just has to basically work. And, the market doesn't care how good the team is, as long as the team can produce that viable product.
...
Conversely, in a terrible market, you can have the best product in the world and an absolutely killer team, and it doesn't matter -- you're going to fail.

Marc's blog post did not immediately resonate with me, because his terms are somewhat different from the way I think. After all, how great is your product if nobody wants it? How great is your team if they persist in building something that nobody wants?

However, his main point has stayed in the back of my mind since then, and I'm continually reminded of how important it is, and how often I see people who clearly don't get it.

In my mind, there are really two points. One: You can take the smartest, most experienced, most connected, most brilliant people in the world and have them build the most stunningly designed and technically advanced product in the world, but if people don't want it, then you will fail. This is roughly what happened with the Segway, for example.

Perhaps that seems a little discouraging. After all, if really smart people with all the right resources can fail, then what hope is there for the rest of us? Perhaps success is random, and maybe startups are more like the lottery than we'd like to admit.

I don't believe that's true though. There is an optimistic way of understanding my first point, and that's my second point: Even if you aren't the smartest person around, and your product is kind of ugly and broken, you can still be very successful, if you just build the right product. YouTube and MySpace are both fine examples of this.

But if your team is so great, why aren't they building the right product? Simply put, they have the wrong attitude. Firstly, they overestimate the importance of their own skills. Engineers think that success is all about fancy technology and complex engineering (hello Google). Designers think that success is all about beautiful design. MBAs think that success is all about knowing the right people, or spreadsheets, or something. If you have especially smart or successful people, then this problem could be even worse, because then the team is also likely to be arrogant and overconfident, which makes them less likely to question these assumptions or the value of their own skills.

It's easy to find examples of this wrong attitude. When Google acquired YouTube, many people inside the company were flabbergasted, "But they have no technology!?" They didn't understand that you only need enough technology to make the product work. Any more and you probably have the wrong priorities. I regularly see similar complaints about Facebook, MySpace, and a lot of other popular sites. Similarly, people will often complain that MySpace or even Google has "no design" or "bad design". Again, they have enough design (or the right design) to work for their users.

So what's the right attitude? Humility. It doesn't matter how smart and successful and qualified you are, you simply don't know what you're doing. The good news is that nobody else does either, though some are foolish enough to think that they do (and that's why you can beat them).

What is the humble approach to product design? Pay attention. Notice which things are working and which aren't. Experiment and iterate. Question your assumptions. Remember that you are wrong about a lot of things. Watch for the signals. Lose your technical and design snobbery. Whatever works, works.

MySpace is a great example of this. I'm pretty sure that their custom profile page layouts were an accident. They didn't know enough to properly escape the text that people put on their profiles, and that allowed their users to start including arbitrary HTML and CSS in their pages. This is a common bug, and most people would have fixed the bug and that would have been the end of it (really great engineers wouldn't have had the bug in the first place). But they did something smarter. They noticed that the feature was popular and found a way to preserve it. The result is mostly ugly, but it's extremely popular.

There are many other accidental inventions besides MySpace, but it's important to understand that "accidental" isn't the same as "random". There are clues all around us; we just need to watch more closely.

For web-based products at least, there's another very powerful technique: release early and iterate. The sooner you can start testing your ideas, the sooner you can start fixing them.

I wrote the first version of Gmail in one day. It was not very impressive. All I did was stuff my own email into the Google Groups (Usenet) indexing engine. I sent it out to a few people for feedback, and they said that it was somewhat useful, but it would be better if it searched over their email instead of mine. That was version two. After I released that, people started wanting the ability to respond to email as well. That was version three. That process went on for a couple of years inside of Google before we released to the world.

Startups don't have hundreds of internal users, so it's important to release to the world much sooner. When FriendFeed was semi-released (private beta) in October, the product was only about two months old (and 99.9% written by two people, Bret and Jim). We've made a lot of improvements since then, and the product that we have today is much better than what we would have built had we not launched. The reason? We have users, and we listen to them, and we see which things work and which don't.

Find the gradient, then follow it.