Had read this interesting paper by Michael Mitzenmacher some time ago and always wanted to blog about this. Now is a good time as any, while I wait for a long download to finish.
The paper talks about the new found interest in the power-law and log-normal distributions especially among web researchers. Michael provides ample evidence to show that these debates are nothing new. Power-law versus log-normal processes have been debated in several areas like biology, chemistry and astronomy for more than a century now.
A power-law distribution over a random variable X is one where the probability that X takes a value at least k is proportional to k-g. Here g > 0 is the power-law "exponent". Intuitively a power-law distribution represents a system where X has a very large number of instances of very small values and a very small number of instances of extremely large values.
Power-law distributions are also called scale-free distributions as their overall shape appears the same, no matter at what scale they are viewed from. A characteristic feature of the power-law is that it appears as a straight line with a negative slope, on a log-log scale.
Log-normal distributions on the other hand appear as normal (Gaussian) distributions in the log-log scale. Power-law is log-linear.
Both power-law and log-normal distributions represent very skewed systems with a small number of very resourceful entities and a large number of entities with very limited resources. Such skewed distributions are very pervasive. Some examples are: the sizes of blood vessels to the number of such vessels in our body; income distributions and the number of people with such incomes; populations of cities versus the number of such cities; the rank of a word in a document (in terms of its frequency) versus the number of times it appears; etc.
There are several generative models that speculate on the underlying random processes that generate the power-law and log-normal distributions. And the sheer simplicity of some of them make them most appealing.
One of the most common generative models for the power-law is preferential attachment. This is one of the most popular models to explain the in-degree distribution of web pages. Simply put, the preferential attachment model for the web says that the probability of a page obtaining a new incoming hyperlink is proportional to its existing in-degree. Pages that have more in-degree are more popular and tend to be more visible -- making their popularity go up further.
There are several other models that generate the power-law based on models like information-theoretic optimizations and martingale processes.
Similarly, there are several models for generating log-normal distributions. One of the simplest generative models is of multiplicative processes. Suppose that on a social networking site like Orkut, each person brings in about 'r' new friends into the system. Since each person is assumed to be acting independently, let r be modeled as an i.i.d. (independent identically distributed) random variable with a finite variance (i.e. there are limits up to which any person is likely to have friends). Suppose the site starts with an initial population of X(0). The population after one time step would be X(1) = r X(0).
As we can see, the population at any time t is given as X(t) = r X(t-1) = rt X(0).
This is a multiplicative process, where each iteration has a multiplicative, rather than an additive effect on the population. Now taking log on both sides, we get: log X(t) = log X(0) + t log r.
Logarithms have this effect of converting multiplications into additions. So what was a multiplicative process in the linear domain is now an additive process in log scale. Now we had assumed that r is an i.i.d. random variable with finite variance (and so would be log r). There is this nice little theorem called the central-limit theorem, which says that the distribution of any i.i.d. random variable with finite variance, approaches the normal distribution asymptotically. So, there we have a multiplicative process generating a normal distribution in a log-log scale.
Anyway, one of the best generative models for the power-law is as follows: A monkey randomly typing on a typewriter (having n letters and one space) is just as likely to generate words with a power-law frequency distribution as do any other models!
Assume that the monkey types a space with a probability q and any of the n letters with a probability of (1-q)/n. Spaces are used to delimit words. A word with c letters would then have a probability of: ((1-q)/n)c q.
There are nc words of length c. The probability of a word with frequency rank r(j) (having nj words) occurs with probability q((1-q)/n)log_n(r(j)), which in turn reduces to q(r(j))log_n(1-q)-1
.. giving us a power-law distribution. Reading this serves us with a good reminder that there could be nothing but pure noise that determines how the web (for example) grows! Maybe there is a giant monkey sitting out there weaving the web...
The paper talks about the new found interest in the power-law and log-normal distributions especially among web researchers. Michael provides ample evidence to show that these debates are nothing new. Power-law versus log-normal processes have been debated in several areas like biology, chemistry and astronomy for more than a century now.
A power-law distribution over a random variable X is one where the probability that X takes a value at least k is proportional to k-g. Here g > 0 is the power-law "exponent". Intuitively a power-law distribution represents a system where X has a very large number of instances of very small values and a very small number of instances of extremely large values.
Power-law distributions are also called scale-free distributions as their overall shape appears the same, no matter at what scale they are viewed from. A characteristic feature of the power-law is that it appears as a straight line with a negative slope, on a log-log scale.
Log-normal distributions on the other hand appear as normal (Gaussian) distributions in the log-log scale. Power-law is log-linear.
Both power-law and log-normal distributions represent very skewed systems with a small number of very resourceful entities and a large number of entities with very limited resources. Such skewed distributions are very pervasive. Some examples are: the sizes of blood vessels to the number of such vessels in our body; income distributions and the number of people with such incomes; populations of cities versus the number of such cities; the rank of a word in a document (in terms of its frequency) versus the number of times it appears; etc.
There are several generative models that speculate on the underlying random processes that generate the power-law and log-normal distributions. And the sheer simplicity of some of them make them most appealing.
One of the most common generative models for the power-law is preferential attachment. This is one of the most popular models to explain the in-degree distribution of web pages. Simply put, the preferential attachment model for the web says that the probability of a page obtaining a new incoming hyperlink is proportional to its existing in-degree. Pages that have more in-degree are more popular and tend to be more visible -- making their popularity go up further.
There are several other models that generate the power-law based on models like information-theoretic optimizations and martingale processes.
Similarly, there are several models for generating log-normal distributions. One of the simplest generative models is of multiplicative processes. Suppose that on a social networking site like Orkut, each person brings in about 'r' new friends into the system. Since each person is assumed to be acting independently, let r be modeled as an i.i.d. (independent identically distributed) random variable with a finite variance (i.e. there are limits up to which any person is likely to have friends). Suppose the site starts with an initial population of X(0). The population after one time step would be X(1) = r X(0).
As we can see, the population at any time t is given as X(t) = r X(t-1) = rt X(0).
This is a multiplicative process, where each iteration has a multiplicative, rather than an additive effect on the population. Now taking log on both sides, we get: log X(t) = log X(0) + t log r.
Logarithms have this effect of converting multiplications into additions. So what was a multiplicative process in the linear domain is now an additive process in log scale. Now we had assumed that r is an i.i.d. random variable with finite variance (and so would be log r). There is this nice little theorem called the central-limit theorem, which says that the distribution of any i.i.d. random variable with finite variance, approaches the normal distribution asymptotically. So, there we have a multiplicative process generating a normal distribution in a log-log scale.
Anyway, one of the best generative models for the power-law is as follows: A monkey randomly typing on a typewriter (having n letters and one space) is just as likely to generate words with a power-law frequency distribution as do any other models!
Assume that the monkey types a space with a probability q and any of the n letters with a probability of (1-q)/n. Spaces are used to delimit words. A word with c letters would then have a probability of: ((1-q)/n)c q.
There are nc words of length c. The probability of a word with frequency rank r(j) (having nj words) occurs with probability q((1-q)/n)log_n(r(j)), which in turn reduces to q(r(j))log_n(1-q)-1
.. giving us a power-law distribution. Reading this serves us with a good reminder that there could be nothing but pure noise that determines how the web (for example) grows! Maybe there is a giant monkey sitting out there weaving the web...
Comments
www.eecs.harvard.edu/~michaelm/ListByYear.html
you'll see I wrote a paper with Brian Conrad that shows even if the monkey is lazy (non-uniform distribution on keys) then you still get a power law. Strangely, that case was left open for about 40 years....
Thanks for the interest!
Michael Mitzenmacher
Nice to see your comments! Let me read the paper you've mentioned.
The lazy monkey problem seems somewhat analogous to preferential attachment, and so the occurrence power-laws seem more natural than the fair monkey problem.
Thanks for the interesting paper!