
Co-occurrence and Correlation

In one of our projects, we ran into a dilemma where we had to distinguish between (the probability of) co-occurrence of a pair of events and the correlation between that pair of events. Here is my attempt at disambiguating the two. Looking forward to any pokes at loopholes in my argument. Consider two events e1 and e2 that have a temporal signature. For instance, they could be login events of two users on a computer system across time. Let us also assume that time is organized as discrete units of constant duration (say, one hour each). We now want to compare the login behaviour of e1 and e2 over time. We need to find out whether e1 and e2 take place independently or are correlated. Do they tend to occur together (i.e. co-occur), or do they take place independently of one another? This is where terminology gets used loosely and things start getting a bit confusing. So, to clear the confusion, we need to define our terms more precisely. Co-occurrence is simply the probability that e1 and e2 occur within the same time unit.
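To make the distinction concrete, here is a minimal sketch in Python using synthetic login data (the streams, rates, and function names are all hypothetical, chosen just for this example). It computes both quantities for two independent indicator streams:

```python
# A minimal sketch (synthetic data): two independent "login" indicator
# streams over T one-hour time units. All names and rates are hypothetical.
import random

random.seed(0)

T = 1000
p1, p2 = 0.3, 0.3  # assumed per-hour login probabilities

e1 = [1 if random.random() < p1 else 0 for _ in range(T)]
e2 = [1 if random.random() < p2 else 0 for _ in range(T)]

# Co-occurrence: the fraction of time units in which BOTH events occur.
cooccurrence = sum(a & b for a, b in zip(e1, e2)) / T

# Pearson correlation of the two indicator series: how far the joint
# behaviour deviates from what independence would predict.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)

print(f"P(e1) = {sum(e1) / T:.3f}, P(e2) = {sum(e2) / T:.3f}")
print(f"co-occurrence P(e1, e2)     = {cooccurrence:.3f}")
print(f"independence baseline P1*P2 = {sum(e1) * sum(e2) / T ** 2:.3f}")
print(f"correlation(e1, e2)         = {pearson(e1, e2):.3f}")
```

Even for these independent streams the co-occurrence should come out around 0.09 (roughly 0.3 x 0.3), while the correlation hovers near zero: co-occurrence by itself says nothing about dependence until it is compared against the independence baseline P(e1)P(e2).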
Recent posts

Meta-modeling heuristics

(Acknowledgment: This post has inputs from Sanket of First principles fame.) When teaching about formal axiomatic systems, I'm usually asked a question like, "But how do you find the axioms in the first place?" What are axioms, you ask? To take a few steps back, reasoning processes based on logic and deduction are set in an "axiomatic" context. Axioms are the ground truths, based on which we set out to prove or refute theorems within the system. For example, "Given any two points, there can be only one line that passes through both of them" is an axiom of Euclidean geometry. When proving theorems within an axiomatic system, we don't question the truth of the axioms themselves. As long as the set of axioms is consistent (i.e. the axioms don't contradict one another), it is fine. Axioms are either self-evident truths, well-known truths, or statements that are assumed to be true for the context. But then, the main question becomes very pertinent: how does one arrive at the axioms in the first place?
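As an illustration of "proving theorems within an axiomatic system", here is a minimal sketch in Lean 4. The primitive notions (Point, Line, on) and the axiom name are hypothetical, chosen just for this example; the axiom itself is the uniqueness half of the Euclidean one quoted above.

```lean
-- Hypothetical primitive notions for a toy axiomatic system: points,
-- lines, and an incidence relation. The names are chosen for this sketch.
axiom Point : Type
axiom Line  : Type
axiom on    : Point → Line → Prop

-- The ground truth we do not question: two distinct points lie on at
-- most one line.
axiom at_most_one_line :
  ∀ (p q : Point) (l m : Line),
    p ≠ q → on p l → on q l → on p m → on q m → l = m

-- A theorem derived *within* the system: distinct lines cannot share
-- two distinct points. The proof uses the axiom; it never justifies it.
theorem distinct_lines_share_at_most_one_point
    (p q : Point) (l m : Line)
    (hpq : p ≠ q) (hlm : l ≠ m)
    (hpl : on p l) (hql : on q l) (hpm : on p m) : ¬ on q m :=
  fun hqm => hlm (at_most_one_line p q l m hpq hpl hql hpm hqm)
```

The division of labour is exactly as described: the axiom is simply postulated, and everything downstream is deduction.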

Paradoxes and self-references

One of the most celebrated paradoxes in set theory is Russell's paradox. The story behind the paradox and subsequent developments is rather interesting. Consider a set of the kind S = {a, b, S}. It seems somewhat unusual, because S is a set in which S itself is a member. If we expand it, we get S = {a, b, {a, b, {a, b, ...}}}, leading to an infinite membership chain. Suppose we want to express the class of sets that don't have this "foundationless" property. Let us call this the set of all "proper" sets, that is, sets that don't contain themselves. We can express this set of sets as: X = {x | x is not a member of x}. Now this begs the question of whether X is a member of itself. If X is a member of itself, then by the definition of X (the set of all sets that don't contain themselves), X should not be a member of itself. If X is not a member of itself, then by the definition of X, X should be a member of itself. Either way, we arrive at a contradiction.
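The contradiction can be made fully precise. Here is a minimal sketch in Lean 4 (the abstract relation mem is a stand-in for set membership, introduced just for this example) showing that no such X can exist for any membership-like relation:

```lean
-- Russell's paradox, abstractly: for any "membership" relation mem,
-- there is no X such that (x is a member of X) iff (x is not a member of x).
theorem russell {α : Type} (mem : α → α → Prop) :
    ¬ ∃ X : α, ∀ x : α, mem x X ↔ ¬ mem x x :=
  fun ⟨X, hX⟩ =>
    -- Instantiating the defining property of X at x := X yields the paradox:
    have h : mem X X ↔ ¬ mem X X := hX X
    -- X cannot be a member of itself...
    have hn : ¬ mem X X := fun hm => (h.mp hm) hm
    -- ...but then it must be one, and we reach a contradiction.
    hn (h.mpr hn)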

The Power Law of Monkeys...

Had read this interesting paper by Michael Mitzenmacher some time ago and always wanted to blog about it. Now is as good a time as any, while I wait for a long download to finish. The paper talks about the newfound interest in power-law and log-normal distributions, especially among web researchers. Michael provides ample evidence to show that these debates are nothing new. Power-law versus log-normal processes have been debated in several areas like biology, chemistry and astronomy for more than a century now. A power-law distribution over a random variable X is one where the probability that X takes a value at least k is proportional to k^(-g). Here g > 0 is the power-law "exponent". Intuitively, a power-law distribution represents a system where X has a very large number of instances of very small values and a very small number of instances of extremely large values. Power-law distributions are also called scale-free distributions, as their overall shape appears the same irrespective of the scale at which they are viewed.
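A quick way to build intuition for that tail is to sample from such a distribution. Here is a minimal sketch in Python; the Pareto parameterization and the k_min cutoff are my assumptions, chosen to match the tail form P(X >= k) proportional to k^(-g) above.

```python
# A minimal sketch (synthetic data): sampling a power-law tail by inverse
# transform. Assumes the Pareto form P(X >= k) = (k_min / k)**g for
# k >= k_min, matching the tail definition in the text.
import random

random.seed(1)

def sample_power_law(g, k_min=1.0):
    u = 1.0 - random.random()        # uniform in (0, 1]
    return k_min * u ** (-1.0 / g)   # invert the tail CDF

g = 2.0
samples = sorted(sample_power_law(g) for _ in range(100_000))

# Many tiny values, a handful of huge ones -- the power-law signature.
print("median:", samples[len(samples) // 2])
print("99.9th percentile:", samples[int(0.999 * len(samples))])
print("max:", samples[-1])

# Empirical tail versus theory at a few cutoffs.
for k in (2, 10, 100):
    emp = sum(s >= k for s in samples) / len(samples)
    print(f"P(X >= {k}): empirical {emp:.5f}, theory {k ** -g:.5f}")
```

With g = 2 the median lands near sqrt(2), while the largest of the 100,000 samples is typically in the hundreds: exactly the "very many small values, a very few extreme ones" pattern described above.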