A cohort is a group of users who experience a common event within the same time period.
An oft-repeated but very relevant example of a cohort is a group of students joining in the same year. So the class of 2017 is a cohort, and so is the class of 2018, and so on.
What is cohort analysis?
Cohort analysis is an analytical technique used to study a cohort’s characteristics over a period of time and the factors that influence change in those characteristics. It traces its roots to medical research, where cohort studies are done to identify the causes of a disease.
So, to identify the cause of lung cancer, doctors would form a hypothesis that it is caused by smoking. Then they would take two groups: smokers and non-smokers.
Thereafter, both groups would be studied over time to identify the influence of smoking on a person’s likelihood of getting lung cancer.
How do we employ this in business analytics?
In business applications, we compare cohorts (users sharing a common experience in a given time frame) or analyze the behavior of a single cohort, to identify a pattern that supports or refutes a growth hypothesis. That hypothesis could be anything.
For instance, we may hypothesize that users acquired via display ads have a higher LTV than the ones acquired via Facebook. To test the hypothesis, we would do a cohort analysis.
Likewise, let’s suppose we want to identify the cause of an aggregate dip in retention. We could form a hypothesis that retention is correlated with the customer’s first purchase.
To establish the relation, we would cohortize users on the basis of their first purchase and plot their retention %, say monthly.
From the graph above it is apparent that the users whose first purchase was marshmallows displayed higher retention than the others, despite the fact that the overall retention of the product has declined. Naturally, the intent of the business now would be to get more users to purchase marshmallows post acquisition.
Cohorts and Segments are not the same
Most folks use ‘Cohort’ and ‘Segment’ interchangeably, which is not correct.
For two users to be part of the same cohort, they have to be bound by a common event and time period. E.g. 2017 graduates, men born in 1990.
However, to create a Segment you could use almost any condition as a basis; it need not be time and event based. E.g. graduates, men.
A cohort is a special case of a segment. So, there can be a cohort of ‘new users this week’ and, likewise, there can also be a segment of ‘new users this week’.
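To make the distinction concrete, here is a minimal Python sketch. The user records, field layout and function names are all made up for illustration; the only point is that a segment filter can be any condition, while a cohort filter always needs an event plus a time window.

```python
from datetime import date

# Hypothetical user records: (user_id, gender, signup_date).
users = [
    ("u1", "M", date(2024, 1, 3)),
    ("u2", "F", date(2024, 1, 5)),
    ("u3", "M", date(2023, 11, 20)),
]

def in_segment(user):
    """Segment: any condition at all, here simply 'men'."""
    return user[1] == "M"

def in_cohort(user, start, end):
    """Cohort: a shared event (signup) inside a shared time window."""
    return start <= user[2] <= end

men = [u[0] for u in users if in_segment(u)]
jan_signups = [u[0] for u in users
               if in_cohort(u, date(2024, 1, 1), date(2024, 1, 31))]

print(men)          # ['u1', 'u3']
print(jan_signups)  # ['u1', 'u2']
```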
Now that we have understood the fundamentals of cohorts, let’s look at some business use-cases.
Some powerful use-cases of Cohort Analysis
To explain the use-cases we have created a Google Sheet (linked below) where we have built the cohort chart for every use-case.
1. Understanding customer retention
But before we do that, a quick refresher on how to read a cohort chart. We are skipping the data crunching part and jumping right into the presentation.
How to read a cohort chart?
Let’s go through the rows and columns one by one. Each row corresponds to an acquisition (activation) month, and each column to a calendar month in which those customers were active.
So, B4 represents the number of new customers we acquired in the month of Jan. C4 tells us the number of customers who were acquired in Jan and returned in Feb. Likewise:
D4- number of customers acquired in Jan who returned in March.
E4- the ones who returned in April.
And so on and so forth.
Basically, as we move along Jan’s row, we see how the retention of new customers acquired in Jan fluctuated until Dec.
A column holds the new and returning customers of a single calendar month. D4 represents the number of customers acquired in Jan who returned in March. D5 is the number of customers acquired in Feb who returned in March. D6 is the number of new customers acquired in March.
The same pattern repeats for every other column.
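Since we skipped the data crunching, here is a rough Python sketch of how such a table could be built from a raw activity log. The log format (customer id plus a month number) and the function name `cohort_table` are assumptions for illustration, not what the sheet actually uses.

```python
from collections import defaultdict

# Hypothetical activity log: (customer_id, month of activity), 1 = Jan, 2 = Feb, ...
activity = [
    ("a", 1), ("a", 2), ("a", 3),
    ("b", 1), ("b", 3),
    ("c", 2), ("c", 3),
    ("d", 3),
]

def cohort_table(events):
    """Return {acquisition_month: {calendar_month: distinct active customers}}."""
    # A customer's acquisition month is the first month they appear in the log.
    first_seen = {}
    for customer, month in events:
        first_seen[customer] = min(month, first_seen.get(customer, month))

    # Collect distinct customers per (acquisition month, calendar month) cell.
    active = defaultdict(lambda: defaultdict(set))
    for customer, month in events:
        active[first_seen[customer]][month].add(customer)

    return {acq: {m: len(ids) for m, ids in sorted(cols.items())}
            for acq, cols in sorted(active.items())}

table = cohort_table(activity)
for acq, cols in table.items():
    print(acq, cols)
# 1 {1: 2, 2: 1, 3: 2}
# 2 {2: 1, 3: 1}
# 3 {3: 1}
```

Row 1 here plays the role of Jan’s row: 2 customers acquired in Jan, 1 of them active in Feb, 2 of them active in March.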
Table 2 (Refer Sheet-Cohort by Active users- Sheet 1 | Excel)
Now, let’s understand how each cohort behaves, retention-wise, over time.
To do that, we would slightly pivot the above table. We would change the columns from the actual month to the ‘# of months since acquisition’, i.e. from Jan, Feb, ... to 0, 1, 2, ..., which pulls all the row data to the left.
You may notice that the table changed from a right-aligned triangle to a left-aligned one.
So, in the first row, as we move along, we see how many customers acquired in Jan returned in each of the succeeding months.
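The pivot is just a re-keying of each row. A minimal sketch, assuming the table is held as a nested dict with made-up counts (months numbered 1 = Jan, 2 = Feb, ...):

```python
# Hypothetical counts keyed by acquisition month, then calendar month.
by_calendar_month = {
    1: {1: 100, 2: 40, 3: 30},
    2: {2: 80, 3: 36},
    3: {3: 120},
}

def to_months_since_acquisition(table):
    """Re-key every row by offset = calendar month - acquisition month."""
    return {acq: {month - acq: count for month, count in row.items()}
            for acq, row in table.items()}

by_offset = to_months_since_acquisition(by_calendar_month)
print(by_offset)
# {1: {0: 100, 1: 40, 2: 30}, 2: {0: 80, 1: 36}, 3: {0: 120}}
```

Every row now starts at column 0, which is what makes the triangle left-aligned.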
Table 3 (Refer Sheet-Cohort by Active users- Sheet 1 | Excel)
In this table, we converted the numbers into percentages of each cohort’s initial size to get a better view of the data.
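As a small sketch of that conversion, assuming each row is divided by its own month-0 count (the usual convention; the numbers below are invented):

```python
# Hypothetical counts keyed by acquisition month, then months since acquisition.
counts = {
    1: {0: 100, 1: 40, 2: 30},
    2: {0: 80, 1: 36},
    3: {0: 120},
}

def to_percent(table):
    """Express every cell as a percentage of that cohort's month-0 size."""
    return {acq: {off: round(100 * n / row[0], 1) for off, n in row.items()}
            for acq, row in table.items()}

percent = to_percent(counts)
print(percent)
# {1: {0: 100.0, 1: 40.0, 2: 30.0}, 2: {0: 100.0, 1: 45.0}, 3: {0: 100.0}}
```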
Now, looking at each row, we get the retention curve of the cohort acquired in the corresponding month. However, what if we want to understand how retention has been over the past 12 months overall?
For that, in the final row, we have calculated the aggregate. The aggregate gives us the retention curve of the past 12 months.
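There is more than one way to compute an aggregate row. The sketch below assumes a pooled (size-weighted) aggregate: for each month offset, sum the returning customers across cohorts and divide by the combined initial size of only those cohorts that are old enough to have data at that offset. The sheet may well use a different formula (for example a plain average of the percentages), so treat this as one reasonable option rather than the method.

```python
# Hypothetical counts keyed by acquisition month, then months since acquisition.
# A cohort with zero returners at an observed offset should store 0, not omit the key.
counts = {
    1: {0: 100, 1: 40, 2: 30},
    2: {0: 80, 1: 36},
    3: {0: 120},
}

def aggregate_curve(table):
    """Pooled retention % per offset, using only cohorts observed at that offset."""
    offsets = sorted({off for row in table.values() for off in row})
    curve = {}
    for off in offsets:
        rows = [row for row in table.values() if off in row]
        retained = sum(row[off] for row in rows)
        base = sum(row[0] for row in rows)
        curve[off] = round(100 * retained / base, 1)
    return curve

curve = aggregate_curve(counts)
print(curve)  # {0: 100.0, 1: 42.2, 2: 30.0}
```

Pooling keeps a tiny cohort from swinging the aggregate as much as a large one would.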
2. Correlation between category and retention
A friend of mine had worked on the cohort analysis of one of the world’s largest retailers. He told me that one of the conclusions from their analysis was that the users who purchased baby products on their first visit showed a higher propensity to visit again. This prompted the retailer to promote their baby section more aggressively.
One can form a hypothesis that some categories trigger maximum stickiness among users when they are the first purchase.
To find those categories, let’s cohortize users on the basis of the category of their first purchase and plot their retention.
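A rough sketch of that cohortization in Python. The order log, its layout and the category names are invented for illustration, and the sketch assumes every user has exactly one month-0 order that defines their first-purchase category.

```python
from collections import defaultdict

# Hypothetical orders: (user_id, months since the user's first order, category).
orders = [
    ("u1", 0, "Sportswear"), ("u1", 1, "Shoes"), ("u1", 2, "Sportswear"),
    ("u2", 0, "Sportswear"), ("u2", 1, "Sportswear"),
    ("u3", 0, "Jewellery"),
    ("u4", 0, "Jewellery"), ("u4", 2, "Shoes"),
]

def retention_by_first_category(orders):
    """Return {first-purchase category: [retention % at month 0, 1, 2, ...]}."""
    # The cohort key is the category of the month-0 (first) order.
    first_category = {user: cat for user, month, cat in orders if month == 0}

    # Distinct users of each cohort who ordered anything in each month.
    active = defaultdict(lambda: defaultdict(set))
    for user, month, _ in orders:
        active[first_category[user]][month].add(user)

    horizon = max(month for _, month, _ in orders)
    return {
        cat: [round(100 * len(months[m]) / len(months[0]), 1)
              for m in range(horizon + 1)]
        for cat, months in active.items()
    }

curves = retention_by_first_category(orders)
print(curves)
# {'Sportswear': [100.0, 100.0, 50.0], 'Jewellery': [100.0, 0.0, 50.0]}
```

Plotting one line per category from `curves` gives the kind of chart discussed next.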
From the chart, one can draw the following conclusions:
- Users who bought Sportswear in their first purchase showed higher retention than the rest.
- Users who bought Jewellery in their first purchase showed the lowest retention rate.
- The 5th month is critical, as churn seems to increase beyond that.
Some possible inferences are that the marketing spend on sportswear should be increased. Likewise, the retention strategies for Jewellery purchasers need to be revisited, and the retention strategy for users entering their 5th month since acquisition has to be evaluated.
3. What features correspond to maximum retention
A report by Quettra shows that an average app loses 77% of its DAUs within 3 days of install. Now, if your product itself isn’t deserving, then nothing can prevent the uninstall. However, if it is, then the first three days are critical and largely determine the user’s retention.
3 days is the average trend; your critical number could vary.
You could determine your own critical number through the method that we discussed in #1.
Let’s suppose it is x days for the time being. Then you have to do something within the first x days post install to hook users.
How cohort analysis comes into picture
Let’s create a hypothesis that there are some features in the app which, when used, increase stickiness among users.
Create an aggregate retention curve of the last 12 months like we did in #1.
Note- The retention curve of a mobile app, unlike that of a web-app, tends to decline steadily, because a web-app doesn’t need to be installed on your device and a user can log in any time they wish. With a mobile app, once it is uninstalled you potentially lose the user forever.
Now, screen the users who were retained and jot down the features they used on the first day. Suppose you are analyzing an e-commerce app and found certain actions to be common among the retained users.
Let’s say “push notification clicked” and “added to wishlist” are the two most common actions.
Now we would narrow our analysis to these two events and compare them.
Visit the above sheet and change the value for each feature from the drop-down to see how the graph changes.
From the above chart, it would be clear that users who added to wishlist display a higher propensity to be retained than the rest. The ones who clicked a push notification perform even worse than the average.
Again, this graph gives us a correlation, not the cause of retention.
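The comparison itself boils down to a retention rate per first-day feature set against the overall average. A minimal sketch, with invented users and a made-up definition of "retained" (still active after some fixed number of days):

```python
# Hypothetical users: features used on day 0 and whether they were still
# active later on. All values are invented for illustration.
users = [
    {"id": "u1", "day0": {"added_to_wishlist"}, "retained": True},
    {"id": "u2", "day0": {"added_to_wishlist", "push_notification_clicked"}, "retained": True},
    {"id": "u3", "day0": {"push_notification_clicked"}, "retained": False},
    {"id": "u4", "day0": set(), "retained": True},
    {"id": "u5", "day0": {"push_notification_clicked"}, "retained": False},
    {"id": "u6", "day0": set(), "retained": False},
]

def retention_rate(group):
    """Percent of the group that was retained; None for an empty group."""
    if not group:
        return None
    return round(100 * sum(u["retained"] for u in group) / len(group), 1)

def rate_for_feature(users, feature):
    """Retention rate among users who used the feature on day 0."""
    return retention_rate([u for u in users if feature in u["day0"]])

average = retention_rate(users)
wishlist = rate_for_feature(users, "added_to_wishlist")
push = rate_for_feature(users, "push_notification_clicked")
print(average, wishlist, push)  # 50.0 100.0 33.3
```

As with the chart, a gap between these numbers only tells you the feature and retention move together, not that one causes the other.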
P.S. This is a very interesting method and is extensively used by consumer businesses. I have only discussed the basic framework; there are various refinements that can lead you to a more definite conclusion.
4. How customers react to a new feature release
Conversely, the above cohort analysis could also be used to figure out which features are underperforming and need some rework.
For instance, the retention curve of the cohort of users who clicked on a push notification fares worse than the average retention curve. Push notifications are obviously meant to help your retention, so the above chart prompts us to rethink our strategy.
Creating cohorts in Mixpanel, Amplitude, Adobe- First event and Returning event
If you are using Amplitude or Mixpanel, or any similar product, to do your cohort analysis, these are the two fields that you have to specify to create a cohort chart:
- First event
- Returning event
Let’s see some examples
The first event is the primary criterion for building the cohort: the ‘experience’ element of a cohort that we discussed at the very beginning.
The returning event is the behavior that you want to track for your users. In the above charts, retention has been the baseline of our analysis. In analytics tools, retention could be defined as ‘any event performed by the user’ on your platform.
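To show how the two fields interact, here is a tool-agnostic Python sketch. It is not the API of Amplitude, Mixpanel or Adobe; the event log, the function name `retention` and its parameters are all hypothetical. Passing no returning event mimics the ‘any event’ definition of retention.

```python
from collections import defaultdict

# Hypothetical event log: (user_id, event_name, day number).
events = [
    ("u1", "signup", 0), ("u1", "purchase", 1), ("u1", "purchase", 8),
    ("u2", "signup", 0), ("u2", "app_open", 2),
    ("u3", "signup", 7), ("u3", "purchase", 9),
]

def retention(events, first_event, returning_event=None, period=7):
    """Return {cohort period: {periods since first event: % of cohort returning}}.

    returning_event=None means any event counts as a return.
    """
    # Day on which each user first performed the first event.
    start = {}
    for user, name, day in events:
        if name == first_event:
            start[user] = min(day, start.get(user, day))

    # Cohort = users whose first event fell in the same period (here: week).
    cohort_of = {user: day // period for user, day in start.items()}
    cohort_size = defaultdict(int)
    for cohort in cohort_of.values():
        cohort_size[cohort] += 1

    returned = defaultdict(lambda: defaultdict(set))
    for user, name, day in events:
        if user not in start or day <= start[user]:
            continue  # skip users outside any cohort and the first day itself
        if returning_event is None or name == returning_event:
            offset = (day - start[user]) // period
            returned[cohort_of[user]][offset].add(user)

    return {
        cohort: {off: round(100 * len(users) / cohort_size[cohort], 1)
                 for off, users in sorted(offsets.items())}
        for cohort, offsets in sorted(returned.items())
    }

by_purchase = retention(events, "signup", "purchase")
by_any_event = retention(events, "signup")
print(by_purchase)   # {0: {0: 50.0, 1: 50.0}, 1: {0: 100.0}}
print(by_any_event)  # {0: {0: 100.0, 1: 50.0}, 1: {0: 100.0}}
```

Changing only the returning event changes the curve for the same cohort, which is why it is worth being deliberate about that choice.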
So, if we have to create a cohort in Amplitude, it would look somewhat like this:
Cohort analysis is a respite from vanity metrics.
At any time, momentary growth can be bought, which may give you temporary pleasure, but cohort analysis allows you to be skeptical. It gives a very critical view of churn and doesn’t let it get masked by growth.
For instance, if you are investing in acquisition there can be an instant surge in MAU, but a high MAU is not by itself an indicator of growth. A cohort analysis will tell you how many of those acquired users are actually sticking with you.
Similarly, a particular channel might account for the highest acquisition, but a cohort analysis will tell you which channels contribute the most profit.
Whatever your key metrics may be, you would be able to see how they evolve over the customer lifecycle or product lifecycle.