Cohort analysis provides deep insight into customer bases because cohorts expose how customer accounts grow, evolve and churn. Plus, cohort analysis provides a framework to evaluate product releases, marketing pushes and advertising campaign performance.
The Six Cohort Reports I Use to Understand Trends
Below is a list of the six reports I create when analyzing cohorts. The combination of these six reports provides a robust summary of the state of a customer base. By no means exhaustive, this collection is a solid starting point and of course these are framed in the context of revenue but could easily be changed to activity or another metric.
- Average Revenue Per Customer Over Time - Chart monthly revenue over time to contrast with cohort data
- Individual Account Growth Over Time - Chart all accounts to visualize trends.
- Typical Account Growth Over Time - Chart how the average account grows with time
- Number of Customers in Each Cohort - Chart number of customers in each cohort to see how sensitive cohort data is to sample size and also see the size of the new customer pipeline over time.
- Average Monthly Revenue By Cohort - Chart the revenue by cohort to see if newer customers generate more or less revenue than older customers. Really good for marketing spend evaluation.
- Cohort Comparison - Chart the different cohorts over time to see how their revenue characteristics compare.
Implementation in R
Cohort analysis is difficult to perform in a database or in Excel so I turned to R. Below is a generic cohort analysis script for R, the open source statistical language.
I use two libraries by Hadley Wickham, plyr for data manipulation and ggplot2 for plotting the data. Hadley is at the forefront of R and an extraordinary contributor to the community. I can’t overstate his impact on the way I use R. I seem to use only standard libraries and Hadley’s.
To use this data analysis, start with a file with 3 columns: “date” column (YYYY-MM-DD), “company_name” column, monthly revenue “revenue”. If your file isn’t in this format, use Hadley’s reshape library to “melt” the data. Note: I use tab-separated files but you could easily use a CSV.
If you have ideas for improving the script or if you’ve found a bug, send me a note using the email link on the left.
### IMPORT LIBRARIES ### library(plyr) library(ggplot2) library(lubridate) ### IMPORT DATA ### account_data <- read.delim("data.txt") account_data $date = as.Date(account_data$date) ### COMPUTE AVERAGES ACROSS CUSTOMER BASE ### monthly_average = ddply(account_data,~date,summarise, avg_revenue=mean(revenue), customers = length(unique(company_name))) ggplot(monthly_average, aes(x=date, y=avg_revenue)) + geom_line() + labs(title = "Average revenue by Calendar Month for Customers in $", x="Year", y="Average revenue ($)") ### REVENUE BY COHORT ### cohort = ddply(account_data,~company_name,summarise,cohort=min(date), revenue=sum(revenue), num_months_as_customer = as.numeric(round((as.Date(format(Sys.time(), "%Y-%m-%d")) - min(date))/(365.25/12)) + 1)) cohort_summary = ddply(cohort, ~cohort, summarise, mean_revenue = sum(revenue)/sum(num_months_as_customer) ) ggplot(cohort_summary) + geom_bar(aes(cohort, mean_revenue), stat="identity") + labs(title="Average Monthly Revenue by Cohort", x="Cohort", y="Monthly Revenue") ### COHORT ANALYSIS -- PLOT INDIVIDUAL ACCOUNT GROWTH OVER TIME, USE ALPHA TO EXTRACT PATTERNS ### accounts= ddply(account_data, .(company_name), transform, cmonth = as.numeric(round((date - min(date))/(365.25/12)) + 1, cohort = min(date))) ggplot(accounts, aes(x=cmonth, y=revenue, group=company_name)) + geom_line(alpha=0.3) + labs(title="Individual Account Growth Over Time", x = "Months Since Becoming a Paid Customer", y = "Monthly Recurring Revenue") ### COHORT ANALYSIS - PLOT THE AVERAGE ACCOUNT GROWTH OVER TIME ### average_growth = ddply(accounts, .(cmonth), summarise, avg_revenue = mean(revenue), num = length(company_name)) ggplot(average_growth) + geom_line(aes(cmonth,avg_revenue)) + labs(title="Account Growth Over Time", x = "Months Since Becoming a Paid Account", y = "Monthly Recurring Revenue ($)") ### PLOT THE NUMBER OF CUSTOMERS IN EACH COHORT ### ggplot(average_growth) + geom_bar(aes(x=cmonth, y=num), stat="identity") + labs(title = "Number of Customers in Each Cohort", x="Months Since Becoming a Paid Customer", y = "Customers") ### PLOT THE AVERAGE revenue OF CUSTOMERS BY DATE JOINED ### account_data = ddply(account_data, .(company_name), transform, cohort = min(date)) agg = ddply(account_data, .(cohort), summarise, avg_revenue = mean(revenue)) ggplot(agg, aes(x = cohort, y=avg_revenue)) + geom_bar(stat="identity") + labs(title = "Average revenue of Customers by Date Joined", x = "Date Became a Paid Customer", y = "Average revenue ($)") ### COMPARE COHORT REVENUE OVER TIME ### ggplot(account_data) + geom_line(aes(x=date, y=revenue, group=cohort, colour=cohort))