0121 31 45 374
Qoute Icon

How to generate the data needed for a cohort chart- cohort analysis Part 3

Tim

Sean Ronan pointed out on twitter that "what" the columns on www.quickcohort.com are which is a good point so I thought I would fill in the gaps a little. The data for a cohort is fairly simple and can be as granular as you decide but the columns needed are:

  1. First Action Date/Time
  2. Most Recent Action Date/Time
  3. Count of customers which have this First/Most Recent Action Date/Times

Most of the time you can strip the time part from the date/time (especially if you're looking at the data on a month basis) but the tricky part is getting the count of users within each date grouping. You can't just select min/max dates as you need the data grouped by your unique customer identifier. If you're running SQL Server 2005+ then you've got the benefit of Common Table Expressions.

For this example, I've assumed a simple order table structure which contains a Customer Reference (CustomerId) and an Order Date (OrderDate). You could however use any date and identifier which groups actions together e.g. ProfileId and LastLoginDate.

SQL 2005 or later

WITH Actions (FirstAction, LastAction, UniqueId)
AS (
	SELECT min(dateadd(dd, datediff(dd, 0, o.[OrderDate]), 0))
		 , max(dateadd(dd, datediff(dd, 0, o.[OrderDate]), 0))
		 , o.[CustomerId]
	FROM Orders o
	GROUP BY CustomerId
)
SELECT a.[FirstAction]
	 , a.[LastAction]
	 , count(a.[UniqueId]) AS [CountOfCustomers]
FROM
	Actions a
GROUP BY a.[FirstAction]
	   , a.[LastAction]
ORDER BY a.[FirstAction]
	   , a.[LastAction]

Otherwise, I think you'll need to write something using temporary tables e.g.:

Pre SQL 2005

CREATE TABLE #Actions(
	FirstAction SMALLDATETIME, 
	LastAction SMALLDATETIME, 
	UniqueId INT
)
INSERT INTO #Actions
(
  FirstAction
 ,LastAction
 ,UniqueId
)
SELECT min(dateadd(dd, datediff(dd, 0, o.[OrderDate]), 0))
	 , max(dateadd(dd, datediff(dd, 0, o.[OrderDate]), 0))
	 , o.[CustomerId]
FROM #Orders o
GROUP BY CustomerId

SELECT a.[FirstAction]
	 , a.[LastAction]
	 , count(a.[UniqueId]) AS [CountOfCustomers]
FROM
	#Actions a
GROUP BY a.[FirstAction]
	   , a.[LastAction]
ORDER BY a.[FirstAction]
	   , a.[LastAction]

DROP TABLE #Actions

This should then generate some data that looks like this:

FirstAction LastAction CountOfCustomers
2011-03-01 2011-03-01 1
2011-03-01 2011-04-01 1
2011-04-01 2011-06-01 1
2011-05-01 2011-10-01 1
2011-06-01 2011-11-01 1
2011-07-01 2011-08-01 1
2011-08-01 2011-08-01 1
2011-09-01 2011-12-01 1
2011-10-01 2012-02-01 1
2011-11-01 2012-02-01 1
2011-12-01 2012-02-01 1
2012-01-01 2012-01-01 1
2012-02-01 2012-02-01 2

Which you should just be able to drop into www.quickcohort.com. I've not written a version for MySQL as I suspect someone far better at MySQL will be able to pop something together but the pre SQL 2005 script should work.

Liked this post? Got a suggestion? Leave a comment