Friday, February 22, 2013

Metrics can be Technically Correct and Precisely Wrong

Why do we have issues with data and metrics?

Data and metrics have an unusual ability to be technically correct and precisely wrong at the same time. It is as if they exist in a quantum, parallel universe.

Consider the case of Unique Visitors and Publication Opens. A recent report showed 9,931 Publication Opens and 13,410 Unique Visitors. To provide some insight, the report also computed the ratio of the two as 'publications/visitor': 9,931 / 13,410 ≈ 0.74.

But 0.74 is less than one publication open per visitor.  

On the surface that seems odd, if not outright wrong. If publications are opened by visitors, shouldn't the average be at least one?

The answer depends on how those two numbers are defined.
  • Publication Opens: a fairly precise technical event, logged by the system when an individual opens (usually) the index page of a document. Given that definition, it is easy to count.
  • Unique Visitors: a business definition is required to answer the question: What set of events do we want to include in the definition of 'unique'? If I open two publications, I am counted once in Unique Visitors and twice in Publication Opens. If I start a session but don't open a publication, the numbers are 0 Opens, 1 Visitor. Hmmm…

All is revealed by looking at how the metric is implemented, i.e., at the business logic used in the code rather than at the raw data. The definition of "Unique" according to the business requirements, and hence the code, covers not only "Publication Opens" but five other events as well. So it is quite possible that there are visitors who do NOT open a publication. And in fact that is exactly the case.

So the implication is that the 0.74 is technically correct. There are 9,931 events logged as an open, and there are 13,410 unique visitors to the site based on six events.

However, the ratio and thus the business interpretation are clouded because the number doesn't answer the question the client is probably asking:  Among people who open publications, how many do they open?   
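
The gap between the two questions can be sketched in a few lines of Python. This is only an illustration under assumptions: the event log is made up, and the event names other than a publication open (here "session_start" and "search") are hypothetical stand-ins, since the five other events are not listed above.

```python
from collections import Counter

# Hypothetical event log of (visitor_id, event_type) pairs.
# Only "publication_open" corresponds to an event named in the post;
# the other event names are invented for illustration.
events = [
    ("v1", "publication_open"),
    ("v1", "publication_open"),
    ("v2", "session_start"),
    ("v3", "publication_open"),
    ("v4", "search"),
]


def publication_ratios(events, open_event="publication_open"):
    """Return (opens per unique visitor, opens per visitor who opened)."""
    # Count opens per visitor, considering only the open event.
    opens = Counter(v for v, e in events if e == open_event)
    total_opens = sum(opens.values())

    # "Unique visitors" here means anyone with at least one event of any type.
    all_visitors = {v for v, _ in events}

    per_visitor = total_opens / len(all_visitors) if all_visitors else 0.0
    per_opener = total_opens / len(opens) if opens else 0.0
    return per_visitor, per_opener


per_visitor, per_opener = publication_ratios(events)
print(f"opens per unique visitor: {per_visitor:.2f}")
print(f"opens per visitor who opened: {per_opener:.2f}")

# The reported figure, from the two totals in the post.
reported = 9931 / 13410
print(f"reported ratio: {reported:.2f}")
```

With this toy log, the first ratio falls below one because two of the four visitors never open anything, while the second ratio, restricted to visitors who opened at least one publication, cannot fall below one. The real report cannot be re-cut this way from the two totals alone; it would need the per-visitor event data.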

Language may be powerful, but it is imprecise.

We (myself included) often rush to say the data are bad. What we are really saying is that we don't fully understand the business requirements we conveyed, or that we lack the skills to ensure that what we meant actually shows up in the results. It is experience with clients' business and marketing that allows us to at least apply a sniff test: Does less than one publication open per visitor make sense? What is my client going to think? I got what I asked for, but is it what I need?

The data are what the data are. 

Most issues arise in the logic conveyed by the business and in the language used to implement it.
