Thursday, September 25, 2014

{re}thinking with Data

How do we go about finding insights?

Let's start with what we mean by 'insights' -- here's my working definition in the context of marketing:
the identification of a previously unknown connection between marketing activities and consumer behavior that changes how we align our solutions with human needs.
Or, simply the finding of something "I didn't know I needed to know."

The typical story line for doing analysis consists of the following chapters:
  1. State your objective
  2. Define your strategy and tactics for achieving them
  3. Create a measurement framework of KPIs that defines success
  4. Articulate the dimensions of the business/behavior you need to understand
  5. Set targets or goals that you need to compare to in order to track progress
And along the path you're defining a technical implementation and data capture plan to get from #1 to #5 and then refine.

If we're not careful, there are several risks in that plan.

First, we don't do all the steps...there is a tendency to jump in and build something; we forget to ask "why are we doing this in the first place?" often enough. Think about the difference between CliffsNotes and the real thing. A good essay requires reflection.

Second, we prematurely narrow the list of potentially valuable options.  This "focusing illusion" creates a bias because we tend to look no further than the first idea.  This is akin to judging a book by its cover. 

Third, we often view the problem in terms of outcomes related to our current business model, not what might have caused them. While it is imperative to have metrics to track, they are simply links between behavior and performance. Conversion rate is not a behavior.

The remedy to these risks is to spend more time thinking and that requires deep domain knowledge as well as the ability and willingness to explore.

For a good read, try "Thinking with Data" by Max Shron.

Wednesday, September 24, 2014

Using Your Own Customers to Crowd-Source Analysis

How can we leverage the fact that we're creatures of habit?

We often talk and read about benchmarks by tactic. For instance, 'email open rate' is tracked because it is the gatekeeper to engagement and involvement. As an example, Silverpop reported the median open rate in APAC in 2012 as 27.2%.

I chose a two-year old number from 5,000 miles away in order to focus on the fact that these metrics are generated thru the lens of the campaign, not the consumer.

The health of a continuity email program relies on involvement over time and leads to the important question: How many more emails will you open?

The nice thing about 27.2% is that it is derived from nothing but 0s and 1s - consumers did or did not open the email.   If we look at the campaign or program level we now have a series of events from our intended audience that help answer that bigger question.  It turns out that there are two truths about consumer marketing metrics:
  1. Even if your numbers are flat, it is highly unlikely that it is the same consumers each week.
  2. The number of times consumers do something very often follows a predictable pattern.
In the case of email, consumers act just the way they do when buying products - that is, there is a decay in the number of people who do sequentially more things.
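Truth #2 can be checked directly. Here's a minimal sketch in R using simulated open counts (a real program would use per-consumer counts from the email platform); the negative binomial is one common model for this kind of decay:

  library(MASS)  # for fitdistr

  set.seed(42)
  opens <- rnbinom(5000, size = 0.8, mu = 2.5)  # simulated opens per consumer

  # observed decay: how many consumers opened exactly k emails
  obs <- table(factor(opens, levels = 0:max(opens)))

  # fit a negative binomial and compare observed to expected counts
  fit <- fitdistr(opens, densfun = "negative binomial")
  k <- 0:max(opens)
  expected <- length(opens) * dnbinom(k, size = fit$estimate["size"],
                                      mu = fit$estimate["mu"])
  round(cbind(observed = as.numeric(obs), expected = expected), 1)

A close match between the two columns is the "predictable pattern" at work.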

The number of additional emails opened looks like this for one program.


Yes, the campaign had a problem with relevancy - something originally hidden by the rate at which the email list was growing. Message: fix the communication strategy.

This is a very simple example of crowd-sourced analytics; there are lots of behaviors that can be treated in a similar fashion. In fact, there is a whole class of work being done in anomaly detection that takes advantage of habits.

Tuesday, September 23, 2014

How to Make Sharing Work

Why do people share? And with whom?

A recent post on LinkedIn about the phenomenon of sharing made the point:  we all ask for a share, but virtually no one offers a reason as to why we should bother.   Making it clear what we're offering and what action we want should be basic marketing.

But what should our expectations be about sharing?



The following is some of what Google learned in the development of Circles.

First, circles exist, indicating that people categorize others according to some meta-association. There are likely some standard classes of association - think function of the relationship (work or school) and strength of the relationship (fraternity or building association). It is easy enough to prove this: just compare one's FB friends and LinkedIn connections - some overlap, some exclusivity, for defined reasons.

This means sharing must be selective - if we have circles then we don't intend to send everything to everyone.  Therefore we need to consider the why and who questions of context and audience.

Why do people share? The reason, or context, of sharing generally falls into one of three categories. It may be...
  • personal - stories or opinions about oneself
  • conversation - contribute to a discussion
  • evangelism - spreading the good word (or funny video)
Who is the recipient? The audience is often based on a choice as to whether the content/context combination is...
  • private and appropriate for only a select few, 
  • relevant to a community of interest, or
  • interesting to the masses.
So, before throwing the share button on everything possible, give some thought to how context and audience relate to your offer and call to action.

There are obvious questions about what is shared...but I'll leave the content discussion for another time.


Source: extracted from David Huffaker's discussion of extracting meaning from data in "Doing Data Science".



Monday, September 22, 2014

3 Implications of Implementing Analytics

Are there {un}intended consequences of being data-driven?

As analytics moves closer to what Bill Franks of Teradata recently described in a post on operational analytics, there are organizational changes looming on the horizon. A couple of things come to mind:
  • Predicting the Future Creates the Future: If analytic output is implemented by the business then the creators need to share responsibility for success or failure.  This changes the "analysis as a service" model quite a bit.
  • Opportunities Will Be In-Market Before the Business Case is Written:  The emerging trend in all of science is to analyze the data to uncover new theories, whereas in the past we started with a hypothesis and then collected the data to test it. Analytic-driven discovery inverts the venerable command and control approval process.  Which leads to...
  • Results Trump Responsibilities: This central tenet about rewarding value creation over resource control comes from Accenture's blog discussing the twelve self-evident truths of being digital. Proving an impact has more weight than claiming one.
This suggests that future line managers will come from the ranks of analysts.  


Friday, September 19, 2014

5 Questions To Ask After Each A/B Test

Why do we test?

Very often the discussion about testing is framed on the measurement side of things. "To improve ROI" is a fairly typical answer to the question above. Yet, from an analytic or data science perspective, that actually misses the point.

The reason we test is to gain knowledge.

To be sure, testing green buttons vs. red buttons should be framed in terms of conversion metrics. But more importantly, we need to be asking ourselves the following questions:
  1. Why did it work?
  2. What are the segment characteristics of the winning option?
  3. Among which segment did it NOT work?
  4. What do the results tell us about the decision making process?
  5. Where else in the pipeline or funnel can we apply this learning?
Understanding the answers to these questions will likely have a bigger impact on the business than testing purple in the next round.
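Question 3 is the one most often skipped, and it is cheap to answer. A minimal sketch in R (simulated data, hypothetical segment names):

  # does the winning variant win in every segment?
  set.seed(1)
  n <- 2000
  tests <- data.frame(
    variant = sample(c("A", "B"), n, replace = TRUE),
    segment = sample(c("new", "returning"), n, replace = TRUE)
  )
  # simulate a lift for B that disappears among returning visitors
  p <- with(tests, 0.10 + 0.05 * (variant == "B") -
                   0.07 * (variant == "B" & segment == "returning"))
  tests$converted <- rbinom(n, 1, p)

  # logistic regression with a variant x segment interaction
  fit <- glm(converted ~ variant * segment, data = tests, family = binomial)
  summary(fit)  # a negative interaction term means the win doesn't hold everywhere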

Wednesday, September 17, 2014

Big Data vs. Data Science

What is the difference?

A lot of the conversations I'm having these days ask about these two phrases: Have I done it? Can I lead a team doing it? To answer, I've had to put some stakes in the ground and define them from my own point of view.
  • Big Data:  a state in which current systems and capacities are simply overwhelmed. One cannot use traditional thinking or tools because the data doesn't fit in memory on a single machine.
  • Data Science: the process of interrogating data in hopes of improving the human condition.
While Big Data is a state of being, it is by no means static. Like the Inga rapids on the Congo River, it can be a massive torrent of moving droplets. The bigger the wave, the more a Data Scientist {team} needs computer science skills to navigate from point to point. And unlike its predecessors, "Data Science" as a discipline starts from a different place: given data, what questions could be answered? Empirical, theoretical and computational sciences start with a question and don't actually have much data - they tackle different problems through observation, logic/proof and Big Hardware.

Because we're looking at the world passing by as a torrential stream of bits, we need to have a goal, an objective or a problem to solve. One simply doesn't just jump in; there needs to be a plan and a lot of preparation (did I mention a LOT of preparation) grounded in experience, math and statistics.

Big is in the eye of the beholder.

Having worked with US and Canadian clients, I can say there is a line in the sand where things seem big. For example, a reasonably sized loyalty program for a national US retailer is considered big by Canadian standards since it is larger than Canada's total population. Frame of reference matters.

Science is a pursuit, a line of reasoning not an algorithm.

Along the path we need to visualize, explain and communicate what we've learned to date. Sometimes it is enough to know that a tactical change improves conversion because of correlation; other times we need to explain why and address causality.

Big Data is not Data Science and Data Science is not Big Data although it is quite clear the two overlap and the most frequently mentioned stories come out of that intersection.

Congo: The Grand Inga Project
The story of Steve Fisher and friends running those rapids was released in a documentary in 2012.

Thursday, September 11, 2014

Dear Account Manager: Please Don't Ask Me #2

Can I have some insights with that report?

Unlike a side of bacon, insights can't be ordered up on demand. The best insights don't come from a short-order analyst; they come from those with a deep understanding of the business problem. And like good recipes, they take a while to develop.

Nueske's Bacon
A good analytic team should be able to respond quickly to questions like "where are we up/down? and what are the likely drivers?" But coming up with a new view on why consumers behave the way they do - one that changes how we market - isn't suitable for "order up."

Please give us time...

Wednesday, September 10, 2014

Dear Account Manager: Please Don't Ask Me

What is the average {fill in your favorite event here}?

Average is the most dangerous word in marketing for three reasons.
  • First, our goal is to satisfy a need in a differentiated manner such that consumers make a connection with us.  There is no individual who believes she truly is average, so why should we think that way?  
  • Second, our job is to change history - new products, new markets, and better growth all succeed when we focus our attention away from the mushy middle.  
  • Third, as a measure of central tendency, it is often technically inappropriate because the underlying data doesn't behave normally, and/or a single number masks too much useful information.
Consider the following plot of a typical event - there are some who do it once, a bunch who lump together at some low level, and then a long tail out to super-consumers who do whatever this represents many times over. Consumer behavior often looks this way - product trial, application usage and email opens all take this form.

The bars are the data, the lines are two different ways of smoothing the data so that we can draw conclusions or possibly make predictions.
  • Red is what we were all taught in class and produces an average of 42, which lands almost on top of a big dip in the event count - as well as being the meaning of life. Are we missing something important? Notice that it also assumes we can do fewer than zero things, an impossibility.
  • Blue is a better overall fit and shifts the curve to the left, where it appears more logical, but like the normal curve it still misses the post-fifty dip.
Statistics, even one as simple as the average, work on a set of assumptions. The above picture suggests that the red curve's assumptions aren't quite right and the blue's are probably much better. There are other ways to describe those events, so I need more information to help you.
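To make that concrete, here is a minimal sketch in R with simulated data, using a gamma as a stand-in for the blue curve:

  library(MASS)  # for fitdistr

  set.seed(7)
  events <- rgamma(10000, shape = 1.3, rate = 0.031)  # long right tail, mean ~42

  mean(events)  # the "average" a report would quote
  fit_n <- fitdistr(events, "normal")
  fit_g <- fitdistr(events, "gamma")

  hist(events, breaks = 60, freq = FALSE, main = "A typical event", xlab = "events")
  curve(dnorm(x, fit_n$estimate["mean"], fit_n$estimate["sd"]),
        col = "red", add = TRUE)
  curve(dgamma(x, fit_g$estimate["shape"], fit_g$estimate["rate"]),
        col = "blue", add = TRUE)

The red curve spills below zero and misses the peak; the blue one hugs the skew.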

Instead of asking what the average is, ask me the following questions:
  • What does the distribution of {event} tell us about our customers?
  • Are there gaps or lumps that present opportunities to adjust our marketing?
  • What is different about consumers on one end versus the other?
  • How many events should we expect over what time frame?
And I promise I won't answer gamma, Pareto-NBD, or Weibull....

Post inspired by "Doing Data Science" by Cathy O'Neil and Rachel Schutt as well as Eric Cai - The Chemical Statistician - and his series on R-bloggers.

Tuesday, September 09, 2014

Programmatic Creative

Is that title an oxymoron?

In a post in iMedia this morning entitled "Programmatic Creative: The bridge between beauty and data" the authors make the case for linking the data and creative teams. The story line is in the context of display advertising and real-time bidding, although the prime example is the virtual car-buying process created by Jaguar Land Rover.

The concept is that one can optimize creative, not unlike what we've already proven with A/B testing on web sites, to improve conversion.  This is part of the larger trend around what the Winterberry Group calls programmatic marketing (registration required for white paper) where there is a method to the madness that some call marketing.

Having worked across the marketing spectrum of agencies, marketing services firms and media/publishers, here are my thoughts on the analyst's role in this.
  1. Creative directors work with account planning to distill a client's request into a manifesto, which is a concise description of the need to be satisfied. In a somewhat ironic twist, the best creative solutions emerge when the situation is described in a very specific and precise way. Thus, the role of the analyst is to help sharpen that single paragraph to the point where the creative team can do their thing.
  2. Marketing services firms often help with consumer segmentation, campaign strategy, and tactical execution. They are likely to be on the hook for measurement and performance analyses. Here the role of the analyst is a little like burning both ends of the candle - input to the creative process as well as confirmation of its impact.
  3. Media/publishers have yet a different point of view.  Their success relies on identifying and providing audiences that advertisers are interested in reaching.  In this scenario the analyst is looking across campaigns, web site signals and other data sources to identify and create the appropriate segments.  Providing counts of visitors is no longer enough and content interaction is becoming key.
These areas will probably converge faster in direct response advertising, e.g. driving sales or using promotional content, than in awareness scenarios due to fewer measurement challenges.  But as we better understand how people decide we can expect the learning to be applied to brand campaigns as well.

All this suggests that the analyst or data scientist needs to have conviction and step up to line decision making. No longer are we just a staff function providing options and opinions.

To answer the question: nope; programmatic creative, in the larger sense, will likely be the norm.

Monday, September 08, 2014

Using Language to Build Communities of Interest

What can words tell us about interests?

Imagine playing the word association game and having to identify a community of interest from a single word: "Drift"

It could relate to communities focused on:
  • fly fishing: a boat used on rivers or a cast that is free of any pull on the line
  • racing: the act of oversteering and letting the rear wheels go wide
  • film: two Australian brothers create a surf company
In a LinkedIn post I updated some thoughts on Leveraging Communities of Interest and suggested that the language of a community is likely to be distinctive.

This idea implies we develop a thesaurus (or possibly even an ontology) for a COI that captures the concepts, their synonyms and the relationships between the words used. Taking a body of content, extracting the terms and building in the relationships is the role of a new type of marketing analyst.   This underused marketing technology (taxonomy) allows us to analyze:
  • Terms used: both unique and unusual frequency counts are the first hint of the existence of community
  • Relationships: phrases, as opposed to singletons, highlight how terms go together. Related, broader and narrower concepts clearly separate the three examples above
  • Synonyms: alternative labels for a concept are another source of clues
This approach is a little different than text mining or sentiment analysis, although the underlying technical tools are often similar, i.e. some form of natural language processing (NLP), because the end goal is the management of a vocabulary. To take full advantage of such analysis it should be deployed at the source of tagging since too often meta tags are whatever comes to mind at the time of creation. If you've ever gone back and looked at tags across a large number of articles, you probably know what I mean.
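As a minimal sketch of the "terms used" hint, compare term frequencies in a community's content against a general reference corpus (toy documents here; real work would sit on a proper NLP pipeline):

  fly_fishing <- c("the drift boat held a steady line",
                   "a dead drift keeps the fly natural on the line")
  general <- c("traffic can drift across the lane",
               "the boat show opens this weekend")

  term_freq <- function(docs) {
    words <- unlist(strsplit(tolower(paste(docs, collapse = " ")), "[^a-z]+"))
    table(words[nchar(words) > 2]) / length(words)
  }

  f_com <- term_freq(fly_fishing)
  f_ref <- term_freq(general)

  # ratio of community frequency to reference frequency; terms missing
  # from the reference get a small floor so unique terms surface first
  ref <- as.numeric(f_ref[names(f_com)])
  ref[is.na(ref)] <- 0.001
  head(sort(f_com / ref, decreasing = TRUE))

Terms like "line" and "fly" float to the top for the fly-fishing corpus - exactly the first hint of a community described above.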

By understanding concepts, relationships and synonyms used by a community we could devise ways to assign a consumer to one or more of the communities. It would also provide the means to rate content in terms of effectiveness within and across communities.

The more content you create, the more important vocabulary is - particularly if you're a publisher.

Friday, September 05, 2014

Top 10 Algorithms Affecting Marketing

How can we relate math and marketing?

Earlier this year io9 listed the "10 Algorithms that Dominate Our World."  These complex math functions are:
  • Google Search - at 67% of search traffic, 'nuff said.
  • Facebook's News Feed - they pick what you see
  • OKCupid Date Matching - more successful than pickup lines because of weighted data
  • NSA Data Collection - the method has been redacted
  • "You may also enjoy" - from Amazon to Zappos guiding the next choice is rampant
  • Google Adwords - figuring out if you'll be satisfied with the results, and then charging more
  • High Frequency Stock Trading - leaves humans out of the equation
  • MP3 Compression - the pipe is only so big, everything should be compacted
  • CRUSH (Criminal Reduction Utilizing Statistical History) - public sector success story
  • Auto-Tune - pitch blending for fun and profit, just ask Cher
As a consumer I don't care how they work, I'm just glad that they do.  However, as a marketer I do care since these algorithms sit between my campaigns and my results.   These algorithms are complex, proprietary and changing so fast that understanding the specifics across the board is out of the question.  That said, here are some ideas on how they relate to a marketing point of view:
  • Organic search is a popularity contest with one judge and a hidden score card.  Since our content is judged against that from everyone else we need to constantly be looking at the world from the perspective of 'how do we help consumers find what they need'.
  • Social news feeds take into account the wisdom of the crowd in determining what to run past you. From this we should be thinking about what content archetypes and forms create interest and engagement.
  • Matching algorithms often work with layers of weighted information across many dimensions. In many respects this follows the same process as branding - reduce the reasons to believe to a promise and ultimately to a single essence that aligns a solution with a need.
  • Algorithmic adjudication or determination poses some ethical questions about permitted use. Something anyone dealing with privacy already knows all too well. Data is not neutral, observation is biased, and all models are based on assumptions and decisions.
  • Recommendation engines are at the heart of personalization and dynamic content. However, there is a risk of over-filtering and missing the fact that decisions are based on emotions rather than facts. "I didn't know I wanted to have...." is tough to program if you've never had....
  • Paid search is a combination of two sets of results - theirs and ours.  This is a perfect place to think about and work on the problems of resource allocation and attribution in part because it is down at the intent level of the consumer journey.
  • Programmatic marketing, at least in the narrow sense of real time bidding (RTB), evolved from the ability to arbitrage at speed.  With RTB we're still left with two key questions:  What should we pay to reach an audience? and What should we tell them?  Creative optimization is next.
  • Compression is an analytic process that removes redundancy and noise in a defined manner.  The best parallel I can think of is the creation of consumer segments - we abstract and reduce the most important details and hope for a lossless solution.
  • Data mashups, particularly physical location + digital activity, are a breeding ground for unique insights.  They are also a good way to start to break down traditional channel silos because they look at the world in a new light where everyone can contribute. Adding algorithms makes it better.
  • No marketing activity has perfect pitch so comparing data to match a known standard sounds a bit like forecasting.  Since forecasts are always wrong, the interesting bits are "why?" and "what did we learn?"
I'm sure there are numerous other algorithms and parallels - please add yours.

Thursday, September 04, 2014

Two Uses of Visualization

What are the use cases for visualizing data?

Visualization has become the new buzzword. First there was 'analysis' (which came to the fore back when we first started doing pivot tables), then we had 'insights' (because it sounds cooler and technology companies wanted to be thought leaders). These concepts have since morphed into machine learning (because it all seems too complex for mere mortals) and/or visualization.

Visualization is used for two very different purposes:
  • exploration: looking for that insight that changes how we market
  • explanation: illustrating what we want to communicate after we've found it 
The skills and tools we use for those two functions are very different.

Exploration visualization needs to focus on patterns and relationships - the first image uses the sample data sets accompanying "Doing Data Science" and is produced in R, which allows a train of thought to be followed quickly. It shows the distribution of click-thru rate by age of consumer. We already know that CTR is low (the big bump to the left). What is important to take away is that the relationship between click-thru rate and age category is consistent, i.e. the lines pretty much stack on top of each other. This leads to another question: do we need to rethink the idea that age-based cohorts are different?
You may notice that there is a group of consumers who appear not to have been born yet, i.e. their age is given as (-Inf,0]. Knowing the data helps here since one has to be logged in in order to link to age, implying that this group consists of anonymous visitors. So, this plot actually uncovers another point - CTR among subscribers doesn't vary from non-subscribers, raising new questions about the value of engagement. The data team should be versed in this type of visualization and questioning.
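For the curious, a minimal sketch of that exploration, patterned on the book's example (simulated stand-in data; with the book's nyt1.csv the columns used are assumed to be Age, Impressions and Clicks):

  library(ggplot2)

  # simulated stand-in; swap in d <- read.csv("nyt1.csv") for the real thing
  set.seed(3)
  d <- data.frame(Age = sample(0:90, 5000, replace = TRUE),
                  Impressions = rpois(5000, 5))
  d$Clicks <- rbinom(5000, d$Impressions, 0.01)

  # Age of 0 marks a visitor who isn't logged in, hence the (-Inf,0] bucket
  d$agecat <- cut(d$Age, c(-Inf, 0, 18, 24, 34, 44, 54, 64, Inf))
  d$CTR <- d$Clicks / d$Impressions

  # if the densities stack on top of each other, the cohorts behave alike
  ggplot(subset(d, Impressions > 0), aes(x = CTR, colour = agecat)) +
    geom_density()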

In contrast explanation needs to look like Consumer Barometer where a lot of design time went into creating an interactive display of how/where people buy stuff.  Like any dashboard, it provides a series of predefined navigation paths.  In this case the user can traverse along geographic, demographic, product dimensions. This specific image captures variance in online research by product category (color) and penetration (size) for 40+ year old consumers in Hong Kong.  Instantly we see that "they are more active in computer/electronic categories than any other."  This is where we partner with the creative or UX teams to highlight the differences and communicate key points.


Two uses, two technology stacks, two (actually more) teams.

Wednesday, September 03, 2014

Creating Segments from Signals Using a DMP

Just what does c_product=="apple" mean?

Previously I outlined some thoughts on developing segments that focused on more strategic uses.  The end result of the top down process is 5-to-8 key personae that should be tracked as part of any management reporting process.

However, when working at the campaign level the process of defining segments is a bit different because we're now more interested in getting the most bang for the buck as quickly as possible.  Optimization and bidding decisions are the epitome of this tactical thinking and the more variations we can test the better.

Think blocks...




Nowhere is this distinction more evident than in the process of developing segments for digital targeting using a data management platform (DMP).  In this case we start with the signals generated by traffic, convert them to traits that are appended to each visitor and then create the segments based on those attributes. This bottom up process takes a lot of structure and quite a bit of restraint not to end up with a bazillion one-off segments.  

The organization of traits and segments is the most critical aspect of a successful implementation, even more so than the script that generates the signals. These folder structures (I'm using Adobe Audience Manager as a mental model) are "priceless", a term used by a colleague who had more experience than I did at the time.

Here are lessons learned from a recent experience:
  • Use tag management to standardize or at least rationalize different tagging schemes before the signals arrive.  This becomes more important as the number of sources or sites increases.  Imagine trolling thru an unused signal report where each source has its own taxonomy.   Since these signals get converted to simple values, e.g. "apple", there is no other context provided than what you see.  The band aid is having to add traits just to define what you mean...this gets ugly as the "or" and "and" conditions get strung out.
  • Separate traits into independent dimensions that represent specific elements of behavior, e.g. location, site structure, content elements, and events/actions. You can deconstruct past campaigns into their component parts, but don't think too much about those segments at this point. The goal is to resist injecting complex logic too soon in the process, which ends up decreasing the value of the traits over the long term because they are too specific.
  • Think of segments at the DMP level as building blocks, not the final target group. In this way we can easily mix and match at will on the destination or ad server side of things once we know what the campaign objective is. We want to encourage reuse, both from an immediate efficiency perspective and for analysis purposes later on. By having building blocks we can look across campaigns for insights.
As a really simple example, consider building segments that a retailer can use for targeting its customers. There are at least two definitions here - site visitors (i.e., "h_referer") as well as those within a defined trade area (i.e., "d_postal"). These should be passed as two segments rather than one, to allow different campaign rules to apply to various combinations at various points in time in the future (see the sketch after this list).
  • Visitor In-Area:  retain, defend, reward
  • Visitor Out of Area:  understand
  • Non-Visitor In-Area: acquire, make aware
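A minimal sketch of that crossing in R (hypothetical trait values; in practice the traits arrive from the DMP):

  # keep "visitor" and "in-area" as separate building blocks
  set.seed(11)
  audience <- data.frame(
    h_referer = sample(c("oursite.com", ""), 1000, replace = TRUE),
    d_postal  = sample(c("in_area", "out_area"), 1000, replace = TRUE)
  )
  audience$visitor <- audience$h_referer != ""
  audience$in_area <- audience$d_postal == "in_area"

  # mix and match downstream, at the destination or ad server
  with(audience, table(visitor, in_area))

Each cell of that table maps to a different campaign treatment, without ever hard-coding the combination into the DMP segment.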
Now imagine a dozen or more classes of building blocks...

As analysts we should be in the business of creating the building blocks, not a specific audience segment.

Tuesday, September 02, 2014

3 Steps for Market Segmentation

What should we be thinking when defining segments?

The path from consumer interactions to valuable segmentation requires taking a series of steps rather than making one declarative statement about who our market is.
  • Step 1: Define why we need segments.   The technical definition of a segment - a group of consumers with a common need that are expected to respond similarly to our messages - doesn't actually help us much. For instance, "web site visitor" is a characteristic, not a segment per se since we have all types of visitors who are there for different reasons.  So I start with the business objective we are trying to improve.
  • Step 2: Given a metric that matters, identify what characteristics, attributes or behavioral traits could help identify members of a segment.  Since segments are a mental model for grouping consumers and rarely something people actually call themselves we're left with the challenge of identifying a collection of proxies that indicate membership.   I work through a list of attributes as if they were software feature requests: what traits 'could, should, and do' make a difference.
  • Step 3: Report by segment. We've gone thru the steps necessary to create and target segments; it would be a shame not to report and analyze that way (happens all too often). The biggest obstacle is that the things we can track (digital activity) aren't always what we want to report (change in sales or marketing impact). And outside of pure-play ecommerce, I'm going to be modeling that impact.
The implications of these steps are:
  1. There is more than one segmentation scheme and our first ideas coming out of a conference room probably aren't the best
  2. This reality leads to a Test & Learn approach to the definition not just the creative/offer side of things
  3. Segments need to be stable over the marketing planning horizon in order to assess change and affect budgeting

Monday, September 01, 2014

Analyst Job Defined

What is the role of an analyst?

Ran into this graphic from gapingvoid...Hugh MacLeod's site about using art to transform business... that sums it up quite nicely.

We're not paid to report the numbers but rather, as Marc Cenedella explained, our job is "to share and explain how you can make the connections between those bits of information." The goal of analysis is to improve things...be it a campaign, marketing in general, or the human condition.