[This is a guest post from Gary Angel, President of Semphonic, a web analytics company based in San Francisco]
Online survey technology has made available a whole range of analysis and measurement that was really not possible before. From inexpensive primary research to a deeper understanding of your web site audience to a different perspective on web behavioral data, online surveys can contribute mightily to our knowledge.
But online survey analysis doesn’t work quite the same way for each of these tasks. When you’re doing primary research or audience profiling using online surveys, your biggest concern is probably getting a good sample. Particularly in the early days of the web, most researchers simply discounted online surveys for primary research because the online population was too different. That isn’t really true for most companies nowadays – which is certainly one of the reasons why online survey usage has skyrocketed.
But assuming your sample isn’t skewed in some fundamental sense, the analysis of online survey data for primary research and audience profiling is essentially identical to the body of techniques developed for offline research.
That isn’t true, however, for the very wide and popular range of cases where you want to apply the results of survey research to a deeper understanding of the web site and the behaviors exhibited there. To see why this is so, consider the following example:
A media site launched an online survey of visitors. They tracked overall site satisfaction and also the usage of a number of different site areas. They had recently launched a new “comment” functionality on the site that allowed users to submit comments, rate comments, and track their own status as commenters. Tracking this tool in the online survey, they found that the users who generated comments had a significantly higher satisfaction score than the site average.
From this, they concluded that the comment functionality was boosting site satisfaction and was a success.
Sadly, however, this conclusion is simply not warranted. There is no way to determine from the basic facts:
Comment users have a higher sat score than non-comment users (attitudinal)
Or even
Comment users consume more pages than non-comment users (behavioral)
whether either relationship is causal. We don’t know whether commenting merely self-selects visitors who are already more satisfied and consume more content, or whether it actually contributes to that satisfaction and consumption.
People who use comment functionality may already be more engaged and have higher satisfaction than those who do not bother. If so, the apparent (and statistically valid) relationship between using comment functionality and satisfaction is non-causal – at least in the direction we are hoping for.
In that case, comments are not driving satisfaction; they are being driven by it.
It’s as simple as this. People who are highly engaged with your site are likely to be more satisfied with it. They may also be more likely to view or post comments. This in no way proves that they are more satisfied because they view or post comments. They may be less satisfied as a result of commenting. They may be more satisfied. There may be zero impact. You just don’t know. Looking at the satisfaction scores for each area on your site and inferring causality from them is simply a basic statistical fallacy.
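The self-selection argument can be illustrated with a small simulation. The sketch below is hypothetical: the `engagement` trait, the satisfaction formula, and all numbers are assumptions made up for illustration, not data from the media site. By construction, commenting has no effect on satisfaction in this model, yet commenters still score higher.

```python
import random
from statistics import mean


def simulate(n=20000, seed=0):
    """Simulate visitors whose hidden engagement drives both commenting and satisfaction."""
    rng = random.Random(seed)
    commenters, others = [], []
    for _ in range(n):
        engagement = rng.random()  # hidden trait in [0, 1] (assumed)
        # Satisfaction depends only on engagement plus noise.
        # Commenting contributes nothing to it.
        satisfaction = 5 + 4 * engagement + rng.gauss(0, 1)
        # More engaged visitors are more likely to comment (self-selection).
        commented = rng.random() < engagement
        (commenters if commented else others).append(satisfaction)
    return mean(commenters), mean(others)


commenter_sat, other_sat = simulate()
print(f"commenters: {commenter_sat:.2f}, non-commenters: {other_sat:.2f}")
```

The gap between the two printed averages comes entirely from who chooses to comment, which is exactly the pattern a simple cross-tabulation cannot distinguish from a real effect.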
This is an incredibly common source of error when doing web analytics in general and it has migrated seamlessly over into the usage of online survey data. Self-selection is, in fact, a subtle sort of sampling problem where we forget that the sub-populations we are using for an analysis are not random.
I think it’s fair to say that a majority of the uses of online survey data I see applied to web site performance are nothing more than interpretive errors caused by self-selection.
You can defend yourself against these types of errors, but it takes significantly more work. Internally, you can try to use other variables inside the survey to hold the populations constant across a range of other factors (like intent, brand awareness, overall usage) before you look at comparative satisfaction scores.
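One way to hold a factor constant is to compare commenters and non-commenters only within groups that share the same value of that factor. The following is a minimal sketch, assuming hypothetical survey rows with made-up field names (`intent`, `commented`, `sat`) and made-up scores.

```python
from collections import defaultdict
from statistics import mean


def raw_gap(rows, group_key, score_key):
    """Difference in mean score between the group and everyone else."""
    inside = [r[score_key] for r in rows if r[group_key]]
    outside = [r[score_key] for r in rows if not r[group_key]]
    return mean(inside) - mean(outside)


def stratified_gaps(rows, stratum_key, group_key, score_key):
    """Per-stratum gap, for strata that contain both groups."""
    buckets = defaultdict(list)
    for r in rows:
        buckets[r[stratum_key]].append(r)
    gaps = {}
    for stratum, members in buckets.items():
        flags = {bool(m[group_key]) for m in members}
        if flags == {True, False}:
            gaps[stratum] = raw_gap(members, group_key, score_key)
    return gaps


# Hypothetical responses: "research" visitors comment more AND score higher.
responses = (
    [{"intent": "research", "commented": True, "sat": s} for s in (8, 9, 8, 9)]
    + [{"intent": "research", "commented": False, "sat": s} for s in (8, 9)]
    + [{"intent": "browse", "commented": True, "sat": s} for s in (6,)]
    + [{"intent": "browse", "commented": False, "sat": s} for s in (5, 7, 6, 6)]
)

overall = raw_gap(responses, "commented", "sat")
by_intent = stratified_gaps(responses, "intent", "commented", "sat")
print("overall gap:", round(overall, 2))
print("gap within each intent:", by_intent)
```

In this invented data the overall gap is positive, but it disappears once visitors are compared only with others who came with the same intent. Real data will rarely be this clean, and a factor you did not measure can still bias the comparison.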
Naturally, the quality and size of the survey also affect its analytical strength. While “less is better” (more people will fill it out), a good survey will ask the same question in different ways in order to judge the quality of responses. “Were you able to find what you were looking for on this website?” can be paired with “Was the navigation or search on this website effective?”
Widely disparate answers to these two questions suggest that a respondent is not really paying attention to the questions, and that respondent’s answers can then be filtered out. This, of course, is all standard surveying technique.
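The paired-question check can be sketched in a few lines. This example assumes both questions are answered on the same 1 to 5 scale and uses hypothetical field names (`found`, `nav`) and an arbitrary threshold; a real survey would need its own scale and cutoff.

```python
def split_by_consistency(rows, q_a, q_b, max_gap=2):
    """Separate respondents whose paired answers differ by more than max_gap."""
    kept, dropped = [], []
    for row in rows:
        if abs(row[q_a] - row[q_b]) <= max_gap:
            kept.append(row)
        else:
            dropped.append(row)
    return kept, dropped


# Hypothetical answers on an assumed 1-5 agreement scale.
rows = [
    {"id": 1, "found": 5, "nav": 4},
    {"id": 2, "found": 1, "nav": 5},
    {"id": 3, "found": 2, "nav": 2},
    {"id": 4, "found": 5, "nav": 1},
]

kept, dropped = split_by_consistency(rows, "found", "nav")
print("kept:", [r["id"] for r in kept])
print("dropped:", [r["id"] for r in dropped])
```

Keeping the dropped rows in a separate list, rather than discarding them, makes it easy to check how many responses the filter removes and whether the threshold is too strict.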
A different technique is to use behavioral data integration to analyze a population of relatively similar respondents (as discussed in a previous post). If you hold constant the number of visits, engagement milestones, and total activity, you can often get a good comparative population. Finally, you can use sampling techniques directed at tracking the satisfaction of users before and after they try a tool or area (like commenting).
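The before-and-after approach amounts to a paired comparison: each user serves as their own baseline. The sketch below assumes you have two satisfaction scores per user, one surveyed before they first tried the tool and one after; the scores are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_change(before, after):
    """Mean within-user change in score and its standard error."""
    if len(before) != len(after):
        raise ValueError("before and after must have one score per user")
    if len(before) < 2:
        raise ValueError("need at least two users")
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs), stdev(diffs) / sqrt(len(diffs))


# Hypothetical scores for the same six users, in the same order.
before = [6, 7, 5, 8, 6, 7]
after = [7, 7, 6, 8, 7, 8]

change, std_err = paired_change(before, after)
print(f"mean change: {change:.2f} (standard error {std_err:.2f})")
```

Because the same people appear on both sides, differences in who chooses to comment cancel out. Anything else that changed between the two surveys does not, so this is still weaker evidence than a randomized test.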
Each of these methods is designed to give you a valid population with which to compare the group who did the activity you’re interested in. Of course, each of these is more work than just doing a cross-tabulation between two survey variables. But what works okay for profiling your basic web audience is more than likely to be fundamentally deceptive when applied to a range of analysis that can involve self-selecting behaviors on the web.
As I mentioned above, behavioral analysis that uses “engagement” as part of the analytical process is just as prone to self-selection as online survey data. Combining the two is often the best way to build a much more comparable population set than either can achieve on its own. Survey data, when combined with behavioral data, can enlighten marketing and editorial teams about not just what visitors are doing, but also what they’re thinking.
Indeed, finding ways to develop better control groups is one of the larger, if somewhat hidden, advantages of combining web behavioral and online survey data.
More Info:
- Semphonic, a web analytics consultancy: http://www.semphonic.com/
- QuestionPro Pop-up Survey Configuration:
http://www.questionpro.com/help/64-window.html
[Gary Angel is President of Semphonic (http://www.semphonic.com), the leading independent web analytics consultancy in the United States. Headquartered in the San Francisco Bay Area, with offices in Washington, D.C. and Boston, Semphonic works with all of the major web analytics tools including Omniture, WebTrends, Unica, Google Analytics and Coremetrics. Semphonic clients include companies like American Express, Barclays, the BBC, Charles Schwab, Genentech, Intuit, Kohler, the National Cancer Institute, National Geographic, Nokia, and Turner Broadcasting.]

1 response so far ↓
Catherine // July 14, 2009 at 9:27 pm |
Great post! If you are doing a survey, one other piece I would propose would be to have a system in place for analyzing the verbatims – the comments fields. This is where all the undiscovered stuff resides, and can provide the “whys” behind the scores.
If you have 100 surveys to review, this isn’t a big deal. If you have 10,000, on the other hand, you should look into a product like Attensity’s Survey Advantage; see http://www.attensity.com/en/Applications-and-Services/Applications/Voice-of-the-Customer/SurveyAdvantage.html for more information.
It’s also useful to correlate survey results with information from your own support desk/CRM system, with tweets, with expert forum analysis etc. Attensity can do that as well.