Techniques and tools
Latent Class Analysis

Latent Class Analysis is a family of techniques based around clustering and data reduction, and is fast becoming the state-of-the-art technique for segmentation projects. It uses a number of underlying statistical models to capture differences between observed data or stimuli in the form of:

  • Discrete (unordered) population segments
  • Group segments (e.g. groups of countries, within which there are population segments)
  • Ordered Factors (Segments with an underlying numeric order)
  • Continuous Factors
  • Or mixtures of the above

To take a very simple example, users of a technology product answered questions about where the product is located (home, work or elsewhere), their type of use (business use only, personal use only, or both) and their volume of use (captured in a diary). The clusters are defined using these variables, which are known as indicators. In practice, clusters can be defined on many more indicators than this. The model works by identifying segments which can be used to predict (i.e. reproduce) the underlying data.
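
In symbols, the basic latent class cluster model can be sketched as below. The notation is introduced here for illustration and is not taken from the study itself: y_i is the set of J indicator responses for user i, K is the number of clusters, and pi_k is the proportion of users in cluster k. The sketch assumes the usual local independence condition, i.e. that the indicators are independent of one another within a cluster.

```latex
P(\mathbf{y}_i) \;=\; \sum_{k=1}^{K} \pi_k \prod_{j=1}^{J} P(y_{ij} \mid k),
\qquad \sum_{k=1}^{K} \pi_k = 1
```

A model "reproduces the underlying data" well when the response patterns implied by the right-hand side are close to the patterns actually observed.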

A number of other variables are used to describe the clusters: model of equipment, region, age, job title and number of employees. Latent Class Analysis allows variables of this type (known in Latent Class terminology as covariates) to be used to predict cluster membership more accurately, provided their relationship with the clusters is statistically significant.
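
One common way of letting covariates predict cluster membership is a multinomial logit for the cluster sizes. Whether this exact form was used in the example above is an assumption; the symbols z_i (the covariate values for user i) and gamma (the coefficients to be estimated) are introduced here for illustration only.

```latex
\pi_k(\mathbf{z}_i) \;=\;
\frac{\exp\!\left(\gamma_{0k} + \boldsymbol{\gamma}_k^{\top}\mathbf{z}_i\right)}
     {\sum_{k'=1}^{K} \exp\!\left(\gamma_{0k'} + \boldsymbol{\gamma}_{k'}^{\top}\mathbf{z}_i\right)}
```

A covariate whose coefficients are not significantly different from zero across clusters does not help predict membership and can be treated as purely descriptive.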

A 5-cluster solution best explained the variation in the first three questions. Cluster membership is also predicted well by the covariates.
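
The choice of the number of clusters is typically made by comparing a fit statistic across candidate models. The sketch below uses the Bayesian Information Criterion (BIC), where lower is better. The log-likelihood values, the sample size and the banding of diary volume into four categories are all invented for illustration; they are not results from the study.

```python
import math

def lca_param_count(n_classes, categories_per_indicator):
    """Free parameters in a latent class model with nominal indicators."""
    class_sizes = n_classes - 1  # cluster proportions sum to 1
    response_probs = n_classes * sum(c - 1 for c in categories_per_indicator)
    return class_sizes + response_probs

def bic(log_likelihood, n_params, n_cases):
    """Bayesian Information Criterion: lower values indicate a better trade-off."""
    return -2.0 * log_likelihood + n_params * math.log(n_cases)

# Hypothetical set-up: location (3 categories), type of use (3 categories),
# volume of use banded into 4 categories (an assumption made for this sketch).
categories = [3, 3, 4]
n_cases = 1000

# Hypothetical log-likelihoods for models with 3 to 6 clusters.
log_likelihoods = {3: -3320.0, 4: -3275.0, 5: -3238.0, 6: -3231.0}

scores = {
    k: bic(ll, lca_param_count(k, categories), n_cases)
    for k, ll in log_likelihoods.items()
}
best_k = min(scores, key=scores.get)
print(best_k, round(scores[best_k], 1))
```

With these made-up figures the 6-cluster model fits slightly better than the 5-cluster model in raw likelihood terms, but not by enough to justify its extra parameters, so the 5-cluster model has the lowest BIC.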

Including descriptive variables as well as indicator variables, as we have done in this case, can produce better clusters, which make better use of the available data and are easier to explain. Statistical tests show which indicators and covariates make a significant contribution to the cluster model.

Rather than allocating each case to one cluster, the LC Clustering approach assesses the probability that every case (user) belongs to every cluster. For a model which converges well, these probabilities are usually close to 100% for the cluster a particular user is most associated with and 0% for the other clusters. This gives much more accurate cluster averages when analysing subgroups than with other methods.
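
The membership probabilities follow from Bayes' rule once the cluster sizes and the within-cluster response probabilities have been estimated. The sketch below is a minimal illustration with a hypothetical two-cluster model and two binary indicators; all parameter values, user responses and region labels are invented. It also shows how the probabilities can be averaged within a subgroup rather than counting hard allocations.

```python
def posterior_membership(class_sizes, response_probs, responses):
    """Probability that one case belongs to each cluster (Bayes' rule).

    response_probs[k][j][c] = P(indicator j takes category c | cluster k)
    responses[j]            = observed category of indicator j for this case
    """
    joint = []
    for k, size in enumerate(class_sizes):
        likelihood = size
        for j, category in enumerate(responses):
            likelihood *= response_probs[k][j][category]
        joint.append(likelihood)
    total = sum(joint)
    return [value / total for value in joint]

def subgroup_profile(posteriors, subgroups):
    """Average the membership probabilities of all cases in each subgroup."""
    sums, counts = {}, {}
    for probs, group in zip(posteriors, subgroups):
        running = sums.setdefault(group, [0.0] * len(probs))
        for k, p in enumerate(probs):
            running[k] += p
        counts[group] = counts.get(group, 0) + 1
    return {g: [s / counts[g] for s in sums[g]] for g in sums}

# Hypothetical model parameters (not estimates from the study).
class_sizes = [0.6, 0.4]
response_probs = [
    [[0.9, 0.1], [0.8, 0.2]],  # cluster 0
    [[0.2, 0.8], [0.1, 0.9]],  # cluster 1
]

users = [[0, 0], [0, 1], [1, 1]]        # hypothetical responses
regions = ["North", "North", "South"]   # hypothetical subgroup labels

posteriors = [posterior_membership(class_sizes, response_probs, u) for u in users]
profile = subgroup_profile(posteriors, regions)
print(posteriors)
print(profile)
```

The second user has an ambiguous response pattern and is split between the clusters instead of being forced into one, which is what keeps the subgroup averages from being distorted by borderline cases.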

These probabilities can be turned into bi-plots such as the one below showing cluster membership for the subgroups of interest.

The main benefits of Latent Class Models are their flexibility, in terms of:

  • The type of question/variable scale used in the segmentation (scale types include nominal, categorical, ordinal, interval or counts)
  • The underlying models used to define the clusters (fitting a cluster model in this way is accepted by leading experts as providing a superior solution which differentiates more effectively across key indicators)
  • The ability to compare and assess the fit of different segmentation models to the data (very comprehensive fit statistics) producing more robust and repeatable segmentations than other methods
  • The ability to deal with rater bias when using questions with a rating scale
  • Allowing individual cases to vary with respect to some continuous factors to focus the segmentation around the issues of greatest interest.

The very latest implementation of Latent Class Segmentation models (which only became available in 2005) allows continuous or ordinal factors to be formed along with the clusters. It also allows multilevel modelling.

Continuous Factors
Relationships between question responses which vary as a continuum rather than in clusters can be captured separately, allowing the clusters to focus on subtler relationships. For example, in a survey about food preference, the likelihood of preferring convenience and takeaway food over freshly prepared or healthy food might lie on a continuum. There might be a very strong relationship between these types of food, i.e. an increasing or decreasing preference for convenience over fresh food. A simple segmentation might result in 6 segments which are mainly differentiated with respect to this one difference, masking subtler relationships between other variables. By capturing this relationship separately in a continuous factor, the segmentation is better able to detect subtler relationships between the remaining variables. For instance, it might differentiate preferences for different types of cuisine or a tendency towards fad dieting.

This is a very powerful extension of traditional segmentation, made possible by recent advances in computer processing performance. It addresses a common concern about traditional segmentations, namely that they tend to focus on the obvious relationships in the data rather than reveal subtler patterns. Using this extension to Latent Class Analysis means that these subtler patterns are more likely to be revealed.

Using continuous factors as “intercepts” in ratings-based cluster models also enables rater bias to be factored out of the segmentation solution. This eliminates the age-old problem of segments being defined more by people’s tendency to rate generally high or low on a rating scale than by anything else. Taking this out of the equation means that segments are free to focus on the relative ups and downs of ratings across different factors, rather than on the obvious differences in overall rating scale use.
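
The idea of separating overall rating level from the relative pattern of ratings can be illustrated with a much cruder stand-in: centring each respondent's ratings on their own mean. This is not the model-based intercept described above, which is estimated inside the cluster model itself; it is only a sketch of the effect, using invented ratings.

```python
def center_ratings(ratings):
    """Subtract each respondent's own mean rating from their ratings."""
    centered = []
    for row in ratings:
        mean = sum(row) / len(row)
        centered.append([value - mean for value in row])
    return centered

# Two hypothetical respondents rating the same three attributes.
high_rater = [9, 7, 8]  # tends to use the top of the scale
low_rater = [4, 2, 3]   # tends to use the bottom of the scale

centered = center_ratings([high_rater, low_rater])
print(centered)
```

On the raw ratings these two respondents look very different and would probably fall into different segments. Once the overall level is removed, their relative ups and downs are identical.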

Ordinal Factors
Ordinal Factors can be thought of as segments which have some kind of underlying order to them. For example, in a study looking at business development opportunities across different subgroups, it might be more useful to have a factor which groups organisations or consumers into three categories:

  • Good prospects
  • Possibilities
  • Bad prospects

In this sense, identifying an ordinal factor such as this in data can be thought of as creating a composite ordinal measure. The segments in this case are different levels of this measure (low through to high).

In the same study, we might find that a separate, independent ordinal factor exists which classifies organisations or consumers into:

  • Those with long-term needs
  • Those with short-term needs

Combining the possible levels of these factors might create 6 segments which not only meet the needs of the business development manager commissioning the research, but also provide the best fit to the data.
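
The 6 segments are simply the cross of the two factors (3 levels by 2 levels). A minimal sketch, using the level names from the example above:

```python
from itertools import product

prospect_levels = ["Good prospects", "Possibilities", "Bad prospects"]
need_levels = ["Long-term needs", "Short-term needs"]

# Every combination of one prospect level with one need level.
segments = [f"{prospect} / {need}" for prospect, need in product(prospect_levels, need_levels)]

for segment in segments:
    print(segment)
```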

Multilevel Latent Class Models
Multilevel segmentation models allow the development of nested segmentations. For example, a sample might consist of individuals within different countries, employees within work teams, diary entries for specific respondents, or customers within specific business units. Nested data of this kind consists of respondent-level or response-level data, nested within groups or some other higher-level unit.

Taking the countries example, multilevel modelling allows clusters to be defined on individual responses, but with an additional, higher level of clustering of the countries, according to the relative size of the individual-level clusters within those countries. For example, we might find there are 8 consumer segments and 3 groups of countries, consisting of Northern Europe (group 1) and Southern Europe (group 2), with Greece separate (group 3). The Northern Europe group might consist mainly of three consumer segments, with only a small proportion in the remaining segments. Southern Europe might mainly consist of 3 different consumer segments, while Greece might have two other dominant segments which are unique to Greece.
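
The higher-level grouping is driven by how the individual-level segments are distributed within each country. The sketch below is a purely descriptive shortcut, not a multilevel latent class model: a real multilevel model estimates both levels jointly and works with membership probabilities, whereas this uses hard segment labels and groups countries by their single largest segment. The countries, segment numbers and counts are invented.

```python
from collections import Counter, defaultdict

def segment_shares(records):
    """Share of each consumer segment within each country.

    records: iterable of (country, segment) pairs, one per respondent.
    """
    counts = defaultdict(Counter)
    for country, segment in records:
        counts[country][segment] += 1
    return {
        country: {s: n / sum(seg.values()) for s, n in seg.items()}
        for country, seg in counts.items()
    }

def group_by_dominant_segment(shares):
    """Group countries that share the same largest consumer segment."""
    groups = defaultdict(list)
    for country, profile in shares.items():
        dominant = max(profile, key=profile.get)
        groups[dominant].append(country)
    return dict(groups)

# Hypothetical respondents: (country, consumer segment number).
records = (
    [("Sweden", 1)] * 3 + [("Sweden", 2)]
    + [("Denmark", 1)] * 2 + [("Denmark", 3)]
    + [("Spain", 4)] * 3 + [("Spain", 1)]
    + [("Greece", 7)] * 2
)

shares = segment_shares(records)
country_groups = group_by_dominant_segment(shares)
print(shares["Sweden"])
print(country_groups)
```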

Nested segmentations of this sort can be much more informative to marketers than a simple segmentation across the whole sample.


Please contact Gary Bennett for further information (garyb@logitresearch.com).

 


 
    © Logit Research 2004 Site by Ocean