MIT Technology Review

Making better decisions with big data personas

These data collections representing real people are used to present information to decision-makers. Combining them with analytics makes them much easier to manage.

A persona is an imaginary figure representing a segment of real people, and it is a communicative design technique aimed at enhanced user understanding. For decades, personas were static data structures: frameworks of user attributes with no interactivity. A persona was a means of organizing data about the imaginary person and presenting information to decision-makers, which was not really actionable in most situations.

How personas and data work together

With increasing analytics data, personas can now be generated using big data and algorithmic approaches. This integration of personas and analytics offers opportunities to shift personas from flat files of data presentation to interactive interfaces for analytics systems. These persona analytics systems provide both the empathic connection of personas and the rational insights of analytics. With persona analytics systems, the persona is no longer a static, flat file. Instead, it is an operational mode of accessing user data. Combining personas and analytics also makes user data easier to employ for those lacking the skills or desire to work with complex analytics. Another advantage of persona analytics systems is that one can create hundreds of data-driven personas to reflect the various behavioral and demographic nuances in the underlying user population.

A “personas as interfaces” approach offers the benefits of both personas and analytics systems and addresses the shortcomings of each. By transforming the creation process for both personas and analytics, personas as interfaces have theoretical and practical implications for design, marketing, advertising, health care, and human resources, among other domains.

This personas-as-interfaces approach is the foundation of the persona analytics system Automatic Persona Generation (APG). Advancing the conceptualization, development, and use of both personas and analytics, APG presents a multi-layered, full-stack integration affording three levels of user data presentation: a) the conceptual persona, b) the analytical metrics, and c) the foundational data.

APG generates casts of personas representing the user population, with each segment having a persona. Relying on regular data collection intervals, data-driven personas enrich the traditional persona with additional elements, such as user loyalty, sentiment analysis, and topics of interest, which are features requested by APG customers.

Leveraging intelligence system design concepts, APG identifies unique behavioral patterns of user interactions with products (these can be products, services, content, interface features, etc.) and then associates these patterns with demographic groups based on the strength of each group's association with the pattern. After obtaining a grouped interaction matrix, we apply matrix factorization or other algorithms to identify latent user interaction patterns. Matrix factorization and related algorithms are particularly suited to reducing the dimensionality of large datasets by discerning latent factors.
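To make the factorization step concrete, here is a minimal sketch using non-negative matrix factorization with multiplicative updates. The article does not say which factorization APG uses, so this choice is an assumption, as are the group labels, item names, interaction counts, and function names. The sketch factors a small grouped interaction matrix (demographic groups by content items) into two latent patterns and reports which pattern accounts for most of each group's activity.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Frobenius norm of the reconstruction residual v - w*h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5

def nmf(v, rank, iterations=300, seed=0, eps=1e-9):
    """Factor non-negative v (groups x items) into w (groups x rank) and
    h (rank x items) with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical grouped interaction matrix: rows are demographic groups,
# columns are content items, values are interaction counts (made up).
groups = ["F 18-24 IN", "F 25-34 IN", "M 18-24 US", "M 25-34 US"]
items = ["video_a", "video_b", "video_c", "video_d"]
interactions = [
    [90, 80, 5, 0],
    [70, 95, 0, 10],
    [5, 0, 85, 60],
    [0, 10, 75, 90],
]

w, h = nmf(interactions, rank=2)

# Attribute each group to the latent pattern that contributes most of its
# reconstructed interactions (invariant to how w and h are scaled).
dominant = []
for weights in w:
    contributions = [weights[k] * sum(h[k]) for k in range(len(h))]
    dominant.append(max(range(len(h)), key=lambda k: contributions[k]))

for name, pattern in zip(groups, dominant):
    print(name, "-> latent pattern", pattern)
```

Each latent pattern, together with the groups most strongly associated with it, would then be the raw material for one persona.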

How APG data-driven personas work

APG enriches the user segments produced by the algorithms by adding an appropriate name, picture, social media comments, and related demographic attributes (e.g., marital status, educational level, occupation, etc.), obtained by querying the audience profiles of prominent social media platforms. APG has an internal meta-tagged database of thousands of purchased copyrighted photos that are age, gender, and ethnically appropriate. The system also has an internal database of hundreds of thousands of names that are also age, gender, and ethnically appropriate. For example, for a persona of an Indian female in her twenties, APG automatically selects a name that was popular for females in India twenty years ago. The APG data-driven personas are then displayed to the users from the organization via the interactive online system.
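The name-selection rule can be illustrated with a small lookup keyed by country, gender, and birth decade. This is only a sketch: the table layout, the function names, and the names in the table are hypothetical placeholders, not APG's actual database or real popularity statistics.

```python
# Hypothetical name table: (country, gender, birth decade) -> candidate names.
# The entries are placeholders, not real popularity data.
NAME_TABLE = {
    ("IN", "female", 1990): ["Priya", "Pooja", "Neha"],
    ("IN", "female", 2000): ["Ananya", "Diya", "Riya"],
    ("US", "male", 1990): ["Michael", "Joshua"],
}

def birth_decade(age, reference_year):
    """Decade in which a person of the given age was born."""
    return (reference_year - age) // 10 * 10

def pick_name(country, gender, age, reference_year, index=0):
    """Return a name that fits the persona's country, gender, and birth
    decade, or None when the table has no matching entry."""
    candidates = NAME_TABLE.get((country, gender, birth_decade(age, reference_year)))
    if not candidates:
        return None
    # index lets a caller give different personas different names.
    return candidates[index % len(candidates)]

print(pick_name("IN", "female", 25, 2021))  # prints "Priya"
```

Keying on birth decade rather than current popularity is what keeps a persona in her twenties from receiving a name that only became common recently.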

APG's algorithms act upon the foundational user data, transforming this data into information about users. The outcome of this algorithmic processing is a set of actionable metrics and measures about the user population (e.g., percentages, probabilities, weights) of the type that one would typically see in industry-standard analytics packages. Employing these actionable metrics is the next level of abstraction taken by APG. The result is a persona analytics system capable of presenting user insights at different granularity levels, with levels both integrated and appropriate to the task.

For example, C-level executives may want a high-level view of the users, for which personas would be applicable. Operational managers may want a probabilistic view, for which the analytics would be appropriate. Implementers need to take direct user action, such as for a marketing campaign, for which the individual user data is more suitable.

Each level of the APG can be broken down as follows:

Conceptual level: personas. The highest level of abstraction, the conceptual level, is the set of personas that APG generates from the data using the method described above, with a default of ten personas. However, APG theoretically can generate as many personas as needed. The persona has nearly all the typical attributes that one finds in traditional flat-file persona profiles. However, in APG, personas as interfaces allow for dramatically increased interactivity in leveraging personas within organizations. Interactivity is provided such that the decision-maker can alter the default number to generate more or fewer personas, with the system currently set for between five and fifteen personas. The system can allow for searching a set of personas or leveraging analytics to predict persona interests.

Analytics level: percentages, probabilities, and weights. At the analytics level, APG personas act as interfaces to the underlying information and data used to create the personas. The specific information may vary somewhat by data source. Still, the analytics level will reflect the metrics and measures generated from the foundational user data and used to create the personas. In APG, the personas afford access to the various analytics information via clickable icons on the persona interface. For example, APG displays the percentage of the entire user population that a particular persona represents. This analytic insight helps decision-makers determine the importance of designing or developing for a specific persona and helps address the issue of the persona's validity in representing actual users.

User level: individual data. Leveraging the demographic metadata from the underlying factorization algorithm, decision-makers can access the specific user level (i.e., individual or aggregate) directly within APG. The numerical user data (in various forms) are the foundation of the personas and analytics.
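The three levels can be pictured as one nested data structure, sketched below. The class names, fields, and sample records are hypothetical, and the sketch assumes each user belongs to exactly one persona, which the article does not state. The persona object is the conceptual level, its population share is an analytics-level metric, and its list of user records is the user level.

```python
from dataclasses import dataclass, field

DEFAULT_PERSONAS, MIN_PERSONAS, MAX_PERSONAS = 10, 5, 15  # limits stated in the text

def clamp_persona_count(requested=DEFAULT_PERSONAS):
    """Keep a requested persona count inside the currently supported range."""
    return max(MIN_PERSONAS, min(MAX_PERSONAS, requested))

@dataclass
class UserRecord:
    """User level: one individual's underlying data (hypothetical fields)."""
    user_id: str
    age_group: str
    country: str
    interactions: int

@dataclass
class Persona:
    """Conceptual level: the persona that decision-makers see."""
    name: str
    users: list = field(default_factory=list)

    def share_of(self, total_users):
        """Analytics level: percentage of the user population represented."""
        return 100.0 * len(self.users) / total_users if total_users else 0.0

personas = [
    Persona("Persona A", [
        UserRecord("u1", "18-24", "IN", 12),
        UserRecord("u2", "18-24", "IN", 7),
        UserRecord("u3", "25-34", "IN", 3),
    ]),
    Persona("Persona B", [UserRecord("u4", "25-34", "US", 9)]),
]

total = sum(len(p.users) for p in personas)
for p in personas:
    # Drill down from the persona to its metric and then to its users.
    print(p.name, p.share_of(total), [u.user_id for u in p.users])
```

With these made-up records, Persona A represents 75.0 percent of users and Persona B 25.0 percent, and each persona links straight through to the individual records behind it.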

The implications of data-driven personas

The conceptual shift of personas from flat files to personas as interfaces for enhanced user understanding opens new possibilities for interaction among decision-makers, personas, and analytics. Using data-driven personas embedded as the interfaces to analytics systems, decision-makers can, for example, imbue analysis systems with the benefit of personas to form a psychological bond, via empathy, between stakeholders and user data and still have access to the practical user numbers. There are several practical implications for managers and practitioners. Namely, personas are now actionable, as the personas accurately reflect the underlying user data. This full-stack implementation aspect has not been available with either personas or analytics previously.

APG is a fully functional system deployed with real client organizations. Please visit to see a demo.

This content was written by Qatar Computing Research Institute, Hamad Bin Khalifa University, a member of Qatar Foundation. It was not written by MIT Technology Review’s editorial staff.