A Cognitive Model for Decision-Making with Data Visualizations

Data visualizations increasingly inform our daily decisions. Traffic visualizations suggest which route to take to the office, business intelligence dashboards indicate how you’re doing on projects and key performance indicators, and data collected by fitness trackers tells you how close you are (or aren’t) to reaching your weight loss or fitness goals.

Each of these domains (transport, performance, fitness) uses different kinds of visualizations and may require different decision processes and frameworks. While there’s been significant research on how data visualizations affect decision making in isolated domains, there hasn’t been much cross-domain work attempting to uncover a common cognitive decision-making framework. That is, until recently.

Earlier this year, a team led by Lace Padilla conducted an analysis of decision-making theories and visualization frameworks and proposed an integrated decision-making model.

What are Decision-Making Frameworks?

Over the last 30 years, the dominant theory of how humans make risk-based decisions has been dual-process theory. In the first process, humans make reflexive, intuitive decisions with little deliberate consideration; this is called Type 1 processing. Type 2 processing is more deliberate and contemplative. The two types were made famous by Daniel Kahneman in “Thinking, Fast and Slow.” There have also been proposals that these two types are a gross oversimplification of how the human brain makes decisions, and that the reality is closer to a spectrum of decision-making based on required attention and working memory.

Cross-Domain Research Findings

The researchers identified four findings in their review. The first two are influenced by Type 1 processes, the third by Type 2, and the fourth appears to be influenced by both.

Visualizations direct viewers’ bottom-up attention, which can be helpful or detrimental

Things like colors, edges, lines and other foreground information can cause involuntary shifts in attention (bottom-up attention). This can lead viewers of a visualization to focus on elements like icons while missing task-relevant information. In one example, reproduced from the original paper, some viewers were willing to pay $125 more for tires when viewing the visualization versus a textual representation.

[Figure 6, reproduced from the paper: the tire purchase example]

Bottom-up attention has a significant influence on decision-making, and as a Type 1 process it likely shapes the initial stages of the decision.

Visual encoding techniques prompt visual-spatial biases

How a visualization is presented can trigger biases. One example is using semi-opaque overlays to indicate a user’s probable location on a map. Representing the probable location as a blurred area produced different decisions than representing it as a fixed-boundary probability area, as depicted below:

[Figure: blurred versus fixed-boundary representations of probable location]

Like the previous finding, these visual-spatial biases are a Type 1 process occurring automatically.

Visualizations that have a better cognitive fit result in faster and more effective decisions

“Cognitive fit” describes the alignment between the task or question and the visualization. In other words, is the visualization formatted in a way that facilitates answering the question being asked? The researchers used the example of finding the most significant members of a social media network. When the graph was formatted in a way that didn’t facilitate the task, participants with less working memory capacity performed the task more slowly than those with greater working memory capacity. When using a visualization optimized for the task, there was no difference in task completion times.
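
To make the task in that example concrete, here’s a minimal sketch (my own illustration, not from the paper) using Python’s networkx and matplotlib. It computes degree centrality as a stand-in for “significance” and draws the same small network two ways, one with poor cognitive fit for the task and one where the answer is encoded directly; the dataset and layouts are assumptions chosen purely for illustration.

```python
# Illustrative sketch (not from the paper): the same social network drawn two
# ways. A presentation that encodes "importance" directly supports the
# "who is most significant?" task better than one that does not.
import networkx as nx
import matplotlib.pyplot as plt

# A small, well-known example network (Zachary's karate club).
G = nx.karate_club_graph()

# Degree centrality as a simple stand-in for "significance".
centrality = nx.degree_centrality(G)
most_significant = max(centrality, key=centrality.get)
print(f"Most significant member by degree centrality: node {most_significant}")

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Layout 1: circular layout -- every node looks the same, so the viewer must
# trace edges to judge importance (poor cognitive fit for this task).
nx.draw(G, pos=nx.circular_layout(G), ax=axes[0], node_size=100)
axes[0].set_title("Circular layout (poor fit for the task)")

# Layout 2: node size encodes centrality directly, so the answer "pops out"
# (better cognitive fit -- fewer mental transformations required).
sizes = [3000 * centrality[n] for n in G.nodes()]
nx.draw(G, pos=nx.spring_layout(G, seed=42), ax=axes[1], node_size=sizes)
axes[1].set_title("Centrality-sized nodes (better fit)")

plt.tight_layout()
plt.show()
```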

Knowledge-driven processes can interact with the effects of the encoding technique

The last finding is that the knowledge that a person possesses can impact how the visualization is used, triggering biases or allowing viewers to use existing expertise. Knowledge might be temporarily stored in working memory or held in long-term memory and used with some effort (both Type 2), or stored in long-term memory and automatically used (Type 1).

The Cross-Domain Model

The model the researchers developed adds working memory to a previously existing model of visualization comprehension. Working memory can influence every step in the decision-making process except bottom-up attention.

[Figure: the researchers’ integrated decision-making model]

Recommendations

As part of their review and the cross-domain model depicted above, the researchers offer several recommendations for data visualization designers:

  • Identify the critical information needed for a task and use visual encoding techniques to direct attention to that information.
  • Use a saliency algorithm to determine which elements in a visualization will likely attract viewers’ attention (a minimal sketch of this follows the list).
  • Try to create visualizations that align with a viewer’s mental “schema” and task demands.
  • Ensure cognitive fit by reducing the number of mental transformations required in the decision-making process.
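
For the saliency recommendation above, here’s a minimal sketch of what that might look like in practice. It assumes the opencv-contrib-python package and a chart exported to an image file (chart.png is a placeholder name), and it uses OpenCV’s spectral-residual static saliency detector, which is just one of several available algorithms.

```python
# Illustrative sketch: estimate which regions of a chart image are likely to
# attract bottom-up attention, using OpenCV's spectral-residual saliency model.
# Assumes opencv-contrib-python is installed and "chart.png" is an exported
# visualization (placeholder file name).
import cv2

image = cv2.imread("chart.png")
if image is None:
    raise FileNotFoundError("chart.png not found -- export your visualization first")

# Spectral-residual static saliency: fast and parameter-free.
detector = cv2.saliency.StaticSaliencySpectralResidual_create()
success, saliency_map = detector.computeSaliency(image)
if not success:
    raise RuntimeError("saliency computation failed")

# Scale the 0..1 saliency map to 0..255 and threshold it to highlight the
# regions most likely to draw viewers' attention.
saliency_map = (saliency_map * 255).astype("uint8")
_, attention_mask = cv2.threshold(saliency_map, 0, 255,
                                  cv2.THRESH_BINARY | cv2.THRESH_OTSU)

cv2.imwrite("chart_saliency.png", saliency_map)
cv2.imwrite("chart_attention_mask.png", attention_mask)
print("Wrote chart_saliency.png and chart_attention_mask.png")
```

Comparing the resulting attention mask against the regions that actually carry the task-relevant information is one quick way to spot a mismatch between what the chart emphasizes and what the decision requires.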

Overall, this is excellent work that should be top of mind for anyone using and presenting data visualizations to decision-makers.

Google/Walmart Tie-Up Leaves Data Use and Ownership Unanswered

Google and Walmart have announced a partnership where Google Home users can purchase Walmart’s products using voice ordering. As Recode points out, the intent of the partnership is to blunt Amazon’s initial foray into voice-based ordering. Coming at this from the data and analytics perspective, my first question is what happens to the customer data from, potentially, millions of orders?

Google’s position in the partnership is clearly more advantageous than Walmart’s. For Google, the data from voice-based ordering is likely to be combined with the customer profiles it already has and will feed its advertising efforts. Walmart obviously gets the order data too, but who else? Can Google resell that data to other parties? These details weren’t included in the partnership announcement, but Google’s terms and conditions make it clear that it can use the data however it sees fit.

As partnerships between consumer-centric companies proliferate, questions about who owns customer data and how it is used must become prominent for both the companies involved and the affected consumers. After all, consumers provide the data that drives revenue for companies like Google.

Data Lake Webinar Recap

Last Thursday I presented the webinar “From Pointless to Profitable: Using Data Lakes for Sustainable Analytics Innovation” to about 300 attendees. While we don’t consider webinar polling results valid data for research publication (too many concerns about survey sampling), webinar polls can offer some interesting directional insight.

I asked the audience two questions. First, I asked what the data lake concept meant to them. There were some surprises:
[Poll results: what does the data lake concept mean to you?]

The audience’s primary expectation for a data lake is as a platform to support self-service BI and analytics (36%), followed by a staging area for downstream analytics platforms (25%). It’s not unreasonable to combine these two: the functionality of a data lake is largely the same in both cases. The users for each use case differ, as do the tools, but it’s still the same data lake. A realistic approach is to think of these two use cases as a continuum. Self-service users first identify new or existing data sources that support a new analytical result. Then, those data sources are processed, staged and moved to an optimized analytics platform.
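
As a deliberately simplified sketch of that continuum, the snippet below uses PySpark (one common choice, assumed here) to read raw, schema-on-read data from a hypothetical lake path, lightly refine it, and stage the result as Parquet for a downstream analytics platform. The paths, column names and filter are all placeholders for illustration.

```python
# Illustrative sketch of the continuum: explore raw lake data, then stage a
# refined subset for a downstream, optimized analytics platform.
# Paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("lake-to-staging-example").getOrCreate()

# 1) Self-service exploration: read raw JSON straight from the lake
#    (schema inferred at read time -- schema-on-read).
raw_orders = spark.read.json("s3://example-data-lake/raw/orders/")
raw_orders.printSchema()

# 2) Light refinement: keep only the fields and rows the downstream
#    analysis needs.
refined = (
    raw_orders
    .select("order_id", "customer_id", "order_total", "order_ts")
    .where(F.col("order_total") > 0)
    .withColumn("order_date", F.to_date("order_ts"))
)

# 3) Staging: write a columnar, partitioned copy for the optimized
#    analytics platform to load.
(
    refined.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-data-lake/staged/orders/")
)
```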

It was reassuring to see smaller groups of respondents considering a data lake as a data warehouse replacement (9%) or as a single source for all operational and analytical workloads (15%). I expected these numbers to be higher based on overall market hype.

The second polling question asked what type of data lake audience members had implemented. Before I get into the results, I have to set some context. My colleague Svetlana Sicular identified three data lake architecture styles (see “Three Architecture Styles for a Useful Data Lake”):

  1. Inflow lake: accommodates a collection of data ingested from many different sources that are disconnected outside the lake but can be used together by being colocated within a single place.
  2. Outflow lake: a landing area for freshly arrived data available for immediate access or via streaming. It employs schema-on-read for the downstream data interpretation and refinement. The outflow data lake is usually not the final destination for the data, but it may keep raw data long term to preserve the context for downstream data stores and applications.
  3. Data science lab: most suitable for data discovery and for developing new advanced analytics models — to increase the organization’s competitive advantage through new insights or innovation.

With that context in place, I asked the audience about their implementation:
[Poll results: which data lake architecture style have you implemented?]

63% of respondents have not yet implemented a data lake. That’s understandable; after all, they were attending a foundational webinar about the concept. The outflow lake was the most common architecture style (15%), and it’s also the type clients are asking about most frequently. The inflow and data science lab styles tied at 11%.

The audience also asked some excellent questions. Many asked about securing and governing data lakes, a topic I’m hoping to address soon with Andrew White and Merv Adrian.

Five Levels of Streaming Analytics Maturity

Data and analytics leaders are increasingly targeting stream processing and streaming analytics to get faster time to insight on new or existing data sources. Year to date, streaming analytics inquiries from end users have increased 35% over 2016. I expect that trend to continue.

In getting to real time, these leaders are presented with a range of proprietary commercial products, open source projects and open core products that wrap an existing open source framework. However, in many cases, streaming analytics capabilities are little more than commercially supported open source bundled with some other product. Creating a streaming analytics application is left as an exercise for the buyer.
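
To give a sense of what that “exercise for the buyer” looks like, here’s a minimal sketch of a streaming aggregation using Spark Structured Streaming (one common open source option; the socket source, port and event format are assumptions for illustration). The code itself is the easy part relative to the process, governance and alignment work discussed next.

```python
# Illustrative sketch: a minimal streaming aggregation with Spark Structured
# Streaming. Source, port and event format are placeholders for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("streaming-analytics-sketch").getOrCreate()

# Read a stream of newline-delimited events from a local socket
# (e.g. started with `nc -lk 9999` for testing).
events = (
    spark.readStream
    .format("socket")
    .option("host", "localhost")
    .option("port", 9999)
    .load()
)

# Count events per one-minute window over processing time.
counts = (
    events
    .withColumn("event_time", F.current_timestamp())
    .groupBy(F.window("event_time", "1 minute"))
    .count()
)

# Emit updated counts to the console; a real deployment would target a
# database, dashboard or alerting system instead.
query = (
    counts.writeStream
    .outputMode("update")
    .format("console")
    .start()
)
query.awaitTermination()
```
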
The challenge is that getting real value from streams of data requires more than just a point solution. Stream analytics is a cross-functional discipline integrating technology, business processes, information governance and business alignment. It’s the difficulty of integrating these areas that keeps many organizations from realizing the value of their data in real time. I’ve been working with my colleague Roy Schulte on a streaming analytics maturity model to help organizations understand what’s required at each maturity level.

In “The Five Levels of Stream Analytics — How Mature Are You?”, we present structured maturity levels that data and analytics leaders can use to evaluate the current state of their stream analytics capabilities and advance their organization’s maturity toward becoming a smarter, event-driven enterprise. The report focuses on the use of event streams for analytics, with the goal of improving decision making. Gartner clients can download it here.