Visualizing Data: A Harvard Business Review Insight Center Report
VISUALIZING DATA
The HBR Insight Center highlights emerging thinking around today's most important business ideas. In this Insight Center, we'll explore the power of using data visualization to drive business strategy. We'll talk about when (and when not) to use visualization, how to get started, how to know if you're getting a good return on your data visualization investment, and more.
Editor's Welcome: Your Business Needs Insight, Not Just Pretty Pictures by Scott Berinato
The Power of Visualization's "Aha" Moments by Scott Berinato
The Value of a Good Visual: Immediacy by Bill Franks
The Question All Smart Visualizations Should Ask by Michael Schrage
When Data Visualization Works and When It Doesn't by Jim Stikeleather
We've Reached Peak Infographic, and We're No Smarter for It by Dylan C. Lathrop
It's Time to Retire "Crap Circles" by Gardiner Morse
What Moleskine's Market Position Really Looks Like by Dan McGinn
When Presenting Your Data, Get to the Point Fast by Nancy Duarte
Ten Years of News Corp. Income Data in Less Than a Minute by Gretchen Gavett
Data for All! How New Tools Democratize Visualization by Bill Franks
Visualization as Process, Not Output by Jer Thorp
Telling Stories with Visual Data: A Glimpse into the Future of Narrative by Jer Thorp
When Creating Visualizations, Question Everything by Irene Ros and Adam Hyland
Visualizing Hip Hop Lyrics as Cultural Indicator by Jeff Kehoe
Tell Better Data Stories with Motion and Interactivity by Andrew DeVigal
The Science of What We Do (and Don't) Know About Data Visualization by Robert Kosara
The Chart Wars Have Begun by Jake Porway
Webinar Summary: Get the Picture: Gaining Insight with Data Visualization, featuring Jeremy Howard
EDITOR'S WELCOME: YOUR BUSINESS NEEDS INSIGHT, NOT JUST PRETTY PICTURES
BY SCOTT BERINATO
In 2007, if you were a Starbucks shareholder and you opened your annual financial report, the first meaningful information you encountered was this:
And if you read the narrative annual report from that year, the first thing you saw was this:
Last year, if you were a Starbucks shareholder, you received only one annual report, and the very first page after the cover began with this:
The evolution of annual reports represents a broader trend in business communication: Data comes first, and it's increasingly visual. Welcome to the Visualizing Data Insight Center, where, for the next month, we'll explore the broadening role data visualization plays in business communication. We believe this is not only an inevitable trend but also one you must embrace if you want to effectively communicate with all your stakeholders.
Data visualization is taking hold now because of two trends. The first: Big data is here, it must be analyzed, and one of the best ways to make sense of it is with visual representations. The second: The tools to create good data visualizations are being democratized, which has led to a growing community of programmers, designers, and statisticians who can apply their analytical and intuitive powers to creating meaningful visual stories. We've already declared the data scientist to be the sexiest job of the 21st century. Those scientists will need visualization experts the way writers need editors.
While we'll be sharing some stunning visuals over the next month, this Insight Center is not just focused on pretty pictures. Our aim is to get you acquainted with this burgeoning discipline, emphasizing how and when you can practically apply visualization in order to be a more effective communicator. We'll cover everything from the basics of how you can improve PowerPoint slides using visualization to the advanced ways graphics can make sense of big data sets. We'll also share our contributors' and editors' favorite visualizations, with explanations of why they find them so useful. And, of course, we'll share some all-time bad examples as cautionary tales. We also want to know what your favorite or least favorite data visualizations are. You can email me or post links in the comments.
We're kicking off the Insight Center today with a good mix of introductory content. Designer and art director Dylan C. Lathrop explores the growth of data viz and the positives and negatives that have emerged during this period. I interviewed Amanda Cox, a graphics editor at The New York Times, to talk about how they create their stunning, and stunningly useful, visualizations. How do they decide what's worth doing? And how do they staff against it? And for fun, we're running one of our most popular features on "crap circles," exploring the world of remarkably awful PowerPoint cycle graphics. Thank you for joining us, and jump in with your ideas, insights, and visualizations as well.
what we start with. I come from a statistics background, and I'm finding statistics students' portfolios are crazy weak compared to the computer science students', even though they're playing with the same problems. I think it's because comp sci students are encouraged to play, whereas with stats majors, it's, "Here's your rule book, now make things." I don't think that's a good model for making better visualization.
But surely you have to have some of those three core skills.
In bigger projects, we put together teams where those skills are reflected, but it's not like we all need to know how to program. I bring a statistical background. But I'm not a designer by any means.
That's surprising that, as a graphics editor, you don't consider design a strength.
When I first started, I thought design was ten minutes of "make it cute" at the end that I could talk someone into doing for me. Now I know that design thinking needs to be involved from conception. And after a while you see the math behind it. How do I minimize eye movement on this infographic? Something like that, now I know how to do that because we have design principles. Design and typography do matter. It's about hierarchy of information and how people perceive information. Done properly, that cleanup work really matters. On the other hand, it's easy to believe that it matters more than it does. If you make a fantastically interesting chart and some poor design decisions, the data will still come through. If you make a bad chart with a beautiful design, what have you done, really?
Is there resistance to data viz at The Times? Does anyone think it's a lot of work for not much reward?
Not a ton of resistance here. Editors and reporters rightfully question why we might want to do a visualization. The journalistic value is not always immediately obvious to them, and sometimes they're right. The parallel to me is that there's room in a newspaper for quirky text stories too. Most everyone here would agree that the best way to tell some stories is through data. Some think this is so very rarely, some think this is so most of the time, but they would concede that telling the story with data is accepted.
What about the cost-benefit part of it? Data viz is still somewhat new, and sometimes it seems like it's a lot of work to make a simple point.
We might be different than a non-news organization, but the criterion for us is how interesting or newsworthy is this, just like any story. Also, when people ask us "How long will it take?" our answer is "How long do you have?" We know how to scope based on how much time we have. We rarely kill projects. We'll scale them back, but that's often good, like taking off earrings but making sure you're still well-dressed. As far as ROI goes, I'd argue there are lots of times that, when people see it, they process information differently, better, than if they were reading bullet points or text. How you measure that return, though, is a tricky calculus.
Do you have an example of that reaction?
There's an "aha" moment sometimes. Even on the most obvious things. Take Matthew Bloch, Shan Carter, and Alan McLean's census maps. I'm just seeing what I basically know: New York neighborhoods are segregated. But I felt it in a way I never had before. You can feel a good data visualization.
Do you have other favorites working in the field right now?
There's a group out of Harvard doing interesting stuff on how doctors interpret blood flow in heart diagrams. That kind of work, to me, is interesting. And whenever Eric Fischer publishes a new map, it's usually more interesting than average.
Do you worry about data viz being used to misrepresent data?
Sometimes visualizations can feel like the answer even if they're based on flimsy data. I think that's a problem. Coming from a statistics background, when I first got here I thought my big contribution would be to help us account for uncertainty in data viz, and that turns out to be very difficult. But I also think we have the power to make people more data and visualization literate. One thing we did was take a very simple unemployment chart, your most basic visualization, and we let people choose a Democrat or Republican interpretation of the data. You can literally see the visualization change based on whose point of view was highlighted. It would be silly to interpret any data viz as truth. They are interpretations of truth.
Part of what's driving this focus on data viz is the democratization of the tools needed to create them. Are you positive about the direction of technology to enable this kind of communication?
Yes. You saw a slow period a couple of years ago. Before then, most work was being done in Flash with ActionScript. Then there was a period where I felt like we couldn't do interesting data viz because we had moved away from Flash but didn't have any kind of great replacements for it. Then some of the more tech-competent people started using D3 JavaScript, and now we're having fun with data again. In some ways it feels "of the web" in the way that the Flash stuff never did. Now when someone does something interesting, how he or she did it is really just sitting out there on the Internet, so you get this great sharing and building off of each other.
What gets you most excited about what's happening in data viz right now?
Data viz is both young and not young. It's still rapidly changing, so I'm hoping it gets more awesome rapidly. But we're already at a place where we can make people understand what they didn't understand. Now we want to make people understand what no one has understood before. The best visualizations cause you to see something you weren't expecting and allow you to act on it.
As the map example illustrates, data visualizations can make it easy to rapidly understand relationships, patterns, and stories that are contained within a complex data set. This is the reason they are so powerful. There is a reason for the saying "A picture is worth a thousand words." It also holds for data points. Next, I take away the map and give you a set of data that gives relative location, size, spatial border details, and other information about the states. Could you describe the same relationships in just a few seconds based on the data alone? I'll bet you'd throw your hands up in the air just like I do. I have never been able to find a remotely reasonable way to explain the information immediately visible to me in a map without making use of a map. You might not immediately grasp the importance of the above example because it seems self-evident that maps are important. I'd like to suggest to you that the reason for that is that you are aware that maps exist in the first place. Had you never seen a map, you'd be struggling to explain the information that a map conveys in some other way. This is the key to the value proposition of data visualization. It could be that you are struggling to convey information without
2013 Harvard Business Publishing. All rights reserved.
One of the best features of modern visualization tools is that they permit interactivity with the underlying data. In other words, a visual isn't static. You can click on various parts of a visual to drill into different views of your data on the fly. While many business intelligence tools have enabled drill-down reports for years, they typically contain only common visuals and also typically constrain users to predetermined paths. Visualization tools today don't apply many limits on what users can do, which opens up a lot more options for analyzing data. A few years ago, we put a popular visualization tool on my team's laptops. It was a huge hit. Over time, several members of my team stopped using traditional spreadsheet and presentation tools altogether in favor of the visualization tool. Even if all they need to show a client are some fairly standard bar and pie charts, the interactivity of the tool is a huge plus. When the chart is up on the
screen and a client asks a question that requires a different view of the data, it is easy to drill into that view on the fly. No more sending an email later in the day with another chart. The data in the charts can also be automatically updated with the latest data. That adds a lot of value on top of the visualizations themselves. Don't underestimate how much an appropriate visual can help you get your point across. You have to see the power of high-impact visualizations in order to fully grasp what is possible. The good news is that modern visualization tools can help users at any skill level do a better job of analyzing, comprehending, and presenting information. Give it a shot.
No argument here. ... It's imperative to illustrate the data in a way that people can understand. Phil Simon
Well said. Visualizations open new doors for many interpretations. And the success emerges when the data visualizations trigger meaningful discussions and lead to value creation and innovation. Anithasrnvsn
tricks (such as unnecessary 3D, or 2D when 3D is more informative) any of which can challenge the interpretation of the data. This also creates the risk of prespecifying discoverable features and results via the embedded algorithms used by the creator (something EDA is intended to overcome). These, in turn, can significantly influence how viewers understand the visualization and what insight they will gather from it. Ignoring these requirements and risks can undermine the visualization's purpose and confuse rather than enlighten.
graphic! About what? Whatever! And these so-called information graphics threaten to undermine even the most shining examples. Infographics can evolve by transcending cold data breakdown and combining data visualization with more human narratives. Some publications have begun to present well-designed information in tandem with deeply reported pieces online, and the future it represents is thrilling. I'm not ready for an infographic about the death of infographics, but I'm sure someone somewhere has already assigned that piece and is just waiting for us all to click. This article first appeared in TOMORROW magazine and was reprinted with the author's permission.
As I wrote back then, when you find yourself about to drop a crap circle into your slide deck, stop. And the next time a presenter trots out a circle to make a point, call him on it. Here's the original article. I urge you to forward it to violators, and submit examples in the comments below of the worst (or best?) crap circles you've encountered. The most dubious business plan can look solid, even smart, if it's cast as a virtuous circle: "See, we invest our profits in innovation to create delightful products that customers buy, which generate profits that we invest in innovation!" Who could argue with that? Indeed, the merit of self-reinforcing systems seems so obvious that businesspeople instinctively describe their strategies as cyclical activities that magically fuel themselves. Meanwhile, audiences demand snappy-looking, easy-to-digest graphics that, almost by definition, strip away nuance. It's no surprise, then, that business communications are lousy with circle-and-arrow diagrams that range from the dumb to the deceptive. Though you've seen a million of these, you've probably never thought much about them. That's because, like optical illusions, they play on your expectations and trick you into seeing something that isn't there: If one arrow leads to the next, then of course the
With the next design, a Boston-based software company helpfully illustrates the stages of its application management life cycle. Through some trick of causality, termination leads to deployment. This may be a good model from a consultancy's standpoint (when a client's projects end, they start again), but if you're paying the tab, you probably want the project to actually end when it's terminated.
The friendly-looking sunburst that follows, captured from the website of a solar energy advocacy group, shows how to create an unlimited market for your product. Here, as the supply of solar energy increases, so does the demand, in an apparently endless cycle. If these folks are right, we're all in the wrong business.
And this one, from a Canadian enterprise content management company, is notable for its sleight of hand. Circles rotating in opposite directions (in which, among other oddities, "publish" gives rise to "search") link through arrows whose origins and destinations, on close inspection, are obscure or completely hidden. Maybe the intent of this diagram is to make prospects too dizzy to ask questions.
Kudos, though, to the author of the disarmingly honest graphic below, from a U.S. safety engineers' group, a refreshing bit of out-of-the-circle thinking. He seems to have had an epiphany as he created the diagram, realizing that the development of safety processes doesn't always chase its tail, that management review needn't slavishly feed into strategy and policy in the service of continual improvement.
By fighting the impulse to think in circles, he's set an example for everyone who has uncritically accepted (or, worse, actually constructed) a crap circle, and that's most of us. The next time you find yourself preparing a circle for a presentation, ask yourself whether the process you're describing really works the way you say it does. And the next time a presenter trots out a circle to make a point, find the bogus links and put him on the spot. We could all benefit from a little more linear thinking.
But pie charts can be tricky for an audience to process when segments are similar in size: it's hard to distinguish between them at a glance. If you're running into that problem, consider displaying the same data in a linear way. In this bar chart, for example, you draw attention to the poorest-performing unit, a point that got lost in the pie:
It's confusing, especially if you project it for five seconds and then move on. And even if you leave it up for five minutes while you talk, anyone who's struggling to derive meaning from it won't be paying much attention to what you have to say. They'll be too busy squinting from their seats, trying to navigate all those heavy grid lines that give every single cell equal weight. It's not at all clear where the eye should go. Your audience won't know what direction to read (horizontally or vertically) or what conclusions to draw. Though the grand total line is emphasized, is that really the main point you want to convey? Now let's look at the data presented more simply. Say you've identified three business units with potential for sustained growth in Europe. By eliminating the dense matrix and connecting only key
These few tricks will help audiences see what you want them to see in your data. By focusing their attention on the message behind the numbers, not on the numbers themselves, you can create presentations that resonate with them and compel them to act.
The recommendations are well worth considering and no doubt add value to the presentation. Any information that the presenter considers of significant importance should be provided in another form as significant takeaways to the discussion. Great post. Tim
The post you have written on data visualization is quite compelling. Visualizations help people see things that were not obvious to them before. Will Waugh
I created a visualization. This simple graphic, hand-rolled in a tool called Processing, shows the pools of the memorial as circles, with each name arranged on the edge of a ring. The lines between those names are the requested adjacencies: names that should, as per the wishes of family members, be placed together:
A visualization of the victims' names and requested adjacencies for the 9/11 memorial, fall 2009.
I could have read the number of names and the number of requested connections from the spreadsheet. However, the key part of the problem ended up being in the physical distribution of those connections, which showed up only after sketching. This quick visualization also showed me that the connections were not evenly balanced between the two pools; indeed, they were heavily concentrated in one of the two pools. By building a bespoke visualization quickly, I put the data into a visual form that fit its structure specifically and got to the core of the problem. Getting this insight into the character of the data quickly changed my sense that developing an algorithm was impossible. I could now see that it seemed possible. I think of these small visualization steps as sketch points. I don't have to put too much thought into their aesthetics because they aren't built for public consumption. I make my sketch points in some kind of expressive medium (like Processing) as opposed to a quicker but more constrained tool (like Excel or Tableau) in order to tailor them to the specifics of the data as closely as possible. In this fashion these stops along the way become low-investment testing grounds for new ideas and unusual approaches.
Here are a few sketch points from recent projects, each of which represented a turning point in my thinking:
A sketch point from a 2011 visualization of 138 years of Popular Science.
A sketch point from the development of a new visualization of ad placement networks, 2013.
None of these is meant for public consumption. You're looking at my efforts to work out a problem, see what I'm up against, and find in the sketches potential ways forward. By thinking about visualization as a process instead of an outcome, we arm ourselves with an incredibly powerful thinking tool. By splitting this process into small, bespoke sketch points, we can engage with the character of our data more specifically and access a broader and more varied solution space. Data visualization becomes much more than just the end of a sentence.
Thank you so much for sharing and encouraging right-brain thinking in the process of presenting data. Angela222
TELLING STORIES WITH VISUAL DATA: A GLIMPSE INTO THE FUTURE OF NARRATIVE
BY JER THORP
Editor's note: We've asked contributors to the Visualizing Data Insight Center to show us some of their favorite examples of data viz, with short explanations of what makes those visualizations so effective. Today, Jer Thorp shares one of his favorites. Below is a screen grab of a masterful interactive data visualization. This narrative-driven piece by Pitch Interactive manages the extra-tricky task of balancing heavy subject matter with a clear story and compelling visuals. It's a glimpse into the future of data-driven storytelling. Perhaps the most interesting thing about the piece is that it wasn't commissioned by a media organization. It was built by Pitch as a way to explore and understand this complex topic. Bravo. The full data visualization is here; it's worth watching and scrolling over for a more in-depth view. View more data visualization examples and best practices in our monthlong series on data visualization.
Earned Income Tax Credit, dramatically changing the direction of their research. When the data's forms, shapes, and curves become second nature, the really fun part begins: asking it more questions. It is tempting to start a data visualization project with an idea for a striking visual artifact illustrating a conclusion and then work backward. Much like our initial conversation to discuss goals, clients will often say, "We know that X is true, and we want a data visualization showing that." But is it really true? Even if it is, is that really the right fact to communicate about the data? More often than not, challenging some of these assumptions can have a profound impact, not just from the perspective of the immediate project but also for the organization as a whole. The data, then, exists for two purposes: verifying what we already know and exposing us to what we don't. Having the time and space to explore both of these goals in a phase of analysis is crucial to preparing the data to be visualized. When we truly find the data to support what we're trying to communicate, the visual forms will emerge naturally. Now (yes, now) we start actually visualizing the data. It's time to think about the audience. What are its characteristics? What decisions are readers or constituents expected to make based on the visualization? For highly engaged audiences with their own questions, an exploratory tool is best, like this one from the University of Utah:
For a busy audience, a high-level, static graphic may be most appropriate, such as this one from The Economist:
In practice, this pipeline feeds itself. As we understand our data, it brings up new and better questions for us to answer. Budgeting time for this exploration is necessary in order to create a valuable resource that will help you communicate with your audience. The greatest mistake we see with data visualization is taking the easy way out: oversimplification of the data (such as averaging it out) can hide intricacies and important patterns, but visualizing all the data often results in "hairballs," or distributions with no patterns. In the end, the data visualization workflow requires you to be surprised. When working with data, you can expect to be wrong, to fail fast, and to fail often. More than any other engineering practice, data visualization requires an iterative approach to account for the changing nature of your findings as you work. This approach is strongly supported by modern data visualization practices on the Open Web. The web offers us tools for rapid prototyping, instant support and feedback, development practices that can evolve fast and grow with the changing ecosystem, and powerful tools for interactive software development. Working on the Open Web does not eliminate the complexity of the underlying problems we're trying to solve, but it does offer users and new developers the ability to read, tinker with, and share code because of the collaborative nature of the ecosystem. You're never alone. Regardless of whether you're a startup or an established organization, the Open Web can prove invaluable in making sense of your data and presenting it well.
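As a minimal sketch of how averaging can hide a pattern, consider two made-up series that share the same mean but behave very differently. The store names and numbers below are invented purely for illustration and do not come from any project described here.

```python
from statistics import mean, pstdev

# Hypothetical weekly sales for two stores (invented numbers, illustration only).
steady_store = [50, 51, 49, 50, 50, 50]
volatile_store = [10, 90, 20, 80, 5, 95]


def summarize(values):
    """Return the average together with simple measures of spread."""
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "stdev": round(pstdev(values), 1),
    }


steady = summarize(steady_store)
volatile = summarize(volatile_store)

# The two means are identical, so a chart of averages would show two equal bars.
# The min, max, and standard deviation expose the swings the average conceals.
print(steady)
print(volatile)
```

Plotting only the means would make the two stores look the same; plotting the raw points, or at least the range, is what reveals the difference.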
Always asking questions is such a vital part of getting to a worthwhile place. Chicago Style SEO
Many of these approaches can be combined in the right circumstances. When the delivery format is understood and the constraints of the audience and their expectations clear, it's time to play around with visual designs. Experimentation is crucial. Not all visual designs are created equal; depending on the relationships in the data, some may be more appropriate than others. Targeting a certain format and screen size introduces a different set of constraints and possibilities. The best solution is often the simplest one, and rapid iteration and prototyping are the only ways to ensure that we are representing the data and narrative accurately.
Pace through the data. By layering information over time, you isolate specific data sets to minimize the odds that you overwhelm your audience. Comparing numbers over time can also help identify patterns and highlight trends. Animating these patterns helps bridge the comprehension gap between two sets of data. Minimize the number of canvases. By keeping to one canvas throughout the video, your viewer will be able to understand the full frame of reference. You can zoom in and pan across the overall chart to highlight or tease out nuggets of information. For example, The New York Times' classic "One Race, Every Medalist Ever" relies solely on camera moves throughout the entire video. The objects and data remain static. With intentional angles and views, this single canvas afforded multiple perspectives to visualize the data and tell the historical story of the event.
Add motion purposefully. Do not add animation for animation's sake; make sure that you're using it to convey meaning. Avoid using motion as a type and motion study, as in many college animation assignments. Use motion to show growth, demonstrate a shift over time, or emphasize a piece of data. These points will give your visualized data more impact only if they work in concert with the main goals: Simplify the complex and ensure that the data provide insight.
There is more. My former student Caroline Ziemkiewicz and I found that there is a potential interaction between the visual metaphor used to show data and the linguistic metaphor used to ask a question. We found this when looking at visualizations of trees, or hierarchies. The two most popular visualization techniques for this type of data, treemaps and node-link diagrams, differ in the way they show the hierarchy. Node-link diagrams use levels (or "aboveness"), while treemaps use nesting. A question asked using a levels metaphor ("Which of the nodes below node D ...") is easier to answer using the node-link diagram, which uses a compatible metaphor, than is one asked using containment ("Which of the directories inside directory D ..."), which works better with a treemap. The different metaphors are illustrated below, with treemaps on the left and node-link diagrams on the right.
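For readers who think in code, here is a rough sketch of the two metaphors expressed as data structures. It is only an illustration, not the study's materials: the tree, the node names, and the helper functions `below`, `find`, and `inside` are all hypothetical. The same hierarchy is stored once as parent-to-child links (the node-link view) and once as nested containers (the treemap view), and each question is phrased in the vocabulary that matches its structure.

```python
# Node-link style: explicit parent -> children links (hypothetical tree).
links = {"A": ["B", "D"], "B": ["C"], "D": ["E", "F"], "E": ["G"]}

# Treemap style: the same hierarchy expressed as nested containers.
nested = {"A": {"B": {"C": {}}, "D": {"E": {"G": {}}, "F": {}}}}


def below(node, links):
    """'Which nodes are below D?' Follow the links downward, level by level."""
    found = []
    for child in links.get(node, []):
        found.append(child)
        found.extend(below(child, links))
    return found


def find(name, container):
    """Locate the container called `name` anywhere in the nesting, or None."""
    for key, inner in container.items():
        if key == name:
            return inner
        hit = find(name, inner)
        if hit is not None:
            return hit
    return None


def inside(name, nested):
    """'Which items are inside D?' List everything the container holds."""
    def contents(container):
        out = []
        for key, inner in container.items():
            out.append(key)
            out.extend(contents(inner))
        return out

    box = find(name, nested)
    return contents(box) if box is not None else []


print(below("D", links))
print(inside("D", nested))
```

Both calls describe the same set of nodes; what differs is whether the question is framed as moving down levels or as looking inside a container, which mirrors the distinction between the two diagram types.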
We only scratched the surface on this. There are many other metaphors that are used in visualization, whether obvious or not. Barbara Tversky and Jeff Zacks found in the early 2000s that lines imply transitions whereas bars imply individual values. The seemingly simple choice between a bar graph and a line chart has implications for how we perceive the data. Bizarrely, so does gravity. In our work on metaphors, Ziemkiewicz and I found that people interpreted round shapes as unstable because, they said, they might roll away. But for shapes to roll, there must be a force that causes the movement. After studying this effect some more, we found that the points in a scatter plot attract each other and that they are seemingly pulled down by gravity. We remember points not where they are in the plot but rather shift them toward clusters in our memory and let them drift slightly downward. Findings and distinctions in visualization can be subtle, but they can have a profound impact on how well we can read the information and how we interpret it. There is much more to be learned about how visualization works and how best we can use it to represent, analyze, and communicate data.
Key Learnings
Visualization helps people cope with massive amounts of data.
Graphic depictions have always been a key part of how humans communicate and share information, going back to the days of cave painting. In the Cabinet War Rooms used by Churchill during World War II, walls were covered with maps and color-coded pieces of string. The ability to visualize critical information allowed leaders to be able to quickly and easily understand the situationwhich remains true today. Since WWII, the amount of data in our digital universe has grown exponentially. The volume, mode, quality, speed, and granularity of data have all changed. But the importance of visualization has not. In this ocean of information, the need for visualization to help understand the data is already great and is growing.
Overview
In todays data-rich environment, where the quantity and speed of data are growing exponentially, it is more critical than ever to be able to accurately interpret and quickly act on incoming information. But humans are poorly equipped to efficiently find meaningful patterns in overwhelmingly large databases and numerical tables; picturesborn from data visualizationwork much better. Great visualization provides efficiency at understanding data patterns, helps users derive more insights, and aligns people around a shared view of a situation. Visualization allows users to confirm what they know and exposes an organization to what it doesnt know. Tools can help even non-statisticians visualize data relationships that lead to faster insights and better decisions. Users, though, should be careful not to misrepresent data or to assume they already know the datas meaning.
"The model you really need is to be showing people pictures, because pictures are what the human brain understands best." Jeremy Howard
Data visualization provides multiple benefits, particularly the ability to glean new insights.
Sviokla sees three primary benefits of superior graphic visualization: Efficiency. Great visualizations are efficient in that they let people look at vast quantities of data quickly. Alignment. A great visualization helps create a shared view of a situation. Users are able to verify and share what they know, and people can become aligned on needed actions. Insight. Perhaps most important, the ability to visualize data can help an analyst or group achieve more insight into a problem and discover a new or even greater understanding.
Context
In a prescient HBR blog post, former Harvard Business School professor John Sviokla discussed why data visualization is so important and what its benefits are. In a recent HBR blog post, Jim Stikeleather, an executive strategist on innovation at Dell Services, described the key elements necessary to make data visualization work. In this Harvard Business Review webinar, data scientist Jeremy Howard, who has won numerous data visualization competitions, discussed and demonstrated how simple tools can turn information into insightful pictures.
"I believe that we will naturally migrate toward superior visualizations to cope with this information ocean." John Sviokla
For example, a property and casualty insurer was able to combine visual information from Google Earth with data showing flood plain information (Figure 1). This helped the company better assess risk and enabled its salespeople to communicate to customers why they might have higher premiums. Exploration. Visualization can be used to help build a new model that allows us to predict and better manage a system. The practice of using visual discovery in lieu of statistics is called exploratory data analysis, and too few businesses currently make use of it.
"Ultimately, data visualization is about communicating an idea that will drive action." Jim Stikeleather
For visualization to have value, the data used must be interpretable, relevant, and novel.
In Stikeleather's blog, he emphasizes his support for visualization, but he stresses that visualization must serve an informing purpose and must be based on meaningful data. Stikeleather's criteria for the type of data needed to produce meaningful insights through visualization are that data must be: Interpretable. With so much unstructured data today, it is critical that the data being analyzed generate interpretable information. Collecting lots of data without the associated metadata (such as what it is and where, when, how, and by whom it was collected) reduces the opportunity to play with, interpret, and gain insights from it. Relevant. The data must be relevant to the persons who are looking to gain insights and to the purpose for which the information is being examined. Novel. For a visualization to be meaningful, the data used must be original or shed new light on an area. If the information fails any of these criteria, then even the greatest special effects can't make a visualization valuable. That means that only a tiny slice of the data we can bring to life visually will actually be worth the effort.
FIGURE 1
"In addition to arranging the information to create shared understanding, visualization gives us the ability to combine data in order to create new insight quickly and clearly." John Sviokla
In his blog, Stikeleather provides his take on the reasons to construct data visualizations. He sees them as: Confirmation. If we already have assumptions about how a system operates, visualizations can help us check those assumptions. They can enable us to observe whether an underlying system has deviated from the model we had and assess the risk of actions we are about to undertake. This approach is used in some enterprise dashboards. Education. Visualization offers two forms of education:
Reporting. This is how we measure the performance of an underlying system, often in comparison to other systems or models.
"In most of the visualizations I see, people first decide on the result they're looking for or the stories that they're creating. And then they create a chart to show it. You would be much better off first creating the chart, finding out what the data says, and then creating your story." Jeremy Howard
One advantage of using third-party data scientists is that they look first at data without even understanding the business and then present this data to domain experts to help interpret what it means. Fresh eyes of outsiders can often find relationships in the data that daily users overlook. Beyond the danger of a misleading visualization, Stikeleather laid out other risks related to visualization. Data quality. The quality of the underlying data is crucial to the value of visualization. How complete and reliable is it? As with all analytical processes, putting garbage in means getting garbage out. Context. The point of visualization is to make large amounts of data approachable so we can detect patterns and draw insights from it. To do so, we need to be able to access all the potential relationships of the data elements. This context is the source of insight. To leave out any contextual information or metadata is to risk hampering our understanding. Biases. The creator of a visualization may influence the semantics and syntax of the elements of the visualization via color choices, positioning, and visual tricks, any of which can challenge the interpretation of the data. This can significantly influence how viewers understand the visualization and what insight they will gather from it. Ignoring these risks can undermine the visualization's purpose and confuse rather than enlighten. Although data sets can appear overwhelmingly large, it is almost never necessary to analyze the full set. Excel users should learn how to randomly sample rows, perhaps 10%, and then analyze just that selection.
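The sampling advice above also applies outside Excel. The following is a minimal sketch in Python, not a method from the webinar: the `sample_rows` helper, the `orders` records, and the `amount` field are all hypothetical, invented only to illustrate drawing a 10% random sample and checking it against the full set.

```python
import random

def sample_rows(rows, fraction=0.10, seed=None):
    """Return a simple random sample of `rows`, drawn without replacement."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # local generator: the same seed repeats the same sample
    k = max(1, round(len(rows) * fraction)) if rows else 0
    return rng.sample(rows, k)

# Hypothetical data: 1,000 order records with a made-up "amount" field.
orders = [{"order_id": i, "amount": 20 + (i % 50)} for i in range(1000)]
subset = sample_rows(orders, fraction=0.10, seed=42)

# Compare a summary statistic on the sample with the same statistic on the full set.
full_mean = sum(r["amount"] for r in orders) / len(orders)
sample_mean = sum(r["amount"] for r in subset) / len(subset)
print(len(subset), round(full_mean, 2), round(sample_mean, 2))
```

Fixing the seed makes the sample repeatable, so a chart built from it can be regenerated later. Whether 10% is enough depends on the data; comparing a summary statistic between the sample and the full set, as in the last lines, is one quick sanity check.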
"When you turn things from tables of numbers to pictures, you get a lot more insight." Jeremy Howard
Conclusion
An excellent visualization, according to Edward Tufte, expresses "complex ideas communicated with clarity, precision, and efficiency." Clearly, excellent data visualization also tells a story through the graphical depiction of statistical information. When you are creating a visualization in an educational or confirmational role, it is really a dynamic form of persuasion. Few forms of communication are as persuasive as a compelling narrative. To this end, the visualization needs to tell a story to the audience. It's the story that helps the viewer gain insight from the data.
Additional Information
Play with the data. The data sets used by Howard in this webinar are available for users to play with. They can be accessed at www.jphoward.wordpress.com. Sparklines by Tufte. For more information about Sparklines, read an excerpt from Tufte's book Beautiful Evidence. Color blindness. When using Excel's conditional formatting, opt for blue and red cells rather than blue and green, which are much more difficult for color-blind viewers to distinguish. Higher resolution. Improved visualization is possible today because of improved computing power and also because of the higher-resolution screens and projectors that now exist to display data. To learn more about SAS data visualization and analytic capabilities, visit sas.com/visualanalytics.
Several tools help non-statisticians get started in extracting insights from data.
While a growing number of data visualization tools are available, a simple way to start gaining experience with data visualization is Microsoft Excel. Excel offers three useful features that can help reveal data relationships: 1. Sparklines. These are tiny charts embedded in single Excel cells. They turn data into shapes that make it easier to spot consistent patterns as well as outliers. 2. Conditional formatting. Adding conditional formatting to a cell applies different formatting options, such as color or font style, based on the data in the cell(s). For example, a cell could be colored red when its value falls between two values. By assigning conditional formatting to different data ranges, an analyst can quickly map information frequency patterns that would be tedious to extract from a purely numerical table. 3. Pivot tables. This tool sorts, counts, and sums spreadsheet data and then creates a second table to display the summarized data. Done iteratively, it is a quick way to get a sense of what the data looks like.
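For readers who work outside Excel, the three ideas above can be approximated in a few lines of Python. This is only an illustrative sketch, not Excel's implementation: the helper names `sparkline`, `pivot_sum`, and `flag`, the `sales` records, and the 50 to 100 threshold are all assumptions made up for the example.

```python
from collections import defaultdict

BARS = "▁▂▃▄▅▆▇█"  # eight block heights used to draw a text sparkline

def sparkline(values):
    """Render a sequence of numbers as a one-line text chart."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return BARS[0] * len(values)  # flat series: every bar the same height
    return "".join(BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values)

def pivot_sum(rows, row_key, col_key, value_key):
    """Sum `value_key` for each (row_key, col_key) pair, like a basic pivot table."""
    table = defaultdict(lambda: defaultdict(float))
    for r in rows:
        table[r[row_key]][r[col_key]] += r[value_key]
    return {k: dict(v) for k, v in table.items()}

def flag(value, low, high):
    """Mimic a conditional-formatting rule: mark values between two bounds."""
    return "RED" if low <= value <= high else ""

# Hypothetical sales records, invented for this example.
sales = [
    {"region": "East", "quarter": "Q1", "revenue": 120.0},
    {"region": "East", "quarter": "Q2", "revenue": 90.0},
    {"region": "West", "quarter": "Q1", "revenue": 60.0},
    {"region": "West", "quarter": "Q2", "revenue": 150.0},
    {"region": "East", "quarter": "Q1", "revenue": 30.0},
]

summary = pivot_sum(sales, "region", "quarter", "revenue")
for region, by_quarter in summary.items():
    series = [by_quarter[q] for q in ("Q1", "Q2")]
    print(region, by_quarter, sparkline(series), [flag(v, 50, 100) for v in series])
```

The summary table plays the role of the pivot table, the text sparkline shows each region's shape at a glance, and the flag marks the cells a conditional-formatting rule would color.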
© 2013 Harvard Business Publishing. All rights reserved.
Speaker Biographies
Jeremy Howard, President and Chief Scientist, Kaggle, Inc.
Jeremy Howard is president and chief scientist at Kaggle. Previously, he founded FastMail (sold to Opera Software) and Optimal Decisions (sold to ChoicePoint, now called LexisNexis Risk Solutions). Prior to that he worked in management consulting at McKinsey & Company and at A.T. Kearney. Howard received his BA in philosophy from the University of Melbourne.
The information contained in this summary reflects BullsEye Resources, Inc.'s subjective condensed summarization of the applicable conference session. There may be material errors, omissions, or inaccuracies in the reporting of the substance of the session. In no way does BullsEye Resources or Harvard Business Review assume any responsibility for any information provided or any decisions made based upon the information provided in this document.
Angelia Herrin (Moderator), Editor for Research and Special Projects, Harvard Business Review
Angelia Herrin is editor for research and special projects at Harvard Business Review. She oversaw the relaunch of the management newsletter line and established the conference and virtual seminar division for Harvard Business Review. More recently, Herrin created a new series to deliver customized programs and products to organizations and associations. Prior to coming to Harvard Business Review, Herrin was the vice president for content at womenConnect.com, a website focused on women business owners and executives. Herrin's journalism experience spans 20 years, primarily with Knight-Ridder newspapers and USA Today. At Knight-Ridder, she covered Congress as well as the 1988 presidential election. At USA Today, she worked as Washington editor, heading the 1996 election coverage. She won the John S. Knight Fellowship in Professional Journalism at Stanford University in 1989-90.
Created for Harvard Business Review by BullsEye Resources. www.bullseyeresources.com © 2012 Harvard Business School Publishing.
LEARN MORE
DATA VISUALIZATION: WHAT IS IT AND WHY IS IT IMPORTANT?
sas.com/data-visualization/overview.html
TRY SAS VISUAL ANALYTICS: BROWSE INTERACTIVE REPORTS OR GET DEMO ACCESS TO EXPLORE A SAMPLE DATA SET AND BUILD A REPORT
sas.com/software/visual-analytics/demos/all-demos.html
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
hbr.org