IBM Research

Think Research


 

Featured Concept
Hey baby!

This is the second article in a series on collaboration, which is fast becoming recognized as an essential, yet often hidden, ingredient in working efficiently and effectively. This series focuses on tools and methods that can demystify collaboration and help IBM's clients harness its power.

In February 2005, IBM researcher Martin Wattenberg created a Web-based visualization applet, the NameVoyager, to help call attention to his wife's first book, The Baby Name Wizard, a guide to American baby names. This effort to support his wife's project swept the Web and became a hot topic of conversation - for those searching for the perfect baby name and for others. Without any advertising, the applet drew more than 500,000 site visits in its first two weeks. It has been downloaded more than 900,000 times as of mid-April. Also in April, Google found more than 11,000 references to the NameVoyager.

The broad popularity and effectiveness of the NameVoyager is especially interesting because it is, in essence, an exploratory data analysis application. In many situations, from education to retirement planning, it is important to encourage people to interact with complex data sets. Understanding the factors that led a statistical exploration program to become a minor fad may shed light on broader solutions for encouraging people to engage in data mining.

Creating miners

Comments on the Web provide an unusual and informative window into the user experience, and Wattenberg has studied them extensively. While clearly not a scientific sample, since only the most enthusiastic people will comment, examining these statements suggests some interesting hypotheses regarding the source of popularity of the NameVoyager.

Hundreds of comments show that people are engaged in exploratory data analysis, identifying trends and anomalies, and forming conjectures. These reports show that usage patterns are strongly social and seem more closely related to those of online multiplayer games than to conventional single-user statistical tools. Indeed, users seem to fall neatly into Richard Bartle's well-known categorization of online game players as explorers, achievers, socializers or killers. This stands in contrast to the traditional view of information visualization as a task-oriented problem-solving activity.

Wattenberg hypothesizes that the broad popularity of the NameVoyager stems from features that give it a game-like sense of fun and make it suitable for social data analysis.



Figure 1. The NameVoyager

Easy interaction

The NameVoyager follows Ben Shneiderman's mantra of "overview first, zoom and filter, details on demand." When the applet starts, the viewer sees a set of stripes representing all names in the database. To filter this data, a user may type in letters, forming a prefix; the applet will then visualize data on only those names beginning with that prefix.

The applet reacts directly with each keystroke - the person does not have to press return. This instant interaction saves work and demonstrates how to mine the data. Someone might not think that searching the data set by prefix would be interesting, but seeing the striking patterns for single letters like O or K could encourage further exploration. In addition, the applet moves smoothly between states, so that when a letter is typed, an animated transition helps preserve context.

Figure 1 shows an example: typing "JO" will yield a graph with prominent stripes for popular names such as John and Jonathan, along with many thinner stripes for less popular names like Josette. Typing "LAT" highlights a trend in the African-American community in the 1970s, comprising names such as LaToya, LaTisha, and so on, as in Figure 2.
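The prefix filtering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the applet's actual code: the function names and the tiny data set (with made-up counts) are assumptions introduced here.

```python
from typing import Dict, List

# Hypothetical toy data: name -> counts for three time periods.
# The numbers are invented for illustration, not real name statistics.
NAME_COUNTS: Dict[str, List[int]] = {
    "John": [900, 850, 700],
    "Jonathan": [50, 200, 400],
    "Josette": [5, 8, 3],
    "Kate": [30, 40, 90],
}


def filter_by_prefix(data: Dict[str, List[int]], prefix: str) -> Dict[str, List[int]]:
    """Keep only the names that begin with the prefix, ignoring case."""
    wanted = prefix.casefold()
    return {
        name: counts
        for name, counts in data.items()
        if name.casefold().startswith(wanted)
    }


def keystroke_states(data: Dict[str, List[int]], typed: str) -> List[List[str]]:
    """Simulate instant feedback: one filtered view per keystroke.

    For the input "JO" this yields the view after "J" and then after "JO",
    mirroring an interface that updates without waiting for the return key.
    """
    return [
        sorted(filter_by_prefix(data, typed[:i]))
        for i in range(1, len(typed) + 1)
    ]


if __name__ == "__main__":
    for step, names in enumerate(keystroke_states(NAME_COUNTS, "JO"), start=1):
        print(step, names)
```

Recomputing the view from the full prefix on every keystroke keeps the state of the display equal to the typed text, which is also what makes a view easy to describe to someone else.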

To learn details of a name, a viewer can use the mouse. Hovering over a name stripe will produce a pop-up box with numerical details for a given name at a given point in time. Clicking on a name stripe produces a graph of the popularity of that name alone.

A key distinction between the graphical display of the NameVoyager and other visualizations is the NameVoyager's use of a graph that sums all the time series. This technique seems likely to be of use in many other situations where summarizing is a natural operation, such as investigating product sales data.
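One way to read "a graph that sums all the time series" is as a stacked layout, in which each name occupies a band and the top of the stack is the total for all visible names. The sketch below works under that assumption; the function name and the alphabetical ordering of the bands are choices made here, not details taken from the applet.

```python
from typing import Dict, List, Tuple

Band = List[Tuple[int, int]]  # one (lower, upper) pair per time step


def stack_series(data: Dict[str, List[int]]) -> Tuple[List[int], Dict[str, Band]]:
    """Stack equal-length time series on top of one another.

    Returns the per-step totals and, for each name, the (lower, upper)
    boundaries of its band at every time step.
    """
    n_steps = len(next(iter(data.values()))) if data else 0
    baseline = [0] * n_steps  # running sum of the series stacked so far
    bands: Dict[str, Band] = {}
    for name in sorted(data):  # alphabetical order is an assumption
        counts = data[name]
        upper = [low + count for low, count in zip(baseline, counts)]
        bands[name] = list(zip(baseline, upper))
        baseline = upper
    return baseline, bands


if __name__ == "__main__":
    totals, bands = stack_series({"A": [1, 2], "B": [3, 4]})
    print(totals)
    print(bands)
```

The same routine would apply to any data where summing is meaningful, such as sales per product over time: the outline of the stack shows the overall trend, while each band shows one item's share.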

Figure 2. Names Beginning with LAT

Social data analysis

One of the most consistent themes seen in comments about the NameVoyager is that exploring the data has become a social activity. This is true even for loosely knit groups of Web users. For example, here is a quotation from the comments section of one blog:

"More challenges: which is the steadiest popular name? Victor?" and "Which letter has gone down most consistently? W? Observation: Note the recent upsurge in Y; basically all due to Hispanic (and some Middle Eastern) names"

This quotation illustrates two points. First, it shows how a group of people is using the NameVoyager as a stimulus to conversation and repartee. Second, it reveals an effective style of data analysis: this group is diving very deeply into the data set. They are setting pattern-finding challenges, noting outlying data points, and making guesses about causal relations. Each person seems to be building on the findings of others, making the group extremely effective - all while having fun. Wattenberg refers to data mining through dialogue as social data analysis, a version of exploratory data analysis that relies on social interaction as a source of motivation.

Design hypotheses for social data analysis

To Wattenberg, the evidence suggests that a large part of the power and popularity of the NameVoyager derives from the fact that it encourages a social style of data analysis. It is easily accessible on the Web so that a large group of people can see it. The interactive design, referred to on the Web with such terms as "cool," "fantastic," and "whizzy," means the applet is something that people may be eager to associate themselves with, like a fashionable piece of clothing.

These factors, however, would apply to anything trendy on the Web. The important discovery would be aspects of the applet's popularity that are specific to information visualization.

Wattenberg's first hypothesis is that a combination of common ground with unique individual perspectives will encourage social data analysis. In the case of the NameVoyager, the common ground is shared understanding of cultural connotations of names. Similarly, many names relate to celebrities, pop culture icons or historical figures. This common ground is what makes conversation about the data possible and interesting. Some sample quotes:

"Look what the Simpsons did to the name Bart."

"Roosevelt has two spikes right about where you'd expect them."

The authors of these comments are sharing results of their data mining because they know that their readers will understand the cultural references.

At the same time, Wattenberg believes that it is helpful for each person to have a naturally unique perspective on the data. This individual viewpoint can serve as a kind of icebreaker in the conversation and, because each person is approaching the data in a different way, a group may collectively explore more pieces of the data. Wattenberg refers to this pattern as the "common ground but unique perspective" principle. In the case of the NameVoyager, each person has one obvious point of entry: his or her own name. Applying this principle in other situations may require some flexibility in the data set, but it may also be possible to guide people without modifying the data. Consider, for instance, a visualization tool designed to help people understand different stock market investment strategies. Using well-known companies or events as landmarks could provide common ground. At the same time, there are several unique perspectives that people might take: looking at how their own company's stock has performed, or how the market as a whole did at significant points in their lives.

A second hypothesis is that the interface is an important part of a tool's being used socially. In many cases a group of two or more people used the NameVoyager together, which is to be expected in the case of two parents-to-be trying to find a name they both like. No matter how many people are working together, however, there are two distinct user roles: One person will control the input, while others in the group will act as spectators. Traditionally, interface designers focused on the active participant. With the NameVoyager, however, the prominent text area makes it easy for someone peering over the shoulder of a user to see what is being typed. The immediate letter-by-letter changes in the graphs give the display a live-action quality, allowing spectators to see each step of the user's thinking process. The animation emphasizes the results of the typing, and links successive states in a coherent progression.

The final hypothesis about how the NameVoyager encourages social data exploration is that it allows people to share the state of the visualization at any point in their explorations. Because the interaction model is so simple (just a matter of typing a few letters), it is very easy to guide other people to the same state. And indeed, many comments on the Web are written in the imperative voice, encouraging others to track "Adolph" or "Hillary." In this way, solitary, asynchronous usage can become a shared experience.

Wattenberg believes that exploring further frameworks and design principles related to social data analysis will be a fruitful avenue of investigation. In addition, IBM Research is working with partners on new applications of the basic NameVoyager visualization technique. For example, together with O'Reilly & Associates, the well-known technical book publisher, IBM has created a prototype "BookVoyager," which lets users explore trends in book sales in hundreds of different categories. This application, recently demonstrated at OSCON 2005, includes new features that help the visualization method scale to larger and more complex data sets.

Figure 3. BookVoyager

Do you have a set of data that we could help you visualize? If so, contact us at mwatten@us.ibm.com.

Read the first article in a series on collaboration.



Martin Wattenberg is a mathematician whose research interests include information visualization and its application to collaborative computing, journalism, bioinformatics, and art. Before joining IBM, Martin was the Director of Research and Development at SmartMoney.com, where he designed internet-based financial software. His work at SmartMoney included the groundbreaking Map of the Market, which visualizes live data on hundreds of publicly traded companies.

Martin has also worked with nonfinancial data ranging from email archives to DNA sequences. In addition he is well known for artistic data visualization, visualizing such disparate information sources as music, museum collections, and web searches. His artwork has been exhibited internationally, including at the Whitney Museum of American Art, the New Museum, and Ars Electronica. Martin holds a PhD in Mathematics from the University of California at Berkeley.

