Chapter 4: Creating Specialized Graphs

The first three chapters looked at different ways to create many common types of charts with JavaScript. But if your data has unique properties, or if you want to show it in an unusual way, a more specialized chart might be more appropriate than a typical bar, line or scatter plot.

Fortunately, there are many JavaScript techniques and plugins to expand our visualization vocabulary beyond the standard charts. In this chapter, we’ll look at approaches for several specialized chart types, including:

How to combine hierarchy and dimension with tree maps
How to highlight regions with heat maps
How to show links between elements with network graphs
How to reveal language patterns with word clouds

Visualizing Hierarchies with Tree Maps

Data that we want to visualize can often be organized into a hierarchy, and in many cases that hierarchy is itself an important aspect of the visualization. This chapter considers several tools for visualizing hierarchical data, and we’ll begin the examples with one of the simplest approaches: tree maps. Tree maps represent numeric data with two-dimensional areas, and they indicate hierarchies by nesting subordinate areas within their parent.

There are several algorithms for constructing tree maps from hierarchical data; one of the most common is the squarified algorithm developed by Bruls, Huizing, and van Wijk. This algorithm is favored for many visualizations because it usually generates visually pleasing proportions for the tree map area. To create the graphics in our example, we can use Imran Ghory’s Treemap-Squared library. That library includes code for both calculating and drawing tree maps.

Step 1: Include the Required Libraries

The treemap-squared library itself depends on the Raphaël library for low-level drawing functions. Our markup, therefore, must include both libraries. The Raphaël library is popular enough for public content distribution networks to support, so the first <script> element in the example markup below relies on CloudFlare’s CDN. We’ll have to use our own resources, however, to host the treemap-squared library, and the second <script> element does so.

Note: Chapter 2 includes a more extensive discussion of content distribution networks and the trade-offs involved in using them.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title></title>
  </head>
  <body>
    <div id="treemap"></div>
    <script 
      src="//cdnjs.cloudflare.com/ajax/libs/raphael/2.1.0/raphael-min.js">
    </script>
    <script src="js/treemap-squared-0.5.min.js"></script>
  </body>
</html>

As you can see, we’ve set aside a <div> to hold our tree map. We’ve also included the JavaScript libraries as the last part of the <body> element, as that provides the best browser performance.

Step 2: Prepare the Data

For our example we’ll show the population of the United States divided by region and then, within each region, by state. The data is available from the US Census Bureau. We’ll follow its convention and divide the country into four regions. The resulting JavaScript array could look like the following snippet.

var census = [
  { region: "South", state: "AL", pop2010: 4784762, pop2012: 4822023 },
  { region: "West",  state: "AK", pop2010:  714046, pop2012:  731449 },
  { region: "West",  state: "AZ", pop2010: 6410810, pop2012: 6553255 },
  // Data set continues...

We’ve retained both the 2010 and the 2012 data; the 2010 values become useful in a later step.

To structure the data for the treemap-squared library, we need to create separate data arrays for each region. At the same time, we can also create arrays to label the data values using the two-letter state abbreviations. The following code steps through the census array to build data and label arrays for the "South" region. The same approach works for the other three regions as well.

var south = {};
south.data = [];
south.labels = [];
for (var i=0; i<census.length; i++) {
    if (census[i].region === "South") {
        south.data.push(census[i].pop2012);
        south.labels.push(census[i].state);
    }
}
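
Rather than repeating that loop four times, we could build all of the regions in a single pass. The following is a minimal sketch of that idea, not code from the treemap-squared library; the groupByRegion name and its valueKey parameter are assumptions made for illustration.

```javascript
// Hypothetical helper: split census rows into per-region data and label
// arrays in one pass. `valueKey` names the property to chart.
function groupByRegion(rows, valueKey) {
    var regions = {};
    rows.forEach(function(row) {
        if (!regions[row.region]) {
            regions[row.region] = { data: [], labels: [] };
        }
        regions[row.region].data.push(row[valueKey]);
        regions[row.region].labels.push(row.state);
    });
    return regions;
}

// The three rows shown earlier in this step.
var sampleCensus = [
  { region: "South", state: "AL", pop2010: 4784762, pop2012: 4822023 },
  { region: "West",  state: "AK", pop2010:  714046, pop2012:  731449 },
  { region: "West",  state: "AZ", pop2010: 6410810, pop2012: 6553255 }
];
var byRegion = groupByRegion(sampleCensus, "pop2012");
```

With the full data set, byRegion.South would play the same role as the south object built above.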

Step 3: Draw the Tree Map

Now we’re ready to use the library to construct our tree map. We need to assemble the individual data and label arrays and then call the library’s main function. In the call to Treemap.draw(), the first parameter is the id of the containing element, and the next two are the width and height of the map.

var data = [ 
    west.data, midwest.data, northeast.data, south.data
];
var labels = [ 
    west.labels, midwest.labels, northeast.labels, south.labels
];
Treemap.draw("treemap", 600, 450, data, labels);

The resulting chart, shown in the figure below, provides a simple visualization of the US population. Among the four regions, it is clear where most of the population resides. The bottom right quadrant (the South) has the largest share of the population. And within the regions the relative size of each state’s population is also clear. Notice, for example, how California dominates the West.

Tree maps show the relative size of data values using rectangular area.

Step 4: Varying the Shading to Show Additional Data

The tree map in the previous figure does a nice job of showing the US population distribution in 2012. The population isn’t static, however, and we can enhance our visualization to indicate trends by taking advantage of the 2010 population data that’s still lurking in our data set. When we iterate through the census array to extract individual regions, we can also calculate a few additional values:

We accumulate the total population for all states, both in 2010 and in 2012, in lines 11 and 12, respectively. These values let us calculate the average growth rate for the entire country.
For each state we can calculate its own growth rate in line 13.
For each region, we save both the minimum and maximum growth rates in lines 18 and 19.

Here’s an expanded version of our earlier code fragment that includes these additional calculations.

var total2010 = 0;
var total2012 = 0;
var south = {
    data: [],
    labels: [],
    growth: [],
    minGrowth: 100,
    maxGrowth: -100
};
for (var i=0; i<census.length; i++) {
    total2010 += census[i].pop2010;
    total2012 += census[i].pop2012;
    var growth = (census[i].pop2012 - census[i].pop2010)/census[i].pop2010;
    if (census[i].region === "South") {
        south.data.push(census[i].pop2012);
        south.labels.push(census[i].state);
        south.growth.push(growth);
        if (growth > south.maxGrowth) { south.maxGrowth = growth; }
        if (growth < south.minGrowth) { south.minGrowth = growth; }
    }
    // Code continues...
}

In the same way that we created a master object for the data and the labels, we create another master object for the growth rates. Let’s also calculate the total growth rate for the country overall.

var growth = [ 
    west.growth, midwest.growth, northeast.growth, south.growth
];
var totalGrowth = (total2012 - total2010)/total2010;
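
To make the arithmetic concrete, here is a small worked sketch that applies the same formula to the three rows of census data shown earlier. The growthRate helper is a name introduced for illustration, and the combined figure covers only these three states, not the whole country.

```javascript
// Relative growth between two population counts, expressed as a fraction.
function growthRate(pop2010, pop2012) {
    return (pop2012 - pop2010) / pop2010;
}

// The three rows shown in Step 2.
var sampleRows = [
  { region: "South", state: "AL", pop2010: 4784762, pop2012: 4822023 },
  { region: "West",  state: "AK", pop2010:  714046, pop2012:  731449 },
  { region: "West",  state: "AZ", pop2010: 6410810, pop2012: 6553255 }
];

// Accumulate the totals just as the main loop does.
var sum2010 = 0;
var sum2012 = 0;
sampleRows.forEach(function(row) {
    sum2010 += row.pop2010;
    sum2012 += row.pop2012;
});

var alabamaGrowth = growthRate(4784762, 4822023);   // a little under 0.8 percent
var combinedGrowth = growthRate(sum2010, sum2012);  // growth of the three states together
```

Alabama grew more slowly than the three states combined, so it would fall on the "negative" side of a comparison against that combined rate.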

Now we need a function to calculate the color for a tree map rectangle. We start by defining two color ranges, one for growth rates higher than the national average and another for lower growth rates. We can then pick an appropriate color for each state, based on that state’s growth rate. As an example, here’s one possible set of colors.

var colorRanges = { 
  positive: [ 
    "#FFFFBF","#D9EF8B","#A6D96A","#66BD63","#1A9850","#006837" 
  ],
  negative: [ 
    "#FFFFBF","#FEE08B","#FDAE61","#F46D43","#D73027","#A50026" 
  ]
};

Next is the pickColor function that uses these color ranges to select the right color for each box. The treemap-squared library will call it with two parameters—the coordinates of the rectangle it’s about to draw, and the index into the data set. We don’t need the coordinates in our example, but we will use the index to find the value to model. Once we find the state’s growth rate, we can subtract the national average. That calculation determines which color range to use. States that are growing faster than the national average get the positive color range; states growing slower than the average get the negative range.

The final part of the code calculates where on the appropriate color range to select the color. It uses a linear scale based on the extreme values from among all the states. So, for example, if a state’s growth rate is halfway between the overall average and the maximum growth rate, we’ll give it a color that’s halfway in the positive color range array.

function pickColor(coordinates, index) {
    var regionIdx = index[0];
    var stateIdx  = index[1];
    var growthRate = growth[regionIdx][stateIdx];
    var deltaGrowth = growthRate - totalGrowth;
    var colorRange;
    if (deltaGrowth > 0) {
        colorRange = colorRanges.positive;
    } else {
        colorRange = colorRanges.negative;
        deltaGrowth = -1 * deltaGrowth;
    }
    // minDelta and maxDelta are not defined in this excerpt; they are assumed
    // to hold the smallest and largest deviations from the national average
    // among all the states.
    var colorIndex = Math.floor(colorRange.length * 
        (deltaGrowth - minDelta) / (maxDelta - minDelta));
    if (colorIndex >= colorRange.length) { 
        colorIndex = colorRange.length - 1; 
    }
    
    var color = colorRange[colorIndex];
    return { "fill" : color };
}
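
The pickColor function relies on minDelta and maxDelta, which the code in this section never defines. One plausible way to obtain them, assuming they are the extreme deviations from the national average, is sketched below. Both helper names are hypothetical, and the sample rates are invented (and written in whole percentage points so the arithmetic stays exact).

```javascript
// Hypothetical helper: find the smallest and largest absolute deviation
// from the overall growth rate across every state in every region.
function deltaExtremes(growth, totalGrowth) {
    var minDelta = Infinity;
    var maxDelta = -Infinity;
    growth.forEach(function(regionGrowth) {
        regionGrowth.forEach(function(rate) {
            var delta = Math.abs(rate - totalGrowth);
            if (delta < minDelta) { minDelta = delta; }
            if (delta > maxDelta) { maxDelta = delta; }
        });
    });
    return { minDelta: minDelta, maxDelta: maxDelta };
}

// The same linear scale that pickColor uses, isolated so it can be
// checked by hand. `steps` is the number of colors in the range.
function colorIndexFor(delta, minDelta, maxDelta, steps) {
    var idx = Math.floor(steps * (delta - minDelta) / (maxDelta - minDelta));
    return Math.min(idx, steps - 1);
}

// Invented growth rates for two regions, in percentage points.
var sampleGrowth = [[3, 1], [2, 5]];
var extremes = deltaExtremes(sampleGrowth, 2);
```

With six colors in a range, a deviation halfway between the extremes lands on index 3, and the largest deviation is clamped to the last color at index 5.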

Now when we call Treemap.draw(), we can add this function to its parameters, specifically by setting it as the value for the box key of the options object. The treemap-squared library will then defer to our function for selecting the colors of the regions.

Treemap.draw("treemap", 600, 450, data, labels, {'box' : pickColor});

The resulting tree map, shown in the figure below, still shows the relative populations for all of the states. Now, through the use of color shades, it also indicates the rate of population growth compared to the national average. The visualization clearly shows the migration from the Northeast and Midwest to the South and West.

Tree maps can use color as well as area to show data values.

Highlighting Regions with a Heat Map

If you work in the web industry, heat maps may already be a part of your job. Usability researchers often use heat maps to evaluate site designs, especially when they want to analyze which parts of a web page get the most attention from users. Heat maps work by overlaying values, represented as semi-transparent colors, over a two-dimensional area. As the example in the figure below shows, different colors represent different levels of attention. Users focus most on areas colored red, and less on yellow, green, and blue areas.

Heat maps traditionally show where web users focus their attention on a page.

For this example we’ll use a heat map to visualize an important aspect of a basketball game: where on the court the teams are scoring most of their points. The software we’ll use is the heatmap.js library from Patrick Wied. If you need to create traditional web site heat maps, that library includes built-in support for capturing mouse movements and mouse clicks on a web page. Although we won’t use those features for our example, the general approach is much the same.

Step 1: Include the Required JavaScript

For modern browsers, the heatmap.js library has no additional requirements. The library includes optional additions for real-time heat maps and for geographic integration, but we won’t need these in our example. Older browsers (principally Internet Explorer version 8 and older) can use heatmap.js with the explorer canvas library. Since we don’t need to burden all users with this library, we’ll use conditional comments to include it only when it’s needed. Following current best practices, we include all script files at the end of our <body>.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title></title>
  </head>
  <body>
    <!--[if lt IE 9]><script src="js/excanvas.min.js"></script><![endif]-->
    <script src="js/heatmap.js"></script>
  </body>
</html>

Step 2: Define the Visualization Data

For our example, we’ll visualize the NCAA Men’s Basketball game on 13 February 2013 between Duke University and the University of North Carolina. Our dataset contains details about every point scored in the game. To clean the data, we convert the time of each score to minutes from the game start, and we define the position of the scorer in x- and y-coordinates. We’ve defined these coordinates using several important conventions:

We’ll show North Carolina’s points on the left side of the court and Duke’s points on the right side.
The bottom-left corner of the court corresponds to position (0,0), and the top-right corner corresponds to (10,10).
To avoid confusing free throws with field goals, we’ve given all free throws a position of (–1,–1).

Here’s the beginning of the data; the full data is available with the book’s source code.

var game = [
    { team: "UNC",  points: 2, time: 0.85, 
      unc: 2, duke: 0, x: 0.506, y: 5.039 },
    { team: "UNC",  points: 3, time: 1.22, 
      unc: 5, duke: 0, x: 1.377, y: 1.184 },
    { team: "DUKE", points: 2, time: 1.65, 
      unc: 5, duke: 2, x: 8.804, y: 7.231 },
    // Data set continues...

Step 3: Create the Background Image

A simple diagram of a basketball court, like the one in the figure below, works fine for our visualization. The dimensions of our background image are 600 by 360 pixels.

A background image sets the context for the visualization.

Step 4: Set Aside an HTML Element to Contain the Visualization

In our web page, we need to define the element (generally a <div>) that will hold the heat map. When we create the element, we specify its dimensions, and we define the background. The following fragment does both of those using inline styles to keep the example concise. You might want to use a CSS stylesheet in an actual implementation.

<div id="heatmap"
    style="position:relative;width:600px;height:360px;
           background-image:url('img/basketball.png');">
</div>

Notice that we’ve given the element a unique id. The heatmap.js library needs that id to place the map on the page. Most importantly, we also set the position property to relative. The heatmap.js library positions its graphics using absolute positioning, and we want to contain those graphics within the parent element.

Step 5: Format the Data

For our next step, we must convert the game data into the proper format for the library. The heatmap.js library expects individual data points to contain three properties:

the x coordinate, measured in pixels from the left of the containing element
the y coordinate, measured in pixels from the top of the containing element
the magnitude of the data point (specified by the count property)

The library also requires the maximum magnitude for the entire map, and here things get a little tricky. With standard heat maps, the magnitudes of all the data points for any particular position sum together. In our case that means that all the baskets scored from layups and slam dunks—which are effectively from the same position on the court—are added together by the heat map algorithm. That one position, right underneath the basket, dominates the rest of the court. To counteract that effect, we specify a maximum value far less than what the heat map would expect. In our case, we’ll set the maximum value to 3, which means that any location where at least three points were scored will be colored red, and we’ll easily be able to see all the baskets.

We can use JavaScript to transform the game array into the appropriate format. We start by fetching the height and width of the containing element in lines 1-3. If those dimensions change, our code will still work fine. Then we initialize the dataset object with a max property and an empty data array in lines 4-6. Finally, we iterate through the game data and add relevant data points to this array. Notice that we’re filtering out free throws in line 9.

var docNode = document.getElementById("heatmap");
var height = docNode.clientHeight;
var width  = docNode.clientWidth;
var dataset = {};
dataset.max = 3;
dataset.data = [];
for (var i=0; i<game.length; i++) {
    var currentShot = game[i];
    if ((currentShot.x !== -1) && (currentShot.y !== -1)) {
        var x = Math.round(width  * currentShot.x/10);
        var y = height - Math.round(height * currentShot.y/10);
        dataset.data.push({"x": x, "y": y, "count": currentShot.points});
    }
}
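
To check the coordinate arithmetic without a browser, we can restate the same conversion as a function that takes the container's dimensions as arguments. This is only a sketch: toHeatmapPoints is a hypothetical name, and the second sample row is an invented free throw included to exercise the filter.

```javascript
// Hypothetical pure version of the conversion above, with no DOM access.
// Court coordinates run from (0,0) at bottom left to (10,10) at top right;
// pixel coordinates are measured from the top left of the container.
function toHeatmapPoints(game, width, height) {
    return game.filter(function(shot) {
        // Free throws are recorded at (-1,-1) and are left out.
        return (shot.x !== -1) && (shot.y !== -1);
    }).map(function(shot) {
        return {
            x: Math.round(width * shot.x / 10),
            y: height - Math.round(height * shot.y / 10),
            count: shot.points
        };
    });
}

var sampleGame = [
    { team: "UNC", points: 2, time: 0.85,
      unc: 2, duke: 0, x: 0.506, y: 5.039 },
    // Invented free throw, present only to show the filter at work.
    { team: "UNC", points: 1, time: 1.00,
      unc: 3, duke: 0, x: -1, y: -1 }
];
var samplePoints = toHeatmapPoints(sampleGame, 600, 360);
```

For a 600 by 360 pixel container, the first basket lands near the left edge and roughly halfway down, which matches a shot from close to North Carolina's basket.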

Step 6: Draw the Map

With a containing element and a formatted data set, it’s a simple matter to draw the heat map. We create the heat map object (the library uses the name h337 in an attempt to be clever) by specifying the containing element, a radius for each point, and an opacity. Then we add the data set to this object.

var heatmap = h337.create({
    element: "heatmap",
    radius: 30,
    opacity: 50
});
heatmap.store.setDataSet(dataset);

The resulting visualization in the figure below shows where each team scored its points.

The heat map shows successful shots in the game.

Step 7: Adjust the Heat Map z-index

The heatmap.js library is especially aggressive in its manipulation of the z-index property. To ensure that the heat map appears above all other elements on the page, the library explicitly sets this property to a value of 10000000000. If your web page has elements that you don’t want the heat map to obscure (such as fixed-position navigation menus), that value is probably too aggressive. You can fix it by modifying the source code directly. Or, as an alternative, you can simply reset the value after the library finishes drawing the map.

If you’re using jQuery, the following code will reduce the z-index to a more reasonable value.

$("#heatmap canvas").css("z-index", "1");
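
If you're not using jQuery, the standard DOM methods can do the same job. The sketch below assumes, as the jQuery selector above does, that the library draws into a <canvas> inside our container; the resetZIndex function is a hypothetical helper, not part of heatmap.js.

```javascript
// Sketch: lower the z-index of the heat map's canvas without jQuery.
// `root` is anything with a querySelector method, normally `document`.
function resetZIndex(root, selector, value) {
    var canvas = root.querySelector(selector);
    if (canvas) {
        canvas.style.zIndex = String(value);
    }
    return canvas;
}

// In the browser, after the map is drawn:
//   resetZIndex(document, "#heatmap canvas", 1);
```

Passing the root element as a parameter keeps the function easy to exercise outside a browser.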

Showing Relationships with Network Graphs

Visualizations don’t always focus on the actual data values; sometimes the most interesting aspects of a data set are the relationships among its members. The relationships between members of a social network, for example, might be the most important feature of that network. To visualize these types of relationships, we can use a network graph. Network graphs represent objects, generally known as nodes, as points or circles. Lines or arcs (technically called edges) connect these nodes to indicate relationships.

Constructing network graphs can be a bit tricky, as the underlying mathematics is not always trivial. Fortunately, the sigmajs library takes care of most of the complicated calculations. By using that library, we can create full-featured network graphs with just a little bit of JavaScript. For our example, we’ll consider one critic’s list of the top 25 jazz albums of all time. Several musicians performed on more than one of these albums, and a network graph lets us explore those connections.

Step 1: Include the Required Libraries

The sigmajs library does not depend on any other JavaScript libraries, so we don’t need any other included scripts. It is not, however, available on common content distribution networks. Consequently, we’ll have to serve it from our own web host.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title></title>
  </head>
  <body>
    <div id="graph"></div>
    <script src="js/sigma.min.js"></script>
  </body>
</html>

As you can see, we’ve set aside a <div> in line 8 to hold our graph. We’ve also included the JavaScript library as the last part of the <body> element, as that provides the best browser performance.

Note: In most of the examples in this book, I’ve included steps you can take to make your visualizations compatible with older web browsers such as Internet Explorer 8. In this case, however, those approaches degrade performance so severely that they are rarely workable. To view the network graph visualization, your users will need a modern browser.

Step 2: Prepare the Data

Our data on the top 25 jazz albums looks like the following snippet. I’m showing only the first couple of albums below, but you can see the full list in the book’s source code.

var albums = [
  {
    album: "Miles Davis - Kind of Blue",
    musicians: [
      "Cannonball Adderley",
      "Paul Chambers",
      "Jimmy Cobb",
      "John Coltrane",
      "Miles Davis",
      "Bill Evans"
    ]
  },{
    album: "John Coltrane - A Love Supreme",
    musicians: [
      "John Coltrane",
      "Jimmy Garrison",
      "Elvin Jones",
      "McCoy Tyner"
    ]
  // Data set continues...

That’s not exactly the structure that sigmajs requires. We could convert it to a sigmajs JSON data structure in bulk, but there’s really no need. Instead, as we’ll see in the next step, we can simply pass data to the library one element at a time.

Step 3: Define the Graph’s Nodes

Now we’re ready to use the library to construct our graph. We start by initializing the library and indicating where it should construct the graph. That parameter is the id of the <div> element set aside to hold the visualization.

var s = new sigma("graph");

Now we can continue by adding the nodes to the graph. In our case, each album is a node. As we add a node to the graph, we give it a unique identifier (which must be a string), a label, and a position. Figuring out an initial position can be a bit tricky for arbitrary data. In a few steps we’ll look at an approach that makes the initial position less critical. For now, though, we’ll simply spread our albums in a circle using basic trigonometry. The radius value is roughly half of the width of the container. We can also give each node a different size, but for our purposes it’s fine to set every album’s size to 1.

// radius is assumed to be defined earlier (roughly half the container width)
for (var idx=0; idx<albums.length; idx++) {
    var theta = idx*2*Math.PI / albums.length;
    s.graph.addNode({
        id: ""+idx,   // Note: 'id' must be a string
        label: albums[idx].album,
        x: radius*Math.sin(theta),
        y: radius*Math.cos(theta),
        size: 1
    });
}
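
The trigonometry is easy to verify on its own. Here is a minimal sketch that computes only the positions, using the same formulas as the loop above; circlePositions is a name introduced for illustration.

```javascript
// Spread `count` points evenly around a circle of the given radius,
// starting at the top (theta = 0 gives x = 0, y = radius).
function circlePositions(count, radius) {
    var positions = [];
    for (var idx = 0; idx < count; idx++) {
        var theta = idx * 2 * Math.PI / count;
        positions.push({
            x: radius * Math.sin(theta),
            y: radius * Math.cos(theta)
        });
    }
    return positions;
}

var spots = circlePositions(4, 100);
```

With four nodes the points fall on the four compass directions of the circle, apart from tiny floating-point error in the values that should be exactly zero.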

Finally, after defining the graph, we tell the library to draw it.

s.refresh();

As the figure below shows, we now have a nicely drawn circle of the top jazz albums of all time. In this initial attempt some of the labels may get in one another’s way, but we’ll address that shortly.

Sigmajs draws graph nodes as small circles.

If you try out this visualization in the browser, you’ll notice that the sigmajs library automatically supports panning the graph, and users can move their mouse pointer over individual nodes to highlight the node labels.

Step 4: Connect the Nodes with Edges

Now that we have the nodes drawn in a circle, it’s time to connect them with edges. In our case, an edge—or connection between two albums—represents a musician who performed on both of the albums. To find those edges, we iterate through the albums in four stages.

Loop through each album as a potential source of a connection (line 1).
For the source album, loop through all musicians (line 3).
For each musician, loop through all of the remaining albums as potential targets for a connection (line 5).
For each target album, loop through all the musicians looking for a match (line 7).

For the last step we’re using the some() method of JavaScript arrays. That method takes a function as a parameter, and it returns true if that function itself returns true for any element in the array.

for (var srcIdx=0; srcIdx<albums.length; srcIdx++) {
    var src = albums[srcIdx];
    for (var mscIdx=0; mscIdx<src.musicians.length; mscIdx++) {
        var msc = src.musicians[mscIdx];
        for (var tgtIdx=srcIdx+1; tgtIdx<albums.length; tgtIdx++) {
            var tgt = albums[tgtIdx];
            if (tgt.musicians.some(function(tgtMsc) {
                return tgtMsc === msc;
            })) {
                s.graph.addEdge({
                    id: srcIdx + "." + mscIdx + "-" + tgtIdx,
                    source: ""+srcIdx,
                    target: ""+tgtIdx
                });
            }
        }
    }
}
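
To see what those nested loops produce, we can run the same search over the two albums shown earlier and collect the edges in a plain array instead of handing them to sigmajs. This is a sketch for checking the logic; findEdges is a hypothetical name.

```javascript
// Same four-stage search as above, but the edges are returned in an
// array rather than added to a sigmajs graph.
function findEdges(albums) {
    var edges = [];
    albums.forEach(function(src, srcIdx) {
        src.musicians.forEach(function(msc, mscIdx) {
            for (var tgtIdx = srcIdx + 1; tgtIdx < albums.length; tgtIdx++) {
                var shared = albums[tgtIdx].musicians.some(function(tgtMsc) {
                    return tgtMsc === msc;
                });
                if (shared) {
                    edges.push({
                        id: srcIdx + "." + mscIdx + "-" + tgtIdx,
                        source: "" + srcIdx,
                        target: "" + tgtIdx
                    });
                }
            }
        });
    });
    return edges;
}

// The two albums shown in Step 2.
var sampleAlbums = [
  { album: "Miles Davis - Kind of Blue",
    musicians: ["Cannonball Adderley", "Paul Chambers", "Jimmy Cobb",
                "John Coltrane", "Miles Davis", "Bill Evans"] },
  { album: "John Coltrane - A Love Supreme",
    musicians: ["John Coltrane", "Jimmy Garrison", "Elvin Jones",
                "McCoy Tyner"] }
];
var sampleEdges = findEdges(sampleAlbums);
```

John Coltrane is the only musician the two albums share, so the search yields a single edge. Its id combines the source album, the musician's position on that album, and the target album, which keeps ids unique even when two albums share several musicians.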

We’ll want to insert this code before we refresh the graph. When we’ve done that, we’ll have a connected circle of albums, as shown in the figure below.

Sigmajs can then connect graph nodes using lines to represent edges.

Again, you can pan the graph to focus on different parts.

Step 5: Automate the Layout

So far we’ve manually placed the nodes in our graph in a circle. That’s not a terrible approach, but it can make it hard to discern some of the connections. It would be better if we could let the library calculate a more optimal layout than the simple circle. That’s exactly what we’ll do now.

The mathematics behind this approach is known as force-directed graphing. In a nutshell, the algorithm proceeds by treating the graph’s nodes and edges as physical objects subject to real forces such as gravity and electromagnetism. It simulates the effect of those forces, pushing and prodding the nodes into new positions on the graph.
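
To get a feel for the idea, consider the toy simulation sketched below. It is not the algorithm that sigmajs uses, and every name and constant in it is an assumption chosen for illustration: each pair of nodes repels with a force that weakens with distance, and each edge pulls its endpoints together like a spring. Here the edges refer to nodes by array index rather than by string id.

```javascript
// Toy force-directed step (not the sigmajs plugin's algorithm).
function stepLayout(nodes, edges, repulsion, attraction, stepSize) {
    var forces = nodes.map(function() { return { x: 0, y: 0 }; });

    // Every pair of nodes pushes apart; magnitude is repulsion / distance.
    for (var i = 0; i < nodes.length; i++) {
        for (var j = i + 1; j < nodes.length; j++) {
            var dx = nodes[i].x - nodes[j].x;
            var dy = nodes[i].y - nodes[j].y;
            var distSq = (dx * dx + dy * dy) || 0.0001;
            forces[i].x += repulsion * dx / distSq;
            forces[i].y += repulsion * dy / distSq;
            forces[j].x -= repulsion * dx / distSq;
            forces[j].y -= repulsion * dy / distSq;
        }
    }

    // Every edge pulls its endpoints together; magnitude grows with distance.
    edges.forEach(function(edge) {
        var a = nodes[edge.source];
        var b = nodes[edge.target];
        forces[edge.source].x += attraction * (b.x - a.x);
        forces[edge.source].y += attraction * (b.y - a.y);
        forces[edge.target].x -= attraction * (b.x - a.x);
        forces[edge.target].y -= attraction * (b.y - a.y);
    });

    // Nudge each node a small step in the direction of its net force.
    nodes.forEach(function(node, k) {
        node.x += stepSize * forces[k].x;
        node.y += stepSize * forces[k].y;
    });
}

// Two connected nodes that start far apart.
var demoNodes = [ { x: -5, y: 0 }, { x: 5, y: 0 } ];
var demoEdges = [ { source: 0, target: 1 } ];
for (var n = 0; n < 500; n++) {
    stepLayout(demoNodes, demoEdges, 1, 0.1, 0.1);
}
var finalDistance = demoNodes[1].x - demoNodes[0].x;
```

The two nodes drift together until the pull of the edge balances the repulsion, which for these constants happens at a distance of the square root of 10. A real layout algorithm adds refinements such as gravity and damping, but the push and pull are the same in spirit.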

The underlying algorithm may be complicated, but sigmajs makes it easy to employ. First we have to add an optional plugin to the sigmajs library. That’s the forceAtlas2 plugin in line 10 below.

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="utf-8">
        <title></title>
    </head>
    <body>
        <div id="graph"></div>
        <script src="js/sigma.min.js"></script>
        <script src="js/sigma.layout.forceAtlas2.min.js"></script>
    </body>
</html>

Mathieu Jacomy and Tommaso Venturini developed the specific force-direction algorithm employed by this plugin; they document the algorithm, known as Force Atlas 2, in the 2011 paper ForceAtlas2, A Graph Layout Algorithm for Handy Network Visualization. Although we don’t have to understand the mathematical details of the algorithm, knowing how to use its parameters does come in handy. There are three parameters that are important for most visualizations that use the plugin:

gravity. This parameter determines how strongly the algorithm tries to keep isolated nodes from drifting off the edges of the screen. Without any gravity, the only force acting on isolated nodes will be one that repels them from other nodes; undeterred, that force will push the nodes off the screen entirely. Since our data includes several isolated nodes, we’ll want to set this value relatively high to keep those nodes on the screen.
scalingRatio. This parameter determines how strongly nodes repel each other. A small value draws connected nodes closer together while a large value forces all nodes farther apart.
slowDown. This parameter decreases the sensitivity of the nodes to the repulsive forces from their neighbors. Reducing the sensitivity (by increasing this value) can help reduce the instability that may result when nodes face competing forces from multiple neighbors. In our data there are many connections that will tend to draw the nodes together and compete with the force pulling them apart. To dampen the wild oscillations that might otherwise ensue, we’ll set this value relatively high as well.

The best way to settle on values for these parameters is to experiment with the actual data. For this data set, we’ve settled on the values in line 1 below.

Now, instead of simply refreshing the graph when we’re ready to display it, we start the force-directed algorithm, which periodically refreshes the display while it performs its simulation. We also need to stop the algorithm after it’s had a chance to run for a while. In our case 10 seconds (10000 ms) is plenty of time.

s.startForceAtlas2({gravity:100,scalingRatio:70,slowDown:100});
setTimeout(function() { s.stopForceAtlas2(); }, 10000);

As a result, our albums start out in their original circle, but quickly migrate to a position that makes it much easier to identify the connections. Some of the top albums are tightly connected, indicating that they have many musicians in common. A few, however, remain isolated. Their musicians make the list only once.

Force direction positions the graph nodes automatically.

As you can see, the labels for the nodes still get in the way of each other; we’ll fix that in the next step. What’s important here, however, is that it’s much easier to identify the albums with lots of connections. The nodes representing those albums have migrated to the center of the graph, and they have many links to other nodes.

Step 6: Adding Interactivity

To keep the labels from interfering with one another, we can add some interactivity to the graph. By default, we’ll hide the labels entirely, giving users the chance to appreciate the structure of the graph without distractions. We’ll then allow them to click on individual nodes to reveal the album title and its connections. To suppress the initial label display, we can modify the initialization code so that nodes have blank labels (line 5). We’ll save a reference to the album title, though, in line 6.

for (var idx=0; idx<albums.length; idx++) {
    var theta = idx*2*Math.PI / albums.length;
    s.graph.addNode({
        id: ""+idx,   // Note: 'id' must be a string
        label: "",
        album: albums[idx].album,
        x: radius*Math.sin(theta),
        y: radius*Math.cos(theta),
        size: 1
    });
}

Now we need a function that responds to clicks on the node elements. The sigmajs library supports exactly this sort of function with its interface. We simply bind to the clickNode event.

s.bind('clickNode', function(ev) {
    var nodeIdx = ev.data.node.id;
    // Code continues...
});

Within that function, the ev.data.node.id property gives us the index of the node that the user clicked. The complete set of nodes is available from the array returned by s.graph.nodes(). Since we want to display the label for the clicked node (but not for any other), we can iterate through the entire array. At each iteration, we either set the label property to an empty string (to hide it) or to the album property (to show it).

s.bind('clickNode', function(ev) {
    var nodeIdx = ev.data.node.id;
    var nodes = s.graph.nodes();
    nodes.forEach(function(node) {
        if (nodes[nodeIdx] === node) {
            node.label = node.album;
        } else {
            node.label = "";
        }
    });
});

Now that users have a way to show the title of an album, let’s give them a way to hide it. A small addition in the preceding code is all it takes to let users toggle the album display with subsequent clicks.

        if (nodes[nodeIdx] === node && node.label !== node.album) {

As long as we’re making the graph respond to clicks, we can also take the opportunity to highlight the clicked node’s connections. We do that by changing their color. Just as s.graph.nodes() returns an array of the graph nodes, s.graph.edges() returns an array of edges. Each edge object includes target and source properties that hold the index of the relevant node.

We can then scan through all the graph’s edges to see if they connect to the clicked node. If the edge does connect to the node, we can change its color to something other than the default (line 4). Otherwise, we change the color back to the default (line 6). You can see in line 2 that we’re using the same approach as we did with the nodes to toggle the edge colors on successive clicks.

s.graph.edges().forEach(function(edge) {
    if ((nodes[nodeIdx].label === nodes[nodeIdx].album) && 
        ((edge.target === nodeIdx) || (edge.source === nodeIdx))) {
        edge.color = 'blue';
    } else {
        edge.color = 'black';
    }
});

Now that we’ve changed the graph properties, we have to tell sigmajs to redraw it. That’s a simple matter of calling s.refresh().

s.refresh();
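
Pulling the fragments of this step together, the click logic can be sketched as one function that works on plain arrays, which makes the toggle behavior easy to test without sigmajs. The applyClick name is hypothetical, and this version matches nodes by their id property rather than by array position, a small variation on the code above.

```javascript
// Sketch of the click logic on plain node and edge arrays.
function applyClick(nodes, edges, nodeIdx) {
    // Show the clicked node's label unless it is already showing;
    // hide every other label.
    nodes.forEach(function(node) {
        if (node.id === nodeIdx && node.label !== node.album) {
            node.label = node.album;
        } else {
            node.label = "";
        }
    });

    // Highlight the clicked node's edges only while its label is visible.
    var clicked = nodes.filter(function(node) {
        return node.id === nodeIdx;
    })[0];
    var showing = Boolean(clicked) && clicked.label === clicked.album;
    edges.forEach(function(edge) {
        var touches = (edge.target === nodeIdx) || (edge.source === nodeIdx);
        edge.color = (showing && touches) ? "blue" : "black";
    });
}

// Invented three-node graph for illustration.
var demoGraphNodes = [
    { id: "0", label: "", album: "Album A" },
    { id: "1", label: "", album: "Album B" },
    { id: "2", label: "", album: "Album C" }
];
var demoGraphEdges = [
    { id: "e0", source: "0", target: "1" },
    { id: "e1", source: "1", target: "2" }
];
```

In the real handler we would pass s.graph.nodes(), s.graph.edges(), and ev.data.node.id to such a function and then call s.refresh().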

Now we have a fully interactive network graph, shown in the figure below. Our users can get a quick sense of the connections between albums, and a simple click provides additional details.

An interactive graph gives users the chance to highlight specific nodes.

Revealing Language Patterns with Word Clouds

Data visualizations don’t always focus on numbers. Sometimes the data for a visualization centers on words instead, and a word cloud is often an effective way to present this kind of data. Word clouds can associate any quantity with a list of words; most often that quantity is a relative frequency. This type of word cloud, which we’ll create for our next example, reveals which words are common and which words are rare.

To create this visualization we’ll rely on the wordcloud2 library, a spin-off from author Tim Dream’s HTML5 Word Cloud project.

Note: As is the case with a few of the more advanced libraries we’ve examined, wordcloud2 doesn’t function very well in older web browsers such as Internet Explorer version 8 and earlier. Since wordcloud2 itself requires a modern browser, for this example we won’t worry about compatibility with older browsers. This will free us to use some other modern JavaScript features, too.

Step 1: Include the Required Libraries

The wordcloud2 library does not depend on any other JavaScript libraries, so we don’t need any other included scripts. It is not, however, available on common content distribution networks, so we’ll have to serve it from our own web host.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title></title>
  </head>
  <body>
    <script src="js/wordcloud2.js"></script>
  </body>
</html>

Note: To keep our example focused on the visualization, we’ll use a word list that doesn’t need any special preparation. If you’re working with natural language as spoken or written, however, you might wish to process the text to identify alternate forms of the same word. For example, you might want to count “hold,” “holds,” and “held” as three instances of “hold” rather than three separate words. This type of processing obviously depends greatly on the particular language. If you’re working in English or Chinese, though, the developer who created wordcloud2 has also released the WordFreq JavaScript library, which performs this type of analysis.
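
As a rough illustration of that kind of preparation, the sketch below folds alternate forms into a single entry using a hand-written lookup table. The baseForms table and the countWords function are hypothetical and cover only the example words; a real application would rely on a proper stemming library instead.

```javascript
// Hypothetical lookup table mapping alternate forms to a base word.
var baseForms = {
    "holds": "hold",
    "held": "hold"
};

// Count words, treating alternate forms as instances of the base word.
function countWords(words) {
    var counts = Object.create(null);  // no inherited keys to collide with
    words.forEach(function(word) {
        var key = word.toLowerCase();
        if (Object.prototype.hasOwnProperty.call(baseForms, key)) {
            key = baseForms[key];
        }
        counts[key] = (counts[key] || 0) + 1;
    });
    // Convert the totals to [word, count] pairs.
    return Object.keys(counts).map(function(word) {
        return [word, counts[word]];
    });
}

var counted = countWords(["hold", "holds", "held", "word"]);
// counted is [["hold", 3], ["word", 1]]
```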

Step 2: Prepare the Data

For this example we’ll look at the different tags users associate with their questions on the popular site Stack Overflow. That site lets users pose programming questions that the community tries to answer. Tags provide a convenient way to categorize the questions so that users can browse other posts related to the same topic. By constructing a word cloud (perhaps better named a tag cloud) we can quickly show the relative popularity of different programming topics.

If you wanted to develop this example into a real application, you could access the Stack Overflow data in real time using the site’s API. For our example, though, we’ll use a static snapshot. Here’s how it starts:

var tags = [
    ["c#", 601251],
    ["java", 585413],
    ["javascript", 557407],
    ["php", 534590],
    ["android", 466436],
    ["jquery", 438303],
    ["python", 274216],
    ["c++", 269570],
    ["html", 259946],
    // Data set continues...

In this data set, the list of tags is an array, and each tag within the list is also an array. These inner arrays have the word itself as the first item and a count for that word as the second item. You can see the complete list in the book’s source code.
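
For the real-time variant mentioned earlier, the conversion from an API response to this array format could look something like the following sketch. It assumes the Stack Exchange API’s /tags endpoint and a response whose items carry name and count fields; check the current API documentation before relying on either. The toTagList function and the sample response are hypothetical.

```javascript
// Assumed endpoint; verify against the Stack Exchange API documentation.
var url = "https://api.stackexchange.com/2.3/tags" +
          "?order=desc&sort=popular&site=stackoverflow";

// Convert a response object into [tag, count] arrays.
function toTagList(response) {
    return response.items.map(function(item) {
        return [item.name, item.count];
    });
}

// Hypothetical response fragment, shaped like the assumed API output.
var sample = {
    items: [
        {name: "javascript", count: 557407},
        {name: "php", count: 534590}
    ]
};
var liveTags = toTagList(sample);

// In a browser, the same function could process a live response:
// fetch(url)
//     .then(function(res) { return res.json(); })
//     .then(function(data) { var tags = toTagList(data); });
```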

The format that wordcloud2 expects is quite similar to how our data is already laid out, except that in each word array, the second value needs to specify the drawing size for that word. For example, the array element ["javascript", 56] would tell wordcloud2 to draw “javascript” with a height of 56 pixels. Our data, of course, isn’t set up with pixel sizes. The data value for “javascript” is 557407, and a word 557407 pixels high wouldn’t even fit on a billboard. As a result, we must convert counts to drawing sizes. The specific algorithm for this conversion will depend both on the size of the visualization and the raw values. A simple approach that works in this case is to divide the count values by 10000 and round to the nearest integer.

In chapter 2, we saw how jQuery’s .map() function makes it easy to process all the elements in an array. It turns out that modern browsers have the same functionality built in, so we can use the native version of .map() even without jQuery. (Unlike the jQuery version, this native version won’t work in older browsers, but we’re not worrying about that for this example.)

var list = tags.map(function(word) { 
    return [word[0], Math.round(word[1]/10000)]; 
});

After this code executes, our list variable will contain the following:

[
    ["c#", 60],
    ["java", 59],
    ["javascript", 56],
    ["php", 53],
    ["android", 47],
    ["jquery", 44],
    ["python", 27],
    ["c++", 27],
    ["html", 26],
    // Data set continues...
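
Dividing by 10000 works for this snapshot, but the divisor has to change whenever the counts do. One alternative, sketched here under the assumption that the most popular word should be drawn at a chosen maximum height, derives the scale from the data itself. The scaleCounts function and its maxHeight parameter are hypothetical names.

```javascript
// Scale counts so that the largest one maps to maxHeight pixels.
function scaleCounts(tags, maxHeight) {
    var largest = tags.reduce(function(max, word) {
        return Math.max(max, word[1]);
    }, 0);
    if (largest === 0) {
        // Nothing to scale; avoid dividing by zero.
        return tags.map(function(word) { return [word[0], 0]; });
    }
    return tags.map(function(word) {
        return [word[0], Math.round(word[1] / largest * maxHeight)];
    });
}

var scaled = scaleCounts([["c#", 601251], ["html", 259946]], 60);
// scaled is [["c#", 60], ["html", 26]]
```

With a maximum height of 60 pixels, these two words come out at the same sizes as in the list above.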

Step 3: Add the Required Markup

The wordcloud2 library can build its graphics either using the HTML <canvas> interface or in pure HTML. As we’ve seen with many graphing libraries, <canvas> is a convenient interface for creating graphic elements. For word clouds, however, there aren’t many benefits to using <canvas>. Native HTML, on the other hand, lets us use all the standard HTML tools (such as CSS stylesheets or JavaScript event handling). That’s the approach we’ll take in this example. When using native HTML, we do have to make sure that the containing element has a position: relative style, because wordcloud2 relies on that when placing the words in their proper location in the cloud. You can see in line 8 below that we’ve set that style inline.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title></title>
  </head>
  <body>
    <div id="cloud" style="position:relative;"></div>
    <script src="js/wordcloud2.js"></script>
  </body>
</html>

Step 4: Create a Simple Cloud

With these preparations in place, creating a simple word cloud is about as easy as it can get. We call the wordcloud2 library and tell it the HTML element in which to draw the cloud, and the list of words for the cloud’s data.

WordCloud(document.getElementById("cloud"), {list: list});

Even with nothing other than default values, wordcloud2 creates the attractive visualization shown in figure .

A word cloud can show a list of words with their relative frequency.

The wordcloud2 interface also provides many options for customizing the visualization. As expected, you can set colors and fonts, but you can also change the shape of the cloud (even providing a custom polar equation), rotation limits, internal grid sizing, and many other features.
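
As an illustration, a customized cloud might be configured along the following lines. This is a sketch rather than a recommended setup: the option names (fontFamily, color, shape, minRotation, maxRotation, gridSize) are the ones the wordcloud2 documentation describes as far as I know, and the specific values are arbitrary. The short list here is only a stand-in for the full data.

```javascript
// A small stand-in for the list prepared in step 2.
var list = [["c#", 60], ["java", 59], ["javascript", 56]];

var options = {
    list: list,
    fontFamily: "Helvetica, Arial, sans-serif",
    color: "#007979",
    // Custom polar equation: a cardioid. Returning 1 would give a circle.
    shape: function(theta) {
        return 1 - Math.sin(theta);
    },
    minRotation: -Math.PI / 6,   // rotation limits, in radians
    maxRotation: Math.PI / 6,
    gridSize: 8                  // internal grid size, in pixels
};

// Only draw when running in a page that has loaded wordcloud2.
if (typeof WordCloud !== "undefined" && typeof document !== "undefined") {
    WordCloud(document.getElementById("cloud"), options);
}
```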

Step 5: Add Interactivity

If you ask wordcloud2 to use the <canvas> interface, it gives you a couple of callback hooks that your code can use to respond to user interactions. With native HTML, however, we aren’t limited to just the callbacks that wordcloud2 provides. To demonstrate, we can add a simple interaction to respond to mouse clicks on words in the cloud.

First we’ll let users know that interactions are supported by changing the cursor to a pointer when they hover the mouse over a cloud word.

#cloud span {
    cursor: pointer;
}

Next let’s add an extra element to the markup where we can display information about any clicked word. That element is the <div> with the id details in line 9.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title></title>
  </head>
  <body>
    <div id="cloud" style="position:relative;"></div>
    <div id="details"></div>
    <script src="js/wordcloud2.js"></script>
  </body>
</html>

Then we define a function that can be called when the user clicks within the cloud. Because our function will be called for any clicks on the cloud (including clicks on empty space), it will first check to see if the target of the click was really a word. Words are contained in <span> elements, so we can verify that by looking at the nodeName property of the click target. As you can see from line 2, the node names of HTML elements are always reported in uppercase.

var clicked = function(ev) {
    if (ev.target.nodeName === "SPAN") {
        // A <span> element was the target of the click
    }
}

If the user did click on a word, we can find out which word by looking at the textContent property of the event target. After line 3 below, the variable tag will hold the word on which the user clicked. So, for example, if a user clicks on the word “javascript,” then the tag variable will have the value "javascript".

var clicked = function(ev) {
    if (ev.target.nodeName === "SPAN") {
        var tag = ev.target.textContent;
    }
}

Since we’d like to show users the total count when they click on a word, we’re going to need to find the word in our original data set. We have the word’s value, so that’s simply a matter of searching through the data set to find a match. If we were using jQuery, the .grep() function would do just that. In this example we’re sticking with native JavaScript, so we can look for an equivalent method in pure JavaScript. Unfortunately, although there is such a native method defined (.find()), it is a more recent addition to the language, and some browsers, Internet Explorer among them, don’t support it. We could resort to a standard for or forEach loop, but there is an alternative that many consider an improvement over that approach. It relies on the .some() method, an array method that modern browsers do support. The .some() method passes each element of an array to an arbitrary function and stops as soon as that function returns true. Here’s how we can use it to find the clicked tag in our tags array.

The function that’s the argument to .some() is defined in lines 5 through 11. It is called with the parameter el, short for an element in the tags array. The conditional statement in line 6 checks to see if that element’s word matches the clicked node’s text content. If so, the function sets the clickedTag variable and returns true to terminate the .some() loop.

If the clicked word doesn’t match the element we’re checking in the tags array, then the function supplied to some() returns false (line 10). When some() sees a false return value, it continues iterating through the array.

var clicked = function(ev) {
    if (ev.target.nodeName === "SPAN") {
        var tag = ev.target.textContent;
        var clickedTag;
        tags.some(function(el) { 
            if (el[0] === tag) {
                clickedTag = el; 
                return true;  // This ends the .some() loop
            }
            return false;
        });
    }
}
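
To see the early exit in action outside of a click handler, here is a small sketch that counts how many elements .some() examines. The three-element tags array and the visited counter are there only for illustration.

```javascript
var tags = [["c#", 601251], ["java", 585413], ["javascript", 557407]];

var visited = 0;
var clickedTag;
var found = tags.some(function(el) {
    visited += 1;            // count how many elements get examined
    if (el[0] === "java") {
        clickedTag = el;
        return true;         // stops the iteration here
    }
    return false;
});
// found is true, visited is 2, and clickedTag is ["java", 585413]

var missing = tags.some(function(el) {
    return el[0] === "cobol";
});
// missing is false because no element matched
```

The third element is never examined in the first search, which is the advantage over a forEach loop that always runs to the end of the array.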

We can use the return value of the .some() method to make sure the clicked element was found in the array. When that’s the case, .some() itself returns true. In lines 13 and 14 below we update the details variable with extra information. In line 17 we update the web page with those details.

var clicked = function(ev) {
    var details = "";
    if (ev.target.nodeName === "SPAN") {
        var tag = ev.target.textContent,
            clickedTag;
        if (tags.some(function(el) {
            if (el[0] === tag) {
                clickedTag = el;
                return true;
            }
            return false;
        })) {
            details = "There were " + clickedTag[1] +
                      " Stack Overflow questions tagged \"" + tag + "\"";
        }
    }
    document.getElementById("details").textContent = details;
}
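
Where Array.prototype.find is available, the search inside the handler can be written more directly. This is a sketch under that assumption rather than a replacement for the code above. The detailsFor helper is a hypothetical name, and the short tags array stands in for the full data set.

```javascript
var tags = [["c#", 601251], ["java", 585413], ["javascript", 557407]];

// Build the details text for a clicked word; returns "" if it isn't a tag.
function detailsFor(tag) {
    // .find() returns the first matching element, or undefined if none.
    var clickedTag = tags.find(function(el) {
        return el[0] === tag;
    });
    if (clickedTag === undefined) {
        return "";
    }
    return "There were " + clickedTag[1] +
           " Stack Overflow questions tagged \"" + tag + "\"";
}

var message = detailsFor("java");
```

Separating the lookup from the event handling also makes it possible to check the message text without a browser; the click handler would only need to pass ev.target.textContent to the helper and write the result to the page.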

And finally we tell the browser to call our handler when a user clicks on anything in the cloud container.

document.getElementById("cloud").addEventListener("click", clicked);

With these few lines of code, our word cloud is now interactive.

Because our word cloud consists of standard HTML elements, we can make it interactive with simple JavaScript event handlers.

Summing Up

In this chapter, we’ve looked at several different special-purpose visualizations and some JavaScript libraries that can help us create them. Tree maps are handy for showing both hierarchy and dimension in a single visualization. Heat maps can highlight varying intensities throughout a region. Network graphs reveal the connections between objects. And word clouds show the relative weight of words, most often their frequency, in an attractive and concise visualization.
