Appendix A: Managing Data in the Browser

So far in the book we’ve looked at a lot of visualization tools and techniques, but we haven’t spent much time considering the data part of data visualization. The emphasis on visualization is appropriate in many cases. Especially if the data is static, we can take all the time we need to clean and groom it before it’s even represented in JavaScript. But what if the data is dynamic, and we have no choice but to retrieve the raw source directly into our JavaScript application? We have much less control over data from third-party REST APIs, Google Docs spreadsheets, or automatically generated CSV files. With those types of data sources, we often need to validate, reformat, recalculate, or otherwise manipulate the data in the browser.

This appendix considers a JavaScript library that is particularly helpful for managing large data sets in the web browser: Underscore.js. The sections that follow cover several aspects of the library.

The format of this appendix differs from the regular chapters in the book. Instead of covering a few examples of moderate complexity, we’ll look at a lot of simple, small examples. Each section collects several related examples together, but each of the small examples is independent. The first section differs even further. It’s a brief introduction to functional programming cast as a step-by-step migration from the more common programming style. Understanding functional programming is very helpful, however, as its philosophy underlies almost all of the Underscore.js utilities. This appendix serves as a tour of the Underscore.js library with a special focus on managing data. (As a concession to the book’s overall focus on data visualization, it also includes many illustrations.)

Using Functional Programming

When we’re working with data that’s part of a visualization, we often have to iterate through the data one item at a time to transform, extract, or otherwise manipulate it to fit our application. Using only the core JavaScript language, our code may rely on a for loop like the following:

for (var i=0, len=data.length; i<len; i++) {
    // Code continues...
}

Although this style, known as imperative programming, is a common JavaScript idiom, it can have a few problems in large, complex applications. In particular, it might result in code that’s harder than necessary to debug, test, and maintain. This section introduces a different programming style–functional programming–that eliminates many of those problems. As we’ll see, functional programming can result in code that’s much more concise and readable, and often as a result, much less error-prone.

To compare these two programming styles, let’s consider a simple programming problem: writing a function to calculate the Fibonacci numbers. The first two Fibonacci numbers are 0 and 1, and subsequent numbers are the sum of the two preceding values. The sequence starts like this:

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, …

Step 1: Start with an Imperative Version

To start, let’s consider a traditional, imperative approach to the problem. We want a JavaScript function, call it fib(), that takes as its input a parameter n and returns as its output the nth Fibonacci number. (By convention, the 0th and 1st Fibonacci numbers are 0 and 1.) Here’s a first attempt:

 1  var fib = function(n) {
 2      // if 0th or 1st, just return n itself
 3      if (n < 2) return n;
 4
 5      // otherwise, initialize variable to compute result
 6      var f0=0, f1=1, f=1;
 7
 8      // iterate until we reach n
 9      for (i=2; i<=n; i++) {
10
11          // at each iteration, slide the intermediate
12          // values down a step
13          f0 = f1 = f;
14
15          // and calculate sum for the next pass
16          f = f0 + f1;
17      }
18
19      // after all the iterations, return the result
20      return f;
21  }

Step 2: Debug the Imperative Code

If you aren’t checking closely, you might be surprised to find that the trivial example above contains three bugs. Of course, it’s a contrived example and the bugs are deliberate, but can you find all of them without reading any further? More to the point, if even a trivial example can hide so many bugs, can you imagine what might be lurking in a complex web application?

To understand why imperative programming can introduce these bugs, let’s fix them one at a time.

One bug is in the for loop on line 9:

 9      for (i=2; i<=n; i++) {

The conditional that determines the loop termination checks for a less-than-or-equal (<=) value; it should instead check for less-than (<).

A second bug occurs on line 13:

13          f0 = f1 = f;

Although we think and read left to right (at least in English), JavaScript executes multiple assignments from right to left. Instead of shifting the values in our variables, this statement simply assigns the value of f to both f1 and f0, so all three variables end up equal. We need to break the single statement into two:

13          f0 = f1;
14          f1 = f;

The final bug is the most subtle, and it’s back on line 9 in the for statement. We’re using the local variable i, but we haven’t declared it. As a result, JavaScript will treat it as a global variable. That won’t cause our function to return incorrect results, but it could well introduce a conflict–and a hard-to-find bug–elsewhere in our application. The correct code declares the variable as local:

 9      for (var i=2; i<n; i++) {
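With all three fixes applied, the complete function might look like the following sketch. It only combines the corrections described in this step; nothing else about the original attempt is assumed to change.

```javascript
// Corrected imperative version, combining the three fixes from this step.
var fib = function(n) {
    // if 0th or 1st, just return n itself
    if (n < 2) return n;

    // otherwise, initialize variables to compute the result
    var f0 = 0, f1 = 1, f = 1;

    // iterate with a locally declared counter and a strict < test
    for (var i = 2; i < n; i++) {
        // slide the intermediate values down a step, one at a time
        f0 = f1;
        f1 = f;

        // and calculate the sum for the next pass
        f = f0 + f1;
    }

    // after all the iterations, return the result
    return f;
};

console.log(fib(10)); // 55
```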

Step 3: Understand the Problems Imperative Programming May Introduce

The bugs in this small and straightforward piece of code are meant to demonstrate some problematic features of imperative programming in general. In particular, conditional logic and state variables, by their very nature, tend to invite certain errors.

Consider the first bug. Its error was using an incorrect test (<= instead of <) for the conditional that terminates the loop. Precise conditional logic is critical for computer programs, but such precision doesn’t always come naturally to most people, including programmers. Conditional logic has to be perfect, and sometimes making it perfect is tricky.

The other two errors both relate to state variables, f0 and f1 in the first case, and i in the second. Here again we find a difference between how programmers think and programs operate. When programmers write the code to iterate through the numbers, they’re probably concentrating on the specific problem at hand. It may be too easy to neglect the potential effect on other areas of the application. More technically, state variables can introduce side effects into a program, and side effects may result in bugs.

Step 4: Rewrite using Functional Programming Style

Proponents of functional programming claim that by ditching conditionals and state variables, a functional programming style can produce code that’s more concise, maintainable and less prone to errors than imperative programming.

The “functional” in “functional programming” does not refer to functions in programming languages. Rather, it’s a reference to mathematical functions such as y=f(x). Functional programming attempts to emulate mathematical functions in the context of computer programming. Instead of iterating over values by using a for loop, functional programming often uses recursion, where a function calls itself multiple times to make a calculation or manipulate values.

Here’s how we can implement the Fibonacci algorithm with functional programming:

var fib = function(n) { return n < 2 ? n : fib(n-1) + fib(n-2); }

Notice that this version has no state variables and, except for the edge case to handle 0 or 1, no conditional statements. It’s much more concise, and notice how the code mirrors almost word-for-word the statement of the original problem: “The first two Fibonacci numbers are 0 and 1, and subsequent numbers are the sum of the two preceding values”. See, for example, how “The first two Fibonacci numbers” corresponds to n < 2 ?, then “are 0 and 1” corresponds to n, and, finally, “subsequent numbers are the sum of the two preceding values” corresponds to fib(n-1) + fib(n-2).

Functional programming implementations often express the desired outcome directly. They can therefore minimize the chance of misinterpretations or errors in an intermediate algorithm.

Step 5: Evaluate Performance

From what we’ve seen so far it may seem that we should always adopt a functional programming style. Certainly functional programming has its advantages, but it can have some significant disadvantages as well. The Fibonacci code provides a perfect example. Since functional programming eschews the notion of loops, our example relies instead on recursion.

In our specific case the fib() function calls itself twice at every level until the recursion reaches 0 or 1. Since each intermediate call itself results in more intermediate calls, the number of calls to fib() grows exponentially. Finding the 28th Fibonacci number by executing fib(28) results in over one million calls to the fib() function.

As you might imagine, the resulting performance is simply unacceptable. Here are the execution times for both the functional and the imperative versions of fib():

Version      Parameter   Execution Time
Imperative   28          0.231 ms
Functional   28          296.9 ms

As you can see, the functional programming version is over a thousand times slower. In the real world, such performance is rarely acceptable.
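The call count behind that slowdown is easy to check for ourselves. The sketch below adds a counter to the recursive definition; the names countedFib and calls are hypothetical and exist only for this illustration.

```javascript
// Count how many times the recursive function is invoked.
var calls = 0;
var countedFib = function(n) {
    calls++;
    return n < 2 ? n : countedFib(n-1) + countedFib(n-2);
};

var result = countedFib(28);
console.log(result); // 317811
console.log(calls);  // 1028457
```

A single top-level call fans out into more than a million invocations, almost all of which repeat work that an earlier invocation has already done.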

Step 6: Fix the Performance Problem

Fortunately, we can have the benefits of functional programming without suffering the performance penalty. We simply turn to the tiny but powerful Underscore.js library. As the library’s web page explains:

Underscore is a utility-belt library for JavaScript that provides… functional programming support

Of course we need to include that library in our web pages. If you’re including libraries individually, Underscore.js is available on many content distribution networks such as CloudFlare.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title></title>
  </head>
  <body>
    <!-- Content goes here -->
    <script 
      src="//cdnjs.cloudflare.com/ajax/libs/underscore.js/1.4.4/underscore-min.js">
    </script>
  </body>
</html>

With Underscore.js in place, we can now optimize the performance of our Fibonacci implementation.

The problem with the recursive implementation is that it results in many unnecessary calls to fib(). For example, executing fib(28) requires more than 100,000 calls to fib(3). And each time fib(3) is called, the return value is recalculated from scratch. It would be better if the implementation only called fib(3) once, and every subsequent time it needed to know the value of fib(3) it re-used the previous result instead of recalculating it from scratch. In effect, we’d like to implement a cache in front of the fib() function. The cache could eliminate the repetitive calculations.

This approach is known as memoizing, and the Underscore.js library has a simple method to automatically and transparently memoize JavaScript functions. Not surprisingly, that method is called memoize(). To use it, we first wrap the function we want to memoize within the Underscore object. Just as jQuery uses the bling character ($) for wrapping, Underscore.js uses the underscore character. After wrapping our function, we simply call the memoize() method. Here’s the complete code:

var fib = _( function(n) { 
        return n < 2 ? n : fib(n-1) + fib(n-2); 
    } ).memoize()

As you can see, we haven’t really lost any of the readability or conciseness of functional programming. And it would still be a challenge to introduce a bug in this implementation. The only real change is performance, and it’s substantially better.

Version             Parameter   Execution Time
Imperative fib()    28          0.231 ms
Functional fib()    28          296.9 ms
Memoized fib()      28          0.352 ms

Just by including the Underscore.js library and using one of its methods, our functional implementation has nearly the same performance as the imperative version.
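To see why the cache makes such a difference, here is a minimal sketch of a memoizing wrapper written in plain JavaScript. It is an illustration only, not the Underscore.js source; the helper name memoizeSketch and the computed counter are hypothetical, and the sketch assumes a function of a single argument.

```javascript
// A simplified memoizer for single-argument functions (illustration only;
// not the Underscore.js implementation).
var memoizeSketch = function(fn) {
    var cache = {};
    return function(n) {
        // compute the value only the first time we see this argument
        if (!Object.prototype.hasOwnProperty.call(cache, n)) {
            cache[n] = fn(n);
        }
        return cache[n];
    };
};

// Count how often the underlying calculation actually runs.
var computed = 0;
var memoFib = memoizeSketch(function(n) {
    computed++;
    return n < 2 ? n : memoFib(n-1) + memoFib(n-2);
});

console.log(memoFib(28)); // 317811
console.log(computed);    // 29
```

Because the recursive calls go through the memoized wrapper, each value from 0 through 28 is calculated exactly once.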

For the rest of this appendix, we’ll look at many of the other improvements and utilities that Underscore.js provides. With its support for functional programming, Underscore.js makes it significantly easier to work with data in the browser.

Working with Arrays

If your visualization relies on a significant amount of data, that data is most likely contained in arrays. Unfortunately, it’s very tempting to resort to imperative programming when working with arrays. Arrays suggest the use of programming loops, and, as we saw above, programming loops are an imperative construct that often causes errors. If we can avoid loops and rely on functional programming instead, we can improve the quality of our JavaScript. The core JavaScript language includes a few utilities and methods to help applications cope with arrays in a functional style, but Underscore.js adds many others. This section describes many of Underscore.js’s array utilities most helpful for data visualizations.

Extracting Elements by Position

If you only need a subset of an array for your visualization, Underscore.js has many utilities that make it easy to extract the right subset. For the examples below, we’ll consider a simple array.

var arr = [1,2,3,4,5,6,7,8,9];
Figure: Underscore.js has many utilities to make it easy to work with arrays.

Underscore.js’s first() method provides a simple way to extract the first element of an array, or the first n elements.

> _(arr).first()
  1
> _(arr).first(3)
  [1, 2, 3]
Figure: The first() function returns the first element in an array.
Figure: The first() function can also return the first n elements in an array.

Notice that first() (without any parameter) returns a simple element, while first(n) returns an array of elements. That means, for example, that first() and first(1) have different return values (1 vs. [1] in the example).

As you might expect, Underscore.js also has a last() method to extract elements from the end of an array.

> _(arr).last()
  9
> _(arr).last(3)
  [7, 8, 9]
Figure: The last() function returns the last element in an array.
Figure: The last() function can also return the last n elements in an array.

Without any parameters, last() returns the last element in the array. With a parameter n it returns a new array with the last n elements from the original.

The more general versions of both of these functions (.first(3) and .last(3)) would require some potentially tricky (and error-prone) code to implement in an imperative style. In the functional style that Underscore.js supports, however, our code is clean and simple.

What if you want to extract from the beginning of the array, but instead of knowing how many elements you want in the result, you only know how many elements you want to omit? In other words, you need “all but the last n” elements. The initial() method performs this extraction. If you omit the optional parameter, Underscore.js assumes a value of 1.

> _(arr).initial()
  [1, 2, 3, 4, 5, 6, 7, 8]
> _(arr).initial(3)
  [1, 2, 3, 4, 5, 6]
Figure: The initial() function returns all but the last element in an array.
Figure: The initial() function can also return all but the last n elements in an array.

Finally, you may need the opposite of initial(). The rest() method skips past a defined number of elements in the beginning of the array and returns whatever remains.

> _(arr).rest()
  [2, 3, 4, 5, 6, 7, 8, 9]
> _(arr).rest(3)
  [4, 5, 6, 7, 8, 9]
Figure: The rest() function returns all but the first element in an array.
Figure: The rest() function can also return all but the first n elements in an array.

Again, these functions would be tricky to implement using traditional, imperative programming, but are a breeze with the help of Underscore.js.
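For comparison, the core language’s slice() method, which is itself functional in spirit, can approximate the n-element forms of all four utilities. The helper names below are hypothetical, and the sketch assumes that n is a non-negative integer; it makes no attempt to cover every edge case that Underscore.js handles.

```javascript
// Hypothetical helpers that approximate the Underscore.js utilities
// with Array.prototype.slice(); n is assumed to be a non-negative integer.
var firstN   = function(arr, n) { return arr.slice(0, n); };
var lastN    = function(arr, n) { return arr.slice(Math.max(arr.length - n, 0)); };
var initialN = function(arr, n) { return arr.slice(0, Math.max(arr.length - n, 0)); };
var restN    = function(arr, n) { return arr.slice(n); };

var arr = [1,2,3,4,5,6,7,8,9];
console.log(firstN(arr, 3));   // [1, 2, 3]
console.log(lastN(arr, 3));    // [7, 8, 9]
console.log(initialN(arr, 3)); // [1, 2, 3, 4, 5, 6]
console.log(restN(arr, 3));    // [4, 5, 6, 7, 8, 9]
```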

Combining Arrays

Underscore.js includes another set of utilities for combining two or more arrays. These include functions that mimic standard mathematical set operations, as well as more sophisticated combinations. For the next few examples, we’ll use two arrays, one containing the first few Fibonacci numbers and the other containing the first five even integers.

var fibs = [0, 1, 1, 2, 3, 5, 8];
var even = [0, 2, 4, 6, 8];
Figure: Underscore.js also has many utilities to work with multiple arrays.

The union() method is a straightforward combination of multiple arrays. It returns an array containing all elements that are in any of the inputs, and it removes any duplicates.

> _(fibs).union(even)
  [0, 1, 2, 3, 5, 8, 4, 6]
Figure: The union() function creates the union of multiple arrays, removing any duplicates.

Notice that union() removes duplicates whether they appear in separate inputs (0, 2, and 8) or in the same array (1).

Although this appendix considers combinations of just two arrays, most Underscore.js methods can accept an unlimited number of parameters. For example, _.union(a,b,c,d,e) returns the union of five different arrays. You can even find the union of an array of arrays by using JavaScript’s apply() method with something like _.union.apply(_, arrOfArrs).

The intersection() method acts just as you would expect, returning only those elements that appear in all of the input arrays.

> _(fibs).intersection(even)
  [0, 2, 8]
Figure: The intersection() function returns elements in common among multiple arrays.

The difference() method is the opposite of intersection(). It returns those elements in the first input array that are not present in the other inputs.

> _(fibs).difference(even)
  [1, 1, 3, 5]
Figure: The difference() function returns elements that are only present in the first of multiple arrays.

If you need to eliminate duplicate elements but only have one array (making union() inappropriate), then you can use the uniq() method.

> _(fibs).uniq()
  [0, 1, 2, 3, 5, 8]
Figure: The uniq() function removes duplicate elements from an array.
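To make the semantics of these four operations concrete, here is a rough sketch of two-array versions in plain JavaScript. The function names are hypothetical and the code is not the Underscore.js source; it only aims to reproduce the results shown in the examples.

```javascript
// Illustrative plain-JavaScript versions of the set-style utilities
// (hypothetical names; not the Underscore.js implementation).
var uniqSketch = function(arr) {
    // keep a value only at the position of its first occurrence
    return arr.filter(function(v, i) { return arr.indexOf(v) === i; });
};
var unionSketch = function(a, b) {
    return uniqSketch(a.concat(b));
};
var intersectionSketch = function(a, b) {
    return uniqSketch(a).filter(function(v) { return b.indexOf(v) !== -1; });
};
var differenceSketch = function(a, b) {
    // note that duplicates in the first array are preserved
    return a.filter(function(v) { return b.indexOf(v) === -1; });
};

var fibs = [0, 1, 1, 2, 3, 5, 8];
var even = [0, 2, 4, 6, 8];
console.log(unionSketch(fibs, even));        // [0, 1, 2, 3, 5, 8, 4, 6]
console.log(intersectionSketch(fibs, even)); // [0, 2, 8]
console.log(differenceSketch(fibs, even));   // [1, 1, 3, 5]
console.log(uniqSketch(fibs));               // [0, 1, 2, 3, 5, 8]
```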

Finally, Underscore.js has a zip() method. Its name doesn’t come from the popular compression algorithm but, rather, because it acts a bit like a zipper. It takes multiple input arrays and combines them, element by element, into an output array. That output is an array of arrays, where the inner arrays are the combined elements.

The operation is perhaps most clearly understood through a picture.

Figure: The zip() function pairs elements from multiple arrays together into a single array.
> var naturals = [1, 2, 3, 4, 5];
> var primes = [2, 3, 5, 7, 11];
> _.zip(naturals, primes)
  [ [1,2], [2,3], [3,5], [4,7], [5,11] ]

This example demonstrates an alternative style for Underscore.js. Instead of wrapping an array within the _ object as we’ve done so far, we call the zip() method on the _ object itself. The alternative style seems a better fit for the underlying functionality in this case, but if you prefer _(naturals).zip(primes), you’ll get the exact same result.
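The element-by-element pairing can also be sketched in a few lines of plain JavaScript. The helper zipSketch below is hypothetical and assumes that every input array has the same length as the first; it is meant only to show the shape of the result.

```javascript
// A simplified zip for arrays of equal length (hypothetical helper;
// not the Underscore.js implementation).
var zipSketch = function() {
    var arrays = Array.prototype.slice.call(arguments);
    if (arrays.length === 0) return [];
    // use the first array to drive the iteration over positions
    return arrays[0].map(function(unused, i) {
        // collect the element at position i from every input
        return arrays.map(function(arr) { return arr[i]; });
    });
};

var naturals = [1, 2, 3, 4, 5];
var primes = [2, 3, 5, 7, 11];
console.log(zipSketch(naturals, primes));
// [ [1,2], [2,3], [3,5], [4,7], [5,11] ]
```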

Removing Invalid Data Values

One of the banes of visualization applications is invalid data values. Although we’d like to think that our data sources meticulously ensure that all the data they provide is scrupulously correct, that is, unfortunately, rarely the case. More seriously, if JavaScript encounters an invalid value, the most common result is an unhandled exception, which halts all further JavaScript execution on the page.

To avoid such an unpleasant error, we should validate all data sets and remove invalid values before we pass the data to graphing or charting libraries. Underscore.js has several utilities to help.

The simplest of these Underscore.js methods is compact(). This function removes any data values that JavaScript treats as false from the input array. Eliminated values include the boolean value false, the numeric value 0, an empty string, and the special values NaN (not a number, for example 0/0), undefined, and null.

> var raw = [0, 1, false, 2, "", 3, NaN, 4, , 5, null];
> _(raw).compact()
  [1, 2, 3, 4, 5]

It is worth emphasizing that compact() removes elements with a value of 0. If you use compact() to clean a data array, be sure that 0 isn’t a valid data value in your data set.
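When 0 is a legitimate data value, a more selective filter is one option. The sketch below uses only the native filter() method; the first filter approximates the behavior of compact(), and the second is a hypothetical alternative that assumes the data set should contain only finite numbers.

```javascript
var raw = [0, 1, false, 2, "", 3, NaN, 4, undefined, 5, null];

// Removing every falsy value, roughly what compact() does
var compacted = raw.filter(Boolean);

// Keeping 0 while still rejecting non-numeric and non-finite values
// (an assumption about what counts as valid data)
var numeric = raw.filter(function(v) {
    return typeof v === "number" && isFinite(v);
});

console.log(compacted); // [1, 2, 3, 4, 5]
console.log(numeric);   // [0, 1, 2, 3, 4, 5]
```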

Another common problem with raw data is excessively nested arrays. If you want to eliminate extra nesting levels from a data set, the flatten() method is available to help.

> var raw = [1, 2, 3, [[4]], 5];
> _(raw).flatten()
  [1, 2, 3, 4, 5]

By default, flatten() removes all nesting, even multiple levels of nesting, from arrays. If you set the shallow parameter to true, however, it only removes a single level of nesting.

> var raw = [1, 2, 3, [[4]], 5];
> _(raw).flatten(true)
  [1, 2, 3, [4], 5]
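The difference between the two modes is easier to see in a small recursive sketch. The helper flattenSketch is hypothetical and is not the Underscore.js source; it relies on the fact that the native concat() method unwraps one level of nesting by itself.

```javascript
// Illustrative flatten with an optional shallow mode (hypothetical helper).
var flattenSketch = function(arr, shallow) {
    return arr.reduce(function(result, item) {
        if (Array.isArray(item)) {
            // concat() unwraps one level; recurse first for a deep flatten
            return result.concat(shallow ? item : flattenSketch(item));
        }
        return result.concat([item]);
    }, []);
};

var raw = [1, 2, 3, [[4]], 5];
console.log(flattenSketch(raw));       // [1, 2, 3, 4, 5]
console.log(flattenSketch(raw, true)); // [1, 2, 3, [4], 5]
```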

Finally, if you have specific values that you want to eliminate from an array, you can use the without() method. Its parameters provide a list of values that the function should remove from the input array.

> var raw = [1, 2, 3, 4];
> _(raw).without(2, 3)
  [1, 4]

Finding Elements in an Array

JavaScript has always defined the indexOf method for strings. It returns the position of a given substring within a larger string. Recent versions of JavaScript have added this method to array objects, so you can easily find the first occurrence of a given value in an array. Unfortunately, older browsers (specifically Internet Explorer version 8 and earlier) don’t support this method.

Underscore.js provides its own indexOf() method to fill the gap those older browsers create. If Underscore.js finds itself running in an environment with native support for array indexOf, then it defers to the native method to avoid any performance penalty.

> var primes = [2, 3, 5, 7, 11];
> _(primes).indexOf(5)
  2

To begin your search somewhere in the middle of the array, you can specify that starting position as the second argument to indexOf().

> var arr = [2, 3, 5, 7, 11, 7, 5, 3, 2];
> _(arr).indexOf(5, 4)
  6

You can also search backwards from the end of an array using the lastIndexOf() method.

> var arr = [2, 3, 5, 7, 11, 7, 5, 3, 2];
> _(arr).lastIndexOf(5)
  6

If you don’t want to start at the very end of the array, you can pass in the starting index as an optional parameter.
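In browsers that support them, the native array methods accept the same kind of optional starting index, so they offer a convenient way to experiment. The example below uses only standard JavaScript; the starting positions are arbitrary values chosen for illustration.

```javascript
var arr = [2, 3, 5, 7, 11, 7, 5, 3, 2];

// Forward searches, from the beginning and from position 4
console.log(arr.indexOf(5));        // 2
console.log(arr.indexOf(5, 4));     // 6

// Backward searches, from the end and from position 5
console.log(arr.lastIndexOf(5));    // 6
console.log(arr.lastIndexOf(5, 5)); // 2

// A value that is not present yields -1
console.log(arr.indexOf(13));       // -1
```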

Underscore.js provides a few helpful optimizations for sorted arrays. Both the uniq() and the indexOf() methods accept an optional boolean parameter. If that parameter is true, then the functions assume that the array is sorted. The performance improvements this assumption allows can be especially significant for large data sets.

The library also includes the special sortedIndex() function. This function also assumes that the input array is sorted. It finds the position at which a specific value should be inserted to maintain the array’s sort order.

> var arr = [2, 3, 5, 7, 11];
> _(arr).sortedIndex(6)
  3

If the array is sorted by a computed value rather than by the elements themselves, you can pass sortedIndex() an iterator function that computes that ranking for each value.
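The idea behind this kind of lookup is a binary search, which the following sketch illustrates in plain JavaScript. The helper sortedIndexSketch is hypothetical, assumes an array already sorted in ascending order, and is not the Underscore.js source. It also happens to be one of the cases where a small, carefully tested loop is a reasonable implementation choice.

```javascript
// Binary search for the insertion point in an ascending sorted array
// (hypothetical helper for illustration).
var sortedIndexSketch = function(arr, value) {
    var low = 0, high = arr.length;
    while (low < high) {
        var mid = (low + high) >>> 1;   // midpoint, rounded down
        if (arr[mid] < value) {
            low = mid + 1;              // insertion point is to the right
        } else {
            high = mid;                 // insertion point is here or to the left
        }
    }
    return low;
};

var arr = [2, 3, 5, 7, 11];
console.log(sortedIndexSketch(arr, 6));  // 3
console.log(sortedIndexSketch(arr, 1));  // 0
console.log(sortedIndexSketch(arr, 12)); // 5
```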

Generating Arrays

The final array utility function I’ll mention is a convenient method to generate arrays. The range() method tells Underscore.js to create an array of evenly spaced integers. With a single argument, the values start at 0 and stop just before that argument. With two arguments, the first is the starting value (the default is 0) and the second is the stopping value, which is itself excluded from the result. An optional third argument sets the increment between adjacent values (the default is 1).

> _.range(10)
  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
> _.range(20, 30)
  [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
> _.range(0, 1000, 100)
  [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]

The range() function can be quite useful if you need to generate x-axis values to match an array of y-axis values. The zip() method can then combine the two.

> var yvalues = [0.1277, 1.2803, 1.7697, 3.1882]
> _.zip(_.range(yvalues.length),yvalues)
  [ [0, 0.1277], [1, 1.2803], [2, 1.7697], [3, 3.1882] ]
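As a rough model of how range() interprets its arguments, the sketch below assumes the (start, stop, step) form described in the Underscore.js documentation, in which the stop value itself is excluded. The helper name rangeSketch is hypothetical, and the code is an illustration rather than the library’s implementation.

```javascript
// Illustrative model of range(start, stop, step); stop is exclusive.
var rangeSketch = function(start, stop, step) {
    // with a single argument, count from 0 up to that value
    if (stop === undefined) {
        stop = start;
        start = 0;
    }
    step = step || 1;

    // number of values that fit before reaching stop
    var length = Math.max(Math.ceil((stop - start) / step), 0);
    var result = [];
    for (var i = 0; i < length; i++) {
        result.push(start + i * step);
    }
    return result;
};

console.log(rangeSketch(10));           // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
console.log(rangeSketch(20, 30));       // [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
console.log(rangeSketch(0, 1000, 100)); // [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]
```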

Enhancing Objects

Although the previous section’s examples show numeric arrays, often our visualization data consists of JavaScript objects instead of simple numbers. That’s especially likely if we get the data via a REST interface, as such interfaces almost always deliver data in JavaScript Object Notation (JSON). If we need to enhance or transform objects without resorting to imperative constructs, Underscore.js has another set of utilities that can help. For the following examples, we can use a simple pizza object.

var pizza = { 
    size: 10, 
    crust: "thin", 
    cheese: true, 
    toppings: ["pepperoni","sausage"]
};
Figure: Underscore.js has many utilities for working with arbitrary JavaScript objects.

Keys and Values

Underscore.js includes several methods to work with the keys and values that make up objects. For example, the keys() function creates an array consisting solely of an object’s keys.

> _(pizza).keys()
  ["size", "crust", "cheese", "toppings"]
Figure: The keys() function returns the keys of an object as an array.

Similarly, the values() function creates an array consisting solely of an object’s values.

> _(pizza).values()
  [10, "thin", true, ["pepperoni","sausage"]]
Figure: The values() function returns just the values of an object as an array.

The pairs() function creates a two-dimensional array. Each element of the outer array is itself an array which contains an object’s key and its corresponding value.

> _(pizza).pairs()
 [ 
   ["size",10], 
   ["crust","thin"], 
   ["cheese",true], 
   ["toppings",["pepperoni","sausage"]]
 ]
Figure: The pairs() function converts an object into an array of array pairs.

To reverse this transformation and convert an array into an object, there is the object() method.

> var arr = [ ["size",10], ["crust","thin"], ["cheese",true], 
            ["toppings",["pepperoni","sausage"]] ]
> _(arr).object()
  { size: 10, crust: "thin", cheese: true, toppings: ["pepperoni","sausage"]}

Finally, we can swap the roles of keys and values in an object with the invert() function.

> _(pizza).invert()
  {10: "size", thin: "crust", true: "cheese", "pepperoni,sausage": "toppings"}
Figure: The invert() function swaps keys and values in an object.

As the example shows, Underscore.js can even invert an object if the value isn’t a simple type. In this case it takes an array, ["pepperoni","sausage"], and converts it to a string by joining the individual array elements with commas, creating the key "pepperoni,sausage".

Note also that JavaScript requires that all of an object’s keys are unique. That’s not necessarily the case for values. If you have an object in which multiple keys have the same value, then invert() only keeps the last of those keys in the inverted object. For example, _({key1: value, key2: value}).invert() returns {value: key2}.
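The following sketch models pairs(), object(), and invert() in plain JavaScript to show where these behaviors come from. The helper names are hypothetical and the code is not the Underscore.js source. Because JavaScript property names are strings, every value is converted to a string when it becomes a key, and a later assignment to the same key replaces an earlier one.

```javascript
var pizza = {
    size: 10,
    crust: "thin",
    cheese: true,
    toppings: ["pepperoni","sausage"]
};

// Hypothetical plain-JavaScript models of pairs(), object(), and invert()
var pairsSketch = function(obj) {
    return Object.keys(obj).map(function(key) { return [key, obj[key]]; });
};
var objectSketch = function(pairs) {
    return pairs.reduce(function(obj, pair) {
        obj[pair[0]] = pair[1];
        return obj;
    }, {});
};
var invertSketch = function(obj) {
    return Object.keys(obj).reduce(function(inverted, key) {
        // the value is converted to a string when used as a property name
        inverted[obj[key]] = key;
        return inverted;
    }, {});
};

var inverted = invertSketch(pizza);
console.log(inverted["pepperoni,sausage"]);         // toppings
console.log(invertSketch({ key1: 1, key2: 1 })[1]); // key2
console.log(objectSketch(pairsSketch(pizza)).crust); // thin
```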

Object Subsets

When you want to clean up an object by eliminating unnecessary attributes, you can use Underscore.js’s pick() function. Simply pass it a list of attributes that you want to retain.

> _(pizza).pick("size","crust")
  {size: 10, crust: "thin"}
Figure: The pick() function selects specific properties from an object.

We can also do the opposite of pick() by using omit() and listing the attributes that we want to delete. Underscore.js keeps all the other attributes in the object.

> _(pizza).omit("size","crust")
 {cheese: true, toppings: ["pepperoni","sausage"]}
Figure: The omit() function removes properties from an object.
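Both operations amount to copying a filtered set of properties into a new object, as this plain-JavaScript sketch suggests. The helper names pickSketch and omitSketch are hypothetical, and the code is an illustration rather than the library’s implementation.

```javascript
var pizza = {
    size: 10,
    crust: "thin",
    cheese: true,
    toppings: ["pepperoni","sausage"]
};

// Copy only the named properties into a new object (hypothetical helper)
var pickSketch = function(obj) {
    var keep = Array.prototype.slice.call(arguments, 1);
    return keep.reduce(function(result, key) {
        if (key in obj) result[key] = obj[key];
        return result;
    }, {});
};

// Copy everything except the named properties (hypothetical helper)
var omitSketch = function(obj) {
    var drop = Array.prototype.slice.call(arguments, 1);
    return Object.keys(obj).reduce(function(result, key) {
        if (drop.indexOf(key) === -1) result[key] = obj[key];
        return result;
    }, {});
};

console.log(Object.keys(pickSketch(pizza, "size", "crust"))); // ["size", "crust"]
console.log(Object.keys(omitSketch(pizza, "size", "crust"))); // ["cheese", "toppings"]
```

In this sketch the original pizza object is left untouched; each helper builds and returns a fresh object.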

Updating Attributes

When updating objects, a common requirement is to make sure that an object includes certain attributes and that those attributes have appropriate default values. Underscore.js includes two utilities for this purpose.

The two utilities, extend() and defaults(), both start with one object and adjust its properties based on those of other objects. If the secondary objects include attributes that the original object lacks, these utilities add those properties to the original. The utilities differ in how they handle properties that are already present in the original. The extend() function overrides the original properties with new values, as shown below:

> var standard = { size: 12, crust: "regular", cheese: true }
> var order = { size: 10, crust: "thin", 
  toppings: ["pepperoni","sausage"] };
> _.extend(standard, order)
  { size: 10, crust: "thin", cheese: true, 
  toppings: ["pepperoni","sausage"] };
Figure: The extend() function updates and adds missing properties to an object.

Meanwhile defaults() leaves the original properties unchanged:

> var order = { size: 10, crust: "thin", 
  toppings: ["pepperoni","sausage"] };
> var standard = { size: 12, crust: "regular", cheese: true }
> _.defaults(order, standard)
  { size: 10, crust: "thin", 
  toppings: ["pepperoni","sausage"], cheese: true };
Figure: The defaults() function adds missing properties to an object.

It’s important to note that both extend() and defaults() modify the original object directly; they do not make a copy of that object and return the copy. Consider, for example, the following:

> var order = { size: 10, crust: "thin", 
  toppings: ["pepperoni","sausage"] };
> var standard = { size: 12, crust: "regular", cheese: true }
> var pizza = _.extend(standard, order)
  { size: 10, crust: "thin", cheese: true, 
  toppings: ["pepperoni","sausage"] };

This code sets the pizza variable as you would expect, but it also sets the standard variable to that same object. More specifically, the code modifies standard with the properties from order, and then it sets a new variable pizza equal to standard. The modification of standard is probably not intended. If you need to use either extend() or defaults() in a way that does not modify input parameters, start with an empty object. We can rewrite the code above to avoid modifying standard.

> var order = { size: 10, crust: "thin", 
  toppings: ["pepperoni","sausage"] };
> var standard = { size: 12, crust: "regular", cheese: true }
> var pizza = _.extend({}, standard, order)
  { size: 10, crust: "thin", cheese: true, 
  toppings: ["pepperoni","sausage"] };
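A simplified model of the two utilities makes the in-place modification easy to see. The helpers extendSketch and defaultsSketch below are hypothetical and are not the Underscore.js source; in particular, the sketch assumes that a property counts as missing only when its value is undefined.

```javascript
// Overwrite or add every property from each source (hypothetical helper)
var extendSketch = function(target) {
    var sources = Array.prototype.slice.call(arguments, 1);
    sources.forEach(function(source) {
        Object.keys(source).forEach(function(key) {
            target[key] = source[key];
        });
    });
    return target;   // the same object that was passed in
};

// Add only the properties that the target lacks (hypothetical helper)
var defaultsSketch = function(target) {
    var sources = Array.prototype.slice.call(arguments, 1);
    sources.forEach(function(source) {
        Object.keys(source).forEach(function(key) {
            if (target[key] === undefined) target[key] = source[key];
        });
    });
    return target;
};

var order = { size: 10, crust: "thin", toppings: ["pepperoni","sausage"] };
var standard = { size: 12, crust: "regular", cheese: true };

// Starting from an empty object leaves both inputs unchanged
var pizza = extendSketch({}, standard, order);
console.log(pizza.size, standard.size); // 10 12

var filled = defaultsSketch({ size: 10 }, standard);
console.log(filled.size, filled.crust); // 10 regular
```

Because each helper returns the very object it was given as its first argument, passing an empty object first is what protects the inputs.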

Manipulating Collections

So far we’ve seen various Underscore.js tools that are suited specifically for either arrays or objects. Next, we’ll see some tools for manipulating collections in general. In Underscore.js both arrays and objects are collections, so the tools in this section can be applied to pure arrays, pure objects, or data structures that combine both. In this section, we’ll try out these utilities on an array of objects, since that’s the data structure we most often deal with in the context of data visualization.

Here’s a small data set we can use for the examples below. It contains a few statistics from the 2012 Major League Baseball season.

var national_league = [
    { name: "Arizona Diamondbacks",  wins: 81, losses:  81, division: "west"    },
    { name: "Atlanta Braves",        wins: 94, losses:  68, division: "east"    },
    { name: "Chicago Cubs",          wins: 61, losses: 101, division: "central" },
    { name: "Cincinnati Reds",       wins: 97, losses:  65, division: "central" },
    { name: "Colorado Rockies",      wins: 64, losses:  98, division: "west"    },
    { name: "Houston Astros",        wins: 55, losses: 107, division: "central" },
    { name: "Los Angeles Dodgers",   wins: 86, losses:  76, division: "west"    },
    { name: "Miami Marlins",         wins: 69, losses:  93, division: "east"    },
    { name: "Milwaukee Brewers",     wins: 83, losses:  79, division: "central" },
    { name: "New York Mets",         wins: 74, losses:  88, division: "east"    },
    { name: "Philadelphia Phillies", wins: 81, losses:  81, division: "east"    },
    { name: "Pittsburgh Pirates",    wins: 79, losses:  83, division: "central" },
    { name: "San Diego Padres",      wins: 76, losses:  86, division: "west"    },
    { name: "San Francisco Giants",  wins: 94, losses:  68, division: "west"    },
    { name: "St. Louis Cardinals",   wins: 88, losses:  74, division: "central" },
    { name: "Washington Nationals",  wins: 98, losses:  64, division: "east"    }
];

Iteration

In the first section we saw some of the pitfalls of traditional JavaScript iteration loops as well as the improvements that functional programming can provide. Our Fibonacci example eliminated iteration by using recursion, but many algorithms don’t lend themselves to a recursive implementation. In those cases we can still use a functional programming style, however, by taking advantage of the iteration utilities in Underscore.js.

The most basic Underscore.js utility is each(). It executes an arbitrary function on every element in a collection and often serves as a direct functional replacement for the traditional for (i=0; i<len; i++) loop.

> _(national_league).each(function(team) { console.log(team.name); })
  Arizona Diamondbacks
  Atlanta Braves
  // Console output continues...
  Washington Nationals

Note: If you’re familiar with the jQuery library, you may know that jQuery includes a similar $.each() utility. There are two important differences between the Underscore.js and jQuery versions, however. First, the parameters passed to the iterator function differ between the two. Underscore.js passes (element, index, list) for arrays and (value, key, list) for simple objects, while jQuery passes (index, value). Secondly, at least as of this writing, the Underscore.js implementation can execute much faster than the jQuery version, depending on the browser. (jQuery also includes a $.map() function that’s similar to the Underscore.js method.)
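The argument order matters in practice, because an iterator function written for one library can silently misbehave with the other. As a rough illustration, the native Array.prototype.forEach() passes (element, index, list), the same order the note describes for Underscore.js with arrays. The three-team array here is an abbreviated sample, not the full data set.

```javascript
// Abbreviated sample of the data set, for illustration only.
var teams = [
    { name: "Arizona Diamondbacks", wins: 81 },
    { name: "Atlanta Braves",       wins: 94 },
    { name: "Chicago Cubs",         wins: 61 }
];

// Native forEach() passes (element, index, list), the same order
// described above for Underscore.js each() on arrays.
var lines = [];
teams.forEach(function(team, index, list) {
    lines.push((index + 1) + " of " + list.length + ": " + team.name);
});

console.log(lines.join("\n"));
```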

The Underscore.js map() method iterates through a collection and transforms each element with an arbitrary function. It returns a new collection containing the transformed elements. Here, for example, is how to create an array of all the teams’ winning percentages.

> _(national_league).map(function(team) {
      return Math.round(100*team.wins/(team.wins + team.losses));
  })
  [50, 58, 38, 60, 40, 34, 53, 43, 51, 46, 50, 49, 47, 58, 54, 60]
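Because map() returns a new collection, the transformation doesn’t have to produce plain numbers. A minimal sketch, using the native Array.prototype.map() and an abbreviated sample of the data, builds new objects that carry a winning percentage alongside the team name. The pct property is a name invented for this example.

```javascript
// Abbreviated sample of the data set, for illustration only.
var teams = [
    { name: "Atlanta Braves",  wins: 94, losses:  68 },
    { name: "Chicago Cubs",    wins: 61, losses: 101 },
    { name: "Cincinnati Reds", wins: 97, losses:  65 }
];

// Each element is transformed into a new object; the originals are not modified.
var withPct = teams.map(function(team) {
    return {
        name: team.name,
        pct: Math.round(100 * team.wins / (team.wins + team.losses))
    };
});

console.log(withPct);
```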

The reduce() method iterates through a collection and returns a single value. One parameter initializes this value, and the other parameter is an arbitrary function that updates the value for each element in the collection. We can use reduce(), for example, to count how many teams have a winning percentage over .500 (that is, more wins than losses).

> _(national_league).reduce(
      function(count, team) {
          return count + (team.wins > team.losses);
      },
      0  // starting point for reduced value
  )
  7

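As the comment on the fifth line of the example indicates, we start our count at 0. That value is passed as the first parameter to the function on the second line, and the function returns an updated value on the third line. (When the comparison team.wins > team.losses is added to count, JavaScript converts its true or false result to 1 or 0.)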
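To make the flow of the running value explicit, here is a simplified reduce written as an ordinary loop. It is only a sketch of the idea: the helper name simpleReduce is hypothetical, it handles arrays only, and it is not the Underscore.js implementation. It does show how each return value becomes the first argument of the next call.

```javascript
// Hypothetical helper: a simplified, array-only sketch of reduce.
function simpleReduce(list, iterator, memo) {
    for (var i = 0; i < list.length; i++) {
        // The value returned for one element is passed in for the next.
        memo = iterator(memo, list[i]);
    }
    return memo;
}

// Abbreviated sample of the data set, for illustration only.
var teams = [
    { name: "Atlanta Braves",  wins: 94, losses:  68 },
    { name: "Chicago Cubs",    wins: 61, losses: 101 },
    { name: "Cincinnati Reds", wins: 97, losses:  65 }
];

var winners = simpleReduce(teams, function(count, team) {
    // Explicit 1 or 0 instead of relying on boolean-to-number coercion.
    return count + (team.wins > team.losses ? 1 : 0);
}, 0);

console.log(winners);   // 2
```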

Note: If you’ve followed the development of “big data” implementations such as Hadoop or Google’s search, you may know that the fundamental algorithm behind those technologies is MapReduce. Although the context differs, the same concepts underlie the map() and reduce() utilities in Underscore.js.

Finding Elements in a Collection

Underscore.js has several methods to help us find elements or sets of elements in a collection. We can, for example, use find() to get a team with more than 90 wins.

> _(national_league).find( function(team) { return team.wins > 90; })
  {name: "Atlanta Braves", wins: 94, losses: 68, division: "east"}

The find() function just returns the first element in the array that meets the criteria. To find all elements that meet our criteria, use the filter() function.

> _(national_league).filter( function(team) { return team.wins > 90; })
  [ { name: "Atlanta Braves", wins: 94, losses: 68, division: "east" },
    { name: "Cincinnati Reds", wins: 97, losses: 65, division: "central" },
    { name: "San Francisco Giants", wins: 94, losses: 68, division: "west" },
    { name: "Washington Nationals", wins: 98, losses: 64, division: "east" }
  ]

The opposite of the filter() function is reject(). It returns an array of elements that don’t meet the criteria.

> _(national_league).reject( function(team) { return team.wins > 90; })
  [ { name: "Arizona Diamondbacks", wins: 81, losses:  81, division: "west" },
    { name: "Chicago Cubs", wins: 61, losses: 101, division: "central" },
    // Console output continues...
    { name: "St. Louis Cardinals", wins: 88, losses: 74, division: "central" }
  ]
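Since filter() and reject() apply the same test with opposite outcomes, together they split a collection into two parts with nothing left over. The sketch below illustrates this with the native Array.prototype.filter() and a negated test, on an abbreviated sample of the data. The helper name isBigWinner is made up for the example.

```javascript
// Abbreviated sample of the data set, for illustration only.
var teams = [
    { name: "Arizona Diamondbacks", wins: 81 },
    { name: "Atlanta Braves",       wins: 94 },
    { name: "Chicago Cubs",         wins: 61 },
    { name: "Cincinnati Reds",      wins: 97 }
];

// Hypothetical test function shared by both calls.
function isBigWinner(team) { return team.wins > 90; }

var kept = teams.filter(isBigWinner);
// Negating the test selects exactly the elements the test turns away.
var rejected = teams.filter(function(team) { return !isBigWinner(team); });

// Every element lands in exactly one of the two arrays.
console.log(kept.length + rejected.length === teams.length);
```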

If your criteria can be described as a property value, you can use a simpler version of filter(), the where() function. Instead of an arbitrary function to check for a match, where() takes for its parameter a set of properties that must match. We can use it to extract all the teams in the Eastern division.

> _(national_league).where({division: "east"})
  [ { name: "Atlanta Braves", wins: 94, losses: 68, division: "east" },
    { name: "Miami Marlins", wins: 69, losses: 93, division: "east" },
    { name: "New York Mets", wins: 74, losses: 88, division: "east" },
    { name: "Philadelphia Phillies", wins: 81, losses: 81, division: "east" },
    { name: "Washington Nationals", wins: 98, losses: 64, division: "east" }
  ]

The findWhere() method combines the functionality of find() with the simplicity of where(). It returns the first element in a collection with properties that match specific values.

> _(national_league).findWhere({name: "Atlanta Braves"})
  {name: "Atlanta Braves", wins: 94, losses: 68, division: "east"}
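Conceptually, where() and findWhere() are filter() and find() with a ready-made test that compares properties. One way to picture this, as a sketch rather than the library’s actual code, is a small helper that turns a set of properties into a test function. The name matchesProps is hypothetical, and the example uses native array methods on an abbreviated sample.

```javascript
// Hypothetical helper: builds a test that checks every listed property.
function matchesProps(props) {
    return function(item) {
        return Object.keys(props).every(function(key) {
            return item[key] === props[key];
        });
    };
}

// Abbreviated sample of the data set, for illustration only.
var teams = [
    { name: "Arizona Diamondbacks", wins: 81, division: "west"    },
    { name: "Atlanta Braves",       wins: 94, division: "east"    },
    { name: "Chicago Cubs",         wins: 61, division: "central" },
    { name: "Miami Marlins",        wins: 69, division: "east"    }
];

// All matches, in the manner of where().
var east = teams.filter(matchesProps({ division: "east" }));

// First match only, in the manner of findWhere().
var braves = teams.find(matchesProps({ name: "Atlanta Braves" }));

// No match: the native find() returns undefined.
var none = teams.find(matchesProps({ division: "north" }));

console.log(east.length, braves.wins, none);
```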

Another Underscore.js utility that’s especially handy is pluck(). This function creates an array by extracting only the specified property from a collection. We could use it to extract an array of nothing but team names, for example.

> _(national_league).pluck("name")
  [
    "Arizona Diamondbacks",
    "Atlanta Braves",
    /* Data continues... */,
    "Washington Nationals"
  ]
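In effect, pluck() is shorthand for a map() whose function returns a single property. That equivalence can be sketched with the native Array.prototype.map() on an abbreviated sample of the data:

```javascript
// Abbreviated sample of the data set, for illustration only.
var teams = [
    { name: "Arizona Diamondbacks", wins: 81 },
    { name: "Atlanta Braves",       wins: 94 },
    { name: "Chicago Cubs",         wins: 61 }
];

// Extract one property from every element.
var names = teams.map(function(team) { return team.name; });

// A property that doesn't exist yields undefined for every element.
var missing = teams.map(function(team) { return team.team; });

console.log(names, missing);
```

The second call shows why a misspelled property name is easy to overlook: it produces an array of undefined values rather than an error.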

Testing a Collection

Sometimes we don’t necessarily need to transform a collection; we simply want to check some aspect of it. Underscore.js provides several utilities to help with these tests.

The every() function tells us whether or not all elements in a collection pass an arbitrary test. We could use it to check if every team in our data set had at least 70 wins.

> _(national_league).every(function(team) { return team.wins >= 70; })
  false

Perhaps we’d like to know if any team had at least 70 wins. In that case the any() function provides an answer.

> _(national_league).any(function(team) { return team.wins >= 70; })
  true
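The two tests are related: saying that not every team reached 70 wins is the same as saying that at least one team fell short. A brief sketch with the native Array.prototype.every() and Array.prototype.some() methods (used here in place of the Underscore.js functions) makes the relationship concrete. The sample is abbreviated, and hasSeventyWins is a name invented for the example.

```javascript
// Abbreviated sample of the data set, for illustration only.
var teams = [
    { name: "Arizona Diamondbacks", wins: 81 },
    { name: "Chicago Cubs",         wins: 61 },
    { name: "Washington Nationals", wins: 98 }
];

// Hypothetical test function shared by all three checks.
function hasSeventyWins(team) { return team.wins >= 70; }

var allPass = teams.every(hasSeventyWins);
var somePass = teams.some(hasSeventyWins);
var someFail = teams.some(function(team) { return !hasSeventyWins(team); });

// "Not every element passes" is equivalent to "some element fails".
console.log(allPass === !someFail);
```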

Underscore.js also lets us use arbitrary functions to find the maximum and minimum elements in a collection. If our criterion is number of wins, we use max() to find the “maximum” team.

> _(national_league).max(function(team) { return team.wins; })
  {name: "Washington Nationals", wins: 98, losses: 64, division: "east"}

Not surprisingly, the min() function works the same way.

> _(national_league).min(function(team) { return team.wins; })
  {name: "Houston Astros", wins: 55, losses: 107, division: "central"}
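Note that max() and min() return the whole element, not the value computed by the function. A simplified sketch of that behavior uses the native Array.prototype.reduce() to keep whichever element scores highest. The helper name maxBy is hypothetical, the code assumes a non-empty array, and it is not the library’s own implementation.

```javascript
// Hypothetical helper: returns the element with the highest criterion value.
// Assumes a non-empty array; reduce() without an initial value throws on [].
function maxBy(list, criterion) {
    return list.reduce(function(best, item) {
        return criterion(item) > criterion(best) ? item : best;
    });
}

// Abbreviated sample of the data set, for illustration only.
var teams = [
    { name: "Atlanta Braves",       wins: 94 },
    { name: "Houston Astros",       wins: 55 },
    { name: "Washington Nationals", wins: 98 }
];

var most = maxBy(teams, function(team) { return team.wins; });

// Negating the criterion turns the same helper into a minimum search.
var fewest = maxBy(teams, function(team) { return -team.wins; });

console.log(most.name, fewest.name);
```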

Rearranging Collections

To sort a collection, we can use the sortBy() method and supply an arbitrary function to provide sortable values. Here’s how to reorder our collection in order of increasing wins.

> _(national_league).sortBy(function(team) { return team.wins; })
  [ { name: "Houston Astros", wins: 55, losses: 107, division: "central" },
    { name: "Chicago Cubs", wins: 61, losses: 101, division: "central" },
    // Data continues...
    { name: "Washington Nationals", wins: 98, losses: 64, division: "east" } ]
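The example above sorts in increasing order. For numeric values, a decreasing order only requires reversing the comparison. The sketch below shows the idea with the native Array.prototype.sort() instead of sortBy(), copying the array first because the native method reorders an array in place. The sample is abbreviated.

```javascript
// Abbreviated sample of the data set, for illustration only.
var teams = [
    { name: "Atlanta Braves",       wins: 94 },
    { name: "Houston Astros",       wins: 55 },
    { name: "Washington Nationals", wins: 98 }
];

// slice() copies the array so the original order is preserved.
var byWinsDescending = teams.slice().sort(function(a, b) {
    return b.wins - a.wins;   // larger win totals sort first
});

console.log(byWinsDescending.map(function(team) { return team.name; }));
```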

We could also reorganize our collection by grouping its elements according to a property. The Underscore.js function that helps in this case is groupBy(). One possibility is reorganizing the teams according to their division.

> _(national_league).groupBy("division")
  {
    west: [
      { name: "Arizona Diamondbacks", wins: 81, losses: 81, division: "west" },
      { name: "Colorado Rockies", wins: 64, losses: 98, division: "west" },
      { name: "Los Angeles Dodgers", wins: 86, losses: 76, division: "west" },
      { name: "San Diego Padres", wins: 76, losses: 86, division: "west" },
      { name: "San Francisco Giants", wins: 94, losses: 68, division: "west" }
    ],
    east: [
      { name: "Atlanta Braves", wins: 94, losses: 68, division: "east" },
      { name: "Miami Marlins", wins: 69, losses: 93, division: "east" },
      { name: "New York Mets", wins: 74, losses: 88, division: "east" },
      { name: "Philadelphia Phillies", wins: 81, losses: 81, division: "east" },
      { name: "Washington Nationals", wins: 98, losses: 64, division: "east" }
    ],
    central: [
      { name: "Chicago Cubs", wins: 61, losses: 101, division: "central" },
      { name: "Cincinnati Reds", wins: 97, losses: 65, division: "central" },
      { name: "Houston Astros", wins: 55, losses: 107, division: "central" },
      { name: "Milwaukee Brewers", wins: 83, losses: 79, division: "central" },
      { name: "Pittsburgh Pirates", wins: 79, losses: 83, division: "central" },
      { name: "St. Louis Cardinals", wins: 88, losses: 74, division: "central" }
    ]
  }

We can also use the countBy() function to simply count the number of elements in each group.

> _(national_league).countBy("division")
  {west: 5, east: 5, central: 6}

Note: Although we’ve used a property value ("division") for groupBy() and countBy(), both methods also accept an arbitrary function if the criterion for grouping isn’t a simple property.
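As an illustration of grouping by a function, the following sketch implements a simplified, array-only groupBy from scratch and uses it to split teams by whether they finished above .500. The helper name simpleGroupBy and the group labels are invented for this example; the Underscore.js version is more general.

```javascript
// Hypothetical helper: a simplified, array-only sketch of groupBy.
function simpleGroupBy(list, criterion) {
    var groups = {};
    list.forEach(function(item) {
        var key = criterion(item);
        // Create the group the first time its key appears.
        if (!Object.prototype.hasOwnProperty.call(groups, key)) {
            groups[key] = [];
        }
        groups[key].push(item);
    });
    return groups;
}

// Abbreviated sample of the data set, for illustration only.
var teams = [
    { name: "Arizona Diamondbacks", wins: 81, losses:  81 },
    { name: "Atlanta Braves",       wins: 94, losses:  68 },
    { name: "Chicago Cubs",         wins: 61, losses: 101 },
    { name: "Cincinnati Reds",      wins: 97, losses:  65 }
];

var byRecord = simpleGroupBy(teams, function(team) {
    return team.wins > team.losses ? "winning" : "other";
});

// Counting is then just a matter of measuring each group.
var counts = {};
Object.keys(byRecord).forEach(function(key) {
    counts[key] = byRecord[key].length;
});

console.log(counts);
```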

As a final trick, Underscore.js lets us randomly reorder a collection using the shuffle() function.

> _(national_league).shuffle()
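The shuffle() function returns a shuffled copy rather than reordering the original collection. The Underscore.js documentation describes it as using a version of the Fisher-Yates shuffle; the following is an independent sketch of that algorithm for arrays, not the library’s code, and the function name shuffled is hypothetical.

```javascript
// Hypothetical helper: returns a shuffled copy using the Fisher-Yates algorithm.
function shuffled(list) {
    var result = list.slice();   // copy, so the input is left untouched
    for (var i = result.length - 1; i > 0; i--) {
        // Pick a random position from the not-yet-fixed part (0..i).
        var j = Math.floor(Math.random() * (i + 1));
        var tmp = result[i];
        result[i] = result[j];
        result[j] = tmp;
    }
    return result;
}

var names = ["Braves", "Cubs", "Reds", "Mets"];
var mixed = shuffled(names);

// The order varies from run to run, so no fixed output is shown.
console.log(mixed);
```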

Summing Up

Although this appendix takes a different approach than the rest of the book, its ultimate focus is still on data visualizations. As we’ve seen in earlier chapters (and as you’ll certainly encounter in your own projects), the raw data for our visualizations isn’t always perfect as delivered. Sometimes we need to clean the data by removing invalid values, and other times we need to rearrange or transform it so that it’s appropriate for our visualization libraries.

The Underscore.js library contains a wealth of tools and utilities to help with those tasks. It lets us easily manage arrays, modify objects, and transform collections. Furthermore, Underscore.js supports an underlying philosophy based on functional programming, so our code that uses Underscore.js remains highly readable and resistant to bugs and defects.
