Design Tidbits

Eurovision Song Contest 2011 and NodeXL: A Perfect Match – Part II

By Gerd Waloszek, SAP AG, SAP User Experience – May 30, 2011

This is the second part of a two-part article about applying the visualization tool NodeXL to the Eurovision Song Contest (ESC) 2011 voting data. In the first part, I presented variations of the same graph that illustrated how certain NodeXL features help make the graph less cluttered yet more instructive: I started with the default global appearance, and then used NodeXL's autofill feature to apply global changes to edges and vertices based on graph metrics. Finally, I applied dynamic filters to the data to change the graph selectively with regard to edges, vertices, and filter opacity.

In this part of my article, I will demonstrate how you can assign category colors to edges and vertices. Categories can be assigned in various ways: automatically (as was done for vertices by means of a clustering algorithm), manually, or by performing simple calculations or manipulations – I will present a few examples below. Finally, I will show how you can use a map as graph background and thus provide a natural layout for data that has a geographic reference.

 

The Experiments – Part 2

(4) Coloring Edges by Category

In my previous examples, I used a color gradient based on edge weight to color the edges. In addition, I varied the edge width slightly according to edge weight (between 1 and 2). As I already mentioned in the first part of my article, such a combined scheme seems to be more effective than using width alone. I capped the edge width at 2 because thicker edges can clutter the graph.

Figure 1: Edge color set to voting category (unique color for each voting category)

Figure 2: Ditto; dynamic filters used for edge weight (top three votes only); filter opacity = 0%

Figure 3: Edge color set to voting category (three categories) based on weight (points)

Figure 4: Ditto; dynamic filters used for edge category (top category = three top votes only); filter opacity = 0%


NodeXL's autofill feature allows you to assign colors (and other visual properties) to edges on the basis of category values. It took me quite a while to realize this and to understand its usefulness. In the ESC, there are ten voting categories (1–7, 8, 10, and 12 points), resulting in ten different edge colors (see Figure 1). For Figure 2, I applied dynamic filters to display only the top three voting categories (8, 10, and 12 points), thereby highlighting the primary preferences for songs (filter opacity set to 0%).
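The logic behind the filter used for Figure 2 can be sketched in a few lines of Python. This is only an illustration of the filtering idea, not NodeXL's own implementation: the edge tuples, the placeholder country names, and the helper name `top_votes` are assumptions made up for this example.

```python
# Illustrative only: NodeXL applies dynamic filters through its dialog,
# not through code like this.
TOP_THRESHOLD = 8  # the top three voting categories are 8, 10, and 12 points


def top_votes(edges, threshold=TOP_THRESHOLD):
    """Keep only edges whose weight (voting points) reaches the threshold."""
    return [edge for edge in edges if edge[2] >= threshold]


# Hypothetical sample edges: (voting country, receiving country, points)
edges = [
    ("A", "B", 12),
    ("A", "C", 7),
    ("B", "C", 10),
    ("C", "A", 8),
    ("C", "B", 3),
]

visible = top_votes(edges)
```

With a filter opacity of 0%, the edges that fall below the threshold disappear completely, which corresponds to dropping them from the list here.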

I thought that it would be interesting to reduce the number of categories to three, for example: top votes, medium point votes, and low point votes. Therefore, I added a new column to the edges sheet to specify the reduced set of categories. To fill the column with values, you can either assign the new categories to the edges manually (which is somewhat tedious when you have 430 rows) or use a simple calculation or manipulation. I did the latter and created the new categories by first pasting the edge weights (voting points) into the new column and then using Excel's replace feature to transform the weights to the numbers 1, 2 and 3, that is, into three categories. I had to be careful with respect to the order of the replacement steps because Excel's replace function works on a character basis. Figure 3 shows the result for the three new categories (top votes, medium point votes, low point votes). For Figure 4, I applied the same dynamic filters as for Figure 2 to display only the top category, which now includes the 8-, 10-, and 12-point votes. Slight variations in edge width and opacity allow you to distinguish between the 8-, 10-, and 12-point votes, provided you look very carefully at the graph.
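The mapping from points to three categories, and the reason why the order of character-based replacements matters, can be sketched as follows. This is a minimal sketch under assumptions: only the top category (8, 10, and 12 points) is taken from the text, while the boundary between medium and low votes, the meaning of the codes 1 to 3, and both function names are hypothetical.

```python
# Assumed category codes: 3 = top votes, 2 = medium votes, 1 = low votes.
TOP_POINTS = {8, 10, 12}  # stated in the article
MEDIUM_MIN = 5            # assumption: 5 to 7 points count as "medium"


def vote_category(points):
    """Map a voting weight to one of three categories by its whole value."""
    if points in TOP_POINTS:
        return 3
    if points >= MEDIUM_MIN:
        return 2
    return 1


def substring_replace(cell, rules):
    """Mimic a character-based find-and-replace, applied rule by rule."""
    for old, new in rules:
        cell = cell.replace(old, new)
    return cell


weights = [1, 4, 5, 7, 8, 10, 12]
categories = [vote_category(w) for w in weights]

# Pitfall: replacing "2" first also hits the "2" inside "12",
# so the later rule for "12" no longer finds anything to replace.
careless_rules = [("2", "1"), ("12", "3")]
broken = substring_replace("12", careless_rules)
```

Mapping whole values, as `vote_category` does, avoids the ordering problem altogether; with a character-based replace, each step has to be checked against the results of the steps before it.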

When comparing Figures 1 and 2 as well as Figures 3 and 4, you may, of course, form your own opinion about which version is the more instructive or effective one. With this experiment, I just wanted to demonstrate possible configuration options for the graph.

(5) Coloring Vertices by Category

Figure 5: Vertex color set to category (finalists vs. non-finalists)

Figure 6: Ditto; dynamic filters used for edge weight (top votes only); filter opacity = 10%

Figure 7: Vertex color set to category (8 regional categories); dynamic filters used for edge weight (top votes only); filter opacity = 10%

Figure 8: Vertex coloring based on cluster algorithm and degree (from first part of article); dynamic filters used for edge weight (12-point votes only); filter opacity = 15%


Now, it was time to transfer what I had learned from experimenting with the edges to the vertices. At the very beginning, I had already applied a clustering algorithm to group the vertices (countries) automatically; this was represented by vertex colors. You could also color vertices, and thus group them visually, by categories that are based on certain criteria or even created manually. The simplest way of grouping countries would be to distinguish between finalists and non-finalists. In part 1 of my article, I had already mentioned degree as a possible candidate for creating such a grouping: Non-finalists have a degree of 10, while finalists have a higher degree because they all received votes in the final. As with the edges, I introduced a new column here, this time in the vertices sheet. Then I pasted the degree values into this column and manually changed all values larger than 10 to 20, to get the two category values 10 and 20. Absolute values are irrelevant here, and I could have replaced the values with names to make the column data more self-explanatory. Figure 5 shows the result together with my usual autofill manipulations on the edges. For Figure 6, I once again added a dynamic filter to display only the top votes. This time, however, I set filter opacity to 10% so that the filtered-out data remains faintly visible. The distinction between finalists and non-finalists is not very enlightening. But at least we can now easily see that Switzerland, the finalist with the fewest votes, lies even more at the periphery than the two non-finalists Israel and San Marino.
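The degree-based grouping amounts to a single threshold test. The following Python sketch shows the idea with self-explanatory names instead of the values 10 and 20; the function name and the sample degree values are hypothetical and do not come from the ESC 2011 data.

```python
NON_FINALIST_DEGREE = 10  # per the article: non-finalists have a degree of 10


def finalist_category(degree):
    """Classify a country by its degree in the voting graph."""
    return "finalist" if degree > NON_FINALIST_DEGREE else "non-finalist"


# Hypothetical degree values for illustration only
degrees = {"Country A": 10, "Country B": 27, "Country C": 14}

labels = {country: finalist_category(d) for country, d in degrees.items()}
```

Using names rather than numbers as category values makes the new column readable at a glance, which is the alternative mentioned above.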

Next, I experimented with a manual assignment of vertices to categories: I assigned the countries to 8 geographic regions (Scandinavia, Baltics, Balkans, East, Southeast, South, West, Center). This seemed to promise an interesting outcome with respect to the frequently voiced assumption that neighboring or culturally related countries tend to favor each other in the voting. But the categories and the assignment are, of course, somewhat arbitrary, so let us regard this exercise as a technical one and not as one that provides insight into the voting behavior of European countries. Apart from the vertex label colors, the results of this exercise are identical to those of the previous one, so I have only presented the filtered version in Figure 7. More interesting, perhaps, is a comparison between manually assigned categories and the results of the automatic clustering shown in part 1 of the article. I therefore copied the respective graph from Figure 5 in part 1 into Figure 8 here for easier comparison. A closer look at the two figures reveals similarities as well as differences between the two assignments.

(6) Using a Map as Background

Figure 9: Map used as background; vertices placed manually; Sweden selected

Figure 10: Map used as background; vertices placed manually; dynamic filter for edge weight (12-point votes); Sweden selected


Finally, I would like to present a feature that comes in handy when dealing with data that has a geographical reference. It would be interesting to see the voting patterns in a map-based graphic, especially since the voting behavior of neighboring countries has been much debated. Thus, the map – and not an algorithm – would define the graph layout.

I found a map of the countries participating in the ESC on Wikipedia, where I had already found the voting data. I selected a version of suitable size, cropped it a little bit, used an image processing application to make it lighter, and then imported the map into NodeXL as the background. The results can be seen in Figures 9 and 10: Figure 9 shows the complete graph, while Figure 10 shows only the top votes (dynamic filters were used). In both figures, Sweden (the most connected country) was selected and the edges going to and coming from it have been highlighted.

 

A Few Remarks About NodeXL

NodeXL is a tool for analyzing complex data and, not surprisingly, is quite complex itself in many ways. I had not used it for about half a year, and was a little bit puzzled when first confronted with it again. I therefore decided to check the online version of the book about NodeXL, particularly part II, which is a tutorial entitled "Learning by Doing". Luckily, I had already marked important passages in my version, so I was able to simply scan the tutorial and indeed get back up to speed quickly. This definitely speaks for the tutorial.

However, there are also some quirks in the NodeXL user interface that may puzzle users from time to time, particularly if they do not refer to the manual. For example, there are two dialogs that I used most often in my experiments and that are indeed "unequal siblings": the "Autofill" dialog and the "Dynamic Filters" dialog (see Figure 11 below).

Dialogs in NodeXL

Figure 11: Two "unequal siblings": the "Autofill" dialog (left) and the "Dynamic Filters" dialog (right)

The "Dynamic Filters" dialog is a joy to use and you can see the effects of your changes almost instantly. However, do not expect the same experience from the "Autofill" dialog. This dialog is somewhat complex (each option opens a subdialog), and changes are only propagated to the graph if you click the "Autofill" button at the bottom (sometimes I also felt inclined to refresh the graph, but this was probably due to my inexperience). One reason for this behavior of the dialog is that each setting has options that you may want to set (in another dialog) before you refresh the graph. Another, more technical, reason is that the "Autofill" dialog populates certain columns in various worksheets that are used by NodeXL. This may, however, not be apparent to the inexperienced user because the changes are not visible (that is, the respective worksheets are hidden).

Moreover, and this is what puzzled me most at the beginning, if you clear a setting in the dialog it is not really cleared. For example, suppose you have set the vertex color to "Degree", clicked the "Autofill" button and thus refreshed the graph to show the effect of the setting. You then want to undo this option, so you clear the respective field in the dialog, and click the "Autofill" button to refresh the graph: But nothing happens. The reason: The data that was filled into the vertex color column when you set the vertex color to "Degree" is still there. You have to either delete the respective column manually (this works, I checked it) or click the button "Clear All Worksheet Columns Now," which works as expected, namely clears all columns. However, this may not be what you really want: The button clears all columns, so if you wanted to clear just one column, you have to set all settings that you want to retain back to their original values (provided you can remember them all). This is a cumbersome procedure and perhaps there is a clever workaround described in the book. I had hoped to find a solution on the user interface (it may be there...) without reading the book. This behavior is asymmetric in that if you change a setting, new numbers are filled into the columns after you press the "Autofill" button – the deletion of settings, however, is not propagated.

All in all, if the authors of this formidable tool happen to read my complaints, they may point me to a more efficient procedure – or if there is none, provide a fix and make the tool even better. I got two e-mails in less than an hour after I had notified the authors of my articles – I report their answers below.

 

Final Word

This concludes my simple experiments with NodeXL and the ESC 2011 voting data. In this two-part article, I was only able to scratch the very surface of the tool, and I am still a beginner with it. Perhaps I was able to whet the readers' appetite and inspire them to perform their own experiments with NodeXL. The secret behind such experiments is having data available that you would like to explore. In my case, the ESC 2011 voting data came in handy and motivated me to do my experiments.

P. S.: Some of the findings that I reported here can also be easily seen in the original voting table (see references) without too much effort.

P. P. S.: The references below may be useful if you want to experiment with NodeXL on your own. Please note that NodeXL only works with the latest Excel versions for the Microsoft Windows operating system. Apple Macintosh users who have a Windows operating system installed on their computers can download an Office 2010 test version for their experiments (valid for 60 days).

 

Comments from the Authors

Clear One Data Column

Derek Hansen: There is indeed a relatively simple way to clear the data for only one column (e.g., "Vertex Color"). You can click on the "Options" arrow associated with the column field title (e.g., "Vertex Color") and you'll see a choice that says "Clear Vertex Color Worksheet Column Now".

Marc Smith admits that this feature is somewhat hard to find and adds an illustration: I should note that there is a feature (which is not very discoverable) to clear single columns of data using the "Autofill" columns dialog. The "Options" drop down along side each row contains a selection:

(Screenshot: the "Options" drop-down menu in the "Autofill" dialog)

The first option clears the entry in the dialog box, the second clears the column of data.

I have tried and checked this solution, and it works. See the complete dialog for more context:

Autofill dialog with options menu

I have to admit that I did not really care for that menu, and after having used it, I believe that I understand why: Why should one want to reset the source column name without also clearing the data column in the worksheet? This separation still puzzles me. The authors will definitely have a reason for separating the two steps, but as I usually want to clear both, I have to call the menu twice in such a case. Therefore, I would like to propose a command "Clear ... Source Column Name and Worksheet Column Now" – admittedly a very long name for a command...

General Design Issues

Both authors also commented on more general issues:

  • Derek Hansen: More generally, I understand the confusion with the Autofill Columns being essentially a one-way interaction (things you do on the "Autofill Columns" dialog populate spreadsheet columns with data which are then "read into" the graph, but data in the columns are not reflected in the "Autofill" columns). It's a challenging problem to solve. Any thoughts you have on the subject would be welcome.
  • Marc Smith: Your comment about the unequal nature of the "Autofill" columns and "Dynamic Filters" features is valid! We are discussing ways of addressing these issues.

So I will rack my brain for a possible solution as well...

 

References

 
