By Gerd Waloszek, SAP User Experience – November 10, 2010
This review takes a personal look at Analyzing Social Media Networks with NodeXL by Derek Hansen, Ben Shneiderman, and Marc Smith.
Derek Hansen, Ben Shneiderman & Marc Smith
Information: Information presentation
Derek L. Hansen is an Assistant Professor at the University of Maryland's iSchool and Director for the Center for the Advanced Study of Communities and Information, a multi-disciplinary research center focused on harnessing the power of novel social technologies to support the needs of real and virtual communities. He is also an active member of the Human Computer Interaction Lab.
Ben Shneiderman is a Professor in the Department of Computer Science, Founding Director (1983–2000) of the Human-Computer Interaction Laboratory and a member of the Institute for Advanced Computer Studies, all at the University of Maryland at College Park.
Marc Smith is a sociologist specializing in the social organization of online communities and computer mediated interaction. He founded and managed the Community Technologies Group at Microsoft Research in Redmond, Washington and led the development of social media reporting and analysis tools for Telligent Systems. Smith leads the Connected Action consulting group and lives and works in Silicon Valley, California. Smith co-founded the Social Media Research Foundation, a non-profit devoted to open tools, data, and scholarship related to social media research.
When I attended a Web 2.0 symposium at the University of Bonn, Germany, in late 2006, the conference speaker who impressed me most was Marc Smith. Smith, who was working at Microsoft Research at the time, used beautiful graphs (probably bubble charts) to visualize social media networks that were derived from analyses of bulletin boards at Microsoft. (I still own a DVD that he distributed at the conference, but I never found the time to explore it.) This summer, some four years later, Ben Shneiderman drew my attention to a new book that he had co-authored with Marc Smith and Derek Hansen: Analyzing Social Media Networks with NodeXL. Like Shneiderman, Hansen is a member of the HCIL, the Human-Computer Interaction Laboratory at the University of Maryland. Once again, social media networks and Marc Smith were brought into my focus. The title suggests that the book is primarily about NodeXL, a tool for analyzing social networks. As we will see in the course of my review, this is only partly true. The book could actually be regarded as three books in one. But readers get much more than just three books inside one cover: They can also obtain the tool in question free of charge, because NodeXL is available in the Open Source community. What more could you want? I will come back to this question in my conclusion.
The word network is familiar to a modern audience: We talk of railway networks, telephone networks, computer networks, and even of networks of friends. The latter category, as we will soon learn, is known as a social network because it connects people. During the last ten or twenty years, most people have also gained a conception of social media: The less technically savvy have learned about them from other people or from the media. The others, who in Western countries probably constitute the majority, use them in a variety of ways: e-mail, discussion forums, blogs and microblogs (Twitter), wikis (Wikipedia), social sharing sites for photos and videos (Flickr, YouTube), community sites (Facebook, LinkedIn), and many more. Social networks and social media combine (you guessed it!) to form what are known as social media networks: social networks that are created through the use of social media.
Because of the significance that social media and the resulting networks have attained in our lives, researchers and many other groups are now striving to analyze and understand them using tools like NodeXL. To do this, however, we need to think and talk about networks in more formal terms. I presume that many readers of this review will not be familiar with the abstract concepts of social networks, how such networks can be analyzed, and why people are – or should be – interested in analyzing them. For the benefit of these readers, I would like to provide a "quick guide" to the topics covered in the book. Further details can be found in the appendix.
Generally, a network is "a collection of things and their relationships to one another." If we talk about networks in formal terms, then the things are called nodes or vertices. They can be any objects or entities, such as people, Web pages, or message board postings. Whenever people interact – directly or indirectly – with other people, institutions, or artifacts, we speak of social networks. Social networks are primordial: They have existed as long as the human race. Social network science, on the other hand, is relatively new and has its origins in the beginning of the twentieth century. The relationships between things are referred to as edges, ties, links, or connections, which sound more familiar. In social networks, the relationships can often be given real-world labels, such as friend of, sends e-mail to, or follows XY on Twitter. The last two examples refer to the aforementioned social media networks, namely networks that are created when people use social media. (You can find a tour de force of social media in Chapter 2, Social Media: New Technologies of Collaboration.)
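To make the vocabulary of vertices and edges concrete, here is a minimal sketch of how such a network might be written down as data. It is an illustration only and is not taken from the book or from NodeXL: the people, the relationship labels, and the helper name `build_adjacency` are all hypothetical, and every relationship is treated as undirected for simplicity (a "follows" relationship is really one-way).

```python
from collections import defaultdict

# Hypothetical example data: each edge is (vertex, relationship label, vertex).
edges = [
    ("alice", "friend of", "bob"),
    ("alice", "sends e-mail to", "carol"),
    ("bob", "follows on Twitter", "carol"),
]


def build_adjacency(edge_list):
    """Map each vertex to the set of vertices it shares an edge with.

    Simplifying assumption: edges are treated as undirected, so the
    direction implied by labels such as "follows" is ignored.
    """
    adjacency = defaultdict(set)
    for source, _label, target in edge_list:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return dict(adjacency)


adjacency = build_adjacency(edges)
vertices = sorted(adjacency)  # the "things" in the network
```

An edge list of this kind maps naturally onto a spreadsheet, with one row per relationship.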
Network science studies patterns of connection in a wide range of physical and social phenomena. In particular, social network analysis applies the broader field of network science to the study of human relationships and connections, providing a set of methods and tools that help explore and visualize the patterns found within social (media) networks. For a number of reasons, a breakthrough in social network analysis has emerged only recently:
The tools that have been developed in research and industry for social media network analysis and visualization often require programming skills and technical knowledge, which severely restricts the circle of people who can use them. This is where NodeXL (Network Overview Discovery and Exploration add-in for Excel 2007) comes in: It is a tool that has been designed for ease of use and customized for application to social media networks. As an Excel add-in (Excel 2007 and higher on the Windows platform) it allows both students and practitioners to get their hands on social network analysis without much hassle: They can perform their analyses in the familiar spreadsheet environment without the need for programming. NodeXL offers a choice of visual properties, provides advanced filtering capabilities, calculates frequently used individual and overall network metrics (see the appendix), and supports diverse visual network layouts. An important feature of NodeXL is its ability to import social network data from a variety of social media, and thus make social network analysis available to everyone. The tool was developed by a distributed team, funded by Microsoft Research, and is available in the Open Source community. According to the authors, it handles networks with several thousand nodes well. Some users have even applied it successfully to networks containing tens of thousands of nodes (details about NodeXL can be found, for example, in the preface to the book).
According to the authors, the availability of widespread network connectivity (for example, WWW, mobile phone networks, UMTS ...) has led billions of people to change their lifestyle by using social media. Despite their recent development – over the last few decades – these tools have become ubiquitous. "People" includes individuals, organizations, business enterprises, research institutions (see, for example, the review of Scientific Collaboration on the Internet), government agencies, and so on. The authors also caution that "for every successful example, there are numerous failed attempts suggesting the need for more systematic methods for analyzing and understanding social media environments. ... Understanding how these social media networks thrive, change, or fail is a substantial challenge to researchers and professionals." Because social media have a massive impact on all areas of life, interest in them is not restricted to research activities: "Existing social institutions, educational curricula, business plans, and government policies are shifting as a result of social media tools and their application." All of these groups show an increasing interest not only in utilizing social media but also in understanding the data that the corresponding networks provide in order to use them more effectively. As the authors state, "these relational data provide a wealth of new opportunities to understand and improve the social worlds we inhabit."
According to the authors, this book will benefit practitioners, researchers, and students who are interested in the study of social media. It is intended to enable business leaders, instructors, and students to "shift their attention from a focus on mastering social network tools and concepts to their application in measurement, analysis, and interpretation. Readers will learn how to extract insights from networks to reveal internal business activity, external customer communities, and the competitive landscape." All in all, the authors seem to primarily target the kind of educational audience that is found at universities, particularly in schools of computing, social sciences, business, and management, where enterprise collaboration and customer communities seem to have become "hot topics."
The book consists of three parts, referred to by the authors as "roots", "trunk", and "branches". As I suggested earlier, it might also be regarded as "three books in one."
Part I, Getting Started with Analyzing Social Media Networks (chapters 1 through 3), is the "backbone" of the book. It begins by giving a historical perspective and goes on to describe the core concepts of social media and social network analysis. Written as a terse primer in social networks, social media, social media networks, and their analysis, it assembles the background knowledge that you need to analyze network data independently. Chapter 2, Social Media: New Technologies of Collaboration, in particular, provides a useful framework for comparing social media tools, which is based on six key dimensions:
These dimensions may look rather abstract at first sight, but prove, as the book demonstrates, very useful for classifying and comparing social media. As an example, "genre of basic elements" refers to the digital objects that determine the basic interaction in social media networks. Such elements can encompass e-mail messages, Twitter posts, posts in discussion forums, photos, videos, and many more.
Chapter 3, Social Network Analysis: Measuring, Mapping, and Modeling Collections of Connections, introduces the formal concepts behind social media networks such as network building blocks, network types, and network analysis metrics. These metrics play an important role in interpreting the patterns of relationships in social media networks (see the appendix for a tabular overview).
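For readers who have never met a network metric, the following sketch computes two measures that are standard in the field: degree (an individual metric, the number of connections a vertex has) and density (an overall metric, the share of possible connections that actually exist). It is an illustration under my own assumptions, not NodeXL's implementation; the function names and the small star-shaped example network are hypothetical, and the network is assumed to be undirected with no self-loops or duplicate edges.

```python
def degree(adjacency):
    """Individual metric: number of neighbors of each vertex."""
    return {vertex: len(neighbors) for vertex, neighbors in adjacency.items()}


def density(adjacency):
    """Overall metric: existing edges divided by possible edges.

    Assumes an undirected network without self-loops, in which n vertices
    can have at most n * (n - 1) / 2 edges.
    """
    n = len(adjacency)
    if n < 2:
        return 0.0
    # Each undirected edge appears in two neighbor sets, hence the halving.
    edge_count = sum(len(neighbors) for neighbors in adjacency.values()) / 2
    return edge_count / (n * (n - 1) / 2)


# Hypothetical star network: "a" is connected to everyone else.
star = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a"},
}

degrees = degree(star)       # "a" has degree 3, the others degree 1
star_density = density(star)  # 3 of 6 possible edges exist
```

In this example the hub "a" stands out by its degree, while the density of 0.5 says that half of all possible connections are present.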
Figure 1: My own experiments with NodeXL
Part II, NodeXL Tutorial: Learning by Doing (chapters 4 through 7), on the other hand, is a "hands-on" course (taking approximately 8 hours) that focuses on the practical details of operating NodeXL: It starts with the basic handling of the NodeXL template, devotes a chapter to calculating and visualizing network metrics, another one to data preparation and filtering, and closes with the advanced topic of clustering and grouping (an earlier draft of the tutorial is also available on the Web). The tutorial guides users step by step through the tool's various functions and applications. Important steps and settings are visualized in example screens. Users who work through the tutorial should note, however, that the layouts on their computers may differ from those shown in the book. This is because the layout algorithms produce a different layout each time the graph is refreshed. There may also be minor discrepancies due to NodeXL version differences. Because the process of entering the data set manually would be tedious, the authors offer it for download on the book's Web site. Please note that the URL of the book's Web site did not make it into the book. You can find it in the references in the appendix.
Part III, Social Media Network Analysis Case Studies (chapters 8 through 15), makes up nearly two thirds of the book. It presents eight social media network analysis case studies, six of which were contributed by external authors (see the overview of the book in the appendix). They cover e-mail, threaded networks, Twitter, Facebook, the Web, Flickr, YouTube, and wiki networks. These chapters present popular instances of social media, often include their historic background, and provide insight into the specific social patterns that emerge from using them. They demonstrate NodeXL's usefulness for real-world investigations with practical examples, but are also designed as an extended tutorial that builds on the tutorial in Part II of the book. While the chapters differ in the degree to which they offer practical support, they typically demonstrate how readers can import data from the respective social media and perform analyses on their own, what aspects they can explore, and how they can explore them. This assistance is perhaps the most important aspect of this part of the book. Once again, it reflects the authors' commitment to enabling more people to perform their own social media network analyses.
All the chapters have a common structure: They begin with an outline, continue with an introduction to the topic of the chapter, and finish with a practitioner's summary and a researcher's agenda (similar to Shneiderman's textbook Designing the User Interface). References and additional resources typically conclude each chapter. In addition, throughout the book, "Advanced Topic" boxes complement the text by offering digressions into historical background, advanced concepts, technical details, and more.
The appendix shows how users with the respective background in programming can customize the NodeXL Excel 2007 template to import network graph data from any data source and create their own graphing applications using the NodeXL class libraries.
Figure 2: My imported e-mails from 2010, an experiment with NodeXL when reading Part III (see also graph examples in the appendix)
The three parts of this book form a natural sequence: First, readers acquire the necessary background knowledge, then they learn how to use the tool and to interpret the data, and finally they see how the tool is applied in practice. Students are the book's primary target audience, and the authors suggest using it for semester-long courses and also for shorter units covering selected topics. Practitioners, on the other hand, do not have such opportunities. They will also rarely find the time to read the book from cover to cover, particularly if they are not directly involved in network data analysis. Most of them would probably like to get their hands on NodeXL as quickly as possible. Whether this will make sense for them depends to a large degree on their background knowledge – not only of social media (as mentioned above, nearly everyone has some knowledge of these) but, in particular, of networks and the related concepts and metrics.
For a jump-start into NodeXL, practitioners might want to skip directly to Part II of the book (the NodeXL tutorial) and start learning how to use the tool (download and installation instructions are provided; alternatively you can download a NodeXL tutorial, which is a draft version of Part II, from the Web; a video tutorial for NodeXL is also available on the Web). The appendix to my review might also help readers get a quick overview of the concepts relating to social network analysis. However, the information provided there is intended solely as an advance organizer to facilitate the reading of Part I, not as a replacement for it. While the first chapter of Part II focuses on the tool's basic functionality and therefore requires little knowledge of networks, the second chapter could bring you to a sudden halt. Some experimentation and experience are needed to understand the different individual and overall network metrics. The last two chapters of Part II delve even deeper into advanced topics. Therefore, it might be a good idea to consult Part I while you are working through Part II in order to close knowledge gaps. This strategy will, of course, increase the time required for the tutorial beyond the eight hours estimated by the authors.
Part III of the book covers case studies, and the authors suggest that the chapters can be read in any order. For an easier read, I would also suggest that readers start with those social media types that they know best (or are most interested in) and then proceed to the less familiar ones. Once again, readers can jump back to Part I if they need to close knowledge gaps. In addition, because the case studies are largely independent of each other, there is no obligation to read this part of the book in one sitting. With each chapter having a length of about 20 pages, the individual chapters constitute a good weekend's reading.
While this strategy may sound a bit chaotic, it might prevent time-pressed readers from giving up completely in Part I because they cannot see "the light at the end of the tunnel," namely, a chance to get their hands on the tool.
Last, but not least, the book also provides a concise source of reference after its first reading.
In the introduction to my review I asked: "What more could you want?" In fact, I don't think that you could want more! This book is a complete package in every respect, including the free-of-charge NodeXL tool. But maybe you would like to ask for something less or different – I will return to this aspect below. All in all, the book is well written and prepared, as you might expect when Ben Shneiderman is among the authors. However, and this may also be due to Shneiderman's authorship, it was written primarily for an academic audience. Its terse style will be familiar to students of universities and business schools and also to readers of the methodological chapters in Designing the User Interface. For others, however, this style may make reading laborious and demanding. You cannot simply scan the text, as I tend to do when I get exhausted (and, being a non-native speaker, I tend to get exhausted sooner...); you will miss too much content. Therefore, I have some reservations about whether the prospective reader groups of business leaders and practitioners will indeed be able to persevere in their reading, particularly in Part I. Thus, as this book is neither an Analyzing Social Media Networks for Dummies nor a Quick Guide to Analyzing Social Media Networks for Managers (or for practitioners), some prospective readers might ask for something different or less densely packed with information. People who are looking for these kinds of books may have to turn to other resources on this topic. However, they might also consider applying my reading suggestions above and find out whether my tips give them easier access to the fascinating topic of social network analysis.
With these reservations in mind, I definitely recommend the book to anyone who wants to get acquainted with social network analysis – not only by reading about it, but also, and this is mandatory for a deeper understanding of such a complex matter, by getting their hands on a tool that allows them to explore concepts and, particularly, their own data sets. NodeXL's ability to import network data enables people to analyze real data and to gain insights into behavioral patterns in social media networks, which have become such a key ingredient of our daily lives. As Ben Shneiderman wrote in a personal communication: "We don't expect everyone to become an SNA analyst, but we do hope to greatly expand or democratize the capacity to carry out this kind of work." In my view, this book is a definite and helpful step in this direction.
See the Appendix to the review