A Sociology Citation Network

[Here’s an updated version of the graph.]

Kieran Healy recently posted on orgtheory about Dan Wang’s network analysis of Economic Sociology. It was fascinating so I decided to put together something similar for sociology more generally. While Dan constructed edges based on two articles being mentioned in the same week of a syllabus, I went with a relationship based on two works being cited in the same journal article. This maps something different, but I don’t have easy access to a large number of syllabi, and I have already played around with the Web of Knowledge data. After I produced my network, I discovered that my technique was quite similar to Jim Moody’s 2009 graph of the field.

I downloaded from Web of Science the bibliographies of all the articles published since 2008 (the last five years) in the American Journal of Sociology, American Sociological Review, Social Forces and Social Problems. Even though the data is recent, the nodes in the network can be from just about any year, since the nodes are the works being cited. Each article in the dataset produces dozens of network edges, because I consider each cited work to be tied to every other work cited in the same article. I didn’t do much to clean the data, so variation in the spelling of an author’s name (e.g. “Caren N” and “Caren NP”) may lead some works to be undercounted. I also identified works based only on the first author, which is all Web of Science provides, and the publication year, so some influential nodes in the graph may actually be two different works. Looking over the data, I don’t think either of these issues has any major impact. For example, the major misspelling I saw is used consistently across articles.
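As a rough illustration of the every-pair rule described above, here is a minimal Python sketch. It is not the script used for this post: the function name `cocitation_counts` and the toy labels are made up, and it assumes each bibliography has already been reduced to "first author + year" strings.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(bibliographies):
    """Count citations per work and co-citations per pair of works.

    `bibliographies` is a list of reference lists, one list per citing
    article. Each reference is a label such as "Author A 2001".
    """
    node_counts = Counter()  # how many articles cite each work
    edge_counts = Counter()  # how many articles cite both works of a pair
    for refs in bibliographies:
        # Deduplicate so a work counts once per citing article, and sort
        # so that each pair is always stored in the same order.
        unique_refs = sorted(set(refs))
        node_counts.update(unique_refs)
        edge_counts.update(combinations(unique_refs, 2))
    return node_counts, edge_counts


# Toy data: two citing articles with hypothetical reference labels.
example = [
    ["Author A 2001", "Author B 2002", "Author C 2003"],
    ["Author B 2002", "Author A 2001"],
]
nodes, edges = cocitation_counts(example)
```

An article citing n distinct works contributes n(n-1)/2 pairs, which is why the edge count grows so much faster than the node count.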

In order to limit the size of the graph, I included only works that had been cited eight or more times, and only included edges that appeared in four or more different articles. The final network had 397 nodes and 1,597 edges. I also employed a community-detection algorithm, which divides the nodes based on who is closely connected to whom, to add some color. The size of each circle is based on the total number of citations to the work.
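The two cutoffs can be sketched as a simple filter over the citation and co-citation counts. This is only an illustration under assumptions: `filter_network` and its parameter names are hypothetical, and dropping edges whose endpoints fall below the citation cutoff is my guess at a reasonable rule, not something the post specifies.

```python
from collections import Counter


def filter_network(node_counts, edge_counts, min_citations=8, min_cocitations=4):
    """Keep frequently cited works and frequently co-cited pairs.

    `node_counts` maps a work to the number of articles citing it.
    `edge_counts` maps a (work, work) pair to the number of articles
    citing both works.
    """
    kept_nodes = {
        work for work, n in node_counts.items() if n >= min_citations
    }
    kept_edges = {
        pair: n
        for pair, n in edge_counts.items()
        # Assumption: an edge survives only if both of its endpoints do.
        if n >= min_cocitations and pair[0] in kept_nodes and pair[1] in kept_nodes
    }
    return kept_nodes, kept_edges


# Toy counts with made-up labels.
toy_nodes = Counter({"A": 10, "B": 9, "C": 3})
toy_edges = Counter({("A", "B"): 5, ("A", "C"): 4, ("B", "C"): 1})
kept_nodes, kept_edges = filter_network(toy_nodes, toy_edges)
```

In the toy data, the pair ("A", "C") meets the co-citation cutoff but is still dropped because "C" is cited too rarely to be a node.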

The static version on this page isn’t that useful. If you click on the picture, you’ll get the version that allows you to hover over a particular point and find out the author and year for each work. You can also drag nodes around, which is fun to watch. Unfortunately, I haven’t yet figured out how to have a WordPress page that has its own JavaScript and CSS files, so I can’t embed the fancy version here. Also, you should use Chrome or Safari to look at the network graph. It doesn’t display well in other browsers. The layout is based on a force-directed algorithm implemented in d3.js. Nodes on the periphery have a tendency to drift away, so feel free to drag them back.

I think that, overall, the graph does a pretty good job of representing contemporary trends in sociology. Some of the areas involve large quantities of works that are frequently mentioned together, while other areas are more sparsely connected. For example, the large, dense, blue cluster is largely structural social movement works, while the pink cluster above includes cultural social movements research. The light blue cluster near them but closer to the middle is the more “pure” cultural sociology, with Swidler, Sewell and Bourdieu at the core. The light orange or salmon cluster is urban/race/segregation, and the big circle from this cluster nearest the middle is Raudenbush & Bryk’s 2002 HLM book. The light green cluster has some occupational classics, but also has Putnam’s Bowling Alone connecting it to a variety of literatures. The large orange cluster really wants to be two clusters, but Kalev, Dobbin and Kelly’s 2006 ASR piece on corporate affirmative action programs binds the Reskin wing with the DiMaggio and Powell wing.

Before you get too attached to any interpretation of the data, be sure to move the nodes around and see how they reset themselves. Some clusters are near each other randomly, while others are actually tied together. Pushing the graph around is sometimes the only way to tell the two apart, which is why I like dynamic graphs, even for my own exploratory purposes.

This is no doubt a biased look at the field. It excludes citations from books, although books that are cited in journal articles are included. Areas like demography, which have their own top journals, likely see fewer works in the top general interest journals, so these fields may be underrepresented. By focusing on just the top journals, this version also excludes the majority of published work by sociologists. That said, the biased view represented in this network probably mimics the bias of the elites, so it’s not a bad view to understand.

Feel free to remix this analysis. Here is the raw data. The data is in the format that Web of Science provides for you as a “plain text file,” which isn’t super useful. My Python script that loads and analyzes the data isn’t well commented, but feel free to email me if you have any questions. The heavy lifting of the graph is accomplished through d3.js, which I don’t know how to use particularly well. Luckily, Drew Conway’s fork of NetworkX makes d3 easier to use with Python and NetworkX. I also use Thomas Aynaud’s Python implementation of the Louvain community detection method.
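If you want to start from the plain text file, something like the following sketch can pull out one reference list per article. It is not the script mentioned above. It assumes the usual layout of Web of Science plain-text exports: a two-letter field tag at the start of each line, `CR` for cited references, continuation lines indented with spaces, and `ER` closing each record. The names `parse_cited_references` and `work_id` and the sample text are hypothetical.

```python
def work_id(reference):
    """Reduce a cited reference to "first author + year" (assumed layout:
    author, year, source, ... separated by commas)."""
    parts = [p.strip() for p in reference.split(",")]
    return f"{parts[0]} {parts[1]}" if len(parts) >= 2 else reference.strip()


def parse_cited_references(text):
    """Return one list of cited-work labels per record in the export."""
    records, current, tag = [], [], None
    for line in text.splitlines():
        if not line.strip():
            continue
        field, value = line[:2], line[3:].strip()
        if field.strip():          # a new field tag starts on this line
            tag = field
            if tag == "ER":        # end of record: store it and reset
                records.append(current)
                current = []
        # Otherwise this is a continuation line that keeps the previous tag.
        if tag == "CR" and value:
            current.append(work_id(value))
    return records


# Made-up sample in the assumed format.
sample = """FN Example export
VR 1.0
PT J
AU Doe, J
CR Author A, 2001, JOURNAL X, V1, P1
   Author B, 2002, SOME BOOK TITLE
ER

PT J
AU Roe, R
CR Author B, 2002, SOME BOOK TITLE
ER

EF"""
bibliographies = parse_cited_references(sample)
```

Real exports contain entries without an author or a year, so the labels produced this way would need a closer look before counting anything.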

About Neal Caren
