Two degrees of Tina Fetner, Part 1

In early May of 2012, kick-ass sociologist Tina Fetner had 422 followers on Twitter. But how many followers did her followers have? Who is popular among @fetner’s followers? In social network analysis terms, are there any structural holes in the network? More generally, what does @fetner’s ego-network look like?

As noted in an earlier post, Twitter is a useful resource for social scientists because it is widely used, the data is largely public, and researchers can get access to a lot of Twitter’s current data through the API. In addition to text analysis, it’s also a great resource for network analysis because of all the different social networks that are created on Twitter. Relationships can be based on following the same person as someone else; being followed by the same person as someone else; mentioning someone in a tweet; retweeting someone; or even using the same hashtag as someone else. To simplify things, I’m only going to discuss links created by following someone, but the same set of tools could be used to construct all sorts of networks, depending on what you think is theoretically relevant.

For data collection purposes, the easiest way to get a list of followers is by using the Twitter API “GET followers/ids” method. For example, if you type “https://api.twitter.com/1/followers/ids.json?cursor=-1&screen_name=fetner” into your address bar, you’ll see a list of numbers along with some other text. These numbers are the Twitter user id numbers for each of the followers of @fetner. This is the same set of people you would see if you clicked on followers on Tina’s Twitter page, but much less fancy looking. If you replace “fetner” with “justinbieber” you will get a longer list of numbers. In this case, it won’t be everyone, as Twitter limits you to seeing 5,000 users at a time. Justin Bieber has a lot more followers than that, and you can scroll through them by changing the “-1” value that follows “cursor=” in the query to the number that follows the words “next_cursor” at the top of the page.

You might have recognized that the file that Twitter returned is in the JSON format. One hint was that the URL has the word “json” in it. The more subtle hint was that it began with a “{” and had things that looked like variable names (e.g. “previous_cursor”) in quotation marks, followed by a colon, followed by a value (e.g. “0”). This means that we can easily digest the data in Python. To print a list of @fetner’s followers in Python, type:

>>> import urllib2
>>> import json
>>> url="https://api.twitter.com/1/followers/ids.json?screen_name=fetner"
>>> followers=urllib2.urlopen(url)
>>> followers=json.load(followers)
>>> print followers['ids']

The first two lines import the required modules: one for accessing the Internet, the other for translating JSON files. While we previously used urllib, this time we will use its replacement urllib2, which has better error handling, but lacks the ability to directly write files. The third line of the code sets up a string to hold the value of the URL we want to access. In the following line, the first reference to followers grabs that file from the Internet, the same way that your browser did earlier. The second reference to followers tells Python that this is the JSON format, and stores it with the same name. The ids of the followers are stored as a Python list under the ids key in followers. If you wanted to count how many followers there were, you could compute the length of the list:

>>> print len(followers['ids'])
422

You may get a different number, as @fetner accumulates followers.

How did I know that the location ['ids'] was going to be a list of the ids? One way would be to look closely at the API’s documentation, which shows the nicely formatted results of a sample call to the API.

Following JSON syntax, each of the quoted words followed by a colon, such as previous_cursor or ids, is a separate entry in the dictionary and can be accessed in the standard way: the name of the dictionary (followers in this case) followed by the entry name (or key) that you want to access, in quotes and inside brackets. For example, you could get the cursor for the next page of followers with print followers['next_cursor']. This returns a 0 when there is no next page of followers, but if you were looking at @justinbieber’s followers, it would return a value which you could then put in your search query to get the next page in an automated fashion.
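
Paging through a long follower list by following next_cursor could be sketched roughly as below. This is only an illustration under stated assumptions: fetch_page is a hypothetical stand-in that returns canned dictionaries shaped like the response described above, and every id and cursor value in it is made up, so the loop runs without touching the network. In real use, that function would put the cursor value into the query URL and load the result with urllib2 and json as shown earlier. The print call uses parentheses so the snippet runs under either Python 2 or Python 3.

```python
# Hypothetical canned responses keyed by cursor value.
# The ids and cursor numbers are made up for illustration.
SAMPLE_PAGES = {
    -1: {"ids": [101, 102, 103], "next_cursor": 555, "previous_cursor": 0},
    555: {"ids": [104, 105], "next_cursor": 0, "previous_cursor": -555},
}


def fetch_page(cursor):
    """Stand-in for one API call: return the parsed page for this cursor."""
    return SAMPLE_PAGES[cursor]


def all_follower_ids(first_cursor=-1):
    """Collect ids from every page, following next_cursor until it is 0."""
    ids = []
    cursor = first_cursor
    while cursor != 0:
        page = fetch_page(cursor)
        ids.extend(page["ids"])
        cursor = page["next_cursor"]
    return ids


print(all_follower_ids())
```

The loop stops when next_cursor comes back as 0, which is the signal described above that there is no further page.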

Unfortunately, this method only returns the follower’s Twitter user id number, and not their screen name. You can look that up using the API’s user lookup method. So here’s how to find out who’s the first name on Tina’s list of followers:

>>> url="https://api.twitter.com/1/users/lookup.json?user_id=17639217"
>>> user=urllib2.urlopen(url)
>>> user=json.load(user)
>>> print user[0]['screen_name']
JulieFader

As described in the documentation, this query can get most of the information Twitter has on somebody based on either their user_id or screen_name. As before, we get the file and then load it as a JSON object. Since this query can accept multiple lookups (that is, you could enter a comma-separated list of ids or screen names), it returns a list, where each item in the list is a dictionary about one user. Since we only have one person, we access the first item in the list by appending [0] to user. It helps to remember that in Python the first item in a list is located at position zero.
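
Looking up several users in one query, and walking the list that comes back, might look something like the sketch below. The ids and screen names are hypothetical, and sample_response is canned text that only imitates the shape of a lookup result (a JSON list with one object per user), so nothing here calls the API. The print calls use parentheses so the snippet runs under either Python 2 or Python 3.

```python
import json

# Hypothetical user ids, for illustration only.
user_ids = [111, 222, 333]

# Join the ids into the comma-separated value for the user_id parameter.
id_param = ",".join(str(uid) for uid in user_ids)
url = "https://api.twitter.com/1/users/lookup.json?user_id=" + id_param
print(url)

# Canned text imitating a lookup response; the names are made up.
sample_response = """[
  {"id": 111, "screen_name": "example_one"},
  {"id": 222, "screen_name": "example_two"},
  {"id": 333, "screen_name": "example_three"}
]"""

# The response is a list, so loop over it instead of using only [0].
users = json.loads(sample_response)
id_to_name = {}
for user in users:
    id_to_name[user["id"]] = user["screen_name"]

print(id_to_name[222])
```

Storing the results in a dictionary keyed by user id makes it easy to translate the numbers from a follower list back into screen names later.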

While I only wanted the screen_name, I might want to know what else was in the file. Instead of relying on the documentation to figure out what was returned, I looped through the list of available items:

>>> for item in user[0]:
...     print item
...
follow_request_sent
profile_use_background_image
id
verified
profile_image_url_https
profile_sidebar_fill_color
geo_enabled
profile_text_color
followers_count
protected
id_str
default_profile_image
location
status
utc_offset
statuses_count
description
friends_count
profile_link_color
profile_image_url
is_translator
show_all_inline_media
profile_background_image_url_https
profile_background_color
profile_background_image_url
screen_name
lang
profile_background_tile
favourites_count
name
notifications
url
created_at
contributors_enabled
time_zone
profile_sidebar_border_color
default_profile
following
listed_count

As you can see, Twitter provides a decent amount of information about each user. And if you wanted to inspect the contents of each, you could just replace screen_name with the entry you want to display or store:

>>> print user[0]['location']
Toronto and touring
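
To see every entry together with its value, rather than one key at a time, a loop along these lines could be used. The sample_user dictionary is a hypothetical record with made-up values and only a handful of the keys listed above. With real data you would pass user[0] instead. The print call uses parentheses so the snippet runs under either Python 2 or Python 3.

```python
# Hypothetical user record; the values are made up and only a few
# of the keys returned by the lookup are included.
sample_user = {
    "screen_name": "example_user",
    "location": "Somewhere",
    "followers_count": 10,
    "friends_count": 20,
}


def describe(user):
    """Return one 'key: value' line per entry, sorted by key name."""
    lines = []
    for key in sorted(user):
        lines.append("%s: %s" % (key, user[key]))
    return lines


for line in describe(sample_user):
    print(line)
```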

Unfortunately, this page doesn’t provide the list of followers, so we can’t get the screen name and followers in one place. We originally looked up @fetner’s followers using her screen name, but we can also do it with the user id by substituting user_id=4111281 for screen_name=fetner in the search parameter. For example, to get the list of the followers of @fetner’s follower @JulieFader:

>>> url="https://api.twitter.com/1/followers/ids.json?user_id=17639217"
>>> followers=urllib2.urlopen(url)
>>> followers=json.load(followers)
>>> print followers['ids']

This returns a list of @JulieFader’s 1,835 followers’ user ids.

Instead of looking up the followers of each of @fetner’s followers one at a time, we could create a loop that cycles through each of @fetner’s followers and stores the user id of the person we are looking up and the user id of each of the followers. The most convenient way to store these results would be in a list of tuples. Tuples, you may remember, are sort of like lists, but are commonly used for storing multiple attributes of a single object, like the latitude and longitude of a point, or, in this case, the two Twitter users that are connected. It is also the standard way edge data is stored for network analysis in the powerful and widely-used NetworkX Python package. For example, an edge list that shows the relationship between @fetner and her first three followers could be written as:

>>> fetner_ego=[(4111281,17639217),(4111281,356643200),(4111281,224103709)]

where each edge is enclosed in parentheses, the first number is the user id of the person being followed, and the second is the user id of the person following. We could create an edge list with all of @fetner's followers using a loop:

>>> fetner_ego=[]
>>> for follower in followers['ids']:
...     edge=(4111281,follower)
...     fetner_ego.append(edge)
...

Here, the first line creates our empty list and the second begins a loop over each of the followers. Note that if you’ve been copying and pasting the script up until now, this will actually contain the followers of @JulieFader, but that’s okay, we’ll fix it later. The third line creates a tuple called edge with @fetner’s user id as the first item and the follower’s id as the second. Finally, we add that tuple to the list using .append. Remember that since append is actually changing fetner_ego, it isn’t prefaced by fetner_ego=.
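
The same pattern can be nested to reach a second degree of followers. The sketch below is one hedged way to do it, not necessarily the approach the next post takes: get_follower_ids is a hypothetical stand-in for the followers/ids call, and SAMPLE_FOLLOWERS holds made-up ids so the example runs offline. The print call uses parentheses so the snippet runs under either Python 2 or Python 3.

```python
# Hypothetical follower lists keyed by user id; every id is made up.
SAMPLE_FOLLOWERS = {
    1: [2, 3],
    2: [3, 4],
    3: [],
}


def get_follower_ids(user_id):
    """Stand-in for a followers/ids call; unknown users get an empty list."""
    return SAMPLE_FOLLOWERS.get(user_id, [])


def two_degree_edges(ego_id):
    """Build (followed, follower) tuples for the ego and for each follower."""
    edges = []
    for follower in get_follower_ids(ego_id):
        edges.append((ego_id, follower))
        for second in get_follower_ids(follower):
            edges.append((follower, second))
    return edges


edges = two_degree_edges(1)
print(edges)
```

Each tuple keeps the order used above, with the person being followed first and the person following second.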

We can now put this all together to find out how many people are one and two degrees away from Tina Fetner on Twitter. Because WordPress is cutting me off–too many calls to the plugin that displays the Python code–the exciting conclusion is in the next post.

About Neal Caren

Sociology