Fun Demographic Data from Facebook

At one point last month, while preparing for a talk to incoming students, I was wondering whether the proportion of out LGBT people jumped at around the age you head off to college or otherwise leave home. One way to be out is to express your interest in people of the same sex on Facebook. Now, this might have different meaning across place, age, and gender, but the overall trends probably represent something meaningful. So, using the Facebook ad generator described below, I pulled the number of men who were interested in men, men who were interested in women, women who were interested in men, and women who were interested in women for each age between 15 and 24. To compute the proportion of men who liked men for each age, I divided the number of men who expressed an interest in men by the number of men who expressed an interest in men plus the number who expressed an interest in women, and I did the same for women. Now, this is not the best possible denominator, as some people don’t list interest in either, and some people express interest in both (a figure you can only get from Facebook), but I think that, of the available options, my measure roughly captures the proportion of people who express an interest in those of the same sex versus those who express an interest in either sex.
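As a minimal sketch of that calculation, here is the ratio written out in Python. The function name and the counts are made up for illustration; they are not real Facebook figures.

```python
def same_sex_share(same_sex_count, other_sex_count):
    """Share expressing same-sex interest among those listing exactly one interest."""
    total = same_sex_count + other_sex_count
    if total == 0:
        raise ValueError("both counts are zero, so the share is undefined")
    return same_sex_count / total


# Hypothetical audience counts for men of a single age (illustrative only).
men_interested_in_men = 20_000
men_interested_in_women = 980_000

share = same_sex_share(men_interested_in_men, men_interested_in_women)
print(f"{share:.1%}")  # 2.0%
```

People who list no interest, or who list both, never enter either count, which is why the denominator is imperfect.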

The data did not support my hypothesis:

Among both men and women, the proportion expressing same-sex interest doesn’t shift dramatically around age 18. Instead, there is a roughly linear increase during this period of life. In general, many more women than men express a same-sex interest on Facebook. In fact, the disparity is so large that the steady increase in the proportion of out men on Facebook gets smushed so that it appears almost constant in this chart. Both the baseline and the increase associated with age are much smaller among men, from 1.6% at 15 to 2.1% at 24, but the men’s trend follows the same general pattern as the women’s: steady increases but no big jump. I’ll leave it to others who study the areas of sexuality and social networks to explain the trends, but I was certainly struck by how wrong my hypothesis was.

Now I don’t know who all these gay and lesbian teens are, and Facebook won’t provide you with their names, but Facebook does provide enough other general demographic information about its users that you can tell some interesting sociological stories. For example, based on Facebook’s data, 1.1% of boys living in North Carolina have a preference for men, while the rate in California is slightly more than twice that at 2.4%. While more adult men in each state express an interest in men (1.7% in North Carolina compared to 3.6% in California), the ratio between the two states stays roughly equal. This might suggest that variation in the number of out gay people between states is not driven primarily by selection into geographic areas, as teens have limited mobility.
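To check the "roughly equal" claim, the state ratios can be computed directly from the four percentages quoted above. This is only a quick arithmetic sketch; the variable and function names are mine.

```python
# Percentages quoted in the paragraph above.
teen_rates = {"North Carolina": 1.1, "California": 2.4}
adult_rates = {"North Carolina": 1.7, "California": 3.6}


def ca_to_nc_ratio(rates):
    """How many times higher the California rate is than the North Carolina rate."""
    return rates["California"] / rates["North Carolina"]


teen_ratio = ca_to_nc_ratio(teen_rates)
adult_ratio = ca_to_nc_ratio(adult_rates)
print(round(teen_ratio, 2), round(adult_ratio, 2))  # 2.18 2.12
```

Both ratios come out a little above 2, so the gap between the states is about the same for teens as for adults.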

Facebook’s demographic data covers much more than lesbian and gay demographics. For example, 22% of people expressing an interest in the Tea Party also liked Farmville, compared to only 8% of Occupiers. And while Occupiers and Tea Party folk equally express an interest in cooking, this masks a very large gender divide, as Tea Party women are ten points more likely to have an interest in cooking than female Occupiers, while Occupy men favor cooking by the same amount over Tea Party men. More generally, you can acquire a count of people who express an interest on Facebook in just about anything. You can break these counts down by geography, education level, age, other interests, and sexual orientation. It’s a pretty powerful tool for analyzing data that are rarely on surveys, and it’s free. I would say that the data quality from this method is comparable to things like Google searches or tweets, both of which have been used in numerous publications.

Why does Facebook give this data to researchers? Well, it doesn’t. It gives it to people who want to advertise on Facebook. To facilitate advertising, they provide potential ad buyers with the ability to target small demographic groups. So that you know the reach of your advertisement, they give you a count of the people who fit your demographic specifications. This way, I know that if I limit my ad to bowling enthusiasts in the Chapel Hill area, I’ll only reach 120 people. It’s not the exact number, because they round to the nearest 20. Rounding off started a couple of years ago to limit privacy invasions. Under the old system, you could be incredibly specific in your search terms (e.g. 39 year old, male, sociologist, works at UNC) and find out which of your co-workers was secretly a fan of Pretty Little Liars. You don’t have to be an advertising executive to use the tool. I first read about this Facebook tool on Gawker or Boing Boing, where they described someone doing this sort of stalking.
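The rounding matters most for small groups. Here is a rough sketch of the range of true counts consistent with a displayed figure, assuming plain round-to-nearest in steps of 20. How Facebook handles ties or very small audiences is not something I know, so treat the endpoints as approximate; the function name is mine.

```python
def plausible_range(displayed_count, step=20):
    """Range of true counts that could round to the displayed count.

    Assumes simple round-to-nearest with the given step (an assumption,
    not documented Facebook behavior).
    """
    half = step // 2
    low = max(displayed_count - half, 0)
    high = displayed_count + half
    return low, high


low, high = plausible_range(120)
print(low, high)  # 110 130
```

For the 120 bowling enthusiasts that is an error of up to about 8% either way, while for audiences in the millions it is negligible.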

To get the data, all you need is a Facebook account. Once that’s taken care of, go to Facebook’s advertising page and click the green “Create an Ad” button. Enter a URL in the “Choose a Facebook destination or enter a URL:” line. You don’t have to own the URL; any web address will do. Then click “Suggest an ad”. It will find a picture and some text from the URL and display them to you. Feel free to ignore this.
A section called “Choose Your Audience” should now be visible. On the right side, it will display the total number of people your ad could potentially reach. The default is people 13-65 who live in the United States, which totals around 160 million people, approximately 80% of the US population in this age range.

By manipulating the different demographic criteria, you can get the quantities of interest. For example, if you type, “Classical Music” in the “Precise Interests” field, the audience figure will shift to about 1.8 million people. Note that matches with “#” in front of them are fuzzy, which is usually what you want.

You could limit yourself to a particular state, get the audience number and then repeat this process 49 more times to get a sense of the geography of classical music fans. Or, if you enter more than one search term in the “Precise Interests” field, you get an “OR” search. So adding “#Heavy Metal music” to the previous search brings the total to 4.3 million. If you remove the “#Classical Music” search term, the audience shrinks to 2.7 million fans of “#Heavy Metal music”. If those groups of music lovers did not overlap, you would expect the combined total to be 4.5 million people, so there are about 200,000 people who like both Classical and Heavy Metal music. To put things in perspective, there are 8.6 million Justin Bieber fans on Facebook, 140,000 of whom also like classical music. The tags that Facebook suggests after you enter one search term are often quite useful. For one search, I started with “#Tea Party movement”, but then added “#Tea Party Patriots”, “#The Tea Party” and “#FreedomWorks” at Facebook’s suggestion to get a more complete picture of the Tea Party supporters on Facebook. In addition to the “Precise Interests”, make sure to check through the set of predefined advertising “Broad Categories” (where you can find things like having a child aged 0-3) to find possible categories of interest.
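The overlap arithmetic is just inclusion-exclusion: the number who like both equals the two separate counts added together minus the “OR” count. A small sketch using the figures from the paragraph above (the function name is mine):

```python
def overlap_from_or_search(count_a, count_b, count_a_or_b):
    """People in both groups: |A and B| = |A| + |B| - |A or B|."""
    return count_a + count_b - count_a_or_b


classical = 1_800_000    # "#Classical Music" alone
heavy_metal = 2_700_000  # "#Heavy Metal music" alone
either = 4_300_000       # both terms entered together (an OR search)

both = overlap_from_or_search(classical, heavy_metal, either)
print(both)  # 200000
```

Since every figure is itself rounded, small overlaps computed this way should be read as approximate.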

Unfortunately, there is no way to automate access to this tool. Instead, you just have to make your choices and copy and paste the audience figures. This can be somewhat tedious if you are looking at variation across states and ages. It can also be confusing to keep track of areas of overlapping interest when you are interested in multiple categories, such as 20 different musical genres, where there are dozens of search term combinations that you want to enter. But, if you think the data is scientifically valid (and if you think that reviewers will too), it’s a pretty quick way to get useful data.
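While the lookups themselves have to be done by hand, a short script can at least generate the checklist of searches so none get missed. This is only a sketch: it lists every single term and every pair, and the “#Jazz” tag is a made-up example rather than one I have verified on Facebook.

```python
from itertools import combinations


def pairwise_queries(terms):
    """Every single term plus every unordered pair, as tuples of search terms."""
    singles = [(term,) for term in terms]
    pairs = list(combinations(terms, 2))
    return singles + pairs


# "#Jazz" is a hypothetical tag used only for illustration.
genres = ["#Classical Music", "#Heavy Metal music", "#Jazz"]

for query in pairwise_queries(genres):
    # Terms entered together in "Precise Interests" act as an OR search.
    print(" OR ".join(query))
```

With three genres this prints six searches. With 20 genres it would be 20 single searches plus 190 pairs, so it is worth deciding in advance which overlaps you actually care about.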

