Scraping Comments from the New York Times

This isn’t a tutorial so much as a pointer to a Python program I wrote that scrapes comments from the New York Times. It doesn’t use the New York Times Community API and doesn’t require a Times developer account. The official API offers some additional ways to get data, such as by user, and is worth learning about if you’re interested. My program grabs the same JSON data, so switching to the official feed should be fairly painless.

Given the URL for a Times article with comments, the program will download all the public comments and return them as a list. Each item in the list is a dictionary, so you can easily access the specific fields that you want. Check out the official API documentation for a guide to the fields. Neither this module nor the official API requires you to be a paid Times subscriber.
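
As a rough sketch of what working with that list looks like, the snippet below keeps only a few fields from each comment dictionary. Only commentBody is confirmed by the sample output on this page; the userDisplayName key, the summarize_comments helper, and the stand-in data are assumptions made for illustration, so check the official field guide for the real key names. It is written for Python 3.

```python
def summarize_comments(comments, fields=("userDisplayName", "commentBody")):
    """Return one dict per comment containing only the requested fields.

    Keys missing from a comment are filled with None instead of raising KeyError.
    """
    return [{field: comment.get(field) for field in fields} for comment in comments]


# Made-up stand-in for the list the scraper returns; real comments carry more fields.
sample_comments = [
    {"userDisplayName": "Reader A", "commentBody": "Fascinating article."},
    {"commentBody": "Someone needs to check the first footnote."},
]

for row in summarize_comments(sample_comments):
    print(row["userDisplayName"], "-", row["commentBody"][:40])
```

Using .get() rather than square brackets keeps the loop from failing if a particular comment lacks one of the fields.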

Sample usage:

>>> article_url = 'http://opinionator.blogs.nytimes.com/2012/04/17/whos-afraid-of-greater-luxembourg/'
>>> comments = nytimes_comments(article_url)
Found 12 comments
>>> for comment in comments:
...     print(comment['commentBody'])

A much enjoyed quirky article that also articulates many of the ongoing issues that the, fairly recent, nation state fixation has imposed. Up until a couple of hundred years ago territory usually followed a title and not until the 18th century did borders get to have a life of their own. Most, especially in Africa as we now see, have very little to do with either geographical or ethnic linkages.<br />
In some aspects Greater Europe may be evolving just as smaller units (Scotland, Luxembourg etc) try to assert more independence.
Fascinating article.  I remember flying Icelandic Airlines into Luxemburg in the '70's.  I took the first train out of there to Germany, but before I did, I wandered a bit around the city and saw the fantastic natural fortress built into the bluffs on the river that winds through the town.  Truly a sight.  
"...Malta, which is only 121 square miles in size, or about two-thirds the size of the District of Columbia."  DC is 68 square miles--which is 177 square *km* (121/177 is about 2/3).
These articles filled with all sorts of interesting geographical and historical facts are fun. Someone needs to check the first footnote, however. The District of Columbia was originally a square 10 miles on a side or 100 square miles, that was reduced to its present size of about 68 square miles when Virginia took back its chunk. So Malta is actually considerably larger than DC, rather than the other way around.
"Presiding over a Golden Age for Bohemia, Charles is considered father of the nation in the Czech Republic. He founded the university in Prague that is still named after him" - and the site of the international movie festival Karlovy Vary as well, perhaps?
A very funny presenter on BBC radio tells the story of taking a plane eastward through Europe and the pilot announcing, when they flew over Luxembourg, that they would "pass the duchy on the left hand side."
You might also add that L'burg has also produced a wildly disproportionate number of champion cyclists, including the current Frank and Andy Schlech.
Location, location, location.  Luxemburg City is at the heart of  western Europe.   There was no mention of the fortress that is the city of Luxemburg..,  it's called the Gibralter of the north for good reason.   Solid, unassailable, and continually attracting lots of very smart and savvy people, Luxemburg is well placed to become the nexus of Europe.
As a history student I enjoyed this article very much.  My famiily was European born, my father in Belgium, my mother in Holland, her mother in Germany.  My brother and I were born in Belgium.  We came to the United States in 1941.<br />
     G-d bless Luxemburg, a peaceful and beautiful place!
How did Luxembourg get to be the Delaware of Europe?  Why do major corporations register in Luxembourg when their operations, resources and sales are located elsewhere around the globe?

Since the data also includes the number of people who recommended each comment, I imagine that it could be quite a useful tool for analyzing what well-educated Internet users think makes for good debate.
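
A minimal sketch of that kind of analysis is to rank the comments by how many readers recommended them. The key name recommendationCount is an assumption that this post does not confirm, and the top_recommended helper and sample data are likewise made up for illustration; substitute whatever key the JSON actually uses.

```python
def top_recommended(comments, n=3, field="recommendationCount"):
    """Return the n comments with the most recommendations, highest first.

    `field` is an assumed key name; comments lacking it are counted as zero.
    """
    ranked = sorted(comments, key=lambda c: int(c.get(field) or 0), reverse=True)
    return ranked[:n]


# Made-up stand-in data, not real Times comments.
sample_comments = [
    {"commentBody": "First comment", "recommendationCount": 4},
    {"commentBody": "Second comment", "recommendationCount": 17},
    {"commentBody": "Third comment"},
]

for comment in top_recommended(sample_comments, n=2):
    print(comment.get("recommendationCount", 0), comment["commentBody"])
```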

About Neal Caren

Sociology