Can anyone find any data on how they actually created this map? It literally makes zero sense... there are random dots on extremely small towns based on a single word.
It's a common problem with small sample sizes and normalized data. If Twitter is not popular in a county, and there just happen to be a couple of homophobic people in that county, then they're going to skew the data pretty hard. Still, it's better than not normalizing the data at all (then you'd just see the strong correlation with population).
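To make the small-sample point concrete, here is a minimal Python sketch. The county names and counts are made up for illustration and are not taken from the map's data.

```python
# Hypothetical counts, invented purely to illustrate the effect.
counties = {
    "big_city": {"hateful": 200, "total": 1_000_000},
    "small_town": {"hateful": 2, "total": 500},
}


def rate_per_10k(hateful, total):
    """Hateful tweets per 10,000 tweets sent from a county."""
    return 10_000 * hateful / total


rates = {
    name: rate_per_10k(c["hateful"], c["total"])
    for name, c in counties.items()
}

# Rank the same counties two ways: by raw count and by normalized rate.
by_count = max(counties, key=lambda name: counties[name]["hateful"])
by_rate = max(rates, key=rates.get)

print(rates)
print("highest raw count:", by_count)
print("highest rate:", by_rate)
```

With these numbers the big city has 100 times as many hateful tweets, yet the small town ends up with the higher rate because two tweets out of 500 is a large share of a tiny total.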
I fully understand that, I'm just trying to figure out how they let their bias affect this.
Quote:
"Here on the data team, we tend to be skeptical about the accuracy of semantic analysis. But the students and professors at Humboldt State University who produced this map read the entirety of the 150,000 geo-coded tweets they analysed.
Using humans rather than machines means that this research was able to avoid the basic pitfall of most semantic analysis where a tweet stating 'the word homo is unacceptable' would still be classed as hate speech. The data has also been 'normalised', meaning that the scale accounts for the total twitter traffic in each county so that the final result is something that shows the frequency of hateful words on Twitter. The only question that remains is whether the views of US Twitter users can be a reliable indication of the views of US citizens. Tell us what you think by posting a comment below."
Quote:
The Geography of Hate is part of a larger project by Dr. Monica Stephens of Humboldt State University (HSU) identifying the geographic origins of online hate speech. Undergraduate students Amelia Egle, Matthew Eiben and Miles Ross, worked to produce the data and this map as part of Dr. Stephens' Advanced Cartography course at Humboldt State University.
The data behind this map is based on every geocoded tweet in the United States from June 2012 - April 2013 containing one of the 'hate words'. This equated to over 150,000 tweets and was drawn from the DOLLY project based at the University of Kentucky. Because algorithmic sentiment analysis would automatically classify any tweet containing 'hate words' as "negative," this project relied upon the HSU students to read the entirety of tweet and classify it as positive, neutral or negative based on a predefined rubric. Only those tweets that were identified by human readers as negative were used in this analysis. To produce the map all tweets containing each 'hate word' were aggregated to the county level and normalized by the total twitter traffic in each county. Counties were reduced to their centroids and assigned a weight derived from this normalization process. This was used to generate a heat map that demonstrates the variability in the frequency of hateful tweets relative to all tweets over space. Where there is a larger proportion of negative tweets referencing a particular 'hate word' the region appears red on the map, where the proportion is moderate, the word was used less (although still more than the national average) and appears a pale blue on the map. Areas without shading indicate places that have a lower proportion of negative tweets relative to the national average.
The numbers that appear in the map during a mouse hover indicate the total number of hateful tweets and number of unique users sending them in each county.
Read more about the research and methods behind this project at www.FloatingSheep.org.
Funding was provided by the University Research and Creative Activities Fellowship at Humboldt State University. Twitter data was obtained from the DOLLY project at University of Kentucky.
This map was built on the Google Maps API.
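The aggregation and normalization steps described in the quote above can be sketched roughly as follows. This is not the HSU team's code. The function name `county_weights`, the input shapes, and the choice to express each weight as the county's proportion divided by the national proportion are all assumptions; the quote only says the weight was derived from the normalization process.

```python
from collections import Counter


def county_weights(negative_tweet_counties, total_by_county):
    """Sketch of a per-county weight (assumed form, not the published method).

    negative_tweet_counties: iterable of county ids, one entry per tweet
        that a human reader rated as negative.
    total_by_county: dict mapping county id -> total number of tweets.
    Returns a dict mapping county id -> county proportion / national proportion.
    """
    negative_by_county = Counter(negative_tweet_counties)
    total_negative = sum(negative_by_county.values())
    total_tweets = sum(total_by_county.values())
    if total_negative == 0 or total_tweets == 0:
        # Nothing to compare against, so every county gets weight 0.
        return {county: 0.0 for county in total_by_county}

    national = total_negative / total_tweets
    weights = {}
    for county, total in total_by_county.items():
        if total == 0:
            continue  # no Twitter traffic, so no proportion to compute
        proportion = negative_by_county[county] / total
        weights[county] = proportion / national
    return weights


# Hypothetical example data: county "C" has few tweets overall.
example_negative = ["A", "A", "B", "C", "C", "C"]
example_totals = {"A": 1000, "B": 1000, "C": 100, "D": 900}
example_weights = county_weights(example_negative, example_totals)
print(example_weights)
```

Under this assumed definition, a weight above 1 means the county sits above the national average, which would correspond to a shaded area on the map, and a weight below 1 to an unshaded one. Note how county "C" gets by far the largest weight from just three tweets.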
I leave the misconstruing to fit political biases to the masters of the practice - liberals.
You do realise you're perpetuating the cycle by being exactly the same as what you fight against?
Detailed information about all U.S. cities, counties, and zip codes on our site: City-data.com.