Researchers need to think twice when they use data from Facebook, Twitter and other social media sites to draw conclusions about human behavior, two computer scientists warn. The rise of social media has given researchers a quick and cheap way of mining information about what people are thinking and doing. But these massive data sets can be inherently biased, because the people found on such sites aren't a representative sample of the general public, according to Derek Ruths of McGill University and Jürgen Pfeffer of Carnegie Mellon University. In an article published in the Nov. 28 issue of the journal Science, the two urge behavioral scientists to find ways of correcting for the biases in their samples.
"Not everything that can be labeled as 'Big Data' is automatically great," Pfeffer said. As an example, Ruths and Pfeffer note that Instagram is favored by people between the ages of 18 and 29, African-Americans, Latinos, women and city dwellers, while Pinterest is dominated by women ages 25 to 34 with average household incomes of $100,000. Yet researchers rarely correct for these inherent sampling biases, Ruths and Pfeffer said.