Social media data for social and behavioural research

June 18, 2021

Guangqing Chi and Junjun Yin from The Pennsylvania State University discuss how social media data has become a gold mine of information for both academic and non-academic use

Social media has become an essential component in the daily lives of many people. Its prevalence reached new heights after the COVID-19 pandemic hit.

Social distancing and reduced mobility led people to move many activities online; social media has played a prominent role in facilitating those transitions. Specifically, social media platforms enable people to build social networks, communicate and disseminate information, and interact on various political, cultural, and personal topics, using comments, likes/dislikes, sharing, etc.

Social media platforms now connect billions of users around the world. For example, so far in 2021 the numbers of monthly active users on YouTube, Facebook, and Twitter are 2.3 billion, 2.85 billion, and 353 million, respectively.

Not surprisingly, social media data that captures those conversations, user interactions, and social ties has become a gold mine of information for both academic and non-academic use.

Advantages of social media over traditional survey methods

Social scientists rely on survey and census data to study social problems and phenomena. However, the fast-changing world makes it challenging to do so effectively and efficiently due to two limitations of traditional survey data, which can be overcome with social media data.

First, survey and census data collection is slow, labour intensive, and expensive. Social media data, unlike “designed data”, is collected directly from large groups of people with no predefined criteria, and thus is deemed “organic data” and often referred to as “Big Data”. Big Data offers a timely and cost-effective approach for collecting massive amounts of information within a short time, and it provides an unprecedented level of detail about various social and behavioural dynamics.

Second, many population characteristics and behaviours cannot be measured well by traditional survey methods. For example, in the minutes after a disaster, crime, or terrorist event, what percentage of the population is fearful? Or what is the level of attentiveness to sudden public health announcements? In those cases, and many like them, it is difficult or costly to conduct a robust survey in real time, and it is likely that respondents will not be able to reconstruct how they felt or behaved at the time of the event, even if interviewed just a few days later.

Social media, on the other hand, offers insights into time-sensitive events as they unfold, providing information on public opinions or observations about social movements (such as Black Lives Matter and the Me Too movement), political events (a presidential election, for example), and disastrous weather events. More importantly, social media data can help reveal public sentiments and attitudes, such as why some people are sceptical of the COVID-19 vaccine, an issue that is critical for public health and has a significant economic and financial impact on society.

Using social media data features such as geo-location tags and timestamps, researchers can break down event patterns based on how they evolve across time and space. In one study, geo-located Twitter data were used to evaluate the effectiveness of social distancing and its impact on the spread of COVID-19 in the United States. As another example, the data can identify temporary migration patterns after hurricanes.
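As a rough illustration of breaking posts down across time and space, the sketch below groups geo-tagged posts by calendar day and by a coarse latitude/longitude grid cell. It is not the method used in the studies mentioned above; the sample records, the one-degree cell size, and the function names `space_time_bin` and `count_by_bin` are all assumptions made for this example.

```python
from collections import Counter
from datetime import datetime
import math

# Hypothetical geo-tagged posts: (ISO timestamp, latitude, longitude).
# These records are invented for illustration; they are not real data.
posts = [
    ("2020-03-14T09:15:00", 40.79, -77.86),
    ("2020-03-14T18:40:00", 40.12, -77.03),
    ("2020-03-15T11:05:00", 40.44, -79.99),
    ("2020-03-15T12:30:00", 39.95, -75.16),
    ("2020-03-15T20:10:00", 39.99, -75.20),
]


def space_time_bin(timestamp, lat, lon, cell_deg=1.0):
    """Map one post to a (day, grid cell) key.

    The grid cell is the pair of floored lat/lon indices for a square
    cell of `cell_deg` degrees (an assumed, deliberately coarse size).
    """
    day = datetime.fromisoformat(timestamp).date().isoformat()
    cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
    return day, cell


def count_by_bin(records, cell_deg=1.0):
    """Count posts falling into each (day, grid cell) bin."""
    return Counter(
        space_time_bin(ts, lat, lon, cell_deg) for ts, lat, lon in records
    )


counts = count_by_bin(posts)
for (day, cell), n in sorted(counts.items()):
    print(day, cell, n)
```

Comparing the counts for the same cell across days gives a simple view of how activity shifts over time, while comparing cells on the same day shows where it is concentrated. A real analysis would also need to account for the representativeness concerns discussed in the next section.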

Limitations and challenges of using social media data

Nevertheless, there is a major concern about the representativeness of the social media user population. It is believed that the users of most social media platforms skew towards the younger population. Unequal access to the Internet also contributes to disparities in the user population. This bias limits the ability to generalize research findings.

Accordingly, the use of Twitter data has been resisted in some social science disciplines such as sociology and demography. Also, social media data is collected from people when they are behind a screen (computer or mobile device), but how that electronic medium distorts people’s interactions on social media platforms remains unknown. Other concerns about using social media data in research include dealing with slang, sarcasm, and unconventional forms of written expression including hashtags, emoticons, and acronyms, as well as the costs of obtaining, storing, and cleaning the massive amount of data.

Another limitation of using social media data for social and behavioural research is the widespread misinformation (“fake news”) on various social media platforms. Misinformation polarizes social media and biases its content, user interactions, and social network connections, which further affects the overall health of the social media ecosystem. Because the perception of bias draws public attention and scrutiny, social media providers are rushing to implement policies to moderate user content. The question is to what extent these policies are necessary.

Prospects for using social media data

As social media gains more influence in and on society, we foresee both advantages and disadvantages to the use of its data in the social and behavioural sciences. As the public becomes more sensitive to privacy concerns, stricter regulations could be put in place to reduce data accessibility, which would hinder its use for research purposes. For example, Facebook tightened access to its data after the Facebook–Cambridge Analytica scandal in 2018. And Twitter recently stopped displaying precise geo-location with tweets.

Nevertheless, there is tremendous benefit in using social media data for social and behavioural research. With faster Internet connections from 5G networks and the potential for worldwide satellite broadband, it is expected that more people from less developed countries/regions will join the social media community. This growth can help alleviate concerns about the lack of user representativeness in social media data. Also, artificial intelligence might be able to effectively detect misinformation and help social media platforms become a better forum for users and researchers alike. Another benefit is that in the age of the Internet of Things, abundant information from personal and environmental sensors could be linked to social media platforms, providing a wealth of contextual information about a user’s activities, but at the risk of damaging user privacy.

Social media presents abundant opportunities for advancing social and behavioural research and can create positive and significant societal impacts, if, and only if, it is used rigorously and ethically.

This study is supported in part by the U.S. National Science Foundation (Award # SES-1823633).

Please note: This is a commercial profile