Twitter Public Health Sentiment

Basic Overview

Status: Completed

Timeline: 1 month (April 2022)

Technology: R

Note: Source code for the Twitter data mining is not provided, out of an abundance of caution for my own developer account, but I am happy to talk through my process. I used Twitter's API via the rtweet package; nothing particularly complicated or technically worth bragging about.

Download Source Code

Why Twitter for Public Health?

80%+ of Americans Get News from Digital Devices

A large majority of Americans get news at least sometimes from digital devices, according to a Pew Research Center survey conducted Aug. 31-Sept. 7, 2020.

Social media has a disproportionate effect on the spread of false information

The Center for Countering Digital Hate (CCDH), a nonprofit whose work focuses on misinformation and hate disseminated online, conducted a study to examine the origins of anti-vaccine sentiment that gained momentum on social networking platforms during the coronavirus pandemic. The results pinpointed a group of 12 individuals, collectively referred to as "the disinformation dozen" in the CCDH's report, who are at the forefront of false information campaigns targeting COVID-19 vaccines on Facebook, Instagram and Twitter.

Method

Identify a variety of news organizations representing various viewpoints in American politics
Find “high influence” followers of these news organizations on Twitter (the 50 followers of each org who themselves have the most followers)
Draw up to 1000 tweets from each of these followers' timelines
Isolate tweets with public health terms
Perform sentiment analysis on these public health related tweets
Test for differences in sentiment by the news org followed, using one-way ANOVA and Tukey grouping
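
As a rough illustration of the filtering and scoring steps above, here is a minimal sketch in base R. The example tweets, the health_terms list, and the tiny lexicon are all made up for illustration; they are not the term list or sentiment method used in this project, and score_tweet is a hypothetical helper.

```r
# Hypothetical input: a few made-up tweets tagged with the org their author follows.
tweets <- data.frame(
  org  = c("CDC", "CDC", "Fox News", "Reuters"),
  text = c("Great news: the vaccine rollout is a success",
           "Lovely weather today",
           "Mask mandate is a terrible failure",
           "Hospital capacity improves after outbreak"),
  stringsAsFactors = FALSE
)

# Placeholder term list and word-level lexicon (assumptions, not the project's).
health_terms <- c("vaccine", "mask", "hospital", "outbreak")
lexicon <- c(great = 1, success = 1, improves = 1, terrible = -1, failure = -1)

# Keep only tweets that mention at least one public health term (case-insensitive).
pattern <- paste0("\\b(", paste(health_terms, collapse = "|"), ")\\b")
keep <- grepl(pattern, tweets$text, ignore.case = TRUE, perl = TRUE)
health_tweets <- tweets[keep, ]

# Score a tweet as the sum of the lexicon values of its words;
# words missing from the lexicon contribute nothing.
score_tweet <- function(text) {
  cleaned <- tolower(gsub("[^A-Za-z ]", " ", text))
  words <- strsplit(cleaned, " +")[[1]]
  sum(lexicon[words], na.rm = TRUE)
}

health_tweets$sentiment <- vapply(health_tweets$text, score_tweet,
                                  numeric(1), USE.NAMES = FALSE)
print(health_tweets[, c("org", "sentiment")])
```

A word-count score like this produces whole numbers per tweet, and any tweet with no lexicon words scores exactly 0.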

Summary of Results

News Org	Number of Tweets	Average Sentiment	Median Sentiment
ABC News	1258	-0.11	0
BBC News (World)	1224	-0.02	0
Bloomberg	582	-0.17	0
CBS News	1953	-0.2	0
CDC	2660	-0.01	0
CNBC	1234	0.15	0
CNN	423	0.13	0
Forbes	1460	0.05	0
Fox News	1094	-0.22	0
NBC News	733	0.22	1
Reuters	1835	0.07	0
The Associated Press	2566	-0.12	0
The Economist	1315	0.11	0
The Guardian	2077	-0.06	0
The New York Times	1255	-0.03	0
The Wall Street Journal	923	0.16	0
The Washington Post	1075	-0.08	0
TIME	415	-0.05	0
World Health Organization (WHO)	1450	0.11	0
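
For reference, a per-org summary like the table above could be computed along these lines. This is only a sketch: the scored data frame and its values are invented, and the column names are my own placeholders.

```r
# Hypothetical scored tweets; the values are made up for illustration.
scored <- data.frame(
  org       = c("CDC", "CDC", "CDC", "Fox News", "Fox News", "Reuters"),
  sentiment = c(1, 0, -1, -2, 0, 1),
  stringsAsFactors = FALSE
)

# Build one summary row per org: tweet count, mean and median sentiment.
summary_tbl <- do.call(rbind, lapply(split(scored, scored$org), function(d) {
  data.frame(
    org              = d$org[1],
    n_tweets         = nrow(d),
    mean_sentiment   = round(mean(d$sentiment), 2),
    median_sentiment = median(d$sentiment),
    stringsAsFactors = FALSE
  )
}))
print(summary_tbl)
```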

Tukey Summary Groups

The above results may be hard to interpret and lack confidence intervals, so to determine which differences are significant I performed a one-way ANOVA across the news orgs' results, followed by Tukey's honest significant difference test at a 95% confidence level. Each news org was then assigned to one or more groups, each represented by a letter. If two news orgs share a letter, there is nothing in the data to suggest a significant difference between them. If two news orgs share no letters, there is evidence that the difference in sentiment is significant. For example, CBS News (a) and NBC News (d) share no letter, so their average sentiment differs significantly, while CNN (abcdef) shares a letter with every other org.

News Org	Groups
ABC News	abce
BBC News (World)	abcdef
Bloomberg	abce
CBS News	a
CDC	bcdef
CNBC	df
CNN	abcdef
Forbes	bdef
Fox News	ac
NBC News	d
Reuters	bdef
The Associated Press	ace
The Economist	bdf
The Guardian	abcef
The New York Times	abcdef
The Wall Street Journal	bdf
The Washington Post	abcdef
TIME	abcdef
World Health Organization (WHO)	bdf
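
The test behind these groups can be sketched with base R's aov and TukeyHSD. The data below are simulated for three placeholder orgs and are not the project's data. Turning the pairwise results into letter groups is usually done with an add-on package (multcompView and agricolae both offer this); which one was used here is not stated, so that step is left out.

```r
set.seed(42)

# Simulated sentiment scores for three hypothetical orgs (not real data).
scores <- data.frame(
  org       = factor(rep(c("Org A", "Org B", "Org C"), each = 200)),
  sentiment = c(rnorm(200, mean = -0.2),
                rnorm(200, mean = 0),
                rnorm(200, mean = 0.2))
)

# One-way ANOVA: does mean sentiment differ across orgs at all?
fit <- aov(sentiment ~ org, data = scores)
print(summary(fit))

# Tukey's honest significant difference: all pairwise comparisons,
# with confidence intervals adjusted for multiple testing.
tukey <- TukeyHSD(fit, "org", conf.level = 0.95)
print(tukey)

# Flag the pairs whose adjusted p-value falls below 0.05.
pairwise <- tukey$org
significant <- pairwise[, "p adj"] < 0.05
print(data.frame(pair = rownames(pairwise), significant = significant,
                 row.names = NULL))
```

Two orgs would share a letter in the table above exactly when their pair is not flagged as significant here.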

Conclusions

There are some significant differences between groups in my findings, which suggests that the news sources a person turns to may affect their perception of various public health efforts.

However, this represents a tiny cross section of data: only ~38,000 tweets were determined to be relevant from a pool of ~750,000. With a longer-term project and a larger pool of data, there may be more clarity and confidence in the results. The Twitter API's rate limits severely restricted my ability to mine a large number of tweets quickly.

I may also need to work on a pared-down list of public health terms and do more data exploration for relevance (especially outliers!).