Researchers from the University of Sheffield and Qingdao University of Science and Technology conducted a study, co-funded by the VIGILANT project, that examines the differences between accurate COVID-19 information and misinformation disseminated on Twitter. Using a dataset of over 242 million tweets, the study compares these two types of information across four key aspects: topic distribution, tweet lifespan, language characteristics, and spreading power over time.
During the COVID-19 pandemic, social media became a primary source of information, leading to an "infodemic" where both accurate and false information spread rapidly. This situation made it difficult for users to discern reliable information, often resulting in harmful consequences such as following false treatments or attacking medical workers. To address this, the study aimed to understand the differences between accurate information and misinformation and improve misinformation detection.
"This paper undertakes a large-scale study of the statistical characteristics of accurate COVID-19 information compared to COVID-19 misinformation on Twitter."
The study focused on three main questions:
- What are the differences in topics and languages between accurate information and misinformation?
- What types of misinformation have social media platforms addressed?
- What is the spreading power of different types of misinformation?
Methodology
The researchers developed an evidence-based COVID-19 misinformation classifier using a newly created training set. This classifier, based on pairwise comparisons between verified misinformation claims and tweet texts, significantly outperformed previous models. The researchers then applied the classifier to the collection of over 242 million tweets to identify misinformation, and analyzed how misinformation and accurate information differ across several dimensions.
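The idea of comparing each tweet against a list of verified misinformation claims can be illustrated with a minimal sketch. This is not the study's classifier, which is a trained model: the token-overlap (Jaccard) score, the 0.5 threshold, and the function names below are all assumptions chosen only to show the pairwise structure.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_matching_claim(tweet, claims):
    """Score the tweet against every claim and return (score, claim) for the best pair.

    Token overlap is a stand-in for the learned pairwise model in the study.
    """
    if not claims:
        return 0.0, None
    tweet_tokens = tokenize(tweet)
    scored = [(jaccard(tweet_tokens, tokenize(claim)), claim) for claim in claims]
    return max(scored, key=lambda pair: pair[0])


def flag_tweet(tweet, claims, threshold=0.5):
    """Return the matched claim if the best score reaches the (assumed) threshold, else None."""
    score, claim = best_matching_claim(tweet, claims)
    return claim if score >= threshold else None


if __name__ == "__main__":
    # Hypothetical examples of debunked claims, for illustration only.
    debunked = [
        "drinking bleach cures covid",
        "5g towers spread the coronavirus",
    ]
    print(flag_tweet("Drinking bleach cures COVID!", debunked))
    print(flag_tweet("Wash your hands regularly to slow community spread", debunked))
```

A learned model would replace `jaccard` with a score that captures paraphrase and stance, but the surrounding loop (one tweet, many claims, keep the best match) stays the same.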
Key Findings
- Topic Distribution: Misinformation tweets frequently discussed conspiracy theories and general medical advice, while accurate tweets focused on prominent actors, community spread, and public authority actions.
- Tweet Lifespan: A significant portion of misinformation tweets (over 40%) were inaccessible upon revisiting, primarily due to account suspensions. In contrast, only 8.8% of non-misinformation tweets were inaccessible.
- Spreading Power: Misinformation spread faster and more widely than accurate information. The average spread power of misinformation tweets was significantly higher in the first 36 hours compared to accurate tweets.
- Language Analysis: Linguistic features showed distinct differences. Misinformation was associated with terms related to conspiracy theories and negative emotions, while accurate information correlated with positive emotions and social terms.
The findings highlight the effectiveness of using computational tools to differentiate between misinformation and accurate information. The study underscores the need for robust misinformation detection mechanisms on social media platforms to mitigate the spread of harmful content, and it provides valuable insights into how the characteristics and spread of COVID-19 misinformation differ from those of accurate information. By leveraging an enriched dataset and advanced classification techniques, the research offers a significant step forward in combating misinformation during pandemics. The improved classifier and analytical framework can aid in developing better strategies to handle misinformation in future health crises.
The study is available at https://workshop-proceedings.icwsm.org/pdf/2023_45.pdf and was published in the proceedings of TrueHealth 2023: Workshop on Combating Health Misinformation for Social Wellbeing, held at ICWSM 2023.
Link to Zenodo: https://zenodo.org/records/11241758