Covid-19 Vaccine Fake News Detector: Predicting whether news is true or false
This project was carried out as part of the TechLabs “Digital Shaper Program” in Düsseldorf (Summer Term 2022).
The emergence and spread of fake news during the Covid-19 pandemic inspired TechLabs Data Science Track students to develop an app to help people identify misleading information. For this purpose, two models were developed: a Machine Learning model and a Deep Learning model, which are available via a web-based app. The app can classify information on Covid-19 vaccines in the German language as true or false, allowing people to make better informed decisions based on verified facts. With the help of this app, misleading information can be identified, and the dissemination of rumors can be reduced.
The Covid-19 pandemic was accompanied by numerous articles and news items in traditional print media, electronic media, and social media. Information spreads rapidly and can come from both trusted and untrusted medical sources. Inaccurate information can be divided into misinformation, which is spread without the intention to mislead, and disinformation, which is spread with malicious intent to deceive [2, 3]. Fake news is misleading and deceptive news written and published to harm an organization or an individual. An estimated 30 to 35 percent of the news, videos, and photos disseminated on social media are fake. This so-called “infodemic” has made reliable information harder to find and recognize, and has allowed rumors to spread faster. The large volume of fake news and misinformation led, among other things, to anti-vaccine protests and vaccine hesitancy. The spread of inaccurate information about Covid-19 makes it difficult for the public to make informed decisions and thus threatens public health. The aim of this work is to introduce a Machine Learning and a Deep Learning approach to predict whether a piece of text information on the Covid-19 vaccine is true or false.
A classification model with two classes (“true” or “false”) was trained to build the Fake News Detector App. A total of 500 true and 500 false statements in German about Covid-19 vaccines were collected in a Pandas DataFrame in Python. The criteria and sources for the true and false statements were defined in advance. The sentences in our database contain different kinds of punctuation, upper- and lowercase letters, German grammar, and specific medical terminology. Six preprocessing steps were performed to clean the data. All punctuation marks and spaces were removed, and all letters were converted to lowercase. We tokenized the sentences so the model could process each word (“token”) instead of an entire sentence. We then removed stop words using the NLTK library before finally reducing all words to their base form (lemma) using the spaCy library.
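The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the stop word set `GERMAN_STOP_WORDS` is a tiny hand-written stand-in for the NLTK German list, the function name `preprocess` is hypothetical, and lemmatization is left as an optional callable (`lemmatize`) because the spaCy step needs a downloaded German language model.

```python
import string

# Hypothetical stand-in for NLTK's German stop word list; the project used
# nltk, this small set only keeps the sketch self-contained.
GERMAN_STOP_WORDS = {"der", "die", "das", "und", "ist", "ein", "eine", "zu", "in"}


def preprocess(sentence, lemmatize=None):
    """Return the cleaned tokens of one German statement."""
    # Remove ASCII punctuation marks.
    cleaned = sentence.translate(str.maketrans("", "", string.punctuation))
    # Convert all letters to lowercase.
    cleaned = cleaned.lower()
    # Tokenize on whitespace; this also discards surplus spaces.
    tokens = cleaned.split()
    # Remove stop words.
    tokens = [token for token in tokens if token not in GERMAN_STOP_WORDS]
    # Optionally reduce each token to its lemma (assumed to be supplied by
    # the caller, e.g. a wrapper around a spaCy pipeline).
    if lemmatize is not None:
        tokens = [lemmatize(token) for token in tokens]
    return tokens


if __name__ == "__main__":
    print(preprocess("Die Impfung ist sicher und wirksam!"))
```

Keeping the lemmatizer as a parameter means the same function can be tried out without spaCy installed and later connected to a real lemmatizer.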
Figure 1 — Example Preprocessing (own illustration)
This example shows the preprocessing steps from the original statement to the sentence used to train the model.
Two models were trained for comparison: a Machine Learning model based on logistic regression and a Deep Learning model based on a Long Short-Term Memory (LSTM) artificial neural network.
The Scikit-learn library was used for the Machine Learning model. The database was divided into a training set (80 percent) and a test set (20 percent). The model created a Term Frequency-Inverse Document Frequency (TF-IDF) matrix as a weighting factor: the relevance (score) of a term is higher the more often it occurs in a sentence and lower the more sentences in the database contain it.
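A minimal sketch of such a setup with Scikit-learn is shown below. The statements in `texts` are made-up placeholders rather than the project's data, the 1/0 label encoding and the `random_state` value are assumptions, and all hyperparameters are library defaults, so this illustrates the approach rather than reproducing the trained model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Made-up, already preprocessed placeholder statements (not the real dataset).
texts = [
    "impfung sicher wirksam",
    "impfung schützen schwer verlauf",
    "impfstoff geprüft zugelassen",
    "impfung senken risiko erkrankung",
    "nebenwirkung meist mild",
    "impfung enthalten mikrochip",
    "impfstoff verändern erbgut",
    "impfung machen magnetisch",
    "impfstoff ungeprüft verabreicht",
    "impfung verursachen unfruchtbarkeit",
]
# Assumed encoding: 1 = true statement, 0 = false statement.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# 80 percent training data, 20 percent test data, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# TF-IDF weighting followed by logistic regression.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)

if __name__ == "__main__":
    print("Predictions:", list(predictions))
    print("Accuracy on the toy test set:", accuracy)
```

With only ten toy sentences the resulting accuracy is meaningless; the point is the order of the steps: split, vectorize, fit, predict.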
For the Deep Learning model, the TensorFlow library was used. The data were divided into training, validation, and test sets. The data passed sequentially through the three gates of the LSTM neural network: the forget, input, and output gates. These gates control how data enters, is stored in, and leaves the network. The network outputs a value between 0 and 1, which was used as the prediction of our model.
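To make the role of the three gates more concrete, the following NumPy sketch runs a single LSTM cell over a short sequence and squashes the final hidden state to a value between 0 and 1. It is not the project's TensorFlow model: the weights are random, the input vectors are a stand-in for embedded tokens, and the dimensions, the 0.5 decision threshold, and the mapping of high values to “true” are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step; W has shape (4 * hidden, hidden + input)."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    forget_gate = sigmoid(z[0:hidden])               # what to drop from memory
    input_gate = sigmoid(z[hidden:2 * hidden])       # what new information to store
    output_gate = sigmoid(z[2 * hidden:3 * hidden])  # what to expose as output
    candidate = np.tanh(z[3 * hidden:4 * hidden])    # proposed new memory content
    c_t = forget_gate * c_prev + input_gate * candidate
    h_t = output_gate * np.tanh(c_t)
    return h_t, c_t


def predict_probability(sequence, W, b, w_out, b_out):
    """Run the cell over all time steps and return a value in (0, 1)."""
    hidden = w_out.shape[0]
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, W, b)
    return float(sigmoid(w_out @ h + b_out))


# Assumed toy dimensions and random (untrained) weights.
rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 8, 4, 5
W = rng.normal(scale=0.1, size=(4 * hidden_dim, hidden_dim + input_dim))
b = np.zeros(4 * hidden_dim)
w_out = rng.normal(scale=0.1, size=hidden_dim)
b_out = 0.0

# Stand-in for one tokenized, embedded sentence.
sequence = rng.normal(size=(seq_len, input_dim))
probability = predict_probability(sequence, W, b, w_out, b_out)
label = "true" if probability >= 0.5 else "false"  # assumed threshold

if __name__ == "__main__":
    print(probability, label)
```

In a trained network the weights are learned from the training data, and the validation set is typically used to monitor performance during training.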
Both models predict whether a new statement is true or false based on the training.
Figure 2 — Fake News Detector Models (own illustration)
This overview shows roughly how the two models are trained and how the user can input a sentence to get a prediction.
Both models provide a reliable result, with an accuracy of 84 percent for the Machine Learning model and 90 percent for the Deep Learning model (accuracy / F1-score). The results show that both Deep Learning and Machine Learning approaches can solve a natural language processing problem. The models were made accessible to external users via a web-based application built with Anvil. The application is directly linked to our models via Google Colab. The user can enter a sentence on Covid-19 vaccines into the input screen of the Anvil app. The system automatically performs all the preprocessing steps for the new sentence and generates a prediction. The prediction appears as a pop-up stating whether the entered sentence is true or false.
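For readers unfamiliar with the two metrics named above, the sketch below shows how accuracy and F1-score can be computed from test-set predictions. The labels in `y_true` and `y_pred` are invented toy values, not the project's results, and the function name `accuracy_and_f1` is hypothetical.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and the F1-score of the positive class."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, f1


# Invented toy labels: 1 = true statement, 0 = false statement.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

if __name__ == "__main__":
    accuracy, f1 = accuracy_and_f1(y_true, y_pred)
    print(f"accuracy={accuracy:.2f}, f1={f1:.2f}")
```

On a balanced dataset such as the one described here (500 true and 500 false statements), accuracy and F1-score tend to be close to each other.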
Outlook for future work
Although the accuracy of both models is satisfactory, we acknowledge that the database is not yet large enough for our Fake News Detector to generalize. Further improvements would be to collect more data and to train the models more extensively to make the results more robust. Another way to improve the models is to optimize their parameters through hyperparameter tuning. By doing so, we could further improve the models’ accuracy and, thus, their usefulness and reliability.
GitHub repository: https://github.com/yannick5000/Fake-News-Detector-Covid-19-Vaccine/tree/main
Khan et al., 2022. Detecting COVID-19-Related Fake News Using Feature Extraction. Front. Public Health 9:788074. doi: 10.3389/fpubh.2021.788074
WHO, 2020. Munich Security Conference. Retrieved from: https://www.who.int/director-general/speeches/detail/munich-security-conference
Simon et al., 2020. Types, sources, and claims of COVID-19 misinformation. Reuters Institute, University of Oxford. Retrieved from: https://reutersinstitute.politics.ox.ac.uk/types-sources-and-claims-covid-19-misinformation
Wardle & Derakhshan, 2017. Information Disorder: Toward an interdisciplinary framework for research and policymaking. Council of Europe, October 2017. Retrieved from: https://rm.coe.int/information-disorder-toward-an-interdisciplinary-framework-for-researc/168076277c
Pennycook et al., 2020. Fighting COVID-19 misinformation on social media: experimental evidence for a scalable accuracy-nudge intervention. Psychological Science 31:770–80. doi: 10.1177/0956797620939054
Ripp & Röer, 2022. Systematic review on the association of COVID-19-related conspiracy belief with infection-preventive behavior and vaccination willingness. BMC Psychol 10:66. doi: 10.1186/s40359-022-00771-2
Bin Naeem & Boulos, 2021. COVID-19 misinformation online and health literacy: a brief overview. Int. J. Environ. Res. Publ. Health 18(15):8091. doi: 10.3390/ijerph18158091