Abstract
Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of "fake news", i.e., low-quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research area that is attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers into believing false information, which makes it difficult and nontrivial to detect based on news content alone; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself, as users' social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including fake news characterizations based on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics, and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media.