research-article

Fake News Detection on Social Media: A Data Mining Perspective

Published: 01 September 2017

Abstract

Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of "fake news", i.e., low-quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research area that is attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers into believing false information, which makes it difficult and nontrivial to detect based on news content alone; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself, as users' social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including fake news characterizations on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics, and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media.



  • Published in

    ACM SIGKDD Explorations Newsletter, Volume 19, Issue 1
    June 2017
    59 pages
    ISSN:1931-0145
    EISSN:1931-0153
    DOI:10.1145/3137597

    Copyright © 2017 Authors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

