Abstract
Social media is rapidly becoming a medium of choice for understanding the cultural pulse of a region, for example, for identifying what the population is concerned with and what kind of help is needed in a crisis. To assess this cultural pulse, it is critical to have an accurate assessment of who is saying what. Unfortunately, social media is also home to users who engage in disruptive, disingenuous, and potentially illegal activity. A range of users, both human and non-human, carry out such social cyber-attacks. We ask to what extent the presence or absence of such users influences our ability to assess the cultural pulse of a region. Our prior research on this topic showed that Twitter-based network structures and content are unstable and can be highly impacted by the removal of suspended users. Because of this, statistical techniques can be established to differentiate potential types of suspended and non-suspended users. In this extended paper, we develop additional experiments to explore the spatial patterns of suspended users, and we further consider how these users affect structural and content concentrations by developing new metrics and new analyses. We find significant evidence that suspended users exist on the periphery of social networks on Twitter and, consequently, that removing them has little impact on network structure. We also improve on prior attempts to distinguish among different types of suspended users by using a much larger dataset. Finally, we conduct a temporal sentiment analysis to illustrate differences between suspended and non-suspended users on this dimension.
Acknowledgments
This work was supported in part by the Office of Naval Research (ONR) through MURI N000140811186 on adversarial reasoning, by DTRA HDTRA11010102, by the Department of Defense under the MINERVA initiative through ONR N000141310835 on Multi-Source Assessment of State Stability, and by the Center for Computational Analysis of Social and Organizational Systems (CASOS). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Office of Naval Research, the Department of Defense, or the United States Government.
Cite this article
Wei, W., Joseph, K., Liu, H. et al. Exploring characteristics of suspended users and network stability on Twitter. Soc. Netw. Anal. Min. 6, 51 (2016). https://doi.org/10.1007/s13278-016-0358-5
DOI: https://doi.org/10.1007/s13278-016-0358-5