Social media mining is the process of obtaining data from user-generated content on social media in order to extract actionable patterns, form conclusions about users, and act upon the information. Mining supports uses such as targeted advertising and academic research. The term is an analogy to the process of mining for minerals. Mining companies sift through raw ore to find the valuable minerals; likewise, social media mining sifts through social media data in order to discern patterns and trends about matters such as social media usage, online behaviour, content sharing, connections between individuals, and buying behaviour. These patterns and trends are of interest to companies, governments and not-for-profit organizations, which can use the analyses for tasks such as designing strategies and introducing programs, products, processes or services.
Users may not understand how platforms use their data.[4] Users tend to click through Terms of Use agreements without reading them, leading to ethical questions about whether platforms adequately protect users' privacy.
During the 2016 United States presidential election, Facebook allowed Cambridge Analytica, a political consulting firm linked to the Trump campaign, to analyze the data of an estimated 87 million Facebook users to profile voters, creating controversy when this was revealed.[5]
The first social media website was introduced by GeoCities in 1994. It enabled users to create their own homepages without a sophisticated knowledge of HTML coding. The first social networking site, SixDegrees.com, was introduced in 1997.[7] Since then, many other social media sites have been introduced, each serving millions of people. These users form a virtual world in which individuals (social atoms), entities (content, sites, etc.) and interactions (between individuals, between entities, between individuals and entities) coexist. Social norms and human behavior govern this virtual world. By understanding these social norms and models of human behavior and combining them with observations and measurements of this virtual world, one can systematically analyze and mine social media. Social media mining is the process of representing, analyzing, and extracting meaningful patterns from data in social media that results from social interactions. It is an interdisciplinary field encompassing techniques from computer science, data mining, machine learning, social network analysis, network science, sociology, ethnography, statistics, optimization, and mathematics. Social media mining faces grand challenges such as the big data paradox, obtaining sufficient samples, the noise removal fallacy, and the evaluation dilemma.
Social media mining represents the virtual world of social media in a computable way, measures it, and designs models that can help us understand its interactions. In addition, social media mining provides necessary tools to mine this world for interesting patterns, analyze information diffusion, study influence and homophily, provide effective recommendations, and analyze novel social behavior in social media.
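As a rough illustration of what representing a social network "in a computable way" and measuring it can look like, the following Python sketch stores a small network as an adjacency mapping and computes two common measures, reciprocity and transitivity. The user names, the `FOLLOWS` edge list, and the function names are hypothetical and chosen only for this example; real platforms and toolkits may define these measures slightly differently.

```python
from itertools import combinations

# Hypothetical directed "follows" edges: (follower, followed).
FOLLOWS = [
    ("ana", "ben"), ("ben", "ana"),
    ("ben", "cy"), ("cy", "ben"),
    ("ana", "cy"),
    ("dee", "ana"),
]


def build_graph(edges):
    """Return a dict mapping each user to the set of users they follow."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    return graph


def reciprocity(graph):
    """Fraction of directed edges whose reverse edge also exists."""
    edges = [(u, v) for u, nbrs in graph.items() for v in nbrs]
    if not edges:
        return 0.0
    mutual = sum(1 for u, v in edges if u in graph[v])
    return mutual / len(edges)


def transitivity(graph):
    """Global clustering coefficient of the undirected version of the graph:
    closed connected triplets divided by all connected triplets."""
    undirected = {u: set() for u in graph}
    for u, nbrs in graph.items():
        for v in nbrs:
            undirected[u].add(v)
            undirected[v].add(u)
    closed = total = 0
    for nbrs in undirected.values():
        # Every pair of neighbours of a node forms a triplet centred on it.
        for a, b in combinations(sorted(nbrs), 2):
            total += 1
            if b in undirected[a]:
                closed += 1
    return closed / total if total else 0.0


graph = build_graph(FOLLOWS)
print(f"reciprocity:  {reciprocity(graph):.2f}")
print(f"transitivity: {transitivity(graph):.2f}")
```

In this toy network, four of the six follow edges are reciprocated, and three of the five connected triplets are closed, so the two measures summarize how often users follow back and how often "a friend of a friend" is also a friend.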
Uses
Social media mining is used across several fields, including business development, social science research, health services, and education.[8][9] Once the collected data has been processed through social media analytics, the results can be applied in these fields. Companies often use the patterns of connectivity that pervade social networks, such as assortativity: the social similarity between users that is induced by influence, homophily, and reciprocity and transitivity.[10] These forces are then measured via statistical analysis of the nodes and the connections between them.[8] Social analytics also uses sentiment analysis, because social media users often express positive or negative sentiment in their posts.[11] This provides important social information about users' emotions on specific topics.[12][13][14]
These three patterns have several uses beyond pure analysis. For example, influence can be used to determine the most influential user in a particular network.[8] Companies are interested in this information when deciding whom to hire for influencer marketing. These influencers are identified by recognition, activity generation, and novelty, three requirements that can be measured through the data mined from these sites.[8] Analysts also value measures of homophily: the tendency of two similar individuals to become friends.[10] Users have begun to rely on other users' opinions in order to understand diverse subject matter.[11] These analyses can also help create tailored recommendations for individuals.[8] By measuring influence and homophily, online and offline companies are able to suggest specific products to individual consumers and to groups of consumers. Social media networks can use this information themselves to suggest possible friends to add, pages to follow, and accounts to interact with.
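The following Python sketch illustrates, under simplifying assumptions, how influence and homophily might be measured on mined data. It uses in-degree (number of followers) as a crude stand-in for recognition, and the fraction of connections that join users with the same interest as a raw indicator of similarity between connected users. The `FOLLOWS` edges, the `INTEREST` labels, and the function names are hypothetical; in-degree is only one of many possible influence measures and is not taken from the cited sources.

```python
from collections import Counter

# Hypothetical data: directed follow edges and one interest label per user.
FOLLOWS = [
    ("ana", "ben"), ("cy", "ben"), ("dee", "ben"),
    ("ben", "ana"), ("dee", "cy"), ("cy", "dee"),
]
INTEREST = {"ana": "music", "ben": "music", "cy": "sports", "dee": "sports"}


def in_degree(edges):
    """Count how many followers each user has."""
    return Counter(followed for _, followed in edges)


def most_followed(edges):
    """Return the user with the most followers (ties broken alphabetically)."""
    counts = in_degree(edges)
    return max(sorted(counts), key=counts.get)


def same_label_fraction(edges, labels):
    """Fraction of edges that connect two users sharing the same label."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


print("most followed user:", most_followed(FOLLOWS))
print(f"same-interest edges: {same_label_fraction(FOLLOWS, INTEREST):.2f}")
```

A high same-interest fraction is consistent with homophily, but on its own it cannot distinguish users who connected because they were similar from users who became similar through influence after connecting; separating the two generally requires data observed over time.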
Perception
Modern social media mining is a controversial practice that has led to exponential gains in user growth for tech giants such as Facebook, Inc., Twitter, and Google. These "Big Tech" companies build algorithms that use user input to infer users' preferences and to keep them on the platform for as long as possible. These inputs, which can be as simple as time spent on a given screen, provide the data being mined, and the companies profit heavily by capitalizing on highly accurate predictions about user behavior. The growth of platforms accelerated rapidly once these strategies were put in place; as of 2021, most of the largest platforms averaged over 1 billion active users per month.[15]
Critics of algorithm-driven engagement, such as Tristan Harris and Chamath Palihapitiya, have claimed that certain companies (specifically Facebook) valued growth above all else and ignored the potential negative impacts of these growth engineering tactics.[16]
At the same time, users have begun to profit from their own data through content monetization and by becoming influencers. Users typically have access to a varied set of analytics about the people who interact with them on social media, and can use these as building blocks for their own targeting and growth strategies through ads and posts that cater to their audiences. Influencers also commonly promote products and services for established brands, creating one of the largest digital industries: influencer marketing. Instagram, Facebook, Twitter, YouTube, Google, and others have long given access to platform analytics and allowed third parties to access that information as well, at times without the knowledge of the user whose data is being viewed or bought.[17]
Research
Research areas
Social media event detection – Social networks enable users to freely communicate with each other and share their recent news, ongoing activities or views about different topics. As a result, they can be seen as a potentially viable source of information to understand the current emerging topics/events.[18][19][20][21][22][23]
Public health monitoring and surveillance – Using large-scale analysis of social media to study large cohorts of patients and the general public, e.g. to obtain early warning signals of drug-drug interactions and adverse drug reactions,[24][25] or understand human reproduction and sexual interest.[26]
Community structure (Community Detection/Evolution/Evaluation) – Identifying communities on social networks, how they evolve, and evaluating identified communities, often without ground truth.[1]
Network measures – Measuring centrality, transitivity, reciprocity, balance, status, and similarity in social media.[1]
Network models – Simulating networks with specific characteristics. Examples include random graphs (E-R models), preferential attachment models, and small-world models.[1]
Information cascade – Analyzing how information propagates in social media sites. Examples include herd behavior, information cascades, diffusion of innovations, and epidemic models.[1]
Influence and homophily – Measuring network assortativity and measuring and modeling influence and homophily.[1]
Social spammer detection – Detecting social spammers, who send unwanted spam content to targeted users on social networks and other websites with user-generated content, often collaborating to boost their social influence, legitimacy, and credibility.[34][35][36][37]
Distrust and negative links – Exploring negative links in social media.[46][47][48]
Role of social media in crises – Social media, particularly Twitter, continues to play an important role during crises.[49] Studies show that it is possible to detect earthquakes[50] and rumors[51] using tweets published during a crisis. Developing tools that help first responders analyze tweets for better crisis response[52] and techniques that give them faster access to relevant tweets[53] is an active area of research.
Location-based social network mining – Mining human mobility for personalized point-of-interest (POI) recommendation on location-based social networks.[54][55][56][57][58][59]
Provenance of information in social media – Provenance informs a user about the sources of a given piece of information. Social media can help in identifying the provenance of information due to its unique features: user-generated content, user profiles, user interactions, and spatial or temporal information.[60][61]
Vulnerability management – A user's vulnerability on a social networking site can be managed in three sequential steps: (1) identifying new ways in which a user can be vulnerable, (2) quantifying or measuring a user's vulnerability, and (3) reducing or mitigating it.[62]
Opinion mining on candidates/parties – Social media is a popular medium for candidates and parties to campaign and for gauging the public reaction to the campaigns. Social media can also be used as an indicator of voters' opinions. Some research studies have shown that predictions made using social media posts can match (or even improve on) traditional opinion polls.[63]
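To make the opinion mining item above more concrete, here is a minimal Python sketch of lexicon-based sentiment scoring aggregated per candidate. The word lists, the candidate identifiers, and the sample posts are all hypothetical, and the approach is deliberately simplistic: published studies typically rely on much larger lexicons or trained classifiers, and on careful sampling of users.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"great", "love", "good", "win", "support"}
NEGATIVE = {"bad", "hate", "terrible", "lose", "corrupt"}

# Hypothetical posts, each already matched to the candidate it mentions.
POSTS = [
    ("candidate_a", "Great speech, I love the plan"),
    ("candidate_a", "Bad debate night"),
    ("candidate_b", "Terrible, corrupt campaign"),
]


def score(text):
    """Count positive words minus negative words in a post."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def mean_sentiment(posts):
    """Return {candidate: mean sentiment score} for (candidate, text) pairs."""
    totals = defaultdict(lambda: [0, 0])  # candidate -> [score sum, post count]
    for candidate, text in posts:
        totals[candidate][0] += score(text)
        totals[candidate][1] += 1
    return {c: total / count for c, (total, count) in totals.items()}


for candidate, value in sorted(mean_sentiment(POSTS).items()):
    print(f"{candidate}: {value:+.2f}")
```

A word-counting scorer like this ignores negation, sarcasm, and context, so it should be read as a sketch of the aggregation step rather than a usable predictor of voter opinion.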
Publication venues
Social media mining research articles are published in computer science, social science, and data mining conferences and journals:
Conferences
Conference papers can be found in the proceedings of Knowledge Discovery and Data Mining (KDD), World Wide Web (WWW), Association for Computational Linguistics (ACL), Conference on Information and Knowledge Management (CIKM), International Conference on Data Mining (ICDM), and Internet Measurement Conference (IMC).
Sumbaly, Roshan; Kreps, Jay; Shah, Sam (June 2013). "The big data ecosystem at LinkedIn". SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. pp. 1125–1134. doi:10.1145/2463676.2463707. ISBN 978-1-4503-2037-5.
Kaplan, Andreas M.; Haenlein, Michael (2010). "Users of the world, unite! The challenges and opportunities of social media". Business Horizons. 53 (1): 59–68. doi:10.1016/j.bushor.2009.09.003. S2CID 16741539.
Zafarani, R.; Abbasi, M. A.; Liu, H. (2014). Social Media Mining. Cambridge University Press. http://dmml.asu.edu/smm.
Singh, Archana (2017). "Mining of Social Media data of University students". Education and Information Technologies. 22 (4): 1515–1526. doi:10.1007/s10639-016-9501-1. S2CID 1761288.
Shahheidari, S; Dong, H; Daud, R (2013). "Twitter sentiment mining: A multi domain analysis". 2013 Seventh International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS 2013). pp. 144–149.
Hu, Xia; Tang, Jiliang; Zhang, Yanchao; Liu, Huan (2013). "Social Spammer Detection in Microblogging" (PDF). Proceedings of the 23rd International Joint Conference on Artificial Intelligence. Archived from the original (PDF) on March 4, 2016. Retrieved November 29, 2014.
Hu, Xia; Tang, Jiliang; Liu, Huan (2014). "Online Social Spammer Detection" (PDF). Proceedings of the 28th AAAI Conference on Artificial Intelligence. Archived from the original (PDF) on March 28, 2016. Retrieved November 29, 2014.
Tang, Jiliang; Liu, Huan (2014). "Trust in Social Computing". Proceedings of the 23rd International World Wide Web Conference. Archived from the original on March 4, 2016. Retrieved November 30, 2014.
Tang, Jiliang; Gao, Huiji; DasSarma, Atish; Liu, Huan (2012). "eTrust: Understanding Trust Evolution in an Online World" (PDF). Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Archived from the original (PDF) on March 4, 2016. Retrieved November 30, 2014.
Tang, Jiliang; Gao, Huiji; Hu, Xia; Liu, Huan (2013). "Exploiting Homophily Effect for Trust Prediction" (PDF). The 6th ACM International Conference on Web Search and Data Mining. Archived from the original (PDF) on March 4, 2016. Retrieved November 30, 2014.
Tang, Jiliang; Hu, Xia; Chang, Yi; Liu, Huan (2014). "Predictability of Distrust with Interaction Data" (PDF). ACM International Conference on Information and Knowledge Management. Archived from the original (PDF) on March 3, 2016. Retrieved November 30, 2014.
Bruno, Nicola (2011). "Tweet first, verify later? How real-time information is changing the coverage of worldwide crisis events". Oxford: Reuters Institute for the Study of Journalism, University of Oxford. 10: 2010–2011.
Sakaki, Takeshi; Okazaki, Makoto; Matsuo, Yutaka (2010). "Earthquake shakes Twitter users: real-time event detection by social sensors". Proceedings of the 19th International Conference on World Wide Web. pp. 851–860.
Mendoza, Marcelo; Poblete, Barbara; Castillo, Carlos (2010). "Twitter under crisis: Can we trust what we RT?". Proceedings of the First Workshop on Social Media Analytics. pp. 71–79.
Kumar, Shamanth; Hu, Xia; Liu, Huan (2014). "A behavior analytics approach to identifying tweets from crisis regions". Proceedings of the 25th ACM Conference on Hypertext and Social Media. pp. 255–260.
Barbier, Geoffrey; Feng, Zhuo; Gundecha, Pritam; Liu, Huan (2013). "Provenance Data in Social Media". Synthesis Lectures on Data Mining and Knowledge Discovery. 4: 1–84. doi:10.2200/S00496ED1V01Y201304DMK007. S2CID 46794494.
Gundecha, Pritam; Feng, Zhuo; Liu, Huan (2013). "Seeking Provenance of Information in Social Media" (PDF). Proceedings of the 22nd ACM International Conference on Information and Knowledge Management Conference. Archived from the original (PDF) on March 4, 2016. Retrieved December 1, 2014.
Marozzo, Fabrizio; Bessi, Alessandro (2018). "Analyzing polarization of social media users and news sites during political campaigns". Social Network Analysis and Mining. 8: 1. doi:10.1007/s13278-017-0479-5. S2CID 21257844.