Abstract
Blogging has become a popular and convenient way to communicate, publish information, share preferences, voice opinions, provide suggestions, report news, and form virtual communities in the blogosphere. The blogosphere obeys a power law distribution, with very few blogs being extremely influential and a huge number of blogs being largely unknown. Whether or not a (multi-author) blog is itself influential, it has influential bloggers. However, the sheer number of such blogs makes it extremely challenging to study each one of them. One way to analyze these blogs is to find their influential bloggers and treat them as community representatives. Influential bloggers can impact fellow bloggers in various ways. In this paper, we study the problem of identifying influential bloggers. We define influential bloggers, investigate their characteristics, discuss the challenges of identifying them, develop a model to quantify their influence, and pave the way for further research leading to more sophisticated models that enable categorization of various types of influential bloggers. To highlight these issues, we conduct experiments using data from blogs, evaluate multiple facets of the problem, and present a unique and objective evaluation strategy, given the subjectivity in defining influence, in addition to various other analytical capabilities. We conclude with interesting findings and future work.
Notes
More details on identifying and measuring these indicators are provided in Sect. 3.
Note that K is a user-specified parameter.
We did not adopt any of these because computing them is beyond the scope of this work. Instead, we use a simpler measure to examine its effect in determining influence.
TUAW was set up in February 2004.
This dataset will be made available upon request for research purposes.
We obtained this data using the Digg API.
On average, 70–80 blog posts from TUAW are submitted to Digg every month, so we pick the 20 most “digged” (that is, most influential) posts to avoid under-sampling or over-sampling.
In the early stage of the blog site, there were a few months with little blogging activity, such as Feb-04, Oct-04, and Nov-04, resulting in fewer than five influentials.
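The monthly selection described in the notes above (keeping the 20 most “digged” posts per month, with fewer available in low-activity months) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name `top_k_per_month`, the record fields `month`, `title`, and `diggs`, and the toy records are all assumptions introduced here.

```python
from collections import defaultdict


def top_k_per_month(posts, k=20):
    """Return, for each month, up to k posts ranked by descending digg count.

    Months with fewer than k posts simply yield all of their posts.
    """
    by_month = defaultdict(list)
    for post in posts:
        by_month[post["month"]].append(post)
    return {
        month: sorted(group, key=lambda p: p["diggs"], reverse=True)[:k]
        for month, group in by_month.items()
    }


# Hypothetical toy records; titles and counts are made up for illustration.
posts = [
    {"month": "2007-01", "title": "post-a", "diggs": 120},
    {"month": "2007-01", "title": "post-b", "diggs": 45},
    {"month": "2007-01", "title": "post-c", "diggs": 300},
    {"month": "2007-02", "title": "post-d", "diggs": 10},
]
top = top_k_per_month(posts, k=2)
```

With `k=2`, the first month keeps only its two highest-count posts, while the second month, which has a single post, returns just that one. This mirrors the early low-activity months that yield fewer influentials than requested.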
Acknowledgments
This research was funded in part by the National Science Foundation's Social-Computational Systems (SoCS) Program within the Directorate for Computer and Information Science and Engineering's Division of Information and Intelligent Systems (Award numbers: IIS-1110868 and IIS-1110649), the US Office of Naval Research (Grant number: N000141010091), and the US Air Force Office of Scientific Research (Grant number: FA95500810132). We gratefully acknowledge this support.
Cite this article
Agarwal, N., Liu, H., Tang, L. et al. Modeling blogger influence in a community. Soc. Netw. Anal. Min. 2, 139–162 (2012). https://doi.org/10.1007/s13278-011-0039-3
DOI: https://doi.org/10.1007/s13278-011-0039-3