Measuring Social Media Influencer Index

Journal of Retailing and Consumer Services 49 (2019) 86–101

Measuring social media influencer index: insights from Facebook, Twitter and Instagram

Anuja Arora a,∗, Shivam Bansal b, Chandrashekhar Kandpal b, Reema Aswani c, Yogesh Dwivedi d
a Jaypee Institute of Information Technology, Noida, India
b Exzeo Software Private Limited, India
c Indian Institute of Technology Delhi, New Delhi, India
d Swansea University, Wales, UK


Keywords: Social media analytics; Influencer; Regression modelling; Internet marketing

The growth of social media has completely revamped the way people interact, communicate and engage. These platforms play a key role in facilitating greater outreach and influence. This study proposes a mechanism for measuring the influencer index across popular social media platforms, including Facebook, Twitter, and Instagram. A set of features that determine the impact on consumers is modelled using a regression approach. The underlying machine learning algorithms, namely Ordinary Least Squares (OLS), K-NN Regression (KNN), Support Vector Regression (SVR), and Lasso Regression, are adapted to compute a cumulative score in terms of an influencer index. Findings indicate that engagement, outreach, sentiment, and growth play a key role in determining the influencers. Further, the ensemble of the four models resulted in the highest accuracy of 93.7%, followed by KNN regression with 93.6%. The study has implications across various domains of e-commerce, viral marketing, social media marketing and brand management wherein identification of key information propagators is essential. These influencer indices may further be utilized by e-commerce portals and brands for social media promotion and engagement for larger outreach.

Received 4 January 2019; Received in revised form 26 February 2019; Accepted 17 March 2019
Available online 26 March 2019
0969-6989/ © 2019 Elsevier Ltd. All rights reserved.

1. Introduction

Latest technological advancements such as the Internet of Things (IoT) (Taylor et al., 2018; Lo and Campos, 2018), the Internet of Everything (IoE) (Zwick and Denegri-Knott, 2018), mobile applications and social media (Alalwan et al., 2018; Shareef et al., 2017, 2018, 2019; Shiau et al., 2017, 2018) have brought a number of decision-making challenges for digital marketing industries. Specifically, social media platforms have become a medium not only for communication among individuals but also for several aspects of business, including the decision-making process (Choi, 2017), knowledge-based decision support systems (Chen et al., 2012; Ibrahim et al., 2016), brand promotions (Kaplan and Haenlein, 2010; Lipsman et al., 2012), brand marketing (Aggrawal et al., 2017a; Kapoor et al., 2018), brand and product co-creation (Kamboj et al., 2018; Rathore et al., 2016) and product diffusion (Aggrawal et al., 2017b). In the current scenario, where there is a constant race for content promotion and propagation, organizations are leveraging the power of social media to reach out to the masses (Hanna et al., 2011; Kietzmann et al., 2011). Social media is known to be beneficial in various domains of business and management, including social commerce (Chen et al., 2017), e-governance (Dwivedi et al., 2017; Vakeel & Panigrahi, 2018), political marketing (Grover et al., 2018; Kapoor & Dwivedi, 2015) and digital marketing (Aggrawal et al., 2017; Alalwan et al., 2017; Dwivedi et al., 2015; Pintado et al., 2017; Parsons & Lepkowska-White, 2018).

Nowadays, social media content is used by various brands to compete with competitors, promote products and offers, and maintain a reputation among stakeholders (Brennan and Croft, 2012; Chen, 2013). However, it often becomes difficult for these brands to actually monitor the impact of the brand positioning moves they adopt (Klostermann et al., 2018; Pike et al., 2018). Hence, this high-volume data is changing the landscape of digital marketing and has raised great challenges in turning brand marketing data into business insights using analytical modelling and management techniques. The Social Influencer (SI) index is one such strategy through which brands can discover the right influencers for their brand promotion, based on their requirements (Booth & Matic, 2011; Baldus, 2018). Social media influencers are users who have highly established credibility for a
specific industry, such as Bollywood (Hearn and Schoenhoff, 2016), telecom (Doyle, 2008), news, etc. These social media influencers have connections with a large audience, and others support and trust them because of their admirable authenticity and position (Lou and Yuan, 2018).

It becomes critically essential for brands to identify the right influencers across the web through social media to promote their products and services (Booth & Matic, 2011; Huang et al., 2014). Brands can directly leverage this to improve and enhance public relations by promoting their offerings for higher engagement (De Vries et al., 2011). Identification of social media influencers can be one of the most important influence marketing strategies for increasing a brand's influence on the target audience (Lou and Yuan, 2018). Social media influence thus plays a key role in this context at different levels (Romero et al., 2011; Aggrawal et al., 2017). Businesses need social influence to connect with their existing and prospective customers (Mangold and Faulds, 2009); it is an essential requirement for greater interaction and engagement with potential customers (De Vries, Gensler and Leeflang, 2012). Further, it can also be beneficial for increasing visibility in various online communities, subsequently leading to greater outreach (Yang and Kent, 2014).

Influence is the ability to drive action and receive people's engagement on a post shared by a strong social influencer, on social media or in real life (Freberg et al., 2011). Since the internet is now flooded with a large number of influencers (celebrities, athletes, musicians, etc.), it is necessary to cut through the noise and identify the right category of influencers at the right time (Gillin, 2008). However, computing the influencer index (Morone et al., 2016) is not a straightforward task and requires the assessment of many data points captured from various sources. Moreover, social media data is not structured in nature; though it is available in plenty, it needs the right approach to dissect it into meaningful features (Kiss and Bichler, 2008). Further, these features must be extracted from social media data and regressed in order to generate the influence index. The elements that play a major role in the influencer index are total engagement, total reach, total sentiment, and total growth (Aggrawal et al., 2018).

This research work measures the influencer index across varying social media portals. Basic statistical measures commonly used by researchers are not able to learn the system. Therefore, various machine learning regression models (OLS, KNN, SVM and Lasso regression) are applied to measure the influence of various celebrities on various social media applications. Basically, consumers' reactions towards the posts are considered as features, and the learning models measure the influence of celebrities based on these reactions to the celebrities' posts; deriving such features is known as feature engineering. Therefore, this work proposes a solution for influencer indexing using feature engineering and linear modelling techniques; it is a generic approach and can be applied to any social media application to identify a niche of influencers. In addition, an ensemble learning model has been developed to measure the impact of influencers from social media data, with the objective of achieving better influencer index accuracy. This learning model can be adopted by brands to discover influencers based on their specific needs and requirements. The article further discusses in detail the data used in the model, the feature engineering techniques and the regression models, and lastly showcases the results in the form of influence indices.

The presentation of the research work is divided into seven sections. The first section discusses the need to model influencers from social media content. Literature in this direction is discussed in section 2, covering social media usage for influence identification, followed by identified knowledge gaps and research contributions. The third section provides a theoretical basis for the study and focuses on the research questions with reference to the hypotheses of the study. The research methodology is discussed in section 4, which consists of four subsections: data procurement, feature engineering, feature normalization, and regression modelling. Findings and interpretation of all models and the influence ranking results are detailed in section 5. Further, contributions to existing knowledge and implications are described in section 6, followed by concluding remarks in section 7.

2. Literature review

The exponential increase in the amount of content generated through social media forces network participants to strive for greater attention and subsequent influence on information takers (Trusov et al., 2010). Literature highlights that influence can be easily predicted by URL clicks, amongst other important metrics (Romero et al., 2011). It is further evident that people can better leverage the power of social media by paying attention to content outreach along with extending their networks (Lipsman et al., 2012). Existing studies also focus on different forms of social media, including blogs, and conclude that these elite media outlets gain immense traction and have a subsequent social influence on information consumers (Meraz, 2009; Berthon et al., 2012). This section is divided into two subsections: social media usage for influence identification, and identified knowledge gaps and research contributions.

2.1. Social media usage for influence identification

Social media platforms have led to entirely new ways of interaction, communication and engagement (Hansen et al., 2011). Because of the plethora of social networking and media options available, it is not surprising that marketing professionals are actively exploring these platforms to influence their potential consumers (Hanna et al., 2011). A recent study by Weeks et al. (2017) claims that opinion leaders can be influential and can persuade their peers about news, movies, politics, etc. on social media. Everyone has an influence on social media, which can be predicted using an individual's attributes and historical activities (Bakshy et al., 2011). Studies in the literature explore the influence and propagation of content through Twitter (Aswani et al., 2017a; Bakshy et al., 2011; Cha et al., 2010), Facebook (Aswani et al., 2017b; Cavalli et al., 2011), GitHub (Bana and Arora, 2018) and other popular platforms. The impact is evident in various domains, including healthcare (McNeill and Briggs, 2014), education (Tess, 2013), business (Qualman, 2010), coding portals (Bana and Arora, 2018) and fashion marketing (Wiedmann et al., 2010).

Apart from organizations, there is a multitude of individuals, including celebrities, actors, bloggers and politicians, who voice their opinions on these platforms and act as influencers for the masses (Cha et al., 2010; Dix et al., 2010; Fraser and Brown, 2002). They share their opinions, views, experiences and even daily routine activities, which are known to influence their viewers, fans and followers across the globe. These influencers use a combination of these platforms for content dissemination and larger outreach. Studies in the literature highlight frameworks that describe factors like spreadability, propagation, integration and nexus for content popularity (Mills, 2012). Neystadt et al. (2012) identified social influencers based on usage context and influencer type. Freberg et al. (2010) described influencers as third-party endorsers who shift audience attitudes through various social media platforms and blogging sites.

2.2. Identified knowledge gaps and research contributions

Influencer marketing is often done by brands to build strong relationships with consumers via influencers, a strategy that is mutually beneficial to everyone (Woodcock et al., 2011). The preliminary study in this direction was performed by Lagree et al. (2017) in relation to the new form of online marketing known as influencer marketing; in their work, an empirical analysis is applied to Twitter data. Influence marketing is basically about connecting online personas with brands based on the trust and engagement of target audiences on a regular basis (Childers et al., 2018). With the increase in the number of
offerings by various brands, consumers often look for authenticity from the brands they interact with. To introduce a familiarity and trust factor, brands often use influencer experiences shared on social media and traditional media, along with their posts and commercials respectively (Lou and Yuan, 2018). This makes the product more relevant and trustworthy to consumers.

Literature highlights that the emerging influencer community is exercising significant power over brand perceptions (Childers et al., 2018). In the work of Childers et al., influence marketing research tries to find insights into and perceptions of influence marketing; for this, data is collected by interviewing professionals from 19 advertising agencies. Lagree et al. (2017) proposed a diffusion model to address the online influence marketing with persistence (OIMP) problem and worked on real data gathered from Twitter. The most recent work in this direction is by Mallipeddi et al. (2018). This work proceeds in two directions, selection of influencers and scheduling of influencers' ads, on real data from Twitter. Further, a polynomial-time heuristic model is introduced to provide an optimal solution.

These works have further been driven by the rampant growth of social media, which acts as the major platform for influencer communication and subsequent engagement (Nabi, O'Cass and Siahtiri, 2019). McCormick investigated celebrity endorsement and measured the influence of a product endorser (i.e. a celebrity) in order to match consumers' attitudes and purchase intentions (McCormick, 2016). Further, the selection of an influencer is also very important when he/she is chosen to be affiliated with the brand. Moreover, consumer purchase intention is influenced by the credibility and parasocial interaction of social media applications, as tested on Instagram and YouTube by Sokolova and Kefi (2019). It is thus critical to establish an influencer index across social media platforms to enable the selection of influencers by various brands (Byrne et al., 2017). Undoubtedly, technologies need to be developed in order to identify and trace influencers on the basis of their content dissemination on varying social media platforms.

Researchers have worked on influence tracing based on context and influencer type, but influencers' role on varying social media portals is an untouched area and needs in-depth exploration (Mittal et al., 2017). To the best of our knowledge, there are no studies in the literature that take into consideration an integration of metrics from various platforms for measuring the influence on the target audience. The role of machine learning approaches in resolving the influence indexing problem is also not explored in the existing literature.

Based on the above facts, this study tries to: (a) identify the ever-growing impact of varying social media platforms on the masses; (b) measure relevant attributes from three popular platforms, Facebook, Twitter and Instagram; (c) compute an influencer index for the top celebrities in the Indian movie industry using the identified relevant attributes; and (d) predict the influencer index of top celebrities using machine learning approaches.

3. Theoretical basis and hypotheses development

To the best of our knowledge, there is no study in the literature which highlights how social influence can be measured on social media sites in relation to users' engagements. Therefore, there is a need to introduce a method that can measure the social influence of a celebrity based on how intimately they are able to engage users with their posts. Various computing approaches such as evolutionary algorithms (Agarwal and Mehta, 2018), nature-inspired algorithms (Aswani, Ghrera, Kar and Chandra, 2017) and machine learning approaches (Joseph et al., 2018) have been used by researchers as learning mechanisms for other social media problems. Classification models have also been used to learn from tweets for predicting test data (Joseph et al., 2018).

Hence, machine learning models have been used as the theoretical lens for framing the context of the learning model in order to resolve the social media content based influence measurement issue. Ma et al. used five standard machine learning algorithms (naïve Bayes, k-nearest neighbours, decision trees, support vector machines, and logistic regression) to predict the popularity of a new hashtag on Twitter (Ma et al., 2013). The broad ambit of social media has attracted computer scientists to support numerous business-related activities such as advertisement (Goeldi, 2011), recommendation (Taneja et al., 2018, 2019), product popularity prediction (Jamali et al., 2009), etc. In order to glean useful business intelligence from extracted social media information, researchers have used both supervised (He, 2013; Kelly et al., 2015) and unsupervised machine learning techniques (Anshary and Trilaksono, 2016; Pham and Simoiu, 2016). Dai et al. proposed a decision support model presenting a Mining Environment for Decision (MinEDec) framework (Dai et al., 2011). It analyzes unstructured data to gain business intelligence for a specific outcome, such as rival tracking, environment change detection and strategic matrices, for competitive intelligence. Hence, machine learning approaches have been used by researchers to solve business intelligence problems with respect to social media, but influence indexing has not yet been attempted using ML techniques. Various text mining and analysis tools, such as the SPSS Clementine text mining tool, NVivo 9 and AMOS 18, have also been used in this regard.

Social media influence indexing is considered appropriate for evaluating the potential users who are the main source of enhancing post influence, and influencers can be exposed based on various measures such as the size of their social media audience, page engagement, and page views. Gaining post influence is the topmost priority of all brand marketers. Higher influence on a specific social media platform leads to higher engagement and higher visibility of content, and helps increase the level of discussion among users. This helps a post go viral in the market and can also be used for online advertisement purposes. The following research questions are investigated in this study:

RQ1. What is the impact of social media engagement, outreach and sentiment (in discussions) on the social influence index?

In a recent study, Sokolova and Kefi investigated the persuasive cues of fashion and beauty influencers based on YouTube and Instagram content. The study reports that the audience creates para-social interaction with the influencer (Sokolova and Kefi, 2019) and claims that attitude homophily positively affects para-social interactions. On similar grounds, the impact of audience engagement on social influencers' content remains an open question, which this research work tries to resolve.

Indeed, social network consumer behaviour is an important and essential factor in recognizing social media influence. Furthermore, an attempt is presented to conceptualize the impact of social media engagement, outreach and sentiment of a specific discussion on influencer indexing. Some multidimensional factors with respect to posts on a specific topic are: people-talking-about, likes-count, followers, engagement, outreach, posting rate, post-sentiment, etc. These factors seem to be useful indicators of social influence/prestige. Bollywood celebrities are chosen as the application domain on three well-known social media platforms: Facebook, Twitter, and Instagram. Therefore, to investigate RQ1, the following hypotheses are framed:

• H1: Average likes on Instagram have the maximum impact in defining social influence compared to the other factors of Twitter and Facebook.

To test the significance of the features of all three social media platforms with regard to social influence, two multiple linear regression (MLR) models are applied: Ordinary Least Squares (OLS) and support vector machine regression (SVM). These MLR models are used to test which features of the different social media platforms have the maximum impact in defining social influence, and whether average Instagram likes have the maximum impact on social influence.

• H2: Total engagements garnered by posts on Instagram are more
impactful compared to Twitter engagement and Facebook engagement.

Other than OLS and SVM regression, one more multiple linear regression model, Lasso regression, is experimented with. All three models indicate that Instagram total engagements have a greater impact than Twitter and Facebook engagements.

• H3: Different features have varying significance for varying social media platforms.

To validate the significance and impact of consumer reactions, four multiple linear regression models (OLS, SVM regression, KNN regression, and Lasso regression) are modelled. These models are used to identify the most impactful features, i.e. the factors that affect social influence the most.

RQ2. How are social media engagements associated with identifying top social media influencers?

Social media contains a number of factors which influence customer engagement. Media and content type of posts is the most significant effect examined by Farook and Abeysekara, whose work reveals five factors that significantly affect influence (Farook and Abeysekara, 2016). Another study claims that identification of engagement has a significant impact on customer engagement (Prentice et al., 2019). This suggests that social media is shaping influencers based on their interactions on various platforms. These reactions control and influence consumer behaviour. Undoubtedly, social media has become an integral part of influencing society.

The multiple linear regression (MLR) models are used to identify top influencers. An ensemble model is experimented with in stacking order, based on the accuracy of the various applied MLR models. The best accuracy is achieved by the ensemble model; hence, it is used for the influencer identification results. Hence, to validate RQ2, the designed hypothesis is:

• H4: Social media interactions have a strong connection with top influencer identification.

RQ3. Which social media platform contributes more to the influencer index, and how much accuracy is achieved in social influence index assignment?

Comparative studies of social media content popularity have been performed by numerous researchers. These studies propose methods to identify the most suitable social media platform for a specific kind of post (Kaushal et al., 2017; Sokolova and Kefi, 2019; Macarthy, 2018). Boss et al. presented an approach to track and measure the influence of social networking members for e-commerce sites (Boss et al., 2018). Hence, a platform is required to measure influence on varying social media and to determine which social media platform is preferable for what kind of content. In order to influence consumers, a celebrity (brand) ends up posting content across multiple social media platforms. For each social media platform, a specific celebrity's influence can be identified using a different set of features, and different social media sites show variation in the influence of a specific celebrity. Therefore, a framework is needed to compute influence on varying social media portals to distinguish which celebrity is most influential on which platform. To dig into this, celebrities' and consumers' actions on OSNs should be closely monitored and analysed. Consumers are exposed to a diversity of opinion on varying social media platforms, which helps them refine their thoughts. Furthermore, this diversity of social media causes variation in influence across different OSNs. To measure this, a detailed lab experiment on consumer reactions, measuring a celebrity's influence to assess the effectiveness and accuracy of the influence index on varying portals, is a must.

The elementary functionalities of online social networks (OSNs) differ. The major OSNs are Facebook as a relationship network, Instagram as a media sharing network and Twitter as a social publishing network. Celebrities end up posting multiple contents across these platforms while availing of these services; indeed, celebrities post content on multiple OSNs based on their popularity. Every celebrity has variable influence on varying social media sites, but still no approach exists to measure social influence across all social media applications. On each social media platform, the influence of each celebrity is measured with a set of weighted attributes specific to that portal. Based on this, hypotheses H5 and H6 are framed.

• H5: A celebrity has distinctive exposure across OSNs, thereby contributing a different influence index on each OSN.
• H6: Diverse user actions on diverse social media play an important role in measuring accurate influence.

The subsequent section focuses on the analysis to address the identified research questions, using a mixed research methodology comprising the social media analytics and machine learning approaches used in the study.

4. Research methodology

The current study uses a mixed research methodology comprising aspects of social media analytics along with machine learning approaches to compute the influencer index across different social media. Chan (2017) measured social influence using agent-based simulation and a regression model; in that research, agent interaction through exchanging social beliefs, together with the belief of aggregated neighbours (social connections), is described by a linear regression model.

The subsequent sub-sections focus on the details of data procurement, feature engineering, feature normalization, regression modelling and the subsequent ranking of influencers. The study uses multiple regression approaches to explore the best accuracy. Somewhat similar work has been presented by Popescu et al. to explore and measure the impact of social media engagement on student performance. Students' active participation has been explored on three social media tools: wiki, blog, and microblogging (Popescu et al., 2016). They applied a multiple linear regression model to predict final grades, and their model also showed that several features have an influence on the grade. Based on this study, the primary reason for using multiple regression modelling techniques is to identify the variables that contribute more towards the social influencer index, in other words, which features have a positive/higher weight in comparison to the others.

4.1. Data procurement

This sub-section discusses the steps used for data collection and dataset preparation. An end-to-end data procurement pipeline comprising multiple steps is created for the purpose of procuring data for this study. Every step is linked together and is used to fetch the desired data instances and attributes at regular intervals from a variety of sources. Fig. 1 illustrates the entire data procurement process through a brief flow diagram.

The procurement process starts with the preparation of input seeds. The seeds serve as the inputs to the different social media platforms considered in the study. For example, the Twitter handle of an influencer serves as the input seed for Twitter data extraction, the user identifier of the influencer is the input seed for Facebook, and finally the Instagram URL can be used for Instagram information extraction. To prepare these input seeds for the platforms considered in the current work (Twitter, Facebook and Instagram), several lists available on the web were searched and 1000 seeds for various celebrities (influencers) were manually collated. These seeds are for different celebrities across different categories, including international athletes, entertainment
A. Arora, et al. Journal of Retailing and Consumer Services 49 (2019) 86–101

Fig. 1. Data procurement process pipeline.

actors and others. modelling and analysis. However, since the current study deals with a
The second component of this pipeline is the data connectors which specific domain/problem statement, a set of desired features have been
are responsible for pulling out the relevant data from the social media generated from the existing ones. For the purpose of influencer iden-
platforms using the defined seed inputs. A Twitter Rest API, Instagram tification and influencer ranking there are six major components/
API, and Facebook Graph API are used to fetch the desired data which is buckets that are relevant for an influencer's overall rank have been
saved in as a raw file in JSON format. The social media platforms computed as a part of this study. These new features have been created
(Facebook, Twitter, and Instagram) under consideration provideseveral under every social media category. The broad feature categories include
page insights, including the type of post that is adequate for engaging people and the number of posts that reach a certain number of users. The third component is the data parsing layer, in which the raw data is parsed to generate the metrics used for the analysis. This includes mining the overall followers of the considered influencers, their engagements on posts, the content they share and the entire metadata associated with each post (likes, comments and shares). Lastly, the data linking module is the final component of the pipeline. Since data for the same seed (the same personality) are extracted from different platforms, the information mined from these platforms must be collated and bundled together. This is done by maintaining the seed across all the captured data and using it as a unique identifier in the dataset.
Since the entire data collection pipeline is connected, it can fetch data at different rates and frequencies. The study mines the data minute-wise, hour-wise and day-wise. This dynamic data pull makes it possible to create a greater number of temporal features for each influencer. The complete data comprise 900 social influencers and their social media attributes obtained from different channels. The model is run on data collected over a 90-day window. A summary of the data statistics is given in Table 1.
The following section describes the features used to model the social influencer index.

4.2. Feature engineering
Several features can be taken directly from social media. They are organized into six groups: Overall Footprint (OF), Engagements & Outreach (EO), Hourly Engagement Velocity (HEV), Daily Engagement Velocity (DEV), Audience Sentiment (AS) and Posting Rate (PR). These groups and the features under each are described below.

4.2.1. Overall Footprint (OF)
This metric measures the overall presence of an influencer across three channels: Twitter, Facebook and Instagram. This group includes the total Facebook page likes, Facebook "people talking about" (PTA), Instagram followers and Twitter follower count. The raw numbers are bucketed and normalized to a standard range of 0–100 using min-max normalization. The buckets and ranges are created through exploratory data analysis combined with domain knowledge.
raw_overall_footprint = sum(twitter followers, Instagram followers, Facebook page likes, Facebook people talking about).

4.2.2. Engagements & Outreach (EO)
This metric measures the average engagements per post garnered by an influencer across the three channels. The per-post engagements are computed from the likes, comments, shares, retweets and favourites counts on each post created by the influencer. These engagements are aggregated from the influencer's post-level data for the last 30 days before the current date. They are then normalized and bucketed into a standard range, using data buckets and ranges created specifically for each channel by observing and analysing the complete influencer data.
Total_outreach = sum(twitter replies, favourites, retweets, likes, comments, shares, reactions).
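A minimal Python sketch of the two raw aggregates above, followed by min-max scaling onto 0–100, is given below. It is an illustration under stated assumptions rather than the study's code: the dictionary keys are hypothetical names for the counts listed in the formulas, missing counts are treated as zero, the channel-specific bucketing step is omitted, and the example reuses three rows of raw counts from Table 3.

```python
from typing import Dict, List


def raw_overall_footprint(p: Dict[str, float]) -> float:
    """OF: sum of the four audience-size counts (hypothetical key names)."""
    return (p["twitter_followers"] + p["instagram_followers"]
            + p["facebook_page_likes"] + p["facebook_people_talking_about"])


def total_outreach(p: Dict[str, float]) -> float:
    """EO: sum of all interaction counts named in the Total_outreach formula."""
    keys = ("twitter_replies", "favourites", "retweets",
            "likes", "comments", "shares", "reactions")
    return sum(p.get(k, 0.0) for k in keys)  # missing counts count as 0 (assumption)


def min_max_0_100(values: List[float]) -> List[float]:
    """Min-max scale one feature column onto 0-100 (Equation (1) times 100)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to spread, map everything to 0
        return [0.0] * len(values)
    return [100.0 * (v - lo) / (hi - lo) for v in values]


# Raw counts for three influencers, taken from Table 3.
influencers = {
    "AmyJackson": {"facebook_people_talking_about": 21_789, "facebook_page_likes": 2_338_392,
                   "twitter_followers": 902_351, "instagram_followers": 2_273_038},
    "AnoushkaShankar": {"facebook_people_talking_about": 14_298, "facebook_page_likes": 284_841,
                        "twitter_followers": 25_644, "instagram_followers": 20_831},
    "AyeshaTakia": {"facebook_people_talking_about": 0, "facebook_page_likes": 0,
                    "twitter_followers": 389_440, "instagram_followers": 153_459},
}
footprints = [raw_overall_footprint(p) for p in influencers.values()]
scaled = min_max_0_100(footprints)
for name, raw, s in zip(influencers, footprints, scaled):
    print(f"{name}: raw={raw:,} scaled={s:.2f}")
```

On these three rows, AmyJackson maps to 100, AnoushkaShankar to 0 and AyeshaTakia to about 3.80. This shows why the study additionally buckets the raw numbers: a single very large account compresses everyone else toward the bottom of the range.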
Table 1
Data statistics summary.
Metric | Twitter API | Facebook Graph API | Instagram API
Total seeds | 1074 | 689 | 783
Total documents collected | 93,485 | 37,133 | 47,113
Data collected minute-wise | 234.73 | 85.05 | 110.565
Data collected hour-wise | 14,100.55 | 5103.25 | 6634.225

4.2.3. Hourly Engagement Velocity (HEV)
This metric measures the change in engagements per hour from the time the post is created. The model considers the average engagements in the first, second, fifth and tenth hours. The final score is computed by aggregating these values, which are then normalized and bucketed in the same manner as the other metrics.
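The HEV computation can be sketched as follows. This is a hedged illustration, not the study's implementation. The per-post input layout (a mapping from hours since creation to the engagement observed at that hour) is hypothetical, unobserved hours are counted as zero, and the unspecified "aggregating" step is assumed to be a plain sum of the checkpoint averages.

```python
from statistics import mean
from typing import Dict, List, Sequence

# Hours after posting at which HEV samples engagement (from the text above).
HEV_CHECKPOINTS = (1, 2, 5, 10)


def hev_raw_score(posts: List[Dict[int, float]],
                  checkpoints: Sequence[int] = HEV_CHECKPOINTS) -> float:
    """Raw (pre-normalization) HEV score for one influencer on one platform.

    Each post is a hypothetical mapping {hours since creation: engagement}.
    Engagement is averaged over posts at every checkpoint and the averages
    are summed; summing is an assumed reading of "aggregating".
    """
    if not posts:
        return 0.0
    checkpoint_means = [mean(post.get(h, 0.0) for post in posts) for h in checkpoints]
    return float(sum(checkpoint_means))


posts = [
    {1: 10.0, 2: 18.0, 5: 30.0, 10: 40.0},
    {1: 20.0, 2: 22.0, 5: 50.0, 10: 60.0},
]
print(hev_raw_score(posts))  # 15 + 20 + 40 + 50 = 125.0
```

Under the same assumptions, the daily variant (DEV) could reuse this function with day checkpoints expressed in hours, for example (24, 48, 168).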

4.2.4. Daily Engagement Velocity (DEV)
This metric is analogous to the hourly engagement velocity, but it measures the change in engagements per day (instead of per hour) from the time the post is created. A final score is computed by aggregating the average engagements of the first, second and seventh day, and the values are then normalized using the same procedure. The raw engagement per span is given by:
facebook_span_raw_engagement = sum(likes, comments, shares) * total_span_count / total_posts_collected_every_span.
instagram_span_raw_engagement = sum(likes, comments, shares) * total_span_count / total_posts_collected_every_span.
twitter_span_raw_engagement = sum(retweets, favourites, replies) * total_span_count / total_posts_collected_every_span.
where span -> "minute", "hour", "day", "week".

4.2.5. Audience sentiment (AS)
This metric measures the average audience sentiment in audience comments and mentions. A higher value means that more positive sentiment is observed in the audience conversations; a lower value means that more negative sentiment is observed.
The overall positive and negative sentiment in the influencer's tweets and posts is computed. A bag-of-words classifier is used to label the sentiment of the data as negative, positive or neutral.

4.2.6. Posting rate (PR)
This metric measures the average posting rate of the influencer, i.e., how often the influencer posts. A very low value means the influencer is less active on social media channels; a high value means the influencer is highly active.
The system calculates the span dynamically from the overall post distribution of each influencer. For example, if an influencer posts very frequently, the span is defined as "minute"; if the influencer posts less often, the span is "hour"; and if the influencer posts very rarely, the span is "week".
twitter_posting_rate = # of tweets / total_spans.
instagram_posting_rate = # of instagram posts / total_spans.
facebook_posting_rate = # of facebook posts / total_spans.
Table 2 lists the 39 features for every bucket, with the category and source of extraction stated against each feature. For daily and hourly engagement, [plat] may be replaced by fb (Facebook), tw (Twitter) or insta (Instagram), resulting in a set of 18 features on temporal engagement.

4.3. Feature normalization
The collected data and extracted features vary widely in their ranges and are on different scales. For instance, the total footprint of an influencer may lie between 100,000 and 500,000, while the posting rate may lie between 10 and 100, and audience sentiment may range only from −2 to 2. If a simple regression model is fitted to such data, the Audience Sentiment feature will play hardly any role, because it is several orders of magnitude smaller than the other features. Although it seems insignificant, this feature may contain important information for computing the final outcome. Using these features without normalization may therefore bias the outcome in favour of features with larger values. When the scales of different features differ wildly, this can impair the ability of regression models to learn. Hence, to make the contributions of the features comparable, the data must first be normalized so that all features are on the same scale. A few instances prior to normalization are shown in Table 3.
The study uses min-max normalization to scale every feature to the range 0–100. Min-max normalization is a common feature-scaling technique in which the numeric range of a feature is reduced to a common scale. The normalized value Z of an observed value x is calculated with Equation (1):

$$Z = \frac{x - \min(x)}{\max(x) - \min(x)} \qquad (1)$$

where min(x) and max(x) are the minimum and maximum values of feature x over its range. Equation (1) yields values in [0, 1], which are multiplied by 100 to obtain the 0–100 range. A snapshot of the normalized results is shown in Table 4.

4.4. Regression modelling
An early study in this direction is that of Anagnostopoulos et al., who applied logistic regression to quantify the extent of social relationships and showed, with the help of a shuffle test, that influence is a likely source of correlation (Anagnostopoulos et al., 2008). A survey paper proposes a hierarchical classification scheme showing that the literature models social influence through quantitative assessment methods (influence metrics, information flow and influence models, including machine learning models, and network/graph properties); qualitative assessment is also possible using social modelling, social matching and community detection (Razis et al., 2018).
The problem under consideration in this study, influencer indexing, is a classical regression problem. When it comes to user engagement on social media, some users have higher engagement and tend to tweet or post more often than others. Thus, the features used to predict social influence take values in continuous ranges, and so does the target variable: the influencer index is a continuous variable to estimate. Regression analysis is therefore a natural choice for the problem at hand. Regression expresses a statistical relationship that describes the average behaviour of the dependent variable in terms of the other variables. The target variable in this case is computed by combining influencer lists for actual brands. The study uses a collection of influencer data for specific brands across different industries, including Entertainment, Sports and Publishing, amongst others. The list is collated by combining 43 Indian brands, giving a combined influencer list of about 1000 celebrities, bloggers and YouTubers.
The model is adopted to explain the variation of the influencer index across instances. The variation of the dependent variable is explained through its covariance with the independent variables. The independent variables in the current study include average Instagram likes, average tweets and Facebook posts, to name a few. A Multiple Linear Regression (MLR) model is used to compute the influencer index. Equation (2) describes the mathematical model:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon \qquad (2)$$

where $y$ is the explained variable; $x_1, x_2, \dots, x_k$ are the $k$ explanatory variables; $\beta_0, \beta_1, \dots, \beta_k$ are the model parameters; and $\varepsilon$ is the specification error, i.e., the difference between the true and the specified model.
Further, to model the regression problem defined above, the study uses four implementations: Ordinary Least Squares (OLS) (Craven and Islam, 2011), K-NN Regression (KNN) (Hastie & Tibshirani, 1996), Support Vector Regression (SVR) (Basak et al., 2007) and Lasso with cross validation (Tibshirani, 1996). When adopting such models for regression problems, the main motive is to identify the Best Linear Unbiased Estimator (BLUE). The basic idea is to identify which variables have a greater impact on the social influencer index, in other words, which features carry a higher positive weight than others. To achieve this, the study regresses the target variable (influencer index) on the features using OLS, KNN, SVR and Lasso.
For the initial comparison among the identified models, grid search with ParameterGrid is applied to test model performance, based on how well each model learns the data and its relationship with the target variable.
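As a hedged illustration of this comparison, the scikit-learn sketch below tunes the four model families with `GridSearchCV`, which enumerates candidates through `ParameterGrid`. It then reports R², MAE and MSE on a held-out split. The data are synthetic (`make_regression`) and only stand in for the normalized feature matrix and the influencer index. The parameter grids are illustrative choices, not the grids used in the study.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Synthetic stand-in for the normalized features (X) and influencer index (y).
X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Illustrative grids (assumptions): C for SVR, number of neighbours for KNN, alpha for Lasso.
candidates = {
    "OLS": (LinearRegression(), {"fit_intercept": [True, False]}),
    "KNN": (KNeighborsRegressor(), {"n_neighbors": [2, 3, 5, 10],
                                    "weights": ["uniform", "distance"]}),
    "SVR": (SVR(kernel="linear"), {"C": [0.1, 1.0, 10.0], "epsilon": [0.1, 1.0]}),
    "Lasso": (Lasso(max_iter=10_000), {"alpha": [0.01, 0.1, 1.0]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Five-fold cross-validated search over every parameter combination in the grid.
    search = GridSearchCV(estimator, grid, cv=5, scoring="r2")
    search.fit(X_train, y_train)
    pred = search.best_estimator_.predict(X_test)
    results[name] = {
        "params": search.best_params_,
        "r2": r2_score(y_test, pred),
        "mae": mean_absolute_error(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
    }

for name, r in results.items():
    print(f"{name:6s} R2={r['r2']:.3f} MAE={r['mae']:.2f} MSE={r['mse']:.2f} {r['params']}")
```

With real data, `X` would hold the normalized features listed in Table 2 and `y` the influencer index. The choice `cv=5` mirrors the five-fold cross validation reported for Lasso in Section 5.1.4.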

Table 2
Features for influencer index computation.
Feature | Category | Source | Acronym | Definition
People Talking About | Overall Footprint (OF) | Page | facebook_PTA | The number of people who have created a story from the Page's posts by liking, commenting on or sharing them
Total Likes | Overall Footprint (OF) | Page | *_likes | Total number of likes on the Facebook page of the brand
Twitter Followers | Overall Footprint (OF) | Timeline | twitter_followers | Total number of Twitter followers of the brand's Twitter handle
Instagram Followers | Overall Footprint (OF) | Page | instagram_followers | Total number of followers of the brand page on Instagram
Average Engagement
Twitter | Engagement (EO) | Tweet | avg_eng_tw | Average, over all tweets made by the brand, of the sum of retweets and favourites on a tweet
Instagram | Engagement (EO) | Post | avg_eng_insta | Average, over all Instagram posts of the brand, of the sum of likes, comments and shares on a post
Facebook | Engagement (EO) | Post | avg_eng_fb | Average, over all Facebook posts of the brand, of the sum of likes, comments and shares on a post
Aggregated Likes/Favourites/Shares/Comments/Retweets
Likes, Facebook | Outreach (EO) | Post | avg_likes_fb | Average number of likes garnered by the Facebook posts of the brand
Comments, Facebook | Outreach (EO) | Post | avg_comments_fb | Average number of comments garnered by the Facebook posts of the brand
Likes, Instagram | Outreach (EO) | Post | avg_likes_insta | Average number of likes garnered by the Instagram posts of the brand
Comments, Instagram | Outreach (EO) | Post | avg_ | Average number of comments garnered by the Instagram posts of the brand
Shares, Instagram | Outreach (EO) | Post | avg_ | Average number of shares garnered by the Instagram posts of the brand
Favourites, Facebook | Outreach (EO) | Post | avg_shares_fb | Average number of shares garnered by the Facebook posts of the brand
RT, Twitter | Outreach (EO) | Post | avg_rt | Average number of retweets garnered by the tweets of the brand
Hour 1 | Hourly Engagement (HEV) | Post | h1_tot_eng_[plat] | Total engagement garnered by the posts on the platform in the 1st hour after posting
Hour 5 | Hourly Engagement (HEV) | Post | h5_tot_eng_[plat] | Total engagement garnered by the posts on the platform up to the 5th hour after posting
Hour 10 | Hourly Engagement (HEV) | Post | h10_till_eng_[plat] | Total engagement garnered by the posts on the platform up to the 10th hour after posting
Day 1 | Daily Engagement (DEV) | Post | d1_eng_[plat] | Total engagement garnered by the posts on the platform up to the 24th hour after posting
Day 2 | Daily Engagement (DEV) | Post | d2_eng_[plat] | Total engagement garnered by the posts on the platform up to two days after posting
Day 7 | Daily Engagement (DEV) | Post | d7_eng_tot_[plat] | Total engagement garnered by the posts on the platform up to one week after posting
Average Post Rate
Twitter | Posting Rate (PR) | Timeline | avg_post_rate_tw | The rate at which the brand tweets, obtained by averaging the time gap between successive tweets
Facebook | Posting Rate (PR) | Page | avg_post_rate_fb | The rate at which the brand makes Facebook posts, obtained by averaging the time gap between successive posts
Instagram | Posting Rate (PR) | Page | avg_post_rate_in | The rate at which the brand makes Instagram posts, obtained by averaging the time gap between successive posts
Average Audience Sentiment
Twitter | Audience Sentiment (AS) | Replies | avg_sent_tw | Average overall sentiment, computed with the bag-of-words approach, for all the tweets
Facebook | Audience Sentiment (AS) | Comments | avg_sent_fb | Average overall sentiment, computed with the bag-of-words approach, for all the Facebook posts
Instagram | Audience Sentiment (AS) | Comments | avg_sent_insta | Average overall sentiment, computed with the bag-of-words approach, for all the Instagram posts

Table 3
Sample data instances for raw dataset.
brand_id | facebook_PTA | facebook_likes | twitter_followers | instagram_followers | max_eng_tw | avg_eng_tw | max_eng_fb | avg_eng_fb
AmyJackson | 21,789 | 2,338,392 | 902,351 | 2,273,038 | 2911 | 1007.08 | 1891 | 1891
AnoushkaShankar | 14,298 | 284,841 | 25,644 | 20,831 | 101 | 25.60 | 23,424 | 3282.70
AnushaDandekar | 69,604 | 1,738,888 | 540,137 | 517,705 | 422 | 169.12 | 37,182 | 9396.36
AnushkaSharma | 325,524 | 6,079,571 | 9,146,729 | 8,028,658 | 6279 | 3736.33 | 84,771 | 19,011.57
AyeshaTakia | 0 | 0 | 389,440 | 153,459 | 0 | 0 | 0 | 0
BipashaBasu | 14,489 | 5,896,143 | 4,955,650 | 3,683,686 | 496 | 235.38 | 0 | 0
BrunaAbdullah | 1177 | 155,988 | 64,521 | 213,454 | 28 | 18.15 | 1134 | 352.30
ChitrangdaSingh | 0 | 0 | 780,126 | 0 | 1031 | 569.17 | 0 | 0
DeepikaPadukone | 424,497 | 32,951,962 | 16,909,445 | 13,751,079 | 5803 | 3809.38 | 175,798 | 71,343.36
The grid-search test takes into consideration default parameters such as C and the number of neighbours. Each model is evaluated on several measures, including accuracy (R-squared score), Mean Absolute Error (MAE), Mean Squared Error (MSE) and the feature coefficients. Parameter tuning is performed with the model-selection ParameterGrid mechanism, using a grid of parameters with a discrete number of values for each. The weights of the model coefficients and intercepts are extracted and plotted for linear kernels. The following subsections discuss in detail the four regression models adopted in the study.

4.4.1. Ordinary Least Squares (OLS)
The MLR model adopted in the current study is often an ideal choice for modelling the linear relationship between a dependent variable (target) and one or more independent variables (predictors) (Andrews, 1974). MLR is based on OLS: the model is fitted so that the sum of squares of the differences between the observed and predicted values is minimized. The MLR model rests on several assumptions (e.g., errors are normally distributed with zero mean and constant variance). Provided the assumptions are satisfied, the regression estimators are optimal in the sense that they are unbiased (the expected value of the estimator equals its true value), efficient (their variance is small compared with other estimators) and consistent (estimator bias and variance approach zero as the sample size approaches infinity). The coefficient of determination ($\mathrm{DetCoef}^2$, i.e., R-squared) in Equation (3) describes the proportion of the variance of the dependent variable explained by the regression model:

$$\mathrm{DetCoef}^2 = \frac{\mathrm{SumSqReg}}{\mathrm{SumSqTot}} = 1 - \frac{\mathrm{SumSqEr}}{\mathrm{SumSqTot}} \qquad (3)$$

where SumSqTot, SumSqReg and SumSqEr are the total, regression and error sums of squares respectively, defined by Equations (4)–(6), with $y$ the observed values, $\hat{y}$ the predicted values and $\bar{y}$ the mean of the observed values:

$$\mathrm{SumSqTot} = \sum (y - \bar{y})^2 \qquad (4)$$
$$\mathrm{SumSqReg} = \sum (\hat{y} - \bar{y})^2 \qquad (5)$$
$$\mathrm{SumSqEr} = \sum (y - \hat{y})^2 \qquad (6)$$

For a perfect regression model, SumSqEr is zero and $\mathrm{DetCoef}^2$ is 1. Conversely, if the regression model is a total failure, SumSqEr equals SumSqTot, no variance is explained by the regression and $\mathrm{DetCoef}^2$ is zero. However, a high coefficient of determination does not by itself imply causation.

4.4.2. Support Vector Regression (SVR)
Support Vector Machines can be applied not only to classification problems but also to regression. SVR retains all the main features that characterize a maximum-margin algorithm: a nonlinear function is learned by a linear learning machine that maps the data into a high-dimensional, kernel-induced feature space. The capacity of the system is controlled by parameters that do not depend on the dimensionality of the feature space. As in classification, the motivation is to seek and optimize the generalization bounds given for the regression model.
The loss function used is the ε-insensitive loss, which ignores errors smaller than ε. SVR is known to reach a globally optimal solution while ensuring a reliable generalization bound. In addition, SVR represents the solution using a small subset of the training points (the support vectors), which provides considerable computational advantages. The dataset is scaled, and the regression model is trained using a linear kernel as expressed in Equation (7):

$$f(x, w) = \sum_{j=1}^{k} w_j\, g_j(x) + b \qquad (7)$$

where $g_j(x)$ denotes a set of nonlinear transformations and $b$ is the bias, which can be dropped when the data have zero mean. As part of tuning, grid search is used to test the model's accuracy, and the model with the best hyper-parameters in the grid search is adopted for computing the influencer index. Fig. 3 highlights the significant features along with their weights.

4.4.3. K-NN regression
K-nearest neighbours (KNN) is among the most popular and simplest algorithms; it predicts a numerical target based on a similarity measure, typically a distance function. Over the decades, KNN has been used in statistical estimation and pattern recognition as a popular and efficient non-parametric technique. A simple implementation of KNN regression computes the average of the numerical target over the k nearest neighbours. Another approach uses an inverse-distance-weighted average of the nearest neighbours. The regression variant uses the same distance functions as KNN classification. The Euclidean ($\mathrm{Euc\_Dist}$) and Manhattan ($\mathrm{Man\_Dist}$) distances are expressed by Equations (8) and (9) respectively, where $x$ and $y$ are the two data instances between which the distance is computed:

$$\mathrm{Euc\_Dist} = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2} \qquad (8)$$
$$\mathrm{Man\_Dist} = \sum_{i=1}^{k} \lvert x_i - y_i \rvert \qquad (9)$$

4.4.4. Lasso regression
The Lasso performs both parameter shrinkage and variable selection automatically. Along with the feature weights, we would also like to know the least important variables, which can be eliminated; because this is also informative, l1 regularisation is used. After penalizing (constraining the sum of the absolute values of the estimates), some of the parameter estimates may be
exactly zero. The larger the penalty, the further the estimates are shrunk towards zero. This is convenient when automatic feature/variable selection is needed, or when dealing with highly correlated predictors, for which standard regression usually yields exceptionally large regression coefficients. Mathematically, the Lasso is a linear model trained with an l1 prior as regularizer. The objective function (ObjFunc) to minimize is expressed in Equation (10):

$$\mathrm{ObjFunc} = \min_{w} \frac{1}{2\,n_{\mathrm{samples}}} \lVert Xw - y \rVert_2^2 + \alpha \lVert w \rVert_1 \qquad (10)$$

The lasso estimate thus solves the least-squares minimization with the penalty $\alpha \lVert w \rVert_1$ added, where $\alpha$ is a constant and $\lVert w \rVert_1$ is the l1-norm of the parameter vector.

5. Findings and interpretations
This section is divided into four subsections. Section 5.1 presents the findings of the various MLR techniques used to identify the high-impact features, i.e., to rank features by their significance for social influence. Section 5.2 presents the top influencers obtained from MLR in terms of their percentile. Section 5.3 compares the influence of specific celebrities across social media platforms, and finally Section 5.4 presents the accuracy of the various MLR techniques and of the proposed ensemble technique for social influence indexing.

5.1. High impact features identification results
Using the MLR techniques (Joseph et al., 2018), the features with a high association with social influence are identified. We proposed the following three hypotheses.

• H1: Average likes on Instagram have the maximum impact in defining social influence compared with other factors from Twitter and Facebook.
• H2: Total engagements garnered by Instagram posts are more impactful than those on Twitter.
• H3: Different features have varying significance for different social media platforms.

To test these hypotheses, four MLR techniques (OLS, SVM regression, KNN regression and Lasso regression) are applied to identify the significant features. The outcomes of all the MLR techniques are shown in the following subsections.

5.1.1. Ordinary least squares (OLS) results
The OLS experiment uses this simple yet efficient regression approach to model the Social Influence index; the results are obtained by running the regression analysis (Joseph et al., 2018). The statistics summary shows a coefficient of determination (R-squared) of 0.894, which is close to 1.
The model also helps identify the features that have a greater impact on the target social influencer index. Fig. 2 shows the features with higher significance for the dependent variable, along with their OLS weights. The average number of likes on Instagram is the most significant feature (significance value = 1.747), followed by engagement on Instagram up to the tenth hour after the post was made (significance value = 0.738). These are followed by the average engagement garnered on Twitter (significance value = 0.381) and weekly engagement on Facebook (significance value = 0.342) and Twitter (significance value = 0.289). The average number of retweets on Twitter also shows notable importance. The least significant feature is Twitter engagement on day 2 (0.101).

5.1.2. SVM regression results
SVM regression results show the features with higher importance for the dependent variable along with their weights (Ma et al., 2013). The significant features identified by SVM regression are depicted in Fig. 3. The feature with the highest significance is average likes on Instagram (significance value = 1.112). The second most significant feature is engagement of the post on Instagram up to the 10th hour (significance value = 0.748), followed by the average engagement on Twitter (significance value = 0.470) and engagement on Facebook over a week (significance value = 0.365). The results are comparable with the OLS model, and the significance ranking of some features is almost the same as that given by OLS.
The ordinary linear regression and SVM regression results support Hypotheses 1 and 2. Hence, H1 and H2 are not rejected, and the following hold: average likes on Instagram have the maximum impact in defining social influence compared with other factors from Twitter and Facebook, and total engagements garnered by Instagram posts are more impactful than those on Twitter.

5.1.3. KNN regression results
Choosing the optimal value of k is the most critical aspect of the KNN regression approach, since k is a critical tuneable hyper-parameter. The model is trained with different values of k and checked for accuracy, as illustrated in Fig. 4, which shows that accuracy is highest for 2 neighbours.
Further, using KNN with k = 2, the regression model identifies the significant features along with their corresponding weights (Ma et al., 2013). The features with greater importance obtained by the KNN regression model are illustrated in Fig. 5.
The total engagement gained by the influencer's Facebook posts up to the 5th hour after posting has the maximum significance value (0.45). Engagement garnered on Facebook by the end of the week, total Facebook engagement and engagement over various intervals on Instagram share the second highest value (0.44).
The weights obtained from the KNN approach differ only marginally, ranging from 0.45 to 0.36, indicating that the top ten features are almost equally important when computing the final influencer scores.

5.1.4. Lasso regression results
The current study trains the Lasso regression model with five-fold cross validation for result verification. Fig. 6 illustrates how the mean R2 score varies during cross validation. Fig. 7 presents the significant features obtained from Lasso regression along with their positive and negative weights.
Among the significant features identified by Lasso regression, Instagram engagement over a week is the most significant, with a Lasso coefficient of 0.29. The second most significant feature is average Twitter engagement (significance value = 0.25). Maximum Instagram engagement, Facebook engagement over a week, etc. follow in decreasing order, as shown in Fig. 7.
The graph also shows negative coefficients; the largest negative coefficient is for average engagement on Instagram (significance value = −0.36). This indicates that Instagram posts tend to gain higher engagement over time. Average tweets and Instagram followers also have negative weights, indicating a negative impact on the computed influencer score.
The KNN and Lasso regression results show that features have varying significance for different social media platforms, as stated in Hypothesis 3. Therefore, H3 is not rejected.

5.2. Ranking results
In the previous section, weights are derived by analysing the feature
importance of each variable based on domain knowledge. These weights are further fine-tuned by iterating over insights from the output results. Thus, we hypothesize that social media interactions (i.e., engagement, outreach, etc.) can measure the social influence index on social media portals and initiate the process of top-influencer identification.

• H4: Social media interactions have a strong connection with top influencer identification.

Fig. 2. Significant Features obtained by OLS.

Fig. 3. Significant Features obtained by SVR.

Fig. 4. Accuracy for different ‘k’ neighbours.

The weights are validated using a linear regression model and an ensemble gradient boosting model, with the Klout score as the dependent variable. Klout is a social media analytics website that rates users between 1 and 100 based on their online social influence. The study also uses percentiles and z-scores to compute the right buckets for a list (see Table 4).
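The percentile and z-score step can be sketched as below. This is an illustration only. The percentile used here is a percentile rank (the share of scores less than or equal to a given score), which places the top score at 100, like the top entry in Table 5; the study's exact percentile definition may differ. The bucket cut-offs of ±1 standard deviation are hypothetical because the study does not report its boundaries, and the scores are toy values.

```python
from statistics import mean, pstdev
from typing import List


def z_scores(scores: List[float]) -> List[float]:
    """Standardize scores to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:  # all scores identical: every z-score is 0
        return [0.0] * len(scores)
    return [(s - mu) / sigma for s in scores]


def percentile_ranks(scores: List[float]) -> List[float]:
    """Percentage of scores less than or equal to each score (top score -> 100)."""
    n = len(scores)
    return [100.0 * sum(other <= s for other in scores) / n for s in scores]


def bucket(z: float) -> str:
    """Map a z-score to a bucket label; the +/-1 cut-offs are hypothetical."""
    if z >= 1.0:
        return "high"
    if z >= -1.0:
        return "medium"
    return "low"


scores = [90.0, 70.0, 50.0, 30.0, 10.0]  # toy influencer scores
for s, p, z in zip(scores, percentile_ranks(scores), z_scores(scores)):
    print(f"score={s:5.1f} percentile={p:5.1f} z={z:+.2f} bucket={bucket(z)}")
```

Percentile ranks keep the ordering easy to read at the top of the list, while z-scores express how far each score lies from the average. This is why the two measures complement each other when drawing bucket boundaries.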
The influencer scores for Indian celebrities are estimated using the feature weights generated from the regression models, and a list
of the top 20 influencers is computed in terms of their percentile. The top 21 influencers are presented in Table 5, which places the highest-ranked celebrity at the 100th percentile, with all influencers computed based on influence percentage.
Apart from ranking the influencers by percentile, the study also identifies the relevant factors, comprising outreach, footprint and sentiment, that play a critical role in identifying prominent influencers (Section 5.1).

Fig. 5. Significant Features obtained by KNN.

Fig. 6. Cross Validation Accuracy vs.

5.3. Comparative analysis of influence index on distinct social media
The elementary functionalities of online social networks (OSNs) differ. Among the major OSNs, Facebook is a relationship network, Instagram a media-sharing network and Twitter a social publishing network. Celebrities post multiple pieces of content across these platforms while using these services. On each platform, influence is measured with a different set of attributes specific to that portal. Based on this, the following hypothesis was proposed:

• H5: A celebrity has distinctive exposure across OSNs, thereby contributing a different influence index on each OSN.

Fig. 8 illustrates the influencer scores of some celebrities across the three social media platforms, namely Facebook, Twitter and Instagram. The scores have been computed based on the Outreach, Footprint, Engagement and Sentiment of the influencer. The figure shows that DeepikaPadukone has a very high influencer score on Facebook as well as on Twitter, while the score on Instagram is comparatively low relative to the other celebrities. On the other hand, Anushka Sharma's Twitter influence and Ayesha Takia's Instagram influence are low.

5.4. Social influence indexing accuracy results
The influence index of Indian celebrities is obtained from ML regression models. To compute the Social Influence Index on different OSNs based on different features, four fundamental regression models are applied: Ordinary Least Squares (OLS), Support Vector Regression (SVR), K-Nearest Neighbour regression (KNN-R) and Lasso regression. To further improve accuracy, a stacking ensemble of these four basic regression models is also validated. To test this, we proposed the following hypothesis:

• H6: Diverse user actions on diverse social media play an important role in measuring influence accurately.

The results of the adopted approaches are compared and evaluated on the basis of Mean Absolute Error (MAE), Mean Squared Error (MSE) and Root Mean Squared Error (RMSE); Table 6 reports the details. The KNN approach with 2 neighbours has the lowest MAE (3.67), while the ensemble model has the lowest MSE and RMSE (32.50 and 5.70 respectively).
Lastly, the accuracy of all MLR models is computed. Accuracy refers to the proportion of celebrities under study whose influence index is measured correctly. Fig. 9 compares the models in terms of accuracy.
The graph shows that the ensemble model achieves the highest accuracy of 93.7%, followed by KNN, Lasso, Linear Regression (OLS) and SVR with accuracies of 93.6%, 88.6%, 86.8% and 86.4% respectively. The ensemble and KNN models outperform the remaining approaches in terms of both error and accuracy, resulting in successful prediction of the social influencer score. These approaches better identify the significant features as compared to the OLS, Lasso and SVR

Fig. 7. Significant Features obtained by Lasso.

Table 4
Sample data instances for normalized dataset.
facebook_PTA | facebook_likes | twitter_followers | instagram_followers | max_eng_tw | avg_eng_tw | max_eng_fb | avg_eng_fb | max_eng_insta | avg_eng_insta | avg_
98.06 | 98.77 | 98.71 | 85.43 | 48.17 | 84.54 | 98.2 | 96.16 | 48.48 | 68.48 | 95.54
70.5 | 95.26 | 84 | 81.22 | 30.68 | 50.24 | 62 | 49.79 | 65.8 | 71.41 | 48.58
0 | 0 | 62.37 | 64.06 | 36.33 | 56.72 | 0 | 0 | 36.11 | 46.53 | 0
81.93 | 80 | 36.81 | 54.88 | 1 | 1 | 90.41 | 95.2 | 46.25 | 55.3 | 95.13
35.95 | 70.16 | 68.7 | 76.92 | 3 | 8.75 | 42.75 | 40.14 | 60.14 | 58.96 | 40.02
68.51 | 72.91 | 67.6 | 68.73 | 13.58 | 35.03 | 47.44 | 34.01 | 55.14 | 59.92 | 33.8
67.07 | 80.76 | 66.19 | 81.44 | 32.13 | 68.92 | 67.55 | 57.91 | 85.83 | 95.55 | 57.65
97.4 | 97.16 | 66.01 | 88.92 | 21.86 | 36.73 | 95.57 | 88.58 | 90.24 | 95.64 | 88.27
32.1 | 78.2 | 29.54 | 71.67 | 0 | 0 | 0 | 0 | 61.44 | 64.05 | 0

regression models.

Table 5
Top 20 influencer percentile.
SalmanKhan 100.00 RiteishDeshmukh 86.25
AkshayKumar 97.58 VarunDhawan 86.13
DeepikaPadukone 94.81 SunnyLeone 84.16
HritikRoshan 93.93 Ranveer Singh 84.00
SonakshiSinha 92.67 SidharthMalhotra 83.74
ParineetiChopra 92.58 DishaPatani 83.58
ShraddhaKapoor 92.01 ShrutiHaasan 82.63
JacquelineFernandez 91.79 KajalAggarwal 81.12
ShahidKapoor 89.43 ArjunKapoor 80.35
PriyankaChopra 88.94 SonamKapoor 80.28
AjayDevgn 88.55

6. Discussion

Existing studies in the literature focus on different forms of social media, including blogs, and conclude that these elite media outlets gain immense traction and exert a subsequent social influence on information consumers (Meraz, 2009; Berthon et al., 2012; McCormick, 2016; Prentice et al., 2019). These studies further highlight instances of social media influence and the capital it subsequently generates (Freberg et al., 2011). However, to the best of our knowledge, none of the studies discusses the factors that help identify potential influencers. The current study uses a mixed research methodology, comprising both social media analytics and regression analysis, to identify the significant factors that make selected celebrities social media influencers (Shareef et al., 2019). We provide a comprehensive evaluation of the contextual features of all three targeted social media platforms. Some of these features have been used in other studies as well: for example, sentiment, mentions and hashtags as Twitter features (Ma et al., 2013), and the number of images, hashtag count, number of filters used and image content length as Instagram features (Mittal et al., 2017). In brief, our evaluation shows that different features carry different weights in the influencer index, that the index is computed from these features for each influencer/celebrity, and that the same celebrity can have a different influencer index on different social media applications.

The study models and ranks top influencers across three major platforms, namely Facebook, Twitter and Instagram. This is done

Fig. 8. Platform-wise influencer score.
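Fig. 8 reports a score per platform, and the conclusion states that each platform is given equal weight when modelling the constructs. The sketch below shows one way per-platform scores could be combined with equal weights and then expressed relative to the top-scoring influencer. Rescaling so that the top influencer gets 100 is only a hypothetical reading of the "percentile" values in Table 5 (whose first entry is 100.00); treating a platform on which an influencer is absent as a score of zero is likewise an assumption, as are the function names and the toy numbers.

```python
from typing import Dict

PlatformScores = Dict[str, Dict[str, float]]  # influencer -> {platform: score}


def equal_weight_index(scores: PlatformScores) -> Dict[str, float]:
    """Combine per-platform scores into one index, giving every platform the same weight.

    A platform missing for an influencer counts as 0 (assumption, not from the paper).
    """
    platforms = sorted({p for per_platform in scores.values() for p in per_platform})
    weight = 1.0 / len(platforms)
    return {
        name: sum(weight * per_platform.get(p, 0.0) for p in platforms)
        for name, per_platform in scores.items()
    }


def relative_to_top(index: Dict[str, float]) -> Dict[str, float]:
    """Rescale so the highest index becomes 100 (hypothetical interpretation of Table 5)."""
    top = max(index.values())
    return {name: 100.0 * value / top for name, value in index.items()}


# Toy, made-up scores for two hypothetical influencers.
toy = {
    "influencer_a": {"facebook": 90.0, "twitter": 80.0, "instagram": 70.0},
    "influencer_b": {"twitter": 60.0, "instagram": 90.0},  # no Facebook presence
}
index = equal_weight_index(toy)
ranked = sorted(relative_to_top(index).items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

Replacing the constant `weight` with learned per-platform weights is exactly the extension the conclusion proposes as future work.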

based on the Outreach, Footprint, Engagement and Sentiment of the influencer. These constructs are computed from attributes mined from the social media profiles of these influencers. Further, none of the existing studies explores the impact of the frequency and/or diversity of celebrities' social media use on their influencer index. The regression model used in the current study tries to identify whether these attributes play a significant role in computing the social media influencer index. The overall influencer score is observed to depend on the individual scores of the different platforms on which the influencers engage. The findings also indicate that more frequent social media usage increases the social media influencer index in most cases.

In addition, existing studies explore the impact of user sentiment on a firm's equity (Yu et al., 2013); an analogous effect can be expected for an influencer's index. The audience's reaction, in the form of comments and replies, is critical for influencers, and the study explores the effect of overall sentiment scores on the influencer index. Lastly, the study also tried to evaluate which social media platform is more dominant when calculating the influencer index. There were not enough instances to establish generalizable weights for each social media platform, so the current study could not provide sufficient evidence to answer this research question; this is an important area to explore in the future.

Table 6
Regression metrics.
Model                                     MAE    MSE     RMSE
Ordinary Least Squares (OLS)              5.72   77.97   8.83
Support Vector Regression (SVR)           4.92   80.75   8.98
K-Nearest Neighbours (k = 2)              3.67   37.56   6.12
Lasso (Alpha = 0.1)                       5.92   55.80   7.47
Ensemble Model (OLS, SVR, KNN, Lasso)     4.49   32.50   5.70

6.1. Contribution to existing knowledge

The growth of Web 3.0 has enhanced the means of interaction, communication and engagement among individuals. In the constant quest for higher-than-usual traction, social media platforms play a key role: the diffusion of information determines which piece of information cuts through the noise and influences a larger audience. The current study proposes a scoring mechanism for influencers, and its contribution is two-fold. Firstly, the study proposes multiple attributes from social media, categorized into six constructs, namely Overall Footprint (OF), Engagements & Outreach (EO), Hourly Engagement Velocity (HEV), Daily Engagement Velocity (DEV), Audience Sentiment (AS) and Posting Rate (PR). Secondly, the study uses regression modelling to compute influencer scores from the identified constructs. These constructs provide a holistic view of different aspects, including outreach and engagement, which can be useful in multiple use cases, and other studies in the literature can adopt them to understand social influencers across platforms. The current study

Fig. 9. Accuracy of the models.
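Table 6 and Fig. 9 compare the models on MAE, MSE and RMSE and on accuracy, with KNN (k = 2) and the ensemble performing best. The dependency-free sketch below shows the three error measures as usually defined, a k-nearest-neighbour regressor with k = 2, and an ensemble that simply averages the predictions of several models. It is an illustration under stated assumptions, not the authors' code: Euclidean distance for KNN and plain averaging for the ensemble are assumptions (this section does not say how the four models are combined), and all names and numbers are hypothetical. As a consistency check on Table 6, RMSE is the square root of MSE, and √32.50 ≈ 5.70 for the ensemble model.

```python
import math
from typing import List, Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Squared Error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Squared Error, i.e. the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[float],
                query: Sequence[float], k: int = 2) -> float:
    """Average the targets of the k training rows closest to `query` (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def average_ensemble(*model_predictions: Sequence[float]) -> List[float]:
    """Combine several models by averaging their predictions sample by sample."""
    return [sum(values) / len(values) for values in zip(*model_predictions)]


# Toy usage with made-up numbers (not the paper's data).
y_true = [10.0, 20.0, 30.0]
model_a = [12.0, 18.0, 33.0]
model_b = [9.0, 21.0, 29.0]
combined = average_ensemble(model_a, model_b)  # [10.5, 19.5, 31.0]
for label, pred in [("model_a", model_a), ("model_b", model_b), ("ensemble", combined)]:
    print(f"{label}: MAE={mae(y_true, pred):.2f} MSE={mse(y_true, pred):.2f} "
          f"RMSE={rmse(y_true, pred):.2f}")

# KNN with k = 2: the two closest rows (targets 0.0 and 2.0) are averaged.
print(knn_predict([[0.0], [1.0], [10.0]], [0.0, 2.0, 100.0], [0.4], k=2))  # 1.0
```

Because MSE and RMSE square each error, they penalise occasional large misses more than MAE does, which is why one model can have the lowest MAE (KNN) while another has the lowest MSE and RMSE (the ensemble).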

through the use of social media analytics.

6.2. Implications for practice

Our study does not directly provide insights for practice, but, based on correlations, it suggests use cases in which influencer indices may be useful. Influencer indices can be used for promotional purposes by brand companies or brand marketers, and by celebrities/consumers who are intensively associated with social media. This section is divided into two subsections, as follows:

6.2.1. Brand companies/brand marketers
With the rampant growth of social media usage, brands have started utilising these platforms to enhance customer engagement and reach a larger audience base. The literature provides evidence of the impact of social media use in domains such as influencer marketing and brand management. These platforms have become increasingly popular for facilitating engagement and collaborations, and they can drastically impact a brand's reputation (Kim & Ko, 2010; Kim & Ko, 2012). With the plethora of products and services available in the market from different brands, consumers' choice also increases (De Vries et al., 2012). Brands strive for presence, want to establish themselves for greater outreach, and thus invest heavily in influencer marketing. It is therefore very important for these brands to identify influencers who could market their products/services, which makes the current study relevant for brands selecting influencers for their portfolio. The research work on weighted feature finding and influence indices reveals the following:

• Detailed insights into social media engagement features and which features are significant for the final assignment of social influence indices;
• The possibility of interpretably identifying which celebrity makes a given post highly impactful on a specific social media platform (Facebook, Instagram or Twitter);
• Multiple regression models (OLS, SVM regression, KNN regression and Lasso regression) are used to build the final influence index model. The experiments show marketers that, among these base models, KNN regression provides the highest accuracy for social influence indexing;
• Finally, based on these models, an ensemble model is introduced for marketers. This model provides the highest accuracy of social influence index generation, 93.7%;
• Marketers can identify which social media platform is best for promotion in the case of a specifically selected celebrity.

Further, although the literature provides considerable evidence of research on the success of marketing activities on social media, little is known about which platform is best suited for influencer marketing when the paid marketing budget is restricted or limited.

6.2.2. Celebrities/consumers
This study investigates the relationship between celebrities' engagement on three popular social media platforms, and a large cohort of celebrities is included in this work for modelling purposes. Undoubtedly, celebrities on social media hold influence over millions of fans. With this in mind, we suggest a few use cases that seem useful from a celebrity's perspective:

• Social media influence can help celebrities maintain their status among the public.
• Celebrities can use social media to draw fans' attention to their social work, often by highlighting societal issues.
• Celebrities may learn which social media platform can spread their news most virally.
• Users/celebrities may inspire their followers to embrace their per-
• Celebrities may endorse a specific brand based on their social influence on a specific platform and can earn well by promoting the brand on social media.
• Individuals can help society bring about social change by using their social influence.

The current study provides influencer scores for different platforms, so the best-suited choice can be made based on a brand's needs and an influencer's score on a particular platform.

7. Conclusion and future research directions

The current study proposes a mechanism for measuring the influencer index across popular social media platforms, including Facebook, Twitter and Instagram. The study poses several research questions and, in light of them, computes a social influencer index. A set of 39 features that help determine the impact on consumers is modelled using a regression approach. These features are created for each social media platform and grouped under the sub-heads Overall Footprint (OF), Engagements & Outreach (EO), Hourly Engagement Velocity (HEV), Daily Engagement Velocity (DEV), Audience Sentiment (AS) and Posting Rate (PR). The features are analysed using the OLS, KNN, SVR and Lasso regression models, and an ensemble of these models is then adopted to compute a cumulative score in terms of influencer index. The findings indicate that engagement, outreach, sentiment and growth play a key role in determining influencers. The ensemble model outperforms the remaining approaches in terms of MSE, RMSE and accuracy, while KNN regression has the lowest MAE and an accuracy almost equal to that of the ensemble. Further, the study has implications across various domains of e-commerce, viral marketing (Petrescu and Korgaonkar, 2011), social media marketing (Akar and Topçu, 2011) and brand management (Baldus, 2018), wherein the identification of key information propagators is essential.

The current study does not identify the relative importance of different social media platforms when computing the influencer index; each platform is given equal weightage while modelling the constructs. One future research direction is therefore a weighted model that, in addition to the constructs, isolates the contribution of each platform when computing the score. Further, optimization techniques could be used to compute these scores: evolutionary intelligence approaches, including swarm intelligence and bio-inspired computing, could be incorporated to find the optimal values of these weights (Kar, 2016; Aswani et al., 2018a). Future studies can also integrate network metrics such as centrality, reciprocity, in-degree and out-degree to better understand a person's influence on their network. These network-related attributes can provide useful insights into how information propagates through influencers' social networks on various platforms (Aswani et al., 2018b). Also, a mapping to a personality framework such as the Big Five could be done to identify the personality traits of influencers. Based on their social media activities, influencers could be grouped by personality traits such as extroversion, neuroticism and openness to experience. This could help provide a generalised personality profile of influencers with higher or lower influencer indices, enhancing the model's adaptability in related domains and use cases.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://

References

Alalwan, A.A., Rana, N.P., Algharabat, R., 2018. Examining factors influencing Jordanian customers' intentions and adoption of internet banking: Extending UTAUT2 with risk.

J. Retail. Consum. Serv. 40, 125–138.
Agarwal, S., Mehta, S., 2018, August. Social influence maximization using genetic algorithm with dynamic probabilities. In: 2018 Eleventh International Conference on Contemporary Computing (IC3). IEEE, pp. 1–6.
Aggrawal, N., Ahluwalia, A., Khurana, P., Arora, A., 2017a. Brand analysis framework for online marketing: ranking web pages and analyzing popularity of brands on social media. Soc. Network Analy. Mining 7 (1), 21.
Aggrawal, N., Arora, A., Jain, A., Rathor, D., 2017b. Product diffusion pattern analysis model based on user's review of E-commerce application. In: Hybrid Intelligence for Social Networks. Springer, Cham, pp. 227–248.
Aggrawal, N., Arora, A., Anand, A., Irshad, M.S., 2018. View-count based modeling for YouTube videos and weighted criteria–based ranking. In: Advanced Mathematical Techniques in Engineering Sciences. CRC Press, pp. 149–160.
Akar, E., Topçu, B., 2011. An examination of the factors influencing consumers' attitudes toward social media marketing. J. Internet Commer. 10 (1), 35–67.
Alalwan, A.A., Rana, N.P., Dwivedi, Y.K., Algharabat, R., 2017. Social media in marketing: a review and analysis of the existing literature. Telematics Inf. 34 (7), 1177–1190.
Anagnostopoulos, A., Kumar, R., Mahdian, M., 2008, August. Influence and correlation in social networks. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 7–15.
Andrews, D.F., 1974. A robust method for multiple linear regression. Technometrics 16 (4), 523–531.
Anshary, M.A.K., Trilaksono, B.R., 2016. Tweet-based target market classification using ensemble method. J. ICT Res. Appl. 10 (2), 123–139.
Aswani, R., Ghrera, S.P., Kar, A.K., Chandra, S., 2017a. Identifying buzz in social media: a hybrid approach using artificial bee colony and k-nearest neighbors for outlier detection. Soc. Network Analy. Mining 7 (1), 38.
Aswani, R., Kar, A.K., Aggarwal, S., Ilavarasan, P.V., 2017b, November. Exploring content virality in facebook: a semantic based approach. In: Conference on e-Business, e-Services and e-Society. Springer, Cham, pp. 209–220.
Aswani, R., Kar, A.K., Ilavarasan, P.V., 2018a. Detection of spammers in twitter marketing: a hybrid approach using social media analytics and bio inspired computing. Inf. Syst. Front. 20 (3), 515–530.
Aswani, R., Kar, A.K., Ilavarasan, P.V., Dwivedi, Y.K., 2018b. Search engine marketing is not all gold: insights from Twitter and SEOClerks. Int. J. Inf. Manag. 38 (1), 107–116.
Bakshy, E., Hofman, J.M., Mason, W.A., Watts, D.J., 2011, February. Everyone's an influencer: quantifying influence on twitter. In: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining. ACM, pp. 65–74.
Baldus, B.J., 2018. Leveraging online communities to support the brand and develop the community. J. Internet Commer. 1–30.
Bana, R., Arora, A., 2018, August. Influence indexing of developers, repositories, technologies and programming languages on social coding community GitHub. In: 2018 Eleventh International Conference on Contemporary Computing (IC3). IEEE, pp. 1–6.
Basak, D., Pal, S., Patranabis, D.C., 2007. Support vector regression. Neural Information Processing-Letters and Reviews 11 (10), 203–224.
Berthon, P.R., Pitt, L.F., Plangger, K., Shapiro, D., 2012. Marketing meets Web 2.0, social media, and creative consumers: implications for international marketing strategy. Bus. Horiz. 55 (3), 261–271.
Booth, N., Matic, J.A., 2011. Mapping and leveraging influencers in social media to shape corporate brand perceptions. Corp. Commun. Int. J. 16 (3), 184–191.
Boss, G.J., Rick, A.H.I., Huertas, L.C.C., Duran, E.A.Z., 2018. U.S. Patent No. 9,996,846. U.S. Patent and Trademark Office, Washington, DC.
Brennan, R., Croft, R., 2012. The use of social media in B2B marketing and branding: an exploratory study. J. Cust. Behav. 11 (2), 101–115.
Byrne, E., Kearney, J., MacEvilly, C., 2017. The role of influencer marketing and social influencers in public health. Proc. Nutr. Soc. 76 (OCE3).
Cavalli, N., Costa, E., Ferri, P., Mangiatordi, A., Micheli, M., Pozzali, A., et al., 2011. Facebook influence on university students' media habits: qualitative results from a field research. In: Media in Transition-Unstable Platforms: the Promise and Peril of Transition, US.
Cha, M., Haddadi, H., Benevenuto, F., Gummadi, P.K., 2010. Measuring user influence in twitter: the million follower fallacy. ICWSM 10 (10–17), 30.
Chen, H., Chiang, R., Storey, V., 2012. Business intelligence and analytics: from big data to big impact. MIS Q. 36 (4), 1165–1188.
Chen, A., Lu, Y., Gupta, S., 2017. Enhancing the decision quality through learning from the social commerce components. J. Glob. Inf. Manag. 25 (1), 66–91.
Chen, C.P., 2013. Exploring personal branding on YouTube. J. Internet Commer. 12 (4), 332–347.
Childers, C.C., Lemon, L.L., Hoy, M.G., 2018. #sponsored #ad: agency perspective on influencer marketing campaigns. J. Curr. Issues Res. Advert. 1–17.
Choi, T.M., Chan, H.K., Yue, X., 2017. Recent development in big data analytics for business operations and risk management. IEEE Transactions on Cybernetics 47 (1), 81–92.
Craven, B.D., Islam, S.M., 2011. Ordinary Least-Squares Regression. Sage Publications, pp. 224–228.
Dai, Y., Kakkonen, T., Sutinen, E., 2011. MinEDec: a decision-support model that combines text mining technologies with two competitive intelligence analysis methods. International Journal of Computer Information Systems and Industrial Management Applications 3, 165–173.
De Vries, L., Gensler, S., Leeflang, P.S., 2012. Popularity of brand posts on brand fan pages: an investigation of the effects of social media marketing. J. Interact. Mark. 26 (2), 83–91.
Dix, S., Phau, I., Pougnet, S., 2010. “Bend it like Beckham”: the influence of sports celebrities on young adult consumers. Young Consum. 11 (1), 36–46.
Doyle, S., 2008. Social network analysis in the Telco sector—marketing applications. J. Database Mark. Cust. Strategy Manag. 15 (2), 130–134.
Dwivedi, Y.K., Kapoor, K.K., Chen, H., 2015. Social media marketing and advertising. Market. Rev. 15 (3), 289–309.
Dwivedi, Y.K., Rana, N.P., Tajvidi, M., Lal, B., Sahu, G.P., Gupta, A., 2017, March. Exploring the role of social media in e-government: an analysis of emerging literature. In: Proceedings of the 10th International Conference on Theory and Practice of Electronic Governance. ACM, pp. 97–106.
Farook, F.S., Abeysekara, N., 2016. Influence of social media marketing on customer engagement. International Journal of Business and Management Invention 5, 115–125.
Fraser, B.P., Brown, W.J., 2002. Media, celebrities, and social influence: identification with Elvis Presley. Mass Commun. Soc. 5 (2), 183–206.
Freberg, K., Graham, K., McGaughey, K., Freberg, L.A., 2011. Who are the social media influencers? A study of public perceptions of personality. Publ. Relat. Rev. 37 (1), 90–92.
Gillin, P., 2008. New media, new influencers and implications for the public relations profession. Journal of New Communications Research 2 (2), 1–10.
Goeldi, A., 2011. Website network and advertisement analysis using analytic measurement of online social media content. U.S. Patent No. 7,974,983.
Grover, P., Kar, A.K., Dwivedi, Y.K., Janssen, M., 2018. Polarization and acculturation in US Election 2016 outcomes–Can twitter analytics predict changes in voting preferences. Technol. Forecast. Soc. Change.
Hanna, R., Rohm, A., Crittenden, V.L., 2011. We're all connected: the power of the social media ecosystem. Bus. Horiz. 54 (3), 265–273.
Hastie, T., Tibshirani, R., 1996. Discriminant adaptive nearest neighbor classification and regression. In: Advances in Neural Information Processing Systems, pp. 409–415.
He, W., 2013. Examining students' online interaction in a live video streaming environment using data mining and text mining. Comput. Hum. Behav. 29 (1), 90–102.
Hearn, A., Schoenhoff, S., 2016. From Celebrity to Influencer. Wiley, London, pp. 194–212.
Huang, J., Zhang, J., Li, Y., Lv, Z., 2014. Business value of enterprise micro-blogging: empirical study from Weibo.com in Sina. J. Glob. Inf. Manag. 22 (3), 32–56.
Ibrahim, H., Targio, A., Chang, V., Anuar, N., Adewole, K., Yaqoob, I., Gani, A., Ahmed, E., Chiroma, H., 2016. The role of big data in smart city. Int. J. Inf. Manag. 36 (5), 748–758.
Jamali, S., Rangwala, H., 2009, November. Digging digg: comment mining, popularity prediction, and social network analysis. In: Web Information Systems and Mining, 2009. WISM 2009. International Conference on. IEEE, pp. 32–38.
Joseph, N., Sultan, A., Kar, A.K., Ilavarasan, P.V., 2018, October. Machine learning approach to analyze and predict the popularity of tweets with images. In: Conference on e-Business, e-Services and e-Society. Springer, Cham, pp. 567–576.
Kamboj, S., Sarmah, B., Gupta, S., Dwivedi, Y.K., 2018. Examining branding co-creation in brand communities on social media: applying paradigm of Stimulus-Organism-Response. Int. J. Inf. Manag. 39 (April), 169–185.
Kaplan, A.M., Haenlein, M., 2010. Users of the world, unite! The challenges and opportunities of Social Media. Bus. Horiz. 53 (1), 59–68.
Kapoor, K.K., Dwivedi, Y.K., 2015. Metamorphosis of Indian electoral campaigns: Modi's social media experiment. Int. J. Indian Cult. Bus. Manag. 11 (4), 496–516.
Kapoor, K.K., Tamilmani, K., Rana, N.P., Patil, P., Dwivedi, Y.K., Nerur, S., 2018. Advances in social media research: past, present and future. Inf. Syst. Front. 20 (3), 531–558.
Kar, A.K., 2016. Bio inspired computing–A review of algorithms and scope of applications. Expert Syst. Appl. 59, 20–32.
Kaushal, R., Chandok, S., Jain, P., Dewan, P., Gupta, N., Kumaraguru, P., 2017, September. Nudging nemo: helping users control linkability across social networks. In: International Conference on Social Informatics. Springer, Cham, pp. 477–490.
Kelly, B., Vandevijvere, S., Freeman, B., Jenkin, G., 2015. New media but same old tricks: food marketing to children in the digital age. Current Obesity Reports 4 (1), 37–45.
Kietzmann, J.H., Hermkens, K., McCarthy, I.P., Silvestre, B.S., 2011. Social media? Get serious! Understanding the functional building blocks of social media. Bus. Horiz. 54 (3), 241–251.
Kim, A.J., Ko, E., 2010. Impacts of luxury fashion brand's social media marketing on customer relationship and purchase intention. J. Global Fashion Mar. 1 (3), 164–171.
Kim, A.J., Ko, E., 2012. Do social media marketing activities enhance customer equity? An empirical study of luxury fashion brand. J. Bus. Res. 65 (10), 1480–1486.
Kiss, C., Bichler, M., 2008. Identification of influencers—measuring influence in customer networks. Decis. Support Syst. 46 (1), 233–253.
Klostermann, J., Plumeyer, A., Böger, D., Decker, R., 2018. Extracting brand information from social networks: integrating image, text, and social tagging data. Int. J. Res. Mark. 35 (4), 538–556.
Lagrée, P., Cappé, O., Cautis, B., Maniu, S., 2017, November. Effective large-scale online influence maximization. In: 2017 IEEE International Conference on Data Mining (ICDM). IEEE, pp. 937–942.
Lipsman, A., Mudd, G., Rich, M., Bruich, S., 2012. The power of “like”: how brands reach (and influence) fans through social-media marketing. J. Advert. Res. 52 (1), 40–52.
Lo, F.Y., Campos, N., 2018. Blending internet-of-things (IoT) solutions into relationship marketing strategies. Technol. Forecast. Soc. Change 137, 10–18.
Lou, C., Yuan, S., 2018. Influencer marketing: how message value and credibility affect consumer trust of branded content on social media. J. Interact. Advert. 1–45 (just-accepted).
Ma, Z., Sun, A., Cong, G., 2013. On predicting the popularity of newly emerging hashtags in Twitter. J. Am. Soc. Inf. Sci. Technol. 64 (7), 1399–1410.
Macarthy, A., 2018. 500 Social Media Marketing Tips: Essential Advice, Hints and Strategy for Business Facebook, Twitter, Pinterest, Google+, YouTube, Instagram, LinkedIn, and More! CreateSpace Independent Publishing Platform.

Mallipeddi, R., Kumar, S., Sriskandarajah, C., Zhu, Y., 2018. A Framework for Analyzing Influencer Marketing in Social Networks: Selection and Scheduling of Influencers. Fox School of Business Research Paper. pp. 18–042.
Mangold, W.G., Faulds, D.J., 2009. Social media: the new hybrid element of the promotion mix. Bus. Horiz. 52 (4), 357–365.
McCormick, K., 2016. Celebrity endorsements: influence of a product-endorser match on Millennials attitudes and purchase intentions. J. Retail. Consum. Serv. 32, 39–45.
McNeill, A.R., Briggs, P., 2014, April. Understanding Twitter influence in the health domain: a social-psychological contribution. In: Proceedings of the 23rd International Conference on World Wide Web. ACM, pp. 673–678.
Meraz, S., 2009. Is there an elite hold? Traditional media to social media agenda setting influence in blog networks. J. Computer-Mediated Commun. 14 (3), 682–707.
Mills, A.J., 2012. Virality in social media: the SPIN framework. J. Public Aff. 12 (2), 162–169.
Mittal, V., Kaul, A., Gupta, S.S., Arora, A., 2017. Multivariate features based instagram post analysis to enrich user experience. Procedia Computer Science 122, 138–145.
Morone, F., Min, B., Bo, L., Mari, R., Makse, H.A., 2016. Collective influence algorithm to find influencers via optimal percolation in massively large social media. Sci. Rep. 6, 30062.
Nabi, N., O'Cass, A., Siahtiri, V., 2019. Status consumption in newly emerging countries: the influence of personality traits and the mediating role of motivation to consume conspicuously. J. Retail. Consum. Serv. 46, 173–178.
Neystadt, E.J., Karidi, R., Weisfeild, Y.T., Tennenholtz, M., Radinsky, K., Varshavsky, R., 2012. U.S. Patent Application No. 13/042,463.
Parsons, A.L., Lepkowska-White, E., 2018. Social media marketing management: a conceptual framework. J. Internet Commer. 1–15.
Petrescu, M., Korgaonkar, P., 2011. Viral advertising: definitional review and synthesis. J. Internet Commer. 10 (3), 208–226.
Pham, T., Simoiu, C., 2016. Unsupervised Learning for Effective User Engagement on Social Media. arXiv preprint arXiv:1611.03894.
Pike, S., Gentle, J., Kelly, L., Beatson, A., 2018. Tracking brand positioning for an emerging destination: 2003 to 2015. Tourism Hospit. Res. 18 (3), 286–296.
Pintado, T., Sanchez, J., Carcelén, S., Alameda, D., 2017. The effects of digital media advertising content on message acceptance or rejection: brand trust as a moderating factor. J. Internet Commer. 16 (4), 364–384.
Popescu, P.S., Mihaescu, M.C., Popescu, E., Mocanu, M., 2016, July. Using ranking and multiple linear regression to explore the impact of social media engagement on student performance. In: Advanced Learning Technologies (ICALT), 2016 IEEE 16th International Conference on. IEEE, pp. 250–254.
Prentice, C., Han, X.Y., Hua, L.L., Hu, L., 2019. The influence of identity-driven customer engagement on purchase intention. J. Retail. Consum. Serv. 47, 339–347.
Qualman, E., 2010. Socialnomics: How Social Media Transforms the Way We Live and Do Business. John Wiley & Sons.
Rathore, A.K., Ilavarasan, P.V., Dwivedi, Y.K., 2016. Social media content and product co-creation: an emerging paradigm. J. Enterp. Inf. Manag. 29 (1), 7–18.
Razis, G., Anagnostopoulos, I., Zeadally, S., 2018. Modeling Influence with Semantics in Social Networks: a Survey. arXiv preprint arXiv:1801.09961.
Romero, D.M., Galuba, W., Asur, S., Huberman, B.A., 2011, September. Influence and passivity in social media. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, Berlin, Heidelberg, pp. 18–33.
Shareef, M.A., Dwivedi, Y.K., Kumar, V., Kumar, U., 2017. Content design of advertisement for consumer exposure: Mobile marketing through short messaging service. Int. J. Inf. Manag. 37 (4), 257–268.
Shareef, M.A., Mukerji, B., Alryalat, M.A.A., Wright, A., Dwivedi, Y.K., 2018. Advertisements on Facebook: identifying the persuasive elements in the development of positive attitudes in consumers. J. Retail. Consum. Serv. 43 (July), 258–268.
Shareef, M.A., Mukerji, B., Dwivedi, Y.K., Rana, N.P., Islam, R., 2019. Social media marketing: comparative effect of advertisement sources. J. Retail. Consum. Serv. 46, 58–69.
Shiau, W.-L., Dwivedi, Y.K., Yang, H.-S., 2017. Co-citation and cluster analyses of extant literature on social networks. Int. J. Inf. Manag. 37 (5), 390–399.
Shiau, W.-L., Dwivedi, Y.K., Lai, H.-H., 2018. Examining the core knowledge on Facebook. Int. J. Inf. Manag. 43, 52–63.
Sokolova, K., Kefi, H., 2019. Instagram and YouTube bloggers promote it, why should I buy? How credibility and parasocial interaction influence purchase intentions. J. Retail. Consum. Serv.
Taneja, A., Arora, A., 2018. Cross domain recommendation using multidimensional tensor factorization. Expert Syst. Appl. 92, 304–316.
Taneja, A., Arora, A., 2019. Modeling user preferences using neural networks and tensor factorization model. Int. J. Inf. Manag. 45, 132–148.
Taylor, M., Reilly, D., Wren, C., 2018. Internet of things support for marketing activities. J. Strateg. Mark. 1–12.
Tess, P.A., 2013. The role of social media in higher education classes (real and virtual)–A literature review. Comput. Hum. Behav. 29 (5), A60–A68.
Tibshirani, R., 1996. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 267–288.
Trusov, M., Bodapati, A.V., Bucklin, R.E., 2010. Determining influential users in internet social networks. J. Mark. Res. 47 (4), 643–658.
Vakeel, K.A., Panigrahi, P.K., 2018. Social media usage in E-government: mediating role of government participation. J. Glob. Inf. Manag. 26 (1), 1–19.
Weeks, B.E., Ardèvol-Abreu, A., Gil de Zúñiga, H., 2017. Online influence? Social media use, opinion leadership, and political persuasion. Int. J. Public Opin. Res. 29 (2), 214–239.
Wiedmann, K.P., Hennigs, N., Langner, S., 2010. Spreading the word of fashion: identifying social influencers in fashion marketing. J. Global Fashion Mar. 1 (3), 142–153.
Woodcock, N., Green, A., Starkey, M., 2011. Social CRM as a business strategy. J. Database Mark. Cust. Strategy Manag. 18 (1), 50–64.
Yang, A., Kent, M., 2014. Social media and organizational visibility: a sample of Fortune 500 corporations. Publ. Relat. Rev. 40 (3), 562–564.
Yu, Y., Duan, W., Cao, Q., 2013. The impact of social and conventional media on firm equity value: a sentiment analysis approach. Decis. Support Syst. 55 (4), 919–926.
Zwick, D., Denegri-Knott, J., 2018. Biopolitical Marketing and Technologies of Enclosure. The SAGE Handbook of Consumer Culture, pp. 333–348.


