Web Traffic Time Series Forecasting Using ARIMA Model
https://doi.org/10.22214/ijraset.2022.42800
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue V May 2022- Available at www.ijraset.com
Abstract: Web traffic prediction is a major concern because unanticipated traffic can severely disrupt the operation of a website.
Predicting future values of a time series is a difficult task, which has made it an active research topic. A surge in web traffic
can crash a site or make it load very slowly. Such disruptions inconvenience users, who may then rate the site lower and move
to another site, which hurts the business. We have implemented a forecasting model to predict web traffic, using the ARIMA
model for web traffic time series forecasting. The model uses features such as page name, date visited, and number of visits
to improve prediction accuracy.
Keywords: Web traffic prediction, ARIMA model, Time series forecasting, Data Collection and Feature Understanding.
I. INTRODUCTION
People who work for web service providers need to know how much traffic a web server is receiving; otherwise, customers
may face long waits and leave the site. This is a difficult task, however, because it requires accurate predictions of user
behaviour, which is largely random. In this article, we show how to build an architecture that takes source data and uses it to
predict how many people will view a given page at a given time. Depending on the website, web applications handle HTTP
GET requests, media applications deliver content based on what the user wants, and so on.
Request time has a large impact on how the end user perceives the quality of the service. Many people have abandoned
platforms because they took too long to respond. The response time is the time between when the application receives a
request and when it sends back the answer, and it cannot be eliminated. In web services, the response time is sometimes
longer than customers expect. Developers have been able to identify when the response time becomes too long, a condition
known as web congestion. A time series of dates and page-view counts is a natural representation for this problem. The
purpose of this research is to design a forecasting model that predicts web traffic from features such as page name, visit date,
and the number of visits to each page over a year. As more people around the world gain access to the internet, an increase in
traffic to practically all websites has become unavoidable. This increase can bring a slew of issues, and the company that deals
most effectively with variations in traffic will come out ahead [7]. Most people have experienced a crashed site or a very slow
loading time when many people are using a website at once; for example, shopping websites may crash just before festivals
because more people try to log in than the site was designed to handle. This inconveniences users, who may give the site a
lower rating and switch to another site, reducing the business of the first. A traffic management approach or plan should
therefore be implemented to limit the risk of such failures, which could jeopardise the company's existence. Until recently,
there was no need for such tools because most servers could handle the traffic influx. However, the smartphone era has
increased demand for some websites to such a level that businesses have been unable to respond quickly enough to maintain
a consistent level of customer service.
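The time series representation described above (page name, visit date, number of visits) can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `build_series` and the sample rows are hypothetical and do not come from the paper's dataset.

```python
from collections import defaultdict
from datetime import date, timedelta


def build_series(records):
    """Group (page, ISO date, visits) records into one date-ordered series per page."""
    by_page = defaultdict(dict)
    for page, day, visits in records:
        by_page[page][date.fromisoformat(day)] = visits

    series = {}
    for page, counts in by_page.items():
        start, end = min(counts), max(counts)
        days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
        # Days without a record are kept as None so that gaps stay visible.
        series[page] = [(d, counts.get(d)) for d in days]
    return series


# Hypothetical sample rows; real rows would come from the traffic dataset.
sample = [
    ("Main_Page", "2016-01-01", 120),
    ("Main_Page", "2016-01-03", 150),
    ("Main_Page", "2016-01-02", 90),
    ("Python", "2016-01-01", 40),
]
series = build_series(sample)
print([visits for _, visits in series["Main_Page"]])
```

Keeping missing days as explicit `None` entries is a design choice of this sketch: a forecasting model needs evenly spaced observations, so gaps should be handled deliberately rather than silently dropped.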
II. LITERATURE SURVEY
While building its prediction model, the system in [1] rebuilt the existing model and added new features, which increased
model efficiency. The new features were used in various combinations:
1) The median over a window of specified length in each time series, used as an independent feature to capture weekly,
monthly, quarterly, and yearly page popularity.
2) The median of medians over time windows of variable length based on the golden ratio.
To determine the importance of each feature, the study [1] analysed the obtained results and compared the accuracies in
various cases. As a next step, its authors planned to tune the parameters of the existing model to obtain better results. The
study in [2] set out to find the most suitable time-series forecasting model for predicting future traffic data when a sufficiently
large dataset is available.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 2447
With this goal in mind, the study began by searching for prediction-based models that could predict data values. On further
research, however, the authors found that the task was one of forecasting rather than prediction, and they focused on that. The
study in [2] came across so many time-series forecasting models that the work was both tedious and enjoyable. The paper [3]
proposed a time series forecasting technique that predicts internet traffic from past values. Forecasting techniques such as
ARIMA are used extensively in the literature, but they are useful mostly for time series that are linear in nature. Neural
networks such as RNNs, on the other hand, are very useful for forecasting nonlinear time series. The proposed technique
applies the Discrete Wavelet Transform, using a high-pass filter and a low-pass filter to separate the time series into linear and
nonlinear parts. The proposed technique [3] is reported to clearly outperform ARIMA and RNN, and because of its simplicity
it can easily be employed at data centres. The paper [4] put forward a new engineering approach to predicting the campus
network exit-link traffic trend. It predicts that, given enough historical data, EPTS can have the following effects in network
traffic forecasting:
a) It predicts the network exit-link traffic trend from historical network traffic data, so that network resource planning can be
laid out in advance.
b) It is easy to implement, and its computational complexity is acceptable.
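The wavelet split described for the technique in [3] (a low-pass filter and a high-pass filter separating a series into two parts) can be sketched with a single-level Haar transform. The choice of the Haar wavelet, the single decomposition level, and the function names are assumptions made for this example; the survey does not state which wavelet [3] uses.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / s for a, b in pairs]  # low-pass filter output
    detail = [(a - b) / s for a, b in pairs]  # high-pass filter output
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the signal from its coefficients."""
    s = math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def split_components(signal):
    """Smooth part (low-pass only) and rough part (high-pass only); they sum to the signal."""
    approx, detail = haar_dwt(signal)
    zeros = [0.0] * len(approx)
    return haar_idwt(approx, zeros), haar_idwt(zeros, detail)


# Hypothetical traffic counts with a short burst in the middle.
traffic = [10.0, 12.0, 9.0, 11.0, 30.0, 28.0, 12.0, 10.0]
smooth, rough = split_components(traffic)
print(smooth)
print(rough)
```

In a hybrid scheme of the kind the survey describes, the smooth part would be passed to a linear model such as ARIMA and the rough part to a nonlinear model such as an RNN, with the two forecasts added together.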
The paper [5] compares the traffic flow forecasting performance of the LSTM network, the BPNN model, and the ARIMA
model on time series captured at a single point. The proposed LSTM network can accurately predict traffic flow from
relatively stable time series under normal conditions. However, the traffic system on roads is stochastic and complex, and it is
often affected by abnormal factors such as bad weather, traffic accidents, and large events.
| TITLE | PUBLICATION AND AUTHOR | TECHNICAL DETAILS |
|---|---|---|
| Web Traffic Prediction of Wikipedia Pages | 2018 IEEE International Conference on Big Data (Big Data) [1]; Navyasree Petluri, Eyhab Al-Masri | In building the prediction model, the system successfully rebuilt the existing model and added new features to observe improvements in model efficiency. The new features were applied in different combinations: 1) median of a specified window length in each time series as an independent feature for capturing weekly, monthly, quarterly, and yearly page popularity; 2) median of medians of variable time frame windows based on the golden ratio. The study analysed the obtained results and compared the accuracies in different cases to determine the importance of each feature. As a next step, the authors planned to tune parameters in the existing model to obtain better results. |
| Traffic Forecasting using Time-Series Analysis | 2021 6th International Conference on Inventive Computation Technologies (ICICT) [2]; Mohammad Asifur Rahman Shuvo, Muhtadi Zubair, Afsara Tahsin Purnota, Sarowar Hossain, Muhammad Iqbal Hossain | The study wanted to find the most suitable time-series forecasting model for forecasting future traffic data when a large enough dataset is provided. With this goal in mind, the study began to search for prediction-based models that would predict data values. On further research, however, the authors found that the task was forecasting rather than prediction, and they focused on that. The study came across so many time-series forecasting models that the work was both tedious and fun. |
| Predicting Computer Network Traffic: A Time Series Forecasting Approach Using DWT, ARIMA and RNN | 2018 Eleventh International Conference on Contemporary Computing (IC3) [3]; Rishabh Madan, Partha Sarathi Mangipudi | The paper proposed a time series forecasting technique to predict internet traffic from past values. Forecasting techniques such as ARIMA are used extensively in the literature, but they are useful mostly for time series that are linear in nature. Neural networks such as RNNs, on the other hand, are very useful for forecasting nonlinear time series. The proposed technique uses the Discrete Wavelet Transform, with a high-pass filter and a low-pass filter, to produce linear and nonlinear parts of the time series. The proposed technique clearly outperforms ARIMA and RNN, and because of its simplicity it can easily be employed at data centres. |
| An Engineering Approach to Prediction of Network Traffic Based on Time-Series Model | 2009 International Joint Conference on Artificial Intelligence [4]; Fu-Ke Shen, Wei Zhang, Pan Chang | The paper put forward a new engineering approach to predicting the campus network exit-link traffic trend, and it predicts that EPTS can have the following effects in network traffic forecasting given enough historical data: 1) it predicts the network exit-link traffic trend from historical network traffic data, so that network resource planning can be laid out in advance; 2) it is easy to implement, and its computational complexity is acceptable. |
| Traffic Flow Forecast Through Time Series Analysis Based on Deep Learning | IEEE Access, 2020 [5]; Jianhu Zheng, Mingfang Huang | This paper compares the traffic flow forecasting performance of the LSTM network, the BPNN model, and the ARIMA model on time series captured at a single point. The proposed LSTM network can accurately predict traffic flow from relatively stable time series under normal conditions. However, the traffic system on roads is stochastic and complex, and it is often affected by abnormal factors such as bad weather, traffic accidents, and large events. |
They become inadequate when large amounts of data are available, because a single model can then learn from several similar
time series at the same time. Furthermore, even more advanced models, such as neural networks (NNs), benefit as much as
feasible from access to large amounts of data [10,11]. Recurrent Neural Networks (RNNs) play a role in this new field of
growing scientific interest in NNs. With this form of neural network, which specialises in sequence prediction problems,
results not previously seen in language and time series analysis are beginning to be achieved [18].
RNNs, on the other hand, have major memory issues, which were remedied when the LSTM was introduced. In addition to the
regular hidden state of an RNN, this type of RNN has an internal memory (the cell state), which makes it easier to avoid
vanishing or exploding gradients when training LSTMs [19]. LSTMs can be used in a predictive setting because time series
often have seasonal components. If a monthly time series exhibits yearly seasonality, for example, the value of the same
month in the previous year is especially useful for predicting the value of the next month. Suilin et al. did excellent work on
the Kaggle challenge for Wikipedia web traffic forecasting, which incorporated this idea [20]. Although this dataset has been
widely used for predicting time series related to online traffic, it has not been used to build an LSTM with minimal data, since
researchers believe that alternative models, such as ARIMA, are more efficient in these situations. Other studies of TSF have
resulted in more complex prediction algorithms that do not account for the seasonal component [21]. For multivariate
forecasting problems, Qin et al. presented an RNN in which different weights are assigned to the various driving data
according to how much they contribute to the forecast at each time step. To validate this model, the authors compared it with
ARIMA, NARX RNN, Encoder Decoder, Attention RNN, Input Attention RNN, and the Dual Stage Attention RNN [22]. The
model presented by Qin et al. has been studied more recently: the authors of [23] claimed that their model can cope with the
fundamental properties of spatial-temporal series. In recent years, researchers have been stacking RNNs or LSTMs to reach
the required results in FTS problems. However, we found a gap in the literature regarding FTS prediction models for time
series with little data [24]. Because of this gap in the state of the art, we suggest a supervised architecture based on LSTM that
is trained through distributed data parallelism and follows the Downpour technique.
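The seasonality idea mentioned above (the value from the same month of the previous year is informative for the coming month) can be illustrated with a seasonal lag. The sketch below is not an LSTM and is not taken from any of the cited works; it only shows, under the assumption of a fixed season length, how a seasonal-lag feature and a seasonal-naive baseline forecast could be built. All names are hypothetical.

```python
def seasonal_lag_pairs(values, season_length):
    """(feature, target) pairs where the feature is the value one season earlier."""
    return [
        (values[t - season_length], values[t])
        for t in range(season_length, len(values))
    ]


def seasonal_naive_forecast(values, season_length, horizon):
    """Forecast each future step with the value observed one season earlier."""
    if len(values) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = values[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]


# Hypothetical monthly page views for two years (season length of 12 months).
monthly = [100, 90, 95, 110, 130, 160, 170, 165, 140, 120, 105, 150,
           104, 93, 99, 115, 136, 166, 178, 171, 146, 125, 109, 157]
print(seasonal_lag_pairs(monthly, 12)[:3])
print(seasonal_naive_forecast(monthly, 12, 3))
```

A sequence model such as an LSTM could take the lagged value as one additional input feature, and the seasonal-naive forecast is a simple baseline against which a learned model can be compared.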
[Figure: block diagram with the blocks Input Dataset, Data Preprocessing, Analysis, Apply forecasting model, and System]
2) Databases
The database is used mainly to store user details such as username and password.
The tool used for database functions was MySQL GUI Browser.
B. Limitations
The server machine must be running at all times.
VI. CONCLUSION
The primary objective of our research is to develop a consistent forecasting model for predicting the future traffic of Wikipedia
pages. To validate our prediction model, we apply the ARIMA model to the Web Traffic Time Series Forecasting dataset. We
trained the model on a year of data, using features such as page name, visit date, and the number of visits per page, to predict
future web traffic.
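To make the ARIMA approach concrete, the following is a minimal sketch of an ARIMA(1,1,0) forecast written with the Python standard library only. The order (1,1,0), the absence of a constant and of a moving-average term, and the function names are assumptions made for this illustration; the paper does not state which order or library was used, and a full implementation would normally rely on a dedicated statistics package.

```python
def difference(values):
    """First-order differencing: the 'I' (d = 1) step of ARIMA."""
    return [b - a for a, b in zip(values, values[1:])]


def fit_ar1(diffs):
    """Least-squares AR(1) coefficient without intercept, fitted on the differenced series."""
    num = sum(prev * cur for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    return num / den if den else 0.0


def forecast_arima_110(values, horizon):
    """Forecast `horizon` steps with ARIMA(1,1,0): AR(1) on differences, then re-integrate."""
    if len(values) < 3:
        raise ValueError("need at least three observations")
    diffs = difference(values)
    phi = fit_ar1(diffs)
    level, last_diff = values[-1], diffs[-1]
    forecasts = []
    for _ in range(horizon):
        last_diff = phi * last_diff  # predicted next difference
        level += last_diff           # undo the differencing
        forecasts.append(level)
    return forecasts


# Hypothetical daily visit counts for one page.
visits = [10, 12, 14, 16]
print(forecast_arima_110(visits, 2))
```

Differencing removes the trend so that the autoregressive part models only the day-to-day changes; the forecasts are then accumulated back onto the last observed level.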
REFERENCES
[1] Navyasree Petluri and Eyhab Al-Masri, "Web Traffic Prediction of Wikipedia Pages," 2018 IEEE International Conference on Big Data (Big Data).
[2] Mohammad Asifur Rahman Shuvo, Muhtadi Zubair, Afsara Tahsin Purnota, Sarowar Hossain, and Muhammad Iqbal Hossain, "Traffic Forecasting Using
Time-Series Analysis," 2021 6th International Conference on Inventive Computation Technologies (ICICT).
[3] Rishabh Madan and Partha Sarathi Mangipudi, "Predicting Computer Network Traffic: A Time Series Forecasting Approach Using DWT, ARIMA and RNN,"
2018 Eleventh International Conference on Contemporary Computing (IC3).
[4] Fu-Ke Shen, Wei Zhang, and Pan Chang, "An Engineering Approach to Prediction of Network Traffic Based on Time-Series Model," 2009 International Joint
Conference on Artificial Intelligence.
[5] Jianhu Zheng and Mingfang Huang, "Traffic Flow Forecast Through Time Series Analysis Based on Deep Learning," IEEE Access, 2020.
[6] Montero-Manso, P.; Athanasopoulos, G.; Hyndman, R.J.; Talagala, T.S. FFORMA: Feature-based forecast model averaging. Int. J. Forecast. 2020, 36, 86–92.
[7] Rangapuram, S.S.; Seeger, M.W.; Gasthaus, J.; Stella, L.; Wang, Y.; Januschowski, T. Deep state space models for time series forecasting. Adv. Neural Inf.
Process. Syst. 2018, 31, 7785–7794.
[8] Tealab, A. Time series forecasting using artificial neural networks methodologies: A systematic review. Future Comput. Inform. J. 2018, 3, 334–340.
[9] Tyralis, H.; Papacharalampous, G. Variable selection in time series forecasting using random forests. Algorithms 2017, 10, 114.
[10] Chen, W.C.; Chen, W.H.; Yang, S.Y. A big data and time series analysis technology-based multi-agent system for smart tourism. Appl. Sci. 2018, 8, 947.
[11] Boone, T.; Ganeshan, R.; Jain, A.; Sanders, N.R. Forecasting sales in the supply chain: Consumer analytics in the big data era. Int. J. Forecast. 2019, 35, 170–180.
[12] Chen, D.; Gao, M.; Liu, A.; Chen, M.; Zhang, Z.; Feng, Y. A Recurrent Neural Network Based Approach for Web Service QoS Prediction. In Proceedings of
the 2019 2nd International Conference on Artificial Intelligence and Big Data (ICAIBD), Chengdu, China, 25–28 May 2019; pp. 350–357.
[13] Zhou, K.; Wang, W.; Huang, L.; Liu, B. Comparative study on the time series forecasting of web traffic based on statistical model and Generative Adversarial
model. Knowl.-Based Syst. 2020, 213, 106467.
[14] Makridakis, S.; Spiliotis, E.; Assimakopoulos, V. The M4 Competition: 100,000 time series and 61 forecasting methods. Int. J. Forecast. 2020, 36, 54–74.
[15] Yang, Y.; Lu, S.; Zhao, H.; Ju, X. Predicting Monthly Pageview of Wikipedia Pages by Neighbor Pages. In Proceedings of the 2020 3rd International
Conference on Big Data Technologies, Qingdao, China, 18–20 September 2020; pp. 112–115.
[16] Bojer, C.S.; Meldgaard, J.P. Kaggle forecasting competitions: An overlooked learning opportunity. Int. J. Forecast. 2020.
[17] Fry, C.; Brundage, M. The M4 Forecasting Competition-A Practitioner’s View. Int. J. Forecast. 2019.
[18] De Gooijer, J.G.; Hyndman, R.J. 25 years of time series forecasting. Int. J. Forecast. 2006, 22, 443–473.
[19] Madan, R.; Sarathi Mangipudi, P. Predicting computer network traffic: A time series forecasting approach using DWT, ARIMA and RNN. In Proceedings of the
2018 Eleventh International Conference on Contemporary Computing (IC3), Noida, India, 2–8 August 2018; pp. 1–5.
[20] Le, P.; Zuidema, W. Quantifying the vanishing gradient and long-distance dependency problem in recursive neural networks and recursive LSTMs. arXiv 2016,
arXiv:1603.00423.
[21] Suilin, A. kaggle-web-traffic. 2017. Available online: https://github.com/Arturus/kaggle-web-traffic/ (accessed on 19 November 2018).
[22] Cinar, Y.G.; Mirisaee, H.; Goswami, P.; Gaussier, E.; Aït-Bachir, A.; Strijov, V. Position-based content attention for time series forecasting with
sequence-to-sequence RNNs. In Proceedings of the International Conference on Neural Information Processing, Guangzhou, China, 14–18 November 2017;
Springer: Cham, Switzerland, 2017; pp. 533–544.
[23] Qin, Y.; Song, D.; Chen, H.; Cheng, W.; Jiang, G.; Cottrell, G. A dual-stage attention-based recurrent neural network for time series prediction. arXiv 2017,
arXiv:1704.02971.
[24] Liang, Y.; Ke, S.; Zhang, J.; Yi, X.; Zheng, Y. Geoman: Multi-level attention networks for geo-sensory time series prediction. In Proceedings of the 2018
International Joint Conference on Artificial Intelligence (IJCAI 2018), Stockholm, Sweden, 13–19 July 2018; pp. 3428–3434.
[25] Smagulova, K.; James, A.P. A survey on LSTM memristive neural network architectures and applications. Eur. Phys. J. Spec. Top. 2019, 228, 2313–2324.
[26] Miyaguchi, A.; Chakrabarti, S.; Garcia, N. Forecasting Wikipedia Page Views with Graph Embeddings. 2019. Available online:
http://cs229.stanford.edu/proj2019aut/data/assignment_308832_raw/26647399.pdf (accessed on 30 November 2020).
[27] Wunnava, V.P. Exploration of Wikipedia traffic data to analyze the relationship between multiple pages. Master’s Thesis, University of North Carolina, Chapel
Hill, NC, USA, May 2020.
[28] Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018,
arXiv:1803.01271.
[29] Srinivasan, A.; Jain, A.; Barekatain, P. An analysis of the delayed gradients problem in asynchronous sgd. In Proceedings of the 6th International Conference
on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
[30] Dean, J.; Corrado, G.; Monga, R.; Chen, K.; Devin, M.; Mao, M.; Le, Q.V.; Mao, M.Z.; Ranzato, M.; Senior, A.; et al. Large scale distributed deep networks.
Adv. Neural Inf. Process. Syst. 2012, 25, 1223–1231.
[31] Talyansky, R.; Kisilev, P.; Melamed, Z.; Peterfreund, N.; Verner, U. Asynchronous SGD without gradient delay for efficient distributed training. In
Proceedings of the International Conference on Learning Representations (ICLR 2019), New Orleans, LA, USA, 6–9 May 2019.
[32] Tian, C.; Ma, J.; Zhang, C.; Zhan, P. A deep neural network model for short-term load forecast based on long short-term memory network and convolutional
neural network. Energies 2018, 11, 3493.
[33] Liu, Y.; Guan, L.; Hou, C.; Han, H.; Liu, Z.; Sun, Y.; Zheng, M. Wind power short-term prediction based on LSTM and discrete wavelet transform. Appl. Sci.
2019, 9, 1108.
[34] Liu, Z.; Yan, Y.; Hauskrecht, M. A flexible forecasting framework for hierarchical time series with seasonal patterns: A case study of web traffic. In
Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Ann Arbor, MI, USA, 8–12 July 2018;
pp. 889–892.
[35] Shelatkar, T.; Tondale, S.; Yadav, S.; Ahir, S. Web Traffic Time Series Forecasting using ARIMA and LSTM RNN. In Proceedings of the ITM Web of
Conferences 2020; EDP Sciences: Ulis, France, 2020; Volume 32, p. 03017.
[36] Petluri, N.; Al-Masri, E. Web Traffic Prediction of Wikipedia Pages. In Proceedings of the 2018 IEEE International Conference on Big Data (Big Data),
Seattle, WA, USA, 10–13 December 2018; pp. 5427–5429.
[37] Du, S.; Pandey, M.; Xing, C. Modeling Approaches for Time Series Forecasting and Anomaly Detection. Technical Report. 2017. Available online:
http://cs229.stanford.edu/proj2017/final-reports/5244275.pdf (accessed on 30 November 2020).
[38] Ragno, R.; Papa, R.; Patsilinakos, A.; Vrenna, G.; Garzoli, S.; Tuccio, V.; Fiscarelli, E.; Selan, L.; Artini, M. Essential oils against bacterial isolates from cystic
fibrosis patients by means of antimicrobial and unsupervised machine learning approaches. Sci. Rep. 2020, 10, 1–11.
[39] Ieracitano, C.; Paviglianiti, A.; Campolo, M.; Hussain, A.; Pasero, E.; Morabito, F.C. A novel automatic classification system based on hybrid unsupervised and
supervised machine learning for electrospun nanofibers. IEEE/CAA J. Autom. Sin. 2020, 8, 64–76.