Predictive models of hotel booking cancellation: a semi-automated analysis of the literature


  • Nuno Antonio ISCTE-IUL; ESGHT-Universidade do Algarve


Data Science, Forecast, Literature review, Prediction, Revenue Management


This study sought to combine data science tools and capabilities with human judgement and interpretation to demonstrate how semi-automated analysis of the literature can help identify and synthesise research findings and topics in booking cancellation forecasting. The study also recorded the full experimental procedure of the analysis in detail, to encourage other researchers to conduct automated literature reviews and so understand more fully the current trends in their fields of study. The data were obtained through a keyword search of the Scopus and Web of Science databases. The methodology presented not only diminishes human bias but also strengthens the ability of data visualisation and text mining techniques to facilitate abstraction, expedite analysis and improve literature reviews. The results show that, although forecasting booking cancellations is important for understanding net demand and improving cancellation and overbooking policies, further research on this subject is needed.

Author Biography

  • Nuno Antonio, ISCTE-IUL; ESGHT-Universidade do Algarve
    Department of Information Science and Technology



Tourism/Hospitality: Research Papers

How to Cite

Antonio, N. (2019). Predictive models of hotel booking cancellation: a semi-automated analysis of the literature. Tourism & Management Studies, 15(1), 7-21.