Sentiment Analysis of Customer Feedback Reviews Towards Hotel’s Products and Services in Labuan Bajo

The feedback review column on the TripAdvisor website can be used for sentiment analysis by classifying consumers' positive and negative impressions with machine learning and text mining. This study analyzes consumer sentiment toward hotel products and services in Labuan Bajo based on feedback review data from the TripAdvisor website. The algorithms used are the Naïve Bayes Classifier (NBC), Support Vector Machine (SVM), and k-Nearest Neighbor (k-NN). In addition, SMOTE Upsampling is used as a technique to balance the dataset for k-NN. The classification yields 702 negative and 2531 positive feedback reviews. In the performance evaluation, SVM reaches 78.30% accuracy and NBC 78.29%, while k-NN with SMOTE Upsampling reaches 85.24%. For k-NN with SMOTE Upsampling, class precision is 100% for predicted negatives (1784 true Neg & 0 true Pos) and 77.21% for predicted positives (747 true Neg & 2531 true Pos). Class recall is 70.49% for true negatives (1784 pred Neg & 747 pred Pos) and 100% for true positives (0 pred Neg & 2531 pred Pos). These findings indicate that k-NN outperforms SVM and NBC for sentiment analysis of customer feedback reviews of hotel products and services in Labuan Bajo on the TripAdvisor website.


INTRODUCTION
Hotel consumers expect products and services that match the value they pay [1]. [2] reported that product prices influence consumer expectations, especially regarding product quality. In addition, [3] shows that consumers have high expectations of a product's value. Consumer preferences regarding the value of a product or service thus affect behavior and purchasing decisions [4]. Several previous researchers have shown that consumer expectations and satisfaction can be classified into sentiments and serve as recommendations for business development [5]. Studies of consumer sentiment therefore need to be advanced to help improve the quality of products and services.
This study aims to classify consumer sentiment toward hotels, resorts, and restaurants in Labuan Bajo, one of Indonesia's super-priority tourism destinations. Sentiment analysis studies of Indonesia's super-priority tourist destinations are still at an early stage and need to be developed further so that stakeholders can use them to increase the quantity and quality of hotel, resort, and restaurant businesses and other tourism support facilities. Sentiment analysis of hotel, resort, and restaurant businesses is used to analyze consumer preferences related to room characteristics, food, location, services, cleanliness, price, and entertainment [6]. Hotel managers can use such analyses to improve business management strategies that meet consumer expectations [7]. Several sentiment analysis studies use the review columns on hotel websites and accommodation booking agents to identify and analyze consumers' positive and negative sentiments toward facilities, reservation systems, locations, and services [8], [9]. Sentiment analysis of hotel, resort, and restaurant businesses in Indonesia, especially in the super-priority destination of Labuan Bajo, still needs to be discussed from both empirical and theoretical perspectives. A text mining approach applied to the feedback review column is essential to identify how consumers, as end-users, perceive the products and services currently provided [10]. The discourse of hotel consumer reviews across review platforms has also been examined in terms of subjectivity, diversity, length, sentiment, and readability [11]. Each hotel has different management policies, including its reservation service system for consumers [12] and the use of human-like robot employees [13]. Sentiment analysis studies are not limited to consumer perceptions, however. 
[14] also analyzed the sentiments of hotel employees toward the management system adopted by each hotel, and [15] argues that big data analysis or text mining provides better insight into hotel performance and guest satisfaction. This underlines the importance of sentiment analysis of hotel consumers for evaluating the quality of products and services. Sentiment analysis of hotel products and services in Indonesia needs to be explored in depth given the Covid-19 pandemic, which required hotel management to adapt guest services to health protocols. The management of hotel accommodation services during the pandemic drew consumer criticism regarding staff, service, rooms, cleanliness, slow booking, and the hotels' pandemic response [16]. Similarly, [17] shows that consumer confidence in hotel products and services during the pandemic rests on the hotel guaranteeing health and safety through social distancing policies and more thorough cleaning services. On the other hand, [18] identifies four critical dimensions of hotel management during the pandemic: medical preparedness, hygiene control, health communication, and self-service technology. Analyzing hotel consumer sentiment toward products and services is therefore essential to sustaining the hotel business during the Covid-19 pandemic.
This article offers insight into empirical data on consumer sentiment toward the products and services of accommodation providers in Labuan Bajo, one of Indonesia's super-premium destinations. The hotel and resort review dataset is limited to ten hotels and resorts, each with more than 100 reviews, and includes consumer reviews posted on the TripAdvisor platform up to May 1, 2022. The hotels and resorts analyzed are Puri Sari Beach Hotel, Plataran Komodo Resort and Spa, Bintang Flores Hotel, Ayana Komodo Resort, Sylvia Resort Komodo, Bayview Garden Hotel, Golo Hiltop Hotel and Restaurant, Laprima Hotel, Sudamala Resort Seraya, and Eco Tree O'tel. Consumer reviews of their products and services are classified into positive and negative sentiments using the Naïve Bayes Classifier (NBC), Support Vector Machine (SVM), and k-Nearest Neighbor (k-NN) algorithms. Algorithm performance is then evaluated by accuracy, and the most accurate algorithm is recommended as the classification method for consumer reviews of hotel products and services in Labuan Bajo. In addition, consumer reviews related to employees, services, rooms, cleanliness, and hotel reservation systems are discussed from a theoretical perspective on consumer expectations and satisfaction when using the products and services of accommodation providers in Labuan Bajo.

METHODS
This study uses a machine learning methodology [19] as a computational technique based on the Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), and Naïve Bayes Classifier (NBC) algorithms. Previous studies have shown that machine learning is a relevant approach in text mining [20], and one of the essential processes in text mining is classification into negative and positive sentiments. This research consists of three stages: pre-processing the dataset, processing the dataset, and evaluating algorithm performance. The pre-processing stage selects hotels and resorts on the TripAdvisor website, as shown in figure 1 below. The dataset is limited to ten hotels or resorts with more than 100 reviews each on TripAdvisor, covering consumer reviews posted up to May 1, 2022. The dataset thus consists of consumer reviews of the products and services of hotels and resorts such as Puri Sari Beach Hotel and the other properties listed in the Introduction.
Henoch Juli Christanto, Yerik Afrianto Singgalen | 809
Figure 2 shows how the review data for each hotel and resort, covering location, cleanliness, service, and value, are scraped from the TripAdvisor website. The scraped fields are customer name, review date, traveler rating (excellent, very good, average, poor, terrible), time of year (Mar-May, Jun-Aug, Sep-Nov, Dec-Feb), traveler type (families, couples, solo, business, friends), and language. The analysis is limited to reviews written in English. The reviews used also cover assessments of property amenities, room features, room types, location, and services. These review data on hotel products and services in Labuan Bajo represent the quality of a super-premium tourism destination in Indonesia. 
Customer expectation and satisfaction are characterized by positive or negative sentiments, so the data must be processed carefully with text mining tools, as shown in figure 3 below. Figure 3 shows RapidMiner, software developed for machine learning, text mining, and predictive analytics. The data obtained from the website are cleaned with text mining operators: Tokenize (regular expression and non-letters), Transform Cases, Filter Tokens (by Length), Filter Stopwords (English), and Stem (Snowball). Further operators select attributes, sort the data, remove duplicates, and store the data in a repository, where all the data are finally combined. The next process classifies negative and positive sentiments using the k-NN, NBC, and SVM algorithms and evaluates their performance. In addition, several studies have shown that the TripAdvisor website is a credible provider of review data with a built-in control system: the website identifies each user as a reviewer and detects the language used in the review.
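The cleaning steps listed above can be approximated in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the authors' RapidMiner process: the stopword set is a tiny illustrative subset, the length bounds (4 to 25 characters) are assumed values, and `crude_stem` is a hypothetical suffix stripper standing in for a Snowball stemmer (NLTK's `SnowballStemmer("english")` would be the usual choice in a real pipeline).

```python
import re

# Tiny illustrative stopword subset (an assumption); RapidMiner's
# Filter Stopwords (English) operator uses its own, much longer list.
STOPWORDS = {"the", "and", "was", "were", "with", "this", "that",
             "very", "from", "have", "they", "their"}

def crude_stem(token: str) -> str:
    """Hypothetical suffix stripper standing in for a Snowball stemmer."""
    for suffix in ("ing", "ed", "ly", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(review: str, min_len: int = 4, max_len: int = 25) -> list[str]:
    tokens = re.split(r"[^A-Za-z]+", review)                      # tokenize on non-letters
    tokens = [t.lower() for t in tokens if t]                      # transform cases
    tokens = [t for t in tokens if min_len <= len(t) <= max_len]   # filter tokens by length
    tokens = [t for t in tokens if t not in STOPWORDS]             # filter stopwords
    return [crude_stem(t) for t in tokens]                         # stem (crude stand-in)

print(preprocess("The rooms were VERY clean, and the staff was friendly!"))
# prints: ['room', 'clean', 'staff', 'friend']
```

The order of the steps mirrors the operator chain in the text: lowercasing happens before stopword filtering so that "The" and "the" are treated alike.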

Figure 3 Text Mining Tools
If a user writes a review in a language the system cannot detect, the system automatically issues a notification to delete the review. The data sources used for the classification, with their links, are shown in Table 1. Positive and negative sentiments can be assigned manually from the review rating: a review with a five-star rating is labeled positive, and a review with a one-star rating is labeled negative. Labeling the five-star and one-star reviews in this way produces the training data. Reviews with two-, three-, and four-star ratings are then predicted and classified as negative or positive using the k-NN, NBC, and SVM algorithms. Collapsing the five rating levels into two classes serves to confirm consumer expectations and satisfaction with hotel products and services in Labuan Bajo, so that hotel management can optimize its products and services.
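The rating-based labeling rule can be expressed as a small function. In this sketch, `label_from_rating` and `split_reviews` are hypothetical names, and each review is assumed to arrive as a `(text, stars)` pair with an integer star rating from 1 to 5.

```python
from typing import List, Optional, Tuple

def label_from_rating(stars: int) -> Optional[str]:
    """Return the training label for a star rating, or None if it must be predicted."""
    if not 1 <= stars <= 5:
        raise ValueError(f"star rating must be between 1 and 5, got {stars}")
    if stars == 5:
        return "positive"
    if stars == 1:
        return "negative"
    return None  # 2-, 3- and 4-star reviews are left for the classifiers

def split_reviews(reviews: List[Tuple[str, int]]):
    """Split (text, stars) pairs into labelled training data and reviews to predict."""
    training, to_predict = [], []
    for text, stars in reviews:
        label = label_from_rating(stars)
        if label is None:
            to_predict.append(text)
        else:
            training.append((text, label))
    return training, to_predict
```

For example, `split_reviews([("Stunning view", 5), ("Rude reception", 1), ("Decent but pricey", 3)])` returns one positive and one negative training example and leaves the three-star review for prediction.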
Performance testing is configured with operators that retrieve the data from the repository, multiply it, process documents from the data, and set the role. For testing the k-NN and NBC algorithms, the sentiment column is set as the label and then connected to the Nominal to Text and Multiply operators. For k-NN specifically, the SMOTE Upsampling operator is used to balance the dataset. The k-NN and NBC models are then trained and tested inside the Cross Validation operator, which contains the Apply Model and Performance operators needed to measure accuracy, class precision, and class recall. The SVM testing flow instead uses the Text to Nominal and Nominal to Numerical operators, and a Split Data operator divides the data into 70% and 30% partitions to be tested with the SVM model in the Cross Validation operator. This configuration of the classification process and performance evaluation identifies the algorithm most relevant to sentiment analysis of the growing volume of consumer reviews of hotels and resorts in Labuan Bajo.
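The SMOTE Upsampling step can be illustrated with the standard SMOTE idea: each synthetic minority example is placed at a random point on the line segment between a minority example and one of its k nearest minority neighbours. The sketch below is pure Python on numeric feature vectors (for text, these would be the document vectors produced by pre-processing); `smote`, `balance`, the default `k=5`, and the random choice of base examples are assumptions, since the text does not describe the internals of RapidMiner's SMOTE Upsampling operator. With the counts reported for this dataset, balancing 702 negative against 2531 positive reviews requires 2531 - 702 = 1829 synthetic negative reviews.

```python
import math
import random

def _euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority vectors by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within the minority class, excluding itself
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _euclidean(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

def balance(majority, minority, k=5, seed=0):
    """Upsample the minority class until both classes have the same size."""
    extra = smote(minority, len(majority) - len(minority), k=k, seed=seed)
    return majority, minority + extra
```

`balance` is intended for training data only: generating synthetic examples before splitting the data lets near-copies of test reviews leak into the training folds, which can make cross-validated scores look better than they are.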

Sentiment Classification and Model Performance Evaluation
The classified consumer reviews of hotels and resorts in Labuan Bajo indicate that the aspects consumers review most are the hotel's strategic location, the cleanliness of the hotel environment, the rooms, the services provided by the hotel, and whether the value of the products and services is balanced against consumer spending. Consumer review data are vital for developing the services and products a hotel offers [21]. Hotel consumer reviews on various digital media platforms also give an overview of the hotel's brand image and measure the performance of its marketing strategy [22]. Hotel marketing strategies are further implemented through Corporate Social Responsibility programs on social media and through consumer brand engagement [23]. Classifying hotel consumers' negative and positive sentiments therefore makes an empirical contribution to developing marketing strategies that increase customer loyalty.
Indonesia has a variety of tourist destinations that attract investors in the hospitality industry, one of which is Labuan Bajo, which is being developed as a super-premium tourist destination [24]. Labuan Bajo attracts foreign tourists with icons such as the Komodo dragon and authentic local community traditions, backed by all stakeholders' support for developing its tourism sector [25]. On the other hand, the Indonesian government, through the Ministry of Tourism and Creative Economy, develops tourism resources in  [26]. A study of consumer reviews of hotels and resorts in Labuan Bajo therefore also provides an overview of foreign tourists' expectations and satisfaction, and the output of this research contributes not only empirically but also theoretically to tourism-related information systems, especially studies on machine learning and text mining.
This study uses consumer review data from the TripAdvisor platform to gain insight into the expectations and satisfaction of foreign tourists as consumers of hotel products and services in Labuan Bajo. The study covers ten hotels, each with more than 100 reviews, and includes reviews posted on TripAdvisor up to May 1, 2022. In the pre-processing stage, duplicated data, numbers, and expressions in each review column are processed to produce a dataset ready for analysis. After processing, 3233 reviews are ready for negative and positive sentiment classification, as shown in Table 2 below. The classification of this accumulated dataset yields 2531 positive and 702 negative sentiments, which indicates that Labuan Bajo's image as a super-premium tourist destination remains positive among foreign tourists, especially hotel and resort consumers. Nevertheless, the 702 negative sentiments need to be studied in depth to produce recommendations for developing optimal hotel products and services. In the context of the Covid-19 pandemic, hotels used as accommodation facilities for Covid-19 suspects need medical preparedness, hygiene control, health communication, and self-service technology [18]. Complaints about suboptimal services lead consumers to perceive a hotel's or resort's products and services as falling short of expectations or even unsatisfactory, and suboptimal service leads to a loss of trust and fewer consumers [27]. Negative consumer sentiment therefore needs to be validated as a reference for developing products and services in line with consumer expectations and satisfaction.
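The class counts above can be turned into two quick reference numbers: the imbalance ratio between the classes and the accuracy of a trivial baseline that labels every review positive. The calculation below uses only the counts reported in this section.

```python
positive, negative = 2531, 702
total = positive + negative                        # reviews left after pre-processing
share_positive = positive / total                  # fraction of positive reviews
imbalance_ratio = positive / negative              # positive reviews per negative one
majority_baseline = max(positive, negative) / total  # accuracy of "always positive"

print(f"total={total}, positive share={share_positive:.2%}, "
      f"imbalance ratio={imbalance_ratio:.2f}, majority baseline={majority_baseline:.2%}")
# prints: total=3233, positive share=78.29%, imbalance ratio=3.61, majority baseline=78.29%
```

The majority-class baseline of 78.29% is a useful yardstick for the classifier accuracies reported in this section.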
The sentiment classification of the prepared dataset shows that the text mining approach will be valuable for processing consumer review data in the future. However, the choice of algorithm should be guided by accuracy [28]. [29] stated that the Support Vector Machine (SVM), Artificial Neural Network (ANN), Naïve Bayes Classifier (NBC), Decision Tree (DT), C4.5, and k-Nearest Neighbor (k-NN) can all classify customer reviews, but this research evaluates only SVM, NBC, and k-NN. The performance evaluation shows differing accuracy percentages for each hotel and resort dataset and for the combined dataset, as shown in Table 3 below. Accuracy varies across the individual hotel datasets, and these differences are taken to reflect how well each algorithm suits each dataset. Accuracy may be improved by enlarging the training data, and a larger test set would make the accuracy estimate more reliable. In this research, the algorithm with the highest accuracy for classifying positive and negative sentiments is k-NN, at 85.24%. The k-NN results differ with and without SMOTE Upsampling, as shown in figure 4 below.
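The accuracies in Table 3 come from RapidMiner's operators. To make the k-NN idea concrete, here is a minimal pure-Python sketch that represents each already pre-processed review as a bag of words, classifies by majority vote among the k most cosine-similar training reviews, and estimates accuracy with plain k-fold cross-validation. The similarity measure, the value of k, the fold count, and the function names are all assumptions; the text does not state which settings RapidMiner used.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_predict(train, tokens, k=3):
    """Majority vote among the k training reviews most similar to `tokens`.

    train: list of (Counter, label) pairs.
    """
    query = Counter(tokens)
    # sorted() is stable, so ties in similarity keep the training order.
    nearest = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def cross_validated_accuracy(data, folds=5, k=3):
    """Plain (unstratified) k-fold cross-validation; data: list of (tokens, label)."""
    correct = 0
    for i in range(folds):
        test = data[i::folds]  # every folds-th review, starting at index i
        train = [(Counter(t), y) for j, (t, y) in enumerate(data) if j % folds != i]
        correct += sum(knn_predict(train, t, k) == y for t, y in test)
    return correct / len(data)
```

On a toy set such as `[(["clean", "friendly", "staff"], "positive"), (["dirty", "rude", "staff"], "negative"), ...]`, `cross_validated_accuracy(data, folds=3, k=1)` returns the fraction of reviews whose nearest neighbour shares their label.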
Figure 4 Confusion Matrix before and after using SMOTE Upsampling for the k-NN algorithm (panels: Before SMOTE Upsampling; After SMOTE Upsampling).
Before SMOTE Upsampling, k-NN reached an accuracy and precision of only 78.29%, even though its recall was 100%. After SMOTE Upsampling, accuracy rose to 85.24%, with a precision of 77.21% and a recall of 100% (precision and recall here refer to the positive class). The Area Under Curve (AUC) of k-NN also differs markedly: 0.500 before SMOTE Upsampling and 0.948 after, as shown in figure 5 below.
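The reported metrics can be recomputed from confusion-matrix counts. The sketch below uses the post-SMOTE counts given in the abstract (1784 and 747 for true negatives predicted negative and positive; 0 and 2531 for true positives predicted negative and positive). The pre-SMOTE counts are not reported; the ones used here are an assumption, namely the only counts consistent with 100% positive-class recall and 78.29% positive-class precision on 2531 positive and 702 negative reviews, which means every review was predicted positive. `class_metrics` is a hypothetical helper name.

```python
import math

def _ratio(num: int, den: int) -> float:
    # Undefined ratios (e.g. precision of a class that is never predicted) become NaN.
    return num / den if den else math.nan

def class_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy plus per-class precision/recall, with 'positive' as the positive class.

    tp: positive reviews predicted positive    fn: positive reviews predicted negative
    fp: negative reviews predicted positive    tn: negative reviews predicted negative
    """
    return {
        "accuracy": _ratio(tp + tn, tp + fn + fp + tn),
        "precision_pos": _ratio(tp, tp + fp),
        "recall_pos": _ratio(tp, tp + fn),
        "precision_neg": _ratio(tn, tn + fn),
        "recall_neg": _ratio(tn, tn + fp),
    }

# Counts after SMOTE Upsampling, as reported in the abstract.
after = class_metrics(tp=2531, fn=0, fp=747, tn=1784)
# Assumed counts before SMOTE: every review predicted positive (implied, not reported).
before = class_metrics(tp=2531, fn=0, fp=702, tn=0)

for name, m in (("before", before), ("after", after)):
    print(name, {key: round(value * 100, 2) for key, value in m.items()})
```

Under that assumption, the pre-SMOTE accuracy equals the share of positive reviews (2531 / 3233), and negative-class precision is undefined because no review is predicted negative.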
Figure 5 Area Under Curve (AUC) of k-NN before and after using SMOTE Upsampling (panels: AUC k-NN Before using SMOTE Upsampling; AUC k-NN After using SMOTE Upsampling).
Figure 5 shows the AUC obtained from the k-NN calculations with and without the SMOTE Upsampling dataset balancing technique. [30] shows that SMOTE Upsampling can substantially change the AUC value. In this study, SMOTE Upsampling changed the k-NN accuracy by +6.95 percentage points and the precision by -1.07 percentage points, while recall did not change at all. [31] categorizes AUC values as excellent (0.9-1.0), good (0.8-0.9), fair (0.7-0.8), poor (0.6-0.7), and failure (0.5-0.6). The AUC of k-NN after SMOTE Upsampling, 0.948, therefore falls in the excellent category, and k-NN with SMOTE Upsampling gives the best result compared with NBC (78.29%) and SVM (78.30%), as shown in the table below. Table 4 shows how accuracy, precision, and recall change before and after SMOTE Upsampling is used to balance the dataset; the configuration with the best accuracy, precision, and recall values is carried forward to the analysis of the text mining results. Since k-NN performs best, the next stage analyzes and interprets the data, formulating the negative and positive sentiment classifications into recommendations for stakeholders to optimize product and service management in their respective hotels.
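The AUC values and the bands from [31] can be made concrete with a short sketch. Here AUC is computed as the probability that a randomly chosen positive review receives a higher score than a randomly chosen negative one, with ties counted as one half, which is equivalent to the area under the ROC curve. A model whose scores do not separate the classes at all, for example one that gives every review the same score, gets exactly 0.5, the value observed for k-NN before SMOTE Upsampling. The function names are hypothetical, and because the bands in [31] share their endpoints, treating each lower bound as inclusive is an assumption.

```python
def auc(scores_pos, scores_neg):
    """Pairwise (Mann-Whitney) AUC: P(score_pos > score_neg), ties counted as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def auc_category(value):
    """Map an AUC onto the bands in [31]; lower bounds treated as inclusive (assumption)."""
    for lower, name in ((0.9, "excellent"), (0.8, "good"), (0.7, "fair"), (0.6, "poor")):
        if value >= lower:
            return name
    return "failure"  # [31] lists 0.5-0.6; values below 0.5 are also mapped here

print(auc([0.7, 0.7], [0.7, 0.7]), auc_category(0.948))
# prints: 0.5 excellent
```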
In this research, the review data obtained from the TripAdvisor platform are processed not to provide recommendations for each hotel but to obtain a broad picture of consumer sentiment toward hotel products and services in Labuan Bajo. The partial hotel and resort datasets are therefore combined into one dataset, processed with the NBC, SVM, and k-NN classification models, and compared by accuracy [31]. The interpretation of the review data thus applies to consumers of hotel products and services in Labuan Bajo as a whole. The results are also relevant to homestay businesses in Labuan Bajo seeking to increase guest satisfaction and consumer loyalty. The sentiment processing classifies the perceptions that hotel guests express in text uploaded to the TripAdvisor website, and travelers in turn use TripAdvisor's information about tourism and travel support facilities as a reference when planning a trip. In the context of tourist behavior, [32], [33] show that travelers' consumption behavior rests on cognitive and affective aspects, and accessing, consuming, and distributing information on the TripAdvisor website is part of that behavior. It is therefore essential to analyze consumer feedback reviews, especially consumer satisfaction and dissatisfaction with hotel products and services.

Consumer Feedback Reviews: Satisfaction and Dissatisfaction with Hotel Products and Services
The hotel industry plays an essential role in the development of tourism. However, [34] stated that hotels also suffer from the economic effects of tourism seasonality. Furthermore, [35] points to changes in hotel management in response to the crisis caused by the Covid-19 pandemic. The Covid-19 pandemic, with its restrictions on mobility to tourist destinations, has increased the use of technology as a medium for communication and electronic transactions [36].
In response to these changes, hotel management adapted to consumer needs by providing electronic services that address users' lack of product information, together with e-services suited to the Covid-19 pandemic [27]. The intensive use of technology also encourages the development of feedback review services on travel platforms, which hotel management uses to increase user satisfaction, engagement, and loyalty [37]. However, the problems faced by the hotel industry differ between tourist destinations and cannot be generalized, since they are contextual. Studying consumer sentiment toward hotel products and services is therefore necessary.
Consumer perceptions of the hospitality industry are complex to predict, so comprehensive research is needed to understand consumers' disappointment and satisfaction with hotel products and services [38]. [39] describes the consequences of consumer dissatisfaction for business sustainability; dissatisfaction also affects guests' future behavior. Hotel management must therefore provide excellent service to every guest who stays and leave a positive impression. In addition, [40] argues that some hotels have adopted sophisticated technology to serve customers in the reservation process, but this can backfire when it does not meet consumer expectations. Effective and efficient service to hotel guests thus depends not only on human skills and technology but, more importantly, on how well they meet customer expectations.

Several cross-hotel and cross-country studies show that consumer segmentation based on culture influences expectations of hospitality service attributes [41]. Furthermore, [42] argues that an in-depth analysis of the complaint resolution process is necessary to measure consumer satisfaction or dissatisfaction, with two critical dimensions: the compensation received and the attributes of the retailer's representative. Meanwhile, [43] stated that social media has expanded word of mouth (WOM) into a massive means of online communication, so that consumers significantly affect a business's reputation through what they disseminate in virtual media. Negative sentiment must therefore be taken seriously, with attention to consumers' demographic backgrounds and the class of the hotel.
Examined in depth, each hotel and resort studied has different products and services, and the number of visits differs significantly between hotels and relates to their locations. Likewise, review data on TripAdvisor show that Puri Sari Beach Hotel and Bayview Garden Hotel have more guest reviews than Eco Tree O'tel and Sudamala Resort Seraya. In addition, based on the classification of negative and positive sentiments, Bintang Flores Hotel and Laprima Hotel have similar numbers of positive and negative reviews. Sylvia Resort Komodo, by contrast, needs to evaluate its product and service management to increase positive reviews on the TripAdvisor platform. Although not all guests write reviews on the TripAdvisor platform, consumers who experience incidents tend to post complaints there. This affects the image of the hotel or resort in Labuan Bajo's competitive super-premium tourism market.
The challenge for hotels is that they cannot control whether consumers perceive their products and services favorably enough to give positive ratings. [44] stated that hotels can meet the challenge of obtaining positive ratings, as a form of customer satisfaction, by transferring knowledge about hotel management and maintenance processes to maintain and improve product and service quality. [45] emphasized that the hotel business is highly vulnerable to decline caused by consumer perceptions, expectations, and dissatisfaction. Hotel management must therefore keep innovating, adopting an effective and efficient managerial approach that meets consumer expectations. Likewise, [16] shows that the skills of staff who deal directly with consumers determine the image of the hotel business. In Labuan Bajo, employees serving guests must understand consumer characteristics based on demographic, geographic, and psychographic segmentation. Consumers give positive reviews when the service is impressive, which in turn improves the image of the hotel or resort on the TripAdvisor platform.

CONCLUSION
The results of this study make both theoretical and empirical contributions. Practically, this research advances the study of tourism and information systems through machine learning and text mining approaches. Its focus is consumer sentiment toward the products and services of hotels and resorts in the super-premium tourist destination of Labuan Bajo. The findings indicate that positive sentiment still dominates among hotel and resort consumers in Labuan Bajo, maintaining a positive image of Labuan Bajo's super-premium tourism. However, negative consumer sentiment toward hotel products and services is relatively high, which requires hotel management to update services and products to align with consumer expectations and increase customer satisfaction and loyalty. For dataset processing, k-NN is the most suitable classification algorithm for sorting negative and positive sentiments, with a higher accuracy than NBC and SVM. This accuracy may change as further review data from the TripAdvisor platform are processed. This research is limited in that it uses only one platform for the sentiment analysis. Further research should therefore analyze consumer sentiment toward hotels, resorts, cottages, and restaurants in super-premium tourist destinations using datasets from several platforms.