Enhancing Sales Determination for Coffee Shop Packages through Associated Data Mining: Leveraging the FP-Growth Algorithm

The coffee shop business offers a diverse range of coffee and food options. However, customers often experience delays during transactions because of the extensive selection of menu items and combinations. This inconvenience not only frustrates new customers but also reduces their likelihood of returning, which can lower overall business turnover. To address this issue, this study aims to establish association rules that combine the least and most popular menu items for the upcoming month. These rules will serve as a guideline for creating shopping packages that streamline the decision-making process. The FP-Growth algorithm is employed to analyze sales transaction data from January to March 2023, comprising 2,336 transactions in .csv format. Among the generated association rules, two stand out with the highest support and confidence values. The first rule has a support of 0.3% and a confidence of 70.0%, while the second has a support of 0.4% and a confidence of 69.2%. By considering these two rules alongside the existing menu options, coffee shop owners can curate shopping packages that cater to customer preferences. These packages are expected to improve the quality of service, attract more customers, and subsequently increase overall business turnover.


INTRODUCTION
Sales is a crucial business activity that can significantly boost profitability, particularly in the coffee shop sector. The business world is witnessing rapid growth and fierce competition, especially within the coffee shop industry. This trend is evident in the increasing number of coffee shops in Indonesia, particularly in Yogyakarta. As the business landscape evolves, entrepreneurs must explore strategies to maintain product quality, anticipate customer demand, and identify top-selling products so that they can operate efficiently and compete successfully. One such coffee shop is Ujung Langit ("End of Heaven"), which is privately owned and managed. It offers a diverse range of coffee beverages, including Arabica, Robusta, and other varieties, alongside snacks. The coffee shop also provides internet facilities and a comfortable ambiance. However, because customers frequently ask about menu options or combinations, transactions often take longer than desired, potentially leading to customer dissatisfaction and reduced turnover.
Wahyuningsih, Putri Taqwa Prasetyaningrum | 759
Data mining is a technique used to discover and explore new information or knowledge within extensive datasets. It integrates several scientific disciplines, such as statistics, artificial intelligence, and machine learning. By employing data mining, businesses can analyze data to extract valuable insights and generate useful information [1]. When determining association rules, interestingness measures, derived from specific data processing calculations, play a vital role. Support and confidence are two common interestingness measures for association rules [2]. Among association methods, the FP-Growth algorithm stands out for its faster execution time in identifying frequent itemsets compared with other algorithms such as the Apriori algorithm [3].
The distinguishing feature of the FP-Growth algorithm lies in its utilization of the FP-Tree data structure, enabling direct extraction of frequent itemsets [4]. The FP-Tree serves as a storage structure, mapping each transaction data onto a specific path. It effectively identifies frequent patterns by applying a minimum support count threshold through the FP-Growth algorithm [5]. Association rules mining, a data mining technique, leverages item relationships within an itemset to predict patterns in a dataset. This involves identifying recurring itemsets from transaction data, relational datasets, or other data types [6]. In the research conducted by [7], the FP-Growth algorithm, in conjunction with a closure table, was employed to discover frequent itemsets in shopping carts.
The study delves into the construction of association rules by analyzing general itemsets. By compressing shopping cart data into the FP-Tree, the FP-Growth algorithm effectively identifies common itemsets. It focuses on data mining, a process that extracts valuable information from a data warehouse. Using the concept of tree development, the FP-Growth algorithm searches for frequent itemsets. Confidence values for the obtained rules were calculated using RapidMiner Studio 7.3.0 [8]. These rules find application in product marketing strategies, with a designated minimum support of 0.1%. As the minimum confidence value increases, the number of generated rules decreases [9]. The study conducted by [10] used library transaction data from January to May 2019 comprising 5,026 data points. Data analysis showed that all obtained rules had ratio values above 1.00, supporting the use of the results for arranging the library book layout. Through data processing, the objective is to identify patterns or characteristics that enhance promotional effectiveness, facilitate informed decision-making, increase customer satisfaction, and mitigate potential losses [11], [12]. The accuracy level of the FP-Growth algorithm is reported to be three times higher than that of the Apriori algorithm [13]. Manual calculations and RStudio were used to apply the FP-Growth algorithm with a minimum support of 0.04 and a minimum confidence of 0.2 to the first 100 transactions, and both approaches identified the same association rules [14]. Applying the FP-Growth algorithm to sales transaction data facilitates the discovery of association rules based on consumer shopping habits. Optimizing product placement is expected to enhance customer satisfaction, increase store sales, and subsequently boost turnover in the following year [15].
This study primarily focuses on determining the highest value relationship patterns in terms of confidence, particularly the association patterns between menus and sales transactions. Multiple datasets were tested using the same association parameters and algorithms. The ultimate result, including the highest confidence values and the best recommendations from various sample datasets, will be provided to the coffee shop owner as a decision-making reference.

Data Description
The data utilized in this study comprises sales transaction data from Ujung Langit coffee shop, spanning the period of January to March 2023. The data originates from the cashier software database and has been exported to an Excel format. In total, the dataset consists of 2,336 transactions, encompassing 60,762 individual records.

Method Analysis
This section describes the research flow. Figure 1 provides an overview of the sequential stages, commencing with an understanding of the business goals and requirements. The process then proceeds through dataset collection, data pre-processing, association modeling, and evaluation, and culminates in a report summarizing the insights derived from the data mining process. This study follows the CRISP-DM methodology. CRISP-DM, the Cross-Industry Standard Process for Data Mining, encompasses six stages: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment. The details of the research flow are outlined as follows.

Business Understanding
This stage primarily focuses on comprehending the business goals and requirements. The research objectives align with the background context, aiming to uncover the highest value of confidence and identify various relationship patterns between menus within transactional sales. The management's intention is to experiment with creating packages that may involve removing items from the menu based on low sales or prioritizing the sales of items that have a potential for low sales. These items would then be combined with one or several best-selling items, with the aim of simplifying customers' decision-making process when selecting their shopping items.

Data Understanding
This step entails the collection and identification of data. As outlined in the research flow, the data consists of a historical recap of the last three months, formatted as a .csv file. The data is first processed in Microsoft Excel and then in RapidMiner Studio 9.10, with the objective of determining the strength of the relationships among items. This value, known as confidence, can then inform the coffee shop's packaging decisions.
The chosen method for this study is association analysis using the FP-Growth algorithm within RapidMiner Studio 9.10. The input dataset must be in a binomial (true/false) format. The data meets this requirement after the preprocessing stages, as illustrated in Figure 2.

Data Preparation
This stage encompasses all the activities needed to construct the final dataset, which serves as the input for the subsequent data mining phase. The stages involved are data selection, data cleaning, data construction, data integration, and data formatting, each of which can be broken down into sub-steps. Once the data has been prepared in .csv format and contains binomial values, it can be opened in RapidMiner for further processing.
To begin, install and open RapidMiner. Next, import the dataset by selecting the appropriate storage location. Proceed to the Design menu and drag and drop the stored dataset onto the worksheet. Connect the dataset's output point to the result by clicking and dragging. Finally, click the blue "Start Execution" button to run the process, as depicted in Figure 3.
The subsequent step transforms the values in the dataset to a binomial format. To achieve this, click on the "Find Data" section located at the top right corner. Enter "Numerical to Binominal" in the search bar and select the "Numerical to Binominal" option under Carrier Options. Connect the "Numerical to Binominal" operator with the "Retrieve Data" and "Result" components, as demonstrated in Figure 4. The "Numerical to Binominal" operator converts numeric data into binomial values, that is, true or false (0 and 1). This conversion is necessary because the association method requires the data to be in a binomial format for further processing.
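As a rough illustration of what the numerical-to-binomial conversion does, the following Python sketch turns per-transaction item quantities into true/false flags. The rows, the quantities, and the helper name `to_binominal` are hypothetical and are not taken from the study's dataset; the study itself performs this step inside RapidMiner.

```python
# Hypothetical per-transaction quantities; not the study's data.
quantities = [
    {"Walini Tea": 2, "Snack": 1, "Robusta": 0},
    {"Walini Tea": 0, "Snack": 0, "Robusta": 1},
    {"Walini Tea": 1, "Snack": 0, "Robusta": 3},
]


def to_binominal(row):
    """Map each quantity to True (bought at least once) or False (not bought)."""
    return {item: qty > 0 for item, qty in row.items()}


binominal_rows = [to_binominal(row) for row in quantities]

# Each transaction can then be read as the set of items flagged True.
transactions = [
    {item for item, bought in row.items() if bought} for row in binominal_rows
]
print(transactions)
```

After this step the purchased quantity no longer matters; only whether an item appears in a transaction is kept, which is the form association mining works on.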

Modelling
At this stage, we focus on a manual simulation of the association method, specifically the FP-Growth algorithm. The primary objective of this modeling phase is to identify the rules with the highest Support and Confidence values. The FP-Growth (Frequent Pattern Growth) algorithm was developed to overcome the limitations of the Apriori algorithm and offers an alternative approach to identifying frequent item sets, that is, the sets of items that occur together most often in the data. Unlike the Apriori algorithm, which requires candidate generation to obtain these item sets, the FP-Growth algorithm builds a tree structure to identify recurring items efficiently. This characteristic enables the FP-Growth algorithm to generate results faster than the Apriori algorithm. In this step, calculations are performed on item sets with a minimum support value of 0.01, or 1%: an item set that meets the 1% support threshold is carried forward to the subsequent steps. The ultimate goal is to determine the Confidence value for the final item sets. The Support value for each item set is calculated using Equation 1, and the Confidence value is calculated using Equation 2.

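$$\text{Support}(A) = \frac{\text{Number of transactions containing } A}{\text{Total number of transactions}} \tag{1}$$

$$\text{Confidence} = P(B \mid A) = \frac{\text{Number of transactions containing } A \text{ and } B}{\text{Number of transactions containing } A} \tag{2}$$
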
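Equations 1 and 2 can be checked on a toy example. The Python sketch below uses five hypothetical transactions (not the study's data) and two illustrative helper functions, `support` and `confidence`, whose names are assumptions made here for clarity.

```python
# Hypothetical transactions (sets of purchased items); not the study's data.
transactions = [
    {"Walini Tea", "Snack"},
    {"Walini Tea", "Snack", "Robusta"},
    {"Robusta", "Peanuts"},
    {"Walini Tea", "Robusta"},
    {"Snack"},
]


def support(itemset, transactions):
    """Equation 1: share of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Equation 2: P(B | A) = count(A and B) / count(A)."""
    a = set(antecedent)
    both = a | set(consequent)
    count_a = sum(a <= t for t in transactions)
    count_both = sum(both <= t for t in transactions)
    return count_both / count_a if count_a else 0.0


# Walini Tea appears in 3 of 5 transactions.
print(support({"Walini Tea"}, transactions))
# Snack appears in 2 of the 3 transactions that contain Walini Tea.
print(confidence({"Walini Tea"}, {"Snack"}, transactions))
```

Note that support is measured against all transactions, whereas confidence is measured only against the transactions that contain the antecedent A.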
The next step involves adding the FP-Growth operator in the Design view. To do this, go back to the Design menu and click on the search section. Type "FP-Growth" in the search box. An option for "FP-Growth (modelling/associations)" will appear. Drag and drop it onto the worksheet. Connect the output of the "Numerical to Binominal" operator to the input of the FP-Growth operator. Then click on the FP-Growth operator and set the minimum support parameter to 0.02. The support value represents the proportion of transactions in the database that contain a given item combination. Figure 5 shows this configuration.

Figure 5. Linking Numerical to Binominal Operators with FP-Growth
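To make the role of the FP-Growth operator more concrete, the following is a minimal Python sketch of the general FP-Growth idea: build an FP-tree, then mine it recursively through conditional pattern bases, without candidate generation. It is an illustrative implementation written for this explanation, not RapidMiner's internal code. The transactions, the threshold, and the names `FPNode`, `build_tree`, and `fp_growth` are all assumptions.

```python
import math
from collections import defaultdict


class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, weight) pairs; return header table and item counts."""
    freq = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            freq[item] += weight
    freq = {item: c for item, c in freq.items() if c >= min_count}
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> all tree nodes that hold this item
    for items, weight in weighted_transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        path = sorted((i for i in set(items) if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, freq


def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Yield (itemset, count) for every item set whose count is at least min_count."""
    header, freq = build_tree(weighted_transactions, min_count)
    for item, nodes in header.items():
        itemset = suffix | {item}
        yield itemset, freq[item]
        # Conditional pattern base: the prefix path of each node, weighted by its count.
        conditional = []
        for node in nodes:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        yield from fp_growth(conditional, min_count, itemset)


# Hypothetical transactions; not the study's data.
transactions = [
    {"Walini Tea", "Snack"},
    {"Walini Tea", "Snack", "Robusta"},
    {"Robusta", "Peanuts"},
    {"Walini Tea", "Robusta"},
    {"Snack"},
]
min_support = 0.4  # illustrative threshold for this toy dataset
min_count = math.ceil(min_support * len(transactions))
frequent = dict(fp_growth([(t, 1) for t in transactions], min_count))
for itemset, count in frequent.items():
    print(sorted(itemset), count / len(transactions))
```

Each transaction is inserted as one path in the tree, so transactions that share frequent items share nodes, which is why the dataset does not have to be rescanned for every candidate combination.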
The next step involves adding the "Create Association Rules" operator to the Process view. To do this, go back to the search section and type "Create Association Rules." Drag and drop the operator into the Process section. Connect the frequent item sets output of the FP-Growth operator to the item sets input of "Create Association Rules." In the "Create Association Rules" operator, set the minimum confidence parameter to 0.5. Confidence represents the strength of the relationship between items in the generated association rules. Figure 6 shows this configuration.
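The rule-creation step can be sketched in Python as follows: every frequent item set is split into an antecedent and a consequent, and only the splits whose confidence reaches the minimum threshold are kept. The item set counts below are hypothetical, and `generate_rules` is an illustrative helper, not the RapidMiner operator itself.

```python
from itertools import combinations

# Hypothetical frequent item sets with their transaction counts (5 transactions in total).
n_transactions = 5
itemset_counts = {
    frozenset({"Walini Tea"}): 3,
    frozenset({"Snack"}): 3,
    frozenset({"Robusta"}): 3,
    frozenset({"Walini Tea", "Snack"}): 2,
    frozenset({"Walini Tea", "Robusta"}): 2,
}


def generate_rules(itemset_counts, n_transactions, min_confidence):
    """Return (antecedent, consequent, support, confidence) for each qualifying rule."""
    rules = []
    for itemset, count in itemset_counts.items():
        # Try every non-empty proper subset of the item set as the antecedent.
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), size)):
                conf = count / itemset_counts[antecedent]
                if conf >= min_confidence:
                    consequent = itemset - antecedent
                    rules.append((antecedent, consequent, count / n_transactions, conf))
    return rules


rules = generate_rules(itemset_counts, n_transactions, min_confidence=0.5)
for antecedent, consequent, sup, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent),
          f"support={sup:.2f}", f"confidence={conf:.2f}")
```

The lookup of the antecedent's count relies on the fact that every subset of a frequent item set is itself frequent, so it is always present in the mined results. Raising `min_confidence` shrinks the rule list; with this toy data a threshold of 0.7 leaves no rules at all.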

Evaluation
During the evaluation phase, the dataset that has undergone machine learning processing will yield Support and Confidence values. Subsequently, the output will be evaluated to serve as a representative sample, providing a foundation for the coffee shop owner to make informed decisions. The primary objective of the Evaluation Phase is to ensure the accuracy and correctness of the modeling stage, thereby generating outputs that align with the desired expectations. The obtained results consist of datasets that have been subjected to machine learning processes, specifically by employing the FP-Growth algorithm with a minimum confidence value of 0.5 and a minimum support value of 0.002. These outcomes are illustrated in Figure 7.

Deployment
The outputs generated through machine learning will be extracted based on the specific requirements and presented in a dedicated report, which can then be utilized by coffee shop owners. The Deployment stage entails collaboration between a data analyst and the coffee shop owner to comprehensively explain the output outcomes derived from the machine learning process. This stage assumes utmost significance as it empowers the coffee shop owner to comprehend the model and make informed decisions based on the insights obtained.

Results
Of the 29 rules generated by the association method, only a select few will be used in practice and communicated to decision makers. Rules can be selected by applying predetermined thresholds of support and confidence, and the assessment of the obtained rules must align closely with the desired outcomes so that the rules retain their value in the future. After filtering, three rules were selected from the overall pool generated by the association rule method. The support and confidence values are presented in Table 1, which contains three rows of Support values; the highest Support value is selected as a recommendation for creating shopping packages. The first row shows an item set of size 2 with a Support value of 0.032, or 3.2%. This means that 3.2% of all transactions contain both Walini tea (Item 1) and Snack (Item 2). These results, obtained from the cumulative transactions, can serve as recommendations for designing shopping packages in line with the objectives of this paper. The Support value is also important as evidence when presenting suggestions to company directors or owners, as well as to customers who have difficulty choosing item combinations: whoever makes the recommendation can substantiate it with the Support value.
1) FP-Growth min Support = 0.002
2) Min Items per Itemset = 2
3) Association Rules Confidence = 0.5
4) Requirement decrease factor = 0.5
Next, two rules with the highest Support values will be selected to serve as recommendations for package formation, based on the Support (X, Y) results.
The Support values will be converted into percentages. The outcome of selecting these two rules can be observed in Table 2. The value results of the Confidence Association Rules consist of three rows, representing three combinations of items. In the first row, it is elucidated that if a consumer purchases Taro and Teh Tarik, the probability of buying Walini Tea is 0.700, which is equivalent to 70.0% when expressed as a percentage. Furthermore, the probability of this combination occurring across all transactions is 0.3%. Subsequently, the two rules with the highest values of Support and Confidence will be selected as recommendations for package formation. Both the Support and Confidence values will be converted into percentages. The outcome of selecting these two rules can be found in Table 3.
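The conversion to percentages can be illustrated with a short Python sketch. The total of 2,336 transactions comes from the paper, but the per-rule counts below are hypothetical: the paper does not report them, and they are chosen here only because they reproduce the reported percentages (0.3% and 70.0% for the first rule, 0.4% and 69.2% for the second). The helper name `as_percentages` is likewise an assumption.

```python
n_transactions = 2336  # total number of transactions reported in the paper

# (antecedent count, rule count): HYPOTHETICAL values, not reported in the paper,
# chosen only because they are consistent with the reported percentages.
hypothetical_counts = {
    "Taro + Teh Tarik -> Walini Tea": (10, 7),
    "second reported rule": (13, 9),
}


def as_percentages(antecedent_count, rule_count, n):
    """Return (support %, confidence %) rounded to one decimal place."""
    support_pct = round(100 * rule_count / n, 1)
    confidence_pct = round(100 * rule_count / antecedent_count, 1)
    return support_pct, confidence_pct


summary = {
    name: as_percentages(a, r, n_transactions)
    for name, (a, r) in hypothetical_counts.items()
}
for name, (support_pct, confidence_pct) in summary.items():
    print(f"{name}: support {support_pct}%, confidence {confidence_pct}%")
```

Under these assumed counts, a support of 0.3% corresponds to fewer than ten transactions out of 2,336, which is worth keeping in mind when judging how strongly a rule should drive package design.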

Discussion
The generated rules hold significant implications for decision making and package formation within the context of a coffee shop. To begin the analysis, a set of 29 rules is identified, generated through the association method. It is important to recognize that not all of these rules are applicable or suitable for effective communication with decision makers. Instead, decisions can be made based on predetermined thresholds of support and confidence.
The evaluation process for the obtained rules is of utmost importance and should be closely aligned with the desired outcomes. The goal is to extract valuable insights that can be utilized in the future. In this particular case, three rules have been carefully selected from the overall pool of generated rules through the association rule method.
The support and confidence values play a pivotal role in guiding decision making. The support values reflect the frequency of occurrence for specific combinations, enabling the identification of recommendations for creating shopping packages. On the other hand, the confidence values provide valuable insights into the probability of consumers purchasing specific items together.
The results obtained from the confidence association rules highlight specific combinations as potential recommendations for package formation. For example, the analysis reveals that when consumers purchase Taro and Teh Tarik, there is a 70.0% probability that they will also buy Walini Tea. This insight can guide the coffee shop in crafting a package that includes Taro, Teh Tarik, and Walini Tea to attract customers. Similarly, the combination of Robusta, Peanuts, and Snacks exhibits a high confidence value of 0.692, indicating a strong likelihood of consumers purchasing these items together.
The support and confidence values, effectively presented in tables, provide actionable information for decision makers. These values can be utilized as compelling arguments and suggestions when communicating with company directors, owners, or customers who may face challenges in selecting item combinations. The inclusion of valid evidence strengthens the recommendations made based on these values.
Finally, the analysis of the generated rules, support values, and confidence values equips the coffee shop with informed decision-making capabilities for package formation. By considering the likelihood of item combinations and aligning them with the establishment's objectives, the coffee shop can optimize its offerings and enhance customer satisfaction.

CONCLUSION
The analysis of several sales transactions at Ujung Langit coffee shop, utilizing the association rule mining method, provides valuable insights for the creation and selection of shopping packages. This approach enables business owners and customers to make informed decisions when choosing menus. The results of the association rules are expected to contribute to the coffee shop's business strategy, enhancing the quality of future sales. For future research, it is recommended to incorporate transaction data that reflects a higher percentage of simultaneous purchases. This would yield higher confidence and support values, further optimizing the results. Additionally, exploring various data mining models can offer additional opportunities to improve the analysis and uncover further insights. By leveraging the power of association rule mining, the coffee shop can enhance its offerings, attract more customers, and drive business growth. The utilization of data-driven strategies and continuous research efforts can contribute to the coffee shop's success in an increasingly competitive market.