Extracting Place Functionality from User-Generated Textual Contents Using Machine Learning Methods

Article Type:
Research/Original Article (accredited ranking)
Abstract:
Introduction

In GIScience, spatial information has usually been represented in terms of space. However, human reasoning, behavior, and perception are based mainly on place, not space. Places are often ambiguous and context-dependent, and they are tied to the human experience of the world. Place functionality, as a context in place descriptions, is one of the main distinguishing features of a place. With the growing use of social networks, volunteered geographic information (VGI) and crowdsourced information have grown significantly. However, information obtained from social networks, e.g. check-ins, often does not give a complete and clear picture of a place, and it does not include spatial information linking phenomena, land uses, and points of interest (POIs). This ultimately limits its usefulness for working with the concept of place. In such cases, a GIS should detect place functionality that is not stated simply and explicitly in the stored data.

Materials and Methods

To address these issues, this paper aims to extract place functionality by analyzing user-generated textual content. To this end, places and users' reviews of those places were first collected from the TripAdvisor website through web crawling. The advantage of these data over other place-based data is their independence from formal descriptions of place. The data were collected in October 2020, and only English reviews were considered. New York City (NYC) was selected as the case study area. First, for each place type, we extracted all corresponding places. Then, for each place, we extracted a maximum of 1000 top reviews. To prepare the data, places without geographic coordinates, places outside the study area, duplicates, and places of unknown type were removed. TripAdvisor has five place categories: Attraction, Food Serving Place, Hotel, Shop, and Vacation Rental. Next, different natural language processing (NLP) methods were used to preprocess the reviews. Each review was converted to lower case and tokenized, and then punctuation and stop words were removed. Afterward, all tokens were stemmed and lemmatized. In the next step, suitable features had to be selected for knowledge discovery. We used a bag-of-words (BoW) feature selection method whose feature values are weighted using TF-IDF scores for each user's review. Finally, in a supervised setting, a logistic regression classifier was trained on these feature values and the known place functionalities, and it was then used to predict place functionality on the test dataset.
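The preprocessing and TF-IDF weighting steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the sample reviews, and the function names `preprocess` and `tfidf` are hypothetical, stemming and lemmatization are omitted for brevity, and the TF-IDF formula shown is only one common variant.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real run would use a fuller list).
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "of", "in", "to", "we", "at", "for"}


def preprocess(review):
    """Lowercase the review, tokenize on letter runs (dropping punctuation), remove stop words."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses tf = count / document length and idf = ln(N / df) + 1,
    which is one common variant and not necessarily the one used in the paper.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


# Made-up reviews, for illustration only.
reviews = [
    "The pizza was great and the pasta was fresh.",
    "We stayed at the hotel; the room was clean.",
    "Great pizza, great service!",
]
tokens = [preprocess(r) for r in reviews]
vectors = tfidf(tokens)
```

In this sketch a term that appears in only one review (such as "pasta") receives a higher weight than an equally frequent term shared by several reviews (such as "pizza"), which is the property that makes TF-IDF useful for separating place types. The resulting vectors would then be passed, together with the place-type labels, to a classifier such as logistic regression.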

Results and Discussion

We randomly assigned 75% of the data set to train the model and 25% to test the results. The results were evaluated using common machine learning evaluation measures computed from the confusion matrix. The evaluation shows that the overall accuracy of the proposed method is about 96%, which is high. For Food Serving Places, the predictions are very close to reality: in 98% of cases the algorithm predicted them correctly, and about 0.8% of them were classified as Attractions. For Hotels, the accuracy is 97%, although about 1.8% of Hotels were incorrectly categorized as Food Serving Places. Attractions were predicted correctly in 93% of cases, and about 3.8% of them were mistaken for Food Serving Places. For Shops, the accuracy is about 74%, for two reasons. First, the number of reviews for this type of functionality is lower, although this issue was partially mitigated by weighting the samples. Second, in many cases people visit shopping malls for entertainment and not just for shopping, which led to about 15% of Shops being classified as Attractions. In addition, about 11% of Shops were classified as Food Serving Places. One of the most important reasons for this is that buying food in these places is itself a kind of purchase; moreover, some shopping malls contain places that serve food and drink. Because there were fewer reviews for Vacation Rentals than for the other functionalities, they show the lowest accuracy (about 65%). In 25% of cases, Vacation Rentals were classified as Hotels. This result is not surprising, as Vacation Rentals and Hotels are very similar in function and are both used to accommodate travelers and tourists. Also, 4.8% and 4.6% of them were classified as Attractions and Food Serving Places, respectively.
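The random 75%/25% split and the sample weighting mentioned above can be sketched as follows. The abstract does not say how the weights were computed, so the inverse-frequency ("balanced") scheme here is an assumption, and the function names and toy labels are hypothetical.

```python
import random
from collections import Counter


def train_test_split(items, test_fraction=0.25, seed=0):
    """Shuffle a copy of the items and hold out test_fraction of them for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def balanced_sample_weights(labels):
    """Weight each sample by n_samples / (n_classes * class_count).

    Rare classes get larger weights, so each class contributes
    equally to the training loss in total.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return [n_samples / (n_classes * counts[y]) for y in labels]


# Made-up labels: a frequent class and a rare one.
labels = ["Hotel"] * 6 + ["Shop"] * 2
train, test = train_test_split(range(8))
weights = balanced_sample_weights(labels)
```

With these toy labels, each Shop sample weighs three times as much as a Hotel sample, while the weights still sum to the number of samples.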
The highest precision and F1-score are achieved for Food Serving Places, while Vacation Rentals show the lowest precision and F1-score because their functionality is similar to that of Hotels. Nevertheless, their results are still reliable and satisfactory.
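As an illustration of how such measures follow from a confusion matrix, the sketch below computes overall accuracy and per-class precision, recall, and F1-score. The matrix contains made-up counts for three of the place types, not the paper's data, and `per_class_metrics` is a hypothetical helper. The per-class percentages quoted in this section appear to be fractions of each true class, which would correspond to recall in this sketch; that reading is an assumption.

```python
def per_class_metrics(matrix, labels):
    """Compute overall accuracy and per-class precision, recall, and F1.

    matrix[i][j] is the number of samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        actual = sum(matrix[i])                    # row sum: samples truly in class i
        predicted = sum(row[i] for row in matrix)  # column sum: samples predicted as class i
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return correct / total, report


# Made-up counts, for illustration only.
labels = ["Food Serving Place", "Hotel", "Shop"]
toy_matrix = [
    [8, 1, 1],
    [1, 8, 1],
    [2, 0, 8],
]
accuracy, report = per_class_metrics(toy_matrix, labels)
```

Reading a row of the matrix shows where a true class is misclassified (as in the statement that some Shops are predicted as Attractions), while reading a column shows which classes contaminate a prediction, which is what precision measures.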

Conclusion

In this study, we extracted place functionality by analyzing user-generated textual content shared on the TripAdvisor website. To this end, different NLP methods were used to prepare and preprocess the data. The bag-of-words representation constructed for each user's review was then fed to a logistic regression classifier, and the place functionality of the test data was predicted. In future work, the efficiency of other feature selection methods and other classifiers in extracting place functionality can be evaluated and compared. In addition, place functionality should be extracted at a finer level of detail, so that different types of attractions can be distinguished.

Language:
Persian
Published:
Journal of Geographical Data (SEPEHR), Volume 31, Issue 124, 2023
Pages:
7 to 19
https://magiran.com/p2546113  