A Genetic-based approach for Author Name Disambiguation Problem
In the recent years, with the increasing volume of articles and the use of Internet and search engine services, the author name disambiguation problem has received a lot of attention. Name disambiguation can occur when one is seeking a list of publications of an author who has used different name variations and also when there are multiple other authors with the same name. So far, various methods have been proposed to solve this problem, each of which has its own advantages and disadvantages. Despite years of research, the name disambiguation problem remains largely unresolved. In this study, we propose an algorithm to identify several records that belong to one author. For this purpose, a new criterion has been proposed to determine the similarity between the two records. Since this study addresses the approximate matching of authorschr('39') records, the importance of the fields in each record is determined by the coefficients. In order to get the optimal coefficients, we propose a genetic algorithm to learn from the available samples. The proposed method has been evaluated with two fitness functions on experimental data and the results are promising.