Comparison of different methods for cluster analysis (Case study: Kermanshah oak forest, Iran)
Vegetation classification is an essential tool for describing, understanding, predicting and managing ecosystems. The aim of this study was to compare different hierarchical clustering methods. Three forest patches with similar slope and altitude gradients, located on the southern slopes of the Chahar Zebar forests, Kermanshah province, were selected. Vegetation in each patch was sampled at distances of 0, 25, 50, 100 and 150 m along three transects spaced 200 m apart. Cluster analysis was used to classify the samples. Gower’s distance (or similarity) was first computed between pairs of samples across all variables, and the resulting distances were then merged using the nearest neighbor (single linkage), complete neighbor (complete linkage), average neighbor (average linkage), and Ward’s methods. The optimal number and the quality of clusters were evaluated with the silhouette criterion. In addition, the cophenetic correlation coefficient was computed to evaluate the correlation between each dendrogram and the distance matrix. Results showed that two was the optimal number of clusters for the oak stands. Moreover, the cophenetic correlation coefficient between the distance matrix and the dendrogram was higher for the nearest neighbor and average methods than for the complete neighbor and Ward’s methods. Based on the silhouette criterion, the nearest neighbor and average methods also yielded higher cluster quality than the other two methods. However, the mean silhouette value was low for the second cluster of the nearest neighbor method. Considering the disadvantages of the nearest neighbor method, the average method is suggested for clustering categorical data.
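The workflow summarized in the abstract (Gower's distance, four linkage rules, cophenetic correlation, silhouette) can be illustrated with a minimal Python sketch. This is not the authors' code or data. The arrays `num` and `cat` are synthetic, hypothetical stand-ins for the vegetation samples, the helper names `gower_distance` and `mean_silhouette` are invented for illustration, and the choice of NumPy and SciPy is an assumption. Note also that Ward's method is strictly defined for Euclidean distances, so applying it to Gower's distance here is only an approximation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet, fcluster
from scipy.spatial.distance import squareform

def gower_distance(num, cat):
    """Condensed Gower dissimilarities between rows (samples).

    Numeric columns contribute range-scaled absolute differences;
    categorical columns contribute 0 (match) or 1 (mismatch).
    """
    ranges = num.max(axis=0) - num.min(axis=0)
    ranges[ranges == 0] = 1.0  # avoid division by zero for constant columns
    d_num = np.abs(num[:, None, :] - num[None, :, :]) / ranges
    d_cat = (cat[:, None, :] != cat[None, :, :]).astype(float)
    square = np.concatenate([d_num, d_cat], axis=2).mean(axis=2)
    return squareform(square, checks=False)

def mean_silhouette(square, labels):
    """Mean silhouette width from a square distance matrix."""
    scores = np.zeros(len(labels))
    for i, own in enumerate(labels):
        same = labels == own
        if same.sum() == 1:
            continue  # singleton cluster: silhouette is taken as 0
        a = square[i, same].sum() / (same.sum() - 1)
        b = min(square[i, labels == k].mean()
                for k in np.unique(labels) if k != own)
        if max(a, b) > 0:
            scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Synthetic, hypothetical data: two well-separated groups of 10 samples.
rng = np.random.default_rng(0)
num = np.vstack([rng.normal(0.0, 1.0, (10, 2)),
                 rng.normal(6.0, 1.0, (10, 2))])
cat = np.array([["a", "a"]] * 10 + [["b", "b"]] * 10)

D = gower_distance(num, cat)   # condensed distance vector
D_square = squareform(D)       # square form for the silhouette

results = {}
for method in ("single", "complete", "average", "ward"):
    Z = linkage(D, method=method)
    coph_corr, _ = cophenet(Z, D)  # dendrogram vs. distance matrix
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut into 2 clusters
    results[method] = (coph_corr, mean_silhouette(D_square, labels))

for method, (coph_corr, sil) in results.items():
    print(f"{method:9s} cophenetic={coph_corr:.3f} silhouette={sil:.3f}")
```

In practice, the number of clusters passed to `fcluster` would be varied and the silhouette compared across values, rather than fixed at two as in this sketch.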