Table of Contents
Journal of Computing and Security
Volume 1, Issue 4, Autumn 2014
- Publication date: 1393/08/30 (Solar Hijri; 21 November 2014)
- Number of articles: 6
-
Pages 261-272
In recent decades, as enormous amounts of data have accumulated, the number of text documents has grown vastly. E-mails, web pages, texts, news and articles are only part of this growth. Thus the need for text mining techniques, including automatic text classification, is rising. In automatic text classification, selecting features from the text appears to be the most important step. Since the feature space of textual data includes tens of thousands of words, feature selection is used for dimensionality reduction. Various feature-selection techniques for text, ranging from statistical to machine learning approaches, have been reported in the literature, each with advantages and disadvantages. However, very little research has so far exploited the advantages of learning and statistical approaches together. In this paper, a new algorithm for feature selection in text is presented to substantially improve classification performance. The proposed approach, PSA, is based on the simulated annealing algorithm and the document frequency method, so it can benefit from the advantages of both statistical and learning techniques. Simulated annealing requires an appropriate fitness evaluation function, and the document frequency method has a low computational cost when used as that function. In addition, a new Persian text dataset, the Persian 7-NewsGroups Dataset, is introduced for evaluating the proposed approach. To justify and evaluate our approach, the performance of PSA is compared with well-known methods such as chi-square and the correlation coefficient on the Persian 7-NewsGroups dataset. The results show that PSA has better overall performance than the other methods.
Keywords: Text Classification, Text Mining, Feature Selection, Simulated Annealing Algorithm, Persian Language
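The abstract does not specify PSA's exact fitness function or neighbourhood move, so the following is only a minimal sketch of the general idea: simulated annealing over a fixed-size subset of terms, scored by a cheap document-frequency quantity. The coverage fitness (fraction of documents containing at least one selected term), the swap move, and the parameters `k`, `t0` and `cooling` are assumptions for illustration, not the authors' design.

```python
import math
import random


def document_frequency(docs):
    """Map each term to the number of documents that contain it."""
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    return df


def coverage(subset, docs):
    """Assumed DF-style fitness: fraction of documents that contain at
    least one selected term."""
    if not docs:
        return 0.0
    return sum(1 for doc in docs if subset & set(doc)) / len(docs)


def anneal_select(docs, k, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over fixed-size term subsets (k >= 1)."""
    rng = random.Random(seed)
    df = document_frequency(docs)
    vocab = sorted(df)
    if k >= len(vocab):
        return set(vocab)
    # Start from plain document-frequency selection: the k highest-DF terms.
    current = set(sorted(vocab, key=lambda t: (-df[t], t))[:k])
    cur_fit = coverage(current, docs)
    best, best_fit = set(current), cur_fit
    temp = t0
    for _ in range(steps):
        # Neighbour move: swap one selected term for one unselected term.
        out_term = rng.choice(sorted(current))
        in_term = rng.choice([t for t in vocab if t not in current])
        cand = (current - {out_term}) | {in_term}
        cand_fit = coverage(cand, docs)
        delta = cand_fit - cur_fit
        # Always accept improvements; accept worse moves with prob. exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_fit = cand, cand_fit
            if cur_fit > best_fit:
                best, best_fit = set(current), cur_fit
        temp *= cooling  # geometric cooling schedule
    return best
```

Because the search starts from the top-k document-frequency ranking and keeps the best subset seen, under this assumed fitness it can only match or improve on the purely statistical baseline.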
-
Pages 273-281
Semi-supervised clustering methods have recently attracted the attention of many researchers. In this type of clustering, constraints or other information are available for a small portion of the data. In the constrained k-means method, the user (i.e. an expert) selects the initial seeds. In this paper, a constrained k-means method based on user feedback is proposed. With the user's help, initial seeds were selected from the boundary data obtained from clustering, and the user's feedback was then passed to the constrained k-means algorithm to obtain the most appropriate clustering model for the existing data. The proposed method was applied to various standard datasets, and the results showed that it clustered the data more accurately than other similar methods.
Keywords: Clustering, Semi-supervised using user feedback, Active learning, Boundary data
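In seeded (constrained) k-means as it is commonly described, user-labelled seeds initialise the centroids and keep their labels in every iteration. The sketch below assumes that variant and adds a hypothetical margin heuristic, `boundary_points`, for choosing which "boundary" points to show the user next; the abstract does not state how the paper actually selects boundary data, so treat both functions as illustrations only.

```python
import math


def mean(pts):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(col) / len(pts) for col in zip(*pts))


def constrained_kmeans(points, seeds, k, iters=100):
    """Seeded k-means: `seeds` maps point index -> cluster id in range(k).

    Every cluster needs at least one seed. Seeded points keep their
    user-given cluster; all other points go to the nearest centroid.
    """
    centroids = [mean([points[i] for i, c in seeds.items() if c == j])
                 for j in range(k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            seeds[i] if i in seeds
            else min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for i, p in enumerate(points)
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Every cluster keeps its seeds, so no cluster can become empty.
        centroids = [mean([p for p, c in zip(points, labels) if c == j])
                     for j in range(k)]
    return labels, centroids


def boundary_points(points, centroids, n):
    """Hypothetical query heuristic: the n points whose two nearest
    centroids are closest to equidistant (smallest distance margin)."""
    def margin(p):
        d = sorted(math.dist(p, c) for c in centroids)
        return d[1] - d[0]
    return sorted(range(len(points)), key=lambda i: margin(points[i]))[:n]
```

In a feedback loop, the indices returned by `boundary_points` would be shown to the user, and their answers added to `seeds` before re-running `constrained_kmeans`.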
-
Pages 283-292
During the last decades, opponent modeling techniques, used to improve negotiation outcomes, have sparked interest in the negotiation research community. In this study, we first investigate the applicability of the nearest neighbor method with different distance functions to modeling the opponent's preferences. We then introduce a new distance-based model to extract the opponent's preferences in a bilateral multi-issue negotiation session. We devise an experiment to evaluate the efficiency of the proposed model in a real negotiation setting using a number of performance measures.
Keywords: Bilateral Multi-Issue Negotiations, Linear Utility, Opponent Modeling, Bidding Strategy, Acceptance Strategy, Nearest Neighbor Method, Distance-Based Learning
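To illustrate what "nearest neighbor with different distance functions" can mean for opponent modeling, here is a minimal k-NN sketch that estimates the opponent's utility for an unseen bid from its previously observed bids. Bids are assumed to be vectors of normalised issue values, and the labelling rule in `label_history` (the opponent concedes over time, so earlier bids get higher estimated utility) is a hypothetical assumption; the paper's own distance-based model is not reproduced here.

```python
import math


def euclidean(a, b):
    return math.dist(a, b)


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def label_history(bids):
    """Hypothetical labelling: assume the opponent concedes, so its i-th of
    n bids gets estimated utility 1 - i / n (first bid = 1.0)."""
    n = len(bids)
    return [(bid, 1.0 - i / n) for i, bid in enumerate(bids)]


def knn_utility(bid, labelled, k=3, dist=euclidean):
    """Estimate the opponent's utility for `bid` as the mean label of its
    k nearest observed bids under the chosen distance function."""
    nearest = sorted(labelled, key=lambda item: dist(bid, item[0]))[:k]
    return sum(u for _, u in nearest) / len(nearest)
```

Swapping `dist` between `euclidean`, `manhattan` and `chebyshev` is the kind of comparison the abstract describes; which one works best depends on how the issues are scaled.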
-
Pages 293-305
Data mining techniques are widely used for intrusion detection because they can automate detection and improve its performance. However, using a single classification technique for intrusion detection can involve difficulties and limitations such as high complexity, instability, and low detection precision for less frequent attacks. Ensemble classifiers can address these issues because they combine different classifiers to obtain better predictions. In this paper, a novel ensemble method using neural networks is proposed for intrusion detection, based on fuzzy clustering and the stacking combination method. Fuzzy clustering is used to divide the dataset into more homogeneous portions. The stacking combination method aggregates the predictions of the base models and reduces their errors in order to improve detection accuracy. Experimental results on the NSL-KDD dataset demonstrate that the proposed ensemble method outperforms other well-known classification techniques, particularly for attack classes with few samples.
Keywords: Intrusion Detection, Ensemble Classifiers, Stacking, Fuzzy Clustering, Artificial Neural Networks
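The partitioning step can be sketched with standard fuzzy c-means, whose membership update is u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)). This is a generic FCM implementation, not the paper's code: the fuzzifier `m = 2`, the farthest-first initialisation, and the hard assignment in `hard_partition` (one training partition per cluster for the base neural networks) are assumptions, and the stacking meta-learner is not shown.

```python
import math


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6):
    """Fuzzy c-means. Returns (centres, memberships), where memberships[i][j]
    is the degree to which point i belongs to cluster j (rows sum to 1)."""
    # Farthest-first initialisation (an assumption, chosen for determinism).
    centres = [points[0]]
    while len(centres) < c:
        centres.append(max(points,
                           key=lambda p: min(math.dist(p, q) for q in centres)))
    u = []
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        u = []
        for p in points:
            d = [max(math.dist(p, cj), 1e-12) for cj in centres]
            u.append([1.0 / sum((d[j] / d[k]) ** (2 / (m - 1)) for k in range(c))
                      for j in range(c)])
        # Centre update: weighted mean with weights u_ij^m.
        new_centres = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            tw = sum(w)
            new_centres.append(tuple(
                sum(wi * p[dim] for wi, p in zip(w, points)) / tw
                for dim in range(len(points[0]))))
        shift = max(math.dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return centres, u


def hard_partition(u):
    """Assign each point to its highest-membership cluster, e.g. to train
    one base model per partition."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]
```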
-
Pages 307-317
Digital libraries may hold millions of citation records with bibliographic attributes such as title, author names, and place of publication. Since the materials and contents of digital libraries are drawn from diverse and distinct sources, their use poses several challenges. One of the most important is the ambiguity of author names. Although many methods have been proposed to resolve ambiguous author names, their accuracy still needs improvement. In this paper, an accurate method for author name disambiguation is proposed. It combines a heuristic hierarchical clustering method with social networks to produce highly accurate clusters. To evaluate the proposed method, an experiment is conducted on the real-world DBLP dataset. The experimental results show that the proposed method improves accuracy.
Keywords: Author Name Disambiguation, Social Networks, Digital Library, Heuristic Hierarchical Clustering
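One common way to bring a social network into name disambiguation is to compare the co-author sets attached to citation records that share an ambiguous name. The sketch below is a generic greedy agglomerative clustering on Jaccard similarity of pooled co-author sets; the similarity measure, the stopping `threshold`, and the record representation are illustrative assumptions, not the heuristic described in the paper.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def disambiguate(records, threshold=0.2):
    """records: list of sets of co-author names, one per citation record that
    carries the same ambiguous author name.

    Greedy agglomerative clustering: repeatedly merge the two clusters whose
    pooled co-author sets are most similar, until no pair reaches
    `threshold`. Returns sorted lists of record indices, one per inferred
    real-world author.
    """
    clusters = [([i], set(r)) for i, r in enumerate(records)]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = jaccard(clusters[x][1], clusters[y][1])
                if best is None or s > best[0]:
                    best = (s, x, y)
        s, x, y = best
        if s < threshold:
            break  # no remaining pair is similar enough to merge
        ids = clusters[x][0] + clusters[y][0]
        coauthors = clusters[x][1] | clusters[y][1]
        clusters = [c for i, c in enumerate(clusters) if i not in (x, y)]
        clusters.append((ids, coauthors))
    return sorted(sorted(ids) for ids, _ in clusters)
```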
-
Pages 319-327
Because of the growing number of crimes committed through anonymous E-mails, extracting useful knowledge from E-mail systems has become a challenging task, one that attracts many researchers in the cybercrime domain. Recent studies in this area concentrate on traditional classification approaches such as Decision Trees and Support Vector Machines (SVM), which are employed to identify the author. The main goal of these studies is to increase identification accuracy, but the quality of the evidence is overlooked, and the evidence is hard to trace. In this paper, we therefore propose a new approach based on data mining methods that improves the quality of the evidence, which in turn boosts identification accuracy. We use writeprints as the evidence and extract them from each individual's E-mails. The next step in author identification is matching the writeprints against anonymous E-mails using the Earth Mover's Distance (EMD) criterion to identify the plausible author. Besides high accuracy, EMD can help cybercrime investigators make decisions about an anonymous intruder. Experiments with real data in both English and Persian demonstrate that the proposed approach can effectively identify the author and capture strong evidence to support the identification.
Keywords: Anonymous, Cybercriminals, Authorship Identification, Writeprint, Frequent Pattern, Similarity Measure
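As a hedged illustration of the matching step, the sketch below computes the Earth Mover's Distance between two histograms over the same ordered bins, where (with unit ground distance between neighbouring bins) EMD reduces to the sum of absolute differences of the cumulative distributions. Representing a writeprint as such a histogram (for example, a hypothetical sentence-length histogram) is an assumption made for brevity; the paper's writeprints are built from frequent patterns, and `identify_author` is an illustrative helper, not the authors' procedure.

```python
def normalise(hist):
    """Scale a non-negative histogram so its bins sum to 1."""
    total = sum(hist)
    return [h / total for h in hist]


def emd_1d(p, q):
    """EMD between two histograms over the same ordered bins, with unit
    ground distance between neighbouring bins: the sum of absolute
    differences of the cumulative distributions."""
    p, q = normalise(p), normalise(q)
    dist, carry = 0.0, 0.0
    for a, b in zip(p, q):
        carry += a - b  # mass that must still move past this bin
        dist += abs(carry)
    return dist


def identify_author(anonymous_hist, writeprints):
    """Return the candidate whose writeprint histogram has the smallest EMD
    to the anonymous E-mail's histogram."""
    return min(writeprints,
               key=lambda name: emd_1d(anonymous_hist, writeprints[name]))
```

Unlike a bin-by-bin comparison, EMD charges more for moving mass to distant bins, so a shift of one bin costs less than a shift of two.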