Extracting and combination efficient feature from protein sequence for classify protein based on rotation forest

Author(s):

Jamshid Pirgazi, Ali Ghanbari Sorkhi*, Majid Iranpour Mobarakeh

Article Type:

Research/Original Article (accredited ranking)

Abstract:

Protein function prediction is one of the main challenges in bioinformatics and has many applications. In recent years, many studies in this field have used machine learning methods. In these methods, different features are first extracted from the protein sequence, and classification is then performed on the extracted features. Feature extraction methods are based on the physical and chemical properties of the protein sequence, so extracting suitable features improves the performance of machine learning methods. This paper proposes a new feature set built from the Position-Specific Scoring Matrix (PSSM), the Pseudo-Position-Specific Scoring Matrix (PsePSSM), K-grams, Amino Acid Composition (AAC), and the Term Frequency and Category Relevancy Factor (TFCRF) method, which has not previously been used in this application. The PSSM is the scoring matrix used in protein BLAST searches; it gives amino acid substitution scores separately for each position in a multiple sequence alignment. The PsePSSM feature is described by considering correlation factors of different ranks along a protein sequence, which preserves information about the amino acid order. The AAC method calculates the normalized occurrence frequency of the amino acids in the protein. A K-gram is a set of K successive amino acids in a protein. The TFCRF weighting method considers not only how terms are distributed across sequences but also how they are distributed across classes. Features extracted with this method give machine learning models good power to discriminate between classes. In the next step, the extracted features are classified using the rotation forest method.
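As a rough illustration of the two simplest descriptors named in the abstract, AAC and K-gram counts, the following Python sketch computes both from a raw sequence string. It is not the authors' implementation: the names `AMINO_ACIDS`, `aac`, and `kgram_counts` are hypothetical, the 20-letter standard alphabet is an assumption, and any normalization or TFCRF weighting the paper applies to K-grams is not shown.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino acids, in a fixed alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq):
    """Normalized occurrence frequency of each amino acid (20 values)."""
    counts = Counter(seq)
    # Count only standard residues; fall back to 1 to avoid dividing by zero.
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def kgram_counts(seq, k=2):
    """Raw counts of every run of k successive residues (20**k values)."""
    grams = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Enumerate all possible k-grams so every sequence maps to the same layout.
    return [grams["".join(p)] for p in product(AMINO_ACIDS, repeat=k)]
```

Concatenating the two lists gives a fixed-length vector (20 + 20**k entries) for any sequence, which is the form a downstream classifier needs.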
The rotation forest classifier is a successful ensemble method for a wide range of data mining applications. It transforms the feature space through Principal Component Analysis (PCA), which increases the power of the classifier. The proposed method is compared with different classifiers, and the results show that it performs much better than other state-of-the-art methods in this application.
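The PCA-based change of feature space can be sketched as follows. This is a simplified illustration, assuming NumPy is available, and not the paper's code: the function name `rotation_matrix` and its parameters are hypothetical, and the class subsampling step of the original rotation forest algorithm is omitted. Each ensemble member would train a decision tree on `X @ R` with its own `R`.

```python
import numpy as np


def rotation_matrix(X, n_subsets, rng, sample_frac=0.75):
    """Build one rotation matrix from PCA on random, disjoint feature subsets."""
    n_samples, n_features = X.shape
    # Randomly partition the feature indices into disjoint subsets.
    subsets = np.array_split(rng.permutation(n_features), n_subsets)
    R = np.zeros((n_features, n_features))
    for cols in subsets:
        # Bootstrap sample of rows so each ensemble member sees different data.
        size = max(2, int(sample_frac * n_samples))
        rows = rng.choice(n_samples, size=size, replace=True)
        block = X[np.ix_(rows, cols)]
        block = block - block.mean(axis=0)  # center before PCA
        # PCA via SVD: the rows of vt are the principal axes; keep all of them.
        _, _, vt = np.linalg.svd(block, full_matrices=True)
        # Place the axes in the rows and columns belonging to this subset.
        R[np.ix_(cols, cols)] = vt.T
    return R
```

Because all principal components are kept, `R` is orthogonal: the data are rotated rather than reduced, and diversity across the ensemble comes from the random feature partition and row sample.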

Keywords:

Protein Sequence, Feature Extraction, TFCRF, Rotation Forest, Relevancy Factor

Language:

Persian

Published:

Signal and Data Processing, Volume 21, Issue 1, 2024

Pages:

15 to 26

https://magiran.com/p2747980

Accredited scientific journal

Signal and Data Processing

Technical and engineering quarterly

ISSN: 2538-4201 eISSN: 2538-421X

License holder:

Khajeh Nasir al-Din Toosi Research Institute for the Development of Advanced Technologies

Managing director:

Dr. Javad Sheikhzadegan

Editor-in-chief:

Dr. Mohammad Hassan Ghassemian

Journal phone: 021-83857605

Author (3)

Iranpour Mobarakeh, Majid

Assistant Professor, Computer Engineering and IT, Payame Noor University, Tehran, Iran
