Table of Contents

Journal of Biostatistics and Epidemiology
Volume 5, Issue 2, Spring 2019

  • Publication date: 1399/02/16 (Solar Hijri calendar)
  • Number of titles: 8
  • Alabi Banjoko, Waheed Yahya*, Mohammed Garba Pages 91-104

    In this study, an efficient Support Vector Machine (SVM) algorithm was proposed for feature selection and classification of multi-category tumour classes of biological samples using gene expression profiles. The feature selection interface of the algorithm employed the F-statistic of an ANOVA-like testing scheme at a chosen family-wise error rate, which ensured efficient detection of false-positive genes. The gene subsets selected in this way were further screened for optimality using the misclassification error rates yielded by each subset and by their combinations, in a sequential selection manner. Using 10-fold cross-validation, the optimal values of the SVM parameters with an appropriate kernel were determined for tissue sample classification with a one-versus-all approach. The entire data matrix was randomly partitioned into a 95% training set, used to train the SVM classifier, and a 5% test set, used to evaluate the classifier's predictive performance over 1,000 Monte-Carlo cross-validation runs. Results from the Monte-Carlo study showed excellent performance of the SVM classifier, with high prediction accuracy for the tissue samples based on the few gene biomarkers selected by the proposed feature selection method. A published microarray breast cancer dataset with five clinical endpoints was employed to validate the results of the simulation studies.

    Keywords: Support Vector Machines, Monte-Carlo Cross-Validation, F-Statistic, Family-wise error rate, Misclassification Error Rate
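The F-statistic screening and the 95%/5% Monte-Carlo partitioning described in the abstract above can be illustrated with a minimal pure-Python sketch. It assumes expression values are stored in a hypothetical dict mapping each gene name to one value per sample. The names `anova_f`, `rank_genes` and `monte_carlo_splits` are illustrative and are not the authors' code. The family-wise error rate step is omitted. One option would be to turn each F into a p-value (for example with `scipy.stats.f.sf`) and compare it with a Bonferroni-adjusted threshold, but the abstract does not say which correction was used. The SVM itself is not reproduced.

```python
import random


def anova_f(values, labels):
    """One-way ANOVA F-statistic for a single gene across tumour classes."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    # Between-class and within-class sums of squares
    ssb = sum(len(g) * (means[label] - grand_mean) ** 2 for label, g in groups.items())
    ssw = sum((v - means[label]) ** 2 for label, g in groups.items() for v in g)
    # Raises ZeroDivisionError if there is no within-class variation
    return (ssb / (k - 1)) / (ssw / (n - k))


def rank_genes(expression, labels, top=10):
    """Return the `top` genes with the largest F-statistics (hypothetical helper)."""
    scores = {gene: anova_f(vals, labels) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]


def monte_carlo_splits(n_samples, train_frac=0.95, runs=1000, seed=0):
    """Yield (train, test) index lists from repeated random partitions."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    n_train = round(train_frac * n_samples)
    for _ in range(runs):
        rng.shuffle(indices)
        # Slicing copies, so each yielded split is independent of later shuffles
        yield indices[:n_train], indices[n_train:]


labels = ["a", "a", "a", "b", "b", "b"]
expression = {"g1": [1, 2, 3, 7, 8, 9], "g2": [1, 8, 2, 7, 3, 9]}
print(rank_genes(expression, labels, top=1))  # ['g1']
```

In a full pipeline, the top-ranked genes would be fed to an SVM trained on each `train` split and scored on the matching `test` split. The misclassification rates would then be averaged over all runs.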
  • Naser Ahmadi*, Saeed Shirazi, Hamed Baziyad Pages 105-109
    Background and Aim

    Survival analysis is one of the statistical methods used to analyze time-to-event medical data. In survival models, the response variable is the time until an event occurs. The main characteristic of survival data is the presence of censored observations. When the distribution of survival time is known, parametric methods can be used, and the Weibull distribution is among the most important and popular choices. If the data derive from a heterogeneous population, however, simple parametric models (such as the Weibull) do not fit the data appropriately. One method introduced to overcome this problem is the use of mixture models.

    Methods

    To assess the validity of the two-component Weibull mixture model, we used a simulation method on heterogeneous survival data. For this purpose, data with different sample sizes were generated in batches of 1,000. The validity of the model was then checked using the root mean square error (RMSE) criterion.

    Results

    Increasing the sample size decreased the RMSE of the parameter estimates. Even the maximum observed RMSE across all parameters was negligible.

    Conclusion

    The Bayesian Weibull mixture model was a proper fit for the heterogeneous survival data.

    Keywords: Bayesian mixture model, Survival analysis, Survival models, Weibull mixture, RMSE
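The simulation-and-RMSE workflow in the abstract above can be sketched in pure Python. Note that the authors estimated the mixture parameters with a Bayesian model, and that step is not reproduced here. This sketch instead measures the RMSE of a deliberately simple nonparametric estimator (the empirical survival proportion at a fixed time `t0`) and ignores censoring. All parameter values are hypothetical. It only illustrates the principle that RMSE shrinks as the sample size grows.

```python
import math
import random


def mixture_survival(t, p, shape1, scale1, shape2, scale2):
    """S(t) of a two-component Weibull mixture with mixing proportion p."""
    s1 = math.exp(-((t / scale1) ** shape1))
    s2 = math.exp(-((t / scale2) ** shape2))
    return p * s1 + (1 - p) * s2


def sample_mixture(n, p, shape1, scale1, shape2, scale2, rng):
    """Draw n uncensored times; random.weibullvariate takes (scale, shape)."""
    return [
        rng.weibullvariate(scale1, shape1) if rng.random() < p
        else rng.weibullvariate(scale2, shape2)
        for _ in range(n)
    ]


def rmse(estimates, true_value):
    """Root mean square error of a list of estimates around the true value."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))


def simulate_rmse(n, reps, t0, params, seed=1):
    """RMSE of the empirical survival proportion at t0 over `reps` replicates."""
    rng = random.Random(seed)
    truth = mixture_survival(t0, *params)
    estimates = []
    for _ in range(reps):
        times = sample_mixture(n, *params, rng)
        estimates.append(sum(t > t0 for t in times) / n)
    return rmse(estimates, truth)


params = (0.4, 0.8, 1.0, 2.5, 3.0)  # hypothetical p, shape1, scale1, shape2, scale2
for n in (20, 200, 2000):
    print(n, round(simulate_rmse(n, reps=300, t0=1.5, params=params), 4))
```

Replacing the empirical proportion with posterior means of the mixture parameters would turn this into a parameter-level RMSE study closer to the one described. That step needs an MCMC sampler, which is outside this sketch.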
  • Amuche Ibenegbu*, George Osuji, Edith Umeh Pages 110-119
    Introduction

    In Nigeria, hypertension is a common illness among adults. This research was carried out to determine the best model for predicting the survival of hypertensive patients, using the goodness-of-fit criteria standard error (SE), Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC).

    Method

    A total of 105 patients diagnosed with hypertension between January 2013 and July 2018 were included, with death as the event of interest. Six parametric models (exponential, Weibull, lognormal, log-logistic, Gompertz and hypertabastic distributions) were fitted to the data, and goodness-of-fit criteria (SE, AIC and BIC) were used to determine the best model. These parametric models were considered because they are all lifetime distributions.

    Results

    The results show that the hypertabastic distribution has the lowest AIC and BIC, followed by the Gompertz distribution. The hypertabastic model also has the smallest standard error. This indicates that, in terms of relative efficiency and parameterization, the hypertabastic model is the best. The survival probability plots of the six parametric models show that the hypertabastic distribution fitted the data best, because it shows a clearer step function than the other distributions, which supports the SE, AIC and BIC results.

    Conclusion

    Since the hypertabastic distribution has the lowest SE, AIC and BIC, it is the best parametric model for predicting the survival of hypertensive patients at Chukwuemeka Odumegwu Ojukwu University Teaching Hospital, Awka, Nigeria.

    Keywords: Survival analysis, Censoring, Parametric models, Hypertabastic
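The AIC/BIC comparison described in the abstract above reduces to two formulas: AIC = 2k - 2 log L and BIC = k log(n) - 2 log L, where k is the number of parameters and L the maximized likelihood. Whichever model has the smallest value is preferred. The following is a minimal sketch, not the authors' code. Only the right-censored exponential model has a closed-form maximum likelihood estimate, so it is the one fitted here. The Weibull, Gompertz, hypertabastic and other models would need numerical optimization. Taking n as the number of patients in BIC is an assumption, since conventions for censored data vary. The SE criterion used in the study is not reproduced. The data and log-likelihoods below are hypothetical.

```python
import math


def aic(loglik, k):
    """Akaike Information Criterion: 2k - 2 log L."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian Information Criterion: k log(n) - 2 log L."""
    return k * math.log(n) - 2 * loglik


def fit_exponential(times, events):
    """Closed-form MLE of a right-censored exponential model.

    events[i] is 1 for an observed death and 0 for a censored time.
    Log-likelihood: d * log(rate) - rate * sum(times), with rate = d / sum(times).
    """
    d = sum(events)
    total_time = sum(times)
    rate = d / total_time
    loglik = d * math.log(rate) - rate * total_time
    return rate, loglik


def best_model(fits, n, criterion="aic"):
    """fits maps model name -> (log-likelihood, number of parameters)."""
    score = {
        name: aic(ll, k) if criterion == "aic" else bic(ll, k, n)
        for name, (ll, k) in fits.items()
    }
    return min(score, key=score.get), score


# Hypothetical follow-up times (months) and event indicators
times, events = [2.0, 3.0, 5.0], [1, 0, 1]
rate, ll_exp = fit_exponential(times, events)
fits = {"exponential": (ll_exp, 1), "hypothetical two-parameter model": (-4.0, 2)}
print(best_model(fits, n=len(times), criterion="bic"))
```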
  • Mehari Teklezgi* Pages 120-136
    Background & Aim

    Durum wheat is an economically important food eaten regularly by billions of people worldwide. At the International Center for Agricultural Research in the Dry Areas (ICARDA), genebanks use the Focused Identification of Germplasm Strategy (FIGS) to identify and quantify relationships between agro-climatic conditions and the presence of specific traits. Hence, this study aims to investigate the predictive value of various types of long-term agro-climatic variables for the future values of different traits.

    Method

    Two approaches were used to analyze 238 durum wheat landraces. The first was ordinary multiple linear regression (MLR) with stepwise variable selection on the complete dataset. The second was multiple linear regression with predictors selected by penalized methods, using cross-validated mean square error as the model selection criterion. Each model was fitted independently to the Days to Heading and Days to Maturity response variables, with 57 predictor variables. Ordinary least squares (OLS) and weighted least squares (WLS) estimation methods were used.

    Result

    The findings indicated high multicollinearity among the predictor variables. Under both the ordinary and the shrinkage-based models, some predictors affect Days to Heading and Days to Maturity positively and others negatively. Predictions from the lasso-based model were not very reasonable. Predictions for Days to Heading appeared better, as the predicted values increased steadily with the actual values, although with considerable variability.

    Conclusion

    In conclusion, inferences and predictions from the ordinary MLR models cannot be trusted because of multicollinearity and the violation of some model assumptions. However, predictions from models with predictors selected by shrinkage methods may be better, as the effect of the variability on these methods is minimal. Moreover, the WLS methods might give more sensible predictions than the OLS methods. Better predictions were obtained for Days to Heading.

    Keywords: Cross-validation, Mean Square Error, MLR, Penalized Methods, Lasso, Elastic net, Bias-Variance Trade-off, Weighted Least Squares
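The penalized selection step in the abstract above (lasso with the penalty chosen by cross-validated mean square error) can be sketched in pure Python with standard cyclic coordinate descent. This is an illustration under stated assumptions, not the authors' software, which the abstract does not name. Several simplifications are assumed. The predictors are taken to be on comparable scales, so in practice they would be standardized first. A fixed iteration count replaces a convergence tolerance. The elastic net (which adds an L2 term) and WLS weighting are not shown. The function names are hypothetical.

```python
def soft_threshold(rho, lam):
    """Soft-thresholding operator S(rho, lam) used by the lasso."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1.

    Assumes the columns of X and y are already centred (no intercept).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # y - X beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0:
                continue  # constant column: coefficient stays at zero
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_ss[j]
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
            beta[j] = new
    return beta


def centre(X, y):
    """Centre predictors and response; return the means for prediction."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    return Xc, yc, x_means, y_mean


def cv_mse(X, y, lam, k=5):
    """k-fold cross-validated mean squared prediction error for one lambda."""
    n = len(X)
    sq_err = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        Xc, yc, x_means, y_mean = centre([X[i] for i in train], [y[i] for i in train])
        beta = lasso_cd(Xc, yc, lam)
        for i in test:
            pred = y_mean + sum((X[i][j] - x_means[j]) * b for j, b in enumerate(beta))
            sq_err += (y[i] - pred) ** 2
    return sq_err / n


def select_lambda(X, y, grid, k=5):
    """Pick the lambda on `grid` with the smallest cross-validated MSE."""
    return min(grid, key=lambda lam: cv_mse(X, y, lam, k))
```

Predictors whose coefficients are exactly zero at the selected lambda are the ones the lasso drops. The retained predictors could then be refitted by OLS or WLS, as in the study's design.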
  • Abbas Mahdavi*, Mohadese Akbarinasab, Alireza Arabpour Pages 137-147
    Background & Aim

    Many kinds of data arising from real-world events need to be analyzed. In response, many researchers seek methods that fit these data better by using new models. One such approach is to introduce new distributions that better describe the available data. In recent years, new distributions have been developed from existing well-known distributions, and these new distributions usually have more parameters than the originals. Adding parameters has proved useful for exploring tail properties and for improving the goodness-of-fit of the family under study.

    Methods & Materials

    A new family of distributions is introduced using the truncated log-logistic distribution. Some statistical and reliability properties of the new family are derived.

    Results

    Four special lifetime models of the new family are investigated. The parameters are estimated by maximum likelihood. The results are validated on a real dataset, showing that the new distributions provide a better fit than some other known distributions.

    Conclusion

    We have proposed four new distributions. Owing to their flexibility and wider range of skewness, the proposed distributions fitted and captured the features of one real dataset much better than some competing distributions.

    Keywords: Hazard rate function, Log-logistic distribution, Maximum-likelihood estimation, Survival reliability function
  • M. Mazharul Islam, Uzma Marium Pages 148-162
    Background & Aim

    Little is known about twinning in developing countries due to a lack of reliable data. However, the large datasets from national-level Demographic and Health Surveys (DHSs) in developing countries can fill this gap. This paper examines the level, trends and determinants of twin births in Bangladesh, and the survival of twins until age five relative to singletons.

    Methods & Materials

    The data for the study were obtained from the 2014 Bangladesh DHS. The analysis was based on the birth histories of 43,842 live births to 17,863 women between 1978 and the survey date in November 2014. Frequency distributions, cross-tabulation, univariate and multivariate logistic regression models, and demographic methods such as the conventional life table approach were used for data analysis.

    Results

    About 1.52% of all live births in Bangladesh were twins. The twin birth rate increased by 13.4% over the last 20 years. Maternal age, parity, region of residence, economic status, father's education, contraceptive use status and religion were identified as significant predictors of twin births. Twinning emerged as a significant predictor of high childhood mortality: twins had more than eight times the risk of death during the neonatal period compared with singletons.

    Conclusion

    The increasing trend in twin births in Bangladesh and the associated higher risk of childhood mortality among twins underscore the need for a more focused care strategy during pregnancy and after birth. Further studies are needed to identify the reasons for the exceptionally high childhood mortality among twins in Bangladesh.

    Keywords: Twin births, Survival, Childhood, Mortality, Bangladesh
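The "more than eight times higher risk" reported in the abstract above is a ratio of risks, and the life-table approach combines age-specific death probabilities into a cumulative one such as the probability of dying before age five. The following is a minimal sketch of both calculations. It computes a crude risk ratio with a log-scale Wald interval, not the authors' adjusted logistic models, and the counts in the usage note are hypothetical. The interval needs at least one death in each group.

```python
import math


def relative_risk(deaths_exposed, n_exposed, deaths_ref, n_ref, z=1.96):
    """Crude risk ratio with a Wald confidence interval on the log scale."""
    risk_exposed = deaths_exposed / n_exposed
    risk_ref = deaths_ref / n_ref
    rr = risk_exposed / risk_ref
    # SE of log(RR) = sqrt(1/a - 1/n1 + 1/c - 1/n2)
    se_log = math.sqrt(1 / deaths_exposed - 1 / n_exposed + 1 / deaths_ref - 1 / n_ref)
    return rr, (rr * math.exp(-z * se_log), rr * math.exp(z * se_log))


def prob_death_by(age_specific_q):
    """Life-table probability of dying before the end of the last interval.

    age_specific_q holds the conditional probability of dying in each
    successive age interval, given survival to its start.
    """
    survival = 1.0
    for q in age_specific_q:
        survival *= 1 - q
    return 1 - survival
```

For example, with hypothetical counts of 8 neonatal deaths among 100 twins and 1 among 100 singletons, `relative_risk(8, 100, 1, 100)` gives a ratio of about 8 with a wide interval. `prob_death_by([q0, q1, ...])` applied to the age intervals up to age five gives the cumulative probability of dying before age five.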
  • Yousef Alimohamadi*, Firooz Esmailzadeh, Abdolhalim Rajabi, Zahra Kavosi, Manije Alimohammadi, Mojtaba Sepandi Pages 163-171
    Introduction

    HIV infection is one of the main public health problems worldwide. This study aimed to assess the knowledge and attitudes of young couples married in the city of Shiraz and, ultimately, to suggest an operational program for HIV prevention in Iran.

    Method

    The data collection tool was a questionnaire consisting of 32 questions on the transmission and prevention of HIV infection. The young couples were selected through simple random sampling, and the sample size was 400. Data analysis was performed using SPSS 19 software.

    Results

    Of the 400 participants, 201 (50.25%) were male and 199 (49.75%) were female. The mean age of the participants was 25.96 ± 5.95 years. The most frequent correct answer concerned transmission through needle sharing among drug users (87.4%). Regarding attitude, 94.6% of the participants agreed with the fight against HIV. Knowledge was significantly associated with age (P = 0.002), and attitude was significantly associated with gender (P = 0.004).

    Conclusion

    One of the important ways to stop the epidemic and prevent new cases of HIV is to educate people at an early age.

    Keywords: HIV, Young Couple, Knowledge, Attitude, Prevention
  • Bayowa Babalola*, Babatunde Yahya Pages 172-182
    Background

    The Cox proportional hazards model has gained ground in biostatistics and related fields. It has been extended to handle different scenarios, including violation of the proportional hazards assumption, the presence of time-dependent covariates, and time-dependent coefficients. This paper focuses on the behaviour of the Cox model with time-dependent coefficients in the presence of different levels of collinearity.

    Objectives

    The objectives of this study are to examine the effects of collinearity on the estimates of time-dependent coefficients in the Cox proportional hazards model, and to compare the model estimates for the logarithm and square functions of time.

    Materials and methods

    An algorithm based on a binomial model was extended to incorporate the different correlation structures required for the study. Plots of the scaled Schoenfeld residuals revealed the behaviour of the estimated betas at different degrees of collinearity. The results and conclusions are based solely on the outcome of the simulation study.

    Results

    The estimated betas were compared graphically with the true betas at the different levels of collinearity.

    Conclusion

    The study shows that collinearity is a major factor affecting the accuracy of the regression coefficient estimates within the framework of the Cox model.

    Keywords: Baseline hazard, Time-dependent coefficient, Collinearity, Schoenfeld residuals
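The simulation design in the last abstract combines three ingredients: correlated covariates, a coefficient that varies with a function of time (log t or t squared), and event times generated from the resulting hazard. A minimal pure-Python sketch of these ingredients follows. The abstract does not specify its binomial-model algorithm, so this uses a discrete-time stand-in. In each small interval, an event occurs with probability 1 - exp(-h0 dt exp(beta1(t) x1 + beta2 x2)), and follow-up is censored at a fixed horizon. The baseline hazard `h0`, the coefficient values, `dt` and `t_max` are all hypothetical. Fitting the Cox model and examining scaled Schoenfeld residuals (for example with R's `survival::cox.zph`) is not shown.

```python
import math
import random


def correlated_pair(rho, rng):
    """Two standard normal covariates with correlation rho."""
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2


def time_coefficient(b0, b1, t, form):
    """beta(t) = b0 + b1 * g(t), with g(t) = log(t) or t**2."""
    g = math.log(t) if form == "log" else t ** 2
    return b0 + b1 * g


def simulate(n, rho, form, seed=0, h0=0.1, beta2=0.5, dt=0.05, t_max=5.0):
    """Discrete-time event simulation with a time-varying effect of x1.

    Returns (time, event, x1, x2) tuples; event = 0 means censored at t_max.
    """
    rng = random.Random(seed)
    steps = int(round(t_max / dt))
    data = []
    for _ in range(n):
        x1, x2 = correlated_pair(rho, rng)
        time, event = t_max, 0
        for s in range(steps):
            t_mid = (s + 0.5) * dt  # interval midpoint avoids log(0)
            lp = time_coefficient(0.5, 0.3, t_mid, form) * x1 + beta2 * x2
            if rng.random() < 1 - math.exp(-h0 * dt * math.exp(lp)):
                time, event = (s + 1) * dt, 1
                break
        data.append((time, event, x1, x2))
    return data


# Low versus high collinearity between x1 and x2 (hypothetical settings)
for rho in (0.1, 0.9):
    sample = simulate(500, rho, "log")
    print(rho, sum(e for _, e, _, _ in sample))
```

Raising `rho` toward 1 makes x1 and x2 nearly interchangeable. This is the setting in which the study reports that the estimated time-dependent coefficients drift from their true values.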