-
Determinants of Environmental Health Related Diseases in Kenya with Generalized Linear Mixed Models: Analysis of Kenya Integrated Household Budget Survey
Jemimah Wangui Muraya,
Beatrice Karanja Kimani,
John Mwangi Ndiritu
Issue:
Volume 5, Issue 4, July 2016
Pages:
162-172
Received:
29 February 2016
Accepted:
11 March 2016
Published:
4 June 2016
Abstract: Generalized linear models (GLMs) form a class of fixed-effects regression models for several types of dependent variable, whether continuous, dichotomous or counts. Common GLMs include linear regression, logistic regression and Poisson regression. These models have typically been used to model data arising from a heterogeneous population under the assumption of independence. However, in applied science and in real-life situations in general, one is confronted with collections of correlated data (Mark Aerts et al, 2005). This generic term embraces a multitude of data structures, such as multivariate observations, clustered data, repeated measurements, longitudinal data, and spatially correlated data. Generalized Linear Mixed Models (GLMMs) are able to handle an extraordinary range of complications in regression-type analyses. They are often used to handle correlations that arise in longitudinal and other clustered data. This study sought to fit GLMMs to the Kenya Integrated Household Budget Survey data collected in 2005/6 to explain different factors and their influence on individual morbidity in Kenya. The cluster variable was used to introduce the random effect in this data. From the analysis, it was deduced that gender increases the log-odds of an individual getting a disease, while living in good housing conditions reduces the log-odds of an individual experiencing morbidity. The main source of drinking water and the human waste disposal method were significant in explaining individual morbidity in Kenya. This study can, however, be extended to incorporate other factors such as the income level of individuals. Individuals with low income are believed to be more likely to experience environmental health related diseases than individuals with higher income.
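For orientation, one common way to write a GLMM of the kind described in this abstract is a logistic model with a random intercept per survey cluster. This is a sketch under that assumption; the abstract does not state the exact specification, and the symbols below (y_ij, x_ij, beta, u_j, sigma_u) are illustrative notation rather than the authors' own.

```latex
% Random-intercept logistic GLMM (assumed form, not taken from the paper):
% y_{ij} = 1 if individual i in cluster j reports morbidity, 0 otherwise.
\[
  \log\frac{P(y_{ij}=1 \mid u_j)}{1-P(y_{ij}=1 \mid u_j)}
    = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j,
  \qquad u_j \sim N(0,\sigma_u^{2}).
\]
```

Under this form, a positive coefficient in beta raises the log-odds of morbidity and a negative one lowers it, which is how the statements about gender and housing conditions would be read.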
-
Evaluation of Error Rate Estimators in Discriminant Analysis with Multivariate Binary Variables
Issue:
Volume 5, Issue 4, July 2016
Pages:
173-179
Received:
1 April 2016
Accepted:
19 April 2016
Published:
4 June 2016
Abstract: Classification problems often suffer from small samples in conjunction with a large number of features, which makes error estimation problematic. When a sample is small, there is insufficient data to split the sample, and the same data are used for both classifier design and error estimation. Error estimation can suffer from high variance, bias or both. The problem of choosing a suitable error estimator is exacerbated by the fact that estimation performance depends on the rule used to design the classifier, the feature-label distribution to which the classifier is to be applied and the sample size. This paper is concerned with the evaluation of error rate estimators in two-group discriminant analysis with multivariate binary variables. The behaviour of the eight most commonly used estimators is compared and contrasted by means of Monte Carlo simulation. The criterion used for comparing the error rate estimators is the sum squared error rate (SSE). Four experimental factors are considered for the simulation, namely: the number of variables, the sample size relative to the number of variables, the prior probability and the correlation between the variables in the populations. From the analysis carried out, the estimators can be ranked as follows: DS, O, OS, U, R, JK, P and D.
-
Feed Forward Neural Network Versus Kernel Regression a Case of Body Mass Index and Body Dimensions
Nzinga Christine Mutono,
Gichuhi Anthony Waititu,
Wanjoya Anthony Kiberia
Issue:
Volume 5, Issue 4, July 2016
Pages:
180-185
Received:
5 May 2016
Accepted:
18 May 2016
Published:
7 June 2016
Abstract: Body mass index is a measure of body fitness and is considered very important in screening body categories that may lead to health problems. Understanding the risk factors of obesity provides more insight into the nature of policies that can be put in place to fight obesity. However, uncertainty remains regarding the most appropriate means by which to define excess body weight. It is important to develop models that best calculate Body Mass Index to help reduce the chances of obesity. The objective of this research is modeling Body Mass Index using Feed Forward Neural Network and Kernel regression. Modeling was first done using height and weight alone; 21 body dimensions were added later. The analysis was based on body dimensions data provided by San Jose State University and the U.S. Naval Postgraduate School in Monterey, California. To determine the best model, adjusted R² and Mean Square Error (MSE) were used. From the results of the study, Kernel regression was better in modeling Body Mass Index than Feed Forward Neural Network.
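The two comparison criteria named in this abstract can be sketched as follows. This is a minimal illustration that assumes the usual textbook definitions, with the number of predictors taken as the number of input variables; the abstract does not say how the authors computed adjusted R² for the nonparametric models, and the function names are hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def adjusted_r2(y_true, y_pred, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    Assumes the textbook definition; requires n > n_predictors + 1
    and a non-constant y_true.
    """
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Toy values (hypothetical, not from the study's data set).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.0, 2.0, 3.0, 4.0, 6.0]
print(mean_squared_error(observed, predicted))
print(adjusted_r2(observed, predicted, n_predictors=2))
```

A lower MSE and a higher adjusted R² both point to the better-fitting model, which is the sense in which one method would be judged better than the other.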
-
Cluster Analysis, K-Nearest Neighbour and Artificial Neural Network Applied to Credit Data to Classify Credit Applicants
Mutua Jennifer Ndanu,
Gichuhi Anthony Waititu,
Wanjoya Anthony Kiberia,
Muia Patricia Nthoki
Issue:
Volume 5, Issue 4, July 2016
Pages:
186-191
Received:
5 May 2016
Accepted:
18 May 2016
Published:
7 June 2016
Abstract: Potential risk on credit applicants is the probability of default on repayment of a credit facility rendered by a commercial bank. Credit scoring models are therefore developed to improve efficiency in decision making on credit risk. The objectives of this research are to classify credit applicants using cluster analysis, Artificial Neural Network and K-Nearest neighbours techniques and to compare their predictive accuracy. The dataset was first split so that 70% of the data was used for training and the remaining 30% was used for testing. Finally, the ability of the developed models to forecast trends was investigated. Here we assume that a cluster is homogeneous if it contains members that have a high degree of similarity. The analysis is based on credit data provided by commercial banks in Kenya, used to test the effectiveness of cluster analysis, K-Nearest neighbour (K-NN) and artificial neural network (ANN) models. To determine the best model in classification accuracy, a confusion matrix was used. To test for goodness of fit, the chi-square test was used. From the results of the study, the researcher concluded that ANN was better in predicting the classification of credit applicants than K-NN and Cluster Analysis.
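The evaluation workflow mentioned in this abstract (a 70/30 train/test split followed by a confusion matrix) can be sketched as below. This is an illustration only: the function names, the random seed, and the toy labels are assumptions, and the classifiers themselves (cluster analysis, K-NN, ANN) are not implemented here.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows and split it into (train, test) parts."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix(actual, predicted, labels):
    """Rows index the actual class, columns the predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of cases on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels for five test applicants.
actual = ["good", "good", "bad", "bad", "good"]
predicted = ["good", "bad", "bad", "bad", "good"]
cm = confusion_matrix(actual, predicted, labels=["good", "bad"])
print(cm, accuracy(cm))
```

Comparing the accuracy obtained from each model's confusion matrix on the same test set is one simple way to rank classifiers.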
-
Bayesian Prediction Based on Type-I Hybrid Censored Data from a General Class of Distributions
Issue:
Volume 5, Issue 4, July 2016
Pages:
192-201
Received:
4 May 2016
Accepted:
12 May 2016
Published:
14 June 2016
Abstract: One- and two-sample Bayesian prediction intervals based on Type-I hybrid censored data for a general class of distributions 1-F(x)=[ah(x)+b]^c are obtained. To illustrate the developed results, the inverse Weibull distribution with two unknown parameters and the inverted exponential distribution are used as examples. The importance sampling technique and Markov Chain Monte Carlo (MCMC) are used to compute the approximate predictive survival functions. Finally, a real-life data set and a generated data set are used to illustrate the results derived here.
-
Minimax Estimation of the Parameter of Maxwell Distribution Under Different Loss Functions
Issue:
Volume 5, Issue 4, July 2016
Pages:
202-207
Received:
21 May 2016
Accepted:
6 June 2016
Published:
23 June 2016
Abstract: The aim of this article is to study the Bayes estimation and minimax estimation of the parameter of the Maxwell distribution. Bayes estimators are obtained with a non-informative quasi-prior distribution under different loss functions, namely, weighted squared error loss, squared log error loss and entropy loss functions. Then the minimax estimators of the parameter are obtained by using Lehmann’s theorem. Finally, the performances of these estimators are compared in terms of risks.
-
Food Production Modelling Using Fixed Effect Panel Data for Nigeria and Other 14 West African Countries (1990-2013)
Olatunji Taofik Arowolo,
Matthew Iwada Ekum
Issue:
Volume 5, Issue 4, July 2016
Pages:
208-218
Received:
22 May 2016
Accepted:
31 May 2016
Published:
8 July 2016
Abstract: In this research, the fixed effect panel data predictive model was employed to formulate panel regression models of food production for 15 selected countries of the Economic Community of West African States (ECOWAS), using four (4) World Development Indicators (WDI) as explanatory variables. Data were collected from 1990 to 2013. The four WDI are Food imports (% of merchandise imports), Agricultural land (% of land area), Fertilizer consumption (kilograms per hectare of arable land) and Inflation (consumer prices, annual %). The fixed effect with cross-sectional seemingly unrelated regression (SUR) static panel data method was employed. The result of the analysis shows that agricultural land and fertilizer consumption have a significant positive effect on the food production index of ECOWAS countries, while food imports and the rate of inflation have a significant negative effect on the food production index of the ECOWAS countries. It is seen that 98.8% of the variation in food production among ECOWAS countries can be explained by the variations in food imports, agricultural land, fertilizer consumption and inflation. We therefore recommend that ECOWAS countries increase agricultural land and fertilizer consumption and reduce food imports and the rate of inflation in order to boost their food production level and have an excess to export.
-
Estimation of Change Point in Poisson Random Variables Using the Maximum Likelihood Method
Shalyne Nyambura,
Simon Mundia,
Anthony Waititu
Issue:
Volume 5, Issue 4, July 2016
Pages:
219-224
Received:
27 May 2016
Accepted:
18 June 2016
Published:
11 July 2016
Abstract: The point at which a process undergoes a significant shift from its usual course is known as a change point. Change point analysis entails testing for the presence of change in a given process and locating a single or multiple change points. This study presents a maximum likelihood estimate of a single change point in a sequence of independent and identically distributed Poisson random variables which are dependent on some covariates. A Poisson regression model is used to estimate the mean parameter and the likelihood function. A likelihood ratio test is conducted to check whether change exists, with critical values of the test being obtained as in Gombay and Horvath [9]. The procedure is validated on simulated data for cases when there is no change and when there is a predefined change point, with special application to the incidence of road accidents in Kenya.
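The single change-point idea in this abstract can be sketched in a simplified setting. The code below assumes no covariates, so each segment has a constant Poisson rate estimated by its sample mean, whereas the paper uses a Poisson regression model for the mean. It also does not reproduce the critical values of Gombay and Horvath; it only returns the likelihood ratio statistic. Function names and the toy counts are hypothetical.

```python
import math


def poisson_loglik(counts):
    """Poisson log-likelihood of counts at the MLE rate (the sample mean)."""
    n = len(counts)
    total = sum(counts)
    if total == 0:
        # Estimated rate is 0, so all-zero data has probability 1.
        return 0.0
    rate = total / n
    # sum over i of [x_i * log(rate) - rate - log(x_i!)]
    return (total * math.log(rate) - n * rate
            - sum(math.lgamma(x + 1) for x in counts))


def estimate_change_point(counts):
    """Return (k, lr_stat).

    k is the number of observations before the estimated change;
    lr_stat is twice the log-likelihood gain over the no-change model.
    """
    null_ll = poisson_loglik(counts)
    best_k, best_ll = None, -math.inf
    for k in range(1, len(counts)):  # keep at least one point per segment
        ll = poisson_loglik(counts[:k]) + poisson_loglik(counts[k:])
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k, 2.0 * (best_ll - null_ll)


# Toy sequence (hypothetical): low counts followed by high counts.
data = [2, 1, 3, 2, 2, 9, 8, 10, 9, 11]
k_hat, lr_stat = estimate_change_point(data)
print(k_hat, lr_stat)
```

A large value of the statistic suggests a change; deciding how large is large enough requires the critical values that the abstract attributes to Gombay and Horvath.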
-
Modeling Multivariate Correlated Binary Data
Ahmed Mohamed Mohamed El-Sayed
Issue:
Volume 5, Issue 4, July 2016
Pages:
225-233
Received:
13 June 2016
Accepted:
22 June 2016
Published:
13 July 2016
Abstract: This paper provides the model, estimation and test procedures for the measures of association in correlated binary data associated with covariates in the multivariate case. The generalized linear model (GLM), which satisfies the Markov properties for serial dependence, and the alternative quadratic exponential form (AQEF) are employed for multivariate Bernoulli outcome variables. The log-odds ratios as measures of association have been estimated, and the appropriate test procedures are suggested. The over-dispersion measure is investigated for the multivariate correlated binary outcomes. The scaled deviance is used as a measure of the goodness of fit of the model. For comparison, we have used the data on the respiratory disorder. In this situation, we indicate that the vectorized generalized linear models (VGLM) and AQEF procedures give the same estimates of regression parameters in the bivariate case.
-
Extreme Values Modelling of Nairobi Securities Exchange Index
Kelvin Ambrose Kiragu,
Joseph Kyalo Mung’atu
Issue:
Volume 5, Issue 4, July 2016
Pages:
234-241
Received:
21 June 2016
Accepted:
28 June 2016
Published:
13 July 2016
Abstract: Extreme events and the clustering of extreme values provide fundamental information which can be used for risk assessment in finance. When applying extreme value analysis to financial time series, we handle two major issues: bias and serial dependence. The main objective of the study will be to model the extreme values of the NSE all share index using the EVT method, thus contributing to empirical evidence of the research into the behavior of the extreme returns of financial series in East Africa and specifically Kenya. This study will model the extreme values of the Nairobi Securities Exchange all share index (2008-2015) by applying Extreme Value Theory to fit a model to the tails of the daily stock returns data. A GARCH-type model will be fitted to the data to correct for the effects of autocorrelation and conditional heteroscedasticity before the EVT method is applied. The Peak-Over-Threshold approach will be employed, with the model parameters obtained by means of Maximum Likelihood Estimation. The model's goodness of fit will be assessed graphically using Q-Q and density plots.
-
Multinomial Logistic Regression for Modeling Contraceptive Use Among Women of Reproductive Age in Kenya
Anthony Makau,
Anthony G. Waititu,
Joseph K. Mung’atu
Issue:
Volume 5, Issue 4, July 2016
Pages:
242-251
Received:
14 June 2016
Accepted:
24 June 2016
Published:
23 July 2016
Abstract: Contraceptive use is viewed as a safe and affordable way to halt rapid population growth and reduce maternal and infant mortality. Its use in Kenya remains a challenge despite the existence of family planning programmes initiated by the government and other stakeholders aimed at reducing the fertility rate and increasing contraceptive use. This study aimed at modeling contraceptive use in Kenya among women of reproductive age using the Multinomial logistic regression technique. A household-based cross-sectional study was conducted between November 2008 and March 2009 by the Kenya National Bureau of Statistics on women of reproductive age to determine the country’s Contraceptive Prevalence Rate and Total Fertility Rate among other indicators; its results were the data source for this study. Multinomial logistic regression analysis was done in the R statistical package, version 3.2.1. The modern method was the most preferred contraceptive method, of which injectables, female sterilization and pills were the common types. Descriptive analysis showed that the richest women aged 30-34 years used modern contraceptives, while poorer women aged 35-39 years preferred the traditional method. Multinomial Logistic Regression Analysis found marital status, wealth category, education level, place of residence and the number of children a woman had to be significant factors, while age, religion and access to a health facility were insignificant. A simulation study showed that the MLR parameter estimates converged to their true values while their standard errors reduced as the sample size increased. The Kolmogorov-Smirnov statistic of the MLR parameter estimates decreased while the P-value increased as the sample size increased and remained statistically insignificant.
Marital status, wealth category, education level, place of residence and the number of children a woman had could determine the contraceptive method a woman would choose, while age, religion and access to a health facility had no influence on the decision to choose a folkloric, traditional or modern method of contraception. MLR parameter estimates are consistent and normally distributed.