-
Outlier Detection Technique for Univariate Normal Datasets
Ooko Silas Owuor,
Troon John Benedict,
Otieno Okumu Kevin
Issue:
Volume 11, Issue 1, January 2022
Pages:
1-12
Received:
19 December 2021
Accepted:
8 January 2022
Published:
21 January 2022
Abstract: This paper presents an outlier detection technique for univariate normal datasets. Outliers are observations that lie an abnormal distance from the mean. Outlier detection is useful in areas such as fraud detection, financial analysis, health monitoring and statistical modelling. Many recent approaches detect outliers according to reasonable, pre-defined concepts of an outlier. Methods such as the Gaussian method have been widely used to detect outliers in univariate datasets; however, such methods use measures of central tendency and dispersion that are themselves affected by outliers, which makes them less robust. The study aimed to provide an alternative method for outlier detection in univariate normal datasets by deploying the measures of variation and central tendency that are least affected by outliers (the median and the geometric measure of variation). The study formulated an outlier detection formula using the median and the geometric measure of variation, applied it to randomly simulated normal datasets containing outliers, and recorded the number of outliers detected by the method in comparison with the two best existing methods of outlier detection. The study then compared the sensitivity of the three methods. The simulation was done in two ways: the first varied the mean with a constant standard deviation, while the second held the mean constant and varied the standard deviation. The formulated technique performed best, eliminating more of the required outliers than the other two Gaussian outlier detection techniques when the mean was varied.
The study also established that the formulated method was stricter when the standard deviation was varied, but it still stood out as the best because an outlier is defined relative to the mean and not the standard deviation. The study established that the formulated method is more sensitive than the Gaussian method of outlier detection and performed as well as the best existing outlier detection technique. In conclusion, the formulated method could be employed for outlier detection in univariate normal datasets, as it performed almost the same as the best existing method of outlier detection for univariate datasets.
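The contrast the abstract draws between a mean-based rule and a median-based rule can be sketched as follows. This is an illustration only: the abstract does not give the formula for the geometric measure of variation, so the sketch swaps in the median absolute deviation (MAD), scaled by 1.4826, as a stand-in robust scale. The cut-off `k = 3`, the function names and the sample values are likewise assumptions, not taken from the paper.

```python
import statistics


def gaussian_outliers(data, k=3.0):
    """Flag points more than k sample standard deviations from the mean."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) > k * sd]


def median_based_outliers(data, k=3.0):
    """Flag points more than k robust-scale units from the median.

    Assumption: the scale is 1.4826 * MAD, a stand-in for the paper's
    geometric measure of variation, whose formula the abstract omits.
    """
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    scale = 1.4826 * mad
    return [x for x in data if abs(x - med) > k * scale]


# Hypothetical sample: eight values near 10 plus two gross outliers.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 25.0, 30.0]

print(gaussian_outliers(sample))      # [] (the outliers inflate the mean and SD)
print(median_based_outliers(sample))  # [25.0, 30.0]
```

In this sample the two extreme values pull the mean to 13.5 and the standard deviation to about 7.5, so the mean-based rule flags nothing, while the median-based rule flags both. This is the masking effect that motivates using location and scale measures that are less affected by outliers.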
-
The Determination of Sample Size in a Bayesian Estimation of Population Proportions: How and Why to Do It in a Regression Framework
Issue:
Volume 11, Issue 1, January 2022
Pages:
13-18
Received:
28 December 2021
Accepted:
17 January 2022
Published:
21 January 2022
Abstract: Principally to reduce cost by reducing the sample size required to conduct survey research, this article presents and illustrates a method to determine the sample sizes required to obtain Bayesian estimates of population proportions with specified margins of error. The development proceeds within a regression framework derived from mental test theory. Specifically, building on prior work, the development presented here enables a researcher to conduct a survey to obtain pure Bayes estimates of the proportion of all members of a defined population choosing each of a number of mutually exclusive and exhaustive options, or falling into each of a number of mutually exclusive and exhaustive categories, including the case of two. The regression framework not only provides useful insight into Bayesian and classical statistics but also enables the development to proceed without explicit reference to the differing parent distributions of the sample and population proportions, both being asymptotically normal. In addition to the sample-size advantage, which is substantial, this article identifies other practical advantages that Bayesian estimation has over classical estimation of population proportions and, in a somewhat in-depth comparison of the two, discusses other reasons a Bayesian method may be a powerful substitute for the classical method of estimating population proportions via independent random sampling.
-
Control Chart and Its Application in Modelling Body Mass Index (BMI) of Students in Delta State Polytechnic, Oghara
Akpojaro Ogheneochuko Owens,
Agbogidi Bess Rioborue
Issue:
Volume 11, Issue 1, January 2022
Pages:
19-26
Received:
2 December 2021
Accepted:
7 January 2022
Published:
25 January 2022
Abstract: This study aimed to examine the health status of students as a function of the body mass index (BMI) using control charts. BMI has proven very useful in helping managers to estimate the weight normality of individuals as a measure of healthy living. This study evaluated the reported BMI of students and the problems associated with abnormal BMI among students. Stratified sampling was adopted because the polytechnic has three faculties, or schools, each with several departments; a random sample of 150 students was therefore selected from the three schools of study in the Delta State Polytechnic, Otefe-Oghara, and their BMI was examined using data on weight and height. The results revealed that, in the school of engineering, the students' BMI is in statistical control on the X-chart but out of control on the MR-chart; in the school of applied sciences, both the X-chart and the MR-chart are out of control; and in the school of business, both charts are in statistical control. The test for randomness was rejected, which implies that the sample evidence is sufficient to infer that the current measures of body mass index in the general population are not random. Following these findings, it was concluded that the quality control tool used (the control chart for individual units) is a veritable tool for student BMI diagnostics. The study also concludes that most students' BMI was classified as obese according to the World Health Organization (WHO) standard, which indicates a serious risk of obesity-related conditions such as diabetes, cardiac problems and even stroke, especially in the schools of engineering and applied sciences.
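A control chart for individual units pairs an X-chart with a moving range (MR) chart. The sketch below is a minimal illustration using the standard constants for moving ranges of size two (2.66 for the X-chart limits and 3.267 for the MR-chart upper limit). The function names and the BMI readings are hypothetical; they are not the study's data or code.

```python
import statistics


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms over squared height in metres."""
    return weight_kg / height_m ** 2


def individuals_chart_limits(values):
    """Return (lower, centre, upper) limits for the X-chart and the MR-chart."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    centre = statistics.fmean(values)
    mr_bar = statistics.fmean(moving_ranges)
    return {
        "x": (centre - 2.66 * mr_bar, centre, centre + 2.66 * mr_bar),
        "mr": (0.0, mr_bar, 3.267 * mr_bar),
        "moving_ranges": moving_ranges,
    }


def out_of_control(values):
    """Return indices of points outside the X-chart and MR-chart limits."""
    limits = individuals_chart_limits(values)
    x_lcl, _, x_ucl = limits["x"]
    mr_ucl = limits["mr"][2]
    x_flags = [i for i, v in enumerate(values) if v < x_lcl or v > x_ucl]
    # Moving range i compares observations i and i + 1; report the later one.
    mr_flags = [i + 1 for i, r in enumerate(limits["moving_ranges"]) if r > mr_ucl]
    return x_flags, mr_flags


# Hypothetical BMI readings with one unusually high value.
readings = [22.0, 23.0, 22.5, 23.5, 22.0, 23.0, 22.5, 35.0, 22.5, 23.0]
x_flags, mr_flags = out_of_control(readings)
print(x_flags, mr_flags)
```

A series can be in control on one chart and out of control on the other, as the abstract reports for the school of engineering, because the X-chart tracks the level of the readings while the MR-chart tracks the jump between consecutive readings.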
-
On the Coefficient of Determination Ratio for Detecting Influential Outliers in Linear Regression Analysis
Arimiyaw Zakaria,
Benony Kwaku Gordor,
Bismark Kwao Nkansah
Issue:
Volume 11, Issue 1, January 2022
Pages:
27-35
Received:
1 January 2022
Accepted:
26 January 2022
Published:
9 February 2022
Abstract: The initial procedure of the Coefficient of Determination Ratio (CDR) for determining outliers in a linear regression model was suggested for centred data and declares an observation an outlier if its CDR value deviates from unity. Although the method performs very well and detects the requisite outliers more precisely than other well-known detection measures, its cut-off rule is a source of subjectivity, and the data structure for which the method is designed is restrictive. In this study, therefore, a more rigorous cut-off rule for identifying influential observations is outlined for an updated version of the CDR that covers the more general case of non-centred data. The cut-off rule involves the ratio of quantile values of the Beta distribution. An automated implementation of the procedure is presented, using datasets from the literature and datasets simulated under various conditions of sample size and of the number and distribution of explanatory variables. The method is now more general in application, and more objective and reliable as a detection measure, than the initial proposal. It therefore provides a most appreciable improvement in the explanatory power of linear regression models when the identified outliers are deleted from the data.
-
Multivariate Analysis of a Sequence of Paired Data Matrices: Successive and Simultaneous Approaches
Rodnellin Onesime Malouata,
Chedly Gélin Louzayadio,
Bernédy Nel Messie Kodia Banzouzi
Issue:
Volume 11, Issue 1, January 2022
Pages:
36-44
Received:
20 August 2021
Accepted:
7 September 2021
Published:
14 February 2022
Abstract: The relationship between two data matrices has been studied in interbattery factor analysis and, when both matrices are partitioned by rows, in the STATICO method. The main advantage of the latter is the optimality of the compromise of co-structures. It is well known, however, that the weighting coefficients of the compromise may have opposite signs in some cases, which makes it uninterpretable. Thus, many multivariate data analysis methods have been developed, particularly those designed to tackle the fundamental issue: the description of the relationships between two data matrices. This can be studied by successive modeling approaches as well as by a simultaneous modeling approach. These methods are based on co-inertia and can be reduced to finding the maximum, minimum, or other critical values of a ratio of quadratic forms. However, all these methods are successive. In this paper, we propose two algorithms. The first, called sDO-CCSWA (successive Double-Common Component and Specific Weight Analysis), maximizes the sum of squared covariances by first finding the best pair-component solution and then repeating that process in the respective residual spaces. The sDO-CCSWA is a new monotonically convergent algorithm obtained by searching for a fixed point of the stationary equations. The second is a simultaneous algorithm (DO-CCSWA), which maximizes the sum of squared covariances.
-
Minimum Number of Replications for Tests in Four-Way ANOVA in Cross Classification and Split-Plot Design
Rob Verdooren,
Dieter Rasch
Issue:
Volume 11, Issue 1, January 2022
Pages:
45-57
Received:
17 January 2022
Accepted:
9 February 2022
Published:
25 February 2022
Abstract: In statistical books on the analysis of designed experiments one can sometimes also find the computation of the number of replications for balanced one-factor and two-factor designs. Later, papers were published on the computation of the number of replications for crossed or nested balanced designs with at most three factors. In 2011 the book “Optimal experimental design with R” was published; in addition, a special R program, OPDOE, was written to do the computation for these designs and was used in that book. In this paper the determination of the minimum number of replications for balanced designs is extended to four-factor crossed designs. The balanced cross classification of the four-way analysis of variance is investigated for the following models: Model 1, the factors A, B, C and D are all fixed; Model 2, D is random and A, B and C are fixed; Model 3, C and D are random and A and B are fixed; Model 4, B, C and D are random and A is fixed. For these models small R programs are given to compute the minimal number of replications for testing the fixed effects using the non-centrality parameter λ of the non-central F-distribution F(df1, df2, λ). Further, balanced split-plot designs with one or two fixed factors in the main plots are considered. The blocks are denoted by B. The F statistics for testing the significance of the fixed factors are described, and small R programs for determining the minimal number of replications are given, again using the non-centrality parameter λ of the non-central F-distribution F(df1, df2, λ).