All posts by Gordon Clark

Use of Simulation to Reduce Hospital Emergency Department Waiting Times

The next posts will focus on the effective use of simulation to improve Emergency Department (ED) performance, using an optimization procedure with respect to controllable variables and constants. Clark (2016) describes the approach and a case study illustrating its application. Long waiting times and lengths of stay at hospital emergency departments are an important public health problem. This post describes the use of simulation to improve ED performance. The approach described was applied at the Saint Camille hospital in Paris. The hospital has about 300 beds, and its ED operates 24 hours per day and serves more than 60,000 patients per year.

Long wait times are an increasing problem in the United States, and visits to hospital EDs have been increasing. From 1999 to 2009, the number of visits increased by 32% to 136 million annual visits (Hing, Bhuiya 2012). That is, visits increased at an annual rate of about 2.8%. For some hospitals, this increase has resulted in crowding and longer wait times to see a provider. Between 2003 and 2009, the mean wait time to be examined by a provider increased by 25% to 58.1 minutes. However, the distributions of wait times are highly skewed since more serious conditions are treated more quickly. The median wait time increased by 22% to 33 minutes.
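
The 2.8% annual rate follows from compounding the 32% total increase over the 10 years from 1999 to 2009. A minimal sketch of that arithmetic (the function name is illustrative, not from the cited report):

```python
def annual_growth_rate(total_growth: float, years: int) -> float:
    """Constant annual rate that compounds to `total_growth` over `years`."""
    return (1.0 + total_growth) ** (1.0 / years) - 1.0


# 32% total growth in ED visits over the 10 years from 1999 to 2009
rate = annual_growth_rate(0.32, 10)
print(f"{rate:.1%}")  # prints 2.8%
```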

The National Academy of Engineering and the Institute of Medicine prepared a report presenting the importance of systems engineering tools in improving health care processes (Reid, Compton, Grossman et al. 2005). They emphasized the use of simulation. A discrete-event simulation of patient flow through an ED represents the ED as it evolves over time. The simulation’s state is stochastic since processes such as patient arrival times, patient severity, and treatment times are stochastic and represented by random variables. Thus one must replicate the simulation model to estimate performance measures, such as the average waiting time or the histogram of waiting times, for a specified set of ED resources such as the number of beds, doctor availability and nurse availability.
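
To illustrate why replication is needed, here is a minimal sketch of a replicated simulation. It is not the model used in the case study: it assumes a single first-come, first-served queue, exponential interarrival and treatment times, no patient severity classes, and hypothetical rates and provider counts.

```python
import heapq
import random
import statistics


def simulate_ed(num_providers, arrival_rate, service_rate, num_patients, seed):
    """One replication: return each patient's wait (minutes) to see a provider."""
    rng = random.Random(seed)
    free_at = [0.0] * num_providers  # time at which each provider becomes free
    heapq.heapify(free_at)
    clock = 0.0
    waits = []
    for _ in range(num_patients):
        clock += rng.expovariate(arrival_rate)  # next patient arrives
        earliest = heapq.heappop(free_at)       # provider who frees up first
        start = max(clock, earliest)            # patient waits if all are busy
        waits.append(start - clock)
        heapq.heappush(free_at, start + rng.expovariate(service_rate))
    return waits


def replicate(num_replications, **params):
    """Mean wait across independent replications, with its standard error."""
    means = [statistics.mean(simulate_ed(seed=r, **params))
             for r in range(num_replications)]
    return statistics.mean(means), statistics.stdev(means) / len(means) ** 0.5


# Hypothetical inputs: one arrival every 8 minutes, 30-minute mean treatment.
# The ED starts empty, so early patients bias the estimated waits downward.
common = dict(arrival_rate=1 / 8, service_rate=1 / 30, num_patients=2000)
waits = simulate_ed(num_providers=5, seed=0, **common)
mean4, se4 = replicate(20, num_providers=4, **common)
mean5, se5 = replicate(20, num_providers=5, **common)
print(f"4 providers: {mean4:.1f} min (s.e. {se4:.1f})")
print(f"5 providers: {mean5:.1f} min (s.e. {se5:.1f})")
```

Each replication uses a different seed, so the spread of the replication means gives a standard error for the estimated average wait. The list returned by `simulate_ed` can also be binned to produce a histogram of waiting times for a given resource level.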


  1. Clark, Gordon (2016). “Statistics for Quality Improvement” ASQ Statistics Division Digest, 35(2): 22-26.
  2. Hing, E. and F. Bhuiya (2012). “Wait Time for Treatment in Hospital Emergency Departments: 2009” National Center for Health Statistics Data Brief, No. 102, August 2012.
  3. Reid, P. P., W. D. Compton, J. H. Grossman, et al. (2005). Building a Better Delivery System: A New Engineering/Health Care Partnership, National Academies Press, Washington, DC.

Effectiveness of Approaches to Reduce Effect of Multicollinearity

The previous posting describes multicollinearity, a data limitation, and several methods for alleviating it when using existing data or observational data to construct a model for estimating relationships and making predictions.  This posting reviews the results of three experiments, described by Clark (2016), performed to compare the effectiveness of these approaches to alleviate the effects of multicollinearity.

Yeniay and Goktas (2002) used a real data set to predict the gross domestic product per capita (GDPPC) in Turkey. They compared the performance of partial least squares (PLS) regression, ridge regression (RR), principal component regression (PCR) and ordinary least squares (OLS). The previous posting gives a brief description of these regression methods. The data consisted of 80 observations, and each observation represents one of the 80 provinces in Turkey. The data set included 26 predictor variables, and 22 of these variables are highly correlated. They estimated the predictive capability of the models using the leave-one-out approach to estimating the variability of predictions. PLS regression had the smallest prediction variability, but it was only slightly better than PCR. The statistical significance of the difference was not examined. However, the prediction variability of PLS regression and PCR was much smaller than the variability of RR and OLS.
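
The leave-one-out approach can be sketched as follows. This is an illustration only: it uses synthetic collinear data rather than the Turkish provincial data, it covers only OLS and ridge regression (not PLS or PCR), and the penalty values are arbitrary assumptions.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Ridge coefficients on centered data; lam = 0 gives ordinary least squares."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return beta, y_mean - x_mean @ beta


def loo_rmse(X, y, lam):
    """Root mean squared error of leave-one-out predictions."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # fit on every observation except i
        beta, intercept = fit_ridge(X[keep], y[keep], lam)
        errors[i] = y[i] - (X[i] @ beta + intercept)
    return float(np.sqrt(np.mean(errors ** 2)))


# Synthetic data: 6 predictors driven by only 2 latent factors, so they are
# highly correlated with one another.
rng = np.random.default_rng(0)
latent = rng.normal(size=(80, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(80, 6))
y = latent[:, 0] - 0.5 * latent[:, 1] + 0.3 * rng.normal(size=80)

results = {lam: loo_rmse(X, y, lam) for lam in (0.0, 1.0, 10.0)}
for lam, rmse in results.items():
    print(f"lambda = {lam:5.1f}  leave-one-out RMSE = {rmse:.3f}")
```

Each observation is predicted by a model that never saw it, so the resulting error measures out-of-sample prediction variability rather than goodness of fit.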

Dumancas and Bello (2015) compared the performance of 12 predictive approaches using machine learning. The objective was to use lipid profile data to predict 5-year mortality after adjusting for confounding demographic variables. The approaches included PLS discriminant analysis, artificial neural network, ridge regression and logistic regression. PLS discriminant analysis (PLS-DA) is used when Y is a categorical variable, such as an indicator equal to 1 when a person dies within 5 years. The dataset consisted of 726 individuals, of whom 121 died in the five-year period. The total dataset was divided into a training set (483 individuals) and a test set (243 individuals). The results ranked PLS-DA first among the 12 approaches for predictive accuracy. However, the differences between PLS-DA and both the artificial neural network and logistic regression were not statistically significant.

Dormann, Elith et al. (2013) evaluated methods for dealing with multicollinearity using simulation experiments. They created training and test data sets that had 1000 cases and 21 predictors. The condition number is a measure of the degree of collinearity: it is the square root of the ratio between the largest and smallest eigenvalues of X. A condition number of 10 is approximately equivalent to |r| = 0.7. On page 30, the authors state that several of the latent variable methods were only marginally better than Multiple Linear Regression (MLR), delaying the degeneration of model performance from a condition number of 10 to 30. MLR involves two or more explanatory variables when fitting a linear equation, whereas PLS regression uses latent variables. However, the paper's abstract states that latent variable methods did not outperform the MLR method. That conclusion did not apply when the condition number was less than 30.
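
A minimal sketch of computing a condition number follows. It assumes the eigenvalues are taken from the correlation matrix of the predictors, which is one common reading of the definition above; the data are synthetic and the function name is illustrative.

```python
import numpy as np


def condition_number(X):
    """Square root of the largest-to-smallest eigenvalue ratio of the
    predictors' correlation matrix (assumed interpretation)."""
    corr = np.corrcoef(X, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)  # returned in ascending order
    return float(np.sqrt(eigenvalues[-1] / eigenvalues[0]))


rng = np.random.default_rng(1)
independent = rng.normal(size=(1000, 3))

# Make the third predictor almost a copy of the first one.
collinear = independent.copy()
collinear[:, 2] = collinear[:, 0] + 0.05 * rng.normal(size=1000)

cn_independent = condition_number(independent)
cn_collinear = condition_number(collinear)
print(f"independent predictors: {cn_independent:.2f}")
print(f"near-duplicate predictor: {cn_collinear:.2f}")
```

Uncorrelated predictors give a condition number close to 1, while a near-duplicate predictor drives the smallest eigenvalue toward zero and the condition number upward.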

Our conclusion is that evidence exists that PLS regression can outperform MLR in many situations with multicollinearity. However, severe multicollinearity can degrade PLS regression prediction performance.


    1. Clark, Gordon (2016). “Quality Improvement Using Big-Data Analytics” ASQ Statistics Division Digest, 35(1): 25-29.
    2. Dormann, C. F., J. Elith, et al. (2013). “Collinearity: A Review of Methods to Deal with It and a Simulation Study Evaluating Their Performance.” Ecography 36(1): 27-46.
    3. Dumancas, G. G. and G. A. Bello (2015). Comparison of Machine-learning Techniques for Handling Multicollinearity in Big Data Analytics and High-performance Data Mining. Supercomputing 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis.
    4. Yeniay, O. and A. Goktas (2002). “A Comparison of Partial Least Squares Regression with Other Prediction Methods.” Hacettepe Journal of Mathematics and Statistics 31: 99-111.


Multicollinearity in Big Data

This posting reviews approaches to improve the effectiveness of regression in big-data analytics.  Snee (2015) mentions a challenging problem when using big-data to estimate relationships and make predictions.   The data are likely to be observational and multicollinearity may exist among the predictor variables (Clark 2016).

Using Big-Data Analytics to Improve Quality

This post addresses the expanding use of Big-Data Analytics to improve quality in many organizations.   Davenport (2013) defines Big Data as data that is either too unstructured, too voluminous or from too many different sources to be analyzed by traditional approaches.   The label analytics describes the use of big data to drive decisions and actions.

Analysis of a Combined-Array Design in A Robust Parameter Experiment

This posting presents the analysis of results from the Robust Parameter Design Experiment introduced in the previous posting. The five factors are A = CO2 pressure (bar), B = CO2 temperature (°C), C = peanut moisture (% by wt), D = CO2 flow rate (liters/min), and E = peanut particle size (mm). Factor E is the noise factor. The purpose of the experiment is to show the effects of these factors on Solubility, S, the mg of oil removed from the peanuts.

Estimating Interaction Effects using a Combined-Array Design in Robust Parameter Experiments

This posting presents another example illustrating the advantage of combined-array experiments in Robust Parameter Designs over the Taguchi crossed-array designs. Kilgo (1988) presents an example of a 2^(5-1) fractional factorial experiment providing data to construct a model for estimating the mean response. We modify the use of the experiment to make it relevant to a Robust Parameter Design experiment.
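
For readers unfamiliar with the notation, a 2^(5-1) design runs half of the 32 combinations of five two-level factors. The sketch below builds one such half fraction. The generator E = ABCD is an assumption made for illustration; the excerpt above does not state which generator the original experiment used.

```python
from itertools import product


def half_fraction_2_5():
    """16-run half fraction of a 2^5 design in coded units (-1, +1).

    Assumes the generator E = ABCD, i.e. defining relation I = ABCDE.
    """
    runs = []
    for a, b, c, d in product((-1, 1), repeat=4):  # full factorial in A-D
        runs.append((a, b, c, d, a * b * c * d))   # E set by the generator
    return runs


design = half_fraction_2_5()
for run in design:
    print(" ".join(f"{level:+d}" for level in run))
```

With this generator, main effects and two-factor interactions are not aliased with one another, which is what allows interactions between controllable factors and a noise factor to be estimated.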

Combined Array Designs in Robust Parameter Experiments

This posting introduces the use of combined arrays in Robust Parameter Designs. Combined array designs have both controllable and noise factors in the same experimental design. The previous posting describes Taguchi Parameter Designs using crossed arrays consisting of an inner array containing the controllable factors and an outer array containing the noise factors.

Crossed Array Design Problems in Robust Parameter Experiments

This posting describes problems in using Taguchi Parameter designs with crossed arrays in Robust Parameter Designs. The two previous postings, the Taguchi Design posting and the Taguchi Results posting, describe an application of Taguchi Parameter Designs to reduce plasma cutter cycle time. The crossed array designs proposed by Taguchi, when used with the maximum allowable factors, can’t estimate the interaction effects among the controllable factors.

Plasma Cutter Cycle Time Experimental Design Results

This post describes the results of the experimental design to reduce plasma cutter cycle time and their analysis. The experimental design is a Taguchi Parameter Design. The previous posting describes the experimental design, and refers to the Value Stream Map Case Study posting to review the Lean Six Sigma project that produced the experimental design.

Experimental Design to Reduce Plasma Cutter Cycle Time

This posting describes the corrective action using an experimental design to reduce a machine’s cycle time.  The machine is a plasma cutting machine, and a Lean Six Sigma (LSS) team identified it as the bottleneck operation in producing electrical switchboards by an electrical manufacturer.