While you may have been hoping for rest and relaxation, the title actually refers to Gauge R&R – repeatability and reproducibility. Gauge R&R, or GRR, accounts for a substantial share of the effort required by measurement system analysis. Preparation and execution of a GRR study can be resource-intensive; taking shortcuts, however, is ill-advised. The costs of accepting an unreliable measurement system are long-term and far in excess of the short-term inconvenience caused by a properly conducted analysis.
The focus here is the evaluation of variable gauges. Prerequisites of a successful GRR study will be described and methodological alternatives will be defined. Finally, interpretation of results and acceptance criteria will be discussed.
What are Repeatability and Reproducibility?
To effectively conduct a measurement system analysis, one must first be clear on what comprises a measurement system. Components of a measurement system include:
Repeatability is an estimate of the variation in measurements induced by the measurement equipment – the equipment variation (EV). This is also known as “within-system variation,” as no component of the system has changed. Repeatability measures the variation in measurements taken by the same appraiser on the same part with the same equipment in the same environment following the same procedure. Changing any of these components during the study invalidates the results.
Reproducibility is an estimate of the variation in measurements induced by changing one component of the measurement system – the appraiser – while holding all others constant. This is called appraiser variation (AV) for obvious reasons. Reproducibility is also referred to as “between-systems variation” to reflect the ability to extend analysis to include multiple sites (i.e. “systems”). This extended analysis is significantly more complex than a single-system evaluation and is beyond the scope of this presentation.
Repeatability and reproducibility, collectively, are called “width variation.” This term refers to the width of the normal distribution of measurement values obtained in a GRR study. This is also known as measurement precision. Use of a GRR study can guide improvement efforts that provide real value to an organization. If resources are allocated to process modifications when it is the measurement system that is truly in need of improvement, the organization increases its costs without extracting significant value from the effort. Misplaced attention can cause a superbly-performing process to be reported as mediocre – or worse.
Preparation for a GRR Study
Conducting a successful GRR study requires organization and disciplined execution. Several physical and procedural prerequisites must be satisfied to ensure valid and actionable results are obtained. The first step is to assign a study facilitator to be responsible for preparation, execution, and interpretation. The facilitator should have knowledge of the measurement process, the parts to be measured, and the significance of each step to be taken in preparation for and in the execution of the study. This person will be responsible for maintaining the structure and discipline required to conduct a reliable study.
Next, the measurement system to be evaluated must be defined. That is, all components of the system, as discussed in the previous section, are to be identified. This includes the environmental conditions under which measurements will be taken, part preparation requirements, relevant internal or external standards, and so on. It is important to be thorough and accurate in this step; the definition of the system in use significantly influences subsequent actions and the interpretation of results.
The next few steps are based directly on the system definition; they are not strictly sequential. The stability of the system must be verified; that is, it must be known, prior to beginning a GRR study, that the components of the system will be constant for the duration of the study. Process adjustment, tooling replacement, or procedural change that occurs mid-study will invalidate the results.
The parts to be used in the study must be capable of withstanding several measurement cycles. If the measurement method or handling of parts required to complete it cannot be repeated numerous times without causing damage to the parts, other methods of evaluation may be required.
Review historical data collected by the measurement system, as defined above, to confirm a normal distribution. If the measurement data are not normally distributed, there may be other issues (e.g., “special causes”) that require attention before a successful GRR can be conducted. The more the data are skewed, or otherwise non-normal, the less reliable the GRR becomes.
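One hedged way to screen historical data for gross non-normality is to compute its skewness and excess kurtosis, both of which are near zero for normally distributed data. The sketch below uses only the Python standard library; the function name and the `history` values are hypothetical, and no pass/fail threshold is implied.

```python
import statistics


def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) computed from population moments."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    if m2 == 0:
        raise ValueError("all values are identical; shape is undefined")
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical historical measurements (illustrative values only).
history = [25.01, 24.98, 25.03, 24.99, 25.00, 25.02, 24.97, 25.01, 25.00, 24.99]
skew, kurt = skewness_and_excess_kurtosis(history)
print(f"skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}")
```

Values far from zero suggest the data deserve a closer look (a histogram or normal probability plot, for instance) before the study proceeds.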
Once these prerequisites have been verified, the type of study to be conducted can be chosen. A “short” study employs the range method, while a “long” study employs graphical analysis and the average and range method or the ANOVA method. The flowchart presented in Exhibit 1 provides an aid to method selection. Each method will be discussed further in subsequent sections on execution of GRR studies.
The range of variation over which measurement system performance will be evaluated must also be defined. Though other ranges can be used, we will focus on two: process variation and specification tolerance. Process variation is the preferred range, as it accurately reflects the realized performance of a process and its contribution to measurement variability. The specification tolerance is a reasonable substitute in many instances, such as screening evaluations, or when comparisons of processes are to be made.
Other procedural prerequisites include defining how the measurement sequence will be randomized and how data will be recorded (e.g. automatically or manually, forms to be used). When the procedural activities are complete, physical preparations can begin. This may include organizing the area around the subject measurement equipment to facilitate containment of sample parts; the study may require more part storage than is normally used, for example. Participation of the facilitator may also require some accommodation.
Physical preparations of sample part storage should include provisions for identification of parts. Part identification should be known only to the facilitator, for purposes of administration and data collection, to avoid any potential influence on appraisers.
Once part storage has been prepared, samples can be collected. Samples should be drawn from standard production runs to ensure that they have been given no special attention that would influence the study. The number of samples to be collected will be discussed in the sections pertaining to the different study methods.
The final preparation steps are selection of appraisers and scheduling of measurements. Appraisers are selected from the pool identified in the system definition and represent the range of experience or skill available. Appraisers must be familiar with the system to be studied and perform the required measurements as part of their regular duties.
Scheduling the measurements must allow for part “soak” time in the measurement environment, if required. The appraisers’ schedules and other responsibilities must also be considered. The facilitator should schedule the measurements to minimize the impact on normal operations.
With all preparations complete, the GRR study begins in earnest. The following three sections present the methods of conducting GRR studies.
Range Method
The range method is called a “short” study because it uses fewer parts and appraisers and requires fewer calculations than other methods. It does not evaluate repeatability and reproducibility separately; instead an approximation of measurement system variability (R&R) results. The range method is often used as an efficient check of measurement system stability over time.
Range method calculations require two appraisers to measure each of five parts in random order. The measurements and preliminary calculations are presented conceptually in Exhibit 2.
The Gauge Repeatability and Reproducibility is calculated as follows:
$\mathrm{GRR} = \bar{R} / d_2^*$,
where $\bar{R}$ is the average range of measurements from the data table (Exhibit 2) and $d_2^*$ is found in the table in Exhibit 3. For two appraisers ($m = 2$) measuring five parts ($g = 5$), $d_2^* = 1.19105$.
The commonly-cited value is the percentage of measurement variation attributable to the measurement system, %GRR, calculated as follows:
$\%\mathrm{GRR} = 100\% \times (\mathrm{GRR} / \text{process standard deviation})$,
where GRR is gauge repeatability and reproducibility, calculated above, and process standard deviation is determined from long-run process data.
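The range-method arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function name, the sample readings, and the process standard deviation of 0.0777 are hypothetical, while the d2* constant is the value quoted above for two appraisers and five parts.

```python
D2_STAR_M2_G5 = 1.19105  # d2* for 2 appraisers (m = 2) and 5 parts (g = 5)


def range_method_grr(appraiser_a, appraiser_b, process_std_dev,
                     d2_star=D2_STAR_M2_G5):
    """Return (average range, GRR, %GRR) for a two-appraiser short study."""
    if len(appraiser_a) != len(appraiser_b):
        raise ValueError("both appraisers must measure the same parts")
    # Range for each part is the absolute difference between the two readings.
    ranges = [abs(a - b) for a, b in zip(appraiser_a, appraiser_b)]
    r_bar = sum(ranges) / len(ranges)
    grr = r_bar / d2_star
    pct_grr = 100.0 * grr / process_std_dev
    return r_bar, grr, pct_grr


# Hypothetical readings for five parts (illustrative values only).
readings_a = [0.85, 0.75, 1.00, 0.45, 0.50]
readings_b = [0.80, 0.70, 0.95, 0.55, 0.60]
r_bar, grr, pct_grr = range_method_grr(readings_a, readings_b,
                                       process_std_dev=0.0777)
print(f"R-bar = {r_bar:.3f}, GRR = {grr:.4f}, %GRR = {pct_grr:.1f}%")
```

If a different number of appraisers or parts is used, `d2_star` must be replaced with the corresponding table value.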
Interpretation of GRR results is typically consistent with established guidelines, as follows:
Average and Range Method
The average and range method improves on the range method by providing independent evaluations of repeatability and reproducibility in addition to the composite %GRR value. To perform a GRR study according to the average and range method, a total of 90 measurements are recorded. Three appraisers, in turn, measure each of ten parts in random order. Each appraiser then repeats the set of measurements twice, for a total of three trials per appraiser (3 appraisers x 10 parts x 3 trials = 90 measurements).
For each set of measurements, a unique random order of parts should be used, with all measurement data hidden from appraisers. Knowledge of other appraisers’ results, or their own previous measurement of a part, can directly or indirectly influence an appraiser’s performance; hidden measurement data prevents such influence. For the same reason, appraisers should be instructed not to discuss their measurement results, techniques, or other aspects during the study. After completion of the study, such discussion may be valuable input to improvement efforts; during the study, it only serves to reduce confidence in the validity of the results.
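A facilitator might generate the run orders ahead of time with a short script. The sketch below is one possible approach using Python's standard `random` module; the function name, part IDs, and appraiser labels are hypothetical, and the optional `seed` exists only so a schedule can be regenerated for the study records.

```python
import math
import random


def build_run_orders(part_ids, appraisers, trials, seed=None):
    """Return {(appraiser, trial): [part ids in measurement order]}.

    Every (appraiser, trial) set gets its own shuffled order, and no two
    sets share the same order.
    """
    needed = len(appraisers) * trials
    if needed > math.factorial(len(part_ids)):
        raise ValueError("too few parts to give every set a unique order")
    rng = random.Random(seed)
    seen = set()
    schedule = {}
    for trial in range(1, trials + 1):
        for appraiser in appraisers:
            order = tuple(rng.sample(part_ids, k=len(part_ids)))
            while order in seen:  # re-draw in the rare case of a repeat
                order = tuple(rng.sample(part_ids, k=len(part_ids)))
            seen.add(order)
            schedule[(appraiser, trial)] = list(order)
    return schedule


# Ten parts, three appraisers, three trials, as in the standard study.
schedule = build_run_orders(list(range(1, 11)), ["A", "B", "C"], trials=3, seed=2024)
for (appraiser, trial), order in schedule.items():
    print(f"Appraiser {appraiser}, trial {trial}: {order}")
```

The printed schedule stays with the facilitator; appraisers see only the part handed to them.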
Modifications can be made to the standard measurement sequence described above to increase the efficiency of the study. Accommodations can be made for appraisers’ non-overlapping work schedules, for example, by recording each appraiser’s entire data set (10 parts x 3 iterations = 30 measurements) without intervening measurements by other appraisers. Another example is an adaptation to fewer than 10 parts being available at one time. In this situation, the previously described “round-robin” process is followed for the available parts. When additional parts become available, the process is repeated until the data set is complete.
A typical GRR data collection sheet is shown in Exhibit 4. This form also includes several computations that will be used as inputs to the GRR calculations and graphical analysis.
Graphical analysis of results is an important forerunner of numerical analysis. It can be used to screen for anomalies in the data that indicate special-cause variation, data-collection errors, or other defects in the study. Quickly identifying such anomalies can prevent effort from being wasted analyzing and acting on defective data. Some example tools are introduced below, with limited discussion. More information and additional tools can be found in the references or other sources.
The average of each appraiser’s measurements is plotted for each part in the study on an average chart to assess between-appraiser consistency and discrimination capability of the measurement system. A “stacked” or “unstacked” format can be used, as shown in Exhibit 5.
Exhibit 5: Average Charts
Similarly, a range chart, displaying the range of each appraiser’s measurements for each part, in stacked or unstacked format, as shown in Exhibit 6, can be used to assess the measurement system’s consistency between appraisers.
Exhibit 6: Range Charts
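The points plotted on the average and range charts are simple to derive from the raw readings. The following sketch assumes a hypothetical data layout, `readings[appraiser][part_index]` holding that appraiser's trial readings for one part; the function name and sample values are illustrative only.

```python
def chart_points(readings):
    """Return (averages, ranges), each keyed by appraiser with one value per part."""
    averages = {
        appraiser: [sum(trials) / len(trials) for trials in parts]
        for appraiser, parts in readings.items()
    }
    ranges = {
        appraiser: [max(trials) - min(trials) for trials in parts]
        for appraiser, parts in readings.items()
    }
    return averages, ranges


# Hypothetical study fragment: 2 appraisers, 3 parts, 2 trials each.
readings = {
    "A": [[10.1, 10.3], [12.0, 12.2], [9.8, 9.9]],
    "B": [[10.2, 10.2], [12.1, 12.5], [9.7, 10.0]],
}
averages, ranges = chart_points(readings)
for appraiser in readings:
    print(appraiser, "averages:", [round(v, 3) for v in averages[appraiser]])
    print(appraiser, "ranges:  ", [round(v, 3) for v in ranges[appraiser]])
```

Plotting each appraiser's list against part number gives the unstacked format; overlaying the lists on one set of axes gives the stacked format.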
Consistency between appraisers can also be assessed with X-Y comparison plots. The averages of each appraiser’s measurements for each part are plotted against those of each other appraiser. Identical measurements would yield a 45° line through the origin. An example of X-Y comparisons of three appraisers displayed in a single diagram is presented in Exhibit 7.
A scatter plot, such as the example in Exhibit 8, can facilitate identification of outliers and patterns of performance, such as one appraiser who consistently reports higher or lower values than the others. The scatter plot groups each appraiser’s measurements for a single part, then groups these sets per part.
If no data-nullifying issues are discovered in the graphical analysis, the study proceeds to numerical analysis. Calculations are typically performed on a GRR report form, such as that shown in Exhibit 9. Values at the top of the form are transferred directly from the data collection sheet (Exhibit 4). Complete the calculations prescribed in the left-hand column of the GRR report form; the values obtained can then be used to complete the right-hand column. The formulas provided on both forms result in straightforward calculations; therefore, we will focus on the significance of the results rather than their computation.
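For readers who prefer to see the report-form arithmetic spelled out, here is a hedged Python sketch. The formulas and the K1, K2, and K3 constants follow the commonly published AIAG-style report form; they are an assumption here, since the form itself is not reproduced in this text. The function name, data layout (`data[appraiser][part_index]` holding trial readings), and sample values are hypothetical, and only a few constants are included.

```python
import math

# Constants from the commonly published AIAG-style form (assumed, partial).
K1 = {2: 0.8862, 3: 0.5908}   # keyed by number of trials
K2 = {2: 0.7071, 3: 0.5231}   # keyed by number of appraisers
K3 = {5: 0.4030, 10: 0.3146}  # keyed by number of parts


def average_and_range(data):
    """Return EV, AV, GRR, PV, TV and their percentages of TV."""
    appraisers = list(data)
    n_parts = len(data[appraisers[0]])
    n_trials = len(data[appraisers[0]][0])
    ranges = [max(t) - min(t) for a in appraisers for t in data[a]]
    r_bar = sum(ranges) / len(ranges)                       # average range
    means = [sum(sum(t) for t in data[a]) / (n_parts * n_trials)
             for a in appraisers]
    x_diff = max(means) - min(means)                        # appraiser spread
    part_means = [sum(sum(data[a][p]) for a in appraisers)
                  / (len(appraisers) * n_trials) for p in range(n_parts)]
    r_p = max(part_means) - min(part_means)                 # range of part averages
    ev = r_bar * K1[n_trials]
    av_sq = (x_diff * K2[len(appraisers)]) ** 2 - ev ** 2 / (n_parts * n_trials)
    av = math.sqrt(max(av_sq, 0.0))                         # negative result set to 0
    grr = math.hypot(ev, av)
    pv = r_p * K3[n_parts]
    tv = math.hypot(grr, pv)
    result = {"EV": ev, "AV": av, "GRR": grr, "PV": pv, "TV": tv}
    for name in ("EV", "AV", "GRR", "PV"):
        result["%" + name] = 100.0 * result[name] / tv
    return result


# Hypothetical data: 2 appraisers, 5 parts, 2 trials.
data = {
    "A": [[10.0, 10.2], [12.0, 12.2], [14.0, 14.2], [16.0, 16.2], [18.0, 18.2]],
    "B": [[10.4, 10.6], [12.4, 12.6], [14.4, 14.6], [16.4, 16.6], [18.4, 18.6]],
}
report = average_and_range(data)
for key, value in report.items():
    print(f"{key}: {value:.4f}")
```

Note that the percentages are ratios of spreads, not of variances, so %GRR and %PV do not sum to 100%; their squares do.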
The right-hand column of the GRR report contains the commonly-cited values used to convey the effectiveness of measurement systems. The following discussion summarizes each and, where applicable, offers potential targets for improvement.
Equipment variation (%EV) is referred to as gauge repeatability, our first “R.” A universally-accepted limit on repeatability is not available; judgment in conjunction with other relevant information is necessary. If repeatability is deemed unacceptable, or in need of improvement, potential targets include:
Part variation (%PV) is something of a counterpoint to R&R. When %GRR is in the 10 – 30% range, where the system may be acceptable, high %PV can be cited to support acceptance of the measurement system. If equipment and appraisers contribute zero variability to measurements, %PV = 100%. This, of course, does not occur in real-world applications; it is an asymptotic target.
The final calculation on the GRR report, the number of distinct categories (ndc), is an assessment of the measurement system’s ability to distinguish parts throughout the range of variation. Formally, it is “the number of non-overlapping 97% confidence intervals that will span the expected product variation.” The higher the ndc, the greater the discrimination capability of the system. The calculated value is truncated to an integer and should be 5 or greater to ensure a reliable measurement system.
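As a small illustration, the ndc calculation can be sketched as below, assuming the customary formula ndc = 1.41 × (PV / GRR). The formula is an assumption drawn from common practice rather than from the text above, and the function name and input values are hypothetical.

```python
def number_of_distinct_categories(pv, grr):
    """Return ndc = 1.41 * (PV / GRR), truncated to an integer."""
    if grr <= 0:
        raise ValueError("GRR must be positive")
    return int(1.41 * pv / grr)  # int() truncates toward zero


# Hypothetical part variation and GRR values, in the same measurement units.
ndc = number_of_distinct_categories(pv=1.0, grr=0.25)
print("ndc =", ndc, "(meets the guideline)" if ndc >= 5 else "(below the guideline)")
```

Because the result is truncated rather than rounded, a calculated value of 4.9 reports as 4 and falls short of the guideline of 5.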
To conclude this section, three important notes need to be added. First, nomenclature suggests that a GRR study is complete with the calculation of %EV, %AV, and %GRR. However, %PV and ndc are included in a “standard” GRR study to provide additional insight to a measurement system’s performance, facilitating its evaluation and acceptance or improvement.
Second, some sources refer to %GRR as the Precision to Tolerance, or P/T, Ratio. Different terminology, same calculation.
The final note pertains to the evaluation of a system with respect to the specification tolerance instead of the process variation, as discussed in the Preparation for a GRR Study section. If the specification tolerance (ST) is to be the basis for evaluation, the calculations of %EV, %AV, %GRR, and %PV on the GRR report (Exhibit 9, right-hand column) are to be made with TV replaced by ST/6. Judgment of acceptability must also be adjusted to account for the type of analysis conducted.
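The substitution described above amounts to changing the denominator. A minimal sketch follows, in which the function name, the spread values, and the specification limits are all hypothetical.

```python
def percent_of_tolerance(spreads, upper_spec, lower_spec):
    """Express each spread (EV, AV, GRR, PV) as a percentage of ST/6."""
    basis = (upper_spec - lower_spec) / 6.0  # ST/6 takes the place of TV
    if basis <= 0:
        raise ValueError("upper_spec must exceed lower_spec")
    return {"%" + name: 100.0 * value / basis for name, value in spreads.items()}


# Hypothetical spreads from a GRR report and hypothetical specification limits.
spreads = {"EV": 0.03, "AV": 0.04, "GRR": 0.05, "PV": 0.20}
tolerance_based = percent_of_tolerance(spreads, upper_spec=10.6, lower_spec=9.4)
for name, pct in tolerance_based.items():
    print(f"{name} = {pct:.1f}% of tolerance")
```

Percentages computed this way describe the measurement system relative to what the specification allows, not relative to what the process actually produces, which is why the acceptability judgment must be adjusted.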
Analysis of Variance Method
The analysis of variance method (ANOVA) is more accurate and more complex than the previous methods discussed. It adds the ability to assess interaction effects between appraisers and parts as a component of measurement variation. A full exposition is beyond the scope of this presentation; we will instead focus on its practical application.
The additional information available in ANOVA expands the possibilities for graphical analysis. An interaction plot can be used to determine if appraiser-part interaction effects are significant. Each appraiser’s measurement average for each part is plotted; data points for each appraiser are connected by a line, as shown in the example in Exhibit 10. If the lines are parallel, no interaction effects are indicated. If the lines are not parallel, the extent to which they are non-parallel indicates the significance of the interaction.
To verify that gauge error is a normally-distributed random variable (an analysis assumption), a residuals plot can be used. A residual is the difference between an appraiser’s average measurement of a part and an individual measurement of that part. When plotted, as shown in the example in Exhibit 11, the residuals should be randomly distributed on both sides of zero. If they are not, the cause of skewing should be investigated and corrected.
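The residuals themselves are easy to compute before plotting. The sketch below assumes a hypothetical layout, `data[appraiser][part_index]` holding the trial readings, and takes each residual as the individual reading minus the appraiser's average for that part; the sign convention is an assumption and does not affect the interpretation.

```python
def residuals(data):
    """Return a list of (appraiser, part_index, residual) tuples."""
    out = []
    for appraiser, parts in data.items():
        for part_index, trials in enumerate(parts):
            cell_mean = sum(trials) / len(trials)  # appraiser's average for this part
            for reading in trials:
                out.append((appraiser, part_index, reading - cell_mean))
    return out


# Hypothetical readings: 2 appraisers, 2 parts, 3 trials each.
data = {
    "A": [[10.1, 10.3, 10.2], [12.0, 12.2, 12.1]],
    "B": [[10.2, 10.2, 10.5], [12.1, 12.5, 12.3]],
}
for appraiser, part_index, value in residuals(data):
    print(f"appraiser {appraiser}, part {part_index + 1}: {value:+.3f}")
```

By construction the residuals for each appraiser-part combination sum to zero, so the plot is judged on the pattern and symmetry of the scatter rather than on its average.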
Numerical analysis is more cumbersome in ANOVA than the other methods discussed. Ideally, it is performed by computer to accelerate analysis and minimize errors. Calculation formulas are summarized in the ANOVA table shown in Exhibit 12. A brief description of each column in the table follows.
Finally, calculations analogous to those in the right-hand column of the GRR report used in the average and range method (Exhibit 9) can be performed. These calculations, shown in Exhibit 15, define measurement system variation in terms of a 5.15σ spread (“width”), or a 99% range. This range can be expanded to 99.73% (6σ spread) by replacing 5.15 with 6 in the calculations. The ubiquity of “six sigma” programs may make this option easier to recall and more intuitive, facilitating use of the tool for many practitioners.
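A compact sketch of the ANOVA calculations is given below for a balanced, crossed study. The mean squares and variance-component estimators are the standard ones for a two-factor random-effects design with interaction; they are an assumption about the exhibits' contents, which are not reproduced in this text. The function name, data layout (`data[appraiser][part_index]` holding trial readings), and sample values are hypothetical. The sketch does not pool a non-significant interaction into the error term, which some practitioners do.

```python
import math


def anova_grr(data, spread=5.15):
    """Return mean squares, variance components, and spreads (spread * sigma)."""
    appraisers = list(data)
    k = len(appraisers)               # number of appraisers
    n = len(data[appraisers[0]])      # number of parts
    r = len(data[appraisers[0]][0])   # trials per appraiser-part cell
    cell = [[sum(data[a][i]) / r for i in range(n)] for a in appraisers]
    a_mean = [sum(row) / n for row in cell]
    p_mean = [sum(cell[j][i] for j in range(k)) / k for i in range(n)]
    grand = sum(a_mean) / k
    # Sums of squares, computed as squared deviations from the relevant means.
    ss_a = n * r * sum((m - grand) ** 2 for m in a_mean)
    ss_p = k * r * sum((m - grand) ** 2 for m in p_mean)
    ss_ap = r * sum((cell[j][i] - a_mean[j] - p_mean[i] + grand) ** 2
                    for j in range(k) for i in range(n))
    ss_e = sum((x - cell[j][i]) ** 2 for j, a in enumerate(appraisers)
               for i in range(n) for x in data[a][i])
    ms = {"appraiser": ss_a / (k - 1), "part": ss_p / (n - 1),
          "interaction": ss_ap / ((k - 1) * (n - 1)),
          "equipment": ss_e / (n * k * (r - 1))}
    # Variance components; negative estimates are set to zero.
    var = {"equipment": ms["equipment"],
           "interaction": max((ms["interaction"] - ms["equipment"]) / r, 0.0),
           "appraiser": max((ms["appraiser"] - ms["interaction"]) / (n * r), 0.0),
           "part": max((ms["part"] - ms["interaction"]) / (k * r), 0.0)}
    grr_var = var["equipment"] + var["interaction"] + var["appraiser"]
    width = {"EV": spread * math.sqrt(var["equipment"]),
             "AV": spread * math.sqrt(var["appraiser"]),
             "INT": spread * math.sqrt(var["interaction"]),
             "GRR": spread * math.sqrt(grr_var),
             "PV": spread * math.sqrt(var["part"])}
    width["TV"] = math.hypot(width["GRR"], width["PV"])
    return {"ms": ms, "var": var, "width": width}


# Hypothetical data: 2 appraisers, 3 parts, 2 trials.
demo = {
    "A": [[9.9, 10.1], [11.9, 12.1], [13.9, 14.1]],
    "B": [[10.3, 10.5], [12.3, 12.5], [14.3, 14.5]],
}
result = anova_grr(demo)            # 5.15-sigma spread
result_6 = anova_grr(demo, spread=6)  # 6-sigma spread
for name, value in result["width"].items():
    print(f"{name}: {value:.4f} (5.15 sigma), {result_6['width'][name]:.4f} (6 sigma)")
```

Switching between the 99% and 99.73% ranges changes only the `spread` multiplier; the variance components are unaffected.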
The notes at the conclusion of the Average and Range Method section are also applicable to ANOVA. The additional calculations are shown, with ANOVA nomenclature, in Exhibit 16. The acceptance criteria also remain the same. The advantage that ANOVA provides is the insight into interaction effects that can be explored to identify measurement system improvement opportunities.
The three methods of variable gauge repeatability and reproducibility study discussed – range method, average and range method, and ANOVA – can be viewed as a progression. As measured features become more critical to product performance and customer satisfaction, the measurement system requires greater attention; that is, more accurate and detailed analysis is required to ensure reliable performance.
The progression, or hierarchy, of methods is also useful for those new to GRR studies, as it allows basic concepts to be learned, then built upon. Only an introduction was feasible here, particularly with regard to ANOVA. Consult the references listed below, and other sources on quality and statistics, for more detailed information.
For a directory of “The War on Error” volumes on “The Third Degree,” see “Vol. I: Welcome to the Army.”
“Statistical Engineering and Variation Reduction.” Stefan H. Steiner and R. Jock MacKay; Quality Engineering, 2014.
“An Overview of the Shainin System™ for Quality Improvement.” Stefan H. Steiner, R. Jock MacKay, and John S. Ramberg; Quality Engineering, 2008.
“Measurement Systems Analysis,” 3ed. Automotive Industry Action Group, 2002.
“Introduction to the Gage R & R.” Wikilean.
“Two-Way Random-Effects Analyses and Gauge R&R Studies.” Stephen B. Vardeman and Enid S. VanValkenburg; Technometrics, August 1999.
“Discussion of ‘Statistical Engineering and Variation Reduction.’” David M. Steinberg; Quality Engineering, 2014.
“Conducting a Gage R&R.” Jorge G. Tavera Sainz; Six Sigma Forum Magazine, February 2013.
Basic Business Statistics for Managers. Alan S. Donnahoe; John Wiley & Sons, Inc., 1988.
Creating Quality. William J. Kolarik; McGraw-Hill, Inc., 1995.
Jody W. Phelps, MSc, PMP®, MBA
JayWink Solutions, LLC
© JayWink Solutions, LLC