The standard says three samples. The customer's quality engineer says three samples does not give them statistical confidence. Your programme timeline has budget for two more samples and six more weeks. The product launch date does not move. You need a number that is technically defensible and practically achievable.
The answer is not in the standard. Standards specify minimum sample sizes for compliance purposes — they are not statistical frameworks. The answer requires understanding what question you are trying to answer with the test, what failure rate you need to detect, and what confidence you need in the detection. These three parameters determine the sample size. The standard's minimum is a floor, not an answer.
What you are actually trying to prove
Before choosing a sample size, decide what you are trying to demonstrate. There are three fundamentally different objectives, and they have different sample size implications.
Compliance demonstration: You are demonstrating that the product, tested under the specified conditions, produced no failures. You are not making a statistical claim about the population; you are demonstrating that the specific test was passed. Most qualification programmes operate in this mode. The minimum sample size is defined by the standard or customer specification. Typical minimums are two to five samples for electronics qualification (JEDEC), three samples for automotive (ISO 16750), and three to six samples for medical device qualification (ISO 14971-referenced testing).
Reliability estimation: You are trying to estimate the failure rate of the production population under field conditions, using accelerated test data. This requires a statistical model — typically Weibull analysis — and a defined confidence level. The sample size required depends on the target reliability (R), the confidence level (C), and the expected number of failures (r). For zero failures in test, the relationship is n ≥ ln(1−C)/ln(R). To demonstrate 90% reliability at 90% confidence with zero failures: n ≥ ln(0.1)/ln(0.9) = 21.9. You need 22 samples. This is why reliability demonstration tests are expensive — and why they are rarely run at the sample sizes the statistics actually require.
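The zero-failure relationship, and its extension to a test plan that tolerates a small number of failures, can be sketched in a few lines of Python. This is a minimal illustration that assumes a simple pass/fail (binomial) model. The function names are hypothetical, and the allowed-failures case uses the cumulative binomial distribution, which the text above implies through r but does not spell out.

```python
import math


def samples_zero_failures(reliability, confidence):
    """Smallest n satisfying n >= ln(1 - C) / ln(R) for a zero-failure test."""
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


def samples_with_failures(reliability, confidence, allowed_failures):
    """Smallest n for which a test with at most `allowed_failures` failures
    still demonstrates `reliability` at `confidence` (binomial model).

    Assumption: each unit passes or fails independently with the same
    probability, so the number of failures is binomially distributed.
    """
    n = max(1, allowed_failures)
    while True:
        # Probability of seeing <= allowed_failures if true reliability were R.
        p_pass = sum(
            math.comb(n, k) * (1.0 - reliability) ** k * reliability ** (n - k)
            for k in range(allowed_failures + 1)
        )
        if p_pass <= 1.0 - confidence:
            return n
        n += 1


if __name__ == "__main__":
    print(samples_zero_failures(0.90, 0.90))      # the 90/90 case from the text
    print(samples_with_failures(0.90, 0.90, 1))   # same target, one failure allowed
```

Allowing even one failure raises the required sample size noticeably, which is one reason zero-failure plans dominate in practice.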
Failure mode discovery: You are running enough samples to have a reasonable probability of encountering the failure modes the test is designed to accelerate. This is the mode that HALT operates in — you are not trying to make a statistical claim, you are trying to find weaknesses before they reach production. Sample sizes for HALT are typically one to three units, because the objective is discovery, not demonstration.
The zero-failures case
Most qualification tests produce zero failures; if failures occur, the design is typically reworked before qualification proceeds. The statistical interpretation of a zero-failure result depends on the sample size. With three samples and zero failures, you have demonstrated 46.4% reliability at 90% confidence (rearranging the formula above: R = (1−C)^(1/n) = e^(ln(1−0.9)/3) = 0.464). That is not a high bar. With ten samples and zero failures, you have demonstrated 79.4% reliability at 90% confidence, which is higher: in R = e^(ln(1−C)/n) the exponent is negative, so as n increases with C held constant the exponent moves towards zero and R rises towards 1. Every additional unit that survives is additional evidence, and the lower bound on reliability tightens accordingly. The gain is slow, though: reaching 90% reliability at the same confidence takes 22 samples, and 95% takes 45.
The correct interpretation of three samples with zero failures is: at 90% confidence, the true reliability is at least 46.4%, meaning that up to about 54% of production units could fail under the applied stress level without contradicting the test result. Whether that is acceptable depends entirely on the application. For a safety-critical automotive component, it almost certainly is not. For a consumer electronics accessory, it may be. The standard's minimum sample size defines what is needed for compliance, not what is needed for statistical confidence at a specified reliability level.
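To see what a given zero-failure result actually demonstrates, the formula can be rearranged for R: R = (1−C)^(1/n). The short Python sketch below tabulates that lower bound for a few sample sizes. The function name and the chosen sample sizes are illustrative assumptions, not part of any standard.

```python
def demonstrated_reliability(n_samples, confidence):
    """Lower bound on reliability shown by n_samples with zero failures.

    Rearranged from n = ln(1 - C) / ln(R), i.e. R = (1 - C) ** (1 / n).
    """
    return (1.0 - confidence) ** (1.0 / n_samples)


if __name__ == "__main__":
    for n in (3, 5, 10, 22, 45):
        r = demonstrated_reliability(n, 0.90)
        print(f"n = {n:2d}: R >= {r:.1%} at 90% confidence")
```

The bound rises with every added sample, but with diminishing returns, which is useful to show when negotiating how many extra units are worth the programme time.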
Accelerated life testing and sample size
When the test relies on an acceleration model (HAST, high-temperature operating life), the relationship between test duration, acceleration factor, and required sample size becomes important. Suppose the target is ten years of field life at 40°C, the test runs at 130°C, and the acceleration factor comes from an Arrhenius model with Ea = 0.8 eV. If you run ten samples for 1,000 hours with zero failures, you can make a claim about the field life distribution, but only after converting test hours to equivalent field hours through that acceleration factor. The sample size required to make that claim at a specified confidence depends on the Weibull slope, which requires either prior data from similar products or an assumed value, and an assumed value introduces its own uncertainty.
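The Arrhenius acceleration factor for the temperatures and activation energy quoted above can be computed directly. The sketch below assumes the standard form AF = exp[(Ea/k)(1/T_use − 1/T_stress)] with temperatures in kelvin, and the function name is hypothetical. It also assumes a single thermally activated failure mechanism, which has to be justified for the product in question.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(ea_ev, use_temp_c, stress_temp_c):
    """Arrhenius acceleration factor between use and stress temperatures.

    Assumes one dominant, thermally activated failure mechanism with
    activation energy ea_ev (in eV). Temperatures are given in Celsius.
    """
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / use_k - 1.0 / stress_k))


if __name__ == "__main__":
    af = arrhenius_acceleration_factor(0.8, 40.0, 130.0)
    field_hours_target = 10 * 8760  # ten years of continuous operation
    print(f"Acceleration factor: {af:.0f}")
    print(f"Test hours equivalent to ten years: {field_hours_target / af:.0f}")
    print(f"Field years represented by 1,000 test hours: {1000 * af / 8760:.0f}")
```

With these particular inputs the factor comes out in the high hundreds, so the result is very sensitive to Ea. That is the reason the source of the activation energy belongs in the test plan.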
The practical implication: accelerated life tests require either more samples than standard qualification tests, or more explicit assumptions about the acceleration model and Weibull parameters, or both. A test plan that specifies an accelerated test and makes reliability claims from the results should document the acceleration factor, the model used to derive it (Arrhenius, Coffin-Manson, inverse power law), the parameter values used, and the source of those values. The HAST methodology, which applies elevated temperature and humidity under pressure as its accelerating stresses, is covered in the context of climatic chambers in Humidity Testing in Electronics.
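The dependence of sample size on the Weibull slope can be made concrete with the parametric binomial form of the zero-failure formula. It assumes that test and field failures follow the same Weibull distribution, so that surviving L times the target life demonstrates R^(L^β) rather than R, giving n ≥ ln(1−C) / (L^β · ln(R)). This is a sketch under that assumption: the function name is hypothetical, and the life ratio L is the equivalent field time covered by the test (test hours multiplied by the acceleration factor) divided by the target life.

```python
import math


def samples_weibull_zero_failures(reliability, confidence, life_ratio, beta):
    """Zero-failure sample size when each unit is tested for life_ratio
    times the target life, assuming a Weibull life distribution.

    life_ratio: equivalent field time covered by the test / target life.
    beta: assumed Weibull slope (shape parameter); document its source.
    """
    exponent = life_ratio ** beta
    return math.ceil(
        math.log(1.0 - confidence) / (exponent * math.log(reliability))
    )


if __name__ == "__main__":
    # 90% reliability at 90% confidence, test covering twice the target life.
    for beta in (1.0, 2.0, 3.0):
        n = samples_weibull_zero_failures(0.90, 0.90, 2.0, beta)
        print(f"beta = {beta}: n = {n}")
```

The same test duration justifies very different sample sizes depending on the assumed slope, so an assumed β should be stated and defended in the test plan alongside the acceleration factor.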
When the customer asks for more samples than you have
The conversation that starts "three samples isn't enough" is usually a proxy for a different concern — either the customer has had field failures with similar products and doesn't trust qualification data, or they are applying a sample size rule from a different domain (medical, automotive) to a context where it doesn't apply, or they have a specific reliability target in mind but haven't stated it explicitly. Ask which of these three is the actual concern before proposing a solution.
If the concern is field failure history, more samples won't address it — root cause analysis of the field failures and a demonstration that the test programme would have caught them is the right response. If the concern is a statistical requirement, ask them to specify the target reliability and confidence level, then calculate the required sample size from the formula above and discuss whether it is achievable in the programme timeline. If the concern is domain-specific requirements, confirm which standard they are applying and verify that its sample size requirements apply to the product and test type in question. The test plan framework for documenting these decisions is at How to Write an Environmental Test Plan That Survives an Audit.