Tool 3 Guide

RCM / RBM — Reliability-Centred Maintenance explained

A complete guide to the Bluestream RCM/RBM tool. Hidden failures, consequence categories, P-F intervals, task selection logic — how the decision tree works and why it maps directly to SAE JA1011/JA1012 and NORSOK Z-008:2024.

NORSOK Z-008:2024 · SAE JA1011/JA1012 · IEC 60300-3-11 · ISO 14224:2016
⚡ TL;DR

The RCM/RBM tool takes the failure modes you identified in FMECA (Step 2) and assigns each one a maintenance strategy — on-condition monitoring, scheduled restoration, scheduled discard, failure-finding, run-to-failure, or redesign — following the decision logic set out in SAE JA1011/JA1012 and referenced by NORSOK Z-008:2024.

You don't pick the strategy. You answer questions about each failure mode — is it evident or hidden, is a task technically feasible, is it worth doing — and the tree derives the strategy. If no task works and the failure is safety-critical, the output is compulsory redesign. That's the standard, and the tool enforces it.

What is Reliability-Centred Maintenance?

Reliability-Centred Maintenance is a structured way of answering a single question for every way a piece of equipment can fail: what is the most cost-effective thing we can do to manage this failure?

Before RCM, maintenance programs were built on two assumptions. First, that every piece of equipment has an identifiable "right" age at which it should be overhauled or replaced. Second, that more preventive maintenance always reduces failure risk. Both turned out to be wrong for most equipment. Studies in the airline industry during the 1960s — the origin of RCM — found that for the majority of failure modes, scheduled overhauls did nothing to improve reliability, and in some cases made it worse by introducing errors during reassembly.

RCM reframed the problem. Instead of asking "how often should we service this?", it asks:

The answers produce a maintenance program that is targeted, justified, and defensible. That's what this tool does.

RCM vs RBM: Reliability-Centred Maintenance focuses on task selection for each failure mode. Risk-Based Maintenance is a prioritisation framework that ranks assets or failure modes by risk (probability × consequence) to allocate inspection and maintenance effort. The Bluestream tool implements RCM task-selection logic; the consequence inputs from Step 1 provide the risk dimension for prioritisation. In practice, the two approaches are complementary and this tool supports both workflows.

The standards behind the tool

Four standards shape the RCM/RBM tool. Each contributes something specific; none covers everything.

Standard | What it contributes
NORSOK Z-008:2024 | The Norwegian petroleum industry standard for risk-based maintenance and consequence classification. Defines the framework: consequence classes drive task-selection rigour, Generic Maintenance Concepts are the preferred starting point, and tasks must be technically feasible and cost-effective. Z-008 does not prescribe a specific decision tree; it points outward to RCM methodology.
SAE JA1011 & JA1012 | The authoritative RCM standards. JA1011 defines the seven questions a valid RCM process must answer. JA1012 is the implementation guide with the full decision logic, the actual tree this tool walks. If you want to claim "standards-compliant RCM", you cite JA1011.
IEC 60300-3-11 | The international (IEC) parallel to SAE JA1012. Covers RCM application in industrial contexts beyond aviation and oil & gas. Z-008:2024 §9.2.1 explicitly points here for task-selection logic.
ISO 14224:2016 | Not RCM itself, but the taxonomy. Provides the equipment-class definitions and failure-mode codes (ELP, FTS, BRD, etc.) that feed in from Step 2.

Z-008 tells you that you need to select tasks based on consequence. JA1011/JA1012 tells you how. The Bluestream tool implements the how with Z-008 consequence categories as the input.

Z-008 edition change: Z-008:2024 superseded Z-008:2017 on 20 December 2024. The clause numbers shifted (consequence classification moved from §7 to §8; maintenance programme from §8 to §9). The tool supports both editions — if you're maintaining a legacy program under 2017, the references in the output will match. For new work, use 2024. See glossary for the full changelog.

Core concept: evident vs hidden failure

Every failure mode falls into one of two categories. This is the first question the tool asks, and it determines which branch of the decision tree you walk.

Evident failure

An evident failure is one the operating crew will become aware of under normal conditions, without needing to perform a test. The failure is self-announcing.

Examples:

For evident failures, the tool evaluates three task types in strict order: on-condition monitoring (CBM) first, scheduled restoration (SR) second, scheduled discard (SD) third.

Hidden failure

A hidden failure is one that will not become apparent to the crew during normal operation. You only find out it has occurred when the function is demanded — and by then, it's too late.

Hidden failures almost always involve protective functions: equipment that sits idle until something goes wrong elsewhere, at which point it's supposed to spring into action. If it has failed in the meantime, no one knows.

Examples:

The two-failure risk

Hidden failures don't cause accidents by themselves. They cause accidents in combination with the protected failure — the thing they were supposed to catch. The risk calculation is therefore about the probability of both failures coinciding: the protective device failed and the demand for protection arose. This is why hidden failures get their own dedicated task type (failure-finding) and their own interval calculation — the test interval is set to keep the combined probability below a tolerable level.

For hidden failures, the tool evaluates only one task type: failure-finding (FF) — a scheduled functional test to reveal whether the hidden function is still working. CBM, SR, and SD are skipped because they don't apply to hidden failures in the same way.

Rule of thumb: if you'd need to specifically test or inspect the item to know it had failed, it's hidden. If the failure announces itself through process effects, alarms, or observable symptoms, it's evident.

Core concept: consequence categories

After evident/hidden, the tree needs to know the consequence category of the failure. This drives how strict the task-selection logic is. A failure that can kill people gets evaluated very differently from a failure that just costs money.

RCM-II defines three consequence categories for evident failures, plus a fourth branch for hidden:

Category | Meaning | Examples
Safety / Environment (S/E) | Failure directly harms people or the environment | Loss of containment on a toxic-gas line; failure of a fire pump; structural failure
Operational | Failure affects production output, product quality, or operational cost | A feed pump on a process train with no installed spare; a critical valve on the main product line
Non-operational | Failure affects only direct repair cost | A redundant pump where the spare can carry full duty; a utility that has no operational impact
Hidden | Protective function that only matters when demanded | PSV, ESD valve, F&G detector, standby generator

How the category is determined

The Bluestream tool doesn't re-ask you for consequence information in Step 3 — you already provided it in Step 1 (Criticality Classification). The RCM category is inherited automatically from your Step 1 output, using this logic:

Figure: How consequence category flows from Step 1 into RCM. Step 1 outputs map to RCM categories as follows. Barrier element = true, HSE C2 or C3, Environment C2 or C3, or Containment C3: Safety / Environment (redesign compulsory if no task works). Production C2 or C3, or Other/Cost C3: Operational (REVIEW: redesign vs run-to-failure). All other cases (mostly C1): Non-operational (run-to-failure acceptable). Hidden is determined separately by Q1. The analyst can override the inherited category per failure mode if the inheritance doesn't fit; the override is recorded in the justification string for audit.

The override link is important. Inheritance is a sensible default, but it's not always right. A pump bearing might sit under a C3 HSE criticality because of fluid properties — but the specific bearing failure mode is not a safety issue, it's a production one. Override the category for that failure mode, and the tool re-derives the strategy. The justification string captures that the category was overridden, so an auditor reading the output sees the reasoning.
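The inheritance rule and the override can be sketched in a few lines of Python. This is an illustration of the mapping described above, not the tool's actual code: the `Step1Output` container, its field names, the integer encoding of C1 to C3, and the wording of the override note are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Step1Output:
    """Hypothetical container for Step 1 results; classes are 1, 2 or 3 (C1..C3)."""
    barrier_element: bool
    hse: int
    environment: int
    containment: int
    production: int
    other_cost: int


def inherit_category(s: Step1Output) -> str:
    """Map Step 1 classes to an RCM consequence category (S/E takes precedence)."""
    if s.barrier_element or s.hse >= 2 or s.environment >= 2 or s.containment >= 3:
        return "Safety/Environment"
    if s.production >= 2 or s.other_cost >= 3:
        return "Operational"
    return "Non-operational"


def resolve_category(s: Step1Output,
                     override: Optional[str] = None,
                     reason: str = "") -> Tuple[str, str]:
    """Return (category, justification fragment), recording any analyst override."""
    inherited = inherit_category(s)
    if override is None or override == inherited:
        return inherited, f"Category: {inherited} (inherited from Criticality)."
    return override, f"Category: {override} (overridden from {inherited}: {reason})."


# Feed pump from Example 1: HSE C2, Production C3, not a barrier element.
pump = Step1Output(barrier_element=False, hse=2, environment=1,
                   containment=1, production=3, other_cost=1)
print(resolve_category(pump))
# Bearing failure mode on the same pump, overridden by the analyst.
print(resolve_category(pump, override="Operational",
                       reason="bearing failure cannot release process fluid"))
```

Keeping the override reason inside the returned string mirrors the audit requirement: the category and the reasoning for changing it travel together.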

Why inheritance (not re-asking): NORSOK Z-008:2024 §9.2.1 explicitly points to the consequence classification as the input to task selection. Re-asking in Step 3 creates the risk of inconsistency between the two analyses. Inherit by default, override only where the analyst has a specific reason.

Core concept: the P-F interval

The P-F interval is the single most important concept in condition-based maintenance. It's the reason on-condition tasks work at all — and when the interval doesn't exist or isn't usable, CBM is off the table and the tool moves on to time-based tasks.

Definition

The P-F interval is the time between the point at which a failure first becomes detectable (P — Potential failure) and the point at which it progresses to actual functional failure (F — Functional failure).

Figure: The P-F interval. Axes: performance (0% to 100%) against time. Equipment performance sits at design level until the first detectable indicator appears at P (potential failure). Degradation continues until the functional-failure threshold is crossed at F (functional failure). The horizontal distance between P and F is the P-F interval. The inspection interval for a CBM task must fall inside the P-F window; the rule of thumb is half the P-F interval or shorter, so that at least one inspection falls within the window.

Example: a pump bearing

A feed pump runs continuously. For 40 weeks the vibration signature sits flat — that's normal running wear. In week 41, vibration starts climbing. That's P — a detectable indicator of impending failure. The bearing doesn't fail yet; it just starts warning. Vibration keeps rising over the next 8 weeks. In week 49, vibration reaches a level at which the bearing is about to seize. Week 50, it seizes — that's F. The pump stops.

The P-F interval is 9 weeks (week 41 to week 50). If the maintenance team checks vibration every 4 weeks, at least two checks fall inside that window, so they'll catch the failure in progress and have time to plan an intervention. If they check every 12 weeks, a whole P-F window can fit between two consecutive checks: the bearing can go from fine-at-last-check to seized-and-stopped with no warning.
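The arithmetic behind the bearing example can be checked with a short sketch. The function names are hypothetical, and the probability figure assumes that the onset of P is equally likely at any point between two inspections, which is a modelling assumption rather than something the guide states.

```python
import math


def guaranteed_inspections(pf_interval: float, inspection_interval: float) -> int:
    """Minimum number of inspections that fall inside the P-F window,
    whatever the timing of P relative to the inspection schedule."""
    return math.floor(pf_interval / inspection_interval)


def chance_of_catching(pf_interval: float, inspection_interval: float) -> float:
    """Probability that at least one inspection lands inside the window,
    assuming P is uniformly distributed between two inspections."""
    return min(1.0, pf_interval / inspection_interval)


print(guaranteed_inspections(9, 4))   # 2
print(guaranteed_inspections(9, 12))  # 0
print(chance_of_catching(9, 12))      # 0.75
```

With a 12-week interval the failure is still caught some of the time, but nothing guarantees it, and a task that only sometimes works is not a sound basis for managing the failure.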

The three requirements for a valid CBM task

For an on-condition task to be feasible, all three of these must be true:

  1. A clear P point exists. There must be a measurable indicator — vibration, temperature, oil debris, acoustic emission, flow, pressure trend — that changes before functional failure. Random failures with no warning signature are not candidates for CBM.
  2. The P-F interval is long enough to act on. You need time to detect the indicator, diagnose the problem, plan the intervention, mobilise resources, get parts, obtain permits, and execute the repair before F. If the P-F interval is 3 days and your procurement cycle is 6 weeks, the task is not worth doing — you can't act on the information in time.
  3. The P-F interval is reasonably consistent. If the interval varies wildly — 2 days for some failures, 2 years for others — you cannot set a reliable inspection frequency. You either inspect too often (wasteful) or too rarely (miss the short-interval failures).

The tool's CBM feasibility question is really asking all three of these at once. If any one fails, CBM is not feasible and the tree moves to scheduled restoration.
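As a rough sketch, the three requirements can be written as one check that also reports which requirement failed. The `PFEvidence` fields and the `max_spread_ratio` threshold are assumptions introduced for illustration; the guide does not define a numeric test for "reasonably consistent".

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PFEvidence:
    """Hypothetical summary of what is known about a failure mode's P-F behaviour."""
    has_indicator: bool        # requirement 1: a measurable P point exists
    pf_shortest_days: float    # shortest P-F interval observed or expected
    pf_longest_days: float     # longest P-F interval observed or expected
    reaction_time_days: float  # detect, diagnose, plan, get parts and permits, repair


def cbm_feasible(e: PFEvidence, max_spread_ratio: float = 3.0) -> Tuple[bool, List[str]]:
    """Return (feasible, reasons it is not). max_spread_ratio is an assumed threshold."""
    reasons = []
    if not e.has_indicator:
        reasons.append("no measurable P point")
    if e.pf_shortest_days <= e.reaction_time_days:
        reasons.append("P-F interval too short to act on")
    if e.pf_longest_days > max_spread_ratio * e.pf_shortest_days:
        reasons.append("P-F interval too inconsistent")
    return (not reasons, reasons)


# 3-day P-F interval against a 6-week procurement cycle: not actionable.
print(cbm_feasible(PFEvidence(True, 3, 5, 42)))
# 2 days for some failures, 2 years for others: not consistent.
print(cbm_feasible(PFEvidence(True, 2, 730, 1)))
```

Returning the reasons alongside the verdict makes it easier to record why CBM was rejected before the tree moves on to scheduled restoration.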

Setting the monitoring interval

The standard rule: set the monitoring interval to half the P-F interval, or less. Any interval shorter than the P-F interval already ensures that at least one inspection falls between P and F; halving it ensures that, even in the worst case, at least half the P-F interval remains after detection to plan and execute the repair. The rule is loosely analogous to the Nyquist sampling rule in signal processing: you have to sample at least twice as often as the event you're trying to observe.
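A minimal sketch of the half-interval rule and the lead time it leaves follows. The function names are hypothetical, and the worst case assumed here is that P occurs immediately after an inspection, so the first detection comes one full monitoring interval later.

```python
def monitoring_interval(pf_interval: float, fraction: float = 0.5) -> float:
    """Monitoring interval as a fraction of the P-F interval (rule of thumb: 0.5)."""
    if pf_interval <= 0 or not 0 < fraction <= 1:
        raise ValueError("pf_interval must be positive and fraction in (0, 1]")
    return pf_interval * fraction


def worst_case_lead_time(pf_interval: float, interval: float) -> float:
    """Shortest time left between first detection and F.
    Worst case: P occurs just after an inspection, so detection is one interval later."""
    return max(0.0, pf_interval - interval)


interval = monitoring_interval(9)         # weeks, bearing example
print(interval)                           # 4.5
print(worst_case_lead_time(9, interval))  # 4.5
print(worst_case_lead_time(9, 12))        # 0.0
```

The lead time is what has to cover planning, parts, permits, and the repair itself, so it is the number to compare against the reaction time when judging whether the task is actionable.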

Typical P-F intervals by monitoring technique

Technique | Typical P-F interval | Typical inspection interval
Vibration analysis (rolling bearings) | 1 to 9 months | Monthly to quarterly
Oil analysis (gearboxes) | 1 to 6 months | Monthly to bi-monthly
Thermography (electrical connections) | Weeks to months | Quarterly to annually
Acoustic emission (structural cracks) | Days to weeks | Continuous monitoring preferred
Process parameter trending (flow, ΔP) | Hours to weeks | Continuous, automated
Ultrasonic thickness (corrosion) | Months to years | Annually to once per 5 years

Common mistake: assuming a CBM task is feasible just because the monitoring technology exists. A vibration sensor can be fitted to anything, but if the P-F interval is shorter than your reaction time, the task is useless. The question isn't "can we monitor?" — it's "can we monitor, detect, and act before the failure occurs?"

Core concept: the six task types

RCM recognises four proactive task types and two default actions. The tool evaluates each in a strict order — if the first doesn't work, try the next.

CBM On-condition maintenance

Monitor an indicator that changes before failure. Act when the indicator crosses a threshold. Requires a valid P-F interval (see above). Examples: vibration trending, oil analysis, thermography, process parameter monitoring, acoustic emission.

When it's the right answer: there's a measurable warning sign, the P-F interval is actionable, and monitoring is cheaper than letting the failure happen.

SR Scheduled restoration

At a fixed interval, restore the item to a known-good condition — typically by overhauling, refurbishing, or recoating it. The item continues in service after restoration. Requires age-related wear-out (failure rate rises sharply at an identifiable age) AND a restoration action that actually returns the item to its original resistance to failure.

When it's the right answer: you can't detect a clear P-F signal, but the item wears out predictably with age and can be refurbished cost-effectively. Example: a gearbox overhauled every 10 years; a pump rebuild cycle.

When it isn't: for items that don't show age-related wear-out (most electronics, many hydraulic components), running clocks on them does nothing. The same applies where "restoration" doesn't actually restore: you can't meaningfully overhaul a seized bearing back to factory specification.

SD Scheduled discard

At a fixed interval, discard the item and install a new one. No restoration is attempted. Requires age-related wear-out AND a cost-effective replacement strategy. Applies primarily to items that cannot be meaningfully restored.

When it's the right answer: items with a known wear-out life that are cheaper to replace than to overhaul. Examples: filter cartridges, o-rings and seals as part of a major service, batteries with a documented end-of-life, lamps in safety-critical lighting.

FF Failure-finding

Periodically test whether a hidden function still works. Does not prevent failure — it finds failures that have already occurred so they can be corrected before a demand arises. Only applies to hidden failures.

When it's the right answer: the function is hidden (protective devices, standby equipment) and a test is feasible without excessive disturbance. Examples: PSV function test, ESD valve stroke test, fire pump weekly run-test, fire and gas detector response test.

The FF interval is calculated to keep the combined probability of hidden failure and demand for the protected function below a tolerable level. For safety-critical barriers this interval is typically prescribed by IEC 61511 (SIL) or operator-specific risk acceptance criteria, not chosen by the analyst.
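To make the interval logic concrete, here is a sketch using a common textbook approximation; it is not necessarily the calculation the tool performs. It assumes a constant failure rate for the protective device, a test interval much shorter than the device's MTBF (so average unavailability is roughly the interval divided by twice the MTBF), and demands that arrive independently of the device's state. All names and numbers are illustrative.

```python
import math


def average_unavailability(test_interval_years: float, mtbf_years: float) -> float:
    """Approximate fraction of time the hidden function is failed and undetected.
    Valid only when the test interval is much shorter than the MTBF."""
    return test_interval_years / (2.0 * mtbf_years)


def multiple_failure_rate(test_interval_years: float, mtbf_years: float,
                          demand_rate_per_year: float) -> float:
    """Approximate rate (per year) of a demand arriving while the device is failed."""
    return demand_rate_per_year * average_unavailability(test_interval_years, mtbf_years)


def max_test_interval(mtbf_years: float, demand_rate_per_year: float,
                      tolerable_rate_per_year: float) -> float:
    """Longest test interval that keeps the multiple-failure rate at the tolerable level."""
    return 2.0 * mtbf_years * tolerable_rate_per_year / demand_rate_per_year


# Illustrative numbers only: device MTBF 100 years, one demand per 10 years,
# tolerable multiple-failure rate of 1 in 10,000 years.
interval = max_test_interval(mtbf_years=100, demand_rate_per_year=0.1,
                             tolerable_rate_per_year=1e-4)
print(round(interval, 3))  # 0.2 (years)
print(math.isclose(multiple_failure_rate(interval, 100, 0.1), 1e-4))  # True
```

For safety-critical barriers the interval normally comes from the SIL verification or the operator's acceptance criteria, so a calculation like this is a cross-check rather than the source of the number.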

RTF Run to failure

Don't do any proactive task. Allow the failure to occur, then respond with corrective maintenance. This is a deliberate, justified choice, not an omission: it's the right answer when consequences are low and no task can prevent or predict the failure cost-effectively.

When it's the right answer: non-operational consequences, no feasible proactive task, corrective repair is cheap and quick. Examples: a redundant pump where the spare can carry duty; a non-critical instrument; a lamp that can be changed in five minutes.

RED Redesign

If no proactive task is feasible AND the consequences are Safety/Environment, redesign is compulsory under RCM-II. The equipment itself has to change — a different design, a different material, an added safety layer, or a different operating envelope. Redesign is not "something the analyst might want to consider" — the standard mandates it when the alternative is accepting unacceptable risk.

In the Bluestream tool, RED is derived (never selected by the analyst). If you walk the tree and end up with no feasible task, and the consequence category is Safety/Environment, the output is RED. The analyst's job is then to flag the item to the design team, not to continue the maintenance analysis as if a solution existed.

REVIEW Cost-benefit review required

If no proactive task is feasible AND the consequences are Operational (not S/E), RCM-II calls for a cost-benefit comparison between redesigning and running to failure. Neither is automatically right — it depends on the specific economics of the asset.

The Bluestream tool flags these cases as REVIEW rather than auto-defaulting to RTF. The analyst records the cost-benefit decision in the concept rationale. This keeps the methodology honest — an unresolved Operational case doesn't silently disappear into RTF.

Core concept: feasible AND worth doing

For each proactive task type, the tool asks two questions, not one:

  1. Is the task technically feasible? Can you actually do it? (Does the P-F interval exist? Does the item show age-related wear-out?)
  2. Is the task worth doing? Does it make economic or safety sense? (Is the monitoring cost less than the failure cost? Does restoration actually reduce risk to a tolerable level?)

Both must be yes for the task to be selected. This is explicit in SAE JA1012 and it's the most commonly skipped step in informal RCM analyses — people confirm feasibility and move on, without asking whether the task is justified.

Example of the distinction

Feasible but not worth doing: You can fit vibration monitoring to every small pump in a utility system. Technically feasible — the P point exists, the interval is reasonable, the indicator is measurable. But the failure consequence is trivial (run the spare, repair at leisure) and the cost of monitoring, data analysis, and alarm response is substantial. Not worth doing. Strategy: RTF, not CBM.

Worth doing but not feasible: A fire detector in a hazardous area would be very worth monitoring continuously — but the detector design doesn't expose a monitorable signal between tests. Not technically feasible. Strategy: FF (scheduled functional test), not CBM.
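The two-question test can be sketched as follows for failures with purely economic consequences. The cost model, function names, and figures are assumptions for illustration; for Safety/Environment consequences the worth-doing test is whether the task reduces risk to a tolerable level, which a simple cost comparison does not capture.

```python
def worth_doing_economic(annual_task_cost: float, failures_per_year: float,
                         cost_per_failure: float, fraction_avoided: float = 1.0) -> bool:
    """A task is worth doing (economically) if it costs less per year
    than the failure cost it is expected to avoid per year."""
    avoided_cost = failures_per_year * cost_per_failure * fraction_avoided
    return annual_task_cost < avoided_cost


def task_selected(feasible: bool, worth_doing: bool) -> bool:
    """Both answers must be yes for the task to be selected."""
    return feasible and worth_doing


# Small utility pump (illustrative figures): monitoring is feasible,
# but costs far more than the occasional failure it would avoid.
feasible = True
worth = worth_doing_economic(annual_task_cost=5000, failures_per_year=0.2,
                             cost_per_failure=2000)
print(worth)                           # False
print(task_selected(feasible, worth))  # False
```

Keeping the two answers as separate values, rather than collapsing them into one judgement, is what lets the record show a task was rejected on cost rather than on feasibility.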

The full decision tree

This is the tree the tool walks for every failure mode. Read left to right. Green outcomes are proactive tasks selected by the tree; orange/red outcomes are derived defaults when no task is feasible.

Figure: The RCM-II decision tree implemented by the Bluestream RCM/RBM tool. Starting from the failure mode, the first branch splits on evident versus hidden. Evident failures walk CBM → SR → SD in order; each evaluation asks two questions (technically feasible? worth doing?), and a Yes to both at any stage produces that task as the output. If all three evaluations fail, the default action is derived from the consequence category: RED (compulsory redesign) if Safety/Environment, REVIEW if Operational, RTF if Non-operational. Hidden failures walk a single failure-finding evaluation: pass produces FF; fail produces RED if Safety/Environment, otherwise RTF. Legend: decision, task output, derived default, Yes path, No path.
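The tree is compact enough to express as a single function. The sketch below follows the branch order described in this guide and uses its question labels (Q2 to Q9) as dictionary keys; the function name, the answer format, and the choice to treat an unanswered question as "No" are assumptions, not the tool's actual interface.

```python
from typing import Mapping

# (task, feasibility question, worth-doing question), in evaluation order.
EVIDENT_TASKS = [("CBM", "Q2", "Q3"), ("SR", "Q4", "Q5"), ("SD", "Q6", "Q7")]

EVIDENT_DEFAULTS = {
    "Safety/Environment": "RED",   # redesign compulsory
    "Operational": "REVIEW",       # cost-benefit: redesign vs run-to-failure
    "Non-operational": "RTF",      # run-to-failure acceptable
}


def select_strategy(evident: bool, category: str, answers: Mapping[str, bool]) -> str:
    """Walk the decision tree; unanswered questions are treated as 'No'."""
    if evident:
        for task, q_feasible, q_worth in EVIDENT_TASKS:
            if answers.get(q_feasible, False) and answers.get(q_worth, False):
                return task  # first task that is feasible AND worth doing
        return EVIDENT_DEFAULTS[category]
    # Hidden branch: a single failure-finding evaluation.
    if answers.get("Q8", False) and answers.get("Q9", False):
        return "FF"
    return "RED" if category == "Safety/Environment" else "RTF"


# Evident S/E failure with a feasible, justified on-condition task.
print(select_strategy(True, "Safety/Environment", {"Q2": True, "Q3": True}))  # CBM
# Hidden S/E failure with a feasible, justified functional test.
print(select_strategy(False, "Safety/Environment", {"Q8": True, "Q9": True}))  # FF
# Hidden S/E failure where no functional test is feasible.
print(select_strategy(False, "Safety/Environment", {"Q8": False}))  # RED
```

Note that RED and REVIEW never appear as inputs: they can only come out of the function, which is the same constraint the tool places on the analyst.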

How the tool walks you through it

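For each failure mode from your FMECA, the tool draws on nine questions in total but asks at most seven for any one failure mode: Q1 plus Q2 to Q7 on the evident branch, or Q1, Q8 and Q9 on the hidden branch. It usually asks fewer, because the tree terminates as soon as a task is selected or a default is derived.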

Q1. Evident or hidden?

The first question, and the one that determines which branch of the tree you walk. The sub-text in the tool gives you the crisp test: under normal conditions, will the operating crew become aware that this failure has occurred? Self-announcing failures are evident. Failures that only reveal themselves on demand or during test are hidden.

Q2/Q3. CBM evaluation

Asked for evident failures. Q2: is a CBM task technically feasible? This is really asking about the P-F interval — does a measurable warning exist, is it long enough to act on, is it consistent? If Yes, the tool asks Q3: is the CBM task worth doing? Compare monitoring cost (sensor + analysis + alarm response) to failure cost. If both Yes, strategy is CBM and the tree stops.

Q4/Q5. Scheduled restoration evaluation

Asked only if CBM failed (either feasibility or worth). Q4: does the item show age-related wear-out AND can restoration return it to its original resistance to failure? If Yes, Q5: is restoration cost-justified against failure cost? If both Yes, strategy is SR.

Q6/Q7. Scheduled discard evaluation

Asked only if SR failed. Q6: does discarding at a fixed age reduce failure risk? Q7: is periodic replacement cost-justified? If both Yes, strategy is SD.

Q8/Q9. Failure-finding evaluation (hidden only)

Asked for hidden failures, replacing the CBM/SR/SD evaluations. Q8: is a functional test feasible without excessive disturbance and is the test reliable? Q9: does the test interval keep the combined probability of hidden failure plus demand below the tolerable risk threshold? If both Yes, strategy is FF.

No task feasible — derivation

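If the tree exits without selecting a task (all evaluations failed), the tool derives the strategy from the consequence category. On the evident branch, Safety/Environment gives RED (compulsory redesign), Operational gives REVIEW (a cost-benefit comparison of redesign against run-to-failure), and Non-operational gives RTF. On the hidden branch, Safety/Environment gives RED and every other category gives RTF.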

Video walkthrough

Full screen-recording of the RCM/RBM tool in use — from FMECA hand-off through to summary output, with commentary on each decision point.


Three worked examples

The same tool, three very different failure modes. Each example shows the path through the tree and the resulting strategy with justification.

Example 1 — Feed pump, seal face wear

CBM

Asset: Centrifugal feed pump, single-stage, end-suction, continuous duty. Pumping a clean hydrocarbon at 15 bar, 80 °C. Redundant spare available but changeover takes 4 hours.

Failure mode (from FMECA): ELP — external leakage, process medium, from mechanical seal face wear.

Step 1 criticality: HSE C2 (hydrocarbon above flashpoint, moderate toxicity), Production C3 (4-hour changeover impacts throughput), barrier element = No. Inherited RCM category: Safety/Environment.

Q1 Evident/hidden? Evident — leak is visible, alarms on containment.
Q2 CBM feasible? Yes — vibration monitoring + process pressure trending give 2–4 weeks P-F interval.
Q3 CBM worth doing? Yes — online vibration sensor is already installed for the motor; marginal cost of trending is negligible against cost of an unplanned leak.
→ Strategy: CBM.

Justification: Category: Safety/Environment (inherited from Criticality). Evident failure branch. On-condition task selected: P-F interval measurable AND monitoring cost-justified. Ref: Z-008:2024 §9.2.1; SAE JA1012 §3.4.

Example 2 — Pressure Relief Valve, fails to open

FF

Asset: Spring-operated PSV on a gas processing vessel. Set pressure 25 bar. Last resort overpressure protection.

Failure mode (from FMECA): FTO — failure to open on demand. Can be caused by corrosion, fouling, spring set, or seat adhesion.

Step 1 criticality: Barrier element = Yes (realises the overpressure protection function per ISO 17776). Inherited RCM category: Safety/Environment.

Q1 Evident/hidden? Hidden — the valve sits closed, providing no observable signal. Failure only reveals itself on demand.
Q8 FF feasible? Yes — PSV pop-test at bench or online test rig is a standard procedure.
Q9 FF worth doing? Yes — interval set per operator SIL/barrier requirements (typically 2–5 years). Cost is small relative to vessel rupture consequence.
→ Strategy: FF.

Justification: Category: Safety/Environment (inherited). Hidden failure branch. Failure-finding task selected: functional test feasible AND test interval keeps multi-failure probability tolerable. Ref: Z-008:2024 §9.3; SAE JA1012 §3.7.

Example 3 — Obsolete fire detector, no test access

RED

Asset: Legacy point-type fire detector installed in a space that has been reconfigured since commissioning. Detector is now behind permanent cladding — accessing it for functional test requires 4-hour scaffolding + permit-to-work cycle.

Failure mode (from FMECA): FTF — failure to function on demand.

Step 1 criticality: Barrier element = Yes. Inherited RCM category: Safety/Environment.

Q1 Evident/hidden? Hidden.
Q8 FF feasible? No — access cost is disproportionate, scheduled testing is effectively impractical at any meaningful frequency.
→ No feasible task. Category = S/E. Strategy derived: RED.

Justification: Category: Safety/Environment (inherited). Hidden failure branch. No feasible FF task identified. Consequences are unacceptable (Safety/Environment). Redesign is compulsory per SAE JA1012 §3.2 — this must be referred to design authority to either relocate the detector, replace with a testable model, or add a second detector in an accessible location. Ref: Z-008:2024 §9.3; SAE JA1012 §3.2.

Note: This is a deliberately realistic example. Facilities accumulate inaccessible safety-critical hardware over their lifetime through modifications, debottlenecking, and insulation changes. RCM flags the problem honestly — continuing to "schedule" a test that nobody can practically execute is worse than no task at all, because it creates false paper compliance.

Common pitfalls

Treating CBM as a default

It's the first task the tree evaluates, and the trendiest in the industry. Analysts who are in a hurry assume CBM is the "modern" answer and stop interrogating. But CBM is only valid when a P-F interval exists, is actionable, and is consistent. Random failures, infant-mortality failures, and failures with no pre-failure signal are not CBM candidates — and that's a large fraction of real-world failure modes. Don't select CBM because it sounds sophisticated; select it because the P-F interval supports it.

Accepting inherited category without thinking

The tool inherits consequence category from Step 1 — that's the sensible default. But Step 1 classified the asset at the equipment level, not the failure-mode level. A pump might be classified S/E because of fluid hazards, but a specific bearing failure mode on that pump might have only operational consequence (the bearing can't release the hazardous fluid). The override link exists precisely for this. Use it when the inherited category doesn't fit the specific failure mode.

Skipping the worth-doing test

Feasible and worth-doing are different questions. Plenty of tasks are feasible but not cost-justified — especially CBM on low-consequence equipment where the sensor + analytics + response stack costs more than the occasional failure. Answering Q2 and moving on, without interrogating Q3, produces an over-engineered program. The two-question drill exists to catch this.

Missing hidden failures

Protective devices, standby equipment, interlocks, and failsafes are easy to forget about precisely because they work silently. If Step 2 (FMECA) didn't capture them, Step 3 can't assess them. Review the FMECA specifically for: pressure relief valves, ESD valves, F&G detectors, fire pumps, standby generators, battery-backed instruments, interlock circuits. Every protective function is a candidate hidden failure.

Trying to "choose" redesign

This tool does not let the analyst select RED as an answer to a question. That's deliberate. SAE JA1011 is explicit: redesign is the consequence of no feasible task being available against unacceptable consequences, not a substitute for task analysis. If you think you need redesign, walk the tree honestly — answer the feasibility questions truthfully. If RED falls out of the tree, it's compulsory and you escalate it. If it doesn't, the task it did select is the right answer, whether you like it or not.

Paper compliance for untestable barriers

As Example 3 shows, there's a strong institutional temptation to schedule a test-every-N-months task on a safety barrier even when nobody can practically execute it. This creates the illusion of compliance while leaving the actual risk uncontrolled. RCM-II and NORSOK Z-008 both require that tasks be actually executable. If FF isn't feasible, escalate to redesign. Don't schedule fiction.
