FAQ Category: semantic governance

  • How can we make ISO 22400 KPI calculations auditable?

    Making ISO 22400 KPI calculations auditable is less about the standard itself and more about how you define, implement, and govern the KPI logic in your systems. In regulated, brownfield plants, auditable KPIs require unambiguous definitions, reliable data capture, controlled calculation logic, and reproducible results backed by evidence.

    1. Start with precise, written KPI definitions

    ISO 22400 describes concepts and reference calculations, but each plant still makes choices. To be auditable, you should maintain a KPI definition sheet for each KPI that includes at least:

    • Name and identifier (e.g., ISO22400_OEE_V1, ISO22400_Availability_V2).
    • Scope: line, machine group, shift, product family, plant; plus time horizon (shift, day, week).
    • Exact formula, including units and references to the specific ISO 22400 clause or figure where applicable.
    • Time base definitions: what counts as planned time, operating time, planned/unplanned downtime, and which status codes map to each bucket.
    • Included and excluded events: e.g., warmup, maintenance, testing, changeovers, engineering trials, rework.
    • Data fields used: system, table, tag, or signal names (from MES, historian, PLC, ERP, QMS, etc.).
    • Aggregation rules: how you roll up across shifts, machines, or orders (e.g., weighted by planned time or output quantity).
    • Known limitations and assumptions: for example, how you handle missing machine states, partial cycles, or backfilled production counts.

    Auditors will challenge anything that is ambiguous or inconsistently applied across lines or plants. Written definitions are the baseline for repeatability.
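
    As a rough illustration, a definition sheet can be held as a structured record so that incomplete entries are caught before a KPI goes live. This is a minimal sketch, not a prescribed schema: the class name, field names, and the choice of which fields are mandatory are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KpiDefinition:
    """One row of a KPI definition sheet (field names are illustrative)."""
    identifier: str            # e.g. "ISO22400_OEE_V1"
    scope: str                 # line, machine group, plant, ...
    time_horizon: str          # shift, day, week
    formula: str               # exact formula text, including units
    iso_reference: str         # clause or figure reference, where applicable
    state_code_buckets: dict   # status code -> time bucket
    excluded_events: tuple = ()
    data_fields: tuple = ()
    aggregation_rule: str = ""
    limitations: tuple = ()

    def missing_fields(self):
        """Return the names of required text fields that were left empty."""
        required = ("identifier", "scope", "time_horizon", "formula")
        return [name for name in required if not getattr(self, name).strip()]
```

    A review step could call missing_fields() on every definition and refuse to publish a KPI whose list is not empty.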

    2. Make data lineage from KPI to raw signals traceable

    To be auditable, anyone should be able to start from a reported KPI value and trace back to:

    • The underlying time series of machine states and production counts.
    • The work orders, part numbers, and shift calendars involved.
    • The exact calculation logic and version that produced the result.

    Practical steps:

    • Retain time-stamped raw data from MES, SCADA/PLC, historian, and ERP at a resolution that supports reconstruction of events (for many KPIs, 1-second to 1-minute resolution is typical).
    • Maintain a data dictionary that maps source tags and fields (e.g., PLC bits, MES state codes, ERP order status) to KPI categories such as operating time, minor stop, changeover, scrap, or rework.
    • Record data transformations (e.g., state code reclassification, time bucket merging, filtering of obvious noise or outliers) with versioned logic.
    • Keep referential links between production events, work orders, batches, and KPIs (e.g., a KPI instance references specific order IDs and date ranges).

    If your KPI platform cannot show how a value was derived from raw data, auditors will treat it as a dashboard number rather than as reliable evidence.
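
    The data dictionary described above can be sketched as a simple lookup applied to raw state events. The state codes, bucket names, and function name here are hypothetical; a real plant would load the mapping from a versioned source rather than hard-coding it.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical data dictionary: source state code -> KPI time bucket.
STATE_TO_BUCKET = {
    "RUN": "operating_time",
    "IDLE_SHORT": "minor_stop",
    "SETUP": "changeover",
    "FAULT": "unplanned_downtime",
}


def bucket_durations(events):
    """Sum seconds per KPI bucket from (start, end, state_code) events.

    Unmapped codes are collected under "UNMAPPED:<code>" instead of being
    dropped, so gaps in the data dictionary stay visible to a reviewer.
    """
    totals = defaultdict(float)
    for start, end, code in events:
        bucket = STATE_TO_BUCKET.get(code, f"UNMAPPED:{code}")
        totals[bucket] += (end - start).total_seconds()
    return dict(totals)
```

    Keeping unmapped codes in the output is a design choice for this sketch: silently discarding them would make the reported totals impossible to reconcile with the raw time series.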

    3. Put calculation logic under change control

    Many plants implement ISO 22400 KPIs in multiple places: historian scripts, MES reports, BI tools, or custom SQL. This is a common source of non-auditable discrepancies.

    To keep calculations auditable:

    • Centralize KPI logic as much as possible in a single, validated layer (e.g., MES or an analytics engine) and treat that as the system of record.
    • Apply formal change control to KPI definitions and logic: documented change requests, impact assessment, testing, approvals, and effective dates.
    • Version all calculation code and configurations (SQL, ETL flows, scripts, BI measures) in a repository where you can reconstruct the exact logic used on any historical date.
    • Document deviations from ISO 22400: if you implement a plant-specific variant of OEE or availability, label and document it as such rather than calling it “ISO 22400” without qualification.

    In regulated environments, ad hoc dashboard logic without version control is a major audit risk, even if the formulas are mathematically correct.
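
    One lightweight way to tie a reported value to the exact logic that produced it is to store a fingerprint of the definition alongside each result. The sketch below assumes the definition is available as a JSON-serializable dictionary with an "id" key; the function names and record layout are illustrative, and this complements a version-control repository rather than replacing it.

```python
import hashlib
import json


def logic_fingerprint(definition: dict) -> str:
    """Return a stable SHA-256 hex digest of a KPI definition.

    Keys are sorted so the same content always yields the same digest,
    regardless of the order in which fields were entered.
    """
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def stamp_result(value: float, definition: dict, effective_date: str) -> dict:
    """Attach definition ID, fingerprint, and effective date to a KPI value."""
    return {
        "value": value,
        "definition_id": definition["id"],
        "definition_sha256": logic_fingerprint(definition),
        "effective_date": effective_date,
    }
```

    If the fingerprint stored with a historical value no longer matches any approved definition version, that mismatch is itself useful audit evidence.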

    4. Validate the full calculation pipeline

    In aerospace, pharma, and other regulated sectors, KPI numbers are often used to support capacity decisions, improvement programs, and sometimes compliance evidence. That makes the calculation pipeline itself subject to validation expectations.

    Consider a basic validation approach:

    • Define intended use: for example, “shift-level ISO 22400 availability and OEE for internal performance management, not used directly for product release decisions.”
    • Perform installation and operational checks: confirm data flows from each source (MES, historian, ERP) are complete, time-synchronized, and secure.
    • Develop test cases: use controlled historical periods where states and counts are known (e.g., a planned training day with a known stop pattern) and verify that your pipeline reproduces the expected KPI values.
    • Document limitations: for example, “Cycle counts on Line 3 prior to date X are underreported during minor stops due to PLC configuration; KPI values before that date are not fully comparable.”

    If your organization is subject to formal computer system validation (CSV) or software validation expectations, your KPI tooling and data integration stack may fall in scope. Work with quality and IT to set appropriate validation depth.
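
    A test case against a controlled period can be as simple as the sketch below. It assumes a plant-specific availability variant defined as (planned time minus downtime) divided by planned time; that formula, the function names, and the example figures are assumptions for illustration, not values taken from ISO 22400.

```python
def availability(planned_minutes: float, downtime_minutes: list) -> float:
    """Availability as (planned - downtime) / planned for one period."""
    if planned_minutes <= 0:
        raise ValueError("planned_minutes must be positive")
    return (planned_minutes - sum(downtime_minutes)) / planned_minutes


def check_known_period(planned_minutes, downtime_minutes, expected,
                       tolerance=1e-9):
    """Compare the computed value with the documented expected value."""
    actual = availability(planned_minutes, downtime_minutes)
    return {
        "expected": expected,
        "actual": actual,
        "passed": abs(actual - expected) <= tolerance,
    }
```

    For example, a 480-minute training day with known stops of 30 and 15 minutes should reproduce 435 / 480 = 0.90625; the returned record can be filed with the validation evidence.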

    5. Ensure consistent handling of time, shifts, and calendars

    ISO 22400 KPIs such as availability, utilization, and OEE depend heavily on how you define planned time and schedule exceptions. These details are common audit failure points.

    • Use controlled calendars for shifts, holidays, and site-specific events, ideally managed in a master system (MES, HR, or scheduling tool) and propagated downstream.
    • Define standard rules for how you treat early starts, overtime, partial shifts, and overlap between shifts.
    • Classify schedule exceptions explicitly: e.g., planned maintenance, trials, and engineering work that should be removed from planned production time.
    • Synchronize time zones and clocks across OT and IT systems so that event sequences remain reconstructable.

    Auditable KPIs require that two analysts using the same rules and data can independently reproduce the same results for a given period.
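
    The calendar rules above can be made reproducible by computing planned production time from explicit, timezone-aware intervals. This sketch assumes schedule exceptions do not overlap one another and that every timestamp carries a timezone; the function names and the exception tuple layout are hypothetical.

```python
from datetime import datetime, timezone


def overlap_seconds(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two intervals (0 if none)."""
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    return max(0.0, (end - start).total_seconds())


def planned_production_seconds(shift_start, shift_end, exceptions):
    """Shift length minus time covered by classified schedule exceptions.

    exceptions: iterable of (start, end, reason) tuples.
    Assumes the exceptions do not overlap each other.
    """
    for ts in (shift_start, shift_end):
        if ts.tzinfo is None:
            raise ValueError("timestamps must be timezone-aware")
    total = (shift_end - shift_start).total_seconds()
    removed = sum(
        overlap_seconds(shift_start, shift_end, start, end)
        for start, end, _reason in exceptions
    )
    return total - removed
```

    Only the part of an exception that falls inside the shift is removed, so a maintenance window that runs past shift end does not distort the next shift's planned time.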

    6. Design reports for drill-down and reproducibility

    Auditability also depends on practical usability. Reports and dashboards should support tracing a number back to its components.

    • Support drill-down: from monthly OEE to daily, shift-level, machine-level, and event-level views.
    • Show KPI components: for example, for OEE, separately display availability, performance, and quality, with their numerators and denominators.
    • Display applied filters and versions: time range, scope, excluded events, and KPI definition version.
    • Allow export of underlying event data for sampled periods to support manual recalculation and evidence reviews.

    If an auditor cannot inspect how a KPI changed when they adjust time filters or drill down into a particular machine or order, they will question its reliability.
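
    To show components with their numerators and denominators, a report can carry them explicitly instead of only the final ratio. The sketch below uses the availability, performance, and quality split mentioned above, with input fields and formulas that are assumptions for illustration rather than a statement of the ISO 22400 definitions. Roll-ups sum numerators and denominators across rows, which is one way to avoid averaging ratios; it assumes all denominators are non-zero.

```python
from dataclasses import dataclass


@dataclass
class ShiftCounts:
    planned_time_min: float
    operating_time_min: float
    ideal_cycle_min: float   # ideal minutes per unit (assumed input)
    total_units: int
    good_units: int


def oee_components(rows):
    """Return each component with numerator, denominator, and value."""
    planned = sum(r.planned_time_min for r in rows)
    operating = sum(r.operating_time_min for r in rows)
    ideal = sum(r.ideal_cycle_min * r.total_units for r in rows)
    total = sum(r.total_units for r in rows)
    good = sum(r.good_units for r in rows)

    def part(num, den):
        return {"num": num, "den": den, "value": num / den}

    result = {
        "availability": part(operating, planned),
        "performance": part(ideal, operating),
        "quality": part(good, total),
    }
    result["oee"] = (result["availability"]["value"]
                     * result["performance"]["value"]
                     * result["quality"]["value"])
    return result
```

    With one shift of 480 planned minutes, 432 operating minutes, a 1.0-minute ideal cycle, 400 units, and 380 good units, the product reduces to 380 / 480, which a reviewer can confirm by hand from the displayed numerators and denominators.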

    7. Manage coexistence with legacy MES, historians, and BI tools

    Most plants already calculate some form of OEE or ISO 22400-like KPIs in multiple systems. Full replacement is rarely realistic due to validation burden and downtime risk. Instead:

    • Pick one system as the KPI system of record for ISO 22400-aligned metrics, then document that all official numbers come from there.
    • Map and reconcile existing metrics in legacy tools to the new definitions; document differences (e.g., “Old_OEE includes planned maintenance as downtime; ISO22400_OEE excludes it.”).
    • Phase out non-comparable KPIs or clearly label them as legacy indicators to avoid mixing them with ISO 22400 KPIs in formal reports.
    • Implement interface tests to ensure data extracted from MES/ERP/historian into the KPI engine matches the source (record counts, sums, sample spot-checks).

    Auditability suffers when multiple conflicting KPI values exist for the same period without a clear explanation. Reconciliation and labeling are essential during transition.
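
    An interface test of the kind listed above (record counts, sums, spot-checks) might look like the following sketch. The row layout and the key and value column names are hypothetical, and it assumes each key appears at most once per extract.

```python
def reconcile(source_rows, target_rows, key="order_id", value="good_qty"):
    """Compare a source extract with what landed in the KPI engine."""
    src = {r[key]: r[value] for r in source_rows}
    tgt = {r[key]: r[value] for r in target_rows}
    return {
        "count_match": len(source_rows) == len(target_rows),
        "sum_match": (sum(r[value] for r in source_rows)
                      == sum(r[value] for r in target_rows)),
        "missing_in_target": sorted(set(src) - set(tgt)),
        "value_mismatches": sorted(
            k for k in src.keys() & tgt.keys() if src[k] != tgt[k]
        ),
    }
```

    Saving the returned dictionary for each load gives a simple, repeatable record that the KPI engine's inputs matched the source on that date.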

    8. Capture governance, ownership, and training

    Even with robust technical controls, ISO 22400 KPI calculations will not be auditable unless people understand and follow the rules.

    • Assign ownership for each KPI (typically operations or industrial engineering) with clear responsibilities for definition, review, and continuous improvement.
    • Set a review cadence where KPI definitions, data quality, and observed anomalies are periodically checked and updated under change control.
    • Train key users (engineers, supervisors, analysts) on the definitions, typical pitfalls (e.g., double-counting, misclassified downtime), and how to respond to audit questions.
    • Maintain an evidence pack for each key KPI: definition documents, sample calculations, validation records, and recent change logs.

    9. What auditors will typically look for

    While specific expectations vary by regulator and customer, auditors reviewing ISO 22400-style KPIs used in decision-making commonly test:

    • Can you explain the formula and link it to ISO 22400 where applicable?
    • Can you trace a reported value back to raw data, with consistent time stamps and event records?
    • Is the calculation logic controlled and versioned, with documented changes and approvals?
    • Are there documented data quality controls and known limitations?
    • Can two people independently recalculate and match a sample KPI period using the same data and rules?

    If you can provide clear answers and evidence for these points, your ISO 22400 KPI calculations will generally be considered auditable, even in complex, mixed-vendor environments.

  • How many KPIs should be global versus local?

    There is no universal number, but the usual answer is: keep the global set small and the local set purposeful.

    For most regulated manufacturing environments, a practical pattern is 5 to 12 global KPIs that are defined consistently across sites, plus a larger set of local KPIs owned by plants, lines, cells, or functions. The exact split depends on process similarity, data quality, governance discipline, and whether sites are actually comparable.

    What should be global

    Global KPIs should be limited to measures that meet all of these tests:

    • Leadership needs them for cross-site decisions, not just reporting.
    • The definition can be controlled consistently across plants.
    • The underlying data is available with acceptable quality and timing.
    • The metric can survive normal differences in routing, product mix, batch size, maintenance strategy, and quality workflows.

    Typical candidates include a small set around delivery, quality, schedule adherence, inventory, capacity, or cost of poor quality, but only if the calculation logic is actually harmonized. If one site books rework inside standard routing and another books it as a separate event, the same KPI can mean different things.

    What should stay local

    Local KPIs should capture what operators, supervisors, engineering, and quality teams can actually act on day to day. These often include bottleneck-specific losses, queue time between process steps, first-pass behavior by product family, inspection backlog, tooling availability, training coverage, or specific sources of scrap and rework.

    These measures are often more useful operationally than enterprise dashboards because they reflect local constraints. A site building stable repeat assemblies does not need the same local metrics as a high-mix repair operation or a tightly constrained outside-processing flow.

    Why not standardize everything

    Because full standardization usually breaks on operating reality.

    In brownfield environments, plants often run mixed MES, ERP, QMS, historian, spreadsheet, and manual log processes. They may also differ in work definitions, shift calendars, routing granularity, labor booking, and nonconformance handling. Forcing one enterprise KPI model across all sites without fixing those differences usually creates three problems:

    • Metrics look comparable when they are not.
    • Sites spend time arguing definitions instead of improving performance.
    • Teams create shadow reporting outside controlled systems.

    That is why a layered model is usually safer than an all-global model.

    A practical operating model

    A common structure is:

    • Tier 1 global: a small enterprise scorecard used for portfolio decisions and executive review.
    • Tier 2 functional/global-local hybrid: common categories with limited local parameterization, such as quality loss, schedule attainment, or material availability.
    • Tier 3 local: plant, line, cell, or program KPIs tied to actual constraints and daily management.

    This approach preserves comparability where it matters while allowing sites to manage the process they actually run.

    What ratio is reasonable

    If you need a rule of thumb, many organizations are better off with roughly 20 to 30 percent global and 70 to 80 percent local by count. But do not treat that as a target. Some networks need fewer global KPIs because products and processes vary too much. Others can support more global KPIs if they have strong master data, common routing logic, disciplined change control, and validated system integration.

    The real question is not the count. It is whether each KPI has a clear owner, stable definition, trusted source, and a decision that depends on it.

    Common failure modes

    • Too many global KPIs, which turns review meetings into dashboard maintenance.
    • Global KPIs defined centrally but calculated differently in each plant.
    • Local KPIs with no link to business outcomes, which creates optimization in the wrong direction.
    • Metrics introduced before data readiness, causing manual workarounds and low trust.
    • Replacing existing reporting too aggressively, which is risky in validated or heavily controlled environments.

    That last point matters. Full replacement strategies often fail when legacy reporting is tied into qualified processes, audit evidence, or long-established operational routines. Coexistence is usually more realistic: stabilize a small canonical KPI layer first, map source systems carefully, validate calculations where required, and retire old reports gradually under change control.

    Bottom line

    Use as few global KPIs as you can govern well, and as many local KPIs as teams need to run the process responsibly. If a KPI cannot be defined consistently across sites, it should probably not be global.

  • Who should be on a manufacturing KPI governance council?

    A manufacturing KPI governance council should include the people who own the process, the people who generate or steward the data, and the people who will be held accountable for acting on the metric. If one of those groups is missing, KPI definitions usually drift, reporting becomes political, and local workarounds take over.

    In most plants, the council should include:

    • Operations leadership, because they own throughput, schedule adherence, labor utilization, and day-to-day response.
    • Quality leadership, because many KPIs depend on how scrap, rework, defects, holds, deviations, and escapes are classified.
    • Manufacturing engineering or industrial engineering, because routing structure, cycle assumptions, standard work, and process changes affect KPI meaning.
    • IT and data/integration owners, because KPI reliability depends on source-system logic, interfaces, master data, timestamp quality, and reporting architecture.
    • Finance, if the council governs cost, variance, inventory, or cost-of-poor-quality metrics.
    • Planning or supply chain, if KPIs include schedule attainment, shortage impact, WIP aging, supplier performance, or queue behavior.
    • Site or business-unit leadership, to resolve cross-functional conflicts and approve standards that local teams may resist.
    • System owners for MES, ERP, QMS, historians, or data platforms that feed the governed KPIs.

    If the council is enterprise-wide, add representation from each major plant type or value stream. A single centralized team often misses real differences between discrete assembly, machining, batch processing, test, and repair operations. At the same time, letting every site define KPIs independently usually destroys comparability. The council has to manage that tradeoff directly.

    Who should not be the whole council

    No single function should dominate the group. A KPI council made up only of executives tends to approve metrics that look clean in slides but are weak operationally. A council made up only of analysts or IT teams often produces technically consistent numbers that do not match how the floor actually runs. A council made up only of site operators may optimize for local practicality while losing enterprise consistency.

    It is also a mistake to confuse stakeholders with decision-makers. Not everyone who consumes dashboards needs a vote on metric definitions. Keep the voting group limited, then bring in subject matter experts as needed for specific metrics.

    Typical roles and responsibilities

    The council works best when membership is paired with explicit responsibility:

    • Chair or sponsor to set priorities and break deadlocks.
    • KPI owners for each governed metric, usually from the business function accountable for outcomes.
    • Data owners or stewards for source-system definitions, transformations, and lineage.
    • Validation or quality representatives where reporting changes affect controlled processes, evidence, or decision support in a regulated environment.
    • Change control participants to review proposed definition changes, effective dates, impact analysis, and communication plans.

    Without named ownership, councils often discuss KPI problems repeatedly without fixing the underlying data, workflow, or definition issue.

    How big should it be?

    Usually 6 to 10 core members is enough. Larger groups become review forums instead of governance bodies. If you need broad input, create a smaller decision council and a wider working group underneath it.

    The right size depends on scope. A single-site KPI council can be leaner. A multi-site council with MES, ERP, QMS, and PLM dependencies usually needs more structured representation because changes in one system can alter reporting logic elsewhere.

    Brownfield reality

    In a brownfield environment, council membership should reflect the systems that actually exist, not the architecture leadership wishes it had. If KPI data comes from a mix of legacy ERP, partial MES coverage, spreadsheets, machine data, and QMS records, the council needs members who understand those boundaries. Otherwise, it will approve definitions that cannot be implemented consistently.

    This is also why full replacement is rarely the first answer. Replacing every execution and reporting system to standardize KPIs sounds clean, but in regulated and long-lifecycle operations it often fails under qualification burden, validation effort, integration complexity, downtime risk, and the need to preserve traceability across old and new records. In practice, the council usually has to govern KPI semantics across coexistence, not assume a reset.

    What the council should decide

    A KPI governance council is not just a dashboard review committee. It should decide:

    • which KPIs are official and which are local working metrics
    • the precise business definition for each KPI
    • source systems of record and fallback rules
    • calculation logic, inclusion and exclusion criteria, and effective dates
    • how changes are approved, tested, documented, and communicated
    • where site variation is allowed and where it is not
    • how exceptions, data quality issues, and disputed numbers are escalated

    If the council is not empowered to make those decisions, membership matters less because governance is not actually happening.

    Practical rule of thumb

    If a function can change the meaning of the metric, the availability of the data, or the action taken based on the result, it should be represented directly or through a named owner. If it only consumes the report, it does not necessarily need a seat.

    So the short answer is: include business owners, data owners, system owners, and decision-makers. Keep the group cross-functional, accountable, and small enough to govern definitions under change control. The exact roster depends on your KPI scope, plant diversity, and system landscape.

  • How long should we retain ISO 9001 quality records?

    ISO 9001 does not define specific retention periods for most quality records. Instead, it requires that you:

    • Identify which records are needed to demonstrate conformity and effective QMS operation.
    • Define how long each record type will be retained.
    • Control those records so they are legible, retrievable, and protected for the entire retention period.

    What ISO 9001 actually requires

    ISO 9001:2015 refers to quality records as “documented information” that must be retained to provide evidence of conformity and of the effective operation of the QMS. It requires you to:

    • Maintain documented information on your processes, including how records are handled.
    • Retain documented information for as long as it is needed for evidence.
    • Control retention and disposition (e.g., through a retention schedule or procedure).

    However, it does not state a universal number of years for retaining nonconformance reports, inspection reports, training records, or similar artifacts.

    Key drivers for record retention periods

    In a regulated, long-lifecycle manufacturing environment, retention time is driven more by external and business requirements than by ISO 9001 itself. Typical drivers include:

    • Legal and liability requirements: Product liability and contract law in your jurisdiction may effectively require retention for the full product life plus a defined period (often several years). This is highly jurisdiction-specific and requires legal input.
    • Customer and contract clauses: Many aerospace, defense, and medical customers specify minimum retention times (e.g., 10, 15, 25 years, or life-of-program). These usually override your default QMS rules.
    • Regulatory and sector standards: Other standards (AS9100, IATF 16949, medical device regulations, etc.) and regulatory bodies may define explicit retention requirements for certain record types.
    • Product and fleet lifecycle: In aerospace and similar sectors, products are in service for decades. Records that support configuration, conformity, and investigations often need to be available for the entire expected life plus a buffer.
    • Internal risk appetite: Organizations with high risk exposure or complex failure modes often choose retention longer than the minimum to support incident investigation and trend analysis.

    Typical retention practices by record type

    The exact numbers must come from your own legal, customer, and regulatory analysis, but in aerospace and other high-liability sectors it is common to see:

    • Design & configuration records (drawings, models, BOMs, ECNs): Life of product + many years, often life-of-fleet or indefinitely, to support traceability and investigations.
    • Manufacturing and inspection records (travelers, inspection reports, test data, certificates of conformity): Frequently 10–25 years or life-of-program; sometimes life-of-product where required by contract or regulation.
    • Nonconformance, MRB, and CAPA records: Typically aligned with related product/lot records, often 10+ years in aerospace-grade environments.
    • Calibration and equipment qualification: Long enough to cover the use of the equipment plus an investigation window (for example, equipment life + 5–10 years), especially where measurement error could affect fielded product.
    • Training and competence: At least for the employment period plus a defined number of years, and at minimum for the duration of product realization activities relevant to that operator’s work.
    • Internal audit and management review: Often a rolling multi-year period (for example, 3–10 years), with longer retention if required by customer or sector standards.

    These are descriptive of common practice, not prescriptive rules. They may be insufficient in some regulatory contexts and excessive in others.

    Brownfield and system coexistence considerations

    Long retention times are often at odds with how legacy MES, ERP, PLM, and file systems were originally configured. Practical issues include:

    • Archival vs. online storage: You may need a layered approach where recent records remain online in MES/ERP and older records are migrated to an archive or records-management system with controlled access and metadata for retrieval.
    • System replacement risk: Full replacement of legacy systems purely to “fix” retention often fails in aerospace-grade environments due to validation burden, downtime risk, and the effort required to migrate and re-qualify historical data. Incremental digitization and targeted archival projects are usually more realistic.
    • Data integrity and format obsolescence: For multi-decade retention, you need a plan to maintain readability as software and formats change, including controlled migrations under change control.
    • Linkage across systems: Records often span QMS, MES, ERP, PLM, and LIMS. Your retention strategy has to preserve traceability across system boundaries, not just within a single application.

    How to define retention in your QMS

    To operationalize ISO 9001 requirements, most organizations create a documented retention schedule or matrix that:

    • Lists key record types (e.g., travelers, FAI reports, calibration certificates, NC/CAPA, training, audits).
    • Specifies the required retention period for each, with references to the sources (legal, customer, regulatory, internal policy).
    • Identifies the system of record (QMS, MES, ERP, PLM, document management, etc.).
    • Defines ownership (who is responsible for ensuring retention and controlled disposition).
    • Describes the method of disposal once the retention period ends, including required protections for confidential and export-controlled data.

    This retention schedule should be maintained under document control and updated through formal change control when requirements change (for example, a new customer contract with stricter terms).
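
    A retention matrix of this kind can be represented as structured rules from which an earliest disposal date is derived. This is a sketch only: the class and function names are hypothetical, and any period used with it is a placeholder that must come from your own legal, customer, and regulatory analysis.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionRule:
    record_type: str        # e.g. "calibration certificate"
    years: int              # placeholder; set from your own analysis
    source: str             # legal, customer, regulatory, internal policy
    system_of_record: str   # QMS, MES, ERP, PLM, ...
    owner: str


def add_years(d: date, years: int) -> date:
    """Add whole years; Feb 29 falls back to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, month=2, day=28)


def earliest_disposal(rule: RetentionRule, trigger_date: date) -> date:
    """Earliest date a record may be disposed of under the given rule."""
    return add_years(trigger_date, rule.years)
```

    The trigger date (record creation, shipment, end of program, and so on) is itself a policy decision and should be documented in the schedule next to the period.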

    Validation and change control

    Any change that affects how and where quality records are stored, archived, or disposed should be handled under your normal change control and, where applicable, computer system validation processes. This is particularly important when:

    • Migrating records from paper to digital or between digital systems.
    • Introducing new archival technologies or cloud storage.
    • Decommissioning legacy systems that contain historically significant quality records.

    The goal is to demonstrate continued integrity, traceability, and retrievability of records throughout the retention period, despite technology changes.

    Bottom line

    ISO 9001 requires you to define and follow retention rules for quality records, but it does not provide universal timeframes. In long-lifecycle, regulated manufacturing, retention often extends into decades and must be aligned with legal, customer, and regulatory obligations, supported by a realistic strategy for coexistence of legacy and modern systems.

  • What data should feed an aerospace operational visibility platform?

    An aerospace operational visibility platform should be fed by the data needed to explain current execution status, constraints, quality risk, and near-term delivery risk. In most plants, that means a focused, governed set of feeds from execution, quality, material, maintenance, and engineering-change systems, not a bulk copy of everything.

    The practical starting point is this: if a data source does not support a specific operational decision, escalation, or traceability need, it probably should not be part of the first release.

    Core data domains

    • Production execution data
      Work order status, routing step completion, labor reporting, queue states, dispatch status, rework loops, traveler or digital traveler progress, and machine or cell status where it is reliable enough to support decision-making.

    • Material and inventory data
      Part availability, lot or serial assignments, shortages, kitting status, WIP location, issued versus consumed material, shelf-life controls where applicable, and outside processing status.

    • Quality and nonconformance data
      NCR status, defect categories, scrap and rework events, inspection results, hold points, CAPA linkage where relevant, MRB disposition status, and recurring failure patterns. Without this, visibility often becomes a throughput dashboard that hides quality-driven delay.

    • Traceability and genealogy data
      Serial numbers, lot genealogy, as-built relationships, operator and timestamp records, process parameters tied to product where required, and links to controlled records. In aerospace, visibility that cannot be reconciled back to traceable execution records has limited value.

    • Planning and schedule data
      Planned versus actual completions, constraint dates, due dates, backlog, finite-capacity assumptions if used, and schedule revisions. This is necessary to distinguish true execution problems from planning artifacts.

    • Engineering and change data
      Released revisions, effectivity, open change orders, dispositioned deviations or concessions where relevant to execution, and document version status. If the platform ignores revision and change context, it can misstate readiness and create confusion on the floor.

    • Maintenance and asset readiness data
      Equipment availability, downtime events, calibration status where operationally relevant, planned maintenance windows, and major asset constraints. This matters most when bottleneck equipment or special processes drive output risk.

    • Supplier and outside processing data
      PO to work order linkage, expected receipts, actual receipts, ASN status if available, outsourced processing milestones, supplier NCRs, and critical part delays. For many aerospace programs, supplier latency is a primary source of operational risk.

    • Operational event data
      Alarms, exceptions, manual escalations, blocked queues, missing approvals, and status changes that explain why work is not moving. Event context is often more useful than static KPI snapshots.

    What matters more than volume

    The platform needs data that is:

    • Authoritative for the decision being made. ERP may be authoritative for planned orders, MES for actual execution, QMS for NCR status, and PLM for released configuration.

    • Timely enough for the use case. Some decisions require near-real-time updates. Others only need shift-level or daily refreshes.

    • Contextualized across systems. A machine stop without work order, part, operator, and routing context is usually not enough.

    • Governed with stable definitions for status, completion, hold, shortage, scrap, rework, and similar terms. Plants often discover that disagreement over definitions is a bigger problem than missing data.

    • Traceable back to source records, especially where metrics may drive investigations, customer reporting, or regulated record review.

    Common source systems in a brownfield stack

    In practice, aerospace visibility platforms usually pull from a mix of ERP, MES, QMS, PLM, CMMS or EAM, historians, SCADA or shop-floor connectors, document control systems, and supplier portals. Some plants also need spreadsheets, Access databases, or email-driven trackers in the short term because key operational status still lives there.

    That is not ideal, but it is common. A useful platform often starts by normalizing a limited set of high-value signals across mixed vendors and legacy systems. Full replacement of ERP, MES, PLM, and QMS just to improve visibility is usually not realistic in regulated, long-lifecycle environments because qualification burden, validation cost, downtime risk, integration complexity, and change-control overhead are too high.

    Data to avoid feeding directly without controls

    • Unapproved engineering data or draft revisions

    • Duplicated status fields from multiple systems without source precedence rules

    • Raw machine signals with no filtering, asset model, or production context

    • Manually maintained spreadsheets treated as system-of-record data without ownership and review controls

    • Aggregated KPI feeds with no drill-back to underlying events

    These feeds can create false confidence, conflicting status, and audit-trail gaps.
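
    As an illustration of the source precedence point, here is a minimal Python sketch. The field names, system labels, and the resolve_field helper are hypothetical; the precedence order mirrors the ERP, MES, QMS, and PLM example given earlier and would need to follow your own system-of-record decisions.

```python
# Hypothetical precedence table: which system wins for which field.
# Field names and system labels are illustrative assumptions, not a standard.
SOURCE_PRECEDENCE = {
    "planned_order_status": ["ERP", "MES"],
    "execution_status": ["MES", "ERP"],
    "ncr_status": ["QMS", "MES"],
    "released_revision": ["PLM", "ERP"],
}


def resolve_field(field, candidates):
    """Return (value, source) from the highest-precedence system that reported a value.

    candidates maps system name -> reported value (None or absent means no value).
    A field with no precedence rule raises KeyError, so unmapped fields are
    never merged silently.
    """
    for system in SOURCE_PRECEDENCE[field]:
        value = candidates.get(system)
        if value is not None:
            return value, system
    return None, None


# MES and QMS disagree on NCR status; the QMS value is used.
value, source = resolve_field("ncr_status", {"MES": "OPEN", "QMS": "DISPOSITIONED"})
```

    Returning the winning source alongside the value keeps drill-back possible when two systems disagree.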

    Recommended implementation sequence

    1. Define the decisions the platform must support, such as shortage escalation, bottleneck recovery, WIP aging review, or NCR impact assessment.

    2. Map those decisions to required data entities and authoritative source systems.

    3. Standardize critical master and transactional definitions before broad rollout.

    4. Integrate a narrow initial scope, usually work orders, routing status, inventory constraints, NCR status, and revision context.

    5. Add machine, maintenance, supplier, and advanced analytics feeds only after the baseline data is trusted.
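
    Steps 1 and 2 can be captured as a small, reviewable mapping before any integration work starts. The sketch below is one possible shape, assuming hypothetical decision names, entity names, and refresh labels. It also flags any entity that has been assigned more than one authoritative source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRequirement:
    entity: str                # data entity the decision depends on
    authoritative_source: str  # system of record for that entity
    refresh: str               # how current the data must be


# Hypothetical mapping for two of the example decisions.
DECISION_MAP = {
    "shortage_escalation": [
        DataRequirement("work_order", "MES", "near-real-time"),
        DataRequirement("inventory_constraint", "ERP", "shift"),
    ],
    "ncr_impact_assessment": [
        DataRequirement("ncr_status", "QMS", "shift"),
        DataRequirement("work_order", "MES", "near-real-time"),
        DataRequirement("revision_context", "PLM", "daily"),
    ],
}


def conflicting_sources(decision_map):
    """Return entities that were assigned more than one authoritative source."""
    sources_by_entity = {}
    for requirements in decision_map.values():
        for req in requirements:
            sources_by_entity.setdefault(req.entity, set()).add(req.authoritative_source)
    return {
        entity: sorted(sources)
        for entity, sources in sources_by_entity.items()
        if len(sources) > 1
    }
```

    An empty result from conflicting_sources means every entity has a single owner across all mapped decisions.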

    The short answer

    Feed the platform with the minimum cross-functional data needed to answer four questions reliably: What is running, what is blocked, what quality or configuration risk exists, and what will miss plan next. For most aerospace operations, that means coordinated feeds from MES, ERP, QMS, PLM, maintenance, and selected supplier systems, with strict source ownership, traceability, and change control.

    If those basics are not in place, adding more data usually increases noise faster than insight.

  • When should an NCR be escalated into a formal CAPA in aerospace manufacturing?

    An NCR should be escalated into a formal CAPA when the nonconformance points to a systemic problem, not just a one-off defect that can be contained and dispositioned. In aerospace manufacturing, that usually means recurrence, broader process failure, material risk of product escape, significant customer or contractual impact, or weak evidence that the true cause is understood and controlled. If your team is using CAPA for every NCR, that is usually too much. If you almost never escalate, that is usually a sign that issues are being under-classified.

    What usually justifies CAPA escalation

    In practice, this connects to non-conformance management when teams need to turn the answer into repeatable execution habits.

    Common triggers include:

    • Repeat NCRs on the same part family, process step, tool, machine, supplier, or work instruction.
    • Evidence that the problem existed beyond the specific unit found, including potential impact to other lots, serial numbers, or shipped product.
    • Major escape risk, especially where inspection, verification, or workflow controls failed to detect the issue at the intended point.
    • Nonconformances involving critical characteristics, key characteristics, airworthiness-related features, or contractually controlled requirements.
    • Customer complaints, customer-issued corrective action requests, regulator attention, or internal audit findings tied to the event.
    • A trend showing deterioration in yield, rework, scrap, or supplier quality, even if each individual NCR looks small.
    • Repeated use of the same temporary fix, deviation, concession, or MRB disposition without removing the underlying cause.
    • Breakdowns in the quality system itself, such as document control errors, training gaps, calibration failures, invalid software logic, or traceability gaps.

    Those are common signals, not universal rules. The actual threshold should be defined in your quality procedures, risk criteria, customer requirements, and program-specific controls.
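
    The recurrence trigger is usually the easiest one to automate. A minimal sketch, assuming NCR records exported as dictionaries with hypothetical part_family, process_step, and opened fields. The 90-day window and the threshold of 3 are placeholders, not recommended values.

```python
from collections import Counter
from datetime import date, timedelta


def recurring_keys(ncrs, as_of, window_days=90, threshold=3):
    """Count NCRs per (part_family, process_step) inside a rolling window.

    Returns only the keys whose count meets or exceeds the threshold.
    """
    cutoff = as_of - timedelta(days=window_days)
    counts = Counter(
        (n["part_family"], n["process_step"])
        for n in ncrs
        if cutoff <= n["opened"] <= as_of
    )
    return {key: count for key, count in counts.items() if count >= threshold}


# Illustrative records only.
ncrs = [
    {"part_family": "bracket", "process_step": "drill", "opened": date(2024, 3, 1)},
    {"part_family": "bracket", "process_step": "drill", "opened": date(2024, 3, 20)},
    {"part_family": "bracket", "process_step": "drill", "opened": date(2024, 4, 2)},
    {"part_family": "housing", "process_step": "anodize", "opened": date(2024, 4, 5)},
    {"part_family": "bracket", "process_step": "drill", "opened": date(2023, 6, 1)},
]
flagged = recurring_keys(ncrs, as_of=date(2024, 4, 30))
```

    Here the three recent bracket/drill NCRs are flagged, while the 2023 record falls outside the window. The grouping key is a design choice: tool, machine, supplier, or work instruction could be used instead.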

    When an NCR may not need CAPA

    Not every NCR should become a CAPA. A single, well-bounded event may stay as an NCR if the issue is clearly contained, the cause is straightforward, the risk is low, no broader population is affected, and the correction does not require system-level change.

    Typical examples are isolated workmanship defects, obvious handling damage, or a single misbuild where the cause is directly observed and corrective action is local and verifiable. Even then, that judgment depends on your risk framework and whether similar events are actually rare in your environment.

    The practical decision test

    A useful rule is this: escalate when disposition answers what to do with the affected product, but does not adequately answer why it happened, whether it could happen elsewhere, and what controlled change will prevent recurrence.

    If your team cannot confidently close those questions inside the NCR workflow, you are usually in CAPA territory.
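
    The decision test can be written down as an explicit checklist so that the reasons for escalation are recorded rather than implied. This is a minimal sketch with hypothetical field names; the real criteria belong in your quality procedure.

```python
from dataclasses import dataclass


@dataclass
class NcrReview:
    # Each flag records whether the NCR workflow itself closed the question.
    cause_understood: bool       # why it happened
    population_bounded: bool     # whether it could exist elsewhere
    recurrence_controlled: bool  # what controlled change prevents recurrence


def open_questions(review):
    """Return the decision-test questions the NCR workflow has not closed."""
    checks = [
        (review.cause_understood, "why it happened"),
        (review.population_bounded, "whether it could happen elsewhere"),
        (review.recurrence_controlled, "what controlled change prevents recurrence"),
    ]
    return [question for answered, question in checks if not answered]


def needs_capa(review):
    """Escalate when at least one question remains open."""
    return bool(open_questions(review))
```

    Storing the output of open_questions with the NCR gives reviewers the stated reason for escalation, not only the outcome.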

    What often goes wrong

    The most common failure mode is confusing MRB disposition with corrective action. Scrap, rework, repair, use-as-is, deviation, or concession decisions address the immediate product. They do not by themselves remove the cause. In aerospace settings, that distinction matters because traceability and auditability depend on showing how product disposition, root cause, actions, approvals, and effectiveness checks connect.

    Another common problem is escalation by severity alone. A severe event often does justify CAPA, but low-severity issues can also require CAPA if they are recurring or systemic. A pile of small NCRs can represent a larger control failure.

    The opposite problem is administrative overload. Opening formal CAPAs for routine isolated defects can bury quality teams, delay meaningful investigations, and create poor-quality closures. That weakens the system just as much as under-escalation.

    How this works in brownfield environments

    In many aerospace plants, the NCR starts in one system, MRB decisions happen in another, and CAPA is tracked in a QMS module, ERP quality function, MES workflow, or even a controlled manual process. That split is common. It also creates failure points.

    Escalation tends to break down when:

    • NCR, MRB, and CAPA records are not linked by a common identifier.
    • Part, lot, serial, supplier, routing, and work-center data are inconsistent across systems.
    • Trend thresholds rely on manual spreadsheet review.
    • Closure is allowed before effectiveness checks are complete.
    • Engineering, operations, supplier quality, and quality assurance do not share the same event history.

    Full platform replacement is usually unrealistic in regulated aerospace environments if validated workflows, customer reporting formats, legacy integrations, or qualified production systems are already in place. In practice, most sites improve CAPA escalation by tightening data links, approval paths, and decision criteria across existing MES, ERP, PLM, and QMS tools rather than replacing everything.
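
    Tightening data links often starts with a simple orphan check across exports from the existing systems. A minimal sketch, assuming each export can be reduced to dictionaries with hypothetical ncr_id, ncr_ids, mrb_id, and capa_id keys:

```python
def find_orphans(ncr_ids, mrb_records, capa_records):
    """Return ids of MRB and CAPA records with broken links to the NCR export.

    An MRB record is flagged when its ncr_id is missing or unknown.
    A CAPA record is flagged when any NCR id it cites is unknown.
    """
    known = set(ncr_ids)
    orphan_mrb = [r["mrb_id"] for r in mrb_records if r.get("ncr_id") not in known]
    orphan_capa = [
        r["capa_id"]
        for r in capa_records
        if any(ncr_id not in known for ncr_id in r.get("ncr_ids", []))
    ]
    return {"mrb": orphan_mrb, "capa": orphan_capa}


# Illustrative data: MRB-11 carries a differently formatted NCR number.
orphans = find_orphans(
    ncr_ids=["NCR-001", "NCR-002"],
    mrb_records=[
        {"mrb_id": "MRB-10", "ncr_id": "NCR-001"},
        {"mrb_id": "MRB-11", "ncr_id": "NCR-0002"},
    ],
    capa_records=[{"capa_id": "CAPA-5", "ncr_ids": ["NCR-002", "NCR-009"]}],
)
```

    Identifier formatting differences between systems are a typical cause of orphans, so a check like this tends to surface mapping problems before it surfaces missing records.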

    What your procedure should define clearly

    If the escalation boundary is vague, people will make inconsistent decisions. At minimum, your procedure should define:

    • Risk-based escalation criteria.
    • Who can require CAPA initiation.
    • Time limits for containment, investigation, and action plan approval.
    • How NCR, MRB, supplier corrective action, audit findings, and customer issues feed CAPA.
    • When effectiveness checks are required and how long they run.
    • What evidence is needed before closure.

    That sounds basic, but many organizations still rely on tribal judgment. In regulated operations, that usually does not scale well across programs, shifts, or sites.
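
    One way to make the closure evidence rule explicit is a closure gate that lists what is still missing. The evidence items below are hypothetical examples, not a prescribed set.

```python
# Hypothetical evidence items; replace with what your procedure requires.
REQUIRED_FOR_CLOSURE = (
    "root_cause_approved",
    "actions_implemented",
    "effectiveness_check_complete",
    "closure_approved",
)


def missing_closure_evidence(capa):
    """Return required evidence items that are absent or not True on the CAPA record."""
    return [item for item in REQUIRED_FOR_CLOSURE if capa.get(item) is not True]


def can_close(capa):
    """Allow closure only when every required evidence item is present."""
    return not missing_closure_evidence(capa)
```

    Returning the list of missing items, rather than a bare yes or no, gives the owner a concrete reason when closure is blocked.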

    Bottom line

    Escalate an NCR to CAPA when the problem is recurring, systemic, or high-risk, or when evidence shows that existing controls did not reliably prevent or detect it. Do not use CAPA as a default disposition step for every defect, and do not treat product disposition as proof that the underlying issue is resolved. The exact trigger belongs in your QMS, but it should be risk-based, documented, and consistently traceable across the systems you already run.

  • What non-conformance records might FAA or EASA request during an audit?

    FAA and EASA do not work from a fixed, universal checklist of non-conformance records. Instead, they sample evidence that shows your approved processes are being followed and that safety and airworthiness risks are being controlled. In practice, that typically includes the following categories of non-conformance (NC) records and related evidence.

    1. Non-conformance reports and defect records

    Auditors commonly request examples of:

    • Internal non-conformance reports (NCRs) / nonconformance documents raised on parts, assemblies, software, tooling, or processes.
    • External / supplier non-conformance reports and incoming inspection rejections.
    • Concessions, deviations, or waivers raised against design or process requirements.
    • Rework and repair records tied to specific NCs, including re-inspection evidence.
    • Scrap records where material was dispositioned as scrap due to non-conformity.

    They will usually trace from a part, lot, or order back into at least one NC case to confirm that detection and documentation are functioning as defined in your procedures.

    2. Disposition, MRB, and engineering decision records

    Authorities typically focus on how non-conformances were evaluated and dispositioned, not just that they were logged. Expect requests such as:

    • Material Review Board (MRB) records, including documented dispositions (use-as-is, repair, rework, scrap, return to supplier) and justification.
    • Engineering dispositions and approvals for deviations from type design or approved data.
    • Evidence that required signatories (e.g., DER, DOA, delegated engineering, quality) were involved where your procedures require it.
    • Records showing that limits of authority were respected (what shop, MRB, and quality are allowed to decide vs. what must go to design or the approval holder).

    Inadequate justification, missing approvals, or unclear authority boundaries are common audit findings.

    3. Corrective and preventive action (CAPA) records

    For systemic or repeated non-conformances, FAA or EASA will usually expect to see how you addressed root cause. They may request:

    • Corrective action requests linked to significant NCs, escapes, or customer complaints.
    • Root cause analysis records (e.g., 5-Why, fishbone diagrams, FMEA updates) demonstrating structured investigation.
    • Implementation evidence for corrective actions (procedure changes, tooling updates, software changes, training, etc.).
    • Verification of effectiveness (data trends, reduced recurrence, audit or inspection results).
    • Preventive actions where risks were addressed before recurrence or escape.

    Authorities often test whether you escalate appropriately: which non-conformances stay local and which trigger formal CAPA under your quality system.

    4. Traceability and genealogy related to non-conformances

    Beyond isolated NC records, auditors will usually test how you contain and trace issues. They may ask for:

    • Traceability from an NC to affected lots, serial numbers, batches, and delivered products.
    • Evidence of containment actions: holds, quarantines, stock sweeps, and recall decisions.
    • Configuration and revision status of the affected products and processes at the time of non-conformance.
    • Linkage between NC records and associated work orders, travelers, inspection plans, and as-built/as-maintained records.

    Weak linkage between NCs and product genealogy is a significant risk area, especially in brownfield environments where ERP, MES, and QMS are not fully integrated.
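
    Tracing from an NC to affected product is, at its core, a walk through genealogy links. A minimal sketch, assuming genealogy has already been extracted into a hypothetical parent-to-children mapping (lot to subassembly to serial number); real genealogy usually also needs quantity, date, and revision context.

```python
from collections import deque


def affected_items(genealogy, start):
    """Breadth-first walk from a nonconforming item to everything built from it.

    genealogy maps an item id to the ids of items that consumed it.
    The start item itself is not included in the result.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in genealogy.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen - {start}


# Illustrative genealogy: one lot feeds two subassemblies and two serials.
genealogy = {
    "LOT-A": ["SUB-1", "SUB-2"],
    "SUB-1": ["SN-100"],
    "SUB-2": ["SN-100", "SN-101"],
}
impacted = affected_items(genealogy, "LOT-A")
```

    The result is the candidate population for holds, quarantines, or stock sweeps. Whether each item is actually affected still requires engineering and quality review.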

    5. Supplier-related non-conformance records

    Regulators pay close attention to how you control and react to supplier issues. They often request:

    • Supplier non-conformance reports, including delivery rejections and quality notifications.
    • Records of supplier corrective actions, including verification of effectiveness.
    • Evidence of flow-down of airworthiness or criticality requirements to suppliers.
    • Supplier performance metrics or trend reports where NCs are aggregated and analyzed.

    In complex supply chains, they may trace a single NC from your shop floor back through multiple tiers to understand systemic risk.

    6. Concessions, deviations, and repairs to approved data

    Where non-conformances affect airworthiness or type design, auditors often sample:

    • Deviation permits, concessions, or waivers, including justification and scope limitations.
    • Repair approvals, references to approved repair data, and evidence that approved instructions were followed.
    • Evidence of feedback to the design organization (e.g., DOA, TC/PC holder) for recurring deviations.
    • Records showing that any deviation from approved data was appropriately controlled and not applied outside its scope.

    Here, traceability to approved design or repair data and clear boundaries of authorization are critical.

    7. Rework, re-inspection, and re-release records

    Authorities may want to see that once a part or assembly is found nonconforming, it does not re-enter the system without proper control. Typical evidence includes:

    • Rework instructions and routing changes linked to the original NC.
    • Post-rework inspection and test results.
    • Updated as-built, as-repaired, or maintenance records reflecting the work performed.
    • Final acceptance and release records indicating the basis for restoring conformity.

    Audit findings often arise where rework is done informally or not fully captured in the traceable record.

    8. Trending, analysis, and management review inputs

    At the quality system level, FAA and EASA may request evidence that you analyze NC data and act on trends. Examples include:

    • Non-conformance trend reports by part family, line, process, or supplier.
    • Risk assessments where recurring NCs affect safety, reliability, or continued airworthiness.
    • Inputs to management review that summarize NC performance and CAPA status.
    • Decisions and actions recorded from management review meetings.

    Authorities are looking for a closed loop: detection, correction, analysis, and prevention.

    9. How system coexistence affects what you can show

    In brownfield environments, non-conformance information is usually scattered across QMS, MES, ERP, and sometimes spreadsheets. This affects what you can readily provide during an audit:

    • If systems are not integrated, be prepared to manually demonstrate linkage (e.g., NCR number to work order to serial number).
    • Interfaces and data transfers should themselves be under change control and validation where your procedures require it.
    • Replacing legacy systems solely to “look better” in audits is risky; regulators care more about control, traceability, and evidence than about specific tools.

    Weak integration does not automatically mean non-compliance, but it raises the burden on local procedures, training, and evidence retrieval during audits.

    10. Constraints and variations you should account for

    The exact non-conformance records requested will depend on:

    • Your approval basis (e.g., Part 21, FAA Part 145, EASA Part 145, POA/DOA arrangements, production vs. maintenance).
    • The scope of the audit (system-level vs. product-specific, initial approval vs. continued oversight).
    • Your own documented procedures and how you define and categorize NCs, CAPA, and MRB.
    • Past findings, occurrences, or incidents that may trigger targeted sampling.

    There is no guarantee that a specific set of records will satisfy an auditor; what matters is consistency with your approved system, clear traceability, and evidence that non-conformances are controlled, analyzed, and fed back into continuous improvement.

  • How do I handle resistance when new KPIs don’t match legacy numbers?

    Start by assuming the resistance is rational. If a new KPI does not match a legacy number, the problem is usually not attitude alone. It is often a mismatch in definition, timing, source data, filtering rules, event capture, or master data. In regulated and brownfield environments, those differences are common.

    The practical answer is to treat this as a metric reconciliation exercise before treating it as a change management problem. Do not ask teams to trust the new number until you can explain why it differs.

    In practice, this connects to data mapping and system interoperability when teams need to turn the answer into repeatable execution habits.

    What to do first

    • Freeze the definitions. Document exactly how the legacy KPI is calculated and how the new KPI is calculated. Include numerator, denominator, exclusions, time boundary, unit of measure, system of record, and refresh timing.

    • Run both KPIs in parallel. Keep the legacy and new metric visible for a defined period. This reduces political friction and gives operations, quality, and IT a chance to see the variance pattern instead of arguing from anecdotes.

    • Reconcile to source events. Compare a sample of shifts, lots, work orders, machines, or jobs back to the underlying transactions. Differences usually come from status mapping, late postings, duplicate records, manual overrides, scrap treatment, rework handling, or missing downtime codes.

    • Classify the gap. Determine whether the new KPI is measuring the same thing differently, measuring a better version of the same thing, or measuring something else entirely. Those are not the same situation.

    • Set a controlled cutover rule. Do not switch incentive plans, escalation thresholds, or executive reporting to the new KPI until the variance is understood and approved.
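
    The parallel run and reconciliation steps can be sketched as a period-by-period comparison. This is a minimal example, assuming both KPIs are available as dictionaries keyed by a hypothetical period label; the tolerance of 0.01 is a placeholder to be set by the KPI owner.

```python
def reconcile(legacy, new, tolerance=0.01):
    """Compare two KPI series keyed by period and label each period.

    Status is "missing" when either side has no value, "match" when the
    rounded difference is within tolerance, and "investigate" otherwise.
    """
    rows = []
    for period in sorted(set(legacy) | set(new)):
        old, cur = legacy.get(period), new.get(period)
        if old is None or cur is None:
            delta, status = None, "missing"
        else:
            delta = round(cur - old, 4)
            status = "match" if abs(delta) <= tolerance else "investigate"
        rows.append(
            {"period": period, "legacy": old, "new": cur, "delta": delta, "status": status}
        )
    return rows


# Illustrative values only.
rows = reconcile(
    legacy={"S1": 0.80, "S2": 0.75},
    new={"S1": 0.80, "S2": 0.70, "S3": 0.90},
)
```

    Periods labeled "investigate" or "missing" are the ones to trace back to source events. The comparison only shows where the variance sits, not why it exists.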

    How to respond to resistance

    Do not frame the conversation as legacy versus modern. Frame it as traceability and fitness for use.

    • If the legacy KPI is operationally useful but loosely defined, say that plainly. It may still be valid for local management, but not reliable enough for cross-plant comparison or automated escalation.

    • If the new KPI is technically cleaner but depends on weak integrations, say that too. A better formula does not help if event capture is incomplete or delayed.

    • If the numbers differ because the new system exposes hidden loss, expect pushback. People may read the change as performance deterioration when it is actually measurement tightening.

    • If the new KPI rolls up across systems, explain the integration assumptions. In brownfield plants, ERP, MES, historians, QMS, and spreadsheets often disagree on timing and status. That is a systems reality, not user irrationality.

    Resistance usually drops when people can see three things: where the number comes from, why it changed, and what decisions it should and should not drive.

    What not to do

    • Do not declare the old number wrong without evidence.

    • Do not retire a legacy KPI before the new one is stable.

    • Do not mix old and new definitions in the same trend line without marking the change point.

    • Do not tie compensation, supplier scorecards, or audit-facing narratives to a new KPI before reconciliation and approval.

    • Do not assume a vendor default definition matches your plant reality.

    Governance matters more than persuasion

    The durable fix is governance, not messaging. Put KPI ownership, definition changes, mapping rules, and calculation logic under formal change control. Keep version history. Record who approved the metric, what changed, when it changed, and which reports are affected. That matters in regulated operations because performance measures often feed investigations, CAPA prioritization, release decisions, staffing choices, and management review.

    If you need one rule of thumb, use this: no KPI should become official until operations, engineering, quality, and IT can all trace it from dashboard to source transaction and explain known limitations.
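
    Version history for a KPI can be kept as simple approved records with effective dates. The sketch below is one possible structure; the identifiers, formulas, approver labels, and dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class KpiVersion:
    kpi_id: str
    version: int
    effective_from: date
    formula: str
    approved_by: str
    change_note: str


def version_in_effect(history, on):
    """Return the latest version whose effective date is on or before the given date."""
    candidates = [v for v in history if v.effective_from <= on]
    if not candidates:
        raise LookupError(f"no approved definition in effect on {on}")
    return max(candidates, key=lambda v: v.effective_from)


# Hypothetical history for one KPI.
history = [
    KpiVersion("AVAILABILITY", 1, date(2024, 1, 1),
               "operating_time / planned_time", "quality-lead", "initial definition"),
    KpiVersion("AVAILABILITY", 2, date(2024, 7, 1),
               "operating_time / (planned_time - planned_maintenance)",
               "quality-lead", "planned maintenance excluded from the time base"),
]
current = version_in_effect(history, date(2024, 8, 15))
```

    The effective_from date of each version is also the change point to mark on any trend line that spans both definitions.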

    Tradeoffs to accept

    There is no risk-free path.

    • Long parallel runs improve confidence but slow standardization.

    • Fast cutovers reduce reporting clutter but increase credibility risk.

    • Tighter definitions improve comparability but may break historical continuity.

    • Local exceptions preserve plant reality but weaken enterprise rollups.

    In many regulated, long-lifecycle environments, full replacement of legacy reporting logic is not realistic in one step. Qualification burden, validation effort, downtime constraints, integration complexity, and existing evidence trails usually make phased coexistence the safer approach.