  • What criteria should drive a scrap decision for aerospace structural parts?

    A scrap decision for aerospace structural parts should be driven first by approved technical requirements and objective evidence, not by part value, schedule pressure, or whether the defect “looks minor.” In practice, the core question is simple: can the part still be shown, with traceable records, to conform to drawing, material, process, and customer or program requirements after any permitted rework or repair? If the answer is no, or cannot be demonstrated credibly, scrap is usually the right decision.

    What should drive the decision

    The decision normally starts with product definition and disposition authority, not shop-floor opinion. For structural parts, the most important criteria are these:

    • Type and severity of nonconformance: dimensional out-of-tolerance, wrong material, heat-treat issue, surface damage, blend-out condition, hole quality problem, coating issue, process excursion, or traceability gap do not carry the same risk.
    • Location and structural criticality: the same defect may be acceptable in a non-critical area and unacceptable in a high-stress, fatigue-critical, bearing, sealing, or mating feature.
    • Engineering allowables and drawing requirements: if approved limits, repair schemes, or rework paths exist, they govern. If they do not, the part is not automatically recoverable.
    • Material and process pedigree: unknown or broken traceability, suspect lot control, or missing process records can force scrap even when the geometry appears recoverable.
    • Effect of rework or repair: additional machining, blending, sleeving, bushing, cold work, weld repair, or other recovery steps may consume life margin, alter fit, or trigger requalification requirements.
    • Inspection evidence quality: the decision depends on reliable measurement, calibrated equipment, valid methods, and a clear understanding of actual versus suspected defect extent.
    • Contractual and customer constraints: some programs restrict repair methods, concession use, or repeated rework more tightly than internal practice would suggest.
    • Disposition authority: MRB, engineering, quality, and sometimes customer approval are required depending on the condition and the contract. Operators and supervisors should not be making final structural accept/scrap calls informally.

    What should not drive it

    Cost matters, but it is not the primary criterion. A high-cost part is not more repairable just because it is expensive to replace. Likewise, delivery pressure is not a technical basis for use-as-is, repair, or rework. In regulated aerospace environments, forcing borderline parts through because capacity is tight usually creates a larger problem later in audit trail review, customer escape analysis, or service risk assessment.

    When scrap is often the right answer

    Scrap is commonly the right outcome when one of these is true:

    • The nonconformance violates a requirement with no approved rework or repair path.
    • The defect affects a critical feature and engineering cannot justify residual strength, fatigue performance, fit, or function.
    • Required traceability is missing or compromised for material, processing, or serialized identity.
    • The proposed recovery would push the part outside other limits, including minimum wall, edge distance, coating thickness, residual stress expectations, or dimensional stack-up.
    • The process excursion is broad enough that the true impact cannot be bounded credibly.
    • The record set needed to support acceptance, concession, or repair cannot be completed with confidence.

    Where MRB and engineering matter

    For aerospace structural hardware, scrap decisions are usually part of a formal nonconformance process. MRB may coordinate the disposition, but engineering rationale is often decisive where structural performance is affected. The exact split of authority depends on company procedures, delegated authority, customer flowdowns, and part criticality.

    This is where many plants get into trouble. They treat scrap as a production loss decision when it is really a controlled disposition decision. If the nonconformance workflow in MES, QMS, and ERP is weak, teams may argue from incomplete data, outdated drawings, or disconnected inspection records. In brownfield environments, that failure mode is common.

    Data and system dependencies

    A good scrap decision depends on having the right records linked together:

    • Current drawing and revision from PLM or document control
    • Traveler or routing history from MES or paper traveler records
    • Material certs, lot genealogy, and serialized traceability
    • Special process records
    • Inspection results, including CMM or manual measurement evidence
    • Prior rework, prior NCRs, or repeated escapes on the same feature
    • Customer-specific disposition restrictions where applicable

    If those records are fragmented across MES, ERP, QMS, and shared drives, the practical risk is not just delay. It is an incorrect disposition based on partial evidence. Full system replacement is usually not the answer here. In regulated aerospace environments, replacing core execution and quality systems wholesale often fails because of validation cost, qualification burden, downtime risk, and integration complexity. More often, plants improve the decision process by tightening record linkage, authority rules, and evidence capture across existing systems.

    A practical decision frame

    A defensible scrap decision for a structural part usually asks these questions in order:

    1. What exact requirement was violated?
    2. Is the feature structurally or functionally critical in this location?
    3. Is there an approved rework or repair path for this condition?
    4. Will that recovery path preserve all other requirements and margins?
    5. Do we have complete, trustworthy evidence and traceability to support the disposition?
    6. Do the required authorities agree and document the rationale?

    If any of those answers is unclear, the part should stay in formal nonconformance control until clarified. In many cases, that uncertainty ends in scrap, and that is sometimes the least risky outcome.
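
    The six questions can be sketched as a small routing helper. This is a minimal illustration, not a disposition tool: the class name, field names, and the three routing labels are hypothetical, and the output is only a suggestion for where the record should go next. The actual accept/scrap call stays with the required authorities.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DispositionReview:
    """Answers to the six questions. None means 'not yet established'."""
    requirement_identified: Optional[bool] = None      # Q1
    feature_is_critical: Optional[bool] = None         # Q2 (recorded, not branched on here)
    approved_recovery_path: Optional[bool] = None      # Q3
    recovery_preserves_margins: Optional[bool] = None  # Q4
    evidence_complete: Optional[bool] = None           # Q5
    authorities_documented: Optional[bool] = None      # Q6


def suggest_routing(review: DispositionReview) -> str:
    """Return a hypothetical routing label; never a final disposition."""
    answers = [getattr(review, f.name) for f in fields(review)]
    # Any unanswered question keeps the part in formal nonconformance control.
    if any(a is None for a in answers):
        return "HOLD"
    # Without a clearly identified requirement there is nothing to disposition against.
    if not review.requirement_identified:
        return "HOLD"
    # No approved path, lost margins, or weak evidence all point toward scrap.
    if not (review.approved_recovery_path
            and review.recovery_preserves_margins
            and review.evidence_complete):
        return "SCRAP_CANDIDATE"
    # Recovery looks possible, but only once the authorities have documented it.
    if not review.authorities_documented:
        return "HOLD"
    return "RECOVERY_CANDIDATE"
```

    A review with every field left at None returns "HOLD", which mirrors the rule that unclear answers keep the part in nonconformance control rather than defaulting to acceptance.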

    Bottom line

    The right criteria are technical conformity, structural intent, traceability, and approved disposition authority. Scrap should be decided by whether the part can still be demonstrated to meet requirements after an allowed recovery path, not by replacement cost or urgency. Where evidence is weak, traceability is broken, or structural margin cannot be justified, scrap is usually the defensible decision.

  • Process Parameter

    A process parameter is a measurable or controllable condition that defines how a manufacturing or industrial process is run. It commonly refers to variables such as temperature, pressure, speed, feed rate, torque, time, humidity, flow rate, or setpoint values.

    Process parameters are used in work instructions, recipes, routings, control plans, MES records, SCADA systems, quality records, and equipment settings. They help describe the conditions under which an operation was performed and may be recorded for traceability, troubleshooting, process monitoring, or product quality review.

    A process parameter should not be confused with a product characteristic. A process parameter describes the process input or operating condition, while a product characteristic describes the resulting part, material, or assembly feature. For example, oven temperature is a process parameter; coating thickness after curing is a product characteristic.

    Some parameters may be identified as critical process parameters when variation in the parameter can materially affect quality, yield, safety, or process capability. The broader term process parameter does not imply that the parameter is critical unless that status is defined by the applicable process, control plan, or quality system.
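
    The distinction can be made concrete with two small record types. This is a sketch under stated assumptions: the class names, units, limits, and the critical flag on the oven temperature are illustrative values, not taken from any real control plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessParameter:
    """A condition under which the operation is run."""
    name: str
    unit: str
    setpoint: float
    low: float
    high: float
    critical: bool = False  # set only when the control plan or quality system says so

    def in_range(self, value: float) -> bool:
        return self.low <= value <= self.high


@dataclass(frozen=True)
class ProductCharacteristic:
    """A feature of the resulting part, material, or assembly."""
    name: str
    unit: str
    nominal: float
    tolerance: float

    def conforms(self, measured: float) -> bool:
        return abs(measured - self.nominal) <= self.tolerance


# Illustrative values only.
oven_temp = ProcessParameter("oven temperature", "degC", 180.0, 175.0, 185.0, critical=True)
coating = ProductCharacteristic("coating thickness after curing", "um", 50.0, 5.0)
```

    Keeping the two types separate makes it harder to record a process condition as if it were an inspection result, or the reverse.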

  • AOG risk map

    Core meaning

    An **AOG risk map** is a structured representation of the process, part, and supplier risks that can lead to **aircraft-on-ground (AOG)** events—situations where an aircraft is unable to operate because required parts, repairs, or documentation are not available.

    It typically combines:

    – Critical aircraft parts, systems, or configurations that can cause AOG if unavailable or non-conforming.
    – Manufacturing and maintenance process steps that affect those parts.
    – Suppliers and logistics paths that provide those parts or services.
    – Risk indicators such as likelihood of disruption, detection capability, and potential operational impact.

    The result is a map—often visual but sometimes tabular—that links operational risks in factories, supply chains, and MRO (maintenance, repair, and overhaul) operations to their potential to create AOG situations.

    Use in industrial and aerospace workflows

    In aerospace manufacturing and MRO environments, an AOG risk map is commonly used to:

    – Identify which parts or assemblies are AOG-critical and where they are produced or controlled.
    – Trace how issues in upstream processes, quality controls, or suppliers could cascade into AOG events.
    – Prioritize monitoring, contingency planning, and escalation paths for high-risk items.
    – Align OT/IT, MES, ERP, and supply-chain systems around consistent AOG-critical object lists and risk attributes.

    Operationally, manufacturing, supply chain, quality, and engineering teams may reference the AOG risk map when:

    – Assessing the impact of process changes or capacity shifts on AOG-critical parts.
    – Evaluating new or alternative suppliers for components with AOG exposure.
    – Routing nonconformances, deviations, or concession requests involving AOG-critical items.
    – Coordinating responses to disruptions (e.g., late deliveries, quality escapes) that could ground aircraft.

    Structure and data sources

    An AOG risk map often aggregates data from multiple systems, for example:

    – **ERP/MRP**: part master data, criticality flags, demand profiles.
    – **MES/production systems**: routings, work centers, process history, WIP positions.
    – **Quality systems (QMS, LIMS, CAPA tools)**: nonconformance history, escape risks, defect trends.
    – **Supplier and logistics systems**: lead times, performance, single- or sole-source exposure.

    The “map” may be implemented as:

    – A visual node-and-link diagram connecting parts, processes, suppliers, and AOG risk levels.
    – A matrix or table with part numbers, plants, suppliers, and risk ratings.
    – A model embedded in analytics or operations-intelligence platforms that supports filtering and alerts for AOG risk.
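
    The matrix or table form can be sketched in a few lines. Everything specific here is an assumption made for illustration: the part numbers, plants, and suppliers are placeholders, the 1 to 5 rating scales are arbitrary, and multiplying the three ratings is just one simple way to rank entries, not a prescribed AOG scoring method.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RiskEntry:
    part_number: str
    plant: str
    supplier: str
    sole_source: bool
    likelihood: int  # 1 (rare disruption) .. 5 (frequent), assumed scale
    detection: int   # 1 (caught early) .. 5 (hard to detect), assumed scale
    impact: int      # 1 (minor) .. 5 (aircraft grounded for long), assumed scale

    def score(self) -> int:
        # Hypothetical ranking score: simple product of the three ratings.
        return self.likelihood * self.detection * self.impact


def high_risk(entries: List[RiskEntry], threshold: int) -> List[RiskEntry]:
    """Flag entries at or above the threshold, plus every sole-source part."""
    flagged = [e for e in entries if e.score() >= threshold or e.sole_source]
    return sorted(flagged, key=lambda e: e.score(), reverse=True)


# Placeholder rows standing in for data pulled from ERP, MES, QMS, and supplier systems.
risk_map = [
    RiskEntry("PN-001", "Plant A", "Supplier X", True, 2, 2, 5),
    RiskEntry("PN-002", "Plant A", "Supplier Y", False, 4, 3, 4),
    RiskEntry("PN-003", "Plant B", "Supplier Z", False, 1, 2, 2),
]

watch_list = high_risk(risk_map, threshold=40)
```

    With these placeholder rows, the watch list contains PN-002 (above the threshold) followed by PN-001 (below the threshold but sole-sourced), while PN-003 is left out.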

    Boundaries and exclusions

    An AOG risk map:

    – **Includes**: risks specifically tied to aircraft being unable to depart or continue service due to missing, delayed, or non-conforming parts, documentation, or repairs.
    – **Can include**: manufacturing, maintenance, supply, and logistics risks where their consequence is framed in terms of AOG probability or duration.
    – **Excludes**: general enterprise risk maps that do not explicitly tie risks to AOG impact (e.g., purely financial or reputational risks without an AOG linkage).
    – **Is not the same as**: a full safety hazard analysis (which focuses on hazards to people and equipment) or a generic FMEA, although those analyses may feed into an AOG risk map.

    Common confusion and related terms

    – **AOG vs. general production risk mapping**: AOG risk mapping is specifically oriented to aircraft-grounding consequences, not just late orders or production delays. A part can be high risk for schedule yet low AOG risk if it does not impact aircraft dispatch.
    – **AOG risk map vs. critical part list**: A critical part list is typically a flat list of high-importance items. An AOG risk map adds structure, showing how those items link to processes, plants, suppliers, and potential failure paths.
    – **AOG risk map vs. bow-tie or fault tree analysis**: Bow-tie or fault tree diagrams analyze causal chains for specific events. An AOG risk map is broader, aggregating many potential causes and pathways into one coherent view focused on AOG exposure.

    Application in the site context

    Within aerospace factories and regulated manufacturing environments, an AOG risk map is often maintained as part of broader risk and operations-intelligence practices. It is used to align MES, ERP, QMS, and supply-chain data around a shared understanding of AOG-critical items, making it easier to:

    – Monitor production and quality signals that may affect AOG-critical parts.
    – Coordinate cross-functional response when disruptions occur.
    – Review and update risk assessments when there are changes in demand, suppliers, or process design.

    The map is generally kept as a living artifact, subject to both periodic review and event-driven updates when material changes occur in products, processes, or supply networks that influence AOG risk.

  • Can I build AI models directly on my MES database without a data warehouse?

    Yes, you can, but it is usually not a sound choice as your primary long-term architecture.

    Building AI models directly on an MES database can work for narrow cases such as prototyping, a read-only pilot, or a well-bounded model that uses a small, stable dataset. It becomes much harder when you need reliable production performance, historical context, governed data definitions, cross-system signals, or repeatable validation.

    MES databases are generally designed for transaction processing and shop floor execution. AI and analytics workloads tend to need different things: large historical windows, feature engineering, versioned datasets, stable semantics, and joins across MES, ERP, QMS, historian, CMMS, LIMS, or PLM data. Those needs often expose the limits of querying the MES directly.

    What usually goes wrong

    • Performance risk: analytical queries and model training can compete with execution workloads. In a live plant, that is rarely acceptable without strict isolation.

    • Incomplete context: MES data alone may not explain quality, downtime, yield, maintenance, supplier, or planning outcomes. Important signals often live in other systems.

    • Poor historical structure: many MES implementations keep only the history needed for execution and records, not the curated time-series or feature-ready history AI work expects.

    • Schema volatility: vendor upgrades, custom fields, local workarounds, and plant-specific extensions can break pipelines or silently change model inputs.

    • Data quality issues: missing timestamps, reused codes, manual overrides, late entries, and inconsistent reason codes can make a model look better in development than it performs in production.

    • Traceability and validation burden: if model outputs influence regulated processes, you need controlled data lineage, versioning, change control, and evidence of what data trained and fed the model.

    • Security and access concerns: direct database access to a production MES is often restricted for good reason, especially where OT segmentation, technical data handling, or least-privilege controls apply.

    When direct MES access can be reasonable

    It can be reasonable if all of the following are true:

    • the workload is read-only and isolated from production performance risk

    • the use case is limited in scope, such as a single line, single plant, or one prediction target

    • the MES schema is stable and well understood

    • you do not need extensive joins across multiple enterprise systems

    • data quality has already been assessed, not assumed

    • there is a clear validation and change-control process for model updates and data mapping changes

    Even then, many teams use a replicated copy, reporting replica, historian feed, or staged extract rather than hitting the live MES database directly.
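
    A staged extract can be sketched with two SQLite databases standing in for a reporting replica and a staging area. The table names, columns, and the id-based watermark are assumptions made for illustration; a real MES schema and replication mechanism will differ.

```python
import sqlite3
from datetime import datetime, timezone

# Stand-in for a reporting replica of the MES (not the live instance).
replica = sqlite3.connect(":memory:")
replica.execute(
    "CREATE TABLE operation_log ("
    "id INTEGER PRIMARY KEY, work_order TEXT, step TEXT, completed_at TEXT)"
)
replica.executemany(
    "INSERT INTO operation_log VALUES (?, ?, ?, ?)",
    [
        (1, "WO-100", "drill", "2024-01-05T08:00:00"),
        (2, "WO-100", "deburr", "2024-01-05T09:30:00"),
    ],
)

# Separate staging store that analytical work reads from.
staging = sqlite3.connect(":memory:")
staging.execute(
    "CREATE TABLE operation_log_staged ("
    "id INTEGER PRIMARY KEY, work_order TEXT, step TEXT, "
    "completed_at TEXT, extracted_at TEXT)"
)


def extract_increment(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Copy rows newer than the highest id already staged; return rows copied."""
    (watermark,) = target.execute(
        "SELECT COALESCE(MAX(id), 0) FROM operation_log_staged"
    ).fetchone()
    rows = source.execute(
        "SELECT id, work_order, step, completed_at FROM operation_log "
        "WHERE id > ? ORDER BY id",
        (watermark,),
    ).fetchall()
    # Stamp every row with the extract time so lineage can be reconstructed later.
    stamp = datetime.now(timezone.utc).isoformat()
    target.executemany(
        "INSERT INTO operation_log_staged VALUES (?, ?, ?, ?, ?)",
        [row + (stamp,) for row in rows],
    )
    target.commit()
    return len(rows)


first_batch = extract_increment(replica, staging)
replica.execute(
    "INSERT INTO operation_log VALUES (3, 'WO-101', 'drill', '2024-01-05T10:15:00')"
)
second_batch = extract_increment(replica, staging)
```

    An id-based watermark only picks up new rows. It will miss late edits or manual overrides to rows already copied, so a source with that behavior needs a change-tracking column or CDC feed instead.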

    Why a warehouse or lakehouse often becomes necessary

    A warehouse is not mandatory on day one, but some governed analytical layer usually becomes necessary once the use case matters operationally.

    That layer helps with:

    • joining MES data with ERP, QMS, PLM, maintenance, lab, and supplier data

    • preserving historical snapshots when source systems overwrite or reclassify records

    • standardizing identifiers, timestamps, and event definitions across plants

    • supporting reproducible training datasets and model lineage

    • protecting the MES from heavy analytical workloads

    • enforcing governed access, retention, and auditability

    In regulated environments, this is often less about analytics convenience and more about being able to explain where the data came from, what transformations occurred, and what changed between model versions.

    Brownfield reality

    In most plants, the MES is only one part of the operational record. Actual execution context is fragmented across legacy and modern systems, custom integrations, spreadsheets, and manual workarounds. That means an AI model built only on MES data may be technically possible but operationally misleading.

    A full replacement strategy is usually not the answer. Replacing MES, ERP, QMS, or related systems just to make AI easier often fails in regulated, long-lifecycle environments because of qualification burden, validation cost, downtime risk, integration complexity, and the need to preserve traceability and controlled change. Coexistence with existing systems is usually more realistic than rip-and-replace.

    Practical recommendation

    If you want to move quickly without creating avoidable risk, start with a staged approach:

    1. assess whether the MES data actually contains the signal needed for the use case

    2. use read replicas, exports, or CDC pipelines instead of direct production querying where possible

    3. add only the minimum analytical layer needed for history, joins, and lineage

    4. validate data mappings and feature definitions with operations, quality, and IT together

    5. treat model deployment as a controlled change, especially if outputs affect execution, release decisions, or quality workflows

    So the answer is yes for limited scenarios, but usually no if you mean a scalable, production-grade AI program with strong traceability, cross-system context, and manageable risk.

  • Ramp-up

    Ramp-up is the controlled increase of production volume, staffing, equipment use, or system activity from an initial level toward a planned operating rate. In manufacturing, it commonly refers to the period after a product launch, line start, process change, or capacity addition when output is increased while performance is monitored.

    During ramp-up, teams typically track whether materials, work instructions, labor, equipment, quality checks, and system transactions can support the higher rate. In MES, ERP, and planning contexts, ramp-up may affect routings, work orders, schedules, inventory demand, inspection load, and throughput assumptions.

    Ramp-up is not the same as startup, which usually refers to the initial act of bringing a process, line, or system into operation. It is also different from capacity, which describes the amount of output a process can support under defined conditions. Ramp-up is the transition toward that expected operating level.

  • What is the difference between ISO 27001 and NIST 800-53?

    ISO 27001 and NIST SP 800-53 address similar security objectives but play different roles. In regulated industrial and manufacturing environments, they are often used together, not as substitutes.

    Core difference

    ISO 27001 is a management system standard. It defines how to establish, operate, monitor, and continually improve an information security management system (ISMS). It is structured around risk management, governance, and continual improvement (commonly described as a Plan-Do-Check-Act cycle, although recent editions do not prescribe that model by name), and organizations can be formally certified against it by accredited bodies.

    NIST SP 800-53 is a control catalog. It defines what security and privacy controls can be implemented across a system or organization. It is detailed and control-centric and is not itself a certifiable standard. It is widely used in U.S. federal and defense contexts as a reference set of safeguards.

    Scope and focus

    • ISO 27001
      • Focuses on organizational processes for managing information security risk.
      • Addresses governance, policy, risk assessment, internal audit, management review, and continual improvement.
      • Includes Annex A, which points to a control set, but the main emphasis is on the management system.
      • Can apply across corporate IT, OT, and supporting processes if they are in scope of the ISMS.
    • NIST SP 800-53
      • Focuses on specific controls (technical, administrative, and physical) for information systems.
      • Organized into control families (such as access control, configuration management, incident response).
      • Used as a building block in risk management and authorization frameworks, not as a full management system.
      • Often applied at the system boundary level (for example MES, historian, OT network) as part of a broader program.

    Certification vs. assessment

    • ISO 27001
      • Organizations can be audited and certified by accredited certification bodies.
      • Certification typically covers a defined scope (for example “global IT” or “manufacturing IT and OT”), not every system everywhere.
      • A certificate does not guarantee regulatory compliance or eliminate cyber risk, but it shows that a documented ISMS is in place and audited.
    • NIST SP 800-53
      • There is no generic “NIST 800-53 certification.”
      • Controls are implemented, assessed, and authorized within frameworks such as the NIST Risk Management Framework.
      • Compliance is usually judged in the context of a specific program or contract (for example federal systems), not by a public certificate.

    How they relate in practice

    In a brownfield manufacturing environment, it is common to:

    • Use ISO 27001 to define the overarching information security management system and governance model, including risk assessment, roles, policies, and change control.
    • Use NIST SP 800-53 as a reference library when selecting and tailoring specific controls for IT, OT, MES, historians, and cloud integrations.

    Mappings exist between ISO 27001 and NIST 800-53, but they are approximations. Control coverage and depth differ, and mapping quality depends on your interpretation, tooling, and documentation discipline.

    Implications for regulated industrial environments

    • Coexistence with legacy systems: Applying either framework across mixed OT/IT landscapes requires careful scoping, because older PLCs, DCS, and MES platforms may not support all NIST 800-53-style controls. ISO 27001 emphasizes risk-based justification for such gaps and documented compensating controls.
    • Validation and change control: For GMP or safety-critical operations, adding or modifying controls (for example new logging, endpoint protection, or access mechanisms) can trigger validation, qualification, or re-testing of systems. Both ISO 27001 and NIST 800-53 must be implemented with existing change control and validation processes in mind.
    • Downtime and availability: Some NIST 800-53 controls (for example aggressive patching or network re-segmentation) can conflict with uptime requirements for 24/7 plants. ISO 27001’s risk-based approach allows you to prioritize and document deviations, but actual risk reduction depends on site-specific engineering and operations constraints.
    • No guarantee of compliance: Neither ISO 27001 certification nor strong alignment to NIST 800-53 ensures success in regulatory inspections or customer audits. They help demonstrate structured control selection, governance, and traceability, but outcomes depend on execution quality, evidence, and consistency across sites.

    Which should we use?

    They serve different purposes and often complement each other rather than compete.

    • Choose ISO 27001 when you need a formal, auditable management system for information security that covers policies, risk management, and continual improvement across the organization.
    • Use NIST SP 800-53 when you need a detailed control set for designing or evaluating safeguards on specific systems, especially where U.S. federal or defense requirements are relevant.
    • In many industrial organizations, the practical approach is ISO 27001 for how you manage security plus NIST 800-53 as one of the libraries for what controls you pick, customized for the realities of your OT and MES environment.

    Any decision should account for your existing control landscape, integration debt, regulatory obligations, and the cost and risk of retrofitting legacy production systems.

  • Can sites still adapt processes locally with MES?

    Short answer: yes, but with tighter guardrails than paper or spreadsheets

    Most MES implementations allow some degree of local process adaptation, but the latitude is typically much narrower than in paper-based or ad‑hoc digital systems. What a site can change locally depends on configuration options, governance, and the level of regulatory scrutiny. In many regulated plants, local changes are limited to parameters (like limits, sequences, resources) within approved templates rather than complete workflow redesign. This is intentional: it trades local freedom for consistency, traceability, and controlled risk. If your organization expects the MES to be both a rigid standard and a playground for local experimentation, there will be friction.

    What usually *can* be adapted locally in an MES

    In most brownfield environments, sites can locally adjust master data and configuration elements that are explicitly exposed as parameters. This often includes things like routing variants, resource assignments, work center calendars, and shift patterns that reflect local capacity and layout. Sites may also adjust work instructions, checklists, and data collection points, as long as the changes stay within controlled templates and approved content libraries. Limits, sampling frequencies, and inspection points can sometimes be tuned locally, especially when they are driven by risk assessments or product-family rules. However, each of these types of changes is normally subject to role-based access and a formal change process, not free-form shop-floor editing.

    What usually *cannot* be freely adapted at site level

    Major structural changes to the process model are often restricted or centralized. Examples include altering the fundamental routing logic, removing critical data collection points, or bypassing electronic signatures. Cross-system flows that impact ERP, QMS, or serialization are usually locked down because they affect finance, compliance, and downstream traceability. Many multi-site MES deployments deliberately prevent local sites from forking core templates, since divergent models are expensive to validate, support, and audit. In highly regulated sectors, attempting to maintain dozens of local variants of validated workflows is rarely sustainable. This leads to a model where sites can propose changes but cannot independently rewire core process logic.

    Tradeoffs: standardization vs local agility

    MES is usually introduced to reduce uncontrolled local variation, which directly conflicts with the idea of unconstrained local adaptation. Tighter standardization simplifies training, audit readiness, deviation analysis, and master data maintenance, but it can make local continuous improvement slower. Allowing more local autonomy can accelerate problem solving and innovation, but it drives up validation overhead and complicates comparisons across plants. In regulated environments, leaders often accept slower local changes to protect consistency of data and evidence. The pragmatic compromise is to standardize the backbone flows and allow flexible configuration of parameters, prompts, and decision rules within that structure.

    Change control, validation, and why “just let sites change it” is risky

    Every non-trivial MES change that affects records relevant to GMP, FAA, or similar regulatory requirements potentially requires impact assessment, regression testing, and documentation. If each site makes structural changes on its own, the organization inherits a large and often invisible validation burden. Over time, this leads to multiple, slightly different MES behaviors that are hard to qualify, re-test, and support during upgrades. When auditors or customers ask for evidence of control, explaining dozens of uncontrolled local variants is difficult. For this reason, many organizations centralize the change control process and require that site-level adaptations go through defined workflows with clear approvals and traceability.

    Coexistence with legacy systems and local workarounds

    In brownfield plants, MES often coexists with spreadsheets, local Access databases, or niche tools that historically enabled very local process tweaks. After MES deployment, some of those tools persist as unofficial workarounds when MES is too rigid or change cycles are too long. This creates data fragmentation and can undermine the authoritative record expected from MES. Leaders need to be explicit about what is allowed locally and what must be in MES, and then align change control to make that realistic. If local adaptations are blocked in MES but tolerated in shadow systems, you get the worst of both worlds: fake standardization on paper and uncontrolled variation in practice.

    Practical patterns to enable safe local adaptation

    Many organizations adopt a tiered model: corporate or global engineering owns core process templates, while sites can configure bounded options and parameters. This can be implemented via feature flags, parameter tables, or site-specific configuration layers that do not break the underlying validated logic. Some teams also define “safe change” categories where sites can act quickly under local procedures, and “high-risk change” categories that require cross-functional review and potentially revalidation. Periodic configuration audits and configuration baselines help ensure that local adaptations remain visible and supportable. None of this removes the need for governance, but it can give plants meaningful room to adapt without fragmenting the entire MES landscape.
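
    The tiered model can be sketched as a core template that declares which parameters a site may override and within what bounds. The parameter names, limits, and rejection messages are hypothetical, and a real MES would enforce this through its own configuration layer and change workflow rather than a dictionary.

```python
from typing import Any, Dict, Tuple

# Core template owned centrally. Values are illustrative only.
CORE_TEMPLATE: Dict[str, Dict[str, Any]] = {
    "torque_limit_nm": {"default": 25.0, "min": 20.0, "max": 30.0, "site_editable": True},
    "sampling_every_n": {"default": 10, "min": 5, "max": 20, "site_editable": True},
    "require_esignature": {"default": True, "site_editable": False},
}


def resolve_site_config(
    template: Dict[str, Dict[str, Any]], overrides: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Apply site overrides that stay inside the template; report the rest."""
    resolved = {key: spec["default"] for key, spec in template.items()}
    rejected: Dict[str, str] = {}
    for key, value in overrides.items():
        spec = template.get(key)
        if spec is None:
            rejected[key] = "unknown parameter"
        elif not spec["site_editable"]:
            rejected[key] = "locked by core template"
        elif not (spec["min"] <= value <= spec["max"]):
            rejected[key] = "outside approved bounds"
        else:
            resolved[key] = value
    return resolved, rejected


site_config, site_rejections = resolve_site_config(
    CORE_TEMPLATE,
    {"torque_limit_nm": 28.0, "sampling_every_n": 50, "require_esignature": False},
)
```

    In this example the torque override is accepted, the sampling override is rejected as out of bounds, and the attempt to disable the electronic signature is rejected because that parameter is locked. Rejected requests would then follow the higher-risk change path rather than being silently dropped.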

    Connecting this to continuous improvement and problem solving

    For continuous improvement and root cause analysis to be effective, sites must be able to close the loop by changing how work is executed, not just documenting issues. In an integrated MES–QMS landscape, that often means translating corrective actions into controlled MES changes: new checks, different sequencing, or adjusted limits. When the MES is overly centralized with long lead times, local teams will naturally push fixes into informal workarounds or training-only changes, which are fragile. Designing the MES governance so that well-justified, risk-assessed local adaptations can be implemented within reasonable timeframes is critical. Otherwise, MES becomes a barrier to improvement rather than an enabler.

  • What roles should participate in RCA for critical safety-of-flight nonconformances?

    For critical safety-of-flight nonconformances, root cause analysis should be cross-functional from the start. Quality typically facilitates, but quality alone is not enough. At minimum, you usually need the people who understand the requirement, the process that produced the condition, the evidence trail, and the authority to contain risk and approve corrective action.

    Core participants

    In most regulated aerospace and similar environments, the core RCA team should include these roles:

    • Quality engineering or quality management: owns the NCR workflow, evidence discipline, containment tracking, and linkage to CAPA or equivalent corrective action processes.
    • Responsible design or product engineering: confirms the requirement, characteristic criticality, functional impact, and whether the issue is design interpretation, tolerance stack-up, process capability, or execution failure.
    • Manufacturing or process engineering: analyzes routing, work instructions, tooling, fixtures, machine parameters, process controls, and recent changes.
    • Production supervision and the operator or inspector closest to the event: provides factual sequence-of-events detail that is often missing from formal records. Excluding frontline knowledge is a common RCA failure mode.
    • MRB authority or equivalent disposition authority: separates immediate disposition decisions from long-term corrective action and keeps the investigation grounded in product risk.
    • Program or business leadership for major events: ensures resourcing, customer communication paths, schedule impact management, and escalation discipline where the issue affects delivered or deliverable hardware.

    Roles that are often required depending on the case

    Critical safety-of-flight events usually pull in additional functions. Whether they are mandatory depends on your product, customer contract, internal procedures, and where the failure originated.

    • Supplier quality and supplier engineering if the nonconformance originated in purchased material, special processing, calibration services, or outside processing. If the supplier owns part of the cause chain, they need to participate directly, not just receive a corrective action request.
    • Special process engineering for heat treat, coating, bonding, welding, NDT, plating, composites, sterilization, or other tightly controlled processes where certification and parameter history matter.
    • Metrology, test, or labs when measurement method, fixture bias, software revision, environmental conditions, or test setup may have contributed. Many RCAs go wrong because they assume the detection method is valid without checking measurement system analysis (MSA) results, calibration status, or setup repeatability.
    • Configuration management or document control if there is any chance the event is tied to drawing revision mismatch, obsolete work instructions, uncontrolled local copies, or incorrect model-based definition release.
    • Maintenance, controls, or equipment engineering if machine condition, preventive maintenance gaps, alarms, overrides, sensor drift, or PLC or HMI changes may be involved.
    • PLM, MES, ERP, or QMS system owners when the event may involve bad master data, routing mismatch, serialization gaps, missing as-built records, or interface failures between systems. In brownfield plants, these are common contributors and often missed.
    • Training or competency owners when qualification, certification, or recency of training is in question.
    • Materials, planning, or receiving quality where lot mix, substitution, shelf-life, handling, storage, or traceability breaks may be causal.

    Who should lead?

    Usually, quality leads the investigation process, but the technical lead should match the dominant cause path. If the likely cause is process control, manufacturing engineering may drive the technical analysis. If the likely cause is requirement interpretation or design intent, engineering may need to lead that portion. What matters is that one role owns coordination and evidence control, while technical ownership sits with the people competent to test the actual failure theory.

    That distinction matters. A lot of weak RCAs are really documentation exercises run by whoever owns the form.

    Who should not be left out

    Three omissions are especially risky for safety-of-flight cases:

    • The person who knows the real shop-floor sequence. Formal travelers and system timestamps rarely capture workarounds, interruptions, re-clamping, tool swaps, or local decisions.
    • The requirement owner. Teams sometimes investigate process variation before confirming the requirement, characteristic classification, and functional effect.
    • The system or data owner when records conflict. If MES, ERP, PLM, QMS, calibration, and maintenance data do not agree, the RCA can be built on the wrong chronology.

    Boundaries and controls for critical cases

    For critical safety-of-flight nonconformances, the RCA team is only one part of the response. You also typically need explicit controls around containment, segregation, traceability review, shipped product impact assessment, and change control for any corrective action. If the proposed fix touches validated workflows, qualified equipment, approved process parameters, or controlled documentation, implementation will usually require formal review and may take longer than the urgency of the event would suggest.

    That is normal in regulated environments. Fast action without controlled evidence and change discipline often creates a second problem.
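    The traceability review and shipped-product impact assessment mentioned above often start as a scoping query over as-built records. The sketch below assumes a flat list of records with hypothetical field names (serial, lot, machine, processed_at, shipped). In a real plant this data would come from MES, ERP, or both, and would need reconciliation first.

```python
from datetime import datetime

# Hypothetical as-built records; field names and values are illustrative only.
RECORDS = [
    {"serial": "SN-001", "lot": "LOT-77", "machine": "M5",
     "processed_at": datetime(2024, 3, 1, 8, 0), "shipped": True},
    {"serial": "SN-002", "lot": "LOT-77", "machine": "M6",
     "processed_at": datetime(2024, 3, 1, 9, 0), "shipped": False},
    {"serial": "SN-003", "lot": "LOT-78", "machine": "M5",
     "processed_at": datetime(2024, 3, 1, 10, 0), "shipped": False},
    {"serial": "SN-004", "lot": "LOT-78", "machine": "M5",
     "processed_at": datetime(2024, 3, 2, 10, 0), "shipped": True},
]


def containment_scope(records, suspect_lot, machine, window_start, window_end):
    """Return serials that share the suspect lot OR ran on the suspect
    machine inside the process window, split by shipped status."""
    in_scope = [
        r for r in records
        if r["lot"] == suspect_lot
        or (r["machine"] == machine
            and window_start <= r["processed_at"] <= window_end)
    ]
    return {
        "on_site": sorted(r["serial"] for r in in_scope if not r["shipped"]),
        "shipped": sorted(r["serial"] for r in in_scope if r["shipped"]),
    }
```

    The union of "same lot" and "same machine within the window" is deliberately conservative. Narrowing the scope is a decision for the RCA team once the failure theory has been tested, not something the query should assume.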

    Practical rule

    If a role can answer one of these questions, it probably belongs in the RCA:

    • What requirement was actually violated?
    • How could the process physically create this condition?
    • Can the detection method itself be trusted?
    • What else, by serial, lot, process window, or supplier batch, may be affected?
    • What system, document, or equipment changes happened near the event?
    • Who has authority to contain risk and approve the corrective path?

    For most critical safety-of-flight events, that means a small core team plus targeted subject-matter experts, not a giant meeting. With too few roles, causes get missed. With too many, the RCA turns into a status review.

  • How does ISA-95 help MES systems?

    ISA-95 helps MES systems by providing a structured reference model, shared terminology, and standard interface concepts for how manufacturing operations (Level 3) interact with business systems (Level 4) and shop-floor automation (Level 2). It does not make MES integration or compliance automatic, but it reduces ambiguity and integration risk when used correctly.

    1. Clarifies what MES is responsible for

    ISA-95 defines functional models and activity groups for Level 3 (manufacturing operations management). This helps you decide what your MES should and should not do in a brownfield stack.

    • Scope boundaries: Separates order management, planning, and financials (typically ERP) from detailed scheduling, dispatching, execution, and data collection (typically MES).
    • Core MES functions: Provides a reference for capabilities such as production operations, quality operations, maintenance operations, and inventory operations at Level 3.
    • Gap analysis: Lets you compare your current MES (or MOM) footprint against the ISA-95 model to identify overlaps, gaps, and custom extensions.

    This is particularly useful in regulated environments where uncontrolled scope creep in MES can complicate validation, traceability, and long-term support.
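    A gap analysis of this kind can be sketched as a set comparison. The example below uses only the operations categories and activities named in this answer as the reference grid. A real mapping would use the full activity list from the standard. The system footprint is hypothetical.

```python
# Operations categories and activities named in the text above; a real
# mapping would use the full activity list from the standard.
CATEGORIES = ("production", "quality", "maintenance", "inventory")
ACTIVITIES = ("detailed_scheduling", "dispatching", "execution", "data_collection")

REFERENCE = {(c, a) for c in CATEGORIES for a in ACTIVITIES}

# Hypothetical footprint: which system currently performs which activity.
FOOTPRINT = {
    "MES": {
        ("production", "dispatching"),
        ("production", "execution"),
        ("production", "data_collection"),
        ("quality", "data_collection"),
    },
    "QMS": {("quality", "execution"), ("quality", "data_collection")},
    "spreadsheet": {("production", "detailed_scheduling")},
}


def gap_analysis(reference, footprint):
    """Return (gaps, overlaps, extensions) relative to the reference grid."""
    covered = {}
    for system, cells in footprint.items():
        for cell in cells:
            covered.setdefault(cell, set()).add(system)
    covered_cells = set(covered)
    gaps = sorted(reference - covered_cells)            # nobody does this
    overlaps = {c: sorted(s) for c, s in covered.items() if len(s) > 1}
    extensions = sorted(covered_cells - reference)      # outside the model
    return gaps, overlaps, extensions
```

    In this made-up footprint, quality data collection is claimed by both MES and QMS, and all maintenance and inventory activities show up as gaps. Each overlap is a prompt to decide which system owns the function. Each gap may simply be out of scope for the site.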

    2. Provides a common language for IT, OT, and vendors

    ISA-95 offers a shared vocabulary for roles that typically talk past each other.

    • Consistent terminology: Terms like work center, equipment, material, segment, production schedule, and production performance are defined in a way that can be referenced in specifications and design documents.
    • Requirements clarity: You can write MES requirements that reference ISA-95 functions and objects instead of ad hoc descriptions that are interpreted differently by each stakeholder.
    • Vendor evaluation: You can ask MES vendors to map their functionality and data model to ISA-95 concepts, which exposes misalignments early.

    In practice, not all vendors implement ISA-95 consistently. You still need to verify how closely a given product aligns with the standard and where it diverges.

    3. Structures interfaces between MES, ERP, and automation

    ISA-95 is especially valuable for defining how MES exchanges data with ERP, PLM, WMS, and Level 2 systems.

    • Information models: The standard defines logical objects (e.g., material definitions, equipment, personnel, process segments) and how they relate, which you can use to design integration payloads.
    • Typical flows: Clarifies common flows like orders and master data from ERP to MES, and performance, consumption, and genealogy from MES back to ERP or QMS.
    • Stable contracts: Helps create API or message contracts that are easier to maintain across upgrades and supplier changes because they are based on a public reference model.

    However, ISA-95 itself does not mandate specific transport technologies or message formats. You still have to choose and validate protocols (for example, OPC UA companion specs, message queues, web services, or vendor-specific connectors) and ensure they work with your existing systems.
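    As an illustration of using logical objects to shape an integration payload, the sketch below models three of the objects named above as plain data classes and serializes an order message to JSON. The field names, the JSON layout, and the build_order_message helper are assumptions made for this example. They are not the standard's schema or any vendor's API.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class MaterialDefinition:
    material_id: str
    description: str
    unit_of_measure: str


@dataclass
class Equipment:
    equipment_id: str
    level: str                       # e.g. "work_center"; illustrative value
    parent_id: Optional[str] = None  # link into the equipment hierarchy


@dataclass
class ProcessSegment:
    segment_id: str
    material_ids: List[str] = field(default_factory=list)
    equipment_ids: List[str] = field(default_factory=list)


def build_order_message(order_id, segment, materials, equipment):
    """Serialize an order payload, refusing references to unknown objects."""
    known_materials = {m.material_id for m in materials}
    known_equipment = {e.equipment_id for e in equipment}
    missing = (set(segment.material_ids) - known_materials) | (
        set(segment.equipment_ids) - known_equipment
    )
    if missing:
        raise ValueError(f"unresolved references: {sorted(missing)}")
    return json.dumps(
        {
            "order_id": order_id,
            "process_segment": asdict(segment),
            "materials": [asdict(m) for m in materials],
            "equipment": [asdict(e) for e in equipment],
        },
        sort_keys=True,
    )
```

    The reference check is the useful part of the contract: a message that names a material or work center the receiver has never been told about fails at the boundary rather than deep inside execution.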

    4. Supports traceability and genealogy design

    For regulated operations, ISA-95 helps you structure traceability across MES and connected systems.

    • Material and equipment models: Provides logical patterns for representing units, lots, equipment hierarchies, and personnel assignments, which are key to end-to-end genealogy.
    • Event capture structure: Helps you decide which events should be recorded at which level (e.g., MES vs. SCADA) and associated to which ISA-95 objects.
    • Consistency across plants: Offers a template to harmonize MES data structures for multi-site operations, easing consolidated reporting and multi-site investigations.

    Traceability quality still depends on disciplined implementation: correct IDs, robust scanning or data capture, controlled master data, and clear procedures for manual overrides and rework.
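    Once lots and consumption events are captured consistently, genealogy is a graph traversal. The sketch below assumes each record is a (consumed lot, produced lot) pair. The lot identifiers and function names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical consumption records: (consumed_lot, produced_lot).
CONSUMPTION = [
    ("BAR-001", "FORGING-10"),
    ("FORGING-10", "PART-A1"),
    ("FORGING-10", "PART-A2"),
    ("BAR-002", "FORGING-11"),
    ("FORGING-11", "PART-B1"),
]


def build_index(records):
    """Index the records both ways: lot -> children and lot -> parents."""
    children = defaultdict(set)
    parents = defaultdict(set)
    for consumed, produced in records:
        children[consumed].add(produced)
        parents[produced].add(consumed)
    return children, parents


def trace(start, index):
    """Breadth-first walk returning every lot reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        lot = queue.popleft()
        for nxt in index.get(lot, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

    Passing the children index gives a forward trace (where did this bar end up?), and passing the parents index gives a backward trace (what went into this part?). A missing or mistyped ID in one record silently cuts the graph, which is why the data capture discipline described above matters more than the traversal itself.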

    5. Improves MES implementation, validation, and change control

    Because ISA-95 breaks Level 3 into logical activities and information flows, it can make MES projects and their lifecycle governance more manageable.

    • Structured user requirements: URS and functional specs can be organized by ISA-95 activity and object, making them easier to review and maintain.
    • Test coverage and traceability: Test protocols can reference ISA-95 functions and data elements, improving traceability from requirement to design to test and, in some sectors, to validation documentation.
    • Change impact analysis: When you change an interface or MES function, the ISA-95 mapping helps you see which objects and upstream/downstream systems are impacted.

    In long lifecycle environments, this structure helps avoid full system replacement when business needs change. You can often adjust specific interfaces or functions rather than ripping and replacing an entire MES, which would carry substantial downtime, revalidation, and integration risk.
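    Change impact analysis can be sketched as a lookup against an interface register keyed by the logical objects each interface carries. The register entries and object names below are hypothetical examples.

```python
# Hypothetical interface register; names and object lists are illustrative.
INTERFACES = [
    {"name": "erp_to_mes_orders", "source": "ERP", "target": "MES",
     "objects": {"production_schedule", "material_definition"}},
    {"name": "mes_to_erp_performance", "source": "MES", "target": "ERP",
     "objects": {"production_performance", "material_lot"}},
    {"name": "mes_to_qms_nonconformance", "source": "MES", "target": "QMS",
     "objects": {"material_lot", "equipment"}},
]


def impact_of(changed_object, interfaces):
    """Return the interfaces and systems touched by a change to one object."""
    hit = [i for i in interfaces if changed_object in i["objects"]]
    names = sorted(i["name"] for i in hit)
    systems = sorted({s for i in hit for s in (i["source"], i["target"])})
    return names, systems
```

    With this register, a change to the material lot structure touches two interfaces and three systems, which gives a first estimate of regression test and revalidation scope. The result is only as good as the register, so it should be maintained under the same change control as the interfaces.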

    6. Enables phased modernization in brownfield environments

    Most regulated facilities already run legacy MES, custom dispatch tools, or homegrown data collectors. ISA-95 helps you modernize without assuming a clean slate.

    • Incremental mapping: You can map existing capabilities and data structures to ISA-95, then fill gaps or rationalize overlaps over time.
    • Coexistence patterns: Facilitates scenarios where the legacy MES remains the system of record but new modules or point solutions are introduced, with interfaces designed using ISA-95 objects and activities.
    • Vendor-neutral designs: If you later change MES vendors, having ISA-95-based interface definitions reduces the amount of rework and requalification, though it does not eliminate it.

    Attempts to fully replace MES in a single step often struggle in aerospace-grade and similar contexts because of validation burden, downtime limits, and complex integrations. ISA-95 supports more realistic, staged migration strategies.

    7. Limitations and common pitfalls

    ISA-95 is useful, but it is not a turnkey solution.

    • No automatic compliance: Using ISA-95 language in documents does not provide regulatory compliance, audit outcomes, or validation coverage by itself. Those depend on your implementation, controls, and evidence.
    • Interpretation differences: Different vendors claim ISA-95 alignment while implementing very different models. You must inspect detailed mappings and test the behavior.
    • Not a process design standard: ISA-95 does not tell you how to run your operations or what your optimal workflow is. It only structures information and functions.
    • Abstraction vs. reality: The standard is intentionally abstract. Mapping complex, legacy processes into the model often reveals edge cases that require local conventions or extensions.

    When is using ISA-95 most valuable for MES?

    ISA-95 tends to add the most value when you are:

    • Defining or revising MES scope relative to ERP, PLM, QMS, and shop-floor control systems.
    • Designing or refactoring integrations between Level 3 and Level 4 in a multi-vendor environment.
    • Harmonizing MES and data structures across multiple plants or business units.
    • Preparing for a phased MES modernization where co-existence with legacy systems is required.

    In these cases, ISA-95 provides a stable reference that helps reduce misunderstandings, avoid unnecessary customization, and support more predictable validation and change control, while still requiring careful engineering and governance.

  • What is MES and MOM?

    MES and MOM are closely related concepts, but they are not interchangeable. In most industrial and regulated environments, MES is treated as a specific system layer, while MOM refers to a broader set of operations management capabilities that may span several systems.

    What is MES?

    A Manufacturing Execution System (MES) is the layer between ERP and the shop floor that manages and records the execution of production in near real time. In practice, an MES typically covers some or all of the following functions:

    • Order dispatching and sequencing from ERP or planning systems to specific lines, cells, or machines
    • Electronic routing and enforcement of process steps (e.g., operations, resources, tools)
    • Electronic batch records or device history records, including operator sign-offs in regulated industries
    • Data collection from machines, test equipment, and operators (parameters, measurements, events)
    • In-process quality checks, holds, and nonconformance logging
    • Material tracking, work-in-process (WIP) visibility, and basic genealogy
    • OEE and basic performance metrics based on actual production events

    MES is usually a specific, named application (or closely integrated set of applications) that must be validated, integrated with existing systems, and maintained under change control. It is commonly positioned as the “system of record” for what actually happened during production.
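    For the OEE item in the list above, one common formulation multiplies availability, performance, and quality. The sketch below assumes the MES can supply planned time, actual run time, an ideal cycle time, and total and good counts. The function and parameter names are illustrative, and plants differ in how they define each input.

```python
def oee(planned_time_min, run_time_min, ideal_cycle_time_min,
        total_count, good_count):
    """Return (availability, performance, quality, oee) as fractions."""
    if planned_time_min <= 0 or run_time_min <= 0 or total_count <= 0:
        raise ValueError("times and total_count must be positive")
    availability = run_time_min / planned_time_min
    performance = (ideal_cycle_time_min * total_count) / run_time_min
    quality = good_count / total_count
    return availability, performance, quality, availability * performance * quality


# Example shift: 480 planned minutes, 400 minutes running,
# 1.0 minute ideal cycle, 360 parts made, 342 good.
a, p, q, result = oee(480, 400, 1.0, 360, 342)
```

    The arithmetic is trivial. The value of computing it in MES is that the inputs come from recorded production events rather than end-of-shift estimates.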

    What is MOM?

    Manufacturing Operations Management (MOM) is a broader management discipline and capability set. It includes MES-like execution functions but also spans planning, quality, maintenance, and performance management at the operations level.

    Depending on the vendor and plant, MOM may encompass:

    • Production operations management (execution, dispatching, WIP, genealogy)
    • Quality operations management (in-process checks, SPC, deviation/CAPA integration, release workflows)
    • Maintenance operations management (basic asset status, coordination with CMMS, downtime categorization)
    • Inventory and material operations (material movements, consumption, kitting, limited warehouse functions)
    • Performance and analytics (KPIs, OEE, losses, bottleneck analysis, shift/plant dashboards)

    Some vendors brand their entire operations suite as a MOM platform, of which MES is one module. Others use “MOM” more as an architectural or process reference model (for example, based on ISA-95), even when the actual systems are a mix of MES, LIMS, QMS, CMMS, and custom tools.

    How do MES and MOM relate in real plants?

    In brownfield, regulated environments, the relationship between MES and MOM is often shaped by existing systems and constraints rather than by clean reference models:

    • MES as a subset of MOM: Many organizations treat MES as the execution subset of a broader MOM strategy. MOM capabilities may be spread across MES, QMS, LIMS, CMMS, data historians, and reporting tools.
    • Overlapping functions: Quality and maintenance functions in MES often overlap with standalone QMS and CMMS. Which system is the “source of truth” for a given function (e.g., nonconformance, calibration) must be defined explicitly.
    • Multiple MES-like systems: Plants may run several MES-like applications (line control systems, LIMS, custom shop-floor IT) that together form the effective “MOM” landscape, even if none is labeled as such.
    • ERP vs. MOM boundaries: In some implementations, ERP holds more detailed production and inventory logic, leaving MES relatively thin. In others, MES/MOM absorb more logic to keep ERP simpler. The split is rarely identical across sites.

    Why does the distinction matter in regulated, long-lifecycle environments?

    The MES vs. MOM distinction matters less as terminology and more as a way to think about responsibility, integration, and validation:

    • Scope and expectations: Labeling a project as “MES” tends to focus scope on execution and electronic records. Labeling it as “MOM” often implies broader changes to quality workflows, maintenance coordination, and performance reporting.
    • Validation and change control: In regulated contexts, MES changes can directly affect product records, electronic signatures, and traceability. Expanding MES into a full MOM platform increases validation scope and change-control overhead.
    • System coexistence: A MOM vision usually has to coexist with existing MES, QMS, PLM, LIMS, and historians. Full replacement strategies frequently stall due to qualification burden, downtime risk, and integration debt. Incremental integration and clear system-of-record choices are usually safer.
    • Traceability and genealogy: Deciding where genealogy, batch history, and device history records are mastered (MES vs. other MOM components) impacts auditability, data integrity controls, and cross-system reconciliation efforts.

    What should a practitioner focus on when hearing “MES” vs. “MOM”?

    When these terms come up in projects or vendor discussions, it is useful to clarify:

    • Which concrete functions are in scope (e.g., electronic batch records (EBR) or device history records (DHR), routing control, in-process quality, OEE, maintenance, material management)
    • Which systems are intended as system of record for each function, given existing ERP, QMS, LIMS, CMMS, PLM, and historians
    • How validation, change control, and audit trails will be managed across these systems
    • What the migration or coexistence path looks like, recognizing that wholesale replacement of legacy MES or QMS is often infeasible in one step

    In summary, MES is typically the execution-focused system layer on the shop floor, while MOM is a broader operations management scope that may involve multiple systems. The exact boundary between them depends heavily on plant history, vendor choices, and regulatory constraints, so definitions should always be tied back to specific functions, data ownership, and system responsibilities.