FAQ Tag: canonical data model

  • What is the role of design of experiments (DoE) in AI-driven process window optimization?

    DoE provides the disciplined experimental structure that AI needs to optimize a process window without relying only on noisy historical data or trial-and-error changes. In practice, DoE helps determine which factors matter, how factors interact, where the practical operating limits are, and which combinations produce acceptable performance across multiple responses such as yield, cycle time, scrap, and critical quality characteristics.

    AI and DoE are complementary, not interchangeable.

    • DoE is used to generate informative data on purpose.

    • AI and statistical models are used to learn from that data, plus available historical data, to predict outcomes and recommend settings.

    • Process window optimization then uses those models to identify a robust operating region rather than a single best point that may fail under normal variation.
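
    The difference between a single best point and a robust operating region can be sketched as follows. This is a minimal illustration, not a definitive method: the model `predicted_defect_rate`, its coefficients, the tolerances, and the limit are all invented for the example. A setpoint is kept only if the prediction stays within the limit across its whole tolerance box.

```python
from itertools import product

def predicted_defect_rate(temp_c, pressure_bar):
    """Hypothetical fitted model (invented for illustration): defects rise
    quadratically as settings move away from 200 C and 50 bar."""
    return 0.5 + 0.02 * (temp_c - 200) ** 2 + 0.1 * (pressure_bar - 50) ** 2

def is_robust(temp_c, pressure_bar, temp_tol, pressure_tol, limit):
    """True if the prediction stays within the limit at the setpoint and at the
    edges and corners of its tolerance box.

    Checking only these nine points is an assumption that suits a convex model
    like the one above; other models may need a denser check.
    """
    offsets = product((-temp_tol, 0, temp_tol), (-pressure_tol, 0, pressure_tol))
    return all(
        predicted_defect_rate(temp_c + dt, pressure_bar + dp) <= limit
        for dt, dp in offsets
    )

# Candidate setpoints on a coarse grid; tolerances stand in for normal variation.
candidates = product(range(190, 211, 5), range(46, 55, 2))
window = [
    (t, p) for t, p in candidates
    if is_robust(t, p, temp_tol=5, pressure_tol=2, limit=2.0)
]
```

    In this toy case a setpoint of 205 C and 50 bar meets the limit nominally but is excluded from the window, because normal variation around it can push the prediction past the limit.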

    That distinction matters because many plants do not have historical data that is clean, complete, or well-labeled enough for direct AI optimization. Data may be fragmented across MES, ERP, PLM, historians, spreadsheets, and lab systems. Measurements may also reflect changing tooling, operator methods, maintenance state, incoming material variation, or recipe revisions. In that situation, DoE is often the fastest way to create data with known intent, controlled factor changes, and defensible traceability.

    What DoE contributes to AI-driven optimization

    • Efficient data generation: It reduces the number of runs needed compared with changing one variable at a time.

    • Interaction discovery: It exposes factor interactions that simpler approaches miss, which is often where process instability actually comes from.

    • Boundary detection: It helps map where quality, throughput, or equipment constraints begin to break down.

    • Model training support: It creates balanced, informative data that improves model fitting and reduces bias from historical operating habits.

    • Robustness analysis: It supports optimization for tolerance to common variation, not just peak performance under ideal conditions.

    • Evidence for change control: It creates a more reviewable basis for recipe, setpoint, or routing changes than ad hoc tuning.
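
    As a minimal sketch of the first two points, the following generates a two-level full factorial design and estimates main and interaction effects by contrasting average responses. The factor names and the simulated response are invented for illustration; a real study would use measured results and usually a dedicated DoE or statistics package.

```python
from itertools import product

def full_factorial(factors):
    """Return every combination of coded levels (-1, +1) for the named factors."""
    return [dict(zip(factors, levels))
            for levels in product((-1, 1), repeat=len(factors))]

def effect(runs, responses, *names):
    """Estimate a main effect (one name) or an interaction effect (several names).

    The effect is the mean response where the product of the coded levels is +1
    minus the mean response where that product is -1.
    """
    high, low = [], []
    for run, y in zip(runs, responses):
        sign = 1
        for name in names:
            sign *= run[name]
        (high if sign == 1 else low).append(y)
    return sum(high) / len(high) - sum(low) / len(low)

# Hypothetical factors, for illustration only.
factors = ["temperature", "pressure", "dwell_time"]
runs = full_factorial(factors)

def simulated_yield(run):
    # Invented relationship: two main effects plus one interaction, no dwell effect.
    t, p = run["temperature"], run["pressure"]
    return 90 + 2.0 * t - 1.0 * p + 1.5 * t * p

responses = [simulated_yield(run) for run in runs]

temp_effect = effect(runs, responses, "temperature")
interaction = effect(runs, responses, "temperature", "pressure")
dwell_effect = effect(runs, responses, "dwell_time")
```

    Eight balanced runs are enough here to separate the temperature effect, the temperature-pressure interaction, and the absence of a dwell-time effect, which one-variable-at-a-time changes would not reveal as cleanly.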

    What AI adds beyond classical DoE

    AI can help when the process is nonlinear, multivariate, and affected by hidden patterns across equipment, materials, or time. It can combine DoE results with broader production history to estimate more realistic operating windows, detect drift, and prioritize new experiments. In some cases, active learning or Bayesian optimization can propose the next most informative experiment instead of running a fixed design up front.

    But this only works if the underlying data is trustworthy enough. If sensor calibration is weak, timestamps do not align, genealogy is incomplete, or outcome labels are inconsistent, AI can amplify error rather than reduce it. A polished model on poor data is still poor evidence.

    Limits and tradeoffs

    DoE is not mandatory in every case, but it is often necessary when you need credible, explainable process learning in a regulated manufacturing context. That said, it has constraints:

    • Production disruption: Experiments consume machine time, material, engineering attention, and sometimes increase scrap risk.

    • Qualification burden: Changes to validated processes, recipes, inspection plans, or critical parameters may trigger formal review, revalidation, or additional evidence requirements.

    • Measurement dependency: Weak measurement system analysis (MSA) results or unstable test methods can invalidate the experiment.

    • Transfer risk: A model built on one machine, tool state, material lot profile, or facility may not generalize cleanly to another.

    • Objective conflicts: The best settings for yield may not be best for throughput, energy use, or downstream rework.

    • Human factors: If operators cannot execute the recommended settings consistently, the theoretical optimum may not be the operational optimum.

    So the role of DoE is not simply to feed data into AI. It is to create reliable learning conditions, expose cause-and-effect relationships, and define the safe space within which AI recommendations can be evaluated.

    How this usually fits into a brownfield environment

    In most plants, AI-driven process window optimization has to coexist with existing MES, ERP, PLM, QMS, historians, SCADA, and lab systems. Full replacement is rarely the practical starting point. In regulated, long-lifecycle environments, replacement strategies often fail because qualification effort is high, downtime is constrained, integrations are brittle, and traceability and change control obligations do not disappear just because a new platform is introduced.

    A more realistic approach is incremental:

    1. Use DoE to generate a controlled baseline on a targeted process.

    2. Link experiment plans, materials, machine states, and outcomes back to existing record systems.

    3. Train and compare models using both designed and historical data.

    4. Validate recommendations offline before limited production use.

    5. Deploy setpoint guidance or decision support first, not fully autonomous control, unless the control strategy is separately justified and governed.

    This approach is slower than a greenfield AI narrative, but it is usually more survivable operationally.

    Bottom line

    DoE is the structured foundation that makes AI-driven process window optimization more credible, explainable, and transferable. AI can accelerate learning and improve multivariable optimization, but it does not remove the need for designed experimentation, measurement discipline, validation, and controlled implementation. If those prerequisites are weak, neither DoE nor AI will produce a reliable process window.

  • What is the best way to handle serialized parts in KPI calculations?

    The best way is to calculate KPIs at the right grain and keep serialized units separate from simple quantity-based reporting when needed. In practice, that means using the serial number as the primary reporting object for unit history, while still aggregating to order, operation, work center, program, or period for management reporting.

    If you treat serialized parts like interchangeable pieces, KPI results often become misleading. A single serialized unit may pause, split, loop through rework, move between routings, or accumulate inspection and concession activity that does not fit cleanly into a basic completed-quantity model.

    Practical approach

    • Use dual KPI logic: keep unit-level metrics for serialized behavior and flow-level metrics for line or cell performance.
    • Anchor calculations to the serial number: first-pass yield, rework rate, touch time, queue time, cycle time, and genealogy-dependent quality metrics should be traceable to each serialized unit.
    • Define event rules explicitly: specify what counts as start, complete, pass, fail, hold, rework entry, rework exit, scrap, replace, merge, split, and shipment.
    • Separate physical completion from booking completion: ERP completion timestamps, MES operation signoffs, and quality disposition dates are often different. Do not assume they are interchangeable.
    • Report rolled-up KPIs carefully: aggregate serialized results only after the unit-level logic is stable and governed.

    What usually works best for common KPI types

    Yield and first-pass yield: calculate at the serialized-unit level first. A part should count once for the relevant operation or route step, with a clear rule for whether re-entry after failure changes first-pass status. If the same serial can revisit an operation, you need a policy for unique pass/fail treatment.
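
    One possible policy can be sketched as follows: each serial counts once per operation, and it is first-pass good only if its earliest recorded attempt passed. The event tuple layout, serial numbers, and operation codes are assumptions made for the example, not a required schema.

```python
def first_pass_yield(events, operation):
    """Compute first-pass yield for one operation from serial event history.

    events: iterable of (serial, operation, timestamp, result) tuples, where
    result is "pass" or "fail" and timestamps are ISO 8601 strings.
    Returns None when no serial has visited the operation.
    """
    first_attempt = {}
    for serial, op, timestamp, result in events:
        if op != operation:
            continue
        # Keep only the earliest attempt per serial; later attempts are rework.
        if serial not in first_attempt or timestamp < first_attempt[serial][0]:
            first_attempt[serial] = (timestamp, result)
    if not first_attempt:
        return None
    passed = sum(1 for _, result in first_attempt.values() if result == "pass")
    return passed / len(first_attempt)

# Hypothetical event history.
events = [
    ("SN001", "OP20", "2024-03-01T08:00", "pass"),
    ("SN002", "OP20", "2024-03-01T09:00", "fail"),
    ("SN002", "OP20", "2024-03-02T10:00", "pass"),  # re-entry after rework
    ("SN003", "OP20", "2024-03-01T11:00", "pass"),
    ("SN004", "OP20", "2024-03-01T12:00", "fail"),
    ("SN001", "OP30", "2024-03-03T08:00", "pass"),
]

fpy_op20 = first_pass_yield(events, "OP20")
```

    Under this rule SN002 counts as a first-pass failure even though it eventually passed. A different re-entry policy would change the result, which is why the rule needs to be explicit.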

    Cycle time and lead time: use serial-level start and finish timestamps, then summarize distributions, not just averages. Serialized work often has extreme variance due to inspection waits, engineering holds, nonconformance review, and outside processing. Averages alone can hide operational risk.
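
    A small sketch of why the distribution matters, using only the Python standard library. The cycle times are invented values in hours, assumed to be already derived from serial-level start and finish timestamps.

```python
from statistics import mean, median, quantiles

def summarize_cycle_times(hours):
    """Summarize serial-level cycle times (in hours) as a distribution."""
    deciles = quantiles(hours, n=10, method="inclusive")
    return {
        "mean": mean(hours),
        "median": median(hours),
        "p90": deciles[8],  # ninth cut point is the 90th percentile
        "max": max(hours),
    }

# Hypothetical data: most serials flow in about 12 hours, two sat on hold.
cycle_hours = [10, 12, 11, 13, 12, 95, 11, 14, 12, 160]
summary = summarize_cycle_times(cycle_hours)
```

    Here the mean is roughly three times the median, so reporting the mean alone would describe neither the typical unit nor the two units that waited on hold.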

    WIP and aging: treat each serial as an individual WIP object with current status, current operation, and days in state. This is often more useful than unit counts because one aging serialized assembly can matter more than many standard parts.

    Throughput: use completed serialized units for finished throughput, but distinguish between good completions, conditional releases, and units awaiting final quality disposition if that distinction matters in your environment.

    OEE-adjacent metrics: be careful. Serialized part complexity can distort simple performance assumptions. If routing content differs by serial, quantity-per-hour may not be comparable without normalization by standard hours, operation content, or planned labor.

    Scrap and rework: do not count only transaction quantities. Tie scrap and rework to serial status history and disposition events. Otherwise, replacement activity and partial recovery can produce false rates.

    Key design decisions that affect KPI accuracy

    • Granularity: serial, lot, work order, operation, machine, shift, or program.
    • Rework policy: whether repeated operation attempts count as new opportunities or as continuation of the original unit path.
    • As-built structure changes: how substitutions, removed components, and serialized subassembly replacements affect denominator and completion logic.
    • Quality state handling: whether held, deviated, concessioned, or conditionally accepted units are included in standard output KPIs.
    • Timestamp precedence: which system is authoritative for operational events versus inventory movements versus quality dispositions.

    Brownfield reality

    In most plants, serialized part data lives across MES, ERP, QMS, test systems, and sometimes spreadsheets or local databases. The best KPI method depends on whether serial events are synchronized consistently across those systems. If they are not, KPI disputes usually reflect data model and process-control problems, not reporting problems.

    A full rip-and-replace is rarely the best answer in regulated, long-lifecycle environments. It often fails because qualification effort, validation cost, downtime risk, integration complexity, and change-control burden are higher than expected. A more realistic path is to establish a governed event model for serialized units, map source-system ownership clearly, and improve calculation logic incrementally.

    What to avoid

    • Do not mix serialized and non-serialized production in one denominator without adjustment.
    • Do not use ERP completion transactions alone as proof of actual process completion.
    • Do not let operators or analysts infer KPI rules differently by program or shift.
    • Do not collapse rework loops unless you are doing it intentionally and documenting the tradeoff.

    Bottom line

    The best method is to model serialized parts as individually traceable units, calculate quality and time-based KPIs from serial event history, and then roll those metrics up under controlled rules. If your event definitions, system interfaces, or master data are inconsistent, the KPI will not be reliable regardless of the dashboard.

  • What are the essential stages in an aerospace non-conformance workflow?

    The essential stages are generally the same across aerospace manufacturers, even though names, approvals, and system steps vary by site.

    A practical non-conformance workflow usually includes:

    1. Detection and identification
      Record the non-conformance when it is found, whether during receiving, in-process inspection, final inspection, test, or field feedback. The record should identify the part, serial or lot, operation, requirement that was not met, how it was detected, and who found it.

    2. Containment
      Segregate or digitally block affected material to prevent unintended use. This often includes hold status, location control, and checks for suspect stock, work in process, tooling, documentation, or related assemblies. If traceability is weak, containment becomes slower and broader.

    3. Initial review and risk assessment
      Confirm the issue is real, assess immediate impact, and determine routing. Not every non-conformance needs the same path. The workflow often depends on severity, repeat history, whether the condition affects fit, form, function, safety, contractual requirements, or certification-related data, and whether supplier involvement is required.

    4. Documentation and evidence collection
      Attach inspection results, measurements, photos, operator comments, work instructions, revision levels, machine or process data where available, and affected order or traveler references. In regulated environments, weak evidence trails create rework in the investigation and slow closure.

    5. Disposition decision
      Determine what to do with the non-conforming item. Common outcomes include rework, repair if allowed, use-as-is only where authorized, return to supplier, or scrap. In many aerospace environments, this step requires formal review authority, often including a material review board (MRB) or designated engineering and quality roles. The exact authority model is site- and customer-dependent.

    6. Execution of disposition
      Carry out the approved action under controlled instructions. If rework or repair is required, the revised route, labor reporting, parts consumption, and document revisions need to be controlled. Informal fixes are a common failure mode because they break traceability and make as-built history unreliable.

    7. Verification and acceptance
      Inspect or test the item after disposition to confirm the result meets the approved criteria. This may involve repeat inspection, engineering sign-off, updated dimensional results, or downstream checks if the non-conformance affected assemblies or paperwork.

    8. Root cause and corrective action
      For significant, recurring, or systemic issues, the workflow should extend beyond item disposition into root cause analysis and corrective action. This is where process, training, supplier, equipment, document control, or planning issues are addressed. Not every isolated defect needs a full CAPA, but recurring escape patterns usually do.

    9. Closure and record retention
      Close the record only when approvals, evidence, disposition execution, and verification are complete. The retained record should support future audits, product history review, trend analysis, and linkage to related deviations, supplier NCRs, CAPAs, or changes.
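
    The stages above can be sketched as a simple state machine that refuses to skip steps. This is a simplified, linear illustration: the stage names, the `NonConformance` class, and the record ID are hypothetical, and a real workflow would add branching, approvals, and evidence checks.

```python
from enum import Enum

class Stage(Enum):
    DETECTED = 1
    CONTAINED = 2
    REVIEWED = 3
    DOCUMENTED = 4
    DISPOSITIONED = 5
    EXECUTED = 6
    VERIFIED = 7
    CORRECTIVE_ACTION = 8
    CLOSED = 9

# Allowed forward transitions; corrective action is optional after verification.
ALLOWED = {
    Stage.DETECTED: {Stage.CONTAINED},
    Stage.CONTAINED: {Stage.REVIEWED},
    Stage.REVIEWED: {Stage.DOCUMENTED},
    Stage.DOCUMENTED: {Stage.DISPOSITIONED},
    Stage.DISPOSITIONED: {Stage.EXECUTED},
    Stage.EXECUTED: {Stage.VERIFIED},
    Stage.VERIFIED: {Stage.CORRECTIVE_ACTION, Stage.CLOSED},
    Stage.CORRECTIVE_ACTION: {Stage.CLOSED},
    Stage.CLOSED: set(),
}

class NonConformance:
    def __init__(self, record_id):
        self.record_id = record_id
        self.stage = Stage.DETECTED
        self.history = [Stage.DETECTED]

    def advance(self, target):
        """Move to the target stage, or raise if the transition is not allowed."""
        if target not in ALLOWED[self.stage]:
            raise ValueError(
                f"{self.stage.name} -> {target.name} is not an allowed transition"
            )
        self.stage = target
        self.history.append(target)

nc = NonConformance("NCR-0001")  # hypothetical record ID
try:
    nc.advance(Stage.CLOSED)  # attempts to close without containment or verification
except ValueError as err:
    rejected = str(err)

for stage in (Stage.CONTAINED, Stage.REVIEWED, Stage.DOCUMENTED,
              Stage.DISPOSITIONED, Stage.EXECUTED, Stage.VERIFIED, Stage.CLOSED):
    nc.advance(stage)
```

    Keeping the transition table as data makes the status controls reviewable and leaves a stage history that can be retained with the record.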

    What makes aerospace different

    In aerospace, the stages above are not just administrative checkpoints. They are tied to product traceability, approved authority, and controlled changes. A fast workflow that cannot prove revision level, disposition approval, execution history, and final verification is usually not good enough.

    It is also common for the workflow to branch depending on whether the issue involves internal production, a supplier, a customer-returned unit, or maintenance and repair activity. Serialized products, critical characteristics, and long record-retention expectations increase the need for disciplined evidence handling.

    System reality in brownfield plants

    Most sites do not run this workflow in one clean system. Non-conformance data often spans QMS, MES, ERP, PLM, inspection software, and email or spreadsheets. That is workable, but only if ownership, data handoffs, and status controls are explicit.

    Full replacement is often not the safest answer in regulated aerospace environments. It can fail because of validation effort, qualification burden, downtime risk, integration complexity, and the need to preserve historical traceability across long equipment and product lifecycles. In many plants, the better approach is controlled coexistence: tighten the workflow, define system-of-record boundaries, and improve interfaces before attempting broader platform change.

    Common failure modes

    • Containment is recorded, but material is still physically accessible.

    • Disposition is approved, but execution instructions are ambiguous or not version-controlled.

    • Rework is completed, but the as-built or traveler record is not updated.

    • Supplier-related NCRs are disconnected from receiving, PO, or lot traceability.

    • Root cause is treated as optional, so repeat defects continue.

    • Closure occurs before verification evidence is complete.

    In short, the essential stages are fairly consistent across aerospace manufacturers. The exact workflow, approvals, and system routing, however, depend on product risk, customer requirements, organizational authority, and how well your existing quality and execution systems are integrated.

  • How fast should an aerospace organization be able to identify affected serial numbers?

    There is no single universal time standard that applies to every aerospace organization and every event. But operationally, an organization should be able to identify potentially affected serial numbers in minutes to a few hours for a high-risk quality issue, not days.

    If it takes multiple days to determine which serialized units consumed a suspect part, process, software revision, inspection result, or supplier lot, that usually indicates a traceability gap, an integration gap, or both.

    What “fast enough” usually means

    The practical benchmark depends on the severity of the issue and the quality of the underlying genealogy data:

    • Immediate to under 1 hour: for suspected escape conditions, containment decisions, customer notifications, grounded asset impact, or any situation where ongoing production or field exposure must be assessed quickly.

    • Same shift: for most internal quality investigations where the organization needs to quarantine WIP, stock, or shipped units before the problem propagates.

    • Within 24 hours: may be workable for lower-risk investigations, but it is generally too slow if the issue could affect flight hardware, critical characteristics, or shipped product.

    The real expectation is not a specific number of minutes. It is the ability to produce a defensible, repeatable, and auditable affected population quickly enough to support containment and decision-making.

    What determines the answer

    Speed depends heavily on plant reality:

    • whether serial numbers are linked to lot, batch, work order, routing, and operator/inspection records

    • whether part substitutions, rework, splits, merges, and outside processing are captured correctly

    • whether ERP, MES, QMS, and PLM records agree on revision and as-built status

    • whether data entry is timely and controlled, rather than reconstructed after the fact

    • whether genealogy queries have been validated and tested before an actual event

    An organization may believe it has traceability because records exist somewhere, but if the team must manually reconcile spreadsheets, travelers, ERP transactions, supplier certifications, and inspection logs to identify impact, then response time will be inconsistent and error-prone.

    Brownfield reality

    In many aerospace environments, the answer is limited by coexistence with legacy systems. Serial traceability often spans older MES instances, ERP customizations, paper records, supplier portals, and quality systems that were never designed as one coherent genealogy model.

    That is why full replacement is often not the practical answer. In regulated, long-lifecycle environments, replacing execution and quality systems can trigger major qualification effort, validation cost, downtime risk, retraining burden, and new integration failure modes. Many organizations get better results by strengthening traceability across existing systems first, then narrowing manual handoffs and evidence gaps over time.

    What good looks like

    A mature organization can usually do all of the following without a special data-recovery project:

    • identify all suspect serial numbers, not just the obvious work order population

    • show why each serial number is in or out of scope

    • separate shipped, WIP, stock, scrapped, and reworked units

    • trace upstream to supplier lot or process condition and downstream to customer-delivered units

    • re-run the analysis consistently if scope changes
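
    A repeatable impact query of this kind can be sketched as a breadth-first walk over consumption records. The `consumed_into` mapping, the item IDs, and the status values are hypothetical. In practice this data would be assembled from MES, ERP, and supplier records, and the result is only as complete as those links.

```python
from collections import deque

def affected_units(consumed_into, start):
    """Walk downstream from a suspect item and return {item: path from start}.

    consumed_into maps an item ID to the IDs of the items that consumed it.
    The stored path explains why each item is in scope.
    """
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        item = queue.popleft()
        for child in consumed_into.get(item, []):
            if child not in paths:
                paths[child] = paths[item] + [child]
                queue.append(child)
    del paths[start]  # the suspect item itself is the cause, not a result
    return paths

# Hypothetical genealogy: supplier lots -> component serials -> assemblies.
consumed_into = {
    "LOT-A": ["BRKT-001", "BRKT-002"],
    "LOT-B": ["BRKT-003"],
    "BRKT-001": ["ASSY-100"],
    "BRKT-002": ["ASSY-101"],
    "BRKT-003": ["ASSY-101"],
}
status = {"BRKT-001": "consumed", "BRKT-002": "consumed",
          "ASSY-100": "shipped", "ASSY-101": "WIP"}

paths = affected_units(consumed_into, "LOT-A")

# Separate the affected population by current status.
by_status = {}
for item in paths:
    by_status.setdefault(status.get(item, "unknown"), []).append(item)
```

    Because the query is a function of the recorded links, it can be re-run unchanged when the suspect scope widens, for example by starting from LOT-B as well.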

    If the organization can only provide a partial list quickly and needs days to confirm exceptions, alternates, or rework paths, then the initial answer may be useful for containment but not yet reliable enough for final disposition.

    Bottom line

    The right target is usually minutes to a few hours for high-consequence issues. Anything slower than that raises operational and quality risk. But the achievable speed depends on data readiness, genealogy completeness, system interoperability, and whether traceability processes have been tested under real conditions. Speed without defensible evidence is not enough.

  • Can automation help with NIST 800-53 continuous monitoring?

    Yes, automation can significantly support NIST 800-53 continuous monitoring, but only for well-defined portions of the process. It cannot by itself achieve compliance or eliminate the need for governance, risk assessment, human review, and disciplined change control. In industrial and regulated environments, automation is most useful for structured data collection, evidence management, and repeatable checks.

    Where automation actually helps

    In a brownfield industrial environment with mixed OT/IT, automation is typically effective in these areas:

    • Asset discovery and status tracking: Periodic or near real-time discovery of servers, workstations, network devices, and some OT assets, feeding configuration management and inventory required by multiple NIST 800-53 controls.
    • Configuration and baseline checks: Automated comparison of device configurations, group policies, firewall rules, and key system parameters against approved baselines, then flagging drift for review.
    • Patch and vulnerability status: Scanning IT assets (and some OT assets where safe) for missing patches and vulnerabilities, generating prioritized lists and trend reports aligned to risk assessments.
    • Log collection and correlation: Centralizing logs from servers, network gear, security tools, and where possible industrial control systems, then automating correlation rules for known indicators and policy violations.
    • User access monitoring: Automated reporting on account changes, privileged access use, stale accounts, and multi-factor authentication coverage, with alerts on policy violations.
    • Evidence capture and retention: Automatically attaching logs, screenshots, configuration exports, and scan results to specific controls or policies in a repository to support audits and internal reviews.
    • Dashboarding and reporting: Generating periodic control health dashboards and exceptions lists, so that human reviewers can focus on interpretation and decisions rather than manual data collection.
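
    As an illustration of the configuration and baseline check, the sketch below compares a collected configuration against an approved baseline and reports drift for human review. The setting names and values are invented, and real tools would handle nested and vendor-specific formats.

```python
ABSENT = "<absent>"  # placeholder for a setting missing on one side

def find_drift(baseline, actual):
    """Compare flat key/value configurations.

    Returns {setting: (expected, found)} for every setting that differs,
    including settings present on only one side.
    """
    drift = {}
    for key in baseline.keys() | actual.keys():
        expected = baseline.get(key, ABSENT)
        found = actual.get(key, ABSENT)
        if expected != found:
            drift[key] = (expected, found)
    return drift

# Hypothetical approved baseline and collected configuration.
baseline = {"ssh_root_login": "no", "ntp_server": "10.0.0.5", "audit_logging": "enabled"}
actual = {"ssh_root_login": "yes", "ntp_server": "10.0.0.5", "telnet": "enabled"}

drift = find_drift(baseline, actual)
```

    The output is a list of differences to review, not a remediation. Any correction would still go through change control.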

    What automation cannot reliably do

    Several parts of NIST 800-53 continuous monitoring do not lend themselves to full automation, particularly in regulated manufacturing:

    • Risk acceptance and prioritization: Deciding which vulnerabilities or control gaps to accept, defer, or fix requires business, safety, and regulatory judgment.
    • Control design and tailoring: Selecting, tailoring, and scoping controls for OT and safety-critical systems is a design activity, not a monitoring task.
    • Evaluating process effectiveness: Determining whether an incident response, change control, or supplier management process is actually effective needs qualitative review, not just metrics.
    • Interpreting OT-specific constraints: Automated tools typically lack context on qualification, validation, and production constraints that drive why certain patches or architectural changes cannot be applied quickly.
    • Compliance judgments: Automation can provide evidence and metrics, but it cannot make defensible statements about compliance status on its own.

    Key dependencies and constraints in industrial environments

    The usefulness of automation for NIST 800-53 continuous monitoring depends heavily on your existing landscape and process maturity:

    • System diversity and age: Legacy PLCs, DCSs, and older HMIs may not support modern agents, APIs, or secure logging. Passive monitoring, network-based discovery, and selective integration are often the only viable options.
    • Integration quality: Automated monitoring tools must coexist with MES, ERP, historian, and QMS systems. Partial integration is common. Gaps in interfaces, identity management, or data models will limit what can be automated.
    • Downtime and validation constraints: Deploying agents, updating security tooling, or enabling new logging on production systems may trigger requalification or validation and cannot always be done on the vendor’s schedule. This slows rollout and sometimes forces lighter-touch approaches.
    • Data quality and normalization: Automation is only as good as the asset inventory, network diagrams, and configuration baselines it draws from. Incomplete or stale data will produce misleading dashboards and alerts.
    • Change control: Any automated change or remediation must go through established change control, with documented testing and rollback plans, especially in validated and safety-critical environments.

    How automation maps to NIST 800-53 continuous monitoring activities

    NIST SP 800-53 (notably control CA-7, Continuous Monitoring) and associated guidance such as NIST SP 800-137 describe a continuous monitoring strategy built around defined metrics, event-driven updates, and periodic assessments. Automation can support several of those steps:

    • Defining key parameters and metrics: Once you decide what to measure (e.g., patch latency, number of unapproved configurations, account anomalies), automation can collect the raw data and compute metrics.
    • Ongoing security and configuration checks: Automated scans and configuration audits provide near real-time or scheduled checks of selected controls, especially technical access control, configuration management, and audit logging controls.
    • Event-driven updates: Triggers such as new high-severity vulnerabilities, significant configuration changes, or security events can initiate automated workflows that notify control owners and collect additional evidence.
    • Evidence packaging for assessments: Automation can pre-assemble evidence for periodic control assessments, reducing manual document hunting and screen captures.
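
    For example, a patch latency metric could be computed along these lines once the organization has decided to track it. The record fields, asset names, dates, and the 30-day threshold are assumptions made for the sketch.

```python
from datetime import date
from statistics import median

def patch_metrics(records, as_of, max_days):
    """Compute patch latency in days per asset.

    Unpatched assets (installed is None) accrue latency up to the as_of date
    and are listed as overdue once they exceed max_days.
    """
    latency = {
        r["asset"]: ((r["installed"] or as_of) - r["released"]).days
        for r in records
    }
    overdue = sorted(
        r["asset"] for r in records
        if r["installed"] is None and latency[r["asset"]] > max_days
    )
    return {"median_days": median(latency.values()), "overdue": overdue}

# Hypothetical patch records.
records = [
    {"asset": "srv-01", "released": date(2024, 1, 10), "installed": date(2024, 1, 20)},
    {"asset": "srv-02", "released": date(2024, 1, 10), "installed": None},
    {"asset": "fw-01", "released": date(2024, 2, 1), "installed": date(2024, 2, 15)},
]

metrics = patch_metrics(records, as_of=date(2024, 3, 1), max_days=30)
```

    The calculation is mechanical. Choosing the threshold and deciding what to do about an overdue asset are not.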

    However, defining the monitoring strategy, selecting metrics, approving thresholds, and interpreting outcomes remain human responsibilities.

    Tradeoffs and typical failure modes

    Introducing automation into NIST 800-53 continuous monitoring in regulated manufacturing comes with predictable tradeoffs and risks:

    • Too much scope, not enough depth: Attempting to automate monitoring for every control at once often leads to shallow coverage and unreliable alerts. It is usually more effective to prioritize a subset of high-impact controls.
    • Alert fatigue: Poorly tuned tools generate noise that is ignored, effectively degrading monitoring. Thresholds and rules must be iteratively tuned to the actual environment.
    • Unvalidated changes to production systems: Automated remediation or configuration pushes can unintentionally impact production or validated states if not strictly controlled and tested.
    • Overreliance on IT-centric tools for OT: Tools built for corporate IT may misinterpret OT traffic or lack awareness of process-critical dependencies. Passive, read-only deployments are often the safest starting point for OT networks.
    • Assuming automation equals compliance: Dashboards showing “green” metrics do not replace formal risk assessments, documented justifications, or independent reviews required in many regulated contexts.

    Practical approach to adopting automation for continuous monitoring

    A pragmatic approach for industrial organizations is incremental and risk-based:

    1. Start from existing inventories and controls: Use current asset lists, network diagrams, and control matrices as the foundation. Identify where manual monitoring is most fragile or labor-intensive.
    2. Select a small set of high-value use cases: Common early wins include automated asset discovery on IT/DMZ segments, centralized logging for key servers and firewalls, and basic configuration drift detection for domain controllers and jump hosts.
    3. Separate OT and IT strategies: For core OT networks, consider passive monitoring and vendor-supported solutions, and avoid intrusive scanning unless tested and explicitly approved.
    4. Align with change control and validation: Treat monitoring tool deployment and configuration as controlled changes, with documented testing, rollback, and impact assessment.
    5. Define owners and review cadences: Make it explicit who reviews automated outputs, how often, and how findings feed into risk registers, CAPA, or similar processes.
    6. Iterate based on actual outcomes: Use early deployments to refine rules, thresholds, and data flows before scaling to additional plants or systems.

    Why full replacement strategies rarely work

    Some organizations try to replace existing monitoring, logging, and configuration tools with a single new platform in the name of NIST alignment. In aerospace-grade and other highly regulated environments, this often fails or stalls because:

    • Qualification and validation burden: Replacing a working tool can trigger system requalification, documentation rewrites, and revalidation that outweigh potential benefits.
    • Downtime and cutover risk: Monitoring is tightly coupled with production and safety. A mismanaged cutover can disrupt operations or leave blind spots.
    • Integration complexity: Existing MES, historian, QMS, and ERP interfaces are usually tailored over years. Rebuilding these integrations for a new platform is costly and risky.
    • Traceability and change history: Long equipment lifecycles mean historical logs and evidence must remain accessible. Wholesale replacement can complicate traceability unless carefully staged.

    Layered, coexistence-focused strategies are generally safer: augment existing capabilities with targeted automation rather than tearing everything out in one step.

    Bottom line

    Automation can substantially improve the efficiency, repeatability, and coverage of NIST 800-53 continuous monitoring activities, particularly for technical controls and evidence management. Its real value depends on careful scoping, integration with existing OT/IT systems, alignment with change control and validation practices, and clear human ownership of risk decisions and compliance judgments. It should be treated as an enabler, not a guarantee of compliance.

  • Is AS9102 mandatory for all aerospace first articles?

    AS9102 is not automatically mandatory for every aerospace first article. It becomes mandatory when it is explicitly required by one or more of the following:

    • Customer contract or purchase order terms
    • Customer quality clauses or supplier quality requirements
    • Prime or Tier 1 flowdown requirements (e.g., via SQAR, Q-notes, S-specs)
    • Your own QMS procedures that specify AS9102 as the standard FAI method

    AS9102 is a widely adopted FAI standard in aerospace, but it is a standardized method, not a universal legal mandate. Some OEMs and defense programs require fully compliant AS9102 Forms 1, 2, and 3 for a defined scope (e.g., all flight hardware, safety-critical parts, or parts with key characteristics). Others accept an equivalent FAI that is structured differently, as long as the information content meets their requirements.

    In practice, this connects to digital AS9102 FAI when teams need to turn the answer into repeatable execution habits.

    When AS9102 is typically required

    • New part introduction on aerospace/defense programs that reference AS9100 and AS9102
    • First build from a new supplier, site, or production line
    • Configuration changes affecting fit, form, or function, when customer FAI re-trigger rules apply
    • After major process changes (new machine, facility move, new manufacturing route) when required by contract or QMS

    In these situations, the contract or OEM quality specification often calls out AS9102 explicitly, or references it as the default unless a different FAI format is agreed in writing.

    When you might not use AS9102 format

    There are common cases where a strict AS9102 form set is not used, even in aerospace:

    • Internal FAIs for process validation where your QMS defines a different template but equivalent content
    • Customer-specific FAI formats that differ from AS9102 (e.g., proprietary forms or portal-based workflows such as Net-Inspect configurations)
    • Legacy programs started before AS9102 adoption, still running under older FAI conventions
    • Non-flight or non-critical parts where the customer has not flowed down AS9102 or any formal FAI requirement

    In these scenarios, what is mandatory is whatever your customer contract, applicable quality specs, and internal procedures say. You can be audited against your commitments, but not automatically against AS9102 if it is never invoked.

    Brownfield reality and coexistence with other requirements

    In most established aerospace plants, you will see multiple FAI regimes coexisting:

    • Some programs requiring strict AS9102 compliance
    • Some legacy or commercial programs using older or simplified FAI formats
    • Some customer portals (e.g., Net-Inspect or OEM tools) that map to AS9102 concepts but use different data structures

    This mix is typical in brownfield environments with long product lifecycles and many OEMs. Attempts to force a single, universal FAI format can run into resistance due to contractual constraints, qualification burden, and revalidation cost. Often the practical approach is to standardize data content and traceability while still producing the specific form or portal output each customer requires.

    Key tradeoffs and constraints

    • Compliance risk: If a contract or quality clause calls out AS9102, deviating from that format without written customer approval is a risk and can surface in audits.
    • Internal consistency: If your QMS says “we perform AS9102 FAIs” in broad terms, auditors will expect evidence that FAIs follow the standard, not a patchwork of partial forms.
    • Operational burden: Running AS9102 for every low-risk, non-critical part can add paperwork without commensurate value, especially in high-mix, low-volume environments.
    • System limitations: Legacy MES, ERP, or PLM may not natively support AS9102 structures, so digital FAI often involves bolt-on tools or manual spreadsheets unless you invest in integration and validation.

    Any change from non-AS9102 FAI to AS9102 (or vice versa) across programs should go through formal change control, including updates to procedures, training, and, where relevant, validated systems.

    Practical guidance

    To determine whether AS9102 is mandatory for a given first article:

    1. Review the contract, PO, and referenced quality clauses for explicit AS9102 or FAI language.
    2. Check the customer’s supplier quality manual / specifications for FAI expectations and re-trigger rules.
    3. Confirm your internal QMS and work instructions: do they specify AS9102, an equivalent FAI, or program-specific rules?
    4. Align with your customer quality representative before deviating from AS9102 on parts where expectations are unclear.
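
    The review steps above can be sketched as a small decision helper. This is only an illustration: `FaiContext` and `as9102_status` are hypothetical names, and the flags are assumed to be filled in by a person after reading the contract, flowdowns, and QMS procedures.

```python
from dataclasses import dataclass


@dataclass
class FaiContext:
    """Hypothetical yes/no findings from the document review (steps 1 to 3)."""
    contract_or_po_requires_as9102: bool = False
    quality_clause_requires_as9102: bool = False
    flowdown_requires_as9102: bool = False
    internal_qms_requires_as9102: bool = False
    expectations_unclear: bool = False


def as9102_status(ctx: FaiContext) -> str:
    """Return 'mandatory', 'confirm_with_customer', or 'not_invoked'."""
    invoked_by = [
        ctx.contract_or_po_requires_as9102,
        ctx.quality_clause_requires_as9102,
        ctx.flowdown_requires_as9102,
        ctx.internal_qms_requires_as9102,
    ]
    # Any single source that invokes AS9102 makes it a requirement.
    if any(invoked_by):
        return "mandatory"
    # Step 4: when nothing invokes it but expectations are unclear, ask first.
    if ctx.expectations_unclear:
        return "confirm_with_customer"
    return "not_invoked"


print(as9102_status(FaiContext(internal_qms_requires_as9102=True)))  # mandatory
```

    The helper does not replace the review itself. It only makes explicit that one invoking source is enough, and that unclear expectations should be resolved with the customer rather than assumed.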

    AS9102 is widely accepted because it standardizes expectations and evidence. But it is only mandatory where it has been made a requirement by contract, customer flowdown, or your own documented processes.

  • Can I build AI models directly on my MES database without a data warehouse?

    Yes, you can sometimes build AI models directly on an MES database without a data warehouse. For limited, read-only, non-critical analysis, it may be technically possible.

    But as a general production approach, it is usually not the right default in regulated manufacturing environments.

    In practice, this connects to data integrity, version control and audit when teams need to turn the answer into repeatable execution habits.

    The issue is not whether it is possible. The issue is whether the MES database is the right place to source, govern, contextualize, validate, and retain the data needed for reliable models without creating operational or compliance risk.

    Why direct MES access is often a bad default

    • MES databases are optimized for execution, not analytics. Query patterns for model training and feature generation can compete with shop floor transactions, reporting jobs, and integrations. In brownfield plants, that can create performance instability at exactly the wrong time.

    • Raw MES data is rarely analytics-ready. It often contains missing context, inconsistent timestamps, event duplication, late-arriving records, code-value variations, and plant-specific workarounds. If the data model reflects years of operational exceptions, the model will learn those inconsistencies too.

    • You usually need data beyond MES. Useful manufacturing AI often depends on ERP, QMS, PLM, historian, maintenance, lab, inspection, and sometimes manual records. MES alone may not contain the full causal chain for quality, throughput, delay, or scrap outcomes.

    • Traceability and reproducibility become harder. If source records can change after transactions are corrected, backfilled, or reprocessed, you can struggle to prove which data version trained which model. That matters for change control, investigation, and revalidation.

    • Security and access boundaries get messy. Direct connections from data science tools or AI platforms into a production MES database can expand attack surface, increase privilege complexity, and blur IT and OT responsibilities.

    • Validation effort rises. In regulated settings, the more tightly the model depends on live transactional structures and brittle custom joins, the harder it is to validate behavior and manage changes safely.

    When direct MES-based modeling can be reasonable

    It can be reasonable if all of the following are true:

    • You are using a read replica, reporting replica, or export, not the primary production database.

    • The use case is narrow, such as exploratory analysis, anomaly screening, or a pilot on one line or process area.

    • The data needed is mostly contained in MES and does not require heavy cross-system reconciliation.

    • You have stable identifiers, timestamps, revision handling, and event semantics.

    • You can document data lineage, model inputs, refresh logic, and change control.

    • The model is advisory, not making autonomous release, quality, or safety decisions.

    Even then, most teams end up creating a curated analytical layer because direct use of MES data becomes hard to maintain as scope grows.

    What you need instead of a full warehouse

    A data warehouse is not the only option. If the concern is cost, time, or architecture overhead, there are middle paths:

    • Read replicas for isolated analytical workloads

    • Curated data marts for specific use cases like yield prediction or cycle time variance

    • Lakehouse patterns if you need lower-cost storage for a mix of structured and unstructured data

    • Feature stores or governed model input layers if multiple models will reuse the same signals

    • Historian plus MES plus QMS extracts for process-focused analytics

    The practical requirement is not a warehouse by name. It is a governed, query-safe, version-aware data layer that does not put the execution system at risk.
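
    One way to make a data layer "version-aware" is to fingerprint each frozen extract and store that fingerprint with the model run. The sketch below assumes the extract fits in memory as a list of dictionaries; `snapshot_fingerprint` and the field names are hypothetical and not part of any MES product.

```python
import hashlib
import json


def snapshot_fingerprint(rows):
    """Hash a frozen extract so a model run can record what it trained on."""
    # Serialize each row with sorted keys, then sort the rows, so the same
    # content gives the same hash regardless of the order it was returned in.
    canonical = sorted(json.dumps(row, sort_keys=True, default=str) for row in rows)
    digest = hashlib.sha256()
    for line in canonical:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


extract = [
    {"order": "WO-1", "step": 10, "result": "pass"},
    {"order": "WO-2", "step": 10, "result": "fail"},
]
# Same records after a later correction to WO-2 in the source system.
corrected = [
    {"order": "WO-1", "step": 10, "result": "pass"},
    {"order": "WO-2", "step": 10, "result": "pass"},
]

print(snapshot_fingerprint(extract) == snapshot_fingerprint(list(reversed(extract))))  # True
print(snapshot_fingerprint(extract) == snapshot_fingerprint(corrected))  # False
```

    Storing the fingerprint next to the model version gives a simple answer to "which data trained this model" even when source records are corrected or backfilled later.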

    Brownfield reality

    In many plants, the MES is only one piece of a mixed vendor stack with custom interfaces, manual workarounds, and long-lived equipment. That matters because AI projects often fail when teams assume the MES database is a complete and clean system of record. It usually is not.

    Full replacement of MES, ERP, PLM, or QMS just to make AI easier is often the wrong move in regulated, long lifecycle environments. Replacement programs can trigger major qualification work, validation cost, downtime risk, interface rewrites, and traceability disruption. A coexistence approach is usually more realistic: extract and govern the data you need while leaving execution systems in place.

    Practical decision rule

    If the use case is small, read-only, and non-critical, direct access to a replica of MES data may be acceptable.

    If the use case will influence production decisions at scale, combine multiple systems, or need repeatable validation and auditability, build a governed analytical layer first. That can be modest in scope, but it should exist.
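
    The decision rule can be written down as a small check. This is a sketch under stated assumptions: `UseCase` and `recommend_data_path` are hypothetical names, and the flags are a simplification of the criteria listed in this answer.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    reads_from_replica: bool   # replica or export, not the primary database
    read_only: bool
    advisory_only: bool        # no autonomous release, quality, or safety decisions
    pilot_scope: bool          # one line or process area
    needs_other_systems: bool  # ERP, QMS, PLM, historian, and so on
    needs_auditability: bool   # repeatable validation and audit trail


def recommend_data_path(uc: UseCase) -> str:
    if not (uc.reads_from_replica and uc.read_only):
        return "use a replica or export, not the live MES database"
    needs_governed_layer = (
        uc.needs_other_systems
        or uc.needs_auditability
        or not uc.advisory_only
        or not uc.pilot_scope
    )
    if needs_governed_layer:
        return "build a governed analytical layer first"
    return "direct access to an MES replica may be acceptable"


pilot = UseCase(True, True, True, True, False, False)
print(recommend_data_path(pilot))  # direct access to an MES replica may be acceptable
```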

    So the short answer is yes, but usually not directly against the live MES database, and usually not without some intermediate data architecture.

  • How can we safely introduce custom KPIs without breaking comparability?

    You can introduce custom KPIs without losing comparability, but only if you treat KPIs as controlled objects: versioned, governed, and validated against a stable core. In regulated and multi-plant environments, the main goal is to add insight without breaking trend lines, benchmarks, and auditability.

    1. Establish a non-negotiable core KPI set

    Start by defining a small set of enterprise KPIs that must remain comparable across sites, lines, and time periods (for example: OEE, NPT, first-pass yield, scrap rate, on-time delivery, defect rate). Treat these as your reference frame.

    In practice, this connects to ISO 22400 KPI governance when teams need to turn the answer into repeatable execution habits.

    • Publish a controlled specification for each core KPI: purpose, scope, formula, timebase, data sources, inclusions/exclusions, and known limitations.
    • Put core KPIs under formal change control (similar to procedures): any change triggers impact assessment, backward compatibility review, and communication.
    • Make clear that custom KPIs may extend but not redefine this core set.
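
    As an illustration, a controlled specification and the "extend but not redefine" rule could be modeled as below. `KpiSpec` and `KpiCatalog` are hypothetical names for this sketch; a real catalog would live in your document control or analytics tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KpiSpec:
    """Controlled specification fields taken from the list above."""
    name: str
    version: str
    purpose: str
    formula: str
    timebase: str
    data_sources: tuple = ()
    exclusions: tuple = ()
    limitations: tuple = ()


class KpiCatalog:
    def __init__(self):
        self._specs = {}          # (name, version) -> KpiSpec
        self._core_names = set()

    def register_core(self, spec):
        self._add(spec)
        self._core_names.add(spec.name)

    def register_custom(self, spec):
        # Custom KPIs may extend the core set but must not reuse a core name.
        if spec.name in self._core_names:
            raise ValueError(f"{spec.name} is a core KPI and cannot be redefined")
        self._add(spec)

    def _add(self, spec):
        key = (spec.name, spec.version)
        # A published version is immutable; changes need a new version.
        if key in self._specs:
            raise ValueError(f"{spec.name} v{spec.version} is already published")
        self._specs[key] = spec


catalog = KpiCatalog()
catalog.register_core(KpiSpec(
    name="first_pass_yield", version="1.0",
    purpose="Share of units passing without rework",
    formula="units_passed_first_time / units_started",
    timebase="shift",
))
```

    Freezing the dataclass and rejecting duplicate versions means any change to a definition has to appear as a new, reviewable version.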

    2. Treat custom KPIs as derived, not alternative, views

    Where possible, define custom KPIs as derived from core KPIs or from the same atomically defined data elements used by the core set.

    • Prefer formulas like “Custom KPI = function(core KPIs, standard data elements)” instead of introducing new, opaque calculations.
    • For local nuances (e.g., special test steps, rework categories), define custom KPIs as filtered or segmented views (e.g., NPT for a specific product family) rather than totally new constructs.
    • Document the lineage explicitly: what they depend on, and how they differ from the core KPI they are closest to.

    This preserves comparability because everyone can still reconcile local metrics back to the agreed core definitions.

    3. Standardize definitions and metadata

    Comparability breaks down more often because of ambiguous definitions than because of the math. To avoid that:

    • Use a shared data dictionary for KPI components (events, states, product families, defect codes, shift definitions, calendar rules).
    • Attach consistent metadata to every KPI: owner, formula, version, source systems, applicable sites/lines, intended decision use, and limitations.
    • Ensure terminology aligns with your MES/ERP/QMS master data; avoid plant-specific labels in enterprise KPIs.

    In brownfield environments, this often means mapping local codes and event types into a canonical layer before computing cross-plant metrics.
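
    A minimal sketch of such a canonical mapping, using defect codes as the example. The plant names, local codes, and canonical codes are invented for illustration.

```python
from collections import Counter

# Hypothetical mapping from (plant, local defect code) to an enterprise code.
CANONICAL_DEFECT_CODES = {
    ("plant_a", "SCR-01"): "SCRATCH",
    ("plant_a", "DNT"): "DENT",
    ("plant_b", "12"): "SCRATCH",
    ("plant_b", "17"): "DENT",
}


def canonical_defect_counts(events):
    """Count defects by canonical code and return unmapped events separately."""
    counts = Counter()
    unmapped = []
    for event in events:
        key = (event["plant"], event["defect_code"])
        canonical = CANONICAL_DEFECT_CODES.get(key)
        if canonical is None:
            # Surface mapping gaps instead of silently dropping records.
            unmapped.append(event)
        else:
            counts[canonical] += 1
    return counts, unmapped


events = [
    {"plant": "plant_a", "defect_code": "SCR-01"},
    {"plant": "plant_b", "defect_code": "12"},
    {"plant": "plant_b", "defect_code": "17"},
    {"plant": "plant_b", "defect_code": "99"},
]
counts, unmapped = canonical_defect_counts(events)
print(dict(counts))   # {'SCRATCH': 2, 'DENT': 1}
print(len(unmapped))  # 1
```

    Returning unmapped events explicitly keeps mapping gaps visible, which matters more for comparability than the counting itself.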

    4. Use a KPI governance model

    Custom KPIs should not appear via ad-hoc report edits in each plant. Create a lightweight but real governance process:

    • KPI request: Business owner submits a structured request describing problem, proposed KPI, and decision use.
    • Design review: Central cross-functional team (operations, quality, IT/data) checks for overlap with existing KPIs, core formula conflicts, and data feasibility.
    • Classification: Label as enterprise-standard, site-standard, or experimental/pilot, with different expectations for validation and documentation.
    • Approval & change control: Approved KPIs enter a controlled catalog with clear versioning and release notes.

    This does not have to be bureaucratic, but there must be a clear path from experiment to standard so that custom KPIs do not quietly fragment your metrics landscape.

    5. Ensure coexistence with legacy MES/ERP reporting

    In regulated, brownfield plants, core KPIs and some legacy reports are effectively baked into procedures, customer reports, and sometimes qualification dossiers. Replacing them outright is high risk.

    • Do not remove or redefine legacy KPIs that are referenced in specifications, customer agreements, or validated reports without a formal impact and revalidation process.
    • Where legacy KPI definitions are flawed, introduce a new corrected KPI with a distinct name, then run it side-by-side with the old one for a defined period.
    • Use integration layers or data marts to compute both “legacy” and “standardized” metrics from shared, validated data whenever possible, instead of letting each system calculate its own version silently.

    Full replacement of KPI logic embedded in validated MES/ERP modules usually triggers qualification, testing, and documentation that many plants underestimate; often a coexistence strategy is more realistic.

    6. Run overlapping periods and backfill where feasible

    To avoid breaking trend and benchmark comparability when introducing custom or revised KPIs:

    • Operate new KPIs in parallel with incumbent ones for a defined period, and document the observed differences (offsets, sensitivities, volatility).
    • Where technically and procedurally allowed, back-calculate the new KPI on historical data so you can maintain long-term trend lines and year-on-year comparisons.
    • If backfill is not possible (e.g., missing data granularity), explicitly mark on dashboards and management reviews where definitions changed so that misinterpretation is less likely.
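
    Documenting the differences observed during a parallel run can be as simple as the sketch below. It assumes both KPI versions are available per reporting period; the function name and the sample values are hypothetical.

```python
from statistics import mean


def parallel_run_summary(legacy, revised):
    """Compare two KPI series keyed by period over their overlapping periods."""
    periods = sorted(set(legacy) & set(revised))
    if not periods:
        raise ValueError("no overlapping periods to compare")
    diffs = [revised[p] - legacy[p] for p in periods]
    return {
        "periods": periods,
        "mean_offset": mean(diffs),               # systematic shift
        "max_abs_diff": max(abs(d) for d in diffs),  # worst single period
    }


legacy_fpy = {"2024-01": 0.90, "2024-02": 0.92}
revised_fpy = {"2024-01": 0.88, "2024-02": 0.89, "2024-03": 0.91}

summary = parallel_run_summary(legacy_fpy, revised_fpy)
print(summary["periods"])                # ['2024-01', '2024-02']
print(round(summary["mean_offset"], 3))  # -0.025
```

    Only overlapping periods are compared, so a period where just one definition was calculated does not distort the offset.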

    7. Make segmentation explicit instead of multiplying KPIs

    Many “custom KPIs” are really just segmentations of existing KPIs by product, customer, technology, or shift.

    • Keep the KPI definition constant; vary the population. For example, “OEE for Cell A” instead of “Advanced Cell A Uptime Index.”
    • Use consistent filter logic (e.g., product families, qualification statuses) documented centrally, not hidden in local queries.
    • Encourage sites to reuse the same KPI definition across segments to avoid a proliferation of slightly different metrics.

    This approach delivers local insight while preserving cross-site comparability of the underlying KPI.
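
    A short sketch of "keep the definition constant, vary the population", using first-pass yield because its formula is simple. The record fields and the `segment` helper are assumptions made for this example.

```python
def first_pass_yield(units):
    """Single shared definition: units passed first time / units processed."""
    units = list(units)
    if not units:
        return None  # undefined for an empty population
    passed = sum(1 for u in units if u["passed_first_time"])
    return passed / len(units)


def segment(units, **filters):
    """Select a population with explicit, documented filter values."""
    return [u for u in units if all(u.get(k) == v for k, v in filters.items())]


units = [
    {"cell": "A", "family": "bracket", "passed_first_time": True},
    {"cell": "A", "family": "bracket", "passed_first_time": False},
    {"cell": "B", "family": "housing", "passed_first_time": True},
    {"cell": "B", "family": "bracket", "passed_first_time": True},
]

print(first_pass_yield(units))                               # 0.75
print(first_pass_yield(segment(units, cell="A")))            # 0.5
print(first_pass_yield(segment(units, family="bracket")))    # 0.6666666666666666
```

    Because every segment reuses `first_pass_yield`, the number for Cell A reconciles directly with the site-wide figure.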

    8. Preserve auditability and traceability

    For regulated environments, the main risk of custom KPIs is poor traceability from reported numbers back to data and logic. Mitigate this with:

    • Versioned KPI definitions and calculation logic kept in a controlled repository (could be part of your validated reporting/analytics stack).
    • Clear mapping from KPI outputs on dashboards or PDF reports back to data sources, transformations, and filters.
    • Documented validation/qualification for KPIs used in regulated decisions or external reports, with evidence of testing after any change.

    Do not imply that a KPI is “validated” or “compliant” unless it has gone through your formal validation or qualification process.

    9. Clarify usage levels: enterprise, plant, team

    Assign a “level” to each KPI so expectations for comparability are explicit:

    • Enterprise KPIs: Fully standardized, cross-plant comparable, used in external or executive reporting.
    • Plant KPIs: Standard within one site, potentially not comparable to other sites.
    • Team/Cell KPIs: Local, tactical metrics used for daily management and problem solving, not for cross-site benchmarking.

    Custom KPIs often live at plant or team level. Making that explicit avoids accidental use in enterprise dashboards or audits as if they were globally comparable.

    10. Communicate limitations clearly

    No KPI is perfect, and comparability is never absolute. To keep expectations realistic:

    • Publish known limitations (data gaps, approximations, site-specific constraints) alongside KPI definitions.
    • Educate leaders that numeric differences across sites may reflect both performance and context differences (mix, test coverage, rework policies, automation level).
    • Review KPIs periodically for relevance, data quality, and unintended behaviors they drive.

    By anchoring a small, stable core KPI set, tightly controlling definitions and lineage, and running new metrics in parallel before rolling them into formal reporting, you can introduce meaningful custom KPIs without losing comparability or undermining audit readiness.

  • How do digital work instructions feed data into our QMS?

    Digital work instructions feed data into a QMS by capturing structured execution data at the point of work, then handing selected records to QMS workflows through defined integrations. How robust this is in practice depends on your QMS capabilities, integration design, data model, and validation state.

    What data can flow from digital work instructions into a QMS?

    In practice, this connects to QMS integration and evidence trails when teams need to turn the answer into repeatable execution habits.

    Typical data elements that can be pushed or made available to the QMS include:

    • Execution evidence: who did what step, when, on which order/serial/lot, with which revision of the instruction.
    • Completion and verification: step sign-offs, dual sign-offs, and e-signatures where required by your procedures.
    • Inspection and measurement results: recorded values, pass/fail statuses, gage IDs, and links to measurement records.
    • Defects and deviations: operator-logged issues, defect codes, photos, and comments that can initiate or feed nonconformance records.
    • Training and qualification usage: evidence that a qualified operator used the current approved instruction for a given job.
    • Process conformance signals: skipped steps, out-of-sequence work, rework loops, and holds that may need QMS visibility.

    Common integration patterns with a QMS

    In brownfield environments, digital work instructions usually coexist with a QMS, MES, and ERP rather than replacing them. Data flow typically follows one or more of these patterns:

    • Event-based triggers: Specific events in the work instruction system (e.g., “step fails”, “defect logged”, “rework started”) are configured to trigger QMS actions such as creating or updating an NCR, deviation, or CAPA record.
    • API-based synchronization: The work instruction system calls QMS APIs (or a middleware layer) to send structured execution data, associating it with part, order, lot, and configuration identifiers used by the QMS.
    • Message bus / middleware: Events are published to an integration bus (e.g., MQTT, Kafka, ESB), then transformed and routed into the QMS. This is more common where multiple plants and systems need consistent mapping.
    • Batch exports for evidence: Periodic exports of execution logs, inspection results, and attachments are stored in a repository or DMS and then referenced from the QMS as objective evidence for audits and investigations.
    • Indirect integration via MES: In many plants, the MES is the primary integration point. Digital work instructions feed data into MES, and MES feeds summarized or selected data into the QMS.

    The right pattern depends on how open your QMS is, how much change your IT and quality teams can support, and how tightly you want execution events coupled to quality workflows.

    How this supports NCR, CAPA, and audit evidence

    When integrated correctly, digital work instructions can reduce manual data entry into the QMS and improve traceability:

    • Nonconformance (NCR): Operator logs a defect during a step. The system creates a draft NCR in the QMS (or feeds the existing NCR system), pre-populating work order, part, serial/lot, step ID, operator, and attachments (photos, notes).
    • CAPA and problem-solving: Recurring failure patterns from work instruction data (e.g., repeated issues at one step, shift, or revision) can be analyzed and then linked to CAPA records. The QMS remains the system of record for CAPA, but the data used for root cause analysis comes from digital execution history.
    • Training and competency evidence: QMS or HR systems maintain operator qualifications. The work instruction system references those records to enforce who can execute or sign off specific steps, then returns usage data that can be used during audits to show that trained personnel followed the current approved instruction.
    • Audit trails: Time-stamped, immutable logs of step execution, sign-offs, and instruction revisions can be referenced by the QMS as objective evidence in internal and external audits.
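
    The NCR flow described above can be sketched as a mapping from execution events to QMS requests. Everything named here is hypothetical: the event types, the action names, and the request fields are placeholders, and a real integration would follow your QMS vendor's API or middleware contract.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical event types and the QMS action each one should trigger.
TRIGGERS = {
    "step_failed": "create_draft_ncr",
    "defect_logged": "create_draft_ncr",
    "rework_started": "update_ncr",
}


@dataclass
class ExecutionEvent:
    event_type: str
    work_order: str
    part_number: str
    serial_or_lot: str
    step_id: str
    operator_id: str
    instruction_revision: str
    attachments: list = field(default_factory=list)


def to_qms_request(event: ExecutionEvent) -> Optional[dict]:
    """Build a pre-populated QMS request, or None if no trigger applies."""
    action = TRIGGERS.get(event.event_type)
    if action is None:
        return None  # routine events stay in the execution history only
    return {
        "action": action,
        "work_order": event.work_order,
        "part_number": event.part_number,
        "serial_or_lot": event.serial_or_lot,
        "step_id": event.step_id,
        "operator_id": event.operator_id,
        "instruction_revision": event.instruction_revision,
        "attachments": list(event.attachments),
    }


event = ExecutionEvent("defect_logged", "WO-1001", "PN-42", "SN-0007",
                       "OP-30", "op-118", "C", ["photo_01.jpg"])
print(to_qms_request(event)["action"])  # create_draft_ncr
```

    Carrying the instruction revision in every request is what lets the QMS record point at the exact version the operator followed.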

    Key dependencies and failure modes

    Several practical issues often determine whether work instruction data is truly useful to the QMS:

    • Data model alignment: If part numbers, revision schemes, defect codes, and work order identifiers are not harmonized across systems, QMS records will be incomplete or mislinked.
    • Integration validation: In regulated environments, the integration itself often needs to be tested and validated. Poorly validated interfaces risk data gaps, duplicate records, or incorrect associations that are hard to detect until an audit or investigation.
    • Version and change control: If work instruction revisions are not tightly linked to document control and QMS change processes, you can end up with QMS records that reference the wrong or ambiguous version of the instruction.
    • Partial deployments: When only some lines or plants use digital work instructions, the QMS will contain a mix of digital and manual evidence. Your processes must explicitly define how both are handled, or you risk inconsistent investigations and audit findings.
    • Human workarounds: If the digital workflow is slow or hard to use, operators may bypass steps and log defects directly in the QMS or on paper, breaking the data chain.
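
    Data gaps and duplicate records are easier to catch with a routine reconciliation between what the work instruction system sent and what the QMS holds. A minimal sketch, assuming both sides can export a list of shared event identifiers (the identifiers shown are invented):

```python
from collections import Counter


def reconcile(sent_ids, qms_ids):
    """Compare event IDs sent by the work instruction system with IDs in the QMS."""
    sent = set(sent_ids)
    received = Counter(qms_ids)
    return {
        "missing_in_qms": sorted(sent - set(received)),
        "unexpected_in_qms": sorted(set(received) - sent),
        "duplicated_in_qms": sorted(k for k, n in received.items() if n > 1),
    }


report = reconcile(
    sent_ids=["E1", "E2", "E3"],
    qms_ids=["E1", "E1", "E4"],
)
print(report["missing_in_qms"])     # ['E2', 'E3']
print(report["unexpected_in_qms"])  # ['E4']
print(report["duplicated_in_qms"])  # ['E1']
```

    Entries under `unexpected_in_qms` can also point to records created directly in the QMS, which is one way the human workarounds above become visible.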

    Coexistence with existing QMS and MES systems

    In most aerospace and other regulated operations, the QMS is established and tightly linked to existing MES/ERP stacks. Replacing the QMS or making it the point-of-work UI is rarely practical due to:

    • Qualification and validation burden for any major QMS or MES replacement.
    • Downtime and change risk when re-plumbing core production and quality workflows.
    • Integration debt across plants, sites, and suppliers that would need to be reimplemented.

    As a result, digital work instructions are typically introduced as the operator-facing layer while QMS and MES remain the systems of record. The strategic goal is usually to:

    • Keep QMS as the authoritative system for nonconformance, CAPA, audits, and controlled documents.
    • Use digital work instructions to capture high-fidelity execution and defect data at the source.
    • Integrate so that QMS workflows are fed, not duplicated, by execution data, with clear ownership of each data set.

    Practical steps to make the data flow work

    To ensure digital work instructions reliably feed your QMS:

    • Map which QMS processes (NCR, CAPA, audits, training) should consume which specific execution data elements.
    • Align identifiers and coding (parts, operations, defect codes, locations) across systems before integration.
    • Design and document the integration flows, including error handling and reconciliation procedures.
    • Include the integration in your validation and change control processes, with test cases that reflect real failure scenarios.
    • Train operators and quality engineers on when to initiate records via the work instruction system versus directly in the QMS, to avoid double entry and gaps.

    Done this way, digital work instructions do not replace your QMS, but they significantly improve the timeliness, completeness, and traceability of the data that the QMS relies on.