Monday, December 15, 2025

PROTECT: Engineering Field Guide for Threat Modeling

An interrogation framework for modern system design.

In practice, integrating multiple threat modeling frameworks reduces blind spots and rework by forcing earlier alignment between threats, impact, and controls. 

The result is stronger security outcomes, improved privacy posture, and better alignment with regulatory requirements.

Phase 1: VAST (The Attack Surface)

Focus: Topology, Boundaries, and Dependencies.

Mapping the Architecture

  • Boundary Analysis: Where does the data cross from a High-Trust zone (e.g., Private VPC) to a Low-Trust zone (e.g., Public Internet)? Is this explicitly drawn?
  • Actor Identification: Have we mapped every non-human actor? (e.g., Sidecars, lambda functions, cron jobs, CI/CD runners).
  • Dependency Graph: Which third-party libraries or external APIs are in the critical path? If npm package X is compromised, does the whole system fall?

Infrastructure & Scale

  • Scalability Bottlenecks: Identify the specific component (DB Write Master, Load Balancer) that will fail first under a DDoS condition.
  • Cloud Responsibility: For our PaaS/SaaS components, exactly where does the vendor's security stop and ours begin? (e.g., "AWS secures the cloud, we secure the S3 bucket config").

Phase 2: STRIDE (The Vulnerability Hunt)

Focus: Breaking the Logic.

Authentication (Spoofing)

  • Mechanism: How do we handle service-to-service auth? (e.g., mTLS, JWT, or static API keys?)
  • Identity Source: If the Identity Provider (IdP) goes down, what is the fail-open/fail-closed behavior?

Integrity & Input (Tampering)

  • Validation location: Do we validate input at the edge (WAF), at the controller (Code), or at the persistence layer (DB)? (Ideally all three).
  • Supply Chain: How do we verify that the container image deployed is the exact binary built by CI? (e.g., Image signing).

Observability (Repudiation)

  • Non-Repudiation: Can a rogue admin delete the audit logs that record their own actions?
  • Traceability: Do we have a correlation ID that tracks a request from the WAF all the way to the Database?
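As a rough illustration of end-to-end traceability, here is a minimal sketch assuming a Flask service and the common X-Correlation-ID header convention (both assumptions, not a prescription): reuse the ID minted at the edge, attach it to every log line, and echo it back to the caller.

import logging
import uuid

from flask import Flask, g, request

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)

@app.before_request
def assign_correlation_id():
    # Reuse the ID minted at the edge (WAF / load balancer); mint one only if it is missing.
    g.correlation_id = request.headers.get("X-Correlation-ID", str(uuid.uuid4()))

@app.after_request
def propagate_correlation_id(response):
    # Echo the ID back and log it so WAF, application, and database logs can be joined later.
    response.headers["X-Correlation-ID"] = g.correlation_id
    logging.info("%s %s correlation_id=%s", request.method, request.path, g.correlation_id)
    return response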

Confidentiality (Information Disclosure)

  • Secrets Management: Are secrets injected at runtime (Vault/Secrets Manager) or present in environment variables/code?
  • Data Leakage: Do error responses return stack traces, internal IP addresses, or version numbers to the client?

Availability (DoS)

  • Resource Starvation: Do we enforce rate limiting per-IP, per-user, or per-tenant? (See the sketch after this list.)
  • Logic Bombs: Can a user upload a file that triggers recursive parsing (XML Bomb) or memory exhaustion?
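For the resource-starvation question above, a minimal per-key token-bucket sketch (the rate, capacity, and tenant key are illustrative assumptions); production systems usually enforce this at the gateway or against a shared store.

import time
from collections import defaultdict

class TokenBucket:
    """Per-key (IP, user, or tenant) bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = defaultdict(lambda: capacity)
        self.last_seen = defaultdict(time.monotonic)

    def allow(self, key: str) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_seen[key]
        self.last_seen[key] = now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens[key] = min(self.capacity, self.tokens[key] + elapsed * self.rate)
        if self.tokens[key] >= 1:
            self.tokens[key] -= 1
            return True
        return False

limiter = TokenBucket(rate=5, capacity=20)     # ~5 requests/second sustained, bursts up to 20
if not limiter.allow("tenant-1234"):
    print("429 Too Many Requests")             # in a real service: return HTTP 429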

Authorization (Elevation of Privilege)

  • Horizontal Escalation: Can User A access User B's resource by simply changing the ID in the URL (IDOR)?
  • Vertical Escalation: Does the API rely on the client to send its role (e.g., isAdmin=true), or is this validated server-side?
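A minimal sketch of both checks, assuming a Flask route and a hypothetical in-memory document store; the point is that ownership and roles come from server-side state set by the authentication layer, never from the request itself.

from dataclasses import dataclass

from flask import Flask, abort, g, jsonify

app = Flask(__name__)

@dataclass
class Document:
    id: str
    owner_id: str
    body: str

# Hypothetical stand-in for the real persistence layer.
DOCUMENTS = {"doc-1": Document(id="doc-1", owner_id="user-a", body="...")}

@app.route("/documents/<doc_id>")
def read_document(doc_id: str):
    # Identity and roles are set server-side by the auth middleware (assumption),
    # never taken from client-supplied fields like isAdmin=true.
    user_id = getattr(g, "user_id", None)
    roles = getattr(g, "roles", set())
    if user_id is None:
        abort(401)
    doc = DOCUMENTS.get(doc_id)
    if doc is None:
        abort(404)
    # Horizontal check: the caller must own the resource unless explicitly privileged.
    if doc.owner_id != user_id and "admin" not in roles:
        abort(403)
    return jsonify({"id": doc.id, "body": doc.body})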

Phase 3: DREAD (The Risk Calculator)

Focus: Quantifying the Badness.

  • Damage: If this exploit lands, do we lose one user's session or the entire master database?
  • Reproducibility: Is this a "lab-only" theoretical exploit, or can it be scripted reliably?
  • Exploitability: Does the attacker need a supercomputer/insider access, or just curl?
  • Affected Users: Is the blast radius a single tenant, or every customer on the platform?
  • Discoverability: Is the vulnerability broadcast in our HTTP headers, or hidden deep in compiled logic?

Phase 4: LINDDUN (The Privacy Engineer)

Focus: Data ethics and leakage.

  • Metadata Analysis: Even if the payload is encrypted, does the traffic pattern (size/timing) reveal user activity?
  • Data Minimization: Are we collecting fields we "might need later" (toxic assets) or only what is strictly required?
  • Unlinkability: If we combine Dataset A (Public) with our Anonymized Dataset B, can we re-identify users?

Phase 5: PASTA (The Reality Check)

Focus: Simulation & Resilience.

  • Kill Chain Validation: "If I am an attacker and I compromise the Web Server..."
    • ...Can I reach the Database? (Network Segmentation)
    • ...Can I read the keys? (IAM roles)
    • ...Will anyone notice? (Alerting)
  • Resilience: If the primary Region goes dark, is the failover automated or manual? Have we tested it?
  • Drift Detection: What prevents a developer from turning off the WAF tomorrow? (Infrastructure as Code / Policy as Code).
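One hedged way to frame drift detection, independent of any particular cloud API: diff the declared (IaC) state against the observed state and alert on mismatches. The field names below are purely illustrative.

def detect_drift(declared: dict, observed: dict) -> list[str]:
    """Return the keys whose observed value differs from the declared (IaC) value."""
    drift = []
    for key, expected in declared.items():
        actual = observed.get(key)
        if actual != expected:
            drift.append(f"{key}: declared={expected!r} observed={actual!r}")
    return drift

# Hypothetical example: the IaC definition says the WAF must stay enabled.
declared_state = {"waf_enabled": True, "tls_min_version": "1.2"}
observed_state = {"waf_enabled": False, "tls_min_version": "1.2"}   # pulled from the cloud API

for finding in detect_drift(declared_state, observed_state):
    print("DRIFT:", finding)   # in practice: page the owner or auto-revert via the pipeline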

PROTECT: Integrating VAST, STRIDE, DREAD, LINDDUN, and PASTA for Threat Modeling

PROTECT (Profile Review and Offensive Threat Evaluation for Countermeasures and Tactics)

The PROTECT framework acknowledges that no single methodology covers every aspect of modern security. Instead of choosing one, PROTECT orchestrates the industry's best specific-use models into a cohesive lifecycle. It leverages VAST for visibility, STRIDE for coverage, DREAD for prioritization, LINDDUN for privacy, and PASTA for defense.

PROTECT Threat Model Steps

1. Profile System and Assets (The Lens: VAST)

Objective: Visualize the architecture to establish scope.

  • The "Why": You cannot secure what you cannot understand. Before we can identify threats, we must have a clear, shared mental model of the system.
  • The Linkage: We use VAST (Visual, Agile, and Simple Threat modeling) here not as a rigid checklist, but as the delivery mechanism. By creating a VAST-compliant process map, we generate the "Map" that the subsequent steps will hunt upon.

Key Actions:

  • Develop high-level architecture diagrams focusing on data flows, trust boundaries, and dependencies.
  • Profile threat actors (motivations, capabilities, resources).
  • Identify and prioritize critical assets based on business value.

2. Review Threats (The Net: STRIDE)

Objective: Achieve comprehensive threat coverage.

  • The Bridge (from Step 1): Once we have the VAST diagrams (the Map), we need a methodical way to sweep that map for vulnerabilities.
  • The Linkage: STRIDE acts as our "dragnet." It ensures we don't rely on gut feelings. We systematically apply STRIDE categories to every interaction and boundary identified in Step 1 to ensure we haven't missed a standard class of attack (like Spoofing or Tampering).

Key Actions:

  • Spoofing: Identify threats related to authentication and impersonation.
  • Tampering: Identify threats related to unauthorized modification of data or systems.
  • Repudiation: Identify threats related to the ability to deny actions or transactions.
  • Information Disclosure: Identify threats related to the unauthorized exposure of sensitive data.
  • Denial of Service: Identify threats related to the disruption or degradation of system availability.
  • Elevation of Privilege: Identify threats related to gaining unauthorized access or permissions.

3. Offensive Threat Impact Evaluation (The Scale: DREAD)

Objective: Filter noise and prioritize risk.

  • The Bridge (from Step 2): STRIDE is excellent at finding possible threats, but it doesn't tell us which ones matter. A STRIDE analysis often produces a massive, unprioritized list of "what-ifs."
  • The Linkage: We apply DREAD to the list generated by STRIDE to score them. This transforms a flat list of technical bugs into a ranked list of business risks. This is where we move from "Security Engineering" to "Risk Management."

Key Actions:

  • Damage: Assess the potential damage caused by the threat if it were to occur.
  • Reproducibility: Determine how easily the threat can be reproduced or exploited.
  • Exploitability: Evaluate the level of skill and resources required to exploit the threat.
  • Affected Users: Assess the number of users or systems that could be impacted by the threat.
  • Discoverability: Determine how easily the vulnerability or weakness can be discovered by potential attackers.
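A minimal scoring sketch for this step, using an illustrative 1–3 scale and equal weighting (both assumptions; many teams use 1–10 scales or weighted averages):

DREAD_FACTORS = ("damage", "reproducibility", "exploitability", "affected_users", "discoverability")

def dread_score(ratings: dict) -> float:
    """Average the five DREAD factor ratings into a single risk score."""
    return sum(ratings[f] for f in DREAD_FACTORS) / len(DREAD_FACTORS)

# Hypothetical STRIDE findings rated 1 (low) to 3 (high) on each factor.
threats = {
    "IDOR on /documents/<id>": dict(damage=3, reproducibility=3, exploitability=3,
                                    affected_users=3, discoverability=2),
    "Verbose stack traces":    dict(damage=1, reproducibility=3, exploitability=2,
                                    affected_users=2, discoverability=3),
}

# Rank the findings so the highest-scoring risks are addressed first.
for name, ratings in sorted(threats.items(), key=lambda kv: dread_score(kv[1]), reverse=True):
    print(f"{dread_score(ratings):.1f}  {name}")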

4. Evaluate Privacy Concerns (The Blindspot: LINDDUN)

Objective: Address non-security data risks.

  • The Bridge (from Step 3): Traditional security scoring (DREAD) focuses on broken systems. However, a system can be perfectly secure (unhackable) and still violate privacy laws (e.g., excessive data collection).
  • The Linkage: We pause the security workflow to run a specific LINDDUN pass. This captures the risks that STRIDE misses, specifically where the system functions exactly as designed, but that design harms the user's privacy (e.g., Unawareness or Linkability).

Key Actions:

  • Linkability: Determine if data from different sources can be combined to identify an individual or link their activities.
  • Identifiability: Assess if an individual can be singled out or identified within a dataset.
  • Non-repudiation: Evaluate whether an individual is unable to plausibly deny having performed an action or transaction, which can itself be a privacy harm (e.g., for whistleblowers).
  • Detectability: Determine if it is possible to detect that an item of interest exists within a system.
  • Disclosure of Information: Assess the risk of unauthorized access to or disclosure of sensitive information.
  • Unawareness: Evaluate if individuals are unaware of the data collection, processing, or sharing practices.
  • Non-compliance: Determine if the system or practices are not compliant with privacy laws, regulations, or policies.

5. Countermeasures and Tactical Safeguards (The Fix: PASTA)

Objective: Simulate attacks and validate defenses.

  • The Bridge (from Steps 3 & 4): We now have a prioritized list of Security risks (from DREAD) and Privacy risks (from LINDDUN). The final question is: Do our defenses actually work against a motivated human adversary?
  • The Linkage: We use the simulation strengths of PASTA (Process for Attack Simulation and Threat Analysis) here. While PASTA is a full lifecycle, PROTECT leverages its specific strength in Attacker-Centric simulation. We don't just patch vulnerabilities; we build attack trees to see if our proposed countermeasures actually break the attacker's kill chain.

Key Actions:

  • Attack Modeling: Simulate realistic attack scenarios and identify choke points.
  • Vulnerability Assessment: Conduct technical validation (pen-testing, code review) for high-risk vectors.
  • Countermeasure Analysis: Design countermeasures that address root causes. Map controls to regulatory requirements (PCI DSS, NIST 800-53, etc.).

PROTECT Summary

  • VAST draws the map.
  • STRIDE finds the holes in the map.
  • DREAD decides which holes are dangerous.
  • LINDDUN checks if the map exploits the user.
  • PASTA tests if the fences we build can actually stop the wolves.

The PROTECT model provides a comprehensive and integrated approach to threat modeling by combining the strengths of VAST, STRIDE, DREAD, LINDDUN, and PASTA into a unified framework.

Tuesday, December 9, 2025

Secure Software Delivery in Safety-Critical Systems

Why ASIL-D and DAL-A Now Require the Same Architecture

Introduction

Over the last several months, I’ve been working deeply with two industries that historically spoke different languages: automotive safety and aviation design assurance.

What surprised me: when you look at the engineering required for secure software delivery in their highest safety tiers, ASIL-D (Automotive Safety Integrity Level D) and DAL-A (Design Assurance Level A), the systems are effectively the same.

Standards bodies in both domains are now explicitly cross-referencing each other. This is a deliberate recognition of the rigor necessary for software-defined safety operating at fleet scale.

This post explains:

  • Why automotive and aviation converged
  • What the modern secure delivery architecture looks like
  • Which controls are identical vs. differently labeled
  • What each industry can learn from the other
  • Why this matters for certification, talent, and hiring

Why Convergence Happened

Historically, automotive and aviation had different assumptions:

Industry   | Past Assumption
Automotive | Software is a performance feature bolted onto mechanical safety
Aviation   | Software supports but does not control physical flight mechanisms

Those assumptions collapsed:

  1. Software directly controls safety outcomes
    Brake-by-wire and fly-by-wire architectures made software a single point of failure.
  2. Long-lifecycle assets require secure updates
    Vehicles and aircraft must receive trustworthy updates for 15–20+ years.
  3. Regulators recognized updates as a persistent attack surface
    Cybersecurity is now inside the safety case.

As a result:

  • Automotive: UN R155/R156 made cybersecurity and update management mandatory for type approval.
  • Aviation: DO-326A/356A introduced cybersecurity artifacts into the certification basis.

Every software update must be cryptographically controlled, verifiable, and reversible without bricking fleets.

Standards Cross-Referencing

This convergence is codified:

  • ISO/SAE 21434 references aviation security concepts from DO-326A
  • DO-356A incorporates safety principles from ISO 26262
  • eVTOL certification guidance borrows from automotive OTA security practice

Different roots. Same alignment.

Requirements Are Effectively the Same

When aligned by engineering controls, the equivalence becomes obvious:

Concern                      | Automotive (ASIL-D)          | Aviation (DAL-A)
Functional Safety            | ISO 26262                    | DO-178C
Cybersecurity                | ISO/SAE 21434 + UN R155/R156 | DO-326A / DO-356A + DO-355A
Digital Signature Algorithms | e.g., RSA-3072 / ECDSA P-384 | ARINC 835 (same families)
Private Key Protection       | HSM (FIPS 140-2/3)           | HSM (FIPS 140-2/3) + offline custody
Revocation                   | CRL or OCSP                  | CRL or OCSP
Rollback Protection          | Monotonic counter            | Monotonic counter

Different paperwork, same controls.
The cryptographic trust model is shared.

The Modern Secure Delivery Architecture

Here's an example architecture that meets both ASIL-D and DAL-A expectations for secure software delivery.

Figure 1: PKI-based secure delivery architecture supporting ASIL-D and DAL-A compliance.

At a high level:

  • An offline Root CA anchors trust, with tightly controlled ceremonies
  • An HSM-protected Code-Signing CA issues signatures on release artifacts
  • A revocation service distributes CRLs/OCSP responses
  • Updates are distributed over UPTANE, ARINC 615A/827, or equivalent secure loaders
  • The target ECU/LRU (Electronic Control Unit / Line Replaceable Unit):
    • Verifies signature against a burned-in root public key
    • Checks revocation status
    • Enforces anti-rollback via monotonic counters
    • Uses atomic install with dual-bank fallback
    • Enforces secure boot on every power cycle

Every update package is treated as hostile until proven otherwise.
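As a rough sketch of that verification sequence on the receiving side, assuming Ed25519 signatures via the Python cryptography library for brevity (the RSA-3072 / ECDSA P-384 families cited above work the same way), with revocation and dual-bank install left as comments; in practice this logic lives in the secure bootloader and is backed by the HSM.

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_update(package: bytes, signature: bytes, version: int,
                  root_pubkey_bytes: bytes, stored_counter: int) -> bool:
    """Accept the package only if it is authentic and newer than the last installed version."""
    # 1. Verify the signature against the burned-in root public key.
    try:
        Ed25519PublicKey.from_public_bytes(root_pubkey_bytes).verify(signature, package)
    except InvalidSignature:
        return False
    # 2. Anti-rollback: reject anything at or below the monotonic counter.
    if version <= stored_counter:
        return False
    # 3. Revocation check and atomic dual-bank install would follow here
    #    (CRL/OCSP lookup, write to the inactive bank, switch only after a good boot).
    return True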

UPTANE dominates automotive OTA distribution. Aviation reaches the same controls via ARINC loaders and DO-326A artifacts. These are different implementation paths with the same trust requirements.

What Each Industry Gets Right

Aviation strengths

  • Rigor in independent verification (no self-approval)
  • Hardware-enforced rollback counters per critical module
  • Zero tolerance for dead code in certified builds

Automotive strengths

  • Mature SBOM workflows (CycloneDX / in-toto)
  • Proven million-unit OTA rollout practices
  • Faster iteration in cybersecurity management systems (CSMS)

Risk-Based Assurance: A Shared Language

Level | Automotive | Aviation | Failure Condition
4     | ASIL-D     | DAL-A    | Multiple fatalities, loss of vehicle/aircraft
3     | ASIL-C     | DAL-B    | Single fatality / severe injury
2     | ASIL-B     | DAL-C    | Mission abort / serious injury
1     | ASIL-A     | DAL-D    | Minor injury / inconvenience
0     | QM         | DAL-E    | No safety effect

Certification Implications

If your secure delivery process is already approved for ASIL-D + ISO/SAE 21434 + UN R155 compliance, or DAL-A + DO-326A + ARINC 835 compliance — then you are very close to certification in the other domain.

Benefits of convergence:

  • Faster multi-market productization
  • Shared platform for PKI, SBOM, and secure boot
  • Consolidated supplier requirements
  • Broader talent mobility

The architecture is the same.
The talent pipeline is not.

Standards Ecosystem Alignment

Figure 3: Explicit cross-referencing across automotive and aviation security and safety standards.

This map shows, at a glance:

  • ISO 26262 and DO-178C as the functional safety backbone
  • ISO/SAE 21434, UN R155/R156, DO-326A, DO-356A, and DO-355A framing cybersecurity
  • Cross-reference arrows where one standard family borrows from or references the other

Conclusion

Automotive is becoming more like aviation — safety-critical actuators everywhere.
Aviation is becoming more like automotive — connected fleets with continuous updates.

The two industries have already put down common roots in a shared architecture.

If you are designing or certifying secure update pipelines in either domain and want to sanity-check your approach against both ecosystems, I’m always open to a conversation.

Where else are you seeing this convergence?

  • Medical: IEC 62304 + IEC 80001-1 + FDA cyber guidance
  • ICS: IEC 62443 + IEC 61508/61511
  • Rail: EN 50128 + TS 50701

Secure, cryptographically controlled updates are becoming universal in safety-critical systems.

Connect on LinkedIn

Wednesday, December 3, 2025

"Model Memory" attacks. I was wrong.

I thought AI "Model Memory" (whether the model or the logs) was just security FUD. Wasn't that solved already? Sure, sensitive information is involved - but please tell me it's not accessible… Right??

I’ve been knee-deep in AI retention stats lately, and one concept kept nagging at me: the idea that "model memory," such as retained prompts, chat histories, or session context, is quietly killing projects.

I’ve now reviewed several AI rollouts this year, and I’ve never personally seen an issue with it. It always felt like a keynote stump-the-chump quip: "What if the model remembers your SSN?"

So I went digging. Turns out the problem isn't "Skynet never forgets" or the Borg coming after you. It's less interesting than that: boring, messy human error. But the damage is real.

Here are four times in 2025 when retention triggered bans, patches, or headlines.

1. DeepSeek's Exposed Chat Logs (Jan 2025)

DeepSeek AI platform exposed user data through unsecured database | SC Media

This startup left a ClickHouse database open. No "model regurgitation," but millions of plaintext chat contexts (PII + keys) were exposed.

👉 The Cost: "Security experts noted that such an oversight suggests DeepSeek lacks the maturity to handle sensitive data securely. The discovery raises concerns as DeepSeek gains global traction [...] prompting scrutiny from regulators and governments."

2. Microsoft 365 Copilot's EchoLeak (June 2025)

Inside CVE-2025-32711 (EchoLeak): Prompt injection meets AI exfiltration

A zero-click vulnerability allowed attackers to hijack the retrieval engine. The model’s working memory blended malicious input with user privileges, leaking docs without a single click. This resulted in CVE-2025-32711 (EchoLeak).

👉 The Cost: Shines a spotlight on Copilot’s prompt parsing behavior. In short, when the AI is asked to summarize, analyze, or respond to a document, it doesn’t just look at user-facing text. It reads everything, including hidden text, speaker notes, and metadata.

3. OmniGPT's Mega Chat Dump (Feb 2025)

OmniGPT Claimed To Be Subjected to Extensive Breach | MSSP Alert

Hacker "Gloomer" dumped 34M lines of user chats. The "memory" here was full conversation history stored for personalization, including financial queries and therapy-like vents.

👉 The Cost: 62% of AI teams now fear retention more than hallucination (McKinsey).

4. Cursor IDE's MCPoison (July 2025)

Cursor IDE's MCP Vulnerability - Check Point Research

A trust flaw in the Model Context Protocol allowed attackers to swap configs post-approval. Once in the session "memory," it could execute commands on every launch.

👉 The Cost: Devs at NVIDIA and Uber had to update frantically to close the backdoor.

The Bottom Line:

These aren't daily fires, but they are real issues that still need to be addressed. You may have seen my recent discussions on storing hashes instead of payloads. There are multiple reasons for that approach: privacy, security, storage and network requirements, processing, and validation speed. It also helps shrink your attack surface for issues like these.

Of course, it's only one layer of defense. Data retention is still here. Good teams "receipt-ify" (store hashes, not payloads) and also enforce purges of sensitive information.

One slip, and you're the next headline. Define and enforce good hygiene.  

Stop Saying “Compliance ≠ Security.” You’re Missing the Point.

“Compliance is just theater. Checkboxes don’t equal security.” I'm sorry, but this is just wrong.

Why? Because every authoritative framework worth pursuing has mandated risk management. It’s not hidden in an appendix.

Every Major Framework Explicitly Requires Risk Management

  • SOC 2 (Trust Services Criteria): CC3.0 Risk Assessment and CC9.0 Risk Mitigation are mandatory common criteria. No documented risk assessment = automatic failure.
  • NIST SP 800-53 Rev 5: The entire RA and PM families require an organization-wide risk management framework. Controls are then tailored.
  • PCI DSS v4.0 Requirement 12.2 mandates a formal annual risk assessment. Requirement 12.3.1 introduces targeted risk analysis to justify control frequency.
  • ISO/IEC 27001 (2022): Clauses 6.1.2, 6.1.3, and 8.2 require you to establish, maintain, and continually improve an information security risk assessment and treatment process.
  • And on it goes.

These aren't suggestions. Risk Management is a mandatory exercise.

You are required to exceed minimums when your risk demands it. The frameworks explicitly say the baseline is a floor, not your high-water mark.

What About Breaches?

When organizations are technically “compliant” and still get breached, the failure is almost always tied to a nonexistent or terribly executed risk management program.

Mature programs:

  • Start with the required risk assessment, then select and tailor controls
  • Apply stricter measures to high-risk/crown-jewel assets, lighter ones elsewhere
  • Exceed minimums where their own risk analysis justifies it
  • Continuously reassess because every framework demands it

That’s not “checkbox compliance.” That’s literally what the standards require.

So next time you’re tempted to say “compliance doesn’t equal security,” I'm curious to see your last risk assessment that actually drove control selection.

Because if your takeaway from reading SOC 2, NIST, PCI DSS, ISO, etc. is “just a checklist,” the problem might not be the framework...  


Tuesday, November 18, 2025

The Privacy Paradox: Proving Compliance Without Storing Sensitive Data

Consider a routine audit call you’ve probably lived far too many times:

Auditor: “We need proof MFA is enforced for all 10,000 accounts.”

Team: “We’ll export the Okta user list with MFA status and...”

Privacy Officer: “Absolutely not. That spreadsheet is a GDPR reportable breach waiting to happen.”

Auditor: “I still need immutable, point-in-time evidence.”

…dead silence.

You just hit the core paradox of modern compliance:

Auditors demand proof. Privacy laws demand you don’t keep that proof.

Traditional tools pick a side and lose:

  1. Store everything → perfect audits, terrible privacy
  2. Store nothing → perfect privacy, failed audits

There’s a third way almost nobody uses.

The Receipt Model: Store Proof, Not Payloads

Think Apple Pay. Your phone never stores your real credit card, just a token, a cryptographic signature, and a pointer to the bank. Compliance evidence should work exactly the same way.

Here’s an example receipt you could store (and notice what’s missing):

{
  "system_id": "okta-prod",
  "control_id": "MFA-001",
  "verdict": "PASS",
  "observed_at": "2025-01-15T14:30:00Z",
  "audit_pointer": "okta://policy/12345/version/7",
  "payload_sha256": "a7c8f3e2d1b4c9f6e3..."
}

No usernames, no emails, no configs, no secrets.

How it works

  1. Collector queries Okta → gets policy JSON
  2. Engine evaluates → PASS/FAIL
  3. Compute SHA-256 of the exact JSON evaluated
  4. Store verdict + timestamp + hash + pointer → discard raw data
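A minimal sketch of steps 3–4, assuming canonical JSON (sorted keys) so the hash can be re-computed deterministically later; the field names mirror the example receipt above.

import hashlib
import json
from datetime import datetime, timezone

def make_receipt(system_id: str, control_id: str, verdict: str,
                 audit_pointer: str, evaluated_payload: dict) -> dict:
    """Build a compliance receipt; the raw payload is hashed and then discarded."""
    canonical = json.dumps(evaluated_payload, sort_keys=True, separators=(",", ":"))
    return {
        "system_id": system_id,
        "control_id": control_id,
        "verdict": verdict,
        "observed_at": datetime.now(timezone.utc).isoformat(),
        "audit_pointer": audit_pointer,
        "payload_sha256": hashlib.sha256(canonical.encode()).hexdigest(),
    }

policy = {"id": "12345", "version": 7, "mfa_required": True}   # illustrative policy JSON from Okta
receipt = make_receipt("okta-prod", "MFA-001", "PASS",
                       "okta://policy/12345/version/7", policy)
# Persist only `receipt` in append-only storage; `policy` is never written to disk.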

Six months later the auditor asks for proof. You hand them the receipt. They can:

  • Verify nothing was tampered with (hash)
  • See exactly when you checked (timestamp)
  • Pull the live policy from Okta and validate themselves (pointer)

This is the sweet spot because:

  • Auditors get → timestamped, reproducible, tamper-evident evidence
  • Privacy gets → zero PII stored, data minimization by design
  • Security gets → no juicy evidence honeypot for attackers

Evidence Lake architecture (3 layers)

  1. API collectors (least-privilege)
  2. Evaluation engine → verdict + hash + pointer
  3. Append-only storage (S3 + object lock) → ~200-byte receipts retained 7 years

Real examples

  • FedRAMP root MFA: 365 receipts/year = 73 KB instead of 1.8 MB of sensitive config snapshots
  • Privileged access reviews: store just the violation count + pointer instead of spreadsheets full of usernames
  • Vault root tokens: prove zero active tokens without ever exporting the tokens themselves

Dealing with auditors

Auditor: “I need the actual policy, not your summary.” You: “Here’s the exact pointer to pull it live from Okta. The stored hash proves it matches what we evaluated on [date].”

When they still look skeptical: “We’ve never accepted this before.” You: “That’s fair — but this is the exact model courts use for digitally signed evidence and that NIST 800-53, DORA (EU effective Jan 2025), and FedRAMP now explicitly endorse: cryptographic hash + timestamp + verifiable pointer. We’re not inventing anything; we’re applying decades-old forensic evidence standards to compliance. It’s actually more rigorous than storing raw user lists.” They usually stop pushing at that point.

Cost vs Benefit

  • Cost: 2–3 weeks of engineering
  • Benefit: no more GDPR heart attacks, 95% less storage, auditors off your back for good

Start with one high-risk control this quarter. You’ll never go back to evidence spreadsheets again. Every piece of data you collect is a liability. Before you store anything, ask: “Do I need the data, or just proof I checked it?”

99% of the time, you just need the receipt.

Tuesday, November 11, 2025

Why Your Compliance Automation Will Become Shelfware (And the Two Rules That Prevent It)

The Pattern I Keep Seeing

Over 25 years in cybersecurity and compliance, I've developed a strong opinion about why most compliance automation projects fail. Whether it's a vendor platform that gets deployed and abandoned, or an internal build that never quite gets adoption, the failure pattern is remarkably consistent.

The projects that die aren't killed by bad technology. They die because they're designed as passive repositories instead of active participants in how work actually gets done.

If you're building compliance automation right now, or evaluating vendors, understanding this distinction will save you millions and years of wasted effort.

The CMDB Trap: Why "Source of Truth" Thinking Fails

Here's the pattern: someone in compliance or security decides "we need a single source of truth for all our systems and controls." It sounds logical. You can't secure what you don't know about. You can't comply without visibility.

So teams start building:

  • System inventory with metadata (owner, data classification, connections)
  • Control mapping to requirements (NIST, SOC 2, FedRAMP)
  • Evidence collection pipelines
  • Risk scoring and dashboards
  • Attestation workflows

The data model gets complex. You're pulling from 15 different sources. You build normalization layers, reconciliation logic, beautiful UIs. Leadership loves the demos.

Then reality hits.

The data goes stale because updating it requires manual effort. Engineers bypass the system because it's not in their critical path. Exceptions pile up. The compliance team starts maintaining a separate spreadsheet "just for these edge cases." Within 18 months, you're back to manual processes.

Why does this keep happening?

Because these systems aren't built into how work actually gets done. They're observation layers that rely on people checking dashboards and manually updating records. People don't check dashboards unless the dashboard gives them something they can't get elsewhere. Teams don't update records unless the records are required for something they already need to do.

The system isn't in the critical path of anything people actually care about.

What Went Wrong?

The root cause isn't the technology. It's the mental model.

These projects fail because they're designed as passive repositories instead of active participants in how work gets done. They're built on the assumption that if you collect enough information and make it visible, people will magically change their behavior.

They won't.

People don't check dashboards unless the dashboard gives them something they can't get elsewhere. Teams don't update records unless the records are required for something they already need to do. Data doesn't stay fresh unless it's refreshed as a byproduct of real work.

This is why CMDBs fail. This is why compliance automation becomes shelfware. The system isn't in the critical path of anything people actually care about.

The Two Rules That Actually Work

Through building compliance programs at Intel, VMware, Oracle Cloud, and AWS, I've identified exactly two patterns that prevent the death spiral:

Rule 1: Event-Driven Data, Not Polling

Your compliance system should update itself when things happen, not by periodically asking "what changed?"

The difference matters:

  • Polling: Your system checks identity providers daily at 3am to see if policies changed
  • Event-driven: The identity provider sends your system a webhook immediately when a policy changes

Event-driven systems stay current because they're reacting to the same events that drive the business. When an engineer deploys a change at 2am, when does your compliance system know about it? If the answer is "whenever the next sync runs," you're building a system that will become stale.
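As a rough sketch of the event-driven side, assuming the identity provider can deliver a webhook to a hypothetical /events/idp endpoint (the endpoint name and payload fields are illustrative):

from datetime import datetime, timezone

from flask import Flask, request

app = Flask(__name__)
compliance_records = {}   # stand-in for the real evidence store

@app.route("/events/idp", methods=["POST"])
def idp_policy_changed():
    # The identity provider pushes this the moment a policy changes; no nightly sync job involved.
    event = request.get_json(force=True)
    policy_id = event["policy_id"]            # illustrative payload field
    compliance_records[policy_id] = {
        "last_change": datetime.now(timezone.utc).isoformat(),
        "change_type": event.get("change_type", "unknown"),
        "needs_reevaluation": True,           # a downstream engine re-runs the control check
    }
    return {"status": "accepted"}, 202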

The test: Could engineers bypass your system entirely and nothing would break operationally? If yes, your data will rot.

Rule 2: Tied to Action, Not Just Observation

Every piece of data you collect should be required for a decision or trigger an action. If it's just "nice to know," it will go stale within months.

The difference:

  • Observation: Your system shows which systems have MFA enabled
  • Action: Your system blocks deployments to production for services without MFA

Observation is passive. It relies on someone checking the dashboard and deciding to do something. Action is automatic. The system enforces the control.

The test: Pick any data field in your compliance system. Imagine deleting it. Would anyone's workflow break? Would any automated process fail? If not, that data will go stale within six months.

From Principle to Practice: Risk-Based Monitoring

Let me be specific about how these principles play out in real compliance architecture.

The core insight: Not all controls need the same monitoring frequency. Your marketing documentation doesn't need the same attention as your customer payment processing system. But most compliance tools treat everything equally, creating alert fatigue and wasting resources.

The solution is risk-based continuous monitoring where assessment frequency is driven by multiple factors:

  • Data sensitivity (PII, financial data, credentials)
  • Known vulnerabilities in the technology stack
  • Lateral attack surface and blast radius
  • External exposure (internet-facing vs. internal)
  • Business criticality

This isn't theoretical. I built a dynamic risk tool that calculated risk scores and automatically adjusted monitoring cadence. High-risk systems get hourly checks. Low-risk systems get quarterly reviews. The risk score directly changed SLAs, monitoring frequency, and escalation paths.
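A minimal sketch of that idea with illustrative weights and thresholds (not the actual tool's formula):

def risk_score(data_sensitivity: int, known_vulns: int, blast_radius: int,
               internet_facing: bool, business_criticality: int) -> int:
    """Combine the monitoring factors (each rated 0-5) into a 0-100 risk score."""
    score = (data_sensitivity * 6 + known_vulns * 5 + blast_radius * 4 +
             business_criticality * 5)
    if internet_facing:
        score += 20
    return min(score, 100)

def monitoring_cadence(score: int) -> str:
    """Map the score to an assessment frequency (and, in practice, SLAs and escalation paths)."""
    if score >= 75:
        return "hourly"
    if score >= 50:
        return "daily"
    if score >= 25:
        return "monthly"
    return "quarterly"

payments = risk_score(data_sensitivity=5, known_vulns=3, blast_radius=4,
                      internet_facing=True, business_criticality=5)
print(payments, monitoring_cadence(payments))   # high-risk system -> hourly checks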

Why it worked:

  1. Event-driven: When a CVE was published affecting our dependencies, risk scores updated automatically
  2. Tied to action: The risk score wasn't just a number on a dashboard. It determined who got paged, what controls were required, and audit scope

The compliance team stopped maintaining manual risk registers. The security team trusted the risk scores because they reflected reality. Engineering teams understood why certain systems had tighter controls.

The tool didn't die because it was in the critical path of incident response, audit prep, and vulnerability management.

The Privacy Paradox

Here's an irony: compliance automation often creates its own compliance problems. To prove you're handling data correctly, tools collect and store sensitive configurations, user lists, and system states. Now you have PII retention and data minimization issues.

There's a better architectural approach that I've used: store cryptographic proof instead of the actual data.

Instead of storing complete configuration snapshots:

  • Store SHA-256 hashes of the configuration state
  • Store the compliance evaluation (PASS/FAIL with specific failures)
  • Store references to where the source data lives

This gives you verifiable evidence without the retention burden. You can prove "MFA was properly configured on systems X, Y, Z on January 15th" without storing every user's authentication settings. If an auditor questions it, they can request the current configuration, re-compute the hash, and verify your historical claim.
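A hedged sketch of that verification step, assuming the configuration was hashed as canonical JSON (sorted keys) when the evidence was recorded:

import hashlib
import json

def verify_receipt(receipt: dict, current_payload: dict) -> bool:
    """Re-compute the hash of the freshly pulled configuration and compare it to the stored receipt."""
    canonical = json.dumps(current_payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest() == receipt["payload_sha256"]

# True  -> today's configuration matches exactly what was evaluated on the recorded date.
# False -> the configuration has changed since then (or the evidence was altered), so pull
#          the historical version from the source system via the stored reference.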

This isn't theoretical. Digital forensics has used hash-based chain of custody for decades. Blockchain uses similar concepts for tamper-proof records. The pattern works when you need point-in-time compliance verification without indefinite data retention.

The Hard Truth

Compliance automation fails when it asks people to do extra work for "compliance reasons." It succeeds when it reduces work and makes their jobs easier.

You cannot mandate people into maintaining a CMDB. You cannot policy your way into data freshness. You can't build a system that requires continuous manual feeding and expect it to survive.

Build systems that update themselves when things happen. Build tools that people need to use to do their actual jobs. Tie every piece of data to a decision or action that matters.

If you're building or buying compliance automation right now, ask yourself:

  • Does this system update automatically when things change, or does it require manual updates?
  • Is this system in the critical path of decisions we make, or is it an observation layer we hope people check?

If you can't answer "automatic" and "critical path," you're building expensive shelfware.


About the author: Chris Davis is a Principal Product Security Engineer with 25+ years building security and compliance programs at Intel, VMware, Oracle Cloud, and AWS. He specializes in translating complex compliance requirements into engineering controls that actually survive in production. His 13 published books on information security and IT auditing are used in graduate cybersecurity programs. Connect on LinkedIn or read more at cloudauditcontrols.com.

Wednesday, August 27, 2025

Preview: NCCoE Secure DevSecOps Practices - NIST SP 1800-44A

Source: Secure Software Development, Security, and Operations (DevSecOps) Practices


The National Cybersecurity Center of Excellence (NCCoE) has released an Initial Public Draft outlining their planned guide on Development, Security, and Operations (DevSecOps) practices. This draft represents their vision for helping organizations integrate security throughout their software development lifecycle.

Planned Key Components:

  • Will provide a notional reference model for implementing DevSecOps practices
  • Intends to emphasize zero trust security architecture integration
  • Plans to offer practical methodology for organizations seeking to enhance their software security posture
  • Being developed by NIST's National Cybersecurity Center of Excellence as part of their ongoing cybersecurity initiatives

Target Audience: IT professionals, security teams, software developers, and organizational leadership responsible for secure software development practices.

Expected Outcomes: The final document aims to outline actionable steps for organizations to begin implementing or improving their DevSecOps capabilities. It may become a helpful resource both for beginners and for teams looking to mature existing practices. It ties into other initiatives stemming from executive orders and has industry momentum that might not be obvious from the outside. This is an area of growing frustration that I expect will see more attention over the next year.

Monday, August 18, 2025

Meta-analysis of 28 AI Security Frameworks and Guidelines

28 AI Security Frameworks and Guidelines

Meta-takeaway: 28 AI Security Frameworks and Guidelines

Doing a quick dive into this... Look at the table and think about what stands out. 

What do you see?

Download it here: 

https://github.com/davischr2/Cloud-Documents

Here are my quick observations. This table reads less like a body of original work and more like a crowd of institutions trying not to be left out of the AI moment. The motivation mix is clear:

  • Fear of being blamed (if AI causes harm)
  • Fear of being left behind (if others set the norms)
  • Fear of losing control (if AI develops outside institutional guardrails)

And yet, in that swirl, you can see a few truly new constructs emerging. Consider adversarial threat taxonomies, LLM-specific risks, and the engineering of assurance. That’s where the real substance is.

1. Everyone wants a piece of the steering wheel

  • Multiplicity of bodies: Governments (EU, US, G7, UN), standards organizations (ISO, NIST), regulators (CISA, ENISA), industry (CSA, OWASP), and even loose communities are all publishing.
  • This signals that no one trusts any single authority to “own” AI governance. Everyone wants to shape it to their jurisdiction, sector, or constituency. People like control.
  • The table almost reads like a map of regulatory turf-staking.

2. Fear is driving much of the activity

  • You see the fingerprints of fear of harm everywhere: prohibited practices in the EU AI Act, adversarial threats in MITRE ATLAS, “best practices” for AI data from CISA.
  • Even the voluntary guidelines (e.g., OWASP LLM Top 10, CSA AI Safety) are mitigations against anticipated misuse.
  • These aren’t aspirational visions of AI’s potential. They’re largely defensive measures. These are a kind of collective bracing for impact. It's not that I think they are wrong... They just don't trust others to do it correctly.

3. Originality is thin and echo chambers dominate

  • Many of the documents are cross-referential: NIST AI RMF becomes the anchor, ISO drafts map to it, ENISA echoes both, and the EU AI Act leans on "harmonized standards" that are largely ISO/NIST influenced.
  • The “new” work is often reinterpretation of old risk frameworks (ISO management systems, NIST RMF, 27k family) with “AI” pasted on.
  • Genuine innovation is scarcer, but you can see it in things like MITRE ATLAS (a fresh threat taxonomy) and OWASP LLM Top 10 (concrete new risks like prompt injection).

4. Regulation vs. implementation gap

  • Regulation-heavy side: EU AI Act, MITRE policy memos, UNESCO/OECD ethics are all conceptual or legal. This is fine except...
  • Implementation-light side: Only a few (e.g., CISA/NCSC secure AI dev, OWASP Testing Guide) actually tell engineers how to build or defend systems.
  • This leaves a vacuum: rules are being written faster than usable security engineering practices.

5. Globalization meets fragmentation

  • UNESCO, OECD, G7, INASI push for global harmonization.
  • But EU AI Act, CISA guidelines, UK Standards Hub all point to regional fragmentation.
  • Companies face a world where AI is "global by design, regulated locally by law." The burden is harmonizing across conflicting signals.

6. Cultural subtext: fear of “black box” systems

  • Most guidance (NIST RMF, ISO 42001, AI Act) centers around transparency, accountability, oversight.
  • That’s really a way of saying: “we don’t trust opaque algorithms.”
  • The core anxiety isn’t just data misuse... it’s losing human agency and visibility when decisions migrate into AI.

7. The rise of “assurance” as a currency

  • Assurance shows up repeatedly (MITRE AI Assurance, ISO/IEC 25059, CISA guidelines).
  • It suggests the world is shifting from just “secure design” to provable, auditable trustworthiness. This is what regulators, auditors, and customers want so that they can independently verify trust.

8. Early signs of standardization fatigue

  • There’s a lot of duplication. NIST, ENISA, ISO, CSA, OWASP are all publishing lists of “controls” and “practices.”
  • This could create compliance theater: organizations checking boxes against multiple overlapping frameworks without materially improving AI security.
  • The challenge will be convergence vs. chaos.

Included AI Security Frameworks and Guidelines:

  • CISA (Best Practices Guide for Securing AI Data, Guidelines for Secure AI System Development)
  • Cloud Security Alliance (CSA) (AI Safety Initiative)
  • ENISA (Multilayer Framework for Good Cybersecurity Practices for AI, Cybersecurity of AI and Standardisation)
  • European Union / European Commission (EU AI Act, Guidelines on Prohibited AI Practices)
  • G7 (Hiroshima Process) (International Code of Conduct for Organizations Developing Advanced AI Systems)
  • International Network of AI Safety Institutes (INASI) (International Network of AI Safety Institutes)
  • ISO/IEC (ISO/IEC 42001:2023, ISO/IEC DIS 27090, ISO/IEC 25059:2023)
  • MITRE (MITRE ATLAS, A Sensible Regulatory Framework for AI Security, Assuring AI Security & Safety through AI Regulation, AI Assurance Guide)
  • NIST (AI Risk Management Framework 1.0, Generative AI Profile, Trustworthy & Responsible AI [AIRC hub])
  • OECD (Recommendation of the Council on Artificial Intelligence)
  • OWASP (AI Security & Privacy Guide, Top 10 for LLM Applications, AI Exchange, AI Testing Guide, Securing Agentic Applications Guide 1.0)
  • UK AI Standards Hub (BSI/NPL/Turing) (AI Standards Hub)
  • UNESCO (Recommendation on the Ethics of Artificial Intelligence)
  • United Nations / ITU (Guide to Developing a National Cybersecurity Strategy)

Thursday, August 14, 2025

Tradeoffs to Consider: Serving Model Predictions

Credit to Santiago over at ML.School for thinking through and sharing this image during his course, Building AI/ML Systems That Don't Suck.

This is similar to other tradeoffs in business where it's important to shape, temper, and communicate your expectations. What are your priorities? How do you solve the business problem while juggling the quality, speed, and cost of the output? 

Spend the time upfront defining the business problem and the ideal solution. 

The image is self-explanatory.