Cogita-PRO Anomaly Detection

Olivera Stojanovic
January 21, 2026

One of the most useful features of Cogita-PRO is anomaly detection. By definition, an anomaly is something that happens unexpectedly and rarely, deviating from the norm: an outlier. These can be some of the most interesting situations to arise in chip verification … if we can find them.

Since DV is primarily spec-driven, traditional methods include directed and constrained-random tests with coverage models to flag bugs. We code what we can anticipate. But what about unexpected and, therefore, unmodeled bugs? Behaviors that our coverage models miss entirely?

And, of course, these anomalies can hide among millions of signals, thousands of modes and gigabytes of simulation logs. Humans simply can’t inspect this manually. 

The challenge: catch unknown unknowns at scale. 

Types of Anomalies

In most cases, RTL and testbench bugs arise from a specific combination of data or an unusual sequence of events.
 

  1. Data anomalies: When patterns diverge from the norm, the data content can be inverted, shifted, misaligned, or otherwise odd. Perhaps a rare combination of field values within a packet. Unusual data repetition can also be a symptom of incorrect behavior.
  2. Sequence pattern anomalies: In this case, the error is not caused by unusual data values but by a unique event sequence that deviates from all previous execution patterns. Detecting such temporal anomalies helps reveal what sequence of interactions led to the issue and how it differs from normal behavior.

Cogita-PRO provides a suite of anomaly-detection algorithms tailored for verification datasets.

Moreover, Cogita-PRO layers anomaly-detection algorithms to eliminate false positives and ensure the user sees only the most relevant results. This pipelining of algorithms can be configured by the user, or Cogita-PRO can gather the results and present a unified set of overall conclusions.
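To make the layering idea concrete, here is a minimal sketch (not Cogita-PRO's implementation) in which a record is reported only when two independent detectors, a z-score filter and a rarity filter, both flag it. Requiring agreement between layers is what suppresses single-method false positives; all data below is invented for illustration.

```python
from collections import Counter

def zscore_flags(values, threshold=2.0):
    """Flag indices whose value sits far from the mean of the dataset."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5 or 1.0
    return {i for i, v in enumerate(values) if abs(v - mean) / std > threshold}

def rarity_flags(values, max_count=1):
    """Flag indices whose exact value occurs at most `max_count` times."""
    counts = Counter(values)
    return {i for i, v in enumerate(values) if counts[v] <= max_count}

def pipeline(values):
    # A record is reported only if BOTH layers agree it is anomalous.
    return sorted(zscore_flags(values) & rarity_flags(values))

latencies = [10, 11, 10, 12, 10, 11, 10, 250]  # one extreme spike at index 7
print(pipeline(latencies))  # [7]
```

Intersecting the layers trades recall for precision; a union (or weighted vote) would be the opposite configuration of the same pipeline.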

Anomaly type: Data

1. Neural network model training and use of the trained models

Data Scale Type: Large-scale, high-dimensional (big datasets with many occurrences and many data-field columns)

Use case/application: Models built on passing tests are applied to failing tests at subsystem or SoC level

Key benefits:

  • Captures non-linear feature interactions and complex data dependencies
  • Generalizes across test scenarios when trained on diverse passing data
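The train-on-passing, score-on-failing workflow can be sketched as follows. As a stand-in for a trained neural network, this toy uses a linear autoencoder (one principal component via SVD); the fields, data, and the burst/length relation are all invented. The point it demonstrates is the workflow: transactions that break correlations learned from passing runs get a large reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented passing-test transactions: two correlated fields, length ~= 2 * burst.
burst = rng.integers(1, 9, size=500).astype(float)
passing = np.column_stack([burst, 2 * burst + rng.normal(0.0, 0.1, 500)])

mean = passing.mean(axis=0)
X = passing - mean
# One principal direction (via SVD) plays the role of the trained encoder/decoder.
_, _, vt = np.linalg.svd(X, full_matrices=False)
w = vt[0]

def score(rows):
    """Reconstruction error: large when a row breaks the learned correlation."""
    c = rows - mean
    recon = np.outer(c @ w, w)          # encode to 1-D, decode back
    return np.linalg.norm(c - recon, axis=1)

normal_err = score(passing).max()       # worst error seen on passing data
suspect = np.array([[4.0, 8.0],         # obeys length ~= 2 * burst
                    [4.0, 1.0]])        # violates the learned relation
print(score(suspect) > normal_err)      # only the second row is anomalous
```

A real network replaces the SVD with nonlinear layers, which is what captures the non-linear interactions the bullet above refers to.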


2. Ensemble

Data Scale Type: Large-scale, high-dimensional (big datasets with many occurrences and many data-field columns)

Use case/application: Detects anomalies in multi-field value combinations within a single test execution

Key benefits:

  • Identifies rare multi-dimensional combinations 
  • Highlights specific field combinations contributing to anomaly score
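A minimal sketch of the ensemble idea (not the product's algorithm): each ensemble member scores the rarity of one field-pair combination, and averaging the members surfaces transactions whose joint values are rare even though each field value alone is common. The field names and data are invented.

```python
from collections import Counter
from itertools import combinations

def ensemble_scores(rows):
    """Average the rarity of every field-pair combination across members."""
    members = list(combinations(range(len(rows[0])), 2))  # one member per pair
    tables = {m: Counter(tuple(r[i] for i in m) for r in rows) for m in members}
    n = len(rows)
    return [
        sum(1 - tables[m][tuple(r[i] for i in m)] / n for m in members)
        / len(members)
        for r in rows
    ]

# Fields: opcode, burst, resp. Every single value is common, but the joint
# combination (WRITE, burst=1) occurs exactly once, in the last transaction.
rows = [("WRITE", 4, "OK")] * 40 + [("READ", 1, "OK")] * 40 + [("WRITE", 1, "OK")]
scores = ensemble_scores(rows)
print(scores.index(max(scores)))  # 80: the transaction with the rare combination
```

Because each member is tied to a specific field pair, the highest-scoring members directly name the field combination driving the anomaly score, matching the second bullet above.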


3. Descriptive analysis

Data Scale Type: Large-scale, high-dimensional (big datasets with many occurrences and many data-field columns)

Use case/application: Post-detection explainability layer that analyzes flagged anomalies to identify distinguishing characteristics compared to normal transactions within the test

Key benefits:

  • Root cause attribution - identifies which specific fields/combinations deviate from baseline
  • Anomaly interpretability - translates probabilistic scores into actionable debug insights
  • Feature importance ranking - prioritizes which data fields contribute most to anomaly classification
  • Comparative analysis - quantifies how suspect transactions differ from the cluster of normal transactions
  • Reduces debug time - eliminates manual comparison of thousands of field values
  • Multi-algorithm fusion - explains anomalies detected by any upstream detection method (NN, ensemble, statistical)
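The ranking step can be sketched under simple assumptions: order the fields of a flagged transaction by how many standard deviations each one sits from the normal-transaction baseline. Field names and values are invented for illustration.

```python
import statistics as st

def explain(suspect, normal_rows, field_names):
    """Rank fields by how far the suspect deviates from the normal baseline."""
    report = []
    for i, name in enumerate(field_names):
        col = [r[i] for r in normal_rows]
        mu, sigma = st.mean(col), st.pstdev(col) or 1.0
        report.append((name, abs(suspect[i] - mu) / sigma))
    return sorted(report, key=lambda t: -t[1])  # most deviant field first

# Fields: burst, length, latency. The suspect's data fields look normal;
# only its latency deviates from the baseline distribution.
normal = [(4, 64, 10), (4, 64, 11), (8, 128, 10), (8, 128, 12)]
suspect = (4, 64, 95)
print(explain(suspect, normal, ["burst", "length", "latency"])[0][0])  # latency
```

This turns a bare anomaly score into a per-field attribution a debug engineer can act on, rather than a manual comparison of thousands of field values.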

4. Extreme-value outliers

Data Scale Type: Small-scale and large-scale

Use case/application: Identifies extreme field-value outliers, extreme occurrence counts of otherwise-successful values, and timing-distribution outliers

Key benefits:

  • Detects timing violations and performance anomalies
  • Pinpoints transactions with unusual field values or frequencies
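One common way to flag timing-distribution outliers is an interquartile-range fence; the sketch below illustrates the idea and is not necessarily the method Cogita-PRO uses. The latencies are invented.

```python
def iqr_outliers(values, k=1.5):
    """Flag values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    s = sorted(values)
    q1, q3 = s[len(s) // 4], s[(3 * len(s)) // 4]
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < lo or v > hi]

# Per-transaction latencies (cycles); one transaction stalls far beyond the rest.
latencies = [12, 13, 11, 14, 12, 13, 12, 11, 13, 480]
print(iqr_outliers(latencies))  # [480]
```

Unlike a mean-based z-score, the IQR fence is robust to the outlier itself inflating the baseline, which matters on small datasets.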


Anomaly type: Sequence pattern 

1. Automated Transaction Path Extraction, Classification and Golden Reference Model for Failure Analysis

Data Scale Type: Small-scale and large-scale

Use case/application: This method is ideal for NoC fabrics and multi-path subsystems, where transaction routing exhibits high combinatorial diversity.

Key benefits - this automatic classification reveals:

  • Protocol conformance - whether transactions follow expected paths
  • Path diversity - how many variants exist in actual execution
  • Anomalous flows - rare or unexpected sequences
  • Execution coverage - which protocol paths are actually exercised
  • Differential comparison - flags deviations between failing runs and the learned reference
  • Learned specification - the golden model acts as a specification derived from actual passing behavior
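The learned-specification idea can be sketched in a few lines: collect the set of routing paths observed in passing runs, then flag any path in a failing run that falls outside that set. The node names below are invented, not from a real fabric.

```python
def learn_paths(passing_traces):
    """The set of hop sequences seen while passing: a learned specification."""
    return {tuple(t) for t in passing_traces}

def anomalous_paths(golden, failing_traces):
    """Differential check: paths in a failing run absent from the golden set."""
    return [t for t in failing_traces if tuple(t) not in golden]

passing = [["CPU0", "NI0", "R1", "R3", "MEM"],
           ["CPU1", "NI1", "R2", "R3", "MEM"]]
failing = [["CPU0", "NI0", "R1", "R3", "MEM"],        # known-good route
           ["CPU0", "NI0", "R1", "R2", "R3", "MEM"]]  # detour never seen passing
golden = learn_paths(passing)
print(anomalous_paths(golden, failing))
```

Counting occurrences per path instead of using a set would additionally expose rare-but-legal routes, i.e. the path-diversity bullet above.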


2. Automated Discrete State Machine Transition Extraction and Modeling

Data Scale Type: Small-scale and large-scale

Use case/application: Protocol FSM verification, transaction-type state tracking, multi-FSM concurrent analysis

Key benefits:

  • Automatic transition discovery - extracts complete state graph from logs
  • Illegal transition detection - identifies state sequences absent in golden model
  • Multi-field FSM correlation - tracks coupled state machine behavior
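A minimal sketch of transition extraction: derive the observed transition relation from passing logs, then report any transition in a failing trace that the learned graph lacks. The state names are illustrative, not from a real protocol FSM.

```python
def learn_transitions(traces):
    """Extract the observed state-transition relation from passing traces."""
    return {(a, b) for t in traces for a, b in zip(t, t[1:])}

def illegal_transitions(model, trace):
    """Report transitions in a trace that the learned model never contains."""
    return [(a, b) for a, b in zip(trace, trace[1:]) if (a, b) not in model]

passing = [["IDLE", "REQ", "GRANT", "DONE", "IDLE"],
           ["IDLE", "REQ", "RETRY", "REQ", "GRANT", "DONE", "IDLE"]]
model = learn_transitions(passing)

# The failing trace skips GRANT entirely: REQ -> DONE was never observed.
print(illegal_transitions(model, ["IDLE", "REQ", "DONE", "IDLE"]))
```

Keying the traces by an extra field (e.g. transaction ID) extends the same sketch to tracking several coupled FSMs concurrently.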


3. Automated Discrete State Machine Sequence Extraction and Modeling

Data Scale Type: Small-scale and large-scale

Use case/application: Temporal ordering violations, multi-FSM interaction patterns, causality chain analysis

Key benefits:

  • Temporal anomaly isolation - detects illegal event orderings
  • Multi-dimensional sequence comparison - analyzes concurrent field evolution
  • Pass/fail differential - highlights sequence deviations causing failures
  • Interaction pattern discovery - reveals unexpected cross-FSM dependencies
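Temporal-ordering checks can be sketched with n-grams: collect the event n-grams seen in passing runs, then report n-grams in a failing run that never occurred, which localizes the illegal ordering. The AXI-style event names (`ar`, `r`, `aw`, `w`, `b`) are illustrative only.

```python
def ngrams(seq, n=3):
    """All length-n windows of a sequence of events."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}

def sequence_deviations(passing_runs, failing_run, n=3):
    """n-grams in the failing run that no passing run ever produced."""
    legal = set().union(*(ngrams(r, n) for r in passing_runs))
    return sorted(ngrams(failing_run, n) - legal)

passing = [["ar", "r", "aw", "w", "b"],
           ["aw", "w", "b", "ar", "r"]]
failing = ["ar", "aw", "w", "b", "r"]  # the read response slips past the write
print(sequence_deviations(passing, failing))
```

The window size `n` trades sensitivity for noise: larger windows catch longer illegal orderings but flag more never-seen-but-legal patterns.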


Regression analysis

All anomaly detection algorithms can be deployed with Cogita-PRO in regression mode to perform real-time anomaly detection without user interaction. Then, in the event of an anomaly, an immediate alert is issued and the user can view the results, correlate the anomaly to any UVM errors or use it for regression triage. Cogita-PRO can then be launched in interactive mode and the regression results are immediately viewable. 

Conclusion

Verification anomalies—whether data-driven or sequence-driven—are among the hardest bugs to find because they arise from rare combinations of data and events. Cogita-PRO's suite of tailored algorithms automates their detection across all scales of verification data, from block-level to full SoC regression, enabling verification teams to focus on fixing bugs rather than hunting for them in massive log files.

Real-world example:

Memory Access Time Anomaly in a Multi-CPU, Shared-Memory NoC System

System Context

  • 4–8 CPU clusters (e.g., Cortex-A class or custom RISC-V)
  • Shared L3 cache + DRAM controller
  • Coherent NoC (AXI-based with QoS, virtual channels, and credit-based flow control)
  • Mix of real-time and best-effort traffic

Bug Scenario

Under specific traffic interleavings, one CPU experiences sporadic 10–50× memory access latency spikes, even though:

  • No deadlock occurs
  • No protocol violation is flagged
  • Performance counters look mostly normal in aggregate

This only happens:

  • When three or more CPUs issue bursts of write-backs
  • While another CPU issues cache-miss reads with low QoS
  • During LLC eviction pressure

Root Cause (Observed in Real Systems)

A rare interaction between:

  • NoC credit starvation on a return path
  • A QoS downgrade rule triggered when write buffers exceed a threshold
  • A fairness watchdog that incorrectly resets priority after a long stall

The result:

  • One CPU’s read responses get stuck behind write responses
  • The stall is long but finite, so watchdogs do not fire
  • The system “recovers” without errors, but latency spikes are extreme

Why This Is an Anomaly (Not a Simple Bug)

  • Average latency is fine
  • Max latency occasionally explodes
  • Happens only under rare traffic mixes
  • Reproducing it requires timing alignment, not a single bad condition

Anomaly Detection Angle

Instead of checking:

      “Did latency exceed X?”

Cogita-PRO detects:

  • Temporal patterns such as:
    • Repeated long gaps between AR and R channels for one master
    • Correlation between write-buffer occupancy and read starvation
  • Deviation from historical per-CPU latency distributions
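The last point can be sketched as a per-CPU tail check (the numbers are invented, not from a real system): compare each CPU's current maximum latency against that CPU's own history. Mean latency stays unremarkable; the max exposes the spike.

```python
import statistics as st

def tail_anomalies(history, current_max, k=3.0):
    """Flag CPUs whose current max latency deviates from their own history."""
    flagged = []
    for cpu, past in history.items():
        mu, sigma = st.mean(past), st.pstdev(past) or 1.0
        if (current_max[cpu] - mu) / sigma > k:
            flagged.append(cpu)
    return flagged

# Historical per-run max read latencies (cycles), per CPU.
history = {"cpu0": [20, 22, 21, 23], "cpu1": [20, 21, 22, 20]}
current = {"cpu0": 24, "cpu1": 800}  # cpu1 hits the 10-50x spike
print(tail_anomalies(history, current))  # ['cpu1']
```

Comparing each CPU against its own distribution, rather than a global threshold, is what lets this catch a spike that aggregate performance counters average away.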
