AEP Module

The AEP (Application Event Processing) module contains the canonical end-to-end performance benchmark for X Platform. This benchmark measures the complete Receive-Process-Send flow of a clustered microservice.

Overview

The AEP module benchmarks exercise the entire X Platform stack, including:

  • Messaging: Inbound and outbound message handling

  • Serialization: Message encoding/decoding (Xbuf2)

  • Handler Dispatch: Event routing to business logic

  • State Management: Object store operations

  • Persistence: Transaction log writes

  • Clustering: State replication to backup instances

  • Consensus: Acknowledgment protocol between primary and backup

This represents the most comprehensive benchmark in the suite and is used to publish official performance metrics for X Platform releases.

Test Programs

The AEP module provides two test programs:

ESProcessor (Event Sourcing)

Class: com.neeve.perf.aep.engine.ESProcessor

The Event Sourcing processor is the canonical benchmark used for published X Platform performance results. It uses the Event Sourcing HA policy, in which:

  • Messages are the source of truth

  • State is replayed from message log on recovery

  • Optimal for high-throughput message processing

Used for: Official X Platform performance benchmarks published in documentation.

SRProcessor (State Replication)

Class: com.neeve.perf.aep.engine.SRProcessor

The State Replication processor uses the State Replication HA policy, in which:

  • State objects are the source of truth

  • State changes are replicated to backup

  • State is persisted and recovered directly

Used for: Benchmarking state-heavy applications.
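
To make the contrast between the two HA policies concrete, the sketch below shows generic recovery logic under each model. This is illustrative Java only, not X Platform API: with Event Sourcing the logged messages are replayed to rebuild state, while with State Replication the replicated state value is restored directly.

    import java.util.List;

    // Generic illustration of the two recovery models; not X Platform code.
    public class HaPolicyRecoverySketch {

        static final class Counter {
            long value;
        }

        // Event Sourcing: messages are the source of truth, so state is rebuilt
        // by replaying every logged message through the business logic.
        static Counter recoverByReplay(List<Long> loggedIncrements) {
            Counter c = new Counter();
            for (long inc : loggedIncrements) {
                c.value += inc;   // re-execute the handler logic per message
            }
            return c;
        }

        // State Replication: state objects are the source of truth, so the
        // replicated/persisted value is restored directly, with no replay.
        static Counter recoverFromReplicatedState(long replicatedValue) {
            Counter c = new Counter();
            c.value = replicatedValue;
            return c;
        }

        public static void main(String[] args) {
            System.out.println(recoverByReplay(List.of(1L, 2L, 3L)).value);   // 6
            System.out.println(recoverFromReplicatedState(6L).value);         // 6
        }
    }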

Canonical Benchmark Details

Complete details of the canonical benchmark methodology and the published results are covered in the Test Description and in the Published Results section below.

Test Message

The benchmark uses a Car message (defined in nvx-perf-models) that:

  • Exercises the complete Xbuf2 data model

  • Contains ~200 bytes when serialized

  • Includes primitives, strings, nested objects, and arrays

  • Represents a realistic business message

Message Access Methods

The benchmark tests two message access patterns:

Indirect Access (xbuf2.serial/xbuf2.random)

Message data accessed via POJO getters/setters:
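
With indirect access, handler code works with the message as an ordinary object through generated getters and setters; the object is encoded into the wire format separately. The snippet below is a simplified, self-contained stand-in for that pattern; the real Car class is generated from the Xbuf2 model in nvx-perf-models and has a much richer schema:

    // Simplified stand-in for the generated Car message; not the real class.
    public class IndirectAccessExample {

        static final class Car {
            private String make;
            private int modelYear;

            String getMake()            { return make; }
            void setMake(String v)      { make = v; }
            int getModelYear()          { return modelYear; }
            void setModelYear(int v)    { modelYear = v; }
        }

        public static void main(String[] args) {
            Car car = new Car();
            car.setMake("Acme");        // each setter copies the value into the POJO...
            car.setModelYear(2024);
            // ...which is only encoded into the wire buffer when the message is sent.
            System.out.println(car.getMake() + " " + car.getModelYear());
        }
    }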

Direct Access (Serializer/Deserializer)

Message data accessed via zero-copy serializers:
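
With direct access, fields are written to and read from the serialized buffer in place, so no intermediate message object is allocated. The flyweight below is only a minimal sketch of that idea with assumed field offsets; it is not the Serializer/Deserializer API generated for the Car message:

    import java.nio.ByteBuffer;

    // Minimal zero-copy flyweight sketch; not the generated Serializer/Deserializer.
    public class DirectAccessExample {

        static final class CarFlyweight {
            private static final int MODEL_YEAR_OFFSET = 0;   // assumed layout
            private static final int PRICE_OFFSET = 4;
            private final ByteBuffer buf;

            CarFlyweight(ByteBuffer buf) { this.buf = buf; }

            void writeModelYear(int y)  { buf.putInt(MODEL_YEAR_OFFSET, y); }
            int readModelYear()         { return buf.getInt(MODEL_YEAR_OFFSET); }
            void writePrice(double p)   { buf.putDouble(PRICE_OFFSET, p); }
            double readPrice()          { return buf.getDouble(PRICE_OFFSET); }
        }

        public static void main(String[] args) {
            // The same buffer could be handed straight to the transport for sending.
            CarFlyweight car = new CarFlyweight(ByteBuffer.allocateDirect(64));
            car.writeModelYear(2024);
            car.writePrice(31999.0);
            System.out.println(car.readModelYear() + " @ " + car.readPrice());
        }
    }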

Direct access provides approximately 10% lower latency and roughly 2.4x higher throughput than indirect access.

Running the Benchmark

Prerequisites

  1. An extracted X Platform performance benchmark distribution

  2. Two Linux servers with InfiniBand or 10GbE networking (for clustered tests)

  3. Synchronized clocks between the servers

Quick Single-Instance Test

For quick validation without clustering:
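
A minimal run can be launched directly from the extracted distribution with only the general parameters. The classpath below is an assumption about the distribution layout, and exact flag syntax (for example, whether boolean flags take explicit values) should be confirmed with --help:

    # Single instance, no clustering or persistence.
    # The classpath is illustrative; use the jar layout of your extracted distribution.
    java -cp "lib/*" com.neeve.perf.aep.engine.ESProcessor \
        --encoding xbuf2.serial \
        --count 10000000 \
        --rate 100000 \
        --warmupTime 2 \
        --printIntervalStats true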

Full Clustered Latency Test

For the complete canonical benchmark matching published results, see the Test Description for detailed command-line examples with clustering, persistence, and CPU affinitization.
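
As a rough sketch of how the clustering, persistence, and affinity parameters combine (the classpath, addresses, log location, and affinity masks below are placeholders, not the published configuration), a primary instance might be launched along these lines:

    # Illustrative primary-side invocation only; the canonical settings are in the
    # Test Description. Confirm exact flag syntax with --help.
    java -cp "lib/*" com.neeve.perf.aep.engine.ESProcessor \
        --encoding xbuf2.serial \
        --count 10000000 \
        --rate 100000 \
        --enableClustering true \
        --clusteringLocalIfAddr 192.168.1.10 \
        --clusteringDiscoveryLocalIfAddr 192.168.1.10 \
        --enablePersistence true \
        --persisterLogLocation /data/txlog \
        --persisterFlushOnCommit true \
        --injectorCPUAffinityMask <mask> \
        --muxCPUAffinityMask <mask>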

Command-Line Parameters

General Parameters

Parameter               Default        Description
--encoding              xbuf2.serial   Message encoding: xbuf2.serial, xbuf2.random
--count                 10,000,000     Number of messages to inject
--rate                  100,000        Message injection rate (msgs/sec)
--warmupTime            2              Warmup time in seconds before collecting stats
--printIntervalStats    false          Print interval stats during test

CPU Affinity Parameters

Parameter                           Default   Description
--injectorCPUAffinityMask           null      CPU mask for message injector thread
--muxCPUAffinityMask                null      CPU mask for event multiplexer thread
--busDetachedSendCPUAffinityMask    null      CPU mask for detached sender thread

Persistence Parameters

Parameter                   Default   Description
--enablePersistence         false     Enable transaction log persistence
--persisterLogLocation      .         Directory for transaction log
--persisterFlushOnCommit    false     Flush log on every commit

Clustering Parameters

Parameter                           Default   Description
--enableClustering                  false     Enable clustering (primary/backup)
--clusteringLocalIfAddr             0.0.0.0   Local interface for replication
--clusteringDiscoveryLocalIfAddr    0.0.0.0   Local interface for discovery

For complete parameter reference, see the source distribution or run with --help.

Interpreting Results

Latency Results

The test outputs latency percentiles in microseconds. The reported latency includes:

  • Inbound deserialization

  • Handler dispatch

  • Business logic execution

  • Persistence

  • Replication to backup

  • Consensus acknowledgment

  • Outbound serialization

  • Round-trip wire latency (~23µs on an unoptimized network)

Throughput Results

The test also reports the maximum sustained throughput: the highest rate at which the clustered microservice can process messages while maintaining:

  • Full persistence

  • Replication to backup

  • Consensus acknowledgment

Published Results

Official performance results from this benchmark are published in the X Platform release documentation.

Results include:

  • Multiple CPU configurations (MinCPU, Default, MaxCPU)

  • Optimization modes (Latency, Throughput)

  • Message access methods (Indirect, Direct)

  • Detailed performance analysis and tuning recommendations
