X Platform 3.16.34

Performance results for X Platform 3.16.34, based on testing conducted in February 2025.

Test Configuration

  • X Platform Version: 3.16.34

  • Java Runtime: Oracle Java 8

  • Message Encoding: Xbuf2

  • Cluster Configuration: Primary + Backup with persistence and replication

  • Test Hardware: Intel Xeon Gold 6334 (8-Core, 3.6 GHz), 128GB RAM, InfiniBand network

  • Message Rate (Latency Tests): 10,000 messages/second

  • Message Rate (Throughput Tests): As fast as possible (saturated)

See the Test Description for complete test methodology and hardware specifications.

Latency Results

All latency numbers are in microseconds (µs). Round-trip wire latency (~23µs on an unoptimized network) is included in all results.

Indirect Message Access | Latency Optimization

Configuration: Xbuf2.Indirect | OptimizeFor=Latency | Message Rate=10,000/sec

Message data accessed via POJO setter/getter methods.
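To make the indirect style concrete, here is a minimal sketch. The `OrderMessage` class and its fields are hypothetical, invented for illustration; real X Platform message classes are generated from your message model, and their accessors marshal to and from the encoded Xbuf2 buffer (plain Java fields stand in for that here).

```java
// Hypothetical message POJO, for illustration only -- not X Platform API.
// In the indirect style the application reads and writes message fields
// through generated getters/setters; each access goes through a method
// call and copies the value out of (or into) the encoded message.
public class OrderMessage {
    private long orderId;
    private double price;

    public long getOrderId() { return orderId; }
    public void setOrderId(long v) { this.orderId = v; }

    public double getPrice() { return price; }
    public void setPrice(double v) { this.price = v; }
}
```

A handler then works with ordinary Java values, e.g. `double notional = msg.getPrice() * qty;`, paying the per-field copy cost each time.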

CPU Config   # CPUs   50th % (µs)   99th % (µs)   99.9th % (µs)
MinCPU       1        40.90         43.31         44.95
Default      4        30.55         34.85         38.27
MaxCPU       6        32.44         36.61         40.23

Best Configuration: Default (4 CPUs) - 30.55µs median latency

Direct Message Access | Latency Optimization

Configuration: Xbuf2.Direct | OptimizeFor=Latency | Message Rate=10,000/sec

Message data accessed via serializer/deserializer objects (zero-copy access).
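The direct style can be sketched as a zero-copy "flyweight" that reads fields straight out of the message bytes. The `OrderFlyweight` class and its fixed field offsets below are invented for this sketch and are not the actual X Platform serializer/deserializer API:

```java
import java.nio.ByteBuffer;

// Illustrative zero-copy flyweight -- not X Platform API. Fields are read
// directly from the underlying buffer at fixed offsets instead of being
// copied into a POJO.
public class OrderFlyweight {
    private static final int ORDER_ID_OFFSET = 0; // long, 8 bytes
    private static final int PRICE_OFFSET    = 8; // double, 8 bytes

    private ByteBuffer buffer;

    public OrderFlyweight wrap(ByteBuffer buffer) {
        this.buffer = buffer; // no copy: just repoint at the message bytes
        return this;
    }

    public long orderId() { return buffer.getLong(ORDER_ID_OFFSET); }
    public double price() { return buffer.getDouble(PRICE_OFFSET); }
}
```

Because nothing is copied, a single flyweight instance can be re-wrapped over each inbound message; avoiding the per-message marshalling is what the "zero-copy access" label refers to.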

CPU Config   # CPUs   50th % (µs)   99th % (µs)   99.9th % (µs)
MinCPU       1        37.72         40.07         41.43
Default      4        27.34         30.56         34.04
MaxCPU       6        29.09         33.08         47.14

Best Configuration: Default (4 CPUs) - 27.34µs median latency

Throughput Results

Throughput measured in messages per second. Test mode: saturated load (as fast as possible).

Indirect Message Access | Throughput Optimization

Configuration: Xbuf2.Indirect | OptimizeFor=Throughput | Message Rate=[As Fast As Possible]

CPU Config   # CPUs   Throughput (msgs/sec)
MinCPU       1        117,124
Default      4        90,974
MaxCPU       6        56,419

Best Configuration: MinCPU (1 CPU) - 117,124 msgs/sec

Direct Message Access | Throughput Optimization

Configuration: Xbuf2.Direct | OptimizeFor=Throughput | Message Rate=[As Fast As Possible]

CPU Config   # CPUs   Throughput (msgs/sec)
MinCPU       1        282,038
Default      4        281,758
MaxCPU       6        107,336

Best Configuration: MinCPU (1 CPU) - 282,038 msgs/sec

Performance Analysis

Latency Characteristics

  1. Optimal CPU Configuration: 4 CPUs (Default) provides the best latency with both access methods

    • Balances parallelization benefits against thread coordination overhead

    • 25-28% lower median latency than the MinCPU configuration

  2. Direct vs Indirect Access: Direct access reduces latency by ~10%

    • Median: 27.34µs (Direct) vs 30.55µs (Indirect)

    • Avoids POJO method call overhead and intermediate object creation

  3. Tail Latency: 99.9th percentile latencies stay within ~1.25x of the median in all but the Direct/MaxCPU configuration (~1.6x)

    • Indicates consistent, predictable performance

    • Good mechanical sympathy with modern hardware
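The quoted ratios can be recomputed directly from the Default rows of the two latency tables. This is plain arithmetic over the published figures; no platform APIs are involved:

```java
// Recompute the latency ratios quoted in the analysis from the tables above.
public class LatencyRatios {
    // Fractional improvement of direct over indirect access.
    static double improvement(double indirect, double direct) {
        return (indirect - direct) / indirect;
    }

    // Ratio of 99.9th-percentile latency to the median.
    static double tailRatio(double p999, double p50) {
        return p999 / p50;
    }

    public static void main(String[] args) {
        // Default-config medians: Indirect 30.55us vs Direct 27.34us -> ~10%
        System.out.printf("median improvement: %.1f%%%n",
                100 * improvement(30.55, 27.34));
        // Direct/Default: 34.04us (99.9th) vs 27.34us (median) -> ~1.25x
        System.out.printf("tail ratio: %.2fx%n", tailRatio(34.04, 27.34));
    }
}
```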

Throughput Characteristics

  1. Optimal CPU Configuration: MinCPU (1 CPU) provides the best throughput

    • Lightweight message handler benefits from single-threaded execution

    • Thread handoff overhead exceeds parallelization benefits in this test

    • Note: Real applications with heavier business logic may benefit from more CPUs

  2. Direct vs Indirect Access: Direct access provides 2.4x throughput improvement

    • MinCPU: 282K msgs/sec (Direct) vs 117K msgs/sec (Indirect)

    • Zero-copy access eliminates serialization bottleneck

  3. CPU Scaling: Throughput decreases with more CPUs in this lightweight test

    • The test's high-performance disk and zero-cost outbound messaging (in-process driver) leave little work to parallelize

    • Thread coordination overhead exceeds benefits

    • Applications with heavier processing loads will see different scaling characteristics
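Similarly, the 2.4x direct-vs-indirect figure follows from the MinCPU rows of the two throughput tables:

```java
// Recompute the direct-vs-indirect throughput speedup from the tables above.
public class ThroughputRatio {
    static double speedup(double direct, double indirect) {
        return direct / indirect;
    }

    public static void main(String[] args) {
        // MinCPU: 282,038 (Direct) vs 117,124 (Indirect) msgs/sec
        System.out.printf("speedup: %.1fx%n", speedup(282_038, 117_124));
    }
}
```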

Tuning Recommendations

For Lowest Latency

  1. Use Direct message access (serializer/deserializer objects)

  2. Configure 4 CPUs (Default configuration)

  3. Enable latency optimization mode

  4. Expected: ~27µs median, ~31µs 99th percentile

For Highest Throughput

  1. Use Direct message access (serializer/deserializer objects)

  2. Configure MinCPU (minimal thread count)

  3. Enable throughput optimization mode

  4. Expected: ~280K msgs/sec

Application-Specific Considerations

  • Heavier business logic: May benefit from more CPUs (Default or MaxCPU)

  • Complex message transformations: Direct access provides larger benefits

  • Network-limited scenarios: VMA enablement can further reduce latency

  • Multiple microservices per host: CPU isolation becomes critical
