X Platform 3.16.34
Performance results for X Platform 3.16.34, based on testing conducted in February 2025.
Test Configuration
X Platform Version: 3.16.34
Java Runtime: Oracle Java 8
Message Encoding: Xbuf2
Cluster Configuration: Primary + Backup with persistence and replication
Test Hardware: Intel Xeon Gold 6334 (8-Core, 3.6 GHz), 128GB RAM, InfiniBand network
Message Rate (Latency Tests): 10,000 messages/second
Message Rate (Throughput Tests): As fast as possible (saturated)
See the Test Description for complete test methodology and hardware specifications.
Latency Results
All latency numbers are in microseconds (µs). Round-trip wire latency (~23µs on an unoptimized network) is included in all results.
Indirect Message Access | Latency Optimization
Configuration: Xbuf2.Indirect | OptimizeFor=Latency | Message Rate=10,000/sec
Message data accessed via POJO setter/getter methods.
| Configuration | CPUs | Median (µs) | 99th %ile (µs) | 99.9th %ile (µs) |
| --- | --- | --- | --- | --- |
| MinCPU | 1 | 40.90 | 43.31 | 44.95 |
| Default | 4 | 30.55 | 34.85 | 38.27 |
| MaxCPU | 6 | 32.44 | 36.61 | 40.23 |
Best Configuration: Default (4 CPUs) - 30.55µs median latency
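As a rough sketch, indirect access as described above looks like the following. The `OrderEvent` class and its accessors are invented for illustration and are not the platform's generated API; the point is that every field read and write crosses a POJO method-call boundary, which is the overhead the latency figures above reflect.

```java
// Hypothetical illustration of indirect message access: fields are read
// and written through POJO-style accessors. Class and method names are
// stand-ins, not the platform's actual generated code.
public class IndirectAccessExample {
    // Stand-in for a code-generated Xbuf2 message exposed as a POJO.
    static final class OrderEvent {
        private long orderId;
        private double price;

        long getOrderId()           { return orderId; }
        void setOrderId(long id)    { this.orderId = id; }
        double getPrice()           { return price; }
        void setPrice(double p)     { this.price = p; }
    }

    public static void main(String[] args) {
        OrderEvent event = new OrderEvent();
        event.setOrderId(42L);   // each setter call is a method-dispatch boundary
        event.setPrice(101.25);
        System.out.println(event.getOrderId() + " @ " + event.getPrice());
    }
}
```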
Direct Message Access | Latency Optimization
Configuration: Xbuf2.Direct | OptimizeFor=Latency | Message Rate=10,000/sec
Message data accessed via serializer/deserializer objects (zero-copy access).
| Configuration | CPUs | Median (µs) | 99th %ile (µs) | 99.9th %ile (µs) |
| --- | --- | --- | --- | --- |
| MinCPU | 1 | 37.72 | 40.07 | 41.43 |
| Default | 4 | 27.34 | 30.56 | 34.04 |
| MaxCPU | 6 | 29.09 | 33.08 | 47.14 |
Best Configuration: Default (4 CPUs) - 27.34µs median latency
Key Insight: Direct message access provides ~10% lower latency than indirect access by eliminating POJO method call overhead.
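For contrast, here is a minimal sketch of the zero-copy flyweight pattern that direct access relies on. The `OrderEventFlyweight` type and its fixed field offsets are hypothetical, not the platform's actual serializer API; the idea it demonstrates is the one in the text: fields are read and written at fixed offsets in the underlying buffer, with no intermediate POJO and no per-field object creation.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of direct (zero-copy) message access using a
// flyweight overlaid on the wire buffer. Type and offsets are invented
// for the sketch; the real platform's serializer API differs.
public class DirectAccessExample {
    // Stand-in for a generated serializer that overlays a wire buffer.
    static final class OrderEventFlyweight {
        private static final int ORDER_ID_OFFSET = 0;  // 8-byte long
        private static final int PRICE_OFFSET    = 8;  // 8-byte double
        private ByteBuffer buffer;

        void wrap(ByteBuffer buffer) { this.buffer = buffer; }

        long orderId()        { return buffer.getLong(ORDER_ID_OFFSET); }
        void orderId(long id) { buffer.putLong(ORDER_ID_OFFSET, id); }
        double price()        { return buffer.getDouble(PRICE_OFFSET); }
        void price(double p)  { buffer.putDouble(PRICE_OFFSET, p); }
    }

    public static void main(String[] args) {
        ByteBuffer wire = ByteBuffer.allocateDirect(16);  // the "wire" buffer
        OrderEventFlyweight event = new OrderEventFlyweight();
        event.wrap(wire);        // overlay: no copy, no intermediate object
        event.orderId(42L);      // written straight into the buffer
        event.price(101.25);
        System.out.println(event.orderId() + " @ " + event.price());
    }
}
```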
Throughput Results
Throughput measured in messages per second. Test mode: saturated load (as fast as possible).
Indirect Message Access | Throughput Optimization
Configuration: Xbuf2.Indirect | OptimizeFor=Throughput | Message Rate=[As Fast As Possible]
| Configuration | CPUs | Throughput (msgs/sec) |
| --- | --- | --- |
| MinCPU | 1 | 117,124 |
| Default | 4 | 90,974 |
| MaxCPU | 6 | 56,419 |
Best Configuration: MinCPU (1 CPU) - 117,124 msgs/sec
Direct Message Access | Throughput Optimization
Configuration: Xbuf2.Direct | OptimizeFor=Throughput | Message Rate=[As Fast As Possible]
| Configuration | CPUs | Throughput (msgs/sec) |
| --- | --- | --- |
| MinCPU | 1 | 282,038 |
| Default | 4 | 281,758 |
| MaxCPU | 6 | 107,336 |
Best Configuration: MinCPU (1 CPU) - 282,038 msgs/sec
Key Insight: Direct message access provides 2.4x higher throughput than indirect access by eliminating serialization overhead.
Performance Analysis
Latency Characteristics
Optimal CPU Configuration: 4 CPUs (Default) provides best latency across both access methods
Balances parallelization benefits against thread coordination overhead
Median latency 25-30% lower than the MinCPU configuration
Direct vs Indirect Access: Direct access reduces latency by ~10%
Median: 27.34µs (Direct) vs 30.55µs (Indirect)
Avoids POJO method call overhead and intermediate object creation
Tail Latency: at the recommended Default configuration, 99.9th percentile latencies stay within ~1.25x of the median (e.g., 34.04µs vs 27.34µs for Direct access)
Indicates consistent, predictable performance
Good mechanical sympathy with modern hardware
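The ratios cited in the bullets above can be recomputed directly from the Default-configuration rows of the latency tables:

```java
// Recomputing the latency ratios quoted above from the published table data.
public class LatencyRatios {
    public static void main(String[] args) {
        double indirectMedian = 30.55, indirect999 = 38.27;  // Indirect, Default (4 CPUs)
        double directMedian   = 27.34, direct999   = 34.04;  // Direct, Default (4 CPUs)

        // Direct vs indirect median improvement: ~10%
        double improvement = (indirectMedian - directMedian) / indirectMedian * 100;

        // 99.9th-percentile-to-median ratio at the Default configuration
        double indirectTail = indirect999 / indirectMedian;
        double directTail   = direct999 / directMedian;

        System.out.printf("improvement=%.1f%% indirectTail=%.2f directTail=%.2f%n",
                improvement, indirectTail, directTail);
    }
}
```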
Throughput Characteristics
Optimal CPU Configuration: MinCPU (1 CPU) provides best throughput
Lightweight message handler benefits from single-threaded execution
Thread handoff overhead exceeds parallelization benefits in this test
Note: Real applications with heavier business logic may benefit from more CPUs
Direct vs Indirect Access: Direct access provides 2.4x throughput improvement
MinCPU: 282K msgs/sec (Direct) vs 117K msgs/sec (Indirect)
Zero-copy access eliminates serialization bottleneck
CPU Scaling: Throughput decreases as CPUs are added in this lightweight test
With a high-performance disk and zero-cost outbound messaging (in-process driver), the message handler itself does very little work
Thread coordination overhead therefore exceeds any parallelization benefit
Applications with heavier processing loads will see different scaling characteristics
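The 2.4x figure quoted above can likewise be recomputed from the MinCPU rows of the throughput tables:

```java
// Recomputing the direct-vs-indirect throughput comparison from the table data.
public class ThroughputSpeedup {
    public static void main(String[] args) {
        double directMinCpu   = 282_038;  // Direct, MinCPU (msgs/sec)
        double indirectMinCpu = 117_124;  // Indirect, MinCPU (msgs/sec)

        double speedup = directMinCpu / indirectMinCpu;  // ~2.4x
        System.out.printf("speedup=%.1fx%n", speedup);
    }
}
```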
Tuning Recommendations
For Lowest Latency
Use Direct message access (serializer/deserializer objects)
Configure 4 CPUs (Default configuration)
Enable latency optimization mode
Expected: ~27µs median, ~31µs 99th percentile
For Highest Throughput
Use Direct message access (serializer/deserializer objects)
Configure MinCPU (minimal thread count)
Enable throughput optimization mode
Expected: ~280K msgs/sec
Application-Specific Considerations
Heavier business logic: May benefit from more CPUs (Default or MaxCPU)
Complex message transformations: Direct access provides larger benefits
Network-limited scenarios: VMA enablement can further reduce latency
Multiple microservices per host: CPU isolation becomes critical
Next Steps
Review Test Description for complete test methodology
Return to Performance Overview
See Threading Configuration for CPU tuning
Learn about Direct Serialization for optimal performance
Explore Thread Affinitization for CPU pinning