X Platform 3.16.34

Performance results for X Platform 3.16.34, based on testing conducted in February 2025.

Test Configuration

  • X Platform Version: 3.16.34

  • Java Runtime: Oracle Java 8

  • Message Encoding: Xbuf2

  • Cluster Configuration: Primary + Backup with persistence and replication

  • Test Hardware: Intel Xeon Gold 6334 (8-Core, 3.6 GHz), 128GB RAM, InfiniBand network

  • Message Rate (Latency Tests): 10,000 messages/second

  • Message Rate (Throughput Tests): As fast as possible (saturated)

See the Test Description for complete test methodology and hardware specifications.

Latency Results

All latency numbers are in microseconds (µs). Round-trip wire latency (~23µs on an unoptimized network) is included in all results.

Indirect Message Access | Latency Optimization

Configuration: Xbuf2.Indirect | OptimizeFor=Latency | Message Rate=10,000/sec

Message data accessed via POJO setter/getter methods.
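To make the indirect style concrete, here is a minimal sketch. The `OrderMessage` class and its fields are hypothetical, invented for illustration; real X Platform message classes are generated from your message model, and their accessors marshal to and from the encoded Xbuf2 buffer (plain Java fields stand in for that here).

```java
// Hypothetical message POJO, for illustration only -- not X Platform API.
// In the indirect style the application reads and writes message fields
// through generated getters/setters; each access goes through a method
// call and copies the value out of (or into) the encoded message.
public class OrderMessage {
    private long orderId;
    private double price;

    public long getOrderId() { return orderId; }
    public void setOrderId(long v) { this.orderId = v; }

    public double getPrice() { return price; }
    public void setPrice(double v) { this.price = v; }
}
```

A handler then works with ordinary Java values, e.g. `double notional = msg.getPrice() * qty;`, paying the per-field copy cost each time.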

CPU Config   # CPUs   50th % (µs)   99th % (µs)   99.9th % (µs)
MinCPU       1        40.90         43.31         44.95
Default      4        30.55         34.85         38.27
MaxCPU       6        32.44         36.61         40.23

Best Configuration: Default (4 CPUs) - 30.55µs median latency

Direct Message Access | Latency Optimization

Configuration: Xbuf2.Direct | OptimizeFor=Latency | Message Rate=10,000/sec

Message data accessed via serializer/deserializer objects (zero-copy access).
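The direct style can be sketched as a zero-copy "flyweight" that reads fields straight out of the message bytes. The `OrderFlyweight` class and its fixed field offsets below are invented for this sketch and are not the actual X Platform serializer/deserializer API:

```java
import java.nio.ByteBuffer;

// Illustrative zero-copy flyweight -- not X Platform API. Fields are read
// directly from the underlying buffer at fixed offsets instead of being
// copied into a POJO.
public class OrderFlyweight {
    private static final int ORDER_ID_OFFSET = 0; // long, 8 bytes
    private static final int PRICE_OFFSET    = 8; // double, 8 bytes

    private ByteBuffer buffer;

    public OrderFlyweight wrap(ByteBuffer buffer) {
        this.buffer = buffer; // no copy: just repoint at the message bytes
        return this;
    }

    public long orderId() { return buffer.getLong(ORDER_ID_OFFSET); }
    public double price() { return buffer.getDouble(PRICE_OFFSET); }
}
```

Because nothing is copied, a single flyweight instance can be re-wrapped over each inbound message; avoiding the per-message marshalling is what the "zero-copy access" label refers to.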

CPU Config   # CPUs   50th % (µs)   99th % (µs)   99.9th % (µs)
MinCPU       1        37.72         40.07         41.43
Default      4        27.34         30.56         34.04
MaxCPU       6        29.09         33.08         47.14

Best Configuration: Default (4 CPUs) - 27.34µs median latency

Throughput Results

Throughput measured in messages per second. Test mode: saturated load (as fast as possible).

Indirect Message Access | Throughput Optimization

Configuration: Xbuf2.Indirect | OptimizeFor=Throughput | Message Rate=[As Fast As Possible]

CPU Config   # CPUs   Throughput (msgs/sec)
MinCPU       1        117,124
Default      4        90,974
MaxCPU       6        56,419

Best Configuration: MinCPU (1 CPU) - 117,124 msgs/sec

Direct Message Access | Throughput Optimization

Configuration: Xbuf2.Direct | OptimizeFor=Throughput | Message Rate=[As Fast As Possible]

CPU Config   # CPUs   Throughput (msgs/sec)
MinCPU       1        282,038
Default      4        281,758
MaxCPU       6        107,336

Best Configuration: MinCPU (1 CPU) - 282,038 msgs/sec

Performance Analysis

Latency Characteristics

  1. Optimal CPU Configuration: 4 CPUs (Default) provides the best latency with both access methods

    • Balances parallelization benefits against thread coordination overhead

    • 25-28% lower median latency than the MinCPU configuration

  2. Direct vs Indirect Access: Direct access reduces latency by ~10%

    • Median: 27.34µs (Direct) vs 30.55µs (Indirect)

    • Avoids POJO method call overhead and intermediate object creation

  3. Tail Latency: 99.9th percentile latencies stay within ~1.25x of the median in all but the Direct/MaxCPU configuration (~1.6x)

    • Indicates consistent, predictable performance

    • Good mechanical sympathy with modern hardware
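The quoted ratios can be recomputed directly from the Default rows of the two latency tables. This is plain arithmetic over the published figures; no platform APIs are involved:

```java
// Recompute the latency ratios quoted in the analysis from the tables above.
public class LatencyRatios {
    // Fractional improvement of direct over indirect access.
    static double improvement(double indirect, double direct) {
        return (indirect - direct) / indirect;
    }

    // Ratio of 99.9th-percentile latency to the median.
    static double tailRatio(double p999, double p50) {
        return p999 / p50;
    }

    public static void main(String[] args) {
        // Default-config medians: Indirect 30.55us vs Direct 27.34us -> ~10%
        System.out.printf("median improvement: %.1f%%%n",
                100 * improvement(30.55, 27.34));
        // Direct/Default: 34.04us (99.9th) vs 27.34us (median) -> ~1.25x
        System.out.printf("tail ratio: %.2fx%n", tailRatio(34.04, 27.34));
    }
}
```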

Throughput Characteristics

  1. Optimal CPU Configuration: MinCPU (1 CPU) provides the best throughput

    • Lightweight message handler benefits from single-threaded execution

    • Thread handoff overhead exceeds parallelization benefits in this test

    • Note: Real applications with heavier business logic may benefit from more CPUs

  2. Direct vs Indirect Access: Direct access provides 2.4x throughput improvement

    • MinCPU: 282K msgs/sec (Direct) vs 117K msgs/sec (Indirect)

    • Zero-copy access eliminates serialization bottleneck

  3. CPU Scaling: Throughput decreases with more CPUs in this lightweight test

    • The test's high-performance disk and zero-cost outbound messaging (in-process driver) leave little work to parallelize

    • Thread coordination overhead exceeds benefits

    • Applications with heavier processing loads will see different scaling characteristics
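Similarly, the 2.4x direct-vs-indirect figure follows from the MinCPU rows of the two throughput tables:

```java
// Recompute the direct-vs-indirect throughput speedup from the tables above.
public class ThroughputRatio {
    static double speedup(double direct, double indirect) {
        return direct / indirect;
    }

    public static void main(String[] args) {
        // MinCPU: 282,038 (Direct) vs 117,124 (Indirect) msgs/sec
        System.out.printf("speedup: %.1fx%n", speedup(282_038, 117_124));
    }
}
```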

Tuning Recommendations

For Lowest Latency

  1. Use Direct message access (serializer/deserializer objects)

  2. Configure 4 CPUs (Default configuration)

  3. Enable latency optimization mode

  4. Expected: ~27µs median, ~31µs 99th percentile

For Highest Throughput

  1. Use Direct message access (serializer/deserializer objects)

  2. Configure MinCPU (minimal thread count)

  3. Enable throughput optimization mode

  4. Expected: ~280K msgs/sec

Application-Specific Considerations

  • Heavier business logic: May benefit from more CPUs (Default or MaxCPU)

  • Complex message transformations: Direct access provides larger benefits

  • Network-limited scenarios: VMA enablement can further reduce latency

  • Multiple microservices per host: CPU isolation becomes critical
