What Is the Real Cost of std::memory_order on ARM64? - Jetson Orin Benchmark
We measured whether atomic memory ordering can become a performance bottleneck in a 1 kHz real-time (RT) loop on Jetson Orin (Cortex-A78AE). On AArch64, seq_cst costs virtually the same as release/acquire, and 25 atomic operations together consume less than 0.01% of the 1 ms cycle budget (that is, under 100 ns in total).
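
To make the budget arithmetic concrete: 0.01% of 1 ms is 100 ns, so 25 atomic operations fit under that line only if each averages less than about 4 ns. The following is a minimal sketch of how such a comparison could be set up. It is not the benchmark used for this article. The names `g_value`, `ns_per_store`, and `budget_percent` are hypothetical, and the loop times uncontended single-threaded stores only, so it says nothing about cache-line contention between cores.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical shared variable; namespace scope so the stores target
// an object that is visible outside the timing function.
std::atomic<std::uint64_t> g_value{0};

// Average wall-clock nanoseconds per atomic store for the given ordering.
// Assumption: an uncontended, single-threaded loop. Loop overhead is
// included in the result, so treat the number as an upper bound.
template <std::memory_order Order>
double ns_per_store(std::uint64_t iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        g_value.store(i, Order);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}

// Share of the cycle budget (in percent) consumed by ops_per_cycle
// atomic operations that each cost ns_per_op nanoseconds.
constexpr double budget_percent(double ns_per_op, int ops_per_cycle,
                                double budget_ns) {
    return 100.0 * ns_per_op * ops_per_cycle / budget_ns;
}
```

On the target, one could call `ns_per_store<std::memory_order_release>(n)` and `ns_per_store<std::memory_order_seq_cst>(n)` with a large `n`, then pass each result to `budget_percent(ns, 25, 1'000'000.0)` to express it against the 1 ms cycle. A throughput loop like this is only a rough proxy: it does not capture worst-case latency inside an actual RT loop.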