Getting the most out of SSH - hardware acceleration tuning for AES-NI

  • Posted on: 3 September 2015
  • By: Michał Turecki

On Intel CPUs, some OpenSSH ciphers can use the hardware-accelerated AES-NI instructions, which leads to significantly better performance. There is a fairly easy way to measure cipher performance on any particular Linux installation:

for i in $(ssh -Q cipher); do dd if=/dev/zero bs=1M count=100 2> /dev/null \
  | ssh -c "$i" someuser@localhost "(time -p cat) > /dev/null" 2>&1 \
  | awk -v c="$i" '/^real/ {print c ": " 100 / $2 " MB/s"}'; done

The script only works if someuser can log in without a password prompt, for example with key authentication (your public key is listed in someuser's ~/.ssh/authorized_keys).

You can replace both occurrences of 100 in the command above with 1000 to get a more reliable result, but 100 MB seems to be a good performance indicator, with little variance between runs.
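The arithmetic behind each result line is just the transfer size divided by the "real" time. A minimal sketch, assuming a hypothetical helper named throughput_mbs and illustrative timings (0.27 s and 2.7 s are made-up inputs, not measurements):

```shell
# throughput_mbs SIZE_MB SECONDS
# Prints SIZE_MB / SECONDS using awk's default number formatting,
# the same way the one-liner prints its results.
throughput_mbs() {
    awk -v size="$1" -v secs="$2" 'BEGIN { print size / secs }'
}

throughput_mbs 100 0.27     # a 100 MB run that took 0.27 s
throughput_mbs 1000 2.7     # a 1000 MB run at the same rate
```

With only 100 MB and a timer that reports hundredths of a second, fast ciphers finish in a few tenths of a second, so one timer tick changes the result by several MB/s. That is one reason the 1000 MB variant can be more reliable.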

A side note: on my Proxmox installation I needed to use the "host" CPU type to pass the AES extensions through to the guest machine (while still giving only 3 of the 4 cores to the guest VM, so that 1 spare core is always available). After this change, running cat /proc/cpuinfo on the VM shows 3 CPUs with the following flags, including aes:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr
sse sse2 ss syscall nx lm constant_tsc arch_perfmon rep_good nopl eagerfpu pni pclmulqdq ssse3
cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor
lahf_lm xsaveopt fsgsbase smep erms
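Rather than reading the flag list by eye, the aes flag can be checked with grep. This is a small sketch; has_aes_flag is a hypothetical helper name, and the optional file argument exists only so that it can be tried on a saved copy of /proc/cpuinfo:

```shell
# has_aes_flag [FILE]
# Succeeds if the whole word "aes" appears on a "flags" line of FILE
# (default: /proc/cpuinfo). Fails if the flag or the file is missing.
has_aes_flag() {
    grep '^flags' "${1:-/proc/cpuinfo}" 2>/dev/null | grep -qw 'aes'
}

if has_aes_flag; then
    echo "aes flag visible to this machine"
else
    echo "aes flag not visible to this machine"
fi
```

Run inside the guest, this tells you whether the VM's CPU type actually exposes the extension.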

With the AES extensions available, it is now obvious that only two ciphers perform really well compared to the others. Here is a benchmark of the ciphers supported by my OpenSSH installation:

3des-cbc: 21.1864 MB/s
blowfish-cbc: 80 MB/s
cast128-cbc: 70.922 MB/s
arcfour: 208.333 MB/s
arcfour128: 208.333 MB/s
arcfour256: 188.679 MB/s
aes128-cbc: 181.818 MB/s
aes192-cbc: 172.414 MB/s
aes256-cbc: 161.29 MB/s
rijndael-cbc@lysator.liu.se: 161.29 MB/s
aes128-ctr: 104.167 MB/s
aes192-ctr: 93.4579 MB/s
aes256-ctr: 85.4701 MB/s
aes128-gcm@openssh.com: 370.37 MB/s
aes256-gcm@openssh.com: 357.143 MB/s
chacha20-poly1305@openssh.com: 166.667 MB/s
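The list is printed in the order ssh -Q cipher reports the ciphers, so the fastest ones are not at the top. A possible way to rank the output, assuming it has been saved or piped in the "name: value MB/s" format shown above (rank_ciphers is a hypothetical helper name):

```shell
# rank_ciphers
# Reads "cipher: N MB/s" lines on stdin and prints them fastest first.
# LC_ALL=C keeps "." as the decimal separator regardless of locale.
rank_ciphers() {
    LC_ALL=C sort -t ':' -k 2 -n -r
}

printf '%s\n' \
    '3des-cbc: 21.1864 MB/s' \
    'aes128-gcm@openssh.com: 370.37 MB/s' \
    'aes128-ctr: 104.167 MB/s' | rank_ciphers
```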

Both aes128-gcm@openssh.com and aes256-gcm@openssh.com leave the competition far behind, so now it's time to make ssh actually use AES-256 first whenever possible. man sshd_config shows the default order of cipher preference (yours may differ):

aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,
aes128-gcm@openssh.com,aes256-gcm@openssh.com,
chacha20-poly1305@openssh.com,
aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,
aes256-cbc,arcfour

I trust that the OpenSSH team has tuned the default order properly in terms of security, yet AES-256 is an industry-standard cipher, so I hope that moving aes256-gcm@openssh.com to the front of the list is a reasonable choice.

Both sshd_config and ~/.ssh/config configuration files expect lines with supported ciphers:

# /etc/ssh/ssh_config (or ~/.ssh/config) and sshd_config
# Protocol version 1
Cipher aes256-gcm@openssh.com
# Protocol version 2
Ciphers aes256-gcm@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-gcm@openssh.com,chacha20-poly1305@openssh.com,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour
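The long Ciphers value does not have to be reordered by hand. Here is a sketch of a helper that moves one cipher to the front of a comma-separated list and leaves the rest in place; prefer_cipher is a hypothetical name, and the short list in the example is only for illustration:

```shell
# prefer_cipher PREFERRED LIST
# Prints the comma-separated LIST with PREFERRED moved to the front.
# If PREFERRED is not in LIST, LIST is printed unchanged.
prefer_cipher() {
    printf '%s\n' "$2" | tr ',' '\n' | awk -v p="$1" '
        $0 == p  { found = 1; next }       # hold the preferred cipher back
        $0 != "" { rest = rest "," $0 }    # keep the others in their order
        END {
            if (found) print p rest
            else       print substr(rest, 2)
        }'
}

prefer_cipher 'aes256-gcm@openssh.com' \
    'aes128-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-cbc'
```

Feeding it the default list from the man page should give the same value as the Ciphers line above.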

But before changing the configuration, it needs to be tested. I cannot use remote Windows SCP tools like FileZilla or WinSCP, as HDD performance would affect the results. Eventually I decided that the data needs to go from /dev/zero to /dev/null, since /dev/urandom has an additional performance hit.

On a side note, when using WinSCP, make sure to select the SCP protocol instead of SFTP (the default). The performance difference is huge.

Before changing any configuration:

time cat /dev/zero | head -c 1024M | ssh someuser@localhost 'cat >/dev/null'

real    0m10.146s
user    0m11.457s
sys     0m2.348s

To test whether the result differs when the hardware-accelerated AES-256 cipher is used:

time cat /dev/zero | head -c 1024M | ssh -c aes256-gcm@openssh.com someuser@localhost 'cat >/dev/null'

real    0m3.922s
user    0m1.718s
sys     0m1.920s
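To put a number on the difference, the two "real" times can be divided. A minimal sketch, with the times converted to plain seconds by hand (speedup is a hypothetical helper name):

```shell
# speedup BASELINE_SECONDS NEW_SECONDS
# Prints how many times faster the new run is, to two decimal places.
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", base / new }'
}

speedup 10.146 3.922    # "real" times of the default and the AES-GCM run
```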

Whoa, about 2.6 times faster - nice. Now, after changing sshd_config, it should stay that fast by default.

time cat /dev/zero | head -c 1024M | ssh someuser@localhost 'cat >/dev/null'

real    0m10.294s
user    0m11.530s
sys     0m2.195s

No. It's still slow, not to mention that without hardware acceleration the higher CPU usage slows down other tasks on the server...

So instead I added a line to my client configuration in ~/.ssh/config with all supported ciphers, again in the preferred order with aes256-gcm@openssh.com first, and ran the same test again:

real    0m3.790s
user    0m1.672s
sys     0m2.019s

It works.

Of course it works.

The client configuration determines the order in which ciphers are tried, not the server. So to connect with maximum performance, every user on every host needs to be configured to pick AES-256 by default.
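This matches how the SSH transport protocol (RFC 4253) picks a cipher: the first entry in the client's list that the server also offers wins, so the server's order does not matter. A simplified model of that rule, assuming plain comma-separated lists (negotiate_cipher is a hypothetical name and this is not OpenSSH's actual code):

```shell
# negotiate_cipher CLIENT_LIST SERVER_LIST
# Prints the first cipher in CLIENT_LIST that also appears in SERVER_LIST.
# Prints nothing and returns 1 if the two lists share no cipher.
negotiate_cipher() {
    awk -v client="$1" -v server="$2" 'BEGIN {
        n = split(client, c, ",")
        m = split(server, s, ",")
        for (j = 1; j <= m; j++) offered[s[j]] = 1   # set of server ciphers
        for (i = 1; i <= n; i++)                     # walk in client order
            if (c[i] in offered) { print c[i]; exit 0 }
        exit 1
    }'
}

server='aes256-gcm@openssh.com,aes128-ctr,aes256-ctr'
negotiate_cipher 'aes128-ctr,aes256-gcm@openssh.com' "$server"   # client prefers CTR
negotiate_cipher 'aes256-gcm@openssh.com,aes128-ctr' "$server"   # client prefers GCM
```

In the first call the result is aes128-ctr even though the server lists the GCM cipher first, which is the behaviour seen in the sshd_config experiment.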

When the cipher lines are added to /etc/ssh/ssh_config, all ssh connections from that machine use the configured order by default, so there is no need to set it per host.

Result:

  1. Have faster encrypted file transfers
  2. Won't strain the servers that much with (1)
  3. Have more free time.

It's definitely a win-win.