Amazon announced today the release of a new EC2 instance type targeted at I/O-dependent applications. We’ve benchmarked a lot of the Amazon plans in the past and found them rather lacklustre in terms of performance.
High I/O Quadruple Extra Large Instance Specs (hi1.4xlarge)
First, let’s run with cached I/O. From previous SSD testing, we should expect this to come in upwards of the 800 MB/s mark.
ioping . -c 10 -C
4096 bytes from . (ext3 /dev/xvdf): request=1 time=0.0 ms
4096 bytes from . (ext3 /dev/xvdf): request=2 time=0.0 ms
4096 bytes from . (ext3 /dev/xvdf): request=3 time=0.0 ms
4096 bytes from . (ext3 /dev/xvdf): request=4 time=0.0 ms
4096 bytes from . (ext3 /dev/xvdf): request=5 time=0.0 ms
4096 bytes from . (ext3 /dev/xvdf): request=6 time=0.0 ms
4096 bytes from . (ext3 /dev/xvdf): request=7 time=0.0 ms
4096 bytes from . (ext3 /dev/xvdf): request=8 time=0.0 ms
4096 bytes from . (ext3 /dev/xvdf): request=9 time=0.0 ms
4096 bytes from . (ext3 /dev/xvdf): request=10 time=0.0 ms
--- . (ext3 /dev/xvdf) ioping statistics ---
10 requests completed in 9000.9 ms, 217391 iops, 849.2 mb/s
min/avg/max/mdev = 0.0/0.0/0.0/0.0 ms
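ioping’s throughput figure is simply the request rate multiplied by the request size. Here is a minimal Python sketch of that arithmetic, assuming 4 KiB requests (the "4096 bytes" lines above) and assuming ioping’s "mb/s" means mebibytes (2^20 bytes) per second, which the reported numbers bear out. The helper name iops_to_mib_per_s is our own, not part of ioping.

```python
# Convert an IOPS figure into throughput, assuming a fixed request size.
# Assumption: ioping's "mb/s" is MiB/s (2**20 bytes), which matches the
# figures reported in the runs above.

MIB = 2 ** 20


def iops_to_mib_per_s(iops: float, request_bytes: int = 4096) -> float:
    """Throughput in MiB/s for `iops` requests of `request_bytes` each."""
    return iops * request_bytes / MIB


if __name__ == "__main__":
    # Cached ioping run: 217391 iops of 4 KiB requests.
    print(f"{iops_to_mib_per_s(217391):.1f} MiB/s")
```

The same conversion reproduces the direct I/O (3700 iops) and seek (7817 iops) throughput figures further down.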
Next, direct I/O, which bypasses the page cache. This number is actually comparable to some of the RAMNode SSD plans.
ioping . -c 10 -D
4096 bytes from . (ext3 /dev/xvdf): request=1 time=0.2 ms
4096 bytes from . (ext3 /dev/xvdf): request=2 time=0.3 ms
4096 bytes from . (ext3 /dev/xvdf): request=3 time=0.3 ms
4096 bytes from . (ext3 /dev/xvdf): request=4 time=0.3 ms
4096 bytes from . (ext3 /dev/xvdf): request=5 time=0.3 ms
4096 bytes from . (ext3 /dev/xvdf): request=6 time=0.3 ms
4096 bytes from . (ext3 /dev/xvdf): request=7 time=0.3 ms
4096 bytes from . (ext3 /dev/xvdf): request=8 time=0.3 ms
4096 bytes from . (ext3 /dev/xvdf): request=9 time=0.3 ms
4096 bytes from . (ext3 /dev/xvdf): request=10 time=0.3 ms
--- . (ext3 /dev/xvdf) ioping statistics ---
10 requests completed in 9003.5 ms, 3700 iops, 14.5 mb/s
min/avg/max/mdev = 0.2/0.3/0.3/0.0 ms
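The 9003.5 ms total mostly reflects ioping’s one-second pause between requests, so the iops figure appears to be derived from time spent inside the requests rather than from wall-clock time. With one request in flight at a time (queue depth 1), IOPS and mean latency are reciprocals. This small Python sketch, with helper names of our own choosing, shows that 3700 iops implies roughly 0.27 ms per request, which is consistent with the 0.3 ms average once rounded to one decimal place.

```python
# At queue depth 1 each request must finish before the next one starts,
# so IOPS and mean per-request latency are reciprocals of each other.


def latency_ms_from_iops(iops: float) -> float:
    """Mean per-request latency (ms) implied by a queue-depth-1 IOPS figure."""
    return 1000.0 / iops


def iops_from_latency_ms(latency_ms: float) -> float:
    """Queue-depth-1 IOPS implied by a mean per-request latency in ms."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # Direct I/O run: 3700 iops with a displayed average of 0.3 ms.
    print(f"{latency_ms_from_iops(3700):.3f} ms per request")
```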
Then a seek rate test.
ioping -R /dev/xvdf
--- /dev/xvdf (device 1024.0 Gb) ioping statistics ---
12296 requests completed in 3000.2 ms, 7817 iops, 30.5 mb/s
min/avg/max/mdev = 0.1/0.1/0.4/0.1 ms
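A seek rate test fires small reads at random, block-aligned offsets as fast as it can for a fixed period. The Python sketch below illustrates the idea. It is not ioping’s implementation. The function name seek_rate is hypothetical, and we assume a Unix system (os.pread is Unix-only). Reads go through the page cache (no O_DIRECT), so against a recently read file it will measure memory rather than the SSD. Pointing it at a raw device such as /dev/xvdf needs root.

```python
import os
import random
import tempfile
import time


def seek_rate(path: str, duration_s: float = 3.0, block: int = 4096) -> float:
    """Issue random, block-aligned reads against `path` for `duration_s`
    seconds and return requests per second of time spent inside the reads.

    Reads go through the page cache (no O_DIRECT), so cached data will
    report memory speed rather than disk speed.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # works for files and block devices
        blocks = size // block
        if blocks == 0:
            raise ValueError("target is smaller than one block")
        requests = 0
        busy = 0.0  # seconds spent inside pread(), excluding loop overhead
        deadline = time.monotonic() + duration_s
        while time.monotonic() < deadline:
            offset = random.randrange(blocks) * block
            start = time.perf_counter()
            os.pread(fd, block, offset)
            busy += time.perf_counter() - start
            requests += 1
        return requests / busy if busy > 0 else float("inf")
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Scratch file for illustration only; substitute a real file or device.
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(os.urandom(8 * 1024 * 1024))
        tmp.flush()
        print(f"{seek_rate(tmp.name, duration_s=1.0):.0f} iops")
```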
Now we’ll test sequential throughput, first with ioping and then with dd writes.
--- . (ext3 /dev/xvdf) ioping statistics ---
3377 requests completed in 3000.4 ms, 1344 iops, 336.0 mb/s
min/avg/max/mdev = 0.6/0.7/2.6/0.2 ms
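The command line for this run isn’t shown, but dividing throughput by IOPS recovers the average request size. Here 336.0 MiB/s over 1344 iops works out to 256 KiB per request. That would be consistent with ioping’s sequential mode using larger requests than the 4 KiB of the earlier runs, though that is our inference. A Python sketch, again assuming "mb/s" means MiB/s (the helper name is ours):

```python
# Recover the average request size from throughput and IOPS.
# Assumption: ioping's "mb/s" is MiB/s (2**20 bytes).

MIB = 2 ** 20


def implied_request_kib(mib_per_s: float, iops: float) -> float:
    """Average request size (KiB) implied by throughput and IOPS."""
    return mib_per_s * MIB / iops / 1024


if __name__ == "__main__":
    print(f"{implied_request_kib(336.0, 1344):.0f} KiB per request")
```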
dd if=/dev/zero of=test bs=1M count=1k oflag=dsync
1073741824 bytes (1.1 GB) copied, 3.75235 s, 286 MB/s
dd if=/dev/zero of=test bs=64k count=16k oflag=dsync
1073741824 bytes (1.1 GB) copied, 13.4007 s, 80.1 MB/s
dd if=/dev/zero of=test bs=1M count=1k conv=fdatasync
1073741824 bytes (1.1 GB) copied, 3.25754 s, 330 MB/s
dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync
1073741824 bytes (1.1 GB) copied, 3.3367 s, 322 MB/s
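Unlike ioping, dd reports decimal units: each run above writes exactly 1 GiB (1073741824 bytes, shown as "1.1 GB"), and "MB/s" means 10^6 bytes per second. This quick Python check (the helper name is ours) reproduces dd’s figures from the byte count and elapsed time:

```python
# dd reports decimal units: 1 MB = 10**6 bytes.


def dd_rate_mb_per_s(bytes_copied: int, seconds: float) -> float:
    """Throughput in dd's units: decimal megabytes per second."""
    return bytes_copied / seconds / 1e6


if __name__ == "__main__":
    one_gib = 1024 ** 3  # bs=1M count=1k and bs=64k count=16k both write 1 GiB
    runs = [("1M dsync", 3.75235), ("64k dsync", 13.4007),
            ("1M fdatasync", 3.25754), ("64k fdatasync", 3.3367)]
    for label, secs in runs:
        print(f"{label:>14}: {dd_rate_mb_per_s(one_gib, secs):.1f} MB/s")
```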
There seems to be an issue with write performance in dsync mode: with 64k blocks and oflag=dsync we only get 80.1 MB/s, against 322 MB/s with conv=fdatasync. We see this often with SSDs. We really need to use a bigger file to test sustained performance.
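With oflag=dsync every write must reach the device before dd issues the next one, whereas conv=fdatasync flushes once at the end. Dividing the elapsed time by the number of writes gives the average cost of each synchronous write. About 0.82 ms per 64 KiB write versus about 3.66 ms per 1 MiB write suggests, as our own reading of the numbers, that a fixed per-flush cost dominates at small block sizes. A Python sketch (the helper name is ours):

```python
# Average wall time per synchronous write: total time / number of writes.


def per_write_ms(total_bytes: int, block_bytes: int, seconds: float) -> float:
    """Average wall time per write (ms) when each write is synchronous."""
    writes = total_bytes // block_bytes
    return seconds / writes * 1000


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print(f"64k dsync: {per_write_ms(one_gib, 64 * 1024, 13.4007):.3f} ms/write")
    print(f"1M dsync:  {per_write_ms(one_gib, 1024 * 1024, 3.75235):.3f} ms/write")
```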
dd if=/dev/xvdf of=test bs=1M
7646380032 bytes (7.6 GB) copied, 17.4732 s, 438 MB/s
hdparm -tT /dev/xvdf
/dev/xvdf:
Timing cached reads: 14502 MB in 1.99 seconds = 7289.54 MB/sec
Timing buffered disk reads: 1142 MB in 3.00 seconds = 380.53 MB/sec
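The gap between hdparm’s cached reads (7289.54 MB/sec) and buffered disk reads (380.53 MB/sec) shows how much the cache can flatter a read test. The Python sketch below times a plain sequential read of a file in 1 MiB chunks and reports decimal MB/s, as dd does. The function name is hypothetical. Because it reads through the page cache, it is closer to a sustained disk figure only when the file is much larger than RAM or has not been read recently.

```python
import os
import tempfile
import time


def sequential_read_mb_per_s(path: str, chunk: int = 1024 * 1024) -> float:
    """Read `path` front to back in `chunk`-byte reads and return decimal
    MB/s (dd's unit). Goes through the page cache, so a file that is
    already cached reports memory speed rather than disk speed."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            total += len(data)
    elapsed = time.perf_counter() - start
    return total / elapsed / 1e6 if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Scratch file for illustration only; use a large real file instead.
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(os.urandom(16 * 1024 * 1024))
        tmp.flush()
        print(f"{sequential_read_mb_per_s(tmp.name):.0f} MB/s")
```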
------------------------------------------------------------------------
Benchmark Run: Fri Jul 20 2012 02:40:24 - 03:08:35
16 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables      24160326.4 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                    2989.9 MWIPS (10.0 s, 7 samples)
Execl Throughput                               986.9 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       271531.6 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          70013.0 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       802038.9 KBps  (30.0 s, 2 samples)
Pipe Throughput                             410363.6 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 44733.0 lps   (10.0 s, 7 samples)
Process Creation                              2211.5 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  3919.1 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                  1538.9 lpm   (60.0 s, 2 samples)
System Call Overhead                        441575.6 lps   (10.0 s, 7 samples)

System Benchmarks Index Values            BASELINE       RESULT    INDEX
Dhrystone 2 using register variables      116700.0   24160326.4   2070.3
Double-Precision Whetstone                    55.0       2989.9    543.6
Execl Throughput                              43.0        986.9    229.5
File Copy 1024 bufsize 2000 maxblocks       3960.0     271531.6    685.7
File Copy 256 bufsize 500 maxblocks         1655.0      70013.0    423.0
File Copy 4096 bufsize 8000 maxblocks       5800.0     802038.9   1382.8
Pipe Throughput                            12440.0     410363.6    329.9
Pipe-based Context Switching                4000.0      44733.0    111.8
Process Creation                             126.0       2211.5    175.5
Shell Scripts (1 concurrent)                  42.4       3919.1    924.3
Shell Scripts (8 concurrent)                   6.0       1538.9   2564.9
System Call Overhead                       15000.0     441575.6    294.4
                                                                ========
System Benchmarks Index Score                                      527.9
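Each UnixBench index is the result divided by the baseline system’s result, scaled so the baseline scores 10. The overall System Benchmarks Index Score is the geometric mean of the individual indices. This Python sketch, fed with the single-copy figures above, reproduces the 527.9 score. The dictionary and function names are our own.

```python
import math

# (baseline, result) pairs copied from the single-copy run above.
SINGLE_COPY = {
    "Dhrystone 2 using register variables": (116700.0, 24160326.4),
    "Double-Precision Whetstone": (55.0, 2989.9),
    "Execl Throughput": (43.0, 986.9),
    "File Copy 1024 bufsize 2000 maxblocks": (3960.0, 271531.6),
    "File Copy 256 bufsize 500 maxblocks": (1655.0, 70013.0),
    "File Copy 4096 bufsize 8000 maxblocks": (5800.0, 802038.9),
    "Pipe Throughput": (12440.0, 410363.6),
    "Pipe-based Context Switching": (4000.0, 44733.0),
    "Process Creation": (126.0, 2211.5),
    "Shell Scripts (1 concurrent)": (42.4, 3919.1),
    "Shell Scripts (8 concurrent)": (6.0, 1538.9),
    "System Call Overhead": (15000.0, 441575.6),
}


def index(baseline: float, result: float) -> float:
    """Per-test index: result relative to the baseline, which scores 10."""
    return result / baseline * 10


def overall_score(pairs: dict) -> float:
    """Geometric mean of the per-test indices."""
    logs = [math.log(index(b, r)) for b, r in pairs.values()]
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    for name, (b, r) in SINGLE_COPY.items():
        print(f"{name:<40}{index(b, r):>10.1f}")
    print(f"{'System Benchmarks Index Score':<40}{overall_score(SINGLE_COPY):>10.1f}")
```

Because the score is a geometric mean, one weak test (such as the small-buffer file copy) drags the total down less than it would under an arithmetic mean.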
------------------------------------------------------------------------
Benchmark Run: Fri Jul 20 2012 03:08:35 - 03:36:59
16 CPUs in system; running 16 parallel copies of tests

Dhrystone 2 using register variables     201313595.2 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                   40724.6 MWIPS (10.1 s, 7 samples)
Execl Throughput                              8050.6 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       348105.5 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          95790.4 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks      1189159.8 KBps  (30.0 s, 2 samples)
Pipe Throughput                            4705588.4 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                634652.9 lps   (10.0 s, 7 samples)
Process Creation                             14990.6 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                 21494.0 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                  2880.1 lpm   (60.2 s, 2 samples)
System Call Overhead                       4191761.5 lps   (10.0 s, 7 samples)

System Benchmarks Index Values            BASELINE       RESULT    INDEX
Dhrystone 2 using register variables      116700.0  201313595.2  17250.5
Double-Precision Whetstone                    55.0      40724.6   7404.5
Execl Throughput                              43.0       8050.6   1872.2
File Copy 1024 bufsize 2000 maxblocks       3960.0     348105.5    879.1
File Copy 256 bufsize 500 maxblocks         1655.0      95790.4    578.8
File Copy 4096 bufsize 8000 maxblocks       5800.0    1189159.8   2050.3
Pipe Throughput                            12440.0    4705588.4   3782.6
Pipe-based Context Switching                4000.0     634652.9   1586.6
Process Creation                             126.0      14990.6   1189.7
Shell Scripts (1 concurrent)                  42.4      21494.0   5069.3
Shell Scripts (8 concurrent)                   6.0       2880.1   4800.1
System Call Overhead                       15000.0    4191761.5   2794.5
                                                                ========
System Benchmarks Index Score                                     2652.2
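Comparing the two runs shows how well each test scales from 1 copy to 16 parallel copies. This Python sketch divides the 16-copy index by the 1-copy index using the figures above; the variable and function names are ours. On these numbers the overall score scales about 5.0x, Dhrystone about 8.3x, and the file copy tests only about 1.3x to 1.5x.

```python
# Index values copied from the two UnixBench runs above.
INDEX_1_COPY = {
    "Dhrystone 2 using register variables": 2070.3,
    "Double-Precision Whetstone": 543.6,
    "Execl Throughput": 229.5,
    "File Copy 1024 bufsize 2000 maxblocks": 685.7,
    "File Copy 256 bufsize 500 maxblocks": 423.0,
    "File Copy 4096 bufsize 8000 maxblocks": 1382.8,
    "Pipe Throughput": 329.9,
    "Pipe-based Context Switching": 111.8,
    "Process Creation": 175.5,
    "Shell Scripts (1 concurrent)": 924.3,
    "Shell Scripts (8 concurrent)": 2564.9,
    "System Call Overhead": 294.4,
    "System Benchmarks Index Score": 527.9,
}
INDEX_16_COPIES = {
    "Dhrystone 2 using register variables": 17250.5,
    "Double-Precision Whetstone": 7404.5,
    "Execl Throughput": 1872.2,
    "File Copy 1024 bufsize 2000 maxblocks": 879.1,
    "File Copy 256 bufsize 500 maxblocks": 578.8,
    "File Copy 4096 bufsize 8000 maxblocks": 2050.3,
    "Pipe Throughput": 3782.6,
    "Pipe-based Context Switching": 1586.6,
    "Process Creation": 1189.7,
    "Shell Scripts (1 concurrent)": 5069.3,
    "Shell Scripts (8 concurrent)": 4800.1,
    "System Call Overhead": 2794.5,
    "System Benchmarks Index Score": 2652.2,
}


def scaling(single: dict, parallel: dict) -> dict:
    """Ratio of the 16-copy index to the 1-copy index for each test."""
    return {name: parallel[name] / single[name] for name in single}


if __name__ == "__main__":
    ratios = scaling(INDEX_1_COPY, INDEX_16_COPIES)
    # Lowest scaling first: these tests gain least from the extra CPUs.
    for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{name:<40}{ratio:>6.2f}x")
```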
In UnixBench the server seems to struggle with the small-buffer file copy tests but excels in the Shell Script tests. The CPU is an Intel Xeon E5620, which is disappointing. It could well be holding this particular instance back, given that newer, higher-end Xeons such as the E5-2600 series benchmark considerably better.
Running this instance is by no means cheap: at $3.10/hour, one instance left running around the clock will set you back roughly $2,300 per month. However, for I/O-intensive applications that currently run on a large number of EC2 instances, it may be worthwhile consolidating onto a smaller number of hi1.4xlarge instances. Netflix published some interesting calculations earlier today on the savings they made by moving to the high I/O instances.
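For a quick check of the monthly cost at the $3.10/hour on-demand rate, here is a Python sketch. It assumes the instance runs 24/7 and ignores reserved pricing, EBS and data transfer. The helper name is ours. An average month is 730 hours (8760 / 12), and a 31-day month is 744 hours.

```python
HOURS_PER_AVERAGE_MONTH = 24 * 365 / 12  # 730 hours


def monthly_cost(hourly_usd: float, hours: float = HOURS_PER_AVERAGE_MONTH) -> float:
    """On-demand cost of one instance left running for `hours` hours."""
    return hourly_usd * hours


if __name__ == "__main__":
    print(f"average month: ${monthly_cost(3.10):,.2f}")
    print(f"31-day month:  ${monthly_cost(3.10, 24 * 31):,.2f}")
```

Multiplying by the number of instances before and after consolidation gives a rough first comparison, though a real estimate would also weigh per-instance throughput from benchmarks like the ones above.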