
Amazon today announced the release of a new EC2 instance type targeted at I/O-intensive applications. We’ve benchmarked a number of Amazon’s plans in the past and found them rather lacklustre in terms of performance.

High I/O Quadruple Extra Large Instance Specs (hi1.4xlarge)

  • 60.5 GB of memory
  • 35 EC2 Compute Units (8 virtual cores with 4.4 EC2 Compute Units each)
  • CPUs: Intel Xeon E5620
  • 2 SSD-based volumes each with 1024 GB of instance storage (see the striping sketch after this list)
  • Uplink: 10 Gigabit Ethernet
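
The two 1 TB SSD instance-store volumes show up as separate block devices; the tests below were run against a single volume, /dev/xvdf. If you wanted one large 2 TB volume you could stripe them together. A minimal sketch, assuming the devices appear as /dev/xvdf and /dev/xvdg (device names will vary by instance):

# Stripe the two ephemeral SSDs into a single RAID 0 array (device names assumed)
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdf /dev/xvdg
mkfs.ext4 /dev/md0        # the tests below used ext3 on a single volume; ext4 shown here
mkdir -p /mnt/ssd && mount /dev/md0 /mnt/ssd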

IOPS

First, let’s run with cached I/O. From previous SSD testing we’d expect this to come in upwards of the 800 MB/s mark.

ioping . -c 10 -C
4096 bytes from . (ext3 /dev/xvdf): request=1 time=0.0 ms
4096 bytes from . (ext3 /dev/xvdf): request=2 time=0.0 ms
4096 bytes from . (ext3 /dev/xvdf): request=3 time=0.0 ms
4096 bytes from . (ext3 /dev/xvdf): request=4 time=0.0 ms
4096 bytes from . (ext3 /dev/xvdf): request=5 time=0.0 ms
4096 bytes from . (ext3 /dev/xvdf): request=6 time=0.0 ms
4096 bytes from . (ext3 /dev/xvdf): request=7 time=0.0 ms
4096 bytes from . (ext3 /dev/xvdf): request=8 time=0.0 ms
4096 bytes from . (ext3 /dev/xvdf): request=9 time=0.0 ms
4096 bytes from . (ext3 /dev/xvdf): request=10 time=0.0 ms

--- . (ext3 /dev/xvdf) ioping statistics ---
10 requests completed in 9000.9 ms, 217391 iops, 849.2 mb/s
min/avg/max/mdev = 0.0/0.0/0.0/0.0 ms

Then direct I/O. This number is actually comparable to some of the RAMNode SSD plans.

ioping . -c 10 -D
4096 bytes from . (ext3 /dev/xvdf): request=1 time=0.2 ms
4096 bytes from . (ext3 /dev/xvdf): request=2 time=0.3 ms
4096 bytes from . (ext3 /dev/xvdf): request=3 time=0.3 ms
4096 bytes from . (ext3 /dev/xvdf): request=4 time=0.3 ms
4096 bytes from . (ext3 /dev/xvdf): request=5 time=0.3 ms
4096 bytes from . (ext3 /dev/xvdf): request=6 time=0.3 ms
4096 bytes from . (ext3 /dev/xvdf): request=7 time=0.3 ms
4096 bytes from . (ext3 /dev/xvdf): request=8 time=0.3 ms
4096 bytes from . (ext3 /dev/xvdf): request=9 time=0.3 ms
4096 bytes from . (ext3 /dev/xvdf): request=10 time=0.3 ms

--- . (ext3 /dev/xvdf) ioping statistics ---
10 requests completed in 9003.5 ms, 3700 iops, 14.5 mb/s
min/avg/max/mdev = 0.2/0.3/0.3/0.0 ms

Then a seek rate test, this time against the raw device.

ioping -R /dev/xvdf

--- /dev/xvdf (device 1024.0 Gb) ioping statistics ---
12296 requests completed in 3000.2 ms, 7817 iops, 30.5 mb/s
min/avg/max/mdev = 0.1/0.1/0.4/0.1 ms

Now we’ll test sequential writes.
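
The command line for this run wasn’t captured; judging by the ~256 KB request size in the results it was most likely ioping’s rate test in sequential mode, something along these lines (the -L flag is an assumption on our part):

ioping -RL .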

--- . (ext3 /dev/xvdf) ioping statistics ---
3377 requests completed in 3000.4 ms, 1344 iops, 336.0 mb/s
min/avg/max/mdev = 0.6/0.7/2.6/0.2 ms

DD

dd if=/dev/zero of=test bs=1M count=1k oflag=dsync

1073741824 bytes (1.1 GB) copied, 3.75235 s, 286 MB/s
dd if=/dev/zero of=test bs=64k count=16k oflag=dsync 

1073741824 bytes (1.1 GB) copied, 13.4007 s, 80.1 MB/s
dd if=/dev/zero of=test bs=1M count=1k conv=fdatasync

1073741824 bytes (1.1 GB) copied, 3.25754 s, 330 MB/s
dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync

1073741824 bytes (1.1 GB) copied, 3.3367 s, 322 MB/s

There seems to be an issue with disk performance in dsync mode, particularly at the 64k block size; we see this often with SSD drives. We really need to use a bigger file to test sustained performance.
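
The read side is covered by the device copy below; for the write side, a rough sustained check would simply be to repeat the fdatasync run with a much larger file, for example (file name and size are arbitrary):

# 8 GiB sustained-write check instead of the 1 GiB used above
dd if=/dev/zero of=test_big bs=1M count=8k conv=fdatasync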

dd if=/dev/xvdf of=test bs=1M

7646380032 bytes (7.6 GB) copied, 17.4732 s, 438 MB/s
hdparm -tT /dev/xvdf

/dev/xvdf:
 Timing cached reads:   14502 MB in  1.99 seconds = 7289.54 MB/sec
 Timing buffered disk reads: 1142 MB in  3.00 seconds = 380.53 MB/sec

UnixBench

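For anyone wanting to reproduce these numbers, UnixBench was run in its stock configuration; roughly the following (the repository URL is an assumption, any UnixBench 5.x copy will do):

# Fetch and run UnixBench; ./Run does a single-copy pass, then one copy per CPU
git clone https://github.com/kdlucas/byte-unixbench.git
cd byte-unixbench/UnixBench
./Run
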
------------------------------------------------------------------------
Benchmark Run: Fri Jul 20 2012 02:40:24 - 03:08:35
16 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       24160326.4 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     2989.9 MWIPS (10.0 s, 7 samples)
Execl Throughput                                986.9 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        271531.6 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           70013.0 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        802038.9 KBps  (30.0 s, 2 samples)
Pipe Throughput                              410363.6 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  44733.0 lps   (10.0 s, 7 samples)
Process Creation                               2211.5 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   3919.1 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1538.9 lpm   (60.0 s, 2 samples)
System Call Overhead                         441575.6 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   24160326.4   2070.3
Double-Precision Whetstone                       55.0       2989.9    543.6
Execl Throughput                                 43.0        986.9    229.5
File Copy 1024 bufsize 2000 maxblocks          3960.0     271531.6    685.7
File Copy 256 bufsize 500 maxblocks            1655.0      70013.0    423.0
File Copy 4096 bufsize 8000 maxblocks          5800.0     802038.9   1382.8
Pipe Throughput                               12440.0     410363.6    329.9
Pipe-based Context Switching                   4000.0      44733.0    111.8
Process Creation                                126.0       2211.5    175.5
Shell Scripts (1 concurrent)                     42.4       3919.1    924.3
Shell Scripts (8 concurrent)                      6.0       1538.9   2564.9
System Call Overhead                          15000.0     441575.6    294.4
                                                                   ========
System Benchmarks Index Score                                         527.9

------------------------------------------------------------------------
Benchmark Run: Fri Jul 20 2012 03:08:35 - 03:36:59
16 CPUs in system; running 16 parallel copies of tests

Dhrystone 2 using register variables      201313595.2 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                    40724.6 MWIPS (10.1 s, 7 samples)
Execl Throughput                               8050.6 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        348105.5 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           95790.4 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       1189159.8 KBps  (30.0 s, 2 samples)
Pipe Throughput                             4705588.4 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 634652.9 lps   (10.0 s, 7 samples)
Process Creation                              14990.6 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  21494.0 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   2880.1 lpm   (60.2 s, 2 samples)
System Call Overhead                        4191761.5 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0  201313595.2  17250.5
Double-Precision Whetstone                       55.0      40724.6   7404.5
Execl Throughput                                 43.0       8050.6   1872.2
File Copy 1024 bufsize 2000 maxblocks          3960.0     348105.5    879.1
File Copy 256 bufsize 500 maxblocks            1655.0      95790.4    578.8
File Copy 4096 bufsize 8000 maxblocks          5800.0    1189159.8   2050.3
Pipe Throughput                               12440.0    4705588.4   3782.6
Pipe-based Context Switching                   4000.0     634652.9   1586.6
Process Creation                                126.0      14990.6   1189.7
Shell Scripts (1 concurrent)                     42.4      21494.0   5069.3
Shell Scripts (8 concurrent)                      6.0       2880.1   4800.1
System Call Overhead                          15000.0    4191761.5   2794.5
                                                                   ========
System Benchmarks Index Score                                        2652.2

The server seems to struggle in UnixBench with the low-buffer-size file copy tests, but excels in the Shell Script tests. The CPU is an Intel Xeon E5620, which is disappointing; it could really be holding this particular instance back when you consider that some of the higher-end CPUs (the E5 series) benchmark considerably better.

Price

Running this instance is by no means cheap: at $3.10/hour it’ll set you back roughly $2,300 per month. However, for I/O-intensive applications currently spread across a large number of EC2 instances, it may be worthwhile consolidating onto a smaller number of hi1.4xlarge instances. Netflix published some interesting calculations earlier today on their savings from moving to the high I/O instances.
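
The monthly figure is just the hourly rate multiplied out; a quick back-of-the-envelope check, assuming a ~730-hour month:

# Rough monthly cost at $3.10/hour
echo "3.10 * 730" | bc
# 2263.00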

View Amazon’s Plans & Benchmarks