Fusion IO SSD Drive First Tests


I had some more time to do testing with the Fusion IO card over the last week. Once again it's proven it's much more than just a standard SSD drive. It gains the benefits of SSD by having no moving parts and being entirely memory based. The big difference between this card and SSD drives is that it plugs directly into the motherboard through a PCI-E slot. That gives it a much faster transfer rate than a standard SSD drive going through the SATA controller.

I used Brent O’s (Blog|Twitter) article and script on SQL Server Pedia to get the IO numbers for the drive (listed here). If you run the script from SQL Server Pedia, you can download the data I’ve placed on Google Docs, paste it into the raw data section, and view it in the pivot table to compare directly against my results. I have only included IOs per second, but all the counters are available in the Excel file. Here is the file on Google Docs.

[Chart: Fusion IO random vs. sequential results]

The Fusion IO drive is a 160GB card; the one I tested retails at $6,995.

The SAN drives I tested against were seven 7200 RPM SATA drives in a 5+2 RAID 6 array, connected via 4Gb Fibre Channel.

So where can you use this effectively in your environment? If you’re a medium-sized company with some large DBs (200-300GB), placing the whole database on this drive is not really a cost-effective option. But here are some options I’ve found success with.

1. Place your Tempdb on the Drive.

a. By placing Tempdb on the drive, I got a boost in basically all SQL Server activity, since many of the day-to-day operations of SQL Server find their way into Tempdb. I’ll have a blog post in the future with some more specific numbers around this. (There’s a quick sketch of the move after this list.)

2. Place your Indexes on the Drive.

a. Given the extreme write and read performance of these cards, placing indexes on the drive can also give you a boost in performance. This does require a pretty big change, since you have to drop and re-create your indexes in a new filegroup. (See the second sketch after this list.)

3. Use the card as a way to remove disk latency for testing

a. I have a project right now that needs very, very fast disk access. I need to simulate a system that can do thousands of inserts a second in SQL Server. So instead of trying to find a RAID 10 array on my SAN and getting it configured, I can use this card to do that sort of testing right on my own box. It’s letting me find the fastest way to get data into my system without worrying about disk performance. (The third sketch below shows the kind of insert-rate test I mean.)
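For point 1, here’s a minimal sketch of moving Tempdb onto the card. It assumes the Fusion IO drive is mounted as F: and that the tempdb files still have the default logical names (tempdev and templog); the new paths only take effect after SQL Server is restarted.

-- Assumes the Fusion IO card is mounted as F: and an F:\tempdb folder already exists
ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, FILENAME = 'F:\tempdb\tempdb.mdf');

ALTER DATABASE tempdb
    MODIFY FILE (NAME = templog, FILENAME = 'F:\tempdb\templog.ldf');

-- Restart the SQL Server service so tempdb is re-created on the new drive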
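For point 2, this sketch adds a filegroup backed by the card and rebuilds one index onto it. The database, table, index, and file names are made up for illustration; DROP_EXISTING = ON handles the drop and re-create in a single statement.

-- Add a filegroup and a data file that live on the Fusion IO card (F: assumed)
ALTER DATABASE SalesDB ADD FILEGROUP FusionIO_FG;

ALTER DATABASE SalesDB
    ADD FILE (NAME = FusionIO_Data1, FILENAME = 'F:\Data\FusionIO_Data1.ndf', SIZE = 10GB)
    TO FILEGROUP FusionIO_FG;

-- Rebuild an existing nonclustered index onto the new filegroup
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID)
    WITH (DROP_EXISTING = ON)
    ON FusionIO_FG;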
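And for point 3, this is roughly the kind of insert-rate test I’m talking about: a table sitting on the card, a tight single-row insert loop, and a quick inserts-per-second number at the end. The table name and row count are just placeholders.

-- Throwaway table for the test (created in a database whose files sit on the card)
CREATE TABLE dbo.InsertTest (ID INT IDENTITY PRIMARY KEY, Payload CHAR(200) NOT NULL);

DECLARE @start DATETIME, @i INT;
SET @start = GETDATE();
SET @i = 0;

WHILE @i < 100000
BEGIN
    INSERT INTO dbo.InsertTest (Payload) VALUES (REPLICATE('x', 200));
    SET @i = @i + 1;
END;

-- Rough inserts/sec; NULLIF avoids divide-by-zero if the loop finishes inside a second
SELECT 100000.0 / NULLIF(DATEDIFF(SECOND, @start, GETDATE()), 0) AS InsertsPerSecond;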

So here’s a list of some of the pros and cons of the drives in general.

Pros

Blazing fast speed

Easy to install and set up

Cons

Price

These cards will degrade over time. Here’s an explanation as to why, offered by Fusion IO.

Doesn’t NAND flash have a write limit? How does that affect the lifetime of the ioDrive™?

NAND flash has a limit on the number of writes that can be done to an individual cell. The particular limit depends on the type of flash used. For Single Level Cell (SLC) NAND, the limit exceeds 1,000,000 writes to a cell, whereas for Multi Level Cell (MLC) NAND, it is on the order of 10,000 writes. Hence, in order to exceed the limit of a single 80G ioDrive™, you would have to write almost 80PB (petabytes) of data. Streaming data at 800MB/s to the card, it would take you 3.4 years of writing data non-stop to exceed the SLC limit.

Hopefully you can use the information I provided in the spreadsheet to determine your own ROI and whether the card is right for you. Personally, I have some more tests to run, and I’ll continue to post information on the write-heavy project I’m working on with the Fusion card as my testing platform.

pat

10 thoughts on “Fusion IO SSD Drive First Tests”

  1. Estimating the probability of failure of a device’s component parts can be divided up into three main life stages: the infant mortality stage, the prime stage, and the wear-out stage. Infant mortality is measured using two metrics called Defective Parts Per Million (DPPM) and Failures In Time-Early Failure Rate (FIT-EFR). The prime of a device’s life, when the infant mortality failure has leveled off but before wear-out sets in, is measured using two metrics called Mean Time Between Failures (MTBF) and Failures In Time-Intrinsic Failure Rate (FIT-IFR). Wear-out, in the case of a NAND flash device, is generally a function of having lost enough storage cells that both capacity and reliability drop below acceptable thresholds, which can be assessed by evaluating and keeping a record of the amount of errors detected at each physical location. Since the NAND flash chips themselves are considerably more likely to fail than any other part on the device, the wear-out rate of the NAND flash itself is the variable that best bounds the end of the device’s usable life. Assuming proper strategies are used to ensure wear-leveling, it is fairly easy to predict the rate at which NAND flash wears out, as described by the following formulas:

     total-write-volume = total-capacity x usable-cycles
     SLC flash: 160GB x 1,000,000 = 160 PB
     MLC flash: 320GB x 100,000 = 32 PB

     Note that MLC flash has double the capacity, since it can store two bits per cell instead of only one bit per cell, but at the cost of less durability. SLC can withstand one million program/erase cycles (assuming the use of error correction), whereas MLC can withstand one hundred thousand program/erase cycles (again assuming error correction).

     lifetime = total-write-volume / write-rate
     SLC flash: 160 PB / 600 MB/s = 10 write-years
     MLC flash: 32 PB / 600 MB/s = 2 write-years

     This figure takes into account the much higher speed at which a Fusion-io device can write data, and assumes continuous write cycles, which is the worst-case scenario. Adjusting these numbers for something closer to expected use patterns gives the following:

     avg-lifetime = lifetime / read-write-ratio
     SLC flash @ 40% write duty: 10 write-years / .4 read-write-ratio = 25 calendar years
     MLC flash @ 20% write duty: 2 write-years / .2 read-write-ratio = 10 calendar years
     MLC flash @ 40% write duty: 2 write-years / .4 read-write-ratio = 5 calendar years

  2. The read/write ratio is difficult to predict, and will vary considerably from environment to environment. As a point of reference, the International Disk-drive Equipment and Materials Association (IDEMA), an industry trade group that publishes storage device standards, recommends a read/write ratio of 60%/40% for its server-class device reliability testing (IDEMA Standards, Document R3-98).

     When a NAND chip on an ioDrive begins to fail, it simply writes the data to an adjacent NAND chip without experiencing any data loss. In the unlikely case that chips reach their write capacity, the drive will simply stop accepting new writes (though it continues to allow reads). We provide monitoring utilities that allow IT staff to know such things as how much life is left on the card, and they can configure SNMP alerts so they know to replace drives before they fail (SMI-S and WMI alerts will be supported in the future). Thus, with ioDrives, failure becomes predictable and routine.

     Also, the ioDrive has multiple technology features that protect against data loss. These include:

     - Multi-bit error detection and correction ensuring data integrity. Currently, ioDrives perform 11-bit correction on 240 bytes of data versus the industry standard of 2-bit correction on 540 bytes.
     - Patent-pending Flashback protection, which protects data with chip-level N+1 redundancy, allowing chips to fail over in RAID-5-like fashion. Users never have to worry about a chip failure resulting in data loss. This feature includes on-board self-healing, so no servicing is required at the chip level.
     - Error checking within the controller ensures end-to-end data integrity from the time data enters the controller to the time it reaches the disk.
     - At the drive level, ioDrives can be mirrored with software RAID, providing redundancy between ioDrives.

     Hope this answers a few questions surrounding the lifetime of our ioDrive technology. As you can see, NAND does have some limitations as to lifetime, but with Fusion-io technology in place, we can greatly increase that time and protect data integrity.

  3. That's so funny that you're working on 'em – I'm using that same SQLIO script as we speak on one! Next I'll be using Benchmark Factory to run TPC benchmarks against it in a few different configs. I want to test the old advice of putting SQL Server data and log files on separate drives – I don't think that's going to matter anymore with solid state drives until you hit the bus barrier.

  4. That's great, Brent, I look forward to comparing notes! 🙂 My next step with it is a huge insert/read-write project like I mentioned; I'll be posting lots of good info on that as well. 🙂

  5. Gentlemen, nice write-up and comments. Very informative! The only thing I would add is that the pricing is right on if you are comparing on a price/GB basis. If you look at it in terms of price/performance, then this technology is very cost effective.

  6. I always like to evaluate the pricing of a Fusion-io solution with a question. The question is this: "How much should one expect to pay in order to achieve a 100% or more performance improvement by traditional means (adding spindles, adding RAM, etc.)?" What I consistently find is that Fusion-io provides this kind of database performance improvement at a total cost that is significantly less than the cost of traditional solutions for at least doubling the performance. Wine.com is a perfect example of that.

  7. Carnegie Mellon came out with an alarming white paper about the "Write Cliff" phenomenon these devices exhibit. Is anyone writing data for more than a few minutes to see if the CMU results are corroborated? It would be a little awkward to have to explain to my management that our expensive SSDs are actually slower than cheap HDDs in a true production environment.

  8. Athgeek, could you provide a link to this "White Paper"? I keep trying to Google it, but all I seem to come up with is your eye-catching fear and doubt-inspiring post. Thanks!

  9. Utah Jones, I agree, I would like to see the white paper. To answer Athgeek's question, I was writing heavily to mine for about an hour. These tests didn't necessarily take that long, but I have several other data imports I've been testing with it that I ran for longer periods. pat
