Widespread use of SSD technology in the field – both in workstations and in data centers – is relatively new. Traditional HDDs have been the go-to for a very long time, partly due to the (previously) vastly higher cost per gigabyte of SSD storage.
Over the last 5-8 years, this has been changing dramatically – the pricing gap is closing, and the high speeds offered by SSDs have made them far more attractive than they were in the past.
But how are we supposed to evaluate the reliability of this new technology?
SSD reliability: estimation or guesstimation?
SSD manufacturers often tout impressive lifespan estimates, even though they are not necessarily able to subject the drives they develop to a real-world usage test – such a test would simply take too long to be feasible.
The manufacturer provides a (potential) buyer with a ‘guesstimate’ rather than an accurate evaluation, which places their purchase decision on a treacherous foundation.
It feels like we have been here before – remember LED lightbulbs?
When they first experienced a dramatic uptake in usage, end users noticed that it often wasn’t the actual light-emitting diode that failed but the driver (internal or external) of the bulb, which provides it with the DC power it needs to function.
Much like these bulbs, SSDs are complex devices with a wide array of points of failure, making it exceedingly difficult to make an accurate lifetime estimate.
And this is not just a clever analogy: enterprise-level SSD users have encountered lifetime estimates that deviate by up to 60% from the manufacturer specification. It seems safe to assume that the manufacturer lifetime projection is little more than a (poorly) informed guess.
Predicting SSD drive failure
Using Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T. for short), drives report key values from onboard sensors and controllers.
We can use these S.M.A.R.T data values to determine reliable indicators to predict drive failure.
Because SSD technology is (relatively) young, this is not a simple task – a 2016 study considered the what, when, and why of SSD failures based on a sample of no less than half a million SSDs in use across different data centers.1
Data errors and sector reallocations are strong indicators in predicting SSD failure.
Reallocations are the most precise indicator: roughly 45% of drives that failed exhibited this symptom before they stopped working, and only about 10% of drives that showed the symptom stayed healthy.
As expected, device/server workload is also relevant: drives that are used more intensively tend to wear out faster.
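To make this concrete, here is a minimal sketch of how such indicators might be read programmatically. It assumes output in the style of the common smartctl tool (`smartctl -A`); the sample table below is a shortened, hypothetical excerpt, and real attribute tables vary per vendor and drive.

```python
# Hypothetical excerpt of `smartctl -A` output; real tables have more rows.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail 12
187 Reported_Uncorrect      0x0032   100   100   000    Old_age  3
"""

def parse_attributes(text: str) -> dict:
    """Map attribute names to their raw values from a smartctl-style table."""
    attrs = {}
    for line in text.splitlines()[1:]:  # skip the header row
        parts = line.split()
        if len(parts) >= 8 and parts[0].isdigit():
            attrs[parts[1]] = int(parts[7])
    return attrs

attrs = parse_attributes(SAMPLE)

# Per the study above, reallocations and reported data errors are the
# strongest failure predictors – nonzero raw values deserve attention.
for name in ("Reallocated_Sector_Ct", "Reported_Uncorrect"):
    if attrs.get(name, 0) > 0:
        print(f"{name} = {attrs[name]} – back up and monitor this drive closely")
```

This is only a parsing sketch, not a prediction model – but it shows how easy it is to keep an automated eye on exactly the attributes the study found most predictive.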
Which SSD technology is most reliable?
In short: SLC, MLC, and eMLC are relatively equal.2
SLC technology is much more reliable on paper: the occurrence of bit errors relative to program/erase cycles (i.e., the drive usage) is lower than other technologies.
However, when we look at repair & replacement rates in the field, they are just about equal – Google noticed this when analyzing their use of SSD technology in data centers.
Even though SLC drives may not exhibit the same tell-tale signs before failure, they will fail all the same.
There is currently relatively little quantitative data available for TLC and QLC failure rates. However, manufacturers acknowledge that QLC drives tend to have slightly lower endurance and reliability because of the higher storage density.
Which manufacturer makes the most reliable drives?
Researchers found several factors beyond the warning indicators that predict how well a drive will perform.
Good product design is essential to achieving high reliability.
SSD design impacts all critical reliability factors.
Many manufacturers disclose which memory controller they are using.
This is a good indicator of what to expect in terms of reliability: if a manufacturer is willing to cheap out on what is essentially the brains of the operation, that mindset may well extend to the rest of the product.
Manufacturers using quality components will not hide this information, as they are proud to use top-of-the-line chipsets in their products.
If it seems too good to be true, it probably is. If the price of the SSD you are considering sits considerably lower than the industry average, you can expect the manufacturer to be cutting corners. Top brands try to keep their price down to compete as well – go much lower than that, and you are entering dangerous territory.
For the past few years, Samsung and Western Digital have dominated the market, making them a relatively safe bet for many professional and personal consumers. This large market share comes with a lower price point and a large user base, making it easier to find anecdotal experiences and product reviews.
They also have (for the most part) ‘in-house’ development and manufacturing, as opposed to companies like Kingston, which do not manufacture their own NAND flash chips or controllers.
Replacing a failed SSD
The large-scale study of SSD failures in datacenters identified an interesting difference between SSD and HDD failure handling:
79% of all SSD failure-related ‘tickets’ (complaints raised internally within the business) resulted in the drive being replaced, compared to only 11% for traditional HDDs.1
In an enterprise environment, SSDs are often used to boost read/write speeds for critical data.
Think of operating systems, databases that are queried often or written to intensively, serving content (web, media, documents) and other tasks that primarily benefit from higher throughput speeds.
SSD drives often fill a critical role. Downtime needs to be minimal, leading to higher replacement rates. Taking the time to troubleshoot or recover often makes little business sense compared to replacing the drive and reinstating a backup.
Suppose you are similarly using an SSD (this includes using it as a boot drive for a workstation or a personal computer). In that case, you should anticipate failure and prepare to replace the drive and reinstate an image/snapshot from a slower ‘cold’ backup.
If you neglect to do this, you are setting yourself up for a lot of pain down the road:
SSD recovery is more complex, more expensive, and has an overall lower success rate than recovering a traditional HDD.
Your mileage may vary, but a cursory web search showed that HDD recovery costs about 300 USD, whereas SSD recovery starts at around 1000 USD.
I can not stress this enough: not having a disaster recovery plan will cost time, money, and a whole lot of stress that you could have avoided.
Of course, you should take this with a grain of salt – the SSD devices used in this study had about 100 GB of data read from them and 10 GB written to them on any given day.
Let me give a personal example: over 602 workdays, the M.2 SSD in my main workstation has done about 11 terabytes of reading and 11 terabytes of writing, for a total of 22 terabytes. On an average day, that works out to nearly double the writing but only about 19% of the reading that the enterprise drives do.
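For anyone who wants to run the same comparison on their own machine, the arithmetic is trivial. The sketch below uses the figures from my example above (lifetime read/write totals are reported by most S.M.A.R.T. tools):

```python
# Convert lifetime read/write totals into daily averages and compare them
# against a reference workload. Figures are the article's own example numbers.

def daily_volume_gb(total_tb: float, days: int) -> float:
    """Average GB per day, using decimal units (1 TB = 1000 GB)."""
    return total_tb * 1000 / days

read_per_day = daily_volume_gb(11, 602)   # ~18.3 GB/day read
write_per_day = daily_volume_gb(11, 602)  # ~18.3 GB/day written

# Reference workload from the datacenter study: ~100 GB read, ~10 GB written/day.
read_ratio = read_per_day / 100   # ~0.18, i.e. under a fifth of the study's reads
write_ratio = write_per_day / 10  # ~1.8, i.e. nearly double the study's writes

print(f"{read_per_day:.1f} GB/day read ({read_ratio:.0%} of the study workload)")
print(f"{write_per_day:.1f} GB/day written ({write_ratio:.0%} of the study workload)")
```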
Can SSD failure be prevented?
The sad reality is: not really. Every storage device will eventually wear out.
Some devices will far outlive the manufacturer’s estimate, and others will fail a month into very light usage. Since it is impossible to tell what the fate of a device will be, it is essential to have a contingency plan in place.
We all know driving a car at high speeds and high RPMs with a lot of time between maintenance and oil changes will probably result in the untimely demise of the vehicle.
Extending the same mentality to SSD drives has some merit. Suppose you have above-average read/write volume (otherwise known as program/erase or P/E cycles), exceptionally high operating temperatures, or often erase the drive using multiple passes. In that case, you can expect the SSD to wear out sooner.
On the other hand, it is not advisable to assume a drive will last longer because your use is light or sporadic – always plan for the absolute worst-case scenario, even if you take good care of the equipment involved.
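One way to put rough numbers on "wearing out sooner" is the TBW (terabytes written) endurance rating that most manufacturers publish. The sketch below is purely illustrative – the 600 TBW rating and the 20 GB/day write rate are made-up example values, not figures from this article:

```python
# Illustrative endurance estimate: days until cumulative writes reach the
# manufacturer's rated TBW, assuming a constant daily write volume.
# Both input values below are hypothetical examples.

def days_until_tbw_limit(tbw_rating_tb: float, daily_writes_gb: float) -> float:
    """Days until total writes hit the rated TBW (1 TB = 1000 GB)."""
    return tbw_rating_tb * 1000 / daily_writes_gb

# Hypothetical 1 TB drive rated for 600 TBW, written at 20 GB/day:
days = days_until_tbw_limit(600, 20)  # 30000 days, roughly 82 years
print(f"~{days:,.0f} days (~{days / 365:.0f} years) to reach the rated endurance")
```

Note that this only models write endurance – as discussed above, a controller or component failure can cut a drive's life short long before the TBW figure is reached, which is exactly why a contingency plan matters.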
Are SSDs always better than an HDD?
The answer is the same as for many other performance-related questions: it depends.
Ask yourself what is most important in the specific context you will be using your drive for, and you may find that SSDs are (very) interesting for the vast majority of applications.
Price per GB
An SSD drive will be 25 to 50% more expensive on average.
High-performance and/or small form factor SSD drives can be up to 200% more expensive than an HDD.
SSD price per GB:
$0.10 or €0.08
HDD price per GB:
$0.04 to $0.06 or €0.03 to €0.05
Prices from 2020 – subject to change.
Write speed
If you are looking for high speeds, M.2 (NVMe) SSD drives are the way to go.
This is because the SATA interface was essentially designed for traditional drives, and it gets in the way of SSD performance.
However, SATA SSDs still offer a significant speed gain.
SSDs also do not suffer from latency caused by “spin-up”.
In some use cases (like a secondary storage drive in a PC), hard drives may stop spinning when not in use to reduce wear and heat generation. When the drive needs to read/write, it needs to spin up to the correct RPM, which may cause latency.
As such, the real-world performance of an SSD may be even better than it is on paper.
SATA HDD:
- 130 to 160 MB/s for most brands
SATA SSD:
- 400 to 500 MB/s (e.g. Samsung, WD, Crucial)
- 350 to 400 MB/s (budget brands, e.g. Kingston)
High-density M.2 SSD:
- ~1500 MB/s (e.g. WD Blue)
High-end M.2 SSD:
- 1800 to 2200 MB/s (e.g. Samsung EVO/Pro)
Boot speed
When it comes to boot speed, SSDs smash the competition: swapping an HDD for an SSD will easily halve the boot time of a given system.
Other limiting factors, like the motherboard firmware and whether or not you are using UEFI and fast booting technology may affect results.
Generally, the fastest boot speeds can be achieved by using a very fast m.2 drive in combination with a modern motherboard and operating system.
Physical durability & environmental factors
Every HDD has moving parts: the read/write head moves around as the drive functions.
SATA drives usually spin at 5400 or 7200 RPM, while SAS drives spin at 10000 or 15000 RPM.
It goes without saying that any drive that has (rapidly) moving parts is fragile to some degree, and tends to generate both heat and noise.
Even though SSDs (especially smaller ones) do generate some heat, they do not suffer from the downsides that come with having moving parts like an HDD does.
This makes SSDs the obvious choice for most places where equipment needs to be more rugged: a laptop used by technicians, a mobile server rack that sits in a running & vibrating truck, …
Frequent access performance
SSD drives are the best choice for any files that are accessed often – whether this is in a server or a workstation.
The combination of their high read/write speeds and the low latency (by not having to spin up the disk) makes their performance for frequently accessed files superior.
SSD drives also do not suffer from fragmentation, which tends to degrade reading speeds in HDDs.
Mass storage
HDDs are the best option for mass storage for several reasons.
- Lower price per GB
- Drives can be set up to only ‘spin up’ for reading/writing, prolonging their lifespan
- Many commercial options to create a storage array using multiple 3.5″ HDDs
- Lower read/write speed is usually less relevant for long-term data storage
- Recovery of a failed HDD is cheaper and has a higher success rate
In this case, you are trading speed for a lower price and a possibly longer lifespan.
This makes it ideal for making backups or storing files that are accessed infrequently or even rarely (e.g. media content like photographs and videos from past projects, documents from a project that was finished years ago, …)
In short, SSDs work great for just about anything but long-term storage of infrequently accessed bulk data: HDDs are still king when it comes to price point and lifespan.
Does an SSD need maintenance?
No, an SSD does not need any maintenance.
Because the storage ‘architecture’ works differently, fragmentation does not occur on an SSD. This makes defragmentation entirely unnecessary.
Do make sure to leave some empty space on the drive. About 5% should be enough.
Some manufacturers use caching methods that require a small amount of empty space to improve performance – as such, filling up the drive completely is not a great idea.
Also make sure to monitor the health of the drive using a S.M.A.R.T. tool – some computers have this functionality integrated, but you can also choose to use free software like CrystalDiskInfo to check up on the health of the drive(s) in your system.
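As a small illustration of the free-space rule of thumb, the snippet below checks how full a filesystem is. The 5% threshold mirrors the guideline above, and the `/` mount point is just an example path – adjust it to the drive you want to check:

```python
# Check remaining free space on a filesystem using only the standard library.
import shutil

def free_space_ratio(path: str = "/") -> float:
    """Fraction of the filesystem at `path` that is still free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total

# Warn when the drive drops below the ~5% free-space guideline.
if free_space_ratio("/") < 0.05:
    print("Warning: less than 5% free space left – consider freeing up room.")
else:
    print("Free space looks fine.")
```

A check like this is easy to run from a scheduled task, alongside whatever S.M.A.R.T. monitoring tool you use.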
Closing statements
The trends are clear: SSDs have captured a place in the heart of consumers, professionals, and manufacturers. The various benefits they have over HDDs far outweigh the higher price point, which also shrinks as technology progresses.
As with all technologies relating to data, one should still proceed with caution.
Failures are harder to predict, and more dramatic when they do occur, due to the lower recovery success rates for drives that end up failing.
When implemented correctly, an SSD can provide a massive performance increase in old and new systems alike – it makes a lot of sense to use them, but make sure to stick to best practices to protect your data!
References
1: SSD Failures in Datacenters
https://dl.acm.org/doi/abs/10.1145/2928275.2928278
2: Flash Reliability in Production
https://www.usenix.org/conference/fast16/technical-sessions/presentation/schroeder