From time to time I hear people proudly boast that their data is safe because it is on RAID. They are WRONG and will unfortunately find out the hard way one day.
RAID stands for Redundant Array of Independent Disks and names a set of methods for distributing data across multiple hard drives. In most cases, the RAID array is designed to recover from at least one drive failure. This ability leads some to believe, incorrectly, that it is an appropriate solution to the backup problem. RAID arrays have a level, generally 0, 1, 5 or 6. RAID 0 and 1 are the most common for consumers. RAID 0 is called “striping” and offers no redundancy at all. It is all about performance. RAID 1 is called “mirroring” and, as the name implies, it is about keeping 2 identical copies of your data.
So, with a RAID 1 array, with the data mirrored (exactly the same data is on both hard drives at all times) on two drives, why isn’t it a backup system?
- RAID cannot protect against local disaster or theft. What happens if someone steals the entire computer? What about a fire or lightning strike?
- RAID cannot protect against software errors like bugs in the file system drivers.
- RAID cannot protect against viruses or malware that delete your data.
- RAID cannot protect against human error (fat fingers, overwrites, delete all, etc.).
Of course that isn’t even the entire story. Even with a plain old hard drive failure – the thing RAID 1 is designed to protect against – the array often fails completely. Why? Because the remaining “good” drive is subjected to considerable stress for several hours while the only existing copy of your data living on it is copied to the replacement drive. This often leads to a second failure, this time of the “good” drive.
On a side note, the proper use of RAID is actually to maintain uptime and/or to increase performance. RAID allows many commercial systems (web servers, factory automation systems, etc) to survive one of the most common failures – a hard drive – without interrupting their users… most of the time.
Home User Backups
Backups are about surviving the worst case scenario.
The key to a good backup system is to maintain recent copies at a different location than your computer. The farther the better. Your next-door neighbor’s house is better than nothing but isn’t ideal. Consider that you probably live in an area prone to some natural disaster that will wipe out a neighborhood, not just your house. For example, flooding, tornadoes, hurricanes, earthquakes and wildfires all come easily to mind.
There is no one right solution. Instead, consider the following rules:
Keep at least one copy of your backup offline (powered down and disconnected) at all times.
Test your backups. However you create them, whether using automated software or by simply dragging a folder to your backup drive, take time after each backup, or at least monthly, to verify that recent data is actually on the backup drive and usable.
Use an incremental solution if at all possible. Incremental backups are a form of backup where only new and changed data is backed up each time a backup happens. Generally, older incremental backups are kept for some time before being removed to free space. Using incremental backups does two things.
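One way to do that check without eyeballing every folder is to compare checksums. What follows is only a minimal sketch, not a finished tool: the function names `file_digest` and `verify_backup` are made up for illustration, and it assumes your backup is a plain copy of a folder (not a proprietary archive format). It reports files that are missing from the backup and files whose contents differ.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Compare every file under source_dir with its copy under backup_dir.

    Returns (missing, mismatched): two lists of relative paths.
    Hypothetical helper; assumes the backup mirrors the source layout.
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    missing, mismatched = [], []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = backup_dir / rel
        if not dst.is_file():
            missing.append(rel.as_posix())
        elif file_digest(src) != file_digest(dst):
            mismatched.append(rel.as_posix())
    return missing, mismatched
```

A file that you edited after the last backup will show up as mismatched, which is expected. What you are looking for is anything surprising: whole folders missing, or old files that should not have changed but did.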
- Backups are faster because only new and changed data must be copied.
- You can recover older versions of files that have been changed and then backed up more recently (i.e. you can get to older versions if something corrupts the most recent version).
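To make the “only new and changed data” idea concrete, here is a rough sketch of the copying half of an incremental backup. It is an illustration under stated assumptions, not how any particular backup product works: the names `needs_copy` and `incremental_backup` are hypothetical, “changed” is judged by file size and modification time, and the 2-second tolerance is my own guess at coping with drives that store coarse timestamps. Note that it overwrites the previous copy, so it does not keep the older versions described above; real incremental tools store those separately.

```python
import shutil
from pathlib import Path

# Assumption: some external drives round timestamps, so allow a little slack.
MTIME_TOLERANCE = 2.0  # seconds


def needs_copy(src, dst):
    """True if dst is absent, differs in size, or is older than src."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    return s.st_size != d.st_size or s.st_mtime > d.st_mtime + MTIME_TOLERANCE


def incremental_backup(source_dir, backup_dir):
    """Copy only new or changed files; return the relative paths copied.

    backup_dir must not be located inside source_dir.
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    copied = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = backup_dir / rel
        if needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves the modification time
            copied.append(rel.as_posix())
    return copied
```

Run it twice in a row and the second run should copy nothing, which is exactly why incremental backups are fast.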
Keep the backup copies safe. That means as far from your computer as possible when a backup is not going on.
- Store the backup drive at your workplace and only bring it home on your backup day (say every Tuesday).
- Purchase 2 identical external drives (they are cheap now!) and rotate them on a monthly basis. Keep whichever drive you are not currently using off-site at an office or friend’s house, or in a safe deposit box. Keep the other locally in a fire safe.
Use more than one backup system. This sounds like overkill and for some it may be, but for others (those who answer “oh god, I’d be done for” to the question “what would you do if you lost all your data”), it is a must.
Your primary system should be something simple and understandable that requires your interaction. Why? Because then you are paying attention to it and not just ignoring it only to realize when your hard drive fails that the automatic backup system you use stopped working 3 months ago.
You should also have a secondary system. It should be automated. One of the online backup systems (“in the cloud”) is a good option here. The reason this isn’t good enough to be the primary is that it is totally out of your control and it relies on having a fast data connection when you need to do backups or, more importantly, restores. How long will it take to download 100 GB, or even several terabytes, of data? What if you need to do it on a really slow connection (say after a major disaster when telecom services are iffy)?
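You can put rough numbers on that question with a little arithmetic. The sketch below is a best-case estimate only: the function name `transfer_hours` is made up, the connection speeds are illustrative assumptions rather than measurements, sizes are decimal gigabytes, and real restores will be slower because of protocol overhead and throttling.

```python
def transfer_hours(size_gb, speed_mbps):
    """Best-case hours to move size_gb (10**9 bytes each) at speed_mbps megabits/s."""
    bits = size_gb * 1e9 * 8             # bytes -> bits
    seconds = bits / (speed_mbps * 1e6)  # megabits/s -> bits/s
    return seconds / 3600


# Illustrative (assumed) scenarios:
print(round(transfer_hours(100, 10), 1))        # 100 GB at 10 Mbit/s -> 22.2 hours
print(round(transfer_hours(2000, 10) / 24, 1))  # 2 TB at 10 Mbit/s -> 18.5 days
```

Even on a connection ten times faster, a 2 TB restore still takes the better part of two days, which is why the cloud copy makes a better secondary than a primary.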
You should also have special-case backups. These are for the most important files – the “I can’t live without it” files – such as your almost finished dissertation, that manuscript for your killer novel, or your customer contact list and accounting data. Backing these things up individually and regularly, independently of your other backups, is one more way to make sure you don’t lose them (and to sleep better). And because the set of things that fall into this category is usually very small, there are some really simple ways to handle this, such as emailing them (properly encrypted for security) to your gmail/hotmail/yahoo account and then leaving them there in case you ever need them.
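If you go this route, it helps to bundle the handful of critical files into one dated archive first. Here is a small sketch of that step; the function name `bundle_important_files` and the `important-<date>.zip` naming are hypothetical choices of mine. It does not encrypt anything, so you would still need to encrypt the resulting archive with a tool you trust before mailing it anywhere. It also stores files by name only, so two files with the same name in different folders would collide.

```python
import zipfile
from datetime import date
from pathlib import Path


def bundle_important_files(files, dest_dir, stamp=None):
    """Write the given files into dest_dir/important-<stamp>.zip and return its path.

    stamp defaults to today's date (YYYY-MM-DD). The archive is NOT encrypted.
    """
    stamp = stamp or date.today().isoformat()
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / f"important-{stamp}.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            f = Path(f)
            zf.write(f, arcname=f.name)  # store by file name only
    return archive
```

Because each archive carries its date in the name, older bundles pile up as a crude version history until you decide to prune them.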