Wednesday, November 29, 2017

Nothing is more important than the reliability of data

Surendra Verma of Microsoft introduced ReFS here.

He talks a great deal, then makes the most important point almost in passing, so that it's easily overlooked:

"...with the caveat that nothing is more important than the reliability of data. So, unlike any other aspect of a system, this is one where a conservative approach to initial deployment and testing is mandatory." (my emphasis)

This is the thing that almost always gets overlooked. I've worked in organisations whose IT departments have no clue what they should be providing to the larger organisation. Management overhead grows each year, office politics get worse as the organisation gets bigger and less effective, and millions get spent on hardware and software. And guess what? They miss their purpose.

Their objective is to ensure data remains available to the wider organisation. Everything else is just plumbing.

Protect your data, everything else is just plumbing

Steve Riley of Microsoft wrote one of the best articles on the entire purpose of IT - data. It can be found here.

He makes a point of simple genius. The average IT pro usually gets bogged down in whatever area they're in - DBA, sysadmin, programming, networking - and loses sight of the fact that we don't have time machines.

Yes, another way of looking at what Riley says is that time machines haven't been invented. If we had a time machine and lost data, we could go back in time and either re-enter the data or prevent its loss.

Your customers have spent millions of man-hours creating data. We can hire new people, buy new hardware, buy new software. All of those things are replaceable. But we can't travel back in time and re-enter all of that data. Some data creators have since retired. Sometimes paper documents have been destroyed, or never created in the first place.

Data protection is the highest imperative.

Phpar2 - Its importance

I'd like to explain why phpar2 is important and what it provides.

I've been managing data for a long time now, and I've realised: script everything. Scripting allows you to always deploy your best practices. Not only do your best practices get encapsulated in one set of scripts, but your deployment across multiple sites stays consistent.

I used to have multiple methods of managing databases. Bespoke methods for different customers, different strategies for live, test and training databases. That's a crap approach.

One approach based on my best practices is the best method. Or, if you do have multiple methods, use one set of scripts to manage them. Scripting is the meta-management of the minutiae that we admins usually get bogged down in. That's why script management is a much more powerful approach than manual methods, or bespoke site-specific methods.

So, what does this have to do with phpar2? There's no parity-data functionality in PowerShell (PS); if there were, I wouldn't be using phpar2. I see phpar2 as an extension of PS that provides parity data for user data files, and I will be writing PS functions around phpar2 to control it from other scripts. To have confidence in it, I need to test it and ensure it works consistently, 100% of the time. I also need to know its limitations.
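As a sketch of the direction I'm heading in, a thin PS wrapper might look like the following. The function name and parameters are my own invention, not part of phpar2; I'm assuming only that phpar264 sits on the PATH and follows the usual par2cmdline conventions ("v" verify verb, exit code 0 on success):

function Invoke-Phpar2Verify {
    # Hypothetical wrapper: run phpar264 in verify mode against a par2 set
    # and report success via the exit code.
    param(
        [Parameter(Mandatory)][string]$Par2File,
        [int]$Threads = 16
    )
    & phpar264 v -t$Threads $Par2File
    return ($LASTEXITCODE -eq 0)
}

# Usage from another script:
if (-not (Invoke-Phpar2Verify -Par2File 'test1.par2')) { throw 'Verify failed' }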

If it doesn't work, or its limitations are too significant, then I won't be using it. The closest alternative is the command-line rar.exe from WinRAR, but WinRAR can generate parity only after compression and splitting.
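For comparison, WinRAR's recovery record is welded to the archive itself. A command along the following lines (the switch values and paths are just illustrative) compresses, splits and adds recovery data in one pass, but the parity then protects the compressed volumes rather than the raw files:

rar a -m3 -v1g -rr10% backup.rar D:\data\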

Tuesday, November 28, 2017

Assumption is the mother of all fuckups

This should have been my first blog post ever. This is what separates the men from the boys, the professionals from the amateurs, the serious IT professionals from the end-users.

Assumption is the mother of all fuckups.

Always test your backups, never assume they will work. Backups are sacred. When everything has gone to hell, backups will bring you salvation. And to be 100% assured that they will work, you need to test them.

Always plan your work around your backups, so that continuity of backups is preserved.

Never forget.

phpar2 64-bit Test 1 (1 T)

Well, the testing hasn't been going great: it has failed with both 17 T and 4 T of user data.

My testing protocol is pretty basic, but I want to know that the fundamentals work. A large part of being a database administrator is being methodical and working through all possibilities, including the obvious ones. To test phpar2 64-bit, I'll be undertaking the following basic tests:

1. Create and verify 100% pars. Why verify? Because it's the most obvious step: parity data created by an app should be verifiable by the same app. With intact data and the creator app, verification should work 100% of the time.

2. Repair using 100% pars. Obviously, it should work, but assumption is the mother of all fuckups. Independently verify the repaired data file using SHA256 hashes.

3. Repair using 50% pars and 50% data. Independently verify the repaired data file using SHA256 hashes.

If phpar2 passes all three tests, the process is acceptable. A scripted sketch of the protocol follows.
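Since assumption is the mother of all fuckups, I intend to drive these tests from a script rather than by hand. A rough sketch follows; the paths and the corruption step are my own placeholders, and I'm assuming phpar264 follows the usual par2cmdline conventions (c/v/r verbs, exit code 0 on success):

# Baseline hash of the pristine data file (path is a placeholder).
$file = 'D:\test\1T.vhdx'
$baseline = (Get-FileHash $file -Algorithm SHA256).Hash

# Test 1: create 100% pars, then verify with the same app.
& phpar264 c -r100 -t16 test1.par2 $file
& phpar264 v test1.par2
if ($LASTEXITCODE -ne 0) { throw 'Test 1 failed: verify after create' }

# Test 2: deliberately damage the file, repair from the pars, then
# independently confirm the repair against the baseline hash.
$fs = [System.IO.File]::Open($file, 'Open', 'ReadWrite')
$fs.Position = 1GB
$fs.Write((New-Object byte[] 4MB), 0, 4MB)   # overwrite 4 MB mid-file
$fs.Close()
& phpar264 r test1.par2
if ($LASTEXITCODE -ne 0) { throw 'Test 2 failed: repair' }
if ((Get-FileHash $file -Algorithm SHA256).Hash -ne $baseline) {
    throw 'Test 2 failed: repaired file does not match the baseline'
}

# Test 3 follows the same pattern with 50% pars and half the file destroyed.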

After ad-hoc testing failed with 17 T and 4 T of user data, I'm testing with 1 T of data. Failure at this level will mean that testing with larger amounts of data is not necessary.

Method:

1. I created a 1 T VHDX of fixed size. Even though I specified fixed, it was created instantaneously on a ReFS volume. ReFS is working great so far. I quick-formatted the VHDX, but there's no actual data in it. (A scripted version of steps 1 and 2 appears below the command.)

2. I hashed the VHDX file. Funny, but ReFS seems to know that most of the file is empty, so there were almost zero disk reads while hashing. I thought something was wrong with sha256deep64, so I used Get-FileHash in PS. Same behaviour - practically zero disk reads. Hashing a 1 T file took about an hour, where it would normally take several hours at least. Nice!

3. Running the following command:

phpar264 c -b100 -r100 -u -n100 -m28672 -t16 -v test1.par2 .\1T.vhdx

A friend of mine thinks we should specify block SIZE, not block COUNT (as I have in this command); I will try that in another test. Once this run is complete, I will proceed with the test and update this post.
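For reference, steps 1 and 2 script cleanly, and the block-SIZE variant would look something like the last line below. New-VHD requires the Hyper-V module; -s is par2cmdline's block-size switch, and I still need to confirm that phpar2 honours it. Paths are placeholders:

# Step 1: create the 1 T fixed-size VHDX (Hyper-V module required).
New-VHD -Path D:\test\1T.vhdx -SizeBytes 1TB -Fixed

# Step 2: hash it and keep the baseline for later comparison.
(Get-FileHash D:\test\1T.vhdx -Algorithm SHA256).Hash | Out-File D:\test\1T.sha256

# Follow-up test: block SIZE (-s, here 10 MB) instead of block COUNT (-b).
phpar264 c -s10485760 -r100 -u -m28672 -t16 -v test2.par2 D:\test\1T.vhdx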

Saturday, November 25, 2017

Phpar2 64-bit - Intro

So a friend of mine thinks I may be the first person to par data with large amounts of memory, and suggested that I blog about my experiences. Well, my worldwide audience, what say you? 😃

I guess I'll post an introduction and background.

I will not go into detail about parity data files ("pars"). Pars are like RAID 5 for data files: using extra parity data, you can regenerate damaged or missing original files. That's the key idea.
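PAR2 actually uses Reed-Solomon coding, which tolerates multiple losses, but the underlying idea is the same as single-parity RAID and can be shown with plain XOR. A toy PS illustration (nothing to do with the real PAR2 format):

# Three data bytes plus one XOR parity byte.
$data = [byte[]](0x41, 0x42, 0x43)
$parity = 0
foreach ($b in $data) { $parity = $parity -bxor $b }

# Lose any single byte; XOR-ing the parity with the survivors recovers it.
$lost = 1
$recovered = $parity
for ($i = 0; $i -lt $data.Length; $i++) {
    if ($i -ne $lost) { $recovered = $recovered -bxor $data[$i] }
}
$recovered -eq $data[$lost]   # True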

The main developer of PAR2 was Peter B. Clements. The format has some limitations - directory support is missing, for example. Newer versions, like PAR3, can overcome those limitations, but they are far too experimental to trust with real data.

The best implementation of PAR2 is Paul Houle's phpar2. Paul is both a gentleman and a scholar, and he kindly generated a 64-bit version of phpar2. He has made a number of performance improvements over the years, and the latest, 64-bit version is his best ever. He also added a "-t" parameter to control the number of threads.

I have a testing rig. I recently moved my main server to a newer Dell T410, and the old server has become the rig. It's a simple desktop workstation mobo with 32 G of DRAM. I can stuff 10 HDD's into it, and although I'm having a few problems with the disks / controller, it's very handy to have a rig where you can simply test your backups. I've placed the disks in a single RAID 0 volume, which is perfect for testing: extremely fast for both reads and writes. SSD's would be faster, but this is very good for conventional disks.

My backups are 17 T in size. My backup strategy is:

1. Shutdown all VM's
2. Par up all VHDX's (1%)
3. Calculate the SHA256 hashes of all VHDX's
4. Copy the files (VHDX, pars, etc.) to USB drives (backup #1)
5. Restore the backup to a Destination Server using another SS DP / ReFS volume (backup #2)
6. Execute a repair operation using phpar2. Repair implies verification, and repair if necessary.
7. Calculate the SHA256 hashes of the VHDX's on the destination server
8. Compare the destination hashes against the originals (a condensed script sketch follows)
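Most of that pipeline is scriptable. A condensed sketch of steps 1 to 4 and 6 to 8, with placeholder paths (Stop-VM needs the Hyper-V module; step 5, the restore to the destination, happens in between):

# Source server: steps 1-4.
Get-VM | Stop-VM                                    # 1. shut down all VMs
Get-ChildItem D:\vms\*.vhdx | ForEach-Object {
    & phpar264 c -r1 -t16 "$($_.FullName).par2" $_.FullName      # 2. 1% pars
    (Get-FileHash $_.FullName -Algorithm SHA256).Hash |
        Out-File "$($_.FullName).sha256"            # 3. SHA256 baselines
}
Copy-Item D:\vms\* E:\usb-backup\                   # 4. backup #1

# Destination server: steps 6-8, after the restore of step 5.
Get-ChildItem R:\restore\*.par2 | ForEach-Object { & phpar264 r $_.FullName }  # 6.
Get-ChildItem R:\restore\*.vhdx | ForEach-Object {
    $now = (Get-FileHash $_.FullName -Algorithm SHA256).Hash     # 7. rehash
    $was = (Get-Content "$($_.FullName).sha256").Trim()
    if ($now -ne $was) { Write-Warning "$($_.Name): hash mismatch" }  # 8. compare
}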

In this post I haven't gone into too much detail, but only provided a background and introduction to my experiments with phpar2.

The idea of the following posts is to determine, using repeatable experiments, whether phpar2 64-bit is dependable.