Wednesday, October 19, 2011

Repair: HP Ultrium2 Drive

Being the digital hoarder that I am, I routinely end up with several hundred gigabytes I no longer actively need but don't want to just delete. The options for modern large-volume data archival are limited. The more straightforward choices are HDDs, Blu-ray discs or DVDs, and none of those are really reliable in the long run. The recording layer in optical media deteriorates too fast, and HDDs are generally unreliable if left idle for extended periods of time.

So a good long time ago I bought an ancient DAT drive, which I used for about a year before upgrading to an Ultrium1. Usually these enterprise-grade backup devices are crazy expensive when released, then 5-10 years later they're within my budget :) The hard part is finding one that hasn't been used to death in some server farm somewhere. Once in a while a drive that was bought as a spare and never used goes up on eBay, which can turn out to be a killer deal, but otherwise it's basically a lottery.

My first Ultrium1 drive did have some use in it, but nothing serious. I was happy enough with it and now have about 1TB archived all in all on LTO-1 tapes (they're 100GB per tape). About a week ago while browsing eBay I found an LTO-2 drive for $55 + international shipping (which was $56, heh..) so I bought it.

The drive is an HP C7379 double-height drive. (My LTO-1 was a single-height.)
I assumed that double-height drives are mechanically more robust and electronically more reliable. I had taken the single-height drive apart a few times and things are really cramped in there, which I went ahead and blamed (without any real evidence) for some of the stability issues I had with it over the years.

So the new drive arrives and I put it in my archival machine for testing. I insert the first cartridge and even before the insertion process is complete the Tape Error LED starts flashing. Not good!

I proceeded to do a cleaning run, but the Tape Error came up for the Cleaning Tape as well. Hrmm..

My first assumption was that the CM (Cartridge Memory, an RFID-like wireless EEPROM inside every LTO cartridge) was not being read properly, because the LED started flashing even before the spooling sequence began. The drive didn't even open the cartridge door to take out the leader pin before the error came up. (The way Ultrium works is that each cartridge holds a single reel of tape, the end of which is attached to a metal pin. The drive physically pulls the pin, along with the tape, out of the cartridge and starts spooling the tape onto an internal reel. Unlike a VCR cassette, which has two reels inside it, an Ultrium cartridge has only one; the other is in the drive.)

So basically since the tape never even touched the heads when I got the errors, and there was no apparent mechanical failure during the load sequence, the CM reader became the prime suspect.

Thankfully double-height drives are much more serviceable than the single-heights. The CM reader module is a separate daughter-board labeled Philips FCMWP PCA. (WP stands for Write-protect since this board also connects to the optical gate and mechanical arm responsible for detecting write protection.)

So I took the board out and began to troubleshoot it. I was planning to capture the SPI communication to get a better picture of what's going on, but I didn't have a 3.3V micro to do it with and was too lazy to bother with level shifting, so I decided to try other things that were more invasive but that I had the tools on hand for.
I desoldered the CM reader chip, an LTRCC10. This chip is a black box. No information about it seems to be available anywhere. Actually, that could be said for most Ultrium drive hardware. The only worthwhile material is the HP Ultrium Technical Reference Manual (google CRCM2161 to CRCM2167). Anyway, I traced the pins on said chip and the daughter-board to some extent.
The SPI stuff should be correct, the rest could be totally wrong, but maybe it'll be of some use to someone:

LTRCC10 (pin numbering as per the silkscreen on the daughterboard, Pin1 ----- Pin16):
1: GND, 2: RF, 3: 5V Vcc, 4: RF, 5: GND, 6: XTAL1, 7: GND(? double-check), 8: XTAL2,
9: SPI Clock, 10: SPI In, 11: SPI Out, 12: SPI Chip Select, 13: GND, 14: RF, 15: RF, 16: GND
I had no interest in the RF pins so whatever pins connected to the passives associated with RF I just labeled as such. If you want to build a CM reader with this chip you'll have to investigate them on your own, but you'll need a working drive anyway to reverse engineer the SPI protocol.
The LTRCC10's SPI runs at 3.3V despite the 5V Vcc. Not sure if it would tolerate 5V SPI.
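If you do end up capturing the bus on a working drive, something like the following could turn raw logic samples into bytes as a first step. It's only a sketch under assumptions I have not verified on the LTRCC10: active-low chip select, data sampled on the rising clock edge, MSB first, 8-bit words. The function name and the sample format (a sequence of `(cs, clk, mosi, miso)` tuples of 0/1) are made up for illustration, and which of "SPI In" and "SPI Out" corresponds to MOSI is also a guess you'd have to settle yourself.

```python
def decode_spi(samples):
    """Decode (cs, clk, mosi, miso) logic samples into byte pairs.

    Assumptions (NOT verified on the LTRCC10): chip select is active-low,
    data is sampled on the rising clock edge, bits arrive MSB first.
    Returns one list of (mosi_byte, miso_byte) tuples per CS-low period.
    """
    frames = []
    current = None            # byte pairs of the transaction in progress
    out_bits = in_bits = nbits = 0
    prev_clk = 0
    for cs, clk, mosi, miso in samples:
        if cs:                # chip deselected
            if current is not None:
                frames.append(current)   # close the transaction, drop any partial byte
                current = None
            prev_clk = clk
            continue
        if current is None:   # CS just went low: start a new transaction
            current = []
            out_bits = in_bits = nbits = 0
        if clk and not prev_clk:         # rising edge: sample both data lines
            out_bits = (out_bits << 1) | mosi
            in_bits = (in_bits << 1) | miso
            nbits += 1
            if nbits == 8:
                current.append((out_bits, in_bits))
                out_bits = in_bits = nbits = 0
        prev_clk = clk
    if current is not None:   # capture ended while CS was still low
        frames.append(current)
    return frames
```

Grouping by chip-select period keeps the EEPROM and CM reader traffic separable if you decode each chip select line in its own pass, since both chips share the clock and data lines on the daughterboard header.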

Daughterboard header:
1: Optogate Common?, 2: Onboard Optogate, 3: Write-protect Optogate, 4: 5V (CM Reader Vcc and Optogate Vcc), 5: EEPROM SPI Chip Select, 6: SPI In, 7: SPI Out, 8: GND, 9: SPI Clock, 10: CM Reader SPI Chip Select, 11: 3.3V (EEPROM Vcc)
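For anyone who wants to work from these pinouts in a script (say, to label logic analyzer channels), here are the two tables above as plain Python dictionaries. The signal names are just my labels from tracing, with the same caveats as above; the dictionary names and the `pins_for` helper are made up for this example.

```python
# Pin -> signal label, transcribed from the tables above (unverified tracing).
LTRCC10_PINS = {
    1: "GND", 2: "RF", 3: "5V Vcc", 4: "RF", 5: "GND", 6: "XTAL1",
    7: "GND (unverified)", 8: "XTAL2", 9: "SPI Clock", 10: "SPI In",
    11: "SPI Out", 12: "SPI Chip Select", 13: "GND", 14: "RF", 15: "RF",
    16: "GND",
}

HEADER_PINS = {
    1: "Optogate Common (unverified)", 2: "Onboard Optogate",
    3: "Write-protect Optogate", 4: "5V (CM Reader Vcc and Optogate Vcc)",
    5: "EEPROM SPI Chip Select", 6: "SPI In", 7: "SPI Out", 8: "GND",
    9: "SPI Clock", 10: "CM Reader SPI Chip Select", 11: "3.3V (EEPROM Vcc)",
}


def pins_for(table, keyword):
    """Return the sorted pin numbers whose label contains keyword (case-insensitive)."""
    return sorted(p for p, name in table.items() if keyword.lower() in name.lower())


print(pins_for(HEADER_PINS, "Chip Select"))  # the two chip selects on the shared bus
print(pins_for(LTRCC10_PINS, "SPI"))         # the pins worth probing on the reader chip
```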

The drive didn't really seem to be affected by the absence of the CM reader chip and continued to produce the exact same symptoms. I also tried disconnecting the EEPROM, which made the drive unable to boot up.

So I soldered both of the chips back in, then hooked up the reader board to my good LTO-1 drive's SPI bus, since it had the same LTRCC10 CM reader chip. I wanted to see if it produced any RF activity when it received the correct read command. I cobbled together a radio detector (based on this) using a germanium transistor (as a diode), a 1nF capacitor and a ±50µA analog panel meter.

This test showed that the reader chip did "something" at least, but the drive still failed to read the CM. I assumed the reader chip was not the cause of the problem and left it alone.

At this point the only things left on the daughter board were the various passives, most of which I tested with a multimeter, and a 13.56MHz quartz.
The operating frequency for the CM reader should be 13.56MHz according to this document. I did a few insertion tests while lightly holding the quartz and managed to get the tape loaded without an error twice. This confirmed that the quartz was bad.

I had no 13.56MHz quartz (suitable or otherwise) on hand, so I tried the closest value I had, which was 12MHz. The difference of 1560kHz is way out of the +/- 7kHz tolerance in the spec but... it seems to work just fine. I don't get any tape errors and it reads the cartridge name from the CM correctly. Unless it starts acting up during testing I'm going to leave it alone.
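To put a number on just how far out of spec that is, here's the arithmetic as a quick script. The figures are the ones quoted above; the function name is made up for this example.

```python
def crystal_offset(nominal_hz, fitted_hz, tolerance_hz):
    """Return (offset in Hz, offset as % of nominal, offset as a multiple of the tolerance)."""
    offset_hz = abs(nominal_hz - fitted_hz)
    return offset_hz, 100 * offset_hz / nominal_hz, offset_hz / tolerance_hz


# 13.56 MHz nominal, 12 MHz fitted, +/- 7 kHz allowed
offset_hz, percent, multiple = crystal_offset(13_560_000, 12_000_000, 7_000)
print(f"{offset_hz / 1000:.0f} kHz off, {percent:.1f}% low, {multiple:.0f}x the tolerance")
# prints: 1560 kHz off, 11.5% low, 223x the tolerance
```

So the replacement runs about 11.5% slow, more than two hundred times the allowed deviation.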

So that's one LTO drive (bought as functional) that had to be repaired. A success story, I guess, since I started the repair without any hope of success: the seller had already refunded the price (minus shipping), and this was a last Hail Mary attempt before selling the drive for parts to recoup some of my loss. It was mostly luck that I managed to fix it. Had the CM reader chip been defective, for example, the drive would've been beyond repair, since there's no way to get replacements for that other than having a friend in a Chinese component warehouse or buying a broken drive. The drive also possibly still has some other issues: if I try to run an assessment test in HP Tape Tools, the drive locks up and the SCSI card driver eventually BSODs Windows unless I manually reset the drive. It does however appear to read and write tapes successfully.. for now.