Following up to Backing up to Amazon S3 using Amanda (June 28, 2008):
I’ve been having issues with both Amanda+S3 and JungleDisk, which I’ll outline here.
JungleDisk Problems
- JungleDisk sometimes destroys all of my data in the S3 bucket it uses and then, on the next backup, re-uploads everything! This is clearly a problem. Luckily I have less than a gigabyte of data backed up with JungleDisk. If I had more, this bug (or feature?) would be very expensive.
- JungleDisk doesn’t handle moves smartly. I’d like to be able to move things around on my local filesystem and have JungleDisk notice this and move the corresponding data on S3 rather than uploading it again. Moving files over WebDAV isn’t feasible.
- JungleDisk scans individual files and doesn’t combine a whole bunch of them into one tarball. This gets very expensive! I wish they’d tar it all up first like Amanda.
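To illustrate the tarball point: S3 bills per request as well as per byte, so one archive means one upload instead of one per file. The following is only a rough sketch of the bundling step using Python's standard tarfile module; the function name `bundle` and its arguments are made up for this example, and it is not how Amanda or JungleDisk actually do it.

```python
import tarfile


def bundle(paths, archive_path):
    """Write the given files and directories into one gzip-compressed tarball.

    Hypothetical helper: the resulting single file could then be uploaded
    to S3 as one object instead of one object per source file.
    """
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in paths:
            # Directories are added recursively by default.
            archive.add(path)
    return archive_path
```

Calling something like `bundle(["/home/me/docs"], "/tmp/docs.tar.gz")` (both paths are placeholders) would leave a single file to upload.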
Amanda+S3 Problems
- Sometimes the Amanda S3 device module has problems talking to S3. The only fix I’ve found so far is to destroy the bucket, remove it from the tapelist, re-add the bucket, reidentify the bucket to Amanda and run amflush. This is clearly not good, as it’s just as bad as JungleDisk destroying everything. I haven’t figured out why this happens yet.
Both products are still good and I’ll continue to use them. I’m also considering using Amanda on my laptop, but this could cause problems when it isn’t connected to the network at backup time.
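The recovery steps for the Amanda problem could be written down roughly as below. This is a sketch under assumptions, not a tested procedure: I am assuming that "remove it from the tapelist" maps to amrmtape and that "reidentify the bucket" maps to amlabel, and the configuration name and label are placeholders. Deleting the bucket itself is left as a manual step with whatever S3 client is at hand.

```python
import shlex


def rebuild_commands(config, label):
    """Return the Amanda commands for the workaround, in order.

    Assumption: the bucket has already been deleted by hand, and
    amrmtape/amlabel are the right tools for the tapelist steps.
    """
    return [
        ["amrmtape", config, label],  # drop the old "tape" from the tapelist
        ["amlabel", config, label],   # label the re-created bucket again
        ["amflush", config],          # write the holding disk out to S3
    ]


def format_plan(commands):
    """Render the commands as shell lines for review before running them."""
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in commands]
```

Printing `format_plan(rebuild_commands("daily", "daily-01"))` first, rather than executing anything, seems the safer way to use this.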
Recently a friend of mine (Sarah) purchased a dedicated server at a hosting company and moved her data to it. Obviously she was in need of a backup solution. She chose Amanda 2.6.0 with S3 as her “tape”.
Seeing how well it worked for her, I asked her to show me how it was set up, and now my server is also using Amanda to back up to S3. Recovery works with amrecover.
I have quite a bit of data to back up (4.5GB in /home, for example) and with a home DSL connection it takes a long time. With a bigger pipe, however, S3 and Amanda would be extremely viable.
I’m very happy with the solution and will likely use it across all of my servers from now on.
For my laptop I’m still using JungleDisk, which seems to work fine. Version 2.0 is a very good improvement.
Since I got to North America I have wanted to put my system under better monitoring. It has four active hard disks (and a fifth that is powered but not active) in an IcyDock MB-455SPF 5-Bay Internal SATA Drive Enclosure (Manufacturer site, icydock.com, seems offline so link to where I bought it!), with the four active disks plugged into a HighPoint RocketRaid 1640 SATA controller.
The disks report their temperatures through S.M.A.R.T., and the hddtemp Linux utility makes it easy to read them.
It’s easy to use a cron job to poll the disks and to stuff the temperatures into an rrdtool database. I drew much inspiration from Martin Pot’s Perl script that does the same thing, but implemented my own as a Ruby Rake task using the woefully undocumented RubyRRDtool gem and a custom hddtemp wrapper class to get the temperatures.
The similarities between my Rake task and Martin’s RRDtool-fu are obvious. However, I wanted an hour graph too, so I added it in on line 17.
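For anyone who wants the general shape of such a poller without the Rake task, here is a minimal sketch in Python rather than Ruby. It is not my actual task: the device names and the RRD file name are assumptions, and it shells out to the hddtemp and rrdtool command-line tools instead of using the RubyRRDtool gem. It also assumes the RRD already exists with one data source per disk, in the same order as the device list.

```python
import re
import subprocess

# Hypothetical device list and RRD file name; adjust to the actual system.
DISKS = ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"]
RRD_FILE = "hddtemp.rrd"


def parse_temperature(output):
    """Extract degrees Celsius from the numeric output of `hddtemp -n`."""
    match = re.search(r"-?\d+", output)
    if match is None:
        raise ValueError("no temperature in hddtemp output: %r" % output)
    return int(match.group())


def read_temperature(disk):
    """Ask hddtemp for one disk's temperature (usually needs root)."""
    output = subprocess.check_output(["hddtemp", "-n", disk])
    return parse_temperature(output.decode("utf-8", "replace"))


def rrd_update_command(rrd_file, temperatures, timestamp=None):
    """Build an `rrdtool update` command with one value per data source."""
    stamp = "N" if timestamp is None else str(int(timestamp))
    values = ":".join(str(t) for t in temperatures)
    return ["rrdtool", "update", rrd_file, "%s:%s" % (stamp, values)]


def poll(disks=DISKS, rrd_file=RRD_FILE):
    """Read every disk once and store the readings in the RRD."""
    temperatures = [read_temperature(d) for d in disks]
    subprocess.check_call(rrd_update_command(rrd_file, temperatures))
```

A crontab entry would then call `poll()` every few minutes, at whatever step the RRD was created with.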
My next task is to combine the temperatures (lines) of all four disks into a single graph to get a feel for the overall temperature of the disks on one image.
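A combined graph mostly comes down to giving rrdtool graph one DEF and one LINE per disk. The sketch below only builds that command line; the data-source names, colours and file names are assumptions for illustration, and the Rake task may well end up looking different.

```python
# Hypothetical data-source names and line colours, one pair per disk.
SERIES = [
    ("sda", "#CC0000"),
    ("sdb", "#00AA00"),
    ("sdc", "#0000CC"),
    ("sdd", "#CC8800"),
]


def rrd_graph_command(rrd_file, png_file, series=SERIES, start="-1h"):
    """Build an `rrdtool graph` command drawing every disk on one image."""
    command = [
        "rrdtool", "graph", png_file,
        "--start", start,
        "--vertical-label", "degrees C",
    ]
    # One DEF per disk pulls that data source out of the RRD...
    for name, _colour in series:
        command.append("DEF:%s=%s:%s:AVERAGE" % (name, rrd_file, name))
    # ...and one LINE per disk draws it in its own colour with a legend.
    for name, colour in series:
        command.append("LINE1:%s%s:%s" % (name, colour, name))
    return command
```

Passing the result to `subprocess.check_call` would write the image; changing `start` to something like "-1d" gives a day graph from the same function.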
The graphs can be seen at my personal website.
Just over four years ago I was given CVS commit access to the Gentoo portage tree. It’s been quite a dramatic time and it’s been very enlightening. The people I’ve met have been really nifty! Most of the things that I maintain are stable. My main regret is that, with work completely sucking up my time, I don’t have much left over at the end of the day for Gentoo.