Basic Command-line AWS Glacier Workflow
Glacier is Amazon’s cold-storage service. Its data-center analog is archival tape storage, and it is about as slow as tape: retrievals take hours, if not days. Glacier is a disaster-recovery tool, not live storage.
Unlike most AWS offerings, Glacier cannot be usefully controlled from the web console. It must be accessed with command-line tools or custom-built programs. Here’s a quick overview of Glacier operations using the AWS command line interface.
Note: You’ll see unexplained references to SNSTopic in some of the JSON snippets throughout this post. They refer to the AWS Simple Notification Service, a push-notification service that will alert you to AWS events that interest you. I left them there for my own reference. You can safely ignore them, though you may find it worth your time learning how to set up notifications.
Create the Vault
aws glacier create-vault --account-id - --vault-name sandbox-02
If you log into your AWS web console, you should be able to see your new vault within a minute or two.
From the command line, you can retrieve a description of your vault.
[~]$ aws glacier describe-vault --account-id - --vault-name sandbox-02
{
"SizeInBytes": 1036288,
"VaultARN": "arn:aws:glacier:us-west-2:112233445566:vaults/sandbox-02",
"LastInventoryDate": "2016-09-14T12:27:07.315Z",
"NumberOfArchives": 1,
"CreationDate": "2016-08-03T21:56:26.616Z",
"VaultName": "sandbox-02"
}
Upload Archive Files
Here’s a script that uploads to Glacier all the tar archives in a given directory. The output file and the archive descriptions include a timestamp.
#!/bin/sh
# variables for vault name, timestamp, and output file
VAULT="sandbox-02"
NOW=$(date +%s)
IDFILE="archive-ids-${NOW}.json"
# make sure we can write the output file
touch "$IDFILE" || exit 1
# upload all tar files in the forbackup directory, writing
# results to the output file
#
# the archive-description string is the filename prefixed
# with the timestamp. this information may be of great
# help when/if we later retrieve the file.
for F in /home/myproject/forbackup/*.tar; do
    # skip the unexpanded glob pattern if no tar files exist
    [ -e "$F" ] || continue
    echo "# $F" >> "$IDFILE"
    aws glacier upload-archive \
        --vault-name "$VAULT" \
        --account-id - \
        --archive-description "${NOW}/$F" \
        --body "$F" >> "$IDFILE"
done
The output file will contain a JSON stanza for each uploaded file:
# /home/myproject/forbackup/allimages.tar
{
"archiveId": "Uto28rqS24V9TD6YFVkny5bCUoRr4DOJIHzpOan-4uzy-EwEfRW2QkuuvtMw4pJxuP-dXbfCfATKOlmOgDMVCKVLRIh-eBD8Zq9TcBbq2ovrCb4y2Mccd3xwPQD1udWLhUp0cxeFiw",
"checksum": "70cde3046ff600c49e3de101df06bdba70a2acb31753cb33097c408b9baa9023",
"location": "/112233445566/vaults/sandbox-02/archives/Uto28rqS24V9TD6YFVkny5bCUoRr4DOJIHzpOan-4uzy-EwEfRW2QkuuvtMw4pJxuP-dXbfCfATKOlmOgDMVCKVLRIh-eBD8Zq9TcBbq2ovrCb4y2Mccd3xwPQD1udWLhUp0cxeFiw"
}
# /home/myproject/forbackup/sqldump.tar
{
"archiveId": "AveGlBWdJIDk8-THelSpu8FFo34KUmg8pVOQFvMxEQzM8MXMC6A4V7XcX3E3_qf7II3nYNuUpsgAhbSNYzbUUDKEmKv6VRwJvQZdP9m33ZpCGhsrMXnAgn05ng2xDvHHGFSRUjFf-g",
"checksum": "49b20365823966a8209e16625fe5c0cfee1a4299be01c9cfe3efbe7431908333",
"location": "/112233445566/vaults/sandbox-02/archives/AveGlBWdJIDk8-THelSpu8FFo34KUmg8pVOQFvMxEQzM8MXMC6A4V7XcX3E3_qf7II3nYNuUpsgAhbSNYzbUUDKEmKv6VRwJvQZdP9m33ZpCGhsrMXnAgn05ng2xDvHHGFSRUjFf-g"
}
It’s worth noting that the JSON standard does not allow comments. My script adds a #-prefixed comment for each file, which means that the output file is not, strictly speaking, valid JSON. (This hack would be unnecessary if Amazon deigned to include the ArchiveDescription string in the JSON.)
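Because the output file mixes comment lines with JSON, a little text processing makes it easier to reuse. The sketch below is one way to do that, assuming the file has exactly the layout shown above (a "# path" line followed by a JSON stanza with one key per line). The function names list_uploads and strip_comments are hypothetical, not part of the AWS CLI.

```shell
#!/bin/sh
# list_uploads FILE
# Print one "<path><TAB><archiveId>" line per uploaded archive.
# Assumes each "# path" comment precedes its JSON stanza and that
# "archiveId" sits on its own line, as in the upload script's output.
list_uploads() {
    awk '
        /^# / { path = substr($0, 3) }
        /"archiveId":/ {
            id = $0
            sub(/^.*"archiveId": *"/, "", id)   # drop everything before the value
            sub(/".*$/, "", id)                 # drop the closing quote and the rest
            print path "\t" id
        }
    ' "$1"
}

# strip_comments FILE
# Print the file without the #-prefixed lines. The result is a
# sequence of JSON objects, one per upload.
strip_comments() {
    grep -v '^#' "$1"
}
```

For example, list_uploads archive-ids-1470261757.json would print each tar file’s path next to its archive ID.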
Get Inventory
aws glacier initiate-job \
--account-id - \
--vault-name sandbox-02 \
--job-parameters '{ "Type": "inventory-retrieval" }'
An inventory-retrieval job will take several hours. I’d suggest submitting the job very early or very late in your workday. You can verify the job is in progress by submitting a list-jobs request.
[~]$ aws glacier list-jobs --account-id - --vault-name sandbox-02
{
"JobList": [
{
"InventoryRetrievalParameters": {
"Format": "JSON"
},
"VaultARN": "arn:aws:glacier:us-west-2:112233445566:vaults/sandbox-02",
"SNSTopic": "arn:aws:sns:us-west-2:112233445566:glacier-sandbox",
"Completed": false,
"JobId": "j6ig7qCeJ4Ortc-D83EgHsNxm3RriaAkyEFma3_dx_TV_xix5_APExmpGrDLT7EU07Wxc_5BQfwllggqsgH_JfLusxIV",
"Action": "InventoryRetrieval",
"CreationDate": "2016-09-15T15:42:07.927Z",
"StatusCode": "InProgress"
}
]
}
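If you would rather not check list-jobs by hand, a small polling loop can wait for the job. This is a minimal sketch, assuming the describe-job subcommand and the StatusCode values InProgress, Succeeded, and Failed. The function name wait_for_job and the 15-minute default interval are my own choices for illustration.

```shell
#!/bin/sh
# wait_for_job VAULT JOB_ID [INTERVAL_SECONDS]
# Poll a Glacier job until it finishes. Returns 0 if the job
# succeeded, 1 if it failed or the status request itself failed.
wait_for_job() {
    vault=$1
    job_id=$2
    interval=${3:-900}   # hypothetical default: check every 15 minutes
    while :; do
        status=$(aws glacier describe-job \
            --account-id - \
            --vault-name "$vault" \
            --job-id "$job_id" \
            --query StatusCode \
            --output text) || return 1
        case $status in
            Succeeded) return 0 ;;
            Failed)    return 1 ;;
        esac
        sleep "$interval"
    done
}

# Usage (not run here):
#   wait_for_job sandbox-02 "$JOB_ID" && echo "job finished"
```

Since jobs take hours, frequent polling gains nothing. An SNS notification remains the less wasteful option if you have one configured.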
Once the job is complete, you can request its output. Use the JobId from the inventory-retrieval job.
aws glacier get-job-output \
--account-id - \
--vault-name sandbox-02 \
--job-id "j6ig7qCeJ4Ortc-D83EgHsNxm3RriaAkyEFma3_dx_TV_xix5_APExmpGrDLT7EU07Wxc_5BQfwllggqsgH_JfLusxIV" \
glacier-jobs-out
The resulting file (here, glacier-jobs-out) will list the archives found within the inventory-retrieval range:
{
"VaultARN":"arn:aws:glacier:us-west-2:112233445566:vaults/sandbox-02",
"InventoryDate":"2016-08-04T07:56:34Z",
"ArchiveList": [
{
"ArchiveId":"Uto28rqS24V9TD6YFVkny5bCUoRr4DOJIHzpOan-4uzy-EwEfRW2QkuuvtMw4pJxuP-dXbfCfATKOlmOgDMVCKVLRIh-eBD8Zq9TcBbq2ovrCb4y2Mccd3xwPQD1udWLhUp0cxeFiw",
"ArchiveDescription":"1470261757//home/myproject/forbackup/allimages.tar",
"CreationDate":"2016-08-03T22:02:37Z",
"Size":44120068,
"SHA256TreeHash":"70cde3046ff600c49e3de101df06bdba70a2acb31753cb33097c408b9baa9023"
},
{
"ArchiveId":"AveGlBWdJIDk8-THelSpu8FFo34KUmg8pVOQFvMxEQzM8MXMC6A4V7XcX3E3_qf7II3nYNuUpsgAhbSNYzbUUDKEmKv6VRwJvQZdP9m33ZpCGhsrMXnAgn05ng2xDvHHGFSRUjFf-g",
"ArchiveDescription":"1470261757//home/myproject/forbackup/sqldump.tar",
"CreationDate":"2016-08-03T22:02:58Z",
"Size":1003520,
"SHA256TreeHash":"49b20365823966a8209e16625fe5c0cfee1a4299be01c9cfe3efbe7431908333"
}
]
}
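To pick out an archive ID without reading the raw JSON, you can flatten the inventory into one line per archive. The sketch below is a quick text hack rather than a real JSON parser. It assumes ArchiveId appears before ArchiveDescription in each object, as in the output above, and that descriptions contain no quotes or braces. The function name list_inventory is hypothetical. A dedicated JSON tool would be more robust.

```shell
#!/bin/sh
# list_inventory FILE
# Print "<ArchiveId> <ArchiveDescription>" for each archive in an
# inventory file, whether it is pretty-printed or on a single line.
list_inventory() {
    # Join all lines, then break after each closing brace so that
    # every archive object ends up on a line of its own.
    tr -d '\n' < "$1" | tr '}' '\n' |
    sed -n 's/.*"ArchiveId" *: *"\([^"]*\)".*"ArchiveDescription" *: *"\([^"]*\)".*/\1 \2/p'
}

# Usage (not run here):
#   list_inventory glacier-jobs-out | grep sqldump.tar
```

Because the upload script puts the original path in the description, grepping this output for a filename gives you the ID to request.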
Retrieve an Archive
Using the correct ArchiveId value from the inventory-retrieval data, you need to build a JSON archive-retrieval request:
{
"Type": "archive-retrieval",
"ArchiveId": "AveGlBWdJIDk8-THelSpu8FFo34KUmg8pVOQFvMxEQzM8MXMC6A4V7XcX3E3_qf7II3nYNuUpsgAhbSNYzbUUDKEmKv6VRwJvQZdP9m33ZpCGhsrMXnAgn05ng2xDvHHGFSRUjFf-g",
"Description": "Retrieve SQL dump for audit team",
"SNSTopic":"arn:aws:sns:us-west-2:112233445566:glacier-sandbox"
}
Save it to a file (here, archive-retrieval.json), then reference that file in your job request:
aws glacier initiate-job \
--account-id - \
--vault-name sandbox-02 \
--job-parameters file://archive-retrieval.json
You’ll receive a location and job ID:
{
"location": "/112233445566/vaults/sandbox-02/jobs/xGvIJyQPC9weheMNwIf4s2z8Zct1lYGvjzdxz84VwhD-OaGtCRPwLCAGdr5c_m3qadoOkMGo-FYaLJ5psLKhhcFDjC1n",
"jobId": "xGvIJyQPC9weheMNwIf4s2z8Zct1lYGvjzdxz84VwhD-OaGtCRPwLCAGdr5c_m3qadoOkMGo-FYaLJ5psLKhhcFDjC1n"
}
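If you retrieve archives regularly, you can generate the request file from a script instead of editing it by hand. This is a minimal sketch. The function name write_retrieval_request is hypothetical, and it assumes the description contains no double quotes or backslashes, since nothing here escapes them. It omits the optional SNSTopic key.

```shell
#!/bin/sh
# write_retrieval_request ARCHIVE_ID DESCRIPTION OUTFILE
# Write an archive-retrieval job request to OUTFILE.
# Assumes DESCRIPTION needs no JSON escaping.
write_retrieval_request() {
    archive_id=$1
    description=$2
    outfile=$3
    cat > "$outfile" <<EOF
{
  "Type": "archive-retrieval",
  "ArchiveId": "${archive_id}",
  "Description": "${description}"
}
EOF
}

# Usage (not run here): write the request, submit it, and keep
# only the job ID from the response.
#   write_retrieval_request "$ARCHIVE_ID" "Retrieve SQL dump" archive-retrieval.json
#   JOB_ID=$(aws glacier initiate-job \
#       --account-id - \
#       --vault-name sandbox-02 \
#       --job-parameters file://archive-retrieval.json \
#       --query jobId --output text)
```

Capturing the job ID in a variable saves you from copying it out of the JSON response when you later call get-job-output.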
Asking AWS to list Glacier jobs is instructive:
[~]$ aws glacier list-jobs --account-id - --vault-name sandbox-02
{
"JobList": [
{
"VaultARN": "arn:aws:glacier:us-west-2:112233445566:vaults/sandbox-02",
"RetrievalByteRange": "0-44120067",
"SNSTopic": "arn:aws:sns:us-west-2:112233445566:glacier-sandbox",
"Completed": false,
"SHA256TreeHash": "49b20365823966a8209e16625fe5c0cfee1a4299be01c9cfe3efbe7431908333",
"JobId": "xGvIJyQPC9weheMNwIf4s2z8Zct1lYGvjzdxz84VwhD-OaGtCRPwLCAGdr5c_m3qadoOkMGo-FYaLJ5psLKhhcFDjC1n",
"ArchiveId": "AveGlBWdJIDk8-THelSpu8FFo34KUmg8pVOQFvMxEQzM8MXMC6A4V7XcX3E3_qf7II3nYNuUpsgAhbSNYzbUUDKEmKv6VRwJvQZdP9m33ZpCGhsrMXnAgn05ng2xDvHHGFSRUjFf-g",
"JobDescription": "Retrieve SQL dump for audit team",
"ArchiveSizeInBytes": 1003520,
"Action": "ArchiveRetrieval",
"ArchiveSHA256TreeHash": "49b20365823966a8209e16625fe5c0cfee1a4299be01c9cfe3efbe7431908333",
"CreationDate": "2016-09-22T17:16:29.191Z",
"StatusCode": "InProgress"
}
]
}
Retrieve Your Bits
Once you’re notified the job is complete, you can retrieve the file:
aws glacier get-job-output \
--account-id - \
--vault-name sandbox-02 \
--job-id "xGvIJyQPC9weheMNwIf4s2z8Zct1lYGvjzdxz84VwhD-OaGtCRPwLCAGdr5c_m3qadoOkMGo-FYaLJ5psLKhhcFDjC1n" \
sqldump.tar
I named the output file sqldump.tar, which is the same as the original filename, but you can specify any filename you want.
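It is worth confirming that the file you got back matches what you uploaded. Glacier reports a SHA-256 tree hash, which is built from the hashes of 1 MiB chunks. For a file of 1 MiB or less there is only one chunk, so the tree hash is the same as a plain SHA-256 digest. The sketch below covers only that small-file case and assumes the GNU sha256sum tool is installed. The function name verify_small_archive is hypothetical.

```shell
#!/bin/sh
# verify_small_archive FILE EXPECTED_HASH
# Compare a file's SHA-256 digest with the checksum Glacier reported.
# Only valid for files of at most 1 MiB (1048576 bytes); larger files
# need a real tree-hash computation, which this sketch does not attempt.
# Returns 0 on match, 1 on mismatch, 2 if the file is too large.
verify_small_archive() {
    file=$1
    expected=$2
    size=$(($(wc -c < "$file")))   # arithmetic expansion trims any padding
    if [ "$size" -gt 1048576 ]; then
        echo "file exceeds 1 MiB; plain SHA-256 will not equal the tree hash" >&2
        return 2
    fi
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}

# Usage (not run here), with the checksum recorded at upload time:
#   verify_small_archive sqldump.tar \
#       49b20365823966a8209e16625fe5c0cfee1a4299be01c9cfe3efbe7431908333
```

The sqldump.tar example is 1003520 bytes, so it falls under the limit. The larger allimages.tar would not.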