Channel: GetInData

Tutorial: Creating HDFS Snapshots And Recovering a Deleted File


In this tutorial, we focus on HDFS snapshots. Common use cases of HDFS snapshots include backups and protection against user errors.

To demonstrate the functionality of HDFS snapshots, we create an “important” directory in HDFS, take a snapshot of it and “accidentally” remove a file from the directory. Finally, we recover the file from the snapshot.

1. Create a snapshot of an HDFS directory

Create the “important” directory and file:

$ hdfs dfs -mkdir important-dir
$ echo "important data" | hdfs dfs -put - important-dir/important-file.txt
$ hdfs dfs -cat important-dir/important-file.txt
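If you want to repeat this tutorial several times, a re-runnable version of the setup can help. The block below is a minimal sketch wrapped in a hypothetical `setup_important_dir` function (not part of Hadoop). It relies on `-mkdir -p`, which does not fail when the directory already exists, and `-put -f`, which overwrites an existing file instead of failing.

```shell
#!/usr/bin/env bash
# Re-runnable version of the setup steps (sketch).
# setup_important_dir is a hypothetical helper name, not a Hadoop command.
setup_important_dir() {
  local dir="${1:-important-dir}"
  # -p: do not fail if the directory already exists
  hdfs dfs -mkdir -p "$dir" || return 1
  # -f: overwrite the file if it is already there; "-" reads from stdin
  echo "important data" | hdfs dfs -put -f - "$dir/important-file.txt" || return 1
  # Print the file back so you can eyeball its content
  hdfs dfs -cat "$dir/important-file.txt"
}
```

Usage: `setup_important_dir important-dir`.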

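Make your HDFS directory snapshottable. This is an administrative operation, so run it as the HDFS superuser. Replace /user/adam with the absolute path of your own home directory (relative paths such as important-dir resolve to /user/<your-username>):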

$ sudo -u hdfs hdfs dfsadmin -allowSnapshot /user/adam/important-dir

Navigate to the NameNode Web UI, find the “Snapshot” link in the top menu of the webpage and check that the number of snapshottable directories has increased.
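If you prefer the command line to the web UI, `hdfs lsSnapshottableDir` lists the snapshottable directories that the current user has access to. The hypothetical helper below checks whether one absolute path is in that list. It assumes the full path is the last whitespace-separated column of each output line, so paths containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Check from the CLI whether a directory is snapshottable (sketch).
# is_snapshottable is a hypothetical helper; it assumes the path is the
# last field of each line printed by `hdfs lsSnapshottableDir`.
is_snapshottable() {
  local target="$1"
  hdfs lsSnapshottableDir |
    awk -v t="$target" '$NF == t { found = 1 } END { exit !found }'
}
```

Usage: `is_snapshottable /user/adam/important-dir && echo "snapshottable"`.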

Now, let’s create a snapshot of our important directory!

$ hdfs dfs -createSnapshot important-dir first-snapshot
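Each snapshot is exposed as a read-only directory under `<dir>/.snapshot/<name>` (the path used later in this tutorial), so one way to confirm that the snapshot was created is `hdfs dfs -test -d` on that path. A minimal sketch, with a hypothetical `snapshot_exists` helper:

```shell
#!/usr/bin/env bash
# Confirm that a named snapshot exists (sketch).
# snapshot_exists is a hypothetical helper; it succeeds when
# <dir>/.snapshot/<name> exists and is a directory.
snapshot_exists() {
  local dir="$1" name="$2"
  hdfs dfs -test -d "$dir/.snapshot/$name"
}
```

Usage: `snapshot_exists important-dir first-snapshot && echo "snapshot found"`.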

The snapshot name (“first-snapshot” in our case) is an optional argument. When it is omitted, HDFS generates a default name from a timestamp in the format “syyyyMMdd-HHmmss.SSS”, e.g. “s20140730-052810.965”.
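To see what such a name looks like, the sketch below builds one locally in the same pattern. It assumes GNU `date` (the `%3N` specifier for milliseconds is a GNU extension). The real default name is generated by the NameNode from its own clock, so a locally generated name will not match it exactly; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Build a name in the "syyyyMMdd-HHmmss.SSS" pattern (sketch, GNU date only).
# default_snapshot_name is a hypothetical helper; "s" and "." are literals.
default_snapshot_name() {
  date +s%Y%m%d-%H%M%S.%3N
}
```

You could pass such a name explicitly, e.g. `hdfs dfs -createSnapshot important-dir "$(default_snapshot_name)"`, if you want the name to be known before the command runs.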

2. “Accidentally” remove the important file

Try to remove the snapshottable directory by running the following command as the hdfs user (the HDFS superuser):

$ sudo -u hdfs hadoop fs -rm -r -skipTrash /user/adam/important-dir

As expected, the directory can’t be deleted because it is snapshottable and already contains a snapshot. So far so good! :)

Let’s check whether the owner of the dataset can remove this directory. Run the following command as your own user:

$ hdfs dfs -rm -r -skipTrash important-dir

The same message is displayed! This confirms that neither the HDFS superuser nor the owner of the dataset can remove a snapshottable directory that has snapshots (note that the HDFS superuser is actually the user who started the NameNode process; typically, it’s the hdfs user).
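To check whether a directory is protected in this way without attempting a deletion, you can count its snapshots instead. The hypothetical `has_snapshots` helper below is a sketch that assumes each snapshot appears as a line starting with “d” in the `hdfs dfs -ls <dir>/.snapshot` listing.

```shell
#!/usr/bin/env bash
# Check, without deleting anything, whether a directory has at least one
# snapshot (sketch). has_snapshots is a hypothetical helper; it assumes
# snapshot entries are listed as directory lines starting with "d".
has_snapshots() {
  local dir="$1" count
  # grep -c prints 0 when nothing matches (e.g. the listing failed)
  count=$(hdfs dfs -ls "$dir/.snapshot" 2>/dev/null | grep -c '^d')
  [ "$count" -gt 0 ]
}
```

Usage: `has_snapshots important-dir && echo "protected by at least one snapshot"`.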

Now, let’s “accidentally” remove a file inside the snapshottable directory:

$ hdfs dfs -rm important-dir/important-file.txt

Oops… Surprisingly or not, the file was removed! What a bad day! What a horrible accident! :(

Don’t worry too much, however. We can recover this file because… we have a snapshot! :)

3. Recover the file from the snapshot

First, find the file that you want to recover in the .snapshot subdirectory:

$ hdfs dfs -ls -R important-dir/.snapshot
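When a directory holds many files, spotting what was deleted by eye gets tedious. The hypothetical helper below is a sketch that compares the top-level names in a snapshot with those in the live directory and prints the ones missing from the live directory. It assumes the path is the last field of each `hdfs dfs -ls` entry line (so names containing spaces are not handled) and it does not recurse into subdirectories.

```shell
#!/usr/bin/env bash
# List files present in a snapshot but missing from the live directory
# (sketch, non-recursive). list_names and missing_since_snapshot are
# hypothetical helpers, not Hadoop commands.
list_names() {
  # Keep only entry lines (files start with "-", directories with "d"),
  # print the basename of the last field, and sort for comm.
  hdfs dfs -ls "$1" 2>/dev/null |
    awk '/^[-d]/ { n = split($NF, p, "/"); print p[n] }' | sort
}
missing_since_snapshot() {
  local dir="$1" snap="$2"
  # comm -23: lines only in the first (snapshot) listing
  comm -23 <(list_names "$dir/.snapshot/$snap") <(list_names "$dir")
}
```

Usage: `missing_since_snapshot important-dir first-snapshot`.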

You can read the content of the file:

$ hdfs dfs -cat important-dir/.snapshot/first-snapshot/important-file.txt

Recovering from the snapshot is as simple as copying the file:

$ hdfs dfs -cp important-dir/.snapshot/first-snapshot/important-file.txt important-dir
$ hdfs dfs -cat important-dir/important-file.txt
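The copy step can be wrapped so that it never clobbers a file that already exists in the live directory. The function below is a minimal sketch with a hypothetical name; it checks the target with `hdfs dfs -test -e` before copying the read-only snapshot version back.

```shell
#!/usr/bin/env bash
# Restore one file from a named snapshot without overwriting (sketch).
# restore_from_snapshot is a hypothetical helper, not a Hadoop command.
restore_from_snapshot() {
  local dir="$1" snap="$2" file="$3"
  # Refuse to overwrite a file that already exists in the live directory.
  if hdfs dfs -test -e "$dir/$file"; then
    echo "refusing to overwrite $dir/$file" >&2
    return 1
  fi
  # Copy the snapshot version back into the live directory.
  hdfs dfs -cp "$dir/.snapshot/$snap/$file" "$dir/$file"
}
```

Usage: `restore_from_snapshot important-dir first-snapshot important-file.txt`. Note that a plain `-cp` creates a new file owned by the user running the copy; the `-p` option of `hdfs dfs -cp` can preserve attributes such as timestamps, ownership and permissions if you need them.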

Success! Aren’t snapshots great, simple and powerful? :)

