Sunday, May 17, 2015

Installing and configuring s3fs, FUSE over Amazon S3 bucket

For certain applications, you may want to mount an S3 bucket directly over FUSE (Filesystem in Userspace). There are some limitations with this model; one of them is a maximum file size of 64 GB (imposed by s3fs, not by Amazon S3). The s3fs documentation also notes the following:

https://code.google.com/p/s3fs/wiki/FuseOverAmazon

"Due  to  S3's "eventual consistency" limitations, file creation can and will occasionally fail. Even  after  a  successful  create,  subsequent reads  can  fail for an indeterminate time, even after one or more successful reads. Create and read enough files  and  you  will  eventually encounter  this failure. This is not a flaw in s3fs and it is not something a FUSE wrapper like s3fs can work around. The retries option does not  address  this issue. Your application must either tolerate or compensate for these failures, for example by retrying creates or reads."

To install s3fs, you can follow the steps below:

1. Download and extract the source

*************
$wget https://s3fs.googlecode.com/files/s3fs-1.74.tar.gz
$tar xzf s3fs-1.74.tar.gz
*************

2. Install the necessary dependent libraries

*************
$sudo yum install gcc-c++ fuse-devel libxml2-devel libcurl-devel openssl-devel
*************
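(The package names above are for yum-based distributions such as Amazon Linux or CentOS. On a Debian or Ubuntu system the roughly equivalent apt-get packages would be the ones below; these names are my assumption for those distributions, not taken from the s3fs documentation.)

*************
$sudo apt-get install build-essential libfuse-dev libxml2-dev libcurl4-openssl-dev libssl-dev
*************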

3. Compile and install s3fs

*************
$cd s3fs-1.74
$./configure --prefix=/usr
$make
$sudo make install
*************

4. Confirm that s3fs got installed correctly (the /etc/mtab entry will only appear once the bucket has been mounted in step 5)

*************
$ grep s3 /etc/mtab
s3fs /vol fuse.s3fs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
$ s3fs --version
Amazon Simple Storage Service File System 1.74
Copyright (C) 2010 Randy Rizun <rrizun@gmail.com>
License GPL2: GNU GPL version 2 <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
*************

5. Mount the S3 bucket over FUSE at a mount point

*************
$mkdir <mount_point>
$sudo /usr/bin/s3fs -d -o allow_other <bucket_name> <mount_point>
*************
NOTE -

a. You can also enable allow_other globally by uncommenting the user_allow_other line in /etc/fuse.conf (it is commented out by default):

*************
$ cat /etc/fuse.conf
# mount_max = 1000
user_allow_other
*************

b. The -d command line parameter makes s3fs write debug output to /var/log/messages, as below:

*************
May 17 23:03:04 ip-198-x-x-x dhclient[1874]: bound to 198.x.x.x -- renewal in 1381 seconds.
May 17 23:24:06 ip-198-x-x-x kernel: [187835.012553] fuse init (API version 7.22)
May 17 23:24:06 ip-198-x-x-x s3fs: init $Rev: 497 $
May 17 23:26:05 ip-198-x-x-x dhclient[1874]: DHCPREQUEST on eth0 to 198.x.x.x port 67 (xid=0x138279e0)
May 17 23:26:05 ip-198-x-x-x dhclient[1874]: DHCPACK from 198.x.x.x (xid=0x138279e0)
May 17 23:26:07 ip-198-x-x-x dhclient[1874]: bound to 198.x.x.x -- renewal in 1705 seconds.
May 17 23:34:13 ip-198-x-x-x s3fs: init $Rev: 497 $
May 17 23:47:32 ip-198-x-x-x s3fs: init $Rev: 497 $
May 17 23:48:21 ip-198-x-x-x s3fs: Body Text:
May 17 23:48:21 ip-198-x-x-x s3fs: Body Text:
May 17 23:48:21 ip-198-x-x-x s3fs: Body Text:
*************

c. If the S3 bucket is not mounted correctly, you will see an error like the one below:

*************
$sudo cp test.txt <mount_point>
cp: failed to access ‘<mount_point>’: Transport endpoint is not connected
*************

6. To unmount the S3 bucket, you can use the command below

*************
$sudo fusermount -u <mount_point>
*************

7. Now that s3fs has been set up on the file system, you will have to configure the IAM user and bucket policy correctly

*************
IAM user policy, with the managed policy set to full S3 access:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": "*"
    }
  ]
}

S3 bucket policy that allows all actions for the specific IAM user:

{
  "Id": "Policy1431904706700",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1431904701345",
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::<bucket_name>/*",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<aws_account_id>:user/<s3_user>"
        ]
      }
    }
  ]
}
*************
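Note that s3fs also needs the IAM user's access key and secret key at mount time. For s3fs 1.74 a common way to supply them is a password file in the accessKeyId:secretAccessKey format; the snippet below is a sketch, and the key values are placeholders.

*************
$ echo "<access_key_id>:<secret_access_key>" | sudo tee /etc/passwd-s3fs
$ sudo chmod 640 /etc/passwd-s3fs
*************

s3fs refuses to read a password file whose permissions are too open, hence the chmod.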

8. Now we can test by creating test files on the mounted volume

*************
$cd <mount_point>
$echo "this is the first test" >>test1.txt
$echo "this is the second test" >>test2.txt
$echo "this is the third test" >>test3.txt
*************
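If you happen to have the AWS CLI installed, one quick way to confirm that the objects made it to the bucket is to list it directly (the bucket name is a placeholder):

*************
$ aws s3 ls s3://<bucket_name>/
*************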

Now you will find the above files in your S3 bucket. If you would like to capture the packets sent to S3, you can use:

*************
$sudo tcpdump -i eth0 -s 1500 -A -n 'not port 22 and net 54' >> info.txt
*************
NOTE - If tcpdump is given a hostname filter of s3.amazonaws.com, no traffic is seen. Here is the response from AWS support: "s3fs is sending traffic to S3 endpoints whose reverse DNS addresses end in 'amazonaws.com', so in theory tcpdump should allow you to filter on the hostname 'amazonaws.com'. But every time you try to use that filter, it doesn't show any traffic going to S3. In order to dump all traffic from eth0, we can filter out all traffic on port 22 (as we don't want to watch traffic from our own SSH session) and filter by IP address. Since the S3 endpoint IP addresses will differ depending on your location, it may not make sense to filter by the entire IP address, but the first octet will most likely always be '54', so a command like the one above should give you the traffic."

In the S3 bucket, the logs folder will contain the access log entries for the files that were put into the bucket:

*************
82548f8fcda98eb96f29149b0cf3b8f4083f18b432adee0f38a9c4c52bc9b7cf <bucket_name>
[17/May/2015:23:48:22 +0000] 54.84.186.187 arn:aws:iam::<aws_account_id>:user/<IAM user>
DCEF0FA83504B586 REST.PUT.OBJECT test2.txt "PUT /test2.txt HTTP/1.1" 200 - - 24 27
5 "-" "-" -
*************
