Friday, September 26, 2014

Jenkins failed to start up as a service

As part of the AWS instance reboots, the Jenkins service did not come up on one of the machines. /var/log/jenkins/jenkins.log had the permissions-related error below:-

**************
Sep 27, 2014 2:08:49 AM winstone.Logger logInternal
SEVERE: Container startup failed
java.io.FileNotFoundException: /var/cache/jenkins/war/META-INF/MANIFEST.MF (Permission denied)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:171)
        at winstone.HostConfiguration.getWebRoot(HostConfiguration.java:277)
        at winstone.HostConfiguration.<init>(HostConfiguration.java:81)
        at winstone.HostGroup.initHost(HostGroup.java:66)
        at winstone.HostGroup.<init>(HostGroup.java:45)
        at winstone.Launcher.<init>(Launcher.java:143)
        at winstone.Launcher.main(Launcher.java:354)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at Main._main(Main.java:293)
        at Main.main(Main.java:98)
**************

On checking the /var/cache folder, the user and group owner were set to a different user than the one Jenkins runs as per the /etc/init.d/jenkins script. So the ownership of the /var/cache/jenkins folder had to be changed:-

$sudo chown -R <jenkins_user>:<jenkins_group> /var/cache/jenkins

and then restart the Jenkins service

$sudo service jenkins restart

Not sure why an Amazon scheduled maintenance reboot would take 6 hrs?

One of the instances that is part of the reboot just reported an "In progress" status, but I couldn't for the life of me understand why it would take 6 hrs, as reported in the duration column of their console window.


AWS instance reboots

AWS has scheduled instance reboots over this weekend on around 10% of its EC2 instance fleet to upgrade the underlying hardware. Needless to say, it impacts many of the "always-on" production machines. Amazon's Jeff Barr provided an update today on their blog:-

http://aws.amazon.com/blogs/aws/

There have also been several AWS users complaining on the forums about why a simple start/stop may not solve the problem:-

https://forums.aws.amazon.com/thread.jspa?threadID=161544&tstart=0

In the end, these machines will have to be rebooted, therefore always have a backup and recovery plan ready for all your "always-on" machines.



Perfect storm! Shellshock bash vulnerability and AWS instance reboots

As the saying goes - "when it rains it pours". We have had to deal with AWS instance reboots as well as patching the "shellshock" bash vulnerability (CVE-2014-6271) at the same time across many of our instances. The quick way to determine if your instances are vulnerable is to run the below command:-

$env var='() { ignore this;}; echo vulnerable' bash -c /bin/true

If the above prints "vulnerable", then you are exposed to the bash vulnerability. You can also check the currently installed version of bash by running the command below:-

$sudo rpm -q bash
bash-4.1.2-15.el6_4.x86_64

Once you have determined it is an old version, you can run an update through your package manager:-

$sudo yum update -y bash

Once the update finishes, you can check for the version again

$sudo rpm -q bash
bash.x86_64 0:4.1.2-15.el6_5.2

Now test for the vulnerability again by running the small test command from the top. This time it will not print "vulnerable".
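The manual check above can also be scripted when you have many instances to sweep. A minimal Python sketch (the environment payload mirrors the test command above; a vulnerable bash executes the trailing `echo` while importing the crafted function):

```python
import subprocess

# Hedged sketch: probe the installed bash for CVE-2014-6271 by exporting
# a crafted function definition through the environment.
def shellshock_vulnerable():
    result = subprocess.run(
        ["bash", "-c", ":"],
        env={"testvar": "() { :;}; echo vulnerable",
             "PATH": "/usr/bin:/bin"},
        capture_output=True, text=True,
    )
    # A vulnerable bash prints "vulnerable" while importing the function
    return "vulnerable" in result.stdout

print("vulnerable" if shellshock_vulnerable() else "patched")
```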

Tuesday, September 23, 2014

Grants for locking and unlocking tables in MySQL RDS instances

Sometimes you will need the LOCK TABLES privilege when trying to dump data from an RDS instance. To see if the current user has lock privileges, you can run the below command:-

mysql> show grants for current_user\G
*************************** 1. row ***************************
Grants for <rds_user>@%: GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP,
RELOAD, PROCESS, REFERENCES, INDEX, ALTER, SHOW DATABASES, CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER ON *.* TO '<rds_user>'@'%' IDENTIFIED BY PASSWORD 'XYZ' WITH GRANT OPTION

If you would like to grant all privileges for the RDS mysql root user, you can run the below query

mysql> GRANT ALL PRIVILEGES ON `%`.* TO <rds_user>@'%' IDENTIFIED BY '<password>' WITH GRANT OPTION;

Now you can run the "show grants" command to see an additional row displayed

mysql> show grants for current_user\G
*************************** 1. row ***************************
Grants for <rds_user>@%: GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP,
RELOAD, PROCESS, REFERENCES, INDEX, ALTER, SHOW DATABASES, CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER ON *.* TO '<rds_user>'@'%' IDENTIFIED BY PASSWORD 'XYZ' WITH GRANT OPTION
*************************** 2. row ***************************
Grants for <rds_user>@%: GRANT ALL PRIVILEGES ON `%`.* TO '<rds_user>'@'%'
 WITH GRANT OPTION
2 rows in set (0.00 sec)

Now, to test that the locks work, you can try the below queries

mysql> lock tables <db>.<table_nameA> READ;
Query OK, 0 rows affected (0.00 sec)

mysql> select count(*) from <db>.<table_nameA>;
+----------+
| count(*) |
+----------+
|   991225 |
+----------+
1 row in set (0.41 sec)

mysql> select count(*) from <db>.<table_nameB>;
ERROR 1100 (HY000): Table '<table_nameB>' was not locked with LOCK TABLES

To unlock the tables, you can run "unlock tables" command

mysql> unlock tables;
Query OK, 0 rows affected (0.00 sec)

In RDS instances the FILE privilege for MySQL is not applicable

Typically, on MySQL instances that run on standalone EC2 boxes or local installs, you can use the FILE privilege to write a query's output into a local flat file, such as

$mysql -u$MyUSER -p$MyPASS -h$MyHOST --port=$MyPORT --socket=$MySOCKET -e "select name, username, email, registerDate, lastvisitDate from TestDB.Employee where username not like 'TestUser' INTO OUTFILE '/tmp/dbout.csv' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n';"

On an RDS instance, the above query won't work because we don't have access to the local file system of RDS, so "/tmp/dbout.csv" cannot be created. You will see an error like

"ERROR 1045 (28000): Access denied for user '<rds_user'@'%' (using password:YES)"

Instead, you have to run the query with the "--execute" switch on a remote EC2 instance and dump the query results to a flat file, quoting and separating the fields with the "sed" utility as documented on the AWS developer forum - threadID=41443

$mysql -u$MyUSER -p$MyPASS -h$MyHOST --port=$MyPORT --socket=$MySOCKET -e "select name, username, email, registerDate, lastvisitDate from TestDB.Employee where username not like 'TestUser';" | sed 's/\t/","/g;s/^/"/;s/$/"/;s/\n//g'  >/tmp/dbout.csv
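If a sed expression is hard to maintain, the same tab-to-quoted-CSV conversion can be done in a few lines of Python. A small sketch (the sample row values are made up):

```python
# Sketch: convert one tab-separated output line from mysql into a
# quoted CSV line, equivalent to the sed expression above.
def tsv_line_to_csv(line):
    fields = line.rstrip("\n").split("\t")
    return ",".join('"%s"' % f for f in fields)

print(tsv_line_to_csv("alice\talice01\talice@example.com\t2014-01-01\t2014-09-01"))
```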

Sunday, September 21, 2014

Finding a process that is consuming CPU excessively


  • Look at top output
  • Next, check ps output to see the location on the file system where the process is running from:

$ps -eo pcpu,pid,user,args | sort -k 1 -nr | head -10
%CPU   PID USER     COMMAND
59.5 11286 ec2-user      /bin/sh /tmp/ismp002/1978898.tmp
59.1 22608 ec2-user      /bin/sh /tmp/ismp002/2326448.tmp
 5.8 22861 ec2-user     ./engine -Djmx_port=5555
 4.7 22865 ec2-user      ./engine --innerProcess


  • Run strace to see which calls are consuming the CPU cycles:


$strace -c -p 11286
Process 11286 attached - interrupt to quit
Process 11286 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.77    0.667227         273      2447           clone
  6.38    0.046881          10      4892      2446 wait4
  1.33    0.009747           0     22014     19568 stat
  1.02    0.007489           0     31802           rt_sigprocmask
  0.12    0.000864           0      4893           rt_sigaction
  0.11    0.000826           0      2446      2446 ioctl
  0.11    0.000819           0      2446           read
  0.09    0.000659           0      2446      2446 lseek
  0.08    0.000579           0      2446           rt_sigreturn
------ ----------- ----------- --------- --------- ----------------
100.00    0.735091                 75832     26906 total


  • In the above case, the /tmp/ismp002/1978898.tmp file was no longer on the file system, so it looks like a stale process that was left running on the system
  • Do a kill -9 on the unwanted processes
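The ps step above can also be approximated straight from /proc when scripting this. A rough, Linux-only sketch (it ranks by cumulative CPU ticks rather than the instantaneous %CPU that ps reports):

```python
import os

# Rough sketch: rank processes by cumulative CPU ticks (utime + stime
# from /proc/<pid>/stat), similar in spirit to the ps one-liner above.
def top_cpu(n=10):
    rows = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open("/proc/%s/stat" % pid) as f:
                # Fields after the ")" closing the command name:
                # state ppid pgrp session tty tpgid flags minflt cminflt
                # majflt cmajflt utime stime ...
                fields = f.read().rsplit(")", 1)[1].split()
            ticks = int(fields[11]) + int(fields[12])
            with open("/proc/%s/cmdline" % pid, "rb") as f:
                cmd = f.read().replace(b"\0", b" ").decode().strip()
            rows.append((ticks, int(pid), cmd or "?"))
        except (OSError, IOError, IndexError):
            continue  # process exited between listdir and read
    return sorted(rows, reverse=True)[:n]

for ticks, pid, cmd in top_cpu():
    print(ticks, pid, cmd)
```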



Saturday, September 20, 2014

Updating Route53 A records for a hosted zone using restricted IAM policy

If you have an application that dynamically tears down ELBs and creates new ELBs with instances in its pool, you will be in a situation where Route 53 record sets have to be updated frequently as well. Manually updating the A records of newly created ELBs can be tedious. Instead, you can create a restricted IAM user or group policy that allows the "ChangeResourceRecordSets" privilege, such as below:-

**************
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "route53:GetHostedZone"
                "route53:ChangeResourceRecordSets"
            ],
            "Resource": "arn:aws:route53:::hostedzone/<ZONE_ID>"
        },
        {
            "Effect": "Allow",
            "Action": [
                "route53:GetHostedZone",
                "route53:ListResourceRecordSets"
            ],
            "Resource": "arn:aws:route53:::hostedzone/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "route53:GetChange"
            ],
            "Resource": "arn:aws:route53:::change/*"
        }
    ]
}
**************

Once the above policy is set, you can use the AWS CLI as below to CREATE, DELETE, or UPSERT A records on Route 53:-

***************
$aws route53 change-resource-record-sets --hosted-zone-id <ZONE_ID> --change-batch file://opt/sampleupsert.json --profile <domain>
{
    "ChangeInfo": {
        "Status": "PENDING",
        "Comment": "string",
        "SubmittedAt": "2014-09-20T12:40:49.159Z",
        "Id": "/change/CNZHKUS1ZF9Z9"
    }
}
***************

Once the change is submitted, you can query for the status till it shows "INSYNC" as below:-

***************
$aws route53 get-change --id /change/CNZHKUS1ZF9Z9 --profile <domain>
{
    "ChangeInfo": {
        "Status": "INSYNC",
        "Comment": "string",
        "SubmittedAt": "2014-09-20T12:40:49.159Z",
        "Id": "/change/CNZHKUS1ZF9Z9"
    }
}
***************

and the sampleupsert.json file, which is passed as the argument to the --change-batch parameter of the "change-resource-record-sets" command, looks like

***************
{
  "Comment": "string",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "<domain>",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "<ZONE_ID>",
          "DNSName": "<DNS_NAME of ELB>",
          "EvaluateTargetHealth": false
        }
      }
    }
  ]
}
***************
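The same change batch can also be built programmatically before serializing it for --change-batch or an SDK call. A minimal sketch (the domain, zone id, and ELB DNS name below are placeholders):

```python
import json

# Sketch: build the UPSERT change batch shown in sampleupsert.json above.
def upsert_alias_batch(domain, elb_zone_id, elb_dns):
    return {
        "Comment": "UPSERT A-record alias to the current ELB",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": domain,
                "Type": "A",
                "AliasTarget": {
                    "HostedZoneId": elb_zone_id,
                    "DNSName": elb_dns,
                    "EvaluateTargetHealth": False,
                },
            },
        }],
    }

batch = upsert_alias_batch("app.example.com", "ZEXAMPLE123",
                           "my-elb-123.us-east-1.elb.amazonaws.com")
print(json.dumps(batch, indent=2))
```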

The above can also be done using the boto library and some simple Python code, as below:-

***************
import boto.route53
from boto.route53.record import ResourceRecordSets

# Connect to Route 53 and UPSERT an alias A record pointing at the ELB
conn = boto.route53.connect_to_region('us-east-1')
zone_id = "<ZONE_ID>"
changes = ResourceRecordSets(conn, zone_id)
change = changes.add_change("UPSERT", '<domain>', "A")
change.set_alias("<ZONE_ID_2>", "<ELB A record>")
changes.commit()
***************

If the ZONE_ID is not known, then you have to modify the above policy to allow listing of zones and iterate through them in code, as shown in this blog - managing-amazon-route-53-dns-with-boto

Thursday, September 18, 2014

Domain delegation to Amazon Route 53

If you have domains registered with an external domain name provider like Bluehost or Network Solutions, you can set up domain delegation to the Amazon Route 53 service to handle subdomains more easily.

In the Amazon Route 53 console, you can create a new hosted zone by clicking on "New Hosted Zone" and then specifying a name.


Once you create the hosted zone, you will see NS records and an SOA record created for it.

Next, you will have to set up domain delegation in your internal and external DNS servers by adding the NS records provided by Amazon Route 53. Once you have updated the DNS entries, you can run the below dig query to confirm that the NS records match what has been provided by Amazon

$ dig -t NS example.mycompany.com

; <<>> DiG 9.7.1 <<>> -t NS example.mycompany.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32360
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 4

;; QUESTION SECTION:
;example.mycompany.com.              IN      NS

;; ANSWER SECTION:
example.mycompany.com.       0       IN      NS      ns-1467.awsdns-55.org.
example.mycompany.com.       0       IN      NS      ns-540.awsdns-03.net.
example.mycompany.com.       0       IN      NS      ns-1714.awsdns-22.co.uk.
example.mycompany.com.       0       IN      NS      ns-292.awsdns-36.com.
....

Monday, September 8, 2014

unix grep to search for multiple filter expressions

Recently, I was searching logs for multiple string patterns. There are multiple ways to achieve that, such as chaining greps ("grep -E <string1> | grep -E <string2>"). However, the below seemed more efficient

$ grep "08/Aug.*<service name>" access_log | cut -d[ -f2 | cut -d] -f1 | awk -F: '{print $2":00"}' | sort
 -n | uniq -c
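When a pipeline like that gets unwieldy, the same per-hour counting translates to a few lines of Python. A sketch with made-up log lines and a hypothetical "myservice" pattern:

```python
import re

# Sketch: count matching lines per hour for a combined date + service
# pattern, mirroring the grep/cut/awk/sort/uniq pipeline above.
log_lines = [
    '10.0.0.1 - - [08/Aug/2014:14:01:02 +0000] "GET /myservice/a" 200',
    '10.0.0.2 - - [08/Aug/2014:14:07:45 +0000] "GET /static/x.png" 200',
    '10.0.0.3 - - [08/Aug/2014:15:12:09 +0000] "GET /myservice/b" 200',
]
pattern = re.compile(r"08/Aug.*myservice")

counts = {}
for line in log_lines:
    if pattern.search(line):
        hour = line.split("[", 1)[1].split(":", 2)[1]  # the hour field
        counts[hour + ":00"] = counts.get(hour + ":00", 0) + 1

print(sorted(counts.items()))
```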

Using wget to determine if ELB is misconfigured and is attached to a private subnet

Typically, you would want ELBs to be available in at least 2 zones in a particular region, so that if one zone goes down, the ELB in the second zone will handle all the requests. If your ELB is configured correctly for multiple zones, you can do an "nslookup" on the ELB A record and you will get multiple IPs returned (1 for each zone).
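The nslookup check is also easy to script. A hedged Python sketch (shown against localhost here, since the ELB name is a placeholder; a correctly configured multi-AZ ELB name should return more than one address):

```python
import socket

# Sketch: resolve a hostname's A records, like nslookup on an ELB name.
def a_records(host):
    infos = socket.getaddrinfo(host, 80, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

# e.g. a_records("<elb-name>.ap-northeast-1.elb.amazonaws.com")
print(a_records("localhost"))
```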

If the ELB is attached to a private subnet, you would see a request failure using wget:-

****************
$ wget http://<elb-name>.ap-northeast-1.elb.amazonaws.com/index.html
--2014-09-08 18:08:29--  http://<elb-name>.ap-northeast-1.elb.amazonaws.com/index.html
Resolving <elb-name>.ap-northeast-1.elb.amazonaws.com (<elb-name>.ap-northeast-1.elb.amazonaws.com)... 54.92.98.228, 54.238.149.12
Connecting to <elb-name>.ap-northeast-1.elb.amazonaws.com (<elb-name>.ap-northeast-1.elb.amazonaws.com)|54.92.98.228|:80... failed: Connection timed out.
Connecting to <elb-name>.ap-northeast-1.elb.amazonaws.com (<elb-name>.ap-northeast-1.elb.amazonaws.com)|54.238.149.12|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/xml]
Saving to: `index.html'
****************

The other way to confirm and check is through the AWS console: make sure the "subnet id" in each of the ELB's availability zones has an igw-* (internet gateway) route associated with it. ELBs need to be in public subnets so that they can be accessed from outside.