
Upgrade an EC2 Instance Kernel

I’ve been using the Amazon Elastic Compute Cloud (EC2) to run our web servers at NetCrafters for almost a year now. It’s been an amazing experience, and I’ve kept a detailed account of many of the lessons learned.

The most recent challenge came when our servers started locking up for no apparent reason. The sites would still be responding, so we knew the LAMP stack was still limping along, but the servers were completely unresponsive via SSH. The only way to regain control was to issue a reboot through the Amazon API command line tools. Even this would sometimes take two or three attempts before the server would cycle.

Once I was able to log in again and look through the logs, there was absolutely nothing to go on... until finally a server locked up completely. Apache was not responding and we could not open a secure shell. Once the server was rebooted and came back up, there was an error in /var/log/dmesg:

...
kernel BUG at arch/i386/mm/pgtable-xen.c:306!
...
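A minimal sketch of how to look for the same message after a reboot, assuming the boot log is at /var/log/dmesg as it was on our servers. The function name find_kernel_bugs is hypothetical, and the path may differ on your distribution.

```shell
#!/bin/sh
# Hypothetical helper: print any "kernel BUG" lines found in a kernel log file.
# The default path is the /var/log/dmesg file mentioned in this article; pass
# another path if your distribution keeps the boot log elsewhere.
find_kernel_bugs() {
    log_file="${1:-/var/log/dmesg}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi
    # grep exits 0 when at least one line matches and 1 when none do;
    # -n prefixes each match with its line number.
    grep -n 'kernel BUG at' "$log_file"
}
```

Calling find_kernel_bugs with no argument checks the default log; the exit status tells you whether anything was found.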

Unfortunately, it seems very few people ever get to see this message, because they are never able to recover the server once it happens. I was lucky to have this tiny shred of evidence to go on. After hours of searching I finally found that the kernel our instances were using (2.6.16) had been declared unstable for the larger CPU instance types. These instances had originally been started as m1.small, and I had indeed converted them to c1.medium.

The Solution: Upgrade from the 2.6.16 kernel to 2.6.18 (to be exact: vmlinuz-2.6.18-xenU-ec2-v1.0). Unfortunately, this too is a hard-to-find procedure, which brings us to the long-awaited point of this article:

How to Upgrade an EC2 Instance to 2.6.18 from 2.6.16

First, I recommend reading about Amazon EC2 User Selectable Kernels. I will outline the steps I took here, but if you want to know what you’re doing (versus just typing exactly what I tell you to!), please read at least the first paragraph or so then come back.

You’ll need the latest API and AMI tools installed on the instance. If you don’t know how to do that, search the AWS developer forums; there are plenty of tutorials already there. If you’re running Debian, there is a great post on the forums for installing the AMI tools as a Debian package (this worked flawlessly for me).

Start by launching an instance from your existing AMI with the new kernel (specified by the --kernel aki-9b00e5f2 option) and the matching RAM disk (specified by the --ramdisk ari-67b95e0e option). I’ll use an arbitrary AMI ID, ami-99999999, as an example; be sure to replace it with your own registered AMI.


ec2-run-instances ami-99999999 -k gsg-keypair --kernel aki-9b00e5f2 --ramdisk ari-67b95e0e -t c1.medium

This will start a new instance using the vmlinuz-2.6.18-xenU-ec2-v1.0.i386 kernel (aki-9b00e5f2) and the initrd-2.6.18-xenU-ec2-v1.0.i386 RAM disk (ari-67b95e0e). These are both for a 32-bit instance. If you’re running 64-bit instances, you’ll want to use aki-9800e5f1 for the kernel and ari-64b95e0d for the RAM disk. You can see the list of available AMIs, ARIs, and AKIs by issuing the following command:


ec2-describe-images -o amazon
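To keep the 32-bit and 64-bit IDs straight, the pairing can be sketched as a small lookup. This is only an illustration: the function names kernel_ids_for_arch and launch_command are hypothetical, the IDs are the ones quoted in this article, and the second function only prints the launch command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel (AKI) and RAM disk (ARI) IDs quoted
# in this article for a given architecture. Verify them against the output
# of `ec2-describe-images -o amazon` before relying on them.
kernel_ids_for_arch() {
    case "$1" in
        i386)   echo "aki-9b00e5f2 ari-67b95e0e" ;;
        x86_64) echo "aki-9800e5f1 ari-64b95e0d" ;;
        *)      echo "unknown architecture: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: build (but do not run) the launch command for an AMI,
# architecture, and instance type. The keypair name is the article's example.
launch_command() {
    ami="$1"; arch="$2"; instance_type="$3"
    ids=$(kernel_ids_for_arch "$arch") || return 1
    aki=${ids% *}   # text before the space
    ari=${ids#* }   # text after the space
    echo "ec2-run-instances $ami -k gsg-keypair --kernel $aki --ramdisk $ari -t $instance_type"
}
```

For example, launch_command ami-99999999 i386 c1.medium prints the same command shown earlier, which you can review before pasting it into your shell.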

The -t c1.medium option sets the instance type (number of processors, RAM, etc.). Other instance types are available.

Once the instance is up and running, log in via SSH. We’ll need to install the new kernel modules, but first verify that you’re actually running the new kernel:


ec2# uname -r
2.6.18-xenU-ec2-v1.0
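If you want this check in a script, a sketch along these lines could work. The function name is_expected_kernel is hypothetical; it compares an expected release string against the running kernel's release as reported by uname -r.

```shell
#!/bin/sh
# Hypothetical check: succeed only when the kernel release matches the
# expected string. The second argument defaults to the running kernel's
# release (`uname -r`); passing it explicitly is handy for testing.
is_expected_kernel() {
    expected="$1"
    actual="${2:-$(uname -r)}"
    if [ "$actual" = "$expected" ]; then
        echo "OK: running $actual"
    else
        echo "MISMATCH: running $actual, expected $expected" >&2
        return 1
    fi
}
```

On the new instance, is_expected_kernel 2.6.18-xenU-ec2-v1.0 should succeed; if it reports a mismatch, the instance was probably launched without the --kernel and --ramdisk options.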

Then, install the new kernel modules:


cd /usr/local/src
wget http://ec2-downloads.s3.amazonaws.com/ec2-modules-2.6.18-xenU-ec2-v1.0-i686.tgz
tar xzf ec2-modules-2.6.18-xenU-ec2-v1.0-i686.tgz -C /
modprobe -l
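A quick sanity check after unpacking, sketched under the assumption that the archive follows the usual layout and unpacks into /lib/modules/ under a directory named after the kernel release. The function name modules_installed is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: confirm that a modules directory exists for a kernel
# release, i.e. <root>/lib/modules/<release>. The release defaults to the
# running kernel and the root defaults to "/".
modules_installed() {
    release="${1:-$(uname -r)}"
    root="${2:-/}"
    # Strip one trailing slash so that "/" does not produce a double slash.
    dir="${root%/}/lib/modules/$release"
    if [ -d "$dir" ]; then
        echo "found $dir"
    else
        echo "missing $dir" >&2
        return 1
    fi
}
```

Running modules_installed with no arguments checks the modules for whatever kernel is currently booted.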

Next, we need to install udev. You may not need to do this, but we are running Debian servers and udev is required to make the new image bootable. Otherwise, none of the devices will mount and the new image will just hang at boot. This is really easy though:


apt-get dselect-upgrade
apt-get install udev
# prevent udev from keeping its old IP address
rm /etc/udev/rules.d/z25_persistent-net.rules

The next step is to rebundle and register your new image. The new image will use the 2.6.18 kernel and RAM disk by default. If you’re a bit rusty on this process, I recommend the EC2 Getting Started Guide. Once you’ve done this, launching the newly registered AMI will start your instance with the new 2.6.18 kernel and RAM disk.
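As a rough reminder of what rebundling involves, here is a dry-run sketch that only prints the three usual commands (bundle, upload, register) so you can review them first. The function name, the /mnt destination, and the environment variable names are placeholders I am assuming for illustration; check each tool's help output and the Getting Started Guide for the options your version supports.

```shell
#!/bin/sh
# Hypothetical dry run: print the rebundling commands without running them.
# Arguments: S3 bucket, image name prefix, architecture (i386 or x86_64).
# The \$VARIABLES are printed literally as placeholders for your own
# key file, certificate, account ID, and access keys.
print_rebundle_commands() {
    bucket="$1"; prefix="$2"; arch="$3"
    echo "ec2-bundle-vol -d /mnt -p $prefix -r $arch -k \$EC2_PRIVATE_KEY -c \$EC2_CERT -u \$AWS_ACCOUNT_ID"
    echo "ec2-upload-bundle -b $bucket -m /mnt/$prefix.manifest.xml -a \$AWS_ACCESS_KEY_ID -s \$AWS_SECRET_ACCESS_KEY"
    echo "ec2-register $bucket/$prefix.manifest.xml"
}
```

Printing instead of executing is deliberate here: bundling takes a while and uploads cost money, so it is worth reading the commands before running them.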

NOTE: Each time you rebundle, you’ll need to re-issue the following command first, or udev will keep its old IP address and the new image will not get a new IP via DHCP at first boot:

# prevent udev from keeping its old ip-address
rm /etc/udev/rules.d/z25_persistent-net.rules
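Since this step is easy to forget, it can be wrapped in a small function to run before every bundle. This is a sketch only: the name clear_persistent_net_rules is hypothetical, and the default path is the Debian file named above, which may be called something else on other distributions.

```shell
#!/bin/sh
# Hypothetical pre-bundle step: remove udev's persistent network rules file
# so the next image does not carry over stale network settings. Unlike a
# bare rm, it does not fail when the file is already gone.
clear_persistent_net_rules() {
    rules_file="${1:-/etc/udev/rules.d/z25_persistent-net.rules}"
    if [ -e "$rules_file" ]; then
        rm -f "$rules_file" && echo "removed $rules_file"
    else
        echo "nothing to remove at $rules_file"
    fi
}
```

Call clear_persistent_net_rules (as root) right before bundling; running it twice is harmless.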

I wrote this a week or so after actually doing it, so I’m going mainly from rough notes and memory. If you hit a specific error while working through this, let me know; it’ll probably knock something loose and I’ll have an answer for you. It took nearly 10 hours to piece together the steps to make this work. Once you have the steps in place, though, it only takes about 10 minutes. I sure hope to save someone else all that trouble.

Resources:

  1. Amazon EC2 User Selectable Kernels
  2. Programming Amazon EC2 (version 2008-09-01)
  3. Forum: AMI does not boot (post: j0nes2k, April 4, 2008 @ 5:25 am PDT)
  4. Forum: Installing ec2-ami-tools as a Debian package (post: Stephen Caudill, Dec 27, 2006 @ 8:26am PST)

9 comments to Upgrade an EC2 Instance Kernel

  • Dieter

    I’ve also been having these crashes. Pretty annoying.

    Didn’t know that the 2.6.16 kernels weren’t suitable for large instances. Weren’t these the default kernels a while ago?
    BTW: Where did you find this information?

    On to upgrading the kernels …

    Tnx for the info!

  • I didn’t find this anywhere… it was an aggregate of different research… this was EXTREMELY hard to track down! I’m really happy that it helped you!

  • [...] After further investigation, I found a kernel error in dmesg: this was an EC2 instance (actually several of them) running an older kernel (2.6.16 is apparently unstable). The fix to stop the crash was to upgrade kernels. [...]

  • Ashraf Khalid

    hi,

    this link has been a real eye opener to people trying with a higher kernel version than the amazon provided default 2.6.16-33; especially if any of the OEM softwares being installed required a higher kernel version.

    This worked for me for re-bundling my RHEL 5 update 3 AMI.

    But also I have few doubts to clear to get forward from the small hickup I face…that is, when upgraded to kernel 2.6.18 with the relevant AKI and ARI for x86_64 v1.0 (aki-9800e5f1 and ari-64b95e0d), and later updating with the kernel module ec2-modules-2.6.18-xenU-ec2-v1.0-x86_64.tgz, the AMI runs…but cannot be re-bundled successfully without specifying the AKI and ARI each time. If not specified, it do takes the newer AKI / ARI, but doesn’t start the instance; instead gets terminated with Server.InternalError.

    Is there anything I am missing in my steps?? I didnt do any udev installation though. Is this required on RHEL system? If so, how to do the same in RHEL distribution?

    Also when extracting the kernel module, it also creates a boot folder. Do we need to copy its content to /boot folder just like we do for the lib/modules ??

    Kindly help…your comments highly appreciated!!

    thank you

    -ashraf-

  • Mike

    Dear Friends,

    I have installed local machine for slackware13.1 and I have created new
    image for 10GB using below command and

    # dd if=/dev/zero of= bs=1M count=10000
    # mke2fs -F -j

    and installed the custom package for slackware13.1

    I have referred the below links and follow-up the steps, for pdf document.

    http://aws.typepad.com/aws/2010/07/u…mazon-ec2.html

    I am using the kernel is linux-2.6-xenU.x86.tar.bz2. And I have download the kernel below liks
    http://stacklet.com/downloads/kernels/xen/xenU-2.6.31

    I have untar the xenU-2.6.31 kernel, I am getting 2 folders (/lib/modules & /boot)
    and installed the path like (/lib/modules folder in /lib path and boot file is stored in /boot path.

    Also I have set the fstab entry for, and changed the above links steps also.

    /dev/xvda1 / ext3 defaults 1 1
    none /dev/pts devpts gid=5,mode=620 0 0
    none /dev/shm tmpfs defaults 0 0
    none /proc proc defaults 0 0
    none /sys sysfs defaults 0 0

    And I have created the /boot/grub/menu.lst file and store the file

    default 0
    timeout 3
    title vmlinuz-2.6.31.6
    root (hd0)
    kernel /boot/vmlinuz-2.6.31.6 console=hvc0 root=/dev/xvda1 ro

    and bundle with amazon aki (aki-407d9529) and upload the amazon server
    and register it,

    While run the instance, I am unable to login the server, I am getting
    below error.

    ec2-get-console-output i-ajht0c9
    i-ajht0c9
    2011-01-29T06:41:40+0000

    Please help me how to fix this issue.

    MIKE

  • @Mike: I’ve had problems like that before when I forget to clear UDEV before bundling. Read the parts above about clearing UDEV:


    NOTE: Each time you rebundle, you’ll need to re-issue the following command or udev will keep its IP address and the new image will not get a new IP via DHCP at first boot:


    # prevent udev from keeping its old IP address
    rm /etc/udev/rules.d/z25_persistent-net.rules

  • mike

    Dear,

    I am using Amazon server with pv-grub build compile kernel (2.6.34). I have login the server and checked
    ps -aux service it showing time value is wrong.

    Please help me how to fix this issue. Given below the error report in BOLD LETTER & Under line the word

    root@server:~#
    daemon    4095 3688751  0.0 184684 4456 ?      S    Apr12 21067849:44 /apache/sbin/httpd
    daemon    4097  0.0  0.0 184684  4456 ?        S    Apr12   0:00 /apache/sbin/httpd
    daemon    4098  0.0  0.0 184684  4140 ?        S    Apr12   0:00 /apache/sbin/httpd
    daemon    4115 3697201  0.0 184684 4456 ?      S    Apr12 21114581:29 /apache/sbin/httpd
    daemon    4116  0.0  0.0 184692  4268 ?        S    Apr12   0:00 /apache/sbin/httpd
    daemon    4152  0.0  0.0 184692  4420 ?        S    Apr12   0:00 /apache/sbin/httpd
    daemon    4262 3713011  0.0 184560 3812 ?      S    Apr12 21067849:44 /apache/sbin/httpd
    daemon   17818  0.0  0.0 184560  2880 ?        S    02:58   0:00 /apache/sbin/httpd

    root@server:~# date
    Mon Apr 18 03:34:39 Local time zone must be set–see zic manual page 2011
    root@server:~#

    Please help me how to fix this issue..

    Thanks for advance

    MIKE

  • [...] or be able to solve the problem from the hypervisor. Maybe a kernel upgrade is enough. http://www.vincestross.com/2009/04/u…-ec2-instance/ I can’t see anything from the graph. What do you mean? The only thing I see is that the swap after the reboot (the [...]

