
Using rkt and systemd

A few days back, I wrote about my usage of rkt containers. As rkt does not have any daemon running, the simplest way to keep a container running is to start it inside a screen or tmux session. I started down that path myself, using a tmux session.

But then I wanted better control over the containers, to start or stop them as required. systemd already manages all the other services on the system, which makes it an ideal candidate for this case too.

Example of a service file

[Unit]
Description=ircbot
Documentation=https://github.com/kushaldas/ircbot
Requires=network-online.target

[Service]
Slice=machine.slice
MemoryLimit=500M
ExecStart=/usr/bin/rkt --insecure-options=image --debug run --dns=8.8.8.8 --volume mnt,kind=host,source=/some/path,readOnly=false  /mnt/ircbot-latest-linux-amd64.aci
ExecStopPost=/usr/bin/rkt gc --mark-only
KillMode=mixed
Restart=always

The path of the service file is /etc/systemd/system/ircbot.service. In the [Unit] section, I mentioned a super short Description and a link to the documentation of the project. I also mentioned that this service requires network-online.target to be available first.

The [Service] section is the part where we define all the required configuration. The first value we mention is the Slice.

Slices, a way to do resource control

systemd uses slices to group services and other slices in a hierarchical tree. This is built on top of the Linux kernel's control group (cgroup) feature. By default, there are four different slices in a system.

  • -.slice : The root slice.
  • system.slice : All system services are in this slice.
  • machine.slice : All VMs and containers are in this slice.
  • user.slice : All user sessions are in this slice.
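
A slice is itself a systemd unit, so you can also define your own. As a small sketch (the file name and the limit are hypothetical), a custom slice could look like the following, with services joining it via Slice= just as in the example service file above.

# /etc/systemd/system/limits.slice (hypothetical)
[Slice]
MemoryLimit=1G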

We can see the whole hierarchy using the systemd-cgls command. For example:

Control group /:
-.slice
├─machine.slice
│ ├─ircbot.service
│ │ ├─11272 /usr/bin/systemd-nspawn --boot --register=true -Zsystem_u:system_r:container_t:s0:c447,c607 -Lsystem_u:object_r:container_file_t:s0:c447,
│ │ ├─init.scope
│ │ │ └─11693 /usr/lib/systemd/systemd --default-standard-output=tty
│ │ └─system.slice
│ │   ├─ircbot.service
│ │   │ └─11701 /usr/bin/ircbot
│ │   └─systemd-journald.service
│ │     └─11695 /usr/lib/systemd/systemd-journald
├─user.slice
│ └─user-1000.slice
│   ├─session-31.scope
│   │ ├─16228 sshd: kdas [priv]
│   │ ├─16231 sshd: kdas@pts/0
│   │ ├─16232 -bash
│   │ ├─16255 sudo su -
│   │ ├─16261 su -
│   │ └─16262 -bash

You can manage various resources using cgroups. Here, in our example service file, I mentioned that the memory limit for the service is 500MB. You can read more about resource management in the systemd documentation.
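
Such limits can also be adjusted at runtime with the systemctl set-property command; for example (the values here are only illustrations):

# systemctl set-property ircbot.service MemoryLimit=500M
# systemctl set-property machine.slice CPUQuota=80%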

There is also the systemd-cgtop tool, which gives you a top-like view of the various resources consumed by the slices.

# systemd-cgtop -M rkt-250d0c2b-0130-403b-a9a6-3bb3bde4e934

Control Group                                                           Tasks   %CPU   Memory  Input/s Output/s
/machine.slice/ircbot.service                                             9      -   234.0M        -        -
/machine.slice/ircbot.service/system.slice                                -      -     5.0M        -        -
/machine.slice/ircbot.service/system.slice/ircbot.service                 -      -     5.0M        -        -

The actual command used to run the container is specified in ExecStart. The ExecStopPost line runs rkt gc --mark-only after the service stops, which marks the exited pod as garbage without deleting it right away.

Using the service

I can now use the standard systemctl commands for this new ircbot service. For example:

# systemctl start ircbot
# systemctl enable ircbot
# systemctl stop ircbot
# systemctl status ircbot
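
One note: for systemctl enable to set the service up to start at boot, the unit file also needs an [Install] section, which the example above omits. A minimal addition would be:

[Install]
WantedBy=multi-user.target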

You can also view the logs of the application using the journalctl command.

# journalctl -u ircbot
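
Since rkt's stage1 registers the pod with systemd-machined (note the --register=true flag in the systemd-cgls output above), you can also list it with machinectl:

# machinectl list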

The documentation from rkt has more details on systemd and rkt.

rkt image build command reference

In my last post, I wrote about my usage of rkt. I have also posted the basic configuration to create your own container images. Today we will learn more about the various build commands that go into those .acb files. We use these commands with the acbuild tool.

begin

begin starts a new build. The build information is stored inside the .acbuild directory in the current directory. By default, it starts with an empty rootfs, but we can pass options to change that behavior. We can start with a local filesystem, a local ACI image, or even a remote ACI image. To create the Fedora 25 ACI image, I extracted the rootfs into a local directory and used that with the begin command. Examples:

begin /mnt/fedora
begin ./fedora-25-linux-amd64.aci
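
Since begin can also start from a remote ACI image, something like the following should work as well (the URL is a hypothetical example):

begin https://kushal.fedorapeople.org/rkt/fedora-25-linux-amd64.aci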

dep

The dep command is used to add a separate ACI as a dependency of the current ACI. In the rootfs, the current ACI will sit on top of any dependency images. The order of the dependencies is important, so keep an eye on that while working on a new ACI image. For example, to build an image on top of the Fedora ACI image, we use the following line:

dep add kushal.fedorapeople.org/rkt/fedora:25

run

We can execute any command inside the container we are building using the run command. For example, to install a package using dnf, we use the following line:

run -- dnf install htop -y

The actual command (which will run inside the container) comes after the --; anything before that is considered part of the run command itself.

environment

We can also add or remove any environment variable in the container image. We use the environment command for this:

environment add HOME /mnt
environment add DATAPATH /opt/data
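
To drop a variable again, acbuild has a matching subcommand; assuming the remove form from the acbuild documentation:

environment remove DATAPATH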

copy

The copy command is used to copy a file or a directory from the local filesystem into the ACI image. For example, here we are copying a dnf.conf file to the /etc/dnf/ directory inside the container image.

copy ./dnf.conf /etc/dnf/dnf.conf

mount

We use the mount command to mark a location in the ACI image which should be mounted while running the container. Remember one thing about mount points (this is true for ports too): they work based on the names you give them. Here, we create a mount point called apphome, and in the next command we specify the host path for it.

mount add apphome /opt/app/data
rkt run --volume apphome,kind=host,source=/home/kushal/znc,readOnly=false my-image.aci

port

Similar to the mount command, we can use the port command to mark any port of the container which can be mapped to the host system. We need to specify a name, the protocol (either udp or tcp), and finally the port number. We later use the provided name to map it to a port on the host, as sketched after the examples below.

port add http tcp 80
port add https tcp 443
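
While running the image, these names are mapped to host ports with the --port flag of rkt run. A sketch, where my-image.aci is a placeholder:

rkt run --port http:8080 --port https:8443 my-image.aci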

set-user

The set-user command specifies the user which will be used inside the container environment.

set-user kushal

Remember to create the user before you try to use it.
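
One way to do that is with the run command described earlier. A sketch, assuming useradd is available inside the image (the UID is illustrative):

run -- useradd -u 1000 -m kushal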

set-group

Similar to the set-user command, set-group specifies the group which will be used to run the application inside the container.
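
For example, mirroring the set-user example above (the group must already exist in the image):

set-group fedora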

set-working-directory

set-working-directory is used to set the working directory for the application inside the container.

set-working-directory /opt/data

set-exec

Using set-exec, we specify the command to run as the application. In the example below, we run the znc command as the application in the container.

set-exec -- /usr/bin/znc --foreground

write

The final command for today is write. Using this command, we create the final image from the current build environment. There is also a --overwrite flag, which lets us overwrite an existing image file of the same name.

write --overwrite znc-latest-linux-amd64.aci

I hope this post helps you understand the build commands, and that you can use them to build your own rkt images. In the future, if I need to find the command reference, I can read this blog post itself.

Using rkt on my Fedora servers

Many of you already know that I moved all my web applications into containers on Fedora Atomic image based hosts. In the last few weeks, I moved a few of them from Docker to rkt on Fedora 25. I have previously written about trying out rkt in Fedora. Now I am going to talk about how we can build our own rkt based container images, and then use them in real life.

Installation of rkt

First I am going to install all the required dependencies. I added htop, tmux, and vim to the list because I love to use them :)

$ sudo dnf install systemd-container firewalld vim htop tmux gpg wget
$ sudo systemctl enable firewalld
$ sudo systemctl start firewalld
$ sudo firewall-cmd --add-source=172.16.28.0/24 --zone=trusted
$ sudo setenforce Permissive

As you can see from the above commands, rkt still does not work well with SELinux on Fedora. We hope this problem will be solved soon.

Then install the rkt package as described in the upstream document.

$ sudo rkt run --interactive --dns=8.8.8.8 --insecure-options=image kushal.fedorapeople.org/rkt/fedora:25

The above command downloads the Fedora 25 image I built and then executes it. This is the base image for all of my other work images. You may not have to provide the DNS value, but I prefer to do so. The --interactive flag provides you with an interactive prompt; if you forget to provide this flag on the command line, your container will just exit. I was confused for some time trying to find out what was going on.

Building our znc container image

Now the next step is to build our own container images for particular applications. In this example, I am first going to build one for znc. To build the images we will need the acbuild tool. You can follow the upstream instructions to install it on the system.

I am assuming that you have your znc configuration handy. If you are installing for the first time, you can generate your configuration with the following command.

$ znc --makeconf

Below is the znc.acb file for my znc container. We can use the acbuild-script tool to build the container image from this script.

#!/usr/bin/env acbuild-script

# Start the build with an empty ACI
begin

# Name the ACI
set-name kushal.fedorapeople.org/rkt/znc
dep add kushal.fedorapeople.org/rkt/fedora:25

run -- dnf update -y
run -- dnf install htop vim znc -y
run -- dnf clean all

mount add znchome /home/fedora/.znc
port add znc tcp 6667

run --  groupadd -r fedora -g 1000 
run -- useradd -u 1000 -d /home/fedora -r -g fedora fedora

set-user fedora

set-working-directory /home/fedora/
set-exec -- /usr/bin/znc --foreground 

# Write the result
write --overwrite znc-latest-linux-amd64.aci

If you look closely at both the mount and port commands, you will see that I have assigned a name to the mount point, and also to the port (along with the protocol). Remember that in the rkt world, all mount points and ports work based on these assigned names. So, in one image the name http can be assigned to the standard port 80, but in another image the author can choose to use port 8080 with the same name. While running the image, we decide how to map the names to the host side or vice versa. Execute the following command to build our first image.

$ sudo acbuild-script znc.acb

If everything goes well, you will find an image named znc-latest-linux-amd64.aci in the current directory.

Running the container

$ sudo rkt --insecure-options=image --debug run --dns=8.8.8.8  --set-env=HOME=/home/fedora --volume znchome,kind=host,source=/home/kushal/znc,readOnly=false  --port znc:8010 znc-latest-linux-amd64.aci

Now let us dissect the above command. I am using the --insecure-options=image option as I am not verifying the image, and the --debug flag helps to print some more output to stdout, which is useful for finding any problem with a new image you are building. As I mentioned before, I passed a DNS entry to the container using --dns=8.8.8.8. Next, I am overriding the $HOME environment value; I still have to dig more to find out why it was pointing to /root/, but for now we will remember that --set-env can help us set or override any environment variable inside the container.

Next, we mount the /home/kushal/znc directory (which has all the znc configuration) at the mount point named znchome, specifying that it is not a read-only mount. In the same way, we map host port 8010 to the port named znc inside the container. As the very last argument, I pass the image itself.

The following is an example where I am copying a binary (the ircbot application written in golang) into the image.

#!/usr/bin/env acbuild-script

# Start the build with an empty ACI
begin

# Name the ACI
set-name kushal.fedorapeople.org/rkt/ircbot
dep add kushal.fedorapeople.org/rkt/fedora:25

copy ./ircbot /usr/bin/ircbot

mount add mnt /mnt

set-working-directory /mnt
set-exec -- /usr/bin/ircbot

# Write the result
write --overwrite ircbot-latest-linux-amd64.aci

In future posts, I will explain how you can run the containers as systemd services. For starters, you can use a tmux session to keep them running. If you have any doubts, remember to go through the rkt documents; I found them very informative. You can also ask your questions in the #rkt channel on Freenode.net.

Now it is an exercise for the reader to find out the steps to create an SELinux module from the audit log, and then use it on the system. The last step should be putting SELinux back into Enforcing mode.

$ sudo setenforce Enforcing

Trying out rkt v1.14.0 on Fedora 24

A few days back, we had the rkt v1.14.0 release from CoreOS. You can read details about the release in their official blog post. I decided to give it a try on a Fedora 24 box, following the official documentation. The first step was to download the rkt and acbuild tools.

To download and install the acbuild tool, I did the following (btw, as it was a cloud instance, I just moved the binaries to my sbin path):

$ wget https://github.com/containers/build/releases/download/v0.4.0/acbuild-v0.4.0.tar.gz
$ tar xzvf acbuild-v0.4.0.tar.gz
$ sudo mv acbuild-v0.4.0/* /usr/sbin/

Now for rkt, do the following.

$ wget https://github.com/coreos/rkt/releases/download/v1.14.0/rkt-v1.14.0.tar.gz
$ tar xzvf rkt-v1.14.0.tar.gz
$ cd rkt-v1.14.0
$ ./rkt help
$ sudo cp -r init/systemd/* /usr/lib/systemd/

Next, I had to modify a path inside the ./scripts/setup-data-dir.sh file; at line 58 I wrote the following:

systemd-tmpfiles --create /usr/lib/systemd/tmpfiles.d/rkt.conf

The next step is to execute the script. This will create the required directories and fix the permission issues. Before that, I will also create a group and add my current user to it. Remember to log out and log in again for the group change to take effect.

$ sudo groupadd rkt
$ export WHOAMI=$(whoami); sudo gpasswd -a $WHOAMI rkt
$ sudo ./scripts/setup-data-dir.sh

The rkt documentation suggests disabling SELinux for trying it out. Instead, I tried to run it with SELinux in Enforcing mode, and then created a local policy based on the errors. I have also opened a bug for the Rawhide package.

# ausearch -c 'systemd' --raw | audit2allow -M localrktrawhide
# semodule -i localrktrawhide.pp

After all this we are finally in a state to start using rkt in the system.

The Try out document says to trust the signing key of etcd. I am going to do that, and then test by fetching the image.

$ sudo ./rkt trust --prefix coreos.com/etcd
$ ./rkt fetch coreos.com/etcd:v2.3.7
image: searching for app image coreos.com/etcd
image: remote fetching from URL "https://github.com/coreos/etcd/releases/download/v2.3.7/etcd-v2.3.7-linux-amd64.aci"
image: keys already exist for prefix "coreos.com/etcd", not fetching again
image: downloading signature from https://github.com/coreos/etcd/releases/download/v2.3.7/etcd-v2.3.7-linux-amd64.aci.asc
Downloading signature: [=======================================] 490 B/490 B
Downloading ACI: [=============================================] 8.52 MB/8.52 MB
image: signature verified:
  CoreOS Application Signing Key <security@coreos.com>
  sha512-7d28419b27d5ae56cca97f4c6ccdd309

You can view the images with the image list subcommand.

$ ./rkt image list
ID                      NAME                                    SIZE    IMPORT TIME     LAST USED
sha512-5f362df82594     coreos.com/rkt/stage1-coreos:1.14.0     162MiB  1 day ago       1 day ago
sha512-86450bda7ae9     example.com/hello:0.0.1                 7.2MiB  15 hours ago    15 hours ago
sha512-7d28419b27d5     coreos.com/etcd:v2.3.7                  31MiB   48 seconds ago  48 seconds ago

From here, you can just follow the getting started guide. I used the --debug flag to see what is going on.

$ sudo ./rkt --insecure-options=image --debug run ../hello/hello-0.0.1-linux-amd64.aci                                
image: using image from local store for image name coreos.com/rkt/stage1-coreos:1.14.0
image: using image from file ../hello/hello-0.0.1-linux-amd64.aci
stage0: Preparing stage1
stage0: Writing image manifest
stage0: Loading image sha512-86450bda7ae972c9507007bd7dc19a386011a8d865698547f31caba4898d1ebe
stage0: Writing image manifest
stage0: Writing pod manifest
stage0: Setting up stage1
stage0: Wrote filesystem to /var/lib/rkt/pods/run/d738b5e3-3fe9-4beb-ae5c-3e8f4153ee57
stage0: Pivoting to filesystem /var/lib/rkt/pods/run/d738b5e3-3fe9-4beb-ae5c-3e8f4153ee57
stage0: Execing /init
networking: loading networks from /etc/rkt/net.d
networking: loading network default with type ptp
Spawning container rkt-d738b5e3-3fe9-4beb-ae5c-3e8f4153ee57 on /var/lib/rkt/pods/run/d738b5e3-3fe9-4beb-ae5c-3e8f4153ee57/stage1/rootfs.
Press ^] three times within 1s to kill container.
systemd 231 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT -GNUTLS -ACL +XZ -LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN)
Detected virtualization rkt.
Detected architecture x86-64.

Welcome to Linux!

Set hostname to <rkt-d738b5e3-3fe9-4beb-ae5c-3e8f4153ee57>.
[  OK  ] Listening on Journal Socket.
[  OK  ] Created slice system.slice.
         Starting Create /etc/passwd and /etc/group...
[  OK  ] Created slice system-prepare\x2dapp.slice.
[  OK  ] Started Pod shutdown.
[  OK  ] Started hello Reaper.
[  OK  ] Listening on Journal Socket (/dev/log).
         Starting Journal Service...
[  OK  ] Started Create /etc/passwd and /etc/group.
[  OK  ] Started Journal Service.
         Starting Prepare minimum environment for chrooted applications...
[  OK  ] Started Prepare minimum environment for chrooted applications.
[  OK  ] Started Application=hello Image=example.com/hello.
[  OK  ] Reached target rkt apps target.
[111534.724440] hello[5]: 2016/09/10 14:48:59 request from 172.16.28.1:35438

While the above container was running, I tested it out from another terminal, and then stopped it.

$ ./rkt list
UUID            APP     IMAGE NAME              STATE   CREATED         STARTED         NETWORKS
865b862e        hello   example.com/hello:0.0.1 running 8 seconds ago   8 seconds ago   default:ip4=172.16.28.2
$ curl 172.16.28.2:5000
hello
$ sudo ./rkt stop 865b862e
"865b862e-21f5-43e0-a280-3b4520dad97c"

I hope this post helps you try out rkt on a Fedora system. Feel free to comment if you have questions, or ask over Twitter.