Kushal Das4

FOSS and life. Kushal Das talks here.

New features in Gotun

Gotun is a written from scratch golang port of Tunir. Gotun can execute tests on remote systems, or it can run tests on OpenStack and AWS.

Installation from git

If you have a working golang setup, then you can use the standard go get command to install the latest version of gotun.

$ go get github.com/kushaldas/gotun

Configure based on YAML files

Gotun expects job configuration in a YAML file. The following is an example of a job for OpenStack.

---
BACKEND: "openstack"
NUMBER: 3

OS_AUTH_URL: "URL"
OS_TENANT_ID: "Your tenant id"
OS_USERNAME: "USERNAME"
OS_PASSWORD: "PASSWORD"
OS_REGION_NAME: "RegionOne"
OS_IMAGE: "Fedora-Atomic-24-20161031.0.x86_64.qcow2"
OS_FLAVOR: "m1.medium"
OS_SECURITY_GROUPS:
    - "group1"
    - "default"
OS_NETWORK: "NETWORK_POOL_ID"
OS_FLOATING_POOL: "POOL_NAME"
OS_KEYPAIR: "KEYPAIR NAME"
key: "Full path to the private key (.pem file)"

You can also point the OS_IMAGE to a local qcow2 image, which then will be uploaded to the cloud, and used. After the job is done, the image will be removed.

Multiple VM cluster on OpenStack

The OpenStack based jobs also support multiple VM(s). In the above example, we are actually creating three instances from the image mentioned.

Job file syntax

Gotun supports the same syntax of the actual tests of Tunir. Any line starting with ## means those are non-gating tests, even if they fail, the job will continue. For a cluster based job, use vm1, vm2 and similar numbers to mark which VM to be used for the command.

Rebuild of the instances on OpenStack

For OpenStack based jobs, gotun adds a new directive, REBUILD_SERVERS, which will rebuild all the instances. In case one of your tests does something destructive to any instance, using this new directive you can now rebuild all the instances, and start from scratch. The following is the tests file and output from one such job.

echo "hello asd" > ./hello.txt
vm1 sudo cat /etc/machine-id
mkdir {push,pull}
ls -l ./
pwd
REBUILD_SERVERS
sudo cat /etc/machine-id
ls -l ./
pwd
$ gotun --job fedora
Starts a new Tunir Job.

Server ID: e0d7b55a-f066-4ff8-923c-582f3c9be29b
Let us wait for the server to be in running state.
Time to assign a floating pointip.
Polling for a successful ssh connection.

Polling for a successful ssh connection.

Polling for a successful ssh connection.

Polling for a successful ssh connection.

Polling for a successful ssh connection.

Polling for a successful ssh connection.

Polling for a successful ssh connection.

Polling for a successful ssh connection.

Server ID: a0b810e6-0d7f-4c9e-bc4d-1e62b082673d
Let us wait for the server to be in running state.
Time to assign a floating pointip.
Polling for a successful ssh connection.

Polling for a successful ssh connection.

Polling for a successful ssh connection.

Polling for a successful ssh connection.

Polling for a successful ssh connection.

Polling for a successful ssh connection.

Executing:  echo "hello asd" > ./hello.txt
Executing:  vm1 sudo cat /etc/machine-id
Executing:  mkdir {push,pull}
Executing:  ls -l ./
Executing:  pwd
Going to rebuild: 209.132.184.241
Polling for a successful ssh connection.

Polling for a successful ssh connection.

Polling for a successful ssh connection.

Polling for a successful ssh connection.

Polling for a successful ssh connection.

Going to rebuild: 209.132.184.242
Polling for a successful ssh connection.

Polling for a successful ssh connection.

Polling for a successful ssh connection.

Polling for a successful ssh connection.

Polling for a successful ssh connection.

Executing:  sudo cat /etc/machine-id
Executing:  ls -l ./
Executing:  pwd

Result file at: /tmp/tunirresult_180507156


Job status: true


command: echo "hello asd" > ./hello.txt
status:true



command: sudo cat /etc/machine-id
status:true

e0d7b55af0664ff8923c582f3c9be29b


command: mkdir {push,pull}
status:true



command: ls -l ./
status:true

total 4
-rw-rw-r--. 1 fedora fedora 10 Jan 25 13:58 hello.txt
drwxrwxr-x. 2 fedora fedora  6 Jan 25 13:58 pull
drwxrwxr-x. 2 fedora fedora  6 Jan 25 13:58 push


command: pwd
status:true

/var/home/fedora


command: sudo cat /etc/machine-id
status:true

e0d7b55af0664ff8923c582f3c9be29b


command: ls -l ./
status:true

total 0


command: pwd
status:true

/var/home/fedora


Total Number of Tests:8
Total NonGating Tests:0
Total Failed Non Gating Tests:0

Success.

Using Ansible inside a job is now easier

Before running any actual test command, gotun creates a file called current_run_info.json in the job directory, we can now use that to create inventory file for Ansible. Then we can mark any Ansible playbook as a proper test in the job description.

#!/usr/bin/env python3
import json

data = None
with open("current_run_info.json") as fobj:
    data = json.loads(fobj.read())

user = data['user']
host1 = data['vm1']
host2 = data['vm2']
key = data['keyfile']

result = """{0} ansible_ssh_host={1} ansible_ssh_user={2} ansible_ssh_private_key_file={3}
{4} ansible_ssh_host={5} ansible_ssh_user={6} ansible_ssh_private_key_file={7}""".format(host1,host1,user,key,host2,host2,user,key)
with open("inventory", "w") as fobj:
    fobj.write(result)

The above-mentioned script is an example, we are reading the JSON file created by the gotun, and then writing to a new inventory file to be used by an Ansible playbook. The documentation has one example of running atomic-host-tests inside gotun.

You have any question, come and ask in the #fedora-cloud channel. You can contact me over twitter too.

Fedora Atomic Working Group update from 2017-01-17

This is an update from Fedora Atomic Working Group based on the IRC meeting on 2017-01-17. 14 people participated in the meeting, the full log of the meeting can be found here.

OverlayFS partition

We had a decision to have a docker partition in Fedora 26. The root partition sizing will also need to be fixed. One can read all the discussion on the same at the Pagure issue.

We also require help in writing the documentation for migration from Devicemapper -> Overlay -> Back.

How to compose your own Atomic tree?

Jason Brooks will update his document located at Project Atomic docs.

docker-storage-setup patches require more testing

There are pending patches which will require more testing before merging.

Goals and PRD of the working group

Josh Berkus is updating the goals and PRD documentation for the working group. Both short term and long term goals can be seen at this etherpad. The previous Cloud Working Group’s PRD is much longer than most of the other groups’ PRD, so we also discussed trimming the Atomic WG PRD.

Open floor discussion + other items

I updated the working group about a recent failure of the QCOW2 image on the Autocloud. It appears that if we boot the images with only one VCPU, and then after disabling chroyd service when we reboot, there is no defined time to have ssh service up and running.

Misc talked about the hardware plan for FOSP., and later he sent a detailed mail to the list on the same.

Antonio Murdaca (runcom) brought up the discussion about testing the latest Docker (1.13) and pushing it to F25. We decided to spend more time in testing it, and then only push to Fedora 25, otherwise it may break Kubernetes/Openshift. We will schedule a 1.13 testing week in the coming days.

Atomic Working Group update from this week's meeting

Two days back we had a very productive meeting in the Fedora Atomic Working Group. This post is a summary of the meeting. You can find all the open issues of the working group in this Pagure repo. There were 14 people present at the meeting, which happens on every Wednesday 5PM UTC at the #fedora-meeting-1 channel on Freenode IRC server.

Fedora 26 change proposal ideas discussion

This topic was the first point of discussion in the meeting. Walters informed that he will continue working on the Openshift related items, mostly the installer, system containers etc, and also the rpm-ostree. I have started a thread on the mailing list about the change idea, we also decided to create a wiki page to capture all the ideas.

During the Fedora 25 release cycle, we marked couple of Autocloud tests as non-gating as they were present in the system for some time (we added the actual tests after we found the real issue). Now with Fedora 25 out, we decided to reopen the ticket and mark those tests back as gating tests. Means in future if they fail, Fedora 25 Atomic release will be blocked.

The suggestion of creating the rkt base image as release artifact in Fedora 26 cycle, brought out some interesting discussion about the working group. Dusty Mabe mentioned to fix the current issues first, and then only jump into the new world. If this means we will support rkt in the Atomic Host or not, was the other concern. My reaction was maybe not, as to decide that we need to first coordinate between many other teams. rkt is packaged in Fedora, you can just install it in a normal system by the following command.

$ sudo dnf install rkt -y

But this does not mean we will be able to add support for rkt building into OSBS, and Adam Miller reminded us that will take a major development effort. It is also not in the road map of the release-infrastructure team. My proposal is to have only the base image build officially for rkt, and then let our users consume that image. I will be digging more on this as suggested by Dusty, and report back to the working group.

Next, the discussion moved towards a number of technical debts the working group is carrying. One of the major issue (just before F25 release) was about missing Atomic builds, but we managed to fix the issue on time. Jason Brooks commented that this release went much more promptly, and we are making a progress in that :) Few other points from the discussion were

  • Whether the Server working group agreed to maintain the Cloud Base image?
  • Having ancient k8s is a problem.
  • We will start having official Fedora containers very soon.

Documentation

Then the discussion moved to documentation, the biggest pain point of the working group in my mind. For any project, documentation can define if it will become a success or not. Users will move on unless we can provide clearly written instructions (which actually works). For the Atomic Working Group, the major problem is not enough writers. After the discussion in the last Cloud FAD in June, we managed to dig through old wiki pages. Trishna Guha is helping us to move them under this repo. The docs are staying live on https://fedoracloud.rtfd.io. I have sent out another reminder about this effort to the mailing list. If you think of any example which can help others, please write it down, and send in a pull request. It is perfectly okay if you publish it in any other document format. We will help you out to convert it into the correct format.

You can read the full meeting log here.

One week with Fedora Atomic in production

I was using containers for over a year in my personal servers. I was running a few services in those. For the last one week, I moved all my personal servers into Fedora Atomic, and running more number of services in those.

Server hardware & OS

These are actually VM(s) with couple of GB(s) of RAM, and a few CPU(s). I installed using the Fedora Atomic ISO image (get it from here) over virt-manager.

The containers inside

You can find all the Dockerfiles etc in the repo. Note: I still have to clean up a few of those.

Issues faced

In day zero the first box I installed, stopped printing anything on STDOUT, after a reboot I upgraded with atomic host upgrade command. I never had any other problem still now. So, try to stay updated.

Building my own rpm-ostree repo

My next target was to compose my own rpm-ostree repo. I used Patrick’s workstation repo files for the same. In my fork I added couple of files for my own tree, and the build script. The whole work is done on a Fedora 24 container. You can view the repo here. This whole thing is exposed via another apache container. I will explain more about the steps in a future blog post.

What is next?

First step is clean up my old Dockerfiles. I will add up any future service as containers in those boxes. Even though we are automatically testing our images using Autocloud, using this in my production environment will help me find bugs in more convenient manner.

Fedora Atomic 24 is now available

Just in case you missed the news, Adam Miller already announced the availability of Fedora Atomic release based on Fedora 24. You can get it from the usual place. Dusty already uploaded the same into Atlas for Vagrant. You can try it out by the following.

    $ vagrant init fedora/24-atomic-host; vagrant up

As Adam mentioned in his mail, we are sorry for the delay, but we will keep improving the process. Thank you everyone for helping us with this release.

Event report: Fedora Cloud FAD 2016

Around a month back the Fedora Cloud Working Group met in Raleigh for two days for Cloud FAD. The goal of the meet was to agree about the future we want, to go through major action items for the coming releases. I reached Raleigh one day before, Adam Miller was my room mate for this trip. Managed to meet Spot after a long time, this was my first visit to mothership :) I also managed to meet my new teammate Randy Barlow.

Adam took the lead in the event, we first went through the topics from the FAD wiki page. Then arranged them as in the order we wanted to discuss.

Documentation was the first item. Everyone in the room agreed that it is the most important part of the project, we communicate with our users using the documentation. If a project can provide better documentation with clear examples, it will be able to attract more users to itself. Last year I have started a repo to have documents available in the faster manner, but that did not work out well. Jared volunteered, and also suggested to have an open repo where people can submit documents without worrying much about format. He will help to convert to the right format as required. We also noted few important examples/documents we want to see. Feel free to write about any of these and submit a pull request to the repo.

Automated testing was the next import point. We went through the current state of tests in Autocloud project. Most of our tests there came from various bugs filed against the images/tools. Dusty was very efficient in creating the corresponding issues on github from the RH bugzilla, which then in turn was converted into proper Python3 unittest based test case by the volunteers. Next we moved to the automated testing of layered image builds. Tflink provided valuable input on this discussion. We talked SPC(s), how the maintainers of the images will be responsible for the tests. For the tests related to the Cloud/Atomic images, the Cloud Working Group will be responsible.

There were few super long discussions related Fedora, Atomic, and Openshift projects. Many members from the Openshift development team also joined in the discussion. Results of these discussions will be seen in the mailing lists (a lot of content for this blog post). We also discussed about Vagrant, and related tools people are using or creating. With help from Randy, I managed to package vagrant-digitalocean plugin during the FAD. There will be a Fedora magazine post with more details on the same.

We also agreed upon having monthly updated base images. We still have to find out a few missing points related to the same so that we can have a smooth updated release.

Public cloud provider outreach was one of the last points we discussed. We have to pick different providers one by one, and have to make sure that we can provide updated releases to them for the consumption the users. The point of more documentation came up in this discussion too.

Report: Fedora 24 Cloud/Atomic test day

Last Tuesday we had a Fedora 24 test day about Fedora Cloud, and Atomic images. With help from Adam Williamson I managed to setup the test day. This was first time for me to use the test day web app, where the users can enter results from their tests.

Sayan helped to get us the list of AMI(s) for the Cloud base image. We also found our first bug from this test day here, it was not an issue in the images, but in fedimg. fedimg is the application which creates the AMI(s) in an automated way, and it was creating AMI(s) for the atomic images. Today sayan applied a hotfix for the same, I hope this will take care of issue.

While testing the Atomic image, I found docker was not working in the image, but it worked in the Cloud base image. I filed a bug on the same, it seems we already found the root cause in another bug. The other major issue was about upgrade of the Atomic image failing, and it was also a known issue.

In total 13 people volunteered in the test day from Fedora QA, and Cloud SIG groups. It was a successful event as we found some major issues, but we will be happy not to have any issues at all :)

Testing containers using Kubernetes on Tunir version 0.15

Today I have released Tunir 0.l5. This release got a major rewrite of the code, and has many new features. One of them is setting up multiple VM(s) from Tunir itself. We now also have the ability to use Ansible (using 2.x) from within Tunir. Using these we are going to deploy Kubernetes on Fedora 23 Atomic images, and then we will deploy an example atomicapp which follows Nulecule specification.

I am running this on Fedora 23 system, you can grab the latest Tunir from koji. You will also need the Ansible 2.x from the testing repository.

Getting Kubernetes contrib repo

First we will get the latest Kubernetes contrib repo.

$ git clone https://github.com/kubernetes/contrib.git

Inside we will make changes to a group_vars file at contrib/ansible/group_vars/all.yml

diff --git a/ansible/group_vars/all.yml b/ansible/group_vars/all.yml
index 276ded1..ead74fd 100644
--- a/ansible/group_vars/all.yml
+++ b/ansible/group_vars/all.yml
@@ -14,7 +14,7 @@ cluster_name: cluster.local
# Account name of remote user. Ansible will use this user account to ssh into
# the managed machines. The user must be able to use sudo without asking
# for password unless ansible_sudo_pass is set
-#ansible_ssh_user: root
+ansible_ssh_user: fedora

# password for the ansible_ssh_user. If this is unset you will need to set up
# ssh keys so a password is not needed.

Setting up the Tunir job configuration

The new multivm setup requires a jobname.cfg file as the configuration. In this case I have already downloaded a Fedora Atomic cloud .qcow2 file under /mnt. I am going to use that.

[general]
cpu = 1
ram = 1024
ansible_dir = /home/user/contrib/ansible

[vm1]
user = fedora
image = /mnt/Fedora-Cloud-Atomic-23-20160308.x86_64.qcow2
hostname = kube-master.example.com

[vm2]
user = fedora
image = /mnt/Fedora-Cloud-Atomic-23-20160308.x86_64.qcow2
hostname = kube-node-01.example.com

[vm3]
user = fedora
image = /mnt/Fedora-Cloud-Atomic-23-20160308.x86_64.qcow2
hostname = kube-node-02.example.com

The above configuration file is mostly self explanatory. All VM(s) will have 1 virtual CPU, and 1024 MB of RAM. I also put up the directory which contains the ansible source. Next we have 3 VM definitions. I also have hostnames setup for each, this are the same hostnames which are mentioned in the inventory file. The inventory file should exist on the same directory with name inventory. If you do not want to mention such big names, you can simply use vm1, vm2, vm3 in the inventory file.

Now if we remember, we need a jobname.txt file containing the actual commands for testing. The following is from our file.

PLAYBOOK cluster.yml
vm1 sudo atomic run projectatomic/guestbookgo-atomicapp

In the first line we are mentioning to run the cluster playbook. In the second line we are putting in the actual atomic command to deploy guestbook app on our newly setup Kubernetes. You can understand that we mention which VM to execute as the first part of the line. If no vm is marked, Tunir assumes that the command has to run on vm1.

Now if we just execute Tunir, you will be able to see Kubernetes being setup, and then the guestbook app being deployed. You can add few more commands in the above mentioned file to see how many pods running, or even about the details of the pods.

$ sudo tunir --multi jobname

Debug mode

For the multivm setup, Tunir now has a debug mode which can be turned on by passing --debug in the command line. This will not destroy the VM(s) at the end of the test. It will also create a destroy.sh script for you, which you can run to destroy the VM(s), and remove all temporary directories. The path of the file will be given at the end of the Tunir run.

DEBUG MODE ON. Destroy from /tmp/tmp8KtIPO/destroy.sh

Please try these new features, and comment for any improvements you want.

State of tests for Fedora Cloud and Atomic in March 2016

Till Fedora 22 release we have tested our Cloud images only with manual help. The amazing Fedora QA team organized test days, and also published detailed documentation on the wiki about how to test the images. People tried to help as when possible, as not having access to a Cloud was a problem for many. The images are also big in size (than any random RPM), so that also meant only people with enough bandwidth can help.

During Fedora 23 release cycle, we worked on a change for Automated Two Week Atomic release. A part of it was about having automated testing, which we enabled using Autocloud project. This automatically tests every Cloud Base and Atomic qcow2 images, and libvirt, and Virtualbox based Vagrant images.

This post will explain the state of currently activated tests for the same. These tests are written as Python3 unittest cases.

At first we run 5 non-gating tests, failing any of these will not do a release blocking, but at the end of each run we get the summary for all of these non-gating tests. These include bzip2, cpio, diffutils, Audit.

Then we go ahead in testing the basics test cases, things like journald logging, package install (only on base image), SELinux should be enforcing, and no service should fail during startup of the image.

The next test is about checking that /tmp should be world writable. We then move into testing the mount status of the /tmp filesystem. If it is a link, then also it should behave properly. Our next test checks that for Vagrant images we are using predictable naming convention of network devices, basically checking the existing of eth0.

We then disable crond service, after a reboot we make sure that the service is still in disabled state, and do the rest of service manipulations. We also check that the user journald log file exists on that reboot, this comes from a regression test. Then we reboot the instance again.

After this second reboot, we test the status of the crond service once again, and then we move to our special tests cases related to Atomic host. Means these tests will run only on Atomic hosts (both cloud, and Vagrant ones).

In the first test we check if the package docker is at all installed or not. Yes, if you guessed that this is a regression test, then you are correct. We once had an image without the docker package inside :) Next, we test the docker storage setup, that should be up, and in running state. We then run the busybox image, and see that it should be able to run properly. The atomic command is used next to start the same container. Pulling in the latest Fedora image, and running it is the next test. As our next test, we try to mount / as /host in the container. The next test is about having /bin, /sbin, and /usr mounted as read only in the Atomic host. In our final test, we check that a privileged container should be able to talk to the host docker daemon.

We also have a github wiki page explaining how to write any new test case. Feel free to ping if you want to contribute, and make Fedora flying high in the Cloud.

Need help to test Fedora Cloud images

Fedora Cloud Working Group maintains two different types of Cloud images, Cloud base image (which is a minimal standard Fedora build for Cloud), and also the Fedora atomic image (from Project Atomic). We also maintain, and release the Vagrant images for the same. We are on our way to keep releasing updated Atomic images every two weeks, and updated cloud images once every month. This means we need more help in testing the images.

Autocloud is a new service running in Fedora Infrastructure which automatically tests the cloud images. In a last post I wrote about Autocloud, but it does not test every aspect of the images. We are yet to be in a state where we can say that 100(s) of tests are keeping eye on the images automatically. We need the support from the community in both writing new tests, and also testing the images manually. This is a great way not only to help the community, but also a chance to learn about the latest container and cloud technologies. You will be not only working with the Cloud WG, but will also be in touch with Fedora QA, and Fedora Infra group. If you are interested feel free to add your name in this list, or ping me (nick name kushal) on #fedora-cloud channel.