Giovanni Bricconi


Trying Kubeflow


This week I decided to focus a bit on learning Kubeflow instead of reading a research paper. This year I have tried to follow Andrew Ng's advice: read at least one research paper a week. He actually suggests reading two, but I realized I don't have enough time for that. Over these months I have felt happier, and I can tell I am really learning new things.

Now I would like to put into practice some of the things I have learned, so I tried to approach Kubeflow. On paper it is very powerful, and you could put it on any cloud provider and start cooking your stuff; in reality I feel like I bumped into a wall. Ouch, it hurts!

First you need to install a ton of operators and components into Kubernetes. OK, they provide the templates to do that, and you can probably install everything locally in some minikube-like environment, but we tried to do it on Azure to see how it would really be. The fact is that you need a way to pay for it (a credit card on your account), and the big company puts a lot of constraints on how containers and networks should be deployed, which sites you can reach from your pods, and how you can connect to them.

I started thinking that anybody in a start-up can just get a cloud subscription and install things plain vanilla, while I had to struggle for days and ask for help. In the end I am not even sure all the issues are solved: only some modules start, and I can reach a UI, but only via a bridge server. Not really a productive way of working, especially because I just want to try the tool and understand whether it can be useful in the future.

Once I could access the UI, I started having a look at the MNIST example. I have to run the examples on Azure, and if you search the example sources you will see that Azure is not really well represented. At least the MNIST example seemed clear: you have a notebook that prepares an image containing your model, and you deploy and run it. Easy, no? No.

First you need to do the Kubernetes set-up: create storage accounts, create a namespace to run the notebooks, create secrets in that namespace, configure a Docker registry… Also, the example says it has been tested with a specific image version, but when I searched for it I did not find it. And once Jupyter was working, I realized something was missing locally to get the kubectl API working. I had to figure out that you can set up a kubectl connection with an Azure command-line tool, and that all the configuration and secrets go into a .kube/config file that you can then copy onto your pod.
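To give an idea of what the namespace and secret part of that set-up looks like, here is a minimal Python sketch that only builds the two manifests and prints them as JSON. The namespace name, the secret name, and the secret keys are placeholders made up for illustration; they are not the names the MNIST example actually expects.

```python
import base64
import json


def namespace_manifest(name):
    """Build a Namespace manifest as a plain dict."""
    return {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": name}}


def secret_manifest(name, namespace, values):
    """Build an Opaque Secret manifest; values under `data` must be base64-encoded."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": name, "namespace": namespace},
        "data": encoded,
    }


def build_setup_manifests():
    # All names and values below are hypothetical placeholders.
    namespace = "kubeflow-notebooks"
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            namespace_manifest(namespace),
            secret_manifest(
                "azure-storage-credentials",
                namespace,
                {"account-name": "<storage-account>", "account-key": "<storage-key>"},
            ),
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_setup_manifests(), indent=2))
```

Saved to a file, the output should be something you can feed to `kubectl apply -f`, since kubectl accepts JSON as well as YAML. Building the manifests as data first makes it easier to review what will be created before touching the cluster.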

Then I started running the code and saw error messages about module version incompatibilities… the image I am using is probably newer than the example and, bah, I started changing things until something started working.
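When chasing this kind of incompatibility, it helps to first see which versions are actually installed in the notebook image. A minimal sketch using only the standard library follows; the package names in the list are an assumption and should be replaced with whatever the notebook really imports.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Hypothetical list: adjust to the packages the notebook actually imports.
    for name, version in installed_versions(["kubernetes", "kubeflow-fairing"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing this output with the versions the example was written against at least tells you which library moved, instead of changing things blindly.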

Now I am stuck with some Kubeflow Fairing issues: the code checks whether a secret exists, and the library it uses is not OK; it says the method is not present. Surely another version incompatibility issue. Not a happy Sunday morning experience.
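This is not how Fairing does the check, but the same question ("does this secret exist?") can be asked without going through the Python client at all, by calling kubectl. The sketch below assumes kubectl is on the PATH and already configured through .kube/config; the function name and the example secret and namespace names are made up for illustration.

```python
import subprocess


def secret_exists(name, namespace, run=subprocess.run):
    """Return True if kubectl reports a secret with this name in the namespace."""
    result = run(
        ["kubectl", "get", "secret", name, "--namespace", namespace,
         "--ignore-not-found", "--output", "name"],
        capture_output=True,
        text=True,
        check=True,  # raise on real failures (no cluster access, bad kubeconfig, ...)
    )
    # With --ignore-not-found a missing secret gives empty output and exit code 0.
    return result.stdout.strip() != ""


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(secret_exists("azure-storage-credentials", "kubeflow-notebooks"))
```

The `run` parameter is there only so the function can be exercised without a cluster by passing in a fake runner. This does not fix the version mismatch itself; it only helps confirm whether the secret is really there while the library issue is sorted out.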

I started thinking that we should just choose an environment available as a service, like Azure Machine Learning Studio, and then focus on how to run the model anywhere, on other cloud providers. Setting up a whole environment like Kubeflow is too complicated for one person in a few days; a more reasonable amount of time should be allowed. You should also make sure that people with expertise in the cloud provider and in security are around, because their help will be precious.

Written by Giovanni

April 2, 2023 at 12:21 pm

Posted in Varie
