CodeFlare Operator Installation

James Busche edited this page May 25, 2023 · 21 revisions

Taken from: https://github.com/opendatahub-io/distributed-workloads/blob/main/Quick-Start.md

0. Prerequisites:

0.1 You have an OpenShift cluster.

0.2 You are logged into the OpenShift console of that cluster, so that you can install the ODH and CodeFlare operators. (If you don't have access to the OpenShift UI, you can apply the subscriptions from the terminal instead.)

0.3 You have already used oc login to log into your OpenShift cluster from a terminal.

0.4 You have a default storage class already set up. For the IBM Fyre clusters, I'm using Portworx storage and have defined a default storage class:

oc get sc |grep default
portworx-watson-assistant-sc (default)   kubernetes.io/portworx-volume   Retain          Immediate           true                   3h50m
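
To check this from a script rather than by eye, the default class can be pulled out of that listing. This is a minimal sketch that assumes the column layout shown above, where the default class carries a "(default)" marker as its second whitespace-separated field. The name default_storage_class is a hypothetical helper, not part of oc.

```shell
# Hypothetical helper: reads `oc get sc` output on stdin and prints the
# name of every storage class marked "(default)".
default_storage_class() {
  awk '$2 == "(default)" { print $1 }'
}

# Usage (requires a logged-in oc session):
#   oc get sc | default_storage_class
```

If the function prints nothing, no storage class is marked as default and one should be set before continuing.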

1. Install the ODH operator in the openshift-operators namespace using the OpenShift UI console.

1.1 Using your Console, navigate to Operators --> OperatorHub and filter for Open Data Hub Operator

1.2 Press Install, accept all the defaults and then press Install again.

Alternatively, you can create the subscription from the terminal:

cat << EOF | kubectl apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: opendatahub-operator
  namespace: openshift-operators
spec:
  channel: rolling
  name: opendatahub-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
  startingCSV: opendatahub-operator.v1.6.0
EOF

1.3 Using your terminal, confirm that the ODH operator is running:

oc get pods -n openshift-operators

You should see output like this:

NAME                                                       READY   STATUS    RESTARTS   AGE
opendatahub-operator-controller-manager-84858b8998-7nd6q   2/2     Running   0          87s

2. Install the CodeFlare Operator into the openshift-operators namespace using the OpenShift UI console:

2.1 Using your Console, navigate to Operators --> OperatorHub and filter for CodeFlare Operator

2.2 Press Install, accept all the defaults and then press Install again.

Alternatively, you can create the subscription from the terminal:

cat << EOF | kubectl apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: codeflare-operator
  namespace: openshift-operators
spec:
  channel: alpha
  name: codeflare-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
  startingCSV: codeflare-operator.v0.0.3
EOF
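
The ODH and CodeFlare Subscription manifests differ only in the operator name, the channel, and the starting CSV. As a sketch, a small shell function can render either one. The name render_subscription is a hypothetical helper, and the values passed to it in the usage lines are the ones used in this guide.

```shell
# Hypothetical helper: prints an OLM Subscription manifest for an operator
# from the community-operators catalog.
# Arguments: <operator-name> <channel> <starting-csv>
render_subscription() {
  cat << EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: $1
  namespace: openshift-operators
spec:
  channel: $2
  name: $1
  source: community-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
  startingCSV: $3
EOF
}

# Usage (requires a logged-in oc session):
#   render_subscription opendatahub-operator rolling opendatahub-operator.v1.6.0 | oc apply -f -
#   render_subscription codeflare-operator alpha codeflare-operator.v0.0.3 | oc apply -f -
```

Printing the manifest before piping it to oc apply makes it easy to review what will be created.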

2.3 Using your terminal, confirm that the CodeFlare operator is running:

oc get pods -n openshift-operators

You should see output like this:

NAME                                                       READY   STATUS    RESTARTS   AGE
codeflare-operator-controller-manager-8594c586f4-rlbbv     2/2     Running   0          100s
opendatahub-operator-controller-manager-84858b8998-7nd6q   2/2     Running   0          2m24s

3. With the CodeFlare and ODH operators installed, you can now deploy the kfdefs, which install the underlying stack into the opendatahub namespace:

3.1 Create the opendatahub namespace with the following command:

oc create ns opendatahub

3.2 Apply the odh-core kfdef with this command:

oc apply -f https://raw.githubusercontent.com/opendatahub-io/odh-manifests/master/kfdef/odh-core.yaml -n opendatahub

3.3 Create the CodeFlare-Stack kfdef with this command:

oc apply -f https://raw.githubusercontent.com/opendatahub-io/distributed-workloads/main/codeflare-stack-kfdef.yaml -n opendatahub

3.4 Check that everything is running in opendatahub with this command:

oc get pods -n opendatahub

It should look like this:

NAME                                                              READY   STATUS    RESTARTS   AGE
data-science-pipelines-operator-controller-manager-5fbfdc8x5wnx   1/1     Running   0          3m39s
etcd-85c59bc4d6-wn777                                             1/1     Running   0          3m41s
grafana-deployment-6cf577dbb6-ptcjp                               1/1     Running   0          3m35s
grafana-operator-controller-manager-54fbd5b876-zfbvz              2/2     Running   0          4m4s
instascale-instascale-66587c96f5-28chv                            1/1     Running   0          4m34s
kuberay-operator-67d58795bf-h8hwt                                 1/1     Running   0          4m31s
mcad-controller-mcad-5f5cb64ddb-mhf5p                             1/1     Running   0          4m34s
modelmesh-controller-5588b58d79-c46g5                             1/1     Running   0          3m41s
modelmesh-controller-5588b58d79-tn4rt                             1/1     Running   0          3m41s
modelmesh-controller-5588b58d79-wz82x                             1/1     Running   0          3m41s
notebook-controller-deployment-5c565c4c75-2pbzg                   1/1     Running   0          3m50s
odh-dashboard-7f46945556-kd7l5                                    2/2     Running   0          4m37s
odh-dashboard-7f46945556-vsg4m                                    2/2     Running   0          4m37s
odh-model-controller-79c67bc689-5559f                             1/1     Running   0          3m41s
odh-model-controller-79c67bc689-9q9ss                             1/1     Running   0          3m41s
odh-model-controller-79c67bc689-vnfbh                             1/1     Running   0          3m41s
odh-notebook-controller-manager-5cf77fdc56-s4cm6                  1/1     Running   0          3m50s
prometheus-odh-model-monitoring-0                                 3/3     Running   0          3m39s
prometheus-odh-model-monitoring-1                                 3/3     Running   0          3m39s
prometheus-odh-model-monitoring-2                                 3/3     Running   0          3m39s
prometheus-odh-monitoring-0                                       2/2     Running   0          3m58s
prometheus-odh-monitoring-1                                       2/2     Running   0          3m58s
prometheus-operator-779f765944-p2nbf                              1/1     Running   0          4m9s
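
With this many pods, it can be easier to list only the ones that are not healthy yet. The sketch below assumes the column layout shown above (NAME, READY, STATUS, ...). The name unready_pods is a hypothetical helper, not an oc subcommand.

```shell
# Hypothetical helper: reads `oc get pods` output on stdin and prints the name
# of every pod that is not Running with all of its containers ready.
# Note: pods in any other state (for example Completed) are listed as well.
unready_pods() {
  awk '$1 != "NAME" {
    split($2, ready, "/")            # READY column, e.g. "2/2"
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Usage (requires a logged-in oc session):
#   oc get pods -n opendatahub | unready_pods
```

An empty result means every pod in the listing is Running and fully ready.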

4. Access the spawner page by going to your Open Data Hub dashboard. Its URL has this format, where $ODH_NAMESPACE is opendatahub in this guide:

https://odh-dashboard-$ODH_NAMESPACE.apps.<your cluster's uri>

4.1 You can find it with this command:

oc get route -n opendatahub |grep dash

For example:

odh-dashboard          odh-dashboard-opendatahub.apps.jimbig412.cp.fyre.ibm.com                 odh-dashboard          8443    reencrypt/Redirect   None
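
The dashboard URL can also be built directly from the route listing. This is a sketch that assumes the route host is the second column, as in the example line above. The name route_url is a hypothetical helper.

```shell
# Hypothetical helper: reads `oc get route` output on stdin and prints the
# https URL for the route whose name matches the first argument.
route_url() {
  awk -v name="$1" '$1 == name { print "https://" $2 }'
}

# Usage (requires a logged-in oc session):
#   oc get route -n opendatahub | route_url odh-dashboard
```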

4.2 Open that URL in your browser. For example: https://odh-dashboard-opendatahub.apps.jimbig412.cp.fyre.ibm.com

- If prompted, log in with your kubeadmin user and password.
- If prompted, grant it access as well.

4.3 Click on the link "Launch application" in the Jupyter tile.

4.4 Choose CodeFlare Notebook, and click "Start server"

4.5 Note: if this is the first time, it will take a while to pull the new container image. You can watch it start from the terminal with this command:

oc get pods -n opendatahub |grep jupyter

It will show whether the pod is starting or has started. For example:

jupyter-nb-kube-3aadmin-0                                         0/2     ContainerCreating   0          89s

and then a few minutes later:

jupyter-nb-kube-3aadmin-0                                         2/2     Running   0          2m30s
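
Instead of re-running that command by hand, a small polling loop can wait for the pod. The sketch below is a generic retry helper. The names wait_for and jupyter_running are hypothetical, and the timeout and interval values in the usage lines are arbitrary choices.

```shell
# Hypothetical helper: re-runs a command until it succeeds or a timeout passes.
# Arguments: <timeout-seconds> <interval-seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if the timeout is reached first.
wait_for() {
  wf_timeout=$1
  wf_interval=$2
  shift 2
  wf_elapsed=0
  until "$@"; do
    if [ "$wf_elapsed" -ge "$wf_timeout" ]; then
      return 1
    fi
    sleep "$wf_interval"
    wf_elapsed=$((wf_elapsed + wf_interval))
  done
  return 0
}

# Usage (requires a logged-in oc session): wait up to 10 minutes, polling
# every 10 seconds, for a jupyter pod line that shows Running.
#   jupyter_running() { oc get pods -n opendatahub | grep jupyter | grep -q Running; }
#   wait_for 600 10 jupyter_running
```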

4.6 Note that the notebook also uses a PVC:

oc get pvc -n opendatahub
NAME                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                   AGE
jupyterhub-nb-kube-3aadmin-pvc   Bound    pvc-28c725bd-6ba8-4bf4-92fe-b88b82b58fc6   1Gi        RWO            portworx-watson-assistant-sc   3m32s

5. In the Jupyter Notebook:

5.1 Click either "Open in a new tab" or "Open in current tab"

- If prompted, log in with your kubeadmin user and password.
- If prompted, grant it access as well.

5.2 Click on the "+" to open a new window, then select Terminal. Inside this terminal, run:

git clone https://github.com/project-codeflare/codeflare-sdk.git

Then you can close the terminal.

5.3 On the far left, navigate to: codeflare-sdk --> demo-notebooks --> batch-job --> batch_mnist.ipynb

6. Then walk through the notebook one cell at a time.

Hint: If you have a slow cluster like mine, double-click mnist.py to edit it and, near the bottom, change:

  max_epochs=100,

to

  max_epochs=5,

This will save you a lot of time.
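
The same edit can be made from the notebook terminal instead of the editor. This is a sketch only: set_max_epochs is a hypothetical helper, and the path in the usage line assumes mnist.py sits next to batch_mnist.ipynb in the cloned repository.

```shell
# Hypothetical helper: rewrites every "max_epochs=<number>" in a file to use
# the given value, leaving the rest of each line untouched.
# Arguments: <file> <epochs>
set_max_epochs() {
  sme_file=$1
  sme_epochs=$2
  sme_tmp=$(mktemp)
  # Write to a temporary file first, then copy back so the original
  # file keeps its permissions.
  sed "s/max_epochs=[0-9][0-9]*/max_epochs=${sme_epochs}/" "$sme_file" > "$sme_tmp" &&
    cat "$sme_tmp" > "$sme_file"
  rm -f "$sme_tmp"
}

# Usage (path is an assumption about the repository layout):
#   set_max_epochs codeflare-sdk/demo-notebooks/batch-job/mnist.py 5
```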