Trident: iSCSI volumes mounted with root permissions on OpenShift 4.5

Created on 26 Mar 2021  ·  5 comments  ·  Source: NetApp/trident

Describe the bug
A new OpenShift 4.5 cluster has been deployed together with NetApp Trident installer v20.10.1.
NFS storage served by NetApp works fine on the OpenShift cluster, but iSCSI volumes result in 'permission denied' errors. Because of this, the Prometheus and Elasticsearch pods cannot start.

$ oc logs -f prometheus-k8s-0 -c prometheus
level=info ts=2021-03-26T10:26:30.595Z caller=main.go:330 msg="Starting Prometheus" version="(version=2.15.2, branch=rhaos-4.5-rhel-7, revision=c3b41963fbe48114e54396ac05b56b02cb3e4a0a)"
level=info ts=2021-03-26T10:26:30.595Z caller=main.go:331 build_context="(go=go1.13.4, user=root@3d590aab9ed6, date=20200810-04:36:52)"
level=info ts=2021-03-26T10:26:30.595Z caller=main.go:332 host_details="(Linux 4.18.0-193.14.3.el8_2.x86_64 #1 SMP Mon Jul 20 15:02:29 UTC 2020 x86_64 prometheus-k8s-0 (none))"
level=info ts=2021-03-26T10:26:30.595Z caller=main.go:333 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2021-03-26T10:26:30.595Z caller=main.go:334 vm_limits="(soft=unlimited, hard=unlimited)"
level=error ts=2021-03-26T10:26:30.595Z caller=query_logger.go:85 component=activeQueryTracker msg="Error opening query log file" file=/prometheus/queries.active err="open /prometheus/queries.active: permission denied"
panic: Unable to create mmap-ed active query log

$ oc logs -f elasticsearch-cdm-6uixr0l5-1-574cf77c5c-fbbgj -c elasticsearch
...
[2021-03-26 10:27:10,098][INFO ][container.run ] ES_JAVA_OPTS: ' -Xms4096m -Xmx4096m -XX:HeapDumpPath=/elasticsearch/persistent/heapdump.hprof -Xloggc:/elasticsearch/persistent/elasticsearch/logs/gc.log -XX:ErrorFile=/elasticsearch/persistent/elasticsearch/logs/error.log'
[2021-03-26 10:27:10,099][INFO ][container.run ] Checking if Elasticsearch is ready
mkdir: cannot create directory '/elasticsearch/persistent/elasticsearch': Permission denied

We found that the iSCSI volumes are mounted with root permissions on the OpenShift nodes:

sh-4.4# df -h | grep pvc-17029cbf-27eb-4f29-bad2-e0608a518072
/dev/mapper/3600a098056303030303f4f77477a3276 9.8G 37M 9.3G 1% /var/lib/kubelet/pods/565ca5b8-d1fd-4dc5-b118-cf224e226dc1/volumes/kubernetes.io~csi/pvc-17029cbf-27eb-4f29-bad2-e0608a518072/mount

sh-4.4# ls -lRt /var/lib/kubelet/pods/565ca5b8-d1fd-4dc5-b118-cf224e226dc1/volumes/kubernetes.io~csi/pvc-17029cbf-27eb-4f29-bad2-e0608a518072/mount
/var/lib/kubelet/pods/565ca5b8-d1fd-4dc5-b118-cf224e226dc1/volumes/kubernetes.io~csi/pvc-17029cbf-27eb-4f29-bad2-e0608a518072/mount:
total 20
drwxr-xr-x. 2 root root 4096 Mar 24 16:46 prometheus-db <=====================================
drwx------. 2 root root 16384 Mar 24 11:25 lost+found

On another OpenShift 4.5 cluster that doesn't have the issue, the prometheus volumes are mounted like this:

sh-4.4# df -h | grep pvc-0bad4ec4-2e30-472b-913b-28a29cfa0bcd
/dev/mapper/3600a098056303030302b4f7733775552 15G 9.5G 4.6G 68% /var/lib/kubelet/pods/bdb57dac-8202-4ec2-8c97-2f9367c7ed40/volumes/kubernetes.io~csi/pvc-0bad4ec4-2e30-472b-913b-28a29cfa0bcd/mount
sh-4.4# ls -lRt /var/lib/kubelet/pods/bdb57dac-8202-4ec2-8c97-2f9367c7ed40/volumes/kubernetes.io~csi/pvc-0bad4ec4-2e30-472b-913b-28a29cfa0bcd/mount
/var/lib/kubelet/pods/bdb57dac-8202-4ec2-8c97-2f9367c7ed40/volumes/kubernetes.io~csi/pvc-0bad4ec4-2e30-472b-913b-28a29cfa0bcd/mount:
total 20
drwxrwsr-x. 32 root 1000260000 4096 Mar 25 16:00 prometheus-db <==============================
drwxrws---. 2 root 1000260000 16384 Oct 16 15:28 lost+found

The only difference is that this cluster runs Trident v20.07.0, an older version than v20.10.1.

Environment
Provide accurate information about the environment to help us reproduce the issue.

  • Trident version: 20.10.1
  • Container runtime: OpenShift 4.5
  • Kubernetes orchestrator: OpenShift v4.5
  • OS: RHEL CoreOS release 4.5
  • NetApp backend types: ONTAP Select running ONTAP 9.6

To Reproduce
A test pod with an iSCSI PVC also fails to start.
'oc debug pod/test_pod' shows the iSCSI volume is mounted, but there is no permission to write to the file system.
'oc debug pod/test_pod --as-root' allows us to write files/directories inside the iSCSI file system.
This is not expected behavior; OpenShift containers should not run as root.

Expected behavior
On another OpenShift 4.5 cluster that doesn't have the issue, the prometheus volumes are mounted like this:

sh-4.4# df -h | grep pvc-0bad4ec4-2e30-472b-913b-28a29cfa0bcd
/dev/mapper/3600a098056303030302b4f7733775552 15G 9.5G 4.6G 68% /var/lib/kubelet/pods/bdb57dac-8202-4ec2-8c97-2f9367c7ed40/volumes/kubernetes.io~csi/pvc-0bad4ec4-2e30-472b-913b-28a29cfa0bcd/mount
sh-4.4# ls -lRt /var/lib/kubelet/pods/bdb57dac-8202-4ec2-8c97-2f9367c7ed40/volumes/kubernetes.io~csi/pvc-0bad4ec4-2e30-472b-913b-28a29cfa0bcd/mount
/var/lib/kubelet/pods/bdb57dac-8202-4ec2-8c97-2f9367c7ed40/volumes/kubernetes.io~csi/pvc-0bad4ec4-2e30-472b-913b-28a29cfa0bcd/mount:
total 20
drwxrwsr-x. 32 root 1000260000 4096 Mar 25 16:00 prometheus-db <==============================
drwxrws---. 2 root 1000260000 16384 Oct 16 15:28 lost+found

The only difference is that Trident v20.07.0 is used on this OpenShift cluster, instead of v20.10.1.

Additional context
NetApp support case: 2008704377 -> still no feedback received

bug

All 5 comments

My guess is that you didn't specify the fsType parameter in your storage class. Delete the existing storage class (this has no impact on existing volumes), add

fsType: "ext4"

to your storage class YAML, and re-create it.
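For reference, a minimal ontap-san storage class with the parameter set might look like the sketch below (the class name and storage pool are example values, not a definitive configuration):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: iscsi
provisioner: csi.trident.netapp.io
parameters:
  backendType: ontap-san
  fsType: "ext4"   # needed so Kubernetes applies fsGroup ownership on the volume
reclaimPolicy: Delete
volumeBindingMode: Immediate
```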

Some background

When the pod starts, it can specify an fsGroup as part of its securityContext. Kubernetes then applies this group ID to all files/folders on the volume (chown/chmod) and adds the group as a supplemental group to the user the app runs as. This ensures that permissions are set correctly.

As this is not possible in all scenarios (it depends on the storage used), Kubernetes tries to detect whether this step should run. One of the indicators is the fsType: if it is set, Kubernetes assumes it will be able to chown/chmod. In older releases the default was ext4; this changed to "" with more recent versions of the CSI sidecar containers included in Trident 20.07.1 and higher. So on old versions, if you didn't specify fsType in your storage class, the default "ext4" was used and everything worked. In more recent versions the default is empty, so if fsType is missing from your storage class it stays blank, causing Kubernetes to believe there is no fsType and to skip the permission step.

https://netapp-trident.readthedocs.io/en/stable-v21.01/kubernetes/known-issues.html?highlight=fstype#known-issues

https://netapp-trident.readthedocs.io/en/stable-v21.01/kubernetes/operations/tasks/storage-classes.html?highlight=fstype#deleting-a-storage-class

I deleted and recreated the iscsi storage class adding the parameter 'fsType: ext4':

$  oc get sc/iscsi -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2021-03-26T12:34:52Z"
  managedFields:
  - apiVersion: storage.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:allowVolumeExpansion: {}
      f:mountOptions: {}
      f:parameters:
        .: {}
        f:backendType: {}
        f:fsType: {}
        f:storagePools: {}
      f:provisioner: {}
      f:reclaimPolicy: {}
      f:volumeBindingMode: {}
    manager: oc
    operation: Update
    time: "2021-03-26T12:34:52Z"
  name: iscsi
  resourceVersion: "4857712"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/iscsi
  uid: 55988799-c11e-4efa-a462-d7fa42aab7d2
mountOptions:
- discard
parameters:
  backendType: ontap-san
  fsType: ext4   <======
  storagePools: ontap_san_ci00150164:n1_data_vmdk_01,n2_data_vmdk_01
provisioner: csi.trident.netapp.io
reclaimPolicy: Delete
volumeBindingMode: Immediate

I restarted one of the Prometheus and Elasticsearch pods afterwards, but the issue remains.

$  oc logs -f prometheus-k8s-0 -c prometheus
level=info ts=2021-03-26T12:36:09.589Z caller=main.go:330 msg="Starting Prometheus" version="(version=2.15.2, branch=rhaos-4.5-rhel-7, revision=c3b41963fbe48114e54396ac05b56b02cb3e4a0a)"
level=info ts=2021-03-26T12:36:09.589Z caller=main.go:331 build_context="(go=go1.13.4, user=root@3d590aab9ed6, date=20200810-04:36:52)"
level=info ts=2021-03-26T12:36:09.589Z caller=main.go:332 host_details="(Linux 4.18.0-193.14.3.el8_2.x86_64 #1 SMP Mon Jul 20 15:02:29 UTC 2020 x86_64 prometheus-k8s-0 (none))"
level=info ts=2021-03-26T12:36:09.589Z caller=main.go:333 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2021-03-26T12:36:09.589Z caller=main.go:334 vm_limits="(soft=unlimited, hard=unlimited)"
level=error ts=2021-03-26T12:36:09.589Z caller=query_logger.go:85 component=activeQueryTracker msg="Error opening query log file" file=/prometheus/queries.active err="open /prometheus/queries.active: permission denied"
panic: Unable to create mmap-ed active query log

$  oc debug pod/prometheus-k8s-0
Defaulting container name to prometheus.
Use 'oc describe pod/prometheus-k8s-0-debug -n openshift-monitoring' to see all of the containers in this pod.

Starting pod/prometheus-k8s-0-debug ...
Pod IP: 10.131.0.128
If you don't see a command prompt, try pressing enter.
sh-4.2$ df -hT .
Filesystem                                    Type  Size  Used Avail Use% Mounted on
/dev/mapper/3600a098056303030303f4f77477a3276 ext4  9.8G   37M  9.3G   1% /prometheus
sh-4.2$ ls -ld /prometheus
drwxr-xr-x. 2 root root 4096 Mar 24 15:46 /prometheus
sh-4.2$ touch /prometheus/test
touch: cannot touch '/prometheus/test': Permission denied
sh-4.2$ id
uid=1000420000(1000420000) gid=0(root) groups=0(root),1000420000

Did I miss a step?

Sorry, I wasn't clear about this. The parameters of the storage class are only applied to new volumes you create; any existing PV will not get this set. I'm afraid you cannot modify this for an already existing PV. Could you please try with a new PVC/PV?

Hello, that solves the issue! Thanks a lot!

After adding fsType: ext4 to the iscsi storage class configuration, I had to delete the PVCs belonging to the failing Prometheus pods, then delete the failing pods. They were started automatically again but stayed in Pending state until I recreated the PVCs.

The same goes for the Elasticsearch pods, except that I didn't have to recreate the PVCs; they were created automatically.

I verified the permissions of the iSCSI file systems inside the pods; they are no longer owned by root:root.
All pods are in Running state and the monitoring clusteroperator is no longer degraded.

@timvandevoort this is related to how Kubernetes applies fsGroups. As can be seen here: https://github.com/kubernetes/kubernetes/blob/f137c4777095b3972e2dd71a01365d47be459389/pkg/volume/csi/csi_mounter.go#L415

if fsGroup == nil || driverPolicy == storage.NoneFSGroupPolicy || c.readOnly {
    return false
}
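The gate above can be exercised in isolation. A simplified sketch follows (the real Kubernetes code compares against storage.NoneFSGroupPolicy; here the policy is a plain string, and supportsFSGroup is a hypothetical name for illustration):

```go
package main

import "fmt"

// supportsFSGroup mirrors the csi_mounter.go check in simplified form:
// fsGroup ownership is applied only when a group is requested, the driver's
// fsGroupPolicy allows it, and the volume is not mounted read-only.
func supportsFSGroup(fsGroup *int64, driverPolicy string, readOnly bool) bool {
	if fsGroup == nil || driverPolicy == "None" || readOnly {
		return false
	}
	return true
}

func main() {
	gid := int64(1000260000)
	fmt.Println(supportsFSGroup(&gid, "File", false)) // prints "true": ownership step runs
	fmt.Println(supportsFSGroup(nil, "File", false))  // prints "false": no fsGroup requested
}
```

With an empty fsType the driver effectively falls into the "skip" branch, which matches the behavior observed on the v20.10.1 cluster.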

As long as your StorageClass has a parameters.fsType defined, you can apply fsGroups through Security Context.
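For completeness, the fsGroup itself is requested on the pod side, e.g. (a sketch only; the pod name, image, claim name, and group ID are placeholder values):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  securityContext:
    fsGroup: 1000260000   # applied to the volume only when the PV has an fsType
  containers:
  - name: app
    image: registry.example.com/app:latest
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-pvc
```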

I am marking this issue as "Closed". Thanks for using Trident!
