Description
@nammn Hello, I am not able to add an arbiter to a 2-member replica set (the same happens with a 3-member replica set). After the change, the arbiter pod is created, but the operator keeps logging an error that the pods have not reached the goal state. After some time the replica set members go down as well, because their readiness probes fail.
What did you do to encounter the bug?
Steps to reproduce the behavior:
- Changed spec.arbiters from 0 to 1 in my MongoDB CR manifest database.yaml (see the sketch after this list)
- Applied it with kubectl apply -f database.yaml
- Checked the MongoDB community operator logs and observed debug messages stating that none of the pods reached the goal state
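For reference, this is a minimal sketch of the change; the metadata values are placeholders and the full resource is dumped further below:
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: mongodb
  namespace: <namespace>
spec:
  type: ReplicaSet
  version: 6.0.17
  members: 3
  arbiters: 1   # was 0 before this change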
What did you expect?
The arbiter to be added to the 2-member replica set, with both members and the arbiter reaching the goal state.
What happened instead?
Neither the members nor the arbiter reached the goal state. The replica set is stuck in a crash loop, and the previously working members eventually fail as well after not reaching the goal state for some time.
Operator Information
- Operator version: 0.10.0
- MongoDB image: docker.io/mongo:6.0.17
Kubernetes Cluster Information
- Provider: AWS EKS
- Kubernetes version: 1.28
If possible, please include:
- The operator logs
2024-09-09T11:35:35.491Z DEBUG scram/scram.go:102 Credentials have not changed, using credentials stored in: secret/dms-user-scram-scram-credentials
2024-09-09T11:35:35.492Z DEBUG agent/agent_readiness.go:111 The Agent in the Pod 'mongodb-0' hasn't reached the goal state yet (goal: 30, agent: 29) {"ReplicaSet": "mongodb-surplus/mongodb"}
2024-09-09T11:35:35.492Z DEBUG agent/agent_readiness.go:111 The Agent in the Pod 'mongodb-1' hasn't reached the goal state yet (goal: 30, agent: 29) {"ReplicaSet": "mongodb-surplus/mongodb"}
2024-09-09T11:35:35.492Z DEBUG agent/agent_readiness.go:111 The Agent in the Pod 'mongodb-2' hasn't reached the goal state yet (goal: 30, agent: 29) {"ReplicaSet": "mongodb-surplus/mongodb"}
2024-09-09T11:35:35.492Z DEBUG agent/agent_readiness.go:111 The Agent in the Pod 'mongodb-arb-0' hasn't reached the goal state yet (goal: 30, agent: -1) {"ReplicaSet": "mongodb-surplus/mongodb"}
2024-09-09T11:35:35.492Z DEBUG agent/replica_set_port_manager.go:122 No port change required {"ReplicaSet": "mongodb-surplus/mongodb"}
2024-09-09T11:35:35.492Z DEBUG agent/replica_set_port_manager.go:40 Calculated process port map: map[mongodb-0:27017 mongodb-1:27017 mongodb-2:27017 mongodb-arb-0:27017] {"ReplicaSet": "mongodb-surplus/mongodb"}
2024-09-09T11:35:35.492Z DEBUG controllers/replica_set_controller.go:505 AutomationConfigMembersThisReconciliation {"mdb.AutomationConfigMembersThisReconciliation()": 3}
2024-09-09T11:35:35.492Z DEBUG controllers/replica_set_controller.go:358 Waiting for agents to reach version 30 {"ReplicaSet": "mongodb-surplus/mongodb"}
2024-09-09T11:35:35.492Z DEBUG agent/agent_readiness.go:111 The Agent in the Pod 'mongodb-0' hasn't reached the goal state yet (goal: 30, agent: 29) {"ReplicaSet": "mongodb-surplus/mongodb"}
2024-09-09T11:35:35.492Z DEBUG agent/agent_readiness.go:111 The Agent in the Pod 'mongodb-1' hasn't reached the goal state yet (goal: 30, agent: 29) {"ReplicaSet": "mongodb-surplus/mongodb"}
2024-09-09T11:35:35.492Z DEBUG agent/agent_readiness.go:111 The Agent in the Pod 'mongodb-2' hasn't reached the goal state yet (goal: 30, agent: 29) {"ReplicaSet": "mongodb-surplus/mongodb"}
2024-09-09T11:35:35.492Z INFO controllers/mongodb_status_options.go:110 ReplicaSet is not yet ready, retrying in 10 seconds
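For context, the excerpt above was captured by following the operator's logs; the deployment name below is the default from the community operator manifests and may differ in your install:
kubectl logs -f deployment/mongodb-kubernetes-operator -n <operator-namespace>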
- Below we assume that your replica set database pods are named mongo-<>. For instance:
❯ k get pods
NAME READY STATUS RESTARTS AGE
mongo-0 2/2 Running 0 19h
mongo-1 2/2 Running 0 19h
❯ k get mdbc
NAME PHASE VERSION
mongo Running 4.4.0
- yaml definitions of your MongoDB Deployment(s):
kubectl get mdbc -oyaml
apiVersion: v1
items:
- apiVersion: mongodbcommunity.mongodb.com/v1
  kind: MongoDBCommunity
  metadata:
    annotations:
      mongodb.com/v1.lastAppliedMongoDBVersion: 6.0.17
    creationTimestamp: "2024-01-03T07:47:03Z"
    generation: 48
    labels:
      k8slens-edit-resource-version: v1
    name: mongodb
    namespace: mongodb-<SENSITIVE>
    resourceVersion: "391080428"
    uid: 8dbc92a1-061b-4ebb-a2be-d1b5dd6d696b
  spec:
    additionalMongodConfig:
      storage.wiredTiger.engineConfig.journalCompressor: zlib
    arbiters: 1
    members: 3
    security:
      authentication:
        ignoreUnknownUsers: true
        modes:
        - SCRAM
    statefulSet:
      spec:
        template:
          spec:
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                  - matchExpressions:
                    - key: NodeGroup
                      operator: In
                      values:
                      - <SENSITIVE>
              podAntiAffinity:
                preferredDuringSchedulingIgnoredDuringExecution:
                - podAffinityTerm:
                    labelSelector:
                      matchExpressions:
                      - key: app
                        operator: In
                        values:
                        - mongodb-<SENSITIVE>
                    topologyKey: kubernetes.io/hostname
                  weight: 100
            containers:
            - name: mongod
              resources:
                limits:
                  cpu: "1"
                  memory: 2Gi
                requests:
                  cpu: 500m
                  memory: 1Gi
        volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 70G
            storageClassName: ebs-sc
        - metadata:
            name: logs-volume
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 10G
            storageClassName: ebs-sc
    type: ReplicaSet
    version: 6.0.17
  status:
    currentMongoDBMembers: 3
    currentStatefulSetReplicas: 3
    message: ReplicaSet is not yet ready, retrying in 10 seconds
    mongoUri: mongodb://mongodb-0.mongodb-svc.mongodb-surplus.svc.cluster.local:27017,mongodb-1.mongodb-svc.mongodb-surplus.svc.cluster.local:27017,mongodb-2.mongodb-svc.mongodb-surplus.svc.cluster.local:27017/?replicaSet=mongodb
    phase: Pending
    version: 6.0.17
kind: List
metadata:
  resourceVersion: ""
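To watch whether the resource ever leaves Pending, the status fields can be polled directly; the resource name matches the dump above, the namespace is a placeholder:
kubectl get mdbc mongodb -n <namespace> -o jsonpath='{.status.phase}{"\n"}{.status.message}{"\n"}'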
- yaml definitions of your kubernetes objects like the statefulset(s), pods (we need to see the state of the containers):
kubectl get sts -oyaml
apiVersion: v1
items:
- apiVersion: apps/v1
  kind: StatefulSet
  metadata:
    creationTimestamp: "2024-09-06T13:47:18Z"
    generation: 4
    name: mongodb
    namespace: mongodb-XXX
    ownerReferences:
    - apiVersion: mongodbcommunity.mongodb.com/v1
      blockOwnerDeletion: true
      controller: true
      kind: MongoDBCommunity
      name: mongodb
      uid: 8dbc92a1-061b-4ebb-a2be-d1b5dd6d696b
    resourceVersion: "391084063"
    uid: 25fa6a25-5016-4a16-af39-8c6907338a49
  spec:
    persistentVolumeClaimRetentionPolicy:
      whenDeleted: Retain
      whenScaled: Retain
    podManagementPolicy: OrderedReady
    replicas: 3
    revisionHistoryLimit: 10
    selector:
      matchLabels:
        app: mongodb-XXX
    serviceName: mongodb-XXX
    template:
      metadata:
        annotations:
          kubectl.kubernetes.io/restartedAt: "2024-09-09T07:49:13Z"
        creationTimestamp: null
        labels:
          app: mongodb-XXX
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: NodeGroup
                  operator: In
                  values:
                  - XXX
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - mongodb-XXX
                topologyKey: kubernetes.io/hostname
              weight: 100
        containers:
        - args:
          - ""
          command:
          - /bin/sh
          - -c
          - "\nif [ -e \"/hooks/version-upgrade\" ]; then\n\t#run post-start hook to handle version changes (if exists)\n /hooks/version-upgrade\nfi\n\n# wait for config and keyfile to be created by the agent\nwhile ! [ -f /data/automation-mongod.conf -a -f /var/lib/mongodb-mms-automation/authentication/keyfile ]; do sleep 3 ; done ; sleep 2 ;\n\n# start mongod with this configuration\nexec mongod -f /data/automation-mongod.conf;\n\n"
          env:
          - name: AGENT_STATUS_FILEPATH
            value: /healthstatus/agent-health-status.json
          image: docker.io/mongo:6.0.17
          imagePullPolicy: IfNotPresent
          name: mongod
          resources:
            limits:
              cpu: "1"
              memory: 2Gi
            requests:
              cpu: 500m
              memory: 1Gi
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /data
            name: data-volume
          - mountPath: /healthstatus
            name: healthstatus
          - mountPath: /hooks
            name: hooks
          - mountPath: /var/log/mongodb-mms-automation
            name: logs-volume
          - mountPath: /var/lib/mongodb-mms-automation/authentication
            name: mongodb-keyfile
          - mountPath: /tmp
            name: tmp
        - command:
          - /bin/bash
          - -c
          - |-
            current_uid=$(id -u)
            declare -r current_uid
            if ! grep -q "${current_uid}" /etc/passwd ; then
              sed -e "s/^mongodb:/builder:/" /etc/passwd > /tmp/passwd
              echo "mongodb:x:$(id -u):$(id -g):,,,:/:/bin/bash" >> /tmp/passwd
              export NSS_WRAPPER_PASSWD=/tmp/passwd
              export LD_PRELOAD=libnss_wrapper.so
              export NSS_WRAPPER_GROUP=/etc/group
            fi
            agent/mongodb-agent -healthCheckFilePath=/var/log/mongodb-mms-automation/healthstatus/agent-health-status.json -serveStatusPort=5000 -cluster=/var/lib/automation/config/cluster-config.json -skipMongoStart -noDaemonize -useLocalMongoDbTools -logFile /var/log/mongodb-mms-automation/automation-agent.log -logLevel INFO -maxLogFileDurationHrs 24
          env:
          - name: AGENT_STATUS_FILEPATH
            value: /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
          - name: AUTOMATION_CONFIG_MAP
            value: mongodb-config
          - name: HEADLESS_AGENT
            value: "true"
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
          image: quay.io/mongodb/mongodb-agent:12.0.15.7646-1
          imagePullPolicy: Always
          name: mongodb-agent
          readinessProbe:
            exec:
              command:
              - /opt/scripts/readinessprobe
            failureThreshold: 40
            initialDelaySeconds: 5
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            limits:
              cpu: "1"
              memory: 500M
            requests:
              cpu: 500m
              memory: 400M
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /opt/scripts
            name: agent-scripts
          - mountPath: /var/lib/automation/config
            name: automation-config
            readOnly: true
          - mountPath: /data
            name: data-volume
          - mountPath: /var/log/mongodb-mms-automation/healthstatus
            name: healthstatus
          - mountPath: /var/log/mongodb-mms-automation
            name: logs-volume
          - mountPath: /var/lib/mongodb-mms-automation/authentication
            name: mongodb-keyfile
          - mountPath: /tmp
            name: tmp
        dnsPolicy: ClusterFirst
        initContainers:
        - command:
          - cp
          - version-upgrade-hook
          - /hooks/version-upgrade
          image: quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook:1.0.6
          imagePullPolicy: Always
          name: mongod-posthook
          resources:
            limits:
              cpu: "1"
              memory: 500M
            requests:
              cpu: 500m
              memory: 400M
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /hooks
            name: hooks
        - command:
          - cp
          - /probes/readinessprobe
          - /opt/scripts/readinessprobe
          image: quay.io/mongodb/mongodb-kubernetes-readinessprobe:1.0.12
          imagePullPolicy: Always
          name: mongodb-agent-readinessprobe
          resources:
            limits:
              cpu: "1"
              memory: 500M
            requests:
              cpu: 500m
              memory: 400M
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /opt/scripts
            name: agent-scripts
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext:
          fsGroup: 2000
          runAsNonRoot: true
          runAsUser: 2000
        serviceAccount: mongodb-database
        serviceAccountName: mongodb-database
        terminationGracePeriodSeconds: 30
        volumes:
        - emptyDir: {}
          name: agent-scripts
        - name: automation-config
          secret:
            defaultMode: 416
            secretName: mongodb-config
        - emptyDir: {}
          name: healthstatus
        - emptyDir: {}
          name: hooks
        - emptyDir: {}
          name: mongodb-keyfile
        - emptyDir: {}
          name: tmp
    updateStrategy:
      type: RollingUpdate
    volumeClaimTemplates:
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        creationTimestamp: null
        name: data-volume
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 70G
        storageClassName: ebs-sc
        volumeMode: Filesystem
      status:
        phase: Pending
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        creationTimestamp: null
        name: logs-volume
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10G
        storageClassName: ebs-sc
        volumeMode: Filesystem
      status:
        phase: Pending
  status:
    availableReplicas: 0
    collisionCount: 0
    currentReplicas: 3
    currentRevision: mongodb-6847cc6f7
    observedGeneration: 4
    replicas: 3
    updateRevision: mongodb-6847cc6f7
    updatedReplicas: 3
- apiVersion: apps/v1
  kind: StatefulSet
  metadata:
    creationTimestamp: "2024-09-06T13:47:10Z"
    generation: 8
    name: mongodb-arb
    namespace: mongodb-XXX
    ownerReferences:
    - apiVersion: mongodbcommunity.mongodb.com/v1
      blockOwnerDeletion: true
      controller: true
      kind: MongoDBCommunity
      name: mongodb
      uid: 8dbc92a1-061b-4ebb-a2be-d1b5dd6d696b
    resourceVersion: "391081887"
    uid: 1267641d-8cfa-4d13-ae06-a79c5255facc
  spec:
    persistentVolumeClaimRetentionPolicy:
      whenDeleted: Retain
      whenScaled: Retain
    podManagementPolicy: OrderedReady
    replicas: 1
    revisionHistoryLimit: 10
    selector:
      matchLabels:
        app: mongodb-XXX
    serviceName: mongodb-XXX
    template:
      metadata:
        annotations:
          kubectl.kubernetes.io/restartedAt: "2024-09-09T07:45:41Z"
        creationTimestamp: null
        labels:
          app: mongodb-XXX
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: NodeGroup
                  operator: In
                  values:
                  - XXX
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - mongodb-XXX
                topologyKey: kubernetes.io/hostname
              weight: 100
        containers:
        - args:
          - ""
          command:
          - /bin/sh
          - -c
          - "\nif [ -e \"/hooks/version-upgrade\" ]; then\n\t#run post-start hook to handle version changes (if exists)\n /hooks/version-upgrade\nfi\n\n# wait for config and keyfile to be created by the agent\nwhile ! [ -f /data/automation-mongod.conf -a -f /var/lib/mongodb-mms-automation/authentication/keyfile ]; do sleep 3 ; done ; sleep 2 ;\n\n# start mongod with this configuration\nexec mongod -f /data/automation-mongod.conf;\n\n"
          env:
          - name: AGENT_STATUS_FILEPATH
            value: /healthstatus/agent-health-status.json
          image: docker.io/mongo:6.0.17
          imagePullPolicy: IfNotPresent
          name: mongod
          resources:
            limits:
              cpu: "1"
              memory: 2Gi
            requests:
              cpu: 500m
              memory: 1Gi
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /data
            name: data-volume
          - mountPath: /healthstatus
            name: healthstatus
          - mountPath: /hooks
            name: hooks
          - mountPath: /var/log/mongodb-mms-automation
            name: logs-volume
          - mountPath: /var/lib/mongodb-mms-automation/authentication
            name: mongodb-keyfile
          - mountPath: /tmp
            name: tmp
        - command:
          - /bin/bash
          - -c
          - |-
            current_uid=$(id -u)
            declare -r current_uid
            if ! grep -q "${current_uid}" /etc/passwd ; then
              sed -e "s/^mongodb:/builder:/" /etc/passwd > /tmp/passwd
              echo "mongodb:x:$(id -u):$(id -g):,,,:/:/bin/bash" >> /tmp/passwd
              export NSS_WRAPPER_PASSWD=/tmp/passwd
              export LD_PRELOAD=libnss_wrapper.so
              export NSS_WRAPPER_GROUP=/etc/group
            fi
            agent/mongodb-agent -healthCheckFilePath=/var/log/mongodb-mms-automation/healthstatus/agent-health-status.json -serveStatusPort=5000 -cluster=/var/lib/automation/config/cluster-config.json -skipMongoStart -noDaemonize -useLocalMongoDbTools -logFile /var/log/mongodb-mms-automation/automation-agent.log -logLevel INFO -maxLogFileDurationHrs 24
          env:
          - name: AGENT_STATUS_FILEPATH
            value: /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
          - name: AUTOMATION_CONFIG_MAP
            value: mongodb-config
          - name: HEADLESS_AGENT
            value: "true"
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
          image: quay.io/mongodb/mongodb-agent:12.0.15.7646-1
          imagePullPolicy: Always
          name: mongodb-agent
          readinessProbe:
            exec:
              command:
              - /opt/scripts/readinessprobe
            failureThreshold: 40
            initialDelaySeconds: 5
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            limits:
              cpu: "1"
              memory: 500M
            requests:
              cpu: 500m
              memory: 400M
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /opt/scripts
            name: agent-scripts
          - mountPath: /var/lib/automation/config
            name: automation-config
            readOnly: true
          - mountPath: /data
            name: data-volume
          - mountPath: /var/log/mongodb-mms-automation/healthstatus
            name: healthstatus
          - mountPath: /var/log/mongodb-mms-automation
            name: logs-volume
          - mountPath: /var/lib/mongodb-mms-automation/authentication
            name: mongodb-keyfile
          - mountPath: /tmp
            name: tmp
        dnsPolicy: ClusterFirst
        initContainers:
        - command:
          - cp
          - version-upgrade-hook
          - /hooks/version-upgrade
          image: quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook:1.0.6
          imagePullPolicy: Always
          name: mongod-posthook
          resources:
            limits:
              cpu: "1"
              memory: 500M
            requests:
              cpu: 500m
              memory: 400M
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /hooks
            name: hooks
        - command:
          - cp
          - /probes/readinessprobe
          - /opt/scripts/readinessprobe
          image: quay.io/mongodb/mongodb-kubernetes-readinessprobe:1.0.12
          imagePullPolicy: Always
          name: mongodb-agent-readinessprobe
          resources:
            limits:
              cpu: "1"
              memory: 500M
            requests:
              cpu: 500m
              memory: 400M
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /opt/scripts
            name: agent-scripts
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext:
          fsGroup: 2000
          runAsNonRoot: true
          runAsUser: 2000
        serviceAccount: XXX
        serviceAccountName: XXX
        terminationGracePeriodSeconds: 30
        volumes:
        - emptyDir: {}
          name: agent-scripts
        - name: automation-config
          secret:
            defaultMode: 416
            secretName: mongodb-config
        - emptyDir: {}
          name: healthstatus
        - emptyDir: {}
          name: hooks
        - emptyDir: {}
          name: mongodb-keyfile
        - emptyDir: {}
          name: tmp
    updateStrategy:
      type: RollingUpdate
    volumeClaimTemplates:
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        creationTimestamp: null
        name: data-volume
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 70G
        storageClassName: ebs-sc
        volumeMode: Filesystem
      status:
        phase: Pending
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        creationTimestamp: null
        name: logs-volume
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10G
        storageClassName: ebs-sc
        volumeMode: Filesystem
      status:
        phase: Pending
  status:
    availableReplicas: 1
    collisionCount: 0
    currentReplicas: 1
    currentRevision: mongodb-arb-5f6bc75bb8
    observedGeneration: 8
    readyReplicas: 1
    replicas: 1
    updateRevision: mongodb-arb-5f6bc75bb8
    updatedReplicas: 1
kind: List
metadata:
  resourceVersion: ""
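Since the template asks for the state of the containers, this is a generic way to pull per-container state out of a pod (pod name from the dumps above, namespace a placeholder):
kubectl get pod mongodb-arb-0 -n <namespace> -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.state}{"\n"}{end}'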
- The agent clusterconfig of the faulty members:
kubectl exec -it mongo-0 -c mongodb-agent -- cat /var/lib/automation/config/cluster-config.json
{"version":32,"processes":[
{"name":"mongodb-0","disabled":false,"hostname":"mongodb-0.mongodb-svc.mongodb-xxx.svc.cluster.local","args2_6":{"net":{"port":27017},"replication":{"replSetName":"mongodb"},"storage":{"dbPath":"/data","wiredTiger":{"engineConfig":{"journalCompressor":"zlib"}}}},"featureCompatibilityVersion":"6.0","processType":"mongod","version":"6.0.17","authSchemaVersion":5},
{"name":"mongodb-1","disabled":false,"hostname":"mongodb-1.mongodb-svc.mongodb-xxx.svc.cluster.local","args2_6":{"net":{"port":27017},"replication":{"replSetName":"mongodb"},"storage":{"dbPath":"/data","wiredTiger":{"engineConfig":{"journalCompressor":"zlib"}}}},"featureCompatibilityVersion":"6.0","processType":"mongod","version":"6.0.17","authSchemaVersion":5},
{"name":"mongodb-2","disabled":false,"hostname":"mongodb-2.mongodb-svc.mongodb-xxx.svc.cluster.local","args2_6":{"net":{"port":27017},"replication":{"replSetName":"mongodb"},"storage":{"dbPath":"/data","wiredTiger":{"engineConfig":{"journalCompressor":"zlib"}}}},"featureCompatibilityVersion":"6.0","processType":"mongod","version":"6.0.17","authSchemaVersion":5},
{"name":"mongodb-arb-0","disabled":false,"hostname":"mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local","args2_6":{"net":{"port":27017},"replication":{"replSetName":"mongodb"},"storage":{"dbPath":"/data","wiredTiger":{"engineConfig":{"journalCompressor":"zlib"}}}},"featureCompatibilityVersion":"6.0","processType":"mongod","version":"6.0.17","authSchemaVersion":5}],
"replicaSets":[{"_id":"mongodb","members":[{"_id":0,"host":"mongodb-0","arbiterOnly":false,"votes":1,"priority":1},{"_id":1,"host":"mongodb-1","arbiterOnly":false,"votes":1,"priority":1},{"_id":2,"host":"mongodb-2","arbiterOnly":false,"votes":1,"priority":1},{"_id":100,"host":"mongodb-arb-0","arbiterOnly":true,"votes":1,"priority":1}],"protocolVersion":"1","numberArbiters":1}],
"auth":{"usersWanted":[
{"mechanisms":[],"roles":[{"role":"clusterAdmin","db":"admin"},{"role":"userAdminAnyDatabase","db":"admin"}],"user":"admin-user","db":"admin","authenticationRestrictions":[],"scramSha256Creds":{"iterationCount":15000,"salt":"WioeMJQXT8w9Coif5Fq3gqV1OfeDi64Bvq/maw==","serverKey":"iK8SJLwCzmk95+mUePC3wrpGw29Tfx9vN+ZCKSMPKMM=","storedKey":"Z33GU9ix2W++nlnkFBIbYP7kEATZ/6sDVQqdhEd+tT0="},"scramSha1Creds":{"iterationCount":10000,"salt":"Q9mmbNXpyLDRtYmoln1xgA==","serverKey":"AmaNP+YmbrNf23l8URaZAZKKOz0=","storedKey":"0d8SscAfTMph+2aW416TXB1/UZw="}},
{"mechanisms":[],"roles":[{"role":"readWrite","db":"xxx"}],"user":"xxx-prod-user","db":"admin","authenticationRestrictions":[],"scramSha256Creds":{"iterationCount":15000,"salt":"GOkTsWgdrct5KSSQtTHC20myEJM76v5OMEGXOA==","serverKey":"ksqF9YIWnI50+bQJhjl0/zA1a0H0UpcNnzxnFEjciV4=","storedKey":"GyxjpwCp9hTHsK5CX2ObkIs73NP2zL1VrwbCQTLDvGE="},"scramSha1Creds":{"iterationCount":10000,"salt":"4RAcRyAxnRCQQhcHWRDA2w==","serverKey":"59KQ8PQV/rS4zxSuVca/tQbDNWw=","storedKey":"v68O4bx8u7/RNIks1WBvmXIJ+H8="}},
{"mechanisms":[],"roles":[{"role":"readWrite","db":"xxx"}],"user":"xxx-prod-user","db":"admin","authenticationRestrictions":[],"scramSha256Creds":{"iterationCount":15000,"salt":"vYX0jOTF0NvPPmpQm1oz/b7v1/sAnFOMWdm5Pg==","serverKey":"MWuSeUedUk33YD57g/pVw+kV89vQK8OmTibRLl2hR0U=","storedKey":"ffXuxQ5HTf0FcH2FdcNvKigWSO/TgPdF0elXk9iYX3E="},"scramSha1Creds":{"iterationCount":10000,"salt":"zV90H0Z2XJ8sCiupCSK3PQ==","serverKey":"IwdxN4BVrGSqyLPXDhrbZKFsbtc=","storedKey":"e5XxJTwdueUyUyJd3ioqQFuEKbc="}},
{"mechanisms":[],"roles":[{"role":"readWrite","db":"xxx"},{"role":"readWrite","db":"xxx"}],"user":"xxx-user","db":"admin","authenticationRestrictions":[],"scramSha256Creds":{"iterationCount":15000,"salt":"0HwrZJa9FIMCy5r4erCq7o2gb/RSaHofCV1XMw==","serverKey":"4FTg/fstci+8W6BZE4jfyXpLJr9/f4zsuDiKrLnBcgg=","storedKey":"lThVu1E2tv14Q7H58DYYNK1jlqXaIZDCp/Omp44wR1A="},"scramSha1Creds":{"iterationCount":10000,"salt":"gDxOqOLC16/e/WvhWSGDdA==","serverKey":"q6SKd30cOY+PFQnqRFMpdgmNTFA=","storedKey":"9swpmpNkjLofRRRprZWGrCBoolk="}},
{"mechanisms":[],"roles":[{"role":"dbAdmin","db":"admin"},{"role":"userAdminAnyDatabase","db":"admin"},{"role":"readWrite","db":"xxx"},{"role":"readWrite","db":"xxx"},{"role":"readWrite","db":"local"}],"user":"xxx-user","db":"admin","authenticationRestrictions":[],"scramSha256Creds":{"iterationCount":15000,"salt":"m40BqXb1jbVT4NIN9NjAdbTdQp84O6KtEbRjgA==","serverKey":"hztnbCDXJs0zBcwtaJsLquRtEgCHDykKj04SaQ3eLn8=","storedKey":"cftguOpTPNr4QYvL2XtrRlybPzi96CgzGoZ91EVZg2g="},"scramSha1Creds":{"iterationCount":10000,"salt":"SofiWm+P4s3RwvvIxflOOQ==","serverKey":"73knk0VrQPm6PWSYM5PFYwUK1lA=","storedKey":"wnYGbRIv2qPtcpv3j4r+lUX8x/4="}},
{"mechanisms":[],"roles":[{"role":"read","db":"cps"},{"role":"read","db":"xxx"},{"role":"changestreamrole","db":"admin"}],"user":"xxx-user","db":"admin","authenticationRestrictions":[],"scramSha256Creds":{"iterationCount":15000,"salt":"wzzV+4JFqwIGw1mAzRucb1oiIcYYR/gdcwZyJw==","serverKey":"Ty105G/oxXhrv9UwgIqqXHO7ZxM5LYW9T/mta7uiQYo=","storedKey":"63E/7kQy2g4O/MUd0a62q8pQNBtITkJ74dUagrRESO4="},"scramSha1Creds":{"iterationCount":10000,"salt":"R/PWGnO94tygxdNjIavdkQ==","serverKey":"Gr8vtcLPyb/pR/h8GRivOrY6/sE=","storedKey":"kwWqu/dnDzNTDOEu03TcgeDWxPY="}}],
"disabled":false,"authoritativeSet":false,"autoAuthMechanisms":["SCRAM-SHA-256"],"autoAuthMechanism":"SCRAM-SHA-256","deploymentAuthMechanisms":["SCRAM-SHA-256"],"autoUser":"mms-automation","key":"8tQDoV1eZdKJvpc7cA8rtu939Glj1IgsL9CNE1nf7SuZMFw8Te47PmhA9Z1NPi27cRw5+bs16kenEAPP82V7v51Xcv5Q9xPZKUxltKlc3t9cfq2Q7Il42DJsrjhQUhne5lKNghWLRHPSFVb8IHbuImgPcvu7mPz6VsYClu6Lno5ewW3ziIvilIW/2xvpxqG0qz4jvz5/cmtTWeNn7VJzNOYwYurWdFfvdUDL+Z+kQqcbsa95SSYA8217h6aKE2guOwlVpK0VZBYCPACg+ID1dARawAHG7xCA92lFttymLfgu8kbUXeW6RxBsgwz5iuOXjiIrm8XpWhpWHLJNplf5YaGsqBMIbRlH3tAXGv6auqLaiGup3+kQXDJNwC7Juaa5F0FGXg+PdQPMOH4xv2SZy0zGHh988CaEtXhBVWiQ06FhnNWyxziLCl8BGJpCbD2bsGiiUBcUHvxkCybARhguLYdnS60+tlJcMIr3rpt7MTgRuHhwki0gX1KcVmEe+tPeg57RdqcVcEEqqHYwc4Ghkk/PF/10BlsO0NiUZmJxZqow7ffSRZHtZ/VKW2og6CZp2V3BaYZmzYwHn5XFFRCDNUu8mbwvtySQVSlVVY4GbKRkgepYsWrYGc20yPH7Hzni9b8N0zCmX8HPy5icn8+jf4z7BRw=","keyfile":"/var/lib/mongodb-mms-automation/authentication/keyfile","keyfileWindows":"%SystemDrive%\\MMSAutomation\\versions\\keyfile","autoPwd":"eX-rXNR2PB_ytwagyylk"},
"tls":{"CAFilePath":"","clientCertificateMode":"OPTIONAL"},
"mongoDbVersions":[{"name":"6.0.17","builds":[{"platform":"linux","url":"","gitVersion":"","architecture":"amd64","flavor":"rhel","minOsVersion":"","maxOsVersion":"","modules":[]},{"platform":"linux","url":"","gitVersion":"","architecture":"amd64","flavor":"ubuntu","minOsVersion":"","maxOsVersion":"","modules":[]},{"platform":"linux","url":"","gitVersion":"","architecture":"aarch64","flavor":"ubuntu","minOsVersion":"","maxOsVersion":"","modules":[]},{"platform":"linux","url":"","gitVersion":"","architecture":"aarch64","flavor":"rhel","minOsVersion":"","maxOsVersion":"","modules":[]}]}],
"backupVersions":[],"monitoringVersions":[],"options":{"downloadBase":"/var/lib/mongodb-mms-automation"}}
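The relevant part of this config is the desired replica set membership, where the arbiter appears with _id 100, arbiterOnly true, and one vote. Assuming jq is available on the workstation, it can be extracted with:
kubectl exec -it mongodb-0 -c mongodb-agent -- cat /var/lib/automation/config/cluster-config.json | jq '.replicaSets[0].members'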
- The agent health status of the faulty members:
kubectl exec -it mongo-0 -c mongodb-agent -- cat /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
{"statuses":{"mongodb-arb-0":{"IsInGoalState":false,"LastMongoUpTime":1725903583,"ExpectedToBeUp":true,"ReplicationStatus":-1}},
"mmsStatus":{"mongodb-arb-0":{"name":"mongodb-arb-0","lastGoalVersionAchieved":-1,"plans":[{"automationConfigVersion":32,"started":"2024-09-09T17:37:58.413084367Z","completed":null,"moves":[
{"move":"Start","moveDoc":"Start the process","steps":[{"step":"StartFresh","stepDoc":"Start a mongo instance (start fresh)","isWaitStep":false,"started":"2024-09-09T17:37:58.413103457Z","completed":"2024-09-09T17:38:02.701441838Z","result":"success"}]},
{"move":"WaitRsInit","moveDoc":"Wait for the replica set to be initialized by another member","steps":[{"step":"WaitRsInit","stepDoc":"Wait for the replica set to be initialized by another member","isWaitStep":true,"started":"2024-09-09T17:38:02.701493829Z","completed":null,"result":"wait"}]},
{"move":"WaitFeatureCompatibilityVersionCorrect","moveDoc":"Wait for featureCompatibilityVersion to be right","steps":[{"step":"WaitFeatureCompatibilityVersionCorrect","stepDoc":"Wait for featureCompatibilityVersion to be right","isWaitStep":true,"started":null,"completed":null,"result":""}]}]}],"errorCode":0,"errorString":""}}}
- The verbose agent logs of the faulty members:
kubectl exec -it mongo-0 -c mongodb-agent -- cat /var/log/mongodb-mms-automation/automation-agent-verbose.log
[2024-09-09T17:40:52.041+0000] [.info] [src/director/director.go:tracef:806] <mongodb-arb-0> [17:40:52.041] because
[All the following are true:
['currentState.Up' = true]
['currentState.CanRsInit' = false]
['desiredState.ReplSetConf' != <nil> ('desiredState.ReplSetConf' = ReplSetConfig{id=mongodb,version=0,commitmentStatus=false,configsvr=false,protocolVersion=1,forceProtocolVersion=false,writeConcernMajorityJournalDefault=,members={id:0,HostPort:mongodb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017,ArbiterOnly:falsePriority:1,Hidden:false,SecondaryDelaySecs:0,Votes:1,Tags:map[]},{id:1,HostPort:mongodb-1.mongodb-svc.mongodb-xxx.svc.cluster.local:27017,ArbiterOnly:falsePriority:1,Hidden:false,SecondaryDelaySecs:0,Votes:1,Tags:map[]},{id:2,HostPort:mongodb-2.mongodb-svc.mongodb-xxx.svc.cluster.local:27017,ArbiterOnly:falsePriority:1,Hidden:false,SecondaryDelaySecs:0,Votes:1,Tags:map[]},{id:100,HostPort:mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017,ArbiterOnly:truePriority:1,Hidden:false,SecondaryDelaySecs:0,Votes:1,Tags:map[]},settings=map[]})]
['currentState.ReplSetConf' = <nil>]
]
[2024-09-09T17:40:52.041+0000] [.info] [src/director/director.go:planAndExecute:575] <mongodb-arb-0> [17:40:52.041] Step=WaitRsInit as part of Move=WaitRsInit in plan failed : <mongodb-arb-0> [17:40:52.041] Postcondition not yet met for step WaitRsInit because
['currentState.ReplSetConf' = <nil>].
Recomputing a plan...
[2024-09-09T17:40:52.362+0000] [.warn] [metrics/collector/util.go:getPingStatus:84] <hardwareMetricsCollector> [17:40:52.362] Failed to fetch replStatus for mongodb-arb-0 : <hardwareMetricsCollector> [17:40:52.362] Error executing WithClientFor() for cp=mongodb-arb-0.mongodb-svc.mongodb-surplus.svc.cluster.local:27017 (local=false) connectMode=SingleConnect : <hardwareMetricsCollector> [17:40:52.362] Error running command for runCommandWithTimeout(dbName=admin, cmd=[{replSetGetStatus 1}]) : result={} identityUsed=__system@local[[MONGODB-CR/SCRAM-SHA-1 SCRAM-SHA-256]][668] : (NotYetInitialized) no replset config has been received
[2024-09-09T17:40:52.678+0000] [.info] [src/config/config.go:ReadClusterConfig:433] [17:40:52.678] Retrieving cluster config from /var/lib/automation/config/cluster-config.json...
[2024-09-09T17:40:52.678+0000] [.info] [main/components/agent.go:LoadClusterConfig:277] [17:40:52.678] clusterConfig unchanged
[2024-09-09T17:40:53.072+0000] [.info] [src/mongoclientservice/mongoclientservice.go:func1:1619] [17:40:53.072] Testing auth with username __system db=local to mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) connectMode=SingleConnect ipversion=0 tls=false
[2024-09-09T17:40:53.081+0000] [.info] [src/mongoctl/processctl.go:GetKeyHashes:2080] <mongodb-arb-0> [17:40:53.081] Able to successfully auth to mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) using desired auth key
[2024-09-09T17:40:53.108+0000] [.info] [src/mongoctl/processctl.go:Update:3555] <mongodb-arb-0> [17:40:53.108] <DB_WRITE> Updated with query map[] and update [{$set [{agentFeatures [StateCache]} {nextVersion 32}]}] and upsert=true on local.clustermanager
[2024-09-09T17:40:53.125+0000] [.info] [src/director/director.go:computePlan:278] <mongodb-arb-0> [17:40:53.125] ... process has a plan : WaitRsInit,WaitFeatureCompatibilityVersionCorrect
[2024-09-09T17:40:53.125+0000] [.info] [src/director/director.go:tracef:806] <mongodb-arb-0> [17:40:53.125] Running step: 'WaitRsInit' of move 'WaitRsInit'
[2024-09-09T17:40:53.125+0000] [.info] [src/director/director.go:tracef:806] <mongodb-arb-0> [17:40:53.125] because
[All the following are true:
['currentState.Up' = true]
['currentState.CanRsInit' = false]
['desiredState.ReplSetConf' != <nil> ('desiredState.ReplSetConf' = ReplSetConfig{id=mongodb,version=0,commitmentStatus=false,configsvr=false,protocolVersion=1,forceProtocolVersion=false,writeConcernMajorityJournalDefault=,members={id:0,HostPort:mongodb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017,ArbiterOnly:falsePriority:1,Hidden:false,SecondaryDelaySecs:0,Votes:1,Tags:map[]},{id:1,HostPort:mongodb-1.mongodb-svc.mongodb-xxx.svc.cluster.local:27017,ArbiterOnly:falsePriority:1,Hidden:false,SecondaryDelaySecs:0,Votes:1,Tags:map[]},{id:2,HostPort:mongodb-2.mongodb-svc.mongodb-xxx.svc.cluster.local:27017,ArbiterOnly:falsePriority:1,Hidden:false,SecondaryDelaySecs:0,Votes:1,Tags:map[]},{id:100,HostPort:mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017,ArbiterOnly:truePriority:1,Hidden:false,SecondaryDelaySecs:0,Votes:1,Tags:map[]},settings=map[]})]
['currentState.ReplSetConf' = <nil>]
]
[2024-09-09T17:40:53.125+0000] [.info] [src/director/director.go:planAndExecute:575] <mongodb-arb-0> [17:40:53.125] Step=WaitRsInit as part of Move=WaitRsInit in plan failed : <mongodb-arb-0> [17:40:53.125] Postcondition not yet met for step WaitRsInit because
['currentState.ReplSetConf' = <nil>].
Recomputing a plan...
[2024-09-09T17:40:53.364+0000] [.warn] [metrics/collector/util.go:getPingStatus:84] <hardwareMetricsCollector> [17:40:53.364] Failed to fetch replStatus for mongodb-arb-0 : <hardwareMetricsCollector> [17:40:53.364] Error executing WithClientFor() for cp=mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) connectMode=SingleConnect : <hardwareMetricsCollector> [17:40:53.364] Error running command for runCommandWithTimeout(dbName=admin, cmd=[{replSetGetStatus 1}]) : result={} identityUsed=__system@local[[MONGODB-CR/SCRAM-SHA-1 SCRAM-SHA-256]][668] : (NotYetInitialized) no replset config has been received
[2024-09-09T17:40:53.718+0000] [.info] [src/config/config.go:ReadClusterConfig:433] [17:40:53.718] Retrieving cluster config from /var/lib/automation/config/cluster-config.json...
[2024-09-09T17:40:53.718+0000] [.info] [main/components/agent.go:LoadClusterConfig:277] [17:40:53.718] clusterConfig unchanged
[2024-09-09T17:40:54.157+0000] [.info] [src/mongoclientservice/mongoclientservice.go:func1:1619] [17:40:54.157] Testing auth with username __system db=local to mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) connectMode=SingleConnect ipversion=0 tls=false
[2024-09-09T17:40:54.166+0000] [.info] [src/mongoctl/processctl.go:GetKeyHashes:2080] <mongodb-arb-0> [17:40:54.166] Able to successfully auth to mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) using desired auth key
[2024-09-09T17:40:54.191+0000] [.info] [src/mongoctl/processctl.go:Update:3555] <mongodb-arb-0> [17:40:54.191] <DB_WRITE> Updated with query map[] and update [{$set [{agentFeatures [StateCache]} {nextVersion 32}]}] and upsert=true on local.clustermanager
[2024-09-09T17:40:54.203+0000] [.info] [src/director/director.go:computePlan:278] <mongodb-arb-0> [17:40:54.203] ... process has a plan : WaitRsInit,WaitFeatureCompatibilityVersionCorrect
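These entries say the arbiter is waiting because it sees no replica set config yet ('currentState.ReplSetConf' = <nil>), i.e. the existing members never reconfigured it into the replica set. As a cross-check from a data-bearing member, something along these lines should show whether mongodb-arb-0 appears in the live config (SCRAM is enabled, so admin credentials are required; the user and password here are placeholders):
kubectl exec -it mongodb-0 -c mongod -- mongosh --quiet -u admin-user -p '<password>' --authenticationDatabase admin --eval 'rs.conf().members'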
- You might not have the verbose ones, in that case the non-verbose agent logs:
kubectl exec -it mongo-0 -c mongodb-agent -- cat /var/log/mongodb-mms-automation/automation-agent.log
[2024-09-09T17:37:58.248+0000] [header.info] [::0] GitCommitId = 25bb5320d7087c7aa24eb6118df217a028238723
[2024-09-09T17:37:58.248+0000] [header.info] [::0] AutomationVersion = 12.0.15.7646
[2024-09-09T17:37:58.248+0000] [header.info] [::0] localhost = mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local
[2024-09-09T17:37:58.249+0000] [header.info] [::0] ErrorStateSleepTime = 10s
[2024-09-09T17:37:58.249+0000] [header.info] [::0] GoalStateSleepTime = 10s
[2024-09-09T17:37:58.249+0000] [header.info] [::0] NotGoalStateSleepTime = 1s
[2024-09-09T17:37:58.249+0000] [header.info] [::0] PlanCutoffTime = 300000
[2024-09-09T17:37:58.249+0000] [header.info] [::0] TracePlanner = false
[2024-09-09T17:37:58.249+0000] [header.info] [::0] User = mongodb
[2024-09-09T17:37:58.249+0000] [header.info] [::0] Go version = go1.18.5
[2024-09-09T17:37:58.249+0000] [header.info] [::0] MmsBaseURL =
[2024-09-09T17:37:58.249+0000] [header.info] [::0] MmsGroupId =
[2024-09-09T17:37:58.249+0000] [header.info] [::0] HttpProxy =
[2024-09-09T17:37:58.249+0000] [header.info] [::0] DisableHttpKeepAlive = false
[2024-09-09T17:37:58.249+0000] [header.info] [::0] HttpsCAFile =
[2024-09-09T17:37:58.249+0000] [header.info] [::0] TlsRequireValidMMSServerCertificates = true
[2024-09-09T17:37:58.249+0000] [header.info] [::0] TlsMMSServerClientCertificate =
[2024-09-09T17:37:58.249+0000] [header.info] [::0] KMIPProxyCertificateDir = /tmp
[2024-09-09T17:37:58.249+0000] [header.info] [::0] EnableLocalConfigurationServer = false
[2024-09-09T17:37:58.249+0000] [header.info] [::0] DialTimeoutSeconds = 40
[2024-09-09T17:37:58.249+0000] [header.info] [::0] KeepUnusedMongodbVersions = false
[2024-09-09T17:37:58.249+0000] [header.info] [::0] DisallowDowngrades = false
[2024-09-09T17:37:59.378+0000] [.error] [src/mongoctl/processctl.go:RunCommand:1120] <hardwareMetricsCollector> [17:37:59.378] Server at mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) is down
[2024-09-09T17:37:59.479+0000] [.error] [src/mongoctl/processctl.go:RunCommand:1120] <hardwareMetricsCollector> [17:37:59.479] Server at mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) is down
[2024-09-09T17:38:00.430+0000] [.error] [src/mongoctl/processctl.go:RunCommand:1120] <hardwareMetricsCollector> [17:38:00.430] Server at mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) is down
[2024-09-09T17:38:00.531+0000] [.error] [src/mongoctl/processctl.go:RunCommand:1120] <hardwareMetricsCollector> [17:38:00.531] Server at mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) is down
[2024-09-09T17:38:01.461+0000] [.error] [src/mongoctl/processctl.go:RunCommand:1120] <hardwareMetricsCollector> [17:38:01.461] Server at mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) is down
[2024-09-09T17:38:01.569+0000] [.error] [src/mongoctl/processctl.go:RunCommand:1120] <hardwareMetricsCollector> [17:38:01.569] Server at mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) is down
[2024-09-09T17:38:02.385+0000] [.error] [src/mongoctl/processctl.go:RunCommand:1120] <hardwareMetricsCollector> [17:38:02.385] Server at mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) is down
[2024-09-09T17:38:02.487+0000] [.error] [src/mongoctl/processctl.go:RunCommand:1120] <hardwareMetricsCollector> [17:38:02.487] Server at mongodb-arb-0.mongodb-svc.mongodb-xxx.svc.cluster.local:27017 (local=false) is down
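For completeness, when collecting these excerpts I filter the agent log down to warnings and errors with a plain grep (the path matches the log mount shown in the StatefulSet above):
kubectl exec -it mongodb-arb-0 -c mongodb-agent -- grep -E '\[\.(error|warn)\]' /var/log/mongodb-mms-automation/automation-agent.log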