K8s error summary
# Problems encountered while using K8s
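# 1. The connection to the server localhost:8080 was refused - did you specify the right host or port?
Analysis
Cause (environment variable): kubectl on this machine is not bound to the Kubernetes master, because no binding was set up when the cluster was initialized. Setting the KUBECONFIG environment variable on this machine resolves the problem.
Fix: Step 1: set the environment variable
## Adjust to your setup (on a kubeadm control-plane node the admin kubeconfig is usually /etc/kubernetes/admin.conf); the Linux steps are recorded here
ll /etc/kubernetes/kubelet.conf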
-rw------- 1 root root 1906 1月 15 09:52 /etc/kubernetes/kubelet.conf
## Option 1: edit the file
vim /etc/profile
>> Append the new environment variable at the bottom: export KUBECONFIG=/etc/kubernetes/kubelet.conf
## Option 2: append the line directly
echo "export KUBECONFIG=/etc/kubernetes/kubelet.conf" >> /etc/profile
Step 2: apply the change
source /etc/profile
Fix for AWS EKS:
aws eks update-kubeconfig --region cn-north-1 --name <cluster_name>
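Running the echo command from Option 2 more than once leaves duplicate lines in /etc/profile. A minimal sketch of an idempotent variant follows; the function name add_kubeconfig_export and its default arguments are illustrative assumptions, not part of the original notes.

```shell
# Append the KUBECONFIG export to a profile file only if it is not already there.
# $1: profile file (assumed default: /etc/profile)
# $2: kubeconfig path (assumed default: /etc/kubernetes/kubelet.conf)
add_kubeconfig_export() {
    profile=${1:-/etc/profile}
    kubeconfig=${2:-/etc/kubernetes/kubelet.conf}
    line="export KUBECONFIG=$kubeconfig"
    # -x: match the whole line, -F: fixed string, -q: no output
    if ! grep -qxF "$line" "$profile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$profile"
    fi
}
```

Call it as add_kubeconfig_export /etc/profile /etc/kubernetes/kubelet.conf, then run source /etc/profile as in Step 2.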
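# 2. INSTALLATION FAILED: cannot re-use a name that is still in use
#Running helm install reports: Error: INSTALLATION FAILED: cannot re-use a name that is still in use
Fix: list the releases to find the one holding the name, then remove it. The commands below delete and recreate the whole namespace, which removes everything in it; helm uninstall <release> -n <namespace> is the narrower option.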
helm ls --all-namespaces
kubectl delete namespace qsh-test
kubectl create namespace qsh-test
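Before installing, it can help to check whether the release name is already taken. The sketch below assumes the list of release names arrives on stdin, one per line; release_in_use and my-release are hypothetical names used for illustration.

```shell
# Succeeds if the release name given as $1 appears in the list read from stdin
# (one release name per line).
release_in_use() {
    grep -qxF -- "$1"
}

# Usage (needs helm and a reachable cluster, so it is left commented out):
# if helm ls --all --all-namespaces -q | release_in_use my-release; then
#     echo "my-release is already in use; uninstall it or pick another name" >&2
# fi
```

The --all flag is included so that failed or pending releases, which also block the name, show up in the list.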
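# 3. Pod cannot be deleted
When deleting Kubernetes resources such as a namespace or pod, the resource sometimes gets stuck in the Terminating state and cannot be deleted for a long time, occasionally even after adding the --force flag. In that case, edit the resource and set its finalizers field to null; Kubernetes then deletes the resource normally.
When deleting a pod, it sometimes hangs: the pod status changes to Terminating and the pod cannot be removed.
Force delete:
kubectl delete pod xxx -n xxx --force --grace-period=0
If force deletion still fails, set finalizers to null. (When a container is already running and some of its properties need to change, but you do not want to delete it or cannot conveniently update it with replace, Kubernetes provides the patch command for modifying the resource in place while it runs.)
kubectl patch pod xxx -n xxx -p '{"metadata":{"finalizers":null}}'
The pod can then be deleted.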
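To find every stuck pod at once, the output of kubectl get pods can be filtered on its STATUS column. This is a sketch that assumes the default column layout of kubectl get pods --all-namespaces --no-headers (NAMESPACE, NAME, READY, STATUS, ...); terminating_pods is a hypothetical helper name.

```shell
# Reads `kubectl get pods --all-namespaces --no-headers` output on stdin and
# prints "<namespace> <pod>" for each pod whose STATUS column is Terminating.
terminating_pods() {
    awk '$4 == "Terminating" { print $1, $2 }'
}

# Usage (needs a cluster, so it is left commented out):
# kubectl get pods --all-namespaces --no-headers | terminating_pods |
# while read -r ns pod; do
#     kubectl delete pod "$pod" -n "$ns" --force --grace-period=0
# done
```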
# 4. Namespace cannot be deleted
unable to create new content in namespace posthog because it is being terminated
Symptom:
## The namespace stays in the Terminating state
[ec2-user@eks posthog]$ kubectl get ns -owide
NAME STATUS AGE
default Active 3d
kube-node-lease Active 3d
kube-public Active 3d
kube-system Active 3d
posthog Terminating 3h23m
## The force-delete command hangs indefinitely
[ec2-user@eks posthog]$ kubectl delete ns posthog --grace-period=0 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
namespace "posthog" force deleted
Fix:
## Export the definition of the posthog namespace
kubectl get ns posthog -o json > ns-posthog.json
## Remove the contents of spec
### Before:
"spec": {
"finalizers": [
"kubernetes"
]
},
### After:
"spec": {
},
## In a new terminal window, run kubectl proxy to expose the API locally on port 8081
kubectl proxy --port=8081
## Submit the edited definition to the finalize endpoint with curl
curl -k -H "Content-Type:application/json" -X PUT --data-binary @ns-posthog.json http://127.0.0.1:8081/api/v1/namespaces/posthog/finalize
## Check again; the namespace is gone
kubectl get ns
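The manual edit of ns-posthog.json can be scripted. The sketch below assumes the pretty-printed layout that kubectl get -o json produces, with finalizers as the only key under spec and its array spread over several lines; strip_spec_finalizers is a hypothetical helper name. Where jq is installed, jq 'del(.spec.finalizers)' is a more robust alternative.

```shell
# Reads a namespace definition (pretty-printed JSON) on stdin and drops the
# multi-line "finalizers": [ ... ] array, leaving an empty "spec": { }.
# Assumption: finalizers is the only key inside spec, as in the example above.
strip_spec_finalizers() {
    sed '/"finalizers": \[/,/]/d'
}

# Usage (needs a cluster, so it is left commented out):
# kubectl get ns posthog -o json | strip_spec_finalizers > ns-posthog.json
```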
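# 5. PV cannot be deleted
The cluster contains a PV that is no longer in use. Although the Pod and PVC associated with it have been deleted and a delete command has been issued for the PV, it still cannot be removed and stays in the Terminating state:
Fix:
## Run the following command to force the deletion (replace efs-pv with the name of the PV to delete):
kubectl patch pv efs-pv -p '{"metadata":{"finalizers":null}}'
## Check again; the PV has been deleted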
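The same finalizers patch is used for pods and PVs, so it can be convenient to print the command for review before running it. This is a small sketch; finalizer_patch_cmd is a hypothetical helper that only prints the command and never contacts the cluster.

```shell
# Prints the kubectl patch command that clears metadata.finalizers.
# $1: resource kind (e.g. pv, pod)  $2: resource name  $3: optional namespace
finalizer_patch_cmd() {
    kind=$1
    name=$2
    ns=${3:-}
    patch='{"metadata":{"finalizers":null}}'
    if [ -n "$ns" ]; then
        printf "kubectl patch %s %s -n %s -p '%s'\n" "$kind" "$name" "$ns" "$patch"
    else
        printf "kubectl patch %s %s -p '%s'\n" "$kind" "$name" "$patch"
    fi
}
```

For example, finalizer_patch_cmd pv efs-pv prints the command shown above; cluster-scoped resources such as PVs take no namespace argument.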
# 6. Error when creating nginx-ingress-controller
Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists. Unable to continue with install: IngressClass "nginx" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "k8s-nginx"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "nginx-ingress-controller"
helm.go:84: [debug] IngressClass "nginx" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "k8s-nginx"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "nginx-ingress-controller"
rendered manifests contain a resource that already exists. Unable to continue with install
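Root cause: the error occurs when creating nginx-ingress-controller with helm. The chart tries to create an IngressClass named nginx, but one already exists in the cluster and is not owned by this helm release.
Check the values.yaml file in the helm chart repository
#... ... ...
## Look at the following fields
ingressClassResource:
  name: nginx
  enabled: true
  default: false
  controllerClass: "k8s.io/ingress-nginx"
  parameters: {}
#... ... ...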
Fix: install with a different IngressClass name
helm install k8s-nginx mynginx/nginx-ingress-controller -n nginx-ingress-controller --create-namespace --set ingressClassResource.name="nginx-new"
If that has no effect, use the following command instead:
helm install k8s-nginx mynginx/nginx-ingress-controller -n nginx-ingress-controller --create-namespace --set controller.ingressClassResource.name="nginx-new"
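Whether the rename is needed can be decided from the IngressClass objects already in the cluster. A minimal sketch, assuming the existing names arrive on stdin either bare or in the ingressclass.networking.k8s.io/<name> form that kubectl get -o name prints; pick_ingress_class is a hypothetical helper.

```shell
# Prints $1 if no existing IngressClass has that name, otherwise prints $2.
# Existing names are read from stdin, one per line.
pick_ingress_class() {
    wanted=$1
    fallback=$2
    while IFS= read -r line; do
        # Strip any "<resource>/" prefix before comparing.
        if [ "${line##*/}" = "$wanted" ]; then
            printf '%s\n' "$fallback"
            return 0
        fi
    done
    printf '%s\n' "$wanted"
}

# Usage (needs a cluster, so it is left commented out):
# class=$(kubectl get ingressclass -o name | pick_ingress_class nginx nginx-new)
# helm install k8s-nginx mynginx/nginx-ingress-controller -n nginx-ingress-controller \
#     --create-namespace --set ingressClassResource.name="$class"
```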
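# Problems encountered while deploying an etcd cluster
# 1. etcd.service fails to start, reporting a problem with --logger=zap
Fix: edit the configuration file, remove that parameter, and restart the service.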
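# 2. publish error: etcdserver: request timed out, caused by the etcd cluster members not being started at the same time
Fix: on the nodes where etcd is deployed, start the etcd service at the same time: systemctl start etcd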
# 3. error #1: dial tcp 127.0.0.1:2379: connect: connection refused, caused by ETCD_LISTEN_CLIENT_URLS not including 127.0.0.1:2379
Fix: add https://127.0.0.1:2379 to ETCD_LISTEN_CLIENT_URLS, or change the listen address to 0.0.0.0:2379.
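A quick way to confirm the setting is to test the configured value for a loopback or wildcard entry. This is a sketch; listens_on_loopback is a hypothetical helper, and 172.16.0.10 in the usage line is a made-up node address.

```shell
# Succeeds if the comma-separated URL list in $1 (the value of
# ETCD_LISTEN_CLIENT_URLS) contains 127.0.0.1:2379 or 0.0.0.0:2379.
listens_on_loopback() {
    case ",$1," in
        *//127.0.0.1:2379,*|*//0.0.0.0:2379,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
# listens_on_loopback "https://172.16.0.10:2379,https://127.0.0.1:2379" && echo ok
```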
# 4. error #1: dial tcp 127.0.0.1:4001: connect: connection refused, because older etcd versions listened on the legacy port 4001
Fix: add https://127.0.0.1:2379 to ETCD_LISTEN_CLIENT_URLS, or change the listen address to 0.0.0.0:2379.
# 5. error #1: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02", because the listen address in the configuration was written as http://
Fix: change the listen address to https://
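The bytes \x15\x03\x01 at the start of the response are the header of a TLS record, which suggests that plain HTTP and TLS endpoints were mixed. The sketch below lists any URL in a comma-separated setting that does not use https://; non_https_urls is a hypothetical helper name.

```shell
# Prints every URL in the comma-separated list $1 that does not start with
# https://. Prints nothing when all URLs use https.
non_https_urls() {
    printf '%s\n' "$1" | tr ',' '\n' | grep -v -e '^https://' -e '^$'
}

# Usage:
# non_https_urls "$ETCD_LISTEN_CLIENT_URLS"
```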
# Errors encountered with kube-apiserver.service
# 1. error: unable to find suitable network address.error='no default routes found in "/proc/net/route" or "/proc/net/ipv6_route"'. Tr… to fix this, caused by no default gateway route being configured
Fix:
route add default gw 172.16.0.1
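The error message refers to /proc/net/route, so the presence of a default route can be checked there directly before and after adding the gateway. A minimal sketch; has_default_route is a hypothetical helper, and the file argument exists only so the check can be tried on a copy.

```shell
# Succeeds if the routing table file contains an IPv4 default route,
# i.e. a row whose Destination column (hex) is 00000000.
# $1: route file (assumed default: /proc/net/route)
has_default_route() {
    route_file=${1:-/proc/net/route}
    # NR > 1 skips the header line; column 2 is the destination.
    awk 'NR > 1 && $2 == "00000000" { found = 1 } END { exit (found ? 0 : 1) }' "$route_file"
}

# Usage:
# has_default_route || echo "no default route configured" >&2
```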
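# 2. error: --etcd-servers must be specified
Fix: run sudo journalctl -xe -u kube-apiserver | more to look for further error details. Nothing appeared beyond error: --etcd-servers must be specified. Running the command from the systemd unit by hand started the service normally, which suggested that the configuration file contained a bad character. After rewriting the configuration, the service started normally.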
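Stray characters such as carriage returns or non-ASCII quotes are hard to spot by eye. The sketch below flags lines containing anything other than printable ASCII and tabs; find_odd_chars is a hypothetical helper, and the kind of bad character involved in the original incident is not known.

```shell
# Prints (with line numbers) every line of file $1 that contains a byte
# outside printable ASCII and tab, e.g. a carriage return or a curly quote.
# Returns non-zero when the file is clean.
find_odd_chars() {
    tab=$(printf '\t')
    LC_ALL=C grep -n "[^[:print:]${tab}]" "$1"
}

# Usage (the path is a placeholder for your kube-apiserver config file):
# find_odd_chars /path/to/kube-apiserver.conf
```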
# 3. watch chan error: etcdserver: mvcc: required revision has been compacted, caused by an etcd version issue; it does not affect functionality
Fix: install the matching version of etcd.
# Errors encountered while deploying kubelet and kube-proxy
# 1. failed complete: v1alpha1.KubeProxyConfiguration.ClientConnection
failed complete: v1alpha1.KubeProxyConfiguration.ClientConnection: readObjectStart: expect { or n, but found ", error found in #10 byte of …|nection":"kubeconfig|…, bigger context …|pha1","bindAddress":"0.0.0.0","clientConnection":"kubeconfig:/data/kubernetes/cfg/kube-proxy.kubecon|…
Fix: check that the YAML file is well-formed. In a YAML configuration file, ":" and "-" must be followed by a space.
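In the error above, kubeconfig:/data/... was parsed as a single string because the space after the colon was missing. A rough heuristic for spotting such lines is sketched below; yaml_missing_space is a hypothetical helper, and it can produce false positives (for example on a list item that is a URL).

```shell
# Prints (with line numbers) lines of file $1 where a leading key is followed
# by ":" and then immediately by a non-space character, e.g. "key:value".
yaml_missing_space() {
    grep -nE '^[[:space:]]*(- )?[A-Za-z0-9_.-]+:[^[:space:]]' "$1"
}

# Usage (the path is a placeholder for your kube-proxy config file):
# yaml_missing_space /path/to/kube-proxy-config.yml
```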
# 2. network plugin is not ready: cni config uninitialized
Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized, because no CNI plugin is installed
Fix: edit the kubelet.conf configuration file to remove the --network-plugin=cni parameter and restart the service, or download and install a CNI plugin.
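Whether a CNI configuration has been written can be checked before restarting kubelet. This sketch assumes the conventional configuration directory /etc/cni/net.d; cni_config_present is a hypothetical helper name.

```shell
# Succeeds if the directory contains at least one CNI configuration file.
# $1: CNI config directory (assumed default: /etc/cni/net.d)
cni_config_present() {
    dir=${1:-/etc/cni/net.d}
    for f in "$dir"/*.conf "$dir"/*.conflist "$dir"/*.json; do
        # An unmatched glob stays literal, so test that the file exists.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Usage:
# cni_config_present || echo "no CNI config found in /etc/cni/net.d" >&2
```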
# 3. failed to get imageFs info: unable to find data in memory cache
Found in the error log: E0927 15:38:12.475997 16586 kubelet.go:1308] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data in memory cache
Fix:
yum -y upgrade systemd
# 4. failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to f…
Fix: swap was not disabled on the node. swapoff -a disables it temporarily; disabling it permanently is better.
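One common way to make the change permanent is to comment out the swap entries in /etc/fstab so that swap is not re-enabled at boot. A minimal sketch follows; comment_out_swap is a hypothetical helper, and it keeps a .bak copy of the file it edits.

```shell
# Comments out every active fstab line whose filesystem type is swap.
# $1: fstab file (assumed default: /etc/fstab); a backup is saved as <file>.bak
comment_out_swap() {
    fstab=${1:-/etc/fstab}
    # Lines already starting with '#' are skipped, so reruns are harmless.
    sed -i.bak -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$fstab"
}

# Usage (run as root):
# swapoff -a && comment_out_swap /etc/fstab
```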