k8s报错小结

# K8S使用过程遇到的问题

# 1、The connection to the server localhost:8080 was refused - did you specify the right host or port?

问题分析

环境变量原因：kubernetes master没有与本机绑定，集群初始化的时候没有绑定，此时设置在本机的环境变量即可解决问题。

解决： 步骤一：设置环境变量

##具体根据情况，此处记录linux设置该环境变量
ll/etc/kubernetes/kubelet.conf
-rw------- 1 root root 1906 1月  15 09:52 /etc/kubernetes/kubelet.conf

##方式一：编辑文件设置
vim /etc/profile
>>在底部增加新的环境变量 export KUBECONFIG=/etc/kubernetes/kubelet.conf

##方式二:直接追加文件内容
echo "export KUBECONFIG=/etc/kubernetes/kubelet.conf" >> /etc/profile

1
2
3
4
5
6
7
8
9
10

步骤二：使生效

source /etc/profile

AWS EKS 解决方法：

aws eks update-kubeconfig --region cn-north-1 --name <cluster_name>

# 2、INSTALLATION FAILED: cannot re-use a name that is still in use

#执行helm安装时报错 Error: INSTALLATION FAILED: cannot re-use a name that is still in use

解决：

helm ls --all-namespaces
kubectl delete namespace qsh-test
kubectl create namespace qsh-test

1
2
3

# 3、Pod无法删除

每当删除namespace或pod 等一些Kubernetes资源时，有时资源状态会卡在terminating，很长时间无法删除，甚至有时增加–force flag(强制删除)之后还是无法正常删除。这时就需要edit该资源，将字段finalizers设置为null，之后Kubernetes资源就正常删除了。

当删除pod时有时会卡住，pod状态变为terminating，无法删除pod

强制删除

kubectl delete pod xxx -n xxx --force --grace-period=0

如果强制删除还不行，设置finalizers为空 （如果一个容器已经在运行，这时需要对一些容器属性进行修改，又不想删除容器，或不方便通过replace的方式进行更新。kubernetes还提供了一种在容器运行时，直接对容器进行修改的方式，就是patch命令。）

kubectl patch pod xxx -n xxx -p '{"metadata":{"finalizers":null}}'

这样pod就可以删除了。

# 4、Namespace无法删除

unable to create new content in namespace posthog because it is being terminated

现象：

##命名空间一直处于Terminating状态
[ec2-user@eks posthog]$ kubectl get ns -owide
NAME                       STATUS        AGE
default                    Active        3d
kube-node-lease            Active        3d
kube-public                Active        3d
kube-system                Active        3d
posthog                    Terminating   3h23m

##执行强制删除命令会一直卡住
[ec2-user@eks posthog]$ kubectl delete ns posthog --grace-period=0 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
\namespace "posthog" force deleted

1
2
3
4
5
6
7
8
9
10
11
12
13

解决：

##查看posthog的命名空间描述
kubectl get ns posthog -o son > ns-posthog.json

##删除spec
###删除前内容如下：
    "spec": {
        "finalizers": [
            "kubernetes"
        ]
    },

###删除后内容如下：
    "spec": {
    },

##打开一个新窗口运行kubectl proxy跑一个API代理在本地的8081端口
kubectl proxy --port=8081

##curl删除
curl -k -H "Content-Type:application/json" -X PUT --data-binary @ns-posthog.json http://127.0.0.1:8081/api/v1/namespaces/posthog/finalize

##重新检查，发现已删除
kubectl get ns

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23

# 5、PV无法删除

K8s 集群内有一个已经不再使用的 PV，虽然已经删除了与其关联的 Pod 及 PVC，并对其执行了删除命令，但仍无法正常删除，一直处于 Terminating 状态：

解决方法：

##执行如下命令强制删除（efs-pv 替换成实际需要删除的 pv 名称）：
kubectl patch pv efs-pv -p '{"metadata":{"finalizers":null}}'

##再次查看可以发现该 pv 已被删除

1
2
3
4

# 6、创建nginx-ingress-controller 出错

Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists. Unable to continue with install: IngressClass “nginx” in namespace “” exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: missing key “meta.helm.sh/release-name”: must be set to “k8s-nginx”; annotation validation error: missing key “meta.helm.sh/release-namespace”: must be set to “nginx-ingress-controller” helm.go:84: [debug] IngressClass “nginx” in namespace “” exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: missing key “meta.helm.sh/release-name”: must be set to “k8s-nginx”; annotation validation error: missing key “meta.helm.sh/release-namespace”: must be set to “nginx-ingress-controller” rendered manifests contain a resource that already exists. Unable to continue with install

原因分析：使用 helm 创建nginx-ingress-controller时出错

查看helm chart仓库values.yaml文件

#... ... ...
##查看以下字段
ingressClassResource:
  name: nginx
  enabled: true
  default: false
  controllerClass: "k8s.io/ingress-nginx"
  parameters: {}
#... ... ...

1
2
3
4
5
6
7
8
9

解决方法：

helm install k8s-nginx mynginx/nginx-ingress-controller -n nginx-ingress-controller --create-namespace --set ingressClassResource.name="nginx-new"

如果没生效，使用以下命令：

helm install k8s-nginx mynginx/nginx-ingress-controller -n nginx-ingress-controller --create-namespace --set controller.ingressClassResource.name="nginx-new"

# etcd集群部署遇到的问题

# 1、etcd.serverice启动报错，显示–logger=zap有问题

解决方法：修改配置文件，去掉该参数，重新启动服务

# 2、publish error: etcdserver: request timed out，由于etcd集群没有同时启动导致

解决方法：在部署了etcd的节点上，同时启动etcd服务systemctl start etcd

# 3、error #1: dial tcp 127.0.0.1:2379: connect: connection refused，由于参数ETCD_LISTEN_CLIENT_URLS没有将172.0.0.1:2379包含在内

解决方法： ETCD_LISTEN_CLIENT_URLS添加https://172.0.0.1:2379或者直接改成0.0.0.0:2379

# 4、error #1: dial tcp 127.0.0.1:4001: connect: connection refused，由于低版本的peer的监听端口是否4001

解决方法： ETCD_LISTEN_CLIENT_URLS添加https://172.0.0.1:2379或者直接改成0.0.0.0:2379

# 5、error #1: net/http: HTTP/1.x transport connection broken: malformed HTTP response “\x15\x03\x01\x00\x02\x02”，由于配置信息监听地址写成了http://

解决方法：将监听地址改成https://

# kube-apiserver.service 遇到的错误

# 1、error: unable to find suitable network address.error=‘no default routes found in “/proc/net/route” or “/proc/net/ipv6_route”’. Tr… to fix this，由于没有配置网关路由问题

解决方法：

route add default gw 172.16.0.1

# 2、error: --etcd-servers must be specified

解决:sudo journalctl -xe -u kube-apiserver | more通过查看更多错误信息，除了error: --etcd-servers must be specified错误提示外无其他错误信息，通过手动执行system unit检查是否配置有误，手动能正常启动，说明配置文件可能存在字符错误，重新写入配置后，启动正常

# 3、watch chan error: etcdserver: mvcc: required revision has been compacted，由于etcd的版本问题导致的，不影响功能的使用

解决方法：可以安装对应版本的etcd

# kubelet和kube-proxy 部署遇到的错误

# 1、failed complete: v1alpha1.KubeProxyConfiguration.ClientConnection

failed complete: v1alpha1.KubeProxyConfiguration.ClientConnection: readObjectStart: expect { or n, but found “, error found in #10 byte of …|nection”:“kubeconfig|…, bigger context …|pha1”,“bindAddress”:“0.0.0.0”,“clientConnection”:"kubeconfig:/data/kubernetes/cfg/kube-proxy.kubecon|…

解决方法：检查yml文件格式是否正确，yml配置文件遇到":“或者”-"后面必须留一个空格！

# 2、network plugin is not ready: cni config uninitialized

Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized，由于没有插件cni

解决方法：修改kubelet.conf配置文件去掉相关配置参数–network-plugin=cni，重启服务即可或者下在安装cni插件

# 3、failed to get imageFs info: unable to find data in memory cache

在错误日志中发现：E0927 15:38:12.475997 16586 kubelet.go:1308] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data in memory cache

解决方法：

yum -y upgrade systemd

# 4、failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to f…

解决方法： node节点上没有关闭交换分区，临时关闭的swapoff -a，最好就是永久关闭

原文链接：Kubernetes 报错小结_奔跑在路上的技术博客_51CTO博客 (opens new window)

上次更新: 2026/01/26, 22:03:21

← kubernetes（k8s）yaml文件详解 Kubernetes 安装配置ingress controller→