蛮子哥
2025-05-25

Deploying VictoriaMetrics with the operator

# VictoriaMetrics architecture overview

The diagram below shows the core components of VictoriaMetrics:

image-20260525092246454

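  • vmstorage stores the data and is the stateful component.
  • vmselect serves queries. When adding a Prometheus data source in Grafana, use the vmselect address; for each query, vmselect calls every vmstorage node to fetch the data.
  • vminsert handles writes. Collectors push their scraped data to vminsert, which then calls the vmstorage nodes to persist it.
  • Every component can be scaled horizontally, but not automatically, because scaling requires changing startup flags.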
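
The read and write paths described above appear later in this post as Service URLs. The sketch below builds them from a VMCluster name and a tenant ID. It assumes the Service names (vminsert-<name>, vmselect-<name>) and the ports 8480 and 8481 used in the manifests of this post; the function names are made up for illustration.

```shell
# Build in-cluster URLs for a VMCluster.
# $1: VMCluster name, $2: tenant (account) ID. Function names are illustrative.

# Remote-write endpoint served by vminsert (used by collectors such as vmagent).
vm_write_url() {
  printf 'http://vminsert-%s:8480/insert/%s/prometheus/api/v1/write\n' "$1" "$2"
}

# Prometheus-compatible query base URL served by vmselect (used by Grafana and vmalert).
vm_read_url() {
  printf 'http://vmselect-%s:8481/select/%s/prometheus/\n' "$1" "$2"
}

vm_write_url vmcluster 0
vm_read_url vmcluster 0
```

Keeping the tenant ID as a parameter makes it obvious which part of the path changes when a second tenant is added.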

# Installing the operator

Install with Helm:

helm repo add vm https://victoriametrics.github.io/helm-charts
helm repo update
helm install victoria-operator vm/victoria-metrics-operator \
  --namespace monitoring --create-namespace

# Alternatively, install from a local chart archive with explicit values.
# The vlstorage/vlinsert/vlselect keys look like VictoriaLogs chart values;
# check them against the values.yaml of the chart you actually install.
helm install vm ./victoria-metrics-operator-0.63.1.tgz \
  --namespace monitoring \
  --timeout 15m \
  --set vlstorage.persistentVolume.storageClassName=nfs-monitor \
  --set "vlstorage.retentionPeriod=180d" \
  --set "vlstorage.persistentVolume.size=1Gi" \
  --set vmauth.enabled=true \
  --set vlstorage.replicaCount=2 \
  --set global.image.registry=docker.1ms.run \
  --set vlstorage.image.registry=docker.1ms.run \
  --set vlinsert.image.registry=docker.1ms.run \
  --set vlselect.image.registry=docker.1ms.run

Check that the operator started successfully:

$ kubectl -n monitoring get pod
NAME                                                           READY   STATUS    RESTARTS   AGE
victoria-operator-victoria-metrics-operator-7b886f85bb-jf6ng   1/1     Running   0          20s

# Installing VMStorage, VMSelect, and VMInsert

Prepare vmcluster.yaml:

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMCluster
metadata:
  name: vmcluster
  namespace: monitoring
spec:
  retentionPeriod: "1" # the default unit is months; see https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#retention
  vmstorage:
    replicaCount: 2
    storage:
      volumeClaimTemplate:
        metadata:
          name: data
        spec:
          accessModes: [ "ReadWriteOnce" ]
          storageClassName: cbs
          resources:
            requests:
              storage: 100Gi
  vmselect:
    replicaCount: 2
  vminsert:
    replicaCount: 2

Apply it:

$ kubectl apply -f vmcluster.yaml
vmcluster.operator.victoriametrics.com/vmcluster created

Check that the components started successfully:

$ kubectl -n monitoring get pod | grep vmcluster
vminsert-vmcluster-77886b8dcb-jqpfw                            1/1     Running   0          20s
vminsert-vmcluster-77886b8dcb-l5wrg                            1/1     Running   0          20s
vmselect-vmcluster-0                                           1/1     Running   0          20s
vmselect-vmcluster-1                                           1/1     Running   0          20s
vmstorage-vmcluster-0                                          1/1     Running   0          20s
vmstorage-vmcluster-1                                          1/1     Running   0          20s
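
Reading the READY and STATUS columns by eye gets tedious as the pod count grows. The helper below is a minimal sketch that filters `kubectl get pod --no-headers` output down to the pods that are not fully ready; the function name and the sample pod names are made up for illustration.

```shell
# List pods that are not fully ready.
# Input (stdin): output of `kubectl get pod --no-headers`,
# whose first three columns are NAME, READY and STATUS.
not_ready_pods() {
  awk '{
    split($2, ready, "/")    # READY looks like "1/1"
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Demo on sample output (hypothetical pod names).
not_ready_pods <<'EOF'
vminsert-vmcluster-aaa   1/1   Running             0   20s
vmselect-vmcluster-0     0/1   Running             0   20s
vmstorage-vmcluster-0    0/1   ContainerCreating   0   20s
EOF
```

Against a live cluster it would be used as `kubectl -n monitoring get pod --no-headers | not_ready_pods`; empty output means every pod is running and ready.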

# Installing VMAlertmanager and VMAlert

Prepare vmalertmanager.yaml:

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlertmanager
metadata:
  name: vmalertmanager
  namespace: monitoring
spec:
  replicaCount: 1
  selectAllByDefault: true

Install VMAlertmanager:

$ kubectl apply -f vmalertmanager.yaml
vmalertmanager.operator.victoriametrics.com/vmalertmanager created

Prepare vmalert.yaml:

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
metadata:
  name: vmalert
  namespace: monitoring
spec:
  replicaCount: 1
  selectAllByDefault: true
  notifier:
    url: http://vmalertmanager-vmalertmanager:9093
  resources:
    requests:
      cpu: 10m
      memory: 10Mi
  remoteWrite:
    url: http://vminsert-vmcluster:8480/insert/0/prometheus/
  remoteRead:
    url: http://vmselect-vmcluster:8481/select/0/prometheus/
  datasource:
    url: http://vmselect-vmcluster:8481/select/0/prometheus/

Install VMAlert:

$ kubectl apply -f vmalert.yaml
vmalert.operator.victoriametrics.com/vmalert created

Check that the components started successfully:

$ kubectl -n monitoring get pod | grep vmalert
vmalert-vmalert-5987fb9d5f-9wt6l                               2/2     Running   0          20s
vmalertmanager-vmalertmanager-0                                2/2     Running   0          40s

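# Installing VMAgent

vmagent scrapes metrics and ships them to VictoriaMetrics for storage. Collecting container metrics on Tencent Cloud's container service requires a custom additionalScrapeConfigs, so prepare a custom scrape rule file named scrape-config.yaml:

Download: https://pdf.chatquanttail.top/2026/05/scrape-config.yaml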

[root@k8s-master01 monitor]# cat node-scrape-final.yaml 
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMNodeScrape
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: node-exporter
  namespaceSelector:
    matchNames:
      - monitoring
  port: http
  path: /metrics
  interval: 15s
  scrapeTimeout: 10s
  scheme: http
  relabelConfigs:
    - sourceLabels: [__meta_kubernetes_pod_node_name]
      targetLabel: instance
    - targetLabel: job
      replacement: node-exporter
    - action: replace
      sourceLabels: [__meta_kubernetes_pod_ip]
      targetLabel: __address__
      regex: (.+)
      replacement: $1:9100   # keep the captured IP and append the node_exporter port
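
A replace-type relabel rule matches its regex against the whole source value and writes the replacement string into the target label, so the capture group has to be referenced for the IP to survive. The sketch below emulates that behaviour with sed to show the difference; sed writes the capture group as \1 where relabel configs write $1, and the function name is made up for illustration.

```shell
# Emulate a "replace" relabel action on a single label value.
# $1: source label value (for example a pod IP)
# $2: replacement string, using \1 for the first capture group
relabel_address() {
  # The regex is anchored to the whole value, as relabel regexes are.
  printf '%s\n' "$1" | sed -E "s/^(.+)\$/$2/"
}

relabel_address 10.244.1.7 '\1:9100'   # prints 10.244.1.7:9100
relabel_address 10.244.1.7 ':9100'     # prints :9100 (the IP is lost)
```

With a replacement of only `:9100`, every target would end up with the same address, which is why the capture group matters here.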

Scenario A: scrape Pods with a specific label in selected namespaces
[root@k8s-master01 monitor]# cat vmpod-scrape-app.yaml 
# vmpod-scrape-app.yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMPodScrape
metadata:
  name: vmagent-pod-scrape
  namespace: monitoring        # the VMPodScrape itself lives in the monitoring namespace
spec:
  namespaceSelector:
    matchNames:
      - monitoring                # namespaces of the target Pods; several may be listed
        #- production
  selector:
    matchLabels:
      app.kubernetes.io/name: vmagent               # label to match on the target Pods
  podMetricsEndpoints:
    - port: http
      path: /metrics
      interval: 30s
      scrapeTimeout: 10s

Next, prepare vmagent.yaml:

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAgent
metadata:
  name: vmagent
  namespace: monitoring
spec:
  selectAllByDefault: true
  additionalScrapeConfigs:
    key: additional-scrape-configs.yaml
    name: additional-scrape-configs
  resources:
    requests:
      cpu: 10m
      memory: 10Mi
  replicaCount: 1
  remoteWrite:
  - url: "http://vminsert-vmcluster:8480/insert/0/prometheus/api/v1/write"

Apply them:

$ kubectl apply -f scrape-config.yaml
$ kubectl apply -f node-scrape-final.yaml
$ kubectl apply -f vmagent.yaml
vmagent.operator.victoriametrics.com/vmagent created

Check that the component started successfully:

$ kubectl -n monitoring get pod | grep vmagent
vmagent-vmagent-cf9bbdbb4-tm4w9                                2/2     Running   0          20s
vmagent-vmagent-cf9bbdbb4-ija8r                                2/2     Running   0          20s

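# Configuring Grafana

# Adding the data source

VictoriaMetrics is Prometheus-compatible, so add it to Grafana as a data source of type Prometheus. If Grafana runs in the same cluster as VictoriaMetrics, the Service address can be used, for example: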

http://vmselect-vmcluster:8481/select/0/prometheus/

# Adding dashboards

VictoriaMetrics publishes several official Grafana dashboards, with these IDs:

  1. 11176
  2. 12683
  3. 14205

Import them into Grafana.

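# Configuring alerts

When VictoriaMetrics is installed through the operator, alerting rules for vmalert should be added as VMRule custom resources rather than by editing configuration files directly.

The operator generates ConfigMaps from the VMRule objects and mounts them into the vmalert Pod. Changes are reloaded automatically, with no restart needed.

# 1. Create a VMRule with the alerting rules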

[root@k8s-master01 monitor]# cat my-final-rules.yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
  name: my-custom-alerts
  namespace: monitoring
  labels:
    app: production
spec:
  groups:
  - name: node-alerts
    interval: 30s
    rules:
      - alert: NodeDown
        expr: up{job="node-exporter"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          description: "node_exporter has been unreachable for 1 minute"
          summary: "Node {{ $labels.instance }} is down"

      - alert: NodeHighCPUUsage
        expr: |
          100 * (1 - avg(rate(node_cpu_seconds_total{mode="idle", job="node-exporter"}[5m])) by (instance)) > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          description: "CPU usage is {{ printf \"%.1f%%\" $value }} and has stayed high for 2 minutes"
          summary: "High CPU usage on node {{ $labels.instance }}"

      - alert: NodeHighMemoryUsage
        expr: |
          100 * (node_memory_MemTotal_bytes{job="node-exporter"} - node_memory_MemAvailable_bytes{job="node-exporter"}) / node_memory_MemTotal_bytes{job="node-exporter"} > 85
        for: 2m
        labels:
          severity: critical
        annotations:
          description: "Current memory usage is {{ printf \"%.1f%%\" $value }}"
          summary: "Memory usage on node {{ $labels.instance }} is too high"

      - alert: NodeHighDiskUsage
        expr: |
          100 * (
            (node_filesystem_size_bytes{job="node-exporter",fstype!~"tmpfs|ramfs"}
            - node_filesystem_free_bytes{job="node-exporter",fstype!~"tmpfs|ramfs"})
            / node_filesystem_size_bytes{job="node-exporter",fstype!~"tmpfs|ramfs"}
          ) > 85
        for: 3m
        labels:
          severity: warning
        annotations:
          description: "Disk usage is {{ printf \"%.1f%%\" $value }}"
          summary: "High disk usage on {{ $labels.mountpoint }} of node {{ $labels.instance }}"
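
The memory and disk rules share one formula, 100 * (total - available) / total, compared against a threshold. As a worked example, the sketch below computes that percentage for made-up numbers and checks it against the 85% limit used by the memory rule; the function names are illustrative only.

```shell
# Percentage used: 100 * (total - available) / total.
usage_pct() {
  awk -v total="$1" -v avail="$2" 'BEGIN { printf "%.1f\n", 100 * (total - avail) / total }'
}

# Succeed when the value is above the threshold, mirroring "expr > threshold".
exceeds() {
  awk -v value="$1" -v limit="$2" 'BEGIN { exit !(value > limit) }'
}

# 16 GB total with 2 GB available (made-up numbers).
pct=$(usage_pct 16000000000 2000000000)
echo "memory usage: ${pct}%"
if exceeds "$pct" 85; then
  echo "NodeHighMemoryUsage would enter its 2m pending window"
fi
```

The alert only fires if the expression stays above the threshold for the whole `for:` duration, so a short spike past 85% does not notify anyone.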

Apply it:

kubectl apply -f my-final-rules.yaml

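# 2. Verify that it took effect

# List the generated ConfigMaps (the operator may create several)
kubectl get configmap -l app.kubernetes.io/name=vmalert -n monitoring

# Check the vmalert logs
kubectl logs -l app.kubernetes.io/name=vmalert -n monitoring --tail=100

# Port-forward to vmalert and inspect the loaded rules
# (Service name assumed from the VMAlert named "vmalert"; confirm name and port with: kubectl -n monitoring get svc)
kubectl -n monitoring port-forward svc/vmalert-vmalert 8880:8880
# Then open http://localhost:8880/api/v1/rules in a browser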

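# Best practices

  1. Split by business domain: one VMRule per service or system keeps maintenance simple.
  2. Use labels to tell rules apart: add labels such as environment: prod or team: backend to each VMRule, then match them precisely with ruleSelector in the VMAlert.
  3. Put recording rules in a group of their own to improve performance.
  4. Templating: annotations support Prometheus template syntax.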
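
To make points 2 and 3 concrete, the sketch below writes a separate VMRule that carries selection labels and holds a single recording-rule group. The file name, the resource name, the label values, and the recorded metric name are all hypothetical; only the VMRule structure follows the manifests used earlier in this post.

```shell
# Write a labelled VMRule that contains only recording rules.
# All names and label values below are illustrative.
cat > recording-rules.yaml <<'EOF'
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
  name: node-recording-rules
  namespace: monitoring
  labels:
    environment: prod
    team: backend
spec:
  groups:
    - name: node-recording
      interval: 30s
      rules:
        - record: instance:node_cpu_utilisation:ratio
          expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle", job="node-exporter"}[5m]))
EOF

# A VMAlert could then pick it up by label, for example:
#   ruleSelector:
#     matchLabels:
#       environment: prod
echo "wrote recording-rules.yaml"
```

The file would be applied with `kubectl apply -f recording-rules.yaml`. With selectAllByDefault: true, as in this post, the rule is picked up even without a ruleSelector.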

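# Configuring email alerts

With the VictoriaMetrics Operator, email notifications are set up as follows.

vmalert does not send email itself. It only evaluates rules and forwards alerts to Alertmanager through its notifier setting.
The actual email delivery is handled by VMAlertmanager.

# 1. Deploy VMAlertmanager (if it is not deployed yet)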

[root@k8s-master01 monitor]# cat vmalertmanager.yaml 
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlertmanager
metadata:
  name: vmalertmanager
  namespace: monitoring
spec:
  replicaCount: 1
  selectAllByDefault: true

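# 2. Create the Alertmanager configuration for email (recommended)

The configuration below is a plain Secret containing alertmanager.yaml, not a VMAlertmanagerConfig object. For the operator to load it, the VMAlertmanager spec has to reference the Secret through its configSecret field.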
  
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-config
  namespace: monitoring
type: Opaque
stringData:
  alertmanager.yaml: |
    global:
      resolve_timeout: 5m

    route:
      receiver: email-163
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 1m
      repeat_interval: 30m

    receivers:
      - name: email-163
        email_configs:
          - to: '874878956@qq.com'
            from: 'zpj199310@163.com'
            smarthost: 'smtp.163.com:465'
            require_tls: true
            auth_username: 'zpj199310@163.com'
            auth_identity: 'zpj199310@163.com'
            auth_password: 'xxxxxxxxxxxxx'
            send_resolved: true
            headers:
              Subject: '[{{ .Status | toUpper }}][VM] {{ .CommonLabels.alertname }}'

# 3. Create a Secret for the SMTP password (strongly recommended; avoid plaintext passwords)

# Add as needed
apiVersion: v1
kind: Secret
metadata:
  name: smtp-secret
  namespace: monitoring
type: Opaque
data:
  password: <base64-encoded password>   # echo -n 'yourpassword' | base64

# Or create the Secret directly from the command line
kubectl create secret generic smtp-163-secret \
  --from-literal=authcode='<your-smtp-auth-code>' \
  -n monitoring --dry-run=client -o yaml | kubectl apply -f -
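
Values under `data:` in a Secret must be base64-encoded without a trailing newline, which is easy to get wrong. The sketch below encodes a value with printf, which never appends a newline, and shows what a stray newline does to the result; the function name is made up for illustration.

```shell
# Base64-encode a value for the data: section of a Secret.
# printf '%s' emits the argument exactly, with no trailing newline.
encode_secret() {
  printf '%s' "$1" | base64
}

encode_secret 'yourpassword'       # prints eW91cnBhc3N3b3Jk
echo 'yourpassword' | base64       # prints eW91cnBhc3N3b3JkCg== (newline included)
```

The second form silently stores a password that ends in a newline, which typically shows up later as an SMTP authentication failure.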

# 4. Point VMAlert at VMAlertmanager

Make sure your VMAlert resource contains the following configuration:

[root@k8s-master01 monitor]# cat vmalert.yaml 
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
metadata:
  name: vmalert
  namespace: monitoring
spec:
  replicaCount: 1
  selectAllByDefault: true
  notifier:
    url: http://vmalertmanager-vmalertmanager:9093
  resources:
    requests:
      cpu: 10m
      memory: 10Mi
  remoteWrite:
    url: http://vminsert-vmcluster:8480/insert/0/prometheus/
  remoteRead:
    url: http://vmselect-vmcluster:8481/select/0/prometheus/
  datasource:
    url: http://vmselect-vmcluster:8481/select/0/prometheus/


# 5. Apply and verify

kubectl apply -f smtp-secret.yaml
kubectl apply -f vmalertmanager.yaml
kubectl apply -f vmalert.yaml   # only if you changed it

1. Confirm that the test alert rule exists

kubectl get vmrule test-email-alert -n monitoring

If it does not exist, create it:

kubectl apply -f - <<EOF
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
  name: test-email-alert
  namespace: monitoring
spec:
  groups:
    - name: test-email
      rules:
        - alert: TestEmailAlert
          expr: vector(1) > 0
          for: 0s
          labels:
            severity: critical
          annotations:
            summary: "163 mailbox configuration test"
            description: "If you received this email, the configuration works."
EOF

2. Restart vmalert so the alert is sent right away

kubectl rollout restart deployment vmalert-vmalert -n monitoring   # note: your vmalert is probably a Deployment

3. Watch the logs (to see whether an email send is attempted)

# Follow the Alertmanager logs
kubectl logs -l app.kubernetes.io/name=vmalertmanager -n monitoring --tail=100 -f

4. Watch the vmalert logs at the same time

kubectl logs -l app.kubernetes.io/name=vmalert -n monitoring --tail=50

Verification commands:

# Show the generated Alertmanager configuration
kubectl get secret vmalertmanager-vmalertmanager-generated -n monitoring -o yaml | grep alertmanager.yaml -A 50

# Check the VMAlertmanager logs
kubectl logs -l app.kubernetes.io/name=vmalertmanager -n monitoring

# Follow the notification logs
kubectl logs -l app.kubernetes.io/name=vmalertmanager -n monitoring --tail=100 -f
# Check the alerting logs
kubectl logs -l app.kubernetes.io/name=vmalert -n monitoring --tail=50


# Common SMTP settings

  • Tencent Exmail: smarthost: smtp.exmail.qq.com:587
  • Alibaba Cloud Mail: smarthost: smtp.aliyun.com:465 (SSL)
  • Gmail: smarthost: smtp.gmail.com:587 (requires an app password)
  • Outlook/Office365: smarthost: smtp.office365.com:587

# Configuring Feishu alerts

cat alertmanager-config.yaml

apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-config
  namespace: monitoring
type: Opaque
stringData:
  alertmanager.yaml: |
    global:
      resolve_timeout: 5m

    templates:
      - '/etc/alertmanager/templates/*.tmpl'

    route:
      receiver: feishu
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 1m
      repeat_interval: 30m
      routes:
        - receiver: feishu
          continue: true
        - receiver: email-163

    receivers:
      - name: feishu
        webhook_configs:
          - url: 'http://alertmanager-webhook-adapter.monitoring.svc.cluster.local:8090/webhook/send?channel_type=feishu&token=xxxxxxxxxxxxxxxxxxxxx&msg_type=text'
            send_resolved: true

      - name: email-163
        email_configs:
          - to: '874878956@qq.com'
            from: 'zpj199310@163.com'
            smarthost: 'smtp.163.com:465'
            require_tls: true
            auth_username: 'zpj199310@163.com'
            auth_identity: 'zpj199310@163.com'
            auth_password: 'xxxxxxxxxxxxxxxxx'
            send_resolved: true
            headers:
              Subject: '[{{ .Status | toUpper }}][VM] {{ .CommonLabels.alertname }}'

The message templates for the webhook adapter are stored in a ConfigMap:

[root@k8s-master01 monitor]# cat configmap-feishu.yaml 
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-webhook-adapter-tmpl
  namespace: monitoring
data:
  feishu.zh.tmpl: |
    {{- define "prom.title" -}}
    {{- if eq .Status "firing" -}}🚨 [FIRING]{{- else -}}✅ [RESOLVED]{{- end }} {{ .Signature }}
    {{- end -}}

    {{- define "prom.text" -}}
    {{- $status := .Status -}}
    ━━━━━━━━━━━━━━━━━━
    {{- range .Alerts }}
    Alert:      {{ .Labels.alertname }}
    Severity:   {{ .Labels.severity | toUpper }}
    Instance:   {{ .Labels.instance }}
    Status:     {{ if eq $status "firing" }}FIRING 🔥{{ else }}RESOLVED ✅{{ end }}
    Started at: {{ .StartsAt }}
    Summary:    {{ .Annotations.summary }}
    Details:    {{ .Annotations.description }}
    ━━━━━━━━━━━━━━━━━━
    {{- end }}
    {{- end -}}

    {{- define "prom.markdown" -}}
    {{ template "prom.text" . }}
    {{- end -}}

    {{- define "prom.content" -}}
    {{ template "prom.text" . }}
    {{- end -}}

    {{- define "prom.card" -}}
    {{ template "prom.text" . }}
    {{- end -}}

    {{- define "prom" -}}
    {{ template "prom.title" . }}
    {{ template "prom.text" . }}
    {{- end -}}

The adapter itself runs as a Deployment with a Service in front of it:

[root@k8s-master01 monitor]# cat go-alertmanager-feishu.yaml 
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager-webhook-adapter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alertmanager-webhook-adapter
  template:
    metadata:
      labels:
        app: alertmanager-webhook-adapter
    spec:
      containers:
        - name: adapter
          image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/bougou/alertmanager-webhook-adapter:v1.1.10
          args:
            - "--signature"
            - "MyCluster"
            - "--tmpl-lang"
            - "zh"
            - "--tmpl-dir"
            - "/etc/adapter/templates"
          env:
            - name: TZ
              value: "Asia/Shanghai"
          ports:
            - containerPort: 8090
          volumeMounts:
            - name: tmpl
              mountPath: /etc/adapter/templates
      volumes:
        - name: tmpl
          configMap:
            name: alertmanager-webhook-adapter-tmpl

---
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-webhook-adapter
  namespace: monitoring
spec:
  selector:
    app: alertmanager-webhook-adapter
  ports:
    - port: 8090
      targetPort: 8090

Deploy:

kubectl apply -f alertmanager-config.yaml
kubectl apply -f configmap-feishu.yaml
kubectl apply -f go-alertmanager-feishu.yaml 

Test:

kubectl port-forward svc/alertmanager-webhook-adapter -n monitoring 8090:8090
curl -v -X POST 'http://localhost:8090/webhook/send?channel_type=feishu&token=xxxxxxxxxxxxxxxx&msg_type=text' \
  -H 'Content-Type: application/json' \
  -d '{
    "version": "4",
    "groupKey": "test",
    "status": "firing",
    "receiver": "feishu",
    "groupLabels": {"alertname": "TestAlert"},
    "commonLabels": {
      "alertname": "TestAlert",
      "severity": "critical",
      "instance": "192.168.1.100:9090"
    },
    "commonAnnotations": {
      "summary": "This is a test alert",
      "description": "Node CPU usage is above 90%"
    },
    "alerts": [
      {
        "status": "firing",
        "labels": {
          "alertname": "TestAlert",
          "severity": "critical",
          "instance": "192.168.1.100:9090"
        },
        "annotations": {
          "summary": "This is a test alert",
          "description": "Node CPU usage is above 90%"
        },
        "startsAt": "2024-01-01T00:00:00Z",
        "endsAt": "0001-01-01T00:00:00Z",
        "generatorURL": "http://prometheus:9090/graph"
      }
    ]
  }'


# Installing node_exporter

[root@k8s-master01 monitor]# cat node-exporter.yaml 
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMNodeScrape
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: node-exporter
  port: http
  path: /metrics
  interval: 30s
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    app.kubernetes.io/name: node-exporter
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: node-exporter
  template:
    metadata:
      labels:
        app.kubernetes.io/name: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      tolerations:
        - operator: "Exists"
      containers:
        - name: node-exporter
          image: docker.cnb.cool/zzppjj/docker-images/node-exporter:v1.8.2
          args:
            - --web.listen-address=:9100
          ports:
            - name: http
              containerPort: 9100
              hostPort: 9100
          resources:
            limits:
              cpu: 200m
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 100Mi
          securityContext:
            readOnlyRootFilesystem: true
[root@k8s-master01 monitor]# cat node-exporter-service.yaml 
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    app.kubernetes.io/name: node-exporter
spec:
  type: ClusterIP
  clusterIP: None
  ports:
    - name: http
      port: 9100
      targetPort: 9100
  selector:
    app.kubernetes.io/name: node-exporter

Last updated: 2026/06/15, 01:53:28
