1. Download process-exporter

wget https://github.com/ncabatoff/process-exporter/releases/download/v0.7.10/process-exporter-0.7.10.linux-amd64.tar.gz

2. Install and deploy process-exporter

Install process-exporter on the machine you want to monitor.

tar -xvf process-exporter-0.7.10.linux-amd64.tar.gz
mv process-exporter-0.7.10.linux-amd64 process-exporter

3. Write the configuration file

vim config.yaml

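Note: process monitoring works by matching keywords in a process's command line. Use ps -ef | grep "***" to find the process you started, then pick an identifying piece of its cmdline (a single keyword or part of a path) to match on. Alternatively, after locating the target process with ps -ef | grep '***', run more /proc/<PID>/cmdline to view the cmdline contents, then copy the necessary information into config.yaml. If the process does not exist, no data is collected for it.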
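The lookup described in the note can be sketched as a small shell helper. The function name show_cmdline is hypothetical and not part of process-exporter. It relies on the fact that the arguments in /proc/<PID>/cmdline are separated by NUL bytes, which is why the raw file is hard to read.

```shell
# Hypothetical helper: print the command line of a process in readable form.
# Arguments in /proc/<PID>/cmdline are NUL-separated, so translate NULs to spaces.
show_cmdline() {
    pid="$1"
    tr '\0' ' ' < "/proc/${pid}/cmdline"
    echo
}

# Example: show the command line of the current shell.
show_cmdline "$$"
```

Any stable substring of the printed line (a keyword or part of a path) can then be used as the cmdline pattern in config.yaml.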

Match a single process:

process_names:
  - name: "{{.Matches}}"
    cmdline:
    - 'alertlib'

Match multiple processes:

process_names:
  - name: "{{.Matches}}"
    cmdline:
    - 'test1'
  - name: "{{.Matches}}"
    cmdline:
    - 'test2'
  - name: "{{.Matches}}"
    cmdline:
    - 'test3'

Match all processes:

process_names:
  - name: "{{.Comm}}"
    cmdline:
    - '.+'

4. Write the startup script (systemd unit)

vim /usr/lib/systemd/system/process-exporter.service

Set ExecStart to the path where process-exporter is installed:

[Unit]
Description=process_exporter
After=network.target

[Service]
User=root
Type=simple
ExecStart=/opt/soft/process-exporter/process-exporter -config.path /opt/soft/process-exporter/config.yaml
Restart=on-failure

[Install]
WantedBy=multi-user.target

5. Start the service

systemctl daemon-reload
systemctl start process-exporter.service
systemctl status process-exporter.service
systemctl enable process-exporter.service  # start at boot

6. Verify

curl localhost:9256/metrics

Note: if the metrics output contains namedprocess_namegroup_num_procs{groupname="map[:alertlib]"}, the exporter started correctly; otherwise, check whether config.yaml is configured correctly.
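To avoid scanning the full metrics output by eye, the check can be narrowed to the one series of interest. This is a minimal sketch; the function name group_num_procs is hypothetical, and it assumes the series carries only the groupname label, as in the example above.

```shell
# Hypothetical helper: read Prometheus text-format metrics on stdin and print
# the value of namedprocess_namegroup_num_procs for the given group name.
group_num_procs() {
    group="$1"
    # -F treats the pattern as a fixed string, so the brackets in
    # "map[:alertlib]" are matched literally.
    grep -F "namedprocess_namegroup_num_procs{groupname=\"${group}\"}" | awk '{print $NF}'
}

# Usage against a running exporter (port 9256, as in the curl command above):
# curl -s localhost:9256/metrics | group_num_procs 'map[:alertlib]'
```

An empty result means the series is absent, which points back to the cmdline pattern in config.yaml.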

7. Configure Prometheus

Edit the Prometheus configuration file:

- job_name: 'process'
  static_configs:
  - targets: ['172.**.**.**:9256']

Configure the alerting rule:

groups:
  - name: dol_alert_process_rule
    rules:
      - alert: Dolphinscheduler Alert ProcessDown # alert name
        expr: (namedprocess_namegroup_num_procs{groupname="map[:alertlib]"}) == 0
        for: 1m # how long the condition must hold before the alert is sent
        labels: # labels
          severity: error
        annotations: # annotations: detailed description of the alert
          summary: "dolphinscheduler Alert {{ $labels.instance }} has been down for more than 1 minute"
          description: "dolphinscheduler Alert has been down. This requires immediate action!"

8. Hot-reload the Prometheus alerting rules

./promtool check config prometheus.yml
systemctl reload prometheus.service

9. Configure DingTalk alerts

For details, see: Configuring Alertmanager alerts in Prometheus (DingTalk alerts)
