1. Java server
build.gradle
..
dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-actuator'
    implementation 'io.micrometer:micrometer-registry-prometheus'
    ...
}
application.yaml
server:
  port: 8000
management:
  endpoints:
    web:
      exposure:
        include: health, info, prometheus
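Once the app is running, the metrics page should be at http://localhost:8000/actuator/prometheus. As a quick check, here is a minimal stdlib-only sketch that downloads that page and pulls one sample out of the Prometheus text format. The class `ActuatorMetricReader` and its method are my own names, not part of Spring; it assumes sample lines of the form `name{labels} value` and that Micrometer exposes `process_uptime_seconds`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;

// Hypothetical helper: reads one sample from the Spring Boot /actuator/prometheus page.
final class ActuatorMetricReader {

    // Returns the value of the first sample whose metric name equals `name`.
    // Assumes text exposition format: name{labels} value [timestamp]; '#' lines are HELP/TYPE comments.
    // Prometheus writes infinity as +Inf/-Inf, which Double.parseDouble does not accept (not handled here).
    static Optional<Double> findValue(String body, String name) {
        for (String line : body.split("\n")) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int brace = line.indexOf('{');
            int space = line.indexOf(' ');
            // The metric name ends at '{' (labels follow) or at the first space (no labels).
            int nameEnd = (brace >= 0 && (space < 0 || brace < space)) ? brace : space;
            if (nameEnd < 0 || !line.substring(0, nameEnd).equals(name)) {
                continue;
            }
            // The value follows the closing '}' (or the bare name); an optional timestamp may follow it.
            int valueStart = brace == nameEnd ? line.lastIndexOf('}') + 1 : nameEnd;
            String[] rest = line.substring(valueStart).trim().split("\\s+");
            return Optional.of(Double.parseDouble(rest[0]));
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Port 8000 comes from application.yaml above.
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8000/actuator/prometheus")).build();
        String body = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
        System.out.println("process_uptime_seconds = " + findValue(body, "process_uptime_seconds"));
    }
}
```

Label values can contain spaces (for example `id="G1 Eden Space"`), which is why the parser looks for the closing `}` rather than splitting the whole line on whitespace.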
2. Configuration yml files
configs/alertmanager.yml
global:
  resolve_timeout: 1m
  slack_api_url: 'https://hooks.slack.com/services/XXXXXXX' # Slack webhook URL
  # email
  smtp_smarthost: '<smtp-host>:587' # SMTP server
  smtp_from: sender@email.com # SMTP sender address
  smtp_auth_username: test@email.com # SMTP login (email)
  smtp_auth_password: test_password # SMTP password
route:
  receiver: 'notifications'
  routes:
    - match:
        severity: page
      receiver: 'notifications'
receivers:
  - name: 'notifications' # must match the receiver names used in route
    slack_configs:
      - channel: '#prometheus-test' # Slack channel name
        send_resolved: true
    email_configs:
      - to: my-email@email.com # address that receives the alerts
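To test the route and the Slack/email settings without waiting for a rule to fire, you can push an alert by hand to Alertmanager's v2 API (`POST /api/v2/alerts`, which takes a JSON array of alerts with `labels` and optional `annotations`). A minimal sketch, assuming Alertmanager is reachable on localhost:9093 as in `run_alertmanager.cmd`; the class name, alert name and summary text are made up for illustration, and no JSON escaping is done.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: posts one hand-made alert to Alertmanager so the routing
// and receivers can be checked without waiting for a real rule to fire.
final class TestAlertSender {

    // Builds the JSON body for POST /api/v2/alerts: an array of alerts with labels and annotations.
    // Values are inserted verbatim, so they must not contain quotes or backslashes.
    static String payload(String alertName, String severity, String summary) {
        return String.format(
                "[{\"labels\":{\"alertname\":\"%s\",\"severity\":\"%s\"},"
                        + "\"annotations\":{\"summary\":\"%s\"}}]",
                alertName, severity, summary);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // severity=page is the label value the child route in alertmanager.yml is meant to match.
        String body = payload("ManualTestAlert", "page", "test alert from TestAlertSender");
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:9093/api/v2/alerts"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // A 200 status means Alertmanager accepted the alert.
        System.out.println(response.statusCode());
    }
}
```

Alertmanager waits for `group_wait` (30s by default) before sending the first notification for a new group, so the Slack message and email may take a little while to arrive.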
configs/prometheus.yml
# my global config
global:
  scrape_interval: 10s # Set the scrape interval to every 10 seconds. Default is every 1 minute.
  evaluation_interval: 10s # Evaluate rules every 10 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['host.docker.internal:9093']
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules.yml"
# Scrape configurations: Prometheus itself, node_exporter, and the Spring Boot app.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: ['host.docker.internal:9100']
  - job_name: 'spring-boot-app'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['host.docker.internal:8000']
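After Prometheus starts, http://localhost:9090/targets shows whether each job is being scraped. The same information is available from the HTTP API through the `up` metric (1 if the last scrape succeeded, 0 otherwise). A minimal sketch, assuming Prometheus is on localhost:9090 as in `run_prometheus.cmd`; `TargetCheck` and `queryUri` are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: asks the Prometheus HTTP API whether a scrape target is up.
final class TargetCheck {

    // Builds an instant-query URI for GET /api/v1/query with the PromQL expression URL-encoded.
    static URI queryUri(String baseUrl, String promql) {
        return URI.create(baseUrl + "/api/v1/query?query="
                + URLEncoder.encode(promql, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // The spring-boot-app job defined in prometheus.yml.
        URI uri = queryUri("http://localhost:9090", "up{job=\"spring-boot-app\"}");
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(uri).build(), HttpResponse.BodyHandlers.ofString());
        // The JSON body lists one result per target, each with value [timestamp, "1" or "0"].
        System.out.println(response.body());
    }
}
```

Encoding matters here because PromQL selectors contain `{`, `=` and `"`, which are not allowed unescaped in a query string.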
configs/rules.yml
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m # how long the condition must stay true (pending) before the alert fires
  - name: node_rules
    rules:
      - record: job:up:avg
        expr: avg without(instance)(up{job="node"})
      - alert: ManyInstancesDown
        expr: job:up:avg{job="node"} < 0.5
      - alert: BatchJobNoRecentSuccess
        expr: >
          time() - my_batch_job_last_success_time_seconds{job="batch"} > 86400*2
      - alert: FDsNearLimit
        expr: >
          process_open_fds > process_max_fds * .8
        for: 5m
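The expressions in `node_rules` are plain comparisons, so the thresholds can be checked with ordinary arithmetic. The sketch below (a hypothetical `RuleMath` class) evaluates the same conditions on single numbers. Prometheus does this per time series at every `evaluation_interval`, and `for` additionally requires the condition to stay true that long before the alert moves from pending to firing.

```java
import java.util.List;

// Hypothetical worked example: the comparisons from rules.yml evaluated on plain numbers.
final class RuleMath {

    // job:up:avg = avg without(instance)(up{job="node"}); ManyInstancesDown fires below 0.5.
    // An empty list gives NaN, and NaN < 0.5 is false, just as no series means no alert.
    static boolean manyInstancesDown(List<Integer> upValues) {
        double avg = upValues.stream().mapToInt(Integer::intValue).average().orElse(Double.NaN);
        return avg < 0.5;
    }

    // BatchJobNoRecentSuccess: more than two days (86400 * 2 = 172800 s) since the last success.
    static boolean batchJobStale(double nowSeconds, double lastSuccessSeconds) {
        return nowSeconds - lastSuccessSeconds > 86400 * 2;
    }

    // FDsNearLimit: open file descriptors above 80% of the limit.
    static boolean fdsNearLimit(double openFds, double maxFds) {
        return openFds > maxFds * 0.8;
    }

    public static void main(String[] args) {
        System.out.println(manyInstancesDown(List.of(1, 0, 0))); // avg 0.33 -> true
        System.out.println(batchJobStale(200_000, 10_000));       // 190000 s -> true
        System.out.println(fdsNearLimit(800, 1024));              // 800 > 819.2 -> false
    }
}
```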
3. Running with Docker
run_grafana.cmd
docker run -d --name=grafana -p 3000:3000 grafana/grafana:5.0.0
run_alertmanager.cmd
docker run -d --rm --name alertmanager -p 127.0.0.1:9093:9093 -v %cd%/configs:/etc/prometheus quay.io/prometheus/alertmanager --config.file=/etc/prometheus/alertmanager.yml
run_prometheus.cmd
docker run -d --rm --name prometheus -p 9090:9090 -v %cd%/configs:/etc/prometheus prom/prometheus --config.file=/etc/prometheus/prometheus.yml
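Once everything is started, a quick smoke test is to hit each component's health endpoint: `/actuator/health` on the Spring Boot app (exposed in application.yaml) and `/-/healthy` on Prometheus and Alertmanager. A minimal sketch with hypothetical names, assuming the ports used above; Grafana is left out here and can be checked by opening http://localhost:3000 in a browser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical smoke test after the docker run commands: probes each component's health URL.
final class StackHealthCheck {

    // Health URLs for the ports used in this post (Spring Boot 8000, Prometheus 9090, Alertmanager 9093).
    static final Map<String, String> ENDPOINTS = new LinkedHashMap<>();

    static {
        ENDPOINTS.put("spring-boot-app", "http://localhost:8000/actuator/health");
        ENDPOINTS.put("prometheus", "http://localhost:9090/-/healthy");
        ENDPOINTS.put("alertmanager", "http://localhost:9093/-/healthy");
    }

    // Returns the names whose status code is not 200 (-1 means the request itself failed).
    static List<String> unhealthy(Map<String, Integer> statusByName) {
        List<String> failed = new ArrayList<>();
        statusByName.forEach((name, status) -> {
            if (status != 200) {
                failed.add(name);
            }
        });
        return failed;
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(3)).build();
        Map<String, Integer> statuses = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : ENDPOINTS.entrySet()) {
            int status;
            try {
                status = client.send(HttpRequest.newBuilder(URI.create(entry.getValue())).build(),
                        HttpResponse.BodyHandlers.discarding()).statusCode();
            } catch (IOException e) {
                status = -1; // connection refused, connect timeout, etc.
            }
            statuses.put(entry.getKey(), status);
        }
        System.out.println("status codes: " + statuses);
        System.out.println("not healthy: " + unhealthy(statuses));
    }
}
```

Spring Boot's health endpoint answers 503 rather than 200 when the app reports DOWN, so a non-200 code there points at the app itself rather than at the network.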