Logs module
## 🦌 Centralized Logging Stack Integration ### ELK Stack Online - Added **`elasticsearch`**, **`logstash`**, and **`kibana`** services to `docker-compose.yml`: - **Elasticsearch** for log storage and indexing with persistent volumes. - **Logstash** as the GELF entrypoint, handling log ingestion and transformation. - **Kibana** as the web UI for log exploration, dashboards, and saved searches. - Each ELK service is wired with: - **Persistent storage** to survive restarts. - **Environment variables** for credentials and tuning. - **Bootstrap scripts** to perform initial setup (policies, templates, dashboards, etc.). ### Global GELF Logging - All existing services now use the **GELF logging driver** in `docker-compose.yml`: - Containers send their logs to **Logstash** instead of stdout-only. - Provides **structured**, centralized logs ready for querying in Elasticsearch/Kibana. - Result: no more log hunting across containers — everything lands in one searchable place. --- ## 🔁 Log Lifecycle & Visualization Automation ### Elasticsearch & Kibana Bootstrap - Introduced **bootstrap scripts and config files** to automate: - **Index Lifecycle Management (ILM)** policies for log retention and rollover. - **Index templates** for log indices (naming, mappings, and settings). - **Kibana imports** (index patterns / data views, dashboards, visualizations). - This turns ELK setup from a manual ritual into a **single-command provisioning step**. ### Logstash Pipeline Upgrade - Added a **Logstash pipeline configuration** to: - Ingest **GELF** logs from Docker. - **Normalize/rename fields** for consistent querying across services. - Index logs into **Elasticsearch** with **daily rotation per container** pattern. - Outcome: logs are structured, tagged by container, and auto-rotated to keep storage sane. --- ## 🛠 Makefile & Docker.mk Enhancements ### Logs Setup Targets - Added a new **`logs`** target in `Makefile` (with `.PHONY` declaration) to manage logging setup from the top level. - Added a **`logs-setup`** target in `Docker.mk` to: - Initialize **ILM policies** in Elasticsearch. - Apply **index templates** for logs. - Create **Kibana index patterns** so logs are immediately visible in the UI. - These targets plug into the existing tooling, making logging setup part of the **standard dev/ops workflow**. --- ## 🔐 Environment Configuration ### Secure Elasticsearch Access - Updated `env.example` to include: - **`ELASTIC_PASSWORD`**: central password for Elasticsearch authentication. - Encourages **secure-by-default** deployments and aligns local/dev with production-style security. --- ## 📈 Monitoring Configuration Updates ### Grafana Alerting & Prometheus Cleanup - Added a **basic alerting policy for Grafana**: - Provides a default routing tree for alerts. - Acts as a foundation for future, more granular alert rules. - Cleaned up **Prometheus scrape configuration**: - Removed obsolete backend scrape targets. - Keeps monitoring config focused on **live** and relevant services.
This commit is contained in:
commit
e44a3af76d
13 changed files with 998 additions and 14 deletions
24
Docker.mk
24
Docker.mk
|
|
@ -6,10 +6,12 @@
|
|||
# By: maiboyer <maiboyer@student.42.fr> +#+ +:+ +#+ #
|
||||
# +#+#+#+#+#+ +#+ #
|
||||
# Created: 2025/06/11 18:10:26 by maiboyer #+# #+# #
|
||||
# Updated: 2025/07/30 19:32:11 by maiboyer ### ########.fr #
|
||||
# Updated: 2025/11/14 18:54:16 by maiboyer ### ########.fr #
|
||||
# #
|
||||
# **************************************************************************** #
|
||||
|
||||
.PHONY: logs
|
||||
|
||||
all: build
|
||||
docker compose up -d
|
||||
|
||||
|
|
@ -39,3 +41,23 @@ prune: clean
|
|||
-docker network prune
|
||||
-docker system prune -a
|
||||
|
||||
ES_URL ?= http://local.maix.me:9200
|
||||
KIBANA_URL ?= http://local.maix.me:5601
|
||||
|
||||
logs-setup:
|
||||
@until curl -s "$(ES_URL)" > /dev/null 2>&1; do sleep 1; done;
|
||||
|
||||
@curl -s -X PUT "$(ES_URL)/_ilm/policy/docker-logs-policy" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"policy":{"phases":{"hot":{"actions":{}},"delete":{"min_age":"7d","actions":{"delete":{}}}}}}' > /dev/null
|
||||
|
||||
@curl -s -X PUT "$(ES_URL)/_template/docker-logs-template" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"index_patterns":["docker-*"],"settings":{"index.lifecycle.name":"docker-logs-policy"}}' > /dev/null
|
||||
|
||||
@until curl -s "$(KIBANA_URL)/api/status" > /dev/null 2>&1; do sleep 1; done;
|
||||
|
||||
@curl -s -X POST "$(KIBANA_URL)/api/saved_objects/index-pattern/docker-logs" \
|
||||
-H "kbn-xsrf: true" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"attributes":{"title":"docker-*","timeFieldName":"@timestamp"}}' > /dev/null
|
||||
|
|
|
|||
6
Makefile
6
Makefile
|
|
@ -1,4 +1,4 @@
|
|||
# **************************************************************************** #make
|
||||
# **************************************************************************** #
|
||||
# #
|
||||
# ::: :::::::: #
|
||||
# Makefile :+: :+: :+: #
|
||||
|
|
@ -6,7 +6,7 @@
|
|||
# By: rparodi <rparodi@student.42.fr> +#+ +:+ +#+ #
|
||||
# +#+#+#+#+#+ +#+ #
|
||||
# Created: 2023/11/12 11:05:05 by rparodi #+# #+# #
|
||||
# Updated: 2025/11/10 01:05:11 by maiboyer ### ########.fr #
|
||||
# Updated: 2025/11/14 17:40:57 by maiboyer ### ########.fr #
|
||||
# #
|
||||
# **************************************************************************** #
|
||||
|
||||
|
|
@ -157,4 +157,4 @@ fnginx: nginx-dev/nginx-selfsigned.crt nginx-dev/nginx-selfsigned.key
|
|||
wait
|
||||
|
||||
# phony
|
||||
.PHONY: all clean fclean re header footer npm@install npm@clean npm@fclean npm@build sql tmux
|
||||
.PHONY: all clean fclean re header footer npm@install npm@clean npm@fclean npm@build sql tmux logs
|
||||
|
|
|
|||
|
|
@ -15,6 +15,11 @@ services:
|
|||
- transcendance-network
|
||||
volumes:
|
||||
- static-volume:/volumes/static
|
||||
logging:
|
||||
driver: gelf
|
||||
options:
|
||||
gelf-address: "udp://127.0.0.1:12201"
|
||||
tag: "{{.Name}}"
|
||||
|
||||
#
|
||||
# The "entry point" as in it does all of this:
|
||||
|
|
@ -37,6 +42,11 @@ services:
|
|||
environment:
|
||||
# this can stay the same for developpement. This is an alias to `localhost`
|
||||
- NGINX_DOMAIN=local.maix.me
|
||||
logging:
|
||||
driver: gelf
|
||||
options:
|
||||
gelf-address: "udp://127.0.0.1:12201"
|
||||
tag: "{{.Name}}"
|
||||
|
||||
###############
|
||||
# ICONS #
|
||||
|
|
@ -58,6 +68,11 @@ services:
|
|||
- JWT_SECRET=KRUGKIDROVUWG2ZAMJZG653OEBTG66BANJ2W24DTEBXXMZLSEB2GQZJANRQXU6JA
|
||||
- USER_ICONS_STORE=/volumes/store
|
||||
- DATABASE_DIR=/volumes/database
|
||||
logging:
|
||||
driver: gelf
|
||||
options:
|
||||
gelf-address: "udp://127.0.0.1:12201"
|
||||
tag: "{{.Name}}"
|
||||
|
||||
|
||||
###############
|
||||
|
|
@ -80,7 +95,12 @@ services:
|
|||
- JWT_SECRET=KRUGKIDROVUWG2ZAMJZG653OEBTG66BANJ2W24DTEBXXMZLSEB2GQZJANRQXU6JA
|
||||
- DATABASE_DIR=/volumes/database
|
||||
- PROVIDER_FILE=/extra/providers.toml
|
||||
|
||||
logging:
|
||||
driver: gelf
|
||||
options:
|
||||
gelf-address: "udp://127.0.0.1:12201"
|
||||
tag: "{{.Name}}"
|
||||
|
||||
|
||||
###############
|
||||
# CHAT #
|
||||
|
|
@ -123,7 +143,11 @@ services:
|
|||
environment:
|
||||
- JWT_SECRET=KRUGKIDROVUWG2ZAMJZG653OEBTG66BANJ2W24DTEBXXMZLSEB2GQZJANRQXU6JA
|
||||
- DATABASE_DIR=/volumes/database
|
||||
|
||||
logging:
|
||||
driver: gelf
|
||||
options:
|
||||
gelf-address: "udp://127.0.0.1:12201"
|
||||
tag: "{{.Name}}"
|
||||
|
||||
|
||||
###############
|
||||
|
|
@ -154,6 +178,11 @@ services:
|
|||
- GF_SERVER_ROOT_URL=http://local.maix.me:3000
|
||||
- GF_SECURITY_ADMIN_USER=${GRAFANA_ADMIN_USER}
|
||||
- GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASS}
|
||||
logging:
|
||||
driver: gelf
|
||||
options:
|
||||
gelf-address: "udp://127.0.0.1:12201"
|
||||
tag: "{{.Name}}"
|
||||
|
||||
prometheus:
|
||||
image: prom/prometheus:latest
|
||||
|
|
@ -164,6 +193,11 @@ services:
|
|||
volumes:
|
||||
- ./monitoring/prometheus:/etc/prometheus/
|
||||
restart: unless-stopped
|
||||
logging:
|
||||
driver: gelf
|
||||
options:
|
||||
gelf-address: "udp://127.0.0.1:12201"
|
||||
tag: "{{.Name}}"
|
||||
|
||||
cadvisor:
|
||||
image: gcr.io/cadvisor/cadvisor:latest
|
||||
|
|
@ -178,6 +212,12 @@ services:
|
|||
- /sys:/sys:ro
|
||||
- /var/lib/docker/:/var/lib/docker:ro
|
||||
restart: unless-stopped
|
||||
logging:
|
||||
driver: gelf
|
||||
options:
|
||||
gelf-address: "udp://127.0.0.1:12201"
|
||||
tag: "{{.Name}}"
|
||||
|
||||
|
||||
blackbox:
|
||||
image: prom/blackbox-exporter:latest
|
||||
|
|
@ -187,9 +227,70 @@ services:
|
|||
ports:
|
||||
- "9115:9115"
|
||||
restart: unless-stopped
|
||||
logging:
|
||||
driver: gelf
|
||||
options:
|
||||
gelf-address: "udp://127.0.0.1:12201"
|
||||
tag: "{{.Name}}"
|
||||
|
||||
|
||||
|
||||
###############
|
||||
# LOGS #
|
||||
###############
|
||||
|
||||
elasticsearch:
|
||||
image: docker.elastic.co/elasticsearch/elasticsearch:7.17.23
|
||||
container_name: logs-elasticsearch
|
||||
networks:
|
||||
- monitoring
|
||||
environment:
|
||||
- discovery.type=single-node
|
||||
- ES_JAVA_OPTS=-Xms512m -Xmx512m
|
||||
- ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
|
||||
volumes:
|
||||
- elastic-data:/usr/share/elasticsearch/data
|
||||
- ./logs/elasticsearch:/setup
|
||||
ports:
|
||||
- "9200:9200"
|
||||
command: ["/setup/bootstrap.sh"]
|
||||
restart: unless-stopped
|
||||
|
||||
logstash:
|
||||
image: docker.elastic.co/logstash/logstash:7.17.23
|
||||
container_name: logs-logstash
|
||||
depends_on:
|
||||
- elasticsearch
|
||||
networks:
|
||||
- monitoring
|
||||
volumes:
|
||||
- ./logs/logstash/pipeline:/usr/share/logstash/pipeline
|
||||
ports:
|
||||
- "12201:12201/udp"
|
||||
restart: unless-stopped
|
||||
|
||||
kibana:
|
||||
image: docker.elastic.co/kibana/kibana:7.17.23
|
||||
container_name: logs-kibana
|
||||
depends_on:
|
||||
- elasticsearch
|
||||
networks:
|
||||
- monitoring
|
||||
environment:
|
||||
- ELASTICSEARCH_HOSTS=http://elasticsearch:9200
|
||||
- SERVER_PUBLICBASEURL=http://local.maix.me:5601
|
||||
- ELASTICSEARCH_USERNAME=elastic
|
||||
- ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
|
||||
ports:
|
||||
- "5601:5601"
|
||||
volumes:
|
||||
- ./logs/kibana:/setup
|
||||
command: ["/setup/bootstrap.sh"]
|
||||
restart: unless-stopped
|
||||
|
||||
volumes:
|
||||
images-volume:
|
||||
sqlite-volume:
|
||||
static-volume:
|
||||
grafana-data:
|
||||
elastic-data:
|
||||
|
|
|
|||
|
|
@ -1,3 +1,5 @@
|
|||
GRAFANA_ADMIN_USER=""
|
||||
GRAFANA_ADMIN_PASS=""
|
||||
GRAFANA_WEBHOOK_URL=""
|
||||
GRAFANA_ADMIN_USER=
|
||||
GRAFANA_ADMIN_PASS=
|
||||
GRAFANA_WEBHOOK_URL=
|
||||
|
||||
ELASTIC_PASSWORD=
|
||||
|
|
|
|||
19
logs/elasticsearch/bootstrap.sh
Executable file
19
logs/elasticsearch/bootstrap.sh
Executable file
|
|
@ -0,0 +1,19 @@
|
|||
#!/bin/sh
|
||||
|
||||
setup_ilm() {
|
||||
set -xe
|
||||
until curl -s -f http://localhost:9200 >/dev/null; do
|
||||
sleep 2;
|
||||
done;
|
||||
|
||||
curl -v -X PUT "localhost:9200/_ilm/policy/docker-logs-policy" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '@/setup/docker-logs-policy.json'
|
||||
curl -v -X PUT "localhost:9200/_template/docker-logs-template" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '@/setup/docker-logs-template.json'
|
||||
exit 0
|
||||
}
|
||||
|
||||
setup_ilm &
|
||||
exec /usr/local/bin/docker-entrypoint.sh eswrapper
|
||||
15
logs/elasticsearch/docker-logs-policy.json
Normal file
15
logs/elasticsearch/docker-logs-policy.json
Normal file
|
|
@ -0,0 +1,15 @@
|
|||
{
|
||||
"policy": {
|
||||
"phases": {
|
||||
"hot": {
|
||||
"actions": {}
|
||||
},
|
||||
"delete": {
|
||||
"min_age": "7d",
|
||||
"actions": {
|
||||
"delete": {}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
1
logs/elasticsearch/docker-logs-template.json
Normal file
1
logs/elasticsearch/docker-logs-template.json
Normal file
|
|
@ -0,0 +1 @@
|
|||
{"index_patterns":["docker-*"],"settings":{"index.lifecycle.name":"docker-logs-policy"}}}
|
||||
15
logs/kibana/bootstrap.sh
Executable file
15
logs/kibana/bootstrap.sh
Executable file
|
|
@ -0,0 +1,15 @@
|
|||
#!/bin/sh
|
||||
|
||||
kibana_setup() {
|
||||
set -xe
|
||||
until curl -s -f "localhost:5601/api/status"; do
|
||||
sleep 2
|
||||
done
|
||||
|
||||
curl -v -X POST "localhost:5601/api/saved_objects/_import?overwrite=true" \
|
||||
-H "kbn-xsrf: true" \
|
||||
--form file='@/setup/export.ndjson'
|
||||
exit 0
|
||||
}
|
||||
kibana_setup &
|
||||
exec /usr/local/bin/kibana-docker
|
||||
5
logs/kibana/export.ndjson
Normal file
5
logs/kibana/export.ndjson
Normal file
|
|
@ -0,0 +1,5 @@
|
|||
{"attributes":{"buildNum":47645,"defaultIndex":"docker-logs","defaultRoute":"/app/dashboards#/view/f1356840-c17c-11f0-92fb-4711317b9bee"},"coreMigrationVersion":"7.17.23","id":"7.17.23","migrationVersion":{"config":"7.13.0"},"references":[],"type":"config","updated_at":"2025-11-14T17:29:48.539Z","version":"WzE0Miw0XQ=="}
|
||||
{"attributes":{"fieldAttrs":"{\"@timestamp\":{\"count\":3},\"command\":{\"count\":2},\"container_name\":{\"count\":1},\"level\":{\"count\":1},\"message\":{\"count\":1}}","fields":"[]","runtimeFieldMap":"{}","timeFieldName":"@timestamp","title":"docker-*","typeMeta":"{}"},"coreMigrationVersion":"7.17.23","id":"docker-logs","migrationVersion":{"index-pattern":"7.11.0"},"references":[],"type":"index-pattern","updated_at":"2025-11-14T17:26:47.450Z","version":"Wzc0LDRd"}
|
||||
{"attributes":{"columns":["container_name","message","level"],"description":"test","grid":{},"hideChart":false,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[],\"indexRefName\":\"kibanaSavedObjectMeta.searchSourceJSON.index\"}"},"sort":[["@timestamp","asc"]],"title":"LogTable"},"coreMigrationVersion":"7.17.23","id":"b5a48950-c17c-11f0-92fb-4711317b9bee","migrationVersion":{"search":"7.9.3"},"references":[{"id":"docker-logs","name":"kibanaSavedObjectMeta.searchSourceJSON.index","type":"index-pattern"}],"type":"search","updated_at":"2025-11-14T17:26:47.450Z","version":"Wzc1LDRd"}
|
||||
{"attributes":{"description":"","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"optionsJSON":"{\"useMargins\":true,\"syncColors\":false,\"hidePanelTitles\":false}","panelsJSON":"[{\"version\":\"7.17.23\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":0,\"w\":24,\"h\":21,\"i\":\"9600aa15-1732-41da-a43c-723fdb1a97a0\"},\"panelIndex\":\"9600aa15-1732-41da-a43c-723fdb1a97a0\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"visualizationType\":\"lnsXY\",\"type\":\"lens\",\"references\":[{\"type\":\"index-pattern\",\"id\":\"docker-logs\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"docker-logs\",\"name\":\"indexpattern-datasource-layer-7b411268-3ed2-45f6-9067-b88364aba992\"}],\"state\":{\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"labelsOrientation\":{\"x\":0,\"yLeft\":0,\"yRight\":0},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"bar_stacked\",\"layers\":[{\"layerId\":\"7b411268-3ed2-45f6-9067-b88364aba992\",\"accessors\":[\"27ad7775-f44f-4d6c-b49d-5f8bebee33af\"],\"position\":\"top\",\"seriesType\":\"bar\",\"showGridlines\":false,\"layerType\":\"data\",\"xAccessor\":\"e4e3a367-7cd4-4ad6-95a7-824f0717503d\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[],\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"7b411268-3ed2-45f6-9067-b88364aba992\":{\"columns\":{\"e4e3a367-7cd4-4ad6-95a7-824f0717503d\":{\"label\":\"Top values of container_name.keyword\",\"dataType\":\"string\",\"operationType\":\"terms\",\"scale\":\"ordinal\",\"sourceField\":\"container_name.keyword\",\"isBucketed\":true,\"params\":{\"size\":5,\"orderBy\":{\"type\":\"column\",\"columnId\":\"27ad7775-f44f-4d6c-b49d-5f8bebee33af\"},\"orderDirection\":\"desc\",\"otherBucket\":true,\"missingBucket\":false}},\"27ad7775-f44f-4d6c-b49d-5f8bebee33af\":{\"label\":\"Count of records\",\"dataType\":\"number\",\"operationType\":\"count\",\"isBucketed\":false,\"scale\":\"ratio\",\"sourceField\":\"Records\"}},\"columnOrder\":[\"e4e3a367-7cd4-4ad6-95a7-824f0717503d\",\"27ad7775-f44f-4d6c-b49d-5f8bebee33af\"],\"incompleteColumns\":{}}}}}}},\"enhancements\":{},\"hidePanelTitles\":false},\"title\":\"Log Count\"},{\"version\":\"7.17.23\",\"type\":\"search\",\"gridData\":{\"x\":24,\"y\":0,\"w\":24,\"h\":21,\"i\":\"08f56117-4041-4282-af91-99a44941e06d\"},\"panelIndex\":\"08f56117-4041-4282-af91-99a44941e06d\",\"embeddableConfig\":{\"enhancements\":{},\"hidePanelTitles\":false},\"title\":\"Log Management\",\"panelRefName\":\"panel_08f56117-4041-4282-af91-99a44941e06d\"}]","timeRestore":false,"title":"Default","version":1},"coreMigrationVersion":"7.17.23","id":"f1356840-c17c-11f0-92fb-4711317b9bee","migrationVersion":{"dashboard":"7.17.3"},"references":[{"id":"docker-logs","name":"9600aa15-1732-41da-a43c-723fdb1a97a0:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"docker-logs","name":"9600aa15-1732-41da-a43c-723fdb1a97a0:indexpattern-datasource-layer-7b411268-3ed2-45f6-9067-b88364aba992","type":"index-pattern"},{"id":"b5a48950-c17c-11f0-92fb-4711317b9bee","name":"08f56117-4041-4282-af91-99a44941e06d:panel_08f56117-4041-4282-af91-99a44941e06d","type":"search"}],"type":"dashboard","updated_at":"2025-11-14T17:26:47.450Z","version":"Wzc2LDRd"}
|
||||
{"excludedObjects":[],"excludedObjectsCount":0,"exportedCount":4,"missingRefCount":0,"missingReferences":[]}
|
||||
22
logs/logstash/pipeline/logstash.conf
Normal file
22
logs/logstash/pipeline/logstash.conf
Normal file
|
|
@ -0,0 +1,22 @@
|
|||
input {
|
||||
gelf {
|
||||
port => 12201
|
||||
}
|
||||
}
|
||||
|
||||
filter {
|
||||
mutate {
|
||||
rename => { "[full_message]" => "message" }
|
||||
}
|
||||
}
|
||||
|
||||
output {
|
||||
elasticsearch {
|
||||
hosts => ["http://elasticsearch:9200"]
|
||||
index => "docker-%{[container_name]}-%{+YYYY.MM.dd}"
|
||||
}
|
||||
|
||||
stdout {
|
||||
codec => rubydebug
|
||||
}
|
||||
}
|
||||
5
monitoring/grafana/alerting/policies.yaml
Normal file
5
monitoring/grafana/alerting/policies.yaml
Normal file
|
|
@ -0,0 +1,5 @@
|
|||
routes:
|
||||
receiver: discord-webhook
|
||||
routes:
|
||||
- matchers:
|
||||
receiver: discord-webhook
|
||||
782
monitoring/grafana/alerting/rules.yaml
Normal file
782
monitoring/grafana/alerting/rules.yaml
Normal file
|
|
@ -0,0 +1,782 @@
|
|||
apiVersion: 1
|
||||
groups:
|
||||
- orgId: 1
|
||||
name: availability
|
||||
folder: alert_rules.yml
|
||||
interval: 1m
|
||||
rules:
|
||||
- uid: 14db4fe7-faf3-5629-9ee1-c5c189d75fec
|
||||
title: InstanceDown
|
||||
condition: threshold
|
||||
data:
|
||||
- refId: query
|
||||
queryType: prometheus
|
||||
relativeTimeRange:
|
||||
from: 660
|
||||
to: 60
|
||||
datasourceUid: prometheus
|
||||
model:
|
||||
datasource:
|
||||
type: prometheus
|
||||
uid: prometheus
|
||||
expr: up == 0
|
||||
instant: true
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
range: false
|
||||
refId: query
|
||||
- refId: prometheus_math
|
||||
queryType: math
|
||||
datasourceUid: __expr__
|
||||
model:
|
||||
datasource:
|
||||
IsPrunable: false
|
||||
access: ""
|
||||
apiVersion: ""
|
||||
basicAuth: false
|
||||
basicAuthUser: ""
|
||||
created: "0001-01-01T00:00:00Z"
|
||||
database: ""
|
||||
id: -100
|
||||
isDefault: false
|
||||
jsonData: {}
|
||||
name: __expr__
|
||||
readOnly: false
|
||||
secureJsonData: {}
|
||||
type: __expr__
|
||||
uid: __expr__
|
||||
updated: "0001-01-01T00:00:00Z"
|
||||
url: ""
|
||||
user: ""
|
||||
withCredentials: false
|
||||
expression: is_number($query) || is_nan($query) || is_inf($query)
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
refId: prometheus_math
|
||||
type: math
|
||||
- refId: threshold
|
||||
queryType: threshold
|
||||
datasourceUid: __expr__
|
||||
model:
|
||||
conditions:
|
||||
- evaluator:
|
||||
params:
|
||||
- 0
|
||||
type: gt
|
||||
datasource:
|
||||
IsPrunable: false
|
||||
access: ""
|
||||
apiVersion: ""
|
||||
basicAuth: false
|
||||
basicAuthUser: ""
|
||||
created: "0001-01-01T00:00:00Z"
|
||||
database: ""
|
||||
id: -100
|
||||
isDefault: false
|
||||
jsonData: {}
|
||||
name: __expr__
|
||||
readOnly: false
|
||||
secureJsonData: {}
|
||||
type: __expr__
|
||||
uid: __expr__
|
||||
updated: "0001-01-01T00:00:00Z"
|
||||
url: ""
|
||||
user: ""
|
||||
withCredentials: false
|
||||
expression: prometheus_math
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
refId: threshold
|
||||
type: threshold
|
||||
noDataState: OK
|
||||
execErrState: OK
|
||||
for: 1m
|
||||
annotations:
|
||||
description: |
|
||||
Instance {{ $labels.instance }} (job={{ $labels.job }}) has not responded to Prometheus scrapes for more than one minute.
|
||||
summary: Instance {{ $labels.job }} down
|
||||
labels:
|
||||
__converted_prometheus_rule__: "true"
|
||||
severity: critical
|
||||
isPaused: false
|
||||
missing_series_evals_to_resolve: 1
|
||||
- orgId: 1
|
||||
name: blackbox-probes
|
||||
folder: alert_rules.yml
|
||||
interval: 1m
|
||||
rules:
|
||||
- uid: c549c658-ce15-5d56-9842-07730bb11e15
|
||||
title: BlackboxProbeFailed
|
||||
condition: threshold
|
||||
data:
|
||||
- refId: query
|
||||
queryType: prometheus
|
||||
relativeTimeRange:
|
||||
from: 660
|
||||
to: 60
|
||||
datasourceUid: prometheus
|
||||
model:
|
||||
datasource:
|
||||
type: prometheus
|
||||
uid: prometheus
|
||||
expr: probe_success == 0
|
||||
instant: true
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
range: false
|
||||
refId: query
|
||||
- refId: prometheus_math
|
||||
queryType: math
|
||||
datasourceUid: __expr__
|
||||
model:
|
||||
datasource:
|
||||
IsPrunable: false
|
||||
access: ""
|
||||
apiVersion: ""
|
||||
basicAuth: false
|
||||
basicAuthUser: ""
|
||||
created: "0001-01-01T00:00:00Z"
|
||||
database: ""
|
||||
id: -100
|
||||
isDefault: false
|
||||
jsonData: {}
|
||||
name: __expr__
|
||||
readOnly: false
|
||||
secureJsonData: {}
|
||||
type: __expr__
|
||||
uid: __expr__
|
||||
updated: "0001-01-01T00:00:00Z"
|
||||
url: ""
|
||||
user: ""
|
||||
withCredentials: false
|
||||
expression: is_number($query) || is_nan($query) || is_inf($query)
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
refId: prometheus_math
|
||||
type: math
|
||||
- refId: threshold
|
||||
queryType: threshold
|
||||
datasourceUid: __expr__
|
||||
model:
|
||||
conditions:
|
||||
- evaluator:
|
||||
params:
|
||||
- 0
|
||||
type: gt
|
||||
datasource:
|
||||
IsPrunable: false
|
||||
access: ""
|
||||
apiVersion: ""
|
||||
basicAuth: false
|
||||
basicAuthUser: ""
|
||||
created: "0001-01-01T00:00:00Z"
|
||||
database: ""
|
||||
id: -100
|
||||
isDefault: false
|
||||
jsonData: {}
|
||||
name: __expr__
|
||||
readOnly: false
|
||||
secureJsonData: {}
|
||||
type: __expr__
|
||||
uid: __expr__
|
||||
updated: "0001-01-01T00:00:00Z"
|
||||
url: ""
|
||||
user: ""
|
||||
withCredentials: false
|
||||
expression: prometheus_math
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
refId: threshold
|
||||
type: threshold
|
||||
noDataState: OK
|
||||
execErrState: OK
|
||||
for: 30s
|
||||
annotations:
|
||||
description: |
|
||||
The Blackbox probe for {{ $labels.instance }} has failed (probe_success = 0).
|
||||
summary: Blackbox probe failed
|
||||
labels:
|
||||
__converted_prometheus_rule__: "true"
|
||||
severity: critical
|
||||
isPaused: false
|
||||
missing_series_evals_to_resolve: 1
|
||||
- uid: 78a2ece6-4f7a-5496-9a59-6de4a56db201
|
||||
title: BlackboxHighLatency
|
||||
condition: threshold
|
||||
data:
|
||||
- refId: query
|
||||
queryType: prometheus
|
||||
relativeTimeRange:
|
||||
from: 660
|
||||
to: 60
|
||||
datasourceUid: prometheus
|
||||
model:
|
||||
datasource:
|
||||
type: prometheus
|
||||
uid: prometheus
|
||||
expr: probe_duration_seconds > 1
|
||||
instant: true
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
range: false
|
||||
refId: query
|
||||
- refId: prometheus_math
|
||||
queryType: math
|
||||
datasourceUid: __expr__
|
||||
model:
|
||||
datasource:
|
||||
IsPrunable: false
|
||||
access: ""
|
||||
apiVersion: ""
|
||||
basicAuth: false
|
||||
basicAuthUser: ""
|
||||
created: "0001-01-01T00:00:00Z"
|
||||
database: ""
|
||||
id: -100
|
||||
isDefault: false
|
||||
jsonData: {}
|
||||
name: __expr__
|
||||
readOnly: false
|
||||
secureJsonData: {}
|
||||
type: __expr__
|
||||
uid: __expr__
|
||||
updated: "0001-01-01T00:00:00Z"
|
||||
url: ""
|
||||
user: ""
|
||||
withCredentials: false
|
||||
expression: is_number($query) || is_nan($query) || is_inf($query)
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
refId: prometheus_math
|
||||
type: math
|
||||
- refId: threshold
|
||||
queryType: threshold
|
||||
datasourceUid: __expr__
|
||||
model:
|
||||
conditions:
|
||||
- evaluator:
|
||||
params:
|
||||
- 0
|
||||
type: gt
|
||||
datasource:
|
||||
IsPrunable: false
|
||||
access: ""
|
||||
apiVersion: ""
|
||||
basicAuth: false
|
||||
basicAuthUser: ""
|
||||
created: "0001-01-01T00:00:00Z"
|
||||
database: ""
|
||||
id: -100
|
||||
isDefault: false
|
||||
jsonData: {}
|
||||
name: __expr__
|
||||
readOnly: false
|
||||
secureJsonData: {}
|
||||
type: __expr__
|
||||
uid: __expr__
|
||||
updated: "0001-01-01T00:00:00Z"
|
||||
url: ""
|
||||
user: ""
|
||||
withCredentials: false
|
||||
expression: prometheus_math
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
refId: threshold
|
||||
type: threshold
|
||||
noDataState: OK
|
||||
execErrState: OK
|
||||
for: 2m
|
||||
annotations:
|
||||
description: |
|
||||
The Blackbox probe to {{ $labels.instance }} has been taking more than 1 second to respond for over 2 minutes.
|
||||
summary: High latency on a Blackbox probe
|
||||
labels:
|
||||
__converted_prometheus_rule__: "true"
|
||||
severity: warning
|
||||
isPaused: false
|
||||
missing_series_evals_to_resolve: 1
|
||||
- uid: 00b5d799-0eef-59e9-9371-2a0bfb7df19b
|
||||
title: BlackboxBadHTTPStatus
|
||||
condition: threshold
|
||||
data:
|
||||
- refId: query
|
||||
queryType: prometheus
|
||||
relativeTimeRange:
|
||||
from: 660
|
||||
to: 60
|
||||
datasourceUid: prometheus
|
||||
model:
|
||||
datasource:
|
||||
type: prometheus
|
||||
uid: prometheus
|
||||
expr: probe_http_status_code != 200
|
||||
instant: true
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
range: false
|
||||
refId: query
|
||||
- refId: prometheus_math
|
||||
queryType: math
|
||||
datasourceUid: __expr__
|
||||
model:
|
||||
datasource:
|
||||
IsPrunable: false
|
||||
access: ""
|
||||
apiVersion: ""
|
||||
basicAuth: false
|
||||
basicAuthUser: ""
|
||||
created: "0001-01-01T00:00:00Z"
|
||||
database: ""
|
||||
id: -100
|
||||
isDefault: false
|
||||
jsonData: {}
|
||||
name: __expr__
|
||||
readOnly: false
|
||||
secureJsonData: {}
|
||||
type: __expr__
|
||||
uid: __expr__
|
||||
updated: "0001-01-01T00:00:00Z"
|
||||
url: ""
|
||||
user: ""
|
||||
withCredentials: false
|
||||
expression: is_number($query) || is_nan($query) || is_inf($query)
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
refId: prometheus_math
|
||||
type: math
|
||||
- refId: threshold
|
||||
queryType: threshold
|
||||
datasourceUid: __expr__
|
||||
model:
|
||||
conditions:
|
||||
- evaluator:
|
||||
params:
|
||||
- 0
|
||||
type: gt
|
||||
datasource:
|
||||
IsPrunable: false
|
||||
access: ""
|
||||
apiVersion: ""
|
||||
basicAuth: false
|
||||
basicAuthUser: ""
|
||||
created: "0001-01-01T00:00:00Z"
|
||||
database: ""
|
||||
id: -100
|
||||
isDefault: false
|
||||
jsonData: {}
|
||||
name: __expr__
|
||||
readOnly: false
|
||||
secureJsonData: {}
|
||||
type: __expr__
|
||||
uid: __expr__
|
||||
updated: "0001-01-01T00:00:00Z"
|
||||
url: ""
|
||||
user: ""
|
||||
withCredentials: false
|
||||
expression: prometheus_math
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
refId: threshold
|
||||
type: threshold
|
||||
noDataState: OK
|
||||
execErrState: OK
|
||||
for: 1m
|
||||
annotations:
|
||||
description: |
|
||||
The Blackbox probe to {{ $labels.instance }} is returning HTTP status {{ $value }} different from 200.
|
||||
summary: Bad HTTP status code on a Blackbox probe
|
||||
labels:
|
||||
__converted_prometheus_rule__: "true"
|
||||
severity: warning
|
||||
isPaused: false
|
||||
missing_series_evals_to_resolve: 1
|
||||
- orgId: 1
|
||||
name: container-resources
|
||||
folder: alert_rules.yml
|
||||
interval: 1m
|
||||
rules:
|
||||
- uid: 985c697f-e309-524c-9cd4-650a2045c279
|
||||
title: HighGlobalCPUUsage
|
||||
condition: threshold
|
||||
data:
|
||||
- refId: query
|
||||
queryType: prometheus
|
||||
relativeTimeRange:
|
||||
from: 660
|
||||
to: 60
|
||||
datasourceUid: prometheus
|
||||
model:
|
||||
datasource:
|
||||
type: prometheus
|
||||
uid: prometheus
|
||||
expr: (sum(rate(container_cpu_user_seconds_total[5m])) * 100) > 80
|
||||
instant: true
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
range: false
|
||||
refId: query
|
||||
- refId: prometheus_math
|
||||
queryType: math
|
||||
datasourceUid: __expr__
|
||||
model:
|
||||
datasource:
|
||||
IsPrunable: false
|
||||
access: ""
|
||||
apiVersion: ""
|
||||
basicAuth: false
|
||||
basicAuthUser: ""
|
||||
created: "0001-01-01T00:00:00Z"
|
||||
database: ""
|
||||
id: -100
|
||||
isDefault: false
|
||||
jsonData: {}
|
||||
name: __expr__
|
||||
readOnly: false
|
||||
secureJsonData: {}
|
||||
type: __expr__
|
||||
uid: __expr__
|
||||
updated: "0001-01-01T00:00:00Z"
|
||||
url: ""
|
||||
user: ""
|
||||
withCredentials: false
|
||||
expression: is_number($query) || is_nan($query) || is_inf($query)
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
refId: prometheus_math
|
||||
type: math
|
||||
- refId: threshold
|
||||
queryType: threshold
|
||||
datasourceUid: __expr__
|
||||
model:
|
||||
conditions:
|
||||
- evaluator:
|
||||
params:
|
||||
- 0
|
||||
type: gt
|
||||
datasource:
|
||||
IsPrunable: false
|
||||
access: ""
|
||||
apiVersion: ""
|
||||
basicAuth: false
|
||||
basicAuthUser: ""
|
||||
created: "0001-01-01T00:00:00Z"
|
||||
database: ""
|
||||
id: -100
|
||||
isDefault: false
|
||||
jsonData: {}
|
||||
name: __expr__
|
||||
readOnly: false
|
||||
secureJsonData: {}
|
||||
type: __expr__
|
||||
uid: __expr__
|
||||
updated: "0001-01-01T00:00:00Z"
|
||||
url: ""
|
||||
user: ""
|
||||
withCredentials: false
|
||||
expression: prometheus_math
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
refId: threshold
|
||||
type: threshold
|
||||
noDataState: OK
|
||||
execErrState: OK
|
||||
for: 5m
|
||||
annotations:
|
||||
description: |
|
||||
Global CPU usage of containers has been above 80% for more than 5 minutes. Check which services are consuming the most resources.
|
||||
summary: High global CPU usage for containers
|
||||
labels:
|
||||
__converted_prometheus_rule__: "true"
|
||||
severity: warning
|
||||
isPaused: false
|
||||
missing_series_evals_to_resolve: 1
|
||||
- uid: 635d0ad1-10f2-51f4-9226-baf56557d870
|
||||
title: HighGlobalMemoryUsage
|
||||
condition: threshold
|
||||
data:
|
||||
- refId: query
|
||||
queryType: prometheus
|
||||
relativeTimeRange:
|
||||
from: 660
|
||||
to: 60
|
||||
datasourceUid: prometheus
|
||||
model:
|
||||
datasource:
|
||||
type: prometheus
|
||||
uid: prometheus
|
||||
expr: (sum(container_memory_usage_bytes) / sum(machine_memory_bytes)) * 100 > 80
|
||||
instant: true
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
range: false
|
||||
refId: query
|
||||
- refId: prometheus_math
|
||||
queryType: math
|
||||
datasourceUid: __expr__
|
||||
model:
|
||||
datasource:
|
||||
IsPrunable: false
|
||||
access: ""
|
||||
apiVersion: ""
|
||||
basicAuth: false
|
||||
basicAuthUser: ""
|
||||
created: "0001-01-01T00:00:00Z"
|
||||
database: ""
|
||||
id: -100
|
||||
isDefault: false
|
||||
jsonData: {}
|
||||
name: __expr__
|
||||
readOnly: false
|
||||
secureJsonData: {}
|
||||
type: __expr__
|
||||
uid: __expr__
|
||||
updated: "0001-01-01T00:00:00Z"
|
||||
url: ""
|
||||
user: ""
|
||||
withCredentials: false
|
||||
expression: is_number($query) || is_nan($query) || is_inf($query)
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
refId: prometheus_math
|
||||
type: math
|
||||
- refId: threshold
|
||||
queryType: threshold
|
||||
datasourceUid: __expr__
|
||||
model:
|
||||
conditions:
|
||||
- evaluator:
|
||||
params:
|
||||
- 0
|
||||
type: gt
|
||||
datasource:
|
||||
IsPrunable: false
|
||||
access: ""
|
||||
apiVersion: ""
|
||||
basicAuth: false
|
||||
basicAuthUser: ""
|
||||
created: "0001-01-01T00:00:00Z"
|
||||
database: ""
|
||||
id: -100
|
||||
isDefault: false
|
||||
jsonData: {}
|
||||
name: __expr__
|
||||
readOnly: false
|
||||
secureJsonData: {}
|
||||
type: __expr__
|
||||
uid: __expr__
|
||||
updated: "0001-01-01T00:00:00Z"
|
||||
url: ""
|
||||
user: ""
|
||||
withCredentials: false
|
||||
expression: prometheus_math
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
refId: threshold
|
||||
type: threshold
|
||||
noDataState: OK
|
||||
execErrState: OK
|
||||
for: 5m
|
||||
annotations:
|
||||
description: |
|
||||
Global memory usage of containers has been above 80% for more than 5 minutes.
|
||||
summary: High global memory usage for containers
|
||||
labels:
|
||||
__converted_prometheus_rule__: "true"
|
||||
severity: warning
|
||||
isPaused: false
|
||||
missing_series_evals_to_resolve: 1
|
||||
- orgId: 1
|
||||
name: per-container-resources
|
||||
folder: alert_rules.yml
|
||||
interval: 1m
|
||||
rules:
|
||||
- uid: 3daf3f51-d4ad-5169-ace2-cdc1c43d8e4e
|
||||
title: HighContainerCPUUsage
|
||||
condition: threshold
|
||||
data:
|
||||
- refId: query
|
||||
queryType: prometheus
|
||||
relativeTimeRange:
|
||||
from: 660
|
||||
to: 60
|
||||
datasourceUid: prometheus
|
||||
model:
|
||||
datasource:
|
||||
type: prometheus
|
||||
uid: prometheus
|
||||
expr: rate(container_cpu_user_seconds_total[5m]) * 100 > 80
|
||||
instant: true
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
range: false
|
||||
refId: query
|
||||
- refId: prometheus_math
|
||||
queryType: math
|
||||
datasourceUid: __expr__
|
||||
model:
|
||||
datasource:
|
||||
IsPrunable: false
|
||||
access: ""
|
||||
apiVersion: ""
|
||||
basicAuth: false
|
||||
basicAuthUser: ""
|
||||
created: "0001-01-01T00:00:00Z"
|
||||
database: ""
|
||||
id: -100
|
||||
isDefault: false
|
||||
jsonData: {}
|
||||
name: __expr__
|
||||
readOnly: false
|
||||
secureJsonData: {}
|
||||
type: __expr__
|
||||
uid: __expr__
|
||||
updated: "0001-01-01T00:00:00Z"
|
||||
url: ""
|
||||
user: ""
|
||||
withCredentials: false
|
||||
expression: is_number($query) || is_nan($query) || is_inf($query)
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
refId: prometheus_math
|
||||
type: math
|
||||
- refId: threshold
|
||||
queryType: threshold
|
||||
datasourceUid: __expr__
|
||||
model:
|
||||
conditions:
|
||||
- evaluator:
|
||||
params:
|
||||
- 0
|
||||
type: gt
|
||||
datasource:
|
||||
IsPrunable: false
|
||||
access: ""
|
||||
apiVersion: ""
|
||||
basicAuth: false
|
||||
basicAuthUser: ""
|
||||
created: "0001-01-01T00:00:00Z"
|
||||
database: ""
|
||||
id: -100
|
||||
isDefault: false
|
||||
jsonData: {}
|
||||
name: __expr__
|
||||
readOnly: false
|
||||
secureJsonData: {}
|
||||
type: __expr__
|
||||
uid: __expr__
|
||||
updated: "0001-01-01T00:00:00Z"
|
||||
url: ""
|
||||
user: ""
|
||||
withCredentials: false
|
||||
expression: prometheus_math
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
refId: threshold
|
||||
type: threshold
|
||||
noDataState: OK
|
||||
execErrState: OK
|
||||
for: 5m
|
||||
annotations:
|
||||
description: |
|
||||
Container {{ $labels.name }} has been using more than 80% CPU for more than 5 minutes.
|
||||
summary: High CPU usage on a container
|
||||
labels:
|
||||
__converted_prometheus_rule__: "true"
|
||||
severity: warning
|
||||
isPaused: false
|
||||
missing_series_evals_to_resolve: 1
|
||||
- uid: 3202077e-ba84-5401-86fe-0fe6b0a4c26d
|
||||
title: HighContainerMemoryUsage
|
||||
condition: threshold
|
||||
data:
|
||||
- refId: query
|
||||
queryType: prometheus
|
||||
relativeTimeRange:
|
||||
from: 660
|
||||
to: 60
|
||||
datasourceUid: prometheus
|
||||
model:
|
||||
datasource:
|
||||
type: prometheus
|
||||
uid: prometheus
|
||||
expr: container_memory_usage_bytes > 500 * 1024 * 1024
|
||||
instant: true
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
range: false
|
||||
refId: query
|
||||
- refId: prometheus_math
|
||||
queryType: math
|
||||
datasourceUid: __expr__
|
||||
model:
|
||||
datasource:
|
||||
IsPrunable: false
|
||||
access: ""
|
||||
apiVersion: ""
|
||||
basicAuth: false
|
||||
basicAuthUser: ""
|
||||
created: "0001-01-01T00:00:00Z"
|
||||
database: ""
|
||||
id: -100
|
||||
isDefault: false
|
||||
jsonData: {}
|
||||
name: __expr__
|
||||
readOnly: false
|
||||
secureJsonData: {}
|
||||
type: __expr__
|
||||
uid: __expr__
|
||||
updated: "0001-01-01T00:00:00Z"
|
||||
url: ""
|
||||
user: ""
|
||||
withCredentials: false
|
||||
expression: is_number($query) || is_nan($query) || is_inf($query)
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
refId: prometheus_math
|
||||
type: math
|
||||
- refId: threshold
|
||||
queryType: threshold
|
||||
datasourceUid: __expr__
|
||||
model:
|
||||
conditions:
|
||||
- evaluator:
|
||||
params:
|
||||
- 0
|
||||
type: gt
|
||||
datasource:
|
||||
IsPrunable: false
|
||||
access: ""
|
||||
apiVersion: ""
|
||||
basicAuth: false
|
||||
basicAuthUser: ""
|
||||
created: "0001-01-01T00:00:00Z"
|
||||
database: ""
|
||||
id: -100
|
||||
isDefault: false
|
||||
jsonData: {}
|
||||
name: __expr__
|
||||
readOnly: false
|
||||
secureJsonData: {}
|
||||
type: __expr__
|
||||
uid: __expr__
|
||||
updated: "0001-01-01T00:00:00Z"
|
||||
url: ""
|
||||
user: ""
|
||||
withCredentials: false
|
||||
expression: prometheus_math
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
refId: threshold
|
||||
type: threshold
|
||||
noDataState: OK
|
||||
execErrState: OK
|
||||
for: 5m
|
||||
annotations:
|
||||
description: |
|
||||
Container {{ $labels.name }} has been using more than 500 MB of RAM for more than 5 minutes. Adjust the threshold if necessary.
|
||||
summary: High memory usage on a container
|
||||
labels:
|
||||
__converted_prometheus_rule__: "true"
|
||||
severity: warning
|
||||
isPaused: false
|
||||
missing_series_evals_to_resolve: 1
|
||||
|
|
@ -6,10 +6,6 @@ scrape_configs:
|
|||
static_configs:
|
||||
- targets: ['monitoring-prometheus:9090']
|
||||
|
||||
- job_name: 'backend'
|
||||
static_configs:
|
||||
- targets: ['127.0.0.1:8888']
|
||||
|
||||
- job_name: 'cadvisor'
|
||||
static_configs:
|
||||
- targets: ['monitoring-cadvisor:8080']
|
||||
|
|
@ -21,7 +17,6 @@ scrape_configs:
|
|||
|
||||
static_configs:
|
||||
- targets:
|
||||
- http://nginx
|
||||
- http://nginx/monitoring/ok
|
||||
- http://auth/monitoring
|
||||
- http://user/monitoring
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue