Logs module
## 🦌 Centralized Logging Stack Integration

### ELK Stack Online

- Added **`elasticsearch`**, **`logstash`**, and **`kibana`** services to `docker-compose.yml`:
  - **Elasticsearch** for log storage and indexing with persistent volumes.
  - **Logstash** as the GELF entrypoint, handling log ingestion and transformation.
  - **Kibana** as the web UI for log exploration, dashboards, and saved searches.
- Each ELK service is wired with:
  - **Persistent storage** to survive restarts.
  - **Environment variables** for credentials and tuning.
  - **Bootstrap scripts** to perform initial setup (policies, templates, dashboards, etc.).

### Global GELF Logging

- All existing services now use the **GELF logging driver** in `docker-compose.yml`:
  - Containers send their logs to **Logstash** instead of stdout-only.
  - Provides **structured**, centralized logs ready for querying in Elasticsearch/Kibana.
- Result: no more log hunting across containers; everything lands in one searchable place.

---

## 🔁 Log Lifecycle & Visualization Automation

### Elasticsearch & Kibana Bootstrap

- Introduced **bootstrap scripts and config files** to automate:
  - **Index Lifecycle Management (ILM)** policies for log retention and rollover.
  - **Index templates** for log indices (naming, mappings, and settings).
  - **Kibana imports** (index patterns / data views, dashboards, visualizations).
- This turns ELK setup from a manual ritual into a **single-command provisioning step**.

### Logstash Pipeline Upgrade

- Added a **Logstash pipeline configuration** to:
  - Ingest **GELF** logs from Docker.
  - **Normalize/rename fields** for consistent querying across services.
  - Index logs into **Elasticsearch** with **daily rotation per container** pattern.
- Outcome: logs are structured, tagged by container, and auto-rotated to keep storage sane.

---

## 🛠 Makefile & Docker.mk Enhancements

### Logs Setup Targets

- Added a new **`logs`** target in `Makefile` (with `.PHONY` declaration) to manage logging setup from the top level.
- Added a **`logs-setup`** target in `Docker.mk` to:
  - Initialize **ILM policies** in Elasticsearch.
  - Apply **index templates** for logs.
  - Create **Kibana index patterns** so logs are immediately visible in the UI.
- These targets plug into the existing tooling, making logging setup part of the **standard dev/ops workflow**.

---

## 🔐 Environment Configuration

### Secure Elasticsearch Access

- Updated `env.example` to include:
  - **`ELASTIC_PASSWORD`**: central password for Elasticsearch authentication.
- Encourages **secure-by-default** deployments and aligns local/dev with production-style security.

---

## 📈 Monitoring Configuration Updates

### Grafana Alerting & Prometheus Cleanup

- Added a **basic alerting policy for Grafana**:
  - Provides a default routing tree for alerts.
  - Acts as a foundation for future, more granular alert rules.
- Cleaned up **Prometheus scrape configuration**:
  - Removed obsolete backend scrape targets.
  - Keeps monitoring config focused on **live** and relevant services.
commit e44a3af76d
13 changed files with 998 additions and 14 deletions
24
Docker.mk
|
|
@ -6,10 +6,12 @@
|
||||||
# By: maiboyer <maiboyer@student.42.fr> +#+ +:+ +#+ #
|
# By: maiboyer <maiboyer@student.42.fr> +#+ +:+ +#+ #
|
||||||
# +#+#+#+#+#+ +#+ #
|
# +#+#+#+#+#+ +#+ #
|
||||||
# Created: 2025/06/11 18:10:26 by maiboyer #+# #+# #
|
# Created: 2025/06/11 18:10:26 by maiboyer #+# #+# #
|
||||||
# Updated: 2025/07/30 19:32:11 by maiboyer ### ########.fr #
|
# Updated: 2025/11/14 18:54:16 by maiboyer ### ########.fr #
|
||||||
# #
|
# #
|
||||||
# **************************************************************************** #
|
# **************************************************************************** #
|
||||||
|
|
||||||
|
.PHONY: logs
|
||||||
|
|
||||||
all: build
|
all: build
|
||||||
docker compose up -d
|
docker compose up -d
|
||||||
|
|
||||||
|
|
@ -39,3 +41,23 @@ prune: clean
|
||||||
-docker network prune
|
-docker network prune
|
||||||
-docker system prune -a
|
-docker system prune -a
|
||||||
|
|
||||||
|
ES_URL ?= http://local.maix.me:9200
|
||||||
|
KIBANA_URL ?= http://local.maix.me:5601
|
||||||
|
|
||||||
|
logs-setup:
|
||||||
|
@until curl -s "$(ES_URL)" > /dev/null 2>&1; do sleep 1; done;
|
||||||
|
|
||||||
|
@curl -s -X PUT "$(ES_URL)/_ilm/policy/docker-logs-policy" \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{"policy":{"phases":{"hot":{"actions":{}},"delete":{"min_age":"7d","actions":{"delete":{}}}}}}' > /dev/null
|
||||||
|
|
||||||
|
@curl -s -X PUT "$(ES_URL)/_template/docker-logs-template" \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{"index_patterns":["docker-*"],"settings":{"index.lifecycle.name":"docker-logs-policy"}}' > /dev/null
|
||||||
|
|
||||||
|
@until curl -s "$(KIBANA_URL)/api/status" > /dev/null 2>&1; do sleep 1; done;
|
||||||
|
|
||||||
|
@curl -s -X POST "$(KIBANA_URL)/api/saved_objects/index-pattern/docker-logs" \
|
||||||
|
-H "kbn-xsrf: true" \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{"attributes":{"title":"docker-*","timeFieldName":"@timestamp"}}' > /dev/null
|
||||||
|
|
|
||||||
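The `logs-setup` recipe waits for Elasticsearch and Kibana with `until curl ...; do sleep 1; done`, a loop that never gives up if a service fails to start. As a rough illustration only, the same polling idea with a deadline could look like the sketch below. Python is used for readability; the helper names `http_ready` and `wait_until` and the deadline values are assumptions for this example, not part of the commit.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers at all, like `curl -s URL > /dev/null`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # curl without -f exits 0 on HTTP error statuses, so the service counts as "up".
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, DNS failure, timeout, or a broken response: not ready yet.
        return False


def wait_until(
    probe: Callable[[], bool],
    deadline_s: float,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it succeeds or about `deadline_s` seconds were spent sleeping."""
    waited = 0.0
    while True:
        if probe():
            return True
        if waited >= deadline_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

For instance, `wait_until(lambda: http_ready("http://local.maix.me:9200"), deadline_s=120)` would return `False` after roughly two minutes instead of blocking `make` indefinitely. The `sleep` parameter is injectable only so the loop can be exercised without real delays.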
6
Makefile
|
|
@ -1,4 +1,4 @@
|
||||||
# **************************************************************************** #make
|
# **************************************************************************** #
|
||||||
# #
|
# #
|
||||||
# ::: :::::::: #
|
# ::: :::::::: #
|
||||||
# Makefile :+: :+: :+: #
|
# Makefile :+: :+: :+: #
|
||||||
|
|
@ -6,7 +6,7 @@
|
||||||
# By: rparodi <rparodi@student.42.fr> +#+ +:+ +#+ #
|
# By: rparodi <rparodi@student.42.fr> +#+ +:+ +#+ #
|
||||||
# +#+#+#+#+#+ +#+ #
|
# +#+#+#+#+#+ +#+ #
|
||||||
# Created: 2023/11/12 11:05:05 by rparodi #+# #+# #
|
# Created: 2023/11/12 11:05:05 by rparodi #+# #+# #
|
||||||
# Updated: 2025/11/10 01:05:11 by maiboyer ### ########.fr #
|
# Updated: 2025/11/14 17:40:57 by maiboyer ### ########.fr #
|
||||||
# #
|
# #
|
||||||
# **************************************************************************** #
|
# **************************************************************************** #
|
||||||
|
|
||||||
|
|
@ -157,4 +157,4 @@ fnginx: nginx-dev/nginx-selfsigned.crt nginx-dev/nginx-selfsigned.key
|
||||||
wait
|
wait
|
||||||
|
|
||||||
# phony
|
# phony
|
||||||
.PHONY: all clean fclean re header footer npm@install npm@clean npm@fclean npm@build sql tmux
|
.PHONY: all clean fclean re header footer npm@install npm@clean npm@fclean npm@build sql tmux logs
|
||||||
|
|
|
||||||
|
|
@ -15,6 +15,11 @@ services:
|
||||||
- transcendance-network
|
- transcendance-network
|
||||||
volumes:
|
volumes:
|
||||||
- static-volume:/volumes/static
|
- static-volume:/volumes/static
|
||||||
|
logging:
|
||||||
|
driver: gelf
|
||||||
|
options:
|
||||||
|
gelf-address: "udp://127.0.0.1:12201"
|
||||||
|
tag: "{{.Name}}"
|
||||||
|
|
||||||
#
|
#
|
||||||
# The "entry point" as in it does all of this:
|
# The "entry point" as in it does all of this:
|
||||||
|
|
@ -37,6 +42,11 @@ services:
|
||||||
environment:
|
environment:
|
||||||
# this can stay the same for development. This is an alias to `localhost`
|
# this can stay the same for development. This is an alias to `localhost`
|
||||||
- NGINX_DOMAIN=local.maix.me
|
- NGINX_DOMAIN=local.maix.me
|
||||||
|
logging:
|
||||||
|
driver: gelf
|
||||||
|
options:
|
||||||
|
gelf-address: "udp://127.0.0.1:12201"
|
||||||
|
tag: "{{.Name}}"
|
||||||
|
|
||||||
###############
|
###############
|
||||||
# ICONS #
|
# ICONS #
|
||||||
|
|
@ -58,6 +68,11 @@ services:
|
||||||
- JWT_SECRET=KRUGKIDROVUWG2ZAMJZG653OEBTG66BANJ2W24DTEBXXMZLSEB2GQZJANRQXU6JA
|
- JWT_SECRET=KRUGKIDROVUWG2ZAMJZG653OEBTG66BANJ2W24DTEBXXMZLSEB2GQZJANRQXU6JA
|
||||||
- USER_ICONS_STORE=/volumes/store
|
- USER_ICONS_STORE=/volumes/store
|
||||||
- DATABASE_DIR=/volumes/database
|
- DATABASE_DIR=/volumes/database
|
||||||
|
logging:
|
||||||
|
driver: gelf
|
||||||
|
options:
|
||||||
|
gelf-address: "udp://127.0.0.1:12201"
|
||||||
|
tag: "{{.Name}}"
|
||||||
|
|
||||||
|
|
||||||
###############
|
###############
|
||||||
|
|
@ -80,6 +95,11 @@ services:
|
||||||
- JWT_SECRET=KRUGKIDROVUWG2ZAMJZG653OEBTG66BANJ2W24DTEBXXMZLSEB2GQZJANRQXU6JA
|
- JWT_SECRET=KRUGKIDROVUWG2ZAMJZG653OEBTG66BANJ2W24DTEBXXMZLSEB2GQZJANRQXU6JA
|
||||||
- DATABASE_DIR=/volumes/database
|
- DATABASE_DIR=/volumes/database
|
||||||
- PROVIDER_FILE=/extra/providers.toml
|
- PROVIDER_FILE=/extra/providers.toml
|
||||||
|
logging:
|
||||||
|
driver: gelf
|
||||||
|
options:
|
||||||
|
gelf-address: "udp://127.0.0.1:12201"
|
||||||
|
tag: "{{.Name}}"
|
||||||
|
|
||||||
|
|
||||||
###############
|
###############
|
||||||
|
|
@ -123,7 +143,11 @@ services:
|
||||||
environment:
|
environment:
|
||||||
- JWT_SECRET=KRUGKIDROVUWG2ZAMJZG653OEBTG66BANJ2W24DTEBXXMZLSEB2GQZJANRQXU6JA
|
- JWT_SECRET=KRUGKIDROVUWG2ZAMJZG653OEBTG66BANJ2W24DTEBXXMZLSEB2GQZJANRQXU6JA
|
||||||
- DATABASE_DIR=/volumes/database
|
- DATABASE_DIR=/volumes/database
|
||||||
|
logging:
|
||||||
|
driver: gelf
|
||||||
|
options:
|
||||||
|
gelf-address: "udp://127.0.0.1:12201"
|
||||||
|
tag: "{{.Name}}"
|
||||||
|
|
||||||
|
|
||||||
###############
|
###############
|
||||||
|
|
@ -154,6 +178,11 @@ services:
|
||||||
- GF_SERVER_ROOT_URL=http://local.maix.me:3000
|
- GF_SERVER_ROOT_URL=http://local.maix.me:3000
|
||||||
- GF_SECURITY_ADMIN_USER=${GRAFANA_ADMIN_USER}
|
- GF_SECURITY_ADMIN_USER=${GRAFANA_ADMIN_USER}
|
||||||
- GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASS}
|
- GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASS}
|
||||||
|
logging:
|
||||||
|
driver: gelf
|
||||||
|
options:
|
||||||
|
gelf-address: "udp://127.0.0.1:12201"
|
||||||
|
tag: "{{.Name}}"
|
||||||
|
|
||||||
prometheus:
|
prometheus:
|
||||||
image: prom/prometheus:latest
|
image: prom/prometheus:latest
|
||||||
|
|
@ -164,6 +193,11 @@ services:
|
||||||
volumes:
|
volumes:
|
||||||
- ./monitoring/prometheus:/etc/prometheus/
|
- ./monitoring/prometheus:/etc/prometheus/
|
||||||
restart: unless-stopped
|
restart: unless-stopped
|
||||||
|
logging:
|
||||||
|
driver: gelf
|
||||||
|
options:
|
||||||
|
gelf-address: "udp://127.0.0.1:12201"
|
||||||
|
tag: "{{.Name}}"
|
||||||
|
|
||||||
cadvisor:
|
cadvisor:
|
||||||
image: gcr.io/cadvisor/cadvisor:latest
|
image: gcr.io/cadvisor/cadvisor:latest
|
||||||
|
|
@ -178,6 +212,12 @@ services:
|
||||||
- /sys:/sys:ro
|
- /sys:/sys:ro
|
||||||
- /var/lib/docker/:/var/lib/docker:ro
|
- /var/lib/docker/:/var/lib/docker:ro
|
||||||
restart: unless-stopped
|
restart: unless-stopped
|
||||||
|
logging:
|
||||||
|
driver: gelf
|
||||||
|
options:
|
||||||
|
gelf-address: "udp://127.0.0.1:12201"
|
||||||
|
tag: "{{.Name}}"
|
||||||
|
|
||||||
|
|
||||||
blackbox:
|
blackbox:
|
||||||
image: prom/blackbox-exporter:latest
|
image: prom/blackbox-exporter:latest
|
||||||
|
|
@ -187,9 +227,70 @@ services:
|
||||||
ports:
|
ports:
|
||||||
- "9115:9115"
|
- "9115:9115"
|
||||||
restart: unless-stopped
|
restart: unless-stopped
|
||||||
|
logging:
|
||||||
|
driver: gelf
|
||||||
|
options:
|
||||||
|
gelf-address: "udp://127.0.0.1:12201"
|
||||||
|
tag: "{{.Name}}"
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
###############
|
||||||
|
# LOGS #
|
||||||
|
###############
|
||||||
|
|
||||||
|
elasticsearch:
|
||||||
|
image: docker.elastic.co/elasticsearch/elasticsearch:7.17.23
|
||||||
|
container_name: logs-elasticsearch
|
||||||
|
networks:
|
||||||
|
- monitoring
|
||||||
|
environment:
|
||||||
|
- discovery.type=single-node
|
||||||
|
- ES_JAVA_OPTS=-Xms512m -Xmx512m
|
||||||
|
- ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
|
||||||
|
volumes:
|
||||||
|
- elastic-data:/usr/share/elasticsearch/data
|
||||||
|
- ./logs/elasticsearch:/setup
|
||||||
|
ports:
|
||||||
|
- "9200:9200"
|
||||||
|
command: ["/setup/bootstrap.sh"]
|
||||||
|
restart: unless-stopped
|
||||||
|
|
||||||
|
logstash:
|
||||||
|
image: docker.elastic.co/logstash/logstash:7.17.23
|
||||||
|
container_name: logs-logstash
|
||||||
|
depends_on:
|
||||||
|
- elasticsearch
|
||||||
|
networks:
|
||||||
|
- monitoring
|
||||||
|
volumes:
|
||||||
|
- ./logs/logstash/pipeline:/usr/share/logstash/pipeline
|
||||||
|
ports:
|
||||||
|
- "12201:12201/udp"
|
||||||
|
restart: unless-stopped
|
||||||
|
|
||||||
|
kibana:
|
||||||
|
image: docker.elastic.co/kibana/kibana:7.17.23
|
||||||
|
container_name: logs-kibana
|
||||||
|
depends_on:
|
||||||
|
- elasticsearch
|
||||||
|
networks:
|
||||||
|
- monitoring
|
||||||
|
environment:
|
||||||
|
- ELASTICSEARCH_HOSTS=http://elasticsearch:9200
|
||||||
|
- SERVER_PUBLICBASEURL=http://local.maix.me:5601
|
||||||
|
- ELASTICSEARCH_USERNAME=elastic
|
||||||
|
- ELASTICSEARCH_PASSWORD=${ELASTIC_PASSWORD}
|
||||||
|
ports:
|
||||||
|
- "5601:5601"
|
||||||
|
volumes:
|
||||||
|
- ./logs/kibana:/setup
|
||||||
|
command: ["/setup/bootstrap.sh"]
|
||||||
|
restart: unless-stopped
|
||||||
|
|
||||||
volumes:
|
volumes:
|
||||||
images-volume:
|
images-volume:
|
||||||
sqlite-volume:
|
sqlite-volume:
|
||||||
static-volume:
|
static-volume:
|
||||||
grafana-data:
|
grafana-data:
|
||||||
|
elastic-data:
|
||||||
|
|
|
||||||
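Every service above ships its logs to `udp://127.0.0.1:12201` through Docker's GELF driver. To check that path without starting a container, one could hand-craft a GELF datagram and send it to the same address. The following is a minimal sketch under stated assumptions: the function names and the `smoke-test` container name are hypothetical, and the payload only contains the basic GELF 1.1 fields plus one additional field, not everything the Docker driver emits.

```python
import gzip
import json
import socket
import time


def build_gelf(short_message: str, container_name: str, level: int = 6) -> bytes:
    """Build a gzip-compressed GELF 1.1 datagram."""
    payload = {
        "version": "1.1",
        "host": socket.gethostname(),
        "short_message": short_message,
        "timestamp": time.time(),
        "level": level,  # syslog severity; 6 = informational
        # Additional GELF fields carry a leading underscore.
        "_container_name": container_name,
    }
    return gzip.compress(json.dumps(payload).encode("utf-8"))


def send_gelf(datagram: bytes, host: str = "127.0.0.1", port: int = 12201) -> None:
    """Fire-and-forget UDP send, matching the gelf-address used in the compose file."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram, (host, port))
```

Calling `send_gelf(build_gelf("hello from a smoke test", "smoke-test"))` while the stack is up should, if the pipeline behaves as described in this commit, produce a document in an index named after that container. UDP gives no delivery confirmation, so the result has to be verified in Kibana or Elasticsearch.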
|
|
@ -1,3 +1,5 @@
|
||||||
GRAFANA_ADMIN_USER=""
|
GRAFANA_ADMIN_USER=
|
||||||
GRAFANA_ADMIN_PASS=""
|
GRAFANA_ADMIN_PASS=
|
||||||
GRAFANA_WEBHOOK_URL=""
|
GRAFANA_WEBHOOK_URL=
|
||||||
|
|
||||||
|
ELASTIC_PASSWORD=
|
||||||
|
|
|
||||||
19
logs/elasticsearch/bootstrap.sh
Executable file
|
|
@ -0,0 +1,19 @@
|
||||||
|
#!/bin/sh
|
||||||
|
|
||||||
|
setup_ilm() {
|
||||||
|
set -xe
|
||||||
|
until curl -s -f http://localhost:9200 >/dev/null; do
|
||||||
|
sleep 2;
|
||||||
|
done;
|
||||||
|
|
||||||
|
curl -v -X PUT "localhost:9200/_ilm/policy/docker-logs-policy" \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '@/setup/docker-logs-policy.json'
|
||||||
|
curl -v -X PUT "localhost:9200/_template/docker-logs-template" \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '@/setup/docker-logs-template.json'
|
||||||
|
exit 0
|
||||||
|
}
|
||||||
|
|
||||||
|
setup_ilm &
|
||||||
|
exec /usr/local/bin/docker-entrypoint.sh eswrapper
|
||||||
15
logs/elasticsearch/docker-logs-policy.json
Normal file
|
|
@ -0,0 +1,15 @@
|
||||||
|
{
|
||||||
|
"policy": {
|
||||||
|
"phases": {
|
||||||
|
"hot": {
|
||||||
|
"actions": {}
|
||||||
|
},
|
||||||
|
"delete": {
|
||||||
|
"min_age": "7d",
|
||||||
|
"actions": {
|
||||||
|
"delete": {}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
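The policy above has an empty `hot` phase and a `delete` phase with `min_age` set to `7d`. Since no rollover action is configured, that age is counted from index creation. The sketch below works out the earliest deletion time for one index. It is illustrative only: the helper names are made up for this example, and it handles only simple single-unit time values such as `7d` or `12h`.

```python
import json
from datetime import datetime, timedelta, timezone

# Same content as docker-logs-policy.json, inlined for the example.
POLICY_JSON = """
{"policy": {"phases": {"hot": {"actions": {}},
 "delete": {"min_age": "7d", "actions": {"delete": {}}}}}}
"""

_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_min_age(value: str) -> timedelta:
    """Parse a simple Elasticsearch time value such as '7d' or '12h'."""
    number, unit = int(value[:-1]), value[-1]
    return timedelta(**{_UNITS[unit]: number})


def earliest_delete(policy: dict, created: datetime) -> datetime:
    """Earliest moment the delete phase may start for an index created at `created`."""
    min_age = policy["policy"]["phases"]["delete"]["min_age"]
    return created + parse_min_age(min_age)
```

An index created on 2025-11-14 00:00 UTC therefore becomes eligible for deletion on 2025-11-21 00:00 UTC. The actual removal happens somewhat later, because ILM only acts when it periodically polls the indices.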
1
logs/elasticsearch/docker-logs-template.json
Normal file
|
|
@ -0,0 +1 @@
|
||||||
|
{"index_patterns":["docker-*"],"settings":{"index.lifecycle.name":"docker-logs-policy"}}
|
||||||
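The template ties every index matching `docker-*` to `docker-logs-policy`. Because the file is sent to Elasticsearch as-is by `curl -d @...`, a single stray brace is enough for the request to be rejected, so it can be worth parsing the body locally first. The sketch below does that and also checks which index names the pattern would cover. The function names are hypothetical, and `fnmatchcase` is used here only as an approximation of Elasticsearch's wildcard matching.

```python
import json
from fnmatch import fnmatchcase

# Intended content of docker-logs-template.json, inlined for the example.
TEMPLATE_JSON = (
    '{"index_patterns":["docker-*"],'
    '"settings":{"index.lifecycle.name":"docker-logs-policy"}}'
)


def load_template(text: str) -> dict:
    """Parse the template body; json.loads raises ValueError on malformed input."""
    return json.loads(text)


def template_applies(template: dict, index_name: str) -> bool:
    """True if any of the template's index patterns matches the index name."""
    return any(fnmatchcase(index_name, p) for p in template["index_patterns"])
```

With this, `template_applies(load_template(TEMPLATE_JSON), "docker-logs-kibana-2025.11.14")` is true, while an index such as `metrics-2025.11.14` is not covered and would not pick up the lifecycle policy.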
15
logs/kibana/bootstrap.sh
Executable file
|
|
@ -0,0 +1,15 @@
|
||||||
|
#!/bin/sh
|
||||||
|
|
||||||
|
kibana_setup() {
|
||||||
|
set -xe
|
||||||
|
until curl -s -f "localhost:5601/api/status"; do
|
||||||
|
sleep 2
|
||||||
|
done
|
||||||
|
|
||||||
|
curl -v -X POST "localhost:5601/api/saved_objects/_import?overwrite=true" \
|
||||||
|
-H "kbn-xsrf: true" \
|
||||||
|
--form file='@/setup/export.ndjson'
|
||||||
|
exit 0
|
||||||
|
}
|
||||||
|
kibana_setup &
|
||||||
|
exec /usr/local/bin/kibana-docker
|
||||||
5
logs/kibana/export.ndjson
Normal file
|
|
@ -0,0 +1,5 @@
|
||||||
|
{"attributes":{"buildNum":47645,"defaultIndex":"docker-logs","defaultRoute":"/app/dashboards#/view/f1356840-c17c-11f0-92fb-4711317b9bee"},"coreMigrationVersion":"7.17.23","id":"7.17.23","migrationVersion":{"config":"7.13.0"},"references":[],"type":"config","updated_at":"2025-11-14T17:29:48.539Z","version":"WzE0Miw0XQ=="}
|
||||||
|
{"attributes":{"fieldAttrs":"{\"@timestamp\":{\"count\":3},\"command\":{\"count\":2},\"container_name\":{\"count\":1},\"level\":{\"count\":1},\"message\":{\"count\":1}}","fields":"[]","runtimeFieldMap":"{}","timeFieldName":"@timestamp","title":"docker-*","typeMeta":"{}"},"coreMigrationVersion":"7.17.23","id":"docker-logs","migrationVersion":{"index-pattern":"7.11.0"},"references":[],"type":"index-pattern","updated_at":"2025-11-14T17:26:47.450Z","version":"Wzc0LDRd"}
|
||||||
|
{"attributes":{"columns":["container_name","message","level"],"description":"test","grid":{},"hideChart":false,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[],\"indexRefName\":\"kibanaSavedObjectMeta.searchSourceJSON.index\"}"},"sort":[["@timestamp","asc"]],"title":"LogTable"},"coreMigrationVersion":"7.17.23","id":"b5a48950-c17c-11f0-92fb-4711317b9bee","migrationVersion":{"search":"7.9.3"},"references":[{"id":"docker-logs","name":"kibanaSavedObjectMeta.searchSourceJSON.index","type":"index-pattern"}],"type":"search","updated_at":"2025-11-14T17:26:47.450Z","version":"Wzc1LDRd"}
|
||||||
|
{"attributes":{"description":"","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"optionsJSON":"{\"useMargins\":true,\"syncColors\":false,\"hidePanelTitles\":false}","panelsJSON":"[{\"version\":\"7.17.23\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":0,\"w\":24,\"h\":21,\"i\":\"9600aa15-1732-41da-a43c-723fdb1a97a0\"},\"panelIndex\":\"9600aa15-1732-41da-a43c-723fdb1a97a0\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"visualizationType\":\"lnsXY\",\"type\":\"lens\",\"references\":[{\"type\":\"index-pattern\",\"id\":\"docker-logs\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"docker-logs\",\"name\":\"indexpattern-datasource-layer-7b411268-3ed2-45f6-9067-b88364aba992\"}],\"state\":{\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"labelsOrientation\":{\"x\":0,\"yLeft\":0,\"yRight\":0},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"bar_stacked\",\"layers\":[{\"layerId\":\"7b411268-3ed2-45f6-9067-b88364aba992\",\"accessors\":[\"27ad7775-f44f-4d6c-b49d-5f8bebee33af\"],\"position\":\"top\",\"seriesType\":\"bar\",\"showGridlines\":false,\"layerType\":\"data\",\"xAccessor\":\"e4e3a367-7cd4-4ad6-95a7-824f0717503d\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[],\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"7b411268-3ed2-45f6-9067-b88364aba992\":{\"columns\":{\"e4e3a367-7cd4-4ad6-95a7-824f0717503d\":{\"label\":\"Top values of container_name.keyword\",\"dataType\":\"string\",\"operationType\":\"terms\",\"scale\":\"ordinal\",\"sourceField\":\"container_name.keyword\",\"isBucketed\":true,\"params\":{\"size\":5,\"orderBy\":{\"type\":\"column\",\"columnId\":\"27ad7775-f44f-4d6c-b49d-5f8bebee33af\"},\"orderDirection\":\"desc\",\"otherBucket\":true,\"missingBucket\":false}},\"27ad7775-f44f-4d6c-b49d-5f8bebee33af\":{\"label\":\"Count of records\",\"dataType\":\"number\",\"operationType\":\"count\",\"isBucketed\":false,\"scale\":\"ratio\",\"sourceField\":\"Records\"}},\"columnOrder\":[\"e4e3a367-7cd4-4ad6-95a7-824f0717503d\",\"27ad7775-f44f-4d6c-b49d-5f8bebee33af\"],\"incompleteColumns\":{}}}}}}},\"enhancements\":{},\"hidePanelTitles\":false},\"title\":\"Log Count\"},{\"version\":\"7.17.23\",\"type\":\"search\",\"gridData\":{\"x\":24,\"y\":0,\"w\":24,\"h\":21,\"i\":\"08f56117-4041-4282-af91-99a44941e06d\"},\"panelIndex\":\"08f56117-4041-4282-af91-99a44941e06d\",\"embeddableConfig\":{\"enhancements\":{},\"hidePanelTitles\":false},\"title\":\"Log Management\",\"panelRefName\":\"panel_08f56117-4041-4282-af91-99a44941e06d\"}]","timeRestore":false,"title":"Default","version":1},"coreMigrationVersion":"7.17.23","id":"f1356840-c17c-11f0-92fb-4711317b9bee","migrationVersion":{"dashboard":"7.17.3"},"references":[{"id":"docker-logs","name":"9600aa15-1732-41da-a43c-723fdb1a97a0:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"docker-logs","name":"9600aa15-1732-41da-a43c-723fdb1a97a0:indexpattern-datasource-layer-7b411268-3ed2-45f6-9067-b88364aba992","type":"index-pattern"},{"id":"b5a48950-c17c-11f0-92fb-4711317b9bee","name":"08f56117-4041-4282-af91-99a44941e06d:panel_08f56117-4041-4282-af91-99a44941e06d","type":"search"}],"type":"dashboard","updated_at":"2025-11-14T17:26:47.450Z","version":"Wzc2LDRd"}
|
||||||
|
{"excludedObjects":[],"excludedObjectsCount":0,"exportedCount":4,"missingRefCount":0,"missingReferences":[]}
|
||||||
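`export.ndjson` holds one saved object per line (config, index pattern, saved search, dashboard) followed by a summary line, and the objects refer to each other by `type` and `id`. Before importing a hand-edited export, a quick consistency check can catch missing references. The sketch below is an assumption-laden illustration: the function names are invented, and `SAMPLE` is a shortened, hypothetical stand-in for the real file with made-up ids.

```python
import json

# Shortened stand-in for an export file; ids "s1" and the field subset are hypothetical.
SAMPLE = "\n".join([
    '{"type":"index-pattern","id":"docker-logs","attributes":{"title":"docker-*"},"references":[]}',
    '{"type":"search","id":"s1","attributes":{},"references":'
    '[{"type":"index-pattern","id":"docker-logs","name":"idx"}]}',
    '{"exportedCount":2,"missingRefCount":0,"missingReferences":[]}',
])


def parse_export(text: str):
    """Split an NDJSON export into (saved objects, summary line or None)."""
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    objects = [r for r in records if "type" in r and "id" in r]
    summary = next((r for r in records if "exportedCount" in r), None)
    return objects, summary


def dangling_references(objects):
    """List (object id, referenced type, referenced id) for references not in the export."""
    known = {(o["type"], o["id"]) for o in objects}
    return [
        (o["id"], ref["type"], ref["id"])
        for o in objects
        for ref in o.get("references", [])
        if (ref["type"], ref["id"]) not in known
    ]
```

An empty result from `dangling_references` together with `len(objects) == summary["exportedCount"]` suggests the file is internally consistent. It says nothing about whether Kibana will accept the objects, which depends on the Kibana version doing the import.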
22
logs/logstash/pipeline/logstash.conf
Normal file
|
|
@ -0,0 +1,22 @@
|
||||||
|
input {
|
||||||
|
gelf {
|
||||||
|
port => 12201
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
filter {
|
||||||
|
mutate {
|
||||||
|
rename => { "[full_message]" => "message" }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
output {
|
||||||
|
elasticsearch {
|
||||||
|
hosts => ["http://elasticsearch:9200"]
|
||||||
|
index => "docker-%{[container_name]}-%{+YYYY.MM.dd}"
|
||||||
|
}
|
||||||
|
|
||||||
|
stdout {
|
||||||
|
codec => rubydebug
|
||||||
|
}
|
||||||
|
}
|
||||||
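The pipeline does two things to each event: it renames `full_message` to `message`, then writes the event to an index named `docker-%{[container_name]}-%{+YYYY.MM.dd}`, where the date part comes from the event's `@timestamp` in UTC. The sketch below mimics both steps on a plain dictionary to show which index a given event would land in. It is a simplified model with hypothetical function names, not Logstash code; for instance, Logstash leaves an unresolved `%{[container_name]}` as literal text in the index name, whereas this sketch raises `KeyError`.

```python
from datetime import datetime, timezone


def apply_rename(event: dict) -> dict:
    """Mimic `mutate { rename => { "[full_message]" => "message" } }`."""
    out = dict(event)
    if "full_message" in out:
        # The renamed value replaces any existing `message` field.
        out["message"] = out.pop("full_message")
    return out


def index_name(event: dict) -> str:
    """Mimic the index pattern `docker-%{[container_name]}-%{+YYYY.MM.dd}`."""
    ts = event["@timestamp"].astimezone(timezone.utc)
    return f"docker-{event['container_name']}-{ts:%Y.%m.%d}"
```

An event from `logs-kibana` stamped 2025-11-14 23:30 UTC maps to `docker-logs-kibana-2025.11.14`. One index per container per day keeps queries per service cheap, at the cost of many small indices when the number of containers grows.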
5
monitoring/grafana/alerting/policies.yaml
Normal file
|
|
@ -0,0 +1,5 @@
|
||||||
|
routes:
|
||||||
|
receiver: discord-webhook
|
||||||
|
routes:
|
||||||
|
- matchers:
|
||||||
|
receiver: discord-webhook
|
||||||
782
monitoring/grafana/alerting/rules.yaml
Normal file
|
|
@ -0,0 +1,782 @@
|
||||||
|
apiVersion: 1
|
||||||
|
groups:
|
||||||
|
- orgId: 1
|
||||||
|
name: availability
|
||||||
|
folder: alert_rules.yml
|
||||||
|
interval: 1m
|
||||||
|
rules:
|
||||||
|
- uid: 14db4fe7-faf3-5629-9ee1-c5c189d75fec
|
||||||
|
title: InstanceDown
|
||||||
|
condition: threshold
|
||||||
|
data:
|
||||||
|
- refId: query
|
||||||
|
queryType: prometheus
|
||||||
|
relativeTimeRange:
|
||||||
|
from: 660
|
||||||
|
to: 60
|
||||||
|
datasourceUid: prometheus
|
||||||
|
model:
|
||||||
|
datasource:
|
||||||
|
type: prometheus
|
||||||
|
uid: prometheus
|
||||||
|
expr: up == 0
|
||||||
|
instant: true
|
||||||
|
intervalMs: 1000
|
||||||
|
maxDataPoints: 43200
|
||||||
|
range: false
|
||||||
|
refId: query
|
||||||
|
- refId: prometheus_math
|
||||||
|
queryType: math
|
||||||
|
datasourceUid: __expr__
|
||||||
|
model:
|
||||||
|
datasource:
|
||||||
|
IsPrunable: false
|
||||||
|
access: ""
|
||||||
|
apiVersion: ""
|
||||||
|
basicAuth: false
|
||||||
|
basicAuthUser: ""
|
||||||
|
created: "0001-01-01T00:00:00Z"
|
||||||
|
database: ""
|
||||||
|
id: -100
|
||||||
|
isDefault: false
|
||||||
|
jsonData: {}
|
||||||
|
name: __expr__
|
||||||
|
readOnly: false
|
||||||
|
secureJsonData: {}
|
||||||
|
type: __expr__
|
||||||
|
uid: __expr__
|
||||||
|
updated: "0001-01-01T00:00:00Z"
|
||||||
|
url: ""
|
||||||
|
user: ""
|
||||||
|
withCredentials: false
|
||||||
|
expression: is_number($query) || is_nan($query) || is_inf($query)
|
||||||
|
intervalMs: 1000
|
||||||
|
maxDataPoints: 43200
|
||||||
|
refId: prometheus_math
|
||||||
|
type: math
|
||||||
|
- refId: threshold
|
||||||
|
queryType: threshold
|
||||||
|
datasourceUid: __expr__
|
||||||
|
model:
|
||||||
|
conditions:
|
||||||
|
- evaluator:
|
||||||
|
params:
|
||||||
|
- 0
|
||||||
|
type: gt
|
||||||
|
datasource:
|
||||||
|
IsPrunable: false
|
||||||
|
access: ""
|
||||||
|
apiVersion: ""
|
||||||
|
basicAuth: false
|
||||||
|
basicAuthUser: ""
|
||||||
|
created: "0001-01-01T00:00:00Z"
|
||||||
|
database: ""
|
||||||
|
id: -100
|
||||||
|
isDefault: false
|
||||||
|
jsonData: {}
|
||||||
|
name: __expr__
|
||||||
|
readOnly: false
|
||||||
|
secureJsonData: {}
|
||||||
|
type: __expr__
|
||||||
|
uid: __expr__
|
||||||
|
updated: "0001-01-01T00:00:00Z"
|
||||||
|
url: ""
|
||||||
|
user: ""
|
||||||
|
withCredentials: false
|
||||||
|
expression: prometheus_math
|
||||||
|
intervalMs: 1000
|
||||||
|
maxDataPoints: 43200
|
||||||
|
refId: threshold
|
||||||
|
type: threshold
|
||||||
|
noDataState: OK
|
||||||
|
execErrState: OK
|
||||||
|
for: 1m
|
||||||
|
annotations:
|
||||||
|
description: |
|
||||||
|
Instance {{ $labels.instance }} (job={{ $labels.job }}) has not responded to Prometheus scrapes for more than one minute.
|
||||||
|
summary: Instance {{ $labels.job }} down
|
||||||
|
labels:
|
||||||
|
__converted_prometheus_rule__: "true"
|
||||||
|
severity: critical
|
||||||
|
isPaused: false
|
||||||
|
missing_series_evals_to_resolve: 1
|
||||||
|
- orgId: 1
|
||||||
|
name: blackbox-probes
|
||||||
|
folder: alert_rules.yml
|
||||||
|
interval: 1m
|
||||||
|
rules:
|
||||||
|
- uid: c549c658-ce15-5d56-9842-07730bb11e15
|
||||||
|
title: BlackboxProbeFailed
|
||||||
|
condition: threshold
|
||||||
|
data:
|
||||||
|
- refId: query
|
||||||
|
queryType: prometheus
|
||||||
|
relativeTimeRange:
|
||||||
|
from: 660
|
||||||
|
to: 60
|
||||||
|
datasourceUid: prometheus
|
||||||
|
model:
|
||||||
|
datasource:
|
||||||
|
type: prometheus
|
||||||
|
uid: prometheus
|
||||||
|
expr: probe_success == 0
|
||||||
|
instant: true
|
||||||
|
intervalMs: 1000
|
||||||
|
maxDataPoints: 43200
|
||||||
|
range: false
|
||||||
|
refId: query
|
||||||
|
- refId: prometheus_math
|
||||||
|
queryType: math
|
||||||
|
datasourceUid: __expr__
|
||||||
|
model:
|
||||||
|
datasource:
|
||||||
|
IsPrunable: false
|
||||||
|
access: ""
|
||||||
|
apiVersion: ""
|
||||||
|
basicAuth: false
|
||||||
|
basicAuthUser: ""
|
||||||
|
created: "0001-01-01T00:00:00Z"
|
||||||
|
database: ""
|
||||||
|
id: -100
|
||||||
|
isDefault: false
|
||||||
|
jsonData: {}
|
||||||
|
name: __expr__
|
||||||
|
readOnly: false
|
||||||
|
secureJsonData: {}
|
||||||
|
type: __expr__
|
||||||
|
uid: __expr__
|
||||||
|
updated: "0001-01-01T00:00:00Z"
|
||||||
|
url: ""
|
||||||
|
user: ""
|
||||||
|
withCredentials: false
|
||||||
|
expression: is_number($query) || is_nan($query) || is_inf($query)
|
||||||
|
intervalMs: 1000
|
||||||
|
maxDataPoints: 43200
|
||||||
|
refId: prometheus_math
|
||||||
|
type: math
|
||||||
|
- refId: threshold
|
||||||
|
queryType: threshold
|
||||||
|
datasourceUid: __expr__
|
||||||
|
model:
|
||||||
|
conditions:
|
||||||
|
- evaluator:
|
||||||
|
params:
|
||||||
|
- 0
|
||||||
|
type: gt
|
||||||
|
datasource:
|
||||||
|
IsPrunable: false
|
||||||
|
access: ""
|
||||||
|
apiVersion: ""
|
||||||
|
basicAuth: false
|
||||||
|
basicAuthUser: ""
|
||||||
|
created: "0001-01-01T00:00:00Z"
|
||||||
|
database: ""
|
||||||
|
id: -100
|
||||||
|
isDefault: false
|
||||||
|
jsonData: {}
|
||||||
|
name: __expr__
|
||||||
|
readOnly: false
|
||||||
|
secureJsonData: {}
|
||||||
|
type: __expr__
|
||||||
|
uid: __expr__
|
||||||
|
updated: "0001-01-01T00:00:00Z"
|
||||||
|
url: ""
|
||||||
|
user: ""
|
||||||
|
withCredentials: false
|
||||||
|
expression: prometheus_math
|
||||||
|
intervalMs: 1000
|
||||||
|
maxDataPoints: 43200
|
||||||
|
refId: threshold
|
||||||
|
type: threshold
|
||||||
|
noDataState: OK
|
||||||
|
execErrState: OK
|
||||||
|
for: 30s
|
||||||
|
annotations:
|
||||||
|
description: |
|
||||||
|
The Blackbox probe for {{ $labels.instance }} has failed (probe_success = 0).
|
||||||
|
summary: Blackbox probe failed
|
||||||
|
labels:
|
||||||
|
__converted_prometheus_rule__: "true"
|
||||||
|
severity: critical
|
||||||
|
isPaused: false
|
||||||
|
missing_series_evals_to_resolve: 1
|
||||||
|
- uid: 78a2ece6-4f7a-5496-9a59-6de4a56db201
|
||||||
|
title: BlackboxHighLatency
|
||||||
|
condition: threshold
|
||||||
|
data:
|
||||||
|
- refId: query
|
||||||
|
queryType: prometheus
|
||||||
|
relativeTimeRange:
|
||||||
|
from: 660
|
||||||
|
to: 60
|
||||||
|
datasourceUid: prometheus
|
||||||
|
model:
|
||||||
|
datasource:
|
||||||
|
type: prometheus
|
||||||
|
uid: prometheus
|
||||||
|
expr: probe_duration_seconds > 1
|
||||||
|
instant: true
|
||||||
|
intervalMs: 1000
|
||||||
|
maxDataPoints: 43200
|
||||||
|
range: false
|
||||||
|
refId: query
|
||||||
|
- refId: prometheus_math
|
||||||
|
queryType: math
|
||||||
|
datasourceUid: __expr__
|
||||||
|
model:
|
||||||
|
datasource:
|
||||||
|
IsPrunable: false
|
||||||
|
access: ""
|
||||||
|
apiVersion: ""
|
||||||
|
basicAuth: false
|
||||||
|
basicAuthUser: ""
|
||||||
|
created: "0001-01-01T00:00:00Z"
|
||||||
|
database: ""
|
||||||
|
id: -100
|
||||||
|
isDefault: false
|
||||||
|
jsonData: {}
|
||||||
|
name: __expr__
|
||||||
|
readOnly: false
|
||||||
|
secureJsonData: {}
|
||||||
|
type: __expr__
|
||||||
|
uid: __expr__
|
||||||
|
updated: "0001-01-01T00:00:00Z"
|
||||||
|
url: ""
|
||||||
|
user: ""
|
||||||
|
withCredentials: false
|
||||||
|
expression: is_number($query) || is_nan($query) || is_inf($query)
|
||||||
|
intervalMs: 1000
|
||||||
|
maxDataPoints: 43200
|
||||||
|
refId: prometheus_math
|
||||||
|
type: math
|
||||||
|
- refId: threshold
|
||||||
|
queryType: threshold
|
||||||
|
datasourceUid: __expr__
|
||||||
|
model:
|
||||||
|
conditions:
|
||||||
|
- evaluator:
|
||||||
|
params:
|
||||||
|
- 0
|
||||||
|
type: gt
|
||||||
|
datasource:
|
||||||
|
IsPrunable: false
|
||||||
|
access: ""
|
||||||
|
apiVersion: ""
|
||||||
|
basicAuth: false
|
||||||
|
basicAuthUser: ""
|
||||||
|
created: "0001-01-01T00:00:00Z"
|
||||||
|
database: ""
|
||||||
|
id: -100
|
||||||
|
isDefault: false
|
||||||
|
jsonData: {}
|
||||||
|
name: __expr__
|
||||||
|
readOnly: false
|
||||||
|
secureJsonData: {}
|
||||||
|
type: __expr__
|
||||||
|
uid: __expr__
|
||||||
|
updated: "0001-01-01T00:00:00Z"
|
||||||
|
url: ""
|
||||||
|
user: ""
|
||||||
|
withCredentials: false
|
||||||
|
expression: prometheus_math
|
||||||
|
intervalMs: 1000
|
||||||
|
maxDataPoints: 43200
|
||||||
|
refId: threshold
|
||||||
|
type: threshold
|
||||||
|
noDataState: OK
|
||||||
|
execErrState: OK
|
||||||
|
for: 2m
|
||||||
|
annotations:
|
||||||
|
description: |
|
||||||
|
The Blackbox probe to {{ $labels.instance }} has been taking more than 1 second to respond for over 2 minutes.
|
||||||
|
summary: High latency on a Blackbox probe
|
||||||
|
labels:
|
||||||
|
__converted_prometheus_rule__: "true"
|
||||||
|
severity: warning
|
||||||
|
isPaused: false
|
||||||
|
missing_series_evals_to_resolve: 1
|
      - uid: 00b5d799-0eef-59e9-9371-2a0bfb7df19b
        title: BlackboxBadHTTPStatus
        condition: threshold
        data:
          - refId: query
            queryType: prometheus
            relativeTimeRange:
              from: 660
              to: 60
            datasourceUid: prometheus
            model:
              datasource:
                type: prometheus
                uid: prometheus
              expr: probe_http_status_code != 200
              instant: true
              intervalMs: 1000
              maxDataPoints: 43200
              range: false
              refId: query
          - refId: prometheus_math
            queryType: math
            datasourceUid: __expr__
            model:
              datasource:
                IsPrunable: false
                access: ""
                apiVersion: ""
                basicAuth: false
                basicAuthUser: ""
                created: "0001-01-01T00:00:00Z"
                database: ""
                id: -100
                isDefault: false
                jsonData: {}
                name: __expr__
                readOnly: false
                secureJsonData: {}
                type: __expr__
                uid: __expr__
                updated: "0001-01-01T00:00:00Z"
                url: ""
                user: ""
                withCredentials: false
              expression: is_number($query) || is_nan($query) || is_inf($query)
              intervalMs: 1000
              maxDataPoints: 43200
              refId: prometheus_math
              type: math
          - refId: threshold
            queryType: threshold
            datasourceUid: __expr__
            model:
              conditions:
                - evaluator:
                    params:
                      - 0
                    type: gt
              datasource:
                IsPrunable: false
                access: ""
                apiVersion: ""
                basicAuth: false
                basicAuthUser: ""
                created: "0001-01-01T00:00:00Z"
                database: ""
                id: -100
                isDefault: false
                jsonData: {}
                name: __expr__
                readOnly: false
                secureJsonData: {}
                type: __expr__
                uid: __expr__
                updated: "0001-01-01T00:00:00Z"
                url: ""
                user: ""
                withCredentials: false
              expression: prometheus_math
              intervalMs: 1000
              maxDataPoints: 43200
              refId: threshold
              type: threshold
        noDataState: OK
        execErrState: OK
        for: 1m
        annotations:
          description: |
            The Blackbox probe to {{ $labels.instance }} is returning HTTP status {{ $value }} different from 200.
          summary: Bad HTTP status code on a Blackbox probe
        labels:
          __converted_prometheus_rule__: "true"
          severity: warning
        isPaused: false
        missing_series_evals_to_resolve: 1
  - orgId: 1
    name: container-resources
    folder: alert_rules.yml
    interval: 1m
    rules:
      - uid: 985c697f-e309-524c-9cd4-650a2045c279
        title: HighGlobalCPUUsage
        condition: threshold
        data:
          - refId: query
            queryType: prometheus
            relativeTimeRange:
              from: 660
              to: 60
            datasourceUid: prometheus
            model:
              datasource:
                type: prometheus
                uid: prometheus
              expr: (sum(rate(container_cpu_user_seconds_total[5m])) * 100) > 80
              instant: true
              intervalMs: 1000
              maxDataPoints: 43200
              range: false
              refId: query
          - refId: prometheus_math
            queryType: math
            datasourceUid: __expr__
            model:
              datasource:
                IsPrunable: false
                access: ""
                apiVersion: ""
                basicAuth: false
                basicAuthUser: ""
                created: "0001-01-01T00:00:00Z"
                database: ""
                id: -100
                isDefault: false
                jsonData: {}
                name: __expr__
                readOnly: false
                secureJsonData: {}
                type: __expr__
                uid: __expr__
                updated: "0001-01-01T00:00:00Z"
                url: ""
                user: ""
                withCredentials: false
              expression: is_number($query) || is_nan($query) || is_inf($query)
              intervalMs: 1000
              maxDataPoints: 43200
              refId: prometheus_math
              type: math
          - refId: threshold
            queryType: threshold
            datasourceUid: __expr__
            model:
              conditions:
                - evaluator:
                    params:
                      - 0
                    type: gt
              datasource:
                IsPrunable: false
                access: ""
                apiVersion: ""
                basicAuth: false
                basicAuthUser: ""
                created: "0001-01-01T00:00:00Z"
                database: ""
                id: -100
                isDefault: false
                jsonData: {}
                name: __expr__
                readOnly: false
                secureJsonData: {}
                type: __expr__
                uid: __expr__
                updated: "0001-01-01T00:00:00Z"
                url: ""
                user: ""
                withCredentials: false
              expression: prometheus_math
              intervalMs: 1000
              maxDataPoints: 43200
              refId: threshold
              type: threshold
        noDataState: OK
        execErrState: OK
        for: 5m
        annotations:
          description: |
            Global CPU usage of containers has been above 80% for more than 5 minutes. Check which services are consuming the most resources.
          summary: High global CPU usage for containers
        labels:
          __converted_prometheus_rule__: "true"
          severity: warning
        isPaused: false
        missing_series_evals_to_resolve: 1
      - uid: 635d0ad1-10f2-51f4-9226-baf56557d870
        title: HighGlobalMemoryUsage
        condition: threshold
        data:
          - refId: query
            queryType: prometheus
            relativeTimeRange:
              from: 660
              to: 60
            datasourceUid: prometheus
            model:
              datasource:
                type: prometheus
                uid: prometheus
              expr: (sum(container_memory_usage_bytes) / sum(machine_memory_bytes)) * 100 > 80
              instant: true
              intervalMs: 1000
              maxDataPoints: 43200
              range: false
              refId: query
          - refId: prometheus_math
            queryType: math
            datasourceUid: __expr__
            model:
              datasource:
                IsPrunable: false
                access: ""
                apiVersion: ""
                basicAuth: false
                basicAuthUser: ""
                created: "0001-01-01T00:00:00Z"
                database: ""
                id: -100
                isDefault: false
                jsonData: {}
                name: __expr__
                readOnly: false
                secureJsonData: {}
                type: __expr__
                uid: __expr__
                updated: "0001-01-01T00:00:00Z"
                url: ""
                user: ""
                withCredentials: false
              expression: is_number($query) || is_nan($query) || is_inf($query)
              intervalMs: 1000
              maxDataPoints: 43200
              refId: prometheus_math
              type: math
          - refId: threshold
            queryType: threshold
            datasourceUid: __expr__
            model:
              conditions:
                - evaluator:
                    params:
                      - 0
                    type: gt
              datasource:
                IsPrunable: false
                access: ""
                apiVersion: ""
                basicAuth: false
                basicAuthUser: ""
                created: "0001-01-01T00:00:00Z"
                database: ""
                id: -100
                isDefault: false
                jsonData: {}
                name: __expr__
                readOnly: false
                secureJsonData: {}
                type: __expr__
                uid: __expr__
                updated: "0001-01-01T00:00:00Z"
                url: ""
                user: ""
                withCredentials: false
              expression: prometheus_math
              intervalMs: 1000
              maxDataPoints: 43200
              refId: threshold
              type: threshold
        noDataState: OK
        execErrState: OK
        for: 5m
        annotations:
          description: |
            Global memory usage of containers has been above 80% for more than 5 minutes.
          summary: High global memory usage for containers
        labels:
          __converted_prometheus_rule__: "true"
          severity: warning
        isPaused: false
        missing_series_evals_to_resolve: 1
  - orgId: 1
    name: per-container-resources
    folder: alert_rules.yml
    interval: 1m
    rules:
      - uid: 3daf3f51-d4ad-5169-ace2-cdc1c43d8e4e
        title: HighContainerCPUUsage
        condition: threshold
        data:
          - refId: query
            queryType: prometheus
            relativeTimeRange:
              from: 660
              to: 60
            datasourceUid: prometheus
            model:
              datasource:
                type: prometheus
                uid: prometheus
              expr: rate(container_cpu_user_seconds_total[5m]) * 100 > 80
              instant: true
              intervalMs: 1000
              maxDataPoints: 43200
              range: false
              refId: query
          - refId: prometheus_math
            queryType: math
            datasourceUid: __expr__
            model:
              datasource:
                IsPrunable: false
                access: ""
                apiVersion: ""
                basicAuth: false
                basicAuthUser: ""
                created: "0001-01-01T00:00:00Z"
                database: ""
                id: -100
                isDefault: false
                jsonData: {}
                name: __expr__
                readOnly: false
                secureJsonData: {}
                type: __expr__
                uid: __expr__
                updated: "0001-01-01T00:00:00Z"
                url: ""
                user: ""
                withCredentials: false
              expression: is_number($query) || is_nan($query) || is_inf($query)
              intervalMs: 1000
              maxDataPoints: 43200
              refId: prometheus_math
              type: math
          - refId: threshold
            queryType: threshold
            datasourceUid: __expr__
            model:
              conditions:
                - evaluator:
                    params:
                      - 0
                    type: gt
              datasource:
                IsPrunable: false
                access: ""
                apiVersion: ""
                basicAuth: false
                basicAuthUser: ""
                created: "0001-01-01T00:00:00Z"
                database: ""
                id: -100
                isDefault: false
                jsonData: {}
                name: __expr__
                readOnly: false
                secureJsonData: {}
                type: __expr__
                uid: __expr__
                updated: "0001-01-01T00:00:00Z"
                url: ""
                user: ""
                withCredentials: false
              expression: prometheus_math
              intervalMs: 1000
              maxDataPoints: 43200
              refId: threshold
              type: threshold
        noDataState: OK
        execErrState: OK
        for: 5m
        annotations:
          description: |
            Container {{ $labels.name }} has been using more than 80% CPU for more than 5 minutes.
          summary: High CPU usage on a container
        labels:
          __converted_prometheus_rule__: "true"
          severity: warning
        isPaused: false
        missing_series_evals_to_resolve: 1
      - uid: 3202077e-ba84-5401-86fe-0fe6b0a4c26d
        title: HighContainerMemoryUsage
        condition: threshold
        data:
          - refId: query
            queryType: prometheus
            relativeTimeRange:
              from: 660
              to: 60
            datasourceUid: prometheus
            model:
              datasource:
                type: prometheus
                uid: prometheus
              expr: container_memory_usage_bytes > 500 * 1024 * 1024
              instant: true
              intervalMs: 1000
              maxDataPoints: 43200
              range: false
              refId: query
          - refId: prometheus_math
            queryType: math
            datasourceUid: __expr__
            model:
              datasource:
                IsPrunable: false
                access: ""
                apiVersion: ""
                basicAuth: false
                basicAuthUser: ""
                created: "0001-01-01T00:00:00Z"
                database: ""
                id: -100
                isDefault: false
                jsonData: {}
                name: __expr__
                readOnly: false
                secureJsonData: {}
                type: __expr__
                uid: __expr__
                updated: "0001-01-01T00:00:00Z"
                url: ""
                user: ""
                withCredentials: false
              expression: is_number($query) || is_nan($query) || is_inf($query)
              intervalMs: 1000
              maxDataPoints: 43200
              refId: prometheus_math
              type: math
          - refId: threshold
            queryType: threshold
            datasourceUid: __expr__
            model:
              conditions:
                - evaluator:
                    params:
                      - 0
                    type: gt
              datasource:
                IsPrunable: false
                access: ""
                apiVersion: ""
                basicAuth: false
                basicAuthUser: ""
                created: "0001-01-01T00:00:00Z"
                database: ""
                id: -100
                isDefault: false
                jsonData: {}
                name: __expr__
                readOnly: false
                secureJsonData: {}
                type: __expr__
                uid: __expr__
                updated: "0001-01-01T00:00:00Z"
                url: ""
                user: ""
                withCredentials: false
              expression: prometheus_math
              intervalMs: 1000
              maxDataPoints: 43200
              refId: threshold
              type: threshold
        noDataState: OK
        execErrState: OK
        for: 5m
        annotations:
          description: |
            Container {{ $labels.name }} has been using more than 500 MB of RAM for more than 5 minutes. Adjust the threshold if necessary.
          summary: High memory usage on a container
        labels:
          __converted_prometheus_rule__: "true"
          severity: warning
        isPaused: false
        missing_series_evals_to_resolve: 1
@@ -6,10 +6,6 @@ scrape_configs:
     static_configs:
       - targets: ['monitoring-prometheus:9090']
 
-  - job_name: 'backend'
-    static_configs:
-      - targets: ['127.0.0.1:8888']
-
   - job_name: 'cadvisor'
     static_configs:
       - targets: ['monitoring-cadvisor:8080']
@@ -21,7 +17,6 @@ scrape_configs:
 
     static_configs:
       - targets:
-          - http://nginx
           - http://nginx/monitoring/ok
           - http://auth/monitoring
           - http://user/monitoring