We now include a monitoring stack for introspection into a running Kubernetes cluster. The stack includes three components:
```
App Logs ──log file──▶ fluentd ──logs/metrics topics──▶ NSQ ──▶ Logger ◀──▶ Redis

Telegraf (one instance per host) ──wire protocol──▶ InfluxDB ◀──▶ Grafana ◀── Router
```
Deis Workflow exposes Grafana through the router using service annotations. This
allows users to access the Grafana UI at
http://grafana.mydomain.com. The default username/password of
admin/admin can be overridden at any time by setting environment variables.
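As a sketch, the override might look like the following fragment of the Grafana pod's container spec. The manifest location and placeholder values are assumptions, but `GF_SECURITY_ADMIN_USER` and `GF_SECURITY_ADMIN_PASSWORD` are the standard upstream Grafana credential variables:

```
# Hypothetical container spec fragment for the Grafana pod.
env:
  - name: GF_SECURITY_ADMIN_USER
    value: "operator"
  - name: GF_SECURITY_ADMIN_PASSWORD
    value: "a-strong-password"
```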
Grafana will preload several dashboards to help operators get started with monitoring Kubernetes and Deis Workflow. These dashboards are meant as starting points and don't include every item that might be desirable to monitor in a production installation.
Deis Workflow monitoring does not currently write data to the host filesystem or to long-term storage. If the Grafana instance fails, modified dashboards are lost. Until there is a solution to persist this, export dashboards and store them separately in version control.
It is recommended that users provide their own Grafana installation if possible. The current deployment of Grafana within Workflow is not durable across pod restarts, which means custom dashboards created after startup will not be restored when the pod comes back up. If you wish to provide your own Grafana instance, you can either set the
GRAFANA_LOCATION environment variable when you run
helm generate or set
grafana_location in the generate_params.toml.
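A minimal sketch of the generate_params.toml approach follows. The value shown is an assumption modeled on the off-cluster Influx settings later in this document; consult the chart for the exact accepted values:

```
# generate_params.toml — hypothetical sketch; "off-cluster" is an assumed value.
grafana_location = "off-cluster"
```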
InfluxDB writes data to the host disk; however, if the InfluxDB pod dies and comes back on
another host, the data will not be recovered. We intend to fix this in a future release. The InfluxDB Admin UI is also
exposed through the router, allowing users to access the query engine by going to
influx.mydomain.com. You will need to
configure where to find the
influx-api endpoint by clicking the "gear" icon at the top right and changing the host to
influxapi.mydomain.com and the port accordingly.
Note: Each user accessing the Influx UI will need to make this change.
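Once the endpoint is configured, the UI accepts standard InfluxQL queries. As a sketch (the `cpu` measurement and `usage_idle` field below are assumptions based on Telegraf's default system input plugins):

```
-- Hypothetical: measurement/field names depend on which Telegraf inputs are enabled.
SELECT mean("usage_idle") FROM "cpu" WHERE time > now() - 1h GROUP BY time(5m)
```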
You can choose not to expose the Influx UI and API to the world by updating
$CHART_HOME/workspace/workflow-$WORKFLOW_RELEASE/manifests/deis-monitor-influxdb-ui-svc.yaml and removing the
following line:
To use off-cluster Influx, please provide the following values in either the
generate_params.toml file or as environment variables when running
helm generate:

```
url = "http://my-influxhost.com:8086"
database = "metrics"
user = "InfluxUser"
password = "MysuperSecurePassword"
```
Telegraf is the metrics collection daemon used within the monitoring stack. It will collect and send the following metrics to InfluxDB:
It is possible to send these metrics to other endpoints besides InfluxDB. For more information, please consult the following file:
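As a sketch of what fanning metrics out to another sink can look like, the fragment below uses upstream Telegraf output plugins. The plugin names and options are Telegraf's own; the server addresses are placeholders:

```
# Hypothetical Telegraf output fragment: keep the stack's InfluxDB output
# and add a second sink (Graphite here) for the same metrics.
[[outputs.influxdb]]
  urls = ["http://deis-monitor-influxdb:8086"]
  database = "metrics"

[[outputs.graphite]]
  servers = ["graphite.example.com:2003"]
  prefix = "telegraf"
```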
To learn more about customizing each of the above components, please visit the monitor repository.