Sawtooth Network Monitoring

Introduction

I have to agree with the Prometheus vision: instrument everything. Every library, subsystem and service should have at least a few metrics to give you a rough idea of how it is performing.

With instrumentation it is easier to get a snapshot of your system's state at a particular point in time; you can spot outliers and misbehaviour, and if you need more detail you always have the debug logs.

That said, in this post we will look at how to monitor a Sawtooth validator node using InfluxDB and Grafana.

Installing the tools

As usual we start with a KIND cluster; if you don't have one yet, the sketch below shows a minimal way to create it. After that, apply the RBAC for Tiller and install Helm.
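A minimal sketch of creating the cluster, assuming the kind CLI is already installed:

$ kind create cluster
$ kubectl cluster-info

With the cluster up, apply the ServiceAccount and ClusterRoleBinding for Tiller: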

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system
EOF

Installing Helm, InfluxDB and Grafana

$ helm init --service-account tiller
$ helm install stable/influxdb --name influxdb
$ helm install stable/grafana --name grafana
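To confirm that both releases came up (names and output will differ slightly in your cluster):

$ helm ls
$ kubectl get pods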

Configuring InfluxDB and Grafana

First we need to get the DNS address and exposed ports of each service; you can use:

$ kubectl get svc
grafana      ClusterIP   10.102.250.186   <none>        80/TCP                                         9m34s
influxdb     ClusterIP   10.101.193.193   <none>        8086/TCP,8088/TCP                              9m15s

Host summary:

grafana - grafana.default.svc
influxdb - influxdb.default.svc
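If you want to verify that these names resolve inside the cluster, one option (assuming a throwaway busybox pod and the default cluster.local domain) is:

$ kubectl run dnscheck --rm -it --image=busybox --restart=Never -- nslookup influxdb.default.svc.cluster.local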

To test locally, let's forward both ports to our client machine:

INFLUXDB_HOST=$(kubectl get pods --namespace default -l app=influxdb -o jsonpath="{ .items[0].metadata.name }")
GRAFANA_HOST=$(kubectl get pods --namespace default -l "app=grafana,release=grafana" -o jsonpath="{.items[0].metadata.name}")

kubectl port-forward --namespace default $GRAFANA_HOST 3000
kubectl port-forward --namespace default $INFLUXDB_HOST 8086:8086
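With the forwards in place (run each port-forward in its own terminal), InfluxDB's /ping endpoint should answer with a 204 No Content (assuming the 1.x HTTP API, as shipped by the chart above):

$ curl -si http://localhost:8086/ping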

Create a new Influx database

To create a new database, open a shell inside the InfluxDB pod and run the influx client:

/ # influx -precision rfc3339
Connected to http://localhost:8086 version 1.7.3
InfluxDB shell version: 1.7.3
Enter an InfluxQL query
> CREATE DATABASE db1
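If you prefer not to open a shell in the pod, the same database can be created through the forwarded 1.x HTTP API (a sketch, assuming authentication is not enabled yet):

$ curl -i -XPOST http://localhost:8086/query --data-urlencode "q=CREATE DATABASE db1"
$ curl -G http://localhost:8086/query --data-urlencode "q=SHOW DATABASES"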

Grafana setup

Go to http://localhost:3000; you can find the admin password in the instructions printed by "helm status grafana". Let's add the InfluxDB database as a data source in Grafana.
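If you no longer have that output at hand, the admin password is stored in a Kubernetes secret; with the stable/grafana chart defaults, something like this retrieves it:

$ kubectl get secret --namespace default grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo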

setup.png

Now, you need to import the dashboard from:

https://raw.githubusercontent.com/hyperledger/sawtooth-core/master/docker/grafana/dashboards/dashboard.json
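If Grafana cannot reach that URL directly, download the JSON first and import it as a file:

$ wget https://raw.githubusercontent.com/hyperledger/sawtooth-core/master/docker/grafana/dashboards/dashboard.json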

Enabling metrics in the validator

Download the Sawtooth 1.1 Kubernetes example, find ALL of the validator entries in the YAML (container name: sawtooth-validator) and enable instrumentation via command-line arguments. The same monitoring configuration can also be placed in /etc/sawtooth/validator.toml (a sketch follows the deployment step below).

$ wget https://sawtooth.hyperledger.org/docs/core/releases/1.1/app_developers_guide/sawtooth-kubernetes-default-poet.yaml

vi sawtooth-kubernetes-default-poet.yaml

...
  sawtooth-validator -vv \
    --opentsdb-url http://influxdb.default:8086 \
    --opentsdb-db db1 \
    --endpoint tcp://$SAWTOOTH_0_SERVICE_HOST:8800 \
    --bind component:tcp://eth0:4004 \
    ...
  sawtooth-validator -vv \
    --opentsdb-url http://influxdb.default:8086 \
    --opentsdb-db db1 \
    --endpoint tcp://$SAWTOOTH_1_SERVICE_HOST:8800 \
    --bind component:tcp://eth0:4004 \
    ...
  sawtooth-validator -vv \
    --opentsdb-url http://influxdb.default:8086 \
    --opentsdb-db db1 \
    --endpoint tcp://$SAWTOOTH_2_SERVICE_HOST:8800 \
    --bind component:tcp://eth0:4004 \
    ...
...

$ kubectl create -f sawtooth-kubernetes-default-poet.yaml
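As an alternative to the command-line flags above, the same monitoring settings can live in /etc/sawtooth/validator.toml, for example shipped to the validators through a ConfigMap. A minimal sketch, assuming the TOML keys mirror the --opentsdb-* flags:

$ cat <<EOF >> /etc/sawtooth/validator.toml
opentsdb_url = "http://influxdb.default:8086"
opentsdb_db = "db1"
EOF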

Wait for the nodes to start up and double-check that all metrics are flowing:

# In the sawtooth-validator containers you can check the logs
kubectl logs sawtooth-0-6bf8ddf79c-lmbsh -c sawtooth-validator | grep 8086

[2019-11-16 13:59:36.120 INFO     cli] Adding metrics reporter: url=http://influxdb.default:8086, db=db1

$ kubectl logs $INFLUXDB_HOST  # check the influxdb host logs

[httpd] 10.32.0.4 - - [16/Nov/2019:14:00:46 +0000] "POST /write?db=db1&precision=s HTTP/1.1" 204 0 "-" "Python-urllib/3.5" 7ca56170-0879-11ea-819b-4a0c9068d373 7050
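You can also ask InfluxDB directly which measurements are already being written (the exact names may vary between Sawtooth versions):

$ kubectl exec -it $INFLUXDB_HOST -- influx -database db1 -execute 'SHOW MEASUREMENTS'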

The dashboard

Finally, with every node sending metrics, we can use the Intkey transaction family to exercise our network:

$ kubectl exec -it sawtooth-0-6bf8ddf79c-z9pvq -c sawtooth-shell bash  # start the shell

$ intkey set a 10  # Initialize the first key
$ while true; do intkey inc a 10; done  # Increment it
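To double-check that the increments are actually being committed, you can query the key from the same shell container (assuming the intkey show subcommand is available there):

$ intkey show a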

hosts.png

About the metrics

Let's take a closer look at some of these metrics, since they are fundamental to correctly analyzing the behaviour of the nodes in the network.

https://github.com/hyperledger/sawtooth-core/blob/v1.1.0/validator/sawtooth_validator/journal/chain.py#L660

 * Chain Head - sawtooth_validator.chain.ChainController.chain_head
      Hash of the last block inserted at the HEAD of the chain
      self._chain_head_gauge = GaugeWrapper(metrics_registry.gauge('chain_head', default='no chain head'))

 * Committed Transactions - sawtooth_validator.chain.ChainController.committed_transactions_count
      Number of committed transactions (there can be more than one per block)
      self._committed_transactions_count = CounterWrapper(metrics_registry.counter('committed_transactions_count'))

 * Block number - sawtooth_validator.chain.ChainController.block_num
      Number of blocks in the chain
      self._block_num_gauge = GaugeWrapper(metrics_registry.gauge('block_num'))

 * Chain Head Moved to Fork - sawtooth_validator.chain.ChainController.chain_head_moved_to_fork_count
      Number of times the HEAD of the blockchain was moved to a new fork via consensus
      self._moved_to_fork_count = CounterWrapper(metrics_registry.counter('chain_head_moved_to_fork_count'))
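If you want to inspect one of these series outside Grafana, a hypothetical InfluxQL query (assuming the reporter stores each metric under its fully qualified dotted name and gauges under a "value" field; confirm with SHOW MEASUREMENTS first) would be:

$ kubectl exec -it $INFLUXDB_HOST -- influx -database db1 -execute 'SELECT LAST("value") FROM "sawtooth_validator.chain.ChainController.block_num"'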

Conclusion

Don't forget to add a username and password to InfluxDB and Grafana in a production environment (a sketch is shown below). Another good practice is to move these command-line arguments into the validator.toml file inside a ConfigMap. Other than that, have fun tracing where these metrics come from in the source code; you will be amazed how much you can learn from it.
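A sketch of locking the metrics path down, assuming HTTP authentication is enabled in InfluxDB and that your Sawtooth version honours the opentsdb_username / opentsdb_password keys in validator.toml:

$ kubectl exec -it $INFLUXDB_HOST -- influx -execute "CREATE USER sawtooth WITH PASSWORD 'change-me' WITH ALL PRIVILEGES"

$ cat <<EOF >> /etc/sawtooth/validator.toml
opentsdb_username = "sawtooth"
opentsdb_password = "change-me"
EOF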
