Contributor Node Operations

Authentication - SAML2 Support

Your Contributor Node can be configured with SAML Authentication enabled or disabled:

  • SAML Authentication disabled - login is performed by the Contributor Node itself (your login credentials are created by your organization when your Contributor Node is configured).

  • SAML Authentication enabled - authentication is performed via your organization's IdP (Identity Provider).

SAML Authentication is cookie based. The cookie is currently configured to expire after 60 minutes, at which point users are automatically logged out.

See https://datarepublic.atlassian.net/wiki/spaces/DOCS/pages/159417879/Configuring+Contributor+Nodes+with+SAML+for+SSO+beta#Enabling-SSO-for-your-Contributor-Node for more detail.

Stopping and restarting the node

If you need to shut down your Contributor Node (for example before upgrading) run the contributor.sh script with the “down” command:

$ sh contributor.sh down

To restart it:

$ sh contributor.sh up -d

The "-d" flag tells Docker Compose to start the containers in the background.

Upgrading the node

Data Republic will email the nominated Contributor Node Administrator when there are updates to the software. Updates are generally available every two months; security-critical updates are communicated as soon as an issue is discovered and a fix is released.

When updates are released, a new version of contributor.sh is usually included. Updating is as easy as stopping and restarting the node:

$ cp contributor-update.sh contributor.sh  # Copy the new file into place
$ sh contributor.sh down
$ sh contributor.sh up -d

API for System Info

The Contributor Node includes a REST API for checking system information (mainly the version). This is helpful when you want to confirm you are running the latest Contributor Node.

The System Info API is at the /api/v1/SystemInfo endpoint. It requires no authentication and will return an HTTP status of 200 (OK) on success. An example using curl is given below:

$ curl https://localhost:9059/api/v1/SystemInfo

In the example above, replace localhost:9059 with the hostname and port number for your node.

An example output is as below:

{"Version":"1.8.3","FullVersion":"v1.8.3-0-g3bf55b98"}
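If you need the version programmatically (for example in an upgrade script), it can be extracted from the response without a JSON parser. The sketch below uses an inline sample response and assumes the field layout shown above; in practice you would populate the variable from the API call.

```shell
# Extract the Version field from a SystemInfo response (sketch).
# In practice, populate "response" from the API, e.g.:
#   response=$(curl -s https://localhost:9059/api/v1/SystemInfo)
response='{"Version":"1.8.3","FullVersion":"v1.8.3-0-g3bf55b98"}'

# Anchoring the match at the leading "{" selects the short Version
# field rather than the FullVersion field.
version=$(printf '%s' "$response" | sed -n 's/^{"Version":"\([^"]*\)".*/\1/p')
echo "Contributor Node version: $version"
```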

Backing up data

Your Contributor Node is designed to retain as little data as possible. One thing it does store is the mapping between your internal customer key (called the person ID) and the random token. This information is stored in the attached MySQL database.

You can use your existing database backup tools, or the standard MySQL backup/restore commands, to back up this data. Alternatively, we recommend backing up the Docker volume or the virtual machine the node is running on.

To backup the Docker volume image:

$ sh contributor.sh down
$ sudo rsync -r /var/lib/docker/volumes/contributor_contributor-db-data \
    /path/to/backup-directory
$ sh contributor.sh up -d

You may want to use your existing enterprise backup tools instead of rsync, to ensure the files are saved to a different virtual machine.

No confidential information is stored in the volume being backed up, only the mapping between person ID and randomly generated token.

Monitoring and logging

API for Health Checks

Ping

The Contributor Node includes a REST API for updating data, downloading tokens, and deleting previously uploaded hash slice data from the matcher nodes. There are also API calls for monitoring the node to ensure it is running.

The simplest API call is the /api/v1/Ping endpoint. It requires no authentication and will return an HTTP status of 200 (OK) on success (the body is empty). An example using curl is given below:

$ curl https://localhost:9059/api/v1/Ping

In the example above, replace localhost:9059 with the hostname and port number for your node.

This API call only verifies that the node is running – a successful result does not necessarily mean that your node is operating correctly. However, any other result (e.g. a 500 or a timeout) can be taken to indicate that the node is not running or was unable to start correctly. See Log Messages, below, for how to diagnose such issues.
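The Ping check is easy to wrap into a small monitoring script. The following is a sketch: the host and port, the 5-second timeout, and curl's -k flag (which skips certificate verification, only appropriate for self-signed certificates) are assumptions to adjust for your environment.

```shell
#!/bin/sh
# Liveness probe for the Contributor Node (sketch).

is_up() {
  # Only a 200 means the node answered the Ping; curl reports
  # timeouts and refused connections as status 000.
  [ "$1" = "200" ]
}

status=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 5 \
  "https://localhost:9059/api/v1/Ping" || true)

if is_up "$status"; then
  echo "Contributor Node is up"
else
  echo "Ping failed (HTTP status: $status)" >&2
fi
```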

HealthCheck

A second API call is available at /api/v1/HealthCheck that performs a more thorough check. This call does require API authentication, which is passed using the HTTP Basic Authentication protocol. An example using curl is given below:

# HTTP Basic passes the api password in Base64 encoding - beware trailing spaces!
# Replace "YOUR_PASSWORD" with your node API password
API_PASSWORD=$(echo -n api:YOUR_PASSWORD | base64)

# Calls /api/v1/HealthCheck. 200 (OK) and empty body ("{}") indicates no errors
curl -X GET "https://localhost:9059/api/v1/HealthCheck" -H "accept: application/json" -H "authorization: Basic $API_PASSWORD"

In the example above, replace localhost:9059 with the host name and port number for your node, and YOUR_PASSWORD with the API password you configured in contributor.sh.

Any status code other than 200 indicates an error, and the body of the message will contain the error message. The expected error codes are:

Status Code | Error | Troubleshooting

401 | Unauthenticated | API password is incorrect, or was not supplied. Check that you have Base64 encoded the password correctly. Note that there should be no trailing spaces or newlines in the password.

500 | Server error | The node has not been able to start correctly, or has experienced an unrecoverable error. Check Log Messages (see below) for details.
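The trailing-newline pitfall behind the 401 case is easy to reproduce. The sketch below uses printf (which, unlike a bare echo, only appends a newline when asked to) with a made-up password to show that the two encodings differ:

```shell
# "printf '%s'" emits the credentials with no trailing newline;
# "printf '%s\n'" simulates what a bare "echo" would produce.
good=$(printf '%s' 'api:secret' | base64)
bad=$(printf '%s\n' 'api:secret' | base64)

echo "without newline: $good"
echo "with newline:    $bad"
# The two tokens differ, and only the first will authenticate.
```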

SystemHealthCheck

To monitor the health of the entire Privacy-Preserving Matching network, including your node's ability to interact with it, you can use the /api/v1/SystemHealthCheck endpoint. This monitoring API returns the status not only of your node, but also of the network's readiness to receive hash slice data or perform a match. This endpoint also requires authentication.

An example of calling this end point using curl is given below.

$ API_PASSWORD=$(echo -n api:YOUR_PASSWORD | base64)
$ curl -X GET "https://localhost:9059/api/Contributor/v1/SystemHealthCheck" -H "accept: application/json" -H "authorization: Basic $API_PASSWORD"

In the example above, replace localhost:9059 with the host name and port number for your node, and YOUR_PASSWORD with the API password you configured in contributor.sh.

The SystemHealthCheck call should always return 200 (any other value indicates that the node has not started correctly). The message body must then be examined for the status of individual sub-components.

An example JSON output will help illustrate:

{
  "LocalHealth": {
    "Status": "HEALTH_PASSING",
    "Output": "Ping local database successful."
  },
  "NetworkConnectivity": {
    "Status": "HEALTH_PASSING",
    "Output": "Ping Vault successful.\nPing Consul successful.\nPing aggregator 101 (aggregator.hitch-qa.nonprod-au.datarepublic.io:443) successful."
  },
  "LoadRecordsHealth": {
    "Status": "HEALTH_PASSING",
    "Output": "Ping local database successful. [...]"
  },
  "QueryHealth": {
    "Status": "HEALTH_CRITICAL",
    "Output": "Ping local database successful. [...]"
  }
}

The status HEALTH_PASSING indicates that the component's health checks are all OK. The value HEALTH_CRITICAL means one or more health checks have failed for that component.
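When scripting against this endpoint, a simple check is to scan the response body for any status other than HEALTH_PASSING. The sketch below uses an inline sample body; in practice you would populate it from the authenticated curl call shown above.

```shell
# In practice: body=$(curl -s ... "https://<host>:<port>/api/Contributor/v1/SystemHealthCheck")
body='{"LocalHealth":{"Status":"HEALTH_PASSING"},"QueryHealth":{"Status":"HEALTH_CRITICAL"}}'

# A plain grep is enough here because the status values are fixed strings.
if printf '%s' "$body" | grep -q '"Status":"HEALTH_CRITICAL"'; then
  echo "one or more components are unhealthy" >&2
else
  echo "all components passing"
fi
```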

LocalHealth: refers to the Contributor Node itself.

  • Health Passing: Node is operating and can connect to the local database.

  • Health Critical: There is a local issue that means the node is not ready to receive or download data.

  • Troubleshooting: Check Log Messages below.

NetworkConnectivity: refers to the node being able to talk to Vault (KMS), Consul (configuration), and the Matcher Nodes.

  • Health Passing: Node is able to connect to all required components in the Privacy-Preserving Matching network.

  • Health Critical: Node can't connect to one or more components and so is not ready to receive data.

  • Troubleshooting: Check Log Messages below. Also check firewall and/or proxy logs.

LoadRecordsHealth: the node's assessment of whether it is able to send hash slices to Matcher Nodes.

  • Health Passing: Node is able to receive data and send hash slices to the matcher network.

  • Health Critical: Node can't connect to a required component and so will not be able to receive data for hash slicing.

  • Troubleshooting: Check the UI Dashboard and/or Log Messages. Contact DR Support in the event of an extended outage (>1 hr).

QueryHealth: the node's assessment of whether a sufficient number of Matcher Nodes are running and able to complete a match request.

  • Health Passing: Data Republic is ready to initiate a match request.

  • Health Critical: A match request in Data Republic will likely fail.

  • Troubleshooting: Contact DR Support in the event of an extended outage (>1 hr).

Log Messages

To access the log messages of your node, you can either configure Fluentd to direct log messages to your logging server, or capture the log messages from the Docker container.

Capturing logs from the Docker container

An example of the latter approach is given below:

$ docker logs -f $(docker ps --format "{{.Names}}" --filter "name=contributor" | grep _contributor_)

The command above will “follow” the logs, similar to the tail -f command – it does not return unless the container stops running. It may be useful to look at the logs over a recent period (say, the last hour):

$ node_container_name=$(docker ps --format "{{.Names}}" --filter "name=contributor" | grep _contributor_)
$ docker logs --since 1h $node_container_name

Redirecting logs to a syslog server

In the same directory as your contributor.sh file you will find the Fluentd configuration file, named fluentd.conf. You will need to uncomment the first remote_syslog block and enter your server details to redirect log messages to your syslog server.

This option is available on image registry.fpims.datarepublic.com.au/dr-log version 3 and above (dr-log:3).

<match **>
  @type copy
  <store>
    @type remote_syslog
    host 172.22.0.1  # enter your syslog server details here
    port 514
  </store>
  <store>
    @type secure_forward
    [...]
  </store>
</match>

If you would like to use a syslog server running on the same host, you will need to use the docker host gateway address.

<store>
  @type remote_syslog
  host "#{`ip route | awk '/default/{print $3}'`}"  # dynamically find the host gateway
  [...]
</store>

You may find all the possible configuration parameters on the plugin documentation page.

Restarting the Contributor Node is required for the changes to take effect.

Log message format

Log messages output from the Docker container are in a newline-delimited JSON format:

  • Each message is separated by a newline character.

  • An individual message is a single JSON object.

Each JSON message contains a log level, a time and a msg.
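Because each line is a self-contained JSON object, error-level entries can be filtered with plain grep, without a JSON parser. A sketch on two sample log lines (on a real node, pipe the docker logs output in instead):

```shell
# Sample log stream; in practice:
#   docker logs --since 1h $node_container_name 2>&1 | grep '"level":"error"'
printf '%s\n' \
  '{"level":"error","msg":"retrying after error: permission denied","time":"2020-07-06T04:08:25Z"}' \
  '{"level":"info","msg":"upload complete","time":"2020-07-06T04:09:00Z"}' \
  | grep '"level":"error"'
```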

Log messages to look for

Below are the most common errors found in the msg field of log entries emitted by the contributor application.

Unable to issue a Vault certificate

{"level":"info","logGroup":"general","msg":"retrying after error: DynamicCertificateManager.LoadCertificate: Error making API request.\n\nURL: PUT https://vault.datarepublic.io:8200/v1/hitch-pki/issue/hitch-contributor1\nCode: 403. Errors:\n\n* permission denied","time":"2020-07-06T04:08:25Z"}

In this case, the Contributor Node cannot issue its x509 certificate, which is used to communicate with the rest of the network.

  • Verify that your HITCH_KMSTOKEN variable is correct and matches the token issued by Data Republic.

  • Contact the Data Republic Customer Success Team, who can renew an expired token.

If the second option resolves the issue, keep in mind that your Contributor Node refreshes its access token on a regular basis and therefore needs to be running continuously.

Transport is closing

As the Contributor Node streams data to the network, interruptions can break the flow.

Unavailable desc = client disconnected
Send: rpc error: code = Unavailable desc = transport is closing

These errors occur when a transient network error happens or when the remote backend has an internal error.

Please verify any timeout on long streaming connections on your proxies and firewalls. You can also use a tool such as mtr to look at packet loss on the route to the host returned by the error.

Ultimately, simply try again later.

Packet for query is too large

packet for query is too large. Try adjusting the 'max_allowed_packet' variable on the server

This is due to a misconfiguration of the database.

  • First, make sure that the HITCH_DATABASEURL variable contains the extension ?maxAllowedPacket=0. This tells the SQL client to match the server value of maxAllowedPacket.

  • Second, update the MySQL server configuration with a value appropriate for your hardware. Depending on your database hosting:

    • If hosting on AWS RDS, update your Parameter Groups to set max_allowed_packet to 256M.

    • If hosting on a dedicated server, update your my.ini to contain max_allowed_packet=256M

    • If using Docker Compose, add --max_allowed_packet=32505856 at the end of the command: line
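For the Docker Compose case, the flag goes on the database service's command: line. A hypothetical docker-compose.yml fragment (the service name and image tag are illustrative; match them to your actual compose file):

```yaml
services:
  contributor-db:
    image: mysql:5.7          # illustrative image tag
    command: --max_allowed_packet=32505856
```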