InfluxDB - Migration to version 2

Logo

Introduction

A previous article, published in February 2020, covered "InfluxDB v2, getting starting, preparing the migration from version 1.x".

InfluxDB v2 was in a beta stage version in this paper. InfluxDB v2 has been released officially in November 2020. It’s time to migrate from version 1.8.

Before migrating, let’s sum up.

About the TICK Stack (Telegraf - InfluxDB - Chronograf - Kapacitor) : in version 2, Chronograf and Kapacitor are integrated into InfluxDB, only Telegraf remains a separate component.

InfluxDB v2 - architecture TICK

In this paper, a migration is performed from InfluxDB v1.8 to InfluxDB v2. The InfluxDB server is installed on Ubuntu 18.04.

The major changes to notice :

  • A database/retention policy is a bucket in version 2. An organization is mandatory and initialized at upgrade.

    InfluxDB v2 - DBRP v1 to buckets v2

  • Flux language replaces InfluxQL (SQL-Like).
  • Continuous queries must be migrated to Flux tasks.
  • Protocols Opentsdb/Graphite/Collectd are no more supported and native in InfluxDB engine, Telegraf must be used to feed InfluxDB.
  • Backward compatibility for existing 1.x users (InfluxQL queries, Kapacitor Tickscripts…) is guaranteed on InfluxDB v2 only if authentication is enabled for these users (username/password).

Measurements and series, quick reminder

In the time series database InfluxDB, the format of a point is the following :

measurement[,tag=value[,tag=value]] field=value[,field=value] [<timestamp>]
cpu_measurement,location=france,host=vpsfrsqlpac1 value=25.08,desc="influx" 1580918550000000000
cpu_measurement,location=germany,host=vpsfrsqlpac2 value=76.07,desc="postgres" 1580918550000000000

Series are the combination measurement/possible tag keys :

measurement, tag key1=value1, tag key2=value2 [,...]
cpu_measurement,location=france,host=vpsfrsqlpac1
cpu_measurement,location=germany,host=vpsfrsqlpac2
  • Data are overwritten if the point already exists.
  • The server’s timestamp is used if omitted.
  • Data types can be forced when writing the first point : value=25i for an integer datatype, status=t|f for a boolean datatype.

The InfluxDB 1.8 server to migrate to version 2

InfluxDB v 1.8 architecture

InfluxDB v 1.8 distribution is installed on Ubuntu 18.04 in the directory /opt/influxdb/influxdb-1.8. The InfluxDB v1.8 server to migrate is started with the user influxdb (uid : 10000, gid : 10000) and the command line below :

/opt/influxdb/influxdb-1.8/usr/bin/influxd  \
      -config /opt/influxdb/dba/srvifxsqlpac/cfg/srvifxsqlpac.conf \
      -pidfile /opt/influxdb/dba/srvifxsqlpac/run/srvifxsqlpac.pid
  • HTTPS is enabled (self-signed certificate). InfluxDB listens on https://vpsfrsqlpac:8086.
    influxdb$ export INFLUX_USERNAME=dba
    influxdb$ export INFLUX_PASSWORD=****************
                
    influxdb$ influx -ssl -host vpsfrsqlpac
    Connected to https://vpsfrsqlpac:8086 version 1.8.3
    InfluxDB shell version: 1.8.3 >
    
    SHOW DATABASES;
    name: databases
    name
    ----
    _internal
    netdatatsdb
    telegraf
    aggtsdb
    SHOW USERS;
    user     admin
    ----     -----
    dba      true
    netdata  false
    grafana  false
    telegraf false
    SHOW GRANTS FOR netdata;
    database    privilege
    --------    ---------
    aggtsdb     ALL PRIVILEGES
    netdatatsdb ALL PRIVILEGES
    SHOW GRANTS FOR grafana;
    database    privilege
    --------    ---------
    netdatatsdb READ
    aggtsdb     READ
  • The server databases (metadata + data + wal) are located in the directory /sqlpac/influxdb/srvifxsqlpac.
  • Continuous queries exist.
  • Flux language is enabled.
  • An endpoint opentsdb listens on port 4242. Netdata (performance metrics collector tool) sends data to this endpoint using OpenTSDB protocol.
  • Grafana reporting tool connects to the InfluxDB server using grafana user, read only user in netdatatsdb database.

In the configuration file :

/opt/influxdb/dba/srvifxsqlpac/cfg/srvifxsqlpac.conf
[meta]                                                      
  dir = "/sqlpac/influxdb/srvifxsqlpac/meta"
  retention-autocreate = true

[data]
  dir = "/sqlpac/influxdb/srvifxsqlpac/data"
  wal-dir = "/sqlpac/influxdb/srvifxsqlpac/wal"

[http]
  bind-address = "vpsfrsqlpac:8086"
  auth-enabled = true
  
  https-enabled = true
  https-certificate = "/var/ssl/VPSFRSQLPAC.crt"
  https-private-key = "/var/ssl/VPSFRSQLPAC.key"

  flux-enabled = true
  flux-log-enabled = true>
                  
[continuous_queries]
  enabled = true
  log-enabled = true
  query-stats-enabled = false
  run-interval = "60s"
  
[[opentsdb]]
  enabled = true
  bind-address = ":4242"
  database = "netdatatsdb"

Checking InfluxDB v1 for upgrade, auth-enabled

InfluxDB 2.0 requires authentication and does not support the InfluxDB 1.x auth-enabled = false configuration option.

Only users 1.x with credentials (username/password) will be migrated to InfluxDB v2 and important notice : admin users are not migrated.

If admin users are used (Grafana, Chronograf, Telegraf, Kapacitor…), before migrating, the existing tools will have to use non admin users with authentication.

Restart the InfluxDB server with the auth-enabled configuration option set to true and migrate existing tools, data sources… to use non admin authenticated users, otherwise the existing queries (InfluxQL…) will be silently ignored in InfluxDB v2.

/opt/influxdb/dba/srvifxsqlpac/cfg/srvifxsqlpac.conf
[http]
auth-enabled=true

Migration path

The migration path is the following :

  • Data are migrated to InfluxDB version 2 in the directory /sqlpac/influxdb/srvifx2sqlpac.
  • The bolt database is also installed in the directory /sqlpac/influxdb/srvifx2sqlpac. The bolt database is new in version 2, this key/value database stores metadata (dashboards, permissions, tasks,…).
  • Continuous queries are migrated to Flux tasks (to be prepared in advance if possible).
  • The opentsdb endpoint is replaced by Telegraf.
InfluxDB v2 - Migration path

Migrating to InfluxDB v2

Step 1 : Installing InfluxDB v2

InfluxDB 2.0.3 is installed in the directory /opt/influxdb/influxdb-2.0, this directory only contains the executables influx (client) and influxd (influx daemon). It is added in the $PATH variable :

influxdb$ export PATH=/opt/influxdb/influxdb-2.0:$PATH

Before migrating, run influxd upgrade --help to get a full description of the upgrade option :

influxdb$ influxd upgrade --help

The upgrade will :

  1. Read the 1.x config file and creates a 2.x config file with matching options. Unsupported 1.x options are reported.
  2. Copie 1.x database files.
  3. Create influx CLI configurations.
  4. Export any 1.x continuous queries to disk.

Space availability is checked during the upgrade.

Check the version used is now v2 :

influxdb$ influxd version
InfluxDB 2.0.3 (git: fe04d346df) build_date: 2020-12-15T01:00:16Z

Step 2 : Stopping and backing up InfluxDB v1 server databases

The upgrade does not overwrite existing 1.x database files in this use case.

Stop the InfluxDB v1 database and make a backup copy of all 1.x data :

influxdb$ ps -ef | grep 'bin/influxd'
influxdb 18824     1  1 01:28 pts/2    00:00:18 /opt/influxdb/influxdb-1.8/usr/bin/influxd
           -pidfile /opt/influxdb/dba/srvifxsqlpac/run/srvifxsqlpac.pid
           -config /opt/influxdb/dba/srvifxsqlpac/cfg/srvifxsqlpac.conf        

influxdb$ kill -s TERM 18824

influxdb$ cd /sqlpac/influxdb
influxdb$ cp -R srvifxsqlpac srvifxsqlpac_backup

Step 3 : Preparing the upgrade command line


influxdb$ influxd upgrade \
            --bolt-path="/sqlpac/influxdb/srvifx2sqlpac/srvifxs2qlpac.bolt" \
            --engine-path="/sqlpac/influxdb/srvifx2sqlpac" \
            --config-file="/opt/influxdb/dba/srvifxsqlpac/cfg/srvifxsqlpac.conf" \
            --continuous-query-export-path="/opt/influxdb/dba/srvifx2sqlpac/scripts/cs.sql" \
            --influx-configs-path="/sqlpac/influxdb/srvifx2sqlpac/configs" \
            --v2-config-path="/opt/influxdb/dba/srvifx2sqlpac/cfg/srvifx2sqlpac.toml" \
            --log-path="/opt/influxdb/dba/srvifx2sqlpac/log/upgrade.log" \
            --bucket="masterts" \
            --org="sqlpac" \
            --username="dba" \
            --password="************"

In the above command line :

--bolt-pathBolt database v2 path and file name
--engine-pathv2 engine database files location
--config-fileInfluxDB 1.8 config file path
--continuous-query-export-pathFile where to write continuous queries InfluxQL code
--influx-configs-pathPath for 2.x CLI configurations file
--v2-config-pathConfiguration file to generate for V2 server Allowed formats are YAML, TOML or JSON files (.yaml,.yml,.toml,.json).
--log-pathUpgrade log file
--bucketMaster bucket (mandatory)
--orgOrganization (mandatory)
--usernameAdmin username (mandatory)
--passwordPassword Admin username (mandatory)

The parameter --v1-dir specifies the path to source 1.x database directory containing meta, data and wal sub-folders but is not necessary if the parameter --config-file is passed, they are mutually exclusive :

Error: only one of --v1-dir or --config-file may be specified

Remove any file already generated by a previous failed upgrade (database files, *.toml…). If a file already exists, an error is generated and the upgrade is not performed :

Error: file present at target path for exported continuous queries '/opt/influxdb/dba/srvifx2sqlpac/scripts/cs.sql'
Error: upgraded 2.x engine directory '/sqlpac/influxdb/srvifx2sqlpac' must be empty

When the upgrade starts (some output is removed for brevity) :

"upgrade/upgrade.go:376","msg":"Starting InfluxDB 1.x upgrade"
"upgrade/upgrade.go:379","msg":"Upgrading config file","file":"/opt/influxdb/dba/srvifxsqlpac/cfg/srvifxsqlpac.conf"
"upgrade/upgrade.go:383","msg":"Config file upgraded.","1.x config":"/opt/influxdb/dba/srvifxsqlpac/cfg/srvifxsqlpac.conf","2.x config":"/opt/influxdb/dba/srvifx2sqlpac/cfg/srvifx2sqlpac.toml"
"upgrade/upgrade.go:393","msg":"Upgrade source paths","meta":"/sqlpac/influxdb/srvifxsqlpac/meta","data":"/sqlpac/influxdb/srvifxsqlpac/data"
"upgrade/upgrade.go:394","msg":"Upgrade target paths","bolt":"/sqlpac/influxdb/srvifx2sqlpac/srvifxs2qlpac.bolt","engine":"/sqlpac/influxdb/srvifx2sqlpac"
"bolt/bbolt.go:67","msg":"Resources opened","service":"bolt","path":"/sqlpac/influxdb/srvifx2sqlpac/srvifxs2qlpac.bolt"
…
"migration/migration.go:241","msg":"Migration \"Create TSM metadata buckets\" started (up)","service":"migrations"
…
Welcome to InfluxDB 2.0 upgrade!
Please type your retention period in hours.
Or press ENTER for infinite.:

You have entered:
  Username:          dba
  Organization:      sqlpac
  Bucket:            masterts
  Retention Period:  infinite
Confirm? (y/n): y

The retention period is set to infinite. The upgrade takes place.

"upgrade/setup.go:154","msg":"CLI config has been stored.","path":"/sqlpac/influxdb/srvifx2sqlpac/configs"
"upgrade/database.go:37","msg":"Checking space"
"upgrade/database.go:53","msg":"Disk space info","Free space":"4.0 GB","Requested space":"466 MB"
"upgrade/database.go:67","msg":"Upgrading databases"
"upgrade/database.go:75","msg":"Skipping _internal "
…
"upgrade/database.go:80","msg":"Upgrading database ","database":"netdatatsdb"}
"upgrade/database.go:99","msg":"Creating bucket ","Bucket":"netdatatsdb/autogen"}
"upgrade/database.go:110","msg":"Creating database with retention policy","database":"dd2b762103a1d94c"}
"upgrade/database.go:128","msg":"Creating mapping","database":"netdatatsdb","retention policy":"autogen","orgID":"4dec7e867866cc2f","bucketID":"dd2b762103a1d94c"}
"upgrade/database.go:141","msg":"Creating shard group","database":"dd2b762103a1d94c","retention policy":"autogen","time":1577059200}
"upgrade/database.go:141","msg":"Creating shard group","database":"dd2b762103a1d94c","retention policy":"autogen","time":1578268800}
"upgrade/database.go:141","msg":"Creating shard group","database":"dd2b762103a1d94c","retention policy":"autogen","time":1578873600}
"upgrade/database.go:141","msg":"Creating shard group","database":"dd2b762103a1d94c","retention policy":"autogen","time":1579478400}
"upgrade/database.go:141","msg":"Creating shard group","database":"dd2b762103a1d94c","retention policy":"autogen","time":1580688000}
"upgrade/database.go:141","msg":"Creating shard group","database":"dd2b762103a1d94c","retention policy":"autogen","time":1581292800}
"upgrade/database.go:156","msg":"Copying data","source":"/sqlpac/influxdb/srvifxsqlpac/data/netdatatsdb/autogen","target":"/sqlpac/influxdb/srvifx2sqlpac/data/dd2b762103a1d94c/autogen"}
"upgrade/database.go:171","msg":"Copying wal","source":"/sqlpac/influxdb/srvifxsqlpac/wal/netdatatsdb/autogen","target":"/sqlpac/influxdb/srvifx2sqlpac/wal/dd2b762103a1d94c/autogen"}
"upgrade/database.go:208","msg":"Exporting CQ","db":"netdatatsdb","cq_name":"cq_metrics"}
…
"upgrade/database.go:80","msg":"Upgrading database ","database":"telegraf"}
"upgrade/database.go:99","msg":"Creating bucket ","Bucket":"telegraf/rp72h"}
"upgrade/database.go:110","msg":"Creating database with retention policy","database":"7538d4ef709d1715"}
"upgrade/database.go:128","msg":"Creating mapping","database":"telegraf","retention policy":"rp72h","orgID":"4dec7e867866cc2f","bucketID":"7538d4ef709d1715"}
…
"upgrade/security.go:48","msg":"User is admin and will not be upgraded.","username":"dba"}
"upgrade/security.go:105","msg":"User upgraded.","username":"grafana"}
"upgrade/security.go:105","msg":"User upgraded.","username":"netdata"}
"upgrade/upgrade.go:463","msg":"Upgrade successfully completed. Start the influxd service now, then log in","login_url":"https://vpsfrsqlpac:8086"}

Starting InfluxDB v2

Databases are now migrated. The configuration file srvifx2sqlpac.toml created with the parameter --v2-config-path during the upgrade contains the essential informations (bolt database path, engine database path, address…). The HTTPS/SSL parameters are also copied from v1 configuration and defined.

/opt/influxdb/dba/srvifx2sqlpac/cfg/srvifx2sqlpac.toml
bolt-path = "/sqlpac/influxdb/srvifx2sqlpac/srvifxs2qlpac.bolt"
engine-path = "/sqlpac/influxdb/srvifx2sqlpac"
http-bind-address = "vpsfrsqlpac:8086"
storage-series-id-set-cache-size = 100
tls-cert = "/var/ssl/VPSFRSQLPAC.crt"
tls-key = "/var/ssl/VPSFRSQLPAC.key"

To start InfluxDB v2 server, set the variable INFLUXD_CONFIG_PATH (v2 server configuration file) and run the influxd daemon with the option run :

influxdb$ export INFLUXD_CONFIG_PATH=/opt/influxdb/dba/srvifx2sqlpac/cfg/srvifx2sqlpac.toml
        
influxdb$ nohup influxd run >> /opt/influxdb/dba/srvifx2sqlpac/log/srvifx2sqlpac.log 2>&1 &
$LOG/srvifx2sqlpac.log
ts=2021-01-29T08:12:21.185163Z lvl=info msg="Welcome to InfluxDB" log_id=0S1VoPlG000 version=2.0.3 commit=fe04d346df build_date=2020-12-15T01:00:16Z
ts=2021-01-29T08:12:21.186172Z lvl=info msg="Resources opened" log_id=0S1VoPlG000 service=bolt path=/sqlpac/influxdb/srvifx2sqlpac/srvifxs2qlpac.bolt
ts=2021-01-29T08:12:21.194448Z lvl=info msg="Checking InfluxDB metadata for prior version." log_id=0S1VoPlG000 bolt_path=/sqlpac/influxdb/srvifx2sqlpac/srvifxs2qlpac.bolt
ts=2021-01-29T08:12:21.194543Z lvl=info msg="Using data dir" log_id=0S1VoPlG000 service=storage-engine path=/sqlpac/influxdb/srvifx2sqlpac/data
ts=2021-01-29T08:12:21.194607Z lvl=info msg="Compaction settings" log_id=0S1VoPlG000 service=storage-engine max_concurrent_compactions=1 throughput_bytes_per_second=50331648 throughput_bytes_per_second_burst=50331648
ts=2021-01-29T08:12:21.194620Z lvl=info msg="Open store (start)" log_id=0S1VoPlG000 service=storage-engine op_name=tsdb_open op_event=start
ts=2021-01-29T08:12:21.603908Z lvl=info msg="Opened shard" log_id=0S1VoPlG000 service=storage-engine op_name=tsdb_open index_version=inmem path=/sqlpac/influxdb/srvifx2sqlpac/data/dd2b762103a1d94c/autogen/75 duration=395.206ms
ts=2021-01-29T08:12:21.620382Z lvl=info msg="Opened shard" log_id=0S1VoPlG000 service=storage-engine op_name=tsdb_open index_version=inmem path=/sqlpac/influxdb/srvifx2sqlpac/data/dd2b762103a1d94c/autogen/15 duration=16.413ms
ts=2021-01-29T08:12:21.653303Z lvl=info msg="Opened shard" log_id=0S1VoPlG000 service=storage-engine op_name=tsdb_open index_version=inmem path=/sqlpac/influxdb/srvifx2sqlpac/data/dd2b762103a1d94c/autogen/22 duration=32.865ms
ts=2021-01-29T08:12:21.696091Z lvl=info msg="Opened shard" log_id=0S1VoPlG000 service=storage-engine op_name=tsdb_open index_version=inmem path=/sqlpac/influxdb/srvifx2sqlpac/data/dd2b762103a1d94c/autogen/26 duration=41.242ms
ts=2021-01-29T08:12:21.724205Z lvl=info msg="Opened shard" log_id=0S1VoPlG000 service=storage-engine op_name=tsdb_open index_version=inmem path=/sqlpac/influxdb/srvifx2sqlpac/data/dd2b762103a1d94c/autogen/47 duration=28.064ms
ts=2021-01-29T08:12:21.744825Z lvl=info msg="Opened shard" log_id=0S1VoPlG000 service=storage-engine op_name=tsdb_open index_version=inmem path=/sqlpac/influxdb/srvifx2sqlpac/data/dd2b762103a1d94c/autogen/58 duration=20.583ms
ts=2021-01-29T08:12:21.745069Z lvl=info msg="Open store (end)" log_id=0S1VoPlG000 service=storage-engine op_name=tsdb_open op_event=end op_elapsed=550.449ms
ts=2021-01-29T08:12:21.745094Z lvl=info msg="Starting retention policy enforcement service" log_id=0S1VoPlG000 service=retention check_interval=30m
ts=2021-01-29T08:12:21.745111Z lvl=info msg="Starting precreation service" log_id=0S1VoPlG000 service=shard-precreation check_interval=10m advance_period=30m
ts=2021-01-29T08:12:21.745165Z lvl=info msg="Starting query controller" log_id=0S1VoPlG000 service=storage-reads concurrency_quota=10 initial_memory_bytes_quota_per_query=9223372036854775807 memory_bytes_quota_per_query=9223372036854775807 max_memory_bytes=0 queue_size=10
ts=2021-01-29T08:12:21.745699Z lvl=info msg="Configuring InfluxQL statement executor (zeros indicate unlimited)." log_id=0S1VoPlG000 max_select_point=0 max_select_series=0 max_select_buckets=0
ts=2021-01-29T08:12:22.039074Z lvl=info msg=Starting log_id=0S1VoPlG000 service=telemetry interval=8h
ts=2021-01-29T08:12:22.039344Z lvl=info msg=Listening log_id=0S1VoPlG000 transport=https addr=vpsfrsqlpac2:8086 port=8086

Run influx ping to check the connectivity :

influxdb$ influx ping --host https://vpsfrsqlpac:8086
OK

Using influx client and self-signed certificates without a certificate authority (CA), add the option --skip-verify, otherwise the error "x509: certificate signed by unknown authority" is raised.

For more informations about creating self-signed certificates with its own certificate authority : Ubuntu - Self-signed certificates with its own certification authority

Connecting and running commands using influx client

The admin’s token (dba) is then stored in the file /sqlpac/influxdb/srvifx2sqlpac/configs, section default, file created during the migration (--influx-configs-path) :

/sqlpac/influxdb/srvifx2sqlpac/configs
[default]
  url = "https://vpsfrsqlpac:8086"
  token = "K2YXbGhIJIjVhL_FjmDN_Dl3CdOIgAPi4CwHhp6SrSFOEvfm62ziYOZ15W4kySH7dc6Hlx0BhBKRvH9IXgja6g=="
  org = "sqlpac"
  active = true

Set the variable $INFLUX_CONFIGS_PATH (configs file path) and the variable $INFLUX_ACTIVE_NAME (section name) :

influxdb$ export INFLUX_CONFIGS_PATH=/sqlpac/influxdb/srvifx2sqlpac/configs
influxdb$ export INFLUX_ACTIVE_NAME=default

influx client commands require --host if localhost (the default) is not used, define the variable $INFLUX_HOST, it will avoid repeating --host parameter :

influxdb$ export INFLUX_HOST=https://vpsfrsqlpac:8086

Admin commands are now authorized (buckets, users, authorizations, tasks… management) using influx client :

influxdb$ influx bucket list --org sqlpac
        ID                      Name                            Retention       Organization ID
a67ea95f68df447c        _monitoring             168h0m0s        4dec7e867866cc2f
e6b47fdfbbe80d57        _tasks                  72h0m0s         4dec7e867866cc2f
18c073a588127f79        masterts                0s              4dec7e867866cc2f
dd2b762103a1d94c        netdatatsdb/autogen     0s              4dec7e867866cc2f
7538d4ef709d1715        telegraf/rp72h          72h0m0s         4dec7e867866cc2f
influxdb$ influx auth list
ID                  Description     Token        User Name    User ID             Permissions
06f9d7a7c0856000    dba's Token     nDWNhwb…     dba          06f9d7a7a1856000    [read:authorizations write:authorizations
                                                                                   read:buckets write:buckets
                                                                                   read:dashboards write:dashboards…]

To use the graphical tool (Chronograf, …), first run influx user password to initialize a password for the admin account (dba) :

influxdb$ influx user password --name dba
Please type your new password:
…

GUI is available at the address http(s)://<host>:8086 : https://vpsfrsqlpac:8086 in this use case.

InfluxDB v2 - Sign In
InfluxData - Sign In
InfluxData - Data explorer
InfluxData - Data Explorer

Now Chronograf is integrated, building queries, especially with Flux language, is easier, influx query command line being a little bit frustrating :

InfluxData - Query builder

About learning Flux language, the 2 publications below may help :

Post migration steps

Users

influxd upgrade migrates existing 1.x users and permissions, however it does not migrate administrative users.

To check the 1.x users and permissions migration, use influx v1 auth list :

influxdb$ influx v1 auth list

ID        Description  Name / Token    User Name       User ID     Permissions
07017e…   …            grafana         dba             07017e…     [read:orgs/4dec7e867866cc2f/buckets/dd2b762103a1d94c]
07017e…   …            netdata         dba             07017e…     [read:orgs/4dec7e867866cc2f/buckets/dd2b762103a1d94c
                                                                    write:orgs/4dec7e867866cc2f/buckets/dd2b762103a1d94c]
influxdb$ influx bucket list
ID                      Name                    Retention       Organization ID
a67ea95f68df447c        _monitoring             168h0m0s        4dec7e867866cc2f
e6b47fdfbbe80d57        _tasks                  72h0m0s         4dec7e867866cc2f
18c073a588127f79        masterts                0s              4dec7e867866cc2f
dd2b762103a1d94c        netdatatsdb/autogen     0s              4dec7e867866cc2f
7538d4ef709d1715        telegraf/rp72h          72h0m0s         4dec7e867866cc2f

Backward compatibility is ensured. The grafana user used by Grafana is migrated : the user connects well to InfluxDB v2 and can run the InfluxQL queries defined in the Grafana dashboards.

Permissions can be managed using influx v1 auth [create|delete|set-active…], but bear in mind that using v1 option, only read/write permissions on buckets can be defined. The full permissions management (buckets, tasks, endpoints, dashboards…) is only available for v2 users created with influx user and influx auth commands.

Continuous queries migration to Flux tasks

Unfortunately, continuous queries are not easily migrated and must be converted manually to Flux tasks. This work should be prepared before migration. Continuous queries source code is saved in the file specified in the parameter --continuous-query-export-path when running upgrade (influxd upgrade).

/opt/influxdb/dba/srvifx2sqlpac/scripts/cs.sql
name: netdatatsdb
name                query
----                -----
cq_metrics CREATE CONTINUOUS QUERY cq_metrics ON netdatatsdb BEGIN SELECT mean(value) INTO netdatatsdb.autogen.backend_metrics FROM netdatatsdb.autogen."netdata.netdata.backend_metrics.sent" GROUP BY time(2m) fill(0) END
InfluxQL 1 codeFlux task v2 code
CREATE CONTINUOUS QUERY cq_metrics
ON netdatatsdb
BEGIN
 SELECT mean(value)
 INTO netdatatsdb.autogen.backend_metrics
 FROM netdatatsdb.autogen."netdata.netdata.backend_metrics.sent"
 GROUP BY time(2m)
 fill(0)
END
cq_metrics.flux
option task = {name: "cq_metrics", every: 2m}

data = from(bucket: "netdatatsdb/autogen")
  |> range(start: -task.every)
  |> filter(fn: (r) =>
      (r._measurement == "netdata.netdata.backend_metrics.sent")
     )
  |> aggregateWindow(every: 2m, fn: mean)
  |> fill(column: "_value", value: 0.0)
  |> set(key: "_measurement", value: "backend_metrics")
  |> to(bucket: "netdatatsdb/autogen")

To compile the task, either use the graphical interface or influx client :

influxdb$ influx task create --file cq_metrics.flux
ID                      Name            Organization ID         Organization    Status  Every   Cron
07018ed278d19000        cq_metrics      4dec7e867866cc2f        sqlpac          active  2m

Tasks logs are available in the user interface. Also available in command line using influx client , but less easy :

influxdb$ influx task run list --task-id 07018ed278d19000
ID                   TaskID               Status   ScheduledFor           StartedAt                        FinishedAt                       RequestedAt
07019b0e30119000     07018ed278d19000     success  2021-01-29T11:14:00Z   2021-01-29T11:14:00.003607793Z   2021-01-29T11:14:00.026414732Z   0001-01-01T00:00:00Z
07019a9900119000     07018ed278d19000     success  2021-01-29T11:12:00Z   2021-01-29T11:12:00.003733373Z   2021-01-29T11:12:00.020218332Z   0001-01-01T00:00:00Z

OpenTSDB migration to Telegraf

Native support of the protocols OpenTSDB, Graphite, CollectD, UDP is removed in version 2. A telegraf agent must be setup between the application and the InfluxDB v2 server to manage these architectures.

InfluxDB v2 - OpenTSDB to Telegraf

In this use case, Netdata sends its metrics through the protocol OpenTSDB to the InfluxDB Server version 1.8 / port 4242 :

netdata.conf
[backend]
        # host tags =
        enabled = yes
        data source = average
        type = opentsdb
        destination = tcp:vpsfrsqlpac:4242

Metrics are sent with the following OpenTSDB syntax :

put netdata.users.sockets.daemon 1579463790 0.0000000 host=vpsfrsqlpac1

Unfortunately, Telegraf does not have any OpenTSDB input plugin, only an output one, bad news… But Netdata can also send the metrics using the graphite protocol and Telegraf supports the Graphite data format with the input plugin socket_listener.

In the target architecture, Netdata pushes the metrics in the graphite format to a Telegraf agent listening on port 14001.

Version Telegraf 1.9 and onwards is required for InfluxDB v2.

Telegraf 1.17.1 is installed in the directory /opt/influxdb/telegraf-1.17 and the Telegraf agent configuration file (tgf_netdata.conf) is prepared :

  • Input plugin : socket_listener
  • Output plugin : influxdb_v2
influxdb$ telegraf --input-filter socket_listener --output-filter influxdb_v2 config > $TGF_CFG/tgf_netdata.conf
tgf_netdata.conf
[agent]
  omit_hostname = true
  
[[inputs.socket_listener]]
  service_address = "tcp://:14001"
  data_format = "graphite"

  templates = [
        "measurement.host.measurement*"
  ]
        
[[outputs.influxdb_v2]]
 urls = ["https://vpsfrsqlpac:8086"]
 token = "K2YXbGhIJIjVhL_FjmDN_Dl3CdOIgAPi4CwHhp6SrSFOEvfm62ziYOZ15W4kySH7dc6Hlx0BhBKRvH9IXgja6g=="
 organization = "sqlpac"
 bucket = "netdatatsdb/autogen"
 insecure_skip_verify = true
  • In the general configuration [agent], the parameter omit_hostname is defined to true, otherwise the tag host is automatically added by the telegraf agent and it overrides the tag host sent by NetData.
  • The input plugin is configured : port 14001 and graphite data format. A template is applied ("measurement.host.measurement*"), indeed when NetData sends data with the graphite format, the format is the following :
    netdata.vpsfrsqlpac1.users.cpu.postgres 0.0000000 1579463970

    But we want the host name to be a tag key (host=vpsfrsqlpac1) and not defined in the measurement name :

    netdata.users.cpu.postgres,host=vpsfrsqlpac1 0.0000000 1579463970
  • In the output configuration to InfluxDB Server 2, do not forget the option insecure_skip_verify set to true if https is implemented with self-signed certificates without certificate authority. The url, token, organization and bucket are specified. The token used here is the admin’s token (dba), another user with less privileges is recommended.

The Netdata configuration is therefore modified to set graphite data format and to switch to the port of the Telegraf agent :

netdata.conf
[backend]
        enabled = yes
        data source = average
        type = graphite
        destination = tcp:vpsfrsqlpac:14001

Telegraf agent is started and Netdata server restarted :

influxdb% nohup telegraf --config $TGF_CFG/tgf_netdata.conf \
                   --debug >> $TGF_LOG/tgf_netdata.log 2>&1 &

When input/output streams are properly defined :

2021-01-28T09:11:37Z I! Loaded processors:
2021-01-28T09:11:37Z I! Loaded outputs: influxdb_v2
2021-01-28T09:11:37Z I! Tags enabled:
2021-01-28T09:11:37Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"", Flush Interval:10s
2021-01-28T09:11:37Z D! [agent] Initializing plugins
2021-01-28T09:11:37Z D! [agent] Connecting outputs
2021-01-28T09:11:37Z D! [agent] Attempting connection to [outputs.influxdb_v2]
2021-01-28T09:11:37Z D! [agent] Successfully connected to outputs.influxdb_v2
2021-01-28T09:11:37Z D! [agent] Starting service inputs
2021-01-28T09:11:37Z I! [inputs.socket_listener] Listening on tcp://[::]:14001
2021-01-28T09:11:46Z D! [outputs.influxdb_v2] Wrote batch of 1000 metrics in 167.920781ms
2021-01-28T09:11:46Z D! [outputs.influxdb_v2] Wrote batch of 1000 metrics in 14.633634ms
2021-01-28T09:11:46Z D! [outputs.influxdb_v2] Buffer fullness: 6000 / 10000 metrics
2021-01-28T09:11:47Z D! [outputs.influxdb_v2] Wrote batch of 1000 metrics in 17.831348ms
2021-01-28T09:11:47Z D! [outputs.influxdb_v2] Wrote batch of 1000 metrics in 15.655119ms
2021-01-28T09:11:47Z D! [outputs.influxdb_v2] Wrote batch of 1000 metrics in 13.539954ms
2021-01-28T09:11:47Z D! [outputs.influxdb_v2] Wrote batch of 1000 metrics in 17.757401ms
2021-01-28T09:11:47Z D! [outputs.influxdb_v2] Wrote batch of 1000 metrics in 46.331055ms
2021-01-28T09:11:47Z D! [outputs.influxdb_v2] Wrote batch of 1000 metrics in 33.348918ms
2021-01-28T09:11:47Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics

InfluxQL queries, the endpoint /query

The endpoint /query ensures backward compatibility with InfluxQL 1.x queries :

curl --request POST https://vpsfrsqlpac:8086/query \
 --user "netdata:********" \
 --header "Accept: application/csv" \
 --data-urlencode "db=netdatatsdb" \
 --data-urlencode "rp=autogen" \
 --data-urlencode "q=SELECT mean(value) FROM netdatatsdb.autogen.backend_metrics WHERE time >= now() - 1w GROUP BY time(5m) FILL(none)"
name,tags,time,mean
backend_metrics,,1612060200000000000,1347.2222222222224
backend_metrics,,1612061100000000000,1352.6666666666667
backend_metrics,,1612062000000000000,1355.8333333333333
…

Add the option --insecure for self-signed certificates without certificate authority.

In the example above, a 1.x user is used : of course a v2 user’s token can be used :

curl --request POST https://vpsfrsqlpac:8086/query \
 --header "Authorization: Token K2YXbGhIJIjVhL_FjmDN_Dl3CdOIgAPi4CwHhp6SrSFOEvfm62ziYOZ15W4kySH7dc6Hlx0BhBKRvH9IXgja6g==" \
 --header "Accept: application/csv" \
 --data-urlencode "db=netdatatsdb" \
 --data-urlencode "rp=autogen" \
 --data-urlencode "q=SELECT mean(value) FROM netdatatsdb.autogen.backend_metrics WHERE time >= now() - 1w GROUP BY time(5m) FILL(none)"
name,tags,time,mean
backend_metrics,,1612060200000000000,1347.2222222222224
backend_metrics,,1612061100000000000,1352.6666666666667
backend_metrics,,1612062000000000000,1355.8333333333333
…

We may immediately conclude : to use 1.x InfluxQL commands (SHOW MEASUREMENTS, SHOW SERIES…), do we have now to use curl and HTTP queries to the endpoint /query ?

Temporarily, yes, but InfluxQL SHOW commands will be probably removed in future releases. InfluxDB v2 package schema should be used instead, this topic is covered in a next section (Exploring schema).

Not sure that’s the case, but it seems InfluxQL queries are translated to Flux queries. The influx client transpile option displays translations, a such mechanism exists :

influxdb$ influx transpile \
           'SELECT mean(value) FROM netdatatsdb.autogen.backend_metrics WHERE time >= now() - 1w GROUP BY time(15m) FILL(none)'
package main
from(bucket: "netdatatsdb/autogen")
        |> range(start: 2021-01-22T10:34:54.39766715Z, stop: 2021-02-29T10:34:54.39766715Z)
        |> filter(fn: (r) =>
                (r._measurement == "backend_metrics" and r._field == "value"))
        |> group(columns: ["_measurement", "_start", "_stop", "_field"], mode: "by")
        |> keep(columns: ["_measurement", "_start", "_stop", "_field", "_time", "_value"])
        |> window(every: 15m)
        |> mean()
        |> map(fn: (r) =>
                ({r with _time: r._start}))
        |> window(every: inf)
        |> rename(columns: {_value: "mean"})
        |> yield(name: "0")

The transpile option seems to be a good tool to start later InfluxQL code migration to Flux.

Writing data

InfluxDB Line protocol

About InfluxDB line protocol, no changes in version 2, the format remains the same. The InfluxQL INSERT statement is replaced by the command line influx write to write points :

influxdb$ influx write --bucket=telegraf/rp72h \
                       --precision s 'customMeasure,host=vpsfrsqlpac1 cpupct=23.4,slot=1i,isdefault=true 1581321757' 
  • The timestamp server is used when it is omitted in the line.
  • Integer datatype instead of float is forced by adding i after the value when inserting the first point. The suffix u is used to specify unsigned integers.
  • Boolean datatype is applied when writing t|true or f|false without quotes or double quotes when inserting the first point.

Applying the right data type enforces data integrity and reduces memory and space usage.

Error: Failed to write data: unexpected error writing points to database: partial write: series type mismatch: already Integer but got Float dropped=1.

The endpoint /write

The endpoint /write, like the endpoint /query, is available for writing for backward compatibility with 1.x API :

curl \
  --request POST "https://vpsfrsqlpac:8086/write?db=telegraf&rp=rp72h" \
  --user "telegraf:**********" \
  --data-binary "measurement,host=host1 field1=2i,field2=2.0 1577836800000000000"
curl \
  --request POST "https://vpsfrsqlpac:8086/write?db=telegraf&rp=rp72h" \
  --header "Authorization: Token K2YXbGhIJIjVhL_FjmDN_Dl3CdOIgAPi4CwHhp6SrSFOEvfm62ziYOZ15W4kySH7dc6Hlx0BhBKRvH9IXgja6g==" \
  --data-binary "measurement,host=host1 field1=2i,field2=2.0 1577836800000000000"

Add the option --insecure for self-signed certificates without certificate authority.

Bulk loads

In InfluxDB v2, bulk loads from CSV files are now performed using csv package (still experimental ?) :

  • CSV formats can be InfluxDB Line Protocol or Annotated CSV.
  • Sources can be https, file, raw data.

The details are not covered here, this paper only focuses on migration.

import "experimental/csv"

csv.from(file: "/sqlpac/data/netdata_20210128.csv")
  |> to(bucket: "netdatatsdb/autogen", org: "sqlpac")

SELECT INTO, function to()

In InfluxDB v1.x, SELECT INTO statements are used to copy data from one measurement to another.

In InfluxDB v2, use the Flux function to().

In the example below, computed data from the measurement netdata.netdata.backend_metrics.sent are written to the backend_metrics_perhour measurement in the same bucket (netdatatsdb/autogen). There is a lack of documentation on how to write data in the same bucket using the function set.

from(bucket: "netdatatsdb/autogen")
  |> range(start: -10d)
  |> filter(fn: (r) => r._measurement == "netdata.netdata.backend_metrics.sent")
  |> filter(fn: (r) => r._field == "value")
  |> aggregateWindow(every: 1h, fn: mean)
  |> fill(column: "_value", value: 0.0)
  |> set(key: "_measurement", value: "backend_metrics_perhour")
  |> to(org: "sqlpac", bucket: "netdatatsdb/autogen")

The InfluxQL query syntax would have been :

SELECT mean(value)
INTO netdatatsdb.autogen.backend_metrics_perhour
FROM netdatatsdb.autogen."netdata.netdata.backend_metrics.sent"
WHERE time >= now() - 10d
GROUP BY time(1h) FILL(0)

Deleting data

Less used, but we need sometimes to delete erroneous data impacting reporting.

The InfluxQL DELETE statement is replaced by the influx delete command line :

influxdb$ influx delete --bucket netdatatsdb/autogen \
  --org sqlpac \
  --start '1970-01-01T00:00:00Z' \
  --stop $(date +"%Y-%m-%dT%H:%M:%SZ") \
  --predicate '_measurement="cpu_measurement" AND location="spain"'

Start and stop times are specified using the RFC3339 format. Do not forget the predicates and start/stop.

Unfortunately, regular expressions are not yet supported in predicates.

--predicate '_measurement =~ /^netdata/'
Error: Failed to delete data: invalid request; error parsing request json:
operator: "=~" at position: 13 is not supported yet.

Exploring schema

In InfluxDB v1, we were used to explore metadata and schema with the SHOW commands.

In InfluxDB v2, helper functions in schema package replace SHOW commands functionalities, some translations examples below :

InfluxQLFlux InfluxDB v2
SHOW DATABASES
name
----
netdatatsdb
telegraf
…
SHOW RETENTION POLICIES ON telegraf
name  duration shardGroupDuration replicaN default
----  -------- ------------------ -------- -------
rp72h 72h0m0s  24h0m0s            1        true
buckets()

… name               … retentionPolicy  retentionPeriod
-------------------     ---------------  ----------------
netdatatsdb/autogen  … autogen                         0
telegraf/rp72h       … rp72h             259200000000000
…
SHOW SERIES
key
---
netdata.users.mem.mcs,host=vpsfrsqlpac1
netdata.users.mem.mcs,host=vpsfrsqlpac2
…
N/A
In the following commands, before invoking schema package functions
import "influxdata/influxdb/schema"
SHOW MEASUREMENTS ON netdatatsdb
name: measurements
name
----
cpu_measurement
netdata.apache_sqlpac.bytesperreq.size
…
schema.measurements(bucket: "netdatatsdb/autogen")
_value
------
cpu_measurement
netdata.apache_sqlpac.bytesperreq.size
…
SHOW MEASUREMENTS ON netdatatsdb
WITH MEASUREMENT =~ /postgres/

schema.measurements(bucket: "netdatatsdb/autogen")
  |> filter(fn: (r) => (r._value =~ /postgres/))
SHOW TAG KEYS ON netdatatsdb
FROM "cpu_measurement"
name: cpu_measurement
tagKey
------
host
location
schema.measurementTagKeys(
  bucket: "netdatatsdb/autogen",
  measurement: "cpu_measurement"
)
host
location
SHOW TAG VALUES ON netdatatsdb
FROM "cpu_measurement"
WITH KEY=location
name: cpu_measurement
key      value
---      -----
location france
location germany
schema.measurementTagValues(
  bucket: "netdatatsdb/autogen",
  measurement: "cpu_measurement",
  tag: "location"
)
_value
------
france
germany
SHOW FIELD KEYS ON netdatatsdb
FROM "cpu_measurement";
name: cpu_measurement
fieldKey    fieldType
--------    ---------
description string
value       float
schema.measurementFieldKeys(
  bucket: "netdatatsdb/autogen",
  measurement: "cpu_measurement"
)
_value
------
description
value

Field type is no more available, only field name.

In some schema package functions, by default search is performed with an oldest time set to -30 days. To find oldest metadata, the parameter start is specified in the function :
schema.measurementFieldKeys(
  bucket: "netdatatsdb/autogen",
  measurement: "cpu_measurement",
  start: -600d
)

Conclusion

Migration is quite easy. 2 things to bear in mind : a database + retention policy becomes a bucket in version 2 and InfluxQL is replaced by Flux language.

Inconvenients :

  • Migration may take a while depending on data volume to migrate. Space disk availability is checked before copying.
  • Continuous queries must be migrated manually to Flux tasks : this work should be performed in advance in a testing environment.
  • Admin users are not migrated and only authenticated users are migrated.
  • Mandatory migration to Telegraf to replace InfluxDB v1 OpenTSDB, Collectd… native protocols, component which adds a point of failure.

Advantages :

  • Backward compatibility with the endpoints /query and /write for existing tools (Grafana…) using authenticated users 1.x (InfluxQL queries, TickScript).
  • Chronograf and Kapacitor integrated to InfluxDB v2.
  • Flux tasks and new features with Flux language (join, pivot, packages, sql.from and much more…).

Existing tools can then be smoothly migrated to InfluxDB v2 :

  • Migration from InfluxQL to Flux / InfluxDB v2 starting Grafana 7.1.
  • Kapacitor / TickScript to Flux tasks.