Fields get missing from fields.idx for some shards #25965

hansole · 2025-02-04T11:07:58Z

Steps to reproduce:

Reboot the InfluxDB node.

Expected behaviour:
Queries should still show metrics from all shards.

Actual behaviour:
After restarting the influxdb service, we seem to lose random 24h chunks of data that were previously present in InfluxDB. The gaps correspond to shard durations, so it looks like entire shards lose their data: queries return no results for the periods covered by those shards. The gaps occur in several places in the range, but never in the latest data.
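To check whether the gaps really line up with shard boundaries, something like the following sketch can list the 24h windows that contain no points. It assumes 24h shard windows aligned to the Unix epoch (UTC midnight); `empty_windows` and `SHARD_DURATION` are illustrative names, not part of InfluxDB, and the point timestamps are expected to come from an ordinary query over the affected range.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 24h shard duration, aligned to the Unix epoch (UTC midnight).
SHARD_DURATION = timedelta(hours=24)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(t, duration=SHARD_DURATION):
    """Round a timezone-aware timestamp down to the start of its window."""
    return EPOCH + ((t - EPOCH) // duration) * duration


def empty_windows(point_times, start, end, duration=SHARD_DURATION):
    """Return (window_start, window_end) pairs between start and end
    that contain none of the given point timestamps."""
    occupied = {window_start(t, duration) for t in point_times}
    gaps = []
    cursor = window_start(start, duration)
    while cursor < end:
        if cursor not in occupied:
            gaps.append((cursor, cursor + duration))
        cursor += duration
    return gaps
```

If every reported window is a full, aligned 24h block, that supports the idea that whole shards are being skipped rather than individual points being lost.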

After some investigation, we have made the following observations:

The data still appears to be present in the shards (it can be extracted using the influxd inspect export-lp command).
Tags are missing from the fields.idx files for the faulty shards.
- It can be fixed by inserting a single data point through the Line Protocol input in the Web UI, using the field that is missing from fields.idx and a timestamp that falls within the faulty shard. The field is then added to fields.idxl, and InfluxDB returns the metrics for the whole shard. This works, but it is a very manual process and leaves fake data behind.
- It can also be fixed by removing all fields.idx files. The files are then regenerated when InfluxDB starts, with all the tags in the shard included in fields.idx. But for some reason, fields.idx is not regenerated for some shards.
- It can also be fixed by simply copying the fields.idx file from a “neighboring” shard, since the metrics are very similar between shards. This has been the most reliable fix, and we were able to restore (make searchable again) all data in the database using this approach.
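To find candidate shards before applying any of these workarounds, a small script along the following lines could flag shard directories where fields.idx is missing or much smaller than in sibling shards. This is only a sketch under assumptions: it assumes the shard directories live at `<engine-path>/data/<bucket-id>/<retention-policy>/<shard-id>/` with numeric shard names, and the size threshold `ratio` is an arbitrary heuristic based on the observation that neighboring shards hold very similar metrics. `find_suspect_shards` is a hypothetical helper, not an InfluxDB tool.

```python
from pathlib import Path
from statistics import median


def find_suspect_shards(data_dir, ratio=0.5):
    """Return (shard_dir, reason) pairs for shards whose fields.idx is
    missing, or smaller than ratio * the median size among sibling shards."""
    suspects = []
    # Assumed layout: <data_dir>/<bucket-id>/<retention-policy>/<shard-id>/
    for rp_dir in sorted(Path(data_dir).glob("*/*")):
        # Skip files and internal directories such as _series.
        if not rp_dir.is_dir() or rp_dir.name.startswith("_"):
            continue
        shards = sorted(
            d for d in rp_dir.iterdir() if d.is_dir() and d.name.isdigit()
        )
        sizes = {}
        for shard in shards:
            idx = shard / "fields.idx"
            sizes[shard] = idx.stat().st_size if idx.is_file() else None
        present = [s for s in sizes.values() if s is not None]
        typical = median(present) if present else 0
        for shard, size in sizes.items():
            if size is None:
                suspects.append((shard, "missing"))
            elif size < ratio * typical:
                suspects.append((shard, "small"))
    return suspects
```

With the config shown in this report, the call would presumably be `find_suspect_shards("/var/lib/influxdb/engine/data")`. A "small" result is only a hint that the index may be incomplete; it does not prove that fields are missing.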

Any idea why the tags were not in fields.idx after the service was restarted? Could they have been present in fields.idxl before the reboot, and failed to be copied for some reason?

Why does it fail to regenerate fields.idx for some of the shards? Is there some way to “force” regeneration of the fields.idx file other than deleting it and restarting InfluxDB? Can we enable extra logging to get an idea of why the files are not regenerated?

Environment info:

Linux 4.18.0-553.36.1.el8_10.x86_64 x86_64
InfluxDB v2.7.11
$ free -h
              total        used        free      shared  buff/cache   available
Mem:          1.5Ti        49Gi       916Gi        52Mi       545Gi       1.4Ti
Swap:         4.0Gi          0B       4.0Gi
$ df -h /var/lib/influxdb/
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/vg01-influxdb  2.0T  896G 1018G  47% /var/lib/influxdb

Config:
$ cat /etc/influxdb/config.toml
bolt-path = "/var/lib/influxdb/influxd.bolt"
engine-path = "/var/lib/influxdb/engine"
storage-cache-max-memory-size = "4g"
storage-compact-throughput-burst = "1700M"
storage-compact-throughput = "700M"
