Bug with 'influxdb3 show databases' when outputting to parquet #25941

Open
sanderson opened this issue Jan 30, 2025 · 3 comments
@sanderson
Contributor

sanderson commented Jan 30, 2025

The influxdb3 show databases command lists parquet as a supported output format but doesn't provide a -o, --output option for writing the output to a file. Raw Parquet is binary, so it can't be printed to the terminal.

Steps to reproduce:

  1. Create one or more databases in an InfluxDB 3 Core or Enterprise instance.

  2. Run the following command to list databases and format the output as Parquet:

    influxdb3 show databases --format parquet

Expected behaviour:

If the show commands are to support Parquet output, I would expect them to also include a -o, --output option for writing the raw Parquet to a file (like the influxdb3 query command does).

Actual behaviour:

When you try to output Parquet on the command line, you get an error similar to:

    Show command failed: invalid utf-8 sequence of 1 bytes from index 19

@hiltontj hiltontj added the v3 label Jan 30, 2025
@Karribalu

Hello @sanderson,

I’ve investigated the issue, and here are my findings:

  1. The error occurs at index 19, which corresponds to an encoded character in the Parquet metadata (\xb5, Hex: 0xb5).
  2. Parquet produces binary data that is not necessarily UTF-8 encoded, which causes the println! statement to fail.
  3. To handle this, we can use String::from_utf8_lossy, which replaces invalid UTF-8 sequences with the replacement character (�) instead of failing.
    I’ve raised a PR to implement this change.

Thanks,
Bala.
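
A minimal sketch of why a lossy conversion is a risky fit for binary output: String::from_utf8_lossy replaces each invalid sequence with U+FFFD, so the bytes that come out are no longer the bytes that went in. The sample bytes and the helper name survives_lossy_conversion are illustrative assumptions, not taken from the InfluxDB code base.

```rust
/// Returns true when `bytes` come back unchanged after a lossy UTF-8 conversion.
/// (Hypothetical helper, used only for this illustration.)
fn survives_lossy_conversion(bytes: &[u8]) -> bool {
    // from_utf8_lossy swaps every invalid sequence for U+FFFD,
    // which is itself encoded as the three bytes EF BF BD.
    String::from_utf8_lossy(bytes).as_bytes() == bytes
}

fn main() {
    // Illustrative bytes only: the "PAR1" magic followed by the 0xb5 byte
    // mentioned in the comment above. This is not a real Parquet file.
    let sample = [b'P', b'A', b'R', b'1', 0xb5];

    let lossy = String::from_utf8_lossy(&sample);
    println!(
        "original: {} bytes, after lossy conversion: {} bytes",
        sample.len(),
        lossy.len()
    );
    println!("round trip intact: {}", survives_lossy_conversion(&sample));
}
```

Because the byte count and content change, a Parquet reader would most likely reject output produced this way, which is consistent with the request in the next comment to write the raw bytes to a file instead.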

@mgattozzi
Contributor

mgattozzi commented Feb 4, 2025

@Karribalu If you do want to contribute and help, it's important to understand that Parquet data is equivalent to a Vec<u8> and is not in any way valid UTF-8. If parquet is specified as the format, we want the command to require the user to specify a file path to write the data to. Otherwise, printing it out as we do with JSON or a pretty table is fine.

We do this exact thing in the query subcommand here ->

    // write to file if output path specified
    if let Some(path) = &config.output_file_path {
        let mut f = OpenOptions::new()
            .write(true)
            .create(true)
            .truncate(true)
            .open(path)
            .await?;
        f.write_all_buf(&mut resp_bytes).await?;
    } else {
        if config.output_format.is_parquet() {
            Err(Error::NoOutputFileForParquet)?
        }
        println!("{}", std::str::from_utf8(&resp_bytes)?);
    }

It's just a matter of making it work for the show command here.
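
As a rough, self-contained sketch of what that could look like for a show command, the same branching can be written with synchronous standard-library I/O. Everything named here (Format, NoOutputFileForParquet as a standalone struct, emit, and the temp file name) is a hypothetical stand-in for illustration; the real command is async and uses the project's own config and error types.

```rust
use std::error::Error;
use std::fmt;
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

// Hypothetical stand-in for the CLI's output format option.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Format {
    Json,
    Parquet,
}

// Hypothetical error, mirroring the variant name used in the query subcommand.
#[derive(Debug)]
struct NoOutputFileForParquet;

impl fmt::Display for NoOutputFileForParquet {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "an output file path is required for parquet output")
    }
}

impl Error for NoOutputFileForParquet {}

/// Writes the response to `output` if given; otherwise prints it,
/// refusing to print Parquet because it is not text.
fn emit(resp_bytes: &[u8], format: Format, output: Option<&Path>) -> Result<(), Box<dyn Error>> {
    if let Some(path) = output {
        let mut f = OpenOptions::new()
            .write(true)
            .create(true)
            .truncate(true)
            .open(path)?;
        f.write_all(resp_bytes)?;
    } else {
        if format == Format::Parquet {
            return Err(Box::new(NoOutputFileForParquet));
        }
        println!("{}", std::str::from_utf8(resp_bytes)?);
    }
    Ok(())
}

fn main() -> Result<(), Box<dyn Error>> {
    // Text formats can go straight to stdout (placeholder payload).
    emit(b"example text output", Format::Json, None)?;

    // Illustrative binary payload, not a real Parquet file.
    let binary = [b'P', b'A', b'R', b'1', 0xb5];

    // Parquet without a path is rejected instead of being printed.
    assert!(emit(&binary, Format::Parquet, None).is_err());

    // Parquet with a path is written byte for byte.
    let path = std::env::temp_dir().join("show_sketch_output.parquet");
    emit(&binary, Format::Parquet, Some(path.as_path()))?;
    assert_eq!(std::fs::read(&path)?, binary);
    std::fs::remove_file(&path)?;
    Ok(())
}
```

Checking the format before any attempt to decode the bytes means the user gets a clear error rather than a UTF-8 failure or a panic.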

@Karribalu

Hello @mgattozzi,

I apologise for misunderstanding the ask here; I can see now what needs to be done.
Thanks for the reference.

As far as I can see, this issue also occurs in the show system command:

    influxdb3 show system --database <database-name> table-list --format parquet
    influxdb3 show system --database <database-name> table --format parquet distinct_caches
    influxdb3 show system --database <database-name> summary --format parquet

For reference, I get this error for all of these commands:

distinct_caches summary: thread 'main' panicked at influxdb3/src/commands/show/system.rs:267:64: called Result::unwrap()on anErrvalue: FromUtf8Error { bytes: [80, 65, 82, 49, 21, 2, 25, 188, 72, 12, 97, 114, 114, 111, 119, 95, 115, 99, 104, 101, 109, 97, 21, 12, 0, 21, 12, 37, 0, 24, 5, 116, 97, 98, 108, 101, 37, 0, 76, 28, 0, 0, 0, 21, 12, 37, 0, 24, 4, 110, 97, 109, 101, 37, 0, 76, 28, 0, 0, 0, 53, 0, 24, 10, 99, 111, 108, 117, 109, 110, 95, 105, 100, 115, 21, 2, 21, 6, 76, 60, 0, 0, 0, 53, 4, 24, 4, 108, 105, 115, 116, 21, 2, 0, 21, 2, 37, 2, 24, 4, 105, 116, 101, 109, 37, 26, 76, 172, 19, 32, 18, 0, 0, 0, 53, 0, 24, 12, 99, 111, 108, 117, 109, 110, 95, 110, 97, 109, 101, 115, 21, 2, 21, 6, 76, 60, 0, 0, 0, 53, 4, 24, 4, 108, 105, 115, 116, 21, 2, 0, 21, 12, 37, 2, 24, 4, 105, 116, 101, 109, 37, 0, 76, 28, 0, 0, 0, 21, 4, 37, 0, 24, 15, 109, 97, 120, 95, 99, 97, 114, 100, 105, 110, 97, 108, 105, 116, 121, 37, 28, 76, 172, 19, 64, 18, 0, 0, 0, 21, 4, 37, 0, 24, 15, 109, 97, 120, 95, 97, 103, 101, 95, 115, 101, 99, 111, 110, 100, 115, 37, 28, 76, 172, 19, 64, 18, 0, 0, 0, 22, 0, 25, 12, 25, 28, 24, 12, 65, 82, 82, 79, 87, 58, 115, 99, 104, 101, 109, 97, 24, 168, 5, 47, 47, 47, 47, 47, 47, 81, 66, 65, 65, 65, 81, 65, 65, 65, 65, 65, 65, 65, 75, 65, 65, 119, 65, 67, 103, 65, 74, 65, 65, 81, 65, 67, 103, 65, 65, 65, 66, 65, 65, 65, 65, 65, 65, 65, 81, 81, 65, 67, 65, 65, 73, 65, 65, 65, 65, 66, 65, 65, 73, 65, 65, 65, 65, 66, 65, 65, 65, 65, 65, 89, 65, 65, 65, 67, 89, 65, 81, 65, 65, 88, 65, 69, 65, 65, 79, 81, 65, 65, 65, 66, 48, 65, 65, 65, 65, 80, 65, 65, 65, 65, 65, 81, 65, 65, 65, 67, 81, 47, 118, 47, 47, 69, 65, 65, 65, 65, 66, 81, 65, 65, 65, 65, 65, 65, 65, 65, 67, 69, 65, 65, 65, 65, 80, 114, 43, 47, 47, 57, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 56, 65, 65, 65, 66, 116, 89, 88, 104, 102, 89, 87, 100, 108, 88, 51, 78, 108, 89, 50, 57, 117, 90, 72, 77, 65, 120, 80, 55, 47, 47, 120, 65, 65, 65, 65, 65, 85, 65, 65, 65, 65, 65, 65, 65, 65, 65, 104, 65, 65, 
65, 65, 65, 117, 47, 47, 47, 47, 81, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 80, 65, 65, 65, 65, 98, 87, 70, 52, 88, 50, 78, 104, 99, 109, 82, 112, 98, 109, 70, 115, 97, 88, 82, 53, 65, 80, 106, 43, 47, 47, 56, 89, 65, 65, 65, 65, 68, 65, 65, 65, 65, 65, 65, 65, 65, 65, 120, 73, 65, 65, 65, 65, 65, 81, 65, 65, 65, 66, 103, 65, 65, 65, 68, 115, 47, 118, 47, 47, 69, 65, 65, 85, 65, 66, 65, 65, 68, 103, 65, 80, 65, 65, 81, 65, 65, 65, 65, 73, 65, 66, 65, 65, 65, 65, 65, 85, 65, 65, 65, 65, 68, 65, 65, 65, 65, 65, 65, 65, 65, 82, 103, 77, 65, 65, 65, 65, 65, 65, 65, 65, 65, 66, 106, 47, 47, 47, 56, 69, 65, 65, 65, 65, 97, 88, 82, 108, 98, 81, 65, 65, 65, 65, 65, 77, 65, 65, 65, 65, 89, 50, 57, 115, 100, 87, 49, 117, 88, 50, 53, 104, 98, 87, 86, 122, 65, 65, 65, 65, 65, 71, 84, 47, 47, 47, 56, 89, 65, 65, 65, 65, 68, 65, 65, 65, 65, 65, 65, 65, 65, 65, 120, 85, 65, 65, 65, 65, 65, 81, 65, 65, 65, 66, 103, 65, 65, 65, 66, 89, 47, 47, 47, 47, 69, 65, 65, 87, 65, 66, 65, 65, 68, 103, 65, 80, 65, 65, 81, 65, 65, 65, 65, 73, 65, 66, 65, 65, 65, 65, 65, 89, 65, 65, 65, 65, 72, 65, 65, 65, 65, 65, 65, 65, 65, 81, 73, 89, 65, 65, 65, 65, 65, 65, 65, 71, 65, 65, 103, 65, 66, 65, 65, 71, 65, 65, 65, 65, 73, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 69, 65, 65, 65, 65, 97, 88, 82, 108, 98, 81, 65, 65, 65, 65, 65, 75, 65, 65, 65, 65, 89, 50, 57, 115, 100, 87, 49, 117, 88, 50, 108, 107, 99, 119, 65, 65, 50, 80, 47, 47, 47, 120, 81, 65, 65, 65, 65, 77, 65, 65, 65, 65, 65, 65, 65, 65, 71, 65, 119, 65, 65, 65, 65, 65, 65, 65, 65, 65, 121, 80, 47, 47, 47, 119, 81, 65, 65, 65, 66, 117, 89, 87, 49, 108, 65, 65, 65, 65, 65, 66, 65, 65, 70, 65, 65, 81, 65, 65, 65, 65, 68, 119, 65, 69, 65, 65, 65, 65, 67, 65, 65, 81, 65, 65, 65, 65, 71, 65, 65, 65, 65, 65, 119, 65, 65, 65, 65, 65, 65, 65, 65, 89, 69, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 69, 65, 65, 81, 65, 66, 65, 65, 65, 65, 65, 85, 65, 65, 65, 66, 48, 89, 87, 74, 115, 90, 81, 65, 65, 65, 65, 61, 61, 0, 24, 25, 112, 97, 114, 113, 117, 101, 
116, 45, 114, 115, 32, 118, 101, 114, 115, 105, 111, 110, 32, 53, 51, 46, 51, 46, 48, 25, 108, 28, 0, 0, 28, 0, 0, 28, 0, 0, 28, 0, 0, 28, 0, 0, 28, 0, 0, 0, 209, 3, 0, 0, 80, 65, 82, 49], error: Utf8Error { valid_up_to: 7, error_len: Some(1) } }

Should I also fix these?
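
The valid_up_to: 7 in the panic above can be checked against the first eight bytes of the dump. A small sketch, assuming only the standard library (first_invalid_byte is a hypothetical helper name):

```rust
/// Returns the index and value of the first byte that breaks UTF-8 decoding, if any.
/// (Hypothetical helper, used only for this illustration.)
fn first_invalid_byte(bytes: &[u8]) -> Option<(usize, u8)> {
    match std::str::from_utf8(bytes) {
        Ok(_) => None,
        Err(e) => {
            // valid_up_to() is the length of the longest valid prefix,
            // so it is also the index of the offending byte.
            let idx = e.valid_up_to();
            Some((idx, bytes[idx]))
        }
    }
}

fn main() {
    // First eight bytes of the dump above: "PAR1" (80, 65, 82, 49),
    // three ASCII control bytes, then 188 (0xBC).
    let prefix: [u8; 8] = [80, 65, 82, 49, 21, 2, 25, 188];

    match first_invalid_byte(&prefix) {
        Some((idx, byte)) => println!("decoding stops at index {idx}, byte {byte:#04x}"),
        None => println!("all bytes are valid UTF-8"),
    }
}
```

The seven leading bytes are all ASCII, so they decode cleanly; 188 (0xBC) is a UTF-8 continuation byte that cannot start a character, which matches the reported error_len: Some(1).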
