Perform a research and implement metric collection POC #14915

Closed
alexandr-shegeda opened this issue Jul 21, 2022 · 2 comments · May be fixed by airbytehq/airbyte-e2e-testing-tool#41

@alexandr-shegeda
Contributor

Tell us about the problem you're trying to solve

We are going to implement some metrics and collect statistics about sync runs.

Describe the solution you’d like

We should analyze and compare existing frameworks in order to decide which one to implement.

Describe the alternative you’ve considered or used

There are a number of existing tools that allow collecting and displaying benchmarks, such as Datadog, New Relic, Dynatrace, etc.

Additional context

Within the scope of this ticket, we expect to do research and possibly implement a high-level POC.

@etsybaev
Contributor

Greg Solovyev (Airbyte), 7:40 PM
Here’s what I think we need wrt metrics collection (see the sketch after this list for how a few of these could be computed):

Throughput metrics (measured separately for full refresh syncs and incremental syncs, also separately for CDC and non-CDC configurations):
- MB/second read for source connectors
- MB/second written for destination connectors
- Records/second read for source connectors
- Records/second written for destination connectors
- Records/second processed during the normalization phase
- MB/second processed during the normalization phase (if this is possible to measure)

Scalability metrics:
- Minimum memory required to read an XX MB row
- Distributions of the throughput metrics listed above, measured over different numbers of streams (example: measure MB/second read by the MySQL Source Connector with 1 vCPU/500 MB memory when the connector has 1 stream, 2 streams … 1K streams; measure the same with 2 vCPUs/500 MB memory; measure the same with 1 vCPU/1 GB memory). The goal is to understand if, when, and to what extent adding resources to connector containers improves throughput.

Performance metrics:
- Time between source connector startup and first record read
- Time between destination connector startup and first record written
- Time elapsed between the moment when the source connector sends a record/message to the platform and the destination connector receives it
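
Below is a minimal, hypothetical sketch of how the throughput and time-to-first-record counters listed above could be accumulated during a sync. The class and method names are illustrative only and are not existing Airbyte or e2e-testing-tool APIs.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical accumulator for throughput and time-to-first-record metrics;
// names are illustrative, not part of any existing Airbyte API.
public class SyncThroughputMeter {

    private final Instant startedAt = Instant.now();
    private Instant firstRecordAt;
    private long records;
    private long bytes;

    // Called once per record emitted by a source (or accepted by a destination).
    public void recordEmitted(long sizeInBytes) {
        if (firstRecordAt == null) {
            firstRecordAt = Instant.now();
        }
        records++;
        bytes += sizeInBytes;
    }

    public double recordsPerSecond() {
        return records / elapsedSeconds();
    }

    public double megabytesPerSecond() {
        return (bytes / (1024.0 * 1024.0)) / elapsedSeconds();
    }

    // "Time between connector startup and first record" metric; null if nothing was read yet.
    public Duration timeToFirstRecord() {
        return firstRecordAt == null ? null : Duration.between(startedAt, firstRecordAt);
    }

    private double elapsedSeconds() {
        double seconds = Duration.between(startedAt, Instant.now()).toMillis() / 1000.0;
        return Math.max(seconds, 0.001); // avoid division by zero for very short syncs
    }
}
```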

Greg Solovyev (Airbyte), 7:56 PM
You may want to use the TPC-DI industry-standard benchmark for all test sets and as guidance for metrics: https://www.tpc.org/tpc_documents_current_versions/pdf/tpc-di_v1.1.0.pdf
https://www.tpc.org/tpcdi/default5.asp

Greg Solovyev (Airbyte), 8:27 PM
Regarding integrating metrics from the Airbyte core: yes, we can use metrics from Airbyte Core via its API if that API exposes the metrics we need (I don’t know what metrics are available there).
Regarding DataDog: I don’t think we need to use DataDog. The scope of this task is a benchmarking tool, not the entire Airbyte Cloud. We want to run this tool once every few weeks and have it generate a report (a spreadsheet or a data set in our internal data warehouse would be acceptable as output from this tool). Rather than taking these metrics from real customers in the cloud or real users of the OSS platform, we want to use this tool in an isolated environment with pre-defined data sets. The reason for this is that we need to be able to push the platform to its limits and measure the limits of performance and scale, and we need to be able to repeat the same benchmark run (with the same data set) on different configurations and different versions of our software.
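
As a rough illustration of the "metrics from Airbyte Core via API" idea, the sketch below pulls a finished job's details over HTTP so they can be dumped into the benchmark report. The endpoint path, request body, and the assumption that the response carries per-attempt record/byte counts are unverified assumptions about the Airbyte API, not a confirmed contract.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical report step: fetch per-job stats from the Airbyte Core API and dump them
// for later aggregation into a spreadsheet. Endpoint and payload are assumptions.
public class JobStatsFetcher {

    public static void main(String[] args) throws Exception {
        String airbyteUrl = "http://localhost:8000"; // assumed local Airbyte deployment
        long jobId = 1L;                             // id of a finished sync job

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(airbyteUrl + "/api/v1/jobs/get")) // assumed endpoint path
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"id\": " + jobId + "}"))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The response JSON is expected to contain per-attempt stats such as records and
        // bytes synced; here we only print it, leaving aggregation to the report step.
        System.out.println(response.body());
    }
}
```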

@etsybaev removed their assignment Aug 29, 2022
@kimerinn
Contributor

kimerinn commented Sep 6, 2022

As a conclusion from the conversation with @alexandr-shegeda, I see the following subtasks:

  1. Storing simple metrics (sync time) in an in-memory store on the server side (a sketch follows this list)
  2. Obtaining sync time metrics on the client side (e2e-testing-tool)
  3. Storing metrics on the client side
  4. Enriching server-side metrics (throughput metrics, scalability metrics, performance metrics)
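
For subtask 1, a minimal sketch of a server-side in-memory metric store might look like the following. The class name and methods are hypothetical; a real implementation would likely also need eviction and persistence.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical in-memory store for subtask 1: keep sync durations per connection so the
// e2e-testing-tool (subtask 2) can read them back. Names are illustrative only.
public class InMemorySyncMetricsStore {

    private final Map<String, List<Duration>> syncDurationsByConnection = new ConcurrentHashMap<>();

    // Record the duration of one completed sync for the given connection id.
    public void recordSyncDuration(String connectionId, Duration duration) {
        syncDurationsByConnection
                .computeIfAbsent(connectionId, id -> new CopyOnWriteArrayList<>())
                .add(duration);
    }

    // Return all recorded durations for a connection (empty list if none).
    public List<Duration> getSyncDurations(String connectionId) {
        return List.copyOf(syncDurationsByConnection.getOrDefault(connectionId, List.of()));
    }
}
```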
