GMS sends invalid requests to elastic search (duplicated fields in json request body) #12575

Open
ttekampe opened this issue Feb 7, 2025 · 0 comments
Labels
bug Bug report

ttekampe commented Feb 7, 2025

Describe the bug
We are scraping MSSQL databases using the Python SDK Pipeline (from datahub.ingestion.run.pipeline import Pipeline).
For many databases this runs without issues. For one database, however, failures are reported in the logs of the ingestion job:

{'error': 'Unable to emit metadata to DataHub GMS: com.datahub.util.exception.RetryLimitReached: Failed to add after 3 retries',

Looking into the logs of the GMS container, it turns out that requests sent to Elasticsearch have an invalid JSON payload:


[Thread-11491] ERROR c.l.m.s.e.query.ESSearchDAO:385 - Auto complete query failed:OpenSearch exception [type=x_content_parse_exception, reason=[1:3348] [highlight] failed to parse field [fields]]
[Thread-11491] ERROR c.l.m.s.e.query.ESSearchDAO:385 - Auto complete query failed:OpenSearch exception [type=x_content_parse_exception, reason=[1:3348] [highlight] failed to parse field [fields]]
[Thread-11490] ERROR c.l.m.s.e.query.ESSearchDAO:385 - Auto complete query failed:OpenSearch exception [type=x_content_parse_exception, reason=[1:3345] [highlight] failed to parse field [fields]]
[Thread-11490] ERROR c.l.m.s.e.query.ESSearchDAO:385 - Auto complete query failed:OpenSearch exception [type=x_content_parse_exception, reason=[1:3345] [highlight] failed to parse field [fields]]
[Thread-11490] ERROR c.l.d.g.r.search.AutocompleteUtils:66 - Failed to execute autocomplete all: field null, query *, filters: null, limit: null
Suppressed: org.opensearch.client.ResponseException: method [POST], host [http://127.0.0.1:9200], URI [/mlmodelindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
{
    "error": {
        "root_cause": [
            {
                "type": "x_content_parse_exception",
                "reason": "[1:3345] [highlight] failed to parse field [fields]"
            }
        ],
        "type": "x_content_parse_exception",
        "reason": "[1:3345] [highlight] failed to parse field [fields]",
        "caused_by": {
            "type": "json_parse_exception",
            "reason": "Duplicate field 'name'\n at [Source: (org.elasticsearch.common.io.stream.InputStreamStreamInput); line: 1, column: 3353]"
        }
    },
    "status": 400
}
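
When inspecting a captured request body for the "Duplicate field" problem reported above, a strict parse that rejects repeated keys can help, because Python's json module otherwise silently keeps the last value. This is a minimal sketch using only the standard library. The sample payload and the helper names (strict_loads, has_duplicate_fields) are made up for illustration; it is not the actual request that GMS sent.

```python
import json


class DuplicateKeyError(ValueError):
    """Raised when a JSON object repeats a key."""


def _reject_duplicates(pairs):
    # json calls this hook once per JSON object with its (key, value) pairs.
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise DuplicateKeyError(f"Duplicate field {key!r}")
        seen[key] = value
    return seen


def strict_loads(text):
    """Parse JSON, failing on duplicate keys within any single object."""
    return json.loads(text, object_pairs_hook=_reject_duplicates)


def has_duplicate_fields(text):
    """Return True if any object in the JSON text repeats a key."""
    try:
        strict_loads(text)
    except DuplicateKeyError:
        return True
    return False


# Hypothetical payload: "name" appears twice under highlight.fields.
sample = '{"highlight": {"fields": {"name": {}, "urn": {}, "name": {}}}}'

if __name__ == "__main__":
    print(has_duplicate_fields(sample))
```

The same key appearing in two different objects is not flagged; only a repeat inside one object is, which matches the kind of error shown in the response above.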

To Reproduce
The errors occur for only one of the many MSSQL databases we are scraping, so I am unsure how to reproduce the issue.
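
For reference, the kind of programmatic ingestion described in this report can be sketched roughly as follows. This is an assumption about the general shape of such a setup, not the reporter's actual code: the host, database name, and GMS URL are placeholders, credentials are omitted, and build_recipe and run_ingestion are hypothetical helper names.

```python
def build_recipe(host_port, database, gms_server):
    """Return a recipe dict for one MSSQL database (credentials omitted)."""
    return {
        "source": {
            "type": "mssql",
            "config": {"host_port": host_port, "database": database},
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": gms_server},
        },
    }


def run_ingestion(recipe):
    """Run one ingestion pipeline and raise if it reported failures."""
    # Imported lazily so the recipe builder works without acryl-datahub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()


if __name__ == "__main__":
    # Placeholder values, not taken from the report.
    recipe = build_recipe("db.example.internal:1433", "example_db", "http://localhost:8080")
    run_ingestion(recipe)
```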

Expected behavior
GMS should send only valid requests to Elasticsearch.

Additional context

  • acryl-datahub Python package version: 0.15.0.5
  • DataHub GMS version: 0.15.0
  • Elasticsearch version: 7.10.1
@ttekampe ttekampe added the bug Bug report label Feb 7, 2025