Describe the bug
We are scraping MSSQL databases using the Python SDK Pipeline (from datahub.ingestion.run.pipeline import Pipeline).
For many databases this runs without issues. For one database, however, failures are reported in the logs of the ingestion job:
{'error': 'Unable to emit metadata to DataHub GMS: com.datahub.util.exception.RetryLimitReached: Failed to add after 3 retries',
The logs of the GMS container show that requests sent to Elasticsearch carry an invalid JSON payload:
[Thread-11491] ERROR c.l.m.s.e.query.ESSearchDAO:385 - Auto complete query failed:OpenSearch exception [type=x_content_parse_exception, reason=[1:3348] [highlight] failed to parse field [fields]]
[Thread-11491] ERROR c.l.m.s.e.query.ESSearchDAO:385 - Auto complete query failed:OpenSearch exception [type=x_content_parse_exception, reason=[1:3348] [highlight] failed to parse field [fields]]
[Thread-11490] ERROR c.l.m.s.e.query.ESSearchDAO:385 - Auto complete query failed:OpenSearch exception [type=x_content_parse_exception, reason=[1:3345] [highlight] failed to parse field [fields]]
[Thread-11490] ERROR c.l.m.s.e.query.ESSearchDAO:385 - Auto complete query failed:OpenSearch exception [type=x_content_parse_exception, reason=[1:3345] [highlight] failed to parse field [fields]]
[Thread-11490] ERROR c.l.d.g.r.search.AutocompleteUtils:66 - Failed to execute autocomplete all: field null, query *, filters: null, limit: null
Suppressed: org.opensearch.client.ResponseException: method [POST], host [http://127.0.0.1:9200], URI [/mlmodelindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
{
"error": {
"root_cause": [
{
"type": "x_content_parse_exception",
"reason": "[1:3345] [highlight] failed to parse field [fields]"
}
],
"type": "x_content_parse_exception",
"reason": "[1:3345] [highlight] failed to parse field [fields]",
"caused_by": {
"type": "json_parse_exception",
"reason": "Duplicate field 'name'\n at [Source: (org.elasticsearch.common.io.stream.InputStreamStreamInput); line: 1, column: 3353]"
}
},
"status": 400
}
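The root cause reported above is a duplicate key ('name') inside one JSON object of the search request. As a minimal sketch for checking a captured request body for this condition, the helper below lists keys that repeat within the same object. The function name `find_duplicate_keys` and the sample payload are hypothetical and only illustrate the shape of the problem; they are not the actual request built by GMS.

```python
import json


def find_duplicate_keys(payload: str) -> list[str]:
    """Return keys that occur more than once within the same JSON object."""
    duplicates: list[str] = []

    def check_pairs(pairs):
        # json.loads calls this hook once per JSON object with its raw
        # (key, value) pairs, before duplicates are collapsed into a dict.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(payload, object_pairs_hook=check_pairs)
    return duplicates


# Hypothetical payload with the same defect as in the error message:
# the "fields" object of the highlight section contains "name" twice.
sample = '{"highlight": {"fields": {"name": {}, "urn": {}, "name": {}}}}'
print(find_duplicate_keys(sample))
```

Python's `json` module silently keeps the last value for a repeated key, so the hook is needed to surface the duplicate; the parser on the search side rejects such a payload outright, as the `json_parse_exception` shows.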
To Reproduce
The errors occur for only one of the many MSSQL databases that we are scraping, so I am unsure how to reproduce the issue.
Expected behavior
The GMS should send only valid requests to Elasticsearch.
Additional context
acryl-datahub python package version: 0.15.0.5
DataHub GMS version: 0.15.0
Elasticsearch version: 7.10.1