
[Bug]: Trying to run the vectorizer on a large number of new documents results in "Requested 629204 tokens, max 600000 tokens per request" from openai #481

Open · kolaente opened this issue Feb 14, 2025 · 2 comments · May be fixed by #482
Labels: bug (Something isn't working), community, pgai


@kolaente (Contributor)

What happened?

I'm running the vectorizer on a large new dataset and I'm getting this error from OpenAI:

unexpected error: Error code: 400 - {'error': {'message': 'Requested 629204 tokens, max 600000 tokens per request', 'type': 'max_tokens_per_request', 'param': None, 'code': 'max_tokens_per_request'}}

I wonder if that's a configuration error?

pgai extension affected

0.7.0

pgai library affected

No response

PostgreSQL version used

17

What operating system did you use?

latest timescale/timescaledb-ha:pg17

What installation method did you use?

Docker

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

How can we reproduce the bug?

1. Have a large amount of data
2. Create a vectorizer
3. Run the worker

(I can't really pin it down yet; I'll try to find a more reliable way to reproduce it.)
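
For context, a minimal sketch of the kind of setup that triggers this. The table and column names are made up, and the create_vectorizer arguments follow the pgai 0.7 docs, so treat them as an assumption for other versions:

-- Hypothetical source table with a large number of rows to embed
CREATE TABLE documents (
    id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    content TEXT NOT NULL
);

-- Vectorizer over that table; chunking and embedding settings are illustrative
SELECT ai.create_vectorizer(
    'documents'::regclass,
    embedding => ai.embedding_openai('text-embedding-3-small', 1536),
    chunking  => ai.chunking_recursive_character_text_splitter('content')
);

When the worker then processes the backlog, it batches pending chunks into embedding requests; with enough new rows, a single request can exceed OpenAI's 600,000 tokens-per-request limit, which produces the error above.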

Are you going to work on the bugfix?

None

@kolaente (Contributor, Author) commented:

PR with a fix: #482

@cevian (Collaborator) commented Feb 14, 2025:

@kolaente does reducing batch_size in the processing configuration (docs) help?
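
For reference, a sketch of what a smaller batch size would look like in the vectorizer configuration. The processing_default function and batch_size parameter are as documented for pgai; the value 50 is only an illustrative guess:

SELECT ai.create_vectorizer(
    'documents'::regclass,
    embedding  => ai.embedding_openai('text-embedding-3-small', 1536),
    chunking   => ai.chunking_recursive_character_text_splitter('content'),
    processing => ai.processing_default(batch_size => 50)
);

A smaller batch_size means fewer chunks per embedding request, which keeps each request's total token count under the provider's per-request limit at the cost of more API calls.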
