Painting to Imagine #1096

Open
nesretep-anp1 opened this issue Jan 24, 2025 · 5 comments

nesretep-anp1 commented Jan 24, 2025

Discussed in #1095

Originally posted by nesretep-anp1 January 22, 2025
I'm having some trouble with "Painting to Imagine" in the "Image Generation" process.

...


It seems that there are several issues with how the models to use are selected.

/image Paint a picture of a fight of General Grevious against Darth Vader
  • Pay attention to the fact that "General Grievous" is misspelled!
  • An agent configured with bunny-llama-3-8b-v is used

This results in the effect explained in the discussion mentioned above. The logs show ...

Fallback to default chat model tokenizer: gpt-4o.
Configure tokenizer for model: meta-llama-3.1-8b-instruct in Khoj settings to improve context stuffing.

OK, now the effect with "violent acts" is explained; it comes from gpt-4o. But why is gpt-4o used at all?

In the function truncate_messages, the model gpt-4o is statically set as the default tokenizer.

The funny thing is that this static assignment only comes into effect if all other cases fail.

So I set a tokenizer for the model in use. Nothing changed; it was still gpt-4o.

Why is a configured tokenizer not taken into account? (I have not yet gotten to that point in my investigation.)

I added default_tokenizer = model_name directly below the except.
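
To make that concrete, here is a rough, paraphrased sketch of how I read the fallback structure, with my local change marked. Only truncate_messages and default_tokenizer are names from the actual code; everything else is my own shorthand for illustration.

# Paraphrased sketch of the tokenizer fallback in truncate_messages, NOT the actual Khoj source.
import logging
from typing import Optional

import tiktoken
from transformers import AutoTokenizer

logger = logging.getLogger(__name__)

def pick_encoder(model_name: str, tokenizer_name: Optional[str] = None):
    default_tokenizer = "gpt-4o"  # the static default assignment I am talking about
    try:
        if tokenizer_name:
            # tokenizer configured for the chat model in the Khoj admin settings
            return AutoTokenizer.from_pretrained(tokenizer_name)
        # works for OpenAI models like gpt-4o, raises for e.g. meta-llama-3.1-8b-instruct
        return tiktoken.encoding_for_model(model_name)
    except Exception:
        # default_tokenizer = model_name  # <- the line I added directly below the except
        logger.debug(f"Fallback to default chat model tokenizer: {default_tokenizer}.")
        return tiktoken.encoding_for_model(default_tokenizer)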

Now the following showed up in the logs ...

Fallback to default chat model tokenizer: meta-llama-3.1-8b-instruct.
Configure tokenizer for model: meta-llama-3.1-8b-instruct in Khoj settings to improve context stuffing.

(which then certainly fails, but ...)

I then realized (it is already visible in the first log snippet) that it is not the agent's model bunny-llama-3-8b-v that is used, but meta-llama-3.1-8b-instruct!

OK, so why is the agent's model not used?

I went to "Settings" and set the "Chat" model under "Models" to bunny-llama-3-8b-v.

meta-llama-3.1-8b-instruct was still used!

I then changed the server chat settings in the "Admin Panel" to bunny-llama-3-8b-v.

Now bunny-llama-3-8b-v is used.

IMHO the real bug described here is the confusing logic that decides which model is used in which situation and configuration.

Specifically, ...

  • The agent has a model; why use a different one?
  • The user has settings; why use the "server settings"?
  • The agent and the user both have settings; why offer "server settings" at all? (I initially set up Khoj without a "server settings" record, but later had to create one to get the web scraper running.)

Additionally, ...

  • truncate_messages should not fall back to a statically set tokenizer; the tokenizer configured for the model should be applied here
debanjum (Member) commented Jan 24, 2025

Why is a configured tokenizer not taken into account? (I have not yet gotten to that point in my investigation.)

truncate_messages should not fall back to a statically set tokenizer; the tokenizer configured for the model should be applied here

The tokenizer is only used for token counting, which in turn is used for message truncation so that the conversation fits the model's (configured) max prompt size. It is only something to worry about if you're hitting max context limits or want to reduce memory usage. I intend to remove that debug line, as it has confused other folks as well.
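
To illustrate what the truncation does conceptually: it just drops the oldest messages until the conversation fits the configured max prompt size, roughly like this (a self-contained sketch, not the actual Khoj implementation):

# Illustrative sketch of token-count based message truncation, not the actual Khoj code.
def truncate_to_max_prompt_size(messages: list[str], encoder, max_prompt_size: int) -> list[str]:
    """Drop the oldest messages until the total token count fits within max_prompt_size."""
    def total_tokens(msgs: list[str]) -> int:
        # encoder can be a tiktoken encoding or a Hugging Face tokenizer; both expose encode()
        return sum(len(encoder.encode(message)) for message in msgs)

    truncated = list(messages)
    # always keep at least the latest message, otherwise drop from the oldest end
    while len(truncated) > 1 and total_tokens(truncated) > max_prompt_size:
        truncated.pop(0)
    return truncated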

Fallback to default chat model tokenizer: gpt-4o.
Configure tokenizer for model: meta-llama-3.1-8b-instruct in Khoj settings to improve context stuffing.

I see how this message can be confusing, but the tokenizer name is not the same as the chat model name. Khoj can identify the tokenizer for offline models run directly within Khoj, but it cannot for chat models accessed via an API.

For such models, you can set the tokenizer field in the chat model settings on the admin panel to the Hugging Face repo corresponding to the tokenizer for your chat model. For llama models the tokenizer can be set to hf-internal-testing/llama-tokenizer from Hugging Face.
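
If you want to sanity check that tokenizer outside of Khoj, plain transformers usage is enough, e.g.:

from transformers import AutoTokenizer

# hf-internal-testing/llama-tokenizer is a public repo that works for llama models
tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")
prompt = "Paint a picture of a fight of General Grevious against Darth Vader"
print(len(tokenizer.encode(prompt)))  # number of tokens this prompt uses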

Again, it is only something to worry about if you're hitting max context limits or want to reduce memory usage. Otherwise this isn't really relevant to your issue with chat model selection and image generation, IMO.

IMHO the real bug described here is the confusing logic that decides which model is used in which situation and configuration.

Specifically, ...

  • The agent has a model; why use a different one?
  • The user has settings; why use the "server settings"?
  • The agent and the user both have settings; why offer "server settings" at all? (I initially set up Khoj without a "server settings" record, but later had to create one to get the web scraper running.)

Server chat settings are prioritized over user chat settings to allow server admins more control over the model used for intermediary and background tasks. It is (meant to be) an optional feature. The priority order for default chat model selection is:
server chat settings > user chat settings > first chat model added. See

async def aget_default_chat_model(user: KhojUser = None):
    """Get default conversation config. Prefer chat model by server admin > user > first created chat model"""

Note: The agent chat model is currently only used for the final response generation step, not for the intermediate steps.

Questions:

  • What chat model prioritization order do you expect (e.g. agent > user)?
  • We should definitely document the chat model prioritization currently used, if that isn't already done.
  • What behavior were you seeing with the web scrapers that made you set the server chat settings in Khoj? Note: the chat model is not a required field (I believe) when setting up the web scraper prioritization via the server chat settings.

nesretep-anp1 (Author) commented Jan 24, 2025

The tokenizer is only used for token counting, which in turn is used for message truncation so that the conversation fits the model's (configured) max prompt size. It is only something to worry about if you're hitting max context limits or want to reduce memory usage. I intend to remove that debug line, as it has confused other folks as well.

You should not; that line was the reason I came across the function in the first place. IMHO the logging output should be much, much more verbose, depending on the log level.

On the one hand, I again think that there should not be a static assignment of a model.

On the other hand, ...

I see how this message can be confusing, but the tokenizer name is not the same as the chat model name. Khoj can identify the tokenizer for offline models run directly within Khoj, but it cannot for chat models accessed via an API.

For such models, you can set the tokenizer field in the chat model settings on the admin panel to the Hugging Face repo corresponding to the tokenizer for your chat model. For llama models the tokenizer can be set to hf-internal-testing/llama-tokenizer.

As said in my original post, I did set the tokenizer field of the model (both the agent's model and the one used for prompt enhancement), but the one from default_tokenizer was still used.

That is a nice hint and should at least be part of an example in the docs for on-prem setups. But, ...

Again, it is only something to worry about if you're hitting max context limits or want to reduce memory usage. Otherwise this isn't really relevant to your issue with chat model selection and image generation, IMO.

..., you saw my original prompt: it should not come anywhere near the context limits, right?

(And, before you ask: for these tests I set up a fresh instance without any notes, docs, ... in it.)

As said, ...

  • The default_tokenizer should not be set statically.

  • Why is truncate_messages always called?

nesretep-anp1 (Author) commented

  • What chat model prioritization order do you expect (e.g. agent > user)?
  • We should definitely document the chat model prioritization currently used, if that isn't already done.

Well, ... IMHO, and especially because there are both centrally/admin-maintained and user-maintained agents, everything needed should be maintained/stored/set within the agent specification.

No user setting, no server setting, ...

In fact, everything starts with a conversation with a (user-selected) agent.

This agent has its personality (system prompt(s)), model, parameters, ...

And this agent definition should be used for everything.

Perhaps use an implicit default model (e.g. when an agent does not have an image generation model explicitly set), but if it is set within the agent, then no other parameter, model, ... should be used anywhere.

I was really confused when I realized that, while the agent has model_1 configured, image generation with THIS agent uses model_2 for prompt enhancement.

nesretep-anp1 (Author) commented

  • What behavior were you seeing with the web scrapers that made you set the server chat settings in Khoj? Note: the chat model is not a required field (I believe) when setting up the web scraper prioritization via the server chat settings.

Forcing Khoj to use the direct web scraper. ;)

debanjum (Member) commented Feb 3, 2025

The agent chat model cannot override the chat model set by the server admin for the intermediate steps.

But this commit enables using the agent's chat model instead of the default user chat model (set via the /settings page) for the intermediate steps (including image prompt generation) when no server chat settings are set.

To have the agent chat model be used for intermediate steps, remove the server chat settings, as the priority order for intermediate steps is server > agent > user chat model.
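
For illustration, the intermediate step selection now behaves like this (a simplified, self-contained sketch of the priority order, not the actual implementation; the parameter names are made up):

from typing import Optional

# Simplified sketch of the intermediate step chat model priority, not the actual Khoj code.
def pick_intermediate_step_model(
    server_setting_model: Optional[str],
    agent_model: Optional[str],
    user_setting_model: Optional[str],
) -> Optional[str]:
    """Prefer server chat settings > agent chat model > user chat model."""
    return server_setting_model or agent_model or user_setting_model

# With no server chat settings, an agent configured with bunny-llama-3-8b-v
# is now also used for intermediate steps like image prompt generation:
print(pick_intermediate_step_model(None, "bunny-llama-3-8b-v", "meta-llama-3.1-8b-instruct"))
# -> bunny-llama-3-8b-v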
