Painting to Imagine #1096

Open
nesretep-anp1 opened this issue Jan 24, 2025 · 5 comments

nesretep-anp1 commented Jan 24, 2025

Discussed in #1095

Originally posted by nesretep-anp1 January 22, 2025
I'm having some trouble with "Painting to Imagine" in the "Image Generation" process.

...


It seems that there are several issues with how the models to use are selected.

/image Paint a picture of a fight of General Grevious against Darth Vader
  • Pay attention to the fact that "General Grievous" is misspelled!
  • An agent configured with bunny-llama-3-8b-v is used

This results in the effect explained in the discussion mentioned above. The logs show ...

Fallback to default chat model tokenizer: gpt-4o.
Configure tokenizer for model: meta-llama-3.1-8b-instruct in Khoj settings to improve context stuffing.

OK, now the effect with "violent acts" is explained; it comes from gpt-4o. But why is gpt-4o used at all?

In the function truncate_messages, the model gpt-4o is statically set as the default tokenizer.

The funny thing is that this static assignment only comes into effect if all other cases fail.

So I set a tokenizer for the model in use. Nothing changed; it was still gpt-4o.

Why is a configured tokenizer not taken into account? (I have not yet gotten to that point in my investigation.)

I added default_tokenizer = model_name directly below the except.
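
To make that concrete, here is a rough, paraphrased sketch of how I read the fallback structure, with my local change marked. Only truncate_messages and default_tokenizer are names from the actual code; everything else is my own shorthand for illustration.

# Paraphrased sketch of the tokenizer fallback in truncate_messages, NOT the actual Khoj source.
import logging
from typing import Optional

import tiktoken
from transformers import AutoTokenizer

logger = logging.getLogger(__name__)

def pick_encoder(model_name: str, tokenizer_name: Optional[str] = None):
    default_tokenizer = "gpt-4o"  # the static default assignment I am talking about
    try:
        if tokenizer_name:
            # tokenizer configured for the chat model in the Khoj admin settings
            return AutoTokenizer.from_pretrained(tokenizer_name)
        # works for OpenAI models like gpt-4o, raises for e.g. meta-llama-3.1-8b-instruct
        return tiktoken.encoding_for_model(model_name)
    except Exception:
        # default_tokenizer = model_name  # <- the line I added directly below the except
        logger.debug(f"Fallback to default chat model tokenizer: {default_tokenizer}.")
        return tiktoken.encoding_for_model(default_tokenizer)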

Now the following showed up in the logs ...

Fallback to default chat model tokenizer: meta-llama-3.1-8b-instruct.
Configure tokenizer for model: meta-llama-3.1-8b-instruct in Khoj settings to improve context stuffing.

(which then certainly fails, but ...)

I then realized (it is already visible in the first log snippet) that it is not the agent's model bunny-llama-3-8b-v that is used, but meta-llama-3.1-8b-instruct!

OK, so why is the agent's model not used?

I went to "Settings" and set the "Chat" model under "Models" to bunny-llama-3-8b-v.

meta-llama-3.1-8b-instruct was still used!

I then changed the server chat settings in the "Admin Panel" to bunny-llama-3-8b-v.

Now bunny-llama-3-8b-v is used.

IMHO the real bug described here is the confusing logic that decides which model is used in which situation and configuration.

Specifically, ...

  • The agent has a model; why use a different one?
  • The user has settings; why use the "server settings"?
  • The agent and the user both have settings; why offer "server settings" at all? (I initially set up Khoj without a "server settings" record, but later had to create one to get the web scraper running.)

Additionally, ...

  • truncate_messages should not fall back to a statically set tokenizer; the tokenizer configured for the model should be applied here
debanjum (Member) commented Jan 24, 2025

Why is a configured tokenizer not taken into account? (I have not yet gotten to that point in my investigation.)

truncate_messages should not fall back to a statically set tokenizer; the tokenizer configured for the model should be applied here

The tokenizer is only used for token counting, which in turn is used for message truncation so that the conversation fits the model's (configured) max prompt size. It is only something to worry about if you're hitting max context limits or want to reduce memory usage. I intend to remove that debug line, as it has confused other folks as well.
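
To illustrate what the truncation does conceptually: it just drops the oldest messages until the conversation fits the configured max prompt size, roughly like this (a self-contained sketch, not the actual Khoj implementation):

# Illustrative sketch of token-count based message truncation, not the actual Khoj code.
def truncate_to_max_prompt_size(messages: list[str], encoder, max_prompt_size: int) -> list[str]:
    """Drop the oldest messages until the total token count fits within max_prompt_size."""
    def total_tokens(msgs: list[str]) -> int:
        # encoder can be a tiktoken encoding or a Hugging Face tokenizer; both expose encode()
        return sum(len(encoder.encode(message)) for message in msgs)

    truncated = list(messages)
    # always keep at least the latest message, otherwise drop from the oldest end
    while len(truncated) > 1 and total_tokens(truncated) > max_prompt_size:
        truncated.pop(0)
    return truncated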

Fallback to default chat model tokenizer: gpt-4o.
Configure tokenizer for model: meta-llama-3.1-8b-instruct in Khoj settings to improve context stuffing.

I see how this message can be confusing, but the tokenizer name is not the same as the chat model name. Khoj can identify the tokenizer for offline models run directly within Khoj, but it cannot for chat models accessed via an API.

For such models, you can set the tokenizer field in the chat model settings on the admin panel to the Hugging Face repo corresponding to the tokenizer for your chat model. For llama models the tokenizer can be set to hf-internal-testing/llama-tokenizer from Hugging Face.
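
If you want to sanity check that tokenizer outside of Khoj, plain transformers usage is enough, e.g.:

from transformers import AutoTokenizer

# hf-internal-testing/llama-tokenizer is a public repo that works for llama models
tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")
prompt = "Paint a picture of a fight of General Grevious against Darth Vader"
print(len(tokenizer.encode(prompt)))  # number of tokens this prompt uses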

Again, it is only something to worry about if you're hitting max context limits or want to reduce memory usage. Otherwise this isn't really relevant to your issue with chat model selection and image generation, IMO.

IMHO the real bug described here is the confusing logic that decides which model is used in which situation and configuration.

Specifically, ...

  • The agent has a model; why use a different one?
  • The user has settings; why use the "server settings"?
  • The agent and the user both have settings; why offer "server settings" at all? (I initially set up Khoj without a "server settings" record, but later had to create one to get the web scraper running.)

Server chat settings are prioritized over user chat settings to allow server admins more control over the model used for intermediary and background tasks. It is (meant to be) an optional feature. The priority order for default chat model selection is:
server chat settings > user chat settings > first chat model added. See

async def aget_default_chat_model(user: KhojUser = None):
    """Get default conversation config. Prefer chat model by server admin > user > first created chat model"""

Note: The agent chat model is currently only used for the final response generation step, not for the intermediate steps.

Questions:

  • What chat model prioritization order do you expect (e.g. agent > user)?
  • We should definitely document the chat model prioritization currently used, if that isn't already done.
  • What behavior were you seeing with the web scrapers that made you set the server chat settings in Khoj? Note: the chat model is not a required field (I believe) when setting up the web scraper prioritization via the server chat settings.

nesretep-anp1 (Author) commented Jan 24, 2025

The tokenizer is only used for token counting, which in turn is used for message truncation so that the conversation fits the model's (configured) max prompt size. It is only something to worry about if you're hitting max context limits or want to reduce memory usage. I intend to remove that debug line, as it has confused other folks as well.

You should not; that line was the reason I came across the function in the first place. IMHO the logging output should be much, much more verbose, depending on the log level.

On the one hand, I again think that there should not be a static assignment of a model.

On the other hand, ...

I see how this message can be confusing, but the tokenizer name is not the same as the chat model name. Khoj can identify the tokenizer for offline models run directly within Khoj, but it cannot for chat models accessed via an API.

For such models, you can set the tokenizer field in the chat model settings on the admin panel to the Hugging Face repo corresponding to the tokenizer for your chat model. For llama models the tokenizer can be set to hf-internal-testing/llama-tokenizer.

As said in my original post, I did set the tokenizer field of the model (both the agent's model and the one used for prompt enhancement), but the one from default_tokenizer was still used.

That is a nice hint and should at least be part of an example in the docs for on-prem setups. But, ...

Again, it is only something to worry about if you're hitting max context limits or want to reduce memory usage. Otherwise this isn't really relevant to your issue with chat model selection and image generation, IMO.

..., you saw my original prompt: it should not come anywhere near the context limits, right?

(And, before you ask: for these tests I set up a fresh instance without any notes, docs, ... in it.)

As said, ...

  • The default_tokenizer should not be set statically.

  • Why is truncate_messages always called?

nesretep-anp1 (Author) commented

  • What chat model prioritization order do you expect (e.g. agent > user)?
  • We should definitely document the chat model prioritization currently used, if that isn't already done.

Well, ... IMHO, and especially because there are both centrally/admin-maintained and user-maintained agents, everything needed should be maintained/stored/set within the agent specification.

No user setting, no server setting, ...

In fact, everything starts with a conversation with a (user-selected) agent.

This agent has its personality (system prompt(s)), model, parameters, ...

And this agent definition should be used for everything.

Perhaps use an implicit default model (e.g. when an agent does not have an image generation model explicitly set), but if it is set within the agent, then no other parameter, model, ... should be used anywhere.

I was really confused when I realized that, while the agent has model_1 configured, image generation with THIS agent uses model_2 for prompt enhancement.

nesretep-anp1 (Author) commented

  • What behavior were you seeing with the web scrapers that made you set the server chat settings in Khoj? Note: the chat model is not a required field (I believe) when setting up the web scraper prioritization via the server chat settings.

Forcing Khoj to use the direct web scraper. ;)

debanjum (Member) commented Feb 3, 2025

The agent chat model cannot override the chat model set by the server admin for the intermediate steps.

But this commit enables using the agent's chat model instead of the default user chat model (set via the /settings page) for the intermediate steps (including image prompt generation) when no server chat settings are set.

To have the agent chat model be used for intermediate steps, remove the server chat settings, as the priority order for intermediate steps is server > agent > user chat model.
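
For illustration, the intermediate step selection now behaves like this (a simplified, self-contained sketch of the priority order, not the actual implementation; the parameter names are made up):

from typing import Optional

# Simplified sketch of the intermediate step chat model priority, not the actual Khoj code.
def pick_intermediate_step_model(
    server_setting_model: Optional[str],
    agent_model: Optional[str],
    user_setting_model: Optional[str],
) -> Optional[str]:
    """Prefer server chat settings > agent chat model > user chat model."""
    return server_setting_model or agent_model or user_setting_model

# With no server chat settings, an agent configured with bunny-llama-3-8b-v
# is now also used for intermediate steps like image prompt generation:
print(pick_intermediate_step_model(None, "bunny-llama-3-8b-v", "meta-llama-3.1-8b-instruct"))
# -> bunny-llama-3-8b-v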
