Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(backend): Enable json parsing with typing & conversion #8578

Open
wants to merge 14 commits into
base: dev
Choose a base branch
from

Conversation

majdyz
Copy link
Contributor

@majdyz majdyz commented Nov 7, 2024

Background

We are using loosely typed JSON in several places, sometimes some sort of typing is required, and the resulting has to be either (tuple, list, dict, int, float, etc), and the raw string format is not adhering to that.

Changes 🏗️

This PR introduces typing on json.loads.

Testing 🔍

Note

Only for the new autogpt platform, currently in autogpt_platform/

  • Create from scratch and execute an agent with at least 3 blocks
  • Import an agent from file upload, and confirm it executes correctly
  • Upload agent to marketplace
  • Import an agent from marketplace and confirm it executes correctly
  • Edit an agent from monitor, and confirm it executes correctly

Configuration Changes 📝

Note

Only for the new autogpt platform, currently in autogpt_platform/

If you're making configuration or infrastructure changes, please remember to check you've updated the related infrastructure code in the autogpt_platform/infra folder.

Examples of such changes might include:

  • Changing ports
  • Adding new services that need to communicate with each other
  • Secrets or environment variable changes
  • New or infrastructure changes such as databases

@majdyz majdyz requested a review from ntindle November 7, 2024 03:07
@majdyz majdyz requested a review from a team as a code owner November 7, 2024 03:07
@majdyz majdyz requested review from Pwuts and removed request for a team November 7, 2024 03:07
@github-actions github-actions bot added platform/backend AutoGPT Platform - Back end size/m labels Nov 7, 2024
Copy link

codiumai-pr-agent-pro bot commented Nov 7, 2024

PR Reviewer Guide 🔍

(Review updated until commit c9df13e)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Error Handling
The new error handling in convert() function catches all exceptions generically and wraps them in ConversionError. This could mask important specific errors. Consider catching more specific exceptions.

Default Value Risk
Using empty string as default value for json.loads could mask real errors if stats is None. Consider handling None case explicitly.

Type Safety
The loads() function returns Any when no target_type is provided. This could lead to type safety issues in consuming code. Consider adding runtime type checking or warnings.

@majdyz majdyz requested review from Pwuts and removed request for Pwuts November 7, 2024 04:37
Copy link
Member

@Pwuts Pwuts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a functional or aesthetic change? In all of the instances where this target_type feature is used, I'd expect the type to be predictable because we validate it before storing it. Are there any edge cases this addresses? Can you give an example?

@majdyz
Copy link
Contributor Author

majdyz commented Nov 8, 2024

@Pwuts the trigger was we accidentally storing stats column of AgentGraphExecution as a list instead of dict. This code made sure to not break in such a scenario. Example: https://github.com/Significant-Gravitas/AutoGPT/pull/8578/files#diff-fe699074de9055342cbf5c2aef8eba0ae08ef41429c2f012789c20eb9dec532cR82

Once an invalid format being stored, our code will break and we have no way to handle it.

This also helps the type hinting on the IDE

@majdyz majdyz requested a review from Pwuts November 8, 2024 07:52
@Pwuts
Copy link
Member

Pwuts commented Nov 8, 2024

accidentally storing stats column of AgentGraphExecution as a list instead of dict. This code made sure to not break in such a scenario.

As far as I can see, this code has no recovery mechanism for that situation, does it? It raises an error, which is still breaking.

@majdyz
Copy link
Contributor Author

majdyz commented Nov 8, 2024

Nope it concerts between types, that's the whole point of this PR

@Pwuts
Copy link
Member

Pwuts commented Nov 8, 2024

it concerts between types

What does that mean?

@Pwuts
Copy link
Member

Pwuts commented Nov 8, 2024

My point is: I've looked through the code of convert() and afaics convert([{"some_stat": 123}], dict[str, Any]) will output {0: {"some_stat": 123}}. This doesn't fix the problem of stats being accidentally being stored wrapped in a tuple or list, which was the trigger for making this PR (right?). The resulting object may be a dict, but still doesn't have the right structure.

Trying to coerce a type when retrieving stuff from DB is risky business to me. A dict (that we want to store) accidentally being wrapped in a list is just one of many possible ways to screw up storing that information. I think checking the parsed value against a target type is all we should be doing when it comes to info pulled from the DB.

@majdyz
Copy link
Contributor Author

majdyz commented Nov 8, 2024

You can make the expected structure as deep and detailed as you want and it will shape it, since the caller only expect dict[str, Any] then that's what we get.

The stats value on invalid format doesn't matter, what matter is that we never break the whole process because of it.
To fix the initial problem we only have two options:

  • Make the list to dict, then move on
  • Not changing anything, but still move on

I think checking the parsed value against a target type is all we should be doing when it comes to info pulled from the DB.

Checking the type is an obvious requirement. The action that we need if it doesn't match is the question, so we could either:

  • Raise an error (that we can't do)
  • Shape it to match the structure (and maybe log a warning) and move on.

I chose the second option.

@majdyz majdyz closed this Nov 8, 2024
@majdyz majdyz reopened this Nov 8, 2024
Copy link

Persistent review updated to latest commit c9df13e

@Pwuts
Copy link
Member

Pwuts commented Nov 9, 2024

Shape it to match the structure (and maybe log a warning) and move on.

My point is that this generally won't help. If we are planning on using the value, e.g. stats["duration"], and stats comes in as a list[dict] instead of a dict, then trying to convert it (using an arbitrary mechanism) isn't going to make it usable because they are fundamentally different data structures. It will let the system move on until the value is actually used, and then it will still fail, maybe in an unexpected way. It could even cause further pollution of the system by propagating the malformed data.

So I think it really should just raise an error.

Simple conversions can work fine, but I don't really see a case for converting to/from or between complex types like list and dict. Within graphs we have to be somewhat flexible to increase interoperability of blocks, but I wouldn't want to extend this to our backend code. That honestly seems dangerous.

@majdyz
Copy link
Contributor Author

majdyz commented Nov 10, 2024

So I think it really should just raise an error.

We definitely can't do this. This is the whole point of the PR, avoid raising error on malformatted loosely typed json. Json value we are having are mostly not a the most crucial information (stats, output data, properties). It doesn't make sense to error out loading a graph, executing a graph , or even opening a monitor page just because the db was accidentally corrupted the stats/outputData value.

If a best effort conversion is a no go. I'm open with other alternatives, but it has to be non-terminating, not like the current state.

@majdyz
Copy link
Contributor Author

majdyz commented Nov 11, 2024

As I stated before, using the value is not the main intention. We need the process to be non breaking that's all. And to make it non breaking we need to fallback to empty value on each type or do conversion like what this change is doing.

So I think it really should just raise an error.

We can't. We should not break monitor page or literally any get request because an execution duration stats or one output page missing or malformed.

Copy link

netlify bot commented Nov 11, 2024

Deploy Preview for auto-gpt-docs canceled.

Name Link
🔨 Latest commit 8821b04
🔍 Latest deploy log https://app.netlify.com/sites/auto-gpt-docs/deploys/6736a946777a0400099db72f

aarushik93
aarushik93 previously approved these changes Nov 12, 2024
@majdyz
Copy link
Contributor Author

majdyz commented Nov 12, 2024

I'll hold merging this PR based on @Pwuts input here.

@Pwuts
Copy link
Member

Pwuts commented Nov 12, 2024

If a best effort conversion is a no go. I'm open with other alternatives, but it has to be non-terminating, not like the current state.

Alright. I still don't think we should be trying to convert between types that don't have a trivial way to convert. E.g. float <-> str is doable because there is a lossless way to do that conversion. list <-> dict is non-trivial, and I estimate the chance of it working with the rest of the code to be minimal.

A few other ideas:

  • Raise an error if deserialized value is not of target_type, then try/except where this isn't allowed to cascade into failure of the containing routine
  • Return None if deserialized value is not of target_type and can't be (trivially) converted
  • Add a default: Any = None parameter, return default if deserialized value is not of target_type

@majdyz
Copy link
Contributor Author

majdyz commented Nov 13, 2024

@Pwuts Updated, please re-review

@@ -133,6 +133,8 @@ def _convert(value: Any, target_type: Type):
return {convert(v, args[0]) for v in value}
else:
return value
elif raise_on_mismatch:
raise ConversionError(f"Failed to convert {value} to {target_type}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise ConversionError(f"Failed to convert {value} to {target_type}")
raise TypeError(f"Value {value} is not of expected type {target_type}")

@Pwuts
Copy link
Member

Pwuts commented Nov 14, 2024

Other than the one comment I just posted I'm happy with this solution :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants