-
-
Notifications
You must be signed in to change notification settings - Fork 9.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How should we store installed cask information? #17013
Comments
@Homebrew/cask It'd be great to hear your opinion on this topic. |
Thanks for opening this @apainintheneck! ❤️
This feels like we lose a lot of benefits of the API and make offline installation impossible.
This feels like the best option to me and worth tackling in JSON v3.
If this format is a e.g. superset/subset of JSON v3 this seems desirable. Otherwise, it feels like it will have all of the disadvantages of one or both of the above. |
Great discussion. I think there a benefits to storing it as a JSON file because it allows us to store additional information, such as the tap that the cask was installed from. There's inconsistency between what information is available between the ruby source, and the JSON. Apart from the |
The only downsides I know of are the extra network request and the fact that the tap info is not included in the Ruby cask file.
@Rylan12 worked a bit on this over a year ago before we decided to to download the source files for casks with flight or language blocks. The approach at the time was to store the flight blocks as strings in the API JSON and then evaluate that code when loading the cask.
We never really attempted to load the language tabs from the API. I'll have to look into that a bit more. I think the |
Good point! We currently store the tap when installing from the API since the JSON cask already includes that information. We were also discussing whether or not it should be included when installing with internal JSON v3. Are there any other bits of information that would be good to store during the installation process? |
Yeh. I don't love it but: don't really see any alternatives, sadly.
Thanks ❤️
Note: I think the JSON should be structured such that we're able to use the same schema (or some sort of schema inheritance/inclusion if that's possible?) validation eventually for the public API schema, internal API schema and JSON that's written to disk. |
So the method_source gem only stores the code for the block/proc itself but not any surrounding variables that it closes over. This should be fine for internal use with the main cask repo but could cause problems for third-party taps. We could try some workarounds where we capture all local variables when defining the proc with a custom proc class but that would up the difficulty dramatically. There have been some gems that have tried to solve this sort of problem in the past but it's not obvious to me that any of them were ever production-ready and all of them have been abandoned for about a decade. We could potentially use them for inspiration but I don't think it's worth it. |
Good investigation, thanks @apainintheneck!
We should optimise for our own use cases here and we can fix up issues with 3rd party taps and e.g. provide a longer deprecation cycle. |
I don't think this problem will just go away if we tell people not to do it and have a longer deprecation cycle. Inevitably it will be misused because valid Ruby syntax just cannot be described in JSON assuming we don't want to get rid of the Ruby DSL entirely. It also means that subtle errors could be introduced that wouldn't be clear until trying to load an installed cask from JSON. It seems like a mistake to choose a flawed option which is trying to serialize and deserialize arbitrary blocks of Ruby code as strings in JSON when the option of just evaluating Ruby code like we have been for the life of the Homebrew project is on the table. Doing this research has reminded me why I was against the Ruby strings in JSON approach from the beginning. |
This is something we can audit for, adjust the way we evaluate the DSL (i.e. what scope we use and what variables are captured), warn/deprecate/error for at installation time, etc.
The logical conclusion of this argument is we shouldn't have used the JSON API for casks at all. Here's the requirements we have here:
and, as it relates to the API:
Hopefully that helps! |
I didn't know that it was possible to reduce the scope of closures in Ruby. Can you point me to some references on that because I've been unable to find anything with a quick search?
There were problems initially for this very reason but they got ironed out with time. What I'm saying is that we can't generalize what we've made work reasonably well with the core cask repos to third-party repos.
I disagree. It's not obvious to me that this requirement will result in a more maintainable codebase or a better experience for users.
Good point, I hadn't thought of that but it seems ideal and achievable.
This seems ideal but there might be a few fields in the internal JSON that are really no useful or relevant for the publish JSON. I agree in general though.
Could you point me to some threads or information about this problem? It's not obvious to me how pervasive this is or when it was most acute.
👍
50/50 on this one.
I think the question is how painful would it be to have a JSON schema for all casks and will it be worth that effort. |
Ruby within JSON sucks and there's a reason we have already tried that before and declared it not a good idea. With formulae, there's already a two-level system:
Formulae store the Ruby file always in the install directory (though not usually necessary to use given there's no With casks, it's actually quite similar except it's two downloads rather than one since we don't repackage:
The reason casks don't always store the Ruby file because we don't always fetch the Ruby file. However, in the end we will always support installing from third-party taps and they use Ruby files, so we will always support both JSON and Ruby install scenarios, unless the plan was to add extra code to generate JSON on-the-fly during install? Ruby files are downloaded when pre/postflights are used, so we should already know when a Ruby file is needed and when it isn't for uninstall. Remember that reading the installed-cask information is already quite simple: it just loads it through the CaskLoader. If we're introducing yet another format here or are starting to transform formats from Ruby->JSON or vice versa that is not a simplification, that's extra code and we need to be aware of that and its extra maintenance. Critically: installation information should not be broken ever, and that includes whatever is already there right now. So if that install information switches from JSON API v2 to v3, then that means supporting both v2 and v3 indefinitely unless we do some similarly indefinite migration behaviour. So while I agree that we should do better here, please don't expect it to simplify anything as a lot of the existing code will probably stay. Ultimately, when it comes down to it: That would be the ideal option here as Ruby can be volatile and more at risk of breaking in the future. |
@apainintheneck Blocks can be evaluated with
We don't need to. These repos always install from Cask Ruby files in taps rather than a JSON API. The way these repos do things can be audited, deprecated, disabled and removed.
I am the technical lead on this project (https://docs.brew.sh/Homebrew-Governance#6-project-leader) and I am stating that these are a hard requirements. I have explained my reasoning here enough times. Let's continue discussion on other points, please.
Please use GitHub's search to find these or just trust me.
You do not have to implement this schema. Others have expressed a desire to do so. I am stating that others want to do this so it's important to not prevent it. |
@Bo98 Yes, but this does not mean that we should vary the file that is written to the Caskroom depending on the installation method here (which is my main argument). Having an
Firstly, "not be broken ever" is not a realistic API contract to have as an open source project. Secondly, we have broken this many times before on purpose and by accident and we will likely break this again in future.
This doesn't make a lot of sense to me because: we're not proposing a switch from JSON API v2 to v3, I'm proposing a switch from "sometimes Ruby, sometimes JSON (which is not the same as JSON v3) to "the same data is consistently written regardless of the way the cask was installed" and "we want this to somewhat follow the new v3 API design" and "we should have a longer-than-usual deprecation/migration window for handling this". I am game for e.g. Ruby files to be written only when needed (and, note, they will break in future when we make deprecations) but a JSON file is written every single time (like the
This is a really great idea and the direction I would like things to go in 👍🏻 |
We've never (to my knowledge) intentionally broken (To be clear: I'm not saying the current formats are great for that - but the next format really should be)
Cool - we can likely eliminate large parts of this problem if we can do that. |
I'll add another vote in favor of flight DSLs. I also think that any additional cask metadata that we want to include at install time could probably be added separately from the discussion of cask file formats and it would likely be pretty uncontroversial. I don't see any reason why for instance tap couldn't get added to the |
Yes there doesn't need to be any overlap at all between the formats and I suggest making them separate, like how the very-stable formula tab format is. We just need to store any interesting metadata + the uninstall instructions - we don't need to store preinstall data or download URLs etc. Flight Ruby code is OK at install-time but is not really OK for an uninstall stage executed potentially many months/years in the future. |
I think it's worth clarifying what "intentionally broken" means here. It means: after a long period of deprecation/warnings/perhaps migration: we will not use the uninstall hooks from the Cask JSON/Ruby file in the Caskroom but instead use those from the Cask file, if available. The absolute worst case scenario here is you have software you installed years ago, haven't upgraded it at all since, ignored all the warnings and then when you
I agree very strongly here with all of this. Running arbitrary Ruby with context to a Caskfile/Homebrew internals is terrible for long-term backwards compatibility. My desire to use JSON here was specifically to limit how much dynamism is possible here so that we can maintain support for (the new format) pretty much indefinitely and make any future migration to another format dramatically easier.
I think some overlap would be nice if possible but I also think the That's exactly the sort of model I'm hoping for with the same for Casks. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
Across Homebrew/brew and Homebrew/homebrew-cask, issues related to how we store cask information is coming up on a weekly basis. It is very clearly a big pain point in casks. In formulae we have tabs that we've not broken compatibility for in years (if ever). For casks, we use Ruby files which we break regularly on both Homebrew minor versions and Ruby updates. I do feel there's room to be more like formulae here. My suggestion is:
I think this is better if split into separate issues but reopening this one for now until then. |
Agreed.
If we're going to keep around a Ruby file: we may as well just use the cask file for this. I really don't want to introduce yet another Ruby-on-disk file format.
Agreed. This provides even more support for "let's not have another Ruby file here".
We should just skip flights when they error and allow uninstallation regardless of the rest here. We should deprecate/disable/remove this behaviour at the normal rate. That gives the best part of a year to migrate which, for a rolling release package manager, is pretty decent really. |
Yes, that's what I meant. Exactly like how formulae work, though formulae are only read at install-time and not uninstall-time hence the ambition to also try eliminate the usage of |
That approach makes sense to me too. Ideally we'd have some way to migrate existing installed files to the new format or gracefully handle legacy options somehow as well. |
Cool, we're agreed then 👍🏻
This is a "nice to have" when the rest is done. Let's not blow up the scope too much here. |
Verification
brew install wget
. If they do, open an issue at https://github.com/Homebrew/homebrew-core/issues/new/choose instead.Provide a detailed description of the proposed feature
When you install a package from Homebrew, we store the package definition along with the installed package. This allows us to always be able to load the package again in the future even if it gets removed from a tap. It also means that any uninstall information is more accurate since files can get moved around between versions.
A year and a half ago when we went all in on the API with Homebrew v4.0, we moved core casks to use the API. When you install a cask without the API, it will store the Ruby cask file along with the cask itself as metadata. When you install cask with the API, it will store the JSON cask file which represents the Ruby cask file 99% of the time.
If a flight or language block is present, we download the original Ruby cask file during the installation process and use that instead.
HOMEBREW_INSTALL_FROM_API
#14304languages
when installing from the API #14456That means that there are two possible cask files that could be stored along with an installed cask. This means that we there is more cask loading code that we have to support along with exceptions like those for casks with certain blocks defined. We should try to simplify that if possible.
Current Ideas
This would make things much simpler on our end but would require an extra network request to fetch the Ruby cask file every time a new cask gets installed. We already do this for casks with flight or language blocks.
This would require figuring out how to serialize the fight and language blocks in the cask DSL. That's why we currently download the Ruby cask file before installing casks with those blocks. They cannot be expressed in the current iteration of the cask API JSON.
This might actually end up being more code to support but would be more consistent because we'd be able to specify the minimum amount of information necessary to work with an installed cask.
What is the motivation for the feature?
Having one file format will mean that the codebase is more consistent. It will also make it easier for us to maintain compatibility with older casks going forward.
Me and @MikeMcQuaid have been going back and forth on this for a week and I think it'd be helpful to get more voices in this discussion. There might be some simple solution that we've missed. We'd like to get this sorted out before releasing the internal JSON API v3.
How will the feature be relevant to at least 90% of Homebrew users?
It's relevant to anyone who installs casks as it could affect old cask file loading reliability.
What alternatives to the feature have been considered?
Keeping things the way they are where we support both the JSON and Ruby cask file formats.
The text was updated successfully, but these errors were encountered: