
Optimizing directory listing #114

Open
flakey5 opened this issue Apr 14, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@flakey5
Member

flakey5 commented Apr 14, 2024

Issue

Directory listing through R2's S3 API isn't very performant, especially compared to what nginx does on the DO server. For /download/release/, R2 takes ~3 seconds uncached while nginx takes ~1 second uncached. Of course, most requests will be cached, so there shouldn't be any noticeable impact, but it's still not great imo.

Proposal

Cache every path in the bucket in a JSON file that we can use similarly to how we use redirectLinks.json.

The structure of the JSON file would look something like:

interface Directory {
  // Directories within this directory
  directories?: Record<string, Directory>;
  // Files within this directory
  files?: string[];
}

So, a path like nodejs/release/vX.X.X/node.exe would be stored as:

{
  "directories": {
    "nodejs": {
      "directories": {
        "release": {
          "directories": {
            "vX.X.X": {
              "files": ["node.exe"]
            }
          }
        }
      }
    }
  }
}
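Resolving a listing against this structure is just a walk down the nested `directories` records, one path segment at a time. A minimal sketch (the `lookup` helper is hypothetical, not existing code in the worker):

```typescript
interface Directory {
  // Directories within this directory
  directories?: Record<string, Directory>;
  // Files within this directory
  files?: string[];
}

// Walk the tree one path segment at a time; returns undefined on a miss.
function lookup(root: Directory, path: string): Directory | undefined {
  let current: Directory | undefined = root;
  for (const segment of path.split('/').filter(Boolean)) {
    current = current?.directories?.[segment];
    if (current === undefined) return undefined;
  }
  return current;
}

const tree: Directory = {
  directories: {
    nodejs: {
      directories: {
        release: {
          directories: {
            'vX.X.X': { files: ['node.exe'] },
          },
        },
      },
    },
  },
};

// lookup(tree, 'nodejs/release/vX.X.X')?.files → ['node.exe']
```

Each lookup touches only as many object properties as the path has segments, which is why it stays fast regardless of how many total paths are in the tree.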

I made a script that generated ~29,000 absolute paths and converted them to the data structure shown above. I searched for three different paths and timed the results with console.time:

[screenshot: console.time results for the three path lookups]

So, from 3 seconds down to 0.1 seconds for a cold start and ~0.01 seconds when hot.
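The conversion step the script performs can be sketched as a fold over the flat list of object keys, splitting each key on `/` and treating the last segment as the file name (this `buildTree` helper is illustrative, not the actual script):

```typescript
interface Directory {
  directories?: Record<string, Directory>;
  files?: string[];
}

// Fold a flat list of bucket keys into the nested Directory tree.
function buildTree(paths: string[]): Directory {
  const root: Directory = {};
  for (const path of paths) {
    const segments = path.split('/').filter(Boolean);
    const file = segments.pop()!; // last segment is the file name
    let node = root;
    for (const dir of segments) {
      node.directories ??= {};
      node.directories[dir] ??= {};
      node = node.directories[dir];
    }
    (node.files ??= []).push(file);
  }
  return root;
}
```

Building the tree is O(total path segments), so it can run offline (e.g. in CI on each release) and the worker only ships the resulting JSON.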

There is a drawback, however: 29,000 paths isn't the full number of paths in the bucket, and the JSON file is already 1.5 MB. The worker should have a size limit of 10 MB according to the Cloudflare docs, but I don't know how big the final tree will be, and it will only grow with each new release.

One alternative is to do this only for the most popular directories as a fast path: if a path doesn't exist in the tree, we fall back to the S3 listing request as we do currently.
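That fallback shape could look roughly like the following. Here `listViaS3` is a stand-in for the existing S3 listing call, and `lookup` for a tree-walk helper; both names are assumptions for illustration:

```typescript
interface Directory {
  directories?: Record<string, Directory>;
  files?: string[];
}

// Assumed helper: walk the cached tree one path segment at a time.
function lookup(root: Directory, path: string): Directory | undefined {
  let current: Directory | undefined = root;
  for (const segment of path.split('/').filter(Boolean)) {
    current = current?.directories?.[segment];
    if (current === undefined) return undefined;
  }
  return current;
}

// Fast path: consult the cached tree first; on a miss, fall back to the
// S3 listing request (stubbed here as an injected async function).
async function listDirectory(
  tree: Directory,
  path: string,
  listViaS3: (path: string) => Promise<string[]>
): Promise<string[]> {
  const node = lookup(tree, path);
  if (node?.files) return node.files;
  return listViaS3(path); // not in the cached tree: use the current S3 listing
}
```

Keeping only popular directories in the tree bounds the JSON size while still skipping the slow S3 round-trip for the majority of listing traffic.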

@MoLow
Member

MoLow commented Apr 15, 2024

SGTM. I wonder if this isn't just another cache layer, but if the implementation isn't too complex it can be ok.

@flakey5
Member Author

flakey5 commented May 29, 2024

Holding off on this until the provider concept is fully implemented, so I can have a better idea of how this can be implemented nicely.

@flakey5 flakey5 added the enhancement New feature or request label Jun 19, 2024