
Optimizing directory listing #114

Open
flakey5 opened this issue Apr 14, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@flakey5
Member

flakey5 commented Apr 14, 2024

Issue

Directory listing through R2's S3 API isn't very performant, especially compared to what nginx does on the DO server. For /download/release/, R2 takes ~3 seconds uncached while nginx takes ~1 second uncached. Of course, most requests will be cached, so there shouldn't be any noticeable impact, but it's still not great imo.

Proposal

Cache every path in the bucket in a JSON file that we can use similarly to how we use redirectLinks.json.

The structure of the JSON file would look something like:

interface Directory {
  // Directories within this directory
  directories?: Record<string, Directory>;
  // Files within this directory
  files?: string[];
}

So, a path like nodejs/release/vX.X.X/node.exe would be stored as:

{
  "directories": {
    "nodejs": {
      "directories": {
        "release": {
          "directories": {
            "vX.X.X": {
              "files": ["node.exe"]
            }
          }
        }
      }
    }
  }
}
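Resolving a listing against this structure is just a walk down the nested `directories` records, one path segment at a time. A minimal sketch (the `lookup` helper is hypothetical, not existing code in the worker):

```typescript
interface Directory {
  // Directories within this directory
  directories?: Record<string, Directory>;
  // Files within this directory
  files?: string[];
}

// Walk the tree one path segment at a time; returns undefined on a miss.
function lookup(root: Directory, path: string): Directory | undefined {
  let current: Directory | undefined = root;
  for (const segment of path.split('/').filter(Boolean)) {
    current = current?.directories?.[segment];
    if (current === undefined) return undefined;
  }
  return current;
}

const tree: Directory = {
  directories: {
    nodejs: {
      directories: {
        release: {
          directories: {
            'vX.X.X': { files: ['node.exe'] },
          },
        },
      },
    },
  },
};

// lookup(tree, 'nodejs/release/vX.X.X')?.files → ['node.exe']
```

Each lookup touches only as many object properties as the path has segments, which is why it stays fast regardless of how many total paths are in the tree.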

I made a script that generated ~29,000 absolute paths and converted them to the data structure shown above. I searched for three different paths and timed the results with console.time:

[screenshot: console.time results for the three path lookups]

So, from 3 seconds down to 0.1 seconds for a cold start and ~0.01 seconds when hot.
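The conversion step the script performs can be sketched as a fold over the flat list of object keys, splitting each key on `/` and treating the last segment as the file name (this `buildTree` helper is illustrative, not the actual script):

```typescript
interface Directory {
  directories?: Record<string, Directory>;
  files?: string[];
}

// Fold a flat list of bucket keys into the nested Directory tree.
function buildTree(paths: string[]): Directory {
  const root: Directory = {};
  for (const path of paths) {
    const segments = path.split('/').filter(Boolean);
    const file = segments.pop()!; // last segment is the file name
    let node = root;
    for (const dir of segments) {
      node.directories ??= {};
      node.directories[dir] ??= {};
      node = node.directories[dir];
    }
    (node.files ??= []).push(file);
  }
  return root;
}
```

Building the tree is O(total path segments), so it can run offline (e.g. in CI on each release) and the worker only ships the resulting JSON.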

There is a drawback, however: 29,000 paths isn't the full number of paths in the bucket, and the JSON file is already 1.5 MB. The worker should have a size limit of 10 MB according to the Cloudflare docs, but I don't know how big the final tree will be, and it will only grow with each new release.

One alternative is to do this only for the most popular directories as a fast path: if a path doesn't exist in the tree, we fall back to the S3 listing request as we do currently.
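That fallback shape could look roughly like the following. Here `listViaS3` is a stand-in for the existing S3 listing call, and `lookup` for a tree-walk helper; both names are assumptions for illustration:

```typescript
interface Directory {
  directories?: Record<string, Directory>;
  files?: string[];
}

// Assumed helper: walk the cached tree one path segment at a time.
function lookup(root: Directory, path: string): Directory | undefined {
  let current: Directory | undefined = root;
  for (const segment of path.split('/').filter(Boolean)) {
    current = current?.directories?.[segment];
    if (current === undefined) return undefined;
  }
  return current;
}

// Fast path: consult the cached tree first; on a miss, fall back to the
// S3 listing request (stubbed here as an injected async function).
async function listDirectory(
  tree: Directory,
  path: string,
  listViaS3: (path: string) => Promise<string[]>
): Promise<string[]> {
  const node = lookup(tree, path);
  if (node?.files) return node.files;
  return listViaS3(path); // not in the cached tree: use the current S3 listing
}
```

Keeping only popular directories in the tree bounds the JSON size while still skipping the slow S3 round-trip for the majority of listing traffic.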

@MoLow
Member

MoLow commented Apr 15, 2024

SGTM. I wonder if this isn't just another cache layer, but if the implementation isn't too complex it can be ok.

@flakey5
Member Author

flakey5 commented May 29, 2024

Holding off on this until the provider concept is fully implemented, so I can have a better idea of how this can be implemented nicely.

@flakey5 flakey5 added the enhancement New feature or request label Jun 19, 2024