ci: self-hosted runners for benchmarks #29955

HowardBraham · 2025-01-29T02:03:23Z

Description

Major changes

benchmarks.yml - Does some important and new-to-us things like:
- Self-hosted runner runs-on: gha-mm-scale-set-ubuntu-22.04-amd64-med
- Running in the same kind of Docker container we use in CircleCI container: image: cimg/node:22.13-browsers
- Unlike on CircleCI, does not work without installing Xvfb
codespaces.yml
- I originally added this workflow, and it was important to caching in the past, but with the new workflows in GHA, it's no longer needed, so I deleted it

Prerequisites to merging this PR

Merge this benchmarks branch of github-tools: https://github.com/MetaMask/github-tools/blob/benchmarks/.github/actions/setup-environment/action.yml
Change setup-environment@benchmarks back to setup-environment@main
Vastly increase the number of allocated self-hosted runners

This is just Part 1 of a larger 4-part task to make the startup time quality gate, but I think it's the hardest part

Run on self-hosted GitHub Actions runners so that hopefully the numbers will be more stable (this PR)
Get this to measure the same measurements in the same way as we do for Sentry
Make metamaskbot post with the results of this instead of the current CircleCI source
Set up the quality gate

Related issues

Progresses: MetaMask/MetaMask-planning#3679

github-actions · 2025-01-29T02:03:35Z

CLA Signature Action: All authors have signed the CLA. You may need to manually re-run the blocking PR check if it doesn't pass in a few minutes.

metamaskbot · 2025-01-29T02:38:57Z

Builds ready [9c55124]

builds: chrome, firefox
builds (flask): chrome, firefox
builds (MMI): chrome
builds (test): chrome, firefox
builds (test-flask): chrome, firefox
build viz: Build System
mv3: Bundle Size Stats
mv2: E2e Actions Stats
storybook: Storybook
typescript migration: Dashboard
all artifacts
bundle viz:
- background: 0, 1, 2, 3, 4, 5, 6, 7
- common: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
- ui: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
- content-script: 0
- offscreen: 0

Page Load Metrics (1730 ± 64 ms)

Platform	Page	Metric	Min (ms)	Max (ms)	Average (ms)	StandardDeviation (ms)	MarginOfError (ms)
Chrome	Home	firstPaint	225	2038	1648	351	169
		domContentLoaded	1508	2026	1699	128	61
		load	1568	2041	1730	133	64
		domInteractive	23	97	42	25	12
		backgroundConnect	10	68	34	20	10
		firstReactRender	15	97	50	28	13
		getState	4	57	13	14	7
		initialActions	0	0	0	0	0
		loadScripts	1052	1569	1246	123	59
		setupStore	8	91	19	23	11
		uiStartup	1772	2595	2011	234	112

Bundle size diffs

background: 0 Bytes (0.00%)
ui: 0 Bytes (0.00%)
common: 0 Bytes (0.00%)

metamaskbot · 2025-01-29T03:35:56Z

Builds ready [9c55124]

builds: chrome, firefox
builds (flask): chrome, firefox
builds (MMI): chrome
builds (test): chrome, firefox
builds (test-flask): chrome, firefox
build viz: Build System
mv3: Bundle Size Stats
mv2: E2e Actions Stats
storybook: Storybook
typescript migration: Dashboard
all artifacts
bundle viz:
- background: 0, 1, 2, 3, 4, 5, 6, 7
- common: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
- ui: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
- content-script: 0
- offscreen: 0

Page Load Metrics (1730 ± 64 ms)

Platform	Page	Metric	Min (ms)	Max (ms)	Average (ms)	StandardDeviation (ms)	MarginOfError (ms)
Chrome	Home	firstPaint	225	2038	1648	351	169
		domContentLoaded	1508	2026	1699	128	61
		load	1568	2041	1730	133	64
		domInteractive	23	97	42	25	12
		backgroundConnect	10	68	34	20	10
		firstReactRender	15	97	50	28	13
		getState	4	57	13	14	7
		initialActions	0	0	0	0	0
		loadScripts	1052	1569	1246	123	59
		setupStore	8	91	19	23	11
		uiStartup	1772	2595	2011	234	112

Bundle size diffs

background: 0 Bytes (0.00%)
ui: 0 Bytes (0.00%)
common: 0 Bytes (0.00%)

HowardBraham · 2025-01-29T04:10:16Z

.github/workflows/benchmarks.yml

+        with:
+          name: ${{ inputs.name }}
+          path: test-artifacts/chrome/benchmark/
+          retention-days: 5


The number of retention-days was chosen very arbitrarily, and maybe someone has an opinion

We pay a very tiny amount per Gigabyte-hour to use artifact storage. I anticipate 5 days will be insignificant even over a large number of artifacts. However, if you have a sense for the size of the artifact + how often it will be generated, we can spot check the cost for each number of retention days

These particular benchmark artifacts are very very small. It's just some JSON data that gets zipped. On one run, it's 823 bytes and 196 bytes. But we also may want to look back at these numbers much later. GitHub probably isn't the best platform for that though, something like Sentry or Segment is probably much better.

HowardBraham · 2025-01-29T04:10:37Z

.github/workflows/update-attributions.yml

@@ -182,4 +182,4 @@ jobs:
        env:
          GITHUB_TOKEN: ${{ secrets.LAVAMOAT_UPDATE_TOKEN }}
          PR_NUMBER: ${{ github.event.issue.number }}
-          ACTION_RUN_URL: "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
+          ACTION_RUN_URL: '${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}'


This is just Prettier running on a file

metamaskbot · 2025-01-29T04:57:31Z

Builds ready [e5a4333]

builds: chrome, firefox
builds (flask): chrome, firefox
builds (MMI): chrome
builds (test): chrome, firefox
builds (test-flask): chrome, firefox
build viz: Build System
mv3: Bundle Size Stats
mv2: E2e Actions Stats
storybook: Storybook
typescript migration: Dashboard
all artifacts
bundle viz:
- background: 0, 1, 2, 3, 4, 5, 6, 7
- common: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
- ui: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
- content-script: 0
- offscreen: 0

Page Load Metrics (2151 ± 76 ms)

Platform	Page	Metric	Min (ms)	Max (ms)	Average (ms)	StandardDeviation (ms)	MarginOfError (ms)
Chrome	Home	firstPaint	227	2507	1986	600	288
		domContentLoaded	1760	2409	2105	165	79
		load	1842	2476	2151	158	76
		domInteractive	30	150	58	32	16
		backgroundConnect	15	94	51	28	13
		firstReactRender	18	90	47	22	10
		getState	9	90	35	26	12
		initialActions	0	1	0	0	0
		loadScripts	1289	1877	1578	152	73
		setupStore	10	66	22	18	9
		uiStartup	2110	3372	2513	279	134

Bundle size diffs

background: 0 Bytes (0.00%)
ui: 0 Bytes (0.00%)
common: 0 Bytes (0.00%)

metamaskbot · 2025-01-29T05:39:18Z

Builds ready [a6f5acb]

builds: chrome, firefox
builds (flask): chrome, firefox
builds (MMI): chrome
builds (test): chrome, firefox
builds (test-flask): chrome, firefox
build viz: Build System
mv3: Bundle Size Stats
mv2: E2e Actions Stats
storybook: Storybook
typescript migration: Dashboard
all artifacts
bundle viz:
- background: 0, 1, 2, 3, 4, 5, 6, 7
- common: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
- ui: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
- content-script: 0
- offscreen: 0

Page Load Metrics (1708 ± 72 ms)

Platform	Page	Metric	Min (ms)	Max (ms)	Average (ms)	StandardDeviation (ms)	MarginOfError (ms)
Chrome	Home	firstPaint	1554	2173	1715	143	69
		domContentLoaded	1509	2097	1676	133	64
		load	1514	2175	1708	151	72
		domInteractive	25	94	39	19	9
		backgroundConnect	10	226	40	47	23
		firstReactRender	16	95	47	26	12
		getState	5	89	20	25	12
		initialActions	0	1	0	0	0
		loadScripts	1075	1614	1220	116	56
		setupStore	7	61	14	13	6
		uiStartup	1722	2398	1979	202	97

Bundle size diffs

background: 0 Bytes (0.00%)
ui: 0 Bytes (0.00%)
common: 0 Bytes (0.00%)

Gudahtt · 2025-01-29T14:58:42Z

Could you split this into multiple PRs? A lot of these changes seem to be distinct from each other

metamaskbot · 2025-01-29T22:24:03Z

Builds ready [6266945]

builds: chrome, firefox
builds (flask): chrome, firefox
builds (MMI): chrome
builds (test): chrome, firefox
builds (test-flask): chrome, firefox
build viz: Build System
mv3: Bundle Size Stats
mv2: E2e Actions Stats
storybook: Storybook
typescript migration: Dashboard
all artifacts
bundle viz:
- background: 0, 1, 2, 3, 4, 5, 6, 7
- common: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
- ui: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
- content-script: 0
- offscreen: 0

Page Load Metrics (1958 ± 284 ms)

Platform	Page	Metric	Min (ms)	Max (ms)	Average (ms)	StandardDeviation (ms)	MarginOfError (ms)
Chrome	Home	firstPaint	289	3234	1555	702	337
		domContentLoaded	1386	3608	1945	585	281
		load	1394	3645	1958	592	284
		domInteractive	22	106	45	23	11
		backgroundConnect	8	68	19	15	7
		firstReactRender	15	84	49	23	11
		getState	4	54	15	14	7
		initialActions	0	1	0	0	0
		loadScripts	1008	2763	1440	483	232
		setupStore	7	61	24	20	9
		uiStartup	1532	4169	2280	685	329

Bundle size diffs

background: 0 Bytes (0.00%)
ui: 0 Bytes (0.00%)
common: 0 Bytes (0.00%)

itsyoboieltr · 2025-01-30T19:50:07Z

The migration to GitHub Actions is a necessary step, but before committing to self-hosted runners, we should first validate the stability of measurements across different environments.

We need to compare benchmark results between CircleCI, GitHub-hosted (ubuntu-latest), and self-hosted runners to determine if self-hosted runners indeed reduce fluctuations. I am not convinced that they actually would.
We need to ensure that the same type of build is measured across all environments to make results comparable. This PR is testing the webpack build, but CircleCI is testing browserify.
We should definitely avoid premature infrastructure changes (e.g., provisioning self-hosted runners) without confirming that they improve measurement stability. We need reproducible experiments, and most importantly, NUMBERS!

The suggested approach is to conduct benchmark runs on GitHub-hosted runners first (ubuntu-latest) and compare them against CircleCI and self-hosted runner results. If fluctuations remain high across all environments, this method may not be suitable for a quality gate. However, the migration to GitHub Actions remains beneficial regardless.

Next steps:

Run benchmarks on ubuntu-latest and self-hosted runners, compare results.
If stability issues persist, scrap the idea of using the results for a quality gate while still proceeding with the migration to GitHub Actions.

Would be great to see some comparison data in the PR before merging.

HowardBraham · 2025-01-30T22:12:10Z

As requested by @itsyoboieltr, here are benchmarks running in 4 different configurations:

browserify build on self-hosted runners
browserify build on ubuntu-latest
webpack build on self-hosted runners
webpack build on ubuntu-latest

Probably the most important stat here is uiStartup (btw our old system highlights load). Two additional things I see:

Standard deviation is pretty low across all configurations, which is nice
All configurations are approximately twice as fast as our CircleCI machine

I'm going to run this again to see if it varies over time, and also with some other options.

Platform	Page	Metric	Min (ms)	Max (ms)	Average (ms)	StandardDeviation (ms)	MarginOfError (ms)
Browserify-self-host	Home	firstPaint	158	1195	714	302	145
		domContentLoaded	821	1177	868	75	36
		load	824	1179	873	74	36
		domInteractive	14	27	20	2	1
		backgroundConnect	5	11	7	1	1
		firstReactRender	12	17	12	1	1
		getState	3	5	4	1	0
		initialActions	0	1	0	0	0
		loadScripts	618	809	655	42	20
		setupStore	6	9	7	1	0
		uiStartup	899	1292	971	81	39
Browserify-ubuntu-latest	Home	firstPaint	216	1061	869	299	143
		domContentLoaded	965	1144	1016	40	19
		load	973	1150	1022	40	19
		domInteractive	26	69	36	14	7
		backgroundConnect	8	12	10	1	1
		firstReactRender	18	37	21	4	2
		getState	5	11	8	2	1
		initialActions	0	1	0	0	0
		loadScripts	751	874	791	30	15
		setupStore	9	14	11	1	1
		uiStartup	1090	1371	1169	63	30
Webpack-self-host	Home	firstPaint	154	736	668	119	57
		domContentLoaded	666	718	685	15	7
		load	670	734	692	15	7
		domInteractive	22	44	25	5	2
		backgroundConnect	16	24	19	2	1
		firstReactRender	11	13	12	0	0
		getState	3	6	5	1	0
		initialActions	0	1	0	0	0
		loadScripts	656	710	676	15	7
		setupStore	4	7	5	1	0
		uiStartup	804	882	833	17	8
Webpack-ubuntu-latest	Home	firstPaint	170	896	520	318	153
		domContentLoaded	708	890	829	35	17
		load	714	898	836	36	17
		domInteractive	21	40	34	4	2
		backgroundConnect	22	29	25	2	1
		firstReactRender	13	15	13	1	0
		getState	3	8	6	1	1
		initialActions	0	0	0	0	0
		loadScripts	697	879	818	35	17
		setupStore	5	8	6	1	0
		uiStartup	883	1049	998	33	16

HowardBraham · 2025-01-30T22:59:40Z

Second run of this, numbers are pretty close to the first run

Platform	Page	Metric	Min (ms)	Max (ms)	Average (ms)	StandardDeviation (ms)	MarginOfError (ms)
Browserify-self-host	Home	firstPaint	157	905	776	200	96
		domContentLoaded	802	868	833	16	8
		load	807	875	839	17	8
		domInteractive	14	24	19	2	1
		backgroundConnect	5	8	7	1	0
		firstReactRender	11	13	12	1	0
		getState	3	5	4	1	0
		initialActions	0	1	0	0	0
		loadScripts	602	669	630	17	8
		setupStore	5	7	6	1	0
		uiStartup	893	986	938	23	11
Browserify-ubuntu-latest	Home	firstPaint	209	1107	863	322	155
		domContentLoaded	960	1082	1008	34	16
		load	963	1091	1014	35	17
		domInteractive	17	64	36	15	7
		backgroundConnect	7	16	10	2	1
		firstReactRender	18	37	22	4	2
		getState	5	12	7	2	1
		initialActions	0	1	0	0	0
		loadScripts	754	856	788	29	14
		setupStore	9	23	11	3	1
		uiStartup	1107	1265	1152	40	19
Webpack-self-host	Home	firstPaint	172	752	677	117	56
		domContentLoaded	655	745	695	21	10
		load	669	754	705	20	9
		domInteractive	22	70	28	10	5
		backgroundConnect	17	24	19	2	1
		firstReactRender	11	13	12	1	0
		getState	3	6	4	1	0
		initialActions	0	1	0	0	0
		loadScripts	647	737	687	21	10
		setupStore	4	7	5	1	1
		uiStartup	822	897	847	20	10
Webpack-ubuntu-latest	Home	firstPaint	169	878	458	307	148
		domContentLoaded	805	895	830	21	10
		load	812	904	837	21	10
		domInteractive	26	81	37	12	6
		backgroundConnect	21	27	24	2	1
		firstReactRender	12	14	13	0	0
		getState	4	10	6	1	1
		initialActions	0	1	0	0	0
		loadScripts	794	883	819	21	10
		setupStore	5	8	6	1	0
		uiStartup	963	1081	991	26	13

NicholasEllul · 2025-02-06T19:35:09Z

.github/workflows/benchmarks.yml

+      - name: Download artifact prep-build-test-webpack
+        uses: actions/download-artifact@v4
+        with:
+          path: ./dist/
+          pattern: prep-build-test-webpack
+          merge-multiple: true


Nitpick/optional: Noting that requiring this artifact means there is a dependency on needing the prep-build-test-webpack step in main.yml to successfully create its artifact before this job can work correctly.

I wonder if we could make this more explicit in its file structure doing something like this to make this dependency clearer

E.g

. ├── main.yml └── main └── benchmarks.yml

That would be cool, but I think GitHub Actions doesn't allow this. If you find documentation that says otherwise, let's go for it.

Another option would be a naming scheme like perhaps main-benchmarks.yml

NicholasEllul · 2025-02-06T19:44:22Z

.github/workflows/benchmarks.yml

+        with:
+          name: ${{ inputs.name }}
+          path: test-artifacts/chrome/benchmark/
+          retention-days: 5


We pay a very tiny amount per Gigabyte-hour to use artifact storage. I anticipate 5 days will be insignificant even over a large number of artifacts. However, if you have a sense for the size of the artifact + how often it will be generated, we can spot check the cost for each number of retention days

NicholasEllul · 2025-02-06T20:00:32Z

test/e2e/webdriver/chrome.js

@@ -61,6 +61,11 @@ class ChromeDriver {
      args.push('--disable-gpu');
    }

+    // It will crash if you don't do this, but there might be another way around it
+    if (process.env.GITHUB_ACTION) {
+      args.push('--no-sandbox');


Do you have access to the logs that showed up when the crash occurred?

This might be a sign that there is a shift in the permissions that our runner is executing with. The chrome web driver will not run in sandbox mode if its being executed with root permissions as a security measure. Its likely the case that our old runner was not running as root, but now that we are using our self hosted runners (or a custom image), this may have changed. Can you validate this?

I've seen a handful of remote code execution vulnerabilities with webdrivers in the past, so a malicious pull request that triggers this may be able to take advantage of this in the future if we disable the sandbox.

I ran it again for you, here's the full run log: https://github.com/MetaMask/metamask-extension/actions/runs/13192830006/job/36828840527

Important snippet:

[763:763:0207/035354.645008:FATAL:zygote_host_impl_linux.cc(126)] No usable sandbox! Update your kernel or see https://chromium.googlesource.com/chromium/src/+/main/docs/linux/suid_sandbox_development.md for more information on developing with the SUID sandbox. If you want to live dangerously and need an immediate workaround, you can try using --no-sandbox. [0207/035354.652173:ERROR:file_io_posix.cc(145)] open /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq: No such file or directory (2) [0207/035354.652233:ERROR:file_io_posix.cc(145)] open /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq: No such file or directory (2) /__w/metamask-extension/metamask-extension/node_modules/selenium-webdriver/lib/error.js:524 let err = new ctor(data.message) ^ SessionNotCreatedError: session not created: Chrome failed to start: exited normally. (session not created: DevToolsActivePort file doesn't exist) (The process started from chrome location /github/home/.cache/selenium/chrome/linux64/1[26](https://github.com/MetaMask/metamask-extension/actions/runs/13192830006/job/36828840527#step:7:27).0.6478.182/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)

HowardBraham added the team-extension-platform Extension Platform team label Jan 29, 2025

HowardBraham self-assigned this Jan 29, 2025

metamaskbot added the team-tiger Tiger team (for tech debt reduction + performance improvements) label Jan 29, 2025

HowardBraham removed the team-tiger Tiger team (for tech debt reduction + performance improvements) label Jan 29, 2025

HowardBraham force-pushed the self-hosted-benchmarks branch from 9c55124 to ce07be7 Compare January 29, 2025 04:08

HowardBraham commented Jan 29, 2025

View reviewed changes

HowardBraham requested review from itsyoboieltr, Gudahtt and NicholasEllul January 29, 2025 04:12

HowardBraham force-pushed the self-hosted-benchmarks branch from ce07be7 to e5a4333 Compare January 29, 2025 04:16

HowardBraham force-pushed the self-hosted-benchmarks branch from e5a4333 to a6f5acb Compare January 29, 2025 05:03

HowardBraham marked this pull request as ready for review January 29, 2025 05:13

HowardBraham added the DO-NOT-MERGE Pull requests that should not be merged label Jan 29, 2025

HowardBraham force-pushed the self-hosted-benchmarks branch from a6f5acb to c9f1e1e Compare January 29, 2025 17:59

HowardBraham changed the title ~~ci: self-hosted runners for benchmarks, with a new prep-deps workflow~~ ci: self-hosted runners for benchmarks Jan 29, 2025

HowardBraham mentioned this pull request Jan 29, 2025

ci: a new prep-deps workflow with caching #29979

Open

ci: self-hosted runners for benchmarks

6266945

HowardBraham force-pushed the self-hosted-benchmarks branch from c9f1e1e to 6266945 Compare January 29, 2025 21:50

HowardBraham mentioned this pull request Jan 30, 2025

ci: self-hosted runners for benchmarks (prereq) MetaMask/github-tools#38

Open

NicholasEllul reviewed Feb 6, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: self-hosted runners for benchmarks #29955

ci: self-hosted runners for benchmarks #29955

HowardBraham commented Jan 29, 2025 •

edited

Loading

github-actions bot commented Jan 29, 2025

metamaskbot commented Jan 29, 2025

metamaskbot commented Jan 29, 2025

HowardBraham Jan 29, 2025

NicholasEllul Feb 6, 2025

HowardBraham Feb 7, 2025

HowardBraham Jan 29, 2025

metamaskbot commented Jan 29, 2025

metamaskbot commented Jan 29, 2025

Gudahtt commented Jan 29, 2025

metamaskbot commented Jan 29, 2025

itsyoboieltr commented Jan 30, 2025

HowardBraham commented Jan 30, 2025

HowardBraham commented Jan 30, 2025

NicholasEllul Feb 6, 2025

HowardBraham Feb 7, 2025

NicholasEllul Feb 6, 2025

NicholasEllul Feb 6, 2025

HowardBraham Feb 7, 2025

ci: self-hosted runners for benchmarks #29955

Are you sure you want to change the base?

ci: self-hosted runners for benchmarks #29955

Conversation

HowardBraham commented Jan 29, 2025 • edited Loading

Description

Major changes

Prerequisites to merging this PR

This is just Part 1 of a larger 4-part task to make the startup time quality gate, but I think it's the hardest part

Related issues

github-actions bot commented Jan 29, 2025

metamaskbot commented Jan 29, 2025

metamaskbot commented Jan 29, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

metamaskbot commented Jan 29, 2025

metamaskbot commented Jan 29, 2025

Gudahtt commented Jan 29, 2025

metamaskbot commented Jan 29, 2025

itsyoboieltr commented Jan 30, 2025

HowardBraham commented Jan 30, 2025

HowardBraham commented Jan 30, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

HowardBraham commented Jan 29, 2025 •

edited

Loading