Skip to content

Commit

Permalink
PEP 774: Removing the LLVM requirement for JIT builds (#4234)
Browse files Browse the repository at this point in the history
  • Loading branch information
savannahostrowski authored Jan 27, 2025
1 parent 36b3b39 commit 4c0b2e8
Show file tree
Hide file tree
Showing 2 changed files with 261 additions and 0 deletions.
1 change: 1 addition & 0 deletions .github/CODEOWNERS
Original file line number Diff line number Diff line change
Expand Up @@ -650,6 +650,7 @@ peps/pep-0769.rst @facundobatista
peps/pep-0770.rst @sethmlarson @brettcannon
peps/pep-0771.rst @pradyunsg
peps/pep-0773.rst @zooba
peps/pep-0774.rst @savannahostrowski
# ...
peps/pep-0777.rst @warsaw
# ...
Expand Down
260 changes: 260 additions & 0 deletions peps/pep-0774.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,260 @@
PEP: 774
Title: Removing the LLVM requirement for JIT builds
Author: Savannah Ostrowski <[email protected]>
Status: Draft
Type: Standards Track
Created: 27-Jan-2025
Python-Version: 3.14

Abstract
========

Since Python 3.13, CPython has been able to be configured and built with an
experimental just-in-time (JIT) compiler via the ``--enable-experimental-jit``
flag on Linux and Mac and ``--experimental-jit`` on Windows. To build CPython with
the JIT enabled, users are required to have LLVM installed on their machine
(initially, with LLVM 16 but more recently, with LLVM 19). LLVM is responsible
for generating stencils that are essential to our copy-and-patch JIT (see :pep:`744`).
These stencils are predefined, architecture-specific templates that are used
to generate machine code at runtime.

This PEP proposes removing the LLVM build-time dependency for JIT-enabled builds
by hosting the generated stencils in the CPython repository. This approach
allows us to leverage the checked-in stencils for supported platforms at build
time, simplifying the contributor experience and address concerns raised at the
Python Core Developer Sprint in September 2024. That said, there is a clear
tradeoff to consider, as improved developer experience does come at the cost of
increased repository size.

It is important to note that this PEP is not a proposal to accept or reject the
JIT itself but rather to determine whether the build-time dependency on LLVM is
acceptable for JIT builds moving forward. If this PEP is rejected, we will
proceed with the status quo, retaining the LLVM build-time requirement. While
this dependency has served the JIT development process effectively thus far, it
introduces setup complexity and additional challenges that this PEP seeks to
alleviate.

Motivation
==========

At the Python Core Developer Sprint that took place in September 2024, there was
discussion about the next steps for the JIT - a related discussion also took
place on `GitHub <https://github.com/python/cpython/issues/115869>`__. As part
of that discussion, there was also a clear appetite for removing the LLVM
requirement for JIT builds in preparation for shipping the JIT off by default in
3.14. The consensus at the sprint was that it would be sufficient to provide
pre-generated stencils for non-debug builds for Tier 1 platforms and that
checking these files into the CPython repo would be adequate for the limited
number of platforms (though more options have been explored; see `Rejected
Ideas`_).

Currently, building CPython with `the JIT requires LLVM
<https://github.com/python/cpython/tree/main/Tools/jit#installing-llvm>`__ as a
build-time dependency. Despite not being exposed to end users, this dependency
is suboptimal. Requiring LLVM adds a setup burden for developers and those who
wish to build CPython with the JIT enabled. Depending on the operating system,
the version of LLVM shipped with the OS may differ from that required by our JIT
builds, which introduces additional complexity to troubleshoot and resolve. With
few core developers currently contributing to and maintaining the JIT, we also
want to make sure that the friction to work on JIT-related code is minimized as
much as possible.

With the proposed approach, hosting pre-compiled stencils for supported
architectures can be generated in advance, stored in a central location, and
automatically used during builds. This approach ensures reproducible builds,
making the JIT a more stable and sustainable part of CPython's future.

Rationale
=========

This PEP proposes checking JIT stencils directly into the CPython repo as the
best path forward for eliminating our build-time dependency on LLVM.

This approach:

* Provides the best end-to-end experience for those looking to build CPython
with the JIT
* Lessens the barrier to entry for those looking to contribute to the JIT
* Ensures builds remain reproducible and consistent across platforms without
relying on external infrastructure or download mechanisms
* Eliminates variability introduced by network conditions or potential
discrepancies between hosted files and the CPython repository state, and
* Subjects stencils to the same review processes we have for all other JIT-related
code

However, this approach does result in a slight increase in overall
repository size. Comparing repo growth on commits over the past 90 days, the
difference between the actual commits and the same commits with stencils added
amounts to a difference of 0.03 MB per stencil file. This is a small increase in
the context of the overall repository size, which has grown by 2.55 MB in the
same time period. For six stencil files, this amounts to an upper bound of 0.18 MB.
The current total size of the stencil files for all six platforms is 7.2 MB.

These stencils could become larger in the future with changes to register
allocation, which would introduce 5-6 variants per instruction in each stencil
file (5-6x larger). However, if we ended up going this route, there are
additional modifications we could make to stencil files that could help
counteract this size increase (e.g., stripping comments, minimizing the
stencils).

Specification
=============

This specification outlines the proposed changes to remove the build-time
dependency on LLVM and the contributor experience if this PEP is accepted.

Repository changes
------------------

The CPython repository would now host the pre-compiled JIT stencils in a new
subdirectory in ``Tools/jit`` called ``stencils/``. At present, the JIT is tested
and built for six platforms, so to start, we'd check in six stencil files. In
the future, we may check in additional stencil files if support for additional
platforms is desired or relevant.

.. code-block:: text
cpython/
Tools/
jit/
stencils/
aarch64-apple-darwin.h
aarch64-unknown-linux-gnu.h
i686-pc-windows-msvc.h
x86_64-apple-darwin.h
x86_64-pc-windows-msvc.h
x86_64-pc-linux-gnu.h
Workflow
--------

The workflow changes can be split into two parts, namely building CPython with
the JIT enabled and working on the JIT's implementation.

Building CPython with the JIT
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Precompiled JIT stencil files will be stored in the ``Tools/jit/stencils``
directory, with each file name corresponding to its target triple as outlined
above. At build time, we determine whether to use the checked in stencils or to
generate a new stencil for the user's platform. Specifically, for contributors
with LLVM installed, the ``build.py`` script in ``Tools/jit/stencils`` will allow
them to regenerate the stencil for their platform. Those without LLVM can rely
on the precompiled stencil files directly from the repository.

Working on the JIT's implementation (or touching JIT files)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In continuous integration (CI), stencil files will be automatically validated and updated when changes
are made to JIT-related files. When a pull request is opened that touches these
files, the ``jit.yml`` workflow, which builds and tests our builds, will run as
usual.

However, as part of this, we will introduce a new step that diffs the current
stencils in the repo against those generated in CI. If there is a diff for a
platform's stencil file, a patch file for the updated stencil is generated and
the step will fail. Each patch is uploaded to GitHub Actions. After CI is
finished running across all platforms, the patches are aggregated into a single
patch file for convenience. You can download this aggregated patch, apply it
locally, and commit the updated stencils back to your branch. Then, the
subsequent CI run will pass.

Reference Implementation
========================

Key parts of the `reference implementation <https://github.com/python/cpython/pull/129331>`__ include:

- |CI|_: The CI workflow responsible for generating stencil patches.

- |jit_stencils|_: The directory where stencils are stored.

- |targets|_: The code to compile and parse the templates at build time.

.. |CI| replace:: ``.github/workflows/jit.yml``
.. _CI: https://github.com/python/cpython/blob/main/.github/workflows/jit.yml

.. |jit_stencils| replace:: ``Tools/jit/stencils``
.. _jit_stencils: https://github.com/python/cpython/blob/main/Tools/jit/stencils

.. |targets| replace:: ``Tools/jit/_targets``
.. _targets: https://github.com/python/cpython/blob/main/Tools/jit/_targets.py

Ignoring the stencils themselves and any necessary JIT README changes, the
changes to the source code to support reproducible stencil generation and
hosting are minimal (around 150 lines of changes).

Rejected Ideas
==============

Several alternative approaches were considered as part of the research and
exploration for this PEP. However, the ideas below either involve
infrastructural cost, maintenance burden, or a worse overall developer
experience.

Using Git submodules
--------------------

Git submodules are a poor developer experience for hosting stencils because they
create a different kind of undesirable friction. For instance, any
updates to the JIT would necessitate regenerating the stencils and committing
them to a separate repository. This introduces a convoluted process: you must
update the stencils in the submodule repository, commit those changes, and then
update the submodule reference in the main CPython repository. This disconnect
adds unnecessary complexity and overhead, making the process brittle and
error-prone for contributors and maintainers.

Using Git subtrees
------------------

When using subtrees, the embedded repository becomes part of the main
repository, similar to what's being proposed in this PEP. However, subtrees
require additional tooling and steps for maintenance, which adds unnecessary
complexity to workflows.

Hosting in a separate repository
--------------------------------

While splitting JIT stencils into a separate repository avoids the storage
overhead associated with hosting the stencils, it adds complexity to the build
process. Additional tooling would be required to fetch the stencils and
potentially create additional and unnecessary failure points in the workflow.
This separation also makes it harder to ensure consistency between the stencils
and the CPython source tree, as updates must be coordinated across the
repositories. Finally, this approach introduces an attack vector, as external
repositories are less integrated with our workflows, making provenance and
integrity harder to guarantee.

Hosting in cloud storage
------------------------

Hosting stencils in cloud storage like S3 buckets or GitHub raw storage
introduces external dependencies, complicating offline development
workflows. Also, depending on the provider, this type of hosting comes with
additional cost, which we'd like to avoid.

Using Git LFS
-------------

Git Large File Storage (LFS) adds a tool dependency for contributors,
complicating the development workflow, especially for those who may not already
use Git LFS. Git LFS does not work well with offline workflows since files
managed by LFS require an internet connection to fetch when checking out
specific commits, which is disruptive for even basic Git workflows. Git LFS has
some free quota but there are `additional
costs <https://docs.github.com/en/billing/managing-billing-for-your-products/managing-billing-for-git-large-file-storage/about-billing-for-git-large-file-storage>`__.
for exceeding that quota which are also undesirable.

Maintain the status quo with LLVM as a build-time dependency
------------------------------------------------------------

Retaining LLVM as a build-time dependency upholds the existing barriers to
adoption and contribution. Ultimately, this option fails to address the core
challenges of accessibility and simplicity, and fails to eliminate the
dependency which was deemed undesirable at the Python Core Developer Sprint in
the fall (the impetus for this PEP), making it a poor long-term solution.

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

0 comments on commit 4c0b2e8

Please sign in to comment.