[Profiling] Model and device specific folders for profiled data, documentation for profiling (#7)

* minor

* minor

* minor

* minor

* profiling refactoring

* minor

* various fixes

* minor

* minor

* minor

* Fix CUDA illegal memory encountered error

* minor

* minor

* Do not have hardcoded num_blocks

* h100 mlp

* h100 mlp

* h100 mlp+attention+ar

* Add A100 compute data in new format

* Fix mlp profiling

* Document how to add a new model

* Fix incomplete mlp data for a100, delete for H100 (need reprofiling)

* Change paths for profiling data files

* format

* Add highlight figures in README

---------

Co-authored-by: Amey Agrawal <[email protected]>
Co-authored-by: Nitin Kedia <[email protected]>
Co-authored-by: Ubuntu <azureuser@vmss000005.qcze0vdp4tsupb4kdsslztcwbe.gvxx.internal.cloudapp.net>
4 people authored May 14, 2024
1 parent 6d98aa8 commit bee57da
Showing 68 changed files with 712,527 additions and 217,456 deletions.
46 changes: 46 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,46 @@
FILL IN THE PR DESCRIPTION HERE

FIX #xxxx (*link existing issues this PR will resolve*)

**BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE**

---

<details>
<!-- inside this <details> section, markdown rendering does not work, so we use raw html here. -->
<summary><b> PR Checklist (Click to Expand) </b></summary>

<p>Thank you for your contribution to Vidur! Before submitting the pull request, please ensure the PR meets the following criteria. This helps Vidur maintain the code quality and improve the efficiency of the review process.</p>

<h3>PR Title and Classification</h3>
<p>Only specific types of PRs will be reviewed. Prefix the PR title appropriately to indicate the type of change. Please use one of the following:</p>
<ul>
<li><code>[Bugfix]</code> for bug fixes.</li>
<li><code>[CI/Build]</code> for build or continuous integration improvements.</li>
<li><code>[Doc]</code> for documentation fixes and improvements.</li>
<li><code>[Model]</code> for adding a new model or improving an existing model. Model name should appear in the title.</li>
<li><code>[Profiling]</code> for changes to the profiling module.</li>
<li><code>[Core]</code> for changes to the core simulator logic.</li>
<li><code>[Misc]</code> for PRs that do not fit the above categories. Please use this sparingly.</li>
</ul>
<p><strong>Note:</strong> If the PR spans more than one category, please include all relevant prefixes.</p>

<h3>Code Quality</h3>

<p>The PR needs to meet the following code quality standards:</p>

<ul>
<li>Pass all linter checks. Please use <code>make format</code> to format your code.</li>
<li>The code needs to be well documented so that future contributors can easily understand it.</li>
<li>Please add documentation to <code>docs/source/</code> if the PR modifies the user-facing behavior of Vidur. This helps users understand and use the new features or changes.</li>
</ul>

<h3>Notes for Large Changes</h3>
<p>Please keep the changes as concise as possible. For major architectural changes (>500 LOC), we expect a GitHub issue (RFC) discussing the technical design and justification. Otherwise, we will tag the PR with <code>rfc-required</code> and might not review it.</p>

<h3>Thank You</h3>

<p> Finally, thank you for taking the time to read these guidelines and for your interest in contributing to Vidur. Your contributions make Vidur a great tool for everyone! </p>


</details>
22 changes: 16 additions & 6 deletions README.md
@@ -1,6 +1,12 @@
# Vidur: LLM Inference Simulator

We highly recommend reading the [Vidur: A Large-Scale Simulation Framework for LLM Inference](https://www.microsoft.com/en-us/research/publication/vidur-a-large-scale-simulation-framework-for-llm-inference/) paper first before going into the code.
Vidur is a high-fidelity LLM inference simulator, designed to aid capacity planning and deployment configuration optimization. Please refer to our [MLSys'24 paper](https://arxiv.org/abs/2405.05465) for more details.<br>
We have a [live demo](https://vidur.westus2.cloudapp.azure.com/) that captures the capabilities of the system.

![Simulator Fidelity](./assets/dynamic_fidelity_v8_request_e2e_time_normalized_85_p95.jpeg)
*Difference in 95th percentile Request E2E Normalized time showing fidelity of Vidur's execution time predictions across four models and three dynamic workload traces, using request load at 85% of the maximum serving capacity for each scenario.*
![Config Search](./assets/llama70b_Chat1M_ttft_tbt_90_99_2.0_0.2.jpeg)
*Capacity per dollar for different deployment configurations vs TTFT-P90 (left) and TBT-P99 (middle) for LLaMA2-70B.*

## Setup

@@ -41,10 +47,10 @@ wandb login --host https://<your-org>.wandb.io

To opt out of wandb, pick any one of the following methods:

1. `export WANDB_MODE=disabled` in your shell or add this in `~/.zshrc` or `~/.bashrc`. Remeber to reload using `source ~/.zshrc`.
2. Set `wandb_project` and `wandb_group` as `""` in `simulator/config/default.yml`. Also remove these CLI params from the shell command with which the simulator is invoked.
1. `export WANDB_MODE=disabled` in your shell or add this in `~/.zshrc` or `~/.bashrc`. Remember to reload using `source ~/.zshrc`.
2. Set `wandb_project` and `wandb_group` as `""` in `simulator/config/default.yml`. Also, remove these CLI params from the shell command with which the simulator is invoked.
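
A minimal sketch of the first method, assuming a Bash or Zsh session. `WANDB_MODE=disabled` is the variable named above; the commented-out line that appends to `~/.bashrc` and the final check are illustrative only.

```shell
# Disable wandb for the rest of this shell session.
export WANDB_MODE=disabled

# To persist the setting for future Bash sessions, uncomment the next line
# (use ~/.zshrc instead for Zsh), then reload with `source ~/.bashrc`.
# echo 'export WANDB_MODE=disabled' >> ~/.bashrc

# Illustrative check: exported variables are visible to child processes,
# which is how the setting reaches a program launched from this shell.
sh -c 'echo "WANDB_MODE=$WANDB_MODE"'
```

Exporting the variable, rather than only assigning it, is what makes it visible to commands started from the same shell.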

## Running simulator
## Running the simulator

To run the simulator, execute the following command from the repository root,

@@ -78,9 +84,13 @@ python -m simulator.main \
--vllm_scheduler_max_tokens_in_batch 4096
```

The simulator supports a plethora of parameters for the simulation description of which can be found [here](docs/simulator_params.md).
The simulator supports a plethora of parameters; their descriptions can be found [here](simulator/config/README.md).

The metrics will be logged to wandb directly and a copy will be stored in the `simulator_output` directory along with the chrome trace. A description of all the logged metrics can be found [here](simulator/metrics/README.md).

## Adding a new model

The metrics will be logged to wandb directly and a copy will be stored in the `simulator_output` directory along with the chrome trace. A description of all the logged metrics can be found [here](docs/simulator_metrics.md).
Instructions on adding a new model can be found [here](simulator/profiling/README.md).

## Formatting Code
