[Profiling] Model and device specific folders for profiled data, documentation for profiling (#7)

* minor
* minor
* minor
* minor
* profiling refactoring
* minor
* various fixes
* minor
* minor
* minor
* Fix CUDA illegal memory encountered error
* minor
* minor
* Do not have hardcoded num_blocks
* h100 mlp
* h100 mlp
* h100 mlp
* h100 mlp+attention+ar
* Add A100 compute data in new format
* Fix mlp profiling
* Document how to add a new model
* Fix incomplete mlp data for a100, delete for H100 (need reprofiling)
* Change paths for profiling data files
* format
* Add highlight figures in README

---------

Co-authored-by: Amey Agrawal <[email protected]>
Co-authored-by: Nitin Kedia <[email protected]>
Co-authored-by: Ubuntu <azureuser@vmss000005.qcze0vdp4tsupb4kdsslztcwbe.gvxx.internal.cloudapp.net>
microsoft · May 14, 2024 · bee57da
1 parent 6d98aa8
commit bee57da
Showing 68 changed files with 712,527 additions and 217,456 deletions.
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,46 @@
+FILL IN THE PR DESCRIPTION HERE
+
+FIX #xxxx (*link existing issues this PR will resolve*)
+
+**BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE**
+
+---
+
+<details>
+<!-- inside this <details> section, markdown rendering does not work, so we use raw html here. -->
+<summary><b> PR Checklist (Click to Expand) </b></summary>
+
+<p>Thank you for your contribution to Vidur! Before submitting the pull request, please ensure the PR meets the following criteria. This helps Vidur maintain the code quality and improve the efficiency of the review process.</p>
+
+<h3>PR Title and Classification</h3>
+<p>Only specific types of PRs will be reviewed. The PR title must be prefixed appropriately to indicate the type of change. Please use one of the following:</p>
+<ul>
+    <li><code>[Bugfix]</code> for bug fixes.</li>
+    <li><code>[CI/Build]</code> for build or continuous integration improvements.</li>
+    <li><code>[Doc]</code> for documentation fixes and improvements.</li>
+    <li><code>[Model]</code> for adding a new model or improving an existing model. Model name should appear in the title.</li>
+    <li><code>[Profiling]</code> for changes to the profiling module.</li>
+    <li><code>[Core]</code> for changes to the core simulator logic.</li>
+    <li><code>[Misc]</code> for PRs that do not fit the above categories. Please use this sparingly.</li>
+</ul>
+<p><strong>Note:</strong> If the PR spans more than one category, please include all relevant prefixes.</p>
+
+<h3>Code Quality</h3>
+
+<p>The PR needs to meet the following code quality standards:</p>
+
+<ul>
+    <li>Pass all linter checks. Please use <code>make format</code> to format your code.</li>
+    <li>The code needs to be well-documented so that future contributors can easily understand it.</li>
+    <li>Please add documentation to <code>docs/source/</code> if the PR modifies the user-facing behavior of Vidur. This helps users understand and use the new features or changes.</li>
+</ul>
+
+<h3>Notes for Large Changes</h3>
+<p>Please keep the changes as concise as possible. For major architectural changes (>500 LOC), we expect a GitHub issue (RFC) discussing the technical design and justification. Otherwise, we will tag the PR with <code>rfc-required</code> and might not review it.</p>
+
+<h3>Thank You</h3>
+
+<p> Finally, thank you for taking the time to read these guidelines and for your interest in contributing to Vidur. Your contributions make Vidur a great tool for everyone! </p>
+
+
+</details>
diff --git a/README.md b/README.md
@@ -1,6 +1,12 @@
 # Vidur: LLM Inference Simulator
 
-We highly recommend reading the [Vidur: A Large-Scale Simulation Framework for LLM Inference](https://www.microsoft.com/en-us/research/publication/vidur-a-large-scale-simulation-framework-for-llm-inference/) paper first before going into the code.
+Vidur is a high-fidelity LLM inference simulator, designed to aid capacity planning and deployment configuration optimization. Please refer to our [MLSys'24 paper](https://arxiv.org/abs/2405.05465) for more details.<br>
+We have a [live demo](https://vidur.westus2.cloudapp.azure.com/) that captures the capabilities of the system.
+
+![Simulator Fidelity](./assets/dynamic_fidelity_v8_request_e2e_time_normalized_85_p95.jpeg)
+*Difference in 95th percentile Request E2E Normalized time showing fidelity of Vidur's execution time predictions across four models and three dynamic workload traces, using request load at 85% of the maximum serving capacity for each scenario.*
+![Config Search](./assets/llama70b_Chat1M_ttft_tbt_90_99_2.0_0.2.jpeg)
+*Capacity per dollar for different deployment configurations vs TTFT-P90 (left) and TBT-P99 (middle) for LLaMA2-70B.*
 
 ## Setup
 
@@ -41,10 +47,10 @@ wandb login --host https://<your-org>.wandb.io
 
 To opt out of wandb, pick any one of the following methods:
 
-1. `export WANDB_MODE=disabled` in your shell or add this in `~/.zshrc` or `~/.bashrc`. Remeber to reload using `source ~/.zshrc`.
-2. Set `wandb_project` and `wandb_group` as `""` in `simulator/config/default.yml`. Also remove these CLI params from the shell command with which the simulator is invoked.
+1. `export WANDB_MODE=disabled` in your shell or add this in `~/.zshrc` or `~/.bashrc`. Remember to reload using `source ~/.zshrc`.
+2. Set `wandb_project` and `wandb_group` as `""` in `simulator/config/default.yml`. Also, remove these CLI params from the shell command with which the simulator is invoked.
 
-## Running simulator
+## Running the simulator
 
 To run the simulator, execute the following command from the repository root,
 
@@ -78,9 +84,13 @@ python -m simulator.main  \
 --vllm_scheduler_max_tokens_in_batch 4096
 ```
 
-The simulator supports a plethora of parameters for the simulation description of which can be found [here](docs/simulator_params.md).
+The simulator supports a plethora of parameters for the simulation, descriptions of which can be found [here](simulator/config/README.md).
+
+The metrics will be logged to wandb directly and a copy will be stored in the `simulator_output` directory along with the chrome trace. A description of all the logged metrics can be found [here](simulator/metrics/README.md).
+
+## Adding a new model
 
-The metrics will be logged to wandb directly and a copy will be stored in the `simulator_output` directory along with the chrome trace. A description of all the logged metrics can be found [here](docs/simulator_metrics.md).
+Instructions on adding a new model can be found [here](simulator/profiling/README.md).
 
 ## Formatting Code
 

diff --git a/assets/dynamic_fidelity_v8_request_e2e_time_normalized_85_p95.jpeg b/assets/dynamic_fidelity_v8_request_e2e_time_normalized_85_p95.jpeg
diff --git a/assets/llama70b_Chat1M_ttft_tbt_90_99_2.0_0.2.jpeg b/assets/llama70b_Chat1M_ttft_tbt_90_99_2.0_0.2.jpeg