Fastsafetensors #298
base: main
@@ -0,0 +1,5 @@
Loading Model weights with fastsafetensors
===========================================

Using the fastsafetensors library enables loading model weights to GPU memory by leveraging GPU Direct Storage. See https://github.com/foundation-model-stack/fastsafetensors for more details.
To enable this feature, set the environment variable ``USE_FASTSAFETENSOR`` to ``true``.
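For example, assuming vLLM is launched from the same shell session, the variable can be set with a plain export (illustrative; any mechanism that places the variable in the server process's environment works):

```shell
# Opt in to the fastsafetensors loading path for processes started from this shell
export USE_FASTSAFETENSOR=true
```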
Review comment: Is this still relevant?
@@ -42,9 +42,10 @@
    set_default_torch_dtype)
from vllm.model_executor.model_loader.weight_utils import (
    download_safetensors_index_file_from_hf, download_weights_from_hf,
    filter_duplicate_safetensors_files, filter_files_not_needed_for_inference,
    get_gguf_extra_tensor_names, gguf_quant_weights_iterator,
    initialize_dummy_weights, np_cache_weights_iterator, pt_weights_iterator,
    fastsafetensors_weights_iterator, filter_duplicate_safetensors_files,
    filter_files_not_needed_for_inference, get_gguf_extra_tensor_names,
    gguf_quant_weights_iterator, initialize_dummy_weights,
    np_cache_weights_iterator, pt_weights_iterator,
    runai_safetensors_weights_iterator, safetensors_weights_iterator)
from vllm.model_executor.utils import set_weight_attrs
from vllm.platforms import current_platform
@@ -307,7 +308,15 @@ def _get_weights_iterator(
            hf_weights_files,
        )
    elif use_safetensors:
        weights_iterator = safetensors_weights_iterator(hf_weights_files)
        use_fastsafe_tensor = os.getenv('USE_FASTSAFETENSOR',
                                        'False').lower() == 'true'
        if use_fastsafe_tensor:
            logger.info("Using fastsafetensor for loading weights")
            weights_iterator = fastsafetensors_weights_iterator(
                hf_weights_files)
        else:
            weights_iterator = safetensors_weights_iterator(
                hf_weights_files)
    else:
        weights_iterator = pt_weights_iterator(hf_weights_files)

Review comment: My suggestion here would be to enable loader extra config for the DefaultLoader: https://github.com/vllm-project/vllm/blob/e784c6b9984e8f8116f74000b863d941495acb0b/vllm/model_executor/model_loader/loader.py#L190 So that you can enable fastsafetensors and GDS with
Review comment (dtrifiro): Looking at the fastsafetensors repo's pyproject.toml, nvidia-cufile-cu12 is not listed as a dependency. Should we open a PR there to add this as a dep instead of manually adding it here?
Review comment (fialhocoelho): The cufile lib isn't listed as a direct installation requirement, but it's necessary because importing the main object relies on a piece of code (fastcpp) that depends on cufile. The key point is that cufile is required for using the library with GDS support, which isn't our use case. However, due to how the lib is structured, we still need this dependency even if we're not using GDS. According to the docs, setting nogds=True disables GDS support, but the cufile requirement is triggered immediately upon import. I believe the best course of action is to open an issue in the repo to address this condition (I can take care of that).
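Since cufile is only needed when fastsafetensors is imported, one possible mitigation (a hypothetical sketch, not code from this PR: the function name and the fallback policy are assumptions) is to defer the import until the toggle is actually on, and fall back to the plain safetensors path if the import fails:

```python
import os


def choose_weights_backend() -> str:
    """Pick a weights-loading backend, importing fastsafetensors lazily.

    Hypothetical sketch: fastsafetensors is imported only when the
    USE_FASTSAFETENSOR toggle is on, so a missing libcufile (or a missing
    package) does not break start-up for users of the default path.
    """
    enabled = os.getenv("USE_FASTSAFETENSOR", "False").lower() == "true"
    if enabled:
        try:
            # May require libcufile at import time, per the discussion above.
            import fastsafetensors  # noqa: F401
            return "fastsafetensors"
        except ImportError:
            # cufile or the package itself is unavailable: fall back.
            return "safetensors"
    return "safetensors"


print(choose_weights_backend())
```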
Review comment (dtrifiro): Sorry @fialhocoelho, I missed this reply. Thanks for the explanation. Are we sure we're not interested in GDS though? I don't know much about it, but it seems it's one of the performance improvements that fastsafetensors brings.
Review comment (fialhocoelho): @dtrifiro I don't have much experience with how GDS would be installed in an OpenShift env, since it operates at a low level. I've only used it in a DGX environment. I can look into it a bit.
Review comment: Looking at https://docs.nvidia.com/gpudirect-storage/overview-guide/index.html#software-architecture it doesn't seem like there's much needed apart from cufile and the nvidia driver.