Unable to Download Files from File Dataset - Error: C stack usage 160275245380 is too close to the limit #402

Open
buswrecker opened this issue Nov 2, 2020 · 12 comments

Comments

@buswrecker

Describe the bug
Unable to download files to compute instance using download_from_file_dataset

> library(azuremlsdk)
> ws <- load_workspace_from_config()
> ojdata <- get_dataset_by_name(name = "diabetesfiles", workspace = ws)
> download_from_file_dataset(ojdata, target_path = 'sampleData', overwrite=T)
Error: C stack usage  160275245380 is too close to the limit
> download_from_file_dataset(ojdata, target_path = 'sampleData', overwrite=T)


@nihil0

nihil0 commented Nov 5, 2020

I have the same error, but in my case the datastore is an Azure SQL DB. I am creating a tabular dataset from that datastore and trying to read it into a data frame. Interestingly, this works fine in Python, and I think the R package calls the same Python code. For now, I am using a workaround where I download the files locally and read them into R data frames.

ds <- get_dataset_by_name(ws, name="traindata-test")
ds$to_csv_files()$download("./data/traindata")
df <- read_csv("./data/traindata/part-00000")

I am trying to understand why I run into a C stack error when I try to do the conversion from a TabularDataset to an R data frame using load_dataset_into_data_frame().
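The workaround above reads a single file, part-00000. If the download writes more than one part file (an assumption based on that file name, not something confirmed in this thread), a minimal sketch for combining them could look like this. `read_part_files` is a hypothetical helper, not part of azuremlsdk, and it uses base `read.csv` rather than `read_csv` so it runs without extra packages. The demonstration uses fake files in a temporary directory instead of a real download.

```r
# Sketch: combine every part file in a download directory into one data frame.
# `read_part_files` is a hypothetical helper, not an azuremlsdk function.
read_part_files <- function(dir) {
  # Match part-00000, part-00001, ... (naming assumed from the single file above)
  paths <- sort(list.files(dir, pattern = "^part-", full.names = TRUE))
  if (length(paths) == 0) {
    stop("No part files found in ", dir)
  }
  # Read each part and stack the rows; assumes all parts share the same columns
  parts <- lapply(paths, read.csv, stringsAsFactors = FALSE)
  do.call(rbind, parts)
}

# Self-contained demonstration with two fake part files
demo_dir <- file.path(tempdir(), "traindata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, y = c(0.1, 0.2)),
          file.path(demo_dir, "part-00000"), row.names = FALSE)
write.csv(data.frame(id = 3:4, y = c(0.3, 0.4)),
          file.path(demo_dir, "part-00001"), row.names = FALSE)

df <- read_part_files(demo_dir)
print(df)
```

With a real download, the call would be `read_part_files("./data/traindata")` after the `$download()` step.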

@buswrecker
Author

I have used load_dataset_into_data_frame() where the source data is in a SQL DB and is a TabularDataset. It works fine up to 10k rows and about 200 columns; I could imagine this being problematic with larger datasets and potentially hitting C stack errors.

@zac-at-incycle

I am encountering the same error when trying to load a dataset on a compute cluster.

When run on an ML compute instance via RStudio, the code below runs fine.

When executed as part of a pipeline in an RScriptStep on ML compute cluster with the same VM sku as the RStudio compute instance, it throws the C stack error:

my_data= load_dataset_into_data_frame(my_dataset)

@zac-at-incycle

Also: attempting to use $to_csv_files()$download(...) as mentioned by @nihil0 is not working for me. It caused the same 'C stack' error when run on a compute instance in RStudio.

@zac-at-incycle

For anyone else blocked by the same issue, I was able to work around it by downloading files directly from the datastore and not using a Dataset at all.

Instead of

my_dataset = get_dataset_by_name(aml_workspace, my_dataset_name)
my_data = load_dataset_into_data_frame(my_dataset) # 'C stack' error thrown here when running on compute cluster

I used

input_datastore = get_datastore(aml_workspace, "input_data")
download_from_datastore(datastore=input_datastore, "./input_data", overwrite=TRUE)
my_data = read.csv("./input_data/my_data.csv")

@zac-at-incycle

Also seeing the same error when attempting to use get_model() method.

@zac-at-incycle

I'm now seeing the same error after deploying a previously working R script and RScriptStep into a new workspace. Same code and cluster VM SKU, but the new workspace consistently throws the 'C stack' error.

@jakeatmsft

Blocked by same issue, when mounting a file dataset.

@jakeatmsft

Blocked by same issue, when mounting a file dataset.
I printed the Cstack info and it seems OK. At this point I am unable to download from either the datastore (the workaround above) or the dataset. Is there another workaround? This is blocking a client CI/CD pipeline.

print(Cstack_info())
download_from_datastore(datastore='x', path='y', prefix='z', overwrite=TRUE)
output:
      size    current  direction eval_depth
   7969177      88448          1         11
Error: C stack usage 870311906868 is too close to the limit
Execution halted
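To put the numbers above side by side, here is a small arithmetic sketch using only the values printed in this comment. It compares the stack actually in use before the call with the usage reported in the error. One reading (an assumption, not something confirmed in this thread) is that a reported usage this far above the limit reflects a stack check comparing addresses that do not belong to the same stack, rather than genuinely deep recursion in R.

```r
# Values copied from the Cstack_info() output and the error message above
stack_info <- c(size = 7969177, current = 88448, direction = 1, eval_depth = 11)
reported_usage <- 870311906868

# Fraction of the stack limit in use just before the failing call
fraction_in_use <- stack_info[["current"]] / stack_info[["size"]]

# How many times larger than the entire stack limit the reported usage is
times_over_limit <- reported_usage / stack_info[["size"]]

cat(sprintf("in use before the call: %.1f%% of the limit\n", 100 * fraction_in_use))
cat(sprintf("reported usage / limit: %.0f\n", times_over_limit))
```

The stack in use before the call is a small fraction of the limit, while the reported usage exceeds the limit by several orders of magnitude, which is consistent with the observation that the Cstack info itself "seems OK".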

@pourmoayed

pourmoayed commented Dec 2, 2020

I also get the same C stack error when trying to get some data from the workspace datastore using a simple SQL query:

qry_str <- "SELECT * FROM ws_sql_view"
dataset_obj <- ws %>%
  get_datastore("isf_db") %>%
  reticulate::tuple(qry_str) %>%
  python_sdk$core$dataset$Dataset$Tabular$from_sql_query()

The code fails on the last line, where we use the Python module directly from R to get the dataset object (i.e., python_sdk$core$dataset$Dataset$Tabular$from_sql_query()):
Error: C stack usage 403877116004 is too close to the limit
Execution halted

Any news for a possible solution for this issue?

@zac-at-incycle

I'm now seeing the same error after deploying a previously working R script and RScriptStep into a new workspace. Same code and cluster VM SKU, but the new workspace consistently throws the 'C stack' error.

For anyone else blocked by this issue, I discovered that the difference between the two workspaces mentioned above was that one used a datastore that accessed blob storage via an account key and the other used a datastore that accessed blob storage with a SAS token.

Attempting to use a datastore with a SAS token from the R SDK triggered the 'C stack' error. Using the datastore with an account key did not.

@jpe316

jpe316 commented Dec 10, 2020

Tabular dataset support in the R SDK and RScriptStep are experimental, and we will not be triaging issues for them at this time. Please do not take a dependency on them.

We will follow up on the file datasets issue with a recommended approach.
