Unable to Download Files from File Dataset - Error: C stack usage 160275245380 is too close to the limit #402

buswrecker · 2020-11-02T02:01:09Z

Describe the bug
Unable to download files to a compute instance using download_from_file_dataset().

> library(azuremlsdk)
> ws <- load_workspace_from_config()
> ojdata <- get_dataset_by_name(name = "diabetesfiles", workspace = ws)
> download_from_file_dataset(ojdata, target_path = 'sampleData', overwrite=T)
Error: C stack usage  160275245380 is too close to the limit
> download_from_file_dataset(ojdata, target_path = 'sampleData', overwrite=T)



nihil0 · 2020-11-05T10:09:22Z

I have the same error, but in my case the datastore is an Azure SQL DB. I am creating a tabular dataset from that datastore and trying to read it into a data frame. Interestingly, this works fine in Python, and I think the R package calls the same Python code under the hood. For now, I am using a workaround where I download the files locally and read them into R data frames.

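library(readr)  # provides read_csv()

ds <- get_dataset_by_name(ws, name="traindata-test")
# Convert the TabularDataset to CSV files and download them locally
ds$to_csv_files()$download("./data/traindata")
df <- read_csv("./data/traindata/part-00000")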

I am trying to understand why I run into a C stack error when I convert a TabularDataset to an R data frame using load_dataset_into_data_frame().

buswrecker · 2020-11-05T20:54:58Z

I have used load_dataset_into_data_frame() where the source data is in a SQL DB and is a TabularDataset; it works fine up to 10k rows and about 200 columns. I could imagine this becoming problematic with larger datasets and potentially hitting C stack errors.

zac-at-incycle · 2020-11-12T18:01:08Z

I am encountering the same error when trying to load a dataset on a compute cluster.

When run on an ML compute instance via RStudio, the code below runs fine.

When executed as part of a pipeline in an RScriptStep on an ML compute cluster with the same VM SKU as the RStudio compute instance, it throws the C stack error:

my_data = load_dataset_into_data_frame(my_dataset)

zac-at-incycle · 2020-11-12T18:27:22Z

Also: using $to_csv_files()$download(...) as mentioned by @nikhilp0 is not working for me. It caused the same 'C stack' error when run on a compute instance in RStudio.

zac-at-incycle · 2020-11-12T20:27:38Z

For anyone else blocked by the same issue, I was able to work around it by downloading files directly from the datastore and not using a Dataset at all.

Instead of

my_dataset = get_dataset_by_name(aml_workspace, my_dataset_name)
my_data = load_dataset_into_data_frame(my_dataset) # 'C stack' error thrown here when running on compute cluster

I used

input_datastore = get_datastore(aml_workspace, "input_data")
download_from_datastore(datastore=input_datastore, target_path="./input_data", overwrite=TRUE)
my_data = read.csv("./input_data/my_data.csv")

zac-at-incycle · 2020-11-12T21:39:46Z

Also seeing the same error when attempting to use the get_model() method.

zac-at-incycle · 2020-11-16T18:49:35Z

I'm now seeing the same error after deploying previously working R script and RScriptStep into a new workspace. Same code and cluster VM SKU, but in new workspace consistently throws 'C stack' error.

jakeatmsft · 2020-11-30T14:53:37Z

Blocked by the same issue when mounting a file dataset.

jakeatmsft · 2020-11-30T22:05:40Z

Blocked by the same issue when mounting a file dataset.
I printed out the C stack info and it seems OK. At this point I am unable to download from either the datastore (workaround above) or the dataset. Is there another workaround? This is blocking a client CI/CD pipeline.

print(Cstack_info())
download_from_datastore(datastore='x', target_path='y', prefix='z', overwrite=TRUE)
Output:
size current direction eval_depth
7969177 88448 1 11
Error: C stack usage 870311906868 is too close to the limit
Execution halted

pourmoayed · 2020-12-02T13:32:28Z

I also get the same C stack error when trying to get data from the workspace datastore using a simple SQL query:

library(magrittr)  # provides the %>% pipe

qry_str <- "SELECT * FROM ws_sql_view"
dataset_obj <- ws %>%
  get_datastore("isf_db") %>%
  reticulate::tuple(qry_str) %>%
  python_sdk$core$dataset$Dataset$Tabular$from_sql_query()

The code fails on the last line, where we call the Python module directly from R to get the dataset object (i.e., python_sdk$core$dataset$Dataset$Tabular$from_sql_query()):
Error: C stack usage 403877116004 is too close to the limit
Execution halted

Any news on a possible solution for this issue?

zac-at-incycle · 2020-12-02T14:38:07Z

> I'm now seeing the same error after deploying previously working R script and RScriptStep into a new workspace. Same code and cluster VM SKU, but in new workspace consistently throws 'C stack' error.

For anyone else blocked by this issue, I discovered that the difference between the two workspaces mentioned above was that one used a datastore that accessed blob storage via an account key, while the other used a datastore that accessed blob storage with a SAS token.

Using the datastore configured with a SAS token from the R SDK triggered the 'C stack' error. Using the datastore configured with an account key did not.

jpe316 · 2020-12-10T22:11:18Z

Tabular dataset support in the R SDK and RScriptStep is experimental, and we will not be triaging issues for them at this time. Please do not take a dependency on them.

We will follow up on the file datasets issue with a recommended approach.
