Describe the issue linked to the documentation
I have trouble fully understanding the PyODScorer and KMeansScorer from the darts documentation pages. To clarify, I have broken the problem down into a few smaller questions:
1. In the documentation pages for PyODScorer/KMeansScorer, the data is said to be split into moving windows of size W, with the following description: "For a series of length N, (N - W + 1)/W subsequences will be generated". Depending on whether the stride is 1 or W, either (N - W + 1) or (N // W) would be correct. Which one is it?
2. The TimeSeries subsequence passed to the underlying scorer is said to be of shape (W * D). Does this mean a 2D array of shape (W, D), i.e. W by D, or a 1D vector of length W * D, i.e. W times D?
3. According to the scikit-learn documentation, the k-means clusterer takes data of shape (n_samples, n_features). Which of the following is correct?
   i. The documentation indicates that n_features = W, "applying a score per vector of size W". If this is the case, why was this direction chosen instead of ii.?
   ii. For subsequences passed as input, I would expect n_samples = W and n_features = D. Thus, if D = 1, the value at each point in time is clustered. If this is the case, are the scores of the individual items in the vector aggregated to obtain a single score per window?
4. PyOD uses the same (n_samples, n_features) input format. Does the answer to point 3 also apply to the PyODScorer?
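The first two questions can be made concrete with a minimal pure-Python sketch of the competing interpretations. It does not use darts and makes no claim about how darts builds its windows; the helper names `sliding_windows` and `flatten` and the toy values of N, W and D are assumptions chosen only for illustration.

```python
def sliding_windows(series, window, stride):
    """Return every full window of `window` consecutive rows, stepping by `stride`."""
    last_start = len(series) - window
    return [series[i:i + window] for i in range(0, last_start + 1, stride)]


def flatten(window_rows):
    """Concatenate the rows of one (W, D) window into a single vector of length W * D."""
    return [value for row in window_rows for value in row]


# Toy multivariate series (illustrative values): N time steps, D components each.
N, W, D = 10, 3, 2
series = [[float(t), float(10 * t)] for t in range(N)]

# Interpretation A: stride 1, overlapping windows -> N - W + 1 subsequences.
overlapping = sliding_windows(series, W, stride=1)
# Interpretation B: stride W, non-overlapping windows -> N // W subsequences.
disjoint = sliding_windows(series, W, stride=W)
print(len(overlapping), len(disjoint))

# One window kept as a 2D (W, D) block versus flattened to a 1D vector of length W * D.
as_2d = overlapping[0]
as_1d = flatten(overlapping[0])
print(len(as_2d), len(as_2d[0]), len(as_1d))
```

Under the (n_samples, n_features) convention, stacking the flattened vectors would give one sample per window with W * D features, whereas passing a single 2D window would give W samples with D features. Which of these the scorers actually do is exactly what the questions above ask.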
Thank you for your time.
Additional context
Relevant darts documentation pages: https://unit8co.github.io/darts/generated_api/darts.ad.scorers.kmeans_scorer.html
And: https://unit8co.github.io/darts/generated_api/darts.ad.scorers.pyod_scorer.html
scikit-learn KMeans docs: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html