[QUESTION] Questions regarding anomaly scoring documentation #2660

tristandijkstra · 2025-01-30T19:17:05Z

Describe the issue linked to the documentation
I have trouble fully understanding the PyODScorer and KMeansScorer based on the darts documentation pages. For clarity, I have broken the problem down into a few smaller questions:

1. In the documentation pages for PyODScorer/KMeansScorer, the data is said to be split into moving windows of size W, with the following description: "For a series of length N, (N - W + 1)/W subsequences will be generated". Depending on whether the stride is 1 or W, either (N - W + 1) or (N // W) may be correct. Which one is it?
2. The TimeSeries subsequence passed to the underlying scorer is of shape (W * D). Does this mean a 2D array of shape (W, D), i.e. W by D, or a 1D vector of length W x D, i.e. W times D?
3. Based on the Sklearn documentation, the k-means clusterer takes data of shape (n_samples, n_features). Which of the following is correct?
   i. The documentation indicates that n_features = W: "applying a score per vector of size W". If this is the case, why was this direction chosen instead of ii.?
   ii. In the context of input subsequences, I would expect n_samples = W and n_features = D. Thus, if D = 1, the value at each point in time is clustered. If this is the case, are the scores for the items in the vector aggregated to achieve a single score per window vector?
4. PyOD has the same (n_samples, n_features) input format. Is the answer to point 3 the same for the PyODScorer?
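
To make the two readings in point 3 concrete, here is a minimal sketch using only NumPy and scikit-learn. It does not reproduce the darts implementation and makes no claim about which reading darts uses. The sizes `N`, `W`, `D`, the random data, the cluster count, and the mean aggregation in reading ii are all assumptions chosen for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from sklearn.cluster import KMeans

# Illustrative sizes (assumptions): series length N, window W, D components.
N, W, D = 20, 5, 2
rng = np.random.default_rng(0)
values = rng.normal(size=(N, D))

# Stride-1 windows: sliding_window_view appends the window axis last,
# giving (N - W + 1, D, W); transpose to (N - W + 1, W, D).
windows = sliding_window_view(values, window_shape=W, axis=0).transpose(0, 2, 1)
n_windows_stride_1 = windows.shape[0]   # N - W + 1
n_windows_stride_w = N // W             # count if windows did not overlap

# Reading i: each window is one sample, flattened to W * D features.
flat = windows.reshape(n_windows_stride_1, W * D)
km_i = KMeans(n_clusters=3, n_init=10, random_state=0).fit(flat)
# Distance to the nearest centroid: one score per window.
scores_i = km_i.transform(flat).min(axis=1)

# Reading ii: each time step is one sample with D features; the per-point
# scores are then aggregated per window (mean is an arbitrary choice here).
km_ii = KMeans(n_clusters=3, n_init=10, random_state=0).fit(values)
point_scores = km_ii.transform(values).min(axis=1)
scores_ii = sliding_window_view(point_scores, W).mean(axis=1)

print(flat.shape, scores_i.shape, scores_ii.shape)
print(n_windows_stride_1, n_windows_stride_w)
```

Both readings yield one score per stride-1 window, but reading i clusters whole window shapes while reading ii clusters individual values, so the scores are not comparable.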

Thank you for your time.

Additional context
Relevant darts documentation pages: https://unit8co.github.io/darts/generated_api/darts.ad.scorers.kmeans_scorer.html
And: https://unit8co.github.io/darts/generated_api/darts.ad.scorers.pyod_scorer.html

Sklearn Kmeans docs: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

tristandijkstra added the question (Further information is requested) and triage (Issue waiting for triaging) labels on Jan 30, 2025
