[QUESTION] Questions regarding anomaly scoring documentation #2660

Open
tristandijkstra opened this issue Jan 30, 2025 · 0 comments
Labels: question (Further information is requested), triage (Issue waiting for triaging)

Describe the issue linked to the documentation
I am having trouble fully understanding PyODScorer and KMeansScorer from the darts documentation pages. To make this easier to answer, I have split the problem into a few smaller questions:

  1. The documentation pages for PyODScorer/KMeansScorer say the data is split into moving windows of size W and give the following description: "For a series of length N, (N - W + 1)/W subsequences will be generated". Depending on whether the stride is 1 or W, either (N - W + 1) or (N // W) would be correct. Which one is it?

  2. The TimeSeries subsequence passed to the underlying scorer is said to be of shape (W * D). Does this mean a 2D array of shape (W, D), i.e. W by D, or a 1D vector of length W x D, i.e. W times D?

  3. According to the scikit-learn documentation, the k-means clusterer takes data of shape (n_samples, n_features). Which of the following is correct?

    1. The documentation indicates that n_features = W ("applying a score per vector of size W"). If so, why was this layout chosen instead of the alternative in the next sub-item?
    2. For the input subsequences, I would expect n_samples = W and n_features = D, so that for D = 1 the value at each point in time is clustered. If so, are the scores of the individual items aggregated into a single score per window?
  4. PyOD uses the same (n_samples, n_features) input format. Does the answer to question 3 also apply to PyODScorer?
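
To make questions 1 to 3 concrete, here is a small NumPy sketch of the interpretations I am comparing. It is not taken from the darts source: `sliding_windows` is a hypothetical helper written only for illustration, and the sizes N, W and D are arbitrary.

```python
import numpy as np


def sliding_windows(values, window, stride):
    """Cut values of shape (N, D) into windows of shape (n_windows, window, D).

    Hypothetical helper for illustration only; not the darts implementation.
    """
    n = values.shape[0]
    starts = range(0, n - window + 1, stride)
    return np.stack([values[s:s + window] for s in starts])


N, W, D = 10, 3, 2
series = np.arange(N * D, dtype=float).reshape(N, D)

# Question 1: the number of windows depends on the stride.
overlapping = sliding_windows(series, W, stride=1)  # N - W + 1 windows
disjoint = sliding_windows(series, W, stride=W)     # N // W windows

# Question 2: a single window is either a 2D (W, D) array
# or a flattened 1D vector of length W * D.
one_window_2d = overlapping[0]
one_window_1d = overlapping[0].reshape(-1)

# Question 3, first reading: one sample per window, with the window
# flattened into the feature axis (n_samples = n_windows, n_features = W * D).
samples_per_window = overlapping.reshape(overlapping.shape[0], W * D)

# Question 3, second reading: one sample per time step
# (n_samples = W inside each window, n_features = D).
samples_per_step = overlapping[0]

print(overlapping.shape, disjoint.shape)
print(one_window_2d.shape, one_window_1d.shape)
print(samples_per_window.shape, samples_per_step.shape)
```

Either `samples_per_window` or `samples_per_step` could be passed to an estimator that expects (n_samples, n_features); my question is which of these layouts the scorers actually use.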

Thank you for your time.

Additional context
Relevant darts documentation pages: https://unit8co.github.io/darts/generated_api/darts.ad.scorers.kmeans_scorer.html
And: https://unit8co.github.io/darts/generated_api/darts.ad.scorers.pyod_scorer.html

scikit-learn KMeans docs: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
