-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathExploring_Data_Patterns.Rmd
153 lines (116 loc) · 5.67 KB
/
Exploring_Data_Patterns.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
---
title: "Data Patterns"
output:
word_document: default
html_notebook: default
pdf_document: default
html_document: default
---
This section covers sessions 1-2, chapters 2-3 of HW.
Covered topics:<br>
- getting data into R <br>
- making a time series <br>
- exploring autocorrelation <br>
- how to do formulas from p.41-42 of HW <br>
- naive forecasting method <br>
- moving averages <br>
- exponential smoothing <br>
- Holt's linear smoothing <br>
- Winter's method <br>
- Winters-Holt <br>
<h3>Load Data into R and transform to Time Series</h3>
First, set your working directory to wherever your excel file is, otherwise you'll have to type in the complete file path.Also save your R project and your R script there.<br>
![](figures/exploringdatapatterns_setworkdir.png)
Now you need to create an object and read the data from an excel file. If you typed the values from the book, type them as one series (one column). If you are using someone's file, transform them to one column. You don't need the years and quarters.Here is how my excel file looks after transforming: <br>
![](figures/exploringdatapatterns_excelfile.png)
There are two tabs in the file and I concatenated the Year+Quarter. Now we want to import the data from Sheet1 to R.
```{r}
#For this, we use the package readxl and its function read_excel
#first, apply the function to the file
#create an object called firstdata0 and apply the function to it
#we have to specify the file name. This argument is a string, therefore use quotation marks ""
firstdata0 <- readxl::read_excel("first session.xlsx")
firstdata0
```
Clearly that's not what we want. Our goal is to get the time series in a friendly form.
```{r}
#remove firstdata0
remove(firstdata0)
#add some inputs to the function
#We additionally specify the file name, which sheet to take the data from, and the range. Why the range? I only want to get the first column.
firstdata <- readxl::read_excel("first session.xlsx", sheet='Sheet1', range = "A1:A48")
#this displays the first five entries
head(firstdata, n=5)
```
Now the data is loaded into R. Let's check what object type we have
```{r}
str(firstdata)
```
This is a data frame. We however want a timeseries because then we can use a number of specific functions.
```{r}
#to make a time series, create a new object and apply the ts function to it like this:
series <- ts(firstdata$Series, frequency=4, start=1988)
#as you can see, we defined the frequency of observations as 4 per year and the starting year as 1988.
#let's display the object
series
```
Let's plot a series (standard series of R-base).
```{r}
#we use a generic plot function, there are many more.
#we use type = "p" for points in the scatterplot. Other types are available, use Help in Rstudio(F1) to find out more.
#main is the title of the plot
library(seasonal)
sample_dataset <- unemp
plot(sample_dataset, type="p",main="First plot")
```
<h3> Autocorrelation </h3>
To do this, we use the acf function from the stats package, which is pre-loaded.
```{r}
# Load packages
library(zoo)
library(forecast)
# We specify the series, then we want to first look at the ACF, then we want to see it plotted. All until lag 10. Coredata is used to extract the timeseries plainly.
acf(coredata(sample_dataset), plot=FALSE, lag.max =10)
acf(coredata(sample_dataset), plot=TRUE, lag.max=50, xlim=c(0,50), ylab="Auto Correlogram", xlab="Lags in Quaterlies")
```
For forecasting we use the "TTR" package. Make sure to load this package. For the example we will be using the dataset "austres" that is preloaded in R. Fromt here you can use the package as shown below. Note that we are using an moving average of 3.
```{r}
library(TTR)
simpleMA <- SMA(sample_dataset, n=3)
plot.ts(simpleMA)
```
For decomposition of your forecast, it is important to analyse the seasonality, trend and randomness of your historic data. This can easily be done in R by calling the decompose function.
```{r}
decomposition_of_historic_data <- decompose(sample_dataset)
plot(decomposition_of_historic_data)
```
In order to create predictions we can use the HoltWinters Method. This is the most evolved version of the exponential smoothing methods. For simplicity, realize that beta and gamma can be set to zero, disabling the seasonal and/or trend component, reducing it to simpler technique.
```{r}
exp <- HoltWinters(sample_dataset, beta = 0, gamma = 0)
exp_seasonal <- HoltWinters(sample_dataset, gamma = 0)
exp_seasonal_trend <- HoltWinters(sample_dataset)
print(exp)
predictions <- exp$fitted
head(predictions)
plot(exp)
plot(exp_seasonal)
plot(exp_seasonal_trend)
cat("SSE exp", exp$SSE, "\n", "SSE exp Seasonality", exp_seasonal$SSE, "\n", "SSE exp Seasonality and Trend", exp_seasonal_trend$SSE, "\n")
```
In order calculate all the performance measures for the prediction a couple of numbers need to be retrieved. The formula v = n - m relates to the R objects as n = the amoun of values, m = the amount of coefficient. v represents the residual degrees of freedom. In order to calculate everthing the following steps need to be conducted:
```{r}
n <- length(exp$fitted)
m <- length(exp$coefficients)
v <- n - m
cat("With this formula we have established the values for n and m, those being", n, "and", m, "therefore we can calculate v by substraction.", "V is", v)
# Calculate MSE
MSE <- exp$SSE / v
cat("MSE =", MSE)
```
In order to forecast the data for hundred periods, one can use the forecast package. The syntax works as follows:
```{r}
exp_fore <- forecast::forecast(exp, h = 100)
exp_fore_trend_seasonal <- forecast::forecast(exp_seasonal_trend, h = 100)
plot(exp_fore, main="Forecast for Exponential Smothing (basic)")
plot(exp_fore_trend_seasonal, main = "Forecast for HoltWinters method with Trend and Seasonality")
```