---
title: "Reading Different Data Formats"
author: "郭耀仁"
date: "`r Sys.Date()`"
output: slidy_presentation
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, results = 'hide', warning = FALSE)
```
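## Reading csv
- csv, short for comma-separated values, is the format used most often in practice
- Read it with the `read.table()` function
- `sep = ","` tells R that the variables in the data are separated by commas
- `header = TRUE` tells R that the first row of the data contains the variable names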
```{r}
url <- "https://storage.googleapis.com/r_rookies/iris.csv" # a copy of the csv file stored in the cloud
iris_csv_df <- read.table(url, sep = ",", header = TRUE)
head(iris_csv_df)
```
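Calling `read.table()` with `sep = ","` and `header = TRUE` matches the defaults of base R's `read.csv()`. A minimal sketch, using a small hypothetical csv string passed through the `text =` argument instead of the cloud file so it runs offline:

```{r}
# A small hypothetical csv snippet, used instead of the cloud file
csv_text <- "Sepal.Length,Sepal.Width,Species
5.1,3.5,setosa
7.0,3.2,versicolor"

# read.table() with an explicit separator and header
df_table <- read.table(text = csv_text, sep = ",", header = TRUE)

# read.csv() uses sep = "," and header = TRUE as its defaults
df_csv <- read.csv(text = csv_text)

str(df_csv)
```

`read.csv()` also changes a few other defaults (for example `quote` and `comment.char`), so the two calls can give different results on files containing `'` or `#`.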
## Reading csv (2)
- Column types can be set while reading the data: pass a character vector to the `colClasses` argument, one type per column.
```{r}
url <- "https://storage.googleapis.com/r_rookies/iris.csv" # a copy of the csv file stored in the cloud
iris_csv_df <- read.table(url, sep = ",", header = TRUE, colClasses = c("numeric", "numeric", "numeric", "numeric", "character"))
str(iris_csv_df)
```
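One reason to set `colClasses` is to stop R from guessing a type you don't want. In this sketch with hypothetical data, an `id` column with leading zeros is guessed as numeric and loses its zeros; declaring it as `"character"` keeps them:

```{r}
# Hypothetical data: ids with leading zeros
csv_text <- "id,score
001,90.5
002,85"

# Without colClasses, R guesses that id is numeric (leading zeros are lost)
guessed <- read.table(text = csv_text, sep = ",", header = TRUE)

# With colClasses, id is kept as text and score is read as numeric
declared <- read.table(text = csv_text, sep = ",", header = TRUE,
                       colClasses = c("character", "numeric"))

str(guessed)
str(declared)
```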
## Reading tsv
- tsv is short for tab-separated values
- `sep = "\t"` tells R that the variables are separated by tab characters
```{r}
url <- "https://storage.googleapis.com/r_rookies/iris.tsv" # a copy of the tsv file stored in the cloud
iris_tsv_df <- read.table(url, sep = "\t", header = TRUE)
head(iris_tsv_df)
```
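Base R also provides `read.delim()`, whose defaults are `sep = "\t"` and `header = TRUE`. A minimal sketch comparing it with the `read.table()` call above, using a hypothetical inline snippet rather than the cloud file:

```{r}
# Hypothetical tab-separated snippet; "\t" is the tab character
tsv_text <- "Sepal.Length\tSpecies\n5.1\tsetosa\n7.0\tversicolor"

# read.table() with an explicit tab separator
df_table <- read.table(text = tsv_text, sep = "\t", header = TRUE)

# read.delim() uses a tab separator and a header row by default
df_delim <- read.delim(text = tsv_text)

str(df_delim)
```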
## Reading txt
- `sep = ":"` tells R that the variables are separated by colons
```{r}
url <- "https://storage.googleapis.com/r_rookies/iris.txt" # a copy of the txt file stored in the cloud
iris_colon_sep_df <- read.table(url, sep = ":", header = TRUE)
head(iris_colon_sep_df)
```
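A `.txt` extension says nothing about the delimiter, so `sep` has to match whatever character the file actually uses. A sketch with a hypothetical colon-separated snippet, also showing what happens when `sep` is left at its default (whitespace):

```{r}
# Hypothetical colon-separated snippet
txt_text <- "Sepal.Length:Sepal.Width:Species
5.1:3.5:setosa
7.0:3.2:versicolor"

# Correct delimiter: three columns
right <- read.table(text = txt_text, sep = ":", header = TRUE)

# Default sep = "" splits on whitespace, so each line stays a single field
wrong <- read.table(text = txt_text, header = TRUE)

dim(right)  # 2 rows, 3 columns
dim(wrong)  # 2 rows, 1 column
```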
## Loading Excel spreadsheets
- Use the `read_excel()` function from the `readxl` package
- First download <https://storage.googleapis.com/r_rookies/iris.xlsx> to the `~/Downloads` directory
```
install.packages("readxl")
```
```{r, results='hide'}
library(readxl)
download_path <- "~/Downloads/iris.xlsx"
iris_xlsx_df <- read_excel(download_path)
```
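`read_excel()` reads the first sheet by default. A sketch of choosing a sheet and a cell range, using the example workbook bundled with `readxl` (`readxl_example("datasets.xlsx")`) so no download is needed; the sheet name `"iris"` is assumed from that example file:

```{r}
library(readxl)

# Path to an example workbook bundled with readxl
path <- readxl_example("datasets.xlsx")

# List the sheet names in the workbook
excel_sheets(path)

# Read one sheet by name, limited to a cell range (header row + 5 data rows)
iris_part <- read_excel(path, sheet = "iris", range = "A1:C6")
iris_part
```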
## Loading SAS datasets
- The `read_sas()` function from the `haven` package loads SAS datasets
```
install.packages("haven")
```
```{r}
library(haven)
smoking_sas_data <- read_sas("http://storage.googleapis.com/r_rookies/smoking.sas7bdat")
```
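To try `read_sas()` without a network connection, `haven` ships a small example file. The path below follows the example in `haven`'s help page for `read_sas()`; treat it as an assumption worth checking against your installed version:

```{r}
library(haven)

# Example SAS dataset assumed to be bundled with the haven package
path <- system.file("examples", "iris.sas7bdat", package = "haven")

iris_sas <- read_sas(path)

# read_sas() returns a tibble (a kind of data frame)
class(iris_sas)
dim(iris_sas)
```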
## Loading JSON
- What is JSON (JavaScript Object Notation)?
- This is a JSON object:
```{r}
friends_json <- '{
"genre": "Sitcom",
"seasons": 10,
"episodes": 236,
"stars": ["Jennifer Aniston", "Courteney Cox", "Lisa Kudrow", "Matt LeBlanc", "Matthew Perry", "David Schwimmer"]
}'
```
## Loading JSON (2)
- Load it with the `fromJSON()` function from the `jsonlite` package
- The result is a **list**
```
install.packages("jsonlite")
```
```r
library(jsonlite)
friends_json <- '{
"genre": "Sitcom",
"seasons": 10,
"episodes": 236,
"stars": ["Jennifer Aniston", "Courteney Cox", "Lisa Kudrow", "Matt LeBlanc", "Matthew Perry", "David Schwimmer"]
}'
friends_list <- fromJSON(friends_json)
paste("Number of seasons of Friends:", friends_list$seasons)
paste("Who plays Rachel Green:", friends_list$stars[1])
```
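By default `fromJSON()` simplifies JSON arrays into R vectors, which is why `friends_list$stars` behaves like a character vector. Passing `simplifyVector = FALSE` keeps every array as a nested list instead. A sketch with a shortened version of the object above:

```r
library(jsonlite)

friends_json <- '{"seasons": 10, "stars": ["Jennifer Aniston", "Courteney Cox"]}'

# Default: the JSON array becomes a character vector
simplified <- fromJSON(friends_json)
class(simplified$stars)   # "character"

# simplifyVector = FALSE: the array stays a list with one element per entry
raw <- fromJSON(friends_json, simplifyVector = FALSE)
class(raw$stars)          # "list"
raw$stars[[1]]            # "Jennifer Aniston"
```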
## Loading JSON (3)
- This is an array of JSON objects
- Load it with the `fromJSON()` function from the `jsonlite` package
- The result is a **data frame**
```r
library(jsonlite)
starring_json <- '[
{"character": "Rachel Green", "star": "Jennifer Aniston"},
{"character": "Monica Geller", "star": "Courteney Cox"},
{"character": "Phoebe Buffay", "star": "Lisa Kudrow"},
{"character": "Joey Tribbiani", "star": "Matt LeBlanc"},
{"character": "Chandler Bing", "star": "Matthew Perry"},
{"character": "Ross Geller", "star": "David Schwimmer"}
]'
starring_df <- fromJSON(starring_json)
View(starring_df)
```
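The conversion also works in reverse: `toJSON()` turns a data frame back into an array of JSON objects, one object per row. A sketch with two rows of the data above:

```r
library(jsonlite)

starring_df <- data.frame(
  character = c("Rachel Green", "Monica Geller"),
  star = c("Jennifer Aniston", "Courteney Cox"),
  stringsAsFactors = FALSE
)

# Each row becomes one JSON object in an array
starring_json <- toJSON(starring_df)
starring_json

# Parsing it again gives back an equivalent data frame
round_trip <- fromJSON(starring_json)
```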