-
Notifications
You must be signed in to change notification settings - Fork 63
/
Copy pathTutorial7_ExportingWork.Rmd
385 lines (294 loc) · 12.5 KB
/
Tutorial7_ExportingWork.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
% Tutorial 7: Exporting Your Work
% DPI R Bootcamp
% Jared Knowles
```{r setup, include=FALSE}
# set global chunk options
opts_chunk$set(fig.path='figure/slides7-', cache.path='cache/slides7-',fig.width=8,fig.height=6,message=FALSE,error=FALSE,warning=FALSE,echo=TRUE,dev='svg')
#source('ggplot2themes.R')
load('data/smalldata.rda')
library(eeptools)
```
# Overview
In this lesson we hope to learn:
- Quick and basic export of results
- Writing a basic report
- Exporting graphics for use in other documents
- Reproducible research
<p align="center"><img src="img/dpilogo.png" height="81" width="138"></p>
# Why does Export Matter?
- Need to be able to share work with others and present it
- This can be tricky in R because R has so many choices for export
- R can talk to a number of outside formats including Excel, Word, PDF, PNG, HTML
- R can even be used as a web service: [https://public.opencpu.org/pages/](https://public.opencpu.org/pages/)
- We'll cover the basics, but there are a lot of outside resources for doing what you need in your own environment
# Generating a basic report
- There are a few key concepts that R allows that we should follow when preparing a report on data
1. Include the data, source code, and output together in one package
2. Have the source code available for raw data to finished product
3. Present figures, tables, and code in a single document
- Why do we do this?
* Transparency
* Reproducibility
* It isn't much harder than the basic analysis itself
# A few terms
- **Rmarkdown** extension **.Rmd** and used for R flavored markdown in RStudio
- **R script** extension **.R** used to store R code only
- **R HTML** extension **.RHtml** used to store R HTML
- **Text** extension **.txt** used to write plain text
- ** R Sweave** extension **.Rnw** used to write LaTeX reports with embedded R code
# Beginning
- Open a new script file in RStudio--"R Markdown"
- This opens a template file for an HTML report, the easiest type to create
- In R there are a number of packages designed to help create reports and place them on a scale like so:
```{r tradeoff,dev='svg',echo=FALSE,out.height='400px',out.width='700px'}
library(ggplot2)
pub<-c(1,12,17,20)
time<-c(1,4,10,15)
a<-qplot(time,pub)+geom_smooth(se=FALSE)+ylab('Quality')+xlab('Time')
a+geom_text(aes(x=4.8,y=11.5,label='R Markdown'))+
geom_text(aes(x=2.5,y=2.3,label='copy and paste'))+
geom_text(aes(x=10,y=16.2,label='knitr to PDF'))+
geom_text(aes(x=14,y=18.2,label='Sweave +bibtex'))+opts(
title='Quality v. Time Tradeoff in Exporting Information')+theme_dpi()
```
# Get the tools
- `install.packages('knitr','markdown')`; `sweave` is part of R base already
- We need a LaTeX distribution for Tex documents
- What is LaTeX?
- An open source typsetting framework that can be used to produce complex technical documents
- We won't mess with LaTeX today - instead we will focus on HTML and markdown
# A Simple Example
- Paste this in a new R script
- Save the script in your working directory as `myscript.R`
```{r dummyscript, echo=TRUE,eval=FALSE}
#' This is some text
#'
# + myplot, dev='svg',out.width='500px',out.height='400px'
library(ggplot2)
data(diamonds)
qplot(carat,price,data=diamonds,alpha=I(.3),color=clarity)
#' Diamond size is clearly related to price, but not in a linear fashion.
#'
```
# Converting the Script
- This happens in two steps to make an HTML file
```{r spinr,eval=FALSE}
o<-spin("C:/Path/To/myscript.R",knit=FALSE)
knit2html(o,envir=new.env())
```
# Example Script II
- Create `myscript2.R`
```{r scriptr2,eval=FALSE,echo=TRUE}
#' This is some text that I want to explain
#' For example, this plot is important, let's look below
# + myplot, dev='svg',out.width='500px',out.height='400px',warning=FALSE,message=FALSE
library(ggplot2)
load('PATH/TO/MY/DATA.rda')
qplot(readSS,mathSS,data=df,alpha=I(.2))+geom_smooth()
#' There is not a linear relationship, but it sure is close.
#' Let's do some regression
#'
test<-lm(mathSS~readSS+factor(grade),data=df)
summary(test)
#' It's all statistically significant
```
# Spin 2
```{r spinr23, eval=FALSE,echo=TRUE}
o<-spin("C:/Path/To/myscript2.R",knit=FALSE)
knit2html(o,envir=new.env())
# We specify that new environment is used to carry out the analysis, not the current environment
```
# Stitch
- Stiching is similar to spinning, but we don't even have to mess around with anything but our R script
- Just write the R code
- Any comments you write preceeded by the **#** denote something we want printed in the text
- Even more simplified than `spin`
# Stitch Example
```{r stitching}
## title: My Super Report ##
## Author: Mr. Data ##
# A plot and some text
library(ggplot2)
library(xtable)
load('PATH/TO/MY/DATA')
qplot(readSS,mathSS,data=df,alpha=I(.2))+geom_smooth()
# Now a linear model
test<-lm(mathSS~readSS+factor(grade),data=df)
summary(test)
# Ok!
```
# Stitch Spin
- A number of different outputs
```{r sdfafeaw, eval=FALSE,echo=TRUE}
# Markdown
stitch('PATH/TO/MY/SCRIPT',system.file("misc","knitr-template.Rmd",package="knitr"))
knit2html("Path/To/My/Markdown.md")
# Direct 2 Html
stitch('PATH/TO/MY/SCRIPT',system.file("misc","knitr-template.html",package="knitr"))
# Direct to PDF
# Requires LaTeX
stitch('PATH/TO/MY/SCRIPT')
```
# R Flavored Markdown
- Creating a **.Rmd** file allows us to use the `knitr` package to give us in-depth control over R HTML files
- Along the way we learn how to make great PDF files using the same toolkit
- This unified platform greatly simplifies the way R can be used in the workflow to make analyses a single coherent package from raw data to outputs
# R Report Concepts
- a **chunk** is the technical term (no really) for a piece of R code inserted in a document
- These slides are made up of text with some markup + chunks of R code
- **opts** tell R how to use the chunk
- Options allow us to keep or discard the chunk, display the results, source code, or both, display a figure or graph, and how to format the output we get from R
- This is all incredibly flexible and incredibly easy to implement with a few basic parameters
- You can view the source code of each one of these tutorials online to get a feel for the format (**.Rmd**)
# A quick example
```{r,tidy=FALSE,eval=FALSE,echo=TRUE}
# Start .Rmd file on next line
My Super Report on Student Testing
------------------------------------
Dr. Debateman
==============
In this report I plan to show you all the results of student testing in Myoming.
#```{r chunksetup, include=FALSE} (remove # in actual document)
load("PATH/TO/MY/DATA.rda")
source("myscript.R")
library(ggplot2)
#```
The most important thing to look at is this plot:
#```{r plot1,dev='png',fig.width=9,fig.height=6}
qplot(readSS,mathSS,data=df)
#```
And my model output can be included a few ways because it is so great.
#```{r mystatmodel,results='markup'}
mymod<-lm(readSS~mathSS+factor(grade),data=df)
summary(mymod)
#```
#```{r mystatmodel2,results='asis'}
mymod<-lm(readSS~mathSS+factor(grade),data=df)
print(xtable(summary(mymod)),type='html')
#```
And because I am awesome, I am done.
```
# Check for Understanding
- Paste the text into a .Rmd file
- Delete the first line
- Change "PATH/TO/MY/DATA" to the path to our data
- Click "Knit HTML" in RStudio
- Or:
```{r eval=FALSE}
knit("PATH/TO/myscript.Rmd",envir=new.env())
knit2html("Path/To/Myscript.md")
```
# Global options
- **knitr** has global options we can set for our whole document
- This allows us to customize our reports
- This is an example block of knitr options:
```{r eval=FALSE,echo=TRUE}
opts_chunk$set(fig.path='figure/slides7-', cache.path='cache/slides7-',fig.width=8,fig.height=6,message=FALSE,error=FALSE,warning=FALSE,echo=TRUE,dev='svg')
```
# Chunk Options
- We can choose a number of options about our chunks--whether to include them or not
- We can set the output of our graphics to be a certain size
- We can store or cache results of chunks so they do not have to be rerun with cache
- We can even print code that does not run but just gets displayed
# Making the Report a Bit Prettier
- A frustrating thing about R reports can be that they don't fit into our institutional reporting requirements
- The HTML framework is simple, but doesn't give us much flexibility (we need LaTeX for total control)
- But, with a bit of CSS hacking, we can get somewhere in between by pasting this at the head of our document
```
<style type="text/css">
body, td {
font-size: 14px;
}
r.code{
font-size: 10px;
}
pre {
font-size: 10px
}
</style>
```
# Exporting a Single Plot
- Sometimes we don't need to produce a whole report, but just save a particular plot and share it
- We do this with the `dev` argument where `dev` is equivalent to the graphic type we want to export
```{r singlegraphicoutput,echo=TRUE,eval=FALSE}
#PDF
pdf(file="PATH/TO/MYPLOT.PDF",width=10,heigh=8)
print(qplot(readSS,mathSS,data=df,alpha=I(.2)))
dev.off()
# PNG
png(file="PATH/TO/MYPLOT.png",width=1200,heigh=900)
print(qplot(readSS,mathSS,data=df,alpha=I(.2)))
dev.off()
```
- We can also always directly export from RStudio, which gives us a lot of options!
# Exporting data
- Of course, other times we need to export raw data or results of our analyses
- R is very flexible in these cases, and depending on what you want to do, you are probably best served by Googling a specific results
- However, a few functions are indispensible for such work
- These are the `foreign` library, `save`, `write.csv`, and `write.dta`
- Note that R can also write SPSS files, SAS files, and files for just about any statistics program (sometimes even Excel, but CSV is preferred for this purpose)
# Here's an Example
- When you want to save data you can use a simple line of code
```{r writedataout,eval=FALSE}
write.csv(df,file='PATH/TO/MY.csv')
write.dta(df,file='PATH/TO/MY.dta')
# save in the R file
save(df,file='PATH/TO/MY.rda',compress='xz')
```
# Exporting Tables
- Sometimes we need to export a table that is attractive
```{r uglytable}
table(df$female,df$schid)
```
- This is ugly and hard to tell what it means
- We need a better way
# We need XTABLE
- The `xtable` package provides a good way to do this output
```{r xtableexamp1,results='asis',echo=FALSE}
require(xtable)
print(xtable(table(df$ell,df$schid)),type='html')
print(xtable(as.table(cor(df[,26:30],df[,26:30]))),type='html')
```
# How?
```{r xtablecode,eval=FALSE,echo=TRUE}
require(xtable)
print(xtable(table(df$ell,df$schid)),type='html')
```
- A bit annoyingly we have to wrap our table in two functions:
- `xtable` reformats the table to an xtable object
- `print` exports the xtable object to the screen so whatever document processing system we use can grab it
- Lots of objects can be coerced into `xtable` objects, which helps
- `xtable` can be HTML or LaTeX output, giving flexibility for how we build our document
# Other options
- `apsrtable` for beautiful looking regression tables
- `Hmisc` has some nice output functions as well for HTML in particular
- Many specialty packages have their own way of wrapping some mark up language around their output
- We can always export with CSV directly to Excel
# Reproducibility Matters
- Point and click interfaces do not allow work to be reproduced
- Time is wasted when work cannot be repeated and produce the same results
- Not valid and reliable for decision making
# Advanced Topics
- Using `pandoc` or `slidify` to create HTML5 slides in R
- Writing full blown LaTeX reports
- Embedding video and visualization
- Changes CSS styles of reports
# References
1. [How to Use knitr](http://yihui.name/knitr/)
2. [CRAN Taskview: Reproducible Research](http://cran.r-project.org/web/views/ReproducibleResearch.html)
3. [A Sweave Demo](http://users.stat.umn.edu/~geyer/Sweave/)
4. [Donald Knuth on Literate Programming](http://www-cs-faculty.stanford.edu/~knuth/lp.html)
# Session Info
It is good to include the session info, e.g. this document is produced with **knitr** version `r packageVersion('knitr')`. Here is my session info:
```{r session-info}
print(sessionInfo(), locale=FALSE)
```
# Attribution and License
<p xmlns:dct="http://purl.org/dc/terms/">
<a rel="license" href="http://creativecommons.org/publicdomain/mark/1.0/">
<img src="http://i.creativecommons.org/p/mark/1.0/88x31.png"
style="border-style: none;" alt="Public Domain Mark" />
</a>
<br />
This work (<span property="dct:title">R Tutorial for Education</span>, by <a href="www.jaredknowles.com" rel="dct:creator"><span property="dct:title">Jared E. Knowles</span></a>), in service of the <a href="http://www.dpi.wi.gov" rel="dct:publisher"><span property="dct:title">Wisconsin Department of Public Instruction</span></a>, is free of known copyright restrictions.
</p>