2023-04-21
Reproducibility and transparency are essential for research. Tools are quickly evolving that allow combined presentation of narrative documentation, methods, analysis, and results, along with the code that generated the results. Having the code and results within one package is invaluable for checking over results or responding to reviews.
Two main integrated development environments for scientific research are Jupyter Notebooks, used mainly for Python, and RStudio Desktop and Server, used mainly for R.
For this session, we will be using RStudio Desktop with R Markdown to create a self-contained HTML file including code and results of geospatial analyses done within R and PostGIS.
The session consists of a brief lecture followed by a live demonstration.
This session follows on previous sessions presented at the CUGOS Spring Fling in years past:
R Markdown is a text format that combines readable narrative and R code for analysis and/or rendering of tables and graphics.
A process diagram (images from R Markdown Quick Tour) shows that the input .Rmd file is converted using knitr to a .md (markdown) file, which is then processed by pandoc, which produces the final document format (web page, PDF, MS Word document, slide show, handout, book, dashboard, package vignette or other format). The author creates the .Rmd file and the completed output is generated in R using the render() function or by clicking the Knit button in the RStudio interface.
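The first stage of this pipeline can be run by hand to see what knitr produces. The sketch below is a minimal illustration, assuming the knitr package is installed; the tiny document and the file names are made up for the example. It writes a small .Rmd file to a temporary directory and calls knitr::knit() to create the intermediate .md file. A call to render() would run this step and then pass the result to pandoc.

```r
# Minimal sketch: run only the knitr stage (.Rmd -> .md) on a throwaway file.
# The document content and file names are illustrative assumptions.
fence <- strrep("`", 3)  # three backticks, built here to keep the listing readable

rmd_lines <- c(
  "---",
  "title: \"Pipeline demo\"",
  "---",
  "",
  "Number of rows in the cars data set:",
  "",
  paste0(fence, "{r}"),
  "nrow(cars)",
  fence
)

rmd_file <- file.path(tempdir(), "pipeline_demo.Rmd")
md_file  <- file.path(tempdir(), "pipeline_demo.md")
writeLines(rmd_lines, rmd_file)

md_lines <- NULL
if (requireNamespace("knitr", quietly = TRUE)) {
  # knit() executes the R chunk and writes plain markdown with the results
  knitr::knit(input = rmd_file, output = md_file, quiet = TRUE)
  md_lines <- readLines(md_file)
  cat(md_lines, sep = "\n")
}
```

The printed markdown contains the chunk's source code followed by its evaluated output, which is what pandoc then converts to the final format.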
The benefit of markdown is that the format is very simple compared to more complex markup languages (e.g., HTML or \(\LaTeX\)). On the left is some R Markdown text and on the right is the rendered web page. The examples show formatting for section headings, font emphasis (italics, bold), bulleted lists, monospaced code blocks, \(\LaTeX\) format equations, and footnotes.
R code chunks are written within delimited code regions. The R code is run during the rendering process to perform analyses or other tasks. If the R code chunk creates a graph or table, the output is placed within the rendered document. For example, here we see a statistical summary of speed and stopping distance from R’s built-in cars data set, as well as a scatter plot and locally smoothed regression curve with confidence intervals.
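A chunk producing output of this kind might contain code like the following. This is a sketch only: it uses base R graphics and loess() as assumptions, and the slide's own figure may have been drawn with a different plotting package. The confidence band is an approximate 95% pointwise interval built from the loess standard errors.

```r
# Summary statistics for the built-in cars data (speed in mph, dist in feet)
print(summary(cars))

# Fit a locally smoothed (loess) regression of stopping distance on speed
fit  <- loess(dist ~ speed, data = cars)
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed), length.out = 100))
pred <- predict(fit, newdata = grid, se = TRUE)

# Approximate 95% pointwise confidence band from the standard errors
crit  <- qt(0.975, df = pred$df)
lower <- pred$fit - crit * pred$se.fit
upper <- pred$fit + crit * pred$se.fit

# Scatter plot with the smoothed curve (solid) and confidence band (dashed)
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)", pch = 19)
lines(grid$speed, pred$fit, lwd = 2)
lines(grid$speed, lower, lty = 2)
lines(grid$speed, upper, lty = 2)
```

Placed inside a chunk, the summary table and the plot would both appear in the rendered document directly below the code.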
Graphics and tables can be captioned with automatically generated caption numbers, which can also be used in cross-references.
“Inline expressions” can be used to include calculated R outputs within the narrative. Usually these are numerical values calculated from your data.
This allows narrative text to include calculated or data-driven values rather than copy/pasted text that would require manual updating and presents the risk of including incorrect results.
An example paragraph using inline code:
For example, in the cars data set there are 50 observations; the mean and standard deviation of speed and distance was 15.4 (5.3) miles per hour and 43 (25.8) feet.
The code for this paragraph is:
For example, in the cars data set there are `r nrow(cars)` observations; the mean and standard deviation of speed and distance was `r cars %>% pull(speed) %>% mean() %>% round(1)` (`r cars %>% pull(speed) %>% sd() %>% round(1)`) miles per hour and `r cars %>% pull(dist) %>% mean() %>% round(1)` (`r cars %>% pull(dist) %>% sd() %>% round(1)`) feet.
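The inline expressions above rely on the pipe operator and pull(), so a package such as dplyr needs to be loaded earlier in the document. As a rough check outside of R Markdown, the same values can be computed in base R; the variable names below are chosen only for this illustration.

```r
# Base R versions of the values the inline expressions insert into the text
n_obs      <- nrow(cars)
speed_mean <- round(mean(cars$speed), 1)
speed_sd   <- round(sd(cars$speed), 1)
dist_mean  <- round(mean(cars$dist), 1)
dist_sd    <- round(sd(cars$dist), 1)

# Assemble the sentence the way the rendered paragraph reads
sentence <- sprintf(
  "In the cars data set there are %d observations; the mean and standard deviation of speed and distance was %s (%s) miles per hour and %s (%s) feet.",
  n_obs, speed_mean, speed_sd, dist_mean, dist_sd
)
cat(sentence, "\n")
```

If the data changed, re-running the code would update every number in the sentence, which is the point of using inline expressions instead of pasted values.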
The first few lines of a .Rmd file contain metadata about the document as well as specifications for output formatting options. For example, this slide show uses the following:
---
title: "R Markdown for OSGEO"
author: "[Phil Hurvitz](mailto:phurvitz@uw.edu)"
date: '`r format(Sys.time(), "%Y-%m-%d %H:%M")`'
output:
  slidy_presentation:
    css: ['styles.css', 'https://fonts.googleapis.com/css?family=Open+Sans']
---
The YAML header specifies the output as slidy_presentation; however, other options can specify a multitude of output formats and formatting characteristics. Usually only minor changes in the .Rmd code are needed to switch from one output format to another.
Some other examples from https://rmarkdown.rstudio.com/authoring_quick_tour.html:
---
title: "Sample Document"
output:
  pdf_document:
    toc: true
    highlight: zenburn
---

---
title: "Sample Document"
output:
  html_document:
    toc: true
    theme: united
  pdf_document:
    toc: true
    highlight: zenburn
---
Another example uses a mailto hyperlink for my name and automatically adds the render date and time:
---
title: "GIS examples"
author: "[Phil Hurvitz](mailto:phurvitz@uw.edu)"
date: '`r format(Sys.time(), "%Y-%m-%d %H:%M")`'
header-includes: #allows you to add in your own Latex packages
- \usepackage{float} #use the 'float' package
- \floatplacement{figure}{H} #make every figure with caption = h
output:
  bookdown::html_document2:
    number_sections: true
    self_contained: true
    code_folding: hide
    toc: true
    toc_float:
      collapsed: true
      smooth_scroll: false
  pdf_document:
    number_sections: true
    toc: true
    fig_cap: yes
    keep_tex: yes
urlcolor: blue
---
The .Rmd file can be rendered to different outputs using the Knit control in RStudio:
Or by using the R command line, e.g., for this presentation:
rmarkdown::render("rmarkdown_for_osgeo_hurvitz_20230222.Rmd")
Because the default output format in the YAML header is slidy_presentation, the .Rmd file is automatically rendered to a .html HTML5 file as a Slidy presentation.
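Rendering from the R console also makes it possible to pick a format other than the YAML default through the output_format argument of rmarkdown::render(). The following is a sketch under stated assumptions: render_as() is a hypothetical helper written for this example, the input file is a throwaway document, and the render step is skipped if rmarkdown or pandoc is not available.

```r
# Sketch: render one .Rmd to a chosen format from the R console.
# render_as() is a hypothetical helper, not part of rmarkdown.
render_as <- function(input, format = "html_document") {
  ok <- requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()
  if (!ok) {
    message("rmarkdown or pandoc not available; skipping render")
    return(invisible(NULL))
  }
  # output_format overrides the default format given in the YAML header
  rmarkdown::render(input, output_format = format, quiet = TRUE)
}

# Throwaway input file so the example is self-contained
demo_rmd <- file.path(tempdir(), "format_demo.Rmd")
writeLines(
  c("---",
    "title: \"Format demo\"",
    "---",
    "",
    "Mean speed in the cars data: `r mean(cars$speed)` miles per hour."),
  demo_rmd
)

# render() returns the path of the file it created
out_file <- render_as(demo_rmd, "html_document")
```

Other format names, such as "word_document" or "pdf_document", can be passed in the same way; PDF output additionally requires a LaTeX installation.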
Now that we have covered the basics of R Markdown, we will shift to a live demonstration using OSGEO.
Switch to CSDE terminal server
Rendered version: CUGOS 2023 Hurvitz
We covered:
R:
\(+\) A great framework for programming, analysis, and generating readable content that includes source code (great for research).
\(+\) Relatively easy to read and write source code.
\(+\) Stand-alone system.
\(-\) Processes run in memory = a problem for large data sets.
\(-\) No spatial indexes?
PostGIS:
\(+\) High power, great for large data sets.
\(+\) Data and functions within the same system.
\(+\) Can be managed within the R framework with DBI.
\(+\) Spatial indexes!
\(-\) Another software layer to manage.
\(-\) Requires server (but server and client can be the same machine).
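As a rough illustration of managing PostGIS from R with DBI, the sketch below defines a connection helper, a statement that creates a spatial index, and a bounding-box count that can use that index. Everything specific here is an assumption for the example: the RPostgres driver, the connection settings, the table places, and its geometry column geom (taken to be in EPSG:4326). None of these come from the session itself.

```r
# Sketch: run PostGIS queries from R through DBI.
# Connection settings and the table/column names (places, geom) are
# hypothetical placeholders.
connect_postgis <- function(dbname = "gisdb", host = "localhost",
                            port = 5432, user = "gisuser", password = "") {
  DBI::dbConnect(RPostgres::Postgres(),
                 dbname = dbname, host = host, port = port,
                 user = user, password = password)
}

# A GiST index lets PostGIS answer bounding-box searches without a full scan
index_sql <- "CREATE INDEX IF NOT EXISTS places_geom_idx ON places USING GIST (geom);"

# Count rows whose bounding box overlaps a lon/lat rectangle ($1..$4 are parameters)
bbox_sql <- paste(
  "SELECT count(*) AS n FROM places",
  "WHERE geom && ST_MakeEnvelope($1, $2, $3, $4, 4326);"
)

count_in_bbox <- function(con, xmin, ymin, xmax, ymax) {
  DBI::dbGetQuery(con, bbox_sql, params = list(xmin, ymin, xmax, ymax))$n
}

# Usage (requires a running PostGIS server, so it is left commented out):
# con <- connect_postgis()
# DBI::dbExecute(con, index_sql)
# count_in_bbox(con, -122.5, 47.4, -122.2, 47.8)
# DBI::dbDisconnect(con)
```

The data stay in the database and only the query result returns to R, which is how this approach sidesteps R's in-memory limits for large data sets.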