In a series of entries on work automation, I would like to focus on discussing examples of improving recurring tasks. I will try to present solutions to the same problems using various tools (like R, Python, VBA, etc.). Today’s entry concerns the automation of a simple process using R.

If you often deal with cyclical tasks such as download data > summarise data > prepare Word document > send an email and do not do it in an automated way, then this entry is for you!


Source code is available on my GitHub.

# Packages:
  
require(magrittr) # pipe
require(rmarkdown) # rendering documents
require(jsonlite) # JSON processing
require(ggplot2) # plots
require(readr) # file processing
require(mailR) # e-mail

Assume that the boss or client would like to receive a mini-analysis every day regarding changes in gold prices. Necessary calculations as a percentage change relative to the previous day and the first day of the year, or the chart date ~ price. Can the time devoted to this task be shortened to one-click? Let’s find out…

The task can be reduced to a simple diagram shown in the figure below:

STEP 1. DOWNLOAD DATA

Downloading data is easy because the NBP provides API.

# fn is attachment
fn <- paste0("tmp/prices_", format(Sys.Date(), '%Y_%m_%d'), ".csv")
dir.create("tmp", showWarnings = FALSE)

# data from the beginning of the year
prices <- paste0('http://api.nbp.pl/api/cenyzlota/',
                 format(Sys.Date(), '%Y'), '-01-01/', 
                 format(Sys.Date(), '%Y-%m-%d'),
                 '?format=json') %>%
  fromJSON %>%
  set_colnames(c('date', 'price')) %T>%
  write.csv2(fn, row.names = FALSE)

STEP 2. PREPARE REPORT

I use the rmarkdown package to generate a report and email content. For this purpose, I have prepared a parameterized template in the Rmd format, which is available here.

At the beginning of the document, I define the parameter names passed from outside. prices is a table downloaded using the API. is_html informs if the content of the mail (Html) or the report in Docx format is generated.

---
...
params:
  prices: "prices"
  is_html: "is_html"
---

There is a fragment responsible for formatting the text describing changes in gold prices. When the percentage change is positive, the appropriate fragment will be colored green. In the case of a negative value, the relevant part of the text will be colored red.

n = nrow(params$prices)

dn = (params$prices$price[n] - params$prices$price[n-1])/params$prices$price[n-1]

if (dn > 0) {
  dn_desc = '**<font color="green">increased by</font>**'
} else if (dn < 0) {
  dn_desc = '**<font color="red">decreased by</font>**'
}
dn %<>% formatC(digits = 4, format = "f")
dn_desc %<>% paste(dn, "% compared to")

if (params$prices$price[n] == params$prices$price[n-1]) {
  dn_desc = "has not changed in relation to"
}

d1 = (params$prices$price[n] - params$prices$price[1])/params$prices$price[1]

if (d1 > 0) {
  d1_desc = '**<font color="green">increased by</font>**'
} else if (d1 < 0) {
  d1_desc = '**<font color="red">decreased by</font>**'
}
d1 %<>% formatC(digits = 4, format = "f")
d1_desc %<>% paste(d1, "% compared to")

if (params$prices$price[n] == params$prices$price[1]) {
  d1_desc = "has not changed in relation to"
}

The R code can be placed in the text.

The current gold price is **```r params$prices$price[n]```** PLN (```r params$prices$date[n]```).

Sometimes, placing R code into text is very difficult (especially when you want to use conditional statements or loops). For this reason, set the chunk as follows {r, results = 'asis'} and use the cat function. The generated content will be formatted as plain text.

cat("The price ", dn_desc, " the price from ", 
     params$prices$date[n-1], " (", params$prices$price[n-1], " PLN) ",
     "and ", d1_desc, " the price from the beginning of the year ",
     "(", params$prices$price[1],  " PLN).", sep = "")

All that remains is to insert the image into the document. A conditional instruction has covered this fragment due to the way the mail works. By default, the rmarkdown package renders all images to base64 format (regardless of how we save them in the document). The problem is that the mail client does not allow sending such a form - we have to send images as attachments. Because of this only after rendering the HTML document, I replace the fragment %%plot%% with the tag <img>.

if (params$is_html) {
  cat('%%plot%%')
} else {
  ggplot(prices, aes(as.Date(date), price)) + 
    geom_point() + 
    xlab("date") + 
    ggtitle(paste0("Gold prices in ", format(Sys.Date(), "%Y")))
}

Rendering of documents.

# html
render(input         = "template.Rmd",
       output_file   = "email.html",
       output_format = "html_document",
       params        = list(prices = prices,
                            is_html = TRUE),
       encoding      = "utf-8")

# %%plot%% replacement
read_file("email.html") %>%
  gsub("%%plot%%", '<img src="tmp/plot.png">', ., fixed = TRUE) %>%
  readr::write_file("email.html")

# docx
render(input         = "template.Rmd",
       output_file   = "report.docx",
       output_format = "word_document",
       params        = list(prices = prices,
                            is_html = FALSE),
       encoding      = "utf-8")

STEP 3. SEND E-MAIL

Sending e-mail is based on the mailR package. Configuration can be complicated (unless someone uses such solutions as a docker) because it works based on Java. It probably requires a version no newer than 8.

email <- send.mail(from         = email_from, # config file
                   to           = email_to, # config file
                   subject      = paste0("NBP > GOLD PRICES > ", format(Sys.Date(), '%Y-%m-%d')),
                   body         = "email.html",
                   encoding     = "utf-8",
                   html         = TRUE,
                   smtp         = smtp_config, # config file
                   inline       = TRUE,
                   attach.files = c(fn, "report.docx"),
                   authenticate = TRUE,
                   send         = FALSE,
                   debug        = TRUE)

email$send()

STEP 4. JOB AUTOMATION

All you have to do is run the script. Now you can say that with “one-click” you perform the entire process presented in the initial diagram. As a result, we receive such a message:

In the end, you can use the Linux CRON or Windows task manager, and you will not even need one-click.

BONUS

For those familiar with the docker, I have prepared a Dockerfile with a defined work environment. In this way, you can omit the configuration part of the packages and run the script using the command docker run image_name. The image can be built for a long time and is not the optimal size, but that’s not what this entry was about.

FROM r-base:3.5.0

RUN apt-get update
RUN apt-get install -y --fix-missing openjdk-8-jdk libcurl4-openssl-dev
RUN rm /var/lib/apt/lists/* -R

RUN R CMD javareconf

RUN Rscript -e "install.packages('jsonlite')"
RUN Rscript -e "install.packages('rmarkdown')"
RUN Rscript -e "install.packages('readr')"
RUN Rscript -e "install.packages('curl')"
RUN Rscript -e "install.packages('ggplot2')"
RUN Rscript -e "install.packages('rJava')"
RUN Rscript -e "install.packages('mailR')"

RUN wget https://github.com/jgm/pandoc/releases/download/1.19.2.1/pandoc-1.19.2.1-1-amd64.deb
RUN dpkg -i pandoc-1.19.2.1-1-amd64.deb

COPY scripts /scripts

WORKDIR /scripts

CMD Rscript download_prepare_send.R