---
title: "Using CrossVA and openVA"
author: "Jason Thomas"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using CrossVA and openVA}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 7, fig.align = "center")
library(CrossVA)
library(openVA)
```

This vignette provides several examples of preparing verbal autopsy (VA) data
using the CrossVA function `odk2openVA` and then assigning a cause of death (CoD)
using openVA.  CrossVA is designed to work with VA data collected using a
questionnaire with the same format as the 2016 VA instruments developed by the
World Health Organization (WHO) -- versions 1.4.1 and 1.5.1 of the 2016 WHO VA
instrument and the final version of the 2014 WHO VA instrument are currently supported.

# Example R Sessions

CrossVA has a wrapper function with an option to specify the WHO instrument used to
collect the data: 

* `odk2openVA` -- converts 2014 & 2016 forms to work with
   `insilico(data.type = "WHO2016")` and `InterVA5`

This function accepts an export from an Open Data Kit (ODK) Aggregate server as
its input.  The exported records need to be a comma-separated-values (CSV) file.


Before we load our data and assign CoD, we need to know the path of the folder
that contains the data.  Sometimes it is useful to change R's current working
directory to the folder where the data file is located:

```{r setCWD, eval = FALSE}
# Print the current working directory
getwd()
#> [1] "C:/Users/LeoMessi/"

# Change your current working directory as follows
setwd("C:/Users/LeoMessi/Verbal-Autopsy")
getwd()
#> [1] "C:/Users/LeoMessi/Verbal-Autopsy"

# Print the files in your current working directory with the dir() function
# (the CSV file you created with ODK Briefcase should be listed)
dir()
#> [1] "vaData_2016.csv"    "vaData_2012.csv" "rCode_for_cleaning_vaData.R"
#> [4] "vaData_report.pdf"  "cat_Videos"

# Load data into R
odkexport <- read.csv("vaData_2016.csv", stringsAsFactors = FALSE)
```

If you prefer to keep your data in one folder and your R code in another
directory, you can instead pass the path to the CSV data file exported by ODK
Briefcase to `read.csv()` -- this is done below with the example data file
`who151_odk_export.csv`.
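
As a minimal sketch of working with a full path, the base R function `file.path()` can build the location of the CSV file so that the working directory never has to change.  The folder name (`va-data`), the file name, and the two-column toy export are hypothetical stand-ins, created under a temporary directory only so the example runs anywhere.

```r
# Hypothetical data folder, created under tempdir() so the sketch is self-contained
data_dir <- file.path(tempdir(), "va-data")
dir.create(data_dir, showWarnings = FALSE)

# Write a tiny stand-in for an ODK export (made-up columns, not a real VA record)
toy_export <- data.frame(id = c("uuid:001", "uuid:002"),
                         sex = c("female", "male"),
                         stringsAsFactors = FALSE)
csv_path <- file.path(data_dir, "vaData_2016.csv")
write.csv(toy_export, csv_path, row.names = FALSE)

# Read the file by its full path; the working directory is left unchanged
odkexport <- read.csv(csv_path, stringsAsFactors = FALSE)
dim(odkexport)
```

With real data, only the `read.csv()` line is needed, with `csv_path` pointing at the actual export.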

## Analysis of 2016 WHO Verbal Autopsy Instrument

Again, CrossVA currently supports versions 1.5.1 and 1.4.1 of the 2016 WHO VA
questionnaire. If the input data frame (from ODK) includes a column whose name
contains the string `"age_neonate_hours"`, the function assumes the questionnaire
version is 1.4.1; if no such column is found, it assumes version 1.5.1.
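
The rule can be sketched in a few lines of base R.  This is only an illustration of the rule as stated here, not CrossVA's source code: the helper `guess_who2016_version()` and the column names in the toy data frames are hypothetical.

```r
# Hypothetical helper: mirrors the rule described in the text, not CrossVA's code
guess_who2016_version <- function(odk_df) {
  has_hours <- any(grepl("age_neonate_hours", tolower(names(odk_df)), fixed = TRUE))
  if (has_hours) "1.4.1" else "1.5.1"
}

# Toy data frames with made-up column names in the style of an ODK export
toy_v141 <- data.frame(consented.info_on_deceased.age_neonate_hours = 5)
toy_v151 <- data.frame(consented.info_on_deceased.ageInDays = 12)

guess_who2016_version(toy_v141)
#> [1] "1.4.1"
guess_who2016_version(toy_v151)
#> [1] "1.5.1"
```

Running a check like this on your own export before conversion is a quick way to confirm which version will be assumed.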

### Questionnaire version 1.5.1

```{r loadData, eval = FALSE}
# Start by loading the CrossVA and openVA packages from your library
library(CrossVA)
library(openVA)

# Load the CSV from ODK Briefcase (here we use the example data file from the CrossVA package)
## fileName_v151 contains the path to the example data file
fileName_v151 <- system.file("sample", "who151_odk_export.csv", package = "CrossVA")
fileName_v151
#> [1] "C:/Users/LeoMessi/R/win-library/3.5/CrossVA/sample/who151_odk_export.csv"

dir("C:/Users/LeoMessi/R/win-library/3.5/CrossVA/sample/")
#> [1] "who151_odk_export.csv"    "who141_odk_export.csv"    "who2014_odk_export.csv"

# Use the read.csv() function to load the data
odkexport_v151 <- read.csv(fileName_v151, stringsAsFactors = FALSE)
```

```{r silentLoadva151, include = FALSE}
fileName_v151 <- system.file("sample", "who151_odk_export.csv", package = "CrossVA")
odkexport_v151 <- read.csv(fileName_v151, stringsAsFactors = FALSE)
```

Now that the CSV file is loaded into R, we can use CrossVA's function
`odk2openVA` to convert the CSV file into the proper format (i.e., a data frame
with 354 columns).

```{r runCrossVA}
# Convert VAs using the odk2openVA() function
## we will be able to use either InterVA5 or insilico(data.type = "WHO2016") to assign CoD
openva_input_v151 <- odk2openVA(odkexport_v151)

# For 2016 WHO VA instrument, the output needs to have 354 columns (1 ID + 353 symptoms)
dim(openva_input_v151)

# ID must be the first column
names(openva_input_v151)
```
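
The two structural requirements, 354 columns with the ID first, can also be checked programmatically.  The sketch below uses a hypothetical helper, `check_openva_input()`, that is not part of CrossVA or openVA, and it assumes the first column is literally named `ID`.  A toy data frame with placeholder values stands in for real converted data.

```r
# Hypothetical helper (not part of CrossVA or openVA): basic structural checks
check_openva_input <- function(x, n_expected = 354) {
  stopifnot(is.data.frame(x))
  c(n_columns_ok = ncol(x) == n_expected,
    id_first     = identical(names(x)[1], "ID"))
}

# Toy stand-in with the right shape: an ID column plus 353 placeholder symptoms
toy <- data.frame(ID = c("d1", "d2"),
                  matrix(".", nrow = 2, ncol = 353),
                  stringsAsFactors = FALSE)
check_openva_input(toy)

# With a real conversion, the call would be:
# check_openva_input(openva_input_v151)
```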

Now that the VA records have been converted into the expected format, we can use
the tools in the openVA package to analyze the data.  There are separate
functions for each algorithm: `InterVA`, `InterVA5`, and `insilico`.  For your
convenience, openVA also includes a wrapper function, `codeVA`, which can call
any of these algorithms to assign CoD.

```{r InterVA5}
# InterVA5
run1 <- InterVA5(openva_input_v151,
                 HIV = "l",
                 Malaria = "l",
                 write = TRUE,
                 directory = getwd())

# We could also use codeVA() to get the same results:
## run1 <- codeVA(openva_input_v151,
##                data.type = "WHO2016",
##                model = "InterVA",
##                version = "5.0",
##                HIV = "l",
##                Malaria = "l",
##                write = TRUE,
##                directory = getwd())
```

By default, `write = TRUE`, which requires that we pass an argument
to `directory` -- the folder where the log file is created.  The log file
includes information about the VA records that are excluded from the analysis
(usually because they have a missing value for age and/or sex) as well as any
changes made to ensure the indicators are consistent with each other.  We can
use the following commands to summarize the results.

```{r InterVA5-summary}
# List the top 5 causes in the Cause-Specific Mortality Fraction (CSMF)
summary(run1)

# We can list more causes with the top parameter.
summary(run1, top = 10)

# Create a bar plot of the CSMF.
plotVA(run1)

# InterVA5 will also write a CSV file, called VA5_result.csv, with the CoDs for each record.
# Also note that InterVA5 created the log file, errorlogV5.txt
dir()
```
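
The `directory` argument does not have to be the working directory.  As a sketch, the output files could be collected in a dedicated folder and the log inspected afterwards.  The folder name `interva5-output` is hypothetical, and the `InterVA5()` call is left commented out because it depends on the converted data from the earlier chunks.

```r
# Hypothetical output folder; tempdir() keeps the sketch from touching real files
out_dir <- file.path(tempdir(), "interva5-output")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# The InterVA5 call would then point at that folder:
# run1 <- InterVA5(openva_input_v151, HIV = "l", Malaria = "l",
#                  write = TRUE, directory = out_dir)

# Afterwards, read the first lines of the log if it was written
# (file name taken from the comment in the chunk above)
log_path <- file.path(out_dir, "errorlogV5.txt")
if (file.exists(log_path)) {
  head(readLines(log_path))
} else {
  message("No log file found in ", out_dir)
}
```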

We can also assign CoDs using the InSilicoVA algorithm.

```{r InSilico}
run2 <- insilico(openva_input_v151, data.type = "WHO2016")

## run2 <- codeVA(openva_input_v151,
##                data.type = "WHO2016",
##                model = "InSilicoVA",
##                version = "WHO2016")

# Print CSMF for top 6 causes
summary(run2, top = 6)

# Plot CSMF
plotVA(run2)
```
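
Both summaries above report a CSMF, which in its simplest form is the share of deaths attributed to each cause.  The base R sketch below computes that share from a made-up vector of individual cause assignments.  It is a simplification for illustration: the algorithms in openVA may spread a single death across several likely causes, so their reported fractions need not match a plain tally like this one.

```r
# Made-up individual cause assignments (illustrative labels, not real results)
assigned_cause <- c("HIV/AIDS related death", "Malaria", "Malaria",
                    "Acute resp infect incl pneumonia", "Malaria",
                    "HIV/AIDS related death", "Undetermined", "Malaria")

# CSMF as a simple share of deaths per cause, sorted from largest to smallest
csmf <- sort(prop.table(table(assigned_cause)), decreasing = TRUE)
round(csmf, 3)

# The fractions sum to 1 across all causes
sum(csmf)
```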

### Questionnaire version 1.4.1

```{r example141}
# If you have not run the previous code, make sure you have loaded the packages
# library(CrossVA)
# library(openVA)
fileName_v141 <- system.file("sample", "who141_odk_export.csv", package = "CrossVA")
odkexport_v141 <- read.csv(fileName_v141, stringsAsFactors = FALSE)

## Since odkexport_v141 has a column name that includes "age_neonate_hours"
## odk2openVA will assume the questionnaire version is 1.4.1
col_age_neonate_hours <- grep("age_neonate_hours",
                              tolower(names(odkexport_v141)))
col_age_neonate_hours
names(odkexport_v141)[col_age_neonate_hours]

# Convert VAs using the odk2openVA() function for version 1.4.1
## we will be able to use either InterVA5 or insilico(data.type = "WHO2016") to assign CoD
openva_input_v141 <- odk2openVA(odkexport_v141)
dim(openva_input_v141)

# Assign CoD with model = InterVA5 and codeVA
run3 <- codeVA(openva_input_v141,
               data.type = "WHO2016",
               model = "InterVA",
               version = "5.0",
               HIV = "l",
               Malaria = "l",
               write = TRUE,
               directory = getwd())

## Summarize InterVA5 results
summary(run3)
plotVA(run3)

# Assign CoD with model = InSilicoVA and codeVA
run4 <- codeVA(openva_input_v141,
               data.type = "WHO2016",
               model = "InSilicoVA")

## Summarize InSilicoVA results
summary(run4)
plotVA(run4)
```