The R programming language provides a suite of useful tools for performing data science tasks. R enables data analysts to supercharge their analysis workflow, producing reproducible analyses and reports on tabular data in a highly readable syntax. Building on the fundamentals introduced in R for Data Science I, trainees will learn to produce publication-ready data visualizations, interact with databases, simplify data manipulation code, and analyze text data with tidyverse packages.
What Will I Learn From This Course?
A laptop with the RStudio IDE and R (version 3.2 or above) installed is ideal. R’s system requirements are minimal; however, we recommend a system with 4GB of RAM or more.
We highly recommend that trainees install the tidyverse packages before coming to class with install.packages("tidyverse"). Trainees may also use RStudio Cloud, an online hosted instance of R designed for data science training, provided a stable internet connection at the venue can be guaranteed.
R for Data Science I
Course Outline for This Programme
- Theming options
- geom_text, geom_label
Refine your ggplot2 plots to be publication-ready. Learn how factors relate to ggplot2 visuals, theming and labelling options in ggplot2, and annotation with the text and label geoms.
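As a rough preview of this module's topics, the sketch below reorders a factor so the bars are sorted, adds titles and axis labels with labs(), applies a theme, and annotates each bar with geom_label(). It assumes ggplot2 and dplyr are installed and uses the mpg dataset that ships with ggplot2; the choice of chart, colours and theme is illustrative only, not course material.

```r
library(ggplot2)
library(dplyr)

# Count car models per vehicle class in ggplot2's built-in mpg dataset
class_counts <- mpg %>%
  count(class) %>%
  # Turn class into a factor whose levels follow the counts,
  # so ggplot2 draws the bars in ascending order
  mutate(class = reorder(class, n))

p <- ggplot(class_counts, aes(x = class, y = n)) +
  geom_col(fill = "steelblue") +
  # Annotate each bar with its count, nudged just above the bar
  geom_label(aes(label = n), vjust = -0.2) +
  labs(title = "Number of car models by class",
       x = "Vehicle class", y = "Count",
       caption = "Source: ggplot2::mpg") +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(face = "bold"))

# Draw the plot
print(p)
```

Because the x aesthetic is a factor, its level order (not alphabetical order) decides how the bars are arranged.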
- two-table verbs
- writing a function
Learn about advanced functions in the dplyr package: (1) operations that combine two or more data frames; (2) connecting to databases using the familiar dplyr syntax with the dbplyr package; and (3) writing functions with dplyr verbs to automate recurring analyses.
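A minimal sketch of all three ideas, assuming a recent tidyverse (dplyr 1.0 or later for the {{ }} operator). It joins the small band_members and band_instruments tables that ship with dplyr, wraps group_by() and summarise() in a reusable function (summarise_by is a hypothetical name chosen for illustration), and then runs the same function against an in-memory SQLite database through dbplyr. The database step assumes the RSQLite package is installed and is skipped if it is not.

```r
library(dplyr)

# (1) Two-table verb: keep every row of band_members and add the
#     matching instrument from band_instruments (NA when there is none)
bands <- left_join(band_members, band_instruments, by = "name")

# (3) A reusable summary built from dplyr verbs (hypothetical helper).
#     The {{ }} ("curly-curly") operator passes bare column names through.
summarise_by <- function(data, group_col, value_col) {
  data %>%
    group_by({{ group_col }}) %>%
    summarise(mean_value = mean({{ value_col }}, na.rm = TRUE),
              n = n()) %>%
    ungroup()
}

mpg_by_cyl <- summarise_by(mtcars, cyl, mpg)

# (2) The same function on a database table via dbplyr, assuming RSQLite
#     is installed; dbplyr translates the dplyr verbs into SQL
if (requireNamespace("RSQLite", quietly = TRUE) &&
    requireNamespace("dbplyr", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  mtcars_db <- copy_to(con, mtcars, "mtcars")
  # Print the generated SQL
  summarise_by(mtcars_db, cyl, mpg) %>% show_query()
  # Run the query and bring the result back into R
  mpg_by_cyl_db <- summarise_by(mtcars_db, cyl, mpg) %>% collect()
  DBI::dbDisconnect(con)
}
```

The same function body works on both a local data frame and a remote table, which is the main appeal of writing analyses with dplyr verbs.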
- stringr functions
- basic regex
- application: bag-of-words
Learn to perform basic string tasks with the stringr package. This section introduces the concept of list columns and how they apply to bag-of-words analysis, a rudimentary technique in natural language processing (NLP).
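The sketch below gives an idea of how these pieces fit together. It assumes dplyr, stringr and tidyr are installed, and the two-sentence corpus is made-up example data. stringr cleans the text with a basic regex, str_split() produces a list column holding each document's words, and tidyr's unnest() expands that list column so the words can be counted per document, a simple bag-of-words representation.

```r
library(dplyr)
library(stringr)
library(tidyr)

# A tiny, made-up corpus (hypothetical example data)
docs <- tibble(
  doc_id = 1:2,
  text = c("The cat sat on the mat.",
           "The dog chased the cat!")
)

bag_of_words <- docs %>%
  mutate(
    # Normalise case, then drop anything that is not a letter or space
    text = str_to_lower(text),
    text = str_replace_all(text, "[^a-z ]", ""),
    # str_split() returns a list, so `words` becomes a list column
    words = str_split(text, "\\s+")
  ) %>%
  # Expand the list column: one row per word per document
  unnest(words) %>%
  # Bag of words: how often each word appears in each document
  count(doc_id, words, name = "freq")

bag_of_words
```

Word order is discarded and only per-document frequencies remain, which is what makes this a bag-of-words representation.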
Students will apply the skills learned in this course to a simple dataset and produce a report in R Markdown.