R Programming (Beginner – Intermediate)

The R programming language provides a suite of useful tools for performing data science tasks.

Free

James Hong

3 days
All levels

Course Info

The R programming language provides a suite of useful tools for performing data science tasks. R enables data analysts to supercharge their analysis workflow by producing reproducible analyses and reports in a highly readable syntax on tabular data. Trainees will learn to extract data from common file formats, define the appropriate transformations for their analysis and produce informative visualizations using the ggplot2 package.

What Will I Learn From This Course?

Use the ggplot2 package to explore distributions and relationships in a dataset.

Use the dplyr and tidyr packages to manipulate data into a format amenable for analysis.

Understand issues related to importing CSV and Excel file formats and how to overcome these issues.

Pre-requisite

A laptop with the RStudio IDE and R (version 3.2 or above) installed is ideal. R's system requirements are minimal; however, we recommend a system with 4GB RAM or more. https://support.rstudio.com/hc/en-us/articles/201853926-What-are-your-system-recommendations-for-the-RStudio-IDE-

We highly recommend that trainees install the tidyverse packages before coming to class by running install.packages("tidyverse"). Trainees may also use RStudio Cloud, an online-hosted instance of R designed for data science training, provided a stable internet connection at the venue can be guaranteed.

Pre-requisite

Familiarity with high school statistics is assumed. Prior programming experience will be helpful, but not essential.

Course Outline for This Programme

– Data types: integer, logical, double, string
– dplyr: filter, arrange, mutate

Data frames are a common representation for tables in R. Learn about the structure of a data frame, acceptable data types for data frame columns and how to filter and arrange rows and modify columns of a data frame using the dplyr package.
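The three dplyr verbs named above can be sketched as follows. This is a minimal illustration, assuming the built-in mtcars dataset as the example table; the new column name wt_kg is our own choice and not part of the course material.

```r
library(dplyr)

# mtcars ships with R; every column of a data frame holds a single data type
str(mtcars[, c("mpg", "cyl", "wt")])

efficient <- mtcars %>%
  filter(cyl == 4) %>%                 # keep only rows for 4-cylinder cars
  arrange(desc(mpg)) %>%               # sort rows by fuel economy, highest first
  mutate(wt_kg = wt * 1000 * 0.4536)   # wt is in 1000 lbs; add a weight column in kg

head(efficient)
```

Each verb takes a data frame and returns a new data frame, which is why the steps chain together with the pipe.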

– geoms and aesthetics
– facets
– labels

The grammar of graphics defines a consistent vocabulary for declaring visuals that extends easily to complex graphics, and the ggplot2 package implements this vocabulary in R. Learn to explore distributions and relationships between categorical and quantitative data types using the ggplot2 package.
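As a rough sketch of how geoms, aesthetics, facets and labels fit together, the plot below uses the mpg dataset bundled with ggplot2. The choice of dataset, columns and label text is an assumption made for illustration only.

```r
library(ggplot2)

# aesthetics map columns of the data to visual properties (here the x and y axes)
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +                      # geom: draw one point per car
  facet_wrap(~ drv) +                 # facets: one panel per drive type
  labs(                               # labels: title and axis text
    title = "Highway fuel economy by engine size",
    x = "Engine displacement (litres)",
    y = "Highway miles per gallon"
  )

print(p)
```

Swapping geom_point() for another geom, such as geom_boxplot(), changes the kind of chart while the rest of the declaration stays the same.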

– select
– group_by
– vector functions (log, operators, ifelse(), etc.)
– summary functions (mean, median, etc.)
– variants (distinct, count, etc.)

Building on data frames, this section expands on the grammar for data manipulation defined by the dplyr package.
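The verbs listed above might be combined along these lines. This is a sketch on the built-in mtcars dataset; the derived column names (log_hp, economy, share_high) and the 20 mpg threshold are illustrative assumptions.

```r
library(dplyr)

car_data <- mtcars %>%
  select(cyl, mpg, hp) %>%                        # keep only the columns of interest
  mutate(
    log_hp  = log(hp),                            # vector function: one output per row
    economy = ifelse(mpg > 20, "high", "low")     # vectorised if/else
  )

by_cyl <- car_data %>%
  group_by(cyl) %>%                               # one group per cylinder count
  summarise(                                      # summary functions: one output per group
    n          = n(),
    mean_mpg   = mean(mpg),
    median_hp  = median(hp),
    share_high = mean(economy == "high")
  )

cyl_counts <- count(car_data, cyl)                # shortcut for group_by() + summarise(n = n())
cyl_gear   <- distinct(mtcars, cyl, gear)         # unique combinations of two columns

by_cyl
```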

– pivot_wider, pivot_longer
– unite, separate

Data must be properly formatted for visualization and modelling tasks. Learn about the tidy data framework and how data can be transformed ‘wider’ and ‘longer’ with the tidyr package.
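To make 'wider' and 'longer' concrete, here is a small sketch using a made-up sales table. The data values and column names are hypothetical and exist only for this example.

```r
library(tidyr)

# hypothetical wide table: one column per year
wide <- data.frame(
  store      = c("north", "south"),
  sales_2019 = c(10, 20),
  sales_2020 = c(15, 25),
  stringsAsFactors = FALSE
)

# longer: one row per store and year
long <- pivot_longer(
  wide,
  cols         = starts_with("sales_"),
  names_to     = "year",
  names_prefix = "sales_",
  values_to    = "sales"
)

# wider: back to one column per year
back <- pivot_wider(
  long,
  names_from   = year,
  values_from  = sales,
  names_prefix = "sales_"
)

# separate splits one column into several; unite joins them again
dates  <- data.frame(date = c("2020-01-15", "2020-02-20"), stringsAsFactors = FALSE)
parts  <- separate(dates, date, into = c("year", "month", "day"), sep = "-")
joined <- unite(parts, "date", year, month, day, sep = "-")
```

The long form is usually the one ggplot2 and dplyr expect, since each variable sits in its own column.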

– CSV, XLSX
– Use the RStudio interface to generate import code
– Be aware of common issues with CSV files and their fixes
– Be aware of spreadsheet organization best practices for working in R

CSV and Excel file formats are commonly encountered in analysis. This section explores how to import data in RStudio and common issues encountered when processing data in these file formats.
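One frequent source of CSV trouble is leaving column types and missing-value codes to be guessed at import time. The sketch below, which assumes the readr package from the tidyverse and a small made-up file written to a temporary path, declares both explicitly. The column names and the "N/A" code are hypothetical.

```r
library(readr)

# write a small hypothetical CSV to a temporary file so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c(
  "postcode,amount,joined",
  "01234,12.5,2020-01-15",
  "56789,N/A,2020-02-20"
), path)

customers <- read_csv(
  path,
  col_types = cols(
    postcode = col_character(),   # keep identifiers as text so leading zeros survive
    amount   = col_double(),
    joined   = col_date()         # ISO 8601 dates (YYYY-MM-DD)
  ),
  na = c("", "NA", "N/A")         # treat the spreadsheet-style "N/A" as missing
)

customers
```

Stating the types up front means the import either produces the expected columns or reports parsing problems, rather than silently changing the data.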

Students will apply the skills learned in this course to a simple dataset and produce a report in R Markdown.

James Hong

Background – Microsoft .NET Framework

James has more than 10 years of software development experience working in a software house. He has developed multiple solutions on different platforms, including Web, Desktop, and Server, and has been developing solutions with Microsoft .NET Framework technology since 2002. He has also developed and implemented Document Management Solutions with document routing, Construction Site Safety Processing Solutions for Government Agencies, and Human Resource Management Assessment Solutions on Balance Score Card and Recruitment Assessment.

Background – Microsoft SQL Server

James has substantial experience using Microsoft SQL Server version 6.5 for RDBMS solution development and deployment. He has also developed multiple Business Intelligence solutions using Microsoft SQL Server BI components since 2002. Microsoft SQL Server is his first choice for solution development, with projects such as a Digital Bill Conversion solution for telco corporate users and a Human Resource Management Assessment solution on BI for performance analysis.

Background – Microsoft Azure

James has trained and consulted on Azure solutions for 5 years and follows Azure technology closely. Since 2019, he has also been one of the trainers appointed to deliver the Azure training content for the Microsoft Azure Training project in APEC. He has helped to deliver the content to Microsoft clients such as AXA (Hong Kong), P&G (Philippines) and SAP SA (China). He has also delivered numerous training sessions to T System (Malaysia). He also trains and consults on Azure Solution Architect, Azure Data Science Solution and Azure Service Development.