What you need to know to start your ML journey
The Machine Learning Fundamentals with R course is divided into three main theoretical parts: tree-based modelling, k-nearest neighbours and neural networks. Two additional subtopics are also covered: ensemble methods and support vector machines. The course walks through an R script covering tree-based modelling, neural networks and ensemble methods.

The dataset comes from the R package “breastCancerNKI”, which is based on data from Van de Vijver et al. (2002) and van't Veer et al. (2002). The raw data contain 24,481 genes as variables and 337 patients as observations. The example focuses on one outcome, distant metastasis (“e.dmfs”). Because many values were missing, every gene with missing values was removed, as was every patient without a recorded metastasis outcome. The final complete dataset used in the example contains 14,319 genes and 319 patients: 210 who did not develop a distant metastasis and 109 who did.

Scripts for k-nearest neighbours and support vector machines are also provided for self-study. You are strongly encouraged to apply the provided code to other datasets, so that you fully understand the concepts and can use them on your own. Nobody has ever learned to code by just looking at code.
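The complete-case filtering described above can be sketched in a few lines. The course scripts are written in R, and this is not their code. It is a plain-Python illustration on a tiny made-up matrix. Several details are assumptions, not taken from the course:

- the function name `complete_case_filter`
- the layout, with patients as rows and genes as columns
- the outcome encoding: 1 for metastasis, 0 for none, `None` for missing
- the order of the two steps: patients are dropped first, then genes

```python
from typing import List, Optional, Tuple


def complete_case_filter(
    expr: List[List[Optional[float]]],
    genes: List[str],
    outcome: List[Optional[int]],
) -> Tuple[List[List[Optional[float]]], List[str], List[Optional[int]]]:
    """Hypothetical helper: drop patients with no outcome, then genes with missing values."""
    # Assumed step 1: keep only patients (rows) whose outcome is recorded.
    keep_rows = [i for i, y in enumerate(outcome) if y is not None]
    # Step 2: keep only genes (columns) with no missing value among the kept patients.
    keep_cols = [
        j for j in range(len(genes))
        if all(expr[i][j] is not None for i in keep_rows)
    ]
    filtered = [[expr[i][j] for j in keep_cols] for i in keep_rows]
    return filtered, [genes[j] for j in keep_cols], [outcome[i] for i in keep_rows]


# Hypothetical toy data: 4 patients x 3 genes, None marks a missing value.
genes = ["g1", "g2", "g3"]
expr = [
    [0.1, None, 0.5],
    [0.2, 0.3, 0.4],
    [None, 0.1, 0.2],
    [0.3, 0.2, 0.1],
]
outcome = [0, 1, None, 1]  # 1 = metastasis, 0 = no metastasis, None = outcome missing

filtered, kept_genes, y = complete_case_filter(expr, genes, outcome)
print(kept_genes)                 # ['g1', 'g3']
print(len(filtered), "patients")  # 3 patients
print("no metastasis:", y.count(0), "| metastasis:", y.count(1))  # no metastasis: 1 | metastasis: 2
```

In this toy example, gene g1 survives because its only missing value belongs to the patient dropped in step 1. If the gene filter ran first, g1 would be dropped as well. The description above does not say which order the course used, so the order can change how many genes remain.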