VERSION WARNING

This tutorial was written using the kohonen package version 2.0.19. Some of the code will not work in the most recent version of this package. To install 2.0.19, run the following:

packageurl <- "https://cran.r-project.org/src/contrib/Archive/kohonen/kohonen_2.0.19.tar.gz"
install.packages(packageurl, repos = NULL, type = "source")

I hope to update all of the SOM tutorials to run properly on kohonen v3 in the near future.

Introduction

Self-Organizing Maps (SOMs) are a tool for visualizing patterns in high-dimensional data. They produce a two-dimensional representation that (hopefully) displays meaningful patterns in the higher-dimensional structure. SOMs are “trained” on the given data (or a sample of your data) as follows:

  • The size of the map grid is defined.
  • Each cell in the grid is assigned an initializing vector in the data space.
    • For example, if you are creating a map of a 22-dimensional space, each grid cell is assigned a representative 22-dimensional vector.
    • Initialization can be either random or based on a specific method.
  • Data are repeatedly fed into the model to train it. Each time a training vector is entered, the following process is undertaken:
    • The grid cell with the representative vector that is closest to the training vector is identified.
    • The representative vectors of all grid cells near the identified one are slightly adjusted towards the training vector.
  • Convergence parameters (typically a learning rate and a neighborhood radius) shrink as the training vectors are fed in over many passes. The adjustments therefore get smaller and smaller, and the map stabilizes into a representation.
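The training steps above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the kohonen package's internal implementation. The function name `train_som_sketch` is hypothetical. It assumes a rectangular grid, Euclidean distance in the data space, initialization from randomly chosen data rows, a Gaussian neighborhood on the grid, and a learning rate and radius that both decay linearly. The kohonen package's own neighborhood function and decay schedule may differ.

```r
# Hypothetical sketch of online SOM training (not kohonen's implementation).
# Assumptions: rectangular grid, Euclidean distance, Gaussian neighborhood,
# linear decay of both the learning rate (alpha) and the neighborhood radius.
train_som_sketch <- function(data, xdim, ydim, rlen = 100,
                             alpha = c(0.05, 0.01), radius = NULL) {
  data <- as.matrix(data)
  p <- ncol(data)
  n_units <- xdim * ydim
  # Grid coordinates (column, row) of every cell
  grid_pts <- as.matrix(expand.grid(x = seq_len(xdim), y = seq_len(ydim)))
  if (is.null(radius)) radius <- c(max(xdim, ydim) / 2, 0.5)
  # Initialization: each cell gets a randomly chosen data row as its vector
  init_rows <- sample(nrow(data), n_units, replace = n_units > nrow(data))
  codes <- data[init_rows, , drop = FALSE]
  total_steps <- rlen * nrow(data)
  step <- 0
  for (iter in seq_len(rlen)) {
    # Feed every training vector once per pass, in random order
    for (i in sample(nrow(data))) {
      frac <- step / total_steps
      a <- alpha[1] + (alpha[2] - alpha[1]) * frac     # shrinking learning rate
      r <- radius[1] + (radius[2] - radius[1]) * frac  # shrinking radius
      x_mat <- matrix(data[i, ], n_units, p, byrow = TRUE)
      # Cell whose representative vector is closest to the training vector
      bmu <- which.min(rowSums((codes - x_mat)^2))
      # Gaussian weight from each cell's grid distance to that closest cell
      bmu_mat <- matrix(grid_pts[bmu, ], n_units, 2, byrow = TRUE)
      h <- exp(-rowSums((grid_pts - bmu_mat)^2) / (2 * r^2))
      # Nudge every cell towards the training vector, nearby cells the most
      codes <- codes + a * h * (x_mat - codes)
      step <- step + 1
    }
  }
  list(codes = codes, grid = grid_pts)
}

# Usage on random (made-up) 3-dimensional data and a 6 x 4 grid
set.seed(1)
toy <- scale(matrix(rnorm(600), ncol = 3))
fit <- train_som_sketch(toy, xdim = 6, ydim = 4, rlen = 20)
dim(fit$codes)  # 24 3: one 3-dimensional vector per grid cell
```

The Gaussian neighborhood is one common choice. A "bubble" neighborhood, which updates only the cells within the radius and weights them equally, is another.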

The key property this algorithm gives the SOM is that points that are close in the data space are mapped to nearby cells of the SOM. SOMs can therefore be a good tool for representing clusters in your data.

Kohonen Mapping Types

require(kohonen)
require(RColorBrewer)

The kohonen package allows for quick creation of some basic SOMs in R. Our examples below use player statistics from the 2015/16 NBA season. We look at player stats per 36 minutes played, so variation in playing time is somewhat controlled for. These data are available at http://www.basketball-reference.com/, and we have already cleaned them. The kohonen functions require numeric fields with no missing entries.

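# Download the cleaned CSV from GitHub and parse it into a data frame
library(RCurl)
NBA <- read.csv(text = getURL("https://raw.githubusercontent.com/clarkdatalabs/soms/master/NBA_2016_player_stats_cleaned.csv"), 
    sep = ",", header = TRUE, check.names = FALSE)  # keep names such as "3P" and "FG%" unchanged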
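Because the kohonen functions need numeric fields with no missing entries, it can help to check the chosen columns before training. Here is a minimal sketch. The helper `prepare_som_input` is hypothetical and not part of kohonen, and the small data frame holds made-up values rather than rows from the NBA file.

```r
# Hypothetical helper (not part of kohonen): keep the requested columns,
# refuse non-numeric ones, and drop rows that contain any missing value.
prepare_som_input <- function(df, cols) {
  sub <- df[cols]
  is_num <- vapply(sub, is.numeric, logical(1))
  if (!all(is_num)) {
    stop("Non-numeric columns: ", paste(cols[!is_num], collapse = ", "))
  }
  sub[complete.cases(sub), , drop = FALSE]
}

# Made-up example data; check.names = FALSE keeps the name "2PA" as-is
toy <- data.frame(
  Player = c("A", "B", "C"),
  FTA = c(4.1, NA, 2.0),
  `2PA` = c(10.2, 8.0, 6.5),
  check.names = FALSE
)
clean <- prepare_som_input(toy, c("FTA", "2PA"))
nrow(clean)  # 2: the row with a missing FTA is dropped
```

Asking for a text column such as `"Player"` stops with an error instead of passing a non-numeric field to the SOM.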

Basic SOM

Before we create a SOM, we need to choose which variables we want to search for patterns in.

colnames(NBA)
##  [1] ""       "Player" "Pos"    "Age"    "Tm"     "G"      "GS"    
##  [8] "MP"     "FG"     "FGA"    "FG%"    "3P"     "3PA"    "3P%"   
## [15] "2P"     "2PA"    "2P%"    "FT"     "FTA"    "FT%"    "ORB"   
## [22] "DRB"    "TRB"    "AST"    "STL"    "BLK"    "TOV"    "PF"    
## [29] "PTS"

We’ll start with some simple examples using shot attempts:

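# Shot attempts per 36 minutes: free throws, two-pointers and three-pointers
NBA.measures1 <- c("FTA", "2PA", "3PA")
# Standardize each column, then train a SOM on a 6 x 4 rectangular grid
NBA.SOM1 <- som(scale(NBA[NBA.measures1]), grid = somgrid(6, 4, "rectangular"))
# Default plot type shows the representative (codebook) vector of each cell
plot(NBA.SOM1)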
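The call above passes the data through `scale()` first. This sketch shows what that step does, using made-up shot numbers rather than the NBA data. Each column is centered on its mean and divided by its standard deviation, which produces z-scores. The motivation (our reading, not stated in the original) is that the SOM matches cells by Euclidean distance, so without scaling a stat with a larger spread would dominate which cell counts as "closest".

```r
# Made-up illustrative values, not the NBA data
shots <- data.frame(
  FTA = c(2, 4, 6, 8),
  `2PA` = c(10, 12, 14, 20),
  `3PA` = c(1, 5, 3, 7),
  check.names = FALSE
)
# Manual z-scores: subtract each column mean, then divide by each column sd
manual <- sweep(sweep(as.matrix(shots), 2, colMeans(shots)), 2,
                apply(shots, 2, sd), "/")
auto <- scale(shots)
all.equal(manual, auto, check.attributes = FALSE)  # TRUE
apply(auto, 2, sd)  # every column now has standard deviation 1
```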