Individual Exercise

  1. Take a grouped dataset (observations within states, individuals, regions, years, etc.) that is of interest to you. Use reshape2 to obtain group means and standard deviations, and use plyr to conduct a no-pooling analysis of a response variable. Report the estimated coefficients and standard errors in data frames.
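## example with the baseball data from plyr
library(plyr)
library(reshape2)
## drop the character columns so that every melted value is numeric
bb <- baseball
bb$team <- NULL
bb$lg <- NULL
## long format: one row per player (id), variable, and value
bb_molt <- melt(bb, id.vars = 'id')
## player-level means and standard deviations; na.rm = TRUE is passed on to
## mean() and sd() because some columns contain missing values
bb_mean <- dcast(bb_molt, id ~ variable, mean, na.rm = TRUE)
bb_sd <- dcast(bb_molt, id ~ variable, sd, na.rm = TRUE)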
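The exercise also calls for a no-pooling analysis with plyr, which the group means and standard deviations alone do not provide. The sketch below fits one regression per player with `dlply()` and collects the results with `ldply()`; the model `h ~ ab` (hits regressed on at-bats) and the object names `fits`, `nopool`, `nopool_est`, and `nopool_se` are hypothetical choices for illustration, not part of the assignment.

```r
library(plyr)
library(reshape2)

## no pooling: a separate regression for every player (id)
## the model h ~ ab is a hypothetical choice made for illustration
fits <- dlply(baseball, 'id', function(d) lm(h ~ ab, data = d))

## one row per player and term; coefficients that lm() cannot
## estimate (e.g. a constant predictor) are simply absent
nopool <- ldply(fits, function(m) {
  s <- coef(summary(m))
  data.frame(term = rownames(s),
             estimate = unname(s[, 'Estimate']),
             se = unname(s[, 'Std. Error']),
             stringsAsFactors = FALSE)
})

## wide data frames: one row per player, one column per coefficient
nopool_est <- dcast(nopool, id ~ term, value.var = 'estimate')
nopool_se <- dcast(nopool, id ~ term, value.var = 'se')
head(nopool_est)
head(nopool_se)
```

Returning a small data frame from each fit, rather than a named vector, keeps `ldply()` from failing when a model drops an aliased coefficient; `dcast()` then fills any such gap with `NA`.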
  1. Using the Broken Function.R script on Sakai, work with browser() and the debugger to find all the mistakes in the function. Fix them so that the last two lines of the script return a vector of zeroes. Include the corrected function in your notebook and demonstrate that it works.
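As a sketch of the `browser()` workflow the exercise refers to, the hypothetical `running.total()` below pauses inside its loop so that `i` and `total` can be inspected. The `interactive()` guard is an assumption added so that the same code runs straight through when the notebook is knitted.

```r
## hypothetical toy function, used only to illustrate browser()
running.total <- function(x) {
  total <- 0
  for (i in seq_along(x)) {
    ## in an interactive session this pauses on every iteration; at the
    ## Browse[1]> prompt, n runs the next line, c continues, and Q quits
    if (interactive()) browser()
    total <- total + x[i]
  }
  total
}

running.total(c(2, 4, 6))
```

`debugonce(index.means)` is an alternative that opens the browser at the first line of the next call without editing the function.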
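## corrected function
index.means <- function(x, rows = TRUE) {
  
  ## use is.data.frame()/is.matrix() rather than comparing class(x): for a
  ## matrix, class(x) is c("matrix", "array") in R >= 4.0, and `&&` on a
  ## length-2 condition is an error in recent versions of R
  if (!is.data.frame(x) && !is.matrix(x)) stop('needs two dimensional input')
  simple.mean <- function(x) sum(x) / length(x)
  
  if (rows) {
    
    ## mean of each row
    output <- numeric(nrow(x))
    for (i in seq_len(nrow(x))) output[i] <- simple.mean(x[i, ])
    
  } else {
    
    ## mean of each column
    output <- numeric(ncol(x))
    for (i in seq_len(ncol(x))) output[i] <- simple.mean(x[, i])
    
  }
  
  output
  
}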

## generate fake data
mat <- matrix(rgamma(200, 2, 3), 20, 10)

## both lines yield vectors of zeros, up to floating-point rounding error
index.means(mat) - apply(mat, 1, mean)
##  [1]  0.0e+00  1.1e-16 -5.6e-17  0.0e+00  0.0e+00 -1.1e-16  5.6e-17
##  [8]  0.0e+00  0.0e+00 -1.1e-16  0.0e+00  2.2e-16  0.0e+00  0.0e+00
## [15]  1.1e-16  0.0e+00  0.0e+00 -5.6e-17  0.0e+00  0.0e+00
index.means(mat, rows = F) - apply(mat, 2, mean)
##  [1] -5.6e-17  0.0e+00 -1.1e-16  0.0e+00  1.1e-16  0.0e+00  0.0e+00
##  [8]  0.0e+00  0.0e+00  0.0e+00
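Reading differences of order 1e-16 by eye works, but `all.equal()` makes the check explicit: it compares numeric vectors up to a tolerance of about 1.5e-8 and returns `TRUE` when they agree. As a sketch, the hypothetical `index.means.vec()` below is a vectorized counterpart built on base R's `rowMeans()` and `colMeans()`; it is a comparison aid, not part of the assignment, and `test.mat` is an illustrative name.

```r
## hypothetical vectorized counterpart to index.means(), for comparison only
index.means.vec <- function(x, rows = TRUE) {
  if (!is.data.frame(x) && !is.matrix(x)) stop('needs two dimensional input')
  ## unname() drops the row/column names a data frame would carry
  if (rows) unname(rowMeans(x)) else unname(colMeans(x))
}

set.seed(1)
test.mat <- matrix(rgamma(200, 2, 3), 20, 10)

## all.equal() tolerates small floating-point differences
isTRUE(all.equal(index.means.vec(test.mat), apply(test.mat, 1, mean)))
## [1] TRUE
isTRUE(all.equal(index.means.vec(test.mat, rows = FALSE), apply(test.mat, 2, mean)))
## [1] TRUE
isTRUE(all.equal(index.means.vec(as.data.frame(test.mat)), apply(test.mat, 1, mean)))
## [1] TRUE
```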