January 29, 2010

Projections, defense (Baseball?! Wut?)

I've been dumping most of my baseball thoughts/opinions/etc. over at ACB for a while now, but I'm still hopeful that I can dump a thought or two that I'm chewing on over here from time to time. I do like to make grandiose plans for baseball stuff every so often, but things (math, dogs, processed cheese balls, general laziness) tend to change my plans. Something I'm still planning is making a project-a-tron of my own. Just off the top of my head, this would require me to figure out things like:


  • Minor league, college, and international player stat translations

  • Aging patterns (relative to specific players)

  • Platoon splits + usage

  • Reliability + variance of baseball stats

  • Defense-neutral pitching

  • Park effects

  • Defensive models (more on this below)

  • Database management for all this data

That's a lot of stuff, for sure, and I'm sure it just scratches the surface. It's certainly too much for me to get anything done for 2010! I'll probably say a thing or two about all the players during spring training like I usually do, mostly based on the publicly available projections at FanGraphs plus the just-released 2010 PECOTAs, which I now have access to.
Anyway, here are the two things on my mind right now:

1. If I do create a project-a-tron, I have to make sure it has a name first (which is obviously the most important thing). The custom seems to be to name it after some sort of scrappy, not necessarily terrible middle infielder from the creator's favorite team (see PECOTA, CHONE, CAIRO, etc.). It seems like people have eschewed actually making an acronym from these names now. I know Shawn has already staked a claim on 'Shawon', who would qualify in that he was both a Cub and a mediocre middle infielder, but he was before my time in Cubs fandom anyway. I'm leaning towards breaking this tradition, though, and naming it after just a player I'm a fan of (see Sosa, Zambrano, Prior). So I pose this question to my ones of readers: what do you guys think? I'm leaning toward the Z's myself.

2. Colin Wyers just published his first article ($) at Baseball Prospectus about the new defensive metric that he is rolling out. I'm excited by the fact that he's going pretty slow with this, since I've never really looked into these metrics and it will be a good opportunity to learn about them. I've found the comments on that article pretty amusing, though. My view of modeling anything (defense, etc.) is that it's precisely that: a model. Yes, you can add more things, but it's important to make sure that the absolute basics of it make sense, which is exactly what Colin seems to be doing.
But the kinds of complaints people seem to be making are pretty dumb, or at least obvious. Most of the complaints have to do with:
  • Defensive positioning (especially shifts vs. LHB)
  • Hard-hittedness of balls

These are valid questions, and should be included in the Perfect Defense Model™, but it seems foolish to complain about the lack of them when we DON'T HAVE ANY DATA ON THESE THINGS (yet). When HITf/x and FIELDf/x roll out, this stuff can be incorporated into models. But for now, we don't have the data, and you have to build your model based on the data you have.
One caveat: at least in recent years, there has been more classification of batted balls (i.e. stuff hit 'sharply', 'softly', etc.). I'm not really a big fan of this kind of binning, though. It's pretty subjective, and I'd rather just wait until we have HITf/x. Not to mention that Colin has pointed out that there's already a significant error in batted ball data as to whether hits are line drives, grounders, or fly balls.

-b

January 26, 2010

Belated Christmas present post



My wife is awesome