### Walkability via Census

August 18th, 2014
If you want to know how walkable various parts of the country are, and you only have (1) the census and (2) walk scores for DC, what could you do? Well, you could look for census variables that correlate with walk scores in DC and then extrapolate these correlations to the rest of the country to get a national map:

This is pretty cool! But let's have a look at these correlations:

| Census variable | Correlation with walk score |
|---|---|
| % 16 and older employed in civilian labor force | 0.636261 |
| % 25-34 years old | 0.630423 |
| % females 16 and older employed in civilian labor force | 0.594394 |
| % 16 and older in civilian labor force | 0.578812 |
| Nonrelatives in household | 0.561472 |
| % 16 and older in labor force | 0.545052 |
| % 16 and older commuting to work by other means | 0.542523 |
| % females 16 and older in civilian labor force | 0.5328 |
| % 18 years and younger | 0.530061 |
| % born in state of residence | -0.529451 |
| houses built 1939 or earlier | 0.528639 |
| workers 16 and older commuting to work by other means | 0.527117 |
| workers 16 and older walking to work | 0.525601 |
| % females 16 and older in labor force | 0.5229 |
| % with at least a bachelors degree | 0.522021 |
| % 10-14 years old | -0.519463 |
| % 16 and older driving to work alone | -0.507589 |
| population 25-34 years old | 0.507211 |
| % 16 and older not in labor force | -0.502902 |

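As a rough illustration of how a ranking like this could be produced, here is a minimal sketch that computes the Pearson correlation of each variable with walk score and sorts by absolute value. The data is synthetic and the variable names (`pct_walk_to_work`, `pct_drive_alone`, `unrelated_noise`) are made up for the example; it is an assumption that the original map used plain Pearson correlation.

```python
import numpy as np

def rank_by_correlation(features, target):
    """Return (name, Pearson r) pairs sorted by |r|, strongest first."""
    results = []
    for name, values in features.items():
        r = np.corrcoef(values, target)[0, 1]
        results.append((name, float(r)))
    return sorted(results, key=lambda pair: abs(pair[1]), reverse=True)

# Synthetic stand-in data: NOT real census or walk score values.
rng = np.random.default_rng(0)
n_tracts = 200
walk_score = rng.uniform(0, 100, n_tracts)
features = {
    # Hypothetical variable that rises with walk score.
    "pct_walk_to_work": 0.3 * walk_score + rng.normal(0, 5, n_tracts),
    # Hypothetical variable that falls with walk score, with more noise.
    "pct_drive_alone": -0.5 * walk_score + rng.normal(0, 20, n_tracts),
    # Hypothetical variable with no relationship to walk score.
    "unrelated_noise": rng.normal(0, 1, n_tracts),
}

ranking = rank_by_correlation(features, walk_score)
for name, r in ranking:
    print(f"{name:20s} {r:+.3f}")
```

Sorting by absolute value matters because a strongly negative correlation, like driving alone, is as informative as a strongly positive one.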
The first thing that jumps out is that I really wish they had run a round of principal component analysis to cut down the number of variables. Many of these are probably just multiple ways of saying the same thing:
• employment
  • % 16 and older in labor force
  • % 16 and older in civilian labor force
  • % 16 and older employed in civilian labor force
  • % 16 and older not in labor force
  • % females 16 and older in labor force
  • % females 16 and older in civilian labor force
  • % females 16 and older employed in civilian labor force
• population, after considering %-version of the same metric
  • population 25-34 years old
  • workers 16 and older commuting to work by other means
• age
  • % 10-14 years old
  • % 18 years and younger
  • % 25-34 years old
  • population 25-34 years old
• commuting
  • % 16 and older driving to work alone
  • workers 16 and older walking to work
  • % 16 and older commuting to work by other means
  • workers 16 and older commuting to work by other means

What I'm saying is that if you ran PCA to look for correlations between these different variables, I strongly suspect you'd find that once you had new variables representing employment, population, age, and commuting, the remaining related variables would provide very little additional information. But I don't have easy access to the raw data, so I'll be lazy and approximate this by keeping only the variable in each category with the strongest correlation. (After PCA we would expect the new super-variables to correlate a bit better, but the picture should be about like this.)

| Census variable | Correlation with walk score |
|---|---|
| % 16 and older employed in civilian labor force | 0.636261 |
| % 25-34 years old | 0.630423 |
| Nonrelatives in household | 0.561472 |
| % born in state of residence | -0.529451 |
| houses built 1939 or earlier | 0.528639 |
| workers 16 and older walking to work | 0.525601 |
| % with at least a bachelors degree | 0.522021 |

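To make the PCA point concrete, here is a minimal sketch on synthetic data, not the census data, which isn't available here. It assumes a single hidden "employment" factor drives three near-duplicate variables, adds one unrelated variable, and reports how much of the total variance each principal component captures. The column meanings and noise levels are made up for illustration.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    # Standardize each column so no variable dominates because of its units.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Squared singular values of the standardized data are proportional
    # to the variance along each principal component.
    singular_values = np.linalg.svd(Z, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

rng = np.random.default_rng(0)
n_tracts = 500

# Hypothetical hidden factor shared by the labor-force variables.
employment = rng.normal(0, 1, n_tracts)
X = np.column_stack([
    employment + rng.normal(0, 0.1, n_tracts),  # stand-in: % in labor force
    employment + rng.normal(0, 0.1, n_tracts),  # stand-in: % in civilian labor force
    employment + rng.normal(0, 0.1, n_tracts),  # stand-in: % employed
    rng.normal(0, 1, n_tracts),                 # stand-in: an unrelated variable
])

ratios = explained_variance_ratio(X)
print(np.round(ratios, 3))
```

In a setup like this, four input variables collapse to roughly two components that carry nearly all the variance, which is the kind of redundancy suspected among the labor-force variables above. Whether the real census variables behave this way would need the raw data to check.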
Some of these variables, like people walking to work or whether houses were built before cars became popular, do seem very likely to correlate with walkability wherever you are in the country. On the other hand, most of the other variables instead look to me like they're measuring "who in DC lives in walkable areas". We see a bunch of employed college-educated young people from out of state living together. I can totally believe that the more someone is like that, the more likely they are to value walkability in choosing where to live, and the more likely they are to be able to afford it. But if we're trying to identify walkable areas in the rest of the country, this is going to miss a lot of other ways a place can be walkable.

This actually surprises the author somewhat; after taking affordability into account and seeing which places had the best combination of walkability and affordability by their metric, they wrote:

> I was expecting something like a smaller, affordable Midwest town or something, but it the highest scoring areas were usually just outside of major downtown

It looks to me like what's going on is that the kind of people who want to live in the walkable parts of DC are not the kind of people who want to live in smaller Midwest towns, even if those towns are super walkable.

(On the other hand, this might actually be a better metric for the author's purposes. They're really trying to figure out where they would enjoy living, and they think they want walkable neighborhoods. But having a lot of people 25-34 and a lot of nonrelatives living together, along with high employment and education, are probably also things they care about. A retirement community in rural Tennessee might be very walkable, but it's probably not what they had in mind.)
