Industrial clusters in the Western U.S.

Andrey Kamenov, Ph.D. Probability and Statistics

In today’s post we look for patterns in industry-sector employment data for the Western states. We would expect a significant difference between California and the mountain states, but what about the other states?

The data we’re dealing with is multidimensional. We limit the depth of our study to just the major industry sectors, and even then that’s 19 variables for each county, a significant amount of data.

The first technique we use is geographically constrained clustering. There are several different methods; in our opinion, the best results come from the Arisel method (proposed by Duque and Church in 2004).
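To make the idea of a geographic constraint concrete, here is a minimal sketch in Python. It is not the Arisel algorithm itself; it swaps in a simple greedy agglomerative procedure that only ever merges clusters sharing a border. The region names, the adjacency list and the two-sector profiles are made-up toy values, and the function names are ours.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two profile vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(members, profiles):
    """Unweighted mean profile of the regions in one cluster."""
    dims = len(next(iter(profiles.values())))
    return [sum(profiles[m][d] for m in members) / len(members) for d in range(dims)]


def constrained_clusters(profiles, neighbors, k):
    """Greedily merge the most similar pair of adjacent clusters until k remain."""
    clusters = [{name} for name in sorted(profiles)]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Contiguity constraint: some region in i must border some region in j.
            touching = any(neighbors[r] & clusters[j] for r in clusters[i])
            if not touching:
                continue
            d = sq_dist(centroid(clusters[i], profiles), centroid(clusters[j], profiles))
            if best is None or d < best[0]:
                best = (d, i, j)
        if best is None:  # no adjacent pair left to merge
            break
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Hypothetical toy data: five regions in a row, A-B-C-D-E.
profiles = {
    "A": [0.9, 0.1],
    "B": [0.8, 0.2],
    "C": [0.2, 0.8],
    "D": [0.1, 0.9],
    "E": [0.85, 0.15],
}
neighbors = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

result = constrained_clusters(profiles, neighbors, k=3)
print(sorted(sorted(c) for c in result))
```

In this toy example region E has almost the same profile as A and B, yet it ends up in its own cluster because it does not border them. That is the effect the geographic constraint is meant to have.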

The resulting map shows four industrial clusters in the region. The majority of the population (60 percent) lives in cluster B, which encompasses most of California as well as four Arizona counties. Clusters A and D each account for slightly less than 20 percent of the region’s total employment, leaving less than 2 percent for cluster C.

The interactive map tooltip shows the most and least prominent industry sectors for each cluster, relative to the national averages. For example, the green cluster has significantly higher-than-average employment in Mining but lower-than-average employment in Manufacturing. Manufacturing is not very prominent in the purple cluster either, where Accommodation and Food Services tops the list.
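One common way to express such a comparison is a ratio of employment shares, often called a location quotient. The sketch below assumes a measure of that kind; the post does not say exactly how the tooltip values were computed, and the employment counts are invented for illustration.

```python
def shares(counts):
    """Convert raw employment counts per sector into shares of the total."""
    total = sum(counts.values())
    return {sector: n / total for sector, n in counts.items()}


def location_quotients(cluster_counts, national_counts):
    """Ratio of a cluster's sector share to the national share of that sector.

    Values above 1 mean the sector is over-represented in the cluster.
    """
    local = shares(cluster_counts)
    national = shares(national_counts)
    return {sector: local[sector] / national[sector] for sector in local}


# Hypothetical employment counts (not real data).
cluster = {"Mining": 30, "Manufacturing": 20, "Retail Trade": 50}
nation = {"Mining": 10, "Manufacturing": 140, "Retail Trade": 50}

lq = location_quotients(cluster, nation)
ranked = sorted(lq, key=lq.get, reverse=True)
print("most over-represented:", ranked[0])
print("most under-represented:", ranked[-1])
```

Sorting the sectors by this ratio gives the "top" and "bottom" of the list for a cluster.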

Another good way to visualize the data is t-SNE dimensionality reduction. It shows how different the clusters’ business profiles are by approximating the difference between two profiles with the distance between the corresponding points on a scatter plot.
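As a rough sketch of this step, the snippet below runs t-SNE on a random stand-in matrix with one row per county and 19 columns, one per sector. It assumes scikit-learn and NumPy are installed. The post does not say which implementation was used, and the data and parameter values here are placeholders.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Placeholder data: 60 "counties" x 19 sector shares, each row summing to 1.
n_counties, n_sectors = 60, 19
X = rng.dirichlet(np.ones(n_sectors), size=n_counties)

# Perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=15, init="pca", random_state=0)
embedding = tsne.fit_transform(X)

# One 2-D point per county, ready to be plotted and colored by cluster label.
print(embedding.shape)
```

Only the relative positions of the points carry meaning; the axes of a t-SNE plot have no units of their own.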

[Chart: t-SNE scatter plot of county industry profiles, colored by cluster (pop_clusters_arisel_compact_W)]

The first thing to notice in the chart above is that the red and blue clusters are actually quite similar. The purple cluster has the most tightly grouped profile, while the green one is quite diverse.

Reference(s):

  • van der Maaten, L.J.P.; Hinton, G.E. Visualizing High-Dimensional Data using t-SNE. Journal of Machine Learning Research 9:2579-2605, 2008.
  • Duque, J.C.; Dev, B.; Betancourt, A.; Franco, J.L. ClusterPy: Library of spatially constrained clustering algorithms, Version 0.9.9. RiSE-group (Research in Spatial Economics). EAFIT University. 2011.

About Andrey Kamenov

Andrey Kamenov is a data scientist working for Advameg Inc. His background includes teaching statistics, stochastic processes and financial mathematics at Moscow State University and working for a hedge fund. His academic interests range from statistical data analysis to optimal stopping theory. Andrey also enjoys photography, reading and powerlifting.
