Visualizing correlations with graphs
Yesterday I found a statistics textbook for geologists [1] for $1 at a library book sale. When I thumbed through the book, an image similar to the one below caught my eye.
This image approximates Figure 15.2 in [1].
The nodes represent six factors of the thickness of rock formations and the edges are labeled with the correlations between factors. Only large correlations are shown. For example, in theory everything is correlated with "total," but carbonates are not significantly correlated with the total. Nonclastics divide into evaporites and carbonates; apparently nearly all the nonclastics in this data set were evaporites.
Notice that this example illustrates that correlation is not transitive. That is, if A is correlated with B and B is correlated with C, it does not follow that A is necessarily correlated with C.
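Here is a small worked example of non-transitivity, separate from the geology data. The three series below are made up for illustration: a and c are constructed to be uncorrelated, and b is their sum, so b is correlated with each of them. The pearson helper is a hypothetical name for a plain implementation of the sample correlation coefficient.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up data, not taken from the textbook.
a = [1, -1, 1, -1]
c = [1, 1, -1, -1]
b = [x + z for x, z in zip(a, c)]  # b = a + c = [2, 0, 0, -2]

r_ab = pearson(a, b)
r_bc = pearson(b, c)
r_ac = pearson(a, c)

print(f"corr(a, b) = {r_ab:.3f}")  # 0.707
print(f"corr(b, c) = {r_bc:.3f}")  # 0.707
print(f"corr(a, c) = {r_ac:.3f}")  # 0.000
```

Both a and c have correlation 1/√2 with b, yet they have zero correlation with each other.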
Making the graph
I made the graph above with GraphViz using the following code.
    graph G {
        layout=neato
        T [label="Total"      , pos="2.50, 5.00!"]
        S [label="Sand"       , pos="4.66, 3.75!"]
        C [label="Carbonates" , pos="4.66, 1.25!"]
        E [label="Evaporites" , pos="2.50, 0.00!"]
        N [label="Nonclastics", pos="0.39, 1.25!"]
        H [label="Shale"      , pos="0.39, 3.75!"]
        T -- S [label=" 0.24 "]
        T -- H [label=" 0.89 "]
        T -- N [label=" 0.84 "]
        T -- E [label=" 0.82 "]
        H -- N [label=" 0.69 "]
        H -- E [label=" 0.70 "]
        S -- C [label=" 0.45 "]
        N -- E [label=" 0.99 "]
    }
I've mostly used GraphViz to make graphs when I didn't care much about the layout. I've experimented with a few layout engines, but I hadn't tried specifying the node positions before.
The nodes in the original graph were arranged in a circle, so I tried the circo layout engine. This did not position the nodes in a circle. I also tried specifying the positions without the bang on the end, giving the positions as layout hints. GraphViz did not appreciate my suggestions and was certain that it knew better how to lay out the graph. But when I added the exclamation marks GraphViz acquiesced to my wishes.
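The pinned coordinates could also be computed rather than typed by hand. The following is a minimal sketch, not the method used for the graph above: it places the labels clockwise around a circle starting at the top and emits DOT source with the trailing exclamation marks. The function names circle_positions and to_dot, the radius, and the center are all assumptions chosen to roughly match the listing above, and the computed coordinates may differ slightly from the hand-entered ones. For brevity it uses the labels themselves as node names.

```python
import math

def circle_positions(labels, radius=2.5, center=(2.5, 2.5)):
    """Place labels clockwise around a circle, starting at the top."""
    n = len(labels)
    positions = {}
    for i, label in enumerate(labels):
        angle = math.pi / 2 - 2 * math.pi * i / n
        x = center[0] + radius * math.cos(angle)
        y = center[1] + radius * math.sin(angle)
        positions[label] = (x, y)
    return positions

def to_dot(positions, edges):
    """Build DOT source with pinned node positions for the neato engine."""
    lines = ["graph G {", "    layout=neato"]
    for label, (x, y) in positions.items():
        # The trailing "!" pins the node at the given position.
        lines.append(f'    "{label}" [pos="{x:.2f},{y:.2f}!"]')
    for a, b, r in edges:
        lines.append(f'    "{a}" -- "{b}" [label=" {r:.2f} "]')
    lines.append("}")
    return "\n".join(lines)

labels = ["Total", "Sand", "Carbonates", "Evaporites", "Nonclastics", "Shale"]
# Correlations copied from the graph above.
edges = [
    ("Total", "Sand", 0.24),
    ("Total", "Shale", 0.89),
    ("Total", "Nonclastics", 0.84),
    ("Total", "Evaporites", 0.82),
    ("Shale", "Nonclastics", 0.69),
    ("Shale", "Evaporites", 0.70),
    ("Sand", "Carbonates", 0.45),
    ("Nonclastics", "Evaporites", 0.99),
]

positions = circle_positions(labels)
dot_source = to_dot(positions, edges)
print(dot_source)
```

The printed text can be saved to a file and passed to GraphViz like any hand-written DOT file.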
GraphViz will create output in a variety of formats. I tried PNG and SVG. The SVG image above was 11 times smaller than the PNG output. One reason I started using SVG images more often is that they often result in smaller files. They also look very nice at multiple resolutions, e.g. on a desktop and on a mobile device.
Related posts
- Induced negative correlation
- Graphing Japanese prefectures
- Graph of probability distribution relationships
[1] Krumbein and Graybill. An Introduction to Statistical Models in Geology. McGraw-Hill, 1965.
The post Visualizing correlations with graphs first appeared on John D. Cook.