Parallel Coordinate Plot

High-dimensional data are hard to visualize because we are used to a three-dimensional world. Many techniques have been developed to help visualize this kind of data, and most of them eventually map high-dimensional points onto 2D points for display on a flat surface. Parallel coordinate plots take a different approach. Since more than three mutually perpendicular axes cannot be drawn, a parallel coordinate plot draws all the axes parallel to each other and equally spaced in a 2D plane.

For simplicity, the distance between adjacent axes is taken to be 1.

In this approach, a point in a p-dimensional space is represented as a series of line segments in a 2-dimensional space. Thus, if the original data observation is written as (x1, x2, ..., xp), its parallel coordinate representation is the p - 1 line segments connecting the p points (1, x1), (2, x2), ..., (p, xp). This unbroken series of line segments can be thought of as the profile of a given observation. The shape of the profile conveys information about the levels of the p variables. Observations with similar data values across all variables will share similar profiles, so clusters of similar observations can be discerned. Associations among variables can also be visualized: two variables that are negatively related (one decreases as the other increases) will be connected by line segments that all cross in the region between the two axes, while two positively related variables will be connected by roughly parallel line segments that do not cross between the axes.
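The mapping from an observation to its profile can be sketched in a few lines of Python. This is only an illustration of the geometry described above, not how Argos draws its plots; the function names profile_vertices, profile_segments, and height_at are hypothetical, and the sample data are made up. The last part checks the crossing behaviour for two variables related by x2 = 10 - x1.

```python
def profile_vertices(observation):
    """Return the vertices (i, x_i) of a profile, with axes at x = 1, 2, ..., p."""
    return [(i, value) for i, value in enumerate(observation, start=1)]


def profile_segments(observation):
    """Return the line segments joining consecutive vertices of the profile."""
    vertices = profile_vertices(observation)
    return list(zip(vertices, vertices[1:]))


def height_at(segment, x):
    """Linearly interpolate the height of a segment at horizontal position x."""
    (x0, y0), (x1, y1) = segment
    t = (x - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


# A 4-dimensional observation gives 4 vertices joined by 3 segments.
observation = (5.1, 3.5, 1.4, 0.2)
vertices = profile_vertices(observation)
segments = profile_segments(observation)

# Made-up observations in which the second variable equals 10 minus the first.
negatively_related = [(1, 9), (4, 6), (7, 3)]
# Height of each profile halfway between the two axes (x = 1.5).
midpoint_heights = [
    height_at(profile_segments(obs)[0], 1.5) for obs in negatively_related
]
```

Here every entry of midpoint_heights is the same value, so all three segments pass through a single point between the axes, which is the crossing pattern described for negatively related variables.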

There are in general two ways to draw a parallel coordinate plot. One is to use a common vertical range that encompasses all the values of all the variables being visualized. Figure 11-9 is such an example. Parallel coordinate plots drawn in this way convey the relative magnitudes of the variable values. The other way is for each axis to use its own scale, so that the minimum and the maximum values of the variable at that axis are mapped to the bottom and the top of the available drawing room. Parallel coordinate plots drawn in this way are called stretched in this manual and are good for detecting clusters visually. Figure 11-10 is a stretched parallel coordinate plot displaying the same data as Figure 11-9.

Figure 11-10. A stretched parallel coordinate plot

---> images/plot-pcp-stretched-example.png <---
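The difference between the two scaling conventions can be illustrated with a minimal Python sketch. It assumes the data are given as a list of rows (one row per observation) and maps values to the interval [0, 1], with 0 standing for the bottom of the drawing room and 1 for the top. The function names common_scale and stretched_scale and the sample data are hypothetical and are not part of Argos.

```python
def _rescale(value, lo, hi):
    """Map value from [lo, hi] to [0, 1]; a constant range maps to the middle."""
    if hi == lo:
        return 0.5
    return (value - lo) / (hi - lo)


def common_scale(data):
    """Scale every value with one vertical range shared by all variables."""
    lo = min(min(row) for row in data)
    hi = max(max(row) for row in data)
    return [[_rescale(v, lo, hi) for v in row] for row in data]


def stretched_scale(data):
    """Scale each variable (column) with its own minimum and maximum."""
    columns = list(zip(*data))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [_rescale(v, lo, hi) for v, lo, hi in zip(row, lows, highs)]
        for row in data
    ]


# Made-up data: the second variable is two orders of magnitude larger.
data = [[1, 100], [2, 200], [3, 400]]
common = common_scale(data)
stretched = stretched_scale(data)
```

With a common range, the first variable is squeezed near the bottom of the plot because the second variable dominates the range. With stretched scaling, each variable spans the full height, which is why that form tends to make clusters easier to see at the cost of losing relative magnitudes across variables.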

Overstriking can frequently be a problem for parallel coordinate plots. Each blue pixel in Figure 11-9 only tells you there is at least one observation profile passing through it. A blue pixel that appears only in one observation profile looks exactly the same as a blue pixel that appears in 1000 observation profiles. Argos provides several ways to detect overstriking in a parallel coordinate plot. For example,

Notes

[1]

It is easy to find out which variable an axis represents. Moving the cursor close to a tic label on the horizontal axis causes the name of the associated variable to pop up in a small window like this:

---> images/pcp-tic-tip.png <---

Invoking List X Tic Labels generates a list of all variable names.