Categorize by Colors...

Painting is an important operation in Argos for exploring data. It is used to integrate information across different plots, and it can be used to specify the data you would like to retrieve for further exploration. A categorical variable, Observation color, lets you treat the current coloring scheme as a regular variable, so you can use it wherever a categorical variable can be used. However, the coloring scheme in a plot changes constantly as you carry out painting operations. If you would like to preserve a particular coloring scheme in a plot, there are a couple of ways to do so. One way is to extract, for each color in the plot, the observations of that color and form a new data set with a meaningful name. The other way is to use Categorize by Colors..., which lets you define a new categorical variable whose categories are determined by the coloring scheme in a plot at the moment the variable is defined.

For example, Figure 13-8 is a DNA profile plot of 19,276 cells that a cell biologist painted in 3 colors to indicate 3 groups of cells in different phases of the cell cycle. Red cells are most likely in the 2N phase, green cells are most likely in the S phase, and cyan cells are most likely in the 4N phase.

Figure 13-8. A DNA profile plot of 19,276 cells

---> images/hkf-categorize-by-colors-dna-profile.png <---

Categorize by Colors... can be used to define a new variable, "cell cycle phase", based on this coloring scheme. When invoked, Categorize by Colors... pops up a dialog in which you specify a name for the new categorical variable:

---> images/hkf-categorize-by-colors-dialog-initial.png <---

Only the observation colors present in the displayed data of the plot on which Categorize by Colors... is invoked are listed in this dialog. You can assign a meaningful name to each category in the text field to the right of its color rectangle. Here is the same dialog with the variable and category names filled in:

---> images/hkf-categorize-by-colors-dialog-filledl.png <---

Variable and category names are case-sensitive.

Let D be a data set at the root level of the scenegraph, P be a plot somewhere in the branch rooted at D, and Ds be the data set displayed in P. Data set Ds is a subset of data set D; it may contain all of the observations in D or only some of them. When Categorize by Colors... is invoked on plot P, only the observations in Ds take on one of the defined categories of the new categorical variable; all observations that are in D but not in Ds are assigned the missing value code "NA".
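The rule for a plain invocation can be sketched in a few lines of Python. This is not Argos's implementation; it is a minimal sketch that assumes a hypothetical data model in which every observation in D has an id and a current color, Ds is given as a set of ids, and the user has entered a category name for every color listed in the dialog. The function name `categorize_by_colors` is hypothetical.

```python
NA = "NA"  # missing value code described in the text


def categorize_by_colors(colors, displayed_ids, category_names):
    """Hypothetical sketch of Categorize by Colors... invoked on plot P.

    colors: dict mapping every observation id in D to its current color.
    displayed_ids: set of observation ids in Ds (a subset of D).
    category_names: dict mapping each color found in Ds to its category name
        (assumed: the user named every color listed in the dialog).
    Returns a dict mapping every observation id in D to a category name or NA.
    """
    result = {}
    for obs_id, color in colors.items():
        if obs_id in displayed_ids:
            # Observations in Ds take the category named for their color.
            result[obs_id] = category_names[color]
        else:
            # Observations in D but not in Ds get the missing value code.
            result[obs_id] = NA
    return result


colors = {1: "red", 2: "green", 3: "cyan", 4: "red"}
names = {"red": "2N", "green": "S", "cyan": "4N"}
print(categorize_by_colors(colors, {1, 2, 3}, names))
# {1: '2N', 2: 'S', 3: '4N', 4: 'NA'}
```

Observation 4 is red, but because it is not in Ds it still receives "NA".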

When Shift-<Categorize by Colors...> [1] is invoked on plot P, the list of colors that define the new categorical variable is still determined by data set Ds, but every observation in data set D, not just those in Ds, can take on one of the defined categories of the new categorical variable. If the color of an observation in D is not one of the category-defining colors in Ds, that observation is assigned the missing value code "NA".
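The Shift variant differs only in which observations are eligible for a category. The sketch below is again hypothetical and not Argos code; it assumes the same model as before (a dict of observation ids to current colors for all of D, a set of ids for Ds, and a category name for every color in Ds). The function name `categorize_by_colors_shift` is an illustrative assumption.

```python
NA = "NA"  # missing value code described in the text


def categorize_by_colors_shift(colors, displayed_ids, category_names):
    """Hypothetical sketch of Shift-<Categorize by Colors...> on plot P.

    colors: dict mapping every observation id in D to its current color.
    displayed_ids: set of observation ids in Ds (a subset of D).
    category_names: dict mapping each color found in Ds to its category name.
    """
    # The category-defining colors come only from the displayed data Ds.
    defining_colors = {colors[obs_id] for obs_id in displayed_ids}
    result = {}
    for obs_id, color in colors.items():
        if color in defining_colors:
            # Any observation in D whose color defines a category gets it.
            result[obs_id] = category_names[color]
        else:
            # Colors that do not appear in Ds map to the missing value code.
            result[obs_id] = NA
    return result


colors = {1: "red", 2: "green", 3: "red", 4: "yellow"}
names = {"red": "2N", "green": "S"}
print(categorize_by_colors_shift(colors, {1, 2}, names))
# {1: '2N', 2: 'S', 3: '2N', 4: 'NA'}
```

Observation 3 lies outside Ds but is red, so it still receives "2N"; observation 4 is yellow, a color absent from Ds, so it receives "NA".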

Categorize by Colors... is active (not grayed out) in the right-click menu of a plot if and only if the displayed data contain 2 or more colors.
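The enabling rule amounts to counting distinct colors in the displayed data. As a hypothetical sketch (not Argos code), assuming the displayed data are available as a list of current observation colors, the check could look like this; `categorize_menu_enabled` is an illustrative name.

```python
def categorize_menu_enabled(displayed_colors):
    """Hypothetical check: is Categorize by Colors... enabled for a plot?

    displayed_colors: iterable of the current colors of the observations
    in the plot's displayed data set.
    """
    # Enabled only when at least 2 distinct colors are present.
    return len(set(displayed_colors)) >= 2


print(categorize_menu_enabled(["red", "red", "green"]))  # True
print(categorize_menu_enabled(["red", "red"]))  # False
print(categorize_menu_enabled([]))  # False
```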

Notes

[1]

The Shift key is held down when Categorize by Colors... is invoked.