Please see here for a description of the data set used in this tutorial.
Outliers
High-content screening (HCS) data contain many outliers, often caused
by imperfect image-processing algorithms (for example, nuclei that are
mis-segmented during image analysis). Because most of the variables in
an HCS data set are derived, directly or indirectly, from the nuclei
identified by image analysis, it is a good idea to start by examining
the DNA content profiles to identify and remove outliers. Otherwise,
garbage in means garbage out: the bad measurements carry through and
clutter an otherwise sound analysis.
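One common way to screen DNA profiles can be sketched in Python. The sketch assumes that each nucleus has an integrated DNA-stain intensity (for example, Hoechst or DAPI). It also assumes that the tallest peak of the intensity histogram is the G1 (2N) population. Intensities are rescaled so that this peak sits at 2N, and nuclei far outside the expected 2N to 4N range are flagged. Such nuclei are likely debris or fragments below the range, and clumps or mis-segmented objects above it. The function names, the 50-bin histogram, and the 1.5N and 5N cut-offs are all hypothetical choices for illustration. They are not the method used in this tutorial.

```python
from typing import List, Sequence, Tuple


def estimate_g1_peak(dna: Sequence[float], bins: int = 50) -> float:
    """Return the centre of the most populated histogram bin.

    Assumption: in an asynchronously growing population the tallest peak
    of the DNA-content histogram corresponds to G1 (2N) nuclei.
    """
    lo, hi = min(dna), max(dna)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in dna:
        # Clamp the index so the maximum value lands in the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    peak = max(range(bins), key=counts.__getitem__)
    return lo + (peak + 0.5) * width


def flag_dna_outliers(
    dna: Sequence[float],
    low: float = 1.5,   # hypothetical lower cut-off, in units of N
    high: float = 5.0,  # hypothetical upper cut-off, in units of N
    bins: int = 50,
) -> Tuple[List[float], List[bool]]:
    """Scale DNA content so the G1 peak is 2N; keep nuclei within [low, high]."""
    g1 = estimate_g1_peak(dna, bins)
    if g1 <= 0:
        raise ValueError("integrated DNA intensities must be positive")
    scaled = [2.0 * x / g1 for x in dna]
    keep = [low <= s <= high for s in scaled]
    return scaled, keep


# Hypothetical intensities: five G1 nuclei, one G2/M nucleus,
# one small fragment and one clump of nuclei.
dna = [980, 1000, 1010, 1020, 990, 2000, 150, 6000]
scaled, keep = flag_dna_outliers(dna)
print(keep)  # [True, True, True, True, True, True, False, False]
```

The sketch uses the histogram mode instead of the mean or median to estimate the G1 position. The mode is much less affected by the G2/M population and by the very outliers being removed.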