Tasks: Two-Class Problem
Class 0 = crystal data types 1, 2, 3; Class 1 = crystal data types 7, 8, 9. Design a classifier to distinguish between Class 0 and Class 1 as defined above. That is, the input to the classifier will be an image from datasets 1, 2, 3, 7, 8, or 9, and you should output a label indicating a prediction of whether it is a type 1, 2, or 3 image or a type 7, 8, or 9 image. In your design, keep in mind that misclassifying Class 1 data is more undesirable than misclassifying Class 0 data.
We propose a two-step approach for discriminating data types 1, 2, and 3 (no crystals) from data types 7, 8, and 9 (crystals). The first step is a 1-D classification that screens out images that obviously contain no crystals. If an image is classified as Class 0 at this step, it is assigned the label '0' and the process is complete. If instead the image is classified as Class 1, it is passed on to the second step of our algorithm. This second step is a 2-D classification that attempts to discriminate between true Class 1 images and Class 0 images that were misclassified in Step 1. After this second step, the image is assigned a label of either '0' or '1' according to the 2-D classifier. The figure below illustrates the flow of our algorithm.
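The control flow of the two steps can be sketched in a few lines of Python. This is only an illustration, not the project's Matlab code: the function name `classify_two_step`, its parameters, and the use of a strict `<` comparison in Step 1 are assumptions made for the example.

```python
from typing import Callable, Sequence


def classify_two_step(
    num_objects: int,
    object_features: Sequence[float],
    step2_classifier: Callable[[Sequence[float]], int],
    threshold: float,
) -> int:
    """Return 0 (no crystals) or 1 (crystals) using a two-step cascade.

    All names here are hypothetical. The rule "fewer objects than the
    threshold means Class 0" is an assumption about how Step 1 compares
    the object count with its threshold.
    """
    # Step 1: 1-D screen on the number of segmented objects.
    if num_objects < threshold:
        return 0  # obviously no crystals; stop here
    # Step 2: only images that pass the screen reach the 2-D classifier.
    return step2_classifier(object_features)


# Usage with a stand-in Step 2 classifier that always answers Class 1.
always_crystal = lambda features: 1
print(classify_two_step(2, (0.0, 0.0), always_crystal, threshold=4))
print(classify_two_step(7, (0.0, 0.0), always_crystal, threshold=4))
```

Because Step 2 sees only the images that Step 1 labels Class 1, it can remove false alarms but can never recover an image that Step 1 has already missed.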

As mentioned above, the first step in our algorithm uses a single scalar feature. This feature, the number of objects in an image, is the same as Feature #5 used in Project Task 1; see the Project Task 1 write-up for a description of this feature. Before the feature can be extracted, the dish perimeter must be estimated and the image must be preprocessed.
In Project Task 1, the final classifier chosen was 1-D with Gaussian class-conditional densities having unequal means and variances, equal prior probabilities, and unequal costs. This resulted in a threshold of 10.8, meaning that an image with 10 or fewer objects was classified as data type 1 and any other image as data type 9. This threshold worked well for discriminating between data type 1 and data type 9 (see the Project Task 1 results), but with the additional data types (2, 3, 7, and 8) it results in an unacceptable number of missed crystal images. We tried recalculating the means and variances of Class 0 and Class 1, but the resulting threshold performed very poorly (it classified everything as Class 1). Next, we tried to set a false-alarm rate via Neyman-Pearson, but for false-alarm rates below 20% the resulting threshold was negative, so that decision rule was also useless. We therefore chose the threshold empirically, so as to minimize the total error (= 10*#miss + #false alarm (FA)) on the training data. This error is plotted below as a function of threshold; a threshold of 4 is by far the best choice for the training data.
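The empirical threshold search described above can be sketched as follows. The function names and the toy object counts are invented for illustration (they are not the project's training data), and the rule "Class 1 if the count is at least the threshold" is an assumed convention.

```python
def weighted_error(counts_class0, counts_class1, threshold, miss_cost=10):
    """Total error = miss_cost * #misses + #false alarms.

    Assumed rule: an image is called Class 1 when its object count is
    at least `threshold`, and Class 0 otherwise.
    """
    false_alarms = sum(1 for c in counts_class0 if c >= threshold)
    misses = sum(1 for c in counts_class1 if c < threshold)
    return miss_cost * misses + false_alarms


def best_threshold(counts_class0, counts_class1, candidates, miss_cost=10):
    """Return the candidate threshold with the smallest weighted error."""
    return min(
        candidates,
        key=lambda t: weighted_error(counts_class0, counts_class1, t, miss_cost),
    )


# Made-up object counts, purely to exercise the functions.
toy_class0 = [0, 1, 1, 2, 3, 5, 12]
toy_class1 = [3, 6, 9, 15, 20, 30]
t = best_threshold(toy_class0, toy_class1, range(0, 31))
print(t, weighted_error(toy_class0, toy_class1, t))
```

With a miss costing ten times as much as a false alarm, the search tends to settle on a low threshold: it accepts extra false alarms in order to avoid even a single miss.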
We assume that this threshold will also work well for the test data. The errors on the training data break down as follows:
Crystal data type 1: 10
Crystal data type 2: 10
Crystal data type 3: 73
Crystal data type 7: 0
Crystal data type 8: 0
Crystal data type 9: 1
Total = 10 + 10 + 73 + 10*1 = 103
As we can see, we make only one miss but many false alarms. In particular, data type 3 appears to be the most difficult to discriminate from data types 7, 8, and 9 based on the number of objects in the image. This is due to the various types of precipitate and “gunk” that characteristically appear in data type 3 images. These non-crystal heterogeneities cause many objects to be segmented during the extraction of our first feature. To the human eye, the segmented objects in data type 3 images are clearly not crystals, so we now look for additional features based on the properties of these objects. At first glance, objects from data type 3 tend to be darker and smoother than objects from data types 7, 8, or 9. This leads to the two additional features discussed below.
The second step uses the properties of the objects segmented from the thresholded image during the extraction of Feature 1 in Step 1 (see examples). If an image contains crystals, the segmented objects correspond to the crystals, whereas the objects obtained from Class 0 images correspond to non-crystal precipitates. Hence, we expect the properties (features) of objects from Class 0 and Class 1 to differ significantly, so that they can be used to differentiate between the two classes. Matlab's 'regionprops' command is applied to the object labels to determine which pixels belong to each object.
We use two additional features in Step 2. Feature 2 is the average of the objects' pixel intensities: first the average pixel intensity is calculated for each object in an image, and then these per-object averages are averaged over all objects in the image, giving a single scalar value. Feature 3 is the average of the objects' sums of local variances: first the sum of local variances is calculated for each object in an image, and then these per-object sums are averaged over all objects in the image, again giving a single scalar value.
The histograms of these two features for the training data are shown below. Note that only samples that were classified as Class 1 in Step 1 are included here.
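As a rough illustration of how Features 2 and 3 are assembled from a label map, here is a small pure-Python sketch. It is not the project's Matlab implementation: the function names are hypothetical, the label map is assumed to use 0 for background and a positive integer per object, and the local variance is assumed to be taken over a 3x3 neighbourhood, a window size the text does not specify.

```python
from statistics import mean, pvariance


def local_variance(image, r, c):
    """Variance of the 3x3 neighbourhood of (r, c), clipped at the border.

    The 3x3 window is an assumption made for this sketch.
    """
    rows, cols = len(image), len(image[0])
    window = [
        image[i][j]
        for i in range(max(r - 1, 0), min(r + 2, rows))
        for j in range(max(c - 1, 0), min(c + 2, cols))
    ]
    return pvariance(window)


def object_features(image, labels):
    """Return (Feature 2, Feature 3) for one image.

    image  : 2-D list of grey levels
    labels : 2-D list of the same shape; 0 = background, k > 0 = object k
    """
    # Group pixel coordinates by object label.
    pixels_by_object = {}
    for r, row in enumerate(labels):
        for c, k in enumerate(row):
            if k > 0:
                pixels_by_object.setdefault(k, []).append((r, c))
    if not pixels_by_object:
        raise ValueError("image contains no segmented objects")

    # Feature 2: per-object mean intensity, then averaged over objects.
    mean_intensities = [
        mean(image[r][c] for r, c in pixels)
        for pixels in pixels_by_object.values()
    ]
    # Feature 3: per-object sum of local variances, then averaged over objects.
    variance_sums = [
        sum(local_variance(image, r, c) for r, c in pixels)
        for pixels in pixels_by_object.values()
    ]
    return mean(mean_intensities), mean(variance_sums)


# Tiny example: two one-pixel objects with grey levels 0 and 10.
print(object_features([[0, 10]], [[1, 2]]))
```

Averaging per object first and then over objects gives every object equal weight regardless of its size, which is how the two features are defined above.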


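We incorporate these two features into a 2-D classifier with Gaussian class-conditional densities having unequal means and variances and unequal prior probabilities. The costs, as assigned in the problem, are c00 = c11 = 0, c10 = 1, and c01 = 10. The prior probabilities are chosen empirically, resulting in the constant term in our discriminant function, (log(p1*c01) - log(p0*c10)), being log(6.0215). Written in the standard quadratic form for Gaussian densities, the decision boundary is
$$g(\mathbf{x}) = \mathbf{x}^T W \mathbf{x} + \mathbf{w}^T \mathbf{x} + w_0 = 0,$$
where
$$W = -\tfrac{1}{2}\left(\Sigma_1^{-1} - \Sigma_0^{-1}\right),$$
$$\mathbf{w} = \Sigma_1^{-1}\boldsymbol{\mu}_1 - \Sigma_0^{-1}\boldsymbol{\mu}_0,$$
$$w_0 = -\tfrac{1}{2}\boldsymbol{\mu}_1^T \Sigma_1^{-1} \boldsymbol{\mu}_1 + \tfrac{1}{2}\boldsymbol{\mu}_0^T \Sigma_0^{-1} \boldsymbol{\mu}_0 - \tfrac{1}{2}\ln\frac{|\Sigma_1|}{|\Sigma_0|} + \ln(p_1 c_{01}) - \ln(p_0 c_{10}).$$
Here $\mathbf{x}$ is the vector (Feature 2, Feature 3), and $\boldsymbol{\mu}_i$ and $\Sigma_i$ are the mean vector and covariance matrix of Class $i$; an image is labeled Class 1 when $g(\mathbf{x}) > 0$.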
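A minimal Python sketch of a 2-D Gaussian minimum-risk rule of this kind is given below. The function names and the toy means and covariances are assumptions for illustration only; the only value taken from the text is the constant term log(6.0215).

```python
import math


def inverse_and_det_2x2(m):
    """Return the inverse and determinant of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], det


def log_gaussian(x, mu, cov):
    """Log of a 2-D Gaussian density evaluated at x."""
    inv, det = inverse_and_det_2x2(cov)
    dx = [x[0] - mu[0], x[1] - mu[1]]
    mahalanobis = (
        dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
        + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1])
    )
    return -0.5 * mahalanobis - 0.5 * math.log(det) - math.log(2 * math.pi)


def discriminant(x, mu0, cov0, mu1, cov1, log_bias=math.log(6.0215)):
    """g(x) = log p(x|1) - log p(x|0) + log(p1*c01) - log(p0*c10)."""
    return log_gaussian(x, mu1, cov1) - log_gaussian(x, mu0, cov0) + log_bias


def classify(x, mu0, cov0, mu1, cov1):
    """Label 1 when g(x) > 0, otherwise 0."""
    return 1 if discriminant(x, mu0, cov0, mu1, cov1) > 0 else 0


# Toy parameters (not the project's fitted values).
identity = [[1.0, 0.0], [0.0, 1.0]]
mu0, mu1 = (0.0, 0.0), (4.0, 4.0)
for point in [(0.0, 0.0), (2.0, 2.0), (4.0, 4.0)]:
    print(point, classify(point, mu0, identity, mu1, identity))
```

In this toy setting the point midway between the two means is assigned to Class 1, because the positive constant term shifts the boundary toward Class 0. That is the intended effect of making misses more costly than false alarms.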
The scatter plot of our training data is shown below, with our resulting decision boundary superimposed.
We can see that we have significantly reduced the number of false alarms without introducing additional misses (see the results on the training data below). Using Step 2 in conjunction with Step 1 reduces our total error (= 10*#miss + #FA) by about 25%.
We test our classifier on all images of data types 1-3 and 7-9. The number of errors made for each data type is shown below.
Crystal data type 1: 8
Crystal data type 2: 10
Crystal data type 3: 50
Crystal data type 7: 0
Crystal data type 8: 0
Crystal data type 9: 1
Total = 10*1 + (8 + 10 + 50) = 78
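The two error totals reported in this write-up, and the reduction between them, can be checked with a short calculation. The sketch assumes that errors on data types 1-3 are false alarms and errors on data types 7-9 are misses, and that the two breakdowns refer to comparable image sets; the function and variable names are illustrative.

```python
MISS_COST = 10  # a miss counts ten times as much as a false alarm


def total_error(errors_by_type, miss_cost=MISS_COST):
    """Weighted error = miss_cost * #misses + #false alarms."""
    false_alarms = sum(errors_by_type[t] for t in (1, 2, 3))  # Class 0 types
    misses = sum(errors_by_type[t] for t in (7, 8, 9))        # Class 1 types
    return miss_cost * misses + false_alarms


step1_only = {1: 10, 2: 10, 3: 73, 7: 0, 8: 0, 9: 1}
both_steps = {1: 8, 2: 10, 3: 50, 7: 0, 8: 0, 9: 1}

before = total_error(step1_only)   # 103
after = total_error(both_steps)    # 78
print(before, after, round((before - after) / before, 3))
```

The relative reduction is 25/103, a little over 24%, which agrees with the "about 25%" figure quoted for adding Step 2.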
Function CLASSIFIER2CLASS - distinguishes Class 0 images (types 1, 2, 3) from Class 1 images (types 7, 8, 9)
Input: crystal image in standard JPEG format
Output: label assigned to image
Usage: label = classifier2class('1a/i_1_14.jpg');