Preselected target groups were presented in one of three conditions: in the context of a typical latent fingerprint; in the context of a high-clarity plain (not rolled) exemplar (Footnote 1); or as a small area cropped from the plain exemplar, without surrounding context (Fig. 1). Examiners then localized the designated target group in a second impression that they were told was from the same source (mated); the mated impression was the same in all three conditions. Using preselected target groups and mated impressions allows us to isolate localization behavior from other elements of the examination, such as target selection and the decision process that results in a conclusion. This task was part of a larger experiment that included other tasks, such as counting and tracing ridges, as well as easy and difficult comparisons with realistic latent prints.
An EyeLink 1000 eye tracker was used to track eye movements while examiners analyzed and compared fingerprint images on computer screens. Instructions to participants were as follows:
“Find the target group — You will be shown an image on the left with a target group already identified/indicated (a square about eight ridges on a side). The target group may be shown as an outlined area within a full exemplar or latent, or as a small cropped area shown by itself. You will then be shown a rolled exemplar from the same source (same finger, same subject, with the same orientation). Find the indicated target group, and when you are confident that you have found it, say DONE. The target group will always be present. (The purpose is so that we can understand in isolation the behavior of your eyes when memorizing a group of features, and searching for that group of features — which we expect to see often in standard comparisons. We are not trying to trick you (the source is always there): we are trying to see how you memorize and search for a target group.)”
Fingerprint images
Eight image sets were selected from the “Ground Truth” dataset distributed with the Federal Bureau of Investigation’s (FBI) Universal Latent Workstation (ULW, n.d.). Each image set was constructed from three impressions of the same finger (latent, plain (flat) exemplar, rolled exemplar), scanned at 1000 pixels per inch (39.4 pixels per millimeter (ppmm)) in accordance with the prevailing standard (ANSI/NIST-ITL, 2013); metric equivalents are therefore rounded. When presented to an examiner, the image on the left was either (a) the latent image, (b) the plain exemplar image, or (c) a small area cropped from the plain exemplar image; the mated, high-clarity rolled exemplar was always shown on the right side of the screen. The images from two image sets are shown in Fig. 1.
A small area of each finger was selected as the “target group.” This target group was shown outlined in yellow on the latent and plain images; the cropped image was identical to the target group outlined on the plain image, but without the surrounding context. Each target group was a 150 × 150 pixel square (3.8 mm × 3.8 mm; approximately eight ridges across, assuming an average ridge-to-ridge distance of 0.56 mm). Because of the plasticity of the skin, the areas on the latent and plain images were not strictly identical. Eight image sets (FT1–FT8) were defined; within each set, three image pairs were constructed, with the left image varying by task type (latent, plain, cropped) and the same corresponding rolled exemplar used as the right image. We refer to image pairs as FT1Latent, FT1Plain, FT1Crop, etc. The target area shown in the plain and cropped tasks was identical except for the surrounding visual context included in the plain task, so that comparing plain with cropped isolates the effect of context.
The target areas were selected to provide a moderate amount of pattern-level information, so that the task was tractable but not obvious. We considered selecting low-information target groups (e.g., areas where ridges are relatively parallel with few minutiae, or targets cropped from the latent prints), but rejected that approach because participants might not have had enough information to complete the task. Conversely, we also avoided highly distinctive areas: we stopped using target groups FT7 and FT8 because they were too obvious (clearly core and delta formations, respectively).
In order to isolate the specific task of finding the target and to determine the role of context, we deliberately used only fingerprints from the same source (mated image pairs), and told the examiners that they were mated. Using only mated image pairs was necessary: if examiners had also been deciding whether the images were mated, they presumably would have compared regions outside the target group in detail. This would have turned our task into a full examination rather than the intended subset of the comparison process.
When assessing whether a fixation fell within the target area, we added a margin of 30 pixels (just over one ridge width) around the 150 × 150 pixel target to allow for factors such as eye-tracker measurement imprecision (e.g., calibration error), foveal field of view, distortion, and skin elasticity.
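In other words, a fixation counts as being in the target area if its centroid falls within the target square expanded on all sides by 30 pixels. A minimal sketch of that test follows (illustrative Python, not the study's MATLAB code; the coordinate convention and function name are ours).

```python
def in_target(fix_x, fix_y, target_x, target_y, size=150, margin=30):
    """Return True if a fixation centroid lies within the padded target box.

    Coordinates are in image pixels; (target_x, target_y) is the upper-left
    corner of the 150 x 150 pixel target group.
    """
    return (target_x - margin <= fix_x <= target_x + size + margin and
            target_y - margin <= fix_y <= target_y + size + margin)

# Example: a fixation 20 pixels to the left of the target box still counts.
print(in_target(480, 600, 500, 580))  # True
```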
Data collection setup
Figure 2 shows a typical test setup. Examiners viewed the images on a ViewSonic VX2452mh LCD monitor at 1080p (1920 × 1080) resolution, running at 60 Hz with a 5-ms luminance-change time constant. Examiners were positioned using a chinrest so that the distance from eye to monitor was 70 cm. At this viewing distance and monitor resolution, there are approximately 50 screen pixels per degree of visual angle (edge to edge of the monitor was about 38°; the average distance between the centers of the left and right images was about 19°).
Presentation code was written in MATLAB (MathWorks, 2012) using functions from the Psychtoolbox for image presentation (Kleiner, Brainard, & Pelli, 2007; Pelli, 1997) and the Eyelink Toolbox for coordination with the eye tracker (Cornelissen, Peters, & Palmer, 2002). The software interface allowed examiners to zoom in and out at the mouse location using keyboard presses; keyboard presses were also used to pan the image. The two images were presented separately on each half of the monitor and could be zoomed independently. A software tool to mark and link features such as minutiae was available, and was used in 10% of the target-finding trials. Contrast inversion was also available but was seldom used for the present task. The default zoom level was 1:1 (one screen pixel to one image pixel); because the screen resolution was 3.7 ppmm and the image resolution was 39.4 ppmm, this corresponded to an effective magnification of approximately 11×. The current zoom levels and markup mode were displayed in the upper-left portion of the monitor, as shown in Fig. 2; fixations near that textbox were excluded from our analyses. No other user-interface elements (e.g., scrollbars, toolbars, buttons) were displayed on the screen.
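For reference, the arithmetic behind these viewing-geometry figures is straightforward. The short sketch below is illustrative Python (the study's presentation and analysis code was written in MATLAB) and uses only values quoted in the text, which are rounded.

```python
# All values below are the rounded figures quoted in the text.
SCREEN_PX_WIDE   = 1920   # horizontal monitor resolution
SCREEN_ANGLE_DEG = 38     # approximate edge-to-edge viewing angle at 70 cm
SCREEN_PPMM      = 3.7    # screen pixels per millimeter
IMAGE_PPMM       = 39.4   # image pixels per millimeter (1000-ppi scans)

px_per_degree = SCREEN_PX_WIDE / SCREEN_ANGLE_DEG   # ~50 screen pixels per degree
magnification = IMAGE_PPMM / SCREEN_PPMM            # ~11x effective magnification at 1:1 zoom

print(f"~{px_per_degree:.1f} px/degree; ~{magnification:.1f}x magnification at 1:1")
```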
At the start of each trial, the left image was displayed. When the examiner verbally indicated that they were ready to proceed to the Comparison phase of the trial, the experimenter displayed the right image. When the examiner said that they had found the target, the experimenter ended the trial and triggered the final drift-correction dots.
Gaze location (from both eyes) was recorded at 1 kHz using the EyeLink 1000 eye tracker. The EyeLink camera and illuminator were positioned on the table immediately in front of the monitor, and could be adjusted left or right if necessary to eliminate glare on glasses. We generally used 75% illumination within the EyeLink system, but bifocal contacts or thick glasses that tended to block the infrared illumination sometimes required 100% illumination.
Calibration
At the start of, and periodically during, the experiment, the experimenter calibrated the eye tracker. Calibration was done within the EyeLink software using 13 calibration dots; gaze locations were tracked using the centroid measurement technique. Calibration was re-attempted as necessary, with a goal of an average calibration error of 0.5° for both eyes. A value of up to 0.8° was considered acceptable; one examiner was only able to achieve 0.94° and 0.97° after seven calibrations but was still included in the analyses. The calibration error goal of 0.5° corresponds to about ± one ridge (25 screen pixels) at a 1:1 zoom level (68% of fixations in this task were at 1:1). The foveal field of view (the area of high acuity) typically corresponds to about 2°, or a diameter of about 100 screen pixels; at a 1:1 zoom level, this is 2.5 mm in image coordinates, or about 4.5 ridges.
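These degree-to-ridge conversions follow directly from the viewing-geometry figures given earlier. The short sketch below (illustrative Python) makes the arithmetic explicit, assuming the 0.56-mm average ridge-to-ridge distance quoted above.

```python
PX_PER_DEG = 50     # screen pixels per degree of visual angle (see above)
IMAGE_PPMM = 39.4   # image pixels per millimeter
RIDGE_MM   = 0.56   # average ridge-to-ridge distance quoted earlier

def degrees_to_ridges(deg, zoom=1.0):
    """Convert a visual angle to ridge units at the given screen:image zoom factor."""
    screen_px = deg * PX_PER_DEG
    image_px = screen_px / zoom        # at 1:1 zoom, screen pixels equal image pixels
    return image_px / IMAGE_PPMM / RIDGE_MM

print(degrees_to_ridges(0.5))   # calibration goal: ~1.1 ridges
print(degrees_to_ridges(2.0))   # foveal field of view: ~4.5 ridges
```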
In addition to the overall calibration that was managed within the eye tracker, we collected additional data that allowed us to correct for gaze drift in post-processing. Each trial began and ended with seven drift-correction dots presented in sequence; the examiner was instructed to fixate on each dot of known location for 1.25 s. Post-processing corrected for systematic drift on each trial by comparing the gaze points that fell within a threshold distance (2°) of each dot during its 1.25-s interval against the ground-truth location. A clustering algorithm based on the mean shift algorithm (Cheng, 1995) was used to determine the largest cluster of at least 20 successive 1-kHz gaze points within the 2° radius; this cluster center was taken as the intended location of gaze for that drift-correction dot. Working with raw gaze points takes advantage of the density of the gaze points (potentially spanning multiple fixations and micro-fixations) to determine the intent of gaze during the drift-correction procedure. Final drift correction used a QR decomposition (Francis, 1961) to fit a second-order polynomial transformation in both dimensions (six total coefficients) that allowed for translation, rotation, and scaling of the gaze points to the ground-truth locations. This transformation was then applied to all fixations extracted from the raw gaze stream, as described next.
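As an illustration of the fitting step, the sketch below (illustrative Python, not the study's MATLAB code) solves a least-squares mapping from the measured dot-cluster centers to the known dot locations using a QR decomposition. For brevity it shows the six-coefficient case in which each corrected coordinate is a linear function of the measured coordinates (translation, rotation, scaling), omits the mean-shift clustering step, and uses hypothetical dot positions.

```python
import numpy as np

def fit_drift_correction(measured_xy, true_xy):
    """Fit corrected = A @ coeffs by least squares, solved via QR decomposition.

    measured_xy: (n, 2) cluster centers of gaze near each drift-correction dot.
    true_xy:     (n, 2) known screen locations of those dots.
    Returns a (3, 2) coefficient matrix, one column per output dimension.
    """
    n = measured_xy.shape[0]
    A = np.column_stack([np.ones(n), measured_xy[:, 0], measured_xy[:, 1]])
    Q, R = np.linalg.qr(A)                      # QR-based least-squares solve
    return np.linalg.solve(R, Q.T @ true_xy)

def apply_drift_correction(coeffs, xy):
    A = np.column_stack([np.ones(len(xy)), xy[:, 0], xy[:, 1]])
    return A @ coeffs

# Hypothetical example: seven dots (illustrative positions only), with a simulated
# 1% scale error and a small translational drift in the measured gaze.
true_dots = np.array([[200, 200], [960, 200], [1720, 200], [960, 540],
                      [200, 880], [960, 880], [1720, 880]], dtype=float)
measured = true_dots * 1.01 + np.array([6.0, -4.0])
coeffs = fit_drift_correction(measured, true_dots)
corrected = apply_drift_correction(coeffs, measured)
print(np.max(np.abs(corrected - true_dots)))    # ~0 after correction
```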
Fixation extraction
Each trial consists of a series of eye locations over time: an (x,y,t) path of raw 1-kHz data, which was processed to differentiate saccades from fixations. Because visual information is only coarsely represented during saccades (Ross, Morrone, Goldberg, & Burr, 2001), and because much of the relevant information for fingerprint comparisons is found in small details (minutiae), fixations were the fundamental unit of analysis for most of the study results. The raw gaze stream (1-kHz sample data) was partitioned into fixations and saccades using the following approach. We used the Engbert-Mergenthaler saccade detector (Engbert & Mergenthaler, 2006) with a velocity threshold of 8 pixels/s (0.16°/s) and a minimum saccade duration of 9 ms to identify long saccades. As in Port, Trimberger, Hitzeman, Redick, and Beckerman (2016), a saccade required that the eye remain at rest for at least 20 ms within a ± 0.25° X-Y positional window. The Engbert-Mergenthaler detector does a very good job of detecting saccades in which the eye moves more than 1.5° (Port et al., 2016). However, saccades of less than 1.5° tend to include a mixture of what subjectively appear to be even shorter saccades and fixations that are close together. Differentiating saccades from fixations is further complicated by the fact that fingerprint examiners may make very regular, closely spaced fixations when counting or following ridges. A single saccade detector with a high velocity threshold risks grouping several close fixations into a single long fixation that may not accurately represent the detailed behavior, whereas a low threshold risks producing many spurious fixations and saccades. We therefore modified the Engbert-Mergenthaler saccade detector using a variation of the double-threshold algorithm, which is popular in many image-processing applications (e.g., Chen, Sun, Heng, & Xia, 2008) in which distance can complement a threshold applied to some other value, such as image intensity. Similar approaches have been used for eye-tracking analyses of fingerprint examiners (Busey et al., 2013, 2015; Parada et al., 2015).
To apply the double-threshold algorithm, the period between long saccades was further divided into one or more fixations using the following approach. First, we created a set of candidate fixations by re-applying the Engbert-Mergenthaler saccade detector with a lower velocity threshold of 3 pixels/s (0.06°/s) and the same minimum saccade duration of 9 ms. The lower threshold has the advantage of finding more short saccades, at the risk of spuriously splitting true fixations where the eye does not travel far enough for the movement to be a true saccade. If two contiguous candidate fixations had centroids within a minimum distance, they were merged, under the assumption that the low velocity threshold may have inadvertently split a true fixation. We selected a minimum distance of 0.35°, which is more conservative than the 1° fixation radius recommended by Blignaut (2009), but tends to preserve closely spaced fixations rather than grouping them into a larger fixation. This approach was necessary because examiners at times made deliberate, closely spaced fixations, such as when counting or following individual ridges. Figure 3 illustrates the resulting fixations for an example with spatially close fixations.
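As an illustration of this two-stage procedure, the following sketch (illustrative Python, not the study's MATLAB pipeline) substitutes a simple fixed velocity threshold for the Engbert-Mergenthaler detector and applies the double-threshold segmentation, centroid-distance merging, and minimum-duration rules described above and in the next paragraph. The constants are the values quoted in the text; the toy gaze trace is noise-free because a naive velocity estimate would not cope with real eye-tracker noise.

```python
import numpy as np

SAMPLE_HZ   = 1000        # gaze samples per second
HIGH_THRESH = 8.0         # pixels/s: long-saccade velocity threshold (quoted above)
LOW_THRESH  = 3.0         # pixels/s: lower threshold for candidate short saccades
MERGE_PX    = 0.35 * 50   # 0.35 degrees at ~50 screen pixels per degree
MIN_FIX_MS  = 80          # minimum fixation duration

def slow_runs(x, y, thresh):
    """Return (start, end) sample indices of runs where gaze speed stays below thresh."""
    speed = np.hypot(np.gradient(x), np.gradient(y)) * SAMPLE_HZ
    runs, start = [], None
    for i, is_slow in enumerate(speed < thresh):
        if is_slow and start is None:
            start = i
        elif not is_slow and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(x)))
    return runs

def centroid(x, y, a, b):
    """Median-based centroid of the raw gaze points in samples a..b."""
    return float(np.median(x[a:b])), float(np.median(y[a:b]))

def double_threshold_fixations(x, y):
    fixations = []
    for lo, hi in slow_runs(x, y, HIGH_THRESH):            # intervals between long saccades
        candidates = [[lo + a, lo + b] for a, b in slow_runs(x[lo:hi], y[lo:hi], LOW_THRESH)]
        merged = []
        for cand in candidates:                            # merge closely spaced candidates
            if merged:
                cx0, cy0 = centroid(x, y, *merged[-1])
                cx1, cy1 = centroid(x, y, *cand)
                if np.hypot(cx1 - cx0, cy1 - cy0) < MERGE_PX:
                    merged[-1][1] = cand[1]
                    continue
            merged.append(cand)
        fixations.extend(merged)
    # Keep fixations longer than 80 ms; report median centroids and durations (ms).
    return [(centroid(x, y, a, b), (b - a) * 1000 / SAMPLE_HZ)
            for a, b in fixations if (b - a) * 1000 / SAMPLE_HZ > MIN_FIX_MS]

# Noise-free toy trace: 0.5 s at one location, then 0.5 s at another.
x = np.concatenate([np.full(500, 300.0), np.full(500, 600.0)])
y = np.full(1000, 400.0)
print(double_threshold_fixations(x, y))   # two fixations of roughly 0.5 s each
```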
Fixations were required to be longer than 80 ms, although Manor and Gordon (2003) argued that a minimum of 100 ms can also be justified. The centroid of each fixation was computed as the median x and y location of all of the raw gaze points identified as part of that fixation. The resulting mean fixation duration was 320 ms (median 270 ms; quartiles 198–373 ms). The median saccade duration was 21 ms within an image; the median duration of crossing saccades (between the left and right images) was 76 ms. Details are provided in Additional file 1: Appendix SI-4.1.
The above procedures determine fixation centroids projected onto monitor coordinates. These coordinates were then adjusted by the aforementioned drift-correction procedures. Because our software allows for scaling and panning of the images, an additional step was required to project the drift-corrected fixations into image coordinates, using the scaling and panning parameters active during each fixation. Additional information regarding the fixation extraction methods and their accuracy is provided in Additional file 1: Appendix SI-4.2.
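Because the zoom factor and pan offset could change within a trial, this projection is a per-fixation coordinate transform. The following is a minimal sketch of that mapping (illustrative Python); the parameter names and conventions (zoom expressed as screen pixels per image pixel, pan expressed as the screen position of the image origin) are ours, not the study software's.

```python
def screen_to_image(fix_screen_xy, image_origin_screen_xy, zoom):
    """Project a drift-corrected fixation from screen coordinates into image coordinates.

    fix_screen_xy:          (x, y) fixation centroid on the monitor, after drift correction.
    image_origin_screen_xy: screen position of the image's (0, 0) pixel under the current pan.
    zoom:                   screen pixels per image pixel (1.0 corresponds to 1:1).
    """
    sx, sy = fix_screen_xy
    ox, oy = image_origin_screen_xy
    return ((sx - ox) / zoom, (sy - oy) / zoom)

# Example: at 2x zoom with the image origin panned to screen (100, 50),
# a fixation at screen (400, 350) lands on image pixel (150, 150).
print(screen_to_image((400, 350), (100, 50), 2.0))
```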
Test administration
Each participant was assigned a sequence of fingerprint comparisons, interspersed with three types of directed tasks. In addition to the find-the-target task that is the focus of this paper, the directed tasks included ridge following and ridge counting; results of the fingerprint comparisons and the other directed tasks will be reported in subsequent papers. Testing occurred in June–August 2016 at six locations in Ohio, Indiana, Virginia, Kentucky, and Georgia. Participants were provided with written instructions prior to the test; an experimenter then verbally summarized the instructions and answered any questions. Participants were asked to continue testing for 2 h or until all of the assigned trials were completed, but were permitted to stop early or to continue beyond the 2-h period.
Participants
Participation was open to practicing latent print examiners who were doing casework at the time of the study or had done casework within the preceding year. Participants gave informed consent after reviewing a human subject consent form approved by the Federal Bureau of Investigation Institutional Review Board prior to the start of the study. A total of 122 examiners participated: 39% were from federal agencies, 31% from state agencies, 22% from local agencies, 5% international, and 2% private. Seventy-nine percent were from accredited laboratories. Seventy-six percent had 5 or more years of experience as a latent print examiner; none had less than 1 year. Nineteen percent wore glasses, 29% wore contact lenses, and 7% had undergone LASIK surgery. No participants were required by their employers to participate. Participants were assured that their results would remain anonymous; a coding system was used to ensure anonymity during our analyses and in reporting. Usable eye-tracking data for the present task were collected from 117 participants: of the 122 total, four were tested during an initial phase of data collection in which find-the-target tasks were not assigned, and data from one participant were unusable due to a corrupted file. (See Additional file 1: Appendix SI-2 for further details on participants.)
Test yield
The 117 participants completed a total of 675 valid trials (two invalid trials were omitted). Each examiner was assigned only one type of task (latent, plain, cropped) from each image set (FT1–FT8); therefore, no examiner was assigned two trials involving the same source finger. Because participants were allowed to stop early or continue, some completed as few as two or as many as eight trials; most participants (87) completed at least two trials of each type (latent, plain, cropped). Image sets FT7 and FT8 were retired early during the course of testing to better use the limited time with each examiner; due to the smaller resulting sample sizes, these two image sets are omitted from analyses aggregated by image pair. From 32 to 39 examiners completed each of the tasks from FT1Latent through FT6Cropped (18 image pairs); only seven to nine examiners completed each of FT7Latent through FT8Cropped (six image pairs). The 675 trials included a total of 53,093 valid fixations; for analyses omitting FT7 and FT8, there were 630 trials with 49,242 valid fixations. (See Additional file 1: Appendix SI-5 for further details regarding omitted data and test yield.)