The goal of the search task (Experiment 2) was to determine whether semantic and episodic information (recent experience with target locations) optimally guide search decisions. Target locations were selected to be either (1) congruent with scene semantics, drawn from the congruent sampling distribution derived in Experiment 1 (semantically congruent), (2) incongruent with scene semantics, drawn from the incongruent sampling distribution computed in Experiment 1 (semantically incongruent), or (3) random. We determined our sample size with a power analysis in G*Power 3.1.9.2 (Faul et al. 2007). With an alpha of .05, the analysis indicated that a sample of 10 subjects would achieve a projected power of .95 to detect an effect size of .25 in a three-way repeated-measures analysis of variance comparing subjects’ search performance across the factors target (three levels), congruence (three levels), and session number (five levels).

### Experiment 2: Method

#### Subjects

Ten subjects completed the experiment. Data from one additional subject were collected but not analyzed because the subject misinterpreted the instructions. All subjects had normal or corrected-to-normal vision. Subjects were paid $10/h. The study was approved by the Rutgers University Institutional Review Board and was conducted in accordance with the Declaration of Helsinki.

#### Apparatus

Data was collected on a Dell Optiplex 755 with a 21.5” Dell SX2210Tb monitor (60 Hz refresh rate) using 1920 \(\times\) 1080 desktop resolution. The experiment was written in HTML, CSS, and JavaScript using jsPsych (de Leeuw 2015), KineticJS version 5.1.0 (Rowell et al. 2012), and jQuery version 1.11.1 (jQuery Foundation, Inc.). The experiment was presented in a maximized Google Chrome browser window. Viewing distance was not fixed: subjects sat at whatever distance felt most comfortable, which ranged from approximately 20 to 24”.

#### Stimuli

Stimuli for the search task were the same two room scenes used in Experiment 1 (Fig. 1a, b). Three displays were constructed using Adobe Illustrator and Adobe Photoshop, and were rendered as interactive scenes for the search task: a kitchen scene (Fig. 1a), a living room scene (Fig. 1b), and a map (Fig. 5b, c). Six searchable locations in each room were surrounded by a red glow (visible in Fig. 1a, b) until searched. An image of the target (48-75 pixels wide and 48-93 pixels high) appeared on each trial above the display and alongside the task instructions. The map was an 800 \(\times\) 600 drawing with three grey, equally spaced squares (87 \(\times\) 84), a red “ \(\times\) ” (82 \(\times\) 84) to mark the current location on the map, and the text “Click a room”. The outermost squares on the map were labeled with room names (right: “Kitchen”, left: “Living Room”) and could be used to switch rooms via mouse click.

The display screen additionally showed the trial number, the subject’s current trial score (e.g., “Current reward for finding the [target]: 22 points”), the cumulative score for the block (“Accumulated points”), a button to access the map, the 800 \(\times\) 600 region where the map and scenes were displayed, and a small image of the target. A 500 pixel-wide timer bar indicated the time (and possible points) remaining in the trial. The timer bar and the maximum possible score were updated every second to reflect the time remaining in the trial. Below the score, a grey button labeled “Go To Map” allowed subjects to access the map.

#### Design

The levels of semantic congruence were (1) *semantically congruent*: target probabilities were congruent with the semantics of the scene, selected using the congruent sampling distribution obtained in Experiment 1 (see Eq. 1); (2) *random*: target location probabilities were random; and (3) *semantically incongruent*: target location probabilities were selected using the incongruent sampling distribution derived from the data in Experiment 1 (Eq. 2), and thus target locations were incongruent with the scene semantics. The congruence manipulation was implemented via experimental blocks. In each block, rooms were searched for three targets: a mug, batteries, and keys (Fig. 1c). There was one search target per trial (10 trials/target/block).

#### Procedure

*Order of testing* Each experimental session (approximately 45 min) consisted of three blocks of 30 trials (10 trials per target), one block for each level of semantic congruence. Subjects were tested for 5 sessions, except for one subject who was tested for only 4 sessions. Each session took place on a separate day. The order of blocks was pseudorandomized such that no two subjects received the same congruence condition order across sessions, and no subject received the same block order across sessions. Within a block, targets were selected at random. Each of the three targets appeared 10 times without replacement in a 30 trial block.

*Instructions* Before the beginning of testing, subjects were told that they would be searching two computer-illustrated rooms for a target. They were given a list of searchable locations within each room. Subjects were informed that they would earn points by finding the target, and that they would earn more points for finding the target quickly.

*Familiarization* Each block was preceded by 12 familiarization trials to inform subjects about the searchable locations within each of the two scenes. In the familiarization trials, each room scene was displayed one at a time, in randomized order. In each scene, the six searchable locations were outlined with a red glow. Subjects were provided a label for one of the locations and instructed to click on it (e.g., “Click on the microwave”). Upon clicking a location, a sound played (“cha-ching” if they correctly clicked on the instructed location or “splat” for an incorrect click). Feedback text in 30 pt red font (“Correct!” or “Incorrect!”) was overlaid on the top-center of the scene for 900 ms. The trial persisted until the correct location was clicked, at which point there was a 250 ms inter-trial interval and subsequently the next familiarization trial was displayed. Familiarization trial order was randomized without replacement. Once all six locations in the first scene were correctly identified, the same procedure was repeated for the second scene.

*Search task* Before each experimental block, a fictitious street address with a randomly generated house number was displayed and subjects were asked to search for items within the house. The purpose of the address was to produce the impression that subjects were searching in a new house in each block of trials. The address persisted until the subject pressed a key to begin the block.

A trial proceeded as follows (see Fig. 5). First, a screen displayed the current trial number and instructed the subject to find the displayed target (Fig. 5a). This screen remained until the space bar was pressed, which started the trial. A timer recorded the duration of the trial.

To initiate search, subjects first used the map to select a room (Fig. 5b), which caused the symbol (x) on the map to be displaced to the center of the chosen room over a period of 1000 ms (Fig. 5c). Subjects were able to switch between rooms at any time by clicking on a “Go to Map” button above the display region. Within a room, subjects could click on one of the searchable locations designated by the red glow (Fig. 5d). Upon clicking a location to search, an animation briefly (2000 ms) zoomed in on the chosen location (150% scale) and then zoomed out to the full scale of the room (Fig. 5e), during which time subjects were unable to act. Following the search animation, audio and text feedback about the outcome of the search were provided simultaneously (Fig. 5f): auditory feedback was the same as in the familiarization task, and text feedback was overlaid in 30 pt red font (“Nothing here.” or “Found!”) for 900 ms. After searching a location, the red glow surrounding the location disappeared, and subjects could no longer interact with the location. If the target was in the chosen location, a “cha-ching” sound played, the message “Found!” (900 ms) was displayed, and the trial terminated. A screen then appeared indicating that the target was found, and displayed an image of the target (Fig. 5g). Subjects then moved to the next trial. Otherwise, a “splat” sound played, and the message “Nothing here” (900 ms) was displayed, after which they could continue searching. The trial persisted until either the target was found or 30 seconds elapsed. If the target was not found within 30 seconds, a screen appeared indicating that they did not find the target, along with an image of the target (Fig. 5h).

*Point system* To measure search performance, points were awarded equal to the seconds remaining in the trial at the time that the target was found. Delays associated with selecting a room to search in (1000 ms) and searching a location within a room (2000 ms) cost subjects 1 and 2 points, respectively. This meant that a maximum of 27 points could be earned in a trial if the target was found immediately. Points awarded thus decreased linearly with reaction time, equaling 30 minus the seconds elapsed (e.g., 27 points earned corresponded to a 3 second search). If the target was not found, zero points were awarded. Points awarded on each trial were added to a cumulative score over the course of a block as motivation for subjects, but did not carry over to subsequent blocks.
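The scoring rule can be sketched in a few lines of Python. This is an illustration only, not the experiment's jsPsych code: the function name `points_earned`, the constant names, and the choice to round the remaining time down to whole seconds are assumptions made here for concreteness.

```python
import math

TRIAL_LIMIT_S = 30   # trial duration in seconds
ROOM_DELAY_S = 1     # map-to-room animation (1000 ms)
SEARCH_DELAY_S = 2   # zoom animation for each searched location (2000 ms)


def points_earned(elapsed_s, found):
    """Return whole seconds remaining when the target is found, else 0.

    Rounding down to whole seconds is an assumption of this sketch.
    """
    if not found or elapsed_s >= TRIAL_LIMIT_S:
        return 0
    return math.floor(TRIAL_LIMIT_S - elapsed_s)


# Fastest possible trial: select a room, search one location, find the target.
best_case = points_earned(ROOM_DELAY_S + SEARCH_DELAY_S, found=True)
```

Under these assumptions, the fastest possible trial uses 3 seconds of unavoidable animation time, which reproduces the 27-point maximum described above.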

*Feedback* Half of the subjects received feedback regarding the actual location of the target at the end of a trial whether the target was found during the trial or not (Fig. 5h). The other 5 subjects were not informed of the target’s actual location after each trial (Fig. 5g).

#### Analysis

There were a total of 450 trials per subject (10 trials per target \(\times\) 3 targets per block \(\times\) 3 blocks per session \(\times\) 5 sessions per subject) for 9 subjects (4050 trials), and 360 trials for one subject who completed only 4 experimental sessions, yielding a total of 4410 trials across subjects. Sixteen trials were excluded from analysis due to a browser rendering issue during data collection. Data from the remaining 4394 trials were analyzed.

*Analysis of search performance* Search performance was measured in two ways: points earned and reaction time. Both measures were analyzed using a mixed analysis of variance. There were three within-subjects factors: (1) the semantic congruence of the environment (semantically congruent, random, or semantically incongruent), (2) the experimental session number (1–5) to evaluate learning, and (3) the target (mug, batteries, or keys). Whether or not the subject received feedback at the end of each trial was included as a between-subjects factor.

### Experiment 2: Ideal observer model and simulations

We developed a Bayesian model to predict the behavior of an ideal observer conditioned on prior knowledge and recent experience.

#### Ideal observer model

An ideal observer’s belief that a target *t* would be found in a location *l* during search *i*, based on the searcher’s prior knowledge *w* and recent experience *r*, can be expressed using Bayes’ rule as follows:

$$\begin{aligned} P(\theta _{t,l}|r,w) = \frac{ P(r|\theta _{t,l})P(\theta _{t,l}|w)}{\sum _{i,j}P(r|\theta _{t_{i},l_{j}})P(\theta _{t_{i},l_{j}}|w)} \end{aligned}$$

(3)

where \(\theta _{t,l}\) represents the searcher’s belief that target *t* is in location *l*, *r* represents the searcher’s recent experience, and *w* represents the searcher’s prior expectations based on general world knowledge. Note that before the first search event takes place, \(\theta _{t,l}\) is determined by *w*.

To model an ideal observer’s beliefs, we treated each search event as a Bernoulli trial, where the outcome is either a success or a failure. Prior beliefs derived from knowledge about scenes were represented by a Beta distribution because it is well-suited to represent binary-event probabilities (Kruschke 2014). We estimated the expected value, \(\hat{p}_{t,l}\), of the Beta distribution representing prior knowledge about scene semantics using the Likert ratings obtained in Experiment 1 as follows:

$$\begin{aligned} \hat{p}_{t,l} = \frac{\bar{L}_{t,l}}{7} \end{aligned}$$

(4)

where \({\bar{L}}_{t,l}\) is the Likert score averaged over raters for a target *t* occurring in location *l*. Dividing \({\bar{L}}_{t,l}\) by 7, the maximum value on the Likert scale, yields a probability estimate.

We obtained the two shape parameters of the Beta distribution, \(\alpha _{t,l}\) (the number of successful searches for the target *t* in location *l*) and \(\beta _{t,l}\) (the number of times the target *t* was not found when location *l* was searched), from the expected value of each prior distribution (\(\hat{p}_{t,l}\)) and the total number of observations, *s*. Multiplying the expected value by *s* gives \(\alpha _{t,l} = \hat{p}_{t,l}s\), and correspondingly \(\beta _{t,l} = (1-\hat{p}_{t,l})s\), so that \(\alpha _{t,l} + \beta _{t,l} = s\). The resulting belief is as follows (Eq. 5)

$$\begin{aligned} P(\theta _{t,l}|r,w) = \Pi _i P(t_{f}^{i}|\theta _{t,l})P(\theta _{t,l}|\alpha _{t,l},\beta _{t,l}) = \Pi _i P(t_{f}^{i}|\theta _{t,l})P(\theta _{t,l}|\hat{p}_{t,l},s) \end{aligned}$$

(5)

where \(t_{f}^{i}\) represents the event that the target is found on search *i* (Ferrari and Cribari-Neto 2004). Because *s* is the total number of observations, it determines how heavily the prior \(\hat{p}_{t,l}\) influences search behavior, and therefore *s* determines whether the searcher relies more on prior knowledge or on information gained from recent experience. We refer to *s* in the simulations as prior strength.
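As a worked illustration of Eqs. 4 and 5, the sketch below converts a mean Likert rating into a Beta prior and updates it with search outcomes using the standard Beta-Bernoulli conjugate update. The function names, the example rating of 5.6, and the use of the posterior mean as a summary of belief are assumptions made for illustration. The second parameter is set to \((1-\hat{p}_{t,l})s\) so that the prior mean equals \(\hat{p}_{t,l}\) and the two parameters sum to *s*.

```python
def beta_prior(mean_likert, s, likert_max=7):
    """Turn a mean Likert rating into Beta parameters with prior strength s."""
    p_hat = mean_likert / likert_max   # Eq. 4: expected value of the prior
    alpha = p_hat * s                  # prior pseudo-count of successful searches
    beta = (1 - p_hat) * s             # prior pseudo-count of failed searches
    return alpha, beta


def posterior_mean(alpha, beta, found, not_found):
    """Posterior mean after observing Bernoulli search outcomes."""
    return (alpha + found) / (alpha + beta + found + not_found)


# Hypothetical location rated 5.6 out of 7, so the prior mean is 0.8.
weak_prior = beta_prior(5.6, s=1)
strong_prior = beta_prior(5.6, s=300)

# Three failed searches at that location.
belief_weak = posterior_mean(*weak_prior, found=0, not_found=3)      # about 0.20
belief_strong = posterior_mean(*strong_prior, found=0, not_found=3)  # about 0.79
```

With a prior strength of 1, three failures are enough to pull the belief from 0.8 down to about 0.2, whereas with a prior strength of 300 the belief barely moves. This is the sense in which *s* trades off prior knowledge against recent experience.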

#### Simulations

We used the ideal observer model to predict optimal search performance for the target objects. The simulation (1) determined the behavior of an ideal observer for the task, and (2) predicted behavior as a function of different levels of the searcher’s dependence on world knowledge or recent experience (see appendix for expanded simulation methods).

Three levels of semantic congruence were tested: (1) *semantically congruent*: target probabilities were congruent with the semantics of the scene, selected using the congruent sampling distribution obtained in Experiment 1 (see Eq. 1); (2) *random*: target location probabilities were random; and (3) *semantically incongruent*: target location probabilities were selected using the incongruent sampling distribution derived from the data in Experiment 1 (Eq. 2). The three levels of semantic congruence, the sampling distributions for the three targets (as found in Experiment 1), and the target location sampling method were all identical to those used in the search task.

To compare performance of the simulated searcher against humans, who have physical limitations (e.g., the need to move a mouse), costs associated with search were doubled: selecting a room to search reduced the simulated searcher’s score by 2 points (vs. 1 for humans), and searching a location resulted in a 4 point deduction (vs. 2 for humans). Six different ideal observers were tested, varying in their dependence on prior knowledge. The dependence of the simulated searcher’s beliefs on prior knowledge was termed *prior strength*. Low values for prior strength simulated a strong reliance on recent experience, while high values simulated a strong reliance on prior beliefs.
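To make the doubled costs concrete, the following sketch scores a single simulated trial. It is not the simulation described in the appendix: the greedy policy (always search the unsearched location with the highest current belief), the function name `simulate_trial`, and the example beliefs and location labels other than the microwave are assumptions introduced here for illustration.

```python
MAX_POINTS = 30   # points available at the start of a 30-second trial
ROOM_COST = 2     # simulated cost of selecting a room (1 for humans)
SEARCH_COST = 4   # simulated cost of searching a location (2 for humans)


def simulate_trial(beliefs, target):
    """Score one trial for a greedy searcher (assumed policy).

    beliefs maps (room, location) to the believed probability of the target.
    target is the (room, location) pair that actually holds the target.
    Returns the points left when the target is found, or 0 otherwise.
    """
    remaining = MAX_POINTS
    current_room = None
    unsearched = dict(beliefs)
    while unsearched:
        # Pick the most promising location that has not been searched yet.
        room, location = max(unsearched, key=unsearched.get)
        cost = SEARCH_COST + (ROOM_COST if room != current_room else 0)
        if cost >= remaining:
            return 0  # no points would be left after this search
        remaining -= cost
        current_room = room
        del unsearched[(room, location)]
        if (room, location) == target:
            return remaining
    return 0


# Hypothetical beliefs over three locations in two rooms.
beliefs = {
    ("kitchen", "microwave"): 0.5,
    ("kitchen", "drawer"): 0.3,
    ("living room", "sofa"): 0.2,
}
first_guess = simulate_trial(beliefs, ("kitchen", "microwave"))
second_guess = simulate_trial(beliefs, ("kitchen", "drawer"))
third_guess = simulate_trial(beliefs, ("living room", "sofa"))
```

Under these assumptions the simulated searcher can earn at most 24 points (30 minus one room selection and one search), compared with 27 for a human, and each additional search or room switch lowers the score by 4 or 6 points.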

Simulation performance was assessed via the points earned in the semantically incongruent condition, in which targets were placed in the least likely locations under prior knowledge guided by scene semantics. We simulated 500 trials for each of the three targets in each congruence condition (semantically congruent, random, semantically incongruent) and each prior strength value (1, 30, 60, 90, 150, 300), resulting in 27,000 total simulated trials (500 \(\times\) 3 \(\times\) 3 \(\times\) 6). Results of the simulation showed that simulated searchers successfully prioritized information from recent experience over prior knowledge when prior strength was below 60 (Fig. 6).

Considering performance for each target separately, simulated searchers learned to search successfully for all targets except the mug (Fig. 7), suggesting preliminarily that statistical learning of the incongruent target locations for the mug was not possible.

These simulations predicted the ideal search performance for the three targets chosen for the active search task under varying degrees of reliance on world knowledge and recent experience, as determined by prior strength (*s*). Because the ideal observer performed similarly for prior strength values over 60 when target locations were incongruent with scene semantics (per Fig. 6), we chose to compare human performance in the search task to the simulated searcher’s performance using prior strength values of 1 (driven by recent experience), 60 (informed by both prior knowledge and recent experience), and 300 (driven by world knowledge).

### Experiment 2: Results

#### Search performance

Mean reaction time was analyzed using a mixed ANOVA with repeated measures on semantic congruence, session, and target, and with feedback as a between-subjects factor. There was no effect of feedback on participants’ reaction time, *F*(1,7) \(=\) 1.863, \(p =\) .215, \(\eta ^{2} =\) .210, and so data was collapsed across feedback conditions (Fig. 8).

There were main effects of session, *F*(4,32) \(=\) 5.769, \(p =\) .001, \(\eta ^{2} =\) .419, and congruence, *F*(2, 16) \(=\) 17.665, \(p<\) .001, \(\eta ^{2} =\) .688. Pairwise comparisons with Bonferroni corrections indicated that reaction time in session 1 was significantly longer than in sessions 2, \(p =\) .011, and 5, \(p =\) .005. No other comparisons were significant. For semantic congruence, participants were quicker in the semantically congruent condition than in the other two congruence conditions, \(ps<\) .006 (Fig. 9).

A significant interaction between congruence and target was also found, *F*(4,32) \(=\) 22.391, \(p<\) .001, \(\eta ^{2} =\) .737. Reaction time was higher when participants were searching for the mug in the semantically incongruent condition (\(M =\) 20791.50 ms, \(SE =\) 672.67 ms) than the batteries (\(M =\) 17384.59 ms, \(SE =\) 537.73 ms) and the keys (\(M =\) 16724.22 ms, \(SE =\) 812.18 ms), but lower in the semantically congruent condition (\(M =\) 13396.55 ms, \(SE =\) 967.58 ms) in comparison to the batteries (\(M =\) 17821.19 ms, \(SE =\) 661.62 ms) and the keys (\(M =\) 16359.11 ms, \(SE =\) 917.67 ms). There were no other significant interactions (Fig. 10).

Points earned in each trial were used to compare the performance of human searchers to that of the ideal observer. The average points per subject for all trials in each block were analyzed using a mixed ANOVA with repeated measures on semantic congruence, session, and target, and with feedback as a between-subjects factor. The ANOVA revealed no effect of feedback, *F*(1, 7) \(=\) 2.067, \(p =\) .194, \(\eta ^{2} =\) .023; therefore, feedback was not included as a factor in subsequent analyses (Fig. 11).

A three-way repeated measures ANOVA revealed main effects of session, *F*(4,32) \(=\) 5.528, \(p =\) .008, \(\eta ^{2} =\) .409, and congruence level, *F*(2,16) \(=\) 18.197, \(p =\) .001, \(\eta ^{2} =\) .695. On average, scores increased over sessions (Fig. 7). Pairwise comparisons with Bonferroni correction revealed significantly higher scores in the fifth session relative to the first, \(p =\) .003, and in the second session relative to the first, \(p =\) .014. Other pairwise contrasts were not significant (Fig. 12).

Scores were higher in the semantically congruent condition than in both the incongruent condition, \(p =\) .002, and the random condition, \(p =\) .006 (Fig. 12). There was no difference between the semantically incongruent condition and the random condition (\(p =\) .40). There was no main effect of target on overall points accrued, *F*(2,16) \(=\) 2.471, \(p =\) .134, \(\eta ^{2} =\) .236 (Fig. 13).

There was a significant interaction between target and congruence level, *F*(4,32) \(=\) 22.596, \(p<\) .01, \(\eta ^{2} =\) .739, indicating that the target affected the ability to learn incongruent target locations, as predicted by the simulation. Specifically, because the ideal observer was unable to learn the mug’s incongruent locations, the simulation predicted that human searchers would be less able to learn to search successfully in the semantically incongruent condition for the mug than for the other targets. The results for human searchers supported this prediction (see Fig. 12). When searching for the mug, scores were highest in the semantically congruent condition (\(M_{\mathrm{{mug}}} =\) 17.659, \(\mathrm{{SE}}_{\mathrm{{mug}}} =\) .943, \(N_{\mathrm{{mug}}} =\) 393) and lowest in the incongruent condition (\(M_{\mathrm{{mug}}} =\) 10.416, \(\mathrm{{SE}}_{\mathrm{{mug}}} =\) .623, \(N_{\mathrm{{mug}}} =\) 396), indicating difficulty when learning to search for the mug. In contrast, learning to search for the batteries or the keys was tractable. Scores for the batteries and keys were similar in the semantically congruent (\(M_{\mathrm{{batteries}}} =\) 13.339, \(\mathrm{{SE}}_{\mathrm{{batteries}}} =\) .630, \(N_{\mathrm{{batteries}}} =\) 418; \(M_{\mathrm{{keys}}} =\) 14.637, \(\mathrm{{SE}}_{\mathrm{{keys}}} =\) .950, \(N_{\mathrm{{keys}}} =\) 408) and incongruent conditions (\(M_{\mathrm{{batteries}}} =\) 13.772, \(\mathrm{{SE}}_{\mathrm{{batteries}}} =\) .526, \(N_{\mathrm{{batteries}}} =\) 416; \(M_{\mathrm{{keys}}} =\) 14.436, \(\mathrm{{SE}}_{\mathrm{{keys}}} =\) .776, \(N_{\mathrm{{keys}}} =\) 410), and lowest in the random condition (\(M_{\mathrm{{batteries}}} =\) 11.634, \(\mathrm{{SE}}_{\mathrm{{batteries}}} =\) .614, \(N_{\mathrm{{batteries}}} =\) 404; \(M_{\mathrm{{keys}}} =\) 11.034, \(\mathrm{{SE}}_{\mathrm{{keys}}} =\) .585, \(N_{\mathrm{{keys}}} =\) 401; see Fig. 13). 
There was no interaction between session number and either the congruence level, *F*(8,64) \(=\) .244, \(p =\) .98, \(\eta ^{2} =\) .030, or the target, *F*(8,64) \(=\) .364, \(p =\) .814, \(\eta ^{2} =\) .044.

Because points decreased as time elapsed in a trial, reaction time and points earned were strongly correlated with one another (Pearson’s \(r(4392) = -0.99, p<\) .0001). To compare subjects’ search performance with the simulated ideal observer, for which response times were not available, we elected to compare points earned by human searchers to those of the simulated searchers under three levels of prior strength that predicted different search performance in the simulations: 1, 60, and 300.

#### Comparison to simulations

To assess how well humans searched, we compared points earned by the human searchers in each experimental block with the ideal observer’s performance at prior strength values of 1, 60, and 300 (selected based on Fig. 6). Experimental blocks in which target locations were random were excluded from analysis in order to limit the number of comparisons to those that would be most informative and interpretable. We conducted two-tailed unpaired Bayesian *t*-tests using the ‘BayesFactor’ package in R (Morey and Rouder 2011) to compute Bayes factors that weigh evidence for the null (\(H_{0}\): no difference between sample means) against the alternative hypothesis (\(H_{1}\): sample means are different; Rouder et al. 2009). Evidence in favor of the null (\(BF_{01}\)) was calculated by inverting the default Bayes factors (\(BF_{10}\)) that assess evidence for the alternative hypothesis against the null ((\(\frac{H_{1}}{H_{0}}\) )\(^{-1} =\) \(\frac{H_{0}}{H_{1}}\) ); in this case, \(BF_{01}\) captures similarity to the ideal observer. For the current analysis, Bayes factors (\(BF_{01}\)) above 1 were considered to support the null hypothesis, and the magnitude of the ratio indicated the strength of the evidence.

For each subject (\(\hbox {n} = 10\)), each congruence block (\(\hbox {n} = 2\)) in each session (\(\hbox {n} = 5\) for all subjects but one) was compared to aggregated data from the simulations. Because there were 1500 simulated trials (500 for each of the three targets) for each level of prior strength and congruence condition, simulated data were averaged over bins of 50 trials, resulting in 30 data points (means) for each combination of prior strength (1, 60, and 300) and semantic congruence condition (congruent or incongruent) for comparison (n = 6 samples). There were 588 *t*-tests in total (98 human blocks \(\times\) 6 simulated samples). The number of tests that favor the null in each condition is reported.^{Footnote 2}
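The aggregation and the test count can be checked with a short sketch. The helper names `bin_means` and `bf01_from_bf10` are hypothetical, and the simulated scores below are random placeholders rather than results from the actual simulations; the Bayes factors themselves were computed with the BayesFactor package in R and are not reproduced here.

```python
import random
from statistics import mean


def bin_means(values, bin_size=50):
    """Average consecutive, non-overlapping bins of trials."""
    if len(values) % bin_size != 0:
        raise ValueError("number of trials must be a multiple of bin_size")
    return [mean(values[i:i + bin_size]) for i in range(0, len(values), bin_size)]


def bf01_from_bf10(bf10):
    """Invert a Bayes factor so that values above 1 favor the null."""
    return 1.0 / bf10


# Placeholder scores for one prior strength and one congruence condition.
random.seed(0)
simulated_points = [random.randint(0, 24) for _ in range(1500)]
binned = bin_means(simulated_points)   # 30 bin means

# Nine subjects completed 5 sessions and one completed 4.
n_sessions = 9 * 5 + 1 * 4             # 49 subject-sessions
n_blocks = n_sessions * 2              # congruent and incongruent blocks
n_samples = 3 * 2                      # prior strengths x congruence conditions
n_tests = n_blocks * n_samples
```

Here `n_tests` comes to 588, matching the total reported above, and each human block is compared against 30 binned means per simulated sample.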

Aggregated performance for ideal observers in the congruent condition was approximately identical for all levels of prior strength; therefore, for the congruent condition, we report only comparisons to the performance of the simulated searcher with the lowest prior strength (\(s =\) 1).

*Congruent human performance.* For overall subject performance, a total of 23 Bayes factors (47%) exceeded 1, and the average Bayes factor was 2.72 (SD = 0.86), suggesting the overall performance for approximately half of subjects was comparable to the ideal observers that relied on recent experience exclusively (\(s =\) 1) in the congruent condition. However, the number of Bayes factors that exceeded 1 was higher (\(n =\) 34, 69%) when human data from the congruent condition was compared to the ideal observer relying only on recent experience (\(s =\) 1) in the incongruent condition, and the average Bayes factor was slightly higher (\(M =\) 2.80, SD = 0.84). This suggests that human subjects searching in the congruent condition performed slightly worse than the simulated searcher that relied on recent experience in the congruent condition, and were more similar to the ideal observer that relied on recent experience in the incongruent condition. In contrast, only 2 Bayes factors supported the null when subject data was compared to the performance of the ideal observer with partial world knowledge (\(s =\) 60) in the incongruent condition (\(M =\) 1.80, SD = 0.99), in sessions 1 (\(BF_{01} =\) 2.50) and 4 (\(BF_{01} =\) 1.10), and 0 Bayes factors supported the null when subject data was compared to simulated performance that relied solely on world knowledge in the incongruent condition (\(s =\) 300).

Over sessions, evidence for the null was largely consistent for subjects in the congruent condition (Fig. 14). In sessions 1, 2, and 5, Bayes factors for 4 subjects (40%, 40%, and 44%, respectively) supported the null hypothesis when compared to simulated search in the congruent condition (\(M_{1} =\) 2.25, SD\(_{1} =\) 0.51, \(M_{2} =\) 2.99, SD\(_{2} =\) 1.23, \(M_{5} =\) 2.67, SD\(_{5} =\) 1.04), whereas Bayes factors for 5 and 6 subjects (50% and 60%) supported the null in sessions 3 and 4, respectively (\(M_{3} =\) 2.77, SD\(_{3} =\) 1.06, \(M_{4} =\) 2.83, SD\(_{4} =\) 0.63). More subjects performed similarly to the ideal incongruent observer: in sessions 1 and 5, there were 6 subjects (60% and 67%^{Footnote 3}) whose Bayes factors supported the null (\(M_{1} =\) 3.00, SD\(_{1} =\) 1.05, \(M_{5} =\) 2.74, SD\(_{5} =\) 0.72), 7 subjects (70%) in sessions 3 and 4 (\(M_{3} =\) 3.05, SD\(_{3} =\) 0.54, \(M_{4} =\) 3.07, SD\(_{4} =\) 0.75), and 8 subjects (80%) in session 2 (\(M_{2} =\) 2.22, SD\(_{2} =\) 0.92).

*Incongruent human performance.* Overall, Bayes factors exceeded 1 for the ideal incongruent observer (\(s =\) 1) in a total of 22 blocks (45%), and the average Bayes factor was 2.50 (SD = 0.84), suggesting searchers performed similarly to the ideal incongruent observer in the incongruent condition in roughly half of blocks. Only 5 (10%) Bayes factors exceeded 1 (\(M =\) 2.05, SD = 0.54) when human performance was compared to the ideal congruent observer, and 4 (8%) supported the null when subject data was compared to the performance of the ideal observer with partial world knowledge (\(s =\) 60) in the incongruent condition (\(M =\) 2.76, SD = 1.27). Consistent with human performance in the congruent condition, 0 Bayes factors supported the null when subject data was compared to simulated performance that relied solely on world knowledge in the incongruent condition (\(s =\) 300).

When human data was compared to the performance of the ideal incongruent observer (\(s =\) 1), evidence favoring the null generally increased over sessions: Bayes factors for 2 subjects supported the null in session 1 (\(M_{1} =\) 2.21, SD\(_{1} =\) 0.93), 5 subjects (50%) in sessions 2 and 3 (\(M_{2} =\) 2.15, SD\(_{2} =\) 0.68, \(M_{3} =\) 2.43, SD\(_{3} =\) 0.92), 3 subjects (30%) in session 4 (\(M_{4} =\) 3.35, SD\(_{4} =\) 0.39), and 7 subjects (78%) in session 5 (\(M_{5} =\) 2.51, SD\(_{5} =\) 0.95). When human data was compared to the ideal congruent observer, Bayes factors supported the null only twice (20%) in each of sessions 3 and 4 (\(M_{3} =\) 2.20, SD\(_{3} =\) 0.23, \(M_{4} =\) 2.11, SD\(_{4} =\) 0.94) and once in session 5 (\(BF_{01} =\) 1.64). Two subjects performed similarly to the ideal incongruent observer that relied partially on world knowledge (\(s =\) 60) in sessions 1 and 2 only (\(M_{1} =\) 2.42, SD\(_{1} =\) 1.93, \(M_{2} =\) 3.10, SD\(_{2} =\) 0.82), suggesting some subjects had learned some of the incongruent target locations in part during the first two sessions.

Overall, performance for most human searchers was comparable to ideal observers that learned target locations optimally, which suggests our subjects integrated world knowledge and recent experience in a near-optimal fashion. Human searchers became more similar to the ideal incongruent observer over experimental sessions, suggesting subjects learned to search in the semantically incongruent search environment over time. Human searchers did not perform as well as the ideal congruent observer in the semantically congruent experimental condition, despite the fact that the simulated searcher incurred point penalties for searching a location or switching rooms that were twice as high as those incurred by human searchers. These results suggest that human searchers learned from recent experience in ways that were near-optimal. That is, on the whole, both human and simulated searchers were able to learn to search effectively in the semantically incongruent search environment.