This module outlines general theoretical principles that underlie cognitive processes across many domains, ranging from perception to language to reasoning and decision making. The focus will be on general, quantitative regularities, and the degree to which theories of specific cognitive scientific topics can be constrained by such principles. There will be an introduction to general methods and approaches in cognitive science and some of the problems related to them. Later in the course, some computational approaches in cognitive science will be discussed. There will be particular emphasis on understanding cognitive principles that are relevant to theories of decision making.

What is Cognitive Science?

Brief history.

Notable that the narrative revolves around several key conferences where prominent figures from different fields became aligned.

Bridging Levels of Analysis for Probabilistic Models of Cognition

Levels of models:

A popular research method is to look at where people diverge from ideal solutions, to figure out what algorithms their minds are using to approximate the solution. But...

Rational process models - identify algorithm for approximating probabilistic inference under time/space limits, compare to what we know about mind and behavior.

Example - Monte Carlo with small number of samples is tractable. Consistent with:

Some progress in bridging to implementation level eg neural models of importance sampling.
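As a concrete toy sketch (my own, not from the readings) of the kind of algorithm a rational process model might posit: importance sampling with only a handful of samples gives a cheap, noisy approximation to Bayesian inference.

```python
import random

# Toy sketch (my own construction): estimate the posterior mean of a coin's
# bias p after observing 7 heads in 10 flips, with a uniform prior. The
# proposal is the prior itself, so the importance weights are likelihoods.

def likelihood(p, heads=7, flips=10):
    return p ** heads * (1 - p) ** (flips - heads)

def posterior_mean_estimate(n_samples, seed=0):
    rng = random.Random(seed)
    samples = [rng.random() for _ in range(n_samples)]  # draws from the prior
    weights = [likelihood(p) for p in samples]          # importance weights
    total = sum(weights)
    return sum(p * w for p, w in zip(samples, weights)) / total

# A handful of samples already gives a usable, if noisy, estimate of the
# exact posterior mean (2/3 under a uniform prior).
print(posterior_mean_estimate(10))
print(posterior_mean_estimate(10_000))
```

The point of the sketch: the per-sample cost is constant, so the time/accuracy trade-off is controlled by a single resource parameter (number of samples).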

Lecture 1

Cognitive science as reverse engineering - understand how the mind works by trying to build one and seeing where it differs.

Brief history:


Automaticity of Social Behavior: Direct Effects of Trait Construct and Stereotype Activation on Action

(Paired with the more recent failed replication.)

Arguing that non-conscious priming can strongly affect behavior.

Experiment 1:

Experiment 2:

Experiment 3:

Argues that this works where subliminal adverts for Pepsi don't, because trait constructs directly contain behaviors, whereas the advert just activates the Pepsi representation. So elderly -> walk slowly, but Pepsi -/> drink Pepsi? Also, getting up to buy a Pepsi requires some activation energy, whereas they set up situations where the action was already required and the only difference was in accessibility. So priming for hostility will make people more likely to react to an annoying trigger, but not to be spontaneously hostile.

Note that the results for behavior here are stronger than their previous results for judgments, but one would assume that judgments mediate behavior. Yet in Experiment 1 there was no effect on perception of the experimenter, and there is little evidence so far for judgment mediating behavior.

Behavioral Priming: It’s all in the Mind, but Whose Mind?

Failed replication of previous paper.

Reasons to doubt original:

Experiment 1:

Experiment 2:

Most subjects were aware of the prime (but the original reported 6%…), and they are in a psych course so might be expected to be suspicious.

Priming via social cues is way more believable to me than priming via word choice. Clear selective pressure for understanding and reacting to social cues.

Lecture 2

Scientific reasoning. Psi hypothesis as running example.

Base-rate fallacy vs significance testing.

Successful replication could just mean replicating the mistakes of the original.

In a replication aim to improve on original methods or test some new factor - more likely to be received in good faith and more likely to generate new insight beyond back-and-forth.

A good successful replication can falsify a hypothesis by more accurately identifying the mechanism behind the effect, eg the previous paper replicated slow walking but showed that the effect disappeared under proper blinding.

Defenses of priming:


Try to structure experiments with multiple competing hypotheses where any given result would support some hypothesis and weaken the others.

The Cognitive Neuroscience of Human Memory Since H.M.




Other patients:

Declarative memory: facts, representations, conscious recall, compare/contrast memories.

Non-declarative memory: unconscious performance, black box.

Visual perception:

Immediate and working memory:

Remote memory:

Working theory of long-term memory:


Group studies average out individual variation - allows studying less obvious effects

Finding the engram

Engram def=

The hunt:

Sharp-wave ripple events in hippocampus:



Memory, navigation and theta rhythm in the hippocampal-entorhinal system

Having a lot of trouble with this paper. Needs much more time and depth.


Implementation possibilities:

Some complex ideas about implementation in theta waves that I can’t follow, but apparently explains:

Maybe this explains why word2vec works? Are we just reverse-engineering the mind's spatial relationships?


The role of the hippocampus in navigation is memory

Place cells, grid cells etc seem to imply that the hippocampus provides navigation. The paper argues that the evidence actually shows it provides general cognitive maps, and that navigation is just one use case.

Navigation strategies:

Rats with hippocampal lesions:

Humans with hippocampal lesions:

Working theory:

Evidence that different spatial mappings are used for different tasks within the same environment.

Hippocampus maps abstract spaces:

Imaging suggests that hippocampus is not continuously involved when using cognitive maps in navigation, but only when learning or when planning/altering routes.

Speculation that hippocampus originally evolved for navigation but was co-opted for abstract relationships. (How does hippocampus size vary across species?).

Lecture 3

Divide into declarative vs non-declarative memory no longer seems to be carving at the joints:

Pattern separator vs pattern completer.

Patients learn facts at school, have high IQ and get good grades.

Use fMRI to detect sixfold (60°) periodicity in humans when navigating => grid cells. Periodicity correlates with success on a spatial memory task.

Experiment suggesting that periodicity can be observed even for abstract spaces, by pairing a coordinate system with bird pictures of varying neck and leg length.
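A rough sketch (my own, not from the lecture) of how such a periodicity analysis works: regress the signal on cos(6θ) and sin(6θ) of movement direction; a large amplitude indicates sixfold (hexadirectional) symmetry.

```python
import math

# Illustrative sketch (mine): detect sixfold (60-degree) modulation of a
# signal by direction of movement. With evenly spaced directions, the
# least-squares coefficients reduce to the sums below.

def sixfold_amplitude(directions, signal):
    n = len(signal)
    # Fit signal ~ a*cos(6t) + b*sin(6t); regressors are orthogonal here.
    a = 2 / n * sum(s * math.cos(6 * t) for t, s in zip(directions, signal))
    b = 2 / n * sum(s * math.sin(6 * t) for t, s in zip(directions, signal))
    return math.hypot(a, b)

# Synthetic signal with true sixfold modulation plus a fourfold distractor.
thetas = [2 * math.pi * i / 360 for i in range(360)]
signal = [1.0 * math.cos(6 * t) + 0.3 * math.cos(4 * t) for t in thetas]

print(sixfold_amplitude(thetas, signal))  # ~1.0: recovers only the sixfold part
```

The fourfold distractor contributes nothing to the estimate, which is why this analysis can single out grid-like symmetry among other directional effects.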

Something analogous to place cells for time observed in rats.

Uniting the Tribes of Fluency to Form a Metacognitive Nation

Theory: the difficulty of a cognitive task (from fluent to non-fluent) is used as a meta-cognitive cue that feeds into other judgments via ‘naive theories’ aka heuristics.



Discounting - if fluency is recognized, subject corrects and may even over-correct.

Seems like discounting provides a lot of adjustment room in this theory. How to falsify? Could try varying eg legibility over a wide scale and looking for a discounting effect.

Lecture 4

Fluency can induce:

Familiarity seems like a reasonable heuristic - exposure => fluency, so assume fluency => exposure.

The explanation for the popcorn result is that chewing prevents subvocalisation, so subjects can't judge the pronunciation fluency of words.

Others make less sense to me.

Notable that the class was typically split when asked to predict the outcomes of experiments, ie the proposed mechanism is so vague that either outcome is plausible.

Other ‘constructs’:

Not worth reviewing, not confident in results.

Understanding face recognition

Broad view of face recognition, including processes like retrieving information about the face's owner.

What information might components of facial recognition produce?

Open questions:

Are faces special?

Are there dedicated cognitive processes for face processing, or do we just reuse generic object recognition?

Main arguments:

Main challenges


Holistic/configural processing vs within-class discrimination:



Argument that too many studies rely on significant vs not-significant comparisons, rather than testing interactions.

Lecture 5

Are faces special?

Face recognition could be:

Behavioral experiments:

Neural experiments:

Medical cases:


Lecture 6

Skipped the reading this week :S

Social cognition - ‘the psychological processes that result from inferring the actual, imagined, or implied mental state of another’

Affect is creeping back into models of decision-making.

Moving away from 2-process model because of neuro evidence - clearly many systems involved.

What makes a process automatic? Not requiring:

Rare for any given process to hit all 4.

Illusion of agency - maybe intent does not exist.

Debate over value of heuristics vs rationality.


When do we attribute responsibility to an agent for an action?

John laughs at the comedian. No one else laughs at the comedian. John laughs at every comedian. John laughs at the comedian every time. => Behavior is attributable to John, not to the comedian.

Experimentally, attribution seems to be less sensitive to consensus than to the other two factors (distinctiveness and consistency).

We attribute agency to objects similarly, but not moral status, eg 'computer said no', yet we don't feel bad about throwing the computer away. How do we tell the difference?

Emotions hard to define.

Dominant theory - emotion as cognitive interpretation of physiological signals. Behavior change:

Default mode = social cognition applied to self?

Lecture 7

Examples of theories that try to unify multiple phenomena:

Scale invariance:

Decision by sampling:

A theory of magnitude:

Scale-invariance as a unifying psychological principle

Scale invariance common in nature. Psych processes adapted to reflect this?

Clear examples in perception:

Can’t be purely scale-invariant, because it is possible to judge magnitudes, but usually poorly.

Not true at all for eg color perception.

Perhaps reflects that the systems themselves are implemented physically.
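A toy illustration (my own construction) of why a log-scaled encoding yields scale invariance: with fixed internal noise on a log scale, discriminability between two stimuli depends only on their ratio, which is the Weber's-law signature.

```python
import math

# Toy model (mine, not from the paper): magnitudes encoded on a log scale
# with fixed Gaussian internal noise. Discriminability (d') then depends
# only on the ratio of the two stimuli, not their absolute size.

def discriminability(x1, x2, noise=0.1):
    # d' between two stimuli under log encoding with fixed noise
    return abs(math.log(x2) - math.log(x1)) / noise

# Same ratio => same discriminability, regardless of absolute scale.
print(discriminability(10, 12))    # ratio 1.2
print(discriminability(100, 120))  # same ratio, same d'
```

This also shows why absolute judgments come out poorly: the encoding only preserves ratios, so recovering an absolute magnitude requires an extra, noisier anchoring step.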

A theory of magnitude: common cortical metrics of time, space and quantity

Argues that:

Explaining interference in terms of attention is far too unconstrained. It sounds like a single theory, but close reading of the literature shows a wide variety of proposed effects and causal mechanisms.

Predicts SNARC should work for any space/action-coded magnitude.

Decision by sampling

Typical theories of decision-making take utility functions as given. How do we build/calibrate a utility function given basic psychological operations?

To relate this back to previous two papers, how do we get an absolute judgment of utility out of brain systems that are only good at relative, scale-invariant judgments?

Many examples of utility functions (in aggregate) matching cumulative distribution of events in the real world.

Proposes that we sample several items from memory and use these to estimate percentile on empirical distribution.

Many other examples of similar processes:

Assumes that sampling from memory is a good approximation of sampling from reality. Some evidence for this eg Anderson & Schooler 1991.

Has anyone tested the predicted binomial noise?
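A minimal sketch of the sampling mechanism as I read it (names and the memory distribution are my own): value an amount by its rank within a small sample retrieved from memory; repeated judgments of the same amount then carry exactly the binomial noise the account predicts.

```python
import random

# Sketch (my reading of decision by sampling): the subjective value of an
# amount is its estimated percentile among a small sample of amounts drawn
# from memory. The memory distribution below is illustrative.

def sampled_rank_value(target, memory, k=5, rng=random):
    sample = rng.sample(memory, k)              # small sample from memory
    return sum(x < target for x in sample) / k  # estimated percentile

rng = random.Random(0)
memory = [rng.lognormvariate(3, 1) for _ in range(1000)]  # skewed amounts

# Each judgment is Binomial(k, p)/k where p is the true percentile, so
# repeated judgments of the same amount are noisy but unbiased.
estimates = [sampled_rank_value(50.0, memory, k=5, rng=rng) for _ in range(2000)]
print(sum(estimates) / len(estimates))  # approx the true percentile of 50.0
```

The binomial-noise prediction drops out directly: with k samples the estimate can only take values 0/k, 1/k, ..., k/k, with variance p(1-p)/k, which is in principle testable against judgment variability data.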


Lecture 8

Language is hard to define:

Levels of analysis:

Traditional Wernicke-Geschwind model:

Problems with model:

Speech perception is ambiguous - requires top-down processing. Illusion of speech units.

Really no reason to continue teaching Wernicke-Geschwind model.

The free-energy principle: a unified brain theory?

Summary of Surfing Uncertainty

Summary of The Predictive Mind

Wikipedia on free-energy principle

Variational Bayes:

Free energy principle

But we like surprising things? Presumably this is to be explained. Or are actions chosen to minimize $F$ in general, rather than for this specific action?

Relation to infomax principle (maximizing mutual information between sense and model subject to constraints on complexity of model). Complexity term in 1st formulation penalizes more complex models - regularization/shrinking.
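The complexity-minus-accuracy decomposition can be checked numerically in a toy discrete model (my own construction), along with the fact that free energy upper-bounds surprise and is tight when the approximate posterior is exact:

```python
import math

# Toy check (mine) of the standard decomposition of variational free energy
# for a discrete hidden state z and a fixed observation x:
#   F = KL(q(z) || p(z)) - E_q[log p(x|z)]   (complexity - accuracy)
# with F >= -log p(x) (surprise), equality at the true posterior.

prior = {"A": 0.5, "B": 0.5}   # p(z)
lik = {"A": 0.8, "B": 0.1}     # p(x|z) for the observed x

def free_energy(q):
    complexity = sum(q[z] * math.log(q[z] / prior[z]) for z in q)
    accuracy = sum(q[z] * math.log(lik[z]) for z in q)
    return complexity - accuracy

evidence = sum(prior[z] * lik[z] for z in prior)  # p(x)
surprise = -math.log(evidence)
posterior = {z: prior[z] * lik[z] / evidence for z in prior}

print(free_energy({"A": 0.6, "B": 0.4}) >= surprise)   # True: F bounds surprise
print(abs(free_energy(posterior) - surprise) < 1e-9)   # True: tight at posterior
```

Seen this way, minimizing F does two things at once: it pulls q toward the true posterior (inference) and, via the complexity term, penalizes beliefs that stray far from the prior (regularization/shrinkage).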

The fact that these models predict empirically observed receptive fields so well suggests that we are endowed with (or acquire) prior expectations that the causes of our sensations are largely independent and sparse.

Arranged hierarchically, so each model passes prediction error up and passes predictions down. Precision parameter models noise at each level. High noise => more trust in priors / predictions from above. Low noise => more trust in sensory data from below.

States ‘value is inversely proportional to surprise’. (In a particular simple model) if we perform gradient ascent on value, then the long-term proportion of time spent in a state is proportional to its value, so surprise is inversely related to value. Since we act to minimize free energy, priors can encode values. But does acting to minimize free energy lead to gradient ascent on value? The argument seems to run backwards.

Starting to get flashes of picoeconomics here - recursive relation between model of the future and model of own decision making.

Many references to more general connections between minimizing free energy and defying thermodynamics over lifetime of agent, which I don’t follow at all.

Active Inference, Curiosity and Insight

Various activities can be explained as acting to reduce uncertainty:

To infer expected free energy, we need priors on our own behavior.

Using example of learning complex rules by active inference. Use prior beliefs about own behavior to encode rules of task, in a way that I don’t understand.

Non-REM sleep. In absence of new sensory input, minimizing free energy => minimizing model complexity vs accuracy. Pruning as regularization.

REM sleep. After pruning parameters, need to reevaluate posterior. Can do this by re-simulating observed evidence.

Superstition as premature pruning.

Open confusions: choice of action vs expected free energy, encoding values as priors, explore vs exploit, precision. Suspect that many of these would be resolved by implementing one of the examples.

Active inference and epistemic value

Lecture 9

Value of actions can depend on order eg find food then eat vs eat then find food. So have to evaluate policies, not individual actions.

$\sigma$ is softmax.

Penalizes divergence between $Q$ and $P_\text{prior}$; setting the prior on future states can encode value. Not clear how to encode non-bounded tasks.
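A minimal sketch of this scheme (policy names, outcome space, and numbers are all mine): score each policy by the KL divergence from its predicted outcome distribution to a prior encoding preferred outcomes, then apply $\sigma$ (softmax) to the negated scores.

```python
import math

# Illustrative sketch (my own toy numbers): policies predict outcome
# distributions Q(o|pi); the prior P(o) encodes preferences. A policy is
# scored by -KL(Q || P), and sigma = softmax turns scores into
# policy probabilities. Lower divergence from preferences => more probable.

def kl(q, p):
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

preferred = [0.9, 0.1]                 # prior over outcomes [food, no food]
policies = {
    "forage_then_eat": [0.8, 0.2],     # Q(o | policy)
    "eat_then_forage": [0.3, 0.7],
}

scores = {name: -kl(q, preferred) for name, q in policies.items()}
probs = softmax(list(scores.values()))
print(dict(zip(scores, probs)))  # mass concentrates on forage_then_eat
```

This only captures the risk (divergence-from-preference) part of expected free energy; the full quantity also includes an ambiguity/epistemic term, which I have left out here.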

Bear in mind that we are summing log-probabilities, ie multiplying probabilities. So states that have probability 0 under any factor of the decomposition are worthless overall.

Depression, self-destructive behavior etc explained as malformed priors.

From discussion afterwards:

Lecture 10

Embodied cognition - cognitive processes rooted in perception and action, knowledge not stored as abstract symbolic representation but derived on the fly from perception (past or present) and action.

Doesn't seem to pin down a clear hypothesis, which makes it difficult to figure out which experiments support which version of the theory.

Eg language

Usually attempt to demonstrate embodiment by demonstrating interaction between cognition and perception/action.

Classic experiments which failed to replicate:

Presented several other experiments which have yet to be replicated. Effect sizes are typically <1%.

Think of embodiment as a spectrum from purely symbolic/logical to fully embodied. Claim evidence does not strongly support either end of the spectrum.

Models of embodiment are underspecified. Any effect of the body on thought is taken as evidence for embodiment, without an understanding of how embodiment works. We should be able to explain the pattern of results, not just whether embodiment is present or not.