Why Machines Learn

Why Machines Learn: The Elegant Math Behind Modern AI, Anil Ananthaswamy, 2024

November/December…, 2024

Context and Reflection

I am reading this book as part of a small book club: The 26-minute Book Club. This sort of book is really not my cup of tea — it is about the math with which various learning algorithms are implemented. I am, at my deepest, interested in the algorithms. Math is neither a strong point nor a deep interest — it seems more like magic to me: You generate a bunch of tautologies, define some terms and rules, and then elaborate it, and all of a sudden it enables you to do things.

I had imagined I would give this a one-meeting trial, and then bow out. However, I found the group (4 other people) quite pleasant, and composed of — barring the one person I know who invited me — interesting strangers that I would not otherwise get to know. And that, especially in retirement, is of great value.

Another advantage is that working through this book forces me to think about topics I would otherwise ignore: basic math including linear algebra, vectors, a little calculus, probability, Gaussian distributions, Bayes’ Theorem. Perhaps it will prove tractable enough that I will feel emboldened to take on fluid dynamics, something that I suspect would help me better understand a whole range of natural patterns, from meteorological to geological…. Or perhaps I will figure out how to distill the ‘qualitative’ aspects that I am interested in from the mangle of symbols. We shall see.

C1: Desperately Seeking Patterns

  • Konrad Lorenz and duckling imprinting. Ducklings that imprint on two moving red objects will later follow any two moving objects of the same color; ducklings that imprint on two moving objects of different colors will, mutatis mutandis, follow any two differently colored objects. How is it that an organism with a small brain and only a few seconds of exposure can learn something like this?
  • This leads into a brief introduction to Frank Rosenblatt’s invention of the perceptron in the late 1950s. The perceptron was one of the first ‘brain-inspired’ algorithms that could learn patterns by inspecting labeled data.
  • Then there is a foray into notation: linear equations (relationships) with weights (aka coefficients) and symbols (variables). Also sigma notation.
  • McCulloch and Pitts, 1943, the story of their meeting, and their model of a neuron — the neurode — that can implement basic logical operations.
  • The MCP Neurode. Basically, the neurode takes inputs, combines them according to a function, and outputs a 1 if they are over a threshold theta, and a 0 otherwise. If you allow inputs to be negative and lead to inhibition, as well as allow neuroses [sometimes spell-correct is pretty funny] to connect to one another, you can implement all of boolean logic. The problem, however, is that the thresholds theta must be hand-crafted.
  • Rosenblatt’s Perceptron made a splash because it could learn its weights and theta from the data. An early application was to train perceptrons to recognize hand-drawn letters — and they could learn simply by being ‘punished’ for mistakes.
  • Hebbian Learning: Neurons that fire together, wire together. Or, learning takes place by the formation of connections between firing neurons, and the loss or severing of connections between neurons that are not in sync.
  • The difference between the MCP Neurode and the Perceptron is that a perceptron’s inputs don’t have to be 1 or 0 — they can be continuous. And they are weighted, and their weighted sum is compared to a bias.
  • The Perceptron does make one basic assumption: that there is a clear, unambiguous rule to learn — no noise in the data. If this is the case, it can be proven that a perceptron will always find a linear divide (i.e. when there is one to be found). A small sketch of the learning procedure follows this list.
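
Below is a minimal sketch of that learning procedure as I understand it (my own Python illustration, not the book’s code): weight the inputs, add a bias, output 1 or 0, and nudge the weights and bias only when the prediction is wrong. The AND data set is just a stand-in for linearly separable labeled data.

# Minimal perceptron sketch (illustrative, not from the book).
# It learns its weights and bias from labeled data by being "punished"
# (corrected) whenever it makes a mistake.
def train_perceptron(data, epochs=20, lr=0.1):
    n = len(data[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, label in data:
            s = sum(wi * xi for wi, xi in zip(w, x)) + b
            pred = 1 if s > 0 else 0
            err = label - pred                      # 0 if right, +1/-1 if wrong
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Stand-in data: logical AND, which is linearly separable.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
print(train_perceptron(data))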

C2: We are All Just Numbers

  • Hamilton’s discovery of quaternions, and his inscription on Brougham Bridge in Dublin: i² = j² = k² = ijk = −1. Quaternions don’t concern us, but Hamilton developed concepts for manipulating them that are quite important: vectors and scalars.
  • Scalar/Vector math: computing length; summing vectors; stretching a vector by scalar multiplication;
  • Dot product: a·b = a₁b₁ + a₂b₂ (the sum of the products of the vectors’ components). The dot product (AKA the scalar product) is an operation that takes two vectors and returns a single number (a scalar). It’s a way to quantify how much two vectors “align” with each other — that is, the degree to which they point in the same direction.
    • E.g. Imagine pushing a model railroad car along some tracks. If you push in the exact direction that the tracks go, all the force you apply goes into moving the car; if you push at an angle to the tracks, only a portion of the force you apply goes into moving the car. This (the proportion of force moving the car along the tracks), is what the dot product gives you.
  • Something about dot products being similar to weighted sums, which can be used to represent perceptrons??? Didn’t understand this bit. [p. 36-42] (The sketch after this list spells out the connection as I now understand it.)
  • A perceptron is essentially an algorithm for finding a line/plane/hyperplane that accurately divides values into appropriately labeled regions.
  • Using matrices to represent vectors. Matrix math. Multiplying matrix A with the Transpose of Matrix B
  • So the point of all this is to take Rosenblatt’s Perceptron and transform it into formal notation that linearly transforms an input to an output.
  • “Lower bounds tell us about whether something is impossible.” — Manuel Sabin
  • Minsky and Papert’s book, Perceptrons, poured cold water on the field by proving that perceptrons could not cope with XOR. XOR could only be solved with multiple layers of perceptrons, but at the time nobody knew how to train anything but the top layer.
  • I am not clear on why failure to cope with XOR was such cold water…
    Later: It is because XOR is a simple logical operation; the inability of perceptrons to handle it suggested that they would not work for even moderately complex problems. Some also generalized the failure to all neural networks, rather than just single-layer ones.
  • Multiple layers require back-propagation…
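
A tiny sketch (mine) of the point I was fuzzy on above: the perceptron’s weighted sum is exactly a dot product w·x, and asking whether w·x + b is above zero is asking which side of a line/plane/hyperplane the input falls on. The numbers are made up.

# Sketch: a perceptron's weighted sum is just the dot product w·x.
# The test "w·x + b > 0" asks which side of a hyperplane the input x is on.
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

w = [2.0, -1.0]            # weights (made-up values)
b = -0.5                   # bias
x = [1.0, 0.5]             # an input vector

print(dot(w, x) + b > 0)   # True -> one label, False -> the other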

C3: The Bottom of the Bowl

  • McCarthy, Minsky, Shannon and Rochester organized the 1956 Dartmouth summer seminar on Artificial Intelligence. Widrow attended this seminar, but decided it would take at least 25 years to build a thinking machine.
  • Widrow worked on filtering noise out of signals: He worked on adaptive filtering, meaning a filter that could learn from its errors. Widrow worked on continuous signals; others applied his approach to filtering digital signals. Widrow and Hoff, working on adaptive filtering, invented the Least Mean Squares algorithm.
  • Mean squared error is a method for quantifying error; Least Mean Squares is the algorithm built around minimizing it. What Widrow wanted to do was to create an adaptive filter that would learn in response to errors — this required a method for adjusting the parameters of the filter so as to minimize errors. This is referred to as The Method of Steepest Descent, discovered by the French mathematician Cauchy.
  • Much of the rest of the chapter introduces math for ‘descending slopes.’ The derivative dy/dx gives the slope as we move along the curve… the minimum will have a slope of zero. When we have planes or hyperplanes we need to take multiple variables into account, so we have partial derivatives. (A small numeric sketch follows the quote below.)

“If there’s one thing to take away from this discussion, it’s this: For a multi-dimensional or high-dimensional function (meaning, a function of many variables), the gradient is given by a vector. The components of the vector are partial derivatives of that function with respect to each of the variables.

What we have just seen is extraordinarily powerful. If we know how to take the partial derivative of a function with respect to each of its variables, no matter how many variables or how complex the function, we can always express the gradient as a row vector or column vector.

Our analysis has also connected the dots between two important concepts: functions on the one hand and vectors on the other. Keep this in mind. These seemingly disparate fields of mathematics – vectors, matrices, linear algebra, calculus, probability and statistics, and optimization theory (we have yet to touch upon the latter two) – will all come together as we make sense of why machines learn.”
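
A small numeric illustration of the quoted point (my sketch, not from the book): for a function of two variables the gradient is just the vector of the two partial derivatives, and stepping against that vector walks you downhill to the bottom of the bowl.

# Sketch: for f(x, y) = x^2 + 3y^2 the gradient is the vector of partial
# derivatives (2x, 6y). Repeatedly stepping against it descends toward (0, 0).
def grad(x, y):
    return (2 * x, 6 * y)

x, y, step = 4.0, -2.0, 0.1
for _ in range(100):
    gx, gy = grad(x, y)
    x, y = x - step * gx, y - step * gy

print(round(x, 4), round(y, 4))   # both approach 0, where the slope is zero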

Reading Break…

  • So with an adaptive filter, you filter the input, and look at the error in the output, and feed that error back into the filter, which adjusts itself to minimize the error.
  • So first you need to be able to have an input where you already know what the true signal is, so that you can determine the error after the filter has transformed the input. How do you get that? ➔ Later: In the application we’re talking about, this is the training phase. Once the model is trained, you assume the characteristics of the noise will not change and the model will continue to work.
    One issue is whether the noise in the signal is always of the same sort — that is, if you train an adaptive filter on a bunch of inputs whose signals you know, will that give you a good chance of having a filter that can appropriately transform an unknown signal? The book uses the example of two modems communicating over a noisy line, and it makes sense (I think) that noise would have fairly uniform characteristics, at least for the session. But that seems unlikely to hold for everything.
    Can we assume that the noise, in a particular situation, is always the same, or at least has the same statistical properties?
    Suppose the source or nature of the noise in the signal changes over time? Well, you could embed some kind of known signal into the input (I imagine, say, a musical chord), and let the filter learn to adjust the output so that the known chord comes through.
    But will a filter that preserves the chord also preserve the other information in the signal? I have no idea. I’d think it would depend a lot on (1) the nature of the signal, and (2) the nature of the noise.
  • I’m confused about the part about adding delays to signal… and I’m confused about how, in real life, you know what the desired signal is.
  • Later: Still not very clear on the noise issue, but I’m guessing it depends on what you’re applying it to. If the noise varies in an unpredictable way for a particular application, then the filter/neuron simply won’t work and will produce gibberish.
  • Anyway, let’s assume we know the desired signal (and hence the noise/error) — how do we quantify the latter? We don’t want to just add it up because it can have positive or negative values which would cancel one another out — instead, the errors are squared, and you take their mean to quantify the noise: this is called the Mean Square Error. It is also the case that squaring the errors exaggerates the effects of the larger errors, which seems like a desirable thing.
  • The math shows that the formula for the error associated with an adaptive filter is quadratic, meaning that the error surface is a bowl (convex), and thus that the point of minimum error is the single minimum of the function. That can be found in multiple ways, either by finding the point at which the slope of the function is zero, or by using gradient descent to find it.
  • A problem is that to do this, you need more and more samples of xₙ, yₙ, and dₙ to calculate the parameters, and you need to use calculus to calculate partial derivatives, and especially in high-dimensional space this becomes burdensome (or impossible).
  • The solution was that Widrow and Hoff found a (IMO kludgy) way to just estimate the error without doing a lot of work.

weightNew = weightOld + 2 • <step-size> • <error-for-a-single-point> • <input-for-that-point>

  • This is called the Least Mean Squares (LMS) algorithm.
    Later: What they are doing is taking a single data point (a single input-output pair) at random and using that to estimate the gradient and adjust the weight. For each update a new pair is randomly selected, and over time the algorithm noisily decreases the error. This is called Stochastic Gradient Descent (sketched below). There is an alternative to this approach called mini-batch gradient descent that uses a randomly-selected set of points (e.g. 32 of them) for each update.
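
Here is a sketch of that update rule as I understand it (my own Python illustration with made-up data, not the book’s example): pick one input/desired-output pair at random, compute the error for that single point, and nudge each weight by 2 × step-size × error × input.

# Sketch of the Widrow-Hoff LMS / stochastic-gradient-descent update,
# using made-up data. The "true" weights are what we hope to recover.
import random

true_w = [3.0, -2.0]
samples = []
for _ in range(200):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    d = sum(wi * xi for wi, xi in zip(true_w, x)) + random.gauss(0, 0.1)
    samples.append((x, d))                 # (input, desired output)

w = [0.0, 0.0]
step = 0.05
for _ in range(5000):
    x, d = random.choice(samples)          # one random input-output pair
    y = sum(wi * xi for wi, xi in zip(w, x))
    err = d - y                            # error for this single point
    w = [wi + 2 * step * err * xi for wi, xi in zip(w, x)]

print([round(wi, 2) for wi in w])          # noisily approaches [3.0, -2.0]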

C4: In All Probability

  • The Monty Hall problem. There are three doors, one of which has a valuable prize behind it, while the others have only goats. After you’ve picked door 1, Monty opens door 2, revealing a goat. You now have a chance to change your pick — should you do that?
  • The answer. The answer is “yes.” For a long time this seemed counterintuitive to me (and Paul Erdos): revealing what is behind one of the doors should not change the probability of what is behind the other doors. What was tripping me up (ironically) is that I was ignoring the psychology. The key is that Monty is not opening a door at random: he knows what is behind each door, and in particular, he will not open a door that has the prize behind it (as that would destroy the game). So when Monty opens door 2, he is sometimes providing information about both door 2 and about door 3.
  • Let’s suppose I’ve picked the first door. There are three cases:
    (1) Pxx — if I have the correct door, Monty can open either of the others.
    (2) xPx — If a goat is behind 1, and P behind 2, Monty can only open 3
    (3) xxP — If a goat is behind 1, and P is behind 3, Monty can only open 2
    In 2 of these 3 cases, switching to the remaining unopened door gets me the prize. Monty has changed the prior probabilities, and so we must re-evaluate.
  • This argument will hold for any number of doors, because Monty always knows where the car is, and since he will avoid opening that door, every door he opens changes the priors — i.e. gives additional information about the unopened doors.
  • Later: If we construct a different version of the problem, where, before Monty can open a door, an earthquake strikes and door 2 happens to collapse, revealing the goat, there is no reason to change (or not change) your pick. The revelation of the goat behind door 2 does not give us further information about what is behind any of the other doors, since the earthquake’s ‘revelation’ was truly a random event. (A small simulation of the standard version follows this list.)
  • Bayes Theorem history. Interestingly, Thomas Bayes’ essay describing his approach was only presented to the Royal Society in 1763, two years after his death, by his friend Richard Price (who later scholars believe made substantive contributions, although Price attributed it all to Bayes).
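
A quick simulation sketch (mine) of the standard version discussed above, in which Monty knowingly avoids the prize door; switching should win about 2/3 of the time and staying about 1/3.

# Sketch: simulate the Monty Hall game. Monty always opens a door that is
# neither the player's pick nor the prize, so switching wins ~2/3 of the time.
import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        prize = random.randrange(3)
        pick = random.randrange(3)
        opened = next(d for d in range(3) if d != pick and d != prize)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == prize)
    return wins / trials

print(play(switch=False))   # about 0.33
print(play(switch=True))    # about 0.67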

P (X-is-true | given Evidence-for-X is positive)
IS EQUAL TO
P-X-in-the-world • How-strong-the-evidence-is (e.g. the accuracy of the test — its true-positive rate)
————————————- (DIVIDED BY) —————————————————-
(P-X-in-the-world • probability of a true positive)
PLUS ((1 – P-X-in-the-world) • probability of a false positive)

OR

The empirical probability in the world * the predictive accuracy given evidence
————————————————————————————————————————–
the likelihood of the world producing that evidence
(= sum of probabilities of all ways of producing that evidence)
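
A worked version of the formula with made-up numbers (my sketch, not the book’s example): a condition with a 1% base rate, and a test with a 90% true-positive rate and a 9% false-positive rate.

# Sketch: Bayes' theorem with invented numbers.
# P(X | positive) = P(X) * P(positive | X) / P(positive)
p_x = 0.01                  # prior: how common X is in the world
p_pos_given_x = 0.90        # true-positive rate of the test
p_pos_given_not_x = 0.09    # false-positive rate of the test

# likelihood of the world producing a positive result, by either route
p_pos = p_x * p_pos_given_x + (1 - p_x) * p_pos_given_not_x

p_x_given_pos = p_x * p_pos_given_x / p_pos
print(round(p_x_given_pos, 3))   # ~0.092: even a positive test leaves X unlikely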

Reading Break…

  • Machine Learning is inherently probabilistic because there are an infinite number of hyperplanes that can discriminate between the learned alternatives, and the algorithm has settled on one of them for no particular reason. Other factors that make ML less than accurate are that the data itself may have errors, and that the amount of data drawn upon is limited. Later: And we must keep in mind that ML is only minimizing error — whether the result has enough signal to be useful is an empirical and domain-dependent issue.
  • Distinction between theoretical probability and empirical probability (e.g., the theoretical probability of a fair coin coming up heads is 50%; the empirical probability of a fair coin coming up heads depends on actually doing it, and it will approach but not reach the theoretical probability as one increases the number of empirical samples).
  • Aside: There is also the issue of the degree to which real-world events are actually expressions of mathematical distributions. It seems elegant to assume that, but is it really so?
  • The case of a coin flip is an example of a Bernoulli distribution. It has only two values, a and b, and can be characterized by a probability p such that p is the probability of a, and (1 − p) is the probability of b.
  • Distributions with a mean and a variance (and standard deviation). Now consider the case where you have N>2 outcomes, each with its own probability. This distribution can be characterized by a mean and standard deviation — the mean (aka the expected value) is the sum of the values of each outcome multiplied by their probabilities, and the variance is the probability-weighted sum of the squared deviations of each value from the mean; the standard deviation is the square root of the variance. We can talk about the distribution as a whole in terms of its probability mass function.
  • So far we’ve been talking about variables with discrete values, but we can instead talk about variables with continuous values. Here we can’t talk about the probability of a particular value because there are an infinite number of values and the probability of any single perfectly precise value is zero. So, instead, what we do is talk about the probability of a value occurring within particular bounds: this is called the probability density function. [Aside: But wouldn’t it be possible to do some calculus-like move where we look at what happens as a finite interval approaches zero?]
  • The important point is that whether we have a variable with discrete or continuous values, we can use well-known and analytically well understood functions to characterize the distribution.
  • Machine Learning. Let’s begin with a set of labeled data points: y is a label that has two values, and x is an instance of the data. y is categorical; x is a vector with N components. This data set can be represented as a matrix: y₁, x₁₁, x₁₂, x₁₃ … x₁ₙ and so on for y₂, y₃, etc. Each component of x is a feature that the algorithm will use to discriminate which label y the instance x belongs to.
  • Now, if you knew the underlying distribution P(y, x), you could determine the probability of y=A given x, and the probability of y≠A given x, and use the highest probability to assign the label. If this were the case, this is what would be termed a Bayes Optimal Classifier. [Aside: I’m a little unclear on this — it seems like it’s dependent on a particular situation, and so it seems odd to give it this sort of name.]
  • But usually you don’t know the underlying distribution, and so it must be estimated. Often it is easier to make assumptions about the underlying distribution (Bernoulli? Gaussian?), but it is important to keep in mind that these are idealizations chosen to make the math easier.
  • Aside: A Gaussian distribution is defined as being symmetric with respect to a single mode (which is also the mean and median), with asymptotic tails that never reach 0.
  • There are two approaches to estimating distributions:
    One is MLE, or Maximum Likelihood Estimation, which involves selecting a theta (that is, a distribution with particular parameters indicated by theta) that maximizes the likelihood of observing (generating?) the labeled data you already have.
    In the text, MLE is exemplified by imagining a set of data about two populations’ heights, labeled short or tall, and that each population has a Gaussian distribution, and thus that all the data will be best modeled by a combination of the two distributions. ???But is that still a Gaussian distribution? And what is the rationale for choosing a Gaussian distribution rather than some other distribution???
    The other is called MAP, for Maximum A Posteriori estimation. As best I can tell, this involves estimating the distribution based on our experience of the world and representing it as a function. Then you set the derivative to zero, and solve that equation (after checking to be sure that you’ve got the maximum rather than the minimum). If this approach does not yield a ‘closed’ solution (most of the time), you can take the negative of the function and use gradient descent to find its minimum.
  • MLE vs MAP. So MLE tries to find the maximum using the data, and MAP tries to find the maximum of a distribution you’ve guessed at. MLE works best when there’s a lot of data; MAP works best when there isn’t much data but you can make a reasonable guess about the underlying distribution.
  • The Federalist Papers example: which unattributed papers were written by Madison and which by Hamilton. A first approach was to analyze the length of sentences in papers known to be written by each author, and use the mean length and the standard deviation to discriminate: unfortunately the means and SDs for each author were almost identical. Later, someone suggested using the frequency of word use, and, in particular, function words tended to reliably discriminate between the known works of the two authors: this was then used to predict which of the unattributed essays were written by which author.
  • Aside: It is not clear to me whether the achievement of Mosteller and Wallace was due to the use of Bayesian reasoning, or to the realization that word choice was a very good discriminator between the authors. ➔ LATER: Consensus among scholars from many fields indicates, according to chatGPT, that their work did indeed validate the use of Bayesian inference, as well as creating a new approach to linguistics, and indicating the ways in which computers could be used.
  • The Penguin example.
  • A trick that statisticians use is to assume that the distributions for each feature under consideration are independent — seems like a bit of a leap, but it appears to work and it makes the math easier and requires less data.
  • Naive or Idiot Bayesian classifier. (A small sketch tying MLE-fitted Gaussians to classification follows this list.)
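
A small sketch (mine, with invented numbers) of the short/tall example above: fit a Gaussian to each labeled group by maximum likelihood (the sample mean and variance), then give a new point whichever label’s fitted Gaussian assigns it the higher density. With a single feature and equal priors this is also about the simplest possible ‘naive’ Bayes classifier.

# Sketch: MLE fits a Gaussian to each labeled group (mean and variance are
# just the sample statistics); a new point gets the label whose fitted
# Gaussian gives it the higher density. Heights are invented numbers.
import math

def fit(xs):                          # maximum-likelihood mean and std dev
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, math.sqrt(v)

def density(x, m, s):                 # Gaussian probability density
    return math.exp(-((x - m) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

short = [150, 155, 152, 158, 149]     # invented training heights, in cm
tall = [178, 185, 181, 176, 183]

m_s, s_s = fit(short)
m_t, s_t = fit(tall)

x = 170
print("tall" if density(x, m_t, s_t) > density(x, m_s, s_s) else "short")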

… reading break…

C5: Birds of a Feather

  • The 1854 Snow Cholera Epidemic map
  • Voronoi diagrams and nearest neighbors
  • When you represent something as a set of N-dimensional vectors, the vectors can be considered as points in an N-dimensional space, and you can use the NN algorithm to compute their neighbors, and devise non-linear boundaries between labeled points. (A small sketch follows this list.)
  • However, as the dimensionality of the space increases, there’s a problem in that the space, in most regions, becomes very sparsely populated…
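
A minimal nearest-neighbor sketch (mine, with invented points): each example is a point in N-dimensional space, and a new point takes the majority label of its k closest neighbors.

# Sketch: k-nearest-neighbor classification over points in N-dimensional space.
# The training data below is invented for illustration.
import math
from collections import Counter

def knn(train, x, k=3):
    nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B")]

print(knn(train, (2, 2)))   # "A"
print(knn(train, (6, 5)))   # "B"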

… reading break…

C6: There’s Magic in Them Matrices

  • Principal components analysis (PCA) involves reducing the dimensionality of a data space in such a way that the retained dimensions capture most of the variance.
  • This assumes that dimensions that do not contain much variance are unimportant; and that the dimensions that capture a lot of variance  actually are important. This may not be true.
  • A few notes on Vectors
    • A vector with six components has a dimensionality of six. That is, it can be represented by a single point in a six-dimensional space.
    • Multiplying a vector by a square matrix of the same dimensionality, basically means changing the magnitude and orientation of that vector in the same space.
    • An eigenvector of a matrix is a non-zero vector such that, when it is multiplied by the matrix, it does NOT change its orientation, only its length. The factor by which its length is scaled is called the eigenvalue.
    • Basically, for any N×N matrix (at least the symmetric ones that matter here), you can find N eigenvectors that, when multiplied by the matrix, do not change orientation, but only change magnitude (that scaling factor is the eigenvalue).
    • For a symmetric matrix, the eigenvectors lie along the major and minor axes (hyperaxes) of the (hyper)ellipse.
    • There is a nice visualization on pages 186-187 of what it means to find the eigenvectors
    • Centering a matrix means taking the mean for a particular dimension (feature) and subtracting it from each individual value for that feature: that transforms each feature’s value into how much it deviates from the mean. This is also called a “mean-corrected” matrix.
    • If you multiply a mean-corrected matrix by its transpose, you get a square matrix where the diagonal values show the variance for each feature, and the off-diagonal values show the covariance between pairs of features. This is called the covariance matrix.
    • Now, if you compute the eigenvectors for that matrix, the eigenvalues will allow you to see where most of the variance is and do a principal components analysis, reducing the dimensionality of the matrix. (Sketched below.)
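
A sketch of that pipeline (mine, using numpy and an invented point cloud): center the data, form the covariance matrix, take its eigenvectors, and keep the directions with the largest eigenvalues.

# Sketch: PCA as described above. Center the data, build the covariance
# matrix, find its eigenvectors, keep the high-variance directions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.diag([3.0, 0.3])   # invented stretched cloud

Xc = X - X.mean(axis=0)                   # "mean-corrected" matrix
cov = Xc.T @ Xc / (len(Xc) - 1)           # covariance matrix (features x features)
eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues ascending, eigenvectors in columns

order = np.argsort(eigvals)[::-1]         # largest variance first
top = eigvecs[:, order[:1]]               # keep one principal component
reduced = Xc @ top                        # data projected onto that direction

print(eigvals[order])                     # most of the variance lies in one direction
print(reduced.shape)                      # (200, 1): dimensionality reduced from 2 to 1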

… reading break…
Actually, a break of about six weeks
during which I was traveling and missed the group meetings.
I may go back and summarize the missed material… or I may not.

C7: The Great Kernel Rope Trick

The first part of the chapter talks about developing an algorithm for finding the optimal hyperplane that separates two groups. This improves on what a perceptron does — a perceptron finds a hyperplane, but not necessarily (and probably not) the optimal one. The algorithm here works by maximizing the margin, the border between the closest points of the two groups.

The kernel trick refers to computing the dot products that vectors would have in a higher-dimensional space while working only with the original lower-dimensional vectors. This is significant because one way of classifying sets of data that overlap is by projecting them into higher-dimensional spaces where they don’t overlap, and figuring out the optimal hyperplane there — except the kernel trick allows you to calculate that hyperplane without ever working in the high-dimensional space. The term used for this technique is Support Vector Machines. (A small sketch of the trick follows.)
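
A tiny sketch (mine) of what makes this a ‘trick’: for the simple polynomial kernel k(a, b) = (a·b)², the number you get by staying in the original two dimensions equals the dot product the vectors would have after being mapped into a three-dimensional feature space.

# Sketch of the kernel trick: (a·b)^2 equals the dot product of the vectors
# after an explicit map into 3 dimensions, but never requires building
# that higher-dimensional space.
import math

def phi(v):                      # explicit degree-2 feature map of (x, y)
    x, y = v
    return (x * x, y * y, math.sqrt(2) * x * y)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

a, b = (1.0, 2.0), (3.0, 0.5)
print(dot(phi(a), phi(b)))       # dot product computed in the mapped 3-D space
print(dot(a, b) ** 2)            # same number, computed in the original 2-D space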

C8: With a Little Help from Physics

Hopfield Networks. Networks that have local energy minima that represent ‘memories.’ When a new input arrives that is close to a particular memory, the network’s state is likely to ‘decay’ into the stable pattern that represents that memory. (Sketched below.)

Hopfield’s work showed that the brain could be modeled as a dynamic system.
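
A toy sketch (mine) of the idea: one pattern is stored with a Hebbian (outer-product) rule, and a corrupted version of it is updated one unit at a time until it settles back into the stored pattern, i.e. falls into that local energy minimum.

# Sketch of a tiny Hopfield network: a memory stored with a Hebbian
# (outer-product) rule; a corrupted input is updated unit by unit until it
# settles into the stored pattern (a local energy minimum).
import random

memory = [1, -1, 1, -1, 1, -1]                     # one stored pattern of +1/-1 units
n = len(memory)
W = [[memory[i] * memory[j] if i != j else 0       # Hebbian weights, no self-connections
      for j in range(n)] for i in range(n)]

state = [1, -1, -1, -1, 1, 1]                      # corrupted version of the memory
for _ in range(50):
    i = random.randrange(n)                        # asynchronous update of one unit
    field = sum(W[i][j] * state[j] for j in range(n))
    state[i] = 1 if field >= 0 else -1

print(state == memory)                             # almost always True: it settles back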

C9: The Man Who Set Back Deep Learning (Not Really) – TBD

C10: The Algorithm that Put Paid to a Persistent Myth – TBD

C11: The Eyes of a Machine

  • Hubel and Wiesel’s work on the visual cortex of cats, beginning with the serendipitous discovery of edge detection. They described a neural architecture in which a hierarchy of cells detected increasingly complex visual features based on multiple simple detectors feeding into more complex detectors.
  • This architecture was used in neural nets…
  • The convolution operation enabled the mathematical mimicking of such detectors using convolution matrices (aka filters). (See the sketch after this list.)
  • Yann LeCun figured out how to create networks that could learn their own filters…
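
A minimal convolution sketch (mine, with a toy image): slide a small filter over the image and take a weighted sum at each position; this particular hand-crafted filter responds where dark meets bright, mimicking an edge detector.

# Sketch: 2-D convolution with a hand-crafted vertical-edge filter.
# The toy image is dark on the left and bright on the right.
image = [[0, 0, 0, 9, 9, 9],
         [0, 0, 0, 9, 9, 9],
         [0, 0, 0, 9, 9, 9],
         [0, 0, 0, 9, 9, 9]]

kernel = [[-1, 0, 1],          # responds to left-to-right brightness change
          [-1, 0, 1],
          [-1, 0, 1]]

out = []
for r in range(len(image) - 2):
    row = []
    for c in range(len(image[0]) - 2):
        row.append(sum(kernel[i][j] * image[r + i][c + j]
                       for i in range(3) for j in range(3)))
    out.append(row)

print(out)   # large responses only in the columns where the edge sits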

C12: Terra Incognita – TBD

Epilogue


Four Billion Years and Counting…

Four Billion Years and Counting: Canada’s Geological Heritage. Produced by the Canadian Federation of Earth Sciences, by seven editors and dozens of authors. 2014.

November-December, 2024.

I am reading this with CJS. It is a nice overview of regional geology, and it is nice that all the examples come from Canada, and at least some of the discussion is relevant to Minnesota Geology. The book is notable for its beautifully done pictures and diagrams.



Othello

November 2024

Reading as part of the Fall 2024 Shakespeare course — see general notes for more.

Although it’s a famous play, and does indeed contain some striking things — particularly Iago’s manipulation of Othello, and also the use of the handkerchief as a symbol of fidelity and betrayal – I was not that keen on this play. Give me some comedy, or at least a little more magic!

Precis of the play

Othello, a famous general fighting for Venice, has just married Desdemona, to the dismay of her father. Othello is black, and an outsider, and knows little of the customs or society of Venice – but he is valued due to his military prowess, especially as the Turks seem about to attack. He has chosen the polished and bookish Cassio as his lieutenant, much to the distress and anger of Iago, who has spent his life in the field and believes he has earned the position. Iago decides to get revenge, and aims to destroy Cassio and Desdemona and, through her, Othello.

After this, the play unfolds in a straightforward way. Iago subtly raises questions about Desdemona’s faithfulness – all the while pretending that he is reluctant to speak and is unsure of the truth of what he is saying – and in a famous scene transforms Othello’s trust of Desdemona into suspicion, suggesting that she is having an affair with Cassio. Iago is one of Shakespeare’s most famous villains – Coleridge referred to him as having “motiveless malignity.”

Othello wants visible proof, and here Desdemona’s handkerchief comes into play. It was her first gift from Othello, and it was woven by a fortune teller and said to have magical properties. Iago secretes Desdemona’s handkerchief (which she had lost, and which Emilia found and gave to Iago) in Cassio’s quarters. Cassio finds the handkerchief and gives it to the courtesan Bianca to copy – Othello watches this from a distance, and believes it proof of Desdemona’s infidelity. Othello orders Iago to kill Cassio, and he himself strangles Desdemona. When it is revealed that Desdemona was innocent, Othello kills himself.


Through the Language Glass

by Guy Deutscher

October 2024

This is an excellent book; interesting, well-documented science, and some beautiful and erudite writing as well. The basic argument — that grammar determines what must be specified, rather than what can be specified, and in that manner instills certain habits of mind that affect how people see the world — seems correct, if not quite living up to the subtitle of the book: Why the World Looks Different in Other Languages.

Perhaps the most interesting and fun part of the book was to be introduced to languages that work very differently from English: The Matsés language (in Peru) that requires speakers to specify whether the fact they report is based on personal observation, indirect evidence, or hearsay; and the Australian language that has no egocentric prepositions, but requires all positional information to be reported in terms of the cardinal directions, thus requiring its speakers to always be oriented.

This book was a pleasure to read. I plan to seek out other work by this writer. 

Contents

Front Matter

On whether languages reflect the characteristics of their speakers, he writes:

Many a dinner table conversation is embellished by such vignettes, for few subjects lend themselves more readily to disquisition than the character of different languages and their speakers. And yet should these lofty observations be carried away from the conviviality of the dining room to the chill of the study, they would quickly collapse like a soufflé of airy anecdote – at best amusing and meaningless, at worst bigoted and absurd.

— p. 2

The basic argument of the book is this:

The effects that have emerged from recent research, however, are far more down to earth. They are to do with the habits of mind that language can instill on the ground level of thought: on memory, attention, perception, and associations. And while these effects may be less wild than those flaunted in the past, we shall see that some of them are no less striking for all that.

I think it is correct, but that the subtitle of the book – Why the World Looks Different in Other Languages – is a bit of an exaggeration.

C1-5: <Reprise of history and status of color terms>

C1: Naming the Rainbow

This chapter reprises now largely forgotten work by William Gladstone (now remembered as an English prime minister) on Homer and his writings, and focuses in particular on one chapter in Gladstone’s monumental 3,000-page work: a chapter on Homer’s use of color terms.

Gladstone’s scrutiny of the Iliad and the Odyssey revealed that there is something awry about Homer’s descriptions of color, and the conclusions Gladstone draws from his discovery are so radical and so bewildering that his contemporaries are entirely unable to digest them and largely dismiss them out of hand. But before long, Gladstone’s conundrum will launch a thousand ships of learning, have a profound effect on the development of at least three academic disciplines, and trigger a war over the control of language between nature and culture that after 150 years shows no sign of abating.

Gladstone notes that Homer uses color terms in odd ways — the famous “wine dark sea” (really “wine-looking” sea) being just one example.

Mostly Homer, as well as other Greek authors of the period, use color very little in their descriptions: mostly they use black or white; terms for colors are used infrequently and inconsistently. For example, the only other use of “wine-looking” is to describe the color of oxen.

Gladstone’s fourth point is the vast predominance of the “most crude and elemental forms of color” – black and white – over every other. He counts that Homer uses the adjective melas (black) about 170 times in the poems, and this does not even include instances of the corresponding verb “to grow black,” as when the sea is described as “blackening beneath the ripple of the West Wind that is newly risen.” Words meaning “white” appear around 100 times. In contrast to this abundance, the word eruthros (red) appears thirteen times, xanthos (yellow) is hardly found ten times, ioeis (violet) six times, and other colors even less often.

C6: Crying Whorf

This chapter describes the origin, rise and fall of linguistic relativity. Sapir is depicted as respectable but making over-stated claims; Whorf comes across as a charlatan, for example, making claims to have deeply studied Hopi, when he only had access to a single informant in New York – and making broad claims that are entirely wrong (e.g. that the Hopi language does not have a future tense). 

Deutscher traces the origin of linguistic relativity to Wilhelm von Humboldt in 1799,  whose “linguistic road to Damascus led through the Pyrenees.” Humboldt encountered the Basque language, and found that it was radically different from the languages linguists tended to study. He then sought out other ‘more exotic’ languages, which he found by going to the Vatican library and studying the notes of Jesuit missionaries to South and Central America: “…Humboldt was barely scratching the surface. But the dim ray of light that shone from his materials felt dazzling nonetheless because of the utter darkness in which he and his contemporaries had languished.” p. 135 Although Humboldt’s ideas led to linguistic relativity, it should be noted that he had a much more nuanced and correct view: In principle, any language may express any idea; the real differences among languages are not in what they are able to express but in “what it encourages and stimulates its speakers to do from its own inner force.” But this view was not carried forward, and instead: “The Humboldtian ideas now underwent a process of rapid fermentation, and as the spirit of the new theory grew more powerful, the rhetoric became less sober.”

All that said, Deutscher argues it is a mistake to conclude that language has no influence over thought. But rather than taking the strong position that language constrains thought, he instead argues that the habits of language may lead to habits of mind. On the question of how language influences thought, he refers to the idea that Boas introduced and that Jakobson crystallized into a maxim: “Languages differ in what they must convey, and not in what they may convey.”

Phrases I like

“…has still the power to disturb our hearts.” [Sapir, referring to Homer, Ovid, etc.] p. 129

“[His] linguistic road to Damascus led through the Pyrenees.” p. 134

“…Humboldt was barely scratching the surface. But the dim ray of light that shone from his materials felt dazzling nonetheless because of the utter darkness in which he and his contemporaries had languished.” p. 135


Henry V

October 2024

Reading as part of the Fall 2024 Shakespeare course — see general notes for more.

Precis

Background: Henry V is son of Henry IV, who obtained the throne by usurping it from Richard II – this means that there is some feeling that neither Henry is a legitimate ruler. Before becoming King, Henry V was a wild youth, dissipated and engaged in lascivious acts. But on his father’s death, Henry becomes a serious and mature ruler. 

The play opens with a chorus praising Henry as an unmatched warrior King. But then, the next act depicts the Archbishop of Canterbury revealing his plan to avert a harsh tax on the Church by legitimizing and encouraging Henry’s plans to invade France and take its throne.

Act 2 begins with the chorus describing the desire of the young men of England to pursue honor by participating in this war. The first scene following this shows a conversation – and almost a fight – between three old soldiers who are erstwhile companions of Henry, and depicts honor as the least of their concerns. The second scene of the act shows the unmasking of traitors among the Lords who support Henry.

Act III. The war has begun. The English army, led by Henry, lays siege to the French town of Harfleur. Before the gates, Henry delivers a rousing speech (“Once more unto the breach…”) to rally his soldiers; the siege takes a heavy toll, and the town eventually surrenders.

 In Act IV, Henry has arrived at Agincourt; his army is weary and outnumbered. Henry, in disguise, walks among his soldiers at night, listening to their fears and doubts. In the morning, Henry delivers his famous St. Crispin’s Day speech, which lifts the English spirits.

In Act V, the battle of Agincourt is won by the English. Henry returns to England, where the victory is celebrated, and then goes back to France to negotiate the final terms of the peace. There he woos a reluctant Princess Katherine, whose marriage to him will solidify his claim to the throne. The play ends with a reminder that Henry will die, and things will unravel.

Structure of the Play

1. Invasion Groundwork

  • Prologue: Chorus wishes for a greater stage, and tells audience to use its imagination.
  • 1.1 Theological Justification
    Bishops of Canterbury and Ely discuss a bill that will seize money from the church; they plan to avoid it by providing a theological justification for Henry V’s claim to France, and thus his invasion. They also mention how much Henry V has changed since his father’s death: “And so the Prince obscured his contemplation / Under the veil of wildness / which, no doubt, grew like the summer grass, fastest by night / Unseen yet crescive in his faculty.”
  • 1.2: Bishops assure H of invasion’s morality; tennis ball mock
    Henry V invites the Bishops to give an explication of the law regarding his claims to France, and they do so, even as Henry repeatedly asks them to be honest about it. Henry also raises the possibility of Scotland invading should he go to France, but the Bishops argue that that can be defended against. Finally, after deciding that he will take control of France, by invasion if necessary, he invites in the French ambassadors, who, in a message from the Dauphin, present him with a barrel of tennis balls. Henry says he will play a set in France, and will “strike his father’s crown into the hazard.” Exeter, uncle to the King, is present and speaks a line or two.

2. Preparations for War

Elimination of traitors; introduction of common soldiers; preparation by France

  • Chorus: The chorus describes the excitement in England about the coming war – They sell the pasture now to buy the horse – and provides notice that three nobles – Cambridge, Scroop, and Grey – have become traitors.
  • Bardolph, Henry’s former tavern companion, prevents two soldiers – Nym and Pistol – from fighting over Hostess Quickly, Pistol’s wife, and requires them to become friends. They are interrupted by news that Falstaff is dying.
  • Cambridge, Scroop, and Grey are brought into Henry V’s presence, not realizing that he knows they are traitors, and are asked whether Henry should show mercy to someone who has spoken against him. They say no, and override Henry’s wishes to show clemency. He then reveals that he knows of their betrayals, and they are all condemned to death.
  • Falstaff has died. Bardolph, Nym, Pistol, and Hostess Quickly mourn his death. The three men prepare to depart for France, and Pistol bids Hostess Quickly goodbye.
  • The King of France and the Dauphin plan for the defense of France against Henry – the King is cautious, the Dauphin is not, being contemptuous of Henry and ignoring warnings about Henry’s new ethos. Exeter enters as ambassador, and asks the King of France to yield to Henry, and returns insults to the Dauphin. The King says he will answer in the morning: “A night is but small breath and little pause / To answer matters of this consequence.”

3. Invasion, part 1: Success as Harfleur surrenders

Initial success: Harfleur surrenders; commoners show cowardice; 5:1 odds

  • Chorus: Describes the departure of the English navy: …
         Play with your fancies and in them behold, 
         Upon the hempen tackle, shipboys climbing.
         Hear the shrill whistle, which doth order give 
         To sounds confused. Behold the threaden sails, 
         Borne with th’ invisible and creeping wind, 
         Draw the huge bottoms through the furrowed sea 
         Breasting the lofty surge. O, do but think 
         You stand upon the rivage and behold
         A city on th’ inconstant billows dancing…

    and notes that the French King offered the hand of his daughter and some small unprofitable dukedoms – this offer is disregarded (and is reported only after the navy is described as being launched). 
  • The invasion begins: “Once more unto the breach, dear friends, once more / Or close the wall up with our English dead.” Henry makes a speech as they prepare to advance.
  • The three soldiers show their cowardice in trying to withdraw from the assault – they are driven back to it by Captain Fluellen. Captain Fluellen then engages in discussions and disputations with three other Captains: Gower, Jamy, Macmorris. [Not quite sure of the point of this scene]
  • Henry gives a speech before the gates of Harfleur, saying it is their last chance, and that they will be to blame if they do not surrender and the city is ravaged:

I will not leave the half-achieved Harfleur
Till in her ashes she lie buried.
The gates of mercy shall be all shut up,
And the fleshed soldier, rough and hard of heart
In liberty of bloody hand, shall range
With conscience wide as hell, mowing like grass
Your fresh fair virgins and your flow’ring infants
What is it then to me if impious war,
Arrayed in flames like to the prince of fiends,
Do with his smirched complexion all fell feats
Enlinked to waste and desolation?

  • Katherine, Princess of France, has one of her maids teach her English. [The scene appears to be presented in French – would the audience have understood???]
  • The governor surrenders the town, and Henry spares its citizens.
    [Neither of these things happened in history.]
  • The French nobles are embarrassed by Henry’s successful invasion. But they convince themselves they will triumph, and send an ambassador to ask what ransom Henry will offer when he is captured.
  • Ancient Pistol has distinguished himself and pleads with Captain Fluellen for the life of Bardolph, who has been sentenced to death for stealing. His plea is rejected, and he departs with a curse. Captain Fluellen talks with Henry, and mentions Bardolph, whose execution Henry upholds. The French Ambassador, Mountjoy, arrives to enquire about Henry’s ransom: Henry says ‘nothing but my body.’
  • The French nobles, confident of their victory on the eve of the battle, boast and banter among themselves.

4. Invasion, part 2: Triumph at Agincourt

Eve of  battle; Henry & Williams & Fluellen; Pistol demands ransom;  triumph at Agincourt

  • The Chorus draws a beautiful picture of the two armies the night before the battle, camped across from one another, awaiting the morning. The French confident, the English anxious… but with Henry moving among them to raise morale.
         Now entertain conjecture of a time
         When creeping murmur and the poring dark
         Fills the wide vessel of the universe
    From camp to camp, through the foul womb of night,
         The hum of either army stilly sounds, 
         That the fixed sentinels almost receive 
         The secret whispers of each other’s watch.
         Fire answers fire, and through their paly flames 
         Each battle sees the other’s umbered face
  • Henry walks through his camp, in disguise. He encounters Pistol, overhears a conversation between Gower and Fluellen that leaves him impressed with the Welshman’s quality, and argues with a soldier – Williams – about the King’s responsibility for the spiritual fate of his soldiers – they exchange gloves with the intention of dueling later. Last, Henry laments his father’s usurpation of Richard II’s throne.
  • The French nobles, about to fight, lament that the English are so few and weak.
  • Henry gives a speech of encouragement again. Responding to someone wishing for more men, Henry says he does not wish for more, and furthermore that those who do not wish to fight will be furnished with passage home. ‘I do not wish to share the honor more than I have to,’ is his sentiment.
  • The ambassador, Mountjoy, comes again to negotiate a ransom, which Henry refuses. 
  • A French soldier surrenders to Pistol, who threatens to kill him unless he provides a ransom. 
  • The French nobles recognize that they have been defeated, and, ashamed, vow that they will die in battle. 
  • Henry hears of the deaths of York and Suffolk; unsure of whether he has won, when he hears a French call to arms he orders all the French prisoners killed.
  • Fluellen, in conversation with Gower, compares Henry to Alexander the Great. Montjoy arrives with the French surrender. Williams appears with the glove, which Henry does not acknowledge; but Henry gives Fluellen the other glove and sends him after Williams, and then sends others after Fluellen to prevent a full fight.
  • Williams encounters Fluellen, and strikes him. The other men arrive and prevent an escalation. Henry arrives, explains what happened, ‘pardons’ Williams, and has his glove filled with crowns. [I’m not quite sure of what happens after this, especially between Williams and Fluellen—Fluellen seems to do an about-face and now thinks well of Williams.] The scene ends with the numbers of the dead being announced, and Henry giving credit for the victory to God.

5. Treaty signed, and marriage

Treaty signed and Princess Kate agrees to marry Henry; Fluellen gets revenge

  • Chorus: Brings Henry back to England where he and his victory are celebrated, and then back to France where the treaty recognizing Henry as sovereign will be signed. 
  • Fluellen, via use of a cudgel, forces Pistol to eat a leek to avenge his insults; Pistol decides to return to England where he will wear his cudgel wounds to pretend to be a wounded soldier. 
  • Henry and the King of France meet, and Henry delegates negotiation to his nobles while he woos Princess Katherine – she consents to marrying him, but without, it seems to me, much understanding or enthusiasm. Henry rides roughshod over her preference not to kiss before the wedding: “O Kate, nice customs curtsy to great Kings.”

A few notes

Throughout the play we see that Henry has separated himself from his old base companions: Falstaff dies (and was previously exiled); Henry allows Bardolph to be hanged for stealing; the Bishops remark on how Henry has changed.

Deception: Not much. Henry goes in disguise among his troops. The glove incident with Williams. Henry does not tell Fluellen what is up when he sends him after Williams. Henry uses lots of flowery words which it is unlikely Princess Kate will understand.

??? Is Henry’s order to kill the prisoners proper?

??? Does Henry really think the war is just?

??? Henry says that if they do not surrender, the governor will be responsible for the soldiers’ depredations.

Quotes I like

Now entertain conjecture of a time
When creeping murmur and the poring dark
Fills the wide vessel of the universe.
From camp to camp, through the foul womb of night,
The hum of either army stilly sounds, 
That the fixed sentinels almost receive 
The secret whispers of each other’s watch.
Fire answers fire, and through their paly flames 
Each battle sees the other’s umbered face;

 I will not leave the half-achieved Harfleur 
Till in her ashes she lie buried.
The gates of mercy shall be all shut up, 
And the fleshed soldier, rough and hard of heart
In liberty of bloody hand, shall range
With conscience wide as hell, mowing like grass:
Your fresh fair virgins and your flow’ring infants
What is it then to me if impious war,
Arrayed in flames like to the prince of fiends,
Do with his smirched complexion all fell feats
Enlinked to waste and desolation?

Play with your fancies and in them behold,
Upon the hempen tackle, shipboys climbing.
Hear the shrill whistle, which doth order give
To sounds confused. Behold the threaden sails,
Borne with th’ invisible and creeping wind,
Draw the huge bottoms through the furrowed sea
Breasting the lofty surge. O, do but think
You stand upon the rivage and behold
A city on th’ inconstant billows dancing…
