My experience learning Fermat's Last Theorem

I'll try to post some longer advice on learning the proof of FLT later. The medium length version of my advice is the following.

(1) Read the first 2 chapters of Cornell–Silverman–Stevens, and reread them (while also reading Silverman's AEC I and II and asking both technical and conceptual questions to anyone who will answer) until they are crystal clear. (This is the "easy" part.)

Full disclosure: for the "crystal clear" part, I didn't understand the parts about finite flat group schemes and what it means for a representation to be "finite" at ell = p until much later, and didn't attempt to understand Ribet's proof of level lowering until much later.

I think that if you're early in grad school, spending a semester with (1) (and in particular, focusing just as much on Silverman's AEC I) is worthwhile and already enough of a goal.

(1.5) One thing that is more difficult than (1), but easier than (2), is to understand in a lot of detail how to construct a Galois representation out of a modular form. So you need to know about modular curves, Galois representations, Jacobians, the Hecke algebra, the Eichler–Shimura relation, etc., and it's kind of difficult to really "get it" while black boxing too much; understanding moduli really helps, and this is a good way and motivation to understand moduli. Diamond and Shurman is great; like I said in another thread, my preferred route is to start with Darmon's Clay 2006 notes + William Stein's various course notes and fill in the references where needed.

(2) Then for the proof of modularity, systematically read through the rest of Cornell–Silverman–Stevens, and follow up on whatever other references you need to understand everything. I skipped a few parts (mainly the automorphic parts, which were too far afield for me), but really gave the rest of it my all. I would usually divide my time between reading ahead before I understood everything, and rereading earlier stuff more carefully. It is also very useful to find other accounts of the same material and read those too (especially expository material, like Darmon's notes that I mention below). The choice of extra material isn't as important in my opinion; just seeing the same ideas explained by different voices helps a lot, and often different authors have their own particular details that they have a short and enlightening explanation of, or illuminating but personal examples.

When I did this, I was a 4th year grad student, and already knew (1) pretty well. (Darmon gave lectures on FLT at a Clay summer school in 2006, and the notes, available on his webpage, are excellent. I was also writing a paper about generalized Fermat equations, where I really needed to understand (1), but could replace the substantial tools of (2) with much easier techniques.) I had also learned and relearned scheme theory (see my page here), stacks, deformation theory, formal schemes, etc. With this much background, I basically spent 3 hours a day for a semester (2 in the morning right after breakfast, and 1 in the evening reviewing things) until I worked my way through the book. (I usually had some morning reading project like this each semester [e.g., Knutson's "Algebraic Spaces"].)

I had another big advantage + source of motivation: I was in Boston for Bjorn Poonen's sabbatical. Richard Taylor was running a learning seminar where the goal was to read Kisin's paper on potentially semistable deformation rings. I definitely did not have the background for this, but really wanted to take advantage of the situation, so I started (2) systematically. The big advantage was that there was a group of people who knew more than me, meeting in the same room once a week, whom I could ask questions (including Richard Taylor, Kiran Kedlaya, Bjorn, Rob Pollack, Jay Pottharst, David Geraghty, Ana Caraiani, ...). (Also, I took 6 years for my PhD and had two fellowships [so I wasn't teaching, and could spend extra time learning stuff in the mornings].)

Finally, as an example of why you need so much background: the first step of the proof of modularity is that you write down two functors F and G. F is the functor of "modular Galois representations", and G is the functor of "Galois representations with properties X, Y, Z, …", where X, Y, Z, etc. are all of the properties that modular Galois reps satisfy (odd, 2 dimensional, continuous, complex conjugation acts with char poly x² - 1, etc.). Since modular representations have all of these properties, there is a natural map F –> G, which you want to show is an isomorphism. Then you represent these functors. (So you need deformation theory.) By its nature, F you can represent using rings related to the Hecke algebra; let's call the representing ring T. G is more hands on, and you can represent it by some ring R that is constructed with a lot of linear algebra (and fewer modular techniques). So you get a map R –> T, and to prove modularity you need to show that this map of (very differently constructed) rings is an isomorphism. These days folks call these "R = T" theorems. (Also, I swept under the rug that we were working with formal schemes, so F = Spf T rather than Spec T.)

R and T are both local rings, and you hear a lot about the "commutative algebra" part of Wiles's proof. Well, he knew enough about R and T to prove a theorem of the form "if f is a map of local rings with this big list of properties, and if f is an isomorphism mod the maximal ideals + blah, then f is an isomorphism". In even more difficult situations (like Kisin's paper), R and T are much nastier rings (they might have multiple irreducible components, for example). I think in Wiles's case they're irreducible, and not too singular (they're Gorenstein). (Surely over time I've muddled some of these details.)
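
For anyone who likes to see the shape of this in symbols, here is a rough sketch of the setup described above. The notation is my own shorthand and not taken from the book: I'm assuming a fixed residual representation \bar\rho that everything deforms, A ranges over suitable complete local coefficient rings, and Hom means continuous homomorphisms of the appropriate kind. All of the precise conditions are suppressed, so treat it as a cartoon rather than a statement of the theorem.

```latex
% Hypothetical notation: \bar\rho is the fixed residual representation,
% A is a test ring, and X, Y, Z, ... are the imposed properties.
\[
  G(A) = \{\text{deformations of } \bar\rho \text{ to } A
           \text{ with properties } X, Y, Z, \dots\}
       \;\cong\; \operatorname{Hom}(R, A),
\]
\[
  F(A) = \{\text{modular deformations of } \bar\rho \text{ to } A\}
       \;\cong\; \operatorname{Hom}(T, A).
\]
% "Odd" means \det\rho(c) = -1 for complex conjugation c. Since c^2 = 1,
% the eigenvalues of \rho(c) are \pm 1, and the determinant forces one of each:
\[
  \det\bigl(x - \rho(c)\bigr) = (x - 1)(x + 1) = x^2 - 1.
\]
% An "R = T" theorem says the comparison map between R and T is an isomorphism,
% i.e. every deformation with properties X, Y, Z, ... is modular.
```

The point of writing it this way is that R and T are defined by completely different recipes (linear algebra on one side, Hecke operators on the other), and the only thing linking them at the outset is that they represent comparable functors.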