What's new

LJ.Rossia.org makes no claim to the content supplied through this journal account. Articles are retrieved via a public feed supplied by the site for this purpose.

Wednesday, June 4th, 2025
6:09 am
Decomposing a factorial into large factors (second version)

Boris Alexeev, Evan Conway, Matthieu Rosenfeld, Andrew Sutherland, Markus Uhr, Kevin Ventullo, and I have uploaded to the arXiv a second version of our paper “Decomposing a factorial into large factors“. This is a completely rewritten and expanded version of a previous paper of the same name. Thanks to many additional theoretical and numerical contributions from the other coauthors, we now have much more precise control on the main quantity {t(N)} studied in this paper, allowing us to settle all the previous conjectures about this quantity in the literature.

As discussed in the previous post, {t(N)} denotes the largest integer {t} such that the factorial {N!} can be expressed as a product of {N} factors, each of which is at least {t}. Computing {t(N)} is a special case of the bin covering problem, which is known to be NP-hard in general; and prior to our work, {t(N)} was only computed for {N \leq 599}; we have been able to compute {t(N)} for all {N \leq 10000}. In fact, we can get surprisingly sharp upper and lower bounds on {t(N)} for much larger {N}, with a precise asymptotic

\displaystyle \frac{t(N)}{N} = \frac{1}{e} - \frac{c_0}{\log N} - \frac{O(1)}{\log^{1+c} N}

for an explicit constant {c_0 = 0.30441901\dots}, which we conjecture to be improvable to

\displaystyle \frac{t(N)}{N} = \frac{1}{e} - \frac{c_0}{\log N} - \frac{c_1+o(1)}{\log^{2} N}

for an explicit constant {c_1 = 0.75554808\dots}. For instance, we can demonstrate numerically that

\displaystyle 0 \leq t(9 \times 10^8) - 316560601 \leq 113.

As a consequence of this precision, we can verify several conjectures of Guy and Selfridge, namely

  • {t(N) \leq N/e} for all {N \neq 1,2,4}.
  • {t(N) \geq \lfloor 2N/7\rfloor} for all {N \neq 56}.
  • {t(N) \geq N/3} for all {N \geq 3 \times 10^5}. (In fact we show this is true for {N \geq 43632}, and that this threshold is best possible.)

Guy and Selfridge also claimed that one can establish {t(N) \geq N/4} for all large {N} purely by rearranging factors of {2} and {3} from the standard factorization {1 \times 2 \times \dots \times N} of {N!}, but surprisingly we found that this claim (barely) fails for all {N > 26244}.

The accuracy of our bounds comes from several techniques:

  • Greedy algorithms, in which one allocates the largest prime factors of {N!} first and then moves to smaller primes, provide quickly computable, though suboptimal, lower bounds on {t(N)} for small, medium, and moderately large values of {N} (see the sketch at the end of this post);
  • Linear programming and integer programming methods provide extremely accurate upper and lower bounds on {t(N)} for small and medium values of {N};
  • Rearrangement methods can be analyzed asymptotically via linear programming, and work well for large {N}; and
  • The modified approximate factorization strategy, discussed in the previous post, is now sharpened by using {3}-smooth numbers (products of {2} and {3}) as the primary “liquidity pool” to reallocate factors of {N!}, as opposed to the previous approach of only using powers of {2}.

To me, the biggest surprise was just how stunningly accurate the linear programming methods were; the very large number of repeated prime factors here actually makes this discrete problem behave rather like a continuous one.
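To illustrate the greedy strategy in the first item above (this is only a minimal sketch, not one of the algorithms actually used in the paper), one can read off the prime factorization of {N!} from Legendre's formula and hand out the primes, largest first, always to the currently smallest factor; if every factor reaches a target {t}, this certifies the lower bound {t(N) \geq t}:

    import heapq

    def prime_exponents_in_factorial(N):
        """Exponent of each prime p <= N in N!, computed via Legendre's formula."""
        sieve = [True] * (N + 1)
        exps = {}
        for p in range(2, N + 1):
            if sieve[p]:
                for q in range(p * p, N + 1, p):
                    sieve[q] = False
                e, pk = 0, p
                while pk <= N:
                    e += N // pk
                    pk *= p
                exps[p] = e
        return exps

    def greedy_certifies(N, t):
        """Try to split N! into N factors, each at least t, by handing each prime
        (largest primes first) to the currently smallest factor.  A return value
        of True certifies t(N) >= t; False proves nothing, since this greedy
        packing is suboptimal."""
        bins = [1] * N            # the N factors under construction (a min-heap)
        heapq.heapify(bins)
        exps = prime_exponents_in_factorial(N)
        for p in sorted(exps, reverse=True):
            for _ in range(exps[p]):
                smallest = heapq.heappop(bins)
                if smallest >= t:  # every factor already meets the target
                    heapq.heappush(bins, smallest)
                    return True
                heapq.heappush(bins, smallest * p)
        return min(bins) >= t

    def greedy_lower_bound(N):
        """Largest target t certified by the greedy heuristic; a (possibly
        non-optimal) lower bound on t(N), intended only for modest N."""
        t = 1
        while greedy_certifies(N, t + 1):
            t += 1
        return t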

Monday, June 2nd, 2025
3:30 pm
On the number of exceptional intervals to the prime number theorem in short intervals

Ayla Gafni and I have just uploaded to the arXiv the paper “On the number of exceptional intervals to the prime number theorem in short intervals“. This paper makes explicit some relationships between zero density theorems and prime number theorems in short intervals which until now were somewhat implicit in the literature.

Zero density theorems are estimates of the form

\displaystyle N(\sigma,T) \ll T^{A(\sigma)(1-\sigma)+o(1)}

for various {0 \leq \sigma < 1}, where {T} is a parameter going to infinity, {N(\sigma,T)} counts the number of zeroes of the Riemann zeta function of real part at least {\sigma} and imaginary part between {-T} and {T}, and {A(\sigma)} is an exponent which one would like to be as small as possible. The Riemann hypothesis would allow one to take {A(\sigma)=-\infty} for any {\sigma > 1/2}, but this is an unrealistic goal, and in practice one would be happy with some non-trivial upper bounds on {A(\sigma)}. A key target here is the density hypothesis that asserts that {A(\sigma) \leq 2} for all {\sigma} (this is in some sense sharp because the Riemann-von Mangoldt formula implies that {A(1/2)=2}); this hypothesis is currently known for {\sigma \leq 1/2} and {\sigma \geq 25/32}, but the known bounds are not strong enough to establish this hypothesis in the remaining region. However, there was a recent advance of Guth and Maynard, which among other things improved the upper bound {A_0} on {\sup_\sigma A(\sigma)} from {12/5=2.4} to {30/13=2.307\dots}, marking the first improvement in this bound in over four decades. Here is a plot of the best known upper bounds on {A(\sigma)}, either unconditionally, assuming the density hypothesis, or the stronger Lindelöf hypothesis:

One of the reasons we care about zero density theorems is that they allow one to localize the prime number theorem to short intervals. In particular, if we have the uniform bound {A(\sigma) \leq A_0} for all {\sigma}, then this leads to the prime number theorem

\displaystyle  \sum_{x \leq n < x+x^\theta} \Lambda(n) \sim x^\theta

holding for all {x} if {\theta > 1-\frac{1}{A_0}}, and for almost all {x} (possibly excluding a set of density zero) if {\theta > 1 - \frac{2}{A_0}}. For instance, the Guth-Maynard results give a prime number theorem in almost all short intervals for {\theta} as small as {2/15+\varepsilon}, and the density hypothesis would lower this just to {\varepsilon}.

However, one can ask for more information about this exceptional set, in particular to bound its “dimension” {\mu(\theta)}, which roughly speaking amounts to getting an upper bound of {X^{\mu(\theta)+o(1)}} on the size of the exceptional set in any large interval {[X,2X]}. Based on the above assertions, one expects {\mu(\theta)} to be bounded only by {1} for {\theta < 1-2/A_0}, to be bounded by {-\infty} for {\theta > 1-1/A_0}, but to have some intermediate bound for the remaining exponents.

This type of question had been studied in the past, most directly by Bazzanella and Perelli, although there is earlier work on some related quantities (such as the second moment {\sum_{n \leq x} (p_{n+1}-p_n)^2} of prime gaps) by such authors as Selberg and Heath-Brown. In most of these works, the best available zero density estimates at that time were used to obtain specific bounds on quantities such as {\mu(\theta)}, but the numerology was usually tuned to those specific estimates, with the consequence being that when newer zero density estimates were discovered, one could not readily update these bounds to match. In this paper we abstract out the arguments from previous work (largely based on the explicit formula for the primes and the second moment method) to obtain an explicit relationship between {\mu(\theta)} and {A(\sigma)}, namely that

\displaystyle  \mu(\theta) \leq \inf_{\varepsilon>0} \sup_{0 \leq \sigma<1; A(\sigma) \geq \frac{1}{1-\theta}-\varepsilon} \mu_{2,\sigma}(\theta)

where

\displaystyle  \mu_{2,\sigma}(\theta) = (1-\theta)(1-\sigma)A(\sigma)+2\sigma-1.

Actually, by also utilizing fourth moment methods, we obtain a stronger bound

\displaystyle  \mu(\theta) \leq \inf_{\varepsilon>0} \sup_{0 \leq \sigma<1; A(\sigma) \geq \frac{1}{1-\theta}-\varepsilon} \min( \mu_{2,\sigma}(\theta), \mu_{4,\sigma}(\theta) )

where

\displaystyle  \mu_{4,\sigma}(\theta) = (1-\theta)(1-\sigma)A^*(\sigma)+4\sigma-3

and {A^*(\sigma)} is the exponent in “additive energy zero density theorems”

\displaystyle N^*(\sigma,T) \ll T^{A^*(\sigma)(1-\sigma)+o(1)}

where {N^*(\sigma,T)} is similar to {N(\sigma,T)}, but bounds the “additive energy” of zeroes rather than just their cardinality. Such bounds have appeared in the literature since the work of Heath-Brown, and are for instance a key ingredient in the recent work of Guth and Maynard. Here are the current best known bounds:

These explicit relationships between exponents are perfectly suited for the recently launched Analytic Number Theory Exponent Database (ANTEDB) (discussed previously here), and have been uploaded to that site.

This formula is moderately complicated (basically an elaborate variant of a Legendre transform), but easy to calculate numerically with a computer program. Here is the resulting bound on {\mu(\theta)} unconditionally and under the density hypothesis (together with a previous bound of Bazzanella and Perelli for comparison, where the range had to be restricted due to a gap in the argument we discovered while trying to reproduce their results):
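As a minimal illustration of such a calculation (a sketch only: it implements just the second-moment bound displayed above, and the zero density exponent A_toy is a crude placeholder rather than the exponents actually used in the paper), one can evaluate the supremum on a grid in {\sigma}:

    import numpy as np

    def A_toy(sigma):
        """Placeholder zero density exponent: the density-hypothesis value 2,
        combined with the trivial bound A(sigma) <= 1/(1-sigma).  This is NOT
        the table of exponents used in the paper."""
        return min(2.0, 1.0 / (1.0 - sigma))

    def mu_upper_bound(theta, A=A_toy, eps=1e-9, gridsize=100001):
        """Evaluate the second-moment bound
           mu(theta) <= sup { (1-theta)*(1-sigma)*A(sigma) + 2*sigma - 1 :
                              0 <= sigma < 1, A(sigma) >= 1/(1-theta) - eps }
        on a grid in sigma; an empty supremum is reported as -infinity."""
        sigmas = np.linspace(0.0, 1.0, gridsize, endpoint=False)
        vals = [(1 - theta) * (1 - s) * A(s) + 2 * s - 1
                for s in sigmas
                if A(s) >= 1.0 / (1.0 - theta) - eps]
        return max(vals) if vals else float("-inf")

    # With this crude placeholder the bound degenerates to the trivial value
    # (about 1) for theta < 1/2 and to -infinity for theta > 1/2; sharper zero
    # density inputs produce the nontrivial curves described in this post.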

For comparison, here is the situation assuming strong conjectures such as the density hypothesis, Lindelöf hypothesis, or Riemann hypothesis:

Saturday, May 31st, 2025
4:49 pm
A Lean companion to “Analysis I”

Almost 20 years ago, I wrote a textbook in real analysis called “Analysis I“. It was intended to complement the many good available analysis textbooks out there by focusing more on foundational issues, such as the construction of the natural numbers, integers, rational numbers, and reals, as well as providing enough set theory and logic to allow students to develop proofs at high levels of rigor.

While some proof assistants such as Coq or Agda were well established when the book was written, formal verification was not on my radar at the time. However, now that I have had some experience with this subject, I realize that the content of this book is in fact very compatible with such proof assistants; in particular, the ‘naive type theory’ that I was implicitly using to do things like construct the standard number systems dovetails well with the dependent type theory of Lean (which, among other things, has excellent support for quotient types).

I have therefore decided to launch a Lean companion to “Analysis I”, which is a “translation” of many of the definitions, theorems, and exercises of the text into Lean. In particular, this gives an alternate way to perform the exercises in the book, by instead filling in the corresponding “sorries” in the Lean code. (I do not however plan on hosting “official” solutions to the exercises in this companion; instead, feel free to create forks of the repository in which these sorries are filled in.)

Currently, the following sections of the text have been translated into Lean:

The formalization has been deliberately designed to be separate from the standard Lean math library Mathlib at some places, but reliant on it at others. For instance, Mathlib already has a standard notion of the natural numbers {{\bf N}}. In the Lean formalization, I first develop “by hand” an alternate construction Chapter2.Nat of the natural numbers (or just Nat, if one is working in the Chapter2 namespace), setting up many of the basic results about these alternate natural numbers which parallel similar lemmas about {{\bf N}} that are already in Mathlib (but with many of these lemmas set as exercises to the reader, with the proofs currently replaced with “sorries”). Then, in an epilogue section, isomorphisms between these alternate natural numbers and the Mathlib natural numbers are established (or more precisely, set as exercises). From that point on, the Chapter 2 natural numbers are deprecated, and the Mathlib natural numbers are used instead. I intend to continue this general pattern throughout the book, so that as one advances into later chapters, one increasingly relies on Mathlib’s definitions and functions, rather than directly referring to any counterparts from earlier chapters. As such, this companion could also be used as an introduction to Lean and Mathlib as well as to real analysis (somewhat in the spirit of the “Natural number game“, which in fact has significant thematic overlap with Chapter 2 of my text).
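To give a rough sense of this pattern (the snippet below is an illustrative sketch in the spirit of the companion, not code taken from the repository itself), a Chapter 2 style development might look as follows in Lean, with some results holding definitionally and others left as “sorries”:

    namespace Chapter2

    /-- An illustrative stand-in for the companion's hand-built natural numbers. -/
    inductive Nat where
      | zero : Nat
      | succ (n : Nat) : Nat

    /-- Recursive addition on the hand-built naturals. -/
    def Nat.add : Nat → Nat → Nat
      | n, .zero => n
      | n, .succ m => .succ (Nat.add n m)

    /-- This is a defining equation, so it holds definitionally. -/
    theorem Nat.add_zero (n : Nat) : Nat.add n .zero = n := rfl

    /-- A typical "exercise": the proof is left as a sorry for the reader. -/
    theorem Nat.zero_add (n : Nat) : Nat.add .zero n = n := by
      sorry

    end Chapter2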

The code in this repository compiles in Lean, but I have not tested whether all of the (numerous) “sorries” in the code can actually be filled (i.e., if all the exercises can actually be solved in Lean). I would be interested in having volunteers “playtest” the companion to see if this can actually be done (and if the helper lemmas or “API” provided in the Lean files are sufficient to fill in the sorries in a conceptually straightforward manner without having to rely on more esoteric Lean programming techniques). Any other feedback will of course also be welcome.

[UPDATE, May 31: moved the companion to a standalone repository.]

Tuesday, May 13th, 2025
6:36 am
Some variants of the periodic tiling conjecture

Rachel Greenfeld and I have just uploaded to the arXiv our paper Some variants of the periodic tiling conjecture. This paper explores variants of the periodic tiling phenomenon, namely that, in some cases, a tile that can translationally tile a group must also be able to translationally tile the group periodically. For instance, for a given discrete abelian group {G}, consider the following question:

Question 1 (Periodic tiling question) Let {F} be a finite subset of {G}. If there is a solution {1_A} to the tiling equation {1_F * 1_A = 1}, must there exist a periodic solution {1_{A_p}} to the same equation {1_F * 1_{A_p} = 1}?
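For instance, with {G = {\bf Z}} and {F = \{0,1\}}, the set {A = 2{\bf Z}} solves the tiling equation {1_F * 1_A = 1}, since every integer lies in exactly one translate {F+a} with {a \in A}; this particular solution is already periodic.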

We know that the answer to this question is positive for finite groups {G} (trivially, since all sets are periodic in this case), for one-dimensional groups {{\bf Z} \times H} with {H} finite, and for {{\bf Z}^2}, but it can fail for {{\bf Z}^2 \times H} for certain finite {H}, and also for {{\bf Z}^d} for sufficiently large {d}; see this previous blog post for more discussion. But now one can consider other variants of this question:

  • Instead of considering level one tilings {1_F * 1_A = 1}, one can consider level {k} tilings {1_F * 1_A = k} for a given natural number {k} (so that every point in {G} is covered by exactly {k} translates of {F}), or more generally {1_F * 1_A = g} for some periodic function {g}.
  • Instead of requiring {1_F} and {1_A} to be indicator functions, one can allow these functions to be integer-valued, thus we are now studying convolution equations {f*a=g} where {f, g} are given integer-valued functions (with {g} periodic and {f} finitely supported).

We are able to obtain positive answers to three such variants of the periodic tiling conjecture. The first result (which was kindly shared with us by Tim Austin) concerns the homogeneous problem {f*a = 0}. Here the results are very satisfactory:

Theorem 2 (First periodic tiling result) Let {G} be a discrete abelian group, and let {f} be integer-valued and finitely supported. Then the following are equivalent.
  • (i) There exists an integer-valued solution {a} to {f*a=0} that is not identically zero.
  • (ii) There exists a periodic integer-valued solution {a_p} to {f * a_p = 0} that is not identically zero.
  • (iii) There is a vanishing Fourier coefficient {\hat f(\xi)=0} for some non-trivial character {\xi \in \hat G} of finite order.

By combining this result with an old result of Henry Mann about sums of roots of unity, as well as an even older decidability result of Wanda Szmielew, we obtain

Corollary 3 Any of the statements (i), (ii), (iii) is algorithmically decidable; there is an algorithm that, when given {G} and {f} as input, determines in finite time whether any of these assertions hold.

Now we turn to the inhomogeneous problem in {{\bf Z}^2}, which is the first difficult case (periodic tiling type results are easy to establish in one dimension, and trivial in zero dimensions). Here we have two results:

Theorem 4 (Second periodic tiling result) Let {G={\bf Z}^2}, let {g} be periodic, and let {f} be integer-valued and finitely supported. Then the following are equivalent.
  • (i) There exists an integer-valued solution {a} to {f*a=g}.
  • (ii) There exists a periodic integer-valued solution {a_p} to {f * a_p = g}.

Theorem 5 (Third periodic tiling result) Let {G={\bf Z}^2}, let {g} be periodic, and let {f} be integer-valued and finitely supported. Then the following are equivalent.
  • (i) There exists an indicator function solution {1_A} to {f*1_A=g}.
  • (ii) There exists a periodic indicator function solution {1_{A_p}} to {f * 1_{A_p} = g}.

In particular, the previously established case of the periodic tiling conjecture for level one tilings of {{\bf Z}^2} is now extended to higher levels. By an old argument of Hao Wang, we now know that the statements mentioned in Theorem 5 are also algorithmically decidable, although it remains open whether the same is the case for Theorem 4. We know from past results that Theorem 5 cannot hold in sufficiently high dimension (even in the classic case {g=1}), but it also remains open whether Theorem 4 fails in that setting.

Following past literature, we rely heavily on a structure theorem for solutions {a} to tiling equations {f*a=g}, which roughly speaking asserts that such solutions {a} must be expressible as a finite sum of functions {\varphi_w} that are one-periodic (periodic in a single direction). This already explains why tiling is easy to understand in one dimension, and why the two-dimensional case is more tractable than the case of general dimension. This structure theorem can be obtained by averaging a dilation lemma, which is a somewhat surprising symmetry of tiling equations that basically arises from finite characteristic arguments (viewing the tiling equation modulo {p} for various large primes {p}).

For Theorem 2, one can take advantage of the fact that the homogeneous equation {f*a=0} is preserved under finite difference operators {\partial_h a(x) := a(x+h)-a(x)}: if {a} solves {f*a=0}, then {\partial_h a} also solves the same equation {f * \partial_h a = 0}. This freedom to take finite differences allows one to selectively eliminate certain one-periodic components {\varphi_w} of a solution {a} to the homogeneous equation {f*a=0} until the solution is a pure one-periodic function, at which point one can appeal to an induction on dimension to equate parts (i) and (ii) of the theorem. To link up with part (iii), we also take advantage of the existence of retraction homomorphisms from {{\bf C}} to {{\bf Q}} to convert a vanishing Fourier coefficient {\hat f(\xi)= 0} into an integer solution to {f*a=0}.

The inhomogeneous results are more difficult, and rely on arguments that are specific to two dimensions. For Theorem 4, one can also perform finite differences to analyze various components {\varphi_w} of a solution {a} to a tiling equation {f*a=g}, but the conclusion now is that these components are determined (modulo {1}) by polynomials of one variable. Applying a retraction homomorphism, one can make the coefficients of these polynomials rational, which makes the polynomials periodic. This turns out to reduce the original tiling equation {f*a=g} to a system of essentially local combinatorial equations, which allows one to “periodize” a non-periodic solution by periodically repeating a suitable block of the (retraction homomorphism applied to the) original solution.

Theorem 5 is significantly more difficult to establish than the other two results, because of the need to maintain the solution in the form of an indicator function. There are now two separate sources of aperiodicity to grapple with. One is the fact that the polynomials involved in the components {\varphi_w} may have irrational coefficients (see Theorem 1.3 of our previous paper for an explicit example of this for a level 4 tiling). The other is that in addition to the polynomials (which influence the fractional parts of the components {\varphi_w}), there is also “combinatorial” data (roughly speaking, associated to the integer parts of {\varphi_w}) which also interact with each other in a slightly non-local way. Once one can make the polynomial coefficients rational, there is enough periodicity that the periodization approach used for the second theorem can be applied to the third theorem; the main remaining challenge is to find a way to make the polynomial coefficients rational, while still maintaining the indicator function property of the solution {a}.

It turns out that the retraction homomorphism approach is no longer available here (it makes the components {\varphi_w} unbounded, which makes the combinatorial problem too difficult to solve). Instead, one has to first perform a second moment analysis to discern more structure about the polynomials involved. It turns out that the components {\varphi_w} of an indicator function {1_A} can only utilize linear polynomials (as opposed to polynomials of higher degree), and that one can partition {{\bf Z}^2} into a finite number of cosets on which only three of these linear polynomials are “active” on any given coset. The irrational coefficients of these linear polynomials then have to obey a rather complicated, but (locally) finite, sentence in the theory of first-order linear inequalities over the rationals, in order to form an indicator function {1_A}. One can then use the Weyl equidistribution theorem to replace these irrational coefficients with rational coefficients that obey the same constraints (although one first has to ensure that one does not accidentally fall into the boundary of the constraint set, where things are discontinuous). Then one can apply periodization to the remaining combinatorial data to conclude.

A key technical problem arises from the discontinuities of the fractional part operator {\{x\}} at integers, so a certain amount of technical manipulation (in particular, passing at one point to a weak limit of the original tiling) is needed to avoid ever having to encounter this discontinuity.

Saturday, May 10th, 2025
12:41 am
A tool to verify estimates, II: a flexible proof assistant

In a recent post, I talked about a proof of concept tool to verify estimates automatically. Since that post, I have overhauled the tool twice: first to turn it into a rudimentary proof assistant that could also handle some propositional logic; and second into a much more flexible proof assistant (deliberately designed to mimic the Lean proof assistant in several key aspects) that is also powered by the extensive Python package sympy for symbolic algebra, following the feedback from previous commenters. This I think is now a stable framework with which one can extend the tool much further; my initial aim was just to automate (or semi-automate) the proving of asymptotic estimates involving scalar functions, but in principle one could keep adding tactics, new sympy types, and lemmas to the tool to handle a very broad range of other mathematical tasks as well.

The current version of the proof assistant can be found here. (As with my previous coding, I ended up relying heavily on large language model assistance to understand some of the finer points of Python and sympy, with the autocomplete feature of Github Copilot being particularly useful.) While the tool can support fully automated proofs, I have decided to focus more for now on semi-automated interactive proofs, where the human user supplies high-level “tactics” that the proof assistant then performs the necessary calculations for, until the proof is completed.

It’s easiest to explain how the proof assistant works with examples. Right now I have implemented the assistant to work inside the interactive mode of Python, in which one enters Python commands one at a time. (Readers from my generation may be familiar with text adventure games, which have a broadly similar interface.) I would be interested in developing at some point a graphical user interface for the tool, but for prototype purposes, the Python interactive version suffices. (One can also run the proof assistant within a Python script, of course.)

After downloading the relevant files, one can launch the proof assistant inside Python by typing from main import * and then loading one of the pre-made exercises. Here is one such exercise:

>>> from main import *
>>> p = linarith_exercise()
Starting proof.  Current proof state:
x: pos_real
y: pos_real
z: pos_real
h1: x < 2*y
h2: y < 3*z + 1
|- x < 7*z + 2

This is the proof assistant’s formalization of the following problem: If x,y,z are positive reals such that x < 2y and y < 3z+1, prove that x < 7z+2.

The way the proof assistant works is that one directs the assistant to use various “tactics” to simplify the problem until it is solved. In this case, the problem can be solved by linear arithmetic, as formalized by the Linarith() tactic:

>>> p.use(Linarith())
Goal solved by linear arithmetic!
Proof complete!

If instead one wanted a bit more detail on how the linear arithmetic worked, one could have run this tactic instead with a verbose flag:

>>> p.use(Linarith(verbose=True))
Checking feasibility of the following inequalities:
1*z > 0
1*x + -7*z >= 2
1*y + -3*z < 1
1*y > 0
1*x > 0
1*x + -2*y < 0
Infeasible by summing the following:
1*z > 0 multiplied by 1/4
1*x + -7*z >= 2 multiplied by 1/4
1*y + -3*z < 1 multiplied by -1/2
1*x + -2*y < 0 multiplied by -1/4
Goal solved by linear arithmetic!
Proof complete!
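To spell out this certificate: scaling each inequality by the indicated constant (which reverses the two inequalities scaled by negative numbers) and summing, the left-hand sides combine to {\frac{1}{4} z + \frac{1}{4}(x - 7z) - \frac{1}{2}(y - 3z) - \frac{1}{4}(x - 2y) = 0}, while the right-hand sides combine to {\frac{1}{4} \cdot 0 + \frac{1}{4} \cdot 2 - \frac{1}{2} \cdot 1 - \frac{1}{4} \cdot 0 = 0}; since at least one of the scaled inequalities is strict, this yields the impossible conclusion {0 > 0}.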

Sometimes, the proof involves case splitting, and then the final proof has the structure of a tree. Here is one example, where the task is to show that the hypotheses (x>-1) \wedge (x<1) and (y>-2) \wedge (y<2) imply (x+y>-3) \wedge (x+y<3):

>>> from main import *
>>> p = split_exercise()
Starting proof.  Current proof state:
x: real
y: real
h1: (x > -1) & (x < 1)
h2: (y > -2) & (y < 2)
|- (x + y > -3) & (x + y < 3)
>>> p.use(SplitHyp("h1"))
Decomposing h1: (x > -1) & (x < 1) into components x > -1, x < 1.
1 goal remaining.
>>> p.use(SplitHyp("h2"))
Decomposing h2: (y > -2) & (y < 2) into components y > -2, y < 2.
1 goal remaining.
>>> p.use(SplitGoal())
Split into conjunctions: x + y > -3, x + y < 3
2 goals remaining.
>>> p.use(Linarith())
Goal solved by linear arithmetic!
1 goal remaining.
>>> p.use(Linarith())
Goal solved by linear arithmetic!
Proof complete!
>>> print(p.proof())
example (x: real) (y: real) (h1: (x > -1) & (x < 1)) (h2: (y > -2) & (y < 2)): (x + y > -3) & (x + y < 3) := by
  split_hyp h1
  split_hyp h2
  split_goal
  . linarith
  linarith

Here at the end we gave a “pseudo-Lean” description of the proof in terms of the tactics used: two applications of split_hyp to decompose the hypotheses h1 and h2 into their components, a split_goal to split the conclusion into two separate goals, and two applications of linarith to dispatch each of those goals.

The tool supports asymptotic estimation. I found a way to implement the order of magnitude formalism from the previous post within sympy. It turns out that sympy, in some sense, already natively implements nonstandard analysis: its symbolic variables have an is_number flag which basically corresponds to the concept of a “standard” number in nonstandard analysis. For instance, the sympy version S(3) of the number 3 has S(3).is_number == True and so is standard, whereas an integer variable n = Symbol("n", integer=True) has n.is_number == False and so is nonstandard. Within sympy, I was able to construct orders of magnitude Theta(X) of various (positive) expressions X, with the property that Theta(n)=Theta(1) if n is a standard number, and use this concept to define asymptotic estimates such as X \lesssim Y (implemented as lesssim(X,Y)). One can then apply a logarithmic form of linear arithmetic to automatically verify some asymptotic estimates. Here is a simple example, in which one is given a positive integer N and positive reals x,y such that x \leq 2N^2 and y < 3N, and the task is to conclude that xy \lesssim N^4:

>>> p = loglinarith_exercise()
Starting proof.  Current proof state:
N: pos_int
x: pos_real
y: pos_real
h1: x <= 2*N**2
h2: y < 3*N
|- Theta(x)*Theta(y) <= Theta(N)**4
>>> p.use(LogLinarith(verbose=True))
Checking feasibility of the following inequalities:
Theta(N)**1 >= Theta(1)
Theta(x)**1 * Theta(N)**-2 <= Theta(1)
Theta(y)**1 * Theta(N)**-1 <= Theta(1)
Theta(x)**1 * Theta(y)**1 * Theta(N)**-4 > Theta(1)
Infeasible by multiplying the following:
Theta(N)**1 >= Theta(1) raised to power 1
Theta(x)**1 * Theta(N)**-2 <= Theta(1) raised to power -1
Theta(y)**1 * Theta(N)**-1 <= Theta(1) raised to power -1
Theta(x)**1 * Theta(y)**1 * Theta(N)**-4 > Theta(1) raised to power 1
Proof complete!
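As a standalone illustration of the underlying mechanism (this is plain sympy, not the proof assistant’s actual API), one can already see the is_number distinction and the reduction of an asymptotic comparison to arithmetic on exponents after taking logarithms:

    from sympy import S, Symbol, log, expand_log

    n = Symbol("n", positive=True, integer=True)
    print(S(3).is_number)   # True:  a standard (bounded) quantity
    print(n.is_number)      # False: an asymptotic parameter

    # Compare X = 2*n**2 against Y = n**3 by expanding log X - log Y and reading
    # off the coefficient of log(n); a negative coefficient means X <~ Y.
    X, Y = 2 * n**2, n**3
    diff = expand_log(log(X) - log(Y))   # equivalent to log(2) - log(n)
    print(diff.coeff(log(n)))            # -1, so X grows strictly slower than Y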

The logarithmic linear programming solver can also handle lower order terms, by a rather brute force branching method:

>>> p = loglinarith_hard_exercise()
Starting proof.  Current proof state:
N: pos_int
x: pos_real
y: pos_real
h1: x <= 2*N**2 + 1
h2: y < 3*N + 4
|- Theta(x)*Theta(y) <= Theta(N)**3
>>> p.use(LogLinarith())
Goal solved by log-linear arithmetic!
Proof complete!

I plan to start developing tools for estimating function space norms of symbolic functions, for instance creating tactics to deploy lemmas such as Hölder’s inequality and the Sobolev embedding inequality. It looks like the sympy framework is flexible enough to allow for creating further object classes for these sorts of objects. (Right now, I only have one proof-of-concept lemma to illustrate the framework, the arithmetic mean-geometric mean lemma.)

I am satisfied enough with the basic framework of this proof assistant that I would be open to further suggestions or contributions of new features, for instance by introducing new data types, lemmas, and tactics, or by contributing example problems that ought to be easily solvable by such an assistant, but are currently beyond its ability, for instance due to the lack of appropriate tactics and lemmas.

Sunday, May 4th, 2025
3:26 pm
Orders of infinity

Many problems in analysis (as well as adjacent fields such as combinatorics, theoretical computer science, and PDE) are interested in the order of growth (or decay) of some quantity {X} that depends on one or more asymptotic parameters (such as {N}) – for instance, whether the quantity {X} grows or decays linearly, quadratically, polynomially, exponentially, etc. in {N}. In the case where these quantities grow to infinity, these growth rates had once been termed “orders of infinity” – for instance, in the 1910 book of this name by Hardy – although this term has fallen out of use in recent years. (Hardy fields are still a thing, though.)

In modern analysis, asymptotic notation is the preferred device to organize orders of infinity. There are a couple of flavors of this notation, but here is one such (a blend of Hardy’s notation and Landau’s notation). Formally, we need a parameter space {\Omega} equipped with a non-principal filter {{\mathcal F}} that describes the subsets of parameter space that are “sufficiently large” (e.g., the cofinite (Fréchet) filter on {{\bf N}}, or the cocompact filter on {{\bf R}}). We will use {N} to denote elements of this parameter space; thus, an assertion holds for sufficiently large {N} if and only if it holds for all {N} in some element {U} of the filter {{\mathcal F}}. Given two positive quantities {X = X(N), Y = Y(N)} that are defined for sufficiently large {N \in \Omega}, one can then define the following notions:

  • (i) We write {X = O(Y)}, {X \lesssim Y}, or {Y \gtrsim X} (and sometimes {Y = \Omega(X)}) if there exists a constant {C>0} such that {X(N) \leq CY(N)} for all sufficiently large {N}.
  • (ii) We write {X = o(Y)}, {X \ll Y}, or {Y \gg X} (and sometimes {Y = \omega(X)}) if, for every {\varepsilon>0}, one has {X(N) \leq \varepsilon Y(N)} for all sufficiently large {N}.
  • (iii) We write {X \asymp Y} (and sometimes {X \sim Y} or {Y = \Theta(X)}) if {X \lesssim Y \lesssim X}, or equivalently if there exist constants {C, c > 0} such that {c Y(N) \leq X(N) \leq C Y(N)} for all sufficiently large {N}.
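For instance, taking {\Omega = {\bf N}} with the cofinite filter, one has {N^2 + 100 N = O(N^2)}, {N = o(N^2)}, and {N^2 + 100 N \asymp N^2}.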

We caution that in analytic number theory and adjacent fields, the slightly different notation of Vinogradov is favored, in which {X \ll Y} would denote the concept (i) instead of (ii), and {X \sim Y} would denote a fourth concept {X = (1+o(1)) Y} instead of (iii). However, we will use the Hardy-Landau notation exclusively in this blog post.

Anyone who works with asymptotic notation for a while will quickly recognize that it enjoys various algebraic properties akin to the familiar algebraic properties of order {<} on the real line. For instance, the symbols {\lesssim, \ll, \gg, \gtrsim} behave very much like {\leq}, {<}, {>}, {\geq}, with properties such as the following:

  • If {X \lesssim Y} and {Y \lesssim Z}, then {X \lesssim Z}.
  • {X \lesssim Y} if and only if {XZ \lesssim YZ}.
  • If {N} is restricted to a sequence, then (after passing to a subsequence if necessary), exactly one of {X \ll Y}, {X \asymp Y}, and {X \gg Y} is true.
One also has the “tropical” property {X+Y \asymp \max(X,Y)}, making asymptotic notation a kind of “tropical algebra“.

However, in contrast with other standard algebraic structures (such as ordered fields) that blend order and arithmetic operations, the precise laws of orders of infinity are usually not written down as a short list of axioms. Part of this is due to cultural differences between analysis and algebra – as discussed in this essay by Gowers, analysis is often not well suited to the axiomatic approach to mathematics that algebra benefits so much from. But another reason is due to our orthodox implementation of analysis via “epsilon-delta” type concepts, such as the notion of “sufficiently large” used above, which notoriously introduces a large number of both universal and existential quantifiers into the subject (for every epsilon, there exists a delta…) which tends to interfere with the smooth application of algebraic laws (which are optimized for the universal quantifier rather than the existential quantifier).

But there is an alternate approach to analysis, namely nonstandard analysis, which rearranges the foundations so that many of the quantifiers (particularly the existential ones) are concealed from view (usually via the device of ultrafilters). This makes the subject of analysis considerably more “algebraic” in nature, as the “epsilon management” that is so prevalent in orthodox analysis is now performed much more invisibly. For instance, as we shall see, in the nonstandard framework, orders of infinity acquire the algebraic structure of a totally ordered vector space that also enjoys a completeness property reminiscent, though not identical to, the completeness of the real numbers. There is also a transfer principle that allows one to convert assertions in orthodox asymptotic notation into logically equivalent assertions about nonstandard orders of infinity, allowing one to then prove asymptotic statements in a purely algebraic fashion. There is a price to pay for this “algebrization” of analysis; the spaces one works with become quite large (in particular, they tend to be “inseparable” and not “countably generated” in any reasonable fashion), and it becomes difficult to extract explicit constants {C} (or explicit decay rates) from the asymptotic notation. However, there are some cases in which the tradeoff is worthwhile. For instance, symbolic computations tend to be easier to perform in algebraic settings than in orthodox analytic settings, so formal computations of orders of infinity (such as the ones discussed in the previous blog post) could benefit from the nonstandard approach. (See also my previous posts on nonstandard analysis for more discussion about these tradeoffs.)

Let us now describe the nonstandard approach to asymptotic notation. With the above formalism, the switch from standard to nonstandard analysis is actually quite simple: one assumes that the asymptotic filter {{\mathcal F}} is in fact an ultrafilter. In terms of the concept of “sufficiently large”, this means adding the following useful axiom:

  • Given any predicate {P(N)}, exactly one of the two statements “{P(N)} holds for sufficiently large {N}” and “{P(N)} does not hold for sufficiently large {N}” is true.

This can be compared with the situation with, say, the Fréchet filter on the natural numbers {{\bf N}}, in which one has to insert some qualifier such as “after passing to a subsequence if necessary” in order to make the above axiom true.

The existence of an ultrafilter requires some weak version of the axiom of choice (specifically, the ultrafilter lemma), but for this post we shall just take the existence of ultrafilters for granted.

We can now define the nonstandard orders of infinity {{\mathcal O}} to be the space of all non-negative functions {X(N)} defined for sufficiently large {N \in \Omega}, modulo the equivalence relation {\asymp} defined previously. That is to say, a nonstandard order of infinity is an equivalence class

\displaystyle  \Theta(X) := \{ Y : U \rightarrow {\bf R}^+: X \asymp Y \}

of functions {Y: U \rightarrow {\bf R}^+} defined on elements {U} of the ultrafilter. For instance, if {\Omega} is the natural numbers, then {\Theta(N)} itself is an order of infinity, as is {\Theta(N^2)}, {\Theta(e^N)}, {\Theta(1/N)}, {\Theta(N \log N)}, and so forth. But we exclude {0}; it will be important for us that the order of infinity is strictly positive for all sufficiently large {N}.

We can place various familiar algebraic operations on {{\mathcal O}}:

  • Addition. We define {\Theta(X)+\Theta(Y) :=\Theta(X+Y)}. It is easy to see that this is well-defined, by verifying that if {X \asymp X'} and {Y \asymp Y'}, then {X+Y \asymp X'+Y'}. However, because of our positivity requirement, we do not define subtraction on {{\mathcal O}}.
  • Multiplication and division. We define {\Theta(X) \times \Theta(Y) := \Theta(XY)} and {\Theta(X)/\Theta(Y) = \Theta(X/Y)}; we do not need to worry about division by zero, thanks to the positivity requirement. Again it is easy to verify this is well-defined.
  • Scalar exponentiation. If {\lambda} is a real number, we define {\Theta(X)^\lambda := \Theta(X^\lambda)}. Again, this is easily checked to be well-defined.
  • Order. We define {\Theta(X) \leq \Theta(Y)} if {X \lesssim Y}, and {\Theta(X) < \Theta(Y)} if {X \ll Y}. Again, this is easily checked to be well-defined. (And the ultrafilter axiom ensures that {\Theta(X) \leq \Theta(Y)} holds iff exactly one of {\Theta(X)=\Theta(Y)} and {\Theta(X) < \Theta(Y)} holds.)

With these operations, combined with the ultrafilter axiom, we see that {{\mathcal O}} obeys the laws of many standard algebraic structures, the proofs of which we leave as exercises for the reader:

  • {({\mathcal O}, \leq)} is a totally ordered set.
  • In fact, {({\mathcal O}, \leq, \Theta(1), \times, (\cdot)^{\cdot})} is a totally ordered vector space, with {\Theta(1)} playing the role of the zero vector, multiplication {(\Theta(X), \Theta(Y)) \mapsto \Theta(X) \times \Theta(Y)} playing the role of vector addition, and scalar exponentiation {(\lambda, \Theta(X)) \mapsto \Theta(X)^\lambda} playing the role of scalar multiplication. (Of course, division would then play the role of vector subtraction.) To avoid confusion, one might refer to {{\mathcal O}} as a log-vector space rather than a vector space to emphasize the fact that the vector structure is coming from multiplication (and exponentiation) rather than addition (and multiplication). Ordered log-vector spaces may seem like a strange and exotic concept, but they are actually already studied implicitly in analysis, albeit under the guise of other names such as log-convexity.
  • {({\mathcal O}, +, \times, 1)} is a semiring (albeit one without an additive identity element), which is idempotent: {\Theta(X) + \Theta(X) = \Theta(X)} for all {\Theta(X) \in {\mathcal O}}.
  • More generally, addition can be described in purely order-theoretic terms: {\Theta(X) + \Theta(Y) = \max(\Theta(X), \Theta(Y))} for all {\Theta(X), \Theta(Y) \in {\mathcal O}}. (It would therefore be natural to call {{\mathcal O}} a tropical semiring, although the precise axiomatization of this term does not appear to be fully standardized currently.)
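As a toy illustration of these laws (a sketch covering only the monomial orders {\Theta(N^\alpha)}, not the full space {{\mathcal O}}), one can model such an order by its exponent and verify the tropical and log-vector space identities directly:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class PolyOrder:
        alpha: float                       # represents Theta(N**alpha)

        def __mul__(self, other):          # Theta(X) * Theta(Y) = Theta(X*Y)
            return PolyOrder(self.alpha + other.alpha)

        def __truediv__(self, other):      # Theta(X) / Theta(Y) = Theta(X/Y)
            return PolyOrder(self.alpha - other.alpha)

        def __pow__(self, lam):            # scalar exponentiation Theta(X)**lam
            return PolyOrder(self.alpha * lam)

        def __add__(self, other):          # tropical: Theta(X)+Theta(Y) is the max
            return PolyOrder(max(self.alpha, other.alpha))

        def __le__(self, other):           # Theta(X) <= Theta(Y) iff X <~ Y
            return self.alpha <= other.alpha

    N, one = PolyOrder(1.0), PolyOrder(0.0)   # Theta(N) and Theta(1)
    assert N * N + N == N ** 2                # idempotent, tropical addition
    assert (N ** 2) / N <= N and one <= N     # division plays the role of subtraction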

The ordered (log-)vector space structure of {{\mathcal O}} in particular opens up the ability to prove asymptotic implications by (log-)linear programming; this was implicitly used in my previous post. One can also use the language of (log-)linear algebra to describe further properties of various orders of infinity. For instance, if {\Omega} is the natural numbers, we can form the subspace

\displaystyle N^{O(1)} := \bigcup_{k \geq 0} [\Theta(N)^{-k}, \Theta(N)^k]

of {{\mathcal O}} consisting of those orders of infinity {\Theta(X)} which are of polynomial type in the sense that {N^{-C} \lesssim X \lesssim N^C} for some {C>0}; this is then a (log)-vector subspace of {{\mathcal O}}, and has a canonical (log-)linear surjection {\mathrm{ord}: N^{O(1)} \rightarrow {\bf R}} that assigns to each order of infinity {\Theta(X)} of polynomial type the unique real number {\alpha} such that {X(N) = N^{\alpha+o(1)}}, that is to say for all {\varepsilon>0} one has {N^{\alpha-\varepsilon} \lesssim X(N) \lesssim N^{\alpha+\varepsilon}} for all sufficiently large {N}. (The existence of such an
Friday, May 2nd, 2025
4:01 am
A proof of concept tool to verify estimates

This post was inspired by some recent discussions with Bjoern Bringmann.

Symbolic math software packages are highly developed for many mathematical tasks in areas such as algebra, calculus, and numerical analysis. However, to my knowledge we do not have similarly sophisticated tools for verifying asymptotic estimates – inequalities that are supposed to hold for arbitrarily large parameters, with constant losses. Particularly important are functional estimates, where the parameters involve an unknown function or sequence (living in some suitable function space, such as an {L^p} space); but for this discussion I will focus on the simpler situation of asymptotic estimates involving a finite number of positive real numbers, combined using arithmetic operations such as addition, multiplication, division, exponentiation, and minimum and maximum (but no subtraction). A typical inequality here might be the weak arithmetic mean-geometric mean inequality

\displaystyle  (abc)^{1/3} \lesssim a+b+c \ \ \ \ \ (1)


where {a,b,c} are arbitrary positive real numbers, and the {\lesssim} here indicates that we are willing to lose an unspecified (multiplicative) constant in the estimates.

I have wished in the past (e.g., in this MathOverflow answer) for a tool that could automatically determine whether such an estimate was true or not (and provide a proof if true, or an asymptotic counterexample if false). In principle, simple inequalities of this form could be automatically resolved by brute force case splitting. For instance, with (1), one first observes that {a+b+c} is comparable to {\max(a,b,c)} up to constants, so it suffices to determine if

\displaystyle  (abc)^{1/3} \lesssim \max(a,b,c). \ \ \ \ \ (2)

Next, to resolve the maximum, one can divide into three cases: {a \gtrsim b,c}; {b \gtrsim a,c}; and {c \gtrsim a,b}. Suppose for instance that {a \gtrsim b,c}. Then the estimate to prove simplifies to

\displaystyle (abc)^{1/3} \lesssim a,

and this is (after taking logarithms) a positive linear combination of the hypotheses {a \gtrsim b}, {a \gtrsim c}. The task of determining such a linear combination is a standard linear programming task, for which many computer software packages exist.
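For instance (a minimal sketch using scipy, not the code of the tool described later in this post), the multipliers for the case {a \gtrsim b,c} can be found by solving a small feasibility problem in the logarithmic variables {A = \log a}, {B = \log b}, {C = \log c}:

    # Find nonnegative multipliers lambda_1, lambda_2 with
    #     (2A - B - C)/3 = lambda_1 * (A - B) + lambda_2 * (A - C),
    # which certifies (abc)^(1/3) <~ a from the case hypotheses b <~ a, c <~ a.
    import numpy as np
    from scipy.optimize import linprog

    hyps = np.array([[1.0, -1.0, 0.0],    # coefficients of (A, B, C) in A - B
                     [1.0, 0.0, -1.0]])   # coefficients of (A, B, C) in A - C
    target = np.array([2/3, -1/3, -1/3])  # coefficients of (2A - B - C)/3

    res = linprog(c=np.zeros(2), A_eq=hyps.T, b_eq=target,
                  bounds=(0, None), method="highs")
    print(res.status, res.x)   # status 0 (success); multipliers approximately [1/3, 1/3]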

Any single such inequality is not too difficult to resolve by hand, but there are applications in which one needs to check a large number of such inequalities, or split into a large number of cases. I will take an example at random from an old paper of mine (adapted from the equation after (51), and ignoring some epsilon terms for simplicity): I wanted to establish the estimate

\displaystyle  \frac{\langle N_2 \rangle^{1/2}}{\langle N_1 \rangle^{1/4} L_1^{1/2} L_2^{1/2} } L_{\min}^{1/2} N^{-1} (N_1 N_2 N_3)^{1/2} \lesssim 1 \ \ \ \ \ (3)

for any {N_1,N_2,N_3,L_1,L_2,L_3 > 0} obeying the constraints

\displaystyle N_{\max} \sim N_{\mathrm{med}} \sim N; \quad L_{\max} \sim L_{\mathrm{med}} \gtrsim N_1 N_2 N_3

where {N_{\max}}, {N_{\mathrm{med}}}, and {N_{\min}} are the maximum, median, and minimum of {N_1, N_2, N_3} respectively, and similarly for {L_{\max}}, {L_{\mathrm{med}}}, and {L_{\min}}, and {\langle N \rangle := (1+N^2)^{1/2}}. This particular bound could be dispatched in three or four lines from some simpler inequalities; but it took some time to come up with those inequalities, and I had to do a dozen further inequalities of this type. This is a task that seems extremely ripe for automation, particularly with modern technology.

Recently, I have been doing a lot more coding (in Python, mostly) than in the past, aided by the remarkable facility of large language models to generate initial code samples for many different tasks, or to autocomplete partially written code. For the most part, I have restricted myself to fairly simple coding tasks, such as computing and then plotting some mildly complicated mathematical functions, or doing some rudimentary data analysis on some dataset. But I decided to give myself the more challenging task of coding a verifier that could handle inequalities of the above form. After about four hours of coding, with frequent assistance from an LLM, I was able to produce a proof of concept tool for this, which can be found at this Github repository. For instance, to verify (1), the relevant Python code is

    a = Variable("a")
    b = Variable("b")
    c = Variable("c")
    assumptions = Assumptions()
    assumptions.can_bound((a * b * c) ** (1 / 3), max(a, b, c))

and the (somewhat verbose) output verifying the inequality is

Checking if we can bound (((a * b) * c) ** 0.3333333333333333) by max(a, b, c) from the given axioms.
We will split into the following cases:
[[b <~ a, c <~ a], [a <~ b, c <~ b], [a <~ c, b <~ c]]
Trying case: ([b <~ a, c <~ a],)
Simplify to proving (((a ** 0.6666666666666667) * (b ** -0.3333333333333333)) * (c ** -0.3333333333333333)) >= 1.
Bound was proven true by multiplying the following hypotheses :
b <~ a raised to power 0.33333333
c <~ a raised to power 0.33333333
Trying case: ([a <~ b, c <~ b],)
Simplify to proving (((b ** 0.6666666666666667) * (a ** -0.3333333333333333)) * (c ** -0.3333333333333333)) >= 1.
Bound was proven true by multiplying the following hypotheses :
a <~ b raised to power 0.33333333
c <~ b raised to power 0.33333333
Trying case: ([a <~ c, b <~ c],)
Simplify to proving (((c ** 0.6666666666666667) * (a ** -0.3333333333333333)) * (b ** -0.3333333333333333)) >= 1.
Bound was proven true by multiplying the following hypotheses :
a <~ c raised to power 0.33333333
b <~ c raised to power 0.33333333
Bound was proven true in all cases!

This is of course an extremely inelegant proof, but elegance is not the point here; rather, that it is automated. (See also this recent article of Heather Macbeth for how proof writing styles change in the presence of automated tools, such as formal proof assistants.)

The code is close to also being able to handle more complicated estimates such as (3); right now I have not written code to properly handle hypotheses such as {N_{\max} \sim N_{\mathrm{med}} \sim N} that involve complex expressions such as {N_{\max} = \max(N_1,N_2,N_3)}, as opposed to hypotheses that only involve atomic variables such as {N_1}, {N_2}, {N_3}, but I can at least handle such complex expressions on the left and right-hand sides of the estimate I am trying to verify.

In any event, the code, being a mixture of LLM-generated code and my own rudimentary Python skills, is hardly an exemplar of efficient or elegant coding, and I am sure that there are many expert programmers who could do a much better job. But I think this is proof of concept that a more sophisticated tool of this form could be quite readily created to do more advanced tasks. One such example task was the one I gave in the above MathOverflow question, namely being able to automatically verify a claim such as

\displaystyle \sum_{d=0}^\infty \frac{2d+1}{2h^2 (1 + \frac{d(d+1)}{h^2}) (1 + \frac{d(d+1)}{h^2m^2})^2} \lesssim 1 + \log(m^2)

for all {h,m > 0}. Another task would be to automatically verify the ability to estimate some multilinear expression of various functions, in terms of norms of such functions in standard spaces such as Sobolev spaces; this is a task that is particularly prevalent in PDE and harmonic analysis (and can frankly get somewhat tedious to do by hand). As speculated in that MO post, one could eventually hope to also utilize AI to assist in the verification process, for instance by suggesting possible splittings of the various sums or integrals involved, but that would be a long-term objective.

This sort of software development would likely best be performed as a collaborative project, involving both mathematicians and expert programmers. I would be interested to receive advice on how best to proceed with such a project (for instance, would it make sense to incorporate such a tool into an existing platform such as SageMath), and what features for a general estimate verifier would be most desirable for mathematicians. One thing on my wishlist is the ability to give a tool an expression to estimate (such as a multilinear integral of some unknown functions), as well as a fixed set of tools to bound that integral (e.g., splitting the integral into pieces, integrating by parts, using the Hölder and Sobolev inequalities, etc.), and have the computer do its best to optimize the bound it can produce with those tools (complete with some independently verifiable proof certificate for its output). One could also imagine such tools having the option to output their proof certificates in a formal proof assistant language such as Lean. But perhaps there are other useful features that readers may wish to propose.

Wednesday, April 23rd, 2025
10:13 pm
Stonean spaces, projective objects, the Riesz representation theorem, and (possibly) condensed mathematics

A basic type of problem that occurs throughout mathematics is the lifting problem: given some space {X} that “sits above” some other “base” space {Y} due to a projection map {\pi: X \rightarrow Y}, and some map {f: A \rightarrow Y} from a third space {A} into the base space {Y}, find a “lift” {\tilde f} of {f} to {X}, that is to say a map {\tilde f: A \rightarrow X} such that {\pi \circ \tilde f = f}. In many applications we would like to have {\tilde f} preserve many of the properties of {f} (e.g., continuity, differentiability, linearity, etc.).

Of course, if the projection map {\pi: X \rightarrow Y} is not surjective, one would not expect the lifting problem to be solvable in general, as the map {f} to be lifted could simply take values outside of the range of {\pi}. So it is natural to impose the requirement that {\pi} be surjective, giving the following commutative diagram to complete:
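\displaystyle \begin{array}{ccc} & & X \\ & {}^{\tilde f} \nearrow & \Big\downarrow \pi \\ A & \stackrel{f}{\longrightarrow} & Y \end{array}

(the sought lift being the diagonal arrow {\tilde f}, with {\pi \circ \tilde f = f}).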

If no further requirements are placed on the lift {\tilde f}, then the axiom of choice is precisely the assertion that the lifting problem is always solvable (once we require {\pi} to be surjective). Indeed, the axiom of choice lets us select a preimage {\phi(y) \in \pi^{-1}(\{y\})} in the fiber of each point {y \in Y}, and one can lift any {f: A \rightarrow Y} by setting {\tilde f := \phi \circ f}. Conversely, to build a choice function for a surjective map {\pi: X \rightarrow Y}, it suffices to lift the identity map {\mathrm{id}_Y:Y \rightarrow Y} to {X}.

Of course, the maps provided by the axiom of choice are famously pathological, being almost certain to be discontinuous, non-measurable, etc. So now suppose that all spaces involved are topological spaces, and all maps involved are required to be continuous. Then the lifting problem is not always solvable. For instance, we have a continuous projection {x \mapsto x \hbox{ mod } 1} from {{\bf R}} to {{\bf R}/{\bf Z}}, but the identity map {\mathrm{id}_{{\bf R}/{\bf Z}}:{\bf R}/{\bf Z} \rightarrow {\bf R}/{\bf Z}} cannot be lifted continuously up to {{\bf R}}, because {{\bf R}} is contractible and {{\bf R}/{\bf Z}} is not.

However, if {A} is a discrete space (every set is open), then the axiom of choice lets us solve the continuous lifting problem from {A} for any continuous surjection {\pi: X \rightarrow Y}, simply because every map from {A} to {X} is continuous. Conversely, the discrete spaces are the only ones with this property: if {A} is a topological space which is not discrete, then if one lets {A_{disc}} be the same space {A} equipped with the discrete topology, then the only way one can continuously lift the identity map {\mathrm{id}_A: A \rightarrow A} through the “projection map” {\pi: A_{disc} \rightarrow A} (that maps each point to itself) is if {A} is itself discrete.

These discrete spaces are the projective objects in the category of topological spaces, since in this category the concept of an epimorphism agrees with that of a surjective continuous map. Thus {A_{disc}} can be viewed as the unique (up to isomorphism) projective object in this category that has a bijective continuous map to {A}.

Now let us narrow the category of topological spaces to the category of compact Hausdorff (CH) spaces. Here things should be better behaved; for instance, it is a standard fact in this category that continuous bijections are homeomorphisms, and it is still the case that the epimorphisms are the continuous surjections. So we have a usable notion of a projective object in this category: CH spaces {A} such that any continuous map {f: A \rightarrow Y} into another CH space can be lifted via any surjective continuous map {\pi: X \rightarrow Y} to another CH space.

By the previous discussion, discrete CH spaces will be projective, but this is an extremely restrictive set of examples, since of course compact discrete spaces must be finite. Are there any others? The answer was worked out by Gleason:

Proposition 1 A compact Hausdorff space {A} is projective if and only if it is extremally disconnected, i.e., the closure of every open set is again open.

Proof: We begin with the “only if” direction. Suppose that {A} is projective, and let {U} be an open subset of {A}. Then the closure {\overline{U}} and complement {A \backslash U} are both closed, hence compact, subsets of {A}, so the disjoint union {\overline{U} \uplus (A \backslash U)} is another CH space, which has an obvious surjective continuous projection map {\pi: \overline{U} \uplus (A \backslash U) \rightarrow A} to {A} formed by gluing the two inclusion maps together. As {A} is projective, the identity map {\mathrm{id}_A: A \rightarrow A} must then lift to a continuous map {\tilde f: A \rightarrow \overline{U} \uplus (A \backslash U)}. One easily checks that {\tilde f} has to map {\overline{U}} to the first component {\overline{U}} of the disjoint union, and {A \backslash \overline{U}} to the second component; hence {\tilde f^{-1}(\overline{U}) = \overline{U}}, and so {\overline{U}} is open, giving extremal disconnectedness.

Conversely, suppose that {A} is extremally disconnected, that {\pi: X \rightarrow Y} is a continuous surjection of CH spaces, and {f: A \rightarrow Y} is continuous. We wish to lift {f} to a continuous map {\tilde f: A \rightarrow X}.

We first observe that it suffices to solve the lifting problem for the identity map {\mathrm{id}_A: A \rightarrow A}, that is to say we can assume without loss of generality that {Y=A} and {f} is the identity. Indeed, for general maps {f: A \rightarrow Y}, one can introduce the pullback space

\displaystyle A \times_Y X := \{ (a,x) \in A \times X: \pi(x) = f(a) \}

which is clearly a CH space that has a continuous surjection {\tilde \pi: A \times_Y X \rightarrow A}. Any continuous lift of the identity map {\mathrm{id}_A: A \rightarrow A} to {A \times_Y X}, when projected onto {X}, will give a desired lift {\tilde f: A \rightarrow X}.

So now we are trying to lift the identity map {\mathrm{id}_A: A \rightarrow A} via a continuous surjection {\pi: X \rightarrow A}. Let us call this surjection {\pi: X \rightarrow A} minimally surjective if no restriction {\pi|_K: K \rightarrow A} of {\pi} to a proper closed subset {K} of {X} remains surjective. An easy application of Zorn’s lemma shows that every continuous surjection {\pi: X \rightarrow A} can be restricted to a minimally surjective continuous map {\pi|_K: K \rightarrow A}. Thus, without loss of generality, we may assume that {\pi} is minimally surjective.

The key claim now is that every minimally surjective map {\pi: X \rightarrow A} into an extremally disconnected space is in fact a bijection. Indeed, suppose for contradiction that there were two distinct points {x_1,x_2} in {X} that mapped to the same point {a} under {\pi}. By taking contrapositives of the minimal surjectivity property, we see that every open neighborhood of {x_1} must contain at least one fiber {\pi^{-1}(\{b\})} of some point {b}, and by shrinking this neighborhood one can ensure that the base point {b} is arbitrarily close to {a = \pi(x_2)}. Thus, every open neighborhood of {x_1} must intersect every open neighborhood of {x_2}, contradicting the Hausdorff property.

It is well known that continuous bijections between CH spaces must be homeomorphisms (they map compact sets to compact sets, hence must be open maps). So {\pi:X \rightarrow A} is a homeomorphism, and one can lift the identity map to the inverse map {\pi^{-1}: A \rightarrow X}. \Box

Remark 2 The property of being “minimally surjective” sounds like it should have a purely category-theoretic definition, but I was unable to match this concept to a standard term in category theory (something along the lines of a “minimal epimorphism”, I would imagine).

In view of this proposition, it is now natural to look for extremally disconnected CH spaces (also known as Stonean spaces). The discrete CH spaces are one class of such spaces, but they are all finite. Unfortunately, these are the only “small” examples:

Lemma 3 Any first countable extremally disconnected CH space {A} is discrete.

Proof: If such a space {A} were not discrete, one could find a sequence {a_n} in {A} converging to a limit {a} such that {a_n \neq a} for all {n}. One can sparsify the elements {a_n} to all be distinct, and from the Hausdorff property one can construct neighbourhoods {U_n} of each {a_n} that avoid {a}, and are disjoint from each other. Then {\bigcup_{n=1}^\infty U_{2n}} and {\bigcup_{n=1}^\infty U_{2n+1}} are disjoint open sets that both have {a} as an adherent point, which is inconsistent with extremal disconnectedness: the closure of {\bigcup_{n=1}^\infty U_{2n}} contains {a} but is disjoint from {\bigcup_{n=1}^\infty U_{2n+1}}, so cannot be open. \Box

Thus, for instance, there are no infinite extremally disconnected compact metric spaces; in particular, the Cantor space {\{0,1\}^{\bf N}} is not extremally disconnected, even though it is totally disconnected (a property that is easily seen to be implied by extremal disconnectedness). On the other hand, once we leave the first-countable world, we have plenty of such spaces:

Lemma 4 Let {\mathcal{B}} be a complete Boolean algebra. Then the Stone dual {\mathrm{Hom}(\mathcal{B},\{0,1\})} of {\mathcal{B}} (i.e., the space of boolean homomorphisms {\phi: \mathcal{B} \rightarrow \{0,1\}}) is an extremally disconnected CH space.

Proof: The CH properties are standard. The elements {E} of {{\mathcal B}} give rise to the clopen sets {B_E := \{ \phi: \phi(E) = 1\}}, which form a base for the topology. Because the Boolean algebra is complete, we see that the closure of the open set {\bigcup_{\alpha} B_{E_\alpha}} for any family {E_\alpha} of elements of {{\mathcal B}} is simply the clopen set {B_{\bigvee_\alpha E_\alpha}}, which is obviously open, giving extremal disconnectedness. \Box
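
As a quick finite sanity check of this Stone duality (illustrative only, since every finite Boolean algebra is automatically complete), one can verify in a few lines of Python that the Boolean homomorphisms from the power set algebra of a three-element set to {\{0,1\}} are exactly the three point evaluations {E \mapsto 1_{x \in E}}:

from itertools import combinations, product

X = [0, 1, 2]
subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

def is_hom(phi):
    # Boolean homomorphism to {0,1}: sends the top element to 1 and preserves
    # complements and (binary) joins; meets are then preserved automatically.
    if phi[frozenset()] != 0 or phi[frozenset(X)] != 1:
        return False
    for E in subsets:
        if phi[frozenset(X) - E] != 1 - phi[E]:
            return False
        for F in subsets:
            if phi[E | F] != max(phi[E], phi[F]):
                return False
    return True

homs = []
for vals in product([0, 1], repeat=len(subsets)):
    phi = dict(zip(subsets, vals))
    if is_hom(phi):
        homs.append(phi)

print(len(homs))   # 3: one homomorphism for each point of X
print(all(any(all(phi[E] == (x in E) for E in subsets) for x in X) for phi in homs))   # True

In particular, for a finite set the Stone dual simply recovers the set itself with the discrete topology, consistent with the earlier discussion of discrete CH spaces.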

Remark 5 In fact, every extremally disconnected CH space {X} arises in this fashion: it is homeomorphic to the Stone dual of the (complete) Boolean algebra of clopen subsets of {X}.
Thursday, April 17th, 2025
LJ.Rossia.org makes no claim to the content supplied through this journal account. Articles are retrieved via a public feed supplied by the site for this purpose.
2:48 pm
A suspicious conference

I have received multiple queries from colleagues who have been invited (from a non-academic email address) to speak at a strange-sounding conference that is allegedly supported by major mathematical institutions, allegedly hosted at a prestigious university, and that allegedly has myself (and two other Fields Medalists) as plenary speakers. The invitees are asked to pay “registration fees” upfront, with the promise of future reimbursement. (There is a bare-bones web site, which seems to be partially copy-pasted from some previous “conferences” in chemistry and physics, but I will not link to it here.)

I have not agreed (or even been asked) to participate in this event, and I can confirm the same for at least one of the other supposed plenary speakers. There is also no confirmation of the support or location claimed.

As such, this does not appear to be a legitimate scientific conference, and I would advise anyone receiving such an email to discard it.

EDIT: in order to have this post be picked up by appropriate search engine queries, the name of the alleged conference is “Infinity ’25: Horizons in Mathematical Thought”.

SECOND EDIT: I am *not* referring to the 2026 International Congress of Mathematicians (ICM), which is also sending out speaker invitations currently, and is of course an extremely legitimate conference.

Thursday, March 27th, 2025
LJ.Rossia.org makes no claim to the content supplied through this journal account. Articles are retrieved via a public feed supplied by the site for this purpose.
4:14 am
Decomposing a factorial into large factors

I’ve just uploaded to the arXiv the paper “Decomposing a factorial into large factors“. This paper studies the quantity {t(N)}, defined as the largest quantity such that it is possible to factorize {N!} into {N} factors {a_1, \dots, a_N}, each of which is at least {t(N)}. The first few values of this sequence are

\displaystyle  1,1,1,2,2,2,2,2,3,3,3,3,3,4, \dots

(OEIS A034258). For instance, we have {t(9)=3}, because on the one hand we can factor

\displaystyle  9! = 3 \times 3 \times 3 \times 3 \times 4 \times 4 \times 5 \times 7 \times 8

but on the other hand it is not possible to factorize {9!} into nine factors, each of which is {4} or higher.
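
As a quick independent check of this value (a naive brute-force search, not taken from the paper), one can enumerate factorizations of {9!} into nine nondecreasing factors in Python:

import math

def can_split(n, k, lo):
    # Can n be written as a product of k factors, each at least lo?
    # Factors are searched in nondecreasing order to avoid duplicate orderings.
    if k == 1:
        return n >= lo
    d = lo
    while d ** k <= n:
        if n % d == 0 and can_split(n // d, k - 1, d):
            return True
        d += 1
    return False

N = 9
f = math.factorial(N)
print(max(t for t in range(1, N + 1) if can_split(f, N, t)))   # 3

The search confirms that nine factors of size at least {3} exist, but nine factors of size at least {4} do not.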

This quantity {t(N)} was introduced by Erdös, who asked for upper and lower bounds on {t(N)}; informally, this asks how equitably one can split up {N!} into {N} factors. When factoring an arbitrary number, this is essentially a variant of the notorious knapsack problem (after taking logarithms), but one can hope that the specific structure of the factorial {N!} can make this particular knapsack-type problem more tractable. Since

\displaystyle  N! = a_1 \dots a_N \geq t(N)^N

for any putative factorization, we obtain an upper bound

\displaystyle  t(N) \leq (N!)^{1/N} = \frac{N}{e} + O(\log N) \ \ \ \ \ (1)

thanks to the Stirling approximation. At one point, Erdös, Selfridge, and Straus claimed that this upper bound was asymptotically sharp, in the sense that

\displaystyle  t(N) = \frac{N}{e} + o(N) \ \ \ \ \ (2)

as {N \rightarrow \infty}; informally, this means we can split {N!} into {N} factors that are (mostly) approximately the same size, when {N} is large. However, as reported in this later paper, Erdös “believed that Straus had written up our proof… Unfortunately Straus suddenly died and no trace was ever found of his notes. Furthermore, we never could reconstruct our proof, so our assertion now can be called only a conjecture”.

Some further exploration of {t(N)} was conducted by Guy and Selfridge. There is a simple construction that gives the lower bound

\displaystyle  t(N) \geq \frac{3}{16} N - o(N)

that comes from starting with the standard factorization {N! = 1 \times 2 \times \dots \times N} and transferring some powers of {2} from the later part of the sequence to the earlier part to rebalance the terms somewhat. More precisely, if one removes one power of two from the even numbers between {\frac{3}{8}N} and {N}, and one additional power of two from the multiples of four between {\frac{3}{4}N} and {N}, this frees up {\frac{3}{8}N + o(N)} powers of two that one can then distribute amongst the numbers up to {\frac{3}{16} N} to bring them all up to at least {\frac{3}{16} N - o(N)} in size. (A crude numerical implementation of this rearrangement is sketched just after the list of conjectures below.) A more complicated procedure involving transferring both powers of {2} and {3} then gives the improvement {t(N) \geq \frac{1}{4} N - o(N)}. At this point, however, things got more complicated, and the following conjectures were made by Guy and Selfridge:
  • (i) Is {\frac{t(N)}{N} \leq \frac{1}{e}} for all {N \neq 1,2,4}?
  • (ii) Is {t(N) \geq \lfloor 2N/7 \rfloor} for all {N \neq 56}? (At {N=56}, this conjecture barely fails: {t(56) = 15 < 16 = \lfloor 2 \times 56/7 \rfloor}.)
  • (iii) Is {\frac{t(N)}{N} \geq \frac{1}{3}} for all {N \geq 300000}?
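
As promised above, here is a crude Python implementation of the power-of-two rearrangement (a naive sketch for modest {N}, not the procedure used by Guy and Selfridge or in this paper); the smallest factor it produces should come out close to {3N/16}:

import math

def rebalance(N):
    # Start from the factorization N! = 1 x 2 x ... x N, strip powers of two
    # from the large terms as described above, and use them to boost the small terms.
    factors = list(range(1, N + 1))
    pool = 0
    for i, m in enumerate(factors):
        if m > 3 * N / 8 and m % 2 == 0:
            factors[i] //= 2
            pool += 1
        if m > 3 * N / 4 and m % 4 == 0:
            factors[i] //= 2
            pool += 1
    for i in range(N):
        while factors[i] < 3 * N / 16 and pool > 0:
            factors[i] *= 2
            pool -= 1
    factors[0] *= 2 ** pool                         # return any unused powers of two
    assert math.prod(factors) == math.factorial(N)  # still a factorization of N!
    return min(factors)

for N in (1000, 10000):
    print(N, rebalance(N), 3 * N / 16)   # smallest factor versus the 3N/16 benchmark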

In this note we establish the bounds

\displaystyle  \frac{1}{e} - \frac{O(1)}{\log N} \leq \frac{t(N)}{N} \leq \frac{1}{e} - \frac{c_0+o(1)}{\log N} \ \ \ \ \ (3)

as {N \rightarrow \infty}, where {c_0} is the explicit constant

\displaystyle  c_0 := \frac{1}{e} \int_0^1 \left \lfloor \frac{1}{x} \right\rfloor \log \left( ex \left \lceil \frac{1}{ex} \right\rceil \right)\ dx \approx 0.3044.

In particular this recovers the lost result (2). An upper bound of the shape

\displaystyle  \frac{t(N)}{N} \leq \frac{1}{e} - \frac{c+o(1)}{\log N} \ \ \ \ \ (4)

for some {c>0} was previously conjectured by Erdös and Graham (Erdös problem #391). We conjecture that the upper bound in (3) is sharp, thus

\displaystyle  \frac{t(N)}{N} = \frac{1}{e} - \frac{c_0+o(1)}{\log N}, \ \ \ \ \ (5)

which is consistent with the above conjectures (i), (ii), (iii) of Guy and Selfridge, although numerically the convergence is somewhat slow.

The upper bound argument for (3) is simple enough that it could also be modified to establish the first conjecture (i) of Guy and Selfridge; in principle, (ii) and (iii) are now also reducible to a finite computation, but unfortunately the implied constants in the lower bound of (3) are too weak to make this directly feasible. However, it may be possible to now crowdsource the verification of (ii) and (iii) by supplying a suitable set of factorizations to cover medium sized {N}, combined with some effective version of the lower bound argument that can establish {\frac{t(N)}{N} \geq \frac{1}{3}} for all {N} past a certain threshold. The value {N=300000} singled out by Guy and Selfridge appears to be quite a suitable test case: the constructions I tried fell just a little short of the conjectured threshold of {100000}, but it seems barely within reach that a sufficiently efficient rearrangement of factors can work here.

We now describe the proof of the upper and lower bound in (3). To improve upon the trivial upper bound (1), one can use the large prime factors of {N!}. Indeed, every prime {p} between {N/e} and {N} divides {N!} at least once (and the ones between {N/e} and {N/2} divide it twice), and any factor {a_i} that is divisible by such a prime therefore has to be significantly larger than the benchmark value of {N/e}. This observation already readily leads to some upper bound of the shape (4) for some {c>0}; using also the primes {p} that are slightly less than {N/e} (noting that any multiple of {p} that exceeds {N/e} must in fact be at least {\lceil N/ep \rceil p}) is what leads to the precise constant {c_0}.
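
As a numerical sanity check on the constant {c_0} (an independent computation, not code from the paper), the defining integral can be evaluated by splitting {(0,1]} at the points where {\lfloor 1/x \rfloor} or {\lceil 1/(ex) \rceil} jump and integrating {\log(emx)} in closed form on each piece; truncating at a small cutoff only costs an error of the order of that cutoff:

import numpy as np

e = np.e
eps = 1e-6   # lower cutoff; the omitted piece of the integral over (0, eps) is O(eps)

# breakpoints in [eps, 1] where floor(1/x) or ceil(1/(e x)) change value
ks = np.arange(1, int(1 / eps) + 2)
breaks = np.concatenate(([eps, 1.0], 1.0 / ks, 1.0 / (e * ks)))
breaks = np.unique(breaks[(breaks >= eps) & (breaks <= 1.0)])

a, b = breaks[:-1], breaks[1:]
xm = 0.5 * (a + b)                        # both step functions are constant on each (a, b)
k = np.floor(1.0 / xm)
m = np.ceil(1.0 / (e * xm))
F = lambda x: x * np.log(e * m * x) - x   # antiderivative of log(e m x)
print(np.sum(k * (F(b) - F(a))) / e)      # should land close to 0.30441901...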

For previous lower bound constructions, one started with the initial factorization {N! = 1 \times \dots \times N} and then tried to “improve” this factorization by moving around some of the prime factors. For the lower bound in (3), we start instead with an approximate factorization roughly of the shape

\displaystyle  N! \approx (\prod_{t \leq n < t + 2N/A, \hbox{ odd}} n)^A

where {t} is the target lower bound (so, slightly smaller than {N/e}), and {A} is a moderately sized natural number parameter (we will take {A \asymp \log^3 N}, although there is significant flexibility here). If we denote the right-hand side here by {B}, then {B} is basically a product of {N} numbers of size at least {t}. It is not literally equal to {N!}; however, an easy application of Legendre’s formula shows that for small odd primes {p}, {N!} and {B} have almost exactly the same number of factors of {p}. On the other hand, as {B} is odd, {B} contains no factors of {2}, while {N!} contains about {N} such factors. The prime factorizations of {B} and {N!} differ somewhat at large primes, but {B} has slightly more such prime factors than {N!} (about {\frac{N}{\log N} \log 2} such factors, in fact). By some careful applications of the prime number theorem, one can tweak some of the large primes appearing in {B} to make the prime factorization of {B} and {N!} agree almost exactly, except that {B} is missing most of the powers of {2} in {N!}, while having some additional large prime factors beyond those contained in {N!} to compensate. With a suitable choice of threshold {t}, one can then replace these excess large prime factors with powers of two to obtain a factorization of {N!} into {N} terms that are all at least {t}, giving the lower bound.

The general approach of first locating some approximate factorization of {N!} (where the approximation is in the “adelic” sense of having not just approximately the right magnitude, but also approximately the right number of factors of {p} for various primes {p}), and then moving factors around to get an exact factorization of {N!}, looks promising for also resolving the conjectures (ii), (iii) mentioned above. For instance, I was numerically able to verify that {t(300000) \geq 90000} by the following procedure:

  • Start with the approximate factorization of {N!} for {N = 300000} given by {B = (\prod_{90000 \leq n < 102000, \hbox{ odd}} n)^{50}}. Thus {B} is the product of {N} odd numbers, each of which is at least {90000}.
  • Call an odd prime {B}-heavy if it divides {B} more often than {N!}, and {N!}-heavy if it divides {N!} more often than {B}. It turns out that there are {14891} more {B}-heavy primes than {N!}-heavy primes (counting multiplicity). On the other hand, {N!} contains {299992} powers of {2}, while {B} has none. This represents the (multi-)set of primes one has to redistribute in order to convert a factorization of {B} to a factorization of {N!}.
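
These counts are easy to sanity-check with Legendre’s formula; the following short Python snippet (an independent check, not code from the paper) computes the number of powers of {2} in {N!} and compares the multiplicity of a few small odd primes in {N!} and in {B}:

def legendre(N, p):
    # exponent of the prime p in N!, via Legendre's formula
    e, q = 0, p
    while q <= N:
        e += N // q
        q *= p
    return e

def vp(n, p):
    # exponent of the prime p in the integer n
    e = 0
    while n % p == 0:
        n //= p
        e += 1
    return e

N = 300000
def vB(p):
    # exponent of p in B = (product of odd n with 90000 <= n < 102000) ** 50
    return 50 * sum(vp(n, p) for n in range(90001, 102000, 2))

print(legendre(N, 2), vB(2))           # 299992 versus 0: all powers of 2 must be reinstated
for p in (3, 5, 7, 11, 13):
    print(p, legendre(N, p), vB(p))    # for small odd p the two exponents nearly agree
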
Wednesday, February 26th, 2025
LJ.Rossia.org makes no claim to the content supplied through this journal account. Articles are retrieved via a public feed supplied by the site for this purpose.
4:47 am
The three-dimensional Kakeya conjecture, after Wang and Zahl

There has been some spectacular progress in geometric measure theory: Hong Wang and Joshua Zahl have just released a preprint that resolves the three-dimensional case of the infamous Kakeya set conjecture! This conjecture asserts that a Kakeya set – a subset of {{\bf R}^3} that contains a unit line segment in every direction – must have Minkowski and Hausdorff dimension equal to three. (There is also a stronger “maximal function” version of this conjecture that remains open at present, although the methods of this paper will give some non-trivial bounds on this maximal function.) It is common to discretize this conjecture in terms of a small scale {0 < \delta < 1}. Roughly speaking, the conjecture then asserts that if one has a family {\mathbb{T}} of {\delta \times \delta \times 1} tubes of cardinality {\approx \delta^{-2}}, and pointing in a {\delta}-separated set of directions, then the union {\bigcup_{T \in \mathbb{T}} T} of these tubes should have volume {\approx 1}. Here we shall be a little vague as to what {\approx} means here, but roughly one should think of this as “up to factors of the form {O_\varepsilon(\delta^{-\varepsilon})} for any {\varepsilon>0}“; in particular this notation can absorb any logarithmic losses that might arise for instance from a dyadic pigeonholing argument. For technical reasons (including the need to invoke the aforementioned dyadic pigeonholing), one actually works with slightly smaller sets {\bigcup_{T \in \mathbb{T}} Y(T)}, where {Y} is a “shading” of the tubes in {\mathbb{T}} that assigns a large subset {Y(T)} of {T} to each tube {T} in the collection; but for this discussion we shall ignore this subtlety and pretend that we can always work with the full tubes.

Previous results in this area tended to center around lower bounds of the form

\displaystyle  |\bigcup_{T \in \mathbb{T}} T| \gtrapprox \delta^{3-d} \ \ \ \ \ (1)

for various intermediate dimensions {0 < d < 3}, that one would like to make as large as possible. For instance, just from considering a single tube in this collection, one can easily establish (1) with {d=1}. By just using the fact that two lines in {{\bf R}^3} intersect in a point (or more precisely, a more quantitative estimate on the volume of the intersection of two {\delta \times \delta \times 1} tubes, depending on the angle of intersection), combined with a now classical {L^2}-based argument of Córdoba, one can obtain (1) with {d=2} (and this type of argument also resolves the Kakeya conjecture in two dimensions). In 1995, building on earlier work by Bourgain, Wolff famously obtained (1) with {d=2.5} using what is now known as the “Wolff hairbrush argument”, based on considering the size of a “hairbrush” – the union of all the tubes that pass through a single tube (the hairbrush “stem”) in the collection.

In their new paper, Wang and Zahl established (1) for {d=3}. The proof is lengthy (127 pages!), and relies crucially on their previous paper establishing a key “sticky” case of the conjecture. Here, I thought I would try to summarize the high level strategy of proof, omitting many details and also oversimplifying the argument at various places for sake of exposition. The argument does use many ideas from previous literature, including some from my own papers with co-authors; but the case analysis and iterative schemes required are remarkably sophisticated and delicate, with multiple new ideas needed to close the full argument.

A natural strategy to prove (1) would be to try to induct on {d}: if we let {K(d)} represent the assertion that (1) holds for all configurations of {\approx \delta^{-2}} tubes of dimensions {\delta \times \delta \times 1}, with {\delta}-separated directions, we could try to prove some implication of the form {K(d) \implies K(d + \alpha)} for all {0 < d < 3}, where {\alpha>0} is some small positive quantity depending on {d}. Iterating this, one could hope to get {d} arbitrarily close to {3}.

A general principle with these sorts of continuous induction arguments is to first obtain the trivial implication {K(d) \implies K(d)} in a non-trivial fashion, with the hope that this non-trivial argument can somehow be perturbed or optimized to get the crucial improvement {K(d) \implies K(d+\alpha)}. The standard strategy for doing this, since the work of Bourgain and then Wolff in the 1990s (with precursors in older work of Córdoba), is to perform some sort of “induction on scales”. Here is the basic idea. Let us call the {\delta \times \delta \times 1} tubes {T} in {\mathbb{T}} “thin tubes”. We can try to group these thin tubes into “fat tubes” of dimension {\rho \times \rho \times 1} for some intermediate scale {\delta \ll \rho \ll 1}; it is not terribly important for this sketch precisely what intermediate value is chosen here, but one could for instance set {\rho = \sqrt{\delta}} if desired. Because of the {\delta}-separated nature of the directions in {\mathbb{T}}, there can be at most {\lessapprox (\rho/\delta)^{2}} thin tubes in a given fat tube, and so we need at least {\gtrapprox \rho^{-2}} fat tubes to cover the {\approx \delta^{-2}} thin tubes. Let us suppose for now that we are in the “sticky” case where the thin tubes stick together inside fat tubes as much as possible, so that there is in fact a collection {\mathbb{T}_\rho} of {\approx \rho^{-2}} fat tubes {T_\rho}, with each fat tube containing about {\approx (\rho/\delta)^{2}} of the thin tubes. Let us also assume that the fat tubes {T_\rho} are {\rho}-separated in direction, an assumption that is highly consistent with the other assumptions made here.

If we already have the hypothesis {K(d)}, then by applying it at scale {\rho} instead of {\delta} we conclude a lower bound on the volume occupied by fat tubes:

\displaystyle  |\bigcup_{T_\rho \in \mathbb{T}_\rho} T_\rho| \gtrapprox \rho^{3-d}.

Since {\sum_{T_\rho \in \mathbb{T}_\rho} |T_\rho| \approx \rho^{-2} \rho^2 = 1}, this morally tells us that the typical multiplicity {\mu_{fat}} of the fat tubes is {\lessapprox \rho^{d-3}}; a typical point in {\bigcup_{T_\rho \in \mathbb{T}_\rho} T_\rho} should belong to about {\mu_{fat} \lessapprox \rho^{d-3}} fat tubes.

Now, inside each fat tube {T_\rho}, we are assuming that we have about {\approx (\rho/\delta)^{2}} thin tubes that are {\delta}-separated in direction. If we perform a linear rescaling around the axis of the fat tube by a factor of {1/\rho} to turn it into a {1 \times 1 \times 1} tube, this would inflate the thin tubes to be rescaled tubes of dimensions {\delta/\rho \times \delta/\rho \times 1}, which would now be {\approx \delta/\rho}-separated in direction. This rescaling does not affect the multiplicity of the tubes. Applying {K(d)} again, we see morally that the multiplicity {\mu_{fine}} of the rescaled tubes, and hence the thin tubes inside {T_\rho}, should be {\lessapprox (\delta/\rho)^{d-3}}.

We now observe that the multiplicity {\mu} of the full collection {\mathbb{T}} of thin tubes should morally obey the inequality

\displaystyle  \mu \lessapprox \mu_{fat} \mu_{fine}, \ \ \ \ \ (2)

since if a given point lies in at most {\mu_{fat}} fat tubes, and within each fat tube a given point lies in at most {\mu_{fine}} thin tubes in that fat tube, then it should only be able to lie in at most {\mu_{fat} \mu_{fine}} tubes overall. This heuristically gives {\mu \lessapprox \rho^{d-3} (\delta/\rho)^{d-3} = \delta^{d-3}}, which then recovers (1) in the sticky case. (Indeed, since the thin tubes have total volume {\sum_{T \in \mathbb{T}} |T| \approx \delta^{-2} \cdot \delta^2 = 1}, a multiplicity bound of {\mu \lessapprox \delta^{d-3}} forces the union {\bigcup_{T \in \mathbb{T}} T} to have volume {\gtrapprox 1/\mu \gtrapprox \delta^{3-d}}.)

In their previous paper, Wang and Zahl were roughly able to squeeze a little bit more out of this argument to get something resembling {K(d) \implies K(d+\alpha)} in the sticky case, loosely following a strategy of Nets Katz and myself that I discussed in this previous blog post from over a decade ago. I will not discuss this portion of the argument further here, referring the reader to the introduction to that paper; instead, I will focus on the arguments in the current paper, which handle the non-sticky case.

Let’s try to repeat the above analysis in a non-sticky situation. We assume {K(d)} (or some suitable variant thereof), and consider some thickened Kakeya set

\displaystyle  E = \bigcup_{T \in {\mathbb T}} T

where {{\mathbb T}} is something resembling what we might call a “Kakeya configuration” at scale {\delta}: a collection of {\delta^{-2}} thin tubes of dimension {\delta \times \delta \times 1} that are {\delta}-separated in direction. (Actually, to make the induction work, one has to consider a more general family of tubes than these, satisfying some standard “Wolff axioms” instead of the direction separation hypothesis; but we will gloss over this issue for now.) Our goal is to prove something like {K(d+\alpha)} for some {\alpha>0}, which amounts to obtaining some improved volume bound

\displaystyle  |E| \gtrapprox \delta^{3-d-\alpha}

that improves upon the bound {|E| \gtrapprox \delta^{3-d}} coming from {K(d)}. From the previous paper we know we can do this in the “sticky” case, so we will assume that {E} is “non-sticky” (whatever that means).

A typical non-sticky setup is when there are now {m \rho^{-2}} fat tubes for some multiplicity {m \ggg 1} (e.g., {m = \delta^{-\eta}} for some small constant {\eta>0}), with each fat tube containing only {m^{-1} (\delta/\rho)^{-2}} thin tubes. Now we have an unfortunate imbalance: the fat tubes form a “super-Kakeya configuration”, with too many tubes at the coarse scale {\rho} for them to be all {\rho}-separated in direction, while the thin tubes inside a fat tube form a “sub-Kakeya configuration” in which there are not enough tubes to cover all relevant directions. So one cannot apply the hypothesis {K(d)} efficiently at either scale.

This looks like a serious obstacle, so let’s change tack for a bit and think of a different way to try to close the argument. Let’s look at how {E} intersects a given {\rho}-ball {B(x,\rho)}. The hypothesis {K(d)} suggests that {E} might behave like a {d}-dimensional fractal (thickened at scale {\delta}), in which case one might be led to a predicted size of {E \cap B(x,\rho)} of the form {(\rho/\delta)^d \delta^3}. Suppose for sake of argument that the set {E} was denser than this at this scale, for instance we have

\displaystyle  |E \cap B(x,\rho)| \gtrapprox (\rho/\delta)^d \delta^{3-\alpha} \ \ \ \ \ (3)

for all {x \in E} and some {\alpha>0}. Observe that the {\rho}-neighborhood of {E} is basically {\bigcup_{T_\rho \in {\mathbb T}_\rho} T_\rho}, and thus has volume {\gtrapprox \rho^{3-d}} by the hypothesis {K(d)}.
Monday, February 24th, 2025
LJ.Rossia.org makes no claim to the content supplied through this journal account. Articles are retrieved via a public feed supplied by the site for this purpose.
12:10 am
Closing the “green gap”: from the mathematics of the landscape function to lower electricity costs for households

I recently returned from the 2025 Annual Meeting of the “Localization of Waves” collaboration (supported by the Simons Foundation, with additional related support from the NSF), where I learned (from Svitlana Mayboroda, the director of the collaboration as well as one of the principal investigators) of a remarkable statistic: net electricity consumption by residential customers in the US has actually experienced a slight decrease in recent years:

The decrease is almost entirely due to gains in lighting efficiency in households, and particularly the transition from incandescent (and compact fluorescent) light bulbs to LED light bulbs:

Annual energy savings from this switch to consumers in the US were already estimated to be $14.7 billion in 2020 – or several hundred dollars per household – and are projected to increase, even in the current inflationary era, with the cumulative savings across the US estimated to reach $890 billion by 2035.

What I also did not realize before this meeting is the role that recent advances in pure mathematics – and specifically, the development of the “landscape function” that was a primary focus of this collaboration – played in accelerating this transition. This is not to say that this piece of mathematics was solely responsible for these developments; but, as I hope to explain here, it was certainly part of the research and development ecosystem in both academia and industry, spanning multiple STEM disciplines and supported by both private and public funding. This application of the landscape function was already reported upon by Quanta magazine at the very start of this collaboration back in 2017; but it is only in the last few years that the mathematical theory has been incorporated into the latest LED designs and led to actual savings at the consumer end.

LED lights are made from layers of semiconductor material (e.g., Gallium nitride or Indium gallium nitride) arranged in a particular fashion. When enough of a voltage difference is applied to this material, electrons are injected into the “n-type” side of the LED, while electron holes are injected into the “p-type” side, creating a current. In the active layer of the LED, these electrons and holes recombine in the quantum wells of the layer, generating radiation (light) via the mechanism of electroluminescence. The brightness of the LED is determined by the current, while the power consumption is the product of the current and the voltage. Thus, to improve energy efficiency, one seeks to design LEDs to require as little voltage as possible to generate a target amount of current.

As it turns out, the efficiency of an LED, as well as the spectral frequencies of light they generate, depend in many subtle ways on the precise geometry of the chemical composition of the semiconductors, the thickness of the layers, the geometry of how the layers are placed atop one another, the temperature of the materials, and the amount of disorder (impurities) introduced into each layer. In particular, in order to create quantum wells that can efficiently trap the electrons and holes together to recombine to create light of a desired frequency, it is useful to introduce a certain amount of disorder into the layers in order to take advantage of the phenomenon of Anderson localization. However, one cannot add too much disorder, lest the electron states become fully bound and the material behaves too much like an insulator to generate appreciable current.

One can of course make empirical experiments to measure the performance of various proposed LED designs by fabricating them and then testing them in a laboratory. But this is an expensive and painstaking process that does not scale well; one cannot test thousands of candidate designs this way to isolate the best performing ones. So, it becomes desirable to perform numerical simulations of these designs instead, which – if they are sufficiently accurate and computationally efficient – can lead to a much shorter and cheaper design cycle. (In the near future one may also hope to accelerate the design cycle further by incorporating machine learning and AI methods; but these techniques, while promising, are still not fully developed at the present time.)

So, how can one perform numerical simulation of an LED? By the semiclassical approximation, the wave function {\psi_i} of an individual electron should solve the time-independent Schrödinger equation

\displaystyle -\frac{\hbar^2}{2m_e} \Delta \psi_i + E_c \psi_i = E_i \psi_i,

where {\psi_i} is the wave function of the electron at the energy level {E_i}, and {E_c} is the conduction band energy. The behavior of hole wavefunctions follows a similar equation, governed by the valence band energy {E_v} instead of {E_c}. However, there is a complication: these band energies are not solely coming from the semiconductor, but also contain a contribution {\mp e \varphi} that comes from electrostatic effects from the electrons and holes, and is more specifically obtained by solving the Poisson equation

\displaystyle \mathrm{div}( \varepsilon_r \nabla \varphi ) = \frac{e}{\varepsilon_0} (n-p + N_A^+ - N_D^-)

where {\varepsilon_r} is the dielectric constant of the semiconductor, {n,p} are the carrier densities of electrons and holes respectively, {N_A^+}, {N_D^-} are further densities of ionized acceptor and donor atoms, and {\hbar, m_e, e, \varepsilon_0} are physical constants. This equation looks somewhat complicated, but is mostly determined by the carrier densities {n,p}, which in turn ultimately arise from the probability densities {|\psi_i|^2} associated to the eigenfunctions {\psi_i} via the Born rule, combined with the Fermi-Dirac distribution from statistical mechanics; for instance, the electron carrier density {n} is given by the formula

\displaystyle n = \sum_i \frac{|\psi_i|^2}{1 + e^{(E_i - E_{Fn})/k_B T}},

with a similar formula for {p}. In particular, the net potential {E_c} depends on the wave functions {\psi_i}, turning the Schrödinger equation into a nonlinear self-consistent Hartree-type equation. From the wave functions one can also compute the current, determine the amount of recombination between electrons and holes, and therefore also calculate the light intensity and absorption rates. But the main difficulty is to solve for the wave functions {\psi_i} for the different energy levels of the electron (as well as the counterpart for holes).

One could attempt to solve this nonlinear system iteratively, by first proposing an initial candidate for the wave functions {\psi_i}, using this to obtain a first approximation for the conduction band energy {E_c} and valence band energy {E_v}, and then solving the Schrödinger equations to obtain a new approximation for {\psi_i}, and repeating this process until it converges. However, the regularity of the potentials {E_c, E_v} plays an important role in being able to solve the Schrödinger equation. (The Poisson equation, being elliptic, is relatively easy to solve to high accuracy by standard methods, such as finite element methods.) If the potential {E_c} is quite smooth and slowly varying, then one expects the wave functions {\psi_i} to be quite delocalized, and for traditional approximations such as the WKB approximation to be accurate.
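
To make the structure of this iteration concrete, here is a minimal one-dimensional Python sketch (a dimensionless toy with a single carrier type, Dirichlet boundary conditions, and an artificially weak electrostatic coupling so that the naive damped loop settles; it is in no way a realistic LED simulation):

import numpy as np

M, Lbox = 300, 30.0
dx = Lbox / M
rng = np.random.default_rng(0)
E_background = rng.random(M)         # disordered conduction-band profile
doping = 0.05 * np.ones(M)           # fixed background (donor) charge density
E_F, kT, coupling, mix = 0.4, 0.05, 0.01, 0.5

# second-difference Laplacian with Dirichlet boundary conditions
lap = (np.diag(-2.0 * np.ones(M)) + np.diag(np.ones(M - 1), 1)
       + np.diag(np.ones(M - 1), -1)) / dx**2

phi = np.zeros(M)
for step in range(40):
    E_c = E_background - phi                          # band energy including electrostatics
    E_i, psi = np.linalg.eigh(-lap + np.diag(E_c))    # Schrodinger step
    psi /= np.sqrt(dx)                                # normalize so that sum |psi_i|^2 dx = 1
    occ = 1.0 / (1.0 + np.exp(np.clip((E_i - E_F) / kT, -60.0, 60.0)))   # Fermi-Dirac weights
    n = (psi ** 2) @ occ                              # carrier density (Born rule)
    phi_new = coupling * np.linalg.solve(lap, n - doping)   # Poisson step
    phi = (1 - mix) * phi + mix * phi_new             # damped self-consistent update

print("total number of carriers:", n.sum() * dx)

The delicate and expensive part in practice is the Schrödinger step in the presence of disorder, as discussed in the next paragraph; the Poisson step is comparatively harmless.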

However, in the presence of disorder, such approximations are no longer valid. As a consequence, traditional methods for numerically solving these equations had proven to be too inaccurate to be of practical use in simulating the performance of a LED design, so until recently one had to rely primarily on slower and more expensive empirical testing methods. One real-world consequence of this was the “green gap“; while reasonably efficient LED designs were available in the blue and red portions of the spectrum, there was not a suitable design that gave efficient output in the green spectrum. Given that many applications of LED lighting required white light that was balanced across all visible colors of the spectrum, this was a significant impediment to realizing the energy-saving potential of LEDs.

Here is where the landscape function comes in. This function started as a purely mathematical discovery: when solving a Schrödinger equation such as

\displaystyle -\Delta \phi + V \phi = E \phi

(where we have now suppressed all physical constants for simplicity), it turns out that the behavior of the eigenfunctions {\phi} at various energy levels {E} is controlled to a remarkable extent by the landscape function {u}, defined to be the solution to the equation

\displaystyle -\Delta u + V u = 1.

As discussed in this previous blog post (discussing a paper on this topic I wrote with some of the members of this collaboration), one reason for this is that the Schrödinger equation can be transformed after some routine calculations to

\displaystyle -\frac{1}{u^2} \mathrm{div}( u^2 \nabla (\phi/u)) + \frac{1}{u} (\phi/u) = E (\phi/u),

thus making {\frac{1}{u}} an effective potential for the Schrödinger equation (and {u^2} also being the coefficients of an effective geometry for the equation). In practice, when {V} is a disordered potential, the effective potential {1/u} tends to behave like a somewhat “smoothed out” or “homogenized” version of {V} that exhibits superior numerical performance. For instance, the classical Weyl law predicts (assuming a smooth confining potential {V}) that the density of states up to energy {E} – that is to say, the number of bound states up to {E} – should asymptotically behave like {\frac{1}{(2\pi)^2}|\{ (x,\xi): \xi^2 + V(x) \leq E\}|}. This is accurate at very high energies {E}, but when {V} is disordered, it tends to break down at low and medium energies. However, the landscape function makes a prediction {\frac{1}{(2\pi)^2}|\{ (x,\xi): \xi^2 + 1/u(x) \leq E\}|} for this density of states that is significantly more accurate in practice in these regimes, with a mathematical justification (up to multiplicative constants) of this accuracy obtained in this paper of David, Filoche, and Mayboroda. More refined predictions (again with some degree of theoretical support from mathematical analysis) can be made on the local integrated density of states, and with more work one can then also obtain approximations for the carrier density functions {n,p} mentioned previously in terms of the energy band level functions {E_c}, {E_v}. As the landscape function {u} is relatively easy to compute (coming from solving a single elliptic equation), this gives a very practical numerical way to carry out the iterative procedure described previously to model LEDs in a way that has proven to be both numerically accurate and significantly faster than empirical testing, leading to a much more rapid design cycle.
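
Here is a small one-dimensional Python illustration of this (a toy computation, not taken from the collaboration’s simulation codes): one computes the landscape function for a disordered potential and compares the exact eigenvalue count with the Weyl-type predictions based on the effective potential {1/u} and on the bare potential {V}; at low and medium energies the landscape-based count is expected to track the exact count more closely:

import numpy as np

rng = np.random.default_rng(1)
M, Lbox = 1500, 150.0
dx = Lbox / M
# disordered potential, piecewise constant on cells of unit width
V = np.repeat(4.0 * rng.random(int(Lbox)), M // int(Lbox))

# Dirichlet discretization of H = -d^2/dx^2 + V
H = (np.diag(2.0 / dx**2 + V) - np.diag(np.ones(M - 1), 1) / dx**2
     - np.diag(np.ones(M - 1), -1) / dx**2)

u = np.linalg.solve(H, np.ones(M))   # landscape function: (-d^2/dx^2 + V) u = 1
W = 1.0 / u                          # effective potential

evals = np.linalg.eigvalsh(H)
for E in (0.5, 1.0, 1.5, 2.0):
    exact = int(np.count_nonzero(evals <= E))
    # one-dimensional phase-space volumes (1/(2 pi)) |{(x, xi): xi^2 + potential(x) <= E}|
    pred_W = np.sum(2 * np.sqrt(np.maximum(E - W, 0))) * dx / (2 * np.pi)
    pred_V = np.sum(2 * np.sqrt(np.maximum(E - V, 0))) * dx / (2 * np.pi)
    print(E, exact, round(pred_W, 1), round(pred_V, 1))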

In particular, recent advances in LED technology have largely closed the “green gap” by introducing designs that incorporate “{V}-defects”: {V}-shaped dents in the semiconductor layers of the LED that create lateral carrier injection pathways and modify the internal electric field, enhancing hole transport into the active layer. The ability to accurately simulate the effects of these defects has allowed researchers to largely close this gap:

My understanding is that the major companies involved in developing LED lighting are now incorporating landscape-based methods into their own proprietary simulation models to achieve similar effects in commercially produced LEDs, which should lead to further energy savings in the near future.

Thanks to Svitlana Mayboroda and Marcel Filoche for detailed discussions, comments, and corrections of the material here.

Thursday, February 13th, 2025
LJ.Rossia.org makes no claim to the content supplied through this journal account. Articles are retrieved via a public feed supplied by the site for this purpose.
6:14 pm
Cosmic Distance Ladder videos with Grant Sanderson (3blue1brown): commentary and corrections

Grant Sanderson (who runs, and creates most of the content for, the website and Youtube channel 3blue1brown) has been collaborating with myself and others (including my coauthor Tanya Klowden) on producing a two-part video giving an account of some of the history of the cosmic distance ladder, building upon a previous public lecture I gave on this topic, and also relating to a forthcoming popular book with Tanya on this topic. The first part of this video is available here; the second part is available here.

The videos were based on a somewhat unscripted interview that Grant conducted with me some months ago, and as such contained some minor inaccuracies and omissions (including some made for editing reasons to keep the overall narrative coherent and within a reasonable length). They also generated many good questions from the viewers of the Youtube video. I am therefore compiling here a “FAQ” of various clarifications and corrections to the videos; this was originally placed as a series of comments on the Youtube channel, but the blog post format here will be easier to maintain going forward. Some related content will also be posted on the Instagram page for the forthcoming book with Tanya.

Questions on the two main videos are marked with an appropriate timestamp to the video.

Comments on part 1 of the video

  • 4:26 Did Eratosthenes really check a local well in Alexandria?

    This was a narrative embellishment on my part. Eratosthenes’s original work is lost to us. The most detailed contemporaneous account, by Cleomedes, gives a simplified version of the method, and makes reference only to sundials (gnomons) rather than wells. However, a secondary account by Pliny states (using this English translation), “Similarly it is reported that at the town of Syene, 5000 stades South of Alexandria, at noon in midsummer no shadow is cast, and that in a well made for the sake of testing this the light reaches to the bottom, clearly showing that the sun is vertically above that place at the time”. However, no mention is made of any well in Alexandria in either account.
  • 4:50 How did Eratosthenes know that the Sun was so far away that its light rays were close to parallel?

    This was not made so clear in our discussions or in the video (other than a brief glimpse of the timeline at 18:27), but Eratosthenes’s work actually came after Aristarchus, so it is very likely that Eratosthenes was aware of Aristarchus’s conclusions about how distant the Sun was from the Earth. Even if Aristarchus’s heliocentric model was disputed by the other Greeks, at least some of his other conclusions appear to have attracted some support. Also, after Eratosthenes’s time, there was further work by Greek, Indian, and Islamic astronomers (such as Hipparchus, Ptolemy, Aryabhata, and Al-Battani) to measure the same distances that Aristarchus did, although these subsequent measurements for the Sun also were somewhat far from modern accepted values.
  • 5:17 Is it completely accurate to say that on the summer solstice, the Earth’s axis of rotation is tilted “directly towards the Sun”?

    Strictly speaking, “in the direction towards the Sun” is more accurate than “directly towards the Sun”; it tilts at about 23.5 degrees towards the Sun, but it is not a total 90-degree tilt towards the Sun.
  • 5:39 Wait, aren’t there two tropics? The tropic of Cancer and the tropic of Capricorn?

    Yes! This corresponds to the two summers Earth experiences, one in the Northern hemisphere and one in the Southern hemisphere. The tropic of Cancer, at a latitude of about 23 degrees north, is where the Sun is directly overhead at noon during the Northern summer solstice (around June 21); the tropic of Capricorn, at a latitude of about 23 degrees south, is where the Sun is directly overhead at noon during the Southern summer solstice (around December 21). But Alexandria and Syene were both in the Northern Hemisphere, so it is the tropic of Cancer that is relevant to Eratosthenes’ calculations.
  • 5:41 Isn’t it kind of a massive coincidence that Syene was on the tropic of Cancer?

    Actually, Syene (now known as Aswan) was about half a degree of latitude away from the tropic of Cancer, which was one of the sources of inaccuracy in Eratosthenes’ calculations.  But one should take the “look-elsewhere effect” into account: because the Nile cuts across the tropic of Cancer, it was quite likely to happen that the Nile would intersect the tropic near some inhabited town.  It might not necessarily have been Syene, but that would just mean that Syene would have been substituted by this other town in Eratosthenes’s account.  

    On the other hand, it was fortunate that the Nile ran from North to South, so that distances between towns were a good proxy for the differences in latitude.  Apparently, Eratosthenes actually had a more complicated argument that would also work if the two towns in question were not necessarily oriented along the North-South direction, and if neither town was on the tropic of Cancer; but unfortunately the original writings of Eratosthenes are lost to us, and we do not know the details of this more general argument. (But some variants of the method can be found in later work of Posidonius, Aryabhata, and others.)

    Nowadays, the “Eratosthenes experiment” is run every year on the March equinox, in which schools at the same longitude are paired up to measure the elevation of the Sun at the same point in time, in order to obtain a measurement of the circumference of the Earth.  (The equinox is more convenient than the solstice when neither location is on a tropic, due to the simple motion of the Sun at that date.) With modern timekeeping, communications, surveying, and navigation, this is a far easier task to accomplish today than it was in Eratosthenes’ time.
  • 6:30 I thought the Earth wasn’t a perfect sphere. Does this affect this calculation?

    Yes, but only by a small amount. The centrifugal forces caused by the Earth’s rotation along its axis cause an equatorial bulge and a polar flattening so that the radius of the Earth fluctuates by about 20 kilometers from pole to equator. This sounds like a lot, but it is only about 0.3% of the mean Earth radius of 6371 km and is not the primary source of error in Eratosthenes’ calculations.
  • 7:27 Are the riverboat merchants and the “grad student” the leading theories for how Eratosthenes measured the distance from Alexandria to Syene?

    There is some recent research that suggests that Eratosthenes may have drawn on the work of professional bematists (step measurers – a precursor to the modern profession of surveyor) for this calculation. This somewhat ruins the “grad student” joke, but perhaps should be disclosed for the sake of completeness.
  • 8:51 How long is a “lunar month” in this context? Is it really 28 days?

    In this context the correct notion of a lunar month is a “synodic month” – the length of a lunar cycle relative to the Sun – which is actually about 29 days and 12 hours. It differs from the “sidereal month” – the length of a lunar cycle relative to the fixed stars – which is about 27 days and 8 hours – due to the motion of the Earth around the Sun (or the Sun around the Earth, in the geocentric model). [A similar correction needs to be made around 14:59, using the synodic month of 29 days and 12 hours rather than the “English lunar month” of 28 days (4 weeks).]
  • 10:47 Is the time taken for the Moon to complete an observed rotation around the Earth slightly less than 24 hours as claimed?

    Actually, I made a sign error: the lunar day (also known as a tidal day) is actually 24 hours and 50 minutes, because the Moon orbits the Earth in the same direction as the Earth spins around its axis. The animation therefore is also moving in the wrong direction (related to this, the line of sight covers up the Moon in the wrong direction as the Moon rises at around 10:38).
  • 14:49 I thought the sine function was introduced well after the ancient Greeks.

    It’s true that the modern sine function only dates back to the Indian and Islamic mathematical traditions in the first millennium CE, several centuries after Aristarchus.  However, he still had Euclidean geometry at his disposal, which provided tools such as similar triangles that could be used to reach basically the same conclusions, albeit with significantly more effort than would be needed if one could use modern trigonometry.

    On the other hand, Aristarchus was somewhat hampered by not knowing an accurate value for \pi, which is also known as Archimedes’ constant: the fundamental work of Archimedes on this constant actually took place a few decades after that of Aristarchus!
  • 15:17 I plugged in the modern values for the distances to the Sun and Moon and got 18 minutes for the discrepancy, instead of half an hour.

    Yes; I quoted the wrong number here. In 1630, Godfried Wendelen replicated Aristarchus’s experiment. With improved timekeeping and the then-recent invention of the telescope, Wendelen obtained a measurement of half an hour for the discrepancy, which is significantly better than Aristarchus’s calculation of six hours, but still a little bit off from the true value of 18 minutes. (As such, Wendelinus’s estimate for the distance to the Sun was 60% of the true value.)
  • 15:27 Wouldn’t Aristarchus also have access to other timekeeping devices than sundials?

    Yes, for instance clepsydrae (water clocks) were available by that time; but they were of limited accuracy. It is also possible that Aristarchus could have used measurements of star elevations to also estimate time; it is not clear whether the astrolabe or the armillary sphere was available to him, but he would have had some other more primitive astronomical instruments such as the dioptra at his disposal. But again, the accuracy and calibration of these timekeeping tools would have been poor.

    However, most likely the more important limiting factor was the ability to determine the precise moment at which a perfect half Moon (or new Moon, or full Moon) occurs; this is extremely difficult to do with the naked eye. (The telescope would not be invented for almost two more millennia.)
  • 17:37 Could the parallax problem be solved by assuming that the stars are not distributed in a three-dimensional space, but instead on a celestial sphere?

    Putting all the stars on a fixed sphere would make the parallax effects less visible, as the stars in a given portion of the sky would now all move together at the same apparent velocity – but there would still be visible large-scale distortions in the shape of the constellations because the Earth would be closer to some portions of the celestial sphere than others. (This problem would be solved if the celestial sphere was somehow centered around the moving Earth rather than the fixed Sun, but then this basically becomes the geocentric model with extra steps.)
  • 18:29 Did nothing of note happen in astronomy between Eratosthenes and Kepler?

    Not at all! There were significant mathematical, technological, theoretical, and observational advances by astronomers from many cultures (Greek, Islamic, Indian, Chinese, European, and others) during this time, for instance improving some of the previous measurements on the distance ladder, a better understanding of eclipses, axial tilt, and even axial precession, more sophisticated trigonometry, and the development of new astronomical tools such as the astrolabe. See for instance this “deleted scene” from the video, as well as the FAQ entry for 14:49. But in order to make the overall story of the cosmic distance ladder fit into a two-part video, we chose to focus primarily on the first time each rung of the ladder was climbed.
  • 18:30 Is that really Kepler’s portrait?

    We have since learned that this portrait was most likely painted in the 19th century, and may have been based more on Kepler’s mentor, Michael Mästlin. A more commonly accepted portrait of Kepler may be found at his current Wikipedia page.
  • 19:07 Isn’t it tautological to say that the Earth takes one year to perform a full orbit around the Sun?

    Technically yes, but this is an illustration of the philosophical concept of “referential opacity“: the content of a sentence can change when substituting one term for another (e.g., “1 year” and “365 days”), even when both terms refer to the same object. Amusingly, the classic illustration of this, known as Frege’s puzzles, also comes from astronomy: it is an informative statement that Hesperus (the evening star) and Phosphorus (the morning star, also known as Lucifer) are the same object (which nowadays we call Venus), but it is a mere tautology that Hesperus and Hesperus are the same object: changing the reference from Phosphorus to Hesperus changes the meaning.
  • 19:10 How did Copernicus figure out the crucial fact that Mars takes 687 days to go around the Sun? Was it directly drawn from Babylonian data?

    Technically, Copernicus drew from tables by European astronomers that were largely based on earlier tables from the Islamic golden age, which in turn drew from earlier tables by Indian and Greek astronomers, the latter of which also incorporated data from the ancient Babylonians, so it is more accurate to say that Copernicus relied on centuries of data, at least some of which went all the way back to the Babylonians. Among all of this data were the times when Mars was in opposition to the Sun; if one imagines the Earth and Mars as being like runners going around a race track circling the Sun, with Earth on an inner track and Mars on an outer track, oppositions are analogous to when the Earth runner “laps” the Mars runner. From the centuries of observational data, such “laps” were known to occur about once every 780 days (this is known as the synodic period of Mars). Because the Earth takes roughly 365 days to perform a “lap”, it is possible to do a little math and conclude that Mars must therefore complete its own “lap” in 687 days (this is known as the sidereal period of Mars). (See also this post on the cosmic distance ladder Instagram for some further elaboration.) A one-line version of this computation is written out at the end of this list of comments.
  • 20:52 Did Kepler really steal data from Brahe?

    The situation is complex. When Kepler served as Brahe’s assistant, Brahe only provided Kepler with a limited amount of data, primarily involving Mars, in order to confirm Brahe’s own geo-heliocentric model. After Brahe’s death, the data was inherited by Brahe’s son-in-law and other relatives, who intended to publish Brahe’s work separately; however, Kepler, who was appointed as Imperial Mathematician to succeed Brahe, had at least some partial access to the data, and many historians believe he secretly copied portions of this data to aid his own research before finally securing complete access to the data from Brahe’s heirs after several years of disputes. On the other hand, as intellectual property rights laws were not well developed at this time, Kepler’s actions were technically legal, if ethically questionable.
  • 21:39 What is that funny loop in the orbit of Mars?

    This is known as retrograde motion. This arises because the orbital velocity of Earth (about 30 km/sec) is a little bit larger than that of Mars (about 24 km/sec). So, in opposition (when Mars is in the opposite position in the sky than the Sun), Earth will briefly overtake Mars, causing its observed position to move westward rather than eastward. But in most other times, the motion of Earth and Mars are at a sufficient angle that Mars will continue its apparent eastward motion despite the slightly faster speed of the Earth.
  • 21:59 Couldn’t one also work out the direction to other celestial objects in addition to the Sun and Mars, such as the stars, the Moon, or the other planets?  Would that have helped?

    Actually, the directions to the fixed stars were implicitly used in all of these observations to determine how the celestial sphere was positioned, and all the other directions were taken relative to that celestial sphere.  (Otherwise, all the calculations would be taken on a rotating frame of reference in which the unknown orbits of the planets were themselves rotating, which would have been an even more complex task.)  But the stars are too far away to be useful as one of the two landmarks to triangulate from, as they generate almost no parallax and so cannot distinguish one location from another.

    Measuring the direction to the Moon would tell you which portion of the lunar cycle one was in, and would determine the phase of the Moon, but this information would not help one triangulate, because the Moon’s position in the heliocentric model varies over time in a somewhat complicated fashion, and is too tied to the motion of the Earth to be a useful “landmark” to one to determine the Earth’s orbit around the Sun.

    In principle, using the measurements to all the planets at once could allow for some multidimensional analysis that would be more accurate than analyzing each of the planets separately, but this would require some sophisticated statistical analysis and modeling, as well as non-trivial amounts of compute – neither of which were available in Kepler’s time.
  • 22:57 Can you elaborate on how we know that the planets all move on a plane?

    The Earth’s orbit lies in a plane known as the ecliptic (it is where the lunar and solar eclipses occur). Different cultures have divided up the ecliptic in various ways; in Western astrology, for instance, the twelve main constellations that cross the ecliptic are known as the Zodiac. The planets can be observed to only wander along the Zodiac, but not other constellations: for instance, Mars can be observed to be in Cancer or Libra, but never in Orion or Ursa Major. From this, one can conclude (as a first approximation, at least), that the planets all lie on the ecliptic.

    However, this isn’t perfectly true, and the planets will deviate from the ecliptic by a small angle known as the ecliptic latitude. Tycho Brahe’s observations on these latitudes for Mars were an additional useful piece of data that helped Kepler complete his calculations (basically by suggesting how to join together the different “jigsaw pieces”), but the math here gets somewhat complicated, so the story here has been somewhat simplified to convey the main ideas.
  • 23:28 Can one work out the position of Earth from fixed locations of the Sun and Mars when the Sun and Mars are in conjunction (the same location in the sky) or opposition (opposite locations in the sky)?

    Technically, these are two times when the technique of triangulation fails to be accurate; and also in the former case it is extremely difficult to observe Mars due to the proximity to the Sun. But again, following the Universal Problem Solving Tip from 23:07, one should initially ignore these difficulties to locate a viable method, and correct for these issues later.
  • 24:04 So Kepler used Copernicus’s calculation of 687 days for the period of Mars. But didn’t Kepler discard Copernicus’s theory of circular orbits?

    Good question! It turns out that Copernicus’s calculations of orbital periods are quite robust (especially with centuries of data), and continue to work even when the orbits are not perfectly circular. But even if the calculations did depend on the circular orbit hypothesis, it would have been possible to use the Copernican model as a first approximation for the period, in order to get a better, but still approximate, description of the orbits of the planets. This in turn can be fed back into the Copernican calculations to give a second approximation to the period, which can then give a further refinement of the orbits. Thanks to the branch of mathematics known as perturbation theory, one can often make this type of iterative process converge to an exact answer, with the error in each successive approximation being smaller than the previous one. (But performing such an iteration would probably have been beyond the computational resources available in Kepler’s time; also, the foundations of perturbation theory require calculus, which only was developed several decades after Kepler.)
  • 24:21 Did Brahe have exactly 10 years of data on Mars’s positions?

    Actually, it was more like 17 years, but with many gaps, due both to inclement weather, as well as Brahe turning his attention to other astronomical objects than Mars in some years; also, in times of conjunction, Mars might only be visible in the daytime sky instead of the night sky, again complicating measurements. So the “jigsaw puzzle pieces” in 25:26 are in fact more complicated than always just five locations equally spaced in time; there are gaps and also observational errors to grapple with. But to understand the method one should ignore these complications; again, see “Universal Problem Solving Tip #1”. Even with his “idea of true genius” (which, incidentally one can find in Einstein’s introduction to Carola Baumgardt’s “Life of Kepler“), it took many years of further painstaking calculation for Kepler to tease out his laws of planetary motion from Brahe’s messy and incomplete observational data.
  • 26:44 Shouldn’t the Earth’s orbit be spread out at perihelion and clustered closer together at aphelion, to be consistent with Kepler’s laws?

    Yes, you are right; there was a coding error here.

Comments on the “deleted scene” on Al-Biruni

  • Was Al-Biruni really of Arab origin?

    Strictly speaking, no: his writings are all in Arabic, and he was nominally a subject of the Abbasid Caliphate, whose rulers were Arab; but he was born in Khwarazm (in modern-day Uzbekistan), and would have been a subject of either the Samanid empire or the Khwarazmian empire, both of which were largely self-governed and primarily Persian in culture and ethnic makeup, despite being technically vassals of the Caliphate. So he would have been part of what is sometimes called “Greater Persia” or “Greater Iran”.

    Another minor correction: while Al-Biruni was born in the tenth century, his work on the measurement of the Earth was published in the early eleventh century.
  • Is \theta really called the angle of declination?

    This was a misnomer on my part; this angle is more commonly called the dip angle.
  • But the height of the mountain would be so small compared to the radius of the Earth! How could this method work?

    Using the Taylor approximation \cos \theta \approx 1 - \theta^2/2, one can approximately write the relationship R = \frac{h \cos \theta}{1-\cos \theta} between the mountain height h, the Earth radius R, and the dip angle \theta (in radians) as R \approx 2 h / \theta^2. The key point here is the inverse quadratic dependence on \theta, which allows even relatively small values of h to still be realistically useful for computing R. Al-Biruni’s measurement of the dip angle \theta was about 0.01 radians, leading to an estimate of R that is about four orders of magnitude larger than h; this is in the right ballpark, given that a typical mountain height is on the order of a kilometer and the radius of the Earth is about 6400 kilometers. A quick numerical check of the approximation is given below.
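
    The following snippet compares the exact formula with the approximation R \approx 2h/\theta^2, using illustrative placeholder values of h and \theta (not Al-Biruni’s recorded measurements):

```python
import math

h = 0.5        # mountain height in km (illustrative placeholder)
theta = 0.01   # dip angle in radians (roughly the size quoted above)

R_exact  = h * math.cos(theta) / (1 - math.cos(theta))
R_approx = 2 * h / theta**2

print(f"exact formula: R = {R_exact:7.0f} km")
print(f"approximation: R = {R_approx:7.0f} km")
# both are around 10^4 km, the right order of magnitude for the Earth's radius
```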
  • Was the method really accurate to within a percentage point?

    This is disputed, somewhat similarly to the previous calculations of Eratosthenes. Al-Biruni’s measurements were in cubits, but there were multiple incompatible types of cubit in use at the time. It has also been pointed out that atmospheric refraction effects would have created noticeable changes in the observed dip angle \theta. It is thus likely that the true accuracy of Al-Biruni’s method was poorer than 1%, but that this was somehow compensated for by choosing a favorable conversion between cubits and modern units.

Comments on the second part of the video

  • 8:53 What does it mean for the transit of Io to be “twenty minutes ahead of schedule” when Jupiter is in opposition (Jupiter is opposite to the Sun when viewed from the Earth)?

    Actually, it should be halved to “ten minutes ahead of schedule”, with the transit being “ten minutes behind schedule” when Jupiter is in conjunction, with the net discrepancy being twenty minutes (or actually closer to 16 minutes when measured with modern technology). Both transits are being compared against an idealized periodic schedule in which the transits are occurring at a perfectly regular rate (about 42 hours), where the period is chosen to be the best fit to the actual data. This discrepancy is only noticeable after carefully comparing transit times over a period of months; at any given position of Jupiter, the Doppler effects of Earth moving towards or away from Jupiter would only affect the period between consecutive transits by just a few seconds, with the delays or accelerations only becoming cumulatively noticeable after many such transits.

    Also, the presentation here is oversimplified: at times of conjunction, Jupiter and Io are too close to the Sun for observation of the transit. Rømer actually observed the transits at other times than conjunction, and Huygens used more complicated trigonometry than what was presented here to infer a measurement for the speed of light in terms of the astronomical unit (which they had begun to measure a bit more accurately than in Aristarchus’s time; see the FAQ entry for 15:17 in the first video).
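
    As a back-of-envelope check of these figures: the roughly sixteen-minute cumulative discrepancy corresponds to light crossing the diameter of Earth’s orbit, about two astronomical units, which (using the modern value of the astronomical unit) already gives the right order of magnitude for the speed of light.

```python
AU = 1.496e11            # astronomical unit in metres (modern value)
delay = 16.6 * 60.0      # cumulative "ahead vs behind schedule" discrepancy, in seconds

# The discrepancy corresponds to light crossing the diameter of Earth's orbit,
# i.e. roughly 2 AU, giving a crude estimate of the speed of light:
c_estimate = 2 * AU / delay
print(f"estimated speed of light: {c_estimate:.2e} m/s")   # about 3e8 m/s
```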
  • 10:05 Are the astrological signs for Earth and Venus swapped here?

    Yes, this was a small mistake in the animation.
  • 18:19 How did astronomers know that the Milky Way was only a small portion of the entire universe?

    This issue was known as the “Great Debate” in the early twentieth century. It was only with the work of Leavitt measuring Cepheids in the Magellanic Clouds and in “spiral nebulae” (which we now know to be other galaxies), together with Hubble’s further analysis of distances to these Cepheids, that it was conclusively established that these clouds and nebulae were in fact at much greater distances than the diameter of the Milky Way.
  • 20:46 Does general relativity alone predict a uniformly expanding universe?

    This was an oversimplification. Einstein’s equations of general relativity contain a parameter \Lambda, known as the cosmological constant, which currently is only computable from experimental data. But even with this constant fixed, there are multiple solutions to these equations (basically because there are multiple possible initial conditions for the universe). For the purposes of cosmology, a particularly successful family of solutions are the solutions given by the Lambda-CDM model. This family of solutions contains additional parameters, such as the density of dark matter in the universe. Depending on the precise values of these parameters, the universe could be expanding or contracting, with the rate of expansion or contraction either increasing, decreasing, or staying roughly constant. But if one fits this model to all available data (including not just red shift measurements, but also measurements on the cosmic microwave background radiation and the spatial distribution of galaxies), one deduces a version of Hubble’s law which is nearly linear, but with an additional correction at very large scales; see the next item of this FAQ.
  • 21:07 Is Hubble’s original law sufficiently accurate to allow for good measurements of distances at the scale of the observable universe?

    Not really; as mentioned in the end of the video, there were additional efforts to cross-check and calibrate Hubble’s law at intermediate scales between the range of Cepheid methods (about 100 million light years) and observable universe scales (about 100 billion light years) by using further “standard candles” than Cepheids, most notably Type Ia supernovae (which are bright enough and predictable enough to be usable out to about 10 billion light years), the Tully-Fisher relation between the luminosity of a galaxy and its rotational speed, and gamma ray bursts. It turns out that due to the accelerating nature of the universe’s expansion, Hubble’s law is not completely linear at these large scales; this important correction cannot be discerned purely from Cepheid data, but also requires the other standard candles, as well as fitting that data (as well as other observational data, such as the cosmic microwave background radiation) to the cosmological models provided by general relativity (with the best fitting models to date being some version of the Lambda-CDM model).
  • 21:15 Where did this guess of the observable universe being about 20% of the full universe come from?

    There are some ways to get a lower bound on the size of the entire universe that go beyond the edge of the observable universe. One is through analysis of the cosmic microwave background radiation (CMBR), that has been carefully mapped out by several satellite observatories, most notably WMAP and Planck. Roughly speaking, a universe that was less than twice the size of the observable universe would create certain periodicities in the CMBR data; such periodicities are not observed, so this provides a lower bound (see for instance this paper for an example of such a calculation). The 20% number was a guess based on my vague recollection of these works, but there is no consensus currently on what the ratio truly is; there are some proposals that the entire universe is in fact several orders of magnitude larger than the observable one.
  • 23:49 Where can I learn more about this 10% discrepancy in Hubble’s law?

    This is known as the Hubble tension: roughly speaking, the various measurements of Hubble’s constant (either from climbing the cosmic distance ladder, or by fitting various observational data to standard cosmological models) tend to arrive at one of two values, that are about 10% apart from each other. The values based on gravitational wave observations are currently consistent with both values, due to significant error bars in this extremely sensitive method; but other more mature methods are now of sufficient accuracy that they are basically only consistent with one of the two values. Currently there is no consensus on the origin of this tension: possibilities include systemic biases in the observational data, subtle statistical issues with the methodology used to interpret the data, a correction to the standard cosmological model, the influence of some previously undiscovered law of physics, or some partial breakdown of the Copernican principle.
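
    To get a feel for the size of the tension, here is a small illustrative comparison using rounded versions of the two commonly quoted values of the Hubble constant (roughly 67 and 73 km/s/Mpc); the numbers below are for illustration only.

```python
H0_cmb    = 67.4   # km/s/Mpc, roughly the value favoured by CMB + Lambda-CDM fits
H0_ladder = 73.0   # km/s/Mpc, roughly the value favoured by the distance ladder

print(f"relative discrepancy: {(H0_ladder - H0_cmb) / H0_cmb:.1%}")

# For a galaxy receding at 7000 km/s, the inferred distance d = v / H0 shifts
# by the same ~8-10%:
v = 7000.0   # km/s
for H0 in (H0_cmb, H0_ladder):
    print(f"H0 = {H0:5.1f} km/s/Mpc  =>  d = {v / H0:6.1f} Mpc")
```
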
Wednesday, January 29th, 2025
3:46 am
New exponent pairs, zero density estimates, and zero additive energy estimates: a systematic approach

Timothy Trudgian, Andrew Yang and I have just uploaded to the arXiv the paper “New exponent pairs, zero density estimates, and zero additive energy estimates: a systematic approach“. This paper launches a project envisioned in this previous blog post, in which the (widely dispersed) literature on various exponents in classical analytic number theory, as well as the relationships between these exponents, are collected in a living database, together with computer code to optimize the relations between them, with one eventual goal being to automate as much as possible the “routine” components of many analytic number theory papers, in which progress on one type of exponent is converted via standard arguments to progress on other exponents.

The database we are launching concurrently with this paper is called the Analytic Number Theory Exponent Database (ANTEDB). This Github repository aims to collect the latest results and relations on exponents such as the following:

  • The growth exponent {\mu(\sigma)} of the Riemann zeta function {\zeta(\sigma+it)} at real part {\sigma} (i.e., the best exponent for which {\zeta(\sigma+it) \ll |t|^{\mu(\sigma)+o(1)}} as {t \rightarrow \infty});
  • Exponent pairs {(k,\ell)} (used to bound exponential sums {\sum_{n \in I} e(T F(n/N))} for various phase functions {F} and parameters {T,N});
  • Zero density exponents {A(\sigma)} (used to bound the number of zeros of {\zeta} of real part larger than {\sigma});
etc. These sorts of exponents are related to many topics in analytic number theory; for instance, the Lindelöf hypothesis is equivalent to the assertion {\mu(1/2)=0}.
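
As a toy illustration of the sort of mechanical bookkeeping such a database enables (this sketch is not the ANTEDB code itself), one can start from the trivial exponent pair {(0,1)}, repeatedly apply the classical van der Corput {A}- and {B}-processes, and record the bound on {\mu(1/2)} that each resulting pair implies via the classical implication that an exponent pair {(k,\ell)} gives {\mu(1/2) \leq (k+\ell-1/2)/2}:

```python
from fractions import Fraction
from itertools import product

# Toy exploration (not the ANTEDB code): generate exponent pairs from the trivial
# pair (0, 1) via the van der Corput A- and B-processes, and record the bound on
# mu(1/2) implied by each pair.

def A(k, l):
    return (k / (2 * k + 2), (k + l + 1) / (2 * k + 2))

def B(k, l):
    return (l - Fraction(1, 2), k + Fraction(1, 2))

def mu_half_bound(k, l):
    # an exponent pair (k, l) implies mu(1/2) <= (k + l - 1/2) / 2
    return (k + l - Fraction(1, 2)) / 2

pairs = {(Fraction(0), Fraction(1))}
for _ in range(4):                  # four rounds of applying A and B to everything
    pairs |= {f(k, l) for f, (k, l) in product((A, B), pairs)}

best = min(pairs, key=lambda kl: mu_half_bound(*kl))
print(f"pairs found: {len(pairs)}")
print(f"best pair: ({best[0]}, {best[1]})  =>  mu(1/2) <= {mu_half_bound(*best)}")
# best pair found this way: (1/6, 2/3), recovering mu(1/2) <= 1/6
```

Iterating these two processes alone only recovers the classical van der Corput bound {\mu(1/2) \leq 1/6}; the point of the database is to combine many more known pairs and relations in the same mechanical fashion.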

Information on these exponents is collected both in a LaTeX “blueprint” that is available as a human-readable set of web pages, and as part of our Python codebase. In the future one could also imagine the data being collected in a Lean formalization, but at present the database only contains a placeholder Lean folder.

As a consequence of collecting all the known bounds in the literature on these sorts of exponents, as well as abstracting out various relations between these exponents that were implicit in many papers in this subject, we were then able to run computer-assisted searches to improve some of the state of the art on these exponents in a largely automated fashion (without introducing any substantial new inputs from analytic number theory). In particular, we obtained:

  • four new exponent pairs;
  • several new zero density estimates; and
  • new estimates on the additive energy of zeroes of the Riemann zeta function.

We are hoping that the ANTEDB will receive more contributions in the future, for instance expanding it to other types of exponents, or updating the database as new results are obtained (or as older ones are added). In the longer term one could also imagine integrating the ANTEDB with other tools, such as Lean or AI systems, but for now we have focused primarily on collecting the data and optimizing the relations between the exponents.

Thursday, December 19th, 2024
5:14 pm
Quaternions and spherical trigonometry

Hamilton’s quaternion number system {\mathbb{H}} is a non-commutative extension of the complex numbers, consisting of numbers of the form {t + xi + yj + zk} where {t,x,y,z} are real numbers, and {i,j,k} are anti-commuting square roots of {-1} with {ij=k}, {jk=i}, {ki=j}. While they are non-commutative, they do keep many other properties of the complex numbers:

  • Being non-commutative, the quaternions do not form a field. However, they are still a skew field (or division ring): multiplication is associative, and every non-zero quaternion has a unique multiplicative inverse.
  • Like the complex numbers, the quaternions have a conjugation

    \displaystyle  \overline{t+xi+yj+zk} := t-xi-yj-zk,

    although this is now an antihomomorphism rather than a homomorphism: {\overline{qr} = \overline{r}\ \overline{q}}. One can then split up a quaternion {t + xi + yj + zk} into its real part {t} and imaginary part {xi+yj+zk} by the familiar formulae

    \displaystyle  \mathrm{Re} q := \frac{q + \overline{q}}{2}; \quad \mathrm{Im} q := \frac{q - \overline{q}}{2}

    (though we now leave the imaginary part purely imaginary, as opposed to dividing by {i} in the complex case).
  • The inner product

    \displaystyle  \langle q, r \rangle := \mathrm{Re} q \overline{r}

    is symmetric and positive definite (with {1,i,j,k} forming an orthonormal basis). Also, for any {q}, {q \overline{q}} is real, hence equal to {\langle q, q \rangle}. Thus we have a norm

    \displaystyle  |q| = \sqrt{q\overline{q}} = \sqrt{\langle q,q \rangle} = \sqrt{t^2 + x^2 + y^2 + z^2}.

    Since the real numbers commute with all quaternions, we have the multiplicative property {|qr| = |q| |r|}. In particular, the unit quaternions {U(1,\mathbb{H}) := \{ q \in \mathbb{H}: |q|=1\}} (also known as {SU(2)}, {Sp(1)}, or {Spin(3)}) form a compact group.
  • We have the cyclic trace property

    \displaystyle  \mathrm{Re}(qr) = \mathrm{Re}(rq)

    which allows one to take adjoints of left and right multiplication:

    \displaystyle  \langle qr, s \rangle = \langle q, s\overline{r}\rangle; \quad \langle rq, s \rangle = \langle q, \overline{r}s \rangle

  • As {i,j,k} are square roots of {-1}, we have the usual Euler formulae

    \displaystyle  e^{i\theta} = \cos \theta + i \sin \theta, e^{j\theta} = \cos \theta + j \sin \theta, e^{k\theta} = \cos \theta + k \sin \theta

    for real {\theta}, together with other familiar formulae such as {\overline{e^{i\theta}} = e^{-i\theta}}, {e^{i(\alpha+\beta)} = e^{i\alpha} e^{i\beta}}, {|e^{i\theta}| = 1}, etc.
We will use these sorts of algebraic manipulations in the sequel without further comment.

The unit quaternions {U(1,\mathbb{H}) = \{ q \in \mathbb{H}: |q|=1\}} act on the imaginary quaternions {\{ xi + yj + zk: x,y,z \in {\bf R}\} \equiv {\bf R}^3} by conjugation:

\displaystyle  v \mapsto q v \overline{q}.

This action is by orientation-preserving isometries, hence by rotations. It is not quite faithful, since conjugation by the unit quaternion {-1} is the identity, but one can show that this is the only loss of faithfulness, reflecting the well known fact that {U(1,\mathbb{H}) \equiv SU(2)} is a double cover of {SO(3)}.

For instance, for any real {\theta}, conjugation by {e^{i\theta/2} = \cos(\theta/2) + i \sin(\theta/2)} is a rotation by {\theta} around {i}:

\displaystyle  e^{i\theta/2} i e^{-i\theta/2} = i \ \ \ \ \ (1)

\displaystyle  e^{i\theta/2} j e^{-i\theta/2} = \cos(\theta) j + \sin(\theta) k \ \ \ \ \ (2)

\displaystyle  e^{i\theta/2} k e^{-i\theta/2} = \cos(\theta) k - \sin(\theta) j. \ \ \ \ \ (3)

Similarly for cyclic permutations of {i,j,k}. The doubling of the angle here can be explained from the Lie algebra fact that {[i,j]=ij-ji} is {2k} rather than {k}; it is also closely related to the aforementioned double cover. We also of course have {U(1,\mathbb{H})\equiv Spin(3)} acting on {\mathbb{H}} by left multiplication; this is known as the spinor representation, but will not be utilized much in this post. (Giving {\mathbb{H}} the right action of {{\bf C}} makes it a copy of {{\bf C}^2}, and the spinor representation then also becomes the standard representation of {SU(2)} on {{\bf C}^2}.)
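
As a quick numerical sanity check of these conjugation formulae (and of the angle doubling), here is a minimal sketch using a hand-rolled Hamilton product; no quaternion library is assumed.

```python
import math

def qmul(p, q):
    # Hamilton product of quaternions represented as (t, x, y, z), with ij = k.
    t1, x1, y1, z1 = p
    t2, x2, y2, z2 = q
    return (t1*t2 - x1*x2 - y1*y2 - z1*z2,
            t1*x2 + x1*t2 + y1*z2 - z1*y2,
            t1*y2 - x1*z2 + y1*t2 + z1*x2,
            t1*z2 + x1*y2 - y1*x2 + z1*t2)

def conj(q):
    t, x, y, z = q
    return (t, -x, -y, -z)

def rotate(q, v):
    # the conjugation action v -> q v q-bar on imaginary quaternions
    return qmul(qmul(q, v), conj(q))

theta = 0.7
q = (math.cos(theta / 2), math.sin(theta / 2), 0.0, 0.0)   # e^{i theta/2}
i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

print(rotate(q, i))   # i is fixed
print(rotate(q, j))   # (0, 0, cos theta, sin theta), i.e. cos(theta) j + sin(theta) k
print(rotate(q, k))   # (0, 0, -sin theta, cos theta), i.e. cos(theta) k - sin(theta) j
```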

Given how quaternions relate to three-dimensional rotations, it is not surprising that they can also be used to recover the basic laws of spherical trigonometry – the study of spherical triangles on the unit sphere. This is fairly well known, but it took a little effort for me to locate the required arguments, so I am recording the calculations here.

The first observation is that every unit quaternion {q} induces a unit tangent vector {qj\overline{q}} on the unit sphere {S^2 \subset {\bf R}^3}, located at {qi\overline{q} \in S^2}; the third unit vector {qk\overline{q}} is then another tangent vector orthogonal to the first two (and oriented to the left of the original tangent vector), and can be viewed as the cross product of {qi\overline{q} \in S^2} and {qj\overline{q} \in S^2}. Right multiplication of this quaternion then corresponds to various natural operations on this unit tangent vector:

  • Right multiplying {q} by {e^{i\theta/2}} does not affect the location {qi\overline{q}} of the tangent vector, but rotates the tangent vector {qj\overline{q}} anticlockwise by {\theta} in the direction of the orthogonal tangent vector {qk\overline{q}}, as it replaces {qj\overline{q}} by {\cos(\theta) qj\overline{q} + \sin(\theta) qk\overline{q}}.
  • Right multiplying {q} by {e^{k\theta/2}} advances the tangent vector by geodesic flow by angle {\theta}, as it replaces {qi\overline{q}} by {\cos(\theta) qi\overline{q} + \sin(\theta) qj\overline{q}}, and replaces {qj\overline{q}} by {\cos(\theta) qj\overline{q} - \sin(\theta) qi\overline{q}}.

Now suppose one has a spherical triangle with vertices {A,B,C}, with the spherical arcs {AB, BC, CA} subtending angles {c, a, b} respectively, and the vertices {A,B,C} subtending angles {\alpha,\beta,\gamma} respectively; suppose also that {ABC} is oriented in an anti-clockwise direction for sake of discussion. Observe that if one starts at {A} with a tangent vector oriented towards {B}, advances that vector by {c}, and then rotates by {\pi - \beta}, the tangent vector is now at {B} and pointing towards {C}. If one advances by {a} and rotates by {\pi - \gamma}, one is now at {C} pointing towards {A}; and if one then advances by {b} and rotates by {\pi - \alpha}, one is back at {A} pointing towards {B}. This gives the fundamental relation

\displaystyle  e^{kc/2} e^{i(\pi-\beta)/2} e^{ka/2} e^{i(\pi-\gamma)/2} e^{kb/2} e^{i(\pi-\alpha)/2} = -1 \ \ \ \ \ (4)

relating the three sides and three angles of this triangle. (A priori, due to the lack of faithfulness of the {U(1,\mathbb{H})} action, the right-hand side could conceivably have been {+1} rather than {-1}; but for extremely small triangles the exterior angles {\pi-\alpha}, {\pi-\beta}, {\pi-\gamma} sum to approximately {2\pi}, so the product on the left-hand side degenerates to a rotation by a full turn, which lifts to {-1} in the unit quaternions, and by continuity the right-hand side must then equal {-1} for all triangles.) Indeed, a moment’s thought will reveal that the condition (4) is necessary and sufficient for the data {a,b,c,\alpha,\beta,\gamma} to be associated with a spherical triangle. Thus one can view (4) as a “master equation” for spherical trigonometry: in principle, it can be used to derive all the other laws of this subject. A numerical sanity check of (4) on a simple example is given below.
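
As an illustrative numerical sanity check of (4), one can take the octant triangle with vertices {i, j, k}, for which all sides and all angles are equal to {\pi/2}; multiplying out the six factors with a hand-rolled Hamilton product indeed returns {-1}:

```python
import math

def qmul(p, q):
    # Hamilton product, quaternions as (t, x, y, z) with ij = k.
    t1, x1, y1, z1 = p
    t2, x2, y2, z2 = q
    return (t1*t2 - x1*x2 - y1*y2 - z1*z2,
            t1*x2 + x1*t2 + y1*z2 - z1*y2,
            t1*y2 - x1*z2 + y1*t2 + z1*x2,
            t1*z2 + x1*y2 - y1*x2 + z1*t2)

def exp_i(phi):   # e^{i phi}
    return (math.cos(phi), math.sin(phi), 0.0, 0.0)

def exp_k(phi):   # e^{k phi}
    return (math.cos(phi), 0.0, 0.0, math.sin(phi))

# Octant triangle: a = b = c = pi/2 and alpha = beta = gamma = pi/2.
a = b = c = math.pi / 2
alpha = beta = gamma = math.pi / 2

factors = [exp_k(c / 2), exp_i((math.pi - beta) / 2),
           exp_k(a / 2), exp_i((math.pi - gamma) / 2),
           exp_k(b / 2), exp_i((math.pi - alpha) / 2)]
prod = (1.0, 0.0, 0.0, 0.0)
for f in factors:
    prod = qmul(prod, f)
print(prod)   # approximately (-1, 0, 0, 0), matching the right-hand side of (4)
```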

Remark 1 The law (4) has an evident symmetry {(a,b,c,\alpha,\beta,\gamma) \mapsto (\pi-\alpha,\pi-\beta,\pi-\gamma,\pi-a,\pi-b,\pi-c)}, which corresponds to the operation of replacing a spherical triangle with its dual triangle. Also, there is nothing particularly special about the choice of imaginaries {i,k} in (4); one can conjugate (4) by various quaternions and replace {i,k} here by any other orthogonal pair of unit quaternions.

Remark 2 If we work in the small scale regime, replacing {a,b,c} by {\varepsilon a, \varepsilon b, \varepsilon c} for some small {\varepsilon>0}, then we expect spherical triangles to behave like Euclidean triangles. Indeed, (4) to zeroth order becomes

\displaystyle  e^{i(\pi-\beta)/2} e^{i(\pi-\gamma)/2} e^{i(\pi-\alpha)/2} = -1

which reflects the classical fact that the sum of angles of a Euclidean triangle is equal to {\pi}. To first order, one obtains

\displaystyle  c + a e^{i(\pi-\gamma)} e^{i(\pi-\alpha)} + b e^{i(\pi-\alpha)} = 0

which reflects the evident fact that the sides of a Euclidean triangle, viewed as vectors, sum to zero. (Geometrically, this correspondence reflects the fact that the action of the (projective) quaternion group on the unit sphere converges to the action of the special Euclidean group {SE(2)} on the plane, in a suitable asymptotic limit.)

The identity (4) is an identity of two unit quaternions; as the unit quaternion group {U(1,\mathbb{H})} is three-dimensional, it thus imposes three independent constraints on the six real parameters {a,b,c,\alpha,\beta,\gamma}.

Tuesday, December 17th, 2024
4:57 pm
On the distribution of eigenvalues of GUE and its minors at fixed index

I’ve just uploaded to the arXiv the paper “On the distribution of eigenvalues of GUE and its minors at fixed index“. This is a somewhat technical paper establishing some estimates regarding one of the most well-studied random matrix models, the Gaussian Unitary Ensemble (GUE), that were not previously in the literature, but which will be needed for some forthcoming work of Hariharan Narayanan on the limiting behavior of “hives” with GUE boundary conditions (building upon our previous joint work with Sheffield).

For sake of discussion we normalize the GUE model to be the random {N \times N} Hermitian matrix {H} whose probability density function is proportional to {e^{-\mathrm{tr} H^2}}. With this normalization, the famous Wigner semicircle law will tell us that the eigenvalues {\lambda_1 \leq \dots \leq \lambda_N} of this matrix will almost all lie in the interval {[-\sqrt{2N}, \sqrt{2N}]}, and after dividing by {\sqrt{2N}}, will asymptotically be distributed according to the semicircle distribution

\displaystyle  \rho_{\mathrm{sc}}(x) := \frac{2}{\pi} (1-x^2)_+^{1/2}.

In particular, the normalized {i^{th}} eigenvalue {\lambda_i/\sqrt{2N}} should be close to the classical location {\gamma_{i/N}}, where {\gamma_{i/N}} is the unique element of {[-1,1]} such that

\displaystyle  \int_{-\infty}^{\gamma_{i/N}} \rho_{\mathrm{sc}}(x)\ dx = \frac{i}{N}.

Eigenvalues can be described by their index {i} or by their (normalized) energy {\lambda_i/\sqrt{2N}}. In principle, the two descriptions are related by the classical map {i \mapsto \gamma_{i/N}} defined above, but there are microscopic fluctuations from the classical location that create subtle technical difficulties between “fixed index” results in which one focuses on a single index {i} (and neighboring indices {i+1, i-1}, etc.), and “fixed energy” results in which one focuses on a single energy {x} (and eigenvalues near this energy). The phenomenon of eigenvalue rigidity does give some control on these fluctuations, allowing one to relate “averaged index” results (in which the index {i} ranges over a mesoscopic range) with “averaged energy” results (in which the energy {x} is similarly averaged over a mesoscopic interval), but there are technical issues in passing back from averaged control to pointwise control, either for the index or energy.

We will be mostly concerned with the bulk region, where the index {i} is in an interval of the form {[\delta N, (1-\delta)N]} for some fixed {\delta>0}, or equivalently the energy {x} is in {[-1+c, 1-c]} for some fixed {c > 0}. In this region it is natural to introduce the normalized eigenvalue gaps

\displaystyle  g_i := \sqrt{N/2} \rho_{\mathrm{sc}}(\gamma_{i/N}) (\lambda_{i+1} - \lambda_i).

The semicircle law predicts that these gaps {g_i} have mean close to {1}; however, due to the aforementioned fluctuations around the classical location, this type of claim is only easy to establish in the “fixed energy”, “averaged energy”, or “averaged index” settings; the “fixed index” case was only achieved by myself as recently as 2013, where I showed that each such gap in fact asymptotically had the expected distribution of the Gaudin law, using manipulations of determinantal processes. A significantly more general result, avoiding the use of determinantal processes, was subsequently obtained by Erdos and Yau.
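
As an illustration (not taken from the paper), here is a small Monte Carlo sketch using the normalization {e^{-\mathrm{tr} H^2}} above: it samples a GUE matrix, forms the normalized gaps {g_i}, and checks that their bulk average is close to {1}. For simplicity the semicircle density is evaluated at the normalized eigenvalue itself rather than at the classical location {\gamma_{i/N}}, which is an adequate proxy in the bulk.

```python
import numpy as np

rng = np.random.default_rng(0)

def gue(N):
    # GUE normalized so that the density is proportional to exp(-tr H^2);
    # with this normalization the spectrum essentially fills [-sqrt(2N), sqrt(2N)].
    G = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    return (G + G.conj().T) / 2

def rho_sc(x):
    return (2 / np.pi) * np.sqrt(np.clip(1 - x ** 2, 0, None))

N = 600
lam = np.linalg.eigvalsh(gue(N))
x = lam / np.sqrt(2 * N)                    # normalized eigenvalues

# Normalized gaps g_i; rho_sc is evaluated at the normalized eigenvalue itself
# rather than at the classical location gamma_{i/N} (an adequate proxy in the bulk).
g = np.sqrt(N / 2) * rho_sc(x[:-1]) * np.diff(lam)

bulk = slice(N // 10, -N // 10)             # restrict attention to the bulk
print(f"mean bulk gap: {g[bulk].mean():.3f}   (should be close to 1)")
```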

However, these results left open the possibility of bad tail behavior at extremely large or small values of the gaps {g_i}; in particular, moments of the {g_i} were not directly controlled by previous results. The first result of the paper is to push the determinantal analysis further, and obtain such results. For instance, we obtain moment bounds

\displaystyle  \mathop{\bf E} g_i^p \ll_p 1

for any fixed {p > 0}, as well as an exponential decay bound

\displaystyle  \mathop{\bf P} (g_i > h) \ll \exp(-h/4)

for {0 < h \ll \log\log N}, and a lower tail bound

\displaystyle  \mathop{\bf P} (g_i \leq h) \ll h^{2/3} \log^{1/2} \frac{1}{h}

for any {h>0}. We also obtain good control on sums {g_i + \dots + g_{i+m-1}} of {m} consecutive gaps for any fixed {m}, showing that this sum has mean {m + O(\log^{4/3} (2+m))} and variance {O(\log^{7/3} (2+m))}. (This is significantly less variance than one would expect from a sum of {m} independent random variables; this variance reduction phenomenon is closely related to the eigenvalue rigidity phenomenon alluded to earlier, and reflects the tendency of eigenvalues to repel each other.)

A key point in these estimates is that no factors of {\log N} occur in the estimates, which is what one would obtain if one tried to use existing eigenvalue rigidity theorems. (In particular, if one normalized the eigenvalues {\lambda_i} at the same scale as the gap {g_i}, they would fluctuate by a standard deviation of about {\sqrt{\log N}}; it is only the gaps between eigenvalues that exhibit much smaller fluctuation.) On the other hand, the dependence on {h} is not optimal, although it was sufficient for the applications I had in mind.

As with my previous paper, the strategy is to try to replace fixed index events such as {g_i > h} with averaged energy events. For instance, if {g_i > h} and {i} has classical location {x}, then there is an interval of normalized energies {t} of length {\gg h}, with the property that there are precisely {N-i} eigenvalues to the right of {f_x(t)} and no eigenvalues in the interval {[f_x(t), f_x(t+h/2)]}, where

\displaystyle  f_x(t) = \sqrt{2N}( x + \frac{t}{N \rho_{\mathrm{sc}}(x)})

is an affine rescaling to the scale of the eigenvalue gap. So matters soon reduce to controlling the probability of the event

\displaystyle  (N_{x,t} = N-i) \wedge (N_{x,t,h/2} = 0)

where {N_{x,t}} is the number of eigenvalues to the right of {f_x(t)}, and {N_{x,t,h/2}} is the number of eigenvalues in the interval {[f_x(t), f_x(t+h/2)]}. These are fixed energy events, and one can use the theory of determinantal processes to control them. For instance, each of the random variables {N_{x,t}}, {N_{x,t,h/2}} separately have the distribution of sums of independent Boolean variables, which are extremely well understood. Unfortunately, the coupling is a problem; conditioning on the event {N_{x,t} = N-i}, in particular, affects the distribution of {N_{x,t,h/2}}, so that it is no longer the sum of independent Boolean variables. However, it is still a mixture of such sums, and with this (and the Plancherel-Rotach asymptotics for the GUE determinantal kernel) it is possible to proceed and obtain the above estimates after some calculation.

For the intended application to GUE hives, it is important to not just control gaps {g_i} of the eigenvalues {\lambda_i} of the GUE matrix {M}, but also the gaps {g'_i} of the eigenvalues {\lambda'_i} of the top left {N-1 \times N-1} minor {M'} of {M}. This minor of a GUE matrix is basically again a GUE matrix, so the above theorem applies verbatim to the {g'_i}; but it turns out to be necessary to control the joint distribution of the {g_i} and {g'_i}, and also of the interlacing gaps {\tilde g_i} between the {\lambda_i} and {\lambda'_i}. For fixed energy, these gaps are in principle well understood, due to previous work of Adler-Nordenstam-van Moerbeke and of Johansson-Nordenstam which show that the spectrum of both matrices is asymptotically controlled by the Boutillier bead process. This also gives averaged energy and averaged index results without much difficulty, but to get to fixed index information, one needs some universality result in the index {i}. For the gaps {g_i} of the original matrix, such a universality result is available due to the aforementioned work of Erdos and Yau, but this does not immediately imply the corresponding universality result for the joint distribution of {g_i} and {g'_i} or {\tilde g_i}. For this, we need a way to relate the eigenvalues {\lambda_i} of the matrix {M} to the eigenvalues {\lambda'_i} of the minors {M'}. By a standard Schur’s complement calculation, one can obtain the equation

\displaystyle a_{NN} - \lambda_i - \sum_{j=1}^{N-1}\frac{|X_j|^2}{\lambda'_j - \lambda_i} = 0

for all {i}, where {a_{NN}} is the bottom right entry of {M}, and {X_1,\dots,X_{N-1}} are complex gaussians independent of {\lambda'_j}. This gives a random system of equations to solve for {\lambda_i} in terms of {\lambda'_j}. Using the previous bounds on eigenvalue gaps (particularly the concentration results for sums of consecutive gaps), one can localize this equation to the point where a given {\lambda_i} is mostly controlled by a bounded number of nearby {\lambda'_j}, and hence a single gap {g_i} is mostly controlled by a bounded number of {g'_j}. From this, it is possible to leverage the existing universality result of Erdos and Yau to obtain universality of the joint distribution of {g_i} and {g'_i} (or of {\tilde g_i}). (The result can also be extended to more layers of the minor process than just two, as long as the number of minors is held fixed.)
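
The displayed equation is easy to test numerically: for a small random Hermitian matrix {M}, each eigenvalue {\lambda_i} should be a root of the secular function {\lambda \mapsto a_{NN} - \lambda - \sum_j |X_j|^2/(\lambda'_j - \lambda)}, where {X} is the last column of {M} (minus its final entry) expressed in the eigenbasis of the minor. A minimal check, with a tiny matrix rather than a large GUE sample:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8

# A small random Hermitian (GUE-type) matrix M and its top-left (N-1) x (N-1) minor.
G = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
M = (G + G.conj().T) / 2
Mp = M[:-1, :-1]

lam = np.linalg.eigvalsh(M)            # eigenvalues lambda_i of M
lam_p, U = np.linalg.eigh(Mp)          # eigenvalues lambda'_j of the minor
a_NN = M[-1, -1].real
X = U.conj().T @ M[:-1, -1]            # last column of M in the minor's eigenbasis

# Each eigenvalue of M should be a root of
#   a_NN - lambda - sum_j |X_j|^2 / (lambda'_j - lambda) = 0.
for lam_i in lam:
    residual = a_NN - lam_i - np.sum(np.abs(X) ** 2 / (lam_p - lam_i))
    print(f"lambda = {lam_i:8.4f}   residual = {residual:9.2e}")
```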

This at last brings us to the final result of the paper, which is the one that is actually needed for the application to GUE hives. Here, one is interested in controlling the variance of a linear combination {\sum_{l=1}^m a_l \tilde g_{i+l}} of a fixed number {m} of consecutive interlacing gaps {\tilde g_{i+l}}, where the {a_l} are arbitrary deterministic coefficients. An application of the triangle and Cauchy-Schwarz inequalities, combined with the previous moment bounds on gaps, shows that this random variable has variance {\ll m \sum_{l=1}^m |a_l|^2}. However, this bound is not expected to be sharp, due to the expected decay of correlations between eigenvalue gaps. In this paper, I improve the variance bound to

\displaystyle  \ll_A \frac{m}{\log^A(2+m)} \sum_{l=1}^m |a_l|^2

for any {A>0}, which is what was needed for the application.

This improvement reflects some decay in the covariances between distant interlacing gaps {\tilde g_i, \tilde g_{i+h}}. I was not able to establish such decay directly. Instead, using some Fourier analysis, one can reduce matters to studying the case of modulated linear statistics such as {\sum_{l=1}^m e(\xi l) \tilde g_{i+l}} for various frequencies {\xi}. In “high frequency” cases one can use the triangle inequality to reduce matters to studying the original eigenvalue gaps {g_i}, which can be handled by a (somewhat complicated) determinantal process calculation, after first using universality results to pass from fixed index to averaged index, thence to averaged energy, then to fixed energy estimates. For low frequencies the triangle inequality argument is unfavorable, and one has to instead use the determinantal kernel of the full minor process, and not just an individual matrix. This requires some classical, but tedious, calculation of certain asymptotics of sums involving Hermite polynomials.

The full argument is unfortunately quite complex, but it seems that the combination of having to deal with minors, as well as fixed indices, places this result out of reach of many prior methods.

Thursday, December 5th, 2024
6:05 pm
AI for Math fund

Renaissance Philanthropy and XTX Markets have announced the launch of the AI for Math Fund, a new grant program, with an initial $9.2 million in funding, that supports projects applying AI and machine learning to mathematics, with a focus on automated theorem proving. The project funding categories, and examples of projects in such categories, are:

1. Production Grade Software Tools

  • AI-based autoformalization tools for translating natural-language mathematics into the formalisms of proof assistants
  • AI-based auto-informalization tools for translating proof-assistant proofs into interpretable natural-language mathematics
  • AI-based models for suggesting tactics/steps or relevant concepts to the user of a proof assistant, or for generating entire proofs
  • Infrastructure to connect proof assistants with computer algebra systems, calculus, and PDEs
  • A large-scale, AI-enhanced distributed collaboration platform for mathematicians

2. Datasets

  • Datasets of formalized theorems and proofs in a proof assistant
  • Datasets that would advance AI for theorem proving as applied to program verification and secure code generation
  • Datasets of (natural-language) mathematical problems, theorems, proofs, exposition, etc.
  • Benchmarks and training environments associated with datasets and model tasks (autoformalization, premise selection, tactic or proof generation, etc.)

3. Field Building

  • Textbooks
  • Courses
  • Documentation and support for proof assistants, and interfaces/APIs to integrate with AI tools

4. Breakthrough Ideas

  • Expected difficulty estimation (of sub-problems of a proof)
  • Novel mathematical implications of proofs formalized type-theoretically
  • Formalization of proof complexity in proof assistants

The deadline for initial expressions of interest is Jan 10, 2025.

[Disclosure: I have agreed to serve on the advisory board for this fund.]

Update: See also this discussion thread on possible projects that might be supported by this fund.

Thursday, November 28th, 2024
5:26 am
On several irrationality problems for Ahmes series

Vjeko Kovac and I have just uploaded to the arXiv our paper “On several irrationality problems for Ahmes series“. This paper resolves (or at least makes partial progress on) some open questions of Erdős and others on the irrationality of Ahmes series, which are infinite series of the form {\sum_{k=1}^\infty \frac{1}{a_k}} for some increasing sequence {a_k} of natural numbers. Of course, since most real numbers are irrational, one expects such series to “generically” be irrational, and we make this intuition precise (in both a probabilistic sense and a Baire category sense) in our paper. However, it is often difficult to establish the irrationality of any specific series. For example, it is already a non-trivial result of Erdős that the series {\sum_{k=1}^\infty \frac{1}{2^k-1}} is irrational, while the irrationality of {\sum_{p \hbox{ prime}} \frac{1}{2^p-1}} (equivalent to Erdős problem #69) remains open, although very recently Pratt established this conditionally on the Hardy–Littlewood prime tuples conjecture. Finally, the irrationality of {\sum_n \frac{1}{n!-1}} (Erdős problem #68) is completely open.

On the other hand, it has long been known that if the sequence {a_k} grows faster than {C^{2^k}} for any {C}, then the Ahmes series is necessarily irrational, basically because the fractional parts of {a_1 \dots a_m \sum_{k=1}^\infty \frac{1}{a_k}} can be arbitrarily small positive quantities, which is inconsistent with {\sum_{k=1}^\infty \frac{1}{a_k}} being rational. This growth rate is sharp, as can be seen by iterating the identity {\frac{1}{n} = \frac{1}{n+1} + \frac{1}{n(n+1)}} to obtain a rational Ahmes series of growth rate {(C+o(1))^{2^k}} for any fixed {C>1}.
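
For instance, repeatedly splitting the term with the largest denominator via this identity, starting from {\frac{1}{2}}, produces the doubly exponentially growing denominators {3, 7, 43, 1807, \dots} while the total sum stays exactly equal to the rational number {\frac{1}{2}}; a short sketch:

```python
from fractions import Fraction

# Iterate 1/n = 1/(n+1) + 1/(n(n+1)), always splitting the term with the
# largest denominator; the kept denominators grow doubly exponentially
# (roughly like C^{2^k}) while the series still sums to the rational 1/2.
target = Fraction(1, 2)
denoms, n = [], 2            # the remaining (not yet split) term is 1/n
for _ in range(6):
    denoms.append(n + 1)     # keep the term 1/(n+1) ...
    n = n * (n + 1)          # ... and keep splitting the term 1/(n(n+1))

partial = sum(Fraction(1, d) for d in denoms)
print(denoms)                               # [3, 7, 43, 1807, 3263443, ...]
print(partial + Fraction(1, n) == target)   # True: the identity is exact
```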

In our paper we show that if {a_k} grows somewhat slower than the above sequences in the sense that {a_{k+1} = o(a_k^2)}, for instance if {a_k \asymp 2^{(2-\varepsilon)^k}} for a fixed {0 < \varepsilon < 1}, then one can find a comparable sequence {b_k \asymp a_k} for which {\sum_{k=1}^\infty \frac{1}{b_k}} is rational. This partially addresses Erdős problem #263, which asked if the sequence {a_k = 2^{2^k}} had this property, and whether any sequence of exponential or slower growth (but with {\sum_{k=1}^\infty 1/a_k} convergent) had this property. Unfortunately we barely miss a full solution of both parts of the problem, since the condition {a_{k+1} = o(a_k^2)} we need just fails to cover the case {a_k = 2^{2^k}}, and also does not quite hold for all sequences going to infinity at an exponential or slower rate.

We also show the following variant: if {a_k} has exponential growth in the sense that {a_{k+1} = O(a_k)} with {\sum_{k=1}^\infty \frac{1}{a_k}} convergent, then there exist nearby natural numbers {b_k = a_k + O(1)} such that {\sum_{k=1}^\infty \frac{1}{b_k}} is rational. This answers the first part of Erdős problem #264, which asked about the case {a_k = 2^k}, although the second part (which asks about {a_k = k!}) is slightly out of reach of our methods. Indeed, we show that the exponential growth hypothesis is best possible, in the sense that a random sequence {a_k} that grows faster than exponentially will not have this property; however, this result does not address any specific superexponential sequence such as {a_k = k!}, although it does apply to some sequence {a_k} of the shape {a_k = k! + O(\log\log k)}.

Our methods can also handle higher dimensional variants in which multiple series are simultaneously set to be rational. Perhaps the most striking result is this: we can find an increasing sequence {a_k} of natural numbers with the property that {\sum_{k=1}^\infty \frac{1}{a_k + t}} is rational for every rational {t} (excluding the cases {t = - a_k} to avoid division by zero)! This answers (in the negative) a question of Stolarsky (Erdős problem #266), and also reproves Erdős problem #265 (and in the latter case one can even make {a_k} grow double exponentially fast).

Our methods are elementary and avoid any number-theoretic considerations, relying primarily on the countable dense nature of the rationals and an iterative approximation technique. The first observation is that the task of representing a given number {q} as an Ahmes series {\sum_{k=1}^\infty \frac{1}{a_k}} with each {a_k} lying in some interval {I_k} (with the {I_k} disjoint, and going to infinity fast enough to ensure convergence of the series), is possible if and only if the infinite sumset

\displaystyle  \frac{1}{I_1} + \frac{1}{I_2} + \dots

contains {q}, where {\frac{1}{I_k} = \{ \frac{1}{a}: a \in I_k \}}. More generally, to represent a tuple of numbers {(q_t)_{t \in T}} indexed by some set {T} of numbers simultaneously as {\sum_{k=1}^\infty \frac{1}{a_k+t}} with {a_k \in I_k}, this is the same as asking for the infinite sumset

\displaystyle  E_1 + E_2 + \dots

to contain {(q_t)_{t \in T}}, where now

\displaystyle  E_k = \{ (\frac{1}{a+t})_{t \in T}: a \in I_k \}. \ \ \ \ \ (1)

So the main problem is to get control on such infinite sumsets. Here we use a very simple observation:

Proposition 1 (Iterative approximation) Let {V} be a Banach space, let {E_1,E_2,\dots} be sets with each {E_k} contained in the ball of radius {\varepsilon_k>0} around the origin for some {\varepsilon_k} with {\sum_{k=1}^\infty \varepsilon_k} convergent, so that the infinite sumset {E_1 + E_2 + \dots} is well-defined. Suppose that one has some convergent series {\sum_{k=1}^\infty v_k} in {V}, and sets {B_1,B_2,\dots} converging in norm to zero, such that

\displaystyle  v_k + B_k \subset E_k + B_{k+1} \ \ \ \ \ (2)

for all {k \geq 1}. Then the infinite sumset {E_1 + E_2 + \dots} contains {\sum_{k=1}^\infty v_k + B_1}.

Informally, the condition (2) asserts that {E_k} occupies all of {v_k + B_k} “at the scale {B_{k+1}}“.

Proof: Let {w_1 \in B_1}. Our task is to express {\sum_{k=1}^\infty v_k + w_1} as a series {\sum_{k=1}^\infty e_k} with {e_k \in E_k}. From (2) we may write

\displaystyle  \sum_{k=1}^\infty v_k + w_1 = \sum_{k=2}^\infty v_k + e_1 + w_2

for some {e_1 \in E_1} and {w_2 \in B_2}. Iterating this, we may find {e_k \in E_k} and {w_k \in B_k} such that

\displaystyle  \sum_{k=1}^\infty v_k + w_1 = \sum_{k=m+1}^\infty v_k + e_1 + e_2 + \dots + e_m + w_{m+1}

for all {m}. Sending {m \rightarrow \infty}, we obtain

\displaystyle  \sum_{k=1}^\infty v_k + w_1 = e_1 + e_2 + \dots

as required. \Box
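
Here is a toy one-dimensional instance of this scheme (illustrative only, not taken from the paper): take {V = {\bf R}}, {v_k = 2^{-k}}, {B_k = [-2^{-k}, 2^{-k}]}, and let {E_k} be a finite grid of spacing {2^{-(k+1)}} around {v_k}, so that the inclusion (2) holds; the greedy selection from the proof then converges to any prescribed target in {\sum_k v_k + B_1}.

```python
# Toy 1-D instance of the iterative approximation scheme in Proposition 1:
# V = R, v_k = 2^{-k}, B_k = [-2^{-k}, 2^{-k}], and E_k a grid of spacing
# 2^{-(k+1)} around v_k, so that v_k + B_k is contained in E_k + B_{k+1}.

K = 40                            # number of stages to run
v = [2.0 ** -k for k in range(1, K + 1)]

def E(k):                         # the k-th "digit set": a grid around v_k
    step = 2.0 ** -(k + 1)
    return [v[k - 1] + m * step for m in range(-2, 3)]

target = sum(v) + 0.37 * 2.0 ** -1     # an arbitrary point of sum(v_k) + B_1
w = target - sum(v)                    # the initial error w_1, lying in B_1
chosen = []
for k in range(1, K + 1):
    e = min(E(k), key=lambda x: abs(x - (v[k - 1] + w)))   # nearest grid point
    chosen.append(e)
    w = w + v[k - 1] - e               # new error w_{k+1}; it lies in B_{k+1}
    assert abs(w) <= 2.0 ** -(k + 1) + 1e-12

print(abs(sum(chosen) - target))       # tends to 0 as K grows
```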

In one dimension, sets of the form {\frac{1}{I_k}} are dense enough that the condition (2) can be satisfied in a large number of situations, leading to most of our one-dimensional results. In higher dimension, the sets {E_k} lie on curves in a high-dimensional space, and so do not directly obey usable inclusions of the form (2); however, for suitable choices of intervals {I_k}, one can take some finite sums {E_{k+1} + \dots + E_{k+d}} which will become dense enough to obtain usable inclusions of the form (2) once {d} reaches the dimension of the ambient space, basically thanks to the inverse function theorem (and the non-vanishing curvatures of the curve in question). For the Stolarsky problem, which is an infinite-dimensional problem, it turns out that one can modify this approach by letting {d} grow slowly to infinity with {k}.

Monday, November 11th, 2024
5:04 pm
Higher uniformity of arithmetic functions in short intervals II. Almost all intervals

Kaisa Matomäki, Maksym Radziwill, Fernando Xuancheng Shao, Joni Teräväinen, and I have (finally) uploaded to the arXiv our paper “Higher uniformity of arithmetic functions in short intervals II. Almost all intervals“. This is a sequel to our previous paper from 2022. In that paper, discorrelation estimates such as

\displaystyle  \sum_{x \leq n \leq x+H} (\Lambda(n) - \Lambda^\sharp(n)) \bar{F}(g(n)\Gamma) = o(H)

were established, where {\Lambda} is the von Mangoldt function, {\Lambda^\sharp} is some suitable approximant to that function, {F(g(n)\Gamma)} is a nilsequence, and {[x,x+H]} is a reasonably short interval in the sense that {H \sim x^{\theta+\varepsilon}} for some {0 < \theta < 1} and some small {\varepsilon>0}. In that paper, we were able to obtain non-trivial estimates for {\theta} as small as {5/8}, and for some other functions such as divisor functions {d_k} for small values of {k}, we could lower {\theta} somewhat, to values such as {3/5}, {5/9}, or {1/3}. This had a number of analytic number theory consequences, for instance in obtaining asymptotics for additive patterns in primes in such intervals. However, there were multiple obstructions to lowering {\theta} much further. Even for the model problem when {F(g(n)\Gamma) = 1}, that is to say the study of primes in short intervals, until recently the best value of {\theta} available was {7/12}, although this was very recently improved to {17/30} by Guth and Maynard.

However, the situation is better when one is willing to consider estimates that are valid for almost all intervals, rather than all intervals, so that one now studies local higher order uniformity estimates of the form

\displaystyle  \int_X^{2X} \sup_{F,g} | \sum_{x \leq n \leq x+H} (\Lambda(n) - \Lambda^\sharp(n)) \bar{F}(g(n)\Gamma)|\ dx = o(XH)

where {H = X^{\theta+\varepsilon}} and the supremum is over all nilsequences of a certain Lipschitz constant on a fixed nilmanifold {G/\Gamma}. This generalizes local Fourier uniformity estimates of the form

\displaystyle  \int_X^{2X} \sup_{\alpha} | \sum_{x \leq n \leq x+H} (\Lambda(n) - \Lambda^\sharp(n)) e(-\alpha n)|\ dx = o(XH).

There is particular interest in such estimates in the case of the Möbius function {\mu(n)} (where, as per the Möbius pseudorandomness conjecture, the approximant {\mu^\sharp} should be taken to be zero, at least in the absence of a Siegel zero). This is because if one could get estimates of this form for any {H} that grows sufficiently slowly in {X} (in particular {H = \log^{o(1)} X}), this would imply the (logarithmically averaged) Chowla conjecture, as I showed in a previous paper.

While one can lower {\theta} somewhat, there are still barriers. For instance, in the model case {F \equiv 1}, that is to say prime number theorems in almost all short intervals, until very recently the best value of {\theta} was {1/6}, recently lowered to {2/15} by Guth and Maynard (and it can be lowered all the way to zero assuming the Density Hypothesis). Nevertheless, we are able to get some improvements at higher orders:

  • For the von Mangoldt function, we can get {\theta} as low as {1/3}, with an arbitrary logarithmic saving {\log^{-A} X} in the error terms; for divisor functions, one can even get power savings in this regime.
  • For the Möbius function, we can get {\theta=0}, recovering our previous result with Tamar Ziegler, but now with {\log^{-A} X} type savings in the exceptional set (though not in the pointwise bound outside of the set).
  • We can now also get comparable results for the divisor function.

As sample applications, we can obtain Hardy-Littlewood conjecture asymptotics for arithmetic progressions of almost all given steps {h \sim X^{1/3+\varepsilon}}, and divisor correlation estimates on arithmetic progressions for almost all {h \sim X^\varepsilon}.

Our proofs are rather long, but broadly follow the “contagion” strategy of Walsh, generalized from the Fourier setting to the higher order setting. Firstly, by standard Heath–Brown type decompositions, and previous results, it suffices to control “Type II” discorrelations such as

\displaystyle  \sup_{F,g} | \sum_{x \leq n \leq x+H} \alpha*\beta(n) \bar{F}(g(n)\Gamma)|

for almost all {x}, and some suitable functions {\alpha,\beta} supported on medium scales. So the bad case is when for most {x}, one has a discorrelation

\displaystyle  |\sum_{x \leq n \leq x+H} \alpha*\beta(n) \bar{F_x}(g_x(n)\Gamma)| \gg H

for some nilsequence {F_x(g_x(n) \Gamma)} that depends on {x}.

The main issue is the dependency of the polynomial {g_x} on {x}. By using a “nilsequence large sieve” introduced in our previous paper, and removing degenerate cases, we can show a functional relationship amongst the {g_x} that is very roughly of the form

\displaystyle  g_x(an) \approx g_{x'}(a'n)

whenever {n \sim x/a \sim x'/a'} (and I am being extremely vague as to what the relation “{\approx}” means here). By a higher order (and quantitatively stronger) version of Walsh’s contagion analysis (which is ultimately to do with separation properties of Farey sequences), we can show that this implies that these polynomials {g_x(n)} (which exert influence over intervals {[x,x+H]}) can “infect” longer intervals {[x', x'+Ha]} with some new polynomials {\tilde g_{x'}(n)} and various {x' \sim Xa}, which are related to many of the previous polynomials by a relationship that looks very roughly like

\displaystyle  g_x(n) \approx \tilde g_{ax}(an).

This can be viewed as a rather complicated generalization of the following vaguely “cohomological”-looking observation: if one has some real numbers {\alpha_i} and some primes {p_i} with {p_j \alpha_i \approx p_i \alpha_j} for all {i,j}, then one should have {\alpha_i \approx p_i \alpha} for some {\alpha}, where I am being vague here about what {\approx} means (and why it might be useful to have primes). By iterating this sort of contagion relationship, one can eventually get the {g_x(n)} to behave like an Archimedean character {n^{iT}} for some {T} that is not too large (polynomial size in {X}), and then one can use relatively standard (but technically a bit lengthy) “major arc” techniques based on various integral estimates for zeta and {L} functions to conclude.
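
Here is a trivial numerical illustration of the objects in this observation (with made-up numbers): the {\alpha_i} below are constructed to lie within a small error of {p_i \alpha} for a common {\alpha}, and one can check both that the pairwise quantities {p_j \alpha_i - p_i \alpha_j} are then small, and that the ratios {\alpha_i / p_i} nearly coincide.

```python
import random

# Made-up data: alpha_i = p_i * alpha + (small error), for a hidden common alpha.
random.seed(0)
primes = [101, 103, 107, 109, 113]
alpha = 0.7312
alphas = [p * alpha + random.uniform(-1e-3, 1e-3) for p in primes]

# The pairwise "cocycle" quantities p_j * alpha_i - p_i * alpha_j are all small...
worst = max(abs(q * a - p * b)
            for p, a in zip(primes, alphas)
            for q, b in zip(primes, alphas))
print(f"max |p_j alpha_i - p_i alpha_j| = {worst:.3f}")

# ...which is consistent with all the ratios alpha_i / p_i nearly coinciding:
ratios = [a / p for p, a in zip(primes, alphas)]
print(f"spread of alpha_i / p_i: {max(ratios) - min(ratios):.2e}")
```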

Friday, November 8th, 2024
2:32 pm
The elephant in the room

The day after the election, I found myself struggling with how to approach the complex analysis class I was teaching. Could I ignore the (almost literal) elephant in the room? Would my students be in the right mental state to learn math? Would I be in the right mental state to teach it?

I opened with the statement that usually in math we have the luxury of working in abstractions far removed from the real world. We are familiar with addressing mathematical problems with the (inessential) connections to the real world stripped away, leaving only the essential features on which to focus one’s attention. An election, for instance, might be treated as the outcome of N people, each of whom has a probability p of voting for one candidate, and 1-p for another… and one can then try to analyze the problem from a dispassionate mathematical perspective. This type of mindset can be illuminating in many contexts. Real world events have real consequences, however, and in light of an event as consequential as the last election, a math lecture on contour integration or the central limit theorem may seem meaningless.

But there is one precious thing mathematics has, that almost no other field currently enjoys: a consensus on what the ground truth is, and how to reach it. Because of this, even the strongest differences of opinion in mathematics can eventually be resolved, and mistakes realized and corrected. This consensus is so strong, we simply take it for granted: a solution is correct or incorrect, a theorem is proved or not proved, and when a problem is solved, we simply move on to the next one. This is, sadly, not a state of affairs elsewhere. But if my students can learn from this and carry these skills— such as distinguishing an overly simple but mathematically flawed “solution” from a more complex, but accurate actual solution—to other spheres that have more contact with the real world, then my math lectures have consequence. Even—or perhaps, especially—in times like these.
