Empirical analysis is often the first step towards the birth of a conjecture.
This is the case for the Birch-Swinnerton-Dyer (BSD) Conjecture describing the
rational points on an elliptic curve, one of the most celebrated unsolved
problems in mathematics. Here we extend the original empirical approach to the
analysis of the Cremona database of quantities relevant to BSD, inspecting more
than 2.5 million elliptic curves by means of the latest techniques in data
science, machine learning and topological data analysis.
Key quantities such as rank, Weierstrass coefficients, period, conductor,
Tamagawa number, regulator and order of the Tate-Shafarevich group give rise to
a high-dimensional point-cloud whose statistical properties we investigate. We
reveal patterns and distributions in the rank versus Weierstrass coefficients,
as well as the Beta distribution of the BSD ratio of these quantities. Machine
learning, via gradient-boosted trees, is applied to find correlations among
the various quantities. We anticipate that our
approach will spark further research on the statistical properties of large
datasets in number theory and, more generally, in pure mathematics.

Laura Alessandretti

Andrea Baronchelli

Yang-Hui He

