
On 12 May, CERN announced the Higgs Boson Machine Learning Challenge, a competition for the best algorithm for finding events containing the Higgs boson in a set of experimental data. The contest runs until 15 September, and winners will receive cash prizes ranging from $2,000 to $7,000. A successful solution may be integrated into the actual processing of data from the ATLAS detector. No special knowledge of particle physics is needed to take part.
At the Large Hadron Collider, the Higgs boson is not detected directly but through its decay products. Protons with enormous energies collide at the center of the detector. In such a collision a Higgs boson may be produced, and it decays almost immediately into other particles. According to the predictions of the Standard Model, the most common decay channel is into a bottom quark and a bottom antiquark (b and anti-b). The competition focuses on rarer events in which the Higgs boson decays into a tau lepton and an anti-tau lepton. Since these leptons also decay quickly, through various channels, the detector "sees" only their decay products. However, the same set of decay products can arise in many other ways, so many events form a background, and studying the Higgs boson requires distinguishing events that contain a boson from that background.
The collider produces a huge number of collisions, so it is very important to separate interesting events from uninteresting ones in the detector data quickly and accurately. This is the task set for the contestants.
Each event is described by thirty numbers: 17 are raw measurements from the detector, and 13 are derived quantities computed from the raw data that experts expect to be useful for prediction. Among the raw data, for example, PRI_tau_pt is the transverse momentum (the momentum component perpendicular to the beam axis) of the detected “hadronic tau” (a tau lepton reconstructed through its hadronic decay channel). Among the derived quantities, for example, DER_mass_MMC is the estimated mass of the Higgs boson most likely to have produced this event (if a Higgs boson was present at all). A complete theoretical description of the parameters is given in a special article, although it may be better not to read it, so as to approach the problem with a fresh eye.
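The “transverse momentum” behind PRI_tau_pt can be made concrete with its standard definition: taking the beam along the z axis, p_T = sqrt(p_x^2 + p_y^2), so the momentum component along the beam does not contribute. The short Python sketch below only illustrates that formula; the function name and the px/py inputs are hypothetical and are not columns of the competition data, which already supplies the value directly.

```python
import math


def transverse_momentum(px: float, py: float) -> float:
    """Momentum component perpendicular to the beam axis (beam assumed along z).

    px, py -- hypothetical Cartesian momentum components in the plane
              transverse to the beam; the longitudinal pz does not enter.
    """
    # math.hypot computes sqrt(px**2 + py**2) without intermediate overflow
    return math.hypot(px, py)


if __name__ == "__main__":
    print(transverse_momentum(3.0, 4.0))  # 5.0
```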
Participants are given a training set of 250,000 events, each labeled as signal or background, and are asked to classify 550,000 previously unseen test events. Results are evaluated with a formula that takes into account the numbers of correct and incorrect answers. To make it harder to tune solutions to the test results, participants are not told their exact score: until the end of the competition, scoring uses a random 18% subset of the test sample.
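The article does not reproduce the scoring formula. In the challenge's published documentation it is the approximate median significance, AMS = sqrt(2((s + b + b_reg) ln(1 + s/(b + b_reg)) - s)), where s and b are the summed weights of events selected as signal that are truly signal (true positives) and truly background (false positives), and b_reg = 10 is a regularisation constant. The minimal Python sketch below assumes that definition; the per-event weights and the function names are assumptions not described in this article, and this is not the organisers' scoring code.

```python
import math
from typing import Sequence


def ams(s: float, b: float, b_reg: float = 10.0) -> float:
    """Approximate median significance, assuming the definition quoted above.

    s     -- summed weights of true positives (signal events selected as signal)
    b     -- summed weights of false positives (background events selected as signal)
    b_reg -- regularisation constant (assumed 10, as in the challenge documentation)
    """
    if s < 0 or b < 0:
        raise ValueError("s and b must be non-negative")
    denom = b + b_reg
    # (s + denom) * ln(1 + s / denom) - s is mathematically >= 0; clamp tiny
    # negative values caused by floating-point rounding before the square root.
    radicand = 2.0 * ((s + denom) * math.log1p(s / denom) - s)
    return math.sqrt(max(radicand, 0.0))


def ams_from_predictions(
    predicted_signal: Sequence[bool],
    is_signal: Sequence[bool],
    weights: Sequence[float],
    b_reg: float = 10.0,
) -> float:
    """Compute AMS from per-event predictions, true labels and (assumed) weights."""
    s = 0.0
    b = 0.0
    for pred, truth, w in zip(predicted_signal, is_signal, weights):
        if not pred:
            continue  # events classified as background do not enter the score
        if truth:
            s += w  # true positive
        else:
            b += w  # false positive
    return ams(s, b, b_reg)


if __name__ == "__main__":
    # Toy example: three events, two of them predicted as signal.
    print(ams_from_predictions([True, True, False], [True, False, True], [2.0, 1.0, 5.0]))
```

Only events predicted as signal contribute: missing a true signal event lowers s, while accepting a background event raises b, which is how the score reflects both correct and incorrect answers.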
Participants can form teams of up to four people and submit up to five solutions per day. Approaches to the problem can be discussed on the forum. To have a solution scored, it is enough to submit a file of predictions; the source code needs to be uploaded later only if you qualify for a prize.
The authors of the three best solutions will receive cash prizes of $7,000, $4,000 and $2,000. In addition, the ATLAS collaboration will select a team whose solution is best suited for use in the experiment (taking into account performance, reliability and other factors). That team will be invited to CERN to meet the ATLAS collaboration, with travel expenses covered.