Computing Archives – CERN Courier

Quantum simulators in high-energy physics

Enrique Rico Ortega and Sofia Vallecorsa explain how quantum computing will allow physicists to model complex dynamics, from black-hole evaporation to neutron-star interiors.

In 1982 Richard Feynman posed a question that challenged computational limits: can a classical computer simulate a quantum system? His answer: not efficiently. The complexity of the computation increases rapidly, rendering realistic simulations intractable. To understand why, consider the basic units of classical and quantum information.

A classical bit can exist in one of two states: |0> or |1>. A quantum bit, or qubit, exists in a superposition α|0> + β|1>, where α and β are complex amplitudes with real and imaginary parts. This superposition is the core feature that distinguishes quantum bits from classical bits. While a classical bit is either |0> or |1>, a quantum bit can be a blend of both at once. This is what gives quantum computers their immense parallelism – and also their fragility.

The difference becomes profound with scale. Two classical bits have four possible states, and are always in just one of them at a time. Two qubits simultaneously encode a complex-valued superposition of all four states.

Resources scale exponentially. N classical bits encode N boolean values, but N qubits encode 2^N complex amplitudes. Simulating 50 qubits with double-precision real numbers for each part of the complex amplitudes would require more than a petabyte of memory, beyond the reach of even the largest supercomputers.
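
To see the scaling concretely, the arithmetic behind that claim can be written out in a few lines. This is a minimal illustrative sketch in plain Python, with no quantum library assumed:

```python
# Memory needed to hold the full state vector of N qubits: 2^N complex
# amplitudes, each stored as two double-precision floats (16 bytes).
for n_qubits in (10, 30, 50):
    n_bytes = (2 ** n_qubits) * 16
    print(f"{n_qubits:2d} qubits -> {n_bytes / 2**30:.6g} GiB")
# 10 qubits fit in kilobytes, 30 qubits need 16 GiB, and 50 qubits
# already demand about 1.7e7 GiB, i.e. 16 PiB.
```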

Direct mimicry

Feynman proposed a different approach to quantum simulation. If a classical computer struggles, why not use one quantum system to emulate the behaviour of another? This was the conceptual birth of the quantum simulator: a device that harnesses quantum mechanics to solve quantum problems. For decades, this visionary idea remained in the realm of theory, awaiting the technological breakthroughs that are now rapidly bringing it to life. Today, progress in quantum hardware is driving two main approaches: analog and digital quantum simulation, in direct analogy to the history of classical computing.

Optical tweezers

In analog quantum simulators, the physical parameters of the simulator directly correspond to the parameters of the quantum system being studied. Think of it like a wind tunnel for aeroplanes: you are not calculating air resistance on a computer but directly observing how air flows over a model.

A striking example of an analog quantum simulator traps excited Rydberg atoms in precise configurations using highly focused laser beams known as “optical tweezers”. Rydberg atoms have one electron excited to an energy level far from the nucleus, giving them an exaggerated electric dipole moment that leads to tunable long-range dipole–dipole interactions – an ideal setup for simulating particle interactions in quantum field theories (see “Optical tweezers” figure).

The positions of the Rydberg atoms discretise the space inhabited by the quantum fields being modelled. At each point in the lattice, the local quantum degrees of freedom of the simulated fields are embodied by the internal states of the atoms. Dipole–dipole interactions simulate the dynamics of the quantum fields. This technique has been used to observe phenomena such as string breaking, where the force between particles pulls so strongly that the vacuum spontaneously creates new particle–antiparticle pairs. Such quantum simulations model processes that are notoriously difficult to calculate from first principles using classical computers (see “A philosophical dimension” panel).

Universal quantum computation

Digital quantum simulators operate much like classical digital computers, though using quantum rather than classical logic gates. While classical logic manipulates classical bits, quantum logic manipulates qubits. Because quantum logic gates obey the Schrödinger equation, they preserve information and are reversible, whereas most classical gates, such as “AND” and “OR”, are irreversible. Many quantum gates have no classical equivalent, because they manipulate phase, superposition or entanglement – a uniquely quantum phenomenon in which two or more qubits share a combined state. In an entangled system, the state of each qubit cannot be described independently of the others, even if they are far apart: the global description of the quantum state is more than the combination of the local information at every site.

A philosophical dimension

The discretisation of space by quantum simulators echoes the rise of lattice QCD in the 1970s and 1980s. Confronted with the non-perturbative nature of the strong interaction, Kenneth Wilson introduced a method to discretise spacetime, enabling numerical solutions to quantum chromodynamics beyond the reach of perturbation theory. Simulations on classical supercomputers have since deepened our understanding of quark confinement and hadron masses, catalysed advances in high-performance computing, and inspired international collaborations. It has become an indispensable tool in particle physics (see “Fermilab’s final word on muon g-2”).

In classical lattice QCD, the discretisation of spacetime is just a computational trick – a means to an end. But in quantum simulators this discretisation becomes physical. The simulator is a quantum system governed by the same fundamental laws as the target theory.

This raises a philosophical question: are we merely modelling the target theory or are we, in a limited but genuine sense, realising it? If an array of neutral atoms faithfully mimics the dynamical behaviour of a specific gauge theory, is it “just” a simulation, or is it another manifestation of that theory’s fundamental truth? Feynman’s original proposal was, in a sense, about using nature to compute itself. Quantum simulators bring this abstract notion into concrete laboratory reality.

By applying sequences of quantum logic gates, a digital quantum computer can model the time evolution of any target quantum system. This makes digital quantum simulators flexible and scalable in pursuit of universal quantum computation – logic able to run any algorithm allowed by the laws of quantum mechanics, given enough qubits and sufficient time. Universal quantum computing requires only a small subset of the many quantum logic gates that can be conceived, for example Hadamard, T and CNOT. The Hadamard gate creates a superposition: |0> → (|0> + |1>)/√2. The T gate applies a 45° phase rotation: |1> → e^(iπ/4)|1>. And the CNOT gate entangles qubits by flipping a target qubit if a control qubit is |1>. These three suffice to prepare any quantum state from a trivial reference state: |ψ> = U1 U2 U3 … UN |0000…000>.
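
As an illustration of how this small gate set acts on state vectors, the sketch below builds the three matrices explicitly and prepares an entangled Bell state from |00>. It is a plain-numpy toy, not any particular quantum-computing framework:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)      # Hadamard: creates superposition
T = np.diag([1, np.exp(1j * np.pi / 4)])          # T: 45-degree phase on |1>
CNOT = np.array([[1, 0, 0, 0],                    # flips the target qubit
                 [0, 1, 0, 0],                    # when the control qubit is |1>
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

ket00 = np.zeros(4); ket00[0] = 1.0               # trivial reference state |00>
# Apply H to the first qubit, then entangle the pair with CNOT.
state = CNOT @ np.kron(H, np.eye(2)) @ ket00
print(state)                                      # (|00> + |11>)/√2: a Bell state
```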

Trapped ions

To bring frontier physics problems within the scope of current quantum computing resources, the distinction between analog and digital quantum simulations is often blurred. The complexity of simulations can be reduced by combining digital gate sequences with analog quantum hardware that aligns with the interaction patterns relevant to the target problem. This is feasible as quantum logic gates usually rely on native interactions similar to those used in analog simulations. Rydberg atoms are a common choice. Alongside them, two other technologies are becoming increasingly dominant in digital quantum simulation: trapped ions and superconducting qubit arrays.

Trapped ions offer the greatest control. Individual charged ions can be suspended in free space using electromagnetic fields. Lasers manipulate their quantum states, inducing interactions between them. Trapped-ion systems are renowned for their high fidelity (meaning operations are accurate) and long coherence times (meaning they maintain their quantum properties for longer), making them excellent candidates for quantum simulation (see “Trapped ions” figure).

Superconducting qubit arrays promise the greatest scalability. These tiny superconducting circuits act as qubits when cooled to extremely low temperatures and manipulated with microwave pulses. This technology is at the forefront of efforts to build quantum simulators and digital quantum computers for universal quantum computation (see “Superconducting qubits” figure).

The noisy intermediate-scale quantum era

Despite rapid progress, these technologies are at an early stage of development and face three main limitations.

The first problem is that qubits are fragile. Interactions with their environment quickly compromise their superposition and entanglement, making computations unreliable. Preventing “decoherence” is one of the main engineering challenges in quantum technology today.

The second challenge is that quantum logic gates have low fidelity. Over a long sequence of operations, errors accumulate, corrupting the result.

Finally, quantum simulators currently have a very limited number of qubits – typically only a few hundred. This is far fewer than what is needed for high-energy physics (HEP) problems.

Superconducting qubits

This situation is known as the noisy intermediate-scale quantum (NISQ) era: we are no longer doing proof-of-principle experiments with a few tens of qubits, but neither can we control thousands of them. These limitations mean that current digital simulations are often restricted to “toy” models, such as QED simplified to have just one spatial and one time dimension. Even with these constraints, small-scale devices have successfully reproduced non-perturbative aspects of the theories in real time and have verified the preservation of fundamental physical principles such as gauge invariance, the symmetry that underpins the fundamental forces of the Standard Model.

Quantum simulators may chart a similar path to classical lattice QCD, but with even greater reach. Lattice QCD struggles with real-time evolution and finite-density physics due to the infamous “sign problem”, wherein quantum interference between classically computed amplitudes causes exponentially worsening signal-to-noise ratios. This renders some of the most interesting problems unsolvable on classical machines.

Quantum simulators do not suffer from the sign problem because they evolve naturally in real time, just like the physical systems they emulate. This promises to open new frontiers such as the simulation of early-universe dynamics, black-hole evaporation and the dense interiors of neutron stars.

Quantum simulators will powerfully augment traditional theoretical and computational methods, offering profound insights when Feynman diagrams become intractable, when dealing with real-time dynamics and when the sign problem renders classical simulations exponentially difficult. Just as the lattice revolution required decades of concerted community effort to reach its full potential, so will the quantum revolution, but the fruits will again transform the field. As the aphorism attributed to Mark Twain goes: history never repeats itself, but it often rhymes.

Quantum information

One of the most exciting and productive developments in recent years is the unexpected, yet profound, convergence between HEP and quantum information science (QIS). For a long time these fields evolved independently. HEP explored the universe’s smallest constituents and grandest structures, while QIS focused on harnessing quantum mechanics for computation and communication. One of the pioneers in studying the interface between these fields was John Bell, a theoretical physicist at CERN.

Just as the lattice revolution needed decades of concerted community effort to reach its full potential, so will the quantum revolution

HEP and QIS are now deeply intertwined. As quantum simulators advance, there is a growing demand for theoretical tools that combine the rigour of quantum field theory with the concepts of QIS. For example, tensor networks were developed in condensed-matter physics to represent highly entangled quantum states, and have now found surprising applications in lattice gauge theories and “holographic dualities” between quantum gravity and quantum field theory. Another example is quantum error correction – a vital QIS technique to protect fragile quantum information from noise, and now a major focus for quantum simulation in HEP.

This cross-disciplinary synthesis is not just conceptual; it is becoming institutional. Initiatives like the US Department of Energy’s Quantum Information Science Enabled Discovery (QuantISED) programme, CERN’s Quantum Technology Initiative (QTI) and Europe’s Quantum Flagship are making substantial investments in collaborative research. Quantum algorithms will become indispensable for theoretical problems just as quantum sensors are becoming indispensable to experimental observation (see “Sensing at quantum limits”).

The result is the emergence of a new breed of scientist: one equally fluent in the fundamental equations of particle physics and the practicalities of quantum hardware. These “hybrid” scientists are building the theoretical and computational scaffolding for a future where quantum simulation is a standard, indispensable tool in HEP. 

New frontiers in science in the era of AI

New Frontiers in Science in the Era of AI

At a time when artificial intelligence is more buzzword than substance in many corners of public discourse, New Frontiers in Science in the Era of AI arrives with a clear mission: to contextualise AI within the long arc of scientific thought and current research frontiers. This book is not another breathless ode to ChatGPT or deep learning, nor a dry compilation of technical papers. Instead, it’s a broad and ambitious survey, spanning particle physics, evolutionary biology, neuroscience and AI ethics, that seeks to make sense of how emerging technologies are reshaping not only the sciences but knowledge and society more broadly.

The book’s chapters, written by established researchers from diverse fields, aim to avoid jargon and remain accessible to non-specialists without compromising depth. The book offers an insight into how physics remains foundational across scientific domains, and considers the social, ethical and philosophical implications of AI-driven science.

The first section, “New Physics World”, will be the most familiar terrain for physicists. Ugo Moschella’s essay, “What Are Things Made of? The History of Particles from Thales to Higgs”, opens with a sweeping yet grounded narrative of how metaphysical questions have persisted alongside empirical discoveries. He draws a bold parallel between the ancient idea of mass emerging from a cosmic vortex and the Higgs mechanism, a poetic analogy that holds surprising resonance. Thales, who lived roughly from 624 to 545 BCE, proposed that water is the fundamental substance out of which all others are formed. Following his revelation, Pythagoras and Empedocles added three more items to complete the list of the elements: earth, air and fire. Aristotle added a fifth element: the “aether”. The physical foundation of the standard cosmological model of the ancient world is then rooted in the Aristotelian conceptions of movement and gravity, argues Moschella. His essay lays the groundwork for future chapters that explore entanglement, computation and the transition from thought experiments to quantum technology and AI.

A broad and ambitious survey spanning particle physics, evolutionary biology, neuroscience and AI ethics

The second and third sections venture into evolutionary genetics, epigenetics (the study of heritable changes in gene expression) and neuroscience – areas more peripheral to physics, but timely nonetheless. Contributions by Eva Jablonka, evolutionary theorist and geneticist from Tel Aviv University, and Telmo Pievani, a biologist from the University of Padua, explore the biological implications of gene editing, environmental inheritance and self-directed evolution, as well as the ever-blurring boundaries between what is considered “natural” versus “artificial”. The authors propose that the human ability to edit genes is itself an evolutionary agent – a novel and unsettling idea, as this would be an evolution driven by a will and not by chance. Neuroscientist Jason D Runyan reflects compellingly on free will in the age of AI, blending empirical work with philosophical questions. These chapters enrich the central inquiry of what it means to be a “knowing agent”: someone who acts on nature according to its will, influenced by biological, cognitive and social factors. For physicists, the lesson may be less about adopting specific methods and more about recognising how their own field’s assumptions – about determinism, emergence or complexity – are echoed and challenged in the life sciences.

Perspectives on AI

The fourth section, “Artificial Intelligence Perspectives”, most directly addresses the book’s central theme. The quality, scientific depth and rigour are not evenly distributed across these chapters, but they are stimulating nonetheless. Topics range from the role of open-source AI in student-led projects at CERN’s IdeaSquare to real-time astrophysical discovery. Michael Coughlin and colleagues’ chapter on accelerated AI in astrophysics stands out for its technical clarity and relevance, a solid entry point for physicists curious about AI beyond popular discourse. Absent is an in-depth treatment of current AI applications in high-energy physics, such as anomaly detection in LHC triggers or generative models for simulation. Given the book’s CERN affiliations, this omission is surprising and leaves out some of the most active intersections of AI and high-energy physics (HEP) research.

Even as AI expands our modelling capacity, the epistemic limits of human cognition may remain permanent

The final sections address cosmological mysteries and the epistemological limits of human cognition. David H Wolpert’s epilogue, “What Can We Know About That Which We Cannot Even Imagine?”, serves as a reminder that even as AI expands our modelling capacity, the epistemic limits of human cognition – including conceptual blind spots and unprovable truths – may remain permanent. This tension is not a contradiction but a sobering reflection on the intrinsic boundaries of scientific – and more widely human – knowledge.

This eclectic volume is best read as a reflective companion to one’s own work. For advanced students, postdocs and researchers open to thinking beyond disciplinary boundaries, the book is an enriching, if at times uneven, read.

To a professional scientist, the book occasionally romanticises interdisciplinary exchange between specialised fields without fully engaging with the real methodological difficulties of translating complex concepts to the other sciences. Topics including the limitations of current large-language models, the reproducibility crisis in AI research, and the ethical risks of data-driven surveillance would have benefited from deeper treatment. Ethical questions in HEP may be less prominent in the public eye, but still exist. To mention a few, there are the environmental impact of large-scale facilities, the question of spending a substantial amount of public money on such mega-science projects, the potential dual-use concerns of the technologies developed, the governance of massive international collaborations and data transparency. These deserve more attention, and the book could have explored them more thoroughly.

A timely snapshot

Still, the book doesn’t pretend to be exhaustive. Its strength lies in curating diverse voices and offering a timely snapshot of science, as well as shedding light on ethical and philosophical questions associated with science that are less frequently discussed.

There is a vast knowledge gap in today’s society. Researchers often become so absorbed in their specific domains that they lose sight of their work’s broader philosophical and societal context and the need to explain it to the public. Meanwhile, public misunderstanding of science, and the resulting confusion between fact, theory and opinion, is growing. This gulf provides fertile ground for political manipulation and ideological extremism. New Frontiers in Science in the Era of AI has the immense merit of trying to bridge that gap. The editors and contributors deserve credit for producing a work of both scientific and societal relevance.

Accelerators on autopilot

Verena Kain highlights four ways machine learning is making the LHC more efficient.

The James Webb Space Telescope and the LHC

Particle accelerators can be surprisingly temperamental machines. Expertise, specialisation and experience are needed to maintain their performance. Nonlinear and resonant effects keep accelerator engineers and physicists up late into the night. With so many variables to juggle and fine-tune, even the most seasoned experts will be stretched by future colliders. Can artificial intelligence (AI) help?

Proposed solutions take inspiration from space telescopes. The two fields have been jockeying to innovate since the Hubble Space Telescope launched with minimal automation in 1990. In the 2000s, multiple space missions tested AI for fault detection and onboard decision-making, before the LHC took a notable step forward for colliders in the 2010s by incorporating machine learning (ML) in trigger decisions. Most recently, the James Webb Space Telescope launched in 2021 using AI-driven autonomous control systems for mirror alignment, thermal balancing and scheduling science operations with minimal intervention from the ground. The new Efficient Particle Accelerators project at CERN, which I have led since its approval in 2023, is now rolling out AI at scale across CERN’s accelerator complex (see “Dynamic and adaptive” image).

AI-driven automation will only become more necessary in the future. As well as being unprecedented in size and complexity, future accelerators will also have to navigate new constraints such as fluctuating energy availability from intermittent sources like wind and solar power, requiring highly adaptive and dynamic machine operation. This would represent a step change in complexity and scale. A new equipment integration paradigm would automate accelerator operation, equipment maintenance, fault analysis and recovery. Every item of equipment will need to be fully digitalised and able to auto-configure, auto-stabilise, auto-analyse and auto-recover. Like a driverless car, instrumentation and software layers must also be added for safe and efficient performance.

On-site human intervention in the LHC could be treated as a last resort – or perhaps designed out entirely

The final consideration is full virtualisation. While space telescopes are famously inaccessible once deployed, a machine like the Future Circular Collider (FCC) would present similar challenges. Given the scale and number of components, on-site human intervention should be treated as a last resort – or perhaps designed out entirely. This requires a new approach: equipment must be engineered for autonomy from the outset – with built-in margins, high reliability, modular designs and redundancy. Emerging technologies like robotic inspection, automated recovery systems and digital twins will play a central role in enabling this. A digital twin – a real-time, data-driven virtual replica of the accelerator – can be used to train and constrain control algorithms, test scenarios safely and support predictive diagnostics. Combined with differentiable simulations and layered instrumentation, these tools will make autonomous operation not just feasible, but optimal.

The field is moving fast. Recent advances allow us to rethink how humans interact with complex machines – not by tweaking hardware parameters, but by expressing intent at a higher level. Generative pre-trained transformers, a class of large language models, open the door to prompting machines with concepts rather than step-by-step instructions. While further R&D is needed for robust AI copilots, tailor-made ML models have already become standard tools for parameter optimisation, virtual diagnostics and anomaly detection across CERN’s accelerator landscape.

Progress is diverse. AI can reconstruct LHC bunch profiles using signals from wall current monitors, analyse camera images to spot anomalies in the “dump kickers” that safely remove beams, or even identify malfunctioning beam-position monitors. In the following, I describe four types of AI that have been successfully deployed across CERN’s accelerator complex. They are merely the harbingers of a whole new way of operating CERN’s accelerators.

1. Beam steering with reinforcement learning

In 2020, LINAC4 became the new first link in the LHC’s modernised proton accelerator chain – and quickly became an early success story for AI-assisted control in particle accelerators.

Small deviations in a particle beam’s path within the vacuum chamber can have a significant impact, including beam loss, equipment damage or degraded beam quality. Beams must stay precisely centred in the beampipe to maintain stability and efficiency. But their trajectory is sensitive to small variations in magnet strength, temperature, radiofrequency phase and even ground vibrations. Worse still, errors typically accumulate along the accelerator, compounding the problem. Beam-position monitors (BPMs) provide measurements at discrete points – often noisy – while steering corrections are applied via small dipole corrector magnets, typically using model-based correction algorithms.
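
For context, the model-based corrections mentioned above are usually computed from an orbit response matrix. The sketch below shows the standard pseudo-inverse (SVD) approach on made-up numbers; it illustrates the classical baseline only and is not the actual LINAC4 software:

```python
import numpy as np

rng = np.random.default_rng(0)
n_bpms, n_correctors = 20, 8

# Hypothetical response matrix R: orbit shift at each BPM (mm) per unit kick
# of each corrector magnet (mrad); in practice it comes from the optics model
# or from measured orbit responses.
R = rng.normal(size=(n_bpms, n_correctors))
orbit = rng.normal(scale=2.0, size=n_bpms)          # noisy BPM readings, mm

# The pseudo-inverse gives the corrector kicks that minimise the RMS orbit;
# truncating small singular values (rcond) avoids amplifying BPM noise.
kicks = -np.linalg.pinv(R, rcond=1e-2) @ orbit
corrected = orbit + R @ kicks

rms = lambda x: np.sqrt(np.mean(x ** 2))
print(f"RMS orbit: {rms(orbit):.2f} mm before, {rms(corrected):.2f} mm after")
```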

Beam steering

In 2019, the reinforcement learning (RL) algorithm normalised advantage function (NAF) was trained online to steer the H⁻ beam in the horizontal plane of LINAC4 during commissioning. In RL, an agent learns by interacting with its environment and receiving rewards that guide it toward better decisions. NAF uses a neural network to model the so-called Q-function, which estimates the expected future reward, and uses this to continuously refine its control policy.
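
A minimal sketch of the NAF idea follows, assuming a PyTorch environment and simplifying to a diagonal precision matrix (the published algorithm uses a full lower-triangular parametrisation). The point of the quadratic advantage term is that the action maximising Q is simply mu(s), so the greedy corrector settings can be read off in closed form:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiagonalNAF(nn.Module):
    """Q(s, a) = V(s) - 0.5 * sum_i p_i(s) * (a_i - mu_i(s))**2."""
    def __init__(self, n_bpms, n_correctors, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_bpms, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.value = nn.Linear(hidden, 1)            # state value V(s)
        self.mu = nn.Linear(hidden, n_correctors)    # greedy corrector settings
        self.prec = nn.Linear(hidden, n_correctors)  # diagonal precision (pre-softplus)

    def forward(self, bpm_readings, corrector_settings):
        h = self.body(bpm_readings)
        v = self.value(h)
        mu = torch.tanh(self.mu(h))                  # actions normalised to [-1, 1]
        p = F.softplus(self.prec(h)) + 1e-6          # keep the precision positive
        adv = -0.5 * (p * (corrector_settings - mu) ** 2).sum(-1, keepdim=True)
        return v + adv, mu                           # Q-value and greedy action

# Toy usage: 10 BPM readings as the state, 5 corrector magnets as the action.
net = DiagonalNAF(n_bpms=10, n_correctors=5)
q, greedy = net(torch.randn(32, 10), torch.rand(32, 5) * 2 - 1)
```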

Initially, the algorithm required many attempts to find an effective strategy, and in early iterations it occasionally worsened the beam trajectory, but as training progressed, performance improved rapidly. Eventually, the agent steered the beam to better than the target RMS deviation of 1 mm (see “Beam steering” figure).

This experiment demonstrated that RL can learn effective control policies for accelerator-physics problems within a reasonable amount of time. The agent was fully trained after about 300 iterations, or 30 minutes of beam time, making online training feasible. Since 2019, the use of AI techniques has expanded significantly across accelerator labs worldwide, targeting more and more problems that don’t have any classical solution. At CERN, tools such as GeOFF (Generic Optimisation Framework and Frontend) have been developed to standardise and scale these approaches throughout the accelerator complex.

2. Efficient injection with Bayesian optimisation

Bayesian optimisation (BO) is a global optimisation technique that uses a probabilistic model to find the optimal parameters of a system by balancing exploration and exploitation, making it ideal for expensive or noisy evaluations. A game-changing example of its use is the record-breaking LHC ion run in 2024. BO was extensively used all along the ion chain, and made a significant difference in LEIR (the low-energy ion ring, the first synchrotron in the chain) and in the Super Proton Synchrotron (SPS, the last accelerator before the LHC). In LEIR, most processes are no longer manually optimised, but the multi-turn injection process is still non-trivial and depends on various longitudinal and transverse parameters from its injector LINAC3.

Quick recovery

In heavy-ion accelerators, particles are injected in a partially stripped charge state and must be converted to higher charge states at different stages for efficient acceleration. In the LHC ion injector chain, the stripping foil between LINAC3 and LEIR raises the charge of the lead ions from Pb27+ to Pb54+. A second stripping foil, between the PS and SPS, fully ionises the beam to Pb82+ ions for final acceleration toward the LHC. These foils degrade over time due to thermal stress, radiation damage and sputtering, and must be remotely exchanged using a rotating wheel mechanism. Because each new foil has slightly different stripping efficiency and scattering properties, beam transmission must be re-optimised – a task that traditionally required expert manual tuning.

In 2024 it was successfully demonstrated that BO with embedded physics constraints can efficiently optimise the 21 most important parameters between LEIR and the LINAC3 injector. Following a stripping foil exchange, the algorithm restored the accumulated beam intensity in LEIR to better than nominal levels within just a few dozen iterations (see “Quick recovery” figure).
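
The flavour of the method can be seen in a toy loop: fit a Gaussian-process surrogate to the settings tried so far, then pick the next setting by minimising a lower confidence bound, which trades off exploration against exploitation. This is a plain scikit-learn sketch with an invented objective, standing in for the GeOFF-style tooling mentioned above rather than reproducing it:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(1)

# Invented objective: "minus the accumulated intensity" as a function of two
# injection-line settings (the real LEIR problem tunes about 21 of them).
def objective(x):
    return -np.exp(-8.0 * np.sum((x - 0.3) ** 2)) + 0.05 * rng.normal()

X = rng.uniform(0, 1, size=(5, 2))                # a few random initial settings
y = np.array([objective(x) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(30):
    gp.fit(X, y)                                  # surrogate model of the objective
    cand = rng.uniform(0, 1, size=(500, 2))       # random candidate settings
    mu, sigma = gp.predict(cand, return_std=True)
    x_next = cand[np.argmin(mu - 2.0 * sigma)]    # lower-confidence-bound acquisition
    X, y = np.vstack([X, x_next]), np.append(y, objective(x_next))

print("best settings found:", X[np.argmin(y)])    # converges towards (0.3, 0.3)
```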

This example shows how AI can now match or outperform expert human tuning, significantly reducing recovery time, freeing up operator bandwidth and improving overall machine availability.

3. Adaptively correcting the 50 Hz ripple

In high-precision accelerator systems, even tiny perturbations can have significant effects. One such disturbance is the 50 Hz ripple in power supplies – small periodic fluctuations in current that originate from the electrical grid. While these ripples were historically only a concern for slow-extracted proton beams sent to fixed-target experiments, 2024 revealed a broader impact.

SPS intensity

In the SPS, adaptive Bayesian optimisation (ABO) was deployed to control this ripple in real time. ABO extends BO by learning the objective not only as a function of the control parameters, but also as a function of time, which then allows continuous control through forecasting.

The algorithm generated shot-by-shot feed-forward corrections to inject precise counter-noise into the voltage regulation of one of the quadrupole magnet circuits. This approach was already in use for the North Area proton beams, but in summer 2024 it was discovered that even for high-intensity proton beams bound for the LHC, the same ripple could contribute to beam losses at low energy.

Thanks to existing ML frameworks, prior experience with ripple compensation and available hardware for active noise injection, the fix could be implemented quickly. While the gains for protons were modest – around 1% improvement in losses – the impact for LHC ion beams was far more dramatic. Correcting the 50 Hz ripple increased ion transmission by more than 15%. ABO is therefore now active whenever ions are accelerated, improving transmission and supporting the record beam intensity achieved in 2024 (see “SPS intensity” figure).

4. Predicting hysteresis with transformers

Another outstanding issue in today’s multi-cycling synchrotrons with iron-dominated electromagnets is correcting for magnetic hysteresis – a phenomenon where the magnetic field depends not only on the current but also on its cycling history. Cumbersome mitigation strategies include playing dummy cycles and manually re-tuning parameters after each change in magnetic history.

SPS hysteresis

While phenomenological hysteresis models exist, their accuracy is typically insufficient for precise beam control. ML offers a path forward, especially when supported by high-quality field measurement data. Recent work using temporal fusion transformers – a deep-learning architecture designed for multivariate time-series prediction – has demonstrated that ML-based models can accurately predict field deviations from the programmed transfer function across different SPS magnetic cycles (see “SPS hysteresis” figure). This hysteresis model is now used in the SPS control room to provide feed-forward corrections – pre-emptive adjustments to magnet currents based on the predicted magnetic state – ensuring field stability without waiting for feedback from beam measurements and manual adjustments.
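
As a much-simplified stand-in for the temporal fusion transformer (which also ingests the known future programme of the cycle and static metadata), a small recurrent model illustrates the shape of the problem: a history of magnet currents goes in, a predicted field deviation comes out, and that prediction drives the feed-forward correction. PyTorch is assumed and all names are illustrative:

```python
import torch
import torch.nn as nn

class FieldErrorPredictor(nn.Module):
    """Predict the deviation of the magnetic field from the programmed
    transfer function, given the recent history of magnet currents."""
    def __init__(self, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, current_history):       # shape (batch, history_len, 1)
        _, h = self.encoder(current_history)
        return self.head(h[-1])               # one predicted field error per sample

model = FieldErrorPredictor()
dummy_histories = torch.randn(8, 256, 1)      # eight synthetic current histories
print(model(dummy_histories).shape)           # torch.Size([8, 1])
```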

A blueprint for the future

With the Efficient Particle Accelerators project, CERN is developing a blueprint for the next generation of autonomous equipment. This includes concepts for continuous self-analysis, anomaly detection and new layers of “Internet of Things” instrumentation that support auto-configuration and predictive maintenance. The focus is on making it easier to integrate smart software layers. Full results are expected by the end of LHC Run 3, with robust frameworks ready for deployment in Run 4.

AI can now match or outperform expert human tuning, significantly reducing recovery time and improving overall machine availability

The goal is ambitious: to reduce maintenance effort by at least 50% wherever these frameworks are applied. This is based on a realistic assumption – already today, about half of all interventions across the CERN accelerator complex are performed remotely, a number that continues to grow. With current technologies, many of these could be fully automated.

Together, these developments will not only improve the operability and resilience of today’s accelerators, but also lay the foundation for CERN’s future machines, where human intervention during operation may become the exception rather than the rule. AI is set to transform how we design, build and operate accelerators – and how we do science itself. It opens the door to new models of R&D, innovation and deep collaboration with industry. 

Machine learning in industry

Antoni Shtipliyski offers advice on how early-career researchers can transition into machine-learning roles in industry.

Antoni Shtipliyski

In the past decade, machine learning has surged into every corner of industry, from travel and transport to healthcare and finance. For early-career researchers, who have spent their PhDs and postdocs coding, a job in machine learning may seem a natural next step.

“Scientists often study nature by attempting to model the world around us into mathematical models and computer code,” says Antoni Shtipliyski, engineering manager at Skyscanner. “But that’s only one part of the story if the aim is to apply these models to large-scale research questions or business problems. A completely orthogonal set of challenges revolves around how people collaborate to build and operate these systems. That’s where the real work begins.”

Used to large-scale experiments and collaborative problem solving, particle physicists are uniquely well-equipped to step into machine-learning roles. Shtipliyski worked on upgrades for the level-1 trigger system of the CMS experiment at CERN, before leaving to lead the machine-learning operations team in one of the biggest travel companies in the world.

Effective mindset

“At CERN, building an experimental detector is just the first step,” says Shtipliyski. “To be useful, it needs to be operated effectively over a long period of time. That’s exactly the mindset needed in industry.”

During his time as a physicist, Shtipliyski gained multiple skills that continue to help him at work today, but he also had to develop new skills in other areas to succeed in machine learning in industry. One critical gap in a physicist’s portfolio, he notes, is that many people interpret machine-learning careers as purely algorithmic development and model training.

“At Skyscanner, my team doesn’t build models directly,” he says. “We look after the platform used to push and serve machine-learning models to our users. We oversee the techno-social machine that delivers these models to travellers. That’s the part people underestimate, and where a lot of the challenges lie.”

An important factor for physicists transitioning out of academia is to understand the entire lifecycle of a machine-learning project. This includes not only developing an algorithm, but deploying it, monitoring its performance, adapting it to changing conditions and ensuring that it serves business or user needs.

Learning to write and communicate yourself is incredibly powerful

“In practice, you often find new ways that machine-learning models surprise you,” says Shtipliyski. “So having flexibility and confidence that the evolved system still works is key. In physics we’re used to big experiments like CMS being designed 20 years before being built. By the time it’s operational, it’s adapted so much from the original spec. It’s no different with machine-learning systems.”

This ability to live with ambiguity and work through evolving systems is one of the strongest foundations physicists can bring. But large complex systems cannot be built alone, so companies will be looking for examples of soft skills: teamwork, collaboration, communication and leadership.

“Most people don’t emphasise these skills, but I found them to be among the most useful,” Shtipliyski says. “Learning to write and communicate yourself is incredibly powerful. Being able to clearly express what you’re doing and why you’re doing it, especially in high-trust environments, makes everything else easier. It’s something I also look for when I do hiring.”

Industry may not offer the same depth of exploration as academia, but it does offer something equally valuable: breadth, variety and a dynamic environment. Work evolves fast, deadlines come more readily and teams are constantly changing.

“In academia, things tend to move more slowly. You’re encouraged to go deep into one specific niche,” says Shtipliyski. “In industry, you often move faster and are sometimes more shallow. But if you can combine the depth of thought from academia with the breadth of experience from industry, that’s a winning combination.”

Applied skills

For physicists eyeing a career in machine learning, the best first step is to familiarise themselves with the tools and practices for building and deploying models. Show that you can take the skills developed in academia and apply them to other environments. This tells recruiters that you have a willingness to learn, and is a simple but effective way of demonstrating commitment to a project from start to finish, beyond your assigned work.

“People coming from physics or mathematics might want to spend more time on implementation,” says Shtipliyski. “Even if you follow a guided walkthrough online, or complete classes on Coursera, going through the whole process of implementing things from scratch teaches you a lot. This puts you in a position to reason about the big picture and shows employers your willingness to stretch yourself, to make trade-offs and to evaluate your work critically.”

A common misconception is that practising machine learning outside of academia is somehow less rigorous or less meaningful. But in many ways, it can be more demanding.

“Scientific development is often driven by arguments of beauty and robustness. In industry, there’s less patience for that,” he says. “You have to apply it to a real-world domain – finance, travel, healthcare. That domain shapes everything: your constraints, your models, even your ethics.”

Shtipliyski emphasises that the technical side of machine learning is only one half of the equation. The other half is organisational: helping teams work together, navigate constraints and build systems that evolve over time. Physicists would benefit from exploring different business domains to understand how machine learning is used in different contexts. For example, GDPR constraints make privacy a critical issue in healthcare and tech. Learning how government funding is distributed throughout each project, as well as understanding how to build a trusting relationship between the funding agencies and the team, is equally important.

“A lot of my day-to-day work is just passing information, helping people build a shared mental model,” he says. “Trust is earned by being vulnerable yourself, which allows others to be vulnerable in turn. Once that happens, you can solve almost any problem.”

Taking the lead

Particle physicists are used to working in high-stakes, international teams, so this collaborative mindset is engrained in their training. But many may not have had the opportunity to lead, manage or take responsibility for an entire project from start to finish.

“In CMS, I did not have a lot of say due to the complexity and scale of the project, but I was able to make meaningful contributions in the validation and running of the detector,” says Shtipliyski. “But what I did not get much exposure to was the end-to-end experience, and that’s something employers really want to see.”

This does not mean you need to be a project manager to gain leadership experience. Early-career researchers have the chance to up-skill by mentoring a newcomer, proactively improving the team’s workflow, or networking with other physicists and thinking outside the box.

You can be the dedicated expert in the room, even if you’re new. That feels really empowering

“Even if you just shadow an existing project, if you can talk confidently about what was done, why it was done and how it might be done differently – that’s huge.”

Many early-career researchers hesitate prior to leaving academia. They worry about making the “wrong” choice, or being labelled as a “finance person” or “tech person” as soon as they enter another industry. This is something Shtipliyski struggled to reckon with, but eventually realised that such labels do not define you.

“It was tough at CERN trying to anticipate what comes next,” he admits. “I thought that I could only have one first job. What if it’s the wrong one? But once a scientist, always a scientist. You carry your experiences with you.”

Shtipliyski quickly learnt that industry operates under a different set of rules: everyone comes from a different background, and levels of expertise differ from person to person. Having faced intense imposter syndrome at CERN – where he shared spaces with world-leading experts – Shtipliyski found that industry offered a more level playing field.

“In academia, there’s a kind of ladder: the longer you stay, the better you get. In industry, it’s not like that,” says Shtipliyski. “You can be the dedicated expert in the room, even if you’re new. That feels really empowering.”

Industry rewards adaptability as much as expertise. For physicists stepping beyond academia, the challenge is not abandoning their training, but expanding it – learning to navigate ambiguity, communicate clearly and understand the full lifecycle of real-world systems. Harnessing a scientist’s natural curiosity, and demonstrating flexibility, allows the transition to become less about leaving science behind, and more about discovering new ways to apply it.

“You are the collection of your past experiences,” says Shtipliyski. “You have the freedom to shape the future.”

An international year like no other

The International Year of Quantum inaugural event was organised at UNESCO Headquarters in Paris in February 2025.

Last June, the United Nations and UNESCO proclaimed 2025 the International Year of Quantum (IYQ): here is why it really matters.

Everything started a century ago, when scientists like Niels Bohr, Max Planck and Wolfgang Pauli, but also Albert Einstein, Erwin Schrödinger and many others, came up with ideas that would revolutionise our description of the subatomic world. This is when physics transitioned from being a deterministic discipline to a mostly probabilistic one, at least when we look at subatomic scales. Brave predictions of weird behaviours started to attract the attention of an ever larger part of the scientific community, and continued to appear decade after decade. The most popular are particle entanglement, the superposition of states and the tunnelling effect. These are also some of the most impactful quantum effects, in terms of the technologies that emerged from them.

One hundred years on, the scientific community is somewhat acclimatised to observing and measuring the probabilistic nature of particles and quanta. Lasers, MRI and even sliding doors would not exist without the pioneering studies on quantum mechanics. However, it is widely held that today we are on the edge of a second quantum revolution.

“International years” are proclaimed to raise awareness, focus global attention, encourage cooperation and mobilise resources towards a certain topic or research domain. The International Year of Quantum also aims to reverse the approach taken with artificial intelligence (AI), a technology that came along faster than any attempt to educate and prepare the layperson for its adoption. As we know, this is creating a lot of scepticism towards AI, which is often felt to be too complex and designed to generate a loss of control in its users.

The second quantum revolution has begun and we are at the dawn of future powerful applications

The second quantum revolution has begun in recent years and, while we are rapidly moving from simply using the properties of the quantum world to controlling individual quantum systems, we are still at the dawn of future powerful applications. Some quantum sensors are already being used, and quantum cryptography is quite well understood. However, quantum bits need further studies and the exploration of other quantum fields has not even started yet.

Unlike AI, we still have time to push for a more inclusive approach to the development of new technology. During the international year, hundreds of events, workshops and initiatives will emphasise the role of global collaboration in the development of accessible quantum technologies. Through initiatives like the Quantum Technology Initiative (QTI) and the Open Quantum Institute (OQI), CERN is actively contributing not only to scientific research but also to promoting the advancement of its applications for the benefit of society.

The IYQ inaugural event was organised at UNESCO Headquarters in Paris in February 2025. At CERN, this year’s public event season is devoted to the quantum year, and will present talks, performances, a film festival and more. The full programme is available at visit.cern/events.

Game on for physicists

Raphael Granier de Cassagnac discusses opportunities for particle physicists in the gaming industry.

Raphael Granier de Cassagnac and Exographer

“Confucius famously may or may not have said: ‘When I hear, I forget. When I see, I remember. When I do, I understand.’ And computer-game mechanics can be inspired directly by science. Study it well, and you can invent game mechanics that allow you to engage with and learn about your own reality in a way you can’t when simply watching films or reading books.”

So says Raphael Granier de Cassagnac, a research director at France’s CNRS and Ecole Polytechnique, as well as a member of the CMS collaboration at the LHC. Granier de Cassagnac is also the creative director of Exographer, a science-fiction computer game that draws on concepts from particle physics and is available on Steam, Switch, PlayStation 5 and Xbox.

“To some extent, it’s not too different from working at a place like CMS, which is also a super complicated object,” explains Granier de Cassagnac. Developing a game often requires graphic artists, sound designers, programmers and science advisors. To keep a detector like CMS running, you need engineers, computer scientists, accelerator physicists and funding agencies. And that’s to name just a few. Even if you are not the primary game designer or principal investigator, understanding the fundamentals is crucial to keep the project running efficiently.

Root skills

Most physicists already have some familiarity with structured programming and data handling, which eases the transition into game development. Just as tools like ROOT and Geant4 serve as libraries for analysing particle collisions, game engines such as Unreal, Unity or Godot provide a foundation for building games. Prebuilt functionalities are used to refine the game mechanics.

“Physicists are trained to have an analytical mind, which helps when it comes to organising a game’s software,” explains Granier de Cassagnac. “The engine is merely one big library, and you never have to code anything super complicated, you just need to know how to use the building blocks you have and code in smaller sections to optimise the engine itself.”

While coding is an essential skill for game production, it is not enough to create a compelling game. Game design demands storytelling, character development and world-building. Structure, coherence and the ability to guide an audience through complex information are also required.

“Some games are character-driven, others focus more on the adventure or world-building,” says Granier de Cassagnac. “I’ve always enjoyed reading science fiction and playing role-playing games like Dungeons and Dragons, so writing for me came naturally.”

Entrepreneurship and collaboration are also key skills, as it is increasingly rare for developers to create games independently. Universities and startup incubators can provide valuable support through funding and mentorship. Incubators can help connect entrepreneurs with industry experts, and bridge the gap between scientific research and commercial viability.

“Managing a creative studio and a company, as well as selling the game, was entirely new for me,” recalls Granier de Cassagnac. “While working at CMS, we always had long deadlines and low pressure. Physicists are usually not prepared for the speed of the industry at all. Specialised offices in most universities can help with valorisation – taking scientific research and putting it on the market. You cannot forget that your academic institutions are still part of your support network.”

Though challenging to break into, opportunity abounds for those willing to upskill

The industry is fiercely competitive, with more games being released than players can consume, but a well-crafted game with a unique vision can still break through. A common mistake made by first-time developers is releasing their game too early. No matter how innovative the concept or engaging the mechanics, a game riddled with bugs frustrates players and damages its reputation. Even with strong marketing, a rushed release can lead to negative reviews and refunds – sometimes sinking a project entirely.

“In this industry, time is money and money is time,” explains Granier de Cassagnac. But though challenging to break into, opportunity abounds for those willing to upskill, with the gaming industry worth almost $200 billion a year and reaching more than three billion players worldwide by Granier de Cassagnac’s estimation. The most important aspects for making a successful game are originality, creativity, marketing and knowing the engine, he says.

“Learning must always be part of the process; without it we cannot improve,” adds Granier de Cassagnac, referring to his own upskilling for the company’s next project, which will be even more ambitious in its scientific coverage. “In the next game we want to explore the world as we know it, from the Big Bang to the rise of technology. We want to tell the story of humankind.”

The triggering of tomorrow

The third TDHEP workshop explored how triggers can cope with high data rates.

The third edition of Triggering Discoveries in High Energy Physics (TDHEP) attracted 55 participants to Slovakia’s High Tatras mountains from 9 to 13 December 2024. The workshop is the only conference dedicated to triggering in high-energy physics, and follows previous editions in Jammu, India in 2013 and Puebla, Mexico in 2018. Given the upcoming High-Luminosity LHC (HL-LHC) upgrade, discussions focused on how trigger systems can be enhanced to manage high data rates while preserving physics sensitivity.

Triggering systems play a crucial role in filtering the vast amounts of data generated by modern collider experiments. A good trigger design selects features in the event sample that greatly enrich the proportion of the desired physics processes in the recorded data. The key considerations are timing and selectivity. Timing has long been at the core of experiment design – detectors must capture data at the appropriate time to record an event. Selectivity has been a feature of triggering for almost as long. Recording an event makes demands on running time and data-acquisition bandwidth, both of which are limited.

Evolving architecture

Thanks to detector upgrades and major changes in the cost and availability of fast data links and storage, the past 10 years have seen an evolution in LHC triggers away from hardware-based decisions using coarse-grain information.

Detector upgrades mean higher granularity and better time resolution, improving the precision of the trigger algorithms and the ability to resolve the problem of having multiple events in a single LHC bunch crossing (“pileup”). Such upgrades allow more precise initial-level hardware triggering, bringing the event rate down to a level where events can be reconstructed for further selection via high-level trigger (HLT) systems.

To take advantage of modern computer architecture more fully, HLTs use both graphics processing units (GPUs) and central processing units (CPUs) to process events. In ALICE and LHCb this leads to essentially triggerless access to all events, while in ATLAS and CMS hardware selections are still important. All HLTs now use machine learning (ML) algorithms, with the ATLAS and CMS experiments even considering their use at the first hardware level.

ATLAS and CMS are primarily designed to search for new physics. At the end of Run 3, upgrades to both experiments will significantly enhance granularity and time resolution to handle the high-luminosity environment of the HL-LHC, which will deliver up to 200 interactions per LHC bunch crossing. Both experiments achieved efficient triggering in Run 3, but higher luminosities, difficult-to-distinguish physics signatures, upgraded detectors and increasingly ambitious physics goals call for advanced new techniques. The step change will be significant. At HL-LHC, the first-level hardware trigger rate will increase from the current 100 kHz to 1 MHz in ATLAS and 760 kHz in CMS. The price to pay is increasing the latency – the time delay between input and output – to 10 µsec in ATLAS and 12.5 µsec in CMS.

The proposed trigger systems for ATLAS and CMS are predominantly FPGA-based, employing highly parallelised processing to crunch huge data streams efficiently in real time. Both will be two-level triggers: a hardware trigger followed by a software-based HLT. The ATLAS hardware trigger will utilise full-granularity calorimeter and muon signals in the global-trigger-event processor, using advanced ML techniques for real-time event selection. In addition to calorimeter and muon data, CMS will introduce a global track trigger, enabling real-time tracking at the first trigger level. All information will be integrated within the global-correlator trigger, which will extensively utilise ML to enhance event selection and background suppression.

Substantial upgrades

The other two big LHC experiments already implemented substantial trigger upgrades at the beginning of Run 3. The ALICE experiment is dedicated to studying the strong interactions of the quark–gluon plasma – a state of matter in which quarks and gluons are not confined in hadrons. The detector was upgraded significantly for Run 3, including the trigger and data-acquisition systems. The ALICE continuous readout can cope with 50 kHz for lead ion–lead ion (PbPb) collisions and several MHz for proton–proton (pp) collisions. In PbPb collisions the full data is continuously recorded and stored for offline analysis, while for pp collisions the data is filtered.

Unlike in Run 2, where the hardware trigger reduced the data rate to several kHz, Run 3 uses an online software trigger that is a natural part of the common online–offline computing framework. The raw data from detectors is streamed continuously and processed in real time using high-performance FPGAs and GPUs. ML plays a crucial role in the heavy-flavour software trigger, which is one of the main physics interests. Boosted decision trees are used to identify displaced vertices from heavy quark decays. The full chain from saving raw data in a 100 PB buffer to selecting events of interest and removing the original raw data takes about three weeks and was fully employed last year.
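
To make the selection concrete, the sketch below trains a boosted decision tree on a handful of vertex-level features (decay-length significance, track impact parameter and candidate mass) and keeps candidates above a score threshold. The features, thresholds and toy data are illustrative assumptions, not the ALICE implementation.

```python
# Minimal sketch of a BDT-based displaced-vertex selection (illustrative only;
# feature names, thresholds and toy data are assumptions, not the ALICE code).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(42)
n = 10_000

# Toy features: decay-length significance, track impact parameter (cm)
# and candidate invariant mass (GeV). Signal = displaced heavy-flavour decay.
def make_sample(displaced):
    decay_len_sig = rng.exponential(8.0 if displaced else 1.5, n)
    d0_track = np.abs(rng.normal(0.02 if displaced else 0.0, 0.01, n))
    mass = rng.normal(1.865, 0.02 if displaced else 0.08, n)  # D0-like peak vs broad background
    return np.column_stack([decay_len_sig, d0_track, mass])

X = np.vstack([make_sample(True), make_sample(False)])
y = np.concatenate([np.ones(n), np.zeros(n)])

bdt = GradientBoostingClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
bdt.fit(X, y)

# In a software trigger, only candidates above a score threshold would be kept.
scores = bdt.predict_proba(X)[:, 1]
keep = scores > 0.9
print(f"selected fraction: {keep.mean():.3f}")
```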

The third edition of TDHEP suggests that innovation in this field is only set to accelerate

The LHCb experiment focuses on precision measurements in heavy-flavour physics. A typical example is measuring the probability of a particle decaying into a certain decay channel. In Run 2 the hardware trigger tended to saturate in many hadronic channels as the instantaneous luminosity increased. To solve this issue for Run 3 a high-level software trigger was developed that can handle 30 MHz event readout with a 4 TB/s data flow. A GPU-based partial event reconstruction and primary selection of displaced tracks and vertices (HLT1) reduces the output data rate to 1 MHz. The calibration and detector alignment (embedded into the trigger system) are calculated during data taking just after HLT1 and feed full-event reconstruction (HLT2), which reduces the output rate to 20 kHz. This represents 10 GB/s written to disk for later analysis.
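
The quoted rates imply a simple bandwidth budget, sketched below; the average event sizes are back-of-envelope numbers derived from the figures in the text, not official LHCb specifications.

```python
# Back-of-envelope data-rate budget for a two-stage software trigger,
# using the rates quoted in the text (event sizes are derived, not official).
readout_rate_hz = 30e6      # full detector readout
readout_bandwidth = 4e12    # bytes per second (4 TB/s)
hlt1_output_hz = 1e6        # after GPU-based partial reconstruction
hlt2_output_hz = 20e3       # after full event reconstruction
disk_bandwidth = 10e9       # bytes per second written to disk (10 GB/s)

avg_raw_event = readout_bandwidth / readout_rate_hz    # ~130 kB per raw event
avg_stored_event = disk_bandwidth / hlt2_output_hz     # ~500 kB per stored event

print(f"raw event size      ~ {avg_raw_event/1e3:.0f} kB")
print(f"stored event size   ~ {avg_stored_event/1e3:.0f} kB")
print(f"HLT1 rate reduction ~ x{readout_rate_hz/hlt1_output_hz:.0f}")
print(f"HLT2 rate reduction ~ x{hlt1_output_hz/hlt2_output_hz:.0f}")
```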

Away from the LHC, trigger requirements differ considerably. Contributions from other areas covered heavy-ion physics at Brookhaven National Laboratory’s Relativistic Heavy Ion Collider (RHIC), fixed-target physics at CERN and future experiments at the Facility for Antiproton and Ion Research at GSI Darmstadt and Brookhaven’s Electron–Ion Collider (EIC). NA62 at CERN and STAR at RHIC both use conventional trigger strategies to arrive at their final event samples. The forthcoming CBM experiment at FAIR and the ePIC experiment at the EIC deal with high intensities but aim for “triggerless” operation.

Requirements were reported to be even more diverse in astroparticle physics. The Pierre Auger Observatory combines local and global trigger decisions at three levels to manage the problem of trigger distribution and data collection over 3000 km² of fluorescence and Cherenkov detectors.

These diverse requirements will lead to new approaches being taken, and evolution as the experiments are finalised. The third edition of TDHEP suggests that innovation in this field is only set to accelerate.

The post The triggering of tomorrow appeared first on CERN Courier.

]]>
Meeting report The third TDHEP workshop explored how triggers can cope with high data rates. https://cerncourier.com/wp-content/uploads/2025/03/CCMarApr25_FN_TDHEP.jpg
How to unfold with AI https://cerncourier.com/a/how-to-unfold-with-ai/ Mon, 27 Jan 2025 08:00:50 +0000 https://cerncourier.com/?p=112161 Inspired by high-dimensional data and the ideals of open science, high-energy physicists are using artificial intelligence to reimagine the statistical technique of ‘unfolding’.

The post How to unfold with AI appeared first on CERN Courier.

]]>
Open-science unfolding

All scientific measurements are affected by the limitations of measuring devices. To make a fair comparison between data and a scientific hypothesis, theoretical predictions must typically be smeared to approximate the known distortions of the detector. Data is then compared with theory at the level of the detector’s response. This works well for targeted measurements, but the detector simulation must be reapplied to the underlying physics model for every new hypothesis.

The alternative is to try to remove detector distortions from the data, and compare with theoretical predictions at the level of the theory. Once detector effects have been “unfolded” from the data, analysts can test any number of hypotheses without having to resimulate or re-estimate detector effects – a huge advantage for open science and data preservation that allows comparisons between datasets from different detectors. Physicists without access to the smearing functions can only use unfolded data.

No simple task

But unfolding detector distortions is no simple task. If the mathematical problem is solved through a straightforward inversion, using linear algebra, noisy fluctuations are amplified, resulting in large uncertainties. Some sort of “regularisation” must be imposed to smooth the fluctuations, but algorithms vary substantively and none is preeminent. Their scope has remained limited for decades. No traditional algorithm is capable of reliably unfolding detector distortions from data relative to more than a few observables at a time.

In the past few years, a new technique has emerged. Rather than unfolding detector effects from only one or two observables, it can unfold detector effects from multiple observables in a high-dimensional space; and rather than unfolding detector effects from binned histograms, it unfolds detector effects from an unbinned distribution of events. This technique is inspired by both artificial-intelligence techniques and the uniquely sparse and high-dimensional data sets of the LHC.

An ill-posed problem

Unfolding is used in many fields. Astronomers unfold point-spread functions to reveal true sky distributions. Medical physicists unfold detector distortions from CT and MRI scans. Geophysicists use unfolding to infer the Earth’s internal structure from seismic-wave data. Economists attempt to unfold the true distribution of opinions from incomplete survey samples. Engineers use deconvolution methods for noise reduction in signal processing. But in recent decades, no field has had a greater need to innovate unfolding techniques than high-energy physics, given its complex detectors, sparse datasets and stringent standards for statistical rigour.

In traditional unfolding algorithms, analysers first choose which quantity they are interested in measuring. An event generator then creates a histogram of the true values of this observable for a large sample of events in their detector. Next, a Monte Carlo simulation simulates the detector response, accounting for noise, background modelling, acceptance effects, reconstruction errors, misidentification errors and energy smearing. A matrix is constructed that transforms the histogram of the true values of the observable into the histogram of detector-level events. Finally, analysts “invert” the matrix and apply it to data, to unfold detector effects from the measurement.

How to unfold traditionally

Diverse algorithms have been invented to unfold distortions from data, with none yet achieving preeminence.

• Developed by Soviet mathematician Andrey Tikhonov in the late 1940s, Tikhonov regularisation (TR) frames unfolding as a minimisation problem with a penalty term added to suppress fluctuations in the solution.

• In the 1950s, the statistical physicist Edwin Jaynes took inspiration from information theory to seek maximum-entropy solutions, minimising bias beyond the constraints imposed by the data.

• Between the 1960s and the 1990s, high-energy physicists increasingly drew on the linear algebra of 19th-century mathematicians Eugenio Beltrami and Camille Jordan to develop singular value decomposition as a pragmatic way to suppress noisy fluctuations.

• In the 1990s, Giulio D’Agostini and other high-energy physicists developed iterative Bayesian unfolding (IBU) – a similar technique to Lucy–Richardson deconvolution, which was developed independently in astronomy in the 1970s. An explicitly probabilistic approach well suited to complex detectors, IBU may be considered a forerunner of the neural-network-based technique described in this article.

IBU and TR are the most widely-used approaches in high-energy physics today, with the RooUnfold tool started by Tim Adye serving countless analysts.

At this point in the analysis, the ill-posed nature of the problem presents a major challenge. A simple matrix inversion seldom suffices as statistical noise produces large changes in the estimated input. Several algorithms have been proposed to regularise these fluctuations. Each comes with caveats and constraints, and there is no consensus on a single method that outperforms the rest (see “How to unfold traditionally” panel).
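
The instability of a naive inversion, and the stabilising effect of regularisation, can be demonstrated in a few lines. The toy response matrix, spectrum and regularisation strength below are illustrative assumptions, not any experiment's response.

```python
# Toy unfolding: naive matrix inversion amplifies statistical noise,
# while Tikhonov regularisation damps it (toy response and spectrum, not real data).
import numpy as np

rng = np.random.default_rng(0)
nbins = 20
centres = np.arange(nbins)

# Truth spectrum and a response matrix with Gaussian bin-to-bin smearing.
truth = 1e4 * np.exp(-centres / 5.0)
R = np.exp(-0.5 * ((centres[:, None] - centres[None, :]) / 1.5) ** 2)
R /= R.sum(axis=0, keepdims=True)       # each column maps a truth bin to detector bins

observed = rng.poisson(R @ truth)       # smeared spectrum with Poisson noise

naive = np.linalg.solve(R, observed)    # unregularised inversion: large oscillations

tau = 5.0                               # regularisation strength (bias/variance tradeoff)
tikhonov = np.linalg.solve(R.T @ R + tau * np.eye(nbins), R.T @ observed)

print("rms deviation from truth, naive    :", np.sqrt(np.mean((naive - truth) ** 2)))
print("rms deviation from truth, Tikhonov :", np.sqrt(np.mean((tikhonov - truth) ** 2)))
```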

While these approaches have been successfully applied to thousands of measurements at the LHC and beyond, they have limitations. Histogramming is an efficient way to describe the distributions of one or two observables, but the number of bins grows exponentially with the number of parameters, restricting the number of observables that can be simultaneously unfolded. When unfolding only a few observables, model dependence can creep in, for example due to acceptance effects, and if another scientist wants to change the bin sizes or measure a different observable, they will have to redo the entire process.

New possibilities

AI opens up new possibilities for unfolding particle-physics data. Choosing good parameterisations in a high-dimensional space is difficult for humans, and binning is a way to limit the number of degrees of freedom in the problem, making it more tractable. Machine learning (ML) offers flexibility due to the large number of parameters in a deep neural network. Dozens of observables can be unfolded at once, and unfolded datasets can be published as an unbinned collection of individual events that have been corrected for detector distortions as an ensemble.

Unfolding performance

One way to represent the result is as a set of simulated events with weights that encode information from the data. For example, if there are 10 times as many simulated events as real events, the average weight would be about 0.1, with the distribution of weights correcting the simulation to match reality, and errors on the weights reflecting the uncertainties inherent in the unfolding process. This approach gives maximum flexibility to future analysts, who can recombine them into any binning or combination they desire. The weights can be used to build histograms or compute statistics. The full covariance matrix can also be extracted from the weights, which is important for downstream fits.
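
A minimal sketch of how such a weighted, unbinned result is used downstream, building a histogram and its per-bin uncertainties from the event weights, is shown below; the observable and the weights are toy assumptions.

```python
# Build a histogram and per-bin uncertainties from unbinned, weighted simulated
# events. Toy values throughout; in practice the weights come from the unfolding.
import numpy as np

rng = np.random.default_rng(1)
n_sim = 100_000
observable = rng.exponential(50.0, n_sim)   # e.g. a jet-pT-like quantity, in GeV
weights = rng.normal(0.1, 0.02, n_sim)      # ~0.1 if simulation is 10x larger than data

edges = np.linspace(0, 300, 31)
hist, _ = np.histogram(observable, bins=edges, weights=weights)

# Per-bin variance from the sum of squared weights; the full covariance matrix
# would come from ensembles of unfolding repetitions in a real analysis.
var, _ = np.histogram(observable, bins=edges, weights=weights**2)

print("first three bins:", hist[:3])
print("their stat. errors:", np.sqrt(var[:3]))
```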

But how do we know the unfolded values are capturing the truth, and not just “hallucinations” from the AI model?

An important validation step for these analyses is to perform tests on synthetic data with a known answer. Analysts take new simulation models, different from the one being used for the primary analysis, and treat them as if they were real data. By unfolding these alternative simulations, researchers are able to compare their results to a known answer. If the biases are large, analysts will need to refine their methods to reduce the model dependence. If the biases are small compared to the other uncertainties then this remaining difference can be added into the total uncertainty estimate, which is calculated in the traditional way using hundreds of simulations. In unfolding problems, the choice of regularisation method and strength always involves some tradeoff between bias and variance.

Just as unfolding in two dimensions instead of one with traditional methods can reduce model dependence by incorporating more aspects of the detector response, ML methods use the same underlying principle to include as much of the detector response as possible. Learning differences between data and simulation in high-dimensional spaces is the kind of task that ML excels at, and the results are competitive with established methods (see “Better performance” figure).

Neural learning

In the past few years, AI techniques have proven to be useful in practice, yielding publications from the LHC experiments, the H1 experiment at HERA and the STAR experiment at RHIC. The key idea underpinning the strategies used in each of these results is to use neural networks to learn a function that can reweight simulated events to look like data. The neural network is given a list of relevant features about an event such as the masses, energies and momenta of reconstructed objects, and trained to output the probability that it is from a Monte Carlo simulation or the data itself. Neural connections that reweight and combine the inputs across multiple layers are iteratively adjusted depending on the network’s performance. The network thereby learns the relative densities of the simulation and data throughout phase space. The ratio of these densities is used to transform the simulated distribution into one that more closely resembles real events (see “OmniFold” figure).

Illustration of AI unfolding using the OmniFold algorithm
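
The reweighting step at the heart of this approach can be sketched as follows: a classifier is trained to separate simulation from data, and its output is turned into a per-event weight via the likelihood-ratio trick. The one-dimensional toy data and small network below are assumptions for illustration; OmniFold iterates this step between detector level and particle level.

```python
# Sketch of classifier-based reweighting, the building block of OmniFold-style
# unfolding. Toy 1D data; a real application would use many observables per event.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(7)
n = 50_000
sim = rng.normal(0.0, 1.0, (n, 1))      # simulated events (label 0)
data = rng.normal(0.3, 1.1, (n, 1))     # "data" with a slightly shifted spectrum (label 1)

X = np.vstack([sim, data])
y = np.concatenate([np.zeros(n), np.ones(n)])

clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=50)
clf.fit(X, y)

# Likelihood-ratio trick: for equal-sized samples, p(data)/p(sim) = f/(1-f)
# where f is the calibrated classifier output.
f = clf.predict_proba(sim)[:, 1]
weights = f / (1.0 - f)

print("mean of sim before reweighting:", sim.mean())
print("weighted mean (should move toward the data mean of ~0.3):",
      np.average(sim[:, 0], weights=weights))
```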

As this is a recently-developed technique, there are plenty of opportunities for new developments and improvements. These strategies are in principle capable of handling significant levels of background subtraction as well as acceptance and efficiency effects, but existing LHC measurements using AI-based unfolding generally have small backgrounds. And as with traditional methods, there is a risk in trying to estimate too many parameters from not enough data. This is typically controlled by stopping the training of the neural network early, combining multiple trainings into a single result, and performing cross validations on different subsets of the data.

Beyond the “OmniFold” methods we are developing, an active community is also working on alternative techniques, including ones based on generative AI. Researchers are also considering creative new ways to use these unfolded results that aren’t possible with traditional methods. One possibility in development is unfolding not just a selection of observables, but the full event. Another intriguing direction could be to generate new events with the corrections learnt by the network built-in. At present, the result of the unfolding is a reweighted set of simulated events, but once the neural network has been trained, its reweighting function could be used to simulate the unfolded sample from scratch, simplifying the output.

The post How to unfold with AI appeared first on CERN Courier.

]]>
Feature Inspired by high-dimensional data and the ideals of open science, high-energy physicists are using artificial intelligence to reimagine the statistical technique of ‘unfolding’. https://cerncourier.com/wp-content/uploads/2025/01/CCJanFeb25_AI_feature.jpg
The new hackerpreneur https://cerncourier.com/a/the-new-hackerpreneur/ Mon, 27 Jan 2025 07:22:11 +0000 https://cerncourier.com/?p=112258 Hackathons can kick-start your career, says hacker and entrepreneur Jiannan Zhang.

The post The new hackerpreneur appeared first on CERN Courier.

]]>
The World Wide Web, AI and quantum computing – what do these technologies have in common? They all started out as “hacks”, says Jiannan Zhang, founder of the open-source community platform DoraHacks. “When the Web was invented at CERN, it demonstrated that in order to fundamentally change how people live and work, you have to think of new ways to use existing technology,” says Zhang. “Progress cannot be made if you always start from scratch. That’s what hackathons are for.”

Ten years ago, Zhang helped organise the first CERN Webfest, a hackathon that explores creative uses of technology for science and society. Webfest helped Zhang develop his coding skills and knowledge of physics by applying it to something beyond his own discipline. He also made long-lasting connections with teammates, who were from different academic backgrounds and all over the world. After participating in more hackathons, Zhang’s growing “hacker spirit” inspired him to start his own company. In 2024 Zhang returned to Webfest not as a participant, but as the CEO of DoraHacks.

Hackathons are social coding events often spanning multiple days. They are inclusive and open – no academic institution or corporate backing is required – making them accessible to a diverse range of talented individuals. Participants work in teams, pooling their skills to tackle technical problems through software, hardware or a business plan for a new product. Physicists, computer scientists, engineers and entrepreneurs all bring their strengths to the table. Young scientists can pursue work that may not fit within typical research structures, develop their skills, and build portfolios and professional networks.

“If you’re really passionate about some­thing, you should be able to jump on a project and work on it,” says Zhang. “You shouldn’t need to be associated with a university or have a PhD to pursue it.”

For early-career researchers, hackathons offer more than just technical challenges. They provide an alternative entry point into research and industry, bridging the gap between academia and real-world applications. University-run hackathons often attract corporate sponsors, giving them the budget to rent out stadiums with hundreds, sometimes thousands, of attendees.

“These large-scale hackathons really capture the attention of headhunters and mentors from industry,” explains Zhang. “They see the events as a recruitment pool. It can be a really effective way to advance careers and speak to representatives of big companies, as well as enhancing your coding skills.”

In the 2010s, weekend hackathons served as Zhang’s stepping stone into entrepreneurship. “I used to sit in the computer-science common room and work on my hacks. That’s how I met most of my friends,” recalls Zhang. “But later I realised that to build something great, I had to effectively organise people and capital. So I started to skip my computer-science classes and sneak into the business classrooms.” Zhang would hide in the back row of the business lectures, plotting his path towards entrepreneurship. He networked with peers to evaluate different business models each day. “It was fun to combine our knowledge of engineering and business theory,” he adds. “It made the journey a lot less stressful.”

But the transition from science to entrepreneurship was hard. “At the start you must learn and do everything yourself. The good thing is you’re exposed to lots of new skills and new people, but you also have to force yourself to do things you’re not usually good at.”

This is a dilemma many entrepreneurs face: whether to learn new skills from scratch, or to find business partners and delegate tasks. But finding trustworthy business partners is not always easy, and making the wrong decision can hinder the start-up’s progress. That’s why planning the company’s vision and mission from the start is so important.

“The solution is actually pretty straightforward,” says Zhang. “You need to spend more time completing the important milestones yourself, to ensure you have a feasible product. Once you make the business plan and vision clear, you get support from everywhere.”

Decentralised community governance

Rather than hackathon participants competing for a week before abandoning their code, Zhang started DoraHacks to give teams from all over the world a chance to turn their ideas into fully developed products. “I want hackathons to be more than a recruitment tool,” he explains. “They should foster open-source development and decentralised community governance. Today, a hacker from Tanzania can collaborate virtually with a team in the US, and teams gain support to develop real products. This helps make tech fields much more diverse and accessible.”

Zhang’s company enables this by reducing logistical costs for organisers and providing funding mechanisms for participants, making hackathons accessible to aspiring researchers beyond academic institutions. As the community expands, new doors open for young scientists at the start of their careers.

“The business model is changing,” says Zhang. Hackathons are becoming fundamental to emerging technologies, particularly in areas like quantum computing, blockchain and AI, which often start out open source. “There will be a major shift in the process of product creation. Instead of building products in isolation, new technologies rely on platforms and infrastructure where hackers can contribute.”

Today, hackathons aren’t just about coding or networking – they’re about pushing the boundaries of what’s possible, creating meaningful solutions and launching new career paths. They act as incubators for ideas with lasting impact. Zhang wants to help these ideas become reality. “The future of innovation is collaborative and open source,” he says. “The old world relies on corporations building moats around closed-source technology, which is inefficient and inaccessible. The new world is centred around open platform technology, where people can build on top of old projects. This collaborative spirit is what makes the hacker movement so important.”

The post The new hackerpreneur appeared first on CERN Courier.

]]>
Careers Hackathons can kick-start your career, says hacker and entrepreneur Jiannan Zhang. https://cerncourier.com/wp-content/uploads/2025/01/CCJanFeb25_CAR_zhang.jpg
ICFA talks strategy and sustainability in Prague https://cerncourier.com/a/icfa-talks-strategy-and-sustainability-in-prague-2/ Mon, 27 Jan 2025 07:13:18 +0000 https://preview-courier.web.cern.ch/?p=111309 The 96th ICFA meeting heard extensive reports from the leading HEP laboratories and various world regions on their recent activities and plans.

The post ICFA talks strategy and sustainability in Prague appeared first on CERN Courier.

]]>
ICFA, the International Committee for Future Accelerators, was formed in 1976 to promote international collaboration in all phases of the construction and exploitation of very-high-energy accelerators. Its 96th meeting took place on 20 and 21 July during the recent ICHEP conference in Prague. Almost all of the 16 members from across the world attended in person, making the assembly lively and constructive.

The committee heard extensive reports from the leading HEP laboratories and various world regions on their recent activities and plans, including a presentation by Paris Sphicas, the chair of the European Committee for Future Accelerators (ECFA), on the process for the update of the European strategy for particle physics (ESPP). Launched by CERN Council in March 2024, the ESPP update is charged with recommending the next collider project at CERN after HL-LHC operation.

A global task

The ESPP update is also of high interest to non-European institutions and projects. Consequently, in addition to the expected inputs to the strategy from European HEP communities, those from non-European HEP communities are also welcome. Moreover, the recent US P5 report and the Chinese plans for CEPC, with a potential positive decision in 2025/2026, and discussions about the ILC project in Japan, will be important elements of the work to be carried out in the context of the ESPP update. They also emphasise the global nature of high-energy physics.

An integral part of the work of ICFA is carried out within its panels, which have been very active. Presentations were given from the new panel on the Data Lifecycle (chair Kati Lassila-Perini, Helsinki), the Beam Dynamics panel (new chair Yuan He, IMPCAS) and the Advanced and Novel Accelerators panel (new chair Patric Muggli, Max Planck Munich, proxied at the meeting by Brigitte Cros, Paris-Saclay). The Instrumentation and Innovation Development panel (chair Ian Shipsey, Oxford) is setting an example with its numerous schools, the ICFA instrumentation awards and centrally sponsored instrumentation studentships for early-career researchers from underserved world regions. Finally, the chair of the ILC International Development Team panel (Tatsuya Nakada, EPFL) summarised the latest status of the ILC Technological Network, and the proposed ILC collider project in Japan.

ICFA noted interesting structural developments in the global organisation of HEP

A special session was devoted to the sustainability of HEP accelerator infrastructures, considering the need to invest efforts into guidelines that enable better comparison of the environmental reports of labs and infrastructures, in particular for future facilities. It was therefore natural for ICFA to also hear reports not only from the panel on Sustainable Accelerators and Colliders led by Thomas Roser (BNL), but also from the European Lab Directors Working Group on Sustainability. This group, chaired by Caterina Bloise (INFN) and Maxim Titov (CEA), is mandated to develop a set of key indicators and a methodology for the reporting on future HEP projects, to be delivered in time for the ESPP update.

Finally, ICFA noted some very interesting structural developments in the global organisation of HEP. In the Asia-Oceania region, ACFA-HEP was recently formed as a sub-panel under the Asian Committee for Future Accelerators (ACFA), aiming for a better coordination of HEP activities in this particular region of the world. Hopefully, this will encourage other world regions to organise themselves in a similar way in order to strengthen their voice in the global HEP community – for example in Latin America. Here, a meeting was organised in August by the Latin American Association for High Energy, Cosmology and Astroparticle Physics (LAA-HECAP) to bring together scientists, institutions and funding agencies from across Latin America to coordinate actions for jointly funding research projects across the continent.

The next in-person ICFA meeting will be held during the Lepton–Photon conference in Madison, Wisconsin (USA), in August 2025.

The post ICFA talks strategy and sustainability in Prague appeared first on CERN Courier.

]]>
Meeting report The 96th ICFA meeting heard extensive reports from the leading HEP laboratories and various world regions on their recent activities and plans. https://cerncourier.com/wp-content/uploads/2024/09/CCNovDec24_FN_ICFA.jpg
Rapid developments in precision predictions https://cerncourier.com/a/rapid-developments-in-precision-predictions/ Fri, 24 Jan 2025 15:57:39 +0000 https://cerncourier.com/?p=112358 Achieving a theoretical uncertainty of only a few per cent in the measurement of physical observables is a vastly challenging task in the complex environment of hadronic collisions.

The post Rapid developments in precision predictions appeared first on CERN Courier.

]]>
High Precision for Hard Processes in Turin

Achieving a theoretical uncertainty of only a few per cent in the measurement of physical observables is a vastly challenging task in the complex environment of hadronic collisions. To keep pace with experimental observations at the LHC and elsewhere, precision computing has had to develop rapidly in recent years – efforts that have been monitored and driven by the biennial High Precision for Hard Processes (HP2) conference for almost two decades now. The latest edition attracted 120 participants to the University of Torino from 10 to 13 September 2024.

All speakers addressed the same basic question: how can we achieve the most precise theoretical description for a wide variety of scattering processes at colliders?

The recipe for precise prediction involves many ingredients, so the talks in Torino probed several research directions. Advanced methods for the calculation of scattering amplitudes were discussed, among others, by Stephen Jones (IPPP Durham). These methods can be applied to detailed high-order phenomenological calculations for QCD, electroweak processes and BSM physics, as illustrated by Ramona Groeber (Padua) and Eleni Vryonidou (Manchester). Progress in parton showers – a crucial tool to bridge amplitude calculations and experimental results – was presented by Silvia Ferrario Ravasio (CERN). Dedicated methods to deal with the delicate issue of infrared divergences in high-order cross-section calculations were reviewed by Chiara Signorile-Signorile (Max Planck Institute, Munich).

The Torino conference was dedicated to the memory of Stefano Catani, a towering figure in the field of high-energy physics, who suddenly passed away at the beginning of this year. Starting from the early 1980s, and for the whole of his career, Catani made groundbreaking contributions in every facet of HP2. He was an inspiration to a whole generation of physicists working in high-energy phenomenology. We remember him as a generous and kind person, and a scientist of great rigour and vision. He will be sorely missed.

The post Rapid developments in precision predictions appeared first on CERN Courier.

]]>
Meeting report Achieving a theoretical uncertainty of only a few per cent in the measurement of physical observables is a vastly challenging task in the complex environment of hadronic collisions. https://cerncourier.com/wp-content/uploads/2025/01/CCJanFeb25_FN_HP_feature.jpg
AI treatments for stroke survivors https://cerncourier.com/a/ai-treatments-for-stroke-survivors/ Fri, 24 Jan 2025 15:52:08 +0000 https://cerncourier.com/?p=112345 Data on strokes is plentiful but fragmented, making it difficult to exploit in data-driven treatment strategies.

The post AI treatments for stroke survivors appeared first on CERN Courier.

]]>
Data on strokes is plentiful but fragmented, making it difficult to exploit in data-driven treatment strategies. The toolbox of the high-energy physicist is well adapted to the task. To amplify CERN’s societal contributions through technological innovation, the Unleashing a Comprehensive, Holistic and Patient-Centric Stroke Management for a Better, Rapid, Advanced and Personalised Stroke Diagnosis, Treatment and Outcome Prediction (UMBRELLA) project – co-led by Vall d’Hebron Research Institute and Siemens Healthineers – was officially launched on 1 October 2024. The kickoff meeting in Barcelona, Spain, convened more than 20 partners, including Philips, AstraZeneca, KU Leuven and EATRIS. Backed by nearly €27 million from the EU’s Innovative Health Initiative and industry collaborators, the project aims to transform stroke care across Europe.

The meeting highlighted the urgent need to address stroke as a pressing health challenge in Europe. Each year, more than one million acute stroke cases occur in Europe, with nearly 10 million survivors facing long-term consequences. In 2017, the economic burden of stroke treatments was estimated to be €60 billion – a figure that continues to grow. UMBRELLA’s partners outlined their collective ambition to translate a vast and fragmented stroke data set into actionable care innovations through standardisation and integration.

UMBRELLA will utilise advanced digital technologies to develop AI-powered predictive models for stroke management. By standardising real-world stroke data and leveraging tools like imaging technologies, wearable devices and virtual rehabilitation platforms, UMBRELLA aims to refine every stage of care – from diagnosis to recovery. Based on post-stroke data, AI-driven insights will empower clinicians to uncover root causes of strokes, improve treatment precision and predict patient outcomes, reshaping how stroke care is delivered.

Central to this effort is the integration of CERN’s federated-learning platform, CAFEIN. A decentralised approach to training machine-learning algorithms without exchanging data, it was initiated thanks to seed funding from CERN’s knowledge-transfer budget for the benefit of medical applications; CAFEIN now promises to enhance diagnosis, treatment and prevention strategies for stroke victims, ultimately saving countless lives. A main topic of the kickoff meeting was the development of the “U-platform” – a federated data ecosystem co-designed by Siemens Healthineers and CERN. Based on CAFEIN, the infrastructure will enable the secure and privacy-preserving training of advanced AI algorithms for personalised stroke diagnostics, risk prediction and treatment decisions without sharing sensitive patient data between institutions. Building on CERN’s expertise, including its success in federated AI modelling for brain pathologies under the EU TRUSTroke project, the CAFEIN team is poised to handle the increasing complexity and scale of data sets required by UMBRELLA.
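
The pattern underlying such platforms is federated averaging: each site trains on its own data and only model parameters are shared and combined. The sketch below is a generic illustration of that pattern, not the CAFEIN implementation.

```python
# Generic federated-averaging sketch: model parameters, not patient data, are
# exchanged. Purely illustrative; it does not represent the CAFEIN platform.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One site's local training: logistic regression via gradient descent."""
    w = weights.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

rng = np.random.default_rng(3)
n_sites, n_features = 4, 10
w_global = np.zeros(n_features)

# Each hospital holds its own (X, y); only updated weights leave the site.
# Random toy data: the point is the communication pattern, not the model.
site_data = [(rng.normal(size=(500, n_features)), rng.integers(0, 2, 500))
             for _ in range(n_sites)]

for _ in range(20):
    local_weights = [local_update(w_global, X, y) for X, y in site_data]
    w_global = np.mean(local_weights, axis=0)   # server-side aggregation

print("global model after 20 rounds:", np.round(w_global, 3))
```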

Beyond technological advancements, the UMBRELLA consortium discussed a plan to establish standardised protocols for acute stroke management, with an emphasis on integrating these protocols into European healthcare guidelines. By improving data collection and facilitating outcome predictions, these standards will particularly benefit patients in remote and underserved regions. The project also aims to advance research into the causes of strokes, a quarter of which remain undetermined – a statistic UMBRELLA seeks to change.

This ambitious initiative not only showcases CERN’s role in pioneering federated-learning technologies but also underscores the broader societal benefits brought by basic science. By pushing technologies beyond the state-of-the-art, CERN and other particle-physics laboratories have fuelled innovations that have an impact on our everyday lives. As UMBRELLA begins its journey, its success holds the potential to redefine stroke care, delivering life-saving advancements to millions and paving the way for a healthier, more equitable future.

The post AI treatments for stroke survivors appeared first on CERN Courier.

]]>
Meeting report Data on strokes is plentiful but fragmented, making it difficult to exploit in data-driven treatment strategies. https://cerncourier.com/wp-content/uploads/2025/01/CCJanFeb25_FN_UMBRELLA.jpg
Open-science cloud takes shape in Berlin https://cerncourier.com/a/open-science-cloud-takes-shape-in-berlin/ Fri, 24 Jan 2025 15:16:54 +0000 https://cerncourier.com/?p=112354 Findable, Accessible, Interoperable and Reusable: the sixth symposium of the European Open Science Cloud (EOSC) attracted over 1,000 participants.

The post Open-science cloud takes shape in Berlin appeared first on CERN Courier.

]]>
Findable. Accessible. Interoperable. Reusable. That’s the dream scenario for scientific data and tools. The European Open Science Cloud (EOSC) is a pan-European initiative to develop a web of “FAIR” data services across all scientific fields. EOSC’s vision is to put in place a system for researchers in Europe to store, share, process, analyse and reuse research outputs such as data, publications and software across disciplines and borders.

EOSC’s sixth symposium attracted 450 delegates to Berlin from 21 to 23 October 2024, with a further 900 participating online. Since its launch in 2017, EOSC activities have focused on conceptualisation, prototyping and planning. In order to develop a trusted federation of research data and services for research and innovation, EOSC is being deployed as a network of nodes. With the launch during the symposium of the EOSC EU node, this year marked a transition from design to deployment.

While EOSC is a flagship science initiative of the European Commission, FAIR concerns researchers and stakeholders globally. Via the multiple projects under the wings of EOSC that collaborate with software and data institutes around the world, a pan-European effort can be made to ensure a research landscape that encourages knowledge sharing while recognising work and training the next generation in best practices in research. The EU node – funded by the European Commission, and the first to be implemented – will serve as a reference for roughly 10 additional nodes to be deployed in a first wave, with more to follow. They are accessible using any institutional credentials based on GÉANT’s MyAccess or with an EU login. A first operational implementation of the EOSC Federation is expected by the end of 2025.

A thematic focus of this year’s symposium was the need for clear guidelines on the adaptation of FAIR governance for artificial intelligence (AI), which relies on the accessibility of large and high-quality datasets. It is often the case that AI models are trained with synthetic data, large-scale simulations and first-principles mathematical models, although these may only provide an incomplete description of complex and highly nonlinear real-world phenomena. Once AI models are calibrated against experimental data, their predictions become increasingly accurate. Adopting FAIR principles for the production, collection and curation of scientific datasets will streamline the design, training, validation and testing of AI models (see, for example, Y Chen et al. 2021 arXiv:2108.02214).

EOSC includes five science clusters, from natural sciences to social sciences, with a dedicated cluster for particle physics and astronomy called ESCAPE: the European Science Cluster of Astronomy and Particle Physics. The future deployment of the ESCAPE Virtual Research Environment across multiple nodes will provide users with tools to bring together diverse experimental results, for example, in the search for evidence of dark matter, and to perform new analyses incorporating data from complementary searches.

The post Open-science cloud takes shape in Berlin appeared first on CERN Courier.

]]>
Meeting report Findable, Accessible, Interoperable and Reusable: the sixth symposium of the European Open Science Cloud (EOSC) attracted over 1,000 participants. https://cerncourier.com/wp-content/uploads/2025/01/CCJanFeb25_FN_cloud.jpg
Data analysis in the age of AI https://cerncourier.com/a/data-analysis-in-the-age-of-ai/ Wed, 20 Nov 2024 13:50:36 +0000 https://cern-courier.web.cern.ch/?p=111424 Experts in data analysis, statistics and machine learning for physics came together from 9 to 12 September for PHYSTAT’s Statistics meets Machine Learning workshop.

The post Data analysis in the age of AI appeared first on CERN Courier.

]]>
Experts in data analysis, statistics and machine learning for physics came together from 9 to 12 September at Imperial College London for PHYSTAT’s Statistics meets Machine Learning workshop. The goal of the meeting, which is part of the PHYSTAT series, was to discuss recent developments in machine learning (ML) and their impact on the statistical data-analysis techniques used in particle physics and astronomy.

Particle-physics experiments typically produce large amounts of highly complex data. Extracting information about the properties of fundamental physics interactions from these data is a non-trivial task. The general availability of simulation frameworks makes it relatively straightforward to model the forward process of data analysis: to go from an analytically formulated theory of nature to a sample of simulated events that describe the observation of that theory for a given particle collider and detector in minute detail. The inverse process – to infer from a set of observed data what is learned about a theory – is much harder as the predictions at the detector level are only available as “point clouds” of simulated events, rather than as the analytically formulated distributions that are needed by most statistical-inference methods.

Traditionally, statistical techniques have found a variety of ways to deal with this problem, mostly centered on simplifying the data via summary statistics that can be modelled empirically in an analytical form. A wide range of ML algorithms, ranging from neural networks to boosted decision trees trained to classify events as signal- or background-like, have been used in the past 25 years to construct such summary statistics.

The broader field of ML has experienced a very rapid development in recent years, moving from relatively straightforward models capable of describing a handful of observable quantities, to neural models with advanced architectures such as normalising flows, diffusion models and transformers. These boast millions to billions of parameters that are potentially capable of describing hundreds to thousands of observables – and can now extract features from the data with an order-of-magnitude better performance than traditional approaches. 

New generation

These advances are driven by newly available computation strategies that not only calculate the learned functions, but also their analytical derivatives with respect to all model parameters, greatly speeding up training times, in particular in combination with modern computing hardware with graphics processing units (GPUs) that facilitate massively parallel calculations. This new generation of ML models offers great potential for novel uses in physics data analyses, but have not yet found their way to the mainstream of published physics results on a large scale. Nevertheless, significant progress has been made in the particle-physics community in learning the technology needed, and many new developments using this technology were shown at the workshop.

This new generation of machine-learning models offers great potential for novel uses in physics data analyses

Many of these ML developments showcase the ability of modern ML architectures to learn multidimensional distributions from point-cloud training samples to a very good approximation, even when the number of dimensions is large, for example between 20 and 100. 

A prime use-case of such ML models is an emerging statistical analysis strategy known as simulation-based inference (SBI), where learned approximations of the probability density of signal and background over the full high-dimensional observables space are used, dispensing with the notion of summary statistics to simplify the data. Many examples were shown at the workshop, with applications ranging from particle physics to astronomy, pointing to significant improvements in sensitivity. Work is ongoing on procedures to model systematic uncertainties, and no published results in particle physics exist to date. Examples from astronomy showed that SBI can give results of comparable precision to the default Markov chain Monte Carlo approach for Bayesian computations, but with orders of magnitude faster computation times.
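
A minimal version of the idea can be sketched with a neural ratio estimator: a classifier trained to distinguish joint (observable, parameter) pairs from shuffled pairs learns a function proportional to the posterior. The Gaussian toy simulator below is an assumption for illustration only.

```python
# Minimal simulation-based inference sketch via neural ratio estimation:
# a classifier trained on joint (x, theta) pairs vs shuffled pairs learns
# r(x, theta) = p(x|theta)/p(x), which for a flat prior is proportional to
# the posterior. The Gaussian "simulator" is a toy assumption.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(11)
n = 100_000

theta = rng.uniform(-2, 2, n)           # parameter drawn from the prior
x = rng.normal(theta, 1.0)              # toy forward simulation: x ~ N(theta, 1)

joint = np.column_stack([x, theta])                        # label 1: dependent pairs
marginal = np.column_stack([x, rng.permutation(theta)])    # label 0: broken dependence

X = np.vstack([joint, marginal])
y = np.concatenate([np.ones(n), np.zeros(n)])

clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=30)
clf.fit(X, y)

# Posterior scan for one "observed" event x_obs = 0.8.
x_obs = 0.8
theta_grid = np.linspace(-2, 2, 81)
f = clf.predict_proba(np.column_stack([np.full_like(theta_grid, x_obs), theta_grid]))[:, 1]
ratio = f / (1 - f)
print("posterior mode estimate:", theta_grid[np.argmax(ratio)])  # near 0.8 if training succeeded
```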

Beyond binning

A commonly used alternative approach to the full-fledged theory parameter inference from observed data is known as deconvolution or unfolding. Here the goal is publishing intermediate results in a form where the detector response has been taken out, but stopping short of interpreting this result in a particular theory framework. The classical approach to unfolding requires estimating a response matrix that captures the smearing effect of the detector on a particular observable, and applying the inverse of that to obtain an estimate of a theory-level distribution – however, this approach is challenging and limited in scope, as the inversion is numerically unstable, and requires a low dimensionality binning of the data. Results on several ML-based approaches were presented, which either learn the response matrix from modelling distributions outright (the generative approach) or learn classifiers that reweight simulated samples (the discriminative approach). Both approaches show very promising results that do not have the limitations on the binning and dimensionality of the distribution of the classical response-inversion approach.

A third domain where ML is facilitating great progress is that of anomaly searches, where an anomaly can either be a single observation that doesn’t fit the distribution (mostly in astronomy), or a collection of events that together don’t fit the distribution (mostly in particle physics). Several analyses highlighted both the power of ML models in such searches and the bounds from statistical theory: it is impossible to optimise sensitivity for single-event anomalies without knowing the outlier distribution, and unsupervised anomaly detectors require a semi-supervised statistical model to interpret ensembles of outliers.
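
As an illustration of the unsupervised case, the sketch below scores events by how poorly a small autoencoder reconstructs them; the toy features and the network are assumptions, and, as noted above, interpreting an ensemble of such outliers still requires a statistical model.

```python
# Minimal sketch of unsupervised anomaly detection with an autoencoder:
# events that reconstruct poorly receive a high anomaly score. Toy features only.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)
background = rng.normal(0, 1, (50_000, 8))   # "ordinary" events, 8 features
signal = rng.normal(3, 1, (500, 8))          # rare anomalous population

# An MLP trained to reproduce its input acts as a simple autoencoder.
ae = MLPRegressor(hidden_layer_sizes=(4,), max_iter=50)  # 4-dimensional bottleneck
ae.fit(background, background)

def score(events):
    return np.mean((ae.predict(events) - events) ** 2, axis=1)  # reconstruction error

print("median background score:", np.median(score(background)))
print("median signal score    :", np.median(score(signal)))     # expected to be much larger
```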

A final application of machine-learned distributions that was much discussed is data augmentation – sampling a new, larger data sample from a learned distribution. If the synthetic sample is significantly larger than the training sample, its nominal statistical power will be greater, but that power derives from the smooth interpolation of the model, potentially generating so-called inductive bias. The validity of the assumed smoothness depends on its realism in a particular setting, for which there is no generic validation strategy. The use of a generative model amounts to a tradeoff between bias and variance.

Interpretable and explainable

Beyond the various novel applications of ML, there were lively discussions on the more fundamental aspects of artificial intelligence (AI), notably on the notion of and need for AI to be interpretable or explainable. Explainable AI aims to elucidate what input information was used, and its relative importance, but this goal has no unambiguous definition. The discussion on the need for explainability centres to a large extent on trust: would you trust a discovery if it is unclear what information the model used and how it was used? Can you convince peers of the validity of your result? The notion of interpretable AI goes beyond that. It is an often-desired quality by scientists, as human knowledge resulting from AI-based science is generally desired to be interpretable, for example in the form of theories based on symmetries, or structures that are simple, or “low-rank”. However, interpretability has no formal criteria, which makes it an impractical requirement. Beyond practicality, there is also a fundamental point: why should nature be simple? Why should models that describe it be restricted to being interpretable? The almost philosophical nature of this question made the discussion on interpretability one of the liveliest ones in the workshop, but for now without conclusion.

Human knowledge resulting from AI-based science is generally desired to be interpretable

For the longer-term future there are several interesting developments in the pipeline. In the design and training of new neural models, two techniques were shown to have great promise. The first one is the concept of foundation models, which are very large models that are pre-trained by very large datasets to learn generic features of the data. When these pre-trained generic models are retrained to perform a specific task, they are shown to outperform purpose-trained models for that same task. The second is on encoding domain knowledge in the network. Networks that have known symmetry principles encoded in the model can significantly outperform models that are generically trained on the same data.

The evaluation of systematic effects is still mostly taken care of in the statistical post-processing step. Future ML techniques may more fully integrate systematic uncertainties, for example by reducing the sensitivity to these uncertainties through adversarial training or pivoting methods. Beyond that, future methods may also integrate the currently separate step of propagating systematic uncertainties (“learning the profiling”) into the training of the procedure. A truly global end-to-end optimisation of the full analysis chain may ultimately become feasible and computationally tractable for models that provide analytical derivatives.

The post Data analysis in the age of AI appeared first on CERN Courier.

]]>
Meeting report Experts in data analysis, statistics and machine learning for physics came together from 9 to 12 September for PHYSTAT’s Statistics meets Machine Learning workshop. https://cerncourier.com/wp-content/uploads/2024/10/CCNovDec24FN_Phystat-1-1.jpg
Two charming results of data parking https://cerncourier.com/a/two-charming-results-of-data-parking/ Mon, 16 Sep 2024 14:03:49 +0000 https://preview-courier.web.cern.ch/?p=111130 By parking events triggered by a single muon, CMS collected an inclusive sample of approximately 10 billion b-hadrons in 2018.

The post Two charming results of data parking appeared first on CERN Courier.

]]>
CMS figure 1

The high data rate at the LHC creates challenges as well as opportunities. Great care is required to identify interesting events, as only a tiny fraction can trigger the detector’s readout. With the LHC achieving record-breaking instantaneous luminosity, the CMS collaboration has innovated to protect and expand its flavour-physics programme, which studies rare decays and subtle differences between particles containing beauty and charm quarks. Enhancements in the CMS data-taking strategy such as “data parking” have enabled the detector to surpass its initial performance limits. This has led to notable advances in charm physics, including CMS’s first analysis of CP violation in the charm sector and achieving world-leading sensitivity to the rare decay of the D0 meson into a pair of muons.

Data parking stores subsets of unprocessed data that cannot be processed promptly due to computing limitations. By parking events triggered by a single muon, CMS collected an inclusive sample of approximately 10 billion b-hadrons in 2018. This sample allowed CMS to reconstruct D0 and D̄0 decays into a pair of long-lived K0s mesons, which are relatively easy to detect in the CMS detector despite the high level of pileup and the large number of low-momentum tracks.

CP violation is necessary to explain the matter–antimatter asymmetry observed in the universe, but the magnitude of CP violation from known sources is insufficient. Charmed meson decays are the only meson decays involving an up-type quark where CP violation can be studied. CP violation would be evident if the decay rates for D0 → K0s K0s and D̄0 → K0s K0s were found to differ. In the analysis, the flavour of the initial D0 or D̄0 meson is determined from the charge of the pion accompanying its creation in the decay of a D*+ meson (see figure 1). To eliminate systematic effects arising from the charge asymmetry in production and detector response, the CP asymmetry is measured relative to that in D0 → K0s π+π–. The resulting asymmetry is found to be ACP(KSKS) = 6.2% ± 3.0% (stat) ± 0.2% (syst) ± 0.8% (PDG), consistent with no CP violation within 2.0 standard deviations. Previous analyses by LHCb and Belle were consistent with no CP violation within 2.7 and 1.8 standard deviations, respectively. Before data parking, searching for direct CP violation in the charm sector with a fully hadronic final state was deemed unattainable for CMS.
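
The structure of such a measurement, a raw asymmetry corrected using a control channel in which production and detection asymmetries cancel, can be sketched with invented yields; none of the numbers below are CMS results.

```python
# Toy illustration of a raw-asymmetry measurement with control-channel
# subtraction (all yields and the external input are invented numbers).
import math

def raw_asymmetry(n_particle, n_antiparticle):
    n_tot = n_particle + n_antiparticle
    a = (n_particle - n_antiparticle) / n_tot
    err = math.sqrt((1 - a**2) / n_tot)   # binomial statistical uncertainty
    return a, err

a_sig, e_sig = raw_asymmetry(1310, 1255)        # signal-channel candidates (toy yields)
a_ctl, e_ctl = raw_asymmetry(501_000, 499_500)  # control-channel candidates (toy yields)
acp_control = 0.0                               # external (e.g. PDG) input for the control channel

acp = a_sig - a_ctl + acp_control               # production/detection asymmetries cancel
err = math.sqrt(e_sig**2 + e_ctl**2)
print(f"ACP = {100*acp:.1f}% +/- {100*err:.1f}% (stat)")
```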

The CMS collaboration has expanded its flavour-physics programme

For Run 3 the programme was enhanced by introducing an inclusive dimuon trigger covering the low mass range up to 8.5 GeV. With improvements in the CMS Tier-0 prompt reconstruction workflow, Run-3 parking data is now reconstructed without delay using the former Run-2 high-level trigger farm at LHC Point 5 and European Tier-1 resources. In 2024 CMS is collecting data at rates seven times higher than the nominal rates for Run 2, already reaching approximately 70% of the nominal trigger rate for the HL-LHC.

Using the data collected in 2022 and 2023, CMS performed a search for the rare D0-meson decay into a pair of muons, which was presented at the ICHEP conference in Prague. Rare decays of the charm quark, less explored compared to those of the bottom quark, offer an opportunity to probe new physics effects beyond the direct reach of current colliders, thanks to possible quantum interference by unknown heavy virtual particles. In 2023, the LHCb collaboration set an upper limit on the branching ratio of 3.5 × 10⁻⁹ at 95% confidence level using Run-2 data. CMS surpassed the LHCb result, achieving a sensitivity of 2.6 × 10⁻⁹ at 95% confidence level. Given that the Standard Model prediction is four orders of magnitude smaller, there is still considerable territory to explore.

Beginning with the 2024 run, the CMS flavour-physics programme will gain an additional data stream known as data scouting. This stream captures, in a reduced format, events triggered at very high rate by new high-purity single-muon level-one triggers. This format is suitable for reconstructing decays of heavy hadrons, offering performance comparable to standard data processing.

The post Two charming results of data parking appeared first on CERN Courier.

]]>
News By parking events triggered by a single muon, CMS collected an inclusive sample of approximately 10 billion b-hadrons in 2018. https://cerncourier.com/wp-content/uploads/2024/09/CCSepOct24_EF_CMS_feature.jpg
ATLAS turbocharges event simulation https://cerncourier.com/a/atlas-turbocharges-event-simulation/ Sat, 04 May 2024 15:58:59 +0000 https://preview-courier.web.cern.ch/?p=110559 As the harvest of data from the LHC experiments continues to increase, so does the required number of simulated collisions.

The post ATLAS turbocharges event simulation appeared first on CERN Courier.

]]>
ATLAS figure 1

As the harvest of data from the LHC experiments continues to increase, so does the required number of simulated collisions. This is a resource-intensive task as hundreds of particles must be tracked through complex detector geometries for each simulated physics collision – and Monte Carlo statistics must typically exceed experimental statistics by a factor of 10 or more, to minimise uncertainties when measured distributions are compared with theoretical predictions. To support data taking in Run 3 (2022–2025), the ATLAS collaboration therefore developed, evaluated and deployed a wide array of detailed optimisations to its detector-simulation software.

The production of simulated data begins with the generation of particles produced within the LHC’s proton–proton or heavy-ion collisions, followed by the simulation of their propagation through the detector and the modelling of the electronics signals from the active detection layers. Considerable computing resources are incurred when hadrons, photons and electrons enter the electromagnetic calorimeters and produce showers with many secondary particles whose trajectories and interactions with the detector material must be computed. The complex accordion geometry of the ATLAS electromagnetic calorimeter makes the Geant4 simulation of the shower development in the calorimeter system particularly compute-intensive, accounting for about 80% of the total simulation time for a typical collision event.

Since computing costs money and consumes electrical power, it is highly desirable to speed up the simulation of collision events without compromising accuracy. For example, considerable CPU resources were previously spent in the transportation of photons and neutrons; this has been mitigated by randomly removing 90% of the photons (neutrons) with energy below 0.5 (2) MeV and scaling up the energy deposited from the remaining 10% of low-energy particles. The simulation of photons in the finely segmented electromagnetic calorimeter took considerable time because the probabilities for each possible interaction process were calculated every time photons crossed a material boundary. That calculation time has been greatly reduced by using a uniform geometry with no photon transport boundaries and by determining the position of simulated interactions using the ratio of the cross sections in the various material layers. The combined effect of the optimisations brings an average speed gain of almost a factor of two.
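
The low-energy culling is a form of “Russian roulette”: a random 90% of the particles are dropped and the survivors carry a compensating weight of ten, preserving the deposited energy on average. Below is a minimal sketch, with the thresholds taken from the text and everything else illustrative.

```python
# "Russian roulette" culling of low-energy particles: drop 90% of them and
# weight the survivors by 10, preserving the deposited energy on average.
# Thresholds follow the text (0.5 MeV photons, 2 MeV neutrons); the rest is a toy.
import numpy as np

rng = np.random.default_rng(9)
SURVIVAL_PROB = 0.1
THRESHOLD_MEV = {"photon": 0.5, "neutron": 2.0}

def roulette(kind, energies_mev):
    """Return surviving energies and their weights for one particle species."""
    low = energies_mev < THRESHOLD_MEV[kind]
    survive = ~low | (rng.random(energies_mev.size) < SURVIVAL_PROB)
    weights = np.where(low, 1.0 / SURVIVAL_PROB, 1.0)[survive]
    return energies_mev[survive], weights

photons = rng.exponential(0.3, 1_000_000)   # toy low-energy photon spectrum (MeV)
kept_e, kept_w = roulette("photon", photons)

print(f"tracked photons : {kept_e.size} of {photons.size}")
print(f"true deposited energy    : {photons.sum():.0f} MeV")
print(f"weighted deposited energy: {(kept_e * kept_w).sum():.0f} MeV")
```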

ATLAS has also successfully used fast-simulation algorithms to leverage the available computational resources. Fast simulation aims at avoiding the compute-expensive Geant4 simulation of calorimeter showers by using parameterised models that are significantly faster and retain most of the physics performance of the more detailed simulation. However, one of the major limitations of the fast simulation employed by ATLAS during Run 2 was the insufficiently accurate modelling of physics observables such as the detailed description of the substructure of jets reconstructed with large-radius clustering algorithms.

AtlFast3 offers fast, high-precision physics simulations

For Run 3, ATLAS has developed a completely redesigned fast simulation toolkit, known as AtlFast3, which performs the simulation of the entire ATLAS detector. While the tracking systems continue to be simulated using Geant4, the energy response in the calorimeters is simulated using a hybrid approach that combines two new tools: FastCaloSim and FastCaloGAN.

FastCaloSim parametrises the longitudinal and lateral development of electromagnetic and hadronic showers, while the simulated energy response from FastCaloGAN is based on generative adversarial neural networks that are trained on pre-simulated Geant4 showers. AtlFast3 effectively combines the strengths of both approaches by selecting the most appropriate algorithm depending on the properties of the shower-initiating particles, tuned to optimise the performance of reconstructed observables, including those exploiting jet substructure. As an example, figure 1 shows that the hybrid AtlFast3 approach models the number of constituents of reconstructed jets as simulated with Geant4 very accurately.
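The hybrid selection can be pictured as a per-particle dispatch between the parameterised and GAN-based models. The sketch below is hypothetical: the particle categories, the pseudorapidity cut and the stand-in model functions are invented for illustration and do not reproduce the tuned AtlFast3 configuration.

```python
from dataclasses import dataclass

@dataclass
class Particle:
    pdg_id: int   # PDG code (22 = photon, 11/-11 = electron/positron, 211 = pion)
    eta: float    # pseudorapidity

def simulate_shower(p, energy_gev, parametrised_model, gan_model, full_sim):
    """Choose a calorimeter-shower model per particle (illustrative thresholds)."""
    if abs(p.eta) > 3.2:                          # hypothetical acceptance cut
        return full_sim(p, energy_gev)            # fall back to detailed Geant4
    if p.pdg_id in (22, 11, -11):                 # photons and electrons
        return parametrised_model(p, energy_gev)  # FastCaloSim-like branch
    return gan_model(p, energy_gev)               # FastCaloGAN-like branch

# Toy stand-ins for the three simulation back-ends:
shower = simulate_shower(Particle(pdg_id=211, eta=1.2), 50.0,
                         parametrised_model=lambda p, e: ("param", e),
                         gan_model=lambda p, e: ("gan", e),
                         full_sim=lambda p, e: ("geant4", e))
print(shower)   # -> ('gan', 50.0)
```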

With its significantly improved physics performance and a speedup between a factor of 3 (for Z → ee events) and 15 (for high-pT di-jet events), AtlFast3 will play a crucial role in delivering high-precision physics simulations of ATLAS for Run 3 and beyond, while meeting the collaboration’s budgetary compute constraints.

The post ATLAS turbocharges event simulation appeared first on CERN Courier.

]]>
News As the harvest of data from the LHC experiments continues to increase, so does the required number of simulated collisions. https://cerncourier.com/wp-content/uploads/2024/04/CCMayJun24_EF_ATLAS_feature.jpg
Next-generation triggers for HL-LHC and beyond https://cerncourier.com/a/next-generation-triggers-for-hl-lhc-and-beyond/ Fri, 03 May 2024 12:56:30 +0000 https://preview-courier.web.cern.ch/?p=110618 A new five-year-long project aims to accelerate novel computing, engineering and scientific ideas for the ATLAS and CMS upgrades.

The post Next-generation triggers for HL-LHC and beyond appeared first on CERN Courier.

]]>
ATLAS and CMS events at 13.6 TeV

The LHC experiments have surpassed expectations in their ability to squeeze the most out of their large datasets, also demonstrating the wealth of scientific understanding to be gained from improvements to data-acquisition pipelines. Colliding proton bunches at a rate of 40 MHz, the LHC produces a huge quantity of data that must be filtered in real-time to levels that are manageable for offline computing and ensuing physics analysis. When the High-Luminosity LHC (HL-LHC) enters operation from 2029, the data rates and event complexity will further increase significantly.

To meet this challenge, the general-purpose LHC experiments ATLAS and CMS are preparing significant detector upgrades, which include improvements in the online filtering or trigger-selection processes. In view of the importance of this step, the collaborations seek to further enhance their trigger and analysis capabilities, and thus their scientific potential, beyond their currently projected scope.

Following a visit by a group of private donors, in 2023 CERN, in close collaboration with the ATLAS and CMS collaborations, submitted a proposal to the Eric and Wendy Schmidt Fund for Strategic Innovation, which resulted in the award of a $48 million grant. The donation laid the foundations of the Next Generation Triggers project, which kicked off in January 2024. The five-year-long project aims to accelerate novel computing, engineering and scientific ideas for the ATLAS and CMS upgrades, also taking advantage of advanced AI techniques, not only in large-scale data analysis and simulation but also embedded in front-end detector electronics. These include quantum-inspired algorithms to improve simulations, and heterogeneous computing architectures and new strategies to optimise the performance of GPU-accelerated experiment code. The project will also provide insight to detectors and data flows for future projects, such as experiments at the proposed Future Circular Collider, while the associated infrastructure will support the advancement of software and algorithms for simulations that are vital to the HL-LHC and future-collider physics programmes. Through the direct involvement of the CERN experimental physics, information technology and theory departments, it is expected that results from the project will bring benefits across the lab’s scientific programme.

The Next Generation Triggers project is broken down into four work packages: infrastructure, algorithms and theory (to improve machine learning-assisted simulation and data collection, develop common frameworks and tools, and better leverage available and new computing infrastructures and platforms); enhancing the ATLAS trigger and data acquisition (to focus on improved and accelerated filtering and exotic signature detection); rethinking the CMS real-time data processing (to extend the use of heterogeneous computing to the whole online reconstruction and to design a novel AI-powered real-time processing workflow to analyse every collision); and education programmes and outreach to engage the community, industry and academia in the ambitious goals of the project, foster and train computing skills in the next generation of high-energy physicists, and complement existing successful community programmes with multi-disciplinary subjects across physics, computing science and engineering.

“The Next Generation Triggers project builds upon and further enhances the ambitious trigger and data acquisition upgrades of the ATLAS and CMS experiments to unleash the full scientific potential of the HL-LHC,” says ATLAS spokesperson Andreas Hoecker.

“Its work packages also benefit other critical areas of the HL-LHC programme, and the results obtained will be valuable for future particle-physics experiments at the energy frontier,” adds Patricia McBride, CMS spokesperson.

CERN will have sole discretion over the implementation of the Next Generation Triggers scientific programme and how the project is delivered overall. In line with its Open Science Policy, CERN also pledges to release all IP generated as part of the project under appropriate open licences.

The post Next-generation triggers for HL-LHC and beyond appeared first on CERN Courier.

]]>
News A new five-year-long project aims to accelerate novel computing, engineering and scientific ideas for the ATLAS and CMS upgrades. https://cerncourier.com/wp-content/uploads/2024/05/CCMayJun24_NA_hl-lhc_feature.jpg
Towards an unbiased digital world https://cerncourier.com/a/towards-an-unbiased-digital-world/ Wed, 17 Jan 2024 09:58:45 +0000 https://preview-courier.web.cern.ch/?p=109984 The European Union project Open Web Search aims to provide an unbiased and transparent alternative to commercial search engines.

The post Towards an unbiased digital world appeared first on CERN Courier.

]]>
What is Open Web Search?

The Open Web Search project was started by a group of people who were concerned that navigation in the digital world is led by a handful of big commercial players (the European search market is largely dominated by Google, for example), who don’t simply offer their services out of generosity but because they want to generate revenue from advertisements. To achieve that they put great effort into profiling users: they analyse what you are searching for and then use this information to create more targeted adverts that create more revenue for them. They also filter search results to present information that fits your world view, to make sure that you come back because you feel at home on those web pages. For some people, and for the European Commission in the context of striving for open access to information and digital sovereignty, as well as becoming independent of US-based tech giants, this is a big concern.

How did the project come about?

In 2017 the founder of the Open Search Foundation reached out to me because I was working on CERN’s institutional search. He had a visionary idea: an open web index that is free, accessible to everyone and completely transparent in terms of the algorithms that it uses. Another angle was to create a valuable resource for building future services, especially data services. Building an index of the web is a massive endeavour, especially when you consider that the estimated total number of web pages worldwide is around 50 billion.

You could argue that unbiased, transparent access to information in the digital world should be on the level of a basic right

A group of technical experts from different institutes and universities, along with the CERN IT department, began with a number of experiments that were used to get a feel for the scale of the project: for example, to see how many web pages a single server can index and to evaluate the open-source projects used for crawling and indexing web pages. The results of these experiments were highly valuable when it came to replying to the Horizon Europe funding call later on.

In parallel, we started a conference series, the Open Search Symposia (OSSYM). Two years ago there was a call for funding in the framework of the European Union (EU) Horizon Europe programme dedicated to Open Web search. Together with 13 other institutions and organisations, the CERN IT department participated and we were awarded a grant. We were then able to start the project in September 2022.

Andreas Wagner

What are the technical challenges in building a new search engine?

We don’t want to copy what others are doing. For one, we don’t have the resources to build a new, massive data centre. The idea is a more collaborative approach, to have a distributed system where people can join depending on their means and interests. CERN is leading work-package five “federated data infrastructure”, in which we and our four infrastructure partners (DLR and LRZ in Germany, CSC in Finland and IT4I in the Czech Republic) provide the infrastructure to set up the system that will ultimately allow the index itself to be built in a purely distributed way. At CERN we are running the so-called URL frontier – a system that oversees what is going on in terms of crawling and preparing this index, and has a long list of URLs that should be collected. When the crawlers are running, they report back on what they have found on different web pages. It’s basically bookkeeping to ensure that we coordinate activities and don’t duplicate the efforts already made by others.
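A minimal sketch of this bookkeeping role – a queue of URLs still to be fetched plus a record of what has already been seen – is shown below. It is purely illustrative and is not the URL-frontier software actually deployed by the project.

```python
from collections import deque

class UrlFrontier:
    """Toy URL frontier: schedules pages to crawl and avoids duplicated work
    (an illustration of the bookkeeping idea, not the project's software)."""

    def __init__(self, seeds):
        self.seen = set(seeds)
        self.queue = deque(seeds)

    def next_batch(self, n):
        """Hand a crawler up to n URLs that still need to be fetched."""
        return [self.queue.popleft() for _ in range(min(n, len(self.queue)))]

    def report(self, discovered_urls):
        """Crawlers report the links they found; only unseen URLs are queued."""
        for url in discovered_urls:
            if url not in self.seen:
                self.seen.add(url)
                self.queue.append(url)

frontier = UrlFrontier(["https://home.cern/"])
batch = frontier.next_batch(10)                  # URLs handed to a crawler
frontier.report(["https://home.cern/news",       # new -> scheduled
                 "https://home.cern/"])          # already seen -> skipped
```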

Open Web Search is said to be based on European values and jurisdiction. Who and what defines these?

That’s an interesting question. Within the project there is a dedicated work package six titled “open web search ecosystem and sustainability” that covers the ethical, legal and societal aspects of open search and addresses the need for building an ecosystem around open search, including the proper governance processes for the infrastructure.

The legal aspect is quite challenging because it is all new territory. The digital world evolves much faster than legislators can keep up with! Information on the web is freely available to anyone, but the moment you start downloading and redistributing it you are taking on ownership and responsibility. So you need to take copyright into account, which is regulated by most EU countries. Criminal law is more delicate in terms of the legal content. Every country has its own rules and there is no conformity. Overall, European values include transparency, fairness in data availability and adherence to democratic core principles. We are aiming to include these European values in the core design of our solution from the very beginning.

What is the status of the project right now?

The project was launched just over a year ago. On the infrastructure side the aim was to have the components in place, meaning having workflows ready and running. It’s not fully automated yet and there is still a lot of challenging work to do, but we have a fully functional set-up, so some institutes have been able to start crawling; they feed the data and it gets stored and distributed to the participating infrastructure partners including CERN. At the CERN data centre we coordinate the crawling efforts and provide advanced monitoring. As we go forward, we will work on aspects of scalability so that there won’t be any problems when we go bigger.

The Open Web Search project

What would a long-term funding model look like for this project?

You could argue that unbiased, transparent access to information in the digital world that has become so omnipresent in our daily lives should be on the level of a basic right. With that in mind, one could imagine a governmental funding scheme. Additionally, this index would be open to companies that can use it to build commercial applications on top of it, and for this use-case a back-charging model might be suitable. So, I could imagine a combination of public and usage-based funding.

In October last year the Open Search Symposium was hosted by the CERN IT department. What was the main focus there?

This is purposely not focused on one single aspect but is an interdisciplinary meeting. Participants include researchers, data centres, libraries, policy makers, legal and ethical experts, and society. This year we had some brilliant keynote speakers such as Věra Jourová, the vice president of the European Commission for Values and Transparency, and Christoph Schumann from LAION, a non-profit organisation that looks to democratise artificial intelligence models.

Ricardo Baeza-Yates (Institute for Experiential Artificial Intelligence, Northeastern University) gave a keynote speech about “Bias in Search and Recommender Systems” and Angella Ndaka (The Centre for Africa Epistemic Justice and University of Otago) talked about “Inclusion by whose terms? When being in doesn’t mean digital and web search inclusion”, the challenges of providing equal access to information to all parts of the world. We also had some of the founders of alternative search engines joining, and it was very interesting and inspiring to see what they are working on. And we had representatives from different universities looking at how research is advancing in different areas.

I see the purpose of Open Web Search as being an invaluable investment in the future

In general, OSSYM 2023 was about a wide range of topics related to internet search and information access in the digital world. We will shortly publish the proceedings of the nearly 25 scientific papers that were submitted and presented.

How realistic is it for this type of search engine to compete with the big players?

I don’t see it as our aim or purpose to compete with the big players. They have unlimited resources so they will continue what they are doing now. I see the purpose of Open Web Search as being an invaluable investment in the future. The Open Web Index could pave the way for upcoming competitors, creating new ideas and questioning the monopoly or gatekeeper roles of the big players. This could make accessing digital information more competitive and a fairer marketplace. I like the analogy of cartography: in the physical world, having access to (unbiased) maps is a common good. If you compare maps from different suppliers you still get basically the same information, which you can rely on. At present, in the digital world there is no unbiased, independent cartography available. For instance, if you look up the way to travel from Geneva to Paris online, you might have the most straightforward option suggested to you, but you might also be pointed towards diversions via restaurants, where you then might consider stopping for a drink or some food, all to support a commercial interest. An unbiased map of the digital world should give you the opportunity to decide for yourself where and how you wish to get to your destination.

The project will also help CERN to improve its own search capabilities and will provide an open-science search across CERN’s multiple information repositories. For me, it’s nice to think that we are helping to develop this tool at the place where the web was born. We want to make sure, just as CERN gave the web to the world, that this is a public right and to steer it in the right direction.

The post Towards an unbiased digital world appeared first on CERN Courier.

]]>
Opinion The European Union project Open Web Search aims to provide an unbiased and transparent alternative to commercial search engines. https://cerncourier.com/wp-content/uploads/2024/01/CCJanFeb24_INT_openweb.jpg
Machine-learning speedup for HL-LHC https://cerncourier.com/a/machine-learning-speedup-for-hl-lhc/ Wed, 17 Jan 2024 09:36:31 +0000 https://preview-courier.web.cern.ch/?p=110074 AI tools developed for particle physics could also be game-changers in nuclear fusion, astrophysics, computer science and biology.

The post Machine-learning speedup for HL-LHC appeared first on CERN Courier.

]]>
The fourth edition of the Fast Machine Learning for Science Workshop was hosted by Imperial College London from 25 to 28 September 2023, marking its first venture outside the US. The series was launched in response to the need for microsecond-speed machine-learning inference for the High-Luminosity LHC (HL-LHC)  detectors, in particular in the hardware trigger systems of the ATLAS and CMS experiments. Achieving this level of speed requires non-standard and generally custom hardware platforms, which are traditionally very challenging to program. While machine learning is becoming widespread in society, this ultrafast niche is not well served by commercial tools. Consequently, particle physicists have developed tools, techniques and an active community in this area.

The workshop gathered almost 200 scientists and engineers in a hybrid format. Students, including undergraduates, and early-career researchers were strongly represented, as were key industry partners. A strong aim of the conference was to engage scientific communities outside particle physics to develop areas where the tools and techniques from particle physics could be game-changing.

The workshop focused on current and emerging techniques and scientific applications for deep learning and inference acceleration, including novel methods for efficient algorithm design, ultrafast on-detector inference and real-time systems, as well as acceleration as a service, hardware platforms, coprocessor technologies, distributed learning and hyper-parameter optimisation. The four-day event consisted of three workshop-style days with invited and contributed talks, and a final day dedicated to technical demonstrations and satellite meetings.

The tools and techniques from particle physics could be game-changing

The interdisciplinary nature of the workshop – which encompassed particle physics, free electron lasers, nuclear fusion, astrophysics, computer science and biology – made for a varied and interesting agenda. Attendees heard talks on how fast machine learning is being harnessed to speed up the identification of gravitational waves, and how it is needed to handle the high data rates and fast turnaround of experiments at free-electron lasers. In the medical arena, speakers addressed the need for faster image processing and data analysis for diagnosis and treatment, and the use of fast machine learning in biology to search for known and unknown features in large, heterogeneous datasets. The use of machine learning in control systems and simulations was discussed in the context of laser-driven accelerators and nuclear-fusion experiments, while in theoretical physics the application of machine learning to solve the electron wave equation in condensed matter, working towards a detailed and fundamental understanding of superconductivity, was presented.

Industry partners including AMD, Graphcore, Groq and Intel discussed current- and future-generation hardware platforms and architectures, and facilitated tutorials on their development toolchains. Researchers from Groq and Graphcore presented their latest dedicated chips for artificial-intelligence applications and showed that they have interesting applications to problems in particle physics, weather forecasting, protein folding, fluid dynamics, materials science and solving partial differential equations. AMD and Intel demonstrated the flexibility of their FPGA platforms and explained how to optimise them for scientific machine-learning applications.

A highlight of the social programme was a public lecture from Grammy Award-winning rapper Lupe Fiasco, who discussed his work with Google on large-language models. The workshop will return to the US next year, before landing in Zurich in 2025.

The post Machine-learning speedup for HL-LHC appeared first on CERN Courier.

]]>
Meeting report AI tools developed for particle physics could also be game-changers in nuclear fusion, astrophysics, computer science and biology. https://cerncourier.com/wp-content/uploads/2024/01/CCJanFeb24_FN_event.jpg
ALICE ups its game for sustainable computing https://cerncourier.com/a/alice-ups-its-game-for-sustainable-computing/ Thu, 24 Aug 2023 08:46:48 +0000 https://preview-courier.web.cern.ch/?p=109117 Volker Lindenstruth goes behind the scenes of a completely new computing model that allows the ALICE collaboration to merge online and offline data processing into a single software framework.

The post ALICE ups its game for sustainable computing appeared first on CERN Courier.

]]>
The Large Hadron Collider (LHC) roared back to life on 5 July 2022, when proton–proton collisions at a record centre-of-mass energy of 13.6 TeV resumed for Run 3. To enable the ALICE collaboration to benefit from the increased instantaneous luminosity of this and future LHC runs, the ALICE experiment underwent a major upgrade during Long Shutdown 2 (2019–2022) that will substantially improve track reconstruction in terms of spatial precision and tracking efficiency, in particular for low-momentum particles. The upgrade will also enable an increased interaction rate of up to 50 kHz for lead–lead (PbPb) collisions in continuous readout mode, which will allow ALICE to collect a data sample more than 10 times larger than the combined Run 1 and Run 2 samples.

ALICE is a unique experiment at the LHC devoted to the study of extreme nuclear matter. It comprises a central barrel (the largest data producer) and a forward muon “arm”. The central barrel relies mainly on four subdetectors for particle tracking: the new inner tracking system (ITS), which is a seven-layer, 12.5 gigapixel monolithic silicon tracker (CERN Courier July/August 2021 p29); an upgraded time projection chamber (TPC) with GEM-based readout for continuous operation; a transition radiation detector; and a time-of-flight detector. The muon arm is composed of three tracking devices: a newly installed muon forward tracker (a silicon tracker based on monolithic active pixel sensors), revamped muon chambers and a muon identifier.

Due to the increased data volume in the upgraded ALICE detector, storing all the raw data produced during Run 3 is impossible. One of the major ALICE upgrades in preparation for the latest run was therefore the design and deployment of a completely new computing model: the O2 project, which merges online (synchronous) and offline (asynchronous) data processing into a single software framework. In addition to an upgrade of the experiment’s computing farms for data readout and processing, this necessitates efficient online compression and the use of graphics processing units (GPUs) to speed up processing. 

Pioneering parallelism

As their name implies, GPUs were originally designed to accelerate computer-graphics rendering, especially in 3D gaming. While they continue to be utilised for such workloads, GPUs have become general-purpose vector processors for use in a variety of settings. Their intrinsic ability to perform several tasks simultaneously gives them a much higher compute throughput than traditional CPUs and enables them to be optimised for data processing rather than, say, data caching. GPUs thus reduce the cost and energy consumption of associated computing farms: without them, about eight times as many servers of the same type and other resources would be required to handle the ALICE TPC online processing of PbPb collision data at a 50 kHz interaction rate. 

ALICE detector dataflow

Since 2010, when the high-level trigger online computer farm (HLT) entered operation, the ALICE detector has pioneered the use of GPUs for data compression and processing in high-energy physics. The HLT had direct access to the detector readout hardware and was crucial to compress data obtained from heavy-ion collisions. In addition, the HLT software framework was advanced enough to perform online data reconstruction. The experience gained during its operation in LHC Run 1 and 2 was essential for the design and development of the current O2 software and hardware systems.

For data readout and processing during Run 3, the ALICE detector front-end electronics are connected via radiation-tolerant gigabit-transceiver links to custom field programmable gate arrays (see “Data flow” figure). The latter, hosted in the first-level processor (FLP) farm nodes, perform continuous readout and zero-suppression (the removal of data without physics signal). In the case of the ALICE TPC, zero-suppression reduces the data rate from a prohibitive 3.3 TB/s at the front end to 900 GB/s for 50 kHz minimum-bias PbPb operations. This data stream is then pushed by the FLP readout farm to the event processing nodes (EPN) using data-distribution software running on both farms. 
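Conceptually, zero suppression keeps only the samples that carry signal. The sketch below is a minimal, threshold-based illustration in Python; the real ALICE zero suppression runs in FPGA firmware and applies more refined criteria.

```python
import numpy as np

def zero_suppress(adc_samples, threshold):
    """Keep only (index, value) pairs above threshold - a minimal illustration;
    the real FPGA firmware applies more refined, cluster-aware criteria."""
    adc = np.asarray(adc_samples)
    indices = np.nonzero(adc > threshold)[0]
    return list(zip(indices.tolist(), adc[indices].tolist()))

# A mostly empty TPC pad time series: only the samples carrying signal survive.
samples = [0, 1, 0, 0, 14, 35, 22, 3, 0, 0, 1, 0]
print(zero_suppress(samples, threshold=5))   # -> [(4, 14), (5, 35), (6, 22)]
```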

Located in three containers on the surface close to the ALICE site, the EPN farm currently comprises 350 servers, each equipped with eight AMD GPUs with 32 GB of RAM each, two 32-core AMD CPUs and 512 GB of memory. The EPN farm is optimised for the fastest possible TPC track reconstruction, which constitutes the bulk of the synchronous processing, and provides most of its computing power in the form of GPU processing. As data flow from the front end into the farms and cannot be buffered, the EPN computing capacity must be sufficient for the highest data rates expected during Run 3.

Having pioneered the use of GPUs in high-energy physics for more than a decade, ALICE now employs GPUs heavily to speed up online and offline processing

Due to the continuous readout approach at the ALICE experiment, processing does not occur on a particular “event” triggered by some characteristic pattern in detector signals. Instead, all data is read out and stored during a predefined time slot in a time frame (TF) data structure. The TF length is usually chosen as a multiple of one LHC orbit (corresponding to about 90 microseconds). However, since a whole TF must always fit into the GPU’s memory, the collaboration chose to use 32 GB GPU memory to grant enough flexibility in operating with different TF lengths. In addition, an optimisation effort was put in place to reuse GPU memory in consecutive processing steps. During the proton run in 2022 the system was stressed by increasing the proton collision rates beyond those needed in order to maximise the integrated luminosity for physics analyses. In this scenario the TF length was chosen to be 128 LHC orbits. Such high-rate tests aimed to reproduce occupancies similar to the expected rates of PbPb collisions. The experience of ALICE demonstrated that the EPN processing could sustain rates nearly twice the nominal design value (600 GB/s) originally foreseen for PbPb collisions. Using high-rate proton collisions at 2.6 MHz the readout reached 1.24 TB/s, which was fully absorbed and processed on the EPNs. However, due to fluctuations in centrality and luminosity, the number of TPC hits (and thus the required memory size) varies to a small extent, demanding a certain safety margin. 
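As a back-of-the-envelope illustration of these numbers (with the simplifying assumption that a whole time frame must sit in one GPU's 32 GB of memory, and neglecting additional reconstruction buffers):

```python
# Rough time-frame (TF) sizing with the figures quoted in the text.
orbit_s = 90e-6                # one LHC orbit, about 90 microseconds
tf_orbits = 128                # TF length chosen for the 2022 high-rate pp test
readout_rate_bps = 1.24e12     # 1.24 TB/s reached at 2.6 MHz pp interactions
gpu_memory_bytes = 32e9        # 32 GB of memory per GPU

tf_duration_s = tf_orbits * orbit_s               # ~11.5 ms
tf_size_bytes = readout_rate_bps * tf_duration_s  # ~14 GB of raw data per TF

print(f"TF duration: {tf_duration_s * 1e3:.1f} ms, "
      f"raw TF size: {tf_size_bytes / 1e9:.1f} GB, "
      f"fraction of one GPU's memory: {tf_size_bytes / gpu_memory_bytes:.0%}")
```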

Flexible compression 

At the incoming raw-data rates during Run 3, it is impossible to store the data – even temporarily. Hence, the outgoing data is compressed in real time to a manageable size on the EPN farm. During this network transfer, event building is carried out by the data distribution suite, which collects all the partial TFs sent by the detectors and schedules the building of the complete TF. At the end of the transfer, each EPN node receives and then processes a full TF containing data from all ALICE detectors. 

GPUs manufactured by AMD

The detector generating by far the largest data volume is the TPC, contributing more than 90% to the total data size. The EPN farm compresses this to a manageable rate of around 100 GB/s (depending on the interaction rate), which is then stored to the disk buffer. The TPC compression is particularly elaborate, employing several steps including a track-model compression to reduce the cluster entropy before the entropy encoding. Evaluating the TPC space-charge distortion during data taking is also the most computing-intensive aspect of online calibrations, requiring global track reconstruction for several detectors. At the increased Run 3 interaction rate, processing on the order of one percent of the events is sufficient for the calibration.
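The effect of track-model compression can be pictured as replacing absolute cluster coordinates by small residuals with respect to the fitted track, which are far cheaper to entropy-encode. The toy example below illustrates the principle only and is not the actual O2 compression scheme; the numbers are invented.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Bits per symbol of a sequence - a rough proxy for its compressibility."""
    counts = Counter(values)
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Invented cluster positions along a stiff track, and their residuals with
# respect to a simple linear track model. The residuals take few, small
# values, so the entropy coder needs fewer bits per cluster.
clusters = [103, 118, 132, 149, 164, 178, 193, 209]
predicted = [103 + round(15.1 * i) for i in range(len(clusters))]   # track model
residuals = [c - p for c, p in zip(clusters, predicted)]

print("raw entropy:     ", round(shannon_entropy(clusters), 2), "bits/cluster")
print("residual entropy:", round(shannon_entropy(residuals), 2), "bits/cluster")
```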

During data taking, the EPN system operates synchronously and the TPC reconstruction fully loads the GPUs. With the EPN farm providing 90% of its compute performance via GPUs, it is also desirable to maximise the GPU utilisation in the asynchronous phase. Since the relative contribution of the TPC processing to the overall workload is much smaller in the asynchronous phase, GPU idle times would be high and processing would be CPU-limited if the TPC part only ran on the GPUs. To use the GPUs maximally, the central-barrel asynchronous reconstruction software is being implemented with native GPU support. Currently, around 60% of the workload can run on a GPU, yielding a speedup factor of about 2.25 compared to CPU-only processing. With the full adaptation of the central-barrel tracking software to the GPU, it is estimated that 80% of the reconstruction workload could be processed on GPUs.
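One simple way to relate these figures is an Amdahl's-law style estimate: if a fraction f of the workload is offloaded and accelerated by a factor s, the overall speedup is 1/((1 - f) + f/s). The calculation below applies this to the quoted 60%/2.25x numbers and to the 80% target; it is a rough illustration that ignores CPU–GPU concurrency and is not a figure from the collaboration.

```python
# Amdahl's-law style reading of the quoted numbers (a simplification: it
# assumes the offloaded fraction would otherwise run serially on the CPU).
f_now, overall_now = 0.60, 2.25

# Solve 1 / ((1 - f) + f/s) = overall_now for the implied acceleration s:
s = f_now / (1.0 / overall_now - (1.0 - f_now))   # ~13.5x on the ported portion

def overall_speedup(f, s):
    return 1.0 / ((1.0 - f) + f / s)

print(f"implied acceleration of the ported code: {s:.1f}x")
print(f"projected overall speedup at 80% ported: {overall_speedup(0.80, s):.1f}x")
```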

In contrast to synchronous processing, asynchronous processing includes the reconstruction of data from all detectors, and all events instead of only a subset; physics analysis-ready objects produced from asynchronous processing are then made available on the computing Grid. As a result, the processing workload for all detectors, except the TPC, is significantly higher in the asynchronous phase. For the TPC, clustering and data compression are not necessary during asynchronous processing, while the tracking runs on a smaller input data set because some of the detector hits were removed during data compression. Consequently, TPC processing is faster in the asynchronous phase than in the synchronous phase. Overall, the TPC contributes significantly to asynchronous processing, but is not dominant. The asynchronous reconstruction will be divided between the EPN farm and the Grid sites. While the final distribution scheme is still to be decided, the plan is to split reconstruction between the online computing farm, the Tier 0 and the Tier 1 sites. During the LHC shutdown periods, the EPN farm nodes will almost entirely be used for asynchronous processing.

Great shape

In 2021, during the first pilot-beam collisions at injection energy, synchronous processing was running and successfully commissioned. In 2022 it was used during nominal LHC operations, where ALICE performed online processing of pp collisions at a 2.6 MHz inelastic interaction rate. At lower interaction rates (both for pp and PbPb collisions), ALICE ran additional processing tasks on free EPN resources, for instance online TPC charged-particle energy-loss determination, which would not be possible at the full 50 kHz PbPb collision rate. The particle-identification performance is demonstrated in the figure “Particle ID”, in which no additional selections on the tracks or detector calibrations were applied.

ALICE TPC performance

Another performance metric used to assess the quality of the online TPC reconstruction is the charged-particle tracking efficiency. The efficiency for reconstructing tracks from PbPb collisions at a centre-of-mass energy of 5.52 TeV per nucleon pair ranges from 94 to 100% for pT > 0.1 GeV/c. Here the fake-track rate is negligible; however, the clone rate increases significantly for low-pT primary tracks due to incomplete track merging of very low-momentum particles that curl in the ALICE solenoidal field and leave and re-enter the TPC multiple times.

The effective use of GPU resources makes for extremely efficient processing. GPUs also bring gains in data quality, computing cost and efficiency – advantages that have not been overlooked by the other LHC experiments. To manage its data rates in real time, LHCb developed the Allen project, a first-level trigger processed entirely on GPUs that reduces the data rate by a factor of 30–60 prior to the alignment, calibration and final reconstruction steps. With this approach, 4 TB/s are processed in real time, with around 10 GB/s of the most interesting collisions selected for physics analysis.

At the beginning of Run 3, the CMS collaboration deployed a new HLT farm comprising 400 CPUs and 400 GPUs. With respect to a traditional solution using only CPUs, this configuration reduced the processing time of the high-level trigger by 40%, improved the data-processing throughput by 80% and reduced the power consumption of the farm by 30%. ATLAS uses GPUs extensively for physics analyses, especially for machine-learning applications. Focus has also been placed on data processing, anticipating that in the following years much of that can be offloaded to GPUs. For all four LHC experiments, the future use of GPUs is crucial to reduce the cost, size and power consumption within the higher luminosities of the LHC.

Having pioneered the use of GPUs in high-energy physics for more than a decade, ALICE now employs GPUs heavily to speed up online and offline processing. Today, 99% of synchronous processing is performed on GPUs, dominated by the largest contributor, the TPC.

More code

On the other hand, only about 60% of asynchronous processing (for 650 kHz pp collisions) is currently running on GPUs, i.e. offline data processing on the EPN farm. For asynchronous processing, even if the TPC is still an important contributor to the compute load, there are several other subdetectors that are important. In fact, there is an ongoing effort to port considerably more code to the GPUs. Such an effort will increase the fraction of GPU-accelerated code to beyond 80% for full barrel tracking. Eventually ALICE aims to run 90% of the whole asynchronous processing on GPUs.

PbPb collisions in the ALICE TPC

In November 2022 the upgraded ALICE detectors and central systems saw PbPb collisions for the first time during a two-day pilot run at a collision rate of about 50 Hz. High-rate PbPb processing was validated by injecting Monte Carlo data into the readout farm and running the whole data processing chain on 230 EPN nodes. Because the TPC data volumes are somewhat larger than initially expected, this stress test is now being repeated on 350 EPN nodes with the final, continuously optimised TPC firmware, to provide the required 20% compute margin with respect to the 50 kHz PbPb operations foreseen for October 2023. Together with the upgraded detector components, the ALICE experiment has never been in better shape to probe extreme nuclear matter during the current and future LHC runs.

The post ALICE ups its game for sustainable computing appeared first on CERN Courier.

]]>
Feature Volker Lindenstruth goes behind the scenes of a completely new computing model that allows the ALICE collaboration to merge online and offline data processing into a single software framework. https://cerncourier.com/wp-content/uploads/2023/08/CCSepOct23_GPU_frontis.jpg
Report explores quantum computing in particle physics https://cerncourier.com/a/report-explores-quantum-computing-in-particle-physics/ Wed, 23 Aug 2023 08:35:11 +0000 https://preview-courier.web.cern.ch/?p=109080 Researchers from CERN, DESY, IBM Quantum and more than 30 other organisations plot a course for quantum computing in particle physics.

The post Report explores quantum computing in particle physics appeared first on CERN Courier.

]]>
A quantum computer built by IBM

Researchers from CERN, DESY, IBM Quantum and more than 30 other organisations have published a white paper identifying activities in particle physics that could benefit from quantum-computing technologies. Posted on arXiv on 6 July, the 40-page paper is the outcome of a working group set up at the QT4HEP conference held at CERN last November, which identified topics in theoretical and experimental high-energy physics where quantum algorithms may produce significant insights and results that are very hard, or even impossible, to obtain with classical computers.

Combining quantum theory and information theory, quantum computing is natively aligned with the underlying physics of the Standard Model. Quantum bits, or qubits, are the computational representation of a state that can be entangled and brought into superposition. Unlike classical bits, which always hold a discrete 0 or 1, a qubit’s state encodes probability amplitudes, and measuring it yields 0 or 1 with probabilities determined by those amplitudes. Quantum-computing algorithms can therefore be exploited to achieve computational advantages in speed and accuracy, especially for processes that are not yet well understood.

“Quantum computing is very promising, but not every problem in particle physics is suited to this model of computing,” says Alberto Di Meglio, head of IT Innovation at CERN and one of the white paper’s lead authors alongside Karl Jansen of DESY and Ivano Tavernelli of IBM Quantum. “It’s important to ensure that we are ready and that we can accurately identify the areas where these technologies have the potential to be most useful.” 

Neutrino oscillations in extreme environments, such as supernovae, are one promising example given. In the context of quantum computing, neutrino oscillations can be considered strongly coupled many-body systems that are driven by the weak interaction. Even a two-flavour model of oscillating neutrinos is almost impossible to simulate exactly for classical computers, making this problem well suited for quantum computing. The report also identifies lattice-gauge theory and quantum field theory in general as candidates that could enjoy a quantum advantage. The considered applications include quantum dynamics, hybrid quantum/classical algorithms for static problems in lattice gauge theory, optimisation and classification problems. 

With quantum computing we address problems in those areas that are very hard to tackle with classical methods

In experimental physics, potential applications range from simulations to data analysis and include jet physics, track reconstruction and algorithms used to simulate the detector performance. One key advantage here is the speed up in processing time compared to classical algorithms. Quantum-computing algorithms might also be better at finding correlations in data, while Monte Carlo simulations could benefit from random numbers generated by a quantum computer. 

“With quantum computing we address problems in those areas that are very hard – or even impossible – to tackle with classical methods,” says Karl Jansen (DESY). “We can now explore physical systems to which we still do not have access.” 

The working group will meet again at CERN for a special workshop on 16 and 17 November, immediately before the Quantum Techniques in Machine Learning conference from 19 to 24 November.

The post Report explores quantum computing in particle physics appeared first on CERN Courier.

]]>
News Researchers from CERN, DESY, IBM Quantum and more than 30 other organisations plot a course for quantum computing in particle physics. https://cerncourier.com/wp-content/uploads/2023/08/CCSepOct23_NA_IBM_feature.jpg
Event displays in motion https://cerncourier.com/a/event-displays-in-motion/ Wed, 05 Jul 2023 10:24:43 +0000 https://preview-courier.web.cern.ch/?p=108695 The LHC and the Higgs-boson discovery not only accelerated science, but also took event displays to a new level.

The post Event displays in motion appeared first on CERN Courier.

]]>
The first event displays in particle physics were direct images of traces left by particles when they interacted with gases or liquids. The oldest event display of an elementary particle, published in Charles Wilson’s Nobel lecture from 1927 and taken between 1912 and 1913, showed a trajectory of an electron. It was a trail made by small droplets caused by the interaction between an electron coming from cosmic rays and gas molecules in a cloud chamber, the trajectory being bent due to the electrostatic field (see “First light” figure). Bubble chambers, which work in a similar way to cloud chambers but are filled with liquid rather than gas, were key in proving the existence of neutral currents 50 years ago, along with many other important results. In both cases a particle crossing the detector triggered a camera that took photographs of the trajectories. 

Following the discovery of the Higgs boson in particular, outreach has become another major pillar of event displays

Georges Charpak’s invention of the multi-wire proportional chamber in 1968, which made it possible to distinguish single tracks electronically, paved the way for three-dimensional (3D) event displays. With 40 drift chambers, and computers able to process the large amounts of data produced by the UA1 detector at the SppS, it was possible to display the tracks of decaying W and Z bosons along the beam axis, aiding their 1983 discovery (see “Inside events” figure, top).  

Design guidelines 

With the advent of LEP and the availability of more powerful computers and reconstruction software, physicists knew that the amount of data would increase to the point where displaying all of it would make pictures incomprehensible. In 1995 members of the ALEPH collaboration released guidelines – implemented in a programme called Dali, which succeeded Megatek – to make event displays as easy to understand as possible, and the same principles apply today. To make them better match human perception, two different layouts were proposed: the wire-frame technique and the fish-eye transformation. The former shows detector elements via a rendering of their shape, resulting in a 3D impression (see “Inside events” figure, bottom). However, the wire-frame pictures needed to be simplified when too many trajectories and detector layers were available. This gave rise to the fish-eye view, or projection in x versus y, which emphasised the role of the tracking system. The remaining issue of superimposed detector layers was mitigated by showing a cross section of the detector in the same event display (see “Inside events” figure, middle). Together with a colour palette that helped distinguish the different objects, such as jets, from one other, these design principles prevailed into the LHC era. 
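For illustration, a fish-eye view can be obtained by a non-linear radial rescaling of the transverse projection, so that the compact tracking detectors and the much larger outer systems remain visible in the same picture. The functional form and scale below are assumptions chosen for simplicity, not necessarily the transformation implemented in Dali.

```python
import math

def fisheye(x, y, r0=50.0):
    """Radially compress a point in the transverse plane (toy transformation).
    r0 (an arbitrary scale, here in cm) sets where the compression kicks in:
    radii well below r0 are barely changed, radii well above it are squeezed."""
    r = math.hypot(x, y)
    if r == 0.0:
        return 0.0, 0.0
    scale = 1.0 / (1.0 + r / r0)    # assumed form: r -> r / (1 + r/r0)
    return x * scale, y * scale

# An inner-tracker hit at r = 10 cm keeps ~83% of its radius, while a
# calorimeter hit at r = 200 cm is squeezed to 20% of its radius.
print(fisheye(10.0, 0.0), fisheye(200.0, 0.0))
```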

First ever event display

The LHC not only took data acquisition, software and analysis algorithms to a new level, but also event displays. In a similar vein to LEP, the displays used to be more of a debugging tool for the experiments to visualise events and see how the reconstruction software and detector work. In this case, a static image of the event is created and sent to the control room in real time, which is then examined by experts for anomalies, for example due to incorrect cabling. “Visualising the data is really powerful and shows you how beautiful the experiment can be, but also the brutal truth because it can tell you something that does not work as expected,” says ALICE’s David Dobrigkeit Chinellato. “This is especially important after long shutdowns or the annual year-end-technical stops.”  

Largely based on the software used to create event displays at LEP, each of the four main LHC experiments developed their own tools, tailored to their specific analysis software (see “LHC returns” figure). The detector geometry is loaded into the software, followed by the event data; if the detector layout doesn’t change, the geometry is not recreated. As at LEP, both fish-eye and wire-frame images are used. Thanks to better rendering software and hardware developments such as more powerful CPUs and GPUs, wire-frame images are becoming ever more realistic (see “LHC returns” figure). Computing developments and additional pileup due to increased collisions have motivated more advanced event displays. Driven by the enthusiasm of individual physicists, and in time for the start of the LHC Run 3 ion run in October 2022, ALICE experimentalists have begun to use software that renders each event to give it a more realistic and crisper view (see “Picture perfect” image). In particular, in lead–lead collisions at 5.36 TeV per nucleon pair measured with ALICE, the fully reconstructed tracks are plotted to achieve the most efficient visualisation.

Inside events

ATLAS also uses both fish-eye and wire-frame views. Their current event-display framework, Virtual Point 1 (VP1), creates interactive 3D event displays and integrates the detector geometry to draw a selected set of particle passages through the detector. As with the other experiments, different parts of the detector can be added or removed, resulting in a sliced view. Similarly, CMS visualises their events using in-house software known as Fireworks, while LHCb has moved from a traditional view using Panoramix software to a 3D one using software based on Root TEve.

In addition, ATLAS, CMS and ALICE have developed virtual-reality views. VP1, for instance, allows data to be exported in a format that is used for videos and 3D images. This enables both physicists and the public to fully immerse themselves in the detector. CMS physicists created a first virtual-reality version during a hackathon, which took place at CERN in 2016 and integrated this feature with small modifications in their application used for outreach. ALICE’s augmented-reality application “More than ALICE”, which is intended for visitors, overlays the description of detectors and even event displays, and works on mobile devices. 

Phoenix rising

To streamline the work on event displays at CERN, developers in the LHC experiments joined forces and published a visualisation whitepaper in 2017 to identify challenges and possible solutions. As a result it was decided to create an experiment-agnostic event display, later named Phoenix. “When we realised the overlap of what we are doing across many different experiments, we decided to develop a flexible browser-based framework, where we can share effort and leverage our individual expertise, and where users don’t need to install any special software,” says main developer Edward Moyse of ATLAS. While experiment-specific frameworks are closely tied to the experiments’ data format and visualise all incoming data, experiment-agnostic frameworks only deal with a simplified version of the detectors and a subset of the event data. This makes them lightweight and fast, and requires an extra processing step as the experimental data need to be put into a generic format and thus lose some detail. Furthermore, not every experiment has the symmetric layout of ATLAS and CMS. This applies to LHCb, for instance.
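To give a flavour of what a simplified, experiment-agnostic event record might look like, here is a hypothetical example; the field names and structure are invented for illustration and do not reproduce Phoenix’s actual JSON schema.

```python
import json

# Hypothetical experiment-agnostic event record (field names invented for
# illustration; Phoenix's real JSON schema differs). Each experiment exports
# its reconstructed objects into such a reduced, common structure.
event = {
    "run": 123456,
    "event": 789,
    "tracks": [
        {"charge": -1, "pt_gev": 24.3, "eta": 0.42, "phi": 1.57},
        {"charge": 1, "pt_gev": 11.8, "eta": -1.10, "phi": -2.30},
    ],
    "jets": [
        {"pt_gev": 95.0, "eta": 0.5, "phi": 1.6, "n_constituents": 17},
    ],
    "missing_et_gev": 32.1,
}

print(json.dumps(event, indent=2))   # ready to be served to a browser-based display
```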

Event displays of the first LHC Run 3 collisions

Phoenix initially supported the geometry and event-display formats for LHCb and ATLAS, but those for CMS were added soon after and now FCC has joined. The platform had its first test in 2018 with the TrackML computing challenge using a fictitious High-Luminosity LHC (HL-LHC) detector created with Phoenix. The main reason to launch this challenge was to find new machine-learning algorithms that can deal with the unprecedented increase in data collection and pile-up in detectors expected during the HL-LHC runs, and at proposed future colliders.

Painting outreach

Following the discovery of the Higgs boson in particular, outreach has become another major pillar of event displays. Visually pleasing images and videos of particle collisions, which help in the communication of results, are tailor made for today’s era of social media and high-bandwidth internet connections. “We created a special event display for the LHCb master class,” mentions LHCb’s Ben Couturier. “We show the students what an event looks like from the detector to the particle tracks.” CMS’s iSpy application is web-based and primarily used for outreach and CMS masterclasses, and has also been extended with a virtual-reality application. “When I started to work on event displays around 2007, the graphics were already good but ran in dedicated applications,” says CMS’s Tom McCauley. “For me, the big change is that you can now use all these things on the web. You can access them easily on your mobile phone or your laptop without needing to be an expert on the specific software.” 

Event displays from LHCb and the simulated HL-LHC detector

Being available via a browser means that Phoenix is a versatile tool for outreach as well as physics. In cases or regions where the necessary bandwidth to create event displays is sparse, pre-created events can be used to highlight the main physics objects and to display the detector as clearly as possible. Another new way to experience a collision and to immerse fully into an event is to wear virtual-reality goggles. 

An even older and more experiment-agnostic framework than Phoenix using virtual-reality experiences exists at CERN, and is aptly called TEV (Total Event Display). Formerly used to show event displays in the LHC interactive tunnel as well as in the Microcosm exhibition, it is now used at the CERN Globe and the new Science Gateway centre. There, visitors will be able to play a game called “proton football”, where the collision energy depends on the “kick” the players give their protons. “This game shows that event displays are the best of both worlds,” explains developer Joao Pequenao of CERN. “They inspire children to learn more about physics by simply playing a soccer game, and they help physicists to debug their detectors.”

The post Event displays in motion appeared first on CERN Courier.

]]>
Feature The LHC and the Higgs-boson discovery not only accelerated science, but also took event displays to a new level. https://cerncourier.com/wp-content/uploads/2023/06/CCJulAug23_EVENT_frontis.jpg
RadiaSoft rewrites the rulebook on modelling and optimisation of multi-stage accelerators https://cerncourier.com/a/radiasoft-rewrites-the-rulebook-on-modelling-and-optimisation-of-multi-stage-accelerators/ Wed, 14 Jun 2023 16:08:24 +0000 https://preview-courier.web.cern.ch/?p=108617 “Robust software solutions that deliver step-function advances in accelerator design, engineering and research productivity.” Writ large, that’s the unifying goal shaping the day-to-day work of physicists and software engineers at RadiaSoft, a Boulder, Colorado-based company that specialises in the provision of high-level research, design and scientific consulting services in beamline physics, accelerator science and associated […]

The post RadiaSoft rewrites the rulebook on modelling and optimisation of multi-stage accelerators appeared first on CERN Courier.

]]>
Modelling of beam dynamics

“Robust software solutions that deliver step-function advances in accelerator design, engineering and research productivity.” Writ large, that’s the unifying goal shaping the day-to-day work of physicists and software engineers at RadiaSoft, a Boulder, Colorado-based company that specialises in the provision of high-level research, design and scientific consulting services in beamline physics, accelerator science and associated machine-learning technologies. Theirs is a wide-ranging remit that spans electron linacs, free-electron lasers (FELs), synchrotron radiation generated in electron rings, high-intensity proton synchrotrons and accumulator rings, with the RadiaSoft team bringing creativity and novel solutions to the design and simulation of high-power particle beams and directed radiation.

Founded in 2013, RadiaSoft has spent the past decade establishing a heavyweight customer base with organisations large and small. Customers include US Department of Energy (DOE) national laboratories (among them Fermilab, Brookhaven National Laboratory and SLAC National Accelerator Laboratory); academic research centres (at institutions such as UCLA and Texas A&M University); as well as industry partners like Best Medical, Modern Hydrogen and Radiabeam.

“As an R&D company, we work across an expansive canvas encompassing accelerator physics, enabling platform technologies and engineering for multi-stage accelerator facilities,” Jonathan Edelen, an accelerator physicist and president of RadiaSoft, told CERN Courier. “We work collaboratively with our scientific customers – often funded under the US government’s Small Business Innovation Research (SBIR) programme – and help them tackle physics and engineering problems that they don’t have the expertise or time to solve on their own.”

 According to Edelen, the secret of RadiaSoft’s success lies in combining “broad and deep technical domain knowledge” with a wealth of prior project experience at large-scale accelerator facilities plus the development of customised solutions for smaller operations. “That collective know-how,” he said, “allows us to tackle all manner of design and optimisation problems in accelerator science. We work on projects ranging from fundamental studies of esoteric beam dynamics to the simulation and design of RF control systems to the development of machine-learning algorithms for accelerator controls.”

Software innovation

All of this core expertise in accelerator physics and engineering doesn’t stand alone in the RadiaSoft corporate portfolio. Underpinning the consultancy’s value proposition is a software development team that majors on the realisation of intuitive, powerful graphical user interfaces for open-source, high-performance simulation codes. This same team – which aggregates many years of experience working with all sorts of simulation packages for accelerator physics and subsystems – is the driving force behind RadiaSoft’s flagship product Sirepo.

Sirepo is a browser-based, computer-aided engineering (CAE) gateway with graphical user interface. The platform supports over a dozen community-developed particle-accelerator simulation codes, a specialised machine-learning application, a controls-modelling application and a private JupyterHub instance. Users can build simulations with drag-and-drop inputs, collaborate in real-time with shareable URLs, or download their simulations to work from the command line.

Jonathan Edelen

“Sirepo makes it easier for users to learn and run these simulation codes, many of which can be difficult to work with,” explained Edelen. “Equally important, RadiaSoft handles all of the dependency packaging, installation and versioning of the codes within Sirepo, while our cloud-based computing architecture allows users to share their simulations by simply sending a URL to colleagues.”

At a granular level, Sirepo aggregates a portfolio of codes to facilitate the modelling of beam dynamics for a range of particle acceleration schemes. Users can employ elegant, for example, to study varied configurations such as linacs and rings, while OPAL can model electron guns, beamlines and linacs with space charge, wake fields and coherent synchrotron radiation. Meanwhile, MAD-X is used for large-scale lattice design, with options for optimisation and export for use in other tracking codes within Sirepo. Users can also delve into time-dependent, 3D modelling for FELs, with Genesis offering simulation for single-pass FELs and extensions to cover oscillators or multi-stage set-ups. Sirepo’s interoperability means that the user can easily coordinate these codes for an end-to-end simulation of the beamline.

Other dedicated applications within Sirepo act as a repository for more specific simulation software for use-cases with magnets, X-ray beamlines, control systems, plasmas, neutronics and machine learning. “The codes within Sirepo are used in the design and optimisation of accelerator facilities worldwide,” noted Edelen, “and are relevant through the full life-cycle of an accelerator – from new-build through commissioning, acceptance and regular operations.”

Next up in the evolution of the RadiaSoft simulation environment is Sirepo Omega. Due to go live over the summer, this unified workflow manager will enable scientific users to chain simulations in series within the Sirepo “sandbox” – the outputs from one simulation automatically flowing along the simulation chain to generate a final result (with users, if they wish, able to probe individual simulations on a localised basis throughout the workflow).

“Sirepo Omega is all about convergence, automation and enhanced workflow efficiency,” said Edelen. For example, all coordinate transformations between different simulations are handled automatically by Sirepo Omega, while users can track particles automatically through all the sub-simulations. “The user will also get plots and summarised outputs in one convenient location, so there’s no need to go digging into each application to retrieve the results,” Edelen added.

Optimisation is better by design

Another foundational building block of the RadiaSoft service offering is rsopt, a Python framework for testing and running black-box optimisation problems in accelerator science and engineering. As such, the rsopt library and workflow manager give scientists a high degree of modularity when it comes to the choice of optimisation algorithms and code execution methods (including easy-to-use functionality for logging and saving results). Integration with Sirepo also enables templating for a number of codes related to particle accelerator simulation, making it straightforward to parse existing input files and use them without any additional modification.

“As well as ease-of-use and maintainability, rsopt combines a lot of powerful utilities for typical accelerator optimisation problems,” explained Chris Hall, senior scientist at RadiaSoft. A case study in this regard might be an accelerator code representing the design for an FEL – a common requirement being to minimise the bunch length or maximise the peak current while minimising emittance growth.
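To make the kind of objective Hall describes concrete, the sketch below sets up a toy black-box optimisation with SciPy’s derivative-free Nelder–Mead routine rather than rsopt’s own interface, which the article does not detail; the simulate() stub, parameter names and the weighting in the figure of merit are invented placeholders.

```python
# Illustrative only: a black-box optimisation of the kind described above, written
# against SciPy rather than rsopt's own interface. The simulate() stub, parameter
# names and the weighting in the objective are invented placeholders.
from scipy.optimize import minimize

def simulate(params):
    """Stand-in for an accelerator simulation returning beam properties."""
    quad_k, chicane_r56 = params
    bunch_length = 1.0 + (quad_k - 0.3) ** 2 + 0.5 * (chicane_r56 - 0.1) ** 2
    emittance_growth = 0.2 * abs(quad_k) + 0.1 * chicane_r56 ** 2
    return bunch_length, emittance_growth

def objective(params):
    # Collapse the competing goals into one figure of merit:
    # short bunches, penalised by emittance growth.
    bunch_length, emittance_growth = simulate(params)
    return bunch_length + 10.0 * emittance_growth

result = minimize(objective, x0=[0.0, 0.0], method="Nelder-Mead")
print("best parameters:", result.x, "objective:", result.fun)
```

In a real workflow the stub would be replaced by a call to an accelerator code such as elegant or OPAL, with rsopt handling the templating, execution and logging described above.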

Chris Hall

A recent RadiaSoft collaboration involved the use of the rsopt library for the simulation and training of a non-destructive beam diagnostic capable of characterising the transverse and longitudinal profiles of ultrashort, high-brightness electron beams in FELs and next-generation colliders. “One of the attractions of rsopt,” added Hall, “is the ability to carry out various samplings to generate data for machine-learning models or to get a better understanding of your problem space.”
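One common way to carry out such samplings is a space-filling design such as Latin-hypercube sampling; the sketch below uses SciPy’s sampler purely as an illustration – it is not rsopt’s actual interface, and the parameter count, bounds and sample size are placeholders.

```python
# Illustrative only: space-filling sampling of a parameter space with SciPy's
# Latin-hypercube sampler. Not rsopt's actual interface; the number of parameters,
# bounds and sample count are invented placeholders.
import numpy as np
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=3, seed=42)      # three simulation input parameters
unit_samples = sampler.random(n=200)            # 200 points filling the unit cube

lower = np.array([0.0, -5.0, 1.0])              # per-parameter lower bounds
upper = np.array([1.0,  5.0, 3.0])              # per-parameter upper bounds
samples = qmc.scale(unit_samples, lower, upper)

# Each row would be fed to the simulation code; the resulting outputs become
# training data for a machine-learning model or a map of the problem space.
print(samples[:5])
```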

What’s more, interoperability is hard-wired into rsopt to enable platform-independent execution and scaling to massively parallel systems. “This means users can easily move their code execution between computational environments and utilise algorithms from multiple libraries without having to refactor their own code,” noted Hall.

The future’s bright for RadiaSoft. Having established a sustainable R&D business model in North America over the past 10 years, international expansion is the next step on the group’s commercial roadmap. “Watch this space,” Edelen concluded. “The task for us right now is finding the right path to grow our R&D activity by collaborating with scientists and engineers at large-scale accelerator facilities in Europe.”

End-to-end simulation of particle accelerators using Sirepo https://cerncourier.com/a/end-to-end-simulation-of-particle-accelerators-using-sirepo/ Tue, 13 Jun 2023 07:49:41 +0000 https://preview-courier.web.cern.ch/?p=108605 Watch this webinar now, to explore how to use Sirepo for linear accelerator simulations.


This webinar will give a high-level overview of how scientists can model particle accelerators using Sirepo, an open-source scientific computing gateway.

The speaker, Jonathan Edelen, will work through examples using three of Sirepo’s applications that best highlight the different modelling regimes for simulating a free-electron laser.


Jonathan Edelen, president, earned a PhD in accelerator physics from Colorado State University, after which he was selected for the prestigious Bardeen Fellowship at Fermilab. While at Fermilab he worked on RF systems and thermionic cathode sources at the Advanced Photon Source. Currently, Jon is focused on building advanced control algorithms for particle accelerators including solutions involving machine learning.

A powerful eye opener into the world of AI https://cerncourier.com/a/a-powerful-eye-opener-into-the-world-of-ai/ Fri, 03 Mar 2023 12:05:12 +0000 https://preview-courier.web.cern.ch/?p=107978 "Artificial Intelligence for High Energy Physics" showcases neat projects and detailed applications, finds Andreas Salzburger.

The appearance of the word “for” rather than “in” in the title of this collection raises the bar from an academic description to a primer. It is neither the book’s length (more than 800 pages), nor the fact that the author list resembles a who’s who in artificial intelligence (AI) research carried out in high-energy physics that makes this book live up to its premise; it is the careful crafting of its content and structure.

Artificial intelligence is not new to our field. On the contrary, some of the concepts and algorithms have been pioneered in high-energy physics. Artificial Intelligence for High Energy Physics credits this as well as reaching into very recent AI research. It covers topics ranging from unsupervised machine-learning techniques in clustering to workhorse tools such as boosted decision trees in analyses, and from recent applications of AI in event reconstruction to simulations at the boundary where AI can help us to understand physics.

Each chapter follows a similar structure: after setting the broader context, a short theoretical introduction to the tools (and, where possible, the available software) is given, which is then applied and adapted to a high-energy physics problem. The ratio of in-depth theoretical background to AI concepts and the focus on applications is well balanced, and underlines the work of the editors, who avoided duplication and cross-referenced individual chapters and topics. The editors and authors have not only created a selection of high-quality review articles, but a coherent and remarkably good read. Takeaway messages in the chapter on distributed training and optimisation stand out, and one might wish that this concept found more resonance throughout the book.

Artificial Intelligence for High Energy Physics

Sometimes, the book can be used as a glossary, which helps to bridge the gaps that seem to exist simply because high-energy physicists and data scientists use different names for similar or even identical things. While the book can certainly be used as a guide for a physicist in AI, an AI researcher with the necessary physics knowledge may not be served quite so well.

In an ideal world, each chapter would have a reference dataset to allow the reader to follow the stated problems and learn through building and exercising the described pipelines. This, however, would turn the book from a primer into a textbook for AI in high-energy physics. To be fair, wherever possible the authors of the chapters have used and referred to publicly available datasets, and one chapter is devoted to the issue of arranging a community data competition, such as the TrackML challenge in 2018.

As for the most important question – have I learned something new? – the answer is a resounding “yes”. While none of the broad topics and their application to high-energy physics will come as a surprise to those who have been following the field in recent years, there are neat projects and detailed applications showcased in this book. Furthermore, reading about a familiar topic in someone else’s words can be a powerful eye opener.

Combining quantum with high-energy physics https://cerncourier.com/a/combining-quantum-with-high-energy-physics/ Tue, 10 Jan 2023 12:01:15 +0000 https://preview-courier.web.cern.ch/?p=107542 Organised by the CERN Quantum Technology Initiative, the first International Conference on Quantum Technologies for High-Energy Physics assessed the opportunities ahead.

From 1 to 4 November, the first International Conference on Quantum Technologies for High-Energy Physics (QT4HEP) was held at CERN. With 224 people attending in person and many more following online, the event brought together researchers from academia and industry to discuss recent developments and, in particular, to identify activities within particle physics that can benefit most from the application of quantum technologies.

Opening the event, Joachim Mnich, CERN director for research and computing, noted that CERN is widely recognised, including by its member states, as an important platform for promoting applications of quantum technologies for both particle physics and beyond. “The journey has just begun, and the road is still long,” he said, “but it is certain that deep collaboration between physicists and computing experts will be key in capitalising on the full potential of quantum technologies.”

The conference was organised by the CERN Quantum Technology Initiative (CERN QTI), which was established in 2020, and followed a successful workshop on quantum computing in 2018 that marked the beginning of a range of new investigations into quantum technologies at CERN. CERN QTI covers four main research areas: quantum theory and simulation; quantum sensing, metrology and materials; quantum computing and algorithms; and quantum communication and networks. The first day’s sessions focused on the first two: quantum theory and simulation, as well as quantum sensing, metrology and materials. Topics covered included the quantum simulation of neutrino oscillations, scaling up atomic interferometers for the detection of dark matter, and the application of quantum traps and clocks to new-physics searches.

Building partnerships

Participants showed an interest in broadening collaborations related to particle physics. Members of the quantum theory and quantum sensing communities discussed ways to identify and promote areas of promise relevant to CERN’s scientific programme. It is clear that many detectors in particle physics can be enhanced – or even made possible – through targeted R&D in quantum technologies. This fits well with ongoing efforts to implement a chapter on quantum technologies in the European Committee for Future Accelerators’ R&D roadmap for detectors, noted Michael Doser, who coordinates the branch of CERN QTI focused on sensing, metrology and materials.

For the theory and simulation branch of CERN QTI, the speakers provided a useful overview of quantum machine learning, quantum simulations of high-energy collider events and neutrino processes, as well as making quantum-information studies of wormholes testable on a quantum processor. Elina Fuchs, who coordinates this branch of CERN QTI, explained how quantum advantages have been found for toy models of increased physical relevance. Furthermore, she said, developing a dictionary that relates interactions at high energies to lower energies will enhance knowledge about new-physics models learned from quantum-sensing experiments.

The conference demonstrated the clear potential of different quantum technologies to impact upon particle-physics research

The second day’s sessions focused on the remaining two areas, with talks on quantum machine learning, noise gates for quantum computing, the journey towards a quantum internet, and much more. These talks clearly demonstrated the importance of working in interdisciplinary, heterogeneous teams when approaching particle-physics research with quantum-computing techniques. The technical talks also showed how studies on the algorithms are becoming more robust, with a focus on trying to address problems that are as realistic as possible.

A keynote talk from Yasser Omar, president of the Portuguese Quantum Institute, presented the “fleet” of programmes on quantum technologies that has been launched since the EU Quantum Flagship was announced in 2018. In particular, he highlighted QuantERA, a network of 39 funding organisations from 31 countries; QuIC, the European Quantum Industry Consortium; EuroQCI, the European Quantum Communication Infrastructure; EuroQCS, the European Quantum Computing and Simulation Infrastructure; and the many large national quantum initiatives being launched across Europe. The goal, he said, is to make Europe autonomous in quantum technologies, while remaining open to international collaboration. He also highlighted the role of World Quantum Day – founded in 2021 and celebrated each year on 14 April – in raising awareness around the world of quantum science.

Jay Gambetta, vice president of IBM Quantum, gave a fascinating talk on the path to quantum computers that exceed the capabilities of classical computers. “Particle physics is a promising area for looking for near-term quantum advantage,” he said. “Achieving this is going to take both partnership with experts in quantum information science and particle physics, as well as access to tools that will make this possible.”

Industry and impact

The third day’s sessions – organised in collaboration with CERN’s knowledge transfer group – were primarily dedicated to industrial co-development. Many of the extreme requirements faced by quantum technologies are shared with particle physics, such as superconducting materials, ultra-high vacuum, precise timing, and much more. For this reason, CERN has built up a wealth of expertise and specific technologies that can directly address challenges in the quantum industry. CERN strives to maximise the societal impact of its technologies and know-how, not least by easing their transfer to industry. One focus is to see which technologies might help to build robust quantum-computing devices. Already, CERN’s White Rabbit technology, which provides sub-nanosecond accuracy and picosecond precision of synchronisation for the LHC accelerator chain, has found its way to the quantum community, noted Han Dols, business development and entrepreneurship section leader.

Several of the day’s talks focused on challenges around trapped ions and control systems. Other topics covered included the potential of quantum computing for drug development, measuring brain function using quantum sensors, and developing specialised instrumentation for quantum computers. Representatives of several start-up companies, as well as from established technology leaders, including Intel, Atos and Roche, spoke during the day. The end of the third day was dedicated to crucial education, training and outreach initiatives. Google provided financial support for 11 students to attend the conference, and many students and researchers presented posters.

Marieke Hood, executive director for corporate affairs at the Geneva Science and Diplomacy Anticipator (GESDA) foundation, also gave a timely presentation about the recently announced Open Quantum Institute (OQI). CERN is part of a coalition of science and industry partners proposing the creation of this institute, which will work to ensure that emerging quantum technologies tackle key societal challenges. It was launched at the 2022 GESDA Summit in October, during which CERN Director-General Fabiola Gianotti highlighted the potential of quantum technologies to help achieve key UN Sustainable Development Goals. “The OQI acts at the interface of science and diplomacy,” said Hood. “We’re proud to count CERN as a key partner for OQI, its experience of multinational collaboration will be most useful to help us achieve these ambitions.”

The final day of the conference was dedicated to hands-on workshops with three different quantum-computing providers. In parallel, a two-day meeting of the “Quantum Computing 4HEP” working group, organised by CERN, DESY and the IBM Quantum Network, took place.

Qubit by qubit

Overall, the QT4HEP conference demonstrated the clear potential of different quantum technologies to impact upon particle-physics research. Some of these technologies are here today, while others are still a long way off. Targeted collaboration across disciplines and the academia–industry interface will help ensure that CERN’s research community is ready to maximise on the potential of these technologies.

“Widespread quantum computing may not be here yet, but events like this one provide a vital platform for assessing the opportunities this breakthrough technology could deliver for science,” said Enrica Porcari, head of the CERN IT department. “Through this event and the CERN QTI, we are building on CERN’s tradition of bringing communities together for open discussion, exploration, co-design and co-development of new technologies.”

Connecting the dots with neural networks https://cerncourier.com/a/connecting-the-dots-with-neural-networks/ Tue, 08 Nov 2022 13:51:59 +0000 https://preview-courier.web.cern.ch/?p=107068 "Deep Learning for Physics Research" balances theory and practice, physics and programming, and foundations and state-of-the-art deep-learning concepts.

Going deep

The use of deep learning in particle physics has exploded in recent years. Based on INSPIRE HEP’s database, the number of papers in high-energy physics and related fields referring to deep learning and similar topics has grown 10-fold over the last decade. A textbook introducing these concepts to physics students is therefore timely and valuable.

When teaching deep learning to physicists, it can be difficult to strike a balance between theory and practice, physics and programming, and foundations and state-of-the-art. Born out of a lecture series at RWTH Aachen and Hamburg universities, Deep Learning for Physics Research by Martin Erdmann, Jonas Glombitza, Gregor Kasieczka and Uwe Klemradt does an admirable job of striking this balance.

The book contains 21 chapters split across four parts: deep-learning basics, standard deep neural networks, interpretability and uncertainty quantification, and advanced concepts.

In part one, the authors cover introductory topics including physics data, neural-network building blocks, training and model building. Part two surveys and applies different neural-network structures, including fully connected, convolutional, recurrent and graph neural-networks, while also reviewing multi-task learning. Part three covers introspection, interpretability, uncertainty quantification, and revisits different objective functions for a variety of learning tasks. Finally, part four touches on weakly supervised and unsupervised learning methods, generative models, domain adaptation and anomaly detection. Helping to lower the barrier to entry for physics students to use deep learning in their work, the authors contextualise these methods in real physics-research studies, which is an added benefit compared to similar textbooks.

Deep learning borrows many concepts from physics, which can provide a way of connecting similar ideas in the two fields. A nice example explained in the book is the cross-entropy loss function, which has its origins in the definition of entropy according to Gibbs and Boltzmann. Another example that crops up, although rather late in part three, is the connection between the mean-squared-error loss function and the log-likelihood function for a Gaussian probability distribution, which may be more familiar to physics students accustomed to performing maximum likelihood fits.
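To spell out that second connection (a standard identity rather than a passage from the book): for a Gaussian model with fixed width σ, the negative log-likelihood of N predictions ŷ_i for targets y_i is

```latex
-\log L \;=\; \frac{1}{2\sigma^{2}}\sum_{i=1}^{N}\left(y_i-\hat{y}_i\right)^{2}
\;+\;\frac{N}{2}\log\!\left(2\pi\sigma^{2}\right)
```

so minimising the mean-squared error is exactly maximum-likelihood estimation under a Gaussian noise assumption.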

Hands-on

Accompanying the textbook is a breadth of free, online Jupyter notebooks (executable Python code in an interactive format), which are available at http://deeplearningphysics.org. These curated notebooks are paired with different chapters and immerse students in hands-on exercises. Both the problem and corresponding solution notebooks are available online, and are accessible to students even without expensive computing hardware as they can be launched on free cloud services such as Google Colab or Binder. In addition, students who have a CERN account can launch the notebooks on CERN’s service for web-based analysis (SWAN) platform.

Advanced exercises include the training and evaluation of a denoising autoencoder for speckle removal in X-ray images and a Wasserstein generative adversarial network for the generation of cosmic-ray-induced air-shower footprints. What is truly exciting about these exercises is their use of physics research examples, many taken from recent publications. Students can see how close their homework exercises and solutions are to cutting-edge research, which can be highly motivating.
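For readers who have not met the first of these exercises, the denoising-autoencoder idea can be sketched in a few lines of Keras; this toy version trains on random arrays standing in for the X-ray speckle images of the actual exercise, and the layer sizes are arbitrary choices rather than those used in the notebook.

```python
# Minimal denoising-autoencoder sketch, for orientation only: random arrays stand
# in for real images, and the architecture is an arbitrary illustrative choice.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

clean = np.random.rand(1000, 64).astype("float32")             # stand-in "images"
noisy = (clean + 0.1 * np.random.randn(1000, 64)).astype("float32")

model = keras.Sequential([
    layers.Input(shape=(64,)),
    layers.Dense(32, activation="relu"),                        # encoder
    layers.Dense(16, activation="relu"),                        # bottleneck
    layers.Dense(32, activation="relu"),                        # decoder
    layers.Dense(64, activation="sigmoid"),                     # reconstruction
])
model.compile(optimizer="adam", loss="mse")   # learn to map noisy inputs to clean targets
model.fit(noisy, clean, epochs=5, batch_size=32, verbose=0)
```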

In a book spanning less than 300 pages (excluding references), it is impossible to cover everything, especially as new deep-learning methods are developed almost daily. For a more theoretical understanding of the fundamentals of deep learning, readers are advised to consult the classic Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville, while for more recent deep-learning developments in particle physics they are directed to the article “A Living Review of Machine Learning for Particle Physics” by Matthew Feickert and Benjamin Nachman.

With continued interest in deep learning, coverage of a variety of real physics-research examples and a breadth of accessible, online exercises, Deep Learning in Physics Research is poised to be a standard textbook on the bookshelf of physics students for years to come.

Tracing molecules at the vacuum frontier https://cerncourier.com/a/tracing-molecules-at-the-vacuum-frontier/ Mon, 05 Sep 2022 09:11:50 +0000 https://preview-courier.web.cern.ch/?p=105801 The CERN-developed simulator “Molflow” has become the de-facto industry standard for ultra-high-vacuum simulations, with applications ranging from chip manufacturing to the exploration of the Martian surface.

Thermal-radiation calculation

In particle accelerators, large vacuum systems guarantee that the beams travel as freely as possible. Even at one 25-trillionth the density of Earth’s atmosphere, however, a tiny concentration of gas molecules remains. These residual molecules pose a problem: their collisions with accelerated particles reduce the beam lifetime and induce instabilities. It is therefore vital, from the early design stage, to plan efficient vacuum systems and predict residual pressure profiles.

Surprisingly, it is almost impossible to find commercial software that can carry out the underlying vacuum calculations. Since the background pressure in accelerators (of the order 10–9–10–12 mbar) is so low, molecules rarely collide with one other and thus the results of codes based on computational fluid dynamics aren’t valid. Although workarounds exist (solving vacuum equations analytically, modelling a vacuum system as an electrical circuit, or taking advantage of similarities between ultra-high-vacuum and thermal radiation), a CERN-developed simulator “Molflow”, for molecular flow, has become the de-facto industry standard for ultra-high-vacuum simulations.
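A rough estimate shows why. Assuming nitrogen molecules (kinetic diameter d ≈ 3.7 × 10⁻¹⁰ m) at room temperature and a pressure of 10⁻⁹ mbar (10⁻⁷ Pa) – representative numbers, not figures quoted in the article – the mean free path is

```latex
\lambda = \frac{k_{B}T}{\sqrt{2}\,\pi d^{2}p}
\approx \frac{(1.38\times10^{-23}\ \mathrm{J\,K^{-1}})(293\ \mathrm{K})}
             {\sqrt{2}\,\pi\,(3.7\times10^{-10}\ \mathrm{m})^{2}\,(10^{-7}\ \mathrm{Pa})}
\approx 7\times10^{4}\ \mathrm{m}
```

that is, tens of kilometres – vastly larger than any vacuum chamber – so molecules collide with the walls essentially always and with each other essentially never. This is the free-molecular regime that the test-particle approach exploits.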

Instead of trying to analytically solve the surprisingly difficult gas behaviour over a large system in one step, Molflow is based on the so-called test-particle Monte Carlo method. In a nutshell: if the geometry is known, a single test particle is created at a gas source and “bounced” through the system until it reaches a pump. Then, repeating this millions of times, with each bounce happening in a random direction, just like in the real world, the program can calculate the hit-density anywhere, from which the pressure is obtained.
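The idea can be illustrated with a toy version of such a test-particle Monte Carlo – not Molflow’s actual ray-tracing code – in which molecules enter a cylindrical tube, bounce off the wall with the diffuse cosine law, and are tallied when they leave through either end; the geometry and particle numbers are arbitrary.

```python
# Toy version of the test-particle Monte Carlo idea, not Molflow's ray-tracing code:
# molecules enter one end of a cylindrical tube, bounce off the wall following the
# diffuse cosine law, and are counted when they escape through either end.
import numpy as np

rng = np.random.default_rng(1)

def cosine_direction(normal):
    """Sample a unit vector distributed with the cosine law about `normal`."""
    u, phi = rng.random(), 2 * np.pi * rng.random()
    sin_t, cos_t = np.sqrt(u), np.sqrt(1 - u)
    helper = np.array([1.0, 0.0, 0.0]) if abs(normal[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    t1 = np.cross(normal, helper)
    t1 /= np.linalg.norm(t1)
    t2 = np.cross(normal, t1)
    return sin_t * np.cos(phi) * t1 + sin_t * np.sin(phi) * t2 + cos_t * normal

def transmission(radius=1.0, length=5.0, n_particles=20_000):
    reached_pump = 0
    for _ in range(n_particles):
        r, phi = radius * np.sqrt(rng.random()), 2 * np.pi * rng.random()
        pos = np.array([r * np.cos(phi), r * np.sin(phi), 0.0])   # entrance disc
        d = cosine_direction(np.array([0.0, 0.0, 1.0]))           # enter along +z
        while True:
            # Distances to the wall and to the two open ends along direction d.
            a, b = d[0] ** 2 + d[1] ** 2, 2 * (pos[0] * d[0] + pos[1] * d[1])
            c = pos[0] ** 2 + pos[1] ** 2 - radius ** 2
            t_wall = (-b + np.sqrt(max(b * b - 4 * a * c, 0.0))) / (2 * a) if a > 1e-12 else np.inf
            t_exit = (length - pos[2]) / d[2] if d[2] > 0 else np.inf
            t_back = -pos[2] / d[2] if d[2] < 0 else np.inf
            t = min(t_wall, t_exit, t_back)
            if t == t_exit:
                reached_pump += 1                    # escaped to the "pump" end
                break
            if t == t_back:
                break                                # bounced back to the gas source
            pos = pos + t * d                        # hit the wall: diffuse re-emission
            d = cosine_direction(np.array([-pos[0], -pos[1], 0.0]) / radius)
    return reached_pump / n_particles

print("transmission probability:", transmission())
```

The fraction reaching the far end plays the role of the tube’s transmission probability; the same bookkeeping over a full 3D geometry is what yields the hit densities from which pressure profiles are obtained.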

The idea for Molflow emerged in 1988 when the author (RK) visited CERN to discuss the design of the Elettra light source with CERN vacuum experts (see “From CERN to Elettra, ESRF, ITER and back” panel). Back then, few people could have foreseen the numerous applications outside particle physics that it would have. Today, Molflow is used in applications ranging from chip manufacturing to the exploration of the Martian surface, with more than 1000 users worldwide and many more downloads from the dedicated website.

Molflow in space 

While at CERN we naturally associate ultra-high vacuum with particle accelerators, there is another domain where operating pressures are extremely low: space. In 2017, after first meeting at a conference, a group from German satellite manufacturer OHB visited the CERN vacuum group, interested to see our chemistry lab and the cleaning process applied to vacuum components. We also demoed Molflow for vacuum simulations. It turned out that they were actively looking for a modelling tool that could simulate specific molecular-contamination transport phenomena for their satellites, since the industrial code they were using had very limited capabilities and was not open-source. 

Molflow has complemented NASA JPL codes to estimate the return flux during a series of planned fly-bys around Jupiter’s moon Europa

A high-quality, clean mirror for a space telescope, for example, must spend up to two weeks encapsulated in the closed fairing from launch until it is deployed in orbit. During this time, without careful prediction and mitigation, certain volatile compounds (such as adhesive used on heating elements) present within the spacecraft can find their way to and become deposited on optical elements, reducing their reflectivity and performance. It is therefore necessary to calculate the probability that molecules migrate from a certain location, through several bounces, and end up on optical components. Whereas this is straightforward when all simulation parameters are static, adding chemical processes and molecule accumulation on surfaces required custom development. Even though Molflow could not handle these processes “out of the box”, the OHB team was able to use it as a basis that could be built on, saving the effort of creating the graphical user interface and the ray-tracing parts from scratch. With the help of CERN’s knowledge-transfer team, a collaboration was established with the Technical University of Munich: a “fork” in the code was created; new physical processes specific to their application were added; and the code was also adapted to run on computer clusters. The work was made publicly available in 2018, when Molflow became open source.

From CERN to Elettra, ESRF, ITER and back

Molflow simulation

Molflow emerged in 1988 during a visit to CERN from its original author (RK), who was working at the Elettra light source in Trieste at the time. CERN vacuum expert Alberto Pace showed him a computer code written in Fortran that enabled the trajectories of particles to be calculated, via a technique called ray tracing. On returning to Trieste, and realising that the CERN code couldn’t be run there due to hardware and software incompatibilities, RK decided to rewrite it from scratch. Three years later the code was formally released. Once more, credit must be given to CERN for having been the birthplace of new ideas for other laboratories to develop their own applications.

Molflow was originally written in Turbo Pascal, had (black and white) graphics, and visualised geometries in 3D – even allowing basic geometry editing and pressure plots. While today such features are found in every simulator, at the time the code stood out and was used in the design of several accelerator facilities, including the Diamond Light Source, Spallation Neutron Source, Elettra, Alba and others – as well as for the analysis of a gas-jet experiment for the PANDA experiment at GSI Darmstadt. That said, the early code had its limitations. For example, the upper limit of user memory (640 kB for MS-DOS) significantly limited the number of polygons used to describe the geometry, and it was single-processor. 

In 2007 the original code was given a new lease of life at the European Synchrotron Radiation Facility in Grenoble, where RK had moved as head of the vacuum group. Ported to C++, multi-processor capability was added, which is particularly suitable for Monte Carlo calculations: if you have eight CPU cores, for example, you can trace eight molecules at the same time. OpenGL (Open Graphics Library) acceleration made the visualisation very fast even for large structures, allowing the usual camera controls of CAD editors to be added. Between 2009 and 2011 Molflow was used at ITER, again following its original author, for the design and analysis of vacuum components for the international tokamak project.

In 2012 the project was resumed at CERN, where RK had arrived the previous year. From here, the focus was on expanding the physics and applications: ray-tracing terms like “hit density” and “capture probability” were replaced with real-world quantities such as pressure and pumping speed. To publish the code within the group, a website was created with downloads, tutorial videos and a user forum. Later that year, a sister code “Synrad” for synchrotron-radiation calculations, also written in Trieste in the 1990s, was ported to the modern environment. The two codes could, for the first time, be used as a package: first, a synchrotron-radiation simulation could determine where light hits a vacuum chamber, then the results could be imported to a subsequent vacuum simulation to trace the gas desorbed from the chamber walls. This is the so-called photon-stimulated desorption effect, which is a major hindrance to many accelerators, including the LHC.

Molflow and Synrad have been downloaded more than 1000 times in the past year alone, and anonymous user metrics hint at around 500 users who launch it at least once per month. The code is used by far the most in China, followed by the US, Germany and Japan. Switzerland, including users at CERN, places only fifth. Since 2018, the roughly 35,000-line code has been available open-source and, although originally written for Windows, it is now available for other operating systems, including the new ARM-based Macs and several versions of Linux.

One year later, the Contamination Control Engineering (CCE) team from NASA’s Jet Propulsion Laboratory (JPL) in California reached out to CERN in the context of its three-stage Mars 2020 mission. The Mars 2020 Perseverance Rover, built to search for signs of ancient microbial life, successfully landed on the Martian surface in February 2021 and has collected and cached samples in sealed tubes. A second mission plans to retrieve the cache canister and launch it into Mars orbit, while a third would locate and capture the orbital sample and return it to Earth. Each spacecraft experiences and contributes to its own contamination environment through thruster operations, material outgassing and other processes. JPL’s CCE team performs the identification, quantification and mitigation of such contaminants, from the concept-generation to the end-of-mission phase. Key to this effort is the computational physics modelling of contaminant transport from materials outgassing, venting, leakage and thruster plume effects.

Contamination consists of two types: molecular (thin-film deposition effects) and particulate (producing obscuration, optical scatter, erosion or mechanical damage). Both can lead to degradation of optical properties and spurious chemical composition measurements. As more sensitive space missions are proposed and built – particularly those that aim to detect life – understanding and controlling outgassing properties requires novel approaches to operating thermal vacuum chambers. 

Just like accelerator components, most spacecraft hardware undergoes long-duration vacuum baking at relatively high temperatures to reduce outgassing. Outgassing rates are verified with quartz crystal microbalances (QCMs), rather than vacuum gauges as used at CERN. These probes measure the resonance frequency of oscillation, which is affected by the accumulation of adsorbed molecules, and are very sensitive: a 1 ng deposition on 1 cm2 of surface de-tunes the resonance frequency by 2 Hz. By performing free-molecular transport simulations in the vacuum-chamber test environment, measurements by the QCMs can be translated to outgassing rates of the sources, which are located some distance from the probes. For these calculations, JPL currently uses both Monte Carlo schemes (via Molflow) and “view factor matrix” calculations (through in-house solvers). During one successful Molflow application (see “Molflow in space” image, top) a vacuum chamber with a heated inner shroud was simulated, and optimisation of the chamber geometry resulted in a factor-40 increase of transmission to the QCMs over the baseline configuration. 
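With the sensitivity quoted above (2 Hz per ng/cm²), converting a measured frequency shift into an accumulated mass is a one-line calculation; the 10 Hz drift used here is an invented example, not a JPL measurement:

```latex
\Delta m = \frac{\Delta f}{S} = \frac{10\ \mathrm{Hz}}{2\ \mathrm{Hz\,ng^{-1}\,cm^{2}}}
= 5\ \mathrm{ng\,cm^{-2}}
```

If that shift builds up over an hour of testing, the deposition rate at the crystal is 5 ng cm⁻² h⁻¹, which the free-molecular transport simulation then maps back to the outgassing rate of the distant source.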

From SPHEREx to LISA

Another JPL project involving free molecular-flow simulations is the future near-infrared space observatory SPHEREx (Spectro-Photometer for the History of the Universe and Ices Explorer). This instrument has cryogenically cooled optical surfaces that may condense molecules in vacuum and are thus prone to significant performance degradation from the accumulation of contaminants, including water. Even when taking as much care as possible during the design and preparation of the systems, some elements, such as water, cannot be entirely removed from a spacecraft and will desorb from materials persistently. It is therefore vital to know where and how much contamination will accumulate. For SPHEREx, water outgassing, molecular transport and adsorption were modelled using Molflow against internal thermal predictions, enabling a decontamination strategy to keep its optics free from performance-degrading accumulation (see “Molflow in space” image, left). Molflow has also complemented other NASA JPL codes to estimate the return flux (whereby gas particles desorbing from a spacecraft return to it after collisions with a planet’s atmosphere) during a series of planned fly-bys around Jupiter’s moon Europa. For such exospheric sampling missions, it is important to distinguish the actual collected sample from return-flux contaminants that originated from the spacecraft but ended up being collected due to atmospheric rebounds.

Vacuum-chamber simulation for NASA

It is the ability to import large, complex geometries (through a triangulated file format called STL, used in 3D printing and supported by most CAD software) that makes Molflow usable for JPL’s molecular transport problems. In fact, the JPL team “boosted” our codes with external post-processing: instead of built-in visualisation, they parsed the output file format to extract pressure data on individual facets (polygons representing a surface cell), and sometimes even changed input parameters programmatically – once again working directly on Molflow’s own file format. They also made a few feature requests, such as adding histograms showing how many times molecules bounce before adsorption, or the total distance or time they travel before being adsorbed on the surfaces. These were straightforward to implement, and because JPL’s scientific interests also matched those of CERN users, such additions are now available for everyone in the public versions of the code. Similar requests have come from experiments employing short-lived radioactive beams, such as those generated at CERN’s ISOLDE beamlines. Last year, against all odds during COVID-related restrictions, the JPL team managed to visit CERN. While showing the team around the site and the chemistry laboratory, they held a seminar for our vacuum group about contamination control at JPL, and we showed the outlook for Molflow developments.

Our latest space-related collaboration, started in 2021, concerns the European Space Agency’s LISA mission, a future gravitational-wave interferometer in space (see CERN Courier September/October 2022 p51). Molflow is being used to analyse data from the recently completed LISA Pathfinder mission, which explored the feasibility of keeping two test masses in gravitational free-fall and using them as inertial sensors by measuring their motion with extreme precision. Because the satellite’s sides have different temperatures, and because the gas sources are asymmetric around the masses, there is a difference in outgassing between two sides. Moreover, the gas molecules that reach the test mass are slightly faster on one side than the other, resulting in a net force and torque acting on the mass, of the order of femtonewtons. When such precise inertial measurements are required, this phenomenon has to be quantified, along with other microscopic forces, such as Brownian noise resulting from the random bounces of molecules on the test mass. To this end, Molflow is currently being modified to add molecular force calculations for LISA, along with relevant physical quantities such as noise and resulting torque.

Sky’s the limit 

High-energy applications

Molflow has proven to be a versatile and effective computational physics model for the characterisation of free-molecular flow, having been adopted for use in space exploration and the aerospace sector. It promises to continue to intertwine different fields of science in unexpected ways. Thanks to the ever-growing gaming industry, which uses ray tracing to render photorealistic scenes of multiple light sources, consumer-grade graphics cards started supporting ray-tracing in 2019. Although intended for gaming, they are programmable for generic purposes, including science applications. Simulating on graphics-processing units is much faster than traditional CPUs, but it is also less precise: in the vacuum world, tiny imprecisions in the geometry can result in “leaks” or some simulated particles crossing internal walls. If this issue can be overcome, the speedup potential is huge. In-house testing carried out recently at CERN by PhD candidate Pascal Bahr demonstrated a speedup factor of up to 300 on entry-level Nvidia graphics cards, for example.

Our latest space-related collaboration concerns the European Space Agency’s LISA mission

Another planned Molflow feature is to include surface processes that change the simulation parameters dynamically. For example, some getter films gradually lose their pumping ability as they saturate with gas molecules. This saturation depends on the pumping speed itself, resulting in two parameters (pumping speed and molecular surface saturation) that depend on each other. The way around this is to perform the simulation in iterative time steps, which is straightforward to add but raises many numerical problems.
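A toy version of that iterative scheme – not Molflow’s planned implementation, and with invented numbers throughout – makes the mutual dependence explicit: the pumping speed is recomputed from the saturation at each small time step, and the gas pumped during that step in turn advances the saturation.

```python
# Toy model of the coupled update described above (not Molflow's implementation):
# a getter film whose pumping speed falls as it saturates, advanced in time steps.
S0 = 100.0        # initial pumping speed [l/s]                      (invented value)
Q = 1e-6          # constant gas load into the chamber [mbar.l/s]    (invented value)
capacity = 1e-3   # total gas quantity the film can absorb [mbar.l]  (invented value)
V = 50.0          # chamber volume [l]                               (invented value)

p, pumped, dt = 1e-8, 0.0, 0.1   # pressure [mbar], gas pumped so far [mbar.l], step [s]
for step in range(200_000):                       # 20 000 s of simulated time
    saturation = min(pumped / capacity, 1.0)
    S = S0 * (1.0 - saturation)                   # speed depends on saturation...
    pumped += S * p * dt                          # ...and pumping drives saturation
    p += (Q - S * p) * dt / V                     # simple volume balance for pressure

print(f"final pressure: {p:.2e} mbar, saturation: {saturation:.2f}")
```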

Finally, a much-requested feature is automation. The most recent versions of the code already allow scripting, that is, running batch jobs with physics parameters changed step-by-step between each execution. Extending these automation capabilities, and adding export formats that allow easier post-processing with common tools (Matlab, Excel and common Python libraries) would significantly increase usability. If adding GPU ray tracing and iterative simulations are successful, the resulting – much faster and more versatile – Molflow code will remain an important tool to predict and optimise the complex vacuum systems of future colliders.

Multidisciplinary CERN forum tackles AI https://cerncourier.com/a/multidisciplinary-cern-forum-tackles-ai/ Tue, 21 Dec 2021 11:07:04 +0000 https://preview-courier.web.cern.ch/?p=96719 The structure was designed to stimulate new insights, dialogue and collaboration between AI specialists, scientists, philosophers and ethicists.

Anima Anandkumar

The inaugural Sparks! Serendipity Forum attracted 49 leading computer scientists, policymakers and related experts to CERN from 17 to 18 September for a multidisciplinary science-innovation forum. In this first edition, participants discussed a range of ethical and technical issues related to artificial intelligence (AI), which has deep and developing importance for high-energy physics and its societal applications. The structure of the discussions was designed to stimulate interactions between AI specialists, scientists, philosophers, ethicists and other professionals with an interest in the subject, leading to new insights, dialogue and collaboration between participants.

World-leading cognitive psychologist Daniel Kahneman opened the public part of the event by discussing errors in human decision making, and their impact on AI. He explained that human decision making will always have bias, and therefore be “noisy” in his definition, and asked whether AI could be the solution, pointing out that AI algorithms might not be able to cope with the complexity of decisions that humans have to make. Others speculated as to whether AI could ever achieve the reproducibility of human cognition – and if the focus should shift from searching for a “missing link” to considering how AI research is actually conducted by making the process more regulated and transparent.

Introspective AI

Participants discussed both the advantages and challenges associated with designing introspective AI, which is capable of examining its own processes and could be beneficial in making predictions about the future. Participants also questioned, however, whether we should be trying to make AI more self-aware and human-like. Neuroscientist Ed Boyden explored introspection through the lens of neural pathways, and asked whether we can design introspective AI before we understand introspection in brains. Following the introspection theme, philosopher Luisa Damiano addressed the reality versus fiction of “social-embodied” AI – the idea of robots interacting with us in our physical world – arguing that such a possibility would require careful ethical considerations. 

AI is already a powerful, and growing, tool for particle physics

Many participants advocated developing so-called “strong” AI technology that can solve problems it has not come across before, in line with specific and targeted goals. Computer scientist Max Welling explored the potential for AI to exceed human intelligence, and suggested that AI can potentially be as creative as humans, although further research is required.

On the subject of ethics, Anja Kaspersen (former director of the UN Office for Disarmament Affairs) asked: who makes the rules? Linking to military, humanitarian and technological affairs, she considered how our experience in dealing with nuclear weapons could help us deal with the development of AI. She said that AI is prone to ethics washing: the process of creating an illusory sense that ethical issues are being appropriately addressed when they are not. Participants agreed that we should seek to avoid polarising the community when considering risks associated with current and future AI, and suggested a more open approach to deal with the challenges faced by AI today and tomorrow. Skype co-founder Jaan Tallinn identified AI as one of the most worrying existential risks facing society today; the fact that machines do not consider whether their decisions are unethical demands that we consider the constraints of the AI design space within the realm of decision making.

Fruits of labour

The initial outcomes of the Sparks! Serendipity Forum are being written up as a CERN Yellow Report, and at least one paper will be submitted to the journal Machine Learning Science and Technology. Time will tell what other fruits of the serendipitous interactions at Sparks! will bring. One thing is certain, however, AI is already a powerful, and growing, tool for particle physics. Without it, the LHC experiments’ analyses would have been much more tortuous, as discussed by Jennifer Ngadiuba and Maurizio Pierini (CERN Courier September/October 2021 p31)

Future editions of the Sparks! Serendipity Forum will tackle different themes in science and innovation that are relevant to CERN’s research. The 2022 event will be built around future health technologies, including the many accelerator, detector and simulation technologies that are offshoots of high-energy-physics research.

CERN unveils roadmap for quantum technology https://cerncourier.com/a/cern-unveils-roadmap-for-quantum-technology/ Thu, 04 Nov 2021 14:25:58 +0000 https://preview-courier.web.cern.ch/?p=96222 Exploring the potential of quantum information science and technologies for high-energy physics.

Quantum Technology Initiative

Launched one year ago, the CERN Quantum Technology Initiative (QTI) will see high-energy physicists and others play their part in a global effort to bring about the next “quantum revolution”, whereby phenomena such as superposition and entanglement are exploited to build novel computing, communication, sensing and simulation devices (CERN Courier September/October 2020 p47). 

On 14 October, the CERN QTI coordination team announced a strategy and roadmap to establish joint research, educational and training activities, set up a supporting resource infrastructure, and provide dedicated mechanisms for exchange of knowledge and technology. Oversight for the CERN QTI will be provided by a newly established advisory board composed of international experts nominated by CERN’s 23 Member States.

As an international, open and neutral platform, describes the roadmap document, CERN is uniquely positioned to act as an “honest broker” to facilitate cross-disciplinary discussions between CERN Member States and to foster innovative ideas in high-energy physics and beyond. This is underpinned by several R&D projects that are already under way at CERN across four main areas: quantum computing and algorithms; quantum theory and simulation; quantum sensing, metrology and materials; and quantum communication and networks. These projects target applications such as quantum-graph neural networks for track reconstruction, quantum support vector machines for particle classification, and quantum generative adversarial networks for physics simulation, as well as new sensors and materials for future detectors, and quantum-key-distribution protocols for distributed data analysis.

Education and training are also at the core of the CERN QTI. Building on the success of its first online course on quantum computing, the initiative plans to extend its academia–industry training programme to build competencies across different R&D and engineering activities for the new generation of scientists, from high-school students to senior researchers. 

Co-chairs of the CERN QTI advisory board, Kerstin Borras and Yasser Omar, stated: “The road map builds on high-quality research projects already ongoing at CERN, with top-level collaborations, to advance a vision and concrete steps to explore the potential of quantum information science and technologies for high-energy physics”.

Having the right connections is key https://cerncourier.com/a/having-the-right-connections-is-key/ Thu, 04 Nov 2021 14:06:52 +0000 https://preview-courier.web.cern.ch/?p=96265 SKAO director-general Philip Diamond describes how the world's largest radio telescope went from concept to construction.

Philip Diamond

Having led the SKAO for almost a decade, how did it feel to get the green light for construction in June this year?

The project has been a long time in gestation and I have invested much of my professional life in the SKA project. When the day came, I was 95% confident that the SKAO council would give us the green light to proceed, as we were still going through ratification processes in national parliaments. I sent a message to my senior team saying: “This is the most momentous week of my career” because of the collective effort of so many people in the observatory and across the entire partnership over so many years. It was a great feeling, even if we couldn’t celebrate properly because of the pandemic.

What will the SKA telescopes do that previous radio telescopes couldn’t?

The game changer is the sheer size of the facility. Initially, we’re building 131,072 low-frequency antennas in Western Australia (“SKA-Low”) and 197 15 m-class dishes in South Africa (“SKA-Mid”). This will provide us with up to a factor of 10 improvement in our ability to see fainter details in the universe. The long-term SKA vision will increase the sensitivity by a further factor of 10. We’ve got many science areas, but two are going to be unique to us. One is the ability to detect hydrogen all the way back to the epoch of reionisation, also called the “cosmic-dawn”. The frequency range that we cover, combined with the large collecting area and the sensitivity of the two radio telescopes, will allow us to make a “movie” of the universe evolving from a few hundred million years after the Big Bang to the present day. We probably won’t see the first stars but will see the effect of the first stars, and we may see some of the first galaxies and black holes. 

We put a lot of effort into conveying the societal impact of the SKA

The second key science goal is the study of pulsars, especially millisecond pulsars, which emit radio pulses extremely regularly, giving astronomers superb natural clocks in the sky. The SKA will be able to detect every pulsar that can be detected on Earth (at least every pulsar that is pointing in our direction and within the ~70% of the sky visible by the SKA). Pulsars will be used as a proxy to detect and study gravitational waves from extreme phenomena. For instance, when there’s a massive galaxy merger that generates gravitational waves, we will be able to detect the passage of the waves through a change in the pulse arrival times. The SKA telescopes will be a natural extension of existing pulsar-timing arrays, and will be working as a network but also individually.

Another goal is to better understand the influence of dark matter on galaxies and how the universe evolves, and we will also be able to address questions regarding the nature of neutrinos through cosmological studies. 

How big is the expected SKA dataset, and how will it be managed? 

It depends where you look in the data stream, because the digital signal processing systems will be reducing the data volume as much as possible. Raw data coming out of SKA-Low will be 2 Pb per second – dramatically exceeding the entire internet data rate. That data goes from our fibre network into data processing, all on-site, with electronics heavily shielded to protect the telescopes from interference. Coming out from there, it’s about 5 Tb of data per second being transferred to supercomputing facilities off-site, which is pretty much equivalent to the output generated by SKA-Mid in South Africa. From that point the data will flow into supercomputers for on-the-fly calibration and data processing, emerging as “science-ready” data. It all flows into what we call the SKA Regional Centre network, basically supercomputers dotted around the globe, very much like that used in the Worldwide LHC Computing Grid. By piping the data out to a network of regional centres at a rate of 100 Gb per second, we are going to see around 350 Pb per year of science data from each telescope. 
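A back-of-the-envelope check (reading the annual figure as petabytes; this aside is editorial, not part of Diamond’s answer) shows the quoted numbers hang together:

```latex
100\ \mathrm{Gb\,s^{-1}} \times 3.15\times10^{7}\ \mathrm{s}
\approx 3.2\times10^{18}\ \mathrm{bits} \approx 3.9\times10^{2}\ \mathrm{PB}
```

so a 100 Gb/s link running for most of the year delivers of order 350 PB per telescope to the regional centres.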

And you’ve been collaborating with CERN on the SKA data challenge?

Very much so. We signed a memorandum of understanding three years ago, essentially to learn how CERN distributes its data and how its processing systems work. There are things we were able to share too, as the SKA will have to process a larger amount of data than even the High-Luminosity LHC will produce. Recently we have entered into a further, broader collaboration with CERN, GÉANT and PRACE [the Partnership for Advanced Computing in Europe] to look at the collaborative use of supercomputer centres in Europe.

SKAO’s organisational model also appears to have much in common with CERN’s?

If you were to look at the text of our treaty you would see its antecedents in those of CERN and ESO (the European Southern Observatory). We are an intergovernmental organisation with a treaty and a convention signed in Rome in March 2019. Right now, we’ve got seven members who have ratified the convention, which was enough for us to kick-off the observatory, and we’ve got countries like France, Spain and Switzerland on the road to accession. Other countries like India, Sweden, Canada and Germany are also following their internal processes and we expect them to join the observatory as full members in the months to come; Japan and South Korea are observers on the SKAO council at this stage. Unlike CERN, we don’t link member contributions directly to gross domestic product (GDP) – one reason being the huge disparity in GDP amongst our member states. We looked at a number of models and none of them were satisfactory, so in the end we invented something that we use as a starting point for negotiation and that’s a proxy for the scientific capacity within countries. It’s actually the number of scientists that an individual country has who are members of the International Astronomical Union. For most of our members it correlates pretty well with GDP. 

Is there a sufficient volume of contracts for industries across the participating nations? 

Absolutely. The SKA antennas, dishes and front-ends are essentially evolutions of existing designs. It’s the digital hardware and especially the software where there are huge innovations with the SKA. We have started a contracting process with every country and they’re guaranteed to get at least 70% of their investment in the construction funds back. The SKAO budget for the first 10 years – which includes the construction of the telescopes, the salaries of observatory staff and the start of first operations – is €2 billion. The actual telescope itself costs around €1.2 billion. 

Why did it take 30 years for the SKA project to be approved?

Back in the late 1980s/early 1990s, radio astronomers were looking ahead to the next big questions. The first mention of what we call the SKA was at a conference in Albuquerque, New Mexico, celebrating the 10th anniversary of the Very Large Array, which is still a state-of-the-art radio telescope. A colleague pulled together discussions and wrote a paper proposing the “Hydrogen Array”. It was clear we would need approximately one square kilometre of collecting area, which meant there had to be a lot of innovation in the telescopes to keep things affordable. A lot of the early design work was funded by the European Commission and we formed an international steering committee to coordinate the effort. But it wasn’t until 2011 that the SKA Organisation was formed, allowing us to go out and raise the money, put the organisational structure in place, confirm the locations, formalise the detailed design and then go and build the telescopes. There was a lot of exploration surrounding the details of the intergovernmental organisation – at one point we were discussing joining ESO. 

Building the SKA 10 years earlier would have been extremely difficult, however. One reason is that we would have missed out on the big-data technology and innovation revolution. Another relates to the cost of power in these remote regions: SKA’s Western Australia site is 200 km from the nearest power grid, so we are powering things with photovoltaics and batteries, the cost of which has dropped dramatically in the past five years.

What are the key ingredients for the successful management of large science projects?

One has to have a diplomatic manner. We’ve got 16 countries involved all the way from China to Canada and in both hemispheres, and you have to work closely with colleagues and diverse people all the way up to ministerial level. Being sure the connections with the government are solid and having the right connections are key. We also put a lot of effort into conveying the societal impact of the SKA. Just as CERN invented the web, Wi-Fi came out of radio astronomy, as did a lot of medical imaging technology, and we have been working hard to identify future knowledge-transfer areas.

SKA-MPI

It also would have been much harder if I did not have a radio-astronomy background, because a lot of what I had to do in the early days was to rely on a network of radio-astronomy contacts around the world to sign up for the SKA and to lobby their governments. While I have no immediate plans to step aside, I think 10 or 12 years is a healthy period for a senior role. When the SKAO council begins the search for my successor, I do hope they recognise the need to have at least an astronomer, if not a radio astronomer.

I look at science as an interlinked ecosystem

Finally, it is critical to have the right team, because projects like this are too large to keep in one person’s head. The team I have is the best I’ve ever worked with. It’s a fantastic effort to make all this a reality.

What are the long-term operational plans for the SKA?

The SKA is expected to operate for around 50 years, and our science case is built around this long-term aspiration. In our first phase, whose construction has started and should end in 2028/2029, we will have just under 200 dishes in South Africa, whereas we’d like to have potentially up to 2500 dishes there at the appropriate time. Similarly, in Western Australia we have a goal of up to a million low-frequency antennas, eight times the size of what we’re building now. Fifty years is somewhat arbitrary, and there are not yet any funded plans for such an expansion, but the dishes and antennas themselves will easily last for that time. The electronics are a different matter. That’s why the Lovell Telescope, which I can see outside my window here at SKAO HQ, is still an active science instrument after 65 years: the electronics inside are state of the art. In terms of its collecting area, it is still the third largest steerable dish on Earth!

How do you see the future of big science more generally?

If there is a bright side to the COVID-19 pandemic, it is that it has forced governments to recognise how critical science and expert knowledge are to survival, and hopefully that has translated into more realism regarding climate change, for example. I look at science as an interlinked ecosystem: the hard sciences like physics build infrastructures designed to answer fundamental questions and produce technological impact, but they also train science graduates who enter other areas. The SKAO governments recognise the benefits of what South African colleagues call human capital development: that scientists and engineers who are inspired by and develop through these big projects will diffuse into industry and impact other areas of society. My experience of the senior civil servants I have come across tells me that they understand this link.

The post Having the right connections is key appeared first on CERN Courier.

]]>
Opinion SKAO director-general Philip Diamond describes how the world's largest radio telescope went from concept to construction. https://cerncourier.com/wp-content/uploads/2021/11/CCNovDec21_INT_Diamond_feature.jpg
Making complexity irrelevant https://cerncourier.com/a/making-complexity-irrelevant/ Thu, 04 Nov 2021 14:02:22 +0000 https://preview-courier.web.cern.ch/?p=96271 Headed by two ATLAS physicists, gluoNNet applies data-mining and machine-learning techniques to benefit wider society.

The post Making complexity irrelevant appeared first on CERN Courier.

]]>
One day’s worth of flight data

Describing itself as a big-data graph-analytics start-up, gluoNNet seeks to bring data analysis from CERN into “real-life” applications. Just two years old, the 12-strong firm based in Geneva and London has already aided clients with decision making by simplifying publicly available datasets. With studies predicting that in three to four years almost 80% of data and analytics innovations may come from graph technologies, the physicist-led team aims to be the “R&D department” for medium-sized companies and help them evaluate massive volumes of data in a matter of minutes.

gluoNNet co-founder and president Daniel Dobos, an honorary researcher at Lancaster University, first joined CERN in 2002, focusing on diamond and silicon detectors for the ATLAS experiment. A passion to share technology with a wider audience soon led him to collaborate with organisations and institutes outside the field. In 2016 he became head of foresight and futures for the United Nations-hosted Global Humanitarian Lab, which strives to bring up-to-date technology to countries across the world. He and co-founder and fellow ATLAS collaborator Karolos Potamianos, an Ernest Rutherford Fellow at the University of Oxford, have been collaborating on non-physics projects since 2014. An example is THE Port Association, which organises in-person and online events together with CERN IdeaSquare and other partners, including “humanitarian hackathons”.

CERN’s understanding of big data is different to others’

Daniel Dobos

gluoNNet was a natural next step to bring data analysis from high-energy physics into broader applications. It began as a non-profit, with most work being non-commercial and helping non-governmental organisations (NGOs). Working with UNICEF, for example, gluoNNet tracked countries’ financial transactions related to fighting child violence to see if governments were standing by their commitments. “Our analysis even made one country – which was already one of the top donors – double their contribution, after being embarrassed by how little was actually being spent,” says Dobos.

But Dobos was quick to realise that for gluoNNet to become sustainable it had to incorporate, which it did in 2020. “We wanted to take on jobs that were more impactful, however they were also more expensive.” A second base was then added in the UK, which enabled more ambitious projects to be taken on.

Tracking flights

One project arose from an encounter at CERN IdeaSquare. The former head of security of a major European airline had visited CERN and noticed the particle-tracking technology as well as the international and collaborative environment; he believed something similar was needed in the aviation industry. During the visit a lively discussion about the similarities between data in aviation and particle tracking emerged. This person later became a part of the Civil Aviation Administration of Kazakhstan, which gluoNNet now works with to create a holistic overview of global air traffic (see image above). “We were looking for regulatory, safety and ecological misbehaviour, and trying to find out why some airplanes are spending more time in the air than they were expected to,” says Kristiane Novotny, a theoretical physicist who wrote her PhD thesis at CERN and is now a lead data scientist at gluoNNet. “If we can find out why, we can help reduce flight times, and therefore reduce carbon-dioxide emissions due to shorter flights.” 

gluoNNet’s data-mining and machine-learning algorithms draw on experience acquired at CERN in processing enormous amounts of data, and benefit from the same attitude, explains Dobos. “CERN’s understanding of big data is different to others’. For some companies, what doesn’t fit in an Excel sheet is considered ‘big data’, whereas at CERN this is miniscule.” Therefore, it is no accident that most in the team are CERN alumni. “We need people who have the CERN spirit,” he states. “If you tell people at CERN that we want to get to Mars by tomorrow, they will get on and think about how to get there, rather than shutting down the idea.”

Though it’s still early days for gluoNNet, the team is undertaking R&D to take things to the next level. Working with CERN openlab and the Middle East Technical University’s Application and Research Center for Space and Accelerator Technologies, for example, gluoNNet is exploring the application of quantum-computing algorithms (namely quantum-graph neural networks) for particle-track reconstruction, as well as industrial applications, such as the analysis of aviation data. Another R&D effort, which originated at the Pan European Quantum Internet Hackathon 2019, aims to make use of quantum key distribution to achieve a secure VPN (virtual private network) connection. 

One of gluoNNet’s main future projects is a platform that can provide an interconnected system for analysts and decision makers at companies. The platform would allow large amounts of data to be uploaded and presented clearly, with Dobos explaining, “Companies have meetings with data analysts back and forth for weeks on decisions; this could be a place that shortens these decisions to minutes. Large technology companies start to put these platforms in place, but they are out of reach for small and medium sized companies that can’t develop such frameworks internally.”

The vast amounts of data we have available today hold invaluable insights for governments, companies, NGOs and individuals, says Potamianos. “Most of the time only a fraction of the actual information is considered, missing out on relationships, dynamics and intricacies that data could reveal. With gluoNNet, we aim to help stakeholders that don’t have in-house expertise in advanced data processing and visualisation technologies to get insights from their data, making its complexity irrelevant to decision makers.”

The post Making complexity irrelevant appeared first on CERN Courier.

]]>
Careers Headed by two ATLAS physicists, gluoNNet applies data-mining and machine-learning techniques to benefit wider society. https://cerncourier.com/wp-content/uploads/2021/11/CCNovDec21_CAREERS_gluonet.jpg
Learning to detect new top-quark interactions https://cerncourier.com/a/learning-to-detect-new-top-quark-interactions/ Mon, 04 Oct 2021 07:29:50 +0000 https://preview-courier.web.cern.ch/?p=93687 A new CMS analysis searches for anomalies in top-quark interactions with the Z boson using an effective-field-theory framework.

The post Learning to detect new top-quark interactions appeared first on CERN Courier.

]]>
Figure 1

Ever since its discovery in 1995 at the Tevatron, the top quark has been considered to be a highly effective probe of new physics. A key reason is that the last fundamental fermion predicted by the Standard Model (SM) has a remarkably high mass, just a sliver under the Higgs vacuum expectation value divided by the square root of two, implying a Yukawa coupling close to unity. This has far-reaching implications: the top quark impacts the electroweak sector significantly through loop corrections, and may couple preferentially to new massive states. But while the top quark may represent a window into new physics, we cannot know a priori whether new massive particles could ever be produced at the LHC, and direct searches have so far been inconclusive. Model-independent measurements carried out within the framework of effective field theory (EFT) are therefore becoming increasingly important as a means to make the most of the wealth of precision measurements at the LHC. This approach makes it possible to systematically correlate sparse deviations observed in different measurements, in order to pinpoint any anomalies in top-quark couplings that might arise from unknown massive particles.
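Numerically, using the standard tree-level relation y_t = \sqrt{2}\, m_t / v with m_t ≈ 172.5 GeV and the Higgs vacuum expectation value v ≈ 246 GeV, one finds y_t ≈ \sqrt{2} \times 172.5 / 246 ≈ 0.99 – a Yukawa coupling remarkably close to unity.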

The top quark impacts the electroweak sector significantly through loop corrections

A new CMS analysis searches for anomalies in top-quark interactions with the Z boson using an EFT framework. The cross-section measurements of the rare associated production of either one (tZ) or two (ttZ) top quarks with a Z boson were statistically limited until recently. These interactions are among the least constrained by the available data in the top-quark sector, despite being modified in numerous beyond-SM models, such as composite Higgs models and minimal supersymmetry. Using the full LHC Run-2 data set, this study targets high-purity final states with multiple electrons and muons. It sets some of the tightest constraints to date on five generic types of EFT interactions that could substantially modify the characteristics of associated top-Z production, while having negligible or no effect on background processes.

Machine learning

In contrast to the more usual reinterpretations of SM measurements that require assumptions on the nature of new physics, this analysis considers EFT effects on observables at the detector level and constrains them directly from the data using a strategy that combines observables specifically selected for their sensitivity to EFT. The key feature of this work is its heavy use of multivariate-analysis techniques based on machine learning, which improve its sensitivity to new interactions. First, to define regions enriched in the processes of interest, a multiclass neural network is trained to discriminate between different SM processes. Subsequently, several binary neural networks learn to separate events generated according to the SM from events that include EFT effects arising from one or more types of anomalous interactions. For the first time in an analysis using LHC data, these classifiers were trained on the full physical amplitudes, including the interference between SM and EFT components.
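As a rough illustration of that second step, the sketch below trains a binary network on a generic table of per-event kinematic features with labels distinguishing SM-only from SM+EFT simulation. Everything here – the feature count, the stand-in data and the network size – is an illustrative assumption rather than the configuration used by CMS.

```python
# Illustrative sketch only: a binary classifier separating SM-only events
# from events simulated with EFT contributions. The data are random
# placeholders; in a real analysis they would be simulated event features.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

n_features = 12                                                  # assumed
x_train = np.random.rand(10000, n_features).astype("float32")    # stand-in kinematics
y_train = np.random.randint(0, 2, size=10000)                    # 0 = SM, 1 = SM + EFT

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # output: EFT-likeness score in [0, 1]
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=10, batch_size=256, verbose=0)

# Distributions of this score are what get binned and fitted to data
# to constrain the EFT couplings.
scores = model.predict(x_train, verbose=0).ravel()
```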

The binary classifiers are used to construct powerful discriminant variables out of high-dimensional input data. Their distributions are fitted to data to constrain up to five types of EFT couplings simultaneously. The widths of the corresponding confidence intervals are significantly reduced thanks to the combination of the available kinematic information that was specifically chosen to be sensitive to EFT in the top quark sector. All results are consistent with the SM, which indicates either the absence of new effects in the targeted interactions or that the mass scale of new physics is too high to be probed with the current sensitivity. This result is an important step towards the more widespread use of machine learning to target EFT effects, to efficiently explore the enormous volume of LHC data more globally and comprehensively.

The post Learning to detect new top-quark interactions appeared first on CERN Courier.

]]>
News A new CMS analysis searches for anomalies in top-quark interactions with the Z boson using an effective-field-theory framework. https://cerncourier.com/wp-content/uploads/2021/05/CMS-muon-191.jpg
Emergence https://cerncourier.com/a/emergence/ Thu, 02 Sep 2021 09:54:39 +0000 https://preview-courier.web.cern.ch/?p=93632 Erik Verlinde sizes up the Standard Model, gravity and intelligence as candidates for future explanation as emergent phenomena.

The post Emergence appeared first on CERN Courier.

]]>
A murmuration of starlings

Particle physics is at its heart a reductionistic endeavour that tries to reduce reality to its most basic building blocks. This view of nature is most evident in the search for a theory of everything – an idea that is nowadays more common in popularisations of physics than among physicists themselves. If discovered, all physical phenomena would follow from the application of its fundamental laws.

A complementary perspective to reductionism is that of emergence. Emergence says that new and different kinds of phenomena arise in large and complex systems, and that these phenomena may be impossible, or at least very hard, to derive from the laws that govern their basic constituents. It deals with properties of a macroscopic system that have no meaning at the level of its microscopic building blocks. Good examples are the wetness of water and the superconductivity of an alloy. These concepts don’t exist at the level of individual atoms or molecules, and are very difficult to derive from the microscopic laws. 

As physicists continue to search for cracks in the Standard Model (SM) and Einstein’s general theory of relativity, could these natural laws in fact be emergent from a deeper reality? And emergence is not limited to the world of the very small, but by its very nature skips across orders of magnitude in scale. It is even evident, often mesmerisingly so, at scales much larger than atoms or elementary particles, for example in the murmurations of a flock of birds – a phenomenon that is impossible to describe by following the motion of an individual bird. Another striking example may be intelligence. The mechanism by which artificial intelligence is beginning to emerge from the complexity of underlying computing codes shows similarities with emergent phenomena in physics. One can argue that intelligence, whether it occurs naturally, as in humans, or artificially, should also be viewed as an emergent phenomenon. 

Data compression

Renormalisable quantum field theory, the foundation of the SM, works extraordinarily well. The same is true of general relativity. How can our best theories of nature be so successful, while at the same time being merely emergent? Perhaps these theories are so successful precisely because they are emergent. 

As a warm up, let’s consider the laws of thermodynamics, which emerge from the microscopic motion of many molecules. These laws are not fundamental but are derived by statistical averaging – a huge data compression in which the individual motions of the microscopic particles are compressed into just a few macroscopic quantities such as temperature. As a result, the laws of thermodynamics are universal and independent of the details of the microscopic theory. This is true of all the most successful emergent theories; they describe universal macroscopic phenomena whose underlying microscopic descriptions may be very different. For instance, two physical systems that undergo a second-order phase transition, while being very different microscopically, often obey exactly the same scaling laws, and are at the critical point described by the same emergent theory. In other words, an emergent theory can often be derived from a large universality class of many underlying microscopic theories. 

Successful emergent theories describe universal macroscopic phenomena whose underlying microscopic descriptions may be very different

Entropy is a key concept here. Suppose that you try to store the microscopic data associated with the motion of some particles on a computer. If we need N bits to store all that information, we have 2^N possible microscopic states. The entropy equals the logarithm of this number, and essentially counts the number of bits of information. Entropy is therefore a measure of the total amount of data that has been compressed. In deriving the laws of thermodynamics, you throw away a large amount of microscopic data, but you at least keep count of how much information has been removed in the data-compression procedure.
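In formulas: with N bits there are Ω = 2^N microstates, so the entropy is

S = k_B \ln \Omega = k_B \ln 2^N = N\, k_B \ln 2,

which, measured in units of k_B ln 2, is simply the number of bits N.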

Emergent quantum field theory

One of the great theoretical-physics paradigm shifts of the 20th century occurred when Kenneth Wilson explained the emergence of quantum field theory through the application of the renormalisation group. As with thermodynamics, renormalisation compresses microscopic data into a few relevant parameters – in this case, the fields and interactions of the emergent quantum field theory. Wilson demonstrated that quantum field theories appear naturally as an effective long-distance and low-energy description of systems whose microscopic definition is given in terms of a quantum system living on a discretised spacetime. As a concrete example, consider quantum spins on a lattice. Here, renormalisation amounts to replacing the lattice by a coarser lattice with fewer points, and redefining the spins to be the average of the original spins. One then rescales the coarser lattice so that the distance between lattice points takes the old value, and repeats this step many times. A key insight was that, for quantum statistical systems that are close to a phase transition, you can take a continuum limit in which the expectation values of the spins turn into the local quantum fields on the continuum spacetime.
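The block-spin step described above is easy to write down explicitly. The toy sketch below (an illustration, not a production renormalisation-group code) coarse-grains a two-dimensional lattice of spins by averaging over 2 × 2 blocks, and can be iterated layer by layer.

```python
# Toy block-spin renormalisation step: replace each 2x2 block of spins by
# its average, halving the lattice in each direction. Repeating the step
# builds successive coarse-grained "layers" of the original configuration.
import numpy as np

rng = np.random.default_rng(0)
L = 64
spins = rng.choice([-1.0, 1.0], size=(L, L))   # random spin configuration

def block_spin(s, b=2):
    """One coarse-graining step: average spins over b x b blocks."""
    n = s.shape[0] // b
    return s.reshape(n, b, n, b).mean(axis=(1, 3))

coarse = block_spin(spins)      # 32 x 32 lattice of block-averaged spins
coarser = block_spin(coarse)    # 16 x 16 after a second step
print(spins.shape, coarse.shape, coarser.shape)
```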

This procedure is analogous to the compression algorithms used in machine learning. Each renormalisation step creates a new layer, and the algorithm that is applied between two layers amounts to a form of data compression. The goal is similar: you only keep the information that is required to describe the long-distance and low-energy behaviour of the system in the most efficient way.

A neural network

So quantum field theory can be seen as an effective emergent description of one of a large universality class of many possible underlying microscopic theories. But what about the SM specifically, and its possible supersymmetric extensions? Gauge fields are central ingredients of the SM and its extensions. Could gauge symmetries and their associated forces emerge from a microscopic description in which there are no gauge fields? Similar questions can also be asked about the gravitational force. Could the curvature of spacetime be explained from an emergent perspective?

String theory seems to indicate that this is indeed possible, at least theoretically. While initially formulated in terms of vibrating strings moving in space and time, it became clear in the 1990s that string theory also contains many more extended objects, known as “branes”. By studying the interplay between branes and strings, an even more microscopic theoretical description was found in which the coordinates of space and time themselves start to dissolve: instead of being described by real numbers, our familiar (x, y, z) coordinates are replaced by non-commuting matrices. At low energies, these matrices begin to commute, and give rise to the normal spacetime with which we are familiar. In these theoretical models it was found that both gauge forces and gravitational forces appear at low energies, while not existing at the microscopic level.

While these models show that it is theoretically possible for gauge forces to emerge, there is at present no emergent theory of the SM. Such a theory seems to be well beyond us. Gravity, however, being universal, has been more amenable to emergence.

Emergent gravity

In the early 1970s, a group of physicists became interested in the question: what happens to the entropy of a thermodynamic system that is dropped into a black hole? The surprising conclusion was that black holes have a temperature and an entropy, and behave exactly like thermodynamic systems. In particular, they obey the first law of thermodynamics: when the mass of a black hole increases, its (Bekenstein–Hawking) entropy also increases.

The correspondence between the gravitational laws and the laws of thermodynamics does not only hold near black holes. You can artificially create a gravitational field by accelerating. For an observer who continues to accelerate, even empty space develops a horizon, from behind which light rays will not be able to catch up. These horizons also carry a temperature and entropy, and obey the same thermodynamic laws as black-hole horizons. 

It was shown by Stephen Hawking that the thermal radiation emitted from a black hole originates from pair creation near the black-hole horizon. The properties of the pair of particles, such as spin and charge, are undetermined due to quantum uncertainty, but if one particle has spin up (or positive charge), then the other particle must have spin down (or negative charge). This means that the particles are quantum entangled. Quantum entangled pairs can also be found in flat space by considering accelerated observers. 

Crucially, even the vacuum can be entangled. By separating spacetime into two parts, you can ask how much entanglement there is between the two sides. The answer to this was found in the last decade, through the work of many theorists, and turns out to be rather surprising. If you consider two regions of space that are separated by a two-dimensional surface, the amount of quantum entanglement between the two sides turns out to be precisely given by the Bekenstein–Hawking entropy formula: it is equal to a quarter of the area of the surface measured in Planck units. 
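Written out, this is the Bekenstein–Hawking formula,

S = \frac{A}{4\,\ell_P^2} = \frac{k_B c^3 A}{4 G \hbar},

where A is the area of the surface separating the two regions and ℓ_P = \sqrt{G\hbar/c^3} is the Planck length.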

Holographic renormalisation

The area of the event horizon

The AdS/CFT correspondence incorporates a principle called “holography”: the gravitational physics inside a region of space emerges from a microscopic description that, just like a hologram, lives on a space with one less dimension and thus can be viewed as living on the boundary of the spacetime region. The extra dimension of space emerges together with the gravitational force through a process called “holographic renormalisation”. One successively adds new layers of spacetime. Each layer is obtained from the previous layer through “coarse-graining”, in a similar way to both renormalisation in quantum field theory and data-compression algorithms in machine learning.

Unfortunately, our universe is not described by a negatively curved spacetime. It is much closer to a so-called de Sitter spacetime, which has a positive curvature. The main difference between de Sitter space and the negatively curved anti-de Sitter space is that de Sitter space does not have a boundary. Instead, it has a cosmological horizon whose size is determined by the rate of the Hubble expansion. One proposed explanation for this qualitative difference is that, unlike for negatively curved spacetimes, the microscopic quantum state of our universe is not unique, but secretly carries a lot of quantum information. The amount of this quantum information can once again be counted by an entropy: the Bekenstein–Hawking entropy associated with the cosmological horizon. 

This raises an interesting prospect: if the microscopic quantum data of our universe may be thought of as many entangled qubits, could our current theories of spacetime, particles and forces emerge via data compression? Space, for example, could emerge by forgetting the precise way in which all the individual qubits are entangled, but only preserving the information about the amount of quantum entanglement present in the microscopic quantum state. This compressed information would then be stored in the form of the areas of certain surfaces inside the emergent curved spacetime. 

In this description, gravity would follow for free, expressed in the curvature of this emergent spacetime. What is not immediately clear is why the curved spacetime would obey the Einstein equations. As Einstein showed, the amount of curvature in spacetime is determined by the amount of energy (or mass) that is present. It can be shown that his equations are precisely equivalent to an application of the first law of thermodynamics. The presence of mass or energy changes the amount of entanglement, and hence the area of the surfaces in spacetime. This change in area can be computed and precisely leads to the same spacetime curvature that follows from the Einstein equations. 

The idea that gravity emerges from quantum entanglement goes back to the 1990s, and was first proposed by Ted Jacobson. Not long afterwards, Juan Maldacena discovered that general relativity can be derived from an underlying microscopic quantum theory without a gravitational force. His description only works for infinite spacetimes with negative curvature called anti-de Sitter (or AdS) space, as opposed to the positive curvature we measure. The microscopic description then takes the form of a scale-invariant quantum field theory – a so-called conformal field theory (CFT) – that lives on the boundary of the AdS space (see “Holographic renormalisation” panel). It is in this context that the connection between vacuum entanglement and the Bekenstein–Hawking entropy, and the derivation of the Einstein equations from entanglement, are best understood. I have also contributed to these developments in a paper in 2010 that emphasised the role of entropy and information for the emergence of the gravitational force. Over the last decade a lot of progress has been made in our understanding of these connections, in particular the deep connection between gravity and quantum entanglement. Quantum information has taken centre stage in the most recent theoretical developments.

Emergent intelligence

But what about viewing the even more complex problem of human intelligence as an emergent phenomenon? Since scientific knowledge is condensed and stored in our current theories of nature, the process of theory formation can itself be viewed as a very efficient form of data compression: it only keeps the information needed to make predictions about reproducible events. Our theories provide us with a way to make predictions with the fewest possible number of free parameters. 

The same principles apply in machine learning. The way an artificial-intelligence machine is able to predict whether an image represents a dog or a cat is by compressing the microscopic data stored in individual pixels in the most efficient way. This decision cannot be made at the level of individual pixels. Only after the data has been compressed and reduced to its essence does it become clear what the picture represents. In this sense, the dog/cat-ness of a picture is an emergent property. This is even true for the way humans process the data collected by our senses. It seems easy to tell whether we are seeing or hearing a dog or a cat, but underneath, and hidden from our conscious mind, our brains perform a very complicated task that turns all the neural data that come from our eyes and ears into a signal that is compressed into a single outcome: it is a dog or a cat.

Emergence is often summarised with the slogan “the whole is more than the sum of its parts”

Can intelligence, whether artificial or human, be explained from a reductionist point of view? Or is it an emergent concept that only appears when we consider a complex system built out of many basic constituents? There are arguments in favour of both sides. As human beings, our brains are hard-wired to observe, learn, analyse and solve problems. To achieve these goals the brain takes the large amount of complex data received via our senses and reduces it to a very small set of information that is most relevant for our purposes. This capacity for efficient data compression may indeed be a good definition for intelligence, when it is linked to making decisions towards reaching a certain goal. Intelligence defined in this way is exhibited in humans, but can also be achieved artificially.

Artificially intelligent computers beat us at problem solving, pattern recognition and sometimes even in what appears to be “generating new ideas”. A striking example is DeepMind’s AlphaZero, whose chess rating far exceeds that of any human player. Just four hours after learning the rules of chess, AlphaZero was able to beat the strongest conventional “brute force” chess program by coming up with smarter ideas and showing a deeper understanding of the game. Top grandmasters use its ideas in their own games at the highest level. 

In its basic material design, an artificial-intelligence machine looks like an ordinary computer. On the other hand, it is practically impossible to explain all aspects of human intelligence by starting at the microscopic level of the neurons in our brain, let alone in terms of the elementary particles that make up those neurons. Furthermore, the intellectual capability of humans is closely connected to the sense of consciousness, which most scientists would agree does not allow for a simple reductionist explanation.

Emergence is often summarised with the slogan “the whole is more than the sum of its parts” – or as condensed-matter theorist Phil Anderson put it, “more is different”. It counters the reductionist point of view, reminding us that the laws that we think to be fundamental today may in fact emerge from a deeper underlying reality. While this deeper layer may remain inaccessible to experiment, it is an essential tool for theorists of the mind and the laws of physics alike.

The post Emergence appeared first on CERN Courier.

]]>
Feature Erik Verlinde sizes up the Standard Model, gravity and intelligence as candidates for future explanation as emergent phenomena. https://cerncourier.com/wp-content/uploads/2021/08/CCSepOct21_EMERGE_frontis.jpg
Web code auctioned as crypto asset https://cerncourier.com/a/web-code-auctioned-as-crypto-asset/ Thu, 02 Sep 2021 09:48:23 +0000 https://preview-courier.web.cern.ch/?p=94061 Time-stamped files stated by Tim Berners-Lee to contain the original source code for the web and digitally signed by him, have sold for US$5.4 million at auction.

The post Web code auctioned as crypto asset appeared first on CERN Courier.

]]>
The web’s original source code

Time-stamped files, stated by Tim Berners-Lee to contain the original source code for the web and digitally signed by him, have sold for US$5.4 million at auction. The files were sold as a non-fungible token (NFT), a form of crypto asset that uses blockchain technology to confer uniqueness.

The web was originally conceived at CERN to meet the demand for automated information-sharing between physicists spread across universities and institutes worldwide. Berners-Lee wrote his first project proposal in March 1989, and the first website, which was dedicated to the World Wide Web project itself and hosted on Berners-Lee’s NeXT computer, went live in the summer of 1991. Less than two years later, on 30 April 1993, and after several iterations in development, CERN placed version three of the software in the public domain. It deliberately did so on a royalty-free, “no-strings-attached” basis, addressing the memo simply “To whom it may concern.”

The seed that led CERN to relinquish ownership of the web was planted 70 years ago, in the CERN Convention, which states that results of its work were to be “published or otherwise made generally available” – a culture of openness that continues to this day.

The auction offer describes the NFT as containing approximately 9555 lines of code, including implementations of the three languages and protocols that remain fundamental to the web today: HTML (Hypertext Markup Language), HTTP (Hypertext Transfer Protocol) and URIs (Uniform Resource Identifiers). The lot also includes an animated visualisation of the code, a letter written by Berners-Lee reflecting on the process of creating it, and a Scalable Vector Graphics representation of the full code created from the original files.

Bidding for the NFT, which auction house Sotheby’s claims is its first-ever sale of a digital-born artefact, opened on 23 June and attracted a total of 51 bids. The sale will benefit initiatives that Berners-Lee and his wife Rosemary Leith support, stated a Sotheby’s press release.

The post Web code auctioned as crypto asset appeared first on CERN Courier.

]]>
News Time-stamped files stated by Tim Berners-Lee to contain the original source code for the web and digitally signed by him, have sold for US$5.4 million at auction. https://cerncourier.com/wp-content/uploads/2021/08/CCSepOct21_NA_WWW2_feature.jpg
Designing an AI physicist https://cerncourier.com/a/designing-an-ai-physicist/ Thu, 02 Sep 2021 09:46:08 +0000 https://preview-courier.web.cern.ch/?p=94092 Jesse Thaler argues that particle physicists must go beyond deep learning and design AI capable of deep thinking. 

The post Designing an AI physicist appeared first on CERN Courier.

]]>
Merging the insights from AI and physics intelligence

Can we trust physics decisions made by machines? In recent applications of artificial intelligence (AI) to particle physics, we have partially sidestepped this question by using machine learning to augment analyses, rather than replace them. We have gained trust in AI decisions through careful studies of “control regions” and painstaking numerical simulations. As our physics ambitions grow, however, we are using “deeper” networks with more layers and more complicated architectures, which are difficult to validate in the traditional way. And to mitigate 10 to 100-fold increases in computing costs, we are planning to fully integrate AI into data collection, simulation and analysis at the high-luminosity LHC.

To build trust in AI, I believe we need to teach it to think like a physicist.

I am the director of the US National Science Foundation’s new Institute for Artificial Intelligence and Fundamental Interactions, which was founded last year. Our goal is to fuse advances in deep learning with time-tested strategies for “deep thinking” in the physical sciences. Many promising opportunities are open to us. Core principles of fundamental physics such as causality and spacetime symmetries can be directly incorporated into the structure of neural networks. Symbolic regression can often translate solutions learned by AI into compact, human-interpretable equations. In experimental physics, it is becoming possible to estimate and mitigate systematic uncertainties using AI, even when there are a large number of nuisance parameters. In theoretical physics, we are finding ways to merge AI with traditional numerical tools to satisfy stringent requirements that calculations be exact and reproducible. High-energy physicists are well positioned to develop trustworthy AI that can be scrutinised, verified and interpreted, since the five-sigma standard of discovery in our field necessitates it.

It is equally important, however, that we physicists teach ourselves how to think like a machine.

Jesse Thaler

Modern AI tools yield results that are often surprisingly accurate and insightful, but sometimes unstable or biased. This can happen if the problem to be solved is “underspecified”, meaning that we have not provided the machine with a complete list of desired behaviours, such as insensitivity to noise, sensible ways to extrapolate and awareness of uncertainties. An even more challenging situation arises when the machine can identify multiple solutions to a problem, but lacks a guiding principle to decide which is most robust. By thinking like a machine, and recognising that modern AI solves problems through numerical optimisation, we can better understand the intrinsic limitations of training neural networks with finite and imperfect datasets, and develop improved optimisation strategies. By thinking like a machine, we can better translate first principles, best practices and domain knowledge from fundamental physics into the computational language of AI. 

Beyond these innovations, which echo the logical and algorithmic AI that preceded the deep-learning revolution of the past decade, we are also finding surprising connections between thinking like a machine and thinking like a physicist. Recently, computer scientists and physicists have begun to discover that the apparent complexity of deep learning may mask an emergent simplicity. This idea is familiar from statistical physics, where the interactions of many atoms or molecules can often be summarised in terms of simpler emergent properties of materials. In the case of deep learning, as the width and depth of a neural network grows, its behaviour seems to be describable in terms of a small number of emergent parameters, sometimes just a handful. This suggests that tools from statistical physics and quantum field theory can be used to understand AI dynamics, and yield deeper insights into their power and limitations.

If we don’t exploit the full power of AI, we will not maximise the discovery potential of the LHC and other experiments

Ultimately, we need to merge the insights gained from artificial intelligence and physics intelligence. If we don’t exploit the full power of AI, we will not maximise the discovery potential of the LHC and other experiments. But if we don’t build trustable AI, we will lack scientific rigour. Machines may never think like human physicists, and human physicists will certainly never match the computational ability of AI, but together we have enormous potential to learn about the fundamental structure of the universe.

The post Designing an AI physicist appeared first on CERN Courier.

]]>
Opinion Jesse Thaler argues that particle physicists must go beyond deep learning and design AI capable of deep thinking.  https://cerncourier.com/wp-content/uploads/2021/08/CCSepOct21_VIEW_frontis.jpg
Forging the future of AI https://cerncourier.com/a/forging-the-future-of-ai/ Tue, 31 Aug 2021 22:00:52 +0000 https://preview-courier.web.cern.ch/?p=93653 Leaders in artificial-intelligence research spoke to the Courier about what's next for the field, and how developments may impact fundamental science.

The post Forging the future of AI appeared first on CERN Courier.

]]>
Jennifer Ngadiuba speaks to fellow Sparks! participants Michael Kagan and Bruno Giussani

Field lines arc through the air. By chance, a cosmic ray knocks an electron off a molecule. It hurtles away, crashing into other molecules and multiplying the effect. The temperature rises, liberating a new supply of electrons. A spark lights up the dark.

Vivienne Ming

The absence of causal inference in practical machine learning touches on every aspect of AI research, application, ethics and policy

Vivienne Ming is a theoretical neuroscientist and a serial AI entrepreneur

This is an excellent metaphor for the Sparks! Serendipity Forum – a new annual event at CERN designed to encourage interdisciplinary collaborations between experts on key scientific issues of the day. The first edition, which will take place from 17 to 18 September, will focus on artificial intelligence (AI). Fifty leading thinkers will explore the future of AI in topical groups, with the outcomes of their exchanges to be written up and published in the journal Machine Learning: Science and Technology. The forum reflects the growing use of machine-learning techniques in particle physics and emphasises the importance that CERN and the wider community places on collaborating with diverse technological sectors. Such interactions are essential to the long-term success of the field. 

Anima Anandkumar

AI is orders of magnitude faster than traditional numerical simulations. On the other side of the coin, simulations are being used to train AI in domains such as robotics where real data is very scarce

Anima Anandkumar is Bren professor at Caltech and director of machine learning research at NVIDIA

The likelihood of sparks flying depends on the weather. To take the temperature, CERN Courier spoke to a sample of the Sparks! participants to preview themes for the September event.

Genevieve Bell

2020 revealed unexpectedly fragile technological and socio-cultural infrastructures. How we locate our conversations and research about AI in those contexts feels as important as the research itself

Genevieve Bell is director of the School of Cybernetics at the Australian National University and vice president at Intel

Back to the future

In the 1980s, AI research was dominated by code that emulated logical reasoning. In the 1990s and 2000s, attention turned to softening its strong syllogisms into probabilistic reasoning. Huge strides forward in the past decade have rejected logical reasoning, however, instead capitalising on computing power by letting layer upon layer of artificial neurons discern the relationships inherent in vast data sets. Such “deep learning” has been transformative, fuelling innumerable innovations, from self-driving cars to searches for exotica at the LHC (see Hunting anomalies with an AI trigger). But many Sparks! participants think that the time has come to reintegrate causal logic into AI.

Stuart Russell

Geneva is the home not only of CERN but also of the UN negotiations on lethal autonomous weapons. The major powers must put the evil genie back in the bottle before it’s too late

Stuart Russell is professor of computer science at the University of California, Berkeley and coauthor of the seminal text on AI

“A purely predictive system, such as the current machine learning that we have, that lacks a notion of causality, seems to be very severely limited in its ability to simulate the way that people think,” says Nobel-prize-winning cognitive psychologist Daniel Kahneman. “Current AI is built to solve one specific task, which usually does not include reasoning about that task,” agrees AAAI president-elect Francesca Rossi. “Leveraging what we know about how people reason and behave can help build more robust, adaptable and generalisable AI – and also AI that can support humans in making better decisions.”

Tomaso Poggio

AI is converging on forms of intelligence that are useful but very likely not human-like

Tomaso Poggio is a cofounder of computational neuroscience and Eugene McDermott professor at MIT

Google’s Nyalleng Moorosi identifies another weakness of deep-learning models that are trained with imperfect data: whether AI is deciding who deserves a loan or whether an event resembles physics beyond the Standard Model, its decisions are only as good as its training. “What we call the ground truth is actually a system that is full of errors,” she says.

Nyalleng Moorosi

We always had privacy violation, we had people being blamed falsely for crimes they didn’t do, we had mis-diagnostics, we also had false news, but what AI has done is amplify all this, and make it bigger

Nyalleng Moorosi is a research software engineer at Google and a founding member of Deep Learning Indaba

Furthermore, says influential computational neuroscientist Tomaso Poggio, we don’t yet understand the statistical behaviour of deep-learning algorithms with mathematical precision. “There is a risk in trying to understand things like particle physics using tools we don’t really understand,” he explains, also citing attempts to use artificial neural networks to model organic neural networks. “It seems a very ironic situation, and something that is not very scientific.”

Daniel Kahneman

This idea of partnership, that worries me. It looks to me like a very unstable equilibrium. If the AI is good enough to help the person, then pretty soon it will not need the person

Daniel Kahneman is a renowned cognitive psychologist and a winner of the 2002 Nobel Prize in Economics

Stuart Russell, one of the world’s most respected voices on AI, echoes Poggio’s concerns, and also calls for a greater focus on controlled experimentation in AI research itself. “Instead of trying to compete between Deep Mind and OpenAI on who can do the biggest demo, let’s try to answer scientific questions,” he says. “Let’s work the way scientists work.”

Good or bad?

Though most Sparks! participants firmly believe that AI benefits humanity, ethical concerns are uppermost in their minds. From social-media algorithms to autonomous weapons, current AI overwhelmingly lacks compassion and moral reasoning, is inflexible and unaware of its fallibility, and cannot explain its decisions. Fairness, inclusivity, accountability, social cohesion, security and international law are all impacted, deepening links between the ethical responsibilities of individuals, multinational corporations and governments. “This is where I appeal to the human-rights framework,” says philosopher S Matthew Liao. “There’s a basic minimum that we need to make sure everyone has access to. If we start from there, a lot of these problems become more tractable.”

S Matthew Liao

We need to understand ethical principles, rather than just list them, because then there’s a worry that we’re just doing ethics washing – they sound good but they don’t have any bite

S Matthew Liao is a philosopher and the director of the Center for Bioethics at New York University

Far-term ethical considerations will be even more profound if AI develops human-level intelligence. When Sparks! participants were invited to put a confidence interval on when they expect human-level AI to emerge, answers ranged from [2050, 2100] at 90% confidence to [2040, ∞] at 99% confidence. Other participants said simply “in 100 years” or noted that this is “delightfully the wrong question” as it’s too human-centric. But by any estimation, talking about AI cannot wait.

Francesca Rossi

Only a multi-stakeholder and multi-disciplinary approach can build an ecosystem of trust around AI. Education, cultural change, diversity and governance are equally as important as making AI explainable, robust and transparent

Francesca Rossi co-leads the World Economic Forum Council on AI for humanity and is IBM AI ethics global leader and the president-elect of AAAI

“With Sparks!, we plan to give a nudge to serendipity in interdisciplinary science by inviting experts from a range of fields to share their knowledge, their visions and their concerns for an area of common interest, first with each other, and then with the public,” says Joachim Mnich, CERN’s director for research and computing. “For the first edition of Sparks!, we’ve chosen the theme of AI, which is as important in particle physics as it is in society at large. Sparks! is a unique experiment in interdisciplinarity, which I hope will inspire continued innovative uses of AI in high-energy physics. I invite the whole community to get involved in the public event on 18 September.”

 

The post Forging the future of AI appeared first on CERN Courier.

]]>
Feature Leaders in artificial-intelligence research spoke to the Courier about what's next for the field, and how developments may impact fundamental science. https://cerncourier.com/wp-content/uploads/2021/08/CCSepOct21_SPARKS_frontis.jpg
Hunting anomalies with an AI trigger https://cerncourier.com/a/hunting-anomalies-with-an-ai-trigger/ Tue, 31 Aug 2021 21:55:21 +0000 https://preview-courier.web.cern.ch/?p=93617 Jennifer Ngadiuba and Maurizio Pierini describe how ‘unsupervised’ machine learning could keep watch for signs of new physics at the LHC that have not yet been dreamt up by physicists.

The post Hunting anomalies with an AI trigger appeared first on CERN Courier.

]]>
In the 1970s, the robust mathematical framework of the Standard Model (SM) replaced data observation as the dominant starting point for scientific inquiry in particle physics. Decades-long physics programmes were put together based on its predictions. Physicists built complex and highly successful experiments at particle colliders, culminating in the discovery of the Higgs boson at the LHC in 2012.

Along this journey, particle physicists adapted their methods to deal with ever growing data volumes and rates. To handle the large amount of data generated in collisions, they had to optimise real-time selection algorithms, or triggers. The field became an early adopter of artificial intelligence (AI) techniques, especially those falling under the umbrella of “supervised” machine learning. Verifying the SM’s predictions or exposing its shortcomings became the main goal of particle physics. But with the SM now apparently complete, and supervised studies incrementally excluding favoured models of new physics, “unsupervised” learning has the potential to lead the field into the uncharted waters beyond the SM.

Blind faith

To maximise discovery potential while minimising the risk of false discovery claims, physicists design rigorous data-analysis protocols to minimise the risk of human bias. Data analysis at the LHC is blind: physicists prevent themselves from combing through data in search of surprises. Simulations and “control regions” adjacent to the data of interest are instead used to design a measurement. When the solidity of the procedure is demonstrated, an internal review process gives the analysts the green light to look at the result on the real data and produce the experimental result. 

A blind analysis is by necessity a supervised approach. The hypothesis being tested is specified upfront and tested against the null hypothesis – for example, the existence of the Higgs boson in a particular mass range versus its absence. Once spelled out, the hypothesis determines other aspects of the experimental process: how to select the data, how to separate signals from background and how to interpret the result. The analysis is supervised in the sense that humans identify what the possible signals and backgrounds are, and label examples of both for the algorithm.

Artist’s impression of an FPGA

The data flow at the LHC makes the need to specify a signal hypothesis upfront even more compelling. The LHC produces 40 million collision events every second. Each overlaps with 34 others from the same bunch crossing, on average, like many pictures superimposed on top of each other. However, the computing infrastructure of a typical experiment is designed to sustain a data flow of just 1000 events per second. To avoid being overwhelmed by the data pressure, it’s necessary to select these 1000 out of every 40 million events in a short time. But how do you decide what’s interesting? 

This is where the supervised nature of data analysis at the LHC comes into play. A set of selection rules – the trigger algorithms – are designed so that the kind of collisions predicted by the signal hypotheses being studied are present among the 1000 (see “Big data” figure). As long as you know what to look for, this strategy optimises your resources. The discovery in 2012 of the Higgs boson demonstrates this: a mission considered impossible in the 1980s was accomplished with less data and less time than anticipated by the most optimistic guesses when the LHC was being designed. Machine learning played a crucial role in this.

Machine learning

Machine learning (ML) is a branch of computer science that deals with algorithms capable of accomplishing a task without being explicitly programmed to do so. Unlike traditional algorithms, which are sets of pre-determined operations, an ML algorithm is not programmed. It is trained on data, so that it can adjust itself to maximise its chances of success, as defined by a quantitative figure of merit. 

To explain further, let’s use the example of a dataset of images of cats and dogs. We’ll label the cats as “0” and the dogs as “1”, and represent the images as a two-dimensional array of coloured pixels, each with a fraction of red, green and blue. Each dog or cat is now a stack of three two-dimensional arrays of numbers between 0 and 1 – essentially just the animal pictured in red, green and blue light. We would like to have a mathematical function converting this stack of arrays into a score ranging from 0 to 1. The larger the score, the higher the probability that the image is a dog. The smaller the score, the higher the probability that the image is a cat. An ML algorithm is a function of this kind, whose parameters are fixed by looking at a given dataset for which the correct labels are known. Through a training process, the algorithm is tuned to minimise the number of wrong answers by comparing its prediction to the labels.
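A minimal sketch of such a score function is shown below, using randomly generated stand-in “images” rather than a real dataset. The image size and the choice of a simple logistic-regression model are illustrative assumptions.

```python
# Toy version of the cat/dog score: flatten each RGB image into a vector of
# numbers in [0, 1] and fit a model whose output is a score between 0 and 1.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_images, height, width = 200, 16, 16
images = rng.random((n_images, height, width, 3))   # stand-in RGB pictures
labels = rng.integers(0, 2, size=n_images)          # 0 = cat, 1 = dog

X = images.reshape(n_images, -1)                    # one row of pixel values per image
clf = LogisticRegression(max_iter=1000).fit(X, labels)

# Score for a new picture: close to 1 means "dog", close to 0 means "cat"
new_image = rng.random((1, height, width, 3)).reshape(1, -1)
print(clf.predict_proba(new_image)[0, 1])
```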

Data flow from the ATLAS and CMS experiments

Now replace the dogs with photons from the decay of a Higgs boson, and the cats with detector noise that is mistaken for photons. Repeat the procedure, and you will obtain a photon-identification algorithm that you can use on LHC data to improve the search for Higgs bosons. This is what happened in the CMS experiment back in 2012. Thanks to the use of a special kind of ML algorithm called boosted decision trees, it was possible to maximise the accuracy of the Higgs-boson search, exploiting the rich information provided by the experiment’s electromagnetic calorimeter. The ATLAS collaboration developed a similar procedure to identify Higgs bosons decaying into a pair of tau leptons.
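
The same workflow carries over directly. The sketch below trains a boosted decision tree with scikit-learn on invented “shower-shape”-like features; it only illustrates the idea and is not the actual CMS or ATLAS implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Invented features standing in for calorimeter shower-shape variables:
# real photons (label 1) and fakes from noise (label 0) are drawn from
# slightly different distributions.
n = 5000
real = rng.normal(loc=[0.9, 0.02], scale=[0.05, 0.01], size=(n, 2))
fake = rng.normal(loc=[0.7, 0.05], scale=[0.15, 0.03], size=(n, 2))
X = np.vstack([real, fake])
y = np.concatenate([np.ones(n), np.zeros(n)])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A boosted decision tree: many shallow trees trained in sequence, each
# one correcting the mistakes of those before it.
bdt = GradientBoostingClassifier(n_estimators=200, max_depth=3)
bdt.fit(X_train, y_train)
print("photon-ID accuracy on the toy sample:", bdt.score(X_test, y_test))
```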

Photon and tau-lepton classifiers are both examples of supervised learning, and the success of the discovery of the Higgs boson was also a success story for applied ML. So far so good. But what about searching for new physics?

Typical examples of new physics such as supersymmetry, extra dimensions and the underlying structure for the Higgs boson have been extensively investigated at the LHC, with no evidence for them found in data. This has told us a great deal about what the particles predicted by these scenarios cannot look like, but what if the signal hypotheses are simply wrong, and we’re not looking for the right thing? This situation calls for “unsupervised” learning, where humans are not required to label data. As with supervised learning, this idea doesn’t originate in physics. Marketing teams use clustering algorithms based on it to identify customer segments. Banks use it to detect credit-card fraud by looking for anomalous access patterns in customers’ accounts. Similar anomaly detection techniques could be used at the LHC to single out rare events, possibly originating from new, previously undreamt of, mechanisms.

Unsupervised learning

Anomaly detection is a possible strategy for keeping watch for new physics without having to specify an exact signal. A kind of unsupervised ML, it involves ranking an unlabelled dataset from the most typical to the most atypical, using a ranking metric learned during training. One of the advantages of this approach is that the algorithm can be trained on data recorded by the experiment rather than simulations. This could, for example, be a control sample that we know to be dominated by SM processes: the algorithm will learn how to reconstruct these events “exactly” – and conversely how to rank unknown processes as atypical. As a proof of principle, this strategy has already been applied to re-discover the top quark using the first open-data release by the CMS collaboration.
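
At its core, such an algorithm can be as simple as an autoencoder trained on a background-dominated control sample, with the reconstruction error used as the ranking metric. The following sketch, with invented event features and network sizes, is a schematic illustration of that idea rather than the published CMS study.

```python
import numpy as np
from tensorflow import keras

# Control sample assumed to be dominated by Standard Model processes,
# faked here with random numbers: each event is a vector of 20
# reconstructed quantities (momenta, energies, multiplicities, ...).
background = np.random.normal(size=(20000, 20)).astype("float32")

# An autoencoder compresses each event to a small latent space and then
# reconstructs it, so it learns to reproduce typical background-like events.
autoencoder = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(10, activation="relu"),
    keras.layers.Dense(3, activation="relu"),     # bottleneck
    keras.layers.Dense(10, activation="relu"),
    keras.layers.Dense(20, activation="linear"),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(background, background, epochs=10, batch_size=256, verbose=0)

def anomaly_score(events):
    """Mean squared reconstruction error: larger means more atypical."""
    reco = autoencoder.predict(events, verbose=0)
    return np.mean((events - reco) ** 2, axis=1)

# Rank a new batch of events from most atypical to most typical.
new_events = np.random.normal(size=(1000, 20)).astype("float32")
ranking = np.argsort(anomaly_score(new_events))[::-1]
```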

This approach could be used in the online data processing at the LHC and applied to the full 40 million collision events produced every second. Clustering techniques commonly used in observational astronomy could be used to highlight the recurrence of special kinds of events.

Unsupervised detection of leptoquark and neutral-scalar-boson decays

If a new kind of process occurs in an LHC collision but is discarded by the trigger algorithms serving the traditional physics programme, an anomaly-detection algorithm could save the relevant events, storing them in a special stream of anomalous events (see “Anomaly hunting” figure). The ultimate goal of this approach would be the creation of an “anomaly catalogue” of event topologies for further studies, which could inspire novel ideas for new-physics scenarios to test using more traditional techniques. With an anomaly catalogue, we could return to the first stage of the scientific method, and recover a data-driven alternative to the theory-driven investigation that we have come to rely on.

This idea comes with severe technological challenges. To apply this technique to all collision events, we would need to integrate the algorithm, typically a special kind of neural network called an autoencoder, into the very first stage of the online data selection, the level-one (L1) trigger. The L1 trigger consists of logic algorithms integrated onto custom electronic boards based on field-programmable gate arrays (FPGAs) – highly parallelisable chips that serve as programmable emulators of electronic circuits. Any L1 trigger algorithm has to run within the order of one microsecond, and take only a fraction of the available computing resources. To run in the L1 trigger system, an anomaly-detection network therefore needs to be converted into an electronic circuit that fulfils these constraints. This goal can be met using the “hls4ml” (high-level synthesis for ML) library – a tool designed by an international collaboration of LHC physicists that exploits automatic workflows.

Computer-science collaboration

Recently, we collaborated with a team of researchers from Google to integrate the hls4ml library with Google’s “QKeras” – a tool for developing accurate ML models on FPGAs with a limited computing footprint. Thanks to this partnership, we developed a workflow that can design an ML model in concert with its final implementation on the experimental hardware. The resulting QKeras+hls4ml bundle is designed to allow LHC physicists to deploy anomaly-detection algorithms in the L1 trigger system. This approach could realistically be in place in the L1 trigger systems before the end of LHC Run 3 – a powerful complement to the anomaly-detection techniques that are already being considered for “offline” data analysis on the traditionally triggered samples.
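
Schematically, the workflow looks like the sketch below: a small quantised autoencoder is built with QKeras and then translated into firmware with hls4ml. The layer sizes, quantisation settings and FPGA part number are placeholder assumptions for illustration, not the configuration of any real trigger system.

```python
import numpy as np
from tensorflow import keras
from qkeras import QDense, QActivation, quantized_bits
import hls4ml

# A small quantised autoencoder: fixed-point weights and activations keep
# the FPGA resource usage within the L1-trigger budget.
quant = quantized_bits(8, 0, alpha=1)
model = keras.Sequential([
    keras.Input(shape=(20,)),
    QDense(10, kernel_quantizer=quant, bias_quantizer=quant),
    QActivation("quantized_relu(8)"),
    QDense(3, kernel_quantizer=quant, bias_quantizer=quant),
    QActivation("quantized_relu(8)"),
    QDense(20, kernel_quantizer=quant, bias_quantizer=quant),
])
model.compile(optimizer="adam", loss="mse")

x = np.random.normal(size=(10000, 20)).astype("float32")  # toy training data
model.fit(x, x, epochs=5, batch_size=256, verbose=0)

# Translate the trained network into firmware with hls4ml. The FPGA part
# string below is a placeholder, not the actual trigger-board device.
config = hls4ml.utils.config_from_keras_model(model, granularity="model")
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, output_dir="l1_anomaly_ae",
    part="xcvu9p-flgb2104-2-e",
)
hls_model.compile()  # builds a bit-accurate C++ emulation of the firmware
```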

AI techniques could help the field break beyond the limits of human creativity in theory building

If this strategy is endorsed by the experimental collaborations, it could create a public dataset of anomalous data that could be investigated during the third LHC long shutdown, from 2025 to 2027. By studying those events, phenomenologists and theoretical physicists could formulate creative hypotheses about new-physics scenarios to test, potentially opening up new search directions for the High-Luminosity LHC.

Blind analyses minimise human bias if you know what to look for, but risk yielding diminishing returns when the theoretical picture is uncertain, as is the case in particle physics after the first 10 years of LHC physics. Unsupervised AI techniques such as anomaly detection could help the field break beyond the limits of human creativity in theory building. In the big-data environment of the LHC, they offer a powerful means to move the field back to data-driven discovery, after 50 years of theory-driven progress. To maximise their impact, they should be applied to every collision produced at the LHC. For that reason, we argue that anomaly-detection algorithms should be deployed in the L1 triggers of the LHC experiments, despite the technological challenges that must be overcome to make that happen.

The post Hunting anomalies with an AI trigger appeared first on CERN Courier.

Feature Jennifer Ngadiuba and Maurizio Pierini describe how ‘unsupervised’ machine learning could keep watch for signs of new physics at the LHC that have not yet been dreamt up by physicists. https://cerncourier.com/wp-content/uploads/2021/08/CCSepOct21_AI_frontis.jpg
What’s in the box? https://cerncourier.com/a/whats-in-the-box/ Tue, 31 Aug 2021 21:50:42 +0000 https://preview-courier.web.cern.ch/?p=93625 The LHC Olympics and Dark Machines data challenges stimulated innovation in the use of machine learning to search for new physics, write Benjamin Nachman and Melissa van Beekveld.

The post What’s in the box? appeared first on CERN Courier.

A neural network probing a black box of complex final states

The need for innovation in machine learning (ML) transcends any single experimental collaboration, and requires more in-depth work than can take place at a workshop. Data challenges, wherein simulated “black box” datasets are made public, and contestants design algorithms to analyse them, have become essential tools to spark interdisciplinary collaboration and innovation. Two have recently concluded. In both cases, contestants were challenged to use ML to figure out “what’s in the box?”

LHC Olympics

The LHC Olympics (LHCO) data challenge was launched in autumn 2019, and the results were presented at the ML4Jets and Anomaly Detection workshops in spring and summer 2020. A final report summarising the challenge was posted to arXiv earlier this year, written by around 50 authors from a variety of backgrounds in theory, the ATLAS and CMS experiments, and beyond. The name of this community effort was inspired by the first LHC Olympics that took place more than a decade ago, before the start of the LHC. In those olympics, researchers were worried about being able to categorise all of the new particles that would be discovered when the machine turned on. Since then, we have learned a great deal about nature at TeV energy scales, with no evidence yet for new particles or forces of nature. The latest LHC Olympics focused on a different challenge – being able to find new physics in the first place. We now know that new physics must be rare and not exactly like what we expected.

In order to prepare for rare and unexpected new physics, organisers Gregor Kasieczka (University of Hamburg), Benjamin Nachman (Lawrence Berkeley National Laboratory) and David Shih (Rutgers University) provided a set of black-box datasets composed mostly of Standard Model (SM) background events. Contestants were charged with identifying any anomalous events that would be a sign of new physics. These datasets focused on resonant anomaly detection, whereby the anomaly is assumed to be localised – a “bump hunt”, in effect. This is a generic feature of new physics produced from massive new particles: the reconstructed parent mass is the resonant feature. By assuming that the signal is localised, one can use regions away from the signal to estimate the background. The LHCO provided one R&D dataset with labels and three black boxes to play with: one with an anomaly decaying into two two-pronged resonances, one without an anomaly, and one with an anomaly featuring two different decay modes (a dijet decay X → qq and a trijet decay X → gY, Y → qq).  There are currently no dedicated searches for these signals in LHC data.
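
The logic of such a sideband-based bump hunt can be sketched in a few lines of Python; all numbers below are invented for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)

# Toy dijet-mass spectrum (in TeV): a smoothly falling background plus a
# small resonance localised near 3.5 TeV.
background = rng.exponential(scale=0.8, size=200000) + 2.0
signal = rng.normal(loc=3.5, scale=0.1, size=300)
masses = np.concatenate([background, signal])

counts, edges = np.histogram(masses, bins=np.linspace(2.0, 6.0, 81))
centres = 0.5 * (edges[:-1] + edges[1:])

# Fit a smooth shape to the sidebands only, excluding the signal window.
window = (centres > 3.3) & (centres < 3.7)
def smooth(m, a, b):
    return a * np.exp(-b * m)
popt, _ = curve_fit(smooth, centres[~window], counts[~window], p0=(1e5, 1.0))

# Compare observed and background-predicted counts inside the window.
observed = counts[window].sum()
expected = smooth(centres[window], *popt).sum()
significance = (observed - expected) / np.sqrt(expected)  # naive estimate
print(f"observed {observed}, expected {expected:.0f}, ~{significance:.1f} sigma")
```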

No labels

About 20 algorithms were deployed on the LHCO datasets, including supervised learning, unsupervised learning, weakly supervised learning and semi-supervised learning. Supervised learning is the most widely used method across science and industry, whereby each training example has a label: “background” or “signal”. For this challenge, the data do not have labels as we do not know exactly what we are looking for, and so strategies trained with labels from a different dataset often did not work well. By contrast, unsupervised learning generally tries to identify events that are rarely or never produced by the background; weakly supervised methods use some context from data to provide noisy labels; and semi-supervised methods use some simulation information in order to have a partial set of labels. Each method has its strengths and weaknesses, and multiple approaches are usually needed to achieve a broad coverage of possible signals.

The Dark Machines data challenge focused on developing algorithms broadly sensitive to non-resonant anomalies

The best performance on the first black box in the LHCO challenge, as measured by finding and correctly characterising the anomalous signals, was by a team of cosmologists at Berkeley (George Stein, Uros Seljak and Biwei Dai) who compared the phase-space density between a sliding signal region and sidebands (see “Olympian algorithm” figure). Overall, the algorithms did well on the R&D dataset, and some also did well on the first black box, with methods that made use of likelihood ratios proving particularly effective. But no method was able to detect the anomalies in the third black box, and many teams reported a false signal for the second black box. This “placebo effect” illustrates the need for ML approaches to have an accurate estimation of the background and not just a procedure for identifying signals. The challenge for the third black box, however, required algorithms to identify multiple clusters of anomalous events rather than a single cluster. Future innovation is needed in this department.

Dark Machines

A second data challenge was launched in June 2020 within the Dark Machines initiative. Dark Machines is a research collective of physicists and data scientists who apply ML techniques to understand the nature of dark matter – and since that nature is unknown, it is critical to search broadly for its anomalous signatures. The challenge was organised by Sascha Caron (Radboud University), Caterina Doglioni (University of Lund) and Maurizio Pierini (CERN), with notable contributions from Bryan Ostdiek (Harvard University) in the development of a common software infrastructure, and Melissa van Beekveld (University of Oxford) for dataset generation. In total, 39 participants arranged in 13 teams explored various unsupervised techniques, with each team submitting multiple algorithms.

The anomaly score

By contrast with LHCO, the Dark Machines data challenge focused on developing algorithms broadly sensitive to non-resonant anomalies. Good examples of non-resonant new physics include many supersymmetric models and models of dark matter – anything where “invisible” particles don’t interact with the detector. In such a situation, resonant peaks become excesses in the tails of the missing-transverse-energy distribution. Two datasets were provided: R&D datasets including a concoction of SM processes and many signal samples for contestants to develop their approaches on; and a black-box dataset mixing SM events with events from unspecified signal processes. The challenge has now formally concluded, and its outcome was posted on arXiv in May, but the black-box has not been opened to allow the community to continue to test ideas on it.

A wide variety of unsupervised methods have been deployed so far. The algorithms use diverse representations of the collider events (for example, lists of particle four-momenta, or physics quantities computed from them), and both implicit and explicit approaches for estimating the probability density of the background (for example, autoencoders and “normalising flows”). While no single method universally achieved the highest sensitivity to new-physics events, methods that mapped the background to a fixed point and looked for events that were not described well by this mapping generally did better than techniques that had a so-called dynamic embedding. A key question exposed by this challenge that will inspire future innovation is how best to tune and combine unsupervised machine-learning algorithms in a way that is model independent with respect to the new physics describing the signal.
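
As a toy illustration of the “explicit density” strategy – far simpler than the normalising flows used in the challenge – one can fit a density model to background-like events and take the negative log-likelihood as the anomaly score. Everything in the sketch below is invented.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)

# Background-like training events: five summary features per event
# (missing transverse energy, jet multiplicity, ...), faked here.
background = rng.normal(size=(50000, 5))

# Fit an explicit density estimate of the background.
density = GaussianMixture(n_components=8, random_state=0).fit(background)

def anomaly_score(events):
    """Negative log-likelihood under the background model:
    the larger the score, the more atypical the event."""
    return -density.score_samples(events)

# Events drawn from a shifted distribution get higher scores on average.
signal_like = rng.normal(loc=2.0, size=(1000, 5))
print(anomaly_score(background[:1000]).mean(), anomaly_score(signal_like).mean())
```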

The enthusiastic response to the LHCO and Dark Machines data challenges highlights the important future role of unsupervised ML at the LHC and elsewhere in fundamental physics. So far, just one analysis has been published – a dijet-resonance search by the ATLAS collaboration using weakly-supervised ML – but many more are underway, and these techniques are even being considered for use in the level-one triggers of LHC experiments (see Hunting anomalies with an AI trigger). And as the detection of outliers also has a large number of real-world applications, from fraud detection to industrial maintenance, fruitful cross-talk between fundamental research and industry is possible. 

The LHCO and Dark Machines data challenges are a stepping stone to an exciting experimental programme that is just beginning. 

The post What’s in the box? appeared first on CERN Courier.

Feature The LHC Olympics and Dark Machines data challenges stimulated innovation in the use of machine learning to search for new physics, write Benjamin Nachman and Melissa van Beekveld. https://cerncourier.com/wp-content/uploads/2021/08/Data-Challenges.jpg
Stealing theorists’ lunch https://cerncourier.com/a/stealing-theorists-lunch/ Tue, 31 Aug 2021 21:49:46 +0000 https://preview-courier.web.cern.ch/?p=94049 Artificial-intelligence techniques have been used in experimental particle physics for 30 years, and are becoming increasingly widespread in theoretical physics. Anima Anandkumar and John Ellis explore the possibilities.

The post Stealing theorists’ lunch appeared first on CERN Courier.

John Ellis and Anima Anandkumar

How might artificial intelligence make an impact on theoretical physics?

John Ellis (JE): To phrase it simply: where do we go next? We have the Standard Model, which describes all the visible matter in the universe successfully, but we know dark matter must be out there. There are also puzzles, such as what is the origin of the matter in the universe? During my lifetime we’ve been playing around with a bunch of ideas for tackling those problems, but haven’t come up with solutions. We have been able to solve some but not others. Could artificial intelligence (AI) help us find new paths towards attacking these questions? This would be truly stealing theoretical physicists’ lunch.

 Anima Anandkumar (AA): I think the first steps are whether you can understand more basic physics and be able to come up with predictions as well. For example, could AI rediscover the Standard Model? One day we can hope to look at what the discrepancies are for the current model, and hopefully come up with better suggestions.

 JE: An interesting exercise might be to take some of the puzzles we have at the moment and somehow equip an AI system with a theoretical framework that we physicists are trying to work with, let the AI loose and see whether it comes up with anything. Even over the last few weeks, a couple of experimental puzzles have been reinforced by new results on B-meson decays and the anomalous magnetic moment of the muon. There are many theoretical ideas for solving these puzzles but none of them strike me as being particularly satisfactory in the sense of indicating a clear path towards the next synthesis beyond the Standard Model. Is it imaginable that one could devise an AI system that, if you gave it a set of concepts that we have, and the experimental anomalies that we have, then the AI could point the way?

 AA: The devil is in the details. How do we give the right kind of data and knowledge about physics? How do we express those anomalies while at the same time making sure that we don’t bias the model? There are anomalies suggesting that the current model is not complete – if you are giving that prior knowledge then you could be biasing the models away from discovering new aspects. So, I think that delicate balance is the main challenge.

 JE: I think that theoretical physicists could propose a framework with boundaries that AI could explore. We could tell you what sort of particles are allowed, what sort of interactions those could have and what would still be a well-behaved theory from the point of view of relativity and quantum mechanics. Then, let’s just release the AI to see whether it can come up with a combination of particles and interactions that could solve our problems. I think that in this sort of problem space, the creativity would come in the testing of the theory. The AI might find a particle and a set of interactions that would deal with the anomalies that I was talking about, but how do we know what’s the right theory? We have to propose some other experiments that might test it – and that’s one place where the creativity of theoretical physicists will come into play.

 AA: Absolutely. And many theories are not directly testable. That’s where the deeper knowledge and intuition that theoretical physicists have is so critical.

Is human creativity driven by our consciousness, or can contemporary AI be creative? 

AA: Humans are creative in so many ways. We can dream, we can hallucinate, we can create – so how do we build those capabilities into AI? Richard Feynman famously said “What I cannot create, I do not understand.” It appears that our creativity gives us the ability to understand the complex inner workings of the universe. With the current AI paradigm this is very difficult. Current AI is geared towards scenarios where the training and testing distributions are similar, however, creativity requires extrapolation – being able to imagine entirely new scenarios. So extrapolation is an essential aspect. Can you go from what you have learned and extrapolate new scenarios? For that we need some form of invariance or understanding of the underlying laws. That’s where physics is front and centre. Humans have intuitive notions of physics from early childhood. We slowly pick them up from physical interactions with the world. That understanding is at the heart of getting AI to be creative.

 JE: It is often said that a child learns more laws of physics than an adult ever will! As a human being, I think that I think. I think that I understand. How can we introduce those things into AI?

Could AI rediscover the Standard Model?

 AA: We need to get AI to create images, and other kinds of data it experiences, and then reason about the likelihood of the samples. Is this data point unlikely versus another one? Similarly to what we see in the brain, we recently built feedback mechanisms into AI systems. When you are watching me, it’s not just a free-flowing system going from the retina into the brain; there’s also a feedback system going from the inferior temporal cortex back into the visual cortex. This kind of feedback is fundamental to us being conscious. Building these kinds of mechanisms into AI is the first step to creating conscious AI.

 JE: A lot of the things that you just mentioned sound like they’re going to be incredibly useful going forward in our systems for analysing data. But how is AI going to devise an experiment that we should do? Or how is AI going to devise a theory that we should test?

 AA: Those are the challenging aspects for an AI. A data-driven method using a standard neural network would perform really poorly. It will only think of the data that it can see and not about data that it hasn’t seen – what we call “zero-shot generalisation”. To me, the past decade’s impressive progress is due to a trinity of data, neural networks and computing infrastructure, mainly powered by GPUs [graphics processing units], coming together: the next step for AI is a wider generalisation to the ability to extrapolate and predict hitherto unseen scenarios.

Across the many tens of orders of magnitude described by modern physics, new laws and behaviours “emerge” non-trivially in complexity (see Emergence). Could intelligence also be an emergent phenomenon?

JE: As a theoretical physicist, my main field of interest is the fundamental building blocks of matter, and the roles that they play very early in the history of the universe. Emergence is the word that we use when we try to capture what happens when you put many of these fundamental constituents together, and they behave in a way that you could often not anticipate if you just looked at the fundamental laws of physics. One of the interesting developments in physics over the past generation is to recognise that there are some universal patterns that emerge. I’m thinking, for example, of phase transitions that look universal, even though the underlying systems are extremely different. So, I wonder, is there something similar in the field of intelligence? For example, the brain structure of the octopus is very different from that of a human, so to what extent does the octopus think in the same way that we do?

 AA: There’s a lot of interest now in studying the octopus. From what I learned, its intelligence is spread out so that it’s not just in its brain but also in its tentacles. Consequently, you have this distributed notion of intelligence that still works very well. It can be extremely camouflaged – imagine being in a wild ocean without a shell to protect yourself. That pressure created the need for intelligence such that it can be extremely aware of its surroundings and able to quickly camouflage itself or manipulate different tools.

 JE: If intelligence is the way that a living thing deals with threats and feeds itself, should we apply the same evolutionary pressure to AI systems? We threaten them and only the fittest will survive. We tell them they have to go and find their own electricity or silicon or something like that – I understand that there are some first steps in this direction, computer programs competing with each other at chess, for example, or robots that have to find wall sockets and plug themselves in. Is this something that one could generalise? And then intelligence could emerge in a way that we hadn’t imagined?

Similarly to what we see in the brain, we recently built feedback mechanisms into AI systems

 AA: That’s an excellent point. Because what you mentioned broadly is competition – different kinds of pressures that drive towards good, robust objectives. An example is generative adversarial models, which can generate very realistic looking images. Here you have a discriminator that challenges the generator to generate images that look real. These kinds of competitions or games are getting a lot of traction and we have now passed the Turing test when it comes to generating human faces – you can no longer tell very easily whether it is generated by AI or if it is a real person. So, I think those kinds of mechanisms that have competition built into the objective they optimise are fundamental to creating more robust and more intelligent systems.

 JE: All this is very impressive – but there are still some elements that I am missing, which seem very important to theoretical physics. Take chess: a very big system but finite nevertheless. In some sense, what I try to do as a theoretical physicist has no boundaries. In some sense, it is infinite. So, is there any hope that AI would eventually be able to deal with problems that have no boundaries?

 AA: That’s the difficulty. These are infinite-dimensional spaces… so how do we decide how to move around there? What distinguishes an expert like you from an average human is that you build your knowledge and develop intuition – you can quickly make judgments and find which narrow part of the space you want to work on compared to all the possibilities. That’s the aspect that is so difficult for AI to figure out. The space is enormous. On the other hand, AI does have a lot more memory, a lot more computational capacity. So can we create a hybrid system, with physicists and machine learning in tandem, to help us harness the capabilities of both AI and humans together? We’re currently exploring theorem provers: can we use the theorems that humans have proven, and then add reinforcement learning on top to create very fast theorem solvers? If we can create such fast theorem provers in pure mathematics, I can see them being very useful for understanding the Standard Model and the gaps and discrepancies in it. It is much harder than chess, for example, but there are exciting programming frameworks and data sets available, with efforts to bring together different branches of mathematics. But I don’t think humans will be out of the loop, at least for now.

The post Stealing theorists’ lunch appeared first on CERN Courier.

]]>
Opinion Artificial-intelligence techniques have been used in experimental particle physics for 30 years, and are becoming increasingly widespread in theoretical physics. Anima Anandkumar and John Ellis explore the possibilities. https://cerncourier.com/wp-content/uploads/2021/08/CCSepOct21_INT_EllisAnan.jpg
Loop Summit convenes in Como https://cerncourier.com/a/loop-summit-convenes-in-como/ Thu, 19 Aug 2021 12:50:11 +0000 https://preview-courier.web.cern.ch/?p=93732 The workshop explored new perturbative results and methods in quantum field theory, collider physics and gravity.

The post Loop Summit convenes in Como appeared first on CERN Courier.

Precision calculations in the Standard Model and beyond are very important for the experimental programme of the LHC, planned high-energy colliders and gravitational-wave detectors of the future. Following two years of pandemic-imposed virtual discussions, 25 invited experts gathered from 26 to 30 July at Cadenabbia on Lake Como, Italy, to present new results and discuss paths into the computational landscape of this year’s “Loop Summit”.

Loop Summit 2021

The conference surveyed topics relating to multi-loop and multi-leg calculations in quantum chromodynamics (QCD) and electroweak processes. In scattering processes, loops are closed particle lines and legs represent external particles. Both present computational challenges. Recent progress on many inclusive processes has been reported at three- or four-loop order, including for deep-inelastic scattering, jets at colliders, the Drell–Yan process, top-quark and Higgs-boson production, and aspects of bottom-quark physics. Much improved descriptions of scaling violations of parton densities, heavy-quark effects at colliders, power corrections, mixed QCD and electroweak corrections, and high-order QED corrections for e+e– colliders have also recently been obtained. These will be important for many processes at the LHC, and pave the way to physics at facilities such as the proposed Future Circular Collider (FCC).

Quantum field theory provides a very elegant way to solve Einsteinian gravity

Weighty considerations

Although merging black holes can have millions of solar masses, the physics describing them remains classical, and quantum gravity happened, if at all, shortly after the Big Bang. Nevertheless, quantum field theory provides an elegant way to solve Einsteinian gravity. At this year’s Loop Summit, perturbative approaches to gravity were discussed that use field-theoretic methods at the level of the 5th and 6th post-Newtonian approximations, where the nth post-Newtonian order corresponds to a classical n-loop calculation between black-hole world lines. These calculations allow predictions of the binding energy and periastron advance of spiralling-in pairs of black holes, and relate them to gravitational-wave effects. In these calculations, the classical loops all link to world lines in classical graviton networks within the framework of an effective-field-theory representation of Einsteinian gravity.

Other talks discussed important progress on advanced analytic computation technologies and new mathematical methods such as computational improvements in massive Dirac-algebra, new ways to calculate loop integrals analytically, new ways to deal consistently with polarised processes, the efficient reduction of highly connected systems of integrals, the solution of gigantic systems of differential equations, and numerical methods based on loop-tree duality. All these methods will decrease the theory errors for many processes due to be measured in the high-luminosity phase of the LHC, and beyond.

Half of the meeting was devoted to developing new ideas in subgroups. In-person exchanges are invaluable for highly technical discussions such as these — there is still no substitute for gathering around the blackboard informally and jotting down equations and diagrams. The next Loop Summit in this triennial series will take place in summer 2024.

The post Loop Summit convenes in Como appeared first on CERN Courier.

Meeting report The workshop explored new perturbative results and methods in quantum field theory, collider physics and gravity. https://cerncourier.com/wp-content/uploads/2021/08/Loop-Summit-2021-191.jpg
AI and GPUs take centre stage at vCHEP https://cerncourier.com/a/ai-and-gpus-take-centre-stage-at-vchep/ Sun, 18 Jul 2021 12:21:35 +0000 https://preview-courier.web.cern.ch/?p=93402 The 25th International Conference on Computing in High-Energy and Nuclear Physics gathered more than 1000 participants online across 20 time zones, from Brisbane to Honolulu.

The post AI and GPUs take centre stage at vCHEP appeared first on CERN Courier.

vCHEP2021 group photo

The 25th International Conference on Computing in High-Energy and Nuclear Physics (CHEP) gathered more than 1000 participants online from 17 to 21 May. Dubbed “vCHEP”, the event took place virtually after this year’s in-person event in Norfolk, Virginia, had to be cancelled due to the COVID-19 pandemic. Participants tuned in across 20 time zones, from Brisbane to Honolulu, to live talks, recorded sessions, excellent discussions on chat apps (to replace the traditional coffee-break interactions) and special sessions that linked job seekers with recruiters.

Given vCHEP’s virtual nature this year, there was a different focus on the content. Plenary speakers are usually invited, but this time the organisers invited papers of up to 10 pages to be submitted, and chose a plenary programme from the most interesting and innovative. Just 30 had to be selected from more than 200 submissions — twice as many as expected — but the outcome was a diverse programme tackling the huge issues of data rate and event complexity in future experiments in nuclear and high-energy physics (HEP).

Artificial intelligence

So what were the hot topics at vCHEP? One outstanding one was artificial intelligence and machine learning. There were more papers submitted on this theme than any other, showing that the field is continuing to innovate in this domain. 

Interest in using graph neural networks for the problem of charged-particle tracking was very high, with three plenary talks. Representing the hits in a tracker as the nodes of a graph, with possible connections between hits as its edges, is a very natural way to describe the data that we get from experiments. The network can be effectively trained to pick out the edges representing the true tracks and reject those that are just spurious connections. The time needed to get to a good solution has improved dramatically in just a few years, and the scaling of the solution to dense environments, such as at the High-Luminosity LHC (HL-LHC), is very promising for this relatively new technique.
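
The idea can be caricatured without any graph-learning library: build nodes from hits, form candidate edges between hits on adjacent layers, and train a classifier to keep the true edges. The sketch below uses invented straight-line “tracks” and a plain dense network on edge features, so it illustrates the data representation rather than an actual graph neural network of the kind presented at vCHEP.

```python
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(3)

# Invented events: straight-line tracks crossing four detector layers.
n_tracks, n_layers = 200, 4
slopes = rng.uniform(-1, 1, n_tracks)
hits = np.array([[layer, layer * s + rng.normal(scale=0.02)]
                 for s in slopes for layer in range(n_layers)])
track_id = np.repeat(np.arange(n_tracks), n_layers)

# Graph building: nodes are hits; candidate edges connect geometrically
# compatible hits on adjacent layers. An edge is "true" if both hits
# belong to the same track.
edges, labels = [], []
for i in range(len(hits)):
    for j in range(len(hits)):
        if hits[j, 0] == hits[i, 0] + 1 and abs(hits[j, 1] - hits[i, 1]) < 1.1:
            edges.append(np.concatenate([hits[i], hits[j]]))
            labels.append(float(track_id[i] == track_id[j]))
edges, labels = np.array(edges, dtype="float32"), np.array(labels)

# Edge classifier: score each candidate edge; high-scoring edges are kept
# and linked up into track candidates.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(edges, labels, epochs=5, batch_size=256, verbose=0)
```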

ATLAS showed off their new fast-simulation framework

On the simulation side, work was presented showcasing new neural-network architectures that use a “bounded information-bottleneck autoencoder” to improve training stability, providing a solution that replicates important features such as how real minimum-ionising particles interact with calorimeters. ATLAS also showed off their new fast-simulation framework, which combines traditional parametric simulation with generative adversarial networks, to provide better agreement with Geant4 than ever before.

New architectures

Machine learning is very well suited to new computing architectures, such as graphics processing units (GPUs), but many other experimental-physics codes are also being rewritten to take advantage of these new architectures. IceCube are simulating photon transport in the Antarctic ice on GPUs, and presented detailed work on their performance analysis that led to recent significant speed-ups. Meanwhile, LHCb will introduce GPUs to their trigger farm for Run 3, and showed how much this will improve the energy consumption per event of the high-level trigger. This will help to meet the physical constraints of power and cooling close to the detector, and is a first step towards bringing HEP’s overall computing energy consumption to the table as an important parameter. 

LHCb will introduce GPUs to their trigger farm for Run 3

Encouraging work on porting event generation to GPUs was also presented — particularly appropriately, given the spiralling costs of higher order generators for HL-LHC physics. Looking at the long-term future of these new code bases, there were investigations of porting calorimeter simulation and liquid-argon time-projection chamber software to different toolkits for heterogeneous programming, a topic that will become even more important as computing centres diversify their offerings.

Keeping up with benchmarking and valuing these heterogeneous resources is an important topic for the Worldwide LHC Computing Grid, and a report from the HEPiX Benchmarking group pointed to the future for evaluating modern CPUs and GPUs for a variety of real-world HEP applications. Staying on the facilities topic, R&D was presented on how to optimise delivering reliable and affordable storage for HEP, based on CephFS and the CERN-developed EOS storage system. This will be critical to providing the massive storage needed in the future. The network between facilities will likely become dynamically configurable in the future, and how best to take advantage of machine learning for traffic prediction is being investigated.

Quantum computing

vCHEP was also the first edition of CHEP with a dedicated parallel session on quantum computing. Meshing very well with CERN’s Quantum Initiative, this showed how seriously the field is taking investigations of how to use this technology in the future. Interesting results on using quantum support-vector machines for signal/background classification of B-meson decays were highlighted.

On a meta note, presentations also explored how to adapt outreach events to a virtual setup, to keep up public engagement during lockdown, and how best to use online software training to equip the future generation of physicists with the advanced software skills they will need.

Was vCHEP a success? So far, the feedback is overwhelmingly positive. It was a showcase for the excellent work going on in the field, and 11 of the best papers will be published in a special edition of Computing and Software for Big Science — another first for CHEP in 2021.

The post AI and GPUs take centre stage at vCHEP appeared first on CERN Courier.

Meeting report The 25th International Conference on Computing in High-Energy and Nuclear Physics gathered more than 1000 participants online across 20 time zones, from Brisbane to Honolulu. https://cerncourier.com/wp-content/uploads/2021/07/vCHEP-191.jpg
CMS seeks support for Lebanese colleagues https://cerncourier.com/a/cms-seeks-support-for-lebanese-colleagues/ Fri, 23 Apr 2021 14:41:08 +0000 https://preview-courier.web.cern.ch/?p=92140 The CMS collaboration, in partnership with the Geneva-based Sharing Knowledge Foundation, has launched a fundraising initiative to support the Lebanese scientific community during an especially difficult period.

The post CMS seeks support for Lebanese colleagues appeared first on CERN Courier.

Lebanese scientists at CERN

The CMS collaboration, in partnership with the Geneva-based Sharing Knowledge Foundation, has launched a fundraising initiative to support the Lebanese scientific community during an especially difficult period. Lebanon signed an international cooperation agreement with CERN in 2016, which triggered a strong development of the country’s contributions to CERN projects, particularly to the CMS experiment through the affiliation of four of its top universities. Yet the country is dealing with an unprecedented economic crisis, food shortages, Syrian refugees and the COVID-19 pandemic, all in the aftermath of the Beirut port explosion in August 2020.

“Even the most resilient higher-education institutions in Lebanon are struggling to survive,” says CMS collaborator Martin Gastal of CERN, who initiated the fundraising activity in March. “Despite these challenges, the Lebanese scientific community has reaffirmed its commitment to CERN and CMS, but it needs support.”

One project, High-Performance Computing for Lebanon (HPC4L), which was initiated to build Lebanon’s research capacity while contributing as a Tier-2 centre to the analysis of CMS data, is particularly at risk. HPC4L was due to benefit from servers donated by CERN to Lebanon, and from the transfer of CERN and CMS knowledge and expertise to train a dedicated support team that will run a high-performance computing facility there. But the hardware could not be shipped from CERN because of a lack of available funding. CMS and the Sharing Knowledge Foundation are therefore fundraising to cover the shipping costs of the donated hardware, to purchase hardware to allow its installation, and to support Lebanese experts while they are trained at CERN by the CMS offline computing team.

“At this pivotal moment, every effort to help Lebanon counts,” says Gastal. “CMS is reaching out for donations to support this initiative, to help both the Lebanese research community and the country itself.”

More information, including how to get involved, can be found at: cern.ch/fundraiser-lebanon. 

The post CMS seeks support for Lebanese colleagues appeared first on CERN Courier.

News The CMS collaboration, in partnership with the Geneva-based Sharing Knowledge Foundation, has launched a fundraising initiative to support the Lebanese scientific community during an especially difficult period. https://cerncourier.com/wp-content/uploads/2021/04/CCMayJun21_NA_lebanon.jpg
An anomalous moment for the muon https://cerncourier.com/a/an-anomalous-moment-for-the-muon/ Wed, 14 Apr 2021 12:58:58 +0000 https://preview-courier.web.cern.ch/?p=92019 To confidently discover new physics in the muon g−2 anomaly requires that data-driven and lattice-QCD calculations of the Standard-Model value agree, write Thomas Blum, Luchang Jin and Christoph Lehner.

The post An anomalous moment for the muon appeared first on CERN Courier.

Hadronic light-by-light computation

A fermion’s spin tends to twist to align with a magnetic field – an effect that becomes dramatically macroscopic when electron spins twist together in a ferromagnet. Microscopically, the tiny magnetic moment of a fermion interacts with the external magnetic field through absorption of photons that comprise the field. Quantifying this picture, the Dirac equation predicts fermion magnetic moments to be precisely two in units of Bohr magnetons, e/2m. But virtual lines and loops add an additional 0.1% or so to this value, giving rise to an “anomalous” contribution known as “g–2” to the particle’s magnetic moment, caused by quantum fluctuations. Calculated to tenth order in quantum electrodynamics (QED), and verified experimentally to about two parts in 10¹⁰, the electron’s magnetic moment is one of the most precisely known numbers in the physical sciences. While also measured precisely, the magnetic moment of the muon, however, is in tension with the Standard Model.

Tricky comparison

The anomalous magnetic moment of the muon was first measured at CERN in 1959, and prior to 2021, was most recently measured by the E821 experiment at Brookhaven National Laboratory (BNL) 16 years ago. The comparison between theory and data is much trickier than for electrons. Being short-lived, muons are less suited to experiments with Penning traps, whereby stable charged particles are confined using static electric and magnetic fields, and the trapped particles are then cooled to allow precise measurements of their properties. Instead, experiments infer how quickly muon spins precess in a storage ring – a situation similar to the wobbling of a spinning top, where information on the muon’s advancing spin is encoded in the direction of the electron that is emitted when it decays. Theoretical calculations are also more challenging, as hadronic contributions are no longer so heavily suppressed when they emerge as virtual particles from the more massive muon.

All told, our knowledge of the anomalous magnetic moment of the muon is currently three orders of magnitude less precise than for electrons. And while everything tallies up, more or less, for the electron, BNL’s longstanding measurement of the magnetic moment of the muon is 3.7σ greater than the Standard Model prediction (see panel “Rising to the moment”). The possibility that the discrepancy could be due to virtual contributions from as-yet-undiscovered particles demands ever more precise theoretical calculations. This need is now more pressing than ever, given the increased precision of the experimental value expected in the next few years from the Muon g–2 collaboration at Fermilab in the US and other experiments such as the Muon g–2/EDM collaboration at J-PARC in Japan. Hotly anticipated results from the first data run at Fermilab’s E989 experiment were released on 7 April. The new result is completely consistent with the BNL value but with a slightly smaller error, leading to a slightly larger discrepancy of 4.2σ with the Standard Model when the measurements are combined (see Fermilab strengthens muon g-2 anomaly).

Hadronic vacuum polarisation

The value of the muon anomaly, aμ, is an important test of the Standard Model because currently it is known very precisely – to roughly 0.5 parts per million (ppm) – in both experiment and theory. QED dominates the value of aμ, but due to the non-perturbative nature of QCD it is strong interactions that contribute most to the error. The theoretical uncertainty on the anomalous magnetic moment of the muon is currently dominated by so-called hadronic vacuum polarisation (HVP) diagrams. In HVP, a virtual photon briefly explodes into a “hadronic blob”, before being reabsorbed, while the magnetic-field photon is simultaneously absorbed by the muon. While of order α² in QED, it is all orders in QCD, making for very difficult calculations.

Rising to the moment


In the Standard Model, the magnetic moment of the muon is computed order-by-order in powers of α for QED (each virtual photon represents a factor of α), and to all orders in αs for QCD.

At the lowest order in QED, the Dirac term (pictured left) accounts for precisely two Bohr magnetons and arises purely from the muon (μ) and the real external photon (γ) representing the magnetic field.

 

At higher orders in QED, virtual Standard Model particles, depicted by lines forming loops, contribute to a fractional increase of aμ with respect to that value: the so-called anomalous magnetic moment of the muon. It is defined to be aμ = (g–2)/2, where g is the gyromagnetic ratio of the muon – the number of Bohr magnetons, e/2m, which make up the muon’s magnetic moment. According to the Dirac equation, g = 2, but radiative corrections increase its value.

The biggest contribution is from the Schwinger term (pictured left, O(α)) and higher-order QED diagrams.

 

aμQED = (116 584 718.931 ± 0.104) × 10⁻¹¹

Electroweak lines (pictured left) also make a well-defined contribution. These diagrams are suppressed by the heavy masses of the Higgs, W and Z bosons.

aμEW = (153.6 ± 1.0) × 10⁻¹¹

The biggest QCD contribution is due to hadronic vacuum polarisation (HVP) diagrams. These are computed from leading order (pictured left, O(α²)), with one “hadronic blob” at all orders in αs (shaded) up to next-to-next-to-leading order (NNLO, O(α⁴), with three hadronic blobs) in the HVP.

 

 

Hadronic light-by-light scattering (HLbL, pictured left at O(α³) and all orders in αs (shaded)) makes a smaller contribution but with a larger fractional uncertainty.

 

 

 

Neglecting lattice-QCD calculations for the HVP in favour of those based on e+e– data and phenomenology, the total anomalous magnetic moment is given by

aμSM = aμQED + aμEW + aμHVP + aμHLbL = (116 591 810 ± 43) × 10⁻¹¹.

This is somewhat below the combined value from the E821 experiment at BNL in 2004 and the E989 experiment at Fermilab in 2021.

aμexp = (116 592 061 ± 41) × 10⁻¹¹

The discrepancy has roughly 4.2σ significance:

aμexp – aμSM = (251 ± 59) × 10⁻¹¹.
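
As a quick numerical cross-check of the quoted numbers, the difference and its uncertainty follow from combining the experimental and theoretical errors in quadrature (a minimal sketch, assuming the two uncertainties are independent):

```python
import math

# Values in units of 1e-11, as quoted above.
a_exp, err_exp = 116_592_061, 41
a_sm, err_sm = 116_591_810, 43

diff = a_exp - a_sm                      # 251
err = math.sqrt(err_exp**2 + err_sm**2)  # ~59
print(f"a_exp - a_SM = ({diff} +/- {err:.0f}) x 1e-11, i.e. {diff/err:.1f} sigma")
```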

Historically, and into the present, HVP is calculated using a dispersion relation and experimental data for the cross section for e+e– → hadrons. This idea was born of necessity almost 60 years ago, before QCD was even on the scene, let alone calculable. The key realisation is that the imaginary part of the vacuum polarisation is directly related to the hadronic cross section via the optical theorem of wave-scattering theory; a dispersion relation then relates the imaginary part to the real part. The cross section is determined over a relatively wide range of energies, in both exclusive and inclusive channels. The dominant contribution – about three quarters – comes from the e+e– → π+π– channel, which peaks at the rho meson mass, 775 MeV. Though the integral converges rapidly with increasing energy, data are needed over a relatively broad region to obtain the necessary precision. Above the τ mass, QCD perturbation theory hones the calculation.
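
Schematically, the leading-order HVP contribution is obtained by weighting the measured R-ratio, R(s) = σ(e+e– → hadrons)/σ(e+e– → μ+μ–), with a known QED kernel and integrating over the squared centre-of-mass energy s. The sketch below implements the standard textbook form of this dispersion integral with a crude toy R(s); it is meant only to show the structure of the calculation, since the real evaluation relies on the measured cross sections.

```python
import numpy as np
from scipy.integrate import quad

m_mu = 0.105658        # muon mass in GeV
alpha = 1 / 137.035999

def kernel(s):
    """QED kernel K(s) of the leading-order HVP dispersion integral."""
    f = lambda x: x**2 * (1 - x) / (x**2 + (1 - x) * s / m_mu**2)
    return quad(f, 0.0, 1.0)[0]

def R_toy(s):
    """Crude stand-in for the measured R-ratio: a bump near the rho mass
    plus a flat continuum. Real analyses use e+e- -> hadrons data here."""
    rho = 50.0 * np.exp(-0.5 * ((np.sqrt(s) - 0.775) / 0.07) ** 2)
    return rho + (2.0 if s > 1.5 else 0.0)

def integrand(s):
    return kernel(s) * R_toy(s) / s

s_th = (2 * 0.13957) ** 2  # two-pion threshold in GeV^2
a_mu_hvp = alpha**2 / (3 * np.pi**2) * quad(integrand, s_th, 40.0,
                                            points=[0.6, 1.5], limit=200)[0]
print(f"toy a_mu(HVP, LO) = {a_mu_hvp:.2e}  (structure only, toy R(s))")
```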

Several groups have computed the HVP contribution in this way, and recently a consensus value has been produced as part of the worldwide Muon g–2 Theory Initiative. The error stands at about 0.58% and is the dominant part of the theory error. It is worth noting that a significant part of the error arises from a tension between the most precise measurements, by the BaBar and KLOE experiments, around the rho–meson peak. New measurements, including those from experiments at Novosibirsk, Russia and Japan’s Belle II experiment, may help resolve the inconsistency in the current data and reduce the error by a factor of two or so. 

The alternative approach, of calculating the HVP contribution from first principles using lattice QCD, is not yet at the same level of precision, but is getting there. Consistency between the two approaches will be crucial for any claim of new physics.

Lattice QCD

Kenneth Wilson formulated lattice gauge theory in 1974 as a means to rid quantum field theories of their notorious infinities – a process known as regulating the theory – while maintaining exact gauge invariance, but without using perturbation theory. Lattice QCD calculations involve the very high-dimensional integration of the QCD path integral. Because of confinement, a perturbative treatment including physical hadronic states is not possible, so the complete integral, regulated properly in a discrete, finite volume, is done numerically by Monte Carlo integration.

Lattice QCD has made significant improvements over the last several years, both in methodology and invested computing time. Recently developed methods (which rely on low-lying eigenmodes of the Dirac operator to speed up calculations) have been especially important for muon–anomaly calculations. By allowing state-of-the-art calculations using physical masses, they remove a significant systematic: the so-called chiral extrapolation for the light quarks. The remaining systematic errors arise from the finite volume and non-zero lattice spacing employed in the simulations. These are handled by doing multiple simulations and extrapolating to the infinite-volume and zero-lattice-spacing limits. 
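
The continuum limit, for instance, is typically taken by computing the same quantity at several lattice spacings a and fitting the leading discretisation effects, which for many lattice actions scale as a². A minimal sketch with made-up data points:

```python
import numpy as np
from scipy.optimize import curve_fit

# Made-up results for some observable computed at four lattice spacings
# (in fm), with statistical errors from the Monte Carlo sampling.
a = np.array([0.12, 0.09, 0.06, 0.04])
value = np.array([0.668, 0.681, 0.688, 0.691])
error = np.array([0.004, 0.003, 0.003, 0.004])

# Fit V(a) = V_cont + c * a^2 and read off the a -> 0 limit.
def model(a, v_cont, c):
    return v_cont + c * a**2

popt, pcov = curve_fit(model, a, value, sigma=error, absolute_sigma=True)
print(f"continuum-limit value: {popt[0]:.4f} +/- {np.sqrt(pcov[0, 0]):.4f}")
```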

The HVP contribution can readily be computed using lattice QCD in Euclidean space with space-like four-momenta in the photon loop, thus yielding the real part of the HVP directly. The dispersive result is currently more precise (see “Off the mark” figure), but further improvements will depend on consistent new e+e– scattering datasets.

Hadronic vacuum-polarisation contribution

Rapid progress in the last few years has resulted in first lattice results with sub-percent uncertainty, closing in on the precision of the dispersive approach. Since these lattice calculations are very involved and still maturing, it will be crucial to monitor the emerging picture once several precise results with different systematic approaches are available. It will be particularly important to aim for statistics-dominated errors to make it more straightforward to quantitatively interpret the resulting agreement with the no-new-physics scenario or the dispersive results. In the shorter term, it will also be crucial to cross-check between different lattice and dispersive results using additional observables, for example based on the vector–vector correlators.

With improved lattice calculations in the pipeline from a number of groups, the tension between lattice QCD and phenomenological calculations may well be resolved before the Fermilab and J-PARC experiments announce their final results. Interestingly, there is a new lattice result with sub-percent precision (BMW 2020) that is in agreement both with the no-new-physics point within 1.3σ, and with the dispersive-data-driven result within 2.1σ. Barring a significant re-evaluation of the phenomenological calculation, however, HVP does not appear to be the source of the discrepancy with experiments. 

The next most likely Standard Model process to explain the muon anomaly is hadronic light-by-light scattering. Though it occurs less frequently since it includes an extra virtual photon compared to the HVP contribution, it is much less well known, with comparable uncertainties to HVP.

Hadronic light-by-light scattering

In hadronic light-by-light scattering (HLbL), the magnetic field interacts not with the muon, but with a hadronic “blob”, which is connected to the muon by three virtual photons. (The interaction of the four photons via the hadronic blob gives HLbL its name.) A miscalculation of the HLbL contribution has often been proposed as the source of the apparently anomalous measurement of the muon anomaly by BNL’s E821 collaboration.

Since the so-called Glasgow consensus (the fruit of a 2009 workshop) first established a value more than 10 years ago, significant progress has been made on the analytic computation of the HLbL scattering contribution. In particular, a dispersive analysis of the most important hadronic channels has been carried out, including the leading pion–pole, sub-leading pion loop and rescattering diagrams including heavier pseudoscalars. These calculations are analogous in spirit to the dispersive HVP calculations, but are more complicated, and the experimental measurements are more difficult because form factors with one or two virtual photons are required. 

The project to calculate the HLbL contribution using lattice QCD began more than 10 years ago, and many improvements to the method have been made to reduce both statistical and systematic errors since then. Last year we published, with colleagues Norman Christ, Taku Izubuchi and Masashi Hayakawa, the first ever lattice-QCD calculation of the HLbL contribution with all errors controlled, finding aμHLbL, lattice = (78.7 ± 30.6 (stat) ± 17.7 (sys)) × 10⁻¹¹. The calculation was not easy: it took four years and a billion core-hours on the Mira supercomputer at the Argonne Leadership Computing Facility.

Our lattice HLbL calculations are quite consistent with the analytic and data-driven result, which is approximately a factor of two more precise. Combining the results leads to aμHLbL = (90 ± 17) × 10⁻¹¹, which means the very difficult HLbL contribution cannot explain the Standard Model discrepancy with experiment. To make such a strong conclusion, however, it is necessary to have consistent results from at least two completely different methods of calculating this challenging non-perturbative quantity.

New physics?

If current theory calculations of the muon anomaly hold up, and the new experiments reduce its uncertainty by the hoped-for factor of four, then a new-physics explanation will become impossible to ignore. The idea would be to add particles and interactions that have not yet been observed but may soon be discovered at the LHC or in future experiments. New particles would be expected to contribute to the anomaly through Feynman diagrams similar to the Standard Model topologies (see “Rising to the moment” panel).

Calculations of the anomalous magnetic moment of the muon are not finished

The most commonly considered new-physics explanation is supersymmetry, but the increasingly stringent lower limits placed on the masses of super-partners by the LHC experiments make it ever more difficult to explain the muon anomaly this way. Other theories could do the job too. One popular idea that could also explain persistent anomalies in the b-quark sector is heavy scalar leptoquarks, which mediate a new interaction allowing leptons and quarks to change into each other. Another option involves scenarios whereby the Standard Model Higgs boson is accompanied by a heavier Higgs-like boson.

The calculations of the anomalous magnetic moment of the muon are not finished. As a systematically improvable method, we expect more precise lattice determinations of the hadronic contributions in the near future. Increasingly powerful algorithms and hardware resources will further improve precision on the lattice side, and new experimental measurements and analysis methods will do the same for dispersive studies of the HVP and HLbL contributions.

To confidently discover new physics requires that these two independent approaches to the Standard Model value agree. With the first new results on the experimental value of the muon anomaly in almost two decades showing perfect agreement with the old value, we anxiously await more precise measurements in the near future. Our hope is that the clash of theory and experiment will be the beginning of an exciting new chapter of particle physics, heralding new discoveries at current and future particle colliders. 

The post An anomalous moment for the muon appeared first on CERN Courier.

Feature To confidently discover new physics in the muon g−2 anomaly requires that data-driven and lattice-QCD calculations of the Standard-Model value agree, write Thomas Blum, Luchang Jin and Christoph Lehner. https://cerncourier.com/wp-content/uploads/2021/04/Muon-g-2_feature.jpg
Tooling up to hunt dark matter https://cerncourier.com/a/tooling-up-to-hunt-dark-matter/ Thu, 04 Mar 2021 13:33:55 +0000 https://preview-courier.web.cern.ch/?p=91450 The TOOLS 2020 conference attracted around 200 phenomenologists and experimental physicists to work on numerical tools for dark-matter models, and more.

The post Tooling up to hunt dark matter appeared first on CERN Courier.

Bullet Cluster

The past century has seen ever stronger links forged between the physics of elementary particles and the universe at large. But the picture is mostly incomplete. For example, numerous observations indicate that 87% of the matter of the universe is dark, suggesting the existence of a new matter constituent. Given a plethora of dark-matter candidates, numerical tools are essential to advance our understanding. Fostering cooperation in the development of such software, the TOOLS 2020 conference attracted around 200 phenomenologists and experimental physicists for a week-long online workshop in November.

The viable mass range for dark matter spans 90 orders of magnitude, while the uncertainty about its interaction cross section with ordinary matter is even larger (see "Theoretical landscape" figure). Dark matter may be new particles belonging to theories beyond the Standard Model (BSM), an aggregate of new or SM particles, or very heavy objects such as primordial black holes (PBHs). On the latter subject, Jérémy Auffinger (IP2I Lyon) updated TOOLS 2020 delegates on codes for very light PBHs, noting that "BlackHawk" is the first open-source code for Hawking-radiation calculations.

Flourishing models

Weakly interacting massive particles (WIMPs) have enduring popularity as dark-matter candidates, and are amenable to search strategies ranging from colliders to astrophysical observations. In the absence of any clear detection of WIMPs at the electroweak scale, the number of models has flourished. Above the TeV scale, these include general hidden-sector models, FIMPs (feebly interacting massive particles), SIMPs (strongly interacting massive particles), super-heavy and/or composite candidates and PBHs. Below the GeV scale, besides FIMPs, candidates include the QCD axion, more generic ALPs (axion-like particles) and ultra-light bosonic candidates. ALPs are a class of models that received particular attention at TOOLS 2020, and are now being sought in fixed-target experiments across the globe.

For each dark-matter model, astroparticle physicists must compute the theoretical predictions and characteristic signatures of the model and confront those predictions with the experimental bounds to select the model parameter space that is consistent with observations. To this end, the past decade has seen the development of a huge variety of software – a trend mapped and encouraged by the TOOLS conference series, initiated by Fawzi Boudjema (LAPTh Annecy) in 1999, which has brought the community together every couple of years since.

Models connecting dark matter with collider experiments are becoming ever more optimised to the needs of users

Three continuously tested codes currently dominate generic BSM dark-matter model computations. Each allows for the computation of relic density from freeze-out and predictions for direct and indirect detection, often up to next-to-leading-order corrections. Agreement between them is maintained at below the per-cent level. "micrOMEGAs" is by far the most used code, and is capable of predicting observables for any generic model of WIMPs, including those with multiple dark-matter candidates. "DarkSUSY" is more oriented towards supersymmetric theories, but it can be used for generic models as the code has a very convenient modular structure. Finally, "MadDM" can compute WIMP observables for any BSM model from MeV to hundreds of TeV. As MadDM is a plugin of MadGraph, it inherits unique features such as its automatic computation of new dark-matter observables, including indirect-detection processes with an arbitrary number of final-state particles and loop-induced processes. This is essential for analysing sharp spectral features in indirect-detection gamma-ray measurements that cannot be mimicked by any known astrophysical background.
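For orientation, the central task these codes automate for WIMPs is solving the freeze-out Boltzmann equation for the comoving yield Y = n/s. The sketch below is a deliberately simplified, stand-alone illustration of that calculation, not the API of micrOMEGAs, DarkSUSY or MadDM; it assumes a constant thermally averaged cross section, constant effective degrees of freedom and a standard radiation-dominated early universe.

```python
import numpy as np
from scipy.integrate import solve_ivp

M_PLANCK = 1.22e19   # Planck mass in GeV
G_STAR = 90.0        # effective relativistic degrees of freedom, taken constant
G_CHI = 2.0          # internal degrees of freedom of the dark-matter particle

def y_eq(x):
    """Non-relativistic equilibrium yield Y_eq = n_eq/s for x = m/T >> 1."""
    return 0.145 * (G_CHI / G_STAR) * x**1.5 * np.exp(-x)

def omega_h2(mass_gev, sigma_v_gev2):
    """Integrate dY/dx = -(lam/x^2)(Y^2 - Y_eq^2) and convert Y at large x to a relic density."""
    lam = np.sqrt(np.pi / 45.0) * np.sqrt(G_STAR) * M_PLANCK * mass_gev * sigma_v_gev2
    rhs = lambda x, y: [-(lam / x**2) * (y[0]**2 - y_eq(x)**2)]
    sol = solve_ivp(rhs, (10.0, 1000.0), [y_eq(10.0)],
                    method="LSODA", rtol=1e-8, atol=1e-16)
    return 2.755e8 * mass_gev * sol.y[0, -1]   # Omega h^2 ~ 2.755e8 (m/GeV) Y_inf

# 100 GeV WIMP with <sigma v> ~ 3e-26 cm^3/s (about 2.6e-9 GeV^-2)
print(omega_h2(100.0, 2.6e-9))
```

This toy version lands at Ω h² of order 0.1 for the canonical parameters; the production codes add temperature-dependent degrees of freedom, coannihilations, resonances and velocity-dependent cross sections on top of this skeleton.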

Interaction cross sections versus mass

Both micrOMEGAs and MadDM permit the user to confront theories with recast experimental likelihoods for several direct and indirect detection experiments. Jan Heisig (UCLouvain) reported that this is a work in progress, with many more experimental data sets to be included shortly. Torsten Bringmann (University of Oslo) noted that a strength of DarkSUSY is the modelling of qualitatively different production mechanisms in the early universe. Alongside the standard freeze-out mechanism, several new scenarios can arise, such as freeze-in (FIMP models, as chemical and kinetic equilibrium cannot be achieved), dark freeze-out, reannihilation and “cannibalism”, to name just a few. Freeze-in is now supported by micrOMEGAs.

Models connecting dark matter with collider experiments are becoming ever more optimised to the needs of users. For example, micrOMEGAs interfaces with SModelS, which is capable of quickly applying all possible LHC-relevant supersymmetric searches. The software also includes long-lived particles, as commonly found in FIMP models. As MadDM is embedded in MadGraph, noted Benjamin Fuks (LPTHE Paris), tools such as MadAnalysis may be used to recast CMS and ATLAS searches. Celine Degrande (UCLouvain) described another useful tool, FeynRules, which produces model files in both the MadDM and micrOMEGAs formats given the Lagrangian for the BSM model, providing a very useful automated chain from the model directly to the dark-matter observables, high-energy predictions and comparisons with experimental results. Meanwhile, MadDump extends MadGraph's predictions and detector simulations beyond the high-energy collider regime to fixed-target experiments such as NA62. To complete a vibrant landscape of development efforts, Tomas Gonzalo (Monash) presented the GAMBIT collaboration's work to provide tools for global fits to generic dark-matter models.

A phenomenologist's dream

Huge efforts are underway to develop a computational platform to study new directions in experimental searches for dark matter, and TOOLS 2020 showed that we are already very close to the phenomenologist’s dream for WIMPs. TOOLS 2020 wasn’t just about dark matter either – it also covered developments in Higgs and flavour physics, precision tests and general fitting, and other tools. Interested parties are welcome to join in the next TOOLS conference due to take place in Annecy in 2022.

The post Tooling up to hunt dark matter appeared first on CERN Courier.

Meeting report The TOOLS 2020 conference attracted around 200 phenomenologists and experimental physicists to work on numerical tools for dark-matter models, and more. https://cerncourier.com/wp-content/uploads/2021/02/CCMarApr21_FN_bulletcluster.jpg
HPC computing collaboration kicks off https://cerncourier.com/a/hpc-computing-collaboration-kicks-off/ Wed, 10 Feb 2021 15:18:50 +0000 https://preview-courier.web.cern.ch/?p=91159 The new collaboration will work to realise the full potential of the coming generation of high-performance computing technology for data-intensive science.

The post HPC computing collaboration kicks off appeared first on CERN Courier.

CERN has welcomed more than 120 delegates to an online kick-off workshop for a new collaboration on high-performance computing (HPC). CERN, SKAO (the organisation leading the development of the Square Kilometre Array), GÉANT (the pan-European network and services provider for research and education) and PRACE (the Partnership for Advanced Computing in Europe) will work together to realise the full potential of the coming generation of HPC technology for data-intensive science.

It is an exascale project for an exascale problem

Maria Girone

“It is an exascale project for an exascale problem,” said Maria Girone, CERN coordinator of the collaboration and CERN openlab CTO, in opening remarks at the workshop. “HPC is at the intersection of several important R&D activities: the expansion of computing resources for important data-intensive science projects like the HL-LHC and the SKA, the adoption of new techniques such as artificial intelligence and machine learning, and the evolution of software to maximise the potential of heterogeneous hardware architectures.”

The 29 September workshop, which was organised with the support of CERN openlab, saw participants establish the collaboration’s foundations, outline initial challenges and begin to define the technical programme. Four main initial areas of work were discussed at the event: training and centres of expertise, benchmarking, data access, and authorisation and authentication.

One of the largest challenges in using new HPC technology is the need to adapt to heterogeneous hardware. This involves the development and dissemination of new programming skills, which is at the core of the new HPC collaboration’s plan. A number of examples showing the potential of heterogeneous systems were discussed. One is the EU-funded DEEP-EST project, which is developing a modular supercomputing prototype for exascale computing. DEEP-EST has already contributed to the re-engineering of high-energy physics algorithms for accelerated architectures, highlighting the significant mutual benefits of collaboration across fields when it comes to HPC. PRACE’s excellent record of providing support and training will also be critical to the success of the collaboration.

Benchmarking progress

Establishing a common benchmark suite will help the organisations to measure and compare the performance of different types of computing resources for data-analysis workflows from astronomy and particle physics. The suite will include applications representative of the HEP and astrophysics communities – reflecting today’s needs, as well as those of the future – and augment the existing Unified European Applications Benchmark Suite.
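The idea behind such a suite can be illustrated with a minimal harness: run representative workloads several times and report a throughput figure that can be compared across sites. The workload below is a stand-in, not one of the actual HEP or astrophysics benchmarks, and the scoring convention is an assumption for illustration only.

```python
import statistics
import time

def throughput(workload, repeats=3):
    """Run a workload several times and report its median rate in events per second."""
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        events = workload()
        rates.append(events / (time.perf_counter() - start))
    return statistics.median(rates)

def toy_reconstruction(n_events=200_000):
    """Stand-in for a real workload: a little floating-point work per event."""
    total = 0.0
    for i in range(n_events):
        total += (i % 97) ** 0.5
    return n_events

suite = {"toy-reco": toy_reconstruction}
scores = {name: throughput(workload) for name, workload in suite.items()}
# A site score could then be, for example, a geometric mean of ratios to a reference machine.
print(scores)
```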

Access is another challenge when using HPC resources. Data from the HL-LHC and the SKA will be globally distributed and will be moved over high-capacity networks, staged and cached to reduce latency, and eventually processed, analysed and redistributed. Accessing the HPC resources themselves involves adherence to strict cyber-security protocols. A technical area devoted to authorisation and authentication infrastructure is defining demonstrators to enable large scientific communities to securely access protected resources.

The collaboration will now move forward with its ambitious technical programme. Working groups are forming around specific challenges, with the partner organisations providing access to appropriate testbed resources. Important activities are already taking place in all four areas of work, and a second collaboration workshop will soon be organised.

The post HPC computing collaboration kicks off appeared first on CERN Courier.

Meeting report The new collaboration will work to realise the full potential of the coming generation of high-performance computing technology for data-intensive science. https://cerncourier.com/wp-content/uploads/2021/02/SKA-at-Night-191.jpg
Learning language by machine https://cerncourier.com/a/learning-language-by-machine/ Fri, 05 Feb 2021 08:14:06 +0000 https://preview-courier.web.cern.ch/?p=86628 Mait Müntel left physics to found Lingvist, an education company harnessing big data and artificial intelligence to accelerate language learning.

The post Learning language by machine appeared first on CERN Courier.

Lingvist CEO Mait Müntel talks to Rachel Bray

Mait Müntel came to CERN as a summer student in 2004 and quickly became hooked on particle physics, completing a PhD in the CMS collaboration in 2008 with a thesis devoted to signatures of doubly charged Higgs bosons. Continuing in the field, he was one of the first to do shifts in the CMS control room when the LHC ramped up. It was then that he realised that the real LHC data looked nothing like the Monte Carlo simulations of his student days. Many things had to be rectified, but Mait admits he was none too fond of coding and didn't have any formal training. "I thought I would simply 'learn by doing'," he says. "However, with hindsight, I should probably have been more systematic in my approach." Little did he know that, within a few years, he would be running a company with around 40 staff developing advanced language-learning algorithms.

Memory models

Despite spending long periods in the Geneva region, Mait had not found the time to pick up French. Frustrated, he began to take an interest in the use of computers to help humans learn languages at an accelerated speed. “I wanted to analyse from a statistical point of view the language people were actually speaking, which, having spent several years learning both Russian and English, I was convinced was very different to what is found in academic books and courses,” he says. Over the course of one weekend, he wrote a software crawler that enabled him to download a collection of French subtitles from a film database. His next step was to study memory models to understand how one acquires new knowledge, calculating that, if a computer program could intelligently decide what would be optimal to learn in the next moment, it would be possible to learn a language in only 200 hours. He started building some software using ROOT (the object-oriented program and library developed by CERN for data analysis) and, within two weeks, was able to read a proper book in French. “I had included a huge book library in the software and as the computer knew my level of vocabulary, it could recommend books for me. This was immensely gratifying and pushed me to progress even further.” Two months later, he passed the national French language exam in Estonia.
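To illustrate the kind of memory modelling described here (this is a generic sketch, not Lingvist's actual algorithm), one common approach is to give every item an exponential forgetting curve with its own half-life and always review the item the learner is closest to forgetting:

```python
import time

class Card:
    """One vocabulary item with its own memory half-life, in hours."""
    def __init__(self, word):
        self.word = word
        self.half_life = 1.0          # initial guess: new memories fade quickly
        self.last_seen = time.time()

    def recall_probability(self, now=None):
        """Exponential forgetting curve: p = 2 ** (-elapsed / half_life)."""
        elapsed_hours = ((now or time.time()) - self.last_seen) / 3600.0
        return 2.0 ** (-elapsed_hours / self.half_life)

    def review(self, correct):
        """Strengthen the memory on a correct answer, weaken it on a lapse."""
        self.half_life *= 2.0 if correct else 0.5
        self.last_seen = time.time()

def next_card(deck, threshold=0.9):
    """Pick the card the learner is most likely to be forgetting."""
    due = [c for c in deck if c.recall_probability() < threshold]
    return min(due, key=lambda c: c.recall_probability()) if due else None
```

Fitting the per-item half-lives to real answer histories is where the statistics comes in; the scheduling logic itself stays this simple.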

Mait became convinced that he had to do something with his idea. So he went on holiday and hired two software developers to develop his code so it would work on the web. Whilst on holiday, he happened to meet a friend of a friend, who helped him set up Lingvist as a company. Estonia, he says, has a fantastic start-up and software-development culture thanks to Skype, which was invented there. Later, at a conference, Mait met the technical co-founder of Skype, who coincidentally had been working on software to accelerate human learning. The co-founder dropped his own attempts and became Lingvist's first investor.

Short-term memory capabilities can differ between five minutes and two seconds!

Mait Müntel

The pair secured a generous grant from the European Union Horizon 2020 programme and things were falling into place, though it wasn’t all easy says Mait: “You can use the analogy of sitting in a nice warm office at CERN, surrounded by beautiful mountains. In the office, you are safe and protected, but if you go outside and climb the mountains, you encounter rain and hail, it is an uphill struggle and very uncomfortable, but immensely satisfying when you reach the summit. Even if you work more than 100 hours per week.”

Lingvist currently has three million users, and Mait is convinced that the technology can be applied to all types of education. “What our data have demonstrated is that levels of learning in people are very different. Short-term memory capabilities can differ between five minutes and two seconds! Currently, based on our data, the older generation has much better memory characteristics. The benefit of our software is that it measures memory, and no matter one’s retention capabilities, the software will help improve retention rates.”

New talents

Faced with a future where artificial intelligence will make many jobs extinct, and many people will need to retrain, competitiveness will be derived from the speed at which people can learn, says Mait. He is now building Lingvist’s data-science research team to grow the company to its full potential, and is always on the lookout for new CERN talent. “Traditionally, physicists have excellent modelling, machine-learning and data-analysis skills, even though they might not be aware of it,” he says.

The post Learning language by machine appeared first on CERN Courier.

Careers Mait Müntel left physics to found Lingvist, an education company harnessing big data and artificial intelligence to accelerate language learning. https://cerncourier.com/wp-content/uploads/2020/02/CCMarApr_Careers.jpg
A unique period for computing, but will it last? https://cerncourier.com/a/a-unique-period-for-computing-but-will-it-last/ Wed, 18 Nov 2020 09:35:20 +0000 https://preview-courier.web.cern.ch/?p=89989 The computing demands expected this decade puts HEP in a similar position to 1995 when the field moved to PCs, argues Sverre Jarp.

The post A unique period for computing, but will it last? appeared first on CERN Courier.

Monica Marinucci and Ivan Deloose

Twenty-five years ago in Rio de Janeiro, at the 8th International Conference on Computing in High-Energy and Nuclear Physics (CHEP-95), I presented a paper on behalf of my research team titled “The PC as Physics Computer for LHC”. We highlighted impressive improvements in price and performance compared to other solutions on offer. In the years that followed, the community started moving to PCs in a massive way, and today the PC remains unchallenged as the workhorse for high-energy physics (HEP) computing.

HEP-computing demands have always been greater than the available capacity. However, our community does not have the financial clout to dictate the way computing should evolve, so constant innovation and research in computing and IT are needed to maintain progress. A few years before CHEP-95, RISC workstations and servers had started complementing the mainframes that had been acquired at high cost at the start-up of LEP in 1989. We thought we could do even better than RISC. The increased-energy LEP2 phase needed lots of simulation, and the same needs were already manifest for the LHC. These were the inspirations that led PC servers to start populating our computer centres – a move that was also helped by a fair amount of luck.

Fast change

HEP programs need good floating-point performance, and early generations of the Intel x86 processors, such as the 486/487 chips, offered only mediocre capabilities in this respect. The Pentium processors that emerged in the mid-1990s changed the scene significantly, and the competitive race between Intel and AMD was a major driver of continued hardware innovation.

Another strong tailwind came from the relentless efforts to shrink transistor sizes in line with Moore’s law, which saw processor speeds increase from 50/100 MHz to 2000/3000 MHz in little more than a decade. After 2006, when speed increases became impossible for thermal reasons, efforts moved to producing multi-core chips. However, HEP continued to profit. Since all physics events at colliders such as the LHC are independent of all others, it was sufficient to split a job into multiple jobs across all cores.
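A minimal sketch of this "embarrassingly parallel" pattern, using Python's standard multiprocessing module rather than any experiment's real framework, shows why independent events map so naturally onto multi-core chips:

```python
import os
from multiprocessing import Pool

def process_event(event_id):
    """Stand-in for per-event reconstruction; each event is independent of all others."""
    checksum = sum((event_id * k) % 101 for k in range(1000))
    return event_id, checksum

if __name__ == "__main__":  # guard required for portable multiprocessing start-up
    with Pool(processes=os.cpu_count()) as pool:
        results = pool.map(process_event, range(10_000), chunksize=100)
    print(f"processed {len(results)} events on {os.cpu_count()} cores")
```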

Sverre Jarp

The HEP community was also lucky with software. Back in 1995 we had chosen Windows/NT as the operating system, mainly because it supported multiprocessing, which significantly enhanced our price/performance. Physicists, however, insisted on Unix. In 1991, Linus Torvalds released Linux version 0.01 and it quickly gathered momentum as a worldwide open-source project. When release 2.0 appeared in 1996, multiprocessing support was included and the operating system was quickly adopted by our community.

Furthermore, HEP adopted the Grid concept to cope with the demands of the LHC. Thanks to projects such as Enabling Grids for E-science, we built the Worldwide LHC Computing Grid, which today handles more than two million tasks across one million PC cores every 24 hours. Although grid computing remained mainly amongst scientific users, the analogous concept of cloud computing had the same cementing effect across industry. Today, all the major cloud-computing providers overwhelmingly rely on PC servers.

In 1995 we had seen a glimmer, but we had no idea that the PC would remain an uncontested winner during a quarter of a century of scientific computing. The question is whether it will last for another quarter of a century.

The contenders

The end of CPU scaling, argued a recent report by the HEP Software Foundation, demands radical changes in computing and software to ensure the success of the LHC and other experiments into the 2020s and beyond. There are many contenders that would like to replace the x86 PC architecture. It could be graphics processors, where Intel, AMD and Nvidia are all active. A wilder guess is quantum computing, whereas a more conservative guess would be processors similar to the x86, but based on other architectures, such as ARM or RISC-V.

The end of CPU scaling demands radical changes to ensure the success of the LHC and other high-energy physics experiments

During the PC project we collaborated with Hewlett-Packard, which had a division in Grenoble, not too far away. Such R&D collaborations have been vital to CERN and the community since the beginning and they remain so today. They allow us to get insight into forthcoming products and future plans, while our feedback can help to influence the products being planned. CERN openlab, which has been the focal point for such collaborations for two decades, coined the phrase "You make it, we break it" early on. However, whatever the future holds, it is fair to assume that PCs will remain the workhorse for HEP computing for many years to come.

The post A unique period for computing, but will it last? appeared first on CERN Courier.

Opinion The computing demands expected this decade puts HEP in a similar position to 1995 when the field moved to PCs, argues Sverre Jarp. https://cerncourier.com/wp-content/uploads/2020/11/CCNovDec20_VIEW_stack.jpg
CERN and quantum technologies https://cerncourier.com/a/cern-and-quantum-technologies/ Fri, 25 Sep 2020 08:15:13 +0000 https://preview-courier.web.cern.ch/?p=88645 CERN’s new quantum technology initiative has the potential to enrich and expand its challenging research programme, says Alberto Di Meglio.

The post CERN and quantum technologies appeared first on CERN Courier.

AEGIS experiment

Quantum technologies, which exploit inherent phenomena of quantum mechanics such as superposition and entanglement, have the potential to transform science and society over the next five to 10 years. This is sometimes described as the second quantum revolution, following the first that included the introduction of devices such as lasers and transistors over the past half century. Quantum technologies (QTs) require resources that are not mainstream today. During the past couple of years, dedicated support for R&D in QTs has become part of national and international research agendas, with several major initiatives underway worldwide. The time had come for CERN to engage more formally with such activities.

Following a first workshop on quantum computing in high-energy physics organised by CERN openlab in November 2018, best-effort initiatives, events and joint pilot projects have been set up at CERN to explore the interest of the community in quantum technologies (in particular quantum computing), as well as possible synergies with other research fields. In June, CERN management announced the CERN quantum technology initiative. CERN is in the unique position of having in one place the diverse set of skills and technologies – including software, computing and data science, theory, sensors, cryogenics, electronics and material science – necessary for a multidisciplinary endeavour like QT. CERN also has compelling use cases that create ideal conditions to compare classic and quantum approaches to certain applications, and has a rich network of academic and industry relations working in unique collaborations such as CERN openlab.

Alberto Di Meglio

Today, QT is organised into four main domains. One is computing, where quantum phenomena such as superposition are used to speed up certain classes of computational problems beyond the limits achievable with classical systems. A second is quantum sensing and metrology, which exploits the high sensitivity of coherent quantum systems to design new classes of precision detectors and measurement devices. The third is quantum communication, in which single or entangled photons and their quantum states are used to implement secure communication protocols across fibre-optic networks, or to build quantum memory devices able to store quantum states. The fourth domain is quantum theory, simulation and information processing, where well-controlled quantum systems are used to simulate or reproduce the behaviour of different, less accessible, many-body quantum phenomena, and relations between quantum phenomena and gravitation can be explored – a topic at the heart of CERN's theoretical research programme. There is much overlap between these four domains: for example, quantum sensors and networks can be brought together to create potentially very precise, large-scale detector systems.

Over the next three years, the quantum technology initiative will assess the potential impact of QTs on CERN and high-energy physics on the timescale of the HL-LHC and beyond. After establishing governance and operational instruments, the initiative will work to define concrete R&D objectives in the four main QT areas by the end of this year. It will also develop an international education and training programme in collaboration with leading experts, universities and industry, and identify mechanisms for knowledge sharing within the CERN Member States, the high-energy physics community, other scientific research communities and society at large. Graduate students will be selected in time for the first projects to begin in early 2021.

Joint initiatives

A number of joint collaborations are already being created across the high-energy physics community and CERN is involved in several pilot investigation projects with leading academic and research centres. On the industry side, through CERN openlab, CERN is already collaborating on quantum-related technologies with CQC, Google, IBM and Intel. The CERN quantum technology initiative will continue to forge links with industry and collaborate with the main national quantum initiatives worldwide.

Quantum technologies have the potential to transform science and society over the next five to 10 years

By taking part in this rapidly growing field, CERN not only has much to offer, but also stands to benefit directly from it. For example, QTs have strong potential in supporting the design of new sophisticated types of detectors, or in tackling the computing workloads of the physics experiments more efficiently. The CERN quantum technology initiative, by helping structure and coordinate activities with our community and the many international public and private initiatives, is a vital step to prepare for this exciting future.

The post CERN and quantum technologies appeared first on CERN Courier.

Opinion CERN’s new quantum technology initiative has the potential to enrich and expand its challenging research programme, says Alberto Di Meglio. https://cerncourier.com/wp-content/uploads/2020/09/CCSepOct_OPIN_aegis.jpg
Adapting to exascale computing https://cerncourier.com/a/adapting-to-exascale-computing/ Mon, 02 Dec 2019 10:11:19 +0000 https://preview-courier.web.cern.ch/?p=85487 CERN's Graeme Stewart tours six decades of computing milestones in high-energy physics and describes the immense challenges ahead in taming data from future experiments.

The post Adapting to exascale computing appeared first on CERN Courier.

The CERN data centre in 2016

It is impossible to envisage high-energy physics without its foundation of microprocessor technology, software and distributed computing. Almost as soon as CERN was founded the first contract to provide a computer was signed, but it took manufacturer Ferranti more than two years to deliver “Mercury”, our first valve-based behemoth, in 1958. So early did this machine arrive that the venerable FORTRAN language had yet to be invented! A team of about 10 people was required for operations and the I/O system was already a bottleneck. It was not long before faster and more capable machines were available at the lab. By 1963, an IBM 7090 based on transistor technology was available with a FORTRAN compiler and tape storage. This machine could analyse 300,000 frames of spark-chamber data – a big early success. By the 1970s, computers were important enough that CERN hosted its first Computing and Data Handling School. It was clear that computers were here to stay.

By the time of the LEP era in the late 1980s, CERN hosted multiple large mainframes. Workstations, to be used by individuals or small teams, had become feasible. DEC VAX systems were a big step forward in power, reliability and usability and their operating system, VMS, is still talked of warmly by older colleagues in the field. Even more economical machines, personal computers (PCs), were also reaching a threshold of having enough computing power to be useful to physicists. Moore’s law, which predicted the doubling of transistor densities every two years, was well established and PCs were riding this technological wave. More transistors meant more capable computers, and every time transistors got smaller, clock speeds could be ramped up. It was a golden age where more advanced machines, running ever faster, gave us an exponential increase in computing power.

A simulated HL-LHC collision event

Key also to the computing revolution, alongside the hardware, was the growth of open-source software. The GNU project had produced many utilities on which hackers and coders could base their own software. With the start of the Linux project to provide a kernel, humble PCs became increasingly capable machines for scientific computing. Around the same time, Tim Berners-Lee's proposal for the World Wide Web, which began as a tool for connecting information for CERN scientists, started to take off. CERN realised the value in releasing the web as an open standard and in doing so enabled a success that today connects almost the entire planet.

LHC computing

This interconnected world was one of the cornerstones of the computing that was envisaged for the Large Hadron Collider (LHC). Mainframes were not enough, nor were local clusters. What the LHC needed was a worldwide system of interconnected computing systems: the Worldwide LHC Computing Grid (WLCG). Not only would information need to be transferred, but huge amounts of data and millions of computer jobs would need to be moved and executed, all with a reliability that would support the LHC’s physics programme. A large investment in brand new grid technologies was undertaken, and software engineers and physicists in the experiments had to develop, deploy and operate a new grid system utterly unlike anything that had gone before. Despite rapid progress in computing power, storage space and networking, it was extremely hard to make a reliable, working distributed system for particle physics out of these pieces. Yet we achieved this incredible task. During the past decade, thousands of physics results from the four LHC experiments, including the Higgs-boson discovery, were enabled by the billions of jobs executed and the petabytes of data shipped around the world.

The software that was developed to support the LHC is equally impressive. The community had made a wholesale migration from the LEP FORTRAN era to C++ and millions of lines of code were developed. Huge software efforts in every experiment produced frameworks that managed data taking and reconstruction of raw events to analysis data. In simulation, the Geant4 toolkit enabled the experiments to begin data-taking at the LHC with a fantastic level of understanding of the extraordinarily complex detectors, enabling commissioning to take place at a remarkable rate. The common ROOT foundational libraries and analysis environment allowed physicists to process the billions of events that the LHC supplied and extract the physics from them successfully at previously unheard of scales.

Changes in the wider world

While physicists were busy preparing for the LHC, the web became a pervasive part of people’s lives. Internet superpowers like Google, Amazon and Facebook grew up as the LHC was being readied and this changed the position of particle physics in the computing landscape. Where particle physics had once been a leading player in software and hardware, enjoying good terms and some generous discounts, we found ourselves increasingly dwarfed by these other players. Our data volumes, while the biggest in science, didn’t look so large next to Google; the processing power we needed, more than we had ever used before, was small beside Amazon; and our data centres, though growing, were easily outstripped by Facebook.

Graph showing the speedup of ALICE TPC tracking on GPUs

Technology, too, started to shift. Since around 2005, Moore’s law, while still largely holding, has no longer been accompanied by increases in CPU clock speeds. Programs that ran in a serial mode on a single CPU core therefore started to become constrained in their performance. Instead, performance gains would come from concurrent execution on multiple threads or from using vectorised maths, rather than from faster cores. Experiments adapted by executing more tasks in parallel – from simply running more jobs at the same time to adopting multi-process and multi-threaded processing models. This post hoc parallelism was often extremely difficult because the code and frameworks written for the LHC had assumed a serial execution model.
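The shift from serial, element-by-element code to vectorised maths can be illustrated with a small NumPy example; the array sizes and the toy quantity computed here are arbitrary, but the same arithmetic expressed over whole arrays lets the library exploit SIMD units and tight compiled loops.

```python
import time
import numpy as np

px = np.random.normal(size=2_000_000)
py = np.random.normal(size=2_000_000)

# Serial style: one element at a time in an interpreted loop.
start = time.perf_counter()
pt_loop = [(x * x + y * y) ** 0.5 for x, y in zip(px, py)]
t_loop = time.perf_counter() - start

# Vectorised style: the same arithmetic over whole arrays at once.
start = time.perf_counter()
pt_vec = np.sqrt(px**2 + py**2)
t_vec = time.perf_counter() - start

print(f"loop {t_loop:.2f} s, vectorised {t_vec:.3f} s")
```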

The barriers being discovered for CPUs also caused hardware engineers to rethink how to exploit CMOS technology for processors. The past decade has witnessed the rise of the graphics processing unit (GPU) as an alternative way to exploit transistors on silicon. GPUs run with a different execution model: much more of the silicon is devoted to floating-point calculations, and there are many more processing cores, but each core is smaller and less powerful than a CPU core. To utilise such devices effectively, algorithms often have to be entirely rethought and data layouts have to be redesigned. Much of the convenient, but slow, abstraction power of C++ has to be given up in favour of more explicit code and simpler layouts. However, this rapid evolution poses other problems for the code in the long term. There is no single way to programme a GPU and vendors' toolkits are usually quite specific to their hardware.
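One concrete example of such a data-layout redesign is moving from an "array of structures" to a "structure of arrays", so that the values a kernel reads together are contiguous in memory. The NumPy sketch below shows the two layouts in miniature; the field names are invented for illustration.

```python
import numpy as np

n_hits = 1_000_000

# "Array of structures": each hit's fields sit next to each other, so reading
# only the x coordinates means striding through memory.
aos = np.zeros(n_hits, dtype=[("x", "f4"), ("y", "f4"), ("z", "f4"), ("charge", "f4")])

# "Structure of arrays": one contiguous array per field, which is what vector
# units and GPU memory systems prefer.
soa = {name: np.ascontiguousarray(aos[name]) for name in aos.dtype.names}

# The same per-hit quantity computed over contiguous arrays:
radius = np.sqrt(soa["x"] ** 2 + soa["y"] ** 2)
```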

It is both a challenge and an opportunity to work with new scientific partners in the era of exascale science

All of this would matter less if the LHC experiments were standing still, but nothing could be further from the truth. For Run 3 of the LHC, scheduled to start in 2021, the ALICE and LHCb collaborations are installing new detectors and preparing to take far more data than ever before. Hardware triggers are being dropped in favour of full software processing systems and continuous data processing. The high-luminosity upgrade of the LHC for Run 4, from 2026, will be accompanied by new detector systems for ATLAS and CMS, much higher trigger rates and greatly increased event complexity. All of this physics needs to be supported by a radical evolution of software and computing systems, and in a more challenging sociological and technological environment. The LHC will also not be the only scientific big player in the future. Facilities such as DUNE, FAIR, SKA and LSST will come online and have to handle as much, if not more, data than at CERN and in the WLCG. That is both a challenge and an opportunity to work with new scientific partners in the era of exascale science.

There is one solution that we know will not work: simply scaling up the money spent on software and computing. We will need to live with flat budgets, so if the event rate of an experiment increases by a factor of 10 then we have a budget per event that just shrank by the same amount! Recognising this, the HEP Software Foundation (HSF) was invited by the WLCG in 2016 to produce a roadmap for how to evolve software and computing in the 2020s – resulting in a community white paper supported by hundreds of experts in many institutions worldwide (CERN Courier April 2018 p38). In parallel, CERN openlab – a public–private partnership through which CERN collaborates with leading ICT companies and other research organisations – published a white paper setting out specific challenges that are ripe for tackling through collaborative R&D projects with leading commercial partners.

Facing the data onslaught

Since the white paper was published, the HSF and the LHC-experiment collaborations have worked hard to tackle the challenges it lays out. Understanding how event generators can be best configured to get good physics at minimum cost is a major focus, while efforts to get simulation speed-ups from classical fast techniques, as well as new machine-learning approaches, have intensified. Reconstruction algorithms have been reworked to take advantage of GPUs and accelerators, and are being seriously considered for Run 3 by CMS and LHCb (as ALICE makes even more use of GPUs since their successful deployment in Run 2). In the analysis domain, the core of ROOT is being reworked to be faster and also easier for analysts to work with. Much inspiration is taken from the Python ecosystem, using Jupyter notebooks and services like SWAN.
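ROOT's RDataFrame interface, usable from Python, is one example of this more declarative, Python-friendly direction. The snippet below assumes a local ROOT installation, and the file, tree and branch names are placeholders rather than any experiment's real data format.

```python
import ROOT

# Placeholder file, tree and branch names, for illustration only.
df = ROOT.RDataFrame("Events", "events.root")

dimuon = (df.Filter("nMuon == 2", "exactly two muons")
            .Define("pt_lead", "Muon_pt[0]"))

hist = dimuon.Histo1D(("pt_lead", "Leading muon pT;pT [GeV];events", 100, 0.0, 200.0),
                      "pt_lead")
print("selected events:", dimuon.Count().GetValue())   # triggers the event loop
hist.Draw()
```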

Graeme Stewart

These developments are firmly rooted in the new distributed models of software development based on GitHub or GitLab and with worldwide development communities, hackathons and social coding. Open source is also vital, and all of the LHC experiments have now opened up their software. In the computing domain there is intense R&D into improving data management and access, and the ATLAS-developed Rucio data management system is being adopted by a wide range of other HEP experiments and many astronomy communities. Many of these developments got a shot in the arm from the IRIS-HEP project in the US; other European initiatives, such as IRIS in the UK and the German IDT-UM project, are helping, though much more remains to be done.

All this sets us on a good path for the future, but still, the problems remain significant, the implementation of solutions is difficult and the level of uncertainty is high. Looking back to the first computers at CERN and then imagining the same stretch of time into the future, predictions are next to impossible. Disruptive technology, like quantum computing, might even entirely revolutionise the field. However, if there is one thing that we can be sure of, it's that the next decades of software and computing at CERN will very likely be as interesting and surprising as those already past.

The post Adapting to exascale computing appeared first on CERN Courier.

Feature CERN's Graeme Stewart tours six decades of computing milestones in high-energy physics and describes the immense challenges ahead in taming data from future experiments. https://cerncourier.com/wp-content/uploads/2019/11/CCComputing-Foreword_encounters.jpg
FPGAs that speak your language https://cerncourier.com/a/fpgas-that-speak-your-language/ Tue, 12 Nov 2019 14:13:11 +0000 https://preview-courier.web.cern.ch/?p=85201 FPGAs can now be programmed in C++ and Java, bringing machine learning and complex algorithms within the scope of trigger-level analysis.

The post FPGAs that speak your language appeared first on CERN Courier.

Visualisation of logic gates firing

Teeming with radiation and data, the heart of a hadron collider is an inhospitable environment in which to make a tricky decision. Nevertheless, the LHC experiment detectors have only microseconds after each proton–proton collision to make their most critical analysis call: whether to read out the detector or reject the event forever. As a result of limitations in read-out bandwidth, only 0.002% of the terabits per second of data generated by the detectors can be saved for use in physics analyses. Boosts in energy and luminosity – and the accompanying surge in the complexity of the data from the high-luminosity LHC upgrade – mean that the technical challenge is growing rapidly. New techniques are therefore needed to ensure that decisions are made with speed, precision and flexibility so that the subsequent physics measurements are as sharp as possible.

The front-end and read-out systems of most collider detectors include many application-specific integrated circuits (ASICs). These custom-designed chips digitise signals at the interface between the detector and the outside world. The algorithms are baked into silicon at the foundries of some of the biggest companies in the world, with limited prospects for changing their functionality in the light of changing conditions or detector performance. Even minor design changes take substantial time and money to implement, as the replacement chip must be fabricated from scratch. In the LHC era, the tricky trigger electronics are therefore not implemented with ASICs, as before, but with field-programmable gate arrays (FPGAs). Previously used to prototype the ASICs, FPGAs may be re-programmed "in the field", without a trip to the foundry. Now also prevalent in high-performance computing, with leading tech companies using them to accelerate critical processing in their data centres, FPGAs offer the benefits of task-specific customisation of the computing architecture without having to set the chip's functionality in stone – or in this case silicon.

Architecture of a chip

Xilinx Virtex 7 FPGA

FPGAs can compete with other high-performance computing chips due to their massive capability for parallel processing and relatively low power consumption per operation. The devices contain many millions of programmable logic gates that can be configured and connected together to solve specific problems. Because of the vast numbers of tiny processing units, FPGAs can be programmed to work on many different parts of a task simultaneously, thereby achieving massive throughput and low latency – ideal for increasingly popular machine-learning applications. FPGAs can also support high bandwidth inputs and outputs of up to about 100 dedicated high-speed serial links, making them ideal workhorses to process the deluge of data that streams out of particle detectors (see CERN Courier September 2016 p21).

The difficulty is that programming FPGAs is traditionally the preserve of engineers coding low-level languages such as VHDL and Verilog, where even simple tasks can be tricky. For example, a function to sum two numbers together requires several lines of code in VHDL, with the designer even required to define when the operations happen relative to the processor clock (figure 1). Outsourcing the coding is impractical, given the imminent need to implement elaborate algorithms featuring machine learning in the trigger to quickly analyse data from high-granularity detectors in high-luminosity environments. During the past five years, however, tools have matured, allowing FPGAs to be programmed in variants of high-level languages such as C++ and Java, and bringing FPGA coding within the reach of physicists themselves.

Fig. 1.

But can high-level tools produce FPGA code with low-enough latency for trigger applications? And can their resource usage compete with professionally developed low-level code? During the past couple of years CMS physicists have trialled the use of a Java-based language, MaxJ, and tools from Maxeler Technologies, a leading company in accelerated computing and data-flow engines, who were partners in the studies. More recently the collaboration has also gained experience with the C++-based Vivado high-level synthesis (HLS) tool of the FPGA manufacturer Xilinx. The work has demonstrated the potential for ground-breaking new tools to be used in future triggers, without significantly increasing resource usage and latency.

Track and field-programmable

Tasked with finding hadronic jets and calculating missing transverse energy in a few microseconds, the trigger of the CMS calorimeter handles an information throughput of 6.5 terabits per second. Data are read out from the detector into the trigger-system FPGAs in the counting room in a cavern adjacent to CMS. The official FPGA code was implemented in VHDL over several months each of development, debugging and testing. To investigate whether high-level FPGA programming can be practical, the same algorithms were implemented in MaxJ by an inexperienced doctoral student (figure 2), with the low-level clocking and management of high-speed serial links still undertaken by the professionally developed code. The high-level code had comparable latency and resource usage with one exception: the hand-crafted VHDL was superior when it came to quickly sorting objects by their transverse momentum. With this caveat, the study suggests that using high-level development tools can dramatically lower the bar for developing FPGA firmware, to the extent that students and physicists can contribute to large parts of the development of labyrinthine electronics systems.

Fig. 2.

Kalman filtering is an example of an algorithm that is conventionally used for offline track reconstruction on CPUs, away from the low-latency restrictions of the trigger. The mathematical aspects of the algorithm are difficult to implement in a low-level language, for example requiring trajectory fits to be iteratively optimised using sequential matrix algebra calculations. But the advantages of a high-level language could conceivably make Kalman filtering tractable in the trigger. To test this, the algorithm was implemented for the phase-II upgrade of the CMS tracker in MaxJ. The scheduler of Maxeler’s tool, MaxCompiler, automatically pipelines the operations to achieve the best throughput, keeping the flow of data synchronised. This saves a significant amount of effort in the development of a complicated new algorithm compared to a low-level language, where this must be done by hand. Additionally, MaxCompiler’s support for fixed-point arithmetic allows the developer to make full use of the capability of FPGAs to use custom data types. Tailoring the data representation to the problem at hand results in faster, more lightweight processing, which would be prohibitively labour-intensive in a low-level language. The result of the study was hundreds of simultaneous track fits in a single FPGA in just over a microsecond.
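For context, the mathematical core of a Kalman track fit is a short sequence of small-matrix operations repeated once per detector layer. The NumPy sketch below shows a generic predict-and-update step; the FPGA implementation described here additionally uses fixed-point arithmetic and a detector-specific track model.

```python
import numpy as np

def kalman_step(x, P, F, Q, H, R, z):
    """One predict-and-update iteration of a Kalman filter.

    x: state vector, P: its covariance, F: transport matrix, Q: process noise,
    H: measurement matrix, R: measurement noise, z: the new hit.
    """
    # Predict: transport the state and its covariance to the next layer.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: weight the new hit against the prediction via the Kalman gain.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```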

Ghost in the machine

Deep neural networks, which have become increasingly prevalent in offline analysis and event reconstruction thanks to their ability to exploit tangled relationships in data, are another obvious candidate for processing data more efficiently. To find out if such algorithms could be implemented in FPGAs, and executed within the tight latency constraints of the trigger, an example application was developed to identify fake tracks – the inevitable byproducts of overlapping particle trajectories – in the output of the MaxJ Kalman filter described above. Machine learning has the potential to distinguish such bogus tracks better than simple selection cuts, and a boosted decision tree (BDT) proved effective here, with the decision step, which employs many small and independent decision trees, implemented with MaxCompiler. A latency of a few hundredths of a microsecond – much shorter than the iterative Kalman filter as BDTs are inherently very parallelisable – was achieved using only a small percentage of the silicon area of the FPGA, so leaving room for other algorithms. Another tool capable of executing machine-learning models in tens of nanoseconds is the "hls4ml" FPGA inference engine for deep neural networks, built on the Vivado HLS compiler of Xilinx. With the use of such tools, non-FPGA experts can trade off latency and resource usage – two critical metrics of performance, which would require significant extra effort to balance in collaboration with engineers writing low-level code.
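The reason a BDT maps so well onto FPGA logic is that its decision step is just a sum over many shallow trees, each a handful of threshold comparisons that can all be evaluated in parallel. The toy sketch below illustrates that inference step; the trees, features and scores are invented and bear no relation to the actual trigger model.

```python
def score_tree(tree, features):
    """Walk one shallow tree: nodes are (feature, threshold, left, right), leaves are floats."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if features[feature] < threshold else right
    return tree

def bdt_score(trees, features):
    """The ensemble score is a plain sum over small, independent trees."""
    return sum(score_tree(tree, features) for tree in trees)

# Two toy trees over features [chi2, n_hits]; positive scores favour a genuine track.
trees = [
    (0, 10.0, +0.6, (1, 8.0, -0.4, +0.2)),  # small chi2 is good, otherwise check hit count
    (1, 6.0, -0.5, +0.3),                   # very few hits suggests a fake track
]
print(bdt_score(trees, [4.2, 9.0]))  # 0.6 + 0.3 = 0.9
```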

Calorimeter-trigger cards processing CMS events

Though requiring a little extra learning and some knowledge of the underlying technology, it is now possible for ordinary physicists to programme FPGAs in high-level languages, such as Maxeler’s MaxJ and Xilinx’s Vivado HLS. Development time can be cut significantly, while maintaining latency and resource usage at a similar level to hand-crafted FPGA code, with the fast development of mathematically intricate algorithms an especially promising use case. Opening up FPGA programming to physicists will allow offline approaches such as machine learning to be transferred to real-time detector electronics.

Novel approaches will be critical for all aspects of computing at the high-luminosity LHC. New levels of complexity and throughput will exceed the capability of CPUs alone, and require the extensive use of heterogeneous accelerators such as FPGAs, graphics processing units (GPUs) and perhaps even tensor processing units (TPUs) in offline computing. Recent developments in FPGA interfaces are therefore most welcome as they will allow particle physicists to execute complex algorithms in the trigger, and make the critical initial selection more effective than ever before.

The post FPGAs that speak your language appeared first on CERN Courier.

Feature FPGAs can now be programmed in C++ and Java, bringing machine learning and complex algorithms within the scope of trigger-level analysis. https://cerncourier.com/wp-content/uploads/2019/11/FPGA.jpg
Cloud services take off in the US and Europe https://cerncourier.com/a/cloud-services-take-off-in-the-us-and-europe/ Mon, 02 Sep 2019 13:36:39 +0000 https://preview-courier.web.cern.ch/?p=84213 Fermilab has announced the launch of HEPCloud, a step towards a new computing paradigm to deal with the vast quantities of data pouring in from existing and future facilities.

The post Cloud services take off in the US and Europe appeared first on CERN Courier.

Fermilab has announced the launch of HEPCloud, a step towards a new computing paradigm in particle physics to deal with the vast quantities of data pouring in from existing and future facilities. The aim is to allow researchers to “rent” high-performance computing centres and commercial clouds at times of peak demand, thus reducing the costs of providing computing capacity. Similar projects are also gaining pace in Europe.

“Traditionally, we would buy enough computers for peak capacity and put them in our local data centre to cover our needs,” says Fermilab’s Panagiotis Spentzouris, one of HEPCloud’s drivers. “However, the needs of experiments are not steady. They have peaks and valleys, so you want an elastic facility.” All Fermilab experiments will soon submit jobs to HEPCloud, which provides a uniform interface so that researchers don’t need expert knowledge about where and how best to run their jobs.

The idea dates back to 2014, when Spentzouris and Fermilab colleague Lothar Bauerdick assessed the volumes of data coming from Fermilab's neutrino programme and the US participation in CERN's Large Hadron Collider (LHC) experiments. The first demonstration of HEPCloud on a significant scale was in February 2016, when the CMS experiment used it to achieve about 60,000 cores on the Amazon cloud, AWS, and, later that year, to run 160,000 cores using Google Cloud Services. Most recently, in May 2018, the NOvA team at Fermilab was able to execute around 2 million hardware threads at a supercomputer at the National Energy Research Scientific Computing Center of the US Department of Energy's Office of Science. HEPCloud project members now plan to enable experiments to use the state-of-the-art supercomputing facilities run by the DOE's Advanced Scientific Computing Research programme at Argonne and Oak Ridge national laboratories.

Europe’s Helix Nebula

CERN is leading a similar project in Europe called the Helix Nebula Science Cloud (HNSciCloud). Launched in 2016 and supported by the European Union (EU), it builds on work initiated by EIROforum in 2010 and aims to bridge cloud computing and open science. Working with IT contractors, HNSciCloud members have so far developed three prototype platforms and made them accessible to experts for testing.

The results and lessons learned are contributing to the implementation of the European Open Science Cloud

“The HNSciCloud pre-commercial procurement finished in December 2018, having shown the integration of commercial cloud services from several providers (including Exoscale and T-Systems) with CERN’s in-house capacity in order to serve the needs of the LHC experiments as well as use cases from life sciences, astronomy, proton and neutron science,” explains project leader Bob Jones of CERN. “The results and lessons learned are contributing to the implementation of the European Open Science Cloud where a common procurement framework is being developed in the context of the new OCRE [Open Clouds for Research Environments] project.”

The European Open Science Cloud, an EU-funded initiative started in 2015, aims to bring efficiencies and make European research data more sharable and reusable. To help European research infrastructures move towards this open-science future, a €16 million EU project called ESCAPE (European Science Cluster of Astronomy & Particle Physics ESFRI) was launched in February. The 3.5-year project, led by the CNRS, will see 31 facilities in astronomy and particle physics collaborate on cloud computing and data science, including CERN, the European Southern Observatory, the Cherenkov Telescope Array, KM3NeT and the Square Kilometre Array (SKA).

In the context of ESCAPE, CERN is leading the effort of prototyping and implementing a FAIR (findable, accessible, interoperable, reproducible) data infrastructure based on open-source software, explains Simone Campana of CERN, who is deputy project leader of the Worldwide LHC Computing Grid (WLCG). “This work complements the WLCG R&D activity in the area of data organisation, management and access in preparation for the HL-LHC. In fact, the computing activities of the CERN experiments at HL-LHC and other initiatives such as SKA will be very similar in scale, and will likely coexist on a shared infrastructure.”

The post Cloud services take off in the US and Europe appeared first on CERN Courier.

News Fermilab has announced the launch of HEPCloud, a step towards a new computing paradigm to deal with the vast quantities of data pouring in from existing and future facilities. https://cerncourier.com/wp-content/uploads/2019/08/cori-supercomputer-nersc-2.jpg
Computing boost for Lebanon and Nepal https://cerncourier.com/a/computing-boost-for-lebanon-and-nepal/ Wed, 10 Jul 2019 13:51:23 +0000 https://preview-courier.web.cern.ch?p=83585 The High-Performance Computing for Lebanon project is part of efforts by Lebanese scientists to boost the nation’s research capabilities.

The post Computing boost for Lebanon and Nepal appeared first on CERN Courier.


In the heart of Beirut in a five-storey house owned by the Lebanese national telecommunication company, floors are about to be coated to make them anti-static, walls and ceilings will be insulated, and cabling systems installed so wires don’t become tangled. These and other details are set to be complete by mid-2020, when approximately 3000 processor cores, donated by CERN, will arrive.

The High-Performance Computing for Lebanon (HPC4L) project is part of efforts by Lebanese scientists to boost the nation's research capabilities. Like many other countries that have been through conflict and seen their highly skilled graduates leave to seek better opportunities, Lebanon is trying to stem its brain drain. Though the new facility will not be the only HPC centre in the country, it is different because it involves both public and private institutions and has the full support of the government. "There are a few small-scale HPC facilities in different universities here, but they suffer from being isolated and hence are quickly outdated and underused," says physicist Haitham Zaraket of Lebanese University in Beirut. "This HPC project puts together the main players in the realm of HPC in Lebanon."

Having joined the LHC’s CMS experiment in 2016, Lebanese physicists want to develop the new facility into a CMS Tier-2 computing centre. High-speed internet will connect it to universities around the world and HPC4L has a mandate to ensure operation, maintenance, and user-interfacing for smooth and effective running of the facility. “We’ve been working with the government, private and public partners to prepare not just the infrastructure but also the team,” explains HPC4L project coordinator Martin Gastal of CERN. “CERN/CMS’s expertise and knowledge will help set up the facility and train users, but the team in Lebanon will run it themselves.” The Lebanese facility will also be used for computational biology, oil and gas discovery, financial forecasting, genome analysis and the social sciences.

Nepal is another country striving for greater digital storage and computing power. In 2017 Nepal signed a cooperation agreement with CERN. The following year, around 2500 cores from CERN enabled an HPC facility to be established at the government-run IT Park, with experts from Kathmandu University forming its core team. Rajendra Adhikari, project leader of Nepal’s HPC centre, also won an NVIDIA award in the form of a latest-generation graphics card worth USD 3000, which he added to the system. Nepal has never had computing on such a scale before, says Adhikari. “With this facility, we can train our students and conduct research that requires high-performance computing and data storage, from climate modelling and earthquake simulations to medical imaging and basic research.”

The Nepal facility is planning to store health data from hospitals, which is often deleted because of lack of storage space, and tests are being carried out to process drone images taken to map topography for hydropower feasibility studies. Even in the initial phases of the new centre, says Adhikari, computing tasks that used to take 45 days can now be processed in just 12 hours.

The SESAME light source in Jordan, which itself received 576 cores from CERN in 2017, is also using its experience to assist neighbouring regions in setting up and maintaining HPC facilities. “High-performance computing is a strong enabler of research capacity building in regions challenged by limited financial resources and talent exodus,” says Gastal. “By supporting the set-up of efficient data processing and storage facilities, CERN, together with affiliated institutes, can assist fellow researchers in investing in the scientific potential of their own countries.”

Harnessing the web for humanity https://cerncourier.com/a/harnessing-the-web-for-humanity/ Mon, 11 Mar 2019 16:46:02 +0000 https://preview-courier.web.cern.ch?p=13554 New technologies can be used to give every human being secure and sovereign control over their own digital identity, argues Monique Morrow.

What would you do if you were thrust into a world where suddenly you lacked control over who you were? If you had no way to prove where you were from, who you were related to, or what you had accomplished? If you lost all your documentation in a natural disaster, or were forced to leave your home without taking anything with you? Without proof of identity, people are unable to access essential systems such as health, education and banking services, and they are also exceedingly vulnerable to trafficking and incarceration. Having and owning your identity is an essential human right that too many people are lacking.

More than 68 million people worldwide have been displaced by war and conflict, and over 25 million have fled their countries and gone from the designation of “citizen” to “refugee”. They are often prevented from working in their new countries, and, even if they are allowed to work, many nations will not let professional credentials, such as licences to practise law or medicine, follow these people across their borders. We end up stripping away fundamental human dignities and leaving exorbitant amounts of untapped potential on the table. Countries need to recognise not just the right to identity but also that identity is portable across nation states.

The issue of sovereign identities extends much further than documentation. All over the world, individuals are becoming commodified by companies offering “free” services because their actual products are the users and their data. Every individual should have the right to decide to monetise their data if they want. But the speed, scale and stealth of such practices are making it increasingly difficult to retain control of our data.

All of this is happening as we celebrate the 30th anniversary of the web. While there is no doubt that the web has been incredibly beneficial for humanity, it has also turned people into pawns and opened them up to new security risks. I believe that we can not only remedy these harms, but that we’ve yet to harness even a small fraction of the good that the web can do. Enter The Humanized Internet – a non-profit movement founded in 2017 that is working to use new technologies to give every human being secure, sovereign control over their own digital identity.

New technologies like blockchain, which allows digital information to be distributed but not copied, can allow us to tackle this issue. Blockchain has some key differences from today’s databases. First, it allows participants to see and verify all data involved, minimising chances of fraud. Second, all data is verified and encrypted before being added to an individual block in such a way that a hacker would need to have exponentially more computing power to break in than is required in today’s systems. These characteristics allow blockchain to provide public ledgers that participants trust based on the agreed-upon consensus protocol. Once data transactions are on a block, they cannot be overwritten, and no central institution holds control, as these ledgers are visible to all the users connected to them. Users’ identities within a ledger are known only to the users themselves.
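
To make the chaining idea concrete, here is a minimal sketch in Python – purely illustrative, with invented record contents, and no substitute for a real distributed ledger. Each block stores the hash of its predecessor, so altering any earlier record breaks every later link and is immediately detectable.

import hashlib, json, time

def make_block(data, previous_hash):
    # Each block commits to its payload and to the hash of the previous block.
    block = {"timestamp": time.time(), "data": data, "previous_hash": previous_hash}
    block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

genesis = make_block({"credential": "medical licence, issued 2012"}, "0" * 64)
latest = make_block({"credential": "engineering degree, issued 2018"}, genesis["hash"])

# Tampering with the first record invalidates the link stored in the next block.
genesis["data"]["credential"] = "forged record"
recomputed = hashlib.sha256(json.dumps(
    {k: genesis[k] for k in ("timestamp", "data", "previous_hash")},
    sort_keys=True).encode()).hexdigest()
print(latest["previous_hash"] == recomputed)  # False once the record has been altered

A production ledger adds digital signatures, a consensus protocol and replication across many participants on top of this basic hash-chaining idea.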

The first implication of this technology is that it can help to establish a person’s citizenship in their state of origin and enable registration of official records. Without this, many people would be considered stateless and granted almost no rights or diplomatic protections. For refugees, digital identities also allow peer-to-peer donation and transparent public transactions. Additionally, digital identities create the ability to practise selective disclosure, where individuals can choose to share their records only at their own discretion.

We now need more people to get on board. We are already working with experts to discuss the potential of blockchain to improve inclusion in state-authenticated identity programmes and how to combat potential privacy challenges, in addition to e-voting systems that could allow inclusive participation in voting at all policy levels. We should all be the centre of our universe; our identity should be wholly and irrevocably our own.

Inspired by software https://cerncourier.com/a/inspired-by-software/ Mon, 11 Mar 2019 15:30:28 +0000 https://preview-courier.web.cern.ch?p=13510 Being a scientist in the digital age means being a software producer and a software consumer.

High-energy code

Of all the movements to make science and technology more open, the oldest is “open source” software. It was here that the “open” ideals were articulated, and from which all later movements such as open-access publishing derive. Whilst it rightly stands on this pedestal, from another point of view open-source software was simply the natural extension of academic freedom and knowledge-sharing into the digital age.

Open-source has its roots in the free software movement, which grew in the 1980s in response to monopolising corporations and restrictions on proprietary software. The underlying ideal is open collaboration: peers freely, collectively and publicly build software solutions. A second ideal is recognition, in which credit for the contributions made by individuals and organisations worldwide is openly acknowledged. A third ideal concerns rights, specifically the so-called four freedoms granted to users: to use the software for any purpose; to study the source code to understand how it works; to share and redistribute the software; and to improve the software and share the improvements with the community. Users and developers therefore contribute to a virtuous circle in which software is continuously improved and shared towards a common good, minimising vendor lock-in for users.

Today, 20 years after the term “open source” was coined, and despite initial resistance from traditional software companies, many successful open-source business models exist. These mainly involve consultancy and support services for software released under an open-source licence and extend beyond science to suppliers of everyday tools such as the WordPress platform, Firefox browser and the Android operating system. A more recent and unfortunate business model adopted by some companies is “open core”, whereby essential features are deemed premium and sold as proprietary software on top of existing open-source components.

Founding principles

Open collaboration is one of CERN’s founding principles, so it was natural to extend the principle into its software. The web’s invention brought this into sharp focus. Having experienced first-hand its potential to connect physicists around the globe, in 1993 CERN released the web software into the public domain so that developers could collaborate and improve on it (see CERN’s ultimate act of openness). The following year, CERN released the next web-server version under an open-source licence with the explicit goal of preventing private companies from turning it into proprietary software. These were crucial steps in nurturing the universal adoption of the web as a way to share digital information, and exemplars of CERN’s best practice in open-source software.

Nowadays, open-source software can be found in pretty much every corner of CERN, as in other sciences and industry. Indico and Invenio – two of the largest open-source projects developed at CERN to promote open collaboration – rely on the open-source framework Python Flask. Experimental data are stored in CERN’s Exascale Open Storage system, and most of the servers in the CERN computing centre are running on OpenStack – an open-source cloud infrastructure to which CERN is an active contributor. Of course, CERN also relies heavily on open-source GNU/Linux as both a server and desktop operating system. On the accelerator and physics analysis side, it’s all about open source. From C2MON, a system at the heart of accelerator monitoring and data acquisition, to ROOT, the main data-analysis framework used to analyse experimental data, the vast majority of the software components behind the science done at CERN are released under an open-source licence.
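
To give a flavour of what ROOT-based analysis code looks like, a few lines of PyROOT suffice; the histogram below is filled with toy random numbers rather than real collision data, so it is an illustration only.

import ROOT

# Book a histogram, fill it with 10,000 toy Gaussian values and save the plot.
hist = ROOT.TH1F("toy", "Toy distribution;x;entries", 100, -5.0, 5.0)
hist.FillRandom("gaus", 10000)
canvas = ROOT.TCanvas("c", "canvas", 800, 600)
hist.Draw()
canvas.SaveAs("toy_distribution.png")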

Open hardware

The success of the open-source model for software has inspired CERN engineers to create an analogous “open hardware” licence, enabling electronics designers to collaborate and use, study, share and improve the designs of hardware components used for physics experiments. This approach has become popular in many sciences, and has become a lifeline for teaching and research in developing countries.

Being a scientist in the digital age means being a software producer and a software consumer. As a result, collaborative software-development platforms such as GitHub and GitLab have become as important to the physics department as they are to the IT department. Until recently, the software underlying an analysis has not been easily shared. CERN has therefore been developing research data-management tools to enable the publication of software and data, forming the basis of an open-data portal (see Preserving the legacy of particle physics). Naturally, this software itself is open source and has also been used to create the worldwide open-data service Zenodo, which is connected to GitHub to make the publication of open-source software a standard part of the research cycle.

Interestingly, as with the early days of open source, many corners of the scientific community are hesitant about open science. Some people are concerned that their software and data are not of sufficient quality or interest to be shared, or that they will be helping others to the next discovery before them. To triumph over the sceptics, open science can learn from the open-source movement, adopting standard licences, credit systems, collaborative development techniques and shared governance. In this way, it too will be able to reap the benefits of open collaboration: transparency, efficiency, perpetuity and flexibility. 

Open science: A vision for collaborative, reproducible and reusable research https://cerncourier.com/a/open-science-a-vision-for-collaborative-reproducible-and-reusable-research/ Mon, 11 Mar 2019 15:15:40 +0000 https://preview-courier.web.cern.ch?p=13500 True open science demands more than simply making data available.

The goal of practising science in such a way that others can collaborate, question and contribute – known as “open science” – long predates the web. One could even argue that it began with the first academic journal 350 years ago, which enabled scientists to share knowledge and resources to foster progress. But the web offered opportunities way beyond anything before it, quickly transforming academic publishing and giving rise to greater sharing in areas such as software. Alongside the open-source (Inspired by software), open-access (A turning point for open-access publishing) and open-data (Preserving the legacy of particle physics) movements grew the era of open science, which aims to encompass the scientific process as a whole.

Today, numerous research communities, political circles and funding bodies view open science and reproducible research as vital to accelerate future discoveries. Yet, to fully reap the benefits of open and reproducible research, it is necessary to start implementing tools to power a more profound change in the way we conduct and perceive research. This poses both sociological and technological challenges, starting from the conceptualisation of research projects, through conducting research, to how we ensure peer review and assess the results of projects and grants. New technologies have brought open science within our reach, and it is now up to scientific communities to agree on the extent to which they want to embrace this vision.

Particle physicists were among the first to embrace the open-science movement, sharing preprints and building a deep culture of using and sharing open-source software. The cost and complexity of experimental particle physics, which make complete replication of measurements unfeasible, present unique challenges in terms of open data and scientific reproducibility. It may even be considered that openness itself, in the sense of having unfettered access to data from its inception, is not particularly advantageous.

Take the existing data-management policies of the LHC collaborations: while physicists generally strive to be open in their research, the complexity of the data and analysis procedures means that data become publicly open only after a certain embargo period that is used to assess its correctness. The science is thus born “closed”. Instead of thinking about “open data” from its inception, it is more useful to speak about FAIR (findable, accessible, interoperable and reusable) data, a term coined by the FORCE11 community. The data should be FAIR throughout the scientific process, from being initially closed to being made meaningfully open later to those outside the experimental collaborations.

True open science demands more than simply making data available: it needs to concern itself with providing information on how to repeat or verify an analysis performed over given datasets, producing results that can be reused by others for comparison, confirmation or simply for deeper understanding and inspiration. This requires runnable examples of how the research was performed, accompanied by software, documentation, runnable scripts, notebooks, workflows and compute environments. It is often too late to try to document research in such detail once it has been published.

FAIR data repositories for particle physics, the “closed” CERN Analysis Preservation portal and the “open” CERN Open Data portal emerged five years ago to address the community’s open-science needs. These digital repositories enable physicists to preserve, document, organise and share datasets, code and tools used during analyses. A flexible metadata structure helps researchers to define everything from experimental configurations to data samples, from analysis code to software libraries and environments used to analyse the data, accompanied by documentation and links to presentations and publications. The result is a standard way to describe and document an analysis for the purposes of discoverability and reproducibility.
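
As a purely hypothetical illustration of such a metadata structure – the field names and values below are invented, not the portals’ actual schema – an analysis record might look something like this in Python:

analysis_record = {
    "title": "Search for a dimuon resonance",                 # invented example analysis
    "experiment": "CMS",
    "datasets": ["/DoubleMuon/Run2012B/AOD"],                  # illustrative dataset name
    "code": {"repository": "https://example.org/dimuon-analysis.git", "commit": "abc1234"},
    "environment": {"software": ["ROOT", "CMSSW"], "container": "example/analysis-image:1.0"},
    "workflow": ["skim.py", "select.py", "fit.py"],            # ordered analysis steps
    "documentation": {"internal_note": "AN-exercise-001", "publication": "arXiv preprint"},
}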

Recent advancements in the IT industry allow us to encapsulate the compute environments where the analysis was conducted. Capturing information about how the analysis was carried out can be achieved via a set of runnable scripts, notebooks, structured workflows and “containerised” pipelines. Complementary to data repositories, a third service named REANA (reusable analyses) allows researchers to submit parameterised computational workflows to run on remote compute clouds. It can be used to reinterpret preserved analyses but also to run “active” analyses before they are published and preserved, with the underlying philosophy that physics analyses should be automated from inception so that they can be executed without manual intervention. Future reuse and reinterpretation starts with the first commit of the analysis code; altering an already-finished analysis to facilitate its eventual reuse after publication is often too late.
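
The spirit of a parameterised, automated workflow can be sketched in a few lines of Python. This is a simplified stand-in, not REANA’s actual specification format, and the scripts it names are placeholders.

import subprocess

# A declarative description of an analysis: parameters plus an ordered list of steps,
# each of which would normally run inside its stated container on a remote cloud.
workflow = {
    "parameters": {"nevents": 10000, "output": "results/fit.png"},
    "steps": [
        {"environment": "python:3.11", "command": "python skim.py --nevents {nevents}"},
        {"environment": "python:3.11", "command": "python fit.py --output {output}"},
    ],
}

def run(spec):
    # Here the commands are simply executed locally, in order, with parameters substituted;
    # a workflow service would dispatch the same steps to containers and record provenance.
    for step in spec["steps"]:
        command = step["command"].format(**spec["parameters"])
        subprocess.run(command, shell=True, check=True)

# run(workflow)  # would execute the steps once skim.py and fit.py exist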

Full control

The key guiding principle of the analysis preservation and reuse framework is to leave the decision as to when a dataset or a complete analysis is shared, privately or publicly, in the hands of the researchers. This gives the experiment collaborations full control over the release procedures, and thus fully supports internal processing and review protocols before the results are published on community services, such as arXiv, HEPData and INSPIRE.

The CERN Open Data portal was launched in 2014 amid a discussion as to whether primary particle-physics data would find any use outside of the LHC collaborations. Within a few years, the first paper based on open data from the CMS experiment was published (see Preserving the legacy of particle physics).

Three decades after the web was born, science is being shared more openly than ever and particle physics is at the forefront of this movement. As we have seen, however, simple compliance with data and software openness is not enough: we also need to capture, from the start of the research process, runnable recipes, software environments, computational workflows and notebooks. The increasing demand from funding agencies and policymakers for open data-management plans, coupled with technological progress in information technology, leads us to believe that the time is ripe for this change.

Sharing research in an easily reproducible and reusable manner will facilitate knowledge transfer within and between research teams, accelerating the scientific process. This fills us with hope that three decades from now, even if future generations may not be able to run our current code on their futuristic hardware platforms, they will be at least well equipped to understand the processes behind today’s published research in sufficient detail to be able to check our results and potentially reveal something new.

Preserving the legacy of particle physics https://cerncourier.com/a/preserving-the-legacy-of-particle-physics/ Mon, 11 Mar 2019 14:58:34 +0000 https://preview-courier.web.cern.ch?p=13489 Although high-energy physics has made great strides in providing open access to research publications, we are still in the very early days of open data.

In the 17th century, Galileo Galilei looked at the moons of Jupiter through a telescope and recorded his observations in his now-famous notebooks. Galileo’s notes – his data – survive to this day and can be reviewed by anyone around the world. Students, amateurs and professionals can replicate Galileo’s data and results – a tenet of the scientific method.

In particle physics, with its unique and expensive experiments, it is practically impossible for others to attempt to reproduce the original work. When it is impractical to gather fresh data to replicate an analysis, we settle for reproducing the analysis with the originally obtained data. However, a 2013 study by researchers at the University of British Columbia, Canada, estimates that the odds of scientific data existing in an analysable form fall by about 17% each year.
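
Taking the quoted 17% annual decline at face value, a two-line calculation shows how quickly analysable data disappears:

# If the odds of data remaining analysable fall by about 17% each year,
# the surviving fraction after n years is roughly 0.83**n.
for years in (5, 10, 20):
    print(f"after {years:2d} years: ~{0.83 ** years:.0%} of datasets still analysable")

After a decade only around one dataset in six would still be usable, and after two decades barely one in fifty.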

Indeed, just a few years down the line it might not even be possible for researchers to revisit their own data due to changes in formats, software or operating systems. This has led to growing calls for scientists to release and archive their data openly. One motivation is moral: society funds research and so should have access to all of its outputs. Another is practical: a fresh look at data could enable novel research and lead to discoveries that may have eluded earlier searches.

Like open-access publishing (see A turning point for open-access publishing), governments have started to impose demands on scientists regarding the availability and long-term preservation of research data. The European Commission, for example, has piloted the mandatory release of open data as part of its Horizon 2020 programme and plans to invest heavily in open data in the future. An increasing number of data repositories have been established for life and medical sciences as well as for social sciences and meteorology, and the idea is gaining traction across disciplines. Only days after they announced the first observation of gravitational waves, the LIGO and VIRGO collaborations made public their data. NASA also releases data from many of its missions via open databases, such as exoplanet catalogues. The Natural History Museum in London makes data from millions of specimens available via a website and, in the world of art, the Rijksmuseum in Amsterdam provides an interface for developers to build apps featuring historic artworks.

Data levels

The open-data movement is of special interest to particle physics, owing to the uniqueness and large volume of datasets involved in discoveries such as that of the Higgs boson at the Large Hadron Collider (LHC). The four main LHC experiments have started to periodically release their data in an open manner, and these data can be classified into four levels. The first consists of the data shown in final publications, such as plots and tables, while the second concerns datasets in a simplified format that are suitable for “lightweight” analyses in educational or similar contexts. The third level involves the data being used for analysis by the researchers themselves, requiring specialised code and dedicated computing resources, while the fourth and most complex level is the raw data generated by the detectors, which requires petabytes of storage and, being uncalibrated, is of little use until it has been processed into the third level.

In late 2014 CERN launched an open-data portal and released research data from the LHC for the first time. The data, collected by the CMS experiment, represented half the level-three data recorded in 2010. The ALICE experiment has also released level-three data from proton–proton as well as lead–lead collisions, while all four collaborations – including ATLAS and LHCb – have released subsets of level-two data for education and outreach purposes.

Proactive policy

The story of open data at CMS goes back to 2011. “We started drafting an open-data policy, not because of pressure from funding agencies but because defining our own policy proactively meant we did not have an external body defining it for us,” explains Kati Lassila-Perini, who leads the collaboration’s data-preservation project. CMS aims to release half of each year’s level-three data three years after data taking, and 100% of the data within a ten-year window. By guaranteeing that people outside CMS can use these data, says Lassila-Perini, the collaboration can ensure that the knowledge of how to analyse the data is not lost, while allowing people outside CMS to look for things the collaboration might not have time for. To allow external re-use of the data, CMS released appropriate metadata as well as analysis examples. The datasets soon found takers and, in 2017, a group of theoretical physicists not affiliated with the collaboration published two papers using them. CMS has since released half its 2011 data (corresponding to around 200 TB) and half its 2012 data (1 PB), with the first releases of level-three data from the LHC’s Run 2 in the pipeline.

The LHC collaborations have been releasing simpler datasets for educational activities from as early as 2011, for example for the International Physics Masterclasses that involve thousands of high-school students around the globe each year. In addition, CMS has made available several Jupyter notebooks – a browser-based analysis platform named with a nod to Galileo – in assorted languages (programming and human) that allow anyone with an internet connection to perform a basic analysis. “The real impact of open data in terms of numbers of users is in schools,” says Lassila-Perini. “It makes it possible for young people with no previous contact with coding to learn about data analysis and maybe discover how fascinating it can be.” Also available from CMS are more complex examples aimed at university-level students.
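
The kind of exercise such notebooks walk through can be sketched in a few lines of Python; the file and column names below are hypothetical placeholders for a dimuon open-data sample (energies and momenta in GeV), not a real CMS release.

import numpy as np
import pandas as pd

# Hypothetical CSV of events, each containing two reconstructed muons.
events = pd.read_csv("dimuon_sample.csv")

# Invariant mass of each muon pair: M^2 = (E1 + E2)^2 - |p1 + p2|^2
E = events["E1"] + events["E2"]
px = events["px1"] + events["px2"]
py = events["py1"] + events["py2"]
pz = events["pz1"] + events["pz2"]
mass = np.sqrt(np.maximum(E**2 - (px**2 + py**2 + pz**2), 0.0))

# Histogramming the masses reveals resonances such as the J/psi and the Z as peaks.
counts, edges = np.histogram(mass, bins=200, range=(0.0, 120.0))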

Open-data endeavours by ATLAS are very much focused on education, and the collaboration has provided curated datasets for teaching in places that may not have substantial computing resources or internet access. “Not even the documentation can rely on online content, so everything we produce needs to be self-contained,” remarks Arturo Sánchez Pineda, who coordinates ATLAS’s open-data programme. ATLAS datasets and analysis tools, which also rely on Jupyter notebooks, have been optimised to fit on a USB memory stick and allow simplified ATLAS analyses to be conducted just about anywhere in the world. In 2016, ATLAS released simplified open data corresponding to 1 fb⁻¹ at 8 TeV, with the aim of giving university students a feel for what a real particle-physics analysis involves.

ATLAS open data have already found their way into university theses and have been used by people outside the collaboration to develop their own educational tools. Indeed, within ATLAS, new members can now choose to work on preparing open data as their qualification task to become an ATLAS co-author, says Sánchez Pineda. This summer, ATLAS will release 10 fb⁻¹ of level-two data from Run 2, with more than 100 simulated physics processes and related resources. ATLAS does not provide level-three data openly and researchers interested in analysing these can do so through a tailored association programme, which 80 people have taken advantage of so far. “This allows external scientists to rely on ATLAS software, computing and analysis expertise for their project,” says Sánchez Pineda.

Fundamental motivation

CERN’s open-data portal hosts and serves data from the four big LHC experiments, also providing many of the software tools including virtual machines to run the analysis code. The OPERA collaboration recently started sharing its research data via CERN and other particle-physics collaborations are interested in joining the project.

Although high-energy physics has made great strides in providing open access to research publications, we are still in the very early days of open data. Theorist Jesse Thaler of MIT, who led the first independent analysis using CMS open data, acknowledges that it is possible for people to get their hands on coveted data by joining an experimental collaboration, but sees a much brighter future with open data. “What about more exploratory studies where the theory hasn’t yet been invented? What about engaging undergraduate students? What about examining old data for signs of new physics?” he asks. These provocative questions serve as fundamental motivations for making all data in high-energy physics as open as possible. 

CERN’s ultimate act of openness https://cerncourier.com/a/cerns-ultimate-act-of-openness/ Mon, 11 Mar 2019 14:50:45 +0000 https://preview-courier.web.cern.ch?p=13483 The seed that led CERN to relinquish ownership of the web in 1993 was planted when the Organization formally came into being.

At a mere 30 years old, the World Wide Web already ranks as one of humankind’s most disruptive inventions. Developed at CERN in the early 1990s, it has touched practically every facet of life, impacting industry, penetrating our personal lives and transforming the way we transact. At the same time, the web is shrinking continents and erasing borders, bringing with it an array of benefits and challenges as humanity adjusts to this new technology.

This reality is apparent to all. What is less well known, but deserves recognition, is the legal dimension of the web’s history. On 30 April 1993, CERN released a memo that placed into the public domain all of the web’s underlying software: the basic client, basic server and library of common code. The document was addressed “To whom it may concern” – which would suggest the authors were not entirely sure who the target audience was. Yet, with hindsight, this line can equally be interpreted as an unintended address to humanity at large.

The legal implication was that CERN relinquished all intellectual property rights in this software. It was a deliberate decision, the intention being that a no-strings-attached release of the software would “further compatibility, common practices, and standards in networking and computer supported collaboration” – arguably modest ambitions for what turned out to be such a seismic technological step. To understand what seeded this development you need to go back to the 1950s, at a time when “software” would have been better understood as referring to clothing rather than computing.

European project

CERN was born out of the wreckage of World War II, playing a role, on the one hand, as a mechanism for reconciliation between former belligerents, while, on the other, offering European nuclear physicists the opportunity to conduct their research locally. The hope was that this would stem the “brain drain” to the US, from a Europe still recovering from the devastating effects of war.

In 1953, CERN’s future Member States agreed on the text of the organisation’s founding Convention, defining its mission as providing “for collaboration among European States in nuclear research of a pure scientific and fundamental character”. With the public acutely aware of the role that destructive nuclear technology had played during the war, the Convention additionally stipulated that CERN was to have “no concern with work for military requirements” and that the results of its work were to be “published or otherwise made generally available”.

In the early years of CERN’s existence, the openness resulting from this requirement for transparency was essentially delivered through traditional channels, in particular through publication in scientific journals. Over time, this became the cultural norm at CERN, permeating all aspects of its work both internally and with its collaborating partners and society at large. CERN’s release of the WWW software into the public domain, arguably in itself a consequence of the openness requirement of the Convention, could be seen as a precursor to today’s web-based tools that represent further manifestations of CERN’s openness: the SCOAP3 publishing model, open-source software and hardware, and open data.

Perhaps the best measure of how ingrained openness is in CERN’s ethos as a laboratory is to ask the question: “if CERN had known then what it knows now about the impact of the World Wide Web, would it still have made the web software available, just as it did in 1993?” We would like to suggest that, yes, our culture of openness would provoke the same response now as it did then, though no doubt a modern, open-source licensing regime would be applied.

A culture of openness

This, in turn, can be viewed as testament and credit to the wisdom of CERN’s founders, and to the CERN Convention, which remains the cornerstone of our work to this day.

Real-time triggering boosts heavy-flavour programme https://cerncourier.com/a/real-time-triggering-boosts-heavy-flavour-programme/ Thu, 24 Jan 2019 09:00:15 +0000 https://preview-courier.web.cern.ch/?p=13097 LHCb has been flooded by b- and c-hadrons due to the large beauty and charm production cross-sections within the experiment’s acceptance.

A report from the LHCb collaboration

Throughout LHC Run 2, LHCb has been flooded by b- and c-hadrons due to the large beauty and charm production cross-sections within the experiment’s acceptance. To cope with this abundant flux of signal particles and to fully exploit them for LHCb’s precision flavour-physics programme, the collaboration has recently implemented a unique real-time analysis strategy to select and classify, with high efficiency, a large number of b- and c-hadron decays. Key components of this strategy are a real-time alignment and calibration of the detector, allowing offline-quality event reconstruction within the software trigger, which runs on a dedicated computing farm. In addition, the collaboration took the novel step of only saving to tape interesting physics objects (for example, tracks, vertices and energy deposits), and discarding the rest of the event. Dubbed “selective persistence”, this substantially reduced the average event size written from the online system without any loss in physics performance, thus permitting a higher trigger rate within the same output data rate (bandwidth). This has allowed the LHCb collaboration to maintain, and even expand, its broad programme throughout Run 2, despite limited computing resources.

The two-stage LHCb software trigger is able to select heavy-flavoured hadrons with high purity, leaving event-size reduction as the handle to reduce trigger bandwidth. This is particularly true for the large charm trigger rate, where saving the full raw events would result in a prohibitively high bandwidth. Saving only the physics objects entering the trigger decision reduces the event size by a factor of up to 20, allowing larger statistics to be collected at constant bandwidth. Several measurements of charm production and decay properties have been made so far using only this information. The sets of physics objects that must be saved for offline analysis can also be chosen “à la carte”, opening the door for further bandwidth savings on inclusive analyses too.
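
The principle of selective persistence can be caricatured in a few lines of Python; the event content and numbers below are invented purely to illustrate the size reduction.

import json

# A toy event: a few trigger-level physics objects plus a bulky raw detector payload.
event = {
    "trigger_objects": {
        "tracks": [{"pt": 5.2, "eta": 2.1, "phi": 0.3}, {"pt": 3.8, "eta": 3.0, "phi": -1.2}],
        "vertices": [{"x": 0.01, "y": -0.02, "z": 4.5}],
    },
    "raw_banks": "#" * 5000,   # stand-in for the full detector readout
}

def persist_selectively(event):
    # Keep only the objects used by the trigger decision; discard the raw banks.
    return {"trigger_objects": event["trigger_objects"]}

full_size = len(json.dumps(event))
slim_size = len(json.dumps(persist_selectively(event)))
print(f"toy event size reduced by a factor of {full_size / slim_size:.0f}")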

For the LHCb upgrade (see LHCb’s momentous metamorphosis), when the instantaneous luminosity increases by a factor of five, these new techniques will become standard. LHCb expects that more than 70% of the physics programme will use the reduced event format. The full software trigger, combined with real-time alignment and calibration, along with the selective persistence pioneered by LHCb, will likely become the standard for very high-luminosity experiments. The collaboration is therefore working hard to implement these new techniques and ensure that the current quality of physics data can be equalled or surpassed in Run 3.

Exploring quantum computing for high-energy physics https://cerncourier.com/a/exploring-quantum-computing-for-high-energy-physics/ Fri, 30 Nov 2018 14:45:17 +0000 https://preview-courier.web.cern.ch?p=13352 On 5–6 November, CERN hosted a first-of-its kind workshop on quantum computing in high-energy physics.

The ambitious upgrade programme for the Large Hadron Collider (LHC) will result in significant information and communications technology (ICT) challenges over the next decade and beyond. It is therefore vital that members of the HEP research community keep looking for innovative computing technologies so as to continue to maximise the discovery potential of the world-leading research infrastructures at their disposal (CERN Courier November 2018 p5).

On 5–6 November, CERN hosted a first-of-its kind workshop on quantum computing in high-energy physics (HEP). The event was organised by CERN openlab, a public–private partnership between CERN and leading ICT companies established to accelerate the development of computing technologies needed by the LHC research community.

More than 400 people followed the workshop, which provided an overview of the current state of quantum-computing technologies. The event also served as a forum to discuss which activities within the HEP community may be amenable to the application of quantum-computing technologies.

“In CERN openlab, we’re always looking with keen interest at new computing architectures and trying to understand their potential for disrupting and improving the way we do things,” says Alberto Di Meglio, head of CERN openlab. “We want to understand which computing workflows from HEP could potentially most benefit from nascent quantum-computing technologies; this workshop was the start of the discussion.”

Significant developments are being made in the field of quantum computing, even if today’s quantum-computing hardware has not yet reached the level at which it could be put into production. Nevertheless, quantum-computing technologies are among those that hold future promise of substantially speeding up tasks that are computationally expensive.

“Quantum computing is no panacea, and will certainly not solve all the future computing needs of the HEP community,” says Eckhard Elsen, CERN’s director for research and computing. “Nevertheless, quantum computers are starting to be available; a breakthrough in the number of qubits could emerge at any time. Fundamentally rethinking our algorithms may appear as an interesting intellectual challenge today, yet may turn out as a major benefit in addressing computing challenges in the future.”

The workshop featured representatives of the LHC experiments, who spoke about how computing challenges are likely to evolve as we approach the era of the High-Luminosity LHC. There was also discussion of work already undertaken to assess the feasibility of applying today’s quantum-computing technologies to problems in HEP. Jean-Roch Vlimant provided an overview of their recent work at the California Institute of Technology, with collaborators from the University of Southern California, to solve an optimisation problem related to the search for Higgs bosons. Using an approach known as quantum annealing for machine learning, the team demonstrated some advantage over traditional machine-learning methods for small training datasets. Given the relative simplicity of the algorithm and its robustness to error, they report, this technique may find application in other areas of experimental particle physics, such as real-time decision making in event-selection problems and classification in neutrino physics.

Several large-scale research initiatives related to quantum-computing technologies were presented at the event, including the European Union’s €1 billion Quantum Technologies Flagship project, which involves universities and commercial partners across Europe. Presentations were also given of ambitious programmes in the US, such as the Northeast Quantum Systems Center at Brookhaven National Laboratory and the Quantum Science Program at Fermilab, which includes research areas in superconducting quantum systems, quantum algorithms for HEP, and computational problems and theory.

Perhaps most importantly, the workshop brought members of the HEP community together with leading companies working on quantum-computing technologies. Intel, IBM, Strangeworks, D-Wave, Microsoft, Rigetti and Google all presented their latest work in this area at the event. Of these companies, Intel and IBM are already working closely with CERN through CERN openlab, and Google announced at the event that it has signed an agreement to join CERN openlab.

“Now is the right time for the HEP community to get involved and engage with different quantum-computing initiatives already underway, fostering common activities and knowledge sharing,” says Federico Carminati, CERN openlab CIO and chair of the event. “With its well-established links across many of the world’s leading ICT companies, CERN openlab is ideally positioned to help drive this activity forward. We believe this first event was a great success and look forward to organising future activities in this exciting area.”

Recordings of the talks given at the workshop are available via the CERN openlab website at: openlab.cern.

ROOT’s renovation takes centre stage at Sarajevo meeting https://cerncourier.com/a/roots-renovation-takes-centre-stage-at-sarajevo-meeting/ Mon, 29 Oct 2018 14:59:25 +0000 https://preview-courier.web.cern.ch?p=13355 The spotlight was on the modernisation of the I/O subsystem, parallelisation, new graphics, multivariate tools and an interface to the Python language.

Participants of the ROOT workshop

The 11th ROOT Users’ Workshop was held on 10–13 September in Sarajevo, Bosnia and Herzegovina, at the Academy of Science and Arts: an exceptional setting that also provided an opportunity to involve Bosnia and Herzegovina in CERN’s activities.

The SoFTware Development for Experiments group in the experimental physics department at CERN drives the development of ROOT, a modular software toolkit for processing, analysing and visualising scientific data. ROOT is also a means to read and write data: LHC experiments alone produced about 1 exabyte of data stored in the ROOT file format.

Thousands of high-energy physicists use ROOT daily to produce scientific results. For the ROOT team, this is a big responsibility, especially considering the challenges Run 3 at the LHC and the High Luminosity LHC (HL-LHC) pose to all of us. Luckily, we can rely on a lively user community, whose contribution is so useful that, periodically, a ROOT users’ workshop is organised. The event’s objective is to gather together the ROOT community of users and developers to collect criticism, praise and suggestions: a unique occasion to shape the future of the ROOT project.

More than 100 people attended this year’s workshop, a 30% increase from 2015, making the event a success. What’s more, the diversity of the attendees – students, analysis physicists, software experts and framework developers – brought different levels of expertise to the event. The workshop featured 69 contributions as well as engaging discussions. Software companies participated, with three invited contributions: Peter Müßig from SAP presented OpenUI5, the core of the SAP javascript framework that will be used for ROOT’s graphical user interface; Chandler Carruth from Google discussed ways to make large-scale software projects written in C++, the language for number-crunching code in high-energy and nuclear physics (HENP), simpler, faster and safer; and Sylvain Corlay from Quantstack showed novel ways to tackle numerical analysis with multidimensional array expressions. These speakers said they enjoyed the workshop and plan to come to CERN to extend the collaboration.

ROOT’s renovation was the workshop’s main theme. To be at the bleeding edge of software technology, ROOT – which has been the cornerstone of virtually all HENP software stacks for two decades – is undergoing an intense upgrade of its key components. This effort represents an exciting time for physicists and software developers. In the event, ROOT users expressed their appreciation of the effort to make it easier to use and faster on modern computer architectures, with the sole objective of reducing the time interval between data delivery and the presentation of plots.

In particular, the spotlight was on the modernisation of the I/O subsystem, crucial for the future LHC physics programme; ROOT’s parallelisation, a prerequisite to face Run 3 and HL-LHC analyses; as well as on new graphics, multivariate tools and an interface to the Python language, which are all elements of prime importance for scientists’ everyday work.

The participants’ feedback was enthusiastic, the atmosphere was positive, and the criticism received was constructive and fruitful for the ROOT team. We thank the participating physicists and computer scientists: we appreciated your engagement and are looking forward to organising the next ROOT workshop.

Quantum thinking required https://cerncourier.com/a/viewpoint-quantum-thinking-required/ Mon, 29 Oct 2018 09:00:01 +0000 https://preview-courier.web.cern.ch/?p=12831 There will be no universal quantum computer on which we will be able to compile our C++ code and then magically run it faster, explains Federico Carminati.

Cooling technology

The High-Luminosity Large Hadron Collider (HL-LHC), due to operate in around 2026, will require a computing capacity 50–100 times greater than currently exists. The big uncertainty in this number is largely due to the difficulty in knowing how well the code used in high-energy physics (HEP) can benefit from new, hyper-parallel computing architectures as they become available. Up to now, code modernisation has been an area in which the HEP community has generally not fared too well.

We need to think differently to address the vast increase in computing requirements ahead. Before the Large Electron–Positron collider was launched in the 1980s, its computing challenges also seemed daunting; early predictions underestimated them by a factor of 100 or more. Fortunately, new consumer technology arrived and made scientific computing, hitherto dominated by expensive mainframes, suddenly more democratic and cheaper.

A similar story unfolded with the LHC, for which the predicted computing requirements were so large that IT planners offering their expert view were accused of sabotaging the project! This time, the technology that made it possible to meet these requirements was grid computing, conceived at the turn of the millennium and driven largely by the ingenuity of the HEP community.

Looking forward to the HL-LHC era, we again need to make sure the community is ready to exploit further revolutions in computing. Quantum computing is certainly one such technology on the horizon. Thanks to the visionary ideas of Feynman and others, the concept of quantum computing was popularised in the early 1980s. Since then, theorists have explored its mind-blowing possibilities, while engineers have struggled to produce reliable hardware to turn these ideas into reality.

Qubits are the basic units of quantum computing: thanks to quantum entanglement, n qubits can represent 2^n different states on which the same calculation can be performed simultaneously. A quantum computer with 79 entangled qubits has an Avogadro number of states (about 10^23); with 263 qubits, such a machine could represent as many concurrent states as there are protons in the universe; while an upgrade to 400 qubits could contain all the information encoded in the universe.
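
The exponential growth behind these figures is easy to verify with a few lines of Python:

# Number of basis states that n entangled qubits can represent simultaneously is 2^n.
for n in (79, 263, 400):
    print(f"{n:3d} qubits -> 2^{n} = about {2 ** n:.2e} states")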

However, the road to unlocking this potential – even partially – is long and arduous. Measuring the quantum states that result from a computation can prove difficult, offsetting some of the potential gains. Also, since classical logic operations tend to destroy the entangled state, quantum computers require special reversible gates. The hunt has been on for almost 30 years for algorithms that could outperform their classical counterparts. Some have been found, but it seems clear that there will be no universal quantum computer on which we will be able to compile our C++ code and then magically run it faster. Instead, we will have to recast our algorithms and computing models for this brave new quantum world.

In terms of hardware, progress is steady but the prizes are still a long way off. The qubit entanglement in existing prototypes, even when cooled to the level of millikelvins, is easily lost and the qubit error rate is still painfully high. Nevertheless, a breakthrough in hardware could be achieved at any moment.

A few pioneers are already experimenting with HEP algorithms and simulations on quantum computers, with significant quantum-computing initiatives having been announced recently in both Europe and the US. In CERN openlab, we are now exploring these opportunities in collaboration with companies working in the quantum-computing field – kicking things off with a workshop at CERN in November (see below).

The HEP community has a proud tradition of being at the forefront of computing. It is therefore well placed to make significant contributions to the development of quantum computing – and stands to benefit greatly, if and when its enormous potential finally begins to be realised.

US initiative to tackle data demands of HL-LHC https://cerncourier.com/a/us-initiative-to-tackle-data-demands-of-hl-lhc/ Fri, 28 Sep 2018 13:27:18 +0000 https://preview-courier.web.cern.ch/?p=12711 It will receive $5 million per year for a period of five years.

Possible signal

The US National Science Foundation (NSF) has launched a $25 million effort to help tackle the torrent of data from the High-Luminosity Large Hadron Collider (HL-LHC). The Institute for Research and Innovation in Software for High-Energy Physics (IRIS-HEP), announced on 4 September, brings together multidisciplinary teams of researchers and educators from 17 universities in the US. It will receive $5 million per year for a period of five years, with a focus on developing new software tools, algorithms, system designs and training the next generation of users.

Construction for the HL-LHC upgrade is already under way (CERN Courier July/August 2018 p7) and the machine is expected to reach full capability in the mid-2020s. Boosting the LHC’s luminosity by a factor of almost 10, HL-LHC will collect around 25 times more data than the LHC has produced up to now and push data processing and storage to the limit. How to address the immense computing challenges ahead was the subject of a recent community white paper published by the HEP Software Foundation (CERN Courier April 2018 p38).

In 2016, the NSF convened a project to gauge the LHC data challenge, bringing together representatives from the high-energy physics and computer-science communities to review two decades of successful LHC data-processing approaches and discuss ways to address the obstacles that lay ahead. The new software institute emerged from that effort.

The institute is primarily about people, rather than computing hardware, explains IRIS-HEP principal investigator and executive director Peter Elmer of Princeton University, who is also a member of the CMS collaboration. “The institute will be virtual, with a core at Princeton, but coordinated as a single distributed collaborative project involving the participating universities similar to many activities in high-energy physics,” he says. “High-energy physics had a rush of discoveries in the 1960s and 1970s that led to the Standard Model of particle physics, and the Higgs boson was the last missing piece of that puzzle. We are now searching for the next layer of physics beyond the Standard Model. The software institute will be key to getting us there.”

Co-funded by NSF’s Office of Advanced Cyberinfrastructure (OAC) and the NSF division of physics, IRIS-HEP is the third OAC software institute, following the Molecular Sciences Software Institute and the Science Gateways Community Institute.

“Our US colleagues worked with us very closely preparing the community white paper last year, which was then used as one of the significant inputs into the NSF proposal,” says Graeme Stewart of CERN and the HEP Software Foundation. “So we’re really happy about the funding announcement and very much looking forward to working together with them.”

Boosting high-performance computing in Nepal https://cerncourier.com/a/boosting-high-performance-computing-in-nepal/ Fri, 31 Aug 2018 08:00:00 +0000 https://preview-courier.web.cern.ch/?p=12580 On 28 June, 200 servers from the CERN computing centre were donated to Kathmandu University in Nepal.

The post Boosting high-performance computing in Nepal appeared first on CERN Courier.

]]>
Computing equipment

On 28 June, 200 servers from the CERN computing centre were donated to Kathmandu University (KU) in Nepal. The equipment, which is no longer needed by CERN, will contribute towards a new high-performance computing facility for research and educational purposes.

With more than 15,000 students across seven schools, KU is the second largest university in Nepal. But infrastructure and resources for carrying out research are still minimal compared to universities of similar size in Europe and the US. For example, the KU school of medicine is forced to periodically delete medical imaging data because disk storage is at a premium, undermining the value of the data for preventative screening of diseases, or for population health studies. Similarly, R&D projects in the schools of science and engineering meet their computing needs by borrowing time abroad, either through online data transfer, hampered by limited bandwidth, or by physically taking data tapes to institutes abroad for analysis.

“We cannot emphasise enough the need for a high-performance computing facility at KU, and, speaking of the larger national context, in Nepal,” says Rajendra Adhikari, an assistant professor of physics at KU. “The server donation from CERN to KU will have a historically significant impact in fundamental research and development at KU and in Nepal.”

A total of 184 CPU servers and 16 disk servers, in addition to 12 network switches, were shipped from CERN to KU. The CPU servers’ capacity represents more than 2500 processor cores and 8 TB of memory, while the disk servers will provide more than 700 TB of storage. The total computing capacity is equivalent to more than 2000 typical desktop computers.

Since 2012, CERN has regularly donated computing equipment that no longer meets its highly specific requirements but is still more than adequate for less exacting environments. To date, a total of 2079 servers and 123 network switches have been donated to countries and international organisations, namely Algeria, Bulgaria, Ecuador, Egypt, Ghana, Mexico, Morocco, Pakistan, the Philippines, Senegal, Serbia, the SESAME laboratory in Jordan, and now Nepal. In the process leading up to the KU donation, the government of Nepal and CERN signed an International Cooperation Agreement to formalise their relationship (CERN Courier October 2017 p28).

“It is our hope that the server handover is one of the first steps of scientific partnership. We are committed to accelerate the local research programme, and to collaborate with CERN and its experiments in the near future,” says Adhikari.

The post Boosting high-performance computing in Nepal appeared first on CERN Courier.

]]>
News On 28 June, 200 servers from the CERN computing centre were donated to Kathmandu University in Nepal. https://cerncourier.com/wp-content/uploads/2018/08/CCSep18News-nepal.jpg
Higgs centre opens for business https://cerncourier.com/a/higgs-centre-opens-for-business/ Mon, 09 Jul 2018 10:56:38 +0000 https://preview-courier.web.cern.ch/?p=12360 A new facility called the Higgs Centre for Innovation opened at the Royal Observatory in Edinburgh on 25 May as part of the UK government’s efforts to boost productivity and innovation. The centre, named after Peter Higgs of the University of Edinburgh, who shared the 2013 Nobel Prize in physics for his theoretical work on the […]

The post Higgs centre opens for business appeared first on CERN Courier.

]]>
Higgs Centre for Innovation

A new facility called the Higgs Centre for Innovation opened at the Royal Observatory in Edinburgh on 25 May as part of the UK government’s efforts to boost productivity and innovation. The centre, named after Peter Higgs of the University of Edinburgh, who shared the 2013 Nobel Prize in physics for his theoretical work on the Higgs mechanism, will offer start-up companies direct access to academics and industry experts. Space-related technology and big-data analytics are the intended focus, and up to 12 companies will be based there at any one time. According to a press release from the UK Science and Technology Facilities Council (STFC), the facility incorporates laboratories and working spaces for researchers, and includes a business incubation centre based on the successful European Space Agency model already in operation in the UK.

“Professor Higgs’ theoretical work could only be proven by collaboration in different scientific fields, using technology built through joint international ventures,” said principal and vice-chancellor of the University of Edinburgh Peter Mathieson. “This reflects the aims and values of the Higgs Centre for Innovation, which bring scientists, engineers and students together under one roof to work together for the purpose of bettering our understanding of space-related science and driving technological advancement forward.”

The Higgs Centre for Innovation was funded through a £10.7 million investment from the UK government via STFC, which is also investing £2 million over the next five years to operate the centre.

The post Higgs centre opens for business appeared first on CERN Courier.

]]>
News https://cerncourier.com/wp-content/uploads/2018/07/CCJulAug18_News-Higgs.jpg
Learning machine learning https://cerncourier.com/a/learning-machine-learning/ Mon, 09 Jul 2018 11:30:53 +0000 https://preview-courier.web.cern.ch/?p=12499 The Yandex machine-learning school for high-energy physics trains young researchers in the art of deep learning

The post Learning machine learning appeared first on CERN Courier.

]]>
Electromagnetic shower identification

Machine learning, whereby the ability of a computer to perform an intelligent task progressively improves, has penetrated many scientific domains. It allows researchers to tackle problems from a completely new perspective, enabling improvements to things previously thought solved for good. The downside of machine learning is that the field itself is developing so quickly, with new techniques popping up at an incredible rate, that it is hard to keep up. What is needed is some sort of high-level trigger to discriminate between good and bad, and to guide a growing community of users in a systematic way.

Machine-learning techniques are already in wide use in particle physics, and they will only become more prevalent during the coming years of the high-luminosity LHC and future colliders. Online data processing, offline data analysis, fast Monte Carlo generation techniques and detector-upgrade optimisation are just a few examples of the areas that could profit significantly from smarter algorithms (see The rise of deep learning).

The most remarkable growth trend in machine learning today, and one that has also been heavily hyped, concerns so-called deep learning. Although there is no strict boundary, a neural network with fewer than four layers is considered “shallow”, while one with more than 10 layers and many thousands of connections is considered “deep”. Using deep-learning algorithms, powerful computing resources and extremely large datasets, researchers have managed to break important barriers in tasks such as text translation, voice recognition and image segmentation, and even to master the game of Go. Many of the educational materials one can find on the Internet are thus focused on typical tasks such as image recognition, annotation, segmentation, text processing and pattern generation.
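
To make the shallow-versus-deep distinction concrete, the short Python sketch below contrasts the two kinds of network. It is purely illustrative: the choice of the PyTorch library, the layer widths and the input size are assumptions of this example, not material from the school.

import torch
import torch.nn as nn

n_features = 20  # hypothetical number of input variables per collision event

# Three layers of learnable weights: generally considered "shallow".
shallow = nn.Sequential(
    nn.Linear(n_features, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)

# Twelve hidden layers plus an output layer: firmly in "deep" territory.
hidden = []
for i in range(12):
    hidden += [nn.Linear(n_features if i == 0 else 64, 64), nn.ReLU()]
deep = nn.Sequential(*hidden, nn.Linear(64, 1), nn.Sigmoid())

x = torch.randn(128, n_features)          # a toy batch of 128 events
print(shallow(x).shape, deep(x).shape)    # both give one score per event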

Since most of these are conveyed in computer-science language, there is an obvious language barrier for domain-specific scientists, such as particle physicists, who have to learn a new technique and apply it to their own research. Another complication is that there are a variety of machine-learning methods capable of solving particular problems and a plenitude of tools (i.e. different languages, packages and platforms) out there – almost all of which are online – with which to implement those methods.

Targeting particle physics

As machine learning spreads into new domains such as astrophysics or biology, schools that focus on problems in specific areas are becoming more popular. Historically, several summer schools for particle physicists have focused on data analysis, computing and statistical learning – in particular the CERN School of Computing, the INFN School of Statistics and the CMS Data Analysis School. But, until 2014, none focused specifically on machine learning. In that year, a series with the straightforward title Machine-Learning School for High-Energy Physics (MLHEP) was launched.

MLHEP grew out of the well-established Yandex School of Data Analysis (YSDA), a non-commercial educational organisation funded by the Russia-based internet firm Yandex. Over the past decade, YSDA has grown to receive several thousand applications per year, out of which around 200 people pass the entrance exams and around 50 graduate in conjunction with leading Russian universities – almost all of them finding data-science positions in the private sector.

Simulated overlapping electromagnetic showers

In 2015, YSDA joined the LHCb collaboration. The goal was to help optimise LHCb’s high-level trigger system to improve its efficiency for selecting B-decay events, and the result of the LHCb-YSDA collaboration was an efficiency gain of up to 60% compared to that obtained during LHC Run I. Another early joint effort between YSDA, CERN and MIT within LHCb was the design of decision-tree algorithms capable of decorrelating their output from a given variable, such as invariant mass.

The first MLHEP schools in 2015 and 2016 were satellite events at the Large Hadron Collider Physics (LHCP) conference held in St. Petersburg and Lund, respectively. Another key contributor to the school was the faculty of computer science at Russia’s Higher School of Economics (HSE), which was founded in 2014 by Yandex. MLHEP 2017 was organised by Imperial College London in the UK, and the 2018 school takes place in Oxford at the beginning of August.

The topics covered during the schools usually start from the basic aspects of machine learning, such as loss functions, optimisation methods, predictive-model quality validation, and stretch towards advanced techniques like generative adversarial networks and Bayesian optimisation. The curriculum is not static, and each year the focus changes to address the most interesting and promising trends in deep learning while providing an overview of various techniques available on the market. At the 2018 school, speakers were invited from both academia and from companies, including Oracle, Nvidia, Yandex and DeepMind.

Breaking the language barrier

Some people compare deep learning not with a tool or platform, but with a language that allows a researcher to express computational “sentences” addressing a particular problem.

To reinforce the language analogy, recall that there is no solid theory of deep learning yet; in a sense it is just a bunch of best-practices and approaches that has proven to work in several important cases. A lot of the time during MLHEP classes is therefore devoted to practical exercises. School students are also encouraged to enter a data-science competition that is related to particle physics – e.g. tracking for the Coherent Muon to Electron Transition (COMET) experiment and event selection for the Higgs-boson discovery by the ATLAS and CMS experiments. The competition is published on the machine-learning competition platform kaggle.com at the start of the school, and is open for anyone who wants to get more machine-learning practice.

For summer 2017, the competition was organised together with the OPERA and SHiP collaborations. The goal was to analyse volumes of nuclear emulsions collected by OPERA that contain lots of cosmic-background tracks as well as tracks from electromagnetic showers. These shower-like structures are of interest for OPERA for the analysis of tau-neutrino interactions, so special algorithms have to be developed. Such algorithms are also very relevant to the SHiP experiment, which aims to use emulsion-based detectors for finding hidden-sector particles at CERN. According to some theoretical models, such showers might be closely related to hidden-sector particle interaction with regular matter (e.g. elastic scattering of very weakly-interacting particles off electrons or nuclei), so a separate task would be to discriminate these showers from neutrino interactions. The performance of the algorithms designed by participants was amazing. The winner of the challenge presented his solution at the SHiP collaboration meeting in November 2017 and was invited by OPERA to continue the collaboration.

A major part of the MLHEP curriculum is given by YSDA/HSE lecturers, and guest speakers help to broaden the view on the machine-learning challenges and methods. The school is non-commercial, and its success depends on external contributions from the HSE, YSDA, local organisers and commercial sponsors. For the past two years we have been supported by the Marie Skłodowska-Curie training network AMVA4NewPhysics, which has also sent several PhD students to the school.

The format of the summer school is very productive, allowing students to dive into the topics without distraction. The school materials also remain available at GitHub, allowing students to access them whenever they want. As time goes by, basic machine-learning courses are becoming more readily available online, giving us a chance to introduce more advanced topics every year and to keep up with the rapid developments in this field.

The post Learning machine learning appeared first on CERN Courier.

]]>
Feature The Yandex machine-learning school for high-energy physics trains young researchers in the art of deep learning https://cerncourier.com/wp-content/uploads/2018/07/CCJulAug_Yandex_shower.png
The rise of deep learning https://cerncourier.com/a/the-rise-of-deep-learning/ Mon, 09 Jul 2018 11:15:46 +0000 https://preview-courier.web.cern.ch/?p=12510 Deep learning is bringing new levels of performance to the analysis of high-energy physics data

The post The rise of deep learning appeared first on CERN Courier.

]]>

It is 1965 and workers at CERN are busy analysing photographs of trajectories of particles travelling through a bubble chamber. These and other scanning workers were employed by CERN and laboratories across the world to manually scan countless such photographs, seeking to identify specific patterns contained in them. It was their painstaking work – which required significant skill and a lot of visual effort – that put particle physics in high gear. Researchers used the photographs (see figures 1 and 3) to make discoveries that would form a cornerstone of the Standard Model of particle physics, such as the observation of weak neutral currents with the Gargamelle bubble chamber in 1973.

In the subsequent decades the field moved away from photographs to collision data collected with electronic detectors. Not only had data volumes become unmanageable, but Moore’s law had begun to take hold and a revolution in computing power was under way. The marriage between high-energy physics and computing was to become one of the most fruitful in science. Today, the Large Hadron Collider (LHC), with its hundreds of millions of proton–proton collisions per second, generates data at a rate of 25 GB/s – leading the CERN data centre to pass the milestone of 200 PB of permanently archived data last summer. Modelling, filtering and analysing such datasets would be impossible had the high-energy-physics community not invested heavily in computing and a distributed-computing network called the Grid.

Learning revolution

The next paradigm change in computing, now under way, is based on artificial intelligence. The so-called deep learning revolution of the late 2000s has significantly changed how scientific data analysis is performed, and has brought machine-learning techniques to the forefront of particle-physics analysis. Such techniques offer advances in areas ranging from event selection to particle identification to event simulation, accelerating progress in the field while offering considerable savings in resources. In many cases, images of particle tracks are making a comeback – although in a slightly different form from their 1960s counterparts.

Fig. 1.

Artificial neural networks are at the centre of the deep learning revolution. These algorithms are loosely based on the structure of biological brains, which consist of networks of neurons interconnected by signal-carrying synapses. In artificial neural networks these two entities – neurons and synapses – are represented by mathematical equivalents. During the algorithm’s “training” stage, the values of parameters such as the weights representing the synapses are modified to lower the overall error rate and improve the performance of the network for a particular task. Possible tasks vary from identifying images of people’s faces to isolating the particles into which the Higgs boson decays from a background of identical particles produced by other Standard Model processes.
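
As a toy illustration of the “training” stage described above, the Python sketch below (using PyTorch, with invented data and an arbitrary network size) adjusts the weights by gradient descent to lower a simple signal-versus-background classification error.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.BCEWithLogitsLoss()                 # signal-versus-background loss
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(256, 10)                         # toy "events", 10 features each
y = (x[:, 0] > 0).float().unsqueeze(1)           # toy labels standing in for signal/background

for epoch in range(50):
    optimiser.zero_grad()
    loss = loss_fn(model(x), y)                  # how wrong the network currently is
    loss.backward()                              # gradients with respect to the "synapse" weights
    optimiser.step()                             # nudge the weights to lower the error rate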

Artificial neural networks have been around since the 1960s. But it took several decades of theoretical and computational development for these algorithms to outperform humans in some specific tasks. For example: in 1996, IBM’s chess-playing computer Deep Blue won its first game against the then world chess champion Garry Kasparov; in 2016 Google DeepMind’s AlphaGo deep neural-network algorithm defeated the best human players in the game of Go; modern self-driving cars are powered by deep neural networks; and in December 2017 the latest DeepMind algorithm, called AlphaZero, learned how to play chess in just four hours and defeated the world’s best chess-playing computer program. So important is artificial intelligence in potentially addressing intractable challenges that the world’s leading economies are establishing dedicated investment programmes to better harness its power.

Computer vision

The immense computing and data challenges of high-energy physics are ideally suited to modern machine-learning algorithms. Because the signals measured by particle detectors are stored digitally, it is possible to recreate an image from the outcome of particle collisions. This is most easily seen for cases where detectors offer discrete pixelised position information, such as in some neutrino experiments, but it also applies, on a more complex basis, to collider experiments. Not long after computer-vision techniques, which are based on so-called convolutional neural networks (figure 2), were applied to the analysis of everyday images, particle physicists applied them to detector images – first of jets and then of photons, muons and neutrinos – simplifying the task of understanding ever-larger and more abstract datasets and making it more intuitive.
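
To give a flavour of how such detector “images” are fed to a convolutional network, here is a minimal Python sketch; the image size, channel counts and three output classes are illustrative assumptions rather than any experiment’s actual architecture.

import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),   # one channel: energy deposited per pixel
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 3),                                # e.g. three event classes
)

images = torch.randn(64, 1, 32, 32)   # a toy batch of 64 detector "images" of 32x32 pixels
scores = cnn(images)                  # one score per class for each image
print(scores.shape)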

Fig. 2.

Particle physicists were among the first to use artificial-intelligence techniques in software development, data analysis and theoretical calculations. The first of a series of workshops on this topic, titled Artificial Intelligence in High-Energy and Nuclear Physics (AIHENP), was held in 1990. At the time, several changes were taking effect. For example, neural networks were being evaluated for event-selection and analysis purposes, and theorists were calling on algebraic or symbolic artificial-intelligence tools to cope with a dramatic increase in the number of terms in perturbation-theory calculations.

Over the years, the AIHENP series was renamed ACAT (Advanced Computing and Analysis Techniques) and expanded to span a broader range of topics. However, following a new wave of adoption of machine learning in particle physics, the focus of the 18th edition of the workshop, ACAT 2017, was again machine learning – featuring its role in event reconstruction and classification, fast simulation of detector response, measurements of particle properties, and AlphaGo-inspired calculations of Feynman loop integrals, to name a few examples.

Learning challenge

For these advances to happen, machine-learning algorithms had to improve and a physics community dedicated to machine learning needed to be built. In 2014 a machine-learning challenge set up by the ATLAS experiment to identify the Higgs boson garnered close to 2000 participants on the machine-learning competition platform Kaggle. To the surprise of many, the challenge was won by a computer scientist armed with an ensemble of artificial neural networks. In 2015 the Inter-experimental LHC Machine Learning working group was born at CERN out of a desire of physicists from across the LHC to have a platform for machine-learning work and discussions. The group quickly grew to include all the LHC experiments and to involve others outside CERN, like the Belle II experiment in Japan and neutrino experiments worldwide. More dedicated training efforts in machine learning are now emerging, including the Yandex machine learning school for high-energy physics and the INSIGHTS and AMVA4NewPhysics Marie Skłodowska-Curie Innovative Training Networks (see Learning machine learning).

Fig. 3.

Event selection, reconstruction and classification are arguably the most important particle-physics tasks to which machine learning has been applied. As in the time of manual scanning, when the photographs of particle trajectories were analysed to select events of potential physics interest, modern trigger systems are used by many particle-physics experiments, including those at the LHC, to select events for further analysis (figure 3). The decision of whether to save or throw away an event has to be made in a split microsecond and requires specialised hardware located directly on the trigger systems’ logic boards. In 2010 the CMS experiment introduced machine-learning algorithms to its trigger system to better estimate the momentum of muons, which may help identify physics beyond the Standard Model. At around the same time, the LHCb experiment also began to use such algorithms in their trigger system for event selection.
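
As a rough, hypothetical illustration of the kind of regression task involved – and emphatically not the CMS implementation – the Python sketch below fits a gradient-boosted decision-tree regressor to toy trigger-level quantities, with an invented relation between bending angle and transverse momentum.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 10_000
bend_angle = rng.uniform(0.01, 0.5, n)                  # hypothetical trigger primitive
eta = rng.uniform(-2.4, 2.4, n)                         # hypothetical pseudorapidity
pt_true = 1.0 / bend_angle + rng.normal(0.0, 1.0, n)    # invented relation: pT roughly inverse to bending

X = np.column_stack([bend_angle, eta])
model = GradientBoostingRegressor(n_estimators=200, max_depth=3)
model.fit(X, pt_true)                                   # learn to estimate momentum from trigger quantities

print(model.predict(X[:5]))                             # estimated pT for the first few toy muons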

Neutrino experiments such as NOvA and MicroBooNE at Fermilab in the US have also used computer-vision techniques to reconstruct and classify various types of neutrino events. In the NOvA experiment, using deep learning techniques for such tasks is equivalent to collecting 30% more data, or alternatively building and using more expensive detectors – potentially saving global taxpayers significant amounts of money. Similar efficiency gains are observed by the LHC experiments.

Currently, about half of the Worldwide LHC Computing Grid’s computing budget is spent simulating the numerous possible outcomes of high-energy proton–proton collisions. To achieve a detailed understanding of the Standard Model and any physics beyond it, a tremendous number of such Monte Carlo events needs to be simulated. But despite the best efforts by the community worldwide to optimise these simulations, the speed is still a factor of 100 short of the needs of the High-Luminosity LHC, which is scheduled to start taking data around 2026. If a machine-learning model could directly learn the properties of the reconstructed particles and bypass the complicated simulation process of the interactions between the particles and the material of the detectors, it could lead to simulations orders of magnitude faster than those currently available.

Competing networks

One idea for such a model relies on algorithms called generative adversarial networks (GANs). In these algorithms, two neural networks compete with each other for a particular goal, with one of them acting as an adversary that the other network is trying to fool. CERN’s openlab and software for experiments group, along with others in the LHC community and industry partners, are starting to see the first results of using GANs for faster event and detector simulations.
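
The adversarial setup can be sketched in a few lines of Python; the toy below, with an invented ten-number “event” and arbitrary hyperparameters, is meant only to show the two competing networks, not the models being developed by CERN openlab and its partners.

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10))   # generator: noise -> fake "event"
D = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))    # discriminator: event -> real/fake score

bce = nn.BCEWithLogitsLoss()
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)

real = torch.randn(128, 10) + 2.0          # stand-in for fully simulated events

for step in range(200):
    # 1) train the discriminator to tell real events from generated ones
    fake = G(torch.randn(128, 16)).detach()
    loss_D = bce(D(real), torch.ones(128, 1)) + bce(D(fake), torch.zeros(128, 1))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # 2) train the generator to fool the discriminator
    fake = G(torch.randn(128, 16))
    loss_G = bce(D(fake), torch.ones(128, 1))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()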

Particle physics has come a long way from the heyday of manual scanners in understanding elementary particles and their interactions. But there are gaps in our understanding of the universe that need to be filled – the nature of dark matter, dark energy, matter–antimatter asymmetry, neutrinos and colour confinement, to name a few. High-energy physicists hope to find answers to these questions using the LHC and its upcoming upgrades, as well as future lepton colliders and neutrino experiments. In this endeavour, machine learning will most likely play a significant part in making data processing, data analysis and simulation, and many other tasks, more efficient.

Driven by the promise of great returns, big companies such as Google, Apple, Microsoft, IBM, Intel, Nvidia and Facebook are investing hundreds of millions of dollars in deep learning technology including dedicated software and hardware. As these technologies find their way into particle physics, together with high-performance computing, they will boost the performance of current machine-learning algorithms. Another way to increase the performance is through collaborative machine learning, which involves several machine-learning units operating in parallel. Quantum algorithms running on quantum computers might also bring orders-of-magnitude improvement in algorithm acceleration, and there are probably more advances in store that are difficult to predict today. The availability of more powerful computer systems together with deep learning will likely allow particle physicists to think bigger and perhaps come up with new types of searches for new physics or with ideas to automatically extract and learn physics from the data.

That said, machine learning in particle physics still faces several challenges. Some of the most significant include understanding how to treat systematic uncertainties while employing machine-learning models and interpreting what the models learn. Another challenge is how to make complex deep learning algorithms work in the tight time window of modern trigger systems, to take advantage of the deluge of data that is currently thrown away. These challenges aside, the progress we are seeing today in machine learning and in its application to particle physics is probably just the beginning of the revolution to come.

The post The rise of deep learning appeared first on CERN Courier.

]]>
Feature Deep learning is bringing new levels of performance to the analysis of high-energy physics data https://cerncourier.com/wp-content/uploads/2018/07/CCJulAug18_Superhuman-frontis.jpg
Flooded LHC data centre back in business https://cerncourier.com/a/flooded-lhc-data-centre-back-in-business/ Thu, 19 Apr 2018 11:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/flooded-lhc-data-centre-back-in-business/ Following severe damage caused by flooding on 9 November, the INFN-CNAF Tier-1 data centre of the Worldwide LHC Computing Grid (WLCG) in Bologna, Italy, has been fully repaired and is back in business crunching LHC data. The incident was caused by the burst of a large water pipe at high pressure in a nearby street, […]

The post Flooded LHC data centre back in business appeared first on CERN Courier.

]]>

Following severe damage caused by flooding on 9 November, the INFN-CNAF Tier-1 data centre of the Worldwide LHC Computing Grid (WLCG) in Bologna, Italy, has been fully repaired and is back in business crunching LHC data. The incident was caused by the burst of a large water pipe at high pressure in a nearby street, which rapidly flooded the area where the data centre is located. Although the centre was designed to be waterproof against natural events, the volume of water was overwhelming: some 500 m3 of water and mud entered the various rooms, seriously damaging electronic appliances, computing servers, network and storage equipment. A room hosting four 1.4 MW electrical-power panels was filled first, leaving the centre without electricity.

The Bologna centre, which is one of 14 Tier-1 WLCG centres located around the world, hosts a good fraction of LHC data and associated computing resources. It is equipped with around 20,000 CPU cores, 25 PB of disk storage, and a tape library presently filled with about 50 PB of data. Offline computing activities for the LHC experiments were immediately affected. About 10% of the servers, disks, tape cartridges and computing nodes were reached by floodwater, and the mechanics of the tape library were also affected.

Despite the scale of the damage, INFN-CNAF personnel were not discouraged, quickly defining a roadmap to recovery and then attacking one by one all the affected subsystems. First, the rooms at the centre had to be dried and then meticulously cleaned to remove residual mud. Then, within a few weeks, new electrical panels were installed to allow subsystems to be turned back on.

Although all LHC disk-storage systems were reached by the water, the INFN-CNAF personnel were able to recover the data in their entirety, without losing a single bit. This was thanks in part to the available level of redundancy of the disk arrays and to their vertical layout. Wet tape cartridges hosting critical LHC data had to be sent to a specialised laboratory for data recovery.

A dedicated computing farm was set up very quickly at the nearby Cineca computing centre and connected to INFN-CNAF via a high-speed 400 Gbps link to enable the centre to reach the required LHC capacity for 2018. During March, three months after the incident, all LHC experiments were progressively brought back online. Following the successful recovery, INFN is planning to move the centre to a new site in the coming years.

The post Flooded LHC data centre back in business appeared first on CERN Courier.

]]>
News https://cerncourier.com/wp-content/uploads/2018/06/CCMay18-news-data.jpg
Time to adapt for big data https://cerncourier.com/a/time-to-adapt-for-big-data/ Fri, 23 Mar 2018 11:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/time-to-adapt-for-big-data/ A report calls for radical changes in computing and software to ensure the success of high-energy physics experiments into the 2020s

The post Time to adapt for big data appeared first on CERN Courier.

]]>

It would be impossible for anyone to conceive of carrying out a particle-physics experiment today without the use of computers and software. Since the 1960s, high-energy physicists have pioneered the use of computers for data acquisition, simulation and analysis. This hasn’t just accelerated progress in the field, but driven computing technology generally – from the development of the World Wide Web at CERN to the massive distributed resources of the Worldwide LHC Computing Grid (WLCG) that supports the LHC experiments. For many years these developments and the increasing complexity of data analysis rode a wave of hardware improvements that saw computers get faster every year. However, those blissful days of relying on Moore’s law are now well behind us (see “CPU scaling comes to the end of an era”), and this has major ramifications for our field.

The high-luminosity upgrade of the LHC (HL-LHC), due to enter operation in the mid-2020s, will push the frontiers of accelerator and detector technology, bringing enormous challenges to software and computing (CERN Courier October 2017 p5). The scale of the HL-LHC data challenge is staggering: the machine will collect almost 25 times more data than the LHC has produced up to now, and the total LHC dataset (which already stands at almost 1 exabyte) will grow many times larger. If the LHC’s ATLAS and CMS experiments project their current computing models to Run 4 of the LHC in 2026, the CPU and disk space required will jump by between a factor of 20 to 40 (figures 1 and 2).

Even with optimistic projections of technological improvements there would be a huge shortfall in computing resources. The WLCG hardware budget is already around 100 million Swiss francs per year and, given the changing nature of computing hardware and slowing technological gains, it is out of the question to simply throw more resources at the problem and hope things will work out. A more radical approach for improvements is needed. Fortunately, this comes at a time when other fields have started to tackle data-mining problems of a comparable scale to those in high-energy physics – today’s commercial data centres crunch data at prodigious rates and exceed the size of our biggest Tier-1 WLCG centres by a large margin. Our efforts in software and computing therefore naturally fit into and can benefit from the emerging field of data science.

A new way to approach the high-energy physics (HEP) computing problem began in 2014, when the HEP Software Foundation (HSF) was founded. Its aim was to bring the HEP software community together and find common solutions to the challenges ahead, beginning with a number of workshops organised by a dedicated startup team. In the summer of 2016 the fledgling HSF body was charged by WLCG leaders to produce a roadmap for HEP software and computing. With help from a planning grant from the US National Science Foundation, at a meeting in San Diego in January 2017, the HSF brought community and non-HEP experts together to gather ideas in a world much changed from the time when the first LHC software was created. The outcome of this process was summarised in a 90-page-long community white paper released in December last year.

The report doesn’t just look at the LHC but considers common problems across HEP, including neutrino and other “intensity-frontier” experiments, Belle II at KEK, and future linear and circular colliders. In addition to improving the performance of our software and optimising the computing infrastructure itself, the report also explores new approaches that would extend our physics reach as well as ways to improve the sustainability of our software to match the multi-decade lifespan of the experiments.

Almost every aspect of HEP software and computing is presented in the white paper, detailing the R&D programmes necessary to deliver the improvements the community needs. HSF members looked at all steps from event generation and data taking up to final analysis, each of which presents specific challenges and opportunities.

Souped-up simulation

Every experiment needs to be grounded in our current knowledge of physics, which means that generating simulated physics events is essential. For much of the current HEP experiment programme it is sufficient to generate events based on leading-order calculations – a relatively modest task in terms of computing requirements. However, already at Run 2 of the LHC there is an increasing demand for next-to-leading order, or even next-to-next-to-leading order, event generators to allow more precise comparisons between experiments and the Standard Model predictions (CERN Courier April 2017 p18). These calculations are particularly challenging both in terms of the software (e.g. handling difficult integrations) and the mathematical technicalities (e.g. minimising negative event weights), which greatly increase the computational burden. Some physics analyses based on Run-2 data are limited by theoretical uncertainties and, by Run 4 in the mid-2020s, this problem will be even more widespread. Investment in technical improvements of the computation is therefore vital, in addition to progress in our underlying theoretical understanding.

Increasingly large and sophisticated detectors, and the search for rarer processes hidden amongst large backgrounds, means that particle physicists need ever-better detector simulation. The models describing the passage of particles through the detector need to be improved in many areas for high precision work at the LHC and for the neutrino programme. With simulation being such a huge consumer of resources for current experiments (often representing more than half of all computing done), it is a key area to adapt to new computing architectures.

Vectorisation, whereby processors can execute identical arithmetic instructions on multiple pieces of data, would force us to give up the simplicity of simulating each particle individually. The best way to do this is one of the most important R&D topics identified by the white paper. Another is to find ways to reduce the long simulation times required by large and complex detectors, which exacerbates the problem of creating simulated data sets with sufficiently high statistics. This requires research into generic toolkits for faster simulation. In principle, mixing and digitising the detector hits at high pile-up is a problem that is particularly suited for parallel processing on new concurrent computing architectures – but only if the rate at which data is read can be managed.
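
As a simple illustration of what vectorisation means here, the toy Python/NumPy sketch below transports a large batch of particles in a single array operation instead of looping over them one by one (straight-line transport only, with arbitrary numbers).

import numpy as np

n = 1_000_000
pos = np.random.rand(n, 3)                                     # toy particle positions
direction = np.random.randn(n, 3)
direction /= np.linalg.norm(direction, axis=1, keepdims=True)  # unit direction vectors
step = np.random.exponential(0.1, size=(n, 1))                 # toy step lengths

# One vectorised statement moves every particle at once, letting the processor
# apply identical arithmetic to many particles in parallel.
pos = pos + step * direction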

This shift to newer architectures is equally important for our software triggers and event-reconstruction code. Investing more effort in software triggers, such as those already being developed by the ALICE and LHCb experiments for LHC Run 3, will help control the data volumes and enable analyses to be undertaken directly from initial reconstruction by avoiding an independent reprocessing step. For ATLAS and CMS, the increased pile-up at high luminosity makes charged-particle tracking within a reasonable computing budget a critical challenge (figure 3). Here, as well as the considerable effort required to make our current code ready for concurrent use, research is needed into the use of new, more “parallelisable” algorithms, which maintain physics accuracy. Only these would allow us to take advantage of the parallel capabilities of modern processors, including GPUs (just like the gaming industry has done, although without the need there to treat the underlying physics with such care). The use of updated detector technology such as track triggers and timing detectors will require software developments to exploit this additional detector information.

For final data analysis, a key metric for physicists is “time to insight”, i.e. how quickly new ideas can be tested against data. Maintaining that agility will be a huge challenge given the number of events physicists have to process and the need to keep the overall data volume under control. Currently a number of data-reduction steps are used, aiming at a final dataset that can fit on a laptop but bloating the storage requirements by creating many intermediate data products. In the future, access to dedicated analysis facilities that are designed for a fast turnaround without tedious data reduction cycles may serve the community’s needs better.

This draws on trends in the data-analytics industry, where a number of products, such as Apache Spark, already offer such a data-analysis model. However, HEP data is usually more complex and highly structured, and integration between the ROOT analysis framework and new systems will require significant work. This may also lend itself better to approaches where analysts concentrate on describing what they want to achieve and a back-end engine takes care of optimising the task for the underlying hardware resource. These approaches also integrate better with data-preservation requirements, which are increasingly important for our field. Over and above preserving the underlying bits of data, a fundamental challenge is to preserve knowledge about how to use this data. Preserved knowledge can help new analysts to start their work more quickly, so there would be quite tangible immediate benefits to this approach.
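
One concrete example of this declarative style is the RDataFrame interface in ROOT: the analyst states the selection and the quantity of interest, and the back end decides how to run (and parallelise) the event loop. The Python sketch below uses a hypothetical input file and column names.

import ROOT

ROOT.EnableImplicitMT()        # let the back end use several cores; the analyst does not manage this

# Hypothetical input: a tree called "Events" stored in a local or remote ROOT file.
df = ROOT.RDataFrame("Events", "events.root")

# Hypothetical column names; the strings are compiled and optimised by the engine.
hist = (df.Filter("n_muons == 2")
          .Define("dimuon_mass",
                  "sqrt(2*mu_pt1*mu_pt2*(cosh(mu_eta1 - mu_eta2) - cos(mu_phi1 - mu_phi2)))")
          .Histo1D("dimuon_mass"))

print(hist.GetEntries())       # the event loop only runs when a result is actually requested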

A very promising general technique for adapting our current models to new hardware is machine learning, for which there exist many excellent toolkits. Machine learning has the potential to further improve the physics reach of data analysis and may also speed up and improve the accuracy of physics simulation, triggering and reconstruction. Applying machine learning is very much in vogue, and many examples of successful applications of these data-science techniques exist, but real insight is required to know where best to invest for the HEP community. For example, a deeper understanding of the impact of such black boxes and how they relate to underlying physics, with good control of systematics, is needed. It is expected that such techniques will be successful in a number of areas, but there remains much research to be done.

New challenges

Supporting the computational training phase necessary for machine learning brings a new challenge to our field. With millions of free parameters being optimised across large GPU clusters, this task is quite unlike those currently undertaken on the WLCG grid infrastructure and represents another dimension to the HL-LHC data problem. There is a need to restructure resources at facilities and to incorporate commercial and scientific clouds into the pool available for HEP computing. In some regions high-performance computing facilities will also play a major role, but these facilities are usually not suitable for current HEP workflows and will need more consistent interfaces as well as the evolution of computing systems and the software itself. Optimising storage resources into “data lakes”, where a small number of sites act as data silos that stream data to compute resources, could be more effective than our current approaches. This will require enhanced delivery of data over the network to which our computing and software systems will need to adapt. A new generation of managed networks, where dedicated connections between sites can be controlled dynamically, will play a major role.

The many challenges faced by the HEP software and computing community over the coming decade are wide ranging and hard. They require new investment in critical areas and a commitment to solving problems in common between us, and demand that a new generation of physicists is trained with updated computing skills. We cannot afford a “business as usual” approach to solving these problems, nor will hardware improvements come to our rescue, so software upgrades need urgent attention.

The recently completed roadmap for software and computing R&D is a unique document because it addresses the problems that our whole community faces in a way that was never done before. Progress in other fields gives us a chance to learn from and collaborate with other scientific communities and even commercial partners. The strengthening of links, across experiments and in different regions, that the HEP Software Foundation has helped to produce, puts us in a good position to move forward with a common R&D programme that will be essential for the continued success of high-energy physics.

The post Time to adapt for big data appeared first on CERN Courier.

]]>
Feature A report calls for radical changes in computing and software to ensure the success of high-energy physics experiments into the 2020s https://cerncourier.com/wp-content/uploads/2018/06/CCApr18_HEP-frontis.jpg
Fermilab joins CERN openlab on data reduction https://cerncourier.com/a/fermilab-joins-cern-openlab-on-data-reduction/ Mon, 15 Jan 2018 09:15:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/fermilab-joins-cern-openlab-on-data-reduction/ In November, Fermilab became a research member of CERN openlab – a public-private partnership between CERN and major ICT companies established in 2001 to meet the demands of particle-physics research. Fermilab researchers will now collaborate with members of the LHC’s CMS experiment and the CERN IT department to improve technologies related to physics data reduction, […]

The post Fermilab joins CERN openlab on data reduction appeared first on CERN Courier.

]]>

In November, Fermilab became a research member of CERN openlab – a public-private partnership between CERN and major ICT companies established in 2001 to meet the demands of particle-physics research. Fermilab researchers will now collaborate with members of the LHC’s CMS experiment and the CERN IT department to improve technologies related to physics data reduction, which is vital for gaining insights from the vast amounts of data produced by high-energy physics experiments.

The work will take place within an existing CERN openlab project with Intel on big-data analytics. The goal is to use industry-standard big-data tools to create a new tool for filtering many petabytes of heterogeneous collision data to create manageable, but still rich, datasets of a few terabytes for analysis. Using current systems, this kind of targeted data reduction can often take weeks, but the Intel-CERN project aims to reduce it to a matter of hours.
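
A very rough sketch of what such industry-standard data reduction can look like is shown below, using Apache Spark from Python with invented file, column and selection names; the actual Intel–CERN prototype is, of course, far more involved.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("toy-data-reduction").getOrCreate()

# Hypothetical input: collision events already converted to a columnar format.
events = spark.read.parquet("hdfs:///data/collisions.parquet")

# Keep only the events and columns an analysis group actually needs,
# shrinking a very large input dataset to a manageable output.
reduced = (events
           .filter((F.col("n_jets") >= 2) & (F.col("missing_et") > 100.0))
           .select("run", "event", "jet_pt", "jet_eta", "missing_et"))

reduced.write.mode("overwrite").parquet("hdfs:///data/reduced_dataset.parquet")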

The team plans to first create a prototype capable of processing 1 PB of data with about 1000 computer cores. Based on current projections, this is about one twentieth of the scale of the final system that would be needed to handle the data produced when the High-Luminosity LHC comes online in 2026. “This kind of work, investigating big-data analytics techniques, is vital for high-energy physics – both in terms of physics data and data from industrial control systems on the LHC,” says Maria Girone, CERN openlab CTO.

The post Fermilab joins CERN openlab on data reduction appeared first on CERN Courier.

]]>
News https://cerncourier.com/wp-content/uploads/2018/06/CCnew5_01_18.jpg
Servers for SESAME https://cerncourier.com/a/servers-for-sesame/ Fri, 13 Oct 2017 08:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/servers-for-sesame/ On 12 September, 56 servers left CERN bound for the SESAME light-source facility in Jordan. “These servers are a very valuable addition to the SESAME data centre,” said Salman Matalgah, head of IT at SESAME. “They will help ensure that we’re able to provide first-class computing support to our users.” Speaking for CERN, Charlotte Warakaulle, […]

The post Servers for SESAME appeared first on CERN Courier.

]]>

On 12 September, 56 servers left CERN bound for the SESAME light-source facility in Jordan. “These servers are a very valuable addition to the SESAME data centre,” said Salman Matalgah, head of IT at SESAME. “They will help ensure that we’re able to provide first-class computing support to our users.” Speaking for CERN, Charlotte Warakaulle, director for international relations, said: “After many other successful donations, it’s great that we can extend the list of beneficiaries to include SESAME: a truly inspiring project showcasing and building on scientific capacity in the Middle East and neighbouring regions.” Pictured are CERN’s head of IT Frédéric Hemmer (left), Charlotte Warakaulle and president of SESAME Council Rolf Heuer, with the servers packed and ready to go.

The post Servers for SESAME appeared first on CERN Courier.

]]>
News https://cerncourier.com/wp-content/uploads/2018/06/CCnew4_09_17.jpg
Facing up to the exabyte era https://cerncourier.com/a/viewpoint-facing-up-to-the-exabyte-era/ Fri, 13 Oct 2017 08:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/viewpoint-facing-up-to-the-exabyte-era/ Openlab head Alberto Di Meglio explores the ICT challenges that must be tackled over the coming years.

The post Facing up to the exabyte era appeared first on CERN Courier.

]]>

The high-luminosity Large Hadron Collider (HL-LHC) will dramatically increase the rate of particle collisions compared with today’s machine, boosting the potential for discoveries. In addition to extensive work on CERN’s accelerator complex and the LHC detectors, this second phase in the LHC’s life will generate unprecedented data challenges.

The increased rate of collisions makes the task of reconstructing events (piecing together the underlying collisions from millions of electrical signals read out by the LHC detectors) significantly more complex. At the same time, the LHC experiments are planning to employ more flexible trigger systems that can collect a greater number of events. These factors will drive a huge increase in computing needs for the start of the HL-LHC era in around 2026. Using current software, hardware and analysis techniques, the required computing capacity is roughly 50–100 times higher than today, with data storage alone expected to enter the exabyte (1018 bytes) regime.

It is reasonable to expect that technology improvements over the next seven to 10 years will yield an improvement of around a factor 10 in both processing and storage capabilities for no extra cost. While this will go some way to address the HL-LHC’s requirements, it will still leave a significant deficit. With budgets unlikely to increase, it will not be possible to solve the problem by simply increasing the total computing resources available. It is therefore vital to explore new technologies and methodologies in conjunction with the world’s leading information and communication technology (ICT) companies.

CERN openlab, which was established by the CERN IT department in 2001, is a public–private partnership that enables CERN to collaborate with ICT companies to meet the demands of particle-physics research. Since the start of this year, CERN openlab has carried out an in-depth consultation to identify the main ICT challenges faced by the LHC research community over the coming years. Based on our findings, we published a white paper in September on future ICT challenges in scientific research.

The paper identifies 16 ICT challenge areas that need to be tackled in collaboration with industry, and these have been grouped into four overarching R&D topics. The first focuses on data-centre technologies to ensure that: data-centre architectures are flexible and cost effective; cloud-computing resources can be used in a scalable, hybrid manner; new technologies for solving storage-capacity issues are thoroughly investigated; and long-term data-storage systems are reliable and economically viable. The second major R&D topic relates to the modernisation of code, so that the maximum performance can be achieved on the new hardware platforms available. The third R&D topic focuses on machine learning, in particular its potentially large role in monitoring the accelerator chain and optimising the use of ICT resources.

The fourth R&D topic in the white paper identifies ICT challenges that are common across research disciplines. With ever more research fields such as astrophysics and biomedicine adopting big-data methodologies, it is vital that we share tools and learn from one another – in particular to ensure that leading ICT companies are producing solutions that meet our common needs.

In summary, CERN openlab has identified ICT challenges that must be tackled over the coming years to ensure that physicists worldwide can get the most from CERN’s infrastructure and experiments. In addition, the white paper demonstrates the emergence of new technology paradigms, from pervasive ultra-fast networks of smart sensors in the “internet of things”, to machine learning and “smart everything” paradigms. These technologies could revolutionise the way big science is done, particularly in terms of data analysis and the control of complex systems, and also have enormous potential for the benefit of wider society. CERN openlab, with its unique collaboration with several of the world’s leading IT companies, is ideally positioned to help make this a reality.

• openlab.cern.

The post Facing up to the exabyte era appeared first on CERN Courier.

]]>
Opinion Openlab head Alberto Di Meglio explores the ICT challenges that must be tackled over the coming years. https://cerncourier.com/wp-content/uploads/2018/06/CCvie1_09_17.jpg
Machine learning improves cosmic citizen science https://cerncourier.com/a/machine-learning-improves-cosmic-citizen-science/ Fri, 22 Sep 2017 07:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/machine-learning-improves-cosmic-citizen-science/ The DECO team has developed a new machine-learning analysis.

The post Machine learning improves cosmic citizen science appeared first on CERN Courier.

]]>

Launched in 2014, the Distributed Electronic Cosmic-ray Observatory (DECO) enables Android smartphone cameras to detect cosmic rays. In response to the increasing number of events being recorded, however, the DECO team has developed a new machine-learning analysis that classifies 96% of events correctly.

Similar to detectors used in high-energy physics experiments, the semiconductors in smartphone camera sensors detect ionising radiation when charged particles traverse the depleted region of their sensor. The DECO app can spot three distinct types of charged-particle events: tracks, worms and spots (see image). Each event can be caused by a variety of particle interactions, from cosmic rays to alpha particles, and a handful of events can be expected every 24 hours or so.

These events have so far been classified by the users themselves, but the increasing number of images being collected meant there was a need for a more reliable computerised classification system. Due to the technological variations of smartphones and the orientation of the sensor when a cosmic ray strikes, traditional algorithms would have struggled to classify events.

The DECO team used advances in machine learning similar to those widely used in high-energy physics to design several deep neural-network architectures to classify the images. The best performing design, which contained over 14 million learnable parameters and was trained with 3.6 million images, correctly sorted 96% of 100 independent images. An iOS version of DECO is currently in the beta stage and is expected to be released within the next year.
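
The two figures quoted above – the number of learnable parameters and the classification accuracy – can be illustrated with the toy Python sketch below, which uses an assumed, much smaller network than DECO’s and random stand-in images.

import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 16 * 16, 128), nn.ReLU(),
    nn.Linear(128, 3),                         # three classes: tracks, worms and spots
)

n_params = sum(p.numel() for p in net.parameters())
print(f"learnable parameters: {n_params}")

# Toy evaluation: accuracy is the fraction of images whose predicted class matches the label.
images = torch.randn(100, 1, 64, 64)           # stand-in for 100 independent test images
labels = torch.randint(0, 3, (100,))
predictions = net(images).argmax(dim=1)
accuracy = (predictions == labels).float().mean().item()
print(f"accuracy: {accuracy:.2%}")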

The post Machine learning improves cosmic citizen science appeared first on CERN Courier.

]]>
News The DECO team has developed a new machine-learning analysis. https://cerncourier.com/wp-content/uploads/2018/06/CCnew5_08_17.jpg
SKA and CERN co-operate on extreme computing https://cerncourier.com/a/ska-and-cern-co-operate-on-extreme-computing/ Fri, 11 Aug 2017 08:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/ska-and-cern-co-operate-on-extreme-computing/ On 14 July, the Square Kilometre Array (SKA) organisation signed an agreement with CERN to formalize their collaboration in the area of extreme-scale computing. The agreement will address the challenges of “exascale” computing and data storage, with the SKA and the Large Hadron Collider (LHC) to generate an overwhelming volume of data in the coming […]

The post SKA and CERN co-operate on extreme computing appeared first on CERN Courier.

]]>

On 14 July, the Square Kilometre Array (SKA) organisation signed an agreement with CERN to formalise their collaboration in the area of extreme-scale computing. The agreement will address the challenges of “exascale” computing and data storage, with both the SKA and the Large Hadron Collider (LHC) set to generate an overwhelming volume of data in the coming years.

When completed, SKA will be the world’s largest radio telescope with a total collecting area of more than 1 km2 using thousands of high-frequency dishes and many more low- and mid-frequency aperture array telescopes distributed across Africa, Australia and the UK. Phase 1 of the project, representing approximately 10% of the final array, will generate around 300 PB of data every year – 50% more than has been collected by the LHC experiments in the last seven years. As is the case at CERN, SKA data will be analysed by scientific collaborations distributed across the planet. The acquisition, storage, management, distribution and analysis of such volumes of scientific data is a major technological challenge.

“Both CERN and SKA are and will be pushing the limits of what is possible technologically, and by working together and with industry, we are ensuring that we are ready to make the most of this upcoming data and computing surge,” says SKA director-general Philip Diamond.

CERN and SKA have agreed to hold regular meetings to discuss the strategic direction of their collaborations, and develop demonstrator projects or prototypes to investigate concepts for managing and analysing exascale data sets in a globally distributed environment. “The LHC computing demands are tackled by the Worldwide LHC computing grid, which employs more than half a million computing cores around the globe interconnected by a powerful network,” says CERN’s director of research and computing Eckhard Elsen. “As our demands increase with the planned intensity upgrade of the LHC, we want to expand this concept by using common ideas and infrastructure into a scientific cloud. SKA will be an ideal partner in this endeavour.”

The post SKA and CERN co-operate on extreme computing appeared first on CERN Courier.

]]>
News https://cerncourier.com/wp-content/uploads/2018/06/CCnew5_07_17.jpg
European computing cloud takes off https://cerncourier.com/a/european-computing-cloud-takes-off/ Fri, 17 Mar 2017 09:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/european-computing-cloud-takes-off/ A European scheme to make publicly funded scientific data openly available has entered its first phase of development, with CERN one of several organisations poised to test the new technology. Launched in January and led by the UK’s Science and Technology Facilities Council, a €10 million two-year pilot project funded by the European Commission marks the […]

The post European computing cloud takes off appeared first on CERN Courier.

]]>

A European scheme to make publicly funded scientific data openly available has entered its first phase of development, with CERN one of several organisations poised to test the new technology. Launched in January and led by the UK’s Science and Technology Facilities Council, a €10 million two-year pilot project funded by the European Commission marks the first step towards the ambitious European Open Science Cloud (EOSC) project. With more than 30 organisations involved, the aim of the EOSC is to establish a Europe-wide data environment to allow scientists across the continent to exchange and analyse data. As well as providing the basis for better scientific research and making more efficient use of data resources, the open-data ethos promises to address societal challenges such as public-health or environmental emergencies, where easy access to reliable research data may improve response times.

The pilot phase of the EOSC aims to establish a governance framework and build the trust and skills required. Specifically, the pilot will encourage selected communities to develop demonstrators to showcase EOSC’s potential across various research areas including life sciences, energy, climate science, material science and the humanities. Given the intense computing requirements of high-energy physics, CERN is playing an important role in the pilot project.

The CERN demonstrator aims to show that the basic requirements for the capture and long-term preservation of particle-physics data, documentation, software and the environment in which it runs can be satisfied by the EOSC pilot. “The purpose of CERN’s involvement in the pilot is not to demonstrate that the EOSC can handle the complex and demanding requirements of LHC data-taking, reconstruction, distribution, re-processing and analysis,” explains Jamie Shiers of CERN’s IT department. “The motivation for long-term data preservation is for reuse and sharing.”

Propelled by the growing IT needs of the LHC and experience gained by deploying scientific workloads on commercial cloud services, explains Bob Jones of CERN’s IT department, CERN proposed a model for a European science cloud some years ago. In 2015 this model was expanded and endorsed by members of EIROforum. “The rapid expanse in the quantities of open data being produced by science is stretching the underlying IT services,” says Jones. “The Helix Nebula Science Cloud, led by CERN, is already working with leading commercial cloud service providers to support this growing need for a wide range of scientific use cases.”

The challenging EOSC project, which raises issues such as service integration, intellectual property, legal responsibility and service quality, complements the work of the Research Data Alliance and builds on the European Strategy Forum on Research Infrastructures (ESFRI) road map. “Our goal is to make science more efficient and productive and let millions of researchers share and analyse research data in a trusted environment across technologies, disciplines and borders,” says Carlos Moedas, EC commissioner for research, science and innovation.

The post European computing cloud takes off appeared first on CERN Courier.

]]>
News https://cerncourier.com/wp-content/uploads/2018/06/CCnew2_03_17.jpg
Learning Scientific Programming With Python https://cerncourier.com/a/learning-scientific-programming-with-python/ Fri, 14 Oct 2016 12:07:31 +0000 https://preview-courier.web.cern.ch/?p=101606 Pere Mato Vila reviews in 2016 Learning Scientific Programming With Python.

The post Learning Scientific Programming With Python appeared first on CERN Courier.

]]>
By Christian Hill

Cambridge University Press

Science cannot be accomplished nowadays without the help of computers to produce, analyse, process and visualise large experimental data sets. Scientists increasingly have to write their own programs in a language such as Python, which in recent times has become very popular among researchers in many scientific domains. It is a high-level language that is relatively easy to learn, rich in functionality and fairly compact. It is complemented by a large ecosystem of additional modules, in particular scientific and visualisation tools covering a vast area of numerical computation, which make it very handy for scientists and engineers.

In this book, the author covers basic programming concepts such as numbers, variables, strings, lists, basic data structures, control flow and functions. The book also deals with advanced concepts and idioms of the Python language and of the tools that are presented, enabling readers to gain proficiency quickly. The most advanced topics and functionalities are clearly marked, so they can be skipped on a first reading.

While discussing Python structures, the author explains the differences with respect to other languages, in particular C, which is useful for readers migrating from those languages to Python. The book focuses on version 3 of Python but, where needed, points out the differences from version 2, which is still widely used in the scientific community.

Once the basic concepts of the language are in place, the book moves on to the NumPy, SciPy and Matplotlib libraries for numerical programming and data visualisation. These modules are open source, commonly used by scientists and easy to obtain and install. The functionality of each is well introduced with lots of examples, which is a clear advantage over the terse reference documentation available on the web. NumPy is the de facto standard for general scientific programming, dealing very efficiently with data structures such as multidimensional arrays, while the SciPy library complements NumPy with more specific functionality for scientific computing, including the evaluation of special functions frequently used in science and engineering, minimisation, integration, interpolation and equation solving.
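
As a flavour of the NumPy/SciPy workflow described above, the short sketch below builds an array, evaluates a definite integral and solves a simple equation. It is an example in the spirit of the book, not taken from it.

```python
# A self-contained taste of NumPy and SciPy: arrays, numerical integration
# and root finding. Illustrative only; not an example from the book.
import numpy as np
from scipy import integrate, optimize

x = np.linspace(0.0, np.pi, 201)   # uniform grid of 201 points on [0, pi]
y = np.sin(x)
print("mean of sin(x) on [0, pi]:", y.mean())           # close to 2/pi

area, err = integrate.quad(np.sin, 0.0, np.pi)          # exact answer is 2
root = optimize.brentq(np.cos, 1.0, 2.0)                # cos changes sign at pi/2

print(f"integral of sin on [0, pi] = {area:.6f} (+/- {err:.1e})")
print(f"root of cos(x) in [1, 2]   = {root:.6f}")
```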

Essential for any scientific work is the plotting of the data. This is achieved with the Matplotlib module, which is probably the most popular one that exists for Python. Many kinds of graphics are nicely introduced in the book, starting from the most basic ones, such as 1D plots, to fairly complex 3D and contour plots. The book also discusses the use of IPython notebooks to build rich-media documents, interleaving text and formulas with code and images into shareable documents for scientific analysis.
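
The kind of first figure the book introduces can be produced in a few lines of Matplotlib; the sketch below is illustrative only and is not reproduced from the book.

```python
# A minimal 1D plot with labels, a legend and a saved output file.
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-2 * np.pi, 2 * np.pi, 500)
plt.plot(x, np.sin(x) / x, label="sin(x)/x")   # grid avoids x = 0 exactly
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.title("A first Matplotlib figure")
plt.savefig("sinc.png")    # or plt.show() in an interactive session
```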

The book has many relevant examples, with their development traced from both science and engineering points of view. Each chapter concludes with a series of well-selected exercises, the complete step-by-step solutions of which are reported at the end of the volume. In addition, a nice collection of problems without solutions are also added to each section.

The book is a very complete reference of the major features of the Python language and of the most common scientific libraries. It is written in a clear, precise and didactical style that would appeal to those who, even if they are already familiar with the Python programming language, would like to develop their proficiency in numerical and scientific programming with the standard tools of the Python system.

The post Learning Scientific Programming With Python appeared first on CERN Courier.

]]>
Review Pere Mato Vila reviews in 2016 Learning Scientific Programming With Python. https://cerncourier.com/wp-content/uploads/2016/10/CCboo2_09_16.jpg
CMS gears up for the LHC data deluge https://cerncourier.com/a/cms-gears-up-for-the-lhc-data-deluge/ https://cerncourier.com/a/cms-gears-up-for-the-lhc-data-deluge/#respond Fri, 12 Aug 2016 08:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/cms-gears-up-for-the-lhc-data-deluge/ A trigger upgrade helps CMS tame collision environment of LHC Run 2

The post CMS gears up for the LHC data deluge appeared first on CERN Courier.

]]>

ATLAS and CMS, the large general-purpose experiments at CERN’s Large Hadron Collider (LHC), produce enormous data sets. Bunches of protons circulating in opposite directions around the LHC pile into each other every 25 nanoseconds, flooding the detectors with particle debris. Recording every collision would produce data at an unmanageable rate of around 50 terabytes per second. To reduce this volume for offline storage and processing, the experiments use an online filtering system called a trigger. The trigger system must remove the data from 99.998% of all LHC bunch crossings but keep the tiny fraction of interesting data that drives the experiment’s scientific mission. The decisions made in the trigger, which ultimately dictate the physics reach of the experiment, must be made in real time and are irrevocable.

The trigger system of the CMS experiment has two levels. The first, Level-1, is built from custom electronics in the CMS underground cavern, and reduces the rate of selected bunch crossings from 40 MHz to less than 100 kHz. There is a period of only four microseconds during which a decision must be reached, because data cannot be held within the on-detector memory buffers for longer than this. The second level, called the High Level Trigger (HLT), is software-based. Approximately 20,000 commercial CPU cores, housed in a building on the surface above the CMS cavern, run software that further reduces the crossing rate to an average of about 1 kHz. This is low enough to transfer the remaining data to the CERN Data Centre for permanent storage.
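
The rate reductions quoted above can be checked with a little arithmetic. The snippet below is a back-of-the-envelope sketch using the rounded figures from the text, not CMS bookkeeping, and its final line naively assumes a constant event size.

```python
# Back-of-the-envelope check of the trigger reduction factors quoted above.
bunch_crossing_rate = 40e6   # Hz, LHC bunch crossings seen by the detector
level1_output_rate  = 100e3  # Hz, upper limit after the Level-1 hardware trigger
hlt_output_rate     = 1e3    # Hz, average rate written out by the HLT

print(f"Level-1 keeps {level1_output_rate / bunch_crossing_rate:.2%} of crossings")
print(f"Overall, {1 - hlt_output_rate / bunch_crossing_rate:.4%} are discarded")

# Naive scaling of the quoted 50 TB/s raw rate by the kept fraction
raw_tb_per_s = 50
kept = hlt_output_rate / bunch_crossing_rate
print(f"Recorded volume, assuming constant event size: ~{raw_tb_per_s * kept * 1e3:.2f} GB/s")
```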

The original trigger system served CMS well during Run 1 of the LHC, which provided high-energy collisions at up to 8 TeV from 2010–2013. Designed in the late 1990s and operational by 2008, the system allowed the CMS collaboration to co-discover the Higgs boson in multiple final-state topologies. Among hundreds of other CMS measurements, it also allowed us to observe the rare decay Bs → μμ with a significance of 4.3σ.

In Run 2 of the LHC, which got under way last year, CMS faces a much more challenging collision environment. The LHC now delivers both an increased centre-of-mass energy of 13 TeV and increased luminosity beyond the original LHC design of 10³⁴ s⁻¹ cm⁻². While these improve the detector’s capability to observe rare physics events, they also result in severe event “pile-up” due to multiple overlapping proton collisions within a single bunch crossing. This effect not only makes it much harder to select useful crossings, it can drive trigger rates beyond what can be tolerated. This could be partially mitigated by raising the energy thresholds for the selection of certain particles. However, it is essential that CMS maintains its sensitivity to physics at the electroweak scale, both to probe the couplings of the Higgs boson and to catch glimpses of any physics beyond the Standard Model. An improved trigger system is therefore required that makes use of the most up-to-date technology to maintain or improve on the selection criteria used in Run 1.

Thinking ahead

In anticipation of these challenges, CMS has successfully completed an ambitious “Phase-1” upgrade to its Level-1 trigger system that has been deployed for operation this year. Trigger rates are reduced via several criteria: tightening isolation requirements on leptons; improving the identification of hadronic tau-lepton decays; increasing muon momentum resolution; and using pile-up energy subtraction techniques for jets and energy sums. We also employ more sophisticated methods to make combinations of objects for event selection, which is accomplished by the global trigger system (see figure 1).

These new features have been enabled by the use of the most up-to-date Field Programmable Gate Array (FPGA) processors, which provide up to 20 times more processing capacity and 10 times more communication throughput than the technology used in the original trigger system. The use of reprogrammable FPGAs throughout the system offers huge flexibility, and the use of fully optical communications in a standardised telecommunication architecture (microTCA) makes the system more reliable and easier to maintain compared with the previous VME standard used in high-energy physics for decades (see Decisions down to the wire).

Decisions down to the wire

Overall, the CMS Level-1 trigger upgrade comprises about 70 processors. All processors make use of the large-capacity Virtex-7 FPGA from the Xilinx Corporation, and three board variants were produced. The first calorimeter trigger layer uses the CTP7 board, which features an on-board Zynq system-on-chip from Xilinx for board control and monitoring. The second calorimeter trigger layer, the barrel muon processors, and the global trigger and global muon trigger use the MP7, which is a generic symmetric processor with 72 optical links for both input and output. Finally, a third, modular variant called the MTF7 is used for the overlap and end-cap muon trigger regions, and features a 1 GB memory mezzanine used for the momentum calculation in the end-cap region. This memory can store the calculation of the momentum from multiple angular inputs in the challenging forward region of CMS where the magnetic bending is small.

The Level-1 trigger requires very rapid access to detector information. This is currently provided by the CMS calorimeters and muon system, which have dedicated optical data links for this purpose. The calorimeter trigger system – which is used to identify electrons, photons, tau leptons, and jets, and also to measure energy sums – consists of two processing layers. The first layer is responsible for collecting the data from calorimeter regions, summing the energies from the electromagnetic and hadronic calorimeter compartments, and organising the data to allow efficient processing. These data are then streamed to a second layer of processors in an approach called time-multiplexing. The second layer applies clustering algorithms to identify calorimeter-based “trigger objects” corresponding to single particle candidates, jets or features in the overall transverse-energy flow of the collision. Time-multiplexing allows data from the entire calorimeter for one beam crossing to be streamed to a single processor at full granularity, avoiding the need to share data between processors. Improved energy and position resolutions for the trigger objects, along with the increased logic space available, allows more sophisticated trigger decisions.
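
The time-multiplexing idea described above can be illustrated with a toy routing function: successive bunch crossings are distributed round-robin across a pool of second-layer nodes, so each node receives complete events at full granularity and no data need to be shared between processors. The sketch below is a conceptual illustration in Python, not CMS firmware, and the pool size is an arbitrary assumption.

```python
# Toy illustration of time-multiplexed event building: each bunch crossing is
# routed, in turn, to one node of a small processor pool. Not CMS firmware.
N_LAYER2_NODES = 9   # assumed pool size for the sketch

def route(crossing_number: int) -> int:
    """Index of the layer-2 node that receives this bunch crossing in full."""
    return crossing_number % N_LAYER2_NODES

for bx in range(12):
    print(f"bunch crossing {bx:2d} -> layer-2 node {route(bx)}")
```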

The muon trigger system also consists of two layers. For the original trigger system, a separate trigger was provided from each of the three muon-detector systems employed at CMS: drift tubes (DT) in the barrel region; cathode-strip chambers (CSC) in the endcap regions; and resistive plate chambers (RPC) throughout the barrel and endcaps. Each system provides unique information useful for making a trigger decision; for example, the superior timing of the RPCs can correct the time assignment of DTs and CSC track segments, as well as provide redundancy in case a specific DT or CSC is malfunctioning.

In Run 2, we combine trigger segments from all of these units at an earlier stage than in the original system, and send them to the muon track-finding system in a first processing layer. This approach creates an improved, highly robust muon trigger that can take advantage of the specific benefits of each technology earlier in the processing chain. The second processing layer of the muon trigger takes as input the tracks from 36 track-finding processors to identify the best eight candidate muons. It cancels duplicate tracks that occur along the boundaries of processing layers, and will in the future also receive information from the calorimeter trigger to identify isolated muons. These are a signature of interesting rare particle decays such as those of vector bosons.

A feast of physics

Finally, the global trigger processor collects information from both the calorimeter and muon trigger systems to arrive at the final decision on whether to keep the data from a given beam crossing – again, all in a period of four microseconds or less. The trigger changes made for Run 2 allow an event selection procedure that is much closer to that traditionally performed in software in the HLT or in offline analysis. The global trigger applies the trigger “menu” of the experiment – a large set of selection criteria designed to identify the broad classes of events used in CMS physics analyses. For example, events with a W or Z boson in the final state can be identified by the requirement for one or two isolated leptons above a certain energy threshold; top-quark decays by demanding high-energy leptons and jets in the same bunch crossing; and dark-matter candidates via missing transverse energy. The new system can contain several hundred such items – which is quite a feast of physics – and the complete trigger menu for CMS evolves continually as our understanding improves.
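
Conceptually, the trigger “menu” described above is a list of named selections applied to the reconstructed objects of each bunch crossing, with the crossing kept if any item fires. The toy sketch below illustrates the idea in Python; the item names, thresholds and event format are invented for illustration and do not correspond to the real CMS menu.

```python
# Toy trigger "menu": named selections over the objects of one bunch crossing.
# Names, thresholds and the event format are illustrative assumptions.

def single_isolated_muon(evt, pt_min=25.0):
    return any(m["pt"] > pt_min and m["iso"] for m in evt["muons"])

def double_electron(evt, pt_min=17.0):
    return sum(e["pt"] > pt_min for e in evt["electrons"]) >= 2

def missing_et(evt, met_min=120.0):
    return evt["met"] > met_min

MENU = {"SingleIsoMu25": single_isolated_muon,
        "DoubleEle17":   double_electron,
        "MET120":        missing_et}

def accept(evt):
    """Keep the crossing if any menu item fires (logical OR of all items)."""
    return {name for name, item in MENU.items() if item(evt)}

event = {"muons": [{"pt": 31.2, "iso": True}],
         "electrons": [],
         "met": 64.0}
print(accept(event))   # -> {'SingleIsoMu25'}
```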

The trigger upgrade was commissioned in parallel with the original trigger system during LHC operations in 2015. This allowed the new system to be fully tested and optimised without affecting CMS physics data collection. Signals from the detector were physically split to feed both the initial and upgraded trigger systems, a project that was accomplished during the LHC’s first long shutdown in 2013–2014. For the electromagnetic calorimeter, for instance, new optical transmitters were produced to replace the existing copper cables to send data to the old and new calorimeter triggers simultaneously. A complete split was not realistic for the barrel muon system, but a large detector slice was prepared nevertheless. The encouraging results during commissioning allowed the final decision to proceed with the upgrade to be taken in early January 2016.

As with the electronics, an entirely new software system had to be developed for system control and monitoring. For example, low-level board communication changed from a PCI-VME bus adapter to a combination of Ethernet and PCI-express. This took two years of effort from a team of experts, but also offered the opportunity to thoroughly redesign the software from the bottom up, with an emphasis on commonality and standardisation for long-term maintenance. The result is a powerful new trigger system with more flexibility to adapt to the increasingly extreme conditions of the LHC while maintaining efficiency for future discoveries (figure 2, previous page).

Although the “visible” work of data analysis at the LHC takes place on a timescale of months or years at institutes across the world,  the first and most crucial decisions in the analysis chain happen underground and within microseconds of each proton–proton collision. The improvements made to the CMS trigger for Run 2 mean that a richer and more precisely defined data set can be delivered to physicists working on a huge variety of different searches and measurements in the years to come. Moreover, the new system allows flexibility and routes for expansion, so that event selections can continue to be refined as we make new discoveries and as physics priorities evolve.

The CMS groups that delivered the new trigger system are now turning their attention to the ultimate Phase-2 upgrade that will be possible by around 2025. This will make use of additional information from the CMS silicon tracker in the Level-1 decision, which is a technique never used before in particle physics and will approach the limits of technology, even in a decade’s time. As long as the CMS physics programme continues to push new boundaries, the trigger team will not be taking time off.

The post CMS gears up for the LHC data deluge appeared first on CERN Courier.

]]>
https://cerncourier.com/a/cms-gears-up-for-the-lhc-data-deluge/feed/ 0 Feature A trigger upgrade helps CMS tame collision environment of LHC Run 2 https://cerncourier.com/wp-content/uploads/2016/08/CCcms1_07_16.jpg
The end of computing’s steam age https://cerncourier.com/a/viewpoint-the-end-of-computings-steam-age/ Fri, 12 Aug 2016 08:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/viewpoint-the-end-of-computings-steam-age/ Steam once powered the world. If you wanted to build a factory, or a scientific laboratory, you needed a steam engine and a supply of coal. Today, for most of us, power comes out of the wall in the form of electricity. The modern-day analogue is computing: if you want to run a large laboratory […]

The post The end of computing’s steam age appeared first on CERN Courier.

]]>

Steam once powered the world. If you wanted to build a factory, or a scientific laboratory, you needed a steam engine and a supply of coal. Today, for most of us, power comes out of the wall in the form of electricity.

The modern-day analogue is computing: if you want to run a large laboratory such as CERN, you need a dedicated computer centre. The time, however, is ripe for change.

For LHC physicists, this change has already happened. We call it the Worldwide LHC Computing Grid (WLCG), which is maintained by the global particle-physics community. As physicists move towards the High Luminosity LHC (HL-LHC), however, we need a new solution for our increasingly demanding computing and data-storage needs. That solution could look very much like the Cloud, which is the general term for distributed computing and data storage in broader society.

There are clear differences between the Cloud and the Grid. When developing the WLCG, CERN was able to factor in technology that was years in the future by banking on Moore’s law, which states that processing capacity doubles roughly every 18 months. After more than 50 years, however, Moore’s law is coming up against a hard technology limit. Cloud technology, by contrast, shows no sign of slowing down: more bandwidth simply means more fibre or colour-multiplexing on the same fibre.

Cloud computing is already at an advanced stage. While CERN was building the WLCG, the Googles and Amazons of the world were building huge data warehouses to host commercial Clouds. Although we could turn to them to satisfy our computing needs, it is doubtful that such firms could guarantee the preservation of our data for the decades that it would be needed. We therefore need a dedicated “Science Cloud” instead.

CERN has already started to think about the parameters for such a facility. Zenodo, for example, is a future-proof and non-proprietary data repository that has been adopted by other big-data communities. The virtual nature of the technology allows various scientific disciplines to coexist on a given infrastructure, making it very attractive to providers. The next step requires co-operation with governments to develop computing and data warehouses for a Science Cloud.

CERN and the broader particle-physics community have much to bring to this effort. Just as CERN played a pioneering role in developing Grid computing to meet the needs of the LHC, we can contribute to the development of the Science Cloud to meet the demands of the HL-LHC. Not only will this machine produce a luminosity five times greater than the LHC, but data are increasingly coming straight from the sensors in the LHC detectors to our computer centre with minimal processing and reduction along the way. Add to that CERN’s open-access ethos, which began in open-access publishing and is now moving towards “open data”, and you have a powerful combination of know-how relevant to designing future computing and data facilities. Particle physics can therefore help develop Cloud computing for the benefit of science as a whole.

In the future, scientific computing will be accessed much as electrical power is today: we will tap into resources simply by plugging in, without worrying about where our computing cycles and data storage are physically located. Rather than relying on our own large computer centre, there will be a Science Cloud composed of computing and data centres serving the scientific endeavour as a whole, guaranteeing data preservation for as long as it is needed. Its location should be determined primarily by its efficiency of operation.

CERN has been in the vanguard of scientific computing for decades, from the computerised control system of the Super Proton Synchrotron in the 1970s, to CERNET, TCP/IP, the World Wide Web and the WLCG. It is in that vanguard that we need to remain, to deliver the best science possible. Working with governments and other data-intensive fields of science, it’s time for particle physics to play its part in developing a world in which the computing socket sits right next to the power socket. It’s time to move beyond computing’s golden age of steam.

The post The end of computing’s steam age appeared first on CERN Courier.

]]>
Opinion
CERN’s IT gears up to face the challenges of LHC Run 2 https://cerncourier.com/a/cerns-it-gears-up-to-face-the-challenges-of-lhc-run-2/ https://cerncourier.com/a/cerns-it-gears-up-to-face-the-challenges-of-lhc-run-2/#respond Fri, 20 May 2016 08:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/cerns-it-gears-up-to-face-the-challenges-of-lhc-run-2/ Résumé L’informatique du CERN prête à relever les défis de l’Exploitation 2 du LHC Pour l’Exploitation 2, le LHC va continuer à ouvrir la voie à de nouvelles découvertes en fournissant aux expériences jusqu’à un milliard de collisions par seconde. À plus haute énergie et intensité, les collisions sont plus complexes à reconstruire et analyser […]

The post CERN’s IT gears up to face the challenges of LHC Run 2 appeared first on CERN Courier.

]]>
Résumé

CERN computing ready to meet the challenges of LHC Run 2

In Run 2, the LHC will continue to open the path to new discoveries by delivering up to one billion collisions per second to the experiments. At higher energy and intensity, collisions are more complex to reconstruct and analyse, so the computing requirements increase accordingly. Run 2 is expected to deliver twice as much data as Run 1, about 50 PB per year. It is therefore an opportune moment to take stock of LHC computing: what was done during the first long shutdown (LS1) in preparation for the higher collision rate and luminosity of Run 2, what can be achieved today, and what is planned for the future.

2015 saw the start of Run 2 for the LHC, in which the machine reached a proton–proton collision energy of 13 TeV – the highest ever reached by a particle accelerator. Beam intensity also increased and, by the end of 2015, 2240 proton bunches per beam were being collided. This year, the LHC will continue to open the path to new discoveries by providing up to one billion collisions per second to ATLAS and CMS. At higher energy and intensity, collision events are more complex to reconstruct and analyse, so computing requirements must increase accordingly. Run 2 is anticipated to yield twice the data produced in the first run, about 50 petabytes (PB) per year. So it is an opportune time to look at the LHC’s computing: what was achieved during Long Shutdown 1 (LS1) to keep up with the collision rate and luminosity increases of Run 2, how it is performing now, and what is foreseen for the future.

LS1 upgrades and Run 2

The Worldwide LHC Computing Grid (WLCG) collaboration, the LHC experiment teams and the CERN IT department were kept busy as the accelerator complex entered LS1, not only with analysis of the large amount of data already collected at the LHC but also with preparations for the higher flow of data during Run 2. The latter entailed major upgrades of the computing infrastructure and services, lasting the entire duration of LS1.

Consolidation of the CERN data centre and inauguration of its extension in Budapest were two major milestones in the upgrade plan achieved in 2013. The main objective of the consolidation and upgrade of the Meyrin data centre was to secure critical information-technology systems. Such services can now keep running, even in the event of a major power cut affecting CERN. The consolidation also ensured important redundancy and increased the overall computing-power capacity of the IT centre from 2.9 MW to 3.5 MW. Additionally, on 13 June 2013, CERN and the Wigner Research Centre for Physics in Budapest inaugurated the Hungarian data centre, which hosts the extension of the CERN Tier-0 data centre, adding up to 2.7 MW capacity to the Meyrin-site facility. This substantially extended the capabilities of the Tier-0 activities of WLCG, which include running the first-pass event reconstruction and producing, among other things, the event-summary data for analysis.

Building a CERN private cloud (preview-courier.web.cern.ch/cws/article/cnl/38515) was required to remotely manage the capacity hosted at Wigner, enable efficient management of the increased computing capacity installed for Run 2, and to provide the computing infrastructure powering most of the LHC grid services. To deliver a scalable cloud operating system, CERN IT started using OpenStack. This open-source project now plays a vital role in enabling CERN to tailor its computing resources in a flexible way and has been running in production since July 2013. Multiple OpenStack clouds at CERN successfully run simulation and analysis for the CERN user community. To support the growth of capacity needed for Run 2, the compute capacity of the CERN private cloud has nearly doubled during 2015, now providing more than 150,000 computing cores. CMS, ATLAS and ALICE have also deployed OpenStack on their high-level trigger farms, providing a further 45,000 cores for use in certain conditions when the accelerator isn’t running. Through various collaborations, such as with BARC (Mumbai, India) and between CERN openlab (see the text box, overleaf) and Rackspace, CERN has contributed more than 90 improvements in the latest OpenStack release.

As surprising as it may seem, LS1 was also a very busy period with regards to storage. Both the CERN Advanced STORage manager (CASTOR) and EOS, an open-source distributed disk storage system developed at CERN and in production since 2011, went through either major migration or deployment. CASTOR relies on a tape-based back end for permanent data archiving, and LS1 offered an ideal opportunity to migrate the archived data from legacy cartridges and formats to higher-density ones. This involved migrating around 85 PB of data, and was carried out in two phases during 2014 and 2015. As an overall result, no less than 30,000 tape-cartridge slots were released to store more data. The EOS 2015 deployment brought storage at CERN to a new scale and enables the research community to make use of 100 PB of disk storage in a distributed environment using tens of thousands of heterogeneous hard drives, with minimal data movements and dynamic reconfiguration. It currently stores 45 PB of data with an installed capacity of 135 PB. Data preservation is essential, and more can be read on this aspect in “Data preservation is a journey” .

Databases play a significant role with regards to storage, accelerator operations and physics. A great number of upgrades were performed, both in terms of software and hardware, to rejuvenate platforms, accompany the CERN IT computing-infrastructure’s transformation and the needs of the accelerators and experiments. The control applications of the LHC migrated from a file-based archiver to a centralised infrastructure based on Oracle databases. The evolution of the database technologies deployed for WLCG database services improved the availability, performance and robustness of the replication service. New services have also been implemented. The databases for archiving the controls’ data are now able to handle, at peak, one million changes per second, compared with the previous 150,000 changes per second. This also positively impacts on the controls of the quench-protection system of the LHC magnets, which has been modernised to safely operate the machine at 13 TeV energy. These upgrades and changes, which in some cases have built on the work accomplished as part of CERN openlab projects, have a strong impact on the increasing size and scope of the databases, as can be seen in the CERN databases diagram (above right).

To optimise computing and storage resources in Run 2, the experiments have adopted new computing models. These models move away from the strict hierarchical roles of the tiered centres described in the original WLCG models, to a peer site model, and make more effective use of the capabilities of all sites. This is coupled with significant changes in data-management strategies, away from explicit placement of data sets globally to a much more dynamic system that replicates data only when necessary. Remote access to data is now also allowed under certain conditions. These “data federations”, which optimise the use of expensive disk space, are possible because of the greatly improved networking capabilities made available to WLCG over the past few years. The experiment collaborations also invested significant effort during LS1 to improve the performance and efficiency of their core software, with extensive work to validate the new software and frameworks in readiness for the expected increase in data. Thanks to those successful results, a doubling of the CPU and storage capacity was needed to manage the increased data rate and complexity of Run 2 – without such gains, a much greater capacity would have been required.

Despite the upgrades and development mentioned, additional computing resources are always needed, notably for simulations of physics events, or accelerator and detector upgrades. In recent years, volunteer computing has played an increasing role in this domain. The volunteer capacity now corresponds to about half the capacity of the CERN batch system. Since 2011, thanks to virtualisation, the use of LHC@home has been greatly extended, with about 2.7 trillion events being simulated. Following this success, ATLAS became the first experiment to join, with volunteers steadily ramping up for the last 18 months and a production rate now equivalent to that of a WLCG Tier-2 site.

In terms of network activities, LS1 gave the opportunity to perform bandwidth increases and redundancy improvements at various levels. The data-transfer rates have been increased between some of the detectors (ATLAS, ALICE) and the Meyrin data centre by a factor of two and four. A third circuit has been ordered in addition to the two dedicated and redundant 100 Gbit/s circuits that were already connecting the CERN Meyrin site and the Wigner site since 2013. The LHC Optical Private Network (LHCOPN) and the LHC Open Network Environment (LHCONE) have evolved to serve the networking requirements of the new computing models for Run 2. LHCOPN, reserved for LHC data transfers and analysis and connecting the Tier-0 and Tier-1 sites, benefitted from bandwidth increases from 10 Gbps to 20 and 40 Gbps. LHCONE has been deployed to meet the requirements of the new computing model of the LHC experiments, which demands the transfer of data among any pair of Tier-1, Tier-2 and Tier-3 sites. As of the start of Run 2, LHCONE’s traffic represents no less than one third of the European research traffic. Transatlantic connections improved steadily, with ESnet setting up three 100 Gbps links extending to CERN through Europe, replacing the five 10 Gbps links used during Run 1.

With the start of Run 2, supported by these upgrades and improvements of the computing infrastructure, new data-taking records were achieved: 40 PB of data were successfully written on tape at CERN in 2015; out of the 30 PB from the LHC experiments, a record-breaking 7.3 PB were collected in October; and up to 0.5 PB of data were written to tape each day during the heavy-ion run. By way of comparison, CERN’s tape-based archive system collected in the region of 70 PB of data in total during the first run of the LHC, as shown in the plot (right). In total, today, WLCG has access to some 600,000 cores and 500 PB of storage, provided by the 170 collaborating sites in 42 countries, which enabled the Grid to set a new record in October 2015 by running a total of 51.1 million jobs.
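
For orientation, the record figures quoted above translate into the following average rates. The arithmetic below uses decimal units and round numbers and is a sketch only, not official accounting.

```python
# Rough conversion of the quoted data-taking records into average rates.
PB = 1e15   # bytes, decimal petabyte

print(f"0.5 PB/day (heavy-ion peak) ~ {0.5 * PB / 86400 / 1e9:.1f} GB/s sustained")
print(f"7.3 PB in October           ~ {7.3 * PB / (31 * 86400) / 1e9:.1f} GB/s averaged")
print(f"40 PB written in 2015       ~ {40 * PB / (365 * 86400) / 1e9:.2f} GB/s averaged")
```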

Looking into the future

With the LHC’s computing now well on track with Run 2 needs, the WLCG collaboration is looking further into the future, already focusing on the two phases of upgrades planned for the LHC. The first phase (2019–2020) will see major upgrades of ALICE and LHCb, as well as increased luminosity of the LHC. The second phase – the High Luminosity LHC project (HL-LHC), in 2024–2025 – will upgrade the LHC to a much higher luminosity and increase the precision of the substantially improved ATLAS and CMS detectors.

The requirements for data and computing will grow dramatically during this time, with rates of 500 PB/year expected for the HL-LHC. The needs for processing are expected to increase more than 10 times over and above what technology evolution will provide. As a consequence, partnerships such as those with CERN openlab and other R&D programmes are essential to investigate how the computing models could evolve to address these needs. They will focus on applying more intelligence to filtering and selecting data as early as possible; on investigating the distributed infrastructure itself (the grid) and how best to make use of available technologies and opportunistic resources (grid, cloud, HPC, volunteer computing, etc); and on improving software performance to optimise the overall system.

Building on many initiatives that have used large-scale commercial cloud resources for similar cases, the Helix Nebula – The Science Cloud (HNSciCloud) pre-commercial procurement (PCP) project may bring interesting solutions. The project, which is led by CERN, started in January 2016, and is co-funded by the European Commission. HNSciCloud pulls together commercial cloud-service providers, publicly funded e-infrastructures and a group of 10 buyers’ in-house resources to build a hybrid cloud platform, on top of which a competitive marketplace of European cloud players can develop their own services for a wider range of users. It aims to bring Europe’s technical development, policy and procurement activities together to remove fragmentation and maximise exploitation. The alignment of commercial and public (regional, national and European) strategies will increase the rate of innovation.

To improve software performance, the High Energy Physics (HEP) Software Foundation, a major new long-term activity, has been initiated. This seeks to address the optimal use of modern CPU architectures and encourage more commonality in key software libraries. The initiative will provide underlying support for the significant re-engineering of experiment core software that will be necessary in the coming years.

In addition, there is a great deal of interest in investigating new ways of data analysis: global queries, machine learning and many more. These are all significant and exciting challenges, but it is clear that the LHC’s computing will continue to evolve, and that in 10 years it will look very different, while still retaining the features that enable global collaboration.

R&D collaboration with CERN openlab

CERN openlab is a unique public–private partnership that has accelerated the development of cutting-edge solutions for the worldwide LHC community and wider scientific research since 2001. Through CERN openlab, CERN collaborates with leading ICT companies and research institutes. Testing in CERN’s demanding environment provides the partners with valuable feedback on their products, while allowing CERN to assess the merits of new technologies in their early stages of development for possible future use. In January 2015, CERN openlab entered its fifth three-year phase.

The topics addressed in CERN openlab’s fifth phase were defined through discussion and collaborative analysis of requirements. This involved CERN openlab industrial collaborators, representatives of CERN, members of the LHC experiment collaborations, and delegates from other international research organisations. The topics include next-generation data-acquisition systems, optimised hardware- and software-based computing platforms for simulation and analysis, scalable and interoperable data storage and management, cloud-computing operations and procurement, and data-analytics platforms and applications.


The post CERN’s IT gears up to face the challenges of LHC Run 2 appeared first on CERN Courier.

]]>
https://cerncourier.com/a/cerns-it-gears-up-to-face-the-challenges-of-lhc-run-2/feed/ 0 Feature
Data preservation is a journey https://cerncourier.com/a/data-preservation-is-a-journey/ https://cerncourier.com/a/data-preservation-is-a-journey/#respond Fri, 20 May 2016 07:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/data-preservation-is-a-journey/   As an organisation with more than 60 years of history, CERN has created large volumes of “data” of many different types. This involves not only scientific data – by far the largest in terms of volume – but also many other types (photographs, videos, minutes, memoranda, web pages and so forth). Sadly, some of this […]

The post Data preservation is a journey appeared first on CERN Courier.

]]>

As an organisation with more than 60 years of history, CERN has created large volumes of “data” of many different types. This involves not only scientific data – by far the largest in terms of volume – but also many other types (photographs, videos, minutes, memoranda, web pages and so forth). Sadly, some of this information from as recently as the 1990s, such as the first CERN web pages, has been lost, as well as more notably much of the data from numerous pre-LEP experiments. Today, things look rather different, with concerted efforts across the laboratory to preserve its “digital memory”. This concerns not only “born-digital” material but also what is still available from the pre-digital era. Whereas the latter often existed (and luckily often still exists) in multiple physical copies, the fate of digital data can be more precarious. This led Vint Cerf, vice-president of Google and an early internet pioneer, to declare in February 2015: “We are nonchalantly throwing all of our data into what could become an information black hole without realising it.” This is a situation that we have to avoid for all CERN data – it’s our legacy.

Interestingly, many of the tools that are relevant for preserving data from the LHC and other experiments are also suitable for other types of data. Furthermore, there are models that are widely accepted across numerous disciplines for how data preservation should be approached and how success against agreed metrics can be demonstrated.

Success, however, is far from guaranteed: the tools involved have had a lifetime that is much shorter than the desired retention period of the current data, and so constant effort is required. Data preservation is a journey, not a destination.

The basic model that more or less all data-preservation efforts worldwide adhere to – or at least refer to – is the Open Archival Information System (OAIS) model, for which there is an ISO standard (ISO 14721:2012). Related to this are a number of procedures for auditing and certifying “trusted digital repositories”, including another ISO standard – ISO 16363.

This certification requires, first and foremost, a commitment by “the repository” (CERN in this case) to “the long-term retention of, management of, and access to digital information”.

In conjunction with numerous more technical criteria, certification is therefore a way of demonstrating that specific goals regarding data preservation are being, and will be, met. For example, will we still be able to access and use data from LEP in 2030? Will we be able to reproduce analyses on LHC data up until the “FCC era”?

In the context of the Worldwide LHC Computing Grid (WLCG), self-certification of, initially, the Tier-0 site is currently under way. This is a first step prior to possible formal certification, certification of other WLCG sites (e.g. the Tier-1s), and even certification of CERN as a whole. This could cover not only current and future experiments but also the “digital memory” of non-experiment data.

What would this involve and what consequences would it have? Fortunately, many of the metrics that make up ISO 16363 are part of CERN’s current practices. To pass an audit, quite a few of these would have to be formalised into official documents (stored in a certified digital repository with a digital object identifier): there are no technical difficulties here but it would require effort and commitment to complete. In addition, it is likely that the ongoing self-certification will uncover some weak areas. Addressing these can be expected to help ensure that all of our data remains accessible, interpretable and usable for long periods of time: several decades and perhaps even longer. Increasingly, funding agencies are requiring not only the preservation of data generated by projects that they fund, but also details of how reproducibility of results will be addressed and how data will be shared beyond the initial community that generated it. Therefore, these are issues that we need to address, in any event.

A reasonable target by which certification could be achieved would be prior to the next update of the European Strategy for Particle Physics (ESPP), and further updates of this strategy would offer a suitable frequency of checking that the policies and procedures were still effective.

The current status of scientific data preservation in high-energy physics owes much to the Study Group that was initiated at DESY in late 2008/early 2009. This group published a “Blueprint document” in May 2012, and a summary of this was input to the 2012 ESPP update process. Since that time, effort has continued worldwide, with a new status report published at the end of 2015.

In 2016, we will profit from the first ever international data-preservation conference to be held in Switzerland (iPRES, Bern, 3–6 October) to discuss our status and plans with the wider data-preservation community. Not only do we have services, tools and experiences to offer, but we also have much to gain, as witnessed by the work on OAIS, developed in the space community, and related standards and practices.

High-energy physics is recognised as a leader in the open-access movement, and the tools in use for this, based on Invenio Digital Library software, have been key to our success. They also underpin more recent offerings, such as the CERN Open Data and Analysis Portals. We are also recognised as world leaders in “bit preservation”, where the 100+PB of LHC (and other) data are proactively curated with increasing reliability (or decreasing occurrences of rare but inevitable loss of data), despite ever-growing data volumes. Finally, CERN’s work on virtualisation and versioning file-systems through CernVM and CernVM-FS has already demonstrated great potential for the highly complex task of “software preservation”.
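
As an illustration of what “bit preservation” means in practice, the sketch below records a checksum for each archived file and re-verifies the files later, so that silent corruption can be detected and the damaged copy repaired from a replica. It is a minimal, generic example in Python, not the tooling actually used for the LHC archive; the "archive" directory and manifest file name are assumptions.

```python
# Minimal sketch of bit preservation via checksums: build a manifest of file
# digests, then re-verify it later. Generic illustration only; not the LHC
# archive tooling. The "archive" directory and manifest name are assumed.
import hashlib
import json
from pathlib import Path

def checksum(path: Path, algo="sha256", chunk=1 << 20) -> str:
    h = hashlib.new(algo)
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def build_manifest(directory: Path) -> dict:
    return {str(p): checksum(p) for p in directory.rglob("*") if p.is_file()}

def verify(manifest: dict) -> list:
    """Return the files whose current checksum no longer matches the manifest."""
    return [name for name, digest in manifest.items()
            if checksum(Path(name)) != digest]

if __name__ == "__main__":
    manifest = build_manifest(Path("archive"))
    Path("manifest.json").write_text(json.dumps(manifest, indent=2))
    print("corrupted files:", verify(manifest))
```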

• For further reading, visit arxiv.org/pdf/1205.4667 and dx.doi.org/10.5281/zenodo.46158.

The post Data preservation is a journey appeared first on CERN Courier.

]]>
https://cerncourier.com/a/data-preservation-is-a-journey/feed/ 0 Feature
Korean Tier-1 link upgrades to 10 Gbps https://cerncourier.com/a/korean-tier-1-link-upgrades-to-10-gbps/ https://cerncourier.com/a/korean-tier-1-link-upgrades-to-10-gbps/#respond Wed, 22 Jul 2015 08:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/korean-tier-1-link-upgrades-to-10-gbps/ The Korean Tier-1 site of the Worldwide LHC Computing Grid (WLCG) completed the upgrade to 10 Gbps of the bandwidth of its optical-fibre link to CERN.

The post Korean Tier-1 link upgrades to 10 Gbps appeared first on CERN Courier.

]]>

On 21 May, the Korea Institute of Science & Technology Information–Global Science experimental Data hub Center (KISTI-GSDC) – the Korean Tier-1 site of the Worldwide LHC Computing Grid (WLCG) – completed the upgrade to 10 Gbps of the bandwidth of its optical-fibre link to CERN. The link is part of the LHC Optical Private Network (OPN) that is used for fast data replication from the Tier-0 at CERN to Tier-1 sites in the WLCG.

KISTI-GSDC was approved as a full Tier-1 site at the 24th WLCG Overview Board in November 2013, backed by the ALICE community’s appreciation of the effort to sustain the site’s reliability and the contribution to computing resources for the experiment. At the time, the bandwidth of the dedicated connection to CERN provided by KISTI-GSDC was below that required, but the road map for upgrading the bandwidth was accepted.

The original proposal was to provide the upgrade of the OPN link by October 2014. However, following an in-depth revision of the executive plan with the Ministry of Science, ICT and Future Planning – the funding agency – to find the most cost-effective way, the upgrade process did not start until the end of February this year. It was finally completed just before the scheduled start of the LHC’s Run 2 in May.

The OPN link between KISTI and CERN is composed of two sections: Daejeon–Chicago (operated by KISTI) and Chicago–Geneva (operated by SURFnet). The link is complemented by an additional line that can be switched on whenever an intervention is necessary. The yearly budget is about CHF 1.1 million.

The post Korean Tier-1 link upgrades to 10 Gbps appeared first on CERN Courier.

]]>
https://cerncourier.com/a/korean-tier-1-link-upgrades-to-10-gbps/feed/ 0 News The Korean Tier-1 site of the Worldwide LHC Computing Grid (WLCG) completed the upgrade to 10 Gbps of the bandwidth of its optical-fibre link to CERN. https://cerncourier.com/wp-content/uploads/2015/07/CCnew6_06_15.jpg
From the Web to the Grid and Beyond: Computing Paradigms Driven by High-Energy Physics https://cerncourier.com/a/from-the-web-to-the-grid-and-beyond-computing-paradigms-driven-by-high-energy-physics/ Fri, 28 Mar 2014 09:38:35 +0000 https://preview-courier.web.cern.ch/?p=104352 Peter Clarke reviews in 2014 From the Web to the Grid and Beyond: Computing Paradigms Driven by High-Energy Physics.

The post From the Web to the Grid and Beyond: Computing Paradigms Driven by High-Energy Physics appeared first on CERN Courier.

]]>
By René Brun, Federico Carminati and Giuliana Galli Carminati (eds.)
Springer
Hardback: £62.99 €74.85 $99.00
E-book: £49.99 €59.49 $69.95
Also available at the CERN bookshop

To tell the story behind the title, the editors of this book have brought together chapters written by many well-known people in the field of computing in high-energy physics.

It starts with enlightening accounts by René Brun and Ben Segal of how things that I have been familiar with since being a postdoc came to be. I was intrigued to discover how we alighted on so much of what we now take for granted, such as C++, TCP/IP, Unix, code-management systems and ROOT. There is a nice – and at times frightening – account of the environment in which the World Wide Web was born, describing the conditions that needed to be present for it to happen as it did, and which nearly might not have been the case. The reader is reminded that ground-breaking developments in high-energy physics do not, in general, come about from hierarchical management plans, but from giving space to visionaries.

There are several chapters on the Grid (Les Robertson, Patricia Méndez Lorenzo and Jamie Shiers) and the evolution from grids to clouds (Predrag Buncic and Federico Carminati). These will be of interest to those who, like me, were involved in a series of EU Grid projects that absorbed many of us completely during the era of “e-science”. The Worldwide LHC Computing Grid was built and is of course now taken for granted by all of us. The discussion of virtualization and the evolution from grids to clouds presents an interesting take on what is a change of name and what is a change of technology.

In another chapter, Carminati gives his candid take on software development – and I found myself smiling and agreeing. Many of us will remember when some sort of religion sprang up around OO design methods, UML, OMT, software reviews and so on. He gives his view of where this helped and where it hindered in our environment, where requirements change, users are developers, and forward motion is made by common consent not by top-down design.

Distributed data and its access is discussed in depth by Fabrizio Furano and Andrew Hanushevsky, who remind us that this is one of the most demanding sectors in computing for high-energy physics. A history of parallel computing by Fons Rademakers is interesting because this has become topical recently, as we struggle to deal with many-core devices. Lawrence Pinsky’s chapter on software legal issues delves into how instruments such as copyright and patents are applied in an area for which they were never designed. It makes for engrossing reading, in the same way that technical issues become captivating when watching legal drama on television.

It is not clear – to me at least – whether Giuliana Galli Carminati’s final chapter on “the planetary brain” is a speculation too far and should be politely passed over, as the author invites the reader to do, or whether there is something significant there that the reader should be concerned about. The speculation is whether the web and grid form something that could be considered as a brain on a planetary scale. I leave you to judge.

It is a highly interesting book, and I plan to read many of the chapters again.


The post From the Web to the Grid and Beyond: Computing Paradigms Driven by High-Energy Physics appeared first on CERN Courier.

]]>
Review Peter Clarke reviews in 2014 From the Web to the Grid and Beyond: Computing Paradigms Driven by High-Energy Physics. https://cerncourier.com/wp-content/uploads/2014/03/CCboo1_03_14.jpg
CERN School of Computing: 10 years of renewal https://cerncourier.com/a/cern-school-of-computing-10-years-of-renewal/ https://cerncourier.com/a/cern-school-of-computing-10-years-of-renewal/#respond Wed, 20 Nov 2013 09:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/cern-school-of-computing-10-years-of-renewal/ How the CSC has been reinvigorated during the past decade.

The post CERN School of Computing: 10 years of renewal appeared first on CERN Courier.

]]>
Students at the 2013 CSC

On 29 August 2013, on the ground floor of Building FST01 of the Faculty of Pure and Applied Sciences at the University of Cyprus in Nicosia, 31 students filed silently into the two classrooms of the CERN School of Computing and took a seat in front of a computer. An hour later they were followed by a second wave of 31 students. They were all there to participate in the 12th occasion of a unique CERN initiative – the final examination of its computing school.

The CERN School of Computing (CSC) is one of the three schools that CERN has set up to deliver knowledge in the organization’s main scientific and technical pillars – physics, accelerators and computing. Like its counterparts, the CERN Accelerator School and what is now the European School of High-Energy Physics, each year it attracts several dozen participants from across the world for a fortnight of activities relating to its main topic.

How and why was the CSC set up? On 23 September 1968, future director-general Léon van Hove put forward a proposal to the then director-general, Bernard Gregory, for the creation of a summer school on data handling. This followed a recommendation made on 21 May 1968 to the Academic Training Committee by Ross MacLeod, head of the Data and Documents Division, the forerunner of today’s Information Technology Department. The proposal recommended that a school be organized in summer 1969 or 1970. The memorandum from van Hove to Gregory gave a visionary description of the potential audience for this new school: “It would address a mixed audience of young high-energy physicists and computer scientists.” Forty-five years later, not a word needs to be changed.

The justification for the school was also prophetic: “One of the interests of the Data Handling Summer School lies in the fact that it would be useful not only for high-energy physicists but also for those working in applied mathematics and computing. It would be an excellent opportunity for CERN to strengthen its contacts with a field which may well play a growing role in the long-range future.” With the agreement of Mervyn Hine, director of research, Gregory approved the proposal on 15 November 1968 and on 20 December MacLeod proposed a list of names to van Hove to form the first organizing committee. Alongside people from outside CERN – Bernard Levrat, John Burren and Peter Kirstein – were Tor Bloch, Rudi Böck, Bernard French, Robert Hagedorn, Lew Kowarski, Carlo Rubbia and Paolo Zanella from CERN.

Students take the final exam

The first CSC was not held at CERN as initially proposed but in Varenna, Italy, in 1970. It was realized quickly that the computing school – with the physics and accelerator schools – could be effective for collaboration between national physics communities and CERN. Until 1986 the CSC was organized every other year, then yearly starting with the school in Troia, Portugal, in 1987. To date there have been 36 schools, attended by 2300 students from five continents.

Ten years ago, I took over the reins of the school and proposed a redefinition of its objectives as it entered its fourth decade: “The school’s main aim is to create and share a common culture in the field of scientific computing, which is a strategic necessity to promote mobility within CERN and between institutes, and to carry out large transnational computing projects. The second aim is the creation of strong social links between participants, students and teachers alike, to reinforce the cohesion of the community and improve the effectiveness of its shared initiatives. The school should be open to computer scientists and physicists and ensure that both groups get to know each other and acquire a solid grounding in whichever of these domains is not their own.”

Moreover, the new management proposed three major changes of direction. First, they vowed to reinvigorate the resolutely academic dimension of the CSC, which during the years had gradually and imperceptibly become more like a conference. Conferences are necessary for scientific progress – they are forums where people can present their work, have their ideas challenged, have fruitful discussions about controversial issues and talk about themselves and what they do. The interventions at conferences are short, sometimes redundant or contradictory. The transmission of facts and opinions becomes more prominent than the transfer of knowledge. I took the view that this should not be the primary role of the CSC, since conferences such as the Computing in High-Energy Physics series serve this purpose perfectly. The academic dimension was therefore progressively re-established through the implementation of three principles.

Three principles

The first academic principle concerns the organization of the teaching. A deliberately limited number of teachers – each giving a series of lessons of several hours – ensures coherence between the different classes, avoids redundancy and delivers more consistent content than a series of short interventions could. Moreover, for several years now all of the non-CERN teachers have been university professors. This is not the result of a strict policy, but it is worthy of note that the choice of teachers has been consistent with this academic ambition.

Sea kayaking

The second principle for restoring the academic dimension concerns the school’s curriculum. The main accent is on the transmission of knowledge and not of know-how. In this way, the CSC differs from training programmes organized by the laboratories and institutes, which are focused on know-how. The difference between knowledge and know-how is an important principle in the field of learning sciences. To get a better understanding of this distinction, the management of the school established relations with experts in the field at an early stage, particularly at the University of Geneva.


What are the differences? Knowledge is made up of fundamental concepts and facts on which additional knowledge is built and developed to persist over time. Moreover, the student acquires knowledge, incorporates it into his or her personal knowledge corpus and transforms it. Two physicists never have the same understanding of quantum mechanics. On the other hand, know-how – which includes methods and the use of tools – can generally be acquired autonomously with few prerequisites. With the exception of physical skills – such as knowing how to ride a bike or swim – which we tend not to lose, know-how requires regular practice so that it is not forgotten. Knowledge is more enduring by nature. Finally – and this is one of the main differences – knowledge can be transposed more readily to other environments and adapted to new problems. That at least is the theory. In practice, the differences are sometimes less clear. This is the challenge with which the CSC tries to get to grips each year when defining its programme – are we really operating mainly in the field of knowledge? The school is made up in equal parts of lectures and hands-on sessions, so do the latter not relate more to know-how? Yes, but the acquisition of this know-how is not an end in itself – it provides knowledge with a better anchorage.

The third principle of the academic dimension is evaluation of the knowledge acquired and recognition of the required level of excellence with a certificate. Following requests from students who wanted the high level of knowledge gained during the school to be formally certified, the CSC Diploma was introduced in 2002 to recognize success in the final exam and vouch for the student’s diligence throughout the programme. To date, 671 students have been awarded the CSC Diploma, which often figures prominently in their CVs. But that’s not all. Since 2008, the academic quality of the school, its teachers and exam has been formally audited each year by a different independent university. Each autumn, the school management prepares a file that is aimed at integrating the next school into the academic curriculum of the host university. The universities of Brunel, Copenhagen, Gjøvik, Göttingen, Nicosia and Uppsala have analysed and accepted CERN’s request. As a result, they have each awarded a formal European Credit Transfer System (ECTS) certificate to complement the CERN diploma.

This academic reorientation of the school is one of the three main renewal projects undertaken during the past 10 years. The second relates to the school’s social dimension. The creation of social links and networks between the participants and with their teachers has become the school’s second aim. This is considered to be a strategic objective because not only does it reinforce the cohesion of the community, it also improves the efficiency of large projects or services, such as the Worldwide LHC Computing Grid, through improved mutual understanding between the individuals contributing to them.

The 1968 memorandum from Léon van Hove to Bernard Gregory

The main vehicle chosen for socialization is sport. Every afternoon, a large part of the timetable is freed up for a dozen indoor and outdoor sports. Tennis, climbing or swimming lessons are given, often by the school’s teachers. Each year, participants discover an activity that is new to them, such as horse riding, sailing, canoeing, kayaking, scuba diving, rock climbing, cricket and mountain biking. The sport programme is supported by the CERN Medical Service and is associated with the “Move! Eat better” initiative. A second vehicle for socialization – music – is being considered and could be introduced for future schools. The intention is to give those who are interested the opportunity each afternoon to take part in instrumental music or choral singing or to discover them for the first time, with the same aim as for sport of “doing things together to get to know each other better”.

The third renewal project is plurality. In contrast to CERN’s high-energy physics and accelerator schools, which have organized several annual events for a number of years, the CSC has long remained the organization’s only school in the field of computing. However, since 2005 the CSC management has organized the inverted CSC (iCSC, “Where students turn into teachers”) and starting in 2013 the thematic CSC (tCSC). The idea behind the inverted school is simple – to capitalize on the considerable amount of knowledge accumulated by the participants in a school by inviting them to teach one or more lessons at a short school of three to five half-days, organized at CERN at the mid-point between two summer schools. To date, 40 former students have taught at one of these inverted schools.

It should be noted that the academic principle is still predominant. The goal is not to talk about oneself or one’s project but to present a topic, an innovative one if possible. This is not always easy, so each young teacher who is selected is assigned a mentor who follows the design and production of the lesson across three months. The inverted school has another aim – it is also a school for learning to teach. It represents the second link in a chain of training stages for new teachers for the main school. The first link, for those who are interested, is to give a short academic presentation while attending the main school. After the iCSC, i.e. the second link, some are invited to give an hour’s lesson at the main school before the last stage – their full integration into the teaching staff. This process generally takes several years.

During the latest CSC in Nicosia, five out of the 11 teachers were younger than 35. Three of them had passed through the CSC training chain. Along with their forthcoming colleagues, they are the future of the school. Leaving the CSC after 11 years as its director, I am confident that the next generation is ready to take up the baton.

The post CERN School of Computing: 10 years of renewal appeared first on CERN Courier.

]]>
https://cerncourier.com/a/cern-school-of-computing-10-years-of-renewal/feed/ 0 Feature How the CSC has been reinvigorated during the past decade. https://cerncourier.com/wp-content/uploads/2013/11/CCcer1_10_13.jpg
Networks Geeks: How They Built the Internet https://cerncourier.com/a/networks-geeks-how-they-built-the-internet/ Mon, 19 Aug 2013 08:12:20 +0000 https://preview-courier.web.cern.ch/?p=104497 Cian O'Luanaigh reviews in 2013 Networks Geeks: How They Built the Internet.

The post Networks Geeks: How They Built the Internet appeared first on CERN Courier.

]]>
By Brian E Carpenter
Springer
Paperback: £15 €21.09 $19.99
E-book: £11.99 €15.46 $9.99


In Network Geeks, Brian Carpenter weaves the history of the early internet into an entertaining personal narrative. As head of CERN’s computer-networking group throughout the 1980s, he is well placed to describe the discussions, the splits, the technical specifications and countless acronyms that made up the esoteric world of networking in the early days of the internet in Europe. Just don’t expect to be spared the technical details.

Carpenter joined CERN in 1971, at a time when computers filled entire rooms, messages were relayed by paper tape or punched card and numerous local networks ran bespoke software packages around the laboratory. Simplifying the system brought Carpenter into the world of the Internet Engineering Task Force – the committee charged with overseeing the development of standards for internet technology.

I enjoyed the fictional account of a meeting of the Task Force in 1996, which gives a vivid idea of the sheer number of technical issues, documents and acronyms that the group tackled. That year, traffic was doubling every 100 days. Keeping up with the pace of change and deciding which standards and protocols to use – TCP/IP or OSI? – were emotive issues. As with any new technology, there was lobbying, competition and elements of luck. Nobody knew where the internet would lead.

Carpenter’s enthusiasm is the strength of Network Geeks. He recounts his early interest in science – a childhood of Meccano and Sputnik – with an easy nostalgia and his memories of informal meetings with often-bearded computer scientists show genuine warmth. But it is no easy read. The autobiographical narrative jumps jarringly between lyrical descriptions of the author’s youth and the rather mundane details of computer networking. At times I felt I was drowning in specifics when I was really hoping for a wider view, for implications rather than specifications.

Network Geeks reminded me that the evolution of technology can be as much down to politics and luck as to scientific advances. It gave me a great overview of the climate in the early days of the internet. At the same time, the heavy layers of jargon also reminded me why I’m no computer scientist.

The post Networks Geeks: How They Built the Internet appeared first on CERN Courier.

]]>
Review Cian O'Luanaigh reviews in 2013 Networks Geeks: How They Built the Internet. https://cerncourier.com/wp-content/uploads/2013/08/CCboo3_07_13.jpg
Data centre opens https://cerncourier.com/a/data-centre-opens/ https://cerncourier.com/a/data-centre-opens/#respond Fri, 19 Jul 2013 07:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/data-centre-opens/ CERN and the Wigner Research Centre for Physics inaugurated the CERN Tier-0 data-centre extension in Budapest on 13 June, marking the completion of the facility.

The post Data centre opens appeared first on CERN Courier.

]]>

CERN and the Wigner Research Centre for Physics inaugurated the CERN Tier-0 data-centre extension in Budapest on 13 June, marking the completion of the facility. CERN’s director-general, Rolf Heuer, far left, joined József Pálinkás, president of the Hungarian Academy of Sciences, and Viktor Orbán, prime minister of Hungary, in the ceremonial “cutting the ribbon”, in the company of Péter József Lévai, far right, general director of the Wigner Research Centre for Physics (RCP). This extension adds up to 2.5 MW capacity to the 3.5 MW load of the data centre at CERN’s Meyrin site, which has already reached its limit. The dedicated and redundant 100 Gbit/s circuits connecting the two sites have been functional since February and about 20,000 computing cores, 500 servers and 5.5 PB of storage are already operational at the new facility.

The post Data centre opens appeared first on CERN Courier.

]]>
https://cerncourier.com/a/data-centre-opens/feed/ 0 News CERN and the Wigner Research Centre for Physics inaugurated the CERN Tier-0 data-centre extension in Budapest on 13 June, marking the completion of the facility. https://cerncourier.com/wp-content/uploads/2013/07/CCnew14_06_13.jpg
CERN data centre passes 100 petabytes https://cerncourier.com/a/cern-data-centre-passes-100-petabytes/ https://cerncourier.com/a/cern-data-centre-passes-100-petabytes/#respond Thu, 28 Mar 2013 09:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/cern-data-centre-passes-100-petabytes/ On the same day that the LHC’s first three-year physics run ended, CERN announced that its data centre had recorded more than 100 petabytes (PB) – 100 million gigabytes – of physics data.

The post CERN data centre passes 100 petabytes appeared first on CERN Courier.

]]>
On the same day that the LHC’s first three-year physics run ended, CERN announced that its data centre had recorded more than 100 petabytes (PB) – 100 million gigabytes – of physics data.

Storing this 100 PB – amassed over the past 20 years and equivalent to 700 years of full HD-quality video – has been a challenge. At CERN, the bulk of the data (about 88 PB) is archived on tape using the CERN Advanced Storage (CASTOR) system. The rest (13 PB) is stored on the EOS disk-pool system, which is optimized for fast analysis access by many concurrent users.

For the CASTOR system, eight robotic tape libraries are distributed across two buildings, with each tape library capable of containing up to 14,000 tape cartridges. CERN currently has around 52,000 tape cartridges with a capacity ranging from 1 terabyte (TB) to 5.5 TB each. For the EOS system, the data are stored on more than 17,000 disks attached to 800 disk servers.
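
As a rough cross-check of these numbers – a back-of-the-envelope sketch rather than anything taken from CERN's systems – the quoted cartridge count and per-cartridge capacities bound the total tape capacity; the per-disk size used below for the EOS pool is an assumption for illustration only:

#include <cstdio>

int main() {
    const double cartridges = 52000.0;                 // tape cartridges quoted above
    const double tape_min_tb = 1.0, tape_max_tb = 5.5; // capacity range per cartridge
    const double disks = 17000.0;                      // disks in the EOS pool
    const double disk_tb = 2.0;                        // assumed size per disk (not quoted in the article)

    printf("Tape capacity: %.0f to %.0f PB\n",
           cartridges * tape_min_tb / 1000.0, cartridges * tape_max_tb / 1000.0);
    printf("EOS pool at %.0f TB/disk: about %.0f PB\n",
           disk_tb, disks * disk_tb / 1000.0);
    return 0;
}

The 88 PB archived on tape sits comfortably inside the 52–286 PB range that the cartridge figures imply.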

Not all of the data are generated by LHC experiments. CERN’s IT Department hosts data from many other high-energy physics experiments at CERN, past and present, and is also a data centre for the Alpha Magnetic Spectrometer.

For both tape and disk, efficient data storage and access must be provided, and this involves identifying performance bottlenecks and understanding how users want to access the data. Tapes are checked regularly to make sure that they stay in good condition and are accessible to users. To optimize storage space, the complete archive is regularly migrated to the newest high-capacity tapes. Disk-based systems are replicated automatically after hard-disk failures and a scalable namespace enables fast concurrent access to millions of individual files.

The data centre will keep busy during the long shutdown of the whole accelerator complex, analysing data taken during the LHC’s first three-year run and preparing for the higher expected data flow when the accelerators and experiments start up again. An extension of the centre and the use of a remote data centre in Hungary will further increase the data centre’s capacity.

The post CERN data centre passes 100 petabytes appeared first on CERN Courier.

]]>
https://cerncourier.com/a/cern-data-centre-passes-100-petabytes/feed/ 0 News On the same day that the LHC’s first three-year physics run ended, CERN announced that its data centre had recorded more than 100 petabytes (PB) – 100 million gigabytes – of physics data.
The LHC’s worldwide computer https://cerncourier.com/a/the-lhcs-worldwide-computer/ https://cerncourier.com/a/the-lhcs-worldwide-computer/#respond Thu, 28 Mar 2013 09:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/the-lhcs-worldwide-computer/   Mid-February marked the end of the first three-year run of the LHC. While the machine exceeded all expectations, delivering significantly more data to the experiments than initially foreseen, high-performance distributed computing also enabled physicists to announce on 4 July the discovery of a new particle (CERN Courier September 2012 p46). With the first run […]

The post The LHC’s worldwide computer appeared first on CERN Courier.

]]>

Mid-February marked the end of the first three-year run of the LHC. While the machine exceeded all expectations, delivering significantly more data to the experiments than initially foreseen, high-performance distributed computing also enabled physicists to announce on 4 July the discovery of a new particle (CERN Courier September 2012 p46). With the first run now over, it is a good time to look back at the Worldwide LHC Computing Grid to see what was initially planned, how it performed and what is foreseen for the future.

Back in the late 1990s, it was already clear that the expected amount of LHC data would far exceed the computing capacity at CERN alone. Distributed computing was the sensible choice. The first model proposed was MONARC (Models of Networked Analysis at Regional Centres for LHC Experiments), on which the experiments originally based their computing models (CERN Courier June 2000 p17). In September 2001, CERN Council approved the first phase of the LHC Computing Grid project, led by Les Robertson of CERN’s IT department (CERN Courier November 2001 p5). From 2002 to 2005, staff at CERN and collaborating institutes around the world developed prototype equipment and techniques. From 2006, the LHC Computing Grid became the Worldwide LHC Computing Grid (WLCG) as global computing centres became connected to CERN to help store data and provide computing power.

WLCG uses a tier structure with the CERN data centre as Tier-0 (figure 1). CERN sends out data to each of the 11 major data centres around the world that form the first level, or Tier-1, via optical-fibre links working at multiples of 10 Gbit/s. Each Tier-1 site is then linked to a number of Tier-2 sites, usually located in the same geographical region. Computing resources are supported by the national funding agencies of the countries where each tier is located.
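
To get a feel for what a dedicated 10 Gbit/s link implies for bulk export, the sketch below estimates how long one petabyte takes to ship to a Tier-1; the usable fraction of the link is an assumed figure, not one taken from the article:

#include <cstdio>

int main() {
    const double link_gbps = 10.0;           // one Tier-0 to Tier-1 optical-fibre link, as quoted
    const double efficiency = 0.8;           // assumed usable fraction of the bandwidth
    const double petabyte_bits = 1e15 * 8.0; // 1 PB expressed in bits

    const double seconds = petabyte_bits / (link_gbps * 1e9 * efficiency);
    printf("Shipping 1 PB over one 10 Gbit/s link: about %.1f days\n", seconds / 86400.0);
    return 0;
}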

Exceeding expectations

Before the LHC run began, the experiment collaborations had high expectations for the Grid. Distributed computing was the only way that they could store, process and analyse the data – both simulated and real. But, equally, there was some hesitation: the scale of the data processing was unprecedented and it was the first time that analysis had been distributed in this way, dependent on work done at so many different places and funded by so many sources.

There was caution on the computing side too; concerns about network reliability led to built-in complexities such as database replication. As it turned out, the network performed much better than expected. Networking in general saw a big improvement, with connections of 10 Gbit/s being more or less standard to the many university departments where the tiers are housed. Greater reliability, greater bandwidth and greater performance led to increased confidence. The initial complexities and the need for replication of databases reduced, and over time the Grid saw increased simplicity, with a greater reliance on central services run at CERN.

A wealth of data

Network improvements, coupled with the reduced costs of computing hardware, meant that more resources could be provided. Improved performance allowed the physics to evolve as the LHC experiments increased their trigger rates to explore more regions than initially foreseen, thus increasing the instantaneous data rate. LHCb now writes as much data as had been initially estimated for ATLAS and CMS. In 2010, the LHC produced its nominal 15 petabytes (PB) of data a year. Since then, the annual total has increased to 23 PB in 2011 and 27 PB in 2012. LHC data contributed about 70 PB to the recent milestone of 100 PB of CERN data storage (see p6).

In ATLAS and CMS, at least one collision took place every 50 ns, i.e. with a frequency of 20 MHz. The ATLAS trigger output rate increased over the years, reaching up to 400 Hz into the main physics streams in 2012 and giving more than 5.5 × 10⁹ recorded physics collisions. CMS collected more than 10¹⁰ collision events after the start of the run and reconstructed more than 2 × 10¹⁰ simulated crossings.
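
These rates are easy to cross-check. The short calculation below turns the 50 ns bunch spacing into a frequency and asks how much live time a 400 Hz output rate implies for 5.5 × 10⁹ recorded collisions; the resulting live time is a derived estimate, not a figure stated by the experiments:

#include <cstdio>

int main() {
    const double bunch_spacing_s = 50e-9;    // one crossing every 50 ns
    printf("Crossing rate: %.0f MHz\n", 1.0 / bunch_spacing_s / 1e6);

    const double trigger_out_hz = 400.0;     // ATLAS output to the main physics streams
    const double recorded = 5.5e9;           // recorded physics collisions
    const double live_s = recorded / trigger_out_hz;
    printf("Implied live time: about %.1e s (roughly %.0f days)\n", live_s, live_s / 86400.0);
    return 0;
}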

For ALICE, the most important periods of data-taking were the heavy-ion (PbPb) periods – about 40 days in 2010 and 2011. The collaboration collected some 200 million PbPb events with various trigger set-ups. These periods produced the bulk of the data volume in ALICE and their reconstruction and analysis required the biggest amount of CPU resources. In addition, the ALICE detector operated during the proton–proton periods and collected reference data for comparison with the heavy-ion data. In 2013, just before the long shutdown, ALICE collected asymmetrical proton–lead collisions with an interaction versus trigger rate of 10%. In total, from 2010, ALICE accumulated about 8 PB of raw data. Add to that the reconstruction, Monte Carlo simulations and analysis results, and the total data volume grows to about 20 PB.

In LHCb, the trigger reduces 20 million collisions a second to 5000 events written to tape each second. The experiment produces about 350 MB of raw data per second of LHC running, with the total raw data recorded since the start of LHC at about 3 PB. The total amount of data stored by LHCb is 20 PB, of which about 8 PB are on disk. Simulated data accounts for about 20% of the total. On average, about one tenth of the jobs running concurrently on the WLCG come from LHCb.
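
The LHCb figures above also imply an average raw-event size and an equivalent running time; the sketch below derives them – again as an illustration, not as numbers stated by the collaboration:

#include <cstdio>

int main() {
    const double events_per_s  = 5000.0;  // events written to tape each second
    const double rate_mb_per_s = 350.0;   // raw-data rate during LHC running
    const double total_raw_pb  = 3.0;     // raw data recorded since the start of the LHC

    printf("Average raw event size: about %.0f kB\n", rate_mb_per_s * 1000.0 / events_per_s);
    printf("Equivalent running time: about %.1e s\n", total_raw_pb * 1e9 / rate_mb_per_s);
    return 0;
}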

The WLCG gives access to vast distributed resources across the globe in Tier-1 and Tier-2 sites, as well as to additional voluntary resources from interested institutions, ensuring built-in resilience because the analysis is not performed in a single data centre and hence is not dependent on that centre. It also makes the LHC data available worldwide at the same time.

As time has gone on, the Tier-2 sites have been used far more than foreseen (figure 2). Originally thought to be just for analysis and Monte Carlo simulations, the sites can now do much more with more resources and networking than anticipated. They currently contribute to data reprocessing, normally run at Tier-1 sites, and have enabled the Grid to absorb peak loads that have arisen when processing real data as a result of the extension of the LHC run and the higher-than-expected data collection rates. Because the capacity available at Tier-0 and Tier-1 was insufficient to process new data and reprocess earlier data simultaneously, the reprocessing activity was largely done on Tier-2s. Without them it would not have been possible to have the complete 2012 data set reprocessed in time for analyses targeting the winter conferences in early 2013.

The challenges for the Grid were three-fold. The main one was to understand how best to manage the LHC data and use the Grid’s heterogeneous environment in a way that physicists could concern themselves with analysis without needing to know where their data were. A distributed system is more complex and demanding to master than the usual batch-processing farms, so the physicists required continuous education on how to use the system. The Grid needs to be fully operational at all times (24/7, 365 days/year) and should “never sleep” (figure 3), meaning that important upgrades of the Grid middleware in all data centres must be done on a regular basis. For the latter, the success can be attributed in part to the excellent quality of the middleware itself (supplied by various common projects, such as WLCG/EGEE/EMI in Europe and OSG in the US, see box) and to the administrators of the computing centres, who keep the computing fabric running continuously.

Requirements for the future

With CERN now entering its first long shutdown (LS1), the physicists previously on shift in the control rooms are turning to analysis of the data. Hence LS1 will not be a period of “pause” for the Grid. In addition to analysis, the computing infrastructure will undergo a continual process of upgrades and improvements.

The computing requirements of ALICE, ATLAS, CMS and LHCb are expected to evolve and increase in conjunction with the experiments’ physics programmes and the improved precision of the detectors’ measurements. The ALICE collaboration will re-calibrate, re-process and re-analyse the data collected from 2010 until 2013 during LS1. After the shutdown, the Grid capacity (CPU and storage) will be about 30% more than that currently installed, which will allow the experiment to resume data-taking and immediate data processing at the higher LHC energy. The ATLAS collaboration has an ambitious plan to improve its software and computing performance further during LS1 to moderate the increase in hardware needs. They nonetheless expect a substantial increase in their computing needs compared with what was pledged for 2012. The CMS collaboration expects the trigger rate – and subsequently the processing and analysis challenges – to continue to grow with the higher energy and luminosity after LS1. LHCb’s broader scope to include charm physics may increase the experiment’s data rate by a factor of about two after LS1, which would require more storage on the Grid and more CPU power. The collaboration also plans to make much more use of Tier-2 sites for data processing than was the case up until now.

For the Grid itself, the aim is to make it simpler and more integrated, with work now underway to extend CERN’s Tier-0 data centre, using resources at CERN and the Wigner Research Centre in Budapest (CERN Courier June 2012 p9). Equipment is already being installed and should be fully operational in 2013.

Future challenges and requirements are the result of great successes. Grid performance has been excellent and all of the experiments have not only been good at recording data, but have also found that their detectors could even do more. This has led to the experiment collaborations wanting to capitalize on this potential. With a wealth of data, they can be thankful for the worldwide computer, showing global collaboration at its best.

Worldwide LHC Computing Grid in numbers
• About 10,000 physicists use it

• On average well in excess of 250,000 jobs run concurrently on the Grid

• 30 million jobs ran in January 2013

• 260,000 available processing cores

• 180 PB disk storage available worldwide

• 15% of the computing resources are at CERN

• 10 Gbit/s optical-fibre links connect CERN to each of the 11 Tier-1 institutes

• There are now more than 70 PB of stored data at CERN from the LHC


Beyond particle physics
Throughout its lifetime, WLCG has worked closely with Grid projects co-funded by the European Commission, such as EGEE (Enabling Grids for E-sciencE), EGI (European Grid Infrastructure) and EMI (European Middleware Initiative), or funded by the US National Science Foundation and Department of Energy, such as OSG (Open Science Grid). These projects have provided operational and developmental support and enabled wider scientific communities to use Grid computing, from biologists who simulate millions of molecular drug candidates to find out how they interact with specific proteins, to Earth-scientists who model the future of the planet’s climate.


The post The LHC’s worldwide computer appeared first on CERN Courier.

]]>
https://cerncourier.com/a/the-lhcs-worldwide-computer/feed/ 0 Feature
Deferred triggering optimizes CPU use https://cerncourier.com/a/deferred-triggering-optimizes-cpu-use/ https://cerncourier.com/a/deferred-triggering-optimizes-cpu-use/#respond Thu, 31 May 2012 10:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/deferred-triggering-optimizes-cpu-use/ Like all of the LHC experiments, LHCb relies on a tremendous amount of CPU power to select interesting events out of the many millions that the LHC produces every second.

The post Deferred triggering optimizes CPU use appeared first on CERN Courier.

]]>
Like all of the LHC experiments, LHCb relies on a tremendous amount of CPU power to select interesting events out of the many millions that the LHC produces every second. Indeed, a large part of the ingenuity of the LHCb collaboration goes into developing trigger algorithms that can sift out the interesting physics from a sea of background. The cleverer the algorithms, the better the physics, but often the computational cost is also higher. About 1500 powerful computing servers in an event filter farm are kept 100% busy when LHCb is taking data and still more could be used.


However, this enormous computing power is used less than 20% of the time when averaged over the entire year. This is partly because of the annual shutdown, so preparations are under way to use the power of the filter farm during that period for offline processing of data – the issues to be addressed include feeding the farm with events from external storage. The rest of the idle time is a result of the gaps between the periods when there are protons colliding in the LHC (the “fills”), which typically last between two and three hours, where no collisions take place and therefore no computing power is required.

This raises the question about whether it is somehow possible to borrow the CPU power of the idle servers and use it during physics runs for an extra boost. Such thoughts led to the idea of “deferred triggering”: storing events that cannot be processed online on the local disks of the servers, and later, when the fill is over, processing them on the now idle servers.

The LHCb Online and Trigger teams quickly worked out the technical details and started the implementation of a deferred trigger early this year. As often happens in online computing, the storing and moving of the data is the easy part, while the true challenge lies in the monitoring and control of the processing, robust error-recovery and careful bookkeeping. After a few weeks, all of the essential pieces were ready for the first successful tests using real data.

Depending on the ratio of the fill length to inter-fill time, up to 20% of CPU time can be deferred – limited only by the available disk space (currently around 200 TB) and the time between fills in the LHC. Buying that amount of CPU power would correspond to an investment of hundreds of thousands of Swiss francs. Instead, this enterprising idea has allowed LHCb to increase the performance of its trigger, making time for more complex algorithms (such as the online reconstruction of KS decays) that extend the physics reach of the experiment.
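
The logic of deferred triggering can be caricatured in a few lines of C++. The sketch below only illustrates the idea – events that find no free CPU during a fill are parked on local disk and drained once the fill ends – and the class and function names are invented, not part of the LHCb online software:

#include <cstdint>
#include <cstdio>
#include <deque>

struct Event { uint64_t id; };

class DeferredTrigger {
    std::deque<Event> deferred_;   // stands in for files on the local disk
    size_t capacity_;              // bounded by the available disk space
public:
    explicit DeferredTrigger(size_t capacity) : capacity_(capacity) {}

    // During a fill: process if a CPU slot is free, otherwise defer to disk.
    void onEvent(const Event& e, bool cpuAvailable) {
        if (cpuAvailable)                       process(e);
        else if (deferred_.size() < capacity_)  deferred_.push_back(e);
        else                                    drop(e);   // disk full: event is lost
    }

    // Between fills: the now-idle farm drains the backlog.
    void drainBetweenFills() {
        while (!deferred_.empty()) {
            process(deferred_.front());
            deferred_.pop_front();
        }
    }
private:
    void process(const Event& e) { printf("processed event %llu\n", (unsigned long long)e.id); }
    void drop(const Event& e)    { printf("dropped event %llu\n",   (unsigned long long)e.id); }
};

int main() {
    DeferredTrigger trig(2);
    for (uint64_t i = 0; i < 5; ++i)
        trig.onEvent({i}, i % 2 == 0);   // pretend every other event finds a free CPU
    trig.drainBetweenFills();
    return 0;
}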

The post Deferred triggering optimizes CPU use appeared first on CERN Courier.

]]>
https://cerncourier.com/a/deferred-triggering-optimizes-cpu-use/feed/ 0 News Like all of the LHC experiments, LHCb relies on a tremendous amount of CPU power to select interesting events out of the many millions that the LHC produces every second. https://cerncourier.com/wp-content/uploads/2012/05/CCnew8_05_12-1.jpg
Hungary to host extension to CERN data centre https://cerncourier.com/a/hungary-to-host-extension-to-cern-data-centre/ https://cerncourier.com/a/hungary-to-host-extension-to-cern-data-centre/#respond Thu, 31 May 2012 10:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/hungary-to-host-extension-to-cern-data-centre/ Following a competitive call for tender, CERN has signed a contract with the Wigner Research Centre for Physics in Budapest for an extension to CERN’s data centre.

The post Hungary to host extension to CERN data centre appeared first on CERN Courier.

]]>

Following a competitive call for tender, CERN has signed a contract with the Wigner Research Centre for Physics in Budapest for an extension to CERN’s data centre. Under the new agreement, the Wigner Centre will host CERN equipment that will substantially extend the capabilities of Tier-0 of the Worldwide LHC Computing Grid (WLCG) and provide the opportunity to implement solutions for business continuity. The contract is initially until 31 December 2015, with the possibility of up to four one-year extensions thereafter.

The WLCG is a global system organized in tiers, with the central hub being Tier-0 at CERN. Eleven major Tier-1 centres around the world are linked to CERN via dedicated high-bandwidth links. Smaller Tier-2 and Tier-3 centres linked via the internet bring the total number of computer centres involved to more than 140 in 35 countries. The WLCG serves a community of some 8000 scientists working on LHC experiments, allowing seamless access, distributed computing and data-storage facilities.

The Tier-0 at CERN currently provides some 30 PB of data storage on disk and includes the majority of the 65,000 processing cores in the CERN Computer Centre. Under the new agreement, the Wigner Research Centre will extend this capacity with 20,000 cores and 5.5 PB of disk storage, and will see this doubling after three years.

The post Hungary to host extension to CERN data centre appeared first on CERN Courier.

]]>
https://cerncourier.com/a/hungary-to-host-extension-to-cern-data-centre/feed/ 0 News Following a competitive call for tender, CERN has signed a contract with the Wigner Research Centre for Physics in Budapest for an extension to CERN’s data centre. https://cerncourier.com/wp-content/uploads/2012/05/CCnew9_05_12.jpg
The openlab adventure continues to thrive https://cerncourier.com/a/the-openlab-adventure-continues-to-thrive/ https://cerncourier.com/a/the-openlab-adventure-continues-to-thrive/#respond Fri, 27 Apr 2012 07:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/the-openlab-adventure-continues-to-thrive/ A personal account of how this bold initiative began.

The post The openlab adventure continues to thrive appeared first on CERN Courier.

]]>
Friday, 31 May 2001, 6 p.m. – Back in my office, I open my notebook and write “My understanding of MD’s ideas” in blue ink. I draw a box and write the words “Open Lab” in the middle of it. I’ve just left the office of Manuel Delfino, the head of CERN’s IT division. His assistant had called to ask me to go and see Manuel at 4 p.m. to talk about “industrial relations”. I’ve been technology-transfer co-ordinator for a few weeks but I had no idea of what he was going to say to me. An hour later, I need to collect my thoughts. Manuel has just set out one of the most amazing plans I’ve ever seen. There’s nothing like it, no model to go on, and yet the ideas are simple and the vision is clear. He’s asked me to take care of it. The CERN openlab adventure is about to begin.


This is how the opening lines of the openlab story could begin if it were ever to be written as a novel. At the start of the millennium, the case was clear for Manuel Delfino: CERN was in the process of developing the computing infrastructure for the LHC; significant research and development was needed; and advanced solutions and technologies had to be evaluated. His idea was that, although CERN had substantial computing resources and a sound R&D tradition, collaborating with industry would make it possible to do more and do it better.

Four basic principles

CERN was no stranger to collaboration with industry, and I pointed out to Manuel that we had always done field tests on the latest systems in conjunction with their developers. He nodded but stressed that here was the difference: what he was proposing was not a random collection of short-term, independent tests governed by various different agreements. Instead, the four basic principles of openlab would be as follows (I jotted them down carefully because Manuel wasn’t using notes): first, openlab should use a common framework for all partnerships, meaning that the same duration and the same level of contribution should apply to everyone; second, openlab should focus on long-term partnerships of up to three years; third, openlab should target the major market players, with the minimum contribution threshold set at a significant level; last, in return CERN would contribute its expertise, evaluation capacity and its unique requirements. Industrial partners would contribute in kind – in the form of equipment and support – and in cash by funding young people working on joint projects. Ten years on, openlab is still governed by these same four principles.


Back to May 2001. After paving the way with extensive political discussions over several months, Manuel had written a formal letter to five large companies, Enterasys, IBM, Intel, Oracle and KPN QWest, inviting them to become the founding members of the Open Lab (renamed “openlab” a few months later). These letters, which were adapted to suit each case, are model sales-pitches worthy of a professional fundraiser. They set out the unprecedented computing challenges associated with the LHC, the unique opportunities of a partnership with CERN in the LHC framework, the potential benefits for each party and proposed clear areas of technical collaboration for each partner. The letters also demanded a rapid response, indicating that replies needed to reach CERN’s director-general just six weeks later, by 15 June. A model application letter was also provided. With the director-general’s approval, Manuel wrote directly to the top management of the companies concerned, i.e. their chairs and vice-chairs. The letters had the desired effect: three companies gave a positive response by the 15 June deadline, while the other two followed suit a few months later – openlab was ready to go.

The first task was to define the common framework. CERN’s legal service was brought in and the guiding principles of openlab, drawn up in the form of a public document and not as a contract, were ready by the end of 2001. The document was designed to serve as the basis for the detailed agreements with individual partners, which now had to be concluded.

Three-year phases

At the start of 2002, after a few months of existence, openlab had three partners: Enterasys, Intel and KPN QWest (which later withdrew when it became a casualty of the bursting of the telecoms and dotcom bubbles). On 11 March, the first meeting of the board of sponsors was held at CERN. The meeting, chaired by the then director-general, Luciano Maiani, brought together representatives of the industrial companies as well as Manuel, Les Robertson (the head of the LHC Computing Grid project) and me. There I presented the first openlab annual report, which has since been followed by nine more, each printed in more than 1000 copies. Then, in July, openlab was joined by HP, followed by IBM in March 2003 and by Oracle in October 2003.

In the meantime, a steering structure for openlab was set up at CERN in early 2003, headed by the new head of the IT Department, Wolfgang von Rüden, in an ex officio capacity. Sverre Jarp was the chief technical officer, while François Grey was in charge of communication and I was to co-ordinate the overall management. January 2003 was also a good opportunity to resynchronize the partnerships. The concept of three-year “openlab phases” was adopted, the first covering the years 2003–2005. Management practices and the technical focus would be reviewed and adapted through the successive phases.


Thus, Phase I began with an innovative and ambitious technical objective: each partnership was to form a building block of a common structure so that all of the projects would be closely linked. This common construction, which we were all building together, was called “opencluster”. It was an innovative and ambitious idea – but unfortunately too ambitious. The constraints ultimately proved too restrictive – both for the existing projects and for bringing in new partners. So what of a new unifying structure to replace opencluster? The idea was eventually abandoned when it came to openlab-II: although the search for synergies between individual projects was by no means excluded, it was no longer an obligation.

A further adjustment occurred in the meantime, in the shape of a new and complementary type of partnership: the status of “contributor” was created in January 2004, aimed at tactical, shorter-term collaborations focusing on a specific technology. Voltaire was the first company to acquire the new status on 2 April, to provide CERN with the first high-speed network based on Infiniband technology. A further innovation followed in July. François set up the openlab Student Programme, designed to bring students to CERN from around the world to work on openlab projects. With the discontinuation of the opencluster concept, and with the new contributor status and the student programme, openlab had emphatically demonstrated its ability to adapt and progress. The second phase, openlab-II, began in January 2006, with Intel, Oracle and HP as partners and the security-software companies Stonesoft and F-Secure as contributors. They were joined in March 2007 by EDS, a giant of the IT-services industry, which contributed to the monitoring tools needed for the Grid computing system being developed for the LHC.

The year 2007 also saw a technical development that was to prove crucial for the future of openlab. At the instigation of Jean-Michel Jouanigot of the network group, CERN and HP ProCurve pioneered a new joint-research partnership. So far, projects had essentially focused on the evaluation and integration of technologies proposed by the partners from industry. In this case, CERN and HP ProCurve were to undertake joint design and development work. The openlab’s hallmark motto, “You make it, we break it”, was joined by a new slogan, “We make it together”. Another major event followed in September 2008 when Wolfgang’s patient, months-long discussions with Siemens culminated in the company becoming an openlab partner. Thus, by the end of Phase II, openlab had entered the world of control systems.

At the start of openlab-III in 2009, Intel, Oracle and HP were joined by Siemens. EDS also decided to extend its partnership by one year. This third phase was characterized by a marked increase in education and communication efforts. More and more workshops were organized on specific themes – particularly in the framework of collaboration with Intel – and the communication structure was reorganized. The post of openlab communications officer, directly attached to the openlab manager, was created in the summer of 2008. A specific programme was drawn up with each partner and tools for monitoring spin-offs were implemented.

Everything was therefore in place for the next phase, which Wolfgang enthusiastically started to prepare at the end of 2010. In May 2011, in agreement with Frédéric Hemmer, who had taken over as head of the IT Department in 2009, he handed over the reins to Bob Jones. The fourth phase of openlab began in January 2012 with not only HP, Intel and Oracle as partners, but also with Chinese multinational Huawei, whose arrival extended openlab’s technical scope to include storage technologies.

After 10 years of existence, the basic principles of openlab still hold true and its long-standing partners are still present. While I, too, passed on the baton at the start of 2012, the openlab adventure is by no means over.

• For a version of this article in French, see https://cern.ch/Fluckiger/Articles/F.Fluckiger-openlab-10_ans_deja.pdf.

The post The openlab adventure continues to thrive appeared first on CERN Courier.

]]>
https://cerncourier.com/a/the-openlab-adventure-continues-to-thrive/feed/ 0 Feature A personal account of how this bold initiative began. https://cerncourier.com/wp-content/uploads/2012/04/CCope2_04_12.jpg
The ALICE computing project https://cerncourier.com/a/the-alice-computing-project/ https://cerncourier.com/a/the-alice-computing-project/#respond Tue, 27 Mar 2012 14:35:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/the-alice-computing-project/ The ALICE software environment (AliRoot) first saw light in 1998, at a time when computing in high-energy physics was facing a challenging task. A community of several thousand users and developers had to be converted from a procedural language (FORTRAN) that had been in use for 40 years to a comparatively new object-oriented language (C++) with […]

The post The ALICE computing project appeared first on CERN Courier.

]]>
The ALICE software environment (AliRoot) first saw light in 1998, at a time when computing in high-energy physics was facing a challenging task. A community of several thousand users and developers had to be converted from a procedural language (FORTRAN) that had been in use for 40 years to a comparatively new object-oriented language (C++) with which there was no previous experience. Coupled to this was the transition from loosely connected computer centres to a highly integrated Grid system. Again, this would involve a risky but unavoidable evolution from a well known model where, for experiments at CERN, for example, most of the computing was done at CERN with analysis performed at regional computer centres to a highly integrated system based on the Grid “vision”, for which neither experience nor tools were available.

In the ALICE experiment, we had a small offline team that was concentrated at CERN. The effect of having this small, localized team was to favour pragmatic solutions that did not require a long planning and development phase and that would, at the same time, give maximum attention to automation of the operations. So, on one side we concentrated on “taking what is there and works”, so as to provide the physicists quickly with the tools they needed, while on the other we devoted attention towards ensuring that the solutions we adopted would lend themselves to resilient hands-off operation and would evolve with time. We could not afford to develop “temporary” solutions but still we had to deliver quickly and develop the software incrementally in ways that would involve no major rewrites.

The rise of AliRoot

When development of the current ALICE computing infrastructure started, the collaboration decided to make an immediate transition to C++ for its production environment. This meant the use of existing and proven elements. For the detector simulation package, the choice fell on GEANT3, appropriately “wrapped” into a C++ “class”, together with ROOT, the C++ framework for data manipulation and analysis that René Brun and his team developed for the LHC experiments. This led to a complete, albeit embryonic, framework that could be used for the experiment’s detector-performance reports. AliRoot was born.

The initial design was exceedingly simple. There was no insulation layer between AliRoot and ROOT; no software-management layer beyond a software repository accessible to the whole ALICE collaboration; and only a single executable for simulation, calibration, reconstruction and analysis. The software was delivered in a single package, which just needed GEANT3 and ROOT to be operational.

To allow the code to evolve, we relied heavily on virtual interfaces that insulated the steering part of the code from the 18 ALICE subdetectors and the event generators. This proved to be a useful choice because it made the addition of new event generators – and even of new detectors – easy and seamless.

To protect the simulation code written by users (geometry description, scoring and signal generation) and to ease the transition from GEANT3 to GEANT4, we also developed a “virtual interface” with the Monte Carlo simulator, which allowed us to reuse the ALICE simulation code with other detector-simulation packages. The pressure from the users, who relied on AliRoot as their only working tool, prompted us to assume an “agile” working style, with frequent releases and “merciless” refactorizations of the code whenever needed. In open-source jargon we were working in a “bazaar style”, guided by the users’ feedback and requirements, as opposed to the “cathedral style” process where the code is restricted to an elite group of developers between major releases. The difficulty of working with a rapidly evolving system while also balancing a rapid response to the users’ needs, long-term evolution and stability was largely offset by the flexibility and robustness of a simple design, as well as the consistency of a unique development line where the users’ investment in code and algorithms has been preserved over more than a decade.
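
The idea behind the virtual Monte Carlo interface is classic C++ polymorphism: user code talks to an abstract transport interface so that the concrete engine can be swapped without touching detector code. The sketch below uses simplified, invented class names rather than the real ROOT/AliRoot interface:

#include <cstdio>
#include <memory>

class VirtualMC {                       // abstract transport engine
public:
    virtual ~VirtualMC() = default;
    virtual void processEvent() = 0;
};

class Geant3Like : public VirtualMC {
public:
    void processEvent() override { printf("transport with a GEANT3-style engine\n"); }
};

class Geant4Like : public VirtualMC {
public:
    void processEvent() override { printf("transport with a GEANT4-style engine\n"); }
};

// Detector code depends only on the abstract interface.
void runSimulation(VirtualMC& mc, int nEvents) {
    for (int i = 0; i < nEvents; ++i) mc.processEvent();
}

int main() {
    std::unique_ptr<VirtualMC> engine = std::make_unique<Geant3Like>();
    runSimulation(*engine, 2);
    engine = std::make_unique<Geant4Like>();   // swap engines, same user code
    runSimulation(*engine, 2);
    return 0;
}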

The design of the analysis framework also relied directly on the facilities provided by the ROOT framework. We used the ROOT tasks to implement the so called “analysis train”, where one event is read in memory and then passed to the different analysis tasks, which are linked like wagons of a train. Virtuality with respect to the data is achieved via “readers” that can accept different kinds of input and take care of the format conversion. At ALICE we have two analysis objects: the event summary data (ESD) that result from the reconstruction and the analysis object data (AOD) in the form of compact event information derived from the ESD. AODs can be customized with additional files that add information to each event without the need to rewrite them (the delta-AOD). Figure 1 gives a schematic representation that attempts to catch the essence of AliRoot.
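
The “analysis train” can likewise be sketched in plain C++: a reader loads one event and hands it to each task wagon in turn, so the input is traversed only once. In AliRoot this is built on ROOT tasks and ESD/AOD readers; the classes below are simplified stand-ins, not the real API:

#include <cstdio>
#include <memory>
#include <vector>

struct Event { int number; };

class AnalysisTask {                       // one "wagon" of the train
public:
    virtual ~AnalysisTask() = default;
    virtual void exec(const Event& e) = 0;
};

class TrackCounter : public AnalysisTask {
public:
    void exec(const Event& e) override { printf("task A sees event %d\n", e.number); }
};

class SpectrumFiller : public AnalysisTask {
public:
    void exec(const Event& e) override { printf("task B sees event %d\n", e.number); }
};

class AnalysisTrain {
    std::vector<std::unique_ptr<AnalysisTask>> wagons_;
public:
    void add(std::unique_ptr<AnalysisTask> t) { wagons_.push_back(std::move(t)); }
    // Each event is read once and passed to every wagon before moving on.
    void run(const std::vector<Event>& input) {
        for (const Event& e : input)
            for (auto& w : wagons_) w->exec(e);
    }
};

int main() {
    AnalysisTrain train;
    train.add(std::make_unique<TrackCounter>());
    train.add(std::make_unique<SpectrumFiller>());
    train.run({{1}, {2}, {3}});
    return 0;
}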

The framework is such that the same code can be run on a local workstation, or on a parallel system enabled by the “ROOT Proof” system, where different events are dispatched to different cores, or on the Grid. A plug-in mechanism takes care of hiding the differences from the user.

The early transition to C++ and the “burn the bridge” approach encouraged (or rather compelled) several senior physicists to jump the fence and move to the new language. That the framework was there more than 10 years before data-taking began and that its principles of operation did not change during its evolution allowed several of them to become seasoned C++ programmers and AliRoot experts by the time that the detector started producing data.

AliRoot today

Today’s AliRoot retains most of the features of the original even if the code provides much more functionality and is correspondingly more complex. Comprising contributions from more than 400 authors, it is the framework within which all ALICE data are processed and analysed. The release cycle has been kept nimble. We have one update a week and one full new release of AliRoot every six months. Thanks to an efficient software-distribution scheme, the deployment of a full new version on the Grid takes as little as half a day. This has proved useful for “emergency fixes” during critical productions. A farm of “virtual” AliRoot builders is in continuous operation building the code on different combinations of operating system and compiler. Nightly builds and tests are automatically performed to assess the quality of the code and the performance parameters (memory and CPU).

The next challenge will be to adapt the code to new parallel and concurrent architectures to make the most of the performance of the modern hardware, for which we are currently exploiting only a small fraction of the potential. This will probably require a profound rethinking of the class and data structures, as well as of the algorithms. It will be the major subject of the offline upgrade that will take place in 2013 and 2014 during the LHC’s long shutdown. This challenge is made more interesting because new (and not quite compatible) architectures are continuously being produced.

An AliEn runs the Grid

Work on the Grid implementation for ALICE had to follow a different path. The effort required to develop a complete Grid system from scratch would have been prohibitive and in the Grid world there was no equivalent to ROOT that would provide a solid foundation. There was, however, plenty of open-source software with the elements necessary for building a distributed computing system that would embody major portions of the Grid “vision”.

Following the same philosophy used in the development of AliRoot, but with a different technique, we built a lightweight framework written in the Perl programming language, which linked together several tens of individual open-source components. This system used web services to create a “grid in a box” – a “shrink-wrapped” environment, called Alice Environment or AliEn – that implemented a functional Grid system, which already allowed us to run large Monte Carlo productions as early as 2002. From the beginning, the core of this system consisted of a distributed file catalogue and a workload-management system based on the “pull” mechanism, where computer centres fetch appropriate workloads from a central queue.
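
The “pull” mechanism can be illustrated with a toy scheduler: sites ask a central queue for work that fits their free resources, instead of having jobs pushed to them. AliEn itself is a Perl web-services framework; the C++ below is only a sketch of the scheduling idea, with invented names:

#include <cstdio>
#include <deque>
#include <string>
#include <utility>
#include <vector>

struct Job { int id; int coresNeeded; };

class CentralQueue {
    std::deque<Job> jobs_;
public:
    void submit(Job j) { jobs_.push_back(j); }
    // A site asks for a workload that fits its free resources ("pull" model).
    bool pull(int freeCores, Job& out) {
        for (auto it = jobs_.begin(); it != jobs_.end(); ++it) {
            if (it->coresNeeded <= freeCores) { out = *it; jobs_.erase(it); return true; }
        }
        return false;
    }
};

int main() {
    CentralQueue queue;
    queue.submit({1, 8});
    queue.submit({2, 1});
    queue.submit({3, 2});

    std::vector<std::pair<std::string, int>> sites = {{"site-A", 2}, {"site-B", 16}};
    for (auto& site : sites) {
        Job j{};
        while (queue.pull(site.second, j)) {
            printf("%s pulled job %d (%d cores)\n", site.first.c_str(), j.id, j.coresNeeded);
            site.second -= j.coresNeeded;   // crude accounting of the site's free cores
        }
    }
    return 0;
}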

AliEn was built as a metasystem from the start with the aim of presenting the user with a seamless interface while joining together the different Grid systems (a so-called overlay Grid) that harness the various resources. As AliEn could offer the complete set of services that ALICE needed from the Grid, the interface with the different systems consisted of replacing as far as possible the AliEn services with the ones of the native Grids.

This has proved to be a good principle because the Advanced Resource Connector (ARC) services of the NorduGrid collaboration are now integrated with AliEn. ALICE users transparently access three Grids (EGEE, OSG and ARC), as well as the few remaining native AliEn sites. One important step was achieved with the tight integration of AliEn with the MonALISA monitoring system, which allows large quantities of dynamic parameters related to the Grid operation to be stored and processed. This integration will continue in the direction of provisioning and scheduling Grid resources based on past and current performance, and load as recorded by MonALISA.

The AliEn Grid has also seen substantial evolution, its core components having been upgraded and replaced several times. However, the user interface has changed little. Thanks to AliEn and MonALISA, the central operation of the entire ALICE Grid takes the equivalent of only three or four full-time operators. It routinely runs complicated job chains fully automated at all times, totalling an average of 28,000 jobs in continuous execution on 80 computer centres in four continents (figure 3).

The next step

Despite the generous efforts of the funding agencies, computing resources in ALICE remain tight. To alleviate the problem and ensure that resources are used at maximum efficiency, all ALICE computing resources are pooled into AliEn. The corollary is that the Grid is the most natural place for all ALICE users to run any job that exceeds the capacity of a laptop. This has put considerable pressure on the ALICE Grid developers to provide a friendly environment, in which running even short test jobs on the Grid should be as simple and fast as running them on a personal computer. This remains the goal, but much ground has been covered in making Grid usage as transparent and efficient as possible; indeed, all ALICE analysis is performed on the Grid. Before a major conference, it is not uncommon to see more than half of the total Grid resources being used by private-analysis jobs.

The challenges ahead for the ALICE Grid are to improve the optimization tools for workload scheduling and data access, thereby increasing the ability to exploit opportunistic computing resources. The comprehensive and highly optimized monitoring tools and data provided by MonALISA are assets that have not yet been fully exploited for the predictive provisioning of resources. This is an example of a "boundary-pushing" research subject in computer science, which promises to yield urgently needed improvements to the everyday life of ALICE physicists.

It will also be important to exploit interactivity and parallelism at the level of the Grid, to improve the "time-to-solution" and to come a step closer to the original Grid vision of making a geographically distributed, heterogeneous system appear to the user like a single desktop computer. In particular, the evolution of AliRoot to exploit parallel computing architectures should be extended as seamlessly as possible from multicore and multi-CPU machines – first to different machines and then to Grid nodes. This implies an evolution of both the Grid environment and the ALICE software, which will have to be transformed to expose the intrinsic parallelism of the problem in question (event processing) at its different levels of granularity.
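Because event processing is naturally parallel, the same decomposition can in principle be applied at every level of granularity, from the cores of a single machine up to Grid nodes. The sketch below shows the idea at the multicore level only, as an illustration: the dummy reconstruction function and event structure stand in for the real AliRoot code.

# Sketch of event-level parallelism on a multicore machine (dummy workload).
from multiprocessing import Pool

def reconstruct(event):
    """Stand-in for per-event reconstruction; in reality this is AliRoot code."""
    n_tracks = sum(hit % 7 for hit in event["hits"])   # fake computation
    return {"event_id": event["id"], "n_tracks": n_tracks}

if __name__ == "__main__":
    events = [{"id": i, "hits": list(range(i, i + 100))} for i in range(1000)]
    with Pool() as pool:                                # one worker per available core
        results = pool.map(reconstruct, events, chunksize=50)
    print(len(results), "events reconstructed")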

Although it is difficult to define success for a computing project in high-energy physics, and while ALICE computing certainly offers much room for improvement, it cannot be denied that it has fulfilled its mandate of allowing the processing and analysis of the initial ALICE data. However, this should not be considered a result acquired once and for all, or one subject only to incremental improvements. Requirements from physicists are always evolving – or rather, growing qualitatively and quantitatively. While technology offers the possibility of satisfying these requirements, doing so will entail a major reshaping of ALICE’s code and Grid tools to ride the technology wave while preserving as much as possible of the users’ investment. This will be a challenging task for the ALICE computing team for years to come.

The post The ALICE computing project appeared first on CERN Courier.

LHC@home 2.0 attracts massive support https://cerncourier.com/a/lhchome-2-0-attracts-massive-support/ https://cerncourier.com/a/lhchome-2-0-attracts-massive-support/#respond Fri, 23 Sep 2011 10:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/lhchome-2-0-attracts-massive-support/ The public launch in August of a new application for CERN's volunteer-computing platform LHC@home produced an overwhelming response.

The post LHC@home 2.0 attracts massive support appeared first on CERN Courier.

The public launch in August of a new application for CERN’s volunteer-computing platform LHC@home produced an overwhelming response. The application Test4Theory, which runs Monte Carlo simulations of events in the LHC, was announced in a CERN press release on 8 August. Within three days, the number of registered volunteers swelled from a few hundred to nearly 8000. The application joins SixTrack, an accelerator beam-dynamics tool that has been used for LHC machine studies at CERN since 2004 and is now being prepared and extended in collaboration with the École polytechnique fédérale de Lausanne for studies of the LHC and its upgrade.

Given that the new application requires participants to install a virtual machine on their computer – not a trivial task – the level of enthusiasm is impressive. So, to avoid saturating the server that manages the project, there is now a waiting list for new participants. With the volunteer computing power at hand, nearly 20 billion events have already been simulated. According to CERN’s Peter Skands, the physicist leading the simulation effort, when the number of active volunteers passes 40,000 – which could happen later this year – the system will become equivalent to a true “virtual collider”, producing as many collisions per second as the real LHC.

Running part of a “virtual LHC” on their computers is clearly appealing to those who join LHC@home. The volunteers have not only dedicated a great deal of computing time to the project, but in many cases also provided expert assistance in debugging some of the software and managing the discussion forums that are part and parcel of a successful online citizen-science project.

• You can sign up to join the project at http://lhcathome.web.cern.ch/LHCathome/Physics/.

Citizen cyberscience: the new age of the amateur https://cerncourier.com/a/citizen-cyberscience-the-new-age-of-the-amateur/ https://cerncourier.com/a/citizen-cyberscience-the-new-age-of-the-amateur/#respond Fri, 26 Aug 2011 10:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/citizen-cyberscience-the-new-age-of-the-amateur/ How the audience can become part of the show in science.

The post Citizen cyberscience: the new age of the amateur appeared first on CERN Courier.


The world of journalism has been turned upside-down in recent years by social media technologies that allow a wider range of people to take part in gathering, filtering and distributing news. Although some professional journalists resisted this trend at first, most now appreciate the likes of Facebook, Twitter and blogs in expanding the sources of news and opinion and accelerating dissemination: the audience has become part of the show.

Could the internet one day wreak the same sort of social change on the world of science, breaking down the distinction between amateur and professional? In the world of high-energy physics, that might seem unlikely. What amateur can really contribute something substantial to, say, the analysis of LHC data? Yet in many fields of science, the scope for amateur contributions is growing fast.

Modern astronomy, for example, has a long tradition of inspired amateur contributions, such as spotting comets or supernovae. Now, the internet has broadened the range of tasks that amateurs can tackle. For example, the project GalaxyZoo, led by researchers at the University of Oxford, invites volunteers to participate in web-based classification of galaxy images. Such pattern recognition is a task where the human mind still tends to outperform computer algorithms.

Not only can astronomers attract hundreds of thousands of free and eager assistants this way, but occasionally those helpers can themselves make interesting discoveries. This was the case for a Dutch school teacher, Hanny van Arkel, who spotted a strange object in one of the GalaxyZoo images that had stumped even the professional astronomers. It now bears the name “Hanny’s Voorwerp”, the second word meaning “object” in Dutch.

GalaxyZoo is just one of many volunteer-based projects making waves in astronomy. Projects such as Stardust@home, Planet Hunters, Solar Watch and MilkyWay@home all contribute to cutting-edge research. The Einstein@home project uses volunteer computing power to search for – among other things – pulsar signals in radio-astronomy data. Run by researchers at the Max-Planck Institute for Gravitational Research, the project published its first discoveries in Science last year, acknowledging the names of the volunteers whose computers had made each discovery.

Crowdsourcing research

However, it is in fields outside those traditionally accessible to amateurs that some of the most impressive results of citizen-powered science are beginning to be felt. Consider the computer game FoldIt, in which players compete to fold protein molecules into their lowest-energy configuration. Humans routinely outperform computers at this task because the human mind is uniquely adept at such spatial puzzles, and teenagers typically out-compete trained biochemists. What the scientists behind the FoldIt project, based at the University of Washington, have also discovered is that the players were spontaneously collaborating to explore new folding strategies – a possibility the researchers had not anticipated. In other words, the amateur protein folders were initiating their own research programme.


Could high-energy physics also benefit from this type of approach? Peter Skands, a theorist at CERN, thinks so. He has been working with colleagues on a project about fitting models to LHC data, where delicate tuning of the model parameters by eye can help the physicists achieve the best overall fit. Experience with a high-school intern convinced Skands that even people not versed in the gory details of LHC physics could solve this highly visual problem efficiently.

Volunteers can already contribute their processor time to another project that Skands is involved in – simulating collisions in the LHC for the recently launched LHC@Home 2.0 project, where 200 volunteers have already simulated more than 5 billion collision events. Such volunteer computing projects, like Einstein@Home, are not as passive as they might appear. Many of the volunteers have spent countless hours helping developers in the early alpha-test stages of the project by providing detailed bug reports. Message boards and a credit system for the amount of processing completed – features provided by an open-source platform called BOINC – add elements of social networking and gaming to the project.

The LHC@Home 2.0 project also relies on CernVM, a virtual machine technology developed at CERN that enables complex simulation code to run easily on the diverse platforms provided by volunteers. Running fully fledged physics simulations for the LHC on home computers – a prospect that seemed technically impossible when the first LHC@home project was introduced in 2004 to simulate proton-beam stability in the LHC ring – now has the potential to expand significantly the computing resources for the LHC experiments. Projects like LHC@home typically draw tens of thousands of volunteers and their computers, a significant fraction of the estimated 250,000 processor cores currently supporting the four LHC experiments.

A humanitarian angle

LHC@home 2.0 is an example of a project that has benefited from the support of the Citizen Cyberscience Centre (CCC), which was set up in 2009 in partnership between CERN, the UN Institute of Training and Research and the University of Geneva. A major objective of the CCC is to promote volunteer computing and volunteer thinking for researchers in developing regions, because this approach effectively provides huge resources to scientists at next to no cost. Such resources can also be used to tackle pressing humanitarian and development challenges.

One example is the project Computing for Clean Water, led by researchers at Tsinghua University in Beijing. The project was initiated by the CCC with the sponsorship of a philanthropic programme run by IBM, called World Community Grid. The goal is to simulate how water flows through carbon nanotubes and explore the use of arrays of nanotubes for low-cost water filtration and desalination. The simulations would require thousands of years on a typical university computing cluster but can be done in just months using volunteer-computing resources aggregated through World Community Grid.

Another example is volunteer mapping for UNOSAT, the operational satellite-applications programme for UNITAR, which is based at CERN. Although a range of crowd-based mapping techniques are available these days, the use of satellite images to assess accurately the extent of damage in regions devastated by war or natural disasters is not trivial, even for experts. However, rapid and accurate assessment is vital for humanitarian purposes in estimating reconstruction costs and rapid mobilization of the international community and NGOs.


With the help of researchers at the University of Geneva and HP Labs in Palo Alto, UNOSAT is testing new approaches in crowdsourcing damage assessment by volunteers. These involve using statistical approaches to improve accuracy, as well as models inspired by economics where volunteers can vote on the quality of others’ results.
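One simple statistical approach to combining many volunteer assessments of the same image tile is a vote weighted by each volunteer’s past agreement with expert analysts. The snippet below is a generic sketch of that idea only; it is not the method under test by UNOSAT and its partners, and the names, labels and weights are invented for illustration.

# Generic sketch of weighted voting over volunteer damage labels (illustrative).
from collections import defaultdict

def aggregate(votes, reliability):
    """votes: list of (volunteer, label); reliability: volunteer -> weight in [0, 1]."""
    score = defaultdict(float)
    for volunteer, label in votes:
        score[label] += reliability.get(volunteer, 0.5)   # unknown volunteers get a neutral weight
    return max(score, key=score.get)

votes = [("anna", "destroyed"), ("ben", "damaged"), ("chris", "destroyed"), ("dee", "intact")]
reliability = {"anna": 0.9, "ben": 0.6, "chris": 0.8, "dee": 0.4}
print(aggregate(votes, reliability))   # -> "destroyed"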

There are hundreds of citizen-cyberscience projects engaging millions of volunteers but the vast majority supports researchers in industrialized countries. A large part of the CCC activities involve raising awareness in developing regions. With the support of the Shuttleworth Foundation in South Africa, the CCC has been organizing a series of “hackfests”: two-day events where scientists, software developers and citizen enthusiasts meet to build prototypes of new citizen-based projects, which the scientists can then go on to refine. Hackfests have already taken place in Beijing, Taipei, Rio de Janeiro and Berlin, with more planned this year in South Africa and India.

The topics covered to date include: using mobile-phone Bluetooth signals as a proxy for bacteria to track how airborne bacterial diseases such as tuberculosis spread in buildings, monitoring earthquakes using the motion sensors built into laptop computers, and digitizing tables of economics data from government archives. Because the "end-users" – the citizen volunteers themselves – participate in the events, there is a healthy focus on making projects as accessible and attractive as possible, so that even more volunteers sign up and stay active.

At such events, when asked what sort of rewards the most engaged volunteers might appreciate for their online efforts, one striking response – echoed on several occasions – is the opportunity to make a suggestion to the scientists for the course of their future research. In other words, there is a desire on behalf of volunteers to be involved more actively in the process that defines what science gets done. The volunteers who propose this are quite humble in their expectations – they understand that not every idea they have will be useful or feasible. Whether scientists will reject this sort of offer of advice as unwanted interference, or embrace the potentially much larger brainpower that informed amateurs could provide, remains to be seen. But the sentiment is clear: in science, as in journalism, the audience wants to be part of the show.

Hardware joins the open movement https://cerncourier.com/a/hardware-joins-the-open-movement/ https://cerncourier.com/a/hardware-joins-the-open-movement/#respond Mon, 06 Jun 2011 10:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/hardware-joins-the-open-movement/ A new initiative facilitates collaboration in hardware design.

The post Hardware joins the open movement appeared first on CERN Courier.


“Designing in an open environment is definitely more fun than doing it in isolation, and we firmly believe that having fun results in better hardware.” It is hard to deny that enthusiasm is inspiring and that it can be one of the factors in the success of any enterprise. The statement comes from the Manifesto of the Open Hardware Repository (OHR), which is defined by its creators as a place on the web where electronics designers can collaborate on open-hardware designs, much in the philosophy of the movement for open-source software. Of course, there is more to this than the importance of enthusiasm. Feedback from peers, design reuse and better collaboration with industry are also among the important advantages to working in an open environment.

The OHR was the initiative of electronics designers working in experimental-physics laboratories who felt the need to enable knowledge-exchange across a wide community and in line with the ideals of “open science” being fostered by organizations such as CERN. “For us, the drive towards open hardware was largely motivated by well meaning envy of our colleagues who develop Linux device-drivers,” says Javier Serrano, an engineer at CERN’s Beams Department and the founder of the OHR. “They are part of a very large community of competent designers who share their knowledge and time in order to come up with the best possible operating system. They learn a lot and have lots of fun in the process. This enables them to provide better drivers faster to our CERN clients,” he continues. “We wanted that, and found out that there was no intrinsic reason why hardware development should be any different. After all, we all work with computers and the products of our efforts are also binary files, which later become pieces of hardware.”

One of the main factors leading to the creation of the OHR was the wish to avoid duplication by simply sharing results across different teams that might be working simultaneously towards the solution of the same problem. Sharing the achievements of each researcher in the repository also results in an improved quality of work. “Sharing design effort with other people has forced us to be better in a number of areas,” states Serrano. “You can’t share without a proper preliminary specification-phase and good documentation. You also can’t share if you design a monolithic solution rather than a modular one from which you and others can pick bits and pieces to use in other projects. The first time somebody comes and takes a critical look at your project it feels a bit awkward, but then you realize how much great talent there is out there and how these people can help, especially in areas that are not your main domain of competence.”

Under licence


Two years after its creation, the OHR currently hosts more than 40 projects from institutes that include CERN, GSI and the University of Cape Town. Such a wealth of knowledge in electronics design can now be shared under the newly published CERN Open Hardware Licence (OHL), which was released in March and is available on the OHR. “In the spirit of knowledge sharing and dissemination, this licence governs the use, copying, modification and distribution of hardware design documentation, and the manufacture and distribution of products,” explains Myriam Ayass, legal adviser of the Knowledge and Technology Transfer Group at CERN and author of the CERN OHL. The documentation that the OHL refers to includes schematic diagrams, designs, circuit or circuit-board layouts, mechanical drawings, flow charts and descriptive texts, as well as other explanatory material. The documentation can be in any medium, including – but not limited to – computer files and representations on paper, film, or other media.

The introduction of the CERN OHL is indeed a novelty in which the long-standing practice of sharing hardware design has adopted a clear policy for the management of intellectual property. “The CERN–OHL is to hardware what the General Public Licence is to software. It defines the conditions under which a licensee will be able to use or modify the licensed material,” explains Ayass. “The concept of ‘open-source hardware’ or ‘open hardware’ is not yet as well known or widespread as the free software or open-source software concept,” she continues. “However, it shares the same principles: anyone should be able to see the source (the design documentation in case of hardware), study it, modify it and share it. In addition, if modifications are made and distributed, it must be under the same licence conditions – this is the ‘persistent’ nature of the licence, which ensures that the whole community will continue benefiting from improvements, in the sense that everyone will in turn be able to make modifications to these improvements.”


Despite these similarities, the application of “openness” in the two domains – software and hardware – differs substantially because of the nature of the “products”. “In the case of hardware, physical resources must be committed for the creation of physical devices,” Ayass points out. “The CERN OHL thus specifically states that manufacturers of such products should not imply any kind of endorsement or responsibility on the part of the designer(s) when producing and/or selling hardware based on the design documents. This is important in terms of legal risks associated with engaging in open-source hardware, and properly regulating this is a prerequisite for many of those involved.”

The OHR also aims to promote a new business model in which companies can play a variety of roles, design open hardware in collaboration with other designers or clients and get paid for that work. As Serrano explains: “Companies can also commercialize the resulting designs, either on their own or as part of larger systems. Customers, on their side, can debug designs and improve them very efficiently, ultimately benefiting not only their own systems but also the companies and other clients.”

“The fact that the designs are ‘open’ also means that anyone can manufacture the product based on this design – from individuals to research institutes to big companies – and commercialize it. This is one approach of technology transfer that nicely combines dissemination of the technology and of the accompanying knowledge,” adds Ayass. This combining of an innovative business model and the OHL is finding a positive response in the commercial world. “We are very excited because we are proving that there is no contradiction between commercial hardware and openness,” says Serrano, who concludes: “The CERN OHL will be a great tool for us to collaborate with other institutes and companies.”

• For more about the OHR see www.ohwr.org. For more about the CERN OHL, see www.ohwr.org/cernohl.

Computing conference goes to Taipei https://cerncourier.com/a/computing-conference-goes-to-taipei/ https://cerncourier.com/a/computing-conference-goes-to-taipei/#respond Wed, 30 Mar 2011 09:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/computing-conference-goes-to-taipei/ The first experience with data from the LHC beats out the rhythm for CHEP 2010.

The post Computing conference goes to Taipei appeared first on CERN Courier.


The conferences on Computing in High Energy and Nuclear Physics (CHEP), which are held approximately every 18 months, reached their silver jubilee with CHEP 2010, held at the Academia Sinica Grid Computing Centre (ASGC) in Taipei in October. ASGC is the LHC Computing Grid (LCG) Tier 1 site for Asia and the organizers are experienced in hosting large conferences. Their expertise was demonstrated again throughout the week-long meeting, drawing almost 500 participants from more than 30 countries, including 25 students sponsored by CERN’s Marie Curie Initial Training Network for Data Acquisition, Electronics and Optoelectronics for LHC Experiments (ACEOLE).

Appropriately, given the subsequent preponderance of LHC-related talks, the LCG project leader, Ian Bird of CERN, gave the opening plenary talk. He described the status of the LCG, how it got there and where it may go next, and presented some measures of its success. The CERN Tier 0 centre moves some 1 PB of data a day, in- and out-flows combined; it writes around 70 tapes a day; the worldwide grid supports some 1 million jobs a day; and it is used by more than 2000 physicists for analysis. Bird was particularly proud of the growth in service reliability, which he attributed to many years of preparation and testing. For the future, he believes that the LCG community needs to be concerned with sustainability, data issues and changing technologies. The status of the LHC experiments’ offline systems was summarized by Roger Jones of Lancaster University. He stated that the first year of operations had been a great success, as presentations at the International Conference on High Energy Physics in Paris had indicated. He paid tribute to CERN’s support of the Tier 0 and remarked that data distribution had been smooth.

In the clouds

As expected, there were many talks about cloud computing, including several plenary talks on general aspects, as well as technical presentations on practical experiences and tests or evaluations of the possible use of cloud computing in high-energy physics. It is sometimes difficult to separate hype from initiatives with definite potential but it is clear that clouds will find a place in high-energy physics computing, probably based more on private clouds rather than on the well known commercial offerings.


Harvey Newman of Caltech described a new generation of high-energy physics networking and computing models. As the available bandwidth continues to grow exponentially in capacity, LHC experiments are increasingly benefiting from it – to the extent that experiment models are being modified to make more use of pulling data to a job rather than pushing jobs towards the data. A recently formed working group is gathering new network requirements for future networking at LCG sites.

Lucas Taylor of Fermilab addressed the issue of public communications in high-energy physics. Recent LHC milestones have attracted massive media interest and Taylor stated that the LHC community simply has no choice other than to be open, and welcome the attention. The community therefore needs a coherent policy, clear messages and open engagement with traditional media (TV, radio, press) as well as with new media (Web 2.0, Twitter, Facebook, etc.). He noted major video-production efforts undertaken by the experiments, for example ATLAS-Live and CMS TV, and encouraged the audience to contribute where possible – write a blog or an article for publication, offer a tour or a public lecture and help build relationships with the media.

There was an interesting presentation of the Facility for Antiproton and Ion Research (FAIR) being built at GSI, Darmstadt. Construction will start next year and switch-on is scheduled for 2018. Two of the planned experiments are the size of ALICE or LHCb, with similar data rates expected. Triggering is a particular problem and data acquisition will have to rely on event filtering, so online farms will have to be several orders of magnitude larger than at the LHC (10,000 to 100,000 cores). This is a major area of current research.

David South of DESY, speaking on behalf of the Study Group for Data Preservation and Long-term Analysis in High-Energy Physics set up by the International Committee for Future Accelerators, presented what is probably the most serious effort yet for data preservation in high-energy physics. The question is: what to do with data after the end of an experiment? With few exceptions, data from an experiment are often stored somewhere until eventually they are lost or destroyed. He presented some reasons why preservation is desirable but needs to be properly planned. Some important aspects include the technology used for storage (should it follow storage trends, migrating from one media format to the next?), as well as the choice of which data to store. Going beyond the raw data, this must also include software, documentation and publications, metadata (logbooks, wikis, messages, etc.) and – the most difficult aspect – people’s expertise.

Although some traditional plenary time had been scheduled for additional parallel sessions, there were still far too many submissions to be given as oral presentations. So, almost 200 submissions were scheduled as posters, which were displayed in two batches of 100 each over two days. The morning coffee breaks were extended to permit attendees to view them and interact with authors. There were also two so-called Birds of a Feather sessions on LCG Operations and LCG Service Co-ordination, which allowed the audience to discuss aspects of the LCG service in an informal manner.

The parallel stream on Online Computing was, of course, dominated by LHC data acquisition (DAQ). The DAQ systems for all experiments are working well, leading to fast production of physics results. Talks on event processing provided evidence of the benefits of solid preparation and testing; simulation studies have proved to provide an amazingly accurate description of LHC data. Both the ATLAS and CMS collaborations report success with prompt processing at the LCG Tier 0 at CERN. New experiments, for example at FAIR, should take advantage of the experiment frameworks used currently by all of the LHC experiments, although the analysis challenges of the FAIR experiments exceed those of the LHC. There was also a word of caution – reconstruction works well today but how will it cope with increasing event pile-up in the future?

Presentations in the software engineering, data storage and databases stream covered a heterogeneous range of subjects, from quality assurance and performance monitoring to databases, software re-cycling and data preservation. Once again, the conclusion was that the software frameworks for the LHC are in good shape and that other experiments should be able to benefit from this.

The most popular parallel stream of talks was dedicated to distributed processing and analysis. A main theme was the successful processing and analysis of data in a distributed environment, dominated, of course, by the LHC. The message here is positive: the computing models are mainly performing as expected. The success of the experiments relies on the success of the Grid services and the sites but the hardest problems take far longer to solve than foreseen in the targeted service levels. The other two main themes were architecture for future facilities such as FAIR, the Belle II experiment, at the SuperKEKB upgrade in Japan, and the SuperB project in Italy; and improvements in infrastructure and services for distributed computing. The new projects are using a tier structure, but apparently with one layer fewer than in the LCG. Two new, non-high-energy-physics projects – the Fermi gamma-ray telescope and the Joint Dark Energy Mission – seem not to use Grid-like schemes.

Tools that work

The message from the computing fabrics and networking stream was that "hardware is not reliable, commodity or otherwise"; this statement from Bird’s opening plenary was illustrated in several talks. Deployments of upgrades, patches and new services are slow – another quote from Bird. Several talks showed that the community has the mechanisms, so perhaps the problem lies in communication rather than in the technology. Storage is certainly an issue and a great deal of work is going on in this area, as shown in several talks and posters. However, the various tools available today have proved that they work: via the LCG, the experiments have stored and made accessible the first months of LHC data. This stream included many talks and posters on different aspects and uses of virtualization. It was also shown that 40 Gbit and 100 Gbit networks are a reality: network bandwidth is there, but the community must expect to have to pay for it.


Compared with previous CHEP conferences, there was a shift in the Grid and cloud middleware sessions. These showed that pilot jobs are fully established, virtualization is entering serious large-scale production use and there are more cloud models than before. A number of monitoring and information-system tools were presented, as well as work on data management. Various aspects of security were also covered. Regarding clouds, although the STAR collaboration at the Relativistic Heavy Ion Collider at Brookhaven reported impressive production experience and there were a few examples of successful uses of Amazon EC2 clouds, other initiatives are still at the starting gate and some may not get much further. There was a particularly interesting example linking CernVM and BOINC. It was in this stream that one of the more memorable quotes of the week came from Rob Quick of Fermilab: "There is no substitute for experience."

The final parallel stream covered collaborative tools, with two sessions. The first was dedicated to outreach (Web 2.0, ATLAS Live and CMS Worldwide) and new initiatives (Inspire); the second to tools (ATLAS Glance information system, EVO, Lecture archival scheme).

• The next CHEP will be held on 21–25 May, 2012, hosted by Brookhaven National Laboratory, at the NYU campus in Greenwich Village, New York, see www.chep2012.org/.

EUAsiaGrid discovers opportunity in diversity https://cerncourier.com/a/euasiagrid-discovers-opportunity-in-diversity/ https://cerncourier.com/a/euasiagrid-discovers-opportunity-in-diversity/#respond Mon, 07 Jun 2010 08:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/euasiagrid-discovers-opportunity-in-diversity/ A look at the unique aspects of a project to promote awareness of Grid computing.

The post EUAsiaGrid discovers opportunity in diversity appeared first on CERN Courier.


More than half of the world’s people live in Asia. Even putting aside the two titans India and China, there are some 600 million inhabitants – 100 million more than in the entire EU – in the region that is commonly referred to as South-East Asia. From Myanmar in the west to Indonesia’s Papua province in the east, the territory is nearly twice the width of the continental US. Most of the Asian partners in EUAsiaGrid hail from this region, which has more than its fair share of natural disasters in the form of earthquakes, volcano eruptions, typhoons and tsunamis, not to mention enduring political tensions.

Despite these challenging circumstances, EUAsiaGrid has managed to make a significant impact in a relatively short time. This has been driven by increased sharing of data storage and processing power between participating institutions in the region. It was achieved through a concerted effort by the project leaders to encourage the adoption across the region of the gLite middleware of Enabling Grids for E-sciencE (EGEE), which is the same middleware used by the Worldwide LHC Computing Grid (WLCG).

As the head of EUAsiaGrid, Marco Paganoni, who is based at INFN and the University of Milan-Bicocca, points out: “This technological push has enabled researchers in some of the participating countries to become involved in international science initiatives that they otherwise might not be able to afford to participate in.”


Like many other international Grid projects, EUAsiaGrid owes its origins to the pioneering efforts of the global high-energy physics community to promote Grid technology for science, and to the nurturing role of the European Commission in spreading Grid technical know-how throughout the world through joint projects. In addition, a key catalyst for EUAsiaGrid has been Simon Lin, project director of Academia Sinica Grid Computing (ASGC). His efforts established ASGC as the Asian Tier-1 data centre for WLCG. He and his team have been bringing Asian researchers together for nine years at the annual International Symposium on Grid Computing (ISGC) held each spring in Taipei.

The EUAsiaGrid project, launched as a “support action” by the European Commission within Framework Programme 7 in April 2008, focuses on discovering regional research benefits for Grid computing. “We realized that identifying and addressing local needs was the key to success in this region,” says Paganoni. From the outset, capturing local e-science requirements was an important component of the project’s objectives. Moreover, comparing those requirements revealed a great deal of common ground amid all of the regional diversity.

Earth-shaking experience

One common theme was the region’s propensity for natural disasters and the ability of Grid technology and related information technology solutions to help mitigate the consequences of such events. For example, EUAsiaGrid researchers have helped build links between different national sensor-networks, such as those of Vietnam and Indonesia. Researchers in the Philippines are now benefiting from the Grid-based seismic modelling experience of their Taiwanese partners. Sharing data and Grid know-how in this manner means that the scientists involved can better tune local models of earthquake and tsunami propagation.

At the most recent ISGC, held in March, a special EUAsiaGrid Disaster Mitigation Workshop devoted a day to the latest technological progress in monitoring and simulating both earthquakes and tsunamis. In a talk about Taiwan’s early-warning system, Nai-Chi Hsiao of the Central Weather Bureau in Taipei explained that it takes just 60 s for seismic waves from an earthquake in the south to reach the north of the island, leaving precious little time to decide whether to shut down nuclear reactors or bring high-speed trains to a halt and so avoid the worst consequences that a large earthquake might cause.

Where could Grid technology fit into this picture? The island is rocked by earthquakes, both large and small, all of the time. It is simply not viable to shut down power plants and stop trains every time that a tremor is detected. What is needed is a quick prediction of the impact that a particular earthquake may have on key infrastructure across the island. However, the level of shaking that an earthquake produces 100 km away can depend strongly on, for example, the depth at which it occurs.

There is certainly no time to do a full simulation once an earthquake is detected. According to Li Zhao of the Institute of Earth Sciences at Academia Sinica, it might instead be possible to pull out a pre-processed simulation from a database and make a quick decision based on what it predicts. This would require processing and storing the results of simulations for a huge number of possible earthquake epicentres – a task that is well suited to Grid computing.
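The approach Zhao describes amounts to a nearest-scenario lookup: ground-shaking predictions are computed offline on the Grid for a dense set of possible epicentres and depths, and at alarm time the closest precomputed case is simply retrieved. The sketch below illustrates only that lookup step; the scenario parameters, shaking values and decision threshold are invented for the example.

# Sketch of a nearest-scenario lookup for earthquake early warning (illustrative).
import math

# Hypothetical precomputed scenarios: (latitude, longitude, depth in km) mapped to the
# predicted peak ground acceleration (in units of g) at one critical site.
SCENARIOS = {
    (23.5, 121.0, 10.0): 0.25,
    (23.5, 121.0, 40.0): 0.08,
    (24.0, 121.5, 20.0): 0.15,
}

def predicted_shaking(lat, lon, depth_km):
    """Return the shaking of the closest precomputed scenario (no simulation at alarm time)."""
    nearest = min(SCENARIOS, key=lambda s: math.dist(s, (lat, lon, depth_km)))
    return SCENARIOS[nearest]

# Decision in well under a second: a database lookup and a threshold comparison.
if predicted_shaking(23.6, 121.1, 12.0) > 0.2:
    print("issue shutdown warning")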

Neglected diseases

Another common thread of the research sponsored by EUAsiaGrid has been searching for cures to diseases that plague the region but which have been largely neglected by pharmaceuticals companies because they do not affect more lucrative markets in the industrialized world.

Consider dengue fever, for example. For most sufferers, the fever and pain produced by the disease pass after a very unpleasant week, but for some it leads to dengue haemorrhagic fever, which is often fatal. Like malaria, dengue is borne by mosquitoes. But unlike malaria, it affects people as much in the cities as it does in the countryside. As a result, it has a particularly high incidence in heavily populated parts of South-East Asia and it is a significant source of infant mortality in several countries.

As yet there are no drugs designed to specifically target the dengue virus. So EUAsiaGrid partners launched an initiative last July called Dengue Fever Drug Discovery, which will start a systematic search for such drugs by harnessing Grid computing to model how huge databases of chemical compounds would interact with key sites on the dengue virus, potentially disabling it.

This is not the first time that Grid technology has been used to amplify the computing power that can be harnessed for such ambitious challenges. Malaria and avian influenza have been targets of previous massive search efforts, dubbed by experts “in-silico high-throughput screening”.

Leading the effort on dengue at Academia Sinica in Taipei is researcher Ying-Ta Wu of the Genomics Research Centre. He and colleagues prepared some 300,000 virtual compounds to be tested in a couple of months, using the equivalent of more than 12 years of the processing power of a single PC. The goal of this exercise was not just to get the processing done quickly but also to encourage partners in Asia to collaborate on sharing the necessary hardware, including institutes in Malaysia, Vietnam and Thailand.
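In-silico screening on this scale is naturally carved into independent Grid jobs, each docking a slice of the compound library against the same target site. The sketch below shows only that bookkeeping step; the chunk size, target name and output file names are assumptions made for the example, and the actual docking is performed by dedicated software on the worker nodes.

# Sketch of splitting a virtual-screening campaign into independent Grid jobs (illustrative).
def make_jobs(n_compounds, chunk, target="dengue-target-site"):
    """Yield one job description per slice of the compound library."""
    for start in range(0, n_compounds, chunk):
        yield {
            "target": target,                                    # hypothetical target label
            "compounds": (start, min(start + chunk, n_compounds)),
            "output": f"docking_{start:06d}.results",
        }

jobs = list(make_jobs(n_compounds=300_000, chunk=500))
print(len(jobs), "independent jobs")   # -> 600 jobs, each runnable on any participating site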

It is not just hard sciences such as geology and biology that benefit from Grid know-how. Indeed, as Paganoni notes: “Modelling the social and economic impacts of major disasters and diseases is a Grid-computing challenge in itself, and is often top of the agenda when EUAsiaGrid researchers have discussions with government representatives in the region.”

Even the humanities have benefited from these efforts. Capturing culture in a digital form can lead to impressive demands for storage and processing. Grid technology has a role to play in providing those resources. For instance, it can take more than a week using a single desktop computer to render a 10-minute recording of the movements of a Malay dancer performing the classical Mak Yong dance into a virtual 3D image of the dancer, using motion-capture equipment attached to the dancer’s body. Once this is done, though, every detail of the dance movement is permanently digitized, and hence preserved for posterity, as well as being available for “edutainment” applications.

The problem, however, is that a complete Mak Yong dance carried out for ceremonial purposes could last a whole night, not just 10 minutes. Rendering and storing all of the data necessary for this calls for Grid computing.
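A rough scaling estimate makes the point concrete. Assuming, purely for illustration, an eight-hour ceremonial performance and the week of desktop rendering per ten minutes of dance quoted above:

# Rough scaling estimate (assumption: an eight-hour ceremony; one week of desktop
# rendering per 10 minutes of captured dance, as quoted in the text).
minutes_in_ceremony = 8 * 60
weeks_on_one_desktop = minutes_in_ceremony / 10
print(weeks_on_one_desktop)   # ~48 weeks on a single machine, hence the case for the Grid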

Faridah Noor, an associate professor at the University of Malaya, became involved in the EUAsiaGrid project because she saw great potential for Grid-enabled digital preservation of traditional dances and artefacts for posterity. She and her colleagues are working on several projects to capture and preserve digitally even the most ephemeral cultural relics, such as masks carved by shamans of the Mah Meri tribe used to help cure people of their ailments or to ward off evil. The particular challenge here is that the shamans deliberately throw the masks into the sea as part of the ritual, to cast away bad spirits.

As Noor, who works in the area of sociolinguistics and ethnolinguistics, points out: “We have to capture the story behind the mask.” Each mask is made for an individual and his or her illness, so capturing the inspiration that guides the shaman while preparing the mask is as important as recording the way in which he carves the wood, and rendering 3D images of the resulting mask.

An important legacy of the EUAsiaGrid project, Paganoni says, will be the links that it has helped to establish between researchers in the natural sciences, the social sciences and the humanities, both within South-East Asia and with European institutions. These links trace their origin to a common interest in exploiting Grid technology.

 

• Based on articles previously published in International Science Grid This Week, with permission.

Particle physics INSPIREs information retrieval https://cerncourier.com/a/particle-physics-inspires-information-retrieval/ https://cerncourier.com/a/particle-physics-inspires-information-retrieval/#respond Wed, 31 Mar 2010 07:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/particle-physics-inspires-information-retrieval/ A look at a new service for state-of-the-art information management.

The post Particle physics INSPIREs information retrieval appeared first on CERN Courier.


Particle physicists thrive on information. They first create information by performing experiments or elaborating theoretical conjectures. Then they convey it to their peers by writing papers that are disseminated in a preprint form long before publication. Keeping track of this information has long been the task of libraries at the larger laboratories, such as at CERN, DESY, Fermilab and SLAC, as well as being the focus of indispensable services including arXiv and those of the Particle Data Group.

It is common knowledge that the web was born at CERN, and every particle physicist knows about SPIRES, the place where they can find papers, citations and information about colleagues. However, not everyone knows that the first US web server and the first database on the web came about at SLAC with just one aim: to bring scientific information to the fingertips of particle physicists through the SPIRES platform. SPIRES was hailed as the first "killer" application of the then-nascent web.

No matter how venerable, the information tools currently serving particle physicists no longer live up to expectations, and information-management tools used elsewhere have been catching up with those of the high-energy-physics community. The soon-to-be-released INSPIRE service will bring state-of-the-art information retrieval to the fingertips of researchers in high-energy physics once more, not only enabling more efficient searching but also paving the way for modern technologies and techniques to augment the tried-and-tested tools of the trade.

Meeting demand

The INSPIRE project involves information specialists from CERN, DESY, Fermilab and SLAC working in close collaboration with arXiv, the Particle Data Group and publishers within the field of particle physics. “We separate the work such that we don’t duplicate things. Having one common corpus that everyone is working on allows us to improve remarkably the quality of the end product,” explains Tim Smith, head of the User and Document Services Group in the IT Department at CERN, which is providing the Invenio technology that lies at the core of INSPIRE.

In 2007, many providers of information in the field came together for a summit at SLAC to see how physics-information resources could be enhanced. The INSPIRE project emerged from that meeting, and the vision behind it was built on a survey launched by the four labs to evaluate the real needs of the community (Gentil-Beccot et al. 2008). A large number of physicists replied enthusiastically, many writing reams of detail in the free-text boxes provided. The bulk of the respondents noted that the SPIRES and arXiv services were together the dominant resources in the field. However, they pointed out that SPIRES in particular was "too slow" or "too arcane" to meet their current needs.

INSPIRE responds to this directive from the community by combining the most successful aspects of SPIRES (a joint project of DESY, Fermilab and SLAC) with the modern technology of Invenio (the CERN open-source digital-library software). “SPIRES’ underlying software was overdue for replacement, and adopting Invenio has given INSPIRE the opportunity to reproduce SPIRES’ functionality using current technology,” says Travis Brooks, manager of the SPIRES databases at SLAC. The name of the service, with the “IN” from Invenio augmenting SPIRES’ familiar name, underscores this beneficial partnership. “It reflects the fact that this is an evolution from SPIRES because the SPIRES service is very much appreciated by a large community of physicists. It is a sort of brand in the field,” says Jens Vigen, head of the Scientific Information Group at CERN.

However, INSPIRE takes its own inspiration from more than just SPIRES and Invenio. In searching for a paper, INSPIRE will not only fully understand the search syntax of SPIRES, but will also support free-text searches like those in Google. “From the replies we received to the survey, we could observe that young people prefer to just throw a text string in a field and push the search button, as happens in Google,” notes Brooks.
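A service that accepts both the old SPIRES search syntax and Google-style free text has to decide, query by query, which grammar to apply. The fragment below is a deliberately simplified sketch of such a dispatch step, not Invenio’s actual parser; the example queries merely use the familiar SPIRES "find …" prefix.

# Simplified sketch of dispatching between SPIRES-style and free-text queries (illustrative).
import re

SPIRES_PREFIX = re.compile(r"^\s*find\s+", re.IGNORECASE)

def classify(query):
    """Very rough heuristic: SPIRES queries traditionally start with 'find'."""
    return "spires" if SPIRES_PREFIX.match(query) else "free-text"

for q in ["find a ellis and t quark mass", "quark mass ellis 2009"]:
    print(f"{q!r:40} -> {classify(q)} parser")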

This service will facilitate the work of the large community of particle physicists. "Even more exciting is that after releasing the initial INSPIRE service, we will be releasing many new features built on top of the modern platform," says Zaven Akopov of the DESY library. INSPIRE will enable authors and readers to help catalogue and sort material so that everyone will find the most relevant material quickly and easily. INSPIRE will also be able to store files associated with documents, including the full text of older or "orphaned" preprints. Stephen Parke, senior scientist at the Fermilab Theory Department, looks forward to these enhancements: "INSPIRE will be a fabulous service to the high-energy-physics community. Not only will you be able to do faster, more flexible searching but there is a real need to archive all conference slides and the full text of PhD theses; INSPIRE is just what the community needs at this time."


Pilot users see INSPIRE already rising to meet these expectations, as remarked on by Tony Thomas, director of the Australian Research Council Special Research Centre for the Structure of Matter: “I tried the alpha version of INSPIRE and was amazed by how rapidly it responded to even quite long and complex requests.”

The Invenio software that underlies INSPIRE is a collaborative tool developed at CERN for managing large digital libraries. It is already inspiring many other institutes around the world. In particular, the Astrophysics Data System (ADS) – the digital library run by the Harvard-Smithsonian Center for Astrophysics for NASA – recently chose Invenio as the new technology to manage its collection. “We can imagine all sorts of possible synergies here,” Brooks anticipates. “ADS is a resource very much like SPIRES, but focusing on the astronomy/astrophysics and increasingly astroparticle community, and since our two fields have begun to do a lot of interdisciplinary work the tighter collaboration between these resources will benefit both user communities.”

Invenio is also being used by many other institutes around the world and many more are considering it. “In the true spirit of CERN, Invenio is an open-source product and thus it is made available under the GNU General Public Licence,” explains Smith. “At CERN, Invenio currently manages about a million records. There aren’t that many products that can actually handle so many records,” he adds.

Invenio has at the same time broadened its scope to include all sorts of digital records, including photos, videos and recordings of presentations. It makes use of a versatile interface that makes it possible, for example, to have the site available in 20 languages. Invenio’s expandability is being exploited to the full for the INSPIRE project where a rich set of back-office tools are being developed for cataloguers. “These tools will greatly ease the manual tasks, thereby allowing us to get papers faster and more accurately into INSPIRE,” explains Heath O’Connell from the Fermilab library. “This will increase the search accuracy for users. Furthermore, with the advanced Web 2.0 features of INSPIRE, users will have a simpler, more powerful way to submit additions, corrections and updates, which will be processed almost in real time”.

Researchers in high-energy physics were once the beneficiaries of world-leading information management. Now INSPIRE, anchored by the Invenio software, aims once again to give the community a world-class solution to its information needs. The future is rich with possibilities, from interactive PDF documents to exciting new opportunities for mining this wealth of bibliographic data, enabling sophisticated analyses of citations and other information. The conclusion is easy: if you are a physicist, just let yourself be INSPIREd!

• The INSPIRE service is available at http://inspirebeta.net/.

Working for the world: UNOSAT and CERN https://cerncourier.com/a/working-for-the-world-unosat-and-cern/ https://cerncourier.com/a/working-for-the-world-unosat-and-cern/#respond Wed, 30 Sep 2009 07:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/working-for-the-world-unosat-and-cern/ How a UN programme benefits from being hosted at CERN.

The post Working for the world: UNOSAT and CERN appeared first on CERN Courier.


Much of the interesting work that happens at CERN is underground – but not all. Since 2002, the team that runs UNOSAT, the Operational Satellite Applications Programme of the United Nations Institute for Training and Research (UNITAR), has been based at the laboratory’s Meyrin site. This hosting arrangement, which has support from the Swiss government, resulted from a pioneering institutional agreement between CERN and the UN. The programme demonstrates the potential for collaboration between these two international bodies in areas of mutual interest.

The mission of UNITAR, established by the UN General Assembly, is to deliver innovative training and to conduct research on knowledge systems and methodologies. Through adult professional training and technical support, the institute contributes towards developing the capacities of tens of thousands of professionals around the world using face-to-face and distance learning.

UNOSAT is a technology-based initiative supported by a team of specialists in remote sensing and geographic-information systems. It is part of UNITAR’s Department of Research, mainly because of its groundbreaking innovations in the use of satellite-derived solutions in the context of UN work. As a result of its research and applications, UNOSAT offers very high-resolution imagery to enhance humanitarian action; monitors piracy using geospatial information; connects the world of the UN to Grid technology; and has introduced objective satellite imagery into the assessment of human-rights violations.

A vital source of information

Initially created to explore the potential of satellite Earth observation for the international community, this programme has developed specific mapping and analysis services that are used by various UN agencies and by national experts worldwide. UNOSAT’s mission is to deliver integrated satellite-based solutions for human security, peace and socioeconomic development. Its most important goal, however, is to make satellite data and geographic information easily accessible to an increasing number of UN and national experts who work with geographic information systems (GIS).

The UNOSAT team combines the experience of satellite imagery analysts, database programmers and geographic-information experts with that of fieldworkers and development experts. This unique set of skills gives the UNOSAT team the ability to understand the needs of a variety of international and national users and to provide them with suitable information anywhere and anytime. Anywhere, because – thanks to CERN’s IT support – UNOSAT can handle and store large amounts of data and transfer maps as needed directly via the web; anytime, because UNOSAT is available 24 hours a day, every day of the year.

In simple terms, UNOSAT acquires and processes satellite data to produce and deliver information, analysis and observations, which are used by the UN and national entities for emergency response and to assess the impact of a disaster or conflict, or to plan sustainable development. The main difference between this programme and other UN undertakings is that UNOSAT uses high-end technology to develop innovative solutions. It does this in partnership with the main space agencies and commercial satellite-data providers.

One such innovation was the creation in 2003 of a new humanitarian rapid-mapping service. Now fully developed, the service has been used in more than 100 major disasters and conflict situations, and has produced more than 900 satellite-derived analyses and maps. The work requires the rapid acquisition and processing of satellite imagery and data for the creation of map and GIS layers. These are then used by the headquarters of UN agencies to make decisions and in the field during an emergency response to co-ordinate rescue teams and assess the impact of a given emergency. This type of map was of great use in the aftermath of the Asian Tsunami of 2004 and in response to the 2005 earthquake in Pakistan. Similar maps have been used to monitor the impact of the conflict between Israel and the Hezbollah in Southern Lebanon and during the Middle East crisis in Gaza. They have also been valuable in monitoring the flux of displaced populations, most recently, during the conflict this year in Sri Lanka (figure 1).

There are dozens of less-publicized crises every year in which the UN is involved because of their humanitarian consequences for thousands of innocent civilians in developing countries. UNOSAT supports the work of relief workers and NGO volunteers with timely and accurate analysis of the situation on the ground, and responds to requests from the field for particular geographic information.

The work of UNOSAT is not solely related to emergencies, although the maps available on the website all refer to humanitarian assistance. This publication policy is designed to enable humanitarian workers in various field locations to download maps prepared by UNOSAT at CERN via the internet or satellite telecommunications. In addition, there are a large number of maps and analyses that are not publicly available on the UNOSAT website because they are part of project activities requested by UN agencies, such as the UN Development Programme, the International Organization for Migration and the World Health Organization.

Once an emergency is over, the work of the UN continues with assistance to governments in rehabilitation and reconstruction. UNOSAT remains engaged beyond the emergency phase by supporting early recovery activities that are undertaken to help local populations get back to normality following a disaster or conflict. Satellites are helpful in these circumstances: think of the work required to reconstruct an entire cadastre, for example, without appropriate geographic information; or to plan the re-establishment of road and rail networks without accurate information on the extent of damage suffered.

UNOSAT’s experience in mapping and analysis – and its innovative methodologies – are regularly transferred to the world beyond, thanks to training modules and information events that are organized by the UN or directly by UNITAR. At CERN, for example, UNOSAT hosts and trains national experts from Indonesia, Nicaragua and Nigeria, to mention a few recent cases. These experts receive intensive two-week training sessions, during which they stay at CERN. In other cases, UNOSAT sends its trainers abroad to train and provide technical support to fieldworkers in developing countries. All of the experts trained by UNOSAT then become part of a global network of skilled staff who can be connected to work together when needed.

The technical work of UNOSAT is made possible by the agreement between UNITAR and CERN, so CERN’s support is of fundamental importance. The recognition – and the awards – that UNOSAT enjoys in return for its relentless work are due in part to all those at CERN who help and support UNOSAT’s work.

Conscious of the potential held by this success story, CERN and UNITAR took the opportunity of the renewal of their agreement in December 2008 to begin a series of consultations to strengthen their collaboration in areas of mutual interest. The use of scientific applications to advance the international agendas that guide the work of the UN is being discussed at a senior level, and ideas for joint undertakings are currently being considered.

• For more information, visit www.unitar.org and www.unitar.org/unosat.

The post Working for the world: UNOSAT and CERN appeared first on CERN Courier.

]]>
https://cerncourier.com/a/working-for-the-world-unosat-and-cern/feed/ 0 Feature How a UN programme benefits from being hosted at CERN. https://cerncourier.com/wp-content/uploads/2009/09/CCuno2_08_09.jpg
STEP ’09 sets new records around the world https://cerncourier.com/a/step-09-sets-new-records-around-the-world/ https://cerncourier.com/a/step-09-sets-new-records-around-the-world/#respond Tue, 25 Aug 2009 10:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/step-09-sets-new-records-around-the-world/ After months of preparation and two intensive weeks of continuous operation in June, the LHC experiments celebrated the achievement of a new set of goals aimed at demonstrating full readiness for the data-taking with collisions expected to start later this year.

The post STEP ’09 sets new records around the world appeared first on CERN Courier.

]]>
After months of preparation and two intensive weeks of continuous operation in June, the LHC experiments celebrated the achievement of a new set of goals aimed at demonstrating full readiness for the data-taking with collisions expected to start later this year. The Scale Testing for the Experiment Programme ’09 (STEP ’09) was designed to stress the Worldwide LHC Computing Grid (WLCG), the global computing Grid that will support the experiments as they exploit the new particle collider. The WLCG combines the computing power of more than 140 computer centres, in a collaboration between 33 countries.

While there have been several large-scale data-processing tests in recent years, this was the first production demonstration to involve all of the key elements, from data-taking through to analysis. This allowed new records to be established in data-taking throughput, in data import and export rates between the various Grid sites, and in the number of analysis, simulation and reprocessing jobs. The ATLAS experiment ran close to 1 million analysis jobs and achieved 6 GB/s of Grid traffic – the equivalent of a DVD’s worth of data a second, sustained over long periods. This result coincides with the transition of Grids into long-term sustainable e-infrastructures that will be of fundamental importance to projects with the lifetime of the LHC.
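
For scale, the quoted figure is easy to translate into everyday terms. The short Python sketch below assumes a 4.7 GB single-layer DVD and decimal units; it is only an illustration of the arithmetic, not part of the WLCG tooling.

# Rough equivalences for the sustained ATLAS rate quoted above.
# Assumes a 4.7 GB single-layer DVD and decimal units (1 GB = 1e9 bytes).
grid_traffic_gb_per_s = 6.0      # sustained Grid traffic, GB/s
dvd_capacity_gb = 4.7            # single-layer DVD capacity (assumption)

print(f"one DVD of data every {dvd_capacity_gb / grid_traffic_gb_per_s:.2f} s")
print(f"about {grid_traffic_gb_per_s * 86400 / 1000:.0f} TB per day if sustained")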

With the restart of the LHC only months away, there will be a large increase in the number of Grid users, from several hundred unique users today to several thousand when data-taking and analysis commence. This will happen only through significant streamlining of operations and the simplification of end-users’ interaction with the Grid. STEP ’09 involved massive-scale testing of end-user analysis scenarios, including “community-support” infrastructures, whereby the community is trained and enabled to be largely self-supporting, backed by a core of Grid and application experts.

The post STEP ’09 sets new records around the world appeared first on CERN Courier.

]]>
https://cerncourier.com/a/step-09-sets-new-records-around-the-world/feed/ 0 News After months of preparation and two intensive weeks of continuous operation in June, the LHC experiments celebrated the achievement of a new set of goals aimed at demonstrating full readiness for the data-taking with collisions expected to start later this year.
CERN openlab enters phase three https://cerncourier.com/a/cern-openlab-enters-phase-three/ https://cerncourier.com/a/cern-openlab-enters-phase-three/#respond Wed, 15 Jul 2009 08:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/cern-openlab-enters-phase-three/ On 2–3 April CERN's director-general, Rolf Heuer, officially launched the third phase of the CERN openlab at the 2009 annual meeting of the CERN openlab Board of Sponsors.

The post CERN openlab enters phase three appeared first on CERN Courier.

]]>
On 2–3 April CERN’s director-general, Rolf Heuer, officially launched the third phase of the CERN openlab at the 2009 annual meeting of the CERN openlab Board of Sponsors. During his introductory speech Heuer stressed the importance of collaborating with industry and building closer relationships with other key institutes, as well as the European Commission. The board meeting provided an opportunity for partner companies (HP, Intel, Oracle and Siemens), a contributor (EDS, an HP company) and CERN to present the key achievements obtained during openlab-II and the expectations for openlab-III.

Each phase of CERN openlab corresponds to a three-year period. In openlab-I (2003–2005) the focus was on the development of an advanced prototype called opencluster. CERN openlab-II (2006–2008) addressed a range of domains from platforms, databases and the Grid to security and networking, with HP, Intel and Oracle as partners and EDS, an HP company, as a contributor. Disseminating the expertise and knowledge has also been a key focus of openlab. Regular training sessions have taken place and activities include openlab contributions to the CERN School of Computing and the CERN openlab Summer Student Programme, with its specialized lectures.

With the start of the third phase of CERN openlab, new projects have already been initiated with the partners. These are structured into four Competence Centres (CC): Automation and Controls CC; Database CC; Networking CC; and Platform CC. Through the Automation and Controls CC, CERN, Siemens and ETM Professional Control (a subsidiary of Siemens) are collaborating on security, as well as the move of automation tools towards software engineering and handling of large environments. In partnership with Oracle, the Database CC focuses on items such as data distribution and replication, monitoring and infrastructure management, highly available database services and application design, as well as automatic failover and standby databases.

One focus of the Networking CC is a research project launched by CERN and HP ProCurve to understand the behaviour of large computer networks (with 10,000 nodes or more) in high-performance computing or large campus installations. Another activity involves the grid-monitoring and messaging projects carried out in collaboration with EDS, an HP company. The Platform CC project focuses on PC-based computing hardware and the related software. In collaboration with Intel it addresses important fields such as thermal optimization, application tuning and benchmarking. It also has a strong emphasis on teaching. During the third phase, the team will not only capitalize on and extend the successful work carried out in openlab-II, but it will also tackle crucial new areas. Additional team members have recently joined and the structure is now in place to collaborate and work on bringing these projects to fruition.

The openlab team consists of three complementary groups of people: the young engineers hired by CERN and funded by the partners (21 people over the past eight years); technical experts from partner companies involved in the openlab projects; and CERN management and technical experts working partly or fully on the joint activities. The people involved are not concentrated in a single group at CERN. They span many different units in the IT department, as well as the Industrial Controls and Electronics Group in the engineering department, since the arrival of Siemens as an openlab partner. The distributed team structure permits close collaboration with computing experts in the LHC experiments, as well as with engineers and scientists from the various openlab partners who contribute greatly to these activities. In addition, significant contributions are made by the students participating in the CERN openlab Summer Student Programme, both directly to the openlab activities and more widely to the Worldwide LHC Computing Grid, the Enabling Grids for E-sciencE project and other Grid- and CERN-related activities in the IT Department. Since the inception of openlab, more than 100 young computer scientists have participated in the programme, where they spend two months at CERN. This summer the programme will be welcoming 14 students of 11 different nationalities.

• The activities carried out from May 2008 to May 2009 are presented in the eighth CERN openlab annual report available from the CERN openlab web site at www.cern.ch/openlab.

The post CERN openlab enters phase three appeared first on CERN Courier.

]]>
https://cerncourier.com/a/cern-openlab-enters-phase-three/feed/ 0 News On 2–3 April CERN's director-general, Rolf Heuer, officially launched the third phase of the CERN openlab at the 2009 annual meeting of the CERN openlab Board of Sponsors. https://cerncourier.com/wp-content/uploads/2009/07/CCnew6_06_09.jpg
CHEP ’09: clouds, data, Grids and the LHC https://cerncourier.com/a/chep-09-clouds-data-grids-and-the-lhc/ https://cerncourier.com/a/chep-09-clouds-data-grids-and-the-lhc/#respond Wed, 15 Jul 2009 08:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/chep-09-clouds-data-grids-and-the-lhc/ Alan Silverman reports from the international conference held in Prague.

The post CHEP ’09: clouds, data, Grids and the LHC appeared first on CERN Courier.

]]>
The CHEP series of conferences is held every 18 months and covers the wide field of computing in high-energy and nuclear physics. CHEP ’09, the 17th in the series, was held in Prague on 21–27 March and attracted 615 attendees from 41 countries. It was co-organized by the Czech academic-network operator CESNET, Charles University in Prague (Faculty of Mathematics and Physics), the Czech Technical University, and the Institute of Physics and the Nuclear Physics Institute of the Czech Academy of Sciences. Throughout the week some 500 papers and posters were presented. As usual, given the CHEP tradition of devoting the morning sessions to plenary talks and limiting the number of afternoon parallel sessions to six or seven, the organizers found themselves short of capacity for oral presentations. They received 500 offers for the 200 programme slots, so the remainder were shown as posters, split into three full-day sessions of around 100 each day. The morning coffee break was extended specifically to allow time to browse the posters and discuss with the poster authors.

A large number of the presentations related to some aspect of computing for the upcoming LHC experiments, but there was also a healthy number of contributions from experiments elsewhere in the world, including Brookhaven National Laboratory, Fermilab and SLAC (where BaBar is still analysing its data although the experiment has stopped data-taking) in the US, KEK in Japan and DESY in Germany.

Data and performance

The conference was preceded by a Worldwide LHC Computing Grid (WLCG) Workshop, summarized at CHEP ’09 by Harry Renshall from CERN. There was a good mixture of Tier-0, Tier-1 and Tier-2 representatives among the 228 people present at the workshop, which began with a review of each LHC experiment’s plans. All of these include more stress-testing in some form or other before the restart of the LHC. The transition to the European Grid Initiative from the Enabling Grids for E-sciencE project is clearly an issue, as is the lack of a winter shutdown in the LHC plans. There was discussion on whether or not there should be a new “Computing Challenge” to test the readiness of the WLCG. The eventual decision was “yes”, but to rename it STEP ’09 (Scale Testing for the Experimental Programme), schedule it for May or June 2009 and concentrate on tape recall and event processing. The workshop concluded that ongoing emphasis should be put on stability, preparing for a 44-week run and continuing the good work that has now started on data analysis.

Sergio Bertolucci, CERN’s director for research and scientific computing, gave the opening talk of the conference. He reviewed the LHC start-up and initial running, the steps being taken for the repairs following the incident of 19 September 2008 as well as to avoid any repetition, and the plans for the restart. He also discussed the work currently being done at Fermilab and how CERN will learn from this in the search for the Higgs boson. Les Robertson of CERN, who led the WLCG project through the first six years, discussed how we got here and what will come next. A very simple Grid was first presented at CHEP in Padova in 2000, leading Robertson to label the 2000s as the decade of the Grid. Thanks to the development and adoption of standards, Grids have now developed and matured, with an increasing number of sciences and industrial applications making use of them. However, Robertson thinks that we should be looking at locating Grid centres where energy is cheap, using virtualization to share processing power better, and starting to look at “clouds”: what are they in comparison to Grids?

The theme of using clouds, which enable access to leased computing power and storage capacity, came up several times in the meeting. For example, the Belle experiment at KEK is experimenting with the use of clouds for Monte Carlo simulations in its planning for SuperBelle; and the STAR experiment at Brookhaven is also considering clouds for Monte Carlo production. Another of Robertson’s suggestions for future work, “virtualization”, was also one of the most common topics in terms of contributions throughout the week, with different uses cropping up time and again in the conference’s various streams.

Other notable plenary talks included those by Neil Geddes, Kors Bos and Ruth Pordes. Geddes, of the UK Science and Technology Facilities Council’s Rutherford Appleton Laboratory, asked “can WLCG deliver?” He deduced that it can, and in fact does, but that there are many challenges still to face. Bos, of Nikhef and the ATLAS collaboration, compared the different computing approaches across the LHC experiments, pointing out similarities and contrasts. Fermilab’s Pordes, who is executive director of the Open Science Grid, described work in the US on evolving Grids to make them easier to use and more accessible to a wider audience of researchers and scientists.

The conference had a number of commercial sponsors, in particular IBM, Intel and Sun Microsystems, and part of Wednesday morning was devoted to speakers from these corporations. IBM used its slot to describe a machine that aims to offer cooler, denser and more efficient computing power. Intel focused on its effort to get more computing for less energy, making note of work done under the openlab partnership with CERN (CERN openlab enters phase three). The company hopes to address this partially by increasing computing-energy efficiency (denser packaging, more cores, more parallelism etc) because it realizes that power is constraining growth in every part of computing. The speaker from Sun presented ideas on building state-of-the-art data centres. He claimed that raised floors are dead and instead proposed “containers” or a similar “pod architecture” with built-in cooling and a modular structure connected to overhead, hot-pluggable busways. Another issue is to build “green” centres and he cited solar farms in Abu Dhabi as well as a scheme to use free ocean-cooling for floating ship-based computing centres.

It is impossible to summarize in a short report the seven streams of material presented in the afternoon sessions, but some highlights deserve to be mentioned. The CERN-developed Indico conference tool was presented with statistics showing that it has been adopted by more than 40 institutes and manages material for an impressive 60,000 workshops, conferences and meetings. The 44 Grid middleware talks and 76 poster presentations can be summarized as follows: production Grids are here; Grid middleware is usable and is being used; standards are evolving but have a long way to go; and the use of network bandwidth is keeping pace with technology. From the stream of talks on distributed processing and analysis, the clear message is that much work has been done on user-analysis tools since the last CHEP, with some commonalities between the LHC experiments. Data-management and access protocols for analysis are a major concern and the storage fabric is expected to be stressed when the LHC starts running.

Dario Barberis of Genova/INFN and ATLAS presented the conference summary. He had searched for the most common words in the 500 submitted abstracts and the winner was “data”, sometimes linked with “access”, “management” or “analysis”. He noted that users want simple access to data, so the computing community needs to provide easy-to-use tools that hide the complexity of the Grid. Of course “Grid” was another of the most common words, but the word “cloud” did not appear in the top 100 although clouds were much discussed in plenary and parallel talks. For Barberis, a major theme was “performance” – at all levels, from individual software codes to global Grid performance. He felt that networking is a neglected but important topic (for example the famous digital divide and end-to-end access times). His conclusion was that performance will be a major area of work in the future as well as the major topic at the next CHEP in Taipei, on 17–22 October 2010.

The post CHEP ’09: clouds, data, Grids and the LHC appeared first on CERN Courier.

]]>
https://cerncourier.com/a/chep-09-clouds-data-grids-and-the-lhc/feed/ 0 Feature Alan Silverman reports from the international conference held in Prague. https://cerncourier.com/wp-content/uploads/2009/07/CCche1_06_09.jpg
Study group considers how to preserve data https://cerncourier.com/a/study-group-considers-how-to-preserve-data/ https://cerncourier.com/a/study-group-considers-how-to-preserve-data/#respond Wed, 29 Apr 2009 10:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/study-group-considers-how-to-preserve-data/ How can high-energy physics data best be saved for the future?

The post Study group considers how to preserve data appeared first on CERN Courier.

]]>

High-energy-physics experiments collect data over long time periods, while the associated collaborations of experimentalists exploit these data to produce their physics publications. The scientific potential of an experiment is in principle defined and exhausted within the lifetime of such collaborations. However, the continuous improvement in areas of theory, experiment and simulation – as well as the advent of new ideas or unexpected discoveries – may reveal the need to re-analyse old data. Examples of such analyses already exist and they are likely to become more frequent in the future. As experimental complexity and the associated costs continue to increase, many present-day experiments, especially those based at colliders, will provide unique data sets that are unlikely to be improved upon in the short term. The close of the current decade will see the end of data-taking at several large experiments and scientists are now confronted with the question of how to preserve the scientific heritage of this valuable pool of acquired data.

To address this specific issue in a systematic way, the Study Group on Data Preservation and Long Term Analysis in High Energy Physics formed at the end of 2008. Its aim is to clarify the objectives and the means of preserving data in high-energy physics. The collider experiments BaBar, Belle, BES-III, CLEO, CDF, D0, H1 and ZEUS, as well as the associated computing centres at SLAC, KEK, the Institute of High Energy Physics in Beijing, Fermilab and DESY, are all represented, together with CERN, in the group’s steering committee.

Digital gold mine

The group’s inaugural workshop took place on 26–28 January at DESY, Hamburg. To form a quantitative view of the data landscape in high-energy physics, each of the participating experimental collaborations presented their computing models to the workshop, including the applicability and adaptability of the models to long-term analysis. Not surprisingly, the data models are similar – reflecting the nature of colliding-beam experiments.

The data are organized by events, with increasing levels of abstraction from raw detector-level quantities to N-tuple-like data for physics analysis. They are supported by large samples of simulated Monte Carlo events. The software is organized in a similar manner, with a more conservative part for reconstruction to reflect the complexity of the hardware and a more dynamic part closer to the analysis level. Data analysis is in most cases done in C++ using the ROOT analysis environment and is mainly performed on local computing farms. Monte Carlo simulation also uses a farm-based approach but it is striking to see how popular the Grid is for the mass-production of simulated events. The amount of data that should be preserved for analysis varies between 0.5 PB and 10 PB for each experiment, which is not huge by today’s standards but nonetheless a large amount. The degree of preparation for long-term data varies between experiments but it is obvious that no preparation was foreseen at an early stage of the programs; any conservation initiatives will take place in parallel with the end of the data analysis.
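
The layering described here – from raw, detector-level quantities down to flat, N-tuple-like records for analysis – can be illustrated with plain Python structures. This is only a conceptual sketch with invented field names; the experiments themselves use C++ and ROOT-based formats.

# Illustrative reduction of one event from raw-like detector information to a
# flat, N-tuple-like record of the kind used in final physics analysis.
# All field names and values are invented for illustration.
raw_event = {
    "event_id": 42,
    "hits": [
        {"detector": "tracker", "channel": 1021, "adc": 87},
        {"detector": "tracker", "channel": 1022, "adc": 91},
        {"detector": "calorimeter", "cell": 7, "energy_gev": 12.4},
        {"detector": "calorimeter", "cell": 8, "energy_gev": 3.1},
    ],
}

def to_ntuple_row(event):
    calo = [h["energy_gev"] for h in event["hits"] if h["detector"] == "calorimeter"]
    return {
        "event_id": event["event_id"],
        "n_tracker_hits": sum(1 for h in event["hits"] if h["detector"] == "tracker"),
        "total_calo_energy_gev": round(sum(calo), 2),
    }

print(to_ntuple_row(raw_event))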

The main issue will be the communication between the experimental collaborations and the computing centres after final analyses

From a long-term perspective, digital data are widely recognized as fragile objects. Speakers from a few notable computing centres – including Fabio Hernandez of the Centre de Calcul de l’Institut National de Physique Nucléaire et de Physique des Particules, Stephen Wolbers of Fermilab, Martin Gasthuber of DESY and Erik Mattias Wadenstein of the Nordic DataGrid Facility – showed that storage technology should not pose problems with respect to the amount of data under discussion. Instead, the main issue will be the communication between the experimental collaborations and the computing centres after the final analyses and/or the end of the collaborations, where roles have not been clearly defined in the past. The current preservation model, where the data are simply saved on tapes, runs the risk that the data will disappear into cupboards while the read-out hardware may be lost, become impractical or obsolete. It is important to define a clear protocol for data preservation, the items of which should be transparent enough to ensure that the digital content of an experiment (data and software) remains accessible.

On the software side, the most popular analysis framework is ROOT, the object-oriented software and library that was originally developed at CERN. This offers many possibilities for storing and documenting high-energy-physics data and has the advantage of a large existing user community and a long-term commitment for support, as CERN’s René Brun explained at the workshop. One example of software dependence is the use of inherited libraries (e.g. CERNLIB or GEANT3), and of commercial software and/or packages that are no longer officially maintained but remain crucial to most running experiments. It would be an advantageous first step towards long-term stability of any analysis framework if such vulnerabilities could be removed from the software model of the experiments. Modern techniques of software emulation, such as virtualization, may also offer promising features, as Yves Kemp of DESY explained. Exploring such solutions should be part of future investigations.

Examples of previous experience with data from old experiments show clearly that a complete re-analysis has only been possible when all of the ingredients could be accounted for. Siggi Bethke of the Max Planck Institute of Physics in Munich showed how a re-analysis of data from the JADE experiment (1979–1986), using refined theoretical input and a better simulation, led to a significant improvement in the determination of the strong coupling constant as a function of energy. While the usual statement is that higher-energy experiments replace older, low-energy ones, this example shows that measurements at lower energies can play a unique role in a global physical picture.

The experience at the Large Electron-Positron (LEP) collider, which Peter Igo-Kemenes, André Holzner and Matthias Schroeder of CERN described, suggested once more that the definition of the preserved data should definitely include all of the tools necessary to retrieve and understand the information so as to be able to use it for new future analyses. The general status of the LEP data is of concern, and the recovery of the information – to cross-check a signal of new physics, for example – may become impossible within a few years if no effort is made to define a consistent and clear stewardship of the data. This demonstrates that both early preparation and sufficient resources are vital in maintaining the capability to reinvestigate older data samples.

The next-generation publications database, INSPIRE, offers extended data-storage capabilities that could be used immediately to enhance public or private information related to scientific articles

The modus operandi in high-energy physics can also profit from the rich experience accumulated in other fields. Fabio Pasian of Trieste told the workshop how the European Virtual Observatory project has developed a framework for common data storage of astrophysical measurements. More general initiatives to investigate the persistency of digital data also exist and provide useful hints as to the critical points in the organization of such projects.

There is also an increasing awareness in funding agencies regarding the preservation of scientific data, as David Corney of the UK’s Science and Technology Facilities Council, Salvatore Mele of CERN and Amber Boehnlein of the US Department of Energy described. In particular, the Alliance for Permanent Access and the EU-funded project in Framework Programme 7 on the Permanent Access to the Records of Science in Europe recently conducted a survey of the high-energy-physics community, which found that the majority of scientists strongly support the preservation of high-energy-physics data. One important aspect that was also positively appreciated in the survey answers was the question of open access to the data in conjunction with the organizational and technical matters, an issue that deserves careful consideration. The next-generation publications database, INSPIRE, offers extended data-storage capabilities that could be used immediately to enhance public or private information related to scientific articles, including tables, macros, explanatory notes and potentially even analysis software and data, as Travis Brooks of SLAC explained.

While this first workshop compiled a great deal of information, the work to synthesize it remains to be completed and further input in many areas is still needed. In addition, the raison d’être for data preservation should be clearly and convincingly formulated, together with a viable economic model. All high-energy-physics experiments have the capability of taking some concrete action now to propose models for data preservation. A survey of technology is also important, because one of the crucial factors may indeed be the evolution of hardware. Moreover, the whole process must be supervised by well defined structures and steered by clear specifications that are endorsed by the major laboratories and computing centres. A second workshop is planned to take place at SLAC in summer 2009 with the aim of producing a preliminary report for further reference, so that the “future of the past” will become clearer in high-energy physics.

The post Study group considers how to preserve data appeared first on CERN Courier.

]]>
https://cerncourier.com/a/study-group-considers-how-to-preserve-data/feed/ 0 Feature How can high-energy physics data best be saved for the future? https://cerncourier.com/wp-content/uploads/2009/04/CCdat1_04_09.jpg
Happy 20th birthday, World Wide Web https://cerncourier.com/a/happy-20th-birthday-world-wide-web/ https://cerncourier.com/a/happy-20th-birthday-world-wide-web/#respond Wed, 29 Apr 2009 10:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/happy-20th-birthday-world-wide-web/ Something happened at CERN in 1989 that changed the world forever.

The post Happy 20th birthday, World Wide Web appeared first on CERN Courier.

]]>

In March 1989 Tim Berners-Lee, a physicist at CERN, handed a document entitled “Information management: a proposal” to his group leader Mike Sendall. “Vague, but exciting”, were the words that Sendall wrote on the proposal, allowing Berners-Lee to continue with the project. Both were unaware that it would evolve into one of the most important communication tools ever created.

Berners-Lee returned to CERN on 13 March this year to celebrate the 20th anniversary of the birth of the World Wide Web. He was joined by several web pioneers, including Robert Cailliau and Jean-François Groff, who worked with Berners-Lee in the early days of the project, and Ben Segal, the person who brought the internet to CERN. In between reminiscing about life at CERN and the early years of the web, the four gave a demonstration of the first ever web browser running on the very same NeXT computer on which Berners-Lee wrote the original browser and server software.

The event was not only about the history of the web; it also included a short keynote speech from Berners-Lee, which was followed by a panel discussion on the future of the web. The panel members were contemporary experts who Berners-Lee believes are currently working with the web in an exciting way.

Berners-Lee’s original 1989 proposal showed how information could easily be transferred over the internet by using hypertext, the now familiar point-and-click system of navigating through information pages. The following year, Cailliau, a systems engineer, joined the project and soon became its number-one advocate.

The birth of the web
Berners-Lee’s idea was to bring together hypertext with the internet and personal computers, thereby having a single information network to help CERN physicists to share all of the computer-stored information not only at the laboratory but around the world. Hypertext would enable users to browse easily between documents on web pages that use links. Berners-Lee went on to produce a browser-editor with the goal of developing a tool to make a creative space to share and edit information and build a common hypertext. What should they call this new browser? “The Mine of Information”? “The Information Mesh”? When they settled on a name in May 1990 – before even the first piece of code had been written – it was Tim who suggested “the World Wide Web”, or “WWW”.

Development work began in earnest using NeXT computers delivered to CERN in September 1990. Info.cern.ch was the address of the world’s first web site and web server, which was running on one NeXT computer by Christmas of 1990. The first web-page address was http://info.cern.ch/hypertext/WWW/TheProject.html, which gave information about the WWW project. Visitors to the pages could learn more about hypertext, technical details for creating their own web page and an explanation on how to search the web for information.
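
Retrieving a hypertext page over HTTP remains a few lines of code today. The sketch below uses only the Python standard library; the historical address is quoted from the text above purely as an example URL, with no guarantee that it is still served at that exact path.

# Minimal sketch: fetch a page over HTTP using only the standard library.
# The 1990 address from the article is used purely as an example URL.
from urllib.request import urlopen

url = "http://info.cern.ch/hypertext/WWW/TheProject.html"
with urlopen(url, timeout=10) as response:
    page = response.read().decode("utf-8", errors="replace")

print(page[:300])  # print the start of the returned hypertext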

Although the web began as a tool to aid particle physicists, today it is used in countless ways by the global community

To allow the web to extend, Berners-Lee’s team needed to distribute server and browser software. The NeXT systems, however, were far more advanced than the computers that many other people had at their disposal, so they set to work on a far less sophisticated piece of software for distribution. By the spring of 1991, testing was under way on a universal line-mode browser, created by Nicola Pellow, a technical student. The browser was designed to run on any computer or terminal and worked using a simple menu with numbers to provide the links. There was no mouse and no graphics, just plain text, but it allowed anyone with an internet connection to access the information on the web.
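
The numbered-menu idea is simple enough to sketch in a few lines of Python. This is only an illustration of the concept – plain text, links chosen by number – and not the original line-mode browser; the page title and link targets are invented.

# Conceptual sketch of line-mode navigation: plain text, no mouse, no graphics;
# the user follows a link by typing its number. Page content is invented.
def show_page(title, text, links):
    print(title)
    print(text)
    for i, (label, target) in enumerate(links, start=1):
        print(f"[{i}] {label} -> {target}")
    choice = input("Follow link number (Enter to quit): ").strip()
    if choice.isdigit() and 1 <= int(choice) <= len(links):
        return links[int(choice) - 1][1]      # address of the chosen page
    return None

next_address = show_page(
    "The WWW Project",
    "Information about the World Wide Web project.",
    [("What is hypertext?", "/hypertext/WWW/WhatIs.html"),
     ("How to write a page", "/hypertext/WWW/Provider/Overview.html")],
)
print("next page:", next_address)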

Servers began to appear in other institutions across Europe throughout 1991 and by December the first server outside the continent was installed in the US at the Stanford Linear Accelerator Center (SLAC). By November 1992 there were 26 servers in the world and by October 1993 the number had increased to more than 200 known web servers. In February 1993 the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign released the first version of Mosaic, which made the web easily available to ordinary PC and Macintosh computers.

The rest, as they say, is history. Although the web began as a tool to aid particle physicists, today it is used in countless ways by the global community. Today the primary purpose of household computers is not to compute but “to go on the web”.

Berners-Lee left CERN in 1994 to run the World Wide Web Consortium (W3C) at the Massachusetts Institute of Technology and help to develop guidelines to ensure long-term growth of the web. So what predictions do Berners-Lee and the W3C have for the future of the web? What might it look like at the age of 30?

In his talk at the WWW@20 celebrations Berners-Lee outlined his hopes and expectations for the future: “There are currently roughly the same number of web pages as there are neurons in the human brain”. The difference, he went on to say, is that the number of web pages increases as the web grows older.

One important future development is the “Semantic Web” – a place where machines can do all of the tedious work. The concept is to create a web where machines can interpret pages like humans. It will be a “move from using a search engine to an answer engine,” explains Christian Bizer of the web-based system groups at Freie Universität Berlin. “When I search the web I don’t want to find documents, I want to find answers to my questions!” he says. If a search engine can understand a web page then it can pick out the exact answer to a question, rather than simply presenting you with a list of web pages.

As Berners-Lee put it: “The Semantic Web is a web of data. There is a lot of data that we all use every day, and it’s not part of the web. For example, I can see my bank statements on the web, and my photographs, and I can see my appointments in a calendar, but can I see my photos in a calendar to see what I was doing when I took them? Can I see bank-statement lines in a calendar? Why not? Because we don’t have a web of data. Because data is controlled by applications, and each application keeps it to itself.”
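
At its simplest, the “web of data” idea in that quote is a join over a shared key – here, the date that links photographs to calendar entries. The sketch below uses invented records purely to illustrate the point.

# Toy "web of data": two independent data sources linked by a shared key
# (the date), so photos can be shown inside a calendar view. Records invented.
photos = [
    {"file": "geneva_lake.jpg", "date": "2009-03-13"},
    {"file": "next_cube.jpg", "date": "2009-03-13"},
    {"file": "jura_hike.jpg", "date": "2009-03-15"},
]
calendar = [
    {"date": "2009-03-13", "event": "WWW@20 celebration at CERN"},
    {"date": "2009-03-15", "event": "Day off"},
]

for entry in calendar:
    taken_that_day = [p["file"] for p in photos if p["date"] == entry["date"]]
    print(entry["date"], entry["event"], "-> photos:", taken_that_day)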

“Device independence” is a move towards a greater variety of equipment that can connect to the web. Only a few years ago, virtually the only way to access the web was through a PC or workstation. Now, mobile handsets, smart phones, PDAs, interactive television systems, voice-response systems, kiosks and even some domestic appliances can access the web.

The mobile web is one of the fastest-developing areas of web use. Already, more global web browsing is done on hand-held devices, like mobile phones, than on laptops. It is especially important in developing countries, where landlines and broadband are still rare. For example, African fishermen are using the web on old mobile phones to check the market price of fish to make sure that they arrive at the best port to sell their daily catch. The W3C is trying to create standards for browsing the web on phones and to encourage people to make the web more accessible to everyone in the world.

• The full-length webcast of the WWW@20 event is available at http://cdsweb.cern.ch/record/1167328?ln=en.

The post Happy 20th birthday, World Wide Web appeared first on CERN Courier.

]]>
https://cerncourier.com/a/happy-20th-birthday-world-wide-web/feed/ 0 Feature Something happened at CERN in 1989 that changed the world forever. https://cerncourier.com/wp-content/uploads/2013/04/CCfac1_04_13.jpg
The age of citizen cyberscience https://cerncourier.com/a/viewpoint-the-age-of-citizen-cyberscience/ Wed, 29 Apr 2009 10:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/viewpoint-the-age-of-citizen-cyberscience/ I first met Rytis Slatkevicius in 2006, when he was 18. At the time, he had assembled the world's largest database of prime numbers.

The post The age of citizen cyberscience appeared first on CERN Courier.

]]>
I first met Rytis Slatkevicius in 2006, when he was 18. At the time, he had assembled the world’s largest database of prime numbers. He had done this by harnessing the spare processing power of computers belonging to thousands of prime-number enthusiasts, using the internet.

Today, Rytis is a mild-mannered MBA student by day and an avid prime-number sleuth by night. His project, called PrimeGrid, is tackling a host of numerical challenges, such as finding the longest arithmetic progression of prime numbers (the current record is 25). Professional mathematicians now eagerly collaborate with Rytis, to analyse the gems that his volunteers dig up. Yet he funds his project by selling PrimeGrid mugs and t-shirts. In short, Rytis and his online volunteers are a web-enabled version of a venerable tradition: they are citizen scientists.
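
To give a flavour of the kind of search involved, the unoptimised Python sketch below looks for arithmetic progressions of primes among small numbers. Real projects such as PrimeGrid work with enormously larger numbers and dedicated sieving code; this is only an illustration of the underlying idea.

# Toy search for arithmetic progressions of primes (p, p+d, p+2d, ...).
def is_prime(n):
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True

def progression_length(p, d, limit=10):
    """Count how many consecutive terms p, p+d, p+2d, ... are prime."""
    length = 0
    while length < limit and is_prime(p + length * d):
        length += 1
    return length

best = max(
    ((p, d, progression_length(p, d))
     for p in range(2, 200) if is_prime(p)
     for d in range(2, 200, 2)),
    key=lambda t: t[2],
)
print("start %d, step %d, length %d" % best)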

There are nearly 100 science projects using such volunteer computing. Like PrimeGrid, most are based on an open-source software platform called BOINC. Many address topical themes, such as modelling climate change (ClimatePrediction.net), developing drugs for AIDS (FightAids@home), or simulating the spread of malaria (MalariaControl.net).

Fundamental science projects are also well represented. Einstein@Home analyses data from gravitational wave detectors, MilkyWay@Home simulates galactic evolution, and LHC@home studies accelerator beam dynamics. Each of these projects has easily attracted tens of thousands of volunteers.

Just what motivates people to participate in projects like these? One reason is community. BOINC provides enthusiastic volunteers with message boards to chat with each other, and share information about the science behind the project. This is strikingly similar to the sort of social networking that happens on websites such as Facebook, but with a scientific twist.

Another incentive is BOINC’s credit system, which measures how much processing each volunteer has done – turning the project into an online game where they can compete as individuals or in teams. Again, there are obvious analogies with popular online games such as Second Life.

Brains vs processors

A new wave of online science projects, which can be described as volunteer thinking, takes the idea of participative science to a higher level. A popular example is the project GalaxyZoo, where volunteers can classify images of galaxies from the Sloan Digital Sky Survey as either elliptical or spiral, via a simple web interface. In a matter of months, some 100,000 volunteers classified more than 1 million galaxies. People do this sort of pattern recognition more accurately than any computer algorithm. And by asking many volunteers to classify the same image, their statistical average proves to be more accurate than even a professional astronomer.
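
The statistical idea – many independent classifications combined into a consensus – can be sketched with a simple majority vote. GalaxyZoo’s real pipeline weights its volunteers more carefully; the records below are invented.

# Toy consensus classification: many independent volunteer votes per image,
# combined by simple majority. Vote records are invented for illustration.
from collections import Counter

votes = {
    "image_001": ["spiral", "spiral", "elliptical", "spiral", "spiral"],
    "image_002": ["elliptical", "elliptical", "spiral", "elliptical"],
}

for image, labels in votes.items():
    label, count = Counter(labels).most_common(1)[0]
    agreement = count / len(labels)
    print(f"{image}: {label} (agreement {agreement:.0%} of {len(labels)} votes)")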

When I mentioned this project to a seasoned high-energy physicist, he remarked wistfully, “Ah, yes, reminds me of the scanning girls”. High-energy physics data analysis used to involve teams of young women manually analysing particle tracks. But these were salaried workers who required office space. Volunteer thinking expands this kind of assistance to millions of enthusiasts on the web at no cost.

Going one step farther in interactivity, the project Foldit is an online game that scores a player’s ability to fold a protein molecule into a minimal-energy structure. Through a nifty web interface, players can shake, wiggle and stretch different parts of the molecule. Again, people are often much faster at this task than computers, because of their aptitude to reason in three dimensions. And the best protein folders are usually teenage gaming enthusiasts rather than trained biochemists.

Who can benefit from this web-based boom in citizen science? In my view, scientists in the developing world stand to gain most by effectively plugging in to philanthropic resources: the computers and brains of supportive citizens, primarily those in industrialized countries with the necessary equipment and leisure time. A project called Africa@home, which I’ve been involved in, has trained dozens of African scientists to use BOINC. Some are already developing new volunteer-thinking projects, and a first African BOINC server is running at the University of Cape Town.

A new initiative called Asia@home was launched last month with a workshop at Academia Sinica in Taipei and a seminar at the Institute of High Energy Physics in Beijing, to drum up interest in that region. Asia represents an enormous potential, in terms of both the numbers of people with internet access (more Chinese are now online than Americans) and the high levels of education and interest in science.

To encourage such initiatives further, CERN, the United Nations Institute for Training and Research and the University of Geneva are planning to establish a Citizen Cyberscience Centre. This will help disseminate volunteer computing in the developing world and encourage new technical approaches. For example, as mobile phones become more powerful they, too, can surely be harnessed. There are about one billion internet connections on the planet and three billion mobile phones. That represents a huge opportunity for citizen science.

The post The age of citizen cyberscience appeared first on CERN Courier.

]]>
Opinion I first met Rytis Slatkevicius in 2006, when he was 18. At the time, he had assembled the world's largest database of prime numbers. https://cerncourier.com/wp-content/uploads/2009/04/CCvie1_04_09.jpg
High-energy physics team sets data-transfer world records https://cerncourier.com/a/high-energy-physics-team-sets-data-transfer-world-records/ https://cerncourier.com/a/high-energy-physics-team-sets-data-transfer-world-records/#respond Tue, 27 Jan 2009 01:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/high-energy-physics-team-sets-data-transfer-world-records/ An international team led by the California Institute of Technology (Caltech), with partners from Michigan, Florida, Tennessee, Fermilab, Brookhaven, CERN, Brazil, Estonia, Korea and Pakistan, set new world records for sustained data transfer among storage systems during the successful SuperComputing 2008 (SC08) conference held in Austin, Texas, in November.

The post High-energy physics team sets data-transfer world records appeared first on CERN Courier.

]]>
An international team led by the California Institute of Technology (Caltech), with partners from Michigan, Florida, Tennessee, Fermilab, Brookhaven, CERN, Brazil, Estonia, Korea and Pakistan, set new world records for sustained data transfer among storage systems during the successful SuperComputing 2008 (SC08) conference held in Austin, Texas, in November.

Caltech’s exhibit at SC08 by the High-Energy Physics (HEP) group and the Center for Advanced Computing Research (CACR) demonstrated new applications and systems for globally distributed data analysis for the LHC at CERN, together with Caltech’s global monitoring system, MonALISA, and its collaboration system, Enabling Virtual Organizations. A highlight of the exhibit was the HEP team’s record-breaking demonstration of storage-to-storage data transfers. This achieved a bidirectional peak throughput of 114 Gbit/s and a sustained data flow of more than 110 Gbit/s among clusters of servers on the show floor and at Caltech, Michigan, CERN, Fermilab, Brazil (Rio de Janeiro, São Paulo), Korea, Estonia and locations in the US LHCNet network in Chicago, New York, Geneva and Amsterdam.

The team used a small fraction of the global LHC Grid to transfer data between the Tier-1, Tier-2 and Tier-3 facilities at the partners’ sites and between a Tier-2-scale computing and storage facility that the HEP and CACR team had constructed on the exhibit floor in fewer than two days. Rates of more than 40 Gbit/s were sustained in both directions for several hours (and up to 71 Gbit/s in one direction). One of the key elements in this demonstration was Fast Data Transfer (FDT), an open-source Java application based on TCP developed by the Caltech team in close collaboration with colleagues at Politechnica Bucharest. FDT works dynamically with Caltech’s MonALISA system to monitor the capability of the storage systems, as well as the network path, in real time. It also sends data out to the network at a rate that is matched to the capacity of long-range network paths.
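
The rate-matching idea – never pushing data faster than the path can absorb – can be illustrated with a small pacing loop. The sketch below is conceptual Python, not FDT itself, which is a Java/TCP application that takes its target rate from live MonALISA monitoring.

# Conceptual sketch of paced sending: emit fixed-size chunks no faster than a
# target rate, sleeping whenever the sender gets ahead of schedule.
import time

def paced_send(total_bytes, target_bytes_per_s, chunk=4 * 1024 * 1024,
               send=lambda data_len: None):
    sent = 0
    start = time.monotonic()
    while sent < total_bytes:
        this_chunk = min(chunk, total_bytes - sent)
        send(this_chunk)                  # stand-in for a real socket write
        sent += this_chunk
        ahead = sent / target_bytes_per_s - (time.monotonic() - start)
        if ahead > 0:
            time.sleep(ahead)             # throttle to match the path capacity
    return sent / (time.monotonic() - start)

rate = paced_send(total_bytes=50 * 1024 * 1024, target_bytes_per_s=100e6)
print(f"achieved roughly {rate / 1e6:.0f} MB/s")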

A second major milestone was achieved by the HEP team working together with Ciena Corporation, which had just completed its first OTU4-standard optical link carrying a 100 Gbit/s payload over a single wavelength with forward-error correction. The teams used a fibre-optic cable with 10 fibre-pairs to link their neighbouring booths together; Ciena’s system to multiplex and demultiplex ten 10 Gbit/s links onto the single OTU4 wavelength running on an 80 km fibre loop; and some of the Caltech nodes used in setting the wide-area network records, together with FDT. Thanks to the system’s high throughput capabilities and the error-free links between the booths, the teams managed to achieve a maximum of 199.90 Gbit/s bidirectionally (memory-to-memory) within minutes, and an average of 191 Gbit/s during a 12-hour period that logged the transmission of 1.02 PB overnight.
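
A quick consistency check of those figures, assuming decimal units, ties the average rate to the overnight volume:

# 191 Gbit/s sustained for 12 hours, expressed in petabytes (decimal units).
avg_gbit_per_s = 191
seconds = 12 * 3600
total_pb = avg_gbit_per_s * 1e9 * seconds / 8 / 1e15
print(f"{total_pb:.2f} PB")   # close to the 1.02 PB logged overnight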

The post High-energy physics team sets data-transfer world records appeared first on CERN Courier.

]]>
https://cerncourier.com/a/high-energy-physics-team-sets-data-transfer-world-records/feed/ 0 News An international team led by the California Institute of Technology (Caltech), with partners from Michigan, Florida, Tennessee, Fermilab, Brookhaven, CERN, Brazil, Estonia, Korea and Pakistan, set new world records for sustained data transfer among storage systems during the successful SuperComputing 2008 (SC08) conference held in Austin, Texas, in November. https://cerncourier.com/wp-content/uploads/2009/01/CCnew10_01_09.jpg
Scale-free Networks: Complex Webs in Nature and Technology https://cerncourier.com/a/scale-free-networks-complex-webs-in-nature-and-technology/ Mon, 20 Oct 2008 08:09:48 +0000 https://preview-courier.web.cern.ch/?p=105028 This book presents the experimental evidence for scale-free networks and provides students and researchers with theoretical results and algorithms to analyse and understand these features.

The post Scale-free Networks: Complex Webs in Nature and Technology appeared first on CERN Courier.

]]>
By Guido Caldarelli, Oxford University Press. Hardback ISBN 9780199211517, £49.95 ($115).

This book presents the experimental evidence for scale-free networks and provides students and researchers with theoretical results and algorithms to analyse and understand these features. A variety of different social, natural and technological systems – from the Internet to food webs and boards of company directors – can be described by the same mathematical framework. In all these situations a graph of the elements of the system and their interconnections displays a universal feature: there are few elements with many connections, and many elements with few connections. The content and exposition make this a useful textbook for beginners, as well as a reference book for experts in a variety of disciplines.
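
The “few hubs, many weakly connected nodes” signature can be reproduced with a short preferential-attachment sketch in Python. This is a minimal illustration of one standard growth model, not code from the book.

# Minimal preferential-attachment growth: each new node links to an existing
# node chosen with probability proportional to its current degree, producing
# a few highly connected hubs and many weakly connected nodes.
import random
from collections import Counter

random.seed(1)
edges = [(0, 1)]                 # start from a single link
attachment_pool = [0, 1]         # node IDs, repeated once per edge endpoint

for new_node in range(2, 2000):
    target = random.choice(attachment_pool)   # degree-proportional choice
    edges.append((new_node, target))
    attachment_pool.extend([new_node, target])

degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

print("best-connected nodes:", degree.most_common(5))
print("nodes with a single link:", sum(1 for d in degree.values() if d == 1))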

The post Scale-free Networks: Complex Webs in Nature and Technology appeared first on CERN Courier.

]]>
Review This book presents the experimental evidence for scale-free networks and provides students and researchers with theoretical results and algorithms to analyse and understand these features. https://cerncourier.com/wp-content/uploads/2022/08/51jd49H1w3L._SX351_BO1204203200_.jpg
Principles of Quantum Computation and Information: Volume II: Basic Tools and Special Topics https://cerncourier.com/a/principles-of-quantum-computation-and-information-volume-ii-basic-tools-and-special-topics/ Mon, 20 Oct 2008 08:09:48 +0000 https://preview-courier.web.cern.ch/?p=105029 Building on the basic concepts introduced in Volume I, this second volume deals with various important aspects, both theoretical and experimental, of quantum computation and information in depth.

The post Principles of Quantum Computation and Information: Volume II: Basic Tools and Special Topics appeared first on CERN Courier.

]]>
by Giuliano Benenti, Giulio Casati and Giuliano Strini, World Scientific. Hardback ISBN 9789812563453 £33 ($58). Paperback ISBN 9789812565280 £22 ($38).

Quantum computation and information is a new, rapidly developing interdisciplinary field. Building on the basic concepts introduced in Volume I, this second volume deals with various important aspects, both theoretical and experimental, of quantum computation and information in depth. The areas include quantum data compression, accessible information, entanglement concentration, limits to quantum computation due to decoherence, quantum error-correction, and the first experimental implementations of quantum information protocols. This volume also includes a selection of special topics, including quantum trajectories, quantum computation and quantum chaos, and the Zeno effect.

The post Principles of Quantum Computation and Information: Volume II: Basic Tools and Special Topics appeared first on CERN Courier.

]]>
Review Building on the basic concepts introduced in Volume I, this second volume deals with various important aspects, both theoretical and experimental, of quantum computation and information in depth. https://cerncourier.com/wp-content/uploads/2022/08/31EXnbleFcL._AC_SY780_.jpg
LHC computing: Milestones (archive) https://cerncourier.com/a/lhc-computing-milestones-archive/ https://cerncourier.com/a/lhc-computing-milestones-archive/#respond Fri, 19 Sep 2008 09:34:16 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/lhc-computing-milestones-archive/ LHC computing articles from the archive.

The post LHC computing: Milestones (archive) appeared first on CERN Courier.

]]>
The Grid gets EU funds

Plans for the next generation of network-based information-handling systems took a major step forward when the European Union’s Fifth Framework Information Society Technologies programme concluded negotiations to fund the Data Grid research and development project. The project was submitted to the EU by a consortium of 21 bodies involved in a variety of sciences, from high-energy physics to Earth observation and biology, as well as computer sciences and industry. CERN is the leading and coordinating partner in the project.

Starting from this year, the Data Grid project will receive in excess of €9.8 million for three years to develop middleware (software) to deploy applications on widely distributed computing systems. In addition to receiving EU support, the enterprise is being substantially underwritten by funding agencies from a number of CERN’s member states. Due to the large volume of data that it will produce, CERN’s LHC collider will be an important component of the Data Grid.

As far as CERN is concerned, this programme of work will integrate well into the computing testbed activity that is already planned for the LHC. Indeed, the model for the distributed computing architecture that Data Grid will implement is largely based on the results of the MONARC (Models of Networked Analysis at Regional Centres for LHC experiments) project.

The work that the project will involve has been divided into numbered subsections, or “work packages” (WP). CERN’s main contribution will be to three of these work packages: WP 2, dedicated to data management and data replication; WP 4, which will look at computing-fabric management; and WP 8, which will deal with high-energy physics applications. Most of the resources for WP 8 will come from the four major LHC experimental collaborations: ATLAS, CMS, ALICE and LHCb.

Other work will cover areas such as workload management (coordinated by the INFN in Italy), monitoring and mass storage (coordinated in the UK by the PPARC funding authority and the UK Rutherford Appleton Laboratory) and testbed and networking (coordinated in France by IN2P3 and the CNRS).

March 2001 p5 (abridged).

 

The Gigabyte System Network

To mark the major international Telecom ’99 exhibition in Geneva, CERN staged a demonstration of the world’s fastest computer-networking standard, the Gigabyte System Network. This is a new networking standard developed by the High-Performance Networking Forum, which is a worldwide collaboration between industry and academia. Telecom ’99 delegates came to CERN to see the new standard in action.

GSN is the first networking standard capable of handling the enormous data rates expected from the LHC experiments. It has a capacity of 800 Mbyte/s (that’s getting on for a full-length feature film), making it attractive beyond the realms of scientific research. Internet service providers, for example, expect to require these data rates to supply high-quality multimedia across the Internet within a few years. Today, however, most home network users have to be content with 5 kbyte/s, or about a single frame. Even CERN, one of Europe’s largest networking centres, currently has a total external capacity of only 22 Mbyte/s.

November 1999 p10 (abridged).

 

Approval for Grid project for LHC computing

The first phase of the impressive Computing Grid project for CERN’s LHC was approved at a special meeting of CERN’s Council, its governing body, on 20 September.


October 2001 p32 (extract).

After LHC commissioning, the collider’s four giant detectors will be accumulating more than 10 million Gbytes of particle-collision data each year (equivalent to the contents of about 20 million CD-ROMs). To handle this will require a thousand times the computing power available at CERN today.

Nearly 10 000 scientists, at hundreds of universities round the world, will group in virtual communities to analyse this LHC data. The strategy relies on the coordinated deployment of communications technologies at hundreds of institutes via an intricately interconnected worldwide grid of tens of thousands of computers and storage devices.

The LHC Computing Grid project will proceed in two phases. Phase 1, to be activated in 2002 and continuing in 2003 and 2004, will develop the prototype equipment and techniques necessary for the data-intensive scientific computing of the LHC era. In 2005, 2006 and 2007, Phase 2 of the project, which will build on the experience gained in the first phase, will construct the production version of the LHC Computing Grid.

Phase 1 will require an investment at CERN of SFr30 million (some €20 million), which will come from contributions from CERN’s member states and major involvement of industrial sponsors. More than 50 positions for young professionals will be created. Significant investments are also being made by participants in the LHC programme, particularly in the US and Japan, as well as Europe.

November 2001 p5 (abridged).

The post LHC computing: Milestones (archive) appeared first on CERN Courier.

]]>
https://cerncourier.com/a/lhc-computing-milestones-archive/feed/ 0 Feature LHC computing articles from the archive. https://cerncourier.com/wp-content/uploads/2008/09/CCCom2_10_08.jpg
LHC computing: Switching on to the Grid (archive) https://cerncourier.com/a/lhc-computing-switching-on-to-the-grid-archive/ https://cerncourier.com/a/lhc-computing-switching-on-to-the-grid-archive/#respond Fri, 19 Sep 2008 09:33:13 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/lhc-computing-switching-on-to-the-grid-archive/ When CERN's LHC collider begins operation, it will be the most powerful machine of its type in the world, providing research facilities for thousands of researchers from all over the globe.

The post LHC computing: Switching on to the Grid (archive) appeared first on CERN Courier.

]]>

When CERN’s LHC collider begins operation, it will be the most powerful machine of its type in the world, providing research facilities for thousands of researchers from all over the globe.

The computing capacity required for analysing the data generated by these big LHC experiments will be several orders of magnitude greater than that used by current experiments at CERN, itself already substantial. Satisfying this vast data-processing appetite will require the integrated use of computing facilities installed at several research centres across Europe, the US and Asia.

During the last two years the Models of Networked Analysis at Regional Centres for LHC Experiments (MONARC) project, supported by a number of institutes participating in the LHC programme, has been developing and evaluating models for LHC computing. MONARC has also developed tools for simulating the behaviour of such models when implemented in a wide-area distributed computing environment. This requirement arrived on the scene at the same time as a growing awareness that major new projects in science and technology need matching computer support and access to resources worldwide.

In the 1970s and 1980s the Internet grew up as a network of computer networks, each established to service specific communities and each with a heavy commitment to data processing.

In the late 1980s the World Wide Web was invented at CERN to enable particle physicists scattered all over the globe to access information and participate actively in their research projects directly from their home institutes. The amazing synergy of the Internet, the boom in personal computing and the growth of the Web grips the whole world in today’s dot.com lifestyle.

Internet, Web, what next?

However, the Web is not the end of the line. New thinking for the millennium, summarized in a milestone book entitled The Grid by Ian Foster of Argonne and Carl Kesselman of the Information Sciences Institute of the University of Southern California, aims to develop new software (“middleware”) to handle computations spanning widely distributed computational and information resources – from supercomputers to individual PCs.

Just as a grid for electric power supply brings watts to the wallplug in a way that is completely transparent to the end user, so the new data Grid will do the same for information.

Each of the major LHC experiments – ATLAS, CMS and ALICE – is estimated to require computer power equivalent to 40,000 of today’s PCs. Adding LHCb to the equation gives a total equivalent of 140,000 PCs, and this is only for day 1 of the LHC.

Within about a year this demand will have grown by 30%. The demand for data storage is equally impressive, calling for several thousand terabytes – more information than is contained in the combined telephone directories for the populations of millions of planets. With users across the globe, this represents a new challenge in distributed computing. For the LHC, each experiment will have its own central computer and data storage facilities at CERN, but these have to be integrated with regional computing centres accessed by the researchers from their home institutes.

CERN serves as Grid testbed

As a milestone en route to this panorama, an interim solution is being developed, with a central facility at CERN complemented by five or six regional centres and several smaller ones, so that computing can ultimately be carried out on a cluster in the user’s research department. To see whether this proposed model is on the right track, a testbed is to be implemented using realistic data.

Several nations have launched new Grid-oriented initiatives – in the US by NASA and the National Science Foundation, while in Europe particle physics provides a natural focus for work in, among others, the UK, France, Italy and Holland. Other areas of science, such as Earth observation and bioinformatics, are also on board. In Europe, European Commission funding is being sought to underwrite this major effort to propel computing into a new orbit.

June 2000 pp17–18.

The post LHC computing: Switching on to the Grid (archive) appeared first on CERN Courier.

]]>
https://cerncourier.com/a/lhc-computing-switching-on-to-the-grid-archive/feed/ 0 Feature When CERN's LHC collider begins operation, it will be the most powerful machine of its type in the world, providing research facilities for thousands of researchers from all over the globe. https://cerncourier.com/wp-content/uploads/2008/09/CCCom1_10_08.jpg
EGEE steps up a gear with a third phase of Grid infrastructure https://cerncourier.com/a/egee-steps-up-a-gear-with-a-third-phase-of-grid-infrastructure/ https://cerncourier.com/a/egee-steps-up-a-gear-with-a-third-phase-of-grid-infrastructure/#respond Tue, 20 May 2008 11:16:38 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/egee-steps-up-a-gear-with-a-third-phase-of-grid-infrastructure/ Enabling Grids for E-sciencE (EGEE) is the largest multidisciplinary Grid infrastructure in the world, covering research fields from particle physics to biomedicine. Now the project has begun its third phase, EGEE III.

The post EGEE steps up a gear with a third phase of Grid infrastructure appeared first on CERN Courier.

]]>
Enabling Grids for E-sciencE (EGEE) is the largest multidisciplinary Grid infrastructure in the world, covering research fields from particle physics to biomedicine. Now the project has begun its third phase, EGEE III.

This phase aims to expand and optimize the Grid infrastructure, which is currently used more than 150,000 times per day by scientific users. Co-funded by the European Commission, EGEE III brings together more than 120 organizations to produce a reliable and scalable computing resource available to the European and global research community. At present it consists of 250 sites in 48 countries and more than 60,000 CPUs with more than 20 petabytes of storage, available to some 8000 users 24 hours a day, seven days a week.

These figures considerably exceed the goals planned for the end of the first four years of the EGEE programme, demonstrating the enthusiasm in the scientific community for EGEE and Grid solutions. Ultimately EGEE would like to see a unified, interoperable Grid infrastructure, and with this goal in mind it is working closely with other European and worldwide Grid projects to help to define the standards necessary to make this happen.

The tools and techniques used in one discipline can often be recycled and used elsewhere, by other scientists, or even in the world of business and finance, where EGEE is employed to find new oil reserves, simulate market behaviour and map taxation policy.

EGEE will hold its next conference, EGEE ’08, in Istanbul on 22–26 September 2008. The conference will provide an opportunity for business and academic sectors to network with the EGEE communities, collaborating projects, developers and decision makers, to realize the vision of a sustainable, interoperable European Grid.

The post EGEE steps up a gear with a third phase of Grid infrastructure appeared first on CERN Courier.

]]>
https://cerncourier.com/a/egee-steps-up-a-gear-with-a-third-phase-of-grid-infrastructure/feed/ 0 News Enabling Grids for E-sciencE (EGEE) is the largest multidisciplinary Grid infrastructure in the world, covering research fields from particle physics to biomedicine. Now the project has begun its third phase, EGEE III.
Detector controls for LHC experiments https://cerncourier.com/a/detector-controls-for-lhc-experiments/ https://cerncourier.com/a/detector-controls-for-lhc-experiments/#respond Thu, 13 Mar 2008 13:58:38 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/detector-controls-for-lhc-experiments/ Wayne Salter explains the basis for a new collaborative approach

The post Detector controls for LHC experiments appeared first on CERN Courier.

]]>
Traditionally at CERN, teams on each experiment, and in some cases each subdetector, have independently developed a detector-control system (DCS) – sometimes known as “slow controls”. This was still the case for the experiments at LEP. However, several factors – the number and geographical distribution of development teams, the size and complexity of the systems, limited resources, the long lifetime (20 years) and the perceived similarity between the required systems – led to a change in philosophy. CERN and the experiments’ management jointly decided to develop, as much as possible, a common DCS for the LHC experiments. This led to the setting up in 1997 of the Joint Controls Project (JCOP) as a collaboration between the controls teams on the LHC experiments and the support groups in CERN’s information technology and physics departments.

The early emphasis in JCOP was on the difficult task of acquiring an understanding of the needs of the experiments and agreeing on common developments and activities. This was a period where disagreements were most prevalent. However, with time the collaboration improved and so did progress. Part of this early effort was to develop a common overall architecture that would become the basis of many of the later activities.

The role of JCOP

In parallel, the JCOP team undertook evaluations to assess the suitability of a number of technologies, primarily commercial ones, such as OLE for Process Control (OPC), the field buses CANBus and ProfiBus, commercial programmable logic controllers (PLCs), as well as supervisory control and data acquisition (SCADA) products. The evaluation of SCADA products eventually led to the selection of the Prozessvisualisierungs und Steuerungs System (PVSS) tool as a major building block for the DCS for experiments. The CERN Controls Board subsequently selected PVSS as the recommended SCADA system for CERN. In addition, and where suitable commercial solutions were not available, products developed at CERN were also evaluated. This led to JCOP’s adoption and support of CERN’s distributed information manager (DIM) middleware system and the SMI++ finite-state machine (FSM) toolkit. Furthermore, developments made in one experiment were also adopted by other experiments. The best example of this is the embedded local monitor board (ELMB) developed by ATLAS. This is a small, low-cost, high-density radiation-tolerant input/output card that is now used extensively in all LHC experiments, as well as in some others.


One major thrust has been the development of the so-called JCOP framework (FW) (figure 1). Based on specifications agreed with the experiments, this provides a customized layer on top of the technologies chosen, such as PVSS, SMI++ and DIM. It offers many ready-to-use components for the control and monitoring of standard devices in the experiments (e.g. CAEN high voltage, Wiener and CAEN low voltage, the ELMB and racks). The FW also extends the functionality of the underlying tools, such as the configuration database tool and installation tool.

These developments were not only the work of the CERN support groups but also depended on contributions from the experiments. In this way the development and maintenance was done once and used by many. This centralized development has not only significantly reduced the overall development effort but will also ease the long-term maintenance – an issue typically encountered by experiments in high-energy physics where short-term collaborators do much of the development work.

As figure 1 shows, the JCOP FW has been developed in layers based on a component model. In this way each layer builds on the facilities offered by the layer below, allowing subdetector groups to pick and choose between the components on offer, taking only those that they require. The figure also illustrates how the JCOP FW, although originally designed and implemented for the LHC experiments, can be used by other experiments and applications owing to the approach adopted. Some components in particular have been incorporated into the unified industrial control system (UNICOS) FW, developed within the CERN accelerator controls group (Under control: keeping the LHC beams on track). The UNICOS FW, initially developed for the LHC cryogenics control system, is now used for many applications in the accelerator domain and as the basis for the gas-control systems (GCS) for the LHC experiments.

In addition to these development and support activities, JCOP provides an excellent forum for technical discussions and the sharing of experience across experiments. There are regular meetings, both at the managerial and the technical levels, to exchange information and discuss issues of concern for all experiments. A number of more formal workshops and reviews have also taken place involving experts from non-LHC experiments to ensure the relevance and quality of the products developed. Moreover, to maximize the efficiency and use of PVSS and the JCOP FW, JCOP offers tailor-made training courses. This is particularly important because the subdetector-development teams have a high turnover of staff for their controls applications. To date, several hundred people have attended these courses.

As experiments have not always tackled issues at the same time, this common approach has allowed them to benefit from the experience of the first experiment to address a particular problem. In addition, JCOP has conducted a number of test activities, which cover the testing of commonly used commercial applications, such as various OPC servers, as well as the scalability of many of the supported tools. Where the tests indicated problems, this provided feedback for the tool developers, including the commercial suppliers. This in turn resulted in significant improvements in the products.


Building on the framework

Although JCOP provides the basic building blocks and plenty of support, there is still considerable work left for the subdetector teams around the world who build the final applications. This is a complex process because there are often several geographically distributed groups working on a single subdetector application, and all of the applications must eventually be brought together and integrated into a single homogeneous DCS. For this to be possible, the often small central experiment controls teams play a significant role (figure 2). They not only participate extensively in the activities of JCOP, but also have other important tasks to perform, including development of guidelines and recommendations for the subdetector developments, to ensure easy integration; customization and extension of the JCOP FW for the experiment’s specific needs (e.g. specific hardware used in their experiment but not in the others); support and consultation for the subdetector teams; development of applications for the monitoring and control of the general experiment infrastructure e.g. for the control of racks and environmental monitoring.

As well as selecting, developing and supporting tools to ease the development of the DCSs, there have been two areas where complete applications have been developed. These are the detector safety systems (DSS) and the gas control systems (GCS). The DSS, which is based on redundant Siemens PLCs and PVSS, has been developed in a data-driven manner that allows all four LHC experiments to configure it to their individual needs. Although not yet fully configured, the DSS is now deployed in the four experiments and has been running successfully in some for more than a year.

The approach for the GCS goes one step further. It is also based on PLCs (Schneider Premium) and PVSS, but the PLC and PVSS code of the 23 GCSs is generated automatically using a model-based development technique. In simple terms, there is a generic GCS model that includes all possible modules and options, and each GCS application is defined by a particular combination of these modules and options. The final GCS application is created by selecting from a set of predefined components and configuring them appropriately using an application builder created for the purpose. All 23 GCSs have been generated in this way and are now deployed.
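
As a rough illustration of the model-based idea, the short sketch below assembles a toy gas-control application from a library of predefined modules. It is only a caricature: the module names, options and builder interface are invented for this example and do not describe the real GCS model, which generates PLC and PVSS code rather than Python.

# Toy sketch of model-based application generation (illustrative only;
# the module names and options are hypothetical, not the real CERN GCS model).
MODULE_LIBRARY = {
    "mixer":        "gas-mixing unit",
    "purifier":     "gas-purification unit",
    "distribution": "distribution rack",
    "analysis":     "gas-analysis station",
}

def build_gcs_application(name, modules, options):
    """Assemble one application from a chosen combination of predefined modules."""
    unknown = [m for m in modules if m not in MODULE_LIBRARY]
    if unknown:
        raise ValueError(f"unknown modules: {unknown}")
    lines = [f"# application: {name}"]
    lines += [f"# include {m}: {MODULE_LIBRARY[m]}" for m in modules]
    lines += [f"# option {k} = {v}" for k, v in options.items()]
    return "\n".join(lines)

# A hypothetical subdetector gas system: one particular combination of modules.
print(build_gcs_application("example_subdetector_gas",
                            modules=["mixer", "purifier", "distribution"],
                            options={"channels": 8, "redundant_plc": True}))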


At the time of writing, the four LHC experiment collaborations were all heavily engaged in the commissioning of their detectors and control systems. To date, the integration of the various subdetector-control systems has proceeded relatively smoothly, owing to the homogeneous nature of the subdetector implementations. However, that is not to say that it has been problem free. Some issues of scaling and performance have emerged as the systems have increased in size, with more and more of the detectors being commissioned. However, thanks to the JCOP collaboration, it has been possible to address these issues in common for all experiments.

Despite some initial difficulties, the players involved see the approach described in this article, as well as the JCOP collaboration, as a success. The key here has been the building of confidence between the central team and its clients through the transparency of the procedures used to manage the project. All of the partners need to understand what is being done, what resources are available and that the milestones will be adhered to. The benefits of this collaborative approach include less overall effort, through the avoidance of duplicate development; each central DCS and subdetector team can concentrate on their own specific issues; easier integration between developments; sharing of knowledge and experience between the various teams; and greater commonality between the experiment systems enables the provision of central support. In addition, it is easier to guarantee long-term maintenance with CERN-based central support. Compared with previous projects, JCOP has led to a great deal of commonality between the LHC experiments’ DCSs, and it seems likely that with more centralized resources even more could have been achieved in common.

Could the JCOP approach be applied more widely to other experiment systems? If the project has strong management, then I believe so. Indeed, the control system based on this approach for the LHCb experiment is not limited to the DCS but also covers the complete experiment-control system, which includes the trigger, data-acquisition and readout systems as well as the overall run control. Only time will tell if this approach can, and will, be applied more extensively in future projects.

The post Detector controls for LHC experiments appeared first on CERN Courier.

]]>
https://cerncourier.com/a/detector-controls-for-lhc-experiments/feed/ 0 Feature Wayne Salter explains the basis for a new collaborative approach https://cerncourier.com/wp-content/uploads/2008/03/CCexp1_03_08.jpg
Under control: keeping the LHC beams on track https://cerncourier.com/a/under-control-keeping-the-lhc-beams-on-track/ https://cerncourier.com/a/under-control-keeping-the-lhc-beams-on-track/#respond Thu, 13 Mar 2008 13:39:39 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/under-control-keeping-the-lhc-beams-on-track/ Pierre Charrue describes the infrastructure to ensure correct operation

The post Under control: keeping the LHC beams on track appeared first on CERN Courier.

]]>
The scale and complexity of the Large Hadron Collider (LHC) under construction at CERN are unprecedented in the field of particle accelerators. It has the largest number of components and the widest diversity of systems of any accelerator in the world. As many as 500 objects around the 27 km ring, from passive valves to complex experimental detectors, could in principle move into the beam path in either the LHC ring or the transfer lines. Operation of the machine will be extremely complicated for a number of reasons, including critical technical subsystems, a large parameter space, real-time feedback loops and the need for online magnetic and beam measurements. In addition, the LHC is the first superconducting accelerator built at CERN and will use four large-scale cryoplants with 1.8 K refrigeration capability.


The complexity means that repairs of any damaged equipment will take a long time. For example, it will take about 30 days to change a superconducting magnet. Then there is the question of damage if systems go wrong. The energy stored in the beams and magnets is more than twice the levels of other machines. That accumulated in the beam could, for example, melt 500 kg of copper. All of this means that the LHC machine must be protected at all costs. If an incident occurs during operation, it is critical that it is possible to determine what has happened and trace the cause. Moreover, operation should not resume if the machine is not back in a good working state.
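
A back-of-the-envelope check makes the copper figure plausible. Assuming a nominal stored energy of roughly 350 MJ in one beam (a round number assumed for this estimate, not quoted above) and standard thermal properties of copper:

# Energy needed to heat 500 kg of copper from room temperature to its melting
# point and then melt it; compare with an assumed ~350 MJ of stored beam energy.
mass_g = 500e3              # 500 kg in grams
c_p = 0.385                 # J/(g K), specific heat of copper
delta_T = 1358 - 300        # K, room temperature up to the melting point
latent = 205                # J/g, latent heat of fusion of copper

energy_needed = mass_g * (c_p * delta_T + latent)
print(f"{energy_needed / 1e6:.0f} MJ")   # about 306 MJ – below the assumed stored beam energy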


The accelerator controls group at CERN has spent the past four years developing a new software and hardware control system architecture based on the many years of experience in controlling the particle injector chain at CERN. The resulting LHC controls infrastructure is based on a classic three-tier architecture: a basic resource tier that gathers all of the controls equipment located close to the accelerators; a middle tier of servers; and a top tier that interfaces with the operators (figure 1).


Complex architecture

The LHC Software Application (LSA) system covers all of the most important aspects of accelerator controls: optics (twiss, machine layout), parameter space, settings generation and management (generation of functions based on optics, functions and scalar values for all parameters), trim (coherent modifications of settings, translation from physics to hardware parameters), operational exploitation, hardware exploitation (equipment control, measurements) and beam-based measurements. The software architecture is based on three main principles (figure 2). It is modular (each module has high cohesion, providing a clear application program interface to its functionality), layered (with three isolated logical layers – database and hardware access layer, business layer, user applications) and distributed (when deployed in the three-tier configuration). It provides homogenous application software to operate the SPS accelerator, its transfer lines and the LHC, and it has already been used successfully in 2005 and 2006 to operate the Low Energy Ion Ring (LEIR) accelerator, the SPS and LHC transfer lines.
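
The layering can be caricatured in a few lines: the operator-facing layer asks for a change in a physics parameter, the business layer translates it into hardware settings, and the lowest layer talks to the equipment. All class names, method names and calibration numbers below are invented for illustration and are not the actual LSA interfaces.

# Caricature of a layered settings "trim" (hypothetical names, not the real LSA API).
class HardwareAccessLayer:
    def send(self, device, value):
        print(f"set {device} -> {value:+.3f}")

class BusinessLayer:
    """Translates a physics-level change into settings for individual circuits."""
    def __init__(self, hw):
        self.hw = hw
        # toy calibration: current change per unit change of the physics parameter
        self.calibration = {"QF.CIRCUIT": 12.5, "QD.CIRCUIT": -11.8}

    def trim(self, physics_delta):
        for circuit, gain in self.calibration.items():
            self.hw.send(circuit, gain * physics_delta)

class OperatorConsole:
    def __init__(self, business):
        self.business = business

    def request_tune_change(self, delta):
        self.business.trim(delta)

OperatorConsole(BusinessLayer(HardwareAccessLayer())).request_tune_change(0.01)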

The front-end hardware of the resource tier consists of 250 VMEbus64x sub-racks and 120 industrial PCs distributed in the surface buildings around the 27 km ring of the LHC. The mission of these systems is to perform direct real-time measurements and data acquisition close to the machine, and to deliver this information to the application software running in the upper levels of the control system. These embedded systems use home-made hardware and commercial off-the-shelf technology modules, and they serve as managers for various types of fieldbus such as WorldFIP, a deterministic bus used for the real-time control of the LHC power converters and the quench-protection system. All front ends in the LHC have a built-in timing receiver that guarantees synchronization to within 1 μs. This is required for time tagging of post-mortem data. The tier also covers programmable logic controllers, which drive various kinds of industrial actuator and sensor for systems, such as the LHC cryogenics systems and the LHC vacuum system.

The middle tier of the LHC controls system is mostly located in the Central Computer Room, close to the CERN Control Centre (CCC). This tier consists of various servers: application servers, which host the software required to operate the LHC beams and run the supervisory control and data acquisition (SCADA) systems; data servers that contain the LHC layout and the controls configuration, as well as all of the machine settings needed to operate the machine or to diagnose machine behaviours; and file servers containing the operational applications. More than 100 servers provide all of these services. The middle tier also includes the central timing that provides the information for cycling the whole complex of machines involved in the production of the LHC beam, from the linacs onwards.


At the top level – the presentation tier – consoles in the CCC run GUIs that will allow machine operators to control and optimize the LHC beams and supervise the state of key systems. Dedicated displays provide real-time summaries of key machine parameters. The CCC is divided into four “islands”, each devoted to a specific task: CERN’s PS complex; the SPS; technical services; and the LHC. Each island is made of five operational consoles and a typical LHC console is composed of five computers (figure 3). These are PCs running interactive applications, fixed displays and video displays, and they include a dedicated PC connected only to the public network. This can be used for general office activities such as e-mail and web browsing, leaving the LHC control system isolated from exterior networks.

Failsafe mechanisms

In building the infrastructure for the LHC controls, the controls groups developed a number of technical solutions to the many challenges facing them. Security was of paramount concern: the LHC control system must be protected, not only from external hackers, but also from inadvertent errors by operators and failures in the system. The Computing and Network Infrastructure for Controls is a CERN-wide working group set up in 2004 to define a security policy for all of CERN, including networking aspects, operating systems configuration (Windows and Linux), services and support (Lüders 2007). One of the group’s major outcomes is the formal separation of the general-purpose network and the technical network, where connection to the latter requires the appropriate authorization.


Another solution has been to deploy, in close collaboration with Fermilab, “role-based” access (RBAC) to equipment in the communication infrastructure. The main motivation to have RBAC in a control system is to prevent unauthorized access and provide an inexpensive way to protect the accelerator. A user is prevented from entering the wrong settings – or from even logging into the application at all. RBAC works by giving people roles and assigning permissions to those roles to make settings. An RBAC token – containing information about the user, the application, the location, the role and so on – is obtained during the authentication phase (figure 4). This is then attached to any subsequent access to equipment and is used to grant or deny the action. Depending on the action made, who is making the call and from where, and when it is executed, access will be either granted or denied. This allows for filtering, control and traceability of modifications to the equipment.
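
Stripped to its essentials, the check compares the role carried by the token against a permission map before a setting is accepted. The sketch below is a deliberately minimal illustration; the real token carries more context (application, expiry, signature and so on) and the roles and permissions shown here are hypothetical.

# Minimal role-based access sketch (illustrative; not the CERN/Fermilab RBAC code).
from dataclasses import dataclass

# role -> actions that the role may perform on equipment (hypothetical mapping)
PERMISSIONS = {
    "operator": {"read", "set"},
    "expert":   {"read", "set", "reset"},
    "observer": {"read"},
}

@dataclass
class Token:
    user: str
    role: str
    location: str   # part of the real token; not used in this simplified check

def access_allowed(token, action):
    return action in PERMISSIONS.get(token.role, set())

token = Token(user="jsmith", role="observer", location="CCC")
print(access_allowed(token, "set"))    # False – observers may not change settings
print(access_allowed(token, "read"))   # True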

An alarm service for the operation of all of the CERN accelerator chain and technical infrastructure exists in the form of the LHC Alarm SERvice (LASER). This is used operationally for the transfer lines, the SPS, the CERN Neutrinos to Gran Sasso (CNGS) project, the experiments and the LHC, and it has recently been adapted for the PS Complex (Sigerud et al. 2005). LASER provides the collection, analysis, distribution, definition and archiving of information about abnormal situations – fault states – either for dedicated alarm consoles, running mainly in the control rooms, or for specialized applications.

LASER does not actually detect the fault states. This is done by user surveillance programs, which run either on distributed front-end computers or on central servers. The service processes about 180,000 alarm events each day and currently has more than 120,000 definitions. It is relatively simple for equipment specialists to define and send alarms, so one challenge has been to keep the number of events and definitions to a practical limit for human operations, according to recommended best practice.


The controls infrastructure of the LHC and its whole injector chain spans large distances and is based on a diversity of equipment, all of which needs to be constantly monitored. When a problem is detected, the CCC is notified and an appropriate repair has to be proposed. The purpose of the diagnostics and monitoring (DIAMON) project is to provide the operators and equipment groups with tools to monitor the accelerator and beam controls infrastructure with easy-to-use first-line diagnostics, as well as to solve problems or help to decide on responsibilities for the first line of intervention.

The scope of DIAMON covers some 3000 “agents”. These are pieces of code, each of which monitors a part of the infrastructure, from the fieldbuses and frontends to the hardware of the control-room consoles. It uses LASER and works in two main parts: the monitoring part constantly checks all items of the controls infrastructure and reports on problems; while the diagnostic part displays the overall status of the controls infrastructure and proposes support for repairs.
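
Conceptually, each agent reports a status for its slice of the infrastructure and the diagnostic layer rolls these reports up into an overall picture. A minimal sketch of that roll-up (purely illustrative; the agent names and status values are invented) might look like this:

# Toy aggregation of monitoring-agent reports (illustrative only).
def overall_status(reports):
    """reports maps agent name -> 'OK', 'WARNING' or 'FAULT'."""
    if any(s == "FAULT" for s in reports.values()):
        return "FAULT"
    if any(s == "WARNING" for s in reports.values()):
        return "WARNING"
    return "OK"

reports = {"fieldbus.worldfip.A": "OK",
           "frontend.vme.SR3":    "WARNING",
           "console.ccc.lhc1":    "OK"}
print(overall_status(reports))                               # WARNING
print([name for name, s in reports.items() if s != "OK"])    # agents needing attention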

The frontend of the controls system has its own dedicated real-time frontend software architecture (FESA). This framework offers a complete environment for equipment specialists to design, develop, deploy and test equipment software. Despite the diversity of devices – such as beam-loss monitors, power converters, kickers, cryogenic systems and pick-ups – FESA has successfully standardized a high-level language and an object-oriented framework for describing and developing portable equipment software, at least across CERN’s accelerators. This reduces the time spent developing and maintaining equipment software and brings consistency across the equipment software deployed across all accelerators at CERN.

This article illustrates only some of the technical solutions that have been studied, developed and deployed in the controls infrastructure in the effort to cope with the stringent and demanding challenges of the LHC. This infrastructure has now been tested almost completely on machines and facilities that are already operational, from LEIR to the SPS and CNGS, and LHC hardware commissioning. The estimated collective effort amounts to some 300 person-years and a cost of SFr21 m. Part of the enormous human resource comes from international collaborations, the valuable contributions of which are hugely appreciated. Now the accelerator controls group is confident that they can meet the challenges of the LHC.

• This article is based on the author’s presentation at ICALEPCS ’07 (Control systems for big physics reach maturity).

The post Under control: keeping the LHC beams on track appeared first on CERN Courier.

]]>
https://cerncourier.com/a/under-control-keeping-the-lhc-beams-on-track/feed/ 0 Feature Pierre Charrue describes the infrastructure to ensure correct operation https://cerncourier.com/wp-content/uploads/2008/03/CCcon1_03_08-feature.jpg
Control systems for big physics reach maturity https://cerncourier.com/a/control-systems-for-big-physics-reach-maturity/ https://cerncourier.com/a/control-systems-for-big-physics-reach-maturity/#respond Thu, 13 Mar 2008 13:36:48 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/control-systems-for-big-physics-reach-maturity/ The 11th ICALEPCS conference looked at projects around the world.

The post Control systems for big physics reach maturity appeared first on CERN Courier.

]]>
Control systems are a huge feature of the operation of particle accelerators and other large-scale physics projects. They allow completely integrated operation, including the continuous monitoring of subsystems; display of statuses and alarms to operators; preparation and automation of scheduled operations; archiving data; and making all of the experimental data available to operators and system experts. The latest news from projects around the world formed the main focus of the 11th International Conference on Accelerator and Large Experimental Physics Control Systems (ICALEPCS), which took place on 13–19 October in Knoxville, Tennessee. More than 360 people from 22 countries attended the meeting hosted by the Oak Ridge National Laboratory (ORNL) and the Thomas Jefferson National Accelerator Facility at the Knoxville Conference Center. The 260 presentations, including 71 talks, confirmed the use of established technologies and reviewed their consolidation. Excellent poster sessions also provided plenty of opportunity for discussions with the authors during the coffee breaks.


The weekend prior to the conference saw three related meetings. Almost 50 people attended the Control System Cyber-Security workshop, where eight major laboratories presented and discussed current implementations and future prospects for securing control systems. All have acknowledged the risk and all follow a “defence-in-depth” approach, focusing on network protection and segregation, authorization and authentication, centralized PC installation schemes and collaboration between information-technology and controls experts.

Approaches to control systems

In parallel, 200 people attended meetings of the collaborations developing the open-source toolkits EPICS and TANGO. The EPICS collaboration in particular has grown since previous ICALEPCS meetings. The contributions presented at the conference showed that these two toolkits are the most widely used and are the predominant choice for many facilities. For example, EPICS has recently been selected for use at the Spallation Neutron Source (SNS) at ORNL, while the control system of the ALBA light source in Spain will be based on TANGO.

Alternative solutions employ commercial supervisory control and data acquisition (SCADA) products for control systems. This is the case, for example, at CERN, the Laser Mégajoule project and the SOLEIL synchrotron. At CERN, the cryogenics system for the LHC and the LHC experiments, among others, make extensive use of commercial SCADA systems. The combination of their use with appropriate software frameworks developed in common has largely facilitated the design and construction of these control systems. They are currently being scaled up to their final operational size – a task that has gone smoothly so far (Under control: keeping the LHC beams on track and Detector controls for LHC experiments).

Independent of the approach adopted, the controls community has focused strongly on the software-development process, taking an increasing interest in risk reduction, improved productivity and quality assurance, as well as outsourcing. The conference learned of many efforts for standardization and best practice, from the management of requirements to development, implementation and testing. Speakers from CERN, for example, explained the benefits of the adoption of the Agile design and programming methodology in the context of control-system development.


The ITER tokamak project in Cadarache, France, has taken an approach that uses a unified design to deal with the static and dynamic behaviour of subsystems. The operation of ITER requires the orchestration of up to 120 control systems, including all technical and plasma diagnostic systems. ITER will outsource a large fraction of these control systems, which will be procured “in kind” from the participating teams. Outsourcing also played a major role in the Australian Synchrotron and it involved co-operation between research institutions and industrial companies to enhance and optimize the functionality of their control-system products. Such collaboration needs the definition of strict acceptance criteria and deadlines, but it also allows outsourcing of the risk. The Mégajoule project tested its subcontracting process within a small “vertical slice”, before adapting all of the outsourcing and the integration process to the full-scale system. The Atacama Large Millimetric and Submillimetric Array has provided further lessons about the successful organization of a distributed team and integration of different objects. The project enforced a common software framework on all participating teams, and the integration process focused on functionality rather than on the subsystems.

In addition to the software frameworks for control systems, there are many plug-ins, tools and utilities under development, using, in particular, the Java language. For example, EPICS employs Java at all levels from the front-end Java input/output (I/O) controllers to the supervision layer. Java is now a top candidate for new developments, owing mostly to its productivity and portability, not only for graphical user interfaces (GUIs) and utilities but also for applications that are calculation intensive. The accelerator domain has integrated more advanced Java-related techniques successfully. SLAC, for example, has benefited from the open-source Eclipse technologies, and the Java-based open-source Spring is being deployed in the LHC accelerator control systems at CERN (Under control: keeping the LHC beams on track). However, somewhat contrarily to these common efforts, individual projects have also developed a variety of new electronic logbooks and custom GUI builders.

The flexibility and portability of Java are becoming increasingly combined with the extensible markup language XML. With interoperability in mind, the growing (and correct) usage of XML and associated technologies provides a good basis for openness, data exchange and automation, rather than simply for configuration.

An example of this openness is the adoption of industrial solutions for data management. Modern control systems have seen a rapid growth in data to be archived, together with rising expectations for performance and scalability. File-based or dedicated solutions for data management are reaching their limits, so these techniques are now being replaced by high-performance databases, such as Oracle and PostgreSQL. These databases not only record the parameters of control systems but are also used for administration, documentation management and equipment management. In addition to these well-established technologies, some users have chosen ingenious approaches. For example, SPring-8 in Japan has a geographic information system integrated into its accelerator management (figure 1). The Google Maps-like system allows equipment to be located, visualized and monitored in real time, and it has opened up interesting perspectives for the control systems community.

Hardware becomes soft

On the hardware side, VME equipment shows an increased use of embedded controllers, such as digital signal processors and field-programmable gate arrays. Their flexibility brings the controls software directly onto the front end, for example, as cross-compiled EPICS I/O controllers. The development of radiation-hard front ends, for example for the Compact Linear Collider study and the LHC at CERN, has presented other challenges. Timing systems have also had to face new challenges: the LHC requires independent and asynchronous timing cycles of arbitrary duration; timing distributions, for the accelerators at SOLEIL or the Los Alamos Neutron Science Center, for example, are based on common networks with broadcasting clocks and event-driven data; and modern free-electron lasers (FELs), such as at SPring-8, depend on timing accuracies of femtoseconds to achieve stable laser beams.

FELs and light sources were the main focus of several status reports at the conference. The X-ray FEL project at SPring-8 has implemented its control system in MADOCA, a framework that follows a three-tier control model. The layers consist of an interface layer based on DeviceNet programmable logic controllers and VME crates; communication middleware based on remote procedure calls; and Linux consoles for the GUIs. The control system for the Free-electron Laser in Hamburg (FLASH) at DESY provides bunch-synchronized data recording using a novel integration of a fast DAQ system. The FLASH collaboration carried out an evaluation of the front-end crates used in the telecoms industry, which suggested that they had more reliable operation and integrated management compared with VME crates. The collaboration for the ALICE experiment at the LHC reported on progress with its control system, which is currently being installed, commissioned and prepared for operation, due to start later this year. Other status reports came from the Facility for Antiproton and Ion Research at GSI and the Diamond Light Source in the UK.

The conference concluded with presentations about the new developments and future steps in the evolution of some of the major controls frameworks. These underlined that the ICALEPCS conference not only confirmed the use of established technologies and designs, in particular EPICS and TANGO, but also showed the success of commercial SCADA solutions. Control systems have become highly developed and the conference reviewed consolidation efforts and extensions thoroughly. The social programme included a dinner with bluegrass music and an excellent tour of the SNS, the world’s most intense pulsed accelerator-based neutron source, which rounded off the meeting nicely. Now the controls community awaits the 12th ICALEPCS conference, to be held in Kobe, Japan, in autumn 2009.

The post Control systems for big physics reach maturity appeared first on CERN Courier.

]]>
https://cerncourier.com/a/control-systems-for-big-physics-reach-maturity/feed/ 0 Feature The 11th ICALEPCS conference looked at projects around the world. https://cerncourier.com/wp-content/uploads/2008/03/CCica1_03_08.jpg
Symmetry breaking on a supercomputer https://cerncourier.com/a/symmetry-breaking-on-a-supercomputer/ https://cerncourier.com/a/symmetry-breaking-on-a-supercomputer/#respond Mon, 04 Jun 2007 08:36:59 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/symmetry-breaking-on-a-supercomputer/ The Japan Lattice QCD Collaboration has used numerical simulations to reproduce spontaneous chiral symmetry breaking (SCSB) in quantum chromodynamics (QCD).

The post Symmetry breaking on a supercomputer appeared first on CERN Courier.

]]>
The Japan Lattice QCD Collaboration has used numerical simulations to reproduce spontaneous chiral symmetry breaking (SCSB) in quantum chromodynamics (QCD). This idea underlies the widely accepted explanation for the masses of particles made from the lighter quarks, but it has not yet been proven theoretically starting from QCD. Now using a new supercomputer and an appropriate formulation of lattice QCD, Shoji Hashimoto from KEK and colleagues have realized an exact chiral symmetry on the lattice, and observe the effects of symmetry breaking.

Chiral symmetry distinguishes right-handed (right-spinning) quarks from left-handed ones and is exact only if the quarks move at the speed of light and are therefore massless. In 1961 Yoichiro Nambu and Giovanni Jona-Lasinio proposed the idea of SCSB, inspired by the Bardeen–Cooper–Schrieffer mechanism of superconductivity in which spin-up and spin-down electrons pair up and condense into a lower energy level. In QCD a quark and an antiquark pair up, leading to a vacuum full of condensed quark–antiquark pairs. The result is that chiral symmetry is broken, so that the quarks – and the particles they form – acquire masses.

In their simulation the group employed the overlap fermion formulation for quarks on the lattice, proposed by Herbert Neuberger in 1998. While this is an ideal formulation theoretically, it is numerically difficult to implement, requiring more than 100 times the computer power of other fermion formulations. However, the group was able to exploit the new IBM System BlueGene Solution supercomputer installed at KEK in March 2006, as well as steady improvements in numerical algorithms.


The group’s simulation included extremely light quarks in order to determine the low-lying eigenvalue spectrum of the Dirac operator. The results reproduce the theoretical predictions (see figure), indicating that chiral symmetry breaking gives rise to light pions that behave as expected.
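
For background, the standard link between the low-lying Dirac eigenvalues and the broken symmetry is the Banks–Casher relation, quoted here for orientation rather than taken from the report itself:

\Sigma \;\equiv\; -\langle \bar{q} q \rangle \;=\; \pi \rho(0),

where \rho(\lambda) is the spectral density (per unit volume) of the Dirac operator and the relation holds in the limits of infinite volume and vanishing quark mass. A non-zero density of eigenvalues near zero therefore implies a non-zero chiral condensate, which is why the eigenvalue spectrum is such a direct probe of spontaneous chiral symmetry breaking.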

The post Symmetry breaking on a supercomputer appeared first on CERN Courier.

]]>
https://cerncourier.com/a/symmetry-breaking-on-a-supercomputer/feed/ 0 News The Japan Lattice QCD Collaboration has used numerical simulations to reproduce spontaneous chiral symmetry breaking (SCSB) in quantum chromodynamics (QCD). https://cerncourier.com/wp-content/uploads/2007/06/CCnew11_06_07.jpg
Coupled-clusters point to faster computation https://cerncourier.com/a/coupled-clusters-point-to-faster-computation/ https://cerncourier.com/a/coupled-clusters-point-to-faster-computation/#respond Mon, 30 Apr 2007 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/coupled-clusters-point-to-faster-computation/ Calculations of the structure of heavy nuclei have long suffered from the difficulties presented by the sheer complexity of the many-body system, with all of its protons and neutrons.

The post Coupled-clusters point to faster computation appeared first on CERN Courier.

]]>
Calculations of the structure of heavy nuclei have long suffered from the difficulties presented by the sheer complexity of the many-body system, with all of its protons and neutrons. Using theory to make meaningful predictions requires massive datasets that tax even high-powered supercomputers. Recently researchers from Michigan State and Central Michigan universities have reported dramatic success in stripping away much of this complexity, reducing computational time from days or weeks to minutes or hours.

One way to tackle the many-body problem is first to construct mathematical functions that describe each particle, and then multiply these functions together to get some idea of the underlying physics of the system. This approach, the full configuration-interaction (CI) calculation, works well enough to describe light nuclei, but becomes extremely challenging with heavier elements. For example, calculating wave functions and energy levels for the pf-shell structure of 56Ni in effect means solving an equation with around 10⁹ variables.
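
A rough combinatorial count shows where numbers of that size come from. Treating 56Ni as eight valence protons and eight valence neutrons outside a 40Ca core, distributed over the 20 magnetic substates of the pf shell, and ignoring symmetry restrictions (fixing the total angular-momentum projection brings the figure down to roughly 10⁹):

# Order-of-magnitude size of the 56Ni pf-shell configuration space.
from math import comb

pf_substates = 20        # f7/2 + p3/2 + f5/2 + p1/2 magnetic substates
valence_protons = 8
valence_neutrons = 8

basis = comb(pf_substates, valence_protons) * comb(pf_substates, valence_neutrons)
print(f"{basis:.2e}")    # ~1.6e10 before symmetry restrictions are imposed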

Researchers face a similar problem in quantum chemistry in studying molecules with many dozens of interacting electrons. For several years, however, they have used a computationally cost-effective alternative to CI known as coupled-cluster (CC) theory, which was originally suggested in nuclear theory, but largely developed by quantum chemists and atomic and molecular physicists. Now the CC method is making its way back into nuclear physics, first in calculations of light nuclei, and most recently in developments for heavy nuclei. The key is correlation, the idea that some pairs of fermions in the system (whether nucleons or electrons) are strongly linked and related.
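
The economy of the method comes from its exponential ansatz; written out as standard background (not spelled out in the article itself),

|\Psi\rangle \;=\; e^{T} |\Phi_0\rangle, \qquad T = T_1 + T_2 + \dots,

where |\Phi_0\rangle is a reference Slater determinant and T_n creates n-particle–n-hole excitations. Truncating the cluster operator at low rank – for example T \approx T_1 + T_2, the CCSD approximation – keeps the number of unknowns growing only polynomially with system size while still capturing the dominant pair correlations.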

The researchers first used the Michigan State High Performance Computing Center and the Central Michigan Center for High Performance Scientific Computing for the several-week-long task of solving the CI equation describing 56Ni, to create a benchmark against which they could compare the results of the CC calculation (M Horoi et al. 2007). They then found that the CC theory produced nearly identical results and that the time spent crunching the numbers – on a standard laptop – was often measured in minutes or even seconds.

This research bodes well for next-generation nuclear science. Because of existing and planned accelerators around the world, the next few decades promise to yield many heavy isotopes for study. Theoretical models will need to keep pace with the expected avalanche of experimental data. To date, many such models have treated the nucleus as a relatively undifferentiated liquid, gas or other set of mathematical averages – all of which tends to gloss over subtle nuclear nuances. In contrast, coupled-cluster theory may be the only manageable and scalable model that takes a particle-by-particle approach.

The post Coupled-clusters point to faster computation appeared first on CERN Courier.

]]>
https://cerncourier.com/a/coupled-clusters-point-to-faster-computation/feed/ 0 News Calculations of the structure of heavy nuclei have long suffered from the difficulties presented by the sheer complexity of the many-body system, with all of its protons and neutrons.
Programming the Universe: A Quantum Computer Scientist Takes on the Cosmos https://cerncourier.com/a/programming-the-universe-a-quantum-computer-scientist-takes-on-the-cosmos/ Mon, 24 Jul 2006 08:18:04 +0000 https://preview-courier.web.cern.ch/?p=105304 Steve Reucroft reviews in 2006 Programming the Universe: A Quantum Computer Scientist Takes on the Cosmos.

The post Programming the Universe: A Quantum Computer Scientist Takes on the Cosmos appeared first on CERN Courier.

]]>
by Seth Lloyd, Alfred A Knopf. Hardback ISBN 1400040922, $25.95.


I borrowed this book from my local library a couple of months ago and found it so irritating that I gave up after the first few chapters. When I agreed to review it I decided that I’d better read it a little more thoroughly – amazingly, this time I really enjoyed it.

Some of the anecdotes and name-dropping are rather annoying and I’m not sure that I can embrace the central thesis – that our universe is a giant quantum computer (QC) computing itself. (I would have thought that the inherent randomness of things argues against the universe as computer.) However, the book does contain an unusually informative and quirky account of the theory of our surroundings, from small to large, and it is very entertaining and easy to read.

As a sort of theoretical theory book, it is not real science that we are looking at here. It takes the current theories of particle physics and cosmology, assumes that they are all correct and then constructs a new all-embracing theory. Somewhere in the book a claim is made that there are predictions, but I didn’t see any sign of them (theories that make no predictions are unfortunately getting more and more common these days).

Perhaps the book is way ahead of its time. The most important force in the universe is surely gravity, so when some future theorist has finally developed a quantum theory of gravity, then we might be ready for it.

I have heard that the intelligent-design people are unhappy with this book, but they shouldn’t be. Lloyd has presented them with a great opportunity: surely the hypothetical intelligent designer and the hypothetical programmer of the big hypothetical QC within which we live might be one and the same.

As alert readers of CERN Courier will already know, a recently published experimental result (O Hosten et al. 2006 Nature 439 949) has confirmed theoretical speculation and demonstrated that QCs compute the same whether they are on or off. So here’s an interesting thought: amazing as our universe is, if Lloyd is right it might not even have been turned on yet.

Indeed, it all reminds me a bit too much of Deep Thought and a misspent youth, but it’s a fun book. I guess it takes a quantum-mechanical engineer to view things in such an odd manner. I do recommend this book, but urge that you don’t take it too seriously. Also make sure that you read it twice and remember that the answer might well be 42.

The post Programming the Universe: A Quantum Computer Scientist Takes on the Cosmos appeared first on CERN Courier.

]]>
Review Steve Reucroft reviews in 2006 Programming the Universe: A Quantum Computer Scientist Takes on the Cosmos. https://cerncourier.com/wp-content/uploads/2006/07/CCEboo5_07-06.jpg
The dark side of computing power https://cerncourier.com/a/viewpoint-the-dark-side-of-computing-power/ Wed, 02 Nov 2005 00:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/viewpoint-the-dark-side-of-computing-power/ Urs Hölzle from Google points out that while the performance of commodity computer clusters continues to increase, so does their electrical power consumption.

The post The dark side of computing power appeared first on CERN Courier.

]]>
On a recent visit to CERN, I had the chance to see how the high-energy physics (HEP) community was struggling with many of the same sorts of computing problems that we have to deal with at Google. So here are some thoughts on where commodity computing may be going, and how organizations like CERN and Google could influence things in the right direction.


First a few words about what we do at Google. The Web consists of more than 10 billion pages of information. With an average of 10 kB of textual information per page, this adds up to around 100 TB. This is our data-set at Google. It is big, but tractable – it is apparently just a few days’ worth of data production from the Large Hadron Collider. So just like particle physicists have already found out, we need a lot of computers, disks, networking and software. And we need them to be cheap.

The switch to commodity computing began many years ago. The rationale is that single machine performance is not that interesting any more, since price goes up non-linearly with performance. As long as your problem can be easily partitioned – which is the case for processing Web pages or particle events – then you might as well use cheaper, simpler machines.

But even with cheap commodity computers, keeping costs down is a challenge. And increasingly, the challenge is not just hardware costs, but also reducing energy consumption. In the early days at Google – just five years ago – you would have been amazed to see cheap household fans around our data centre, being used just to keep things cool. Saving power is still the name of the game in our data centres today, even to the extent that we shut off the lights in them when no-one is there.

Let’s look more closely at the hidden electrical power costs of a data centre. Although chip performance keeps going up, and performance per dollar along with it, performance per watt is stagnant. In other words, the total power consumed in data centres is rising. Worse, the operational costs of commercial data centres are almost directly proportional to how much power is consumed by the PCs. And unfortunately, a lot of that power is wasted.

For example, while the system power of a dual-processor PC is around 265 W, cooling overhead adds another 135 W. Over four years, the power costs of running a PC can add up to half of the hardware cost. Yet this is a gross underestimate of real energy costs. It ignores issues such as inefficiencies of power distribution within the data centre. Globally, even ignoring cooling costs, you lose a factor of two in power from the point where electricity is fed into a data centre to the motherboard in the server.
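
These figures are easy to sanity-check. The short sketch below (Python) uses the wattages quoted above; the electricity tariff and hardware price are illustrative assumptions, not Google figures.

    # Rough four-year power cost of a dual-processor PC plus its cooling overhead.
    # Wattages are those quoted in the text; tariff and hardware cost are assumed.
    system_power_w = 265
    cooling_power_w = 135
    total_power_w = system_power_w + cooling_power_w      # 400 W, running continuously

    hours = 4 * 365 * 24                                  # four years of operation
    energy_kwh = total_power_w * hours / 1000.0           # about 14,000 kWh

    price_per_kwh = 0.10                                  # assumed tariff, dollars
    hardware_cost = 3000.0                                # assumed purchase price, dollars
    power_cost = energy_kwh * price_per_kwh               # about $1400
    print(power_cost, power_cost / hardware_cost)         # roughly half the hardware cost
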
Since I’m from a dotcom, an obvious business model has occurred to me: an electricity company could give PCs away – provided users agreed to run the PCs continuously for several years on the power from that company. Such companies could make a handsome profit!

A major inefficiency in the data centre is DC power supplies, which are typically about 70% efficient. At Google ours are 90% efficient, and the extra cost of this higher efficiency is easily compensated for by the reduced power consumption over the lifetime of the power supply.
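
To see what the difference between a 70% and a 90% efficient supply means in practice, here is a small illustrative comparison (Python); the 265 W load is the server figure quoted earlier, while the tariff and four-year lifetime are again assumptions.

    # Wall power drawn for the same DC load at two power-supply efficiencies.
    load_w = 265.0
    wall_typical_w = load_w / 0.70        # about 379 W from the wall at 70% efficiency
    wall_efficient_w = load_w / 0.90      # about 294 W from the wall at 90% efficiency

    saving_w = wall_typical_w - wall_efficient_w          # about 85 W per machine
    kwh_saved = saving_w * 4 * 365 * 24 / 1000.0          # about 2900 kWh over four years
    print(saving_w, kwh_saved, kwh_saved * 0.10)          # ~$290 saved at an assumed $0.10/kWh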

Part of Google’s strategy has been to work with our component vendors to get more energy-efficient equipment to market earlier. For example, most motherboards have three DC voltage inputs, for historical reasons. Since the processor actually works at a voltage different from all three of these, this is very inefficient. Reducing this to one DC voltage produces savings, even if there are initial costs involved in getting the vendor to make the necessary changes to their production. The HEP community ought to be in a similar position to squeeze extra mileage out of equipment from established vendors.

Tackling power-distribution losses and cooling inefficiencies in conventional data centres also means improving the physical design of the centre. We employ mechanical engineers at Google to help with this, and yes, the improvements they make in reducing energy costs amply justify their wages.

While I’ve focused on some negative trends in power consumption, there are also positive ones. The recent switch to multicore processors was a successful attempt to reduce processors’ runaway energy consumption. But Moore’s law keeps gnawing away at any ingenious improvement of this kind. Ultimately, power consumption is likely to become the most critical cost factor for data-centre budgets, as energy prices continue to rise worldwide and concerns about global warming put increasing pressure on organizations to use electrical power more efficiently.

Of course, there are other areas where the cost of running data centres can be greatly optimized. For example, networking equipment lacks commodity solutions, at least at the data-centre scale. And better software to turn unreliable PCs into efficient computing platforms can surely be devised.

In general, Google’s needs and those of the HEP community are similar. So I hope we can continue to exchange experiences and learn from each other.

The post The dark side of computing power appeared first on CERN Courier.

]]>
Opinion Urs Hölzle from Google points out that while the performance of commodity computer clusters continues to increase, so does their electrical power consumption. https://cerncourier.com/wp-content/uploads/2005/11/CCEvie1_11-05.jpg
In need of the human touch https://cerncourier.com/a/viewpoint-in-need-of-the-human-touch/ Tue, 29 Mar 2005 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/viewpoint-in-need-of-the-human-touch/ Software development is more than engineering and still needs the human factor to be successful, says Federico Carminati.

The post In need of the human touch appeared first on CERN Courier.

]]>
I have led software projects since 1987 and have never known one, including my own, that was not in a crisis. After thinking and reading about it and after much discussion I have become convinced that most of us write software each day for a number of reasons but without ever penetrating its innermost nature.

CCEvie1_04-05

A software project is primarily a programming effort, and this is done with a programming language. Now this is already an oxymoron. Programming is literally “writing beforehand”: it entails predicting or dictating the behaviour of something or someone. A language, on the other hand, is the vehicle of communication that in some ways carries its own negation, because it is a way of expressing concepts that are inevitably reinterpreted at the receiver’s end. How many times have you raged “Why does this stupid computer do what I tell it [or him or her according to your momentary mood toward one of the genders], and not what I want!?” A language is in fact a set of tools that have been developed through evolution not to “program” but to “interact”.

Moreover every programmer has his own “language” beyond the “programming language”. Many times on opening a program file and looking at the code, I have been able to recognize the author at once and feel sympathy (“Oh, this is my old pal…”) or its opposite (“Here he goes again with his distorted mind…”), as if opening a letter.

Now if only it were that simple. If several people are working on a project, you not only have to develop the program for the project but you also have to manage communication between its members and its customers via human and programming language.

This is where our friends the engineers say to us “Why don’t you build it like a bridge?” However, software engineering is one more oxymoron cast upon us. We could never build software like a bridge, no more than engineers could ever remove an obsolete bridge with a stroke of a key without leaving tons of scrap metal behind. Software engineering’s dream of “employing solid engineering processes on software development” is more a definition than a real target. We all know exactly why it has little chance of working in this way, but we cannot put it into words when we have coffee with our engineer friends. Again, language leaves us wanting.

Attempts to apply engineering to software have filled books with explanations of why it did not work and of how to do it right, which means that a solution is not at hand. The elements for success are known: planning, user-developer interaction, communication, and communication again. The problem is how to combine them into a winning strategy.

Then along came Linux and the open source community. Can an operating system be built without buying the land, building the offices, hiring hundreds of programmers and making a master plan for which there is no printer large enough? Can a few people in a garage outwit, outperform and eventually out-market the big ones? Obviously the answer is yes, and this is why Linux, “the glorified video game” to quote a colleague of mine, has carried a subversive message. I think we have not yet drawn all the lessons. I still hear survivors from recent software wrecks say: “If only we had been more disciplined in following The Plan…”

Is software engineering catching up? Agile technologies put the planning activity at the core of the process while minimizing the importance of “The Plan”, and emphasize the communication between developers and customers.

Have the “rules of the garage” finally been written? Not quite. Open source goes far beyond agile technologies by successfully bonding people who are collaborating on a single large project into a distributed community that communicates essentially by e-mail. Is constraining the communication to one single channel part of the secret? Maybe. What is certain is that in open source the market forces are left to act, and new features emerge and evolve in a Darwinian environment where the fittest survives. But this alone would not be enough for a successful software project.

A good idea that has not matured enough can be burned forever if it is exposed too early to the customers. Here judicious planning is necessary, and the determination and vision of the developer is still a factor in deciding when and how to inject his “creature” into the game. I am afraid (or rather I should say delighted) we are not close to seeing the human factor disappear from software development.

The post In need of the human touch appeared first on CERN Courier.

]]>
Opinion Software development is more than engineering and still needs the human factor to be successful, says Federico Carminati. https://cerncourier.com/wp-content/uploads/2005/03/CCEvie1_04-05.jpg
Computing at CERN: the mainframe era https://cerncourier.com/a/computing-at-cern-the-mainframe-era/ https://cerncourier.com/a/computing-at-cern-the-mainframe-era/#respond Sun, 05 Sep 2004 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/computing-at-cern-the-mainframe-era/ Chris Jones takes a look back at the heyday of the computer mainframe through a selection of "memory bytes".

The post Computing at CERN: the mainframe era appeared first on CERN Courier.

]]>
cernmain1_9-04

In June 1996 computing staff at CERN turned off the IBM 3090 for the last time, so marking the end of an era that had lasted 40 years. In May 1956 CERN had signed the purchasing contract for its first mainframe computer – a Ferranti Mercury with a clock cycle 200,000 times slower than modern PCs. Now, the age of the mainframe is gone, replaced by “scalable solutions” based on Unix “boxes” and PCs, and CERN and its collaborating institutes are in the process of installing several tens of thousands of PCs to help satisfy computing requirements for the Large Hadron Collider.

The Mercury was a first-generation vacuum tube (valve) machine with a 60 microsecond clock cycle. It took five cycles – 300 microseconds – to multiply 40-bit words and had no hardware division, a function that had to be programmed. The machine took two years to build, arriving at CERN in 1958, which was a year later than originally foreseen. Programming by users was possible from the end of 1958 with a language called Autocode. Input and output (I/O) was by paper tape, although magnetic tape units were added in 1962. Indeed, the I/O proved something of a limitation, for example when the Mercury was put to use in the analysis of paper tape produced by the instruments used to scan and measure bubble-chamber film. The work of the fast and powerful central processing unit (CPU) was held up by the sluggish I/O. By 1959 it was already clear that a more powerful system was needed to deal with the streams of data coming from the experiments at CERN.

cernmain2_9-04

The 1960s arrived at the computing centre initially in the form of an IBM 709 in January 1961. Although it was still based on valves, it could be programmed in FORTRAN, read instructions written on cards, and read and write magnetic tape. Its CPU was four to five times faster than that of the Mercury, but it came with a price tag of 10 million Swiss francs (in 1960 prices!). Only two years later it was replaced by an IBM 7090, a transistorized version of the same machine with a 2.18 microsecond clock cycle. This marked the end for the valve machines, and after a period in which it was dedicated to a single experiment at CERN (the Missing Mass Spectrometer), the Mercury was given to the Academy of Mining and Metallurgy in Krakow. With the 7090 the physicists could really take advantage of all the developments that had begun on the 709, such as on-line connection to devices including the flying spot digitizers to measure film from bubble and spark chambers. More than 300,000 frames of spark-chamber film were automatically scanned and measured in record time with the 7090. This period also saw the first on-line connection to film-less detectors, recording data on magnetic tape.

cernmain3_9-04

In 1965 the first CDC machine arrived at CERN – the 6600 designed by computer pioneer Seymour Cray, with a CPU clock cycle of 100 ns and a processing power 10 times that of the IBM 7090. With serial number 3, it was a pre-production series machine. It had disks more than 1 m in diameter – which could hold 500 million bits (64 megabytes) and subsequently made neat coffee tables – tape units and a high-speed card reader. However, as Paolo Zanella, who became division leader from 1976 until 1988, recalled, “The introduction of such a complex system was by no means trivial and CERN experienced one of the most painful periods in its computing history. The coupling of unstable hardware to shaky software resulted in a long traumatic effort to offer a reliable service.” Eventually the 6600 was able to realise its potential, but only after less-powerful machines had been brought in to cope with the increasing demands of the users. Then in 1972 it was joined by a still more powerful sibling, the CDC 7600, the most powerful computer of the time and five times faster than the 6600, but again there were similar painful “teething problems”.

With a speed of just over 10 Mips (millions of instructions per second) and superb floating-point performance, the 7600 was, for its time, a veritable “Ferrari” of computing. But it was a Ferrari with a very difficult running-in period. The system software was again late and inadequate. In the first months the machine had a bad ground-loop problem causing intermittent faults and eventually requiring all modules to be fitted with sheathed rubber bands. It was a magnificent engine for its time whose reliability and tape handling just did not perform to the levels needed, in particular by the electronic experiments. Its superior floating-point capabilities were valuable for processing data from bubble-chamber experiments with their relatively low data rates, but for the fast electronic experiments the “log jam” of the tape drives was a major problem.

So a second revolution occurred with the reintroduction of an IBM system, the 370/168, in 1976, which was able to meet a wider range of users’ requirements. Not only did this machine bring dependable modern tape drives, it also demonstrated that computer hardware could work reliably and it ushered in the heyday of the mainframe, with its robotic mass storage system and a laser printer operating at 19,000 lines per minute. With a CPU cycle of 80 ns, 4 megabytes (later 5) of semiconductor memory and a high-speed multiply unit, it became the “CERN unit” of physics data-processing power, corresponding to 3-4 Mips. Moreover, the advent of the laser printer, with its ability to print bitmaps rather than simple mono-spaced characters, heralded the beginning of scientific text processing and the end for the plotters with their coloured pens (to say nothing of typewriters).

cernmain4_9-04

The IBM also brought with it the MVS (Multiple Virtual Storage) operating system, with its pedantic Job Control Language, and it provided the opportunity for CERN to introduce WYLBUR, the well-loved, cleverly designed and friendly time-sharing system developed at SLAC, together with its beautifully handwritten and illustrated manual by John Ehrman. WYLBUR was a masterpiece of design, achieving miracles with little power (at the time) shared amongst many simultaneous users. It won friends with its accommodating character and began the exit of punch-card machinery as computer terminals were introduced across the lab. It was also well interfaced with the IBM Mass Store, a unique file storage device, and this provided great convenience for file handling and physics data sample processing. At its peak WYLBUR served around 1200 users per week.

The IBM 370/168 was the starting point for the IBM-based services in the computer centre and was followed by a series of more powerful machines: the 3032, the 3081, several 3090s and finally the ES/9000. In addition, a sister line of compatible machines from Siemens/Fujitsu was introduced and together they provided a single system in a manner transparent to the users. This service carried the bulk of the computer users, more than 6000 per week, and most of the data handling right up to the end of its life in 1996. At its peak around 1995 the IBM service provided central processor power of around a quarter of that of a top PC today, but the data-processing capacity was outstanding.

During this period CERN’s project for the Large Electron Positron (LEP) collider brought its own challenges, together with a planning review in 1983 of the computing requirements for the LEP era. Attractive alternative systems to the mainframe began to appear over the horizon, presenting computing services with some difficult choices. The DEC VAX machines, used by many physics groups – and subsequently introduced as a successful central facility – were well liked for the excellent VMS operating system. On another scale the technical jump in functionality that was appearing on the new personal workstations, for example from Apollo – such as a fully bit-mapped screen and a “whole half a megabyte of memory” for a single user – were an obvious major attraction for serious computer-code developers, albeit at a cost that was not yet within the reach of many. It is perhaps worth reflecting that in 1983 the PC used the DOS operating system and a character-based screen, whilst the Macintosh had not yet been announced, so bit-mapped screens were a major step forward. (To put that in context, another recommendation of the above planning review was that CERN should install a local-area network and that Ethernet was the best candidate for this.)

cernmain5_9-04

The future clearly held exciting times, but some pragmatic decisions about finances, functionality, capacity and tape handling capacity had to be made. It was agreed that for the LEP era the IBM-based services would move to the truly interactive VM/CMS operating system as used at SLAC. (WYLBUR was really a clever editor submitting jobs to batch processing.) This led to a most important development, the HEPVM collaboration. It was possible and indeed desirable to modify the VM/CMS operating system to suit the needs of the user community. All the high-energy physics (HEP) sites running VM/CMS were setting out to do exactly this, as indeed they had done with many previous operating systems. To some extent each site started off as if it were their sovereign right to do this better than the others. In order to defend the rights of the itinerant physicist, in 1983 Norman McCubbin from the Rutherford Appleton Laboratory made the radical but irresistible proposal: “don’t do it better, do it the same!”

The HEPVM collaboration comprised most of the sites that ran VM/CMS as an operating system and that had LEP physicists as clients. This ranged from large dedicated sites such as SLAC, CERN and IN2P3, to university sites where the physicists were far from being the only clients. It was of course impossible to impose upon the diverse managements involved, so it was a question of discussion and explanation and working at the issues. Two important products resulted from this collaboration. A HEPVM tape was distributed to more than 30 sites, containing all the code necessary for producing a unified HEP environment, and the “concept of collaboration between sites” was established as a normal way to proceed. The subsequent offspring, HEPiX and HEPNT, have continued the tradition of collaboration, and it goes without saying that such collaboration will have to reach an even higher level in order to make Grid computing successful.

The era of the supercomputer

The 1980s also saw the advent of the supercomputer. The CRAY X-MP supercomputer, which arrived at CERN in January 1988, was the logical successor to Seymour Cray’s CDC 7600 at CERN, and a triumph of price negotiation. The combined scalar performance of its four processors was about a quarter of the largest IBM installed at CERN, but it had strong vector floating-point performance. Its colourful presence resolved the question as to whether the physics codes could really profit from vector capabilities, and probably the greatest benefit to CERN from the CRAY was to the engineers whose applications, for example in finite element analysis and accelerator design, excelled on this machine. The decision was also taken to work together with CRAY to pioneer Unix as the operating system, and this work was no doubt of use to later generations of machines running Unix at CERN.

cernmain6_9-04

Throughout most of the mainframe period the power delivered to users had doubled approximately every 3.5 years – the CDC 7600 lasted an astonishing 12 years. The arrival of the complete processor on a CMOS chip, which conformed to Moore’s law of doubling speed every 18 months, was an irresistible force that signalled the eventual replacement of mainframe systems, although a number of other issues had to be solved first, including notably the provision of reliable tape-handling facilities. The heyday of the mainframe thus eventually came to an inevitable end.

One very positive feature of the mainframe era at CERN was the joint project teams with the major manufacturers, in particular those of IBM and DEC. The presence of, say, around 20 engineers from such a company on-site led to extremely good service, not only from the local staff but also through direct contacts to the development teams in America. It was not unknown for a critical bug, discovered during the evening at CERN, to be fixed overnight by the development team in America and installed for the CERN service the next morning, a sharp contrast to the service available in these days of commodity computing. The manufacturers on their side saw the physicists’ use of their computers as pushing the limits of what was possible and pointing the way to the needs of other more straightforward customers in several years’ time. Hence their willingness to install completely new products, sometimes from their research laboratories, and often free of charge, as a way of getting them used, appraised and debugged. The requirements from the physicists made their way back into products and into the operating systems. This was one particular and successful way for particle physics to transfer its technology and expertise to the world at large. In addition, the joint projects provided a framework for excellent pricing, allowing particle physics to receive much more computer equipment than they could normally have paid for.

The post Computing at CERN: the mainframe era appeared first on CERN Courier.

]]>
https://cerncourier.com/a/computing-at-cern-the-mainframe-era/feed/ 0 Feature Chris Jones takes a look back at the heyday of the computer mainframe through a selection of "memory bytes". https://cerncourier.com/wp-content/uploads/2004/09/cernmain6_9-04.jpg
Going public: a new paradigm https://cerncourier.com/a/viewpoint-going-public-a-new-paradigm/ Sun, 05 Sep 2004 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/viewpoint-going-public-a-new-paradigm/ For David P Anderson, project leader of SETI@home, the future of scientific computing is public.

The post Going public: a new paradigm appeared first on CERN Courier.

]]>
cernvie1_9-04

Most of the world’s computing power is no longer concentrated in supercomputer centres and machine rooms. Instead it is distributed around the world in hundreds of millions of PCs and game consoles, of which a growing fraction are connected to the Internet.

A new computing paradigm, “public-resource computing”, uses these resources to perform scientific supercomputing. This enables previously unfeasible research and has social implications as well: it catalyses global communities centred on common interests and goals; it encourages public awareness of current scientific research; and it gives the public a measure of control over the direction of scientific progress.

The number of Internet-connected PCs is growing and is projected to reach 1 billion by 2015. Together they could provide 10¹⁵ floating point operations per second (FLOPS) of power. The potential for distributed disk storage is also huge.

Public-resource computing emerged in the mid-1990s with two projects: GIMPS (looking for large prime numbers) and Distributed.net (solving cryptographic codes). In 1999 our group launched a third project, SETI@home, which searches radiotelescope data for signs of extraterrestrial intelligence. The appeal of this challenge extended beyond hobbyists; it attracted millions of participants worldwide and inspired a number of other academic projects as well as efforts to commercialize the paradigm. SETI@home currently runs on about 1 million PCs, providing a processing rate of more than 60 teraFLOPS. In contrast, the largest conventional supercomputer, the NEC Earth Simulator, offers in the region of 35 teraFLOPS.
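
A small sanity check of these figures (Python), working only from the numbers quoted in this article:

    # Implied average sustained contribution per participating PC.
    seti_hosts = 1_000_000
    seti_total_flops = 60e12                          # about 60 teraFLOPS in total
    print(seti_total_flops / seti_hosts)              # ~6e7 FLOPS (60 MFLOPS) sustained per PC

    # Comparison with the largest conventional supercomputer quoted above.
    earth_simulator_flops = 35e12
    print(seti_total_flops / earth_simulator_flops)   # SETI@home delivers ~1.7 times as much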

Public-resource computing is effective only if many participate. This relies on publicity. For example, SETI@home has received coverage in the mass-media and in Internet news forums like Slashdot. This, together with its screensaver graphics, seeded a large-scale “viral marketing” effect.

Retaining participants requires an understanding of their motivations. A poll of SETI@home users showed that many are interested in the science, so we developed Web-based educational material and regular scientific news. Another key factor is “credit” – a numerical measure of work accomplished. SETI@home provides website “leader boards” where users are listed in order of their credit.

SETI@home participants contribute more than just CPU time. Some have translated the SETI@home website into 30 languages, and developed add-on software and ancillary websites. It is important to provide channels for these contributions. Various communities have formed around SETI@home. A single, worldwide community interacts through the website and its message boards. Meanwhile, national and language-specific communities have their own websites and message boards. These have been particularly effective in recruiting new participants.

All the world’s a computer

We are developing software called BOINC (Berkeley Open Infrastructure for Network Computing), which facilitates creating and operating public-resource computing projects. Several BOINC-based projects are in progress, including SETI@home, Folding@home and Climateprediction.net. BOINC participants can register with multiple projects and can control how their resources are shared. For example, a user might devote 60% of his CPU time to studying global warming and 40% to SETI.
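
BOINC expresses such preferences as per-project resource shares. The sketch below (Python) shows how donated CPU time might be split in proportion to those shares, using the 60/40 example above; it is an illustration of the idea, not BOINC’s actual scheduling code.

    # Toy split of donated CPU time in proportion to per-project resource shares.
    shares = {"Climateprediction.net": 60, "SETI@home": 40}
    total_share = sum(shares.values())

    cpu_hours_available = 24.0                     # e.g. one day of donated CPU time
    for project, share in shares.items():
        allocation = cpu_hours_available * share / total_share
        print(f"{project}: {allocation:.1f} CPU-hours")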

We hope that BOINC will stimulate public interest in scientific research. Computer owners can donate their resources to any of a number of projects, so they will study and evaluate them, learning about their goals, methods and chances of success. Further, control over resource allocation for scientific research will shift slightly from government funding agencies to the public. This offers a uniquely direct and democratic influence on the directions of scientific research.

What other computational projects are amenable to public-resource computing? The task must be divisible into independent parts whose ratio of computation to data is fairly high (or the cost of Internet data transfer may exceed the cost of doing the computation centrally). Also, the code needed to run the task should be stable over time and require a minimal computational environment.
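
That criterion can be written down as a simple comparison: distributing a work unit only pays off if the cost of shipping its data over the Internet stays below the cost of doing the same computation centrally. The sketch below (Python) expresses this rule; the per-gigabyte and per-CPU-hour costs are hypothetical placeholders, not figures from this article.

    # Toy feasibility test: is the computation-to-data ratio of a work unit high enough?
    def worth_distributing(cpu_hours_per_unit, data_gb_per_unit,
                           cost_per_gb_transfer=0.10, cost_per_cpu_hour_central=0.05):
        transfer_cost = data_gb_per_unit * cost_per_gb_transfer
        central_cost = cpu_hours_per_unit * cost_per_cpu_hour_central
        return transfer_cost < central_cost

    print(worth_distributing(cpu_hours_per_unit=100.0, data_gb_per_unit=0.01))  # True: CPU-heavy task
    print(worth_distributing(cpu_hours_per_unit=0.5, data_gb_per_unit=50.0))    # False: data-heavy task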

Climateprediction.net is a recent example of such an effort in the public-resource computing field. Models of complex physical systems, such as global climate, are often chaotic. Studying their statistics requires large numbers of independent simulations with different boundary conditions.

CPU-intensive data-processing applications include analysis of radiotelescope data, and some applications stemming from high-energy physics are also amenable to public computing: CERN has been testing BOINC in house to simulate particle orbits in the LHC. Other possibilities include biomedical applications, such as virtual drug design and gene-sequence analysis. Early pioneers in this field include Folding@home from Stanford University.

In the long run, the inexorable march of Moore’s law, and the corresponding increase of storage capacity on PCs and the bandwidth available to home computers on the Internet, means that public-resource computing should improve both qualitatively and quantitatively, which should open an ever-widening range of opportunities for this new paradigm in scientific computing.

The post Going public: a new paradigm appeared first on CERN Courier.

]]>
Opinion For David P Anderson, project leader of SETI@home, the future of scientific computing is public. https://cerncourier.com/wp-content/uploads/2004/09/cernvie1_9-04.jpg
SweGrid gets set for future challenges https://cerncourier.com/a/swegrid-gets-set-for-future-challenges/ https://cerncourier.com/a/swegrid-gets-set-for-future-challenges/#respond Mon, 07 Jun 2004 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/swegrid-gets-set-for-future-challenges/ SweGrid, the first national Grid test-bed in Sweden, was inaugurated on 18 March in Uppsala.

The post SweGrid gets set for future challenges appeared first on CERN Courier.

]]>
cernnews4_6-04

SweGrid, the first national Grid test-bed in Sweden, was inaugurated on 18 March in Uppsala. The Grid nodes, each consisting of a cluster of 100 PCs and 2 Tbyte of disk storage, are located at the six national computer centres in Umeå, Uppsala, Stockholm, Linköping, Göteborg and Lund, and are linked together through the 10 Gbit/s national network SUNET. An additional 60 Tbyte disk storage will be delivered in May and eventually the test-bed will comprise 120 Tbyte disk storage plus 120 Tbyte robotic tape storage in total.

The initiative for this national Grid has come from the Swedish high-energy physics community and was driven by the future requirements for large computing capacity to analyse data from the Large Hadron Collider (LHC). One-third of SweGrid’s full computer resources are currently being used for the execution of the “ATLAS Data Challenge 2” in May and June 2004. In addition, many other applications in other branches of science, such as genome research, climate research, solid-state physics, quantum chemistry and space science, are also being launched on SweGrid.

The equipment for SweGrid has been financed by the Wallenberg Foundation in Sweden. The personnel costs for seven SweGrid technicians and three doctoral students are being covered by the Swedish Research Council through its Swedish National Infrastructure for Computing (SNIC). The Strategic Technical Advisory Committee in SNIC, composed of the directors of Sweden’s six national computer centres, is acting as SweGrid’s executive board.

A Nordic Grid development project, NorduGrid, began in 2000 as a collaboration between high-energy physicists. It set up the first small Nordic Grid test-bed in 2001 and used this to develop the NorduGrid middleware, which has become one of the first Grid middlewares to be used in production internationally, for example during the “ATLAS Data Challenge 1” in 2003.

Stimulated by this progress, the Nordic Science Research Councils (NOS-N) took a common initiative to study how the computer resources in the Nordic countries could be organized in a common Grid facility, called the Nordic Data Grid Facility (NDGF). SweGrid constitutes a Swedish contribution to this common effort. The NDGF study group is scheduled to forward a detailed proposal for such a facility to the NOS-N committee within a year from now.

Several interesting presentations were given at the SweGrid inauguration seminar. Mario Campolargo, head of the Information Society Research Infrastructure Unit of the European Commission, described the pan-European GEANT computer network and the potential this represents for Grid development in Europe. He also discussed the significance of the current European Grid development initiatives sponsored by the EC 6th Framework Programme, such as Enabling Grids for e-Science in Europe, a CERN-led initiative in which Sweden has an active role.

Erik Elmroth from the Swedish National Computer Center in Umeå discussed current activities for making Grid services more accessible, such as developing tools for resource brokering and Grid-wide accounting, and establishing Grid portals as common easy-to-use interfaces to the Grid. Niclas Andersson, the leader of the six technicians who have set up and are now running SweGrid, described the deployment and operations of the test-bed and presented its technical specifications.

John Ellis from CERN gave an overview of the physics at the LHC and illustrated the large computer resources required if the new physics phenomena were to be discovered at the LHC. He demonstrated that finding a heavy particle of mass 1 TeV/c² at the LHC would be the equivalent of finding a needle in all of Sweden’s haystacks, which he estimated to be 100 m³ each in volume and to total 100,000. Gilbert Poulard, also from CERN, described the reconstruction and analysis of events in ATLAS and how the software and access to data will be exercised with Grid tools during the forthcoming Data Challenge 2.

There were also reports on Grid applications in other disciplines. Gunnar Norstedt from the Karolinska Institutet in Stockholm described the use of SweGrid for the analysis of gene promoters; a gene promoter is a portion of DNA that regulates the genes and their expression. A general computer code for such analysis has been set up and will be made available at SweGrid through a Grid portal. Roland Lindh from the quantum chemistry group at Lund University described MOLCAS, which is a code for electronic structure calculations in large molecules and which will be accessible on SweGrid.

The final part of the ceremony was conducted by Anders Ynnerman, the leader of SNIC. After Janne Carlsson from the Wallenberg Foundation and Jan Martinsson from the Swedish Research Council had expressed their great satisfaction with the project, Sverker Holmgren, head of the Uppsala National Computer Center, gave a successful first demonstration of how to operate SweGrid.

The post SweGrid gets set for future challenges appeared first on CERN Courier.

]]>
https://cerncourier.com/a/swegrid-gets-set-for-future-challenges/feed/ 0 News SweGrid, the first national Grid test-bed in Sweden, was inaugurated on 18 March in Uppsala. https://cerncourier.com/wp-content/uploads/2004/06/cernnews4_6-04.jpg
WestGrid team announces completion of computing network in western Canada https://cerncourier.com/a/westgrid-team-announces-completion-of-computing-network-in-western-canada/ https://cerncourier.com/a/westgrid-team-announces-completion-of-computing-network-in-western-canada/#respond Mon, 03 May 2004 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/westgrid-team-announces-completion-of-computing-network-in-western-canada/ Scientists leading the WestGrid project in Canada have announced that the major resources of this $48 million project are available for general use by the research community.

The post WestGrid team announces completion of computing network in western Canada appeared first on CERN Courier.

]]>
cernnews4_5-04

Scientists leading the WestGrid project in Canada have announced that the major resources of this $48 million project are available for general use by the research community. Canadian particle physicists have already applied WestGrid successfully to ongoing experiments, and plans are underway at TRIUMF to link the WestGrid Linux cluster to the LHC Computing Grid (LCG).

The aim of the WestGrid project is to provide high-performance computing in western Canada, based on resources at several universities in Alberta and British Columbia, and at TRIUMF. It currently consists of the following: a 256 cpu shared-memory machine (SGI Origin) for large-scale parallel processing at the University of Alberta; a cluster of multiprocessors (36 x 4 cpu Alpha nodes) at the University of Calgary, connected by a high-speed Quadrics interconnect that is also for parallel jobs; a 1008 cpu Linux cluster (3 GHz IBM blades) at the University of British Columbia (UBC) and TRIUMF for serial or loosely coupled parallel jobs; and a network storage facility (IBM) at Simon Fraser University, initially with 24 TeraBytes of disk space and about 70 TeraBytes of tape. As of November 2003, the WestGrid Linux cluster at UBC/TRIUMF ranked 58th in the “TOP500 Supercomputer Sites” rankings.

The Grid-enabled infrastructure also includes major collaborative facilities known as Access Grid nodes, with a total of seven institutions interconnected over dedicated research “lightpaths” on the existing provincial and national research networks. The new resources are expected to support advances in research in many disciplines where large amounts of data are typically involved, such as medical research, astronomy, subatomic physics, pharmaceutical research and chemistry.

Two particle-physics experiments, TWIST at TRIUMF and D0 at Fermilab, have already participated in the testing phase at the UBC/TRIUMF site. Both experiments benefited greatly from access to significant computing resources during the tests. For the future, it is planned to connect WestGrid indirectly to the LCG through the LCG site at TRIUMF. Work is ongoing to develop the software necessary to achieve this without the need to install LCG tools on WestGrid itself.

The post WestGrid team announces completion of computing network in western Canada appeared first on CERN Courier.

]]>
https://cerncourier.com/a/westgrid-team-announces-completion-of-computing-network-in-western-canada/feed/ 0 News Scientists leading the WestGrid project in Canada have announced that the major resources of this $48 million project are available for general use by the research community. https://cerncourier.com/wp-content/uploads/2004/05/cernnews4_5-04.jpg
Smaller institutes look to gain from scientific fallout https://cerncourier.com/a/viewpoint-smaller-institutes-look-to-gain-from-scientific-fallout/ Mon, 03 May 2004 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/viewpoint-smaller-institutes-look-to-gain-from-scientific-fallout/ Alexandru Mihul believes that the distribution of data from large laboratories to smaller institutes for longer term analysis has benefits for all.

The post Smaller institutes look to gain from scientific fallout appeared first on CERN Courier.

]]>
cernvie1_5-04

Large laboratories obtain scientific data in vast quantities and usually use this material for rapid research driven by competition. The majority of important results are collected in as short a time as possible. When new data appear, older data lose their importance and are abandoned or placed at the disposal of smaller labs that could make use of them.

This has been the case in the past with data obtained at laboratories such as CERN, Fermilab and JINR, which came in such quantities that they could not be exhaustively analysed by the researchers there. The data were therefore given to various universities and other smaller laboratories, which over a long period of time have analysed the events in question and sometimes made valid discoveries.

More recently, data from the CDF and D0 experiments at Fermilab have become available via the web. A more leisurely analysis phase is also happening with data obtained from experiments at LEP, whose activity is slowing down. This gives researchers at smaller scientific institutions the opportunity to follow up the work and make new findings. For example, institutes in the “Post L3” collaboration are currently analysing some LEP data in their own time and have no obligation to provide results by a specific deadline.

The pictures made in the late 1960s with the CERN 2 m hydrogen bubble chamber show the possible importance of this approach. Its films ended up in various universities, either for further analysis or for didactic purposes, because bubble-chamber pictures are useful for students. Consequently, during the 1970s, the University of Bucharest and JINR in Dubna obtained 125,000 pictures courtesy of CERN. The pictures were found to contain a number of interesting items that had earlier been overlooked because in the principal analysis they had been viewed with different criteria in mind.

In one particular example, V M Karnauhov, V I Moroz, C Coca and A Mihul were able to report on finding a resonance in πp interactions at 16 GeV, with a mass of 3520 ± 3 MeV/c² and a width of 7 +20/−7 MeV, at a significance of eight standard deviations (Karnauhov et al. 1992). At the time this seemed very strange, and most physicists were not particularly interested, because the resonance corresponded to a five-quark particle (uud ccbar), which did not then fit into any theoretical framework.

During the past year, however, evidence for several exotic resonances has been reported. A real “gold rush” for similar phenomena – the “pentaquarks” – has begun, even though there are few, if any, irrefutable theoretical explanations, and their masses have not yet been calculated owing to the lack of a theoretical basis. These states include the Θ* (with a mass of 1540 MeV and a width of 17 MeV) and the Ξ (1862) baryon with S = −2, which have still to be established with high accuracy. They appear to be states of five quarks (pentaquarks), i.e. four quarks and one antiquark, yielding a colourless system, as is necessary for an observable particle.

The 2 m bubble-chamber data suggested long ago that at least one more exotic baryonic state had been found, with a mass of 3520 ± 3 MeV/c², a width of 7 +20/−7 MeV and S = 0. This was a pentaquark baryon with neutral strangeness. The essential difference between the Θ* and Ξ (1862) and what was found long ago is that the old resonance was formed from quarks including a ccbar pair, while the new ones contain s (or sbar) quarks, giving a substantial difference in the final mass. Other teams have also reported possible sightings of pentaquarks in data from the 2 m chamber, and now the H1 experiment at DESY has evidence for a uuddcbar state with a mass of 3100 MeV/c².

So what can we learn from this experience? The distribution of data to smaller institutions, which perhaps have more time to follow different or unfashionable lines of analysis, must continue. Besides the benefits that this activity can bring to the institutes themselves, the long-term process also has the benefit of bringing fresh minds to the analysis as younger physicists, who may bring new approaches, replace older ones.

The Grid should also be able to overcome some of the difficulties of the past. It aims at providing a global computing facility, which will allow the smaller laboratories to participate in the primary research. However, the Grid is being developed to provide enormous computing power; it will not be able to provide the thinking time that is necessary for the best job to be done. This can only be provided by the researchers performing long-term analysis generally in the smaller laboratories.

The post Smaller institutes look to gain from scientific fallout appeared first on CERN Courier.

]]>
Opinion Alexandru Mihul believes that the distribution of data from large laboratories to smaller institutes for longer term analysis has benefits for all. https://cerncourier.com/wp-content/uploads/2004/05/cernvie1_5-04.jpg
Gigabits, the Grid and the Guinness Book of Records https://cerncourier.com/a/gigabits-the-grid-and-the-guinness-book-of-records/ https://cerncourier.com/a/gigabits-the-grid-and-the-guinness-book-of-records/#respond Mon, 01 Mar 2004 00:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/gigabits-the-grid-and-the-guinness-book-of-records/ High-speed networking over the Internet is becoming increasingly important as CERN and other laboratories around the world gear up for the Grid. Olivier Martin takes a look at the evolution towards CERN's current involvement in long-distance data-transfer records.

The post Gigabits, the Grid and the Guinness Book of Records appeared first on CERN Courier.

]]>
cernntwone1_3-04

On five separate occasions during 2003, a team led by Harvey Newman of Caltech and Olivier Martin of CERN established new records for long-distance data transfer, earning a place for these renowned academic institutions in the Guinness Book of Records. This year, new records are expected to be set as the performance of single-stream TCP (Transmission Control Protocol) is pushed closer towards 10 Gbps (gigabits per second). In 1980 “high speed” meant data transfers of 9.6 kbps (kilobits per second), using analogue transmission lines. So the achievement of 10 Gbps in 2004 corresponds to an increase by a factor of 1 million in 25 years – an advance that is even more impressive than the classic “Moore’s law” of computer processing, in which the number of transistors per integrated circuit (i.e. the processing power) follows an almost exponential curve, increasing by a factor of two every 18 months, or 1000 every 15 years.
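
The comparison with Moore’s law can be made explicit with a two-line calculation (Python), using only the figures quoted above:

    # Growth of long-distance data-transfer rates versus Moore's law over 25 years.
    network_factor = 10e9 / 9.6e3            # 9.6 kbps in 1980 to 10 Gbps today: ~1,000,000
    moore_factor = 2 ** (25 * 12 / 18)       # doubling every 18 months: ~100,000
    print(network_factor, moore_factor)      # networking has outpaced Moore's law by roughly 10x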

cernntwone2_3-04

While chasing such records may sound like an irrelevant game, the underlying goal is of great importance for the future of data-intensive computing Grids. In particular, for CERN and all the physicists across the world working on experiments at the Large Hadron Collider (LHC), the LHC Computing Grid will depend critically on sustainable multi-gigabit per second throughput between different sites. The evolution of such long-distance computing capabilities at CERN has been an important part of CERN’s development as a laboratory, not only for European users but also for those across the globe.

The early days

Computer networks have been of increasing importance at CERN since the early 1970s, when the first links were set up between experiments and the computer centre. The first external links, for example to the Rutherford Laboratory in the UK, were only established during the late 1970s and had very limited purposes, such as remote job submission and output file retrieval. Then from 1974 onwards, together with the EARN/BITnet and UUCP mail network initiatives, there was an extraordinary development in electronic mail. However, it was only in the late 1980s that the foundations for today’s high-speed networks were truly laid down. Indeed, the first international 2 Mbps (megabits per second) link was installed by INFN during the summer of 1989, just in time for the start-up of CERN’s Large Electron Positron collider. However, there was still no Europe-wide consensus on a common protocol, and as a consequence multiple backbones had to be maintained, e.g. DECnet, SNA, X25 and TCP/IP (TCP/Internet Protocol).

cernntwone3_3-04

Back in late 1988, the National Science Foundation (NSF) in the US made an all-important choice when it established NSFnet, the first TCP/IP-based nationwide 1.5 Mbps backbone. This was initially used to connect the NSF-sponsored Super Computer Centers and was later extended to serve regional networks, which themselves connected universities. The NSFnet, which is at the origin of the academic as well as the commercial Internet, served as the emerging commercial Internet backbone until its shutdown in 1995.

In 1990 CERN picked up on this development – not without courage – and together with IBM and other academic partners in Europe developed the use of EASInet (European Academic Supercomputer Initiative Network), a multi-protocol backbone that took account of Europe’s networking idiosyncrasies. EASInet, which also provided a 2 Mbps TCP/IP backbone to European researchers, had a 1.5 Mbps link to NSFnet through Cornell University and was at the origin of the European Internet, together with EBONE. These developments established TCP/IP as the major protocol for Internet backbones around the world.

The Internet2 land-speed records

In 2000, to stimulate continuing research and experimentation in TCP transfers, the Internet2 project, a consortium of approximately 200 US universities working in partnership with industry and government, created a contest – the Internet2 land-speed record (I2LSR). This involves sending data across long distances by “terrestrial” means – that is, by underground as well as undersea fibre-optic networks, as opposed to by satellite – using both the current Internet standard, IPv4, and the next-generation Internet, IPv6. The unit of measurement for the contest is bit-metres per second – a very wise and fair decision as the complexity of achieving high throughput with standard TCP installations, e.g. on Linux, is indeed proportional to the distance.

cernntwone4_3-04

In 2003 CERN and its partners were involved in several record-breaking feats. On 27-28 February a team from Caltech, CERN, LANL and SLAC entered the science and technology section of the Guinness Book of Records when they set an IPv4 record with a single 2.38 Gbps stream over a 10,000 km path between Geneva and Sunnyvale, California, by way of Chicago. Less than three months later, a new IPv6 record was established on 6 May by a team from Caltech and CERN, with a single 983 Mbps stream over 7067 km between Geneva and Chicago.

However, thanks to the 10 Gbps DataTAG circuit (see “DataTAG” box), which became available in September 2003, new IPv4 and IPv6 records were established only a few months later, first between Geneva and Chicago, and then between Geneva, California and Arizona. On 1 October a team from Caltech and CERN achieved the amazing result of 38.42 petabit-metres per second with a single 5.44 Gbps stream over the 7073 km path between Geneva and Chicago. This corresponds to the transfer of 1.1 terabytes of physics data in less than 30 minutes, or the transfer of a full-length DVD to Los Angeles in about 7 seconds.

cernntwone5_3-04

Then in November a longer 10 Gbps path to Los Angeles, California and Phoenix, Arizona, became available through Abilene, the US universities’ backbone, and CALREN, the California Research and Education Network. This allowed the IPv4 and IPv6 records to be broken yet again on 6 November, achieving 5.64 Gbps with IPv4 over a path of 10,949 km between CERN and Los Angeles, i.e. 61.7 petabit-metres per second. Five days later, a transfer at 4 Gbps with IPv6 over 11,539 km between CERN and Phoenix through Chicago and Los Angeles established a record of 46.15 petabit-metres per second.
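
All of these figures follow from the contest’s simple definition: the throughput achieved multiplied by the terrestrial path length. A short check (Python), using only the throughputs and distances quoted above, reproduces the published values to within rounding of the quoted inputs:

    # Internet2 land-speed record metric: throughput times path length.
    def lsr_petabit_metres_per_s(gbps, km):
        return gbps * 1e9 * km * 1e3 / 1e15

    print(lsr_petabit_metres_per_s(5.44, 7073))    # ~38.5 (record quoted as 38.42)
    print(lsr_petabit_metres_per_s(5.64, 10949))   # ~61.8 (quoted as 61.7)
    print(lsr_petabit_metres_per_s(4.00, 11539))   # ~46.2 (quoted as 46.15)

    # The 1.1 TB transfer quoted above: time needed at 5.44 Gbps.
    print(1.1e12 * 8 / 5.44e9 / 60)                # ~27 minutes, i.e. "less than 30 minutes"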

As with all records, there is still ample room for improvement. With the advent of PCI Express chips, faster processors, improved motherboards and better 10GigE network adapters, there is little doubt it will be feasible to push the performance of single-stream TCP transport much closer to 10 Gbps in the near future, that is, well above 100 petabit-metres per second.

As Harvey Newman, head of the Caltech team and chair of the ICFA Standing Committee on Inter-Regional Connectivity, has pointed out, these records are a major milestone towards the goal of providing on-demand access to high-energy physics data from around the world, using servers that are affordable to physicists from all regions. Indeed, for the first time in the history of wide-area networking, performance has been limited only by the end systems and not by the network: servers side by side have the same TCP performance as servers separated by 10,000 km.

The post Gigabits, the Grid and the Guinness Book of Records appeared first on CERN Courier.

]]>
https://cerncourier.com/a/gigabits-the-grid-and-the-guinness-book-of-records/feed/ 0 Feature High-speed networking over the Internet is becoming increasingly important as CERN and other laboratories around the world gear up for the Grid. Olivier Martin takes a look at the evolution towards CERN's current involvement in long-distance data-transfer records. https://cerncourier.com/wp-content/uploads/2004/03/cernntwone2_3-04.jpg
Exploiting the transatlantic light path https://cerncourier.com/a/exploiting-the-transatlantic-light-path/ https://cerncourier.com/a/exploiting-the-transatlantic-light-path/#respond Mon, 01 Mar 2004 00:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/exploiting-the-transatlantic-light-path/ Recent major exhibitions have provided the opportunity to demonstrate the potential of the high-speed link between Europe and North America, as Chris Jones reports.

The post Exploiting the transatlantic light path appeared first on CERN Courier.

]]>
cernntwtwo1_3-04

In two world exhibitions in Geneva in 2003, a collaboration between Caltech, CERN and other international institutes set out to demonstrate the possibilities and opportunities provided by the DataTAG transatlantic high-speed “light path”, which currently allows data transmission rates up to 10 gigabits per second (Gbps). The Services Industriels de Genève extended the light path into the heart of the exhibition floor in Geneva’s exhibition centre, Palexpo, both for ITU Telecom World 2003 in October and the Information and Communication Technologies for Development (ICT4D) exhibition at the World Summit on the Information Society (WSIS) in December.

A substantial, portable data centre was built on the exhibition floor in collaboration with Telehouse, CERN’s partner in the CERN Internet Exchange Point (CIXP), which is the major centre for interchange between telecommunications operators in the Geneva area. The CIXP was extended directly to the stand in Palexpo and the DataTAG light path was able to provide 10 Gbps Ethernet connectivity from the stand to collaborators in North America – Los Angeles and Chicago in the US and Ottawa in Canada. (Ethernet has come a long way from the days when it was considered to be a technology fit only for very-local-area networks!) The equipment to operate the DataTAG link at these highest state-of-the-art speeds was provided by CISCO, Intel and HP at several points on the light path.

The aims of the two world exhibitions were slightly different. Telecom World 2003 continued the 20 year tradition of CERN’s involvement in demonstrations of the latest high-speed networking, and succeeded in breaking – yet again – the Internet2 records for high-speed data transmission over long distances. The ICT4D exhibition at the WSIS focused on demonstrations aimed at “turning the digital divide into a digital opportunity”, in line with the summit’s declarations.

cernntwtwo2_3-04

Nonetheless, a number of items were common to both exhibitions, such as the Virtual Rooms Videoconferencing System (VRVS), which runs over the Internet; the Grid Café portal, which aims to explain and demonstrate the Grid and is proving extremely popular as a website; and the MonALISA system, which was developed by Caltech and portrays in an elegant and highly visual manner the performance of a worldwide networking system or the machines in a world Grid, and demonstrates how essential such systems are to the successful operation of Grids.

The VRVS system was fundamental to many of the demonstrations. It showed its use for international collaboration in virtual organizations, as well as in e-learning and e-lectures of several varieties, including Tokyo Lectures, a global teaching project in modern artificial intelligence in conjunction with the Swiss Education and Research Network; an impromptu presentation from the stand to the e-health conference in London; and direct sessions to the Internet2 conference in Indianapolis, including the ceremony where Harvey Newman from Caltech and Olivier Martin from CERN jointly received two Internet2 Landspeed Record awards over the Internet and announced that these records had already been broken again during Telecom World 2003.

VRVS was also used when the Romanian president, Ion Iliescu, made an extended visit to the ICT4D stand at WSIS and participated in a videoconference with his compatriots back in Bucharest. President Iliescu was also able to appreciate the efforts of his compatriot Iosif Legrand, who has made the major contribution to MonALISA. E-learning and the global transmission of lectures is a strong point of such systems, especially in the context of WSIS, and plans for such exploitation are now taking off in a really meaningful manner.

Two of the highlights on the ICT4D stand were provided by a collaboration with the Communications Research Centre in Canada. A remote “touch and feel” demonstration of haptic feedback allowed visitors to the stand in Geneva to “shake hands” with people in Ottawa and to feel the body of a dummy, as is necessary in telemedicine. This equipment is already being used in Canada for trials of remote operations. The final “bouquet” was a jazz concert, with musicians on both sides of the Atlantic playing together. Musicians from the Geneva Conservatoire de Musique played along with those from the Holy Heart of Mary Secondary School in St John’s in Newfoundland. This two-hour session demonstrated the well-developed ability of musicians to adapt to delays of a few hundred milliseconds, and the show was closed by a final jam session.

The post Exploiting the transatlantic light path appeared first on CERN Courier.

]]>
https://cerncourier.com/a/exploiting-the-transatlantic-light-path/feed/ 0 Feature Recent major exhibitions have provided the opportunity to demonstrate the potential of the high-speed link between Europe and North America, as Chris Jones reports. https://cerncourier.com/wp-content/uploads/2004/03/cernntwtwo1_3-04.jpg
Grid technology goes to Pakistan https://cerncourier.com/a/grid-technology-goes-to-pakistan/ https://cerncourier.com/a/grid-technology-goes-to-pakistan/#respond Tue, 09 Dec 2003 00:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/grid-technology-goes-to-pakistan/ As a natural extension of its participation in the Large Hadron Collider (LHC) project, Pakistan has begun a deeper involvement in the LHC Computing Grid (LCG).

The post Grid technology goes to Pakistan appeared first on CERN Courier.

]]>
As a natural extension of its participation in the Large Hadron Collider (LHC) project, Pakistan has begun a deeper involvement in the LHC Computing Grid (LCG). A first step towards this was the Grid Technology Workshop held in Islamabad on 20-22 October, which was organized by Pakistan’s National Centre for Physics (NCP) in collaboration with CERN. The primary goal of the workshop was to provide hands-on experience in Grid technology to Pakistani scientists, engineers and professionals, enhancing their skills in Grid-related topics such as Grid architecture, Grid standards and the Globus toolkit.

The workshop was inaugurated by CERN’s director-general, Luciano Maiani, who explained that Grid technology will be crucial in exploiting the physics potential of the LHC, which is currently being constructed at CERN by a broad international collaboration that includes Pakistan, as well as China, India, Japan and other eastern and far eastern countries. The inauguration was attended by a number of dignitaries, including Parvez Butt, chairman of the Pakistan Atomic Energy Commission (PAEC), and ambassadors from countries such as Iran, Bangladesh, South Korea and Myanmar. In addition to the participants from Pakistan, several people from CERN attended the workshop.

A variety of talks on the first day was followed by a two-day tutorial session for the 45 participants, who came from 14 different scientific, research and development organizations and universities across Pakistan, including the NCP, the PAEC, the Commission on Science and Technology for Sustainable Development in the South (COMSATS), and the National University of Science and Technology. The Grid tutorials were based on a testbed consisting of nine servers, including computing element servers, storage element servers, a resource broker server, a server for the Berkeley Database Information Index and TopGIIS, a Replica Catalogue Server (RLS), and worker node and user interface servers. The host and user certificates for the Grid testbed machines were issued by the French CNRS certificate authority.

The workshop was very well publicized in the local newspapers and on television, and the participants found it both interesting and useful. The next step is to launch an LCG testbed partner site node on the same resources, and the whole exercise will lead to participation in the Data Challenge 2004 (DC04) for LHC computing.

The post Grid technology goes to Pakistan appeared first on CERN Courier.

]]>
https://cerncourier.com/a/grid-technology-goes-to-pakistan/feed/ 0 News As a natural extension of its participation in the Large Hadron Collider (LHC) project, Pakistan has begun a deeper involvement in the LHC Computing Grid (LCG). https://cerncourier.com/wp-content/uploads/2003/12/cernnews9_12-03.jpg
The LCG gets started… https://cerncourier.com/a/the-lcg-gets-started/ https://cerncourier.com/a/the-lcg-gets-started/#respond Sun, 05 Oct 2003 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/the-lcg-gets-started/ This summer the IT division at CERN was a hive of activity as dozens of young software engineers worked round the clock to launch the LHC (Large Hadron Collider) Computing Grid (LCG) into its first phase of operations.

The post The LCG gets started… appeared first on CERN Courier.

]]>
This summer the IT division at CERN was a hive of activity as dozens of young software engineers worked round the clock to launch the LHC (Large Hadron Collider) Computing Grid (LCG) into its first phase of operations. Meanwhile, similar hectic preparations were going on at other major computing centres around the world. The LCG project, which was launched last year, has a mission to integrate thousands of computers worldwide into a global computing resource. This technological tour de force will rely on novel Grid software, called middleware, and will also benefit from new hardware developments in the IT industry.

The challenge facing the LCG project can be summarized in terms of two large numbers. The LHC will produce more than 10 petabytes of data a year – the equivalent of a stack of CDs 20 km high – and require around 100,000 of today’s PCs to analyse that data. Behind the numbers, however, is a new philosophy. The data and processing power should be available to the thousands of scientists involved in LHC experiments in a completely seamless fashion, independent of their location. This is the philosophy of computer Grids, which take their name from the ubiquitous, highly reliable electricity grid with its plug-in-the-wall simplicity.
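
As a rough cross-check of these figures, the short calculation below reproduces the “stack of CDs” comparison; the CD capacity and thickness used (about 700 MB and 1.2 mm) are assumed typical values, not numbers taken from the article.

```python
# Back-of-envelope check of the "stack of CDs 20 km high" comparison.
# Assumed values (not from the article): a CD holds ~700 MB and is ~1.2 mm thick.

annual_data_bytes = 10e15          # >10 petabytes per year (1 PB = 1e15 bytes)
cd_capacity_bytes = 700e6          # ~700 MB per CD
cd_thickness_m = 1.2e-3            # ~1.2 mm per disc

n_cds = annual_data_bytes / cd_capacity_bytes
stack_height_km = n_cds * cd_thickness_m / 1000

print(f"CDs needed per year : {n_cds:,.0f}")
print(f"Stack height        : {stack_height_km:.0f} km")   # ~17 km, of order 20 km
```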

The LCG project has been rapidly gearing up for this challenge, with more than 50 computer scientists and engineers from partner centres around the world joining the effort over the past year. The first version of the LCG, called LCG-1, is now up and running on a restricted number of sites (see map) and with limited functionality. Over the next few years, however, the plan is for the LCG to grow in size and complexity, absorbing new Grid technologies and integrating many more sites.

…while the EGEE gets ready

The success of the European Union (EU)-funded European DataGrid (EDG) project – a three-year effort led by CERN, which is due to finish in spring 2004 – has generated strong support for a follow-up project. The objective is to build a permanent European Grid infrastructure that can serve a broad spectrum of applications reliably and continuously. Providing such a service will require a much larger effort than setting up the current EDG test bed. So CERN has established a pan-European consortium called Enabling Grids for E-science in Europe (EGEE) to build and operate such a production Grid infrastructure, providing round-the-clock Grid service to scientists throughout Europe.

A proposal for such a project was submitted to the EU 6th Framework Programme in May 2003, where some €50 million has been earmarked by the commission for major Grid infrastructure projects. This proposal, again led by CERN, involves some 70 partners, encompassing all major computer centres in Europe as well as leading American and Russian centres. EGEE, following a positive evaluation by EU independent experts, has been invited to negotiate a contract with the EU for the major part of the allocated funds. Final contract negotiations with the EU are planned for November, and if all goes well the project should get under way by next spring.

The LHC Computing Grid will provide the springboard for EGEE, and in turn benefit from Grid software engineering that is part of the EGEE project. However, the mission of EGEE is also to extend the potential benefits of a Grid infrastructure beyond high-energy physics. The first target is biomedical applications, with other scientific and technological fields not far behind.

European projects galore

EDG and EGEE are by no means the only Grid projects that involve CERN. For example, DataTAG aims to provide high-speed connections between Grids in Europe and the US. In May, the project set its latest land-speed record, transferring data at nearly 1 Gbit/s (equivalent to nearly two DVD films a minute) between CERN and Chicago using the new IPv6 Internet protocol.

CrossGrid aims to extend the functionality of the EDG to advanced applications such as real-time simulations. The GRACE project is developing a decentralized search engine based on Grid technology. MammoGrid is dedicated to building a Grid for hospitals to share and analyse mammograms to improve breast-cancer treatment. GridStart aims to co-ordinate the efforts of the major Grid initiatives in Europe and disseminate information about the benefits of Grid technology to industry and society.

The post The LCG gets started… appeared first on CERN Courier.

]]>
https://cerncourier.com/a/the-lcg-gets-started/feed/ 0 News This summer the IT division at CERN was a hive of activity as dozens of young software engineers worked round the clock to launch the LHC (Large Hadron Collider) Computing Grid (LCG) into its first phase of operations. https://cerncourier.com/wp-content/uploads/2003/10/cernnews7_10-03-feature.jpg
The CERN openlab: a novel testbed for the Grid https://cerncourier.com/a/the-cern-openlab-a-novel-testbed-for-the-grid/ https://cerncourier.com/a/the-cern-openlab-a-novel-testbed-for-the-grid/#respond Sun, 05 Oct 2003 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/the-cern-openlab-a-novel-testbed-for-the-grid/ A new model for partnership between CERN and industry is both integrating and testing emerging computer technologies for the LHC Computing Grid.

The post The CERN openlab: a novel testbed for the Grid appeared first on CERN Courier.

]]>
Sverre Jarp

Grid computing is the computer buzzword of the decade. Not since the World Wide Web was developed at CERN more than 10 years ago has a new networking technology held so much promise for both science and society. The philosophy of the Grid is to provide vast amounts of computer power at the click of a mouse, by linking geographically distributed computers and developing “middleware” to run the computers as though they were an integrated resource. Whereas the Web gives access to distributed information, the Grid does the same for distributed processing power and storage capacity.

There are many varieties of Grid technology. In the commercial arena, Grids that harness the combined power of many workstations within a single organization are already common. But CERN’s objective is altogether more ambitious: to store petabytes of data from the Large Hadron Collider (LHC) experiments in a distributed fashion and make the data easily accessible to thousands of scientists around the world. This requires much more than just spare PC capacity – a network of major computer centres around the world must provide their resources in a seamless way.

CERN and a range of academic partners have launched several major projects in order to achieve this objective. In the European arena, CERN is leading the European DataGrid (EDG) project, which addresses the needs of several scientific communities, including high-energy particle physics. The EDG has already developed the middleware necessary to run a Grid testbed involving more than 20 sites. CERN is also leading a follow-on project funded by the European Union, EGEE (Enabling Grids for E-Science in Europe), which aims to provide a reliable Grid service to European science. Last year, the LHC Computing Grid (LCG) project was launched by CERN and partners to deploy a global Grid dedicated to LHC needs, drawing on the experience of the EDG and other international efforts. This project has started running a global Grid, called LCG-1.

Enter the openlab

The CERN openlab for DataGrid applications fits into CERN’s portfolio of Grid activities by addressing a key issue, namely the impact on the LCG of cutting-edge IT technologies that are currently emerging from industry. Peering into the technological crystal ball in this way can only be done in close collaboration with leading industrial partners. The benefits are mutual: through generous sponsorship of state-of-the-art equipment from the partners, CERN acquires early access to valuable technology that is still several years from the commodity computing market the LCG will be based on.

In return, CERN provides demanding data challenges, which push these new technologies to their limits – this is the “lab” part of the openlab. CERN also provides a neutral environment for integrating solutions from different partners, to test their interoperability. This is a vital role in an age where open standards (the “open” part of openlab) are increasingly guiding the development of the IT industry.

The CERN openlab for DataGrid applications was launched in 2001 by Manuel Delfino, then the IT Division leader at CERN. After a hiatus, during which the IT industry was rocked by the telecoms crash, the partnership took off in September 2002, when HP joined founding members Intel and Enterasys Networks, and integration of technologies from all three led to the CERN opencluster project.

IBM joins CERN openlab to tackle the petabyte challenge

Rainer Többicke

The LHC will generate more than 10 petabytes of data per year, the equivalent of a stack of CD ROMs 20 km high. There is no obvious way to extend conventional data-storage technology to this scale, so new solutions must be considered. IBM was therefore keen to join the CERN openlab in April 2003, in order to establish a research collaboration aimed at creating a massive data-management system built on Grid computing, which will use innovative storage virtualization and file-management technology.

IBM has been a strong supporter of Grid computing, from its sponsorship of the first Global Grid Forum in Amsterdam in 2001 to its participation in the European DataGrid project. The company sees Grid computing as an important technological realization of the vision of “computing on demand”, and expects that as Grid computing moves from exclusive use in the scientific and technical world into commercial applications, it will indeed be the foundation for the first wave of e-business on demand.

The technology that IBM brings to the CERN openlab partnership is called Storage Tank. Conceived in IBM Research, the new technology is designed to provide scalable, high-performance and highly available management of huge amounts of data using a single file namespace, regardless of where or on what operating system the data reside. (Recently, IBM announced that the commercial version will be named IBM TotalStorage SAN File System.) IBM and CERN will work together to extend Storage Tank’s capabilities so it can manage the LHC data and provide access to it from any location worldwide.

Brian E Carpenter, IBM Systems Group, and Jai Menon, IBM Research.

At present, the CERN opencluster consists of 32 Linux-based HP rack-mounted servers, each equipped with two 1 GHz Intel Itanium 2 processors. Itanium uses 64-bit processor technology, which is anticipated to displace today’s 32-bit technology over the next few years. As part of the agreement with the CERN openlab partners, this cluster is planned to double in size during 2003, and double again in 2004, making it an extremely high-performance computing engine. In April this year, IBM joined the CERN openlab, contributing advanced storage technology that will be combined with the CERN opencluster (see “IBM joins CERN openlab to tackle the petabyte challenge” box).

For high-speed data transfer challenges, Intel has delivered 10 Gbps Ethernet Network Interface Cards (NICs), which have been installed on the HP computers, and Enterasys Networks has delivered three switches equipped to operate at 10 gigabits per second (Gbps), with additional port capacity for 1 Gbps.

Over the next few months, the CERN opencluster will be linked to the EDG testbed to see how these new technologies perform in a Grid environment. The results will be closely monitored by the LCG project to determine the potential impact of the technologies involved. Already at this stage, however, much has been learned that has implications for the LCG.

For example, thanks to the preinstalled management cards in each node of the cluster, automation has been developed to allow remote system restart and remote power control. This development confirmed the notion that – for a modest hardware investment – large clusters can be controlled with no operator present. This is highly relevant to the LCG, which will need to deploy such automation on a large scale.

Several major physics software packages have been successfully ported and tested on the 64-bit environment of the CERN opencluster, in collaboration with the groups responsible for maintaining the various packages. Benchmarking of the physics packages has begun and the first results are promising. For example, PROOF (Parallel ROOT Facility) is a version of the popular CERN-developed software ROOT for data analysis, which is being developed for interactive analysis of very large ROOT data files on a cluster of computers. Tests on the CERN opencluster have shown that PROOF’s analysis rate scales almost linearly with cluster size – analysing a given dataset takes 325 s on one cluster node and only 12 s when all 32 nodes are used.
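
The quoted timings can be turned into a speed-up figure directly; the sketch below uses only the numbers in the preceding paragraph to show what the scaling amounts to.

```python
# Parallel speed-up and efficiency from the PROOF timings quoted above.
t_single = 325.0   # seconds to analyse the dataset on 1 node
t_cluster = 12.0   # seconds on 32 nodes
nodes = 32

speedup = t_single / t_cluster
efficiency = speedup / nodes

print(f"Speed-up   : {speedup:.1f}x on {nodes} nodes")   # ~27x
print(f"Efficiency : {efficiency:.0%}")                  # ~85%, i.e. close to linear scaling
```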

Data challenges galore

One of the major challenges of the CERN opencluster project is to take maximum advantage of the partners’ 10 Gbps technology. In April, a first series of tests was conducted between two of the nodes in the cluster, which were directly connected (via a “back-to-back” connection) through 10 Gbps Ethernet NICs. The transfer reached a data rate of 755 megabytes per second (MB/s), a record, and double the maximum rate obtained with 32-bit processors. The transfer took place over a 10 km fibre and used very big frames (16 kB) in a single stream, as well as the regular suite of Linux Kernel protocols (TCP/IP).
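
Part of the reason such large frames help is that they cut the per-packet overhead the end systems must handle. The sketch below, which assumes the standard 1500-byte Ethernet MTU purely for comparison, shows the packet rates implied by the 755 MB/s result and the fraction of the 10 Gbps line rate it represents.

```python
# Why large frames help at 10 Gbps: packet rates at the 755 MB/s quoted above.
# The 1500-byte figure is the standard Ethernet MTU, used here only for comparison.

rate_bytes_per_s = 755e6        # 755 MB/s, as measured in the back-to-back test
line_rate_bps = 10e9            # 10 Gbps Ethernet

for frame_bytes in (1500, 16_000):
    packets_per_s = rate_bytes_per_s / frame_bytes
    print(f"{frame_bytes:>6} B frames -> {packets_per_s/1e3:8.0f} k packets/s")

utilisation = rate_bytes_per_s * 8 / line_rate_bps
print(f"Link utilisation at 755 MB/s: {utilisation:.0%} of 10 Gbps")   # ~60%
```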

The best results through the Enterasys switches were obtained when aggregating the 1 Gbps bi-directional traffic involving 10 nodes in each group. The peak traffic between the switches was then measured to be 8.2 Gbps. The next stages of this data challenge will include evaluating the next version of the Intel processors.

In May, CERN announced the successful completion of a major data challenge aimed at pushing the limits of data storage to tape. This involved, in a critical way, several components of the CERN opencluster. Using 45 newly installed StorageTek tape drives, capable of writing to tape at 30 MB/s, storage-to-tape rates of 1.1 GB/s were achieved for periods of several hours, with peaks of 1.2 GB/s – roughly equivalent to storing a whole movie on DVD every four seconds. The average sustained over a three-day period was 920 MB/s. Previous best results by other research labs were typically less than 850 MB/s.
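
Relating these rates to the nominal capability of the tape drives gives a sense of how hard the system was being driven; the sketch below uses only the figures quoted above (45 drives writing at 30 MB/s each).

```python
# Aggregate tape bandwidth versus achieved storage-to-tape rates.
n_drives = 45
drive_rate_mb_s = 30.0                       # nominal per-drive write speed

aggregate_mb_s = n_drives * drive_rate_mb_s  # 1350 MB/s nominal ceiling
for label, rate_mb_s in [("peak", 1200.0), ("sustained (hours)", 1100.0),
                         ("3-day average", 920.0)]:
    print(f"{label:18s}: {rate_mb_s:6.0f} MB/s "
          f"({rate_mb_s / aggregate_mb_s:.0%} of nominal drive capacity)")
```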

The significance of this result, and the purpose of the data challenge, was to show that CERN’s IT Division is on track to cope with the enormous data rates expected from the LHC. One experiment alone, ALICE, is expected to produce data at rates of 1.25 GB/s.

In order to simulate the LHC data acquisition procedure, an equivalent stream of artificial data was generated using 40 computer servers. These data were stored temporarily to 60 disk servers, which included the CERN opencluster servers, before being transferred to the tape servers. A key contributing factor to the success of the data challenge was a high-performance switched network from Enterasys Networks with 10 Gbps Ethernet capability, which routed the data from PC to disk and from disk to tape.

An open dialogue

While many of the benefits of the CERN openlab for the industrial partners stem from such data challenges, there is also a strong emphasis in openlab’s mission on the opportunities that this novel partnership provides for enhanced communication and cross-fertilization between CERN and the partners, and between the partners themselves. Top engineers from the partner companies collaborate closely with the CERN openlab team in CERN’s IT Division, so that the inevitable technical challenges that arise when dealing with new technologies are dealt with rapidly and efficiently. Furthermore, as part of their sponsorship, HP is funding two CERN fellows to work on the CERN opencluster. The CERN openlab team also organizes thematic workshops on specific topics of interest, bringing together leading technical experts from the partner companies, as well as public “First Tuesday” events on general technology issues related to the openlab agenda, which attract hundreds of participants from the industrial and investor communities.

A CERN openlab student programme has also been created, bringing together teams of students from different European universities to work on applications of Grid technology. And the CERN openlab is actively supporting the establishment of a Grid café for the CERN Microcosm exhibition – a Web café for the general public with a focus on Grid technologies, including a dedicated website that will link to instructive Grid demos.

Efforts are ongoing in the CERN openlab to evaluate other possible areas of technological collaboration with current or future partners. The concept is certainly proving popular, with other major IT companies expressing an interest in joining. This could occur by using complementary technologies to provide added functionality and performance to the existing opencluster. Or it could involve launching new projects that deal with other aspects of Grid technology relevant to the LCG, such as Grid security and mobile access to the Grid.

In conclusion, the CERN openlab puts a new twist on an activity – collaboration with leading IT companies – that has been going on at CERN for decades. Whereas traditionally such collaboration was bilateral and focused on “here-and-now” solutions, the CERN openlab brings a multilateral long-term perspective into play. This may be a useful prototype for future industrial partnerships in other high-tech areas, where CERN and a range of partners can spread their risks and increase their potential for success by working on long-term development projects together.

Further reading

For more information about CERN openlab, see the website at www.cern.ch/openlab.

The post The CERN openlab: a novel testbed for the Grid appeared first on CERN Courier.

]]>
https://cerncourier.com/a/the-cern-openlab-a-novel-testbed-for-the-grid/feed/ 0 Feature A new model for partnership between CERN and industry is both integrating and testing emerging computer technologies for the LHC Computing Grid. https://cerncourier.com/wp-content/uploads/2019/12/CCSupp_4_Comp_Openlab-JARP.jpg
Internet for the masses https://cerncourier.com/a/viewpoint-internet-for-the-masses/ Mon, 30 Jun 2003 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/viewpoint-internet-for-the-masses/ Onno Purbo, a prominent Indonesian IT expert, sees a self-financed, bottom-up Internet infrastructure as the key to achieving a knowledge-based society in developing countries.

The post Internet for the masses appeared first on CERN Courier.

]]>
What if no telecommunications companies, no government and no World Bank involvement were necessary to develop and build an information and communication technology (ICT) infrastructure in developing countries? What if it cost only $0.50 (€0.43) per student per month to install such an infrastructure in schools in developing countries?

This may sound like an impossible dream for those who live in a developing country, as I do in Indonesia. But fortunately, in reality it can easily be done. It is not the equipment, nor the legislation, nor the investment that counts; it is the ability to educate a critical mass of people to gain the information and knowledge that are vital to the establishment of such an infrastructure.

It seems that the traditional Indonesian telecommunications companies (such as Telco) and the government believe that any ICT infrastructure requires highly skilled and trained personnel to run expensive, sophisticated equipment that can be funded only by multinational investors. This belief is embedded in all legal and policy frameworks within the Indonesian telecommunications industry.

Stronger, smarter and faster

However, let’s take a closer look at ICT, and note a few important features of how it has developed. It has recently become more powerful, smarter and faster, and has more memory than ever before. Fortunately, all of these advanced features can now be obtained at much lower costs, and are also much easier to use, configure and control. ICT has become more user-friendly – with dramatic consequences.

The investment required in infrastructure can now be drastically reduced to a level that makes it affordable for a household or community to build and operate their own ICT system. Moreover, it can be operated by people with limited technical skills. This enables a community-based telecommunications infrastructure to be built by the people and run by the people, for the people.

It is a totally different concept and a significant paradigm shift away from the traditional telecommunications infrastructure, which is normally licensed by the government and built and run by the telecommunications operators for the subscribers. Unfortunately, most telecommunications policies and regulations, at least in Indonesia, cannot easily be adapted to accommodate such a shift.

After seven years of trying to educate the Indonesian government about the basic idea of a community-based ICT infrastructure, in 1996 I succeeded in having it partially written into some sections in the Indonesian National Information Infrastructure policy known as “Nusantara 21”.

However, in February 2000, fed up with the lack of progress, I left my work as a civil servant to dedicate myself to becoming an IT writer, delivering ICT knowledge to Indonesians through various media, such as CD-ROMs, the Web, books, talk shows, seminars and workshops, as well as answering e-mails on more than 100 Internet mailing lists. Since then, experience has proved that a knowledgeable society with access to new ICT equipment can easily deploy a self-financed infrastructure, thus releasing its dependence on the telecommunications companies as well as on its own government.

Two major technologies are used as the backbone of this Indonesian bottom-up, community-based telecommunications infrastructure, namely wireless Internet (WiFi) and Voice over Internet Protocol (VoIP). WiFi-based systems, when run at 2.4 and 5.8 GHz and extended by simple external antennas, are quite good for 5-8 km links. This makes it possible to bypass the Telco system’s “last mile” and enables the NeighborhoodNet Internet Service Provider to reduce access costs.
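
A rough free-space link budget shows why such distances are feasible with modest equipment. The sketch below applies the standard Friis path-loss formula at 2.4 GHz; the transmit power, antenna gains and receiver sensitivity are assumed, illustrative values rather than figures from the Indonesian deployments, and real links would also need line of sight and some margin for cable losses.

```python
# Illustrative 2.4 GHz link budget for a long-range WiFi point-to-point link.
# All radio parameters below are assumed, typical values, not from the article.
import math

def free_space_path_loss_db(distance_km, freq_mhz):
    """Friis free-space path loss in dB."""
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz) + 32.44

tx_power_dbm = 15        # typical WiFi card output
antenna_gain_dbi = 24    # directional external antenna at each end
rx_sensitivity_dbm = -85 # receiver sensitivity at a low data rate

for d_km in (5, 8):
    fspl = free_space_path_loss_db(d_km, 2400)
    rx_power = tx_power_dbm + 2 * antenna_gain_dbi - fspl
    margin = rx_power - rx_sensitivity_dbm
    print(f"{d_km} km: path loss {fspl:.0f} dB, received {rx_power:.0f} dBm, "
          f"fade margin {margin:.0f} dB")
```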

This infrastructure currently supports about 4 million Indonesian individuals, more than 2000 cyber cafés and more than 1500 schools on the Internet, running on more than 2500 WiFi nodes. It has increased dramatically in size in the past few years.

Building on the infrastructure

Because the Indonesian government is planning to increase phone tariffs in mid-2003, a free VoIP infrastructure, also known as Indonesian VoIP MaverickNet, was deployed on top of the Indonesian Internet infrastructure in early January 2003. Within around three months, we managed to deploy more than 150 VoIP gatekeepers based on www.gnugk.org freeware to handle approximately 1000 calls per gatekeeper per day for more than 3000 registered users and an estimated 8000 or more unregistered users.

Long-distance and local calls are routed through the Internet infrastructure without any Telco interconnection, via a VoIP MaverickNet area code, +6288, that has been specifically assigned to this task. Users can also be called and registered to the VoIP gatekeeper using their normal Telco number if they wish – this can easily be done, as the gatekeeper can recognize any form of number. This has the side-effect that people can be called on their Telco number at no charge via VoIP MaverickNet, thus avoiding the expensive Telco infrastructure.

A community-based telecommunications infrastructure would not have been possible without the generous knowledge-sharing of many people on the Internet. I thank them all.

The post Internet for the masses appeared first on CERN Courier.

]]>
Opinion Onno Purbo, a prominent Indonesian IT expert, sees a self-financed, bottom-up Internet infrastructure as the key to achieving a knowledge-based society in developing countries. https://cerncourier.com/wp-content/uploads/2003/06/cernvie1_7-03.jpg
Laboratories link up to win Internet land speed record https://cerncourier.com/a/laboratories-link-up-to-win-internet-land-speed-record/ https://cerncourier.com/a/laboratories-link-up-to-win-internet-land-speed-record/#respond Mon, 31 Mar 2003 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/laboratories-link-up-to-win-internet-land-speed-record/ The transfer of data equivalent to two feature-length movies on DVD between California and Amsterdam in less than one minute has been recognized as a new record by the Internet2 consortium.

The post Laboratories link up to win Internet land speed record appeared first on CERN Courier.

]]>
The transfer of data equivalent to two feature-length movies on DVD between California and Amsterdam in less than one minute has been recognized as a new record by the Internet2 consortium. The operation, which achieved an average speed of more than 923 megabits per second, was by an international team from several laboratories and involved a number of different networking systems.

The team comprised members of NIKHEF, SLAC, Caltech and the University of Amsterdam, with support from CERN. They used the advanced networking capabilities of TeraGrid, StarLight, SURFnet and NetherLight, together with optical networking links provided by Level 3 Communications and Cisco Systems. The transfer involved standard PC hardware running Debian GNU/Linux in Amsterdam and Red Hat Linux in Sunnyvale. The team is supported by the EU-funded DataTAG project and by the US Department of Energy.

The Internet2 Land Speed Record is an open and ongoing competition run by Internet2, a consortium of 200 universities that are working with industry and government to develop network applications and technologies. The record-breaking event, which took place during the SC2002 conference in Baltimore in November, was judged on a combination of the bandwidth used and the distance covered using standard Internet (TCP/IP) protocols. By transferring 6.7 gigabytes across 10,978 km in 58 seconds, the transfer set a record of 9891.60 terabit metres per second.
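
The record metric is simply the sustained bit rate multiplied by the distance covered. The calculation below reproduces it from the figures quoted above; the small difference from the certified 9891.60 figure presumably reflects the exact payload, path length and timing used in the official measurement.

```python
# Internet2 Land Speed Record metric: sustained bit rate x distance.
payload_bytes = 6.7e9        # 6.7 GB transferred
duration_s = 58.0            # in 58 seconds
distance_m = 10_978e3        # across 10,978 km

bit_rate_bps = payload_bytes * 8 / duration_s
metric_tbm_per_s = bit_rate_bps * distance_m / 1e12   # terabit-metres per second

print(f"Average rate : {bit_rate_bps/1e6:.0f} Mbit/s")        # ~924 Mbit/s
print(f"Record metric: {metric_tbm_per_s:,.0f} Tbit-m/s")     # ~10,100, cf. 9891.60 certified
```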

The post Laboratories link up to win Internet land speed record appeared first on CERN Courier.

]]>
https://cerncourier.com/a/laboratories-link-up-to-win-internet-land-speed-record/feed/ 0 News The transfer of data equivalent to two feature-length movies on DVD between California and Amsterdam in less than one minute has been recognized as a new record by the Internet2 consortium.
Canadians set record for long-distance data transfer https://cerncourier.com/a/canadians-set-record-for-long-distance-data-transfer/ https://cerncourier.com/a/canadians-set-record-for-long-distance-data-transfer/#respond Wed, 01 Jan 2003 00:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/canadians-set-record-for-long-distance-data-transfer/ A Canadian team has succeeded in transferring 1 TByte of data over a newly established "lightpath" extending 12,000 km from TRIUMF in Vancouver to CERN in Geneva in under 4 h - a new record rate of 700 Mbps on average.

The post Canadians set record for long-distance data transfer appeared first on CERN Courier.

]]>
A Canadian team has succeeded in transferring 1 TByte of data over a newly established “lightpath” extending 12,000 km from TRIUMF in Vancouver to CERN in Geneva in under 4 h – a new record rate of 700 Mbps on average. Peak rates of 1 Gbps were seen during the tests, which took place in conjunction with the iGRID 2002 conference held in Amsterdam in late September. The previous record for a transatlantic transfer was 400 Mbps.

The achievement is particularly notable because the data were transferred from “disk to disk”, making it a realistic representation of a practical data transfer. The data started on disk at TRIUMF and ended up on disk at CERN, where in principle they could be used for physics analysis. The data transferred were the result of Monte Carlo simulations of the ATLAS experiment, being constructed at CERN to take data at the Large Hadron Collider.

The transfer used a new technology for network data transfer, called a lightpath. Lightpaths establish a direct optical link between two remote computers, essentially positioning them in a “local-area network” that is anything but local. This avoids the need for more complicated arbitration (or routing) of the network traffic. The link used here to connect TRIUMF and CERN is the longest-known single-hop network.
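
A useful way to see why such transfers are demanding is the bandwidth-delay product, the amount of data in flight between sender and receiver at any instant. The estimate below assumes a signal speed of roughly two-thirds of the speed of light in fibre; it is an illustration, not a measurement from the test.

```python
# Bandwidth-delay product for a 12,000 km lightpath at 700 Mbps.
# Assumes signal speed ~2/3 c in fibre; illustrative, not a measured round-trip time.

distance_m = 12_000e3
signal_speed_m_s = 2e8                  # ~2/3 of the speed of light, in glass
rate_bps = 700e6                        # 700 Mbps average transfer rate

rtt_s = 2 * distance_m / signal_speed_m_s
bdp_bytes = rate_bps * rtt_s / 8

print(f"Round-trip time         : {rtt_s*1e3:.0f} ms")       # ~120 ms
print(f"Bandwidth-delay product : {bdp_bytes/1e6:.1f} MB")   # ~10 MB in flight
print("Buffers (e.g. TCP windows) of at least this size are needed to keep the link full.")

# Sanity check on the transfer time: 1 TB at 700 Mbps
transfer_h = 1e12 * 8 / rate_bps / 3600
print(f"1 TB at 700 Mbps        : {transfer_h:.1f} h")        # ~3.2 h, i.e. under 4 h
```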

The post Canadians set record for long-distance data transfer appeared first on CERN Courier.

]]>
https://cerncourier.com/a/canadians-set-record-for-long-distance-data-transfer/feed/ 0 News A Canadian team has succeeded in transferring 1 TByte of data over a newly established "lightpath" extending 12,000 km from TRIUMF in Vancouver to CERN in Geneva in under 4 h - a new record rate of 700 Mbps on average.
Karlsruhe Grid computing centre is inaugurated https://cerncourier.com/a/karlsruhe-grid-computing-centre-is-inaugurated/ https://cerncourier.com/a/karlsruhe-grid-computing-centre-is-inaugurated/#respond Wed, 01 Jan 2003 00:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/karlsruhe-grid-computing-centre-is-inaugurated/ To cope with the computational requirements of the LHC experiments, a worldwide virtual computing centre is being developed - a global computational grid of tens of thousands of computers and storage devices.

The post Karlsruhe Grid computing centre is inaugurated appeared first on CERN Courier.

]]>
The inauguration colloquium for the Grid Computing Centre Karlsruhe (GridKa) was held on 30 October at the Forschungszentrum Karlsruhe (FZK). FZK hosts the German Tier 1 centre for the Large Hadron Collider (LHC) experiments (ALICE, ATLAS, CMS and LHCb), as well as four other particle physics experiments (BaBar at the Stanford Linear Accelerator Center, CDF and D0 at Fermilab, and COMPASS at CERN).

To cope with the computational requirements of the LHC experiments, a worldwide virtual computing centre is being developed – a global computational grid of tens of thousands of computers and storage devices. About one-third of the capacity will be at CERN, with the other two-thirds in regional computing centres spread across Europe, America and Asia. At the end of 2001, the German HEP community proposed FZK as the host of the German regional centre for LHC computing, and as the analysis centre for BaBar, CDF, D0 and COMPASS. FZK, a German national laboratory of similar size to CERN, accepted the challenge and established GridKa.

After just nine months a milestone has been reached – more than 300 processors and about 40 TByte of disk space are available for physicists from 41 research groups of 19 German institutes. The application software of the eight experiments, as well as grid middleware, has been installed. BaBar was the pilot user and is still the main customer. CDF and D0 have started to use GridKa for the analysis of Tevatron data. During the summer of 2002, ATLAS and ALICE used the centre for their worldwide distributed data challenges. The University of Karlsruhe CMS group uses the centre for analysis jobs.

On 29-30 October, the first GridKa users’ meeting was held. On the first day, more than 50 participants attended tutorials about grid computing, the Globus toolkit, software of the European DataGrid project, and the ALICE grid environment AliEn. The second day continued with presentations on GridKa and the status and plans of the experiments. An important contribution was a talk by CERN’s Ingo Augustin, who discussed the “European Data Grid: First Steps towards Global Computing”.

The highlight of the users’ meeting was the inauguration colloquium for GridKa, with almost 200 representatives from science, industry and politics. After an introduction by Reinhard Maschuw of FZK, there were talks about grid computing by Hermann Schunck of the German Federal Ministry of Education and Research, Marcel Kunze of FZK, Tony Hey representing the UK e-Science Initiative, Siegfried Bethke of the Max Planck Institute in Munich, Michael Resch of the University of Stuttgart, Philippe Bricard of IBM France, and CERN’s Les Robertson. The central theme of all of the talks was the conviction that grid computing will be an important part of the computing infrastructure of the 21st century. The particle physics community will drive the first large-scale deployment of a worldwide grid, which will have a significant impact on future scientific and industrial applications.

The post Karlsruhe Grid computing centre is inaugurated appeared first on CERN Courier.

]]>
https://cerncourier.com/a/karlsruhe-grid-computing-centre-is-inaugurated/feed/ 0 News To cope with the computational requirements of the LHC experiments, a worldwide virtual computing centre is being developed - a global computational grid of tens of thousands of computers and storage devices. https://cerncourier.com/wp-content/uploads/2003/01/cernnews9_1-03-feature.jpg
CERN hosts First Tuesday meeting https://cerncourier.com/a/cern-hosts-first-tuesday-meeting/ https://cerncourier.com/a/cern-hosts-first-tuesday-meeting/#respond Sun, 01 Dec 2002 00:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/cern-hosts-first-tuesday-meeting/ Some 250 people came to CERN to learn about the latest developments in Grid computing technology.

The post CERN hosts First Tuesday meeting appeared first on CERN Courier.

]]>
CERN played host to a meeting of First Tuesday Suisse romande, a network for innovation and technology in French-speaking Switzerland, in September. Some 250 people came to the laboratory to learn about the latest developments in Grid computing technology. Topics covered Grid development at CERN and in industry, including an insight into the emerging field of Grid economics and an example of how Grid technologies are having an impact in the medical arena. The event also marked the company Hewlett-Packard joining the CERN openlab for DataGrid applications, other sponsors of this industrial collaboration being Intel and Enterasys Networks.

Now in its fourth year of operation, First Tuesday Suisse romande was founded to provide a forum for entrepreneurs, investors and all those interested in new technology. Its Geneva meetings are held on the first Tuesday of each month, and take the form of a few short presentations followed by an informal networking session. CERN’s director for technology transfer and scientific computing, Hans Hoffmann, plans to host more First Tuesday events at CERN. “The Large Hadron Collider project,” he explains, “is a goldmine of technological innovation, ideally suited for the kind of networking events First Tuesday holds.” This first event was broadcast on the Web to ensure a wider international audience could benefit.

The post CERN hosts First Tuesday meeting appeared first on CERN Courier.

]]>
https://cerncourier.com/a/cern-hosts-first-tuesday-meeting/feed/ 0 News Some 250 people came to CERN to learn about the latest developments in Grid computing technology.
LHC and the Grid: the great challenge https://cerncourier.com/a/viewpoint-lhc-and-the-grid-the-great-challenge/ Sun, 01 Dec 2002 00:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/viewpoint-lhc-and-the-grid-the-great-challenge/ Ultra-high performance distributed computing software is vital for a successful LHC physics programme. This presents a challenge and an opportunity, says Robert Eisenstein.

The post LHC and the Grid: the great challenge appeared first on CERN Courier.

]]>
“May you live in interesting times,” says the old Chinese proverb, and we surely do. We are at a time in history when many fundamental notions about science are changing rapidly and profoundly. Natural curiosity is blurring the old boundaries between fields: astronomy and physics are now one indivisible whole; the biochemical roots of biology drive the entire field; and for all sciences the computational aspects, for both data collection and simulation, are now indispensable.

Cheap, readily available, powerful computational capacity and other new technologies allow us to make incredibly fine-grained measurements, revealing details never observable before. We can simulate our detectors and basic physical processes at a level of precision that was unimaginable just a few years ago. This has led to an enormous increase in the demand for processor speed, data storage and fast networks, and it is now impossible to find at one location all the computational resources necessary to keep up with the data output and processing demands of a major experiment. With LEP, or at Fermilab, each experiment could still take care of its own computing needs, but that modality is not viable at full LHC design luminosities. This is true not only for high-energy physics, but for many other branches of experimental and theoretical science.

Thus the idea of distributed computing was born. It is not a new concept, and there are quite a few examples already in existence. However, applied to the LHC, it means that the success of any single large experiment now depends on the implementation of a highly sophisticated international computational “Grid”, capable of assembling and utilizing the necessary processing tools in a way that is intended to be transparent to the user.

Many issues then naturally arise. How will these various “Grids” share the hardware fabric that they necessarily cohabit? How can efficiencies be achieved that optimize its use? How can we avoid needless recreations of software? How will the Grid provide security from wilful or accidental harm? How much will it cost to implement an initial Grid? What is a realistic timescale? How will all this be managed, and who is in charge?

It is clear that we have before us a task that requires significant advances in computer science, as well as a level of international co-operation that may be unprecedented in science. Substantial progress is needed over the next 5-7 years, or else there is a strong possibility that the use of full LHC luminosity will not be realized on the timescale foreseen. The event rates would simply be too high to be processed computationally.

Most of these things are known, at least in principle. In fact, there are national Grid efforts throughout Europe, North America and Asia, and there are small but significant “test grids” in high-energy physics already operating. The Global Grid Forum is an important medium for sharing what is known about this new computing modality. At CERN, the LHC Computing Grid Project working groups are hard at work with colleagues throughout the high-energy physics community, a principal task being to facilitate close collaboration between the LHC experiments to define common goals and solutions. The importance of doing this cannot be overstated.

As is often the case with high technology, it is hard to plan in detail because progress is so rapid. And creativity – long both a necessity and a source of pride in high-energy physics – must be preserved. Budgetary aspects and international complexities are also not simple. But these software systems must soon be operational at a level consistent with what the detectors will provide, in exactly the same way as for other detector components. I believe it is time to depart from past practice and to begin treating software as a “deliverable” in the same way we do those other components. That means bringing to bear the concepts of modern project management: clear project definition and assignments; clear lines of responsibility; careful evaluations of resources needed; resource-loaded schedules with milestones; regular assessment and review; and detailed memoranda to establish who is doing what. Will things change en route? Absolutely. But as Eisenhower once put it: “Plans are useless, but planning is essential.”

Several people in the software community are concerned that such efforts might be counter-productive. But good project management incorporates all of the essential intangible factors that make for successful outcomes: respect for the individuals and groups involved; proper sharing of both the resources available and the credit due; a degree of flexibility and tolerance for change; and encouragement of creative solutions.

As has happened often before, high-energy physics is at the “bleeding edge” of an important technological advance – indeed, software is but one among many. One crucial difference today is the high public visibility of the LHC project and the worldwide attention being paid to Grid developments. There may well be no other scientific community capable of pulling this off, but in fact we have no choice. It is a difficult challenge, but also a golden opportunity. We must make the most of it!

The post LHC and the Grid: the great challenge appeared first on CERN Courier.

]]>
Opinion Ultra-high performance distributed computing software is vital for a successful LHC physics programme. This presents a challenge and an opportunity, says Robert Eisenstein. https://cerncourier.com/wp-content/uploads/2002/12/cernview1_12-02.jpg
Grid technology developed by ALICE https://cerncourier.com/a/grid-technology-developed-by-alice/ https://cerncourier.com/a/grid-technology-developed-by-alice/#respond Fri, 01 Nov 2002 00:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/grid-technology-developed-by-alice/ The ALICE experiment, which is being prepared for CERN's Large Hadron Collider, has developed the ALICE production environment.

The post Grid technology developed by ALICE appeared first on CERN Courier.

]]>
The ALICE experiment, which is being prepared for CERN’s Large Hadron Collider, has developed the ALICE production environment (AliEn), which implements many components of the Grid computing technologies that will be needed to analyse ALICE data. Through AliEn, the computer centres that participate in ALICE can be seen and used as a single entity – any available node executes jobs and file access is transparent to the user, wherever in the world a file might be.

For AliEn, the ALICE collaboration has adopted the latest Internet standards for information exchange (known as Web Services), along with strong certificate-based security and authentication protocols. The system is built around open-source components and provides an implementation of a Grid system applicable to cases where handling many distributed read-only files is required.

AliEn aims to offer a stable interface for ALICE researchers over the lifetime of the experiment (more than 20 years). As progress is made in the definition of Grid standards and interoperability, AliEn will be progressively interfaced to emerging products from both Europe and the US. Moreover, it is not specific to ALICE, and has already been adopted by the MammoGrid project (supported by the European Union), which aims to create a pan-European database of mammograms.

ALICE is currently using the system for distributed production of Monte Carlo data at more than 30 sites on four continents. During the last year more than 15,000 jobs have been run under AliEn control worldwide, totalling 25 CPU years and producing 20 Tbyte of data. Information about AliEn is available at http://alien.cern.ch.
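
Dividing those production totals gives a feel for the typical job AliEn is scheduling; the averages below use only the figures quoted in the paragraph above.

```python
# Average job size implied by the AliEn production figures quoted above.
jobs = 15_000
cpu_years = 25.0
data_tb = 20.0

cpu_hours_per_job = cpu_years * 365 * 24 / jobs
output_gb_per_job = data_tb * 1000 / jobs

print(f"Average CPU time per job: {cpu_hours_per_job:.1f} h")   # ~15 h
print(f"Average output per job  : {output_gb_per_job:.2f} GB")  # ~1.3 GB
```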

The post Grid technology developed by ALICE appeared first on CERN Courier.

]]>
https://cerncourier.com/a/grid-technology-developed-by-alice/feed/ 0 News The ALICE experiment, which is being prepared for CERN's Large Hadron Collider, has developed the ALICE production environment.
Physicists create font for antimatter https://cerncourier.com/a/physicists-create-font-for-antimatter/ https://cerncourier.com/a/physicists-create-font-for-antimatter/#respond Mon, 30 Sep 2002 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/physicists-create-font-for-antimatter/ Physicists from the University of Mississippi in the US have developed the font, LinguistA, which makes it easier to represent antimatter particles in Microsoft Word.

The post Physicists create font for antimatter appeared first on CERN Courier.

]]>
Have you ever been frustrated by the difficulty of representing antiparticles in a Microsoft Word document, where you have to resort to writing “-bar” after the letter denoting the particle – for example as in K-bar? Now help is at hand, at least for Apple Macintosh users, in the form of a font that allows bars, or “overlines”, to be added to English characters and the most commonly used Greek characters. Physicists from the University of Mississippi in the US have developed the font, LinguistA, which allows you to make a K-bar, for example, by simply typing shift-5 followed by K.

More information is available at http://www.arxiv.org/abs/hep-ex/0208028.

The post Physicists create font for antimatter appeared first on CERN Courier.

]]>
https://cerncourier.com/a/physicists-create-font-for-antimatter/feed/ 0 News Physicists from the University of Mississippi in the US have developed the font, LinguistA, which makes it easier to represent antimatter particles in Microsoft Word.
D0 physicists ‘shake hands’ across the Atlantic in key grid test https://cerncourier.com/a/d0-physicists-shake-hands-across-the-atlantic-in-key-grid-test/ https://cerncourier.com/a/d0-physicists-shake-hands-across-the-atlantic-in-key-grid-test/#respond Wed, 29 May 2002 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/d0-physicists-shake-hands-across-the-atlantic-in-key-grid-test/ In an important test of datagrid technology, members of the D0 collaboration at Fermilab have successfully communicated across the Atlantic with colleagues in the UK.

The post D0 physicists ‘shake hands’ across the Atlantic in key grid test appeared first on CERN Courier.

]]>
In an important test of datagrid technology, members of the D0 collaboration at Fermilab have successfully communicated across the Atlantic with colleagues in the UK. The aim of the grid is not only to make it possible to access data remotely on different machines, but also to enable data processing to take place on remote machines.

A vital first step in achieving this goal is to allow individuals wishing to access the grid to identify themselves and show that they are authorized users. A two-way trust must be established between the individual and the machine that is being used.

In the tests, carried out in February, Fermilab exchanged files with Lancaster University, UK and Imperial College, London, after the transfers had been authenticated using certificates issued by the Department of Energy ScienceGrid and the UK High Energy Physics Certificate Authority.

The “firewalls” installed in many computer systems are making it increasingly difficult to access computers remotely, which is the antithesis of the philosophy behind the grid. The authentication system is intended to provide a means of allowing secure access so that the grid can operate effectively. In February’s tests the certificates were used to establish trust between users and machines at Fermilab, Imperial College and Lancaster.

Although in this case the users were members of the same collaboration, the transfers took place as though the users were completely unknown to one another. This approach was used to test the Globus Toolkit – the software tool that was used to build the authentication system.

This software is currently being developed by the US-based Globus Project to bring about the higher level of computer access that will be essential if the grid is to fulfil its promise in a wider context.
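
The “two-way trust” at the heart of this test is certificate-based mutual authentication: each side presents a certificate signed by an authority the other has chosen to trust. The sketch below illustrates the same idea with ordinary TLS using Python’s standard ssl module; it is a generic illustration with assumed placeholder file names, not code from the Globus Toolkit or its security infrastructure.

```python
# Generic illustration of certificate-based mutual ("two-way") trust,
# in the spirit of the Grid authentication test described above.
# Uses Python's standard ssl module, NOT the Globus Toolkit; all file
# names are assumed placeholders.
import socket
import ssl

def make_server_context(cert="host-cert.pem", key="host-key.pem",
                        ca="trusted-cas.pem") -> ssl.SSLContext:
    """Server side: present a host certificate and require a client
    certificate signed by one of the trusted certificate authorities."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=cert, keyfile=key)
    ctx.load_verify_locations(cafile=ca)
    ctx.verify_mode = ssl.CERT_REQUIRED   # reject clients without a valid certificate
    return ctx

def make_client_context(cert="user-cert.pem", key="user-key.pem",
                        ca="trusted-cas.pem") -> ssl.SSLContext:
    """Client side: present a user certificate and verify the server's one."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_cert_chain(certfile=cert, keyfile=key)
    ctx.load_verify_locations(cafile=ca)
    return ctx

def connect(host: str, port: int) -> None:
    """Open a mutually authenticated connection and print the server's identity."""
    ctx = make_client_context()
    with socket.create_connection((host, port)) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            print("Authenticated; server subject:", tls.getpeercert().get("subject"))

# Example call against a hypothetical endpoint:
# connect("gridnode.example.org", 4433)
```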

The post D0 physicists ‘shake hands’ across the Atlantic in key grid test appeared first on CERN Courier.

]]>
https://cerncourier.com/a/d0-physicists-shake-hands-across-the-atlantic-in-key-grid-test/feed/ 0 News In an important test of datagrid technology, members of the D0 collaboration at Fermilab have successfully communicated across the Atlantic with colleagues in the UK. https://cerncourier.com/wp-content/uploads/2002/05/cernnews5_6-02.jpg
Particle physics software aids space and medicine https://cerncourier.com/a/particle-physics-software-aids-space-and-medicine/ https://cerncourier.com/a/particle-physics-software-aids-space-and-medicine/#respond Wed, 29 May 2002 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/particle-physics-software-aids-space-and-medicine/ Geant4 is a showcase example of technology transfer from particle physics to other fields such as space and medical science, argue Maria Grazia Pia and Jürgen Knobloch.

The post Particle physics software aids space and medicine appeared first on CERN Courier.

]]>
Simulation programs play a fundamental role in optimizing the design of particle physics experiments. In the development of reconstruction programs, they provide the necessary input in the form of simulated raw data. In the analysis process they are required to understand the systematic effects resulting from detector resolution and acceptance, as well as the influence of background processes. The predecessors of the Geant4 toolkit – which were written in the now almost obsolete Fortran language – were successfully used at CERN for experiments at the laboratory’s Large Electron-Positron collider and for the design of experiments for the Large Hadron Collider (LHC).

Geant4 was launched as an R&D project in 1994 to demonstrate the suitability of object-oriented programming technology for large software projects in particle physics. The initial collaboration of members of particle physics institutes around the world has since been joined by scientists from the European Space Agency (ESA) and members of the medical community.

The Geant4 software toolkit was designed to simulate particle interactions with matter for particle physics. It contains components to model in detail the geometry and materials of complex particle detectors. The simulated particles are propagated through magnetic and electrical fields and through the materials of the detectors. The core of the program contains information on numerous physics processes that govern the interactions of particles across a wide energy range. Visualization tools and a flexible user interface are available as separate components. Rigorous software engineering makes Geant4 open to change in a rapidly evolving software environment, while at the same time ensuring that it can be easily and fully maintained over the lifetime of large-scale experiments.
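
To give a flavour of what happens inside such a toolkit, the toy sketch below tracks particles through a slab of material by repeatedly sampling the distance to the next interaction from an exponential distribution. It is a deliberately simplified illustration of the stepping idea only; it is not Geant4 code, and the interaction length and energy loss used are arbitrary.

```python
# Toy illustration of Monte Carlo particle transport through a slab of material.
# This is NOT Geant4; the interaction length and energy loss per interaction
# are arbitrary values chosen only to illustrate the stepping concept.
import random

def track_particle(energy_mev=1000.0, slab_thickness_cm=50.0,
                   interaction_length_cm=10.0, energy_loss_fraction=0.3,
                   rng=random.Random(42)):
    """Propagate one particle, sampling the distance to each interaction
    from an exponential distribution, until it exits or is absorbed."""
    position = 0.0
    while energy_mev > 1.0:                          # stop tracking below 1 MeV
        step = rng.expovariate(1.0 / interaction_length_cm)
        position += step
        if position >= slab_thickness_cm:
            return "escaped", energy_mev
        energy_mev *= (1.0 - energy_loss_fraction)   # crude energy loss per interaction
    return "absorbed", energy_mev

outcomes = [track_particle(rng=random.Random(seed)) for seed in range(1000)]
escaped = sum(1 for fate, _ in outcomes if fate == "escaped")
print(f"Escaped the slab: {escaped}/1000 particles")
```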

Accurate simulations

Geant4 was publicly released in December 1998 and has since been further developed. All Geant4 code and documentation is openly available via the Web. At a recent conference on calorimetry in particle physics at the California Institute of Technology, US, the quality of Geant4’s simulation of the response of electromagnetic and hadronic showers in calorimeters was demonstrated in comparisons of test-beam data with simulation. One of the speakers for the ATLAS experiment (currently in preparation for the LHC) concluded that Geant4 is mature enough as a toolkit, with sufficient physics for electromagnetic showers implemented, to be considered for large-scale detectors.

Other speakers reported on the first results of ongoing comparison projects of hadronic interactions in calorimeters. These first results look very promising. In fact, Geant4 is used in production for the BaBar experiment at the Stanford Linear Accelerator Center, US, and more than 300 million events have been simulated already. This, together with the fact that Geant4 applications are as fast as similar Fortran-based applications, shows that object-oriented technology is capable of standing up to the challenge.

Simulation is equally important in space-based astroparticle physics. Most space probes need to be able to operate for many years without the possibility of physical repair after launch. It is therefore essential to be able to predict the behaviour of all components in the space environment, and in particular to judge the likely effect of radiation on on-board electronics and detectors. The availability of the ISO standard for the exchange of product data (STEP) interface in Geant4 is especially advantageous, as the use of professional computer-aided design tools is commonplace in the aerospace industry.

Geant4 was first used for space applications by ESA in 1999, when ESA and the US National Aeronautics and Space Administration (NASA) each launched an X-ray telescope. Both telescopes follow highly eccentric orbits, reaching at their far point one-third of the distance to the Moon. NASA’s Chandra was launched in July 1999. During the initial phase of operation, some of the front-illuminated charge-coupled devices (CCDs) experienced an unexpected degradation in charge-transfer efficiency. ESA scientists, who had been planning to launch their X-ray multi-mirror (XMM) Newton Observatory in December 1999, needed to understand the possible origin of this problem to protect their detectors from similar damage.

cerngeant2_6-02

The geometries of both telescopes, including the concentric mirror systems, were described using the Geant4 toolkit. Particles, in particular low-energy protons trapped by the Earth’s magnetosphere in the Van Allen radiation belts, were simulated entering the apertures of the telescopes. The simulation revealed that these particles are scattered at shallow angles from the mirror surfaces and are focused onto the surface of the sensitive CCD detectors, completely bypassing the collimators and other elements that were supposed to shield the devices.

cerngeant3_6-02

This simulation explained why NASA’s emergency measure to move the detectors out of the focal plane during the passage of the radiation belt prevented any further degradation. With the Geant4 study’s input, the operational procedures of XMM Newton were arranged so that the detectors were powered off during the passage of the radiation belts for about 8 h of the 48 h orbit. Both telescopes now deliver magnificent scientific data.

The Dose Estimation by Simulation of the ISS Radiation Environment (DESIRE) project aims to use Geant4 to calculate radiation levels inside the International Space Station’s Columbus module and to estimate the radiation doses received by astronauts. Apart from assessing the risk involved in space missions from exposure to radiation, Geant4 plays an important role in evaluating the performance of particle detectors. For the ESA BepiColombo mission to Mercury, currently planned for launch in 2009, detectors will analyse the spectrum of fluorescence from planetary material induced by solar flares. Using Geant4, the spectra and expected detector response have been simulated and the optimization of the detector technology in the severe radiation environment close to the Sun is under way.

Medical applications

Geant4’s extended set of physics models, which handle both electromagnetic and hadronic interactions, can be used to address a range of medical applications from conventional photon-beam radiotherapy to brachytherapy (using radioactive sources), hadron therapy and boron neutron capture therapy. The tools for describing geometries, materials and electromagnetic fields can precisely model diverse real-life configurations. An interface to the Digital Imaging and Communications in Medicine (DICOM) standard will soon make it possible to import computer tomograph images directly into a Geant4 geometrical model. The quality-assurance methods applied in Geant4, its open-source distribution and its independent validation by a worldwide user community are particularly important in the medical domain.

Geant4 can play a significant role in estimating the accuracy of radiotherapy treatment planning, exemplified by comparisons of its simulations with commercial software and experimental data. One study exploited Geant4’s accurate simulation of electromagnetic interactions down to very low energies to account precisely for effects resulting from source anisotropy. The same method has also been applied to calculate dose distribution for certain superficial brachytherapy applicators where no other treatment-planning software is available.

cerngeant4_6-02

Other studies have exploited Geant4’s capability for precision-modelling of geometries, materials and physics processes to provide accurate dose distributions in heterogeneous geometries. High-precision dose evaluation is important because, in some tumour sites, a 5% under-dosage would decrease local tumour-control probability from around 75% to 50%. As with typical physics applications, in which simulation is used to optimize the design of particle detectors, Geant4 has allowed the optimization of brachytherapy seeds, improving the treatment’s effectiveness while sparing surrounding healthy tissue. The suitability of Geant4 has been demonstrated in advanced radiotherapy techniques, such as intra-operatory and intensity-modulated radiotherapy. Several projects also apply Geant4 in the domain of radiodiagnostics. Possible future extensions include modelling the effects of radiation at the biomolecular level.

Geant4 is developed and maintained by an international collaboration of physicists and computer scientists. The open and collaborative relationship between the development team and its user communities has led to a two-way transfer of technology, with users from fields other than particle physics actively contributing. The expertise of the biomedical and space user communities in simulation has resulted in many significant contributions to Geant4 in areas such as testing and validation, as well as extensions of functionality. These developments bring valuable enhancements to Geant4’s applications in particle physics.

The post Particle physics software aids space and medicine appeared first on CERN Courier.

]]>
https://cerncourier.com/a/particle-physics-software-aids-space-and-medicine/feed/ 0 Feature Geant4 is a showcase example of technology transfer from particle physics to other fields such as space and medical science, argue Maria Grazia Pia and Jürgen Knobloch. https://cerncourier.com/wp-content/uploads/2002/05/cerngeant1_6-02.jpg
Datagrid is put to the test https://cerncourier.com/a/datagrid-is-put-to-the-test/ https://cerncourier.com/a/datagrid-is-put-to-the-test/#respond Mon, 22 Apr 2002 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/datagrid-is-put-to-the-test/ The European Union funded DataGrid project passed its first-year review at the beginning of March.

The post Datagrid is put to the test appeared first on CERN Courier.

]]>
The European Union (EU) funded DataGrid project passed its first-year review at the beginning of March. In a one-day exercise, external experts appointed by the EU watched as a grid testbed was put through its paces. Jobs submitted from several institutes across Europe used grid technology to make the best use of distributed network resources.

cernnews7_5-02

The DataGrid project brings together five European institutions engaged in particle physics research: CERN; the French CNRS; Italy’s INFN; NIKHEF in the Netherlands; and PPARC in the UK, along with the European Space Agency as principal contractors. A total of 17 institutions are involved in developing the so-called “middleware” software to analyse distributed computing, storage and data resources and to determine the best place to run a job. Middleware is also responsible for distributing the computation, as well as handling all necessary logging and bookkeeping, and providing fast, secure data transfer and cataloguing.
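
How such a brokering decision might look in miniature is sketched below: a toy matchmaker that picks the site holding the required dataset and advertising the most free CPUs. The site names and numbers are invented for illustration, and the sketch does not reproduce the interfaces of the actual DataGrid middleware.

```cpp
// Toy resource broker: choose the site that holds the input data and
// currently advertises the most free CPUs. Illustrative only; the real
// DataGrid middleware used far richer matchmaking and bookkeeping.
#include <algorithm>
#include <iostream>
#include <optional>
#include <string>
#include <vector>

struct Site {
    std::string name;
    int freeCpus;                       // advertised free worker nodes
    std::vector<std::string> datasets;  // datasets held locally
};

std::optional<Site> chooseSite(const std::vector<Site>& sites,
                               const std::string& neededDataset) {
    std::optional<Site> best;
    for (const auto& s : sites) {
        const bool hasData =
            std::find(s.datasets.begin(), s.datasets.end(), neededDataset)
            != s.datasets.end();
        if (hasData && s.freeCpus > 0 &&
            (!best || s.freeCpus > best->freeCpus)) {
            best = s;                   // best match so far
        }
    }
    return best;                        // empty if no site can run the job
}

int main() {
    const std::vector<Site> testbed = {
        {"CERN",   120, {"run2001A", "run2001B"}},
        {"CNRS",    40, {"run2001B"}},
        {"INFN",    80, {"run2001A"}},
        {"NIKHEF",  25, {"run2001A", "earthobs01"}},
        {"PPARC",   60, {"biojob01"}},
    };
    if (auto site = chooseSite(testbed, "run2001A"))
        std::cout << "Submitting job to " << site->name << '\n';
}
```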

For the first-year review, computing centres run by the five principal particle physics contractors took part, and 15 jobs were submitted. These came mostly from particle physics experiments, with a smaller number being submitted by the Earth observation and computational biology communities. The jobs were efficiently distributed across the available resources on the testbed network, providing a strong demonstration that the middleware was doing its job correctly. A few of the jobs did not finish as expected, but the reviewers accepted this, saying that they would not have believed that the demonstration was live if there had been no glitches.

The main motivation behind the DataGrid project is storing and processing the enormous amount of data that will be produced by experiments at CERN’s Large Hadron Collider. A data flow of a few petabytes per year is anticipated, and more than 50,000 workstations will be needed for analysis. Although the jobs in the first-year review consumed a total of less than one hour of computing time, they demonstrated the principle that such a task can be handled using a grid approach. In its current state, the testbed can provide 8 months of CPU time in a single day on a total of 242 machines (242 machines running around the clock deliver some 5800 processor-hours per day, roughly eight months of single-CPU time). The next step is to expand testbed use to more users and to conduct more challenging tests.

The post Datagrid is put to the test appeared first on CERN Courier.

]]>
https://cerncourier.com/a/datagrid-is-put-to-the-test/feed/ 0 News The European Union funded DataGrid project passed its first-year review at the beginning of March. https://cerncourier.com/wp-content/uploads/2002/04/cernnews7_5-02.jpg
New high-speed data link between CERN and the US https://cerncourier.com/a/new-high-speed-data-link-between-cern-and-the-us/ https://cerncourier.com/a/new-high-speed-data-link-between-cern-and-the-us/#respond Fri, 22 Feb 2002 00:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/new-high-speed-data-link-between-cern-and-the-us/ The growth of international collaboration in science was underlined last December by the award of a contract to Dutch telecoms provider KPNQwest for a new transatlantic high-speed data link at 622 Mbps, to replace the existing two 155 Mbps links.

The post New high-speed data link between CERN and the US appeared first on CERN Courier.

]]>
The growth of international collaboration in science was underlined last December by the award of a contract to Dutch telecoms provider KPNQwest for a new transatlantic high-speed data link at 622 Mbps, to replace the existing two 155 Mbps links.

Connecting CERN to StarLight™, the optical component of the STAR TAP™ Internet exchange in Chicago, the new link will be funded by a consortium of the French particle and nuclear physics institute (IN2P3), the US Department of Energy and National Science Foundation, the Canadian high-energy physics community, the World Health Organization and CERN. Research users of transatlantic networking should start to notice the benefits from April 2002.

cernnews7_3-02

A second very-high-performance data link operating at 2.5 Gbps, also connecting CERN to StarLight™, is expected to be ordered soon. This is part of the European Union-funded DataTAG (research & technological development for a transatlantic Grid) project, in collaboration with the Department of Energy and National Science Foundation. It will form an important part of the network for the Large Hadron Collider computing Grid.

Optical cables at Chicago’s StarLight™ Internet exchange. StarLight is the emerging optical component of the National Science Foundation-funded STAR TAP™ international interconnection point for advanced research and education networks. (Electronic Visualization Laboratory, University of Illinois, Chicago.)

The post New high-speed data link between CERN and the US appeared first on CERN Courier.

]]>
https://cerncourier.com/a/new-high-speed-data-link-between-cern-and-the-us/feed/ 0 News The growth of international collaboration in science was underlined last December by the award of a contract to Dutch telecoms provider KPNQuest for a new transatlantic high-speed data link at 622 Mbps, to replace the existing two 155 Mbps links. https://cerncourier.com/wp-content/uploads/2002/02/cernnews7_3-02.jpg
Green light for massive increase in computing power for LHC data https://cerncourier.com/a/green-light-for-massive-increase-in-computing-power-for-lhc-data/ https://cerncourier.com/a/green-light-for-massive-increase-in-computing-power-for-lhc-data/#respond Wed, 31 Oct 2001 00:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/green-light-for-massive-increase-in-computing-power-for-lhc-data/ The first phase of the impressive Computing Grid project for CERN's future Large Hadron Collider (LHC) was approved at a special meeting of CERN's Council, its governing body, on 20 September.

The post Green light for massive increase in computing power for LHC data appeared first on CERN Courier.

]]>
cernnews1_11-01

The first phase of the impressive Computing Grid project for CERN’s future Large Hadron Collider (LHC) was approved at a special meeting of CERN’s Council, its governing body, on 20 September.

CERN is gearing up for an unprecedented avalanche of data from the large experiments at the LHC (CERN Courier October p31). After LHC commissioning in 2006, the collider’s four giant detectors will be accumulating more than 10 million Gbytes of particle-collision data each year (equivalent to the contents of about 20 million CD-ROMs). To handle this will require a thousand times as much computing power as is available to CERN today.

Nearly 10,000 scientists, at hundreds of universities round the world, will group in virtual communities to analyse this LHC data. The strategy relies on the coordinated deployment of communications technologies at hundreds of institutes via an intricately interconnected worldwide grid of tens of thousands of computers and storage devices.

The LHC Computing Grid project will proceed in two phases. The first, to be activated in 2002 and continuing in 2003 and 2004, will develop the prototype equipment and techniques necessary for the data-intensive scientific computing of the LHC era. In 2005, 2006 and 2007, Phase 2 of the project, which will build on the experience gained in the first phase, will construct the production version of the LHC Computing Grid.

Phase 1 will require an investment at CERN of SwFr 30 million (some EURO 20 million) which will come from contributions from CERN’s member states and major involvement of industrial sponsors. More than 50 positions for young professionals will be created. Significant investments are also being made by participants in the LHC programme, particularly in the US and Japan, as well as Europe.

This challenge of handling huge quantities of data now being confronted by CERN will be faced subsequently by governments, commerce and other organizations. The LHC will be a computing testbed for the world.

Openlab attracts big names

To push the LHC computing effort, CERN has set up the openlab for DataGrid applications. Already, three leading information technology firms – Enterasys Networks, Intel and KPNQwest – are collaborating on this project in advanced distributed computing. Each firm will invest SwFr 2.5 million (EURO 1.6 million) over three years.

CERN already coordinates one major Grid computing effort – the EU-funded DataGrid project (CERN Courier March p5). An important aim of the CERN openlab is to take the results of these projects and apply them in the LHC Computing Grid.

The World Wide Web, which was developed at CERN during the run-up to research at the LEP collider, allows easy access to previously prepared information. Grid technologies will go further, searching out and analysing data from tens of thousands of interconnected computers and storage devices across the world.

This new capability will enable data stored anywhere to be exploited much more efficiently. Particle physics is blazing a scientific Grid trail for meteorologists, biologists and medical researchers.

See http://www.cern.ch/openlab.

The post Green light for massive increase in computing power for LHC data appeared first on CERN Courier.

]]>
https://cerncourier.com/a/green-light-for-massive-increase-in-computing-power-for-lhc-data/feed/ 0 News The first phase of the impressive Computing Grid project for CERN's future Large Hadron Collider (LHC) was approved at a special meeting of CERN's Council, its governing body, on 20 September. https://cerncourier.com/wp-content/uploads/2001/10/cernnews1_11-01.jpg
CERN sells its internal transaction management software to UK firm https://cerncourier.com/a/cern-sells-its-internal-transaction-management-software-to-uk-firm/ https://cerncourier.com/a/cern-sells-its-internal-transaction-management-software-to-uk-firm/#respond Mon, 01 Oct 2001 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/cern-sells-its-internal-transaction-management-software-to-uk-firm/ CERN has sold its Internal Transaction Management system to UK internal transaction management concern Transacsys for 1 million Swiss francs (EURO 660,000).

The post CERN sells its internal transaction management software to UK firm appeared first on CERN Courier.

]]>
cernnews7-10-01

CERN has sold its Internal Transaction Management system to UK internal transaction management concern Transacsys for 1 million Swiss francs (EURO 660,000). The system, which has been cited by software giant Oracle as the blueprint for building large-scale e-business systems, is being launched commercially. Transacsys is co-operating with Oracle in the marketing of the software.

Internal transactions are the actions that people take and the processes that they use in the course of their job. Internal transactions need managing because organizations need to know and to control how people commit and expend corporate resources.

The CERN software on the one hand empowers individuals to transact and on the other hand controls such transactions in accordance with corporate rules. It has been designed to be totally flexible so that users themselves can create new processes, and implement and change them at will, with no programming required.

CERN, with an annual budget of more than EURO 600 million and more than 6000 regular users working in 500 institutes in 50 different countries, can support the software internally using just two people.

Permissioning is the name that Transacsys has given to this enterprise-wide process, which enables people to have speedy authorization to execute tasks and organizations to control these processes without the need for extensive administrative resources.

In 1990 CERN developed the World Wide Web to help empower its user community of more than 6000 physicists around the world to share information across remote locations. Soon after, when an advanced informatics support project was launched, CERN began to develop what became the Permissioning system. Within this project the system, known at CERN as Electronic Document Handling (EDH), was to provide an electronic replacement for a rickety collection of paper administration forms that had accumulated over the years. Functionality has been progressively extended over more than eight years of constant development.

Transacsys and CERN have formed a long-term joint steering group to co-operate on further development of the system. CERN will, of course, continue to use the system and it will be free for use by other particle physics laboratories associated with CERN.

The post CERN sells its internal transaction management software to UK firm appeared first on CERN Courier.

]]>
https://cerncourier.com/a/cern-sells-its-internal-transaction-management-software-to-uk-firm/feed/ 0 News CERN has sold its Internal Transaction Management system to UK internal transaction management concern Transacsys for 1 million Swiss francs (EURO 660,000). https://cerncourier.com/wp-content/uploads/2001/10/cernnews7-10-01.jpg
Close encounters with clusters of computers https://cerncourier.com/a/close-encounters-with-clusters-of-computers/ https://cerncourier.com/a/close-encounters-with-clusters-of-computers/#respond Mon, 01 Oct 2001 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/close-encounters-with-clusters-of-computers/ To satisfy their ever-increasing demand for more and affordable computing power, particle physics experiments are using clusters of off-the-shelf PCs. A recent workshop at Fermilab looked at the implications of this move.

The post Close encounters with clusters of computers appeared first on CERN Courier.

]]>
cerncomp1-10-01

Recent revolutions in computer hardware and software technologies have paved the way for the large-scale deployment of clusters of off-the-shelf commodity computers to address problems that were previously the domain of tightly coupled multiprocessor computers. Near-term projects within high-energy physics and other computing communities will deploy clusters of some thousands of processors serving hundreds or even thousands of independent users. This will expand the reach in both dimensions by an order of magnitude from the current, successful production facilities.

A Large-Scale Cluster Computing Workshop held at Fermilab earlier this year examined these issues. The goals of the workshop were:

  • to determine what tools exist that can scale up to the cluster sizes foreseen for the next generation of high energy physics experiments (several thousand nodes) and by implication to identify areas where some investment of money or effort is likely to be needed;
  • to compare and record experiences gained with such tools;
  • to produce a practical guide to all stages of planning, installing, building and operating a large computing cluster in HEP;
  • to identify and connect groups with similar interest within HEP and the larger clustering community.

Thousands of nodes

cerncomp2-10-01

Computing experts with responsibility for and/or experience of such large clusters were invited. The clusters of interest were those equipping centres of the size of Tier 0 (thousands of nodes) for CERN’s LHC project, or Tier 1 (at least 200-1000 nodes), as described by the MONARC (Models of Networked Analysis at Regional Centres for LHC Experiments) project. The attendees came not only from various particle physics sites worldwide but also from other branches of science, including biomedicine and various Grid computing projects, as well as from industry.

The attendees freely shared their experiences and ideas, and proceedings are currently being edited from material collected by the convenors and offered by attendees. Drawing on the same material, the convenors are also producing a guide to building and operating a large cluster, intended to describe all phases in the life of a cluster and the tools used or planned. This guide will be publicized (made available on the Web and presented at appropriate meetings and conferences) and kept regularly up to date as more experience is gained. It is planned to hold a similar workshop in 18-24 months to update the guide. All of the workshop material is available via http://conferences.fnal.gov/lccws.

The meeting began with an overview of the challenge facing high-energy physics. Matthias Kasemann, head of Fermilab’s Computing Division, described the laboratory’s current and near-term scientific programme, including participation in CERN’s future LHC programme, notably in the CMS experiment. He described Fermilab’s current and future computing needs for its Tevatron collider Run II experiments, pointing out where clusters, or computing “farms” as they are sometimes known, are used already. He noted that the overwhelming importance of data in current and future generations of high-energy physics experiments had prompted the interest in Data Grids. He posed some questions for the workshop to consider:

  • Should or could a cluster emulate a mainframe?
  • How much could particle physics computer models be adjusted to make most efficient use of clusters?
  • Where do clusters not make sense?
  • What is the real total cost of ownership of clusters?
  • Could we harness the unused power of desktops?
  • How can we use clusters for high I/O applications?
  • How can we design clusters for high availability?

LHC computing needs

Wolfgang von Rueden, head of the Physics Data Processing group in CERN’s Information Technology Division, presented the LHC computing needs. He described CERN’s role in the project, displayed the relative event sizes and data rates expected from Fermilab Run II and from LHC experiments, and presented a table of their main characteristics, pointing out in particular the huge increases in data expected and consequently the huge increase in computing power that must be installed and operated.

The other problem posed by modern experiments is their geographical spread, with collaborators throughout the world requiring access to data and computer power. Von Rueden noted that typical particle physics computing is more appropriately characterized as high throughput computing as opposed to high performance computing.

cerncomp3-10-01

The need to exploit national resources and to reduce the dependence on links to CERN has produced the MONARC multilayered model. This is based on a large central site to collect and store raw data (Tier 0 at CERN) and multiple tiers (for example national computing centres at Tier 1, such as Fermilab for the US part of the CMS experiment at the LHC and Brookhaven for the US part of the ATLAS experiment), down to individual users’ desks (Tier 4), each with data extracts and/or data copies and each performing different stages of physics analysis.

Von Rueden showed where Grid Computing will be applied. He ended by expressing the hope that the workshop could provide answers to a number of topical problem questions, such as cluster scaling and making efficient use of resources, and some good ideas to make progress in the domain of the management of large clusters.

The remainder of the meeting was given over to some formal presentations of clustering as seen by some large sites (CERN, Fermilab and SLAC) and also from small sites without on-site accelerators of their own (NIKHEF in Amsterdam and CCIN2P3 in Lyon). However, the largest part of the workshop was a series of interactive panel sessions, each seeded with questions and topics to discuss, and each introduced by a few short talks. Full details of these and most of the overheads presented during the workshop can be seen on the workshop Web site.

cerncomp5-10-01

Many tools were highlighted: some commercial, some developed locally and some adopted from the open source community. In choosing whether to use commercial tools or develop one’s own, it should be noted that so-called “enterprise packages” are typically priced for commercial sites where downtime is expensive and has quantifiable cost. They usually have considerable initial installation and integration costs. However, one must not forget the often high ongoing costs for home-built tools as well as vulnerability to personnel loss/reallocation.

Discussing the G word

There were discussions on how various institutes and groups performed monitoring, resource allocation, system upgrades, problem debugging and all of the other tasks associated with running clusters. Some highlighted lessons learned and how to improve a given procedure next time. According to Chuck Boeheim of SLAC, “A cluster is a very good error amplifier.”

Different sites described their methods for installing, operating and administering their clusters. The G word (for Grid) cropped up often, but everyone agreed that it was not a magic word and that it would need lots of work to implement something of general use. One of the panels described the three Grid projects of most relevance to high-energy physics, namely the European DataGrid project and two US projects – PPDG (Particle Physics Data Grid) and GriPhyN (Grid Physics Network).

cerncomp4-10-01

A number of sites described how they access data. Within an individual experiment, a number of collaborations have worldwide “pseudo-grids” operational today. In this context, Kors Bos of NIKHEF, Amsterdam, referred to the existing SAM database for the D0 experiment at Fermilab as an “early-generation Grid”. These already point toward issues of reliability, allocation, scalability and optimization for the more general Grid.

Delegates agreed that the meeting had been useful and that it should be repeated in approximately 18 months. No summary was made of the Large-Scale Cluster Computing Workshop, its primary goal being to share experiences, but returning to the questions posed at the start by Matthias Kasemann, it is clear that clusters have replaced mainframes in virtually all of the high-energy physics world, and that administering them is far from simple and poses increasing problems as cluster sizes scale. In-house support costs must be balanced against bought-in solutions, not only for hardware and software but also for operations and management. Finally, delegates attending the workshop agreed that there are several solutions for, and a number of practical examples of, the use of desktop machines to increase the overall computing power available.

The Grid: crossing borders and boundaries

The World Wide Web was invented at CERN to exchange information among particle physicists, but particle physics experiments now generate more data than the Web can handle. So physicists often put data on tapes and ship the tapes from one place to another – an anachronism in the Internet era. However, that is changing, and the US Department of Energy’s new Scientific Discovery through Advanced Computing (SciDAC) program will accelerate the change.

Fermilab is receiving additional funds through SciDAC, some of which will be channelled into Fermilab contributions to the Compact Muon Solenoid (CMS) detector being built for CERN. A major element in this is the formulation of a distributed computing system for widespread access to data when CERN’s Large Hadron Collider (LHC) begins operation in 2006. Fermilab’s D0 experiment has established its own computing grid called SAM, which is used to offer access for experiment collaborators at six sites in Europe.

With SciDAC support, the nine-institution Particle Physics DataGrid collaboration (Fermilab, SLAC, Lawrence Berkeley, Argonne, Brookhaven, Jefferson, CalTech, Wisconsin and UC San Diego) will develop the distributed computing concept for particle physics experiments at the major US high-energy physics research facilities. Both D0 and US participation in the CMS experiment for the LHC are member experiments. The goal is to offer access to the worldwide research community, developing “middleware” to make maximum use of the bandwidths available on the network.

The DataGrid collaboration will serve high-energy physics experiments with large-scale computing needs, such as D0 at Fermilab, BaBar at SLAC and the CMS experiment, now under construction to operate at CERN, by making the experiments’ data available to scientists at widespread locations.

The post Close encounters with clusters of computers appeared first on CERN Courier.

]]>
https://cerncourier.com/a/close-encounters-with-clusters-of-computers/feed/ 0 Feature To satisfy their ever-increasing demand for more and affordable computing power, particle physics experiments are using clusters of off-the-shelf PCs. A recent workshop at Fermilab looked at the implications of this move. https://cerncourier.com/wp-content/uploads/2001/10/cerncomp3-10-01.gif
CERN computing wins top award https://cerncourier.com/a/cern-computing-wins-top-award/ https://cerncourier.com/a/cern-computing-wins-top-award/#respond Tue, 28 Aug 2001 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/cern-computing-wins-top-award/ This prestigious award was made to CERN for its innovative application of information technology to the benefit of society, and it followed the laboratory's nomination by Lawrence Ellison, chairman and CEO of the Oracle Corporation.

The post CERN computing wins top award appeared first on CERN Courier.

]]>
On 4 June in Washington’s National Building Museum, Les Robertson, deputy leader of CERN’s information technology division, accepted a 21st Century Achievement Award from the Computerworld Honors Program on behalf of the laboratory.

cernnews6_9-01

This prestigious award was made to CERN for its innovative application of information technology to the benefit of society, and it followed the laboratory’s nomination by Lawrence Ellison, chairman and CEO of the Oracle Corporation. Ellison nominated CERN in the science category in recognition of “pioneering work in developing a large-scale data warehouse” – an innovative computing architecture that responds precisely to the global particle physics community’s needs.

The kind of computing needed to analyse particle physics data is known as high-throughput computing – a field in which CERN has played a pioneering role for over a decade. In the early 1990s a collaboration of computer scientists from the laboratory, led by Les Robertson, and physicists from many of CERN’s member states developed a computing architecture called SHIFT, which allowed multiple tape, disc and CPU servers to interact over high-performance network protocols. SHIFT’s modular design simultaneously allowed scalability and easy adoption of new technologies.

Over the years, CERN has proved these features by evolving SHIFT from the systems of the 1990s, based on RISC (reduced instruction set computer) workstations and specialized networks, to today’s massive systems. These include thousands of Linux PC nodes linked by gigabit Ethernet to hundreds of terabytes of automated tape storage, fronted by dozens of terabytes of disk cache built from commodity components.

CERN has since worked on evolving SHIFT in collaboration with physicists and engineers from universities and laboratories around the world. Several collaborations with industrial partners have been formed as successive technologies were integrated into the system. Today, SHIFT is in daily use by the many physics experiments that use CERN’s facilities, providing a computing service for more than 7000 researchers worldwide.

For the future, CERN and other particle physics institutes are working on scaling up this innovative architecture to handle tens of thousands of nodes, and incorporating computational grid technology to link the CERN environment with other computing facilities, easing access to the colossal quantities of data that will be produced by experiments at the laboratory’s forthcoming particle accelerator, the Large Hadron Collider, which will switch on in 2006.

Welcoming the award, CERN director-general, Luciano Maiani said: “This is an important recognition of CERN’s excellence in information technology. In particular, it is a reward for the teams of physicists on CERN’s LEP experiments who contributed to the development and implementation of this new architecture. The prize is also an encouragement for the physicists working on the complex challenges of LHC computing.”

Hans Hoffmann, CERN’s director of scientific computing, commented: “In addition to its major contribution to physics, CERN has been a consistent innovator in information technology, from the Web to its current work on grid computing. We are delighted with this prize; particularly as it demonstrates recognition for CERN’s computing initiatives, not from the academic world but from industry’s leading computing experts.”

Also among the winners this year was Tim Berners-Lee, who received the Cap Gemini Ernst & Young Leadership award for Global Integration in recognition of his pioneering work on the World Wide Web – work carried out while he was at CERN in the early 1990s.

* More information on the Computerworld Honors Programme is available at “http://www.cwheroes.org”.

The post CERN computing wins top award appeared first on CERN Courier.

]]>
https://cerncourier.com/a/cern-computing-wins-top-award/feed/ 0 News This prestigious award was made to CERN for its innovative application of information technology to the benefit of society, and it followed the laboratory's nomination by Lawrence Ellison, chairman and CEO of the Oracle Corporation. https://cerncourier.com/wp-content/uploads/2001/08/cernnews6_9-01-feature.jpg
A major SHIFT in outlook https://cerncourier.com/a/a-major-shift-in-outlook/ https://cerncourier.com/a/a-major-shift-in-outlook/#respond Sun, 01 Jul 2001 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/a-major-shift-in-outlook/ I don’t remember exactly who first proposed running physics batch jobs on a UNIX workstation, rather than on the big IBM or Cray mainframes that were doing that kind of production work in 1989 at CERN. The workstation in question was to be an Apollo DN10000, the hottest thing in town with reduced instruction set […]

The post A major SHIFT in outlook appeared first on CERN Courier.

]]>
CERN’s Computer Centre in 1988

I don’t remember exactly who first proposed running physics batch jobs on a UNIX workstation, rather than on the big IBM or Cray mainframes that were doing that kind of production work in 1989 at CERN. The workstation in question was to be an Apollo DN10000, the hottest thing in town with reduced instruction set (RISC) CPUs of a formidable five CERN Units (a CERN Unit was defined as one IBM 370/168, equivalent to four VAX 11-780s) each and costing around SwFr 150 000 for a 4-CPU box.

It must have been the combined idea of Les Robertson, Eric McIntosh, Frederic Hemmer, Jean-Philippe Baud, myself and perhaps some others who were working at that time around the biggest UNIX machine that had ever crossed the threshold of the Computer Centre – a Cray XMP-48, running UNICOS.

At any rate, when we spoke to the Apollo salespeople about our idea, they liked it so much that they lent us the biggest box they had, a DN10040 with four CPUs plus a staggering 64 Mb of memory and 4 Gb of disk space. Then, to round it off, they offered to hire a person of our choice for three years to work on the project at CERN.

In January 1990 the machine was installed and our new “hireling”, Erik Jagel, an Apollo expert after his time managing the Apollo farm for the L3 experiment, coined the name “HOPE” for the new project. (Hewlett-Packard had bought Apollo and OPAL had expressed interest, so it was to be the “HP OPAL Physics Environment”).

We asked where we could find the space to install HOPE in the Computer Centre. We just needed a table with the DN10040 underneath and an Ethernet connection to the Cray, to give us access to the tape data. The reply was: “Oh, there’s room in the middle” – where the recently obsolete round tape units had been – so that was where HOPE went, looking quite lost in the huge computer room, with the IBM complex on one side and the Cray supercomputer on the other.

Soon the HOPE cycles were starting to flow. The machine was surprisingly reliable, and porting the big physics FORTRAN programs was easier than we had expected. After around six months, the system was generating 25 per cent of all CPU cycles in the centre. Management began to notice the results when we included HOPE’s accounting files in the weekly report we made that plotted such things in easy-to-read histograms.

We were encouraged by this success and went to work on a proposal to extend HOPE. The idea was to build a scalable version from interchangeable components: CPU servers, disk servers and tape servers, all connected by a fast network and software to create a distributed mainframe. “Commodity” became the keyword – we would use the cheapest building-blocks available from the manufacturers that gave the best price performance for each function.

SHIFT-distributed computing arrays in 1998

On how large a scale could we build such a system and what would it cost? We asked around, and received help from some colleagues who treated it as a design study. A simulation was done of the workflow through such a system, bandwidth requirements were estimated for the fast network “backplane” that was needed to connect everything, prices were calculated, essential software was sketched out and the manpower required for development and operation was predicted.

Software development would be a challenge. Fortunately, some of us had been working with Cray at CERN, adding some facilities to UNIX that were vital for mainframe computing: a proper batch scheduler and a tape-drive reservation system, for example. These could be reused quite easily.

Other new functions would include a distributed “stager” and a “disk-pool manager”. These would allow the pre-assembly of each job’s tape data (read from drives on tape servers) into efficiently-managed disk pools that would be located on disk servers, ready to be accessed by the jobs in the CPU servers. Also new would be the “RFIO”, a remote file input-output package that would offer a unified and optimized data-transfer service between all of the servers via the backplane. It looked like Sun’s networking filing system, but was much more efficient.
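
The flavour of such a package can be pictured as a thin, POSIX-like wrapper that decides whether a path refers to a local disk or to a file held on a remote disk server. The sketch below is a hypothetical illustration of that routing idea only; the names and signatures are invented and are not the real RFIO interface.

```cpp
// Hypothetical illustration of a POSIX-style remote file open: paths of the
// form "host:/path" would be routed to a disk server over the network, while
// plain paths fall through to ordinary local I/O. Only the routing idea is
// shown; the remote branch is indicated rather than implemented, and none of
// this is the actual RFIO API.
#include <cstdio>
#include <string>

struct ParsedPath {
    std::string host;   // empty string means a local file
    std::string path;
};

ParsedPath parse(const std::string& name) {
    const auto colon = name.find(':');
    if (colon == std::string::npos)
        return {"", name};                              // local file
    return {name.substr(0, colon), name.substr(colon + 1)};
}

std::FILE* remote_fopen(const std::string& name, const char* mode) {
    const ParsedPath p = parse(name);
    if (p.host.empty())
        return std::fopen(p.path.c_str(), mode);        // ordinary local open
    // A real implementation would open a TCP connection to the disk server
    // named in p.host and stream the file contents over the fast backplane.
    std::fprintf(stderr, "would contact disk server %s for %s\n",
                 p.host.c_str(), p.path.c_str());
    return nullptr;
}

int main() {
    if (std::FILE* f = remote_fopen("/tmp/local.dat", "r")) std::fclose(f);
    remote_fopen("diskserver01:/pool/opal/run42.rz", "r");
    return 0;
}
```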

SHIFT in focus

Finally, a suitable name was coined, again by Erik Jagel: “SHIFT”, for “Scalable Heterogeneous Integrated FaciliTy”, suggesting the paradigm shift that was taking place in large-scale computing: away from mainframes and towards a distributed low-cost approach.

The “SHIFT” proposal report was finished in July 1990. It had 10 names on it, including the colleagues from several groups that had offered their ideas and worked on the document.

“Were 10 people working on this?” and “How many Cray resources were being used and/or counted?” came the stern reply. In response, we pointed out that most of the 10 people had contributed small fractions of their time, and that the Cray had been used simply as a convenient tape server. It was the only UNIX machine in the Computer Centre with access to the standard tape drives, all of which were physically connected to the IBM mainframe at that time.

Closer to home, the idea fell on more fertile ground, and we were told that if we could persuade at least one of the four LEP experiments to invest in our idea, we could have matching support from the Division. The search began. We spoke to ALEPH, but they replied, “No, thank you, we’re quite happy with our all-VAX VMS approach.” L3 replied, “No thanks, we have all the computing power we need.” DELPHI replied, “Sorry, we’ve no time to look at this as we’re trying to get our basic system running.”

Only OPAL took a serious look. They had already been our partner in HOPE and also had a new collaborator from Indiana with some cash to invest and some small computer system interface (SCSI) disks for a planned storage enhancement to their existing VMS-based system. They would give us these contributions until March 1991, the next LEP start-up – on the condition that everything was working by then, or we’d have to return their money and disks. It was September 1990, and there was a lot of work to do.

Our modular approach and use of the UNIX, C language, TCP/IP and SCSI standards were the keys to the very short timescale we achieved. The design studies had included technical evaluations of various workstation and networking products.

CERN’s Computer Centre in 2001

By September, code development could begin and orders for hardware went out. The first tests on site with SGI Power Series servers connected via UltraNet took place at the end of December 1990. A full production environment was in place by March 1991, the date set by OPAL.

And then we hit a problem. The disk server system began crashing repeatedly with unexplained errors. Our design evaluations had led us to choose a “high-tech” approach: the use of symmetric multiprocessor machines from Silicon Graphics for both CPU and disk servers, connected by the sophisticated “UltraNet” Gigabit network backplane. One supporting argument had been that if the UltraNet failed or could not be made to work in time, then we could put all the CPUs and disks together in one cabinet and ride out the OPAL storm. We hadn’t expected any problems in the more conventional area of the SCSI disk system.

Our disks were mounted in trays inside the disk server, connected via high-performance SCSI channels. It looked standard, but we had the latest models of everything. Like a performance car, it was a marvel of precision but impossible to keep in tune. We tried everything, but still it went on crashing and we finally had to ask SGI to send an engineer. He found the problem: inside our disk trays was an extra metre of flat cable which had not been taken into account in our system configuration. We had exceeded the strict limit of 6 m for single-ended SCSI, and in fact it was our own fault. Rather than charging us penalties and putting the blame where it belonged, SGI lent us two extra CPUs to help us to make up the lost computing time for OPAL and ensure the success of the test period!

At the end of November 1991, a satisfied OPAL doubled its investment in CPU and disk capacity for SHIFT. At the same time, 16 of the latest HP 9000/720 machines, each worth 10 CERN Units of CPU, arrived to form the first Central Simulation Facility or “Snake Farm”. The stage was set for the exit of the big tidy mainframes at CERN, and the beginning of the much less elegant but evolving scene we see today on the floor of the CERN Computer Centre. SHIFT became the basis of LEP-era computing and its successor systems are set to perform even more demanding tasks for the LHC, scaled this time to the size of a worldwide grid.

The post A major SHIFT in outlook appeared first on CERN Courier.

]]>
https://cerncourier.com/a/a-major-shift-in-outlook/feed/ 0 Feature https://cerncourier.com/wp-content/uploads/2001/07/CCSupp_4_Comp_Shift-featured.jpg
Finnish technology takes on CERN’s data mountain https://cerncourier.com/a/finnish-technology-takes-on-cerns-data-mountain/ https://cerncourier.com/a/finnish-technology-takes-on-cerns-data-mountain/#respond Sun, 29 Apr 2001 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/finnish-technology-takes-on-cerns-data-mountain/ That the World Wide Web - invented at CERN - has revolutionized the world of business is clear. Less well known is the lab's continuing role in transferring Web-based technology to industry. Finnish company Single Source Oy is a case in point.

The post Finnish technology takes on CERN’s data mountain appeared first on CERN Courier.

]]>
In the early 1990s CERN was confronted with a big problem – how to manage the estimated 2.5 million documents needed to build its proposed new accelerator, the Large Hadron Collider (LHC). Fortunately a solution was at hand in the form of a novel distributed information system developed at the laboratory by Tim Berners-Lee and colleagues – the World Wide Web.

The Web, in combination with an initiative set up at the Helsinki Institute of Technology (HUT), has led to the successful transfer of technology and know-how from CERN to the young Helsinki-based company Single Source Oy.

When the LHC project got under way, HUT’s Institute of Particle Physics Technology surveyed competencies available in Finland to identify areas where the country could best contribute. Among their finds was a group at the university’s Institute of Industrial Automation that was studying the development of business processes in large international companies.

LHC testbed

cernfinn1_4-01

The LHC, as one of the largest international projects that has ever been undertaken, provided an ideal testbed for the group’s nascent ideas, so the project director Ari-Pekka Hameri, together with many of his staff, relocated to CERN. In 1996 they launched TuoviWDM (the Tuovi Web Data Management project). A Finnish girl’s name, Tuovi is also the Finnish acronym for product process visualization.

The TuoviWDM project provided the Web interface to CERN’s commercially-supplied Engineering Data Management System, in which all LHC-related documents reside. The project also interfaced naturally with CoDisCo (the Connecting Distributed Competencies project), run by a consortium of Nordic industrial companies funded by the Nordisk Industrifond. CoDisCo used CERN as a case-study for distributed project management practice, with the intention of transferring CERN’s Web experience across to industry.

Over the years the number of Finnish engineers and students passing through CERN to work on TuoviWDM steadily increased as the project evolved. Take-up at CERN was slow at first, but, when it became apparent that several underlying data management packages were being used – the LHC experiments, for example, do not use the same packages as the accelerator teams – the need for a single platform-independent interface became clear and TuoviWDM fitted the bill. The next question to be asked was how to ensure long-term support for a system that had been designed and built by a small in-house team.

The solution came at the end of 1996 in the form of an agreement between CERN and the Helsinki Institute of Physics (HIP), which has responsibility for Finland’s relationship with CERN. Under this agreement, HIP would finance future software development while CERN would continue to provide the necessary infrastructure and support. CERN was also granted an irrevocable, non-exclusive and permanent licence to use TuoviWDM free of charge. “The agreement gives CERN extensive benefits,” explained Dr Hameri, “in return for a modest contribution in terms of infrastructure support and a testbed for the technology.” However, the agreement left the question of long-term support open. Moreover, CERN was not the only body needing such support – companies involved in a TuoviWDM pilot project were also asking for the product to be put on a more solid footing, and so the idea of launching a commercial company was hatched.

At first, TuoviWDM provided a Web-based interface to all documentation related to a particular project. By 1998 this had been deployed in many particle physics research centres around Europe and was being used by about 12 000 people. It was also in 1998 that some of the original HUT people who had worked on the project at CERN started up Single Source Oy to support the software.

cernfinn2_4-01

Meanwhile, development was still under way at CERN, and the fledgling firm worked hand in hand with the lab to add features that would be invaluable to the LHC project and marketable by the company anywhere where large teams of people had to be managed. It was during this period that TuoviWDM evolved into the commercial product Kronodoc, which not only manages documents – keeping track of authorship and cataloguing modifications – but also provides a powerful management tool by tracking the use of documents.

Kronodoc allows project managers to see who is accessing documents and how they are using them. It distinguishes between viewing and downloading, which roughly equates to the difference between using a document and working on it. The software also builds self-organizing maps that show, at a glance, groups of closely collaborating individuals, as well as isolated groups that have little or no contact. In any large project it is natural for working partnerships to evolve, and for some groups to work closely together at one point in the project’s life and not at another. Engineers, for example, may work more closely with draughtsmen at the beginning of a project than they do as the project evolves. By revealing these working relationships, Kronodoc allows project managers to take the pulse of the project at any moment and then to make sure that all of the necessary working relationships have been put in place.
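
One way to picture how access logs can be turned into such a map is sketched below: counting, for every pair of users, how many documents they have both touched. The log entries and data structures are invented for illustration and do not represent Kronodoc’s implementation.

```cpp
// Toy sketch: turn a document-access log into a "who works with whom"
// co-access count, the raw material for a collaboration map. Data and
// structure are hypothetical, not Kronodoc's.
#include <iostream>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Access { std::string user; std::string document; };

int main() {
    const std::vector<Access> log = {
        {"engineer1", "dipole-spec"},   {"draughtsman1", "dipole-spec"},
        {"engineer1", "cryostat-dwg"},  {"engineer2", "cryostat-dwg"},
        {"physicist1", "optics-note"},
    };

    // Group users by the documents they accessed.
    std::map<std::string, std::set<std::string>> usersPerDoc;
    for (const auto& a : log) usersPerDoc[a.document].insert(a.user);

    // Count how often each pair of users accessed the same document.
    std::map<std::pair<std::string, std::string>, int> coAccess;
    for (const auto& [doc, users] : usersPerDoc)
        for (auto i = users.begin(); i != users.end(); ++i)
            for (auto j = std::next(i); j != users.end(); ++j)
                ++coAccess[{*i, *j}];

    for (const auto& [pair, n] : coAccess)
        std::cout << pair.first << " <-> " << pair.second
                  << " : " << n << " shared document(s)\n";
}
```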

Today, Single Source Oy is a successful company, the customers of which include a leading manufacturer of both diesel power plants and marine diesel engines, the Wärtsilä corporation. In the view of Ari-Pekka Hameri, who is still at CERN, this success would not have been possible without the close collaboration between CERN, the Finnish institutions and industry. Over the lifetime of the project, some 38 people funded from Finland worked at CERN, collaborating closely with the laboratory’s personnel and making full use of their expertise. TuoviWDM produced 16 master’s theses and contributed to two doctorates, as well as training 18 students on summer placement programmes. These figures alone represent a significant transfer of technology through people, given that 80% of these students have so far found jobs in industry. According to Dr Hameri, “This flexible exchange of students and researchers, which could be coordinated to the changing needs of the development work, is a unique and highly positive feature of research institutes like CERN.”

Turning inventions into companies

In Finland an invention is the property of its inventor, not of the institution where s/he works. Moreover, the country encourages institutions to support inventors who wish to turn their ideas into companies. “The recent success of Finnish high-technology industry is at least partly due to this type of supportive environment,” said Dr Hameri, who intended to apply a similar approach to TuoviWDM. CERN’s technology-transfer policy, while not identical to Finland’s, allowed him to do so. CERN holds the intellectual property rights to the inventions of its personnel, but the lab’s policy is to publish all of its results, making them available to industry. This allowed members of the TuoviWDM team to take the ideas that they had developed at CERN and seek venture capital to establish a company.

With agreements between CERN, HIP and Single Source Oy guaranteeing the transfer of technology to the new company, Single Source Oy secured the funding that it needed in 2000 and the company now employs some 21 people, 14 of whom have worked on TuoviWDM at CERN. For its part, CERN has the long-term support that it needs, and one of its member states has a tangible return on its investment in basic science.

The post Finnish technology takes on CERN’s data mountain appeared first on CERN Courier.

]]>
https://cerncourier.com/a/finnish-technology-takes-on-cerns-data-mountain/feed/ 0 Feature That the World Wide Web - invented at CERN - has revolutionized the world of business is clear. Less well known is the lab's continuing role in transferring Web-based technology to industry. Finnish company Single Source Oy is a case in point. https://cerncourier.com/wp-content/uploads/2001/04/cernfinn2_4-01.jpg
Workshop looks through the lattice https://cerncourier.com/a/workshop-looks-through-the-lattice/ https://cerncourier.com/a/workshop-looks-through-the-lattice/#respond Sun, 29 Apr 2001 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/workshop-looks-through-the-lattice/ Faced with the difficulty of doing exact calculations, theorists are turning to approximation techniques to understand and predict what happens at the quark level.

The post Workshop looks through the lattice appeared first on CERN Courier.

]]>
It follows from the underlying principles of quantum mechanics that the investigation of the structure of matter at progressively smaller scales demands ever-increasing effort and ingenuity in constructing new accelerators.

cernlatt1_4-01

As these updated machines come into operation, it becomes more and more important to ascertain whether any deviation from theoretical predictions is the result of new physics or is due to extra (non-perturbative) effects within our current understanding – the Standard Model. Confronted with the difficulties of doing precise calculations, the lattice approach to quantum field theory attempts to provide a decisive test by simulating the continuum of nature with a discrete lattice of space-time points.

While this is necessarily an approximation, it is not as approximate as perturbation theory, which employs only selected terms from a field-theory series expansion. Moreover, the lattice approximation can often be removed at the end in a controlled manner. However, despite its space-time economy, the lattice approach still needs the power of the world’s largest supercomputers to perform all of the calculations that are required to solve the complicated equations describing elementary particle interactions.
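
What replacing the continuum by a lattice means in practice can be seen in a minimal example: derivatives become finite differences between neighbouring sites and the action becomes a finite sum. The sketch below evaluates the Euclidean action of a free scalar field on a small one-dimensional periodic lattice; it is a textbook toy, not one of the production QCD codes discussed at the workshop.

```cpp
// Textbook illustration of lattice discretization: the Euclidean action of a
// free scalar field on a 1D periodic lattice (spacing a = 1), with the
// derivative replaced by a finite difference between neighbouring sites:
//   S = sum_n [ 0.5*(phi[n+1]-phi[n])^2 + 0.5*m^2*phi[n]^2 ]
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

double latticeAction(const std::vector<double>& phi, double mass) {
    const std::size_t N = phi.size();
    double S = 0.0;
    for (std::size_t n = 0; n < N; ++n) {
        const double dphi = phi[(n + 1) % N] - phi[n];   // periodic boundary
        S += 0.5 * dphi * dphi + 0.5 * mass * mass * phi[n] * phi[n];
    }
    return S;
}

int main() {
    const double pi = 3.14159265358979323846;
    const std::size_t N = 16;                            // 16 lattice sites
    std::vector<double> phi(N);
    for (std::size_t n = 0; n < N; ++n)
        phi[n] = std::sin(2.0 * pi * double(n) / double(N));  // smooth test field
    std::cout << "lattice action S = " << latticeAction(phi, 0.5) << '\n';
}
```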

Berlin workshop

A recent workshop on High Performance Computing in Lattice Field Theory held at DESY Zeuthen, near Berlin, looked at the future of high-performance computing within the European lattice community. The workshop was organized by DESY and the John von Neumann Institute for Computing (NIC).

NIC is a joint enterprise between DESY and the Jülich research centre. Its elementary particle research group moved to Zeuthen on 1 October 2000 and will boost the already existing lattice gauge theory effort in Zeuthen. Although the lattice physics community in Europe is split into several groups, this arrangement fortunately does not prevent subsets of these groups working together on particular problems.

Physics potential

The workshop originated from a recommendation by a working panel set up by the European Committee for Future Accelerators (ECFA) to examine the needs of high-performance computing for lattice quantum chromodynamics (QCD, the field theory of quarks and gluons; see Where did the ‘No-go’ theorems go?). It found that the physics potential of lattice field theory is within the reach of multiTeraflop machines, and the panel recommended that such machines should be developed. Another suggestion was to aim to coordinate European activities whenever possible.

Organized locally at Zeuthen by K Jansen (chair), F Jegerlehner, G Schierholz, H Simma and R Sommer, the workshop provided ample time to discuss this report. All members of the panel were present. The ECFA panel’s chairman, C Sachrajda of Southampton, gave an overview of the report, emphasizing again the main results and recommendations. The members of the ECFA panel then presented updated reports on the topics discussed in the ECFA report. These presentations laid the ground for discussions (led by K Jansen and C Sachrajda) that were lively and to some extent controversial. However, the emerging sentiment was a broad overall agreement with the ECFA panel’s conclusions.

Interpreting all of the data that results from experiments is an increasing challenge for the physics community, but lattice methods can make this process considerably easier. During the presentations made by major European lattice groups at the workshop, it became apparent that the lattice community is meeting the challenge head-on.

On behalf of the UK QCD group, R Kenway of Edinburgh dealt with a variety of aspects of QCD, which ranged from the particle spectrum to decay form factors.

Similar questions were addressed by G Schierholz of the QCDSF (QCD structure functions) group, located mainly in Zeuthen, who added a touch of colour by looking at structure functions on the lattice. R Sommer of the ALPHA collaboration, also based at Zeuthen, concentrated on the variation (“running”) of the quark-gluon coupling strength αs (hence the collaboration’s name) and quark masses with the energy scale.


The chosen topic of the APE group (named after its computer) was weak decay amplitudes, presented by F Rapuano of INFN/Rome. This difficult problem has gained fresh impetus following recent proposals and developments. T Lippert of the GRAL (going realistic and light) collaboration from the University of Wuppertal described the group’s attempts to explore the limit of small quark masses.

The activities of these collaborations are to a large extent coordinated by the recently launched European Network on Hadron Phenomenology from Lattice QCD.

New states of matter

Another interesting subject was explored by the EU Network for Finite Temperature Phase Transitions in Particle Physics, which is now tackling questions concerning new states of matter. These calculations are key to interpreting and guiding present and future experiments at Brookhaven’s RHIC heavy ion collider and at CERN. F Karsch and B Petersson, both from Bielefeld, presented the prospects.

The various presentations had one thing in common – all of the groups are starting to work with fully dynamical quarks and are thus going beyond the popular “quenched” approximation, which neglects the effects of the virtual quark–antiquark pairs in the vacuum.

Although this approximation works well in general, there are small differences between experiment and theory. To clarify whether these differences are signs of new physics or simply an artefact of the quenched approximation, lattice physicists now have to find additional computer power to simulate dynamical quarks – a quantum jump for the lattice community, as dynamical quarks are at least an order of magnitude more complicated.

This means that computers with multiTeraflop capacity will be required. All groups expressed their need for such computer resources in the coming years – only then can the European lattice community remain competitive with groups in Japan and the US.
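
To give a rough sense of why dynamical-quark simulations call for multiTeraflop machines, the sketch below estimates the memory needed just to hold a single gauge-field configuration; the lattice dimensions and data layout are illustrative assumptions made for the purpose of the estimate, not figures quoted at the workshop.

# Back-of-envelope estimate: memory footprint of one SU(3) gauge configuration.
# The lattice size below is purely illustrative.
L, T = 48, 96                     # assumed spatial and temporal extents
sites = L**3 * T                  # number of lattice sites
links_per_site = 4                # one SU(3) link matrix per space-time direction
bytes_per_link = 3 * 3 * 2 * 8    # 3x3 complex matrix in double precision
total_bytes = sites * links_per_site * bytes_per_link
print(f"{total_bytes / 2**30:.1f} GiB per configuration")

Generating and storing thousands of such configurations, and repeatedly inverting the Dirac operator on each of them, is what pushes the overall requirement into the multiTeraflop range.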

Two projects that aim to realize this ambitious goal were presented at the workshop: the apeNEXT project (presented by L Tripiccione, Pisa), which is a joint collaboration of INFN in Italy with DESY and NIC in Germany and the University of Paris-sud in France; and the US-based QCDOC (QCD on a chip) project.

Ambitious computer projects

QCDOC and apeNEXT rely to a significant extent on custom-designed chips and networks, with QCDOC using a link to industry (IBM) to build machines with a performance of about 10 Tflop/s. Each of these projects is based on massively parallel architectures involving thousands of processors linked via a fast network. Both are well under way and there is strong optimism that 10 Tflop machines will be built by 2003. Apart from these big machines, the capabilities of lattice gauge theory machines based on PC clusters were discussed by K Schilling of Wuppertal and Z Fodor of Eötvös University, Budapest.

The calculations done using lattice techniques not only provide results that are interesting from a phenomenological point of view, but are also of great importance in the development of our understanding of quantum field theories in general. This aspect of lattice field theory was covered by a discussion on lattice chiral symmetry involving L Lellouch of Marseille, T Blum of Brookhaven and F Niedermayer of Bern. The structure of the QCD vacuum was covered by A DiGiacomo of Pisa.

There is great excitement in the lattice community that the coming years, with the advent of the next generation of massively parallel systems, will certainly bring new and fruitful results.

However, the proposed machines in the multiTeraflop range can only be an interim step. They will not be sufficient for generating higher-precision data for many observables. It is therefore not difficult to predict a future workshop in which lattice physicists will call for the subsequent generation of machines to reach the 100 Tflop range – a truly ambitious enterprise.

The post Workshop looks through the lattice appeared first on CERN Courier.

]]>
https://cerncourier.com/a/workshop-looks-through-the-lattice/feed/ 0 Feature Faced with the difficulty of doing exact calculations, theorists are turning to approximation techniques to understand and predict what happens at the quark level. https://cerncourier.com/wp-content/uploads/2001/04/cernlatt1_4-01.jpg
PCs gain greater importance in particle accelerator control https://cerncourier.com/a/pcs-gain-greater-importance-in-particle-accelerator-control/ https://cerncourier.com/a/pcs-gain-greater-importance-in-particle-accelerator-control/#respond Sun, 01 Apr 2001 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/pcs-gain-greater-importance-in-particle-accelerator-control/ Personal computers are steadily making inroads into some specialist and very impersonal fields.

The post PCs gain greater importance in particle accelerator control appeared first on CERN Courier.

]]>

Personal computers are steadily making inroads into some specialist and very impersonal fields. One example is particle accelerators, and the impact of PCs was described in the Third International Workshop on PCs and Particle Accelerator Controls (PCaPAC), which was held recently at DESY.

From its inception in 1996, PCaPAC has specifically targeted the use of PCs in accelerator controls and has shown itself to be a valuable workshop in giving participants a chance to exchange ideas and experience in PC-related technologies, where trends can change rapidly. Participation in PCaPAC 2000 reached an all-time high of 93 contributions and 127 registered attendees from 43 different institutes and 17 countries.

At PCaPAC 2000, many running accelerators, the control-system infrastructure of which was based either entirely or in part on PCs, were presented. Among these were small systems built and maintained by a few people (e.g. the storage rings ASTRID and ELISA of the ISA Storage Ring Facilities at the University of Aarhus in Denmark), medium-scale systems (e.g. the ANKA synchrotron light source in Karlsruhe, and accelerators at KEK in Japan) and what is probably the largest PC-based control system of all, covering the HERA, PETRA and DORIS storage rings and their injectors at DESY in Hamburg.

Industrial solutions were also presented, covering complete packages, such as Supervisory Controls and Data Acquisition (SCADA) systems, as well as control systems based on Common Object Request Broker Architecture (CORBA) or Distributed Component Object Model (DCOM). In this vein, a joint venture between KEK and the IT industry was presented, in which a new Component Oriented Accelerator Control Kernel (COACK) was demonstrated.

In several cases, strategies to convert from legacy systems to modern ones and/or to integrate different platforms were presented. The distributed nature of PC control systems is manifest in the important role that is played by system administration. Also discussed were the needs and wishes of the accelerator operators regarding the control system as well as different approaches to supplying the optimal console profile to different and roaming users.

There were three special “tutorials”. First, a representative from Cisco described the networking trends expected in campus networks over the next three years. Another covered “SCADA – current state and perspective”, giving participants a real feel for both SCADA systems and trends in the field. Finally, as interest in such modern innovations as Java and CORBA remains high, whereas the number and variety of associated buzzwords make these subjects daunting for the uninitiated, a tutorial on these topics was also included.

While there is significant overlap in topics with the much larger ICALEPCS conference (see Computer control of physics is increasing), PCaPAC has nonetheless found its niche as a biennial workshop, alternating with ICALEPCS. The pace at which computer hardware and software as well as the Internet evolve is fast, so an event such as PCaPAC, where topics, trends and problems can be discussed in a workshop atmosphere, has proved not only worthwhile but enthusiastically accepted by the controls community. For instance, in the category of Future Trends and Technologies, participants saw their first glimpse of data exchange via SOAP (Simple Object Access Protocol) and XML (Extensible Markup Language).
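
To show the flavour of such an exchange, the sketch below builds a SOAP envelope by hand and posts it over HTTP; the service address, namespace and method name are invented for illustration, and Python is used here simply for brevity.

import http.client

# A hand-written SOAP 1.1 envelope; the endpoint and operation are hypothetical.
envelope = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getMagnetCurrent xmlns="urn:example-accelerator-controls">
      <magnetName>QD12</magnetName>
    </getMagnetCurrent>
  </soap:Body>
</soap:Envelope>"""

conn = http.client.HTTPConnection("controls.example.org")
conn.request("POST", "/soap", body=envelope,
             headers={"Content-Type": "text/xml; charset=utf-8",
                      "SOAPAction": "urn:example-accelerator-controls#getMagnetCurrent"})
print(conn.getresponse().status)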

The Fourth International Workshop on PCs and Particle Accelerator Controls will be held in the autumn of 2002, in Asia or Italy.

The post PCs gain greater importance in particle accelerator control appeared first on CERN Courier.

]]>
https://cerncourier.com/a/pcs-gain-greater-importance-in-particle-accelerator-control/feed/ 0 News Personal computers are steadily making inroads into some specialist and very impersonal fields. https://cerncourier.com/wp-content/uploads/2001/04/cernnews11_3-01.jpg
Computing technology sits in the driving seat https://cerncourier.com/a/computing-technology-sits-in-the-driving-seat/ https://cerncourier.com/a/computing-technology-sits-in-the-driving-seat/#respond Sun, 01 Apr 2001 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/computing-technology-sits-in-the-driving-seat/ Advances in physics go hand in hand with those in experimental techniques, and increasingly so with progress in computing and analysis. A recent workshop at Fermilab surveyed this fast-developing area, which has impact far beyond physics alone.

The post Computing technology sits in the driving seat appeared first on CERN Courier.

]]>

Fermilab director Mike Witherell, welcoming nearly 200 participants from around the world to the 7th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2000), said: “We have wonderful opportunities awaiting particle physics over the next decade. Two technologies are widely recognized as having driven our field from the beginning – accelerators and particle detectors. But there is also growing recognition that we rely on developments in advanced computing technologies. Innovative scientists often recognize the need for a revolutionary development before the wider world understands what it is good for. Once something becomes available, of course, lots of people know what to do with it, as we have learned over the last decade with the World Wide Web. There is a mutual benefit in collaboration between forefront physics research and computing technology. We rely, all over our laboratory (and our community), on continued innovations in the areas being discussed here at this conference.”

Short history


Reflecting the short history of these techniques, the first workshop in the series was held only 11 years ago in Lyon, France, under the name Artificial Intelligence in High Energy and Nuclear Physics, and was organized by Denis Perret-Gallix (LAPP, Annecy). Following this, the workshop was held in Europe every 18 months or so.

The 7th international workshop was the first to be held in the US with the updated name and with expanded scope. It followed four main tracks: artificial intelligence (neural networks and other multivariate analysis methods); innovative software algorithms and tools; symbolic problem solving; and very large-scale computing. It also covered applications in high-energy physics, astrophysics, accelerator physics and nuclear physics.

Besides the plenary, parallel and poster sessions, the workshop included working group and panel discussion sessions focusing on particular topics – uses of C++, large-scale simulations, advanced analysis environments and global computing – which allowed informal presentations and extensive discussion sessions.

The keynote talk, entitled “Information technology: transforming our society and our lives”, was given by Ruzena Bajcsy of the US National Science Foundation. John Moody, a former particle theorist and now professor of computer science and director of the Computer Finance Program at Oregon Graduate Institute, spoke on “Knowledge discovery through machine learning”. Gaston Gonnet from ETH, Institute for Scientific Computation, Zurich, Switzerland, talked about the “Computer algebra system”.


A big attraction early in the workshop was C++ inventor and world-renowned computer scientist Bjarne Stroustrup from AT&T Bell Labs. He gave a featured talk entitled “Speaking C++ as a native” and served as a distinguished panellist in discussions on the “Use of C++ in scientific computing”. His talk explained, by way of several simple but striking examples, how C++ can be used in a much more expressive manner than one commonly finds. Stroustrup, echoing the comments of Mike Witherell, noted that the world is slow to catch on to new ideas. He also emphasized the need for physicists to be involved in the C++ Standards Committees if they wish to influence the further development of that language.


Another distinguished participant and speaker was Stephen Wolfram, creator of the Mathematica software packages and the winner of a MacArthur Foundation Fellowship award in 1981 at the age of 22. Early in his career he worked in high-energy physics, cosmology and quantum field theory. For the last couple of decades he has been developing a general theory of complexity.

Wolfram gave a special colloquium describing his perspective on the development of Mathematica and the establishment of Wolfram Research. His talk gave glimpses of his work on “A new kind of science,” which has occupied his attention during the past nine years. Stephen Wolfram has been working on cellular automata and the evolution of complex systems, and he is writing an epic volume (of about 1000 pages) on the subject, which is soon to be published.

New experiments

High-energy physics experiments and analyses took centre stage halfway through the workshop with plenary talks on “Advanced analysis techniques in HEP” by Pushpa Bhat (Fermilab), “Statistical techniques in HEP” by Louis Lyons (Oxford) and “The H1 neural network trigger project” by Chris Kiesling (MPI). These were followed by “Theoretical computations in electroweak theory” by Georg Weiglein (CERN).

There were vigorous and stimulating discussions in a panel session on Advanced Analysis Environments, with perspectives presented by Rene Brun (CERN), Tony Johnson (SLAC) and Lassi Tuura (CERN and Northeastern).

Fermilab is facing the collider Run II (which began in March) with upgraded CDF and D0 detectors. The advanced computing and analysis techniques discussed at this workshop may be crucial for making major discoveries at the Tevatron experiments.

The new generation of experiments under construction in particle physics, cosmology and astrophysics – CMS and ATLAS at CERN’s LHC collider, the Laser Interferometer Gravitational-Wave Observatory (LIGO) and the Sloan Digital Sky Survey (SDSS) – will usher in the most comprehensive programme of study ever attempted of the four fundamental forces of nature and the structure of the universe.

The LHC experiments will probe the tera-electronvolt frontier of particle energies to search for new phenomena and improve our understanding of the nature of mass. LIGO hopes eventually to detect and analyse gravitational waves arising from some of nature’s most energetic events. SDSS will survey a large fraction of the sky and provide the most comprehensive catalogue of astronomical data ever recorded.

Together, these investigations will involve thousands of scientists from around the world. Mining the scientific treasures from these experiments, over national and intercontinental distances, over the next decade or two, will present new problems in data access, processing and distribution, and remote collaboration on a scale never before encountered in the history of science.

Thus “grid computing” is emerging as one key component of the infrastructure that will connect multiple regional and national computational centres, creating a universal source of pervasive and dependable computing power. Grid computing was therefore the focus for a whole day at the workshop. Various champions of the grid projects GriPhyN, Particle Physics Data Grid (PPDG) and European DataGrid contributed, such as Ian Foster (ANL), Paul Avery (Florida), Harvey Newman (Caltech), Miron Livny (Wisconsin), Luciano Barone (INFN) and Fabrizio Gagliardi (CERN), along with other pioneers of grid and worldwide computing.

In the sphere of very-large-scale computing and simulations, Robert Ryne (Los Alamos) spoke on accelerator physics, Alex Szalay (Johns Hopkins) on astrophysics, Paul Mackenzie (Fermilab) on lattice calculations and Aiichi Nakano (LSU) on molecular dynamics simulations. A working group on large-scale simulations coordinated by Rajendran Raja (Fermilab) and Rob Rosner (Chicago) featured contributions from particle experiments CDF, D0, CMS and ATLAS, as well as from the muon collider and astrophysics communities.

Technology show

A major event at the workshop was a technology show coordinated by SGI representative Kathy Lawlor, Cisco representative Denis Carroll and Fermilab’s Ruth Pordes, Dane Skow and Betsy Schermerhorn.

The show featured the Reality Center for collaborative visualization, IP streaming video, IP Telephony, wireless LAN by SGI and Cisco, and hardware and application software exhibits from Wolfram Research, Platform Computing, Objectivity, Kuck & Associates Inc and Waterloo Maple.

The meeting was organized and co-chaired by Pushpalatha Bhat of Fermilab, who for more than a decade has been a strong advocate of the use of advanced multivariate analysis methods in high-energy physics, and by Matthias Kasemann, head of Fermilab’s Computing Division.

The workshop was sponsored by Fermilab, the US Department of Energy and the US National Science Foundation; it was co-sponsored by Silicon Graphics and Cisco Systems; and it was endorsed by the American and European Physical Societies.

 

The post Computing technology sits in the driving seat appeared first on CERN Courier.

]]>
https://cerncourier.com/a/computing-technology-sits-in-the-driving-seat/feed/ 0 Feature Advances in physics go hand in hand with those in experimental techniques, and increasingly so with progress in computing and analysis. A recent workshop at Fermilab surveyed this fast-developing area, which has impact far beyond physics alone. https://cerncourier.com/wp-content/uploads/2001/04/cernfermi1_3-01.jpg
Data Grid project gets EU funding https://cerncourier.com/a/data-grid-project-gets-eu-funding/ https://cerncourier.com/a/data-grid-project-gets-eu-funding/#respond Mon, 26 Feb 2001 00:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/data-grid-project-gets-eu-funding/ Plans for the next generation of network-based information-handling systems took a major step forward when the European Union's Fifth Framework Information Society Technologies programme concluded negotiations to fund the Data Grid research and development project.

The post Data Grid project gets EU funding appeared first on CERN Courier.

]]>

Plans for the next generation of network-based information-handling systems took a major step forward when the European Union’s Fifth Framework Information Society Technologies programme concluded negotiations to fund the Data Grid research and development project. The project was submitted to the EU by a consortium of 21 bodies involved in a variety of sciences, from high-energy physics to Earth observation and biology, as well as computer sciences and industry. CERN is the leading and coordinating partner in the project.

Starting from this year, the Data Grid project will receive in excess of €9.8 million for three years to develop middleware (software) to deploy applications on widely distributed computing systems. In addition to receiving EU support, the enterprise is being substantially underwritten by funding agencies from a number of CERN’s member states. Due to the large volume of data that it will produce, CERN’s LHC collider will be an important component of the Data Grid (see The grid is set to grapple with large computations).

As far as CERN is concerned, this programme of work will integrate well into the computing testbed activity already planned for the LHC. Indeed, the model for the distributed computing architecture that Data Grid will implement is largely based on the results of the MONARC (Models of Networked Analysis at Regional Centres for LHC experiments) project. CERN’s part in the Data Grid project will be integrated into its ongoing programme of work and will be jointly staffed by EU- and CERN-funded personnel.

The work that the project will involve has been divided into numbered subsections, or “work packages”. CERN’s main contribution will be to three of these work packages: WP 2, dedicated to data management and data replication; WP 4, which will look at computing fabric management; and WP 8, which will deal with high-energy physics applications. Most of the resources for WP 8 will come from the four major LHC experimental collaborations: ATLAS, CMS, ALICE and LHCb.

Other work will cover areas such as workload management (coordinated by the INFN in Italy), monitoring and mass storage (coordinated in the UK by the PPARC funding authority and the UK Rutherford Appleton Laboratory) and testbed and networking (coordinated in France by IN2P3 and the CNRS). CERN is also contributing to the work on testbeds and networking, and it is responsible for the overall management and administration of the project with resources partially funded by the EU.

The data management work package will develop and demonstrate the necessary middleware to ensure secure access to petabyte databases, enabling the efficient movement of data between Grid sites with caching and replication of data. Strategies will be developed for optimizing and costing queries on the data, including the effect of dynamic usage patterns. A generic interface to various mass storage management systems in use at different Grid sites will also be provided.
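
The flavour of this kind of middleware can be conveyed with a toy replica catalogue: given a logical file name, it returns the cheapest known physical copy according to a simple cost ranking. This is purely an illustrative sketch with invented site names and costs, not code from the Data Grid work packages.

# Toy replica catalogue: map a logical file name to its physical copies
# and pick the replica with the lowest (invented) access cost.
replicas = {
    "lfn:/lhc/run1/events.root": [
        ("cern.ch",  "/castor/lhc/run1/events.root", 1.0),
        ("in2p3.fr", "/hpss/lhc/run1/events.root",   2.5),
        ("ral.uk",   "/store/lhc/run1/events.root",  1.8),
    ],
}

def best_replica(lfn):
    """Return (site, path) of the cheapest registered copy of a logical file."""
    copies = replicas.get(lfn, [])
    if not copies:
        raise KeyError("no replica registered for " + lfn)
    site, path, _cost = min(copies, key=lambda c: c[2])
    return site, path

print(best_replica("lfn:/lhc/run1/events.root"))

A real data-management layer must of course also keep such a catalogue consistent as files are cached, replicated and deleted across sites, which is part of what this work package sets out to address.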

The objective of the fabric management work package is to develop new automated system management techniques. This will enable the deployment of very large computing fabrics constructed from tens of thousands of mass-market components, with reduced systems administration and operations costs. All aspects of management will be covered, from system installation and configuration through monitoring, alarms and troubleshooting.

WP 8 aims to deploy and run distributed simulation, reconstruction and analysis programs using Grid technology. This package is central to the project because it is among those that enable the large-scale testing of the middleware being developed by the other work package groups and it provides the user requirements that drive the definition of the architecture of the project.

Dozens of physicists, mostly from Europe, will participate in the endeavour while continuing to perform their day-to-day research activities.

A project architecture task force has recently been appointed, with participants from the relevant middleware work packages and a representative from the applications. Leading US computer scientists are also participating in this effort to ensure that developments in the US continue in parallel with work being carried out in Europe. Data Grid is hosting the first Global Grid Forum in Amsterdam in March, which will aim to coordinate Grid activity on a worldwide scale.

The post Data Grid project gets EU funding appeared first on CERN Courier.

]]>
https://cerncourier.com/a/data-grid-project-gets-eu-funding/feed/ 0 News Plans for the next generation of network-based information-handling systems took a major step forward when the European Union's Fifth Framework Information Society Technologies programme concluded negotiations to fund the Data Grid research and development project. https://cerncourier.com/wp-content/uploads/2001/02/cernnews1_2-01-feature.jpg
How the Web was Born https://cerncourier.com/a/how-the-web-was-born/ Fri, 01 Dec 2000 14:16:30 +0000 https://preview-courier.web.cern.ch/?p=106202 Ben Segal reviews in 2000 How the Web was Born.

The post How the Web was Born appeared first on CERN Courier.

]]>
The World Wide Web was conceived at CERN to allow particle physicists easy access to information, wherever it was and they happened to be. It was a great success, so great that it went on to take the whole world by storm – a veritable communications revolution.

CERN Courier News Editor James Gillies teamed up with CERN World Wide Web pioneer Robert Cailliau to write a detailed history of modern telecommunications, particularly as seen through CERN eyes. As the book points out, the fact that the Web was invented at CERN “is no accident”.

How the Web was Born by James Gillies and Robert Cailliau, Oxford University Press, ISBN 0192862073, pbk.

This book is a surprising, ambitious, interesting and courageous account of a series of developments culminating in the invention at CERN of the World Wide Web. It is not only a history of the Web – it covers in considerable detail the necessary evolution of networks, personal computers and software technology which enabled Tim Berners-Lee’s brilliant creation of the Web in 1989.


I say “surprising” because it had seemed to me that enough good Internet histories had already been written (Salus 1995; Hafner and Lyon 1996; Randall 1997). Furthermore, Berners-Lee’s own account Weaving the Web was published only last year (Berners-Lee 1999). However, there is much new material in this book. I call it “ambitious” and “interesting” because it covers all the surrounding areas in depth and is not afraid to follow sidetracks, personalities and anecdotes, which are always the key to attracting and holding a reader’s attention.

The book reveals the quirky human attitudes and the bureaucratic and business struggles which make this a real human story rather than just a dateline of cold technological developments. Finally, it is “courageous” because, although written by two CERN authors, it is truthful even about those parts of the story which are not too flattering to CERN.

There is not much to criticize. The first 10 pages on telephones and LANs contribute little, are messy and may deter some readers from reaching the true start of the story. My attention was first aroused on p11 by the phrase: “The Birth of the Internet: On 31 January 1958, the United States launched Explorer I, its first satellite, though few now remember that.”

There are some minor slips of detail: for example, STELLA was a CERN satellite project, not an Italian one (p81), and began in 1978, not in 1981 (p317). The proofreading of the book was also not up to my expectations of Oxford University Press.

Essentially, however, this book makes a major contribution. I believe the authors have succeeded with their aim “to tell a story of human endeavour, and to provide a good read in the process”. They stress the multiplicity of contributions of many individuals over half a century, including the essential ones and without forgetting the elements of accident and personality which often proved crucial.

Humour abounds: Senator Edward Kennedy, in congratulating the Boston team that had won a contract for an ARPANET Interface Message Processor (IMP), refers to it as an “interfaith” processor. When the first IMP was delivered to UCLA and found in horror to be upside down in its crate, a team member declared this only “meant that it had been turned over an odd number of times”. This was after finding that the IMP had survived.

There is also much wisdom, such as that of Frank Heart, manager of the small Bolt, Beranek and Newman team developing the IMP, which he described as follows: “All the software people knew something about hardware, and all the hardware people programmed. It was a set of people who all knew a lot about the whole project. I consider that pretty important in anything very big.” Thirty years later, nothing much has changed.

The battles of culture and practice between proprietary, ISO and TCP/IP networking, fought to the death between the late 1960s and the early 1990s, are handled with insight and accuracy. This is required reading for today’s younger generation, many of whom surprise me by their casual ignorance of what was for some of us a struggle over many years, dividing colleagues, damaging careers and delaying progress towards the now realized dream of a networked world.

Culture and practice continue to collide in the later chapters, where we approach the fateful few years where all the strands will meet. Berners-Lee’s personal trajectory is followed, showing how his curiosity and taste for research was nurtured and amplified by contact with like minds, first by his parents, teachers and other early influential figures, and later by collaboration and discourse with CERN colleagues and the blossoming Internet community. With the Web idea launched outside CERN, the germination and maturation of the software worldwide is then traced in detail, including the NCSA/Mosaic/Netscape saga and the demise of competing products like Gopher and Archie.

The book captures the rare combination of Berners-Lee’s talents: steady vision, broad interests and detailed attention. It shows the triumph of a mind that could arrive at something simple, starting from a situation where things were deeply complicated beforehand. The idea that order could be created in the chaotically non-standard environments of document exchange and networking as they stood in the late 1980s was simply unbelievable for practically everyone at that time.

Berners-Lee’s success serves as a living example of the power and the necessity of the KISS principle – “Keep It Simple, Stupid” – which tells us to be humble in the face of the world’s ever growing complexity. There is also a quality in him of those inexperienced youngsters, recruited by Data General during a classic underground project to develop the world’s fastest minicomputer, specifically because they were too naive to know that certain things “can’t be done” (Kidder 1981).


Why, if it was all so simple, did the Web take so long to arrive – 20 years after the start of the Internet? And why at CERN? This latter question is carefully examined by the authors, who say “it is no accident that it happened at CERN” and cite several supporting reasons. For me, the most convincing reason was Berners-Lee’s own statement that it was “…a question of being in the right place at the right time…in the right environment”, and: “…with great bosses in Peggie Rimmer and Mike Sendall, and a lot of stimulating colleagues, all prepared to think outside the box”.

The other burning question addressed by the book is: why did CERN “give away” the Web and lose its creator to MIT? It is a complex matter, taking up the entire last chapter of the book. Starting with Berners-Lee’s chance meeting with Michael Dertouzos of MIT/LCS, a complex web of interests and rivalries is traced between US and European players: CERN, INRIA, NCSA, MIT, Mosaic Communications and the European Commission. Bluff, counterbluff and misunderstandings succeeded each other. But the bottom line appears to be that CERN, fighting a life-or-death battle for approval of the LHC project, lacked basic commitment to a non-physics activity, even one of such huge potential.

The book is dedicated to the memory of two people: first to Donald Davies, pioneer of packet switching, the most essential of all Internet components; and second to Mike Sendall, who in 1989 “did not say no to Tim Berners-Lee and consequently the Web got off the ground”. Let me wholeheartedly applaud that conclusion: not saying no to the young and starry-eyed is one way the world can advance from chaos to order, from the impossible to the imaginable.

Ben Segal is currently leader of CERN’s Technology for Experiments Section, responsible for development in areas including High Performance and Storage Area Networks and CERN’s online Central Data Recording service, and is responsible for Data Management within the new European “Data Grid” Project. From 1985 until 1988, he served as CERN’s first TCP/IP coordinator, responsible for the introduction of the Internet protocols within CERN. In 1995, he was a co-founder of the Internet Society (ISOC) Geneva Chapter.

The post How the Web was Born appeared first on CERN Courier.

]]>
Review Ben Segal reviews in 2000 How the Web was Born. https://cerncourier.com/wp-content/uploads/2000/12/cernbooks1_12-00-feature.jpg
Quantum Computation and Quantum Information https://cerncourier.com/a/quantum-computation-and-quantum-information/ Mon, 30 Oct 2000 14:31:15 +0000 https://preview-courier.web.cern.ch/?p=106221 This book is designed to be accessible by those who do not necessarily have a background in quantum physics.

The post Quantum Computation and Quantum Information appeared first on CERN Courier.

]]>
by Michael Nielsen and Isaac Chuang, Cambridge University Press, ISBN 0521632358, £80/$130 (hbk); ISBN 0521635039 £29.95/$47.95 (pbk).


In a field as new and fast-developing as quantum computing, it is always good to have an authoritative introduction for newcomers. This book is designed to be accessible to those who do not necessarily have a background in quantum physics.

The post Quantum Computation and Quantum Information appeared first on CERN Courier.

]]>
Review This book is designed to be accessible by those who do not necessarily have a background in quantum physics. https://cerncourier.com/wp-content/uploads/2022/09/71zJlN985cL.jpg
RHIC starts producing data https://cerncourier.com/a/rhic-starts-producing-data/ https://cerncourier.com/a/rhic-starts-producing-data/#respond Thu, 21 Sep 2000 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/rhic-starts-producing-data/ After achieving first collisions of gold ion beams on the night of 12 June, the gleaming new Relativistic Heavy Ion Collider (RHIC) facility at Brookhaven wasted no time in ramping up in energy and intensity and starting the process of analysing data.

The post RHIC starts producing data appeared first on CERN Courier.

]]>

After achieving first collisions of gold ion beams on the night of 12 June, the gleaming new Relativistic Heavy Ion Collider (RHIC) facility at Brookhaven wasted no time in ramping up in energy and intensity and starting the process of analysing data.

A few days after the first gold ions collided at a collision energy of 56 GeV/nucleon, all four of the RHIC detectors (BRAHMS, PHENIX, PHOBOS and STAR) began recording data at a collision energy of 130 GeV/nucleon. By the end of July the first physics result – a measurement of charged particle density at mid-rapidity for central gold-gold collisions at these two energies – was submitted for publication by the PHOBOS collaboration.


With these data points in hand, and further analysis results in the pipeline from each of the four experiments, theorists who were at Brookhaven to attend a series of summer workshops immediately began to ponder the first glimpse of high-density matter in this new energy regime.

In the meantime the machine staff shifted focus from first collisions to achieving sustained collider operation. The goal for machine operation over the summer was to bring the collider and its injector complex, consisting of tandem Van de Graaff, booster and AGS synchrotron, to the level at which all of the experiments would obtain an initial data run with event rates approaching 10% of the final design luminosity.


By mid-August, RHIC’s two superconducting rings were routinely colliding stored beams of gold ions, with the full complement of 55 ion bunches in each ring, beam lifetimes of more than 4 h and some storage cycles lasting 10 h and more. The four experiments simultaneously recorded data throughout these runs, transferring data to the RHIC Computing Facility at peak rates of more than 40 Mbyte/s.

RHIC ran through mid-September, with continued data taking as well as accelerator physics work to complete the commissioning of the collider systems. A comprehensive look at the first physics results from this year’s run will take place at the Quark Matter 2001 meeting on 15-20 January, which is being jointly hosted by the State University of New York at Stony Brook and Brookhaven. It is expected that the collider will start up again early in 2001 and begin operating soon after at the full design energy of 200 GeV/nucleon for gold-gold collisions.

The post RHIC starts producing data appeared first on CERN Courier.

]]>
https://cerncourier.com/a/rhic-starts-producing-data/feed/ 0 News After achieving first collisions of gold ion beams on the night of 12 June, the gleaming new Relativistic Heavy Ion Collider (RHIC) facility at Brookhaven wasted no time in ramping up in energy and intensity and starting the process of analysing data. https://cerncourier.com/wp-content/uploads/2000/09/cernnews1_10-00.jpg
Workshop tackles GLOBUS and Grid https://cerncourier.com/a/workshop-tackles-globus-and-grid/ https://cerncourier.com/a/workshop-tackles-globus-and-grid/#respond Thu, 21 Sep 2000 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/workshop-tackles-globus-and-grid/ The Grid, a highly distributed computing environment that is seen by many as a step beyond the World Wide Web, is catalysing many new computing developments.

The post Workshop tackles GLOBUS and Grid appeared first on CERN Courier.

]]>

The Grid, a highly distributed computing environment that is seen by many as a step beyond the World Wide Web, is catalysing many new computing developments. One is GLOBUS, a toolkit that provides Grid building blocks.

At a recent GLOBUS Grid workshop at the UK Central Laboratories of the Research Councils, Steve Tuecke and Lee Liming from Ian Foster’s pioneer group at Argonne gave presentations at the Rutherford Appleton Laboratory, while the Daresbury Laboratory joined by videoconference. The sessions were recorded and will be available in RealVideo format (for details contact Rob Allan, e-mail “R.Allan@dl.ac.uk”).

The first day offered an introduction to the computational Grid and the GLOBUS toolkit, together with a user’s tutorial. The second day was a developer’s tutorial for Grid programming and went into significantly greater technical detail (common services and security, information services, resource management, remote data management, fault management and communications). The final day concentrated on directing the GLOBUS team’s expert advice onto current particle physics Grid activities and setting up future projects for collaboration with the Foster group.

The workshop was considered to be a great success and is undoubtedly the start of many focused Grid activities in the UK particle physics community.

The post Workshop tackles GLOBUS and Grid appeared first on CERN Courier.

]]>
https://cerncourier.com/a/workshop-tackles-globus-and-grid/feed/ 0 News The Grid, a highly distributed computing environment that is seen by many as a step beyond the World Wide Web, is catalysing many new computing developments. https://cerncourier.com/wp-content/uploads/2000/09/cernnews15_10-00.jpg
Meeting the ALICE data challenge https://cerncourier.com/a/meeting-the-alice-data-challenge/ https://cerncourier.com/a/meeting-the-alice-data-challenge/#respond Tue, 27 Jun 2000 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/meeting-the-alice-data-challenge/ Imagine trying to record a symphony in a second. That is effectively what CERN's ALICE collaboration will have to do when the laboratory's forthcoming Large Hadron Collider (LHC) starts up in 2005.

The post Meeting the ALICE data challenge appeared first on CERN Courier.

]]>

Imagine trying to record a symphony in a second. That is effectively what CERN’s ALICE collaboration will have to do when the laboratory’s forthcoming Large Hadron Collider (LHC) starts up in 2005. Furthermore, that rate will have to be sustained for a full month each year.

ALICE is the LHC’s dedicated heavy-ion experiment. Although heavy-ion running will occupy just one month per year, the huge number of particles produced in ion collisions means that ALICE will record as much data in that month as the ATLAS and CMS experiments plan to do during the whole of the LHC annual run. The target is to store one petabyte (10¹⁵ bytes) per year, recorded at the rate of more than 1 Gbyte/s. This is the ALICE data challenge, and it dwarfs existing data acquisition (DAQ) applications. At CERN’s current flagship accelerator LEP, for example, data rates are counted in fractions of 1 Mbyte/s. Even NASA’s Earth Observing System, which will monitor the Earth day and night, will take years to produce a petabyte of data.
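
A quick back-of-envelope check, assuming for simplicity a 30-day run, puts these numbers in perspective (the figures below are illustrative arithmetic, not ALICE planning numbers):

# Sustained rate needed to write one petabyte during a one-month run,
# and the total that a full month at the 1 Gbyte/s peak rate would produce.
petabyte = 1e15                 # bytes
month = 30 * 24 * 3600          # seconds in an assumed 30-day run
print(f"average rate to reach 1 PB: {petabyte / month / 1e6:.0f} Mbyte/s")
print(f"a month at 1 Gbyte/s:       {1e9 * month / 1e15:.1f} PB")

In other words, even with inevitable gaps in data taking, the acquisition chain has to average several hundred Mbyte/s, with peaks at the full 1 Gbyte/s.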

Meeting the challenge is a long-term project, and work has already begun. People from the ALICE collaboration have been working with members of CERN’s Information Technology Division to develop the experiment’s data acquisition and recording systems. Matters are further complicated by the fact that the ALICE experiment will be situated several kilometres away from CERN’s computer centre, where the data will be recorded. This adds complexity and makes it even more important to start work now.

Standard components – such as CERN’s network backbone and farms of PCs running the Linux operating system – will be used to minimize capital outlay. They will, however, be reconfigured for the task in order to extract the maximum performance from the system. Data will be recorded by StorageTek tape robots installed as part of the laboratory’s tape-automation project to pave the way for handling the large number of tapes that will be required by LHC experiments.

The first goal for the ALICE data challenge was to run the full system at a data transfer rate of 100 Mbyte/s – 10% of the final number. This was scheduled for March and April 2000 so as not to interfere with CERN’s experimental programme, which will get up to speed in the summer.

Data sources for the test were simulated ALICE events from a variety of locations at CERN. After being handled by the ALICE DAQ system (DATE) they were formatted by the ROOT software, developed by the global high energy physics community. The data were then sent through the CERN network to the computer centre, where two mass storage systems were put through their paces for two weeks each. The first, HPSS, is the fruit of a collaboration between industry and several US laboratories. The second, CASTOR, has been developed at CERN.

Although each component of the system had been tested individually and shown to work with high data rates, this year’s tests have demonstrated the old adage that the whole is frequently greater than the sum of its parts: problems only arose when all of the component systems were integrated.

The tests initially achieved a data rate of 60 Mbyte/s with the whole chain running smoothly, but problems then started to appear in the Linux operating system used in the DAQ system’s PC farms. Because Linux is not a commercial product, the standard way of getting bugs fixed is to post a message on the Linux newsgroups. However, no-one had previously pushed Linux so hard, so solutions were not readily forthcoming and the team had to work with the Linux community to find their own.

That done, the rate was cranked up and failures started to occur in one of the CERN network’s many data switches. These were soon overcome – thanks this time to an upgrade provided by the company that built the switches – and the rate was taken up again. Finally the storage systems had trouble absorbing all of the data. When these problems were ironed out, the target peak rate of 100 Mbyte/s was achieved for short periods.

At the end of April the ALICE data challenge team had to put their tests on hold, leaving the CERN network and StorageTek robots at the disposal of ongoing experiments and test beams. During the tests, more than 20 Tbyte of data – equivalent to some 2000 standard PC hard disks – had been stored. The next milestone, scheduled for 2001, is to run the system at 100 Mbyte/s in a sustained way before increasing the rate, step by step, towards the final goal of 1 Gbyte/s by 2005. The ALICE data challenge team may not yet have made a symphony, but the overture is already complete.

The post Meeting the ALICE data challenge appeared first on CERN Courier.

]]>
https://cerncourier.com/a/meeting-the-alice-data-challenge/feed/ 0 News Imagine trying to record a symphony in a second. That is effectively what CERN's ALICE collaboration will have to do when the laboratory's forthcoming Large Hadron Collider (LHC) starts up in 2005. https://cerncourier.com/wp-content/uploads/2000/06/cernnews6_7-00.jpg
Where did the ‘No-go’ theorems go? https://cerncourier.com/a/where-did-the-no-go-theorems-go/ https://cerncourier.com/a/where-did-the-no-go-theorems-go/#respond Tue, 27 Jun 2000 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/where-did-the-no-go-theorems-go/ With quark-gluon calculations being extremely difficult, physicists have to use their ingenuity to get results. The most popular approach is to use powerful supercomputers to simulate a discrete space-time lattice. A recent workshop examined progress in the field.

The post Where did the ‘No-go’ theorems go? appeared first on CERN Courier.

]]>

At the smallest possible scales, physics calculations are extremely complicated. This is the dilemma facing particle physicists.

Lattice field theories were originally proposed by 1982 Nobel laureate Ken Wilson as a means of tackling quantum chromodynamics (QCD) – the theory of strong interactions – at low energies, where calculations based on traditional perturbation theory fail.

The lattice formulation replaces the familiar continuous Minkowski space-time with a discrete Euclidean version, where space-time points are separated by a finite distance – the lattice spacing. In this way results can be obtained by simulations, but the computing power needed is huge, calling for special supercomputers.

This methodology has been applied extensively to QCD: recent years have witnessed increasingly accurate calculations of many quantities, such as particle masses (including those of glueballs and hybrids) and form factors for weak decays, as well as quark masses and the strong (inter-quark) coupling constant. These results provide important pointers to future progress.

The romantic Ringberg Castle, with its panoramic view of the Bavarian Tegernsee, was the scene of a recent workshop entitled Current Theoretical Problems in Lattice Field Theory, where physicists from Europe, the US and Japan discussed and assessed recent progress in this increasingly important area of research.

Obstacles removed

Despite the many successes of lattice QCD, there are stubborn areas where little progress has been made. For instance, until recently it was thought that the lattice formulation was incompatible with the concept of a single left-handed fermion (such as the Standard Model neutrino). The notion of chirality plays a key role in both the strongly and the weakly interacting sectors of the Standard Model. Furthermore, weak decays like that of a kaon into two pions have been studied on the lattice with only limited success.

A non-perturbative treatment of such processes is highly desirable, because they are required for our theoretical understanding of direct CP violation and the longstanding problem of explaining isospin selection rules in weak decays. However, there have been impressive theoretical advances in both of these areas, which were discussed at the Ringberg workshop.

Gian Carlo Rossi (Rome II) gave a general introduction to lattice calculations of K→ππ. By the early 1990s, all attempts to study this process on the lattice had been abandoned, because it was realized that the necessary physical quantity cannot be obtained from the correlation functions computed on the lattice. This Maiani-Testa No-go theorem was analysed in great detail by Chris Sachrajda (Southampton). Laurent Lellouch (Annecy) then described how the theorem can be circumvented by treating the decay in a finite volume, where the energy spectrum of the two-pion final state is discrete rather than continuous – in turn violating one of the conditions needed for the No-go theorem to apply.

Furthermore, the transition amplitude in finite volume can be related to the physical decay rate. An implementation of this method in a real computer simulation requires lattice sizes of about 5-7 fm. This stretches the capacities of current supercomputers to the limit, but a calculation will certainly be feasible with the next generation of machines.
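
Schematically (and ignoring the pion–pion interaction, so this is a simplified illustration rather than the finite-volume formula itself), in a periodic box of side L the back-to-back pion momenta are quantized, giving a discrete set of two-pion energies,

\[
  E_{\vec n}(L) \;=\; 2\sqrt{\,m_\pi^2 + \left(\frac{2\pi}{L}\right)^{2} |\vec n\,|^{2}}\,,
  \qquad \vec n \in \mathbb{Z}^{3},
\]

and the box size can be tuned so that one of these levels matches the kaon mass, making the decay kinematically accessible in the simulation.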

Guido Martinelli (Rome I) presented the decay from a different angle by relating it to the conceptually simpler kaon-pion transition. This strategy has been known for some time, and recent work concentrated on the final-state interactions between the two pions. The inclusion of these effects may influence theoretical predictions for measurements of direct CP violation. Given recent experimental progress in this sector, this is surely of great importance.

Many lattice theorists’ hopes of being able to study the electroweak sector of the Standard Model had been frustrated by another famous No-go theorem, this time by Nielsen and Ninomiya. This states that chiral symmetry cannot be realized on the lattice, which, for instance, makes it impossible to treat neutrinos in a lattice simulation.

Recently it has been shown how the Nielsen-Ninomiya theorem could be sidestepped: a chiral fermion (such as a neutrino) can be put on the lattice provided that its discretized Dirac operator satisfies the so-called Ginsparg-Wilson relation. Several solutions to this relation have been constructed, and the most widely used are known in the trade as “Domain Wall” and “Overlap” fermions.
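
For reference, the Ginsparg-Wilson relation in question reads (with D the lattice Dirac operator and a the lattice spacing)

\[
  \gamma_5 D + D \gamma_5 \;=\; a\, D \gamma_5 D ,
\]

so that the ordinary anticommutation relation of continuum chiral symmetry is recovered as a goes to zero, while at finite spacing an exact, lattice-modified chiral symmetry survives.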


At Ringberg, Pilar Hernández (CERN) examined whether these solutions can be implemented efficiently in computer simulations. Obviously these more technical aspects have to be investigated before one can embark on more ambitious projects. Hernández concluded that the computational cost of both formulations is comparable, but substantially higher compared with conventional lattice fermions. In particular, her results indicate that the numerical effort needed to preserve chiral symmetry by simulating Domain Wall fermions is far greater than previously thought. This point was further explored during an open discussion session led by Karl Jansen (CERN) and Tassos Vladikas (Rome II). A conclusion was that conventional lattice fermions appear quite sufficient to address many – if not all – of the problems in applied lattice QCD.

As well as calculating hard results, the preservation of chiral symmetry on the lattice has also been exploited in the study of more formal aspects of quantum field theories. Oliver Bär (DESY) presented recent work on global anomalies, which can now be analysed in a rigorous, non-perturbative way using the lattice framework. SU(2) gauge theory coupled to one massless, left-handed neutrino thereby leads to the lattice analogue of the famous Witten anomaly. Further work on anomalies was presented by Hiroshi Suzuki (Trieste), while Yigal Shamir (Tel Aviv) reviewed a different approach to lattice chiral gauge theories based on gauge fixing.

Among other topics discussed at Ringberg was the issue of non-perturbative renormalization, with contributions from Roberto Petronzio (Rome II), Steve Sharpe (Seattle) and Rainer Sommer (Zeuthen). The problem is to relate quantities (for example form factors and decay constants) computed on the lattice to their continuum counterparts via non-perturbatively defined renormalization factors. Such a procedure avoids the use of lattice perturbation theory, which is known to converge only very slowly.

The successful implementation of non-perturbative renormalization for a large class of operators removes a major uncertainty in lattice calculations. Furthermore, talks by Antonio Grassi, Roberto Frezzotti (both Milan) and Stefan Sint (Rome II) discussed recent work on QCD with an additional mass term which is expected to protect against quark zero modes. It is hoped that this will help in the simulation of smaller quark masses.

Many other contributions, for example two-dimensional models, Nahm dualities and the bosonization of lattice fermions, could also lead to further progress. However, the variety of topics discussed at the workshop underlines that lattice field theory is a very active research area with many innovative ideas. Progress in understanding how nature works on the smallest possible scale depends on such theoretical and conceptual advances as well as sheer computer power.

The Ringberg meeting was organized by Martin Lüscher (CERN), Erhard Seiler and Peter Weisz (MPI Munich).

Directions for lattice computing

Quantum physics calculations are not easy. Most students, after having worked through the solutions of the Schrödinger equation for the hydrogen atom, take the rest of quantum mechanics on trust. Likewise, quantum electrodynamics is demonstrated with a few easy examples involving colliding electrons. This tradition of difficult calculation continues, and is even accentuated, by the physics of the quarks and gluons inside subnuclear particles.

Quantum chromodynamics – the candidate theory of quarks and gluons – can only be handled using powerful computers, and, even then, drastic assumptions must be made to make the calculations tractable. For example, a discrete lattice (several fm across) has to replace the space-time continuum. Normally only the valence quarks, which give the particle its quantum number assignment, can be taken into account (the quenched approximation), and the myriad of accompanying virtual quarks and antiquarks have to be neglected.

The benchmark of lattice QCD is the calculation of particle masses, where encouraging results are being achieved, but physicists are still far from being able to explain the observed spectrum. Future progress in understanding subnuclear particles and their interactions advances in step with available computer power.

To point the way forward, the European Committee for Future Accelerators recently set up a panel (chaired by Chris Sachrajda of Southampton) to assess both the computing resources required for this work and the scientific opportunities that would be opened up. The panel’s main conclusions were:

* The future research programme using lattice simulations is a very rich one, investigating problems of central importance for the development of our understanding of particle physics. The programme includes detailed (unquenched) computations of non-perturbative QCD effects in hadronic weak decays, studies of hadronic structure, investigations of the quark-gluon plasma, exploratory studies of the non-perturbative structure of supersymmetric gauge theories, studies of subtle aspects of hadronic spectroscopy, and much more.

* The European lattice community is large and very strong, with experience and expertise in applying numerical simulations to a wide range of physics problems. For more than 10 years it has organized itself into international collaborations when appropriate, and these will form the foundation for any future European project. Increased coordination is necessary in preparation for the 10 Tflops generation of machines.

* Future strategy must be driven by the requirements of the physics research programme. We conclude that it is both realistic and necessary to aim for machines of the order of 10 Tflops processing power by 2003. As a general guide, such machines will enable results to be obtained in unquenched simulations with similar precision to those currently found in quenched ones.

* It will be important to preserve the diversity and breadth of the physics programme, which will require a number of large machines as well as a range of smaller ones.

* The lattice community should remain alert to all technical possibilities in realizing its research programme. However, the panel concludes that it is unlikely to be possible to procure a 10 Tflops machine commercially at a reasonable price by 2003, and hence recognizes the central importance of the apeNEXT project to the future of European lattice physics.

The post Where did the ‘No-go’ theorems go? appeared first on CERN Courier.

]]>
https://cerncourier.com/a/where-did-the-no-go-theorems-go/feed/ 0 Feature With quark-gluon calculations being extremely difficult, physicists have to use their ingenuity to get results. The most popular approach is to use powerful supercomputers to simulate a discrete space-time lattice. A recent workshop examined progress in the field. https://cerncourier.com/wp-content/uploads/2000/06/cernnogo1_7-00.jpg
The grid is set to grapple with large computations https://cerncourier.com/a/the-grid-is-set-to-grapple-with-large-computations/ https://cerncourier.com/a/the-grid-is-set-to-grapple-with-large-computations/#respond Tue, 30 May 2000 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/the-grid-is-set-to-grapple-with-large-computations/ Particle physics has always pushed computing and computing techniques to the limit - witness the World Wide Web developed at CERN. Continuing this tradition, particle physics at CERN will soon provide a crucial testbed for even more powerful network-based information handling systems - the Grid.

The post The grid is set to grapple with large computations appeared first on CERN Courier.

]]>

When CERN’s LHC collider begins operation in 2005, it will be the most powerful machine of its type in the world, providing research facilities for thousands of researchers from all over the globe.

The computing capacity required for analysing the data generated by these big LHC experiments will be several orders of magnitude greater than that used by current experiments at CERN, itself already substantial. Satisfying this vast data-processing appetite will require the integrated use of computing facilities installed at several research centres across Europe, the US and Asia.

During the last two years the Models of Networked Analysis at Regional Centres for LHC Experiments (MONARC) project, supported by a number of institutes participating in the LHC programme, has been developing and evaluating models for LHC computing. MONARC has also developed tools for simulating the behaviour of such models when implemented in a wide-area distributed computing environment.

This requirement arrived on the scene at the same time as a growing awareness that major new projects in science and technology need matching computer support and access to resources worldwide.

In the 1970s and 1980s the Internet grew up as a network of computer networks, each established to service specific communities and each with a heavy commitment to data processing.

In the late 1980s the World Wide Web was invented at CERN to enable particle physicists scattered all over the globe to access information and participate actively in their research projects directly from their home institutes. The amazing synergy of the Internet, the boom in personal computing and the growth of the Web grips the whole world in today’s dot.com lifestyle.

Internet, Web, what next?

However, the Web is not the end of the line. New thinking for the millennium, summarized in a milestone book entitled The Grid by Ian Foster of Argonne and Carl Kesselman of the Information Sciences Institute of the University of Southern California, aims to develop new software (“middleware”) to handle computations spanning widely distributed computational and information resources – from supercomputers to individual PCs.

In the same way that the World Wide Web makes information stored on a remote site immediately accessible anywhere on the planet without the end user having to worry unduly where the information is held and how it arrives, so the Grid would extend this power to large computational problems.

Just as a grid for electric power supply brings watts to the wallplug in a way that is completely transparent to the end user, so the new data Grid will do the same for information.

Each of the major LHC experiments – ATLAS, CMS and ALICE – is estimated to require computer power equivalent to 40 000 of today’s PCs. Adding LHCb to the equation gives a total equivalent of 140 000 PCs, and this is only for day 1 of the LHC.

Within about a year this demand will have grown by 30%. The demand for data storage is equally impressive, calling for several thousand terabytes – more information than is contained in the combined telephone directories for the populations of millions of planets. With users across the globe, this represents a new challenge in distributed computing.
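To put these round numbers together (they are, of course, only the estimates quoted above, and the LHCb figure is inferred from the quoted total), a back-of-the-envelope tally looks like this:

```python
# Back-of-the-envelope tally of LHC day-1 computing needs, using only the
# round figures quoted in the text; the LHCb entry is inferred from the
# quoted total of 140 000 PC equivalents.
pc_equivalent = {
    "ATLAS": 40_000,
    "CMS": 40_000,
    "ALICE": 40_000,
    "LHCb": 20_000,
}

day_one = sum(pc_equivalent.values())   # 140 000 PC equivalents on day 1
after_a_year = day_one * 1.30           # quoted growth of about 30%

print(f"Day 1:        {day_one:,} PC equivalents")
print(f"A year later: {after_a_year:,.0f} PC equivalents")
```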

For the LHC, each experiment will have its own central computer and data storage facilities at CERN, but these have to be integrated with regional computing centres accessed by the researchers from their home institutes.

CERN serves as Grid testbed

As a milestone en route to this panorama, an interim solution is being developed, with a central facility at CERN complemented by five or six regional centres and several smaller ones, so that computing can ultimately be carried out on a cluster in the user’s research department. To see whether this proposed model is on the right track, a testbed is to be implemented using realistic data.

New Grid-oriented initiatives have been launched in several countries – in the US by NASA and the National Science Foundation, while in Europe particle physics provides a natural focus for work in, among others, the UK, France, Italy and Holland. Other areas of science, such as Earth observation and bioinformatics, are also on board.

In Europe, European Commission funding is being sought to underwrite this major new effort to propel computing into a new orbit.

The post The grid is set to grapple with large computations appeared first on CERN Courier.

]]>
https://cerncourier.com/a/the-grid-is-set-to-grapple-with-large-computations/feed/ 0 Feature Particle physics has always pushed computing and computing techniques to the limit - witness the World Wide Web developed at CERN. Continuing this tradition, particle physics at CERN will soon provide a crucial testbed for even more powerful network-based information handling systems - the Grid. https://cerncourier.com/wp-content/uploads/2000/05/cerngrid1_6-00-feature.jpg
Computer control of physics is increasing https://cerncourier.com/a/computer-control-of-physics-is-increasing/ https://cerncourier.com/a/computer-control-of-physics-is-increasing/#respond Tue, 18 Apr 2000 22:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/computer-control-of-physics-is-increasing/ Control by computer was once the domain of major facilities like particle accelerators. With these methods now being used across the board, a recent international conference on control systems for physics, held in Trieste, drew a large audience.

The post Computer control of physics is increasing appeared first on CERN Courier.

]]>

The changing face of physics and physics research is underlined by the increasing use of and emphasis on sophisticated control systems. Once dominated by systems for big particle accelerators, control systems are now widely used in other major facilities, and increasingly in large experiments.

This was demonstrated at the recent International Conference on Accelerator and Large Experimental Physics Control Systems (ICALEPCS), the scientific and technical programme of which covered controls for, among others, particle accelerators, detectors, telescopes, nuclear fusion devices and nuclear reactors.

ICALEPCS’99 saw an increased number of contributions from the plasma physics and astronomical community and also, although to a lesser extent, from the particle detector community. Philippe Charpentier of CERN presented a memorable talk entitled “The evolution of the DELPHI experiment control system – how to survive 10 years of running”.

ICALEPCS looked at all aspects – hardware and software – of experimental physics control systems, but concentrated on how controls can contribute to the success of a major experiment. With this objective in mind, different technology and engineering issues were covered. State-of-the-art software and hardware technologies were reviewed in terms of the possibilities that they offer for dealing with systems of increasing complexity and sophistication within restricted budgets and human resources.

Software

In the software domain, several of the applications described are based on Windows NT and use the Common Object Request Broker Architecture (CORBA) as a distributed programming model. Examples are A Goetz (ESRF, France) with “Tango – an object oriented control system based on CORBA”, and C Scafuri (Sincrotrone Trieste, Italy) with “The ELETTRA object-oriented framework for high-level software development”, which also uses CORBA as its distributed programming model, with Java as the programming language.

Noteworthy is the growth of Windows 98/NT, followed closely by Linux. Both are competing with the more traditional UNIX platforms. Increased geographical distribution of systems as well as requirements relating to remote observation and monitoring lead naturally to the application of the Web and related technologies (J Farthing, JET, UK – “Technical preparations for remote participation at JET”).

The crucial role played by well-integrated centralized data repositories was also emphasized by H Shoaee (SLAC, Stanford) – “The role of a central database for configuration management”. Indeed, controls are no longer stand-alone systems but rather part of a whole that ties physics to other areas, both technical and administrative, in a Computer Integrated Manufacturing environment.

Although the Experimental Physics and Industrial Control System (EPICS) is still rather popular as a framework and set of tools for developing control system software, both in the US (K White – “The evolution of Jefferson Lab’s control system”) and in some non-US labs, commercial Supervisory Controls and Data Acquisition (SCADA) systems are now penetrating the experimental physics “market” as well (A Daneels, CERN – “What is SCADA?”).

SCADA systems prove to be effective and efficient in controlling infrastructure systems such as vacuum, cryogenics, cooling, ventilation and personnel access, and in controlling experimental physics processes such as some small to medium-sized particle detectors.

In the wake of SCADA, technologies such as OLE for Process Controls (OPC) and SoftPLC are becoming more popular.

Hardware

The hardware domain makes increasing use of commercial Programmable Logic Controllers (PLC) connected to devices via fieldbuses, and of PCI (Peripheral Component Interconnect) and its related standards.

With restricted resources, and individual in-house development minimized in favour of buying industrial systems, the task of experimental physics control specialists is steadily moving towards the integration of these industrial products into an overall comprehensive and consistent control system.

Networks are being re-engineered using 100 Mbit/s Ethernet with GigaEthernet backbones, while Asynchronous Transfer Mode (ATM) is also considered a candidate technology for the long-distance communication of time-critical accelerator data.

Of particular importance are timing systems (T Korhonen, PSI, Switzerland – “Review of accelerator timing systems”). Telescopes as well as tokamaks and accelerators require highly stable, highly precise and highly flexible timing systems, both for event timing and counter-based systems.

The increasing complexity and sophistication of physics processes leads to the introduction of ever-more complex feedback systems, often themselves relying on measurements that need high data rates. Such high-performance measurements may require sampling rates as high as hundreds of megahertz and state-of-the-art Digital Signal Processors (DSP) (J Lister, CRPP-EPFEL, Switzerland – “The control of modern tokamaks”; J Safranek, SLAC – “Orbit control at synchrotron light sources”; and T Shea, Brookhaven – “Bunch-by-bunch instability feedback systems”).

In particular, new developments in the field of accelerator power supplies are taking advantage of the available digital technology by the use of embedded DSP controllers; the digital generation of high-stability, high-precision reference signals; and real-time algorithms for regulation (J Carwardine and F Lenkszus, Argonne – “Trends in the use of digital technology for control and regulation of power supplies”).
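As a generic illustration of the kind of regulation algorithm referred to here – not the implementation used in any system reported at the conference, and with invented gains and a toy plant model – a digital controller compares each new measurement with a high-precision reference and applies a proportional-integral correction on every sample:

```python
# Generic digital proportional-integral (PI) regulation loop: a sketch of
# the kind of real-time algorithm mentioned above. The gains, sample time
# and the toy "plant" response below are invented for illustration only.

class PIController:
    def __init__(self, kp, ki, dt):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def update(self, setpoint, measurement):
        """One control step: compare with the reference, return the drive."""
        error = setpoint - measurement
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral

# Toy closed loop: steer a sluggish first-order system towards 100.0 units.
ctrl = PIController(kp=0.5, ki=20.0, dt=1e-3)
value = 0.0
for _ in range(5000):
    drive = ctrl.update(setpoint=100.0, measurement=value)
    value += (drive - value) * 0.01    # crude plant response per sample
print(f"Regulated value after 5000 samples: {value:.2f}")
```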

Engineering and management

The frequently unappreciated engineering and management aspects of control systems were also highlighted. The weight of maintenance and adaptation costs in software projects was discussed.

In the context of increasingly elaborate systems and reduced resources, and considering the progress demonstrated by industry in keeping proper control of the lifecycle of software development, project management and engineering have shown their worth in today’s physics world as well. Particular attention was paid to requirements engineering, and emphasis was given to sharing experiences and techniques in these fields. Applications of solutions from the industrial world were also presented and discussed.

News content came from status reports from a variety of control and data acquisition projects of new experimental physics facilities. Among them were the Swiss Light Source (SLS), which is being built at the Paul Scherrer Institute in Villigen, Switzerland (S Hunt, PSI – “Control and data acquisition system of the Swiss Light Source”), and the Spallation Neutron Source (SNS), which is to be built in Oak Ridge, US (D Gurd, Los Alamos – “Plans for a collaboratively developed distributed control system for the Spallation Neutron Source”). Gianni Raffi from the European Southern Observatory summarized the meeting.


Added attractions

As well as the conference, two preconference workshops covered EPICS (the Experimental Physics and Industrial Control System) and SOSH (Software Sharing); they were organized by M Clausen of DESY and W A Watson of Jefferson Lab, respectively.

During the conference, a round table discussion, “Prospective directions in controls in geographically distributed collaborations”, chaired by W Humphrey (SLAC) and involving H Burckhart (CERN), R Claus (SLAC), J Farthing (JET), D Gurd (Los Alamos) and G Raffi (ESO), focused on the management of projects developed by distributed teams and on the experience with the available technologies for long-distance interaction.

Four tutorials covered special topics: “Cases for requirements capture and tracing” (G Chiozzi, ESO); “Network technology” (G Montessoro, Udine); “Introduction to JAVA” (J P Forestier, OSYX, France); and “Introduction to OPC” (OLE for Process Control) (F Iwanitz, Softing, Germany).


ICALEPCS’99, the seventh biennial conference, was held in Trieste on 4-8 October 1999, hosted by Sincrotrone Trieste. It took place at the “Stazione Marittima”, which has recently been restored as the city’s congress centre.

The meeting was organized by Sincrotrone Trieste in conjunction with the European Physical Society’s (EPS) Interdivisional Group on Experimental Physics Control Systems and the Istituto Nazionale di Fisica Nucleare. The International Scientific Advisory Committee was chaired by D Bulfone of Sincrotrone Trieste and A Daneels of CERN.


The meeting brought together some 400 control specialists from 32 different countries, covering Africa, the US, Asia and Europe, and representing 116 organizations. The proceedings are available at “http://www.elettra.trieste.it/ICALEPCS99/”. An industrial programme included an exhibition and seminars.

During the conference the EPS Experimental Physics Control Systems prize was awarded for the first time. It went to T Wijnands of CERN for an advanced plasma control system for TORE SUPRA.

The post Computer control of physics is increasing appeared first on CERN Courier.

]]>
https://cerncourier.com/a/computer-control-of-physics-is-increasing/feed/ 0 Feature Control by computer was once the domain of major facilities like particle accelerators. With these methods now being used across the board, a recent international conference on control systems for physics, held in Trieste, drew a large audience. https://cerncourier.com/wp-content/uploads/2000/04/cerncontrol1_5-00.jpg
Computing is put on the MAP https://cerncourier.com/a/computing-is-put-on-the-map/ https://cerncourier.com/a/computing-is-put-on-the-map/#respond Mon, 06 Mar 2000 00:00:00 +0000 https://preview-courier.web.cern.ch:8888/Cern-mock-may/computing-is-put-on-the-map/ The University of Liverpool has just commissioned a major computer system that is dedicated to the simulation of data for current and future scientific experiments.

The post Computing is put on the MAP appeared first on CERN Courier.

]]>

The University of Liverpool has just commissioned a major computer system that is dedicated to the simulation of data for current and future scientific experiments.

One of the largest in Europe, the system comprises three hundred 400 MHz PCs running under Linux. The primary role of the computer system is to simulate large numbers of events to help to optimize the design of the central vertex detector for the LHCb experiment at CERN’s LHC proton collider.

The Monte Carlo Array Processor (MAP) is now fully commissioned and produces more than 250 000 fully simulated events per day. All of the components of the system are low-price commodity items packed into custom rack-mounted boxes. The mounting ensures minimal space requirements and optimal cooling.

The power of MAP reflects the simplicity of its architecture, with essentially all of the PCs dedicated to one job. A custom control system and protocol written at the University of Liverpool has enabled very reliable communication between the “master” and the “slave” nodes on the 100BaseT internal network.
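The Liverpool protocol itself is not described in the article. Purely as an illustration of the master/slave pattern it implies – with an invented message format, port number and a placeholder simulate_event routine – a master process might hand out simulation job numbers to worker nodes over TCP roughly as follows:

```python
# Illustrative master/worker job distribution over TCP. The message format,
# port and simulate_event() are invented; the real MAP control system and
# protocol are custom developments and will differ from this sketch.
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 5000   # assumed values for the sketch
N_JOBS = 10                      # simulation jobs to hand out

def master():
    """Hand a fresh job number to any worker that asks, then say DONE."""
    next_job = 0
    lock = threading.Lock()
    server = socket.create_server((HOST, PORT))

    def serve(conn):
        nonlocal next_job
        with conn:
            while conn.recv(64).startswith(b"GET"):
                with lock:
                    job, next_job = next_job, next_job + 1
                if job < N_JOBS:
                    conn.sendall(f"JOB {job}\n".encode())
                else:
                    conn.sendall(b"DONE\n")
                    break

    while True:
        conn, _ = server.accept()
        threading.Thread(target=serve, args=(conn,), daemon=True).start()

def simulate_event(job):
    # Placeholder for the real Monte Carlo event simulation.
    return f"job {job}: events simulated"

def worker():
    """Keep asking the master for work until it replies DONE."""
    with socket.create_connection((HOST, PORT)) as conn:
        while True:
            conn.sendall(b"GET\n")
            reply = conn.recv(64).decode().strip()
            if reply == "DONE":
                break
            print(simulate_event(int(reply.split()[1])))

if __name__ == "__main__":
    threading.Thread(target=master, daemon=True).start()
    time.sleep(0.5)              # give the master time to start listening
    worker()
```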

A small fraction of the system is reserved for system development, and it is hoped to use this to test direct node-to-node communication. This would enable MAP to handle problems of much wider applicability than event simulation alone.

The project will provide an insight into the operation of large-scale PC arrays planned for the LHC as well as providing the LHCb collaboration with sufficient computer power for its vertex detector optimization studies.

Despite its power, MAP is still a long way from being a general-purpose machine for analysing real or simulated data. A potential solution to the storage and analysis of large amounts of data is to store the output of the experiment on large disk servers.

The Liverpool team has tested a prototype 1 Tbyte server on loan from Dell Computers, UK. Unlike standard RAID architectures, it has no specialized hardware components – simply 1 Tbyte of SCSI disks attached to a high-performance server. This has the benefit of low cost compared with standard systems, and it is hoped to equip MAP with such a storage system to test its operation in this environment.

The post Computing is put on the MAP appeared first on CERN Courier.

]]>
https://cerncourier.com/a/computing-is-put-on-the-map/feed/ 0 News The University of Liverpool has just commissioned a major computer system that is dedicated to the simulation of data for current and future scientific experiments. https://cerncourier.com/wp-content/uploads/2000/03/cernnews4_3-00.jpg
Weaving the Web – The Original Design and Ultimate Destiny of the World Wide Web by its Inventor https://cerncourier.com/a/weaving-the-web-the-original-design-and-ultimate-destiny-of-the-world-wide-web-by-its-inventor/ Thu, 27 Jan 2000 10:01:09 +0000 https://preview-courier.web.cern.ch/?p=106520 James Gillies reviews in 2000 Weaving the Web - The Original Design and Ultimate Destiny of the World Wide Web by its Inventor.

The post Weaving the Web – The Original Design and Ultimate Destiny of the World Wide Web by its Inventor appeared first on CERN Courier.

]]>
by Tim Berners-Lee and Mark Fischetti, Harper, San Francisco, 1999, ISBN 0 060 251586 1 ($26).


If you’ve ever wondered what goes on in the mind of an inventor you could do a lot worse than delve into Tim Berners-Lee’s Weaving the Web. In it he and co-author Mark Fischetti explain the origins of the ideas that are now revolutionizing the communications landscape, and the vision that lies behind them.

From a childhood spent discussing maths at the breakfast table and building mock-up replicas of the Ferranti computers his parents worked on, Berners-Lee moved on to building his own computer out of salvaged pieces of electronics and an early microprocessor chip.

In 1980, he went to CERN on a six-month contract. There he wrote a hypertext program called Enquire to help him keep track of the complex web of who did what on the accelerator controls project he worked on. Back at CERN at the end of the decade, Berners-Lee transported the idea behind Enquire to the Internet, with the now well known results.

Berners-Lee’s book is a very personal account, and it’s all the more readable for that. Like most of us, Tim Berners-Lee has a mind that’s better at storing random associations than hierarchical structures. And, like most of us, his mind is prone to mislaying some of those associations. Enquire began as an effort to overcome that shortcoming and evolved into something much bigger.

Berners-Lee is an idealist, driven by the desire to make the world a better place and the profound belief that the Web can do that. Now far from the rarefied air of a pure research laboratory, Berners-Lee gives credit to the atmosphere in which his ideas were allowed to mature. “I was very lucky, in working at CERN, to be in an environment… of mutual respect and of building something very great through a collective effort that was well beyond the means of any one person,” he explained. “The environment was complex and rich; any two people could get together and exchange views, and even end up working together. This system produced a weird and wonderful machine, which needed care to maintain, but could take advantage of the ingenuity, inspiration, and intuition of individuals in a special way. That, from the start, has been my goal for the World Wide Web.”

The post Weaving the Web – The Original Design and Ultimate Destiny of the World Wide Web by its Inventor appeared first on CERN Courier.

]]>
Review James Gillies reviews in 2000 Weaving the Web - The Original Design and Ultimate Destiny of the World Wide Web by its Inventor. https://cerncourier.com/wp-content/uploads/2022/09/71A48s3YcL.jpg