The largest scientific instrument on the planet will produce roughly 15 petabytes (15 million gigabytes) of data annually when it begins operations.
System crashes and the ensuing data loss may be most IT managers’ idea of the end of the world.
Yet spare a thought for the folk running the LHC Computing Grid (LCG) designed by CERN to handle the massive amounts of data produced by the Large Hadron Collider (LHC).
Many people believe the US$4bn high-energy particle accelerator, which crisscrosses the border between France and Switzerland, is a Doomsday Machine that will create micro black holes and strangelets when it is switched on tomorrow.
While that is, hopefully, pure fantasy, the real nightmare is how to deal with the colossal amounts of data that the 27km-long LHC is going to produce.
The project is expected to generate 27 TB of raw data per day, plus 10 TB of "event summary data", which represents the output of calculations done by the CPU farm at the CERN data center.
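Those daily figures square roughly with the headline annual total. As a back-of-envelope check, sketched here in Python purely for illustration (the 365-day assumption is ours, since the machine will not collide beams every day of the year), the numbers add up as follows:

# Back-of-envelope check of the LHC data volume, using the figures quoted above.
raw_tb_per_day = 27           # raw data from the detectors
summary_tb_per_day = 10       # "event summary data" from the CERN CPU farm
days_per_year = 365           # assumption: treat the quoted rates as a year-round average

total_tb = (raw_tb_per_day + summary_tb_per_day) * days_per_year
total_pb = total_tb / 1000.0  # 1 PB = 1,000 TB

print(f"{total_tb} TB per year, roughly {total_pb:.1f} PB")
# Prints 13505 TB per year, roughly 13.5 PB - the same order as the ~15 PB annual figure above.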
The LHC is CERN’s new flagship research facility, which is expected to provide new insights into the mysteries of the universe.
It will produce beams seven times more energetic than any previous machine, and around 30 times more intense when it reaches design performance, probably by 2010.
Once stable circulating beams have been established, they will be brought into collision, and the final step will be to commission the LHC’s acceleration system to boost the energy to 5 TeV, taking particle physics research to a new frontier.
CERN director general, Robert Aymar, said: “The LHC will enable us to study in detail what nature is doing all around us.
“The LHC is safe, and any suggestion that it might present a risk is pure fiction.”
Originally standing for Conseil Européen pour la Recherche Nucléaire (European Council for Nuclear Research), CERN is also where the World Wide Web began: Sir Tim Berners-Lee built its precursor, a hypertext system called ENQUIRE, there in 1980, and proposed the Web itself in 1989, with Robert Cailliau as an early collaborator.
Berners-Lee and Cailliau were jointly honored by the ACM in 1995 for their contributions to the development of the World Wide Web.
Appropriately, sharing data around the world is the goal of the LCG project.
As befits the world’s largest physics laboratory, CERN’s main site at Meyrin houses a large computer center with powerful data-processing facilities, used primarily for analysing experimental data.
The LCG project’s mission has been to build and maintain a data storage and analysis infrastructure for the entire high-energy physics community that will use the LHC.
And because that data must be made available to researchers around the world to access and analyse, the site is also a major wide-area networking hub.
The data from the LHC experiments will be distributed according to a four-tiered model. A primary backup will be recorded on tape at CERN, the “Tier-0” center of LCG.
After initial processing, this data will be distributed to a series of Tier-1 centers, large computer centers with sufficient storage capacity and with round-the-clock support for the Grid.
The Tier-1 centers will make data available to Tier-2 centers, each consisting of one or several collaborating computing facilities, which can store sufficient data and provide adequate computing power for specific analysis tasks.
Individual scientists will access these facilities through Tier-3 computing resources, which can consist of local clusters in a university department or even individual PCs, and which may be allocated to LCG on a regular basis.
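To picture how the tiers fit together, here is a purely illustrative Python sketch of the hierarchy described above; the tier descriptions, the example dataset name and the single Tier-0-to-Tier-3 chain are simplifications of ours, not LCG middleware or its real topology:

# Illustrative sketch of the four-tier LCG distribution model described above.
# Not LCG software; tier descriptions and the routing below are simplified assumptions.
TIERS = {
    "Tier-0": "CERN: primary tape backup and initial processing",
    "Tier-1": "Large computer centers: bulk storage, round-the-clock Grid support",
    "Tier-2": "Collaborating computing facilities: storage and CPU for specific analyses",
    "Tier-3": "University clusters and individual PCs used by scientists",
}

def route_dataset(dataset, path=("Tier-0", "Tier-1", "Tier-2", "Tier-3")):
    # Trace a dataset down the hierarchy, printing each hop.
    for tier in path:
        print(f"{dataset} -> {tier}: {TIERS[tier]}")

route_dataset("collision-events-2008-09-10")

In reality each Tier-1 will feed several Tier-2 centers, so the flow fans out across the Grid rather than following a single chain, but the direction of travel is the same.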
A live webcast of the event will be broadcast tomorrow. What are your thoughts on the LHC – will it reveal the secrets of the universe, or a gaping black hole?