Handling Astronomical Data from the World's Largest Telescope
What is dark energy? How did the first stars form? And are we alone in the universe?
To begin answering such big questions, the Square Kilometre Array (SKA) project, a collaboration of ten countries with its headquarters at Jodrell Bank, UK, plans to build thousands of dishes and millions of radio-wave dipole antennas in the South African and Australian deserts, producing astronomically large amounts of data. Working out how to handle this data is the first challenge to conquer before the big questions themselves can be answered.
Though the SKA project is still in the early stages of development, the fully operational telescope array could produce data at a rate of many petabits per second, around 100 times greater than current global internet traffic. Accounting for Moore’s Law, the fastest supercomputer in existence when the SKA begins operation in 2020 should be reaching exascale. Yet even this computer would barely meet the data-processing needs of the SKA in its early stages, and it would soon be overwhelmed as more telescopes in the array come online.
Large surveys are changing the way astronomers process data. “Until recently, astronomers would observe at the telescope, then take their data home on disks and spend weeks or months processing it,” says Ray Norris, an astronomer at the Commonwealth Scientific and Industrial Research Organisation, Australia. “Next-generation large surveys will generate data volumes that are too large to transport, and so all processing must be done in situ.”
IBM’s Center for Exascale Technology and the Netherlands Institute for Radio Astronomy (ASTRON) have formed a collaboration called DOME to try to meet the SKA’s data needs. They are developing hybrid systems that combine traditional supercomputer elements with another kind of processor called an accelerator. The accelerators use pattern-recognition algorithms to pick out the most useful data coming from the radio telescopes and forward only that data for analysis.
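The article does not detail how that selection works, but the general idea can be illustrated with a minimal sketch: score each incoming block of samples and forward only the blocks that clear a threshold. The scoring rule, the 8-sigma cutoff, and all names below are invented for the example; the real DOME accelerators rely on learned pattern recognition rather than a fixed rule.

```python
import numpy as np

# Illustrative accelerator-style pre-filter (not the DOME algorithm):
# score each block of raw telescope samples and forward only the blocks
# that look interesting enough to be worth analysing downstream.

THRESHOLD = 8.0  # keep blocks whose strongest sample is 8 sigma above the noise


def usefulness_score(block: np.ndarray) -> float:
    """Peak signal-to-noise ratio of one block of samples."""
    noise = np.std(block) + 1e-12  # tiny offset avoids division by zero
    return float(np.max(np.abs(block)) / noise)


def filter_stream(blocks):
    """Yield only the blocks worth sending on for analysis."""
    for block in blocks:
        if usefulness_score(block) >= THRESHOLD:
            yield block


# Demo: 1,000 blocks of pure noise, three of which carry an injected spike.
rng = np.random.default_rng(42)
blocks = [rng.normal(size=4096) for _ in range(1000)]
for i in (10, 500, 900):
    blocks[i][2048] += 50.0  # a strong, narrow "signal"

kept = list(filter_stream(blocks))
print(f"forwarded {len(kept)} of {len(blocks)} blocks for analysis")
```

In a real deployment, the scoring step is where the learning Engbersen describes would live, updated as the system observes which data the astronomers actually end up using.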
“The system learns by itself. By monitoring how the astronomers make use of the data, it learns how to work the data differently on several different storage units,” says Ton Engbersen, scientific director of ASTRON.
While Moore’s Law suggests that enough processing power will be available to handle the required amount of data, another trend in computing has been just as critical in making the SKA a reality: the steady increase in processor efficiency over time. Known as Koomey’s Law (after Jonathan Koomey of Stanford University, California, USA, who identified it), it observes that for a fixed amount of computation, the power required falls by half roughly every 1.6 years.
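To make that halving rate concrete, here is a small back-of-the-envelope projection. The 1.6-year halving period is the figure quoted above; the time spans are chosen purely for illustration.

```python
# Koomey's Law: for a fixed amount of computation, the power required
# halves roughly every 1.6 years. Project what fraction of today's power
# the same workload would need after a given number of years.

HALVING_PERIOD_YEARS = 1.6


def relative_power(years_elapsed: float) -> float:
    """Fraction of today's power needed for the same computation."""
    return 0.5 ** (years_elapsed / HALVING_PERIOD_YEARS)


for years in (1.6, 5.0, 10.0):
    print(f"after {years:>4} years: {relative_power(years):.1%} of today's power")

# Output:
# after  1.6 years: 50.0% of today's power
# after  5.0 years: 11.5% of today's power
# after 10.0 years: 1.3% of today's power
```

On that trajectory, the same workload needs less than an eighth of the power after five years, which is the kind of improvement the SKA is counting on to keep its electricity bill manageable.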
The biggest problem with so much data, then, is not storage but the power required to process it. Engbersen’s team are also attempting to design computers efficient enough to perform the required calculations without consuming exorbitant amounts of power.
“Every time you move a piece of data it takes power. It doesn’t happen by itself. Our focus is on trying to move the data as little as possible,” says Engbersen. Originally, the SKA project planned to send its astronomical data 700 km to a server farm for processing, which would cost a fortune in electrical power, Engbersen explains.
Primary filtering and analysis of the collected data would therefore need to be done close to the antennas, which calls for very small, inexpensive, and highly energy-efficient servers to handle those processing jobs.
Engbersen’s team use standard ‘off-the-shelf’ components to build their ‘micro’ servers. What makes these servers more efficient than current designs is their configuration: processors and memory chips are packed as closely together as possible in three dimensions. Reducing the distance data has to travel reduces the server’s power consumption, which is especially important because the arrays are located far from conventional power sources.
While squeezing components together helps reduce the servers’ power demand, it exacerbates the problem of overheating. To keep such dense packages of electronics cool, IBM has developed a water-cooling technique in which water is directed through microscopic channels passing just a few micrometers from the surfaces of the processors. Heat is drawn away 10 times more effectively than with passive air cooling, and the approach removes the need for cooling fans, which themselves consume power. As a bonus, the waste hot water could be used to desalinate seawater in the local region.
The advances being made in server technology will help other ‘big science’ projects, too. Server farms supporting the internet may also benefit from the increased electrical efficiency and processor speed, improvements likely to be much needed given that tens of billions of devices are expected to be connected to the internet by 2020. -- By Charles Harvey, © iSGTW