VESTEC is a European-funded project that builds a flexible toolchain to combine multiple data sources, efficiently extract essential features, enable flexible scheduling and interactive supercomputing, and realise 3D visualization environments for interactive exploration by stakeholders and decision makers.
VESTEC will develop and evaluate methods and interfaces to integrate high-performance data analytics processes into running simulations and real-time data environments. Interactive ensemble management will launch new simulations for new data, building up statistically more and more accurate pictures of emerging, time-critical phenomena. Innovative data compression approaches, based on topological feature extraction and data sampling, will result in considerable reductions in storage and processing demands by discarding domain-irrelevant data.
Objectives of VESTEC
Enable visionary applications by emerging HPC use modes
VESTEC explicitly addresses urgent decision making as an emerging use mode and enables three pilot applications with high impact for EU society: wild-fire monitoring, mosquito-borne diseases, and space weather forecasting. For each pilot we foresee visionary approaches to coupling and fusion concepts for data from sensor networks and simulations.
Support processing of real-time data
All three applications will integrate near real-time data from multiple sources. The wild-fire pilot will process and integrate near real-time sensor data from MODIS, VIIRS and other sources. The mosquito-borne diseases application needs to combine large amounts of environmental measurements, including temperature, precipitation, relative humidity, vegetation indices and land use information. This information is gathered from in-situ sensors such as weather stations (via IoT technology) and from high-resolution satellite imagery. The space weather pilot will integrate space-borne and Earth-based sensor data. For the integration and analysis of these sensor data sources we will evaluate big data technologies such as Apache Kafka or Spark Streaming for practical use in combination with extreme computing environments.
Enable HPC architecture for real-time data analysis
Each pilot application requires the execution of computation-intensive simulations and real-time data analysis on supercomputers. To derive reliable decisions, each pilot demands the assimilation of simulation with real-time sensor data to initialize the simulation, adapt parameters during run-time, or evaluate the simulation results.
Enable interactive supercomputing and adapt operational HPC procedures
VESTEC will evaluate and develop methods to use the high performance computing facilities of today and tomorrow for urgent decision making using a new mode of interactive supercomputing, with special focus on minimizing latencies between the steps of data processing, numerical simulation, data analytics, and scientific visualization.
Develop co-scheduling techniques
Interactive supercomputing also considers co-scheduling. We will investigate performance measures to optimize the coupling of numerical simulation codes and the scheduling of individual parts of the processing pipeline from data assimilation to 3D visualization.
Treat uncertainties in predictions
VESTEC will create a framework for the pilot applications to execute multiple simulations with varying input data and simulation parameters, with new simulations starting as new data become available. Each of these ensemble simulations will become a sample of a larger and more comprehensive simulation population, building greater awareness of uncertainties and more accurate risk-based information for decision making. Eventually, this will lead to more reliable action planning.
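The ensemble idea can be illustrated with a minimal Python sketch. It is not VESTEC's framework: `toy_spread_model`, `run_ensemble`, `summarize` and every parameter value are hypothetical, and a one-line formula stands in for a real simulation. The sketch only shows how members run with perturbed inputs form a population whose spread gives an uncertainty estimate, and how that population grows when new data arrive.

```python
import random
import statistics


def toy_spread_model(wind_speed, hours):
    """Hypothetical stand-in for a simulation: affected area grows with wind speed."""
    return 0.5 * wind_speed * hours


def run_ensemble(n_members, mean_wind, wind_sigma, hours, seed=0):
    """Run n_members toy simulations, each with a randomly perturbed wind speed."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    results = []
    for _ in range(n_members):
        # Perturb the uncertain input; clamp at zero to keep it physical.
        wind = max(0.0, rng.gauss(mean_wind, wind_sigma))
        results.append(toy_spread_model(wind, hours))
    return results


def summarize(results):
    """Reduce the ensemble population to a mean and a spread."""
    return {
        "members": len(results),
        "mean": statistics.fmean(results),
        "stdev": statistics.stdev(results),
    }


# Initial ensemble based on a first, uncertain wind estimate (assumed values).
members = run_ensemble(50, mean_wind=10.0, wind_sigma=2.0, hours=6.0)
summary = summarize(members)

# New data arrive: launch further members and add them to the population.
members += run_ensemble(50, mean_wind=10.5, wind_sigma=1.0, hours=6.0, seed=1)
updated = summarize(members)
```

In a real deployment each member would be a full simulation job and the summary would feed risk maps rather than a single number; the structure of "perturb, run, pool, summarize" is the part this sketch is meant to convey.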
Support processing of large-scale data
For all use cases, the workflows have to process large-scale datasets arising from ensembles of complex, coupled simulations; our space weather application produces 10-100 TB of data per simulation. The coupled weather/fire and the weather/disease risk simulations will also produce large datasets. Approaches for efficient in-situ processing will be designed to deliver verdicts for fast decisions.
Support processing of stored data
Stored datasets may be used for offline evaluation of the framework. They can also be used as input for real-time scenarios where sensor data is not available or is delayed. One example will be the integration of existing weather simulation data as input for the spread of wild-fire and for the risk-assessment of mosquito-borne diseases. By using standard streaming data sources such as Apache Kafka, VESTEC’s data assimilation framework will have this flexibility built in.
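As a rough sketch of this flexibility, the Python example below treats live and stored data as interchangeable record streams. It deliberately does not use the Apache Kafka API: plain Python iterables stand in for streaming sources, and the function name `assimilation_input`, the record fields and the `None` convention for a missing live reading are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, Optional


def assimilation_input(
    live: Iterable[Optional[dict]], stored: Iterable[dict]
) -> Iterator[dict]:
    """Yield live records, substituting stored ones where a live record is missing.

    A missing or delayed live reading is represented by None (an assumption
    of this sketch). Each yielded record is tagged with its origin.
    """
    stored_iter = iter(stored)
    for record in live:
        if record is not None:
            yield {**record, "source": "live"}
            continue
        fallback = next(stored_iter, None)
        if fallback is None:
            # No stored data left to fill the gap: stop the stream.
            return
        yield {**fallback, "source": "stored"}


# Hypothetical example: the reading for t=1 is missing and is filled
# from an archived weather simulation.
live_feed = [{"t": 0, "temp": 21.0}, None, {"t": 2, "temp": 22.5}]
archive = [{"t": 1, "temp": 20.0}]

records = list(assimilation_input(live_feed, archive))
sources = [r["source"] for r in records]
```

Because the consumer only sees one stream of tagged records, the same downstream code can run on live data, on stored data for offline evaluation, or on a mix of both.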
Support analytics of different data structures
The wild-fire pilot requires the analysis of structured, time-dependent simulation data. By contrast, the mosquito-borne diseases pilot requires the analysis of structured multi-layer simulation data (mosquito abundance, outbreak probability, case number distributions), which has to be correlated with statistical metrics. Finally, the space weather use case is based on a large three-dimensional, time-dependent particle-in-cell data structure. In each case, simulation ensembles add a further dimension. Employing different data structures will demonstrate the generality of our implementations.
Disruptive algorithmic strategies for minimized data movement
VESTEC will develop innovative data sampling strategies. These algorithms will exploit the topological information extracted (in-memory) to reduce data sizes and therefore data movement between processing steps and external visualization environments. This approach should also enable interactive exploration in virtual environments with data on remote HPC resources.
Direct processing of compressed data
The topology-based data sampling approach will also specify a novel data structure which will reduce the storage demands for simulation codes – in effect a lossy compression algorithm. Through careful design we expect high quality in the reconstructed results and no impact on accuracy for urgent decision scenarios. An in-depth analysis with all pilots will be performed to demonstrate its usefulness not only for those scenarios but also for general-purpose use in other scientific applications.
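The trade-off between data reduction and reconstruction quality can be illustrated on a toy 1D signal. The Python sketch below is not VESTEC's algorithm: it assumes that local minima and maxima can stand in for topological critical points, keeps only those samples plus the endpoints, reconstructs by linear interpolation, and measures the resulting error. All function names are hypothetical.

```python
import math


def critical_point_indices(values):
    """Indices of the endpoints and of local extrema (assumes len(values) >= 2)."""
    keep = [0]
    for i in range(1, len(values) - 1):
        left = values[i] - values[i - 1]
        right = values[i + 1] - values[i]
        if left * right < 0:  # slope changes sign: local minimum or maximum
            keep.append(i)
    keep.append(len(values) - 1)
    return keep


def reconstruct(indices, samples, length):
    """Lossy reconstruction: linear interpolation between the kept samples."""
    out = [0.0] * length
    pairs = list(zip(indices, samples))
    for (i0, v0), (i1, v1) in zip(pairs, pairs[1:]):
        for i in range(i0, i1 + 1):
            t = (i - i0) / (i1 - i0)
            out[i] = v0 + t * (v1 - v0)
    return out


# Toy signal: five periods of a sine wave, 201 samples.
signal = [math.sin(2 * math.pi * i / 40) for i in range(201)]

kept = critical_point_indices(signal)
samples = [signal[i] for i in kept]
approx = reconstruct(kept, samples, len(signal))

# Quantify what the compression costs and what it saves.
max_error = max(abs(a - b) for a, b in zip(signal, approx))
reduction = len(signal) / len(kept)
```

Here the positions and values of the extrema survive the compression while the shape between them is only approximated, so whether the error is acceptable depends on which features a given decision actually relies on. This is the kind of question the per-pilot analysis would have to answer.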
Enable interactive and immersive 3D visualization
The development of interactive visual data exploration approaches and applications for urgent decision making is one of the primary goals of VESTEC. Extracted features will be streamed to web- and desktop-based applications. They will also be rendered in virtual environments for immersive exploration.