Vision of the Berlin Big Data Center
Research and Development of methods and technologies for data science at the interface between data management and machine learning
The declarative specification and automatic optimization, parallelization and hardware adaptation of advanced machine learning methods constitute the scientific core of the Berlin Big Data Center.
That is, we will develop and declaratively specify scalable machine learning algorithms, thereby fusing the academic disciplines of machine learning and data management into a common discipline of scalable data analysis.
Prior work has shown that massive parallelization is an appropriate means for some learning methods. Approaches that emanate directly from an online learning setting, in which the data is processed as a stream, are also promising.
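The streaming perspective can be made concrete with a minimal sketch: an online stochastic gradient descent learner that sees each example exactly once, as it would arrive in a data stream. The model (1-D linear regression), learning rate, and synthetic data below are illustrative assumptions, not BBDC code.

```python
import random

def online_sgd(stream, lr=0.01):
    """Incrementally fit y ~ w*x + b, visiting each example only once."""
    w, b = 0.0, 0.0
    for x, y in stream:
        err = (w * x + b) - y  # prediction error on this single example
        w -= lr * err * x      # gradient step on the weight
        b -= lr * err          # gradient step on the bias
    return w, b

# Simulate a stream drawn from y = 2x + 1 with small Gaussian noise.
random.seed(0)
stream = ((x, 2 * x + 1 + random.gauss(0, 0.01))
          for x in (random.uniform(-1, 1) for _ in range(20000)))
w, b = online_sgd(stream)
```

Because the learner never materializes the full data set, its memory footprint is constant in the stream length, which is what makes this class of methods attractive for large-scale settings.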
We will dramatically simplify the creation of data analysis programs, broaden the base of big data analysts, and drastically reduce the cost of creating complex big data analyses.
Scalable Data Analysis
To facilitate the analysis of large volumes of heterogeneous data with complex machine learning algorithms, we need to extend existing parallel programming models with new concepts such as ordered collections, multi-dimensionality, and access to distributed state between the steps of iterative algorithms.
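The last of these extensions, state shared between iterative execution steps, can be sketched with a toy single-process simulation: a 1-D k-means in which each "partition" computes partial aggregates against state (the centroids) that is re-broadcast between supersteps. The function names, partitioning, and data are hypothetical illustrations of the pattern, not an actual system API.

```python
def assign(point, centroids):
    """Index of the nearest centroid to a 1-D point."""
    return min(range(len(centroids)), key=lambda k: (point - centroids[k]) ** 2)

def kmeans_1d(partitions, centroids, iterations=10):
    for _ in range(iterations):
        # Map phase: each partition computes partial sums against the
        # broadcast state, without seeing other partitions' data.
        partials = []
        for part in partitions:
            sums = [0.0] * len(centroids)
            counts = [0] * len(centroids)
            for p in part:
                k = assign(p, centroids)
                sums[k] += p
                counts[k] += 1
            partials.append((sums, counts))
        # Reduce phase: merge the partials into the updated shared state
        # that the next iteration step will read.
        centroids = [
            sum(s[k] for s, _ in partials) / max(1, sum(c[k] for _, c in partials))
            for k in range(len(centroids))
        ]
    return centroids

parts = [[0.1, 0.2, 0.15], [0.9, 1.1, 1.0]]
centroids = kmeans_1d(parts, [0.0, 1.5])
```

In a real distributed setting the map phase runs on the workers holding the data, and only the small per-iteration state crosses the network, which is why first-class support for such state in the programming model matters.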
Furthermore, we will design and build a highly scalable open-source system that automatically optimizes and parallelizes such declaratively specified analysis methods and adapts them to heterogeneous hardware setups, together with a toolbox of scalable machine learning and other data analysis algorithms.