Prof. Dr. Rainer Gemulla, Universität Mannheim

31.10.2016, 4 pm


I am happy to announce that a workshop proposal by Reza Zadeh (Stanford) and me has been accepted at SIGMOD'17.


Big data is often defined as any data set that cannot be handled using today’s widely available mainstream techniques and technologies. The challenges of handling big data are often described in terms of the three Vs (volume, variety, and velocity): a high volume of data from a variety of data sources, arriving with high velocity, analysed to achieve an economic benefit. However, the three Vs fail to reflect the complexity of “Big Data” in its entirety. From a technical perspective, the real complexity stems from the fact that complex predictive and prescriptive analytic methods need to be applied to huge, heterogeneous data sets. Moreover, “Big Data” (often also called “Smart Data”) has a much wider scope, with challenges and opportunities in five dimensions: technology, application, economic, legal, and social ...

Read the full article

Data Scientist - Bridging the Talent Gap

According to the Harvard Business Review, Data Scientist is “The Sexiest Job of the 21st Century”. Data scientists are often considered to be wizards who deliver value from big data. These wizards need knowledge in three very distinct subject areas, namely scalable data management, data analysis, and domain expertise. However, it is a challenge to find these jacks-of-all-trades who cover all three areas. Or, as the Wall Street Journal puts it, “Big Data’s Problem is Little Talent”. Naturally, finding talented data scientists is a requirement if we are to put big data to good use. If data analyses were specified in a declarative language, data scientists would no longer have to worry about low-level programming; instead, they would be free to concentrate on their data analysis problem. The goal of the Berlin Big Data Center is to help bridge this talent gap by researching and developing novel technology. Our starting point is the Apache Flink system. We aim to enable deep analytics of huge, heterogeneous data sets with low latency by developing advanced, scalable data analysis and machine learning methods. Our goal is to specify these methods in a declarative way and to optimize and parallelize them automatically, empowering data scientists to focus on the analysis problem at hand; that is, relieving them of the need to be system programmers.
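
The benefit of a declarative specification can be sketched with a toy example (plain Python, not actual Flink code, and a deliberately simplified optimizer): the analyst states *what* to compute as a logical plan, and the engine decides *how*, e.g. by pushing a cheap filter ahead of an expensive map (predicate pushdown) without the analyst changing their program.

```python
# Toy illustration of declarative data analysis: a plan is a description
# of the computation, which a (hypothetical, minimal) optimizer may
# rewrite before execution. This is a sketch, not Flink's actual API.
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Op:
    kind: str                  # "map" or "filter"
    fn: Callable[[Any], Any]
    pushable: bool = False     # filter is known to commute with maps


def optimize(plan: List[Op]) -> List[Op]:
    """Move pushable filters in front of adjacent maps (pushdown)."""
    out: List[Op] = []
    for op in plan:
        if op.kind == "filter" and op.pushable:
            i = len(out)
            while i > 0 and out[i - 1].kind == "map":
                i -= 1
            out.insert(i, op)          # filter now runs earlier
        else:
            out.append(op)
    return out


def execute(plan: List[Op], data: List[Any]) -> List[Any]:
    """Naively run the plan, operator by operator."""
    for op in plan:
        if op.kind == "map":
            data = [op.fn(x) for x in data]
        else:
            data = [x for x in data if op.fn(x)]
    return data


# Records are (key, value) pairs; the filter reads only the key, so it
# is safe to run before the (pretend-expensive) map on the value.
records = [(k, k) for k in range(10)]
plan = [
    Op("map", lambda r: (r[0], r[1] * 10)),
    Op("filter", lambda r: r[0] % 2 == 0, pushable=True),
]
naive = execute(plan, records)
optimized = execute(optimize(plan), records)
assert naive == optimized  # same result, but fewer map invocations
```

Real systems such as Flink apply far richer rewrites (operator reordering, join selection, physical parallelization), but the principle is the same: because the program is a declarative plan rather than hand-written loops, the engine is free to transform it.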

Read more about it in the article accompanying Volker Markl's VLDB keynote, "Breaking the Chains: On Declarative Data Analysis and Data Independence in the Big Data Era".