Big and Fast Data Analytics

Yanlei Diao, a Professor at the Computer Science Laboratory of the École Polytechnique, develops new algorithms and methods to analyse data and extract useful insights from it. Her work, funded by the European Research Council, is conducted in collaboration with major research institutions and industrial partners.
15 Mar. 2023

Today, we live in the era of big data. Data is the foundation of artificial intelligence and can generate innovative tools. Yet the volume of data is exploding, the time needed to analyze it is increasing, and so is the cost of insight extraction. Many businesses therefore face a decisive question: given ever-growing data, can we accelerate insight extraction while minimizing its cost?

At the Computer Science Laboratory of the École Polytechnique (LIX*), Yanlei Diao addresses this question in her research. As a Professor of Computer Science, she is leading a project called “Charting a New Horizon of Big and Fast Data Analysis through Integrated Algorithm Design” with 2.5 million euros in funding over five years from the European Research Council (ERC). Her team includes collaborators from institutes in the US, as well as industry partners across Europe and Asia. “Many companies address this decisive question by hiring engineering teams, yet still suffer from suboptimal performance because of the complexity of big data. In our research project, we focus on two questions to help handle big data,” she says.

The first research question is how to achieve both high speed and low cost for insight extraction. For each business user, a big data system stores data on many machines and handles numerous analytical tasks. Achieving high speed and low cost in this process requires tuning many system parameters, which in turn demands considerable technical know-how. The team pioneered an automated optimization approach in which the speed and cost objectives are modelled as functions of these parameters through large-scale machine learning. The parameters are then set through a multi-objective optimization process that maximizes speed while minimizing cost. “My past partnership with Alibaba Cloud enabled its finance, food, and travel businesses to improve speed and reduce cloud computing cost, both by up to 70%, in addition to saving the cost of its engineering team. And our techniques are being deployed by Amazon Web Services, the world’s leading cloud service provider,” explains Yanlei Diao, who also holds a part-time position as an Amazon Scholar.
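To make the idea of trading speed against cost concrete, here is a minimal sketch in Python. It is not the team's method: the configuration names and the latency and cost figures are invented for illustration, and where the project uses learned models to predict these values, the sketch simply assumes the predictions are already available. It shows only the final step, keeping the configurations that no other configuration beats on both objectives at once (the Pareto front).

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    name: str         # hypothetical configuration label
    latency_s: float  # assumed predicted latency in seconds (lower is better)
    cost_usd: float   # assumed predicted cost in dollars (lower is better)


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.latency_s <= b.latency_s and a.cost_usd <= b.cost_usd
    strictly_better = a.latency_s < b.latency_s or a.cost_usd < b.cost_usd
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep every configuration that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Invented example values, standing in for model predictions.
candidates = [
    Config("small-cluster", 120.0, 1.0),
    Config("medium-cluster", 60.0, 2.0),
    Config("large-cluster", 30.0, 5.0),
    Config("oversized", 35.0, 8.0),       # slower and dearer than large-cluster
    Config("misconfigured", 130.0, 2.5),  # slower and dearer than small-cluster
]

front = pareto_front(candidates)
for c in front:
    print(f"{c.name}: {c.latency_s:.0f} s, ${c.cost_usd:.2f}")
```

The three configurations that remain each represent a different compromise between speed and cost; which one to deploy depends on the user's budget or deadline. A brute-force comparison like this suits a handful of candidates, whereas a real system with many parameters would need a far more scalable search.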

The second research question is how to further accelerate insight extraction when data arrives from live sources. Users need insights from such data in real time, a task known as stream analytics. However, traditional systems are too slow because they first write data to disk and only then run analytics. That is why the team develops new algorithms for stream analytics. Among them, the “explainable anomaly detection” algorithm automatically detects a variety of anomalies using the latest deep-learning techniques, and overcomes the non-interpretability of deep learning by returning human-readable, actionable insights that help prevent or remedy the situation. With this work, Yanlei Diao and her collaborators were among the first partners of SWIFT, a financial messaging service provider, and provided its first unsupervised, explainable algorithm for detecting fraudulent transactions in near real time.
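The contrast between storing data first and analysing it as it arrives can be illustrated with a small Python sketch. It deliberately swaps the deep-learning detector described above for a much simpler technique, a running mean and standard deviation maintained with Welford's online update, and the class name, threshold, warm-up length, and transaction amounts are all assumptions made for illustration. Each value is checked in memory the moment it arrives, and an alert carries a plain-language reason.

```python
import math
from typing import List, Optional, Tuple


class StreamingDetector:
    """Hypothetical detector: flags values far from the running mean."""

    def __init__(self, threshold: float = 3.0, warmup: int = 5) -> None:
        self.threshold = threshold  # allowed deviation, in standard deviations
        self.warmup = warmup        # values to observe before raising alerts
        self.n = 0                  # number of values seen so far
        self.mean = 0.0             # running mean
        self.m2 = 0.0               # running sum of squared deviations

    def observe(self, value: float) -> Optional[str]:
        """Check one value against past statistics, then fold it into them."""
        message = None
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(value - self.mean) > self.threshold * std:
                message = (
                    f"amount {value:.2f} is more than {self.threshold:g} "
                    f"standard deviations from the running mean {self.mean:.2f}"
                )
        # Welford's online update: no past values need to be stored.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        return message


# Invented transaction amounts arriving one at a time.
stream = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 500.0, 11.0]

detector = StreamingDetector()
alerts: List[Tuple[int, str]] = []
for i, amount in enumerate(stream):
    reason = detector.observe(amount)
    if reason is not None:
        alerts.append((i, reason))

for i, reason in alerts:
    print(f"transaction {i}: {reason}")
```

On this invented stream only the amount of 500 is flagged. The sketch captures two ideas from the paragraph above, processing data on arrival and attaching a readable explanation to each alert, but a single-feature statistical rule is far cruder than the unsupervised deep-learning approach the team describes.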

About Yanlei Diao

Yanlei Diao is Professor of Computer Science at École Polytechnique, France, and was formerly a professor at the University of Massachusetts Amherst, USA. She also holds a part-time position at Amazon as an Amazon Scholar. She received her Ph.D. in Computer Science from the University of California, Berkeley, in 2005. Prof. Diao is a recipient of the 2016 ERC Consolidator Award, the 2013 CRA-W Borg Early Career Award (given each year to one female computer scientist for outstanding contributions), the IBM Scalable Innovation Faculty Award, and the NSF Career Award. Her research interests lie in big data analytics and scalable, intelligent information systems, with a focus on optimization in cloud analytics, data stream analytics, explainable anomaly detection, interactive data exploration, genomic data analysis, and uncertain data management.

Yanlei Diao's research website

*LIX: a joint research unit of CNRS, École Polytechnique, Institut Polytechnique de Paris, 91120 Palaiseau, France