ZüKoSt: Seminar on Applied Statistics

Would you like to be notified about these presentations via e-mail? Please subscribe here.

Spring Semester 2025

Date / Time

Speaker

Title

Location

20 February 2025
15:15-16:00

Rafael M. Frongillo
CU Boulder

Title	Incentive problems in data science competitions, and how to fix them
Speaker, Affiliation	Rafael M. Frongillo, CU Boulder
Date, Time	20 February 2025, 15:15-16:00
Location	HG G 19.2
Abstract	Machine learning and data science competitions, in which contestants submit predictions about held-out data points, are an increasingly common way to gather information and identify experts. One of the most prominent platforms is Kaggle, which has run competitions with prizes of up to 3 million USD. The traditional mechanism for selecting the winner is simple: score each prediction on each held-out data point, and the contestant with the highest total score wins. Perhaps surprisingly, this reasonable and popular mechanism can incentivize contestants to submit wildly inaccurate predictions. The talk will begin with intuition for these incentive issues and for what sort of strategic behavior one should expect, and when. One takeaway, confirmed by formal results, is that, contrary to conventional wisdom, large held-out data sets do not always alleviate these incentive issues, and small ones do not necessarily suffer from them. We will then discuss a new mechanism that is approximately truthful, in the sense that rational contestants will submit predictions close to their best guess. If time permits, we will see how the same mechanism resolves an open question in online learning from strategic experts.
Bio	Rafael (Raf) Frongillo is an Associate Professor of Computer Science at the University of Colorado Boulder. His research lies at the interface between theoretical machine learning and economics, focusing primarily on information elicitation mechanisms, which incentivize humans or algorithms to predict accurately. Before Boulder, Raf was a postdoc at the Center for Research on Computation and Society at Harvard University and at Microsoft Research New York. He received his PhD in Computer Science from UC Berkeley, advised by Christos Papadimitriou and supported by the NDSEG Fellowship.
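
The traditional winner-selection rule described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: held-out outcomes are binary, contestants submit probability forecasts, and the per-point score is the negated Brier score. The abstract does not say which scoring rule any particular competition uses, and the function names `brier_score` and `select_winner` are hypothetical.

```python
from typing import Dict, List


def brier_score(prob: float, outcome: int) -> float:
    """Negated Brier score for one binary outcome (higher is better).

    Assumption: the Brier score is chosen only for illustration.
    """
    return -(prob - outcome) ** 2


def select_winner(predictions: Dict[str, List[float]], outcomes: List[int]) -> str:
    """Traditional mechanism: sum each contestant's per-point scores over
    all held-out points and return the contestant with the highest total."""
    if any(len(probs) != len(outcomes) for probs in predictions.values()):
        raise ValueError("each contestant needs one prediction per held-out point")
    totals = {
        name: sum(brier_score(p, y) for p, y in zip(probs, outcomes))
        for name, probs in predictions.items()
    }
    # max() returns the first maximal key, so ties go to the contestant listed first
    return max(totals, key=totals.get)


if __name__ == "__main__":
    outcomes = [1, 0, 1, 1]  # hypothetical held-out labels
    predictions = {
        "alice": [0.8, 0.2, 0.7, 0.9],  # hedged forecasts
        "bob": [1.0, 0.0, 0.0, 1.0],    # all-or-nothing forecasts
    }
    print(select_winner(predictions, outcomes))  # prints: alice
```

In this toy run the hedged forecasts win. The abstract's point is that, because only the top total is rewarded, this rule can still make it worthwhile for a contestant to misreport, for example by submitting more extreme predictions than they actually believe.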

11 April 2025
15:15-16:15

Victoria Stodden
University of Southern California

Title	Leveraging AI in Scientific Research: Transparency, Reproducibility, and Trust
Speaker, Affiliation	Victoria Stodden, University of Southern California
Date, Time	11 April 2025, 15:15-16:15
Location	HG G 19.1
Abstract	Over the last 10 years, the colossal cloud infrastructure investments behind near-ubiquitous global mobile technologies have trickled down to scientific research in the form of cloud compute and storage, I/O tools, and data analysis and modeling frameworks, which in turn have generated broad and growing communities of users and supporters. Arguably, the recent success of Large Language Models was catalyzed by two resulting technological innovations: 1) open and accessible massive data, and 2) re-executable discovery pipelines for model estimation and prediction. These changes are deeply disruptive to the research community because they open new paths to knowledge creation that were previously inaccessible and largely unknown culturally. The scientific community now faces the challenge of responding to the changes in research modalities that these innovations bring. Research is now conducted as an “Olympics” of benchmarked competitions between Machine Learning models, fueled by the opaque results of Large Language Models, access to massive data, and the redeployment of complex scientific discovery workflows. In this seminar I provide a roadmap of the challenges, and of the responses by various stakeholders in the research community, to ensure that scientific results remain reliable and reproducible and retain the trust of broader society.

8 May 2025
15:15-16:15

Toby Dylan Hocking
Université de Sherbrooke, Canada

Title	Using and contributing to the data.table package for efficient big data analysis
Speaker, Affiliation	Toby Dylan Hocking, Université de Sherbrooke, Canada
Date, Time	8 May 2025, 15:15-16:15
Location	HG G 43
Abstract	data.table is an R package, backed by C code, that is one of the most efficient open-source in-memory database packages available today. First released to CRAN by Matt Dowle in 2006, it continues to grow in popularity, and over 1500 other CRAN packages now depend on it. This talk will cover basic and advanced data manipulation topics and end with a discussion of how you can contribute to data.table.

16 May 2025
15:15-16:00

Stephan Mandt
University of California, Irvine

Title	Scientific Inference with Diffusion Generative Models
Speaker, Affiliation	Stephan Mandt, University of California, Irvine
Date, Time	16 May 2025, 15:15-16:00
Location	HG E 33.5
Abstract	Diffusion models have transformed generative modeling in various domains such as vision and language. But can they serve as tools for scientific inference? In this talk, I present a perspective that reframes diffusion models as Bayesian solvers for scientific inverse problems involving a noisy measurement process, with applications ranging from climate modeling to astrophysical imaging. Scientific use cases demand more than photorealism: they require calibrated uncertainty, distributional fidelity, efficient conditional sampling, and the ability to model heavy-tailed data. I will highlight four recent advances developed to meet these needs:
1. Variational Control, an improved framework for conditional generation in pretrained diffusion models (ICML ’25)
2. Heavy-Tailed Diffusion Models, enabling accurate modeling of sparse and extreme-valued scientific data (ICLR ’25)
3. Conjugate Integrators, enabling fast conditional sampling without retraining (NeurIPS ’24)
4. Generative Uncertainty for Diffusion Models, for assessing and exploiting epistemic uncertainty in data generation tasks (UAI ’25)
Bio	Stephan Mandt is an Associate Professor of Computer Science and Statistics at the University of California, Irvine. His research contributes to the foundations and applications of generative AI, with a focus on generative modeling of 2D, 3D, and sequential data, compression, resource-efficient learning, inference algorithms, and AI-driven scientific discovery. He is a Chan Zuckerberg Investigator and AI Resident and has received the NSF CAREER Award, the UCI ICS Mid-Career Excellence in Research Award, and a Kavli Fellowship. Before UCI, he led the machine learning group at Disney Research and held postdoctoral positions at Princeton and Columbia. Stephan frequently serves as a Senior Area Chair for NeurIPS, ICML, and ICLR and was most recently Program Chair for AISTATS 2024 and General Chair for AISTATS 2025.
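
For readers new to the inverse-problem framing, a common generic formulation (assumed here for illustration, not necessarily the one used in the talk) models a measurement y as a forward operator applied to an unknown signal x plus Gaussian noise. Bayes' rule then splits the conditional score that a diffusion sampler needs into the prior score learned by a pretrained model and a measurement-likelihood term. The symbols \mathcal{A}, \sigma, and the diffusion time t are illustrative.

```latex
% Assumed measurement model (illustrative, not taken from the talk)
y = \mathcal{A}(x) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I)

% Bayesian posterior over the unknown signal
p(x \mid y) \propto p(y \mid x)\, p(x)

% Conditional score at diffusion time t; \nabla_{x_t} \log p(y) = 0
\nabla_{x_t} \log p_t(x_t \mid y)
  = \underbrace{\nabla_{x_t} \log p_t(x_t)}_{\text{pretrained diffusion score}}
  + \underbrace{\nabla_{x_t} \log p_t(y \mid x_t)}_{\text{measurement likelihood}}
```

The term \nabla_{x_t} \log p(y) drops out because p(y) does not depend on x_t. The likelihood p_t(y \mid x_t) is generally intractable for t > 0, so conditional-sampling methods differ mainly in how they approximate it.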

Note: You can also subscribe to the iCal/ics calendar.

Archive: AS 25 SS 25 AS 24 SS 24 AS 23 SS 23 AS 22 SS 22 AS 21 SS 20 AS 19 SS 19 AS 18 SS 18 AS 17 SS 17 AS 16 SS 16 AS 15 SS 15 AS 14 SS 14 AS 13 SS 13 AS 12 SS 12 AS 11 SS 11 AS 10 SS 10 AS 09