The 7th Mining and Learning from Time Series Workshop

Introduction

Time series data are ubiquitous. In domains as diverse as finance, retail, entertainment, transportation and health care, we observe a fundamental shift away from parsimonious, infrequent measurement to nearly continuous monitoring and recording. Recent advances in diverse sensing technologies, ranging from remote sensors to wearables and social sensing, are generating a rapid growth in the size and complexity of time series archives. Thus, although time series analysis has been studied extensively, its importance only continues to grow. What is more, modern time series data pose significant challenges to existing techniques (e.g., irregular sampling in hospital records and spatiotemporal structure in climate data). Finally, time series mining research is challenging and rewarding because it bridges a variety of disciplines and demands interdisciplinary solutions. Now is the time to discuss the next generation of temporal mining algorithms. The focus of the MiLeTS workshop is to synergize the research in this area and discuss both new and open problems in time series analysis and mining. The solutions to these problems may be algorithmic, theoretical, statistical, or systems-based in nature. Further, MiLeTS emphasizes applications to high-impact or relatively new domains, including but not limited to biology, health and medicine, climate and weather, road traffic, astronomy, and energy.
The MiLeTS workshop will discuss a broad variety of topics related to time series, including:

Time series pattern mining and detection, representation, searching and indexing, classification, clustering, prediction, forecasting, and rule mining.
Time series with special structure: spatiotemporal (e.g., traffic speeds at different locations), relational (e.g., patients with similar diseases), hierarchical, etc.
Time series with sparse or irregular sampling, missing values at and not at random, and special types of measurement noise or bias.
Time series that are multivariate, high-dimensional, heterogeneous, etc., or that possess other atypical properties.
Time series analysis using less traditional approaches, such as deep learning and subspace clustering.
Privacy preserving time series mining and learning.
Online, high-speed learning and mining from streaming time series.
Uncertain time series mining.
Applications to high impact or relatively new time series domains, such as health and medicine, road traffic, and air quality.
New, open, or unsolved problems in time series analysis and mining.

Schedule

8:00 AM - 5:00 PM, August 15th, 2021, Pacific Time

MORNING SESSION (Pacific Time)

08:00-08:10 Opening remarks

08:10-09:10 Keynote Talk

Deep Learning Data-driven approaches for Epidemic Forecasting, Aditya Prakash

09:10-10:30 Contributed Talks

RLAD: Time Series Anomaly Detection through Reinforcement Learning and Active Learning. Tong Wu and Jorge Ortiz
Personalized and Environment-Aware Battery Prediction for Electric Vehicles. Dongyue Li, Guangyu Li, Bo Jiang, Zhengping Che and Yan Liu
Exploring Generative Data Augmentation in Multivariate Time Series Forecasting: Opportunities and Challenges. Ankur Debnath, Govind Waghmare, Hardik Wadhwa, Siddhartha Asthana and Ankur Arora
Short Text Clustering in Continuous Time Using Stacked Dirichlet-Hawkes Process with Inverse Cluster Frequency Prior. Avirup Saha and Balaji Ganesan
TE-ESN: Time Encoding Echo State Network for Prediction Based on Irregularly Sampled Time Series Data. Chenxi Sun, Shenda Hong, Moxian Song and Hongyan Li
Learning Robust Representations using a Change Point Framework. Ame Osotsi and Qunhua Li
Low-Rank Autoregressive Tensor Completion for Spatiotemporal Traffic Data Imputation. Xinyu Chen, Mengying Lei, Nicolas Saunier and Lijun Sun
Improving COVID-19 Forecasting using eXogenous Variables. Mohammadhossein Toutiaee, Xiaochuan Li, Yogesh Chaudhari, Shophine Sivaraja, Aishwarya Venkataraj, Indrajeet Javeri, Yuan Ke, Ismailcem Arpinar, Nicole Lazar and John Miller

10:30-11:00 Coffee Break

11:00-12:00 Keynote Talk

Modeling raw, messy time series data with latent stochastic differential equations, David Duvenaud

12:00-13:00 Lunch Break

AFTERNOON SESSION (Pacific Time)

13:00-14:00 Keynote Talk

Irrational Exuberance: Why we should not believe 99% of papers on Time Series Anomaly Detection, Eamonn Keogh

14:00-14:30 Poster Spotlights

14:30-16:00 Coffee Break & Poster Session

16:00-16:50 Keynote Talk

On the Interface between Optimal Transport and Time Series, Marco Cuturi

16:50-17:00 Concluding Remarks

Speakers

B. Aditya Prakash

Associate Professor
Georgia Institute of Technology

Deep Learning Data-driven approaches for Epidemic Forecasting

The devastating impact of the currently unfolding global COVID-19 pandemic and those of the Zika, SARS, MERS, and Ebola outbreaks over the past decade has sharply illustrated our enormous vulnerability to emerging infectious diseases. There are many questions that are being studied by epidemiologists and public officials during these outbreaks. Building on our prior work, we have been pursuing multiple activities amidst the COVID-19 pandemic in the United States, collaborating with partners in academia, industry and public health agencies, from award-winning work on helping forecast pandemic trajectories (also shown on the CDC website) to designing more localized and less burdensome campus interventions. In this talk, I will briefly give an overview of our recent research in designing well calibrated, robust, accurate and interpretable deep learning models for epidemic forecasting, illustrating the important role data science and machine learning have to play for pandemic prevention and prediction.

Bio
B. Aditya Prakash is an Associate Professor in the College of Computing at the Georgia Institute of Technology (“Georgia Tech”). He received a Ph.D. from the Computer Science Department at Carnegie Mellon University in 2012, and a B.Tech (in CS) from the Indian Institute of Technology (IIT) -- Bombay in 2007. He has published one book and more than 80 papers in major venues, holds two U.S. patents, and has given several tutorials at leading conferences. His work has also received multiple best-of-conference, best paper and travel awards. His research interests include Data Science, Machine Learning and AI, with emphasis on big-data problems in large real-world networks and time-series, with applications to computational epidemiology/public health, urban computing, security and the Web. Tools developed by his group have been in use in many places including ORNL, Walmart and Facebook. He has received several awards, such as a Facebook Faculty Award and the NSF CAREER award, and was named one of ‘AI Ten to Watch’ by IEEE. His work has also won awards in multiple data science challenges (e.g., the Facebook COVID19 Symptom Challenge) and been highlighted by several media outlets and the popular press, such as FiveThirtyEight.com. He is also a member of the infectious diseases modeling MIDAS network and core faculty at the Center for Machine Learning (ML@GT) and the Institute for Data Engineering and Science (IDEaS) at Georgia Tech. Aditya’s Twitter handle is @badityap.

David Duvenaud

Assistant Professor
University of Toronto

Modeling raw, messy time series data with latent stochastic differential equations

Much real-world data is sampled at irregular intervals, but most time series models require regularly-sampled data. Continuous-time models address this problem, but until now only deterministic models (based on ordinary differential equations) or linear-Gaussian models were efficiently trainable with millions of parameters. We construct a scalable algorithm for computing gradients through samples from stochastic differential equations (SDEs), and for gradient-based stochastic variational inference in function space, all with the use of adaptive black-box SDE solvers. This allows us to fit a new family of richly-parameterized distributions over time series, in which neural networks can parameterize both dynamics and likelihoods. We demonstrate these latent SDEs on motion capture data, and provide an open-source PyTorch library for fitting large SDE models.

Bio
David Duvenaud is an assistant professor at the University of Toronto. His research focuses on constructing deep probabilistic models to help predict, explain and design things. Examples include neural ODEs (a kind of continuous-depth neural network), automatic chemical design using generative models, gradient-based hyperparameter tuning, structured latent-variable models for modeling video, and convolutional networks on graphs. Previously, he was a postdoc in the Harvard Intelligent Probabilistic Systems group with Ryan Adams. He did his Ph.D. at the University of Cambridge, where his advisors were Carl Rasmussen and Zoubin Ghahramani. His M.Sc. advisor was Kevin Murphy at the University of British Columbia. He spent a summer working on probabilistic numerics at the Max Planck Institute for Intelligent Systems, and the two summers before that at Google Research, doing machine vision. He co-founded Invenia, an energy forecasting and trading firm where he still consults. He is also a founding member of the Vector Institute.

Eamonn Keogh

Distinguished Professor
University of California - Riverside

Irrational Exuberance: Why we should not believe 99% of papers on Time Series Anomaly Detection

In the last five years there has been an explosion of papers on time series anomaly detection (TSAD) appearing in all the top conferences. In this talk I will make a surprising claim: almost all such papers suffer from various flaws, including testing on deeply flawed datasets, use of inappropriate measures of success, non-reproducible experiments, unjustified complexity, and ignoring competitive decades-old methods. I will demonstrate that because of these flaws, we should not believe the claims of most papers on time series anomaly detection. Rather than a completely pessimistic talk, I will take this opportunity to release a new set of 250 benchmark datasets and guidelines that will go some way to mitigating the problems.

Bio
Eamonn Keogh is a Distinguished Professor of Computer Science at the University of California Riverside. With his students, he invented many of the most commonly used time series primitives, including Shapelets, Discords, Motifs, SAX, PAA, LB_keogh and the Matrix Profile. He has won at least one best paper award in the major data mining conferences (SIGMOD, SIGKDD, ICDM, SDM), and his h-index of 100 reflects the significant number of citations his work has attracted.

Marco Cuturi

Professor
CREST - ENSAE, Institut Polytechnique de Paris, Google Brain

On the Interface between Optimal Transport and Time Series

I will start this talk with a short intro to optimal transport theory, and mention a few interesting areas at the intersection of optimal transport and time series. I will describe in more detail the JKO proximal gradient descent scheme and show how it can be used to model time series of snapshots of populations (joint work with Charlotte Bunne, Laetitia Papaxanthos and Andreas Krause), as well as the Wasserstein DTW discrepancy for such time series.

Bio
Marco Cuturi joined Google Brain, in Paris, in October 2018. He graduated from ENSAE (2001), ENS Cachan (Master MVA, 2002) and holds a PhD in applied maths obtained in 2005 at Ecole des Mines de Paris. He worked as a post-doctoral researcher at the Institute of Statistical Mathematics, Tokyo, between 11/2005 and 03/2007. He worked in the financial industry between 04/2007 and 09/2008. After working at the ORFE department of Princeton University between 02/2009 and 08/2010 as a lecturer, he was at the Graduate School of Informatics of Kyoto University between 09/2010 and 09/2016 as a tenured associate professor. He then joined ENSAE, the French national school for statistics and economics, in 09/2016, where he still teaches. His recent proposal to solve optimal transport using an entropic regularization has re-ignited interest in optimal transport and Wasserstein distances in the machine learning community. His work has recently focused on applying that loss function to problems involving probability distributions, e.g. topic models / dictionary learning for text and images, parametric inference for generative models, regression with a Wasserstein loss and probabilistic embeddings for words.

Accepted Posters

Visual Time Series Forecasting: An Image-driven Approach. Naftali Cohen, Srijan Sood, Zhen Zeng, Tucker Balch and Manuela Veloso.

Multi-Window-Finder: Domain Agnostic Window Size for Time Series Data. Shima Imani, Alireza Abdoli, Ali Beyram, Azam Imani and Eamonn Keogh.

Detection and clustering of lead-lag networks for multivariate time series with an application to financial markets. Stefanos Bennett, Mihai Cucuringu and Gesine Reinert.

Temporal Progression: A case study in Porcine Survivability through Hemostatic Nanoparticles. Chhaya K, Nuzhat Maisha, Leasha J Schaub, Jacob Glaser, Erin Lavik and Vandana Janeja.

Recurrent Attentive Kernel Learning for Shark Activity Recognition. Matthew Buchholz, Wenlu Zhang, Emily N. Meese, Yu Yang, Christopher G. Lowe and Hen-Geul Yeh.

HIVE-COTE 2.0: a new meta ensemble for time series classification. Matthew Middlehurst, James Large, Michael Flynn, Jason Lines, Aaron Bostrom and Anthony Bagnall.

Aggregate Learning for Mixed-Frequency Data. Daisuke Moriwaki, Takamichi Toda and Kazuhiro Ota.

Multivariate time series forecasting with diffusion kernels: Freeway traffic prediction. Semin Kwak, Nikolasa Geroliminis and Pascal Frossard.

Time Series Features for Classification of Contaminated Cell Cultures. Laura Tupper, Charles Keese and David Matteson.

Actionable Insights in Multivariate Time-series for Urban Analytics. Anika Tabassum, Supriya Chinthavali, Varisara Tansakul and B. Aditya Prakash.

Forecasting COVID-19 Counts At A Single Hospital: A Hierarchical Bayesian Approach. Alexandra Lee, Panagiotis Lymperopoulos, Joshua T. Cohen, John B. Wong and Michael Hughes.

Call for Papers

Submissions should follow the SIGKDD formatting requirements (unless otherwise stated) and will be evaluated using the SIGKDD Research Track evaluation criteria. Preference will be given to papers that are reproducible, and authors are encouraged to share their data and code publicly whenever possible. Submissions are strongly recommended to be no more than 4 pages, excluding references or supplementary materials (all in a single PDF). The appropriateness of using additional pages over the recommended length will be judged by reviewers. All submissions must be in PDF format using the workshop template (LaTeX, Word). Submissions will be managed via the MiLeTS 2021 EasyChair website: https://easychair.org/conferences/?conf=milets2021.

Note on open problem submissions: In order to promote new and innovative research on time series, we plan to accept a small number of high quality manuscripts describing open problems in time series analysis and mining. Such papers should provide a clear, detailed description and analysis of a new or open problem that poses a significant challenge to existing techniques, as well as a thorough empirical investigation demonstrating that current methods are insufficient.

COVID-19 Time Series Analysis Special Track: The COVID-19 pandemic is impacting almost everyone worldwide and is expected to have life-altering short and long-term effects. There are many potential applications of time series analysis and mining that can contribute to understanding of this pandemic. We encourage submission of high quality manuscripts describing original problems, time series datasets, and novel solutions for time series analysis and forecasting of COVID-19.

The review process is single-round and double-blind (submission files have to be anonymized). Concurrent submissions to other journals and conferences are acceptable. Accepted papers will be presented as posters during the workshop and listed on the website (non-archival/without proceedings). In addition, a small number of accepted papers will be selected to be presented as contributed talks.

Any questions may be directed to the workshop e-mail address: kdd.milets@gmail.com.

Workshop Organizers & Steering Committee

Sanjay Purushotham

YaGuang Li

Zhengping Che

Eamonn Keogh

Yan Liu

Abdullah Mueen

Program Committee