Featured Publications

Swapnil Mishra, Seth Flaxman, Tresnia Berah, Harrison Zhu, Mikko Pakkanen, Samir Bhatt

January 2022 Statistics and Computing

πVAE: a stochastic process prior for Bayesian deep learning with MCMC

Stochastic processes provide a mathematically elegant way to model complex data. In theory, they provide flexible priors over function classes that can encode a wide range of interesting assumptions. However, in practice efficient inference by optimisation or marginalisation is difficult, a problem further exacerbated by big data and high-dimensional input spaces. We propose a novel variational autoencoder (VAE) called the prior encoding variational autoencoder (πVAE). πVAE is a new continuous stochastic process. We use πVAE to learn low-dimensional embeddings of function classes by combining a trainable feature mapping with a generative model using a VAE. We show that our framework can accurately learn expressive function classes such as Gaussian processes, but also properties of functions such as their integrals. For popular tasks, such as spatial interpolation, πVAE achieves state-of-the-art performance both in terms of accuracy and computational efficiency. Perhaps most usefully, we demonstrate an elegant and scalable means of performing fully Bayesian inference for stochastic processes within probabilistic programming languages such as Stan.


Swapnil Mishra*, Nuno R. Faria*, Thomas A. Mellan*, Charles Whittaker*, Ingra M. Claro*, Darlan da S. Candido*, others, Oliver G. Pybus‡, Seth Flaxman‡, Samir Bhatt‡, Ester C. Sabino‡

April 2021 Science (2021)

Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus, Brazil

Cases of SARS-CoV-2 infection in Manaus, Brazil, resurged in late 2020, despite previously high levels of infection. Genome sequencing of viruses sampled in Manaus between November 2020 and January 2021 revealed the emergence and circulation of a novel SARS-CoV-2 variant of concern. Lineage P.1 acquired 17 mutations, including a trio in the spike protein (K417T, E484K and N501Y) associated with increased binding to the human ACE2 receptor. Molecular clock analysis shows that P.1 emergence occurred around mid-November 2020 and was preceded by a period of faster molecular evolution. Using a two-category dynamical model that integrates genomic and mortality data, we estimate that P.1 may be 1.7–2.4-fold more transmissible, and that previous (non-P.1) infection provides 54–79% of the protection against infection with P.1 that it provides against non-P.1 lineages. Enhanced global genomic surveillance of variants of concern, which may exhibit increased transmissibility and/or immune evasion, is critical to accelerate pandemic responsiveness.


Swapnil Mishra*, Erik Volz*, Meera Chand*, Jeffrey C. Barrett*, Robert Johnson*, Lily Geidelberg, Wes R. Hinsley, Daniel J. Laydon, Gavin Dabrera, Áine O’Toole, Roberto Amato, Manon Ragonnet-Cronin, Ian Harrison, Ben Jackson, Cristina V. Ariani, Olivia Boyd, Nicholas J. Loman, John T. McCrone, Sónia Gonçalves, David Jorgensen, Richard Myers, Verity Hill, David K. Jackson, Katy Gaythorpe, Natalie Groves, John Sillitoe, Dominic P. Kwiatkowski, The COVID-19 Genomics UK (COG-UK) consortium, Seth Flaxman, Oliver Ratmann, Samir Bhatt, Susan Hopkins, Axel Gandy*, Andrew Rambaut*, Neil M. Ferguson*

March 2021 Nature (2021)

Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England

The SARS-CoV-2 lineage B.1.1.7, designated a Variant of Concern 202012/01 (VOC) by Public Health England [1], originated in the UK in late Summer to early Autumn 2020 [2]. Whole genome SARS-CoV-2 sequence data collected from community-based diagnostic testing shows an unprecedentedly rapid expansion of the B.1.1.7 lineage during Autumn 2020, suggesting a selective advantage. We find that changes in VOC frequency inferred from genetic data correspond closely to changes inferred by S-gene target failures (SGTF) in community-based diagnostic PCR testing. Analysis of trends in SGTF and non-SGTF case numbers in local areas across England shows that the VOC has higher transmissibility than non-VOC lineages, even if the VOC has a different latent period or generation time. The SGTF data indicate a transient shift in the age composition of reported cases, with a larger share of under 20 year olds among reported VOC than non-VOC cases. Time-varying reproduction numbers for the VOC and cocirculating lineages were estimated using SGTF and genomic data. The best supported models did not indicate a substantial difference in VOC transmissibility among different age groups. There is a consensus among all analyses that the VOC has a substantial transmission advantage with a 50% to 100% higher reproduction number.


Swapnil Mishra*, Seth Flaxman*, Axel Gandy*, H. Juliette T. Unwin, Thomas A. Mellan, Helen Coupland, Charles Whittaker, Harrison Zhu, Tresnia Berah, Jeffrey W. Eaton, Mélodie Monod, Imperial College COVID-19 Response Team, Azra C. Ghani, Christl A. Donnelly, Steven M. Riley, Michaela A. C. Vollmer, Neil M. Ferguson, Lucy C. Okell, Samir Bhatt*

June 2020 Nature (2020)

Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe

Following the emergence of a novel coronavirus (SARS-CoV-2) and its spread outside of China, Europe has experienced large epidemics. In response, many European countries have implemented unprecedented non-pharmaceutical interventions such as closure of schools and national lockdowns. We study the impact of major interventions across 11 European countries for the period from the start of COVID-19 until the 4th of May 2020 when lockdowns started to be lifted. Our model calculates backwards from observed deaths to estimate transmission that occurred several weeks prior, allowing for the time lag between infection and death. We use partial pooling of information between countries with both individual and shared effects on the reproduction number. Pooling allows more information to be used, helps overcome data idiosyncrasies, and enables more timely estimates. Our model relies on fixed estimates of some epidemiological parameters such as the infection fatality rate, does not include importation or subnational variation and assumes that changes in the reproduction number are an immediate response to interventions rather than gradual changes in behavior. Amidst the ongoing pandemic, we rely on death data that is incomplete, with systematic biases in reporting, and subject to future consolidation. We estimate that, for all the countries we consider, current interventions have been sufficient to drive the reproduction number Rt below 1 (probability Rt < 1.0 is 99.9%) and achieve epidemic control. We estimate that, across all 11 countries, between 12 and 15 million individuals have been infected with SARS-CoV-2 up to 4th May, representing between 3.2% and 4.0% of the population. Our results show that major non-pharmaceutical interventions and lockdown in particular have had a large effect on reducing transmission. Continued intervention should be considered to keep transmission of SARS-CoV-2 under control.


Swapnil Mishra, Marian-Andrei Rizoiu, Lexing Xie

April 2018 ICWSM

Modeling Popularity in Asynchronous Social Media Streams with Recurrent Neural Networks

Understanding and predicting the popularity of online items is an important open problem in social media analysis. Considerable progress has been made recently in data-driven predictions, and in linking popularity to external promotions. However, the existing methods typically focus on a single source of external influence, whereas for many types of online content such as YouTube videos or news articles, attention is driven by multiple heterogeneous sources simultaneously, e.g. microblogs or traditional media coverage. Here, we propose RNN-MAS, a recurrent neural network for modeling asynchronous streams. It is a sequence generator that connects multiple streams of different granularity via joint inference. We show RNN-MAS not only to outperform the current state-of-the-art YouTube popularity prediction system by 17%, but also to capture complex dynamics, such as seasonal trends of unseen influence. We define two new metrics: the promotion score quantifies the gain in popularity from one unit of promotion for a YouTube video; the loudness level captures the effects of a particular user tweeting about the video. We use the loudness level to compare the effects of a video being promoted by a single highly-followed user (in the top 1% most followed users) against being promoted by a group of mid-followed users. We find that results depend on the type of content being promoted: superusers are more successful in promoting Howto and Gaming videos, whereas the cohort of regular users is more influential for Activism videos. This work provides more accurate and explainable popularity predictions, as well as computational tools for content producers and marketers to allocate resources for promotion campaigns.


Swapnil Mishra, Marian-Andrei Rizoiu, Lexing Xie

October 2016 CIKM

Feature Driven and Point Process Approaches for Popularity Prediction

Predicting popularity, or the total volume of information outbreaks, is an important subproblem for understanding collective behavior in networks. Each of the two main types of recent approaches to the problem, feature-driven and generative models, has desirable qualities and clear limitations. This paper bridges the gap between these solutions with a new hybrid approach and a new performance benchmark. We model each social cascade with a marked Hawkes self-exciting point process, and estimate the content virality, memory decay, and user influence. We then learn a predictive layer for popularity prediction using a collection of cascade histories. To our surprise, the Hawkes process with a predictive overlay outperforms recent feature-driven and generative approaches on existing tweet data [44] and a new public benchmark on news tweets. We also found that a basic set of user features and event time summary statistics performs competitively in both classification and regression tasks, and that adding point process information to the feature set further improves predictions. From these observations, we argue that future work on popularity prediction should compare across feature-driven and generative modeling approaches in both classification and regression tasks.


Wray L. Buntine, Swapnil Mishra

August 2014 KDD

Experiments with non-parametric topic models

In topic modelling, various alternative priors have been developed, for instance asymmetric and symmetric priors for the document-topic and topic-word matrices respectively, the hierarchical Dirichlet process prior for the document-topic matrix and the hierarchical Pitman-Yor process prior for the topic-word matrix. For information retrieval, language models exhibiting word burstiness are important. Indeed, this burstiness effect has been shown to help topic models as well, and this requires additional word probability vectors for each document. Here we show how to combine these ideas to develop high-performing non-parametric topic models exhibiting burstiness based on standard Gibbs sampling. Experiments are done to explore the behavior of the models under different conditions and to compare the algorithms with previously published ones. The full non-parametric topic models with burstiness are only a small factor slower than standard Gibbs sampling for LDA and require double the memory, making them very competitive. We look at the comparative behaviour of different models and present some experimental insights.


Swapnil Mishra

Assistant Professor

National University of Singapore

Grants

Digital Technology Development Award, Wellcome Trust

Imperial COVID-19 Response Fund

Microsoft AI for Health program

Amazon AWS Compute Grant
