  • A graph similarity for deep learning
  • An Unsupervised Information-Theoretic Perceptual Quality Metric
  • Self-Supervised MultiModal Versatile Networks
  • Benchmarking Deep Inverse Models over time, and the Neural-Adjoint method
  • Off-Policy Evaluation and Learning for External Validity under a Covariate Shift
  • Neural Methods for Point-wise Dependency Estimation
  • Fast and Flexible Temporal Point Processes with Triangular Maps
  • Backpropagating Linearly Improves Transferability of Adversarial Examples
  • PyGlove: Symbolic Programming for Automated Machine Learning
  • Fourier Sparse Leverage Scores and Approximate Kernel Learning
  • Improved Algorithms for Online Submodular Maximization via First-order Regret Bounds
  • Synbols: Probing Learning Algorithms with Synthetic Datasets
  • Adversarially Robust Streaming Algorithms via Differential Privacy
  • Trading Personalization for Accuracy: Data Debugging in Collaborative Filtering
  • Cascaded Text Generation with Markov Transformers
  • Improving Local Identifiability in Probabilistic Box Embeddings
  • Permute-and-Flip: A new mechanism for differentially private selection
  • Deep reconstruction of strange attractors from time series
  • Reciprocal Adversarial Learning via Characteristic Functions
  • Statistical Guarantees of Distributed Nearest Neighbor Classification
  • Stein Self-Repulsive Dynamics: Benefits From Past Samples
  • The Statistical Complexity of Early-Stopped Mirror Descent
  • Algorithmic recourse under imperfect causal knowledge: a probabilistic approach
  • Quantitative Propagation of Chaos for SGD in Wide Neural Networks
  • A Causal View on Robustness of Neural Networks
  • Minimax Classification with 0-1 Loss and Performance Guarantees
  • How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization
  • Coresets for Regressions with Panel Data
  • Learning Composable Energy Surrogates for PDE Order Reduction
  • Efficient Contextual Bandits with Continuous Actions
  • Achieving Equalized Odds by Resampling Sensitive Attributes
  • Multi-Robot Collision Avoidance under Uncertainty with Probabilistic Safety Barrier Certificates
  • Hard Shape-Constrained Kernel Machines
  • A Closer Look at the Training Strategy for Modern Meta-Learning
  • On the Value of Out-of-Distribution Testing: An Example of Goodhart’s Law
  • Generalised Bayesian Filtering via Sequential Monte Carlo
  • Deterministic Approximation for Submodular Maximization over a Matroid in Nearly Linear Time
  • Flows for simultaneous manifold learning and density estimation
  • Simultaneous Preference and Metric Learning from Paired Comparisons
  • Efficient Variational Inference for Sparse Deep Learning with Theoretical Guarantee
  • Learning Manifold Implicitly via Explicit Heat-Kernel Learning
  • Deep Relational Topic Modeling via Graph Poisson Gamma Belief Network
  • One-bit Supervision for Image Classification
  • What is being transferred in transfer learning?
  • Submodular Maximization Through Barrier Functions
  • Neural Networks with Recurrent Generative Feedback
  • Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Prediction
  • Exploiting weakly supervised visual patterns to learn from partial annotations
  • Improving Inference for Neural Image Compression
  • Neuron Merging: Compensating for Pruned Neurons
  • FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
  • Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing
  • Towards Playing Full MOBA Games with Deep Reinforcement Learning
  • Rankmax: An Adaptive Projection Alternative to the Softmax Function
  • Online Agnostic Boosting via Regret Minimization
  • Causal Intervention for Weakly-Supervised Semantic Segmentation
  • Belief Propagation Neural Networks
  • Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality
  • Post-training Iterative Hierarchical Data Augmentation for Deep Networks
  • Debugging Tests for Model Explanations
  • Robust compressed sensing using generative models
  • Fairness without Demographics through Adversarially Reweighted Learning
  • Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model
  • Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian
  • The route to chaos in routing games: When is price of anarchy too optimistic?
  • Online Algorithm for Unsupervised Sequential Selection with Contextual Information
  • Adapting Neural Architectures Between Domains
  • What went wrong and when? Instance-wise feature importance for time-series black-box models
  • Towards Better Generalization of Adaptive Gradient Methods
  • Learning Guidance Rewards with Trajectory-space Smoothing
  • Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization
  • Tree! I am no Tree! I am a low dimensional Hyperbolic Embedding
  • Deep Structural Causal Models for Tractable Counterfactual Inference
  • Convolutional Generation of Textured 3D Meshes
  • A Statistical Framework for Low-bitwidth Training of Deep Neural Networks
  • Better Set Representations For Relational Reasoning
  • AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep Learning
  • A Combinatorial Perspective on Transfer Learning
  • Hardness of Learning Neural Networks with Natural Weights
  • Higher-Order Spectral Clustering of Directed Graphs
  • Primal-Dual Mesh Convolutional Neural Networks
  • The Advantage of Conditional Meta-Learning for Biased Regularization and Fine Tuning
  • Watch out! Motion is Blurring the Vision of Your Deep Neural Networks
  • Sinkhorn Barycenter via Functional Gradient Descent
  • Coresets for Near-Convex Functions
  • Bayesian Deep Ensembles via the Neural Tangent Kernel
  • Improved Schemes for Episodic Memory-based Lifelong Learning
  • Adaptive Sampling for Stochastic Risk-Averse Learning
  • Deep Wiener Deconvolution: Wiener Meets Deep Learning for Image Deblurring
  • Discovering Reinforcement Learning Algorithms
  • Taming Discrete Integration via the Boon of Dimensionality
  • Blind Video Temporal Consistency via Deep Video Prior
  • Simplify and Robustify Negative Sampling for Implicit Collaborative Filtering
  • Model Selection for Production System via Automated Online Experiments
  • On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems
  • Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond
  • Adaptation Properties Allow Identification of Optimized Neural Codes
  • Global Convergence and Variance Reduction for a Class of Nonconvex-Nonconcave Minimax Problems
  • Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity
  • Conservative Q-Learning for Offline Reinforcement Learning
  • Online Influence Maximization under Linear Threshold Model
  • Ensembling geophysical models with Bayesian Neural Networks
  • Delving into the Cyclic Mechanism in Semi-supervised Video Object Segmentation
  • Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability
  • Understanding Deep Architecture with Reasoning Layer
  • Planning in Markov Decision Processes with Gap-Dependent Sample Complexity
  • Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration
  • Detection as Regression: Certified Object Detection with Median Smoothing
  • Contextual Reserve Price Optimization in Auctions via Mixed Integer Programming
  • ExpandNets: Linear Over-parameterization to Train Compact Convolutional Networks
  • FleXOR: Trainable Fractional Quantization
  • The Implications of Local Correlation on Learning Some Deep Functions
  • Learning to search efficiently for causally near-optimal treatments
  • A Game Theoretic Analysis of Additive Adversarial Attacks and Defenses
  • Posterior Network: Uncertainty Estimation without OOD Samples via Density-Based Pseudo-Counts
  • Recurrent Quantum Neural Networks
  • No-Regret Learning and Mixed Nash Equilibria: They Do Not Mix
  • A Unifying View of Optimism in Episodic Reinforcement Learning
  • Continuous Submodular Maximization: Beyond DR-Submodularity
  • An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits
  • Assessing SATNet’s Ability to Solve the Symbol Grounding Problem
  • A Bayesian Nonparametrics View into Deep Representations
  • On the Similarity between the Laplace and Neural Tangent Kernels
  • A causal view of compositional zero-shot recognition
  • HiPPO: Recurrent Memory with Optimal Polynomial Projections
  • Auto Learning Attention
  • CASTLE: Regularization via Auxiliary Causal Graph Discovery
  • Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect
  • Explainable Voting
  • Deep Archimedean Copulas
  • Re-Examining Linear Embeddings for High-Dimensional Bayesian Optimization
  • UnModNet: Learning to Unwrap a Modulo Image for High Dynamic Range Imaging
  • Thunder: a Fast Coordinate Selection Solver for Sparse Learning
  • Neural Networks Fail to Learn Periodic Functions and How to Fix It
  • Distribution Matching for Crowd Counting
  • Correspondence learning via linearly-invariant embedding
  • Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning
  • On Adaptive Attacks to Adversarial Example Defenses
  • Sinkhorn Natural Gradient for Generative Models
  • Online Sinkhorn: Optimal Transport distances from sample streams
  • Ultrahyperbolic Representation Learning
  • Locally-Adaptive Nonparametric Online Learning
  • Compositional Generalization via Neural-Symbolic Stack Machines
  • Graphon Neural Networks and the Transferability of Graph Neural Networks
  • Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms
  • Gamma-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction
  • Deep Transformers with Latent Depth
  • Neural Mesh Flow: 3D Manifold Mesh Generation via Diffeomorphic Flows
  • Statistical control for spatio-temporal MEG/EEG source imaging with desparsified multi-task Lasso
  • A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees
  • Efficient Exact Verification of Binarized Neural Networks
  • Ultra-Low Precision 4-bit Training of Deep Neural Networks
  • Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS
  • On Numerosity of Deep Neural Networks
  • Outlier Robust Mean Estimation with Subgaussian Rates via Stability
  • Self-Supervised Relationship Probing
  • Information Theoretic Counterfactual Learning from Missing-Not-At-Random Feedback
  • Prophet Attention: Predicting Attention with Future Attention
  • Language Models are Few-Shot Learners
  • Margins are Insufficient for Explaining Gradient Boosting
  • Fourier-transform-based attribution priors improve the interpretability and stability of deep learning models for genomics
  • MomentumRNN: Integrating Momentum into Recurrent Neural Networks
  • Marginal Utility for Planning in Continuous or Large Discrete Action Spaces
  • Projected Stein Variational Gradient Descent
  • Minimax Lower Bounds for Transfer Learning with Linear and One-hidden Layer Neural Networks
  • SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks
  • On the equivalence of molecular graph convolution and molecular wave function with poor basis set
  • The Power of Predictions in Online Control
  • Learning Affordance Landscapes for Interaction Exploration in 3D Environments
  • Cooperative Multi-player Bandit Optimization
  • Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits
  • Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
  • A Loss Function for Generative Neural Networks Based on Watson’s Perceptual Model
  • Dynamic Fusion of Eye Movement Data and Verbal Narrations in Knowledge-rich Domains
  • Scalable Multi-Agent Reinforcement Learning for Networked Systems with Average Reward
  • Optimizing Neural Networks via Koopman Operator Theory
  • SVGD as a kernelized Wasserstein gradient flow of the chi-squared divergence
  • Adversarial Robustness of Supervised Sparse Coding
  • Differentiable Meta-Learning of Bandit Policies
  • Biologically Inspired Mechanisms for Adversarial Robustness
  • Statistical-Query Lower Bounds via Functional Gradients
  • Near-Optimal Reinforcement Learning with Self-Play
  • Network Diffusions via Neural Mean-Field Dynamics
  • Self-Distillation as Instance-Specific Label Smoothing
  • Towards Problem-dependent Optimal Learning Rates
  • Cross-lingual Retrieval for Iterative Self-Supervised Training
  • Rethinking pooling in graph neural networks
  • Pointer Graph Networks
  • Gradient Regularized V-Learning for Dynamic Treatment Regimes
  • Faster Wasserstein Distance Estimation with the Sinkhorn Divergence
  • Forethought and Hindsight in Credit Assignment
  • Robust Recursive Partitioning for Heterogeneous Treatment Effects with Uncertainty Quantification
  • Rescuing neural spike train models from bad MLE
  • Lower Bounds and Optimal Algorithms for Personalized Federated Learning
  • Black-Box Certification with Randomized Smoothing: A Functional Optimization Based Framework
  • Deep Imitation Learning for Bimanual Robotic Manipulation
  • Stationary Activations for Uncertainty Calibration in Deep Learning
  • Ensemble Distillation for Robust Model Fusion in Federated Learning
  • Falcon: Fast Spectral Inference on Encrypted Data
  • On Power Laws in Deep Ensembles
  • Practical Quasi-Newton Methods for Training Deep Neural Networks
  • Approximation Based Variance Reduction for Reparameterization Gradients
  • Inference Stage Optimization for Cross-scenario 3D Human Pose Estimation
  • Consistent feature selection for analytic deep neural networks
  • Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification
  • Information Maximization for Few-Shot Learning
  • Inverse Reinforcement Learning from a Gradient-based Learner
  • Bayesian Multi-type Mean Field Multi-agent Imitation Learning
  • Bayesian Robust Optimization for Imitation Learning
  • Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance
  • Riemannian Continuous Normalizing Flows
  • Attention-Gated Brain Propagation: How the brain can implement reward-based error backpropagation
  • Asymptotic Guarantees for Generative Modeling Based on the Smooth Wasserstein Distance
  • Online Robust Regression via SGD on the l1 loss
  • PRANK: motion Prediction based on RANKing
  • Fighting Copycat Agents in Behavioral Cloning from Observation Histories
  • Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model
  • Structured Prediction for Conditional Meta-Learning
  • Optimal Lottery Tickets via Subset Sum: Logarithmic Over-Parameterization is Sufficient
  • The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes
  • Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function
  • Identifying Learning Rules From Neural Network Observables
  • Optimal Approximation - Smoothness Tradeoffs for Soft-Max Functions
  • Weakly-Supervised Reinforcement Learning for Controllable Behavior
  • Improving Policy-Constrained Kidney Exchange via Pre-Screening
  • Learning abstract structure for drawing by efficient motor program induction
  • Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? — A Neural Tangent Kernel Perspective
  • Dual Instrumental Variable Regression
  • Stochastic Gradient Descent in Correlated Settings: A Study on Gaussian Processes
  • Interventional Few-Shot Learning
  • Minimax Value Interval for Off-Policy Evaluation and Policy Optimization
  • Biased Stochastic First-Order Methods for Conditional Stochastic Optimization and Applications in Meta Learning
  • ShiftAddNet: A Hardware-Inspired Deep Network
  • Network-to-Network Translation with Conditional Invertible Neural Networks
  • Intra-Processing Methods for Debiasing Neural Networks
  • Finding Second-Order Stationary Points Efficiently in Smooth Nonconvex Linearly Constrained Optimization Problems
  • Model-based Policy Optimization with Unsupervised Model Adaptation
  • Implicit Regularization and Convergence for Weight Normalization
  • Geometric All-way Boolean Tensor Decomposition
  • Modular Meta-Learning with Shrinkage
  • A/B Testing in Dense Large-Scale Networks: Design and Inference
  • What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation
  • Partially View-aligned Clustering
  • Partial Optimal Transport with applications on Positive-Unlabeled Learning
  • Toward the Fundamental Limits of Imitation Learning
  • Logarithmic Pruning is All You Need
  • Hold me tight! Influence of discriminative features on deep network boundaries
  • Learning from Mixtures of Private and Public Populations
  • Adversarial Weight Perturbation Helps Robust Generalization
  • Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes
  • Adversarial Self-Supervised Contrastive Learning
  • Normalizing Kalman Filters for Multivariate Time Series Analysis
  • Learning to summarize with human feedback
  • Fourier Spectrum Discrepancies in Deep Network Generated Images
  • Lamina-specific neuronal properties promote robust, stable signal propagation in feedforward networks
  • Learning Dynamic Belief Graphs to Generalize on Text-Based Games
  • Triple descent and the two kinds of overfitting: where & why do they appear?
  • Multimodal Graph Networks for Compositional Generalization in Visual Question Answering
  • Learning Graph Structure With A Finite-State Automaton Layer
  • A Universal Approximation Theorem of Deep Neural Networks for Expressing Probability Distributions
  • Unsupervised object-centric video generation and decomposition in 3D
  • Domain Generalization for Medical Imaging Classification with Linear-Dependency Regularization
  • Multi-label classification: do Hamming loss and subset accuracy really conflict with each other?
  • A Novel Automated Curriculum Strategy to Solve Hard Sokoban Planning Instances
  • Causal analysis of Covid-19 Spread in Germany
  • Locally private non-asymptotic testing of discrete distributions is faster using interactive mechanisms
  • Adaptive Gradient Quantization for Data-Parallel SGD
  • Finite Continuum-Armed Bandits
  • Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies
  • Compact task representations as a normative model for higher-order brain activity
  • Robust-Adaptive Control of Linear Systems: beyond Quadratic Costs
  • Co-exposure Maximization in Online Social Networks
  • UCLID-Net: Single View Reconstruction in Object Space
  • Reinforcement Learning for Control with Multiple Frequencies
  • Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval
  • Neural Message Passing for Multi-Relational Ordered and Recursive Hypergraphs
  • A Unified View of Label Shift Estimation
  • Optimal Private Median Estimation under Minimal Distributional Assumptions
  • Breaking the Communication-Privacy-Accuracy Trilemma
  • Audeo: Audio Generation for a Silent Performance Video
  • Ode to an ODE
  • Self-Distillation Amplifies Regularization in Hilbert Space
  • Coupling-based Invertible Neural Networks Are Universal Diffeomorphism Approximators
  • Community detection using fast low-cardinality semidefinite programming

  • Modeling Noisy Annotations for Crowd Counting
  • An operator view of policy gradient methods
  • Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases
  • Online MAP Inference of Determinantal Point Processes
  • Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement
  • Inferring learning rules from animal decision-making
  • Input-Aware Dynamic Backdoor Attack
  • How hard is to distinguish graphs with graph neural networks?
  • Minimax Regret of Switching-Constrained Online Convex Optimization: No Phase Transition
  • Dual Manifold Adversarial Robustness: Defense against Lp and non-Lp Adversarial Attacks
  • Cross-Scale Internal Graph Neural Network for Image Super-Resolution
  • Unsupervised Representation Learning by Invariance Propagation
  • Restoring Negative Information in Few-Shot Object Detection
  • Do Adversarially Robust ImageNet Models Transfer Better?
  • Robust Correction of Sampling Bias using Cumulative Distribution Functions
  • Personalized Federated Learning with Theoretical Guarantees: A Model-Agnostic Meta-Learning Approach
  • Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation
  • Classification with Valid and Adaptive Coverage
  • Learning Global Transparent Models consistent with Local Contrastive Explanations
  • Learning to Approximate a Bregman Divergence
  • Diverse Image Captioning with Context-Object Split Latent Spaces
  • Learning Disentangled Representations of Videos with Missing Data
  • Natural Graph Networks
  • Continual Learning with Node-Importance based Adaptive Group Sparse Regularization
  • Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts
  • Bidirectional Convolutional Poisson Gamma Dynamical Systems
  • Deep Reinforcement and InfoMax Learning
  • On ranking via sorting by estimated expected utility
  • Distribution-free binary classification: prediction sets, confidence intervals and calibration
  • Closing the Dequantization Gap: PixelCNN as a Single-Layer Flow
  • Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals
  • Variance reduction for Random Coordinate Descent-Langevin Monte Carlo
  • Language as a Cognitive Tool to Imagine Goals in Curiosity Driven Exploration
  • All Word Embeddings from One Embedding
  • Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm
  • How to Characterize The Landscape of Overparameterized Convolutional Neural Networks
  • On the Tightness of Semidefinite Relaxations for Certifying Robustness to Adversarial Examples
  • Submodular Meta-Learning
  • Rethinking Pre-training and Self-training
  • Unsupervised Sound Separation Using Mixture Invariant Training
  • Adaptive Discretization for Model-Based Reinforcement Learning
  • CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching
  • On Warm-Starting Neural Network Training
  • DAGs with No Fears: A Closer Look at Continuous Optimization for Learning Bayesian Networks
  • OOD-MAML: Meta-Learning for Few-Shot Out-of-Distribution Detection and Classification
  • An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch
  • Learning About Objects by Learning to Interact with Them
  • Learning discrete distributions with infinite support
  • Dissecting Neural ODEs
  • Teaching a GAN What Not to Learn
  • Counterfactual Data Augmentation using Locally Factored Dynamics
  • Rethinking Learnable Tree Filter for Generic Feature Transform
  • Self-Supervised Relational Reasoning for Representation Learning
  • Sufficient dimension reduction for classification using principal optimal transport direction
  • Fast Epigraphical Projection-based Incremental Algorithms for Wasserstein Distributionally Robust Support Vector Machine
  • Differentially Private Clustering: Tight Approximation Ratios
  • On the Power of Louvain in the Stochastic Block Model
  • Fairness with Overlapping Groups; a Probabilistic Perspective
  • AttendLight: Universal Attention-Based Reinforcement Learning Model for Traffic Signal Control
  • Searching for Low-Bit Weights in Quantized Neural Networks
  • Adaptive Reduced Rank Regression
  • From Predictions to Decisions: Using Lookahead Regularization
  • Sequential Bayesian Experimental Design with Variable Cost Structure
  • Predictive inference is free with the jackknife+-after-bootstrap
  • Counterfactual Predictions under Runtime Confounding
  • Learning Loss for Test-Time Augmentation
  • Balanced Meta-Softmax for Long-Tailed Visual Recognition
  • Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization
  • MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning
  • How Can I Explain This to You? An Empirical Study of Deep Neural Network Explanation Methods
  • On the Error Resistance of Hinge-Loss Minimization
  • Munchausen Reinforcement Learning
  • Object Goal Navigation using Goal-Oriented Semantic Exploration
  • Efficient semidefinite-programming-based inference for binary and multi-class MRFs
  • Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
  • Semantic Visual Navigation by Watching YouTube Videos
  • Heavy-tailed Representations, Text Polarity Classification & Data Augmentation
  • SuperLoss: A Generic Loss for Robust Curriculum Learning
  • CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models
  • Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards
  • Liberty or Depth: Deep Bayesian Neural Nets Do Not Need Complex Weight Posterior Approximations
  • Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms
  • Learning Differential Equations that are Easy to Solve
  • Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses
  • Influence-Augmented Online Planning for Complex Environments
  • PAC-Bayes Learning Bounds for Sample-Dependent Priors
  • Reward-rational (implicit) choice: A unifying formalism for reward learning
  • Probabilistic Time Series Forecasting with Shape and Temporal Diversity
  • Low Distortion Block-Resampling with Spatially Stochastic Networks
  • Continual Deep Learning by Functional Regularisation of Memorable Past
  • Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning
  • Fast Fourier Convolution
  • Unsupervised Learning of Dense Visual Representations
  • Higher-Order Certification For Randomized Smoothing
  • Learning Structured Distributions From Untrusted Batches: Faster and Simpler
  • Hierarchical Quantized Autoencoders
  • Diversity can be Transferred: Output Diversification for White- and Black-box Attacks
  • POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis
  • AvE: Assistance via Empowerment
  • Variational Policy Gradient Method for Reinforcement Learning with General Utilities
  • Reverse-engineering recurrent neural network solutions to a hierarchical inference task for mice
  • Temporal Positive-unlabeled Learning for Biomedical Hypothesis Generation via Risk Estimation
  • Efficient Low Rank Gaussian Variational Inference for Neural Networks
  • Privacy Amplification via Random Check-Ins
  • Probabilistic Circuits for Variational Inference in Discrete Graphical Models
  • Your Classifier can Secretly Suffice Multi-Source Domain Adaptation
  • Labelling unlabelled videos from scratch with multi-modal self-supervision
  • A Non-Asymptotic Analysis for Stein Variational Gradient Descent
  • Robust Meta-learning for Mixed Linear Regression with Small Batches
  • Bayesian Deep Learning and a Probabilistic Perspective of Generalization
  • Unsupervised Learning of Object Landmarks via Self-Training Correspondence
  • Randomized tests for high-dimensional regression: A more efficient and powerful solution
  • Learning Representations from Audio-Visual Spatial Alignment
  • Generative View Synthesis: From Single-view Semantics to Novel-view Images
  • Towards More Practical Adversarial Attacks on Graph Neural Networks
  • Multi-Task Reinforcement Learning with Soft Modularization
  • Causal Shapley Values: Exploiting Causal Knowledge to Explain Individual Predictions of Complex Models
  • On the training dynamics of deep networks with $L_2$ regularization
  • Improved Algorithms for Convex-Concave Minimax Optimization
  • Deep Variational Instance Segmentation
  • Learning Implicit Functions for Topology-Varying Dense 3D Shape Correspondence
  • Deep Multimodal Fusion by Channel Exchanging
  • Hierarchically Organized Latent Modules for Exploratory Search in Morphogenetic Systems
  • AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity
  • Delay and Cooperation in Nonstochastic Linear Bandits
  • Probabilistic Orientation Estimation with Matrix Fisher Distributions
  • Minimax Dynamics of Optimally Balanced Spiking Networks of Excitatory and Inhibitory Neurons
  • Telescoping Density-Ratio Estimation
  • Towards Deeper Graph Neural Networks with Differentiable Group Normalization
  • Stochastic Optimization for Performative Prediction
  • Learning Differentiable Programs with Admissible Neural Heuristics
  • Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nyström method
  • Domain Adaptation as a Problem of Inference on Graphical Models
  • Network size and size of the weights in memorization with two-layers neural networks
  • Certifying Strategyproof Auction Networks
  • Continual Learning of Control Primitives: Skill Discovery via Reset-Games
  • HOI Analysis: Integrating and Decomposing Human-Object Interaction
  • Strongly local p-norm-cut algorithms for semi-supervised learning and local graph clustering
  • Deep Direct Likelihood Knockoffs
  • Meta-Neighborhoods
  • Neural Dynamic Policies for End-to-End Sensorimotor Learning
  • A new inference approach for training shallow and deep generalized linear models of noisy interacting neurons
  • Decision-Making with Auto-Encoding Variational Bayes
  • Attribution Preservation in Network Compression for Reliable Network Interpretation
  • Feature Importance Ranking for Deep Learning
  • Causal Estimation with Functional Confounders
  • Model Inversion Networks for Model-Based Optimization
  • Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks
  • Exact expressions for double descent and implicit regularization via surrogate random design
  • Certifying Confidence via Randomized Smoothing
  • Learning Physical Constraints with Neural Projections
  • Robust Optimization for Fairness with Noisy Protected Groups
  • Noise-Contrastive Estimation for Multivariate Point Processes
  • A Game-Theoretic Analysis of the Empirical Revenue Maximization Algorithm with Endogenous Sampling
  • Neural Path Features and Neural Path Kernel: Understanding the role of gates in deep learning
  • Multiscale Deep Equilibrium Models
  • Sparse Graphical Memory for Robust Planning
  • Second Order PAC-Bayesian Bounds for the Weighted Majority Vote
  • Dirichlet Graph Variational Autoencoder
  • Modeling Task Effects on Meaning Representation in the Brain via Zero-Shot MEG Prediction
  • Counterfactual Vision-and-Language Navigation: Unravelling the Unseen
  • Robust Quantization: One Model to Rule Them All
  • Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming
  • Federated Accelerated Stochastic Gradient Descent
  • Robust Density Estimation under Besov IPM Losses
  • An analytic theory of shallow networks dynamics for hinge loss classification
  • Fixed-Support Wasserstein Barycenters: Computational Hardness and Fast Algorithm
  • Learning to Orient Surfaces by Self-supervised Spherical CNNs
  • Adam with Bandit Sampling for Deep Learning
  • Parabolic Approximation Line Search for DNNs
  • Agnostic Learning of a Single Neuron with Gradient Descent
  • Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits
  • Analytic Characterization of the Hessian in Shallow ReLU Models: A Tale of Symmetry
  • Generative causal explanations of black-box classifiers
  • Sub-sampling for Efficient Non-Parametric Bandit Exploration
  • Learning under Model Misspecification: Applications to Variational and Ensemble methods
  • Language Through a Prism: A Spectral Approach for Multiscale Language Representations
  • DVERGE: Diversifying Vulnerabilities for Enhanced Robust Generation of Ensembles
  • Towards practical differentially private causal graph discovery
  • Independent Policy Gradient Methods for Competitive Reinforcement Learning
  • The Value Equivalence Principle for Model-Based Reinforcement Learning
  • Structured Convolutions for Efficient Neural Network Design
  • Latent World Models For Intrinsically Motivated Exploration
  • Estimating Rank-One Spikes from Heavy-Tailed Noise via Self-Avoiding Walks
  • Policy Improvement via Imitation of Multiple Oracles
  • Training Generative Adversarial Networks by Solving Ordinary Differential Equations
  • Learning of Discrete Graphical Models with Neural Networks
  • RepPoints v2: Verification Meets Regression for Object Detection
  • Unfolding the Alternating Optimization for Blind Super Resolution
  • Entrywise convergence of iterative methods for eigenproblems
  • Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views
  • A Catalyst Framework for Minimax Optimization
  • Self-supervised Co-Training for Video Representation Learning
  • Gradient Estimation with Stochastic Softmax Tricks
  • Meta-Learning Requires Meta-Augmentation
  • SLIP: Learning to predict in unknown dynamical systems with long-term memory
  • Improving GAN Training with Probability Ratio Clipping and Sample Reweighting
  • Bayesian Bits: Unifying Quantization and Pruning
  • On Testing of Samplers
  • Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective
  • MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
  • Optimal Epoch Stochastic Gradient Descent Ascent Methods for Min-Max Optimization
  • Woodbury Transformations for Deep Generative Flows
  • Graph Contrastive Learning with Augmentations
  • Gradient Surgery for Multi-Task Learning
  • Bayesian Probabilistic Numerical Integration with Tree-Based Models
  • Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel
  • Graph Meta Learning via Local Subgraphs
  • Stochastic Deep Gaussian Processes over Graphs
  • Bayesian Causal Structural Learning with Zero-Inflated Poisson Bayesian Networks
  • Evaluating Attribution for Graph Neural Networks
  • On Second Order Behaviour in Augmented Neural ODEs
  • Neuron Shapley: Discovering the Responsible Neurons
  • Stochastic Normalizing Flows
  • GPU-Accelerated Primal Learning for Extremely Fast Large-Scale Classification
  • Random Reshuffling is Not Always Better
  • Model Agnostic Multilevel Explanations
  • NeuMiss networks: differentiable programming for supervised learning with missing values
  • Revisiting Parameter Sharing for Automatic Neural Channel Number Search
  • Differentially-Private Federated Linear Bandits
  • Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning?
  • Learning Physical Graph Representations from Visual Scenes
  • Deep Graph Pose: a semi-supervised deep graphical model for improved animal pose tracking
  • Meta-learning from Tasks with Heterogeneous Attribute Spaces
  • Estimating decision tree learnability with polylogarithmic sample complexity
  • Sparse Symplectically Integrated Neural Networks
  • Continuous Object Representation Networks: Novel View Synthesis without Target View Supervision
  • Multimodal Generative Learning Utilizing Jensen-Shannon-Divergence
  • Solver-in-the-Loop: Learning from Differentiable Physics to Interact with Iterative PDE-Solvers
  • Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension
  • Predicting Training Time Without Training
  • How does This Interaction Affect Me? Interpretable Attribution for Feature Interactions
  • Optimal Adaptive Electrode Selection to Maximize Simultaneously Recorded Neuron Yield
  • Neurosymbolic Reinforcement Learning with Formally Verified Exploration
  • Wavelet Flow: Fast Training of High Resolution Normalizing Flows
  • Multi-task Batch Reinforcement Learning with Metric Learning
  • On 1/n neural representation and robustness
  • Boundary thickness and robustness in learning models
  • Demixed shared component analysis of neural population data from multiple brain areas
  • Learning Kernel Tests Without Data Splitting
  • Unsupervised Data Augmentation for Consistency Training
  • Subgroup-based Rank-1 Lattice Quasi-Monte Carlo
  • Minibatch vs Local SGD for Heterogeneous Distributed Learning
  • Multi-task Causal Learning with Gaussian Processes
  • Proximity Operator of the Matrix Perspective Function and its Applications
  • Generative 3D Part Assembly via Dynamic Graph Learning
  • Improving Natural Language Processing Tasks with Human Gaze-Guided Neural Attention
  • The Power of Comparisons for Actively Learning Linear Classifiers
  • From Boltzmann Machines to Neural Networks and Back Again
  • Crush Optimism with Pessimism: Structured Bandits Beyond Asymptotic Optimality
  • Pruning neural networks without any data by iteratively conserving synaptic flow
  • Detecting Interactions from Neural Networks via Topological Analysis
  • Neural Bridge Sampling for Evaluating Safety-Critical Autonomous Systems
  • Interpretable and Personalized Apprenticeship Scheduling: Learning Interpretable Scheduling Policies from Heterogeneous User Demonstrations
  • Task-Agnostic Online Reinforcement Learning with an Infinite Mixture of Gaussian Processes
  • Benchmarking Deep Learning Interpretability in Time Series Predictions
  • Federated Principal Component Analysis
  • (De)Randomized Smoothing for Certifiable Defense against Patch Attacks
  • SMYRF - Efficient Attention using Asymmetric Clustering
  • Introducing Routing Uncertainty in Capsule Networks
  • A Simple and Efficient Smoothing Method for Faster Optimization and Local Exploration
  • Hyperparameter Ensembles for Robustness and Uncertainty Quantification
  • Neutralizing Self-Selection Bias in Sampling for Sortition
  • On the Convergence of Smooth Regularized Approximate Value Iteration Schemes
  • Off-Policy Evaluation via the Regularized Lagrangian
  • The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning
  • Neural Power Units
  • Towards Scalable Bayesian Learning of Causal DAGs
  • A Dictionary Approach to Domain-Invariant Learning in Deep Networks
  • Bootstrapping neural processes
  • Large-Scale Adversarial Training for Vision-and-Language Representation Learning
  • Most ReLU Networks Suffer from $\ell^2$ Adversarial Perturbations
  • Compositional Visual Generation with Energy Based Models
  • Factor Graph Grammars
  • Erdos Goes Neural: an Unsupervised Learning Framework for Combinatorial Optimization on Graphs
  • Autoregressive Score Matching
  • Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization
  • Neural Controlled Differential Equations for Irregular Time Series
  • On Efficiency in Hierarchical Reinforcement Learning
  • On Correctness of Automatic Differentiation for Non-Differentiable Functions
  • Probabilistic Linear Solvers for Machine Learning
  • Dynamic Regret of Policy Optimization in Non-Stationary Environments
  • Multipole Graph Neural Operator for Parametric Partial Differential Equations
  • BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images
  • Online Structured Meta-learning
  • Learning Strategic Network Emergence Games
  • Towards Interpretable Natural Language Understanding with Explanations as Latent Variables
  • The Mean-Squared Error of Double Q-Learning
  • What Makes for Good Views for Contrastive Learning?
  • Denoising Diffusion Probabilistic Models
  • Barking up the right tree: an approach to search over molecule synthesis DAGs
  • On Uniform Convergence and Low-Norm Interpolation Learning
  • Bandit Samplers for Training Graph Neural Networks
  • Sampling from a k-DPP without looking at all items
  • Uncovering the Topology of Time-Varying fMRI Data using Cubical Persistence
  • Hierarchical Poset Decoding for Compositional Generalization in Language
  • Evaluating and Rewarding Teamwork Using Cooperative Game Abstractions
  • Exchangeable Neural ODE for Set Modeling
  • Profile Entropy: A Fundamental Measure for the Learnability and Compressibility of Distributions
  • CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection
  • Regularized linear autoencoders recover the principal components, eventually
  • Semi-Supervised Partial Label Learning via Confidence-Rated Margin Maximization
  • GramGAN: Deep 3D Texture Synthesis From 2D Exemplars
  • UWSOD: Toward Fully-Supervised-Level Capacity Weakly Supervised Object Detection
  • Learning Restricted Boltzmann Machines with Sparse Latent Variables
  • Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction
  • Curriculum learning for multilevel budgeted combinatorial problems
  • FedSplit: an algorithmic framework for fast federated optimization
  • Estimation and Imputation in Probabilistic Principal Component Analysis with Missing Not At Random Data
  • Correlation Robust Influence Maximization
  • Neuronal Gaussian Process Regression
  • Nonconvex Sparse Graph Learning under Laplacian Constrained Graphical Model
  • Synthetic Data Generators – Sequential and Private
  • Uncertainty Quantification for Inferring Hawkes Networks
  • Implicit Distributional Reinforcement Learning
  • Auxiliary Task Reweighting for Minimum-data Learning
  • Small Nash Equilibrium Certificates in Very Large Games
  • Training Linear Finite-State Machines
  • Efficient active learning of sparse halfspaces with arbitrary bounded noise
  • Swapping Autoencoder for Deep Image Manipulation
  • Self-Supervised Few-Shot Learning on Point Clouds
  • Faster Differentially Private Samplers via Rényi Divergence Analysis of Discretized Langevin MCMC
  • Learning identifiable and interpretable latent models of high-dimensional neural activity using pi-VAE
  • RL Unplugged: A Collection of Benchmarks for Offline Reinforcement Learning
  • Dual T: Reducing Estimation Error for Transition Matrix in Label-noise Learning
  • Interior Point Solving for LP-based prediction+optimisation
  • A simple normative network approximates local non-Hebbian learning in the cortex
  • Kernelized information bottleneck leads to biologically plausible 3-factor Hebbian learning in deep networks
  • Understanding the Role of Training Regimes in Continual Learning
  • Fair regression with Wasserstein barycenters
  • Training Stronger Baselines for Learning to Optimize
  • Exactly Computing the Local Lipschitz Constant of ReLU Networks
  • Strictly Batch Imitation Learning by Energy-based Distribution Matching
  • On the Ergodicity, Bias and Asymptotic Normality of Randomized Midpoint Sampling Method
  • A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems
  • Generating Correct Answers for Progressive Matrices Intelligence Tests
  • HyNet: Learning Local Descriptor with Hybrid Similarity Measure and Triplet Loss
  • Preference learning along multiple criteria: A game-theoretic perspective
  • Multi-Plane Program Induction with 3D Box Priors
  • Online Neural Connectivity Estimation with Noisy Group Testing
  • Once-for-All Adversarial Training: In-Situ Tradeoff between Robustness and Accuracy for Free
  • Implicit Neural Representations with Periodic Activation Functions
  • Rotated Binary Neural Network
  • Community detection in sparse time-evolving graphs with a dynamical Bethe-Hessian
  • Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness
  • Adaptive Learning of Rank-One Models for Efficient Pairwise Sequence Alignment
  • Hierarchical nucleation in deep neural networks
  • Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains
  • Graph Geometry Interaction Learning
  • Differentiable Augmentation for Data-Efficient GAN Training
  • Heuristic Domain Adaptation
  • Learning Certified Individually Fair Representations
  • Part-dependent Label Noise: Towards Instance-dependent Label Noise
  • Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization
  • An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods
  • Geometric Exploration for Online Control
  • Automatic Curriculum Learning through Value Disagreement
  • MRI Banding Removal via Adversarial Training
  • The NetHack Learning Environment
  • Language and Visual Entity Relationship Graph for Agent Navigation
  • ICAM: Interpretable Classification via Disentangled Representations and Feature Attribution Mapping
  • Spectra of the Conjugate Kernel and Neural Tangent Kernel for linear-width neural networks
  • No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium
  • Estimating weighted areas under the ROC curve
  • Can Implicit Bias Explain Generalization? Stochastic Convex Optimization as a Case Study
  • Generalized Hindsight for Reinforcement Learning
  • Critic Regularized Regression
  • Boosting Adversarial Training with Hypersphere Embedding
  • Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs
  • Modeling Continuous Stochastic Processes with Dynamic Normalizing Flows
  • Efficient Online Learning of Optimal Rankings: Dimensionality Reduction via Gradient Descent
  • Training Normalizing Flows with the Information Bottleneck for Competitive Generative Classification
  • Detecting Hands and Recognizing Physical Contact in the Wild
  • On the Theory of Transfer Learning: The Importance of Task Diversity
  • Finite-Time Analysis of Round-Robin Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards
  • Neural Star Domain as Primitive Representation
  • Off-Policy Interval Estimation with Lipschitz Value Iteration
  • Inverse Rational Control with Partially Observable Continuous Nonlinear Dynamics
  • Deep Statistical Solvers
  • Distributionally Robust Parametric Maximum Likelihood Estimation
  • Secretary and Online Matching Problems with Machine Learned Advice
  • Deep Transformation-Invariant Clustering
  • Overfitting Can Be Harmless for Basis Pursuit, But Only to a Degree
  • Improving Generalization in Reinforcement Learning with Mixture Regularization
  • Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework
  • Learning from Aggregate Observations
  • The Devil is in the Detail: A Framework for Macroscopic Prediction via Microscopic Models
  • Subgraph Neural Networks
  • Demystifying Orthogonal Monte Carlo and Beyond
  • Optimal Robustness-Consistency Trade-offs for Learning-Augmented Online Algorithms
  • A Scalable Approach for Privacy-Preserving Collaborative Machine Learning
  • Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
  • Towards Learning Convolutions from Scratch
  • Cycle-Contrast for Self-Supervised Video Representation Learning
  • Posterior Re-calibration for Imbalanced Datasets
  • Novelty Search in Representational Space for Sample Efficient Exploration
  • Robust Reinforcement Learning via Adversarial training with Langevin Dynamics
  • Adversarial Blocking Bandits
  • Online Algorithms for Multi-shop Ski Rental with Machine Learned Advice
  • Multi-label Contrastive Predictive Coding
  • Rotation-Invariant Local-to-Global Representation Learning for 3D Point Cloud
  • Learning Invariants through Soft Unification
  • One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL
  • Variational Bayesian Monte Carlo with Noisy Likelihoods
  • Finite-Sample Analysis of Contractive Stochastic Approximation Using Smooth Convex Envelopes
  • Self-Supervised Generative Adversarial Compression
  • An efficient nonconvex reformulation of stagewise convex optimization problems
  • From Finite to Countable-Armed Bandits
  • Adversarial Distributional Training for Robust Deep Learning
  • Meta-Learning Stationary Stochastic Process Prediction with Convolutional Neural Processes
  • Theory-Inspired Path-Regularized Differential Network Architecture Search
  • Conic Descent and its Application to Memory-efficient Optimization over Positive Semidefinite Matrices
  • Learning the Geometry of Wave-Based Imaging
  • Greedy inference with structure-exploiting lazy maps
  • Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning
  • Finding the Homology of Decision Boundaries with Active Learning
  • Reinforced Molecular Optimization with Neighborhood-Controlled Grammars
  • Natural Policy Gradient Primal-Dual Method for Constrained Markov Decision Processes
  • Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Evolvability
  • Certified Defense to Image Transformations via Randomized Smoothing
  • Estimation of Skill Distribution from a Tournament
  • Reparameterizing Mirror Descent as Gradient Descent
  • General Control Functions for Causal Effect Estimation from IVs
  • Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards
  • Certified Robustness of Graph Convolution Networks for Graph Classification under Topological Attacks
  • Zero-Resource Knowledge-Grounded Dialogue Generation
  • Targeted Adversarial Perturbations for Monocular Depth Prediction
  • Beyond the Mean-Field: Structured Deep Gaussian Processes Improve the Predictive Uncertainties
  • Offline Imitation Learning with a Misspecified Simulator
  • Multi-Fidelity Bayesian Optimization via Deep Neural Networks
  • PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals
  • Bad Global Minima Exist and SGD Can Reach Them
  • Optimal Prediction of the Number of Unseen Species with Multiplicity
  • Characterizing Optimal Mixed Policies: Where to Intervene and What to Observe
  • Factor Graph Neural Networks
  • A Closer Look at Accuracy vs. Robustness
  • Curriculum Learning by Dynamic Instance Hardness
  • Spin-Weighted Spherical CNNs
  • Learning to Execute Programs with Instruction Pointer Attention Graph Neural Networks
  • AutoPrivacy: Automated Layer-wise Parameter Selection for Secure Neural Network Inference
  • Baxter Permutation Process
  • Characterizing emergent representations in a space of candidate learning rules for deep networks
  • Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation
  • Adaptive Probing Policies for Shortest Path Routing
  • Approximate Heavily-Constrained Learning with Lagrange Multiplier Models
  • Faster Randomized Infeasible Interior Point Methods for Tall/Wide Linear Programs
  • Sliding Window Algorithms for k-Clustering Problems
  • AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning
  • Approximate Cross-Validation for Structured Models
  • Exemplar VAE: Linking Generative Models, Nearest Neighbor Retrieval, and Data Augmentation
  • Debiased Contrastive Learning
  • UCSG-NET- Unsupervised Discovering of Constructive Solid Geometry Tree
  • Generalized Boosting
  • COT-GAN: Generating Sequential Data via Causal Optimal Transport
  • Impossibility Results for Grammar-Compressed Linear Algebra
  • Understanding spiking networks through convex optimization
  • Better Full-Matrix Regret via Parameter-Free Online Learning
  • Large-Scale Methods for Distributionally Robust Optimization
  • Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring
  • Bandit Linear Control
  • Refactoring Policy for Compositional Generalizability using Self-Supervised Object Proposals
  • PEP: Parameter Ensembling by Perturbation
  • Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View
  • Adversarial Example Games
  • Residual Distillation: Towards Portable Deep Neural Networks without Shortcuts
  • Provably Efficient Neural Estimation of Structural Equation Models: An Adversarial Approach
  • Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms
  • Learning to Play Sequential Games versus Unknown Opponents
  • Further Analysis of Outlier Detection with Deep Generative Models
  • Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning
  • Neural Networks Learning and Memorization with (almost) no Over-Parameterization
  • Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits
  • Towards a Combinatorial Characterization of Bounded-Memory Learning
  • Chaos, Extremism and Optimism: Volume Analysis of Learning in Games
  • On Regret with Multiple Best Arms
  • Matrix Completion with Hierarchical Graph Side Information
  • Is Long Horizon RL More Difficult Than Short Horizon RL?
  • Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation: Bayesian inference for latent Gaussian models and beyond
  • Adversarial Learning for Robust Deep Clustering
  • Learning Mutational Semantics
  • Learning to Learn Variational Semantic Memory
  • Myersonian Regression
  • Learnability with Indirect Supervision Signals
  • Towards Safe Policy Improvement for Non-Stationary MDPs
  • Finer Metagenomic Reconstruction via Biodiversity Optimization
  • Causal Discovery in Physical Systems from Videos
  • Glyph: Fast and Accurately Training Deep Neural Networks on Encrypted Data
  • Smoothed Analysis of Online and Differentially Private Learning
  • Self-Paced Deep Reinforcement Learning
  • Kalman Filtering Attention for User Behavior Modeling in CTR Prediction
  • Towards Maximizing the Representation Gap between In-Domain & Out-of-Distribution Examples
  • Fully Convolutional Mesh Autoencoder using Efficient Spatially Varying Kernels
  • GNNGuard: Defending Graph Neural Networks against Adversarial Attacks
  • Geo-PIFu: Geometry and Pixel Aligned Implicit Functions for Single-view Human Reconstruction
  • Optimal visual search based on a model of target detectability in natural images
  • Towards Convergence Rate Analysis of Random Forests for Classification
  • List-Decodable Mean Estimation via Iterative Multi-Filtering
  • Exact Recovery of Mangled Clusters with Same-Cluster Queries
  • Steady State Analysis of Episodic Reinforcement Learning
  • Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures
  • Bayesian Optimization for Iterative Learning
  • Minimax Bounds for Generalized Linear Models
  • Projection Robust Wasserstein Distance and Riemannian Optimization
  • CoinDICE: Off-Policy Confidence Interval Estimation
  • Simple and Fast Algorithm for Binary Integer and Online Linear Programming
  • Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction
  • Learning Rich Rankings
  • Color Visual Illusions: A Statistics-based Computational Model
  • Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
  • Universal guarantees for decision tree induction via a higher-order splitting criterion
  • Trade-offs and Guarantees of Adversarial Representation Learning for Information Obfuscation
  • A Boolean Task Algebra for Reinforcement Learning
  • Learning with Differentiable Perturbed Optimizers
  • Optimal Learning from Verified Training Data
  • Online Linear Optimization with Many Hints
  • Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification
  • Causal Discovery from Soft Interventions with Unknown Targets: Characterization and Learning
  • Exploiting the Surrogate Gap in Online Multiclass Classification
  • The Pitfalls of Simplicity Bias in Neural Networks
  • Automatically Learning Compact Quality-aware Surrogates for Optimization Problems
  • Empirical Likelihood for Contextual Bandits
  • Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver?
  • Non-reversible Gaussian processes for identifying latent dynamical structure in neural data
  • Listening to Sounds of Silence for Speech Denoising
  • BoxE: A Box Embedding Model for Knowledge Base Completion
  • Coherent Hierarchical Multi-Label Classification Networks
  • Walsh-Hadamard Variational Inference for Bayesian Deep Learning
  • Federated Bayesian Optimization via Thompson Sampling
  • MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation
  • Neural Complexity Measures
  • Optimal Iterative Sketching Methods with the Subsampled Randomized Hadamard Transform
  • Provably adaptive reinforcement learning in metric spaces
  • ShapeFlow: Learnable Deformation Flows Among 3D Shapes
  • Self-Supervised Learning by Cross-Modal Audio-Video Clustering
  • Optimal Query Complexity of Secure Stochastic Convex Optimization
  • DynaBERT: Dynamic BERT with Adaptive Width and Depth
  • Generalization Bound of Gradient Descent for Non-Convex Metric Learning
  • Dynamic Submodular Maximization
  • Inference for Batched Bandits
  • Approximate Cross-Validation with Low-Rank Data in High Dimensions
  • GANSpace: Discovering Interpretable GAN Controls
  • Differentiable Expected Hypervolume Improvement for Parallel Multi-Objective Bayesian Optimization
  • Neuron-level Structured Pruning using Polarization Regularizer
  • Limits on Testing Structural Changes in Ising Models
  • Field-wise Learning for Multi-field Categorical Data
  • Continual Learning in Low-rank Orthogonal Subspaces
  • Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
  • Sharpened Generalization Bounds based on Conditional Mutual Information and an Application to Noisy, Iterative Algorithms
  • Learning Deformable Tetrahedral Meshes for 3D Reconstruction
  • Information theoretic limits of learning a sparse rule
  • Self-supervised learning through the eyes of a child
  • Unsupervised Semantic Aggregation and Deformable Template Matching for Semi-Supervised Learning
  • A game-theoretic analysis of networked system control for common-pool resource management using multi-agent reinforcement learning
  • What shapes feature representations? Exploring datasets, architectures, and training
  • Optimal Best-arm Identification in Linear Bandits
  • Data Diversification: A Simple Strategy For Neural Machine Translation
  • Interstellar: Searching Recurrent Architecture for Knowledge Graph Embedding
  • CoSE: Compositional Stroke Embeddings
  • Learning Multi-Agent Coordination for Enhancing Target Coverage in Directional Sensor Networks
  • Biological credit assignment through dynamic inversion of feedforward networks
  • Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching
  • Learning Multi-Agent Communication through Structured Attentive Reasoning
  • Private Identity Testing for High-Dimensional Distributions
  • On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression
  • An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search
  • MetaSDF: Meta-Learning Signed Distance Functions
  • Simple and Scalable Sparse k-means Clustering via Feature Ranking
  • Model-based Adversarial Meta-Reinforcement Learning
  • Graph Policy Network for Transferable Active Learning on Graphs
  • Towards a Better Global Loss Landscape of GANs
  • Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
  • BanditPAM: Almost Linear Time k-Medoids Clustering via Multi-Armed Bandits
  • UDH: Universal Deep Hiding for Steganography, Watermarking, and Light Field Messaging
  • Evidential Sparsification of Multimodal Latent Spaces in Conditional Variational Autoencoders
  • An Unbiased Risk Estimator for Learning with Augmented Classes
  • AutoBSS: An Efficient Algorithm for Block Stacking Style Search
  • Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point
  • Stochastic Optimization with Laggard Data Pipelines
  • Self-supervised Auxiliary Learning with Meta-paths for Heterogeneous Graphs
  • GPS-Net: Graph-based Photometric Stereo Network
  • Consistent Structural Relation Learning for Zero-Shot Segmentation
  • Model Selection in Contextual Stochastic Bandit Problems
  • Truncated Linear Regression in High Dimensions
  • Incorporating Pragmatic Reasoning Communication into Emergent Language
  • Deep Subspace Clustering with Data Augmentation
  • An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits
  • Can Graph Neural Networks Count Substructures?
  • A Bayesian Perspective on Training Speed and Model Selection
  • On the Modularity of Hypernetworks
  • Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies
  • Provably Efficient Neural GTD for Off-Policy Learning
  • Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration
  • Stable and expressive recurrent vision models
  • Entropic Optimal Transport between Unbalanced Gaussian Measures has a Closed Form
  • BRP-NAS: Prediction-based NAS using GCNs
  • Deep Shells: Unsupervised Shape Correspondence with Optimal Transport
  • ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding
  • Rel3D: A Minimally Contrastive Benchmark for Grounding Spatial Relations in 3D
  • Regularizing Black-box Models for Improved Interpretability
  • Trust the Model When It Is Confident: Masked Model-based Actor-Critic
  • Semi-Supervised Neural Architecture Search
  • Consistency Regularization for Certified Robustness of Smoothed Classifiers
  • Robust Multi-Agent Reinforcement Learning with Model Uncertainty
  • SIRI: Spatial Relation Induced Network For Spatial Description Resolution
  • Adaptive Shrinkage Estimation for Streaming Graphs
  • Make One-Shot Video Object Segmentation Efficient Again
  • Depth Uncertainty in Neural Networks
  • Non-Euclidean Universal Approximation
  • Constraining Variational Inference with Geometric Jensen-Shannon Divergence
  • Gibbs Sampling with People
  • HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory
  • FrugalML: How to use ML Prediction APIs more accurately and cheaply
  • Sharp Representation Theorems for ReLU Networks with Precise Dependence on Depth
  • Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning
  • Monotone operator equilibrium networks
  • When and How to Lift the Lockdown? Global COVID-19 Scenario Analysis and Policy Assessment using Compartmental Gaussian Processes
  • Unsupervised Learning of Lagrangian Dynamics from Images for Prediction and Control
  • High-Dimensional Sparse Linear Bandits
  • Non-Stochastic Control with Bandit Feedback
  • Generalized Leverage Score Sampling for Neural Networks
  • An Optimal Elimination Algorithm for Learning a Best Arm
  • Efficient Projection-free Algorithms for Saddle Point Problems
  • A mathematical model for automatic differentiation in machine learning
  • Unsupervised Text Generation by Learning from Search
  • Learning Compositional Rules via Neural Program Synthesis
  • Incorporating BERT into Parallel Sequence Decoding with Adapters
  • Estimating Fluctuations in Neural Representations of Uncertain Environments
  • Discover, Hallucinate, and Adapt: Open Compound Domain Adaptation for Semantic Segmentation
  • SURF: A Simple, Universal, Robust, Fast Distribution Learning Algorithm
  • Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks
  • General Transportability of Soft Interventions: Completeness Results
  • GAIT-prop: A biologically plausible learning rule derived from backpropagation of error
  • Lipschitz Bounds and Provably Robust Training by Laplacian Smoothing
  • SCOP: Scientific Control for Reliable Neural Network Pruning
  • Provably Consistent Partial-Label Learning
  • Robust, Accurate Stochastic Optimization for Variational Inference
  • Discovering conflicting groups in signed networks
  • Learning Some Popular Gaussian Graphical Models without Condition Number Bounds
  • Sense and Sensitivity Analysis: Simple Post-Hoc Analysis of Bias Due to Unobserved Confounding
  • Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions
  • Understanding Double Descent Requires A Fine-Grained Bias-Variance Decomposition
  • VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain
  • The Smoothed Possibility of Social Choice
  • A Decentralized Parallel Algorithm for Training Generative Adversarial Nets
  • Phase retrieval in high dimensions: Statistical and computational phase transitions
  • Fair Performance Metric Elicitation
  • Hybrid Variance-Reduced SGD Algorithms For Minimax Problems with Nonconvex-Linear Function
  • Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information
  • Soft Contrastive Learning for Visual Localization
  • Fine-Grained Dynamic Head for Object Detection
  • LoCo: Local Contrastive Representation Learning
  • Modeling and Optimization Trade-off in Meta-learning
  • SnapBoost: A Heterogeneous Boosting Machine
  • On Adaptive Distance Estimation
  • Stage-wise Conservative Linear Bandits
  • RELATE: Physically Plausible Multi-Object Scene Synthesis Using Structured Latent Spaces
  • Metric-Free Individual Fairness in Online Learning
  • GreedyFool: Distortion-Aware Sparse Adversarial Attack
  • VAEM: a Deep Generative Model for Heterogeneous Mixed Type Data
  • RetroXpert: Decompose Retrosynthesis Prediction Like A Chemist
  • Sample-Efficient Optimization in the Latent Space of Deep Generative Models via Weighted Retraining
  • Improved Sample Complexity for Incremental Autonomous Exploration in MDPs
  • TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning
  • RD$^2$: Reward Decomposition with Representation Decomposition
  • Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID
  • Fairness constraints can help exact inference in structured prediction
  • Instance-based Generalization in Reinforcement Learning
  • Smooth And Consistent Probabilistic Regression Trees
  • Computing Valid p-value for Optimal Changepoint by Selective Inference using Dynamic Programming
  • Factorized Neural Processes for Neural Processes: K-Shot Prediction of Neural Responses
  • Winning the Lottery with Continuous Sparsification
  • Adversarial robustness via robust low rank representations
  • Joints in Random Forests
  • Compositional Generalization by Learning Analytical Expressions
  • JAX MD: A Framework for Differentiable Physics
  • An implicit function learning approach for parametric modal regression
  • SDF-SRN: Learning Signed Distance 3D Object Reconstruction from Static Images
  • Coresets for Robust Training of Deep Neural Networks against Noisy Labels
  • Adapting to Misspecification in Contextual Bandits
  • Convergence of Meta-Learning with Task-Specific Adaptation over Partial Parameters
  • MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
  • Learning to solve TV regularised problems with unrolled algorithms
  • Object-Centric Learning with Slot Attention
  • Improving robustness against common corruptions by covariate shift adaptation
  • Deep Smoothing of the Implied Volatility Surface
  • Probabilistic Inference with Algebraic Constraints: Theoretical Limits and Practical Approximations
  • Provable Online CP/PARAFAC Decomposition of a Structured Tensor via Dictionary Learning
  • Look-ahead Meta Learning for Continual Learning
  • A polynomial-time algorithm for learning nonparametric causal graphs
  • Sparse Learning with CART
  • Proximal Mapping for Deep Regularization
  • Identifying Causal-Effect Inference Failure with Uncertainty-Aware Models
  • Hierarchical Granularity Transfer Learning
  • Deep active inference agents using Monte-Carlo methods
  • Consistent Estimation of Identifiable Nonparametric Mixture Models from Grouped Observations
  • Manifold structure in graph embeddings
  • Adaptive Learned Bloom Filter (Ada-BF): Efficient Utilization of the Classifier with Application to Real-Time Information Filtering on the Web
  • MCUNet: Tiny Deep Learning on IoT Devices
  • In search of robust measures of generalization
  • Task-agnostic Exploration in Reinforcement Learning
  • Multi-task Additive Models for Robust Estimation and Automatic Structure Discovery
  • Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration
  • Softmax Deep Double Deterministic Policy Gradients
  • Online Decision Based Visual Tracking via Reinforcement Learning
  • Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity
  • DeepI2I: Enabling Deep Hierarchical Image-to-Image Translation by Transferring from GANs
  • Distributional Robustness with IPMs and links to Regularization and GANs
  • A shooting formulation of deep learning
  • CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances
  • Learning Implicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning
  • MATE: Plugging in Model Awareness to Task Embedding for Meta Learning
  • Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits
  • Predictive Information Accelerates Learning in RL
  • Robust and Heavy-Tailed Mean Estimation Made Simple, via Regret Minimization
  • High-Fidelity Generative Image Compression
  • A Statistical Mechanics Framework for Task-Agnostic Sample Design in Machine Learning
  • Counterexample-Guided Learning of Monotonic Neural Networks
  • A Novel Approach for Constrained Optimization in Graphical Models
  • Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology
  • On the Trade-off between Adversarial and Backdoor Robustness
  • Implicit Graph Neural Networks
  • Rethinking Importance Weighting for Deep Learning under Distribution Shift
  • Guiding Deep Molecular Optimization with Genetic Exploration
  • Temporal Spike Sequence Learning via Backpropagation for Deep Spiking Neural Networks
  • TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation
  • Neural Topographic Factor Analysis for fMRI Data
  • Neural Architecture Generator Optimization
  • A Bandit Learning Algorithm and Applications to Auction Design
  • MetaPoison: Practical General-purpose Clean-label Data Poisoning
  • Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation
  • Training Generative Adversarial Networks with Limited Data
  • Deeply Learned Spectral Total Variation Decomposition
  • FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training
  • Improving Neural Network Training in Low Dimensional Random Bases
  • Safe Reinforcement Learning via Curriculum Induction
  • Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning
  • How Robust are the Estimated Effects of Nonpharmaceutical Interventions against COVID-19?
  • Beyond Individualized Recourse: Interpretable and Interactive Summaries of Actionable Recourses
  • Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization
  • Projection Efficient Subgradient Method and Optimal Nonsmooth Frank-Wolfe Method
  • PGM-Explainer: Probabilistic Graphical Model Explanations for Graph Neural Networks
  • Few-Cost Salient Object Detection with Adversarial-Paced Learning
  • Minimax Estimation of Conditional Moment Models
  • Causal Imitation Learning With Unobserved Confounders
  • Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling
  • Learning Black-Box Attackers with Transferable Priors and Query Feedback
  • Locally Differentially Private (Contextual) Bandits Learning
  • Invertible Gaussian Reparameterization: Revisiting the Gumbel-Softmax
  • Kernel Based Progressive Distillation for Adder Neural Networks
  • Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization
  • Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space
  • The Wasserstein Proximal Gradient Algorithm
  • Universally Quantized Neural Compression
  • Temporal Variability in Implicit Online Learning
  • Investigating Gender Bias in Language Models Using Causal Mediation Analysis
  • Off-Policy Imitation Learning from Observations
  • Escaping Saddle-Point Faster under Interpolation-like Conditions
  • Matérn Gaussian Processes on Riemannian Manifolds
  • Improved Techniques for Training Score-Based Generative Models
  • wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
  • A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs
  • Instead of Rewriting Foreign Code for Machine Learning, Automatically Synthesize Fast Gradients
  • Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?
  • Value-driven Hindsight Modelling
  • Dynamic Regret of Convex and Smooth Functions
  • On Convergence of Nearest Neighbor Classifiers over Feature Transformations
  • Mitigating Manipulation in Peer Review via Randomized Reviewer Assignments
  • Contrastive learning of global and local features for medical image segmentation with limited annotations
  • Self-Supervised Graph Transformer on Large-Scale Molecular Data
  • Generative Neurosymbolic Machines
  • How many samples is a good initial point worth in Low-rank Matrix Recovery?
  • CSER: Communication-efficient SGD with Error Reset
  • Efficient estimation of neural tuning during naturalistic behavior
  • High-recall causal discovery for autocorrelated time series with latent confounders
  • Forget About the LiDAR: Self-Supervised Depth Estimators with MED Probability Volumes
  • Joint Contrastive Learning with Infinite Possibilities
  • Robust Gaussian Covariance Estimation in Nearly-Matrix Multiplication Time
  • Adversarially-learned Inference via an Ensemble of Discrete Undirected Graphical Models
  • GS-WGAN: A Gradient-Sanitized Approach for Learning Differentially Private Generators
  • SurVAE Flows: Surjections to Bridge the Gap between VAEs and Flows
  • Learning Causal Effects via Weighted Empirical Risk Minimization
  • Revisiting the Sample Complexity of Sparse Spectrum Approximation of Gaussian Processes
  • Incorporating Interpretable Output Constraints in Bayesian Neural Networks
  • Multi-Stage Influence Function
  • Probabilistic Fair Clustering
  • Stochastic Segmentation Networks: Modelling Spatially Correlated Aleatoric Uncertainty
  • ICE-BeeM: Identifiable Conditional Energy-Based Deep Models Based on Nonlinear ICA
  • Testing Determinantal Point Processes
  • CogLTX: Applying BERT to Long Texts
  • f-GAIL: Learning f-Divergence for Generative Adversarial Imitation Learning
  • Non-parametric Models for Non-negative Functions
  • Uncertainty Aware Semi-Supervised Learning on Graph Data
  • ConvBERT: Improving BERT with Span-based Dynamic Convolution
  • Practical No-box Adversarial Attacks against DNNs
  • Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model
  • Walking in the Shadow: A New Perspective on Descent Directions for Constrained Minimization
  • Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks
  • Reward Propagation Using Graph Convolutional Networks
  • LoopReg: Self-supervised Learning of Implicit Surface Correspondences, Pose and Shape for 3D Human Mesh Registration
  • Fully Dynamic Algorithm for Constrained Submodular Optimization
  • Robust Optimal Transport with Applications in Generative Modeling and Domain Adaptation
  • Autofocused oracles for model-based design
  • Debiasing Averaged Stochastic Gradient Descent to handle missing values
  • Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning
  • CompRess: Self-Supervised Learning by Compressing Representations
  • Sample complexity and effective dimension for regression on manifolds
  • The phase diagram of approximation rates for deep neural networks
  • Timeseries Anomaly Detection using Temporal Hierarchical One-Class Network
  • EcoLight: Intersection Control in Developing Regions Under Extreme Budget and Network Constraints
  • Reconstructing Perceptive Images from Brain Activity by Shape-Semantic GAN
  • Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
  • A Spectral Energy Distance for Parallel Speech Synthesis
  • Simulating a Primary Visual Cortex at the Front of CNNs Improves Robustness to Image Perturbations
  • Learning from Positive and Unlabeled Data with Arbitrary Positive Shift
  • Deep Energy-based Modeling of Discrete-Time Physics
  • Quantifying Learnability and Describability of Visual Concepts Emerging in Representation Learning
  • Self-Learning Transformations for Improving Gaze and Head Redirection
  • Language-Conditioned Imitation Learning for Robot Manipulation Tasks
  • POMDPs in Continuous Time and Discrete Spaces
  • Exemplar Guided Active Learning
  • Grasp Proposal Networks: An End-to-End Solution for Visual Learning of Robotic Grasps
  • Node Embeddings and Exact Low-Rank Representations of Complex Networks
  • Fictitious Play for Mean Field Games: Continuous Time Analysis and Applications
  • Steering Distortions to Preserve Classes and Neighbors in Supervised Dimensionality Reduction
  • On Infinite-Width Hypernetworks
  • Interferobot: aligning an optical interferometer by a reinforcement learning agent
  • Program Synthesis with Pragmatic Communication
  • Principal Neighbourhood Aggregation for Graph Nets
  • Reliable Graph Neural Networks via Robust Aggregation
  • Instance Selection for GANs
  • Linear Disentangled Representations and Unsupervised Action Estimation
  • Video Frame Interpolation without Temporal Priors
  • Learning compositional functions via multiplicative weight updates
  • Sample Complexity of Uniform Convergence for Multicalibration
  • Differentiable Neural Architecture Search in Equivalent Space with Exploration Enhancement
  • The interplay between randomness and structure during learning in RNNs
  • A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks
  • Instance-wise Feature Grouping
  • Robust Disentanglement of a Few Factors at a Time
  • PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning
  • Group Contextual Encoding for 3D Point Clouds
  • Latent Bandits Revisited
  • Is normalization indispensable for training deep neural network?
  • Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions
  • Intra Order-preserving Functions for Calibration of Multi-Class Neural Networks
  • Linear Time Sinkhorn Divergences using Positive Features
  • VarGrad: A Low-Variance Gradient Estimator for Variational Inference
  • A Convolutional Auto-Encoder for Haplotype Assembly and Viral Quasispecies Reconstruction
  • Promoting Stochasticity for Expressive Policies via a Simple and Efficient Regularization Method
  • Adversarial Counterfactual Learning and Evaluation for Recommender System
  • Memory-Efficient Learning of Stable Linear Dynamical Systems for Prediction and Control
  • Evolving Normalization-Activation Layers
  • ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training
  • RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder
  • Efficient Learning of Discrete Graphical Models
  • Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and ReLUs under Gaussian Marginals
  • Neurosymbolic Transformers for Multi-Agent Communication
  • Fairness in Streaming Submodular Maximization: Algorithms and Hardness
  • Smoothed Geometry for Robust Attribution
  • Fast Adversarial Robustness Certification of Nearest Prototype Classifiers for Arbitrary Seminorms
  • Multi-agent active perception with prediction rewards
  • A Local Temporal Difference Code for Distributional Reinforcement Learning
  • Learning with Optimized Random Features: Exponential Speedup by Quantum Machine Learning without Sparsity and Low-Rank Assumptions
  • CaSPR: Learning Canonical Spatiotemporal Point Cloud Representations
  • Deep Automodulators
  • Convolutional Tensor-Train LSTM for Spatio-Temporal Learning
  • The Potts-Ising model for discrete multivariate data
  • Interpretable multi-timescale models for predicting fMRI responses to continuous natural speech
  • Group-Fair Online Allocation in Continuous Time
  • Decentralized TD Tracking with Linear Function Approximation and its Finite-Time Analysis
  • Understanding Gradient Clipping in Private SGD: A Geometric Perspective
  • O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers
  • Identifying signal and noise structure in neural population activity with Gaussian process factor models
  • Equivariant Networks for Hierarchical Structures
  • MinMax Methods for Optimal Transport and Beyond: Regularization, Approximation and Numerics
  • A Discrete Variational Recurrent Topic Model without the Reparametrization Trick
  • Transferable Graph Optimizers for ML Compilers
  • Learning with Operator-valued Kernels in Reproducing Kernel Krein Spaces
  • Learning Bounds for Risk-sensitive Learning
  • Simplifying Hamiltonian and Lagrangian Neural Networks via Explicit Constraints
  • Beyond accuracy: quantifying trial-by-trial behaviour of CNNs and humans by measuring error consistency
  • Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations
  • Constant-Expansion Suffices for Compressed Sensing with Generative Priors
  • RANet: Region Attention Network for Semantic Segmentation
  • A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent
  • Learning sparse codes from compressed representations with biologically plausible local wiring constraints
  • Self-Imitation Learning via Generalized Lower Bound Q-learning
  • Private Learning of Halfspaces: Simplifying the Construction and Reducing the Sample Complexity
  • Directional Pruning of Deep Neural Networks
  • Smoothly Bounding User Contributions in Differential Privacy
  • Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
  • Online Planning with Lookahead Policies
  • Learning Deep Attribution Priors Based On Prior Knowledge
  • Using noise to probe recurrent neural network structure and prune synapses
  • NanoFlow: Scalable Normalizing Flows with Sublinear Parameter Complexity
  • Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge
  • Neural FFTs for Universal Texture Image Synthesis
  • Graph Cross Networks with Vertex Infomax Pooling
  • Instance-optimality in differential privacy via approximate inverse sensitivity mechanisms
  • Calibration of Shared Equilibria in General Sum Partially Observable Markov Games
  • MOPO: Model-based Offline Policy Optimization
  • Building powerful and equivariant graph neural networks with structural message-passing
  • Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning
  • Practical Low-Rank Communication Compression in Decentralized Deep Learning
  • Mutual exclusivity as a challenge for deep neural networks
  • 3D Shape Reconstruction from Vision and Touch
  • GradAug: A New Regularization Method for Deep Neural Networks
  • An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay
  • Learning Utilities and Equilibria in Non-Truthful Auctions
  • Rational neural networks
  • DISK: Learning local features with policy gradient
  • Transfer Learning via $\ell_1$ Regularization
  • GOCor: Bringing Globally Optimized Correspondence Volumes into Your Neural Network
  • Deep Inverse Q-learning with Constraints
  • Optimistic Dual Extrapolation for Coherent Non-monotone Variational Inequalities
  • Prediction with Corrupted Expert Advice
  • Human Parsing Based Texture Transfer from Single Image to 3D Human via Cross-View Consistency
  • Knowledge Augmented Deep Neural Networks for Joint Facial Expression and Action Unit Recognition
  • Point process models for sequence detection in high-dimensional neural spike trains
  • Adversarial Attacks on Linear Contextual Bandits
  • Meta-Consolidation for Continual Learning
  • Organizing recurrent network dynamics by task-computation to enable continual learning
  • Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting
  • Kernel Methods Through the Roof: Handling Billions of Points Efficiently
  • Spike and slab variational Bayes for high dimensional logistic regression
  • Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness
  • Fast geometric learning with symbolic matrices
  • MESA: Boost Ensemble Imbalanced Learning with MEta-SAmpler
  • CoinPress: Practical Private Mean and Covariance Estimation
  • Planning with General Objective Functions: Going Beyond Total Rewards
  • Scattering GCN: Overcoming Oversmoothness in Graph Convolutional Networks
  • KFC: A Scalable Approximation Algorithm for $k$-center Fair Clustering
  • Leveraging Predictions in Smoothed Online Convex Optimization via Gradient-based Algorithms
  • Learning the Linear Quadratic Regulator from Nonlinear Observations
  • Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
  • Scalable Graph Neural Networks via Bidirectional Propagation
  • Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning
  • Assisted Learning: A Framework for Multi-Organization Learning
  • The Strong Screening Rule for SLOPE
  • STLnet: Signal Temporal Logic Enforced Multivariate Recurrent Neural Networks
  • Election Coding for Distributed Learning: Protecting SignSGD against Byzantine Attacks
  • Reducing Adversarially Robust Learning to Non-Robust PAC Learning
  • Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad Samples
  • Black-Box Optimization with Local Generative Surrogates
  • Efficient Generation of Structured Objects with Constrained Adversarial Networks
  • Hard Example Generation by Texture Synthesis for Cross-domain Shape Similarity Learning
  • Recovery of sparse linear classifiers from mixture of responses
  • Efficient Distance Approximation for Structured High-Dimensional Distributions via Learning
  • A Single Recipe for Online Submodular Maximization with Adversarial or Stochastic Constraints
  • Learning Sparse Prototypes for Text Generation
  • Implicit Rank-Minimizing Autoencoder
  • Storage Efficient and Dynamic Flexible Runtime Channel Pruning via Deep Reinforcement Learning
  • Task-Oriented Feature Distillation
  • Entropic Causal Inference: Identifiability and Finite Sample Results
  • Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement
  • Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence Analysis
  • AdaTune: Adaptive Tensor Program Compilation Made Efficient
  • When Do Neural Networks Outperform Kernel Methods?
  • STEER: Simple Temporal Regularization For Neural ODE
  • A Variational Approach for Learning from Positive and Unlabeled Data
  • Efficient Clustering Based On A Unified View Of $K$-means And Ratio-cut
  • Recurrent Switching Dynamical Systems Models for Multiple Interacting Neural Populations
  • Coresets via Bilevel Optimization for Continual Learning and Streaming
  • Generalized Independent Noise Condition for Estimating Latent Variable Causal Graphs
  • Understanding and Exploring the Network with Stochastic Architectures
  • All-or-nothing statistical and computational phase transitions in sparse spiked matrix estimation
  • Deep Evidential Regression
  • Analytical Probability Distributions and Exact Expectation-Maximization for Deep Generative Networks
  • Bayesian Pseudocoresets
  • See, Hear, Explore: Curiosity via Audio-Visual Association
  • Adversarial Training is a Form of Data-dependent Operator Norm Regularization
  • A Biologically Plausible Neural Network for Slow Feature Analysis
  • Learning Feature Sparse Principal Subspace
  • Online Adaptation for Consistent Mesh Reconstruction in the Wild
  • Online learning with dynamics: A minimax perspective
  • Learning to Select Best Forecast Tasks for Clinical Outcome Prediction
  • Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping
  • Adaptive Experimental Design with Temporal Interference: A Maximum Likelihood Approach
  • From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering
  • The Autoencoding Variational Autoencoder
  • A Fair Classifier Using Kernel Density Estimation
  • A Randomized Algorithm to Reduce the Support of Discrete Measures
  • Distributionally Robust Federated Averaging
  • Sharp uniform convergence bounds through empirical centralization
  • COBE: Contextualized Object Embeddings from Narrated Instructional Video
  • Knowledge Transfer in Multi-Task Deep Reinforcement Learning for Continuous Control
  • Finite Versus Infinite Neural Networks: an Empirical Study
  • Supermasks in Superposition
  • Nonasymptotic Guarantees for Spiked Matrix Recovery with Generative Priors
  • Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition
  • Learning to Incentivize Other Learning Agents
  • Displacement-Invariant Matching Cost Learning for Accurate Optical Flow Estimation
  • Distributionally Robust Local Non-parametric Conditional Estimation
  • Robust Multi-Object Matching via Iterative Reweighting of the Graph Connection Laplacian
  • Meta-Gradient Reinforcement Learning with an Objective Discovered Online
  • Learning Strategy-Aware Linear Classifiers
  • Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
  • Calibrating Deep Neural Networks using Focal Loss
  • Optimizing Mode Connectivity via Neuron Alignment
  • Information Theoretic Regret Bounds for Online Nonlinear Control
  • A kernel test for quasi-independence
  • First Order Constrained Optimization in Policy Space
  • Learning Augmented Energy Minimization via Speed Scaling
  • Exploiting MMD and Sinkhorn Divergences for Fair and Transferable Representation Learning
  • Deep Rao-Blackwellised Particle Filters for Time Series Forecasting
  • Why are Adaptive Methods Good for Attention Models?
  • Neural Sparse Representation for Image Restoration
  • Boosting First-Order Methods by Shifting Objective: New Schemes with Faster Worst-Case Rates
  • Robust Sequence Submodular Maximization
  • Certified Monotonic Neural Networks
  • System Identification with Biophysical Constraints: A Circuit Model of the Inner Retina
  • Efficient Algorithms for Device Placement of DNN Graph Operators
  • Active Invariant Causal Prediction: Experiment Selection through Stability
  • BOSS: Bayesian Optimization over String Spaces
  • Model Interpretability through the lens of Computational Complexity
  • Markovian Score Climbing: Variational Inference with KL(p||q)
  • Improved Analysis of Clipping Algorithms for Non-convex Optimization
  • Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs
  • A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection
  • StratLearner: Learning a Strategy for Misinformation Prevention in Social Networks
  • A Unified Switching System Perspective and Convergence Analysis of Q-Learning Algorithms
  • Kernel Alignment Risk Estimator: Risk Prediction from Training Data
  • Calibrating CNNs for Lifelong Learning
  • Online Convex Optimization Over Erdos-Renyi Random Networks
  • Robustness of Bayesian Neural Networks to Gradient-Based Attacks
  • Parametric Instance Classification for Unsupervised Visual Feature learning
  • Sparse Weight Activation Training
  • Collapsing Bandits and Their Application to Public Health Intervention
  • Neural Sparse Voxel Fields
  • A Flexible Framework for Designing Trainable Priors with Adaptive Smoothing and Game Encoding
  • The Discrete Gaussian for Differential Privacy
  • Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing
  • Adaptive Importance Sampling for Finite-Sum Optimization and Sampling with Decreasing Step-Sizes
  • Learning efficient task-dependent representations with synaptic plasticity
  • A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions
  • Error Bounds of Imitating Policies and Environments
  • Disentangling Human Error from Ground Truth in Segmentation of Medical Images
  • Consequences of Misaligned AI
  • Promoting Coordination through Policy Regularization in Multi-Agent Deep Reinforcement Learning
  • Emergent Reciprocity and Team Formation from Randomized Uncertain Social Preferences
  • Hitting the High Notes: Subset Selection for Maximizing Expected Order Statistics
  • Towards Scale-Invariant Graph-related Problem Solving by Iterative Homogeneous GNNs
  • Regret Bounds without Lipschitz Continuity: Online Learning with Relative-Lipschitz Losses
  • The Lottery Ticket Hypothesis for Pre-trained BERT Networks
  • Label-Aware Neural Tangent Kernel: Toward Better Generalization and Local Elasticity
  • Beyond Perturbations: Learning Guarantees with Arbitrary Adversarial Test Examples
  • AdvFlow: Inconspicuous Black-box Adversarial Attacks using Normalizing Flows
  • Few-shot Image Generation with Elastic Weight Consolidation
  • On the Expressiveness of Approximate Inference in Bayesian Neural Networks
  • Non-Crossing Quantile Regression for Distributional Reinforcement Learning
  • Dark Experience for General Continual Learning: a Strong, Simple Baseline
  • Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping
  • Neural encoding with visual attention
  • On the linearity of large non-linear models: when and why the tangent kernel is constant
  • PLLay: Efficient Topological Layer based on Persistent Landscapes
  • Decentralized Langevin Dynamics for Bayesian Learning
  • Shared Space Transfer Learning for analyzing multi-site fMRI data
  • The Diversified Ensemble Neural Network
  • Inductive Quantum Embedding
  • Variational Bayesian Unlearning
  • Batched Coarse Ranking in Multi-Armed Bandits
  • Understanding and Improving Fast Adversarial Training
  • Coded Sequential Matrix Multiplication For Straggler Mitigation
  • Attack of the Tails: Yes, You Really Can Backdoor Federated Learning
  • Certifiably Adversarially Robust Detection of Out-of-Distribution Data
  • Domain Generalization via Entropy Regularization
  • Bayesian Meta-Learning for the Few-Shot Setting via Deep Kernels
  • Skeleton-bridged Point Completion: From Global Inference to Local Adjustment
  • Compressing Images by Encoding Their Latent Representations with Relative Entropy Coding
  • Improved Guarantees for k-means++ and k-means++ Parallel
  • Sparse Spectrum Warped Input Measures for Nonstationary Kernel Learning
  • An Efficient Adversarial Attack for Tree Ensembles
  • Learning Continuous System Dynamics from Irregularly-Sampled Partial Observations
  • Online Bayesian Persuasion
  • Robust Pre-Training by Adversarial Contrastive Learning
  • Random Walk Graph Neural Networks
  • Explore Aggressively, Update Conservatively: Stochastic Extragradient Methods with Variable Stepsize Scaling
  • Fast and Accurate k-means++ via Rejection Sampling
  • Variational Amodal Object Completion
  • When Counterpoint Meets Chinese Folk Melodies
  • Sub-linear Regret Bounds for Bayesian Optimisation in Unknown Search Spaces
  • Universal Domain Adaptation through Self Supervision
  • Patch2Self: Denoising Diffusion MRI with Self-Supervised Learning
  • Stochastic Normalization
  • Constrained episodic reinforcement learning in concave-convex and knapsack settings
  • On Learning Ising Models under Huber’s Contamination Model
  • Cross-validation Confidence Intervals for Test Error
  • DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation
  • Bayesian Attention Modules
  • Robustness Analysis of Non-Convex Stochastic Gradient Descent using Biased Expectations
  • SoftFlow: Probabilistic Framework for Normalizing Flow on Manifolds
  • A meta-learning approach to (re)discover plasticity rules that carve a desired function into a neural network
  • Greedy Optimization Provably Wins the Lottery: Logarithmic Number of Winning Tickets is Enough
  • Path Integral Based Convolution and Pooling for Graph Neural Networks
  • Estimating the Effects of Continuous-valued Interventions using Generative Adversarial Networks
  • Latent Dynamic Factor Analysis of High-Dimensional Neural Recordings
  • Conditioning and Processing: Techniques to Improve Information-Theoretic Generalization Bounds
  • Bongard-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning
  • GAN Memory with No Forgetting
  • Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games
  • Gaussian Gated Linear Networks
  • Node Classification on Graphs with Few-Shot Novel Labels via Meta Transformed Network Embedding
  • Online Fast Adaptation and Knowledge Accumulation (OSAKA): a New Approach to Continual Learning
  • Convex optimization based on global lower second-order models
  • Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition
  • Relative gradient optimization of the Jacobian term in unsupervised deep learning
  • Self-Supervised Visual Representation Learning from Hierarchical Grouping
  • Optimal Variance Control of the Score-Function Gradient Estimator for Importance-Weighted Bounds
  • Explicit Regularisation in Gaussian Noise Injections
  • Numerically Solving Parametric Families of High-Dimensional Kolmogorov Partial Differential Equations via Deep Learning
  • Finite-Time Analysis for Double Q-learning
  • Learning to Detect Objects with a 1 Megapixel Event Camera
  • End-to-End Learning and Intervention in Games
  • Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms
  • Predictive coding in balanced neural networks with noise, chaos and delays
  • Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs
  • On the Equivalence between Online and Private Learnability beyond Binary Classification
  • AViD Dataset: Anonymized Videos from Diverse Countries
  • Probably Approximately Correct Constrained Learning
  • RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning
  • Decisions, Counterfactual Explanations and Strategic Behavior
  • Hierarchical Patch VAE-GAN: Generating Diverse Videos from a Single Sample
  • A Feasible Level Proximal Point Method for Nonconvex Sparse Constrained Optimization
  • Reservoir Computing meets Recurrent Kernels and Structured Transforms
  • Comprehensive Attention Self-Distillation for Weakly-Supervised Object Detection
  • Linear Dynamical Systems as a Core Computational Primitive
  • Ratio Trace Formulation of Wasserstein Discriminant Analysis
  • PAC-Bayes Analysis Beyond the Usual Bounds
  • Few-shot Visual Reasoning with Meta-Analogical Contrastive Learning
  • MPNet: Masked and Permuted Pre-training for Language Understanding
  • Reinforcement Learning with Feedback Graphs
  • Zap Q-Learning With Nonlinear Function Approximation
  • Lipschitz-Certifiable Training with a Tight Outer Bound
  • Fast Adaptive Non-Monotone Submodular Maximization Subject to a Knapsack Constraint
  • Conformal Symplectic and Relativistic Optimization
  • Bayes Consistency vs. H-Consistency: The Interplay between Surrogate Loss Functions and the Scoring Function Class
  • Inverting Gradients - How easy is it to break privacy in federated learning?
  • Dynamic allocation of limited memory resources in reinforcement learning
  • CryptoNAS: Private Inference on a ReLU Budget
  • A Stochastic Path Integral Differential EstimatoR Expectation Maximization Algorithm
  • CHIP: A Hawkes Process Model for Continuous-time Networks with Scalable and Consistent Estimation
  • SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection
  • Design Space for Graph Neural Networks
  • HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
  • Unbalanced Sobolev Descent
  • Identifying Mislabeled Data using the Area Under the Margin Ranking
  • Combining Deep Reinforcement Learning and Search for Imperfect-Information Games
  • High-Throughput Synchronous Deep RL
  • Contrastive Learning with Adversarial Examples
  • Mixed Hamiltonian Monte Carlo for Mixed Discrete and Continuous Variables
  • Adversarial Sparse Transformer for Time Series Forecasting
  • The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks
  • CLEARER: Multi-Scale Neural Architecture Search for Image Restoration
  • Hierarchical Gaussian Process Priors for Bayesian Neural Network Weights
  • Compositional Explanations of Neurons
  • Calibrated Reliable Regression using Maximum Mean Discrepancy
  • Directional convergence and alignment in deep learning
  • Functional Regularization for Representation Learning: A Unified Theoretical Perspective
  • Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits
  • Understanding Global Feature Contributions With Additive Importance Measures
  • Online Non-Convex Optimization with Imperfect Feedback
  • Co-Tuning for Transfer Learning
  • Multifaceted Uncertainty Estimation for Label-Efficient Deep Learning
  • Continuous Surface Embeddings
  • Succinct and Robust Multi-Agent Communication With Temporal Message Control
  • Big Bird: Transformers for Longer Sequences
  • Neural Execution Engines: Learning to Execute Subroutines
  • Random Reshuffling: Simple Analysis with Vast Improvements
  • Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors
  • Statistical Optimal Transport posed as Learning Kernel Embedding
  • Dual-Resolution Correspondence Networks
  • Advances in Black-Box VI: Normalizing Flows, Importance Weighting, and Optimization
  • f-Divergence Variational Inference
  • Unfolding recurrence by Green’s functions for optimized reservoir computing
  • The Dilemma of TriHard Loss and an Element-Weighted TriHard Loss for Person Re-Identification
  • Disentangling by Subspace Diffusion
  • Towards Neural Programming Interfaces
  • Discovering Symbolic Models from Deep Learning with Inductive Biases
  • Real World Games Look Like Spinning Tops
  • Cooperative Heterogeneous Deep Reinforcement Learning
  • Mitigating Forgetting in Online Continual Learning via Instance-Aware Parameterization
  • ImpatientCapsAndRuns: Approximately Optimal Algorithm Configuration from an Infinite Pool
  • Dense Correspondences between Human Bodies via Learning Transformation Synchronization on Graphs
  • Reasoning about Uncertainties in Discrete-Time Dynamical Systems using Polynomial Forms
  • Applications of Common Entropy for Causal Inference
  • SGD with shuffling: optimal rates without component convexity and large epoch requirements
  • Unsupervised Joint k-node Graph Representations with Compositional Energy-Based Models
  • Neural Manifold Ordinary Differential Equations
  • CO-Optimal Transport
  • Continuous Meta-Learning without Tasks
  • A mathematical theory of cooperative communication
  • Penalized Langevin dynamics with vanishing penalty for smooth and log-concave targets
  • Learning Invariances in Neural Networks from Training Data
  • A Finite-Time Analysis of Two Time-Scale Actor-Critic Methods
  • Pruning Filter in Filter
  • Learning to Mutate with Hypergradient Guided Population
  • A convex optimization formulation for multivariate regression
  • Online Meta-Critic Learning for Off-Policy Actor-Critic Methods
  • The All-or-Nothing Phenomenon in Sparse Tensor PCA
  • Synthesize, Execute and Debug: Learning to Repair for Neural Program Synthesis
  • ARMA Nets: Expanding Receptive Field for Dense Prediction
  • Diversity-Guided Multi-Objective Bayesian Optimization With Batch Evaluations
  • SOLOv2: Dynamic and Fast Instance Segmentation
  • Robust Recovery via Implicit Bias of Discrepant Learning Rates for Double Over-parameterization
  • Axioms for Learning from Pairwise Comparisons
  • Continuous Regularized Wasserstein Barycenters
  • Spectral Temporal Graph Neural Network for Multivariate Time-series Forecasting
