- Semigroups and Automata Theory
- Advanced Algebra and Geometry
- Natural Language Processing Techniques
- Analytic Number Theory Research
- Algorithms and Data Compression
- Algebraic Geometry and Number Theory
- Mathematical Dynamics and Fractals
- Phonetics and Phonology Research
- Gender Studies in Language
- Advanced Algebra and Logic
- Advanced Combinatorial Mathematics
- Graph Theory and Algorithms
- Neural Networks and Applications
- Interconnection Networks and Systems
- Adversarial Robustness in Machine Learning
- Advanced Image Fusion Techniques
- Parallel Computing and Optimization Techniques
- Advanced Image and Video Retrieval Techniques
- Random Matrices and Applications
- Advanced Neural Network Applications
- Advanced Mathematical Theories and Applications
- Explainable Artificial Intelligence (XAI)
- Domain Adaptation and Few-Shot Learning
- Multimodal Machine Learning Applications
- Topic Modeling
Stanford University
2021
University of Michigan
2018-2021
Williams College
2018
Princeton University
2018
Columbia University
2018
Dartmouth College
2018
University of California, Riverside
2018
Graph-based algorithms have gained significant interest in several application domains. Solutions addressing their computational efficiency have mostly relied on many-core architectures. Cleverly laying out input graphs in storage, by placing adjacent vertices in the same storage unit (memory bank or cache unit), enables fast access during graph traversal. Dynamic graphs, however, must be continuously repartitioned to leverage this benefit. Yet software repartitioning solutions rely on costly,...
Recent advances in efficient Transformers have exploited either the sparsity or low-rank properties of attention matrices to reduce the computational and memory bottlenecks of modeling long sequences. However, it is still challenging to balance the trade-off between model quality and efficiency when performing a one-size-fits-all approximation for different tasks. To better understand this trade-off, we observe that sparse and low-rank approximations excel in different regimes, determined by the softmax temperature in attention, and that sparse + low-rank can outperform...
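The temperature-dependent regime split described in this abstract can be seen in a small numpy experiment. This is an illustrative sketch, not the paper's method: the matrix sizes, the temperatures, and the choice of top-k truncation versus truncated SVD as stand-ins for "sparse" and "low-rank" approximations are all assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 64, 16
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def topk_sparse(A, k):
    # sparse stand-in: keep the k largest entries per row, zero the rest
    out = np.zeros_like(A)
    idx = np.argpartition(A, -k, axis=1)[:, -k:]
    rows = np.arange(A.shape[0])[:, None]
    out[rows, idx] = A[rows, idx]
    return out

def lowrank(A, r):
    # low-rank stand-in: best rank-r approximation via truncated SVD
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

for tau, label in [(0.1, "low temperature (peaky rows)"), (10.0, "high temperature (flat rows)")]:
    A = softmax(Q @ K.T / tau)
    err_s = np.linalg.norm(A - topk_sparse(A, 4))
    err_l = np.linalg.norm(A - lowrank(A, 4))
    print(f"{label}: sparse err {err_s:.3f}, low-rank err {err_l:.3f}")
```

At low temperature each attention row is nearly one-hot, so the sparse approximation wins; at high temperature the rows are nearly uniform (close to rank one), so the low-rank approximation wins.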
Mechanistic interpretability aims to explain what a neural network has learned at a nuts-and-bolts level. What are the fundamental primitives of neural representations? Previous mechanistic descriptions have used individual neurons or their linear combinations to understand the representations a network has learned. But there are clues that these are not the correct units of description: directions cannot describe how networks use nonlinearities to structure their representations. Moreover, many neurons are polysemantic (i.e. they represent multiple...
Let $\mathcal E: y^2 = x^3 + A(T)x + B(T)$ be a nontrivial one-parameter family of elliptic curves over $\mathbb{Q}(T)$, with $A(T), B(T) \in \mathbb Z[T]$, and consider the $k$\textsuperscript{th} moments $A_{k,\mathcal{E}}(p) := \sum_{t \bmod p} a_{\mathcal{E}_t}(p)^k$ of the Dirichlet coefficients $a_{\mathcal{E}_t}(p) := p + 1 - |\mathcal{E}_t(\mathbb{F}_p)|$. Rosen and Silverman proved a conjecture of Nagao relating the first moment $A_{1,\mathcal{E}}(p)$ to the rank of the family, and Michel proved that if $j(T)$ is not constant then the second...
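For small primes the coefficients $a_{\mathcal{E}_t}(p)$ and the first moment can be computed by brute-force point counting. The family $\mathcal E_t: y^2 = x^3 + tx + 1$ below is a hypothetical choice for illustration only (it is not the family from the abstract), and singular fibers are not treated specially:

```python
def a_p(A, B, p):
    # a(p) = p + 1 - #E(F_p) for E: y^2 = x^3 + A*x + B over F_p (p an odd prime).
    # Each x contributes 2 points if x^3+Ax+B is a nonzero square (Euler's
    # criterion), 1 point if it is zero, plus the point at infinity.
    count = 1  # point at infinity
    for x in range(p):
        rhs = (x * x * x + A * x + B) % p
        if rhs == 0:
            count += 1
        elif pow(rhs, (p - 1) // 2, p) == 1:
            count += 2
    return p + 1 - count

def first_moment(p):
    # A_{1,E}(p) = sum over t mod p of a_{E_t}(p) for the illustrative
    # family E_t: y^2 = x^3 + t*x + 1 (singular fibers, where
    # 4t^3 + 27 = 0 mod p, are included without special handling)
    return sum(a_p(t, 1, p) for t in range(p))

print(first_moment(13))
```

As a sanity check, `a_p(1, 1, 5)` returns $-3$, the familiar coefficient $a_5$ of $y^2 = x^3 + x + 1$, and each nonsingular fiber satisfies the Hasse bound $|a_{\mathcal{E}_t}(p)| \le 2\sqrt{p}$.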
Recently Burkhardt et al. introduced the $k$-checkerboard random matrix ensembles, which have a split limiting behavior of the eigenvalues (in the limit, all but $k$ of the eigenvalues are of order $\sqrt{N}$ and converge to semi-circular behavior, with the remaining $k$ eigenvalues of size $N$ converging to hollow Gaussian ensembles). We generalize their work to non-Hermitian ensembles with complex eigenvalues; instead of a blip, a new behavior is seen, ranging from multiple satellites to annular rings. These results are based on moment method techniques adapted to the plane...
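The split spectrum in the Hermitian base case is easy to see numerically. A minimal sketch, assuming the real symmetric $k$-checkerboard construction (Gaussian entries, with entries at positions $i \equiv j \pmod{k}$ fixed to a constant $w$); the parameters $N = 200$, $k = 2$, $w = 1$ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k, w = 200, 2, 1.0

# symmetric Gaussian noise, then overwrite the checkerboard positions with w
A = rng.standard_normal((N, N))
A = (A + A.T) / np.sqrt(2)
mask = (np.subtract.outer(np.arange(N), np.arange(N)) % k) == 0
A[mask] = w  # mask is symmetric, so A stays symmetric

eigs = np.sort(np.linalg.eigvalsh(A))
# k "blip" eigenvalues of size ~N/k split off from a bulk of order sqrt(N)
print("blip:", eigs[-k:], "bulk edge:", eigs[-k - 1])
```

With these parameters the two largest eigenvalues sit near $N/k = 100$ while the rest stay within a semicircular bulk of width $O(\sqrt{N})$.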
An equivalent definition of the Fibonacci numbers is that they are the unique sequence such that every positive integer can be written uniquely as a sum of non-adjacent terms. We can view this as having bins of length 1, taking at most one element from each bin, and forbidding elements from a neighboring bin whenever a bin is used. We generalize by allowing bins of varying length and varying restrictions on how many elements may be used in a decomposition. We derive conditions on when the resulting sequences have uniqueness of decomposition and when (similar to the Fibonacci case) the number of summands converges to a Gaussian; the main tool...
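The Fibonacci base case described above is the classical Zeckendorf decomposition, which the standard greedy algorithm produces; a minimal sketch (the generalized bin constructions of the abstract are not implemented here):

```python
def zeckendorf(n):
    # Greedy Zeckendorf decomposition: repeatedly take the largest
    # Fibonacci number <= n. The greedy choice automatically yields
    # non-adjacent summands, and the decomposition is unique.
    fibs = [1, 2]  # Fibonacci numbers 1, 2, 3, 5, 8, ...
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    out = []
    for f in reversed(fibs):
        if f <= n:
            out.append(f)
            n -= f
    return out

print(zeckendorf(100))  # → [89, 8, 3]
```

For example, $100 = 89 + 8 + 3$, and no two of these summands are adjacent Fibonacci numbers.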
We derive a refined conjecture for the variance of Gaussian primes across sectors, with a power saving error term, by applying the L-functions Ratios Conjecture. We observe a bifurcation point in the main term, consistent with the Random Matrix Theory (RMT) heuristic previously proposed by Rudnick and Waxman. Our model also identifies a second bifurcation point, undetected by the RMT model, that emerges upon taking into account lower order terms. For sufficiently small sectors, we moreover prove an unconditional result that is consistent with our conjecture down to...
When solving challenging problems, language models (LMs) are able to identify relevant information from long and complicated contexts. To study how LMs solve retrieval tasks in diverse situations, we introduce ORION, a collection of structured retrieval tasks spanning six domains, from text understanding to coding. Each task in ORION can be represented abstractly by a request (e.g. a question) that retrieves an attribute (e.g. the character name) from a context (e.g. a story). We apply causal analysis on 18 open-source models with sizes ranging from 125...
A generalized lexicographic order on words is a lexicographic order where the total order on the alphabet depends on the position of the comparison. A generalized Lyndon word is a finite word which is strictly smallest among its class of rotations with respect to such an order. This notion can be extended to infinite words via their suffixes. We prove a conjecture of Dolce, Restivo, and Reutenauer: every infinite word has a unique nonincreasing factorization into generalized Lyndon words. When this factorization has finitely many terms, we characterize the last term of the factorization. Our methods also show that the infinite generalized Lyndon words are precisely those with infinitely many generalized Lyndon prefixes.
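In the classical special case where the alphabet order is the same at every position, the unique nonincreasing Lyndon factorization is the Chen–Fox–Lyndon factorization, computable in linear time by Duval's algorithm. A sketch of that fixed-order case (the generalized, position-dependent orders of the abstract are not handled):

```python
def lyndon_factorization(s):
    # Duval's algorithm: factor s into a nonincreasing sequence of Lyndon
    # words (Chen-Fox-Lyndon theorem). A Lyndon word is strictly smaller
    # than all of its nontrivial rotations.
    factors = []
    i, n = 0, len(s)
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            # extend the current (possibly repeated) Lyndon prefix
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            # emit one copy of the Lyndon word of length j - k
            factors.append(s[i:i + j - k])
            i += j - k
    return factors

print(lyndon_factorization("banana"))  # → ['b', 'an', 'an', 'a']
```

The output factors concatenate back to the input and are nonincreasing in lexicographic order, e.g. `b >= an >= an >= a`.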