- Algorithms and Data Compression
- Topic Modeling
- Computational Geometry and Mesh Generation
- Natural Language Processing Techniques
- Genomics and Phylogenetic Studies
- Advanced Graph Neural Networks
- RNA and protein synthesis mechanisms
- Complex Network Analysis Techniques
- Advanced Graph Theory Research
- DNA and Biological Computing
- Advanced Text Analysis Techniques
- Data Management and Algorithms
- Graph Theory and Algorithms
- Authorship Attribution and Profiling
- Gene expression and cancer classification
- Digital Image Processing Techniques
- Machine Learning and Algorithms
- Web Data Mining and Analysis
- Computability, Logic, AI Algorithms
- Computer Graphics and Visualization Techniques
- Sentiment Analysis and Opinion Mining
- Genome Rearrangement Algorithms
- Complexity and Algorithms in Graphs
- Constraint Satisfaction and Optimization
- Advanced biosensing and bioanalysis techniques
Stony Brook University
2015-2024
State University of New York
2004-2023
Cornell University
1991-2020
Technion – Israel Institute of Technology
2020
Tampere University
2020
University of Illinois Urbana-Champaign
1985-2020
Institute for Research in Fundamental Sciences
2020
Georgia Institute of Technology
2020
University of Michigan
2020
University of California, Berkeley
2008
We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs.
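The excerpt above does not spell out how a graph becomes something a language model can consume. One common reading is that short random walks play the role of sentences, with vertices as words. The following is a minimal sketch under that assumption; `toy_graph`, `random_walk`, and `build_corpus` are hypothetical names chosen for illustration, and the embedding step itself (for example a skip-gram model trained on the corpus) is left out.

```python
import random


def random_walk(adj, start, length, rng):
    """Return a truncated random walk of at most `length` vertices from `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def build_corpus(adj, walks_per_vertex, length, seed=0):
    """Generate `walks_per_vertex` walks from every vertex, in shuffled order."""
    rng = random.Random(seed)
    vertices = list(adj)
    corpus = []
    for _ in range(walks_per_vertex):
        rng.shuffle(vertices)
        for v in vertices:
            corpus.append(random_walk(adj, v, length, rng))
    return corpus


# Hypothetical toy graph as an adjacency list (undirected edges listed both ways).
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

corpus = build_corpus(toy_graph, walks_per_vertex=2, length=5)
```

Each walk in `corpus` can then be treated like a sentence by any sequence-based embedding method.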
As a result of the redundancy of the genetic code, adjacent pairs of amino acids can be encoded by as many as 36 different pairs of synonymous codons. A species-specific "codon pair bias" provides that some synonymous codon pairs are used more or less frequently than statistically predicted. We synthesized de novo large DNA molecules using hundreds of over- or underrepresented synonymous codon pairs to encode the poliovirus capsid protein. Underrepresented codon pairs caused decreased rates of protein translation, and polioviruses containing such amino acid-independent changes...
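The figure of 36 follows from the standard genetic code: the most redundant amino acids (leucine, serine, arginine) each have six codons, so a pair of them can be written in 6 × 6 ways. The sketch below checks that arithmetic; it assumes the standard codon table (NCBI translation table 1, laid out in T, C, A, G order), and the helper name `synonymous_pair_count` is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons available for each amino acid (stops excluded).
codons_per_aa = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def synonymous_pair_count(aa1, aa2):
    """Number of distinct codon pairs that encode the adjacent amino acids aa1, aa2."""
    return codons_per_aa[aa1] * codons_per_aa[aa2]


max_pairs = max(
    synonymous_pair_count(a, b) for a in codons_per_aa for b in codons_per_aa
)
```

Under these assumptions `max_pairs` is 36, reached for pairs such as leucine followed by arginine, while methionine followed by tryptophan has only one encoding.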
We propose a new computational approach for tracking and detecting statistically significant linguistic shifts in the meaning and usage of words. Such linguistic shifts are especially prevalent on the Internet, where the rapid exchange of ideas can quickly change a word's meaning. Our meta-analysis approach constructs property time series of word usage, and then uses statistically sound change point detection algorithms to identify significant linguistic shifts. We consider and analyze three approaches of increasing complexity to generate such linguistic property time series, the culmination of which uses distributional...
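As a rough illustration of change point detection on a word-usage time series, the sketch below picks the split that maximizes the shift in mean and estimates its significance with a permutation test. This is an assumed, simplified stand-in and not necessarily the procedure used in the paper; the function names and the toy series are hypothetical.

```python
import random
from statistics import fmean


def mean_shift(series, split):
    """Difference between the mean after `split` and the mean before it."""
    return fmean(series[split:]) - fmean(series[:split])


def best_change_point(series):
    """Return (split index, absolute mean shift) for the most pronounced split."""
    splits = range(1, len(series))
    split = max(splits, key=lambda t: abs(mean_shift(series, t)))
    return split, abs(mean_shift(series, split))


def permutation_p_value(series, trials=200, seed=0):
    """Fraction of shuffled series whose best shift is at least the observed one."""
    rng = random.Random(seed)
    _, observed = best_change_point(series)
    shuffled = list(series)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        _, stat = best_change_point(shuffled)
        if stat >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


# Hypothetical usage property that jumps halfway through the observation period.
toy_series = [0.0] * 10 + [5.0] * 10
split, shift = best_change_point(toy_series)
p_value = permutation_p_value(toy_series)
```

A small `p_value` suggests the shift at `split` is unlikely to arise from a random ordering of the same values.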
Exploring the utility of de novo gene synthesis with the aim of designing stably attenuated polioviruses (PV), we followed two strategies to construct PV variants containing synthetic replacements of the capsid coding sequences either by deoptimizing synonymous codon usage (PV-AB) or by maximizing synonymous codon position changes of the existing wild-type (wt) poliovirus codons (PV-SD). Despite 934 nucleotide changes in the capsid coding region, PV-SD RNA produced virus with wild-type characteristics. In contrast, no viable virus was recovered from PV-AB carrying 680 silent...
We present HARP, a novel method for learning low dimensional embeddings of a graph's nodes which preserves higher-order structural features. Our proposed method achieves this by compressing the input graph prior to embedding it, effectively avoiding troublesome embedding configurations (i.e. local minima) which can pose problems to non-convex optimization. HARP works by finding a smaller graph which approximates the global structure of its input. This simplified graph is used to learn a set of initial representations, which serve as good initializations...
Multilingual knowledge graph (KG) embeddings provide latent semantic representations of entities and structured knowledge with cross-lingual inferences, which benefit various knowledge-driven cross-lingual NLP tasks. However, precisely learning such cross-lingual inferences is usually hindered by the low coverage of entity alignment in many KGs. Since many multilingual KGs also provide literal descriptions of entities, in this paper, we introduce an embedding-based approach which leverages a weakly aligned multilingual KG for semi-supervised cross-lingual learning using entity descriptions. Our...
Deep generative models have been enjoying success in modeling continuous data. However, it remains challenging to capture the representations for discrete structures with formal grammars and semantics, e.g., computer programs and molecular structures. How to generate both syntactically and semantically correct data still remains largely an open problem. Inspired by the theory of compilers, where the syntax and semantics check is done via syntax-directed translation (SDT), we propose a novel syntax-directed variational autoencoder (SD-VAE)...
There is a growing interest in mining opinions using sentiment analysis methods from sources such as news, blogs and product reviews. Most of these methods have been developed for English and are difficult to generalize to other languages. We explore an approach utilizing state-of-the-art machine translation technology and perform sentiment analysis on the English translation of a foreign language text. Our experiments indicate that (a) entity sentiment scores obtained by our method are statistically significantly correlated across nine languages of news sources and five languages of a parallel...
The effects of elevated atmospheric CO2 (560 p.p.m.) and subsequent plant responses on the soil microbial community composition associated with trembling aspen were assessed through the classification of 6996 complete ribosomal DNA sequences amplified from the Rhinelander WI free-air CO2 and O3 enrichment (FACE) experiments microbial community metagenome. This in-depth comparative analysis provides an unprecedented, detailed and deep branching profile of population changes incurred as a response to this environmental...
With examples of all 450 functions in action plus tutorial text on the mathematics, this book is the definitive guide to experimenting with Combinatorica, a widely used software package for teaching and research in discrete mathematics. Three interesting classes of exercises are provided--theorem/proof, programming exercises, and experimental explorations--ensuring great flexibility in learning the material. The Combinatorica user community ranges from students to engineers, researchers in computer science, physics,...
We use quantitative media (blogs, and news as a comparison) data generated by a large-scale natural language processing (NLP) text analysis system to perform a comprehensive and comparative study on how company related variables anticipate or reflect the company's stock trading volumes and financial returns. Building on our findings, we give a sentiment-based market-neutral trading strategy which gives consistently favorable returns with low volatility over a long period. Our results are significant in confirming...
The problem of ethnicity identification from names has a variety of important applications, including biomedical research, demographic studies, and marketing. Here we report on the development of an ethnicity classifier where all training data is extracted from public, non-confidential (and hence somewhat unreliable) sources. Our classifier uses hidden Markov models (HMMs) and decision trees to classify names into 13 cultural/ethnic groups with individual group accuracy comparable to earlier binary (e.g., Spanish/non-Spanish)...
Almost all scientific visualization involving surfaces is currently done via triangles. The speed at which such triangulated surfaces can be displayed is crucial to interactive visualization and is bounded by the rate at which triangulated data can be sent to the graphics subsystem for rendering. Partitioning polygonal models into triangle strips can significantly reduce rendering times over transmitting each triangle individually. We present new and efficient algorithms for constructing triangle strips from partially triangulated models, and experimental results showing these strips are on average 15% better than...
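To see why strips reduce transmission cost, the sketch below assumes the common sequential strip convention (as used, for example, by OpenGL's `GL_TRIANGLE_STRIP`), where each vertex after the first two completes one more triangle. It only decodes a strip and counts vertices sent; it does not attempt the strip construction algorithms the abstract refers to, and both function names are hypothetical.

```python
def strip_to_triangles(strip):
    """Expand a sequential triangle strip (list of vertex ids) into triangles."""
    triangles = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        # Swap the first two vertices on odd steps to keep a consistent winding order.
        triangles.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return triangles


def vertices_sent(num_triangles, as_strip):
    """Vertices transmitted for `num_triangles`, as one strip or individually."""
    if num_triangles == 0:
        return 0
    return num_triangles + 2 if as_strip else 3 * num_triangles


triangles = strip_to_triangles([0, 1, 2, 3, 4])
```

Under this convention a single strip of n triangles costs n + 2 vertices instead of 3n, so the saving approaches a factor of three as strips get longer.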
We present WALKLETS, a novel approach for learning multiscale representations of vertices in a network. In contrast to previous works, these representations explicitly encode multi-scale vertex relationships in a way that is analytically derivable.
Sentiment analysis in a multilingual world remains a challenging problem, because developing language-specific sentiment lexicons is an extremely resource-intensive process. Such lexicons remain a scarce resource for most languages. In this paper, we address this lexicon gap by building high-quality sentiment lexicons for 136 major languages. We integrate a variety of linguistic resources to produce an immense knowledge graph. By appropriately propagating from seed words, we construct sentiment lexicons for each component language of our graph. Our lexicons have a polarity agreement of 95.7% with...
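The abstract does not say how polarity is propagated from the seed words. As one plausible illustration, the sketch below runs a simple iterative label propagation over an undirected word graph, keeping seed scores fixed and setting every other word to the average of its neighbors. The function name, the toy edges, and the seed values are all hypothetical.

```python
def propagate_polarity(edges, seeds, iterations=10):
    """Spread seed polarities (+1 / -1) over an undirected word graph."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # Unlabeled words start neutral; seeds start at their given polarity.
    scores = {word: float(seeds.get(word, 0.0)) for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word, nbrs in neighbors.items():
            if word in seeds:
                updated[word] = float(seeds[word])  # seeds stay clamped
            else:
                updated[word] = sum(scores[n] for n in nbrs) / len(nbrs)
        scores = updated
    return scores


# Hypothetical cross-lingual links between translations of "good" and "bad".
toy_edges = [
    ("good", "bueno"), ("bueno", "bon"),
    ("bad", "malo"), ("malo", "mauvais"),
]
scores = propagate_polarity(toy_edges, seeds={"good": 1, "bad": -1})
```

Words whose final score is positive or negative would then be assigned to the corresponding side of the lexicon; a real system would also need edge weights and a confidence threshold.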