Xiangrui Meng

ORCID: 0000-0002-2628-9960
Research Areas
  • Stochastic Gradient Optimization Techniques
  • Sparse and Compressive Sensing Techniques
  • Coding Theory and Cryptography
  • Graph Theory and CDMA Systems
  • Robotics and Sensor-Based Localization
  • Cloud Computing and Resource Management
  • Complexity and Algorithms in Graphs
  • Advanced SAR Imaging Techniques
  • Machine Learning and Algorithms
  • Scientific Computing and Data Management
  • Statistical Methods and Inference
  • Magnesium Oxide Properties and Applications
  • Cellular Automata and Applications
  • Advanced Data Storage Technologies
  • Advanced Photocatalysis Techniques
  • Parallel Computing and Optimization Techniques
  • Synthetic Aperture Radar (SAR) Applications and Techniques
  • Data Stream Mining Techniques
  • Underwater Acoustics Research
  • Copper-Based Nanomaterials and Applications
  • Advanced Vision and Imaging
  • Cancer Mechanisms and Therapy
  • Environmental Impact and Sustainability
  • Matrix Theory and Algorithms
  • Mathematical Approximation and Integration

Changchun University of Science and Technology
2024

Chinese Academy of Sciences
2016-2024

Aerospace Information Research Institute
2022-2024

Shandong University of Technology
2021-2024

Tianjin Medical University Cancer Institute and Hospital
2024

Hefei University of Technology
2024

Anhui University of Science and Technology
2010-2024

Jianghan University
2024

Yunnan Normal University
2024

Yunnan University
2024

This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications.

10.1145/2934664 article EN Communications of the ACM 2016-10-28

Spark SQL is a new module in Apache Spark that integrates relational processing with Spark's functional programming API. Built on our experience with Shark, Spark SQL lets Spark programmers leverage the benefits of relational processing (e.g., declarative queries and optimized storage), and lets SQL users call complex analytics libraries in Spark (e.g., machine learning). Compared to previous systems, Spark SQL makes two main additions. First, it offers much tighter integration between relational and procedural processing, through a declarative DataFrame API that integrates with procedural Spark code. Second, it includes a highly extensible...

10.1145/2723372.2742797 article EN 2015-05-27

Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark's open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shipped with Spark, MLlib supports several languages and provides a high-level API that leverages Spark's rich ecosystem to simplify the development of end-to-end machine learning pipelines. MLlib has experienced rapid growth...

10.48550/arxiv.1505.06807 preprint EN other-oa arXiv (Cornell University) 2015-01-01

Low-distortion embeddings are critical building blocks for developing random sampling and random projection algorithms for common linear algebra problems. We show that, given a matrix $A \in \mathbb{R}^{n \times d}$ with $n \gg d$ and a $p \in [1, 2)$, with a constant probability, we can construct a low-distortion embedding matrix $\Pi \in \mathbb{R}^{O(\mathrm{poly}(d)) \times n}$ that embeds $\mathcal{A}_p$, the $\ell_p$ subspace spanned by $A$'s columns, into $(\mathbb{R}^{O(\mathrm{poly}(d))}, \|\cdot\|_p)$; the distortion of our embeddings is only $O(\mathrm{poly}(d))$, and we can compute $\Pi A$ in $O(\mathrm{nnz}(A))$ time, i.e., input-sparsity time. Our result generalizes the input-sparsity time $\ell_2$...

10.1145/2488608.2488621 article EN 2013-05-28
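
The input-sparsity claim rests on an embedding $\Pi$ that can be applied in a single pass over the nonzeros of $A$. A minimal pure-Python sketch of the $\ell_2$ case that this result generalizes, assuming a CountSketch-style $\Pi$ (each row of $A$ is hashed to one of $t$ output rows with a random sign). The paper's construction for $p \in [1, 2)$ differs and is not reproduced here; the function name `countsketch_apply` and the dict-of-nonzeros input format are hypothetical.

```python
import random


def countsketch_apply(nonzeros, n_rows, n_cols, t, seed=0):
    """Compute Pi @ A for a CountSketch-style Pi with t rows (l2 case only).

    nonzeros: dict {(i, j): value} holding the nonzero entries of an
    n_rows x n_cols matrix A. Pi has a single +/-1 in each column, so the
    product is accumulated in one pass over the nonzeros: O(nnz(A)) work.
    """
    rng = random.Random(seed)
    bucket = [rng.randrange(t) for _ in range(n_rows)]        # hash h(i)
    sign = [rng.choice((-1.0, 1.0)) for _ in range(n_rows)]   # sign s(i)
    PA = [[0.0] * n_cols for _ in range(t)]
    for (i, j), value in nonzeros.items():
        PA[bucket[i]][j] += sign[i] * value
    return PA, bucket, sign
```

Returning `bucket` and `sign` makes it easy to rebuild $\Pi$ densely and check the product; for the embedding to have low distortion, $t$ has to grow polynomially with $d$.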

We describe a parallel iterative least squares solver named LSRN that is based on random normal projection. LSRN computes the min-length solution to $\min_{x \in \mathbb{R}^n} \|Ax - b\|_2$, where $A \in \mathbb{R}^{m \times n}$ with $m \gg n$ or $m \ll n$, and where $A$ may be rank-deficient. Tikhonov regularization may also be included. Since $A$ is involved only in matrix-matrix and matrix-vector multiplications, it can be a dense or sparse matrix or a linear operator, and LSRN automatically speeds up when $A$ is sparse or a fast linear operator. The preconditioning phase consists of a random normal projection, which is embarrassingly...

10.1137/120866580 article EN SIAM Journal on Scientific Computing 2014-01-01
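
A minimal pure-Python sketch of the preconditioning idea: sketch $A$ with an $s \times m$ Gaussian matrix $G$, build a preconditioner $N$ from the small matrix $GA$, solve the well-conditioned problem $\min_y \|ANy - b\|_2$ iteratively, and return $x = Ny$. Several assumptions are not taken from the paper: $m \gg n$ and $A$ has full column rank (no min-length or rank-deficient handling), the oversampling factor 2 is chosen for illustration, a Cholesky factor $R$ of $(GA)^T(GA)$ replaces the SVD of $GA$ (for full-rank $GA$, $AR^{-1}$ has the same singular values as $A V \Sigma^{-1}$), and CGLS replaces LSQR (the two are equivalent in exact arithmetic). All function names are hypothetical.

```python
import math
import random


def matmul(X, Y):
    """Dense product of list-of-lists matrices X (a x k) and Y (k x c)."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def cholesky_upper(M):
    """Upper-triangular R with R^T R = M, for symmetric positive definite M."""
    n = len(M)
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        R[j][j] = math.sqrt(M[j][j] - sum(R[k][j] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            R[j][i] = (M[j][i] - sum(R[k][j] * R[k][i] for k in range(j))) / R[j][j]
    return R


def upper_inverse(R):
    """Inverse of an upper-triangular matrix by back substitution."""
    n = len(R)
    N = [[0.0] * n for _ in range(n)]
    for c in range(n):
        for i in range(n - 1, -1, -1):
            rhs = (1.0 if i == c else 0.0) - sum(R[i][k] * N[k][c] for k in range(i + 1, n))
            N[i][c] = rhs / R[i][i]
    return N


def cgls(B, b, max_iter=50, rtol=1e-12):
    """Conjugate gradient on the normal equations for min_y ||B y - b||_2."""
    m, n = len(B), len(B[0])
    y = [0.0] * n
    r = list(b)                                                      # r = b - B y
    s = [sum(B[i][j] * r[i] for i in range(m)) for j in range(n)]    # s = B^T r
    p = list(s)
    gamma = sum(v * v for v in s)
    stop = rtol * math.sqrt(gamma)
    for _ in range(max_iter):
        q = [sum(B[i][j] * p[j] for j in range(n)) for i in range(m)]  # q = B p
        qq = sum(v * v for v in q)
        if qq == 0.0:
            break
        alpha = gamma / qq
        y = [yj + alpha * pj for yj, pj in zip(y, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = [sum(B[i][j] * r[i] for i in range(m)) for j in range(n)]
        gamma_new = sum(v * v for v in s)
        if math.sqrt(gamma_new) <= stop:
            break
        p = [sj + (gamma_new / gamma) * pj for sj, pj in zip(s, p)]
        gamma = gamma_new
    return y


def lsrn_sketch(A, b, oversample=2.0, seed=0):
    """Random normal projection preconditioning, then CGLS (m >> n, full rank)."""
    m, n = len(A), len(A[0])
    s = math.ceil(oversample * n)
    rng = random.Random(seed)
    G = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(s)]  # s x m Gaussian
    GA = matmul(G, A)                                               # s x n sketch
    M = matmul([list(c) for c in zip(*GA)], GA)                     # (GA)^T (GA)
    N = upper_inverse(cholesky_upper(M))                            # preconditioner
    y = cgls(matmul(A, N), b)                                       # well-conditioned solve
    return [sum(N[i][j] * y[j] for j in range(n)) for i in range(n)]
```

Forming $(GA)^T(GA)$ squares the condition number of the sketch, which is one reason to prefer the SVD route on ill-conditioned inputs.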

Recommender systems have to deal with the cold start problem, as new users and/or items are always present. Rating elicitation is a common approach for handling cold start. However, there still lacks a principled model for guiding how to select the most useful ratings. In this paper, we propose to identify representative users and items using representative-based matrix factorization. Not only do we show that the selected representatives are superior to other competing methods in terms of achieving a good balance between coverage and diversity,...

10.1145/2043932.2043943 article EN 2011-10-23

We describe matrix computations available in the cluster programming framework, Apache Spark. Out of the box, Spark provides abstractions and implementations for distributed matrices and optimization routines using these matrices. When translating single-node algorithms to run on a distributed cluster, we observe that often a simple idea is enough: separating matrix operations from vector operations and shipping the matrix operations to be run on the cluster, while keeping vector operations local to the driver. In the case of the Singular Value Decomposition, by taking this idea to an extreme, we are able to exploit...

10.1145/2939672.2939675 article EN 2016-08-08
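
The separation of matrix and vector operations can be illustrated without a cluster. A minimal pure-Python sketch, assuming the rows of $A$ are split into partitions (plain lists standing in for distributed data): only the product $A^T(Av)$ touches the matrix and is computed partition by partition, while normalization and other vector arithmetic stay in the "driver" loop. Power iteration is used only because it is short; the function names are hypothetical and this is not Spark's API.

```python
import math


def gramian_times(partitions, v):
    """Compute A^T (A v) by summing per-partition contributions.

    Each partition is a list of rows of A. On a cluster, this per-row work
    is what would be shipped to workers; here it is a plain loop.
    """
    n = len(v)
    result = [0.0] * n
    for rows in partitions:                  # one "task" per partition
        partial = [0.0] * n
        for row in rows:
            dot_rv = sum(r * x for r, x in zip(row, v))
            for j in range(n):
                partial[j] += row[j] * dot_rv
        for j in range(n):                   # aggregate partial sums
            result[j] += partial[j]
    return result


def top_singular_value(partitions, n, iters=100):
    """Power iteration on A^T A; all vector arithmetic stays on the 'driver'."""
    v = [1.0 / math.sqrt(n)] * n
    eigval = 0.0
    for _ in range(iters):
        w = gramian_times(partitions, v)          # matrix operation (distributed)
        norm = math.sqrt(sum(x * x for x in w))   # vector operations (local)
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        # For unit v near the top eigenvector, ||A^T A v|| approximates
        # the largest eigenvalue of A^T A.
        eigval = norm
    return math.sqrt(eigval)
```

For `partitions = [[[1.0, 2.0]], [[3.0, 4.0]]]`, the result approximates $\sqrt{15 + \sqrt{221}}$, the largest singular value of that $2 \times 2$ matrix.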

R is a popular statistical programming language with a number of extensions that support data processing and machine learning tasks. However, interactive data analysis in R is usually limited, as the R runtime is single-threaded and can only process data sets that fit in a single machine's memory. We present SparkR, an R package that provides a frontend to Apache Spark and uses Spark's distributed computation engine to enable large-scale data analysis from the R shell. We describe the main design goals of SparkR, discuss how the high-level DataFrame API enables scalable computation, and present some of the key details of our...

10.1145/2882903.2903740 article EN Proceedings of the 2016 International Conference on Management of Data 2016-06-14

...$\tilde A$ and $\tilde b$ are a coreset for the problem, consisting of sampled and rescaled rows of $A$ and $b$; $s$ is independent of $n$ and polynomial in $d$. Our results improve on the best previous algorithms when $n \gg d$, for all $p \in [1, \infty)$ except $p = 2$; in particular, they improve the $O(nd^{1.376+})$ running time of Sohler and Woodruff (STOC, 2011) for $p = 1$, which uses asymptotically fast matrix multiplication, and the $O(nd^5 \log n)$ time of Dasgupta et al. (SICOMP, 2009) for general $p$, which uses ellipsoidal rounding. We also provide a suite of improved results for finding well-conditioned bases via ellipsoidal rounding, illustrating...

10.5555/2627817.2627851 article EN Symposium on Discrete Algorithms 2013-01-06

In this era of large-scale data, distributed systems built on top of clusters of commodity hardware provide cheap and reliable storage and scalable processing of massive data. With cheap storage, instead of storing only currently relevant data, it is common to store as much data as possible, hoping that its value can be extracted later. In this way, exabytes ($10^{18}$ bytes) of data are being created on a daily basis. Extracting value from these data, however, requires scalable implementations of advanced analytical algorithms beyond simple data processing, e.g.,...

10.1109/jproc.2015.2494219 article EN Proceedings of the IEEE 2015-12-17

Quantile regression is a method to estimate the quantiles of the conditional distribution of a response variable, and as such it permits a much more accurate portrayal of the relationship between the response variable and observed covariates than methods such as least-squares or least absolute deviations regression. It can be expressed as a linear program, and, with appropriate preprocessing, interior-point methods can be used to find a solution for moderately large problems. Dealing with very large problems, e.g., involving data up to and beyond the terabyte regime, remains...

10.1137/130919258 article EN SIAM Journal on Scientific Computing 2014-01-01
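
The check (pinball) loss $\rho_\tau(u) = u\,(\tau - \mathbf{1}[u < 0])$ is what makes quantile regression a linear program: minimizing $\sum_i \rho_\tau(b_i - a_i^T x)$ is equivalent to minimizing $\tau \mathbf{1}^T u^+ + (1 - \tau) \mathbf{1}^T u^-$ subject to $Ax + u^+ - u^- = b$ and $u^+, u^- \ge 0$. A minimal pure-Python sketch for the intercept-only model, where the minimizer is an empirical $\tau$-quantile and can be found by scanning the data points; the function names are illustrative and not from the paper.

```python
def pinball(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_objective(c, ys, tau):
    """Sum of check losses for the intercept-only model y_i ~ c."""
    return sum(pinball(y - c, tau) for y in ys)


def fit_intercept_quantile(ys, tau):
    """Minimize the objective over c by scanning the data points.

    The objective is convex and piecewise linear in c with kinks at the y_i,
    so some data point is always a minimizer (exact ties go to the smallest).
    """
    return min(sorted(ys), key=lambda c: quantile_objective(c, ys, tau))
```

For example, `fit_intercept_quantile([7, 1, 3, 9, 5], 0.5)` returns the median 5, and with `tau=0.25` it returns 3.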

We provide fast algorithms for overconstrained $\ell_p$ regression and related problems: for an $n\times d$ input matrix $A$ and vector $b\in\mathbb{R}^n$, in $O(nd\log n)$ time we reduce the problem $\min_{x\in\mathbb{R}^d} \|Ax-b\|_p$ to the same problem with input matrix $\tilde A$ of dimension $s \times d$ and corresponding $\tilde b$ of dimension $s\times 1$. Here, $\tilde A$ and $\tilde b$ are a coreset for the problem, consisting of sampled and rescaled rows of $A$ and $b$; $s$ is independent of $n$ and polynomial in $d$. Our results improve on the best previous algorithms when $n\gg d$, for all $p\in [1,\infty)$ except $p=2$;...

10.1137/140963698 article EN SIAM Journal on Computing 2016-01-01
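
The coreset described in this abstract consists of sampled and rescaled rows. A minimal pure-Python sketch of that step, assuming the sampling probabilities $q_i$ are given: row $i$ is kept with probability $q_i$ and scaled by $q_i^{-1/p}$, so for any fixed $x$ the coreset objective $\sum_{i\ \text{kept}} |a_i^T x - b_i|^p / q_i$ is an unbiased estimate of $\|Ax - b\|_p^p$. How the paper chooses the $q_i$ (from a well-conditioned basis) is not shown, the independent per-row sampling is a simplification, and the function names are illustrative.

```python
import random


def lp_objective(A, b, x, p):
    """Return ||Ax - b||_p^p for list-of-lists A."""
    total = 0.0
    for row, bi in zip(A, b):
        r = sum(a * xj for a, xj in zip(row, x)) - bi
        total += abs(r) ** p
    return total


def sample_and_rescale(A, b, q, p, seed=0):
    """Keep row i with probability q[i] and scale it by q[i] ** (-1/p).

    A kept row then contributes |a_i x - b_i|^p / q[i] to the objective,
    which makes the coreset objective unbiased for ||Ax - b||_p^p.
    """
    rng = random.Random(seed)
    A_tilde, b_tilde = [], []
    for row, bi, qi in zip(A, b, q):
        if rng.random() < qi:                 # Bernoulli(q[i]) sampling
            scale = qi ** (-1.0 / p)
            A_tilde.append([scale * a for a in row])
            b_tilde.append(scale * bi)
    return A_tilde, b_tilde
```

With every $q_i = 1$ the coreset is the original problem; smaller $q_i$ shrink the problem at the cost of variance, which is why the choice of probabilities matters.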

Proceedings of the 2013 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA): The Fast Cauchy Transform and Faster Robust Linear Regression. Kenneth L. Clarkson, Petros Drineas, Malik Magdon-Ismail, Michael W. Mahoney, Xiangrui Meng, David P. Woodruff. pp. 466-477. Abstract: We provide fast...

10.1137/1.9781611973105.34 preprint EN 2013-01-06

Geometrical optimization of nanowire arrays (NWAs) has been regarded as a straightforward and important route to improving the performance of photoelectrocatalytic systems, but it has not yet been realized with copper oxides. In this work, we successfully performed geometrical control of CuO NWAs via electrochemically prepared Cu(OH)2 intermediate structures. The CuO NWAs consist of uniform nanowires with tunable lengths from 2 to 10 μm and aspect ratios over 100. Results suggest that geometrical optimization can significantly improve photocurrents, by several times, which is...

10.1021/acsaem.0c00554 article EN ACS Applied Energy Materials 2020-06-19