- Stochastic Gradient Optimization Techniques
- Sparse and Compressive Sensing Techniques
- Coding theory and cryptography
- Graph theory and CDMA systems
- Robotics and Sensor-Based Localization
- Cloud Computing and Resource Management
- Complexity and Algorithms in Graphs
- Advanced SAR Imaging Techniques
- Machine Learning and Algorithms
- Scientific Computing and Data Management
- Statistical Methods and Inference
- Magnesium Oxide Properties and Applications
- Cellular Automata and Applications
- Advanced Data Storage Technologies
- Advanced Photocatalysis Techniques
- Parallel Computing and Optimization Techniques
- Synthetic Aperture Radar (SAR) Applications and Techniques
- Data Stream Mining Techniques
- Underwater Acoustics Research
- Copper-based nanomaterials and applications
- Advanced Vision and Imaging
- Cancer Mechanisms and Therapy
- Environmental Impact and Sustainability
- Matrix Theory and Algorithms
- Mathematical Approximation and Integration
Changchun University of Science and Technology
2024
Chinese Academy of Sciences
2016-2024
Aerospace Information Research Institute
2022-2024
Shandong University of Technology
2021-2024
Tianjin Medical University Cancer Institute and Hospital
2024
Hefei University of Technology
2024
Anhui University of Science and Technology
2010-2024
Jianghan University
2024
Yunnan Normal University
2024
Yunnan University
2024
This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications.
Spark SQL is a new module in Apache Spark that integrates relational processing with Spark's functional programming API. Built on our experience with Shark, Spark SQL lets Spark programmers leverage the benefits of relational processing (e.g., declarative queries and optimized storage), and lets SQL users call complex analytics libraries in Spark (e.g., machine learning). Compared to previous systems, Spark SQL makes two main additions. First, it offers much tighter integration between relational and procedural processing, through a declarative DataFrame API that integrates with procedural Spark code. Second, it includes a highly extensible...
Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark's open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shipped with Spark, MLlib supports several languages and provides a high-level API that leverages Spark's rich ecosystem to simplify the development of end-to-end machine learning pipelines. MLlib has experienced rapid growth...
Low-distortion embeddings are critical building blocks for developing random sampling and random projection algorithms for common linear algebra problems. We show that, given a matrix A ∈ ℝ^{n×d} with n ≫ d and a p ∈ [1, 2), with constant probability, we can construct a low-distortion embedding matrix Π ∈ ℝ^{O(poly(d))×n} that embeds A_p, the ℓ_p subspace spanned by A's columns, into (ℝ^{O(poly(d))}, ‖·‖_p); the distortion of our embeddings is only O(poly(d)), and we can compute ΠA in O(nnz(A)) time, i.e., input-sparsity time. Our result generalizes the input-sparsity time ℓ_2...
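The idea of computing ΠA in O(nnz(A)) time can be illustrated with the simpler ℓ_2 case that the abstract says it generalizes. The following is a minimal sketch of a CountSketch-style embedding, not the ℓ_p construction from the paper: each input row is hashed to one sketch row and multiplied by a random sign, so every nonzero of A is touched exactly once. The function names (`count_sketch`, `matvec`, `norm2`) and the sparse-row representation are assumptions made for illustration.

```python
import random


def count_sketch(rows, s, seed=0):
    """Apply a CountSketch-style embedding to a sparse matrix.

    rows: list of dicts; rows[i] maps a column index to a nonzero value.
    s:    number of rows in the sketch.
    Returns s sketched rows in the same sparse format.
    """
    rng = random.Random(seed)
    sketch = [dict() for _ in range(s)]
    for row in rows:
        bucket = rng.randrange(s)       # sketch row that receives this input row
        sign = rng.choice((-1.0, 1.0))  # random sign for this input row
        target = sketch[bucket]
        for j, value in row.items():    # each nonzero is visited once
            target[j] = target.get(j, 0.0) + sign * value
    return sketch


def matvec(rows, x):
    """Multiply a sparse row-list matrix by a dense vector x."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]


def norm2(v):
    """Euclidean norm of a dense vector."""
    return sum(t * t for t in v) ** 0.5


# Illustrative data: a tall sparse matrix with n = 2000 rows and d = 3 columns.
data_rng = random.Random(1)
A = [{data_rng.randrange(3): data_rng.uniform(-1.0, 1.0)} for _ in range(2000)]
x = [1.0, -2.0, 0.5]

SA = count_sketch(A, s=200)
print("||Ax||  =", norm2(matvec(A, x)))
print("||SAx|| =", norm2(matvec(SA, x)))
```

With a sketch size that is polynomial in d, the two printed norms are expected to be close for every x in the column space, which is what makes the sketched problem a usable stand-in for the original one.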
We describe a parallel iterative least squares solver named LSRN that is based on random normal projection. LSRN computes the min-length solution to min_{x∈ℝ^n} ‖Ax − b‖_2, where A ∈ ℝ^{m×n} with m ≫ n or m ≪ n, and where A may be rank-deficient. Tikhonov regularization may also be included. Since A is involved only in matrix-matrix and matrix-vector multiplications, it can be a dense or sparse matrix or a linear operator, and LSRN automatically speeds up when A is sparse or a fast linear operator. The preconditioning phase consists of a random normal projection, which is embarrassingly...
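For the strongly overdetermined case (m ≫ n), preconditioning by random normal projection can be outlined as below. This is a sketch under the simplifying assumption that A has full column rank; the oversampling factor γ > 1 and the symbols G, N, and y are notation chosen here for illustration.

```latex
\begin{aligned}
&G \in \mathbb{R}^{s \times m}, \quad G_{ij} \sim \mathcal{N}(0, 1), \quad s = \lceil \gamma n \rceil, \\
&\tilde{A} = G A = \tilde{U} \tilde{\Sigma} \tilde{V}^{T} \quad \text{(compact SVD)}, \\
&N = \tilde{V} \tilde{\Sigma}^{-1}, \\
&\hat{y} = \arg\min_{y} \| A N y - b \|_2 \quad \text{(solved iteratively)}, \\
&\hat{x} = N \hat{y}.
\end{aligned}
```

The conditioning of AN is governed mainly by γ rather than by the spectrum of A, so the number of iterations of the inner solver can be predicted in advance. The product GA can also be formed block by block, with no communication between blocks.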
Recommender systems have to deal with the cold start problem as new users and/or items are always present. Rating elicitation is a common approach for handling cold start. However, there still lacks a principled model for guiding how to select the most useful ratings. In this paper, we propose to identify representative users and items using representative-based matrix factorization. Not only do we show that the selected representatives are superior to other competing methods in terms of achieving a good balance between coverage and diversity,...
We describe matrix computations available in the cluster programming framework, Apache Spark. Out of the box, Spark provides abstractions and implementations for distributed matrices and optimization routines using these matrices. When translating single-node algorithms to run on a distributed cluster, we observe that often a simple idea is enough: separating matrix operations from vector operations and shipping the matrix operations to be run on the cluster, while keeping vector operations local to the driver. In the case of the Singular Value Decomposition, by taking this idea to an extreme, we are able to exploit...
R is a popular statistical programming language with a number of extensions that support data processing and machine learning tasks. However, interactive data analysis in R is usually limited as the R runtime is single threaded and can only process data sets that fit in a single machine's memory. We present SparkR, an R package that provides a frontend to Apache Spark and uses Spark's distributed computation engine to enable large scale data analysis from the R shell. We describe the main design goals of SparkR, discuss how the high-level DataFrame API enables scalable computation and present some of the key details of our...
Ã and b̃ are a coreset for the problem, consisting of sampled and rescaled rows of A and b; and s is independent of n and polynomial in d. Our results improve on the best previous algorithms when n ≫ d, for all p ∈ [1, ∞) except p = 2; in particular, they improve the O(nd^{1.376+}) running time of Sohler and Woodruff (STOC, 2011) for p = 1, that uses asymptotically fast matrix multiplication, and the O(nd^5 log n) time of Dasgupta et al. (SICOMP, 2009) for general p, that uses ellipsoidal rounding. We also provide a suite of improved results for finding well-conditioned bases via ellipsoidal rounding, illustrating...
In this era of large-scale data, distributed systems built on top of clusters of commodity hardware provide cheap and reliable storage and scalable processing of massive data. With cheap storage, instead of storing only currently relevant data, it is common to store as much data as possible, hoping that its value can be extracted later. In this way, exabytes (10^18 bytes) of data are being created on a daily basis. Extracting value from these data, however, requires scalable implementations of advanced analytical algorithms beyond simple data processing, e.g.,...
Quantile regression is a method to estimate the quantiles of the conditional distribution of a response variable, and as such it permits a much more accurate portrayal of the relationship between the response variable and observed covariates than methods such as least-squares or least absolute deviations regression. It can be expressed as a linear program, and, with appropriate preprocessing, interior-point methods can be used to find a solution for moderately large problems. Dealing with very large problems, e.g., involving data up to and beyond the terabyte regime, remains...
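The objective behind quantile regression is the asymmetric "pinball" (check) loss. The following minimal sketch covers only the intercept-only case, where minimizing the loss over a constant recovers an empirical quantile; it is not the large-scale method the abstract refers to. The function names `pinball_loss` and `empirical_quantile` are hypothetical and chosen for illustration.

```python
def pinball_loss(residual, tau):
    """Check loss: weight tau on positive residuals, (1 - tau) on negative ones."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def empirical_quantile(ys, tau):
    """Return the data point that minimizes the total pinball loss.

    A minimizer of the total loss over a constant q can always be found
    among the observations, so searching the data points is sufficient.
    """
    def total_loss(q):
        return sum(pinball_loss(y - q, tau) for y in ys)

    return min(sorted(ys), key=total_loss)


ys = [1.0, 2.0, 3.0, 4.0, 100.0]
print(empirical_quantile(ys, 0.5))  # median of the sample
print(empirical_quantile(ys, 0.9))  # upper quantile of the sample
```

Replacing the constant q with a linear function of the covariates gives the usual quantile regression problem. Because the loss is piecewise linear, that problem can be written as a linear program.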
We provide fast algorithms for overconstrained $\ell_p$ regression and related problems: for an $n\times d$ input matrix $A$ and vector $b\in\mathbb{R}^n$, in $O(nd\log n)$ time we reduce the problem $\min_{x\in\mathbb{R}^d} \|Ax-b\|_p$ to the same problem with input matrix $\tilde A$ of dimension $s \times d$ and corresponding $\tilde b$ of dimension $s\times 1$. Here, $\tilde A$ and $\tilde b$ are a coreset for the problem, consisting of sampled and rescaled rows of $A$ and $b$; and $s$ is independent of $n$ and polynomial in $d$. Our results improve on the best previous algorithms when $n\gg d$, for all $p\in [1,\infty)$ except $p=2$;...
Proceedings of the 2013 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). The Fast Cauchy Transform and Faster Robust Linear Regression. Kenneth L. Clarkson, Petros Drineas, Malik Magdon-Ismail, Michael W. Mahoney, Xiangrui Meng, David P. Woodruff. pp. 466-477. DOI: https://doi.org/10.1137/1.9781611973105.34. Abstract: We provide fast...
Geometrical optimization of nanowire arrays (NWAs) has been regarded as a straightforward and important route to improve the performance of photoelectrocatalytic systems but has not been realized with copper oxides yet. In this work we successfully performed geometrical control of CuO NWAs via electrochemically prepared Cu(OH)2 intermediate structures. The NWAs consist of uniform nanowires with tunable length from 2 to 10 μm and aspect ratios over 100. Results suggest that geometrical optimization can significantly improve the photocurrents by several times, which is...