- Advanced Vision and Imaging
- Topic Modeling
- Digital Media Forensic Detection
- Image and Signal Denoising Methods
- Image and Object Detection Techniques
- Face recognition and analysis
- Optical measurement and interference techniques
- Robotics and Sensor-Based Localization
- Biomedical Text Mining and Ontologies
- Generative Adversarial Networks and Image Synthesis
- Natural Language Processing Techniques
- Handwritten Text Recognition Techniques
- Web Data Mining and Analysis
- 3D Shape Modeling and Analysis
- Data-Driven Disease Surveillance
- Advanced Image Processing Techniques
- Data Quality and Management
- Anomaly Detection Techniques and Applications
- Scientific Computing and Data Management
- Image Retrieval and Classification Techniques
- Misinformation and Its Impacts
- Semantic Web and Ontologies
- 3D Surveying and Cultural Heritage
- Data Mining Algorithms and Applications
- Image Enhancement Techniques
Nanjing University of Science and Technology
2015-2024
University of Chicago
2024
Google (United States)
2022-2023
University of California, Los Angeles
2017-2022
University of California, Berkeley
2019-2022
Czech Academy of Sciences, Institute of Computer Science
2018
Courant Institute of Mathematical Sciences
2018
New York University
2018
Tel Aviv University
2018
ETH Zurich
2018
We present a conceptually simple yet effective algorithm to detect wireframes in a given image. Compared to the previous methods, which first predict an intermediate heat map and then extract straight lines with heuristic algorithms, our method is end-to-end trainable and can directly output a vectorized wireframe that contains semantically meaningful and geometrically salient junctions and lines. To better understand the quality of the outputs, we propose a new metric for wireframe evaluation that penalizes overlapped line segments...
In this work, we introduce the novel problem of identifying dense canonical 3D coordinate frames from a single RGB image. We observe that each pixel in an image corresponds to a surface in the underlying 3D geometry, where a canonical frame can be identified as represented by three orthogonal axes, one along its normal direction and two in its tangent plane. We propose an algorithm to predict these axes from RGB. Our first insight is that canonical frames computed automatically with recently introduced direction field synthesis methods can provide training data for the task....
Extracting event temporal relations is a critical task for information extraction and plays an important role in natural language understanding. Prior systems leverage deep learning and pre-trained language models to improve the performance of the task. However, these systems often suffer from two shortcomings: 1) when performing maximum a posteriori (MAP) inference based on neural models, previous systems only used structured knowledge that is assumed to be absolutely correct, i.e., hard constraints; 2) biased predictions on dominant...
We present a foundation model for zero-shot metric monocular depth estimation. Our model, Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high-frequency details. The predictions are metric, with absolute scale, without relying on the availability of metadata such as camera intrinsics. And the model is fast, producing a 2.25-megapixel depth map in 0.3 seconds on a standard GPU. These characteristics are enabled by a number of technical contributions, including an efficient multi-scale vision...
We present a simple yet effective end-to-end trainable deep network with geometry-inspired convolutional operators for detecting vanishing points in images. Traditional convolutional neural networks rely on aggregating edge features and do not have mechanisms to directly exploit the geometric properties of vanishing points as the intersections of parallel lines. In this work, we identify a canonical conic space in which the neural network can effectively compute the global geometric information of vanishing points locally, and we propose a novel operator named conic convolution that can be implemented...
The outbreak of the novel coronavirus, COVID-19, has become one of the most severe pandemics in human history. In this paper, we propose to leverage social media users as sensors to simultaneously predict pandemic trends and suggest potential risk factors for public health experts to understand spread situations and recommend proper interventions. More precisely, we develop deep learning models to recognize important entities and their relations over time, thereby establishing dynamic heterogeneous graphs to describe...
Being able to infer 3D structures from 2D images with geometric principles, vanishing points have been a well-recognized concept in 3D vision research. They have been widely used in autonomous driving, SLAM, and AR/VR for applications including road direction estimation, camera calibration, and camera pose estimation. Existing vanishing point detection methods often need to trade off between robustness, precision, and inference speed. In this paper, we introduce VaPiD, a novel neural network-based rapid Vanishing Point Detector that...
We have developed ACROBAT (Annotation for Case Reports using Open Biomedical Annotation Terms), a typing system for detailed information extraction from clinical text. This resource supports the identification and categorization of entities, events, and relations within clinical text documents, including clinical case reports (CCRs) and the free-text components of electronic health records. Using the text of 200 CCRs, we annotated a wide variety of real-world clinical disease presentations. The resulting dataset, MACCROBAT2018, is a rich...
There has been a steady need to precisely extract structured knowledge from the web (i.e. HTML documents). Given a web page, extracting a structured object along with various attributes of interest (e.g. price, publisher, author, and genre for a book) can facilitate a variety of downstream applications such as large-scale knowledge base construction, e-commerce product search, and personalized recommendation. Considering each web page is rendered from an HTML DOM tree, existing approaches formulate the problem as a DOM tree node tagging task. However, they...
Smoke removal is an important and meaningful issue for endoscopic surgery, as it can enhance the visual quality of endoscopic images. Because it is practically impossible to construct a large training dataset of pair-matched endoscopic images with/without smoke, Generative Adversarial Nets (GANs) based models are usually used for endoscopic image desmoking. But they have difficulties in either locating the accurate smoke area, or recovering realistic internal organ tissue details. In this paper, we propose a new approach, called...
Given a web page, extracting an object along with various attributes of interest (e.g. price, publisher, author, and genre for a book) can facilitate a variety of downstream applications such as large-scale knowledge base construction, e-commerce product search, and personalized recommendation. Prior approaches have either relied on computationally expensive visual feature engineering or required large amounts of training data to get acceptable precision. In this paper, we propose a novel method, LeArNing...
We present Recurrent Feature Alignment (ReFA), an end-to-end neural network for the very rapid creation of production-grade face assets from multi-view images. ReFA is on par with the industrial pipelines in quality for producing accurate, complete, registered, and textured assets directly applicable to physically-based rendering, but produces the asset end-to-end, fully automatically at a significantly faster speed of 4.5 FPS, which is unprecedented among neural-based techniques. Our method represents face geometry...
Removing noise and other artifacts in the electrocardiogram (ECG) is a critical preprocessing step for further heart disease analysis and diagnosis. In this paper, we propose a sparse representation based ECG signal denoising and baseline wandering (BW) correction algorithm. Unlike traditional filtering-based methods, like the Fourier or Wavelet transform, which use a fixed basis, the proposed algorithm models the ECG signal as the superposition of a few inner structures plus additive random noise, while those structures can be learned from...
Professional basketball provides an intriguing example of a dynamic spatio-temporal game that incorporates both hidden strategy policies and situational decision making. During a game, the coaches and players are assumed to follow a general game plan, but players are also forced to make spur-of-the-moment decisions based on immediate conditions on the court. However, because it is challenging to process heterogeneous signals on the court and the space of potential actions and outcomes is massive, it is hard for players to find an optimal strategy on the fly given a short amount of time to observe...
COVID-19 has caused lasting damage to almost every domain in public health, society, and economy. To monitor the pandemic trend, existing studies rely on the aggregation of traditional statistical models and epidemic spread theory. In other words, historical statistics of COVID-19, as well as population mobility data, become the essential knowledge for monitoring the pandemic trend. However, these solutions can barely provide precise prediction and satisfactory explanations for long-term disease surveillance while the ubiquitous...
Understanding visually-rich business documents to extract structured data and automate business workflows has been receiving attention both in academia and industry. Although recent multi-modal language models have achieved impressive results, we find that existing benchmarks do not reflect the complexity of real documents seen in industry. In this work, we identify the desiderata for a more comprehensive benchmark and propose one we call Visually Rich Document Understanding (VRDU). VRDU contains two datasets that represent several challenges: rich schema...
3D reconstruction from a single RGB image is a challenging problem in computer vision. Previous methods are usually solely data-driven, which leads to inaccurate 3D shape recovery and limited generalization capability. In this work, we focus on object-level 3D reconstruction and present a geometry-based end-to-end deep learning framework that first detects the mirror plane of reflection symmetry that commonly exists in man-made objects and then predicts depth maps by finding the intra-image pixel-wise correspondence of the symmetry. Our method...