Fang Zheng

ORCID: 0000-0003-2395-730X
Research Areas
  • Advanced Data Storage Technologies
  • Distributed and Parallel Computing Systems
  • Scientific Computing and Data Management
  • Parallel Computing and Optimization Techniques
  • Cryptography and Data Security
  • Data Management and Algorithms
  • Interconnection Networks and Systems
  • Advanced Database Systems and Queries
  • Privacy-Preserving Technologies in Data
  • Cloud Computing and Resource Management
  • Cloud Data Security Solutions
  • Embedded Systems Design Techniques
  • Satellite Communication Systems
  • Telecommunications and Broadcasting Technologies
  • Software System Performance and Reliability
  • Advanced Graph Neural Networks
  • Advanced Wireless Communication Techniques
  • Energy Efficient Wireless Sensor Networks
  • Computational Drug Discovery Methods
  • Bioinformatics and Genomic Networks
  • Algorithms and Data Compression
  • Deception detection and forensic psychology
  • Cryptography and Residue Arithmetic
  • Fluoride Effects and Removal
  • Internet Traffic Analysis and Secure E-voting

Jiangxi University of Finance and Economics
2024

Peng Cheng Laboratory
2024

People's Bank of China
2024

Sichuan University
2013-2024

Minzu University of China
2023

Wuhan Business University
2023

Shanxi University of Finance and Economics
2019-2023

East China University of Science and Technology
2023

Huazhong Agricultural University
2007-2021

Ocean University of China
2021

Can artificial intelligence (AI) assist human employees in increasing employee creativity? Drawing on research on AI–human collaboration, job design, and employee creativity, we examine AI assistance in the form of a sequential division of labor within organizations: in a task, AI handles the initial portion, which is well-codified and repetitive, and employees focus on the subsequent portion involving higher-level problem-solving. First, we provide causal evidence from a field experiment conducted at a telemarketing company. We find that AI assistance in generating sales...

10.5465/amj.2022.0426 article EN Academy of Management Journal 2023-03-28

Since IO performance on HPC machines strongly depends on machine characteristics and configuration, it is important to carefully tune IO libraries and make good use of appropriate library APIs. For instance, on current petascale machines, independent IO tends to outperform collective IO, in part due to bottlenecks at the metadata server. The problem is exacerbated by scaling issues, since each IO library scales differently on each machine, and typically, operates efficiently to different levels of scaling on different machines. With scientific codes being run on a...

10.1109/ipdps.2009.5161052 article EN 2009-05-01

Peta-scale scientific applications running on High End Computing (HEC) platforms can generate large volumes of data. For high performance storage and in order to be useful to science end users, such data must be organized in its layout, indexed, sorted, and otherwise manipulated for subsequent presentation, visualization, and detailed analysis. In addition, scientists desire to gain insights into selected data characteristics 'hidden' or 'latent' in these massive datasets while data is being produced by simulations....

10.1109/ipdps.2010.5470454 article EN 2010-01-01

Significant challenges exist for achieving peak or even consistent levels of performance when using IO systems at scale. They stem from sharing IO system resources across the processes of single large-scale applications and/or multiple simultaneous programs causing internal and external interference, which in turn, causes substantial reductions in IO performance. This paper presents interference effects measurements for two different file systems at multiple supercomputing sites. These measurements motivate developing a 'managed' IO approach...

10.1109/sc.2010.32 article EN 2010-11-01

Known challenges for petascale machines are that (1) the costs of I/O for high performance applications can be substantial, especially for output tasks like checkpointing, and (2) noise from I/O actions can inject undesirable delays into the runtimes of such codes on individual compute nodes. This paper introduces the flexible 'DataStager' framework for data staging and alternative services within that jointly address (1) and (2). Data staging services moving output data from compute nodes to staging or I/O nodes prior to storage are used to reduce I/O overheads on applications' total processing times, and explicit...

10.1145/1551609.1551618 article EN 2009-06-09

Increasingly severe I/O bottlenecks on High-End Computing machines are prompting scientists to process simulation output data online while simulations are running and before storing data on disk. There are several options to place data analytics along the I/O path: on compute nodes, on separate nodes dedicated to analytics, or after data is stored on persistent storage. Since different placements have different impact on performance and cost, there is a consequent need for flexibility in the location of data analytics. The FlexIO middleware described in this paper...

10.1109/ipdps.2013.46 article EN 2013-05-01

Severe I/O bottlenecks on High End Computing platforms call for running data analytics in situ. Demonstrating that there exist considerable resources on compute nodes unused by typical high end scientific simulations, we leverage this fact by creating an agile runtime, termed GoldRush, that can harvest those otherwise wasted, idle resources to efficiently run in situ analytics. GoldRush uses fine-grained scheduling to "steal" idle resources, in ways that minimize interference between the simulation and in situ analytics. This involves recognizing...

10.1145/2503210.2503279 article EN 2013-10-30

With data explosion in scale and variety, OLAP databases play an increasingly important role in serving real-time analysis with low latency (e.g., hundreds of milliseconds), especially when incoming queries are complex and ad hoc in nature. Moreover, these systems are expected to provide high query concurrency and write throughput, and support queries over structured and complex data types (e.g., JSON, vector and texts). In this paper, we introduce AnalyticDB, a real-time OLAP database system developed at Alibaba. AnalyticDB maintains all-column indexes...

10.14778/3352063.3352124 article EN Proceedings of the VLDB Endowment 2019-08-01

The complexity of HPC systems has increased the burden on the developer as applications scale to hundreds of thousands of processing cores. Moreover, additional efforts are required to achieve acceptable I/O performance, where it is important how I/O is performed, which resources are used, and where I/O functionality is deployed. Specifically, by scheduling I/O data movement and by effectively placing operators affecting data volumes or information about the data, tremendous gains can be achieved both in the performance of simulation output and in the usability...

10.1109/clustr.2009.5289167 article EN 2009-01-01

Despite the implementation of safety alignment strategies, large language models (LLMs) remain vulnerable to jailbreak attacks, which undermine these guardrails and pose significant security threats. Some defenses have been proposed to detect or mitigate jailbreaks, but they are unable to withstand the test of time due to an insufficient understanding of jailbreak mechanisms. In this work, we investigate the mechanisms behind jailbreaks based on the Linear Representation Hypothesis (LRH), which states that neural networks encode...

10.48550/arxiv.2502.07557 preprint EN arXiv (Cornell University) 2025-02-11

The remote visual exploration of live data generated by scientific simulations is useful for scientific discovery, performance monitoring, and online validation of the simulation results. Online visualization methods are challenged, however, by the continued growth in the volume of simulation output data that has to be transferred from its source - the simulation running on the high end machine - to where it is analyzed, visualized, and displayed. A specific challenge in this context is the limits of communication bandwidth between data source(s) and sinks. Previous work places queries...

10.1109/cluster.2013.6702635 article EN 2013-09-01

10.2478/nimmir-2024-0005 article EN cc-by-nc NIM Marketing Intelligence Review 2024-04-25

The skyline query can help identify the “best” objects in a multi-attribute dataset. During the past decade, this query has received considerable attention in the database research community. Most research focused on computing the “skyline” of a dataset, or the set of “skyline objects” that are not dominated by any other object. Such algorithms are not appropriate in an online system, which should respond in real time to requests with arbitrary subsets of the attributes (also called subspaces). To guarantee real-time response, an online system should precompute...

10.1145/2188349.2188357 article EN ACM Transactions on Database Systems 2012-05-01

Lack of I/O scalability is known to cause measurable slowdowns for large-scale scientific applications running on high end machines. This is prompting researchers to devise 'I/O staging' methods in which outputs are processed via online analysis and visualization to support desired science outcomes. Organized as workflows and carried out in pipelines, these components run concurrently with simulations, often using a smaller set of nodes on the machine termed 'staging areas'. This paper presents a new approach to dealing...

10.1109/ipdpsw.2013.198 article EN 2013-05-01

In order to understand the complex physics of mother nature, physicists often use many approximations to understand one area of physics and then write a simulation to reduce these equations to ones that can be solved on a computer. Different approximations lead to different equations that model the physics, which can lead to a completely different code. As computers become more powerful, scientists can either write one simulation that models all of the physics or they produce several codes each for different portions of the physics and then 'couple' these codes together. In this paper, we concentrate on the latter, where we look at our code coupling approach for modeling a full device fusion...

10.1145/1645164.1645172 article EN 2009-11-16

Increasingly severe I/O bottlenecks on High-End Computing machines are prompting scientists to process output data during simulation time, "in-situ", and before placing data on disks. This paper argues for flexibility in the implementation of such in-situ data analytics, using measurements and a performance model that demonstrate the potential advantages and limitations of performing analytics at different levels of the hierarchy, including the machine's compute nodes vs. separate "staging" nodes dedicated to analysis tasks. Model...

10.1145/2159352.2159362 article EN 2011-11-13

Increasingly larger scale simulations are generating an unprecedented amount of output data, causing researchers to explore new 'data staging' methods that buffer, use, and/or reduce such data online rather than simply pushing it to disk. Leveraging the capabilities of data staging, this study explores the potential for data reduction via compression, first using general compression techniques and then proposing use-specific methods that permit users to define simple queries that cause only the data identified by those queries to be emitted. Using...

10.1109/sc.companion.2012.114 article EN 2012-11-01

Scientific Data Management has become essential to the productivity of scientists using ever larger machines and running applications that produce ever more data. There are several specific issues when running on petascale (and beyond) machines. One is the need for massively parallel data output, which in part, depends on the data formats and semantics being used. Here, the inhibition of parallelism by file system notions of strict and immediate consistency can be addressed with 'delayed consistency' methods. Such methods can also be used...

10.1109/pdsw.2008.4811881 article EN 2008-11-01

Data Stream Processing is an important class of data intensive applications in the "Big Data" era. Chip Multi-Processors (CMPs) are the standard hosting platforms in modern data centers. Gaining high performance for stream processing applications on CMPs is therefore of great interest. Since the performance of stream processing applications largely depends on their effective use of the complex cache structure present on CMPs, this paper proposes the StreamMap approach for tuning streaming applications' use of cache. Our major idea is to map application threads to CPU cores to facilitate data sharing AND mitigate...

10.1109/icdcs.2013.13 article EN 2013-07-01

In-situ analysis on the output data of scientific simulations has been made necessary by ever-growing data volumes and increasing costs of data movement as supercomputing is moving towards exascale. With hardware accelerators like GPUs becoming increasingly common in high end machines, new opportunities arise to co-locate simulations and online analysis performed on the data generated by the simulations. However, the asynchronous nature of GPGPU programming models and the limited context-switching capabilities of the GPU pose challenges to co-locating the simulation and analysis on the same GPU....

10.1109/ccgrid.2016.58 article EN 2016-05-01