Phitchaya Mangpo Phothilimthana

ORCID: 0000-0003-3492-3690
Research Areas
  • Parallel Computing and Optimization Techniques
  • Software Testing and Debugging Techniques
  • Embedded Systems Design Techniques
  • Cloud Computing and Resource Management
  • Advanced Neural Network Applications
  • Interconnection Networks and Systems
  • Logic, programming, and type systems
  • Software Engineering Research
  • Ferroelectric and Negative Capacitance Devices
  • Software System Performance and Reliability
  • Distributed and Parallel Computing Systems
  • Caching and Content Delivery
  • Optimization and Search Problems
  • Topic Modeling
  • Online Learning and Analytics
  • Formal Methods in Verification
  • Semiconductor materials and devices
  • Domain Adaptation and Few-Shot Learning
  • Complexity and Algorithms in Graphs
  • Machine Learning and Data Classification
  • Data Stream Mining Techniques
  • Natural Language Processing Techniques
  • Advanced Image and Video Retrieval Techniques
  • Advanced Graph Neural Networks
  • Intelligent Tutoring Systems and Adaptive Learning

Google (United States)
2019-2025

Brain (Germany)
2022

University of California, Berkeley
2013-2019

Berkeley College
2016

Massachusetts Institute of Technology
2013

Trends in both consumer and high performance computing are bringing not only more cores, but also increased heterogeneity among the computational resources within a single machine. In many machines, one of the greatest computational resources is now their graphics coprocessors (GPUs), not just their primary CPUs. But GPU programming and memory models differ dramatically from conventional CPUs, and the relative performance characteristics of the different processors vary widely between machines. Different processors within a system often perform best with different algorithms and memory usage...

10.1145/2451116.2451162 article EN 2013-03-16

Developing a code optimizer is challenging, especially for new, idiosyncratic ISAs. Superoptimization can, in principle, discover machine-specific optimizations automatically by searching the space of all instruction sequences. If we can increase the size of code fragments a superoptimizer can optimize, we will be able to discover more optimizations. We develop LENS, a search algorithm that increases the size of code a superoptimizer can synthesize by rapidly pruning away invalid candidate programs. Pruning is achieved by selectively refining the abstraction under which...

10.1145/2872362.2872387 article EN 2016-03-25

We developed Chlorophyll, a synthesis-aided programming model and compiler for the GreenArrays GA144, an extremely minimalist low-power spatial architecture that requires partitioning the program into fragments of no more than 256 instructions and 64 words of data. This processor is 100-times more energy efficient than its competitors, but currently can only be programmed using a low-level stack-based language.

10.1145/2594291.2594339 article EN 2014-05-13

Developing server applications that offload computation to a NIC accelerator is complex and laborious. Developers have to explore the design space, which includes semantic changes for different offloading strategies, as well as variations on parallelization, program-to-resource mapping, and communication strategies for program components across devices. We therefore design FLOEM -- a language, compiler, and runtime -- for programming NIC-accelerated applications. FLOEM enables offload design exploration by providing programming abstractions to assign computation to hardware...

10.5555/3291168.3291217 article EN Operating Systems Design and Implementation 2018-10-08

Utilizing memory and register bandwidth in modern architectures may require swizzles --- non-trivial mappings of data and computations onto hardware resources --- such as shuffles. We develop Swizzle Inventor to help programmers implement swizzle programs, by writing program sketches that omit swizzles and delegating their creation to an automatic synthesizer. Our synthesis algorithm scales to real-world programs, allowing us to invent new GPU kernels for stencil computations, matrix transposition, and a finite field multiplication...

10.1145/3297858.3304059 article EN 2019-04-04

Accurate hardware performance models are critical to efficient code generation. They can be used by compilers to make heuristic decisions, by superoptimizers as a minimization objective, or by autotuners to find an optimal configuration for a specific program. However, they are difficult to develop because contemporary processors are complex, and the recent proliferation of deep learning accelerators has increased the development burden. We demonstrate a method of learning performance models from a corpus of tensor computation graph programs for Tensor...

10.48550/arxiv.2008.01040 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Tensor compilers, essential for generating efficient code for deep learning models across various applications, employ tensor graph rewrites as one of the key optimizations. These rewrites optimize tensor computational graphs with the expectation of preserving semantics for tensors of arbitrary rank and size. Despite this expectation, to the best of our knowledge, there does not exist a fully automated verification system to prove the soundness of these rewrites for tensors of arbitrary rank and size. Previous works, while successful in verifying rewrites with tensors of concrete rank, do not provide guarantees...

10.1145/3704865 article EN Proceedings of the ACM on Programming Languages 2025-01-07

2D image convolution is ubiquitous in image processing and computer vision problems such as feature extraction. Exploiting parallelism is a common strategy for accelerating convolution. Parallel processors keep getting faster, but algorithms such as image convolution remain memory bounded on parallel processors such as GPUs. Therefore, reducing memory communication is fundamental to accelerating image convolution. To reduce memory communication, we reorganize the convolution algorithm to prefetch image regions to register, and we do more work per thread with fewer threads. To enable portability to future architectures, we implement...

10.1109/icip.2013.6738436 article EN 2013-09-01

We developed Chlorophyll, a synthesis-aided programming model and compiler for the GreenArrays GA144, an extremely minimalist low-power spatial architecture that requires partitioning the program into fragments of no more than 256 instructions and 64 words of data. This processor is 100-times more energy efficient than its competitors, but currently can only be programmed using a low-level stack-based language. The Chlorophyll programming model allows programmers to provide human insight by specifying partial partitioning of data and computation....

10.1145/2666356.2594339 article EN ACM SIGPLAN Notices 2014-06-05

Representative modeling of I/O activity is crucial when designing large-scale distributed storage systems. Particularly important use cases are counterfactual "what-if" analyses that assess the impact of anticipated or hypothetical new storage policies or hardware prior to deployment. We propose Thesios, a methodology to accurately synthesize such hypothetical full-resolution I/O traces by carefully combining down-sampled I/O traces collected from multiple disks attached to multiple storage servers. Applying this approach to real-world traces that are already routinely...

10.1145/3620666.3651337 article EN 2024-04-24

Developing an optimizing compiler backend remains a laborious process, especially for nontraditional ISAs that have been appearing recently. Superoptimization sidesteps the need for many code transformations by searching for the most optimal instruction sequence semantically equivalent to the original code fragment. Even though superoptimization discovers the best machine-specific code optimizations, it has yet to become widely-used. We propose GreenThumb, an extensible framework that reduces the cost of constructing superoptimizers...

10.1145/2892208.2892233 article EN 2016-03-14

In massive programming courses, automated hint generation offers the promise of zero-cost, zero-latency assistance for students who are struggling to make progress on solving a program. While a more robust hint generation approach based on path construction requires tremendous engineering effort to build, another easier-to-build approach based on program mutations suffers from low coverage.

10.1145/3059009.3059058 article EN 2017-06-28

Search-based techniques have been demonstrated effective in solving complex optimization problems that arise in domain-specific compilers for machine learning (ML). Unfortunately, deploying such techniques in production compilers is impeded by two limitations. First, prior works require factorization of a computation graph into smaller subgraphs over which search is applied. This decomposition is not only non-trivial but also significantly limits the scope of optimization. Second, prior works require search to be applied in a single stage in the compilation flow,...

10.1109/pact52795.2021.00008 article EN 2021-09-01

Trends in both consumer and high performance computing are bringing not only more cores, but also increased heterogeneity among the computational resources within a single machine. In many machines, one of the greatest computational resources is now their graphics coprocessors (GPUs), not just their primary CPUs. But GPU programming and memory models differ dramatically from conventional CPUs, and the relative performance characteristics of the different processors vary widely between machines. Different processors within a system often perform best with different algorithms and memory usage...

10.1145/2490301.2451162 article EN ACM SIGARCH Computer Architecture News 2013-03-16

In the past few years, neural architecture search (NAS) has become an increasingly important tool within the deep learning community. Despite the many recent successes of NAS, however, most existing approaches operate within highly structured design spaces, and hence explore only a small fraction of the full search space of neural architectures while also requiring significant manual effort from domain experts. In this work, we develop techniques that enable efficient NAS in a significantly larger design space. To accomplish this, we propose to...

10.1145/3563329 article EN Proceedings of the ACM on Programming Languages 2022-10-31

Trends in both consumer and high performance computing are bringing not only more cores, but also increased heterogeneity among the computational resources within a single machine. In many machines, one of the greatest computational resources is now their graphics coprocessors (GPUs), not just their primary CPUs. But GPU programming and memory models differ dramatically from conventional CPUs, and the relative performance characteristics of the different processors vary widely between machines. Different processors within a system often perform best with different algorithms and memory usage...

10.1145/2499368.2451162 article EN ACM SIGPLAN Notices 2013-03-16

Precise hardware performance models play a crucial role in code optimizations. They can assist compilers in making heuristic decisions or aid autotuners in identifying the optimal configuration for a given program. For example, the autotuner for XLA, a machine learning compiler, discovered 10-20% speedup on state-of-the-art models serving substantial production traffic at Google. Although there exist a few datasets for program performance prediction, they target small sub-programs such as basic blocks or kernels. This paper introduces...

10.48550/arxiv.2308.13490 preprint EN cc-by arXiv (Cornell University) 2023-01-01

We provide an implementation of an algorithm that, given a triangulated planar graph with m edges, returns a simple cycle that is a 3/4-balanced separator consisting of at most √(8m) edges. An efficient construction of a short and balanced separator that forms a simple cycle is essential in numerous planar graph algorithms, for example, for computing shortest paths, minimum cuts, or maximum flows. To the best of our knowledge, this is the first implementation of such a cycle separator algorithm with a worst-case guarantee on the cycle length. We evaluate the performance of our algorithm and compare it to the planar separator algorithms recently studied by Holzer et al. [2009]....

10.1145/2957318 article EN ACM Journal of Experimental Algorithmics 2016-09-15

Analytical hardware performance models yield swift estimation of desired hardware performance metrics. However, developing these analytical performance models for modern processors with sophisticated microarchitectures is an extremely laborious task and requires a firm understanding of target microarchitecture's internal structure. In this paper, we introduce GRANITE, a new machine learning model that estimates the throughput of basic...

10.1109/iiswc55918.2022.00012 article EN 2022-11-01

Developing a code optimizer is challenging, especially for new, idiosyncratic ISAs. Superoptimization can, in principle, discover machine-specific optimizations automatically by searching the space of all instruction sequences. If we can increase the size of code fragments a superoptimizer can optimize, we will be able to discover more optimizations. We develop LENS, a search algorithm that increases the size of code a superoptimizer can synthesize by rapidly pruning away invalid candidate programs. Pruning is achieved by selectively refining the abstraction under which...

10.1145/2954679.2872387 article EN ACM SIGPLAN Notices 2016-03-25

Developing a code optimizer is challenging, especially for new, idiosyncratic ISAs. Superoptimization can, in principle, discover machine-specific optimizations automatically by searching the space of all instruction sequences. If we can increase the size of code fragments a superoptimizer can optimize, we will be able to discover more optimizations. We develop LENS, a search algorithm that increases the size of code a superoptimizer can synthesize by rapidly pruning away invalid candidate programs. Pruning is achieved by selectively refining the abstraction under which...

10.1145/2954680.2872387 article EN ACM SIGOPS Operating Systems Review 2016-03-25