Wen-Mei W. Hwu
Person information
- affiliation: University of Illinois at Urbana-Champaign, Department of Electrical and Computer Engineering, Urbana-Champaign, IL, USA
- award (1999): Grace Murray Hopper Award
2020 – today
- 2024
- [j84]Mohit Mahajan, Wen-Mei Hwu, Rakesh Nagi:
Determining optimal channel partition for 2:4 fine grained structured sparsity. Optim. Lett. 18(9): 2079-2090 (2024) - [j83]Jeongmin Brian Park, Vikram Sharma Mailthody, Zaid Qureshi, Wen-Mei Hwu:
Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses. Proc. VLDB Endow. 17(6): 1227-1240 (2024) - [c245]Chia-Hao Chang, Jihoon Han, Anand Sivasubramaniam, Vikram Sharma Mailthody, Zaid Qureshi, Wen-Mei Hwu:
GMT: GPU Orchestrated Memory Tiering for the Big Data Era. ASPLOS (3) 2024: 464-478 - [c244]Kun Wu, Mert Hidayetoglu, Xiang Song, Sitao Huang, Da Zheng, Israt Nisa, Wen-Mei Hwu:
Hector: An Efficient Programming and Compilation Framework for Implementing Relational Graph Neural Networks in GPU Architectures. ASPLOS (3) 2024: 528-544 - [c243]Mert Hidayetoglu, Simon Garcia De Gonzalo, Elliott Slaughter, Yu Li, Christopher Zimmer, Tekin Bicer, Bin Ren, William Gropp, Wen-Mei Hwu, Alex Aiken:
CommBench: Micro-Benchmarking Hierarchical Networks with Multi-GPU, Multi-NIC Nodes. ICS 2024: 426-436 - [i62]Ali Hassani, Wen-Mei Hwu, Humphrey Shi:
Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level. CoRR abs/2403.04690 (2024) - [i61]Jeongmin Brian Park, Kun Wu, Vikram Sharma Mailthody, Zaid Qureshi, Scott A. Mahlke, Wen-Mei W. Hwu:
LSM-GNN: Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme. CoRR abs/2407.15264 (2024) - [i60]Mert Hidayetoglu, Simon Garcia de Gonzalo, Elliott Slaughter, Pinku Surana, Wen-Mei W. Hwu, William Gropp, Alex Aiken:
HiCCL: A Hierarchical Collective Communication Library. CoRR abs/2408.05962 (2024) - [i59]Kun Wu, Jeongmin Brian Park, Xiaofan Zhang, Mert Hidayetoglu, Vikram Sharma Mailthody, Sitao Huang, Steven S. Lumetta, Wen-Mei W. Hwu:
TBA: Faster Large Language Model Training Using SSD-Based Activation Offloading. CoRR abs/2408.10013 (2024)
- 2023
- [j82]Mohamed El-Hadedy, Xinfei Guo, Kazutomo Yoshii, Yichen Cai, Robert Herndon, Bryan Banta, Wen-Mei Hwu:
RECO-ASCON: Reconfigurable ASCON hash functions for IoT applications. Integr. 93: 102061 (2023) - [c242]Mohammad Almasri, Yen-Hsiang Chang, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, Wen-mei W. Hwu:
Parallelizing Maximal Clique Enumeration on GPUs. PACT 2023: 162-175 - [c241]Jie Huang, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-Mei Hwu:
Can Language Models Be Specific? How? ACL (Findings) 2023: 716-727 - [c240]Zaid Qureshi, Vikram Sharma Mailthody, Isaac Gelado, Seungwon Min, Amna Masood, Jeongmin Brian Park, Jinjun Xiong, Chris J. Newburn, Dmitri Vainbrand, I-Hsin Chung, Michael Garland, William J. Dally, Wen-Mei W. Hwu:
GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture. ASPLOS (2) 2023: 325-339 - [c239]Luyang Yu, Yizhen Lu, Meghna Mandava, Edward Richter, Vikram Sharma Mailthody, Seungwon Min, Wen-Mei W. Hwu, Deming Chen:
FSSD: FPGA-Based Emulator for SSDs. FPL 2023: 101-108 - [c238]Samiran Kawtikwar, Mohammad Almasri, Wen-Mei Hwu, Rakesh Nagi, Jinjun Xiong:
BEEP: Balanced Efficient subgraph Enumeration in Parallel. ICPP 2023: 142-152 - [c237]Mohamed El-Hadedy, Russell Hua, Kazutomo Yoshii, Wen-Mei Hwu, Martin Margala:
RECO-LFSR: Reconfigurable Low-power Cryptographic processor based on LFSR for Trusted IoT platforms. ISQED 2023: 1-7 - [c236]Arpandeep Khatua, Vikram Sharma Mailthody, Bhagyashree Taleka, Tengfei Ma, Xiang Song, Wen-Mei Hwu:
IGB: Addressing The Gaps In Labeling, Features, Heterogeneity, and Size of Public Graph Datasets for Deep Learning Research. KDD 2023: 4284-4295 - [c235]Mohamed El-Hadedy, Russell Hua, Shahzman Saqib, Kazutomo Yoshii, Wen-Mei Hwu, Martin Margala:
BLTESTI: Benchmarking Lightweight TinyJAMBU on Embedded Systems for Trusted IoT. SOCC 2023: 1-6 - [c234]Benjamin Reidys, Yuqi Xue, Daixuan Li, Bharat Sukhwani, Wen-Mei Hwu, Deming Chen, Sameh W. Asaad, Jian Huang:
RackBlox: A Software-Defined Rack-Scale Storage System with Network-Storage Co-Design. SOSP 2023: 182-199 - [i58]Kun Wu, Mert Hidayetoglu, Xiang Song, Sitao Huang, Da Zheng, Israt Nisa, Wen-Mei W. Hwu:
PIGEON: Optimizing CUDA Code Generator for End-to-End Training and Inference of Relational Graph Neural Networks. CoRR abs/2301.06284 (2023) - [i57]Arpandeep Khatua, Vikram Sharma Mailthody, Bhagyashree Taleka, Tengfei Ma, Xiang Song, Wen-mei W. Hwu:
IGB: Addressing The Gaps In Labeling, Features, Heterogeneity, and Size of Public Graph Datasets for Deep Learning Research. CoRR abs/2302.13522 (2023) - [i56]Jeongmin Brian Park, Vikram Sharma Mailthody, Zaid Qureshi, Wen-Mei Hwu:
Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses. CoRR abs/2306.16384 (2023) - [i55]Jeongmin Brian Park, Zaid Qureshi, Vikram S. Mailthody, Andrew Gacek, Shunfan Shao, Mohammad Almasri, Isaac Gelado, Jinjun Xiong, Chris J. Newburn, I-Hsin Chung, Michael Garland, Nikolay Sakharnykh, Wen-Mei W. Hwu:
CODAG: Characterizing and Optimizing Decompression Algorithms for GPUs. CoRR abs/2307.03760 (2023) - [i54]Benjamin Reidys, Yuqi Xue, Daixuan Li, Bharat Sukhwani, Wen-mei W. Hwu, Deming Chen, Sameh W. Asaad, Jian Huang:
RackBlox: A Software-Defined Rack-Scale Storage System with Network-Storage Co-Design. CoRR abs/2309.06513 (2023)
- 2022
- [j81]Omer Anjum, Mohammad Almasri, Simon Garcia de Gonzalo, Wen-Mei W. Hwu:
An efficient GPU implementation and scaling for higher-order 3D stencils. Inf. Sci. 586: 326-343 (2022) - [j80]Xiaofan Zhang, Yuan Ma, Jinjun Xiong, Wen-Mei W. Hwu, Volodymyr V. Kindratenko, Deming Chen:
Exploring HW/SW Co-Design for Video Analysis on CPU-FPGA Heterogeneous Systems. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 41(6): 1606-1619 (2022) - [j79]Mert Hidayetoglu, Tekin Biçer, Simon Garcia de Gonzalo, Bin Ren, Doga Gürsoy, Rajkumar Kettimuthu, Ian T. Foster, Wen-Mei W. Hwu:
MemXCT: Design, Optimization, Scaling, and Reproducibility of X-Ray Tomography Imaging. IEEE Trans. Parallel Distributed Syst. 33(9): 2014-2031 (2022) - [c233]Jie Huang, Kevin Chang, Jinjun Xiong, Wen-Mei Hwu:
Open Relation Modeling: Learning to Define Relations between Entities. ACL (Findings) 2022: 297-308 - [c232]Mhd Ghaith Olabi, Juan Gómez-Luna, Onur Mutlu, Wen-Mei Hwu, Izzat El Hajj:
A Compiler Framework for Optimizing Dynamic Parallelism on GPUs. CGO 2022: 1-13 - [c231]Jie Huang, Hanyin Shao, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-Mei Hwu:
Understanding Jargon: Combining Extraction and Generation for Definition Modeling. EMNLP 2022: 3994-4004 - [c230]Jie Huang, Kerui Zhu, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-Mei Hwu:
DEER: Descriptive Knowledge Graph for Explaining Entity Relationships. EMNLP 2022: 6686-6698 - [c229]Mohammad Almasri, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, Wen-Mei Hwu:
Parallel K-clique counting on GPUs. ICS 2022: 21:1-21:14 - [c228]Vibhor Dodeja, Mohammad Almasri, Rakesh Nagi, Jinjun Xiong, Wen-Mei Hwu:
PARSEC: PARallel Subgraph Enumeration in CUDA. IPDPS 2022: 168-178 - [c227]Seungwon Min, Kun Wu, Mert Hidayetoglu, Jinjun Xiong, Xiang Song, Wen-Mei Hwu:
Graph Neural Network Training and Data Tiering. KDD 2022: 3555-3565 - [c226]Xiangdong Wei, Mohamed El-Hadedy, Sergiu Mosanu, Zhengping Zhu, Wen-Mei Hwu, Xinfei Guo:
RECO-HCON: A High-Throughput Reconfigurable Compact ASCON Processor for Trusted IoT. SOCC 2022: 1-6 - [d1]Zaid Qureshi, Vikram Sharma Mailthody, Isaac Gelado, Seungwon Min, Amna Masood, Jeongmin Brian Park, Jinjun Xiong, Chris J. Newburn, Dmitri Vainbrand, I-Hsin Chung, Michael Garland, William J. Dally, Wen-mei W. Hwu:
GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture. Zenodo, 2022 - [i53]Mhd Ghaith Olabi, Juan Gómez-Luna, Onur Mutlu, Wen-Mei W. Hwu, Izzat El Hajj:
A Compiler Framework for Optimizing Dynamic Parallelism on GPUs. CoRR abs/2201.02789 (2022) - [i52]Zaid Qureshi, Vikram Sharma Mailthody, Isaac Gelado, Seungwon Min, Amna Masood, Jeongmin Brian Park, Jinjun Xiong, Chris J. Newburn, Dmitri Vainbrand, I-Hsin Chung, Michael Garland, William J. Dally, Wen-Mei W. Hwu:
BaM: A Case for Enabling Fine-grain High Throughput GPU-Orchestrated Access to Storage. CoRR abs/2203.04910 (2022) - [i51]Jie Huang, Kerui Zhu, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-Mei Hwu:
DKG: A Descriptive Knowledge Graph for Explaining Relationships between Entities. CoRR abs/2205.10479 (2022) - [i50]Jie Huang, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-Mei Hwu:
Can Language Models Be Specific? How? CoRR abs/2210.05159 (2022) - [i49]Omer Anjum, Alok Kamatar, Toby Liang, Jinjun Xiong, Wen-Mei Hwu:
Submission-Aware Reviewer Profiling for Reviewer Recommender System. CoRR abs/2211.04194 (2022) - [i48]Mohammad Almasri, Yen-Hsiang Chang, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, Wen-mei W. Hwu:
Parallelizing Maximal Clique Enumeration on GPUs. CoRR abs/2212.01473 (2022)
- 2021
- [j78]Seungwon Min, Kun Wu, Sitao Huang, Mert Hidayetoglu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei W. Hwu:
Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture. Proc. VLDB Endow. 14(11): 2087-2100 (2021) - [j77]Sitao Huang, Kun Wu, Hyunmin Jeong, Chengyue Wang, Deming Chen, Wen-Mei Hwu:
PyLog: An Algorithm-Centric Python-Based FPGA Programming and Synthesis Flow. IEEE Trans. Computers 70(12): 2015-2028 (2021) - [j76]Qin Li, Xiaofan Zhang, Jinjun Xiong, Wen-Mei Hwu, Deming Chen:
Efficient Methods for Mapping Neural Machine Translator on FPGAs. IEEE Trans. Parallel Distributed Syst. 32(7): 1866-1877 (2021) - [c225]Sultan Durrani, Muhammad Saad Chughtai, Mert Hidayetoglu, Rashid Tahir, Abdul Dakkak, Lawrence Rauchwerger, Fareed Zaffar, Wen-Mei W. Hwu:
Accelerating Fourier and Number Theoretic Transforms using Tensor Cores and Warp Shuffles. PACT 2021: 345-355 - [c224]Jie Huang, Kevin Chang, Jinjun Xiong, Wen-Mei Hwu:
Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach. ACL/IJCNLP (1) 2021: 3641-3651 - [c223]Ashutosh Dhar, Paul Reckamp, Jinjun Xiong, Wen-Mei Hwu, Deming Chen:
Graviton: A Reconfigurable Memory-Compute Fabric for Data Intensive Applications. ARC 2021: 254-264 - [c222]Sitao Huang, Aayush Ankit, Plínio Silveira, Rodrigo Antunes, Sai Rahul Chalamalasetti, Izzat El Hajj, Dong Eun Kim, Glaucimar Aguiar, Pedro Bruel, Sergey Serebryakov, Cong Xu, Can Li, Paolo Faraboschi, John Paul Strachan, Deming Chen, Kaushik Roy, Wen-Mei W. Hwu, Dejan S. Milojicic:
Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators. ASP-DAC 2021: 372-377 - [c221]Jiachen Li, Bowen Cheng, Rogério Feris, Jinjun Xiong, Thomas S. Huang, Wen-Mei Hwu, Humphrey Shi:
Pseudo-IoU: Improving Label Assignment in Anchor-Free Object Detection. CVPR Workshops 2021: 2378-2387 - [c220]Chengyue Wang, Sitao Huang, Wen-Mei Hwu, Deming Chen:
Extending HLS with High-Level Descriptive Language for Configurable Algorithm-Level Spatial Structure Design. FCCM 2021: 261 - [c219]Sitao Huang, Kun Wu, Hyunmin Jeong, Chengyue Wang, Deming Chen, Wen-Mei Hwu:
PyLog: An Algorithm-Centric Python-Based FPGA Programming and Synthesis Flow. FPGA 2021: 227-228 - [c218]Carl Pearson, Kun Wu, I-Hsin Chung, Jinjun Xiong, Wen-Mei Hwu:
TEMPI: An Interposed MPI Library with a Canonical Representation of CUDA-aware Datatypes. HPDC 2021: 95-106 - [c217]Mohammad Almasri, Neo Vasudeva, Rakesh Nagi, Jinjun Xiong, Wen-Mei Hwu:
HyKernel: A Hybrid Selection of One/Two-Phase Kernels for Triangle Counting on GPUs. HPEC 2021: 1-7 - [c216]Zhonghao Wang, Kai Wang, Mo Yu, Jinjun Xiong, Wen-Mei Hwu, Mark Hasegawa-Johnson, Humphrey Shi:
Interpretable Visual Reasoning via Induced Symbolic Space. ICCV 2021: 1858-1867 - [c215]Sultan Durrani, Muhammad Saad Chughtai, Abdul Dakkak, Wen-Mei Hwu, Lawrence Rauchwerger:
FFT blitz: the tensor cores strike back. PPoPP 2021: 488-489 - [c214]Omer Anjum, Mohammad Almasri, Jinjun Xiong, Wen-Mei W. Hwu:
PhraseScope: An Effective and Unsupervised Framework for Mining High Quality Phrases. SDM 2021: 639-647 - [i47]Vikram Sharma Mailthody, James Wei, Nicholas Chen, Mohammad Behnia, Ruihao Yao, Qihao Wang, Vedant Agrawal, Churan He, Lijian Wang, Leihao Chen, Amit Agarwal, Edward Richter, Wen-Mei Hwu, Christopher W. Fletcher, Jinjun Xiong, Andrew Miller, Sanjay Patel:
Safer Illinois and RokWall: Privacy Preserving University Health Apps for COVID-19. CoRR abs/2101.07897 (2021) - [i46]Seungwon Min, Kun Wu, Sitao Huang, Mert Hidayetoglu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, Wen-Mei W. Hwu:
PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses. CoRR abs/2101.07956 (2021) - [i45]Seungwon Min, Kun Wu, Sitao Huang, Mert Hidayetoglu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, Wen-Mei W. Hwu:
Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture. CoRR abs/2103.03330 (2021) - [i44]Mohammad Almasri, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, Wen-Mei W. Hwu:
K-Clique Counting on GPUs. CoRR abs/2104.13209 (2021) - [i43]Jiachen Li, Bowen Cheng, Rogério Feris, Jinjun Xiong, Thomas S. Huang, Wen-Mei Hwu, Humphrey Shi:
Pseudo-IoU: Improving Label Assignment in Anchor-Free Object Detection. CoRR abs/2104.14082 (2021) - [i42]Jie Huang, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-mei W. Hwu:
Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach. CoRR abs/2105.13255 (2021) - [i41]Jie Huang, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-Mei W. Hwu:
Open Relation Modeling: Learning to Define Relations between Entities. CoRR abs/2108.09241 (2021) - [i40]Yen-Hsiang Chang, Jianhao Pu, Wen-Mei W. Hwu, Jinjun Xiong:
MLHarness: A Scalable Benchmarking System for MLCommons. CoRR abs/2111.05231 (2021) - [i39]Seungwon Min, Kun Wu, Mert Hidayetoglu, Jinjun Xiong, Xiang Song, Wen-mei W. Hwu:
Graph Neural Network Training with Data Tiering. CoRR abs/2111.05894 (2021)
- 2020
- [j75]Seungwon Min, Vikram Sharma Mailthody, Zaid Qureshi, Jinjun Xiong, Eiman Ebrahimi, Wen-Mei Hwu:
EMOGI: Efficient Memory-access for Out-of-memory Graph-traversal In GPUs. Proc. VLDB Endow. 14(2): 114-127 (2020) - [j74]Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Sapan Agarwal, Matthew J. Marinella, Martin Foltin, John Paul Strachan, Dejan S. Milojicic, Wen-Mei Hwu, Kaushik Roy:
PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-Efficient ReRAM. IEEE Trans. Computers 69(8): 1128-1142 (2020) - [c213]Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-Mei W. Hwu:
The Design and Implementation of a Scalable Deep Learning Benchmarking Platform. CLOUD 2020: 414-425 - [c212]Abdul Dakkak, Tom Wickham-Jones, Wen-Mei Hwu:
The design and implementation of the Wolfram Language compiler. CGO 2020: 212-228 - [c211]Omer Anjum, Chak Ho Chan, Tanitpong Lawphongpanich, Yucheng Liang, Tianyi Tang, Shuchen Zhang, Wen-Mei Hwu, Jinjun Xiong, Sanjay Patel:
Vertext: An End-to-end AI Powered Conversation Management System for Multi-party Chat Platforms. CSCW Companion 2020: 1-6 - [c210]Zhonghao Wang, Yunchao Wei, Rogério Schmidt Feris, Jinjun Xiong, Wen-Mei W. Hwu, Thomas S. Huang, Honghui Shi:
Alleviating Semantic-level Shift: A Semi-supervised Domain Adaptation Method for Semantic Segmentation. CVPR Workshops 2020: 4043-4047 - [c209]Zhonghao Wang, Mo Yu, Yunchao Wei, Rogério Feris, Jinjun Xiong, Wen-Mei Hwu, Thomas S. Huang, Honghui Shi:
Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation. CVPR 2020: 12632-12641 - [c208]Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, Jinjun Xiong, Wen-mei W. Hwu, Deming Chen:
EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions. DAC 2020: 1-6 - [c207]Jie Huang, Zilong Wang, Kevin Chang, Wen-Mei Hwu, Jinjun Xiong:
Exploring Semantic Capacity of Terms. EMNLP (1) 2020: 8509-8518 - [c206]Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, Jinjun Xiong, Wen-Mei Hwu, Deming Chen:
Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices. ACM Great Lakes Symposium on VLSI 2020: 283-290 - [c205]Mert Hidayetoglu, Carl Pearson, Vikram Sharma Mailthody, Eiman Ebrahimi, Jinjun Xiong, Rakesh Nagi, Wen-Mei Hwu:
At-Scale Sparse Deep Neural Network Inference With Efficient GPU Implementation. HPEC 2020: 1-7 - [c204]Xiaofan Zhang, Hanchen Ye, Junsong Wang, Yonghua Lin, Jinjun Xiong, Wen-Mei Hwu, Deming Chen:
DNNExplorer: A Framework for Modeling and Exploring a Novel Paradigm of FPGA-based DNN Accelerator. ICCAD 2020: 61:1-61:9 - [c203]Mohamed El-Hadedy, Martin Margala, Sergiu Mosanu, Danilo Gligoroski, Jinjun Xiong, Wen-Mei Hwu:
Micro-GAGE: A Low-power Compact GAGE Hash Function Processor for IoT Applications. ICECS 2020: 1-4 - [c202]Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu, Wen-Mei Hwu:
XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs. IPDPS 2020: 326-327 - [c201]Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-Mei Hwu:
Benanza: Automatic μBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs. IPDPS 2020: 440-450 - [c200]Carl Pearson, Mert Hidayetoglu, Mohammad Almasri, Omer Anjum, I-Hsin Chung, Jinjun Xiong, Wen-Mei W. Hwu:
Node-Aware Stencil Communication for Heterogeneous Supercomputers. IPDPS Workshops 2020: 796-805 - [c199]Wen-Mei Hwu:
Advancing Computing Infrastructure for Very Large-Scale Deep Learning at C3SR. IPDPS Workshops 2020: 989 - [c198]Ashutosh Dhar, Xiaohao Wang, Hubertus Franke, Jinjun Xiong, Jian Huang, Wen-Mei W. Hwu, Nam Sung Kim, Deming Chen:
FReaC Cache: Folded-logic Reconfigurable Computing in the Last Level Cache. MICRO 2020: 102-117 - [c197]Xiaofan Zhang, Haoming Lu, Cong Hao, Jiachen Li, Bowen Cheng, Yuhong Li, Kyle Rupnow, Jinjun Xiong, Thomas S. Huang, Honghui Shi, Wen-Mei Hwu, Deming Chen:
SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems. MLSys 2020 - [c196]Abdul Dakkak, Cheng Li, Jinjun Xiong, Wen-mei W. Hwu:
DLSpec: A Deep Learning Task Exchange Specification. OpML 2020 - [c195]Mert Hidayetoglu, Tekin Bicer, Simon Garcia De Gonzalo, Bin Ren, Vincent De Andrade, Doga Gürsoy, Raj Kettimuthu, Ian T. Foster, Wen-mei W. Hwu:
Petascale XCT: 3D image reconstruction with hierarchical communications on multi-GPU nodes. SC 2020: 37 - [c194]Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-Mei Hwu:
DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs. ICPE 2020: 202-209 - [e5]Eduard Ayguadé, Wen-mei W. Hwu, Rosa M. Badia, H. Peter Hofstee:
ICS '20: 2020 International Conference on Supercomputing, Barcelona Spain, June, 2020. ACM 2020, ISBN 978-1-4503-7983-0 [contents] - [i38]Abdul Dakkak, Cheng Li, Jinjun Xiong, Wen-Mei Hwu:
MLModelScope: A Distributed Platform for Model Evaluation and Benchmarking at Scale. CoRR abs/2002.08295 (2020) - [i37]Abdul Dakkak, Cheng Li, Jinjun Xiong, Wen-Mei Hwu:
DLSpec: A Deep Learning Task Exchange Specification. CoRR abs/2002.11262 (2020) - [i36]Zhonghao Wang, Mo Yu, Yunchao Wei, Rogério Schmidt Feris, Jinjun Xiong, Wen-Mei Hwu, Thomas S. Huang, Honghui Shi:
Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation. CoRR abs/2003.08040 (2020) - [i35]Zhonghao Wang, Yunchao Wei, Rogério Feris, Jinjun Xiong, Wen-Mei Hwu, Thomas S. Huang, Honghui Shi:
Alleviating Semantic-level Shift: A Semi-supervised Domain Adaptation Method for Semantic Segmentation. CoRR abs/2004.00794 (2020) - [i34]Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, Jinjun Xiong, Wen-Mei W. Hwu, Deming Chen:
EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions. CoRR abs/2005.02563 (2020) - [i33]Seungwon Min, Vikram Sharma Mailthody, Zaid Qureshi, Jinjun Xiong, Eiman Ebrahimi, Wen-Mei W. Hwu:
EMOGI: Efficient Memory-access for Out-of-memory Graph-traversal In GPUs. CoRR abs/2006.06890 (2020) - [i32]Mert Hidayetoglu, Carl Pearson, Vikram Sharma Mailthody, Eiman Ebrahimi, Jinjun Xiong, Rakesh Nagi, Wen-mei W. Hwu:
Efficient Inference on GPUs for the Sparse Deep Neural Network Graph Challenge 2020. CoRR abs/2007.14152 (2020) - [i31]Zaid Qureshi, Vikram Sharma Mailthody, Seungwon Min, I-Hsin Chung, Jinjun Xiong, Wen-Mei W. Hwu:
Tearing Down the Memory Wall. CoRR abs/2008.10169 (2020) - [i30]Xiaofan Zhang, Hanchen Ye, Junsong Wang, Yonghua Lin, Jinjun Xiong, Wen-Mei W. Hwu, Deming Chen:
DNNExplorer: A Framework for Modeling and Exploring a Novel Paradigm of FPGA-based DNN Accelerator. CoRR abs/2008.12745 (2020) - [i29]Mert Hidayetoglu, Tekin Bicer, Simon Garcia De Gonzalo, Bin Ren, Vincent De Andrade, Doga Gürsoy, Raj Kettimuthu, Ian T. Foster, Wen-mei W. Hwu:
Petascale XCT: 3D Image Reconstruction with Hierarchical Communications on Multi-GPU Nodes. CoRR abs/2009.07226 (2020) - [i28]Jie Huang, Zilong Wang, Kevin Chen-Chuan Chang, Wen-Mei Hwu, Jinjun Xiong:
Exploring Semantic Capacity of Terms. CoRR abs/2010.01898 (2020) - [i27]Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, Jinjun Xiong, Wen-Mei Hwu, Deming Chen:
Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices. CoRR abs/2010.07185 (2020) - [i26]Zhonghao Wang, Mo Yu, Kai Wang, Jinjun Xiong, Wen-Mei Hwu, Mark Hasegawa-Johnson, Humphrey Shi:
Interpretable Visual Reasoning via Induced Symbolic Space. CoRR abs/2011.11603 (2020) - [i25]Carl Pearson, Kun Wu, I-Hsin Chung, Jinjun Xiong, Wen-Mei Hwu:
Fast CUDA-Aware MPI Datatypes without Platform Support. CoRR abs/2012.14363 (2020)
2010 – 2019
- 2019
- [c193]Abdul Dakkak, Cheng Li, Simon Garcia De Gonzalo, Jinjun Xiong, Wen-Mei Hwu:
TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function-as-a-Service. CLOUD 2019: 372-382 - [c192]Qin Li, Xiaofan Zhang, Jinjun Xiong, Wen-Mei Hwu, Deming Chen:
Implementing neural machine translation with bi-directional GRU and attention mechanism on FPGAs using HLS. ASP-DAC 2019: 693-698 - [c191]Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Geoffrey Ndu, Martin Foltin, R. Stanley Williams, Paolo Faraboschi, Wen-mei W. Hwu, John Paul Strachan, Kaushik Roy, Dejan S. Milojicic:
PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference. ASPLOS 2019: 715-731 - [c190]Ahmed H. M. O. Abulila, Vikram Sharma Mailthody, Zaid Qureshi, Jian Huang, Nam Sung Kim, Jinjun Xiong, Wen-Mei W. Hwu:
FlatFlash: Exploiting the Byte-Accessibility of SSDs within a Unified Memory-Storage Hierarchy. ASPLOS 2019: 971-985 - [c189]Simon Garcia De Gonzalo, Sitao Huang, Juan Gómez-Luna, Simon D. Hammond, Onur Mutlu, Wen-Mei Hwu:
Automatic Generation of Warp-Level Primitives and Atomic Instructions for Fast and Portable Parallel Reduction on GPUs. CGO 2019: 73-84 - [c188]Cong Hao, Xiaofan Zhang, Yuhong Li, Sitao Huang, Jinjun Xiong, Kyle Rupnow, Wen-Mei Hwu, Deming Chen:
FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge. DAC 2019: 206 - [c187]Omer Anjum, Hongyu Gong, Suma Bhat, Wen-Mei Hwu, Jinjun Xiong:
PaRe: A Paper-Reviewer Matching Approach Using a Common Topic Space. EMNLP/IJCNLP (1) 2019: 518-528 - [c186]Seungwon Min, Sitao Huang, Mohamed El-Hadedy, Jinjun Xiong, Deming Chen, Wen-Mei Hwu:
Analysis and Optimization of I/O Cache Coherency Strategies for SoC-FPGA Device. FPL 2019: 301-306 - [c185]Omer Anjum, Simon Garcia De Gonzalo, Mert Hidayetoglu, Wen-Mei Hwu:
An Efficient GPU Implementation Technique for Higher-Order 3D Stencils. HPCC/SmartCity/DSS 2019: 552-561 - [c184]Mohammad Almasri, Omer Anjum, Carl Pearson, Zaid Qureshi, Vikram S. Mailthody, Rakesh Nagi, Jinjun Xiong, Wen-Mei W. Hwu:
Update on k-truss Decomposition on GPU. HPEC 2019: 1-7 - [c183]Sitao Huang, Carl Pearson, Rakesh Nagi, Jinjun Xiong, Deming Chen, Wen-Mei W. Hwu:
Accelerating Sparse Deep Neural Networks on FPGAs. HPEC 2019: 1-7 - [c182]Carl Pearson, Mohammad Almasri, Omer Anjum, Vikram S. Mailthody, Zaid Qureshi, Rakesh Nagi, Jinjun Xiong, Wen-Mei W. Hwu:
Update on Triangle Counting on GPU. HPEC 2019: 1-7 - [c181]Cong Hao, Yao Chen, Xinheng Liu, Atif Sarwari, Daryl Sew, Ashutosh Dhar, Bryan Wu, Dongdong Fu, Jinjun Xiong, Wen-Mei Hwu, Junli Gu, Deming Chen:
NAIS: Neural Architecture and Implementation Search and its Applications in Autonomous Driving. ICCAD 2019: 1-8 - [c180]Bowen Cheng, Liang-Chieh Chen, Yunchao Wei, Yukun Zhu, Zilong Huang, Jinjun Xiong, Thomas S. Huang, Wen-Mei Hwu, Honghui Shi:
SPGNet: Semantic Prediction Guidance for Scene Parsing. ICCV 2019: 5217-5227 - [c179]Abdul Dakkak, Cheng Li, Jinjun Xiong, Isaac Gelado, Wen-Mei W. Hwu:
Accelerating reduction and scan using tensor core units. ICS 2019: 46-57 - [c178]Ashutosh Dhar, Sitao Huang, Jinjun Xiong, Damir A. Jamsek, Bruno Mesnet, Jian Huang, Nam Sung Kim, Wen-Mei W. Hwu, Deming Chen:
Near-Memory and In-Storage FPGA Acceleration for Emerging Cognitive Computing Workloads. ISVLSI 2019: 68-75 - [c177]Vikram Sharma Mailthody, Zaid Qureshi, Weixin Liang, Ziyan Feng, Simon Garcia De Gonzalo, Youjie Li, Hubertus Franke, Jinjun Xiong, Jian Huang, Wen-Mei Hwu:
DeepStore: In-Storage Acceleration for Intelligent Queries. MICRO 2019: 224-238 - [c176]Hongyu Gong, Suma Bhat, Lingfei Wu, Jinjun Xiong, Wen-Mei W. Hwu:
Reinforcement Learning Based Text Style Transfer without Parallel Training Corpus. NAACL-HLT (1) 2019: 3168-3180 - [c175]Mert Hidayetoglu, Tekin Biçer, Simon Garcia De Gonzalo, Bin Ren, Doga Gürsoy, Rajkumar Kettimuthu, Ian T. Foster, Wen-mei W. Hwu:
MemXCT: memory-centric X-ray CT reconstruction with massive parallelization. SC 2019: 85:1-85:56 - [c174]Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-Mei Hwu:
MLModelScope: Evaluate and Introspect Cognitive Pipelines. SERVICES 2019: 335-338 - [c173]Sitao Huang, Li-Wen Chang, Izzat El Hajj, Simon Garcia De Gonzalo, Juan Gómez-Luna, Sai Rahul Chalamalasetti, Mohamed El-Hadedy, Dejan S. Milojicic, Onur Mutlu, Deming Chen, Wen-Mei W. Hwu:
Analysis and Modeling of Collaborative Execution Strategies for Heterogeneous CPU-FPGA Architectures. ICPE 2019: 79-90 - [c172]Carl Pearson, Abdul Dakkak, Sarah Hashash, Cheng Li, I-Hsin Chung, Jinjun Xiong, Wen-Mei Hwu:
Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects. ICPE 2019: 209-218 - [i24]Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Geoffrey Ndu, Martin Foltin, R. Stanley Williams, Paolo Faraboschi, Wen-Mei Hwu, John Paul Strachan, Kaushik Roy, Dejan S. Milojicic:
PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference. CoRR abs/1901.10351 (2019) - [i23]Hongyu Gong, Suma Bhat, Lingfei Wu, Jinjun Xiong, Wen-Mei Hwu:
Reinforcement Learning Based Text Style Transfer without Parallel Training Corpus. CoRR abs/1903.10671 (2019) - [i22]Cong Hao, Xiaofan Zhang, Yuhong Li, Sitao Huang, Jinjun Xiong, Kyle Rupnow, Wen-Mei Hwu, Deming Chen:
FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge. CoRR abs/1904.04421 (2019) - [i21]Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-Mei Hwu:
Challenges and Pitfalls of Reproducing Machine Learning Artifacts. CoRR abs/1904.12437 (2019) - [i20]Xiaofan Zhang, Cong Hao, Yuhong Li, Yao Chen, Jinjun Xiong, Wen-Mei W. Hwu, Deming Chen:
A Bi-Directional Co-Design Approach to Enable Deep Learning on IoT Devices. CoRR abs/1905.08369 (2019) - [i19]Omer Anjum, Wen-Mei Hwu, Jinjun Xiong:
A Retrospective Recount of Computer Architecture Research with a Data-Driven Study of Over Four Decades of ISCA Publications. CoRR abs/1906.09380 (2019) - [i18]Xiaofan Zhang, Cong Hao, Haoming Lu, Jiachen Li, Yuhong Li, Yuchen Fan, Kyle Rupnow, Jinjun Xiong, Thomas S. Huang, Honghui Shi, Wen-Mei Hwu, Deming Chen:
SkyNet: A Champion Model for DAC-SDC on Low Power Object Detection. CoRR abs/1906.10327 (2019) - [i17]Seungwon Min, Sitao Huang, Mohamed El-Hadedy, Jinjun Xiong, Deming Chen, Wen-Mei Hwu:
Analysis and Optimization of I/O Cache Coherency Strategies for SoC-FPGA Device. CoRR abs/1908.01261 (2019) - [i16]Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu, Wen-Mei Hwu:
Across-Stack Profiling and Characterization of Machine Learning Models on GPUs. CoRR abs/1908.06869 (2019) - [i15]Bowen Cheng, Liang-Chieh Chen, Yunchao Wei, Yukun Zhu, Zilong Huang, Jinjun Xiong, Thomas S. Huang, Wen-Mei Hwu, Honghui Shi:
SPGNet: Semantic Prediction Guidance for Scene Parsing. CoRR abs/1908.09798 (2019) - [i14]Xiaofan Zhang, Haoming Lu, Cong Hao, Jiachen Li, Bowen Cheng, Yuhong Li, Kyle Rupnow, Jinjun Xiong, Thomas S. Huang, Honghui Shi, Wen-mei W. Hwu, Deming Chen:
SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems. CoRR abs/1909.09709 (2019) - [i13]Omer Anjum, Hongyu Gong, Suma Bhat, Wen-Mei Hwu, Jinjun Xiong:
PaRe: A Paper-Reviewer Matching Approach Using a Common Topic Space. CoRR abs/1909.11258 (2019) - [i12]Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-Mei Hwu:
Benanza: Automatic μBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs. CoRR abs/1911.06922 (2019) - [i11]Cong Hao, Yao Chen, Xinheng Liu, Atif Sarwari, Daryl Sew, Ashutosh Dhar, Bryan Wu, Dongdong Fu, Jinjun Xiong, Wen-Mei Hwu, Junli Gu, Deming Chen:
NAIS: Neural Architecture and Implementation Search and its Applications in Autonomous Driving. CoRR abs/1911.07446 (2019) - [i10]Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-Mei W. Hwu:
DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs. CoRR abs/1911.07967 (2019) - [i9]Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-Mei W. Hwu:
The Design and Implementation of a Scalable DL Benchmarking Platform. CoRR abs/1911.08031 (2019) - [i8]Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Sapan Agarwal, Matthew J. Marinella, Martin Foltin, John Paul Strachan, Dejan S. Milojicic, Wen-Mei W. Hwu, Kaushik Roy:
PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-efficient ReRAM. CoRR abs/1912.11516 (2019)
- 2018
- [j73]Seungwon Min, Mohammad Alian, Wen-Mei Hwu, Nam Sung Kim:
Semi-Coherent DMA: An Alternative I/O Coherency Management for Embedded Systems. IEEE Comput. Archit. Lett. 17(2): 221-224 (2018) - [j72]José M. Cecilia, Antonio Llanes, José L. Abellán, Juan Gómez-Luna, Li-Wen Chang, Wen-Mei W. Hwu:
High-throughput Ant Colony Optimization on graphics processing units. J. Parallel Distributed Comput. 113: 261-274 (2018) - [j71]Onur Mutlu, Scott A. Mahlke, Thomas M. Conte, Wen-Mei W. Hwu:
Iterative Modulo Scheduling. IEEE Micro 38(1): 115-117 (2018) - [j70]Wen-Mei Hwu, Sanjay J. Patel:
Accelerator Architectures A Ten-Year Retrospective. IEEE Micro 38(6): 56-62 (2018) - [c171]Xiaofan Zhang, Junsong Wang, Chao Zhu, Yonghua Lin, Jinjun Xiong, Wen-Mei W. Hwu, Deming Chen:
AccDNN: An IP-Based DNN Generator for FPGAs. FCCM 2018: 210 - [c170]Sitao Huang, Mohamed El-Hadedy, Cong Hao, Qin Li, Vikram S. Mailthody, Ketan Date, Jinjun Xiong, Deming Chen, Rakesh Nagi, Wen-Mei Hwu:
Triangle Counting and Truss Decomposition using FPGA. HPEC 2018: 1-7 - [c169]Vikram S. Mailthody, Ketan Date, Zaid Qureshi, Carl Pearson, Rakesh Nagi, Jinjun Xiong, Wen-Mei Hwu:
Collaborative (CPU + GPU) Algorithms for Triangle Counting and Truss Decomposition. HPEC 2018: 1-7 - [c168]Xiaofan Zhang, Junsong Wang, Chao Zhu, Yonghua Lin, Jinjun Xiong, Wen-Mei W. Hwu, Deming Chen:
DNNBuilder: an automated tool for building high-performance DNN hardware accelerators for FPGAs. ICCAD 2018: 56 - [c167]Joao Ambrosi, Aayush Ankit, Rodrigo Antunes, Sai Rahul Chalamalasetti, Soumitra Chatterjee, Izzat El Hajj, Guilherme Fachini, Paolo Faraboschi, Martin Foltin, Sitao Huang, Wen-Mei Hwu, Gustavo Knuppe, Sunil Vishwanathpur Lakshminarasimha, Dejan S. Milojicic, Mohan Parthasarathy, Filipe Ribeiro, Lucas Rosa, Kaushik Roy, Plínio Silveira, John Paul Strachan:
Hardware-Software Co-Design for an Analog-Digital Accelerator for Machine Learning. ICRC 2018: 1-13 - [c166]Mert Hidayetoglu, Carl Pearson, Izzat El Hajj, Levent Gürel, Weng Cho Chew, Wen-Mei W. Hwu:
A Fast and Massively-Parallel Inverse Solver for Multiple-Scattering Tomographic Image Reconstruction. IPDPS 2018: 64-74 - [c165]Mohammad Alian, Seungwon Min, Hadi Asgharimoghaddam, Ashutosh Dhar, Dong Kai Wang, Thomas Roewer, Adam J. McPadden, Oliver O'Halloran, Deming Chen, Jinjun Xiong, Daehoon Kim, Wen-Mei W. Hwu, Nam Sung Kim:
Application-Transparent Near-Memory Processing Architecture with Memory Channel Network. MICRO 2018: 802-814 - [c164]Carl Pearson, I-Hsin Chung, Zehra Sura, Wen-Mei Hwu, Jinjun Xiong:
NUMA-Aware Data-Transfer Measurements for Power/NVLink Multi-GPU Systems. ISC Workshops 2018: 448-454 - [i7]Raymond A. Yeh, Jinjun Xiong, Wen-mei W. Hwu, Minh N. Do, Alexander G. Schwing:
Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts. CoRR abs/1803.11209 (2018) - [i6]Carl Pearson, Abdul Dakkak, Cheng Li, Sarah Hashash, Jinjun Xiong, Wen-Mei W. Hwu:
SCOPE: C3SR Systems Characterization and Benchmarking Framework. CoRR abs/1809.08311 (2018) - [i5]Bowen Cheng, Yunchao Wei, Rogério Schmidt Feris, Jinjun Xiong, Wen-mei W. Hwu, Thomas S. Huang, Honghui Shi:
Decoupled Classification Refinement: Hard False Positive Suppression for Object Detection. CoRR abs/1810.04002 (2018) - [i4]Bowen Cheng, Yunchao Wei, Jiahui Yu, Shiyu Chang, Jinjun Xiong, Wen-Mei W. Hwu, Thomas S. Huang, Humphrey Shi:
A Simple Non-i.i.d. Sampling Approach for Efficient Training and Better Generalization. CoRR abs/1811.09347 (2018) - [i3]Abdul Dakkak, Cheng Li, Simon Garcia De Gonzalo, Jinjun Xiong, Wen-Mei W. Hwu:
TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments. CoRR abs/1811.09732 (2018) - [i2]Abdul Dakkak, Cheng Li, Isaac Gelado, Jinjun Xiong, Wen-Mei W. Hwu:
Accelerating Reduction and Scan Using Tensor Core Units. CoRR abs/1811.09736 (2018) - [i1]Abdul Dakkak, Cheng Li, Abhishek Srivastava, Jinjun Xiong, Wen-Mei W. Hwu:
MLModelScope: Evaluate and Measure ML Models within AI Pipelines. CoRR abs/1811.09737 (2018) - 2017
- [j69]Nam Sung Kim, Deming Chen, Jinjun Xiong, Wen-mei W. Hwu:
Heterogeneous Computing Meets Near-Memory Acceleration and High-Level Synthesis in the Post-Moore Era. IEEE Micro 37(4): 10-18 (2017) - [j68]Izzat El Hajj, Thomas B. Jablin, Dejan S. Milojicic, Wen-Mei W. Hwu:
SAVI objects: sharing and virtuality incorporated. Proc. ACM Program. Lang. 1(OOPSLA): 45:1-45:24 (2017) - [c163]Sitao Huang, Gowthami Jayashri Manikandan, Anand Ramachandran, Kyle Rupnow, Wen-mei W. Hwu, Deming Chen:
Hardware Acceleration of the Pair-HMM Algorithm for DNA Variant Calling. FPGA 2017: 275-284 - [c162]Simon Garcia De Gonzalo, Simon D. Hammond, Christian R. Trott, Wen-Mei W. Hwu:
Revisiting Online Autotuning for Sparse-Matrix Vector Multiplication Kernels on Next-Generation Architectures. HPCC/SmartCity/DSS 2017: 72-80 - [c161]Ketan Date, Keven Feng, Rakesh Nagi, Jinjun Xiong, Nam Sung Kim, Wen-Mei W. Hwu:
Collaborative (CPU + GPU) algorithms for triangle counting and truss decomposition on the Minsky architecture: Static graph challenge: Subgraph isomorphism. HPEC 2017: 1-7 - [c160]Pedro Bruel, Sai Rahul Chalamalasetti, Chris I. Dalton, Izzat El Hajj, Alfredo Goldman, Catherine Graves, Wen-Mei W. Hwu, Phil Laplante, Dejan S. Milojicic, Geoffrey Ndu, John Paul Strachan:
Generalize or Die: Operating Systems Support for Memristor-Based Accelerators. ICRC 2017: 1-8 - [c159]Wen-mei W. Hwu, Izzat El Hajj, Simon Garcia De Gonzalo, Carl Pearson, Nam Sung Kim, Deming Chen, Jinjun Xiong, Zehra Sura:
Rebooting the Data Access Hierarchy of Computing Systems. ICRC 2017: 1-4 - [c158]Abdul Dakkak, Carl Pearson, Cheng Li, Wen-mei W. Hwu:
RAI: A Scalable Project Submission System for Parallel Programming Courses. IPDPS Workshops 2017: 315-322 - [c157]Wen-Mei W. Hwu:
Keynote: Architecture and software for emerging low-power systems. ISLPED 2017: 1 - [c156]Juan Gómez-Luna, Izzat El Hajj, Li-Wen Chang, Victor Garcia-Flores, Simon Garcia De Gonzalo, Thomas B. Jablin, Antonio J. Peña, Wen-mei W. Hwu:
Chai: Collaborative heterogeneous applications for integrated-architectures. ISPASS 2017: 43-54 - [c155]Raymond A. Yeh, Jinjun Xiong, Wen-Mei W. Hwu, Minh N. Do, Alexander G. Schwing:
Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts. NIPS 2017: 1912-1922 - [c154]Francesc Lordan, Rosa M. Badia, Wen-Mei Hwu:
Enabling GPU Support for the COMPSs-Mobile Framework. WACCPD@SC 2017: 83-102 - [c153]Li-Wen Chang, Juan Gómez-Luna, Izzat El Hajj, Sitao Huang, Deming Chen, Wen-mei W. Hwu:
Collaborative Computing for Heterogeneous Integrated Systems. ICPE 2017: 385-388 - 2016
- [j67]Yun Heo, Anand Ramachandran, Wen-mei W. Hwu, Jian Ma, Deming Chen:
BLESS 2: accurate, memory-efficient and fast error correction method. Bioinform. 32(15): 2369-2371 (2016) - [j66]Deming Chen, Jason Cong, Swathi T. Gurumani, Wen-mei W. Hwu, Kyle Rupnow, Zhiru Zhang:
Platform choices and design demands for IoT platforms: cost, power, and performance tradeoffs. IET Cyber-Phys. Syst.: Theory & Appl. 1(1): 70-77 (2016) - [j65]Onur Mutlu, Rich Belgard, Thomas R. Gross, Norman P. Jouppi, John L. Hennessy, Steven A. Przybylski, Chris Rowen, Yale N. Patt, Wen-mei W. Hwu, Stephen W. Melvin, Michael Shebanow, Tse-Yu Yeh, Andy Wolfe:
Common Bonds: MIPS, HPS, Two-Level Branch Prediction, and Compressed Code RISC Processor. IEEE Micro 36(4): 70-85 (2016) - [j64]Ying Chen, Tan Nguyen, Yao Chen, Swathi T. Gurumani, Yun Liang, Kyle Rupnow, Jason Cong, Wen-mei W. Hwu, Deming Chen:
FCUDA-HB: Hierarchical and Scalable Bus Architecture Generation on FPGAs With the FCUDA Flow. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 35(12): 2032-2045 (2016) - [j63]Juan Gómez-Luna, I-Jui Sung, Li-Wen Chang, José María González-Linares, Nicolás Guil, Wen-mei W. Hwu:
In-Place Matrix Transposition on GPUs. IEEE Trans. Parallel Distributed Syst. 27(3): 776-788 (2016) - [c152]Izzat El Hajj, Alexander Merritt, Gerd Zellweger, Dejan S. Milojicic, Reto Achermann, Paolo Faraboschi, Wen-mei W. Hwu, Timothy Roscoe, Karsten Schwan:
SpaceJMP: Programming with Multiple Virtual Address Spaces. ASPLOS 2016: 353-368 - [c151]Li-Wen Chang, Hee-Seok Kim, Wen-mei W. Hwu:
DySel: Lightweight Dynamic Selection for Kernel-based Data-parallel Programming Model. ASPLOS 2016: 667-680 - [c150]Gowthami Jayashri Manikandan, Sitao Huang, Kyle Rupnow, Wen-mei W. Hwu, Deming Chen:
Acceleration of the Pair-HMM Algorithm for DNA Variant Calling. FCCM 2016: 137 - [c149]Subho S. Banerjee, Arjun P. Athreya, Liudmila S. Mainzer, C. Victor Jongeneel, Wen-mei W. Hwu, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer:
Efficient and Scalable Workflows for Genomic Analyses. DIDC@HPDC 2016: 27-36 - [c148]Wen-mei W. Hwu:
AsHES 2016 Keynote. IPDPS Workshops 2016: 610 - [c147]Abdul Dakkak, Carl Pearson, Wen-mei W. Hwu:
WebGPU: A Scalable Online Development Platform for GPU Programming Courses. IPDPS Workshops 2016: 942-949 - [c146]Li-Wen Chang, Izzat El Hajj, Christopher I. Rodrigues, Juan Gómez-Luna, Wen-mei W. Hwu:
Efficient kernel synthesis for performance portable programming. MICRO 2016: 12:1-12:13 - [c145]Izzat El Hajj, Juan Gómez-Luna, Cheng Li, Li-Wen Chang, Dejan S. Milojicic, Wen-mei W. Hwu:
KLAP: Kernel launch aggregation and promotion for optimizing dynamic parallelism. MICRO 2016: 13:1-13:12 - [c144]Li-Wen Chang, Izzat El Hajj, Hee-Seok Kim, Juan Gómez-Luna, Abdul Dakkak, Wen-mei W. Hwu:
A programming system for future proofing performance critical libraries. PPoPP 2016: 32:1-32:2 - [c143]Sao-Jie Chen, Grace Liu, Hsin-Ping Yang, Cheng-Hao Luo, Wen-Mei W. Hwu:
Design of a power-efficient ARM processor with a timing-error detection and correction mechanism. SoCC 2016: 217-222 - [e4]Ayal Zaks, Bilha Mendelson, Lawrence Rauchwerger, Wen-mei W. Hwu:
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, PACT 2016, Haifa, Israel, September 11-15, 2016. ACM 2016, ISBN 978-1-4503-4121-9 [contents] - 2015
- [j62]Hiroyuki Takizawa, Shoichi Hirasawa, Makoto Sugawara, Isaac Gelado, Hiroaki Kobayashi, Wen-mei W. Hwu:
Optimized Data Transfers Based on the OpenCL Event Management Mechanism. Sci. Program. 2015: 576498:1-576498:16 (2015) - [j61]Javier Cabezas, Isaac Gelado, John E. Stone, Nacho Navarro, David Blair Kirk, Wen-mei W. Hwu:
Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications. IEEE Trans. Parallel Distributed Syst. 26(5): 1405-1418 (2015) - [c142]Hee-Seok Kim, Izzat El Hajj, John A. Stratton, Steven S. Lumetta, Wen-mei W. Hwu:
Locality-centric thread scheduling for bulk-synchronous programming models on CPU architectures. CGO 2015: 257-268 - [c141]Anand Ramachandran, Yun Heo, Wen-mei W. Hwu, Jian Ma, Deming Chen:
FPGA accelerated DNA error correction. DATE 2015: 1371-1376 - [c140]Juan Gómez-Luna, Li-Wen Chang, I-Jui Sung, Wen-mei W. Hwu, Nicolás Guil:
In-Place Data Sliding Algorithms for Many-Core Architectures. ICPP 2015: 210-219 - [c139]Javier Cabezas, Lluís Vilanova, Isaac Gelado, Thomas B. Jablin, Nacho Navarro, Wen-mei W. Hwu:
Automatic Parallelization of Kernels in Shared-Memory Multi-GPU Nodes. ICS 2015: 3-13 - [c138]Javier Cabezas, Marc Jordà, Isaac Gelado, Nacho Navarro, Wen-mei W. Hwu:
GPU-SM: shared memory multi-GPU programming. GPGPU@PPoPP 2015: 13-24 - [c137]Nicholas Haydel, Sandra Gesing, Ian J. Taylor, Gregory R. Madey, Abdul Dakkak, Simon Garcia De Gonzalo, Wen-mei W. Hwu:
Enhancing the Usability and Utilization of Accelerated Architectures via Docker. UCC 2015: 361-367 - 2014
- [j60]Yun Heo, Xiaolong Wu, Deming Chen, Jian Ma, Wen-mei W. Hwu:
BLESS: Bloom filter-based error correction solution for high-throughput sequencing reads. Bioinform. 30(10): 1354-1362 (2014) - [j59]Wen-mei W. Hwu:
What is ahead for parallel computing. J. Parallel Distributed Comput. 74(7): 2574-2581 (2014) - [c136]Javier Cabezas, Lluís Vilanova, Isaac Gelado, Thomas B. Jablin, Nacho Navarro, Wen-mei W. Hwu:
Automatic execution of single-GPU computations across multiple GPUs. PACT 2014: 467-468 - [c135]Xuhao Chen, Shengzhao Wu, Li-Wen Chang, Wei-Sheng Huang, Carl Pearson, Zhiying Wang, Wen-mei W. Hwu:
Adaptive Cache Bypass and Insertion for Many-core Accelerators. MES 2014: 1-8 - [c134]Xuhao Chen, Li-Wen Chang, Christopher I. Rodrigues, Jie Lv, Zhiying Wang, Wen-mei W. Hwu:
Adaptive Cache Management for Energy-Efficient GPU Computing. MICRO 2014: 343-355 - [c133]I-Jui Sung, Juan Gómez-Luna, José María González-Linares, Nicolás Guil, Wen-mei W. Hwu:
In-place transposition of rectangular matrices on accelerators. PPoPP 2014: 207-218 - [c132]Christopher I. Rodrigues, Thomas B. Jablin, Abdul Dakkak, Wen-mei W. Hwu:
Triolet: a programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing. PPoPP 2014: 247-258 - [c131]Guido Juckeland, William C. Brantley, Sunita Chandrasekaran, Barbara M. Chapman, Shuai Che, Mathew E. Colgrove, Huiyu Feng, Alexander Grund, Robert Henschel, Wen-mei W. Hwu, Huian Li, Matthias S. Müller, Wolfgang E. Nagel, Maxim Perminov, Pavel Shelepugin, Kevin Skadron, John A. Stratton, Alexey Titov, Ke Wang, G. Matthijs van Waveren, Brian Whitney, Sandra Wienke, Rengan Xu, Kalyan Kumaran:
SPEC ACCEL: A Standard Application Suite for Measuring Hardware Accelerator Performance. PMBS@SC 2014: 46-67 - [p1]Li-Wen Chang, Wen-mei W. Hwu:
A Guide for Implementing Tridiagonal Solvers on GPUs. Numerical Computations with GPUs 2014: 29-44 - 2013
- [j58]Ian C. Atkinson, Geng (Daniel) Liu, Nady Obeid, Keith R. Thulborn, Wen-mei W. Hwu:
Rapid computation of sodium bioscales using GPU-accelerated image reconstruction. Int. J. Imaging Syst. Technol. 23(1): 29-35 (2013) - [j57]Jiading Gai, Nady Obeid, Joseph L. Holtrop, Xiaolong Wu, Fan Lam, Maojing Fu, Justin P. Haldar, Wen-mei W. Hwu, Zhi-Pei Liang, Bradley P. Sutton:
More IMPATIENT: A gridding-accelerated Toeplitz-based strategy for non-Cartesian high-resolution 3D MRI on GPUs. J. Parallel Distributed Comput. 73(5): 686-697 (2013) - [j56]Alexandros Papakonstantinou, Karthik Gururaj, John A. Stratton, Deming Chen, Jason Cong, Wen-mei W. Hwu:
Efficient compilation of CUDA kernels for high-performance computing on FPGAs. ACM Trans. Embed. Comput. Syst. 13(2): 25:1-25:26 (2013) - [j55]Xiaohuang Huang, Christopher I. Rodrigues, Stephen Jones, Ian Buck, Wen-mei W. Hwu:
Scalable SIMD-parallel memory allocation for many-core machines. J. Supercomput. 64(3): 1008-1020 (2013) - [c130]Ivan Tanasic, Lluís Vilanova, Marc Jordà, Javier Cabezas, Isaac Gelado, Nacho Navarro, Wen-mei W. Hwu:
Comparison based sorting for systems with multiple GPUs. GPGPU@ASPLOS 2013: 1-11 - [c129]Alexandros Papakonstantinou, Deming Chen, Wen-mei W. Hwu, Jason Cong, Yun Liang:
Throughput-oriented kernel porting onto FPGAs. DAC 2013: 11:1-11:10 - [c128]Hiroyuki Takizawa, Makoto Sugawara, Shoichi Hirasawa, Isaac Gelado, Hiroaki Kobayashi, Wen-mei W. Hwu:
clMPI: An OpenCL Extension for Interoperation with the Message Passing Interface. IPDPS Workshops 2013: 1138-1148 - [c127]Wen-mei W. Hwu:
Rethinking computer architecture for throughput computing. ICSAMOS 2013 - 2012
- [b2]Hyesoon Kim, Richard W. Vuduc, Sara S. Baghsorkhi, JeeWhan Choi, Wen-mei W. Hwu:
Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU). Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers 2012, ISBN 978-3-031-00609-8 - [j54]Xiaolong Wu, Yun Heo, Izzat El Hajj, Wen-mei W. Hwu, Deming Chen, Jian Ma:
TIGER: tiled iterative genome assembler. BMC Bioinform. 13(S-19): S18 (2012) - [j53]John A. Stratton, Christopher I. Rodrigues, I-Jui Sung, Li-Wen Chang, Nasser Anssari, Geng (Daniel) Liu, Wen-mei W. Hwu, Nady Obeid:
Algorithm and Data Optimization Techniques for Scaling to Massively Threaded Systems. Computer 45(8): 26-32 (2012) - [j52]I-Jui Sung, Nasser Anssari, John A. Stratton, Wen-mei W. Hwu:
Data Layout Transformation Exploiting Memory-Level Parallelism in Structured Grid Many-Core Applications. Int. J. Parallel Program. 40(1): 4-24 (2012) - [c126]Hee-Seok Kim, Minwook Ahn, John A. Stratton, Wen-mei W. Hwu:
Design evaluation of OpenCL compiler framework for Coarse-Grained Reconfigurable Arrays. FPT 2012: 313-320 - [c125]Kai-Wei Chang, Biplab Deka, Wen-mei W. Hwu, Dan Roth:
Efficient Pattern-Based Time Series Classification on GPU. ICDM 2012: 131-140 - [c124]Sara S. Baghsorkhi, Isaac Gelado, Matthieu Delahaye, Wen-mei W. Hwu:
Efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors. PPoPP 2012: 23-34 - [c123]Li-Wen Chang, John A. Stratton, Hee-Seok Kim, Wen-mei W. Hwu:
A scalable, numerically stable, high-performance tridiagonal solver using GPUs. SC 2012: 27 - 2011
- [j51]Michael T. Showerman, Jeremy Enos, Craig P. Steffen, Sean Treichler, William Gropp, Wen-mei W. Hwu:
EcoG: A Power-Efficient GPU Cluster Architecture for Scientific Computing. Comput. Sci. Eng. 13(2): 83-87 (2011) - [c122]Alexandros Papakonstantinou, Yun Liang, John A. Stratton, Karthik Gururaj, Deming Chen, Wen-mei W. Hwu, Jason Cong:
Multilevel Granularity Parallelism Synthesis on FPGAs. FCCM 2011: 178-185 - [c121]Li-Wen Chang, Men-Tzung Lo, Nasser Anssari, Ke-Hsin Hsu, Norden E. Huang, Wen-mei W. Hwu:
Parallel implementation of Multi-dimensional Ensemble Empirical Mode Decomposition. ICASSP 2011: 1621-1624 - [c120]Hee-Seok Kim, Shengzhao Wu, Li-Wen Chang, Wen-mei W. Hwu:
A Scalable Tridiagonal Solver for GPUs. ICPP 2011: 444-453 - [c119]Per Stenström, Doug Burger, Wen-mei W. Hwu, Vipin Kumar, Kunle Olukotun, David A. Padua, Burton Smith:
Panel Statement. IPDPS 2011: 877 - [c118]Xiaolong Wu, Jiading Gai, Fan Lam, Maojing Fu, Justin P. Haldar, Yue Zhuo, Zhi-Pei Liang, Wen-mei W. Hwu, Bradley P. Sutton:
Impatient MRI: Illinois Massively Parallel Acceleration Toolkit for image reconstruction with enhanced throughput in MRI. ISBI 2011: 69-72 - [c117]Xiaolong Wu, Yue Zhuo, Jiading Gai, Fan Lam, Maojing Fu, Justin P. Haldar, Wen-mei W. Hwu, Zhi-Pei Liang, Bradley P. Sutton:
Advanced MRI reconstruction toolbox with accelerating on GPU. Parallel Processing for Imaging Applications 2011: 78720Q - [r2]Wen-mei W. Hwu:
Superscalar Processors. Encyclopedia of Parallel Computing 2011: 1962-1966 - 2010
- [b1]David Blair Kirk, Wen-mei W. Hwu:
Programming Massively Parallel Processors - A Hands-on Approach. Morgan Kaufmann 2010, ISBN 978-0-12-381472-2, pp. I-XVIII, 1-258 - [j50]Volodymyr V. Kindratenko, Robert B. Wilhelmson, Robert J. Brunner, Todd J. Martínez, Wen-mei W. Hwu:
High-Performance Computing with Accelerators. Comput. Sci. Eng. 12(4): 12-16 (2010) - [c116]Xiaohuang Huang, Christopher I. Rodrigues, Stephen Jones, Ian Buck, Wen-mei W. Hwu:
XMalloc: A Scalable Lock-free Dynamic Memory Allocator for Many-core Machines. CIT 2010: 1134-1139 - [c115]Xiaolong Wu, Nady Obeid, Wen-mei W. Hwu:
Exploiting More Parallelism from Applications Having Generalized Reductions on GPU Architectures. CIT 2010: 1175-1180 - [c114]Wen-mei W. Hwu:
Raising the level of many-core programming with compiler technology: meeting a grand challenge. PACT 2010: 5-6 - [c113]I-Jui Sung, John A. Stratton, Wen-mei W. Hwu:
Data layout transformation exploiting memory-level parallelism in structured grid many-core applications. PACT 2010: 513-522 - [c112]Isaac Gelado, Javier Cabezas, Nacho Navarro, John E. Stone, Sanjay J. Patel, Wen-mei W. Hwu:
An asymmetric distributed shared memory model for heterogeneous parallel systems. ASPLOS 2010: 347-358 - [c111]John A. Stratton, Vinod Grover, Jaydeep Marathe, Bastiaan Aarts, Mike Murphy, Ziang Hu, Wen-mei W. Hwu:
Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs. CGO 2010: 111-119 - [c110]Lijuan Luo, Martin D. F. Wong, Wen-mei W. Hwu:
An effective GPU implementation of breadth-first search. DAC 2010: 52-55 - [c109]Yue Zhuo, Xiaolong Wu, Justin P. Haldar, Wen-mei W. Hwu, Zhi-Pei Liang, Bradley P. Sutton:
Accelerating iterative field-compensated MR image reconstruction on GPUs. ISBI 2010: 820-823 - [c108]Stephen M. Kofsky, Daniel R. Johnson, John A. Stratton, Wen-mei W. Hwu, Sanjay J. Patel, Steven S. Lumetta:
Implementing a GPU Programming Model on a Non-GPU Accelerator Architecture. ISCA Workshops 2010: 40-51 - [c107]Sara S. Baghsorkhi, Matthieu Delahaye, Sanjay J. Patel, William D. Gropp, Wen-mei W. Hwu:
An adaptive performance modeling tool for GPU architectures. PPoPP 2010: 105-114
2000 – 2009
- 2009
- [j49]Wen-mei W. Hwu, Christopher I. Rodrigues, Shane Ryoo, John A. Stratton:
Compute Unified Device Architecture Application Suitability. Comput. Sci. Eng. 11(3): 16-26 (2009) - [j48]Hillery C. Hunter, Erik M. Nystrom, Daniel A. Connors, Wen-mei W. Hwu:
Hardware-compiler co-design for adjustable data power savings. Microprocess. Microsystems 33(4): 244-253 (2009) - [j47]Dennis J. Lin, Xiaohuang Huang, Quang H. Nguyen, Joshua Blackburn, Christopher I. Rodrigues, Thomas S. Huang, Minh N. Do, Sanjay J. Patel, Wen-Mei W. Hwu:
The parallelization of video processing. IEEE Signal Process. Mag. 26(6): 103-112 (2009) - [c106]John E. Stone, Jan Saam, David J. Hardy, Kirby L. Vandivort, Wen-mei W. Hwu, Klaus Schulten:
High performance computation and interactive display of molecular orbitals on GPUs and multi-core CPUs. GPGPU 2009: 9-18 - [c105]Albert Sidelnik, I-Jui Sung, Wanmin Wu, María Jesús Garzarán, Wen-mei W. Hwu, Klara Nahrstedt, David A. Padua, Sanjay J. Patel:
Optimization of tele-immersion codes. GPGPU 2009: 85-93 - [c104]Volodymyr V. Kindratenko, Jeremy Enos, Guochun Shi, Michael T. Showerman, Galen Wesley Arnold, John E. Stone, James C. Phillips, Wen-mei W. Hwu:
GPU clusters for high-performance computing. CLUSTER 2009: 1-8 - [c103]Alexandros Papakonstantinou, Karthik Gururaj, John A. Stratton, Deming Chen, Jason Cong, Wen-mei W. Hwu:
High-performance CUDA kernel execution on FPGAs. ICS 2009: 515-516 - [c102]Wen-mei W. Hwu:
Many-core parallel computing - Can compilers and tools do the heavy lifting? IPDPS 2009: 1 - [c101]Elijah Roberts, John E. Stone, Leonardo Sepulveda, Wen-mei W. Hwu, Zaida Luthey-Schulten:
Long time-scale simulations of in vivo diffusion using GPU hardware. IPDPS 2009: 1-8 - [c100]Wen-mei W. Hwu, Deepthi Nandakumar, Justin P. Haldar, Ian C. Atkinson, Bradley P. Sutton, Zhi-Pei Liang, Keith R. Thulborn:
Accelerating MR Image Reconstruction on GPUs. ISBI 2009: 1283-1286 - [c99]Alexandros Papakonstantinou, Karthik Gururaj, John A. Stratton, Deming Chen, Jason Cong, Wen-mei W. Hwu:
FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs. SASP 2009: 35-42 - 2008
- [j46]David Yeh, Li-Shiuan Peh, Shekhar Borkar, John A. Darringer, Anant Agarwal, Wen-mei W. Hwu:
Thousand-Core Chips [Roundtable]. IEEE Des. Test Comput. 25(3): 272-278 (2008) - [j45]Wen-mei W. Hwu, Kurt Keutzer, Timothy G. Mattson:
The Concurrency Challenge. IEEE Des. Test Comput. 25(4): 312-320 (2008) - [j44]Sam S. Stone, Justin P. Haldar, Stephanie C. Tsao, Wen-mei W. Hwu, Bradley P. Sutton, Zhi-Pei Liang:
Accelerating advanced MRI reconstructions on GPUs. J. Parallel Distributed Comput. 68(10): 1307-1318 (2008) - [j43]Shane Ryoo, Christopher I. Rodrigues, Sam S. Stone, John A. Stratton, Sain-Zee Ueng, Sara S. Baghsorkhi, Wen-mei W. Hwu:
Program optimization carving for GPU computing. J. Parallel Distributed Comput. 68(10): 1389-1401 (2008) - [j42]Sanjay J. Patel, Wen-mei W. Hwu:
Guest Editors' Introduction: Accelerator Architectures. IEEE Micro 28(4): 4-12 (2008) - [c98]Sam S. Stone, Justin P. Haldar, Stephanie C. Tsao, Wen-mei W. Hwu, Zhi-Pei Liang, Bradley P. Sutton:
Accelerating advanced MRI reconstructions on GPUs. Conf. Computing Frontiers 2008: 261-272 - [c97]Christopher I. Rodrigues, David J. Hardy, John E. Stone, Klaus Schulten, Wen-mei W. Hwu:
GPU acceleration of cutoff pair potentials for molecular modeling applications. Conf. Computing Frontiers 2008: 273-282 - [c96]Shane Ryoo, Christopher I. Rodrigues, Sam S. Stone, Sara S. Baghsorkhi, Sain-Zee Ueng, John A. Stratton, Wen-mei W. Hwu:
Program optimization space pruning for a multithreaded GPU. CGO 2008: 195-204 - [c95]Elaine Wah, Erik Johnson, Loretta Auvil, Umesh Thakkar, Wen-mei W. Hwu, David Blair Kirk, Thom H. Dunning, Sharon C. Glotzer:
Visualization and Analysis of GPU Summer School Applicants and Participants. eScience 2008: 362-363 - [c94]Isaac Gelado, John H. Kelm, Shane Ryoo, Steven S. Lumetta, Nacho Navarro, Wen-mei W. Hwu:
CUBA: an architecture for efficient CPU/co-processor data communication. ICS 2008: 299-308 - [c93]Sain-Zee Ueng, Melvin Lathara, Sara S. Baghsorkhi, Wen-mei W. Hwu:
CUDA-Lite: Reducing GPU Programming Complexity. LCPC 2008: 1-15 - [c92]John A. Stratton, Sam S. Stone, Wen-mei W. Hwu:
MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs. LCPC 2008: 16-30 - [c91]Shane Ryoo, Christopher I. Rodrigues, Sara S. Baghsorkhi, Sam S. Stone, David Blair Kirk, Wen-mei W. Hwu:
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. PPoPP 2008: 73-82 - [c90]Alexandros Papakonstantinou, Deming Chen, Wen-mei W. Hwu:
Application Acceleration with the Explicitly Parallel Operations System - the EPOS Processor. SASP 2008: 20-25 - 2007
- [j41]Ravishankar K. Iyer, Zbigniew Kalbarczyk, Karthik Pattabiraman, William Healey, Wen-mei W. Hwu, Peter Klemperer, Reza Farivar:
Toward Application-Aware Security and Reliability. IEEE Secur. Priv. 5(1): 57-62 (2007) - [j40]Shane Ryoo, Sain-Zee Ueng, Christopher I. Rodrigues, Robert E. Kidd, Matthew I. Frank, Wen-mei W. Hwu:
Automatic Discovery of Coarse-Grained Parallelism in Media Applications. Trans. High Perform. Embed. Archit. Compil. 1: 194-213 (2007) - [c89]John H. Kelm, Isaac Gelado, Mark J. Murphy, Nacho Navarro, Steven S. Lumetta, Wen-mei W. Hwu:
CIGAR: Application Partitioning for a CPU/Coprocessor Architecture. PACT 2007: 317-326 - [c88]Lauren Sarno, Wen-mei W. Hwu, Craig Lund, Markus Levy, James R. Larus, James Reinders, Gordon Cameron, Chris Lennard, Takashi Yoshimori:
Corezilla: Build and Tame the Multicore Beast? DAC 2007: 632-633 - [c87]Wen-mei W. Hwu, Shane Ryoo, Sain-Zee Ueng, John H. Kelm, Isaac Gelado, Sam S. Stone, Robert E. Kidd, Sara S. Baghsorkhi, Aqeel Mahesri, Stephanie C. Tsao, Nacho Navarro, Steven S. Lumetta, Matthew I. Frank, Sanjay J. Patel:
Implicitly Parallel Programming Models for Thousand-Core Microprocessors. DAC 2007: 754-759 - [c86]Shane Ryoo, Christopher I. Rodrigues, Wen-mei W. Hwu:
Iteration Disambiguation for Parallelism Identification in Time-Sliced Applications. LCPC 2007: 110-124 - 2006
- [j39]Ronald D. Barnes, Shane Ryoo, Wen-mei W. Hwu:
Tolerating Cache-Miss Latency with Multipass Pipelines. IEEE Micro 26(1): 40-47 (2006) - [j38]Ronald D. Barnes, John W. Sias, Erik M. Nystrom, Sanjay J. Patel, Nacho Navarro, Wen-mei W. Hwu:
Beating In-Order Stalls with "Flea-Flicker" Two-Pass Pipelining. IEEE Trans. Computers 55(1): 18-33 (2006) - 2005
- [j37]Wen-mei W. Hwu, Krishna V. Palem:
Guest Editors' Introduction. IEEE Trans. Computers 54(10): 1185-1187 (2005) - [c85]Wen-mei W. Hwu, Sanjay J. Patel:
The Future of Computer Architecture Research: An Industrial Perspective. HPCA 2005: 264 - [c84]Ronald D. Barnes, Shane Ryoo, Wen-mei W. Hwu:
"Flea-flicker" Multipass Pipelining: An Alternative to the High-Power Out-of-Order Offense. MICRO 2005: 319-330 - [e3]Thomas M. Conte, Nacho Navarro, Wen-mei W. Hwu, Mateo Valero, Theo Ungerer:
High Performance Embedded Architectures and Compilers, First International Conference, HiPEAC 2005, Barcelona, Spain, November 17-18, 2005, Proceedings. Lecture Notes in Computer Science 3793, Springer 2005, ISBN 3-540-30317-0 [contents] - 2004
- [c83]John W. Sias, Sain-Zee Ueng, Geoff A. Kent, Ian M. Steiner, Erik M. Nystrom, Wen-mei W. Hwu:
Field-testing IMPACT EPIC research results in Itanium 2. ISCA 2004: 26-39 - [c82]Lakshmi N. Chakrapani, John C. Gyllenhaal, Wen-mei W. Hwu, Scott A. Mahlke, Krishna V. Palem, Rodric M. Rabbah:
Trimaran: An Infrastructure for Research in Instruction-Level Parallelism. LCPC 2004: 32-41 - [c81]Erik M. Nystrom, Hong-Seok Kim, Wen-mei W. Hwu:
Importance of heap specialization in pointer analysis. PASTE 2004: 43-48 - [c80]Erik M. Nystrom, Hong-Seok Kim, Wen-mei W. Hwu:
Bottom-Up and Top-Down Context-Sensitive Summary-Based Pointer Analysis. SAS 2004: 165-180 - 2003
- [j36]Jeffrey P. Monks, Jean-Pierre Ebert, Wen-mei W. Hwu, Adam Wolisz:
Energy saving and capacity improvement potential of power control in multi-hop wireless networks. Comput. Networks 41(3): 313-330 (2003) - [c79]Ronald D. Barnes, Erik M. Nystrom, John W. Sias, Sanjay J. Patel, Nacho Navarro, Wen-mei W. Hwu:
Beating in-order stalls with "flea-flicker" two-pass pipelining. MICRO 2003: 387-398 - [e2]Richard Johnson, Tom Conte, Wen-mei W. Hwu:
1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 23-26 March 2003, San Francisco, CA, USA. IEEE Computer Society 2003, ISBN 0-7695-1913-X [contents] - 2002
- [c78]Hillery C. Hunter, Wen-mei W. Hwu:
Code coverage and input variability: effects on architecture and compiler research. CASES 2002: 79-87 - [c77]Ronald D. Barnes, Erik M. Nystrom, Matthew C. Merten, Wen-mei W. Hwu:
Vacuum packing: extracting hardware-detected program phases for post-link optimization. MICRO 2002: 233-244 - 2001
- [j35]Wen-Mei W. Hwu, David I. August, John W. Sias:
Program decision logic optimization using predication and control speculation. Proc. IEEE 89(11): 1660-1675 (2001) - [j34]Matthew C. Merten, Andrew R. Trick, Ronald D. Barnes, Erik M. Nystrom, Christopher N. George, John C. Gyllenhaal, Wen-mei W. Hwu:
An Architectural Framework for Runtime Optimization. IEEE Trans. Computers 50(6): 567-589 (2001) - [c76]Erik M. Nystrom, Ronald D. Barnes, Matthew C. Merten, Wen-mei W. Hwu:
Code Reordering and Speculation Support for Dynamic Optimization System. IEEE PACT 2001: 163-174 - [c75]Jeffrey P. Monks, Vaduvur Bharghavan, Wen-mei W. Hwu:
A Power Controlled Multiple Access Protocol for Wireless Packet Networks. INFOCOM 2001: 219-228 - [c74]Jeffrey P. Monks, Jean-Pierre Ebert, Adam Wolisz, Wen-mei W. Hwu:
A Study of the Energy Saving and Capacity Improvement Potential of Power Control in Multi-Hop Wireless Networks. LCN 2001: 550-559 - [c73]Matthew C. Merten, Wen-mei W. Hwu:
Modulo schedule buffers. MICRO 2001: 138-149 - [c72]John W. Sias, Hillery C. Hunter, Wen-mei W. Hwu:
Enhancing loop buffering of media and telecommunications applications using low-overhead predication. MICRO 2001: 262-273 - 2000
- [c71]Daniel A. Connors, Hillery C. Hunter, Ben-Chung Cheng, Wen-mei W. Hwu:
Hardware Support for Dynamic Management of Compiler-Directed Computation Reuse. ASPLOS 2000: 222-233 - [c70]Matthew C. Merten, Andrew R. Trick, Erik M. Nystrom, Ronald D. Barnes, Wen-mei W. Hwu:
A hardware mechanism for dynamic extraction and relayout of program hot spots. ISCA 2000: 59-70 - [c69]Jeffrey P. Monks, Vaduvur Bharghavan, Wen-mei W. Hwu:
Transmission Power Control for Multiple Access Wireless Packet Networks. LCN 2000: 12-21 - [c68]John W. Sias, Wen-mei W. Hwu, David I. August:
Accurate and efficient predicate analysis with binary decision diagrams. MICRO 2000: 112-123 - [c67]Ben-Chung Cheng, Wen-mei W. Hwu:
Modular interprocedural pointer analysis using access paths: design, implementation, and evaluation. PLDI 2000: 57-69
1990 – 1999
- 1999
- [j33]Thomas M. Conte, Wen-mei W. Hwu, Mark Smotherman:
Editor's Introduction. Int. J. Parallel Program. 27(5): 325-326 (1999) - [j32]David I. August, Wen-mei W. Hwu, Scott A. Mahlke:
The Partial Reverse If-Conversion Framework for Balancing Control Flow and Predication. Int. J. Parallel Program. 27(5): 381-423 (1999) - [j31]Thomas M. Conte, Wen-mei W. Hwu, Mark Smotherman:
Editors' Introduction. Int. J. Parallel Program. 27(6): 425-426 (1999) - [j30]Teresa L. Johnson, Daniel A. Connors, Matthew C. Merten, Wen-mei W. Hwu:
Run-Time Cache Bypassing. IEEE Trans. Computers 48(12): 1338-1354 (1999) - [c66]Daniel A. Connors, Jean-Michel Puiatti, David I. August, Kevin M. Crozier, Wen-mei W. Hwu:
An Architecture Framework for Introducing Predicated Execution into Embedded Microprocessors. Euro-Par 1999: 1301-1311 - [c65]Matthew C. Merten, Andrew R. Trick, Christopher N. George, John C. Gyllenhaal, Wen-mei W. Hwu:
A Hardware-Driven Profiling Scheme for Identifying Program Hot Spots to Support Runtime Optimization. ISCA 1999: 136-147 - [c64]David I. August, John W. Sias, Jean-Michel Puiatti, Scott A. Mahlke, Daniel A. Connors, Kevin M. Crozier, Wen-mei W. Hwu:
The Program Decision Logic Approach to Predicated Execution. ISCA 1999: 208-219 - [c63]Ben-Chung Cheng, Wen-mei W. Hwu:
An Empirical Study of Function Pointers Using SPEC Benchmarks. LCPC 1999: 490-493 - [c62]Daniel A. Connors, Wen-mei W. Hwu:
Compiler-Directed Dynamic Computation Reuse: Rationale and Initial Results. MICRO 1999: 158-169 - [c61]Le-Chun Wu, Rajiv Mirani, Harish Patil, Bruce Olsen, Wen-mei W. Hwu:
A New Framework for Debugging Globally Optimized Code. PLDI 1999: 181-191 - [r1]Daniel A. Connors, Wen-mei W. Hwu:
Architecture. The VLSI Handbook 1999 - 1998
- [j29]Wen-mei W. Hwu:
Introduction to Predicated Execution. Computer 31: 49-50 (1998) - [j28]Steve Beaty, Wen-mei W. Hwu:
Foreword to the Special Issue. Int. J. Parallel Program. 26(4): 345-347 (1998) - [j27]John C. Gyllenhaal, Wen-mei W. Hwu, B. Ramakrishna Rau:
Optimization of Machine Descriptions for Efficient Use. Int. J. Parallel Program. 26(4): 417-447 (1998) - [j26]Thomas M. Conte, Mary Ann Hirsch, Wen-mei W. Hwu:
Combining Trace Sampling with Single Pass Methods for Efficient Cache Simulation. IEEE Trans. Computers 47(6): 714-720 (1998) - [c60]Brian L. Deitrich, Ben-Chung Cheng, Wen-mei W. Hwu:
Improving Static Branch Prediction in a Compiler. IEEE PACT 1998: 214-221 - [c59]Teresa L. Johnson, Daniel A. Connors, Wen-mei W. Hwu:
Run-Time Adaptive Cache Management. HICSS (7) 1998: 774-775 - [c58]Wen-mei W. Hwu, Yale N. Patt:
Retrospective: HPSm, a High Performance Restricted Data Flow Architecture Having Minimal Functionality. 25 Years ISCA: Retrospectives and Reprints 1998: 43-44 - [c57]Wen-mei W. Hwu:
Retrospective: IMPACT: An Architectural Framework for Multiple-Instruction Issue. 25 Years ISCA: Retrospectives and Reprints 1998: 77-79 - [c56]David I. August, Daniel A. Connors, Scott A. Mahlke, John W. Sias, Kevin M. Crozier, Ben-Chung Cheng, Patrick R. Eaton, Qudus B. Olaniran, Wen-mei W. Hwu:
Integrated Predicated and Speculative Execution in the IMPACT EPIC Architecture. ISCA 1998: 227-237 - [c55]Wen-mei W. Hwu, Yale N. Patt:
HPSm, a High Performance Restricted Data Flow Architecture Having Minimal Functionality. 25 Years ISCA: Retrospectives and Reprints 1998: 300-308 - [c54]Pohua P. Chang, Scott A. Mahlke, William Y. Chen, Nancy J. Warter, Wen-mei W. Hwu:
IMPACT: An Architectural Framework for Multiple-Instruction-Issue Processors. 25 Years ISCA: Retrospectives and Reprints 1998: 408-417 - [c53]Ben-Chung Cheng, Daniel A. Connors, Wen-mei W. Hwu:
Compiler-Directed Early Load-Address Generation. MICRO 1998: 138-147 - 1997
- [j25]Cheng-Hsueh A. Hsieh, Marie T. Conte, Teresa L. Johnson, John C. Gyllenhaal, Wen-mei W. Hwu:
Optimizing NET Compilers for Improved Java Performance. Computer 30(6): 67-75 (1997) - [j24]Richard E. Hank, Wen-mei W. Hwu, B. Ramakrishna Rau:
Region-based compilation: Introduction, motivation, and initial experience. Int. J. Parallel Program. 25(2): 113-146 (1997) - [c52]Cheng-Hsueh A. Hsieh, Marie T. Conte, Teresa L. Johnson, John C. Gyllenhaal, Wen-mei W. Hwu:
A study of the cache and branch performance issues with running Java on current hardware platforms. COMPCON 1997: 211-216 - [c51]David I. August, Daniel A. Connors, John C. Gyllenhaal, Wen-mei W. Hwu:
Architectural Support for Compiler-Synthesized Dynamic Branch Prediction Strategies: Rationale and Initial Results. HPCA 1997: 84-93 - [c50]Teresa L. Johnson, Wen-mei W. Hwu:
Run-Time Adaptive Cache Hierarchy Management via Reference Analysis. ISCA 1997: 315-326 - [c49]Teresa L. Johnson, Matthew C. Merten, Wen-mei W. Hwu:
Run-Time Spatial Locality Detection and Optimization. MICRO 1997: 57-64 - [c48]David I. August, Wen-mei W. Hwu, Scott A. Mahlke:
A Framework for Balancing Control Flow and Predication. MICRO 1997: 92-103 - 1996
- [j23]Matthew K. Farrens, Wen-mei W. Hwu:
Guest Editors' Introduction. Int. J. Parallel Program. 24(1): 1-2 (1996) - [c47]Brian L. Deitrich, Wen-mei W. Hwu:
Speculative Hedge: Regulating Compile-time Speculation Against Profile Variations. MICRO 1996: 70-79 - [c46]Cheng-Hsueh A. Hsieh, John C. Gyllenhaal, Wen-mei W. Hwu:
Java Bytecode to Native Code Translation: The Caffeine Prototype and Preliminary Results. MICRO 1996: 90-99 - [c45]Daniel M. Lavery, Wen-mei W. Hwu:
Modulo Scheduling of Loops in Control-intensive Non-numeric Programs. MICRO 1996: 126-137 - [c44]John C. Gyllenhaal, Wen-mei W. Hwu, B. Ramakrishna Rau:
Optimization of Machine Descriptions for Efficient Use. MICRO 1996: 349-358 - 1995
- [j22]Thomas M. Conte, Wen-mei W. Hwu:
Advances in Benchmarking Techniques: New Standards and Quantitative Metrics. Adv. Comput. 41: 231-253 (1995) - [j21]Wen-Mei W. Hwu, Richard E. Hank, David M. Gallagher, Scott A. Mahlke, Daniel M. Lavery, Grant E. Haab, John C. Gyllenhaal, David I. August:
Compiler technology for future microprocessors. Proc. IEEE 83(12): 1625-1640 (1995) - [j20]Chung-Chi Jim Li, Shyh-Kwei Chen, W. Kent Fuchs, Wen-mei W. Hwu:
Compiler-Based Multiple Instruction Retry. IEEE Trans. Computers 44(1): 35-46 (1995) - [j19]Pohua P. Chang, Daniel M. Lavery, Scott A. Mahlke, William Y. Chen, Wen-mei W. Hwu:
The Importance of Prepass Code Scheduling for Superscalar and Superpipelined Processors. IEEE Trans. Computers 44(3): 353-370 (1995) - [j18]Pohua P. Chang, Nancy J. Warter, Scott A. Mahlke, William Y. Chen, Wen-mei W. Hwu:
Three Architectural Models for Compiler-Controlled Speculative Execution. IEEE Trans. Computers 44(4): 481-494 (1995) - [j17]Neal J. Alewine, Shyh-Kwei Chen, W. Kent Fuchs, Wen-mei W. Hwu:
Compiler-Assisted Multiple Instruction Rollback Recovery Using a Read Buffer. IEEE Trans. Computers 44(9): 1096-1107 (1995) - [c43]Roger A. Bringmann, Scott A. Mahlke, Wen-mei W. Hwu:
A study of the effects of compiler-controlled speculation on instruction and data caches. HICSS (1) 1995: 211-220 - [c42]Scott A. Mahlke, Richard E. Hank, James E. McCormick, David I. August, Wen-mei W. Hwu:
A Comparison of Full and Partial Predicated Execution Support for ILP Processors. ISCA 1995: 138-150 - [c41]Richard E. Hank, Wen-mei W. Hwu, B. Ramakrishna Rau:
Region-based compilation: an introduction and motivation. MICRO 1995: 158-168 - [c40]Daniel M. Lavery, Wen-mei W. Hwu:
Unrolling-based optimizations for modulo scheduling. MICRO 1995: 327-337 - 1994
- [j16]William Y. Chen, Scott A. Mahlke, Nancy J. Warter, Sadun Anik, Wen-mei W. Hwu:
Profile-assisted instruction scheduling. Int. J. Parallel Program. 22(2): 151-181 (1994) - [j15]Wen-mei W. Hwu, Alex Nicolau:
From the guest editors. Int. J. Parallel Program. 22(3): 207-208 (1994) - [j14]Sadun Anik, Wen-mei W. Hwu:
Performance Implications of Synchronization Support for Parallel Fortran Programs. J. Parallel Distributed Comput. 22(2): 202-215 (1994) - [j13]Shyh-Kwei Chen, Neal J. Alewine, W. Kent Fuchs, Wen-mei W. Hwu:
Incremental Compiler Transformations for Multiple Instruction Retry. Softw. Pract. Exp. 24(12): 1179-1198 (1994) - [j12]Wen-mei W. Hwu, Thomas M. Conte:
The Susceptibility of Programs to Context Switching. IEEE Trans. Computers 43(9): 994-1003 (1994) - [c39]David M. Gallagher, William Y. Chen, Scott A. Mahlke, John C. Gyllenhaal, Wen-mei W. Hwu:
Dynamic Memory Disambiguation Using the Memory Conflict Buffer. ASPLOS 1994: 183-193 - [c38]Shyh-Kwei Chen, W. Kent Fuchs, Wen-mei W. Hwu:
An Analytical Approach to Scheduling Code for Superscalar and VLIW Architectures. ICPP (1) 1994: 285-292 - [c37]Yoji Yamada, John C. Gyllenhaal, Grant E. Haab, Wen-mei W. Hwu:
Data relocation and prefetching for programs with large data sets. MICRO 1994: 118-127 - [c36]Scott A. Mahlke, Richard E. Hank, Roger A. Bringmann, John C. Gyllenhaal, David M. Gallagher, Wen-mei W. Hwu:
Characterizing the impact of predicated execution on branch prediction. MICRO 1994: 217-227 - 1993
- [j11]Aloke Gupta, Wen-mei W. Hwu:
An execution Profiler for Window-oriented Applications. Softw. Pract. Exp. 23(5): 487-510 (1993) - [j10]William Y. Chen, Pohua P. Chang, Thomas M. Conte, Wen-mei W. Hwu:
The Effect of Code Expanding Optimizations on Instruction Cache Design. IEEE Trans. Computers 42(9): 1045-1057 (1993) - [j9]Wen-mei W. Hwu, Scott A. Mahlke, William Y. Chen, Pohua P. Chang, Nancy J. Warter, Roger A. Bringmann, Roland G. Ouellette, Richard E. Hank, Tokuzo Kiyohara, Grant E. Haab, John G. Holm, Daniel M. Lavery:
The superblock: An effective technique for VLIW and superscalar compilation. J. Supercomput. 7(1-2): 229-248 (1993) - [j8]Scott A. Mahlke, William Y. Chen, Roger A. Bringmann, Richard E. Hank, Wen-mei W. Hwu, B. Ramakrishna Rau, Michael S. Schlansker:
Sentinel Scheduling for VLIW and Superscalar Processors. ACM Trans. Comput. Syst. 11(4): 376-408 (1993) - [c35]W. Kent Fuchs, Wen-mei W. Hwu, Neal J. Alewine:
Application of Compiler-Assisted Rollback Recovery to Speculative Execution Repair. Hardware and Software Architectures for Fault Tolerance 1993: 45-65 - [c34]Tokuzo Kiyohara, Scott A. Mahlke, William Y. Chen, Roger A. Bringmann, Richard E. Hank, Sadun Anik, Wen-mei W. Hwu:
Register Connection: A New Approach to Adding Registers into Instruction Set Architectures. ISCA 1993: 247-256 - [c33]Roger A. Bringmann, Scott A. Mahlke, Richard E. Hank, John C. Gyllenhaal, Wen-mei W. Hwu:
Speculative execution exception recovery using write-back suppression. MICRO 1993: 214-223 - [c32]Richard E. Hank, Scott A. Mahlke, Roger A. Bringmann, John C. Gyllenhaal, Wen-mei W. Hwu:
Superblock formation using static program analysis. MICRO 1993: 247-255 - [c31]Nancy J. Warter, Scott A. Mahlke, Wen-mei W. Hwu, B. Ramakrishna Rau:
Reverse If-Conversion. PLDI 1993: 290-299 - 1992
- [j7]Pohua P. Chang, Scott A. Mahlke, William Y. Chen, Wen-mei W. Hwu:
Profile-guided Automatic Inline Expansion for C Programs. Softw. Pract. Exp. 22(5): 349-369 (1992) - [j6]Wen-mei W. Hwu, Pohua P. Chang:
Efficient Instruction Sequencing with Inline Target Insertion. IEEE Trans. Computers 41(12): 1537-1551 (1992) - [c30]Scott A. Mahlke, William Y. Chen, Wen-mei W. Hwu, B. Ramakrishna Rau, Michael S. Schlansker:
Sentinel Scheduling for VLIW and Superscalar Processors. ASPLOS 1992: 238-247 - [c29]Neal J. Alewine, Shyh-Kwei Chen, Chung-Chi Jim Li, W. Kent Fuchs, Wen-mei W. Hwu:
Branch Recovery with Compiler-Assisted Multiple Instruction Retry. FTCS 1992: 66-73 - [c28]William Y. Chen, Scott A. Mahlke, Wen-mei W. Hwu:
Tolerating First Level Memory Access Latency in High-Performance Systems. ICPP (1) 1992: 36-43 - [c27]Sadun Anik, Wen-mei W. Hwu:
Executing Nested Parallel Loops on Shared-Memory Multiprocessors. ICPP (3) 1992: 241-244 - [c26]William Y. Chen, Scott A. Mahlke, Wen-mei W. Hwu, Tokuzo Kiyohara, Pohua P. Chang:
Tolerating data access latency with register preloading. ICS 1992: 104-113 - [c25]William Y. Chen, Roger A. Bringmann, Scott A. Mahlke, Sadun Anik, Tokuzo Kiyohara, Nancy J. Warter, Daniel M. Lavery, Wen-mei W. Hwu, Richard E. Hank, John C. Gyllenhaal:
Using Profile Information to Assist Advanced Compiler Optimization and Scheduling. LCPC 1992: 31-48 - [c24]Thomas M. Conte, Wen-mei W. Hwu:
Systematic prototyping of superscalar computer architectures. RSP 1992: 161-170 - [c23]Scott A. Mahlke, William Y. Chen, John C. Gyllenhaal, Wen-mei W. Hwu:
Compiler Code Transformations for Superscalar-Based High Performance Systems. SC 1992: 808-817 - [c22]Aloke Gupta, Wen-mei W. Hwu:
Xprof: Profiling the Execution of X Window Programs. SIGMETRICS 1992: 253-254 - [e1]Wen-mei W. Hwu:
Proceedings of the 25th Annual International Symposium on Microarchitecture, Portland, Oregon, USA, November 1992. ACM / IEEE Computer Society 1992, ISBN 0-8186-3175-9 [contents] - 1991
- [j5]Thomas M. Conte, Wen-mei W. Hwu:
Benchmark Characterization. Computer 24(1): 48-56 (1991) - [j4]Thomas M. Conte, Wen-mei W. Hwu:
A brief survey of benchmark usage in the architecture community. SIGARCH Comput. Archit. News 19(4): 37-44 (1991) - [j3]Pohua P. Chang, Scott A. Mahlke, Wen-mei W. Hwu:
Using Profile Information to Assist Classic Code Optimizations. Softw. Pract. Exp. 21(12): 1301-1321 (1991) - [c21]Scott A. Mahlke, Nancy J. Warter, William Y. Chen, Pohua P. Chang, Wen-mei W. Hwu:
The Effect of Compiler Optimizations on Available Parallelism in Scalar Programs. ICPP (2) 1991: 142-145 - [c20]Pohua P. Chang, Scott A. Mahlke, William Y. Chen, Nancy J. Warter, Wen-mei W. Hwu:
IMPACT: An Architectural Framework for Multiple-Instruction-Issue Processors. ISCA 1991: 266-275 - [c19]Pohua P. Chang, William Y. Chen, Scott A. Mahlke, Wen-mei W. Hwu:
Comparing Static and Dynamic Code Scheduling for Multiple-Instruction-Issue Processors. MICRO 1991: 25-33 - [c18]William Y. Chen, Scott A. Mahlke, Pohua P. Chang, Wen-mei W. Hwu:
Data Access Microarchitectures for Superscalar Processors with Compiler-Assisted Data Prefetching. MICRO 1991: 69-73 - 1990
- [j2]Andy Glew, Wen-Mei Hwu:
Snoopy cache test-and-test-and-set without excessive bus contention. SIGARCH Comput. Archit. News 18(2): 25-32 (1990) - [c17]Nancy J. Warter, Wen-mei W. Hwu:
A software based approach to achieving optimal performance for signature control flow checking. FTCS 1990: 442-449
1980 – 1989
- 1989
- [c16]Pohua P. Chang, Wen-mei W. Hwu:
Control flow optimization for supercomputer scalar processing. ICS 1989: 145-153 - [c15]Wen-mei W. Hwu, Thomas M. Conte, Pohua P. Chang:
Comparing Software and Hardware Schemes For Reducing the Cost of Branches. ISCA 1989: 224-233 - [c14]Wen-mei W. Hwu, Pohua P. Chang:
Achieving High Instruction Cache Performance with an Optimizing Compiler. ISCA 1989: 242-251 - [c13]P.-H. Chang, Wen-mei W. Hwu:
Forward semantic: a compiler-assisted instruction fetch method for heavily pipelined processors. MICRO 1989: 188-198 - [c12]Wen-mei W. Hwu, Pohua P. Chang:
Inline Function Expansion for Compiling C Programs. PLDI 1989: 246-257 - [c11]Wen-mei W. Hwu, Thomas M. Conte:
A Simulation Study of Simultaneous Vector Prefetch Performance in Multiprocessor Memory Subsystems (Extended Abstract). SIGMETRICS 1989: 227 - 1988
- [c10]Wen-mei W. Hwu, Pohua P. Chang:
Exploiting Parallel Microprocessor Microarchitectures With a Compiler Code Generator. ISCA 1988: 45-53 - [c9]Pohua P. Chang, Wen-mei W. Hwu:
Trace selection for compiling large C application programs to microcode. MICRO 1988: 21-29 - 1987
- [j1]Wen-mei W. Hwu, Yale N. Patt:
Checkpoint Repair for High-Performance Out-of-Order Execution Machines. IEEE Trans. Computers 36(12): 1496-1514 (1987) - [c8]Wen-mei W. Hwu, Yale N. Patt:
Checkpoint Repair for Out-of-order Execution Machines. ISCA 1987: 18-26 - [c7]Wen-mei W. Hwu, Yale N. Patt:
Exploiting horizontal and vertical concurrency via the HPSm microprocessor. MICRO 1987: 154-161 - [c6]James E. Wilson, Stephen W. Melvin, Michael Shebanow, Wen-mei W. Hwu, Yale N. Patt:
On tuning the microarchitecture of an HPS implementation of the VAX. MICRO 1987: 162-167 - 1986
- [c5]Yale N. Patt, Wen-mei W. Hwu, Stephen W. Melvin, Michael Shebanow, Chein Chen, Jiajuin Wei:
Experiments with HPS, a Restricted Data Flow Microarchitecture for High Performance Computers. COMPCON 1986: 254-258 - [c4]Wen-mei W. Hwu, Yale N. Patt:
HPSm, a High Performance Restricted Data Flow Architecture Having Minimal Functionality. ISCA 1986: 297-306 - [c3]Yale N. Patt, Stephen W. Melvin, Wen-mei W. Hwu, Michael Shebanow, Chein Chen:
Run-time generation of HPS microinstructions from a VAX instruction stream. MICRO 1986: 75-81 - 1985
- [c2]Yale N. Patt, Wen-mei W. Hwu, Michael Shebanow:
HPS, a new microarchitecture: rationale and introduction. MICRO 1985: 103-108 - [c1]Yale N. Patt, Stephen W. Melvin, Wen-mei W. Hwu, Michael Shebanow:
Critical issues regarding HPS, a high performance microarchitecture. MICRO 1985: 109-116
last updated on 2024-11-07 21:35 CET by the dblp team
all metadata released as open data under CC0 1.0 license