Detailed Program

 

[I3] Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines
Code number : 1
Speaker : Gunhee Kim
Affiliation : Seoul National University
Department : Dept. of Computer Science and Engineering
Position : Professor
Session time : 16:00~18:00
Speaker bio : 2015-present: Assistant Professor, Dept. of Computer Science and Engineering, Seoul National University
2013-2015: Postdoctoral researcher, Disney Research
2013: Ph.D., Computer Science Department, Carnegie Mellon University
2008: M.S., The Robotics Institute, Carnegie Mellon University
2006: Researcher, Korea Institute of Science and Technology (KIST)
2001: B.S./M.S., Department of Mechanical Engineering, Korea Advanced Institute of Science and Technology (KAIST)
Talk abstract : In this talk, I will introduce Poseidon, a scalable system architecture for distributed inter-machine communication in existing deep learning frameworks. Poseidon features three key contributions: (1) a three-level hybrid architecture that allows Poseidon to support both CPU-only and GPU-equipped clusters, (2) a distributed wait-free backpropagation (DWBP) algorithm to improve GPU utilization and balance communication, and (3) a structure-aware communication protocol (SACP) to minimize communication overheads. I also present experimental results showing that Poseidon converges to the same objectives as a single machine and achieves state-of-the-art training speedups across multiple models and well-established datasets on a commodity GPU cluster of 8 nodes (4.5x on AlexNet, 4x on GoogLeNet). On the much larger ImageNet 22K dataset, Poseidon with 8 nodes achieves better speedup and competitive accuracy relative to recent CPU-based distributed deep learning systems such as Adam and Le et al., which use tens to thousands of nodes. Poseidon is an actively maintained open-source framework, and the current release is available at https://github.com/petuum/poseidon.
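The core idea behind distributed wait-free backpropagation, as described in the abstract, is that the gradient of each layer can be sent to other machines as soon as that layer's backward step finishes, overlapping communication with the backprop of the remaining layers. A minimal illustrative sketch of this scheduling pattern (the function and variable names here are hypothetical, not Poseidon's actual API):

```python
import threading

def backward_layer(layer, log):
    """Stand-in for computing one layer's gradient during backprop."""
    log.append(("bp", layer))
    return f"grad_{layer}"

def push_gradients(layer, grad, log):
    """Stand-in for sending a layer's gradient to a parameter server."""
    log.append(("push", layer))

def dwbp_backward(num_layers):
    """Sketch of wait-free backprop scheduling: layer i's gradient push is
    launched immediately after its backward step, so communication for
    layer i overlaps computation for layers i-1, ..., 0, instead of all
    communication waiting for the full backward pass to finish."""
    log = []
    senders = []
    for layer in reversed(range(num_layers)):
        grad = backward_layer(layer, log)
        t = threading.Thread(target=push_gradients, args=(layer, grad, log))
        t.start()          # communication proceeds concurrently
        senders.append(t)
    for t in senders:      # join only at the end of the backward pass
        t.join()
    return log
```

In this toy trace, backward steps run top-down (last layer first) while pushes interleave with them; a real system would issue asynchronous network sends rather than threads, but the overlap structure is the same.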