|
ÄÚµå¹øÈ£ : 11 |
|
¹ßÇ¥ÀÚ : ÀÌÁÖÇö |
|
¼Ò¼Ó : ÇѾç´ëÇб³ |
|
ºÎ¼ : ÀüÀÚ°øÇкΠ|
|
Á÷À§ : ±³¼ö |
|
¼¼¼Ç½Ã°£ : 6¿ù 24ÀÏ(¿ù) 09:00~10:50 |
|
¹ßÇ¥ÀÚ¾à·Â : |
2018 – ÇöÀç, ÇѾç´ëÇб³ Á¶±³¼ö
2014 – 2018, The Ohio State University, Postdoctoral Researcher
2014 – 2014, KAIST, Postdoctoral Researcher
2014, KAIST Àü±â¹×ÀüÀÚ°øÇаú °øÇйڻç
2008, KAIST Àü±â¹×ÀüÀÚ°øÇаú °øÇлç |
|
|
°¿¬¿ä¾à : |
°ÈÇнÀ(Reinforcement Learning)Àº ÀüÅëÀûÀÎ ÀΰøÁö´É ¿¬±¸ÀÇ ÇÑ ÃàÀ¸·Î ¾ËÆÄ°í¿Í °°Àº ´ëÇ¥ÀûÀÎ ÀΰøÁö´É ¾Ë°í¸®Áò¿¡ ¾²ÀÌ´Â ÇÙ½ÉÀÌ·ÐÀÌ´Ù. ±âº»ÀûÀÎ °ÈÇнÀÀÇ ¸ñÇ¥´Â ÇöÀçÀÇ »óÅÂ(State)¸¦ °üÂûÇÏ°í ¼±Åà °¡´ÉÇÑ Çൿ(Action)µé Áß ´©Àû º¸»ó(Cumulative Reward)À» ÃÖ´ëÈÇÏ´Â ÇൿÀ» ¼±ÅÃÇÏ´Â °ÍÀÌ´Ù. ÃÖ±Ù ÀΰøÁö´É¿¡ ´ëÇÑ Æø¹ßÀûÀÎ °ü½É¿¡ µû¶ó, °ÔÀÓ, º¸ÇàµîÀÇ °íÀüÀûÀÎ ÀÀ¿ë»Ó¸¸ ¾Æ´Ï¶ó Åë½Å ³×Æ®¿öÅ©¿¡ °ÈÇнÀÀ» Àû¿ëÇÑ »ç·Ê°¡ Áõ°¡ÇÏ°í ÀÖ´Ù. º» °ÀÇ¿¡¼´Â ´ÙÁß ½½·Ô¸Ó½Å(Multi-armed Bandits), ¸¶¸£ÄÚºê °áÁ¤ °úÁ¤(Markov Decision Process) µîÀÇ ±âÃÊÀûÀÎ °ÈÇнÀ ¸ðµ¨°ú ´ëÇ¥ÀûÀÎ ¾Ë°í¸®ÁòÀ» ¼Ò°³ÇÏ°í Åë½Å ³×Æ®¿öÅ©¸¦ Áß½ÉÀ¸·Î ÀÀ¿ë »ç·ÊµéÀ» ¾Ë¾Æº»´Ù. |
|
|
|