Metacomputing Inc.'s Metaverse is defined as in the table below. https://metacomputing.co.kr/ The Metaverse above is defined as a form of academic connection, grounded in Leibniz's theory of possible worlds. Below is the Metaverse computing infrastructure described in a patent currently being filed by Metacomputing Inc. For further details, see the related policy advisory manuscript (Metaverse, Quantum Computing) to be published soon by the Policy Research Office of the National Research Foundation of Korea under the Ministry of Science and ICT, or visit the company website below. … Read more
https://github.com/drserendipity/udacity/tree/main/solutions Udacity Project Solutions for Nanodegree Programs – Artificial Intelligence, AI for Trading, Computer Vision, Deep Learning, Deep Reinforcement Learning, Natural Language Processing – including source code, reports, and reviewers' comments
Writing it down definitely makes it more motivating. (Although I've already eaten a big dinner and am stuffed.) Last week I split things into three categories – diet, mindset, and exercise – but this time I want to record my life day by day. Week 1 link!! 1. Monday: a work day. Couldn't eat anything in the morning; went to work with just an iced Americano. After a meeting, while I was working, a lunch box arrived, so I ate that. Menu: rice, stir-fried spicy pork, soft tofu stew, … Read more
Coding Exercise Please use the next concept to complete the following section of Monte_Carlo.ipynb: Part 2: MC Control To reference the pseudocode while working on the notebook, you are encouraged to look at this sheet. Download the Exercise If you would prefer to work on your own machine, you can download the exercise from the DRLND GitHub repository. … Read more
Constant-alpha In the video below, you will learn about another improvement that you can make to your Monte Carlo control algorithm. Here are some guiding principles that will help you to set the value of $\alpha$ when implementing constant-$\alpha$ MC control: You should always set the value for $\alpha$ to a number greater than zero and less than (or equal … Read more
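The constant-$\alpha$ update described above can be sketched as follows. This is a minimal illustration, not the notebook's solution code; the function name and the `(state, action, G)` tuple format are assumptions for the example.

```python
# Constant-alpha MC control update (sketch): after each episode, nudge
# every visited (state, action) estimate toward its observed return G.
def update_Q(Q, episode_returns, alpha=0.1):
    """Q: dict mapping state -> list of action values.
    episode_returns: list of (state, action, G) tuples from one episode."""
    for state, action, G in episode_returns:
        # Q <- Q + alpha * (G - Q): a fixed step size, so recent episodes
        # carry more weight than under the 1/N incremental-mean rule.
        Q[state][action] += alpha * (G - Q[state][action])
    return Q
```

For example, starting from `Q = {0: [0.0, 0.0]}` and observing return `G = 1.0` for `(state=0, action=1)` with `alpha=0.5` moves the estimate halfway to the target, giving `Q[0][1] == 0.5`.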
Incremental Mean In our current algorithm for Monte Carlo control, we collect a large number of episodes to build the Q-table (as an estimate for the action-value function corresponding to the agent’s current policy). Then, after the values in the Q-table have converged, we use the table to come up with an improved policy. Maybe … Read more
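The incremental-mean idea hinted at here – updating the average after every new return instead of storing all returns and averaging at the end – can be written in a few lines. A minimal sketch, with an illustrative function name:

```python
# Incremental mean: mu_k = mu_{k-1} + (x_k - mu_{k-1}) / k.
# The running average is updated after each new value, so no history
# of past returns needs to be stored.
def incremental_mean(returns):
    mu, k = 0.0, 0
    for x in returns:
        k += 1
        mu += (x - mu) / k  # pull the estimate toward the new value by 1/k
    return mu
```

This produces exactly the ordinary sample mean: `incremental_mean([1, 2, 3, 4])` returns `2.5`. Replacing the shrinking step size `1/k` with a fixed `alpha` gives the constant-$\alpha$ variant discussed in the next concept.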
Exploration vs. Exploitation Exploration-Exploitation Dilemma (Source) Solving Environments in OpenAI Gym In many cases, we would like our reinforcement learning (RL) agents to learn to maximize reward as quickly as possible. This can be seen in many OpenAI Gym environments. For instance, the FrozenLake-v0 environment is considered solved once the agent attains an average reward of 0.78 … Read more
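The "solved" criterion mentioned for FrozenLake-v0 – an average reward of at least 0.78 over a window of recent episodes – can be checked with a rolling average. A sketch, assuming `rewards` is a list of per-episode rewards and a 100-episode window (the helper name is illustrative):

```python
from collections import deque

# FrozenLake-v0 counts as solved once the average reward over the most
# recent `window` episodes reaches the threshold.
def is_solved(rewards, window=100, threshold=0.78):
    recent = deque(rewards, maxlen=window)  # keeps only the last `window` entries
    return len(recent) == window and sum(recent) / window >= threshold
```

Note the function returns `False` until at least `window` episodes have been played, so a lucky short run cannot count as solving the environment.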
The Road Ahead You now have a working algorithm for Monte Carlo control! So, what’s to come? In the next concept (Exploration vs. Exploitation), you will learn more about how to set the value of $\epsilon$ when constructing $\epsilon$-greedy policies in the policy improvement step. Then, you will learn about two improvements that you can make to the … Read more
Correct! As long as epsilon > 0, the agent has nonzero probability of selecting any of the available actions.
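This property of $\epsilon$-greedy policies is easy to see in code: with probability $\epsilon$ the agent picks uniformly at random, so every action keeps probability at least $\epsilon / \text{nA}$. A minimal sketch (function name and input format are illustrative):

```python
import random

# Epsilon-greedy action selection for a single state.
def epsilon_greedy(Q_s, epsilon):
    """Q_s: list of action-value estimates for one state."""
    if random.random() < epsilon:
        return random.randrange(len(Q_s))                 # explore: uniform random
    return max(range(len(Q_s)), key=Q_s.__getitem__)      # exploit: greedy action
```

With `epsilon=0` this reduces to the purely greedy policy; with `epsilon=1` it is the equiprobable random policy.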
Greedy Policies Correct! For state 1, action 2 has the highest estimated return (2 > 1). For state 2, action 1 has the highest estimated return (4 > 3).
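Constructing the greedy policy from a Q-table is a per-state argmax. The sketch below uses the estimated returns quoted in the quiz feedback above (1 and 2 for state 1; 4 and 3 for state 2):

```python
# Q-table from the quiz: Q[state][action] = estimated return.
Q = {1: {1: 1, 2: 2},   # state 1: action 2 is greedy (2 > 1)
     2: {1: 4, 2: 3}}   # state 2: action 1 is greedy (4 > 3)

# Greedy policy: in each state, pick the action with the highest estimate.
policy = {s: max(actions, key=actions.get) for s, actions in Q.items()}
# policy == {1: 2, 2: 1}
```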
Coding Exercise Please use the next concept to complete the following sections of Monte_Carlo.ipynb: Part 0: Explore BlackjackEnv Part 1: MC Prediction To reference the pseudocode while working on the notebook, you are encouraged to look at this sheet. Important Note Please do not complete the entire notebook in the next concept – you should only complete Part 0 and Part 1. … Read more
Workspace – Introduction You will write all of your implementations within the classroom, using an interface identical to the one shown below. Your Workspace contains the following files (among others): Monte_Carlo.ipynb – the Jupyter notebook where you will write all of your implementations (this is the only file that you will modify!) Monte_Carlo_Solution.ipynb – the corresponding instructor solutions plot_utils.py – … Read more
OpenAI Gym: BlackJackEnv In order to master the algorithms discussed in this lesson, you will write code to teach an agent to play Blackjack. Playing Cards (Source) Please read about the game of Blackjack in Example 5.1 of the textbook. When you have finished, please review the corresponding GitHub file by reading the commented block in the … Read more
MC Prediction So far in this lesson, we have discussed how the agent can take a bad policy, like the equiprobable random policy, use it to collect some episodes, and then consolidate the results to arrive at a better policy. In the video in the previous concept, you saw that estimating the action-value function with … Read more
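The estimation step described here – averaging the returns observed after each (state, action) pair across many collected episodes – is first-visit MC prediction. Below is a minimal sketch, not the notebook's solution; the function name and the `(state, action, reward)` episode format are assumptions for the example.

```python
from collections import defaultdict

# First-visit MC prediction (sketch): estimate Q(s, a) as the average
# return following the FIRST occurrence of (s, a) in each episode.
def mc_prediction_q(episodes, gamma=1.0):
    """episodes: list of episodes; each episode is a list of
    (state, action, reward) tuples in the order they occurred."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        G = 0.0
        first_visit = {}
        # Walk backwards, accumulating the discounted return; earlier
        # visits overwrite later ones, leaving the first-visit return.
        for t in range(len(episode) - 1, -1, -1):
            state, action, reward = episode[t]
            G = gamma * G + reward
            first_visit[(state, action)] = G
        for key, ret in first_visit.items():
            returns_sum[key] += ret
            returns_count[key] += 1
    return {key: returns_sum[key] / returns_count[key] for key in returns_sum}
```

For instance, a single episode `[('s', 'a', 1), ('s', 'a', 1)]` with `gamma=1.0` yields the estimate `Q[('s', 'a')] == 2.0`, the return from the first visit onward.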
Important Note In this video, we demonstrated a toy example where the agent collected two episodes, consolidated the information in a table, and then used the table to come up with a better policy. However, as discussed in the previous video, in real-world settings (and even for the toy example depicted here!), the agent will … Read more
Gridworld Example Quiz To check your understanding of the environment, please answer the questions below.