{"id":2398,"date":"2024-03-21T10:13:02","date_gmt":"2024-03-21T17:13:02","guid":{"rendered":"http:\/\/zackmdavis.net\/blog\/?p=2398"},"modified":"2024-03-21T19:31:22","modified_gmt":"2024-03-22T02:31:22","slug":"deep-learning-is-function-approximation","status":"publish","type":"post","link":"http:\/\/zackmdavis.net\/blog\/2024\/03\/deep-learning-is-function-approximation\/","title":{"rendered":"\"Deep Learning\" Is Function Approximation"},"content":{"rendered":"<h2 id=\"a-surprising-development-in-the-study-of-multi-layer-parameterized-graphical-function-approximators\">A Surprising Development in the Study of Multi-layer Parameterized Graphical Function Approximators<\/h2>\n<p>As a programmer and epistemology enthusiast, I\u2019ve been studying some statistical modeling techniques lately! It\u2019s been boodles of fun, and might even prove useful in a future dayjob if I decide to pivot my career away from the backend web development roles I\u2019ve taken in the past.<\/p>\n<p>More specifically, I\u2019ve mostly been focused on multi-layer parameterized graphical function approximators, which map inputs to outputs via a sequence of affine transformations composed with nonlinear \u201cactivation\u201d functions.<\/p>\n<p>(Some authors call these <a href=\"https:\/\/en.wikipedia.org\/wiki\/Deep_learning\">\u201cdeep neural networks\u201d<\/a> for some reason, but <a href=\"https:\/\/www.lesswrong.com\/posts\/WBdvyyHLdxZSAMmoz\/taboo-your-words\">I like my name better<\/a>.)<\/p>\n<p>It\u2019s a curve-fitting technique: by setting the multiplicative factors and additive terms appropriately, multi-layer parameterized graphical function approximators can <a href=\"https:\/\/en.wikipedia.org\/wiki\/Universal_approximation_theorem\">approximate any function<\/a>. 
For a popular choice of \u201cactivation\u201d rule <a href=\"https:\/\/en.wikipedia.org\/wiki\/Rectifier_(neural_networks)\">which takes the maximum of the input and zero<\/a>, the curve is specifically a piecewise-linear function. We iteratively improve the approximation f(x, \u03b8) by adjusting the parameters \u03b8 in the direction of the derivative of some error metric on the current approximation\u2019s fit to some example input\u2013output pairs (x, y), which some authors call <a href=\"https:\/\/en.wikipedia.org\/wiki\/Gradient_descent\">\u201cgradient descent\u201d<\/a> for some reason. (The <a href=\"https:\/\/en.wikipedia.org\/wiki\/Mean_squared_error\">mean squared error<\/a> (f(x, \u03b8) \u2212 y)\u00b2 is a popular choice for the error metric, as is the negative log likelihood \u2212log P(y | f(x, \u03b8)). Some authors call these \u201closs functions\u201d for some reason.)<\/p>\n<p>Basically, the big empirical surprise of <a href=\"https:\/\/bmk.sh\/2019\/12\/31\/The-Decade-of-Deep-Learning\/\">the previous decade<\/a> is that given a lot of desired input\u2013output pairs (x, y) and the proper engineering know-how, you can use large amounts of computing power to find parameters \u03b8 to fit a function approximator that \u201cgeneralizes\u201d well\u2014meaning that if you compute y\u0302 = f(x, \u03b8) for some x that wasn\u2019t in any of your original example input\u2013output pairs (which some authors call \u201ctraining\u201d data for some reason), it turns out that y\u0302 is usually pretty similar to the y you would have used in an example (x, y) pair.<\/p>\n<p>It wasn\u2019t obvious beforehand that this would work! 
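To make the fitting procedure concrete, here is a toy sketch of my own (not taken from any of the sources discussed here): a one-hidden-layer parameterized function approximator with the max-of-input-and-zero "activation" rule, fit to example input–output pairs from y = |x| (itself piecewise-linear, so a good approximation is available) by repeatedly adjusting the parameters to reduce the mean squared error. The data, layer width, and step size are all invented for illustration.

```python
import random

random.seed(0)

# Example input–output pairs (x, y) from the target function y = |x|.
data = [(x / 10, abs(x) / 10) for x in range(-10, 11)]

H = 8                                    # number of hidden units
w1 = [random.uniform(-1, 1) for _ in range(H)]
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0
lr = 0.05                                # step size for parameter adjustments

def f(x):
    # Affine transformation, nonlinear max(·, 0), then another affine map:
    # the result is a piecewise-linear function of x.
    hidden = [max(0.0, w1[j] * x + b1[j]) for j in range(H)]
    return sum(w2[j] * hidden[j] for j in range(H)) + b2

for _epoch in range(3000):
    for x, y in data:
        pre = [w1[j] * x + b1[j] for j in range(H)]
        hidden = [max(0.0, p) for p in pre]
        y_hat = sum(w2[j] * hidden[j] for j in range(H)) + b2
        err = y_hat - y  # derivative of the squared error, up to a factor of 2
        dw2 = [err * hidden[j] for j in range(H)]
        for j in range(H):
            if pre[j] > 0:               # unit j is "active" for this input
                w1[j] -= lr * err * w2[j] * x
                b1[j] -= lr * err * w2[j]
        for j in range(H):
            w2[j] -= lr * dw2[j]
        b2 -= lr * err

mse = sum((f(x) - y) ** 2 for x, y in data) / len(data)
```

After training, the mean squared error on the example pairs should be small, even though nothing in the fitted parameters "contains" the error metric itself; the metric only shaped how the parameters responded to the data.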
You\u2019d expect that if your function approximator has more parameters than you have example input\u2013output pairs, it would <a href=\"https:\/\/en.wikipedia.org\/wiki\/Overfitting\">overfit<\/a>, implementing a complicated function that reproduced the example input\u2013output pairs but outputted crazy nonsense for other choices of x\u2014the more expressive function approximator proving useless for <a href=\"https:\/\/www.lesswrong.com\/posts\/mB95aqTSJLNR9YyjH\/message-length\">the lack of evidence to pin down the correct approximation<\/a>.<\/p>\n<p>And that is what we see for function approximators with only slightly more parameters than example input\u2013output pairs, but for <em>sufficiently large<\/em> function approximators, <a href=\"https:\/\/www.lesswrong.com\/posts\/FRv7ryoqtvSuqBxuT\/understanding-deep-double-descent\">the trend reverses<\/a> and \u201cgeneralization\u201d improves\u2014the more expressive function approximator proving useful after all, as it admits <a href=\"https:\/\/www.lesswrong.com\/posts\/nGqzNC6uNueum2w8T\/inductive-biases-stick-around\">algorithmically simpler functions<\/a> that fit the example pairs.<\/p>\n<p>The other week I was talking about this to an acquaintance who seemed puzzled by my explanation. \u201cWhat are the preconditions for this intuition about neural networks as function approximators?\u201d they asked. (I paraphrase only slightly.) \u201cI would assume this is true under specific conditions,\u201d they continued, \u201cbut I don\u2019t think we should expect such niceness to hold under capability increases. Why should we expect this to carry forward?\u201d<\/p>\n<p>I don\u2019t know where this person was getting their information, but this made zero sense to me. 
I mean, okay, <a href=\"https:\/\/gwern.net\/scaling-hypothesis\">when you increase the number of parameters<\/a> in your function approximator, it gets better at representing more complicated functions, which I guess you could describe as \u201ccapability increases\u201d?<\/p>\n<p>But multi-layer parameterized graphical function approximators created by iteratively using the derivative of some error metric to improve the quality of the approximation are still, actually, function approximators. Piecewise-linear functions are still piecewise-linear functions even when there are a lot of pieces. What did <em>you<\/em> think it was doing?<\/p>\n<h2 id=\"multi-layer-parameterized-graphical-function-approximators-have-many-exciting-applications\">Multi-layer Parameterized Graphical Function Approximators Have Many Exciting Applications<\/h2>\n<p>To be clear, you can do a lot with function approximation!<\/p>\n<p>For example, if you assemble a collection of desired input\u2013output pairs (x, y) where the x is <a href=\"https:\/\/en.wikipedia.org\/wiki\/MNIST_database\">an array of pixels depicting a handwritten digit<\/a> and y is a character representing which digit, then you can fit a \u201cconvolutional\u201d multi-layer parameterized graphical function approximator to approximate the function from pixel-arrays to digits\u2014effectively allowing computers to read handwriting.<\/p>\n<p>Such techniques have proven useful in all sorts of domains where a task can be conceptualized as a function from one data distribution to another: image synthesis, voice recognition, recommender systems\u2014you name it. 
Famously, by approximating the next-token function in tokenized internet text, large language models can answer questions, write code, and perform other natural-language understanding tasks.<\/p>\n<p>I could see how someone reading about computer systems performing cognitive tasks previously thought to require intelligence might be alarmed\u2014and become further alarmed when reading that these systems are \u201ctrained\u201d rather than coded in the manner of traditional computer programs. The summary evokes imagery of training a wild animal that might turn on us the moment it can seize power and reward itself rather than being dependent on its masters.<\/p>\n<p>But \u201ctraining\u201d is just a <a href=\"https:\/\/www.lesswrong.com\/posts\/yxWbbe9XcgLFCrwiL\/dreams-of-ai-alignment-the-danger-of-suggestive-names\">suggestive name<\/a>. It\u2019s true that we don\u2019t have a mechanistic understanding of how function approximators perform tasks, in contrast to traditional computer programs whose source code was written by a human. It\u2019s plausible that this opacity represents grave risks, if we create powerful systems that we don\u2019t know how to debug.<\/p>\n<p>But whatever the real risks are, any hope of mitigating them is going to depend on acquiring the most accurate possible understanding of the problem. 
If the problem is itself largely one of our own lack of understanding, it helps to be <em>specific<\/em> about exactly which parts we do and don\u2019t understand, rather than surrendering the entire field to a blurry aura of mystery and despair.<\/p>\n<h2 id=\"an-example-of-applying-multi-layer-parameterized-graphical-function-approximators-in-success-antecedent-computation-boosting\">An Example of Applying Multi-layer Parameterized Graphical Function Approximators in Success-Antecedent Computation Boosting<\/h2>\n<p>One of the exciting things about multi-layer parameterized graphical function approximators is that they can be combined with other methods for the automation of cognitive tasks (which is usually called \u201ccomputing\u201d, but some authors say \u201cartificial intelligence\u201d for some reason).<\/p>\n<p>In the spirit of being specific about exactly which parts we do and don\u2019t understand, I want to talk about <a href=\"https:\/\/arxiv.org\/abs\/1312.5602\">Mnih <em>et al.<\/em> 2013\u2019s work on getting computers to play classic Atari games<\/a> (like <a href=\"https:\/\/en.wikipedia.org\/wiki\/Pong\"><em>Pong<\/em><\/a>, <a href=\"https:\/\/en.wikipedia.org\/wiki\/Breakout_(video_game)\"><em>Breakout<\/em><\/a>, or <a href=\"https:\/\/en.wikipedia.org\/wiki\/Space_Invaders\"><em>Space Invaders<\/em><\/a>). This work is notable as one of the first high-profile examples of using multi-layer parameterized graphical function approximators in conjunction with success-antecedent computation boosting (which some authors call <a href=\"https:\/\/en.wikipedia.org\/wiki\/Reinforcement_learning\">\u201creinforcement learning\u201d<\/a> for some reason).<\/p>\n<p>If you only read the news\u2014if you\u2019re not in tune with there being things to read <em>besides<\/em> news\u2014I could see this result being quite alarming. 
Digital brains learning to play video games at superhuman levels <em>from the raw pixels<\/em>, rather than because a programmer sat down to write an automation policy for that particular game? Are we not <a href=\"https:\/\/www.online-literature.com\/george_eliot\/theophrastus-such\/17\/\">already in the shadow of the coming race<\/a>?<\/p>\n<p>But people who read textbooks and not just news, being no less impressed by the result, are often inclined to take a subtler lesson from any particular headline-grabbing advance.<\/p>\n<p>Mnih <em>et al.<\/em>\u2019s Atari result built off the technique of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Q-learning\">Q-learning<\/a> introduced two decades prior. Given a discrete-time present-state-based outcome-valued stochastic control problem (which some authors call a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Markov_decision_process\">\u201cMarkov decision process\u201d<\/a> for some reason), Q-learning concerns itself with defining a function Q(s, a) that describes the value of taking action a while in state s, for some discrete sets of states and actions. For example, to describe the problem faced by a policy for a grid-based video game, the states might be the squares of the grid, and the available actions might be moving left, right, up, or down. 
The Q-value for being on a particular square and taking the move-right action might be the expected change in the game\u2019s score from doing that (including a scaled-down expectation of score changes from future actions after that).<\/p>\n<p>Upon finding itself in a particular state s, a Q-learning <a href=\"https:\/\/www.lesswrong.com\/posts\/rmfjo4Wmtgq8qa2B7\/think-carefully-before-calling-rl-policies-agents\">policy<\/a> will usually perform the action with the highest Q(s, a), <a href=\"https:\/\/en.wikipedia.org\/wiki\/Exploration-exploitation_dilemma\">\u201cexploiting\u201d<\/a> its current beliefs about the environment, but <a href=\"https:\/\/en.wikipedia.org\/wiki\/Multi-armed_bandit#Approximate_solutions\">with some probability<\/a> it will \u201cexplore\u201d by taking a random action. The predicted outcomes of its decisions are compared to the actual outcomes to update the function Q(s, a), which can simply be represented as a table with as many rows as there are possible states and as many columns as there are possible actions. We have theorems to the effect that as the policy thoroughly explores the environment, it will eventually converge on the correct Q(s, a).<\/p>\n<p>But Q-learning as originally conceived doesn\u2019t work for the Atari games studied by Mnih <em>et al.<\/em>, because it assumes a discrete set of possible states that could be represented with the rows in a table. This is intractable for problems where the state of the environment varies continuously. If a \u201cstate\u201d in <em>Pong<\/em> is a 6-tuple of floating-point numbers representing the player\u2019s paddle position, the opponent\u2019s paddle position, and the x- and y-coordinates of the ball\u2019s position and velocity, then there\u2019s no way for the traditional Q-learning algorithm to base its behavior on its past experiences without having already seen that exact conjunction of paddle positions, ball position, and ball velocity, which almost never happens. 
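For concreteness, here is a toy sketch of my own (not Mnih et al.'s setup) of Q-learning as originally conceived, on a problem small enough for the table to be tractable: a one-dimensional grid of six discrete squares, where reaching the rightmost square scores a point, Q(s, a) really is just a table, the policy usually exploits the highest Q-value but sometimes explores at random, and predicted outcomes are compared with observed outcomes to update the table. All the specifics (grid size, step size, probabilities) are invented for illustration.

```python
import random

random.seed(0)

N_STATES = 6          # squares 0..5; reaching square 5 ends the episode
ACTIONS = [-1, +1]    # move left, move right
ALPHA = 0.5           # update step size
GAMMA = 0.9           # scaled-down weighting of future score changes
EPSILON = 0.1         # probability of "exploring" with a random action

# Q(s, a): a table with one row per state and one column per action.
Q = [[0.0, 0.0] for _ in range(N_STATES)]

for _episode in range(500):
    s = 0
    while s != N_STATES - 1:
        if random.random() < EPSILON or Q[s][0] == Q[s][1]:
            a = random.randrange(2)            # explore (or break a tie)
        else:
            a = 0 if Q[s][0] > Q[s][1] else 1  # exploit current beliefs
        s_next = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        reward = 1.0 if s_next == N_STATES - 1 else 0.0
        # Compare the predicted outcome with the observed outcome and
        # adjust the table entry toward the observation.
        target = reward + GAMMA * max(Q[s_next])
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s_next
```

After enough exploration, moving right has a higher Q-value than moving left from every non-terminal square. The point of the sketch is the table: one row per possible state, which is exactly what stops working when states vary continuously.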
So Mnih <em>et al.<\/em>\u2019s great innovation was\u2014<\/p>\n<p>(Wait for it \u2026)<\/p>\n<p>\u2014to replace the table representing Q(s, a) with a multi-layer parameterized graphical function approximator! By approximating the mapping from state\u2013action pairs to discounted-sums-of-\u201crewards\u201d, the \u201cneural network\u201d allows the policy to \u201cgeneralize\u201d from its experience, taking similar actions in relevantly similar states, without having visited those exact states before. There are <a href=\"https:\/\/www.lesswrong.com\/posts\/kyvCNgx9oAwJCuevo\/deep-q-networks-explained\">a few other minor technical details<\/a> needed to make it work well, but that\u2019s the big idea.<\/p>\n<p>And understanding the big idea probably changes your perspective on the headline-grabbing advance. (It certainly did for me.) \u201cDeep learning is like evolving brains; it solves problems <a href=\"https:\/\/www.lesswrong.com\/posts\/CpjTJtW2RNKvzAehG\/most-people-don-t-realize-we-have-no-idea-how-our-ais-work\">and we don\u2019t know how<\/a>\u201d is an importantly different story from \u201cWe swapped out a table for a multi-layer parameterized graphical function approximator in this specific success-antecedent computation boosting algorithm, and now it can handle continuous state spaces.\u201d<\/p>\n<h2 id=\"risks-from-learned-approximation\">Risks From Learned Approximation<\/h2>\n<p>When I solicited reading recommendations from people who ought to know about risks of harm from statistical modeling techniques, I was directed to <a href=\"https:\/\/www.lesswrong.com\/posts\/uMQ3cqWDPHhjtiesc\/agi-ruin-a-list-of-lethalities\">a list of reputedly fatal-to-humanity problems, or \u201clethalities\u201d<\/a>.<\/p>\n<p>Unfortunately, I don\u2019t think I\u2019m qualified to evaluate the list as a whole; I would seem to lack some necessary context. 
(The author keeps using the term \u201cAGI\u201d without defining it, and <a href=\"https:\/\/www.irs.gov\/e-file-providers\/definition-of-adjusted-gross-income\">adjusted gross income<\/a> doesn\u2019t make sense in context.)<\/p>\n<p>What I can say is that when the list discusses the kinds of statistical modeling techniques I\u2019ve been studying lately, it starts to <em>talk funny<\/em>. I don\u2019t think someone who\u2019s been reading the same textbooks as I have (like <a href=\"http:\/\/udlbook.com\">Prince 2023<\/a> or <a href=\"https:\/\/www.bishopbook.com\/\">Bishop and Bishop 2024<\/a>) would write like this:<\/p>\n<blockquote><p>Even if you train really hard on an exact loss function, that doesn\u2019t thereby create an explicit internal representation of the loss function inside an AI that then continues to pursue that exact loss function in distribution-shifted environments. Humans don\u2019t explicitly pursue inclusive genetic fitness; <strong>outer optimization even on a very exact, very simple loss function doesn\u2019t produce inner optimization in that direction.<\/strong> [\u2026] This is sufficient on its own [\u2026] to trash entire categories of naive alignment proposals which assume that if you optimize a bunch on a loss function calculated using some simple concept, you get perfect inner alignment on that concept.<\/p><\/blockquote>\n<p>To be clear, I agree that if you fit a function approximator by iteratively adjusting its parameters in the direction of the derivative of some loss function on example input\u2013output pairs, that doesn\u2019t create an explicit internal representation of the loss function inside the function approximator.<\/p>\n<p>It\u2019s just\u2014why would you want that? And really, what would that even mean? 
If I use the mean squared error loss function to approximate a set of data points in the plane with a line (which some authors call a \u201clinear regression model\u201d for some reason), obviously the line itself does not somehow contain a representation of general squared-error-minimization. The line is just a line. The loss function defines how my choice of line responds to the data I\u2019m trying to approximate with the line. (The mean squared error has some <a href=\"https:\/\/www.benkuhn.net\/squared\/\">elegant mathematical properties<\/a>, but is more sensitive to outliers than the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Mean_absolute_error\">mean absolute error<\/a>.)<\/p>\n<p>It\u2019s the same thing for piecewise-linear functions defined by multi-layer parameterized graphical function approximators: <a href=\"https:\/\/nonint.com\/2023\/06\/10\/the-it-in-ai-models-is-the-dataset\/\">the model is the dataset<\/a>. It\u2019s just not meaningful to talk about what a loss function implies, independently of the training data. (Mean squared error <em>of what?<\/em> Negative log likelihood <em>of what?<\/em> Finish the sentence!)<\/p>\n<p>This confusion about loss functions seems to be linked to a particular theory of how statistical modeling techniques might be dangerous, in which \u201couter\u201d training results in the emergence of an \u201cinner\u201d intelligent agent. 
If you expect that, and you <a href=\"https:\/\/www.lesswrong.com\/posts\/RQpNHSiWaXTvDxt6R\/coherent-decisions-imply-consistent-utilities\">expect intelligent agents to have a \u201cutility function\u201d<\/a>, you might be inclined to think of \u201cgradient descent\u201d \u201ctraining\u201d as trying to transfer an outer \u201closs function\u201d into an inner \u201cutility function\u201d, and perhaps to think that the attempted transfer primarily doesn\u2019t work because \u201cgradient descent\u201d is an insufficiently powerful optimization method.<\/p>\n<p>I <a href=\"https:\/\/www.lesswrong.com\/posts\/6mysMAqvo9giHC4iX\/what-s-general-purpose-search-and-why-might-we-expect-to-see\">guess the emergence of inner agents might be possible<\/a>? I can\u2019t <em>rule it out<\/em>. (\u201cFunctions\u201d are very general, so I can\u2019t claim that a function approximator could never implement an agent.) Maybe it would happen at some scale?<\/p>\n<p>But taking the technology in front of us at face value, that\u2019s not my default guess at how the machine intelligence transition would go down. If I had to guess, I\u2019d imagine someone deliberately building an agent using function approximators as a critical component, rather than your function approximator secretly having an agent inside of it.<\/p>\n<p>That\u2019s a different threat model! 
If you\u2019re trying to build a good agent, or trying to prohibit people from building bad agents using coordinated violence (which some authors call \u201cregulation\u201d for some reason), it matters what your threat model is!<\/p>\n<p>(Statistical modeling engineer Jack Gallagher has described his experience of this debate as \u201clike trying to discuss crash test methodology with people who insist that the wheels must be made of little cars, because how else would they move forward like a car does?\u201d)<\/p>\n<p>I don\u2019t know how to build a general agent, but contemporary computing research offers clues as to how function approximators can be composed with other components to build systems that perform cognitive tasks.<\/p>\n<p>Consider <a href=\"https:\/\/en.wikipedia.org\/wiki\/AlphaGo\">AlphaGo<\/a> and its successor <a href=\"https:\/\/en.wikipedia.org\/wiki\/AlphaZero\">AlphaZero<\/a>. In AlphaGo, one function approximator is used to approximate a function from board states to move probabilities. Another is used to approximate the function from board states to game outcomes, where the outcome is +1 when one player has certainly won, \u22121 when the other player has certainly won, and a proportionately intermediate value indicating who has the advantage when the outcome is still uncertain. The system plays both sides of a game, using the board-state-to-move-probability function and board-state-to-game-outcome function as heuristics to guide a search algorithm which some authors call <a href=\"https:\/\/en.wikipedia.org\/wiki\/Monte_Carlo_tree_search\">\u201cMonte Carlo tree search\u201d<\/a>. The board-state-to-move-probability function approximation is improved by adjusting its parameters in the direction of the derivative of its <a href=\"https:\/\/en.wikipedia.org\/wiki\/Cross-entropy\">cross-entropy<\/a> with the move distribution found by the search algorithm. 
The board-state-to-game-outcome function approximation is improved by adjusting its parameters in the direction of the derivative of its squared difference with the self-play game\u2019s ultimate outcome.<\/p>\n<p>This kind of design is not trivially safe. A similarly superhuman system operating in the real world (instead of the restricted world of board games) and iteratively improving an action-to-money-in-this-bank-account function seems like it would have undesirable consequences, because if the search discovered that theft or fraud increased the amount of money in the bank account, then the action-to-money function approximator would generalizably steer the system into doing more theft and fraud.<\/p>\n<p>Statistical modeling engineers have a saying: if you\u2019re surprised by what your neural net is doing, you haven\u2019t looked at your training data closely enough. The problem in this hypothetical scenario is not that multi-layer parameterized graphical function approximators are inherently unpredictable, or must necessarily contain a power-seeking consequentialist agent in order to do any useful cognitive work. The problem is that you\u2019re approximating the wrong function and <a href=\"https:\/\/www.lesswrong.com\/posts\/HBxe6wdjxK239zajf\/what-failure-looks-like#Part_I__You_get_what_you_measure\">get what you measure<\/a>. The failure would still occur if the function approximator \u201cgeneralizes\u201d from its \u201ctraining\u201d data the way you\u2019d expect. (If you can <em>recognize<\/em> fraud and theft, it\u2019s easy enough to just not use that data as examples to approximate, but by hypothesis, this system is only looking at the account balance.) 
This doesn\u2019t itself rule out more careful designs that use function approximators to approximate <a href=\"https:\/\/www.lesswrong.com\/posts\/pYcFPMBtQveAjcSfH\/supervise-process-not-outcomes\">known-trustworthy processes<\/a> and <a href=\"https:\/\/www.lesswrong.com\/posts\/9fL22eBJMtyCLvL7j\/soft-optimization-makes-the-value-target-bigger\">don\u2019t search harder than their representation of value can support<\/a>.<\/p>\n<p>This may be cold comfort to people who anticipate a competitive future in which cognitive automation designs that more carefully respect human values will foreseeably fail to keep up with the frontier of more powerful systems that do <a href=\"https:\/\/ai-alignment.com\/aligned-search-366f983742e9\">search harder<\/a>. It <a href=\"https:\/\/arbital.com\/p\/safe_useless\/\">may not matter to the long-run future of the universe<\/a> that you can build helpful and harmless language agents today, if your civilization gets eaten by more powerful and unfriendlier cognitive automation designs some number of years down the line. As a humble programmer and epistemology enthusiast, I have no assurances to offer, no principle or theory to guarantee everything will turn out all right in the end. Just a conviction that, whatever challenges confront us in the future, we\u2019ll be in a better position to face them by understanding the problem in as much detail as possible.<\/p>\n<hr>\n<h3 id=\"bibliography\">Bibliography<\/h3>\n<p>Bishop, Christopher M., and Andrew M. Bishop. 2024. <em>Deep Learning: Foundations and Concepts<\/em>. Cambridge, UK: Cambridge University Press. <em>https:\/\/www.bishopbook.com\/<\/em><\/p>\n<p>Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. \u201cPlaying Atari with Deep Reinforcement Learning.\u201d <em>https:\/\/arxiv.org\/abs\/1312.5602<\/em><\/p>\n<p>Prince, Simon J.D. 2023. <em>Understanding Deep Learning<\/em>. Cambridge, MA: MIT Press. 
<em>http:\/\/udlbook.com<\/em><\/p>\n<p>Sutton, Richard S. and Andrew G. Barto. 2024. <em>Reinforcement Learning<\/em>. 2nd ed. Cambridge, MA: MIT Press.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A Surprising Development in the Study of Multi-layer Parameterized Graphical Function Approximators As a programmer and epistemology enthusiast, I\u2019ve been studying some statistical modeling techniques lately! It\u2019s been boodles of fun, and might even prove useful in a future dayjob &hellip; <a href=\"http:\/\/zackmdavis.net\/blog\/2024\/03\/deep-learning-is-function-approximation\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[20],"tags":[],"_links":{"self":[{"href":"http:\/\/zackmdavis.net\/blog\/wp-json\/wp\/v2\/posts\/2398"}],"collection":[{"href":"http:\/\/zackmdavis.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/zackmdavis.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/zackmdavis.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/zackmdavis.net\/blog\/wp-json\/wp\/v2\/comments?post=2398"}],"version-history":[{"count":4,"href":"http:\/\/zackmdavis.net\/blog\/wp-json\/wp\/v2\/posts\/2398\/revisions"}],"predecessor-version":[{"id":2402,"href":"http:\/\/zackmdavis.net\/blog\/wp-json\/wp\/v2\/posts\/2398\/revisions\/2402"}],"wp:attachment":[{"href":"http:\/\/zackmdavis.net\/blog\/wp-json\/wp\/v2\/media?parent=2398"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/zackmdavis.net\/blog\/wp-json\/wp\/v2\/categories?post=2398"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/zackmdavis.net\/blog\/wp-json\/wp\/v2\/tags?post=2398"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}