📣 solving problems the clojure way

My attempt at explaining two of the core pillars of Clojure programming: functional programming and data-oriented programming. Features a step-by-step refactor of imperative code to functional code (in JavaScript).

"This is by far the best talk I've seen explaining functional programming" <- someone on YouTube, so you should definitely trust them

  • Object-oriented programming said "let's organize state, use objects, and think of programs as interacting agents". Functional programming agrees state is confusing but instead says "let's avoid it, let's just say state is bad"
    • When I say state I mean: any values that change in place. If you have an array and you want to increment its elements in an imperative programming language, you just change that array in place, but in functional programming we'd say "no, that's bad; because then when this array is being passed to some function and that function changes it, and we didn't expect that, things go wrong."
    • Basically: state is the root of all problems so let's try to avoid it.
    • Compared to classical imperative programming, functional programming languages also gave us a new tool and that was the function.
      • We use the word "function" to distinguish it from a procedure. The main difference is that a procedure has to be declared and given a name. You can pass the names of procedures around, but procedures can't really be passed around by value, and you can't create them on the fly, whereas with functions, you can.
      • If you've worked with JavaScript, you know anonymous functions; they're everywhere, so it's not that crazy of a concept. But at the time, it was definitely something very different. If you work in a strict object-oriented programming language, they don't have functions that you can just pass around. Sometimes they're also called lambdas, but many OOP languages don't have them—Java only recently made it possible, but barely anyone uses lambdas because, "why do you need functions when you have methods on objects?" In functional programming, we say, "No, we don't need an object; you can just have a function. It can just sit on its own."
      • The mental model of functional programming isn't steps to achieve some solution. It isn't interacting agents working with each other. It's thinking of a program as a pipeline of input to output. We think, "What is the data that our program is going to be receiving over time, and how do we transform that data into the outputs that we want?"
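      • A minimal JavaScript sketch of "functions as values" and "a program as a pipeline of input to output". The data and the names (orders, isPaid, totalPaid) are invented for illustration and are not from the talk.

```javascript
// Hypothetical input data, invented for this example.
const orders = [
  { id: 1, amount: 20, paid: true },
  { id: 2, amount: 35, paid: false },
  { id: 3, amount: 15, paid: true },
];

// A function is just a value: it can be stored in a variable,
// created on the fly, and passed to other functions.
const isPaid = (order) => order.paid;

// The program as a pipeline: input data -> filter -> map -> reduce -> output.
const totalPaid = (orderList) =>
  orderList
    .filter(isPaid) // a named function passed by value
    .map((order) => order.amount) // an anonymous function created inline
    .reduce((sum, amount) => sum + amount, 0);

console.log(totalPaid(orders)); // 35
```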
    • You might be thinking, "We want to avoid state, but mutable state and side effects are inescapable."
      • That's almost the point of programming. We want our programs to do something, have some side effects. If all our program did was just make our laptop hotter and then contribute to the heat death of the universe, it wouldn't really be useful—unless you're cold. But the purpose of programs is to do something, even if at the least, it's to print out a single number. If all we want to do is take in a bunch of inputs and print out a number, that's still some side effect—it's printing to the console. But, you know, real programs have other side effects that we think about. Things like communicating to a database, changing things in a database, sending an email, or triggering some other external API. Making an HTTP request to another server or writing to the file system—these are all kinds of effects that a program typically would want to have on other parts of your computer system.
      • Functional programming would say, "Okay, yes, we do want side effects." It's not saying, "Let's get rid of all state; let's get rid of all kinds of side effects." Functional programming is saying, "...let's just do our best to avoid it."
    • How do we avoid mutable state (values that can change)? How do we avoid mutating that state in our programs? How do we avoid all these other kinds of side effects?
      • Functional programming has like three techniques, three little tricks that you can use, and those are: minimizing, concentrating, and deferring.
        • Minimizing is just trying to have fewer states, fewer values that we're keeping around.
        • Concentrating is saying, "Okay, well, if we have to have values, let's keep them all in one place rather than throughout the program."
        • Deferring, well, we'll get to deferring in a moment.
      • This "minimize, concentrate, defer" approach also applies to mutations. With mutations, we're saying, "Okay, let's try to have as few parts of our code as possible that actually change values in place." If we do have to have mutations, let's keep them together as much as possible rather than spread throughout the program.
        • Similarly, with side effects, let's try to decrease the number of places where we do side effects, concentrate the places that we do side effects, and if possible, defer them either to the last step in our program or to a completely separate system.
      • Talking a bit more about minimizing, how do we minimize state? Functional programming gives us a few things we can do:
        • We can derive values if possible. Sometimes there might be state that you want to keep, but you could actually completely avoid it. For example, if you were trying to program a game of tic-tac-toe, you might think you have to keep track of whose turn it is (is it X's turn or O's turn?), but actually, you don't need to because you can determine whose turn it is based on how many cells of the grid are already filled. So, if I were being hardcore functional, I wouldn't keep track of a variable called turn; I would just have a function that's based on the state of the grid that would tell me if it's X's or O's turn.
        • Another technique is to copy data instead of mutating it in place. As part of programs, we do want to change data structures, but with functional programming, we just copy them instead. You might think, "That sounds crazy. If I have an array of a thousand items and I want to add one thing to that array, you want me to make a whole new copy of that array and then add that one thing? That seems incredibly inefficient!" But thanks to a bunch of math and computer science and really smart people, we came up with data structures that allow doing that efficiently, which are called immutable data structures. In functional programming languages like Clojure, if you have an array and you add an element to that array, that new array is not a full copy; conceptually, it's just two references: a reference to the original array and to that new value (under the hood, Clojure's vectors are trees that share almost all of their structure with the original). So, there's almost no new memory being used.
          • But you might say, "But what if someone changes that original array; the new array is also going to change." But because we're saying you can't change anything, it's okay; you can make these derived chains of "this array is actually based on this other array that's based on this other array that's based on this other array" because the system doesn't let you change those previous arrays.
          • You might think, "This sounds kind of complicated; now I have to think of an array being derived from a previous array being derived from some other thing." In practice, you don't—it just feels like working with any other language. Take an array, concatenate it with another array. In Clojure and most other languages, you have a whole bunch of these data structures: vectors (aka arrays), maps (aka objects or dictionaries depending on what language you're talking about), and those two data structures have all the common things that you would want to do with them implemented in an efficient way—in a functional way.
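          • The two ideas above (derive values, copy instead of mutate) can be sketched with the tic-tac-toe example in JavaScript. This is only an illustration: the names (emptyBoard, whoseTurn, play) are made up, and it assumes X always moves first.

```javascript
// A board is an array of 9 cells: "X", "O", or null (empty).
const emptyBoard = Array(9).fill(null);

// Derived value: no `turn` variable is stored anywhere.
// X moves first, so it is X's turn whenever an even number of cells are filled.
const whoseTurn = (board) =>
  board.filter((cell) => cell !== null).length % 2 === 0 ? "X" : "O";

// Copy instead of mutate: return a new board, leave the old one untouched.
const play = (board, index) =>
  board.map((cell, i) => (i === index ? whoseTurn(board) : cell));

const afterOne = play(emptyBoard, 4);
const afterTwo = play(afterOne, 0);

console.log(whoseTurn(emptyBoard)); // X
console.log(afterTwo[4], afterTwo[0]); // X O
console.log(emptyBoard[4]); // null (the original was never changed)
```

          • Note that map here makes a full copy of the array; plain JavaScript arrays do not share structure the way Clojure's immutable data structures do.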
      • Another technique we have is using lambdas, aka anonymous functions. If we create functions dynamically, they will remember values in their scope, and you can make use of that (and we'll see that in a few places).
        • More importantly, you can also create higher-order functions. An example is functions like map and reduce—functions that I will use absolutely every day that I'm programming in Clojure. Map and reduce get rid of many situations where you'd be using loops in imperative JavaScript. If you wanted to find all the unique items in an array, you'd have to create another array, loop through the first one, and then associate stuff into the new array that you created. Whereas with map and reduce, you just pass a function in, and map and reduce will do that stuff for you under the hood. They do that recursively, and that's kind of our last technique.
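        • A small JavaScript sketch of both points: a lambda that remembers a value from its scope, and the "unique items" example written with a loop and then with reduce. The function names (greaterThan, uniqueWithLoop, unique) are hypothetical.

```javascript
// A lambda created on the fly "remembers" values from its scope (a closure).
const greaterThan = (limit) => (n) => n > limit;
const isBig = greaterThan(10); // `limit` is captured as 10

// Imperative version: a mutable accumulator plus a loop.
function uniqueWithLoop(items) {
  const result = [];
  for (let i = 0; i < items.length; i++) {
    if (!result.includes(items[i])) result.push(items[i]);
  }
  return result;
}

// Higher-order version: reduce does the looping; we only pass in a function.
const unique = (items) =>
  items.reduce(
    (seen, item) => (seen.includes(item) ? seen : [...seen, item]),
    []
  );

console.log([4, 12, 30].filter(isBig)); // [ 12, 30 ]
console.log(unique([1, 2, 2, 3, 1])); // [ 1, 2, 3 ]
```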
      • Pretty much every loop solution to a problem has an equivalent recursion solution, and we just prefer recursion because you don't actually have to keep track of state as much; you basically defer that problem of keeping track of state, keeping track of the stack, to your system, so that's one less thing that you have to manually keep track of.
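      • To illustrate the loop versus recursion point, here is the same sum written both ways in JavaScript (sumLoop and sumRec are made-up names). It is meant to show the shape of the idea only.

```javascript
// Loop version: we manage the mutable counter and accumulator ourselves.
function sumLoop(numbers) {
  let total = 0;
  for (let i = 0; i < numbers.length; i++) {
    total += numbers[i];
  }
  return total;
}

// Recursive version: no variable is ever reassigned;
// the call stack keeps track of "where we are".
function sumRec(numbers) {
  if (numbers.length === 0) return 0;
  const [first, ...rest] = numbers;
  return first + sumRec(rest);
}

console.log(sumLoop([1, 2, 3, 4])); // 10
console.log(sumRec([1, 2, 3, 4])); // 10
```

      • For very long arrays, this recursive version can exhaust the call stack in JavaScript, so treat it as a teaching sketch rather than a recommendation.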
    • So, those are the three techniques: minimize, concentrate, and defer. I'm going to go through an example of libraries and programs that you might have encountered that have actually done this so you can better see what the difference is in taking a functional approach.
      • My first example is jQuery versus React. In the early days of the web, when we did things with jQuery, our programs would be full of imperative, stateful manipulations of the DOM. In the browser, you'd just say, "find this object, change this thing on that object." That would be state inside of the DOM, and if you wanted to change what the page looks like, you'd have to keep track or keep checking, "what is my current value, what do I need to do to change it, let's do that to change it."
        • Then React came around a few years ago and said, "you know what, let's not do all these mutations. Instead, you can write and think about your program as just a series of functions, and each function declares a part of your interface. You chain these functions together to compose your interface. All of that mutation and figuring out what's currently there and what needs to be there, we'll let the library take care of that. We'll let some external system do it."
        • This is an example of trying to defer and concentrate. Defer the stateful stuff—it's unavoidable working with the DOM in a browser to do mutations, but we can come up with solutions that let us program as if we didn't have to worry about that. We can program in a pure, functional way where all we have to do is declare: "here's a function that describes this part of the UI, here's another function that describes this part of the UI." As long as you pass some values, it just returns the corresponding HTML, and it's all pure, and all the stateful stuff is taken care of by another part of the system. So, this is definitely a very functional way of thinking about it.
    • Another example is React. In React, there's a common question: "Our interfaces have some state to deal with - where do we put it?" In the early days of React, largely because the dominant idea in programming was object-oriented programming, people thought: "Each component in the interface should keep track of its own state, and we want to have encapsulation and separation of concerns in our React codebases."
      • Here's a visual of our tree of React components. If we had some state that only this component cared about, and if you needed to show the state and maybe have a function that needed to change that state, we'd keep it down here. But the problem with this approach is that if there was any point in time where one component needed to show something but another component needed to update it, then our state would have to go to the lowest common denominator—or in this case, it would be more like the "highest" common denominator, because React only works top-down. We can't keep the state here and make it accessible to this component; we have to move it up. In larger programs, what tends to happen is that the lowest common denominator is the root. So, in this technique, there's this effort of keeping the state as low as possible, but the natural force of these programs is for the state to move up and up and up.
      • This led to a point where a bunch of React practitioners decided: "Well, if this state is trying to go all the way up to the top, and we have all this state up at the top and in a bunch of places near the top of our hierarchy, why don't we just shift our thinking and put it all at the top? Let's just have one place. Let's concentrate our state in one place because then the rest of our application is pure." These are "pure dumb functional components" if you work with React, but basically, these things are just pure functions. They take inputs and they return HTML; they don't change any state; they don't do anything. Or when they do change the state, it's all kept in one place. And there has been a shift over the last few years in React, where the majority of the community was keeping state in individual components and is now shifting towards this technique.
      • But we can go even further. People realized: "This is nice, we now have just one place for all of our state. We don't have to worry about things changing in a whole bunch of different places. But it is kind of annoying because now you have to pass everything down. If this component needs something, it needs to be added to the state and passed through all of its parents." In some larger React applications, it would be quite typical to see 10, 15, 20 things being passed to one component, just so that another five or six can be passed down to the next component, and so on and so forth. That becomes a little tedious and hard to read and hard to understand.
        • So, the next proposal is: Okay, well, why don't we rip out the state and make it global? We have a single one of them, so why not make it a global object? People are resistant to this idea because it's been beaten into people's heads that globals are bad—and it's true, globals are bad when you have a hundred of them—but globals are perfectly fine when you only have one.
        • This is what most large React programs now do, and this is the pattern that Redux follows (which is a library that people use with React). The neat thing is that this idea actually didn't come from Redux; it was one of the things that came out of a Clojure library called Re-frame, which predates Redux. There was an equivalent sort of issue in Clojure, where we have libraries that wrap React. The Reagent way of doing things (Reagent is a very light wrapper around React) started out as thinking: "Let's have lots of little states and lots of little Clojure atoms that we can modify." But then when Re-frame came about, it said: "Let's just do this, the right-hand side instead."
        • Now, as a functional programmer, what do I think? Well, I think the one on the left-hand side is very object-oriented. The one in the middle is probably the most functional because even though the global state makes it easier to work with, these components are no longer pure; they now get state and access state from somewhere else rather than just being passed in directly via the function arguments or props. But, as a pragmatic functional programmer, I would say it's probably still worth it; I'm willing to put up with a few little impurities in my code if it makes it net simpler. If I was a Haskell programmer, I'd probably do the middle, but if I was a Clojure programmer, I'd be happy with the one on the right-hand side.
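        • A rough JavaScript sketch of "one place for all of our state": a single mutable reference, pure functions that compute the next state, and one function that is allowed to swap it. This is not how Redux or Re-frame are implemented; appState, swap, addToCart, and cartLabel are hypothetical names.

```javascript
// Hypothetical single global state: the only mutable thing in the program.
let appState = { user: "ada", cart: [] };

// Pure function: takes the old state, returns a new state, changes nothing.
const addToCart = (state, product) => ({
  ...state,
  cart: [...state.cart, product],
});

// The one concentrated place where mutation happens.
function swap(updateFn, ...args) {
  appState = updateFn(appState, ...args);
}

// "Components" are pure functions of the state they are given.
const cartLabel = (state) => `${state.user} has ${state.cart.length} item(s)`;

const before = appState;
swap(addToCart, "book");

console.log(cartLabel(before)); // ada has 0 item(s)
console.log(cartLabel(appState)); // ada has 1 item(s)
```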
    • So, I mentioned Re-frame, which is the Redux equivalent; it's a library that helps manage state in a front-end application. There's also a very interesting example in a change that happened in Re-frame.
      • Originally, in the first year or two of Re-frame's existence, you'd have some code like here. It's Clojure code, so you can be happy we're at a Clojure conference if you see some Clojure code. This is code to register an event handler, something that does a state transition when you click a button and want to trigger something to happen. You'd register one of these to handle that. This example is adding a product to a cart.
      • In this situation, when the user clicks that button and triggers this event, we want to do three things: make an AJAX call to tell the server "this person added this product to their cart," possibly dispatch some other event to happen, and update our local global state object, which in this case is called "db," to indicate that the item has been added to the cart.
      • So, this seemed fine, but the Re-frame folks are pretty hardcore about functional programming, and they saw this and thought: "This isn't functional enough." This could be done better because we have three side effects, three things that this function is doing that are triggering side effects in the rest of our system. Could we do better? You might think, "No, the whole point of this is to have side effects," but check this out!
      • They came up with a new idea: instead of actually calling other side-effectful functions in your code (like making an AJAX request, triggering another event, or updating the database – well, actually, in this case, updating the database just returns a new database, so it's these two that are the problem: AJAX and dispatch), let's instead write our functions so that they just return an object that indicates what I would like to be done. When calling this event, I would like an AJAX event to happen with the following information, I would like for this other event to be dispatched, and I would like for the database to now look like this. But this function itself doesn't do those things; it just says, "Here is what I want to be done." It declares these are the things to be done. Whereas, on the left-hand side, it actually does them. And then the Re-frame system, other parts of the system, actually take this and do something with it.
      • This is another example of deferring. Instead of having a whole bunch of these events that actually have side effects and do things, we have these functions just declare what they want done, and we have one part of the system – it's a complicated part of the system; it has to figure out how to do all these side effects – but at least it's all concentrated in one place, and the rest of our application is pure and easy to understand and easy to test. If I had to test this, I'd have to test: "Is the AJAX event actually happening, so maybe you have to mock an AJAX thing or do an integration test?" Whereas on this side, I just have to check if the thing that it's spitting out is the thing that I wanted.
      • This is a trivial example, but you can imagine that there might be some complicated stuff that's going on in here, and all you have to do is check if it's doing that complicated stuff, and then implement the thing that does the AJAX bit. You can test those things completely separately. Again, another example of doing things functionally.
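      • The "return a description instead of doing the thing" idea can be sketched in JavaScript as follows. This is not Re-frame's actual API: the effect keys (db, http, dispatch), the URL, and the function names are all assumptions made for the illustration.

```javascript
// Pure handler: it performs no side effects. It only returns a description
// of what should happen. The keys (db, http, dispatch) are made up here.
function addToCartHandler(db, productId) {
  return {
    db: { ...db, cart: [...db.cart, productId] },
    http: { method: "POST", url: "/cart", body: { productId } },
    dispatch: ["cart-updated", productId],
  };
}

// The single impure place: it interprets the description and does the work.
// The performers are passed in, so real ones can do real I/O.
function runEffects(effects, performers) {
  for (const [kind, payload] of Object.entries(effects)) {
    if (performers[kind]) performers[kind](payload);
  }
}

// Testing the handler needs no mocks: just compare data.
const result = addToCartHandler({ cart: [] }, 42);
console.log(result.http.url); // /cart

// A stand-in performer that only records what it was asked to do.
const log = [];
runEffects(result, { http: (req) => log.push(`${req.method} ${req.url}`) });
console.log(log); // [ 'POST /cart' ]
```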
  • Let's get back to our toy example (GOPS)
    • Here's our object-oriented code. I've removed the syntax highlighting. Let's take a look at it from a functional point of view.
      • If I were a functional programmer, I would immediately be looking at it in terms of where I have state, where I have mutations that change these values, and where I have code that triggers some other side effects, like printing or writing to the file system (in our case, it's only logging to the console that is a side effect).
      • So, it would look like this. I've highlighted mutable state in blue, mutations in orange, and external side effects in green. What I see is this stuff is spread all over. There's state in a bunch of different objects, there are side effects happening all over the place, there are mutations all over the place.
      • I would say, "You know what, object-oriented programming, I don't think you've really made state easier. You've organized it a bit, but it's still a problem. You still have stateful stuff going on all over the place."
    • Taking a look at our imperative example through the functional lens, it's a little better, because all of our state is in one place, but, again, we have mutations and side effects all over.
  • So, what I'm going to do in the next five minutes is to go step by step and refactor this to be more functional.
    • Some people, when they're trying to work in a functional way, think they need to write functionally from step one. When I teach people how to write functionally, I just say, "It's okay to write it however you want to write it, and then you can incrementally tweak it to get it where you want. Eventually, you'll develop the habits to get it right the first time."
    • So, let's do some refactoring.
      • (transcript redacted, it heavily references the code, it's best to just watch the talk for the step by step)
    • Why is this good? In general, it's because pure functions are good, and so we want pure functions everywhere. If I had to leave you with one core concept of what is functional programming, it's this image. Burn this into your mind. Functional programming is trying to program with as many pure functions as possible and then figuring out the details.
      • Pure functions are good because they're so easy to test, and they are so easy to understand. If I look at this function or any one of these functions, I could ignore the rest of the universe and just have to figure out how this function is doing what it's doing. They're very easy to test. They're really easy to use in a parallel system if you're trying to parallelize. They can also be trivially memoized because, given an input, the outputs are always the same. You could create a system that will cache all of these responses.
        • A fun example is solving Fibonacci recursively: the naive recursive solution takes exponential time, and it's terrible; it's pretty slow if you give it a large number. But the moment you add caching, which in Clojure is a single word (you take a function and just say memoize), it turns it into a more efficient approach to that problem and makes it instant.
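        • A JavaScript sketch of the memoized Fibonacci. JavaScript has no built-in memoize, so the helper below is a hypothetical stand-in that only handles single-argument functions.

```javascript
// A generic memoize helper (hypothetical, single-argument functions only).
function memoize(fn) {
  const cache = new Map();
  return (arg) => {
    if (!cache.has(arg)) cache.set(arg, fn(arg));
    return cache.get(arg);
  };
}

// Naive recursion: recomputes the same values over and over.
const slowFib = (n) => (n < 2 ? n : slowFib(n - 1) + slowFib(n - 2));

// Memoized: the recursive calls go through the cache,
// so each n is computed only once.
const fib = memoize((n) => (n < 2 ? n : fib(n - 1) + fib(n - 2)));

console.log(slowFib(20)); // 6765
console.log(fib(50)); // 12586269025
```

        • This only works because the function is pure: the same input always gives the same output, so a cached answer is always valid.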
  • I also want to talk a little bit about data-driven programming.
    • I'm running a little over time, and we started late, so I'm going to go through this a bit faster than I wanted to.
    • Clojure is also called data-driven, and what does this mean? "Data-driven" is not a well-known concept; it's not really clear what data-driven is, and people will argue about it, and there's no consensus. It's early days in the data-driven world, but there are kind of three things that we mean when we talk about data-driven.
      • One is this idea that when we design our programs, we think about what is our data, where do we need to move it, how do we transform it - it's a data-first thinking approach, but it's very high level.
      • The second definition or way of thinking about it is the fact that Clojure just uses plain data structures. When you write Clojure code, you don't declare types, you don't have objects, you don't have typed structs. You're literally just using vectors and maps and passing them around all over the place. People from the typed programming background would think, "Oh my god, that's crazy, how can you know that things work?" - but it does, and there might be some trade-offs, but Clojure is firmly in the camp of "it's better to have just a common simple way of transferring data around your system rather than typing it all."
      • The third definition, which is what people typically think of when they do data-driven programming, is this idea of programming where data structures define some of our control flow. I'll contrast this to macros: you might say code is data, so you can take some code and manipulate it with other code to change what you want to do. But actually, the Clojure community is not a big fan of macros, and instead, what the Clojure community seems to be moving towards is this idea of "let's use data structures to describe our logic and then have some other code manipulate those data structures and do the things we want," because data structures are the purest thing you can have; they don't even have behavior, it's just data. I like calling this "configuration-driven development" - as in, what if you could put more of your program in a config file of plain data and less of it in like Turing-complete code?
    • My favorite example of this is the AWS SDK. If you want to work with AWS, you'd probably use the SDK. Amazon would need to write that same SDK for like 20-30 different languages, so they came up with this idea of "let's just describe all the things that you can do with the AWS system in JSON." They have this giant list of like 1,000 JSON files, and each of them describes a part of Amazon's web services, and it's literally JSON that says, "Okay, well, there's this operation you can do on S3 called abort-multi-part-upload, and here's a bunch of metadata about it, here are the inputs it takes, here are the outputs, here are the errors," and so on and so forth. And then in order to write the Ruby library or the Node library, they just need to write a compiler or some sort of system that translates these JSON files into an SDK or library you can use. But they don't actually have to implement every one of these functions in that library. When you're using their libraries, they're actually built from these JSON data structures. It's crazy, but it's so much easier to write something that translates this JSON into some code than to have to write all that code and maintain all of that code. AWS has a bunch of them that they've written, and then a few months ago, Cognitect wrote the equivalent version for Clojure.
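    • A toy JavaScript version of "describe the operations as data, then generate the library from the description". The description below is invented and heavily simplified; it does not match the real AWS JSON files, and buildClient is a hypothetical name.

```javascript
// Hypothetical, heavily simplified service description. Real AWS
// descriptions are far larger and have a different shape.
const serviceDescription = {
  service: "storage",
  operations: {
    abortUpload: { method: "DELETE", path: "/uploads", required: ["uploadId"] },
    listBuckets: { method: "GET", path: "/buckets", required: [] },
  },
};

// Generic code that turns the description into callable functions.
// Each generated function validates input and returns a request as data.
function buildClient(description) {
  const client = {};
  for (const [name, op] of Object.entries(description.operations)) {
    client[name] = (params = {}) => {
      const missing = op.required.filter((key) => !(key in params));
      if (missing.length > 0) {
        throw new Error(`${name}: missing ${missing.join(", ")}`);
      }
      return { method: op.method, path: op.path, params };
    };
  }
  return client;
}

const client = buildClient(serviceDescription);
console.log(client.abortUpload({ uploadId: "abc" }).method); // DELETE
```

    • Adding a new operation means adding one entry to the data; buildClient itself does not change.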
    • Another quick example of data-driven is how we write HTML and CSS in Clojure. With JSX in React, you have this weird mangle of JavaScript code that turns into HTML code, and you switch back and forth, and there's a whole bunch of syntax about how to do that. People who work with it are like, "Oh yeah, this is fine, it makes total sense," but I teach students React, and I tell you it is not trivial. But in Clojure land, we came up with this syntax called Hiccup where you just use keywords and strings and maps and vectors to create your structure that corresponds to the HTML, and it is absolutely trivial to work with. It makes perfect sense. There's no ambiguity, and you can just use all the stuff you're used to using in Clojure, like for (which in Clojure is more like map), to generate the data structure that you want, and then later (deferred) Hiccup converts it into the HTML that you expect. And similarly for CSS. The reason this is good is because it's tangible, it's fungible: I can take this data structure and I can manipulate it, and as Clojure programmers, we love manipulating data; that's what we do, that's what we live and breathe. Clojure makes it super trivial to work with. And so having these things as data structures rather than some strings makes it super easy to work with. We're seeing this more and more. More and more libraries are turning to this way of describing the functionality that you want from the library just as data structures that you can either write out by hand or generate and manipulate and then pass off to the library.
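    • To show the shape of the idea, here is a simplified Hiccup-like renderer in JavaScript. It is not the Hiccup library: the node format ([tag, attributes?, ...children]) is an approximation using arrays and objects, the render function is hypothetical, and it does no HTML escaping.

```javascript
// A Hiccup-like renderer (simplified): a node is either a string or an array
// of the form [tag, attributes?, ...children]. No escaping is done here.
function render(node) {
  if (!Array.isArray(node)) return String(node);
  const [tag, ...rest] = node;
  const hasAttrs =
    rest.length > 0 &&
    rest[0] !== null &&
    typeof rest[0] === "object" &&
    !Array.isArray(rest[0]);
  const attrs = hasAttrs ? rest[0] : {};
  const children = hasAttrs ? rest.slice(1) : rest;
  const attrText = Object.entries(attrs)
    .map(([key, value]) => ` ${key}="${value}"`)
    .join("");
  return `<${tag}${attrText}>${children.map(render).join("")}</${tag}>`;
}

// Because the page is plain data, ordinary functions like map can build it.
const items = ["milk", "eggs"];
const page = ["ul", { class: "list" }, ...items.map((item) => ["li", item])];

console.log(render(page));
// <ul class="list"><li>milk</li><li>eggs</li></ul>
```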
    • An example of this is Compojure, which was the popular solution for doing HTTP routing in 2008 when it came out. It's still one of the most popular ones and it relies on using macros and functions. But once you write it, you can't do anything with it; it just gives you a function. But now libraries instead have come out that do it in a data-structured way.
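    • A hedged JavaScript sketch of what "routing as a data structure" can look like. The route shape and makeRouter are invented for this example and are not taken from Compojure or any other specific library.

```javascript
// Routes as plain data (hypothetical shape): they can be listed, filtered,
// merged, or checked by ordinary code before any request is handled.
const routes = [
  { method: "GET", path: "/", handler: () => "home" },
  { method: "GET", path: "/about", handler: () => "about" },
  { method: "POST", path: "/cart", handler: () => "added" },
];

// A tiny router built from the data (exact-match paths only).
const makeRouter = (routeList) => (method, path) => {
  const route = routeList.find((r) => r.method === method && r.path === path);
  return route ? route.handler() : "not found";
};

const router = makeRouter(routes);
console.log(router("GET", "/about")); // about

// Unlike an opaque function, the data can still be inspected:
console.log(routes.filter((r) => r.method === "GET").map((r) => r.path));
// [ '/', '/about' ]
```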
  • Why is data better than code? It's tangible; you can work with it. I recommend the talk "Transparency Through Data" by James Reeves, which is worth watching, and he spends another 40 minutes talking about it.
  • In review: Clojure - it's functional, it's data-driven, but lastly, it's also pragmatic. These are the ideals we want. The ideal is functional programming - pure functions. The ideal is using data as much as possible. But in reality, sometimes, the world gets in the way, and problems are hard, so you can use stateful things as well. But we just try to optimize for functional programming and data-driven programming.
  • 2019-04-19
    #clojure #functional-programming #data-driven-programming