In your first programming class, you were probably introduced to the idea of a compiler. Here's how it works. You first write out your program, but a computer can't read it yet. So in order to actually run your program, you first need to pass it through this special program called the compiler. Then out pops a new version of your program that can be read by a computer.
It was probably then tested by running it with a bunch of inputs and expected outputs or something. So there are two versions of your program: the one you wrote, which a computer can't read, and the magically generated one that a computer can read. Except it's better than magic. A compiler is a complex machine that bridges the gap between human-readable code and computer-readable code. So what exactly is the compiler doing, and what does the executable version of your program actually look like?
What? You say you have no idea what I'm talking about? I think I know what might have happened. In your first programming class, you probably just used an IDE, in which case this whole process was hidden from you: when you click the Run button, your work is saved, your program is built, and it runs automatically. So now that you're up to speed, what does the executable look like, and how does the compiler... What? All you did was Python scripting? Oh my god, there are so many edge cases. Okay, this is what happens to programs in general. Deal with it. How does it work? Let's find out.
At a low level, computer processors can only do a small number of things: they can read and write to memory, and they can do math with the numbers they're holding. Modern processors do other things too, but this is basically it. Now, an executable program, the one generated by the compiler, is just a list of instructions for the processor to follow, written in binary. The instructions are things like: read these bytes from memory, do stuff to them, write bytes to memory, jump forward this many lines, jump back this many lines but only if this flag is set. Stuff like that. A program expressed as a list of binary instructions is called machine code, and this is the kind of program that your computer can actually read. But why does a computer processor only read programs that look like this?
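To make the idea of "a list of instructions for the processor to follow" concrete, here is a minimal sketch of a toy processor in Python. The opcode names (`LOAD`, `ADD`, `STORE`, `CMP_LT`, `JUMP_IF`, `HALT`) are made up for illustration and do not belong to any real instruction set; real machine code encodes instructions as binary numbers rather than readable names.

```python
def run(program, memory):
    """Execute a list of (opcode, operand) pairs against a list used as memory."""
    acc = 0        # the one number the toy processor is "holding"
    flag = False   # set by CMP_LT, read by JUMP_IF
    pc = 0         # program counter: index of the next instruction
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "LOAD":        # read a value from memory
            acc = memory[arg]
        elif op == "ADD":       # do math with the held number
            acc += memory[arg]
        elif op == "STORE":     # write the held number back to memory
            memory[arg] = acc
        elif op == "CMP_LT":    # set the flag if acc < memory[arg]
            flag = acc < memory[arg]
        elif op == "JUMP_IF":   # relative jump, taken only if the flag is set
            if flag:
                pc += arg
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory


# Memory layout (an assumption of this sketch): [limit, step, total].
# The program keeps adding step to total while total < limit.
PROGRAM = [
    ("LOAD", 2),      # acc = total
    ("ADD", 1),       # acc += step
    ("STORE", 2),     # total = acc
    ("CMP_LT", 0),    # flag = acc < limit
    ("JUMP_IF", -5),  # if flag is set, jump back 5 lines to the first instruction
    ("HALT", 0),
]

if __name__ == "__main__":
    print(run(PROGRAM, [10, 3, 0]))  # prints [10, 3, 12]
```

Every instruction here is one of the few things described above: read memory, do math, write memory, or jump depending on a flag.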
Why this specifically? Well, in short, here's how it works. The processor already contains the circuitry to do all of these instructions, but the correct circuitry only gets connected together when the corresponding instruction gets fed into it. The ones and zeros in the instruction cause certain transistors to open or close, which ends up connecting the correct circuitry together to execute that instruction.
If you want to look more into how this works, you might want to check out Crash Course Computer Science, particularly episodes 5 through 8. Episodes 3 and 4 are also helpful if you need a refresher on binary and logic gates. Though you could just watch my video too. And for the record, Crash Course isn't paying me to recommend them; I just really like this series. But in short, just know that executable programs look like this.
But when you first learned to program, you didn't need to know anything about these complicated machine code things like memory management, operations on bytes, or conditional jumps. Programming was about variables and if statements and loops and functions. Well, these things are just higher-level constructs that make it easier for humans to think about programming. A program expressed in this form is called source code. It's the version of a program that a human understands, and thus the version that most humans actually write code in. The compiler's job is to take this source code that is human readable and turn it into machine code that is computer readable. But how does it do that? How does it turn a string of text into a list of instructions in binary?
Let's look at some examples. Here's a pretty simple program: declare a variable of type integer that we'll call x, and then assign it a value of 3. For now, this program only exists as source code. I know it looks like it has some kind of structure, but for the computer it's just a meaningless sequence of characters. It's just text. And I know this program doesn't really do anything useful, but we're starting simple. Let's pass this source code into the compiler and see what it does. The compiler first divides the text up into individual tokens. It's kind of like the compiler is figuring out what the words are in this program. Then the tokens are organized into a hierarchical structure known as a parse tree, which is like figuring out what the grammar is in this program, its structure. Then the compiler records context about the program, including variable and function names. This is the stuff that a compiler needs to keep track of in different parts of the program.
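The three steps just described (tokens, parse tree, recorded context) can be sketched in a few lines of Python. This is a minimal illustration that assumes the example program is written C-style as `int x = 3;`. The function names `tokenize`, `parse_declaration`, and `record_symbols` are hypothetical, and a real compiler handles a far larger grammar.

```python
import re

KEYWORDS = {"int"}  # the only type this sketch knows about


def tokenize(source):
    """Split source text into (kind, text) tokens: the 'words' of the program."""
    tokens = []
    for text in re.findall(r"\d+|[A-Za-z_]\w*|\S", source):
        if text.isdigit():
            kind = "number"
        elif text in KEYWORDS:
            kind = "keyword"
        elif text[0].isalpha() or text[0] == "_":
            kind = "identifier"
        else:
            kind = "symbol"
        tokens.append((kind, text))
    return tokens


def parse_declaration(tokens):
    """Arrange tokens shaped like `<type> <name> = <number> ;` into a small tree."""
    kinds = [kind for kind, _ in tokens]
    expected = ["keyword", "identifier", "symbol", "number", "symbol"]
    if kinds != expected or tokens[2][1] != "=" or tokens[4][1] != ";":
        raise SyntaxError("expected: <type> <name> = <number> ;")
    return {
        "node": "declaration",
        "type": tokens[0][1],
        "name": tokens[1][1],
        "value": {"node": "literal", "value": int(tokens[3][1])},
    }


def record_symbols(tree, symbols=None):
    """Record context: remember each declared variable name and its type."""
    symbols = {} if symbols is None else symbols
    symbols[tree["name"]] = tree["type"]
    return symbols


if __name__ == "__main__":
    tokens = tokenize("int x = 3;")
    tree = parse_declaration(tokens)
    print(tokens)
    print(tree)
    print(record_symbols(tree))
```

The tree is a plain nested dictionary here only to keep the sketch short; the point is that the flat text has become a structure with a declared type, a name, and a value.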
The final step is to traverse the tree and figure out some machine code that would effectively do the same thing as this particular source code. I just want to clarify that compilers typically don't go directly from the parse tree to the machine code; there are usually a few intermediate steps that we're going to skip over. This is what the machine instructions look like in binary. It's a little hard to read and interpret, so let's shorten it by writing it in hexadecimal. Actually, it's still a pain to read.
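As a rough illustration of binary versus hexadecimal, the sketch below prints the same bytes both ways. The bytes are an assumption chosen for this example: one plausible x86-64 encoding of "store 3 into a 4-byte local variable" (`mov DWORD PTR [rbp-4], 3`). They are not necessarily the instructions shown in the video, and a different compiler or processor would produce different bytes.

```python
# Assumed example bytes: mov DWORD PTR [rbp-4], 3 on x86-64.
CODE = bytes([0xC7, 0x45, 0xFC, 0x03, 0x00, 0x00, 0x00])


def as_binary(data):
    """Render each byte as 8 binary digits."""
    return " ".join(f"{byte:08b}" for byte in data)


def as_hex(data):
    """Render each byte as 2 hexadecimal digits."""
    return " ".join(f"{byte:02x}" for byte in data)


if __name__ == "__main__":
    print(as_binary(CODE))
    print(as_hex(CODE))  # prints c7 45 fc 03 00 00 00
```

Hexadecimal is shorter because each hex digit stands for exactly four binary digits, but it is still the same raw machine code.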
Let's start with an if statement. We only execute the code in this block if this condition is true. If the condition is false, we skip over this code. In assembly, the code inside the block gets translated normally, but before it, we have some instructions for evaluating the condition, and then we have a conditional jump instruction.
In this case, it jumps past the block we want to skip, but only if we're supposed to skip it. The processor knows whether or not we're supposed to take this jump based on the result of the previous instruction. That instruction temporarily sets some flags in the processor so it can remember the result by the time it gets to this conditional jump to skip over this block. If we're not supposed to skip it, then the processor ignores the jump and continues normally, conveniently executing the code inside the block. Notice that these machine instructions are effectively doing the same thing as our if statement.
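Here is a minimal Python sketch of that translation. It is not real assembly: the instruction names `COMPARE_LT`, `JUMP_IF_FALSE`, `SET`, and `HALT` are invented for illustration, and the condition `x < 5` is an assumed example. It shows the pattern described above: evaluate the condition into a flag, then conditionally jump past the block.

```python
def source_version(x):
    """The if statement as a human would write it."""
    y = 0
    if x < 5:
        y = 1
    return y


def lowered_version(x):
    """The same logic as a flat instruction list with a conditional jump."""
    variables = {"x": x, "y": 0}
    program = [
        ("COMPARE_LT", "x", 5),      # evaluate the condition and set the flag
        ("JUMP_IF_FALSE", 1, None),  # condition false: skip the next 1 instruction
        ("SET", "y", 1),             # the body of the if block
        ("HALT", None, None),
    ]
    flag = False
    pc = 0  # index of the next instruction to execute
    while True:
        op, a, b = program[pc]
        pc += 1
        if op == "COMPARE_LT":
            flag = variables[a] < b
        elif op == "JUMP_IF_FALSE":
            if not flag:
                pc += a
        elif op == "SET":
            variables[a] = b
        elif op == "HALT":
            return variables["y"]


if __name__ == "__main__":
    print(lowered_version(3), lowered_version(7))  # prints 1 0
```

When the condition holds, the jump is ignored and execution falls straight into the block, which is the "conveniently executing the code inside the block" behavior.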
For a loop, in assembly we have the block's code, the instructions to evaluate the condition, and the jumps emulating the loop. Functions are a little more complicated. Basically, functions encapsulate a code block so that it can be used in multiple parts of a program. Most programmers out there should know that they also isolate context, and they can do recursion and stuff. This is what the equivalent assembly code for a function looks like. Let's run it to see what it does.
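As a stand-in for running real assembly, the following Python sketch imitates what happens in memory during function calls: each call saves its context on a stack, and each return pops back down. It computes a factorial with an explicit stack instead of Python's own recursion. The function name and the frame layout (a dictionary holding only `n`) are assumptions for illustration, and the input is assumed to be a non-negative integer.

```python
def factorial_with_explicit_stack(n):
    """Compute n! by pushing and popping call frames by hand.

    Returns (result, deepest_stack_size).
    """
    stack = []       # each entry is one suspended call's saved context
    max_depth = 0

    # Calling phase: each call saves its own n, then "calls" factorial(n - 1).
    while n > 1:
        stack.append({"n": n})
        max_depth = max(max_depth, len(stack))
        n -= 1

    result = 1       # base case: factorial(0) and factorial(1) are both 1

    # Returning phase: pop back down, finishing each suspended call in turn.
    while stack:
        frame = stack.pop()
        result = frame["n"] * result

    return result, max_depth


if __name__ == "__main__":
    print(factorial_with_explicit_stack(5))  # prints (120, 4)
```

The stack grows by one frame per nested call and shrinks as calls finish, which is why a function can call itself without overwriting its own variables.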
When we hit the function call, we save all context into memory, allocate new space on top of it, execute the function code (which may involve calling more functions), and pop back down once it's done. This makes it possible for functions to call other functions, or even themselves: you just push more memory and pop back down when you're done. So that's how compilers work: they take your source code and make machine code. But there's one problem. If you compile your program on one computer and then copy it over to another computer and try to run it, it might not work. If the new computer has a different operating system or a different processor model, it probably expects different machine instructions or a different executable format. So if you want your program to be able to run on this new computer, you'd better be able to compile to that computer's machine code. And if your program's users might run your software on different platforms, then unless you're distributing the source, you're going to need to keep a copy of an executable for every platform you want to run on.
Some languages, like Java, sneak around this issue. Instead of machine code, Java gets compiled to an intermediate representation known as bytecode. The bytecode can then be sent to other computers, where it gets converted to that specific computer's machine code when the program is run, via an interpreter.
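To give a feel for the bytecode idea, here is a minimal sketch of a stack-based bytecode interpreter in Python. The opcodes (`PUSH`, `LOAD`, `STORE`, `ADD`, `MUL`) are invented for this example and are not real JVM bytecode; the point is only that one platform-independent instruction list can be run anywhere an interpreter for it exists.

```python
def interpret(bytecode):
    """Run a list of (opcode, operand) pairs on a simple stack machine."""
    stack = []
    variables = {}
    for op, arg in bytecode:
        if op == "PUSH":        # push a constant
            stack.append(arg)
        elif op == "LOAD":      # push a variable's current value
            stack.append(variables[arg])
        elif op == "STORE":     # pop a value into a variable
            variables[arg] = stack.pop()
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return variables


# Hypothetical bytecode for:  x = 3;  y = x * 2 + 1
BYTECODE = [
    ("PUSH", 3), ("STORE", "x"),
    ("LOAD", "x"), ("PUSH", 2), ("MUL", None),
    ("PUSH", 1), ("ADD", None), ("STORE", "y"),
]

if __name__ == "__main__":
    print(interpret(BYTECODE))  # prints {'x': 3, 'y': 7}
```

Only `interpret` would need to be rebuilt for each platform; `BYTECODE` itself could be shipped unchanged.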
The language you write your code in is compatible with a wide variety of processors and operating systems. Can you imagine what it was like back in the day, when computer programming meant putting assembly or even machine code onto punch cards? You would need to figure out the correct holes to punch for each instruction. Remember that the compiler is a program itself. If people use compilers to develop programs, how was the compiler developed?
Well, it was probably written and compiled in another language, or even in the same language, compiling the compiler with a previous version of itself. If we follow this chain backward, at some point we reach the origins of development tools: programs written directly in machine code that help you write other programs, literally automating part of the process of creating automation.
The history of computer languages is pretty complex. No wonder it took decades to get where we are now. Remember that the next time you're writing code. We have all these beautiful things like syntax highlighting, static analysis, object-oriented programming, functional programming, libraries, linkers, build tools, and debuggers, but it's still amazing that we can just tell our computers to follow our exact instructions at the push of a button.