Lua notes (or: my very long list in which i analyze some oddly specific aspects of Lua)

i've said before that i really like lua but i don't really understand why, since i feel like i really shouldn't. and i think the reason is that i can see a clear rationale for almost all of the design choices (with some exceptions that i'll mention below), and even when i disagree with one, it's clear that lua is opinionated but in a good way. it has a very specific scope and sticks with it, and that's very respectable.

i won't really talk about the syntax itself that much, since there's not much to analyze. some people may not like it that much, but it does its job so it's really fine. there are a few interesting parts of it though, which i'll probably write below if i remember.

lua was initially designed as a configuration language (and in part still kinda is), and in that context having global variables be the default makes a lot more sense. not that it's a good design, but i can understand where they were coming from

apparently lua has undefined behavior? not in the C sense, but "The order of the assignments in a [table constructor] is undefined." which is just dumb because i feel like there's not really a reason not to require assignments to happen sequentially

"The field list can have an optional trailing separator, as a convenience for machine-generated code." because the only reason anyone would want to use a trailing comma is for machine-generated code??? this makes disallowing trailing commas in call expressions make more sense i guess

lua's grammar is ALMOST free-form, except for ONE instance and it really bothers me:

local a = b
(a).b()

this is ambiguous because it could be one expression in an assignment statement where b is called as a function, or it could be two statements. lua parses this as one statement; a semicolon needs to be added to the end of the first line to make it two statements. as far as i can tell this is the only time that a semicolon needs to be added for disambiguation
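a small sketch of the difference, compiling both versions with `load` and recording which functions actually get called. this assumes lua 5.2 or later (older versions may handle this case differently), and `trace`, `f`, `a`, and `x` are names made up for the demo:

```lua
-- hypothetical helper: runs `src` in a tiny environment and returns the
-- names of the functions that were called, in order
local function trace(src)
    local calls = {}
    local env = {
        a = { b = function() calls[#calls + 1] = "a.b" end },
        f = function()
            calls[#calls + 1] = "f"
            return { b = function() calls[#calls + 1] = "f(a).b" end }
        end,
    }
    local chunk = assert(load(src, "=demo", "t", env))
    chunk()
    return table.concat(calls, " ")
end

-- no semicolon: one statement, parsed as `local x = f(a).b()`
print(trace("local x = f\n(a).b()"))  --> f f(a).b

-- with a semicolon: two statements, `local x = f` and then `(a).b()`
print(trace("local x = f;\n(a).b()")) --> a.b
```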

lua has a `goto` keyword but not a `continue` keyword, and this is one of the design decisions that i really can't justify, i just find it bizarre. `goto` doesn't really have a place in a language like lua IMO, and it was added relatively recently-ish (5.2), so i'm really curious what the actual rationale was here

speaking of design decisions that are just, bad: 1-based indexing. here's the thing about it though: it's bad, but it could be worse. because lua functions also use closed ranges rather than half-open ranges. which, if you're gonna do 1-based indexing, is the correct way to go about it. this is a decision which i can sorta understand why they would've thought it was a good idea initially: counting starts at 1, lua is a configuration language or something, should be easy for regular people to grasp, blablabla. there's also the argument that indices aren't offsets like they are in languages like C. none of this is to justify the decision, i think it's bad, but i think the intentions behind it were sound at least.

tables are actually great. they're a very general data structure but that also means they're useful for a lot of stuff. they're used as hash maps, and as arrays (as of 5.0 tables internally store both an array and a hash map, and decide intelligently which to use, so using tables as arrays doesn't have any significant overhead). plus there's metatables, which is honestly a really great way to do metaprogramming. not that it's generally a good idea for any language wanting to implement this stuff, but for lua it's perfect. i use metatables extensively in the generic tetromino game API, they're just very versatile

lua doesn't really have a spec, but it also kinda does? it has very extensive documentation, which details pretty much everything in the language, including the grammar itself, so it pretty much fills the role of a spec. i haven't found any instance of undocumented behavior, though it's possible there is and i just haven't looked hard enough. my one gripe is that it also mixes in implementation details alongside documentation for the language itself, which i feel should be separate. like: "The computation of the length of a table has a guaranteed worst time of O(log n), where n is the largest integer key in the table." is this an implementation requirement? it's difficult to say. i think lua was designed with the language and the implementation being intertwined, as one cohesive thing. which makes sense since lua is really just a library that is capable of interpreting lua code (and lots of other related stuff too obviously), but in practice, other lua implementations do exist, which leads me to:

luajit is the most popular alternative implementation of lua, but it only implements lua 5.1, with some 5.2 extensions. and there's no interest in supporting any later versions. so there's a split: lua code written for the original lua implementation isn't necessarily compatible with luajit, and vice versa. it's like, both projects have the idea in mind that code is written for only one implementation, which i think kinda sucks. it'd be nice to be able to use one as a drop-in replacement for the other.

the lua interpreter is a very thin wrapper around some library functions, so it's REALLY small. it's also how i was able to so easily implement a lua REPL within generic tetromino game: the entire mod.lua file is 242 lines, and that's including lots of unrelated stuff like binds for ctrl+L and ctrl+U, auxiliary console functions, etc. one interesting thing i learned from implementing a lua interpreter though: how does the interpreter know when a statement is finished, or when it's just part of a multiline statement? the answer is, if the statement doesn't compile, it checks if the error message ends with "<eof>", and if it does, it treats it as a multiline statement and continues to prompt. which is... kinda weird lol. like it works, and it makes sense, but it still feels like a hack.
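the "<eof>" check can be sketched in plain lua in a few lines. this is just an illustration of the idea, not the code from lua.c or from mod.lua; `is_incomplete` is a made-up name, and it assumes lua 5.2 or later, where `load` accepts a string:

```lua
-- hypothetical helper: returns true if `src` fails to compile only because
-- the input ended too early, i.e. the REPL should keep prompting
local function is_incomplete(src)
    local chunk, err = load(src, "=repl", "t")
    if chunk then
        return false -- compiled fine, nothing more to read
    end
    -- the parser reports running out of input as "... near <eof>"
    return err:sub(-#"<eof>") == "<eof>"
end

print(is_incomplete("for i = 1, 3 do")) -- true: still waiting for `end`
print(is_incomplete("print(1)"))        -- false: complete statement
print(is_incomplete("x = = 1"))         -- false: a real syntax error
```

a REPL loop would keep appending lines to `src` for as long as this returns true.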

global variables are themselves stored in a table like any other table; it's accessible in the global namespace with _G. kinda. see, there's also a table called _ENV, which is the current environment. and _ENV is usually just equal to _G. but the environment of code can be changed when loading it, so you could have an actual environment that's different from _G. so then the "globals" are actually in _ENV, but you can make _G accessible from _ENV if you want. i take advantage of this in the generic tetromino game console, by implementing a custom "print" function that writes to the console in-game rather than stdout. the global "print" function is untouched, but when compiling code run in the console, i pass in a custom environment with my modifications. so it's kinda difficult to wrap your head around at first, but it's actually super useful
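roughly what the custom-environment trick looks like, as a minimal sketch rather than the actual console code. `console_lines`, `console_print`, and `env` are names invented here, and it assumes lua 5.2 or later (where `load` takes an environment as its fourth argument):

```lua
local console_lines = {} -- stand-in for the in-game console buffer

-- replacement for print that appends to the buffer instead of stdout
local function console_print(...)
    local parts = {}
    for i = 1, select("#", ...) do
        parts[i] = tostring((select(i, ...)))
    end
    console_lines[#console_lines + 1] = table.concat(parts, "\t")
end

-- the environment overrides `print` and falls back to _G for everything else
local env = setmetatable({ print = console_print }, { __index = _G })

-- code compiled with this environment sees the custom print...
local chunk = assert(load('print("hi", 1 + 1)', "=console", "t", env))
chunk()

-- ...while the real global print is untouched
print(#console_lines, console_lines[1])
```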

lua supports tail-call recursion, but it's not smart about it. that is, a statement of the form `return f(...)` causes a tail call (as an aside: it kinda sucks that `...` is an actual expression in lua so i can't use it in examples without specifying that no here it's actually just a placeholder). but nothing else does. like, `return (f())` *doesn't* tail call. technically the semantics here are different, since if f() returns multiple values, parenthesizing it causes all but the first to be discarded (more on multivals below), but it's still kinda funny that just wrapping an expression in parentheses gets rid of the tail call entirely. and i kinda actually like this approach: lua doesn't try to do anything super clever here to check when tail calls can be implemented, it has a very specific syntax that causes them, so they're predictable. i think this is the best way to implement tail calls in a language like this.
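a quick way to see this for yourself (function names are made up, and the depth of a million is just an arbitrary number that's assumed to be well past the default stack limit):

```lua
-- proper tail call: the current frame is reused, so depth doesn't matter
local function count_tail(n)
    if n == 0 then return "done" end
    return count_tail(n - 1)
end

-- same thing with parentheses: no longer a tail call, so every level
-- keeps its own stack frame
local function count_paren(n)
    if n == 0 then return "done" end
    return (count_paren(n - 1))
end

print(count_tail(1000000))         -- finishes normally
print(pcall(count_paren, 1000000)) -- false plus a stack overflow message
```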

the # operator gets the length of a string or table. for strings it just returns the number of bytes in the string, simple enough. for tables, it's a bit more complicated. the idea is it gets the length of the array part of the table, like, the number of numbered elements in the table, starting at 1 (another aside: as far as i can tell, this and positional fields in table constructors like {1,2,3} are the only parts of the lua language itself that mandate that arrays start at 1, everything else is either an implementation detail or part of the stdlib. so, that's kinda annoying lol). but what if there's holes in the table? like, if t = {1,2,3,4}, then i do `t[3] = nil`, what does `#t` yield? i would've thought 2, but it's actually unspecified whether it returns 2 or 4. the implementation can just do whatever it wants there. and here's the thing: it makes sense when you think about it: the "last" element of an "array" is an element which isn't nil, where the next sequential element is nil, with exceptions for when [1] is nil or [math.maxinteger] is not nil. so, for efficiency reasons, the implementation can just choose the first element that fits this description that it finds, so it doesn't have to iterate through the entire array portion of the table every time the length is calculated. so it makes sense, but i still don't like it. i think this is the best solution though, since the alternative is making getting the length an O(n) operation. btw, when i put that quote above about O(log n) guaranteed worst time, it's for this. so, it's unclear if an implementation is *allowed* to always choose to traverse linearly, since then this statement wouldn't hold
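the hole example, plus a sketch of the "obvious" O(n) alternative for comparison. `first_border` is a made-up helper, not something the stdlib provides:

```lua
local t = {1, 2, 3, 4}
t[3] = nil

-- either 2 or 4 is a valid answer here; which one you get is up to the
-- implementation, so don't rely on it
local n = #t
print(n)

-- hypothetical linear version: always finds the smallest border
local function first_border(tbl)
    local i = 0
    while tbl[i + 1] ~= nil do
        i = i + 1
    end
    return i
end

print(first_border(t)) -- 2, at the cost of walking the table
```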

strings in lua are fine, they get the job done, and for lua specifically they're probably the best they can be. they're just sequences of bytes, no particular encoding is assumed. they also store their length, so NUL bytes are allowed. that being said, it does make working with lua strings from C very easy to get wrong: there's functions that get the length of a string alongside the data itself, but there's also functions to just get the data, and in that case it's impossible to know whether a NUL byte is a terminating byte or part of the string data. even if you retrieve the length of the string, you still can't use libc string functions if you want to correctly handle NUL bytes. generic tetromino game pretty much always gets this wrong, since it uses strcmp a lot on lua strings, so i'll have to go through and fix all of that. but it's a pretty big footgun. lua's stdlib also provides a `string` library, which has utility functions for working with strings, but these functions also treat the strings as byte sequences, so working with e.g. UTF-8 encoded data is more difficult than it maybe should be. but given lua's stdlib is very minimal (see below), it kinda makes sense. again, lua's strings are reasonable enough in the context of an embeddable language that works fine-ish with C, so i think their design is sound.
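the byte-sequence behavior seen from the lua side, as a small sketch. this assumes lua 5.3 or later for the `utf8` library, and the variable names are just for illustration:

```lua
-- an embedded NUL is just another byte as far as lua is concerned
local with_nul = "abc\0def"
print(#with_nul)         -- 7
print(with_nul:byte(4))  -- 0

-- the UTF-8 encoding of U+00E9 is two bytes, and # counts bytes
local e_acute = "\xC3\xA9"
print(#e_acute)          -- 2
print(utf8.len(e_acute)) -- 1
```

a C function that only looks at the data pointer and stops at the first NUL would see `with_nul` as just "abc".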

let's talk about nil: it's used both as a value itself, but also for the absence of a value. and, wait for it, this yet again is a design decision that has very good rationale for lua, even if it's kinda dumb sometimes. it means that empty fields in tables are actually just nil values, and since the global namespace is also itself a table, unset variables are implicitly "nil". this makes checking whether a field in a table is set very easy: just check if it's nil. no need for a try-catch mechanism or a protected call. in practice, this actually works pretty well, with the exception that if you misspell a variable it just gives back nil and continues along. another aspect of this design is that binding and assignment using the same syntax *actually makes sense here*. in languages like python, it doesn't feel right, since binding and assignment are fundamentally different operations, and combining them into one necessitates the inclusion of extra features like `global` and `nonlocal`. in lua, the conceptual model is that you're always assigning. the variable already "existed", as nil, and you're just setting it to a new value. if you want to "delete" it, you just set its value back to nil. this is another part of lua's design that i feel like i should really dislike but actually think works really well

here's something else about nil, and also about functions in general: if you pass the "incorrect" number of arguments to a lua function, it doesn't error. it just sets any parameters that didn't get an argument to nil, and discards any extra arguments. on one hand: this allows for optional parameters to be added without introducing any new language features, and it's also consistent with other parts of lua which work very similarly. it also keeps with lua's VERY simple type system, where there's only one function type, much like there's only one table type. on the other hand: it means mistakes that should be easy to catch are silently ignored. but the thing that *really* bothers me about this design is that internally, lua knows the difference between a "nil" parameter and a parameter where nothing was given. notice that stdlib functions will correctly error out if you supply the wrong number of arguments: tostring() will error, but tostring(nil) returns "nil". a lua function with named parameters can't do this; the closest you can get is making the function variadic and counting the arguments with `select("#", ...)`, which is clunky. in C (or whatever the host language you're using is) it's easy, since you can distinguish between LUA_TNIL and LUA_TNONE. and knowing that something is *possible* but awkward to do due to arbitrary limitations in the language is very annoying, albeit i'm not sure how i'd design things differently.
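a sketch of the behavior. `named` and `strict` are made-up functions; note that `strict` only manages to tell "no argument" apart from "nil argument" by being variadic and counting with `select("#", ...)`, which named parameters have no way of doing:

```lua
local function named(a, b)
    return a, b
end

print(named(1))       -- 1  nil  (b is silently nil)
print(named(1, 2, 3)) -- 1  2    (the 3 is silently dropped)

-- the stdlib can tell "nothing" apart from nil
print(pcall(tostring)) -- false plus an error message
print(tostring(nil))   -- nil

-- hypothetical variadic function that mimics this
local function strict(...)
    if select("#", ...) == 0 then
        error("value expected", 2)
    end
    return (...) -- first argument only
end

print(pcall(strict))      -- false plus an error message
print(pcall(strict, nil)) -- true  nil
```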

lots of stdlib functions return a "fail" value, which is currently equal to nil. i'll let the docs do the talking here: "The notation fail means a false value representing some kind of failure. (Currently, fail is equal to nil, but that may change in future versions. The recommendation is to always test the success of these functions with (not status), instead of (status == nil).)" i *really* dislike this. i get that the idea is that in a newer version of the language a different error value could be added here, but in practice lua versions are almost always incompatible anyways. all this does is add a subtle footgun, where comparing to nil *works* but is apparently incorrect since the behavior could change at some point. this is also a matter of stylistic preference: i always prefer to directly compare to a null value, even in languages like C where it's not strictly required. this includes in lua, but this design choice requires that i use `not` here instead (or check for success by using the function return value itself as an `if` condition). (as an aside: having both `nil` and `false` be falsey values and having everything else be truthy is questionable, though i guess i don't know how else i'd want it to work. it's better than having 0 be falsey, a mistake that python inherits)
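what the recommended check looks like in practice, using `io.open` as the example. the path is made up and assumed not to exist:

```lua
-- hypothetical path, assumed not to exist on the machine running this
local file, err = io.open("/no/such/dir/for-this-demo.txt", "r")

-- the docs' recommended test: `not file`, rather than `file == nil`
if not file then
    print("could not open: " .. err)
else
    file:close()
end
```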

lua doesn't have a try-catch mechanism, or exceptions at all. it has errors that can be raised with the `error` function, and it has a `pcall` function to do a "protected call" of a function, which returns whether an error occurred, as well as the function return result(s) if an error didn't occur, or the error value if it did occur. i actually really like this: it's much simpler than try-catch would be, but it works just as well, at least within lua. error values can be anything, though in practice they're usually strings, but i think it's fine to keep this as a convention rather than a requirement. i've also noticed that errors are usually reserved for programming errors, like incorrect function arguments and assertion failures, not for exceptional conditions like failing to open a file or a write failure, hence the "fail" value described above. i have no real opinion on this: if the programmer forgets to check the return value of a function, then the program will most likely just error down the line, when they try to e.g. index from a nil value. my only real complaint here is since errors are implemented internally with longjmp, it's difficult to debug an error in an unprotected lua API call, since the backtrace is meaningless.
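a minimal sketch of `error` and `pcall` together, with a table as the error value to show that it doesn't have to be a string. `parse_age` and the fields of the error table are invented for the example:

```lua
-- hypothetical function that raises a table as its error value
local function parse_age(s)
    local n = tonumber(s)
    if not n then
        error({ code = "not_a_number", input = s })
    end
    return n
end

-- success: true followed by the function's return values
local ok, age = pcall(parse_age, "42")
print(ok, age) -- true  42

-- failure: false followed by whatever was passed to error()
local ok2, err = pcall(parse_age, "abc")
print(ok2, err.code, err.input) -- false  not_a_number  abc
```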

strings are very commonly used instead of enum values, which i don't like very much, since it's not clear what all the possible values are that you could provide without searching through the docs, and it's easier to make typos. two examples of this that come to mind are `collectgarbage` and `select`, the latter of which has a special case when the first argument is "#" instead of an integer, returning the number of variadic arguments supplied (which really should just be its own separate function). generic tetromino game is kinda inconsistent here, but i still like the way it does things: "types" like Tetromino are actually just a string of the letter of the tetromino, and when a value indicates either a-type or b-type, it just uses the string "a" or "b", but it also has constants in game.color, game.events, and game.screen, so all functions taking in a color, event, or screen actually take in what's basically an enum value. this isn't really idiomatic lua, but i still prefer it to using strings, so i'm probably gonna keep this. my mind could be changed here though.

the stdlib in lua is, ehhh fine? but also not very good. so, it's intentionally very minimal, which given lua's intended use cases makes complete sense. my issue is that what *is* provided is often not satisfactory, and there's some very basic stuff that *isn't* provided, when it could be, perhaps getting rid of some stuff that *is* provided but not actually that useful. just going through a few things:

- there's no function to deep-copy a table. it's not difficult to implement
  yourself, but it's also strange to not include it within the `table`
  namespace. a shallow-copy of the array part can be done with the poorly named
  `table.move`
- there's functions to compile a lua function from a string or from a file:
  `load` and `loadfile`. but there's also `dofile`, which is a convenience
  function that calls `loadfile` and then calls the returned function with no
  arguments. but no such equivalent exists for `load`, that is, there's no
  `dostring`. which isn't really a problem, since you can just, call the
  function yourself, it's just strange. i think it would've made more sense to
  not provide `dofile`, since the equivalent operation is just `loadfile(foo)()`
- `io.tmpfile` is provided to create and open a temporary file (which is
  automatically removed when the program ends), but there's also `os.tmpname` to
  get the name of a temporary file. the latter of which is a strange choice for
  inclusion, since it's very easy to use it incorrectly and create a race
  condition, and i don't see much of a reason for it over `io.tmpfile`. i
  wouldn't be bringing this up if it weren't for the fact that the rest of the
  `io` and `os` namespaces are so small, where even other useful functions just
  aren't provided at all, so i'm not a fan of the fact that what they *did*
  provide is just, not good
- files created with `io.tmpfile` are automatically removed when the program
  ends; not when they go out of scope (as in a to-be-closed variable, see below)
  or when they're garbage collected. i guess the fact that it doesn't happen
  when garbage collected is good, since it's more predictable? but i also feel
  like, if the file is no longer being used, there's no reason to keep it lying
  around.
- there's no `os.chdir` for instance, i had to implement my own chdir function
  in generic tetromino game
- there's `os.execute`, which is equivalent to libc's system(3), which is just
  bad all around. it really wouldn't be difficult to have an execute function
  where each argument is passed separately, rather than having a shell do the
  parsing
- both `string.find` and `string.match` exist, which both do pretty much the
  same thing but with slightly different behavior, and i feel like it should be
  possible to merge these into one function, but maybe the split is there for a
  reason. sometimes it is nicer to use one over the other i suppose.
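for the first point, here's roughly what a hand-rolled deep copy looks like next to the `table.move` shallow copy. this is one possible sketch, not a canonical implementation: `deep_copy` is a made-up name, it doesn't copy metatables, and `table.move` needs lua 5.3 or later:

```lua
-- hypothetical deep copy; `seen` maps originals to copies so that cycles
-- and shared subtables are handled
local function deep_copy(value, seen)
    if type(value) ~= "table" then
        return value
    end
    seen = seen or {}
    if seen[value] then
        return seen[value]
    end
    local copy = {}
    seen[value] = copy
    for k, v in pairs(value) do
        copy[deep_copy(k, seen)] = deep_copy(v, seen)
    end
    return copy
end

local original = { name = "t", inner = { 1, 2, 3 } }
original.self = original -- a cycle, just to show it doesn't loop forever

local deep = deep_copy(original)
print(deep.inner ~= original.inner, deep.self == deep) -- true  true

-- shallow copy of the array part: the elements themselves are shared
local list = { {}, {}, {} }
local shallow = table.move(list, 1, #list, 1, {})
print(shallow ~= list, shallow[1] == list[1]) -- true  true
```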

many string functions operate on "patterns", which are basically just less powerful regular expressions (albeit there's some things patterns can do that pure regular expressions can't, like obtain the index of a part of the string as a match). and these kinda make sense for lua. they're simpler than regular expressions, meaning the implementation is smaller. and i imagine they're also very fast (i haven't actually done any tests here, and true regular expressions are also very fast so i'm not sure if there's actually a noticeable difference here). but given how minimal the stdlib is, patterns make some sense here. i also like how patterns use "%" in place of where regular expressions use a backslash, so they're nicer to use in string literals. there's only a few big problems i have here:

- there's no way to perform a case-insensitive match. you need to resort to very
  hacky wrapper functions (that you need to write yourself) that modify a
  pattern
- string.find has an optional boolean parameter to do a raw string search rather
  than interpreting a pattern, which 1. ew boolean parameter, and 2. this is the
  only function that has this option. all other functions that take patterns
  (string.match, string.gmatch, string.gsub) have no such option. there's no
  real justification for this.
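to give an idea of the kind of hack this means, here's a sketch of both workarounds. `nocase` and `escape_pattern` are made-up helpers; `nocase` in particular is deliberately naive and will mangle patterns that already contain sets like `[abc]`:

```lua
-- hypothetical: turn each unescaped letter into a two-character set,
-- so "world" becomes "[wW][oO][rR][lL][dD]"
local function nocase(pattern)
    local result = pattern:gsub("(%%?)(.)", function(escape, char)
        if escape == "" and char:match("%a") then
            return "[" .. char:lower() .. char:upper() .. "]"
        end
        return escape .. char -- leave things like %d and %% alone
    end)
    return result
end

print(("Hello World"):find(nocase("world"))) -- 7  11

-- string.find has a "plain" flag...
print(("a.b"):find(".", 1, true)) -- 2  2
print(("a.b"):find("."))          -- 1  1

-- ...but for gsub and friends you have to escape the pattern yourself
local function escape_pattern(text)
    return (text:gsub("%p", "%%%0"))
end

print(("1.5 and 1x5"):gsub(escape_pattern("1.5"), "N")) -- N and 1x5  1
```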

generic for loops are just syntactic sugar for iterator functions: the statement takes an iterator function, a state (usually a table), and other initial values that should be passed into the iterator (usually this is all done by just having another function be called to return these values, such as `pairs` or `ipairs`), and then the iterator is called repeatedly with these values, each time returning new values to be both used for executing the loop body and to be passed into the iterator again (alongside the state). the loop ends when the first value the iterator function returns is nil. so, like:

for k, v in pairs(t) do
    -- stuff stuff stuff
end

`pairs` returns the iterator function `next`, the argument `t` (a table), and nil, so then `next` is repeatedly called, each time returning the next key-value pair, which is then passed back into `next` (alongside `t`) until the entire table has been iterated through. this is another example of lua being very opinionated, basically mandating that this design is used for iterators. however, in doing so, it doesn't need to add anything new to introduce iterators, just some syntactic sugar to make them easier to use. so this design is actually really nice. iterating over a table isn't builtin, the stdlib provides it (albeit it is builtin to the API, the stdlib function is just a wrapper around lua_next), but you can choose to implement your own custom iterator pretty easily and use it the same way you'd use the stdlib one. basically, this is another example of lua not adding any new concepts into the language, just extending what's already there to allow things like iterators to exist very nicely. and this is what i really like about lua: this (alongside tables) is a great example of a place where the language has one very general concept that's used to implement more specific things.
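a sketch of both halves of this: a custom stateless iterator, and the `pairs` loop written out by hand. `range` and `step` are made-up names, and the hand-written loop is only an approximation of what the `for` statement does:

```lua
-- custom iterator: returns (iterator function, state, initial value),
-- the same shape of thing that pairs returns
local function range(n)
    local function step(limit, i)
        if i < limit then
            return i + 1
        end
        -- returning nothing here reads as nil, which ends the loop
    end
    return step, n, 0
end

local seen = {}
for i in range(3) do
    seen[#seen + 1] = i
end
print(table.concat(seen, ",")) -- 1,2,3

-- roughly what `for k, v in pairs(t) do ... end` expands to
local t = { x = 1, y = 2 }
local count = 0
local iter, state, key = pairs(t) -- next, t, nil
local value
key, value = iter(state, key)
while key ~= nil do
    count = count + 1 -- loop body goes here
    key, value = iter(state, key)
end
print(count) -- 2
```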

multivals in lua are a bit interesting. there's already a good write-up about all of their footguns and weird behavior here: https://benaiah.me/posts/everything-you-didnt-want-to-know-about-lua-multivals/ but despite their unintuitive behavior, i don't actually think they're awful. i think in lua, multivals are a more sensible choice than tuples: with tuples, lua would always be using tables in place of multivals, and that would just be ugly and distracting. their behavior is weird, sure, but i don't think it's possible to implement something like this *without* it having some strange behavior. the choices the designers made i think are reasonable with this in mind.
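a few of the weirder behaviors in one place, as a quick sketch (`two` is just a made-up function that returns two values):

```lua
local function two()
    return 1, 2
end

-- only the LAST expression in a list keeps all of its values
local a = { two(), two() } -- {1, 1, 2}
local b = { two(), 10 }    -- {1, 10}

-- parentheses truncate to exactly one value
local c = { (two()) }      -- {1}

print(#a, #b, #c)               -- 3  2  1
print(select("#", two()))       -- 2
print(select("#", (two())))     -- 1
```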

lua uses ^ for exponentiation, which is objectively the correct thing to do for a scripting language like this (unlike python which uses **). for xor, it uses ~, which, hot take, is again the objectively correct thing to do. it's pretty much the opposite of what go did, where go uses ^ for xor and unary not, lua uses ~ for both, using ^ for exponentiation. and it *works*. i don't know what other languages do this, but i really wish this were more common.

that being said, it is one inconsistency from other languages that lua has, of which lua has MANY. this is a common complaint for lua, that it's just very different, and unnecessarily so. for instance, instead of !=, it uses ~=. which, ok, if you *really* think about it it kinda makes sense, since ! isn't used anywhere else in the language, and ~ can mean "not", but that being said != is more common and they really should've used it here. comments use --, which is also different, but this isn't as much of a big deal, though it does mean lua needs to include a special case where the first line is skipped if it begins with # to allow for shebangs, where languages like sh and python just interpret these as normal comments. for raw multiline strings, lua uses [[double brackets]], which i also really like, preferring it over methods like backticks, or python's R"strange syntax". finally, it uses --[[this syntax]] for multiline comments: it's just the multiline string syntax but prefixed with --, and again i actually really like this. i guess the main takeaway here is that sure, there's a couple weird inconsistencies, but once you know them they honestly don't matter, at all. i can see how people would mistakenly type != when they mean ~=, but after writing a lot of lua (in addition to other languages) i can say i really don't make that mistake. when designing new languages, don't do what lua did, but what lua did really isn't a big deal.

addition and concatenation are separate operators in lua: addition is done with `+`, concatenation with `..`. and this is another decision that at first is a bit strange, but is honestly a very good choice, much better than overloading `+`. to be clear, lua is perfectly capable of overloading operators, but having separate operators here means that types can be converted implicitly without needing to implement awful type conversion rules like in JS. numbers can be concatenated: `1 .. 2`. numbers can be concatenated with strings: `1 .. "hi"` (one issue is that a space is needed between the number and the `..` since otherwise it parses as a float, then fails to parse the rest of the expression, and although i dislike this and think a different operator could've been chosen, in practice i always put spaces around `..` anyway so it's a non-issue). furthermore, you can add string representations of numbers: `"1" + "2" == 3`, `1 + "2" == 3`. there's no JS-like fuckery happening here, this is just how these operators are defined. `..` does the equivalent of `tostring` on its operands; `+` does the equivalent of `tonumber` on its operands (at least for strings and numbers; other types still error unless a metamethod is defined). so you end up with predictable behavior *and* nice implicit type conversions (which are nice for a language like this). so i really like this. i've also heard people say that using `+` for concatenation is nonsensical since you're not "adding" anything, and while i guess that's true, i've never seen anyone be confused by this (even when first learning), and there's lots of other issues, like `=` being used for assignment rather than equality, that are even more common.
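the conversions above, spelled out. this is a sketch assuming lua 5.3 or later; variable names are made up:

```lua
local joined = 1 .. 2    -- "12", a string
local mixed = 1 .. "hi"  -- "1hi"
local sum = "1" + "2"    -- 3, a number
local sum2 = 10 + "5"    -- 15

print(joined, mixed, sum, sum2)

-- without the space, `1..2` is lexed as a malformed number
local chunk = load("return 1..2")
print(chunk) -- nil

-- the implicit conversion only covers strings and numbers
print(pcall(function() return nil .. "x" end)) -- false plus an error message
```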

(aside: hot take: `=` really shouldn't be used for assignment. it was always a mistake to use this for assignment and `==` for equality, but at this point i understand why most languages choose this, since it's so common that deviating would just be being different for no real reason. that being said... lua is already different for no reason! so for lua it would've been nice to have something like `:=` and `=` here. but this is all nitpicking syntax)

lua 5.4 adds const variables. the syntax is a bit weird: `local foo <const> = bar` (since the angle brackets can have other attributes in them besides "const", but i'll get to that), and these work pretty much as you'd expect: trying to assign to them results in an error. because of lua's whole "nil is the absence of a value" thing, this can't really be done for globals or table fields, since then it'd be impossible to "delete" them. but it's nice i guess for locals? i mean, i've *never* seen this used in actual code. i certainly don't use it, even when i could. the syntax is just clunky enough that i just don't bother. and of course it doesn't help that this can't be used in luajit since it's 5.4.
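for reference, a tiny sketch of what this looks like (lua 5.4 only; the names are made up). the error happens when the code is compiled, so the bad assignment is wrapped in `load` here to be able to look at it:

```lua
local limit <const> = 10
print(limit * 2) -- 20

-- assigning to a const local is rejected at compile time
local chunk, err = load("local x <const> = 1; x = 2")
print(chunk) -- nil
print(err)   -- message mentions the const variable
```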

in addition to <const>, there's also <close>, which marks a variable as "to-be-closed", meaning its `__close` metamethod will be called when it goes out of scope. and, i still don't understand fully what the point of this is. why is this necessary when garbage collection already exists? you can just use the `__gc` metamethod to close things when it's garbage collected; files returned from `io` already do this. i can see how this behaves differently, but i don't see why you'd want to use this. i guess if you use `collectgarbage` to stop the garbage collector? idk, i have to imagine there is a compelling use case here. notably, to-be-closed variables interfere with tail calls, since tail calls can't happen if a variable needs to be closed once the scope is closed.
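to make the behavioral difference concrete: `__close` runs at a known point (when the variable's scope ends, including when it ends because of an error), whereas `__gc` runs whenever the collector gets around to it. a sketch for lua 5.4, with `resource` and `log` being made-up names:

```lua
local log = {}

-- hypothetical resource whose __close just records that it ran
local function resource(name)
    return setmetatable({}, {
        __close = function()
            log[#log + 1] = "closed " .. name
        end,
    })
end

do
    local r <close> = resource("a")
    log[#log + 1] = "using a"
end -- r is closed right here, not at some later collection
log[#log + 1] = "after block"

-- it also runs when the scope is left through an error
local ok = pcall(function()
    local r <close> = resource("b")
    error("boom")
end)

print(table.concat(log, ", "))
-- using a, closed a, after block, closed b
```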

speaking of things that i really don't understand and haven't found a use case for: coroutines! on paper i understand how they work, something something cooperative multitasking, but in practice i've never used these, and every time i run into a problem that i *think* is best solved with coroutines, i realize that actually no coroutines aren't good for this problem, and it's actually pretty easily solvable in other ways. i'd be interested to hear actual examples for when coroutines are useful, because as of now i feel like they don't fit into lua very well.

"As a convention, programs should avoid creating names that start with an underscore followed by one or more uppercase letters (such as _VERSION)." it's weird that this is called a "convention", rather than just calling these reserved identifiers.

here's a weird bit of lua syntax: `repeat ... until foo`. it's kinda like a while loop, except the condition is checked after the body is executed rather than before (like do-while in C), but the condition is negated for some reason? i understand why do-while on its own isn't possible, since it wouldn't work within the grammar (`do ... end` blocks already exist, so `while` would be interpreted as the start of a new statement within the block rather than the end of the block), but using `until` here is bizarre. i don't think including this syntax at all makes much sense, there's very few times where you'd use this over a regular `while` loop.

nil keys in tables are disallowed (i.e. `t[nil] = v` will error), and i still can't figure out why. i'm sure there's a very good reason for this, i just don't know what it is.
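a quick sketch of exactly what is and isn't allowed here (reading with a nil key is fine, only writing errors):

```lua
local t = {}

-- writing with a nil key raises an error
local ok, err = pcall(function()
    t[nil] = "v"
end)
print(ok, err) -- false plus an error message

-- reading with a nil key is fine and just gives nil
print(t[nil]) -- nil
```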

for some reason people don't talk about this that much: lua is VERY portable. like, portable as in written in the subset of ANSI C that's also compatible with C++ and uses only a near-universally supported subset of libc. it'll run on pretty much anything. and this makes sense for an embedded language, but it's still very impressive nonetheless. there's very few other languages that come close to this level of portability.