# Python Proposal: Compound statement expressions, complex comprehensions, and assignment functions

This is a proposal for a new syntax to allow complex comprehensions containing statements. It's inspired by other modern languages where these ideas are common and the line between statement and expression is sometimes blurry, such as Ruby, Scala, and Kotlin.

I know that similar (sometimes identical) ideas have been discussed to death before, so I apologise for bringing this up again. I have made note of some previous objections and have tried very hard to be thorough and think through all the possible consequences.

Theoretically this could be broken down into at least 5 separate proposals, but there is a lot of overlap between them so I have chosen to present them all at once. If this is a problem, I'm happy to hear suggestions on how to separate the discussions.

## Examples

Here are some simple examples of the proposed new syntaxes:

Instead of:

    if c1:
        z = foo(bar)
        x = y1 + z
    elif c2:
        x = y2
    else:
        x = y3

one may write:

    x = (
        if c1:
            z = foo(bar)
            y1 + z
        elif c2:
            y2
        else:
            y3
    )

Instead of:

    try:
        x = f()
    except ValueError:
        x = default1
    except IndexError:
        x = default2
    foo(x)

one may write:

    foo(
        try:
            f()
        except ValueError:
            default1
        except IndexError:
            default2
    )

Instead of:

    sorted(lst, key=lambda t: t[0])

one may write:

    sorted(lst, key=def (t):= t[0])

What's going on above is that:

- `def` can be used as an expression which returns the function object.
- The function name is now optional, so in this case we've omitted it.
- `:=` instead of `:` means that the expression is automatically returned.

Instead of:

    def key(x):
        # complex logic
        return y

    sorted(lst, key=key)

one may write:

    sorted(lst, key=
        def key(t):=
            # complex logic
            y
    )

This is pretty similar to the last example. Key points:

- I felt like putting a name there this time, because I can.
- Multiple statements are allowed in the function body.
- `:=` automatically returns the last statement, which must be an expression.

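For comparison, here is a minimal sketch of how the `def` and `try` examples above can be spelled in today's Python. It is only meant to pin down the intended semantics, not to suggest how the proposal would be implemented. The names `first_item`, `call_or_default`, and `pairs`, as well as the sample data, are hypothetical and do not come from the proposal.

```python
def first_item(t):
    # Today's spelling of `def (t):= t[0]`: a named function with an
    # explicit return instead of an anonymous one with an implicit return.
    return t[0]


def call_or_default(f, default1, default2):
    # Today's spelling of the proposed `try` expression: the value of the
    # clause that runs last becomes the result.
    try:
        return f()
    except ValueError:
        return default1
    except IndexError:
        return default2


pairs = [(3, "c"), (1, "a"), (2, "b")]  # hypothetical sample data

# The named function and the lambda are interchangeable as sort keys.
assert sorted(pairs, key=first_item) == sorted(pairs, key=lambda t: t[0])

# int("x") raises ValueError and [][0] raises IndexError.
assert call_or_default(lambda: int("x"), "bad value", "bad index") == "bad value"
assert call_or_default(lambda: [][0], "bad value", "bad index") == "bad index"
assert call_or_default(lambda: 42, "bad value", "bad index") == 42
```
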
Instead of:

    [
        f(x)
        for y in z
        for x in y
        if g(x)
    ]

one may write:

    [
        for y in z:
            for x in y:
                if g(x):
                    f(x)
    ]

Instead of:

    lst = []
    for x in y:
        if cond(x):
            break
        z = f(x)
        lst.append(z * 2)

one may write:

    lst = [
        for x in y:
            if cond(x):
                break
            z = f(x)
            yield z * 2
    ]

Instead of:

    [
        {k: v for k, v in foo}
        for foo in bar
    ]

one may write:

    [
        for foo in bar:
            {for k, v in foo: k: v}
    ]

## Specification

### New expressions from compound statements

The value of a list of statements (a body) is the value of the last statement, or no value if the last statement is not an expression. This is determined at compile time.

The value of an `if/elif/else` or a `try/except/else` is the value of the body of the last clause that executes. If the body of one of the clauses has no value, the compound statement containing the clause cannot be used as an expression, and attempting to do so is a `SyntaxError`, even if some other clauses have values. An `if[/elif]` with no `else` doesn't have a value. A `try` with a `finally` doesn't have a value.

A function definition (`def`) is an expression which evaluates to the function object. The function name is optional. Parentheses around parameters are still required, particularly because otherwise `def foo:` could mean either `def foo():` or `def (foo):` (an anonymous function with one parameter `foo`).

An `if`, `try`, or `def` used as an expression shall here be called a compound statement expression, or CSE for short.

#### Whitespace

If all bodies of a CSE consist of just one expression, then we call this an *inline* CSE, because it can be inserted in the same line(s) as other code and there are fewer concerns about whitespace and disambiguation. The beginning of the CSE is clearly marked by a keyword, while the end would be determined by the same logic which determines the end of the current conditional expressions `x if C else y` or the body of a `lambda`. If the body of an inline CSE is also a CSE, the inner CSE must also be an inline CSE.
Put differently, all our current expressions are inline expressions, and a CSE is also an inline expression if all of its bodies are inline expressions. Perhaps we should require that inner inline CSEs must have parentheses, e.g.:

    foo = if C: (if D: x else: y) else: z

If at least one body of a CSE has multiple statements or contains a non-expression statement, then it is not inline. The CSE must start and end with a newline, i.e. the lines containing the CSE cannot contain any tokens from outside. For example, this is allowed:

    foo = bar(
        if C:
            x
        else:
            y
    )

but this is not:

    foo = bar(if C:
                  x
              else:
                  y)

This means that if you wanted to add more arguments after it in a function call or a list literal, the comma must be on the next line. Since this looks weird:

    foo = bar(
        if C:
            x
        else:
            y
        , spam,
    )

one might instead format it as:

    foo = bar(
        (
            if C:
                x
            else:
                y
        ),
        spam,
    )

which is also a bit clunky but I think that's OK because it's a gentle deterrent from abusing this syntax and putting too much information in one expression.

This restriction ensures that it's easy to copy and paste entire lines to move them around, whereas refactoring the invalid example above without specific tools would be annoying and error-prone. It also makes it easy to adjust code outside the CSE (e.g. rename `foo` to something longer) without messing up indentation and alignment.

The first line after the end of a non-inline CSE must be less indented than the bodies of the CSE. For example, this:

    x = \
        if y:
            1
        else:
            2
        + 3

is the same as

    x = (
        if y:
            1
        else:
            2
    ) + 3

whereas

    x = \
        if y:
            1
        else:
            2
            + 3

means that the `else` has two statements `2` and `+ 3`.

These rules should help keep code readable at a glance and resolve questions about how statements embedded in expressions are disambiguated, for both humans and parsers.

Inside a non-inline CSE, the rules for indentation and such are the same as anywhere else. The syntax of the CSE is valid if and only if it's also valid as a normal statement outside any expression.

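The value rules and the inline/non-inline distinction above can be sketched with today's `ast` module, relying on the rule that a CSE is valid exactly when it is also valid as a normal statement. This is an illustration under assumptions rather than a specification: the helper names are hypothetical, `def` is left out, and treating a nested `if`/`try` in last position as a value is one possible reading of the rules.

```python
import ast


def body_has_value(body):
    """A body has a value only if its last statement yields one."""
    if not body:
        return False  # e.g. the missing `else` of an `if`
    last = body[-1]
    if isinstance(last, ast.Expr):
        return True
    # Assumption: a nested `if`/`try` in last position counts as a value.
    return statement_has_value(last)


def statement_has_value(node):
    """Could this `if` or `try` statement be used as an expression?"""
    if isinstance(node, ast.If):
        # An `elif` is stored as a lone nested `If` inside `orelse`,
        # and a missing `else` is an empty `orelse`.
        return body_has_value(node.body) and body_has_value(node.orelse)
    if isinstance(node, ast.Try):
        if node.finalbody:
            return False  # a `try` with a `finally` has no value
        clauses = [node.body] + [h.body for h in node.handlers]
        if node.orelse:
            clauses.append(node.orelse)
        return all(body_has_value(clause) for clause in clauses)
    return False


def is_single_expression(body):
    return len(body) == 1 and isinstance(body[0], ast.Expr)


def is_inline(node):
    """True if every body consists of exactly one expression."""
    if isinstance(node, ast.If):
        rest = node.orelse
        if len(rest) == 1 and isinstance(rest[0], ast.If):
            rest_ok = is_inline(rest[0])  # follow the `elif` chain
        else:
            rest_ok = is_single_expression(rest)
        return is_single_expression(node.body) and rest_ok
    if isinstance(node, ast.Try):
        bodies = [node.body] + [h.body for h in node.handlers]
        if node.orelse:
            bodies.append(node.orelse)
        return not node.finalbody and all(map(is_single_expression, bodies))
    return False


def parse_statement(source):
    return ast.parse(source).body[0]


multi = parse_statement(
    "if c1:\n    z = foo(bar)\n    y1 + z\nelif c2:\n    y2\nelse:\n    y3\n"
)
simple = parse_statement("if C:\n    x\nelse:\n    y\n")
no_else = parse_statement("if C:\n    x\n")

assert statement_has_value(multi) and not is_inline(multi)
assert statement_has_value(simple) and is_inline(simple)
assert not statement_has_value(no_else)
```

Working on the statement form keeps the sketch runnable on current Python, since the proposed expression forms cannot be parsed today.
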
### Assignment functions

An assignment function is a function whose body is introduced by `:=` instead of `:` and which automatically returns the value of its body, unless it encounters an explicit `return` earlier on. If the body has no value, it's a syntax error.

Implicit returns like this are quite popular in other modern programming languages. I've seen someone forgetting to return from a function as they were still adjusting to Python after years of Ruby.

In this case I've copied the name and semantics from [Coconut](https://coconut.readthedocs.io/en/master/HELP.html), a language which extends Python with functional programming syntax. The `:=` could instead be `->`, `=>`, `::`, etc. Coconut just uses `=`. The point is just to distinguish from regular functions so that existing functions don't start returning new values.

Note that `def` as an expression is a distinct concept from assignment functions. It's the combination of them that makes for a decent alternative to `lambda`, but you could also just use a `def` with an explicit `return`.

### New style of comprehensions

A list/set/dict comprehension or generator expression is written as the appropriate brackets containing any number of statements including at least one `for` or `while` loop. In the general case the body looks like a generator function, and the elements (e.g. of the list) are the yielded values.

If the comprehension contains exactly one expression statement at any level of nesting, i.e. if there is only one place where a `yield` can be placed at the start of a statement, then `yield` is not required and the expression is implicitly yielded. In particular this means that any existing comprehension translated into the new style doesn't require `yield`. If the comprehension doesn't contain exactly one expression statement and doesn't contain a `yield`, it's a `SyntaxError`.

For dictionary comprehensions, a `key: value` pair is allowed as its own pseudo-statement or in a `yield`. It's not a real expression.
If needed, maybe we'll require surrounding the pair in parentheses or just use a 2-tuple instead.

New style comprehensions follow the same rules as CSEs regarding whitespace. Since a loop on its own is not an expression, comprehensions with nested loops must be spread across multiple lines.

## Benefits/comparison to current methods

### Uniform syntax

This replaces existing syntactical constructs that were each somewhat ad-hoc solutions with expressions that mimic existing syntax:

- The conditional expression `x if C else y` is replaced with the `if` statement as an expression.
- `lambda` is replaced by `def` as an expression, with optional syntactic sugar:
    - leaving out the name
    - implicitly returning with `:=`
- Comprehensions just look like normal loops in brackets, or generator functions.
- `pass` could just be replaced by `None`. Right now `pass` has no real use except to signal intent: writing `None` as a statement on its own currently looks weird and potentially indicates a misunderstanding that a linter should flag. With the new syntax it would become sensible and commonplace, e.g. as the body of an `except` for a default value when there's an exception.

In general the lack of additional syntactic constructs should make it easier for beginners to learn the language. For example a lambda involves a new dedicated keyword, lack of parentheses, an implicit return, and the restriction to a single expression. A `def` expression can be cut and pasted verbatim.

A particular concept that's easier to learn is comprehensions that contain multiple loops. Consider this comprehension over a nested list:

    [
        f(cell)
        for row in matrix
        for cell in row
    ]

It's easy for an experienced Python coder to write this, but for beginners it can easily be confusing.
Yes, there's a rule that they can learn, but putting it in reverse also seems logical, perhaps even more so:

    [
        f(cell)
        for cell in row
        for row in matrix
    ]

Now the comprehension is 'consistently backwards', it reads more like English, and the usage of `cell` is right next to its definition. But of course that order is wrong... unless we want a nested list comprehension that produces a new nested list:

    [
        [
            f(cell)
            for cell in row
        ]
        for row in matrix
    ]

Again, it's not hard for an experienced coder to understand this, but for a beginner grappling with new concepts this is not great. Now consider how the same two comprehensions would be written in the new syntax:

    [
        for row in matrix:
            for cell in row:
                f(cell)
    ]

    [
        for row in matrix:
            [
                for cell in row:
                    f(cell)
            ]
    ]

### No restriction to a single expression

The current constructs can only contain one expression in their 'bodies'. This restriction makes it difficult to solve certain problems elegantly and creates an uncomfortable grey area where it's hard to decide between squeezing maybe a bit too much into an expression or doing things 'manually'. This can lead to analysis paralysis and disagreements between coders and reviewers. For example, which of the following is the best?

    clean = [
        line.strip()
        for line in lines
        if line.strip()
    ]

    stripped = [line.strip() for line in lines]
    clean = [line for line in stripped if line]

    clean = list(filter(None, map(str.strip, lines)))

    clean = []
    for line in lines:
        line = line.strip()
        if line:
            clean.append(line)

    def clean_lines():
        for line in lines:
            line = line.strip()
            if line:
                yield line

    clean = list(clean_lines())

You probably have a favourite, but it's very subjective and this kind of problem requires judgement depending on the situation. For example, I'd choose the first version in this case, but a different version if I had to worry about duplicating something more complex or expensive than `.strip()`.
And again, there's an awkward sweet spot where it's hard to decide whether I care enough about the duplication.

Even more annoying is when I've already written a list comprehension but a new requirement forces me to change it to, say, the `.append` version. It's a tedious refactoring and leaves me with a completely unhelpful `git diff`.

What about assignment expressions? We could do this:

    clean = [
        stripped
        for line in lines
        if (stripped := line.strip())
    ]

Like the nested loops, this is tricky to parse without experience. The execution order can be confusing and the variable is used away from where it's defined. Even if you like it, there can be no doubt that it's controversial.

I think the fact that assignment expressions were a desired feature despite being so controversial is a symptom of this problem. It's the kind of thing that happens when we're stuck with the limitations of a single expression.

The solution with the new syntax is:

    clean = [
        for line in lines:
            stripped = line.strip()
            if stripped:
                stripped
    ]

or if you'd like to use an assignment expression:

    clean = [
        for line in lines:
            if stripped := line.strip():
                stripped
    ]

I think both of these look great and are easily better than any of the other options. And I think it would be the clear winner in any similar situation - no careful judgement needed. This would become the one (and only one) obvious way to do it. The new syntax has the elegance of list comprehensions and the flexibility of multiple statements. It's completely scalable and works equally well from the simplest comprehension to big complicated constructions. I can easily add logic as I please and get a nice simple diff.

### Support in previous PEPs

The syntax `if C: x else: y` was AFAICT the generally preferred syntax for conditional expressions in [PEP 308](https://www.python.org/dev/peps/pep-0308/) before Guido chose the current syntax instead.

[PEP 463 -- Exception-catching expressions](https://www.python.org/dev/peps/pep-0463/) proposed allowing expressions like:

    value = (lst[2] except IndexError: "No value")

This was rejected, but clearly there was some demand and I think it would have been a nice feature. Perhaps `try` could be made similarly optional in simple inline expressions.

## Compatibility

I believe this proposal has the following properties:

- Implementing it (without dropping any existing syntax) would have no effect on the behaviour of existing code.
- All of the new syntaxes and all existing syntax can happily live side by side.
- There is exactly one obvious way to migrate any existing lambdas, conditional expressions, and comprehensions into the new syntax. Therefore this could easily be done by an automated tool and the new code would look sensible, albeit maybe in need of some formatting.
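
To make the migration claim concrete, here is a minimal sketch of such a tool for list comprehensions only. It assumes Python 3.9+ (for `ast.unparse`) and ignores `async` comprehensions. The function name `migrate_list_comp` is hypothetical, and the output is plain text in the proposed syntax, which no current interpreter can run.

```python
import ast

INDENT = "    "


def migrate_list_comp(source):
    """Rewrite an existing list comprehension as a new-style one (as text)."""
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.ListComp):
        raise ValueError("expected a list comprehension")

    lines = []
    depth = 1
    for gen in node.generators:
        # Each `for` clause becomes a loop header, in the same order.
        target = ast.unparse(gen.target)
        iterable = ast.unparse(gen.iter)
        lines.append(f"{INDENT * depth}for {target} in {iterable}:")
        depth += 1
        # Each `if` clause becomes a nested `if` statement.
        for cond in gen.ifs:
            lines.append(f"{INDENT * depth}if {ast.unparse(cond)}:")
            depth += 1

    # The element is the only expression statement, so by the implicit-yield
    # rule no `yield` is needed.
    lines.append(INDENT * depth + ast.unparse(node.elt))
    return "[\n" + "\n".join(lines) + "\n]"


print(migrate_list_comp("[f(x) for y in z for x in y if g(x)]"))
```

Because the clauses of an existing comprehension already appear in loop-nesting order, the translation is a single pass over `node.generators`, which is what makes the migration mechanical.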