`make -j`, the command converts them into a set of enumerations and constants in C headers. This allows you to refer to them later on.

```text
stmt: simple_stmt | compound_stmt
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
# ...
pass_stmt: 'pass'
flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt
break_stmt: 'break'
continue_stmt: 'continue'
# ...
import_as_name: NAME ['as' NAME]
```
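As a small aside that is not in the original article: on CPython up to 3.9 these generated numbers are also visible from pure Python through the `symbol` module, which mirrors `Include/graminit.h`. A minimal sketch:

```python
# Not from the original article: a quick look at the generated grammar
# constants from pure Python. The `symbol` module mirrors Include/graminit.h;
# it is deprecated since Python 3.9 and removed in 3.10.
import symbol

print(symbol.sym_name[symbol.pass_stmt])   # 'pass_stmt'
print(symbol.sym_name[symbol.expr_stmt])   # 'expr_stmt'
print(len(symbol.sym_name), "grammar symbols in this build")
```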
`simple_stmt` is a simple statement; it may or may not contain semicolons (for example, when you enter `import pdb; pdb.set_trace()`), and it ends with a NEWLINE. `pass_stmt` is the `pass` keyword, and `break_stmt` interrupts a loop. Simple, isn't it?

```text
expr_stmt: testlist_star_expr (annassign | augassign (yield_expr|testlist) |
                     ('=' (yield_expr|testlist_star_expr))* | incr_stmt | decr_stmt)
annassign: ':' test ['=' test]
testlist_star_expr: (test|star_expr) (',' (test|star_expr))* [',']
augassign: ('+=' | '-=' | '*=' | '@=' | '/=' | '%=' | '&=' | '|=' | '^=' |
            '<<=' | '>>=' | '**=' | '//=')
# ...
del_stmt: 'del' exprlist
# ...
incr_stmt: '++'
decr_stmt: '--'
```
`incr_stmt` will be our increment statement and `decr_stmt` our decrement. Both follow a NAME (a variable name) and form a small standalone statement. When we build the Python project, it will generate the parser components for us (but not just yet). If you compile CPython and try the new statement now, all you get from the tokenizer is an error:

```text
Token <ERRORTOKEN>/'++' ... Illegal token
```
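As an aside not in the original article, you can see the stock tokenizer's view of `++` from pure Python with the `tokenize` module; on an unmodified interpreter the two plus signs are simply two separate OP tokens:

```python
# Not from the original article: a stock interpreter tokenizes "test++"
# as NAME 'test' followed by two separate '+' OP tokens; there is no
# single '++' token until we teach the tokenizer about it below.
import io
import tokenize

for tok in tokenize.generate_tokens(io.StringIO("test++\n").readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))
```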
The tokenizer lives in `tokenizer.c`. It has functions that read from a file (for example, `python file.py`) or from a string (for example, the REPL). It also handles the special encoding comment at the top of files and parses your source as UTF-8, etc. It handles nesting, the `async` and `yield` keywords, and detects assignments of sets and tuples, but only grammatically. It does not know what those things are or what to do with them; it only cares about the text. For example, the `o` notation for octal values lives in the tokenizer, while the code that actually creates octal values lives in the compiler.

The INCREMENT and DECREMENT tokens are the keys that the tokenizer returns for each part of the code.

```c
/* Token names */

const char *_PyParser_TokenNames[] = {
    "ENDMARKER",
    "NAME",
    "NUMBER",
    ...
    "INCREMENT",
    "DECREMENT",
    ...
```
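One of the tokenizer duties mentioned above, reading the encoding comment, is mirrored in the pure-Python `tokenize` module; a small sketch, not from the original article:

```python
# Not from the original article: the encoding-comment detection performed by
# tokenizer.c is also available as tokenize.detect_encoding in the stdlib.
import io
import tokenize

source = b"# -*- coding: latin-1 -*-\nx = 1\n"
encoding, consumed = tokenize.detect_encoding(io.BytesIO(source).readline)
print(encoding)   # 'iso-8859-1' (latin-1, normalized)
print(consumed)   # the line(s) read while detecting the encoding
```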
The tokenizer should return an INCREMENT or DECREMENT token each time it sees `++` or `--`. There is already a function for two-character operators, so we extend it for our case:

```diff
@@ -1175,11 +1177,13 @@ PyToken_TwoChars(int c1, int c2)
         break;
     case '+':
         switch (c2) {
+        case '+':               return INCREMENT;
         case '=':               return PLUSEQUAL;
         }
         break;
     case '-':
         switch (c2) {
+        case '-':               return DECREMENT;
         case '=':               return MINEQUAL;
         case '>':               return RARROW;
         }
```
The new tokens are also defined in `token.h`:

```c
#define INCREMENT 58
#define DECREMENT 59
```
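A side note not in the original article: the stock interpreter exposes this same numbering through the `token` module, which is handy for checking which numbers are taken; a minimal sketch:

```python
# Not from the original article: the numbers from token.h are mirrored in the
# stdlib `token` module of the interpreter you are running. A stock build has
# no INCREMENT/DECREMENT; after rebuilding with the patch above they would
# appear here too (the exact numbers depend on the CPython version).
import token

for num in sorted(token.tok_name):
    print(num, token.tok_name[num])
```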
Now it's a token we know!
For the AST there is `ast.py` and `ast.c`; `ast.c` is the file we need to change. The AST code is broken into methods that handle the token types: `ast_for_stmt` handles statements, `ast_for_expr` handles expressions. We add `incr_stmt` and `decr_stmt` as possible expression statements. They are almost identical to an augmented assignment, for example `test += 1`, except there is no right-hand expression (the 1); it is implied.

```c
static stmt_ty
ast_for_expr_stmt(struct compiling *c, const node *n)
{
    ...
    else if ((TYPE(CHILD(n, 1)) == incr_stmt) ||
             (TYPE(CHILD(n, 1)) == decr_stmt)) {
        expr_ty expr1, expr2;
        node *ch = CHILD(n, 0);
        operator_ty operator;

        switch (TYPE(CHILD(n, 1))) {
            case incr_stmt:
                operator = Add;
                break;
            case decr_stmt:
                operator = Sub;
                break;
        }

        expr1 = ast_for_testlist(c, ch);
        if (!expr1) {
            return NULL;
        }
        switch (expr1->kind) {
            case Name_kind:
                if (forbidden_name(c, expr1->v.Name.id, n, 0)) {
                    return NULL;
                }
                expr1->v.Name.ctx = Store;
                break;
            default:
                ast_error(c, ch, "illegal target for increment/decrement");
                return NULL;
        }
        // Build a PyObject for the constant 1 and keep it alive in the arena
        PyObject *pynum = parsenumber(c, "1");

        if (PyArena_AddPyObject(c->c_arena, pynum) < 0) {
            Py_DECREF(pynum);
            return NULL;
        }
        // The implied right-hand side of ++/-- is the literal 1
        expr2 = Num(pynum, LINENO(n), n->n_col_offset, c->c_arena);

        return AugAssign(expr1, operator, expr2, LINENO(n),
                         n->n_col_offset, c->c_arena);
    }
    ...
```
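To see what kind of node this code builds, you can look at the spelled-out equivalent on a stock interpreter (a sketch, not from the original article); `test++` is turned into exactly this sort of AugAssign node:

```python
# Not from the original article: on an unmodified interpreter, the augmented
# assignment `test += 1` produces the same AugAssign node that the patched
# ast_for_expr_stmt builds for `test++`.
import ast

tree = ast.parse("test = 1\ntest += 1")
print(ast.dump(tree.body[1]))
# e.g. AugAssign(target=Name(id='test', ctx=Store()), op=Add(), value=Constant(value=1))
# (older versions print Num(n=1) instead of Constant(value=1))
```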
That is all the AST needs to do for `incr_stmt` or `decr_stmt`. Returning to the Python REPL after recompiling, we can see our new statement! Run `ast.parse("test=1; test++").body[1]` and you will see that the type returned is AugAssign.
The AST has just converted our statement into an expression that the compiler can handle. The `AugAssign` constructor sets the `kind` field, which is used by the compiler's `compiler_visit_stmt`. That is just a big switch statement that determines the type of the statement. Ours is of the `AugAssign` kind, so it hands the details off to `compiler_augassign`. This function then converts our statement into a set of bytecodes, the intermediate language between machine code (01010101) and the syntax tree. The bytecode sequence is what gets cached in `.pyc` files.

```c
static int
compiler_augassign(struct compiler *c, stmt_ty s)
{
    expr_ty e = s->v.AugAssign.target;
    expr_ty auge;

    assert(s->kind == AugAssign_kind);

    switch (e->kind) {
    ...
    case Name_kind:
        if (!compiler_nameop(c, e->v.Name.id, Load))
            return 0;
        VISIT(c, expr, s->v.AugAssign.value);
        ADDOP(c, inplace_binop(c, s->v.AugAssign.op));
        return compiler_nameop(c, e->v.Name.id, Store);
```
Using the `dis` module, you can see the bytecode:
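The original screenshot is not reproduced here; as a stand-in (my sketch, not from the original article), here is how to disassemble the equivalent `test += 1` on a stock interpreter, which is what `test++` compiles down to after the patch:

```python
# Not from the original article: since test++ only exists in the patched
# build, we disassemble the equivalent augmented assignment on a stock
# interpreter; the patched statement produces the same bytecode.
import dis

dis.dis("test = 1\ntest += 1")
# Expect LOAD_NAME / LOAD_CONST / INPLACE_ADD / STORE_NAME for the second
# line (opcode names vary by version; newer interpreters use BINARY_OP).
```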
Source: https://habr.com/ru/post/345526/