
norskeld/cyan


cyan


C-to-Assembly (x86-64) compiler for a basic subset of C.

Why

Simply to learn more about compilers, assembly, and how not to design languages. :)

Features

Anything missing from the list below is not planned to be implemented.

  • Operators:
    • Unary:
      • Prefix (--, ++, !, ~, -)
      • Postfix (--, ++)
    • Binary
      • Arithmetic (+, -, *, /, %)
      • Bitwise (&, |, ^, <<, >>)
    • Logical (!, &&, ||)
    • Relational (<, <=, >, >=, ==, !=)
  • Local variables:
    • Declaration
    • Assignments
    • Compound assignments (+=, -=, etc.)
    • Scopes
  • Storage-class specifiers:
    • static
    • extern
    • typedef
  • Conditionals and control flow:
    • If statements
    • Ternary expressions
    • Labeled statements
    • Switch statements
    • goto statements
    • break and continue
  • Loops:
    • For loops
    • While loops
    • Do-while loops
  • Functions:
    • Function declarations
    • Function definitions
    • Function calls
  • Types:
    • void
    • int
    • long
    • unsigned int
    • unsigned long
    • double
    • char
    • signed char
    • unsigned char
    • Structs
    • Unions
    • Pointers
    • Pointer arithmetic
    • Arrays
  • Memory management:
    • sizeof operator
    • malloc
    • calloc
    • realloc
    • aligned_alloc
    • free

Optimizations:

  • Constant folding
  • Dead code elimination
  • Dead store elimination
  • Copy propagation
  • Register allocation
  • Register coalescing
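To make the first item on this list concrete, here is a minimal sketch of constant folding over three-address code. The tuple-based instruction shapes, the operator subset, and all function names are hypothetical and chosen for illustration; this is not cyan's actual IR or pass. It shows two C-specific details a folder must respect: integer division truncates toward zero, and division by zero must not be evaluated at compile time.

```python
# Hypothetical TAC shapes (illustration only, not cyan's actual IR):
#   ("binary", op, lhs, rhs, dst)   ("unary", op, src, dst)
#   ("copy", src, dst)              ("return", val)
# Operands are ints (constants) or strs (variables and temporaries).

def c_div(a: int, b: int) -> int:
    """C integer division truncates toward zero; Python's // floors."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

def c_rem(a: int, b: int) -> int:
    """C defines a % b so that (a / b) * b + a % b == a."""
    return a - c_div(a, b) * b

BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": c_div,
    "%": c_rem,
    "<": lambda a, b: int(a < b),
    "==": lambda a, b: int(a == b),
}
UNARY = {"-": lambda a: -a, "~": lambda a: ~a, "!": lambda a: int(a == 0)}

def fold_constants(instrs):
    """Replace operations whose operands are all constants with a copy of the result.

    Simplification: results are not wrapped to the 32-bit int range.
    """
    out = []
    for ins in instrs:
        if ins[0] == "binary":
            _, op, lhs, rhs, dst = ins
            if (isinstance(lhs, int) and isinstance(rhs, int) and op in BINARY
                    # Division by zero is undefined behavior; leave it for run time.
                    and not (op in ("/", "%") and rhs == 0)):
                out.append(("copy", BINARY[op](lhs, rhs), dst))
                continue
        elif ins[0] == "unary":
            _, op, src, dst = ins
            if isinstance(src, int) and op in UNARY:
                out.append(("copy", UNARY[op](src), dst))
                continue
        out.append(ins)
    return out
```

Folding `tmp.0 = 2 * 3` yields `tmp.0 = 6`, but a later `tmp.1 = 1 + tmp.0` still names a variable. Copy propagation is what substitutes the constant, so passes like these are typically run repeatedly until the code stops changing.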

Grammar

Defined using EBNF-like notation.

Definition
<program>     = <function>
<function>    = "int" <identifier> "(" "void" ")" <block>
<block>       = "{" { <block-item> } "}"
<block-item>  = <declaration> | <statement>
<declaration> = "int" <identifier> [ "=" <expression> ] ";"
<statement>   = "return" <expression> ";"
              | <expression> ";"
              | <identifier> ":" <statement>
              | "if" "(" <expression> ")" <statement> [ "else" <statement> ]
              | "break" ";"
              | "continue" ";"
              | "switch" "(" <expression> ")" <statement>
              | "while" "(" <expression> ")" <statement>
              | "do" <statement> "while" "(" <expression> ")" ";"
              | "for" "(" <initializer> [ <expression> ] ";" [ <expression> ] ";" [ <expression> ] ")" <statement>
              | "goto" <identifier> ";"
              | <block>
              | ";"
<initializer> = <declaration> | [ <expression> ] ";"
<expression>  = <factor>
              | <expression> <binary-op> <expression>
              | <expression> "?" <expression> ":" <expression>
<factor>      = <unary-op> <factor> | <postfix>
<postfix>     = <primary> { <postfix-op> }
<primary>     = <int> | <identifier> | "(" <expression> ")"
<unary-op>    = "-" | "~" | "!" | "++" | "--"
<postfix-op>  = "++" | "--"
<binary-op>   = "+" | "-" | "*" | "/" | "%"
              | "<<" | ">>" | "&" | "|" | "^"
              | "&&" | "||" | "==" | "!=" | "<" | "<=" | ">" | ">="
              | "=" | "+=" | "-=" | "*=" | "/=" | "%=" | "&=" | "|=" | "^=" | "<<=" | ">>="

<identifier>  = ? An identifier token ?
<int>         = ? A constant token ?
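The `<expression>` rule above is ambiguous on its own: it does not say whether `1 - 2 - 3` groups to the left or how `+` and `*` bind relative to each other. A common way to resolve this is precedence climbing. The sketch below assumes tokens are already lexed into a list of strings and ints and builds a hypothetical tuple AST. The numeric levels are arbitrary; only their relative order matters, and that order follows C. It is an illustration, not cyan's parser.

```python
# Hypothetical precedence levels; only their relative order (C's) matters.
PREC = {
    "*": 50, "/": 50, "%": 50,
    "+": 45, "-": 45,
    "<<": 40, ">>": 40,
    "<": 35, "<=": 35, ">": 35, ">=": 35,
    "==": 30, "!=": 30,
    "&": 25, "^": 20, "|": 15,
    "&&": 10, "||": 5,
    "?": 3,
    "=": 1, "+=": 1, "-=": 1, "*=": 1, "/=": 1, "%=": 1,
    "&=": 1, "|=": 1, "^=": 1, "<<=": 1, ">>=": 1,
}
RIGHT_ASSOC_MAX = 3  # "?" and every assignment operator associate to the right

class Parser:
    def __init__(self, tokens):
        self.toks, self.pos = tokens, 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def factor(self):
        """<factor> = <unary-op> <factor> | <postfix>"""
        tok = self.take()
        if tok in ("-", "~", "!", "++", "--"):
            return ("unary", tok, self.factor())
        if tok == "(":
            expr = self.expression(0)
            self.take(")")
        elif isinstance(tok, int):
            expr = ("const", tok)
        elif isinstance(tok, str) and tok.isidentifier():
            expr = ("var", tok)
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
        # <postfix> = <primary> { <postfix-op> }
        while self.peek() in ("++", "--"):
            expr = ("postfix", self.take(), expr)
        return expr

    def expression(self, min_prec):
        """Parse operators that bind at least as tightly as min_prec."""
        left = self.factor()
        while self.peek() in PREC and PREC[self.peek()] >= min_prec:
            op = self.take()
            prec = PREC[op]
            # Left-associative operators require a strictly tighter right operand.
            next_min = prec if prec <= RIGHT_ASSOC_MAX else prec + 1
            if op == "?":
                middle = self.expression(0)  # anything may appear between ? and :
                self.take(":")
                left = ("cond", left, middle, self.expression(next_min))
            else:
                left = ("binary", op, left, self.expression(next_min))
        return left
```

Parsing `[1, "+", 2, "*", 3]` with `Parser(tokens).expression(0)` nests the multiplication under the addition. Note that the parser accepts `1 = 2`; rejecting invalid assignment targets is left to semantic analysis.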

Trees and IRs

AST

The AST represents the program's syntax tree and is used to perform semantic analysis.
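One typical semantic-analysis pass over an AST is identifier resolution: give every local variable a unique name so later stages can ignore scoping, reject uses of undeclared variables, reject duplicate declarations in the same scope, and reject assignments to non-variables. The sketch below assumes a hypothetical tuple AST and covers only a subset of checks; it is not cyan's implementation.

```python
import itertools

# Hypothetical AST shapes (illustration only):
#   expressions: ("const", n) ("var", name) ("unary", op, e) ("binary", op, lhs, rhs)
#   statements:  ("decl", name, init_or_None) ("return", e) ("expr", e) ("block", [items])
ASSIGN_OPS = {"=", "+=", "-=", "*=", "/=", "%=", "&=", "|=", "^=", "<<=", ">>="}
_counter = itertools.count()

def resolve_expr(expr, scopes):
    kind = expr[0]
    if kind == "const":
        return expr
    if kind == "var":
        for scope in reversed(scopes):  # innermost declaration wins (shadowing)
            if expr[1] in scope:
                return ("var", scope[expr[1]])
        raise NameError(f"undeclared variable {expr[1]!r}")
    if kind == "unary":
        return ("unary", expr[1], resolve_expr(expr[2], scopes))
    if kind == "binary":
        if expr[1] in ASSIGN_OPS and expr[2][0] != "var":
            raise SyntaxError("invalid assignment target")
        return ("binary", expr[1], resolve_expr(expr[2], scopes), resolve_expr(expr[3], scopes))
    raise ValueError(f"unknown expression {kind!r}")

def resolve_item(item, scopes):
    kind = item[0]
    if kind == "decl":
        _, name, init = item
        if name in scopes[-1]:
            raise NameError(f"duplicate declaration of {name!r}")
        unique = f"{name}.{next(_counter)}"
        scopes[-1][name] = unique
        # As in C, the new name is already in scope inside its own initializer.
        return ("decl", unique, None if init is None else resolve_expr(init, scopes))
    if kind in ("return", "expr"):
        return (kind, resolve_expr(item[1], scopes))
    if kind == "block":
        scopes.append({})  # each block opens a new scope
        try:
            return ("block", [resolve_item(i, scopes) for i in item[1]])
        finally:
            scopes.pop()
    raise ValueError(f"unknown statement {kind!r}")
```

A function body would be resolved with `resolve_item(body, [])`; an inner `int a` then gets a different unique name from an outer `int a`, and later stages never need to track scopes.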

Three Address Code (TAC)

This IR sits between the AST and the assembly code. It lets us handle structural transformations separately from the details of assembly language (to be done), and it is well suited for applying compile-time optimizations (also to be done).
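As an illustration of what lowering to TAC involves, here is a minimal sketch that flattens nested expressions into instructions with at most one operator each, naming intermediate results with fresh temporaries. It also shows how `&&` turns into conditional jumps, which is a structural transformation that is awkward to express on the AST. The tuple shapes, the `tmp.N` naming, and the `TacGen` class are hypothetical, not cyan's code.

```python
class TacGen:
    """Lowers hypothetical tuple ASTs such as ("binary", op, lhs, rhs) to TAC tuples."""

    def __init__(self):
        self.instrs = []
        self.n = 0

    def fresh(self, prefix="tmp"):
        self.n += 1
        return f"{prefix}.{self.n}"

    def emit_expr(self, expr):
        """Append instructions computing `expr`; return the operand holding its value."""
        kind = expr[0]
        if kind in ("const", "var"):
            return expr[1]  # already a plain operand, no instruction needed
        if kind == "unary":
            src = self.emit_expr(expr[2])
            dst = self.fresh()
            self.instrs.append(("unary", expr[1], src, dst))
            return dst
        if kind == "binary" and expr[1] == "&&":
            # Short-circuit: the right operand is evaluated only if the left is nonzero.
            false_label, end_label = self.fresh("and_false"), self.fresh("and_end")
            dst = self.fresh()
            lhs = self.emit_expr(expr[2])
            self.instrs.append(("jump_if_zero", lhs, false_label))
            rhs = self.emit_expr(expr[3])
            self.instrs.append(("jump_if_zero", rhs, false_label))
            self.instrs += [("copy", 1, dst), ("jump", end_label),
                            ("label", false_label), ("copy", 0, dst),
                            ("label", end_label)]
            return dst
        if kind == "binary":
            lhs = self.emit_expr(expr[2])  # operands are evaluated left to right
            rhs = self.emit_expr(expr[3])
            dst = self.fresh()
            self.instrs.append(("binary", expr[1], lhs, rhs, dst))
            return dst
        raise ValueError(f"unknown expression {kind!r}")

    def emit_return(self, expr):
        self.instrs.append(("return", self.emit_expr(expr)))
        return self.instrs
```

For `return 1 + 2 * 3;` this produces `tmp.1 = 2 * 3`, `tmp.2 = 1 + tmp.1`, `return tmp.2`.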

Assembly AST (AAST)

This IR is very low-level and is used to emit assembly code.
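A minimal sketch of the last two steps: lowering TAC into assembly instructions and printing them in AT&T syntax. To stay short, it assumes a hypothetical tuple TAC, gives every variable its own 4-byte stack slot instead of doing register allocation, and uses Linux/ELF symbol naming (macOS would prefix `_`). It shows why an extra fix-up pass is useful: x86-64 forbids two memory operands in one instruction, and `imul` cannot write to memory, so such instructions are rewritten through the scratch registers `%r10d` and `%r11d`.

```python
def lower_function(name, tac):
    """Map hypothetical TAC tuples to (mnemonic, operands...) assembly tuples."""
    slots = {}

    def operand(v):
        if isinstance(v, int):
            return ("imm", v)
        if v not in slots:
            slots[v] = -4 * (len(slots) + 1)  # next free 4-byte slot below %rbp
        return ("stack", slots[v])

    body = []
    for ins in tac:
        if ins[0] == "copy":
            body.append(("movl", operand(ins[1]), operand(ins[2])))
        elif ins[0] == "binary" and ins[1] in ("+", "-", "*"):
            _, op, lhs, rhs, dst = ins
            mnemonic = {"+": "addl", "-": "subl", "*": "imull"}[op]
            body.append(("movl", operand(lhs), operand(dst)))    # dst = lhs
            body.append((mnemonic, operand(rhs), operand(dst)))  # dst = dst op rhs
        elif ins[0] == "return":
            body.append(("movl", operand(ins[1]), ("reg", "eax")))
            body.append(("ret",))
        else:
            raise NotImplementedError(ins[0])
    frame = (4 * len(slots) + 15) // 16 * 16  # keep %rsp 16-byte aligned
    return name, fix_up(body), frame

def fix_up(body):
    fixed = []
    for ins in body:
        if ins[0] == "imull" and ins[2][0] == "stack":
            # imul cannot write to memory: compute in %r11d and store back.
            fixed += [("movl", ins[2], ("reg", "r11d")),
                      ("imull", ins[1], ("reg", "r11d")),
                      ("movl", ("reg", "r11d"), ins[2])]
        elif len(ins) == 3 and ins[1][0] == "stack" and ins[2][0] == "stack":
            # Only one memory operand is allowed: route the source through %r10d.
            fixed += [("movl", ins[1], ("reg", "r10d")), (ins[0], ("reg", "r10d"), ins[2])]
        else:
            fixed.append(ins)
    return fixed

def fmt(op):
    kind, val = op
    if kind == "imm":
        return f"${val}"
    if kind == "reg":
        return f"%{val}"
    return f"{val}(%rbp)"

def emit(name, body, frame):
    lines = [f"\t.globl {name}", f"{name}:", "\tpushq %rbp", "\tmovq %rsp, %rbp"]
    if frame:
        lines.append(f"\tsubq ${frame}, %rsp")
    for ins in body:
        if ins[0] == "ret":
            # Epilogue: restore the caller's frame before returning.
            lines += ["\tmovq %rbp, %rsp", "\tpopq %rbp", "\tret"]
        else:
            lines.append(f"\t{ins[0]} " + ", ".join(fmt(o) for o in ins[1:]))
    return "\n".join(lines) + "\n"
```

Calling `emit(*lower_function("main", tac))` returns the text of a complete function. A real backend would replace the one-slot-per-variable scheme with register allocation.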


License

MIT.
