
11 posts tagged with "blogging"


· One min read
Abhinn Pandey

Finishing up

All the features for v1 are now complete!

The features for v1 are done. This includes:

  • Scanner
  • Parser
  • VM
  • Strings
  • Variables
  • Integers
  • Arithmetic Operations
  • Closures
  • Array Data Structure
  • HashMap Data Structure
  • BuiltIn Functions
  • Boolean
  • Conditional Logic
  • Loops

Now it is time for the hard part:

BUGS b_gs bbuugggggs BUUUUggsss fIxInG

· One min read
Abhinn Pandey

Exams just got over, and programming just got started

Basics Are Finished

The basic implementations are done. This includes:

  • Scanner
  • Parser
  • VM
  • Strings
  • Variables
  • Integers
  • Arithmetic Operations

So, what's next?

Now the plan is to add more things, such as:

  • [ ] Closures
  • [ ] Array Data Structure
  • [ ] HashMap Data Structure
  • [ ] BuiltIn Functions
  • [ ] Boolean
  • [ ] Conditional Logic
  • [ ] Loops

· One min read
Abhinn Pandey

NEVER GONNA GIVE UP NEVER GONNA GIVE YOU UP

I got distracted by school work and exams, and will continue to be distracted because of the mid-term examinations, but I will keep working on the Pig. OINK!

I COMPLETED THE VM

I don't know exactly how it works or why it works, but it works. I think it works because of the match statements and the assert_eq! macro.

WILL CONTINUE DEVLOGS AFTER THE EXAMS

· One min read
Abhinn Pandey

" Magicians protect their secrets not because the secrets are large and important, but because they are so small and trivial. The wonderful effects created on stage are often the result of a secret so absurd that the magician would be embarrassed to admit that that was how it was done. "

-Christopher Priest, The Prestige

The idea of representing a program as a series of bytecode instructions has been discussed extensively, but that is like studying biology using only stuffed, lifeless animals. We know in theory what the instructions do, but we have never actually seen them in action, so it is hard to fully grasp what they accomplish. Without a solid feel for the behavior of bytecode, it would be difficult to build a compiler that produces it.

Virtual Machine

A virtual machine is a software machine that executes bytecode on top of the bare metal. Think of it as a portable machine we can carry around and run anywhere.

· 5 min read
Abhinn Pandey

" If you find that you’re spending almost all your time on theory, start turning some attention to practical things; it will improve your theories. If you find that you’re spending almost all your time on practice, start turning some attention to theoretical things; it will improve your practice. "

-Donald Knuth

Why not Walk the AST ?

The inefficiency arises from the way memory is utilized. Each bit of code syntax is transformed into an AST node, resulting in a multitude of objects linked together with numerous pointers. Even a simple expression like '1 + 2' generates a cluster of objects with interconnections.

These pointers add extra overhead of 32 or 64 bits to each object, dispersing data across the heap in a disjointed structure, which adversely impacts spatial locality.
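As an illustration, here is a minimal, hypothetical Rust sketch of this layout; the `Expr` type is invented for this example (it is not Pig's actual AST), but it shows how even `1 + 2` becomes several separately heap-allocated objects linked by pointers:

```rust
// Hypothetical sketch: each node is its own heap allocation, and the
// Box values are the pointers that scatter the tree across the heap.
#[derive(Debug)]
enum Expr {
    Int(i64),
    Add(Box<Expr>, Box<Expr>),
}

fn main() {
    // `1 + 2`: one Add node holding two pointers to two Int nodes,
    // i.e. three separate allocations for a two-byte expression.
    let tree = Expr::Add(Box::new(Expr::Int(1)), Box::new(Expr::Int(2)));
    println!("{:?}", tree); // prints: Add(Int(1), Int(2))
}
```

Walking this structure means chasing the two `Box` pointers, each of which may land on a different cache line.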

Modern CPUs operate far more rapidly than their capability to retrieve data from RAM. To address this, processors incorporate multiple tiers of caching. When data is present in the cache, retrieval occurs significantly faster—sometimes up to a hundred times faster.

The mechanism that fills this cache involves speculative inclusion by the machine. Essentially, when the CPU reads data from RAM, it pulls in a set of adjacent bytes, storing them in the cache. If subsequent data requests align with these cached lines, the CPU operates efficiently like a well-coordinated assembly line.

To optimize cache utilization, the way code is stored in memory should mimic a dense and orderly structure, facilitating sequential reading.

However, the tree-like structure of the AST doesn't conform to this efficiency requirement. The dispersed nature of sub-objects within the tree causes inefficiencies. Traversing the tree often leads the CPU to step outside the cache bounds, necessitating a stall until new data is fetched from RAM. The overhead generated by these tree nodes, with their numerous pointers and object headers, further distances objects from each other, hindering cache performance.

Why not compile to native code?

To achieve maximum speed, it's essential to eliminate layers of indirection. This involves reaching down to the hardware level—machine code. The concept itself implies rapidity and directness.

The fastest programming languages compile directly into the native instruction set supported by the chip. This method has remained the most efficient since the early days when programmers manually crafted programs in machine code.

Writing machine code, often done on punched cards back in the day, was an intricate process. Native code is a concise sequence of operations represented in binary. Each instruction is typically between one and a few bytes in length and operates at an incredibly low level, involving commands like moving values between memory addresses and performing arithmetic operations on registers.

The CPU executes these instructions sequentially, without any tree structure as seen in our AST. Control flow is managed by direct jumps from one code location to another, devoid of indirection, overhead, or unnecessary pointer manipulation.

However, this lightning-fast performance comes with a price. Compiling code to native instructions is an arduous task. Most widely used chips have complex architectures with a multitude of instructions accumulated over decades, requiring sophisticated techniques for register allocation, pipelining, and instruction scheduling.

Moreover, opting for native code sacrifices portability. Mastery of one architecture merely grants access to one of several prevalent instruction sets. To support multiple architectures, one must delve into and write separate back ends for each unique instruction set—a time-consuming and challenging endeavor.

What is Bytecode?

Consider two ends of a spectrum: a tree-walk interpreter, which is straightforward, adaptable, and sluggish, versus native code, which is intricate, tied to specific platforms, and swift. Bytecode resides in the middle ground, preserving the portability of a tree-walker while sacrificing some simplicity for a performance boost, though not matching the speed of native code.

Structurally, bytecode mirrors machine code. It consists of a compact, sequential arrangement of binary instructions, reducing overhead and aligning favorably with cache behavior. However, it operates as a much simpler, higher-level instruction set compared to real-world chip architectures. (In many bytecode formats, each instruction is a single byte in length, hence the term "bytecode.")

If you were tasked with designing the simplest architecture for a native compiler from a source language, bytecode embodies that notion. It represents an idealized, conceptual instruction set that facilitates the compiler-writing process.

The challenge with a conceptual architecture is its non-existence in reality. To address this, an emulator comes into play—an emulated chip designed in software to interpret the bytecode instruction by instruction. This emulation creates a virtual machine (VM) framework.

The overhead introduced by this emulation layer is a primary reason why bytecode execution is slower than native code. However, in return, it bestows upon us the gift of portability. By crafting our VM in a language like C, which is already supported across the machines we're interested in, we gain the ability to execute our emulator on a variety of hardware platforms.
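To make the emulation idea concrete, here is a minimal, hypothetical sketch of such an emulated chip in Rust: a dense byte array of instructions and a match-based dispatch loop. The opcode names and encoding are invented for illustration and are not Pig's actual instruction set:

```rust
// Hypothetical single-byte opcodes for a tiny stack-based VM.
const OP_CONST: u8 = 0; // next byte is a constant to push
const OP_ADD: u8 = 1;   // pop two values, push their sum

// Interpret the bytecode instruction by instruction.
fn run(code: &[u8]) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    let mut ip = 0; // instruction pointer
    while ip < code.len() {
        match code[ip] {
            OP_CONST => {
                stack.push(code[ip + 1] as i64);
                ip += 2;
            }
            OP_ADD => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);
                ip += 1;
            }
            _ => panic!("unknown opcode"),
        }
    }
    stack.pop().unwrap()
}

fn main() {
    // Bytecode for `1 + 2`: push 1, push 2, add.
    assert_eq!(run(&[OP_CONST, 1, OP_CONST, 2, OP_ADD]), 3);
}
```

The flat `&[u8]` is the cache-friendly instruction stream, and the `match` is the emulation overhead that a real chip would not pay.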

· 2 min read
Abhinn Pandey

In the realm of code representation, our primary goal is to devise a format that's easily generated by the parser and effortlessly consumed by the interpreter. When deciphering an arithmetic expression like "1 + 2 * 3 - 4," your brain employs the order of operations (PEMDAS - Parentheses, Exponents, Multiplication and Division, Addition and Subtraction) to evaluate it.

Mentally, you organize this expression in a tree structure. In this tree representation, the leaf nodes contain numbers, while interior nodes represent operators with branches corresponding to their operands.

To evaluate an arithmetic expression from this tree, you start from the leaves, computing the values of their subtrees and moving upwards, performing a post-order traversal. The process evaluates the tree from the bottom up:

A. Starting with the entire tree, evaluate the bottom-most operation, 2 * 3.

B. Then, compute the + operation.

C. Proceed with the - operation.

D. Finally, arrive at the final answer.

(Figure: AST_TREE, the expression tree for 1 + 2 * 3 - 4)

Given an arithmetic expression, drawing this type of tree structure is straightforward. Similarly, when provided with such a tree, evaluating it becomes effortless. Hence, it seems intuitive that a workable representation of our code could be a tree mirroring the grammatical structure—specifically, the nesting of operators in the language.
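The post-order evaluation described above can be sketched in a few lines of Rust; the `Expr` type and `eval` function here are hypothetical stand-ins for illustration, not Pig's real AST:

```rust
// Hypothetical expression tree for arithmetic.
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
}

// Post-order traversal: evaluate both children first, then apply the operator.
fn eval(e: &Expr) -> i64 {
    match e {
        Expr::Num(n) => *n,
        Expr::Add(l, r) => eval(l) + eval(r),
        Expr::Mul(l, r) => eval(l) * eval(r),
        Expr::Sub(l, r) => eval(l) - eval(r),
    }
}

fn main() {
    use Expr::*;
    // The tree for `1 + 2 * 3 - 4`, i.e. (1 + (2 * 3)) - 4.
    let tree = Sub(
        Box::new(Add(
            Box::new(Num(1)),
            Box::new(Mul(Box::new(Num(2)), Box::new(Num(3)))),
        )),
        Box::new(Num(4)),
    );
    assert_eq!(eval(&tree), 3); // 2*3 = 6, 1+6 = 7, 7-4 = 3
}
```

Because precedence is encoded in the tree's nesting, the evaluator itself never needs to think about PEMDAS.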

However, in order to define this grammar precisely, we delve into the realm of formal grammars. Just as there are lexical grammars for individual characters and lexemes, we now shift our focus to syntactic grammars, operating at a higher level of granularity. Here, each "letter" in the alphabet corresponds to an entire token, and a "string" signifies a sequence of tokens—essentially, an entire expression.

Let's clarify this transition using a table:

| Terminology    | Lexical Grammar | Syntactic Grammar |
| -------------- | --------------- | ----------------- |
| Alphabet is    | Characters      | Tokens            |
| A string is    | Token or lexeme | Expression        |
| Implemented by | Scanner         | Parser            |

This shift in focus from lexical to syntactic grammar helps us define the rules and structure of the expressions within the language more precisely, laying the groundwork for the interpreter.

· One min read
Abhinn Pandey

Fixing Bugs is like being the Culprit and the Detective in a Thriller Movie

It's pretty hard to detect bugs, and also to not add any more while fixing them. So let's write some tests to find them and to maintain the quality of the code.

Tests are basically an INPUT of an expression and the expected OUTPUT of that expression. For example:

  • INPUT: add(1,2)
  • EXPECTED OUTPUT: 3
  • OUTPUT: 5
  • TEST RESULT: FAIL

In Rust, we can use the assert_eq!() macro to check these tests.
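As a minimal sketch of such a test in Rust, assuming a plain `add` function standing in for the interpreter's built-in (the real test would run the Pig pipeline on the input string, which is not shown here):

```rust
// Hypothetical stand-in for the interpreter's built-in `add`.
fn add(a: i64, b: i64) -> i64 {
    a + b
}

fn main() {
    // INPUT: add(1, 2), EXPECTED OUTPUT: 3.
    // assert_eq! passes silently when both sides match...
    assert_eq!(add(1, 2), 3);
    // ...and panics with a `left != right` message on a mismatch,
    // which is exactly how a failing test like the example above
    // (expected 3, got 5) would be caught.
    println!("all tests passed");
}
```

In a real project these assertions would live in `#[test]` functions and run under `cargo test`.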

Error Handling!!

The tools our language provides for dealing with errors make up a large portion of its user interface. When the user’s code is working, they aren’t thinking about our language at all—their headspace is all about their program. It’s usually only when things go wrong that they notice our implementation.

When that happens, it’s up to us to give the user all the information they need to understand what went wrong and guide them gently back to where they are trying to go.

· 4 min read
Abhinn Pandey

"Take big bites. Anything worth doing is worth overdoing."

- Robert A. Heinlein, Time Enough for Love

The commencement of any compiler or interpreter involves scanning, a process where raw source code, depicted as a string of characters, undergoes segmentation into meaningful segments termed as tokens. These tokens represent the essential building blocks that define the structure and grammar of the language.

Though "scanning" and "lexing" (short for "lexical analysis") are used interchangeably today, the terms historically meant different things. In the earlier computing era, "scanner" referred specifically to the code that read raw source code characters from disk and buffered them in memory, while "lexing" was the subsequent phase that did something useful with those characters.

Initiating with scanning serves as a favorable entry point for us. The task isn't overly complex; essentially, it involves a match statement with ambitious undertones, serving as a precursor to more intricate elements later on. By chapter's end, our aim is to construct a comprehensive and efficient scanner capable of transforming any Pig source code string into tokens, subsequently utilized in parsing.

Tokens

Tokens represent the primary components in the language's syntax, and our initial focus will be on their implementation. They encompass diverse elements, such as identifiers, literals (like integers, floats, strings), and various operators, meticulously categorized within the Token enum structure.

#[derive(Debug, Clone)]
pub enum Token {
    Illegal,
    Eof,

    // Identifiers + literals
    Ident(String),  // add, foobar, x, y, ...
    Int(String),    // 123456
    Float(String),  // 123.456
    String(String), // "hello"

    // Operators
    Assign,
    Plus,
    Minus,
    Bang,
    Asterisk,
    Slash,

    Lt,
    Gt,

    Eq,
    NotEq,

    // Delimiters
    Comma,
    Colon,
    Semicolon,

    Lparen,
    Rparen,
    Lbrace,
    Rbrace,
    Lbracket,
    Rbracket,

    // Keywords
    Function,
    Let,
    True,
    False,
    If,
    Else,
    Return,
}

Scanner

With the token definitions in place, the next step is the implementation of the scanner. This component traverses the source code character by character, recognizing and assembling characters into tokens. Using a match construct, individual characters are mapped to their corresponding tokens. For instance, characters like '=', ':', ';', '(' are matched to Token::Assign, Token::Colon, Token::Semicolon, Token::Lparen, and so on.

Furthermore, the code snippet incorporates conditional logic for complex character combinations (like '==', '!=') and other constructs such as identifiers, integers, floating-point numbers, and strings. These conditions are instrumental in accurately categorizing characters and generating respective tokens.

match self.ch {
    '=' => {
        if self.peek_char() == '=' {
            self.read_char();
            tok = Token::Eq;
        } else {
            tok = Token::Assign;
        }
    }
    ':' => tok = Token::Colon,
    ';' => tok = Token::Semicolon,
    '(' => tok = Token::Lparen,
    ')' => tok = Token::Rparen,
    ',' => tok = Token::Comma,
    '+' => tok = Token::Plus,
    '-' => tok = Token::Minus,
    '!' => {
        if self.peek_char() == '=' {
            self.read_char();
            tok = Token::NotEq;
        } else {
            tok = Token::Bang;
        }
    }
    '*' => tok = Token::Asterisk,
    '/' => tok = Token::Slash,
    '<' => tok = Token::Lt,
    '>' => tok = Token::Gt,
    '{' => tok = Token::Lbrace,
    '}' => tok = Token::Rbrace,
    '[' => tok = Token::Lbracket,
    ']' => tok = Token::Rbracket,
    '"' => tok = Token::String(self.read_string().to_string()),
    '\u{0}' => tok = Token::Eof,
    _ => {
        if is_letter(self.ch) {
            let ident = self.read_identifier();
            return token::lookup_ident(ident);
        } else if is_digit(self.ch) {
            let integer_part = self.read_number().to_string();
            if self.ch == '.' && is_digit(self.peek_char()) {
                self.read_char();
                let fractional_part = self.read_number();
                return Token::Float(format!("{}.{}", integer_part, fractional_part));
            } else {
                return Token::Int(integer_part);
            }
        } else {
            tok = Token::Illegal;
        }
    }
}

· 2 min read
Abhinn Pandey

Project Overview

Pig is a simple, quick programming language for beginners and advanced users.

Project Goals

The main target is to create a programming language that anyone can use, even those who don't know how to code. Here are some goals to fulfill:

  1. Showcase the developer's skills in logical and analytical thinking and programming.
  2. Create a captivating programming language that WORKS.
  3. Implement variables, datatypes, and functions.
  4. Implement control flow: if, else, then, and for.

Project Scope

The project scope encompasses the following key elements:

Scanner

The scanner will do the lexical analysis, translating source code into tokens.

Parser

The parser will build the AST, or Abstract Syntax Tree, which encodes the grammar of the language.

CodeGen

A codegen function will translate ASTs to LLVM IR, so LLVM can compile it.

Compiler

LLVM will take the IR and translate it to machine code.

Project Requirements

To achieve the desired project goals and scope, the following requirements have been identified:

  1. Language keywords and grammar
  2. A programming language to implement the compiler in (I chose Rust)
  3. Deep knowledge of how programming languages work
  4. Strong programming and logical-thinking skills

Initial Planning and Development Setup

During the project initiation phase, the following activities were undertaken:

  1. Research: I will read and dive deep into the source code of other programming languages for ideas and help.

  2. Project Plan: A detailed project plan was created, outlining the various stages of development, key milestones, and estimated timelines. This plan provides a roadmap for the project's execution and ensures effective project management.

  3. Development Environment Setup:

    • Installed Emacs
    • Installed Rust
    • Initialized Git Repo