Skip to content

Beicai91/Minishell

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

76 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Minishell

Minishell is a small UNIX shell that reproduces the core behavior of bash. It takes user input, tokenizes and parses it, builds an Abstract Syntax Tree (AST), and executes the resulting command structure while handling pipes, redirections, command lists, logical operators, blocks, heredocs, quoting, and environments.

Installation

  1. Install the Readline library Minishell depends on the GNU readline library for interactive input. You must install Readline on your system before building the project

macOS (Homebrew)

brew install readline

Linux(Debian/Ubuntu)

sudo apt-get install libreadline-dev
  1. Update the Makefile (H_PATH and LIB_PATH)
H_PATH := -I ./include -I <readline-install-path>/include
LIB_PATH := -L ./lib/libft -L <readline-install-path>/lib

Replace <readline-install-path> with the actual directory on your machine.

  1. Build the project
make all

Usage

./minishell [command-line]

Dev features

  • Manual tokenizer with operator-aware lookahead
  • Recursive-descent parser respecting shell precedence
  • AST-based command representation
  • Pipeline and redirection engine
  • Logical operators (&&, ||) and command lists
  • Built-in command suite
  • Environment/variable management
  • Signal integration
  • Heredoc handling (with Command Substitution)
  • Wildcards
  • History handling

Technical notes

  • AST-based command representation
    This project uses an Abstract Syntax Tree (AST) to represent the command line. More specifically, it follows a type-tagged, polymorphic AST design.
    At the core is a minimal base struct (t_cmd) that contains only an int type field. Every concrete command —pipeline, list operator, redirection, exec command, logical AND/OR, heredoc, and so on - has its own specialized struct that embeds the base struct's header field (int type) and then carries whatever additional data.Because all nodes share this common int type header, the parser and executor can treat any specialized node as a generic t_cmd and select the correct behavior based solely on the type.
    During parsing, the input command line is broken into logical components and assembled into an AST. Binary operators such as pipelines, command lists, and logical AND/OR become nodes containing two children (left and right). Commands like redirections or heredocs wrap a single subcommand. Leaf nodes are exec-command nodes, which hold the final command arguments and all information needed to run the program.
    Execution is performed by recursively traversing the AST and handling each node according to its type, allowing the shell to evaluate complex command lines in a structured and predictable way.

  • Lookahead-based tokenizer
    The shell uses a hand-written, lookahead-based tokenizer to scan the input line and produce tokens for the parser. The tokenizer walks the input character by character, skips irrelevant whitespace, and classifies each token according to shell syntax rules. It handles multi-character operators (such as &&, ||, <<, >>), quotes, parentheses, redirections, and ordinary words. Each call identifies a token’s type, returns the token’s start and end positions, and advances the input pointer so the parser can continue from the correct location.

    AST_examples

  • Operator Precedence and Parsing Strategy
    The shell follows a well-defined operator precedence, which determines how complex command lines are grouped. From strongest binding to weakest:

    Parentheses
    Group commands and force them to be parsed as a unit.

    Redirections (<, >, >>, <<)
    Attach tightly to the command they modify.

    Simple commands and arguments
    The basic building blocks of execution.

    Pipelines (|)
    Connect the output of one command to the input of another.

    Logical operators (&&, ||)
    Control execution flow based on command success or failure.

    Command lists (;)
    The weakest binding, used to sequence independent commands.

    To implement these rules, the project uses a recursive-descent parser with top-down precedence handling. Each parsing function corresponds to one precedence level. Higher-precedence constructs call lower-precedence ones first, ensuring that tightly bound operations (like redirections or pipelines) are parsed before more general ones (like lists or logical operators). Parenthesized expressions re-enter the parser at the highest level, allowing a complete command line to be nested inside a block. This structure produces a clean and predictable Abstract Syntax Tree (AST) where each node reflects the operator precedence defined by the shell grammar.
    parsing-strategy

About

No description or website provided.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •