Software Security project of group 37 - MEIC @ IST 2023/2025.
Master in Computer Science and Computer Engineering
Software Security - Group 37
Winter Semester of 2024/2025
The js_analyser.py tool is a static analysis solution designed to identify vulnerabilities in JavaScript program slices.
It evaluates information flows from entry points (sources) to sensitive operations (sinks) and detects the application (or lack) of sanitization functions based on customizable vulnerability patterns.
- Install the required packages:
    pip install -r requirements.txt
- Run the application:
    python ./src/js_analyser.py <path_to_slice>.js <path_to_patterns>.json
Example:
    python ./src/js_analyser.py ./test/fixtures/1-basic-flow/1a-basic-flow.js ./test/fixtures/1-basic-flow/1a-basic-flow.patterns.json
To verify functionality, run the included test suite:
    pytest ./test
The tool begins by parsing the input JavaScript slice into an Abstract Syntax Tree (AST) using the esprima library (check Appendix A. Syntax Tree Format for more information on the AST format). Each vulnerability pattern from the input JSON file is converted into a Pattern object, which contains sources, sinks, sanitizers, and whether implicit flows are to be tracked.
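The pattern-loading step can be sketched as follows. This is a minimal illustration, not the tool's actual code: the JSON key names (vulnerability, sources, sanitizers, sinks, implicit), the "yes"/"no" encoding of the implicit flag, and the load_patterns helper are all assumptions, and the source and sink names in the example are made up.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Pattern:
    """Hypothetical in-memory form of one vulnerability pattern."""
    name: str
    sources: set = field(default_factory=set)
    sinks: set = field(default_factory=set)
    sanitizers: set = field(default_factory=set)
    implicit: bool = False


def load_patterns(text):
    """Convert a JSON array of pattern entries into Pattern objects."""
    patterns = []
    for entry in json.loads(text):
        patterns.append(Pattern(
            name=entry["vulnerability"],
            sources=set(entry.get("sources", [])),
            sinks=set(entry.get("sinks", [])),
            sanitizers=set(entry.get("sanitizers", [])),
            # Assumed encoding: the string "yes" enables implicit-flow tracking.
            implicit=entry.get("implicit") == "yes",
        ))
    return patterns


# Made-up pattern used only to exercise the loader.
EXAMPLE = """[{"vulnerability": "A", "sources": ["src"],
               "sanitizers": ["clean"], "sinks": ["sink"],
               "implicit": "no"}]"""
patterns = load_patterns(EXAMPLE)
```

Storing sources, sinks, and sanitizers as sets keeps the membership checks done during the analysis cheap.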
When the analyze method is called, the analyzer iterates over each pattern and creates a deep copy of the AST so that the original tree is never modified. It then calls the __analyze method, which iterates over each statement in the AST and passes it to the analyse_stmt method.
A history dictionary is also initialized to keep track of the variables that have been analyzed and the vulnerabilities associated with them (passed through a flow).
A "vulnerability" is an instance of TaintData, which contains the source, source line, a list of sanitizers (instances of SanitizerData containing the sanitizer name and the line where it is located), and a flag implicit that indicates if the vulnerability is implicit (i.e., the source is not directly assigned to the sink).
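The data model described above might look like the sketch below. The class names come from the text, but the exact field names (for example source_line) and the shape of the history dictionary are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SanitizerData:
    name: str  # sanitizer function name
    line: int  # line where the sanitizer is called


@dataclass
class TaintData:
    source: str
    source_line: int
    sanitizers: list = field(default_factory=list)
    implicit: bool = False


# history maps a variable name to the taints that reached it through a flow.
history = {}

# `x = src();` on line 1 taints x with the source `src`.
history["x"] = [TaintData(source="src", source_line=1)]

# `y = clean(x);` on line 2: y inherits x's taint with the sanitizer recorded.
history["y"] = [
    TaintData(t.source, t.source_line,
              t.sanitizers + [SanitizerData("clean", 2)], t.implicit)
    for t in history["x"]
]
```

Building a new TaintData for y, rather than mutating the one held by x, keeps the unsanitized flow through x intact.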
The analyse_stmt method will check the type of the statement and call the appropriate handler method to analyze it. For example, if the statement is an ExpressionStatement, it will call the analyze_expression_statement method.
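A type-based dispatch of this kind can be sketched with a lookup table. The snippet below is a simplified stand-in, assuming AST nodes are plain dictionaries with a "type" key; the HANDLERS table and the handler functions are hypothetical and only record which node types were visited.

```python
visited = []


def analyse_stmt(node, handlers):
    """Look up the handler registered for node["type"] and call it."""
    handler = handlers.get(node["type"])
    if handler is None:
        return  # unsupported statement types are skipped in this sketch
    handler(node)


def on_expression_statement(node):
    visited.append("expression")


def on_block_statement(node):
    visited.append("block")
    # A block is analysed by dispatching each of its child statements.
    for child in node["body"]:
        analyse_stmt(child, HANDLERS)


HANDLERS = {
    "ExpressionStatement": on_expression_statement,
    "BlockStatement": on_block_statement,
}

program = {
    "type": "BlockStatement",
    "body": [{"type": "ExpressionStatement"}, {"type": "EmptyStatement"}],
}
analyse_stmt(program, HANDLERS)
```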
We handle the following types of statements:
- Expression Statements: The analyze_expression_statement method will call the taint_expression method to taint the expression.
- Block Statements: The analyse_block_statement method will iterate over each statement in the block and call the analyse_stmt method to analyze it.
- If Statements: An if statement contains a test expression, a consequent block, and an optional alternate block. The analyse_if_statement method will taint the test expression (to handle implicit flows) and then analyze the consequent and alternate blocks. To analyze the blocks, it will create a copy of the AST, append the block statements to the copy, and call the __analyze method to analyze the flow as if it was a slice.
- While Statements: The analyse_while_statement method will taint the test expression (to handle implicit flows) and then analyze the block. To analyze the block, it will perform 3 iterations using the __analyze method to analyze the flow as if it was a slice:
  - One iteration without the loop body, to check flows where the loop is not executed.
  - One iteration with the loop body, to check flows where the loop is executed once.
  - One iteration with the loop body, to check flows where the loop is executed multiple times.
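The reason a loop body is analysed more than once can be shown on a toy model. The sketch below does not work on the AST as the tool does: it assumes a body made only of assignments, written as (target, operands) pairs, and a taint state mapping each variable to a set of source names. The helper names run and analyse_while are hypothetical.

```python
def run(stmts, taint):
    """Propagate taint through a list of (target, operands) assignments."""
    taint = {var: set(srcs) for var, srcs in taint.items()}  # work on a copy
    for target, operands in stmts:
        taint[target] = set().union(*(taint.get(op, set()) for op in operands))
    return taint


def analyse_while(body, taint, max_unroll=2):
    """Merge the taint states reached after 0, 1, ..., max_unroll runs of body."""
    merged = {var: set(srcs) for var, srcs in taint.items()}  # zero executions
    state = taint
    for _ in range(max_unroll):
        state = run(body, state)
        for var, srcs in state.items():
            merged.setdefault(var, set()).update(srcs)
    return merged


# while (c) { b = a; a = s; }
# The flow s -> a -> b only appears on the second execution of the body.
body = [("b", ["a"]), ("a", ["s"])]
initial = {"s": {"s"}, "a": set(), "b": set()}
result = analyse_while(body, initial)
```

With a single pass over the body, b would look clean, which is why a pass that models repeated execution is needed in addition to the zero- and one-execution cases.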
To "taint" expressions, we implemented the taint_expression method, which will check the expression type and call the appropriate handler method to taint it. For example, if the expression is a CallExpression, it will call the taint_call_expression method.
We handle the following types of expressions:
- Call Expressions: The taint_call_expression method will call taint_expression for each argument in the call expression. Then, it taints the callee (calling different functions depending on the callee type). It also checks if the callee is a sanitizer, and if so, it will add the sanitizer to all the current tainted variables (vulnerabilities). Finally, it checks if the callee is a sink, and if so, it will call process_sink.
- Assignment Expressions: The taint_assignment_expression method will taint the right-hand side of the assignment expression, add the variable to the history dictionary, and check if the variable is a sink. If it is, it will call process_sink.
- Identifier Expressions: The taint_identifier_expression method will check if the identifier is one of the sources and, if so, it will create a new vulnerability and add it to the vulnerabilities list. It also checks if the identifier is in the history dictionary and, if so, it will add the vulnerabilities in the history to the current vulnerabilities. Finally, if it is not in the history, it means that the variable is not instantiated in the slice, so it will add a vulnerability with the source and the line where the variable is used.
- Binary and Logical Expressions: The taint_binary_expression method handles both binary and logical expressions. It will call taint_expression for the left and right expressions.
- Unary Expressions: The taint_unary_expression method will call taint_expression for the argument of the unary expression.
- Member Expressions: The taint_member_expression method will call taint_expression for the object and property expressions.
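The recursive propagation described in the list above can be sketched over dictionary nodes shaped like ESTree output. This is a simplified illustration under stated assumptions: it returns plain sets of source names instead of TaintData objects, it omits sanitizer and sink handling, and it treats only a plain identifier callee as a possible source.

```python
def taint_expression(node, sources, history):
    """Return the set of source names that may influence `node`."""
    kind = node["type"]
    if kind == "Literal":
        return set()  # literals never carry taint
    if kind == "Identifier":
        name = node["name"]
        # Known variables carry their recorded taint; unknown ones are
        # uninitialised in the slice and are treated as sources themselves.
        found = set(history.get(name, {name}))
        if name in sources:
            found.add(name)
        return found
    if kind in ("BinaryExpression", "LogicalExpression"):
        return (taint_expression(node["left"], sources, history)
                | taint_expression(node["right"], sources, history))
    if kind == "UnaryExpression":
        return taint_expression(node["argument"], sources, history)
    if kind == "MemberExpression":
        return (taint_expression(node["object"], sources, history)
                | taint_expression(node["property"], sources, history))
    if kind == "CallExpression":
        found = set()
        for arg in node["arguments"]:
            found |= taint_expression(arg, sources, history)
        callee = node["callee"]
        # Simplification: only an identifier callee listed in `sources` taints.
        if callee["type"] == "Identifier" and callee["name"] in sources:
            found.add(callee["name"])
        return found
    return set()  # other expression types are ignored in this sketch


def ident(name):
    return {"type": "Identifier", "name": name}


# x + src(y, 1), where x was previously tainted by `a` and y is unknown.
expr = {
    "type": "BinaryExpression",
    "left": ident("x"),
    "right": {
        "type": "CallExpression",
        "callee": ident("src"),
        "arguments": [ident("y"), {"type": "Literal", "value": 1}],
    },
}
result = taint_expression(expr, sources={"src"}, history={"x": {"a"}})
```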
Literal expressions are not tainted, as they are not sources of vulnerabilities.
Sensitive sinks, as defined in the vulnerability patterns, are identified during the analysis of call expressions and assignment expressions. If tainted data reaches a sink, the flow is marked as vulnerable unless sanitized.
When a pattern specifies implicit flows, the tool tracks taint propagation through control flow structures. A custom statement called ImplicitWrapperStatement wraps statements that could lead to implicit vulnerabilities.
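The idea behind implicit-flow tracking can be sketched without reproducing ImplicitWrapperStatement, whose internals are not described here. The snippet below assumes taint records are (source, implicit) pairs and uses a hypothetical analyse_branch helper: when tracking is enabled, every variable assigned inside a branch also receives the sources of the branch condition, flagged as implicit.

```python
def analyse_branch(test_sources, assignments, history, track_implicit):
    """Record taint for assignments executed under a tainted condition.

    Each taint record is a (source, implicit) pair.
    """
    for target, value_sources in assignments:
        taints = {(src, False) for src in value_sources}
        if track_implicit:
            # The branch only runs for some values of the condition, so the
            # condition's sources leak into every variable assigned inside it.
            taints |= {(src, True) for src in test_sources}
        history[target] = taints
    return history


# if (secret) { flag = 1; }
with_implicit = analyse_branch({"secret"}, [("flag", set())], {}, True)
without_implicit = analyse_branch({"secret"}, [("flag", set())], {}, False)
```

In this example flag is assigned a literal, so it is only tainted when the pattern asks for implicit flows.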
Sanitization functions are identified during the analysis of call expressions. If a sanitizer is found, it is applied to all tainted data in the current scope.
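How sanitizers and sinks interact can be sketched as follows. The process_sink name comes from the text above, but its signature, the apply_sanitizer helper, the dictionary-based taint records, and the report fields are assumptions made for illustration.

```python
def apply_sanitizer(taints, name, line):
    """Return copies of the taint records with one more sanitizer recorded."""
    return [dict(t, sanitizers=t["sanitizers"] + [(name, line)]) for t in taints]


def process_sink(sink, line, taints):
    """Build one report entry per taint record that reaches `sink`."""
    return [{
        "source": (t["source"], t["line"]),
        "sink": (sink, line),
        "sanitizers": t["sanitizers"],
        # A flow with no sanitizer on its path is reported as unsanitized.
        "unsanitized": not t["sanitizers"],
    } for t in taints]


# a = src();      (line 1)
# b = clean(a);   (line 2)
# sink(a);        (line 3)
# sink(b);        (line 4)
a = [{"source": "src", "line": 1, "sanitizers": []}]
b = apply_sanitizer(a, "clean", 2)
reports = process_sink("sink", 3, a) + process_sink("sink", 4, b)
```

Keeping the sanitized flow in the report, instead of dropping it, lets the reader see which sanitizer was applied and on which line.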