
CSS @import rule breaks morphing model #18

@psivesely


Summary

Since we only parse the HTML document and not the CSS stylesheet(s), which can themselves import further stylesheets, the page* we think we're creating is not the page the user actually downloads. Furthermore, any stylesheet imported via @import from another stylesheet will not be padded.

* Here, a page is a 3-tuple of values sampled from our distributions: (HTML size scalar, object number scalar, object size vector).
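To make the footnote concrete, here is a minimal Python sketch of that 3-tuple. The PageModel class and its field names are hypothetical, not Alpaca's actual types. It also shows why @import is a problem: a stylesheet pulled in by another stylesheet is one more object the client downloads, yet it never appears in object_sizes, so total_size() undercounts the real page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PageModel:
    """Hypothetical page model: (HTML size, object count, object size vector)."""
    html_size: int                                          # HTML size scalar, bytes
    num_objects: int                                        # object number scalar
    object_sizes: List[int] = field(default_factory=list)   # object size vector, bytes

    def __post_init__(self) -> None:
        # The size vector must have exactly one entry per modeled object.
        if len(self.object_sizes) != self.num_objects:
            raise ValueError("object_sizes must have num_objects entries")

    def total_size(self) -> int:
        # Bytes we expect the client to download for this page.
        return self.html_size + sum(self.object_sizes)


page = PageModel(html_size=4096, num_objects=2, object_sizes=[1024, 2048])
print(page.total_size())  # 7168
```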

Background

The CSS @import rule allows you to import one or more style sheets into another.

@import url|string list-of-mediaqueries;
  • url|string: A URL or a string representing the location of the resource to import. The URL may be absolute or relative.
  • list-of-mediaqueries: A comma-separated list of media queries conditioning the application of the CSS rules defined in the linked URL.

Example

Import the "mobstyle.css" style sheet only if the media is screen and the viewport is at most 768 pixels wide:

@import "mobstyle.css" screen and (max-width: 768px);

Background source.

See also the Tor Project onion site.
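As a sketch of what our parser would need to extract from each rule, the following Python function splits an @import statement into its URL and its list of media queries. The function name parse_import is hypothetical, and the regular expression is a simplification: it ignores CSS escapes, comments inside the rule, and newer syntax such as layer() or supports().

```python
import re
from typing import List, Optional, Tuple

# @import followed by url(...) or a quoted string, then an optional media list.
_IMPORT_RE = re.compile(
    r"""@import\s+
        (?:url\(\s*(?P<q1>['"]?)(?P<url>[^'")]*)(?P=q1)\s*\)   # url(x), url("x"), url('x')
          |(?P<q2>['"])(?P<str>.*?)(?P=q2))                    # "x" or 'x'
        \s*(?P<media>[^;]*);""",
    re.VERBOSE | re.IGNORECASE,
)


def parse_import(rule: str) -> Optional[Tuple[str, List[str]]]:
    """Return (url, media_queries) for an @import rule, or None if it doesn't parse."""
    m = _IMPORT_RE.search(rule)
    if m is None:
        return None
    url = m.group("url") if m.group("url") is not None else m.group("str")
    # The media list is comma-separated; an empty list means "all media".
    media = [q.strip() for q in m.group("media").split(",") if q.strip()]
    return url, media


print(parse_import('@import "mobstyle.css" screen and (max-width: 768px);'))
# ('mobstyle.css', ['screen and (max-width: 768px)'])
```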

Solution Proposal

Parse the CSS as part of the HTML morphing process and add any imported stylesheets to the initial page model for morphing. Since "@import rule must be at the top of the document (but after any @charset declaration)," we needn't parse the entire CSS document.
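Because @import may only appear at the top, the scan can stop at the first statement that is neither @charset nor @import. A minimal Python sketch of that early exit follows; leading_imports is a hypothetical helper, and it assumes no semicolons appear inside quoted URLs and that comment markers never occur inside strings. It also follows the rule exactly as quoted above, so it does not account for @layer statements, which newer CSS also allows before @import.

```python
import re
from typing import List

_COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)


def leading_imports(css: str) -> List[str]:
    """Return the @import statements at the top of a stylesheet.

    Scanning stops at the first statement that is neither @charset nor @import,
    since @import rules appearing after other rules are invalid and ignored.
    """
    text = _COMMENT_RE.sub("", css)  # drop comments between statements
    imports: List[str] = []
    pos = 0
    while True:
        # Skip whitespace between statements.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        head = text[pos:pos + 8].lower()
        if not (head.startswith("@charset") or head.startswith("@import")):
            break  # first ordinary rule: nothing after this can be an import
        end = text.find(";", pos)
        if end == -1:
            break  # unterminated statement
        if head.startswith("@import"):
            imports.append(text[pos:end + 1])
        pos = end + 1
    return imports


css = '@charset "utf-8";\n/* header */\n@import "a.css";\n@import url(b.css) print;\nbody { color: red; }\n@import "late.css";'
print(leading_imports(css))
# ['@import "a.css";', '@import url(b.css) print;']
```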

During HTML morphing, each CSS object in the HTML page will receive a query string of the form ?...alpaca0=<size 0>&alpaca1=<size 1>...&alpacan=<size n>, where n is the number of @import statements the stylesheet contains. When the CSS object is requested with this string, we will have to morph the CSS to append each of the n sizes to the URL of its corresponding imported stylesheet (e.g., @import "mobstyle.css?alpaca=<size 1>"). Then, when the sub-stylesheet is requested, we simply pad it to <size 1> bytes.
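The query-string bookkeeping and the final padding step might look roughly like this in Python. All names are hypothetical. Two assumptions should be flagged: that alpaca<i> (for i from 1 to n) carries the target size of the i-th imported stylesheet, and that a stylesheet is padded by appending a CSS comment (or, for gaps under four bytes, trailing whitespace), both of which CSS parsers ignore.

```python
from typing import Dict, List
from urllib.parse import parse_qs


def alpaca_sizes(query: str) -> Dict[int, int]:
    """Map i -> size for each alpaca<i>=<size> parameter in a query string."""
    sizes: Dict[int, int] = {}
    for key, values in parse_qs(query).items():
        suffix = key[len("alpaca"):]
        if key.startswith("alpaca") and suffix.isdigit():
            sizes[int(suffix)] = int(values[0])
    return sizes


def tag_imports(import_urls: List[str], query: str) -> List[str]:
    """Append alpaca=<size i> to the URL of the i-th @import (1-based, assumed)."""
    sizes = alpaca_sizes(query)
    tagged = []
    for i, url in enumerate(import_urls, start=1):
        sep = "&" if "?" in url else "?"  # keep any existing query string
        tagged.append(f"{url}{sep}alpaca={sizes[i]}")
    return tagged


def pad_css(css: bytes, target: int) -> bytes:
    """Pad a stylesheet to exactly `target` bytes without changing its meaning."""
    gap = target - len(css)
    if gap < 0:
        raise ValueError("stylesheet is already larger than the target size")
    if gap < 4:
        # Too small for "/**/"; trailing whitespace is ignored by CSS parsers.
        return css + b" " * gap
    # "/*" + filler + "*/" occupies exactly `gap` bytes.
    return css + b"/*" + b"x" * (gap - 4) + b"*/"


print(tag_imports(["mobstyle.css"], "alpaca0=5000&alpaca1=1200"))
# ['mobstyle.css?alpaca=1200']
print(len(pad_css(b"body { color: red; }", 1200)))  # 1200
```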

Of course, we may sometimes need to build a deeper page model if one stylesheet imports another that imports another, and that opens up another set of problems:

  • How do we represent the model?
  • How do we deal with circular imports (how do browsers handle that)?
  • Etc.
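One possible representation for the deeper model is a tree of stylesheets, built recursively. The Python sketch below is an assumption, not a settled design: fetch is a hypothetical callable returning a stylesheet's text, the import regex is simplified (it finds every @import, not only leading ones), and a cyclic import is simply skipped by checking the current sheet's ancestors. Whether that matches how browsers treat cycles, and whether a sheet imported from two places costs one download or two, are exactly the questions the TBB study would need to answer.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List
from urllib.parse import urljoin

# Simplified: matches @import url(x) / url("x") / "x" / 'x' anywhere in the text.
_IMPORT_URL_RE = re.compile(
    r"""@import\s+(?:url\(\s*['"]?([^'")]+)['"]?\s*\)|['"]([^'"]+)['"])""",
    re.IGNORECASE,
)


@dataclass
class SheetNode:
    url: str
    children: List["SheetNode"] = field(default_factory=list)

    def count(self) -> int:
        # Number of stylesheets in this subtree, including this one.
        return 1 + sum(child.count() for child in self.children)


def build_import_tree(url: str, fetch: Callable[[str], str],
                      ancestors: FrozenSet[str] = frozenset()) -> SheetNode:
    """Recursively expand @import rules, skipping any import that would form a cycle."""
    node = SheetNode(url)
    path = ancestors | {url}  # sheets on the path from the root to this one
    for m in _IMPORT_URL_RE.finditer(fetch(url)):
        child_url = urljoin(url, m.group(1) or m.group(2))  # resolve relative URLs
        if child_url in path:
            continue  # cyclic import: don't recurse
        node.children.append(build_import_tree(child_url, fetch, path))
    return node


sheets = {
    "http://example.onion/a.css": '@import "b.css";',
    "http://example.onion/b.css": '@import url(a.css); @import "c.css";',
    "http://example.onion/c.css": "p { margin: 0; }",
}
tree = build_import_tree("http://example.onion/a.css", sheets.__getitem__)
print(tree.count())  # 3
```

Tracking only the ancestors (rather than every sheet seen so far) keeps diamond-shaped imports in the tree while still terminating on cycles such as a.css → b.css → a.css.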

For best results, we'll need to study and model the behavior of TBB so we know which sheets will actually be downloaded. Once again, our algorithm won't work as expected if we assume every stylesheet will be downloaded even when its media queries ensure TBB won't fetch it. For simplicity, we'll assume the security slider is set to high and the window size has not been modified.
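Under those simplifying assumptions, deciding whether an @import will be fetched reduces to evaluating its media queries against one fixed environment. The Python sketch below shows the idea; the constants ASSUMED_MEDIA_TYPE and ASSUMED_VIEWPORT_WIDTH are placeholders (the real values would come from studying TBB, not from this snippet), only media types and min-width/max-width in pixels are understood, and any other condition is conservatively treated as matching.

```python
import re
from typing import List

# Placeholder environment; real values must come from studying TBB.
ASSUMED_MEDIA_TYPE = "screen"
ASSUMED_VIEWPORT_WIDTH = 1000  # px, hypothetical

_WIDTH_RE = re.compile(r"\(\s*(min|max)-width\s*:\s*(\d+)px\s*\)")


def query_matches(query: str) -> bool:
    """Evaluate one media query against the assumed environment.

    Unrecognized conditions are treated as matching, i.e. the sheet is
    assumed to be downloaded.
    """
    for part in re.split(r"\band\b", query, flags=re.IGNORECASE):
        term = part.strip().lower()
        if term.startswith("only "):
            term = term[len("only "):].strip()
        if term in ("", "all", ASSUMED_MEDIA_TYPE):
            continue
        if term in ("print", "speech"):
            return False
        m = _WIDTH_RE.fullmatch(term)
        if m:
            bound, px = m.group(1), int(m.group(2))
            if bound == "min" and ASSUMED_VIEWPORT_WIDTH < px:
                return False
            if bound == "max" and ASSUMED_VIEWPORT_WIDTH > px:
                return False
    return True


def will_download(media_queries: List[str]) -> bool:
    """No media list means all media; otherwise any matching query suffices."""
    return not media_queries or any(query_matches(q) for q in media_queries)


print(will_download(["screen and (max-width: 768px)"]))  # False
```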

At a later point I intend to introduce memoization for the static content in the creation of page models, so in many cases the overhead of this addition would be minimal.
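A minimal sketch of that memoization in Python, assuming the step we want to avoid repeating is extracting @import URLs from unchanged stylesheet text: functools.lru_cache keys the result on the stylesheet's contents, so an edited file is simply a cache miss. The function name imports_of and the cache size are arbitrary choices for illustration.

```python
import functools
import re
from typing import Tuple

# Simplified: matches @import url(x) / url("x") / "x" / 'x'.
_IMPORT_URL_RE = re.compile(
    r"""@import\s+(?:url\(\s*['"]?([^'")]+)['"]?\s*\)|['"]([^'"]+)['"])""",
    re.IGNORECASE,
)


@functools.lru_cache(maxsize=1024)
def imports_of(css: str) -> Tuple[str, ...]:
    """Return the @import URLs in a stylesheet, cached by its exact text."""
    # A tuple is returned so callers can't mutate the cached value.
    return tuple(m.group(1) or m.group(2) for m in _IMPORT_URL_RE.finditer(css))


css = '@import "a.css";\n@import url(b.css) print;\nbody { margin: 0; }'
print(imports_of(css))  # ('a.css', 'b.css')
imports_of(css)         # served from the cache
print(imports_of.cache_info().hits)  # 1
```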
