Faster HTML and CSS: Layout Engine Internals for Web Developers

>>Thanks for coming. This talk got started when we
were thinking about doing some experiments around performance in CSS, and searching the
web with search engines found very little. There’s not a lot of research out there on
this, so it occurred to us that maybe we should go to the guy with the secret knowledge. How
does this stuff really work? We’re poking at the black box but we don’t even know how
it works. So, I want to introduce David Baron. He’s a software engineer at Mozilla, where
he’s worked since 1998. He’s a member of the W3C CSS working group, and he is
here to present.
>>BARON: Thanks. So, it will be a little
interesting because right now I have the slides there but I don’t have them here. So, you
know, I may get a little confused as to which way I’m moving the mouse at some point, but
in any case. So, what I’m going to talk about is really–well, just one little
pointer first: if anyone can’t see them, I also put the slides on the web, although
they sort of require a relatively recent browser to
look at, because I basically wrote them against Mozilla trunk using HTML and SVG. So it’s
a little fun. Anyway, essentially what I’m going to talk about is sort of a how
browsers work from my perspective. You know, you ask any engineer who works on web
browsers how they work and, you know, they work in a different area and
they’re going to be biased towards talking about that area of the browser rather than
some of the other areas that they have less experience in. So, I’m going to try and sort
of give a how-browsers-work talk, but edited pretty heavily to focus on the parts that
are relevant to what authors can do to make webpages faster. But it’s still fundamentally
sort of “this is how a web browser goes from getting stuff on the wire to displaying
stuff on the screen,” but with a lot of other comments interjected. I’m coming
at this from the perspective of somebody who works at Mozilla. You know, so I work on one
web browser. I don’t–I know a lot more about one than I know about the others. So a lot
of the details that I’m going to sort of, that I’m going to get into, some of them are
sort of pretty common across browsers whereas some of the others vary more between browsers.
And I’m not even necessarily sure in some cases which are which. And beyond that a lot
of these things vary between browser versions in that–when you’re writing a web browser
you have a lot of compatibility requirements in terms of what you have to–basically anything
that you’ve shipped before that other browsers do too, you have to keep doing. And it’s not
necessarily that way with performance. If you were slow at something before, nobody is
really going to complain if you find some way to speed it up. So we sort of have a lot
more ability to change the performance characteristics of our layout engine than we do to change the
output in terms of, you know, visual output or behavior or other characteristics. So an
example of that would be–in Firefox 3, we made some pretty significant changes–well,
there were many significant changes between Firefox 2 and 3, but one of them that’s relevant
here is that the way we handle style changes changed drastically so that we would coalesce
separate style changes and process them all at once rather than processing them all separately
which could pretty significantly change performance characteristics of pages that were exercising
that. But, you know, it doesn’t break anyone. We have the ability to do that. So, with that
preface in mind, I want to just sort of dive in and talk about the data structures we have
in web browsers and some of the things we do. So, sort of one of the central data structures
we have is the DOM tree or the content tree. We call it a bunch of different things. But
basically, HTML, this bunch of tags is a serialization of a tree structure and most modern browsers
actually turn that into an actual in-memory tree structure. In some older browsers
that actually wasn’t the case, but these days most browsers actually have a tree in memory
and when you use the DOM APIs that work on a tree, there’s an underlying tree data structure
that looks pretty much like you think it would look. So, you know, a simple HTML document
has a bunch of HTML element nodes and bunch of text nodes and so on. And, you know, the
types of the nodes in this content tree are things like HTML elements, and then there are
specific types of HTML elements. And they differ based on, you know, based on what DOM
methods they have. You can also have an SVG DOM tree or if you have something like the
slides I’m using today, you can have a DOM tree that mixes both of them. But the thing
about this tree structure is that the types of the nodes are related to the types of the
elements. Then–now in addition to that and this varies–this starts to vary a little
bit more between browsers but I think it’s still reasonably similar. We have a second
tree structure that represents the–what we render for all of these elements. So, we call
it the frame tree which is sort of odd and I will tend to use that term just because
it’s the term I’m used to using. People will also call it the rendering tree. I’ll probably
use them interchangeably. The nodes in this tree all represent rectangles, essentially.
But the more important difference is that the types of the objects in this tree aren’t
things like element type. There are things like CSS–the values of the CSS display property
will mostly correspond to the types of nodes in this tree, so block or inline or various
table types or text nodes. In many cases there’s a one to one correspondence between the nodes
in these trees, but in some cases there isn’t. For example, a node that has CSS display none
wouldn’t create any nodes in the rendering tree. So, like you see in this slide,
there are no nodes in the rendering tree pointing to the
head element, because we make the head element display: none, so there’s just nothing generated
there. Likewise, there are cases–especially when we break elements across lines or pages
where you’ll have multiple rectangles representing a single element in the DOM. So, with these
two data structures–now, I want to sort of walk through the process of what we do as
we display a webpage. So, for a start, there’s sort of–I’m actually going to start off to
the left edge of this slide. But we start off, you know, we’re just–we’re reading HTML.
Things like parsing are mostly linear time in the length of the thing you’re parsing.
It’s not–there’s nothing all that complicated in terms of parsing that–at least that I’m
interested in. Then again, it’s sort of not like, not as much my–the area that I work
on, so I’m sure there are other people at Mozilla who could talk about performance aspects
of parsing for quite a while. But in some cases, the
process of parsing a document actually ends up not being linear time, not
so much because of the algorithms but because of the way it’s done incrementally. In
particular, if one element has a very, very large number of children, you end up with
some quadratic-time algorithms showing up just because of the process of incrementally
loading a document and incrementally adding those children. There’s sort of
a very small quadratic term, because there are some operations, especially when you’re dealing
with laying out the document and displaying it, where you’re going to walk over that child
list as it incrementally grows every single time, though by and large it’s linear
time. So, the more interesting stuff about the process of loading a document is dealing
with things like loading style sheets and loading scripts and loading images. What I’m going through
now is the case of displaying an HTML document that’s just static and doesn’t
change dynamically, which is sort of the basic case; then I’m going to go back over again
and talk about how we handle dynamic changes, which are a lot more relevant to the use of HTML
in applications. But in any case, loading images is sort of straightforward.
When you’re constructing the DOM tree, you hit a node that’s an image, you
kick off a network load that starts loading that image and you just keep going. It’s
asynchronous. It’s not giving you some huge penalty, although there are, of course, issues
with the limited number of HTTP connections to a given
server. So there’s some serialization against other resources that you might be loading.
Scripts and style sheets are a little bit more interesting because scripts have this
model that I suspect a lot of you are familiar with, where what’s in a script executes at
that point. So right where the script is linked, you are executing that script. So you have
to wait for the script to load, because the script could document.write() a start tag
and not an end tag. It could document.write() all sorts of things. And the programming model
used on the web is a synchronous model where the script has to execute
at the point it’s loaded. So when you’re waiting for scripts to load,
you’re essentially not even parsing the HTML to find other things to load. That was
actually true until yesterday on Mozilla Trunk. We actually landed a patch yesterday that
finally sort of speculatively parses the HTML after the script on the assumption that the
script isn’t going to do anything too serious and starts initiating the network loads for
things that it finds linked there. But even so, this is–this is something that you have
to be pretty careful with. Style sheets are sort of an in-between case, because the idea with
style sheets is that they can have drastic effect on the rendering tree but they have
no effect on the DOM tree at all. So, you really want to wait for style sheets to load
before you construct the rendering tree, but you can keep building the DOM tree and potentially
executing script even while they’re loading. That said, the way that’s handled has changed
over time in Mozilla. The original way this was implemented is somebody said, “Oh, hmm,
we need to do this. Why don’t we just reuse the code for script?” So in the old days,
we actually did all the things that we do for script also for style sheets. That’s
changed, such that we will now continue parsing the HTML and continue
loading the page while we’re waiting for a style sheet to load. But potentially that
means you’re running scripts, and those scripts could potentially ask for
layout information, which means suddenly we need a rendering tree in order to give
the script the information it asked for, which can potentially produce a problem that
web developers hate so much that they’ve given it a name: Flash of Unstyled Content.
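As a toy illustration of that interaction, here is a small model (purely illustrative; the event names and the function are invented, not any real browser API) of why a layout-dependent read from script, before a pending style sheet has arrived, risks a flash of unstyled content:

```javascript
// Toy model of the stylesheet-vs-script interaction described above: the DOM
// tree can keep growing while a sheet loads, but any layout-dependent query
// from script forces the rendering tree to be built before the sheet arrives,
// which is exactly the FOUC risk. Event names here are made up.
function loadPage(events) {
  let sheetLoaded = false, foucRisk = false;
  for (const e of events) {
    if (e === "sheet-arrives") sheetLoaded = true;
    if (e === "script-reads-layout" && !sheetLoaded) foucRisk = true;
  }
  return foucRisk;
}

console.log(loadPage(["parse", "script-reads-layout", "sheet-arrives"])); // true
console.log(loadPage(["parse", "sheet-arrives", "script-reads-layout"])); // false
```

The ordering is the whole story: the same layout read is harmless once the sheet has arrived.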
And since we started doing this for style sheets in Mozilla, we’ve actually started
having this type of problem in a few rare cases where a page asked for layout related
information. So, this is all sort of the preface to building up this content tree. Once we
have the content tree, we then have to go decide how to–what types of objects to put
in the rendering tree. And since the types of objects we put in the rendering tree depend
on CSS styles like the display property and some other properties, we actually have to
compute the style for the element in order to construct the rendering tree. So, the next
thing I’m going to talk about is CSS selector matching, which, from an algorithmic
perspective, sort of looks like it ought to be a bit of a performance
hotspot, although it actually ends up being not bad in most cases because of the way we
optimize it. So the basic idea of CSS selector matching is that you have a set of elements
in the content tree, and you have a set of CSS rules and for each element you’re asking
does this rule–does this selector–if this selector matches this element then we’ll use
this rule. So fundamentally, you have a problem where you’re running this algorithm for every
pair of element and selector, which can add up to a lot. So the question is, first of
all, how do we optimize that? And second of all, what is it that can make that more or less
expensive? So I actually want to briefly step through the un-optimized version of CSS selector
matching just so that it’s clear how this works because it says a few things about what
types of CSS selectors can be faster or slower. So, the–a CSS selector–the things over here
on the right side of this slide are examples of pretty simple CSS selectors. The first
one represents any div element, the second one represents any element with a class
attribute that’s item, the third is any element with an id attribute that’s sidebar, the
fourth is any div element with ID sidebar,
and the fifth one represents any p element that is a descendant of a div element. So, this
is–this is one of the things I wanted to talk about here which is essentially the process
of matching. So, say we were trying to figure out which selectors match the body element.
In the un-optimized case, you’d sort of look at this and say, “No, it’s
not a div element.” “No, it doesn’t have class item.” “No, it doesn’t have ID sidebar.”
It doesn’t really matter fundamentally which one you match here first. To match this one,
the way pretty much all browsers do it is that CSS selectors match from right to left.
So, the first thing you look at when you’re trying to match the selector is
the part to the right of the rightmost combinator, where the space
is a combinator that represents “is a descendant of,” the greater-than sign represents “is a child
of,” and so on. So, say you’re trying to match this p element here. There are
no combinators in these first four. You just look at the one simple
selector, which is the unit between the combinators, and in these four cases that doesn’t match.
In this case, when you try to find out if the p matches, the rightmost simple
selector does match, so then you look at the combinator and say, “Well, we want to find
an ancestor that’s a div.” So you look up and say, “Okay, this one matches.” So it turns
out that selector matches. Whereas with a selector like ul p, if you’re trying to see
if it matches this p, you start at the p, it matches, then you look for an ancestor.
So you end up walking all the way up the tree looking for an ancestor to see if it matches.
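To make that walk concrete, here’s a minimal sketch of right-to-left matching for descendant selectors over a toy tree. This is not Gecko’s actual code; the node shape and function name are invented for illustration:

```javascript
// Toy model of right-to-left descendant-selector matching. A selector like
// "div p" is stored as simple selectors from right to left: ["p", "div"].
// We test the rightmost part against the element itself, then walk up the
// ancestor chain looking for something matching each remaining part.
function matchesDescendantSelector(partsRightToLeft, element) {
  if (partsRightToLeft[0] !== element.tag) return false; // rightmost part first
  let ancestor = element.parent;
  for (const part of partsRightToLeft.slice(1)) {
    // Walk up until some ancestor matches this part; fail at the root.
    while (ancestor && ancestor.tag !== part) ancestor = ancestor.parent;
    if (!ancestor) return false;
    ancestor = ancestor.parent;
  }
  return true;
}

// body > div > p, as a toy tree with parent pointers.
const body = { tag: "body", parent: null };
const div  = { tag: "div",  parent: body };
const p    = { tag: "p",    parent: div };

console.log(matchesDescendantSelector(["p", "div"], p)); // true
console.log(matchesDescendantSelector(["p", "ul"],  p)); // false — walks clear to the root
```

The failing case is the expensive one: the mismatch is only discovered after walking every ancestor.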
So, if you’re dealing with a deep document tree, you can potentially spend a
lot of time even just on a single selector, never mind this problem of multiplying all
the elements times all the selectors. So, there are actually some selectors that are
even worse in that there’s backtracking required. For example, with this one here, you’re looking
for a p that is a descendant of a div that’s the child of body. If you’re trying to see
if this p matches, what the browser will do is, well, it’ll see that p matches
the rightmost part. Then it’ll find the div matches the next part, but this div doesn’t
match the next part. Since the body-div relationship is a child relationship, you
have to backtrack, try to match this div against the middle part. That succeeds, but this fails
to match body. Backtrack, match this against div, and finally you get a match the third
time. So that’s the un-optimized case. Now, the way we avoid the problem of
having to match every element against every selector, is that we hash the selectors into
a bunch of buckets in advance to filter out the one’s that we know aren’t going to match.
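A rough sketch of that kind of filtering, keyed (as described next) off the rightmost simple selector, might look like this. The data shapes and names are invented for illustration, and the keying scheme is simplified:

```javascript
// Toy sketch of selector bucketing: each selector is filed under at most one
// key derived from its rightmost simple selector (ID beats class beats tag),
// so for a given element we only run the few selectors that could plausibly
// match it, instead of every selector on the page.
function bucketSelectors(selectors) {
  const buckets = { id: new Map(), cls: new Map(), tag: new Map(), rest: [] };
  const file = (map, key, sel) => {
    if (!map.has(key)) map.set(key, []);
    map.get(key).push(sel);
  };
  for (const sel of selectors) {
    const r = sel.rightmost;                  // e.g. { tag: "div", id: "sidebar" }
    if (r.id)       file(buckets.id, r.id, sel);
    else if (r.cls) file(buckets.cls, r.cls, sel);
    else if (r.tag) file(buckets.tag, r.tag, sel);
    else            buckets.rest.push(sel);   // couldn't classify; always checked
  }
  return buckets;
}

function candidatesFor(buckets, element) {
  return [
    ...(element.id  && buckets.id.get(element.id)   || []),
    ...(element.cls && buckets.cls.get(element.cls) || []),
    ...(buckets.tag.get(element.tag) || []),
    ...buckets.rest,
  ];
}

// The five example selectors from the slide, in toy form.
const buckets = bucketSelectors([
  { text: "div",         rightmost: { tag: "div" } },
  { text: ".item",       rightmost: { cls: "item" } },
  { text: "#sidebar",    rightmost: { id: "sidebar" } },
  { text: "div#sidebar", rightmost: { tag: "div", id: "sidebar" } },
  { text: "div p",       rightmost: { tag: "p" } },
]);
const names = candidatesFor(buckets, { tag: "div", id: "sidebar" }).map(s => s.text);
console.log(names); // ["#sidebar", "div#sidebar", "div"] — the others never run
```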
And that filtering is done on the rightmost part of the selector, in other words the part
to the right of the last combinator. So essentially, we’ll say, if a selector
has an ID in that rightmost part of the selector, we’ll stick it into a hash table for selectors
that have an ID, by its ID. If it has a class, we’ll stick it into a hash table for
selectors that have a class, unless it was already in the hash table for those with an
ID. If it has a tag name, we’ll stick it in a hash table by tag name, and otherwise we’ll
just stick it in the list of all the selectors that we couldn’t classify. So then, when we
want to find the selectors that match, say, this div here, what we’ll do is go to that
hash of selectors with IDs in them and pull out the entry for ID equals sidebar. There’s
no class on that div, so that doesn’t matter. We’ll then go to
the hash of selectors by tag name and pull out the selectors
that have div in that rightmost part. And we will then combine those lists and only
run those selectors. So what this is saying is that in Mozilla, and I think this is also
reasonably true in other browser engines, your selectors are going to cause much less
of a performance problem if they’re more specific–if the rightmost part of them is as specific
as possible. Because then there won’t even be any code run at all to deal with
testing them against all these other elements that probably are going to fail, but maybe
not all that quickly, in the algorithm. So, in any case, once we have the list of selectors
that match, we have a list of CSS rules that match. You take all the
declarations, compute a value for every property on every element, and start constructing this
rendering tree. And, you know, there aren’t too many interesting performance characteristics
of constructing the rendering tree in the static case. It’s pretty much
you go build objects, and it’s pretty boring. Then once we have a rendering tree, we compute
all the positions of those objects, which, like constructing the rendering tree, is
a recursive process. Essentially, the process of layout, which we
sometimes call reflow at Mozilla, involves assigning coordinates to the rectangle
for each of these rendering objects. So, you know, traditional document layout algorithms
tend to treat widths as inputs and heights as outputs. So it’s essentially
done as a recursive algorithm where the parent will have some width as input. It will compute
its own width, tell its children to fit in that width, they’ll add up to some amount
of height, and then you’ll come back out to the parent. It’ll determine its own
height and pass that back up to its own parent. Now, it’s not completely true that
widths are input and heights are output; there are cases where we use intrinsic widths
of content, where essentially widths are output, but that’s not too relevant here. Now, the
code that does this is going to vary a lot by frame type. How it’s optimized is going
to vary a lot by frame type. So, you know, things like blocks and tables probably
are optimized reasonably well. Unusual things might not be as careful about
being efficient. Then once we’ve computed all these rectangles, we come along and
we want to actually display something so we build a display list for all the things that
we have to display within a rectangle. We–and then we essentially paint that display list
in back to front order using a 2D graphics API. Now, that’s sort of–the–now, in that
painting process, there are some things that make it slower. For example, if you have opacity,
which is group opacity, you have to paint
things into an off-screen surface and then composite that onto the rendering, and so on.
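Backing up to the layout step for a moment, that width-in, height-out recursion can be sketched like this (a toy block-stacking model with invented field names, nothing like real layout code, which also handles floats, inlines, and intrinsic widths):

```javascript
// Toy sketch of the layout recursion: each node receives an available width
// from its parent, lays out its children within that width, and reports its
// resulting height back up. Only vertically stacked blocks are modeled.
function layout(node, availableWidth) {
  node.width = availableWidth;               // width flows down as input
  if (!node.children || node.children.length === 0) {
    node.height = node.contentHeight || 0;   // leaf: height comes from content
  } else {
    let total = 0;
    for (const child of node.children) {
      layout(child, node.width);             // recurse with our width
      total += child.height;                 // children's heights accumulate
    }
    node.height = total;                     // height flows up as output
  }
  return node.height;
}

const tree = {
  children: [
    { contentHeight: 40 },
    { children: [{ contentHeight: 10 }, { contentHeight: 25 }] },
  ],
};
console.log(layout(tree, 800)); // 75 — the children's heights, summed upward
```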
But I don’t want to go into that too much. Now, that’s sort of the one-pass-through, simplified
version for the static case. Now, when you’re building applications, you end up potentially
dealing with more possibilities here. Sorry. So, when you’re writing script, there’s a
bunch of different types of dynamic changes that can happen to cause changes in this whole
pattern. So, you know, one of the simplest is adding and removing elements from the DOM,
which is something you can do with DOM APIs. That’s pretty common. Basically in that case,
you sort of run through this same pattern on the elements you added, the same static
pattern in a pretty straightforward way. So it’s not all that interesting in terms of
unusual performance characteristics. However there are a bunch of other types of changes
that have different performance characteristics. So, essentially web browsers are, in some sense,
CSS-centric, in that they’ve sort of used the design of CSS as a central part
of their architecture. So, a lot of changes that affect layout are often
indirectly CSS changes that happen because something causes the computed style of CSS
properties to change, which in turn causes the display to change. So,
sort of the simplest example of that kind of change is simply changing the style
attribute on some element. You’re pretty clearly changing
the computed style of the element. So, there are
sort of a bunch of different paths I drew here. So I’m sort of thinking about,
you know, what types of content changes there are. If you change the style attribute of an
element, you’re going to change the computed
style; there’s no way around that. However, there are some other types of changes
that sometimes change the computed style and sometimes don’t.
Like, if you change an attribute–the class attribute is pretty likely
to affect computed style, but there are some attributes that are more or less likely to
change the style. And we have some optimizations to detect whether or not they will, which I’ll
talk about in a second. Then there’s sort of a third class of changes that
actually bypass this system almost completely. One of the interesting ones there
is scrolling, which is a pretty optimized process because it’s something users do a
lot and it’s something that graphics cards are reasonably good at doing in the common
case which is in most cases scrolling down a few pixels, you can simply tell the graphics
card to move everything a few pixels up and then you manually repaint the little slice
at the bottom that appeared. So that not only avoids dealing with all of the systems
except for painting, but it also avoids even repainting anything but the little region
that changed at the bottom. Now, there are a bunch of cases where that’s actually not
the case, where we have to repaint everything. Some of the obvious ones are if you used background-attachment:
fixed or position: fixed, which basically are ways of creating something that doesn’t
move when you scroll. Then if you’re drawing something that’s a composite of things that
do move when you scroll and things that don’t, you have to repaint the whole thing. You can’t
just move bits on the screen. There’s actually a third case there that’s sort of interesting
which is when you have overflow on an element that’s–so the CSS overflow property lets
you create something that’s scrollable inside a document. If you have overflow on an element
that has a transparent background and it’s on top of something that’s
not uniform, then we again have to repaint the whole thing when you scroll. And we’ve
gotten better at detecting some of the optimization cases there. I think at this point, if you’re scrolling something that
has a transparent background but it’s on top of something that has a uniform background,
I think we’ll still optimize that, but that’s probably something that differs a good bit
across browsers and versions as well. Scrolling is sort of an interesting
side point because it’s something that you can do programmatically through the DOM. You
can change element.scrollTop and element.scrollLeft, which is often a much faster way of doing
the sort of thing people do by changing positions and using absolute positioning
inside relative positioning to move things. Some of those effects can
also be accomplished by scrolling something programmatically, and this could be something
with overflow: hidden, which still can be scrolled programmatically, and that’s sort of a way
to bypass this whole pipeline and just deal with the repainting. So then there’s this
third set of changes that sometimes cause re-computation of style and sometimes
don’t. These are things like changing attributes. And the classic case that needs to be pretty
heavily optimized is what we call event states. Now, the event state that’s important in
terms of optimization is the hover state, because :hover is a CSS selector that applies to
basically the element that’s underneath the mouse pointer
and all of its ancestors. So in theory, which elements match that selector changes every
time the user moves the mouse. So, the optimization through which we avoid doing
this re-computation of style is essentially geared towards optimizing that case, so that
we don’t have to re-compute style every time the user moves the mouse. Which is that
when an element changes whether or not it’s in the hover state,
we essentially look at all of the CSS selectors that have :hover somewhere in them. So not
necessarily in the rightmost part; we look at any CSS selector that has :hover
in any part of the selector. If that part of the selector, all
the way to the left end, matches the element, then there might be
some style change. Because you can have a selector like “:hover p” that applies
to any paragraph inside of an element that’s currently in the hover state. So we need to
check not only the rightmost part but every part of the selector, which sort of yields
the second guideline for fast CSS selectors, which is that anytime you write something
like :hover in a selector, or write an attribute selector based on an
attribute that changes a lot, it’s also worthwhile to have that part of the selector be as
specific as possible, even if it’s not the rightmost part of the selector. The reason this
is so valuable to optimize is that at that point we can check only one element. Whereas
once we decide that we need to go through and re-compute style for an element, that implies
that we’re also re-computing style for all of its descendants, both because of CSS
inheritance–because a lot of properties are inherited–and because it’s
pretty common to have selectors that select based on ancestors.
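Here’s a rough sketch of that :hover check under a toy selector representation. The data shapes and the function are invented for illustration, not Gecko internals:

```javascript
// Toy model of the :hover restyle check: when an element enters or leaves the
// hover state, the engine only needs to consider selectors that mention
// :hover somewhere, and it can reject most of those cheaply if the
// :hover-bearing part is specific. All names here are illustrative.
function hoverSelectorsToConsider(selectors, hoveredElement) {
  return selectors.filter(sel => {
    const hoverPart = sel.parts.find(p => p.pseudo === "hover");
    if (!hoverPart) return false;            // selector never depends on :hover
    // Cheap rejection: if the :hover-bearing part names a tag or class,
    // it can only start matching at an element that fits that part.
    if (hoverPart.tag && hoverPart.tag !== hoveredElement.tag) return false;
    if (hoverPart.cls && !hoveredElement.classes.includes(hoverPart.cls)) return false;
    return true;                             // might match; needs a full restyle check
  });
}

const selectors = [
  { text: "a:hover",        parts: [{ tag: "a", pseudo: "hover" }] },
  { text: ":hover p",       parts: [{ pseudo: "hover" }, { tag: "p" }] },
  { text: ".menu:hover li", parts: [{ cls: "menu", pseudo: "hover" }, { tag: "li" }] },
];
const hovered = { tag: "a", classes: [] };
const candidates = hoverSelectorsToConsider(selectors, hovered).map(s => s.text);
// "a:hover" fits the tag; ":hover p" is unspecific, so it always survives the
// filter; ".menu:hover li" is rejected because the element lacks class "menu".
```

The bare `:hover p` is the slow case the guideline warns about: it survives the filter for every element that enters the hover state.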
So we sort of just handle that all in the same code. So, it turns
out that once we decide that we do need to do this restyling, we then coalesce
as many restyles as possible. Essentially, what this means in the normal case is that
we post an event to the main event loop and say, “When this event fires, we’ll process
all the restyles that have happened between the first one and when the event fires.” However,
the web has evolved with a synchronous programming model that doesn’t
let us quite do that, because basically the expectation of script authors, and the expectation
of all the pages that they’ve written that we have to be compatible with, is that changes
take effect immediately. So, we can do things asynchronously
as an optimization. But then if a script asks for information that
depends on the thing that we’re planning to do later, we suddenly have to do that immediately
to provide the information the script wanted. And there are a lot of
DOM APIs that actually require this information. So if you’re looking at
the computed style for an element, that requires that all the style be up-to-date, and, you
know, it in fact requires that the layout be up-to-date in some cases. If you’re asking
for various properties, like offsetTop or offsetLeft, that requires that the layout
be up-to-date, which in turn requires that the style be up-to-date. So there’s a
lot of things that cause us to flush our queue of all the things that we would
like to coalesce. And that poses a potential danger to script authors, because it’s
pretty easy to write a loop where you’re making a change and then reading something that requires
that change to be flushed. Whereas, for example, if this were split into two loops, you could
read all the data that you needed and then make all the changes; you would then have
all those changes coalesced, and it would be much faster to make all of them than
if you force them to all be flushed separately. So, when we re-compute
the style for a bunch of elements, we then essentially compare
the old style data and the new style data. So we have, you know, we find out that,
for example, the CSS display property changed. The display property affects what types of
frames we construct. So if the display property changes, we need to
go construct frames and then go through the rest of the pipeline. If, say, the width property
changes, that doesn’t affect what type of rendering objects we construct but it affects
the layout and so we need to go through the pipeline from here. It could also be that,
say the color property changed, at which point–the color property doesn’t affect the first two
so we can jump straight to the third. So, when we handle all these
things that change CSS properties, depending on the property we’ll do a different amount
of work to handle that change. So, then there’s the question of how much work these different
types of things take? So, reconstructing frames for something, basically if we reconstruct
the frame for an element, we’re also reconstructing the frames for all its descendants. That’s
just an invariant that we maintain. I don’t know if other browsers do that or not. But
there’s no interesting behavior regarding the depth
of the tree, except for a few odd cases where we have to go and
reconstruct the ancestors. There are a few really strange cases, like when you
have blocks inside of inline elements, where there’s enough complicated fix-up
that we need to go a lot further up in order to redo the fix-up from the top of the tree.
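Given how much work each of these flushes can imply, the earlier advice about splitting reads from writes is worth making concrete. This is a toy model, not a browser API: `write()` stands in for a style change and `read()` for something layout-dependent like reading offsetTop:

```javascript
// Toy model of restyle coalescing: writes mark the frame tree dirty, and a
// layout-dependent read forces a synchronous flush. Interleaving reads and
// writes flushes once per iteration; batching all the reads first lets every
// write coalesce into a single later flush.
class ToyEngine {
  constructor() { this.dirty = false; this.flushes = 0; }
  write() { this.dirty = true; }                 // e.g. setting a style property
  read()  {                                      // e.g. reading offsetTop
    if (this.dirty) { this.flushes++; this.dirty = false; }
  }
}

const interleaved = new ToyEngine();
for (let i = 0; i < 100; i++) { interleaved.write(); interleaved.read(); }

const batched = new ToyEngine();
for (let i = 0; i < 100; i++) batched.read();    // read everything first...
for (let i = 0; i < 100; i++) batched.write();   // ...then make all the changes

console.log(interleaved.flushes); // 100 — every read forces a flush
console.log(batched.flushes);     // 0 — all writes coalesce into one later flush
```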
Doing re-layout or reflow is a little bit more interesting in that a re-layout is always
a recursive process running down from the top of the tree because we have this algorithm
where the widths are input and the heights are output. So if there is some change way
down in the depth of the tree, it’s possible that that change propagates out into different
heights all the way back up to the top. There are also potentially some other things
that we need to update during layout, like the regions of overflow, which are
sort of like a second rectangle. So, when we do incremental reflow, there is this
aspect that’s a function of the depth of the tree. So, this is a diagram that I stole
from a presentation a colleague did six years ago. Essentially, we call
these re-layout methods all the way down the tree, and some of them aren’t necessarily
going to be all that efficient, but they at least aren’t going to re-lay out all
of their children; they’re going to just re-lay out the child on the path to what needs
the layout. So, the cost of doing re-layout can be pretty heavily affected by the ancestors
of that element. For example, if you have an element that’s inside a floating element
that’s got a lot of floating siblings, recovering state–the state we have to recover for floating
elements is pretty substantial because we need to rebuild state in order to know where–what
areas we can wrap around and what areas not to wrap around. So the cost–the cost of a
reflow can vary a lot depending on what–depending on what something is inside, not just depending
on what it is that is being laid out again. So, the final step is repainting, where
essentially what we're doing is invalidating regions–telling the operating system that a region
is invalid; it will then come back to us with a paint event telling us to repaint that region.
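The depth dependence of incremental reflow can be caricatured with a toy model (hypothetical plain objects, not Gecko's frame classes): reflow walks the ancestor path from the root down to the dirty element, and recovering state, such as float placement, adds work at each step:

```javascript
// Toy model: incremental reflow visits every ancestor on the path from the
// root down to the one dirty node, and pays extra to recover float state.
// Nodes here are hypothetical objects of the shape { parent, floatSiblings }.
function reflowPathCost(dirtyNode) {
  let cost = 0;
  for (let n = dirtyNode; n; n = n.parent) {
    cost += 1;                    // one visit per node on the path
    cost += n.floatSiblings || 0; // rebuilding float placement state is extra work
  }
  return cost;
}
```

The point of the sketch is only that the same one-node change costs more when it sits deeper, or under float-heavy ancestors.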
So, there’s sort of this hierarchy of css properties in terms of which cause more damaging
style changes than others and this can introduce some tradeoffs. For example, if you want to
hide elements, there are actually multiple ways that you can hide an element. You can
change it to be display:none and, like I said earlier, making something display:none means
we're not going to construct any rendering objects for it. So if you change something
to display:none, we're going to destroy all the frames for it. And then if you change
it back from display:none to whatever it was before, we then have to rebuild all the frames,
lay them out over again and paint everything. If in turn, you hide something with a visibility
property, you don’t incur any of those costs because the visibility property doesn’t affect
frame–doesn’t affect the frame tree, it doesn’t affect the layout. But you have slightly higher
costs in terms of what you’re doing every time. In other words, the tradeoff between
display and visibility is essentially that with visibility, changes are cheaper but the
overall cost while it's not displayed is higher, because with the visibility property you still
have the rendering object, you still have to do all the layout but then you just don’t
paint it. So, I want to–I want to go back now and talk about some implications of these
four things for ways you can test. This is one of the things Lindsey asked me
about when he asked me to give this talk: people were thinking about, you know,
what types of things are useful in terms of testing performance of web pages. And some
of that depends on what it is you want to test but–so, one example is, if you want
to figure out essentially what the cost of building the frames and laying them out is,
for some particular piece of content, you could do something like the following: you
could set the element–or maybe it's the body element–to be display:none, then you could
get the offsetTop property of some random element in the tree, which will in turn flush
all the style changes and flush the layout, so that you've essentially flushed the buffer
of what's queued up. Then you get a time stamp, then set the thing that you had set to
display:none back to its original value, then again access offsetTop of some random element in
order to flush everything. What that'll do is–getting offsetTop will flush
all the style changes, and it will recreate the frames and lay them out again. And then you
can look at another time stamp to see essentially how heavy that
chunk of markup is. Now, doing that within something that's dynamic could potentially
throw in some confounding factors, because by forcing these flushes you're also splitting
up things that could potentially be coalesced within a real application. So, likewise, something
I talked about a little bit earlier is dealing with the cost of incremental layout. You know,
you–one of the things that might be interesting to test in terms of layout is how expensive
some structure is in terms of the–its performance effect on re-layout of what’s inside of it
because like I mentioned, the layout process depends on the depth of the tree. So you can
do, again, similar things by, you know, checking an offsetTop, making some small change, and
then seeing if different structures take shorter or longer
amounts of time to handle a re-layout. I'm sure there's lots and lots of other examples
here, but those are just a small number. And hopefully I’ve–hopefully I’ve given you some
ideas here for other things that you can test. And I’m certainly open to questions about
pretty much anything I talked about. Thanks. And I was told if you have questions you should
use the microphone.>>Hello. Well, thanks for the great presentation.
I do have a question. So, a while back you suggested using overflow:hidden and changing
the scroll position in order to move things around…>>BARON: Yes.
>>…but I found in the past, when I've done that, it causes unrelated elements on the
screen to kind of have stuff flash behind them. Do you know if there's a reason or
workaround for that, or…?>>BARON: I don't know. I'd be interested
to see a test case for that. It’s really not something that should happen. I don’t know.
>>Okay. I was just curious. Thank you.>>Thanks for the great talk. I have a question
about absolute positioning and what kind of optimization you do to not reflow the rest
of the tree when you move it around, for example.>>BARON: So, absolute positioning is sort
of interesting in that it’s–so absolute positioning is–the CSS specification defines this concept
that it calls the Containing Block where–that sort of the–in–for normal elements, it’s
the nearest block level ancestor. But for absolutely positioned elements, it’s the nearest
relatively positioned element or the view port if there is no containing relative–oh,
the near–sorry, the nearest positioned element whether it’s relatively or absolutely positioned
or the view port. So, for absolutely position–so, when we build the–this–when we build the
frame tree, the absolutely positioned element is a child of its containing block. So, for
absolutely positioned elements that are positioned relative to the viewport–in other words,
if their containing block is the viewport–then in our implementation, their
parent is the viewport. So there's essentially no structure that you have to
delve down through in order to get to them. But if an absolutely positioned element is
inside a relatively positioned element that’s inside some complicated structure, we actually
are going to go all the way down through that structure to the relatively positioned element
and then jump from there straight to the absolutely positioned element.
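The containing-block rule Baron describes can be sketched as a lookup over a toy tree (hypothetical plain objects, not the actual frame tree): walk the ancestors until you hit a positioned one, otherwise fall back to the viewport.

```javascript
// Toy model: find the containing block for a position:absolute element.
// Nodes are hypothetical objects of the shape { parent, position }.
function containingBlock(absNode) {
  for (let n = absNode.parent; n; n = n.parent) {
    // Any positioned ancestor (relative, absolute, or fixed) qualifies.
    if (n.position && n.position !== 'static') return n;
  }
  return 'viewport'; // no positioned ancestor: the viewport is the containing block
}
```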
>>All right. Thanks.>>BARON: Sure.
>>So, you discussed the performance differences between hiding an object using display:none
versus visibility:hidden. So that sounds sort of like an implementation detail–or is
that actually–is that behavior somehow part of the standard in…?
>>BARON: So, in terms of visibility, it sort of is part of the standard, because visibility
is something that can be overridden by descendants. So, inside something that's visibility:hidden,
you could actually have something that's explicitly visibility:visible and then it suddenly appears
again. So it is effectively part of the standard that elements with visibility:hidden
need to be laid out. As far as display, it's not
strictly part of the standard, but if something is display:none, you don't know what display
value it would have if it weren't none, so if you were
to try to lay it out, you wouldn't know what display type to give it, because what type
of rendering objects you construct and how you lay it out are a function of its display
value.>>So if you were building a three column
layout for performance, do you pick tables or do you pick floats? It sounds, from what
you said, like you distinguish table elements as being very optimized compared to other things. I don't
know what your [INDISTINCT] is.>>BARON: Well, tables and floats are both
reasonably well optimized although probably for different cases. I tend–when I’m trying
to do a layout, I tend not to be worried so much about the performance aspect. In terms
of floats versus tables, I worry more about what can actually do the layout I want because
usually there’s only one answer that I can come up with at which point I just run with
it. I think a lot of it depends on what–like I don’t think there’s one answer I would give
for that. Like I think it depends on what exactly you're trying to do: what you want
to be flexible, what you know the widths of, and so on.
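The "what you know widths of" point connects to intrinsic width computation, which comes up again a couple of questions later. As a sketch (using character counts as a stand-in for real font measurement), the two intrinsic widths of a run of text are:

```javascript
// Sketch: the two intrinsic widths of a run of text, with character counts
// standing in for real text measurement.
function intrinsicWidths(text) {
  const words = text.trim().split(/\s+/);
  return {
    max: words.join(' ').length,                // preferred: everything on one line
    min: Math.max(...words.map(w => w.length))  // minimum: the longest unbreakable word
  };
}
```

Per the later answer, a table cell never shrinks below the minimum one, so the engine has to compute it even for cells with an assigned width, unless fixed table layout is in effect.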
>>Question. Near the end, you were talking about, if we were doing timing, we should
start up the clock, do whatever it is you want to measure, and then stop the clock. I found
that any concept of timers or wall clock is incredibly jittery with regards to browsers,
is there any other measure of work I can use in order to time it?
>>BARON: Not that I know of. So, I think there have been some improvements to
the accuracy of things like that recently. It used to be very inaccurate on Windows, but
I think that's fixed now–that is, in Mozilla on Windows. My solution to timing things when
the timers are inaccurate is always to just do it more times.
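"Just do it more times" can be wrapped up as a small helper (a sketch; the clock is injectable so a coarse timer gets averaged over many runs):

```javascript
// Sketch: amortize coarse timer granularity by repeating the operation
// many times and dividing by the run count.
function timePerRun(fn, runs, now = Date.now) {
  const start = now();
  for (let i = 0; i < runs; i++) fn();
  return (now() - start) / runs;
}
```

In a page you would call something like `timePerRun(() => element.offsetTop, 1000)` and trust the average more than any single reading.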
>>Do you have performance benchmarks for layout and rendering that you test
against as you make improvements?>>BARON: That we use on the–yes. So we have
a bunch of different performance benchmarks that we keep track of. Some of them are just
page loading benchmarks where we’re essentially timing the load of a whole set of pages that
were downloaded at some point and which we've, say, archived for that benchmark. We also have
some other benchmarks that are testing particular things–a bunch of constructed tests to test
DOM performance and graphics performance. We also have some benchmarks
for application performance as well. But those are probably the key benchmarks that we’re
looking at all the time. There’s also, you know, people look at specific test cases as
well but not for tracking purposes.>>Coming back to Lindsey's question about
tables versus floats for horizontal layouts. What if you have deep nesting? Like if you
want to create like a general–generally just a structure for laying the things out horizontally,
is deep nesting of tables generally more expensive than deep nesting of floats or…?
>>BARON: I can’t think of the top of my head why one of them is going to be worse than
another immediately. One factor is going to be that in some–so in some cases, tables
depend more–require more intrinsic width computation in that table cells have–so they’re
sort of–any piece of content has two intrinsic widths. One is sort of–the simple way to
think about it is using a paragraph as an example. If you layout all the texts in a
paragraph on one line, that's the larger of the intrinsic widths; the smaller of the two
is the width of the longest word in the paragraph.
And then you can sort of extrapolate those intrinsic widths outwards. So, table cells
have a rule that they will never go smaller than the smaller of those intrinsic widths
which means they always have to compute that one, even if you’ve assigned them a width.
They will still check and compute it and make sure that they don’t go below it, unless you’re
using fixed table layout, in which case you don't have to deal with that. Now, with deep
nesting of tables, it depends on the browser: basically, whenever you have deep nesting
of things that require intrinsic
width computation, some browsers are going to respond pretty badly–in particular, Mozilla
before Firefox 3, so Firefox 2 or earlier. And I suspect Internet Explorer also, because there
are two fundamentally different designs for how to do this intrinsic
width calculation. And basically, between Firefox 2 and 3, Gecko changed
from one to the other. So that now we don’t have–there’s not as much of a penalty for
dealing with deeply nested tables, because we've essentially separated the two. But back
in Firefox 2, intrinsic width computation was essentially also treated as
a layout pass, where we would sort of say “do a layout at some arbitrary width”–you know,
at infinite width, essentially. And so the process of doing
that would destroy the information that you had from the normal layout. So if you had a
series of deeply nested things that all needed intrinsic width information, you could get
into trouble essentially throwing away the layout to compute an intrinsic width and then
having to rebuild it multiple times as you recursed down and up the tree in order to
lay things out. So that’s one reason you could get in trouble with deeply nested structures,
although that’s–that should be much less of a problem in Firefox now and shouldn’t
be a problem in WebKit or Opera.>>You had also mentioned dynamically altering
an element that has a float element as an ancestor as being potentially expensive, because
more contextual information has to be recomputed.>>BARON: Well, it's more
any–really any change where there’s a bunch of floats somewhere along that path. Because
essentially, when we're doing layout
on an element because one of its descendants needs to be laid out again, we still have
to sort of look at each one of its children and say “Does this child need to be laid
out? Does this child need to be laid out?” and so on. And if that child is a float, then
there's a bit of information that we deal with. So if something along the path has a
lot of float children, that might be a problem. But some of those
problems are things that only show up if you're, say, using single-pixel divs to build a
1,000 by 1,000 image. People who do that tend to find all these performance problems in browsers
that nobody else finds. So, you know, some of those things might be things you only
hit with very large numbers of children. Anyway, thanks.
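The display:none timing experiment described near the end of the talk can be sketched against the standard DOM API like this (the target, probe element, and clock are all parameters; reading offsetTop is what forces the flush):

```javascript
// Sketch of the measurement Baron describes: hide a subtree, flush pending
// style/layout work, then time how long un-hiding takes (frame construction
// plus layout). Reading offsetTop forces the browser to flush. The clock is
// injectable (`now`) so the sketch can be exercised outside a browser.
function measureRebuild(target, probe, now = Date.now) {
  const saved = target.style.display;
  target.style.display = 'none';
  void probe.offsetTop;            // flush: frames destroyed, layout updated
  const start = now();
  target.style.display = saved;    // restore; frames must be rebuilt
  void probe.offsetTop;            // flush again: rebuild + re-layout happen here
  return now() - start;
}
```

In a real page you might call it as `measureRebuild(document.body, document.getElementById('anything'))`, bearing in mind the caveat from the talk: forcing these flushes splits up work the browser might otherwise coalesce.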


  3. On first viewing I thought of David Baron's presentation style as pretty dry, but after watching it again it's pretty OK. Since the content is very interesting, here's a wrap-up (split because of the 500-character limit)

  4. – The browser's internal representation of the HTML document looks similar to the one exported by the DOM

    – 07:40 There is a "rendering tree" (or "frame tree") which represents the rendering areas of the document. It consists of rectangular areas.
    There is not a 1:1 correspondence between DOM nodes and nodes in the rendering tree.

  5. – 09:45 Complexity of parsing a document is not linear because it is done incrementally. For instance, elements with many child nodes can end up with n^2 complexity. Reason: layout is applied to the "whole", so as parts are added incrementally, layout is redone on the "whole" every time.

    – 11:56 Scripts block rendering until they are loaded, because they are supposed to be inlined into the document right at the point they are loaded in.

  6. – 13:28 Stylesheets block construction of the rendering tree until they are loaded, because they affect the visual result. However, the DOM tree is still being built.
    – 14:30 "Flash of unstyled content": scripts access layout information before CSS is loaded.
    – 15:30 CSS selectors: an explanation of how they work, what you can do with them, and how they map onto the DOM tree. Matching is complicated and slow on deep documents.

  7. – 20:55 Optimisations undertaken by the browser on CSS selectors. What you can do: the rightmost part of the selector expression should be as specific as possible.
    – 28:00 Dealing with scripts that change style in the DOM/layout. FF can detect when changes to the style will require redoing the layout.
    – 30:00 Scrolling is usually optimised. The position:fixed style inhibits scrolling optimisations. Overflow with transparency above non-uniform elements disables scrolling optimisations.

  8. – 32:35 Event states/hover: selectors should be specific to avoid unnecessary layout computation.
    – 36:30 Problems with deferred repainting and coalescing: scripts asking for information that is available only after redoing the layout force the browser to do the computation immediately, which slows scripts down. He gives tips and best practices on how to avoid that.
    – 41:40 "The cost of re-laying out an element is affected by the type of element the ancestor is"

  9. – 43:16 Hiding elements: display:none vs. visibility affect rendering performance in different (and non-obvious) ways.
    – 49:00 Optimisations with absolute positioning: absolutely positioned elements within other absolutely/relatively positioned elements cause more computation.
    – 52:00, 55:50 Tables vs. floats for layout performance: it depends; width computation on deeply nested elements is expensive.
    – 54:30 Benchmarking
