Hugepages on Linux…Yes You Can Actually Use Them

March 3rd, 2010

Memory on a computer is broken up into pages. Pages used to come in a single size (eg: 4096 bytes). For various reasons, there are now many page sizes. Depending on the hardware, in addition to a “base” page size you might be able to declare that certain parts of memory should be treated as pages of different sizes (eg: 8K, 64K, 256K, 1M, 4M, 16M, 256M…). The reason you’d use these “large” or “huge” pages is that in some cases they improve application performance by minimizing low-level overheads (bigger pages mean fewer page-table entries and fewer TLB misses for the same amount of memory) and maximizing low-level optimizations.
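To put rough numbers on the overhead argument, here is a back-of-the-envelope sketch in Python. It counts how many pages (and so how many page-table entries, or TLB entries to cover the whole region at once) it takes to map a 1 GiB region at a few of the page sizes listed above, plus the TLB “reach” of a hypothetical 64-entry TLB. The 64-entry figure and the helper names are assumptions for illustration, not the spec of any particular CPU.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(region_bytes: int, page_size: int) -> int:
    """Pages (i.e. leaf page-table entries) needed to map a region."""
    return -(-region_bytes // page_size)  # ceiling division


def tlb_reach(tlb_entries: int, page_size: int) -> int:
    """Bytes a TLB can cover at once before it has to miss."""
    return tlb_entries * page_size


if __name__ == "__main__":
    region = 1 * GIB
    hypothetical_tlb_entries = 64  # assumption for illustration only
    for size in (4 * KIB, 64 * KIB, 16 * MIB, 256 * MIB):
        print(f"{size // KIB:>8} KiB pages: "
              f"{pages_needed(region, size):>7} pages, "
              f"TLB reach {tlb_reach(hypothetical_tlb_entries, size) // KIB} KiB")
```

For 1 GiB that works out to 262,144 pages at 4K versus 64 pages at 16M. With the hypothetical 64-entry TLB, 4K pages cover only 256 KiB at a time, while 16M pages cover the full 1 GiB.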

All nice in theory, but it’s been a highly manual choice that a programmer has to make: Which size do I make which pages? Knowing the answer probably requires more computer micro-architectural knowledge than most software developers have. Then there are all sorts of practical complications: getting hugepages, managing how many pages of each size are available for use, and staying robust when that availability changes. Because of this complexity, very few programs have used hugepages; in the past, changing some of the relevant system configuration settings essentially required a reboot. It’s just been too hard for the programmer unless they’ve got a fairly simple use of memory, like the large, statically sized shared memory region of a database (eg: DB2, Oracle, MySQL).

Things have progressed massively, though. Huge pages on Linux have become much easier to use, thanks especially to kernel and userspace utility work that Mel Gorman has spearheaded over the past couple of years. Mel’s got a series on LWN discussing these advances. The articles cover some of the technical details of how hugepages benefit your programs and how to use libhugetlbfs with your applications. They also show how new utilities in libhugetlbfs let you quickly and easily manage multiple pools of pages of different sizes, test whether your application benefits from different possible hugepage backings, and actually exploit those benefits without rewriting your application specifically for huge pages.
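On Linux, the state of the default huge page pool shows up in /proc/meminfo, and its size can be set through /proc/sys/vm/nr_hugepages; kernels that support several huge page sizes also expose per-size pools under /sys/kernel/mm/hugepages/. libhugetlbfs’s hugeadm utility wraps this kind of pool management. As a minimal sketch, assuming only the standard HugePages_* and Hugepagesize fields, this Python function reads the default pool’s counters. hugepage_pool is a hypothetical helper name, not part of libhugetlbfs.

```python
def hugepage_pool(meminfo_text: str) -> dict:
    """Extract the default huge page pool counters from /proc/meminfo text.

    HugePages_* values are page counts; Hugepagesize is converted from kB
    to bytes. Only the default huge page size is reported in this file.
    """
    wanted = {"HugePages_Total", "HugePages_Free", "HugePages_Rsvd",
              "HugePages_Surp", "Hugepagesize"}
    pool = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        key = key.strip()
        if key not in wanted:
            continue
        fields = rest.split()
        value = int(fields[0])
        if len(fields) > 1 and fields[1] == "kB":
            value *= 1024  # report sizes in bytes
        pool[key] = value
    # Reserved pages are promised to existing mappings but still counted as
    # free, so they are not available to a new allocation.
    if "HugePages_Free" in pool and "HugePages_Rsvd" in pool:
        pool["available"] = pool["HugePages_Free"] - pool["HugePages_Rsvd"]
    return pool
```

On a Linux machine you’d call it as `hugepage_pool(open("/proc/meminfo").read())`. The function takes the file’s text rather than opening it itself, so it is easy to test against a captured sample.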

While you might want the holy grail of an omniscient operating system that automagically always has your application backed by the “right” page sizes in the right places, what we have now can truly be considered user-ready.

Specifically, the first part covers the background of how hardware and low-level operating system software deal with memory. The second part gives a quick introduction to some of the interfaces available for using huge pages with the different types of memory regions an application can have (eg: text, data, BSS, heap, stack, shared memory, anonymous mmaps).
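For shared memory and mmap’d regions, one of these interfaces is hugetlbfs: a file created on a hugetlbfs mount and then mmap’d is backed by huge pages. Below is a minimal Python sketch under stated assumptions: the mount point /dev/hugepages (common, but configured per system), a caller who passes the mount’s huge page size, and the hypothetical helper names round_up, is_hugetlbfs and map_region. It falls back to ordinary pages when huge pages aren’t available, which is the kind of robustness to changing availability mentioned earlier.

```python
import mmap
import os


def round_up(length, page_size):
    """Round length up to a whole number of pages (hugetlbfs needs this)."""
    return -(-length // page_size) * page_size


def is_hugetlbfs(mount_point):
    """True if mount_point is listed in /proc/mounts as a hugetlbfs mount."""
    try:
        with open("/proc/mounts") as mounts:
            for line in mounts:
                fields = line.split()
                if len(fields) >= 3 and fields[1] == mount_point \
                        and fields[2] == "hugetlbfs":
                    return True
    except OSError:
        pass  # no /proc/mounts, so probably not Linux
    return False


def map_region(length, hugepage_size, mount_point="/dev/hugepages"):
    """Map `length` bytes, backed by huge pages if possible (hypothetical helper).

    Returns (mapping, used_hugepages). Falls back to ordinary anonymous memory
    when there is no usable hugetlbfs mount or the huge page pool is exhausted.
    """
    size = round_up(length, hugepage_size)
    if is_hugetlbfs(mount_point):
        path = os.path.join(mount_point, f"demo-{os.getpid()}")
        try:
            fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
        except OSError:
            fd = -1  # e.g. no permission to create files on the mount
        if fd >= 0:
            try:
                os.unlink(path)  # the mapping keeps the pages alive after unlink
                os.ftruncate(fd, size)
                return mmap.mmap(fd, size), True  # mmap dups fd internally
            except (OSError, ValueError):
                pass  # e.g. not enough free huge pages in the pool
            finally:
                os.close(fd)
    return mmap.mmap(-1, size), False
```

Rounding the length up to a whole number of huge pages matters because hugetlbfs mappings can’t be partial pages; with 2 MiB pages, asking for 3 MiB gets you 4 MiB either way.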

And…as a teaser:

Subsequent articles in the series will get into the key commands for hugepage pool management, running an application with hugepages in different configurations, and how easy it now is to test/profile an application workload to see if hugepages would help. Mel’s written up specific examples of testing a couple of well-known benchmarks with hugepages and discusses how (and how much) hugepages help certain types of workloads.

Jeremy Allison on Sun’s death

March 3rd, 2010

I’ve poked at Sun repeatedly in past blog posts. Jeremy Allison isn’t with Sun now, but he once was, and he understands why Sun did the things it did.

Linux systems from Red Hat and others ate Sun up from the inside out, by colonizing their customer base. Sun vs. the Linux world is a wonderful example of the weakness of proprietary licensing and trying to maintain control over software versus the GNU General Public License (GPL) and decentralized development model that Linux uses.

It’s not just Linux v. Solaris though. The same issues around user/developer community (ie: market) came up with Java, OpenOffice, MySQL and SPARC. Open source people talk about “community” and there are all sorts of ways to think about that vague term. But in the end, not being “community friendly” equates to not being friendly to your customer base, and it hurts your market share over time. Jeremy’s blog post gives some nice examples of this.

It’ll be interesting to see what Oracle does going forward.

Great February for solar power in Oregon

February 28th, 2010

Our panels produced 173 kWh this month. I don’t actually know what our consumption was, but this probably works out to something like 25% of it. Not bad for the middle of winter in our rainy, gloomy region.

Chickens made it one week!

February 26th, 2010

We’ve made it one week. Today the birds are starting to establish a pecking order: pecks on the head and standing tall, posturing in front of each other. One took a hopping flight across the box, pulled its feet up under it, landed on its butt, bounced, then stopped and looked around with a WTF? sort of look on its face. They’re pretty funny to watch!

Chicks (day #6)

February 25th, 2010

More attempted roosting on the feeder. More wing flapping and even semi-coordinated jumping and flapping that sends them in the direction they appear to intend to go, albeit with the occasional veering to the side into another bird or the box wall.

This picture gives an idea of their wingspan now. On Saturday they had nothing but the littlest stubs of arms.

Chickens (day #5)

February 24th, 2010

The birds are starting to try to fly. They have spurts of energy where they run and flap and really get moving.

As with every day, there’s a bit more feather and less down. Their feet seem suddenly bigger.

They’re starting to sit on the feeder sometimes as if it were a perch. I think I might make a small perch for them, even though they’re clearly not quite coordinated enough to get up on it or stay on it. If they get to where they can sit on a perch instead, I’m thinking that’ll keep the feed cleaner. Diapers clearly aren’t a viable option, and I’m guessing you can’t litter train these critters.

The Rhode Island Red is starting to seem like the leader. The others tend to bed down, eat, drink or just run somewhere to look at something shortly after she does.

I put the water bottle on a piece of 3/4 inch thick board to lift it up off the pine shavings and it’s stayed much, much cleaner through the day.

Chickens (day #4)

February 23rd, 2010

Main thing about the birds today: they’re getting messy. They’re all about eating and pooping and kicking the pine shavings around. The first day we needed to clean their water once a day. Now it’s definitely every couple of hours. And no sooner do you clean something than they’ve pooped on it. Where are the chicken diapers? I guess those would need changing every three minutes. Still, it’s easier than having a litter of puppies or kittens.

Multicore and no cache coherency

February 22nd, 2010

The talk today in Portland State’s CS colloquium series featured Intel’s Tim Mattson talking about parallel programming and Intel’s research chips.

I’d followed Intel’s press releases on their Terascale chip with a lot of interest. It turns out that was meant purely as a research chip for testing some hardware, with maybe five people ever having written software for it, and that seemingly as an afterthought so marketing would have something more to say about the chip.

In just the last two months Intel Research has been making some press with their SCC chip (“Single Chip Cloud” computer…what a marketing name!). This one they’re aiming to actually get into the hands of researchers. They’ve got a bare metal mode, a full Linux kernel per core, and Microsoft has announced something or other too. It’s particularly interesting, though, in that it is designed around message passing and does not provide cache coherency. It should spawn some interesting academic research in the coming year or two.

Mattson’s definitely of the mind that the way to deal with some of the central issues in scaling is to stop trying to have cache coherence. He makes pretty straightforward arguments. The parallels with distributed computing, going back even a couple of decades, are interesting, both in basic programming and in reliability assumptions.
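To make the message-passing style concrete, here is a minimal Python sketch. It is a generic illustration, not the SCC’s own programming interface, which isn’t covered here. Each worker only touches its private chunk of data and reports its result to a root by sending an explicit message on a queue. The root treats a worker that stays silent past a timeout as failed rather than waiting forever, echoing the distributed-systems reliability assumptions. Python threads do share one address space, so the “no shared state” rule is enforced only by convention in this sketch; worker and gather_sum are hypothetical names.

```python
import queue
import threading


def worker(rank, chunk, outbox):
    """A 'core': computes on private data, shares results only by message."""
    local_sum = sum(chunk)
    outbox.put((rank, local_sum))


def gather_sum(data, n_workers, timeout=1.0):
    """Scatter data to workers, then gather partial sums by message.

    Returns (total, missing_ranks). A worker that does not report within
    `timeout` seconds of the previous message is treated as failed.
    """
    inbox = queue.Queue()
    for rank in range(n_workers):
        chunk = data[rank::n_workers]  # slicing makes a private copy
        threading.Thread(target=worker, args=(rank, chunk, inbox),
                         daemon=True).start()
    total, heard_from = 0, set()
    for _ in range(n_workers):
        try:
            rank, partial = inbox.get(timeout=timeout)
        except queue.Empty:
            break  # treat silence as failure instead of blocking forever
        total += partial
        heard_from.add(rank)
    missing = sorted(set(range(n_workers)) - heard_from)
    return total, missing
```

When every worker reports in, `gather_sum(list(range(101)), 4)` returns `(5050, [])`. The only coupling between workers and root is the queue, which is the property a non-coherent chip forces on you.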

One point that struck me: he said that not many programmers are used to thinking in distributed terms and that most prefer a shared memory model. Probably most of HPC is looking at cache-coherent single system images at the dozens-of-cores scale…the scale of these research chips. Mattson comes from a chemistry background, and certainly a number of people in the audience were HPC types. But maybe their marketing name actually has a bit of foundation in looking at the web space instead of HPC. Today’s really popular web applications are backed by a distributed software model using low-end commodity systems that are assumed to be failure prone. So there’s a whole generation of programmers who take it for granted that if you scale up much, you need to take the time to architect in a distributed way…they don’t just scale by adding/allowing parallelism and assuming they’ve got a giant machine with a single address space. Probably most of them don’t even program at a level where they know or care what an address space is!

The other thing I took away is that I should probably be paying a little attention to OpenCL.

Chickens (day #3)

February 22nd, 2010

Today was a routine day for the chicks. It’s amazing how fast they’re growing. They’re eating and pooping more. They are stretching their legs and arms and necks, which are all notably larger. Their wings are already increasingly covered by regular feathers. Tonight they’re starting to show loose tail down and a wee bit of normal tail feather peeking out, especially when they stretch and point their tails out.

There are a couple of notable changes in behavior today. The little chickies are acting more like hens: scratching the ground, pecking the ground and even starting to assert themselves and peck each other a little. And on occasion they’re sitting like a hen, as opposed to falling asleep, slowly falling over forwards onto their faces and sleeping with their arms splayed out at their sides. They’re also preening themselves a lot, which makes sense given how much they’re swapping out down for feathers.

The Rhode Island Red took a nap in Jenn’s lap.

Chickens still alive

February 21st, 2010

Day two with the chickens has been pretty straightforward. They like to mess their water up. They’re eating, drinking, sleeping, running around like spazzes and then falling over on their faces asleep. Ms. Brahma got pasted-up and I got to learn how to address that…hopefully I did the right thing.