This is the report I'm submitting to my department, which I put here so that there's a public place I can link to.
The USENIX-LISA Conference: A Trip Report
From December 6 through 9, I attended LISA '11: 25th Large Installation System Administration Conference in Boston, Massachusetts. This conference is total immersion, day and night, time away not just from work but from life in general, where you swim in the sea of technogeekery until you forget that there is any dry land. I'm somewhat peripheral to this world, since I don't myself have to plan for data storage scalability nor is it likely I will ever use or select or even document a configuration management tool (for example), but one of the frustrations of my job as a technical writer is that I have very little opportunity to learn what is happening outside of the very specific things I document. Even more importantly, I have very little opportunity to meet the people who spend their worklives using the sort of documentation I write, and that is why I try to go to LISA and maintain the connections I make there.
SPECIAL NOTE: After writing this report, I found out that the LISA conference site has been posting video and audio recordings of many of the talks. If you want to check out any of these talks further, go to this site to see what's available: http://www.usenix.org/events/lisa11/tech/
I spend my time at LISA looking for a hook for this report, a specific incident that encapsulates what this conference is about beyond the talks and informal sessions themselves. (The conference also includes many training classes, but I don't attend them.) I found it on Thursday night, heading towards 1 am in the conference hotel bar. I was sitting with the Web Hosting Department team from the Boston publishing company Scholastic, and they were buying me a drink. I say a "team" because this was a group of system administrators and developers and their manager -- the manager of a group that would traditionally be from at least two departments.
So point one is: The theme of this conference was "DevOps" -- more on this later -- which, in summary, is a general term for the cooperation between developers and operators as the paradigm for the direction that system administration is moving. This group was at the conference to present a session on "Fixing the Flying Plane: A Production DevOps Team", and they were celebrating that the talk had gone well. I hadn't seen the talk, so they were filling me in on the history of their group and how it operates.
Point two is: Sitting around a hotel lounge drinking with a DevOps team is a way of learning about their work and their lives that is unusual and, to my thinking, ideal.
Point three is: Wait, why were they buying me a drink? (They would have bought me many drinks, if the bar hadn't kicked us out at 1 am, and they were arguing among themselves over who would have the privilege of paying for it.) Because the previous night, after a very long evening attending a couple of technical sessions at which Red Hat had a strong presence, and then talking informally with a group of other Red Hat employees (one of my long-term conference friends told me the next day that I had been "on the clock" until way way too late), I went over to one of the "Vendor BOFs" -- basically parties hosted by companies trying to publicize themselves, in this case Dropbox. Some of my friends were sitting with these folks, and they called me over and we all talked for a while and I sang a song, so I was a familiar face and we were already on friendly terms.
What I'm describing was something of a party, yes, but the conversation (as most of the conversations are at this event) was very much about our work. In a field that defines itself through process and structure and tools, you can forget the importance of more open-ended ways of learning.
LISA, Red Hat, and Me
This year, there was a far larger Red Hat presence at the LISA conference than there has been before. There was even a thread on memo-list in which all the attendees were weighing in. I think this is good in general, both for the attendees and for Red Hat's reputation. In fact, there is a thread on memo-list this week about increasing Red Hat's presence at technical conferences in general, for publicity reasons.
One of the evening "BOFs" (Birds of a Feather sessions, informal meetups) was on Gluster, given by John Walker. There was a large Red Hat contingent there, and it was my first general introduction to GlusterFS. Since the point of the BOF was to talk about Gluster and its purpose and its history to an extremely interested audience, it was as good an introduction to the product as I can imagine. It was also good to meet the other Red Hatters; I often feel isolated in Minneapolis.
The day before I went to LISA I went to Red Hat's Westford Office for the day, where I ran into Ric Wheeler, who told me that he would be attending the file system BOFs at the conference. I told him that I'd see him at the Gluster BOF and he said that he'd see me at the pNFS BOF as well. I said I hadn't planned on going to the pNFS BOF and he repeated that he'd see me at the pNFS BOF as well. I got the hint -- I hadn't realized there was a Red Hat connection, but it turns out that a significant part of what people cared about was which Red Hat releases support NFSv4.1 and which aspects of pNFS they support. Ric provided that information.
During the Q&A session after the opening talk, I got up to the microphone to make a point and introduced myself as a "technical writer at Red Hat". What this meant is that for the rest of the conference people came up to me to talk about their Red Hat setups -- not with questions or problems (mostly), but as a way of making small talk. I learned to smile and nod and pretend I understood the technical ramifications of their concerns. But, oddly and uncomfortably, a couple of people came up to me to complain about technical writers in general. I wasn't sure how, exactly, to respond, but as I say, I had already learned to smile and nod and look as if I understood. Still, I was making a point of making my (and Red Hat's) presence felt.
Oh, and I should point out once again that in the larger sysadmin world, the fact that I wrote the LVM manual seems to be my big claim to fame.
One of the more interesting comments in the strangers-approaching vein was from a man named Alan Kraft at USPTO (he gave me permission to use his name). On hearing that I was a technical writer from Red Hat, he wanted to know why Red Hat was not sponsoring the sort of small system administration special topics booklets that the SAGE organization (the USENIX sysadmin spinoff) provides. It was his advice that this could help Red Hat, particularly as the booklets would talk about technical configurations that Red Hat supports. I'm not sure where to take that suggestion, good as it is, since we are barely treading water in providing what we need to provide for RHEL customers, but on the whole I liked both the idea and that he cared about it (and us).
I was not the only technical writer there. Janice Gelb of Oracle (formerly of Sun), who has previously given talks at this conference on documentation, oversaw a "Guru" session on documentation for system administrators that I sat in on. Some advice she had for administrators working without technical writers:
- Provide templates for procedures.
- Don't use spreadsheets.
- Separate out sections of a wiki that are regularly maintained from comments sections, which can confuse people about what has been fixed.
- Convince people it is in their self-interest to document.
- Check out the book "Read Me First! A Style Guide for the Computer Industry".
DevOps: The Conference Theme and Three Talk Summaries
Although this is not, specifically, its purpose, the LISA conference provides what is usually the only time a system administrator has to stand back from day-to-day tasks and projects and consider more general aspects of the profession, and to consider his or her place in the field. This can be inspiring, and energizing, and is, I think, why many people I know make a point of attending LISA year after year. This holds true especially for me, even though I am not a system administrator; attending my first LISA conference is what made me feel that what I'm doing each day connects me with a larger community.
The conference chairs, Doug Hughes and Tom Limoncelli, were very clever this year in bringing the subtext out into the open and giving the conference a theme that could be summarized as "where is the profession of system administration heading as developing technologies change the technological landscape". The answer to this question is a jargon word: "DevOps", which was a new term to me (which just emphasizes how important it is for me to attend conferences like this, so I can learn these words).
OK, then, what is DevOps?
DevOps Talk 1:
SRE@Google: Thousands of DevOps Since 2004
Tom Limoncelli, Google
Fortunately, I was able to attend an unscheduled talk the night before the conference that was given by conference chair Tom Limoncelli. Tom works for Google, and his talk (originally given at a Perl conference) summarized the processes they have been putting in place there.
Tom's talk began by noting that in the 80s, software development used a "waterfall methodology": Developers were separate from operators, and sent their work down a procedural waterfall, with no reverse direction. There were no bugs because there was no bug tracking. But by around 2005 the paradigm no longer worked. Competition grew more intense and uptime increased in importance. Feature one-upmanship was important as well, but you also want to be reliable.
The concept of DevOps is a paradigm that lets you break through the waterfall. It is a developmental process that requires interaction between developers and operators. At Google there are no "system administrators"; instead there are SREs: Site Reliability Engineers. Their focus is not on administration, but on reliability. Developers run their own services. SREs can get assigned to a development group, but only after some time and at a point of hand-off readiness.
I can't say that I entirely understood what this means in terms of the day-to-day job of system administrators in companies that don't produce the sort of applications that Google does, but I think that was the point, to consider what it could mean for system administrators to take a more active role in the reliability of a company's product than previous working models allowed for. It's sort of a pie-in-the-sky thought game, I think, which is why you need to go away for a few days even to think about it.
DevOps Talk 2:
The DevOps Transformation, by Ben Rockwood of Joyent
With Tom's talk as background, I felt armed to attend the keynote address: The DevOps Transformation, by Ben Rockwood of Joyent. Ben said that we are seeing a transformation in the culture of system administration.
What, then, is DevOps? A cultural and professional movement. It is not a tool (thing), and it is not a title (person). DevOps is:
- Collaboration of people
- Convergence of Process
- Creation of tools
In that order.
Ben talked about issues of motivation, and collaboration, and thinking about the "why" that's at the core of any development rather than the "what" that is the tools and services themselves. Dev requirements and ops requirements have traditionally been "siloed" -- each in its own silo. This is related to Tom's notion that traditionally there was a waterfall, down from dev to ops. Either way, the goal is to change the model to a more collaborative one. You as the system administrator have to know what's going on, you have to know the "why" behind why you do anything.
Ben spoke about some current business models that have been pushed down on employees from the top (Agile and ITSM in particular, but others as well that were not familiar to me). But, he noted, the Cloud changes things and the IT paradigm shifts. Broad platform standardization becomes realistic, dev can bypass IT, anyone can be a player. On the other hand, the Cloud allows IT teams to offload undesirable or complex components. The Cloud is here to stay, but it will create more demand for system administrators, as concerns move up the stack.
For the rest of Ben's talk he summarized the history of various business tools and models of operations management over the last hundred years, and how each built on the previous ones. He noted that lots of things we consider common sense in business practice were recent inventions. He spoke of Frederick Taylor (scientific management), Henry Ford, Alfred Sloan, Toyoda, Shewhart, Deming, Taiichi Ohno, Peter Drucker (father of modern management), Ludwig von Bertalanffy (the father of systems theory), Armand Feigenbaum, Eliyahu Goldratt, James Womack (The Machine that Changed the World). His point was that we sit on the shoulders of giants, and there is a continuous chain of ideas. The ideas themselves are not new; what's new is the application of proven ideas to a field where they have not been used before.
It's probably a good thing now and then to get a little seminar in the history of business practice and theory.
DevOps Talk 3:
3 Myths and 3 Challenges to Bring System Administration out of the Dark Ages
Mark Burgess of CFEngine
Mark began his talk by saying it was the same talk as the keynote, but in a slightly different way. Mark reiterated the general theme that big changes are taking place in the system administration world. The proliferation of mobile devices is driving changes in IT systems and the ways we have to support them. We have to work in a particular way and adapt to new situations.
DevOps is one expression of that wave of change.
Mark then presented a few slides on "where we've come from and where we're going" -- the same general approach as the other DevOps talks. He spoke of the Dark Ages when we solved problems with brute force, and compared it to system administration. He noted that there are still areas of system administration where brute force methods are widely in use. We apply keywords, blunt force instruments, to swat these frogs away. Firefighting, frogfighting.
Industrial techniques have taken over and allowed us to scale up the size of things. The same is true in system administration. Early LISA papers covered topics such as how to install 100 machines in a day. Now the papers describe how to install 10000 machines in a day. He spoke of Alvin Toffler's 3 waves of technology, and extended their meaning to apply to 3 waves of system administration:
- 1st wave: Agricultural (by hand)
- 2nd wave: Industrial (amplified by machine)
- 3rd wave: Knowledge (learn and design with intent)
Mark spoke of some of the processes that came in the 2nd wave of system administration as being myths. For example, the second wave of system administration produced the myth that you need ordered sequential control -- it was the flow-chart script era. But new technologies make possible optimizations that were not possible in a serial world.
The second wave of system administration produced the myth of determinism and rollback -- transaction control, dealing with the outcome, not the mistake. But time does have an arrow and you can't roll things back arbitrarily. You need to plan for irreversible and high-risk operations, and make things predictable.
A third myth was: hierarchy or bust, using decision trees. Trees give you the illusion of predictability, but it's easy to get wrong when you can't put things into distinct categories. He summarized the categorization approach as "Lie back and think about sets." He noted that a more general approach is to tag, to allow for sets that overlap.
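To make the tagging point concrete for myself, here is a small sketch of my own. It is not from Mark's talk, and the host names and tags are invented for illustration. A tree forces each host into exactly one branch; with tags, one host can sit in several overlapping sets, and you select hosts with ordinary set operations.

```python
# Hypothetical inventory: host names and tags are invented for illustration.
host_tags = {
    "web01": {"linux", "web", "production"},
    "web02": {"linux", "web", "staging"},
    "db01": {"linux", "database", "production"},
    "win01": {"windows", "web", "production"},
}


def hosts_with(*tags):
    """Return the set of hosts that carry every one of the given tags."""
    wanted = set(tags)
    # "wanted <= have" is the subset test: all wanted tags are present.
    return {host for host, have in host_tags.items() if wanted <= have}


if __name__ == "__main__":
    # web01 shows up in both selections, which a strict tree could not express.
    print(sorted(hosts_with("web", "production")))
    print(sorted(hosts_with("linux", "production")))
```

In this made-up inventory, web01 belongs to both the "production web" set and the "production Linux" set, which is the kind of overlap a single hierarchy makes awkward.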
Mark then talked about the challenges that the third wave brings: emergent complexity and commerce alignment. We need the infrastructure to be self-sustaining, since we don't want to bring out massive machines and force humans to work for technology. An important third wave challenge is knowledge; experience is being diluted and is harder to come by. Individual skill is now irreplaceable, and expertise is now at a premium. Patterns bring meaning.
The summary concluding question of the talk: How do we help sysadmins (infrastructure engineers) ask the right questions?
Some Other Talks I Attended
Some of the talks I attended were about specific approaches system administration teams took to address challenges for which I'm not sure there's any value in typing up my notes. These are the talks I attended in this category:
- Issues and Trends in Reliably Sanitizing Solid State Disk: Michael Wei of University of California San Diego. Sanitizing single files is difficult. Software overwrites cannot reliably sanitize. Scrubbing allows us to sanitize by modifying the FTL (flash translation layer).
- Apache Traffic Server: More Than Just a Proxy: Leif Hedstrom of GoDaddy. Proxy servers can do a lot of stuff, including act as a router.
- Infrastructure Best Practices: Linux System Capacity Planning: Rodrigo Campos. Best quote, originating with physicist Neil Gunther: "Models come from G-d, data comes from the devil."
In addition, however, I attended several talks of more general interest. I summarize them here.
One Size Does Not Fit All in DB Systems
Andy Palmer of Novartis Institute for Biomedical Research
Andy Palmer works in what he described as the "intersection of bioinformatics and technology". There are now many sizes of databases available, and he has experience in "database fitting rooms".
The fundamental storage change is big data, where the cost of fixing errors grows exponentially and you can't redo your data as easily as you once could. Traditionally in an academic setting you could rework experiments, but it has always been different in business. In any case, it is "time for an enema" in the database business.
Database warehousing is 1/3 of the database market, but it is responsible for 60% of the growth. 25-year-old DBMS technology ignores disk trends. Drives got bigger, but slower. Meanwhile CPUs got faster. In the 90s there were read-mostly applications, but these were bandaids. The architecture still limited performance.
In the 2000s many specialized engines appeared on the scene (Vertica, Hadoop, Google Bigtable, many more that I didn't get down from the slide). There are new technologies. Formerly there was no choice, but now there's lots of choice. The question is which engine is right for which workload. To determine this, you need a framework and tools to characterize workloads and match databases to engines based on empirical characteristics. DB professionals have to pick the right engine before starting.
Andy then summarized various database technologies: row store, column store, file-oriented, document-oriented, array-oriented, graph databases. An interdisciplinary skill set is required to build, maintain, design, and manage databases.
- Oracle sucks.
- Oracle is the solution to everything.
- Each application should have its own database.
- We should have only one large database.
- Federation is the solution.
When there are needs other than scale, traditional DB engines such as Oracle are not sufficient. You need to try different technologies, but start with questions and analytics.
- One size does not fit all.
- Engines can produce radical benefits.
- Important to consider up front which engine matches the workload.
- Operation of new engine requires integrated skill set.
My First Petabyte: Now What?
Jacob Farmer of Cambridge Computer
Jacob Farmer gave a talk on various issues that arise as your storage needs grow. He had many, many slides each jam-packed with text. This is just my edited summary of what I was able to get down, my sense of his major points.
Naturally, after writing this up, I found the presentation on YouTube.
Jacob identified five stages of data proliferation grief: Consolidation, scale out, backup futility, budget anxiety, misplacement. As disciplines become more data-intensive, we are a snowball rolling down the hill, and the rate of accumulation picks up.
Intrinsic challenges as you grow big:
- scalability of the individual file system
- backup and restore
- fault tolerance
- disaster recovery
- hardware refresh
- namespace issues
In a modern petabyte-scale RAID array, drives can be striped vertically across cabinets. One cabinet can fail and you don't lose your data.
Is too much redundancy a good idea, since you can't back up and restore in a meaningful amount of time? Is there a better way to protect against a device failure? Consider replication, backup, mirroring at a different level of abstraction. How big is the building block? What are you building?
Copying a petabyte can take an extremely long time, depending on the storage. Copying a petabyte over LAN is daunting. Most modern backup systems try to minimize repetitive movement of data that does not change.
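To get a feel for how daunting, here is a back-of-the-envelope sketch of my own (this arithmetic is not from Jacob's slides). It assumes a petabyte is 10^15 bytes and that the link is perfectly saturated with no protocol overhead, which is optimistic; the link names and speeds are illustrative assumptions.

```python
# Back-of-the-envelope estimate only. Assumptions: 1 PB = 10**15 bytes,
# a fully saturated link, and no protocol or storage overhead.
PETABYTE_BYTES = 10**15
SECONDS_PER_DAY = 86_400


def copy_days(size_bytes, link_gbps):
    """Days needed to move size_bytes over a link of link_gbps gigabits/second."""
    bits = size_bytes * 8
    seconds = bits / (link_gbps * 10**9)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Illustrative link speeds, not measurements from any real site.
    for name, gbps in [("1 Gb Ethernet", 1), ("10 Gb Ethernet", 10)]:
        days = copy_days(PETABYTE_BYTES, gbps)
        print(f"{name}: about {days:.1f} days")
```

Under those assumptions, a single 1 Gb link needs roughly three months and a 10 Gb link a bit over nine days, before you account for any real-world slowdown.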
Conventional file system metadata is insufficiently descriptive. You can use different methods of abstraction if you associate descriptive metadata with files.
Consider what you need for backup policies. Are the policies the same for the entire petabyte? How often do you need to back up, how long can you wait, how long do you need to retain data, etc. Consider throwing data away: Can you afford to keep absolutely everything forever? Can you move some to low cost media?
I have no ultimate conclusion here, except to summarize that there are various approaches people need to start taking to change how they store and back up data.
What Will be Hot Next Year: A Panel
A group of high-level consultants sat on a panel and answered open-ended questions about what they see coming down the line in the world of system administration. There was nothing particularly dramatic here -- in fact, all of the panelists seem to think that there's nothing exotic and different in the near future. The future is already here, and what we will see is evolutionary rather than revolutionary.
Some small things that came up in answer to the moderator's question about what you see coming down the line:
- Hosted disaster recovery through the cloud
- Near imminent employment of 10 gig in mass volume
- Deployment of monitoring measurements into Enterprise
- Solid state storage getting faster and cheaper, people are figuring out what to do with it
- Virtualization of individual disk space
- Issues with how humans will be able to parse large data and make it relevant. Spend time on statistics.
What should people be preparing for?
- Talk to your vendors, ask for APIs, ask for open standards. You have the power, use it.
- You need to be familiar with capabilities of different standards (how interconnect works).
- In storage, it helps to start by knowing what you need.
David was distressed to find out, when he was hiring recently for a system administrator position, that pretty much all his candidates seemed sad about the fact that they were not currently in a position where they were able to implement best practices. And the more he talked to them about their current jobs, the sadder they got.
So he put together a talk on the research that has been done on what makes people happy at work, in the hopes of making system administrators happier at work.
David said there are two mindsets: A fixed mindset and a growth mindset. To a fixed mindset, qualities are given and carved in stone. To a growth mindset, things can change and grow through application and experience.
To a fixed mindset, every situation calls for confirmation of intelligence, personality, or character. Every situation is evaluated: will I succeed or fail, will I look smart or dumb. This requires a diet of easy successes. Mindsets change the meaning of failure and effort.
Organizations also need to make mistakes in order to improve, but they go to great lengths to avoid anything resembling an error. People learn better when told to learn and not worry about mistakes. In other words, a growth mindset makes for a happier work experience.
David says you can change your mindset:
- Learn to hear your fixed mindset "voice".
- Recognize that you have a choice.
- Talk back to it with a growth mindset voice.
There are two motivations: extrinsic and intrinsic. Using extrinsic rewards leads to hedonic adaptation. To kill intrinsic motivation, give an algorithmic task with fixed rules. To gain intrinsic motivation, give a generalized "here's what you have to accomplish, now go do it".
Intrinsic motivation is:
- autonomy (not independence)
- not empowerment -- empowerment makes it sound as if the company has all the power and they are ladling it out to you.
Can you act with choice over task, time, technique, and team?
Change the organization around you:
- Change one part of the system
- Gain autonomy (negotiate for a whole project)
- Gain mastery (use VMs to make a world you can break)
- Gain purpose (teach)
You can change your perception of the world.
David has a bunch of books on the issues of motivation and work happiness he can point you to.
Project Caua: A project for Latin America
Jon "maddog" Hall
Jon Hall talked about Project Caua, an open source hardware and software project in Brazil.
In Brazil, 86 percent of the population lives in urban areas, densely packed, which provides opportunities for developing Internet networks and support. The goal of Project Caua is to use these opportunities to create millions of private sector high tech jobs in Latin America. The additional goals include:
- Making computers easier to use and environmentally friendly
- Creating low-cost or gratis super computing capability
- Using sustainable private sector funding to make business materials open and free.
The model for the project is that there will be HA servers in tall buildings, which will bring high capacity Internet into the building, with thin clients throughout the building running a FOSS HPC grid. This infrastructure can create millions of new jobs: it enables system administrator/entrepreneurs to start their own businesses selling computer services.
Most people can do things on their system that they do every day, but have trouble with things you do once in a while (backup, update). A system administrator can do this, and can create a business doing this.
In the US, over the last 40 years, support has been moved further and further away. First support moved from offices into basements, then to support centers, then India. Those who buy support are not the ones who use it. But if you could sit down with a person you could solve your problems in just a few minutes. That person can see what a customer's issues are and can find programs and tools to address them. This is the model of system support Project Caua is going for.
In all those tall buildings, there are lots of small businesses that can't afford a full-time system administrator, but they could share one. There are vertical markets: small businesses, apartments, hospitality, point of sale. In private apartments the system administrators can service the desktop, digital TV, IP, radio, security system, etc. In the small-medium business market they can provide filing, email, and print services.
Environmentally friendly computing can reduce electricity usage. Thin clients are built locally. You can resell and recycle: servers come in 3 sizes, and upgrades and parts are interchangeable and under warranty. The systems can form a supercomputer grid that can call on idle cycles.
What about training? This project will create the need for two million administrators in Brazil alone.
- Training available over the Net
- Use virtualization to allow virtualized network training
- Use older systems for real hardware training
- Provide specialized training for specialized tasks
The base salary for a system administrator is $1800/month, and the administrator can make additional revenues from various things. Information will be available on how to buy business loans, private banks, underwriting. Business plans will be available on the web.
The SA is an entrepreneur: The SA leases, rotates old equipment, keeps servers responsive, owns their own business.
How large is the project? There are 194 million Brazilians, and even at 10% penetration that's 100,000 jobs. The field test for the project is scheduled for February, with V1 in May-June. Apprenticeship training program in Argentina.
This could work in US inner cities; you could use some technologies, but not all.
What Is Watson?
Michael Perrone, Manager, Multicore Computing
The final talk of the conference was about the Watson Jeopardy-playing computer. The final conference talks tend to be on the lighter side. I didn't even take notes, I just sat and enjoyed this.
Here are some random things I remember from the talk.
- This guy knew his audience. At the end of a talk filled with funny stories and interesting tales of refining algorithms, he said he saved the best for last, which was a one-page summary of the hardware used for the Watson project.
- Watson spent many months (years?) playing test games with Jeopardy champions.
- No visuals or audios were used in the game itself; Watson is not yet equipped for this, in part because image recognition software is not yet as developed as word parsing software.
- They built a replica Jeopardy set at the Watson research center, which is still there. They say they have gotten huge amounts of publicity surrounding this project, and they are still milking it.
- An advantage that humans have is that humans can know that they know something before they actually have the answer.
- Because Jeopardy has been on the air for so many decades, they had an enormous number of past questions for their training data.
- The speaker gave a few examples of answers that Watson gave while they were refining his algorithms, the best of which was in answer to the question "Rhyming term for a hit below the belt". The answer you or I would give is probably "low blow", but Watson came up with "wang bang", a much better response.
- A friend of mine asked if you could use the Watson research to develop ways to teach autistic (Asperger's?) children to understand speech nuance. The answer was that this is a great idea, but unfortunately there's more funding for business applications than educational ones.
Bringing this back to the conference theme: I don't know that there's anything in particular I am going to pull away from the notion of DevOps, but the model of collaboration and a two-way street between development and technical writers is one I think we've all been working towards for years. It's true that when I started in this field, information came from on high, and the participation of technical writers in project and planning meetings would have been unthinkable. So there are definite parallels in the paradigm shift. I think, in general, I like the notion that I'm part of the team that produces what we produce. This conference reminds me of that. It does re-energize me for my work. And teaches me all the new buzzwords.