
Continuous Integration Blogs Aggregator - Automated Build and Unit Testing

Companies

Our First Postmortem

The Circle Blog - CircleCI - 13 hours 55 min ago

CircleCI is a platform for continuous delivery. This means (among other things) we’re building serious distributed systems: dozens of servers running thousands of builds across hundreds of “container” hosts, coordinating between all the moving parts, and taking care of all the low-level details so that you have the simplest, fastest continuous integration and deployment possible.

In mid-March, we made some enormous infrastructural advances, starting with allowing a single build to use containers spanning several servers (previously, a parallel build’s containers all had to be co-located on a single server.) This, in turn, allowed us to crank up the parallelism: several of our largest customers went from running their builds across 4 containers to running them across 12 containers, over the course of a couple of weeks. Fantastic!

The features worked well in and of themselves, but the sudden growth put a lot of strain on our system as a whole, and blew several of what might otherwise have been small problems into huge ones. We spent the last week of March and the first few weeks of April in almost full-time firefighting. Here’s what happened…

issue #0 – dying servers

As is often the case for us, the first thing we (and our users) noticed was a queue backlog. Internally, we observed “stuck” builds, which our coordination layer thought were running, but were not actually running on any server. And we also saw builds failing, when the containers they were using became suddenly unreachable.

Very quickly we realized that our servers were dying. If the server which died was the “owner” of a particular build, that build became stuck in the coordination system. If, on the other hand, the dead server was just a slave doing work on behalf of some builds, those builds would fail because they couldn’t reach it anymore.

The stuck builds confused our concurrency logic, blocking lots of builds from running even though they should have been. Our scaling code also relies on being able to get an accurate high-level view of what’s running from the coordination layer, so it wasn’t making good decisions. And finally, our scaling code relies on working build infrastructure to deploy new servers! Lots of our attempts to scale up the cluster were getting stuck… just like everyone else’s builds.

For about a day, we mitigated the problems by hand while investigating. Manual mitigation involved starting many, many, many servers by hand to work around the scaling failures (and overcompensate for crashing.) We also built ourselves some crude tools for detecting and clearing stuck builds.
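As an illustration only (not the actual tooling), the core of a stuck-build check can be very small. This Python sketch assumes, hypothetically, that the coordination layer can report which server owns each running build and that a separate, trustworthy list of live servers exists; `find_stuck_builds` and its data shapes are made up for the example.

```python
from typing import Dict, List, Set


def find_stuck_builds(running: Dict[str, str], live_servers: Set[str]) -> List[str]:
    """Return the IDs of builds whose owning server is no longer alive.

    `running` maps a build ID to the server that owns it, as recorded by the
    coordination layer; `live_servers` is the set of servers known to be up.
    A build owned by a dead server will never finish on its own, so it is a
    candidate for clearing and re-queueing.
    """
    return sorted(build for build, owner in running.items() if owner not in live_servers)
```

Clearing a stuck build would then mean removing it from the coordination layer and re-queueing it; how that is done depends entirely on the coordination system in use.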

The first round of analysis quickly showed we were having kernel panics, causing the boxes to reboot (we do not set up our servers to automatically rejoin the cluster on reboot: we prefer to stick to the well-worn path of starting a new machine.) Our monitoring wasn’t catching these well because the servers very quickly came back “up”. To make sure we’d get alerted when a system reboots, we added heartbeat-based monitoring of the actual server process.
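Heartbeats catch the case that host-level checks missed: a rebooted box answers again, but because servers never rejoin after a reboot, the original server process never checks back in. A minimal Python sketch of the idea follows; the `HeartbeatMonitor` class, the timeout value and the injectable clock are assumptions for illustration, and how heartbeats are transported is left out.

```python
import time
from typing import Callable, Dict, List, Optional


class HeartbeatMonitor:
    """Track the last heartbeat from each server process and report stale ones."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the logic can be tested deterministically
        self.last_seen: Dict[str, float] = {}

    def beat(self, server: str) -> None:
        # Record a heartbeat sent by (or on behalf of) the server process.
        self.last_seen[server] = self.clock()

    def stale(self, now: Optional[float] = None) -> List[str]:
        # Servers whose most recent heartbeat is older than the timeout.
        # A rebooted server that never rejoins ends up here, even if the
        # host itself is reachable again.
        now = self.clock() if now is None else now
        return sorted(s for s, t in self.last_seen.items() if now - t > self.timeout_s)
```

An alerting loop would periodically call `stale()` and page someone for any server it returns.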

And then the other shoe dropped: the cluster leader died. The cluster leader in our system is very simply the most recently booted server: its job is to decide when the cluster should scale up or down. When the leader died, early in the morning Pacific time, the cluster became static, and completely failed to scale up for the weekday morning demand.

After a mad scramble to get the cluster back on its feet, we added a deadman’s switch, for the leader role only.
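A dead man's switch means a role is only held while its holder keeps proving it is alive. The hypothetical sketch below applies one to the leader role only, using the rule described above that the leader is the most recently booted server; the `Cluster` class, lease length and check-in API are illustrative assumptions, not the real coordination layer.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Member:
    boot_time: float     # when the server booted
    last_checkin: float  # last time it renewed its lease


class Cluster:
    def __init__(self, lease_s: float):
        self.lease_s = lease_s
        self.members: Dict[str, Member] = {}

    def join(self, name: str, now: float) -> None:
        self.members[name] = Member(boot_time=now, last_checkin=now)

    def check_in(self, name: str, now: float) -> None:
        if name in self.members:
            self.members[name].last_checkin = now

    def leader(self, now: float) -> Optional[str]:
        # The switch applies to the leader role only: if the most recently
        # booted member has missed its lease, evict it and fall back to the
        # next most recently booted member, so someone keeps making scaling
        # decisions.
        while self.members:
            name = max(self.members, key=lambda n: self.members[n].boot_time)
            if now - self.members[name].last_checkin <= self.lease_s:
                return name
            del self.members[name]
        return None
```

Non-leader members are never evicted here, which mirrors the "leader role only" scope; extending the same check to every member is what cleaning up vanished servers would require.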

Once we’d bought ourselves breathing room with our changes, it only took a day to reproduce, isolate and fix the problem.

The speedy fix was, frankly, a lucky guess. There were some gnarly error messages in the system error logs which pointed roughly at LVM snapshot problems. (LVM is the Linux Logical Volume Manager, which among other things allows point-in-time snapshot volumes.) We reproduced the issue by creating the aptly-named ‘circleci/dummy-disk-filler’ project, and running its tests many times concurrently. The lucky guess was in the isolation step: the first thing we removed was a vaguely suspicious command, resize2fs, which we use to grow the filesystem to match the space available in the underlying physical device. Lo and behold, removing it completely fixed the problem!

The fix we eventually shipped actually just moved the resize2fs command. Instead of:

lvcreate image -> lvcreate snapshot -> resize2fs

it went:

lvcreate image -> resize2fs -> lvcreate snapshot

(which makes a lot more sense, actually.)
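To make the reordering concrete, here is a hedged dry-run sketch that only builds the command lines in the fixed order. The volume group, volume names and sizes are hypothetical, and it assumes the image volume already holds an ext filesystem (the step that populates it is omitted). `lvcreate --snapshot` is standard LVM2 usage, and `resize2fs` with no size argument grows the filesystem to fill its device.

```python
from typing import List


def provision_commands(vg: str, image_lv: str, snap_lv: str,
                       image_size: str, snap_size: str) -> List[List[str]]:
    """Return the provisioning commands, in the fixed order, without running them."""
    image_dev = f"/dev/{vg}/{image_lv}"
    return [
        # 1. Create the logical volume that holds the base image.
        ["lvcreate", "--name", image_lv, "--size", image_size, vg],
        # 2. Grow the image's filesystem while no snapshot exists yet.
        ["resize2fs", image_dev],
        # 3. Only then take the copy-on-write snapshot of the image.
        ["lvcreate", "--snapshot", "--name", snap_lv, "--size", snap_size, image_dev],
    ]
```

Printing each entry with `shlex.join` gives a reviewable dry run before anything touches a real volume group.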

In the process of torturing the disks, we found and fixed several other defects in the LVM configuration, one of which was a kernel-panicking bug in slightly different circumstances!

issue #1 – slowclones

This bug manifested as — you guessed it — a severely backlogged queue, and a huge spike in the number of concurrently-running builds, but without a simultaneous spike in load. We were running a ton of builds… but they weren’t doing any work.

We quickly traced the problem to extremely slow git clones from GitHub. In particular, we found that clones of large repos were taking an extremely long time (hours) in git-upload-pack. They eventually seemed to finish successfully, though.

The first time this happened, it caused a bunch of nasty cascading failures. There was (at the time) no timeout wrapped around the git commands which we use to configure a project for building. Similarly, the ‘cancel’ functionality didn’t work during this stage, so people couldn’t “work around” the issue themselves by canceling their slow builds. But the worst thing was that our scaling infrastructure relied on GitHub’s good health, because each new server clones the code from GitHub — so we were almost completely unable to scale to the huge demand for (basically idle) build capacity. GitHub’s support suspected a bug with older git clients, and suggested that we upgrade our git clients to the latest version…

And then it went away.

In the lull between crises, we added timeouts and cancel-ability to the “configure the build” step, made some minor adjustments to the scaling algorithm, and upgraded git. We also started working on non-git-backed deployment of our own code, to keep our scaling working without GitHub (and to speed up our deploys, in general.)
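A timeout around a git command can be sketched with Python's `subprocess.run`, which kills the child and raises `TimeoutExpired` when the limit is hit. `run_with_timeout` is a hypothetical helper, and it only kills the direct child: git spawns helpers such as ssh, so a production version would likely need to kill the whole process group.

```python
import subprocess
from typing import List, Optional


def run_with_timeout(cmd: List[str], timeout_s: float) -> Optional[int]:
    """Run `cmd`; return its exit code, or None if it exceeded `timeout_s`."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child at this point.
        return None
```

For example, `run_with_timeout(["git", "clone", "--depth", "50", repo_url, "checkout"], timeout_s=600)`, where `repo_url`, the depth and the limit are placeholders. A `None` result can then be treated like a cancel: fail the step quickly instead of holding a container for hours.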

When this problem came back, we did better: builds were slow but could time out (or be canceled) much more quickly, so the demand for idle build capacity was lower. However, we still couldn’t scale effectively without being able to clone our own repo, so there was still a big queue backlog.

During this outage we were able to trace the problem much more deeply. We discovered that packet loss between our servers (in EC2 US East) and GitHub (in Rackspace) was 10-30%, and we started working closely with some folks at GitHub and AWS to figure out what was causing the packet loss, and why packet loss caused slowclones.

And then it went away!

We made two quick, barely substantiated changes: we lowered the MTU on our servers to 1400, to match the manually-discovered path MTU between us and GitHub, to rule out any chance of a PMTUD issue. We also tweaked our git clone commands to try to be a bit less network-intensive (e.g. we made our shallower clones shallower, and cranked the compression settings.) Neither seemed to help.

The problem came back a third time, and we got some really odd stuff out of tcpdump, in particular a pattern of very slow retries of lost segments:

17:57:00.959562 IP (tos 0x0, ttl 64, id 46011, flags [DF], length 52)
    circleci.43171 > github.com.ssh: ack 95967, win 1584, length 0,
    options [nop,nop,TS val 1407819]
17:57:41.433879 IP (tos 0x8, ttl 48, id 29061, flags [DF], length 1500)
    github.com.ssh > circleci.43171: seq 90175:91623, ack 5768, win 27,
    length 1448, options [nop,nop,TS val 4131799216 ecr 1407818]
17:57:41.433926 IP (tos 0x0, ttl 64, id 46012, flags [DF], length 64)
    circleci.43171 > github.com.ssh: ack 95967, win 1611, length 0,
    options [nop,nop,TS val 1417937,nop,nop,sack 1 {90175:91623}]
17:59:02.385413 IP (tos 0x8, ttl 48, id 29062, flags [DF], length 1500)
    github.com.ssh > circleci.43171: seq 90175:91623, ack 5768, win 27,
    length 1448, options [nop,nop,TS val 4131819454 ecr 1407818]
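The tell-tale sign in a dump like this is the gap between retransmissions of the same segment (here `seq 90175:91623`, sent at 17:57:41 and again at 17:59:02). A small, hypothetical Python helper can pull those gaps out of verbose tcpdump text in the format above; it assumes every packet header line starts with an `HH:MM:SS.ffffff` timestamp, that all packets fall on the same day, and it does not distinguish direction.

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Optional, Tuple

TS_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{6}) IP ")
SEQ_RE = re.compile(r"\bseq (\d+):(\d+)")


def retransmission_gaps(dump: str) -> Dict[Tuple[int, int], List[float]]:
    """Map each repeated segment range to the seconds between its transmissions."""
    sent: Dict[Tuple[int, int], List[datetime]] = defaultdict(list)
    current: Optional[datetime] = None
    for line in dump.splitlines():
        header = TS_RE.match(line.strip())
        if header:
            # Header line: remember the timestamp for the continuation lines.
            current = datetime.strptime(header.group(1), "%H:%M:%S.%f")
            continue
        seq = SEQ_RE.search(line)
        if seq and current is not None:
            sent[(int(seq.group(1)), int(seq.group(2)))].append(current)
    return {
        rng: [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        for rng, times in sent.items()
        if len(times) > 1
    }
```

For the excerpt above, this reports a single repeated range with a gap of roughly 81 seconds between sends.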

Then, we shipped our new deployment code. The slowclones have come and gone a few times since then, and we cope with them as well as can be expected: we scale up fast to handle the spike in builds, but only the people whose projects are cloning slowly are impacted — the global queue doesn’t back up.

GitHub support and infrastructure folks have also been great: they acknowledge the packet loss each time, and resolve it as quickly as they can. However, we still don’t really know the root cause: we know that slowclones are triggered by bad network conditions, but that’s not a fully satisfactory explanation for why a 30-second clone should take 2+ hours.

interlude – bunk git upgrade

The git client upgrade turned out badly. The morning after it shipped, we were informed by our customers that we’d screwed something up, and most of their builds were failing. We immediately reverted the bad code!

This turned out to be a nice, straightforward bug. We isolated it easily: between git 1.7 and 1.8, the default behavior of shallow clones changed from --no-single-branch to --single-branch. In our system, this had the effect of breaking all non-master builds: we’d do a shallow clone and then try to reset to the correct commit, but it wouldn’t be there.

We fixed our stuff (including adding a non-master-branch end-to-end test!) and rolled forward again the next day without incident. Unfortunately, it didn’t help at all with the slowclones.
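In newer git, `--depth` implies `--single-branch` unless `--no-single-branch` is given. One way to keep non-master builds working is to name the branch being built when cloning, and passing `--no-single-branch` restores the older behavior instead. The hedged sketch below only builds the argv. The URL, depth and function name are hypothetical, and the commit being reset to still has to lie within `depth` commits of the branch tip.

```python
from typing import List


def shallow_clone_cmd(url: str, dest: str, branch: str, depth: int = 50) -> List[str]:
    """Build a shallow `git clone` that still contains the branch being built.

    Without `--branch`, a single-branch shallow clone only fetches the
    remote's default branch, so a later `git reset --hard <sha>` for a commit
    on another branch cannot find that commit.
    """
    return ["git", "clone", "--depth", str(depth), "--branch", branch, url, dest]
```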

issue #2 – machines vanishing

After we’d resolved the kernel panics, while we were fighting slowclones, we also started seeing more and more of our servers vanishing. The symptoms were similar to those we saw during the kernel panic incident: “stuck” builds and spurious failed builds, and cascading effects on the queue.

The first round of analysis found that these boxes seemed to just “wedge” or deadlock: they didn’t panic or reboot or scream into the console. They just stopped responding to incoming requests, and any logged-in sessions became instantly unresponsive. After a reboot, the logs all appeared to stop at the same moment, with no complaints evident in any of them. Monitoring showed no dips, blips or spikes in memory, disk, network, CPU, etc. at the time.

Without a lead, we put a lot of effort into mitigation. In particular, we extended the deadman’s switch so that it cleaned up nicely when a server vanished, and we parallelized our scaling code to be able to spin up way more servers simultaneously.

By the time those fixes were deployed, we’d spotted a pattern in the logs, even though it made no sense: 100% of the vanishing boxes were running a “restore cache” action. During “restore cache” we download a .tar.gz file, and then pipe its contents into tar -xzf - processes inside each container, over ssh. Pretty innocuous stuff!

In seeking a repro, we managed to make a box “vanish”, with identical symptoms, by overcommitting part of the LVM stack. This was very suspicious, since it was problems with the LVM stack which had caused our previous kernel panics, so we focused almost exclusively on that part of our system while attempting to reproduce this bug.

We managed to get a semi-reliable repro in our staging environment by hammering on the “restore cache” code paths, but it was very slow and, since it wasn’t reliable, its results were difficult to interpret.

We made a lot of barely-substantiated changes while we tried to pin this bug down: we fiddled with the initialization of our LVM stack, serialized most of the container initialization steps, and added a bit of swap to the servers. Several times, we thought we’d found the smoking gun… but in the end, none of these tweaks helped.

We also serialized cache restoration: in a parallel build, we had been piping the tarball contents into all containers in parallel. Switching this to one container at a time vastly improved things in production. It was still pretty broken, and much, much slower, but we had a bit of much-needed breathing space.
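Serializing the restore amounts to streaming the same tarball into each container in turn, waiting for each extraction to finish before starting the next. A minimal Python sketch: `restore_cache_serially` is hypothetical, and the caller supplies the per-container command, for example an ssh invocation that runs `tar -xzf -`.

```python
import subprocess
from typing import Callable, List


def restore_cache_serially(tarball: str, containers: List[str],
                           extract_cmd: Callable[[str], List[str]]) -> List[int]:
    """Pipe `tarball` into `extract_cmd(container)` for one container at a time.

    Returns each extraction's exit code, in container order.
    """
    codes = []
    for container in containers:
        # Reopen the file for each container so every stream starts at byte 0.
        with open(tarball, "rb") as stream:
            codes.append(subprocess.run(extract_cmd(container), stdin=stream).returncode)
    return codes
```

A call might look like `restore_cache_serially("cache.tar.gz", hosts, lambda h: ["ssh", h, "tar -xzf -"])`, with placeholder file and host names. The trade-off is the one described above: far less simultaneous disk and network load, at the cost of a restore that takes roughly N times as long.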

We spoke with friends at Iron.io, support at AWS, and scoured the internet for bug reports and workarounds that might conceivably be related to our issue. We got a lot of advice, a lot of tips, and a lot of leads. We must have tried a hundred permutations of the core LVM stack. Along the way, we identified several scary, possibly related-looking issues in various components, and upgraded just about everything in the vain hope that it was Someone Else’s Problem. None of that worked (though the LVM upgrade, in particular, was something we’d been putting off for a while, and gave us some new superpowers.)

Finally, three hair-pulling weeks into this bug-hunt, we reached the point of going back to re-check all the conclusions we had made so far. Very quickly, an experiment found that the boxes hadn’t been deadlocking at all: they were just dropping off the network! Argh! There was a lot of (╯°□°）╯︵ ┻━┻ in HipChat…

Within an hour of this revelation, and focusing on the network instead of the LVM stack, we had a simple, reliable repro script:

  1. start a fresh cc2.8xlarge instance with the AMI we use for our servers
  2. run our system initialization on it
  3. run 20 concurrent copies of cat /dev/zero | ssh $PUBLIC_IP 'cat > /dev/null'

These steps dropped the box off the network within seconds, every time.
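For anyone who wants to script the third step, here is a hypothetical Python harness for the concurrent flood. Unlike `cat /dev/zero`, it stops after a configurable number of bytes per stream, and the caller supplies the sink command, for example `["ssh", public_ip, "cat > /dev/null"]`.

```python
import subprocess
import threading
from typing import List


def flood(sink_cmd: List[str], streams: int = 20, bytes_per_stream: int = 1 << 30,
          chunk: int = 1 << 16) -> List[int]:
    """Push zeros through `streams` concurrent copies of `sink_cmd`; return exit codes."""
    procs = [subprocess.Popen(sink_cmd, stdin=subprocess.PIPE) for _ in range(streams)]

    def feed(proc: subprocess.Popen) -> None:
        zeros = bytes(chunk)
        remaining = bytes_per_stream
        try:
            while remaining > 0:
                n = min(chunk, remaining)
                proc.stdin.write(zeros[:n])
                remaining -= n
        except BrokenPipeError:
            pass  # the sink went away (e.g. the connection dropped)
        finally:
            try:
                proc.stdin.close()
            except BrokenPipeError:
                pass

    # One writer thread per stream, so all sinks receive data concurrently.
    threads = [threading.Thread(target=feed, args=(p,)) for p in procs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [p.wait() for p in procs]
```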

As soon as we could reproduce the issue reliably, we started to remove things from our system init to isolate the combination of factors that broke the network. Two hours after that, we had a fix in production.

So… remember when we were fighting slowclones, and we tweaked the MTU down to 1400? Well, when we took out that part of our system initialization, the problem went away.

What an anti-climax :(

We sent some repro scripts and data upstream for other people to investigate why something as innocuous as changing the MTU would cause complete networking collapse… but we haven’t chased these up. We have no answers… only a working system :P

interlude – schejulure bug

At 12:11am on the first Sunday after we started fighting the vanishing servers bug, we were alerted by a slew of very strange errors. The deadman’s switch had misfired! Servers were constantly pushing each other out of the coordination system, even though they were fine.

We tracked down the bug to schejulure, a simple scheduling library we use: it was using 0-based day of the week, whereas the underlying time library used 1-based day of the week. We forked their code, fixed the bug, switched to our fork… and things went back to normal.

We even filed a pull request, so now everyone else’s scheduled tasks will also run on Sundays :)
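schejulure is a Clojure library, so the following is only a Python illustration of the same class of off-by-one, not its actual code. Python's `date.isoweekday()` numbers days from 1 (Monday) to 7 (Sunday), like a 1-based time library. An "every day" schedule written as 0 through 6 therefore never matches Sunday, which is consistent with the trouble starting just after midnight on a Sunday. The names here are hypothetical.

```python
from datetime import date
from typing import Iterable


def runs_on(day: date, allowed_days: Iterable[int]) -> bool:
    """True if a task scheduled on `allowed_days` (1 = Monday ... 7 = Sunday) runs on `day`."""
    return day.isoweekday() in set(allowed_days)


# Buggy "every day" written with 0-based numbering: Monday..Saturday happen
# to line up with 1..6, but Sunday (7) is never in the set.
EVERY_DAY_BUGGY = range(0, 7)
# Fixed: use the same 1-based numbering as the time library.
EVERY_DAY_FIXED = range(1, 8)
```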

coda

This was a brutal month, but on the whole, it worked out well. We weathered the storm, and our infrastructure is massively more robust in the face of failures and odd load patterns than it was in March. We have better monitoring and alerting. We’ve refined our outage communication processes, so our customers hear about problems from us, not the other way around.

Most importantly, we now have the all-purpose (shitstorm) emoticon in HipChat.

We want to apologize to everyone whose builds failed or ran slowly during this time. We also really want to thank those customers who said nice, supportive things — it seems many of you had similar problems during your growth, and your support was really appreciated during our most difficult days of desperate bug-hunting.

There’s still a lot of work to do. Here’s hoping we don’t have to use that emoticon too often!

Categories: Companies

CD and Embedded – Another Note on this Topic

The Electric Cloud Blog - Mon, 05/20/2013 - 18:57

Our PM David Rosen recently wrote an excellent blog post on Continuous Delivery (CD) and its applicability to embedded technologies. It goes to the heart of what we often hear from our embedded prospects and customers: can we really do CD?

I understand the confusion and angst and would like to point the readers to two recent blogs on this topic:

Fundamentally, there is a difference between Continuous Delivery and actually delivering to the customer. CD implies that we treat every check-in or commit as a release candidate and automate whatever we can: the builds, tests and deployments. It does not, however, stipulate that you actually ship code with every commit. You may be in a business where you can ship code as quickly as it comes from development (Web, SaaS), or you may be in a business where you cannot (embedded, ISV, etc.). But that does not take away from the value of CD.

CD is a cultural change as much as an upgrade to the systems. It requires development to work in a different way: to ensure that code branches are always integrated, to detect problems (compilation, test, etc.) quickly rather than later, and to automate and gain efficiencies in every task. By taking this proactive approach, development teams ensure an “always shippable” version of the software. This brings immense value to the development process (and improves product time to market and quality). But this software need not be shipped. It can go as far as a pre-production/staging environment and no further, and there is huge value in just achieving that.

So, embedded teams: go implement CD! It does not have to be Facebook- or Netflix-style CD. It can be your own form of CD, and you will be glad you did.

Categories: Companies

Continuous Information vol. 4 - CloudBees Newsletter for Jenkins

CloudBees' Blog - Mon, 05/20/2013 - 16:33


 Kohsuke and I finally got Continuous Information out the door. Volume 4 of the CloudBees Newsletter for Jenkins features an article from Kohsuke, the latest Jenkins improvements, some handy recent blogs, details on upcoming Jenkins events, loads of Jenkins resources and various other useful info for Jenkins users.
Check out the headlines...
  • 730+ Plugins, 61k+ Active Installations
  • Giving Back to the Community: Kohsuke's Insights
  • Registration Open for JUC Bay Area and JUC Israel, and the Jenkins User Event in Copenhagen
  • What’s New in Jenkins?
  • Upcoming Jenkins Enterprise by CloudBees Release
  • Featured Blogs
  • ... and more
 Read the whole newsletter, then check out previous issues and sign up to receive it directly.
PS - Got something for the next newsletter? Please drop us an email.



-- Lisa Wells
CloudBees Partner Marketing Bee & Managing Editor, Continuous Information

 

Categories: Companies

CloudBees Buzzes in Europe!

CloudBees' Blog - Thu, 05/16/2013 - 12:58

If you read our news today, you know it’s indeed rocking here at CloudBees! Over the last few months, much has happened in our growth and market momentum. Let's take a closer look at CloudBees' business, on several fronts.
First, we have further extended our Partner Ecosystem. We added key Technology Partners to provide more choice for developers and further augment - in an integrated way - the feature set and add-on services available from within the CloudBees Platform.
On the Services Partner side, the family is growing fast. The program that we launched late last year provides our customers with experts skilled not only in development but also with the CloudBees PaaS. Services Partners are available locally in more and more places around the globe. They are doing wonderful things on our platform and delivering amazing applications in a fraction of the time and at a lower cost, compared to the traditional on-premise development and deployment model. To support these valuable Service Partners we now have Cindy Vranken, our first partner manager, on board. She spends her time ensuring our partners are successful and happy. And we are already looking at further expanding her team.
From a platform user perspective, we have introduced the Developer Success Team. This team’s objective is to ensure that every user that comes to the CloudBees platform has a positive experience. The Developer Success Team helps and guides the developers on our platform to maximise their productivity and effective use of the platform. For those of you that are new to CloudBees, I’m pretty sure you have had a conversation with Félix Belzunce Arcos, our first Developer Success Team member.
Finally, given the strong adoption and demand for the CloudBees platform in Europe, we have expanded our product offering and our team. Users can now run their applications on Amazon Web Services EU West in both multi-tenanted and dedicated configurations, enabling them to comply with European regulations and reduce application latency. We also opened the European office in Brussels with a team on the ground to be closer to our users and partners.
Watch this space, as the team is growing and a lot more activity is on the way!

Michel Goossens
Vice President of Worldwide Sales
CloudBees
www.cloudbees.com

At a time of explosive demand for the CloudBees Platform, Michel Goossens is driving growth for CloudBees, globally. Prior to CloudBees, Michel was at e-commerce platform provider Magento, where he was vice president and general manager of EMEA. Prior to Magento, he served as vice president for JBoss EMEA. Goossens holds a Degree in Computer Science from EPHEC in Brussels, Belgium.



Categories: Companies

Q&A: A Tutorial for Getting Started with PaaS

CloudBees' Blog - Thu, 05/16/2013 - 00:08
A big thank you to everyone who attended our webinar, Building and Running Your Applications in the Cloud: A Tutorial for Getting Started with PaaS. It was great to have so many of you join us for the session and thanks also for all of your questions.  If you missed the webinar or would like to see it again, here's the WebEx recording.  You can find recordings of all of our other webinars (past, present and future) on the Webinar recordings page - please check them out to learn more about PaaS, all things cloud, Jenkins CI and Continuous Deployment.

There were some great questions during the webinar Q&A and I wanted to give you more detailed answers than we had time for on the webinar:

Q: I have created the database. How do I insert data into it?
A: There are a lot of possible answers to this question, but here are a couple to consider:
  1. Many application development frameworks (Spring is a good example - see this documentation) have built-in ways to initialize a data source.  Take a look at our ClickStarts as there are some examples of this: for example, the JBoss JPA/Hibernate one has an import.sql script.
  2. You can always run a database initialization script as part of your deployment: if you are running a Jenkins build job using DEV@cloud, then the easiest way to do this is simply to invoke your SQL script using the built-in "Execute shell" action.  You should use the Build Secret Plugin to protect your database administrator password.

Q: Please publish a step-by-step tutorial on using CloudBees.
A: Here are some resources that may be helpful:
  1. I've done a few short videos that go through the complete setup of your CloudBees project (including integration with GitHub and Eclipse): you can find these on our Getting Started with CloudBees Resource Center.
  2. Sign up or login to your CloudBees account and check the box to "Take the Tour." This will guide you through the various features of the CloudBees Platform.
  3. There are quite a number of videos available on the CloudBeesTV YouTube channel.

Q: Is an app-cell 1/8th of an EC2 Compute Unit?
A: Yes, correct. There's a more detailed explanation of app-cells in our pricing FAQ and there's more about EC2 Compute Units and the reasons why Amazon introduced this model in the Amazon AWS FAQ.

Q: Does CloudBees support ec2-userdata? I use this to configure which environment my EC2 instance is (demo, uat, prod) and JDBC URL, etc. Perhaps this is done better in a PaaS?
A: No, we don't provide access to the ec2-userdata for the underlying Amazon EC2 instances your app is running on: this is very tightly tied to the AWS infrastructure and we aim to abstract all that away for you. As you suggest, there is a much easier way to do this using PaaS: take a look at this article about Configuration Parameters and how to use these to customize settings for different environments.

Q: Do you have a Hadoop setup to run MapReduce jobs for an app that needs a computed data/result?
A: We don't ourselves offer a Hadoop service. One option would be to consider using Amazon's Elastic MapReduce service and call it using the Amazon APIs from your apps running on CloudBees. It's really easy to "mix-and-match" PaaS and IaaS in this way: you can use the bees config:xxx commands from the CloudBees SDK to pass the AWS credentials for the IAM user into your application and then call the services exactly as shown in the AWS Java SDK.

Q: We have a web application that needs to write/read media files at runtime. Which model (dedicated/shared) do you recommend?
A: The main issue to consider is whether your application only needs temporary access to the files or whether you are looking for longer-term storage. By their nature, PaaS applications do not usually have persistent file storage associated with them: the PaaS runtime can (and does) relocate those apps to deal with problems in the underlying infrastructure, or to scale the application to respond to increases in demand and in that event, the application is liable to be re-started in a completely different virtual environment with no access to the original underlying file system. Even with dedicated servers, this is still the case: the main reason for using that model is for a more deterministic performance/response time, since you know in advance exactly what hardware infrastructure will be available to your applications and so can scale accordingly.


Q: Are you working on a NetBeans plugin?
A: It's something we really want to develop (particularly the former NetBeans engineer on our team!), but I'm afraid we just haven't been able to spare the resources yet. There is a community-contributed NetBeans plugin on GitHub, but I haven't actually used it myself. If you have time to take a look at it, please do let me have any feedback so that we can incorporate that into the requirements.

Q: Any plans to allow customers to choose Amazon hosting in Australia?
A: Nothing planned right now, but if this is important to you then please do let me know, so that we can get it on the engineering radar as this is certainly something we could support fairly easily: we already have a multi-region capability built into our PaaS and we provide customers their choice of Amazon US and EU regions.


Q: Is there any comparison with CloudBees & non-cloud pricing for a web application?
A: Yes, please take a look at our Total Cost of Ownership white paper.


Q: Do you plan to go to market with the Drupal ClickStack?
A: We are just in the process of finishing up work on the initial Drupal ClickStack and as soon as that is completed, then we definitely plan to make this available as a supported runtime on the platform.  The combination of ClickStacks, ClickStarts and continuous delivery will, we think, make a compelling case for developing and testing Drupal-based applications on CloudBees. The beauty of ClickStacks is that they allow us to plug in different runtime containers very easily and (surprising as it might seem at first glance) the underlying model is very similar, as these diagrams illustrate:





Mark Prichard, Senior Director of Product Management
CloudBees
www.cloudbees.com

Mark Prichard is Java PaaS Evangelist for CloudBees. He came to CloudBees after 13 years at BEA Systems and Oracle, where he was Product Manager for the WebLogic Platform. A graduate of St John's College, Cambridge and the Cambridge University Computer Laboratory, Mark works for CloudBees in Los Altos, CA.  Follow Mark on Twitter and via his blog Clouds, Bees and Blogs.



Categories: Companies

Taming the Android Monster

The Electric Cloud Blog - Tue, 05/14/2013 - 16:15

A few weeks ago, we released our Android software delivery solution, which enables Android software teams to build, test and release Android solutions more efficiently and faster.

Since that release, we have had many discussions with Android development teams, and our hypothesis about the need for such a “delivery solution” has been spot on.

Indeed, Android software development teams struggle to build Android software fast (not just the vanilla OS, but also their value-added software), and testing Android is a time-consuming process (anywhere from 5 to 7 hours with the Android CTS, or Compatibility Test Suite). And of course, vendors struggle with the matrix problem: many products, many OSes, many software variations. Our solution is built to address exactly these issues and is enabling many such software teams to deliver Android devices faster and with higher quality.

We recently spoke to Tom Williams from RTC magazine and he has written an excellent article on this topic.  As Tom writes “The ability to organize, manage and automate the development workflow can go a long way toward shortening this cycle time, especially when that includes making the generated data automatically available to those tasks that need it as well as to developers for use in appropriate reports”.

If you’re an Android shop struggling to tame the Android monster, let’s chat. Most of the largest Android device makers already know the secret to conquering the monster.

Categories: Companies

Can your engineering team deliver one product every week to the marketplace?

The Electric Cloud Blog - Fri, 05/10/2013 - 18:32

I recently read an excellent note on http://www.zdnet.com/samsung-announces-one-new-handset-per-week-how-many-is-too-many-7000014815/  The net-net “wow” of this article can be summed up in this one sentence “Between January and the end of April, the South Korean manufacturer announced an average of one new smartphone per week.”

Now, these devices are not simple to manufacture. They include high-end, mid-range and low-end phones and tablets, each with a unique form factor, software and value-added packages. It makes you wonder how the h*ck these guys are delivering products so quickly and with such high quality (you don’t hear about many Samsung recalls!)

Without a doubt, Samsung has automated their entire development and delivery process. They have perfected the art of making devices, from microwaves and camcorders to semiconductors and tablets. But they have also mastered the art of delivering the software that powers these devices (and software, by the way, makes up most of the IP and differentiation in these products). They know how to write the software and customize it for their devices; they also know how to build, test and release this software fast enough.

At Electric Cloud, we have spent a decade helping customers do just that: deliver software fast (at the speed of business). We help our customers build, test and release software quickly and with quality. We cannot tell you if Samsung is a customer (but feel free to read between the lines). And if you want to deliver software like these experts, drop us an email and we can show you how.

Categories: Companies

Yocto/OpenEmbedded Bitbake Build Visualization

The Electric Cloud Blog - Fri, 05/10/2013 - 10:07

As ecosystems such as the Yocto Project and OpenEmbedded become more popular for embedded device development, usage of the underlying bitbake build tool is increasing. Bitbake is a powerful tool used to manage, build and integrate complete operating system images through package and distribution management activities such as fetching source code, configuration, cross-compilation and installation.

As with all software build systems, the speed and performance of bitbake are of utmost importance and a key enabler for productivity and quality. This has been confirmed many times in my conversations with key contributors and users in the Yocto and bitbake communities, where there is a lot of focus and ongoing discussion about what can be done to improve performance.

So a valid question is, can we understand how a bitbake build currently behaves and from that, determine where or if there are opportunities for improvements?
In this post I will explain my process of understanding bitbake build performance and provide some useful utilities that I hope the bitbake community will benefit from as the never-ending quest for additional build performance continues!

What’s available today in terms of bitbake build visualization and performance analysis?

After following the Yocto Project Quick Start guidelines and exploring the bitbake build process and its artifacts, I ran into a folder called buildstats under the build's tmp directory (TMPDIR). This folder has the following structure:

buildstats/
   [target architecture] (e.g. core-image-sato-qemux86)/
      [timestamp]/
         [packages]/
            do_compile
            do_configure
            :
            :
            do_unpack

Looking at one of these do_[task] files (e.g. do_compile) reveals a lot of interesting data:

>cat do_compile
   Event: TaskStarted
   Started: 1367381613.31
   xkbcomp-1.2.4-r8.0: do_compile: Elapsed time: 3.25 seconds
   CPU usage: 23.3%
   EndIOinProgress: 0
   EndReadsComp: 0
   :
   :
   StartTimeWrite: 1725266628
   StartWTimeIO: 1725869092
   StartWritesComp: 0
   Status: PASSED
   Ended: 1367381616.57

So for each task we get the start time, the end time and a bunch of other potentially useful data. Interesting!
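To show how little is needed to read these files, here is a minimal sketch of a parser for one do_[task] file. It assumes the "Key: value" layout shown above, plus the special "Elapsed time" line; other Yocto releases may write different fields. The name `parse_task_stats` is hypothetical and not part of bitbake.

```python
import re
from typing import Dict, Union

# Matches e.g. "xkbcomp-1.2.4-r8.0: do_compile: Elapsed time: 3.25 seconds"
ELAPSED_RE = re.compile(r"^(\S+): (do_\w+): Elapsed time: ([\d.]+) seconds$")

def parse_task_stats(text: str) -> Dict[str, Union[str, float]]:
    """Parse one buildstats do_<task> file, assuming the 'Key: value' layout above."""
    stats: Dict[str, Union[str, float]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = ELAPSED_RE.match(line)
        if match:
            stats["package"] = match.group(1)
            stats["task"] = match.group(2)
            stats["elapsed"] = float(match.group(3))
            continue
        # Everything else is treated as "Key: value"; lines without ": " are skipped.
        key, sep, value = line.partition(": ")
        if sep:
            stats[key] = value
    # Started/Ended are epoch timestamps; convert them so durations can be computed.
    for key in ("Started", "Ended"):
        if key in stats:
            stats[key] = float(stats[key])
    return stats
```

For the sample above, Ended minus Started is about 3.26 s while the reported elapsed time is 3.25 s, so it is worth picking one of the two consistently.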

Has anyone in the bitbake or Yocto Project communities already done some profiling and analysis using this data?
It turns out there is a utility called pybootchart that can generate a static SVG visualization as a vertical listing of all tasks in a bitbake build. Being used to analyzing build performance with ElectricInsight, I found that the pybootchart visualization fails to scale with the amount of data presented, and provides very few additional actionable metrics and reports to help me understand where my bottlenecks and opportunities for improvement are.

So an interesting question is: is there a way to leverage ElectricInsight to visualize and understand bitbake build behaviour and performance using the bitbake buildstats data? It turns out it was fairly trivial to implement a script that transforms all this data into an ElectricInsight-compatible annotation file, which we can use to derive actionable takeaways such as the effects of bitbake concurrency and overall task-by-time reporting. Details on how to get this script from the public Electric Cloud GitHub repository follow below.
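The conversion script itself is not shown in this post, and the ElectricInsight annotation format is not reproduced here. As a hedged sketch of only the input side, the following walks one buildstats/[target]/[timestamp]/ directory from the layout above and collects a start/end record per task. `TaskRecord` and `collect_tasks` are hypothetical names.

```python
from pathlib import Path
from typing import Dict, List, NamedTuple

class TaskRecord(NamedTuple):
    package: str  # per-package directory name, e.g. "xkbcomp-1.2.4-r8.0"
    task: str     # task file name, e.g. "do_compile"
    start: float  # epoch seconds from the "Started:" line
    end: float    # epoch seconds from the "Ended:" line

def collect_tasks(timestamp_dir: Path) -> List[TaskRecord]:
    """Gather every task file under buildstats/<target>/<timestamp>/<package>/do_*."""
    records: List[TaskRecord] = []
    for task_file in timestamp_dir.glob("*/do_*"):
        if not task_file.is_file():
            continue
        fields: Dict[str, str] = {}
        for line in task_file.read_text().splitlines():
            key, sep, value = line.strip().partition(": ")
            if sep:
                fields[key] = value
        # Files missing either timestamp are skipped.
        if "Started" in fields and "Ended" in fields:
            records.append(TaskRecord(task_file.parent.name, task_file.name,
                                      float(fields["Started"]), float(fields["Ended"])))
    return sorted(records, key=lambda r: r.start)
```

Sorting by start time gives a timeline that can then be written out in whatever format the visualization tool expects.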

Using ElectricInsight to visualize and understand effects of bitbake concurrency

Bitbake supports at least two levels of parallelism: multi-threading within the bitbake task execution mechanism (BB_NUMBER_THREADS) and passing the -j flag to the underlying calls to make (PARALLEL_MAKE). The three screenshots below are from bitbake builds run with varying levels of concurrency on a physical 8-core server with 20GB of RAM and decent disk performance (concurrent threads on the y-axis, time on the x-axis, and individual tasks represented by the colored boxes):

BB_NUMBER_THREADS=8 / PARALLEL_MAKE=8:
[Screenshot: Bitbake Build Performance]

BB_NUMBER_THREADS=12 / PARALLEL_MAKE=12:
[Screenshot: Bitbake Build Performance]

BB_NUMBER_THREADS=16 / PARALLEL_MAKE=16:
[Screenshot: Bitbake Build Performance]

As you can see from these three configurations, for this particular build on this particular machine, 8-way concurrency delivers the best performance at roughly 74 minutes, and the threads appear tightly packed. As you scale up the concurrency, two phases appear in the build, at roughly the 32-minute and 50-minute marks, where gaps of idle threads indicate serializations; these are the obvious places to start if I were to attempt to optimize this build. When running ElectricInsight with such an annotation, it is easy to pinpoint which tasks are the likely culprits behind these serializations.
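The idle gaps in the screenshots can also be located programmatically. The minimal sketch below (hypothetical function names, and an arbitrary 50% threshold) sweeps over task start/end events to count how many tasks are active over time, then reports spans where fewer than half of the configured threads are busy as candidate serializations. It works on plain (start, end) pairs such as the Started/Ended values from buildstats, and is not how ElectricInsight computes its view.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of one task
Point = Tuple[float, int]       # (time, number of tasks active from this time on)

def concurrency_profile(intervals: List[Interval]) -> List[Point]:
    """Sweep start/end events and record how many tasks are active after each timestamp."""
    events = [(start, +1) for start, _ in intervals] + [(end, -1) for _, end in intervals]
    # Sorting by (time, delta) processes ends (-1) before starts (+1) at equal times,
    # so a task starting exactly when another ends is not double-counted.
    events.sort()
    profile: List[Point] = []
    active = 0
    for t, delta in events:
        active += delta
        if profile and profile[-1][0] == t:
            profile[-1] = (t, active)  # collapse several events at the same timestamp
        else:
            profile.append((t, active))
    return profile

def low_concurrency_spans(profile: List[Point], threads: int,
                          min_fraction: float = 0.5) -> List[Interval]:
    """Merge consecutive stretches where fewer than min_fraction * threads tasks are active."""
    limit = threads * min_fraction
    spans: List[Interval] = []
    for (t0, active), (t1, _) in zip(profile, profile[1:]):
        if active < limit and t1 > t0:
            if spans and spans[-1][1] == t0:
                spans[-1] = (spans[-1][0], t1)  # extend the previous span
            else:
                spans.append((t0, t1))
    return spans
```

Feeding the spans back into a list of tasks that were running during each one gives a short list of suspects to inspect first.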

Apart from showcasing the ElectricInsight bitbake build visualizations through the three screenshots above, I don’t aim to make an exhaustive analysis at this point.

Using ElectricInsight to understand relative task by time distribution

ElectricInsight has a built-in report, called “Job Time By Type”, that can be used to visualize a heat map of where your build is spending its time. The categorization into different job types is done automatically through some clever identification and mapping under the hood of the tool. Unfortunately, these categories in ElectricInsight are currently fixed and not customizable. To enable this categorization, I built some mapping logic into the conversion script; the most significant task-to-job-type mappings are listed below.

Let’s take a look at what we get:
[Screenshot: ElectricInsight “Job Time By Type” report]

As you can see, the do_configure, do_compile and do_package tasks together account for roughly 77% of the total runtime. I must admit that the relative significance of the do_configure task, with an average runtime of 25s, was a bit surprising to me and would be interesting to explore further.
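A rough equivalent of this breakdown can be computed directly from buildstats durations. This sketch (the name `time_share_by_task` is hypothetical) sums runtime per task type and reports each type's share of the summed task time. Because parallel tasks overlap, these are shares of total task time rather than wall-clock time, and they need not match ElectricInsight's own report exactly.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def time_share_by_task(durations: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Map each task type (e.g. "do_compile") to its percentage of summed task time.

    `durations` yields (task_name, seconds) pairs, one per executed task.
    The result is ordered from the largest share to the smallest.
    """
    totals: Dict[str, float] = defaultdict(float)
    for task, seconds in durations:
        totals[task] += seconds
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {task: 100.0 * seconds / grand_total for task, seconds in ranked}
```

Summing the top entries of the result gives the kind of "these three tasks account for X%" figure quoted above.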

For context, other significant bitbake tasks in this build are:

   Filesystem I/O: do_install
   Exist: do_package_write_rpm
   Code gen: do_populate_sysroot

This is cool! How can I use this tool to visualize and better understand my own bitbake build?

It’s really simple:

In your bitbake build environment, simply run the following script downloaded from GitHub and open up the resulting annotation-file in ElectricInsight:
   bitbake_buildstats_annogenerator [path-to-bitbake-build-stats] [anno-outfile]

What else could be done with this tool?

ElectricInsight is a very powerful tool for build optimization, troubleshooting and analysis. There are a number of possible further capabilities that could be built into the bitbake-to-annotation conversion script:

  • Embed the stdout from each bitbake task, for easy search and troubleshooting
  • Leverage additional data-points from the bitbake buildstats task files for further metrics and data visualization

Are you finding this useful, or do you have any feedback from using this tool? Don’t hesitate to let us know!

Categories: Companies

pulse 2.5.25 released

Latest Zutubi News - Thu, 05/09/2013 - 02:00

Pulse 2.5.25 has been released. This is a stable build in the 2.5 series. Changes include:

  • A fix for personal build client patch failures when using SSL and Java 7.

See the release notes for full details.

Pulse 2.5 packages are available from the downloads page.

Categories: Companies

5 aspects that make Continuous Delivery for Embedded different

The Electric Cloud Blog - Fri, 05/03/2013 - 09:52

There are a lot of general theories and principles available on the concept of Continuous Delivery – and how an ideal product development organization should strive to continuously deliver release-ready product to end-users on every change of software, hardware, configuration or data. There is also an abundance of available material and practical recommendations on how to make Continuous Delivery work in the world of hosted cloud- and web-based product development, i.e. when you as a product development organization typically own and manage the end-to-end chain from development to operations.

Every product development environment is complex in its own right. In this post I aim to explain and discuss some challenges and hurdles in applying Continuous Delivery theories and principles to embedded product development. The embedded and intelligent systems markets are huge, representing opportunities in a trillion-dollar market for the organizations that succeed. For the sake of this discussion, this market includes but is not limited to manufacturers and suppliers in the following industries: automotive, aerospace and defense, medical devices, mobile and consumer electronics, networking and telecom infrastructure, semiconductor, and energy.

If you are exploring a Continuous Delivery implementation for your embedded product development organization, you will, at the time of this writing, find little relevant practical reference material, especially if you’re looking at this from an enterprise-scale perspective. Let me point out one great reference that is well worth the investment to read, learn from and take inspiration from: “A Practical Approach to Large-Scale Agile Development: How HP Transformed LaserJet FutureSmart Firmware”. The book describes the transformation journey of the embedded printer development team at HP, starting from what I would call a very common baseline in terms of the problems and challenges they were facing. The authors present some really interesting thinking and practical solutions, backing up the results with very impressive before/after statistics and metrics.

It’s important to note the complexity of realizing Continuous Delivery; it will not be easy and will require a lot of hard work – regardless of your industry, working environment within your organization, and baseline starting point. You need to recognize and understand that a successful Continuous Delivery project and implementation is a long-term, complex and ambitious transformation of your organization. It is not a turnkey solution or tool, and involves all dimensions of your R&D organization. To succeed, you need a solid platform consisting of tooling and infrastructure, process and configuration management, and finally people and change management. Learning from others that have walked the same path and are willing to share their experiences will greatly help you avoid a number of common mistakes.

While the vast majority of the general theories and concepts defining Continuous Delivery are equally valid for embedded product development organizations, the typical technical environments are vastly different from what’s commonly referenced. I will cover each aspect individually later in this post, but legacy, infrastructure, lead times and compliance are all very common challenges (in many cases intertwined with each other) that need to be addressed if you are to succeed with your Continuous Delivery implementation. Finally, it is also important to understand that the end goal of a Continuous Delivery implementation for an embedded product development organization is typically very different from what it would be for an organization building on web or cloud technologies.

1. Legacy

The typical norm for embedded product developers is that there are large if not enormous amounts of existing IP and technology that new and future products depend on. Across the hundreds of embedded development teams I’ve interacted with in my past 10 years in this industry, I can think of only a couple that had the luxury of starting from a clean sheet with their design, implementation and organization.
The practical consequences of this legacy are manifold and very complex: product architectures, massive codebases, team organizations, build systems, test environments… As an example, one embedded product development organization that I am actively working with is currently managing a growing legacy codebase of 130 MLOC that’s been around for two decades, with no signs of stabilizing or slowing down in terms of growth – the codebase has in fact grown by almost 50% in the last two years!

Retrofitting this legacy into a Continuous Delivery model is not easy and is likely to be expensive, but it is almost guaranteed to be a worthwhile effort, especially if you have a longer-term vision and intend for your product to stay competitive in the future.

Adapting the architecture of your legacy embedded system to fit a Continuous Delivery model is challenging and often very cumbersome, as the product architecture is the natural guiding principle for how most product development teams organize themselves and their work. More often than not, multiple layers of platform, framework and application components are deeply intertwined, resulting in complex monolithic codebases. Rarely do I find that these components and parts of the system can be handled individually in a way that allows for separate delivery and release streams, which is almost a necessity for a successful and efficient realization of Continuous Delivery.

2. Infrastructure

I have no hard data to back up the following claims, but if such rankings ever existed, I’d say that embedded product developers would likely top both the “compute cores per capita” and the “test environment cost per capita” lists. So compared to other product developers, it’s fair to say that embedded developers stand out in a couple of ways with respect to their development infrastructure needs.

  1. Satisfying embedded developers’ insatiable hunger for compute infrastructure.
    Building an embedded device is a complex project involving both hardware and software components, supported by massive amounts of compute infrastructure. Whether you’re integrating all the various components of your system into an image that will run on the device or developing some specific functionality, you almost certainly have an insatiable need for more resources. More concretely, most embedded software development is done in native C/C++, known for long and CPU-intensive builds. The most obvious and common way to optimize and accelerate build times today is to throw lots of hardware at the problem, parallelizing the build process across the available cores on the developer machine or build server. As an interesting reference example, the Android platform build requires 48+ CPU cores on a single machine to maximize performance. One way of satisfying these needs for compute infrastructure is to buy and deploy large numbers of standard off-the-shelf servers, but with the ever-growing amount of code that needs to be built and managed for any embedded product, you are setting yourself up for a costly and never-ending race against Dr. Gordon Moore!
    (Other native programming language paradigms are emerging with promises to overcome some challenges with respect to build times – but it will take many years if not decades for any of these languages to reach mainstream popularity in the embedded software industry, if it ever happens.)

    As your Continuous Delivery implementation scales and cycle times need to be shortened, your development teams will demand even more computational power to properly serve the increased load of software builds, tests and analysis jobs. These days, supplying the necessary compute power for some of these workloads while preserving economies of scale is a complex but fairly well understood problem – with centralized development clouds and dedicated backend high-performance compute infrastructure being common ways to satisfy your needs for large-scale efficient software builds, analysis and emulator processing.
  2. Managing automation of physical target-based testing.
    Another major difference for embedded developers is the problem of how to efficiently integrate and manage automation of physical target-based testing. The need for proper, automated testing on the actual embedded hardware is ever-present and something I don’t expect to ever go away: I have yet to hear of any embedded product development team being allowed to release a product without testing on the real physical embedded hardware. And if you rely on manual configuration and deployment of your physical targets, it’s unreasonable to expect an efficient and always-available Continuous Delivery environment. These physical targets are also typically custom hardware, very expensive and quite often at a prototype stage, and therefore fragile. Given their cost and maturity, I have never heard of a product development team with an abundance of these targets, so it is of utmost importance to maximize utilization of the ones you have. Alternatives exist that reduce the dependency on the actual physical targets, such as sophisticated full-system simulators that can run unchanged production binaries in managed simulated environments.

    The final aspect of automating physical target-based testing in your Continuous Delivery implementation is the actual technical integration: how to properly interact with and orchestrate the System-Under-Test (SUT). The details of this topic are very specific to the target in question, deserve a technical blog post or paper of their own, and are out of scope for this discussion.
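Orchestration details are out of scope here, but the point about maximizing utilization of a few expensive boards can be illustrated with a minimal, hypothetical sketch: test jobs lease a board from a shared pool, block until one is free, and always return it, even when the test run fails. `TargetPool`, `lease` and the board names are invented for illustration; a real lab would also need health checks, power cycling and image flashing, none of which is shown.

```python
import queue
from contextlib import contextmanager
from typing import Iterator, List, Optional

class TargetPool:
    """Hand out scarce physical test targets to one job at a time (hypothetical sketch)."""

    def __init__(self, targets: List[str]) -> None:
        self._free: "queue.Queue[str]" = queue.Queue()
        for target in targets:
            self._free.put(target)

    @contextmanager
    def lease(self, timeout: Optional[float] = None) -> Iterator[str]:
        # Blocks until a target is free; raises queue.Empty if the timeout expires.
        target = self._free.get(timeout=timeout)
        try:
            yield target
        finally:
            # Return the board even if the test run raised, so it is never leaked.
            self._free.put(target)
```

Because `queue.Queue` is thread-safe, several test runner threads can share one pool; a job simply waits in `lease()` instead of failing when all boards are busy.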

3. Lead times

Long lead times are detrimental for the productivity of any product development team, and making sure the end-to-end cycle time of the build-test-release workload is as short as possible should be a key priority for anyone implementing and scaling a Continuous Delivery environment.
As a concrete example of where the embedded market is today in terms of managing lead times, the Yocto Project is a groundbreaking, thriving and active community focused on providing a common framework for managing, creating and building custom embedded Linux devices. In my discussions with embedded developers currently using the Yocto Project, performance improvements stand out as the primary request.

As previously mentioned, most embedded developers currently rely on C/C++ for their software development. This has significant consequences for lead times. If you compare typical baseline build and analysis lead times for this native C/C++ programming paradigm with managed environments such as Java and .NET, there is an order-of-magnitude difference that needs to be addressed in a successful Continuous Delivery implementation.

Fortunately, mature and sophisticated solutions exist for build, test and analysis acceleration that can reduce lead times by up to 90-95%, which could mean bringing hours of runtime down to a few minutes, if not seconds. If your current build, test and analysis process takes anywhere near as long as it takes your developers to refill their cup of coffee, my recommendation would be to prioritize this as a key improvement. Accelerated builds and tests will pay dividends not only for your Continuous Delivery implementation but also for your developers in their day-to-day edit-compile-test cycles.
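As a rough worked example of what such percentages mean (the 120-minute baseline is an assumption chosen for illustration, not a measured figure): a lead-time reduction r leaves a fraction 1 - r of the original time, which is a speedup S of 1/(1 - r).

```latex
\begin{aligned}
t_{\text{new}} &= (1 - r)\,t_{\text{old}}, \qquad S = \frac{t_{\text{old}}}{t_{\text{new}}} = \frac{1}{1 - r} \\
r = 0.90:\quad t_{\text{new}} &= 0.10 \times 120\ \text{min} = 12\ \text{min}, \qquad S = 10 \\
r = 0.95:\quad t_{\text{new}} &= 0.05 \times 120\ \text{min} = 6\ \text{min}, \qquad S = 20
\end{aligned}
```

Going from 90% to 95% halves the remaining wait, which is why the last few percentage points matter so much in practice.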

4. Compliance

Many embedded developers in, e.g., the automotive, aerospace, defense and medical device industries need to meet rigorous compliance, security, safety and auditing standards in order to ship products to market; example standards include MISRA, DO-178B/C, ISO 26262 and IEC 62304. Verifying compliance with these regulatory requirements is a complex, costly and time-consuming task, which obviously has negative implications for anyone trying to implement an efficient Continuous Delivery solution.

Fortunately there exist integrated automation solutions to reliably and securely manage policies and compliance for auditing purposes, as well as acceleration mechanisms that will help you run your comprehensive security analysis and testing faster and more often.

5. The End Goal

When you hear of Continuous Delivery implementations at companies such as Facebook, Netflix, Etsy, Gap and FamilySearch it is important to understand that all of these companies serve their customers and users through hosted web and cloud solutions, where the companies themselves own and are responsible for the end-to-end development-to-production infrastructure. In this delivery model, it makes total sense to strive towards an incremental release and customer shipment of every product change.
The typical end goal of Continuous Delivery in the context of embedded product development is somewhat different, in that you most often neither own nor control the final destination and end-user target environment. But don’t let the fact that you can’t walk that extra mile to deliver incremental value to your end users on every product change move the goalposts for your embedded Continuous Delivery implementation. In this context, I like to think of the goal of Continuous Delivery as the constant or instant availability of a “shippable”, compliant, functional product, ready to be delivered to the market at any time at the push of a button.

Regarding the lack of control and ownership of end-user embedded target environments, technology is changing the game here as well.
Examples are the various Over-The-Air (OTA) mechanisms that exist today to automatically deliver upgrades to embedded devices, used for e.g. mobile phones and set-top boxes, and even in cars such as the new Tesla Model S! Because upgrades disrupt end users’ usage and behavior in ways that can’t be controlled, OTA-based product upgrades are rolled out infrequently and cannot be compared to how a modern website is constantly upgraded on every change.
But with software becoming more and more important as the differentiating value proposition, and various forms of sophisticated wireless technology becoming more and more trusted as secure and reliable carriers of data, I expect OTA-based upgrade mechanisms to continue to evolve and mature in the near future, broadening in use and applicability to most if not all embedded product industries. This will have an interesting effect on future embedded product development organizations, opening up the possibility of end-to-end Continuous Delivery and potentially leading to every change triggering an upgrade in the end-user target environment.

Conclusion

Based on my experience, this post has discussed some of the most significant and interesting differences and challenges in implementing Continuous Delivery for embedded product development. Again, it’s important to recognize and understand that a successful Continuous Delivery project and implementation is a long-term, complex and ambitious transformation of your organization. It is not a turnkey solution or tool, and involves all dimensions of your R&D organization. To succeed, you need a solid platform consisting of tooling and infrastructure, process and configuration management, and finally people and change management.

Did I miss anything? Do you disagree? Or does it make total sense? Please let me know directly or post a comment below!

Categories: Companies

Meet the Butler at Jenkins User Conference Palo Alto

CloudBees' Blog - Tue, 04/30/2013 - 15:00
As of the end of February 2013, Jenkins had more than 57,000 active installations (a conservative number) - up more than 60% in the last year - and more than 600 plugins. Our Fall 2012 Jenkins survey showed that 83% consider it a mission-critical tool. 


2013 Palo Alto Jenkins User Conference

Wednesday, October 23, 2013
Palo Alto Jewish Community Center
The Jenkins User Conference (JUC) provides the perfect venue for everyone – Jenkins experts and newbies alike – to learn more about the Jenkins continuous integration server, share knowledge, network, and build an even stronger open source community.

JUC Palo Alto 2013 features a keynote by Jenkins founder and most significant contributor Kohsuke Kawaguchi and two full tracks of presentations by Jenkins experts from the community (hopefully including you!). Light breakfast, lunch, snack and Jenkins conference freebie (usually an envy-inspiring t-shirt, but maybe we’ll surprise you this year) are included for everyone.


What You Need to Know

  • Register and join the fun! Early-bird tickets are only $54 through August 2.
  • Call for Papers ends June 9 – if you have exceptional Jenkins knowledge to share, please submit an abstract to present (scroll to the bottom for the form).
  • Sponsorship – please drop a note to 'juc-oc-ext AT cloudbees DOT com' if you would like to show your awesomeness and support the Jenkins community... or even host a Jenkins event yourself.


To get a feel for the conference, check out the video & slides from 2012 JUC San Francisco and the video highlights and slides from the inaugural JUC in October 2011.

Finally, a special shout-out to the many sponsors who have already flocked to support JUC:

We expect the conference to sell out, so secure your spot now!

Can’t make it to California for Palo Alto JUC? Check out JUC Israel on June 6 and the Jenkins User Event in Copenhagen on September 9th.


Categories: Companies

Is It Worth The Time?

The Electric Cloud Blog - Mon, 04/29/2013 - 08:01

The current webcomic on xkcd.com titled “Is It Worth The Time” has a fantastic table listing how long you can work on making a routine task more efficient before you’re spending more time than you will save, based on a 5-year payback plan.

[xkcd comic: Is It Worth the Time?]
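The arithmetic behind such a table is simple: time saved per run, times runs per day, times days in the payback window. A minimal sketch, assuming 365-day years and a 5-year window as described above (`break_even_minutes` is a hypothetical helper, not something from the comic):

```python
MINUTES_PER_DAY = 24 * 60

def break_even_minutes(seconds_saved_per_run: float, runs_per_day: float,
                       years: float = 5.0) -> float:
    """Total minutes saved over `years` (365-day years, leap days ignored).

    Spending less than this on the optimization means it pays for itself.
    """
    total_runs = runs_per_day * 365 * years
    return seconds_saved_per_run * total_runs / 60.0
```

For example, shaving 5 minutes off a task done once a day gives `break_even_minutes(300, 1)` = 9,125 minutes, roughly six days. Applied to builds, a hypothetical 10-minute saving on a build run 20 times a day works out to 365,000 minutes, about 253 days of around-the-clock time over five years.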

As a company spending most of our time and effort focusing on the general problem of accelerated software development and delivery, this is obviously great data and icing on the cake that can be used to justify the value of our solutions and offerings.

How much time and effort are you spending today optimizing and accelerating your software builds, tests and release processes?

Categories: Companies

pulse 2.5.24 released

Latest Zutubi News - Sun, 04/21/2013 - 02:00

Pulse 2.5.24 has been released. This is a stable build in the 2.5 series. Changes include:

  • A security fix for password hash exposure in configuration audit logs.

See the release notes for full details.

Pulse 2.5 packages are available from the downloads page.

Categories: Companies

Looking to transform your large-scale development organization?

The Electric Cloud Blog - Wed, 04/17/2013 - 02:14

Stuck in old legacy? Fighting a slow development process? Are complex dependencies between teams and product architecture prohibiting your ability to innovate? Struggling to adopt or scale Agile?

A few weeks back I read “A Practical Approach to Large-Scale Agile Development: How HP Transformed LaserJet FutureSmart Firmware” by Gary Gruver, Mike Smith and Pat Fulghum.
This is a very hands-on and quick read that articulates practical solutions to many common issues and challenges in a large-scale development organization: legacy product architecture, team organization, quality issues, and efficiency of product delivery. The book presents a practical interpretation and realization of Lean and Agile methodologies, applied in the context of large-scale embedded product development. The reported before/after metrics are astoundingly impressive; I’m leaving the detailed metrics out of this post for readers of the book to discover.

The whole approach at HP Firmware and their transformation is admirable and very intriguing to read about. Below are my main takeaways and what I appreciated learning about the most:

  • Their vertical “thin-slicing” approach to refactoring a legacy product architecture in order to quickly understand and meet business objectives.
  • Their planning and estimation management using the role of a System Engineer and common Agile planning and estimating techniques. Essentially a System Engineer at HP Firmware is an experienced engineer with enough talent and oversight to understand both internal technical engineering and external customer and market needs – an interesting role that I would say is fairly uncommon in other similar organizations.
  • Their diligent and phased test automation management. Fully integrated and automated into their Continuous Delivery process, I appreciated their approach and process of constantly monitoring and optimizing what tests to run at what phase of their test automation implementation.
  • “The key is not to manage by metrics but to use the metrics to understand where to have conversations about what is not getting done”. Very well said.

If you’re looking for a concrete, real-world case study on how to transform a large-scale development organization, I can definitely recommend getting a copy of this book. Read it through, then start discussing with your peers to agree on, identify and understand your goals, needs and current pain points. From there on, it’s an ongoing process of Continuous Learning and Improvement!

Good luck!

Categories: Companies

3 Steps to Automate Your Way to Agile

The Electric Cloud Blog - Tue, 04/16/2013 - 02:56

Join our embedded webinar presented by VDC on Tuesday, April 16th, at 9AM PDT / 12PM EDT / 4PM GMT

Register now:

http://www.electric-cloud.com/about/events-webinar-041613-electric.php

We all know that Agile enables software organizations to continuously deliver working software faster to customers (internal or external).

This helps software teams not only deliver products faster but also stay in tune with changing market needs. In practice, however, organizations still struggle to get the full benefits of the Agile methodology because they have not fully automated their practices (development, build, test, release).

Join us to hear Christopher Rommel, VP of M2M and Embedded Technology at VDC, address the fundamental issues and recommendations you should consider as you adopt Agile:

- Drivers for adopting Agile

- Critical organizational, process and tooling issues to consider and pitfalls to avoid

- Recommendations on how to do this right by automating your processes

I will be co-presenting with Christopher and look forward to your participation on this great topic. This 1-hour webinar will be held on Tuesday, April 16th at 9am PDT.

Register now:

http://www.electric-cloud.com/about/events-webinar-041613-electric.php

Categories: Companies

pulse 2.5.23 released

Latest Zutubi News - Tue, 04/16/2013 - 02:00

Pulse 2.5.23 has been released. This is a stable build in the 2.5 series. Changes include:

  • A bug fix for permission enforcement when pinning builds.

See the release notes for full details.

Pulse 2.5 packages are available from the downloads page.

Categories: Companies

ElectricAccelerator 7 – pushing the boundaries of build acceleration, again

The Electric Cloud Blog - Sun, 04/14/2013 - 00:17

Today, Electric Cloud is announcing the immediate availability of ElectricAccelerator 7.0. This release brings significant new innovations and performance enhancements to the market for anyone looking to optimize and accelerate their software build environment.

We have publicly launched and talked about some of the new capabilities of this release already, back in February at the Android Builders Summit – here is a blog about what was presented.

The marquee features of ElectricAccelerator 7.0 are Parse Avoidance and Dependency Optimization:

  • Parse Avoidance significantly reduces makefile parse time. By caching and reusing parse results, this feature can speed up both full builds and incremental builds.
  • Dependency Optimization improves build performance by optimally scheduling the workload based on the actual dependencies, efficiently removing any superfluous dependency information.
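To make the general idea behind parse-result caching concrete, here is a generic memoization sketch; it is not ElectricAccelerator's implementation, which this post does not describe. A parse result is cached under a fingerprint of the makefile contents and reused until any of them changes. `ParseCache` and the stand-in parse function are hypothetical, and a real make parse also depends on environment variables, command-line variables and dynamically included files, which this sketch ignores.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, Iterable, List

class ParseCache:
    """Reuse a parse result while none of its input files have changed (generic sketch)."""

    def __init__(self, parse: Callable[[List[Path]], object]) -> None:
        self._parse = parse
        self._cache: Dict[str, object] = {}
        self.misses = 0  # number of times the real (expensive) parse ran

    @staticmethod
    def _fingerprint(files: List[Path]) -> str:
        digest = hashlib.sha256()
        for path in sorted(files):
            # Hash each file name together with a fixed-length hash of its contents.
            digest.update(str(path).encode() + b"\0")
            digest.update(hashlib.sha256(path.read_bytes()).digest())
        return digest.hexdigest()

    def get(self, files: Iterable[Path]) -> object:
        file_list = list(files)
        key = self._fingerprint(file_list)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._parse(file_list)
        return self._cache[key]
```

Keying on content rather than timestamps means a touched-but-unchanged makefile still hits the cache, at the cost of reading every input file on each lookup.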

Apart from performing the upgrade, existing ElectricAccelerator customers will be able to take advantage of this release without any changes to their build environment.

Below is a table of some of the internal benchmarks we have run as part of qualifying this release. Both builds are Android-based, with stock vanilla Android Jelly Bean 4.1.1 on the left and CyanogenMod 10.0 on the right.
We used a 48-core machine for all the benchmarks above; percentages in blue refer to the relative performance improvement when Dependency Optimization and Parse Avoidance are enabled.
The columns named “48 agents, Remote” show the benchmarks when ElectricAccelerator was configured in distributed build cloud mode, with all computational workload federated over the network to a remote 48-core machine through the ElectricAccelerator cluster architecture. The “48 agents, EADE” columns show the results when ElectricAccelerator Developer Edition was used on that single multi-core machine, with no capability to distribute work across remote machines. As you can see, both full and incremental builds improve significantly in both setups!

Categories: Companies

Electric Cloud selected as 2013 DevOps Cool Vendor

The Electric Cloud Blog - Fri, 04/12/2013 - 20:01

Today, Electric Cloud was selected as a 2013 Cool Vendor by Gartner. We strongly believe that the selection is a testament to the value that we provide to Dev and Ops teams in today’s fast-paced application release process.

Gartner has always considered automation a key cornerstone of DevOps, and Electric Cloud solutions automate and accelerate the build-test-release-deploy processes. More importantly, our solutions provide a common set of tools that can be used by both the Development (Dev) and Operations (Ops) organizations. Dev and Ops can use the same deployment solution (ElectricDeploy), which improves the reliability of deployments; Dev and Ops also share a common release management process (ElectricCommander) that increases the visibility and quality of the released application.

The net result is that our solutions help Dev and Ops work well together and improve the efficiency of the application release process. And that is exactly the benefit that hundreds of our customers have seen with our solutions.

We are very pleased with the Cool Vendor selection. Stay tuned: we will be announcing many more innovative solutions targeting the DevOps market.

Categories: Companies

Want to be a DevOps “Ninja”?

Nolio - Application Service Automation - Thu, 04/11/2013 - 11:16

We’d all love to be DevOps ‘Ninjas’ – ready and equipped to overcome all the challenges thrown at us on a daily basis. However, the truth is that most of us are merely ‘white belts’, struggling to get past the incessant deployment roadblocks obscuring our path.

Help is at hand! We can help you overcome these roadblocks with four techniques and tools:  

Tip One: Know the Process

Having a single pathway to production is vital for implementing a successful DevOps culture.

It’s important to make sure that you have approvals (manual or automated) set up between every stage of the process. Your choice of process automation technology should therefore depend on its ability to model both automated and manual approvals. Even if you want a fully automated continuous delivery platform, it’s better to adopt technology that can handle both kinds of approval when necessary.
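
A minimal sketch of such approval gates, assuming each stage is guarded either by an automated check (a predicate over the build's results) or by a manual sign-off. The `Stage` and `promote` names and the result keys are hypothetical; this is not the data model of any particular release automation product.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Stage:
    """One stage of a delivery pipeline, guarded by an approval gate."""
    name: str
    # Automated gate: a predicate over the build's results.
    # None means entering this stage needs a manual sign-off instead.
    auto_approve: Optional[Callable[[dict], bool]] = None


def promote(stages: list[Stage], results: dict, signoffs: set[str]) -> list[str]:
    """Return the stages a build is promoted into, in order.

    Promotion stops at the first gate that is not approved, so every build
    follows the same single path and no stage can be skipped.
    """
    reached = []
    for stage in stages:
        if stage.auto_approve is not None:
            approved = stage.auto_approve(results)
        else:
            approved = stage.name in signoffs
        if not approved:
            break
        reached.append(stage.name)
    return reached


if __name__ == "__main__":
    pipeline = [
        Stage("dev", lambda r: r["unit_tests_passed"]),
        Stage("qa", lambda r: r["integration_tests_passed"]),
        Stage("production"),  # manual approval
    ]
    results = {"unit_tests_passed": True, "integration_tests_passed": True}
    print(promote(pipeline, results, signoffs=set()))           # ['dev', 'qa']
    print(promote(pipeline, results, signoffs={"production"}))  # ['dev', 'qa', 'production']
```

Mixing both gate types in one list is what lets the same pipeline run fully automated today and still pause for a human approval where a team needs one.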

Tip Two: Automate Your Processes

It doesn’t matter whether you’re in development, QA or operations – I’m sure that you’ve all been in the frustrating situation where it works on your machine but errors are occurring in another environment.

So what can we do to avoid this? If development and operations work closely together, there shouldn’t be any surprises about the choice of architecture or hardware. Close collaboration should enable operations to provide environments that very closely reflect production. Virtualization is key! You need to be able to spin up instances on demand in order to avoid hardware-driven bottlenecks. If you give developers environments that closely resemble production, you can test changes against properly configured environments in the early stages of the deployment process.
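
One way to check how closely an environment reflects production is to compare their settings directly. The sketch below assumes each environment can be described as a flat dictionary of settings; the function name, keys and version strings are made-up examples.

```python
def config_drift(production: dict[str, str], candidate: dict[str, str]) -> dict:
    """Report settings that differ between production and another environment.

    Returns {key: (production_value, candidate_value)}; a key missing on one
    side shows up as None on that side.
    """
    keys = sorted(production.keys() | candidate.keys())
    return {k: (production.get(k), candidate.get(k))
            for k in keys if production.get(k) != candidate.get(k)}


if __name__ == "__main__":
    # Hypothetical environment descriptions, for illustration only.
    prod = {"os": "RHEL 6.3", "jdk": "1.7", "app_server": "Tomcat 7"}
    dev = {"os": "RHEL 6.3", "jdk": "1.6", "app_server": "Tomcat 7", "debug": "on"}
    for key, (p, d) in config_drift(prod, dev).items():
        print(f"{key}: production={p!r} dev={d!r}")
```

An empty result means the two descriptions match; anything else is a surprise you would rather find in development than in production.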

Tip Three: Enable Reproducible Deployments

There are many different types of deployment automation. At the very least, most people have some kind of basic script that automates deployment at some level. As architectures and environments increase in complexity, so do the scripts. However, the scripts don’t always work for every environment, and their authors are often too busy to maintain them once they’ve become too complex. Writing scripts that integrate with multiple systems and deploy products isn’t usually a core competency of a company.

Luckily, there are tools on the market like Puppet and Chef that make it easy to perform many low-level operations. However, there is still a need for deployment automation that can integrate all aspects of the software development lifecycle. Custom integrations can be complex, time-consuming and fragile, but they are critical in establishing and maintaining end-to-end traceability.
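
Tools like Puppet and Chef are built around declaring a desired state and converging to it, so that running the same deployment twice changes nothing the second time. The Python sketch below illustrates that idempotence for plain files only; `ensure_file` and `apply_state` are hypothetical helpers, not part of either tool.

```python
import os
import tempfile


def ensure_file(path: str, content: str) -> bool:
    """Converge `path` to contain exactly `content`; return True if it changed."""
    try:
        with open(path, encoding="utf-8") as f:
            if f.read() == content:
                return False  # already in the desired state: do nothing
    except FileNotFoundError:
        pass  # file absent: fall through and create it
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return True


def apply_state(desired: dict[str, str]) -> list[str]:
    """Apply a whole desired state and return the paths that actually changed."""
    return [p for p, c in sorted(desired.items()) if ensure_file(p, c)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        desired = {os.path.join(root, "conf", "app.properties"): "port=8080\n"}
        print(apply_state(desired))  # first run: the file is created
        print(apply_state(desired))  # second run: [] because nothing drifted
```

Because each step checks before it acts, the same script is safe to rerun against any environment, which is what makes a deployment reproducible.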

Tip Four: Put all the Pieces Together

Most companies won’t throw out technologies that are already in place. Most deployment automation tools can handle integrations when applications are being deployed, but there are many ‘touch points’ in the process. These include issue tracking, continuous integration, test automation, requirements management and application monitoring. The process management layer should be able to exchange information with these systems, either by receiving it or by pulling it from them. With a higher-level process management framework in place, you ensure end-to-end traceability.
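
A rough model of that traceability idea, assuming a process management layer that keeps one record per release and collects references from each touch point either by receiving them (push) or by querying for them (pull). The class, system names and the stand-in fetch function are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TraceRecord:
    """Links one release to artifacts in other tools (hypothetical model)."""
    release: str
    links: dict[str, list[str]] = field(default_factory=dict)

    def receive(self, system: str, ref: str) -> None:
        """Push model: another tool (say, a CI server) sends us a reference."""
        self.links.setdefault(system, []).append(ref)

    def pull(self, system: str, fetch: Callable[[str], list[str]]) -> None:
        """Pull model: we query another tool for references to this release."""
        for ref in fetch(self.release):
            self.receive(system, ref)

    def gaps(self, required: list[str]) -> list[str]:
        """Touch points that have contributed nothing yet, i.e. traceability holes."""
        return [s for s in required if not self.links.get(s)]


if __name__ == "__main__":
    rec = TraceRecord("1.4.0")
    rec.receive("continuous_integration", "build-512")
    # Stand-in for an issue tracker query; a real integration would call its API.
    rec.pull("issue_tracking", lambda release: ["BUG-101", "BUG-107"])
    print(rec.gaps(["issue_tracking", "continuous_integration", "test_automation"]))
```

Reporting the gaps, rather than only the links, is what turns a pile of integrations into something you can audit.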

All of the tips and techniques listed above will help you get on the path to being a DevOps ‘Ninja’. They will help you remove your deployment roadblocks and eliminate your problems easily and with minimum effort.

This article, Want to be a DevOps “Ninja”?, is based on an original post in DevOps Angle.

Categories: Companies

Introducing Syntax Highlighting for the Edit Step page in Google Chrome

The Electric Cloud Blog - Wed, 04/10/2013 - 17:50

At Electric Cloud we have a history of using what we produce. This means I get to use ElectricCommander on a daily basis and find myself frequently creating and editing steps within the Commander Web UI. A recurring problem that comes up is how to debug a script defined in Commander that has an error. How many times have you seen an error like this:

Global symbol “$j” requires explicit package name at C:\Users\nvaze\AppData\Local\Temp\ecmdrAgent/agent.QXEM7ALX703PCTRY.run-4030-27875.cmd line 14.
Execution of C:\Users\nvaze\AppData\Local\Temp\ecmdrAgent/agent.QXEM7ALX703PCTRY.run-4030-27875.cmd aborted due to compilation errors.

Let’s take a look at the current UI and see if we can spot the error.

Screenshot of current Edit Step page in the ElectricCommander Web UI.

Hmm, it is pretty hard to count line numbers, and reserved words do not show up in different colors. There had to be a better way, so I wrote a quick Chrome extension that uses the CodeMirror project to provide syntax highlighting.

After you install the Chrome extension this is what you should see:

Screenshot of syntax highlighted edit step page.

Notice how the text area now has line numbers and reserved words pop out. Line 14 is now very easy to pick out. As a daily Commander user this makes my life easier when viewing arbitrary steps.
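
The extension solves this visually inside the browser. For the same effect outside the browser, here is a small Python sketch (not part of the extension) that pulls the line number out of a Perl error message like the one above and prints the numbered lines around it. The regular expression assumes the "... line N." wording shown in that message; the function names are hypothetical.

```python
import re
from typing import Optional


def error_line(message: str) -> Optional[int]:
    """Extract the line number from a Perl-style '... line 14.' error message."""
    match = re.search(r"\bline (\d+)\b", message)
    return int(match.group(1)) if match else None


def numbered_context(script: str, line_no: int, radius: int = 2) -> str:
    """Show the lines around `line_no`, each numbered, with the culprit marked."""
    lines = script.splitlines()
    start, end = max(1, line_no - radius), min(len(lines), line_no + radius)
    return "\n".join(
        f"{'>>' if n == line_no else '  '}{n:4d}  {lines[n - 1]}"
        for n in range(start, end + 1)
    )


if __name__ == "__main__":
    msg = ('Global symbol "$j" requires explicit package name at '
           r'C:\Temp\agent.run-4030-27875.cmd line 14.')
    script = "\n".join(f"statement {i};" for i in range(1, 21))
    print(numbered_context(script, error_line(msg)))
```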

Installation Instructions:
To enable syntax highlighting, install the Chrome extension from the Chrome web store and visit the Edit Step page for any step in your favorite procedure. Syntax highlighting will turn on automatically and defaults to the Perl language. There is a dropdown selector which allows users to change between various programming languages. Currently, the extension supports these languages: Perl, Shell, JavaScript, Python, Ruby and TCL.

Notes
1. The extension is only available for Google Chrome.
2. There is no compile time checking to show mistakes in real time.
3. The extension will only be active on web pages that match “commander*editStep*” (i.e. Commander Edit Step pages).
4. The extension will only work for ElectricCommander v4.2 and later.
5. For now, the extension is in beta and is a side project of mine, but I’m very interested in feedback!
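
Note 3 describes a wildcard URL filter. As a rough illustration, assuming the pattern is a glob that may match anywhere in the page URL, the check could look like this in Python. The function name and example URLs are invented, and Chrome's actual content-script matching rules differ in detail.

```python
from fnmatch import fnmatchcase

# Pattern quoted in the notes. Treating it as a glob that may match anywhere
# in the URL is an assumption made for this sketch.
EDIT_STEP_PATTERN = "commander*editStep*"


def is_edit_step_page(url: str, pattern: str = EDIT_STEP_PATTERN) -> bool:
    """True if the URL contains the wildcard pattern (case-sensitive)."""
    return fnmatchcase(url, f"*{pattern}*")


if __name__ == "__main__":
    for url in (
        "https://commander.example.com/commander/link/editStep/projects/Default",
        "https://commander.example.com/commander/link/jobDetails/jobs/1234",
    ):
        print(url, is_edit_step_page(url))
```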

Download the Google Chrome extension here.

Chrome is a trademark of Google Inc.

Categories: Companies