DevOps Mar-Apr2019 TheConnection

The Adventure of Making Git Work on NonStop

What you are about to read is based on facts, but the dates are approximate. It’s all really a blur at this point.

Porter’s Log – Stardate 4 (well, we really don’t have Stardate yet, so let’s just call it March 2016): I found a version of git online dating back to May 2014, version 2.0.0, that worked a little on NonStop. Jojo had worked on it and built it, and had about seven patches contributed and accepted. It hung a lot and would not act as a server, but it was functional enough to get us going.

I had been a member of the ITUGLIB team for about six years by that point, and we were on a quest to modernize our build processes so that some of our core ports, like OpenSSL and OpenSSH, which were seeing an uptick in downloads, could be kept current. By this point, we had inherited a lot of ports from the past, along with the GUARDIAN contributions, all stored in tar and ZIP files. Our CVS port worked fine, so we could do basic version control, but none of the Open Source packages we needed used it. The traditional solution Open Source used was Subversion (SVN), but rumours were flying that it was going to be replaced. But with what? Well, we did not really know at the time, and attempts at porting Subversion had failed because it had dependencies that did not compile on NonStop. Our two security ports, OpenSSL and OpenSSH, maintained what they called mirrors at GitHub in git repositories, so we thought, “Hmmm, let’s try to get git to work, at least as a client”.

Porter’s Log – June 2016: The git port is driving me up a wall. It works for trivial repositories that we clone from GitHub, but not for the big ones. We need to figure out why. There’s some incompatibility.

So began the year-long effort to resolve the four or five key problems with git on NonStop that were causing hangs. The deepest dives were done by Jojo and me, and I learned more about eInspect than I ever wanted to know. Between integrating FLOSS, modifying buffer lengths, changing maximum I/O sizes, working around calls that were not implemented, playing with socket buffering, and back-porting FLOSS changes, the work was rather grueling. The team probably spent 500 hours on this. And finally we thought we had a working git.

Porter’s Log – October 2016: The git port is working as a client and is stable! Anyone have any champagne? Is anyone going to talk about this at TBC? And does anyone know why the server-side still hangs randomly?

Around this point, we had converted much of our ported code over to git. We had OpenSSH and OpenSSL in a reasonable state where fixes could be pulled down and deployed in about an hour, with little human intervention: a few git merge commands and a test run. We had also built out our Jenkins scripts so that the build/test/deploy phases were all automated. But we were still frustrated by not being able to use git as a server on NonStop.

Porter’s Log – November 2016: I’m not sleeping. Between the git hang and getting NSGit release-ready, I have no life. We’ve gone through the communication code top-to-bottom and can’t see anything that would hang only on NonStop. What’s going on? We reported the hang to the git team. Maybe some wisdom will come of that.

Around January 2017, the git development team picked up a few motivated individuals – or maybe they just got inspired – and releases started to come out much more frequently. This loaded up the ITUGLIB team, who tried desperately to keep up with the changes. We were a bit worried about keeping up, but so far, the changes were mostly compatible. There were a few conflicts with each new release, but those could be managed.

Porter’s Log – March 2017: A note came through about a hang in the Windows port that sounded really familiar. Like many bug reports from the git team, it came with a proposed fix.

This was a Eureka moment. Weeks prior to this, we had isolated the problem to a region of code small enough that we could pass it off to the Global NonStop Support Centre (GNSC). We had code compiled with symbols and a reproducible test case. They had the tools to look inside the operating system, where the hang was being experienced. Within a month, the Windows fix, together with GNSC’s identification and confirmation of where the actual hang was, pointed to a file control setting that Linux allowed but Windows and NonStop did not. NonStop was reporting an error where git was expecting success, so git kept retrying. The hang was found.
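For readers who want to picture how a rejected setting can turn into a hang, here is a minimal sketch in Python. It is not the git code and not the actual fix. The function name, the set of "transient" error codes, and the attempt limit are all hypothetical, chosen only to illustrate the pattern of a retry loop that has to tell a temporary failure from a platform that will never accept the request.

```python
import errno

# Error codes this sketch treats as temporary (an assumption for illustration).
TRANSIENT = {errno.EAGAIN, errno.EINTR}


def apply_with_retry(apply_setting, max_attempts=5):
    """Call apply_setting() until it succeeds; return the number of attempts used.

    apply_setting is a hypothetical callable that raises OSError on failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            apply_setting()
            return attempt
        except OSError as exc:
            if exc.errno not in TRANSIENT:
                # The platform rejected the setting outright. Retrying can
                # never succeed, so surface the error instead of looping.
                raise
            # Otherwise the failure may be temporary, so try again.
    # Bounded retries: give up rather than spin forever.
    raise RuntimeError(f"setting still failing after {max_attempts} attempts")
```

A loop without the permanent-error check and without the attempt limit would retry forever on a platform that always reports an error, which from the outside looks exactly like a hang.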

Porter’s Log – June 2017: Call me paranoid, but I’m still suspicious. I want to leave some of our fixes in place until things are proven to work.

So began our slow removal of differences between the NonStop port and the core git code. A few of our trivial fixes, like limiting the maximum I/O size, were accepted by the git team, who mostly made the changes themselves. Each release that went out moved closer and closer to the standard git code. In case anyone is tracking this, we were just earlier than the git 2.5 version by this time. Git remained stable, with each careful removal of our hacks, uh… no longer necessary fixes.

Porter’s Log – August 2017: I am determined to make git work with NonStop SSH. As a server, it works fine, but as a client, we’re having issues.

Back to GNSC, and we eventually got a working structure. Fast forward to mid-2018.

Porter’s Log – August 2018: I think we’re there! The ITUGLIB team wants to submit the changes to the git team. This shouldn’t take long. There are only about six changes left.

Was I ever wrong about that feeling. Little did I know how sticky the git team is when you try to submit something. Jojo was able to get a one- or two-line change through, but the big batch of changes we had to make was another matter. I must admit it felt a bit like talking to myself, because I am a real stickler for explaining and justifying changes. The weird part, and I’m sure there is irony here, is that the git team uses GitHub as the main repository but does not use the issue tracking or approval mechanisms. I guess they don’t want to get into a recursive mess because GitHub depends on git, and git does not want to depend on GitHub.

There was one change that was really weird. I found a call to an inappropriate method and fixed it back in early 2017 and tried to contribute the change. It was accepted and forgotten. Each new release still had the wrong call. I kept nagging the team. Eventually they decided that I did not correct the error handling code inside and outside the call – who knew? – so they had rejected the change but did not bother to tell me. That was months of elapsed negotiation.

We had to remove a few of our custom changes, like being able to package git with special tar options – we moved those to Jenkins instead of convincing the git team to support custom tar options. It took almost six months of going back and forth, with five revisions of the submission to get the change comments into just the proper form, and the code changes just right.

Porter’s Log – November 2018: If I have to remove one more trailing space from a line of code and resubmit my changes, I’m going to go nuts and I’m taking you all with me.

Finally, just around the time I was going to lose it, the changes were accepted. Of course, that does not mean that the code magically shows up in a release. So began the slow grind of going through the process, which is actually a variant of GitFlow called Gitworkflow with a bunch of integration branches.

First the code moved to the pu branch, which stands for proposed updates. This is a staging branch for changes that might make it into a release. The change sits there until it is considered stable, at which point it moves to the next branch, which will become the next release, or even the maint branch for imminent fixes. Finally, a release is created, and the changes are merged in and tagged with the version number of the git release, which had not happened at the time this article was written. The ITUGLIB team had a virtual celebration at one of our weekly meetings when we did a git fetch on git and our changes showed up.
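If you want to watch a change work its way through those branches yourself, something like the following Python sketch would do. It assumes git is installed and that the remote-tracking branches are named after the integration branches. The pu and next names come from the process described above; master is assumed here to be the branch releases are cut from, and the function names are made up for illustration. The command it runs, git branch -r --contains, lists the remote-tracking branches that already include a given commit.

```python
import subprocess

# Integration branches in the order a change graduates through them.
# "pu" and "next" are from the article; "master" is an assumption.
STAGES = ["pu", "next", "master"]


def furthest_stage(branch_names):
    """Return the most advanced stage that contains the change, or None."""
    # Drop the remote prefix, so "origin/next" is compared as "next".
    short_names = {name.strip().split("/", 1)[-1] for name in branch_names}
    reached = [stage for stage in STAGES if stage in short_names]
    return reached[-1] if reached else None


def remote_branches_containing(commit, repo_dir="."):
    """List the remote-tracking branches that contain the given commit."""
    result = subprocess.run(
        ["git", "branch", "-r", "--contains", commit],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    lines = [line.strip() for line in result.stdout.splitlines()]
    # Skip blank lines and symbolic refs such as "origin/HEAD -> origin/master".
    return [line for line in lines if line and "->" not in line]
```

After a git fetch, calling furthest_stage(remote_branches_containing(commit)) with the hash of one of your own commits would report how far it has travelled.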

In just a few more weeks, git will contain all the changes required to run on NonStop in OSS, which was our goal. We will no longer need to ship the git source code because all of our port changes will be available in the standard git code base. Customers will be able to build git all on their own, if they want to and have all of the dependencies – warning, it’s not a small list.

Thanks are due to the entire ITUGLIB team, with a special nod to Jojo, to GNSC and NonStop Development, who found the incompatibility with the Linux code that was behind the hang, and to the git committers who put up with our inexperience.

For git, the hard work is done for now. We still have to build, test, and deploy each release. We have Jenkins for that. At least having to merge and resolve conflicts is a thing of the past. Now, the road leads to doing the same with OpenSSL and OpenSSH.

