by Garrett Serack on June 17, 2009 03:47pm
Previously, I talked about using PGO in the PHP build process. In order to use it, I had to observe...
The Heisenberg build process
"A process cannot be understood by stopping it. Understanding must move with the flow of the process, must join it and flow with it." - The First Law of Mentat, quoted by Paul Atreides to Reverend Mother Gaius Helen Mohiam
Really, what I needed was a tool in two parts. The first would watch what happens during the build process, and the second would take that data and spit out some .vcproj files.
When I want to see what's happening on my own system, I use ProcMon, a Sysinternals tool that monitors processes: what files they touch, what commands get executed, etc. I grabbed it and tried to watch what happens when you run NMake on the PHP makefile. It turns out there are a few problems with that. ProcMon isn't very scriptable (making it tricky to automate), and even if it were, it truncates command lines in its log files once they pass a certain length.
I found nothing else that did quite what I needed, so I started thinking about how to write a tool that does the same thing. In the past I have used Detours (an API detouring library built by Microsoft Research) to build a couple of quick-and-dirty snoop/debugging tools. Starting with a sample that came with the Detours library, I cobbled together a tool that would watch a process and its children, record every file read or written and every command issued, and dump it all into an XML file that I could process later.
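The post doesn't describe the layout of that XML log, so the following is only a sketch under an assumed schema. Each traced `process` element here carries hypothetical `id`, `parent` and `command` attributes plus `read`/`write` children for the files it touched. The sketch shows how the second half of the tool might load such a log in Python using the standard library's `xml.etree.ElementTree`. `SAMPLE_LOG`, `ProcessRecord`, `load_trace` and `children_of` are illustrative names, not part of the real tool.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical log schema: the real tool's XML layout is not described in the post.
SAMPLE_LOG = """<trace>
  <process id="1" parent="0" command="nmake /f Makefile"/>
  <process id="2" parent="1" command="cl.exe /c sapi\\cli\\php_cli.c">
    <read path="sapi\\cli\\php_cli.c"/>
    <read path="main\\php.h"/>
    <write path="Release\\php_cli.obj"/>
  </process>
  <process id="3" parent="1" command="link.exe /OUT:Release\\php.exe Release\\php_cli.obj">
    <read path="Release\\php_cli.obj"/>
    <write path="Release\\php.exe"/>
  </process>
</trace>"""


@dataclass
class ProcessRecord:
    """One traced process: who launched it, how, and which files it touched."""
    pid: int
    parent: int
    command: str
    reads: list[str] = field(default_factory=list)
    writes: list[str] = field(default_factory=list)


def load_trace(xml_text: str) -> dict[int, ProcessRecord]:
    """Parse the (hypothetical) trace log into records keyed by process id."""
    root = ET.fromstring(xml_text)
    records: dict[int, ProcessRecord] = {}
    for proc in root.iter("process"):
        rec = ProcessRecord(
            pid=int(proc.get("id")),
            parent=int(proc.get("parent")),
            command=proc.get("command", ""),
            reads=[r.get("path") for r in proc.findall("read")],
            writes=[w.get("path") for w in proc.findall("write")],
        )
        records[rec.pid] = rec
    return records


def children_of(records: dict[int, ProcessRecord], pid: int) -> list[ProcessRecord]:
    """Return the processes launched directly by `pid`."""
    return [r for r in records.values() if r.parent == pid]


if __name__ == "__main__":
    trace = load_trace(SAMPLE_LOG)
    for child in children_of(trace, 1):
        print(child.command, "->", child.writes)
```

Keeping the parent id on every record lets later stages walk the process tree. For example, they can pick out each `cl.exe` or `link.exe` that NMake spawned without having to re-parse the makefile.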
Creating the project files
At the same time, I began working on a tool that would generate .vcproj files from the data gathered during the make process. I first tried assembling the .vcproj XML directly from what I knew about the project file layout, but as the build got trickier, it became harder to make sure the XML came out the way Visual Studio expected. I turned to the Visual Studio SDK to see if there were any COM objects I could use to manipulate project files. There were, but they aren't documented in great detail, and they were really designed to be used inside Visual Studio for automation. Having scoured the planet, I found some examples of using the VCProjectEngine to generate project files.
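Before any project file can be written, the generator has to decide which sources belong to which binary. The post doesn't say how that mapping was done, so here is one plausible approach, sketched in Python. It follows each object file from the compiler step that wrote it to the linker step that read it. It also treats any file one link step reads that another link step wrote (such as an import library) as a project dependency. The `TRACE` tuples, the paths, and `infer_projects` are all hypothetical, and forward slashes are used only to avoid escaping.

```python
# Hypothetical trace records: (tool, files read, files written).
# In practice these would come from the build-monitor log; paths are illustrative.
TRACE = [
    ("cl.exe", ["sapi/cli/php_cli.c", "main/php.h"], ["obj/php_cli.obj"]),
    ("cl.exe", ["main/main.c", "main/php.h"], ["obj/main.obj"]),
    ("cl.exe", ["Zend/zend.c", "Zend/zend.h"], ["obj/zend.obj"]),
    ("link.exe", ["obj/main.obj", "obj/zend.obj"], ["out/php5.dll", "out/php5.lib"]),
    ("link.exe", ["obj/php_cli.obj", "out/php5.lib"], ["out/php.exe"]),
]


def infer_projects(trace):
    """Return {binary: {"sources": [...], "depends_on": [...]}} from trace records."""
    obj_to_sources = {}
    link_steps = []
    for tool, reads, writes in trace:
        if tool == "cl.exe":
            # A compile step maps each object it wrote to the .c files it read.
            sources = [p for p in reads if p.endswith(".c")]
            for obj in writes:
                obj_to_sources[obj] = sources
        elif tool == "link.exe":
            link_steps.append((reads, writes))

    # Every file a link step wrote is owned by that step's first (primary) output.
    owner = {f: writes[0] for _, writes in link_steps for f in writes}

    projects = {}
    for reads, writes in link_steps:
        sources = [s for obj in reads for s in obj_to_sources.get(obj, [])]
        depends_on = sorted({owner[f] for f in reads if f in owner})
        projects[writes[0]] = {"sources": sources, "depends_on": depends_on}
    return projects


if __name__ == "__main__":
    for binary, info in infer_projects(TRACE).items():
        print(binary, info)
```

A structure like this is what would then be handed to whatever writes the project files, whether hand-built XML or an object model such as the VCProjectEngine.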
For a couple of solid weeks, I worked on the project-file generator: compiling, testing, tweaking, etc. I finally reached the point where I could generate a complete project file that would compile php.exe and php5.dll. With that in hand, I built PHP with PGO instrumentation, ran the bench.php script from the PHP source directory to train it, and then re-linked the project. On that first attempt, I saw about an 18% improvement in speed over the previous version!
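The post doesn't list the exact compiler and linker switches used for PHP. The sketch below shows the generic Visual C++ PGO sequence it describes: compile with `/GL`, link with `/LTCG:PGINSTRUMENT`, run a training workload (here, bench.php), then re-link with `/LTCG:PGOPTIMIZE` so the linker uses the collected profile. Those switches are standard Visual C++ options. The `pgo_plan` function, its parameters, and the file names are hypothetical, and the function only builds the command lines rather than running them.

```python
def pgo_plan(sources, output, training_cmd):
    """Return (phase, [command, ...]) pairs for a PGO build; nothing is executed."""
    # Name each object after its source so /Fo places it predictably.
    objs = [src.rsplit(".", 1)[0] + ".obj" for src in sources]
    link = ["link.exe", f"/OUT:{output}", *objs]
    return [
        # /GL (whole-program optimization) is required for the /LTCG PGO modes.
        ("compile", [["cl.exe", "/c", "/O2", "/GL", f"/Fo{obj}", src]
                     for src, obj in zip(sources, objs)]),
        # Produces an instrumented binary plus a .pgd profile database.
        ("instrument", [link + ["/LTCG:PGINSTRUMENT"]]),
        # Running the instrumented binary records profile counts (.pgc files).
        ("train", [training_cmd]),
        # Re-link using the collected profile data.
        ("optimize", [link + ["/LTCG:PGOPTIMIZE"]]),
    ]


if __name__ == "__main__":
    for phase, commands in pgo_plan(["php_cli.c"], "php.exe", ["php.exe", "bench.php"]):
        for cmd in commands:
            print(phase, " ".join(cmd))
```

Returning the commands instead of executing them keeps the plan easy to inspect and test. On a machine with the Visual C++ tools on the PATH, each one could be passed to `subprocess.run(cmd, check=True)`.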
"It ain't over 'til it's over, and maybe not then, either." - Slovotsky's Law #29
Well, as anyone who's done software development will tell you, there's the moment when you finally get your program to do what you want under very controlled conditions, and then - quite some time later - there's the moment that you can give the fruits of that labor to someone else so they can do the same thing.
Now that I had finally proven that it was worth the effort to build a PGO-optimized version of PHP, I had to script the process so that it could run in an automated fashion, not just on my computer or a computer in our Lab.
In the final part, I wrap up with the automation of the build and look to where we might go next in PHP.