February 26, 2007
During the last days I was heavily debugging my profiler listening application and wondered what could go wrong.
The current attempt is easy: I use PostMessage() to send the profiling information to the listening application. Of course, when you are doing lots of stuff, there is some heavy throughput, but nothing that should influence any of the applications. Neither the profiled application, nor the profiler itself. But the profiler user interface was almost dead when it came to lots of messages to be parsed.
It tried everything, handling multiple messages at one time, to get rid of some update() calls, use more stepping while processing the data, I even disabled updating the processed amount of data itself.
During one of these frustrating debugging sessions to figure out what is going on, I nearly wanted to kill the application. So I called the Task Manager of Windows XP, and what did I see there? Visual Studio EE took around 90% of CPU usage. So I called the profiler from console itself and guess! It ran like hell, without any problems, without any blocking.
The result of all this is, that Visual Studio seems to filter each message being send to your application while you are debugging it. There might be a reason for it, but currently I do not see a point in doing so. If your application crashes in the event handler, you will get a valid stack crawl anyway.
So finally I can continue work on the profiling application itself…
February 20, 2007
While I was continuing work on the function profiler I talked about earlier, I started coding a small application, which should visualize some results. I thought I encountered a bug in the profiler itself, so trying to isolate the problem directed me to a container issue.
The app is written in Qt, and it uses QMap quiet heavily to store data and have fast access to some elements. I thought there might be some problems with templates then, because the output repeated itself again and again, so I wrote a small testapp like this:
int main(int, char**)
map = 4;
Very short app, isn’t it? It inserts one element and clears the map afterwards. So what is your guess on how many functions are getting called? 3? Of course not. 10? Still more. It is 40 function calls including the main itself.
I was heavily shocked and thought that QMap is much too overheaded. But after a short talk to a friend, he suggested that I should try out the STL being included in the Platform SDK. So simply changing the app to
int main(int, char**)
map2 = 4;
resulted in 249 function calls. This is even more scary and now I am happy that I decided to use Qt for data storage
For the ones of you, who are interested in the exact functions, I attached two text files, which show the calls made during that 5 lines of code.
My guess is, that the actual function call count gets much reduced in non-profiling mode, as inline functions are not inlined in this situation, but still it is impressive and it gives you a good hint, that you should be careful which functions you call or which data types you want to use blindly.
Qt Call History
STL Call History
February 18, 2007
This is a question developers tend to ask their own code from time to time. Some weeks ago, I had to do some investigation on profiling applications for Windows derivatives at work. Although we decided to take an already existing project, I was interested in how one can catch what goes on with your code. That’s how this all started and today is part one of a series of entries about function call profiling.
Starting to read on the MSDN I found some interesting articles. When you compile your code with the /Gh option, the compiler automatically adds a call to a function with the prototype
extern "C" void __declspec(naked) _penter( void );
in the function prolog. The function prolog is actually the part of the function before variables get initialized or any code has been executed. For more information look at here.
The __declspec(naked) part describes that _penter() does not have a prolog and neither an epilog exists. This is very interesting, as a line like
simply does not work. The compiler will give up. So automatic variable assignments are done in the function prolog. Because of this, you have to write
The same applies for exiting a function. It is the /GH option described on this page. The function prototype is the same, so we use
extern "C" void __declspec(naked) _pexit( void );
By the way I would like to remark, that it is my personal taste to prefer extern “C” over _cdecl. They mean the same.
Now let us take a look at the assembler part. As I had nearly no clue about assembler (and to be honest I only now a little bit more by now), this was one of the most challenging parts. The MSDN docs linked above only show that you need to protect the profiled function by pushing the registers before you can do anything. The problem with this is, that we need one specific information stored in these registers. The value of ebp with some offsets is exactly the position, where the profiled function starts. With this position we can calculate the function name later.
Before I show the final version of the assembler part, one additional issue needs to be solved. For each variable you instantiate, you need to allocate the memory for it in the assembler code. Remember the function is naked, so nothing is done for you. Near to capitulation I found this. It describes, how you can use __LOCAL_SIZE for letting the compiler do the allocation for you.
Finally, when you are done do not forget to write your own epilog with freeing memory, and restoring the old registers.
unsigned int framePtr;
mov ebp, esp
sub esp, __LOCAL_SIZE
mov DWORD PTR [framePtr], ebp
// do stuff here
mov esp, ebp
So now that we have our functions, which get called when each profiled function enters or leave, we want to do something useful. So we continue on evaluating the function name. The resulted form will be ClassName::functionName.
The procedure is the following:
- Get the module name
- Access the symbol table
- Get the function name
- Clean up
If you like to take a look at the implementation, simply download the files attached to this article.
The profiler gets compiled as a static library, which then gets linked to your project you want to profile. Take care that you never compile the library with the /Gh nor /GH option, because then you would try to profile the profiler itself and will most likely end in an endless recursion.
Additionally I need to emphasize that you should not try this on large applications. As you all know, profiling slows your application down. But using printf inside of the profiler heavily kills the performance like every I/O call does. This will get fixed in a future version.
Also I would like to mention projects floating around on the web, which took also care about profiling. Take a look here and here. Both projects behave a little bit different. They use the _penter() function, but reset the return value of the profiled function. So when the function wants to returns, actually a _thunk() function gets called, which then calls _pexit(). Afterwards you need to reset the return value to the real one, otherwise things get invalid. This attempt seems to be a little bit overheaded for the current stuff and it only gains, that applications need to be compiled with /Gh and not the additional /GH flag.
If you want to try it out, download this zip file and have fun.
Next time I will have added some stuff on creating statistics and a small app displaying the results. I hope you liked the article and if you have questions or like to comment something, feel free to…
February 17, 2007
As some people have strongly the opinion that everything should be for free, I accommodate and like to share the following links with you.
The first ones are YouTube links, a place where you can find almost everything. Todays topic is “misheard” lyrics.
The first one is from Trivium and was posted on Dan’s page, so I did not explore this one myself. Beside the fun of this movie I recalled Trivium with a different style, so I even learned something new about them. Seems like they went more to a metalcore style. Here’s the link.
Following a much older entry of the band Nightwish with some misheard lyrics to the song Wishmaster. I still feel sorry for the apprentice being fired for farming, but watch yourself.
Finally someone gave me link on IRC to a great application called FretsOnFire. For those of you, who know Guitar Hero, it is a familiar concept. You press keys on your keyboard, which symbolize a tone or chord on a guitar. The concept is as easy as fetching and challenging. On of the biggest pro aspects of it is its availability for the most known operating systems Microsoft Windows, Linux, BSD and Mac OSX also. Additionally you are able to import Guitar Hero original songs, it resamples it to oggVorbis and MIDI files. So what are you waiting for. Get it and test your guitar skills.
Finally I updated to a new WordPress version as I wanted to try out some source code visualization plugins. Additionally spam is getting bigger and bigger. Some friends even decided to remove all comments. This was a step I never wanted to go and therefore investigated on some alternative solutions. The one I found is quiet funny and hopefully functional. It is called “Do you pass math?” and asks you a simple math question when you want to post a comment. Look here to download it or test it yourself.
Unfortunately the blogroll was broken for this theme and still is some sort of. The problem behind this is that links are now bookmarks. Bookmarks are categorized to the specific article you are currently reading. As no article has Blogroll as category, the blogroll was empty. I did a quick hack for now, but also read on the author’s page, that he is working on an update. So I will just wait for him.
To summarize the update process I am still happy that most of it worked fine, although it took much more time than there was scheduled for it. But this might also be related to the Snooker match currently being shown on TV
Good fight, good night…
Update: I installed a syntax higlighting plugin available here
int main(int argc, char**argv)
QCoreApplication app(argc, argv);