Some Information of Compiler


Posted on 2004-6-8 18:46:00
C/C++ Compiler Optimization
Dr. Dobb's Journal May 2004

Focusing on speed
By Matthew Wilson
Matthew is a software-development consultant for Synesis Software, creator of the STLSoft libraries, and author of Imperfect C++ (Addison-Wesley, 2004). He can be contacted via http://stlsoft.org/.
--------------------------------------------------------------------------------

In "Comparing C++ Compilers" (DDJ, October 2003), I compared leading Win32 C/C++ compilers against several criteria, including build size, build speed, and execution speed. In this article, I focus exclusively on execution speed. Because the performance of template code is something that is of particular interest to me—and modern C++ practice involves templates to a significant degree—most of the tests performed involve templates. However, there are also two C tests, for reasons that will be explained shortly.

I built all executables with each compiler's maximum speed optimization settings (see Table 1), as far as each allows for the target architecture—Pentium 4—of my test machine. The source and makefiles are available electronically (see "Resource Center," page 5).

There are a couple of issues that need to be addressed from the previous article. First, I made two errors (see http://synesis.com.au/resources/articles/errata/ddj200310.html), the first being that, when I did the Dhrystone test, most of the compilers were optimized for space rather than speed. I don't have any explanation for this; it was not my intention, rather a regrettable oversight. The second misstep was that I failed to apply the -ox flag, in addition to -ot, for the Watcom compiler. This one was plain ignorance, and I thank the chaps from the Open Watcom organization (http://www.openwatcom.org/) for helping me see clearly through the perplexing array of optimization options.

The second issue is that some of the scenarios had been built for size, and their speeds were tested. This was fair in the context of the previous article, since I was examining a raft of compiler characteristics, and optimization for space is a legitimate option that is often advised as the best policy for large systems. However, this message was not well expressed in the article, as I received several e-mails taking me to task on the issue. Furthermore, in light of the tests run for this article, it is clear that compilers vary considerably in how well they deliver speed as a byproduct of size optimization. Some of the scenarios here are the same as those from the previous article, but when optimized for speed, the results for some compilers differ markedly. For others, they are pretty much the same.

Another difference is that the set of compilers to be examined has changed, reflecting changes in the industry over the last six months. Borland 5.6.4 (C++ BuilderX) is used instead of Version 5.6 (C++ Builder 6). Digital Mars is now at 8.38, rather than 8.34. I use the new Intel 8.0 rather than 7.0. Open Watcom is 1.2, rather than 1.0.

I've dropped Visual C++ 7.0 because it's an unnecessary stepping stone from Version 6.0 now that the excellent Version 7.1 is available for free as part of the .NET SDK.

Comeau 4.3.3 is now featured, though Comeau does not yet officially support Win32. Despite this, I felt it was important to include it because it is the only 100-percent standard-conforming compiler currently available. Also note that I have used it with the Visual C++ 6.0 back end. This means that some aspects of the performance may reflect that of Visual C++ 6.0 rather than Comeau's innate abilities. This is an artifact of the Comeau architecture and its reliance on a back-end compiler, and not something we can (expect to) do anything about other than to employ a different back-end compiler. However, if you're a Comeau user on Win32, one thing you might want to do is e-mail the vendor about developing Intel back-end compatibility, as Comeau is a demand-driven (and very responsive) vendor. Note that Comeau uses the Visual C++ 6.0 runtime libraries and Intel uses the Visual C++ 7.1 runtime libraries.

Tests
There are two tests that are exclusively or primarily C—Dhrystone and zlib—both of which I also featured in the previous article.

There are seven C++ tests: auto_buffer, fixed_array, int2string, multi_array, pod_vector, string tokenization (Boost), and string tokenization (STLSoft). For the C++ tests, I've endeavored to isolate any compiler library-specific performance by using the processheap_allocator from WinSTL (the Win32 subproject of STLSoft; http://winstl.org/) for all classes that take allocators. This directly accesses the Win32 heap API for the current process's heap, so all compilers' C++ executables should have the same memory allocation scheme and experience the same conditions.
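To make this concrete, the parameterization involved looks roughly like the following. This is a sketch only: the header name shown is an assumption (it has varied across STLSoft releases), but processheap_allocator itself is the class described above.

#include <vector>
#include <winstl_processheap_allocator.h>   /* header name is an assumption; it differs between STLSoft versions */

/* A std::vector whose element storage comes from the current process's Win32
   heap (HeapAlloc()/HeapFree()) rather than from the compiler's own CRT
   allocator, so every compiler is exercised under the same allocation regime. */
typedef std::vector<int, winstl::processheap_allocator<int> >  int_vector_t;

int main()
{
    int_vector_t    v;

    v.push_back(42);    /* allocation goes through the process heap */

    return 0;
}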

auto_buffer uses the STLSoft template of the same name that efficiently provides local buffers whose sizes are determined at runtime (see "Efficient Variable Automatic Buffer," C/C++ Users Journal, December 2003). This test creates and uses 100-byte buffers, 10 million times; a sketch of this kind of loop appears after these scenario descriptions.
fixed_array and multi_array are rectangular array template tests, conditionally compiled from the same source file. The former uses the STLSoft fixed_array_3d template class, the latter the Boost multi_array template. Both scenarios create dynamically sized arrays of 10×50×100 doubles, then walk through them accessing and setting each element to exercise the indexing functionality of the array classes.
int2string uses the STLSoft integer_to_string template function suite (http://www.cuj.com/documents/s=8943/cujexp0312wilson/) to efficiently perform conversions of 10 million integers to character string form.
pod_vector is an STLSoft template that provides superior performance over std::vector for POD (plain-old-data) types. It achieves this by omitting the destruction of elements, by using direct memory-manipulation functions (memcpy(), for example), and by using auto_buffer. The first two are always beneficial; the last represents an optimization in the average case. This test exercises a range of operations, such as front insertion, back insertion, front erasure, back erasure, and the like.
The two string tokenization scenarios are the same as described in my October 2003 article.
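As promised above, here is the shape of the auto_buffer scenario's inner loop. This is a sketch under assumptions, not the test source: the header name and the order of auto_buffer's template parameters (element type, internal capacity) have varied across STLSoft releases, so treat both as illustrative.

#include <string.h>
#include <stlsoft_auto_buffer.h>    /* header name is an assumption */

void auto_buffer_scenario()
{
    for(int i = 0; i < 10000000; ++i)
    {
        /* 100 bytes are requested at construction; the internal capacity of
           256 means no heap allocation should occur. Parameter order is an
           assumption (see the lead-in above). */
        stlsoft::auto_buffer<char, 256> buf(100);

        memset(&buf[0], 0, buf.size());     /* touch the storage so the work is not optimized away */
    }
}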
Apart from Dhrystone, all the tests are carried out using a custom test harness that executes each compiler/scenario permutation a given number of times, extracts the performance figures via regular expressions, and calculates their averages, discarding the lowest and highest to try to avoid blips and operating-system caching effects. The Dhrystone figures were similarly obtained by calculating the averages of a large number of executions for each compiler.

For the Dhrystone test, the higher the number of Dhrystones per second, the better. For all other tests, lower time indicates better performance; all of these were obtained by measuring the active section of code using the WinSTL performance_counter (see my previous article; http://www.windevnet.com/documents/win0305a/).
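For reference, winstl::performance_counter is essentially a thin wrapper over the Win32 high-resolution counter, so the measurement pattern amounts to the hand-rolled equivalent below.

#include <stdio.h>
#include <windows.h>

int main()
{
    LARGE_INTEGER   freq;
    LARGE_INTEGER   begin;
    LARGE_INTEGER   end;

    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&begin);
    /* ... the active section of the scenario under test ... */
    QueryPerformanceCounter(&end);

    printf("elapsed: %f ms\n", (end.QuadPart - begin.QuadPart) * 1000.0 / freq.QuadPart);

    return 0;
}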

There are two final points to note. Intel 8 is not explicitly supported by the version of Boost (1.30) that I used in this test, and warnings to that effect were printed to the console during compilation of multi_array and string tokenizer (Boost). Despite this, I doubt that anything in the Boost libraries would be handled significantly differently between Intel Version 7 (which is recognized) and Version 8. The superior performance of the Intel compiler for these two scenarios bears this out.

The second issue is that the Digital Mars 8.38 compiler crashes when compiling the auto_buffer and pod_vector scenarios if exceptions are turned on via the -Ae option. Once again, I don't think this affects the results much, but it's only fair to mention it, since all other compilers have exceptions enabled for all C++ scenarios.

I spent time ensuring as much compatibility as possible, but still not all compilers support all of the C++ scenarios. Digital Mars is not supported by the Boost 1.30 configuration I used, despite now conforming almost completely to the standard. Borland experienced internal compiler errors when compiling some parts of Boost. Others had similar issues.

As it turns out, these missing data points are of little consequence. As Table 2 illustrates, even if you adjust the average scores of these compilers to take into account only those scenarios in which they participate, the top five or six ranking places remain unchanged.

As expected, the correct Dhrystone results (Figure 1) paint a different picture to that presented in my October 2003 article. Visual C++ 7.1 is the best and, along with Intel, is head and shoulders above the rest. CodeWarrior also stands out with a good performance. Then come Open Watcom, Visual C++ 6.0, and Comeau close together. Borland, GCC, and Digital Mars fill the last three places, at around 60 percent of the performance of Visual C++ 7.1. The previous test had Digital Mars, Intel, GCC, VC++ 6.0, VC++ 7.1, Borland, CodeWarrior, Open Watcom, in that order; so its results were, indeed, misleading for most of our compilers.

In the zlib scenario, the manipulation of the large file to be compressed is done outside the timed region, so the performance figures obtained represent that of the compression function—zlib's compress()—only. As Figure 2 shows, Borland is the best, closely followed by CodeWarrior, then Open Watcom and Digital Mars. Intel, Visual C++ 7.1, and GCC trail by about 10-20 percent, and Visual C++ 6 and Comeau by about 35 percent.

The first of my C++ tests yields a mixture of expected and surprising results; see Figure 3. Intel and Open Watcom are noticeably superior, followed by CodeWarrior and Visual C++ (6 and 7.1, respectively). GCC and Digital Mars are roughly twice as slow, Comeau around three times, and Borland about five times slower. For such a simple template as auto_buffer, this is not good.

With the fixed_array performance scenario (Figure 4), once again Intel is the best, but with Visual C++ 7.1 snapping at its heels. Next are GCC, CodeWarrior, and Visual C++ 6, about 50 percent slower. Digital Mars is more than twice as slow as Intel, Borland three times, and Comeau around five times. The fixed_array_3d template is more complex than auto_buffer, but it's still surprising to see such a large range in performance.

If you were using Boost's powerful multi_array template class on Win32, the results of the multi_array test (Figure 5) would indicate that you should be using Intel or GCC; nothing else comes close. Visual C++ 6.0, CodeWarrior, and Visual C++ 7.1 all come in about the same, being about three times as slow as Intel. Alas, Comeau seems to be having a hard time, being more than 20 times slower than Intel.

An interesting feature of this test is that it shows that, with Intel, the Boost rectangular array is about on a par with the STLSoft one, which I wrote with performance in mind. For all other compilers, the STLSoft class performs significantly better (up to four times faster), which reflects its simpler, less flexible design.

I think these two rectangular array tests ably show just how challenging an area template optimization can be. The considerably increased sophistication of Boost's multi_array template over STLSoft's fixed_array exposes the difficulties that all compilers—except Intel in this specific case—have in translating the simple logical requirements of a programmer's intent into efficient code. It's no straightforward matter, and it is reckless to write arbitrarily complex code and just assume that the compiler takes care of optimizing it all away for you. This is a serious issue that all fans of template complexity, metaprogramming, and the like, should be aware of.

For integer-to-string conversions (Figure 6) Intel wins again, followed closely by GCC, then Digital Mars and Comeau. CodeWarrior and Visual C++ are next at about twice the cost of Intel. Open Watcom C and Borland bring up the rear.

When I badgered Walter Bright to look at the Digital Mars template optimization of the integer_to_string template, he explained that the compiler was not fully inlining all of the supporting functions. He reworked the compiler so that it does so with Version 8.38, as its previous performance was more in line with that of Borland. I would assume that this is what's happening, to varying degrees, with the slower compilers in this scenario. Indeed, my guess is that inlining depth, or lack thereof, is a major factor in the performance differences throughout the C++ scenarios.

The pod_vector (Figure 7) scenario doesn't throw too many surprises, other than that Comeau gives Intel a serious run for its money. Given the fact that this scenario exercises a number of different aspects of the pod_vector template, coupled with the complexity of pod_vector relative to most of those in the other scenarios presented here, it's impressive that we have such a relatively close grouping over the eight compilers featured in this summary. For a change, Intel is less than twice as fast as its competitors.

Figures 8 and 9 show Boost and STLSoft tokenizer performances, respectively. For Boost's string tokenizer, it is Intel first, and this time with GCC second. Just as with multi_array, these two compilers give the best performance with Boost. CodeWarrior and Visual C++ (6 and 7.1) are also in the game. Borland is more than twice as slow, and Comeau three times.

With the STLSoft tokenizer, Visual C++ 7.1 and CodeWarrior pip Intel for line honors by about 15 percent, which is at least a break from the monotony. Next come GCC and Visual C++ 6 at 25-30 percent slower, and then Borland, Comeau, and Digital Mars at about twice as slow as Visual C++ 7.1.

Conclusion
Any ranking scheme is, of course, arbitrary, so I'll stick to a straightforward one. The compilers score points from 10 for the best performance down to 2 for the worst; those that do not compile or execute for a given scenario are given zero. Table 2 shows the averages of these rankings for the C scenarios, the C++ scenarios, and overall. For those compilers that did not feature in all scenarios, the number of missing scenarios is noted along with an average score for the scenarios in which they did feature.

These rankings seem to indicate that Intel is the fastest compiler, with an impressive average score of 9.22. Since the Intel compiler specifically targets Intel processors, it's not surprising that it delivers superior performance. All's not fair in business, and if you're exclusively targeting the Intel architecture, you probably want to seriously consider the Intel compiler, which provides a variety of optimization options for squeezing out the last drop of performance.

I confess, I was pleasantly surprised by the performance of Visual C++ 7.1 in averaging 7.56; it represents a significant improvement over Version 6, at least as far as these scenarios exercise their capabilities. (It might finally convince me to give up the trusty old Visual Studio 98.) CodeWarrior comes in a close third with 7.44, which is consistent with my expectations and previous experience with it (note that CodeWarrior 9 has just been released and may well perform even better than 8.3). GCC is the best performing of the free compilers, scoring 6.67.

I was surprised to see Visual C++ 6 come in next best, as this challenged several of my preconceptions/prejudices. Comeau was next, which is quite impressive when you consider that it's not yet officially supported on Win32, and that it used Visual C++ 6 as the back end; it's probable that if I'd used it with CodeWarrior or Visual C++ 7.1, it may have scored higher.

The remaining three compilers were all stymied by virtue of not being compatible with all the scenarios, and their average scores are correspondingly low. I've also included a weighted average of just the scenarios in which they did score, as a teaser for what kind of performance we might expect if/when they do support the templates. The next version of Borland is just around the corner. Digital Mars is now almost entirely standards compliant, and we can hope that the Boost configuration will be available soon. Alas, Open Watcom still seems some way from having sophisticated template support, but we can look forward to that time with some eagerness, if its performance in the auto_buffer scenario indicates likely performance in broader template contexts.

We should remember that this test was all about performance of compiled code, and focused on template code at that. We've not discussed conformance (Comeau takes the cake here), or cost (Digital Mars, GCC, Open Watcom, and Visual C++ 7.1 are free; Intel is free on Linux), or usability (CodeWarrior and Visual C++ have the best IDEs, in my opinion), or cross-platform abilities (CodeWarrior, Comeau, GCC, Intel, and Watcom all have versions that work with multiple platforms), or quality of warnings (they all score well on this). Even though some compilers do very well, it's not appropriate to assume that these relative performances will be reflected in, say, manipulation of polymorphic types; not without conducting tests to prove it, anyway. In any case, every compiler in this test acquits itself well in at least one scenario. I'm maintaining an errata/update page for this article at http://synesis.com.au/resources/articles/errata/ddj200405.html. Performance results for newer compiler versions and, heaven forfend, any errata will be available there.

Whatever your work, I advise you to use as many compilers as possible to ensure the best quality of your software. In this context, the speed of some compilers may be secondary to their quality. You might elect to use a conforming compiler to validate your code's correctness, but actually build using a faster but less modern compiler.

Original poster | Posted on 2004-6-8 18:51:00

Go On ...

Comparing C/C++ Compilers
Dr. Dobb's Journal October 2003

It's all about flexibility, portability, efficiency, and performance
By Matthew Wilson
Matthew is a consultant for Synesis Software, as well as author of the STLSoft libraries and the upcoming Imperfect C++ (Addison-Wesley, 2004). He can be contacted at matthew@synesis.com.au or http://stlsoft.org/.
--------------------------------------------------------------------------------

Despite the advent of new programming languages and technologies, C++ is the workhorse for many developers, and is likely to remain so for a long time to come. The main reasons for C++'s prominence are its flexibility, portability, efficiency, and performance. Yes, even with the increase in processing power, software performance continues to be important, and C++ is a language that—when used correctly—provides superior performance in virtually any context.

In this article, I compare nine popular C++ compilers in terms of performance, features, and tools. The compilers are either exclusively Win32 or provide Win32 variants. I conducted all studies on a Windows XP Pro machine (single-processor, 2 GHz, 512 MB) with no other busy processes. The compilers I examine are:

Borland C/C++ 5.6 (C++ Builder 6). http://www.borland.com/products/downloads/download_cbuilder.html
Digital Mars C/C++ 8.34. http://www.digitalmars.com/
GNU C/C++ 3.2 (The MinGW 2.0 distribution). http://www.gnu.org/software/gcc/
Intel C/C++ 7.0. http://intel.com/
Metrowerks CodeWarrior 8.3. http://store.metrowerks.com/
Microsoft Visual C++ 6.0. http://shop.microsoft.com/devtools/
Microsoft Visual C++.NET 2002 (VC++ 7.0).
Microsoft Visual C++.NET 2003 (VC++ 7.1).
Watcom C/C++ 12.0 (Open Watcom C/C++ 1.0). http://openwatcom.org/
As for bias, I confess to having soft spots for Digital Mars, Intel, and CodeWarrior, all of which have helped me in creating the STLSoft libraries (http://stlsoft.org/). Nevertheless, my day-to-day tool of choice is not one of these.

Compilation Time
In many situations, compilation time is not important. However, it is crucial on large systems or in development situations with frequent builds (such as Extreme Programming). When compiling/linking source, important factors include the number of inclusions, use of precompiled headers, complexity of code, aggressiveness of optimization (in both compilation and linking), and size of translation units. For this article, I considered these scenarios:

1. C1. A large (1000 functions) monolithic (no include files) C-file (compilation only; no optimizations).

2. C2. A C file with a large number (500) of include files (compilation only; no optimizations).

3. C3. A C file with a large number (100) of nested include files, each of which is included by its prior file, and then by the main file, thereby testing the effects of multiple inclusions and include guards (compilation only; no optimizations).

4. pch. A suite of C++ files (main.cpp, pch.cpp, and 40 .h/.cpp class files) sharing common header(s), facilitating precompiled headers (compile and link; precompiled headers; no optimizations).

5. whereis. A single complex C++ file with several template and operating-system library includes (compilation only; optimized for space). This tool provides powerful command-line searching and is included as a sample in the STLSoft libraries, exercising much STLSoft code.

6. MMComBsc. A large (44 C and 37 C++ source files, 111 header files, 80 KB in production) DLL providing COM functions and classes (compile and link; precompiled headers; optimized for space).

7. zlib. A free, general-purpose, data-compression library portable across hardware and operating-system platforms.

I used Python scripts (available electronically; see "Resource Center," page 5) to generate the source files for scenarios 1-4. The source files are very large and not included with this article. The whereis source is available at http://stlsoft.org/. (You can get the most up-to-date binary from my company's web site, http://synesis.com.au/r_systools.html.) The source files for MMComBsc.dll contain too many proprietary goodies for me to include here, so you'll have to take my word for the figures reported.

I used ptime (http://synesis.com.au/r_systools.html) to get the results from scenarios 1-3 and 5 by executing each multiple (15) times, discarding the two highest and one lowest results, and reporting an average of the rest. This reduces distortion from caching or startup. I executed scenarios 4, 6, and 7 using makefiles, timing the process via ptime. Table 1 presents the results.

The "Did Not Compile" (DNC) notation for CodeWarrior in scenario C3 results from the compiler refusing to process the nested include depth of 100; tests showed that 30 was the limit. CodeWarrior help says, "To fix this error, study the logic behind your nested #includes. There's probably a way of dividing the large nested #includes into a series of smaller nests"—which is probably true, but may not always be so. Watcom could not compile the whereis and MMComBsc scenarios because it doesn't support templates sufficiently.

There are some significant differences—up to two orders of magnitude in some cases—between performances. Borland comes off best, closely followed by VC++ 6, with Digital Mars and VC++ 7 about an equal third. CodeWarrior, GCC, and Intel are the sluggards of the group. (Naturally, it's not possible to create a single objective comparison criterion, even if you have an exhaustive set of scenarios. The way I've done it is to do three rankings. First, positions 1-9 are summed—lowest value wins. Second, the first four positions are awarded 10, 7, 5, and 3 points—highest value wins. Third, the first three positions are awarded 5, 3, and 1 points—lowest value wins. Only when these rankings are in accord do I talk of "best," "second," and so on.)

VC++ and Watcom are streets ahead when precompilation is appropriate—that is, when most or all of the source is C++. VC++ 7 compiled the pch scenario 43 times faster than CodeWarrior! Also, VC++ 7.1 is slower than VC++ 7.0 in every test.

Speed of Generated Code
Next, I looked at the speed of generated code, restricting myself to these five scenarios:

1. Dhrystone. This benchmark (http://www.webopedia.com/TERM/D/Dhrystone.html) tests integer performance. Since it is CPU bound (that is, there is no I/O or resource allocation within the timed sections), it is a good test of pure compiled code speed. The performance is measured as number-of-Dhrystones per second (a bigger number is better).

2. Int2string. Converting integers to string form can be a costly business. Ten million integers (0 to 9,999,999) are converted to string form, and their string lengths summed (to prevent over-optimization). The two approaches I used employ different mechanisms for the conversion (a sketch of the sprintf() variant follows these two points):

· The compiler library's sprintf(). This performance reflects the difference in the efficiency of the compilers' libraries. (Intel uses the VC++ 7.0 libraries.)

· STLSoft's integer_to_string<> template function (see my article "Efficient Integer To String Conversions," C/C++ Users Journal, December 2002). This inline template derives the string form empirically, so performance directly reflects the compiled code's performance.
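As noted above, here is a sketch of the sprintf() variant. It is a reconstruction from the description, not the actual test source, but it shows why summing the lengths defeats over-optimization.

#include <stdio.h>
#include <string.h>

int main()
{
    size_t  total = 0;
    char    sz[21];

    for(int i = 0; i < 10000000; ++i)
    {
        sprintf(sz, "%d", i);       /* convert via the compiler's library */
        total += strlen(sz);        /* consume the result so the loop cannot be elided */
    }

    printf("total length: %lu\n", (unsigned long)total);

    return 0;
}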

3. StringTok. This generates a large set of strings to tokenize, using ";" as the delimiter. It tokenizes the string, then iterates over the sequence totaling the token lengths. (It avoids over-optimization by the compiler, but maintains consistency of test data between compilers by pseudorandomizing based on the Win32 GetVersion() function, which returns the same value for all programs because they're run on one test system.) I used the boost::tokenizer<> (http://boost.org/) and stlsoft::string_tokeniser<> (http://stlsoft.org/) tokenizer libraries.
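For illustration, the Boost variant of this scenario follows the usage pattern below (a reconstruction, not the test source); the STLSoft variant uses stlsoft::string_tokeniser<> in an analogous way.

#include <string>
#include <boost/tokenizer.hpp>

size_t total_token_lengths(std::string const &s)
{
    typedef boost::tokenizer<boost::char_separator<char> >  tokenizer_t;

    boost::char_separator<char>  sep(";");      /* ";" is the delimiter, as described above */
    tokenizer_t                  tokens(s, sep);
    size_t                       total = 0;

    /* iterate over the token sequence, totaling the token lengths so that the
       tokenization cannot be optimized away */
    for(tokenizer_t::iterator it = tokens.begin(); it != tokens.end(); ++it)
    {
        total += (*it).length();
    }

    return total;
}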

4. RectArr. To really hammer the ability of compilers to generate efficient code in complex template scenarios, I used STLSoft's fixed_array_3d<> 3D rectangular array template. I parameterized a value type of stlsoft::basic_simple_string<char> instead of std::basic_string<> to emphasize the effects of compiler efficiency and reduce differences in their respective standard library implementations. The scenario creates a variable-sized 3D array (100×100×100) and iterates through all three index ranges, assigning a deterministic pseudorandom value to each element. Two approaches are performed.

The first approach conducts this enumeration once.
The second approach does it 10 times. Thus, the cost of allocating and initializing the 1 million members is amortized (and thus diluted) in the second variant, focusing instead on the costs involved with the array (template) element access methods.
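The core of the scenario is a triple loop over the array's indexing interface, along the lines of the sketch below. This is not the test source: the header name is an assumption, the value type is simplified to double, the single template parameter assumes defaulted allocator parameters, and the chained operator[] indexing is assumed (the test may use an at(i, j, k) accessor instead).

#include <stddef.h>
#include <stlsoft_fixed_array.h>    /* header name is an assumption */

void rect_array_scenario()
{
    stlsoft::fixed_array_3d<double>  ar(100, 100, 100);     /* dynamically sized 3D array */

    for(size_t i = 0; i < 100; ++i)
    {
        for(size_t j = 0; j < 100; ++j)
        {
            for(size_t k = 0; k < 100; ++k)
            {
                ar[i][j][k] = (double)(i + j + k);  /* deterministic stand-in for the pseudorandom values */
            }
        }
    }
}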
5. zlib. This is a library featured in many applications (http://zlib.org/). It seemed a valuable, and uncontrived, performance test. The test program memory maps a given file, memory maps a corresponding output file, and then, within a timed loop, compresses the entire contents of the source file. I compiled both zlib 1.1.4 source and the test program with the nine compilers, and executed it on both large (65 MB) and small (149 KB) files.
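To show what is actually being timed, the heart of the zlib scenario reduces to a single call to zlib's compress(). The sketch below is a reconstruction (the real test memory-maps its input and output files), with the destination sized using the worst-case bound documented for zlib 1.1.x.

#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

int compress_buffer(const unsigned char *src, unsigned long srcLen)
{
    unsigned long   dstLen  = srcLen + srcLen / 1000 + 12;  /* zlib 1.1.x worst-case output size */
    unsigned char   *dst    = (unsigned char*)malloc(dstLen);
    int             rc      = (NULL == dst) ? Z_MEM_ERROR
                                            : compress(dst, &dstLen, src, srcLen);

    if(Z_OK == rc)
    {
        printf("compressed %lu -> %lu bytes\n", srcLen, dstLen);
    }
    free(dst);

    return rc;
}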
Other than the Dhrystone scenario (the implementation I used has its own internal measurement mechanism), all scenarios derive their timing behavior from WinSTL's performance_counter class (see http://winstl.org/ and my article "Win32 Performance Measurement Options," Windows Developer Network, May 2003; http://www.windevnet.com/documents/win0305a/), which times the appropriate internal loop. Each has a warmup loop so that the results reflect pure code performance, rather than being influenced by operating system or other effects. All scenarios were optimized for speed (-O2, -opt speed, -o+speed, -O3, -O2, -O2, -O2, -ot). Table 2 presents the results.

Except for the Dhrystone scenario, I executed within a custom test harness that ran them nine times, discarded the highest and lowest times, and averaged the remainder. The source code for all scenarios is available electronically.

The "DNC" for Digital Mars is because Digital Mars is not supported in Boost 1.30, which I was using. Boost/Digital Mars compatibility is underway, and may be complete as you read this article. The multiple DNC entries for Watcom reflect its general lack of template support.

Intel is streets ahead of the rest, being fastest in two scenarios and second in five. (Indeed, its only poor performance is in the Int2String(sprintf) scenario, in which it is heavily dependent on the VC++ 7.0 run-time library's sprintf().) Second come Digital Mars, VC++ 7.0, and VC++ 7.1, all about even. Considering that Digital Mars has the Boost no-show, it's a creditable overall performance.

By virtue of its no-show in five scenarios, and very poor performance in two others, Watcom takes the wooden spoon. However, it wins the Int2String(sprintf()) scenario, so things aren't all bad. Borland and CodeWarrior do well in a few—Borland is quickest in zlib (large)—but let down in other areas. GCC performs badly all around, except for the two STLSoft variants.

It's worth noting the differences between the variants of the Int2String and StringTok scenarios. Using STLSoft's integer_to_string<> template provides significant performance advantages, with execution times being between 15 and 55 percent of those of sprintf(). The string tokenizers exhibit considerable differences: The execution time of STLSoft's tokenizer is between 6 and 26 percent of Boost's.

Size of Generated Code
Execution speed is not always more important than size, nor do speed optimizations always provide faster executing processes, since larger code is more likely to undergo cache misses and require consequent virtual memory activity by the operating system. (I always optimize for size, and only for speed based on the results of testing. I'm in good company. In Debugging Applications for .NET and Windows, John Robbins reports that Microsoft optimizes for size on all operating system components.)

In any event, you always prefer smaller code. In Table 3, which focuses on module size, VC++ wins hands down. VC++ 7.0 produces the smallest code, followed by VC++ 7.1, and then VC++ 6.0. Intel, Digital Mars, and Watcom acquit themselves reasonably well, taking one scenario each. Borland and CodeWarrior don't do too badly, except where it really matters in the one sizable, real-world project. The jaw-dropping miscreant is GCC, with modules up to 10 times the size of the leader in some scenarios.

Language Support
Compiler support for language features is also important. Since there are a huge number of features that are (not) supported by modern C++ compilers, I focus on those I know of and am interested in; see Table 4.

Having wchar_t as a built-in keyword is not that important, since it can be easily, portably, and robustly synthesized via the preprocessor—usually with a typedef from unsigned short. However, this does reduce overloadability. The __func__ predefined identifier is nice for debugging infrastructure, but again, there are workarounds.

The importance of floating-point precision is not as easily dismissed (see "How Java's Floating-Point Hurts Everybody Everywhere," William Kahan and Joseph Darcy, http://http.cs.berkeley.edu/~wkahan/JAVAhurt.pdf). Only Borland, Digital Mars, GCC, and Intel (with option -Qlong_double) provide long-doubles that match the Intel architecture's 80-bit capabilities. For serious numerists, this will be important.

Static assertions are also important, since they facilitate checking of invariants at compile time, rather than run time. They are based on the illegality of zero or negative array dimensions (int ar[0];, for example) and are usually wrapped up in a macro such as:

#define stlsoft_static_assert(_x)                           \
        do { typedef int ai[(_x) ? 1 : 0]; } while(0)
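Hypothetical usage, assuming the macro above: the assertion costs nothing when it holds, and yields a zero-sized array, and hence a compile error, when it does not.

void check_build_assumptions()
{
    stlsoft_static_assert(sizeof(int) >= 2);                /* compiles away entirely */
    stlsoft_static_assert(sizeof(void*) >= sizeof(int));    /* fails at compile time where the condition is false */
}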

Digital Mars is the only compiler that does not support them, although it will do so from Version 8.35 onwards. Note that neither Borland 5.5(1) nor 5.6 is able to optimize them out of the code, leading to performance costs.

Variable-length arrays (VLAs) and dynamic application of the sizeof operator are C99 features. Only Digital Mars and GCC support them. Except for VC++ 6, all compilers support covariant return types.

Koenig lookup is a useful mechanism (see my article "Generalized String Manipulation: Access Shims and Type Tunneling," C/C++ Users Journal, August 2003), whereby operations associated with an element from one namespace may be automatically accessed from another without namespace qualification. VC++ (6 and 7), Watcom, and Intel (except with a nondefault option) do not support it.
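A minimal illustration of the mechanism, with hypothetical names: the call to print() is resolved in the namespace of its argument, without qualification.

#include <stdio.h>

namespace geometry
{
    struct point
    {
        int x;
        int y;
    };

    void print(point const &pt)     /* lives in namespace geometry */
    {
        printf("(%d, %d)\n", pt.x, pt.y);
    }
}

int main()
{
    geometry::point pt = { 1, 2 };

    print(pt);      /* found via Koenig (argument-dependent) lookup; no geometry:: required */

    return 0;
}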

The for-scoping rules were changed in C++.98 (ISO/IEC C++ Standard, 1998), and all compilers except VC++ 6 and Watcom support the new syntax. Interestingly, Intel gives a warning when used correctly, but in a way that would fail to compile under the old rule. (In my opinion, you should never write code that relies on either old or new rules, so this should not occur in production code. This warning is useful to avoid doing so.)
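A small example of the difference: under the 1998 rules each i is local to its own loop, so the code below compiles cleanly, whereas under the old rules the second declaration is a redefinition error. This is the kind of dependence Intel's warning flags.

int sum_two_ranges()
{
    int total = 0;

    for(int i = 0; i < 10; ++i)
    {
        total += i;
    }
    for(int i = 0; i < 20; ++i)     /* ok under the new scoping rules only */
    {
        total += i;
    }

    return total;
}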

All the remaining issues involve templates. Though it does have some template support, Watcom fails on all remaining tests. VC++ 6 and 7 fail on the important facility of partial specialization, although Version 7.1 provides full support.

One VC++ 6 weirdness is that it won't accept the typename qualifier within the default parameters of a template, something other compilers (GCC and CodeWarrior, for example) mandate. You must resort to the preprocessor to support them all. (Conformance masochists can check out the definition of ss_typename_type_def_k in the STLSoft headers.)
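A hypothetical illustration of the construct in question: conforming compilers (GCC and CodeWarrior among them) require the typename shown in the default parameter, whereas VC++ 6 rejects it, hence the preprocessor workaround mentioned above.

template< typename C
        , typename V = typename C::value_type  /* VC++ 6 balks at this 'typename'; others mandate it */
        >
struct first_element
{
    static V get(C const &c)
    {
        return *c.begin();
    }
};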

Except for Digital Mars, VC++ 6.0, and Watcom, all compilers support template templates. This technique isn't currently widely used, but is useful and will be more so in the future. Future compilers need to support it.
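A minimal sketch of a template template parameter, with hypothetical names: the container policy is itself a template, and is instantiated inside the class.

#include <deque>
#include <list>
#include <memory>

template< typename T
        , template <typename, typename> class ContainerT = std::deque
        >
class simple_stack
{
public:
    void        push(T const &t)    { m_items.push_back(t);  }
    void        pop()               { m_items.pop_back();    }
    T const     &top() const        { return m_items.back(); }
private:
    ContainerT<T, std::allocator<T> >   m_items;    /* the policy template is instantiated here */
};

/* usage: simple_stack<int> s1; simple_stack<int, std::list> s2; */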

Overall, GCC is the clear winner. Since I think static assertions, 80-bit floating-point, and Koenig lookup are more important than VLAs and __func__, I would put Borland second, CodeWarrior and VC++ 7.1 third, Digital Mars fourth, and Intel fifth. I would rate all of these as good compilers. Next come the other VC++s and Watcom. Once the next version of Digital Mars is released, it will likely have a perfect score, too. However, you can expect likewise from other vendors soon, since language conformance has become a marketable feature once more.

One feature not included in Table 4 involves typedef templates (see "The New C++: Typedef Templates," by Herb Sutter, C/C++ Users Journal, December 2002), which none of the compilers support. VC++ 7.1 does report that "error C2823: a typedef template is illegal," which suggests Microsoft intends to support them soon, possibly in Version 7.2.

Features
The Standard Library. Except for Digital Mars and Watcom, all compilers support the C++.98 Standard Library without major problems. Digital Mars C++ comes with both SGI's STL and the latest STLport (http://www.stlport.org/), but has yet to update to the new header names (<iostream> rather than <iostream.h>) and to have things declared within the std namespace. Both compilers are working toward full conformance.

ATL. As far as I know, all compilers other than GCC and Watcom support ATL, although I suspect that some only support Version 3, not 7. I'm not aware of any language-support issues preventing GCC from supporting ATL, but I haven't tested this. I am certain that Watcom could not support ATL, because of its current template deficiencies.

Boost. Boost is a library suite supported by all the compilers except Digital Mars (where support is coming), Visual C++ 6 (which has limitations), and Watcom (which is not supported at all).

Managed C++. Only VC++ 7.x supports Managed C++. In the context of this article, however, Managed C++ isn't a bona fide feature because Managed C++ is not C++, any more than C is C++.

MFC. Despite showing its age, MFC is a widely used (and occasionally useful) framework. It is fully supported by Visual C++, Intel C++, CodeWarrior, and Digital Mars. I also understand it is available with Borland C++ Builder, though I have not used it with that compiler. To my knowledge, neither Watcom nor GCC support MFC.

STLSoft. Except for Watcom, all compilers support most of the STLSoft libraries, and even Watcom supports a sizable part (where the templates are within its capabilities). STLSoft is bundled with Digital Mars 8.34 upwards.

Win32/Platform SDK. All the Win32 compilers support the Win32 API (including many Microsoft language extensions, such as __declspec()), although some do not support the version that comes with the Platform SDK (various constructs—including inline assembler—are not recognized). Specifically, GCC and Watcom do not support the February 2003 version of the Platform SDK.

16-bit. Both Digital Mars and Watcom support 16-bit targets. Demand for this is low; but if you need it, it's good to be able to get it somewhere.

WTL. I recently did some work to get various compilers (including VC++ 5!) to work with WTL, but have not reached a definitive conclusion as to support. What I can say is that VC++ and Intel work with it out-of-the-box, CodeWarrior with a little work, and Borland and Digital Mars with a fair bit of effort.

I'd like to mention one thing I'm fond of—the Digital Mars -wc flag—which warns about all C-style casts within C++ compilation units. This is great when sifting through code to find areas that need "modernizing." (Also, the author of the Digital Mars compiler added this feature on my request, and did so within an amazingly short turnaround.) It would be nice to see this in other compilers.

Tools
Most of the compilers come with Integrated Development and Debugging Environments (IDDEs). Since editor/IDDE preferences are nearly as religious as those for bracing style, I couldn't hope to do a balanced job—even if I knew all the salient features of each environment. While I have some experience with all the IDDEs, and can say that they all provide the minimum functionality required to create/edit projects and source and debug executables, some are (to be candid) pretty basic. The Digital Mars and Watcom IDDEs aren't going to pry many programmers from their favorite environments.

Perhaps anachronistically, my editor of choice is the Visual Studio 97 IDDE: I know the keystrokes, I can write useful wizards/plug-ins/macros for it, and it is quick, can debug, doesn't require the mouse, and doesn't crash. I have experience with the IDDEs of C++ Builder, Digital Mars, CodeWarrior, Visual Studio 98, and Visual Studio .NET, and they either have too many or too few features, crash, or make me take my hands off the keyboard. Nonetheless, I know people who swear by them, so it's a case of to each his own.

Conclusion
Clearly, you cannot simply say compiler X is superior to all others. Most compilers are superior to others in one or more respects.

Posted on 2004-6-8 21:21:00

Re:Some Information of Compiler

DDJ articles are just great.