C++ Study Note(2): Cast the light to the darkness

Development May 23rd, 2007

Old-school C casts are reckless: they bypass type checking and, to make it worse, they are easily overlooked by maintainers. C++ introduces four cast keywords to the rescue: const_cast, static_cast, dynamic_cast, and reinterpret_cast.

The keyword const is a contract between library developers and users. Dropping the const qualifier may cause unexpected behavior, for example:

class Foo
{
public:
        Foo() : dd(0) {}
        // int& bar() { cout << "non-const " << dd << endl; return dd; }
        const int& bar() const { cout << "const " << dd << endl;  return dd; }
private:
        int dd;
};

… …
        Foo foo;
        int & x = const_cast<int&>(foo.bar());
        x = 3;

C++ forces developers to spell out const_cast to highlight the action: “Warning, warning, the const qualifier is dropped…”. The same reasoning lies behind the bool type.
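To make the risk concrete, here is a minimal sketch of the same trick wrapped in a function. The names Counter and overwrite_through_const_cast are made up for illustration. The write is well-defined only because the object passed in is itself non-const; doing the same to an object declared const would be undefined behavior.

```cpp
// Hypothetical class mirroring Foo above: a private int behind a const getter.
class Counter
{
public:
        Counter() : value_(0) {}
        const int& value() const { return value_; }
private:
        int value_;
};

// Strips the const from the getter's result and writes through it.
// Assumption: c refers to a non-const Counter, otherwise this is undefined.
int overwrite_through_const_cast(Counter& c, int v)
{
        int& x = const_cast<int&>(c.value());
        x = v;                  // the private member is changed from outside
        return c.value();
}
```

Calling overwrite_through_const_cast(c, 3) on a default-constructed Counter leaves c.value() at 3, although the class never offered a setter.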

If you are confident that the downcast is valid, static_cast is the right option. The compiler adjusts the pointer offset and returns a derived class pointer at compile time, i.e. without any runtime type check. If you cannot guarantee the inheritance relationship, use dynamic_cast: the runtime tries to downcast the pointer or reference and returns a valid one if everything is OK. Otherwise a null pointer is returned for a pointer cast, or a bad_cast exception is thrown for a reference cast.

Update: Boost’s developers dislike this inconsistency, so a new cast function template, polymorphic_cast, is introduced for downcasting and crosscasting:
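The difference between the three flavors can be sketched as follows. Shape, Circle, Square and the helper functions are hypothetical names used only for this illustration.

```cpp
#include <typeinfo>   // std::bad_cast

struct Shape  { virtual ~Shape() {} };
struct Circle : Shape { int radius; Circle() : radius(1) {} };
struct Square : Shape { int side;   Square() : side(2) {} };

// Pointer form: yields a null pointer when s is not really a Circle.
Circle* as_circle(Shape* s) { return dynamic_cast<Circle*>(s); }

// Reference form: there is no null reference, so a failed cast throws.
bool is_circle(Shape& s)
{
        try {
                Circle& c = dynamic_cast<Circle&>(s);
                (void)c;
                return true;
        } catch (const std::bad_cast&) {
                return false;
        }
}

// static_cast trusts the caller: only valid if s really points to a Circle.
Circle* as_circle_unchecked(Shape* s) { return static_cast<Circle*>(s); }
```

With a Circle c and a Square q, as_circle(&c) returns &c, as_circle(&q) returns a null pointer, and is_circle(q) reports false through the caught exception. Passing &q to as_circle_unchecked would compile but is undefined behavior.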

template <class Target, class Source>
    inline Target polymorphic_cast(Source* x BOOST_EXPLICIT_DEFAULT_TARGET)
    {
        Target tmp = dynamic_cast<Target>(x);
        if ( tmp == 0 ) throw std::bad_cast();
        return tmp;
    }

It saves careless developers from forgetting to test the returned pointer: an exception is thrown whenever the cast fails.

If you would like both the type safety of dynamic_cast and the performance of static_cast, Boost has another neat cast function, polymorphic_downcast, for greedy folks like you.

template <class Target, class Source>
    inline Target polymorphic_downcast(Source* x BOOST_EXPLICIT_DEFAULT_TARGET)
    {
        BOOST_ASSERT( dynamic_cast<Target>(x) == x );  // detect logic error
        return static_cast<Target>(x);
    }

As the name tells, this cast only works for downcasting pointers. It verifies the cast with dynamic_cast in the debug build (through BOOST_ASSERT) and is a plain static_cast in the release build.

Last and least, reinterpret_cast: the compiler simply pretends the object has a new type, without any validation or offset adjustment; furthermore, the result is NOT portable. Use it with caution and make sure you know exactly what you are doing.
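To try the two behaviors without installing Boost, here is a rough stand-in. The namespace sketch and the names checked_cast and debug_checked_downcast are invented for this note; they only imitate the two Boost functions quoted above and use the standard assert instead of BOOST_ASSERT.

```cpp
#include <cassert>
#include <typeinfo>   // std::bad_cast

namespace sketch {

// Imitates polymorphic_cast: throw instead of handing back a null pointer.
template <class Target, class Source>
Target checked_cast(Source* x)
{
        Target tmp = dynamic_cast<Target>(x);
        if (tmp == 0) throw std::bad_cast();
        return tmp;
}

// Imitates polymorphic_downcast: the dynamic_cast check lives inside assert,
// so it disappears when NDEBUG is defined and only static_cast remains.
template <class Target, class Source>
Target debug_checked_downcast(Source* x)
{
        assert(dynamic_cast<Target>(x) == x);
        return static_cast<Target>(x);
}

} // namespace sketch

struct Animal { virtual ~Animal() {} };
struct Dog : Animal {};
struct Cat : Animal {};

// Reports whether casting a to Dog* fails with an exception.
bool throws_on_mismatch(Animal* a)
{
        try { sketch::checked_cast<Dog*>(a); return false; }
        catch (const std::bad_cast&) { return true; }
}
```

For an Animal* that points to a Cat, throws_on_mismatch returns true; for one that points to a Dog, both casts return the original address.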
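Two of the few uses that are generally considered safe can be sketched like this. The function names are made up, and std::uintptr_t is assumed to be available (it comes with C++11's <cstdint> and is formally optional). The first function also shows the portability problem: its result depends on the byte order of the machine.

```cpp
#include <cstdint>   // std::uintptr_t (assumed available)

// Looking at the raw bytes of an object through unsigned char* is allowed.
// For value == 1 this yields 1 on a little-endian machine and 0 on a
// big-endian one, assuming unsigned int is wider than one byte.
unsigned char first_byte(const unsigned int& value)
{
        const unsigned char* bytes =
                reinterpret_cast<const unsigned char*>(&value);
        return bytes[0];
}

// A pointer converted to a large enough integer and back compares equal
// to the original pointer.
int* round_trip(int* p)
{
        std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(p);
        return reinterpret_cast<int*>(raw);
}
```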

C++ Learning Note(5): Koenig Lookup

Development April 24th, 2007

This concept is well explained on Wikipedia and also on CodeProject. What confuses me is how the magic mentioned on Wikipedia works:

While ADL makes it practical for free functions to be part of the interface of a class, it makes namespaces less strict and so can require the use of fully-qualified names when they would not otherwise be needed. For example, the C++ standard library makes extensive use of unqualified calls to std::swap to swap two values.

Here is my lousy approach to mimic the magic:

#include <iostream>
using namespace std;

void g();

void g() { cout << "g is outside NS" << endl; }

namespace NS {
        class A{};
        void f(A) { g(); cout << "f is called" << endl; }
        void g() { cout << "g is inside NS" << endl; }
}

int main()
{
        NS::A a;
        f(a);
}

with the following output:

g is outside NS
f is called

It looks good. Now suppose the customized g is missing, i.e. Ln 6 (the definition of the global g) is deleted. Compile, link, oops: gcc complains about an undefined reference to `g()’. Let’s move the g in the namespace before the f:

namespace NS {
        class A{};
        void g() { cout << "g is inside NS" << endl; }
        void f(A) { g(); cout << "f is called" << endl; }
}

It works, and the g inside the NS namespace is called. No doubt, since it is the only g defined. Let’s add a global g back and check whether the customized function is called. The full source code is:

#include <iostream>
using namespace std;

void g();

void g() { cout << "g is outside NS" << endl; }

namespace NS {
        class A{};
        void g() { cout << "g is inside NS" << endl; }
        void f(A) { g(); cout << "f is called" << endl; }
}

int main()
{
        NS::A a;
        f(a);
}

Wow, the output is:

g is inside NS
f is called

The global g is never called by NS::f! Is there something wrong? Please leave a comment if you have an answer.
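A likely explanation, offered as a sketch rather than a definitive answer: g() takes no arguments, so ADL never comes into play for it. Ordinary unqualified lookup starts in the innermost scope and stops at the first scope that declares g. In the last listing that scope is NS; in the first listing NS::g was not yet declared when f was compiled, so only the global g was visible. The std::swap magic from the quote needs an argument whose type lives in the namespace. The names NS::swap and exchange_values below are made up for illustration:

```cpp
#include <utility>   // std::swap (it lives in <algorithm> before C++11)

namespace NS {
        struct A { int id; int swaps; };

        // Customized swap next to A; it counts how often it is used.
        void swap(A& x, A& y)
        {
                int t = x.id; x.id = y.id; y.id = t;
                ++x.swaps; ++y.swaps;
        }
}

template <class T>
void exchange_values(T& a, T& b)
{
        using std::swap;   // fallback for types without their own swap
        swap(a, b);        // unqualified call: ADL adds NS::swap for NS::A
}
```

Calling exchange_values on two NS::A objects bumps their swaps counters, which shows that NS::swap was picked although exchange_values never names the namespace; for two ints the std::swap fallback is used.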

SUIF2 installation notes

Development March 23rd, 2007

SUIF2 is an infrastructure for compiler development. If you want to develop a compiler in 2 weeks, SUIF2 may be a good choice; at least according to its documentation, it looks quite powerful and easy to extend.

Setup the working environment

The distance between dream and reality makes our lives more colorful. SUIF2 does not compile with a modern C++ compiler, for example the GCC 4.1.1-r3 on my Gentoo. I googled and found there is literally NO success story of compiling SUIF2 with GCC 3.x, let alone GCC 4.x. I do believe some graduate students have ported it to the latest GCC as in-house work, but nobody has shared it. OK, let’s face reality: since gcc 2.96 would mess up my Gentoo’s profile, I decided to install RedHat 7.0 on my old Pentium III box.

It took a little while to find the legacy ISO, but the install procedure was quite smooth. The ugly text installation wizard reminded me of my college days, when Linux fans scratched their heads to make X work. The bottom line is a Linux box with GCC 2.96 and SSH access.

Prepare the dependencies

Besides gcc 2.96, SUIF2 also depends on graphviz and gv for graphics output. I could not find a usable RPM for graphviz due to the “RPM dependency hell” of RedHat. Since this dependency is optional, let’s move on.

Build the SUIF2

Just follow this HOWTO.

cd /path/to/the/expected/destination/of/suif2
tar xvfz ~/tmp/basesuif-2.2.0-4.tar.gz
cd nci
patch -p1 < /path/to/patches/gcc296.patch
./install
make setup
make

Done.

NOTICE: install sets $NCIHOME to the current working directory; you may override it after the build.

Test drive

There is a canonical hello world in $NCIHOME/bin, create_suif_hello_world. If it generates indented output, the basic SUIF2 environment is set up correctly.

To invoke SUIF2, you still need the PGI C compiler front end, pgcpp1. You can get it from the SUIF2 website. You might also need to set LD_LIBRARY_PATH so that all of SUIF2’s and pgcpp’s shared objects can be loaded. In my case, SUIF2 is installed in $HOME/opt:

export PATH=$PATH:$HOME/opt/nci/bin
export LD_LIBRARY_PATH=$HOME/opt/nci/lib:$HOME/opt/nci/solib

Now, for a road test, just prepare a helloworld.c, and run

c2suif helloworld.c

If you get helloworld.suif, Bravo!

What is next?

The SUIF output is meant for machines to parse, not for humans to read. So we might need to install the extra viewers for SUIF2.

cd ../
wget http://suif.stanford.edu/pub/suif/extras-2.2.0-4.tar.gz
tar xvfz extras*.tar.gz
cd nci
make

Extras installs several viewers for .suif files. For example, suif2c helloworld.suif generates helloworld.out.c for inspection.

C++ Study Note (3) – typename, class and template

Development February 27th, 2007

C++ is a strongly typed language, so the compiler needs to check types and verify syntactic correctness. C++ reuses “*” for both pointer dereference and multiplication; templates reuse “< >”, which also appear in the stream operators “<<, >>” and in comparisons (less than, greater than). The compiler cannot resolve the ambiguity in some cases, for example:

template< class T, typename U> class foo;
foo< int, vector<int>> v;
// The following examples are copied from text book B1.1
iterator_traits<FwdIterator1>::value_type* pi = &*i;
template <typename T, typename T::value_type>  struct sqrt_impl;
{  return  x.convert<3>(pi);  }

Ln 1 demonstrates that either class or typename can be used to declare a template parameter.

Ln 2 shows the pitfall of “> >”: if there is no space between the two >, a pre-C++11 compiler regards them as the “>>” operator instead.

Ln 4 is a typical case of when to use typename. The compiler cannot determine whether

iterator_traits<FwdIterator1>::value_type

is a type or a value. If it is a type, * is the pointer declarator; otherwise, * is the multiplication operator. We need to help the compiler resolve the ambiguity by adding the typename keyword, like this:

typename iterator_traits<FwdIterator1>::value_type

Ln 5 shows why typename can be confusing in a template parameter list. The first typename declares a type parameter T, while the second one is the disambiguator: typename T::value_type declares a non-type parameter whose type is T::value_type. To keep the two roles apart, the best approach is to declare type parameters with class only.

Ln 6 is an example of when to use the template keyword as a disambiguator. x.convert could be a member function template or a member variable; therefore, the following < > could be interpreted either as template brackets or as the less than and greater than operators. The work-around looks like this:

{  return  x.template convert<3>(pi);  }

Here are some rules of thumb:

  • typename is required anywhere in templates on qualified dependent names that denote types.
  • typename is forbidden on the name of a base class.
  • template is required before dependent names accessing member templates via ., ->, or :: qualification.
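The first and third rules can be tried out with a small compilable sketch. The names first_value, Converter and triple are hypothetical; only iterator_traits comes from the standard library.

```cpp
#include <iterator>
#include <vector>

// typename is required twice here: value_type is a qualified dependent
// name, and the compiler must be told that it denotes a type.
template <class FwdIterator>
typename std::iterator_traits<FwdIterator>::value_type
first_value(FwdIterator i)
{
        typename std::iterator_traits<FwdIterator>::value_type* p = &*i;
        return *p;
}

// A class with a member function template, standing in for x above.
struct Converter
{
        template <int N>
        int convert(int v) const { return v * N; }
};

// template is required: the type of x depends on T, so without the keyword
// the < after convert would be parsed as a less-than operator.
template <class T>
int triple(const T& x, int v)
{
        return x.template convert<3>(v);
}
```

first_value(v.begin()) on a non-const std::vector<int> returns its first element, and triple(Converter(), 5) evaluates to 15.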

C++ Study Note(1): Size matters

Development February 17th, 2007

C++ is so versatile that an application most likely runs with hidden design flaws. Before we dive into this wonderful language, let me summarize some overlooked tricks and pitfalls for you. All test cases are compiled and run with gcc (GCC) 4.1.1 (Gentoo 4.1.1-r3).

These are rules of thumb for the object size:

  • The size of an empty object is at least 1 instead of 0.
  • A static member does not consume space in the object.
  • Objects with virtual functions pay the price of a vtable pointer, i.e. 4 bytes[1] here; the good news is that it is a fixed price.
  • The size of a derived object is the sum of its base objects plus its own data members (an empty base typically adds nothing).

#include <iostream>
using namespace std;

class Base {};

class StaticBase
{
public:
        static int foo;
};

class BaseWithVTable
{
public:
        virtual ~BaseWithVTable() {};
        virtual void bar() {};
};

class Derived : public BaseWithVTable
{
public:
        virtual ~Derived() {};
        void foo() {};
        int bar;
};

class MoreDerived: public Derived , public StaticBase
{
private:
        int hello;
};

int main()
{
        cout << "sizeof(Base) = " << sizeof(Base) << endl;
        cout << "sizeof(StaticBase) = " << sizeof(StaticBase) << endl;
        cout << "sizeof(BaseWithVTable) = " << sizeof(BaseWithVTable) << endl;
        cout << "sizeof(Derived) = " << sizeof(Derived)  << endl;
        cout << "sizeof(MoreDerived) = " << sizeof(MoreDerived)  << endl;
}

The output is:

sizeof(Base) = 1
sizeof(StaticBase) = 1
sizeof(BaseWithVTable) = 4
sizeof(Derived) = 8
sizeof(MoreDerived) = 12
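The exact numbers above reflect a build with 4-byte pointers; with 8-byte pointers the vtable price is typically 8 bytes. A more portable way to state the same rules is to compare sizes against each other instead of against constants. This is a minimal sketch with made-up class names, and the equalities hold on mainstream compilers rather than being guaranteed by the standard.

```cpp
#include <cstddef>

struct Empty {};

struct WithStatic { static int counter; };
int WithStatic::counter = 0;

struct OneVirtual  { virtual ~OneVirtual() {} };
struct TwoVirtuals { virtual ~TwoVirtuals() {} virtual void bar() {} };

struct DerivedWithInt : OneVirtual { int payload; };

// Size paid for having any virtual function at all (usually one pointer).
std::size_t vtable_price() { return sizeof(OneVirtual); }

// True when adding more virtual functions does not grow the object.
bool vtable_price_is_fixed() { return sizeof(OneVirtual) == sizeof(TwoVirtuals); }

// True when a static member leaves the object as small as an empty one.
bool static_member_is_free() { return sizeof(WithStatic) == sizeof(Empty); }
```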