Sunday, May 11, 2003

Right tools for the right job

Copyright Notice

Copyright:Ramesh Ananthakrishnan 11th May 2003

Pilfer at your own Peril

First, a few words about tools.
Basically, a tool is an object that enables you to take advantage of the
laws of physics and mechanics in such a way that you can seriously injure yourself.

----- Dave Barry in the Miami Herald.

Engineers are like painters. They bring your imagination into reality.
Nothing less, nothing more.

----- Scotty [stardate 98613.45765.4.21]

Tools about the house

You use around a hundred tools every day. From toothbrushes to knives, and from paintbrushes
to pens, there are millions of tools out there for you to use. In fact, you are very particular
about the tools that you use. You would no more brush your teeth with your paintbrush than
paint your flat with your toothbrush. Some things just aren't done. Then why is it that most
IT professionals know just two or three languages when they graduate? They attempt
to implement all sorts of projects with just these three or four languages, and they hardly
ever use other tools to improve productivity. Nowhere is this more evident than in the projects
that they take up during their final semesters.
Let me count, as a mere exercise, the number of languages that they know. (NOTE: IT'S WHAT
THEY KNOW, NOT WHAT THEY ARE TAUGHT.)

1. C (good)

2. C++ (o.k.)

3. VB (excellent)

4. Java (bad)

5. Assembly

6. Cobol (gone the way of all mortal things)

7. Pascal (gah...)

Seven languages!!!! That's it? Most of these languages date back to the 1970s or earlier,
and the only one on the list with a real garbage collector is Java. GREAT!!!
And the only "interpreted" language is assembly, which the processor interprets for you. Great!!!!
Since this is what most students are taught, and since none of them is even taught to link against
a library, they certainly come up with a very warped idea of what CS is all about. They
think interpreted languages are slow, that everything can be done if you know only C, and that VB is
the only way to draw windows. Then they spend three months trying to make an OS on a disk,
and 6 hours before project submission they are still wondering why their string library is leaking
memory. Sheesh!!!

Programming in the Real World

Programming in the real world is all about using the right tools. Nobody cares if you
are a gung-ho C expert. You can't even get at the code to work on unless you know how to use
some basic tools like CVS, diff, grep and Perl. Yes! And if your project grows really big,
nobody is going to give you credit for having done parts of it in deadly Object Oriented
C++ or Java. The trick is to see the larger problem and think about finishing the project,
and not get bogged down in using all the finer aspects of the language. In fact, programming
in the real world is about making the right design choices, from using the right
language to the right object designs. For your own sake, do not reinvent every bloody
technology you encounter in C, and don't think C is the only language around.
Some of the languages, tools, and technologies encountered in the real world are:

1. COM/CORBA, XPCOM

2. XML

3. SQL and varieties

4. Perl

5. Python & Jython

6. Ruby

7. Swig

8. PHP Meerkat

9. Javascript

10. Lisp (CL, Scheme, guile)

11. Qt, Gtk, KDE, Gnome, Tk and XUI

12. Rose suite, Unit testers

13. TCL

14. OML, caML

15. MATLAB's M scripts, Octave, shell scripting.

But there are a lot of myths about some of these technologies. In fact, by now they are
more like urban legends. So let me dispel the myths for you and hope to convince you to use
the right tools in the real world.

Myth 1: You can do everything in/using C

Nope, you can't. Even for programming in C you end up using a lot of tools that were
written in C, like your editor and compiler. You don't start writing a C program
by composing an editor in assembly first, and then making a C compiler in assembly.
You use the tools as is, and that's right. Other tools like PHP and Perl are also
written in C, to do a set of jobs elegantly. In that way they are just like your text
editor: they do the job they were meant to do beautifully well. Which should reveal a truth
about C to you. It's like the scaffolding used to construct a building. You cannot
construct the building without the scaffolding. But neither can you use the scaffolding for
everyday living, or build a building from scratch every day. Tools are like that. They
are prebuilt and meant to ease life for you in certain particular areas. There's a more
philosophical note there, but I'll leave it to you to explore it.

Myth 2: You can't do major stuff using some of these tools


TRUE FACT: Two friends of mine tried to write a program in C to solve a linear equation.
They took 6 days and at the end of that time were still struggling to make sure that the
strings grew dynamically. In Perl the entire script was 6 lines, and that includes comments
and other features they acknowledge they could never have completed.
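
To give you a feel for what "making sure that the strings grew dynamically" costs in C, here
is a minimal sketch of a growable string buffer. It is not my friends' code; the names strbuf,
strbuf_append and strbuf_free are made up for illustration, and the doubling strategy is just
one common choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string: not taken from any real library. */
struct strbuf {
    char  *data;   /* NUL-terminated contents, or NULL while empty */
    size_t len;    /* characters stored, not counting the NUL */
    size_t cap;    /* bytes currently allocated for data */
};

/* Append text to the buffer, doubling the allocation when it runs out.
   Returns 0 on success, -1 if memory could not be obtained. */
int strbuf_append(struct strbuf *sb, const char *text)
{
    size_t add, need, cap;
    char *grown;

    assert(sb != NULL && text != NULL);
    add  = strlen(text);
    need = sb->len + add + 1;               /* +1 for the trailing NUL */
    if (need > sb->cap) {
        cap = sb->cap ? sb->cap : 16;
        while (cap < need)
            cap *= 2;
        grown = (char *)realloc(sb->data, cap);
        if (grown == NULL)
            return -1;                      /* the old buffer is still valid */
        sb->data = grown;
        sb->cap  = cap;
    }
    memcpy(sb->data + sb->len, text, add + 1);  /* copies the NUL as well */
    sb->len += add;
    return 0;
}

/* Release the memory. Forgetting this call is exactly the leak. */
void strbuf_free(struct strbuf *sb)
{
    assert(sb != NULL);
    free(sb->data);
    sb->data = NULL;
    sb->len = 0;
    sb->cap = 0;
}
```

Start with a zeroed struct strbuf, call strbuf_append as often as you like, and call strbuf_free
once at the end. That is forty-odd lines before you have solved a single equation, which is
rather the point.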

Yea! Yea! I know the argument. "But I know only C!" you say. Ever thought of learning Perl? It's
easy, comes with its own documentation and other goodies, and the basics can be learnt in a day.
I agree with Spolsky that learning a new language is an investment of time, but that's time
well spent. I'd rather spend one day learning Perl and writing 6 lines of Perl code
to do stuff than write 200 lines of C to do the same and wonder where the memory leaked.
Here again use your discretion. If C is your forte, and you usually run up a pretty good
kernel before lunch, use C. However, if you are having problems with pointers and the lot,
shift to Perl. You are not going to be any the worse off. In all cases use your common sense.

Myth 3: Some of these languages are interpreted and we know interpreted=slow

ROT!!! Every time I hear that I'd like to tell you people that the end result of compilation
is interpretation at the machine level: the processor interprets your compiled instructions.
Sure, interpreted languages parse and compile your program every time it's
executed, but the general structure of the language is so much better that some days C compilation
seems to take eons compared to the time it takes to run a Perl script. In fact, interpretation is
so neat and just in time that I frequently write programs that correct themselves. And some
interpreted languages are so well tuned for certain jobs that it is really hard to match their speed
even using C.

e.g.: for searching patterns in a file, a C program and a Perl script were compared. Total
execution time for the C program: 25 seconds. The Perl code did the
same job in 2 seconds. The C code was hand tuned for speed. If that's the best the professionals
could do, I wonder how much faster you can make your C code. And why do you need the speed?
Are you doing kernel programming? Remember, good algorithms and thought can make your programs
fast. If your algorithm is lousy there is no use blaming Perl. Your program wouldn't do well even
in C.

Myth 3: Only Java has a GC, references, a VM, cross-platform support, security, networking

Java, though nice, is a pretty new language. In fact, it borrowed the concepts
of GC, references, platform portability and networking from other languages. In fact,
there is little that cannot be done in Perl or Python or Ruby in an easier fashion
than is possible in Java. In my experience Java also feels mightily slow next to languages like Perl
and Python for everyday scripting jobs. Both Perl and Python have GCs and references. They also have their own
virtual machines and can do windowing code on Linux, Windows and MacOS with the
same functions. They are cross platform, compile to bytecode internally and don't stuff OO down
your throat. They are also extremely robust and don't hog memory like the JVM.
A huge amount of the glue code that keeps the Internet running is written in Perl. Sysadmins trust it.
8 years after Java's introduction to the world, I have yet to see it take over that kind of work.
Plenty of DNS administration runs on Perl scripts. I have yet to see one that works on Java, or a
sysadmin who trusts a Java program to do his work.

Myth 4: Some of these languages aren't OO

OO is like colour. Even though a paper is black and white, it's easy to splash it with
colour. If it's necessary, you do it. Java forces OO down your throat, and needlessly puts OO
in places where it isn't wanted. In fact, like C++, Java evolved with OO as a retrofit. Plus,
like C++, Java has a ridiculous type mechanism, with the result that most times you waste time
getting the program to pass the compiler. If you want real solid OO try Ruby or Jython. Both
do what Java does, only better. Perl does OO, and it's non-paranoid OO. This means lots
of saved effort. 90% of the time your project just works.... And anyway OO is a concept.
Sometimes it's needed, sometimes it isn't. A project should use OO only if it's necessary,
not as a boast.


If you check the job ads in The Hindu over the past one month, you'll find that 50% of the jobs are
for guys who have knowledge of financial packages that are beyond the reach of students,
e.g.: Siebel, SAP,...
Of the remaining, a large part of the skillsets are C, but other languages you should know
include Perl and Python. The toolkits you should know include MFC and Qt. But you are in great
demand if you know Linux kernel programming.
Jobs are available for EJB, Java and Swing guys, but these are available only in the finance
sector. Since many, many people know Java, it would be nice to differentiate your resume
from the hoi polloi by adding another language or tool to it.

Myth 6: But most of these technologies are on Linux, and that's hard to learn

Heh! All these technologies are also available for Windows. In fact, unlike Windows
technologies, the Linux ones are almost all platform agnostic. And dear me bhoy, you'd better
at least know Linux if you foresee any scope in continuing in this field. And forget about it
being difficult to learn. In fact, Linux (o.k. Linux is used here to denote Unix variants) is
much more logically organized than all of Windows.... Yes! It's difficult to find people
who know beans about Linux. But that's good. If you know beans about Linux while nobody
else has a clue, that means you've specialized, man... Join your nearest Linux users' group
and learn Linux.

Myth 7: But these tools cost money

No, they don't.
Most times Windows comes up with a new technology, it proves pretty expensive. When VC 5.0
came out I thought I'd buy the VC compiler and the suite. Wow!!! I found out it would cost
me a cool 1.5 lakhs. So I dropped that and installed a pirated copy. The problem was a complete
lack of documentation. So I tried purchasing an MFC for Windows book (Cost: Rs. 1600/-). It didn't
tell me how to animate the wallpapers. So I bought Advanced MFC (Cost: Rs. 2600/-). Advanced MFC
said the answer would be in Win32 API and why don't I go look there. So I borrowed Win32 API
from the library. Searched it for a whole month (late fee: Rs. 160/-). Didn't have a clue, though
it asked me to look up the call in MSDN. The MSDN CD costs Rs. 400/-. And even then I didn't have a clue.
Great!!! So I switched to Linux and believe me the sweetest thing was that the documentation
comes along with the system. I have learnt PHP, Perl and Python on my own and the total amount I
spent was nil. (Hint: I borrowed the Linux CD from my neighbour.) And don't worry. The Linux
philosophy is sharing. So you need not be concerned about copying software. In fact,
Linux encourages it.

Myth 8: But all the hot projects are in Windows

No way!! Almost all research takes place on Linux systems. Other than IIT maggots who use
Windows, and have hardly done any research, you can find a lot of cool, cool projects in Linux.
Check out Slashdot and other tech sites to learn about the
cool, cool projects you can contribute to. Also remember that most of these projects are
international, and are recognized throughout the world, and the only admission requirement
is that you can code well. Think about it! You can participate in a high level
international research project for absolute peanuts.
Isn't that something that would
look nice on your resume?

Myth 9: But there's nothing like VB in Linux

Heh!! In fact there are too many VBs in Linux. There are wxPython and wxPerl and Tk and Tcl,
and you even have a C# compiler and .NET libraries in Linux. Check out some of these
tools; they in fact work better than even VB.

Myth 10: But C has excellent library support. Do these tools?

Yes they do. In fact, the amount of support they have is fascinating. It's easy
to find libraries to do whatever you want in Perl and Python. They are neatly packaged
and are easy to install whatever the platform or machine you are using. The packages
themselves easily take care of dependencies and all. Try doing that in C!

In fact, using some of these tools can really improve your chances of being hired, besides
improving your knowledge of computing in general. Don't hesitate. Pick a new tool today.

The Zen of Libraries (Part III) Empower yourself: use Libraries

Copyright Notice

Copyright: Ramesh Ananthakrishnan 11th May 2003

Pilfer at your own Peril


If you have read the previous two articles, you'd probably now like to use libraries whenever you
program and are looking for some general hints. Well... I'd like to empower those people
who are still stuck with Turbo C and also help those who are using gcc on Linux and scratching
their heads over what to do. So the article will focus on these two specimens.

Rule 1: Search Google for Libraries that already do your job

For any given project there exist libraries that do around 70% of your work. Use them. The major
problem is that most people are hardly aware these libraries exist. Hence they end up
reinventing the wheel. Also make sure that you can compile the libraries on Turbo C or with
gcc, and check whether those libraries depend on yet other libraries (a long chain of such
dependencies is known as dependency hell), something I'll come back to later. Once you get your libraries,
make sure that you also download the headers for the library. Remember, before you call a
function the function has to be declared. The header files contain the declarations, so please
use them. Without the header file you are in deep shit! So remember to download both the
library and the headers. And please search Google at least 10 times with different queries.

Rule 2: Divide your work into libraries

Divide your work so that it is composed of different libraries. E.g.: if you are writing an audio
player, divide it into the following libraries. One to actually interpret the file and
give you raw data to be fed into the speakers, one to control the speaker volume, and one to
draw the buttons on screen. Once you divide the stuff like this, you'll find out that
you can actually search Google with exact terms, and find libraries that do what you want.
Test each of these libraries separately. This makes for easy development. It's easier to
test and debug 2 files with 100 lines of code each on their own than to test and debug one program with
60 lines that calls those 2 files about 35 times.
Do unit testing.
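
Here is roughly what "test each library separately" can look like for the volume piece of
the audio player. This is only a sketch under my own assumptions: the 0..100 range and the
names volume_clamp, volume_adjust and volume_run_tests are invented for illustration, not
taken from any real audio library.

```c
#include <assert.h>

/* Hypothetical volume library: keep the level inside 0..100. */
int volume_clamp(int level)
{
    if (level < 0)
        return 0;
    if (level > 100)
        return 100;
    return level;
}

/* Raise or lower the current volume by step, never leaving 0..100.
   step is expected to lie within -100..100. */
int volume_adjust(int current, int step)
{
    assert(step >= -100 && step <= 100);
    return volume_clamp(volume_clamp(current) + step);
}

/* Unit test for the volume library alone: no player and no GUI needed.
   Returns the number of failed checks, so 0 means everything passed. */
int volume_run_tests(void)
{
    int failures = 0;

    if (volume_clamp(-5) != 0)        failures++;
    if (volume_clamp(150) != 100)     failures++;
    if (volume_clamp(42) != 42)       failures++;
    if (volume_adjust(95, 10) != 100) failures++;
    if (volume_adjust(5, -10) != 0)   failures++;
    if (volume_adjust(50, 5) != 55)   failures++;
    return failures;
}
```

Call volume_run_tests() from a tiny test program of its own and check that it returns 0.
Once it does, the player's main loop can call the library without you ever opening that
file again.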

The Golden Rule: Once you start writing the main loop don't touch your libraries

This is the most important rule and one that requires iron discipline. If you cannot touch
your libraries once you start writing main, all your libraries should bloody well do whatever it
is that you require of them. This forces you to think about what your libraries should do
and to clearly spell out the requirements of each library, and this saves you a lot
of effort later on. Believe me about this!

HOWTO 1: How do I start making a library ?

Simple. Say you write string handling functions. Then divide them into a header file, say
my_string.h, that should contain only function declarations or macros. Only include declarations
that you need to use. E.g. if you have a function that reverses the string but that you are never
possibly going to need in your project, do not declare it in your header file. Write the entire
function out in the C file. The body of all the functions should be in the associated C file.
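
As a rough sketch of that split, here are both files shown one after the other in a single
listing. Only the file name my_string.h comes from the text above; the functions my_strlen
and my_strrev and the helper swap_chars are hypothetical examples of mine. Notice that the
helper lives only in the C file and never shows up in the header.

```c
/* ---- my_string.h : declarations only ---- */
#ifndef MY_STRING_H
#define MY_STRING_H

#include <stddef.h>

/* Number of characters in s, not counting the trailing NUL. */
size_t my_strlen(const char *s);

/* Reverse s in place and return it. */
char *my_strrev(char *s);

#endif /* MY_STRING_H */

/* ---- my_string.c : the function bodies ---- */
/* In the real file the first line would be: #include "my_string.h" */
#include <assert.h>

/* Internal helper: nobody outside this file needs it, so it is static
   and is not declared in the header. */
static void swap_chars(char *a, char *b)
{
    char tmp = *a;
    *a = *b;
    *b = tmp;
}

size_t my_strlen(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != '\0')
        n++;
    return n;
}

char *my_strrev(char *s)
{
    size_t i, j;

    assert(s != NULL);
    if (s[0] == '\0')
        return s;                        /* nothing to reverse */
    for (i = 0, j = my_strlen(s) - 1; i < j; i++, j--)
        swap_chars(&s[i], &s[j]);
    return s;
}
```

In a real project you would save the two halves as separate files. Anyone using the library
only ever reads the header half.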


HOWTO 2: How do I compile it into a library?

Once you have the header and the cpp file, compile them as follows.

In Turbo C: Make a new project with an appropriate name. Add your header and cpp file to
it. In fact, copy them both so that they are there in the same directory. Modify your
include directories and source directories so that the whole thing works. Now change the
exe type from the menu from standard exe to shared library. Compile now. Two files will
be generated, an obj file and a lib file. If your files were called, say, my_string.h and
my_string.cpp, then the output will be called my_string.obj and my_string.lib. These are the
files you want.

In Linux: If your files are my_string.cpp and my_string.h, type in
gcc -c my_string.cpp -I./ and the output will be a file my_string.o.
This object file is your library in its simplest form.

HOWTO 3: Great. How do I now use it in my program

In Turbo C: Now let's say your main file is string_main.cpp and you have included
my_string.h from this file. Open a new project, say string_main.
Modify the include paths so that the compiler can find my_string.h.
Now add my_string.obj and string_main.cpp to your project. Now compile
and link. The output will be an executable like string_main.exe that has the features in
my_string.cpp. You can use my_string.obj in as many projects as you want and never need to
compile my_string.cpp ever, ever again.
Congrats!!! You have just used a general purpose library in your programs.

In Linux: type in gcc string_main.cpp my_string.o (use g++ in place of gcc if your code
uses C++ library features). The output
is an executable a.out. That's it. Linux rocks man!!!

HOWTO 4: But this library says it depends on another library

Just like you can make an executable using a library, it's possible to make a library
using another library. The only way to solve this is to locate the right versions of the libraries
your library depends upon, download them and their headers, and compile, compile, compile till your
application works. Also make sure that your library is compiled for your platform and OS, and with
the versions of dynamic libraries present on your system. Downloading a library compiled
with VC 5.0 with MFC build 3.1 and expecting it to work on a Linux box is so naive I have
to warn you against it. Ask the questions: Is it for my platform? My set of libraries? My
compiler? My machine?
If you still can't work it out, please move over to Perl or Python.
HOWTO 5: I changed stuff in my library but the program isn't changing

Remember, most libraries are statically linked. If that's the case, once you change
your library you have to relink your program against the rebuilt library. Write Makefiles to ease these situations.
What's a Makefile? Wait for my article on "The Zen of Make". If the library is dynamically loaded,
make sure that the path is right. And if all this fails, switch to Perl.

The Zen of Libraries (Part II) Inside Stuff

Copyright Notice

Copyright:Ramesh Ananthakrishnan 6th May 2003

Pilfer at your own Peril


But what type of alien is it? And whence did it come? And what news does it bear?
And is it chocolate flavoured?

---Monty Python


In the Zen of Libraries (Part I) you saw why it was necessary to use libraries. This part
explains the inside stuff on libraries... but beware! You need to know about assembly,
and a little about operating systems, for this to make sense.

Assembling the assembly

In order to make this article short, I'll assume that you all know what assembly is and how
machine code is generated from mnemonics. I'll also assume that you know that the compiler builds a parse tree
and that the next stage is when the parse tree is walked to emit assembly
code. In fact, if you have the right tools, you can actually see the assembly instructions involved
in even the simplest program, like the quintessential "Hello World" program.
Also I'll assume you know about offset & segment addresses.
So since we know that an exe is composed of straight assembly, what is a library composed of?
Well... the same assembly instructions. In fact the difference between exes and libraries is
very simple. It is this:

While libraries do not have any start instruction, all exes have a standard start instruction
and address.

What do I mean by that? Whenever you run a program, the OS actually copies your program into
memory and then asks the processor to jump to a specific address where your program is loaded.
In fact, when your program is compiled, "offset" addresses are generated. The starting address
for an exe is a standard offset, say 42 bytes from the start of the program.
Once the program is loaded into a segment in memory, the OS
instructs the processor to perform a long jump to the segment (known to the OS) + offset start
address (which is common to every executable in the OS).
So if your OS loads your program right after the 16th KB, then the processor jumps to
16KB + 42 bytes. Here there are instructions on what to do next. So the executable can be,
say... executed. (Well, that's not the entire story... read Stallings for a more complete
picture.)
But what about a library???
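
Before answering that, here is the loading arithmetic from the paragraph above acted out in
a few lines of C. This is a toy under loud assumptions: the 42-byte offset is just the
made-up number from the text, and ENTRY_OFFSET, entry_address and first_instruction are
names I invented. No real executable format is this simple.

```c
#include <assert.h>
#include <stddef.h>

/* Toy number from the text, not a real executable format. */
#define ENTRY_OFFSET 42UL     /* fixed offset of the start instruction */

/* Address the processor jumps to once the program sits at load_base. */
unsigned long entry_address(unsigned long load_base)
{
    return load_base + ENTRY_OFFSET;
}

/* The same idea on a fake program image held in an array: the "OS" finds
   the first instruction by adding the offset to wherever the image lives. */
unsigned char first_instruction(const unsigned char *image, unsigned long size)
{
    assert(image != NULL && size > ENTRY_OFFSET);
    return image[ENTRY_OFFSET];
}
```

With the program loaded right after the 16th KB, entry_address(16 * 1024) gives 16384 + 42,
that is 16426. A library has no such agreed-upon offset, and that is the whole difference.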

Libraries are dummies

Since some functions are used by a large number of programs (think printf), these functions
are already compiled into assembly and kept in library files. So whenever you make, say,
a printf call, the linker copies the exact assembly instructions from the library
and inserts them into your program. So a library does not have a start address; it's just a
collection of useful functions. When programming, keep in mind that you are simply gathering
all these library functions and giving them some direction under main. So if you don't have
a main function, all you have to do is compile your cpp file and you have a library all
toasty for you to use.

Think of a C library as just the same as a library of books. If you have a purpose
like a term paper, you have a sort of idea of which books you want to read. You go in
pick those books, copy some stuff from it, add some of your stuff to it, and form a term paper.
That's what programming is all about. The term paper is like your program. It has a structure
to it, the same structure which is given to an executable by the main function. Now your
profs. can read and make sense of your term paper/program. However just giving your prof.
a library full of books as a term paper doesn't make sense. He may as well write the term
paper himself. What he wants is some stuff extracted from the library, some stuff you have
written that makes a term paper he can understand.

Shared & Static libraries

Just imagine a term paper for which all you guys xerox around 100 pages of the Java Complete
Reference API and submit them to your professor. Isn't that a terrible waste of paper? Well, a long time
ago when disk space was precious, it was (still is) asinine to copy the printf assembly
instructions into every Tom, Dick and Harry's program. This led to the concept of shared
libraries. This means that instead of everyone xeroxing the 100 pages, all of you would write
a small instruction in your term paper like, say:
Look at The Java Complete Reference pg. 123
and the prof. would actually do this. This assumes that the prof. also has a copy of the
Java Complete Reference. This is what shared libraries are. In Turbo C the graphics libraries
are shared, so the instructions to draw on screen are not copied into your program.
Turbo C assumes that all machines have a copy of the Turbo C graphics library.
So when you run a graphics program the graphics library is first loaded into memory.
Then the OS notes where exactly all the graphics functions are in memory. Now when
a call to, say... putpixel is made, the OS takes charge, figures out where putpixel
is and then instructs the program to jump there. Like all else, the above
description is a major simplification. For a detailed nuts & bolts view, look up Stallings.
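
The "figure out where the function is, then jump there" step can be imitated in plain C with
a table of names and function pointers. Mind you, this is a simulation and not how any real
OS loader is implemented; toy_library, resolve, call_by_name and the two little functions
are all names I made up for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*lib_fn)(int);   /* every toy function takes and returns an int */

static int square(int x) { return x * x; }
static int negate(int x) { return -x; }

/* Toy "shared library": a table mapping names to code addresses. */
struct symbol {
    const char *name;
    lib_fn      address;
};

static const struct symbol toy_library[] = {
    { "square", square },
    { "negate", negate }
};

/* What the loader does for you: find where a named function lives.
   Returns NULL when the library has no function of that name. */
lib_fn resolve(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof toy_library / sizeof toy_library[0]; i++) {
        if (strcmp(toy_library[i].name, name) == 0)
            return toy_library[i].address;
    }
    return NULL;
}

/* Look the function up by name, then jump to it through the pointer.
   Returns fallback when the name cannot be resolved. */
int call_by_name(const char *name, int arg, int fallback)
{
    lib_fn fn = resolve(name);

    return fn != NULL ? fn(arg) : fallback;
}
```

The program itself carries only the name, just like the one-line note in the term paper.
The code it names lives in one place and is looked up when it is needed.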

But is that all...? and How do I use libraries....?

Well, not exactly. Libraries are a large, large subject. Since they're a little
high level, most books that deal with programming languages don't teach you
anything about them. The books that deal with OS internals and high level stuff like
that don't deal with them because they assume that by the time you are reading OS books
you already know the zen of libraries. To compound the problem, most buggers who are CS nutcases
may know their algorithms, but fail to understand the basics about libraries. Which is why
I wrote the previous two articles. But how do I use libraries?? you ask. Fear not!
Read Part III of the Zen of Libraries on how to use libraries in your code.