Learn the “C” programming language by studying real projects: SQLite case study

Many resources discuss the best way to learn a programming language. Common approaches include:

  • Reading a book or a magazine.
  • Browsing websites.
  • Learning from a colleague.
  • Taking a training course.

A more interesting approach is to study a well-known, mature open source project and discover how its developers write their code. For the C language, a good candidate is the SQLite source code.

What’s interesting about SQLite is that its code is accessible even to beginner C developers: its contributors follow very basic coding rules, which makes the code easy to understand and maintain.

Let’s explore the SQLite source code using CppDepend and discover some basic coding rules adopted by its developers.

Encapsulation

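Encapsulation is the hiding of functions and data that are internal to an implementation. In C, encapsulation is achieved with the static keyword: a function, or a variable declared at file level, that is marked static is visible only inside the source file that defines it. These entities are called file-scope functions and variables.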
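
As an illustration, here is a minimal sketch of a single C source file that hides its internal state and a helper function behind static. The module and the names counter_record, counter_calls, call_count and clamp_to_byte are hypothetical, chosen for this example; they are not taken from SQLite.

```c
/* counter.c: hypothetical module illustrating file-scope encapsulation. */

/* File-scope variable: "static" gives it internal linkage, so other
   source files cannot refer to it by name. */
static int call_count = 0;

/* File-scope helper: also invisible outside this file. */
static int clamp_to_byte(int value)
{
    if (value < 0) return 0;
    if (value > 255) return 255;
    return value;
}

/* Public function: no "static", so it has external linkage and would
   normally be declared in the module's header (e.g. counter.h). */
int counter_record(int value)
{
    call_count++;
    return clamp_to_byte(value);
}

/* Public accessor: the only way other files can read call_count. */
int counter_calls(void)
{
    return call_count;
}
```

Another source file that declares `extern int call_count;` and uses it would fail to link, while calls to counter_record and counter_calls link normally. The public surface of the module is exactly the set of non-static functions.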

Let’s search for all static functions by executing the following CQLinq query

[Screenshot: CQLinq query selecting the static functions]

The Metric view gives a good idea of how many functions are concerned. In the Metric view, the code base is represented as a treemap. Treemapping is a method for displaying tree-structured data using nested rectangles. The tree structure used in a CppDepend treemap is the usual code hierarchy:

  • Projects contain directories.
  • Directories contain files.
  • Files contain structs, functions and variables.

The treemap view is a useful way to represent the result of a CQLinq query. For the previous query, the static functions are the ones that are not greyed out.

[Screenshot: Metric view treemap highlighting the static functions]

As we can see, many functions are declared static.

Use structs to store your data model

In C programming, functions use variables to do their work. These variables can be:

  • Static variables.
  • Global variables.
  • Local variables.
  • Struct fields.

Each project has its own data model, which may be used by many source files. Global variables are one solution, but not a good one; grouping related data in structs is recommended.
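
A minimal sketch of the idea, assuming a hypothetical in-memory table (Table, table_insert and table_sum are illustrative names, not SQLite code): the related fields live in one struct, and functions receive that struct explicitly instead of reaching for globals.

```c
#include <stddef.h>

/* Hypothetical data model: these fields would otherwise be four
   loosely related global variables. */
typedef struct Table {
    const char *name;     /* table name */
    int        *rows;     /* caller-provided storage for row values */
    size_t      count;    /* number of rows currently stored */
    size_t      capacity; /* maximum number of rows */
} Table;

/* Appends a value; returns 0 on success, -1 if the table is full. */
int table_insert(Table *t, int value)
{
    if (t->count == t->capacity) return -1;
    t->rows[t->count++] = value;
    return 0;
}

/* Read-only operation: takes a const pointer to the data model. */
long table_sum(const Table *t)
{
    long total = 0;
    for (size_t i = 0; i < t->count; i++) total += t->rows[i];
    return total;
}
```

Because each function names the data it depends on, several tables can coexist in the same program, which would be awkward with a single set of globals.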

Let’s search for defined structs:

[Screenshot: CQLinq query listing the structs defined in SQLite]

Many structs are used to specify the data model.

Keep functions short and sweet

Here is some advice about function length from the Linux kernel coding style guide:

Functions should be short and sweet, and do just one thing.  They should
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,
as we all know), and do one thing and do that well.

The maximum length of a function is inversely proportional to the
complexity and indentation level of that function.  So, if you have a
conceptually simple function that is just one long (but simple)
case-statement, where you have to do lots of small things for a lot of
different cases, it's OK to have a longer function.

Let’s search for functions with fewer than 30 lines of code:

[Screenshot: CQLinq query for functions with fewer than 30 lines of code]

More than 90% of the functions have fewer than 30 lines of code.

Number of function parameters

Functions where NbParameters > 8 can be painful to call and may degrade performance. An alternative is to pass a structure dedicated to holding the arguments.
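
Here is a minimal sketch of the argument-structure alternative. OpenOptions, open_options_default and describe_open are hypothetical names, and the default values are arbitrary; describe_open only formats the options and stands in for a real "open" routine. This is not SQLite's own API.

```c
#include <stdio.h>

/* Hypothetical options struct replacing a long positional parameter list. */
typedef struct OpenOptions {
    const char *path;  /* database file path */
    int read_only;     /* nonzero: open read-only */
    int create;        /* nonzero: create the file if it is missing */
    int timeout_ms;    /* busy timeout in milliseconds */
    int cache_pages;   /* page cache size, in pages */
} OpenOptions;

/* Returns options filled with (arbitrary) defaults; callers then
   override only the fields they care about. */
OpenOptions open_options_default(const char *path)
{
    OpenOptions o = { .path = path, .read_only = 0, .create = 1,
                      .timeout_ms = 1000, .cache_pages = 2000 };
    return o;
}

/* Stand-in for a real open call: writes a summary into buf and returns
   the number of characters snprintf would have written. */
int describe_open(const OpenOptions *o, char *buf, size_t size)
{
    return snprintf(buf, size, "%s ro=%d create=%d timeout=%d cache=%d",
                    o->path, o->read_only, o->create,
                    o->timeout_ms, o->cache_pages);
}
```

A caller writes `OpenOptions o = open_options_default("test.db"); o.read_only = 1;` and passes `&o`. Adding a new option later changes the struct, not every call site.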

[Screenshot: CQLinq query for functions with more than 8 parameters]

Only a few functions have more than 8 parameters.

Number of local variables

Functions where NbVariables is higher than 8 are hard to understand and maintain. Functions where NbVariables is higher than 15 are extremely complex and should be split into smaller functions (unless they are generated automatically by a tool).
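
One way to keep the local-variable count low is to give each sub-task its own helper, as in this minimal sketch. The Stats struct and the array_* helpers are hypothetical; the sketch assumes the array is non-empty (n > 0).

```c
#include <stddef.h>

/* Each helper owns only the few locals it needs. */
static int array_min(const int *v, size_t n)
{
    int m = v[0];
    for (size_t i = 1; i < n; i++) if (v[i] < m) m = v[i];
    return m;
}

static int array_max(const int *v, size_t n)
{
    int m = v[0];
    for (size_t i = 1; i < n; i++) if (v[i] > m) m = v[i];
    return m;
}

static long array_sum(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}

typedef struct Stats { int min; int max; long sum; } Stats;

/* The top-level function reads as a summary of the steps instead of
   juggling every running value in one body. Assumes n > 0. */
Stats array_stats(const int *v, size_t n)
{
    Stats s = { array_min(v, n), array_max(v, n), array_sum(v, n) };
    return s;
}
```

The trade-off is three passes over the array instead of one; for small inputs the gain in readability usually outweighs it.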

[Screenshot: CQLinq query for functions with many local variables]

Only a few functions have more than 15 local variables.

Avoid defining complex functions

Many metrics exist to detect complex functions; the number of lines of code (NbLinesOfCode), the number of parameters and the number of local variables are the most basic ones.

There are other interesting metrics to detect complex functions:

  • Cyclomatic complexity is a popular procedural software metric equal to one plus the number of decisions that can be taken in a function.
  • Nesting depth is the maximum depth of the most deeply nested scope in a function body.
  • Max nested loop is the maximum level of loop nesting in a function.

The maximum tolerated value for each of these metrics depends on the team’s choices; there are no standard values.
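
To make the cyclomatic complexity count concrete, here is a small hypothetical function annotated with its decision points. It follows the common convention of one plus the number of decisions, counting each `&&` as a decision as many tools do; exact counting rules vary between tools.

```c
/* Hypothetical function; decision points are marked in comments.
   Cyclomatic complexity = 1 + 4 = 5. Nesting depth stays at 1 because
   each if returns immediately instead of opening a nested block. */
const char *classify(int x)
{
    if (x < 0)                    /* decision 1 */
        return "negative";
    if (x == 0)                   /* decision 2 */
        return "zero";
    if (x % 2 == 0 && x > 100)    /* decision 3 (if), decision 4 (&&) */
        return "large even";
    return "positive";
}
```

Each added if, loop, case label or short-circuit operator raises the count by one, so the metric grows quickly in functions that mix many conditions.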

Let’s search for functions that are candidates for refactoring:

[Screenshot: CQLinq query for functions that are candidates for refactoring]

Only very few functions can be considered complex.

Be Const Correct

C provides the const keyword, which lets a function declare that it will not modify an object passed to it as a parameter (typically through a pointer). Using const in all the right places is called “const correctness.” It’s hard at first, but using const really tightens up your coding style.
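
A minimal sketch of const correctness with two hypothetical string helpers: the signature alone tells the caller which one may modify its argument.

```c
#include <stddef.h>

/* "const char *s": the function promises to only read the string. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; s++)   /* the pointer itself may move... */
        if (*s == c) n++;     /* ...but the characters are never written */
    return n;
}

/* Non-const parameter: this function modifies the buffer in place,
   and its signature says so. */
void replace_char(char *s, char from, char to)
{
    for (; *s != '\0'; s++)
        if (*s == from) *s = to;
}
```

If count_char accidentally contained `*s = c;`, the compiler would reject it. A string literal can be passed to count_char safely, whereas replace_char needs a modifiable array such as `char buf[] = "a,b,c";`.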

Let’s search for functions having at least one const parameter:

[Screenshot: CQLinq query for functions with at least one const parameter]

Function coupling

Functions that use many other functions are difficult to understand and maintain. It’s recommended to minimize the efferent coupling of your functions, that is, the number of other functions each one uses.

In SQLite, very few functions have a high efferent coupling:

[Screenshot: CQLinq query for functions with high efferent coupling]

If you can exit a function early, you should.

Early exits from a function, especially through guard clauses at the top of the function, are preferred because they simplify the logic further down.
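
A minimal sketch of guard clauses, using a hypothetical average function and hypothetical AVG_* error codes: invalid inputs are rejected up front, so the main computation is not wrapped inside nested if blocks.

```c
#include <stddef.h>

/* Hypothetical error codes for this sketch. */
enum { AVG_OK = 0, AVG_ERR_NULL = 1, AVG_ERR_EMPTY = 2, AVG_ERR_RANGE = 3 };

/* Computes the integer average of v[0..n-1] into *out.
   Fails if any value exceeds limit. */
int average(const int *v, size_t n, int limit, int *out)
{
    /* Guard clauses: bail out early on invalid input. */
    if (v == NULL || out == NULL) return AVG_ERR_NULL;
    if (n == 0) return AVG_ERR_EMPTY;

    /* Main logic runs only once the preconditions hold. */
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > limit) return AVG_ERR_RANGE;
        sum += v[i];
    }
    *out = (int)(sum / (long)n);
    return AVG_OK;
}
```

Without the guards, the loop would sit inside `if (v != NULL && out != NULL) { if (n != 0) { ... } }`, adding two levels of nesting for no benefit.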

In the SQLite source code, this practice is applied in almost all functions.

What’s not recommended in the SQLite implementation

The sqlite3.c file contains a very large number of functions, structs and variables, which is generally not recommended. However, SQLite is designed to be embedded in other projects, and having a single file to embed (the so-called amalgamation) is very practical.

Conclusion

Exploring well-known open source projects is a good way to elevate your programming skills. You don’t even need to download and build a project; you can simply browse its code on GitHub.
