Book report, part 2: Introduction to Programming with Fortran
In part one of working through this book, I briefly covered the first 200 pages of Introduction to Programming with Fortran. Mostly, I had positive things to say; despite skipping a handful of topics true beginners might want to know, I found the examples helpful and felt like I was starting to get a decent handle on the language. For the next 200 pages though, my impressions have been a little more mixed.
Some of the topics from this third of the book were:
- Functions
- Control structures (loops and if statements)
- New data types (characters, logicals, complex, user defined types, pointers)
- Subroutines
- Modules
- Data structures
- Algorithms and Big O
- Operator overloading
- Generic programming
- Some mathematical examples
Code examples continue, perhaps to a fault
Certainly the authors have set out to cover a lot of material, and the pattern of code examples for most points continues. However, I'm getting the sneaking suspicion that the later material was increasingly tacked on to ensure coverage of certain key topics. The code examples are less of a supplement to the text and more the means by which material is conveyed. And they're not without issues. There have been typos of varying severity that don't affect the code's ability to run or produce correct results, but could be extremely confusing for someone new to programming. Even for someone with experience, they can be distracting. Sometimes this is pretty minor, like when a field in a data type is written as both a capital and a lowercase "c". Fortran is case-insensitive, so it's not a problem in practice. But the most egregious example so far starts off as follows:
character *30, dimension (4) :: heading = [&
  ' Allocate = ',&
  ' Random number generation = ',&
  ' Sort = ',&
  ' Deallocate = ']
In a nutshell, this is creating a set (technically, an array) of four strings (technically a character array) each of which can be up to 30 characters long. It also sets the values for each string, with the "&" indicating that the code statement continues on the next line. Later on, the code prints out each of the strings before their corresponding values, hence the "heading" name of the variable. Or at least, it's supposed to.
What it does instead is that it (mistakenly, so far as I can tell) overwrites this array right before printing out whatever heading it wanted. So for instance, right before printing out the second member of the array (' Random number generation = ') you have this line of code:
heading = ' Random number generation = '
What this does is it changes every string in the array to be this particular one, overwriting the original intended definition. This doesn't cause issues because when the second element of the array is used, it's the string that was intended, but for the wrong reasons. It is, frankly, sloppy and might mislead readers less familiar with how the code functions.
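To make that behavior concrete, here is a minimal sketch of what assigning a single string to a whole character array does. Only the heading array and its strings come from the book; the program name, the loop, and the final single-element assignment are my own additions for illustration. I've also used a typed array constructor (an assumption on my part, not the book's syntax) so the strings don't have to be padded to equal length by hand.

```fortran
program heading_overwrite
  implicit none
  character(len=30), dimension(4) :: heading
  integer :: i

  ! Typed array constructor: every string is padded to 30 characters.
  heading = [character(len=30) :: &
             ' Allocate = ', &
             ' Random number generation = ', &
             ' Sort = ', &
             ' Deallocate = ']

  ! Assigning a scalar to the whole array copies it into EVERY element,
  ! so the other three headings are lost at this point.
  heading = ' Random number generation = '

  do i = 1, 4
    print '(i0, a, a)', i, ':', trim(heading(i))
  end do

  ! To change a single heading, index the element explicitly.
  heading(1) = ' Allocate = '
  print '(a)', trim(heading(1))
end program heading_overwrite
```

The loop prints the same heading four times, which is why the book's output still looks right: element 2 holds the intended string, but so does every other element.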
I should also say that as I've found these, I've written the authors to let them know and to make sure I'm not misinterpreting the code myself. They have responded to some of the earlier issues and say that some changes will make their way into the next edition, but I haven't heard back on some later ones.
The earlier chapters were also much better about providing data files for the example code to read in and manipulate. As the text has progressed, however, there are none provided on the book's website, meaning that unless I have the patience to create test data on my own (I don't) these functions aren't really feasible to run and get a sense of. Providing this sample data would be a great help.
The code examples can also be needlessly drawn out at times. For instance, in the chapter discussing how to build functions that can deal with different input types (like if you wanted to give it integers sometimes, but reals others), the example has seven input variants, with the combined code occupying about 7-8 pages. The issue it's trying to illustrate is a valuable one, but could this have been done with only two or three variants, each with more distinct behavior and less complicated code? I would like to think so.
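As a rough idea of what a shorter version might look like, here is a sketch with just two variants behind one generic name. This is not the book's example: the module name scale_mod, the generic name twice, and both specific functions are hypothetical names I made up for illustration.

```fortran
module scale_mod
  implicit none
  private
  public :: twice

  ! One generic name; the compiler picks the specific function
  ! based on the type of the actual argument.
  interface twice
    module procedure twice_integer
    module procedure twice_real
  end interface twice

contains

  integer function twice_integer(x)
    integer, intent(in) :: x
    twice_integer = 2*x
  end function twice_integer

  real function twice_real(x)
    real, intent(in) :: x
    twice_real = 2.0*x
  end function twice_real

end module scale_mod

program generic_demo
  use scale_mod
  implicit none
  print *, twice(21)    ! resolves to twice_integer
  print *, twice(1.5)   ! resolves to twice_real
end program generic_demo
```

Two variants are enough to show the mechanism: the caller only ever writes twice, and adding a third type means adding one more function and one more line in the interface block.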
And as a side note, the code font is such that "l" and "1" are almost indistinguishable (that would be the lowercase letter L and the number 1) and there have been cases where I've been thrown off by that. Expanding "l" to "left" would have done wonders for readability.
But why though?
I found myself having to consult Google a little too often for my liking while reading. One thing I'm perhaps most confused about is the effective difference between functions and subroutines. For those of you unfamiliar with one or both, functions are self-contained pieces of code that contain instructions on how to manipulate their inputs. In a mathematical sense, y = mx+b is a function where the output (y) is determined by multiplying a constant m by the input (x) and adding b to the result. So in the code you might see something like:
y = f(x)

real function f(x)
real,intent(in)::x
f = 2*x+7
end function f
What this is saying is that, given a real value of x, the function f will multiply it by 2, add 7, and return that answer, which is then assigned to y. The "intent(in)" attribute specifies that x should not be modified in the function itself, and helps to ensure that accidental changes of values do not occur.
Subroutines work in much the same way, except they don't return values per se. Here's how the same calculations would look using a subroutine:
call f(x,y)

subroutine f(x,y)
real,intent(in)::x
real,intent(out)::y
y = 2*x+7
end subroutine f
Obviously there are differences, but so far as I can tell there's nothing unique to using a function versus a subroutine, except that functions can be used like this:
y = sqrt(sin(x)+cos(x))
Because of the "call" mechanism, you can't do this with a subroutine. But is that the only difference? The internet seems to be of the opinion that the two should be different in that functions should never alter their inputs and subroutines should or at least could. It's also possible that having both is a product of backwards compatibility: leftover strategies no longer needed in modern computing, but retained because a lot of old code would break if either suddenly disappeared. Are there best practices for when one should be used over the other? I'm not sure, and the book only says that subroutines and functions are both procedures and are used to modularize and streamline code. True, but not terribly helpful.
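For comparison, here is a small, complete sketch that puts the two side by side. The module name line_mod, the subroutine name f_sub, and the temporary variable are mine. I've also marked the function "pure", which is a standard Fortran keyword asking the compiler to reject side effects; whether the book recommends it is something I can't speak to.

```fortran
module line_mod
  implicit none
contains

  ! Function: returns a value, so a call can sit inside an expression.
  ! "pure" makes the compiler enforce that x is not modified.
  pure real function f(x)
    real, intent(in) :: x
    f = 2*x + 7
  end function f

  ! Subroutine: no return value; the result comes back through y.
  subroutine f_sub(x, y)
    real, intent(in)  :: x
    real, intent(out) :: y
    y = 2*x + 7
  end subroutine f_sub

end module line_mod

program compare
  use line_mod
  implicit none
  real :: x, y, tmp

  x = 1.0

  ! The function call nests directly inside sqrt.
  y = sqrt(f(x))
  print *, y

  ! The subroutine needs its own statement and a temporary
  ! before sqrt can be applied to the result.
  call f_sub(x, tmp)
  y = sqrt(tmp)
  print *, y
end program compare
```

Both routes compute the square root of 9. The practical difference on display is composition: the function version fits on one line, while the subroutine version could just as easily hand back several outputs at once.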
This is not the only example where I felt like seemingly arbitrary decisions in the examples were a matter of a best practice or style, and in fact having this guidance would be great both for writing good code myself and for understanding code I may encounter out in the world.
Explanation, after the fact
Another issue I find myself running into is concepts that are used in the example code (which, remember, is increasingly the way concepts are conveyed) and are only officially introduced chapters later. This is done with modules, which are used as early as chapter 12 but aren't formally introduced until chapter 21, about 125 pages later. The code examples themselves are guilty of this. For instance, it turns out that a subroutine can be passed as an argument to a function. This may be familiar to anyone who has programming experience in languages like R or Python, but for a true beginner, I could see this being frustrating and confusing to come across while reading, only to realize that a short explanation is at the end of the code. I feel like this information should be in the introduction of the example, so that people know what to look for and expect. And, with this example in particular, this is a really powerful tool that should get a way more thorough discussion than a footnote.
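For anyone who hasn't seen the idea before, here is a minimal sketch of passing a subroutine to a function. It is not the book's example: the module callbacks, the abstract interface transform, and both procedures are hypothetical names, and the abstract interface plus the procedure declaration assume a Fortran 2003 or later compiler.

```fortran
module callbacks
  implicit none

  ! Describes the shape any subroutine passed in must have.
  abstract interface
    subroutine transform(x, y)
      real, intent(in)  :: x
      real, intent(out) :: y
    end subroutine transform
  end interface

contains

  subroutine double_plus_seven(x, y)
    real, intent(in)  :: x
    real, intent(out) :: y
    y = 2*x + 7
  end subroutine double_plus_seven

  ! A function whose first argument is itself a subroutine.
  real function apply_to_sum(op, a, b)
    procedure(transform) :: op
    real, intent(in) :: a, b
    real :: tmp
    call op(a + b, tmp)   ! run whatever subroutine the caller supplied
    apply_to_sum = tmp
  end function apply_to_sum

end module callbacks

program callback_demo
  use callbacks
  implicit none
  ! Passes the subroutine by name; computes 2*(1+2) + 7.
  print *, apply_to_sum(double_plus_seven, 1.0, 2.0)
end program callback_demo
```

The point is that apply_to_sum never needs to know which calculation it is running, so the same function can be reused with any subroutine that matches the transform interface.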
In fact, overall, I would say that the entirety of the text needs to be restructured, but more on that later and in upcoming reviews.
In conclusion
I suspect my growing frustration is clear, but it's not all bad. Even with all the issues I've mentioned, I still feel like I'm getting a good introduction to the language, and may well have a pretty good overview of it by the time I'm finished. I just can't help but feel like this book should be broken down into two or three smaller ones: a proper introduction of basic topics for true beginners, an intermediate book covering things like overloading, Big O, and the more advanced precision concepts, and an advanced book for parallel computing and the other topics I'm about to get into. I just don't think a book discussing interoperability with C still counts as an introductory text, and it feeds into my suspicion that material has been tacked on as the language has grown.