[cloc]

CLOC : Counting Lines Of Code

Version: 1.00 (17 Jan 2021)
(C) Richard Coleman, 2021
Licensed under the terms of the MIT license


 

Written in BASIC, CLOC is a command line program which counts blank lines, comment lines and physical lines of code in 17 programming and scripting languages:

It provides statistics on the numbers of blank lines, comment lines and code lines in files that it recognises as these language files.


Download 1.00 (34K)
 

Installation

To install CLOC, simply place the BASIC file “cloc” in the library directory (Usually !Boot.Library, or under RISCOS 4: !Boot.Library.User).
If you are using RISCOS 4 then put the text file “cloc” in directory “RISCOS4” into the Library Help directory: !Boot.Library.User.Help.

 

How to use

To run CLOC, open a TaskWindow and at the command prompt, type “cloc” without any parameters to display the syntax. It will also accept -h or -help to display the following syntax.
 
Syntax: *cloc [-nowarn] [-nolist] [-noignore] [-quiet] [-stats] <file spec>

-nw, -nl, and -ni can be used in place of -nowarn, -nolist and -noignore respectively.
 
The Wimpslot of the TaskWindow needs to be at least 192Kb otherwise you’ll get the error “No writable memory at this address”.
 
To process files specify the filename or directory on the command line, for example: “cloc c.HelloWorld” will process the file “HelloWorld” in the directory “c” in the currently set directory.
To process all files in the “c” directory, just use: “cloc c” which will process all files and sub-directories.
 
Wildcards may be used, for example: “cloc c.Hello*” to process all files beginning with “Hello”.
 
-nowarn

Whilst it is reading a file, CLOC will perform a number of checks and issue warnings, such as string not terminated. To turn these warnings off, add the option -nowarn.
 
Any errors reported, which will be because either CLOC has detected something wrong in its processing or it can’t open a file or it runs out of memory, etc are reported regardless of whether -nowarn is specified. If an error is reported then CLOC will stop processing files.
 
The only exception to this is the error: “Line counted as neither COMMENT nor CODE”. This is the only error that does not cause CLOC to stop processing files. Once all the files have been processed you may also get the message: “Error: Line count mismatch” or “Error: Total line count mismatch”. This error indicates there is a bug in the code as CLOC should count a non-blank line as either comment or code, never neither. Please let me know if this happens so I can fix it!
 
-noignore

CLOC also lists those files it has ignored because it does not recognise them, either because of the filetype or because it has been unable to guess the type of file. It will also list files and directories that are empty. To stop listing these files use the option -noignore.
 
-nolist

CLOC will list the files that it is processing as well as the directory path, and it will also update the display showing how much of the file it has processed. To turn off this listing use the option -nolist.
 
Warnings will still be displayed unless -nowarn is also specified on the command line along with -nolist. Ignored files will also still be displayed unless -noignore is also specified on the command line.
 
Specifying -nowarn, -nolist and -noignore will result in no display (except a spinning wheel so you know CLOC is doing something) until CLOC has processed all the files and displayed the results at the end.
 
Please note that the spinning wheel and percentage display will not be updated correctly if you are using a TaskWindow provided by !Edit as !Edit does not handle the VDU codes used. Either use StrongEd or Zap, or use the -quiet option.
 
-quiet

Specifying -quiet will result in nothing being display, not even the spinning wheel, whilst processing the files. Any errors will, however, still be displayed.
 
-stats

Once all files have been processed then a break down of the languages is listed, giving the number of files, total number of lines, blank lines, comment lines and code lines per language. A percentage breakdown is also shown with the total of all languages. If the percentages do not equal 100% this will be due to rounding errors.
 
If -stats is specified then extra information is displayed. As each file is processed, the line counts for that particular file are displayed (as long as -nolist has not been set). And once all files have been processed, CLOC will show how long in seconds (or centi-seconds) it took to process the files, along with an indication of how many lines were processed per second. Lines per second are only shown where processing takes at least 1 second. And for each language the maximum line length out of all the files for that langauge is shown along with the average line length of comment and code lines.
 

Examples

Below are two examples of using CLOC on itself, showing the extra stats available.
Current directory has been set to SCSI::SSD.$.Develop.CountLines

*cloc cloc
Count Lines Of Code 1.00 (17 Jan 2021)
   Reading BASIC file: 'SCSI::SSD.$.Develop.CountLines.cloc' ... 100%

Processed 1,795 lines
-------------------------------------------------------------------------
Language           Files       Lines      Blanks    Comments        Code
  BASIC                1       1,795         235         306       1,254
                                            13.1%       17.0%       69.9%

And now with extra stats with “-stats” ...
 
*cloc cloc -stats
Count Lines Of Code 1.00 (17 Jan 2021)
   Reading BASIC file: 'SCSI::SSD.$.Develop.CountLines.cloc' ... 100%

Processed 1,795 lines in 1.95 seconds (920.5 lines/s)
----------------------------------------------------------------------------------
Language           Files       Lines      Blanks    Comments        Code   MaxLen
  BASIC                1       1,795         235         306       1,254      187
                             Average line length:         59          30
----------------------------------------------------------------------------------
                                            13.1%       17.0%       69.9%

And below, processing an application showing the different options.
 
*cloc !textview
Count Lines Of Code 1.00 (17 Jan 2021)
Processing files in directory 'SCSI::SSD.$.Develop.CountLines.!textview' ...
   Reading Obey file: '!Boot' ... 100% ** Warning: Last line not LF or CR terminated **
   --- Ignoring unrecognised file: '!Help'
   Reading Obey file: '!Run' ... 100% ** Warning: Last line not LF or CR terminated **
   Reading BASIC file: '!RunImage' ... 100%
   --- Ignoring unrecognised file: '!Sprites' (Filetype: &FF9)
   --- Ignoring unrecognised file: 'TestText'
   Reading BASIC file: 'TextLib' ... 100%

Processed 547 lines in 4 out of 7 files
Ignored 3 files
-------------------------------------------------------------------------
Language           Files       Lines      Blanks    Comments        Code
  BASIC                2         535          89          73         373
  Obey                 2          12           3           2           7
-------------------------------------------------------------------------
  Total:               4         547          92          75         380
                                            16.8%       13.7%       69.5%

And now with extra stats with “-stats” ...
 
*cloc !textview -stats
Count Lines Of Code 1.00 (17 Jan 2021)
Processing files in directory 'SCSI::SSD.$.Develop.CountLines.!textview' ...
   Reading Obey file: '!Boot' ... 100% ** Warning: Last line not LF or CR terminated **
      Counted 4 lines:  Blanks = 1  Comments = 1  Code = 2
   --- Ignoring unrecognised file: '!Help'
   Reading Obey file: '!Run' ... 100% ** Warning: Last line not LF or CR terminated **
      Counted 8 lines:  Blanks = 2  Comments = 1  Code = 5
   Reading BASIC file: '!RunImage' ... 100%
      Counted 271 lines:  Blanks = 44  Comments = 11  Code = 216
   --- Ignoring unrecognised file: '!Sprites' (Filetype: &FF9)
   --- Ignoring unrecognised file: 'TestText'
   Reading BASIC file: 'TextLib' ... 100%
      Counted 264 lines:  Blanks = 45  Comments = 62  Code = 157

Processed 547 lines in 4 out of 7 files in 52 centi-seconds
Ignored 3 files
-------------------------------------------------------------------------
Language           Files       Lines      Blanks    Comments        Code   MaxLen
  BASIC                2         535          89          73         373      155
                             Average line length:         40          21
  Obey                 2          12           3           2           7       64
                             Average line length:         25          34
----------------------------------------------------------------------------------
  Total:               4         547          92          75         380
                                            16.8%       13.7%       69.5%

And now suppressing warnings with “-nowarn” ...
 
*cloc !textview -nowarn
Count Lines Of Code 1.00 (17 Jan 2021)
Processing files in directory 'SCSI::SSD.$.Develop.CountLines.!textview' ...
   Reading Obey file: '!Boot' ... 100%
   Reading Obey file: '!Run' ... 100%
   Reading BASIC file: '!RunImage' ... 100%
   Reading BASIC file: 'TextLib' ... 100%

Processed 547 lines in 4 out of 7 files
Ignored 3 files
-------------------------------------------------------------------------
Language           Files       Lines      Blanks    Comments        Code
  BASIC                2         535          89          73         373
  Obey                 2          12           3           2           7
-------------------------------------------------------------------------
  Total:               4         547          92          75         380
                                            16.8%       13.7%       69.5%

And now suppressing warnings but with extra stats with “-nowarn” and “-stats” ...
 
*cloc !textview -nowarn -stats
Count Lines Of Code 1.00 (17 Jan 2021)
Processing files in directory 'SCSI::SSD.$.Develop.CountLines.!textview' ...
   Reading Obey file: '!Boot' ... 100%
      Counted 4 lines:  Blanks = 1  Comments = 1  Code = 2
   Reading Obey file: '!Run' ... 100%
      Counted 8 lines:  Blanks = 2  Comments = 1  Code = 5
   Reading BASIC file: '!RunImage' ... 100%
      Counted 271 lines:  Blanks = 44  Comments = 11  Code = 216
   Reading BASIC file: 'TextLib' ... 100%
      Counted 264 lines:  Blanks = 45  Comments = 62  Code = 157

Processed 547 lines in 4 out of 7 files in 52 centi-seconds
Ignored 3 files
-------------------------------------------------------------------------
Language           Files       Lines      Blanks    Comments        Code   MaxLen
  BASIC                2         535          89          73         373      155
                             Average line length:         40          21
  Obey                 2          12           3           2           7       64
                             Average line length:         25          34
----------------------------------------------------------------------------------
  Total:               4         547          92          75         380
                                            16.8%       13.7%       69.5%

And now suppressing listing of files with “-nolist” ...
 
*cloc !textview -nolist
Count Lines Of Code 1.00 (17 Jan 2021)
** Warning: Last line not LF or CR terminated: 'SCSI::SSD.$.Develop.CountLines.!textview.!Boot'
   --- Ignoring unrecognised file: 'SCSI::SSD.$.Develop.CountLines.!textview.!Help'
** Warning: Last line not LF or CR terminated: 'SCSI::SSD.$.Develop.CountLines.!textview.!Run'
   --- Ignoring unrecognised file: 'SCSI::SSD.$.Develop.CountLines.!textview.!Sprites' (Filetype: &FF9)
   --- Ignoring unrecognised file: 'SCSI::SSD.$.Develop.CountLines.!textview.TestText'

Processed 547 lines in 4 out of 7 files
Ignored 3 files
-------------------------------------------------------------------------
Language           Files       Lines      Blanks    Comments        Code
  BASIC                2         535          89          73         373
  Obey                 2          12           3           2           7
-------------------------------------------------------------------------
  Total:               4         547          92          75         380
                                            16.8%       13.7%       69.5%

And now suppressing listing of files but with extra stats with “-nolist” and “-stats” ...
 
*cloc !textview -nolist -stats
Count Lines Of Code 1.00 (17 Jan 2021)
** Warning: Last line not LF or CR terminated: 'SCSI::SSD.$.Develop.CountLines.!textview.!Boot'
   --- Ignoring unrecognised file: 'SCSI::SSD.$.Develop.CountLines.!textview.!Help'
** Warning: Last line not LF or CR terminated: 'SCSI::SSD.$.Develop.CountLines.!textview.!Run'
   --- Ignoring unrecognised file: 'SCSI::SSD.$.Develop.CountLines.!textview.!Sprites' (Filetype: &FF9)
   --- Ignoring unrecognised file: 'SCSI::SSD.$.Develop.CountLines.!textview.TestText'

Processed 547 lines in 4 out of 7 files in 51 cent-seconds
Ignored 3 files
-------------------------------------------------------------------------
Language           Files       Lines      Blanks    Comments        Code   MaxLen
  BASIC                2         535          89          73         373      155
                             Average line length:         40          21
  Obey                 2          12           3           2           7       64
                             Average line length:         25          34
----------------------------------------------------------------------------------
  Total:               4         547          92          75         380
                                            16.8%       13.7%       69.5%

And now suppressing listing of files and warnings but with extra stats with “-nolist”, “-nowarn” and “-stats” ...
 
*cloc !textview -nolist -nowarn -stats
Count Lines Of Code 1.00 (17 Jan 2021)

Processed 547 lines in 4 out of 7 files in 51 centi-seconds
Ignored 3 files
----------------------------------------------------------------------------------
Language           Files       Lines      Blanks    Comments        Code   MaxLen
  BASIC                2         535          89          73         373      155
                             Average line length:         40          21
  Obey                 2          12           3           2           7       64
                             Average line length:         25          34
----------------------------------------------------------------------------------
  Total:               4         547          92          75         380
                                            16.8%       13.7%       69.5%

Limitations

This program came about as a result of a couple of comments on a ROOL forum about counting lines of code, and is not related in anyway with the program of the same name and function at github.com/AlDanial/cloc, apart from making use of the same table layout for displaying the information.
 
Identifying comments must count as one of Douglas Adams’ six impossible things to do before breakfast; it is a lot trickier than you might think. Certainly a lot trickier than I originally thought!
 
Many languages would need a parser in order to count lines with 100% accuracy. CLOC is not a parser and so it might not be 100% accurate depending on the language and how the source code is formatted.
 
There will also undoubtedly be files that it simply can’t recognise. If that is the case then if there is a filetype then either make sure the file has that filetype instead of Text (&FFF) or Data (&FFD), or append the filetype to the end of the filename (eg: “BasicFile,ffb”).
 
If a filetype is not available, other than Text or Data then make the first line in the file be a comment line.
 
This Limitations section and the How It Works section below should enable you to set the file so that CLOC can recognise and process files correctly.
 
First we need to define exactly what it is that CLOC counts.
 
CLOC counts physical lines, as opposed to logical lines, of code.
 
A physical line of code is defined as “a line ending in a newline or end of file marker, and which contains at least one non-whitespace non-comment character” (dwheeler.com/sloccount/sloccount.html). A logical line of code is defined as an executable statement (en.wikipedia.org/wiki/Source_lines_of_code).
 
So, for example, this line in BASIC:

is one physical line, but three logical lines of code and one logical comment line.
 
In this example (from wikipedia): we have four physical lines of code, two logical lines of code (does not include { and }) and one comment line.
 
Similarly, some languages allow a line to continue on to the next line with the use of a continuation mark (\ in C/C++ and & in Fortran77). These are ignored by CLOC and each line is counted as a separate line as CLOC counts physical lines rather than logical lines.
 
So, for example, this in Fortran: is counted as two physical lines even though it is one logical line of code. It is equivalent to:
Comment lines are lines that only contain one or more comments. Lines containing both code and comments are counted as a code line.
 
So, for example, this line in Pascal: will be counted as a code line by CLOC.
 
Lines which consists only of whitespace characters (space and tab) and statement separator characters are counted as blank lines. Blank lines inside a multi-line comment are counted as blank lines and not as comment lines.
 
If one language is embedded in another, for example assembly code in a higher level language such as C, this will not be recognised if the embdedded language uses different comment constructs from the containing language. BASIC and Charm are the only exceptions to this.
 
In Charm, the semi-colon character is used as a statement separator and as a comment character in the inline assembler. CLOC counts a semi-colon at the start of a line as an assembler comment.
 
In Python, strings can be strings or they can be comments and it is the context that depends on how Python interprets them. CLOC handles this as best it can by counting strings at the start of a line as comments otherwise it counts them as code.
 
Conditional code (eg #if..#endif) which is often used to stop code from being compiled is treated as normal code. C pre-processor lines are considered as normal C code. Similarly compiler directives, if they use the comment markers are treated as comments, otherwise as code. This can cause a problem if a line which the compiler would not process, because of a directive or a conditional, contains a syntax error, such as a string not being terminated, because it will still be processed by CLOC, and will either cause a warning to be reported, if the language does not support multi-line strings, or for the counting to go horribly wrong if it does.
For example: C and C++ files are not treated separately but are counted towards the “C/C++” language count. And so the comment // is recognised in both C and C++ files.
 
Python 2 and Python 3 files, whilst recognised by their different filetypes are also counted towards the “Python” language count. They are not differentiated.
 
Lua files can have a shebang (#!) at the start of a file to identify it as a Lua file, for example: #!Lua
 
Other languages that use shebangs, such as Awk and Python, already use the hash (#) character as a comment character, but Lua does not. And so Lua has an exception to enable it to count the shebang line as a comment line. Whilst it is expected that the shebang will only appear once on the very first line, any #! at the start of a line will be counted as a comment line in Lua files. The shebang check is case insensitive.
 

How it works

CLOC uses a number of different ways to try and identify the language of a file.
 
The first is by the filetype of the file, which includes the filetype attached to the end of the filename as a HostFS extension: ,xxx
 
Many languages, especially those that are compiled, do not have their own filetype and are usually typed as Text files.
 
If the filetype is Text (&FFF), Data (&FFD) or is untyped, then CLOC will examine the first few characters of the file to see if it can guess the language.
 
It will also look at the parent directory, which will override the guess (eg “c.HelloWorld”) will mean HelloWorld is treated as C/C++. The parent directory is commonly the file extension used under other Operating Systems.
 
And then finally it will see if the file has an extension at the end of the filename, which will override the parent directory (eg “HelloWorld/c”).
 
See the Language Details section below for specific languages.
 
Specific Text files are ignored regardless of the guess or parent directory. Files named: !Help, !ReadMe, Help, Messages, ReadMe are ignored.
 
Image files are also ignored. To process an Image file as a directory use:

You will need the application that understands the Image file (Eg !SparkFS) loaded in order to do this otherwise it will report that there are no files. Depending on whether the relevant application is loaded will depend on whether CLOC will recongise the file as an Image file or as an ordinary file.
 
If possible the whole file is read into a buffer (Initially set to 64Kb), if not the buffer is resized to the size of the file. If that is not possible then the buffer is set to half the file size. If that also fails then the file is read in 64Kb chunks. The buffer is claimed by creating a heap in the Wimpslot so make sure that the Wimpslot of the TaskWindow is at least 192Kb otherwise you’ll get the error “No writable memory at this address”.
 
Individual characters are read from the buffer until an EOL character is found. CLOC can cope with LF (UNIX/RISC OS), CR (Mac), CRLF (MSDOS), LFCR (OS_NewLine) line endings.
 
Tab characters are converted to a single space.
 
With the exception of Fortran77, spaces and statement separators at the start of a line are ignored, as are double spaces and double separators, which helps to reduce the length of the line that needs to be checked. This is especially effective in comment lines which often include multiple spaces. Spaces and statement separators at the end of the line are also removed.
 
Because statement separators are removed from the start of the line, it means that lines which only contain the separator character will be counted as blank lines.
 
The characters read are stored in the line buffer which is 1024 bytes which should be enough, but if not it is automatically increased to 2048 bytes by taking memory from the Heap.
 
Once the line has been read, or the End Of File (EOF) has been reached, the line is then processed to determine if it is a blank line, a comment or code line.
 
Fortran77 comments are checked first, then Charm inline assembler comments. And then CLOC looks for a shebang (#!) at the start of the line if the language specifies shebangs. This is currently only used by Lua.
 
Then CLOC looks for a single line comment at the start of the line, unless it’s already in a multi-line string or comment, because of a previous line, or it’s processing a Lua file (Lua multi-line comment and single line comments both start with --).
 
And then CLOC checks each character in the line by working through these steps:
  1. If not already in a string or multi-line comment then see if CLOC has found the start of a string.
    If not in a string then goto step 3.
  2. CLOC is in a string so look for end of string or EOL.
    If found end of string and not EOL then get next character and goto step 1.
  3. If not already in a mutli-line comment then see if CLOC has found the start of a multi-line comment.
  4. If in a multi-line comment then goto step 5.
  5. If not start of a multi-line comment then see if CLOC has found the start of a single line comment.
    If not in a multi-line comment and not EOL then get next character and goto step 1.
  6. CLOC is in a multi-line comment, and if nested comments allowed then look for the start of another multi-line comment.
    After checking for a nested comment, see if CLOC has found the end of a multi-line comment.
    If still in a multi-line comment then repeat step 5 else get next character and goto step 1.
Once the line has been processed, get the next line and repeat until EOF.
 
CLOC can handle comment characters in strings and treats them as part of the string. Many languages allow characters to be escaped with the \ character (eg \n); CLOC can handle these as well as well as escaped quotes within a string (eg '\'').
 

Language details

Listed below are the settings for each language along with notes about its implementation as well as any language specific warnings that may be reported. Warnings that may be reported for all languages are:


 
Ada

Filetype: No
Guess looks for: --
Extensions: /adb, /ads
Parent dir: adb, ads
Single line comment: --
Multi-line comments: No
Statement separator: No
String delimiters: ""
Multi-line strings: No
Escaped characters in strings: No
Notes:
Ada does have single quotes which enclose a single character (eg '*'), so it won’t have comments in them, so they are ignored.
 
ASM (extASM)

Filetype: &725
Guess looks for: No guess made
Extensions: No
Parent dir: No
Single line comment: ;
Multi-line comments: No
Statement separator: No
String delimiters: "" and ''
Multi-line strings: Yes
Escaped characters in strings: Yes, character = |
 
ASM (ObjAsm and ASM)

Filetype: No
Guess looks for: ;
Extensions: /s
Parent dir: s
Single line comment: ;
Multi-line comments: No
Statement separator: No
String delimiters: "" and ''
Multi-line strings: No
Escaped characters in strings: Yes
Notes:
ASM is used for stand alone Assemblers such as ObjAsm and ASM.
 
Awk

Filetype: &187
Guess looks for: #!awk or #!/usr/bin/awk or #!/usr/bin/env awk
Extensions: /awk
Parent dir: awk
Single line comment: #
Multi-line comments: No
Statement separator: ;
String delimiters: ""
Multi-line strings: No
Escaped characters in strings: Yes
Notes:
Can have spaces after the shebang #! and it is case insensitive.
 
BASIC

Filetype: &FFB (Tokenised) or &FD1 (Text)
Guess looks for: REM (&FD1). See notes below for tokenised files
Extensions: /bas
Parent dir: bas
Single line comment: &F4 (&FFB) and REM (&FD1)
Multi-line comments: Two, not nested and only in the Assembler, see notes below
  1. Begins: ; Ends: :
  2. Begins: \ Ends: :
Statement separator: :
String delimiters: ""
Multi-line strings: No
Escaped characters in strings: No
Warnings: Notes:
Comment check for REM is case sensitive.
For tokenised BASIC (&FFB) when guessing and when checking the extension and parent, it also checks that the file is 6 bytes or more in size, that the first two bytes are &0D and &00 and that the last byte is &FF.
The characters *| are also treated as a comment and there may be spaces between the * and |. This is handled as an exception.
CLOC only checks for the Assembler multi-line comments when the character [ is detected to indicate the start of the Assembler, with ] marking the end of the Assembler. It can cope with assembler statements such as LDR r0, [r1,#4].
Whilst the Assembler comments are specified as multi-line comments they do not actually span more than one line; the comment ends either at a colon or the EOL.
 
C/C++ and CMHG

Filetype: No
Guess looks for: /* or //
Extensions: /c, /h, /c++, /h++, /cpp, /hpp, /cc
Parent dir: c, h, c++, h++, cpp, hpp, cc
Single line comment: //
Multi-line comments: One, not nested
  1. Begins: /* Ends: */
Statement separator: ;
String delimiters: "" and ''
Multi-line strings: Yes
Escaped characters in strings: Yes
 
CMHG files are counted as C/C++ files but have different settings:
Filetype: No
Guess looks for: No guess made
Extensions: No
Parent dir: cmhg
Single line comment: ;
Multi-line comments: No
Statement separator: No
String delimiters: ""
Multi-line strings: No
Escaped characters in strings: Yes
 
Charm

Filetype: No
Guess looks for: |
Extensions: No
Parent dir: No
Single line comment: No
Multi-line comments: One, not nested
  1. Begins: | Ends: |
Statement separator: No
String delimiters: "" and ''
Multi-line strings: No
Escaped characters in strings: Yes
Warnings: Notes:
The inline assembler uses the semi-colon as a single line comment character.
However semi-colons are also used as the statement separator with no easy way to distinguish between them, and so if a semi-colon is found at the start of a line it is assumed to be an inline assembler comment.
Whilst the comments are specified as multi-line comments they are not actually multi-line.
 
Forth

Filetype: &D72
Guess looks for: "\ " or "( "
Extensions: /f
Parent dir: f
Single line comment: "\ "
Multi-line comments: Two, not nested
  1. Begins: ".( " Ends: )
  2. Begins: "( " Ends: )
Statement separator: No
String delimiters: ""
Multi-line strings: No
Escaped characters in strings: No
Notes:
According to one spec Forth comments aren’t actually multi-line, but it looks like that depends on the compiler as I have come across code that uses them as multi-line comments, so CLOC treats them as truly multi-line.
The opening comment ".( ", which is a compilation comment that gets printed during compilation, can’t be caught by using "( " as the preceding fullstop will cause the comment to be counted as code, which is why it has to be checked for separately.
 
Fortran77

Filetype: No
Guess looks for: C or c
Extensions: /f77, /for
Parent dir: f77
Single line comment: C or c or * in column 1 only
Multi-line comments: No
Statement separator: No
String delimiters: ""
Multi-line strings: Yes
Escaped characters in strings: No
Notes:
Fortran77 is sensitive to character position; comment characters are only allowed in column 1.
 
Lisp

Filetype: &D23
Guess looks for: No guess made
Extensions: /lisp
Parent dir: lisp
Single line comment: ;
Multi-line comments: One, nested
  1. Begins: #| Ends: |#
Statement separator: No
String delimiters: ""
Multi-line strings: Yes
Escaped characters in strings: Yes
 
Lua

Filetype: &18C
Guess looks for: #!lua or #!/usr/bin/lua or #!/usr/bin/env lua
Extensions: /lua
Parent dir: lua
Single line comment: --
Multi-line comments: Two, not nested
  1. Begins: --[[ Ends: ]]
  2. Begins: --[= Ends: =]
Statement separator: ;
String delimiters: "" and '' and [[]]
Multi-line strings: Yes
Escaped characters in strings: Yes
Notes:
Can have spaces after the shebang #! and it is case insensitive.
Comments and strings can have zero or more equals signs between the square brackets. Having looked at some source code it seems that no equals signs between the brackets are fairly common hence why there are two sets of multi-line comments as this saves CLOC having to look for equals signs every time.
 
Makefile

Filetype: &FE1
Guess looks for: #!make or #!/usr/bin/make or #!/usr/bin/env make or filename = “makefile”
Extensions: No
Parent dir: Filename = “makefile” in c, h, c++, h++, cpp, hpp, cc
Single line comment: #
Multi-line comments: No
Statement separator: ;
String delimiters: "" and ''
Multi-line strings: No
Escaped characters in strings: Yes
Notes:
Can have spaces after the shebang #! and it is case insensitive.
The shebang guess only looks for the first four characters to match “make”.
 
Obey and TaskObey

Filetype: &FEB (Obey) or &FD7 (TaskObey)
Guess looks for: No guess made
Extensions: No
Parent dir: No
Single line comment: |
Multi-line comments: No
Statement separator: *
String delimiters: ""
Multi-line strings: No
Escaped characters in strings: No
Notes:
The characters *| are treated as a comment and there may be spaces between the * and |. This is handled by setting * as the statement separator character so that it is removed at the start of the line along with any spaces to leave the | which is the comment character and so it is correctly treated as a comment.
 
Pascal

Filetype: No
Guess looks for: (* or {
Extensions: /p, /pas
Parent dir: p, pas
Single line comment: No
Multi-line comments: Two, not nested
  1. Begins: (* Ends: *)
  2. Begins: { Ends: }
Statement separator: ;
String delimiters: ''
Multi-line strings: No
Escaped characters in strings: No
Notes:
Pascal comments can start with { and end with *) and vice versa.
 
Python

Filetype: &AE5 (Python 2) or &A73 (Python 3)
Guess looks for: # or """ or '''
Extensions: /py
Parent dir: py
Single line comment: #
Multi-line comments: Four, not nested
  1. Begins: """ Ends: """
  2. Begins: ''' Ends: '''
  3. Begins: " Ends: "
  4. Begins: ' Ends: '
Statement separator: No
String delimiters: "" and ''
Multi-line strings: Yes
Escaped characters in strings: Yes
Notes:
Python files can make use of a shebang, but this is not used for guessing as the guess defaults to Python if it finds a # anyway.
If a string is on the left hand side of a statement then it is considered a comment, otherwise as a string. This can be difficult for CLOC to determine, so CLOC takes the view that if the string is at the start of a line then it is counted as a comment, otherwise it is counted as a string and hence as code.
 
Squeak

Filetype: No
Guess looks for: " as first character and second character is not "
Extensions: No
Parent dir: No
Single line comment: No
Multi-line comments: One, not nested
  1. Begins: " Ends: "
Statement separator: .
String delimiters: ''
Multi-line strings: No
Escaped characters in strings: No
Notes:
The guess checks that the second character is not " otherwise it could get confused with a Python docstring """