Documentation for CodeBeagle

Introduction to CodeBeagle

CodeBeagle lets you quickly find all occurrences of a search term inside source code files. To do so it creates a full-text index of the desired source files. Because it is tolerant of white space, its search syntax works well for searching source code. The search results are displayed in a source viewer with customizable syntax highlighting. It runs without installation and leaves you in full control of when to update the index. Advanced features include support for multiple indexes and custom search scripts, which allow you to automate sequences of searches. CodeBeagle is written in Python and based on SQLite and Qt.

Search syntax

foo bar

Matches foo followed by any amount of white space followed by bar, but does not match foobar.

The asterisk is allowed as the wildcard character to search for keywords that are only partially known:

foo*

Matches any keyword starting with foo. This should be quite efficient, as the keywords are stored in alphabetical order and SQLite should easily determine those starting with a particular string.

*bar

Matches any keyword ending with bar. This is less efficient, as it requires scanning all keywords.
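The difference between the two wildcard positions can be illustrated with a small sketch. The table name keywords, its single column and the helper names are assumptions made for this illustration and are not CodeBeagle's actual schema. The point is only that a prefix search can be expressed as a range over sorted keywords, while a suffix search has to look at every keyword.

```python
import sqlite3

# Hypothetical keyword table; the PRIMARY KEY gives SQLite a sorted index.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE keywords (keyword TEXT PRIMARY KEY)")
con.executemany(
    "INSERT INTO keywords VALUES (?)",
    [(k,) for k in ("bar", "foo", "foobar", "format", "crowbar")],
)

def starting_with(prefix):
    """foo* : a range query which can be answered from the sorted index."""
    # Smallest string which sorts after every string starting with prefix.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    rows = con.execute(
        "SELECT keyword FROM keywords "
        "WHERE keyword >= ? AND keyword < ? ORDER BY keyword",
        (prefix, upper),
    )
    return [row[0] for row in rows]

def ending_with(suffix):
    """*bar : a leading wildcard forces a scan over all keywords."""
    # Note: "_" and "%" inside suffix would act as LIKE wildcards
    # and would need escaping in real code.
    rows = con.execute(
        "SELECT keyword FROM keywords WHERE keyword LIKE ? ORDER BY keyword",
        ("%" + suffix,),
    )
    return [row[0] for row in rows]

print(starting_with("foo"))
print(ending_with("bar"))
```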

int*ate

The asterisk is also allowed in the middle of keywords. The example above would match intermediate.

To search for a literal asterisk, separate it from the surrounding text with a blank:

a* * b

This matches any keyword starting with a, followed by a literal asterisk and then by b.

a<=b

Matches a followed by any amount of white space, followed by <= and finally by b. If you type a blank between < and =, the search phrase also matches when any amount of white space separates the two characters.

hello **3 world

Matches hello followed by one to three unknown words/tokens followed by world. This is useful if you know the beginning and end of the search phrase but some parts in the middle are unknown.

class <!C|Q!>

Allows you to inject a regex directly into the query. The part between <! and !> contains the regex. In the example, the query searches for class followed by something starting with C or Q.

Technical background:
The indexing associates the processed documents with the keywords they contain. A keyword is a string of arbitrary length consisting of alphanumeric characters, the underscore and the hash sign. The first step of searching is to split the search phrase into a sequence of keyword and non-keyword parts. An example:

a != b

This yields the sequence a, !=, b, of which a and b are keywords. The search first computes all documents which contain these two keywords. This candidate set is then narrowed down to the final result by reading the files and scanning for the full search phrase using a regular expression. Any amount of white space is allowed between the parts of the sequence. In the example above no white space is allowed between ! and =, because != is a single part. White space would be allowed there if the search phrase were a ! = b.
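The splitting and the final regular-expression scan described above can be sketched as follows. This is a minimal illustration and not CodeBeagle's implementation: the function names are made up, wildcards and the other special syntax are ignored, and requiring at least one white-space character between two adjacent keywords is an assumption derived from the rule that foo bar does not match foobar.

```python
import re

# A keyword is a run of alphanumerics, underscore or hash sign;
# everything else that is not white space forms a non-keyword part.
TOKEN = re.compile(r"[A-Za-z0-9_#]+|[^\sA-Za-z0-9_#]+")
KEYWORD = re.compile(r"[A-Za-z0-9_#]+")

def split_phrase(phrase):
    """Split a search phrase into keyword and non-keyword parts."""
    return TOKEN.findall(phrase)

def phrase_to_regex(phrase):
    """Build a white-space tolerant regex for the full search phrase."""
    parts = split_phrase(phrase)
    pieces = []
    for prev, cur in zip([None] + parts, parts):
        if prev is not None:
            both_keywords = KEYWORD.fullmatch(prev) and KEYWORD.fullmatch(cur)
            # Two keywords must stay separated, other parts may touch.
            pieces.append(r"\s+" if both_keywords else r"\s*")
        pieces.append(re.escape(cur))
    return re.compile("".join(pieces))

print(split_phrase("a != b"))
print(bool(phrase_to_regex("a != b").search("if (a   !=b)")))
print(bool(phrase_to_regex("a != b").search("a ! = b")))
```

In this sketch the keyword parts would be used for the index lookup, and the compiled regex would be applied only to the files which contain all of them.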

Filtering the results

The UI offers three filters to drill down the results:

Filter by file/path	The file path of a result must contain this filter text as a sub-string to yield a match. If you want to pass more than one filter, separate them with commas. A minus sign before a filter excludes matching paths.
Filter by extension	Specifies a list of extensions which are allowed in the result. The separator is the comma. The following forms are allowed and all mean the same: cpp, .cpp, *.cpp. A minus sign before an extension excludes this extension.
Case sensitive	If checked, the case of the search phrase must match
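The filter rules in the table can be sketched in a few lines. The function names are hypothetical, and two details are assumptions because the text does not state them: a path is accepted if it contains any of the positive filters, and extensions are compared without regard to case.

```python
import os

def parse_filter(text):
    """Split a comma separated filter into include and exclude items."""
    includes, excludes = [], []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        if item.startswith("-"):
            excludes.append(item[1:])
        else:
            includes.append(item)
    return includes, excludes

def normalize_extension(ext):
    """cpp, .cpp and *.cpp all become cpp."""
    return ext.strip().lstrip("*").lstrip(".").lower()

def path_allowed(path, filter_text):
    includes, excludes = parse_filter(filter_text)
    if any(item in path for item in excludes):
        return False
    # Assumption: one matching positive filter is enough.
    return not includes or any(item in path for item in includes)

def extension_allowed(path, filter_text):
    includes, excludes = parse_filter(filter_text)
    includes = {normalize_extension(e) for e in includes}
    excludes = {normalize_extension(e) for e in excludes}
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext in excludes:
        return False
    return not includes or ext in includes

print(extension_allowed("src/main.cpp", "*.cpp,-h"))
print(path_allowed(r"D:\src\test\ui\x.cpp", "ui,-test"))
```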

Configuration

Clicking the Settings button opens a dialog in which you can configure search locations and general settings such as the font and font size used in the file viewer. Search locations define the directories and file extensions to search. They come in two flavours: indexed and non-indexed. Indexed search locations require an index file to be generated before any search can be performed; in return, searching them is fast. Non-indexed search locations do not need an index file but are slower to search. The settings made in the dialog are stored in the user profile. If you want to configure machine-wide search locations independent of the currently logged-on user, read the chapter about the global configuration file.

Updating indexed search locations

Indexed search locations can be created and updated in two ways. The first is from the UI, by clicking the Update index button. This lets you select which indexes to update. The update itself then runs as a background process and continues even if CodeBeagle is closed.
The second way is to call "UpdateIndex.exe" directly. This makes it possible to automate the index update, for instance with a scheduled task.

All indexes specified in the global configuration are created or updated when the command line program "UpdateIndex.exe" is called. Use the "--config" switch to additionally update the indexes configured for a specific user. The switch takes the full path to a user config file. User-specific config files are written by CodeBeagle when you use the settings dialog to configure search locations. They are located under "%APPDATA%\..\local\CodeBeagle\config.txt". If there is no "local" directory, the %APPDATA% directory is used. If %APPDATA% is not defined, %HOME% is used. Call "UpdateIndex.exe" with "--help" to show all parameters.
For an automated index update I recommend creating a scheduled task which runs "UpdateIndex.exe".
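The lookup order for the user config file can be sketched as follows, for example to prepare the argument for "--config" in a scheduled script. The function names are hypothetical. That the "CodeBeagle\config.txt" part stays the same in the two fallback cases, and that the path is passed as a separate argument after "--config", are assumptions; check "--help" for the exact form.

```python
import ntpath  # Windows path rules, regardless of the platform this runs on

def user_config_path(env, isdir):
    """Return the assumed location of the per-user config.txt.

    env   -- mapping of environment variables, e.g. os.environ
    isdir -- callable telling whether a directory exists, e.g. os.path.isdir
    """
    base = env.get("APPDATA")
    if base is None:
        # %APPDATA% is not defined: fall back to %HOME%.
        base = env["HOME"]
    else:
        local = ntpath.normpath(ntpath.join(base, "..", "local"))
        if isdir(local):
            base = local
    return ntpath.join(base, "CodeBeagle", "config.txt")

def build_update_command(exe, user_config=None):
    """Argument list for subprocess.run; the argument form is an assumption."""
    command = [exe]
    if user_config is not None:
        command += ["--config", user_config]
    return command

env = {"APPDATA": r"C:\Users\me\AppData\Roaming"}
config = user_config_path(env, isdir=lambda path: True)
print(build_update_command("UpdateIndex.exe", config))
```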

The index update can be time-consuming, especially when it runs for the first time. Building a fresh index for 30000 files takes about 80 minutes on my machine at work. The good thing is that updating an existing index is much faster, as only modified and new files are processed.

Global configuration file

The global configuration is stored in "config.txt". CodeBeagle first reads the global "config.txt" and then merges it with the per-user configuration from the user profile. This means that the user configuration takes precedence over the machine-wide settings. The global "config.txt" is expected to be in the same directory as the executables themselves. The most important things to configure are the directories to index, the extensions of the files to collect and the path where the index is stored. Here is an example of how to define an index:

Index1 {
    indexdb=D:\mysource.dat
    extensions=h,cpp,c
    directories=D:\source1,E:\source2
    # optional directory excludes
    dirExcludes=\dir1,\dir2 
}
Index2 {
   ...
}

This indexes all files in "D:\source1" and "E:\source2" with the extensions "h","cpp" and "c" and stores the result in "D:\mysource.dat". The section name which specifies the index must start with "index" (case doesn't matter). Optionally you can specify a list of comma separated strings in the property "dirExcludes". All directories containing one of these strings will be excluded. The path they are compared with is not terminated with a path separator. As illustrated by the section "Index2" you are not restricted to only one index definition. When several indexes are defined you can select the index to search in the upper right corner.
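How the properties "directories", "extensions" and "dirExcludes" could work together when collecting files is sketched below. The function names are hypothetical and this is not the code used by "UpdateIndex.exe". The exclude test is a plain sub-string comparison against a directory path without a trailing separator, as described above, so "\dir1" also excludes a directory named "dir10".

```python
import os

def dir_excluded(dir_path, dir_excludes):
    """True if the directory path contains one of the exclude strings."""
    return any(exclude in dir_path for exclude in dir_excludes)

def has_wanted_extension(name, extensions):
    """Compare the file extension, ignoring case (an assumption)."""
    ext = os.path.splitext(name)[1].lstrip(".").lower()
    return ext in {e.lower() for e in extensions}

def collect_files(directories, extensions, dir_excludes=()):
    """Yield all files below the directories which should be indexed."""
    for root in directories:
        for current, subdirs, files in os.walk(root):
            # Prune excluded directories so os.walk does not descend into them.
            subdirs[:] = [
                d for d in subdirs
                if not dir_excluded(os.path.join(current, d), dir_excludes)
            ]
            for name in files:
                if has_wanted_extension(name, extensions):
                    yield os.path.join(current, name)

print(dir_excluded(r"D:\source1\dir10", [r"\dir1", r"\dir2"]))
```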

Here is a full table of all supported settings in "config.txt":

Setting	Meaning
showCloseConfirmation	If set to non-zero, the user must confirm a message box when closing the application
matchOverFiles	Continue with the next/previous file if no next/previous match exists in the current file
updateIndexLog	If set, UpdateIndex.exe writes a log file. The setting must contain the full path to the log file
profileUpdate	If set to non-zero, UpdateIndex.exe is run in the profiler and the resulting profiling data is printed to stdout. This is a debugging option and slows down the update considerably
IndexXYZ {	All groups starting with Index contain an index definition as described above

The config syntax allows importing other config files.

"config.txt" imports "config\SourceViewer.txt" which contains settings about the source viewer:

Setting	Meaning
fontFamily	Set your desired font family here. E.g. "Courier"
fontSize	Configures the font size
tabWidth	Configures the tab width
HighlighterXYZ {	All groups starting with Highlighter contain a syntax highlighting definition. This is explained in the section "Customizing syntax highlighting"

The "config.txt" file also imports "UserConfig.txt" at the very end. The recommendation is to leave "config.txt" unchanged and create a "UserConfig.txt" where you put all your custom configuration changes. This way you keep your settings when you upgrade to a new version which ships a new "config.txt". It works because a second definition of the same key overrides the first. Here is an example of a "UserConfig.txt" which defines one index and overrides the tab width:

Index1 {
    indexdb=D:\mysource.dat
    extensions=h,cpp,c
    directories=D:\source1,E:\source2
}
SourceViewer {
    tabWidth = 2
}

Some remarks about the syntax of the configuration:
The basic syntax is "key = value". A string followed by '{' starts a group. Groups may contain any number of key/value pairs and other groups. The content of another config file may be imported as a group using the "import {file} as {group}" syntax. The syntax "import {file}" imports the content of a config file into the current group.
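To make the syntax rules concrete, here is a minimal parser sketch for the "key = value" and group syntax. It is an illustration only and not CodeBeagle's config reader: the function name is made up, import statements are not handled, lines starting with # are treated as comments as in the index example, and merging groups of the same name key by key is an assumption based on the override behaviour of "UserConfig.txt".

```python
def parse_config(text):
    """Parse 'key = value' lines and nested '{ ... }' groups into dicts."""
    root = {}
    stack = [root]  # innermost open group is stack[-1]
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("{"):
            name = line[:-1].strip()
            # Re-opening a group merges into the existing one (assumption).
            group = stack[-1].setdefault(name, {})
            stack.append(group)
        elif line == "}":
            if len(stack) == 1:
                raise ValueError("line %d: unexpected '}'" % number)
            stack.pop()
        elif "=" in line:
            key, value = line.split("=", 1)
            # A later definition of the same key overrides the earlier one.
            stack[-1][key.strip()] = value.strip()
        else:
            raise ValueError("line %d: cannot parse %r" % (number, line))
    if len(stack) != 1:
        raise ValueError("missing '}' at end of file")
    return root

sample = """
SourceViewer {
    tabWidth = 4
    fontSize = 10
}
SourceViewer {
    tabWidth = 2
}
"""
print(parse_config(sample))
```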

Keyboard shortcuts

F7	Jumps to previous match in current document
F8	Jumps to next match in current document
CTRL+F	Opens a UI for searching text in the current document
F3	Shows next occurrence of text in current document
F4	Shows previous occurrence of text in current document
F5	Reloads the current document
CTRL+T	Opens a new search tab
CTRL+W	Closes the current search tab
CTRL+S	Opens the settings dialog
CTRL+G	Goto line in current document
ALT+n	Jumps to the n-th tab, where n is a number from one to six
CTRL+Text selection	Searches for the selection in a new tab. As a double click on a word selects it this is quite handy to quickly navigate through the source
CTRL+Shift+Text selection	Searches for the selection in the current tab
CTRL+B	Jump to matching brace

Custom context menu entries

The context menu of the list control which shows the matches can be extended with additional entries. These entries can either simply start an executable or execute a script file which contains Python code. The following examples show how to configure both types:

ContextMenu1 {
    title = Notepad
    executable = %windir%\notepad.exe
    args = "%file%"
    showWindow = True
}

As the example illustrates environment variables are resolved properly. The command line of the executable is specified in args. The special variable %file% is resolved as the selected file in the tree control. If more than one file is selected the executable is started for each of the files.

There is also a variant that supports calling programs that need exactly two file parameters like diffing tools. In this case provide two variables in your args with the name file1 and file2. Here is an example that starts WinMerge:

ContextMenu2 {
  title = WinMerge
  executable = C:\Program Files (x86)\WinMerge\WinMergeU.exe
  args = "%file1%" "%file2%"
  showWindow = True
}

ContextMenu3 {
    title = Checkout from version control
    script = contextmenu\checkout.py
}

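The group above configures an entry which runs a Python script instead of an executable. The list of selected files in the tree control is passed to the script as a list of strings in the variable files. Here is an example script: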

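import subprocess
# "files" is provided by CodeBeagle and holds the selected file paths
for path in files:
    # Open each selected file in its own Notepad instance
    subprocess.Popen([r"C:\windows\notepad.exe", path])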

Custom search scripts

These scripts, written in Python, allow you to automate search tasks. At start-up of the application all files with the extension ".script" are collected from the sub-folder scripts. To execute a custom search script, type your search phrase, click the "wizard hat" icon to the right of the "Find" button and select the desired script. The script is then called with the search phrase and all filter settings and is expected to return a set of results. Here is a basic example to illustrate this:

result.matches.extend (performSearch(query, folders, extensions, caseSensitive))

The result of this script is identical to what the regular "Find" button does. As you can see the script uses predefined variables and the function performSearch. Here is a list of all other input variables and functions:

query	The search phrase as string
folders	The folder filter as string
extensions	The extension filter as string
caseSensitive	Boolean if case sensitive is checked
regexFromText(query)	Returns the regex object which can be used to highlight "query"

After the script has finished its work, the following variables are inspected in order to display the result:

result.matches	A list of file names with the matches. Each list entry must be the full path to the matching file
result.highlight	A regular expression object from the module re which is used to highlight the matches in each of the matching files. This is necessary because only the search script knows which criteria led to the returned matches. If you do not set this variable, the default is to highlight the initial query string. Use regexFromText to obtain the regex for a search phrase, for instance in order to combine multiple regexes.
result.label	Allows to set the label of the search tab. If you do not set this variable the default is "Custom script"
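As a slightly larger illustration, the following sketch returns only the files which contain every word of the search phrase, and sets result.highlight and result.label accordingly. The first part consists of stand-ins for the names which CodeBeagle predefines, so that the sketch runs on its own; their behaviour and the file paths are invented for this example. Inside a real ".script" file only the part below the marker would be needed. Combining the regexes through their pattern attribute drops any flags, which is a simplification.

```python
import re
from types import SimpleNamespace

# --- Stand-ins for the environment provided by CodeBeagle (hypothetical) ---
def performSearch(query, folders, extensions, caseSensitive):
    fake_index = {
        "foo": [r"D:\src\a.cpp", r"D:\src\b.cpp"],
        "bar": [r"D:\src\b.cpp", r"D:\src\c.cpp"],
    }
    return fake_index.get(query, [])

def regexFromText(query):
    return re.compile(re.escape(query))

result = SimpleNamespace(matches=[], highlight=None, label="Custom script")
query, folders, extensions, caseSensitive = "foo bar", "", "", False

# --- Script body: files which contain all words of the query ---
terms = query.split()
common = None
for term in terms:
    found = set(performSearch(term, folders, extensions, caseSensitive))
    common = found if common is None else common & found

result.matches.extend(sorted(common or []))
# Highlight every single term, not the phrase as a whole.
result.highlight = re.compile("|".join(regexFromText(t).pattern for t in terms))
result.label = "All of: " + ", ".join(terms)

print(result.matches, result.label)
```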

Customizing syntax highlighting

The default syntax highlighting works fine for C++ and languages with similar keywords (C#,Java,...). It is configured in "config\SourceViewer.txt" like this:

Highlighter_Default {
    config = C++.txt
    extensions = *
}

Highlighter_CPP {
    config = C++.txt
    extensions = c*,h*,inl
}

Each group starting with Highlighter defines a syntax highlighting definition. Each definition contains two entries:

config	A file containing the syntax highlighting rules. The file must reside in the "config" directory.
extensions	A list of extensions, possibly with wildcards (* and ? allowed)

The order of the Highlighter groups is not important. The highlighting definition for a file is looked up by the file's extension. If the extension rules of more than one highlighting definition match, the one which fits best is used, that is, the one which contains the most non-wildcard characters. This makes it possible to define a default rule using an asterisk, as shown above.
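The best-fit lookup can be sketched with the standard fnmatch module. The function names and the dictionary layout are assumptions made for this illustration; only the rule itself, preferring the matching pattern with the most non-wildcard characters, is taken from the description above.

```python
import fnmatch

# Extension patterns per highlighter group, mirroring the example config.
HIGHLIGHTERS = {
    "Highlighter_Default": ["*"],
    "Highlighter_CPP": ["c*", "h*", "inl"],
}

def specificity(pattern):
    """Number of characters in the pattern which are not wildcards."""
    return sum(1 for ch in pattern if ch not in "*?")

def best_highlighter(extension, highlighters):
    """Return the group whose matching pattern is the most specific one."""
    best_name, best_score = None, -1
    for name, patterns in highlighters.items():
        for pattern in patterns:
            if fnmatch.fnmatchcase(extension.lower(), pattern):
                score = specificity(pattern)
                if score > best_score:
                    best_name, best_score = name, score
    return best_name

print(best_highlighter("cpp", HIGHLIGHTERS))
print(best_highlighter("py", HIGHLIGHTERS))
```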

The actual syntax highlighting rule files are Python scripts. The environment for these scripts is set up to make it easy to access font weights and colors. It might also help to know that "PyQt5.QtCore" is imported as "QtCore" and "PyQt5.QtGui" is imported as "QtGui". Highlighting rules are added using the three functions addKeywords, addCommentRule and addRule. Each of them expects the pair fontWeight and foreground, which is used internally to construct a QTextCharFormat:

addKeywords (keywordList, fontWeight, foreground)	Adds highlighting to keywords. keywordList is a comma separated string with all keywords.
addCommentRule (singleLine, multiLineStart, multiLineEnd, fontWeight, foreground)	Adds a rule for comments. singleLine is a string with a regular expression which matches the single line comment. multiLineStart and multiLineEnd are regular expressions which must match the start and end of a multi-line comment. E.g. for C++ the expressions should match /* and */ respectively
addRule (expr, fontWeight, foreground)	Adds an arbitrary rule which highlights the result of the regular expression in expr
setColor(color)	Sets the text color used to display documents.

Predefined colors which can be used as foreground color:

white
black
red
darkRed
green
darkGreen
blue
darkBlue
cyan
darkCyan
magenta
darkMagenta
yellow
darkYellow
gray
darkGray
lightGray

Alternatively you can create any color by using the function rgb(r,g,b) or rgba(r,g,b,a).

Predefined font weights used as fontWeight (ordered from light to bold):

Light
Normal
DemiBold
Bold
Black
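Putting the functions, colors and font weights together, a rule file for a C-like language might look like the sketch below. The first part defines simple stand-ins for the names which CodeBeagle provides to a rule file, so that the sketch can be run on its own; a real rule file would contain only the part below the marker. The keyword list and the regular expressions are illustrative assumptions, written in a syntax which Python's re module accepts.

```python
import re

# --- Stand-ins for the environment of a rule file (hypothetical) ---
rules = []
Normal, Bold = "Normal", "Bold"
black, darkBlue, darkGreen, darkRed = "black", "darkBlue", "darkGreen", "darkRed"

def addKeywords(keywordList, fontWeight, foreground):
    rules.append(("keywords", keywordList.split(","), fontWeight, foreground))

def addCommentRule(singleLine, multiLineStart, multiLineEnd, fontWeight, foreground):
    for expr in (singleLine, multiLineStart, multiLineEnd):
        re.compile(expr)  # fail early on an invalid expression
    rules.append(("comment", (singleLine, multiLineStart, multiLineEnd),
                  fontWeight, foreground))

def addRule(expr, fontWeight, foreground):
    re.compile(expr)
    rules.append(("rule", expr, fontWeight, foreground))

def setColor(color):
    rules.append(("color", color))

# --- Content of the rule file ---
setColor(black)
addKeywords("if,else,for,while,return,class", Bold, darkBlue)
# Single line comments start with //, multi-line comments use /* and */.
addCommentRule(r"//[^\n]*", r"/\*", r"\*/", Normal, darkGreen)
# String literals in double quotes.
addRule(r'"[^"\n]*"', Normal, darkRed)

print(len(rules), "rules defined")
```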

License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Most icons were taken from the great "Crystal" icon set found at http://www.everaldo.com/crystal/.

Dark theme by Colin Duquesnoy (see GitHub)