Search Help

In this Pathways application, a number of search options are available. You can perform simple text-based searches, but you can also perform searches guided by the context of the documents, based on the way each document has been tagged. Text search is built on the open source search engine Apache Lucene and brings into play many of the options that Lucene provides.

Search Options

There are two basic ways in which you can search the documents: by text or by context. Searching the text returns all occurrences of a given word or phrase in the collection. Searching by context returns all of the documents carrying the chosen tag, whether or not the specific word or phrase is mentioned within the document. So, if you choose "broad", which is a sub-topic of "broadland", the search engine will return every document which talks of the broads, even if it doesn’t mention the word "broad"!

To toggle between the two search modes, click on the ‘Pathways’ menu in the header, then select ‘List Texts and Text Search’ or ‘Context Search’, respectively.

Text Search

List Texts and Text Search, as the name suggests, gives the complete list of texts for the Pathways project and provides a basic text search function.

You can simply scroll down the page and click on the title of a text that looks appealing. A new tab will open and display the text in question.

Alternatively, you can search within the texts. The simplest way of doing this is to type your search term into the text box. Click the blue button, and a tab will open displaying your search results. You can choose which texts you want to search by checking their boxes. Note that if you do not make any choice, you will search in all available texts, so there is no need to check every box in order to perform a general search.

The Pathways database contains a number of different kinds of texts: anthropological research, teaching materials, stories, interviews, speeches and more. The Pathways application lets you restrict your search to one or more of these document types. Again, if you do not make any choice, the search will cover all available text types.

You also have a choice of search modes. These are explained in more detail below, but the most common ones are "Any Search Term" and "All Search Terms". The first is the default, so if you do not make a choice, you will automatically search for any search term. This means that a text counts as a hit if at least one of the words you entered is present, in whatever combination. If you search for all search terms, a text counts as a hit only if every word you entered is present.

When searching for books in a library catalogue, one can usually choose to have the hits displayed according to relevance or according to author, title or suchlike. In the Pathways application, hits are displayed only by relevance, according to a "score" computed for each hit. In essence, the more often your search terms occur within your search scope, and the less common they are in the index (that is, in the Pathways files), the higher the score a hit will get and the more prominently it will be displayed.
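
The idea behind the score can be sketched in a few lines of Python. This is only an illustration of the general principle (frequent in the text, rare in the collection), not the formula the Pathways application or Lucene actually uses; the function name, the weighting and the sample texts are all assumptions made for the example.

```python
import math

def score(query_terms, doc_tokens, all_docs):
    """Toy relevance score: occurrences in the text, weighted by rarity."""
    total = 0.0
    for term in query_terms:
        # How often the term occurs in this text.
        tf = doc_tokens.count(term)
        # In how many texts of the collection the term occurs at all.
        df = sum(1 for d in all_docs if term in d)
        # Rarer terms get a larger weight (assumed smoothing, for illustration).
        idf = math.log((1 + len(all_docs)) / (1 + df)) + 1
        total += tf * idf
    return total

# Three hypothetical, already tokenized texts.
docs = [
    ["the", "fen", "church", "stands", "by", "the", "fen"],
    ["the", "church", "bell"],
    ["the", "broad", "and", "the", "reed"],
]

# Rank the texts for the query "fen church", best score first.
ranked = sorted(
    range(len(docs)),
    key=lambda i: score(["fen", "church"], docs[i], docs),
    reverse=True,
)
```

In this sketch the first text ranks highest because it contains both terms and mentions the rarer one ("fen") twice.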

This may seem obvious, but you search for words: spaces and punctuation cannot be searched for, only the words themselves. All words are also converted to lower-case, so there is no way to differentiate between "Broad" (the landscape type) and "broad" (the adjective). The same happens to your search terms: they are converted to lower-case and all punctuation is removed. One may ask: if spaces cannot be searched for, what are phrase searches? Don't phrases consist of words with spaces and punctuation in between? Yes, but searching for a phrase here is not the same as searching for a phrase in a word processing document. With Lucene, you actually search for a sequence of words which have no other words in between them, so everything is about words after all (and a phrase search is actually a proximity search).
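
As a rough illustration of this step, the following Python sketch lower-cases a piece of text and keeps only the words. The function name and the exact rule for what counts as a word are assumptions; the analyzer used by the Pathways application may split text differently.

```python
import re

def tokenize(text):
    """Lower-case the text and return its words, dropping spaces and punctuation."""
    # [^\W_]+ matches runs of letters and digits (no punctuation, no underscore).
    return re.findall(r"[^\W_]+", text.lower())

words = tokenize("The Broad, so broad!")
```

Both "Broad" and "broad" end up as the same word, which is why the two cannot be told apart in a search.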

Search Types

Simple Search

If you select "Any Search Term" and fill in some words in the query field, you are saying that you would like to see as many of the words as possible within the search scope, but that a text with only one of them present should also be displayed as a hit.

If you select "All Search Terms" and fill in some words in the query field, you are saying that you want to see all of the words in the hits within the search scope – if just one of the words is missing, you do not want to have it displayed as a hit.
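
The difference between the two modes can be sketched as follows. The function names are made up for this example, and the texts are assumed to be already split into lower-case words.

```python
def matches_any(query_terms, doc_tokens):
    """'Any Search Term': at least one of the words must be present."""
    tokens = set(doc_tokens)
    return any(term in tokens for term in query_terms)

def matches_all(query_terms, doc_tokens):
    """'All Search Terms': every one of the words must be present."""
    tokens = set(doc_tokens)
    return all(term in tokens for term in query_terms)

text = ["the", "old", "church", "by", "the", "river"]
any_hit = matches_any(["fen", "church"], text)   # "church" is present
all_hit = matches_all(["fen", "church"], text)   # "fen" is missing
```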

Phrase Search

If you select "Phrase Search" and fill in some words in the query field, you are saying that you want to see all of the words in the hits within the search scope, but only if they occur in the same sequence. This is the way searches are performed in word processing documents, except that here punctuation is disregarded.
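
A minimal sketch of this behaviour, assuming the text has already been reduced to a list of lower-case words (the function name is hypothetical):

```python
def contains_phrase(phrase_tokens, doc_tokens):
    """True if the words of the phrase occur next to each other, in order."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    # Slide a window of n words over the text and compare it with the phrase.
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )

text = ["beyond", "the", "fen", "church", "lay", "the", "broad"]
in_order = contains_phrase(["fen", "church"], text)
reversed_order = contains_phrase(["church", "fen"], text)
```

Because punctuation has already been dropped, "fen, church" and "fen church" in the original text would both match the phrase.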

Proximity Search

If you select one of the two "Proximity Search" options and fill in some words in the query field, you are saying that you want to see all of the words in the hits within the search scope, in the order specified or not, and within a certain proximity. The proximity is stated in terms of maximum number of words allowed in between the words you enter in your query.
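
For two search words, the proximity test could be sketched like this. The function and parameter names are assumptions, and the sketch is limited to two words for simplicity; how the application handles longer queries is not covered here.

```python
def within_proximity(first, second, doc_tokens, max_gap, ordered=True):
    """True if both words occur with at most max_gap words in between.

    With ordered=True, `first` must come before `second`.
    """
    positions_first = [i for i, t in enumerate(doc_tokens) if t == first]
    positions_second = [i for i, t in enumerate(doc_tokens) if t == second]
    for a in positions_first:
        for b in positions_second:
            if a == b:
                continue
            gap = abs(b - a) - 1  # number of words between the two
            if gap <= max_gap and (b > a or not ordered):
                return True
    return False

text = ["the", "church", "near", "the", "old", "fen"]
close_enough = within_proximity("church", "fen", text, max_gap=3)
too_far = within_proximity("church", "fen", text, max_gap=2)
```

Here three words ("near the old") stand between "church" and "fen", so the first call finds a hit and the second does not.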

"Fuzzy Search" may be less obvious. If you take a word, like "snake", you can make changes and additions to it. One change would thus give you "spake", "slave", "snare" and "stake". If you make one more change based on this, you can easily see that a lot of words can be generated. Since this search is very time-consuming, the maximum number of "edits" you can make is 2. If you do not enter any digit, 2 is also assumed. Fuzzy search demands so many resources that only one term can be searched at a time, so all words after the first will be removed from your query. "Wildcard Search" offers the possibility to search using ? for a single character and * for zero, one or more characters. You would retrieve hits with "shake", "spake", "stake" and so on if using "s?ake", and "church", "churches" and “churchyard” with "church*". "*ling" will give you "telling", "trembling", "brawling" and so on, "te??ing" will give you "telling", "teeming", "tending" and so on. This offers some of the functionality of a regular expression search, but be aware that the symbols ? and * have different meanings in wildcard and regex search.

Standard Lucene Syntax

With Lucene standard syntax, there are two ways you can go: you can either prefix words with + and - or use Boolean logic with AND, OR and NOT (written in upper-case). In both cases, you can additionally group your search expressions using parentheses. In case you use any of these operators (or any operators used in regex searches), the search mode will automatically be set to Any Search Term (or to Regex Search if this applies), so choosing any of the other options has no effect.

The first option (using + and -) is better suited to a search which orders hits according to score. Here you let words stand as they are (without + or -) if you would like them to occur in hits, but you prefix them with + if they must occur in a hit and with - if they must not occur in a hit. If you search for "fen church" you get a lot of hits with either "fen" or "church" and some with both. If you search for "church +fen", all your hits will contain "fen", but they may or may not contain "church". If you search for "church -fen", you get hits with "church", but only if they do not contain "fen".
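
The rules for + and - can be sketched as a small hit test. This only mirrors the behaviour described above for plain words; the function names are assumptions, and scoring, phrases and parentheses are left out.

```python
def parse_prefixed(query):
    """Split a query into required (+), prohibited (-) and optional words."""
    required, prohibited, optional = set(), set(), set()
    for word in query.lower().split():
        if word.startswith("+") and len(word) > 1:
            required.add(word[1:])
        elif word.startswith("-") and len(word) > 1:
            prohibited.add(word[1:])
        else:
            optional.add(word)
    return required, prohibited, optional

def is_hit(query, doc_tokens):
    """Apply the +/- rules to one text given as a list of words."""
    required, prohibited, optional = parse_prefixed(query)
    tokens = set(doc_tokens)
    if prohibited & tokens:
        return False               # a word marked with - is present
    if required:
        return required <= tokens  # every word marked with + must be present
    return bool(optional & tokens) # otherwise any plain word is enough

fen_only = ["across", "the", "fen"]
plus_hit = is_hit("church +fen", fen_only)
minus_hit = is_hit("church -fen", ["church", "by", "the", "fen"])
```

In the real search, optional words that do occur raise the score of a hit; this sketch only decides whether a text is a hit at all.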

If you use AND, OR and NOT, the logic is rather different. If you search for "church AND fen" you get hits with both "church" and "fen" and none with only one of them. This corresponds to "+church +fen". If you search for "church OR fen", this is the same as simply searching for "church fen". If you search for "church NOT fen", this equals searching for "church -fen".

Searches can become more complex through the use of parentheses. Here the use of AND, OR and NOT may come more naturally. Say you want to find passages where the word "fen" occurs but where at least one of the words "church", "broad", "marsh", "reed" or "bird" also occurs. You can express this by "(church OR broad OR marsh OR reed OR bird) AND fen". An AND enforces "must occur" on both sides, so both one of the bracketed search terms and the word "fen" have to occur in the hits. Say (for some reason) you do not wish the words "coast" and "sea" to occur in your hits. You then extend your search expression with "NOT (coast OR sea)", giving "(church OR broad OR marsh OR reed OR bird) AND fen NOT (coast OR sea)".

If you simply search for "sea OR reed AND fen", you will (because AND has a higher order of precedence) search for passages where "reed" and "fen" must occur, but where you would also like "sea" to be marked as a hit. You can enforce a certain logic on your query by grouping with parentheses.

If you search for "(church OR reed) AND fen" you are saying that one or both of "church" and "reed" must occur, as must "fen".

If you search for "church OR (reed AND fen)", you would like to retrieve hits where "church" occurs and you would like to retrieve hits where "reed" and "fen" go together. In practice this means that you will get a lot of "church"-only hits. You can also nest parentheses, e.g. "(church OR (reed AND fen)) NOT sea" will remove the hits with "sea" from "church OR (reed AND fen").
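
The effect of grouping can be illustrated with Python sets, treating each word as the set of texts it occurs in and reading OR as union, AND as intersection and NOT as difference. The text numbers below are invented, and this strict set model ignores scoring, so it is only an approximation of how the search engine treats these operators.

```python
# Hypothetical index: for each word, the numbers of the texts containing it.
index = {
    "church": {1, 2, 3},
    "reed": {3, 4},
    "fen": {2, 4, 5},
    "sea": {1, 4},
}

# (church OR reed) AND fen
grouped = (index["church"] | index["reed"]) & index["fen"]

# church OR (reed AND fen)
alternative = index["church"] | (index["reed"] & index["fen"])

# (church OR (reed AND fen)) NOT sea
nested = alternative - index["sea"]
```

The two groupings give different results for the same three words: the first keeps only texts that contain "fen", while the second also lets through every text that merely contains "church".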

As you can see, the options are many. And as if this were not enough, there is also regex, and regex syntax combined with standard syntax!

Context Search

The Pathways database lets you do more than simply search the files for strings of text. The files are not just strings of formatted text, but are marked up according to the context of the file. So, the search function lets you search for any instance of ‘folklore’ in the files, even when the term ‘folklore’ isn’t mentioned within the text.

To access the Context Search, simply click on the ‘Pathways’ menu and choose ‘Context Search’. You will then be presented with a series of headings with a ‘plus’ button on the side. Clicking on any of the headings will present you with a list of texts that are tagged with that word. So, click on ‘paranormal’, and you will get a list of texts which are tagged accordingly. Alternatively, if you want to look for a more specific heading under ‘paranormal’, click on the plus button, and it will show you its various sub-headings, e.g. ‘time-travel’, ‘spirits’ and ‘ghost stories’. So, if you are concerned with ghosts, but not with other aspects of the paranormal, you can restrict your search accordingly.

Some terms may be unfamiliar, or may have a particular academic meaning. So, you may be wondering what the difference is between ‘mere’ and ‘mire’, or you may wonder exactly what we mean by ‘oral history’. In that case, simply hover the mouse over the text, and a brief explanation will appear as a pop-up.

The headings may appear to lack some logic or flow, and some are very broad: ‘activities’, for instance, encompasses everything from hunting and farming to play and sport. ‘Affect’, similarly, encompasses not only the basic emotions of happiness, love and anger, but also emotions that a place can evoke: a sense of privacy, a sense of safety, or a spiritual connection.

Context search doesn’t let you search for the text that a document contains, but for its significance. It’s not always explicitly mentioned in a text what emotions an environment evokes, or that a piece of text is a piece of family history; the context search lets you find text which relates to a topic but doesn't explicitly mention it.
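
The principle of searching by tag rather than by wording can be sketched as follows. The heading tree and text names are invented for the example, and whether choosing a heading also returns texts tagged only with one of its sub-headings is an assumption, controlled here by the hypothetical include_subheadings switch.

```python
# Hypothetical heading tree: each heading maps to its sub-headings.
TAG_TREE = {
    "paranormal": ["time-travel", "spirits", "ghost stories"],
}

# Hypothetical texts and the headings they have been tagged with.
DOC_TAGS = {
    "text-a": {"ghost stories"},
    "text-b": {"paranormal"},
    "text-c": {"oral history"},
}

def headings_under(tag, tree):
    """Return the heading itself plus all of its sub-headings."""
    found = {tag}
    for child in tree.get(tag, []):
        found |= headings_under(child, tree)
    return found

def context_search(tag, tree, doc_tags, include_subheadings=True):
    """List the texts tagged with the heading, whatever words they contain."""
    wanted = headings_under(tag, tree) if include_subheadings else {tag}
    return sorted(doc for doc, tags in doc_tags.items() if tags & wanted)

paranormal_texts = context_search("paranormal", TAG_TREE, DOC_TAGS)
ghost_texts = context_search("ghost stories", TAG_TREE, DOC_TAGS)
```

Note that the words of the texts never enter into the lookup: only the tags decide what is returned.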

Regular Expression (RegEx) Search

A regular expression (RegEx) is a special text string for describing a search pattern. The wildcard search is a much simplified form of RegEx. There are whole books devoted to the use of RegEx, so it would not be realistic to give a comprehensive tutorial here. The following two websites provide introductions and information on RegEx tools, in case you are not familiar with RegEx and want to test out searches:

http://www.regular-expressions.info
https://www.regexbuddy.com/regex.html