Citation Galaxies is a web-based tool that supports bibliometricians in their work. Our stakeholder’s work centers on how, where, and when citations occur in academic texts. Their research is key to understanding where funding should be distributed to have the most impact. This is especially relevant in Canada because the government uses research funding as an economic stimulus.
The tool draws on two datasets, PubMed and Érudit. Each text is indexed by two measures: the distance of each word from the nearest citation counted in words, and the same distance counted in sentences. This makes it possible to investigate the context in which citations occur. The goal is to let users first explore citation contexts to get a sense of how the corpus is built, and then create rule sets that can be used to process large document collections.
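The two distance measures can be illustrated with a minimal Python sketch. It is not the tool's actual implementation: the `[CIT]` placeholder token, the pre-tokenized input, and the function names are all assumptions made for this example, and word distance is counted here in token positions.

```python
from bisect import bisect_left

CITATION = "[CIT]"  # hypothetical placeholder token marking where a citation occurs


def nearest_distance(position, sorted_positions):
    """Return the distance from position to the closest entry in sorted_positions."""
    if not sorted_positions:
        return None  # no citation anywhere in the text
    i = bisect_left(sorted_positions, position)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_positions[max(i - 1, 0):i + 1]
    return min(abs(position - p) for p in candidates)


def citation_distances(sentences):
    """sentences: list of token lists.

    Returns one (word, word_distance, sentence_distance) tuple per non-citation token.
    """
    tokens = [(tok, s_idx) for s_idx, sent in enumerate(sentences) for tok in sent]
    cite_word_pos = [i for i, (tok, _) in enumerate(tokens) if tok == CITATION]
    cite_sent_pos = sorted({s for tok, s in tokens if tok == CITATION})

    result = []
    for i, (tok, s_idx) in enumerate(tokens):
        if tok == CITATION:
            continue
        result.append((
            tok,
            nearest_distance(i, cite_word_pos),
            nearest_distance(s_idx, cite_sent_pos),
        ))
    return result
```

A word in the same sentence as a citation gets a sentence distance of 0, and a word in the following sentence gets 1.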
The rules page lets users create compound rules that find the specific places in the text where language patterns they define occur. For example, a user could create a simple rule to find all instances where the words “cancer” and “heart” both occur within 2 sentences of a citation. The rules page supports several logical operators such as NOT, OR, and AND. These operators let users combine rules into complex sets that pinpoint exactly the instances they are looking for. Users can create multiple rule sets to search the database with. Each rule set is assigned a color that shows which rule set is triggered in a given area of the text.
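One way to picture a compound rule is as a small expression tree evaluated against the words found near a citation. The Python sketch below is illustrative only: the class names (`Term`, `Not`, `And`, `Or`), the `matches` and `context_words` helpers, and the `(word, sentence_distance)` record shape are assumptions for this example, not the tool's actual data model.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Term:
    word: str


@dataclass(frozen=True)
class Not:
    rule: "Rule"


@dataclass(frozen=True)
class And:
    left: "Rule"
    right: "Rule"


@dataclass(frozen=True)
class Or:
    left: "Rule"
    right: "Rule"


Rule = Union[Term, Not, And, Or]


def context_words(records, max_sentence_distance):
    """records: iterable of (word, sentence_distance) pairs for one citation.

    Returns the lowercased words within the given sentence distance.
    """
    return {w.lower() for w, d in records if d <= max_sentence_distance}


def matches(rule, words):
    """Recursively evaluate a rule tree against a set of lowercased words."""
    if isinstance(rule, Term):
        return rule.word.lower() in words
    if isinstance(rule, Not):
        return not matches(rule.rule, words)
    if isinstance(rule, And):
        return matches(rule.left, words) and matches(rule.right, words)
    if isinstance(rule, Or):
        return matches(rule.left, words) or matches(rule.right, words)
    raise TypeError(f"unsupported rule type: {type(rule).__name__}")
```

Under these assumptions, the “cancer” and “heart” example from the text would be written as `matches(And(Term("cancer"), Term("heart")), context_words(records, 2))`.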
The home page is a visualization showing how many rules are triggered in each area of the text. For example, it could show that one of your rules triggers only in the first 15% of sentences in academic texts. The texts are organized into columns by year of publication, so users can start to see how rule matches change over time.
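The aggregation behind such a view can be sketched as counting rule matches per publication year and per relative position in the text. This Python example is a rough illustration: the `bin_hits` name, the `(year, sentence_index, total_sentences)` input shape, and the default of 10 bins are assumptions, not details taken from the tool.

```python
from collections import Counter


def bin_hits(hits, n_bins=10):
    """Count rule matches per (year, position bin).

    hits: iterable of (year, sentence_index, total_sentences), with a
    zero-based sentence_index. Bin 0 covers the start of a paper and
    bin n_bins - 1 covers its end.
    """
    counts = Counter()
    for year, sentence_index, total_sentences in hits:
        if total_sentences <= 0:
            continue  # skip empty or malformed papers
        # Integer arithmetic maps the relative position onto a bin index.
        position_bin = min(sentence_index * n_bins // total_sentences, n_bins - 1)
        counts[(year, position_bin)] += 1
    return counts
```

Each year then corresponds to one column of the visualization, and each bin to one cell within that column.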
The paper view page gives a more detailed view of where the rules occur in specific papers. Users can analyze the text surrounding each match.
Lastly, the export page lets users export the specific instances that triggered the rules, so they can perform manual analysis on a more refined dataset. Citation Galaxies gives bibliometricians a new way to investigate citation context: they can use their expertise to build rule sets quickly, apply those rule sets to parse large document corpora, and then export their findings for the statistical analyses that are part of their normal workflow. In this way, users can get from idea to dataset within a short time frame with no coding required.
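As a rough illustration of what an export step might produce, the Python sketch below serializes matched instances to CSV text. The export format is not specified here, so the CSV layout, the `export_rows` name, and the column names (`paper_id`, `year`, `rule_set`, `context`) are all assumptions made for this example.

```python
import csv
import io


def export_rows(rows, fieldnames=("paper_id", "year", "rule_set", "context")):
    """Serialize matched instances (a list of dicts) to a CSV string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()
```

A file in this shape can be loaded directly into common statistics packages for further analysis.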