The main toolboxes
The catalight.analysis
sub-package contains a number of helpful tools to assist with data analysis. There are 4 types of modules found within the sub-package:
Toolboxes — Compiled functions grouped by purpose to help with common tasks
Data classes — Modules containing classes to act on specific data types
Runnable scripts (prepended with “run_”) — Can either be called from another script or run as a gui in command line or editor
GUIs (appended with “gui”) — More complex scripts that should be run exclusively from command line or editor, not called.
We will discuss the contents of each type of file within this section; additional details can be found in the API documentation.
Tip
Inside the catalight GitHub repository you’ll find a folder called example_data containing a demo calibration file and two “samples” containing multiple experiment folders with data. You can use this data to test out the analysis functions within the analysis toolbox.
analysis.plotting
The plotting
module groups functions used for plotting a variety of data types and formats. The set_plot_style()
function controls the appearance of all output functions, and adjusts some plot visuals automatically based on the plot dimensions requested.
The most standard function in the module is plot_expt_summary()
which is usually the first plotting function called (in conjunction with the run_analysis()
function from the tools
module) after running an experiment. The plot_expt_summary()
function calls three separate plotting functions, plot_run_num()
, plot_ppm()
, plot_X_and_S()
which we group together because they are often called back to back. Let's step through the output of each of these to better understand how a typical analysis is done.
Note
The X tick labels will sometimes contain units to indicate to the user that the values are strings instead of floats. This should only be the case for composition sweep experiments, in which the “true” independent variable (e.g. reactant ratio vs. reactant pressure) is hard to guess automatically.
If the savedata
parameter of the plot_expt_summary()
function is entered as “True”, all three of these plots will be saved in the results subfolder of the experiment’s data folder.
The three plotting functions called by plot_expt_summary()
can be called independently through scripting as well, and additional plotting tools are available within the plotting
module. multiplot_X_and_S()
and multiplot_X_vs_S()
functions are used to plot comparisons between individual experiments and can be most easily accessed using the DataExtractor
dialog, covered in the analysis.user_inputs and Helper scripts subsections.
Many users will want to customize plot style from the default styles printed by catalight. As such, whenever catalight takes a savedata
parameter, figures are saved in a .pickle format. This allows the user to open the file as a matplotlib.pyplot.Figure
object and directly alter the plot elements. The open_pickled_fig()
function accepts the full path to a pickled figure file, shows the image, and returns figure and axis handles to be used for visual editing. When the set_plot_style()
function is called, catalight also sets plt.rcParams['svg.fonttype'] = 'none'
which allows .svg file text to be edited in vector editing software such as Inkscape. Many components of .svg files can be edited outside of Python for visual changes that can reasonably be performed on a file-by-file basis (whereas multi-file changes are better done programmatically).
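As a rough illustration of the pickle workflow described above, the sketch below saves a figure to a .pickle file and reloads it for editing, using only matplotlib and the standard pickle module. It does not call catalight's open_pickled_fig(); the file name, data, and labels are made up for the example.

```python
import os
import pickle
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Keep text as editable text (not outlines) in any .svg output.
plt.rcParams['svg.fonttype'] = 'none'

# Build a stand-in figure (hypothetical data, not catalight output).
fig, ax = plt.subplots()
ax.plot([300, 320, 340], [0.10, 0.25, 0.45], 'o-')
ax.set_xlabel('Temperature [K]')
ax.set_ylabel('Conversion')

# Save the figure object itself so it can be edited later.
pickle_path = os.path.join(tempfile.mkdtemp(), 'example_fig.pickle')
with open(pickle_path, 'wb') as f:
    pickle.dump(fig, f)
plt.close(fig)

# Reload the figure and restyle it without re-running any analysis.
with open(pickle_path, 'rb') as f:
    loaded_fig = pickle.load(f)
loaded_ax = loaded_fig.axes[0]
loaded_ax.set_xlabel('T [K]')
loaded_ax.lines[0].set_color('tab:red')

# Export an .svg whose text stays editable in a vector editor.
svg_path = pickle_path.replace('.pickle', '.svg')
loaded_fig.savefig(svg_path)
```

Pickled figures are generally only safe to reopen with the same matplotlib version that wrote them, so the .svg export is the more portable artifact for long-term storage.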
analysis.user_inputs
Where the plotting
toolbox provides many features for creating plots of experimental data through code, the user_inputs
toolbox provides tools for requesting plotting instructions from the user. This toolbox is particularly helpful to super users that would like to develop simple GUIs to aid less experienced team members with data analysis tasks. The GUIs in the analysis
subpackage use user_inputs
to request plotting instructions and plotting
to generate the plots.
Selecting data
The DirectorySelector
and DataExtractor
classes were developed to aid with the selection of data to be plotted/analyzed. DirectorySelector
is the simpler of the two, and is just a wrapper around a normal QFileDialog
. In this case the file dialog is changed from the native dialog to the Qt version so that multi-directory selection can be enabled. The advantage is that the user can select multiple folders to analyze; the disadvantage is that the dialog looks worse.
To use the user’s selection in other code, call the get_output()
method. The following example demonstrates how to open the GUI, allow the user to make a selection, and return the selection as a list called "expt_dirs"
. "expt_dirs"
can then be used in additional code to run analysis over the directories the user is interested in. Here we just print the selection.
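# Assumes the PyQt5 bindings, consistent with the exec_() call below
from PyQt5.QtWidgets import QDialog
from catalight.analysis.user_inputs import DirectorySelector

# Prompt user to select multiple experiment directories
data_dialog = DirectorySelector(starting_dir)
if data_dialog.exec_() == QDialog.Accepted:
    expt_dirs = data_dialog.get_output()
    for expt_dir in expt_dirs:
        # Run further analysis with user's GUI selection
        print('Target Directory = ')
        print(expt_dir)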
The DataExtractor
is a bit more complex than DirectorySelector
as it allows selecting specific files/folders and adding custom labels to these data sets for use in plotting.
The use of this class is very similar to DirectorySelector
. Notice that the output of the get_output()
method is now a tuple containing both a list of files and the matching data labels.
# Prompt user to select multiple experiment directories and label them
data_dialog = DataExtractor(starting_dir)
if data_dialog.exec_() == DataExtractor.Accepted:
    file_list, data_labels = data_dialog.get_output()
The get_user_inputs() function of both run_change_xdata
and run_plot_comparison
utilizes the exact code above. run_plot_chromatograms_stacked
, on the other hand, alters the init parameters when instantiating the DataExtractor
class. The parameter ‘’ (empty string) instructs the GUI to search for any file, and ‘.asc’ requires that the file have a ‘.asc’ extension. Finally, data_depth=0
instructs the dialog to return file paths from its get_output() method rather than directories. This format is used to allow the user to select individual chromatograph data files for plotting.
# Prompt user to select .asc files and label them
data_dialog = DataExtractor(starting_dir, '', '.asc', data_depth=0)
if data_dialog.exec_() == DataExtractor.Accepted:
    file_list, data_labels = data_dialog.get_output()
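To make the name/extension/depth idea concrete, here is a small stand-alone sketch of this kind of file filtering using only pathlib. The helper find_files() and its arguments are hypothetical and only mimic the behavior described above; they are not part of catalight, and the way DataExtractor searches internally (including the exact meaning of data_depth values other than 0) may differ.

```python
import tempfile
from pathlib import Path


def find_files(root, prefix='', suffix='.asc', depth=0):
    """Return paths under root whose names match prefix and suffix.

    Hypothetical helper: depth=0 returns the files themselves, while
    depth=n returns the folder n levels above each matching file.
    """
    matches = set()
    for path in Path(root).rglob('*'):
        name = path.name
        if path.is_file() and name.startswith(prefix) and name.endswith(suffix):
            matches.add(path if depth == 0 else path.parents[depth - 1])
    return sorted(matches)


# Build a throwaway folder tree to search (made-up file names).
root = Path(tempfile.mkdtemp())
(root / 'expt1' / 'Data').mkdir(parents=True)
(root / 'expt1' / 'Data' / 'FID01.asc').write_text('signal')
(root / 'expt1' / 'Data' / 'TCD01.asc').write_text('signal')
(root / 'expt1' / 'notes.txt').write_text('not a chromatogram')

asc_files = find_files(root, '', '.asc', depth=0)        # individual files
fid_folders = find_files(root, 'FID', '.asc', depth=1)   # containing folders
print([p.name for p in asc_files])
print([p.name for p in fid_folders])
```

An empty prefix matches every file name, so only the extension narrows the search, which mirrors the role of the empty string in the call above.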
Plotting instructions
Using a combination of regular file dialogs and the custom dialogs presented in the previous section, the user is now able to select many different data types in a GUI. This section describes tools built to help when the user needs to enter more information than just which data they would like to plot. Many of the functions called in the various helper tools take a number of arguments, and many of these arguments repeat across different functions. The PlotOptionsDialog
was developed to reuse as much code as possible while customizing a GUI specifically for the options required in particular functions. The PlotOptionsDialog
mixes and matches its GUI elements programmatically based on the PlotOptionList
provided to it on instantiation. The PlotOptionList
is a data class containing all of the default plot options each wrapped up in another data class called Option
.
Option
has the following fields:
value: Holds the user supplied value
include: Indicates whether or not to include GUI element
label: Text displayed in gui to represent what the value is
tooltip: Text for widget tooltip
widget: Widget used for entering option values
PlotOptionList
contains the following Options:
reactant
target_molecule
mole_bal
figsize
switch_to_hours
savedata
overwrite
basecorrect
units
plotXvsS
forcezero
xdata
plot_XandS
No changes need to be made to any default Option
within the PlotOptionList
before generating the PlotOptionsDialog
, but the change_includes()
method can be used to turn on and off specific option GUI components before instantiating the dialog. All options are turned off by default.
# Edit Options specifically for initial analysis dialog
include_dict = {'overwrite': True, 'basecorrect': True, 'reactant': True,
                'target_molecule': True, 'mole_bal': True, 'figsize': True,
                'savedata': True, 'switch_to_hours': True}
options = PlotOptionList()  # Create default gui options list
options.change_includes(include_dict)  # Modify gui components
options_dialog = PlotOptionsDialog(options)  # Build dialog w/ options
if options_dialog.exec_() == PlotOptionsDialog.Accepted:
    response_dict = options.values_todict()
Calling the values_todict()
method returns a dict
of the user's entries, which can be passed directly as kwargs
by unpacking it with the **
operator:
expt_dirs, calDF, response_dict = get_user_inputs()
main(expt_dirs, calDF, **response_dict)
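The option-list pattern can be sketched without any GUI code. The classes below are simplified, hypothetical stand-ins written for this example (MiniOptionList is not a catalight class, and the widget field is omitted); they only illustrate how per-option include flags and a values_todict()-style method feed keyword arguments into a main() function. Returning only the included options is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Option:
    """Simplified stand-in for a single plot option."""
    value: object
    include: bool = False   # options are hidden unless switched on
    label: str = ''
    tooltip: str = ''


@dataclass
class MiniOptionList:
    """Hypothetical, trimmed-down analogue of PlotOptionList."""
    savedata: Option = field(
        default_factory=lambda: Option(True, label='Save data?'))
    overwrite: Option = field(
        default_factory=lambda: Option(False, label='Overwrite?'))
    units: Option = field(
        default_factory=lambda: Option('ppm', label='Units'))

    def change_includes(self, include_dict):
        """Switch GUI elements on or off by option name."""
        for name, state in include_dict.items():
            getattr(self, name).include = state

    def values_todict(self):
        """Return {name: value} for the included options only."""
        return {name: opt.value for name, opt in vars(self).items()
                if opt.include}


def main(expt_dirs, savedata=False, overwrite=False):
    """Toy analysis entry point that just echoes its arguments."""
    return f'{len(expt_dirs)} dirs, savedata={savedata}, overwrite={overwrite}'


options = MiniOptionList()
options.change_includes({'savedata': True, 'overwrite': True})
response_dict = options.values_todict()
print(main(['expt_a', 'expt_b'], **response_dict))
```

Because the dictionary keys match the keyword names of main(), new options can be exposed in a dialog without changing how the result is passed along.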
analysis.tools
Finally, the tools
toolbox contains many helper functions that get reused repeatedly throughout the code base. If you are planning to develop new components of catalight, it is particularly advantageous for you to study these functions and utilize them wherever possible. More information about each function can be obtained from the API reference section. Here we will only highlight the most important functions for the average user.
Namely, we will skip over:
run_analysis()
is the main workhorse of the analysis
subpackage. This function takes in an Experiment object
and a calibration file (imported as a DataFrame
) and produces a 3D numpy array
of concentrations, and two 2D pandas DataFrames
(avg_conc and std_conc). If savedata
is entered as True
, the three outputs are saved in the experiment results folder (see the data structure diagram).
Once calculated, concentrations, avg, and std can always be reintroduced into the code using the load_results()
function. Generally, when the overwrite
kwarg
is accepted, this parameter determines whether run_analysis()
or load_results()
is called (for existing datasets). Many experiments can be analyzed programmatically using the catalight.analysis.run_initial_analysis
module. For all analysis types, convert_index()
can be a useful tool. For composition sweeps in particular, it is hard to define an exact X unit the user is looking for in a general and simplistic way. As such, catalight always outputs the x axis of composition sweeps as a string
depicting each individual component. The convert_index()
function and the related run_change_xdata
module allow the user to change the x data from strings to floats with the user's desired units. This is how composition sweeps can be generalized between, for example, varying the ratio between two reactants and varying the total reactant pressure.
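The idea of swapping string labels for user-chosen numbers can be sketched with plain dictionaries. The helper swap_x_labels() and the label format below are invented for illustration; convert_index() itself operates on catalight's own results and may behave differently.

```python
def swap_x_labels(results, new_x):
    """Re-key results from string labels to user-supplied floats.

    Hypothetical helper: `results` maps each condition label to a value,
    and `new_x` lists one float per label, in the same order.
    """
    if len(new_x) != len(results):
        raise ValueError('Need exactly one new x value per condition')
    return {float(x): value for x, value in zip(new_x, results.values())}


# Made-up composition sweep: labels describe each gas mixture as text.
avg_conversion = {
    '0.1C2H2_0.9H2': 0.42,
    '0.2C2H2_0.8H2': 0.35,
    '0.5C2H2_0.5H2': 0.18,
}

# The user decides the meaningful x axis is the H2:C2H2 ratio.
ratios = [9.0, 4.0, 1.0]
by_ratio = swap_x_labels(avg_conversion, ratios)
print(by_ratio)
```

The same data could just as easily be re-keyed by total reactant pressure, which is why the choice of x values is left to the user rather than guessed from the labels.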
To run analysis in the first place, a calibration must be supplied to properly convert GC counts to ppm concentrations. The analyze_cal_data()
function takes in a basic .csv file describing chemical elution times and calibration data to generate a compatible calibration file. See the calibration section for more details.
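As a generic illustration of what a counts-to-ppm calibration involves, the sketch below fits a straight line to made-up calibration points and inverts it. The linear detector response, the numbers, and the helper names are all assumptions of this example; it is not a description of how analyze_cal_data() computes or stores its calibration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up calibration sweep: known ppm of one gas vs integrated GC counts.
known_ppm = [0.0, 250.0, 500.0, 1000.0]
peak_counts = [2.0, 52.0, 102.0, 202.0]

slope, intercept = fit_line(known_ppm, peak_counts)  # counts per ppm


def counts_to_ppm(counts):
    """Invert the fitted line to turn counts into a concentration."""
    return (counts - intercept) / slope


print(round(counts_to_ppm(152.0), 1))
```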
Ultimately most experiments will seek to convert some molecular concentration measurements into catalytic measurements. Thus far catalight is able to calculate conversion (X) and chemical selectivity (S). This is done by passing an Experiment object
and Chem IDs indicating a reactant and a target chemical to the calculate_X_and_S()
function. Conversion, selectivity, and their respective errors are then calculated according to the following equations:
Note
Before running the calculate_X_and_S()
function, run_analysis()
should be called on the desired experiment to generate the concentrations, avg, and std results files. These files are then imported for the final calculations.
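For readers who want a feel for the arithmetic, the sketch below computes conversion and selectivity from a set of average concentrations under one common set of definitions: X = 1 - C_reactant / C_total and S = C_target / (C_total - C_reactant), assuming every product molecule forms from exactly one reactant molecule. These definitions, the function name, and the numbers are assumptions of this example and may not match the formulas or error propagation used by calculate_X_and_S().

```python
def conversion_and_selectivity(conc, reactant, target):
    """Return (X, S) as fractions from a dict of concentrations.

    Assumed definitions (may differ from catalight's):
    X = 1 - C_reactant / C_total
    S = C_target / (C_total - C_reactant)
    """
    c_total = sum(conc.values())
    c_reactant = conc[reactant]
    converted = c_total - c_reactant
    x = converted / c_total
    s = conc[target] / converted if converted > 0 else float('nan')
    return x, s


# Made-up average outlet concentrations in ppm for one condition.
avg_ppm = {'c2h2': 600.0, 'c2h4': 300.0, 'c2h6': 100.0}
X, S = conversion_and_selectivity(avg_ppm, reactant='c2h2', target='c2h4')
print(f'X = {X:.2f}, S = {S:.2f}')
```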
The GCData class
A central idea underpinning the analysis
module is the GCData
class. All imported .asc files from the GC output are loaded as an instance of the GCData
class and relevant data actions (such as peak integration and conversion to ppm concentrations) are carried out by class methods. The goal of the project is to eventually create ubiquitous analysis tools that can interact with a number of different data types. For example, we currently have the GCData class but may one day introduce an FTIRData class. The goal would be to apply this concept wherever possible and build tools such as those found in the other sections of this guide to work as seamlessly with other data types as is reasonable.
For the time being, the only data type supported by catalight is GCData, whose main behavior includes extracting data from .asc files, finding peaks, integrating peaks, and converting those peaks to concentrations given some calibration data. All actions logically taken on a single data file are performed within this class. All actions which combine data from multiple files are performed elsewhere (e.g. run_analysis()
).
Usage of GCData is straightforward on the frontend. An instance of GCData is created by passing a path to an .asc file output by the GC and indicating whether base correction is wanted. Some processing is run in the background, making data available as instance attributes of the new data
object. Converting to concentrations is done by simply calling get_concentrations()
.
data = GCData(filepath, basecorrect=True)
values = data.get_concentrations(calDF)
Creating the GCData instance runs the following steps in the background:
1. Pull data from the .asc file. Returns timestamps and a DataFrame with signal vs. elution time.
2. Find peaks.
3. Find integration bounds of peaks.
Calling get_concentrations() then performs the following:
1. get_concentrations() is the main method called.
2. Integrate the peaks based on the computed integration bounds.
3. Convert integrated peak counts to ppm based on calibration data for each molecule.
Lastly, the GCData method
plot_integration()
is helpful for evaluating and troubleshooting integration performance. This method plots the integration bounds and peak apexes to give a better view of how the data is being processed.
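The find-peaks, find-bounds, integrate sequence can be illustrated with a few lines of plain Python. This is a deliberately naive sketch on made-up data: the functions, the fixed baseline, and the thresholds are assumptions of the example, and GCData's own peak detection and baseline handling are likely more sophisticated.

```python
def find_apexes(signal, min_height):
    """Indices of local maxima that rise above min_height."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] >= min_height
            and signal[i - 1] < signal[i] >= signal[i + 1]]


def integration_bounds(signal, apex, baseline):
    """Walk outward from the apex until the signal returns to baseline."""
    left = apex
    while left > 0 and signal[left] > baseline:
        left -= 1
    right = apex
    while right < len(signal) - 1 and signal[right] > baseline:
        right += 1
    return left, right


def integrate(times, signal, left, right):
    """Trapezoidal area under the signal between two indices."""
    return sum((times[i + 1] - times[i]) * (signal[i] + signal[i + 1]) / 2
               for i in range(left, right))


# Made-up chromatogram: time in minutes, detector signal in counts.
times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
signal = [0.0, 0.0, 2.0, 6.0, 2.0, 0.0, 0.0, 3.0, 0.0]

for apex in find_apexes(signal, min_height=1.0):
    left, right = integration_bounds(signal, apex, baseline=0.0)
    area = integrate(times, signal, left, right)
    print(f'apex at {times[apex]} min, '
          f'bounds {times[left]}-{times[right]} min, area {area:.2f}')
```

Plotting the apex and bounds on top of the raw signal, as plot_integration() does, is the quickest way to spot bounds that cut a peak short or merge two neighbors.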
Helper scripts
A number of executable scripts have been written to perform basic data analysis with graphical user inputs. Files prefixed with “run_” can be executed from the command line, and UI prompts will guide the user through the respective analysis. Alternatively, all of these files can be called from separate, user-created scripts without executing the file entirely. Each “run_” file in the analysis subpackage contains two functions: get_user_inputs() and main(). get_user_inputs() opens UI dialogs to collect the values needed to run the analysis, which makes data processing as simple as possible for users without coding experience. main() is where the actual analysis is performed. The main() functions typically have a large number of arguments, which may seem intimidating at first. This is mainly to increase flexibility, and many of these arguments can be left at their default values. If a user would like to run analysis in a scripted fashion, calling analysis.run_<filename>.main() with the desired arguments is a completely acceptable method. Of course, the user can bypass these helper functions altogether for even more flexible data analysis options.
run_calibration | Provide prompts to the user and call functions to run calibration analysis |
run_change_xdata | Swap the x axis values and units with a user-entered array |
run_initial_analysis | Run initial analysis on all experiments within the main folder provided by the user |
run_plot_chromatograms_stacked | Plot any number of chromatogram files (‘.asc’) on a single set of axes |
run_plot_comparison | Provide prompts to the user to plot multiple experiments either on a single plot with conversion and selectivity as axes or on two plots with conversion or selectivity plotted as a function of a shared independent variable (temperature, power, etc.) |
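The two-function layout shared by the “run_” files can be sketched as follows. This is a generic skeleton with placeholder logic and made-up folder names, not a copy of any catalight script; in particular, get_user_inputs() here returns fixed values instead of opening dialogs so that the example runs without a GUI.

```python
def get_user_inputs():
    """Collect analysis settings.

    A real helper script would open dialogs here; this sketch returns
    fixed, made-up values so it runs without a GUI.
    """
    expt_dirs = ['sample1/expt_a', 'sample1/expt_b']
    options = {'savedata': False, 'units': 'ppm'}
    return expt_dirs, options


def main(expt_dirs, savedata=True, units='ppm', figsize=(6.5, 4.5)):
    """Run the (placeholder) analysis on every experiment folder."""
    report = []
    for expt_dir in expt_dirs:
        report.append(f'analyzed {expt_dir} [{units}], saved={savedata}')
    return report


if __name__ == '__main__':
    # Interactive use: gather inputs, then hand them to main().
    dirs, opts = get_user_inputs()
    for line in main(dirs, **opts):
        print(line)

# Scripted use: skip the prompts and call main() directly,
# relying on defaults for anything not specified.
scripted_report = main(['sample2/expt_c'], units='%')
```

Keeping all prompting inside get_user_inputs() means main() never depends on a GUI, which is what makes the scripted call at the bottom possible.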
Finally, the chromatogram_scanner_gui
module provides a comprehensive GUI for scanning through collected GC data. The behavior of this feature is a bit more complicated than the UIs described above, so it is intended to be used only as a GUI. After the user selects a main directory in a basic file dialog, a window appears showing all .asc files contained within. The user can also toggle between FID and TCD files (more options can be added). An experimental feature for supplying custom file extensions is included, but it is not fully supported at the moment.
Note
The plot integration bounds feature is in development and will be included in an upcoming release.
Running a calibration
Within catalight, calibrations are handled using external csv files. These are imported as a pandas.DataFrame
, usually referred to in the code as “calDF”. We primarily handle calibrations and integration outside of peaksimple to offer more control over the process and automation of analysis. For users that would prefer to utilize peaksimple for calibration, the results files output from peaksimple are saved in the same location as the ascii files.
Calibrations can be performed by flowing a calibration standard gas mixture through one of the system's mass flow controllers. The user can perform a composition sweep using either the GUI or scripting and then utilize catalight.analysis.tools.analyze_cal_data()
to analyze the collected data. The catalight.analysis.run_calibration
module includes a GUI to help with this process. The Experiment class also contains a calibration experiment type, as seen in its expt_type
attribute. This is essentially the same as a composition sweep, but uses different naming conventions, warns GUI users to select a calibration file, and may be outfitted with additional functionality in later versions. The Gas_System
class provides a set_calibration_gas()
method to build a new custom mixture to control MFC flow with high precision. This method is utilized in the GUI, but needs to be called separately if scripting.
In addition to performing the physical calibration experiment, the user needs to provide a calibration file describing the input gas. More information about the calibration file can be found in the Calibration File Details section.