catalight.analysis.gcdata.GCData

class catalight.analysis.gcdata.GCData(filepath, basecorrect=False)

Bases: object

A class for processing/analyzing gas chromatograph data from an ASCII file.

filepath

filepath where data is stored (.ASC file)

Type:

(str)

timestamp

time since the epoch in seconds

Type:

(float)

rawdata

contains time and signal values from GC run

Type:

pandas.DataFrame

time

np array version of time axis from rawdata (in minutes)

Type:

numpy.ndarray

signal

np array version of signal axis from rawdata. If basecorrect=True, it will be baseline corrected using baseline_correction()

Type:

numpy.ndarray

apex_ind

indices of all peaks identified by apex_inds()

Type:

numpy.ndarray

numpeaks

length of apex_ind

Type:

int

lind

indices of leftmost bound for integration for each peak identified

Type:

numpy.ndarray

rind

indices of rightmost bound for integration for each peak identified

Type:

numpy.ndarray

__init__(filepath, basecorrect=False)

Initialize the class with the attributes filename and data.

The data is read from the ASCII file and stored as a pandas DataFrame.

Parameters:
  • filepath (str) – full path to the ASCII file

  • basecorrect (bool) – set to True to apply baseline correction; defaults to False

Methods

__init__(filepath[, basecorrect])

Initialize the class with the attributes filename and data.

apex_inds()

Use scipy.signal.find_peaks to find peaks in signal.

baseline_correction()

Get self.signal, output signal with background subtraction.

convert_to_ppm(calDF, counts, chemID)

Convert integrated raw counts into ppm based on calibration data.

get_concentrations(calDF)

Return a Pandas series of chemical concentrations.

get_run_number()

Determine run number from filename.

getrawdata()

Read data from ASCII, returns pandas dataframe.

integrate_peak()

Find the area under the peak using a trapezoidal method.

integration_inds()

Find bounds of integration for apexes.

plot_integration()

Plot the chromatogram with the peaks and indices indicated.

apex_inds()

Use scipy.signal.find_peaks to find peaks in signal.

Returns:

All peak locations (as integer indices NOT times)

Return type:

numpy.ndarray
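
As a rough illustration of the kind of call apex_inds() wraps, the sketch below runs scipy.signal.find_peaks on a synthetic chromatogram. The data and the prominence threshold are assumptions chosen for the example, not the settings GCData uses.

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic chromatogram: two Gaussian peaks on a flat baseline (illustrative data only).
time = np.linspace(0.0, 10.0, 1001)  # elution time in minutes
signal = (
    5.0 * np.exp(-((time - 3.0) ** 2) / (2 * 0.1 ** 2))
    + 2.0 * np.exp(-((time - 7.0) ** 2) / (2 * 0.15 ** 2))
)

# The prominence threshold is an assumed value, not necessarily what apex_inds() uses.
apex_ind, _ = find_peaks(signal, prominence=0.5)

# find_peaks returns integer indices; index the time axis to get retention times.
apex_times = time[apex_ind]
```

Because the return value holds indices rather than times, it can be used directly to index both the time and signal arrays.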

baseline_correction()

Get self.signal, output signal with background subtraction.

Uses tophat filter, based on PyMassSpec/pyms/TopHat.py

Returns:

GC signal with baseline corrected

Return type:

numpy.ndarray
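
One way to apply a top-hat filter to a 1D signal is scipy.ndimage.white_tophat, sketched below on synthetic data. The drift model and the structuring-element width are assumptions for illustration; the actual implementation in baseline_correction() may choose them differently.

```python
import numpy as np
from scipy.ndimage import white_tophat

# Synthetic data: one narrow peak on top of a slow linear drift (illustrative only).
time = np.linspace(0.0, 10.0, 1001)  # elution time in minutes
peak = 5.0 * np.exp(-((time - 5.0) ** 2) / (2 * 0.1 ** 2))
drift = 1.0 + 0.2 * time
signal = peak + drift

# White top-hat = signal minus its morphological opening.
# The window width (in points) is an assumed value; it must be wider than
# the widest peak so that the opening follows the drift but not the peaks.
corrected = white_tophat(signal, size=201)
```

The window width is the main tuning choice: too narrow and real peaks are flattened, too wide and curved baselines are tracked poorly.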

convert_to_ppm(calDF, counts, chemID)

Convert integrated raw counts into ppm based on calibration data.

Parameters:
  • calDF (pandas.DataFrame) – Calibration values by chemical ID

  • counts (float) – Raw integrated counts determined by integrate_peak()

  • chemID (str) – Contains name of chemical from calDF to be converted to ppm

Returns:

Counts for a single peak converted into ppm based on calDF

Return type:

float
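
The conversion can be pictured as a lookup in the calibration table followed by a linear fit evaluation. The sketch below is a standalone illustration: the function name counts_to_ppm, the "slope" and "intercept" column names, and the chemical IDs are all hypothetical, and the real calDF layout may differ.

```python
import pandas as pd


def counts_to_ppm(cal_df, counts, chem_id):
    """Convert raw counts to ppm with a linear calibration (hypothetical layout)."""
    # Assumed columns: one linear calibration per chemical, indexed by chemical ID.
    slope = cal_df.loc[chem_id, "slope"]
    intercept = cal_df.loc[chem_id, "intercept"]
    return float(slope * counts + intercept)


# Made-up calibration values, for illustration only.
cal_df = pd.DataFrame(
    {"slope": [0.5, 2.0], "intercept": [0.0, 1.0]},
    index=["c2h4", "c2h2"],
)

ppm = counts_to_ppm(cal_df, 100.0, "c2h4")
```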

get_concentrations(calDF)

Return a Pandas series of chemical concentrations.

Output has the same order as the chemical IDs in the calibration file. Prints a warning if no molecules are detected or if peaks are found outside of those listed in calDF.

Parameters:

calDF (pandas.DataFrame) – Calibration values by chemical ID

Returns:

Concentrations for each peak in apex_ind, given in ppm

Return type:

pandas.Series

Notes

Unknown peaks could be added to calibration dataframe for reference

get_run_number()

Determine run number from filename.

Returns:

run number extracted from filename

Return type:

int

getrawdata()

Read data from ASCII, returns pandas dataframe.

Uses Matthias Richter’s example code (translated from Matlab) to read GC .ASC files

Returns:

(timestamp, [Elution Time, GC Signal]) First element is the timestamp in seconds since the epoch. Second element is a pandas.DataFrame containing ‘Time’ (min) and ‘Signal’ (arb units)

Return type:

tuple(float, pandas.DataFrame)

integrate_peak()

Find the area under the peak using a trapezoidal method.

The area is calculated for each identified peak using the bounds from integration_inds(). The trapezoidal integration gives results in units of seconds*signal intensity.

Returns:

counts for all peaks, rounded to three decimal places using np.around()

Return type:

numpy.ndarray
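
A minimal sketch of trapezoidal integration between two index bounds is shown below, using scipy.integrate.trapezoid on a synthetic peak. The bounds are hypothetical, and the factor of 60 is an assumption inferred from the time axis being in minutes while the result is reported in seconds*signal intensity.

```python
import numpy as np
from scipy.integrate import trapezoid

# Synthetic single Gaussian peak (illustrative data only).
time = np.linspace(0.0, 10.0, 1001)  # elution time in minutes
signal = 5.0 * np.exp(-((time - 5.0) ** 2) / (2 * 0.1 ** 2))

# Hypothetical left/right integration bounds, as integration_inds() might return.
left, right = 440, 560

# Integrate signal over time (minutes), including both bound points.
area_min = trapezoid(signal[left:right + 1], x=time[left:right + 1])

# Assumed unit conversion from minutes to seconds, then round to three decimals.
counts = np.around(area_min * 60.0, 3)
```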

integration_inds()

Find bounds of integration for apexes.

Edge values are determined by self._half_index_search. The outputs are in the same (time) order as the apex_ind array and give the index of the left and right integration bounds for each peak.

Returns:

  • numpy.ndarray of int – Left bounds of integration.

  • numpy.ndarray of int – Right bounds of integration.

plot_integration()

Plot the chromatogram with the peaks and indices indicated.

Notes

The plotting style update here can be integrated with data_analysis module.