pycnet.review package
Module contents
Defines functions for generating a set of apparent detections of one or more target classes based on a set of PNW-Cnet class scores. See target_classes.csv for a complete list of sonotypes detected by PNW-Cnet v4 and v5 and their respective codes / labels.
Functions:
- buildClipDataFrame
Produce a table listing when each clip was recorded.
- getApparentDetections
Find apparent detections of one class at one score threshold within a set of class scores.
- getClipInfo
Infer information about a file from its filename.
- getDefaultReviewSettings
Decide which score threshold to use for each target class when no review_settings file was provided.
- getSourceFile
Return the name of the .wav file from which a clip was taken.
- getSourceFolders
Get locations of a set of files within a directory tree.
- makeKscopeReviewTable
Produce a table of apparent detections, formatted to be exported as a CSV file and browsed / tagged in Kaleidoscope.
- makeReviewTable
Produce a table of apparent detections of one or more target classes for manual review.
- parseStrReviewCriteria
Read a mapping of target classes to score thresholds from a string provided by the user.
- readPredFile
Read a table of PNW-Cnet class scores from a CSV file.
- readReviewSettings
Read a mapping of target classes to score thresholds from a CSV file.
- summarizeRecordingEffort
Summarize the number of clips and amount of recording time by site, station and date.
- summarizeDetections
Tally up apparent detections of all target classes from a set of class scores across a range of score thresholds.
- tallyDetections
Tally up the number of apparent detections for all classes in a set of class scores at a single threshold.
- pycnet.review.buildClipDataFrame(pred_table)
Extract basic information about clips in the predictions table.
- Parameters:
pred_table (Pandas.DataFrame) – DataFrame containing PNW-Cnet class scores indexed by image filename.
- Returns:
DataFrame summarizing information about the audio data that were processed to produce the class scores.
- Return type:
Pandas.DataFrame
- pycnet.review.getApparentDetections(pred_table, class_code, score_threshold)
Filter PNW-Cnet class scores to apparent detections of one class.
- Parameters:
pred_table (Pandas.DataFrame) – DataFrame containing PNW-Cnet class scores indexed by image filename.
class_code (str) – The abbreviation for the target class of interest. Note that class codes are specific to the version of PNW-Cnet used.
score_threshold (float) – A number between 0 and 1 defining the minimum score at which a clip will be treated as an apparent detection of the chosen class.
- Returns:
DataFrame containing rows from pred_table where the score for class_code was greater than or equal to score_threshold.
- Return type:
Pandas.DataFrame
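The filtering rule described above can be sketched in a few lines of pandas. This is only an illustration of the "greater than or equal to" comparison, not the package's implementation; the function name apparent_detections and the scores in the example table are made up.

```python
import pandas as pd

def apparent_detections(pred_table, class_code, score_threshold):
    """Keep rows whose score for class_code is >= score_threshold (sketch)."""
    return pred_table[pred_table[class_code] >= score_threshold]

# Made-up scores for illustration; the column names reuse codes from this page.
scores = pd.DataFrame(
    {"BRMA1": [0.10, 0.97, 0.50], "STVA_8Note": [0.99, 0.02, 0.30]},
    index=[
        "COA_23459-C_20230316_081502_part_001.png",
        "COA_23459-C_20230316_081502_part_002.png",
        "COA_23459-C_20230316_081502_part_003.png",
    ],
)

# A score exactly equal to the threshold (0.50 here) counts as a detection.
hits = apparent_detections(scores, "BRMA1", 0.5)
```

All columns of the matching rows are kept, so scores for other classes remain available for the selected clips.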
- pycnet.review.getClipInfo(clip_name)
Extract information from the name of a spectrogram image file.
Clip names will be in the form
[Area]_[Site]-[Stn]_[Date]_[Time]_part_[part].png
e.g.
COA_23459-C_20230316_081502_part_001.png
- Parameters:
clip_name (str) – The name of a spectrogram image file.
- Returns:
Dictionary of values inferred from the image filename.
- Return type:
dict
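As a rough illustration of the filename convention above, the fields can be pulled out with a regular expression. The function name parse_clip_name and the dictionary keys (area, site, stn, date, time, part) are assumptions made for this sketch; the keys actually returned by getClipInfo may differ.

```python
import re

# Pattern for [Area]_[Site]-[Stn]_[Date]_[Time]_part_[part].png
CLIP_PATTERN = re.compile(
    r"^(?P<area>[^_]+)_(?P<site>[^_-]+)-(?P<stn>[^_]+)"
    r"_(?P<date>\d{8})_(?P<time>\d{6})_part_(?P<part>\d+)\.png$"
)

def parse_clip_name(clip_name):
    """Return a dict of fields parsed from a clip name, or {} if it does not match."""
    match = CLIP_PATTERN.match(clip_name)
    return match.groupdict() if match else {}

info = parse_clip_name("COA_23459-C_20230316_081502_part_001.png")
```

Returning an empty dictionary for unexpected names is a choice made here for simplicity, not documented behavior.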
- pycnet.review.getClipTimestamp(source_file, offset)
Get the timestamp of a clip taken from a longer file.
- Parameters:
source_file (str) – Name of a .wav file including a timestamp in the format YYYYMMDD_HHMMSS.
offset (numeric) – Location of the clip within source_file in seconds from the beginning.
- Returns:
A tuple (clip_date, clip_time) containing two strings representing the date and the time at the start of the clip.
- Return type:
tuple
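The calculation amounts to adding the offset to the timestamp embedded in the source filename. The sketch below assumes the returned strings use the same YYYYMMDD and HHMMSS formats as the filename, which the documentation above does not state; clip_timestamp is a hypothetical name.

```python
import re
from datetime import datetime, timedelta

def clip_timestamp(source_file, offset):
    """Return (clip_date, clip_time) strings for a clip starting at offset seconds."""
    stamp = re.search(r"\d{8}_\d{6}", source_file)
    if stamp is None:
        raise ValueError(f"No YYYYMMDD_HHMMSS timestamp found in {source_file!r}")
    file_start = datetime.strptime(stamp.group(0), "%Y%m%d_%H%M%S")
    clip_start = file_start + timedelta(seconds=offset)
    # Output formats are an assumption: they mirror the filename's own formats.
    return clip_start.strftime("%Y%m%d"), clip_start.strftime("%H%M%S")

same_day = clip_timestamp("COA_23459-C_20230316_081502.wav", 192)
next_day = clip_timestamp("COA_23459-C_20230316_235950.wav", 24)
```

Working with datetime objects rather than string arithmetic means a clip that starts after midnight is assigned to the following date, as in the second call.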
- pycnet.review.getDefaultReviewSettings(cnet_version)
Define default thresholds for classes to include in review file.
If the user does not provide a review_settings file listing the classes and score thresholds they would like to use, by default a threshold of 0.25 (v4) or 0.50 (v5) will be used for northern spotted owl classes and a threshold of 0.95 will be used for all other target classes. Spotted owl classes will be selected first, followed by all other classes in alphabetical order by class code. This corresponds to the thresholds used historically by the northern spotted owl monitoring program to select clips for review.
We recommend tailoring your review criteria more narrowly, especially when using PNW-Cnet v5, as the large number of target classes can result in a large and unwieldy review table full of species you don’t care about.
- Parameters:
cnet_version (str) – Version of the PNW-Cnet model being used, either “v4” or “v5”.
- Returns:
A dictionary of score thresholds indexed by class code.
- Return type:
dict
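The default rule described above can be sketched as follows. The real function takes only cnet_version and knows the class lists for each model version; this sketch instead takes the class codes as arguments, and OWL_1 and OWL_2 are placeholder codes, not actual PNW-Cnet class codes.

```python
def default_review_settings(class_codes, owl_codes, cnet_version):
    """Map class codes to default thresholds, spotted owl classes first (sketch)."""
    owl_threshold = {"v4": 0.25, "v5": 0.50}[cnet_version]
    # Python dicts preserve insertion order, so owl classes come first.
    settings = {code: owl_threshold for code in owl_codes}
    # All remaining classes follow in alphabetical order at 0.95.
    for code in sorted(c for c in class_codes if c not in owl_codes):
        settings[code] = 0.95
    return settings

# Placeholder owl codes plus two codes that appear elsewhere on this page.
settings = default_review_settings(
    ["STVA_8Note", "OWL_2", "BRMA1", "OWL_1"], ["OWL_1", "OWL_2"], "v5"
)
```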
- pycnet.review.getReadableOffset(offset)
Convert a number of seconds to a more human-readable offset.
- Parameters:
offset (numeric) – A number of seconds, typically representing a position within a long-form .wav file.
- Returns:
A string in format H:MM:SS if offset > 3600 or MM:SS otherwise.
- Return type:
str
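A minimal sketch of this conversion, following the documented rule literally (H:MM:SS only when offset is strictly greater than 3600). The name readable_offset, the truncation of fractional seconds, and the zero-padding of the minutes field are assumptions.

```python
def readable_offset(offset):
    """Format a number of seconds as H:MM:SS (offset > 3600) or MM:SS otherwise."""
    total = int(offset)  # fractional seconds are dropped in this sketch
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    if total > 3600:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    # At or below one hour, report whole minutes without an hours field.
    return f"{total // 60:02d}:{seconds:02d}"

short = readable_offset(192)
long_form = readable_offset(3725)
```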
- pycnet.review.getSourceFile(clip_name, ext='.wav')
Return the name of the audio file from which a clip was taken.
- Parameters:
clip_name (str) – Filename created by concatenating the name of the source file, a string indicating a position within that file (e.g. “part_017”), and a file extension (“.png”).
ext (str) – File extension of the source audio files.
- Returns:
A tuple (source_file, str_part) containing the name of the source file and the part_xxx string indicating position within the source file. If the clip name is not formatted as expected, the returned tuple will contain two empty strings.
- Return type:
tuple
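The split described above can be approximated with a single regular expression. This is a sketch under the stated naming convention, using the hypothetical name source_file_from_clip; it is not the package's code.

```python
import re

def source_file_from_clip(clip_name, ext=".wav"):
    """Return (source_file, str_part), or ("", "") if the name is not as expected."""
    match = re.match(r"^(?P<stem>.+)_(?P<part>part_\d+)\.png$", clip_name)
    if match is None:
        return ("", "")
    return (match.group("stem") + ext, match.group("part"))

wav = source_file_from_clip("COA_23459-C_20230316_081502_part_001.png")
flac = source_file_from_clip("COA_23459-C_20230316_081502_part_017.png", ext=".flac")
bad = source_file_from_clip("unexpected_name.png")
```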
- pycnet.review.getSourceFolders(clip_list, mode='from_source_files', top_dir='', prefix='', flac_mode=False)
Get locations of a set of files within a directory tree.
This function will attempt to associate each spectrogram image with an existing source file in the directory tree rooted at top_dir, based on the filename. Images for which a source file cannot be found, or for which there are multiple possible source files, will cause the function to return nothing.
- Parameters:
clip_list (list) – A list of names of spectrogram image files.
mode (str) – Either “from_source_files” or “from_clip_names”. If “from_source_files”, source folders will be inferred by joining the source filename with the .wav inventory table under top_dir, which will be created if it does not already exist. If “from_clip_names”, the source folders will be constructed by combining the station code in the filename of each clip with an optional prefix.
top_dir (str) – Path to the root of the directory tree containing the source .wav files. Values in the FOLDER field will be generated relative to this directory. Used if mode == “from_source_files”.
prefix (str) – Prefix to be combined with the station ID code inferred from the name of each clip in order to construct the source folder. Used if mode == “from_clip_names”.
flac_mode (bool) – Assume audio files are in .flac format rather than .wav.
- Returns:
DataFrame listing the folder (relative to top_dir), source filename, “part_xxx” string, and image filename for each clip.
- Return type:
Pandas.DataFrame
- pycnet.review.makeKscopeReviewTable(pred_table, target_dir, cnet_version='v5', review_settings=None, timescale='weekly', infer_source_folders=False, source_folder_prefix='', flac_mode=False)
Extract & format apparent detections for review in Kaleidoscope.
- Parameters:
pred_table (Pandas.DataFrame) – DataFrame listing a set of image filenames and the class scores produced by PNW-Cnet for each image.
target_dir (str) – Path to the root of the directory tree containing the audio data.
cnet_version (str) – The version of PNW-Cnet used to generate the class scores (either “v4” or “v5”).
review_settings (dict) – Dictionary mapping target classes to score thresholds. See makeReviewTable for details.
timescale (str) – The temporal scale (“daily” or “weekly”) at which to tally the apparent detections of each class.
infer_source_folders (bool) – Construct source folders based on clip filenames instead of checking a .wav inventory file.
source_folder_prefix (str) – Prefix to combine with the station ID code to construct the names of source folders.
flac_mode (bool) – Assume source audio files are .flac instead of .wav.
- Returns:
DataFrame listing apparent detections of one or more classes, formatted to be written to a CSV file which will be readable and editable using Wildlife Acoustics’ Kaleidoscope software.
- Return type:
Pandas.DataFrame
- pycnet.review.makeReviewTable(pred_table, cnet_version='v5', review_settings=None)
Extract apparent detections from a set of class scores.
This function selects clips representing potential detections based on review criteria and generates information about those clips. The makeKscopeReviewTable function is designed to format this table for output and human review using Kaleidoscope.
If no review_settings dictionary is provided, the function will use the getDefaultReviewSettings function to map classes to score thresholds.
Apparent detections of each class will be extracted in the order that the classes appear in the review_settings dictionary.
- Parameters:
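pred_table (Pandas.DataFrame) – DataFrame containing PNW-Cnet class scores indexed by image filename.
cnet_version (str) – The version of PNW-Cnet used to generate the class scores (either “v4” or “v5”).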
review_settings (dict) – A dictionary mapping class codes to score thresholds used to define apparent detections for each class.
- Returns:
DataFrame listing apparent detections for one or more classes based on the review criteria provided.
- Return type:
Pandas.DataFrame
- pycnet.review.parseStrReviewCriteria(crit_string)
Map target classes to score thresholds based on a string.
The crit_string argument should include class codes or groups of class codes alternating with the score threshold to use for each class or group of classes, e.g.
"BRMA1 0.5 STVA_8Note STVA_Series 0.95"
- Parameters:
crit_string (str) – A string listing classes (singly or in groups) alternating with the score threshold to use for each class or group.
- Returns:
A dictionary of score thresholds indexed by class code.
- Return type:
dict
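One way to read such a string is to collect class codes until a number appears, then assign that number to every code collected so far. The sketch below shows that approach under the hypothetical name parse_review_criteria; how the package handles malformed strings is not documented here, so the handling of leftover codes is an assumption.

```python
def parse_review_criteria(crit_string):
    """Map class codes to thresholds from a 'CODE [CODE ...] THRESHOLD ...' string."""
    settings = {}
    pending = []  # class codes waiting for their threshold
    for token in crit_string.split():
        try:
            threshold = float(token)
        except ValueError:
            pending.append(token)  # not a number, so treat it as a class code
            continue
        for code in pending:
            settings[code] = threshold
        pending = []
    # Assumption: codes with no threshold after them are silently ignored.
    return settings

criteria = parse_review_criteria("BRMA1 0.5 STVA_8Note STVA_Series 0.95")
```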
- pycnet.review.readPredFile(pred_file_path)
Read a table of PNW-Cnet class scores from a CSV file.
- Parameters:
pred_file_path (str) – Path to the file containing the class scores.
- Returns:
DataFrame containing PNW-Cnet class scores indexed by image filename.
- Return type:
Pandas.DataFrame
- pycnet.review.readReviewSettings(review_settings_file)
Read a mapping of target class to score threshold from a file.
- Parameters:
review_settings_file (str) – Path to a CSV file with a “Class” column listing the classes to be included in the review file and a “Threshold” column listing the score threshold used to define apparent detections for each class.
- Returns:
A dictionary of score thresholds indexed by class code.
- Return type:
dict
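For reference, a settings file with the two columns described above can be read with the standard csv module. This sketch takes an open file object rather than a path so that the example is self-contained; read_review_settings is a hypothetical name and the rows are made up.

```python
import csv
import io

def read_review_settings(csv_file):
    """Build a {class_code: threshold} dict from an open CSV file (sketch)."""
    reader = csv.DictReader(csv_file)
    return {row["Class"]: float(row["Threshold"]) for row in reader}

# In-memory stand-in for a review_settings CSV file.
example = io.StringIO("Class,Threshold\nBRMA1,0.5\nSTVA_8Note,0.95\n")
file_settings = read_review_settings(example)
```

Because dictionaries keep insertion order, the classes come out in the order they appear in the file.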
- pycnet.review.summarizeDetections(pred_table, n_workers=None)
Tally apparent detections for all classes at various thresholds.
Uses mp.Pool for multiprocessing, so it must be called from within a main() function that runs under an if __name__ == "__main__": guard; otherwise the worker processes multiply endlessly.
- Parameters:
pred_table (Pandas.DataFrame) – DataFrame containing PNW-Cnet class scores indexed by image filename.
n_workers (int) – Number of worker processes to use for multiprocessing. Defaults to either 10 or the number of logical CPU cores on the host machine, whichever is lower.
- Returns:
DataFrame listing the number of apparent detections of all target classes over a range of score thresholds [0.05, 0.10, …, 0.95, 0.98, 0.99], grouped by area, site, station and date.
- Return type:
Pandas.DataFrame
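The sketch below illustrates the calling pattern and the threshold range described above using a simplified stand-in, not summarizeDetections itself. It counts scores in a plain list rather than a predictions table, and the names THRESHOLDS, count_at_threshold and summarize are made up for the example.

```python
import multiprocessing as mp
import os

# 0.05, 0.10, ..., 0.95 followed by 0.98 and 0.99
THRESHOLDS = [round(0.05 * i, 2) for i in range(1, 20)] + [0.98, 0.99]

def count_at_threshold(args):
    """Count scores at or above one threshold; runs in a worker process."""
    scores, threshold = args
    return threshold, sum(score >= threshold for score in scores)

def summarize(scores, n_workers=None):
    """Tally scores at every threshold using a pool of worker processes."""
    if n_workers is None:
        # Documented default: 10 or the number of logical cores, whichever is lower.
        n_workers = min(10, os.cpu_count() or 1)
    with mp.Pool(n_workers) as pool:
        jobs = [(scores, threshold) for threshold in THRESHOLDS]
        return dict(pool.map(count_at_threshold, jobs))

def main():
    print(summarize([0.03, 0.40, 0.97, 0.995]))

# Without this guard, platforms that start workers by re-importing the script
# would have every worker try to create its own pool.
if __name__ == "__main__":
    main()
```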
- pycnet.review.summarizeRecordingEffort(pred_table)
Summarize recording effort by area, site, station, day and week.
- Parameters:
pred_table (Pandas.DataFrame) – DataFrame containing PNW-Cnet class scores indexed by image filename.
- Returns:
DataFrame with a row for each combination of area, site, station, and date listing the number of hours of recording time based on the number of 12-second clips that were processed to generate the class scores.
- Return type:
Pandas.DataFrame
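The effort calculation reduces to counting clips per area, site, station and date and multiplying by 12 seconds. The following sketch shows that arithmetic with the standard library only; the real function returns a DataFrame, and recording_hours is a hypothetical name.

```python
from collections import Counter

CLIP_SECONDS = 12  # each class-score row corresponds to one 12-second clip

def recording_hours(clip_names):
    """Return hours of recording per (area, site, station, date) key (sketch)."""
    counts = Counter()
    for name in clip_names:
        area, site_stn, date = name.split("_")[:3]
        site, stn = site_stn.split("-", 1)
        counts[(area, site, stn, date)] += 1
    return {key: n * CLIP_SECONDS / 3600 for key, n in counts.items()}

# 300 clips of 12 seconds each make up one hour of recording.
clips = [f"COA_23459-C_20230316_081502_part_{i:03d}.png" for i in range(1, 301)]
effort = recording_hours(clips)
```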
- pycnet.review.tallyDetections(pred_table, score_threshold)
Tally apparent detections of all classes at one threshold.
- Parameters:
pred_table (Pandas.DataFrame) – DataFrame containing PNW-Cnet class scores indexed by image filename.
score_threshold (float) – Minimum score for a clip to be considered an apparent detection for any class.
- Returns:
DataFrame listing the number of apparent detections of all target classes at the score threshold specified, grouped by area, site, station and date.
- Return type:
Pandas.DataFrame
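A single-threshold tally can be approximated in pandas by converting scores to 0/1 flags and summing them within each area, site, station and date. This is an illustrative sketch with made-up scores; the name tally_detections and the grouping column names (Area, Site, Stn, Date) are assumptions, not the package's actual output schema.

```python
import pandas as pd

CLIP_KEYS = r"^(?P<Area>[^_]+)_(?P<Site>[^_-]+)-(?P<Stn>[^_]+)_(?P<Date>\d{8})_"

def tally_detections(pred_table, score_threshold):
    """Count clips scoring >= score_threshold per class, grouped by clip origin."""
    hits = (pred_table >= score_threshold).astype(int)
    # Derive grouping keys from the image filenames in the index.
    keys = pred_table.index.to_series().str.extract(CLIP_KEYS)
    return hits.join(keys).groupby(["Area", "Site", "Stn", "Date"]).sum()

# Made-up scores for four clips recorded on two dates at one station.
pred_scores = pd.DataFrame(
    {"BRMA1": [0.10, 0.97, 0.96, 0.20], "STVA_8Note": [0.99, 0.02, 0.30, 0.98]},
    index=[
        "COA_23459-C_20230316_081502_part_001.png",
        "COA_23459-C_20230316_081502_part_002.png",
        "COA_23459-C_20230316_081502_part_003.png",
        "COA_23459-C_20230317_081502_part_001.png",
    ],
)
tally = tally_detections(pred_scores, 0.95)
```

Repeating this over a list of thresholds gives the kind of multi-threshold summary that summarizeDetections produces.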