longitudinal_ecg_analysis.dataset_curators package
Submodules
longitudinal_ecg_analysis.dataset_curators.curate_dataset_hh module
curate_dataset_hh.py
A blank file - to be filled in later.
- longitudinal_ecg_analysis.dataset_curators.curate_dataset_hh.curate_dataset_hh()
Placeholder function.
longitudinal_ecg_analysis.dataset_curators.curate_dataset_mcmed module
curate_dataset_mcmed.py
Curates the MC-MED dataset for analysis.
- longitudinal_ecg_analysis.dataset_curators.curate_dataset_mcmed.check_mcmed_dataset_files(settings)
Check the presence of required files and directories for the MC-MED dataset.
This function verifies that the expected input files and folders exist in the directory specified by settings["paths"]["dataset_root_raw_folder"]. It checks for:
A CSV file named ‘visits.csv’
A CSV file named ‘waveform_summary.csv’
A folder named ‘waveforms’
These are essential for processing the dataset. If any of the required files or folders are missing, a FileNotFoundError is raised.
- Parameters:
settings (dict) – A dictionary containing file path settings. It must include "dataset_root_raw_folder" under settings["paths"].
- Returns:
The updated settings dictionary with paths for the necessary files and folders added.
- Return type:
dict
- Raises:
FileNotFoundError – If any required file or folder does not exist.
- longitudinal_ecg_analysis.dataset_curators.curate_dataset_mcmed.compute_past_future_flags(group)
- longitudinal_ecg_analysis.dataset_curators.curate_dataset_mcmed.create_var_info()
- longitudinal_ecg_analysis.dataset_curators.curate_dataset_mcmed.curate_dataset_mcmed(settings)
Curate the MC-MED dataset for analysis. The dataset is freely available at: https://doi.org/10.13026/xgx1-7x47
- Parameters:
settings (dict) – Dataset settings loaded from a settings file.
- Returns:
None. Writes the prepared data to disk.
- longitudinal_ecg_analysis.dataset_curators.curate_dataset_mcmed.extract_standard_dataset_variables(settings)
Extract dataset variables in a standardised format.
- Parameters:
settings (dict) – A dictionary of settings.
- Returns:
Writes the following prepared data files to disk: …
- longitudinal_ecg_analysis.dataset_curators.curate_dataset_mcmed.identify_waveform_recording_file_paths(rec_link_with_rec_id_orig, settings)
Create file paths for waveform recordings and save them to a CSV.
- Parameters:
settings (dict) – Dictionary containing paths.
- Returns:
…
- longitudinal_ecg_analysis.dataset_curators.curate_dataset_mcmed.merge_df_waves_into_df(df_waves, df)
- longitudinal_ecg_analysis.dataset_curators.curate_dataset_mcmed.reformat_variables(df, up)
longitudinal_ecg_analysis.dataset_curators.curate_dataset_music module
curate_dataset_music.py
Curates the MUSIC (Sudden Cardiac Death in Chronic Heart Failure) dataset for analysis.
- longitudinal_ecg_analysis.dataset_curators.curate_dataset_music.check_music_dataset_files(settings)
Check the presence of required files and directories for the MUSIC dataset.
This function verifies that the expected input files and folders exist in the directory specified by settings["paths"]["input_dir"]. It checks for:
A CSV file named ‘subject-info.csv’
A folder named ‘Holter_ECG’
These are essential for processing the MUSIC dataset. If any of the required files or folders are missing, a FileNotFoundError is raised.
- Parameters:
settings (dict) – A dictionary containing file path settings. It must include "input_dir" under settings["paths"].
- Returns:
The updated settings dictionary with paths for "subj-info-csv" and "holter_ecg_folder" added to settings["paths"].
- Return type:
dict
- Raises:
FileNotFoundError – If any required file or folder does not exist.
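The check described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the function name check_music_inputs is hypothetical, and storing the paths as strings (rather than Path objects) is an assumption. The file, folder and settings key names are those listed above.

```python
import tempfile
from pathlib import Path


def check_music_inputs(settings):
    """Illustrative stand-in for the check described above (hypothetical name)."""
    input_dir = Path(settings["paths"]["input_dir"])
    subj_info_csv = input_dir / "subject-info.csv"
    holter_ecg_folder = input_dir / "Holter_ECG"

    # The CSV must be a regular file and Holter_ECG must be a directory.
    if not subj_info_csv.is_file():
        raise FileNotFoundError(f"Missing file: {subj_info_csv}")
    if not holter_ecg_folder.is_dir():
        raise FileNotFoundError(f"Missing folder: {holter_ecg_folder}")

    # Record the verified locations under the documented keys.
    # Storing them as strings is an assumption made for this sketch.
    settings["paths"]["subj-info-csv"] = str(subj_info_csv)
    settings["paths"]["holter_ecg_folder"] = str(holter_ecg_folder)
    return settings


# Usage against a throwaway directory that mimics the expected layout.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "subject-info.csv").touch()
    (Path(tmp) / "Holter_ECG").mkdir()
    settings = check_music_inputs({"paths": {"input_dir": tmp}})
    print(sorted(settings["paths"]))
```

Raising on the first missing item keeps the sketch short; collecting every missing path before raising would give a more complete error message.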
- longitudinal_ecg_analysis.dataset_curators.curate_dataset_music.curate_dataset_music(settings)
Curate the MUSIC (Sudden Cardiac Death in Chronic Heart Failure) dataset for analysis. The dataset is available at: https://doi.org/10.13026/fa8p-he52
- Parameters:
settings (dict) – Dataset settings loaded from a settings file.
- Returns:
None. Writes the prepared data to disk.
- longitudinal_ecg_analysis.dataset_curators.curate_dataset_music.extract_standard_dataset_variables(settings)
Extract clinical, outcome and dataset variables in a standardised format.
- Parameters:
settings (dict) – A dictionary of settings, including settings["paths"]["subj-info-csv"], the path of subject-info.csv.
- Returns:
Writes the following prepared data files to disk:
standard-clinical-metrics.csv: A CSV file containing clinical metrics for each subject.
standard-outcome-variables.csv: A CSV file containing outcome variables for each subject.
standard-dataset-variables.csv: A CSV file containing variables describing the dataset.
- longitudinal_ecg_analysis.dataset_curators.curate_dataset_music.identify_ECG_recording_file_paths(settings)
Create file paths for ECG recordings and save them to a CSV.
- Parameters:
settings (dict) – Dictionary containing paths, including ‘standard-clinical-metrics-csv’, ‘holter_ecg_folder’ and ‘signal-filepaths-csv’.
- Returns:
DataFrame with columns ‘subj_id’ and ‘filepath’.
- Return type:
pd.DataFrame
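The idea of pairing each subject with a recording path and saving the result to a CSV can be sketched as below. This is a sketch under stated assumptions, not the package's code: the helper names, the subject IDs "P0001" and "P0002", the ".hea" extension and the one-file-per-subject naming are all hypothetical, and the standard-library csv module is used here in place of the pandas DataFrame that the documented function returns.

```python
import csv
import tempfile
from pathlib import Path


def build_filepath_rows(subj_ids, holter_ecg_folder, extension=".hea"):
    """Pair each subject ID with a recording path (hypothetical naming scheme)."""
    folder = Path(holter_ecg_folder)
    return [
        {"subj_id": subj_id, "filepath": str(folder / f"{subj_id}{extension}")}
        for subj_id in subj_ids
    ]


def write_filepath_csv(rows, csv_path):
    """Write the rows to a CSV with the columns 'subj_id' and 'filepath'."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["subj_id", "filepath"])
        writer.writeheader()
        writer.writerows(rows)


# Usage with made-up subject IDs and a temporary output location.
with tempfile.TemporaryDirectory() as tmp:
    rows = build_filepath_rows(["P0001", "P0002"], Path(tmp) / "Holter_ECG")
    out_csv = Path(tmp) / "signal-filepaths.csv"
    write_filepath_csv(rows, out_csv)
    header = out_csv.read_text().splitlines()[0]
    print(header)  # subj_id,filepath
```

With pandas available, the same rows could be passed to pd.DataFrame(rows) to obtain a table with the two documented columns.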