Skip to content

ICC

This module contains functions for running intraclass correlation coefficient (ICC) analyses.

icc

Functions:

calculate_intraclass_correlations

calculate_intraclass_correlations(data: DataFrame, variables: List[str], variable_col: str = 'variable', subject_col: str = 'subjectID', timepoint_col: str = 'timepoint', value_col: str = 'value', icc_type: str = 'ICC3') -> DataFrame

Computes intraclass correlation coefficients (ICCs) for a specified list of variables across multiple subjects and timepoints (or raters). This function is designed for use with long-format DataFrames, where each row represents a single observation, including the subject ID, timepoint, variable name, and corresponding value. It supports various types of ICC calculations, as implemented in the corresponding function from pingouin. The output is a DataFrame containing the ICC values, associated statistics, and confidence intervals for each variable.

Parameters:

  • data

    (DataFrame) –

    The DataFrame containing the data variables. This should be a long-format DataFrame with one row per observation.

  • variables

    (List[str]) –

    List of variables (as strings) for which to calculate intraclass correlations.

  • variable_col

    (str, default: 'variable' ) –

    The name of the column containing the variable names. Defaults to "variable".

  • subject_col

    (str, default: 'subjectID' ) –

    The name of the column containing the subject IDs. Defaults to "subjectID".

  • timepoint_col

    (str, default: 'timepoint' ) –

    The name of the column containing the timepoint (rater). Defaults to "timepoint".

  • value_col

    (str, default: 'value' ) –

    The name of the column containing the values to be rated. Defaults to "value".

  • icc_type

    (str, default: 'ICC3' ) –

    The type of intraclass correlation. Defaults to "ICC3".

Returns:

  • DataFrame

    pd.DataFrame: A DataFrame containing intraclass correlation results for

  • DataFrame

    each variable.

Example
import pandas as pd
import numpy as np

# Number of subjects
num_subjects = 5

# Initial values for timepoint 1
values_timepoint_1 = {
    'A': np.random.randint(5, 15, num_subjects),
    'B': np.random.randint(5, 15, num_subjects)
}

# Adding Gaussian noise to create values for timepoint 2
noise = np.random.normal(0, 0.5, num_subjects)

# Constructing the DataFrame
data = pd.DataFrame({
    'subjectID': np.repeat(np.arange(1, num_subjects + 1), 4),
    'timepoint': [1, 1, 2, 2] * num_subjects,
    'variable': ['A', 'B', 'A', 'B'] * num_subjects,
    'value': np.concatenate([
        values_timepoint_1['A'],
        values_timepoint_1['B'],
        values_timepoint_1['A'] + noise,
        values_timepoint_1['B'] + noise
    ])
})

# Calculate intraclass correlations for variables A, B, and C
icc_results = calculate_intraclass_correlations(
    data=data,
    variables=['A', 'B'],
    variable_col='variable',
    subject_col='subjectID',
    timepoint_col='timepoint',
    value_col='value',
    icc_type='ICC3'
)

print(icc_results)
Source code in stats_utils/reliability/icc.py
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
def calculate_intraclass_correlations(
    data: pd.DataFrame,
    variables: List[str],
    variable_col: str = "variable",
    subject_col: str = "subjectID",
    timepoint_col: str = "timepoint",
    value_col: str = "value",
    icc_type: str = "ICC3",
) -> pd.DataFrame:
    """
    Computes intraclass correlation coefficients (ICCs) for a specified list of
    variables across multiple subjects and timepoints (or raters). This
    function is designed for use with long-format DataFrames, where each row
    represents a single observation, including the subject ID, timepoint,
    variable name, and corresponding value. It supports various types of ICC
    calculations, as implemented in the corresponding function from
    `pingouin`. The output is a DataFrame containing the ICC
    values, associated statistics, and confidence intervals for each variable.

    Args:
        data (pd.DataFrame): The DataFrame containing the data variables.
            This should be a long-format DataFrame with one row per
            observation.
        variables (List[str]): List of variables (as strings) for which
            to calculate intraclass correlations.
        variable_col (str): The name of the column containing the variable
            names. Defaults to `"variable"`.
        subject_col (str): The name of the column containing the subject IDs.
            Defaults to `"subjectID"`.
        timepoint_col (str): The name of the column containing the timepoint
            (rater). Defaults to `"timepoint"`.
        value_col (str): The name of the column containing the values to be
            rated. Defaults to `"value"`.
        icc_type (str): The type of intraclass correlation. Defaults to
            `"ICC3"`.

    Returns:
        pd.DataFrame: A DataFrame containing intraclass correlation results for
        each variable.

    Example:
        ```python
        import pandas as pd
        import numpy as np

        # Number of subjects
        num_subjects = 5

        # Initial values for timepoint 1
        values_timepoint_1 = {
            'A': np.random.randint(5, 15, num_subjects),
            'B': np.random.randint(5, 15, num_subjects)
        }

        # Adding Gaussian noise to create values for timepoint 2
        noise = np.random.normal(0, 0.5, num_subjects)

        # Constructing the DataFrame
        data = pd.DataFrame({
            'subjectID': np.repeat(np.arange(1, num_subjects + 1), 4),
            'timepoint': [1, 1, 2, 2] * num_subjects,
            'variable': ['A', 'B', 'A', 'B'] * num_subjects,
            'value': np.concatenate([
                values_timepoint_1['A'],
                values_timepoint_1['B'],
                values_timepoint_1['A'] + noise,
                values_timepoint_1['B'] + noise
            ])
        })

        # Calculate intraclass correlations for variables A, B, and C
        icc_results = calculate_intraclass_correlations(
            data=data,
            variables=['A', 'B'],
            variable_col='variable',
            subject_col='subjectID',
            timepoint_col='timepoint',
            value_col='value',
            icc_type='ICC3'
        )

        print(icc_results)
        ```
    """
    # Initialize a list to store the intraclass correlation results for each
    # variable
    iccs = []

    # Loop through each variable to compute its intraclass correlation
    for variable in variables:
        # Filter the data for the current variable
        variable_data = data[data[variable_col] == variable]

        # Calculate intraclass correlation
        icc_res = intraclass_corr(
            data=variable_data,
            targets=subject_col,
            raters=timepoint_col,
            ratings=value_col,
        )

        # Add a column to the result indicating the current variable
        icc_res["variable"] = variable

        # Append the result to the list of intraclass correlations
        iccs.append(icc_res)

    # Concatenate the individual results into a single DataFrame
    iccs = pd.concat(iccs)

    # Filter to only include results of the desired type
    iccs = iccs[iccs["Type"] == icc_type]

    return iccs