py_hla_match.export.PairwiseMatch

class py_hla_match.export.PairwiseMatch(source, target, storage_filename='match_results.csv', loci=None, include_ard_details=False, include_molecular_details=False, include_homozygosity=False, include_dpb1_tce=False, stream=False, chunk_size=10000, overwrite=False)

Bases: object

Match individuals row-wise across two data sources: the individual at each index of source is matched against the individual at the same index of target. The results are written to a CSV file.

Parameters:

source (HLADataSource) – HLADataSource for the source dataset
target (HLADataSource) – HLADataSource for the target dataset
storage_filename (str) – Name of the file to store the results
loci (Iterable[str] | None) – Optional iterable of specific loci to export. If None, defaults to all supported loci
include_ard_details (bool) – If True, include ARD refinement columns
include_molecular_details (bool) – If True, include molecular refinement columns
include_homozygosity (bool) – If True, include homozygosity boolean
include_dpb1_tce (bool) – If True, include DPB1 TCE status column
stream (bool) – If True, results are streamed to the output file instead of being held in memory; to_df() is then unavailable
chunk_size (int) – Size of the chunks to read from the file
overwrite (bool) – If True, allow overwriting existing output files

Raises:

ValueError – If resolution is not one of ‘basic’, ‘high’, or ‘full’
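
As a usage illustration, the documented constructor arguments can be wired together as in the sketch below. It assumes that source and target are already-constructed HLADataSource objects; how those are built is not covered in this section. The helper build_matcher and its matcher_cls argument are hypothetical, introduced only so the example can be exercised without the package or real data. The file name and chunk size are arbitrary.

```python
def build_matcher(source, target, matcher_cls=None):
    """Create a streaming PairwiseMatch-style exporter.

    `matcher_cls` is a hypothetical hook for substituting a stand-in class;
    when it is omitted, the documented class is imported from the package.
    """
    if matcher_cls is None:
        # Import path taken from the class name documented in this section.
        from py_hla_match.export import PairwiseMatch as matcher_cls

    return matcher_cls(
        source,
        target,
        storage_filename="pairs.csv",   # arbitrary output name
        include_homozygosity=True,      # add the homozygosity boolean
        stream=True,                    # do not keep results in memory
        chunk_size=5000,                # arbitrary chunk size
        overwrite=False,                # refuse to replace an existing file
    )
```

Because stream=True is passed here, the resulting object would be expected to write its results to pairs.csv and not to offer them through to_df().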

run()

Executes the matching pipeline.

Matches individuals from the source and target datasets row-wise, assuming that both datasets are aligned by index. When streaming is enabled, data is processed in chunks and results are periodically flushed to the output file; otherwise, data is processed in memory.

Raises:

FileExistsError – If output file exists and overwrite is False
ValueError – If input datasets have mismatched lengths

Return type:

None
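
The behaviour described for run() can be pictured with a small, library-independent sketch. It is not the package's implementation: pairwise_to_csv, the compare callback, and the two-column output layout are all assumptions made for illustration. It shows only the general pattern of checking for an existing file, checking that the inputs have equal length, pairing rows by index, and flushing each chunk to a CSV file.

```python
import csv
import os
from itertools import islice


def pairwise_to_csv(source_rows, target_rows, path, compare,
                    chunk_size=10000, overwrite=False):
    """Pair rows by index and write compare(source, target) per row to CSV."""
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(f"{path} exists; pass overwrite=True to replace it")
    if len(source_rows) != len(target_rows):
        raise ValueError("source and target must have the same number of rows")

    pairs = zip(source_rows, target_rows)  # row i of source with row i of target
    written = 0
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["index", "result"])
        while True:
            # Pull at most chunk_size pairs so memory use stays bounded.
            chunk = list(islice(pairs, chunk_size))
            if not chunk:
                break
            writer.writerows(
                (written + i, compare(s, t)) for i, (s, t) in enumerate(chunk)
            )
            fh.flush()  # push this chunk to disk before reading the next one
            written += len(chunk)
    return written
```

In this sketch the existence and length checks run before the output file is opened, so a failed call leaves any existing file untouched.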

to_df()

Returns a DataFrame of the results.

Only available if stream=False.

Returns:

pandas DataFrame containing the match results

Raises:

RuntimeError – If streaming is enabled

Return type:

DataFrame
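
Since to_df() raises RuntimeError when streaming is enabled, calling code may want to guard against it. A minimal sketch, using only the run() and to_df() methods documented here; the helper name results_or_none is hypothetical, and returning None in the streaming case is just one possible convention.

```python
def results_or_none(matcher):
    """Run the matcher and return its DataFrame, or None if it was streamed.

    `matcher` is expected to expose run() and to_df() as documented for
    PairwiseMatch; any object with those two methods works in this sketch.
    """
    matcher.run()
    try:
        return matcher.to_df()
    except RuntimeError:
        # Streaming mode: results exist only in the output CSV file.
        return None
```

When None is returned, the results would have to be read back from the file given as storage_filename.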