py_hla_match.export.PairwiseMatch
- class py_hla_match.export.PairwiseMatch(source, target, storage_filename='match_results.csv', loci=None, include_ard_details=False, include_molecular_details=False, include_homozygosity=False, include_dpb1_tce=False, stream=False, chunk_size=10000, overwrite=False)[source]
Bases: object
Match individuals row-wise across two data sources: the record at each index of source is matched against the record at the same index of target. The results are stored in a CSV file.
- Parameters:
source (HLADataSource) – HLADataSource for the source dataset
target (HLADataSource) – HLADataSource for the target dataset
storage_filename (str) – Name of the file to store the results
loci (Iterable[str] | None) – Optional iterable of specific loci to export. If None, defaults to all supported loci
include_ard_details (bool) – If True, include ARD refinement columns
include_molecular_details (bool) – If True, include molecular refinement columns
include_homozygosity (bool) – If True, include homozygosity boolean
include_dpb1_tce (bool) – If True, include DPB1 TCE status column
stream (bool) – If True, results are streamed to the output file and not kept in memory
chunk_size (int) – Size of the chunks to read from the file
overwrite (bool) – If True, allow overwriting existing output files
- Raises:
ValueError – If resolution is not one of ‘basic’, ‘high’, or ‘full’
- run()[source]
Executes matching pipeline.
Matches individuals from the source and target datasets row-wise. Assumes that both datasets are aligned by index. If stream is True, data is processed in chunks and results are periodically flushed to the output file; otherwise data is processed in memory.
- Raises:
FileExistsError – If output file exists and overwrite is False
ValueError – If input datasets have mismatched lengths
- Return type:
None
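The behaviour described for run() can be illustrated with a small standalone sketch. This is not the library's implementation: match_rowwise and count_mismatches are hypothetical names, the comparison is a placeholder for the real HLA matching logic, the inputs are plain lists of tuples, and the order of the two checks is an assumption. It only shows index-aligned pairing, the two documented error conditions, and chunked flushing to a CSV file.

```python
import csv
import os
from itertools import islice


def count_mismatches(a, b):
    """Hypothetical stand-in for the real matching logic: count differing fields."""
    return sum(1 for x, y in zip(a, b) if x != y)


def match_rowwise(source_rows, target_rows, filename, chunk_size=10000, overwrite=False):
    """Pair row i of source with row i of target and write results chunk by chunk."""
    # Refuse to clobber an existing file unless explicitly allowed.
    if os.path.exists(filename) and not overwrite:
        raise FileExistsError(f"{filename} exists and overwrite is False")
    # Row-wise matching only makes sense for equally long, index-aligned inputs.
    if len(source_rows) != len(target_rows):
        raise ValueError("source and target must have the same number of rows")

    pairs = zip(source_rows, target_rows)
    with open(filename, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["index", "mismatches"])
        index = 0
        while True:
            # Take at most chunk_size pairs so only one chunk is held at a time.
            chunk = list(islice(pairs, chunk_size))
            if not chunk:
                break
            rows = [(index + i, count_mismatches(s, t)) for i, (s, t) in enumerate(chunk)]
            writer.writerows(rows)
            fh.flush()  # push this chunk's results to disk before continuing
            index += len(chunk)
    return index  # number of pairs written
```

In this sketch a smaller chunk_size trades throughput for a lower peak memory footprint, which is presumably the same trade-off the chunk_size parameter controls in the real class.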
- to_df()[source]
Returns a DataFrame of the results.
Only available if stream=False.
- Returns:
pandas DataFrame containing the match results
- Raises:
RuntimeError – If streaming is enabled
- Return type:
DataFrame
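The restriction that to_df() is only available when stream=False follows from the streamed results never being held in memory. A minimal sketch of that guard is shown below; ResultStore and its add method are hypothetical names used for illustration only and do not reflect the actual internals of PairwiseMatch.

```python
class ResultStore:
    """Hypothetical container illustrating the stream / in-memory split."""

    def __init__(self, stream=False):
        self.stream = stream
        self._rows = []  # only populated when stream is False

    def add(self, row):
        # In streaming mode the row would be written out instead of retained.
        if not self.stream:
            self._rows.append(row)

    def to_df(self):
        # Streamed results were never kept, so there is nothing to build from.
        if self.stream:
            raise RuntimeError("to_df() is unavailable when stream=True")
        import pandas as pd  # imported lazily; only needed on this path

        return pd.DataFrame(self._rows)
```

With stream=False, calling to_df() on such a store would return one DataFrame row per stored result; with stream=True the output file is the only place the results exist.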