py_hla_match.export

class py_hla_match.export.PairwiseMatch(source, target, storage_filename='match_results.csv', loci=None, include_ard_details=False, include_molecular_details=False, include_homozygosity=False, include_dpb1_tce=False, stream=False, chunk_size=10000, overwrite=False)[source]

Bases: object

Match individuals row-wise across two data sources: the row at each index of source is matched against the row at the same index of target. The results are stored in a CSV file.

Parameters:
  • source (HLADataSource) – HLADataSource for the source dataset

  • target (HLADataSource) – HLADataSource for the target dataset

  • storage_filename (str) – Name of the file to store the results

  • loci (Iterable[str] | None) – Optional iterable of specific loci to export. If None, defaults to all supported loci

  • include_ard_details (bool) – If True, include ARD refinement columns

  • include_molecular_details (bool) – If True, include molecular refinement columns

  • include_homozygosity (bool) – If True, include a homozygosity boolean column

  • include_dpb1_tce (bool) – If True, include DPB1 TCE status column

  • stream (bool) – If True, results are streamed to the output file instead of being held in memory

  • chunk_size (int) – Size of the chunks to read from the file

  • overwrite (bool) – If True, allow overwriting existing output files

Raises:

ValueError – If resolution is not one of ‘basic’, ‘high’, or ‘full’

run()[source]

Executes the matching pipeline.

Matches individuals from the source and target datasets row-wise, assuming that both datasets are aligned by index. If streaming is enabled, data is processed in chunks and results are periodically flushed to the output file; otherwise, processing happens in memory.

Raises:
  • FileExistsError – If output file exists and overwrite is False

  • ValueError – If input datasets have mismatched lengths

Return type:

None
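The behaviour described for run() can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: match_rows, its column names, and the simple equality comparison are hypothetical stand-ins, and the inputs are plain lists of strings rather than HLADataSource objects. The sketch only shows the overwrite guard, the row-wise pairing by index, the chunked flush, and the length check. It detects a length mismatch while iterating; the library may well validate lengths up front.

```python
import csv
import os
from itertools import islice, zip_longest

_MISSING = object()  # sentinel marking an input that ran out of rows early


def match_rows(source_rows, target_rows, out_path, chunk_size=10000, overwrite=False):
    """Hypothetical stand-in for a row-wise match; not py_hla_match code."""
    if os.path.exists(out_path) and not overwrite:
        raise FileExistsError(f"{out_path} exists and overwrite is False")

    # Pair row i of the source with row i of the target.
    pairs = zip_longest(source_rows, target_rows, fillvalue=_MISSING)
    written = 0
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["index", "source", "target", "identical"])
        while True:
            chunk = list(islice(pairs, chunk_size))
            if not chunk:
                break
            for src, tgt in chunk:
                if src is _MISSING or tgt is _MISSING:
                    raise ValueError("input datasets have mismatched lengths")
                # Placeholder comparison: plain string equality.
                writer.writerow([written, src, tgt, src == tgt])
                written += 1
            handle.flush()  # push the finished chunk to disk
    return written
```

Calling match_rows(["A*01:01"], ["A*01:01"], "match_results.csv") a second time without overwrite=True raises FileExistsError, mirroring the documented contract.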

to_df()[source]

Returns a DataFrame of the results.

Only available if stream=False.

Returns:

pandas DataFrame containing the match results

Raises:

RuntimeError – If streaming is enabled

Return type:

DataFrame
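Because to_df() is unavailable when stream=True, the results then live only in the output file. One way to consume them without loading everything at once is to read the CSV back in chunks. The helper below is a sketch using only the standard library; iter_result_chunks is a hypothetical name, and it makes no assumption about which columns the file contains.

```python
import csv
from itertools import islice


def iter_result_chunks(path, chunk_size=10000):
    """Yield lists of row dicts from a results CSV, chunk_size rows at a time."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)  # the first line is taken as the header
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                return
            yield chunk
```

Each yielded chunk is a list of dictionaries keyed by the header row, so at most chunk_size rows are held in memory at a time.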

py_hla_match.export.scan_loci(source, chunk_size=10000)[source]

Utility function to scan an HLA data source and identify all loci present.

Parameters:
  • source (HLADataSource) – HLADataSource to scan

  • chunk_size (int) – Size of the chunks to read from the file

Returns:

Sorted list of unique loci detected in the data source

Return type:

List[str]
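The idea behind a locus scan can be sketched as follows. This is an illustration under stated assumptions, not the library's code: scan_loci_sketch is a hypothetical name, the input is an iterable of rows of allele strings rather than an HLADataSource, and each typing is assumed to follow the usual locus*fields form (for example A*01:01 or HLA-DRB1*15:01).

```python
from itertools import islice


def scan_loci_sketch(rows, chunk_size=10000):
    """Collect locus names from rows of allele strings such as 'A*01:01'."""
    loci = set()
    iterator = iter(rows)
    while True:
        # Read the input chunk_size rows at a time to bound memory use.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        for row in chunk:
            for allele in row:
                if "*" not in allele:
                    continue  # skip empty or untyped entries
                locus = allele.split("*", 1)[0]
                if locus.startswith("HLA-"):
                    locus = locus[len("HLA-"):]
                loci.add(locus)
    return sorted(loci)
```

The result is a sorted list of unique locus names, matching the documented return type of List[str].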

Classes

PairwiseMatch(source, target[, ...])

Match individuals row-wise across two data sources: the row at each index of source is matched against the row at the same index of target.

Functions

scan_loci(source[, chunk_size])

Utility function to scan an HLA data source and identify all loci present.