scButterfly.data_processing.ATAC_data_preprocessing

scButterfly.data_processing.ATAC_data_preprocessing(ATAC_data, binary_data=True, filter_features=True, fpeaks=0.005, tfidf=True, normalize=True, save_data=False, file_path=None, logging_path=None)

Preprocesses ATAC data using scanpy. The available steps are binarization, peak filtering, TF-IDF transformation, and scaling.

Parameters:
  • ATAC_data (AnnData) – ATAC AnnData object to process.

  • binary_data (bool) – whether to binarize the ATAC data, default True.

  • filter_features (bool) – whether to apply peak filtering, default True.

  • fpeaks (float) – filter out peaks detected in fewer than fpeaks*n_cells cells; set to None to skip peak filtering, default 0.005.

  • tfidf (bool) – whether to apply the TF-IDF transformation, default True.

  • normalize (bool) – whether to scale the data to [0, 1], default True.

  • save_data (bool) – whether to save the processed data, default False.

  • file_path (str) – path for saving the processed data; only used if save_data is True, default None.

  • logging_path (str) – path for writing the processing log; set to None to skip saving the log, default None.
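
To make the binary_data and fpeaks parameters concrete, the following is a minimal NumPy sketch of binarization followed by peak filtering on a toy cells-by-peaks matrix. It is an illustration under stated assumptions, not scButterfly's own code: the toy matrix, the variable names, and fpeaks=0.5 (chosen so that the threshold is visible on four cells, instead of the default 0.005) are all hypothetical.

```python
import numpy as np

# Toy cells x peaks count matrix (hypothetical data: 4 cells, 3 peaks).
counts = np.array([
    [2, 0, 0],
    [1, 0, 3],
    [0, 0, 1],
    [4, 1, 0],
])

# Binarization: any nonzero count becomes 1.
binary = (counts > 0).astype(np.float32)

# Peak filtering: keep peaks detected in at least fpeaks * n_cells cells.
# With fpeaks = 0.5 and 4 cells, a peak needs at least 2 cells to survive.
fpeaks = 0.5
n_cells = binary.shape[0]
cells_per_peak = binary.sum(axis=0)
keep = cells_per_peak >= fpeaks * n_cells

# Drop the peak columns that fall below the threshold.
filtered = binary[:, keep]
```

Here the second peak is detected in only one cell, so it is the only column removed.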

Returns:

  • ATAC_data_processed (AnnData) – ATAC data after binarization, peak filtering, TF-IDF transformation, and scaling.

  • divide_title (numpy matrix) – matrix divided by during the TF-IDF transformation; used by “inverse_TFIDF”.

  • multiply_title (numpy matrix) – matrix multiplied by during the TF-IDF transformation; used by “inverse_TFIDF”.

  • max_temp (float) – maximum value used as the scale divisor during processing; used by “inverse_TFIDF”.
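
The three extra return values exist so that the transformation can be undone later. This page does not give the exact formulas, so the sketch below assumes one plausible form: a log-scaled TF-IDF, log(1 + 1e4 * x / divide_title * multiply_title), followed by division by the maximum. The functions tfidf_and_scale and invert are hypothetical helpers written for this illustration; they are not the library's implementation of the preprocessing or of “inverse_TFIDF”.

```python
import numpy as np

def tfidf_and_scale(binary):
    """Hypothetical forward transform on a dense cells x peaks matrix."""
    n_cells, n_peaks = binary.shape
    # Per-peak totals, tiled to the matrix shape (the matrix divided by).
    divide_title = np.tile(binary.sum(axis=0), (n_cells, 1))
    # Per-cell inverse frequency, tiled to the matrix shape (the matrix multiplied by).
    idf = (n_peaks / binary.sum(axis=1)).reshape(-1, 1)
    multiply_title = np.tile(idf, (1, n_peaks))
    tfidf = np.log1p(1e4 * binary / divide_title * multiply_title)
    # Scale to [0, 1] by dividing by the largest value.
    max_temp = tfidf.max()
    return tfidf / max_temp, divide_title, multiply_title, max_temp

def invert(scaled, divide_title, multiply_title, max_temp):
    """Undo the assumed transform step by step, in reverse order."""
    tfidf = scaled * max_temp
    return np.expm1(tfidf) / multiply_title / 1e4 * divide_title

# Toy binarized matrix with no empty rows or columns (assumed, to avoid division by zero).
binary = np.array([
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
], dtype=np.float64)

scaled, divide_title, multiply_title, max_temp = tfidf_and_scale(binary)
recovered = invert(scaled, divide_title, multiply_title, max_temp)
```

Under these assumptions, recovered matches the binarized input up to floating-point error, which is why the two matrices and the scale factor have to be kept alongside the processed data.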