scButterfly.data_processing.ATAC_data_preprocessing
- scButterfly.data_processing.ATAC_data_preprocessing(ATAC_data, binary_data=True, filter_features=True, fpeaks=0.005, tfidf=True, normalize=True, save_data=False, file_path=None, logging_path=None)
Preprocess ATAC data with scanpy. The available steps are binarization, peak filtering, TF-IDF transformation, and scaling; each can be switched on or off through the parameters below.
- Parameters:
ATAC_data (AnnData) – ATAC AnnData object to process.
binary_data (bool) – whether to binarize the ATAC data, default True.
filter_features (bool) – whether to apply peak filtering, default True.
fpeaks (float) – filter out peaks detected in fewer than fpeaks*n_cells cells; set to None to skip peak filtering, default 0.005.
tfidf (bool) – whether to apply the TF-IDF transformation, default True.
normalize (bool) – whether to scale the data to the range [0, 1], default True.
save_data (bool) – whether to save the processed data, default False.
file_path (str) – path for saving the processed data; only used if save_data is True, default None.
logging_path (str) – path for writing the processing log; set to None to skip logging, default None.
- Returns:
ATAC_data_processed (AnnData) – ATAC data after binarization, peak filtering, TF-IDF transformation, and scaling.
divide_title (numpy matrix) – matrix the data is divided by during the TF-IDF transformation; used by “inverse_TFIDF”.
multiply_title (numpy matrix) – matrix the data is multiplied by during the TF-IDF transformation; used by “inverse_TFIDF”.
max_temp (float) – maximum value the data is divided by during scaling; used by “inverse_TFIDF”.
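
To illustrate why three extra values are returned alongside the processed data, here is a minimal NumPy sketch of the four steps and of how the returned factors could undo them. It is not scButterfly's implementation. The function names `preprocess_atac_sketch` and `inverse_sketch` are hypothetical, the TF-IDF variant (per-cell total as divisor, n_cells over per-peak count as multiplier, no log transform) is an assumption chosen for simplicity, and cells are assumed to be rows, as in AnnData.

```python
import numpy as np

def preprocess_atac_sketch(counts, fpeaks=0.005):
    """Illustrative stand-in for the documented steps (assumed, simplified)."""
    # 1. Binarize: any non-zero count marks the peak as open in that cell.
    x = (counts > 0).astype(np.float64)
    n_cells = x.shape[0]

    # 2. Peak filtering: keep peaks open in at least fpeaks * n_cells cells.
    keep = x.sum(axis=0) >= fpeaks * n_cells
    x = x[:, keep]

    # 3. TF-IDF (assumed variant): divide by per-cell totals, then
    #    multiply by an inverse peak frequency. Zeros are guarded with 1.
    divide = np.maximum(x.sum(axis=1, keepdims=True), 1.0)
    multiply = n_cells / np.maximum(x.sum(axis=0, keepdims=True), 1.0)
    x = x / divide * multiply

    # 4. Scale to [0, 1] by dividing by the global maximum.
    max_temp = float(x.max())
    x = x / max_temp
    return x, keep, divide, multiply, max_temp

def inverse_sketch(x, divide, multiply, max_temp):
    """Undo scaling and the assumed TF-IDF step using the saved factors."""
    return x * max_temp / multiply * divide

# Toy matrix: 4 cells (rows) x 4 peaks (columns).
counts = np.array([[2, 0, 1, 0],
                   [0, 0, 3, 0],
                   [1, 0, 0, 0],
                   [4, 0, 1, 1]])

processed, keep, divide, multiply, max_temp = preprocess_atac_sketch(counts, fpeaks=0.5)
restored = inverse_sketch(processed, divide, multiply, max_temp)
binarized_kept = (counts > 0).astype(np.float64)[:, keep]
```

In this toy run `restored` matches the binarized, filtered matrix, which is the role the documentation assigns to divide_title, multiply_title, and max_temp in “inverse_TFIDF”. The actual library may use a different TF-IDF formula, so treat the arithmetic here as illustrative only.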