27  Parquet

import pandas as pd
%%timeit
f = '../../data/Temixco_2018_10Min.csv'
pd.read_csv(f,index_col=0,parse_dates=True)
57.4 ms ± 2.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
f = '../../data/Temixco_2018_10Min.xlsx'
pd.read_excel(f,index_col=0,parse_dates=True)
4.17 s ± 85.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
f = '../../data/Temixco_2018_10Min.parquet'
pd.read_parquet(f)
6.37 ms ± 199 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
f = '../../data/Temixco_2018_10Min.csv'
tmp = pd.read_csv(f,index_col=0,parse_dates=True)
tmp.to_parquet('../../data/tmp.parquet')
f = '../../data/tmp.parquet'
tmx = pd.read_parquet(f)
tmx.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 52560 entries, 2018-01-01 00:00:00 to 2018-12-31 23:50:00
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Ib      52423 non-null  float64
 1   Ig      52423 non-null  float64
 2   To      52560 non-null  float64
 3   RH      52560 non-null  float64
 4   WS      52560 non-null  float64
 5   WD      52560 non-null  float64
 6   P       52560 non-null  float64
dtypes: float64(7)
memory usage: 3.2 MB

image.png
%%timeit
f = '../../data/Temixco_2018_10Min.csv.zip'
pd.read_csv(f,compression='zip',index_col=0,parse_dates=True)
77.3 ms ± 2.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Explora además:

  1. hdf5
  2. feather
  3. apache orc
  4. pickle

Pon atención a las dependencias que tengas que instalar