import pandas as pd
27 Parquet
%%timeit
f = '../../data/Temixco_2018_10Min.csv'
pd.read_csv(f, index_col=0, parse_dates=True)
57.4 ms ± 2.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
f = '../../data/Temixco_2018_10Min.xlsx'
pd.read_excel(f, index_col=0, parse_dates=True)
4.17 s ± 85.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
f = '../../data/Temixco_2018_10Min.parquet'
pd.read_parquet(f)
6.37 ms ± 199 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
f = '../../data/Temixco_2018_10Min.csv'
tmp = pd.read_csv(f, index_col=0, parse_dates=True)
tmp.to_parquet('../../data/tmp.parquet')
f = '../../data/tmp.parquet'
tmx = pd.read_parquet(f)
tmx.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 52560 entries, 2018-01-01 00:00:00 to 2018-12-31 23:50:00
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Ib      52423 non-null  float64
 1   Ig      52423 non-null  float64
 2   To      52560 non-null  float64
 3   RH      52560 non-null  float64
 4   WS      52560 non-null  float64
 5   WD      52560 non-null  float64
 6   P       52560 non-null  float64
dtypes: float64(7)
memory usage: 3.2 MB
%%timeit
f = '../../data/Temixco_2018_10Min.csv.zip'
pd.read_csv(f, compression='zip', index_col=0, parse_dates=True)
77.3 ms ± 2.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Also explore:
- HDF5
- Feather
- Apache ORC
- pickle
Pay attention to the dependencies you will need to install.
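As a starting point for that exploration, the sketch below writes a small DataFrame in each of the four formats listed above and reads it back, reporting which ones fail because a dependency is missing. It is only a sketch under assumptions: the data is a synthetic stand-in (not the Temixco file), and the helper `try_formats`, the index name `time`, and the `tmp.*` file names are hypothetical. In pandas, Feather and ORC rely on `pyarrow`, HDF5 relies on PyTables (`tables`), and pickle needs nothing extra.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical stand-in data: one week of 10-minute samples, a single column.
idx = pd.date_range("2018-01-01", periods=1008, freq="10min", name="time")
df = pd.DataFrame({"To": np.linspace(15.0, 30.0, len(idx))}, index=idx)


def try_formats(df, folder):
    """Write df in each format, read it back, and report what happened."""
    folder = Path(folder)
    # Feather and ORC are written from a copy with a default integer index,
    # so the DatetimeIndex travels as an ordinary column named "time".
    flat = df.reset_index()
    # format name -> (file name, writer, reader)
    formats = {
        "pickle": ("tmp.pkl", lambda p: df.to_pickle(p), pd.read_pickle),
        "feather": ("tmp.feather", lambda p: flat.to_feather(p), pd.read_feather),
        "orc": ("tmp.orc", lambda p: flat.to_orc(p), pd.read_orc),
        "hdf5": (
            "tmp.h5",
            lambda p: df.to_hdf(p, key="data"),
            lambda p: pd.read_hdf(p, key="data"),
        ),
    }
    results = {}
    for name, (filename, write, read) in formats.items():
        path = str(folder / filename)
        try:
            write(path)
            back = read(path)
            results[name] = "ok" if len(back) == len(df) else "row count changed"
        except ImportError as err:
            # Raised by pandas when the optional dependency is not installed.
            results[name] = f"missing dependency: {err}"
        except Exception as err:
            # Anything else, e.g. an unsupported platform or an older pandas.
            results[name] = f"failed: {type(err).__name__}"
    return results


with tempfile.TemporaryDirectory() as folder:
    results = try_formats(df, folder)

for name, status in results.items():
    print(f"{name:8s} {status}")
```

Once a format reports `ok`, the same `%%timeit` pattern used above can be applied to its reader to compare it against CSV and Parquet on the real file.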