import pandas as pd
44 Concatena archivos con Pandas
En esta sesión, exploraremos una herramienta para esta tarea: la función pd.concat de Pandas. Nos centraremos en su aplicación en el contexto de series temporales y la importancia de entender la opción axis y su efecto en la concatenación. Al utilizar pd.concat, podemos elegir entre concatenar a lo largo del eje de las filas (axis=0) o a lo largo del eje de las columnas (axis=1). Exploraremos cómo estas opciones son fundamentales al unir series temporales de diferentes fuentes para crear una secuencia temporal continua o para combinar series temporales paralelas manteniendo sus índices de tiempo sincronizados.
pd.concat()
SyntaxError: invalid syntax (1465400318.py, line 1)
= '../data/esolmet/001.xls'
f = pd.read_excel(f,index_col=0,parse_dates=True,skiprows=[0,1,3,4])
e1 e1.head()
Directa | Global | Difusa | UV | Temperatura | Humedad | Viento Vel. | Presiòn Bar. | |
---|---|---|---|---|---|---|---|---|
Fecha | ||||||||
2018-01-01 00:00:00 | 0.057 | 0.0 | 0.0 | 0.001 | 18.93 | 41.57 | 1.253 | 879.0692 |
2018-01-01 00:10:00 | 0.002 | 0.0 | 0.0 | 0.001 | 18.76 | 41.00 | 0.418 | 879.4363 |
2018-01-01 00:20:00 | 0.170 | 0.0 | 0.0 | 0.001 | 18.92 | 40.96 | 0.955 | 879.5181 |
2018-01-01 00:30:00 | 0.371 | 0.0 | 0.0 | 0.001 | 18.52 | 42.46 | 1.823 | 879.5826 |
2018-01-01 00:40:00 | 0.305 | 0.0 | 0.0 | 0.001 | 18.49 | 42.43 | 2.149 | 879.6826 |
= '../data/esolmet/002.xls'
f = pd.read_excel(f,index_col=0,parse_dates=True,skiprows=[0,1,3,4])
e2 e2.head()
Directa | Global | Difusa | UV | Temperatura | Humedad | Viento Vel. | Presiòn Bar. | |
---|---|---|---|---|---|---|---|---|
Fecha | ||||||||
2018-02-01 00:00:00 | 0.003 | 0.0 | 0.0 | 0.001 | 15.38 | 53.71 | 1.531 | 878.2341 |
2018-02-01 00:10:00 | 0.313 | 0.0 | 0.0 | 0.001 | 15.71 | 50.03 | 2.339 | 878.1605 |
2018-02-01 00:20:00 | 0.029 | 0.0 | 0.0 | 0.001 | 15.57 | 51.52 | 2.836 | 877.9796 |
2018-02-01 00:30:00 | 0.013 | 0.0 | 0.0 | 0.001 | 15.23 | 52.00 | 2.780 | 877.9725 |
2018-02-01 00:40:00 | 0.001 | 0.0 | 0.0 | 0.001 | 15.28 | 51.80 | 2.541 | 878.0463 |
= pd.concat([e2,e1])
e e
Directa | Global | Difusa | UV | Temperatura | Humedad | Viento Vel. | Presiòn Bar. | |
---|---|---|---|---|---|---|---|---|
Fecha | ||||||||
2018-02-01 00:00:00 | 0.003 | 0.0 | 0.0 | 0.001 | 15.38 | 53.71 | 1.531 | 878.2341 |
2018-02-01 00:10:00 | 0.313 | 0.0 | 0.0 | 0.001 | 15.71 | 50.03 | 2.339 | 878.1605 |
2018-02-01 00:20:00 | 0.029 | 0.0 | 0.0 | 0.001 | 15.57 | 51.52 | 2.836 | 877.9796 |
2018-02-01 00:30:00 | 0.013 | 0.0 | 0.0 | 0.001 | 15.23 | 52.00 | 2.780 | 877.9725 |
2018-02-01 00:40:00 | 0.001 | 0.0 | 0.0 | 0.001 | 15.28 | 51.80 | 2.541 | 878.0463 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
2018-01-31 23:20:00 | 0.000 | 0.0 | 0.0 | 0.001 | 15.82 | 52.51 | 2.070 | 878.3740 |
2018-01-31 23:30:00 | 0.000 | 0.0 | 0.0 | 0.001 | 15.42 | 53.71 | 1.581 | 878.5696 |
2018-01-31 23:40:00 | 0.081 | 0.0 | 0.0 | 0.001 | 15.28 | 53.26 | 2.083 | 878.6181 |
2018-01-31 23:50:00 | 0.075 | 0.0 | 0.0 | 0.001 | 15.32 | 52.38 | 2.500 | 878.3397 |
2018-02-01 00:00:00 | 0.003 | 0.0 | 0.0 | 0.001 | 15.38 | 53.71 | 1.531 | 878.2341 |
8498 rows × 8 columns
e.sort_index()
Directa | Global | Difusa | UV | Temperatura | Humedad | Viento Vel. | Presiòn Bar. | |
---|---|---|---|---|---|---|---|---|
Fecha | ||||||||
2018-01-01 00:00:00 | 0.057 | 0.0 | 0.0 | 0.001 | 18.93 | 41.57 | 1.253 | 879.0692 |
2018-01-01 00:10:00 | 0.002 | 0.0 | 0.0 | 0.001 | 18.76 | 41.00 | 0.418 | 879.4363 |
2018-01-01 00:20:00 | 0.170 | 0.0 | 0.0 | 0.001 | 18.92 | 40.96 | 0.955 | 879.5181 |
2018-01-01 00:30:00 | 0.371 | 0.0 | 0.0 | 0.001 | 18.52 | 42.46 | 1.823 | 879.5826 |
2018-01-01 00:40:00 | 0.305 | 0.0 | 0.0 | 0.001 | 18.49 | 42.43 | 2.149 | 879.6826 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
2018-02-28 23:20:00 | 0.000 | 0.0 | 0.0 | 0.000 | 22.49 | 38.44 | 1.811 | 874.8154 |
2018-02-28 23:30:00 | 0.000 | 0.0 | 0.0 | 0.000 | 22.75 | 39.60 | 0.859 | 874.8812 |
2018-02-28 23:40:00 | 0.003 | 0.0 | 0.0 | 0.000 | 22.34 | 39.43 | 1.244 | 875.0685 |
2018-02-28 23:50:00 | 0.117 | 0.0 | 0.0 | 0.000 | 22.52 | 39.90 | 1.729 | 874.9626 |
2018-03-01 00:00:00 | 0.004 | 0.0 | 0.0 | 0.000 | 22.35 | 41.03 | 1.508 | 875.0115 |
8498 rows × 8 columns
e
Directa | Global | Difusa | UV | Temperatura | Humedad | Viento Vel. | Presiòn Bar. | |
---|---|---|---|---|---|---|---|---|
Fecha | ||||||||
2018-02-01 00:00:00 | 0.003 | 0.0 | 0.0 | 0.001 | 15.38 | 53.71 | 1.531 | 878.2341 |
2018-02-01 00:10:00 | 0.313 | 0.0 | 0.0 | 0.001 | 15.71 | 50.03 | 2.339 | 878.1605 |
2018-02-01 00:20:00 | 0.029 | 0.0 | 0.0 | 0.001 | 15.57 | 51.52 | 2.836 | 877.9796 |
2018-02-01 00:30:00 | 0.013 | 0.0 | 0.0 | 0.001 | 15.23 | 52.00 | 2.780 | 877.9725 |
2018-02-01 00:40:00 | 0.001 | 0.0 | 0.0 | 0.001 | 15.28 | 51.80 | 2.541 | 878.0463 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
2018-01-31 23:20:00 | 0.000 | 0.0 | 0.0 | 0.001 | 15.82 | 52.51 | 2.070 | 878.3740 |
2018-01-31 23:30:00 | 0.000 | 0.0 | 0.0 | 0.001 | 15.42 | 53.71 | 1.581 | 878.5696 |
2018-01-31 23:40:00 | 0.081 | 0.0 | 0.0 | 0.001 | 15.28 | 53.26 | 2.083 | 878.6181 |
2018-01-31 23:50:00 | 0.075 | 0.0 | 0.0 | 0.001 | 15.32 | 52.38 | 2.500 | 878.3397 |
2018-02-01 00:00:00 | 0.003 | 0.0 | 0.0 | 0.001 | 15.38 | 53.71 | 1.531 | 878.2341 |
8498 rows × 8 columns
=True) e.sort_index(inplace
e
Directa | Global | Difusa | UV | Temperatura | Humedad | Viento Vel. | Presiòn Bar. | |
---|---|---|---|---|---|---|---|---|
Fecha | ||||||||
2018-01-01 00:00:00 | 0.057 | 0.0 | 0.0 | 0.001 | 18.93 | 41.57 | 1.253 | 879.0692 |
2018-01-01 00:10:00 | 0.002 | 0.0 | 0.0 | 0.001 | 18.76 | 41.00 | 0.418 | 879.4363 |
2018-01-01 00:20:00 | 0.170 | 0.0 | 0.0 | 0.001 | 18.92 | 40.96 | 0.955 | 879.5181 |
2018-01-01 00:30:00 | 0.371 | 0.0 | 0.0 | 0.001 | 18.52 | 42.46 | 1.823 | 879.5826 |
2018-01-01 00:40:00 | 0.305 | 0.0 | 0.0 | 0.001 | 18.49 | 42.43 | 2.149 | 879.6826 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
2018-02-28 23:20:00 | 0.000 | 0.0 | 0.0 | 0.000 | 22.49 | 38.44 | 1.811 | 874.8154 |
2018-02-28 23:30:00 | 0.000 | 0.0 | 0.0 | 0.000 | 22.75 | 39.60 | 0.859 | 874.8812 |
2018-02-28 23:40:00 | 0.003 | 0.0 | 0.0 | 0.000 | 22.34 | 39.43 | 1.244 | 875.0685 |
2018-02-28 23:50:00 | 0.117 | 0.0 | 0.0 | 0.000 | 22.52 | 39.90 | 1.729 | 874.9626 |
2018-03-01 00:00:00 | 0.004 | 0.0 | 0.0 | 0.000 | 22.35 | 41.03 | 1.508 | 875.0115 |
8498 rows × 8 columns
=1) pd.concat([e2,e1],axis
Directa | Global | Difusa | UV | Temperatura | Humedad | Viento Vel. | Presiòn Bar. | Directa | Global | Difusa | UV | Temperatura | Humedad | Viento Vel. | Presiòn Bar. | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Fecha | ||||||||||||||||
2018-01-01 00:00:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.057 | 0.0 | 0.0 | 0.001 | 18.93 | 41.57 | 1.253 | 879.0692 |
2018-01-01 00:10:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.002 | 0.0 | 0.0 | 0.001 | 18.76 | 41.00 | 0.418 | 879.4363 |
2018-01-01 00:20:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.170 | 0.0 | 0.0 | 0.001 | 18.92 | 40.96 | 0.955 | 879.5181 |
2018-01-01 00:30:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.371 | 0.0 | 0.0 | 0.001 | 18.52 | 42.46 | 1.823 | 879.5826 |
2018-01-01 00:40:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.305 | 0.0 | 0.0 | 0.001 | 18.49 | 42.43 | 2.149 | 879.6826 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2018-02-28 23:20:00 | 0.000 | 0.0 | 0.0 | 0.0 | 22.49 | 38.44 | 1.811 | 874.8154 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2018-02-28 23:30:00 | 0.000 | 0.0 | 0.0 | 0.0 | 22.75 | 39.60 | 0.859 | 874.8812 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2018-02-28 23:40:00 | 0.003 | 0.0 | 0.0 | 0.0 | 22.34 | 39.43 | 1.244 | 875.0685 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2018-02-28 23:50:00 | 0.117 | 0.0 | 0.0 | 0.0 | 22.52 | 39.90 | 1.729 | 874.9626 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2018-03-01 00:00:00 | 0.004 | 0.0 | 0.0 | 0.0 | 22.35 | 41.03 | 1.508 | 875.0115 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
8497 rows × 16 columns
Ejercicio propuesto
Hacer una lista de glob, cargar todos los archivos usando un list comprehension y definiendo una función, concatenando todos los archivos y ordenando el índice.