17  Warnings

Welcome to this session! Today we look at warnings in Pandas and how they affect your data analysis. We will learn to interpret the warning messages that appear in your Jupyter notebook and to handle them effectively.

Warnings in Pandas are more than simple notifications; they are important indicators that can have a significant impact on your analysis. These messages do more than flag minor problems: they help you spot potential issues and improve your code.

It is important to learn to recognize warnings about functions that are being deprecated or whose syntax has changed. This lets you keep your code up to date and avoid errors in future versions of Pandas.
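As a minimal sketch of how to tell warning types apart, the block below uses Python's standard `warnings` module to record the warnings raised by a call and print their category. The function `old_mean` is hypothetical and only stands in for any library function that is being deprecated.

```python
import warnings


def old_mean(values):
    """Hypothetical deprecated helper, used only for illustration."""
    warnings.warn(
        "old_mean is deprecated; use statistics.mean instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return sum(values) / len(values)


# Record every warning raised inside the block instead of printing it
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_mean([1, 2, 3])

for w in caught:
    # The category tells you what kind of warning this is
    print(w.category.__name__, "->", w.message)
```

A `DeprecationWarning` or `FutureWarning` usually means the code still runs today but may stop working after an upgrade.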

# pip uninstall pyarrow -y
import pandas as pd
/var/folders/2z/fh3yv7r50rxgy804jm3f7b0c0000gn/T/ipykernel_62846/4080736814.py:1: DeprecationWarning: 
Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
        
  import pandas as pd
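Once you have read a warning like the one above and decided it does not affect you, one option is to silence that message only and leave every other warning visible. The sketch below does this with `warnings.filterwarnings`. It does not import pandas: it raises a simulated warning that copies only the opening words of the real one, and it assumes the real message starts with a line break, as the output above suggests.

```python
import warnings

# Simulated message: it only mimics the start of the pandas warning shown above
simulated = "\nPyarrow will become a required dependency of pandas ..."

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ignore only DeprecationWarnings whose text matches this pattern;
    # \s* absorbs the line break assumed to start the message
    warnings.filterwarnings(
        "ignore",
        message=r"\s*Pyarrow will become a required dependency",
        category=DeprecationWarning,
    )
    warnings.warn(simulated, DeprecationWarning)  # silenced
    warnings.warn("something else", DeprecationWarning)  # still reported

print([str(w.message) for w in caught])
```

In a notebook, the `filterwarnings` call would have to run before `import pandas as pd`, because the warning is raised at import time.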
f = "../data/Cuernavaca_Enero_comas.csv"
# Use the first column as the index and parse it as dates
cuerna = pd.read_csv(f, index_col=0, parse_dates=True)
cuerna
                       To  RH      P   Ws   Wd  Ig  Ib  Id
tiempo
2012-01-01 00:00:00  19.3  58  87415  0.0   26   0   0   0
2012-01-01 01:00:00  18.6  59  87602  0.0   26   0   0   0
2012-01-01 02:00:00  17.9  61  87788  0.0   30   0   0   0
2012-01-01 03:00:00  17.3  66  87554  0.0   30   0   0   0
2012-01-01 04:00:00  16.6  71  87321  0.0   27   0   0   0
...                   ...  ..    ...  ...  ...  ..  ..  ..
2012-01-31 19:00:00  17.9  41  86878  0.9  264   0   0   0
2012-01-31 20:00:00  16.9  44  86839  1.0  269   0   0   0
2012-01-31 21:00:00  16.5  45  86887  1.0  208   0   0   0
2012-01-31 22:00:00  16.3  46  87020  1.0  148   0   0   0
2012-01-31 23:00:00  16.2  45  87238  1.0  170   0   0   0

744 rows × 8 columns

# Select every row from 1 January 2012
cuerna_1dia = cuerna.loc["2012-01-01"]
cuerna_1dia.tail()
                       To  RH      P   Ws   Wd  Ig  Ib  Id
tiempo
2012-01-01 19:00:00  17.0  69  87101  0.0  269   0   0   0
2012-01-01 20:00:00  17.3  67  87115  0.0   50   0   0   0
2012-01-01 21:00:00  17.0  56  87080  0.2   85   0   0   0
2012-01-01 22:00:00  16.6  49  87089  0.5   89   0   0   0
2012-01-01 23:00:00  15.9  44  87143  0.8   93   0   0   0
# Assigning to a slice of `cuerna` triggers SettingWithCopyWarning
cuerna_1dia["To"] = 10
cuerna_1dia
/var/folders/2z/fh3yv7r50rxgy804jm3f7b0c0000gn/T/ipykernel_62846/3724519935.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cuerna_1dia["To"] = 10
                     To  RH      P   Ws   Wd   Ig   Ib  Id
tiempo
2012-01-01 00:00:00  10  58  87415  0.0   26    0    0   0
2012-01-01 01:00:00  10  59  87602  0.0   26    0    0   0
2012-01-01 02:00:00  10  61  87788  0.0   30    0    0   0
2012-01-01 03:00:00  10  66  87554  0.0   30    0    0   0
2012-01-01 04:00:00  10  71  87321  0.0   27    0    0   0
2012-01-01 05:00:00  10  76  87087  0.0   26    0    0   0
2012-01-01 06:00:00  10  72  87096  0.0   27    0    0   0
2012-01-01 07:00:00  10  70  87140  0.0   34   20  151  11
2012-01-01 08:00:00  10  68  87185  0.0   61  164  522  37
2012-01-01 09:00:00  10  60  87229  0.0   95  369  812  58
2012-01-01 10:00:00  10  64  87229  1.0  108  568  931  68
2012-01-01 11:00:00  10  68  87229  2.1  160  717  981  75
2012-01-01 12:00:00  10  60  87273  1.8  135  800  999  79
2012-01-01 13:00:00  10  53  87316  1.5  160  810  998  80
2012-01-01 14:00:00  10  53  87302  1.3  164  747  977  79
2012-01-01 15:00:00  10  53  87287  1.2  176  617  932  74
2012-01-01 16:00:00  10  53  87273  1.0  140  433  846  65
2012-01-01 17:00:00  10  64  87185  0.0  198  219  650  46
2012-01-01 18:00:00  10  69  87104  0.0  221    0    0   0
2012-01-01 19:00:00  10  69  87101  0.0  269    0    0   0
2012-01-01 20:00:00  10  67  87115  0.0   50    0    0   0
2012-01-01 21:00:00  10  56  87080  0.2   85    0    0   0
2012-01-01 22:00:00  10  49  87089  0.5   89    0    0   0
2012-01-01 23:00:00  10  44  87143  0.8   93    0    0   0
# Take an explicit copy so the assignment no longer warns
cuerna_1dia = cuerna.loc["2012-01-01"].copy()
cuerna_1dia["To"] = 10
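To see why the explicit `.copy()` matters, here is a small sketch with a synthetic frame (the name `datos` and its values are made up, standing in for `cuerna`). It checks that changing the copy leaves the original data untouched.

```python
import pandas as pd

# Small synthetic frame standing in for `cuerna` (values are made up)
idx = pd.to_datetime(
    [
        "2012-01-01 00:00",
        "2012-01-01 12:00",
        "2012-01-02 00:00",
        "2012-01-02 12:00",
    ]
)
datos = pd.DataFrame({"To": [19.3, 21.0, 18.2, 20.5]}, index=idx)

# Explicit copy: changes made here cannot reach `datos`
un_dia = datos.loc["2012-01-01"].copy()
un_dia["To"] = 10

print(un_dia["To"].tolist())  # the copy was modified
print(datos["To"].tolist())   # the original keeps its values
```

If the goal is instead to change the original frame, the warning text itself points to the alternative: assign through `.loc[row_indexer, col_indexer]` on the original object.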
# Writing a string into an integer column triggers a FutureWarning
cuerna.loc[cuerna.To>19,"RH"] = 'a'
/var/folders/2z/fh3yv7r50rxgy804jm3f7b0c0000gn/T/ipykernel_62846/2653515496.py:1: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value 'a' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
  cuerna.loc[cuerna.To>19,"RH"] = 'a'
cuerna
                       To  RH      P   Ws   Wd  Ig  Ib  Id
tiempo
2012-01-01 00:00:00  19.3   a  87415  0.0   26   0   0   0
2012-01-01 01:00:00  18.6  59  87602  0.0   26   0   0   0
2012-01-01 02:00:00  17.9  61  87788  0.0   30   0   0   0
2012-01-01 03:00:00  17.3  66  87554  0.0   30   0   0   0
2012-01-01 04:00:00  16.6  71  87321  0.0   27   0   0   0
...                   ...  ..    ...  ...  ...  ..  ..  ..
2012-01-31 19:00:00  17.9  41  86878  0.9  264   0   0   0
2012-01-31 20:00:00  16.9  44  86839  1.0  269   0   0   0
2012-01-31 21:00:00  16.5  45  86887  1.0  208   0   0   0
2012-01-31 22:00:00  16.3  46  87020  1.0  148   0   0   0
2012-01-31 23:00:00  16.2  45  87238  1.0  170   0   0   0

744 rows × 8 columns

cuerna.RH
tiempo
2012-01-01 00:00:00     a
2012-01-01 01:00:00    59
2012-01-01 02:00:00    61
2012-01-01 03:00:00    66
2012-01-01 04:00:00    71
                       ..
2012-01-31 19:00:00    41
2012-01-31 20:00:00    44
2012-01-31 21:00:00    45
2012-01-31 22:00:00    46
2012-01-31 23:00:00    45
Name: RH, Length: 744, dtype: object
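The `FutureWarning` above asks for an explicit cast before writing a value of a different type. A minimal sketch of that advice, on a synthetic frame (the name `datos` and its values are made up), assuming you really do want text and numbers in the same column:

```python
import pandas as pd

# Synthetic data standing in for `cuerna` (values are made up)
datos = pd.DataFrame({"To": [19.3, 18.6, 20.1], "RH": [58, 59, 61]})

# Cast explicitly so the column is allowed to hold strings
datos["RH"] = datos["RH"].astype(object)
datos.loc[datos["To"] > 19, "RH"] = "a"

print(datos["RH"].tolist())
print(datos["RH"].dtype)
```

Keep in mind that an `object` column no longer supports fast numeric operations, so mixing types like this is often a sign that a missing-value marker would serve better than a string.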
f = "../data/Cuernavaca_1dia_comas_duplicado.csv"

# This file contains a repeated timestamp (2012-01-01 23:00:00)
df = pd.read_csv(f, index_col=0, parse_dates=True)
df
                       To   Ws   Wd      P   Ig   Ib  Id
tiempo
2012-01-01 00:00:00  19.3  0.0   26  87415    0    0   0
2012-01-01 01:00:00  18.6  0.0   26  87602    0    0   0
2012-01-01 02:00:00  17.9  0.0   30  87788    0    0   0
2012-01-01 03:00:00  17.3  0.0   30  87554    0    0   0
2012-01-01 04:00:00  16.6  0.0   27  87321    0    0   0
2012-01-01 05:00:00  15.9  0.0   26  87087    0    0   0
2012-01-01 06:00:00  17.0  0.0   27  87096    0    0   0
2012-01-01 07:00:00  18.0  0.0   34  87140   20  151  11
2012-01-01 08:00:00  19.0  0.0   61  87185  164  522  37
2012-01-01 09:00:00  20.0  0.0   95  87229  369  812  58
2012-01-01 10:00:00  20.0  1.0  108  87229  568  931  68
2012-01-01 11:00:00  20.0  2.1  160  87229  717  981  75
2012-01-01 12:00:00  21.0  1.8  135  87273  800  999  79
2012-01-01 13:00:00  22.0  1.5  160  87316  810  998  80
2012-01-01 14:00:00  21.7  1.3  164  87302  747  977  79
2012-01-01 15:00:00  21.3  1.2  176  87287  617  932  74
2012-01-01 16:00:00  21.0  1.0  140  87273  433  846  65
2012-01-01 17:00:00  19.0  0.0  198  87185  219  650  46
2012-01-01 18:00:00  17.1  0.0  221  87104    0    0   0
2012-01-01 19:00:00  17.0  0.0  269  87101    0    0   0
2012-01-01 20:00:00  17.3  0.0   50  87115    0    0   0
2012-01-01 21:00:00  17.0  0.2   85  87080    0    0   0
2012-01-01 22:00:00  16.6  0.5   89  87089    0    0   0
2012-01-01 23:00:00  15.9  0.8   93  87143    0    0   0
2012-01-01 23:00:00  15.9  0.8   93  87143    0    0   0
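f = "../data/Cuernavaca_1dia_comas_duplicado.csv"

df = pd.read_csv(f)
df["tiempo"] = pd.to_datetime(df["tiempo"], yearfirst=True)
# Without verify_integrity the duplicate would pass silently:
# df.set_index('tiempo',inplace=True)
# verify_integrity=True raises ValueError if the new index has duplicates
df.set_index("tiempo", inplace=True, verify_integrity=True)
df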
ValueError: Index has duplicate keys: DatetimeIndex(['2012-01-01 23:00:00'], dtype='datetime64[ns]', name='tiempo', freq=None)
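The `ValueError` above names the repeated label, but it can also help to inspect the offending rows directly. The sketch below uses `Index.is_unique` and `Index.duplicated` on a synthetic frame (the name `datos` and its values are made up) to find and filter repeated index labels.

```python
import pandas as pd

# Synthetic frame with one repeated timestamp (values are made up)
idx = pd.to_datetime(
    ["2012-01-01 22:00", "2012-01-01 23:00", "2012-01-01 23:00"]
)
datos = pd.DataFrame({"To": [16.6, 15.9, 15.9]}, index=idx)

print(datos.index.is_unique)  # False when any label repeats

# keep=False marks every occurrence, so all offending rows are shown
repetidos = datos[datos.index.duplicated(keep=False)]
print(repetidos)

# keep="first" marks only the later occurrences; ~ drops them
limpio = datos[~datos.index.duplicated(keep="first")]
print(limpio.index.is_unique)
```

This route is useful when the data was already loaded with the timestamps as the index, as in the `read_csv(f, index_col=0, parse_dates=True)` call earlier.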

f = "../data/Cuernavaca_1dia_comas_duplicado.csv"

df = pd.read_csv(f)
df["tiempo"] = pd.to_datetime(df["tiempo"], yearfirst=True)
# Keep the first row for each repeated timestamp, then build the index
df.drop_duplicates(subset="tiempo", inplace=True, keep="first")
df.set_index("tiempo", inplace=True, verify_integrity=True)
df
                       To   Ws   Wd      P   Ig   Ib  Id
tiempo
2012-01-01 00:00:00  19.3  0.0   26  87415    0    0   0
2012-01-01 01:00:00  18.6  0.0   26  87602    0    0   0
2012-01-01 02:00:00  17.9  0.0   30  87788    0    0   0
2012-01-01 03:00:00  17.3  0.0   30  87554    0    0   0
2012-01-01 04:00:00  16.6  0.0   27  87321    0    0   0
2012-01-01 05:00:00  15.9  0.0   26  87087    0    0   0
2012-01-01 06:00:00  17.0  0.0   27  87096    0    0   0
2012-01-01 07:00:00  18.0  0.0   34  87140   20  151  11
2012-01-01 08:00:00  19.0  0.0   61  87185  164  522  37
2012-01-01 09:00:00  20.0  0.0   95  87229  369  812  58
2012-01-01 10:00:00  20.0  1.0  108  87229  568  931  68
2012-01-01 11:00:00  20.0  2.1  160  87229  717  981  75
2012-01-01 12:00:00  21.0  1.8  135  87273  800  999  79
2012-01-01 13:00:00  22.0  1.5  160  87316  810  998  80
2012-01-01 14:00:00  21.7  1.3  164  87302  747  977  79
2012-01-01 15:00:00  21.3  1.2  176  87287  617  932  74
2012-01-01 16:00:00  21.0  1.0  140  87273  433  846  65
2012-01-01 17:00:00  19.0  0.0  198  87185  219  650  46
2012-01-01 18:00:00  17.1  0.0  221  87104    0    0   0
2012-01-01 19:00:00  17.0  0.0  269  87101    0    0   0
2012-01-01 20:00:00  17.3  0.0   50  87115    0    0   0
2012-01-01 21:00:00  17.0  0.2   85  87080    0    0   0
2012-01-01 22:00:00  16.6  0.5   89  87089    0    0   0
2012-01-01 23:00:00  15.9  0.8   93  87143    0    0   0
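In this file the two 23:00 rows are identical, so `keep="first"` and `keep="last"` would give the same result. When the repeated rows differ, the choice matters. The sketch below compares the three `keep` options of `drop_duplicates` on synthetic data (the name `datos` and the value 99.0 are made up for illustration).

```python
import pandas as pd

# Synthetic readings: the 23:00 timestamp appears twice with different values
datos = pd.DataFrame(
    {
        "tiempo": pd.to_datetime(
            ["2012-01-01 22:00", "2012-01-01 23:00", "2012-01-01 23:00"]
        ),
        "To": [16.6, 15.9, 99.0],
    }
)

primero = datos.drop_duplicates(subset="tiempo", keep="first")  # keeps 15.9
ultimo = datos.drop_duplicates(subset="tiempo", keep="last")    # keeps 99.0
ninguno = datos.drop_duplicates(subset="tiempo", keep=False)    # drops both

print(primero["To"].tolist())
print(ultimo["To"].tolist())
print(ninguno["To"].tolist())
```

Which option is right depends on how the duplicates arose, so it is worth looking at the repeated rows before dropping any of them.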