Saltar a contenido

Polars MariaDB

Leer tabla de base de datos

Referencias

Documentaci贸n API Polars
Connector-x: Supported Sources & Destinations

pip install polars connectorx
import polars as pl

Lectura v铆a connectorx

Publicaci贸n VLDB 2022: ConnectorX: Accelerating Data Loading From Databases to Dataframes

El c贸digo siguiente permite leer los datos desde una tabla en la BD a una tabla Arrow

import connectorx as cx
import pyarrow.compute as pc
conn = 'mysql://usuario:contrase帽a@mariadb:3306/kafka'
query = 'SELECT * FROM songs'
# Leer produciendo una tabla arrow
ta = cx.read_sql(conn, query, return_type='arrow')
print("Tabla Arrow:", ta.schema)
pjta = ta.filter(pc.equal(ta['artist'], 'Pearl Jam'))
print(pjta.select(["album","title"]))
Tabla Arrow: ids: int64
title: string
album: string
duration: string
release: int64
artist: string
type: string
filename: string
fechreg: timestamp[us]
pyarrow.Table
album: string
title: string
----
album: [["Ten","Ten","Ten","Ten"]]
title: [["Once","Even Flow","Alive","Jeremy"]]

Leer datos de una tabla en un Polars Dataframe indicando el URI de conexi贸n

uri = "mysql://usuario:contrase帽a@mariadb:3306/kafka"
query = "SELECT * FROM songs"

pl.read_database_uri(query=query, uri=uri)
idstitlealbumdurationreleaseartisttypefilenamefechreg
i64strstrstri64strstrstrdatetime[渭s]
1"Black Hole Sun""Superunknown""05:06"1994"Soundgarden""Rock""metal-musics-dataset.csv"2025-04-09 07:59:48
2"Smells Like Teen Spirit""Nevermind""05:01"1991"Nirvana""Grunge""metal-musics-dataset.csv"2025-04-09 07:59:48
3"Breed""Nevermind""03:03"1991"Nirvana""Grunge""metal-musics-dataset.csv"2025-04-09 07:59:48
4"Lithium""Nevermind""04:17"1991"Nirvana""Grunge""metal-musics-dataset.csv"2025-04-09 07:59:48
5"Once""Ten""03:51"1991"Pearl Jam""Grunge""metal-musics-dataset.csv"2025-04-09 07:59:48
6"Even Flow""Ten""04:53"1991"Pearl Jam""Grunge""metal-musics-dataset.csv"2025-04-09 07:59:48
7"Alive""Ten""05:40"1991"Pearl Jam""Grunge""metal-musics-dataset.csv"2025-04-09 07:59:48
8"Jeremy""Ten""05:18"1991"Pearl Jam""Grunge""metal-musics-dataset.csv"2025-04-09 07:59:48
9"Forty Six & 2""Aenima""06:04"1994"Tool""Metal Progresivo""metal-musics-dataset.csv"2025-04-09 07:59:48
10"Lateralus""Lateralus""09:24"2001"Tool""Metal Progresivo""metal-musics-dataset.csv"2025-04-09 07:59:48

Crear conexi贸n a la base de datos mediante Alchemy

import pandas as pd
from sqlalchemy import create_engine

# Define the connection string
username = 'usuario'
password = 'contrase帽a'
host = 'mariadb:3306'
database = 'kafka'

# Creating the engine
dbcon = create_engine(f'mariadb+mariadbconnector://{username}:{password}@{host}/{database}',pool_size=10, max_overflow=20)

Consultar la base de datos usando la conexi贸n dbcon

SQLAlchemy

polars.read_database() requiere una conexi贸n SQLALchemy. Por eso, normalmente es m谩s eficiente, y m谩s simple, usar polars.read_dabase_uri()

pl.read_database(query="SELECT * FROM songs", connection=dbcon)
idstitlealbumdurationreleaseartisttypefilenamefechreg
i64strstrstri64strstrstrdatetime[渭s]
1"Black Hole Sun""Superunknown""05:06"1994"Soundgarden""Rock""metal-musics-dataset.csv"2025-04-09 07:59:48
2"Smells Like Teen Spirit""Nevermind""05:01"1991"Nirvana""Grunge""metal-musics-dataset.csv"2025-04-09 07:59:48
3"Breed""Nevermind""03:03"1991"Nirvana""Grunge""metal-musics-dataset.csv"2025-04-09 07:59:48
4"Lithium""Nevermind""04:17"1991"Nirvana""Grunge""metal-musics-dataset.csv"2025-04-09 07:59:48
5"Once""Ten""03:51"1991"Pearl Jam""Grunge""metal-musics-dataset.csv"2025-04-09 07:59:48
6"Even Flow""Ten""04:53"1991"Pearl Jam""Grunge""metal-musics-dataset.csv"2025-04-09 07:59:48
7"Alive""Ten""05:40"1991"Pearl Jam""Grunge""metal-musics-dataset.csv"2025-04-09 07:59:48
8"Jeremy""Ten""05:18"1991"Pearl Jam""Grunge""metal-musics-dataset.csv"2025-04-09 07:59:48
9"Forty Six & 2""Aenima""06:04"1994"Tool""Metal Progresivo""metal-musics-dataset.csv"2025-04-09 07:59:48
10"Lateralus""Lateralus""09:24"2001"Tool""Metal Progresivo""metal-musics-dataset.csv"2025-04-09 07:59:48

Escribir en tabla de la base de datos

# Leer datos de archivo CSV local
dfpl = pl.read_csv("Eurostar/stations.csv")

# Volcar en la base de datos
dfpl.write_database(table_name='stations', connection=dbcon, engine='sqlalchemy', if_table_exists = "replace")