o <ºiRã@s¨ddlZddlZddlZddlmZddlmZddlm Z m Z mZmZm Z dedeede fdd „Zdedeedefd d„Zdejd ejdeedejfdd„ZdS)éN)ÚTable)Úcompute)ÚAlwaysFalseÚBooleanExpressionÚEqualToÚInÚOrÚdfÚ join_colsÚreturncs|| ˆ¡ ˆ¡ g¡}tˆƒdkrtˆd|d ¡ƒS‡fdd„| ¡Dƒ}t|ƒdkr0tƒSt|ƒdkr:|dSt|ŽS)Nércs(g|]‰t tj‡fdd„ˆDƒ¡‘qS)csg|] }t|ˆ|ƒ‘qS©)r)Ú.0Úcol©Úrowr úX/home/ubuntu/veenaModal/venv/lib/python3.10/site-packages/pyiceberg/table/upsert_util.pyÚ (sz2create_match_filter...)Ú functoolsÚreduceÚoperatorÚand_)r©r rrr'sÿz'create_match_filter..)ÚselectÚgroup_byÚ aggregateÚlenrÚ to_pylistrr)r r Úunique_keysÚfiltersr rrÚcreate_match_filter!s ÿr cCs4t| |¡ |¡ gdfg¡ t d¡dk¡ƒdkS)zFCheck for duplicate rows in a PyArrow table based on the join columns.Ú count_allrr)rrrrÚfilterÚpcÚfield)r r r r rÚhas_duplicate_rows3s4r%Úsource_tableÚtarget_tablecCspt|jƒ}t|ƒ}t||ƒ}t||ƒrtdƒ‚t|ƒdkr#|j ¡Sd}d}||vs/||vr:t|›d|›dƒd‚| |j¡ |¡ |t t t|ƒƒ¡¡}| |¡ |t t t|ƒƒ¡¡} |j| t|ƒdd } g}t| | ¡| | ¡d dD]2\}} | |d¡}| | d¡}|D]}| |¡d ¡}| |¡d ¡}||krª| |¡nq‹qy|r³| |¡S|j ¡S) a Return a table with rows that need to be updated in the target table based on the join columns. The table is joined on the identifier columns, and then checked if there are any updated rows. Those are selected and everything is renamed correctly. z0Target table has duplicate rows, aborting upsertrÚ__source_indexÚ__target_indexz and zH are reserved for joining DataFrames, and cannot be used as column namesNÚinner)ÚkeysÚ join_typeT)Ústrictr)ÚsetÚcolumn_namesÚlistr%Ú ValueErrorrÚschemaÚempty_tableÚcastrÚ append_columnÚpaÚarrayÚrangeÚjoinÚziprÚsliceÚcolumnÚas_pyÚappendÚtake)r&r'r Úall_columnsÚ join_cols_setÚnon_key_colsÚSOURCE_INDEX_COLUMN_NAMEÚTARGET_INDEX_COLUMN_NAMEÚsource_indexÚtarget_indexÚmatching_indicesÚto_update_indicesÚ source_idxÚ target_idxÚ source_rowÚ target_rowÚkeyÚ source_valÚ target_valr r rÚget_rows_to_update8sP ÿý ý ý þ€ rP)rrÚpyarrowr6rÚ pyarrow_tablerr#Úpyiceberg.expressionsrrrrrr0Ústrr Úboolr%rPr r r rÚs (