Delete rows from a table where the data exists in another dataframe #57
Comments
Example of a function:
@ilyasse05: If you are trying to delete rows from the target table by comparing columns between source and target, then the code below should work.
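The code referenced in this comment did not survive in the text. A hedged reconstruction of the kind of approach it seems to describe is a `left_anti` join: keep only the target rows that have no match in the source on the comparison columns, producing a new dataframe (which is exactly what the next reply objects to). The function name and parameters here are illustrative assumptions, not the project's API.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical sketch: return the target rows whose values in
// `attrColNames` do NOT appear in `source`. A left_anti join keeps only
// unmatched left-side rows, so writing this result back over the target
// effectively "deletes" the matching rows, via an intermediate dataframe.
def rowsNotInSource(
    target: DataFrame,
    source: DataFrame,
    attrColNames: Seq[String]): DataFrame =
  target.join(source, attrColNames, "left_anti")
```

Note the trade-off the thread goes on to discuss: this returns a new dataframe rather than deleting in place on the target table.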
@puneetsharma04, we need to do it in the target table itself, without an intermediate dataframe and without returning a new dataframe.
@ilyasse05 & @MrPowers:
@MrPowers & @puneetsharma04:

    import io.delta.tables._

    def deleteFromAnotherDataframe
    val stagedUpdatesAttrs = attrColNames.map(attr => f"target.$attr = source.$attr").mkString(" AND ")

The function returns the number of deleted rows from the operation metrics. What do you think?
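Only fragments of this function survived in the thread. A fuller sketch of what it appears to describe, deleting in place on the Delta target via a merge and reading the deleted-row count from the table's operation metrics, might look like the following. The signature and variable names (`targetTable`, `sourceDf`, `attrColNames`) are assumptions reconstructed from the fragments, not the final code; `numTargetRowsDeleted` is a standard Delta merge metric.

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.DataFrame

// Hypothetical sketch: delete rows from a Delta target table whose key
// columns match rows in `sourceDf`, without building an intermediate
// dataframe, and return how many rows were deleted.
def deleteFromAnotherDataframe(
    targetTable: DeltaTable,
    sourceDf: DataFrame,
    attrColNames: Seq[String]): Long = {
  // Build "target.c1 = source.c1 AND target.c2 = source.c2 ..."
  val stagedUpdatesAttrs =
    attrColNames.map(attr => s"target.$attr = source.$attr").mkString(" AND ")

  targetTable
    .alias("target")
    .merge(sourceDf.alias("source"), stagedUpdatesAttrs)
    .whenMatched()
    .delete()
    .execute()

  // Read the deleted-row count back from the last operation's metrics.
  targetTable
    .history(1)
    .select("operationMetrics.numTargetRowsDeleted")
    .collect()
    .headOption
    .flatMap(row => Option(row.getString(0)))
    .map(_.toLong)
    .getOrElse(0L)
}
```

This requires a running Spark session with Delta Lake configured, so treat it as a sketch to iterate on in the pull request rather than a tested implementation.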
I am not sure about the above code; however, the same can be implemented in Scala.
Python code:
In this modified code, I've wrapped the main block of code inside a try block. If an exception is thrown during execution, it is caught by the except block, which prints an error message to the console. Of course, this is just an example, and you may want to handle errors differently depending on your specific use case.
@puneetsharma04 I have tested this code and it works correctly: val stagedUpdatesAttrs = attrColNames.map(attr => f"target.$attr = source.$attr").mkString(" AND ")
@ilyasse05: Then in that case, you just need to add an error-handling mechanism to the code.
@puneetsharma04 @ilyasse05 - can you please put the code in a pull request? The code should be checked with unit tests. That will make it easier for other people to run the code and make tweaks. Thanks!
@puneetsharma04 @ilyasse05 - hey! I am glad you want to contribute to the project. This feature is interesting. Which one of you wants to send the pull request, so I can assign the issue to you? Regarding the function, one observation: maybe other users would like the same functionality but with a different comparison operator. It would be good if we could make the function flexible enough that one can delete using the >, <, =, !=, etc. operators.
@MrPowers & @brayanjuls: Thanks for the recognition.

    source_df = spark.createDataFrame([
        ("Alice", 26), ("David", 40), ("Charlie", 35)], ["name", "age"])

Output should be:
@puneetsharma04 - My understanding is that the idea @ilyasse05 is proposing is to delete only from the delta table based on an input dataframe. If we assume that, given your example, the delta table is source_df and the comparison operator is equals (=), then the final state of your example is correct. But again, I think we should make the comparison operator flexible enough to support all of them (=, >, <, !=, etc.).
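The flexible-operator suggestion above can be sketched as a small helper that builds the merge/delete predicate from a configurable comparison operator. The function name and the whitelist of operators are illustrative assumptions; the `=` case reproduces the `stagedUpdatesAttrs` expression already shown in this thread.

```scala
// Hypothetical helper: build "target.c1 <op> source.c1 AND ..." with a
// configurable comparison operator, validating against a small whitelist
// so arbitrary strings cannot be injected into the predicate.
def buildPredicate(attrColNames: Seq[String], op: String): String = {
  val allowed = Set("=", ">", "<", ">=", "<=", "!=", "<>")
  require(allowed.contains(op), s"Unsupported operator: $op")
  attrColNames.map(c => s"target.$c $op source.$c").mkString(" AND ")
}
```

For example, `buildPredicate(Seq("col1", "col2"), "=")` yields `"target.col1 = source.col1 AND target.col2 = source.col2"`, and the same call with `">"` builds a greater-than predicate instead.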
@brayanjuls & @puneetsharma04:

    /**
    def deleteFromAnotherDataframe
      targetTableDelta.alias("target")
    }
@ilyasse05 - could you please open a pull request with this code, so we can iterate on it more easily?
@brayanjuls I'm not sure if I have access to do that!
@ilyasse05 - yes you can. You need to:
That should do the job. If you need help, let me know.
Thank you @brayanjuls, I will try that. I have an issue using this code; I get an error. Any idea?
@ilyasse05 - when you use
@brayanjuls I have done a pull request with a unit test and README (#66), but I'm still not able to run the unit tests locally. Do you have a tutorial for that, please?
@ilyasse05 - You need to make sure you have sbt and Scala installed, with the versions mentioned in the README of this repository. Regarding a tutorial on how to test, please take a look at the official sbt documentation: https://www.scala-sbt.org/1.x/docs/Testing.html. I will write something more concrete for this project in the coming weeks, but for now that documentation should help.
Hello,
There is an interesting function to consider: delete rows from a table when the values of some columns exist in a dataframe. I searched for a function like that in Scala/Spark but haven't found one. However, it is possible with Spark SQL using "where exists", and it works very well and performs well; there is no need for an intermediate dataframe or table, or for adding flags to delete after a merge.
Example SQL:

    DELETE FROM table1 AS A
    WHERE EXISTS (
      SELECT 1 FROM table2 AS B
      WHERE A.col1 = B.col1
        AND A.col2 = B.col2
    );
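From Scala, the same statement can be issued through `spark.sql` against Delta tables. The sketch below shows the shape only; the session configuration assumes Delta Lake is on the classpath, and `table1`/`table2` are the placeholder names from the SQL example above.

```scala
import org.apache.spark.sql.SparkSession

// Hedged sketch: run the DELETE ... WHERE EXISTS from the example above
// through Spark SQL. Assumes the Delta Lake extensions are available.
val spark = SparkSession.builder()
  .appName("delete-where-exists")
  .config("spark.sql.extensions",
    "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog",
    "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()

spark.sql("""
  DELETE FROM table1 AS A
  WHERE EXISTS (
    SELECT 1 FROM table2 AS B
    WHERE A.col1 = B.col1
      AND A.col2 = B.col2
  )
""")
```

This deletes in place on `table1`, with no intermediate dataframe, which is the behaviour the proposed Scala function would wrap.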
What do you think about this function?