2024-07-12
In Spark SQL, users can specify join hints to suggest which join strategy Spark should use. Before Spark 3.0, only the BROADCAST join hint was supported. Spark 3.0 added the MERGE, SHUFFLE_HASH, and SHUFFLE_REPLICATE_NL join hints. When different hints are specified on the two sides of a join, the priority is BROADCAST > MERGE > SHUFFLE_HASH > SHUFFLE_REPLICATE_NL. If the BROADCAST or SHUFFLE_HASH hint is specified on both sides of the join, Spark chooses the build side based on the join type and the sizes of the two relations.
    -- Join hints for broadcast join
    SELECT /*+ BROADCAST(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
    SELECT /*+ BROADCASTJOIN (t1) */ * FROM t1 LEFT JOIN t2 ON t1.key = t2.key;
    SELECT /*+ MAPJOIN(t2) */ * FROM t1 RIGHT JOIN t2 ON t1.key = t2.key;

    -- Join hints for shuffle sort merge join
    SELECT /*+ SHUFFLE_MERGE(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
    SELECT /*+ MERGEJOIN(t2) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
    SELECT /*+ MERGE(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;

    -- Join hints for shuffle hash join
    SELECT /*+ SHUFFLE_HASH(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;

    -- Join hints for shuffle-and-replicate nested loop join
    SELECT /*+ SHUFFLE_REPLICATE_NL(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;

    -- When different join strategy hints are specified on both sides of a join, Spark
    -- prioritizes the BROADCAST hint over the MERGE hint over the SHUFFLE_HASH hint
    -- over the SHUFFLE_REPLICATE_NL hint.
    -- Spark issues the following warning for the example below:
    -- org.apache.spark.sql.catalyst.analysis.HintErrorLogger: Hint (strategy=merge)
    -- is overridden by another hint and will not take effect.
    SELECT /*+ BROADCAST(t1), MERGE(t1, t2) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
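The priority rule can be illustrated with a small toy model. The sketch below is plain Python, not Spark code: `PRIORITY`, `ALIASES`, `canonical`, and `winning_hint` are hypothetical names made up for this illustration, and the function only mimics the ordering BROADCAST > MERGE > SHUFFLE_HASH > SHUFFLE_REPLICATE_NL. It does not model build-side selection, join types, or table sizes.

```python
# Toy model of the join hint priority; NOT Spark's actual implementation.
# Highest priority first, as stated in the text.
PRIORITY = ["BROADCAST", "MERGE", "SHUFFLE_HASH", "SHUFFLE_REPLICATE_NL"]

# Alias spellings that appear in the SQL examples, mapped to one canonical name.
ALIASES = {
    "BROADCASTJOIN": "BROADCAST",
    "MAPJOIN": "BROADCAST",
    "SHUFFLE_MERGE": "MERGE",
    "MERGEJOIN": "MERGE",
}


def canonical(hint):
    """Normalize a hint name (case-insensitive) to its canonical spelling."""
    name = hint.strip().upper()
    name = ALIASES.get(name, name)
    if name not in PRIORITY:
        raise ValueError(f"unknown join hint: {hint!r}")
    return name


def winning_hint(hints):
    """Return (winner, overridden) for the hints given on one join.

    winner is the highest-priority hint, or None when no hint is given;
    overridden lists the distinct lower-priority hints that lose to it.
    """
    names = [canonical(h) for h in hints]
    if not names:
        return None, []
    winner = min(names, key=PRIORITY.index)
    overridden = sorted({n for n in names if n != winner}, key=PRIORITY.index)
    return winner, overridden


# Mirrors the last SQL example: BROADCAST wins and MERGE is overridden.
print(winning_hint(["BROADCAST", "MERGE"]))  # ('BROADCAST', ['MERGE'])
```

In this model, the overridden list corresponds to the hints for which Spark would log the "is overridden by another hint" warning shown in the last SQL example.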
For how relations are used in Spark hints, see: https://blog.51cto.com/u_15435003/5296344