
Hadoop-23: Sqoop Imports Data from MySQL to HDFS (Partial Import)

2024-07-12


Chapter Content

In the previous section, we completed the following:

  • Wrote a SQL script to generate test data
  • Started the services we depend on: HDFS, Hive, YARN
  • Imported a full table into Hive with Sqoop
  • Checked the MapReduce status and the results in HDFS

Background

Here are three public cloud servers (two 2C4G and one 2C2G), used to build a Hadoop learning environment for my studies.
I had built this once before on a local VM but did not keep any notes, so this time I am taking advantage of the three machines I got a few days ago and building it on the public Internet to see how that goes.

  • h121: 2C4G
  • h122: 2C4G
  • h123: 2C2G


Precautions

Apache Sqoop moved into the Apache Attic in June 2021.
Apache Sqoop's mission was to create and maintain software for large-scale data transfer between Apache Hadoop and structured data stores.
Although the project is retired, it is still worth studying properly if you have the time; it may well be useful to you!!!

Partial import: --query

Execute the script

sqoop import \
--connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
--username hive \
--password [email protected] \
--target-dir /root \
--append \
-m 1 \
--fields-terminated-by "\t" \
--query 'select gname, serialNumber, price, stock_number,
create_time from goodtbl where price>88 and $CONDITIONS'

Explanation of the parameters above:

  • The WHERE clause of the query must contain $CONDITIONS.
  • If the query is wrapped in double quotes, $CONDITIONS must be escaped as \$CONDITIONS so that the shell does not expand it as one of its own variables (see the sketch below).
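
To make the second point concrete, here is a sketch of the same import with the query in double quotes; everything except the quoting and the escaped \$CONDITIONS is copied from the script above.

sqoop import \
--connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
--username hive \
--password [email protected] \
--target-dir /root \
--append \
-m 1 \
--fields-terminated-by "\t" \
--query "select gname, serialNumber, price, stock_number,
create_time from goodtbl where price>88 and \$CONDITIONS"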

Job Execution

You can observe that Sqoop has started the MapReduce task.

Waiting for execution

At this point, the job has started allocating Map -> Reduce tasks.
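
If you prefer the command line to the YARN web UI, you can check the running job there as well; this is a standard YARN command, not something specific to this setup:

# List the applications currently running on YARN;
# the Sqoop import shows up as a MAPREDUCE application
yarn application -list -appStates RUNNING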

View Results

You can see that the task has been completed.
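
To inspect the result without the web UI, a minimal check on HDFS might look like this; part-m-00000 is the file name a single-mapper import typically produces, so adjust it to whatever -ls actually shows:

# List the import output under the target directory
hdfs dfs -ls /root
# Peek at the first few imported rows
hdfs dfs -cat /root/part-m-00000 | head -n 5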

Partial import: Specifying columns

Execute the script

sqoop import \
--connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
--username hive \
--password [email protected] \
--target-dir /root \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--columns gname,serialNumber,price \
--table goodtbl

Explanation of the parameters above:

  • If multiple columns are listed in --columns, they must be separated by commas with no spaces in between!!!

Job Execution

This is consistent with the section above, so I will not repeat everything and only show part of the output.
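
If you want to verify from the command line that only the three selected columns landed in HDFS, a quick field count is enough; part-m-00000 is again the typical single-mapper output file:

# Count tab-separated fields per line; this should print a single value: 3
hdfs dfs -cat /root/part-m-00000 | awk -F'\t' '{print NF}' | sort -u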

Partial import: --where

Execute the script

sqoop import \
--connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
--username hive \
--password [email protected] \
--target-dir /root \
--delete-target-dir \
-m 1 \
--fields-terminated-by "\t" \
--table goodtbl \
--where "price>=68"

Job Execution

The process and results are essentially the same as above, so they are not repeated here.
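
One extra check that is worth doing for a filtered import is comparing row counts on both sides. This sketch assumes the mysql client can reach h122 and that the output file is again part-m-00000:

# Row count in MySQL for the same predicate
mysql -h h122.wzk.icu -u hive -p -e "select count(*) from sqoop.goodtbl where price>=68"
# Row count that actually landed in HDFS; the two numbers should match
hdfs dfs -cat /root/part-m-00000 | wc -l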
