Tech Sharing

Getting Started with DataX

2024-07-12


Download DataX

https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202308/datax.tar.gz

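After the download finishes, extract the archive to a local directory and change into its bin directory. From there you can run a synchronization job: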

$ cd {YOUR_DATAX_HOME}/bin
$ python datax.py {YOUR_JOB.json}

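Requirements: Python, JDK 1.8, and Maven 3. Maven is only needed if you build DataX from source; the prebuilt archive above runs without it.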
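
If you would rather launch jobs from a script than type the two commands by hand, the same call can be wrapped in a few lines of Python. This is only a sketch: build_datax_command and run_job are hypothetical helper names, the treatment of exit code 0 as success is an assumption, and it calls datax.py by its full path instead of changing into bin first.

```python
import subprocess
import sys
from pathlib import Path


def build_datax_command(datax_home, job_file, python_exe=sys.executable):
    """Return the argument list equivalent to `python datax.py job.json`."""
    # datax.py lives in the bin directory of the extracted archive.
    launcher = Path(datax_home) / "bin" / "datax.py"
    return [python_exe, str(launcher), str(job_file)]


def run_job(datax_home, job_file):
    """Run one job and return the process exit code.

    Assumption: a zero exit code means the job succeeded.
    """
    command = build_datax_command(datax_home, job_file)
    completed = subprocess.run(command, check=False)
    return completed.returncode
```

For example, run_job("/opt/datax", "stream2stream.json") would start the job, where /opt/datax stands in for wherever you extracted the archive.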

Step 1: Create the job configuration file (JSON format)

Template:

#stream2stream.json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "sliceRecordCount": 10,
            "column": [
              {
                "type": "long",
                "value": "10"
              },
              {
                "type": "string",
                "value": "hello,你好,世界-DataX"
              }
            ]
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": {
            "encoding": "UTF-8",
            "print": true
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 5
      }
    }
  }
}
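
The same configuration can also be generated rather than written by hand, which helps once you need several variants of one job. A minimal sketch, assuming you only want to vary the record count and the channel count; build_stream_job and write_job are hypothetical names, not part of DataX:

```python
import json


def build_stream_job(record_count=10, channel=5):
    """Build the stream-to-stream job shown above as a Python dict."""
    return {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "streamreader",
                        "parameter": {
                            "sliceRecordCount": record_count,
                            "column": [
                                {"type": "long", "value": "10"},
                                {"type": "string", "value": "hello,你好,世界-DataX"},
                            ],
                        },
                    },
                    "writer": {
                        "name": "streamwriter",
                        "parameter": {"encoding": "UTF-8", "print": True},
                    },
                }
            ],
            "setting": {"speed": {"channel": channel}},
        }
    }


def write_job(job, path):
    """Write the job as UTF-8 JSON, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(job, handle, ensure_ascii=False, indent=2)
```

Calling write_job(build_stream_job(), "stream2stream.json") produces a file with the same content as the template above.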

Run the job

$ cd {YOUR_DATAX_DIR_BIN}
$ python datax.py ./stream2stream.json 

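For any other reader or writer, open the DataX repository on GitHub and pick the plugin you want from the file list on the left.

Go into that plugin's resources directory and use the JSON template it already provides.

If you cannot open GitHub, that is fine too: the plugin directory inside the archive you downloaded contains the same templates.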

It really is that simple.
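
To see which templates your local copy ships with, you can scan the extracted directory. This sketch assumes the layout plugin/<reader or writer>/<plugin name>/plugin_job_template.json; both the layout and the file name are assumptions about the archive, so adjust them if your copy differs. find_job_templates is a hypothetical helper.

```python
from pathlib import Path


def find_job_templates(datax_home, template_name="plugin_job_template.json"):
    """Map each plugin name to the path of its bundled job template.

    Assumption: templates sit two levels below the plugin directory,
    e.g. plugin/reader/mysqlreader/plugin_job_template.json.
    """
    templates = {}
    for path in sorted(Path(datax_home).glob("plugin/*/*/" + template_name)):
        # The parent directory name is the plugin name, e.g. "mysqlreader".
        templates[path.parent.name] = path
    return templates
```

Printing find_job_templates("/opt/datax") lists every plugin that comes with a template, where /opt/datax again stands in for your own install path.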

Example

MySQL read/write example

{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "123123",
            "column": ["*"],
            "splitPk": "ID",
            "where": "ID <= 1888",
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://192.168.1.1:3306/xxx?useUnicode=true&characterEncoding=utf8"],
                "table": ["t_member"]
              }
            ]
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "column": ["*"],
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://192.168.1.2:3306/xxx?useUnicode=true&characterEncoding=utf8",
                "table": ["t_xxx"]
              }
            ],
            "password": "123123",
            "preSql": ["a statement to run before the write starts, e.g. one that deletes the target table's data"],
            "session": ["set session sql_mode='ANSI'"],
            "username": "root",
            "writeMode": "insert"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": "5"
      }
    }
  }
}
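
One detail in this example is easy to get wrong: the reader's jdbcUrl is a list, while the writer's jdbcUrl is a single string. The sketch below checks a job for that shape before you launch it. It only mirrors the example above and is not an official schema; check_mysql_job is a hypothetical helper.

```python
def check_mysql_job(config):
    """Return a list of problems; an empty list means the shape matches."""
    problems = []
    for index, item in enumerate(config.get("job", {}).get("content", [])):
        reader = item.get("reader", {}).get("parameter", {})
        writer = item.get("writer", {}).get("parameter", {})

        # Reader side: jdbcUrl is a list of URLs in the example above.
        for conn in reader.get("connection", []):
            if not isinstance(conn.get("jdbcUrl"), list):
                problems.append(f"content[{index}]: reader jdbcUrl should be a list")

        # Writer side: jdbcUrl is a single string in the example above.
        for conn in writer.get("connection", []):
            if not isinstance(conn.get("jdbcUrl"), str):
                problems.append(f"content[{index}]: writer jdbcUrl should be a string")

        # Both sides carry credentials and a column list in the example.
        for side, params in (("reader", reader), ("writer", writer)):
            for key in ("username", "password", "column"):
                if key not in params:
                    problems.append(f"content[{index}]: {side} is missing '{key}'")
    return problems
```

Load the job file with json.load, pass the result to check_mysql_job, and only start DataX when the returned list is empty.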