2024-07-12
Hive supports multiple data types, which are divided into two categories: primitive data types and complex data types. The following are the data types supported by Hive:
1. Numeric types:
tinyint: 1-byte signed integer
smallint: 2-byte signed integer
int: 4-byte signed integer
bigint: 8-byte signed integer
float: 4-byte single-precision floating point number
double: 8-byte double-precision floating point number
decimal: High-precision numeric type with a configurable precision and scale, such as decimal(10,2)
Note: a byte is the basic unit of storage and holds 8 bits, so a 1-byte signed integer (tinyint) ranges from -128 to 127
2. String type:
string: Variable-length strings
varchar: A variable-length string with a maximum length limit, such as varchar(255)
char: Fixed-length string, such as char(10)
3. Date/Time Type:
timestamp: A timestamp containing the date and time, accurate to nanoseconds
date: Contains only the date part, not the time part
interval: Time interval, used in expressions to represent the difference between two dates or times (it cannot be used as a table column type)
4. Boolean type:
boolean: Boolean value, true or false
5. Binary Type:
binary: byte array of arbitrary length
6. Complex types:
array<T>: An ordered list of elements of the same type, such as array<int>
map<K, V>: An unordered collection of key-value pairs, where the key must be a primitive type and the value can be of any data type, such as map<string, int>
struct<col1: type1, col2: type2, ...>: A record containing multiple named fields, each of which can be of a different data type, such as struct<name: string, age: int>
uniontype<type1, type2, ...>: A value that holds exactly one of the listed types at a time, such as uniontype<int, string>
CREATE TABLE example_table (
    tinyint_col tinyint,
    smallint_col smallint,
    int_col int,
    bigint_col bigint,
    float_col float,
    double_col double,
    decimal_col decimal(10, 2),
    string_col string,
    varchar_col varchar(255),
    char_col char(10),
    timestamp_col timestamp,
    date_col date,
    boolean_col boolean,
    binary_col binary,
    array_col array<int>,
    map_col map<string, int>,
    struct_col struct<name: string, age: int>,
    union_col uniontype<int, string>
);
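To see how several of these types behave without creating a table, the following is a minimal sketch that builds one row of sample values in a common table expression and reads them back. The CTE name sample, the column aliases, and the literal values are made up for illustration, and the query assumes a Hive version that supports CTEs and SELECT without FROM (0.13 or later).

```sql
-- Hypothetical one-row sample; all names and values are illustrative only.
WITH sample AS (
  SELECT
    100Y                                      AS tiny_val,    -- Y suffix: tinyint literal
    100S                                      AS small_val,   -- S suffix: smallint literal
    100L                                      AS big_val,     -- L suffix: bigint literal
    CAST('123.456' AS decimal(10, 2))         AS dec_val,     -- stored with a scale of 2
    CAST('2024-07-12' AS date)                AS date_val,
    array(1, 2, 3)                            AS array_col,
    map('a', 1, 'b', 2)                       AS map_col,
    named_struct('name', 'Alice', 'age', 30)  AS struct_col
)
SELECT
  tiny_val,
  small_val,
  big_val,
  dec_val,
  date_add(date_val, 1) AS next_day,     -- one day after date_val
  array_col[0]          AS first_elem,   -- arrays are indexed from 0, so this is 1
  map_col['b']          AS b_value,      -- lookup by key, so this is 2
  struct_col.name       AS person_name,  -- field access with dot notation
  size(array_col)       AS n_elems       -- number of elements, here 3
FROM sample;
```

The same access syntax (index, key lookup, and dot notation) applies to array_col, map_col, and struct_col in example_table.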
Hive storage formats fall into two categories:
The first is plain text: textfile, which is uncompressed by default and is also Hive's default storage format
The second is binary file storage. Because LOAD DATA only moves files and does not convert them, these tables are normally populated with INSERT ... SELECT from another table:
sequencefile: supports compression; plain text files cannot be loaded into it directly with LOAD DATA
orcfile: supports compression; plain text files cannot be loaded into it directly with LOAD DATA
parquet: supports compression; plain text files cannot be loaded into it directly with LOAD DATA
rcfile: supports compression; plain text files cannot be loaded into it directly with LOAD DATA. It is the predecessor of orcfile.
textfile and sequencefile are row-oriented formats; orc and parquet are column-oriented; rcfile is a hybrid of row and column storage.
When creating a table, use the stored as clause to specify its storage format, for example stored as parquet:
create table if not exists stocks_parquet (
    track_time string,
    url string,
    session_id string,
    referer string,
    ip string,
    end_user_id string,
    city_id string
)
stored as parquet;
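A table such as stocks_parquet is typically filled from a text staging table rather than from a raw file. The following is a minimal sketch of that pattern; the staging table name stocks_staging, the tab delimiter, and the path /tmp/stocks.tsv are assumptions for illustration and are not part of the original example.

```sql
-- Hypothetical text staging table with the same columns as stocks_parquet.
-- The tab delimiter is an assumption about the input file.
create table if not exists stocks_staging (
    track_time string,
    url string,
    session_id string,
    referer string,
    ip string,
    end_user_id string,
    city_id string
)
row format delimited fields terminated by '\t'
stored as textfile;

-- LOAD DATA moves the file into the table directory as-is,
-- which is fine for a textfile table. The path is a placeholder.
load data local inpath '/tmp/stocks.tsv' into table stocks_staging;

-- INSERT ... SELECT runs a job that rewrites the rows in Parquet format.
insert overwrite table stocks_parquet
select track_time, url, session_id, referer, ip, end_user_id, city_id
from stocks_staging;
```

The same pattern should apply to orc, rcfile, and sequencefile tables by changing the stored as clause of the target table.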
To change Hive's default storage format, set the following property in hive-site.xml:
<property>
    <name>hive.default.fileformat</name>
    <value>TextFile</value>
    <description>
        Expects one of [textfile, sequencefile, rcfile, orc].
        Default file format for CREATE TABLE statement. Users can explicitly override it by CREATE TABLE ... STORED AS [FORMAT]
    </description>
</property>
You can also change it for the current session with set:
set hive.default.fileformat=TextFile;
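One way to check that the setting took effect is to create a table without a stored as clause and inspect its metadata. The sketch below assumes a session in which the default is switched to ORC; the table name format_demo is hypothetical.

```sql
-- Switch the session default to ORC (any supported format name works here).
set hive.default.fileformat=ORC;

-- No STORED AS clause, so the table picks up the session default.
-- format_demo is a hypothetical table name.
create table format_demo (id int);

-- The InputFormat, OutputFormat, and SerDe rows of the output
-- should now refer to the ORC classes instead of the text ones.
describe formatted format_demo;

-- Restore the original default for the rest of the session.
set hive.default.fileformat=TextFile;
```

A set command only affects the current session, whereas the hive-site.xml property applies to every new session.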