
Python Cool Library Tour - Third-Party Library Pandas(011)

2024-07-12


Table of contents

1. Usage

25. pandas.HDFStore.get function

25-1. Grammar

25-2. Parameters

25-3. Function

25-4. Return value

25-5. Description

25-6. Usage

25-6-1. Data preparation

25-6-2. Code Example

25-6-3. Result output

26. pandas.HDFStore.select function

26-1. Grammar

26-2. Parameters

26-3. Function

26-4. Return value

26-5. Description

26-6. Usage

26-6-1. Data preparation

26-6-2. Code Example

26-6-3. Result output

27. pandas.HDFStore.info function

27-1. Grammar

27-2. Parameters

27-3. Function

27-4. Return value

27-5. Description

27-6. Usage

27-6-1. Data preparation

27-6-2. Code Examples

27-6-3. Result output

2. Recommended Reading

1. Python foundation building journey

2. Python Function Tour

3. Python Algorithm Tour

4. Python Magic Journey

5. Blog personal homepage

1. Usage

25. pandas.HDFStore.get function
25-1. Grammar
  # 25. pandas.HDFStore.get function
  HDFStore.get(key)
  Retrieve pandas object stored in file.
  Parameters:
    key : str
  Returns:
    object
      Same type as object stored in file.
25-2. Parameters

25-2-1. key (required): A string that specifies the location or name of the data to retrieve from the HDF5 file. This key usually corresponds to the name or path you used when saving the data to the HDF5 file.

25-3. Function

Used to retrieve (or get) stored data from an HDF5 file.

25-4. Return value

Generally, this function will return the pandas object associated with key, such as a DataFrame, Series, or other possible pandas container.

Specifically, the return value can be:

25-4-1. DataFrame: If the object stored under key is a table or table-like data structure, the get method returns a DataFrame. DataFrame is the main data structure in pandas for storing and manipulating structured data; it holds data in tabular form, with rows and columns.

25-4-2. Series: If the stored object is one-dimensional, such as time series data or a single column of data, the get method returns a Series, the pandas data structure for one-dimensional data (that is, an array with an index).

25-4-3. Other pandas objects: Older pandas versions could also store other object types in HDF5 files, such as Panel (Note: Panel was deprecated in pandas 0.20.0 and removed in pandas 0.25.0). This has become increasingly rare as pandas has evolved.

25-4-4. Missing key: HDFStore.get accepts only the key argument and does not support a default value (unlike DataFrame.get). If the specified key does not exist in the HDF5 file, get raises a KeyError. To handle this case, wrap the call in a try-except block that catches KeyError, or test for the key first with key in store.
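The return types and the missing-key behavior described above can be checked with a small sketch. The file name get_demo.h5, the keys frame, series and missing, and the variable names are hypothetical and chosen only for illustration; the sketch assumes PyTables is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file inside a temporary directory (not from the article).
path = os.path.join(tempfile.mkdtemp(), "get_demo.h5")

frame = pd.DataFrame({"A": [1, 2, 3]})
series = pd.Series([0.1, 0.2, 0.3], name="C")

# Store one DataFrame and one Series under separate keys.
with pd.HDFStore(path, mode="w") as store:
    store.put("frame", frame)
    store.put("series", series)

with pd.HDFStore(path, mode="r") as store:
    # get() returns the same type that was stored.
    frame_type = type(store.get("frame")).__name__
    series_type = type(store.get("series")).__name__

    # A membership test avoids the exception altogether.
    has_missing = "missing" in store

    # get() has no default argument, so a missing key raises KeyError.
    try:
        store.get("missing")
        missing_result = "found"
    except KeyError:
        missing_result = "KeyError"

print(frame_type, series_type, has_missing, missing_result)
```

Checking key in store first is usually the cheaper option when a missing key is an expected situation rather than an error.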

25-5. Description

none

25-6. Usage
25-6-1. Data preparation
25-6-2. Code Example
  # 25. pandas.HDFStore.get function
  import pandas as pd

  # Create a sample DataFrame
  data = pd.DataFrame({
      'A': [1, 2, 3, 4],
      'B': ['foo', 'bar', 'foo', 'bar'],
      'C': [0.1, 0.2, 0.3, 0.4]
  })

  # Save the data to an HDF5 file
  filename = 'example.h5'
  key = 'data'
  data.to_hdf(filename, key=key, format='table', mode='w')

  # Read the data back from the HDF5 file
  with pd.HDFStore(filename, mode='r') as store:
      df_from_hdf = store.get(key)

  # Print the data that was read
  print("Data read from HDF5:")
  print(df_from_hdf)
25-6-3. Result output
  # 25. pandas.HDFStore.get function
  # Data read from HDF5:
  #    A    B    C
  # 0  1  foo  0.1
  # 1  2  bar  0.2
  # 2  3  foo  0.3
  # 3  4  bar  0.4
26. pandas.HDFStore.select function
26-1. Grammar
  # 26. pandas.HDFStore.select function
  HDFStore.select(key, where=None, start=None, stop=None, columns=None, iterator=False, chunksize=None, auto_close=False)
  Retrieve pandas object stored in file, optionally based on where criteria.
  Warning:
    Pandas uses PyTables for reading and writing HDF5 files, which allows serializing object-dtype data with pickle when using the "fixed" format. Loading pickled data received from untrusted sources can be unsafe.
    See: https://docs.python.org/3/library/pickle.html for more.
  Parameters:
    key : str
      Object being retrieved from file.
    where : list or None
      List of Term (or convertible) objects, optional.
    start : int or None
      Row number to start selection.
    stop : int, default None
      Row number to stop selection.
    columns : list or None
      A list of columns that if not None, will limit the return columns.
    iterator : bool, default False
      Returns an iterator.
    chunksize : int or None
      Number of rows to include in iteration, return an iterator.
    auto_close : bool, default False
      Should automatically close the store when finished.
  Returns:
    object
      Retrieved object from file.
26-2. Parameters

26-2-1. key (required): The key (or path) in the HDF5 file to retrieve. This is typically the name or path you specified when saving the data to the HDF5 file.

26-2-2. where (optional, default None): The condition used to filter rows, given as a string expression such as 'index >= 10' or as a list of such expressions (Term objects). It applies only to data stored in table format, and a condition can reference only the index and the columns that were declared as data_columns when the data was written. Callable objects are not accepted.

26-2-3. start/stop (optional, default None): The starting/ending row number (0-based) of the rows to retrieve. If start and stop are specified, only rows between these two positions (including start but excluding stop) are retrieved.

26-2-4. columns (optional, default None): A list of column names to retrieve. If this parameter is specified, only data for those columns is returned.

26-2-5. iterator (optional, default False): If True, returns an iterator that yields the data chunk by chunk instead of loading the entire dataset into memory at once, which is useful for processing large datasets.

26-2-6. chunksize (optional, default None): The number of rows in each chunk. Passing chunksize on its own also makes select return an iterator, so iterator=True is not required. A smaller chunk size lowers peak memory use when processing large datasets.

26-2-7. auto_close (optional, default False): If True, the store is closed automatically when the iterator is exhausted, which helps ensure that the file is properly closed. If you intend to continue using the HDFStore object after the iterator is exhausted, leave this parameter set to False.
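As a minimal sketch of the where parameter, the following stores a small table and filters it once by index and once by a column value. The file name select_where_demo.h5 and the variable names are hypothetical; the sketch assumes PyTables is installed and that column D is declared in data_columns at write time so that it can be used in a condition.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical file inside a temporary directory (not from the article).
path = os.path.join(tempfile.mkdtemp(), "select_where_demo.h5")

data = pd.DataFrame({
    "A": np.arange(10, dtype="float64"),
    "D": [0, 1] * 5,
})

with pd.HDFStore(path, mode="w") as store:
    # format="table" enables querying; data_columns makes D usable in where.
    store.put("data", data, format="table", data_columns=["D"])

with pd.HDFStore(path, mode="r") as store:
    # Filter on the index, which is always queryable in table format.
    by_index = store.select("data", where="index >= 7")
    # Filter on a column that was declared as a data column.
    by_column = store.select("data", where="D == 1")

print(by_index.index.tolist())
print(by_column["A"].tolist())
```

The filtering happens while reading from the file, so only the matching rows are loaded into memory.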

26-3. Function

Retrieves a pandas object (such as a DataFrame or Series) stored under a specific key from an HDF5 file, and allows the user to filter or control the retrieved data based on a range of parameters.
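One common use of this control over retrieval is to aggregate a table chunk by chunk without loading it all at once. The sketch below is only an illustration: the file name select_chunks_demo.h5 and the variable names are hypothetical, and it assumes PyTables is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file inside a temporary directory (not from the article).
path = os.path.join(tempfile.mkdtemp(), "select_chunks_demo.h5")

data = pd.DataFrame({"A": range(25)})

with pd.HDFStore(path, mode="w") as store:
    store.put("data", data, format="table")

chunk_sizes = []
total = 0
with pd.HDFStore(path, mode="r") as store:
    # chunksize makes select return an iterator of DataFrames.
    for chunk in store.select("data", chunksize=10):
        chunk_sizes.append(len(chunk))
        total += int(chunk["A"].sum())

print(chunk_sizes, total)
```

Each chunk is an ordinary DataFrame, so any per-chunk computation can be accumulated in the loop.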

26-4. Return value

  The return value depends on the type of data associated with key stored in the HDF5 file and the query condition (if any). Typically, the return value is a pandas object, such as:

26-4-1. DataFrame: If the retrieved data is in tabular form, a DataFrame object is returned. This is also the case when the columns argument names a single column; the result is then a one-column DataFrame.

26-4-2. Series: If the object stored under key is a Series (one-dimensional data), a Series object is returned. To read one column of a stored table as a Series, use HDFStore.select_column instead.

26-4-3. Other pandas objects: In theory, other pandas containers are possible, but in the context of HDF5 files, the most common ones are DataFrame and Series.
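A quick way to see which type comes back is to inspect the results directly. In this sketch the file name select_types_demo.h5 and the variable names are hypothetical; it assumes PyTables is installed, and it declares column A in data_columns because select_column works only on the index and on data columns.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file inside a temporary directory (not from the article).
path = os.path.join(tempfile.mkdtemp(), "select_types_demo.h5")

data = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})

with pd.HDFStore(path, mode="w") as store:
    store.put("data", data, format="table", data_columns=["A"])

with pd.HDFStore(path, mode="r") as store:
    # Restricting columns still yields a DataFrame, here with one column.
    one_col = store.select("data", columns=["A"])
    # select_column reads a single data column as a Series.
    as_series = store.select_column("data", "A")

print(type(one_col).__name__, list(one_col.columns))
print(type(as_series).__name__, as_series.tolist())
```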

26-5. Description

none

26-6. Usage
26-6-1. Data preparation
26-6-2. Code Example
  # 26. pandas.HDFStore.select function
  import pandas as pd
  import numpy as np

  # Create a sample DataFrame
  np.random.seed(0)  # Set the random seed so the results are reproducible
  data = pd.DataFrame({
      'A': np.random.randn(100),
      'B': np.random.randn(100),
      'C': np.random.randn(100),
      'D': np.random.randint(0, 2, 100)
  })

  # Save the DataFrame to an HDF5 file
  with pd.HDFStore('example.h5') as store:
      store.put('data', data, format='table')

  # Examples of retrieving data from the HDF5 file
  with pd.HDFStore('example.h5') as store:
      # Select all data
      print("\nAll data:")
      all_data = store.select('data')
      print(all_data.head())  # Print only the first few rows to save space

      # Select specific columns
      print("\nSpecific columns (A, B):")
      specific_columns = store.select('data', columns=['A', 'B'])
      print(specific_columns.head())

      # Select a range of rows (start/stop are 0-based row positions; stop is excluded)
      print("\nPartial data (rows 10 to 19):")
      partial_data = store.select('data', start=10, stop=20)
      print(partial_data)

      # Use chunksize to read the data chunk by chunk
      print("\nData read in chunks:")
      chunks = store.select('data', chunksize=10)
      for i, chunk in enumerate(chunks):
          print(f"Chunk {i + 1}:")
          print(chunk.head())  # Print only the first few rows of each chunk
26-6-3. Result output
  # 26. pandas.HDFStore.select function
  # All data:
  #           A         B         C  D
  # 0  1.764052  1.883151 -0.369182  0
  # 1  0.400157 -1.347759 -0.239379  0
  # 2  0.978738 -1.270485  1.099660  1
  # 3  2.240893  0.969397  0.655264  1
  # 4  1.867558 -1.173123  0.640132  0
  #
  # Specific columns (A, B):
  #           A         B
  # 0  1.764052  1.883151
  # 1  0.400157 -1.347759
  # 2  0.978738 -1.270485
  # 3  2.240893  0.969397
  # 4  1.867558 -1.173123
  #
  # Partial data (rows 10 to 19):
  #            A         B         C  D
  # 10  0.144044  1.867559  0.910179  0
  # 11  1.454274  0.906045  0.317218  0
  # 12  0.761038 -0.861226  0.786328  1
  # 13  0.121675  1.910065 -0.466419  0
  # 14  0.443863 -0.268003 -0.944446  0
  # 15  0.333674  0.802456 -0.410050  0
  # 16  1.494079  0.947252 -0.017020  1
  # 17 -0.205158 -0.155010  0.379152  1
  # 18  0.313068  0.614079  2.259309  0
  # 19 -0.854096  0.922207 -0.042257  0
  #
  # Data read in chunks:
  # Chunk 1:
  #           A         B         C  D
  # 0  1.764052  1.883151 -0.369182  0
  # 1  0.400157 -1.347759 -0.239379  0
  # 2  0.978738 -1.270485  1.099660  1
  # 3  2.240893  0.969397  0.655264  1
  # 4  1.867558 -1.173123  0.640132  0
  # Chunk 2:
  #            A         B         C  D
  # 10  0.144044  1.867559  0.910179  0
  # 11  1.454274  0.906045  0.317218  0
  # 12  0.761038 -0.861226  0.786328  1
  # 13  0.121675  1.910065 -0.466419  0
  # 14  0.443863 -0.268003 -0.944446  0
  # Chunk 3:
  #            A         B         C  D
  # 20 -2.552990  0.376426 -0.955945  0
  # 21  0.653619 -1.099401 -0.345982  1
  # 22  0.864436  0.298238 -0.463596  0
  # 23 -0.742165  1.326386  0.481481  0
  # 24  2.269755 -0.694568 -1.540797  1
  # Chunk 4:
  #            A         B         C  D
  # 30  0.154947 -0.769916 -1.424061  1
  # 31  0.378163  0.539249 -0.493320  0
  # 32 -0.887786 -0.674333 -0.542861  0
  # 33 -1.980796  0.031831  0.416050  1
  # 34 -0.347912 -0.635846 -1.156182  1
  # Chunk 5:
  #            A         B         C  D
  # 40 -1.048553 -1.491258 -0.637437  0
  # 41 -1.420018  0.439392 -0.397272  1
  # 42 -1.706270  0.166673 -0.132881  0
  # 43  1.950775  0.635031 -0.297791  0
  # 44 -0.509652  2.383145 -0.309013  0
  # Chunk 6:
  #            A         B         C  D
  # 50 -0.895467 -0.068242  0.521065  1
  # 51  0.386902  1.713343 -0.575788  1
  # 52 -0.510805 -0.744755  0.141953  0
  # 53 -1.180632 -0.826439 -0.319328  0
  # 54 -0.028182 -0.098453  0.691539  1
  # Chunk 7:
  #            A         B         C  D
  # 60 -0.672460 -0.498032 -1.188859  1
  # 61 -0.359553  1.929532 -0.506816  1
  # 62 -0.813146  0.949421 -0.596314  0
  # 63 -1.726283  0.087551 -0.052567  0
  # 64  0.177426 -1.225436 -1.936280  0
  # Chunk 8:
  #            A         B         C  D
  # 70  0.729091  0.920859  0.399046  0
  # 71  0.128983  0.318728 -2.772593  1
  # 72  1.139401  0.856831  1.955912  0
  # 73 -1.234826 -0.651026  0.390093  1
  # 74  0.402342 -1.034243 -0.652409  1
  # Chunk 9:
  #            A         B         C  D
  # 80 -1.165150 -0.353994 -0.110541  0
  # 81  0.900826 -1.374951  1.020173  0
  # 82  0.465662 -0.643618 -0.692050  1
  # 83 -1.536244 -2.223403  1.536377  0
  # 84  1.488252  0.625231  0.286344  0
  # Chunk 10:
  #            A         B         C  D
  # 90 -0.403177 -1.292857 -0.628088  1
  # 91  1.222445  0.267051 -0.481027  1
  # 92  0.208275 -0.039283  2.303917  0
  # 93  0.976639 -1.168093 -1.060016  1
  # 94  0.356366  0.523277 -0.135950  0
27. pandas.HDFStore.info function
27-1. Grammar
  # 27. pandas.HDFStore.info function
  HDFStore.info()
  Print detailed information on the store.
  Returns:
    str
27-2. Parameters

none

27-3. Function

Provides detailed information about the datasets (also called keys or nodes) stored in the HDF5 file.

27-4. Return value

Returns a string (str) that describes the store: the store class, the file path, and one line per stored key. The method does not print anything itself; pass the returned string to print() to display it on the console (or standard output).
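The following sketch captures the result of info() in a variable and inspects it. The file name info_demo.h5 and the variable name summary are hypothetical, and PyTables is assumed to be installed; the exact wording of the summary can differ between pandas versions, so it is printed rather than reproduced here.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file inside a temporary directory (not from the article).
path = os.path.join(tempfile.mkdtemp(), "info_demo.h5")

with pd.HDFStore(path, mode="w") as store:
    store.put("data", pd.DataFrame({"A": [1, 2, 3]}), format="table")
    # info() returns the summary as a string; nothing is printed yet.
    summary = store.info()

print(type(summary).__name__)
print(summary)
```

Because the summary is an ordinary string, it can also be written to a log file or searched for a particular key.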

27-5. Description

none

27-6. Usage
27-6-1. Data preparation
27-6-2. Code Examples
  # 27. pandas.HDFStore.info function
  import pandas as pd
  import numpy as np

  # Create a DataFrame of random numbers
  data = pd.DataFrame({
      'A': np.random.randn(100),
      'B': np.random.randn(100),
      'C': np.random.randn(100),
      'D': np.random.randint(0, 2, 100)
  })

  # Write the data to an HDF5 file
  with pd.HDFStore('example.h5') as store:
      store.put('data', data, format='table')

  # Use HDFStore.info() to get information about the HDF5 file
  with pd.HDFStore('example.h5') as store:
      # info() returns a string, so print it explicitly
      print(store.info())

      # Read the data back to confirm
      all_data = store.select('data')
      print("\nAll data (first 5 rows):")
      print(all_data.head())
27-6-3. Result output
  # 27. pandas.HDFStore.info function
  # (The summary string returned by store.info() is printed first: the store class, the file path, and one line per stored key.)
  #
  # All data (first 5 rows):
  #           A         B         C  D
  # 0 -1.186803 -0.983345  0.661022  1
  # 1  0.549244 -0.429500 -0.022329  1
  # 2  1.408989  0.779268  0.079574  1
  # 3 -1.178696  0.918125  0.174332  0
  # 4 -0.538677 -0.124535 -1.165208  1

2. Recommended Reading

1. Python foundation building journey
2. Python Function Tour
3. Python Algorithm Tour
4. Python Magic Journey
5. Blog personal homepage