Support basic import/export for parquet format. by Ognimalf · Pull Request #1446 · infiniflow/infinity · GitHub

Support basic import/export for parquet format. #1446

Merged: 24 commits, Jul 21, 2024
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -76,7 +76,7 @@ jobs:
- name: Build release version
run: |
sed -i "s/^version = \".*\"/version = \"$(echo $RELEASE_TAG | cut -c2-)\"/" pyproject.toml
sudo docker exec ${BUILDER_CONTAINER} bash -c "git config --global safe.directory \"*\" && cd /infinity && rm -fr cmake-build-release && mkdir -p cmake-build-release && cmake -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCPACK_PACKAGE_VERSION=${{ env.RELEASE_TAG }} -DCPACK_DEBIAN_PACKAGE_ARCHITECTURE=amd64 -DCMAKE_JOB_POOLS:STRING='link=1' -S /infinity -B /infinity/cmake-build-release && cmake --build /infinity/cmake-build-release --target infinity"
sudo docker exec ${BUILDER_CONTAINER} bash -c "git config --global safe.directory \"*\" && cd /infinity && rm -fr cmake-build-release && mkdir -p cmake-build-release && cmake -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DARROW_BUILD_SHARED=OFF -DARROW_ENABLE_TIMING_TESTS=OFF -DARROW_GGDB_DEBUG=OFF -DARROW_PARQUET=ON -DARROW_DEPENDENCY_USE_SHARED=OFF -DCPACK_PACKAGE_VERSION=${{ env.RELEASE_TAG }} -DCPACK_DEBIAN_PACKAGE_ARCHITECTURE=amd64 -DCMAKE_JOB_POOLS:STRING='link=1' -S /infinity -B /infinity/cmake-build-release && cmake --build /infinity/cmake-build-release --target infinity"

- name: Download resources
run: rm -rf resource && git clone --depth=1 https://github.com/infiniflow/resource.git
2 changes: 1 addition & 1 deletion .github/workflows/slow_test.yml
@@ -37,7 +37,7 @@ jobs:

- name: Build release version
if: ${{ !cancelled() && !failure() }}
run: sudo docker exec ${BUILDER_CONTAINER} bash -c "git config --global safe.directory \"*\" && cd /infinity && rm -fr cmake-build-release && mkdir -p cmake-build-release && cmake -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_JOB_POOLS:STRING=link=4 -S /infinity -B /infinity/cmake-build-release && cmake --build /infinity/cmake-build-release"
run: sudo docker exec ${BUILDER_CONTAINER} bash -c "git config --global safe.directory \"*\" && cd /infinity && rm -fr cmake-build-release && mkdir -p cmake-build-release && cmake -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DARROW_BUILD_SHARED=OFF -DARROW_ENABLE_TIMING_TESTS=OFF -DARROW_GGDB_DEBUG=OFF -DARROW_PARQUET=ON -DARROW_DEPENDENCY_USE_SHARED=OFF -DCMAKE_JOB_POOLS:STRING=link=4 -S /infinity -B /infinity/cmake-build-release && cmake --build /infinity/cmake-build-release"

- name: Download resources
run: rm -rf resource && git clone --depth=1 https://github.com/infiniflow/resource.git
4 changes: 2 additions & 2 deletions .github/workflows/tests.yml
@@ -58,7 +58,7 @@ jobs:

- name: Build debug version
if: ${{ !cancelled() && !failure() }}
run: sudo docker exec ${BUILDER_CONTAINER} bash -c "git config --global safe.directory \"*\" && cd /infinity && rm -fr cmake-build-debug && mkdir -p cmake-build-debug && cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug -DCMAKE_JOB_POOLS:STRING=link=4 -S /infinity -B /infinity/cmake-build-debug && cmake --build /infinity/cmake-build-debug --target infinity test_main"
run: sudo docker exec ${BUILDER_CONTAINER} bash -c "git config --global safe.directory \"*\" && cd /infinity && rm -fr cmake-build-debug && mkdir -p cmake-build-debug && cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug -DARROW_BUILD_SHARED=OFF -DARROW_ENABLE_TIMING_TESTS=OFF -DARROW_GGDB_DEBUG=OFF -DARROW_PARQUET=ON -DARROW_DEPENDENCY_USE_SHARED=OFF -DENABLE_JEMALLOC=OFF -DCMAKE_JOB_POOLS:STRING=link=4 -S /infinity -B /infinity/cmake-build-debug && cmake --build /infinity/cmake-build-debug --target infinity test_main"

- name: Unit test debug version
if: ${{ !cancelled() && !failure() }}
@@ -146,7 +146,7 @@ jobs:

- name: Build release version
if: ${{ !cancelled() && !failure() }}
run: sudo docker exec ${BUILDER_CONTAINER} bash -c "git config --global safe.directory \"*\" && cd /infinity && rm -fr cmake-build-release && mkdir -p cmake-build-release && cmake -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_JOB_POOLS:STRING=link=4 -S /infinity -B /infinity/cmake-build-release && cmake --build /infinity/cmake-build-release --target infinity test_main knn_import_benchmark knn_query_benchmark"
run: sudo docker exec ${BUILDER_CONTAINER} bash -c "git config --global safe.directory \"*\" && cd /infinity && rm -fr cmake-build-release && mkdir -p cmake-build-release && cmake -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DARROW_BUILD_SHARED=OFF -DARROW_ENABLE_TIMING_TESTS=OFF -DARROW_GGDB_DEBUG=OFF -DARROW_PARQUET=ON -DARROW_DEPENDENCY_USE_SHARED=OFF -DENABLE_JEMALLOC=OFF -DCMAKE_JOB_POOLS:STRING=link=4 -S /infinity -B /infinity/cmake-build-release && cmake --build /infinity/cmake-build-release --target infinity test_main knn_import_benchmark knn_query_benchmark"

- name: Unit test release version
if: ${{ !cancelled() && !failure() }}
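
The workflow commands in this PR repeat the same Arrow/Parquet options in every configure step. As a rough sketch only, the shared options could be collected once and reused. The `ARROW_FLAGS` variable and the `configure_cmd` helper are hypothetical names for illustration and are not part of this PR; the flag values are copied from the commands above.

```shell
# Arrow/Parquet options copied from the workflow commands in this PR.
ARROW_FLAGS="-DARROW_BUILD_SHARED=OFF -DARROW_ENABLE_TIMING_TESTS=OFF -DARROW_GGDB_DEBUG=OFF -DARROW_PARQUET=ON -DARROW_DEPENDENCY_USE_SHARED=OFF"

# Hypothetical helper: print (not run) the configure command
# for a given build type and build directory.
configure_cmd() {
    build_type="$1"
    build_dir="$2"
    echo "cmake -G Ninja -DCMAKE_BUILD_TYPE=${build_type} ${ARROW_FLAGS} -DCMAKE_JOB_POOLS:STRING=link=4 -S /infinity -B ${build_dir}"
}

# Show the command that a debug build would use.
configure_cmd Debug /infinity/cmake-build-debug
```

Printing the command instead of running it keeps the sketch safe to try outside the builder container; the output can be inspected before it is passed to `bash -c`.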
35 changes: 21 additions & 14 deletions CMakeLists.txt
@@ -116,18 +116,22 @@ elseif ("${CMAKE_BUILD_TYPE}" STREQUAL "Debug")
set(CMAKE_CXX_FLAGS "-O0 -g")
set(CMAKE_C_FLAGS "-O0 -g")

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fno-stack-protector -fno-var-tracking ")
add_compile_options(-fsanitize=address -fsanitize-recover=all -fsanitize=leak)
add_link_options(-fsanitize=address -fsanitize-recover=all -fsanitize=leak)
if(NOT ENABLE_JEMALLOC)

add_compile_options("-fno-omit-frame-pointer")
add_link_options("-fno-omit-frame-pointer")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fno-stack-protector -fno-var-tracking ")
add_compile_options(-fsanitize=address -fsanitize-recover=all -fsanitize=leak)
add_link_options(-fsanitize=address -fsanitize-recover=all -fsanitize=leak)

# add_compile_options("-fsanitize=undefined")
# add_link_options("-fsanitize=undefined")
add_compile_options("-fno-omit-frame-pointer")
add_link_options("-fno-omit-frame-pointer")

# add_compile_options("-fsanitize=thread")
# add_link_options("-fsanitize=thread")
# add_compile_options("-fsanitize=undefined")
# add_link_options("-fsanitize=undefined")

# add_compile_options("-fsanitize=thread")
# add_link_options("-fsanitize=thread")

endif()

set(CMAKE_DEBUG_POSTFIX "")

@@ -166,12 +170,15 @@ endif()
find_package(Lz4 REQUIRED)

# You can disable jemalloc by passing the `-DENABLE_JEMALLOC=OFF` option to CMake.
option(ENABLE_JEMALLOC "Enable jemalloc support" ON)
if(ENABLE_JEMALLOC AND NOT "${CMAKE_BUILD_TYPE}" STREQUAL "Debug")
option(ENABLE_JEMALLOC "Enable jemalloc support" OFF)
if(ENABLE_JEMALLOC)
find_package(jemalloc REQUIRED)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DENABLE_JEMALLOC_PROF")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DENABLE_JEMALLOC_PROF")
endif()
set(JEMALLOC_STATIC_LIB "jemalloc.a")
if(NOT "${CMAKE_BUILD_TYPE}" STREQUAL "Debug")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DENABLE_JEMALLOC_PROF")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DENABLE_JEMALLOC_PROF")
endif ()
endif ()

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC")

106 changes: 82 additions & 24 deletions benchmark/local_infinity/CMakeLists.txt
@@ -6,21 +6,32 @@ add_executable(infinity_benchmark
target_include_directories(infinity_benchmark PUBLIC "${CMAKE_SOURCE_DIR}/src")
target_link_libraries(
infinity_benchmark
infinity_core
benchmark_profiler
infinity_core
sql_parser
onnxruntime_mlas
zsv_parser
newpfor
fastpfor
# profiler
jma
opencc
dl
parquet.a
arrow.a
thrift.a
thriftnb.a
lz4.a
atomic.a
event.a
c++.a
c++abi.a
jma
opencc
${JEMALLOC_STATIC_LIB}
)

target_link_directories(infinity_benchmark PUBLIC "${CMAKE_BINARY_DIR}/lib")
target_link_directories(infinity_benchmark PUBLIC "${CMAKE_BINARY_DIR}/third_party/arrow/")

# ########################################
# knn
# import benchmark
@@ -38,14 +49,25 @@ target_link_libraries(
zsv_parser
newpfor
fastpfor
jma
opencc
dl
lz4.a
atomic.a
event.a
c++.a
c++abi.a
jma
opencc
# profiler
parquet.a
arrow.a
thrift.a
thriftnb.a
${JEMALLOC_STATIC_LIB}
)

target_link_directories(knn_import_benchmark BEFORE PUBLIC "${CMAKE_BINARY_DIR}/lib")
target_link_directories(knn_import_benchmark PUBLIC "${CMAKE_BINARY_DIR}/third_party/arrow/")

# query benchmark
add_executable(knn_query_benchmark
./knn/knn_query_benchmark.cpp
@@ -61,14 +83,23 @@ target_link_libraries(
zsv_parser
newpfor
fastpfor
jma
opencc
dl
lz4.a
atomic.a
c++.a
c++abi.a
jma
opencc
parquet.a
arrow.a
thrift.a
thriftnb.a
${JEMALLOC_STATIC_LIB}
)

target_link_directories(knn_query_benchmark BEFORE PUBLIC "${CMAKE_BINARY_DIR}/lib")
target_link_directories(knn_query_benchmark PUBLIC "${CMAKE_BINARY_DIR}/third_party/arrow/")

# ########################################
# fulltext
# import benchmark
@@ -86,14 +117,23 @@ target_link_libraries(
zsv_parser
newpfor
fastpfor
jma
opencc
dl
lz4.a
atomic.a
c++.a
c++abi.a
jma
opencc
parquet.a
arrow.a
thrift.a
thriftnb.a
${JEMALLOC_STATIC_LIB}
)

target_link_directories(fulltext_benchmark BEFORE PUBLIC "${CMAKE_BINARY_DIR}/lib")
target_link_directories(fulltext_benchmark PUBLIC "${CMAKE_BINARY_DIR}/third_party/arrow/")

# ########################################
add_executable(sparse_benchmark
./sparse/sparse_benchmark.cpp
@@ -109,14 +149,23 @@ target_link_libraries(
zsv_parser
newpfor
fastpfor
jma
opencc
dl
lz4.a
atomic.a
jma
c++.a
c++abi.a
opencc
parquet.a
arrow.a
thrift.a
thriftnb.a
${JEMALLOC_STATIC_LIB}
)

target_link_directories(sparse_benchmark BEFORE PUBLIC "${CMAKE_BINARY_DIR}/lib")
target_link_directories(sparse_benchmark PUBLIC "${CMAKE_BINARY_DIR}/third_party/arrow/")

add_executable(bmp_benchmark
./sparse/bmp_benchmark.cpp
)
@@ -131,14 +180,23 @@ target_link_libraries(
zsv_parser
newpfor
fastpfor
jma
opencc
dl
lz4.a
atomic.a
jma
c++.a
c++abi.a
opencc
parquet.a
arrow.a
thrift.a
thriftnb.a
${JEMALLOC_STATIC_LIB}
)

target_link_directories(bmp_benchmark BEFORE PUBLIC "${CMAKE_BINARY_DIR}/lib")
target_link_directories(bmp_benchmark PUBLIC "${CMAKE_BINARY_DIR}/third_party/arrow/")

add_executable(hnsw_benchmark
./knn/hnsw_benchmark.cpp
)
@@ -153,23 +211,23 @@ target_link_libraries(
zsv_parser
newpfor
fastpfor
jma
opencc
dl
lz4.a
atomic.a
jma

c++.a
c++abi.a
opencc
parquet.a
arrow.a
thrift.a
thriftnb.a
${JEMALLOC_STATIC_LIB}
)

if(ENABLE_JEMALLOC)
target_link_libraries(infinity_benchmark jemalloc.a)
target_link_libraries(knn_import_benchmark jemalloc.a)
target_link_libraries(knn_query_benchmark jemalloc.a)
target_link_libraries(fulltext_benchmark jemalloc.a)
target_link_libraries(sparse_benchmark jemalloc.a)
target_link_libraries(bmp_benchmark jemalloc.a)
target_link_libraries(hnsw_benchmark jemalloc.a)
endif()
target_link_directories(hnsw_benchmark BEFORE PUBLIC "${CMAKE_BINARY_DIR}/lib")
target_link_directories(hnsw_benchmark PUBLIC "${CMAKE_BINARY_DIR}/third_party/arrow/")

# add_definitions(-march=native)
# add_definitions(-msse4.2 -mfma)
13 changes: 7 additions & 6 deletions benchmark/remote_infinity/CMakeLists.txt
@@ -11,6 +11,7 @@ target_include_directories(remote_query_benchmark PUBLIC "${CMAKE_SOURCE_DIR}/sr
target_include_directories(remote_query_benchmark PUBLIC "${CMAKE_SOURCE_DIR}/third_party/thrift/lib/cpp/src")
target_include_directories(remote_query_benchmark PUBLIC "${CMAKE_BINARY_DIR}/third_party/thrift/")
target_link_directories(remote_query_benchmark PUBLIC "${CMAKE_BINARY_DIR}/lib")
target_link_directories(remote_query_benchmark PUBLIC "${CMAKE_BINARY_DIR}/third_party/arrow/")

target_link_libraries(
remote_query_benchmark
@@ -21,19 +22,19 @@ target_link_libraries(
zsv_parser
newpfor
fastpfor
jma
opencc
dl
lz4.a
atomic.a
thrift.a
c++.a
c++abi.a
jma
opencc
parquet.a
arrow.a
${JEMALLOC_STATIC_LIB}
)

if(ENABLE_JEMALLOC)
target_link_libraries(remote_query_benchmark jemalloc.a)
endif()

# add_definitions(-march=native)
# add_definitions(-msse4.2 -mfma)
# add_definitions(-mavx2 -mf16c -mpopcnt)
14 changes: 14 additions & 0 deletions docs/getstarted/build_from_source.md
@@ -136,6 +136,20 @@ sudo ln -s /usr/bin/clang-format-18 /usr/bin/clang-format
sudo ln -s /usr/bin/clang-tidy-18 /usr/bin/clang-tidy
sudo ln -s /usr/bin/llvm-symbolizer-18 /usr/bin/llvm-symbolizer
sudo ln -s /usr/lib/llvm-18/include/x86_64-pc-linux-gnu/c++/v1/__config_site /usr/lib/llvm-18/include/c++/v1/__config_site
sudo apt install -y -V ca-certificates lsb-release
wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
sudo apt update
sudo apt install -y -V libarrow-dev libparquet-dev
wget https://github.com/infiniflow/arrow/archive/refs/heads/main.zip -O arrow.zip
unzip arrow.zip
cd arrow-main && cd cpp && mkdir build && cd build
export CC=/usr/bin/clang-18
export CXX=/usr/bin/clang++-18
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DARROW_BUILD_SHARED=OFF -DARROW_ENABLE_TIMING_TESTS=OFF -DARROW_GGDB_DEBUG=OFF -DARROW_PARQUET=ON ..
ninja -j 0 arrow_static parquet_static
sudo cp ./release/libarrow.a /usr/lib/x86_64-linux-gnu/libarrow.a
sudo cp ./release/libparquet.a /usr/lib/x86_64-linux-gnu/libparquet.a
cd ../../../ && rm -rf arrow-main
```

### Step2 Download Source Code
4 changes: 4 additions & 0 deletions python/README.md
@@ -41,6 +41,10 @@ Build the debug version of infinity-sdk in the target location `cmake-build-debu
```shell
pip install . -v --config-settings=cmake.build-type="Debug" --config-settings=build-dir="cmake-build-debug"
```
Note: If you run with the release version and turn jemalloc compile flag on, you must set environment variable, for example
```shell
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so python3 example/simple_example.py
```
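
The jemalloc note above assumes the shared library is installed at that exact path. A minimal sketch of a wrapper that preloads jemalloc only when the file exists might look like the following. The `run_with_jemalloc` function and the `JEMALLOC_LIB` variable are hypothetical names introduced here for illustration; only the default library path comes from the note.

```shell
# Default path taken from the README note; override JEMALLOC_LIB if yours differs.
JEMALLOC_LIB="${JEMALLOC_LIB:-/usr/lib/x86_64-linux-gnu/libjemalloc.so}"

# Hypothetical helper: run a command with jemalloc preloaded when the
# library file is present, otherwise warn and run the command unchanged.
run_with_jemalloc() {
    if [ -f "$JEMALLOC_LIB" ]; then
        LD_PRELOAD="$JEMALLOC_LIB" "$@"
    else
        echo "warning: $JEMALLOC_LIB not found; running without preload" >&2
        "$@"
    fi
}
```

With the function defined, `run_with_jemalloc python3 example/simple_example.py` would stand in for the explicit `LD_PRELOAD=...` invocation.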
Note: If you run with the debug version, you must set the **libasan** environment variable, for example
```shell
LD_PRELOAD=/usr/lib/llvm-18/lib/clang/18/lib/x86_64-pc-linux-gnu/libclang_rt.asan.so python3 example/simple_example.py