TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support.
At the moment TensorFlow I/O supports the following data sources (a short usage sketch follows the list):
tensorflow_io.ignite
: Data source for Apache Ignite and Ignite File System (IGFS). Overview and usage guide here.

tensorflow_io.kafka
: Apache Kafka stream-processing support.

tensorflow_io.kinesis
: Amazon Kinesis data streams support.

tensorflow_io.hadoop
: Hadoop SequenceFile format support.

tensorflow_io.arrow
: Apache Arrow data format support. Usage guide here.

tensorflow_io.image
: WebP and TIFF image format support.

tensorflow_io.libsvm
: LIBSVM file format support.

tensorflow_io.video
: Video file support with FFmpeg.

tensorflow_io.parquet
: Apache Parquet data format support.

tensorflow_io.lmdb
: LMDB file format support.
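Each of these modules exposes `tf.data`-style Dataset classes. Below is a minimal sketch of the common pattern, assuming the Python `SequenceFileDataset` class mirrors the R wrapper `sequence_file_dataset` shown later in this document; the file path is a placeholder:

```python
import tensorflow_io.hadoop as hadoop

# SequenceFileDataset reads records from a Hadoop SequenceFile
# (the path below is a placeholder). The result is a regular
# tf.data.Dataset, so standard transformations chain onto it.
dataset = hadoop.SequenceFileDataset("/path/to/string.seq")
dataset = dataset.repeat(2).batch(2)
```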
The `tensorflow-io` package can be installed with pip directly:
$ pip install tensorflow-io
Related modules such as Kafka can then be imported in Python:
$ python
Python 2.7.6 (default, Nov 13 2018, 12:45:42)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> import tensorflow_io.kafka as kafka
>>>
>>> dataset = kafka.KafkaDataset(["test:0:0:4"], group="test", eof=True)
>>> iterator = dataset.make_initializable_iterator()
>>> init_op = iterator.initializer
>>> get_next = iterator.get_next()
>>>
>>> with tf.Session() as sess:
...   print(sess.run(init_op))
...   for i in range(5):
...     print(sess.run(get_next))
>>>
Note that Python has to be run outside of the repo directory itself, otherwise Python may not be able to find the correct path to the module.
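Since `KafkaDataset` is a regular `tf.data.Dataset`, the standard transformations can be chained onto it. The following is a minimal sketch using the same topic specification as above, adding batching and an explicit end-of-stream loop:

```python
import tensorflow as tf
import tensorflow_io.kafka as kafka

# Same topic specification as above, with messages batched in pairs.
dataset = kafka.KafkaDataset(["test:0:0:4"], group="test", eof=True).batch(2)

iterator = dataset.make_initializable_iterator()
next_batch = iterator.get_next()

with tf.Session() as sess:
    sess.run(iterator.initializer)
    # eof=True ends the stream instead of blocking, so iteration
    # stops with OutOfRangeError once the topic is drained.
    while True:
        try:
            print(sess.run(next_batch))
        except tf.errors.OutOfRangeError:
            break
```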
The TensorFlow I/O package (`tensorflow-io`) can be built from source:
$ docker run -it -v ${PWD}:/working_dir -w /working_dir tensorflow/tensorflow:custom-op
$ # In docker
$ curl -OL https://github.com/bazelbuild/bazel/releases/download/0.20.0/bazel-0.20.0-installer-linux-x86_64.sh
$ chmod +x bazel-0.20.0-installer-linux-x86_64.sh
$ ./bazel-0.20.0-installer-linux-x86_64.sh
$ ./configure.sh
$ bazel build build_pip_pkg
$ bazel-bin/build_pip_pkg artifacts
A package file `artifacts/tensorflow_io-*.whl` will be generated after a successful build.
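The generated wheel can then be installed with pip (e.g., `pip install artifacts/tensorflow_io-*.whl`). As a quick smoke test that the compiled ops load, importing any submodule is enough; a minimal sketch:

```python
import tensorflow as tf
# Importing a submodule forces the shared library with the
# compiled kernels to load, so this fails fast on a broken build.
import tensorflow_io.kafka  # noqa: F401

print("tensorflow-io loaded against TensorFlow", tf.__version__)
```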
We provide a reference Dockerfile here so that you can use the R package directly for testing. You can build it via:
docker build -t tfio-r-dev -f R-package/scripts/Dockerfile .
Inside the container, you can start your R session, instantiate a `SequenceFileDataset` from an example Hadoop SequenceFile string.seq, and then use any transformation functions provided by the tfdatasets package on the dataset, like the following:
library(tfio)

# Create a dataset from the example SequenceFile and repeat it twice.
dataset <- sequence_file_dataset("R-package/tests/testthat/testdata/string.seq") %>%
  dataset_repeat(2)

sess <- tf$Session()

# One-shot iterators need no explicit initialization.
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

# Iterate until the repeated dataset is exhausted.
until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})