public class ImageDataSource
extends Object
The image package implements the Spark SQL data source API for loading image data as a DataFrame.
It can load compressed images (JPEG, PNG, etc.) into a raw image representation via ImageIO in the Java standard library.
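As a rough illustration of the ImageIO step, the following Java sketch decodes compressed bytes with javax.imageio.ImageIO.read. That method returns null when no installed reader recognizes the input. ImageIoDecodeSketch and its methods are hypothetical names chosen for this example. They are not part of Spark, and Spark's own decoding code may differ.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.imageio.ImageIO;

// Sketch only: decodes compressed bytes with javax.imageio.ImageIO.
// ImageIoDecodeSketch and its methods are hypothetical, not Spark APIs.
class ImageIoDecodeSketch {

    // Returns the decoded image, or null when no installed ImageIO reader
    // recognizes the bytes (ImageIO.read returns null in that case) or when
    // reading fails with an IOException.
    static BufferedImage decode(byte[] compressed) {
        try {
            return ImageIO.read(new ByteArrayInputStream(compressed));
        } catch (IOException e) {
            return null;
        }
    }

    // Helper for the example: encodes an image as PNG bytes in memory.
    static byte[] encodePng(BufferedImage img) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(img, "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(4, 2, BufferedImage.TYPE_INT_RGB);
        BufferedImage decoded = decode(encodePng(src));
        System.out.println(decoded.getWidth() + "x" + decoded.getHeight()); // prints 4x2
        byte[] junk = "this is definitely not an image file".getBytes(StandardCharsets.US_ASCII);
        System.out.println(decode(junk) == null); // prints true
    }
}
```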
The loaded DataFrame has one StructType column, image, which contains the image data stored in the image schema.
The schema of the image column is:
- origin: StringType (represents the file path of the image)
- height: IntegerType (height of the image)
- width: IntegerType (width of the image)
- nChannels: IntegerType (number of image channels)
- mode: IntegerType (OpenCV-compatible type)
- data: BinaryType (image bytes in OpenCV-compatible order: row-wise BGR in most cases)
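To make the data layout concrete, here is a minimal Java sketch that fills these fields from a decoded java.awt.image.BufferedImage. It assumes a 3-channel, 8-bit image without alpha, writes the pixels row by row in B, G, R order, and sets mode to OpenCV's CV_8UC3 code (16). BgrLayoutSketch, ImageRow, and the origin string are hypothetical. This is not Spark's decoder, which may handle more cases (for example grayscale or alpha images).

```java
import java.awt.image.BufferedImage;
import java.util.Arrays;

// Sketch only: fills the image struct fields for a 3-channel, 8-bit image,
// storing pixels row by row in B, G, R order. BgrLayoutSketch and ImageRow
// are hypothetical names; alpha and grayscale inputs are not handled here.
class BgrLayoutSketch {

    // OpenCV type code: CV_MAKETYPE(depth, cn) = depth + ((cn - 1) << 3).
    // For 8-bit unsigned depth (CV_8U = 0), 3 channels give 16 (CV_8UC3).
    static int cv8uType(int nChannels) {
        return (nChannels - 1) << 3;
    }

    // Plain holder mirroring the struct fields listed above.
    static final class ImageRow {
        final String origin;
        final int height, width, nChannels, mode;
        final byte[] data;

        ImageRow(String origin, int height, int width, int nChannels, int mode, byte[] data) {
            this.origin = origin;
            this.height = height;
            this.width = width;
            this.nChannels = nChannels;
            this.mode = mode;
            this.data = data;
        }
    }

    static ImageRow toRow(String origin, BufferedImage img) {
        int height = img.getHeight();
        int width = img.getWidth();
        int nChannels = 3;
        byte[] data = new byte[height * width * nChannels];
        int offset = 0;
        for (int y = 0; y < height; y++) {      // outer loop over rows: row-wise order
            for (int x = 0; x < width; x++) {
                int argb = img.getRGB(x, y);    // packed as 0xAARRGGBB
                data[offset] = (byte) (argb & 0xFF);             // blue
                data[offset + 1] = (byte) ((argb >> 8) & 0xFF);  // green
                data[offset + 2] = (byte) ((argb >> 16) & 0xFF); // red
                offset += nChannels;
            }
        }
        return new ImageRow(origin, height, width, nChannels, cv8uType(nChannels), data);
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        img.setRGB(0, 0, 0x112233); // R=0x11, G=0x22, B=0x33
        img.setRGB(1, 0, 0x445566);
        ImageRow row = toRow("hypothetical/path.png", img);
        System.out.println(row.mode);                  // prints 16
        System.out.println(Arrays.toString(row.data)); // prints [51, 34, 17, 102, 85, 68]
    }
}
```

Because getRGB returns each pixel packed as 0xAARRGGBB, the lowest byte is blue, which is why blue comes first in every 3-byte group.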
To use the image data source, set "image" as the format in DataFrameReader and optionally specify data source options, for example:
// Scala
val df = spark.read.format("image")
.option("dropInvalid", true)
.load("data/mllib/images/partitioned")
// Java
Dataset<Row> df = spark.read().format("image")
.option("dropInvalid", true)
.load("data/mllib/images/partitioned");
The image data source supports the following options:
- "dropInvalid": whether to drop files that are not valid images from the result.
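The following Java sketch illustrates the choice that dropInvalid controls: files that ImageIO cannot decode are either left out of the result or kept as a placeholder entry. It is not Spark's implementation. DropInvalidSketch and Summary are hypothetical, Summary keeps only origin, height, and width for brevity, and the -1 placeholder values for undecodable files are an assumption of this sketch.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.imageio.ImageIO;

// Sketch of the behavior a "dropInvalid" flag controls. This is not Spark's
// implementation: DropInvalidSketch, Summary, and the placeholder values
// (-1 height/width for an undecodable file) are assumptions of this sketch.
class DropInvalidSketch {

    static final class Summary {
        final String origin;
        final int height, width;

        Summary(String origin, int height, int width) {
            this.origin = origin;
            this.height = height;
            this.width = width;
        }

        @Override
        public String toString() {
            return origin + " " + width + "x" + height;
        }
    }

    // Returns null when ImageIO cannot decode the bytes.
    static BufferedImage tryDecode(byte[] bytes) {
        try {
            return ImageIO.read(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            return null;
        }
    }

    // One entry per input file, keyed by path; the map's iteration order is kept.
    static List<Summary> load(Map<String, byte[]> files, boolean dropInvalid) {
        List<Summary> rows = new ArrayList<>();
        for (Map.Entry<String, byte[]> file : files.entrySet()) {
            BufferedImage img = tryDecode(file.getValue());
            if (img != null) {
                rows.add(new Summary(file.getKey(), img.getHeight(), img.getWidth()));
            } else if (!dropInvalid) {
                rows.add(new Summary(file.getKey(), -1, -1)); // placeholder entry
            }
            // otherwise the invalid file is left out of the result
        }
        return rows;
    }

    // Helper for the example: encodes an image as PNG bytes in memory.
    static byte[] encodePng(BufferedImage img) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(img, "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("good.png", encodePng(new BufferedImage(3, 5, BufferedImage.TYPE_INT_RGB)));
        files.put("notes.txt", "this is definitely not an image file".getBytes(StandardCharsets.US_ASCII));
        System.out.println(load(files, true));  // prints [good.png 3x5]
        System.out.println(load(files, false)); // prints [good.png 3x5, notes.txt -1x-1]
    }
}
```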
This class is public for documentation purposes. Do not use this class directly; instead, use the data source API as illustrated above.