Getting Started

Quick Start

To start working with Kite, download its binaries here, edit the settings file, and run the jar file kite-console-xx.jar on the cluster machines using a Java 8 JVM (xx refers to the Kite version number). Kite is a distributed system that runs on commodity hardware clusters. The Kite jar file should be executed on each machine separately; there is no need to provide a list of the cluster machines ahead of time. When a new machine runs the Kite jar file, it joins the cluster and is automatically discovered by the other up-and-running machines, as long as they belong to the same network. When a running machine goes down, this is also detected automatically by the other machines. As introduced in About Kite, Kite writes its disk-based artifacts to the Hadoop Distributed File System (HDFS). Therefore, before starting any Kite machines, an up-and-running HDFS instance is required; check here to configure an HDFS cluster. Note that all machines of the same Kite instance should share the same settings for the underlying HDFS, as described in Kite Settings File.

After starting, each Kite machine is ready to receive and execute MQL query language statements. In addition, the Kite jar file can be added as a dependency to Java projects to use the Kite APIs from Java programs or compatible programming languages. To gracefully stop a Kite machine, type quit or exit. The Examples section provides sample MQL statements and queries, as well as a ready-made example of a streaming data source, so you can start using Kite immediately.

Main Features

Using Kite, system administrators can:
  1. Connect Microblogs streams with arbitrary attributes and schemas from local and remote sources.
  2. Create index structures on arbitrary attributes of existing Microblogs streams. Kite provides both spatial and non-spatial index types.
  3. Add and remove machines dynamically to and from the Kite cluster as needed, without restarting or interrupting the cluster operation.
  4. Search existing streams using the MQL query language and Java-compatible APIs. Kite automatically chooses the right index structures to process queries efficiently.
  5. Manage and administrate existing streams and index structures with a variety of utility commands and tools.
Full details of the supported features in Kite are maintained here.

Kite Settings File

When running the Kite jar file on each machine, the system administrator should provide a settings file. By default, Kite assumes a settings file named kite.settings located in the same folder as the jar file. If the settings file is located elsewhere or named differently, its path should be provided as a command line argument to the jar file.

The settings file includes the HDFS settings, which are mandatory, in addition to other optional settings that allow the system administrator to tune and control system performance and behavior. The Kite settings file is a properties file that includes the following parameters:
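Since the settings file is a standard Java properties file, it can be parsed with java.util.Properties. The sketch below is illustrative only: the key names (hdfs.address, index.capacity) are invented placeholders, not Kite's documented parameter names, and in practice the source would be a FileReader over kite.settings.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class SettingsExample {
    // Parses properties-format settings text into key/value pairs.
    // Wraps the checked IOException to keep the sketch compact.
    static Properties loadSettings(String contents) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(contents));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical keys, for illustration only.
        Properties settings =
                loadSettings("hdfs.address=localhost:9000\nindex.capacity=2000000\n");
        System.out.println(settings.getProperty("hdfs.address")); // localhost:9000
    }
}
```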

MQL Query Language

Kite comes with a SQL-like query language, called the Microblogs Query Language (MQL), that gives system administrators easy access to the system features through a declarative interface. MQL provides the main statements CREATE STREAM, CREATE INDEX, DROP STREAM, DROP INDEX, and SELECT to create, drop, and query streams and index structures. It also provides additional statements to manage and administrate the system assets: SHOW, UNSHOW, PAUSE, RESUME, ACTIVATE, DEACTIVATE, RESTART, and DESC. The usage of each statement is detailed below.
Syntax:
  CREATE STREAM stream_name (att1:Type, att2:Type, att3:Type, ..., attn:Type)
  FROM stream_source
  FORMAT stream_format
Example:
  CREATE STREAM stream1 (id:Long, mtime:Timestamp, keyword:String, location:GeoLocation, username:String)
  FROM Network_TCP(
  FORMAT CSV(0,1,4,3,2)
This statement creates and connects a new Microblog stream to the system.
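The FORMAT CSV(0,1,4,3,2) clause pairs each schema attribute, in order, with the index of the CSV field it is read from. A minimal sketch of that mapping under this reading (the helper below is illustrative, not Kite code):

```java
import java.util.ArrayList;
import java.util.List;

public class CsvFormatExample {
    // For each schema attribute, picks the CSV field at the given index:
    // attribute i comes from field indices[i], mirroring FORMAT CSV(0,1,4,3,2).
    static List<String> mapFields(String csvLine, int[] indices) {
        String[] fields = csvLine.split(",");
        List<String> values = new ArrayList<>();
        for (int idx : indices) {
            values.add(fields[idx]);
        }
        return values;
    }

    public static void main(String[] args) {
        // Raw field order in the line: id, mtime, username, location, keyword
        String line = "77,2017-01-13,alice,24.7;-122.3,obama";
        // Schema order: id, mtime, keyword, location, username
        System.out.println(mapFields(line, new int[]{0, 1, 4, 3, 2}));
        // [77, 2017-01-13, obama, 24.7;-122.3, alice]
    }
}
```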
Syntax:
  CREATE INDEX HASH index_name ON stream_name(attribute_name) [OPTIONS index_capacity, num_index_segments]
  CREATE INDEX SPATIAL spatial_partitioning_type index_name ON stream_name(attribute_name)
  [OPTIONS index_capacity, num_index_segments, north, south, east, west, num_rows, num_cols]
Examples:
  CREATE INDEX HASH index1 ON stream1(keyword)
  CREATE INDEX HASH index1 ON stream1(keyword) OPTIONS 2000000,20
  CREATE INDEX SPATIAL GRID index2 ON stream1(location)
  CREATE INDEX SPATIAL GRID index2 ON stream1(location) OPTIONS 2000000,20,90,-90,180,-180,180,360
This statement creates a new index on an existing stream. Kite supports two families of index structures: hash indexes for arbitrary attributes and spatial indexes for spatial attributes. Each Kite index consists of two components: an in-memory component and a disk-based component. Both components are segmented based on the time attribute. The in-memory component has a maximum capacity; when that capacity is filled, the oldest data segment is flushed to the disk-based component.
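The segmented in-memory component described above can be illustrated with a toy sketch. This is not Kite's implementation: it is a self-contained hash index whose postings are grouped into segments, and when the in-memory budget is exceeded the oldest segment is evicted, standing in for the flush to the disk-based component.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SegmentedHashIndex {
    // One time segment: key -> list of record ids inserted during that segment.
    static class Segment {
        final Map<String, List<Long>> entries = new HashMap<>();
        int size = 0;
    }

    private final Deque<Segment> segments = new ArrayDeque<>();
    private final int segmentCapacity; // records per segment
    private final int maxSegments;     // in-memory segment budget

    SegmentedHashIndex(int segmentCapacity, int maxSegments) {
        this.segmentCapacity = segmentCapacity;
        this.maxSegments = maxSegments;
        segments.addLast(new Segment());
    }

    // Inserts into the newest segment; rolls a new segment when it fills up,
    // and evicts the oldest one (the stand-in for flushing to disk) if over budget.
    void insert(String key, long recordId) {
        Segment current = segments.peekLast();
        if (current.size >= segmentCapacity) {
            current = new Segment();
            segments.addLast(current);
            if (segments.size() > maxSegments) {
                segments.pollFirst(); // Kite would flush this to the disk component
            }
        }
        current.entries.computeIfAbsent(key, k -> new ArrayList<>()).add(recordId);
        current.size++;
    }

    // Scans the in-memory segments, oldest to newest.
    List<Long> search(String key) {
        List<Long> result = new ArrayList<>();
        for (Segment s : segments) {
            List<Long> ids = s.entries.get(key);
            if (ids != null) result.addAll(ids);
        }
        return result;
    }

    public static void main(String[] args) {
        SegmentedHashIndex idx = new SegmentedHashIndex(2, 2);
        for (long id = 1; id <= 5; id++) idx.insert("obama", id);
        // Segments of capacity 2; the oldest segment (ids 1,2) was evicted.
        System.out.println(idx.search("obama")); // [3, 4, 5]
    }
}
```

In Kite, eviction writes the segment to HDFS rather than discarding it; the sketch only models the in-memory bookkeeping.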
Syntax:
  DROP INDEX index_name stream_name
Example:
  DROP INDEX index1 stream1
This statement drops an existing index.
Syntax:
  DROP STREAM stream_name
This statement drops an existing stream.
Syntax:
  SELECT attribute_list FROM stream_name [WHERE condition] [TOPK k] [TIME time_interval]
Examples:
  SELECT * FROM stream1
  SELECT id, keyword FROM stream1 TOPK 17
  SELECT id, keyword FROM stream1 WHERE keyword = obama
  SELECT id, keyword FROM stream1 WHERE keyword = obama TOPK 70 TIME [13 Jan 2017, 15 Jan 2017]
  SELECT id, keyword FROM stream1 WHERE (keyword = obama OR keyword = trump) AND location WITHIN [50,24,-122,-126] TOPK 50
This statement posts a query on an existing stream.
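Conceptually, WHERE filters the stream and TOPK caps the answer size. The self-contained sketch below shows that semantics over an in-memory list, assuming TOPK keeps the k most recent matches; that reading is our assumption, not a documented Kite rule, and Kite itself answers such queries through its index structures rather than by scanning.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class SelectExample {
    // Minimal microblog record, for illustration only.
    static class Microblog {
        final long id;
        final long timestamp;
        final String keyword;

        Microblog(long id, long timestamp, String keyword) {
            this.id = id;
            this.timestamp = timestamp;
            this.keyword = keyword;
        }
    }

    // Evaluates SELECT ... WHERE keyword = kw TOPK k: filter, then keep the
    // k most recent matches (newest first).
    static List<Microblog> select(List<Microblog> stream, String kw, int k) {
        return stream.stream()
                .filter(m -> m.keyword.equals(kw))
                .sorted(Comparator.comparingLong((Microblog m) -> m.timestamp).reversed())
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Microblog> stream = new ArrayList<>();
        stream.add(new Microblog(1, 100, "obama"));
        stream.add(new Microblog(2, 200, "trump"));
        stream.add(new Microblog(3, 300, "obama"));
        stream.add(new Microblog(4, 400, "obama"));
        for (Microblog m : select(stream, "obama", 2)) {
            System.out.println(m.id); // prints 4 then 3
        }
    }
}
```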
Syntax:
  SHOW stream_name
Example:
  SHOW stream1
This statement shows the user the continuous insertion operations in an existing stream and its index structures.
Syntax:
  UNSHOW stream_name
Example:
  UNSHOW stream1
This statement reverts the effect of a SHOW statement on an existing stream.
Syntax:
  PAUSE stream_name
Example:
  PAUSE stream1
This statement pauses data insertion in an existing stream and all its index structures.
Syntax:
  RESUME stream_name
Example:
  RESUME stream1
This statement reverts the effect of a PAUSE statement on an existing stream.
Syntax:
  ACTIVATE index_name stream_name
Example:
  ACTIVATE index1 stream1
This statement activates insertion on an existing index structure.
Syntax:
  DEACTIVATE index_name stream_name
Example:
  DEACTIVATE index1 stream1
This statement deactivates insertion on an existing index structure.
Syntax:
  RESTART stream_name
Example:
  RESTART stream1
This statement restarts an existing stream and all its index structures. It is usually used after a system machine restarts, to replay a Microblog stream that was running on the machine before it went down.
Syntax:
  DESC [stream_name]
Example:
  DESC stream1
This statement describes the system metadata. If a stream name is provided, the statement outputs a description of the given stream and all its index structures. If no stream name is provided, the statement describes all existing streams in the system, both active and paused.

Java APIs

All Kite features can be used from Java programs by adding the Kite jar file to the Java project and importing edu.umn.cs.kite.*. Internally, all MQL statements are executed by translating them into equivalent Java code. In this tutorial, we describe how to launch a Kite machine and give the equivalent Java code for each MQL statement.
Action: Launch a Kite machine
  KiteLaunchTool kite = new KiteLaunchTool();
  KiteInstance.initSettings(kite, settingsFilePath);
  // or, for the default settings file:
  KiteInstance.initSettings(kite);

Action: Execute an MQL statement
  String statement = "CREATE....";
  parsingResults = MQL.parseStatement(statement);
Notes: The parser returns a Boolean indicating a successful or failed parsing, a String error message in case of failed parsing, and a MetadataEntry in case of successful parsing.

Action: CREATE STREAM
  StreamFormatInfo format = new StreamFormatInfo("csv", attrIndecies);
  Scheme scheme = new Scheme(attrList);
  Preprocessor preprocessor = new MicroblogCSVPreprocessor(format, scheme);
  StreamingDataSource source = new SocketStream(host, port, preprocessor);
  StreamDataset stream = new StreamDataset(name, source);
  KiteInstance.addStream(stream.getName(), stream);

Action: CREATE INDEX
  StreamDataset stream = new StreamDataset(...);
  stream.createIndexHash(index_attribute, index_name, index_capacity,
      num_index_segments, loadDiskIndex);
  stream.createIndexSpatial(index_attribute, index_name, new GridPartitioner(...),
      index_capacity, num_index_segments, loadDiskIndex);
Notes: loadDiskIndex is true when the index previously exists in the system, and false otherwise.

Action: DROP INDEX
  StreamDataset stream = KiteInstance.getStream(stream_name);
  KiteInstance.removeIndexMetadata(stream_name, index_name);

Action: DROP STREAM
  StreamDataset stream = KiteInstance.getStream(stream_name);

Action: SELECT
  StreamDataset stream = KiteInstance.getStream(stream_name);
  MQLResults results = (new Query(...), attributeNames);

Action: Stream management statements (SHOW, UNSHOW, PAUSE, RESUME, ...)
  StreamDataset stream = KiteInstance.getStream(stream_name);
  ...

Action: DESC
  KiteInstance.descStream(stream_name);
  KiteInstance.descAllStreams();


Streaming Data Source Example

We provide an example streaming data source that works over network TCP connections. Kite users can download the data source binaries and source files from here. The jar file takes an input text file in this format; an example input file can be downloaded here. This data source reads data from a local file system folder. The data folder has one or more subfolders, and each subfolder has one or more data files. A sample data folder can be downloaded from here. This example data source reads files compressed in GZip format, where each line of a file represents one Tweet in JSON format, following the Twitter APIs format. To read other file formats, two methods should be edited: TextStream.openNextFile(), to read file formats other than GZip, and TweetJSONPreprocessor.preprocess(String jsonTweet), to parse Tweet formats other than JSON.
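Reading a GZip-compressed file line by line, with one tweet per line, can be sketched with the standard java.util.zip classes. This is illustrative only, not the data source's actual code; the demo builds a small .gz payload in memory instead of opening a real file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipLinesExample {
    // Reads every line from a GZip-compressed stream (one tweet per line).
    static List<String> readGzipLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(in), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Compresses text to GZip in memory, then reads it back line by line.
    static List<String> gzipRoundTrip(String text) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
                gz.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return readGzipLines(new ByteArrayInputStream(buf.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> tweets = gzipRoundTrip("{\"id\":1}\n{\"id\":2}\n");
        System.out.println(tweets.size()); // 2
    }
}
```

For real files, the ByteArrayInputStream would simply be replaced by a FileInputStream over each .gz data file.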

MQL Examples

Kite users can download example MQL statements here.

Kite Features

Kite's main features are listed here. Full details of the supported features in Kite are maintained here.