Originally published on Sep 10, 2015, at ObjectRocket.com/blog
Do you have a brand new Elasticsearch® instance, but all the useful data you’d like to search lives in a CSV file? No problem. Logstash® makes it easy to turn almost any data into something searchable in an Elasticsearch index.

To follow these examples, you need some data and a Unix®-like environment; Windows® works fine with some minor adjustments. In this case, we wanted to take an export of the data from our Davis Vantage Pro2® weather station, in .CSV format, and create a new index with it. We started with a few million lines similar to these, stored in a local file:
$ head -3 /home/erik/weather.csv
HumOut,TempIn,DewPoint,HumIn,WindDir,RainMonth,WindSpeed,RainDay,BatteryVolts,WindChill,Pressure,time,TempOut,WindSpeed10Min,RainRate
76,78.0,78.227017302825,44,109,2.0,2,0.0,1.236328125,90.87261657090625,29.543,2015-06-18T17:49:29Z,86.5,1,0.0
76,78.0,78.227017302825,44,107,2.0,2,0.0,1.236328125,90.87261657090625,29.543,2015-06-18T17:49:45Z,86.5,1,0.0
76,78.0,78.32406784157725,44,107,2.0,0,0.0,1.236328125,90.83340000000001,29.543,2015-06-18T17:50:00Z,86.59999999999999,1,0.0
Note: For this experiment to work, you need to have at least one data source.
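Before feeding the file to Logstash, it’s worth a quick sanity check that every row has the same number of fields. Here’s one way to do it with awk (our own quick check, assuming no field contains an embedded comma); every row of this data should report 15 fields:
$ awk -F',' '{ print NF }' /home/erik/weather.csv | sort -u
15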
After you have data, you can get started. First, make sure you have a version of Java installed:
$ java -version
openjdk version "1.8.0_51"
Any Java Virtual Machine (JVM) is fine for this: OpenJDK®, Oracle®, and so on. Next, download and unpack Logstash, and create a directory to hold its configuration file:
$ curl -O https://download.elastic.co/logstash/logstash/logstash-1.5.4.tar.gz
$ tar xfz logstash-1.5.4.tar.gz
$ cd logstash-1.5.4
$ mkdir conf
Now, it’s time to build a configuration file. First, define an input section that tells Logstash where to find the data:
input {
  file {
    path => "/home/erik/weather.csv"
    start_position => "beginning"
  }
}
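One caveat worth knowing about the file input: it keeps a sincedb file recording how far into each file it has read, so re-running the same import can silently skip data it has already seen. While testing, a common trick is to point the sincedb somewhere disposable, along these lines (a sketch; adjust the paths for your setup):
input {
  file {
    path => "/home/erik/weather.csv"
    start_position => "beginning"
    # Don't persist the read position between runs (testing only)
    sincedb_path => "/dev/null"
  }
}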
This just tells Logstash where to look and that we want to load the file from the beginning. Next, we need a filter. Logstash has loads of filter plugins available by default, and this example uses a couple of them to parse the data. So far, Logstash doesn’t know anything about the data in the file, so you need to specify the format and any other specifics on how to handle the various fields:
filter {
  csv {
    columns => [
      "HumOut",
      "TempIn",
      "DewPoint",
      "HumIn",
      "WindDir",
      "RainMonth",
      "WindSpeed",
      "RainDay",
      "BatteryVolts",
      "WindChill",
      "Pressure",
      "time",
      "TempOut",
      "WindSpeed10Min",
      "RainRate"
    ]
    separator => ","
    remove_field => ["message"]
  }
  date {
    match => ["time", "ISO8601"]
  }
  mutate {
    convert => ["TempOut", "float"]
  }
}
The columns are self-explanatory, but the rest deserves some detail. First, the example removes the message field, an entry containing the entire raw row; you won’t need it because you’re searching on specific attributes. Second, it uses the date filter to parse the time field as an ISO8601-formatted date, so that Elasticsearch treats it as a date rather than a plain string. Finally, it uses the mutate filter to convert the TempOut value into a floating-point number.
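The convert option takes field/type pairs, so if you want the other numeric readings stored as numbers too, the same mutate block can be extended along these lines (a sketch; convert whichever fields you care about):
mutate {
  convert => [
    "TempOut", "float",
    "TempIn", "float",
    "Pressure", "float",
    "WindSpeed", "integer"
  ]
}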
Now that Logstash knows how to parse the data, use an output section to tell it where to store the results, in this case your Elasticsearch instance:
output {
  elasticsearch {
    protocol => "https"
    host => ["iad1-20999-0.es.objectrocket.com:20999"]
    user => "erik"
    password => "mysupersecretpassword"
    action => "index"
    index => "eriks_weather_index"
  }
  stdout { }
}
This final section configures the host and port, authentication credentials, and the name of the index to store the data in.
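The empty stdout block simply echoes each event to the console. If you want to inspect the parsed events while debugging, the bundled rubydebug codec pretty-prints them instead:
stdout { codec => rubydebug }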
Save all three sections (input, filter, and output) together in conf/logstash.conf. Ok, let’s fire it up. If it’s working, it should look similar to this:
$ bin/logstash -f conf/logstash.conf -v
Logstash startup completed
Did it work? Ask Elasticsearch:
$ curl -u erik:mysupersecretpassword 'https://iad1-20999-0.es.objectrocket.com:20999/_cat/indices?v'
health status index pri rep docs.count store.size pri.store.size
green open eriks_weather_index 5 1 294854 95.8mb 48.5mb
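You can also watch the import progress by asking for just the document count, which returns a small JSON object with a count field (same credentials and endpoint as above):
$ curl -u erik:mysupersecretpassword 'https://iad1-20999-0.es.objectrocket.com:20999/eriks_weather_index/_count'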
The documents are there, so query for one:
$ curl -u erik:mysupersecretpassword 'https://iad1-20999-0.es.objectrocket.com:20999/eriks_weather_index/_search?q=TempOut:>75&pretty&terminate_after=1'
This tells Elasticsearch to find documents with TempOut greater than 75 (TempOut:>75), to format the output for human consumption (pretty), and to return no more than one result per shard (terminate_after=1). It should return something like this:
{
  "took" : 4,
  "timed_out" : false,
  "terminated_early" : true,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "eriks_weather_index",
      "_type" : "logs",
      "_id" : "AU-yXZJIJb3HnhKvpdNC",
      "_score" : 1.0,
      "_source":{"@version":"1","@timestamp":"2015-06-22T10:24:23.000Z","host":"kibana","path":"/home/erik/weather.csv","HumOut":"86","TempIn":"79.7","DewPoint":"70.65179649787358","HumIn":"46","WindDir":"161","RainMonth":"2.7","WindSpeed":"0","RainDay":"0.36","BatteryVolts":"1.125","WindChill":"82.41464999999999","Pressure":"29.611","time":"2015-06-22T10:24:23Z","TempOut":75.1,"WindSpeed10Min":"0","RainRate":"0.0"}
    } ]
  }
}
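The query-string syntax is handy for quick checks, but the same search can also be written as a JSON range query in the request body, which is easier to extend later. A sketch against the same endpoint:
$ curl -u erik:mysupersecretpassword 'https://iad1-20999-0.es.objectrocket.com:20999/eriks_weather_index/_search?pretty' -d '{
  "query": { "range": { "TempOut": { "gt": 75 } } }
}'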
Success. Logstash is a great Swiss Army knife for turning any data you have lying around into something you can easily play with in Elasticsearch, so have fun!