How to upload large CSV files in Elixir

Uploading big CSV files in Elixir can be tedious, because reading the whole file at once consumes a large amount of memory. Elixir's answer is File.stream!, which reads the file lazily and keeps only one line in memory at a time. This article explains the steps for handling large CSV uploads with File.stream!.

Our milestones

The steps to follow are:

  • Upload full CSV into the storage server
  • Process CSV
  • Monitor the status of the process

Upload full CSV into the storage server

Plug stores every uploaded file in a temporary location and exposes it as a managed %Plug.Upload{} struct. The temporary file is deleted as soon as the request completes. Processing a large file takes longer than a request should stay open, so the file cannot be left in the temporary location. The alternative is to copy the CSV into persistent file storage.
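As a minimal sketch of that copy step: the module name UploadStore, the persist/2 function and the local storage directory are all assumptions made for illustration. The function only relies on the :path and :filename fields that %Plug.Upload{} provides, so it accepts a plain map with the same keys as well.

```elixir
defmodule UploadStore do
  # Hypothetical helper: copies the temporary upload into `storage_dir`,
  # a local directory standing in for whatever file storage you use.
  # `upload` needs the :path and :filename fields, as %Plug.Upload{} has.
  def persist(%{path: tmp_path, filename: filename}, storage_dir) do
    File.mkdir_p!(storage_dir)

    # Prefix with a unique integer so two uploads with the same name do not collide.
    unique = System.unique_integer([:positive])
    dest = Path.join(storage_dir, "#{unique}-#{Path.basename(filename)}")

    # Copy before the request ends, because the temporary file is removed afterwards.
    File.cp!(tmp_path, dest)
    dest
  end
end
```

In a controller this would be called with the upload struct from the request params, and the returned path is what gets stored for later processing.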

Process CSV

While storing the file, the following details are required and should be saved with it:

  • Uploaded file path
  • Error file path – for error handling
  • Total rows – number of rows in the CSV
  • Processed row(s) – beginning with zero
  • Failed row(s) – beginning with zero
  • Loaded row(s) – beginning with zero
  • Status – in the beginning, it is “New”
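The details listed above could be modelled roughly like this. The struct name ImportProcess, its field names and the ".errors" suffix are assumptions for illustration; in a real application this would more likely be a database table. Counting lines to get the total also assumes that no quoted field contains a line break.

```elixir
defmodule ImportProcess do
  # One record per uploaded CSV; all counters begin with zero.
  defstruct [
    :id,
    :file_path,
    :error_file_path,
    total_rows: 0,
    processed_rows: 0,
    failed_rows: 0,
    loaded_rows: 0,
    status: "New"
  ]

  # Builds the initial record. The total is the line count minus the header,
  # counted lazily so the file is never loaded into memory as a whole.
  def new(id, file_path) do
    total =
      file_path
      |> File.stream!()
      |> Enum.count()
      |> Kernel.-(1)
      |> max(0)

    %__MODULE__{
      id: id,
      file_path: file_path,
      error_file_path: file_path <> ".errors",
      total_rows: total
    }
  end
end
```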

With these details stored in a process table, the CSV import can be monitored: the unique id of the process gives access to its status and its other details. The import cannot be done within a single request, because big files take a long time to process and holding the request open would result in a timeout. Instead, initiate it through an asynchronous queueing system (a Redis-backed job queue, for example). Once the job is queued, the import starts whenever the queue is free and is not cut short by timeouts or abrupt stoppages. Below is the sample code:

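# Assumes the `csv` Hex package; validate_row/1 is a placeholder for your own validation.
File.stream!("example.csv")
|> CSV.decode(headers: true)
# Any filter process if needed (this placeholder keeps every row)
|> Stream.filter(fn _result -> true end)
|> Enum.reduce({0, 0, 0, []}, fn
  {:ok, row_object}, {processed_acc, loaded_acc, failed_acc, error_acc} ->
    processed_acc = processed_acc + 1

    case validate_row(row_object) do
      :ok ->
        # Insert a single record into the database
        # Update the process details
        {processed_acc, loaded_acc + 1, failed_acc, error_acc}

      {:error, reason} ->
        # Update the process details
        # Errors are prepended, so error_acc ends up in reverse row order
        {processed_acc, loaded_acc, failed_acc + 1, [reason | error_acc]}
    end

  # A row that could not be decoded counts as failed as well
  {:error, reason}, {processed_acc, loaded_acc, failed_acc, error_acc} ->
    {processed_acc + 1, loaded_acc, failed_acc + 1, [reason | error_acc]}
end)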

The steps in the above code are:

  • File.stream!("example.csv") – streams the file line by line
  • CSV.decode(headers: true) – decodes each CSV row into a map keyed by the header names
  • Stream.filter – applies any filter step, if needed
  • Enum.reduce() – a few accumulators hold the process monitoring details (processed, loaded and failed counts, plus the collected errors), and each row is validated in the same pass
  • After the reduce completes, an error file is created from error_acc so that failures can be debugged with its help.
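The steps above can be sketched end to end without any dependency. This is only an illustration under stated assumptions: the module CsvImport, the two-column name/age layout and the validation rule are invented for the example, and rows are split with String.split/2, which only works when no field contains a quoted comma. A real import would use a CSV library as in the sample above.

```elixir
defmodule CsvImport do
  # Streams the CSV, counts processed/loaded/failed rows and writes the error file.
  def run(csv_path, error_path) do
    {processed, loaded, failed, errors} =
      csv_path
      |> File.stream!()
      # Skip the header row
      |> Stream.drop(1)
      |> Stream.map(&String.trim/1)
      # Number the data lines, starting at 2 because line 1 is the header
      |> Stream.with_index(2)
      |> Enum.reduce({0, 0, 0, []}, fn {line, line_no}, {p, l, f, errs} ->
        case validate(String.split(line, ",")) do
          :ok -> {p + 1, l + 1, f, errs}
          {:error, reason} -> {p + 1, l, f + 1, ["line #{line_no}: #{reason}" | errs]}
        end
      end)

    # Errors were prepended, so reverse them to restore file order.
    File.write!(error_path, errors |> Enum.reverse() |> Enum.join("\n"))
    %{processed: processed, loaded: loaded, failed: failed}
  end

  # Hypothetical rule: a name and a non-negative integer age.
  defp validate([_name, age]) do
    case Integer.parse(age) do
      {n, ""} when n >= 0 -> :ok
      _ -> {:error, "invalid age #{inspect(age)}"}
    end
  end

  defp validate(_other), do: {:error, "expected 2 columns"}
end
```

Only the current line and the four accumulators are held in memory, so the file size does not affect memory use, apart from the list of error messages.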

Monitor the status of the process

The process details are updated while the file is being processed, so the unique id of the process gives access to every parameter: records processed, records successfully loaded and records in error. These parameters are useful for showing the progress of the import in the application, and they can be presented graphically with progress bars, charts and so on.
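A progress figure can be derived from those counters with a few lines. The module ImportProgress and the field names (total_rows, processed_rows and so on) are assumptions for illustration; any map or struct with these keys works.

```elixir
defmodule ImportProgress do
  # Percentage of rows processed so far, rounded to one decimal place.
  # An empty file is reported as 0.0 to avoid dividing by zero.
  def percent(%{total_rows: 0}), do: 0.0

  def percent(%{total_rows: total, processed_rows: processed}) do
    Float.round(processed / total * 100, 1)
  end

  # One-line summary suitable for a status endpoint or a log line.
  def summary(p) do
    "#{p.status}: #{p.processed_rows}/#{p.total_rows} rows (#{percent(p)}%), " <>
      "#{p.loaded_rows} loaded, #{p.failed_rows} failed"
  end
end
```

The percentage is what a progress bar would bind to, while the loaded and failed counts feed a chart.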

Switch to Erlang or Elixir for a better future

This is an epic battle between decades-old object-oriented programming languages and functional programming languages like Erlang and Elixir. I’m not going to list the differences between the functional programming paradigm and the OOP paradigm; instead, I’ll show you the advantages of our spotlight languages, Erlang and Elixir. After reading this post, you will know the benefits of functional programming with Elixir and Erlang.

Erlang – Battle Tested

  • It has been used in production applications for more than 30 years
  • Community support is still getting better
  • It powers many robust systems that handle millions of users
  • Its built-in libraries are self-sufficient enough to create and deploy applications quickly
  • Three decades of production needs and pressure have made Erlang the base for other languages that run on its VM, such as Elixir
  • OTP (Open Telecom Platform), a set of standard tools and libraries, shows the robustness of Erlang

An excellent article by a programmer using Erlang: Article

Fault-Tolerant

For those of you who don’t know what fault tolerance is, let me put it in simple terms:

Suppose you have an application with process A and process B. If process B raises an exception or error, your whole system might be brought down, unless you have coded defensively for every awful situation. As humans, we’re always prone to mistakes.

This is where Erlang shines. According to Joe Armstrong (fondly known as the father of Erlang), the motto of Erlang is:

LET IT CRASH

Oh my goodness! What is that? I know what you’re thinking, so let me get straight to the point. In Erlang, processes are lightweight: they play a role similar to OS threads, but they are scheduled by the Erlang VM and isolated from each other. So when any process fails, only that process is affected and the rest of the system stays intact.
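Here is a small sketch of that isolation, written in Elixir since it runs on the same VM. The module name IsolationDemo is an assumption for illustration. Process B exits abnormally, and process A, which only monitors it, receives a message about the failure and carries on.

```elixir
defmodule IsolationDemo do
  # Process A (the caller) starts process B and monitors it.
  def run do
    {pid, ref} = spawn_monitor(fn -> exit(:boom) end)

    # B has failed, but A is still alive and is simply told what happened.
    receive do
      {:DOWN, ^ref, :process, ^pid, reason} -> {:worker_crashed, reason}
    after
      1_000 -> :timeout
    end
  end
end
```

In a real system a supervisor from OTP would take the place of the manual monitor and restart the failed process.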

Concurrency

These days, nearly every programming language has some library to support concurrency, but Erlang has built-in support for it. Because its processes are really lightweight, it can handle more concurrent processes and requests on commodity hardware.

For example, WhatsApp was built by just tens of engineers with minimal infrastructure, and yet it supported many millions of users without breaking a sweat, and it still does.

I can name lots of companies that use Erlang to power their systems. For an exhaustive list of companies, see: Companies – Powered By Erlang
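To make the lightweight-process claim above a little more concrete, here is a minimal sketch in Elixir that starts one process per item. The module name ConcurrencyDemo and the doubling workload are assumptions for illustration; the point is only that starting thousands of processes is routine on this VM.

```elixir
defmodule ConcurrencyDemo do
  # Starts `n` concurrent tasks, each in its own process, and sums their results.
  def run(n) do
    1..n
    |> Enum.map(fn i -> Task.async(fn -> i * 2 end) end)
    # Wait for every task to reply (default timeout: 5 seconds)
    |> Task.await_many()
    |> Enum.sum()
  end
end
```

Calling ConcurrencyDemo.run(10_000) starts ten thousand processes, which would be impractical with one OS thread per task.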

Hot-Code Swapping

Since the arrival of Docker, one of Erlang’s main selling points has become a less used and underrated feature: its ability to upgrade code without stopping the application. Most other programming languages need to stop their systems for new changes to take effect, but Erlang does it with ease while the application keeps running.

There are a few more features, such as pattern matching and immutability, that are used everywhere in Erlang, but let’s get on with our main goal.
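The hot-code swapping described above can be hinted at with a toy example in Elixir. This is not how a production upgrade is done (that goes through OTP release tooling); it only shows that the running VM accepts a new version of a module without being restarted. The module names HotSwapDemo and Greeter are assumptions for illustration.

```elixir
defmodule HotSwapDemo do
  def run do
    # Silence the "redefining module" warning for this demo.
    Code.compiler_options(ignore_module_conflict: true)

    # Load version 1 of Greeter into the running VM.
    Code.compile_string("""
    defmodule Greeter do
      def hello, do: "v1"
    end
    """)

    before_swap = apply(Greeter, :hello, [])

    # Load version 2 while the VM keeps running; later calls use the new code.
    Code.compile_string("""
    defmodule Greeter do
      def hello, do: "v2"
    end
    """)

    after_swap = apply(Greeter, :hello, [])
    {before_swap, after_swap}
  end
end
```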

Elixir

Elixir is one of the most popular languages built on top of the Erlang VM. The reasons for its popularity are:

  • Goodness of Ruby, in terms of syntax (Ruby is still one of the most popular languages for web development)
  • Goodness of Erlang, in terms of BEAM (the Erlang virtual machine, which runs the compiled code) and OTP
  • Meta-programming (simply put, code that writes code)
  • Pipe operator (passes the result of one operation to the next without intermediate storage or variables)
  • Mix tool (toolkit for creating, building, testing and releasing apps)
  • Nerves Project (Elixir for embedded systems)
  • Phoenix Framework and Ecto (a fully functional web development framework and its database layer)
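As a quick illustration of the pipe operator from the list above, here is the same transformation written twice. The module name PipeDemo and the capitalizing example are assumptions for illustration.

```elixir
defmodule PipeDemo do
  # Nested calls have to be read from the inside out.
  def nested(text) do
    Enum.join(Enum.map(String.split(text), &String.capitalize/1), " ")
  end

  # With the pipe operator the same steps read from top to bottom,
  # and no intermediate variables are needed.
  def piped(text) do
    text
    |> String.split()
    |> Enum.map(&String.capitalize/1)
    |> Enum.join(" ")
  end
end
```

Both functions return the same result; the piped version simply makes the order of the steps visible.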

Community support is growing every day. Elixir 1.9 delivered the last of the planned features, and José Valim (creator of Elixir) has announced that the community will keep shipping new releases every six months with improvements and fixes, even though the language is by now complete, with self-sufficient packaging and releases.

Migrating to Elixir

As pioneers in Elixir, we are proud to say that we have helped more than 40 clients move their existing applications to Elixir within a short span of time. We support you in multiple stages, whether you need to move a part of your application or the whole application:

  • Free POC to prove our Capability
  • Product / Feature Understanding (Business Analysis)
  • Product Planning / Mapping Features
  • Choosing the Right Set of Frameworks or Tools
  • Choosing the Best Persistent Storage (SQL or NoSQL)
  • Product / Feature Development
  • User Acceptance Testing
  • Production Release

This is a proven process that we have followed with every client. All of them are happy with a new structure that can support huge loads of traffic without needing a large amount of resources.

We hope you enjoyed this post and now understand why you really need to switch! If you’re looking for support with a migration or with creating applications from scratch, fill out the form below!