How we moved to an open source alternative for Application Performance Management

Application Performance Management (APM) is the monitoring and management of the performance and availability of software applications. What APM covers can vary between people and businesses, but the most basic and most important reason to monitor your infrastructure and applications is to achieve 100% uptime for your customers and stakeholders. Many tools have been built over time to help developers do this.


Different organizations use different tools depending on their requirements. With so many solutions available, it is tough to pick one, since each has its pros and cons. At Shadowfax we have tried a few as well. As our application traffic increased over time, we wanted to set up more detailed alerts, for example when the error count of our APIs or the average response time of our tasks rises above a certain threshold.
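To make that kind of alert concrete, here is a minimal sketch in Python. It is not what we run in production: the `RequestSample` type, the `check_thresholds` function and the threshold values are all hypothetical names chosen for illustration, and it assumes an "error" means an HTTP 5xx response.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestSample:
    """One observed request (hypothetical shape, for illustration only)."""
    status: int          # HTTP status code
    duration_ms: float   # time taken to serve the request


def check_thresholds(samples, max_errors, max_avg_ms):
    """Return a list of alert messages for one window of samples."""
    alerts = []

    # Assumption: any 5xx response counts as an error.
    error_count = sum(1 for s in samples if s.status >= 500)
    if error_count > max_errors:
        alerts.append(f"error count {error_count} exceeds {max_errors}")

    # An empty window has no average, so skip the latency check.
    if samples:
        avg_ms = mean(s.duration_ms for s in samples)
        if avg_ms > max_avg_ms:
            alerts.append(
                f"average response time {avg_ms:.0f} ms exceeds {max_avg_ms} ms"
            )
    return alerts
```

In a real setup the window of samples would come from the monitoring backend rather than from an in-memory list, and the alert messages would be routed to whatever notification channel the team uses.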

A great start with New Relic

During our early days, we focused on building features for our customers and internal processes, and decided to start with New Relic APM Lite as our monitoring tool. It helped us monitor our complete application performance, and we fixed a lot of issues to improve our overall response times. On the trial account, we were allowed to monitor our whole application, i.e. both web and non-web components.

As our application grew and our trial period with New Relic ended, we started to miss a lot of insights. We had no way to keep track of our servers, and our production servers would go down without our knowledge. Production issues were mostly reported by our on-ground team when their applications stopped working. Even tracking disk usage was hard, and this resulted in downtime multiple times.

Going back to basics, we set up parsing of our server logs and inspected them each time something went wrong. Some of the usual problems that happened but went unreported included the following:

Failures in computational resources: CPU utilization spikes, memory issues, disk usage climbing to 100%, etc.
RabbitMQ queues unable to receive or delegate tasks
Database queries taking too long and clogging the connection pool, which brought the application down

Visualization and Debugging with ELK

As our infrastructure grew, we started using ELK (Elasticsearch, Logstash, Kibana) for debugging production issues. We moved to a central RabbitMQ, centralized our Celery nodes, and created dashboards to monitor Nginx logs and MQTT stats, along with visualizations for team-related metrics. At this point we were using New Relic APM Lite, Nagios and ELK together.

What changed:

Great Visualizations and Dashboards
Central Logging system to debug production issues
We stopped using Flower for monitoring celery workers

Gaps that remained:

X-Pack did not offer alerting or authentication on the Basic or open source plans
Using multiple monitoring tools (Nagios, ELK, Sentry, New Relic APM Lite, Flower) is hard to maintain
Gathering data from multiple tools while debugging a production issue is tedious

With multiple monitoring tools to maintain, we wanted to upgrade our New Relic subscription plan and stop worrying. However, the pricing stopped us from doing so, and we decided to find an open source solution.

Beyond the doors of open source

After some trial and error, we decided to use Graphite with StatsD and collectd. With many collectd plugins already available and StatsD being easy to integrate with Django, the transition was very easy. We used collectd to gather server metrics, with plugins like collectd-rabbitmq and redis-collectd-plugin. To visualize our time series data for application and analytics, we used Grafana, which has better visualization components than Graphite.
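For readers who have not used StatsD, the sketch below shows roughly what emitting an application metric involves. It is an illustration rather than our actual integration: it speaks the plain StatsD line format (`name:value|type`) over UDP using only the standard library, whereas a Django project would normally use an existing StatsD client package. The class name `TinyStatsClient`, the metric names and the environment prefix are all hypothetical; port 8125 is the usual StatsD default.

```python
import socket


def format_metric(name, value, metric_type, prefix=""):
    """Build one StatsD line: '<name>:<value>|<type>'."""
    full_name = f"{prefix}.{name}" if prefix else name
    return f"{full_name}:{value}|{metric_type}"


class TinyStatsClient:
    """Fire-and-forget UDP sender for StatsD counters and timers.

    Hypothetical helper for illustration; not a real library API.
    """

    def __init__(self, host="127.0.0.1", port=8125, prefix=""):
        self.address = (host, port)
        # Assumption: the prefix carries the environment name
        # (e.g. "production" or "staging") to keep metrics separated.
        self.prefix = prefix
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _send(self, line):
        # Metrics must never break the application, so network
        # errors are deliberately swallowed.
        try:
            self.sock.sendto(line.encode("ascii"), self.address)
        except OSError:
            pass

    def incr(self, name, count=1):
        """Increment a counter ('c' type) and return the line sent."""
        line = format_metric(name, count, "c", self.prefix)
        self._send(line)
        return line

    def timing(self, name, millis):
        """Record a duration in milliseconds ('ms' type)."""
        line = format_metric(name, millis, "ms", self.prefix)
        self._send(line)
        return line
```

A view or task could then call something like `TinyStatsClient(prefix="production").incr("api.orders.errors")`. Because UDP is connectionless, the call returns immediately even if no StatsD daemon is listening, which is one reason this style of instrumentation is cheap to add to application code.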

We also added authentication to ELK and used a self-hosted version of Sentry.

What we achieved:

One tool to monitor our complete infrastructure
A better context of our systems
Visualizing aggregated data over time, which helps us decide how to tweak our stack
Enabling developers to add metrics for monitoring as per their need
Thanks to the ease of use, individuals across teams were able to create dashboards and alerting mechanisms for their own use cases
Confidence in decisions on scaling infrastructure and troubleshooting
Customizing the different components used for monitoring as per our need
Clear separation of metrics from our production, staging and demo environments
Cost effectiveness
It was fun to set up our own thing

What’s next

There is always a lot to refine, and with time we will move towards clustering Graphite to handle more data. We plan to stop using Elasticsearch as a data source, as alerting is still not available for it in the current version of Grafana.

About Shadowfax

Shadowfax is India’s largest crowdsourced delivery platform, with a presence in 70+ cities across India and 7000+ daily active delivery personnel. Shadowfax’s unique app enables delivery of food, grocery, pharmacy and e-commerce orders for businesses and helps them create customer delight using technology. With a relentless focus on engineering pleasant experiences for customers, Shadowfax aims to become the most desirable and trustworthy delivery platform for customers.
