What is Anaconda
Anaconda is a free, open-source distribution of Python (and of the R programming language), intended for large-scale data processing and analysis, and scientific computing. The Anaconda Python distribution is managed and developed by Continuum Analytics.
Anaconda (“Anaconda Distribution”) is a free, easy-to-install package manager, environment manager, Python distribution, and collection of over 720 open source packages with free community support. Hundreds more open source packages and their dependencies can be installed with a simple “conda install [packagename]”. It’s platform-agnostic and can be used on Windows, OS X and Linux.
Continue reading “Anaconda on CentOS 7.x”
I decided to write this post because, when I first tried conda (the package manager for the Anaconda Python distribution), my first question was in what ways conda is better than pip, and why one should prefer conda over the de facto pip. Here I have put together a comprehensive post on ‘getting started with conda’, i.e. what extra conda can offer.
A short comparison
pip
- Can only be used for Python packages.
- The package manager supported by the Python foundation, hence widely used.
conda
- Handles library dependencies even outside Python, i.e. packages for C libraries, R packages, or really anything.
- Supports virtual environments out of the box.
- Developed to be used with the Anaconda Python distribution; it can be used with the standard Python distribution, but that is strongly discouraged.
Continue reading “conda VS pip”
“In Unix-based computer operating systems, init (short for initialization) is the first process started during booting of the computer system. Init is a daemon process that continues running until the system is shut down.”
Continue reading “Register Python script as a Linux systemd service”
Elasticsearch is a distributed storage and real-time search engine.
- Distributed storage – you just need to set up and add Elasticsearch nodes, and it’ll keep the data distributed across the cluster nodes. Being distributed also makes the data durable and highly available.
- Real-time search engine – you can query the data almost as soon as it has been written (by default, new documents become searchable within about a second).
Because of these two attributes, you have probably been hearing and reading about Elasticsearch wherever real-time data analysis is discussed. It would not be an overstatement to say that technologies like Elasticsearch lay the foundation for any efficient and reliable search engine.
Elasticsearch itself is implemented in Java, but it provides a good RESTful API, which makes it possible to use it from any programming language.
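As a rough illustration of what talking to that REST API from Python can look like, here is a minimal sketch that uses only the standard library. It assumes a single Elasticsearch node on http://localhost:9200 and a hypothetical index called "posts"; the `_doc` and `_search` paths follow recent Elasticsearch versions and may differ on older ones. The helper names `build_request` and `send` are made up for this example.

```python
import json
import urllib.request

# Assumption: a local single-node Elasticsearch listening on the default port.
ES_URL = "http://localhost:9200"


def build_request(method, path, body=None):
    """Build (but do not send) a JSON request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url=ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Store one document under id 1 in the hypothetical "posts" index.
index_req = build_request(
    "PUT", "/posts/_doc/1", {"title": "Elasticsearch with Python"}
)

# Full-text search on the "title" field of the same index.
search_req = build_request(
    "POST", "/posts/_search", {"query": {"match": {"title": "python"}}}
)
```

With a node running, `send(index_req)` followed by `send(search_req)` would store the document and then search for it. Nothing is sent until `send` is called, so the requests can be inspected first.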
Continue reading “Elasticsearch with Python”
Step 0 – Prerequisites
Docker requires a 64-bit OS and version 3.10 or higher of the Linux kernel. To check your OS architecture and kernel version:
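From the shell, `uname -m` and `uname -r` report the architecture and the kernel release. The same check can be sketched in Python as below; the helper names and the list of 64-bit machine names are assumptions made for this example, and the list is not exhaustive.

```python
import platform
import re

# Assumption: common 64-bit machine names; extend as needed.
SIXTY_FOUR_BIT = {"x86_64", "amd64", "aarch64"}


def kernel_at_least(release, minimum=(3, 10)):
    """Return True if a release string like '3.10.0-1160.el7.x86_64' is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return False  # unrecognised format, treat as not satisfying the requirement
    return (int(match.group(1)), int(match.group(2))) >= minimum


def is_64_bit(machine):
    """Return True if the machine name is one of the known 64-bit names."""
    return machine.lower() in SIXTY_FOUR_BIT


if __name__ == "__main__":
    release = platform.release()  # same value as `uname -r` on Linux
    machine = platform.machine()  # same value as `uname -m` on Linux
    print("kernel:", release, "ok" if kernel_at_least(release) else "too old")
    print("arch:", machine, "ok" if is_64_bit(machine) else "not 64-bit")
```

The version is compared as a tuple of integers rather than as a string, so a release such as 3.9 is correctly treated as older than 3.10.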
Run yum packages update
Continue reading “Install Docker on CentOS 7”
What is Docker
As per Docker’s website:
“Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications, whether on laptops, data center VMs, or the cloud.”
Docker is based on operating-system-level virtualization:
“Operating-system-level virtualization is a server-virtualization method where the kernel of an operating system allows for multiple isolated user-space instances, instead of just one. Such instances (sometimes called containers, software containers, virtualization engines (VE), virtual private servers (VPS), or jails) may look and feel like a real server from the point of view of its owners and users.”
Continue reading “Docker 101 – Getting started with Docker”
For this tutorial we are using 3 VMs: one Ansible controller/manager (ansible-controller), which will do the provisioning, and two remote servers, ansible-node1 and ansible-node2.
‘ansible-controller’ is the manager node, the one performing the provisioning on the rest of the hosts, so ansible-controller is where we’ll install and configure Ansible.
Continue reading “Install Ansible on CentOS 7”
What is Ansible?
Ansible is an open-source configuration management tool, i.e. a tool for automating development or production environment setups, cloud provisioning, and change management across multiple nodes. What makes Ansible better than other popular configuration management tools such as Puppet and Chef is its agentless architecture. With the former (Chef and Puppet) you are required to install a daemon/agent on all the nodes, i.e. on the controller/master node as well as on the nodes you need to manage (there can be 100s or 1000s). With Ansible, all you need to do is install Ansible on the controller/master node, and it will SSH into all the ‘to be managed’ nodes (this may require you to add the keys). This architecture not only makes the setup easy but also reduces network overhead, as the client nodes don’t need to poll the controller node continuously. Ansible was initially supported and sponsored by Ansible, Inc (originally AnsibleWorks, Inc); in October 2015 it was acquired by Red Hat.
Continue reading “Getting started with Ansible”
Step 0. Preliminaries
Run yum update
Stop the firewall
RabbitMQ clustering needs ports 4369, 5672, and 25672 to be reachable between the nodes, so stop the firewall:
$ sudo systemctl stop firewalld
Running sudo setenforce 0 puts SELinux into permissive mode for the session, i.e. until the next reboot. To disable it permanently, set SELINUX=disabled in the /etc/selinux/config file.
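Once RabbitMQ is running on the other node, a quick way to confirm that the clustering ports are reachable is to try opening a TCP connection to each of them. The sketch below does that with Python’s standard library; the function names are made up for this example, and a port also shows up as unreachable when nothing is listening on it yet, not only when a firewall blocks it.

```python
import socket

# Ports named in the text above.
CLUSTER_PORTS = (4369, 5672, 25672)


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def closed_ports(host, ports=CLUSTER_PORTS):
    """Return the ports on host that could not be reached."""
    return [port for port in ports if not port_open(host, port)]
```

For example, calling `closed_ports("192.168.40.193")` from the first node lists the ports on the second node that still need attention; an empty list means all three were reachable.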
Set host names
Doing so later will most probably break the installation.
We’ll name the nodes/VMs as rabbit1 (192.168.40.192) and rabbit2 (192.168.40.193) – the default is localhost.
Continue reading “Rabbitmq Cluster on CentOS 7”
What is the ELK Stack?
ELK is an acronym formed from the first letters of three open-source products from Elastic: Elasticsearch, Logstash, and Kibana. The three products are used together (though each can be used separately), mainly for centralizing and visualizing logs from multiple servers (as many as you want).
- Elasticsearch is basically a distributed NoSQL data store built on Lucene’s search capabilities.
- Logstash is a log collection pipeline tool that accepts inputs from various sources (log forwarder), executes different filtering and formatting, and writes the data to Elasticsearch.
- Kibana is a graphical-user-interface (GUI) for visualization of Elasticsearch data.
The ELK Stack is the most widely used log analytics solution, beating Splunk’s enterprise software, which had long been the market leader. The ELK Stack is downloaded 500,000 times every month, making it the world’s most popular log management platform. In contrast, Splunk — the historical leader in the space — self-reports 10,000 total customers.
This tutorial is a guide to setting up the ELK stack, with Filebeat as the log forwarder, to gather the syslogs of a remote machine (or of as many servers as you want).
Continue reading “Install ELK stack on CentOS 7 to centralize logs analytics”