Open Source Projects That I Rely On To Effectively Do My Job – Part 1

There are a number of things in the open source world without which I do not think I could do my job. I am a web developer. I work on a range of projects using different systems, languages and processes, and I work a lot with WordPress as well.

Many aspects of my work revolve around scanning logs, writing and reading code in a text editor, and browsing the internet. I have my preferred programs for each of those tasks.

This is one of a set of articles that look at a lot of the open source projects I rely on to do my job, and to do it effectively.

Open Source Operating Systems and Server Software

A lot of open source code is enabled by other software, tools, specifications and systems that are also open source. The most obvious enabler is the availability of open source operating systems. These are used on local machines, but they are even more common in the infrastructure that powers systems and services.

Operating Systems

Open source operating systems are only possible because many other pieces of open source software (OSS) can be taken, linked and modified so that they work well together as a whole.

I mainly use Linux: Ubuntu, CentOS, CoreOS and Arch. At the heart of them all is the Linux kernel. All open, all developed in public.

Server Software – Specifically HTTP Servers

Another type of software that I rely on is HTTP servers. These servers handle the requests that clients make and send back responses, returning the rich content we expect on the web today in a user-friendly way.

Two pieces of software dominate the HTTP server space: Apache and NGINX.

I'd guess that 75% or more of all HTTP requests made over the internet are answered by one or the other.

Without both operating systems and HTTP servers being available as open source, I doubt the web would be what it is today. I expect my job might not exist.

PHP & JavaScript

WordPress is primarily written in PHP with many JavaScript components for use in the browser. PHP is itself an open source language and JavaScript is an open specification.

Most of the time, coding for WordPress involves working with plain PHP or JavaScript and then hooking that code into WP with some more code.

MySQL

The application layer of most applications, including WordPress, connects to a data layer that is often a MySQL database. MySQL is another open source project (although its future as one was hotly debated around the time MariaDB was created).

Node

Node is another popular system that I work with a lot. Essentially, it runs JavaScript outside of a browser.

Many people are first introduced to Node through build tools, especially since task runners became more popular. Grunt and Gulp run in Node. If you've ever run an npm install command, you've used Node.

Nginx Reverse Proxy Cache of WordPress on Apache

An NGINX reverse proxy in front of WordPress sites running on Apache is my standard setup for running WP sites. I've got a pretty slick, entirely self-contained setup that runs an NGINX reverse proxy to WP on Apache with PHP7, using Docker to proxy multiple WordPress instances.

Every single shared and managed host I've personally used in the last 10-15 years ran Apache as the default HTTP server. The same goes for every client I've ever had with a shared or managed account. I've only ever once been offered the option of anything different, and even then it was not the default configuration.

NGINX is very capable of doing exactly the same job as Apache, but I see it used more commonly as a proxy. You can also use Apache as a proxy if you want to.

Apache and NGINX are both HTTP servers, and they are pretty interchangeable if all you care about is the end result: a page reaching the requesting user.

Some Key High Level Differences Between Apache and NGINX

Apache is incredibly well supported and used by a huge number of servers. It installs easily and works almost right out of the box. It's modular, works on many systems and is capable of hosting a wide range of sites with relatively minimal configuration.

It's the default HTTP server of choice for so many for a reason: it copes well with most situations and is generally simple to configure.

On the other hand, NGINX has a smaller market share, can be a little trickier to install and get working right, and may require additional setup for particular applications.

It's not as modular (turning on features sometimes requires a complete rebuild from source), but it performs a lot better than untuned Apache installs. It is less memory hungry and handles static content far better than Apache. In comparisons it excels particularly when handling concurrent connections.

Why Put An HTTP Server In Front Of An HTTP Server?

I get asked this by site builders a lot more than I ever thought I would. There are several technical and infrastructure reasons why you may want to do this, as well as performance and privacy reasons. I won't go into great detail about any of them, but I encourage you to Google for more detail if you are intrigued.

There are two simple reasons why I do this, and both relate to separating access to a site from the operation of that site.

  1. Isolating the front-end from the back-end means that I can have specially tweaked configurations and run necessary services spanning multiple host machines, knowing that all of that is transparent to the end user.
  2. The other reason is performance. The front-end does nothing dynamic: it serves only static HTML and other static content provided to it by the backend services. It can manage load balancing and handle service failover. It can also cache many of the resources it holds, which means less dynamic work generating pages and more work actually serving pages once they have been generated.
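Point 2 mentions load balancing and failover at the front-end. As a rough illustration only, here is a minimal Python sketch of the decision a front-end makes: rotate through backends round-robin and skip any that fail a health check. The `BackendPool` class, the backend names and the `is_healthy` callback are all hypothetical; in practice NGINX or another proxy handles this through its own configuration rather than code like this.

```python
from itertools import cycle
from typing import Callable, Iterable


class BackendPool:
    """Round-robin over backends, skipping any that fail a (hypothetical) health check."""

    def __init__(self, backends: Iterable[str], is_healthy: Callable[[str], bool]):
        self._backends = list(backends)
        self._cycle = cycle(self._backends)
        self._is_healthy = is_healthy

    def pick(self) -> str:
        # Try each backend at most once per call, so a fully failed pool
        # raises an error instead of looping forever.
        for _ in range(len(self._backends)):
            candidate = next(self._cycle)
            if self._is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy backend available")


down = {"wp-2"}
pool = BackendPool(["wp-1", "wp-2", "wp-3"], is_healthy=lambda b: b not in down)
print([pool.pick() for _ in range(4)])  # ['wp-1', 'wp-3', 'wp-1', 'wp-3']
```

With `wp-2` marked as down, requests keep flowing to the remaining backends and the end user never sees the failure, which is the transparency described in point 1.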

When To Cache A Site At The Proxy

I cache almost every request to WordPress sites when users are not logged in: images, styles, scripts and the generated HTML. Cache it all, and for a long time.

That is because the kinds of sites I host are almost entirely content-providing sites. They are blogs, service sites and resources. I think most sites fit into that same bucket.

These kinds of sites are not always updated daily, and some posts go days or weeks between comments. Single pages often stay the same for a long time. Homepages and taxonomy pages may need updating more often, but still not so often that they require a freshly generated page every time.

Some Particular Caching Rules and Configs For These Sites

A good baseline config for my kind of sites would follow rules similar to these:

  • Default cache time of 1 month.
  • Default Cache-Control of public.
  • Cache statics, like images and scripts, on first request and cache them for 1 year.
  • Cache HTML only after 2 requests, and pass 5-10% of requests back to the backend to check for an updated page.
  • Allow serving of stale objects and do a refresh check in the background when it occurs.
  • Clear unrequested objects every 7 days.
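To make the list above concrete, here is a minimal Python sketch that writes the rules down as data and picks a lifetime per request path. The `CachePolicy` fields, the `ttl_for` helper and the list of static extensions are illustrative assumptions, not part of any server's configuration format; a real proxy would express the same rules in its own config.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class CachePolicy:
    """The baseline rules above written down as data (field names are illustrative)."""
    default_ttl: timedelta = timedelta(days=30)     # default cache time of 1 month
    visibility: str = "public"                      # default Cache-Control of public
    static_ttl: timedelta = timedelta(days=365)     # statics: cache on first request for 1 year
    html_uncached_requests: int = 2                 # HTML: let 2 requests through before caching
    html_revalidate_share: float = 0.05             # HTML: pass 5-10% of hits back to the backend
    serve_stale: bool = True                        # serve stale objects, refresh in the background
    inactive_purge: timedelta = timedelta(days=7)   # clear objects not requested for 7 days


# Hypothetical set of "static" extensions; adjust for your own sites.
STATIC_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp", ".css", ".js", ".woff2")


def ttl_for(path: str, policy: CachePolicy = CachePolicy()) -> timedelta:
    """Pick a cache lifetime for a request path under the given policy."""
    # WordPress often appends ?ver=... to asset URLs, so ignore the query string.
    bare_path = path.split("?", 1)[0].lower()
    if bare_path.endswith(STATIC_EXTENSIONS):
        return policy.static_ttl
    return policy.default_ttl


print(ttl_for("/wp-content/uploads/logo.png").days)                     # 365
print(ttl_for("/wp-includes/js/jquery/jquery.min.js?ver=3.7.1").days)   # 365
print(ttl_for("/about-us/").days)                                       # 30
```

Keeping the rules in one place like this makes it easy to tweak a single value, such as the revalidation share, per site.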

A long default cache lifetime is a good place to start. In some instances I'd even default to 1 year, though 1 month is appropriate in more cases.

Setting the cache type to public means that not just browsers but also shared caches sitting between the request and the response, such as proxies and CDNs, are allowed to cache the content.
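As a small illustration of what those settings look like on the wire, here is a Python sketch that builds a `Cache-Control` header value from a lifetime. The `cache_control` helper is hypothetical; `public` and `max-age` (in seconds) are standard `Cache-Control` directives.

```python
from datetime import timedelta


def cache_control(ttl: timedelta, public: bool = True) -> str:
    """Build a Cache-Control header value for a given lifetime."""
    # "public" lets shared caches (proxies, CDNs) store the response, not just the browser.
    visibility = "public" if public else "private"
    return f"{visibility}, max-age={int(ttl.total_seconds())}"


print(cache_control(timedelta(days=30)))   # public, max-age=2592000
print(cache_control(timedelta(days=365)))  # public, max-age=31536000
```

One month (taken as 30 days) works out to 2,592,000 seconds and one year to 31,536,000 seconds.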

Static resources are unlikely to ever change, so they get long cache lifetimes. Some single pages may have content that never changes, but the markup can still differ from time to time: maybe there's a widget of latest articles or comments that outputs a new item every now and again.

Because of that, you should send some of the requests to the backend to check for an updated page. You can tweak the percentage depending on how much traffic you have and how dynamic the pages are.
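A minimal Python sketch of that pass-back decision follows, assuming the proxy can run a per-request check on cache hits. The `should_revalidate` name is hypothetical, and a real proxy would need its own mechanism for this; the sketch only shows the sampling logic.

```python
import random


def should_revalidate(share: float, rng: random.Random) -> bool:
    """Return True for roughly `share` of cache hits, which then go to the backend instead."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return rng.random() < share


rng = random.Random(42)  # seeded only so the run is repeatable
hits = 10_000
passed_back = sum(should_revalidate(0.05, rng) for _ in range(hits))
print(f"{passed_back} of {hits} hits went to the backend")  # expect a value near 500
```

Raising `share` towards 0.10 suits busier or more dynamic pages; lowering it keeps more traffic off the backend.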

HTML is not cached on the first 2 requests because the backend sometimes does its own caching and optimizations, which take 1 or 2 requests to start showing. We should let the backend have some requests to prime its cache, so that when the page is cached at the proxy it is the fully optimized version.
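Here is a minimal Python sketch of that warm-up rule, assuming a simple per-URL request counter; the `WarmUpGate` name is hypothetical. NGINX exposes a similar idea through its `proxy_cache_min_uses` directive, so check its documentation for the exact counting behaviour before relying on a particular value.

```python
from collections import Counter


class WarmUpGate:
    """Decide whether a response may be stored, letting the first N requests per URL skip the cache."""

    def __init__(self, uncached_requests: int = 2):
        self.uncached_requests = uncached_requests
        self._seen = Counter()  # request count per URL

    def should_store(self, url: str) -> bool:
        # Count this request, then only allow storing once the backend has had its warm-up requests.
        self._seen[url] += 1
        return self._seen[url] > self.uncached_requests


gate = WarmUpGate(uncached_requests=2)
print([gate.should_store("/hello-world/") for _ in range(4)])  # [False, False, True, True]
```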

Serving stale objects while grabbing new ones from the backend helps ensure that as many requests as possible are served from the cache. If the backend object hasn't changed, the cached copy just has its date updated; if it has changed, the cache is updated with the new item.
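A minimal Python sketch of serve-stale-then-refresh follows. It assumes a `fetch` callable standing in for the backend and models the background worker as an explicit `run_background_refresh()` call so the behaviour is easy to follow; both names are hypothetical. In NGINX this behaviour is typically configured with `proxy_cache_use_stale updating` and `proxy_cache_background_update on`.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Entry:
    body: str
    stored_at: float


class StaleWhileRevalidateCache:
    """Serve whatever is cached, even if expired, and queue expired keys for a background refresh."""

    def __init__(self, ttl: float, fetch: Callable[[str], str],
                 clock: Callable[[], float] = time.time):
        self.ttl = ttl
        self.fetch = fetch  # hypothetical call to the backend
        self.clock = clock
        self._entries: Dict[str, Entry] = {}
        self._pending: List[str] = []

    def get(self, key: str) -> str:
        entry = self._entries.get(key)
        if entry is None:
            # Nothing cached yet: this request has to wait for the backend.
            return self._store(key)
        if self.clock() - entry.stored_at > self.ttl and key not in self._pending:
            # Expired: answer from the stale copy now and refresh later.
            self._pending.append(key)
        return entry.body

    def run_background_refresh(self) -> None:
        """Stand-in for a background worker: re-fetch every key that was served stale."""
        while self._pending:
            self._store(self._pending.pop(0))

    def _store(self, key: str) -> str:
        # Whether or not the body changed, the entry gets a fresh timestamp.
        body = self.fetch(key)
        self._entries[key] = Entry(body, self.clock())
        return body
```

The visitor who hits an expired entry still gets an instant (stale) answer, and only the refresh itself waits on the backend.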

Every so often, clearing out cached items that have not been requested helps keep the total size of the cache down.
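Here is a minimal Python sketch of that cleanup, assuming the cache keeps a last-requested timestamp per key; `purge_inactive` is a hypothetical helper. NGINX provides this through the `inactive=` parameter of `proxy_cache_path`, which removes items not accessed within that time regardless of freshness.

```python
from typing import Dict

SEVEN_DAYS = 7 * 24 * 60 * 60  # 604800 seconds


def purge_inactive(last_requested: Dict[str, float], now: float,
                   max_idle: float = SEVEN_DAYS) -> Dict[str, float]:
    """Keep only the cache keys that were requested within the last `max_idle` seconds."""
    return {key: ts for key, ts in last_requested.items() if now - ts <= max_idle}


# Timestamps are in seconds; "/old-post/" was last requested more than 7 days before `now`.
index = {"/old-post/": 0.0, "/home/": 700_000.0}
print(purge_inactive(index, now=800_000.0))  # {'/home/': 700000.0}
```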