Open Source Projects That I Rely On To Effectively Do My Job – Part 3

There are a number of things that exist in the open source world without which I do not think I could do my job. I am a Web Developer. I work on a range of projects using different systems, languages and processes, and I work a lot with WordPress as well.

Many aspects of my work revolve around scanning logs, writing and reading code in a text editor and browsing the internet. I have my preferred programs for doing each of those tasks.

This is a set of articles that looks at a lot of the open source projects that I rely on to do my job and do it effectively.

Online Applications

Some of the tools I use are online services or applications. In the open source world people build things and they share them. Since I am in the web development sphere, a lot of the people in my circles build online software.

Online software is convenient because it is more portable and often accessible from a variety of devices. A lot of online services are powered by open source software (and that's not counting the underlying OS or the fact that it is probably Apache or NGINX responding to people's browsers).

WordPress

A lot of the work that I do relates back to WordPress in some way. It powers a huge amount of the publicly accessible internet. Sometimes I build for WP or extend it, other times I build things to work alongside it. Sometimes I just build server stacks capable of running it.

If WordPress was closed source, or did not exist, a good portion of my work would not come in.

GitHub – And Git

GitHub is a giant when it comes to source code management. GitHub manages code using an underlying piece of software called Git, which was started by Linus Torvalds, the same man who started the Linux kernel.

GitHub itself is not an open source application. I can't download a copy of it and run my own private version of it (but you can have private instances set up and managed by them, either hosted in the cloud or in-house). It is powered by open source software and values open source greatly. Most projects hosted there are under some kind of open licence.

Other Online Git Services – BitBucket, GitLab

There are other repository hosts available; Bitbucket and GitLab are both good choices.

GitLab is an online service where you can host your code as well, but it is open source software too. You can download it and run it on your own server, managed by yourself. It is extremely full featured – offering much of the same as GitHub and Bitbucket – as well as a lot of integrated CI and tooling.

Communications – Slack

Even talking to yourself can be useful at times, but communication is better when more people can be involved and the conversations can be archived and searched. Slack lets that happen. It's actually not an open source project as such, but a tool for communication that isn't email is essential when working online with others.

Conversations happen in Chat Rooms. Slack provides nice rooms to have those conversations.

NGINX Reverse Proxy Cache of WordPress on Apache

An NGINX reverse proxy in front of WordPress sites running on Apache is my standard setup for running WP sites. I've got a pretty slick, entirely self-contained setup – an NGINX reverse proxy to WP on Apache with PHP7, using Docker to proxy multiple WordPress instances.

Every single shared and managed host I've personally used in the last 10-15 years ran Apache as the default HTTP server, as has every client I've ever had with a shared or managed account. I've only ever once been offered the option of anything different, and it was not the default configuration.

NGINX is very capable of doing the exact same job as Apache, but I see it used more commonly as a proxy. You can also use Apache as a proxy if you want to.

Apache and NGINX are both HTTP servers, and they are pretty interchangeable if all you are interested in is the end result of a page reaching the requesting user.

Some Key High Level Differences Between Apache and NGINX

Apache is incredibly well supported and used by a huge number of servers. It can be installed and works almost right out of the box. It's modular, works on many systems and is capable of hosting a wide range of sites with relatively minimal configuration.

It's the default http server of choice for so many for a reason – it copes well with most situations and is generally simple to configure.

On the other hand, NGINX has a smaller market share, can be a little trickier to install and make work right, and may require additional setup for particular applications.

It's not as modular (turning on features sometimes requires a complete rebuild from source) but it performs a lot better than non-tuned Apache installs. It is less memory hungry and handles static content way better than Apache. In comparisons it excels particularly when handling concurrent connections.

Why Put An HTTP Server In Front Of An HTTP Server?

I get asked this by site builders a lot more than I ever thought I would. There are several technical and infrastructure reasons why you may want to do this, as well as performance and privacy reasons. I won't go into great detail about any of them, but I encourage you to Google for more detail if you are intrigued.

There are two simple reasons why I do this, both related to separating the access to a site from the operation of a site.

  1. Isolating the front-end from the back-end means that I can have specially tweaked configurations, run the necessary services across multiple host machines and know that all of that is transparent to the end user.
  2. The other reason is performance based. The front-end does nothing dynamic; it serves only static HTML and other static content that it is provided by the backend services. It can manage load balancing and handle service failover. It can cache many of the resources it has – this results in less dynamic work generating pages and more work actually serving the pages once they have been generated. A minimal sketch of such a front-end follows.
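
As a rough sketch of that separation (the backend address apache:8080 – which could be a Docker service name – the server name and the port are placeholder assumptions, not my exact setup):

    server {
        listen 80;
        server_name example.com;

        location / {
            # Hand every request to the Apache + PHP backend.
            proxy_pass http://apache:8080;

            # Forward the original host and client details so WordPress
            # generates correct URLs and logs the real visitor address.
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }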

When To Cache A Site At The Proxy

I cache almost every request to WordPress sites when users are not logged in: images, styles and scripts, the generated HTML. Cache it all, and for a long time.

That is because the kinds of sites I host are almost completely content-providing sites. They are blogs, service sites and resources. I think most sites fit into that same bucket.

These kinds of sites are not always updated daily, and comments on some posts arrive days or weeks apart. Single pages often stay the same for a long time; homepages and taxonomy pages may need updating more often, but still not so often as to require a freshly generated page every time.
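
To sketch how the "not logged in" condition can be handled at the proxy: WordPress sets a wordpress_logged_in_* cookie on login, so NGINX can be told to skip its cache whenever one is present. The variable name here is a placeholder:

    # In the http context: flag requests carrying WordPress login or
    # comment-author cookies.
    map $http_cookie $skip_cache {
        default                 0;
        "~*wordpress_logged_in" 1;
        "~*comment_author"      1;
    }

    # In the site's location block:
    proxy_cache_bypass $skip_cache;  # answer flagged requests from the backend
    proxy_no_cache     $skip_cache;  # and don't store their responses either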

Some Particular Caching Rules and Configs For These Sites

A good baseline config for my kind of sites would follow rules similar to these (sketched as NGINX directives after the list):

  • Default cache time of 1 month.
  • Default cache pragma of public.
  • Cache statics, like images and scripts, on first request – cache for 1 year. 
  • Cache html only after 2 requests, pass back 5-10% of requests to backend to check for updated page.
  • Allow serving of stale objects and do a refresh check in the background when it occurs.
  • Clear unrequested objects every 7 days.
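
A minimal sketch of how that baseline might map onto NGINX directives, assuming a cache zone called wpcache and a backend at apache:8080 (both placeholders); the revalidation percentage, the HTML warm-up rule and stale serving are covered separately below:

    # Cache storage: objects not requested for 7 days are pruned (inactive=7d).
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=wpcache:50m
                     max_size=5g inactive=7d use_temp_path=off;

    server {
        listen 80;
        server_name example.com;

        location / {
            proxy_pass http://apache:8080;
            proxy_cache wpcache;
            proxy_cache_valid 200 301 302 30d;                   # default: 1 month
            add_header Cache-Control "public, max-age=2592000";  # public cache type
        }

        # Statics: cache on the first request and keep for a year.
        location ~* \.(?:css|js|png|jpe?g|gif|svg|ico|woff2?)$ {
            proxy_pass http://apache:8080;
            proxy_cache wpcache;
            proxy_cache_valid 200 1y;
            proxy_cache_min_uses 1;
            add_header Cache-Control "public, max-age=31536000";
        }
    }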

A long default cache lifetime is a good starting point; I'd even default to 1 year in some instances. 1 month is more appropriate for most cases though.

Setting the cache type to public means that not just browsers will cache the response, but also other services sitting between the request and the response.

Static resources are unlikely to ever change, so these items get long cache lifetimes. Some single pages may have content that doesn't ever change, but the markup can still differ sometimes – maybe there's a widget of latest articles or comments that would output a new item every now and again.

Because of that you should send some of the requests to the backend to check for an updated page. Depending on how much traffic you have and how dynamic the pages are, you can tweak the percentage.
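
One hedged way of doing that in NGINX is split_clients, which buckets requests by hashing a key; here roughly 5% of requests skip the cache, and the fresh responses they fetch replace the cached copies:

    # In the http context: mark ~5% of requests for revalidation. Mixing
    # the URI with the current time into the key spreads the picks around.
    split_clients "${request_uri}${msec}" $revalidate {
        5%  1;
        *   "";
    }

    # In the site's location block: bypassed requests go to the backend and
    # the response they bring back refreshes the cached object.
    proxy_cache_bypass $revalidate;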

The reason that HTML is set not to be cached on the first 2 requests is that the backend sometimes does its own caching and optimizations that require 1 or 2 requests to start showing. We should let the backend have some requests to prime its cache, so that when the page is cached at the proxy it is the fully optimized version being cached.
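
In NGINX this is a single directive; the exact number is a judgement call on how many warm-up requests the backend needs:

    # Let the first couple of requests pass straight through and warm the
    # backend's own caches; a later response is the one stored at the proxy.
    proxy_cache_min_uses 3;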

Serving stale objects while grabbing new ones from the backend helps to ensure that as many requests as possible are served from cache. If the backend object hasn't changed then the cached copy just has its date refreshed, but if it has been updated then the cache is updated with the new item.
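
In NGINX the stale-serving behaviour is a pair of directives (the background update needs NGINX 1.11.10 or newer):

    # Serve a stale copy while a fresh one is being fetched, or when the
    # backend is erroring or timing out...
    proxy_cache_use_stale updating error timeout;
    # ...and do the refresh fetch in the background instead of making one
    # unlucky visitor wait for it.
    proxy_cache_background_update on;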

Clearing out cached items that were never requested every so often helps to keep the total size of the cache down.