4
\$\begingroup\$

Downloading a specific version such as 3.11.1 from https://www.python.org/ftp/python/3.11.1/Python-3.11.1.tgz and building it with ./configure --enable-optimizations && make install is slow (30 to 40 minutes) on GitHub Actions.

Dockerfile:

    FROM --platform=linux/amd64 ubuntu:22.04 as base
    USER root
    ENV PYTHONDONTWRITEBYTECODE 1
    ENV PYTHONUNBUFFERED 1
    ENV DEBIAN_FRONTEND noninteractive
    COPY . /app
    WORKDIR /app
    COPY ZscalerCertificate.crt /usr/local/share/ca-certificates/ZscalerCertificate.crt
    RUN find /tmp -name \*.deb -exec rm {} +
    RUN apt-get update && \
        apt-get upgrade -y && \
        apt-get install -y software-properties-common ca-certificates &&\
        update-ca-certificates
    RUN apt-get update &&\
        apt-get upgrade -y && \
        apt-get install -y --no-install-recommends curl gcc g++ gnupg unixodbc-dev openssl git &&\
        rm -rf /var/lib/apt/lists/*
    RUN apt-get update && apt-get upgrade
    RUN apt-get install -y build-essential zlib1g-dev libncurses5-dev libgdbm-dev libssl-dev \
        libreadline-dev libffi-dev wget libbz2-dev libsqlite3-dev
    RUN mkdir /python && cd /python
    RUN wget https://www.python.org/ftp/python/3.11.1/Python-3.11.1.tgz
    RUN tar -zxvf Python-3.11.1.tgz
    RUN cd Python-3.11.1 && ls -lhR && ./configure --enable-optimizations && make install

What is the optimal way to install a specific Python version, and how can I reduce the build time?

\$\endgroup\$
3
  • 2
    \$\begingroup\$Do you absolutely need to build it yourself? There are utilities like pyenv that do a better job of managing multiple versions and have download and build included.\$\endgroup\$ Commented Aug 17, 2023 at 14:49
  • \$\begingroup\$@Reinderien Not really, as long as I can get a specific Python version. I need it because our security scanning tool (whitesource/mend) is showing some critical vulnerabilities in the latest version of Python.\$\endgroup\$ Commented Aug 17, 2023 at 15:28
  • \$\begingroup\$Use make -j with your number of cores for parallelism.\$\endgroup\$
    – qwr
    Commented Aug 18, 2023 at 0:57

2 Answers

5
\$\begingroup\$

This is a bad layer:

RUN apt-get update && apt-get upgrade 

That locks in the package lists permanently in a layer. Worse, it means changes to the next layer may use a cached version of this layer instead of getting an update. That could mean the apt-get install tries to get packages no longer in the archive.

Combine it with the install command and post-install size reduction, as you did for the earlier package installs.

Each of those ought to apt-get clean, too, to remove the contents of /var/cache/apt/archives, which tend to be large.
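
As a rough sketch of what that combined layer could look like: the package list is the one from the question, while the `ENV` form, the added `-y` on `apt-get upgrade`, and the placement of `apt-get clean` are assumptions rather than anything the original Dockerfile contains.

```dockerfile
FROM ubuntu:22.04
ENV DEBIAN_FRONTEND=noninteractive

# Refresh the package index, upgrade, install, and clean up in ONE layer,
# so a stale index is never cached separately from the install that uses it.
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y build-essential zlib1g-dev libncurses5-dev libgdbm-dev libssl-dev \
        libreadline-dev libffi-dev wget libbz2-dev libsqlite3-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
```

Because the index refresh and the install share a layer, editing the package list invalidates both together and forces a fresh `apt-get update`.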


Similarly, this one leaves the large archive lying around:

wget https://www.python.org/ftp/python/3.11.1/Python-3.11.1.tgz 

We should be immediately unpacking then removing it (all in a single layer, to keep the container image reasonably small). Preferably, we should unpack, build and clean, keeping just the installed result.
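
A minimal sketch of such a single build layer follows, assuming the build dependencies from the question. The `make -j"$(nproc)"` step follows the parallelism suggestion from the comments on the question; dropping `ls -lhR` and the verbose `tar` flag is my own choice to keep the log short.

```dockerfile
FROM ubuntu:22.04
ENV DEBIAN_FRONTEND=noninteractive

# Build dependencies, installed and cleaned up in one layer.
RUN apt-get update && \
    apt-get install -y build-essential ca-certificates zlib1g-dev libncurses5-dev \
        libgdbm-dev libssl-dev libreadline-dev libffi-dev wget libbz2-dev libsqlite3-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Download, unpack, build, install, and delete the sources in a single layer,
# so neither the tarball nor the build tree ends up in the image.
RUN mkdir /python && cd /python && \
    wget https://www.python.org/ftp/python/3.11.1/Python-3.11.1.tgz && \
    tar -zxf Python-3.11.1.tgz && \
    cd Python-3.11.1 && \
    ./configure --enable-optimizations && \
    make -j"$(nproc)" && \
    make install && \
    cd / && \
    rm -rf /python
```

Only the files written by `make install` survive in the layer; everything under `/python` is removed before the layer is committed.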

\$\endgroup\$
5
  • \$\begingroup\$For my understanding: isn't "changes to the next layer may use a cached version of this layer instead of getting an update" a feature rather than a bug? If a layer does not have caching as its purpose, what is it for?\$\endgroup\$ Commented Aug 17, 2023 at 15:14
  • \$\begingroup\$Thanks for providing the review. I learned some new things. Any suggestions on improving the build time?\$\endgroup\$ Commented Aug 17, 2023 at 15:32
  • \$\begingroup\$I think you're unlikely to improve the build time, but reducing the container size may give you speed and cost improvements by transferring less data. The trouble with using cached package indexes is that when the package repositories are updated, we'll be using old lists of what's available (i.e. the cache is stale). So we always want apt-get update in the same layer as apt-get install.\$\endgroup\$ Commented Aug 17, 2023 at 15:51
  • \$\begingroup\$I think you might be able to declare /var/cache as a volume, so that it's never included in any layers. I haven't tried that though - perhaps someone else can verify whether that's a good practice?\$\endgroup\$ Commented Aug 17, 2023 at 15:52
  • \$\begingroup\$An update on the solution: the image size was reduced by 100 MB, with no change in build time.\$\endgroup\$ Commented Aug 18, 2023 at 14:13
2
\$\begingroup\$

There are a couple of optimisations you can do here.

First and foremost, you should re-order your commands so that the earlier layers are the least likely to change and the later layers are the ones most likely to change. In general, you should install your dependencies before COPY-ing the files that belong to the application that will run on top of them. This way, you take better advantage of caching. COPY . makes a very volatile layer, as a change to any file invalidates all of the subsequent layers. If you really have to do that, do it as close as possible to the final step.

    FROM --platform=linux/amd64 ubuntu:22.04 as base
    USER root
    ENV PYTHONDONTWRITEBYTECODE 1
    ENV PYTHONUNBUFFERED 1
    ENV DEBIAN_FRONTEND noninteractive
    # is this step really necessary? there shouldn't be anything in /tmp
    # RUN find /tmp -name \*.deb -exec rm {} +
    RUN apt-get update && \
        apt-get upgrade -y && \
        apt-get install -y software-properties-common ca-certificates &&\
        update-ca-certificates
    RUN apt-get update &&\
        apt-get upgrade -y && \
        apt-get install -y --no-install-recommends curl gcc g++ gnupg unixodbc-dev openssl git &&\
        rm -rf /var/lib/apt/lists/*
    RUN apt-get update && apt-get upgrade
    RUN apt-get install -y build-essential zlib1g-dev libncurses5-dev libgdbm-dev libssl-dev \
        libreadline-dev libffi-dev wget libbz2-dev libsqlite3-dev
    RUN mkdir /python && cd /python
    RUN wget https://www.python.org/ftp/python/3.11.1/Python-3.11.1.tgz
    RUN tar -zxvf Python-3.11.1.tgz
    RUN cd Python-3.11.1 && ls -lhR && ./configure --enable-optimizations && make install
    COPY . /app
    WORKDIR /app
    COPY ZscalerCertificate.crt /usr/local/share/ca-certificates/ZscalerCertificate.crt

That way, merely changing the Dockerfile or the certificate wouldn't require reinstalling and recompiling Python from scratch.

Second, to minimize image sizes, avoid creating layers with unnecessary cached files:

    FROM --platform=linux/amd64 ubuntu:22.04 as base
    USER root
    ENV PYTHONDONTWRITEBYTECODE 1
    ENV PYTHONUNBUFFERED 1
    ENV DEBIAN_FRONTEND noninteractive
    # is this step really necessary? there shouldn't be anything in /tmp
    # RUN find /tmp -name \*.deb -exec rm {} +
    RUN apt-get update &&\
        apt-get upgrade -y && \
        apt-get install -y --no-install-recommends curl gcc g++ gnupg unixodbc-dev openssl git &&\
        apt-get install -y software-properties-common ca-certificates &&\
        apt-get install -y build-essential zlib1g-dev libncurses5-dev libgdbm-dev libssl-dev libreadline-dev libffi-dev wget libbz2-dev libsqlite3-dev && \
        update-ca-certificates && \
        rm -rf /var/lib/apt/lists/*
    RUN mkdir /python && cd /python && \
        wget https://www.python.org/ftp/python/3.11.1/Python-3.11.1.tgz && \
        tar -zxvf Python-3.11.1.tgz && \
        cd Python-3.11.1 && \
        ls -lhR && \
        ./configure --enable-optimizations && \
        make install && \
        rm -rf /python
    COPY . /app
    WORKDIR /app
    COPY ZscalerCertificate.crt /

Third, you can go further than this and slim down the image by removing unnecessary apt-get dependencies as well. There are two approaches to this:

  1. Use a multi-stage build, with COPY --from to copy just the Python files that you need onto a fresh image.
  2. Or do what the official Debian slim Python image does, which is to use apt-mark to uninstall unnecessary packages. This needs to happen in the same layer as the entire Python compile and install step, to avoid creating bloated intermediate layers.
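
For the first approach, here is a hedged sketch of a multi-stage build. Installing into a dedicated prefix (`/opt/python`, a hypothetical path chosen for this example) is a swapped-in simplification that makes the set of files to copy obvious; it is not the only way to find them. The stage name `builder` and the list of runtime libraries in the final stage are also assumptions and may need adjusting to the standard-library modules you actually use.

```dockerfile
# Stage 1: compile Python with the full toolchain.
FROM ubuntu:22.04 AS builder
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
    apt-get install -y build-essential ca-certificates wget zlib1g-dev libncurses5-dev \
        libgdbm-dev libssl-dev libreadline-dev libffi-dev libbz2-dev libsqlite3-dev && \
    rm -rf /var/lib/apt/lists/*
# /opt/python is a hypothetical prefix so the whole install lives in one directory.
RUN mkdir /python && cd /python && \
    wget https://www.python.org/ftp/python/3.11.1/Python-3.11.1.tgz && \
    tar -zxf Python-3.11.1.tgz && \
    cd Python-3.11.1 && \
    ./configure --enable-optimizations --prefix=/opt/python && \
    make -j"$(nproc)" && \
    make install

# Stage 2: fresh image with only runtime libraries and the installed tree.
FROM ubuntu:22.04
ENV DEBIAN_FRONTEND=noninteractive
# Assumed runtime counterparts of the -dev packages used in the builder stage.
RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates libssl3 libffi8 zlib1g \
        libbz2-1.0 libsqlite3-0 libreadline8 libncursesw6 libgdbm6 && \
    rm -rf /var/lib/apt/lists/*
# Copy only the installed Python tree; compilers and sources stay behind in stage 1.
COPY --from=builder /opt/python /opt/python
ENV PATH="/opt/python/bin:${PATH}"
COPY . /app
WORKDIR /app
```

With this layout the interpreter is available as `python3` in the final image, and the compiler toolchain, headers, and source tree never reach it.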
\$\endgroup\$
2
  • \$\begingroup\$Thanks for the details. Can you add more details on the multi-stage build? What are the paths? From which path do the Python executables need to be copied, and in which path should they be stored in the final stage?\$\endgroup\$ Commented Aug 18, 2023 at 5:55
  • \$\begingroup\$@Pythoncoder you can use docker diff to find the list of files that changed since the container was created from the image. You can run make as a Dockerfile step, then run make install, look at the docker diff, and use that list of files. Alternatively, make a deb package so that everything you need to COPY is in a single self-contained file (though this doubles the image diff size).\$\endgroup\$
    – Lie Ryan
    Commented Aug 19, 2023 at 0:07
