CodeQL documentation

Uncontrolled data used in path expression

ID: py/path-injection
Kind: path-problem
Security severity: 7.5
Severity: error
Precision: high
Tags:
   - correctness
   - security
   - external/cwe/cwe-022
   - external/cwe/cwe-023
   - external/cwe/cwe-036
   - external/cwe/cwe-073
   - external/cwe/cwe-099
Query suites:
   - python-code-scanning.qls
   - python-security-extended.qls
   - python-security-and-quality.qls


Accessing files using paths constructed from user-controlled data can allow an attacker to access unexpected resources. This can result in sensitive information being revealed or deleted, or an attacker being able to influence behavior by modifying unexpected files.

Recommendation

Validate user input before using it to construct a file path, either using an off-the-shelf library function like werkzeug.utils.secure_filename, or by performing custom validation.

Ideally, follow these rules:

  • Do not allow more than a single “.” character.

  • Do not allow directory separators such as “/” or “\” (depending on the file system).

  • Do not rely on simply replacing problematic sequences such as “../”. For example, after applying this filter to “.../...//”, the resulting string would still be “../”.

  • Use an allowlist of known good patterns.
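The rules above can be sketched as a small custom validator. This is a minimal illustration, not a complete defense: the function names `is_safe_filename` and `naive_strip` and the allowlist pattern are hypothetical, chosen only for this example, and a real application would adapt the pattern to the file names it actually expects.

```python
import re

# Hypothetical allowlist: letters, digits, "_" and "-", plus one optional extension.
_ALLOWED_NAME = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?")


def is_safe_filename(name: str) -> bool:
    """Return True only if `name` passes every rule in the list above."""
    # Rule 1: no more than a single "." character.
    if name.count(".") > 1:
        return False
    # Rule 2: no directory separators of either style.
    if "/" in name or "\\" in name:
        return False
    # Rule 4: the whole name must match the allowlist pattern.
    return _ALLOWED_NAME.fullmatch(name) is not None


def naive_strip(name: str) -> str:
    """Rule 3 counter-example: replacing "../" once is not sufficient."""
    return name.replace("../", "")


print(is_safe_filename("cat.png"))        # True
print(is_safe_filename("../etc/passwd"))  # False
print(naive_strip(".../...//"))           # ../
```

Rejecting input that fails validation is generally easier to reason about than trying to repair it, which is why the sketch returns a boolean instead of a cleaned-up name.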

Example

In the first example, a file name is read from an HTTP request and then used to access a file. However, a malicious user could enter a file name that is an absolute path, such as "/etc/passwd".

In the second example, it appears that the user is restricted to opening a file within the "/server/static/images" directory. However, a malicious user could enter a file name containing special characters. For example, the string "../../../etc/passwd" will result in the code reading the file located at "/server/static/images/../../../etc/passwd", which is the system’s password file. This file would then be sent back to the user, giving them access to the account information it contains. Note that a user could also use an absolute path here, since the result of os.path.join("/server/static/images/","/etc/passwd") is "/etc/passwd".
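Both behaviors described for the second example can be checked directly. The short sketch below uses `posixpath` (the POSIX implementation behind `os.path` on Unix-like systems) so that the results do not depend on the platform running it; the variable names are illustrative only.

```python
import posixpath

base_path = "/server/static/images/"

# A relative path with ".." segments climbs out of the base directory.
traversal = posixpath.join(base_path, "../../../etc/passwd")
print(traversal)                      # /server/static/images/../../../etc/passwd
print(posixpath.normpath(traversal))  # /etc/passwd

# An absolute second argument makes join() discard the base entirely.
absolute = posixpath.join(base_path, "/etc/passwd")
print(absolute)                       # /etc/passwd
```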

In the third example, the path used to access the file system is normalized before being checked against a known prefix. This ensures that regardless of the user input, the resulting path is safe.

import os.path
from flask import Flask, request, abort

app = Flask(__name__)

@app.route("/user_picture1")
def user_picture1():
    filename = request.args.get('p')
    # BAD: This could read any file on the file system
    data = open(filename, 'rb').read()
    return data

@app.route("/user_picture2")
def user_picture2():
    base_path = '/server/static/images'
    filename = request.args.get('p')
    # BAD: This could still read any file on the file system
    data = open(os.path.join(base_path, filename), 'rb').read()
    return data

@app.route("/user_picture3")
def user_picture3():
    base_path = '/server/static/images'
    filename = request.args.get('p')
    #GOOD -- Verify with normalised version of path
    fullpath = os.path.normpath(os.path.join(base_path, filename))
    if not fullpath.startswith(base_path):
        raise Exception("not allowed")
    data = open(fullpath, 'rb').read()
    return data
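One caveat about the prefix check in the third example: a plain string comparison also accepts sibling directories whose names merely begin with the base path, such as "/server/static/images_private". A possible tightening, shown here only as a sketch, compares whole path components with `os.path.commonpath`. The helper name `resolve_under` is hypothetical, `posixpath` is used so the result is platform-independent, and symbolic links are not resolved (a real application might additionally apply `os.path.realpath`).

```python
import posixpath


def resolve_under(base_path: str, user_supplied: str) -> str:
    """Join and normalize, then require the result to stay inside base_path.

    Hypothetical helper for illustration; base_path must be absolute.
    """
    base = posixpath.normpath(base_path)
    full = posixpath.normpath(posixpath.join(base, user_supplied))
    # commonpath compares whole components, so "images_private" != "images".
    if posixpath.commonpath([base, full]) != base:
        raise ValueError("not allowed")
    return full


base = "/server/static/images"
print(resolve_under(base, "cat.png"))  # /server/static/images/cat.png

# The sibling directory passes a plain startswith() check...
sibling = posixpath.normpath(posixpath.join(base, "../images_private/x"))
print(sibling.startswith(base))        # True

# ...but is rejected by the component-wise comparison.
try:
    resolve_under(base, "../images_private/x")
except ValueError as exc:
    print(exc)                         # not allowed
```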

