<!DOCTYPE html>
<htmlclass="writer-html5" lang="en" >
<head>
<metacharset="utf-8" /><metacontent="Topic: Working with paths and files, Difficulty: Medium, Category: Section" name="description" />
<metacontent="open file, read file, pathlib, join directory, context manager, close file, rb, binary file, utf-8, encoding, pickle, numpy, load, archive, npy, npz, pkl, glob, read lines, write, save" name="keywords" />
<metaname="viewport" content="width=device-width, initial-scale=1.0" />
<title>Working with Files — Python Like You Mean It</title>
<linkrel="stylesheet" href="../_static/pygments.css" type="text/css" />
<linkrel="stylesheet" href="../_static/css/theme.css" type="text/css" />
<linkrel="stylesheet" href="../_static/my_theme.css" type="text/css" />
<!--[if lt IE 9]>
<script src="../_static/js/html5shiv.min.js"></script>
<![endif]-->
<scriptdata-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
<scriptsrc="../_static/jquery.js"></script>
<scriptsrc="../_static/underscore.js"></script>
<scriptsrc="../_static/doctools.js"></script>
<scriptasync="async" src="https://www.googletagmanager.com/gtag/js?id=UA-115029372-1"></script>
<scriptsrc="../_static/gtag.js"></script>
<scriptcrossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
<script>window.MathJax={"tex": {"inlineMath": [["$","$"],["\\(","\\)"]],"processEscapes": true},"options": {"ignoreHtmlClass": "tex2jax_ignore|mathjax_ignore|document","processHtmlClass": "tex2jax_process|mathjax_process|math|output_area"}}</script>
<scriptdefer="defer" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
<scriptsrc="../_static/js/theme.js"></script>
<linkrel="index" title="Index" href="../genindex.html" />
<linkrel="search" title="Search" href="../search.html" />
<linkrel="next" title="Import: Modules and Packages" href="Modules_and_Packages.html" />
<linkrel="prev" title="Matplotlib" href="Matplotlib.html" />
</head>
<bodyclass="wy-body-for-nav">
<divclass="wy-grid-for-nav">
<navdata-toggle="wy-nav-shift" class="wy-nav-side">
<divclass="wy-side-scroll">
<divclass="wy-side-nav-search" >
<ahref="../index.html" class="icon icon-home"> Python Like You Mean It
</a>
<divclass="version">
1.4
</div>
<divrole="search">
<formid="rtd-search-form" class="wy-form" action="../search.html" method="get">
<inputtype="text" name="q" placeholder="Search docs" />
<inputtype="hidden" name="check_keywords" value="yes" />
<inputtype="hidden" name="area" value="default" />
</form>
</div>
</div><divclass="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<pclass="caption" role="heading"><spanclass="caption-text">Table of Contents:</span></p>
<ulclass="current">
<liclass="toctree-l1"><aclass="reference internal" href="../intro.html">Python Like You Mean It</a></li>
<liclass="toctree-l1"><aclass="reference internal" href="../module_1.html">Module 1: Getting Started with Python</a></li>
<liclass="toctree-l1"><aclass="reference internal" href="../module_2.html">Module 2: The Essentials of Python</a></li>
<liclass="toctree-l1"><aclass="reference internal" href="../module_2_problems.html">Module 2: Problems</a></li>
<liclass="toctree-l1"><aclass="reference internal" href="../module_3.html">Module 3: The Essentials of NumPy</a></li>
<liclass="toctree-l1"><aclass="reference internal" href="../module_3_problems.html">Module 3: Problems</a></li>
<liclass="toctree-l1"><aclass="reference internal" href="../module_4.html">Module 4: Object Oriented Programming</a></li>
<liclass="toctree-l1 current"><aclass="reference internal" href="../module_5.html">Module 5: Odds and Ends</a><ulclass="current">
<liclass="toctree-l2"><aclass="reference internal" href="Writing_Good_Code.html">Writing Good Code</a></li>
<liclass="toctree-l2"><aclass="reference internal" href="Matplotlib.html">Matplotlib</a></li>
<liclass="toctree-l2 current"><aclass="current reference internal" href="#">Working with Files</a><ul>
<liclass="toctree-l3"><aclass="reference internal" href="#Working-with-Paths">Working with Paths</a><ul>
<liclass="toctree-l4"><aclass="reference internal" href="#pathlib.Path">pathlib.Path</a></li>
</ul>
</li>
<liclass="toctree-l3"><aclass="reference internal" href="#Opening-Files">Opening Files</a><ul>
<liclass="toctree-l4"><aclass="reference internal" href="#Specifying-the-Open-Mode">Specifying the Open-Mode</a></li>
<liclass="toctree-l4"><aclass="reference internal" href="#Working-with-the-File-Object">Working with the File Object</a></li>
</ul>
</li>
<liclass="toctree-l3"><aclass="reference internal" href="#Example:-Writing-and-Reading-a-Text-File">Example: Writing and Reading a Text File</a></li>
<liclass="toctree-l3"><aclass="reference internal" href="#Globbing-for-Files">Globbing for Files</a></li>
<liclass="toctree-l3"><aclass="reference internal" href="#Saving-&-Loading-Python-Objects:-pickle">Saving & Loading Python Objects: pickle</a></li>
<liclass="toctree-l3"><aclass="reference internal" href="#Saving-and-Loading-NumPy-Arrays">Saving and Loading NumPy Arrays</a></li>
<liclass="toctree-l3"><aclass="reference internal" href="#Links-to-Official-Documentation">Links to Official Documentation</a></li>
<liclass="toctree-l3"><aclass="reference internal" href="#Reading-Comprehension-Solutions">Reading Comprehension Solutions</a></li>
</ul>
</li>
<liclass="toctree-l2"><aclass="reference internal" href="Modules_and_Packages.html">Import: Modules and Packages</a></li>
</ul>
</li>
<liclass="toctree-l1"><aclass="reference internal" href="../changes.html">Changelog</a></li>
</ul>
</div>
</div>
</nav>
<sectiondata-toggle="wy-nav-shift" class="wy-nav-content-wrap"><navclass="wy-nav-top" aria-label="Mobile navigation menu" >
<idata-toggle="wy-nav-top" class="fa fa-bars"></i>
<ahref="../index.html">Python Like You Mean It</a>
</nav>
<divclass="wy-nav-content">
<divclass="rst-content">
<divrole="navigation" aria-label="Page navigation">
<ulclass="wy-breadcrumbs">
<li><ahref="../index.html" class="icon icon-home"></a> »</li>
<li><ahref="../module_5.html">Module 5: Odds and Ends</a> »</li>
<li>Working with Files</li>
<liclass="wy-breadcrumbs-aside">
<ahref="../_sources/Module5_OddsAndEnds/WorkingWithFiles.md.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<divrole="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<divitemprop="articleBody">
<style>
/* CSS for nbsphinx extension */
/* remove conflicting styling from Sphinx themes */
div.nbinput.containerdiv.prompt*,
div.nboutput.containerdiv.prompt*,
div.nbinput.containerdiv.input_areapre,
div.nboutput.containerdiv.output_areapre,
div.nbinput.containerdiv.input_area .highlight,
div.nboutput.containerdiv.output_area .highlight {
border: none;
padding:0;
margin:0;
box-shadow: none;
}
div.nbinput.container>div[class*=highlight],
div.nboutput.container>div[class*=highlight] {
margin:0;
}
div.nbinput.containerdiv.prompt*,
div.nboutput.containerdiv.prompt* {
background: none;
}
div.nboutput.containerdiv.output_area .highlight,
div.nboutput.containerdiv.output_areapre {
background: unset;
}
div.nboutput.containerdiv.output_areadiv.highlight {
color: unset; /* override Pygments text color */
}
/* avoid gaps between output lines */
div.nboutput.containerdiv[class*=highlight] pre {
line-height: normal;
}
/* input/output containers */
div.nbinput.container,
div.nboutput.container {
display: -webkit-flex;
display: flex;
align-items: flex-start;
margin:0;
width:100%;
}
@media (max-width:540px) {
div.nbinput.container,
div.nboutput.container {
flex-direction: column;
}
}
/* input container */
div.nbinput.container {
padding-top:5px;
}
/* last container */
div.nblast.container {
padding-bottom:5px;
}
/* input prompt */
div.nbinput.containerdiv.promptpre {
color:#307FC1;
}
/* output prompt */
div.nboutput.containerdiv.promptpre {
color:#BF5B3D;
}
/* all prompts */
div.nbinput.containerdiv.prompt,
div.nboutput.containerdiv.prompt {
width:4.5ex;
padding-top:5px;
position: relative;
user-select: none;
}
div.nbinput.containerdiv.prompt>div,
div.nboutput.containerdiv.prompt>div {
position: absolute;
right:0;
margin-right:0.3ex;
}
@media (max-width:540px) {
div.nbinput.containerdiv.prompt,
div.nboutput.containerdiv.prompt {
width: unset;
text-align: left;
padding:0.4em;
}
div.nboutput.containerdiv.prompt.empty {
padding:0;
}
div.nbinput.containerdiv.prompt>div,
div.nboutput.containerdiv.prompt>div {
position: unset;
}
}
/* disable scrollbars on prompts */
div.nbinput.containerdiv.promptpre,
div.nboutput.containerdiv.promptpre {
overflow: hidden;
}
/* input/output area */
div.nbinput.containerdiv.input_area,
div.nboutput.containerdiv.output_area {
-webkit-flex:1;
flex:1;
overflow: auto;
}
@media (max-width:540px) {
div.nbinput.containerdiv.input_area,
div.nboutput.containerdiv.output_area {
width:100%;
}
}
/* input area */
div.nbinput.containerdiv.input_area {
border:1px solid #e0e0e0;
border-radius:2px;
/*background: #f5f5f5;*/
}
/* override MathJax center alignment in output cells */
div.nboutput.containerdiv[class*=MathJax] {
text-align: left !important;
}
/* override sphinx.ext.imgmath center alignment in output cells */
div.nboutput.containerdiv.mathp {
text-align: left;
}
/* standard error */
div.nboutput.containerdiv.output_area.stderr {
background:#fdd;
}
/* ANSI colors */
.ansi-black-fg { color:#3E424D; }
.ansi-black-bg { background-color:#3E424D; }
.ansi-black-intense-fg { color:#282C36; }
.ansi-black-intense-bg { background-color:#282C36; }
.ansi-red-fg { color:#E75C58; }
.ansi-red-bg { background-color:#E75C58; }
.ansi-red-intense-fg { color:#B22B31; }
.ansi-red-intense-bg { background-color:#B22B31; }
.ansi-green-fg { color:#00A250; }
.ansi-green-bg { background-color:#00A250; }
.ansi-green-intense-fg { color:#007427; }
.ansi-green-intense-bg { background-color:#007427; }
.ansi-yellow-fg { color:#DDB62B; }
.ansi-yellow-bg { background-color:#DDB62B; }
.ansi-yellow-intense-fg { color:#B27D12; }
.ansi-yellow-intense-bg { background-color:#B27D12; }
.ansi-blue-fg { color:#208FFB; }
.ansi-blue-bg { background-color:#208FFB; }
.ansi-blue-intense-fg { color:#0065CA; }
.ansi-blue-intense-bg { background-color:#0065CA; }
.ansi-magenta-fg { color:#D160C4; }
.ansi-magenta-bg { background-color:#D160C4; }
.ansi-magenta-intense-fg { color:#A03196; }
.ansi-magenta-intense-bg { background-color:#A03196; }
.ansi-cyan-fg { color:#60C6C8; }
.ansi-cyan-bg { background-color:#60C6C8; }
.ansi-cyan-intense-fg { color:#258F8F; }
.ansi-cyan-intense-bg { background-color:#258F8F; }
.ansi-white-fg { color:#C5C1B4; }
.ansi-white-bg { background-color:#C5C1B4; }
.ansi-white-intense-fg { color:#A1A6B2; }
.ansi-white-intense-bg { background-color:#A1A6B2; }
.ansi-default-inverse-fg { color:#FFFFFF; }
.ansi-default-inverse-bg { background-color:#000000; }
.ansi-bold { font-weight: bold; }
.ansi-underline { text-decoration: underline; }
div.nbinput.containerdiv.input_areadiv[class*=highlight] >pre,
div.nboutput.containerdiv.output_areadiv[class*=highlight] >pre,
div.nboutput.containerdiv.output_areadiv[class*=highlight].math,
div.nboutput.containerdiv.output_area.rendered_html,
div.nboutput.containerdiv.output_area>div.output_javascript,
div.nboutput.containerdiv.output_area:not(.rendered_html) >img{
padding:5px;
margin:0;
}
/* fix copybtn overflow problem in chromium (needed for 'sphinx_copybutton') */
div.nbinput.containerdiv.input_area>div[class^='highlight'],
div.nboutput.containerdiv.output_area>div[class^='highlight']{
overflow-y: hidden;
}
/* hide copybtn icon on prompts (needed for 'sphinx_copybutton') */
.prompt .copybtn {
display: none;
}
/* Some additional styling taken form the Jupyter notebook CSS */
div.rendered_htmltable {
border: none;
border-collapse: collapse;
border-spacing:0;
color: black;
font-size:12px;
table-layout: fixed;
}
div.rendered_htmlthead {
border-bottom:1px solid black;
vertical-align: bottom;
}
div.rendered_htmltr,
div.rendered_htmlth,
div.rendered_htmltd {
text-align: right;
vertical-align: middle;
padding:0.5em0.5em;
line-height: normal;
white-space: normal;
max-width: none;
border: none;
}
div.rendered_htmlth {
font-weight: bold;
}
div.rendered_htmltbodytr:nth-child(odd) {
background:#f5f5f5;
}
div.rendered_htmltbodytr:hover {
background:rgba(66,165,245,0.2);
}
/* CSS overrides for sphinx_rtd_theme */
/* 24px margin */
.nbinput.nblast.container,
.nboutput.nblast.container {
margin-bottom:19px; /* padding has already 5px */
}
/* ... except between code cells! */
.nblast.container+ .nbinput.container {
margin-top:-19px;
}
.admonition>p:before {
margin-right:4px; /* make room for the exclamation icon */
}
/* Fix math alignment, see https://github.com/rtfd/sphinx_rtd_theme/pull/686 */
.math {
text-align: unset;
}
</style>
<divclass="section" id="Working-with-Files">
<h1>Working with Files<aclass="headerlink" href="#Working-with-Files" title="Permalink to this headline"></a></h1>
<p>This section discusses best practices for writing Python code that reads from and writes to files. We will learn about the standard library’s <code class="docutils literal notranslate"><span class="pre">pathlib.Path</span></code> object, which helps ensure that the code we write is portable across operating systems (e.g. Windows, macOS, Linux). We will also be introduced to a <em>context manager</em>, <code class="docutils literal notranslate"><span class="pre">open</span></code>, which permits us to read from and write to a file safely; by “safely” we mean that any file we open will
eventually be closed properly, so that it will not be corrupted even if our code hits an error. Next, we will learn how to “glob” for files, meaning that we will search for and list files whose names match specific patterns. Lastly, we will briefly encounter the <code class="docutils literal notranslate"><span class="pre">pickle</span></code> module, which allows us to save (or “pickle”) Python objects to, and load them from, the computer’s file system.</p>
<divclass="section" id="Working-with-Paths">
<h2>Working with Paths<aclass="headerlink" href="#Working-with-Paths" title="Permalink to this headline"></a></h2>
<p>Suppose you are writing a Jupyter notebook where you are analyzing data that is saved to your computer. You will naturally need to detail the location where your data is stored on your computer’s file system so that you can load your data. Let’s suppose that this notebook is in the directory <codeclass="docutils literal notranslate"><spanclass="pre">my_folder</span></code> and that there is a directory, <codeclass="docutils literal notranslate"><spanclass="pre">data</span></code>, within it, which contains some text files with your data. Thus your directory structure looks like this:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>my_folder/
    |-notebook.ipynb
    |-data/
        |-data1.txt
        |-data2.txt
</pre></div>
</div>
<p>Now, if you are on a machine that is running Linux or macOS, the path to <code class="docutils literal notranslate"><span class="pre">data1.txt</span></code> relative to the notebook is: <code class="docutils literal notranslate"><span class="pre">./data/data1.txt</span></code>. See that the character <code class="docutils literal notranslate"><span class="pre">/</span></code> is the separator that denotes subsequent directories in a path. On a Windows machine, the separator is <code class="docutils literal notranslate"><span class="pre">\</span></code>, thus the path to your data would be written as <code class="docutils literal notranslate"><span class="pre">.\data\data1.txt</span></code>. We want to write our code so that it can be utilized, without modification, across operating systems. This is where Python’s fantastic <code class="docutils literal notranslate"><span class="pre">pathlib</span></code>
module comes in handy.</p>
<divclass="section" id="pathlib.Path">
<h3>pathlib.Path<aclass="headerlink" href="#pathlib.Path" title="Permalink to this headline"></a></h3>
<p>The standard library’s <aclass="reference external" href="https://docs.python.org/3/library/pathlib.html">pathlib module</a> provides a number of classes that make it easy to work with file system paths across operating systems. We will limit our discussion to the <codeclass="docutils literal notranslate"><spanclass="pre">pathlib.Path</span></code> class, which will take care of all of our most pressing needs. This class allows us to write all of our path-related code in a single way, and it will convert the path to the operating system-appropriate format for us underneath the hood.</p>
<p>Let’s begin by creating a <codeclass="docutils literal notranslate"><spanclass="pre">Path</span></code> object that points to the directory containing the present notebook:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># creating a path-object pointing to the present directory</span>
<span class="o">&gt;&gt;&gt;</span> <span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">root</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="s2">"."</span><span class="p">)</span>  <span class="c1"># '.' means: the present directory that this code exists in</span>
</pre></div>
</div>
<p>Because I am running this code from a Windows machine, this will form a <codeclass="docutils literal notranslate"><spanclass="pre">WindowsPath</span></code> object automatically:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">root</span>
<span class="go">WindowsPath('.')</span>
</pre></div>
</div>
<p>If I were running on a Linux or MacOS machine, it would have formed a <codeclass="docutils literal notranslate"><spanclass="pre">PosixPath</span></code> object instead. Fortunately, we need not worry about these details as these classes handle them for us! The <codeclass="docutils literal notranslate"><spanclass="pre">Path</span></code> class has many useful methods for us to leverage. First, see that it conveniently overrides the <codeclass="docutils literal notranslate"><spanclass="pre">/</span></code> operator (by implementing a <aclass="reference external" href="http://www.pythonlikeyoumeanit.com/Module4_OOP/Special_Methods.html">special method</a>) so that we can create a path to a subsequent directory. Let’s see this in
action:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># creating a path to the file 'data1.txt' in the subdirectory 'data'</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">path_to_data1</span> <span class="o">=</span> <span class="n">root</span> <span class="o">/</span> <span class="s2">"data"</span> <span class="o">/</span> <span class="s2">"data1.txt"</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">path_to_data1</span>
<span class="n">WindowsPath</span><span class="p">(</span><span class="s1">'data/data1.txt'</span><span class="p">)</span>
</pre></div>
</div>
<p>See that the <codeclass="docutils literal notranslate"><spanclass="pre">/</span></code> operator, when used in conjunction with a <codeclass="docutils literal notranslate"><spanclass="pre">Path</span></code> instance, created a new path with the appropriate path-separator for the present OS. This is extremely convenient!</p>
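<p>To see the separator handling on a single machine, the following minimal sketch uses <code class="docutils literal notranslate"><span class="pre">PurePosixPath</span></code> and <code class="docutils literal notranslate"><span class="pre">PureWindowsPath</span></code>, which are also provided by <code class="docutils literal notranslate"><span class="pre">pathlib</span></code>. These “pure” classes only manipulate the text of a path and never touch the file system, so both can be created on any OS. The variable names are chosen for illustration only.</p>

```python
from pathlib import PurePosixPath, PureWindowsPath

# "Pure" path classes only manipulate path text; they never access the
# file system, so both flavors can be created on any operating system.
posix_path = PurePosixPath(".") / "data" / "data1.txt"
windows_path = PureWindowsPath(".") / "data" / "data1.txt"

print(str(posix_path))    # data/data1.txt
print(str(windows_path))  # data\data1.txt
```

<p>The same <code class="docutils literal notranslate"><span class="pre">/</span></code> expression yields a differently formatted string for each path flavor; <code class="docutils literal notranslate"><span class="pre">Path</span></code> simply selects the flavor that matches the machine running the code.</p>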
<p>Let’s proceed to explore some other useful methods that <code class="docutils literal notranslate"><span class="pre">Path</span></code> provides us with. These methods enable us to inspect directories and files, create new directories, list all of the files in a directory, open files for reading and writing, and much more. A complete listing of these methods can be found <a class="reference external" href="https://docs.python.org/3/library/pathlib.html#methods-and-properties">here</a> and <a class="reference external" href="https://docs.python.org/3/library/pathlib.html#methods">here</a>, collectively; it is highly recommended that
you take time to look through them.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">root</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="s2">"."</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">path_to_data1</span> <span class="o">=</span> <span class="n">root</span> <span class="o">/</span> <span class="s2">"data"</span> <span class="o">/</span> <span class="s2">"data1.txt"</span>

<span class="go"># Checking to see if a file or directory exists:</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">path_to_data1</span><span class="o">.</span><span class="n">exists</span><span class="p">()</span>
<span class="go">True</span>
<span class="gp">&gt;&gt;&gt; </span><span class="p">(</span><span class="n">root</span> <span class="o">/</span> <span class="s2">"bogus_path"</span><span class="p">)</span><span class="o">.</span><span class="n">exists</span><span class="p">()</span>
<span class="go">False</span>

<span class="go"># Getting the "absolute" path to a file or directory:</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">path_to_data1</span><span class="o">.</span><span class="n">absolute</span><span class="p">()</span>
<span class="go">WindowsPath('C:/Users/TerranceWasabi/Desktop/PLYMI/Module5_OddsAndEnds/data/data1.txt')</span>

<span class="go"># Access the name of the file that the path is pointing to</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">path_to_data1</span><span class="o">.</span><span class="n">name</span>
<span class="go">'data1.txt'</span>

<span class="go"># Create a new directory, named 'new_folder' within the root directory</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">new_dir</span> <span class="o">=</span> <span class="n">root</span> <span class="o">/</span> <span class="s2">"new_folder"</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">new_dir</span><span class="o">.</span><span class="n">mkdir</span><span class="p">()</span>

<span class="go"># Use 'glob' to return a generator over all files</span>
<span class="go"># that match a specified pattern. E.g. get path to every</span>
<span class="go"># .txt file in a directory</span>
<span class="gp">&gt;&gt;&gt; </span><span class="nb">list</span><span class="p">((</span><span class="n">root</span> <span class="o">/</span> <span class="s2">"data"</span><span class="p">)</span><span class="o">.</span><span class="n">glob</span><span class="p">(</span><span class="s2">"*.txt"</span><span class="p">))</span>
<span class="go">[WindowsPath('data/data1.txt'), WindowsPath('data/data2.txt')]</span>

<span class="go"># convert a path-object to a string formatted for the present OS</span>
<span class="gp">&gt;&gt;&gt; </span><span class="nb">str</span><span class="p">(</span><span class="n">path_to_data1</span><span class="p">)</span>
<span class="go">'data\\data1.txt'</span>
</pre></div>
</div>
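<p>The session above depends on files that already exist on the author’s machine. As a self-contained sketch of the same methods, the following script builds its own directory inside a temporary folder created with the standard library’s <code class="docutils literal notranslate"><span class="pre">tempfile</span></code> module. The file name <code class="docutils literal notranslate"><span class="pre">notes.md</span></code> and the variable names are made up for this illustration.</p>

```python
import tempfile
from pathlib import Path

# Work inside a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    data_dir = root / "data"
    data_dir.mkdir()  # create the 'data' subdirectory

    # touch() creates an empty file at the given path
    for name in ("data1.txt", "data2.txt", "notes.md"):
        (data_dir / name).touch()

    dir_exists = data_dir.exists()
    bogus_exists = (root / "bogus_path").exists()

    # glob makes no promise about ordering, so sort the names for display
    txt_names = sorted(p.name for p in data_dir.glob("*.txt"))

print(dir_exists, bogus_exists, txt_names)
```

<p>Only the two <code class="docutils literal notranslate"><span class="pre">.txt</span></code> files match the glob pattern; <code class="docutils literal notranslate"><span class="pre">notes.md</span></code> is skipped.</p>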
<divclass="admonition note">
<pclass="admonition-title fa fa-exclamation-circle"><strong>Takeaway</strong>:</p>
<p>You should strive to utilize <code class="docutils literal notranslate"><span class="pre">pathlib.Path</span></code> whenever you are working with file system paths in your code. To reiterate: this will ensure that your code is portable across operating systems and will help make your path handling easy to read. In addition, this class’s methods provide a massive amount of functionality for you to leverage at your convenience.</p>
</div>
<divclass="admonition warning">
<pclass="admonition-title fa fa-exclamation-circle"><strong>Note</strong>:</p>
<p><code class="docutils literal notranslate"><span class="pre">pathlib</span></code> was introduced in Python 3.4. Although many 3rd party libraries have updated their file-I/O utilities to accept both strings and <code class="docutils literal notranslate"><span class="pre">pathlib.Path</span></code> objects (e.g. <code class="docutils literal notranslate"><span class="pre">numpy.save</span></code> can be passed a <code class="docutils literal notranslate"><span class="pre">Path</span></code> instance to tell it where to save a numpy-array), some libraries are late to the party and will only accept strings as paths. On such occasions you can simply convert your <code class="docutils literal notranslate"><span class="pre">Path</span></code> instance to a string by calling <code class="docutils literal notranslate"><span class="pre">str</span></code> on it, and then pass the resulting string-path to the file-I/O
function. This is also a friendly reminder to accommodate <code class="docutils literal notranslate"><span class="pre">pathlib.Path</span></code> objects whenever you find yourself writing your own file-I/O functions!</p>
</div>
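<p>One simple way to accommodate both kinds of input in your own file-I/O functions is to pass the argument through <code class="docutils literal notranslate"><span class="pre">Path</span></code> at the top of the function. The sketch below assumes a hypothetical helper named <code class="docutils literal notranslate"><span class="pre">count_lines</span></code>; it is an illustration, not a function from any library.</p>

```python
import tempfile
from pathlib import Path


def count_lines(path):
    """Count the lines in a text file; `path` may be a str or a Path.

    Hypothetical helper written for this illustration.
    """
    # Path(...) accepts strings and Path instances alike, so the rest
    # of the function can rely on having a Path object.
    path = Path(path)
    with path.open(mode="r") as f:
        return sum(1 for _ in f)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.txt"
    target.write_text("first\nsecond\nthird\n")

    from_path = count_lines(target)      # pass a Path instance
    from_str = count_lines(str(target))  # pass a plain string

print(from_path, from_str)
```

<p>Both calls report the same count, so callers are free to use whichever form they already have.</p>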
</div>
</div>
<divclass="section" id="Opening-Files">
<h2>Opening Files<aclass="headerlink" href="#Opening-Files" title="Permalink to this headline"></a></h2>
<p>It is recommended that you refer to the <a class="reference external" href="https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files">official Python tutorial</a> for a simple rundown of file reading and writing.</p>
<p>Whenever you instruct your code to open a file for reading or writing, you must take care that the file ultimately is closed so that its data is not vulnerable to being modified. Python provides the <codeclass="docutils literal notranslate"><spanclass="pre">open</span></code> context manager, which is designed to ensure that a file will be closed even in the event that our code raises an error.</p>
<p>The following code opens the file “file1.txt” for writing:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># demonstrating the use of the `open` context manager</span>
<span class="c1"># we will write to the file named "file1.txt", located</span>
<span class="c1"># in the present directory</span>
<span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>

<span class="n">path_to_file</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="s2">"file1.txt"</span><span class="p">)</span>

<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">path_to_file</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s2">"w"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="c1"># The indented space enters the "context" of the open file.</span>
    <span class="c1"># Leaving the indented space exits the context of the opened file, forcing</span>
    <span class="c1"># the file to be closed. This is ensured even if the code within the indented</span>
    <span class="c1"># block causes an error.</span>
    <span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'this is a line.</span><span class="se">\n</span><span class="s1">This is a second line.</span><span class="se">\n</span><span class="s1">This is the third line.'</span><span class="p">)</span>

<span class="c1"># The file is closed here.</span>
</pre></div>
</div>
<p>The syntax <code class="docutils literal notranslate"><span class="pre">with</span> <span class="pre">&lt;context_manager&gt;()</span> <span class="pre">as</span> <span class="pre">&lt;context_variable&gt;:</span></code> signifies the creation of a context that is bound to the object <code class="docutils literal notranslate"><span class="pre">&lt;context_variable&gt;</span></code>. In this case <code class="docutils literal notranslate"><span class="pre">open</span></code> is the context manager, and the variable we named <code class="docutils literal notranslate"><span class="pre">f</span></code> is the file object that is opened within that context, which is delimited by the subsequent indented block. You can also call <code class="docutils literal notranslate"><span class="pre">open</span></code> directly from a <code class="docutils literal notranslate"><span class="pre">Path</span></code> instance:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">path_to_file</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">mode</span><span class="o">=</span><span class="s2">"w"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'this is a line.</span><span class="se">\n</span><span class="s1">This is a second line.</span><span class="se">\n</span><span class="s1">This is the third line.'</span><span class="p">)</span>
</pre></div>
</div>
<p>The complete documentation for <codeclass="docutils literal notranslate"><spanclass="pre">open</span></code> can be found <aclass="reference external" href="https://docs.python.org/3/library/functions.html#open">here</a>.</p>
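<p>The guarantee that the file is closed even when the indented block raises an error can be checked directly. The following is a minimal sketch; the temporary directory, the file name "demo.txt", and the simulated <code class="docutils literal notranslate"><span class="pre">ValueError</span></code> are assumptions made purely for illustration.</p>

```python
import tempfile
from pathlib import Path

# work in a throwaway directory so that no real files are touched
with tempfile.TemporaryDirectory() as tmp_dir:
    path_to_file = Path(tmp_dir) / "demo.txt"  # hypothetical file name

    try:
        with open(path_to_file, mode="w") as f:
            f.write("this is a line.\n")
            # simulate a failure inside the context
            raise ValueError("something went wrong")
    except ValueError:
        pass

    # the context manager closed the file despite the error
    closed_after_error = f.closed

    # the text written before the error was saved when the file was closed
    with open(path_to_file, mode="r") as f:
        saved_text = f.read()

print(closed_after_error)
print(repr(saved_text))
```

<p>The file object's <code class="docutils literal notranslate"><span class="pre">closed</span></code> attribute reports whether the file has been closed, which makes it a convenient way to confirm what the context manager did.</p>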
<divclass="section" id="Specifying-the-Open-Mode">
<h3>Specifying the Open-Mode<aclass="headerlink" href="#Specifying-the-Open-Mode" title="Permalink to this headline"></a></h3>
<p>Specifying <code class="docutils literal notranslate"><span class="pre">mode='w'</span></code> indicates that we will be writing to the file anew: if the file already has any content, that content will be <em>erased</em> before anything is written. The following are the available “modes” for opening a file:</p>
<tableclass="docutils align-default">
<colgroup>
<colstyle="width: 50%" />
<colstyle="width: 50%" />
</colgroup>
<thead>
<trclass="row-odd"><thclass="head"><p>Mode</p></th>
<thclass="head"><p>Explanation</p></th>
</tr>
</thead>
<tbody>
<trclass="row-even"><td><p><codeclass="docutils literal notranslate"><spanclass="pre">r</span></code></p></td>
<td><p>Open the file for reading text</p></td>
</tr>
<trclass="row-odd"><td><p><codeclass="docutils literal notranslate"><spanclass="pre">w</span></code></p></td>
<td><p>Open the file, <strong>clearing its contents</strong>, for writing text anew</p></td>
</tr>
<trclass="row-even"><td><p><codeclass="docutils literal notranslate"><spanclass="pre">a</span></code></p></td>
<td><p>Open the file to write text to the end of any existing content, thus “appending” to the file</p></td>
</tr>
<trclass="row-odd"><td><p><codeclass="docutils literal notranslate"><spanclass="pre">x</span></code></p></td>
<td><p>Open the file for writing text, failing if the file already exists</p></td>
</tr>
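<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">+</span></code></p></td>
<td><p>Added to one of the modes above (e.g. <code class="docutils literal notranslate"><span class="pre">r+</span></code> or <code class="docutils literal notranslate"><span class="pre">w+</span></code>) to open the file for both reading and writing text</p></td>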
</tr>
</tbody>
</table>
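<p>The behavior of the 'x', 'a', and 'w' modes in the table can be sketched as follows. The temporary directory and the file name "modes.txt" are assumptions made for illustration, so that the example does not touch any real files.</p>

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "modes.txt"  # hypothetical file name

    # 'x' creates the file, since it does not exist yet
    with open(path, mode="x") as f:
        f.write("first\n")

    # 'a' keeps the existing content and writes after it
    with open(path, mode="a") as f:
        f.write("second\n")

    with open(path, mode="r") as f:
        after_append = f.read()

    # 'w' clears the existing content before writing
    with open(path, mode="w") as f:
        f.write("replaced\n")

    with open(path, mode="r") as f:
        after_write = f.read()

    # 'x' now fails, because the file already exists
    try:
        with open(path, mode="x") as f:
            f.write("never written")
        x_failed = False
    except FileExistsError:
        x_failed = True

print(repr(after_append))
print(repr(after_write))
print(x_failed)
```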
<p>By default, these modes read and write text, using an encoding to convert between the binary data stored on disk and Python strings. The default encoding is platform-dependent: it is utf-8 on most Linux and macOS systems, and you can pass <code class="docutils literal notranslate"><span class="pre">encoding="utf-8"</span></code> to <code class="docutils literal notranslate"><span class="pre">open</span></code> to specify it explicitly. That is, when you read data from your file system with <code class="docutils literal notranslate"><span class="pre">mode='r'</span></code>, Python will automatically <em>decode</em> the binary data that was stored on your machine, converting it to written text stored as a string. Similarly, writing a string to a file in modes ‘w’, ‘a’, or ‘x’ will <em>encode</em> the string into a binary representation (which is necessary for it to be stored as a file).</p>
<p>You can instead force Python to read and write strictly in terms of binary data by adding a <code class="docutils literal notranslate"><span class="pre">'b'</span></code> to these modes: <code class="docutils literal notranslate"><span class="pre">'rb'</span></code>, <code class="docutils literal notranslate"><span class="pre">'wb'</span></code>, <code class="docutils literal notranslate"><span class="pre">'ab'</span></code>, <code class="docutils literal notranslate"><span class="pre">'xb'</span></code>, <code class="docutils literal notranslate"><span class="pre">'r+b'</span></code>. It is important to be aware of this binary mode. For example, if you are saving a NumPy array, you should open the file in the ‘wb’ or ‘xb’ mode so that it expects binary data to be written to it; we are not saving text when we save a NumPy array of numbers.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># saving a NumPy-array to the file 'array.npy'</span>
<span class="o">&gt;&gt;&gt;</span> <span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
<span class="c1"># file must be open for binary-write mode</span>
<span class="c1"># since we are not saving text</span>
<span class="o">&gt;&gt;&gt;</span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"array.npy"</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s2">"wb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="o">...</span>     <span class="n">np</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span>
</pre></div>
</div>
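<p>The relationship between text mode and binary mode can be seen by writing a string in text mode and then reading the same file back in binary mode. This is a minimal sketch: the sample string, the temporary directory, and the file name "encoded.txt" are assumptions for illustration, and the encoding is passed explicitly so that the result does not depend on the platform.</p>

```python
import tempfile
from pathlib import Path

text = "naïve café"  # 10 characters, two of which are non-ASCII

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "encoded.txt"  # hypothetical file name

    # text mode: the string is encoded to bytes as it is written
    with open(path, mode="w", encoding="utf-8") as f:
        f.write(text)

    # binary mode: we get the raw bytes, with no decoding applied
    with open(path, mode="rb") as f:
        raw = f.read()

    # text mode: the bytes are decoded back into a string
    with open(path, mode="r", encoding="utf-8") as f:
        decoded = f.read()

print(type(raw), len(raw))
print(type(decoded), len(decoded))
```

<p>The byte count exceeds the character count because utf-8 stores each of the two accented characters using two bytes.</p>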
</div>
<divclass="section" id="Working-with-the-File-Object">
<h3>Working with the File Object<aclass="headerlink" href="#Working-with-the-File-Object" title="Permalink to this headline"></a></h3>
<p>When we invoke <codeclass="docutils literal notranslate"><spanclass="pre">open</span></code> to open a file, the context manager produces an opened file object. The methods of this file object allow us to write-to and read-from the opened file (assuming that we have utilized the appropriate mode when opening it).</p>
<divclass="highlight-python notranslate"><divclass="highlight"><pre><span></span><spanclass="c1"># demonstrating the `read` method of the file object</span>
<span class="o">&gt;&gt;&gt;</span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">path_to_file</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s2">"r"</span><span class="p">)</span> <span class="k">as</span> <span class="n">var</span><span class="p">:</span>
<span class="o">...</span>     <span class="c1"># reads the entire content of the file as a string</span>
<span class="o">...</span>     <span class="n">content</span> <span class="o">=</span> <span class="n">var</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">content</span>
<span class="s1">'this is a line.</span><span class="se">\n</span><span class="s1">This is a second line.</span><span class="se">\n</span><span class="s1">This is the third line.'</span>
<span class="o">&gt;&gt;&gt;</span> <span class="nb">print</span><span class="p">(</span><span class="n">content</span><span class="p">)</span>
<span class="n">this</span> <span class="ow">is</span> <span class="n">a</span> <span class="n">line</span><span class="o">.</span>
<span class="n">This</span> <span class="ow">is</span> <span class="n">a</span> <span class="n">second</span> <span class="n">line</span><span class="o">.</span>
<span class="n">This</span> <span class="ow">is</span> <span class="n">the</span> <span class="n">third</span> <span class="n">line</span><span class="o">.</span>
</pre></div>
</div>
<p>The following summarizes some of the methods available to this file object:</p>
<ulclass="simple">
<li><p><codeclass="docutils literal notranslate"><spanclass="pre">read()</span></code>: Read the entire content of the file as a string or as bytes (depending on the open-mode)</p></li>
<li><p><codeclass="docutils literal notranslate"><spanclass="pre">readline()</span></code>: Read the next line of text from the file, including the trailing <codeclass="docutils literal notranslate"><spanclass="pre">'\n'</span></code> character</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">readlines()</span></code>: Read in the lines of text from the file, storing each line as a string in a list.</p></li>
<li><p><codeclass="docutils literal notranslate"><spanclass="pre">write(x)</span></code>: Write <codeclass="docutils literal notranslate"><spanclass="pre">x</span></code> (a string) to the file.</p></li>
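<li><p><code class="docutils literal notranslate"><span class="pre">writelines(x)</span></code>: Given an iterable of strings, write each string to the file in turn. No line separators are added, so each string should end with <code class="docutils literal notranslate"><span class="pre">'\n'</span></code> if it is meant to be its own line (the inverse of <code class="docutils literal notranslate"><span class="pre">readlines</span></code>)</p></li>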
</ul>
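<p>The methods listed above can be sketched together in a short round trip. The list of lines, the temporary directory, and the file name "lines.txt" are assumptions made for illustration.</p>

```python
import tempfile
from pathlib import Path

# each string carries its own newline; `writelines` will not add any
lines = ["first line\n", "second line\n", "third line\n"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "lines.txt"  # hypothetical file name

    with open(path, mode="w") as f:
        f.writelines(lines)

    with open(path, mode="r") as f:
        # `readline` consumes only the next line, including its '\n'
        first = f.readline()
        # `readlines` returns the lines that have not yet been read
        rest = f.readlines()

print(repr(first))
print(rest)
```

<p>Note that the file object keeps track of its position: because <code class="docutils literal notranslate"><span class="pre">readline</span></code> already consumed the first line, <code class="docutils literal notranslate"><span class="pre">readlines</span></code> returns only the remaining two.</p>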
<p>Also, it is important to note that the file object can be <em>iterated over</em>, and that each iteration will return an individual line of text from the file. This is the best way to read through an entire file line-by-line.</p>
</div>
</div>
<divclass="section" id="Example:-Writing-and-Reading-a-Text-File">
<h2>Example: Writing and Reading a Text File<aclass="headerlink" href="#Example:-Writing-and-Reading-a-Text-File" title="Permalink to this headline"></a></h2>
<p>Given the following string:</p>
<divclass="highlight-python notranslate"><divclass="highlight"><pre><span></span><spanclass="c1"># recall: triple-quotes can be used to write multi-line strings</span>
<spanclass="o">>>></span><spanclass="n">some_text</span><spanclass="o">=</span><spanclass="s2">"""A bagel rolled down the hill.</span>
<spanclass="s2">I mean *all* the way down the hill.</span>
<spanclass="s2">A lady watched it roll.</span>
<spanclass="s2">Way to help me out."""</span>
<spanclass="o">>>></span><spanclass="n">some_text</span>
<spanclass="s1">'A bagel rolled down the hill.</span><spanclass="se">\n</span><spanclass="s1">I mean *all* the way down the hill.</span><spanclass="se">\n</span><spanclass="s1">A lady watched it roll.</span><spanclass="se">\n</span><spanclass="s1">Way to help me out.'</span>
</pre></div>
</div>
<p>Write that string to a file, “a_poem.txt”, in the present directory:</p>
<divclass="highlight-python notranslate"><divclass="highlight"><pre><span></span><spanclass="c1"># use mode-x to ensure that we don't overwrite the file</span>
<spanclass="c1"># if it already exists</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"a_poem.txt"</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s2">"x"</span><span class="p">)</span> <span class="k">as</span> <span class="n">my_open_file</span><span class="p">:</span>
    <span class="n">my_open_file</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">some_text</span><span class="p">)</span>
</pre></div>
</div>
<p>Now let’s read in each line of the file and append them to the list <codeclass="docutils literal notranslate"><spanclass="pre">out</span></code>, but <em>only if that line starts with the letter ‘A’</em> (just to make things a little bit more involved):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"a_poem.txt"</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s2">"r"</span><span class="p">)</span> <span class="k">as</span> <span class="n">my_open_file</span><span class="p">:</span>
    <span class="c1"># recall: iterating over the file-object yields each line of the file</span>
    <span class="c1"># one line at a time</span>
    <span class="n">out</span> <span class="o">=</span> <span class="p">[</span><span class="n">line</span> <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">my_open_file</span> <span class="k">if</span> <span class="n">line</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s2">"A"</span><span class="p">)]</span>
</pre></div>
</div>
<divclass="highlight-python notranslate"><divclass="highlight"><pre><span></span><spanclass="c1"># verify that the output is what we expect</span>
<spanclass="o">>>></span><spanclass="n">out</span>
<spanclass="p">[</span><spanclass="s1">'A bagel rolled down the hill.</span><spanclass="se">\n</span><spanclass="s1">'</span><spanclass="p">,</span><spanclass="s1">'A lady watched it roll.</span><spanclass="se">\n</span><spanclass="s1">'</span><spanclass="p">]</span>
</pre></div>
</div>
</div>
<divclass="section" id="Globbing-for-Files">
<h2>Globbing for Files<aclass="headerlink" href="#Globbing-for-Files" title="Permalink to this headline"></a></h2>
<p>There are many cases in which we may want to construct a list of files to iterate over. For example, if we have several data files, it would be useful to create a file list which we can iterate through and process in sequence. One way to do this would be to manually construct such a list of files:</p>
<divclass="nbinput nblast docutils container">
<divclass="input_area highlight-ipython3 notranslate"><divclass="highlight"><pre><span></span><spanclass="n">my_files</span><spanclass="o">=</span><spanclass="p">[</span><spanclass="s1">'data/file1.txt'</span><spanclass="p">,</span><spanclass="s1">'data/file2.txt'</span><spanclass="p">,</span><spanclass="s1">'data/file3.txt'</span><spanclass="p">,</span><spanclass="s1">'data/file4.txt'</span><spanclass="p">]</span>
</pre></div>
</div>
</div>
<p>However, this is extraordinarily tedious and prone to error, either by mistyping a file name or by forgetting a file. A much more powerful way to construct such a list of files is by file globbing. A <code class="docutils literal notranslate"><span class="pre">glob</span></code> is a set of file names matching some pattern. To glob files, we use special wildcard characters that will match all the files with a certain part of a file name. In our case, <code class="docutils literal notranslate"><span class="pre">*</span></code> will be the wildcard character we use the most: it matches any sequence of characters, including an empty one. This is much better motivated with an example. Below, we see some globs and the types of patterns they will match:</p>
<divclass="highlight-none notranslate"><divclass="highlight"><pre><span></span># matches anything that starts with `file` and ends with `.txt` like
# file1.txt, filefilefile.txt, file.txt, file12345.txt, ...
file*.txt
# matches all .txt files in the 'data' directory
data/*.txt
# matches any file name
*
# matches all png image files
*.png
# matches anything that contains 'test' as part of its file name
*test*
# matches all .py files that contain 'number'
*number*.py
</pre></div>
</div>
<p>The <codeclass="docutils literal notranslate"><spanclass="pre">pathlib</span></code> module provides convenient functionality for globbing files. Once we have a <codeclass="docutils literal notranslate"><spanclass="pre">Path</span></code> object, we can simply call <codeclass="docutils literal notranslate"><spanclass="pre">glob()</span></code> on it and pass in a glob string. This will return a <aclass="reference external" href="http://www.pythonlikeyoumeanit.com/Module2_EssentialsOfPython/Generators_and_Comprehensions.html#Introducing-Generators">generator</a> that will yield each of the globbed files.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># glob all of the text files in the present directory</span>
<span class="c1"># that start with 'test' and end with '.txt'</span>
<span class="o">&gt;&gt;&gt;</span> <span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">root_dir</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="s1">'.'</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">files</span> <span class="o">=</span> <span class="n">root_dir</span><span class="o">.</span><span class="n">glob</span><span class="p">(</span><span class="s1">'test*.txt'</span><span class="p">)</span>  <span class="c1"># this produces a generator</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">files</span>
<span class="o">&lt;</span><span class="n">generator</span> <span class="nb">object</span> <span class="n">Path</span><span class="o">.</span><span class="n">glob</span> <span class="n">at</span> <span class="mh">0x00000146CE118620</span><span class="o">&gt;</span>
<span class="c1"># get a sorted list of the globbed paths</span>
<span class="o">&gt;&gt;&gt;</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">files</span><span class="p">)</span>
<span class="p">[</span><span class="n">PosixPath</span><span class="p">(</span><span class="s1">'test_0.txt'</span><span class="p">),</span>
 <span class="n">PosixPath</span><span class="p">(</span><span class="s1">'test_1.txt'</span><span class="p">),</span>
 <span class="n">PosixPath</span><span class="p">(</span><span class="s1">'test_apple.txt'</span><span class="p">)]</span>
<span class="c1"># iterating over the generator directly</span>
<span class="o">&gt;&gt;&gt;</span> <span class="k">for</span> <span class="n">file</span> <span class="ow">in</span> <span class="n">root_dir</span><span class="o">.</span><span class="n">glob</span><span class="p">(</span><span class="s1">'test*.txt'</span><span class="p">):</span>
<span class="o">...</span>     <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">file</span><span class="p">,</span> <span class="s1">'r'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="o">...</span>         <span class="c1"># do some processing</span>
<span class="o">...</span>         <span class="k">pass</span>
</pre></div>
</div>
<p>For more details on globbing, see <aclass="reference external" href="https://docs.python.org/3/library/pathlib.html#pathlib.Path.glob">the documentation</a>.</p>
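<p>To experiment with globbing without depending on the files that happen to be in your directory, you can create a few empty files in a temporary directory first. This is a minimal sketch; the file names below are assumptions chosen for illustration.</p>

```python
import tempfile
from pathlib import Path

# hypothetical file names, used only for this demonstration
names = ["test_0.txt", "test_1.txt", "test_apple.txt", "notes.txt", "test_2.npy"]

with tempfile.TemporaryDirectory() as tmp_dir:
    root_dir = Path(tmp_dir)

    # create an empty file for each name
    for name in names:
        (root_dir / name).touch()

    # `glob` yields Path objects; we keep only their file names
    matched = sorted(p.name for p in root_dir.glob("test*.txt"))

print(matched)
```

<p>Here "notes.txt" is excluded because it does not start with 'test', and "test_2.npy" is excluded because it does not end with '.txt'.</p>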
<divclass="admonition note">
<pclass="admonition-title fa fa-exclamation-circle"><strong>Reading Comprehension: Basic glob patterns</strong></p>
<p>Write a glob pattern for each of the following prompts</p>
<ulclass="simple">
<li><p>Glob all .txt files in the directory <codeclass="docutils literal notranslate"><spanclass="pre">./files</span></code></p></li>
<li><p>Glob all files that contain ‘quirk’ as part of their file name</p></li>
<li><p>Glob all files that begin with ‘data’</p></li>
<li><p>Glob all files that start with the letter ‘q’, contain a ‘w’, and end with a ‘.npy’ extension</p></li>
</ul>
</div>
<p>The <codeclass="docutils literal notranslate"><spanclass="pre">*</span></code> wildcard is not the only pattern available to us. Sometimes it can be useful to match certain subsets of characters. For example, we may only want to match file names that start with a number. With the <codeclass="docutils literal notranslate"><spanclass="pre">*</span></code> wildcard alone, that’s not possible. Luckily for us, these common use-cases are also taken care of.</p>
<p>To match a subset of characters, we can use square brackets: <codeclass="docutils literal notranslate"><spanclass="pre">[abc]*</span></code> will match anything that starts with ‘a’, ‘b’, or ‘c’ and nothing else. We can also use a ‘-’ inside our brackets to glob groups of characters. For example:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span># matches any .txt file that starts with a digit
[0-9]*.txt
# matches any file that has a vowel in its name
*[aeiou]*
# matches any file that starts with a lowercase letter
[a-z]*
</pre></div>
</div>
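<p>The bracket patterns above can be tried out in the same way. The following sketch applies each pattern to a handful of empty files in a temporary directory; the file names are assumptions chosen for illustration, and they are all lowercase so that the results do not depend on how a given platform treats letter case.</p>

```python
import tempfile
from pathlib import Path

# hypothetical file names, used only for this demonstration
names = ["1_run.txt", "9lives.txt", "_hidden.txt", "alpha.txt", "xyz.npy"]

with tempfile.TemporaryDirectory() as tmp_dir:
    root_dir = Path(tmp_dir)
    for name in names:
        (root_dir / name).touch()

    # .txt files whose names start with a digit
    starts_with_digit = sorted(p.name for p in root_dir.glob("[0-9]*.txt"))

    # files whose names contain at least one vowel
    has_vowel = sorted(p.name for p in root_dir.glob("*[aeiou]*"))

    # files whose names start with a lowercase letter
    starts_with_lower = sorted(p.name for p in root_dir.glob("[a-z]*"))

print(starts_with_digit)
print(has_vowel)
print(starts_with_lower)
```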
<divclass="admonition note">
<pclass="admonition-title fa fa-exclamation-circle"><strong>Reading Comprehension: More glob patterns</strong></p>
<p>Write a glob pattern for each of the following prompts</p>
<ulclass="simple">
<li><p>Any file with an odd digit in its name</p></li>
<li><p>All txt files that have the letters ‘q’ or ‘z’ in them</p></li>
</ul>
</div>
</div>
<divclass="section" id="Saving-&-Loading-Python-Objects:-pickle">
<h2>Saving & Loading Python Objects: pickle<aclass="headerlink" href="#Saving-&-Loading-Python-Objects:-pickle" title="Permalink to this headline"></a></h2>
<p>Suppose that you have just populated a dictionary that is serving as a grade book for a course that you are teaching:</p>
<divclass="highlight-python notranslate"><divclass="highlight"><pre><span></span><spanclass="gp">>>> </span><spanclass="n">grades</span><spanclass="o">=</span><spanclass="p">{</span><spanclass="s2">"Albert"</span><spanclass="p">:</span><spanclass="mi">92</span><spanclass="p">,</span><spanclass="s2">"David"</span><spanclass="p">:</span><spanclass="mi">85</span><spanclass="p">,</span><spanclass="s2">"Emmy"</span><spanclass="p">:</span><spanclass="mi">98</span><spanclass="p">,</span><spanclass="s2">"Marie"</span><spanclass="p">:</span><spanclass="mi">79</span><spanclass="p">}</span>
</pre></div>
</div>
<p>How do you save this dictionary so that you can revisit these grades at a later time? Python’s standard library includes the <aclass="reference external" href="https://docs.python.org/3/library/pickle.html">pickle</a> module, which provides functions for saving and loading Python objects to disk. Let’s “pickle” this dictionary, saving it to the file “grades.pkl” in our present directory:</p>
<divclass="highlight-python notranslate"><divclass="highlight"><pre><span></span><spanclass="kn">import</span><spanclass="nn">pickle</span>
<spanclass="c1"># pickling a dictionary</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"grades.pkl"</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s2">"wb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">opened_file</span><span class="p">:</span>
    <span class="n">pickle</span><span class="o">.</span><span class="n">dump</span><span class="p">(</span><span class="n">grades</span><span class="p">,</span> <span class="n">opened_file</span><span class="p">)</span>
</pre></div>
</div>
<p><codeclass="docutils literal notranslate"><spanclass="pre">pickle.dump</span></code> creates a serialized representation of our dictionary, which is then written to our opened file via the file object that we supplied. Note that we open the file in write-binary mode as we are writing binary data and not text data that first needs to be encoded to binary data. Also note that we use the “.pkl” suffix to indicate that the file is binary data that was written using Python’s pickle protocol. Using this suffix is not necessary but is good practice.</p>
<p><codeclass="docutils literal notranslate"><spanclass="pre">pickle.load</span></code> will unpickle our Python object from disk, permitting us to resume work with our grade book.</p>
<divclass="highlight-python notranslate"><divclass="highlight"><pre><span></span><spanclass="c1"># unpickling a dictionary</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"grades.pkl"</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s2">"rb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">opened_file</span><span class="p">:</span>
    <span class="n">my_loaded_grades</span> <span class="o">=</span> <span class="n">pickle</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">opened_file</span><span class="p">)</span>
</pre></div>
</div>
<divclass="highlight-python notranslate"><divclass="highlight"><pre><span></span><spanclass="gp">>>> </span><spanclass="n">my_loaded_grades</span>
<spanclass="go">{'Albert': 92, 'David': 85, 'Emmy': 98, 'Marie': 79}</span>
</pre></div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">pickle.dump</span></code> and <code class="docutils literal notranslate"><span class="pre">pickle.load</span></code> cover the vast majority of our object-pickling needs. A wide range of Python objects can be saved in this way, including functions that we define and instances of custom classes. Note that functions and classes are pickled by reference (that is, by name) rather than by value, so their definitions must be importable when the object is unpickled. Please refer to <a class="reference external" href="https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled">the official documentation</a> for a discussion of the Python objects that can and cannot be pickled.</p>
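<p>As a quick check that pickling preserves both the values and the types of nested built-in objects, here is a minimal round-trip sketch. The <code class="docutils literal notranslate"><span class="pre">record</span></code> dictionary, the temporary directory, and the file name "record.pkl" are assumptions made for illustration.</p>

```python
import pickle
import tempfile
from pathlib import Path

# a hypothetical record that mixes several built-in types
record = {
    "name": "Emmy",
    "scores": [98, 95.5],
    "tags": ("math", "physics"),
    "passed": True,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "record.pkl"  # hypothetical file name

    with open(path, mode="wb") as opened_file:
        pickle.dump(record, opened_file)

    with open(path, mode="rb") as opened_file:
        loaded = pickle.load(opened_file)

# the loaded object equals the original, but it is a distinct object
print(loaded == record)
print(loaded is record)
print(type(loaded["tags"]))
```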
</div>
<divclass="section" id="Saving-and-Loading-NumPy-Arrays">
<h2>Saving and Loading NumPy Arrays<aclass="headerlink" href="#Saving-and-Loading-NumPy-Arrays" title="Permalink to this headline"></a></h2>
<p>NumPy provides its own functions for saving and loading arrays. Although these arrays can be pickled, it is strongly advised to leverage NumPy’s file-IO functions. NumPy’s standard binary file type used to store array data is known as an ‘.npy’ file. The NumPy binary archive format, which stores multiple arrays in one file, is known as the ‘.npz’ format.</p>
<p>Let’s save the array <code class="docutils literal notranslate"><span class="pre">x</span> <span class="pre">=</span> <span class="pre">np.array([1,</span> <span class="pre">2,</span> <span class="pre">3])</span></code> to the binary file (not a text file) “my_array.npy”. <code class="docutils literal notranslate"><span class="pre">numpy.save</span></code> and <code class="docutils literal notranslate"><span class="pre">numpy.load</span></code> will save and load arrays, handling all of the file opening and closing for you. Thus there is no need to use a context manager when using these functions.</p>
<divclass="highlight-python notranslate"><divclass="highlight"><pre><span></span><spanclass="gp">>>> </span><spanclass="kn">import</span><spanclass="nn">numpy</span><spanclass="k">as</span><spanclass="nn">np</span>
<spanclass="gp">>>> </span><spanclass="n">x</span><spanclass="o">=</span><spanclass="n">np</span><spanclass="o">.</span><spanclass="n">array</span><spanclass="p">([</span><spanclass="mi">1</span><spanclass="p">,</span><spanclass="mi">2</span><spanclass="p">,</span><spanclass="mi">3</span><spanclass="p">])</span>
<spanclass="go"># save a numpy array to disk</span>
<spanclass="gp">>>> </span><spanclass="n">np</span><spanclass="o">.</span><spanclass="n">save</span><spanclass="p">(</span><spanclass="s2">"my_array.npy"</span><spanclass="p">,</span><spanclass="n">x</span><spanclass="p">)</span>
<spanclass="go"># load the saved array from disk</span>
<spanclass="gp">>>> </span><spanclass="n">y</span><spanclass="o">=</span><spanclass="n">np</span><spanclass="o">.</span><spanclass="n">load</span><spanclass="p">(</span><spanclass="s2">"my_array.npy"</span><spanclass="p">)</span>
<spanclass="gp">>>> </span><spanclass="n">y</span>
<spanclass="go">array([1, 2, 3])</span>
</pre></div>
</div>
<p>We can use <codeclass="docutils literal notranslate"><spanclass="pre">numpy.savez</span></code> to save multiple arrays to a single archive file “my_archive.npz”. Here we will save three arrays to the archive. We can specify the names of these arrays, via the keyword arguments that we provide, so that we can distinguish them when loading the archive.</p>
<divclass="highlight-python notranslate"><divclass="highlight"><pre><span></span><spanclass="c1"># save three arrays to a numpy archive file</span>
<spanclass="n">a0</span><spanclass="o">=</span><spanclass="n">np</span><spanclass="o">.</span><spanclass="n">array</span><spanclass="p">([</span><spanclass="mi">1</span><spanclass="p">,</span><spanclass="mi">2</span><spanclass="p">,</span><spanclass="mi">3</span><spanclass="p">])</span>
<spanclass="n">a1</span><spanclass="o">=</span><spanclass="n">np</span><spanclass="o">.</span><spanclass="n">array</span><spanclass="p">([</span><spanclass="mi">4</span><spanclass="p">,</span><spanclass="mi">5</span><spanclass="p">,</span><spanclass="mi">6</span><spanclass="p">])</span>
<spanclass="n">a2</span><spanclass="o">=</span><spanclass="n">np</span><spanclass="o">.</span><spanclass="n">array</span><spanclass="p">([</span><spanclass="mi">7</span><spanclass="p">,</span><spanclass="mi">8</span><spanclass="p">,</span><spanclass="mi">9</span><spanclass="p">])</span>
<span class="c1"># we provide the keyword arguments `soil`, `crust`, and `bedrock`,</span>
<spanclass="c1"># as the names of the respective arrays in the archive.</span>
<spanclass="n">np</span><spanclass="o">.</span><spanclass="n">savez</span><spanclass="p">(</span><spanclass="s2">"my_archive.npz"</span><spanclass="p">,</span><spanclass="n">soil</span><spanclass="o">=</span><spanclass="n">a0</span><spanclass="p">,</span><spanclass="n">crust</span><spanclass="o">=</span><spanclass="n">a1</span><spanclass="p">,</span><spanclass="n">bedrock</span><spanclass="o">=</span><spanclass="n">a2</span><spanclass="p">)</span>
</pre></div>
</div>
<p>Loading arrays from an archive is slightly more involved than loading a single array; we will want to open our archive file using a context manager and then load the arrays as we see fit. <codeclass="docutils literal notranslate"><spanclass="pre">np.load</span></code> can be used as a context manager in lieu of <codeclass="docutils literal notranslate"><spanclass="pre">open</span></code>. The file-object that it produces is our archive of numpy arrays, and it provides a dictionary-like interface for accessing these arrays:</p>
<divclass="highlight-python notranslate"><divclass="highlight"><pre><span></span><spanclass="c1"># opening the archive and accessing each array by name</span>
<span class="k">with</span> <span class="n">np</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"my_archive.npz"</span><span class="p">)</span> <span class="k">as</span> <span class="n">my_archive_file</span><span class="p">:</span>
    <span class="n">out0</span> <span class="o">=</span> <span class="n">my_archive_file</span><span class="p">[</span><span class="s2">"soil"</span><span class="p">]</span>
    <span class="n">out1</span> <span class="o">=</span> <span class="n">my_archive_file</span><span class="p">[</span><span class="s2">"crust"</span><span class="p">]</span>
    <span class="n">out2</span> <span class="o">=</span> <span class="n">my_archive_file</span><span class="p">[</span><span class="s2">"bedrock"</span><span class="p">]</span>
</pre></div>
</div>
<divclass="highlight-python notranslate"><divclass="highlight"><pre><span></span><spanclass="gp">>>> </span><spanclass="n">out0</span>
<spanclass="go">array([1, 2, 3])</span>
<spanclass="gp">>>> </span><spanclass="n">out1</span>
<spanclass="go">array([4, 5, 6])</span>
<spanclass="gp">>>> </span><spanclass="n">out2</span>
<spanclass="go">array([7, 8, 9])</span>
</pre></div>
</div>
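<p>If you do not remember which names were used when an archive was saved, the opened archive exposes them through its <code class="docutils literal notranslate"><span class="pre">files</span></code> attribute. The sketch below assumes that NumPy is installed; the temporary directory and the two-array archive are assumptions made for illustration.</p>

```python
import tempfile
from pathlib import Path

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    archive_path = str(Path(tmp_dir) / "my_archive.npz")

    # save two arrays under the names `soil` and `crust`
    np.savez(archive_path, soil=np.array([1, 2, 3]), crust=np.array([4, 5, 6]))

    with np.load(archive_path) as archive:
        # `files` lists the names of the arrays stored in the archive
        names = sorted(archive.files)
        # load every array into an ordinary dictionary
        arrays = {name: archive[name] for name in archive.files}

print(names)
print(arrays["soil"])
```

<p>Collecting the arrays into a dictionary inside the context means that they remain available after the archive file has been closed.</p>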
</div>
<divclass="section" id="Links-to-Official-Documentation">
<h2>Links to Official Documentation<aclass="headerlink" href="#Links-to-Official-Documentation" title="Permalink to this headline"></a></h2>
<ulclass="simple">
<li><p><aclass="reference external" href="https://docs.python.org/3/library/pathlib.html">The ‘pathlib’ module</a></p></li>
<li><p><aclass="reference external" href="https://docs.python.org/3/library/functions.html#open">The ‘open’ function</a></p></li>
<li><p><aclass="reference external" href="https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files">Official tutorial: reading and writing files</a></p></li>
<li><p><aclass="reference external" href="https://docs.python.org/3/library/pathlib.html#pathlib.Path.glob">Globbing files</a></p></li>
<li><p><aclass="reference external" href="https://docs.python.org/3/library/pickle.html">The pickle module</a></p>
<ul>
<li><p><aclass="reference external" href="https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled">What can and cannot be pickled?</a></p></li>
</ul>
</li>
</ul>
</div>
<divclass="section" id="Reading-Comprehension-Solutions">
<h2>Reading Comprehension Solutions<aclass="headerlink" href="#Reading-Comprehension-Solutions" title="Permalink to this headline"></a></h2>
<p><strong>Basic glob patterns: Solutions</strong></p>
<ulclass="simple">
<li><p>Glob all .txt files in the directory <codeclass="docutils literal notranslate"><spanclass="pre">./files</span></code> (answer: <codeclass="docutils literal notranslate"><spanclass="pre">./files/*.txt</span></code>)</p></li>
<li><p>Glob all files that contain ‘quirk’ as part of their file name (answer: <codeclass="docutils literal notranslate"><spanclass="pre">*quirk*</span></code>)</p></li>
<li><p>Glob all files that begin with ‘data’ (answer: <code class="docutils literal notranslate"><span class="pre">data*</span></code>)</p></li>
<li><p>Glob all files that start with the letter ‘q’, contain a ‘w’, and end with a ‘.npy’ extension (answer: <code class="docutils literal notranslate"><span class="pre">q*w*.npy</span></code>)</p></li>
</ul>
<p><strong>More glob patterns: Solutions</strong></p>
<p>Write a glob pattern for each of the following prompts</p>
<ulclass="simple">
<li><p>Any file with an odd digit in its name (answer: <code class="docutils literal notranslate"><span class="pre">*[13579]*</span></code>)</p></li>
<li><p>All txt files that have the letters ‘q’ or ‘z’ in them (answer: <codeclass="docutils literal notranslate"><spanclass="pre">*[qz]*.txt</span></code>)</p></li>
</ul>
</div>
</div>
</div>
</div>
<footer><divclass="rst-footer-buttons" role="navigation" aria-label="Footer">
</div>
<hr/>
<divrole="contentinfo">
<p>© Copyright 2021, Ryan Soklaski.</p>
</div>
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function(){
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>