How to Read Text Files Into an Array Python

11. Reading and Writing Data Files: ndarrays

By Bernd Klein. Last modified: 01 February 2022.

There are many ways to read from and write to data files in NumPy. We will discuss the different ways and the corresponding functions in this chapter:

  • savetxt
  • loadtxt
  • tofile
  • fromfile
  • save
  • load
  • genfromtxt

Saving textfiles with savetxt

The first two functions we will cover are savetxt and loadtxt.

In the following simple example, we define an array x and save it as a textfile with savetxt:

    import numpy as np

    x = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], np.int32)

    np.savetxt("test.txt", x)

The file "test.txt" is a textfile and its content looks like this:

    $ more test.txt
    1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00
    4.000000000000000000e+00 5.000000000000000000e+00 6.000000000000000000e+00
    7.000000000000000000e+00 8.000000000000000000e+00 9.000000000000000000e+00

Attention: The above output has been created on the Linux command prompt!

It's also possible to write the array in a special format, for example with three decimal places or as integers that are padded with leading zeros if the number has fewer than four digits. For this purpose we assign a format string to the third parameter 'fmt'. We saw in our first example that the default delimiter is a blank. We can change this behaviour by assigning a string to the parameter "delimiter". In most cases this string will consist of a single character, but it can also be a sequence of characters, like a smiley " :-) ":

    np.savetxt("test2.txt", x, fmt="%2.3f", delimiter=",")
    np.savetxt("test3.txt", x, fmt="%04d", delimiter=" :-) ")

The newly created files look like this:

    $ more test2.txt
    1.000,2.000,3.000
    4.000,5.000,6.000
    7.000,8.000,9.000
    $ more test3.txt
    0001 :-) 0002 :-) 0003
    0004 :-) 0005 :-) 0006
    0007 :-) 0008 :-) 0009
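To experiment with 'fmt' and 'delimiter' without creating files, the output of savetxt can be captured in memory. The following is a minimal sketch, assuming that an io.StringIO buffer is passed in place of a file name; the helper savetxt_to_string is a hypothetical name introduced only for this illustration:

```python
import io

import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], np.int32)


def savetxt_to_string(arr, **kwargs):
    """Hypothetical helper: return the text np.savetxt would write."""
    buffer = io.StringIO()
    np.savetxt(buffer, arr, **kwargs)
    return buffer.getvalue()


# Same format strings and delimiters as in the example above
as_floats = savetxt_to_string(x, fmt="%2.3f", delimiter=",")
as_padded = savetxt_to_string(x, fmt="%04d", delimiter=" :-) ")

print(as_floats)
print(as_padded)
```

This makes it easy to compare several format strings side by side before deciding how the final file should look.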

The complete syntax of savetxt looks like this:

savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ')

Parameter Meaning
X array_like. Data to be saved to a text file.
fmt str or sequence of strs, optional. A single format (%10.5f), a sequence of formats, or a multi-format string, e.g. 'Iteration %d -- %10.5f', in which case 'delimiter' is ignored. For complex 'X', the legal options for 'fmt' are:
  a) a single specifier, "fmt='%.4e'", resulting in numbers formatted like "' (%s+%sj)' % (fmt, fmt)"
  b) a full string specifying every real and imaginary part, e.g. "' %.4e %+.4ej %.4e %+.4ej %.4e %+.4ej'" for 3 columns
  c) a list of specifiers, one per column - in this case, the real and imaginary part must have separate specifiers, e.g. "['%.3e + %.3ej', '(%.15e%+.15ej)']" for 2 columns
delimiter A string used for separating the columns.
newline A string (e.g. "\n", "\r\n" or ",\n") which will end a line instead of the default line ending.
header A string that will be written at the beginning of the file.
footer A string that will be written at the end of the file.
comments A string that will be prepended to the 'header' and 'footer' strings, to mark them as comments. The hash tag '#' is used as the default.
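The effect of 'header', 'footer' and 'comments' is easiest to see on a small array. The following is a minimal sketch, assuming an in-memory io.StringIO buffer instead of a real file; the column names and the footer text are made up for illustration:

```python
import io

import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)

buffer = io.StringIO()
np.savetxt(buffer, x, fmt="%d", delimiter=";",
           header="a;b;c", footer="end of data")
text = buffer.getvalue()
print(text)

# Header and footer are prefixed with the default comment marker '# ',
# so loadtxt skips them when the data is read back.
buffer.seek(0)
y = np.loadtxt(buffer, delimiter=";")
print(y)
```

Because both extra lines start with the comment string, the file stays readable by loadtxt without any further arguments.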

Loading Textfiles with loadtxt

We will now read in the file "test.txt", which we wrote in the previous subchapter:

    y = np.loadtxt("test.txt")
    print(y)

OUTPUT:

    [[ 1.  2.  3.]
     [ 4.  5.  6.]
     [ 7.  8.  9.]]

    y = np.loadtxt("test2.txt", delimiter=",")
    print(y)

OUTPUT:

    [[ 1.  2.  3.]
     [ 4.  5.  6.]
     [ 7.  8.  9.]]

Nothing new happens if we read in the text file in which we used a smiley as the separator:

    y = np.loadtxt("test3.txt", delimiter=" :-) ")
    print(y)

OUTPUT:

    [[ 1.  2.  3.]
     [ 4.  5.  6.]
     [ 7.  8.  9.]]

It's also possible to choose the columns by index:

    y = np.loadtxt("test3.txt", delimiter=" :-) ", usecols=(0, 2))
    print(y)

OUTPUT:

    [[ 1.  3.]
     [ 4.  6.]
     [ 7.  9.]]
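Column selection combines well with the 'unpack' and 'dtype' parameters of loadtxt. The following is a minimal sketch, assuming comma-separated sample data held in an io.StringIO buffer rather than one of the files written above:

```python
import io

import numpy as np

# Sample data standing in for a file; three rows, three columns
text = "1,2,3\n4,5,6\n7,8,9\n"

# usecols picks columns 0 and 2; unpack=True returns one array per column
first, third = np.loadtxt(io.StringIO(text), delimiter=",",
                          usecols=(0, 2), unpack=True, dtype=np.int32)

print(first)
print(third)
```

With unpack=True the result is transposed, so each selected column can be assigned to its own variable.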

In our next example we will read the file "times_and_temperatures.txt", which we created in the chapter on generators of our Python tutorial. Every line contains a time in the format "hh:mm:ss" and a random temperature between 10.0 and 25.0 degrees. We have to convert the time string into a float. The time will be given in minutes, with the seconds expressed as a decimal fraction of a minute. We first define a function which converts "hh:mm:ss" into minutes:

    def time2float_minutes(time):
        if type(time) == bytes:
            time = time.decode()
        t = time.split(":")
        # 0.05 / 3 == 1 / 60, i.e. seconds are converted to minutes
        minutes = float(t[0]) * 60 + float(t[1]) + float(t[2]) * 0.05 / 3
        return minutes

    for t in ["06:00:10", "06:27:45", "12:59:59"]:
        print(time2float_minutes(t))

OUTPUT:

    360.1666666666667
    387.75
    779.9833333333333

You might have noticed that we check whether time is of type bytes. The reason for this is the use of our function "time2float_minutes" in loadtxt in the following example. The keyword parameter converters contains a dictionary which can hold a function for a column (the index of the column corresponds to the key of the dictionary) to convert the string data of this column into a float. The string data is a byte string. That is why we had to convert it into a unicode string in our function:

    y = np.loadtxt("times_and_temperatures.txt",
                   converters={0: time2float_minutes})
    print(y)

OUTPUT:

    [[  360.     20.1]
     [  361.5    16.1]
     [  363.     16.9]
     ...,
     [ 1375.5    22.5]
     [ 1377.     11.1]
     [ 1378.5    15.2]]

    # delimiter = ";" , # i.e. use ";" as delimiter instead of whitespace
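The converter mechanism can be tried out without the file from the generator chapter. The following is a minimal sketch, assuming three made-up lines in an io.StringIO buffer as a stand-in for "times_and_temperatures.txt"; the converter accepts both bytes and str, since which of the two loadtxt passes in may depend on the NumPy version and the 'encoding' argument:

```python
import io

import numpy as np


def time2float_minutes(time):
    # Accept both byte strings and unicode strings
    if isinstance(time, bytes):
        time = time.decode()
    hours, minutes, seconds = (float(part) for part in time.split(":"))
    return hours * 60 + minutes + seconds / 60


# Made-up sample lines: a time followed by a temperature
sample = io.StringIO("06:00:00 20.1\n06:01:30 16.1\n06:03:00 16.9\n")

# Column 0 is converted by our function, column 1 is parsed as a float
y = np.loadtxt(sample, converters={0: time2float_minutes})
print(y)
```

Only column 0 needs a converter here; all other columns are still parsed with the default float conversion.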

tofile

tofile is a method which writes the content of an array to a file, either in binary format, which is the default, or in text format.

A.tofile(fid, sep="", format="%s")

The data of the ndarray A is always written in 'C' order, regardless of the order of A.

The data file written by this method can be reloaded with the function fromfile().

Parameter Meaning
fid can be either an open file object, or a string containing a filename.
sep The string 'sep' defines the separator between array items for text output. If it is empty (''), a binary file is written, equivalent to file.write(a.tobytes()).
format Format string for text file output. Each entry in the array is formatted to text by first converting it to the closest Python type, and then using 'format' % item.

Remark:

Information on endianness and precision is lost. Therefore it may not be a good idea to use the function to archive data or transport data between machines with different endianness. Some of these problems can be overcome by outputting the data as text files, at the expense of speed and file size.
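The difference between binary and text output can be illustrated with a small array. The following is a minimal sketch, assuming a temporary directory and the made-up file names "a.bin" and "a.txt":

```python
import os
import tempfile

import numpy as np

a = np.arange(6, dtype=np.int32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    binary_path = os.path.join(tmp, "a.bin")
    text_path = os.path.join(tmp, "a.txt")

    a.tofile(binary_path)                      # sep="" -> raw binary
    a.tofile(text_path, sep=",", format="%d")  # non-empty sep -> text

    binary_size = os.path.getsize(binary_path)
    with open(text_path) as fh:
        text = fh.read()

    # Neither file stores the shape, so it has to be restored by hand
    restored = np.fromfile(binary_path, dtype=np.int32).reshape(a.shape)

print(binary_size)
print(text)
print(restored)
```

In both cases the array is written as a flat sequence of items; the reader has to know the data type and the shape in advance.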

    dt = np.dtype([('time', [('min', int), ('sec', int)]),
                   ('temp', float)])
    x = np.zeros((1,), dtype=dt)
    x['time']['min'] = 10
    x['temp'] = 98.25
    print(x)

    fh = open("test6.txt", "bw")
    x.tofile(fh)
    fh.close()  # make sure the data is flushed to disk

OUTPUT:

    [((10, 0), 98.25)]

fromfile

fromfile is used to read in data which has been written with the tofile function. It's possible to read binary data, if the data type is known. It's also possible to parse simply formatted text files. The data from the file is turned into an array.

The general syntax looks like this:

numpy.fromfile(file, dtype=float, count=-1, sep='')

Parameter Meaning
file 'file' can be either a file object or the name of the file to read.
dtype defines the data type of the array, which will be constructed from the file data. For binary files, it is used to determine the size and byte-order of the items in the file.
count defines the number of items, which will be read. -1 means all items will be read.
sep The string 'sep' defines the separator between the items, if the file is a text file. If it is empty (''), the file will be treated as a binary file. A space (" ") in a separator matches zero or more whitespace characters. A separator consisting solely of spaces has to match at least one whitespace.

    # dt is the structured dtype defined in the tofile example above
    fh = open("test4.txt", "rb")
    np.fromfile(fh, dtype=dt)

OUTPUT:

    array([((4294967296, 12884901890), 1.0609978957e-313),
           ((30064771078, 38654705672), 2.33419537056e-313),
           ((55834574860, 64424509454), 3.60739284543e-313),
           ((81604378642, 90194313236), 4.8805903203e-313),
           ((107374182424, 115964117018), 6.1537877952e-313),
           ((133143986206, 141733920800), 7.42698527006e-313),
           ((158913789988, 167503724582), 8.70018274493e-313),
           ((184683593770, 193273528364), 9.9733802198e-313)],
          dtype=[('time', [('min', '<i8'), ('sec', '<i8')]), ('temp', '<f8')])

    import numpy as np
    import os

    # platform dependent: difference between Linux and Windows
    #data = np.arange(50, dtype=np.int)
    data = np.arange(50, dtype=np.int32)
    data.tofile("test4.txt")

    fh = open("test4.txt", "rb")
    # skip the first 32 items: 4 bytes * 32 = 128 bytes
    fh.seek(128, os.SEEK_SET)

    x = np.fromfile(fh, dtype=np.int32)
    print(x)

OUTPUT:

    [32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49]
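The same skip can be expressed without an explicit seek. The following is a minimal sketch, assuming a NumPy version in which fromfile accepts the 'offset' keyword (a byte offset for binary files), and using a temporary file with the made-up name "data.bin":

```python
import os
import tempfile

import numpy as np

data = np.arange(50, dtype=np.int32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    data.tofile(path)

    skip_items = 32
    # offset is given in bytes: 32 items * 4 bytes per int32 = 128
    tail = np.fromfile(path, dtype=np.int32,
                       offset=skip_items * data.itemsize)

    # count limits the number of items read from the start of the file
    head = np.fromfile(path, dtype=np.int32, count=5)

print(tail)
print(head)
```

Computing the offset from data.itemsize avoids hard-coding the number 128 and keeps the code correct if the data type changes.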

Attention:

It can cause problems to use tofile and fromfile for data storage, because the binary files generated are not platform independent. There is no byte-order or data-type information saved by tofile. Data can be stored in the platform independent .npy format using save and load instead.

Best Practice to Load and Save Data

The recommended way to store and load data with NumPy in Python consists in using load and save. We also use a temporary file in the following example:
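What the missing byte-order information means in practice can be shown by reading the same file with two different assumptions. The following is a minimal sketch, assuming explicit byte-order codes in the dtype strings ('>' for big-endian, '<' for little-endian) and a temporary file with the made-up name "values.bin":

```python
import os
import tempfile

import numpy as np

# Three 32-bit integers stored in big-endian byte order
values = np.array([1, 2, 3], dtype=">i4")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "values.bin")
    values.tofile(path)

    # The reader has to know the byte order; the file does not record it
    right = np.fromfile(path, dtype=">i4")
    wrong = np.fromfile(path, dtype="<i4")

print(right)
print(wrong)
```

The second read raises no error, it silently yields different numbers, which is why a self-describing format is the safer choice for archiving data.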

    import numpy as np

    # x is still the array from the previous example
    print(x)

    from tempfile import TemporaryFile

    outfile = TemporaryFile()

    x = np.arange(10)
    np.save(outfile, x)

    outfile.seek(0)  # Only needed here to simulate closing & reopening file
    np.load(outfile)

OUTPUT:

    [32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49]
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
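Unlike tofile, save also stores the data type and the shape. The following is a minimal sketch, assuming an in-memory io.BytesIO buffer in place of a file and a structured data type like the one used in the tofile section:

```python
import io

import numpy as np

dt = np.dtype([("time", [("min", int), ("sec", int)]), ("temp", float)])
x = np.zeros((2,), dtype=dt)
x["time"]["min"] = 10
x["temp"] = 98.25

buffer = io.BytesIO()
np.save(buffer, x)

buffer.seek(0)  # rewind, as if the file had been closed and reopened
y = np.load(buffer)

# No dtype or shape has to be passed to load
print(y.dtype)
print(y.shape)
print(y)
```

The loaded array has the same nested fields as the original, without any data type being passed to load.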

and yet another way: genfromtxt

There is yet another way to read tabular input from a file to create arrays. As the name implies, the input file is supposed to be a text file. The text file can be in the form of an archive file as well. genfromtxt can process the archive formats gzip and bzip2. The type of the archive is determined by the extension of the file, i.e. '.gz' for gzip and '.bz2' for bzip2.

genfromtxt is slower than loadtxt, but it is capable of coping with missing data. It processes the file data in two passes. At first it converts the lines of the file into strings. Thereupon it converts the strings into the requested data type. loadtxt on the other hand works in one go, which is the reason why it is faster.
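How genfromtxt copes with missing data can be seen with a few lines of sample input. The following is a minimal sketch, assuming comma-separated text with two empty fields in an io.StringIO buffer; the replacement value -1 is an arbitrary choice for illustration:

```python
import io

import numpy as np

# The second row lacks its middle value, the third row its last value
text = "1,2,3\n4,,6\n7,8,\n"

# By default, missing float entries become nan
with_nan = np.genfromtxt(io.StringIO(text), delimiter=",")

# filling_values replaces missing entries with a chosen value instead
filled = np.genfromtxt(io.StringIO(text), delimiter=",", filling_values=-1)

print(with_nan)
print(filled)
```

Which replacement value is appropriate depends on the data; nan has the advantage that it cannot be mistaken for a real measurement.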

recfromcsv(fname, **kwargs)

This is not really another way to read in csv data. 'recfromcsv' is basically a shortcut for

np.genfromtxt(filename, delimiter=",", dtype=None)
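The effect of dtype=None is that genfromtxt determines the type of each column on its own. The following is a minimal sketch, assuming made-up CSV data in an io.StringIO buffer; the parameters names=True (take the field names from the first line) and encoding="utf-8" are additions for this illustration and not part of the shortcut shown above:

```python
import io

import numpy as np

# Made-up sample data with a header line
csv_text = "city,temp\nBerlin,21.5\nParis,23.0\n"

table = np.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype=None,
                      names=True, encoding="utf-8")

# Each column gets its own type; columns are addressed by field name
print(table.dtype.names)
print(table["city"])
print(table["temp"])
```

The result is a structured array, so text and numeric columns can live side by side.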

Source: https://python-course.eu/numerical-programming/reading-and-writing-data-files-ndarrays.php
