Why does a numpy array appear to have no shape?

I understand the following:

import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(arr.shape)

Output:

(2, 4)

So I was wondering why I get the following:

import numpy

import pytesseract
import logging

# A raw string avoids having to escape the backslashes in a Windows path
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'

logging.basicConfig(level=logging.WARNING)
logging.getLogger('pytesseract').setLevel(logging.DEBUG)


image = r'C:\ocr\target\31832_226140__0001-00002b.jpg'
target = numpy.asarray(pytesseract.image_to_string(image, config='--dpi 96 --psm 6 -c preserve_interword_spaces=1 -c tessedit_char_whitelist="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,- \'" '))
print("target type is:",type(target))
print("target array shape is:",target.shape)

Output:

DEBUG:pytesseract:['C:\\Program Files\\Tesseract-OCR\\tesseract', 'C:\\ocr\\target\\31832_226140__0001-00002b.jpg', 'C:\\Users\\david\\AppData\\Local\\Temp\\tess_p68ogbz9', '--dpi', '96', '--psm', '6', '-c', 'preserve_interword_spaces=1', '-c', "tessedit_char_whitelist=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,- '", 'txt']
target type is: <class 'numpy.ndarray'>
target array shape is: ()

Okay, my array is text. But I would still have expected a shape with actual dimensions, say (1, 999)?

Calling print(target) gives output like the following:

--------> snip <--------

196 ANGUS, Lynne Manon ........................128 Wellington Rd, Wemuomata Recepnonst
        197 ANGUS, Mane Joan .........00... ......129 Wellington Road, Weinumomata, Married
       198 ANGUS, Manon Jean .........................173 Wellington Road, Weinuiomata,Texi Driver
        199 ANGUS. Noel Fulton ........................127 Weinuomats Road, Weinuomate, Carpenter
   

Solution:

This just means that you’ve created a scalar, i.e., a zero-dimensional array, whose shape is the empty tuple (). Consider:

>>> import numpy as np
>>> arr = np.array(1)
>>> arr
array(1)
>>> arr.shape
()
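
To make the distinction concrete, here is a small sketch comparing a zero-dimensional array with one- and two-dimensional arrays that also hold a single value. The variable names are only illustrative.

```python
import numpy as np

scalar = np.array(1)      # zero-dimensional: no axes at all
vector = np.array([1])    # one-dimensional, one element
matrix = np.array([[1]])  # two-dimensional, 1 x 1

# Each array holds exactly one value, but the number of axes differs.
for name, a in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    print(name, "ndim:", a.ndim, "shape:", a.shape, "size:", a.size)
```

All three have size 1; only the zero-dimensional one reports an empty shape tuple.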

This is presumably because pytesseract.image_to_string returns a single str object (or maybe a bytes object), and NumPy wraps a single string as one scalar element rather than treating it as a sequence of characters. So you get:

>>> np.asarray("some string object")
array('some string object', dtype='<U18')
>>> np.asarray("some string object").shape
()

It isn’t clear what you expect to create. As you said, you just have text, so why are you trying to create a numpy.ndarray object out of it? If you can elaborate on what you are trying to achieve, perhaps I or others can suggest an approach.
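
If the goal turns out to be one array element per line of recognised text, one possible approach is to split the string before handing it to NumPy. This is only a sketch under that assumption: ocr_text below is a hypothetical stand-in for the string returned by the OCR call, not real output.

```python
import numpy as np

# Hypothetical stand-in for the str returned by pytesseract.image_to_string
ocr_text = "196 ANGUS, Lynne\n197 ANGUS, Mane\n198 ANGUS, Manon\n"

whole = np.asarray(ocr_text)               # a single string -> 0-d array
lines = np.asarray(ocr_text.splitlines())  # a list of strings -> 1-d array

print(whole.shape)  # ()
print(lines.shape)  # (3,)
```

Splitting first gives NumPy a sequence to work with, so the result has one axis whose length is the number of lines.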
