Python 3.3 TypeError: can't use a string pattern on a bytes-like object in re.findall()

Tag: python-3.x,web-crawler

I am trying to learn how to automatically fetch urls from a page. In the following code I am trying to get the title of the webpage:

import urllib.request
import re

url = "http://www.google.com"
regex = '<title>(.+?)</title>'
pattern  = re.compile(regex)

with urllib.request.urlopen(url) as response:
   html = response.read()

title = re.findall(pattern, html)
print(title)

And I get this unexpected error:

Traceback (most recent call last):
  File "C:\Users\Abhishek\Desktop\Crawler.py", line 11, in <module>
    title = re.findall(pattern, html)
  File "C:\Python33\lib\re.py", line 201, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object

What am I doing wrong?

Thanks!

Best How To :

You want to convert html (a bytes-like object) into a string using .decode, e.g. html = response.read().decode('utf-8').

See Convert bytes to a Python String
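A minimal sketch of that fix, assuming the page is UTF-8 encoded (a real crawler would probably read the charset from the Content-Type header instead). The helper name extract_title and the sample bytes are illustrative and not part of the original question; the pattern uses .+? so that it can match arbitrary title text.

```python
import re

# Compiled once; a str pattern, so it must be applied to str, not bytes.
TITLE_RE = re.compile(r"<title>(.+?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(raw_html, encoding="utf-8"):
    """Decode the bytes returned by response.read() and return all titles found."""
    text = raw_html.decode(encoding, errors="replace")
    return TITLE_RE.findall(text)


# Stand-in for response.read(); no network access is needed for the demo.
sample = b"<html><head><title>Example Page</title></head></html>"
titles = extract_title(sample)
print(titles)
```

The alternative is to keep the data as bytes and use a bytes pattern, e.g. re.findall(rb"<title>(.+?)</title>", html); the matches are then bytes objects as well.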

subprocess python 3 check_output not same as shell command?

python-3.x,subprocess

shlex.split() syntax is different from the one used by cmd.exe (%COMSPEC%); use raw-string literals for Windows paths, i.e. use r'c:\Users' instead of 'c:\Users'; you don't need shell=True here, and you shouldn't use it with a list argument; you don't need to split the command on Windows: string is the...

Python MVC style GUI Temperature Converter

python,user-interface,python-3.x,model-view-controller,tkinter

You have two mistakes here: 1 - In your Counter.py file, in your Convert class methods, you are not returning the right variables: instead of return celsius you should return self.celsius, and the same goes for self.fahrenheit. 2 - In the Controller.py file: self.view.outputLabel["text"] = self.model.convertToFahrenheit(celsius) This will not update the...

Python 3.4: List to Dictionary

python,list,python-3.x,dictionary

You can use unpacking operation within a dict comprehension : >>> my_dict={i:j for i,*j in [l[i:i+4] for i in range(0,len(l),4)]} >>> my_dict {'Non Recurring': ['-', '-', '-'], 'Total Other Income/Expenses Net': [33000, 41000, 39000], 'Selling General and Administrative': [6469000, 6384000, 6102000], 'Net Income From Continuing Ops': [4956000, 4659000, 4444000], 'Effect...
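A small sketch of the same idea, assuming the flat list repeats in groups of four (one label followed by three values). The name flat and the shortened data are illustrative; the values are taken from the output shown above.

```python
# Flat list: every group of four is a label followed by three values.
flat = [
    "Non Recurring", "-", "-", "-",
    "Total Other Income/Expenses Net", 33000, 41000, 39000,
]

# Cut the list into chunks of four items.
chunks = [flat[i:i + 4] for i in range(0, len(flat), 4)]

# "key, *values" binds the first item to key and the rest (as a list) to values.
my_dict = {key: values for key, *values in chunks}
print(my_dict)
```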

tkinter showerror creating blank tk window

python-3.x,tkinter,messagebox,tkmessagebox

from Tkinter import * from tkMessageBox import showerror Tk().withdraw() showerror(title = "Error", message = "Something bad happened") Calling Tk().withdraw() before showing the error message will hide the root window. Note: from tkinter import * for Python 3.x...

Check if element exists in fetched URL [closed]

javascript,jquery,python,web-crawler,window.open

I can suggest you use iframe for loading pages. For example: $.each($your-links, function(index, link) { var href = $(link).attr("href"); // your link preprocess logic ... var $iframe = $("<iframe />").appendTo($("body")); $iframe.attr("src", href).on("load", function() { var $bodyContent = $iframe.contents().find("body"); // check iframe content and remove iframe $iframe.remove(); } } But, I...

How to avoid user to click outside popup Dialog window using Qt and Python?

qt,user-interface,python-3.x,dialog,qt-creator

Use setModal() like so: dialog.setModal(1); or: dialog.setModal(true); ...

The event loop is already running

python,python-3.x,pyqt,pyqt4

I think the problem is with your start.py file. You have a function refreshgui which re-imports start.py; import will run every part of the code in the file. It is customary to wrap the main functionality in an if __name__ == '__main__': block to prevent code from being run on...

Installing Python 3 Docker Ubuntu error command 'x86_64-linux-gnu-gcc

python,python-3.x,amazon-web-services,docker

You should install python3-pip in your Dockerfile and then run pip3 install -r requirements.txt

Wrapping Functions in Python 3.4 missing required positional argument

python,python-3.x,flask,flask-login

login_role_required should be a function that returns a decorator function, which in turn takes a single argument—the decorated function—and returns a modified function. So it should look like this: def login_role_required(req_roles = None): if req_roles is None: req_roles = ['any'] def decorator (f): def decorated_view(*args, **kwargs): # … return f(*args,...
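A self-contained sketch of that three-level structure, written without Flask so it can run on its own. CURRENT_ROLES is a hypothetical stand-in for whatever the application uses to look up the logged-in user's roles, and raising PermissionError is only an assumption about how a rejected request might be signalled.

```python
from functools import wraps

# Hypothetical stand-in for the logged-in user's roles.
CURRENT_ROLES = {"editor"}


def login_role_required(req_roles=None):
    """Return a decorator that only lets callers with a required role through."""
    if req_roles is None:
        req_roles = ["any"]

    def decorator(f):
        @wraps(f)  # keep the wrapped function's name and docstring
        def decorated_view(*args, **kwargs):
            allowed = "any" in req_roles or bool(CURRENT_ROLES & set(req_roles))
            if not allowed:
                raise PermissionError("role required: " + ", ".join(req_roles))
            return f(*args, **kwargs)
        return decorated_view

    return decorator


@login_role_required(["editor"])
def edit_page(page_id):
    return "editing %d" % page_id


@login_role_required(["admin"])
def delete_page(page_id):
    return "deleting %d" % page_id


print(edit_page(3))
```

Note that the decorator is applied with parentheses even when no roles are given, i.e. @login_role_required(), because the outer function has to be called to produce the actual decorator.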

How to access a class's property using a partialmethod?

python,python-3.x,descriptor

A partialmethod object will only handle descriptor delegation for the function object, not for any arguments that are descriptors themselves. At the time the partialmethod object is created, there is not enough context (no class has been created yet, let alone an instance of that class), and a partialmethod object...

Pass function call as a function argument

python,python-2.7,python-3.x

The functions are returning tuples, because return only gives back one item. You can "unpack" the tuple returned by prepending it with an asterisk. The syntax will look like this: print function1(*function2(1,2)) ...
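A short illustration in Python 3 syntax. The bodies of function1 and function2 are made up for the example; only the call pattern comes from the answer.

```python
def function2(a, b):
    # "return x, y" builds and returns a single tuple.
    return a + b, a * b


def function1(total, product):
    return "sum=%d product=%d" % (total, product)


# The asterisk spreads the returned tuple over function1's two parameters.
result = function1(*function2(1, 2))
print(result)
```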

What's the fastest way to compare datetime in pandas?

python,python-3.x,numpy,pandas,datetime64

If you call set_index on pdata to date_2 then you can pass this as the param to map and call this on tdata['date_1'] column and then fillna: In [51]: tdata['TBA'] = tdata['date_1'].map(pdata.set_index('date_2')['TBA']) tdata.fillna(0, inplace=True) tdata Out[51]: TBA date_1 0 0 2010-01-04 1 2 2010-01-05 2 0 2010-01-06 3 0 2010-01-07...

index() Method Not Accepting None as Start/Stop

python,python-3.x

This is a bug with how the arguments are being parsed. See issue #1259 and issue #11828 for the same bug with string methods, and issue #13340 for lists. Yes, the documentation implies that the indices are used in roughly the same way as a slice, and the internal utility...

Python file processing?

python,python-3.x

You need to convert your string grades to floats (or ints): average = (float(grade_1) + float(grade_2) + float(grade_3))/3.0 average = str(average) ...

Python bruteforce combinations given a starting string

python,python-3.x,brute-force

Taking the original code from the itertools man page copy the code for the combinations_with_replacement code but replace line 7 with new indices starting from your entered word. inputStr='acc' indices=[pool.index(l) for l in inputStr] And then run the rest of the code from the man page. EDIT: For a complete...

TCL parsing a list of arguments to an external call

python,python-3.x,tcl

I don't know if the Tcl interpreter in your system is recent. If it is, you should be able to use python $python_app_name {*}$python_app_args to get the arguments as separate strings. The {*} prefix is a syntactic modifier that splices the items in a list as separate arguments. Example: list...

Put a QLineEdit() into a QTreeWidgetItem()

python,python-3.x,pyqt,pyqt5

It should suffice to set your item flags to include ItemIsEditable: self.item.setFlags(self.item.flags() | Qt.ItemIsEditable) You can also configure the EditTriggers to start editing as you like, e.g. upon double-clicking an item: treeView.setEditTriggers(QtGui.QAbstractItemView.DoubleClicked) Double-clicking an item in your treewidget should now bring up an editor - which by default is simply...

Python3:socket:TypeError: unsupported operand type(s) for %: 'bytes' and 'bytes'

sockets,python-3.x

Normal string formatting cannot be used for bytes. I think the way to go about it is - you'd have to first generate a string, format it and then convert it to bytes with appropriate encoding. So the following changes should work change sock.send(b'Hello %s!' % data) to reply =...
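A sketch of the format-then-encode approach, assuming the peer sends and expects UTF-8 text. The value of data is a placeholder for what sock.recv() would return; no socket is opened here.

```python
# Placeholder for the bytes received from the socket.
data = b"world"

# Format as str first, then encode the finished string back to bytes.
reply = "Hello %s!" % data.decode("utf-8")
payload = reply.encode("utf-8")

print(payload)  # this is what would be passed to sock.send()
```

From Python 3.5 onward, bytes objects support % formatting themselves (PEP 461), so the original expression works on those versions.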

How do I make each histogram bin show me the frequency of each action/event/item?

python-3.x,matplotlib,histogram

with your data, cases = list(set(actions)) fig, ax = plt.subplots() ax.hist(map(lambda x: times[actions==x], cases), bins=np.arange(min(times), max(times) + binwidth, binwidth), histtype='bar', stacked=True, label=cases) ax.legend() plt.show() produces ...

Scrapy not entering parse method

python,selenium,web-scraping,web-crawler,scrapy

Your parse(self, response): method is not part of the jobSpider class. If you look at the Scrapy documentation you'll see that the parse method needs to be a method of your spider class. from selenium.webdriver.support.wait import WebDriverWait from selenium.webdriver.common.by import By from selenium.webdriver.support import expected_conditions as EC from selenium import...

Addition of two dates on python 3

python,csv,datetime,python-3.x

Once you have both the values in two variables, new_date and new_time, you could simply combine them to get the datetime, like this, >>> new_date = dt.datetime.strptime(row[0], "%Y.%m.%d") >>> new_time = dt.datetime.strptime(row[1], "%H:%M").time() >>> >>> dt.datetime.combine(new_date, new_time) datetime.datetime(2005, 2, 28, 17, 38) Note:- Avoid using date and time as variable...
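The same steps as a runnable script. The contents of row are a made-up CSV row chosen to reproduce the value shown above, and .date() is added here so that the first argument to combine is a plain date.

```python
import datetime as dt

# Hypothetical CSV row: date in the first column, time in the second.
row = ["2005.02.28", "17:38"]

new_date = dt.datetime.strptime(row[0], "%Y.%m.%d").date()
new_time = dt.datetime.strptime(row[1], "%H:%M").time()

combined = dt.datetime.combine(new_date, new_time)
print(combined)
```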

Error Hashing + Salt password

python,authentication,python-3.x,hash,salt

You can do salt = salt.decode("utf-8") after salt is encoded to convert it to string.

If a block of code creates an error, do x; if not, do y (Python)

python,python-3.x

Sounds like you're looking for exceptions: https://docs.python.org/2/tutorial/errors.html # checkError: becomes try: # some test if x > 0: raise AssertionError("Something failed...") print("The block of code works!") except: print("The block of code does not work!") Something like that...
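One way to lay this out in Python 3 is with try/except/else, where the else branch runs only if no exception was raised. The function name check is illustrative, and catching AssertionError specifically (rather than a bare except) is a choice made for this sketch.

```python
def check(x):
    """Return one message if the guarded block raises, another if it does not."""
    try:
        # The block of code that may fail.
        if x > 0:
            raise AssertionError("Something failed...")
    except AssertionError:
        # "do x": runs only when the block raised.
        return "The block of code does not work!"
    else:
        # "do y": runs only when the block finished without an error.
        return "The block of code works!"


print(check(1))
print(check(0))
```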

“Initializing” variables in python?

python,python-3.x

The issue is in the line - grade_1, grade_2, grade_3, average = 0.0 and fName, lName, ID, converted_ID = "" In python, if the left hand side of the assignment operator has multiple items to be allocated, python would try to iterate the right hand side that many times and...
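A brief sketch of two forms that do work, plus the failing one wrapped in try/except so the script keeps running. Chained assignment is safe here because floats and strings are immutable; with a mutable value such as a list, all the names would share one object.

```python
# Chained assignment: every name is bound to the same value.
grade_1 = grade_2 = grade_3 = average = 0.0
fName = lName = ID = converted_ID = ""

# Tuple unpacking: one value on the right for each name on the left.
a, b, c = 0.0, 0.0, 0.0

# The original form fails because a single float cannot be iterated.
try:
    x, y = 0.0
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```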

django-admin startproject not working with python3 on OS X

python,django,osx,python-2.7,python-3.x

Recommended: Try using virtualenv and initiate your environment with Python3. Or a quicker solution is to use python interpreter directly to execute django-admin: <path-to-python3>/python /usr/local/bin/django-admin startproject mysite ...

Pylint Error when using metaclass

python,python-3.x,vim,pylint,syntastic

In the docs under 4.2. Q. The python checker complains about syntactically valid Python 3 constructs...: A. Configure the python checker to call a Python 3 interpreter rather than Python 2, e.g: let g:syntastic_python_python_exec = '/path/to/python3' ...

writing a tkinter scrollbar for canvas within a class

python,python-3.x,tkinter

Your problem starts here: frame.bind('<Configure>', self.OnFrameConfigure(parent=canvas)) You are immediately calling the OnFrameConfigure function. That is not how you use bind. You must give a reference to a callable function. Since you're using a class, you don't need to pass parent in, unless you have this one function work for multiple...

Python3 create files from dictionary

file,python-3.x,dictionary

Remove the if not len(key) != len(aDict) and the break. What you probably wanted to do is stop the loop after iterating over all the keys. However, key is one of 'OG_1', 'OG_2', 'OG_XX'; it's not a counter or something like that. Replace open("key", "w") with open(key + ".txt", "w")....

Return to main fuction in python

python-3.x,def

As suggested by tobias_k, you should add the contents of choices() into a while loop. I also found some other problems: False does not equal "False", so your while loop never runs. You use terms like mylist, mylist1, and mylist2 - it's better to rename these to choosing_list, appending_list, and...

multiple iteration of the same list

python,python-2.7,python-3.x,numpy,shapely

Without downloading shapely, I think what you want to do with lists can be replicated with strings (or integers): In [221]: data=['one','two','three'] In [222]: data1=['one','four','two'] In [223]: results=[[],[]] In [224]: for i in data1: if i in data: results[0].append(i) else: results[1].append(i) .....: In [225]: results Out[225]: [['one', 'two'], ['four']] Replace...
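The flattened session above, laid out as a plain script with the same data. Only the loop variable has been renamed (item instead of i).

```python
data = ["one", "two", "three"]
data1 = ["one", "four", "two"]

# results[0] collects items found in data, results[1] collects the rest.
results = [[], []]
for item in data1:
    if item in data:
        results[0].append(item)
    else:
        results[1].append(item)

print(results)
```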

python 3 error with print function syntax

python,python-3.x,printing

The code works fine in Python 2. If you are using Python 3, there is an issue with last line, because print is a function. So, because of where you've put the close parenthesis, only the first part is actually passed to print. Try this instead print("len(list6[1]):", len(list6[1])) ...

Finding the number of letters in a sentence?

python,python-3.x

Changing n = words.count(x) to n = i.count(x) does the trick. Reason: i is the single word in each iteration, so you have to use i.count(x) to get the count of x in i ...

Python3 after cursor.execute it stopped?

mysql,python-3.x

Below is how I solved my problem: use an array to collect all the processed data, then use executemany to save it all at once. Beforehand, you have to modify the MySQL config: max_allowed_packet = 500M. A painful but valuable lesson. Answer: - import base64 import struct import pymysql.cursors import sys import...

argparse optional value for argument

python,python-3.x,command-line-interface,argparse

You can do this with nargs='?': One argument will be consumed from the command line if possible, and produced as a single item. If no command-line argument is present, the value from default will be produced. Note that for optional arguments, there is an additional case - the option string...
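A compact sketch of the three cases. The option name --log and the file names are invented for the example; nargs, const and default are the argparse parameters the quoted documentation refers to.

```python
import argparse

parser = argparse.ArgumentParser()
# default is used when --log is absent, const when --log is given without a value.
parser.add_argument("--log", nargs="?", const="default.log", default=None)

absent = parser.parse_args([]).log
bare = parser.parse_args(["--log"]).log
given = parser.parse_args(["--log", "run.log"]).log

print(absent, bare, given)
```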

Callable not defined for django.db.models field default

python,django,python-3.x,django-models

You shouldn't use this method to set your default value; rather, override the save method of the model and use it there. For example: class User(models.Model): first_name = models.CharField(max_length=256) last_name = models.CharField(max_length=256) slug = models.SlugField(max_length=256, unique=True, default=uuid.uuid1) def make_slug(self): return self.first_name + self.last_name[0] def save(self, *args, **kwargs): self.slug =...

Cancel last line iteration on a file

python,python-3.x,for-loop,file-io

In python3, the 'for line in file' construct is represented by an iterator internally. By definition, a value that was produced from an iterator cannot be 'put back' for later use (http://www.diveintopython3.net/iterators.html). To get the desired behaviour, you need a function that chains together two iterators, such as the chain...
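A minimal sketch of the chaining idea using itertools.chain. The helper name push_back is made up, and io.StringIO stands in for an open file so the example needs nothing on disk.

```python
import io
from itertools import chain


def push_back(line, iterator):
    """Return a new iterator that yields line first, then the rest of iterator."""
    return chain([line], iterator)


# Stand-in for an open text file.
lines = iter(io.StringIO("header\nbody 1\nbody 2\n"))

first = next(lines)              # consume one line to inspect it
lines = push_back(first, lines)  # rebuild an iterator that still starts with it

collected = list(lines)
print(collected)
```

The original iterator is not rewound; the code simply continues with the new, chained iterator in its place.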

Distinguishing between HTML and non-HTML pages in Scrapy

python,html,web-crawler,scrapy,scrapy-spider

Nevermind, I found the answer. type() only gives information on the immediate type. It tells nothing of inheritance. I was looking for isinstance(). This code works: if isinstance(response, TextResponse): links = response.xpath("//a/@href").extract() ... http://stackoverflow.com/a/2225066/1455074, near the bottom...

Why does round(5/2) return 2?

python,python-3.x,python-3.4

"If two multiples are equally close, rounding is done toward the even choice (so, for example, both round(0.5) and round(-0.5) are 0, and round(1.5) is 2)." That is a quote from the documentation for the round function. Hope this helps :) On a side note, I would suggest always reading the docs when...
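A few values that show the round-half-to-even behaviour, followed by one possible way to get conventional half-up rounding with the decimal module. Using decimal here is a suggestion for illustration and is not taken from the quoted documentation.

```python
from decimal import Decimal, ROUND_HALF_UP

# Built-in round() sends exact halves to the nearest even integer.
banker = [round(0.5), round(1.5), round(2.5), round(5 / 2)]

# Half-up rounding, if that is what is actually wanted.
half_up = int(Decimal("2.5").quantize(Decimal("1"), rounding=ROUND_HALF_UP))

print(banker, half_up)
```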

Pyqt - Add a QMenuBar to a QMainWindow which is in another class

python-3.x,pyqt,pyqt5

QMainWindow comes with its default QMenuBar, but you can set a new one with QMainWindow.setMenuBar(). More information in the Qt Documentation...

Django runserver not serving some static files

django,python-3.x

According to the Django documentation regarding static/: This should be an initially empty destination directory for collecting your static files from their permanent locations into one directory for ease of deployment; it is not a place to store your static files permanently. You should do that in directories that will...

sys.argv in a windows environment

python,windows,python-3.x

You are calling the script wrong. Bring up a cmd (command-line prompt) and type: cd C:/Users/user/PycharmProjects/helloWorld/ module_using_sys.py we are arguments And you will get the correct output....

Python 3 filtering directories by name that matches specific pattern

python,regex,python-3.x,directory,filtering

import os import re result = [] reg_compile = re.compile(r"test\d{8}") for dirpath, dirnames, filenames in os.walk(myrootdir): result = result + [dirname for dirname in dirnames if reg_compile.match(dirname)] As advised I will explain (thanks for the -1 btw :D): the compile(r"test\d{8}") will prepare a regex that matches any folder named test...
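A runnable variant of the snippet that builds a throwaway directory tree to search. Two things differ from the answer and are choices made for this sketch: the pattern is a raw string, and fullmatch replaces match so that a name with extra trailing characters is rejected. The directory names are invented.

```python
import os
import re
import tempfile

# Raw string so that \d reaches the regex engine unchanged.
reg_compile = re.compile(r"test\d{8}")


def matching_dirs(root):
    """Return the names of all directories under root that match the pattern."""
    result = []
    for dirpath, dirnames, filenames in os.walk(root):
        # fullmatch requires the whole name to be "test" plus exactly 8 digits.
        result.extend(d for d in dirnames if reg_compile.fullmatch(d))
    return result


with tempfile.TemporaryDirectory() as myrootdir:
    for name in ("test12345678", "test123", "other"):
        os.mkdir(os.path.join(myrootdir, name))
    found = matching_dirs(myrootdir)

print(found)
```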

How to make the Sieve of Eratosthenes faster?

python-3.x,primes,sieve-of-eratosthenes,number-theory

Always try to measure empirical complexity of your code at several ranges. Your code is slow because of how you find the set difference, and that you always convert between set and list and back. You should be using one set throughout, and updating it in place with sete.difference_update(range(p*p,n,p*2)) To...
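A sketch of a sieve that keeps a single set and updates it in place, as suggested above. The function name primes_below and the handling of small n are assumptions; the stride of 2*p works because the set holds only odd candidates (plus 2).

```python
def primes_below(n):
    """Return the sorted list of primes strictly less than n."""
    if n <= 2:
        return []
    # Start from 2 and the odd numbers; even numbers above 2 are never prime.
    candidates = set(range(3, n, 2))
    candidates.add(2)
    p = 3
    while p * p < n:
        if p in candidates:
            # Remove odd multiples of p, starting at p*p, without rebuilding the set.
            candidates.difference_update(range(p * p, n, 2 * p))
        p += 2
    return sorted(candidates)


print(primes_below(30))
```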

How to have multiple text widgets with scrollbars in a frame on tkinter

python,python-3.x,tkinter

Two issues (apart from the messed-up indentation which probably just happened when pasting your code): The frame was bound to the returned value of a function rather than to a function itself. You can fix this with a lambda function. You're trying to create a complex layout with pack(). Try...

What is a reliable isnumeric() function for python 3?

python,regex,validation,python-3.x,isnumeric

try: float(w.get()) except ValueError: # wasn't numeric ...
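The same check wrapped in a small helper. The name is_numeric is made up, and the argument is assumed to be a string such as the value returned by w.get().

```python
def is_numeric(text):
    """Return True if text can be parsed by float(), False otherwise."""
    try:
        float(text)
    except ValueError:
        # wasn't numeric
        return False
    return True


print(is_numeric("3.14"), is_numeric("abc"))
```

Keep in mind that float() also accepts strings such as "nan", "inf" and values with surrounding whitespace, so this is a permissive test.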

Multiple random choices with different outcomes

python,python-3.x,random

The line: randgend = random.choice(gend) makes randgend a single random choice from [ 'male', 'female' ], you're basically writing: randgend = 'male' # or female, whichever gets picked first If you want it to be a function that returns a different random choice each time, you need: randgend = lambda:...
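A sketch of the callable version, written as a def rather than a lambda. The list gend follows the answer; the picks list exists only to demonstrate repeated calls.

```python
import random

gend = ["male", "female"]


def randgend():
    """Make a fresh random choice on every call."""
    return random.choice(gend)


# Each call is an independent draw, so the results can differ.
picks = [randgend() for _ in range(5)]
print(picks)
```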

How to parse this string?

python,python-3.x

#!/usr/bin/env python3 # coding: utf-8 s = """00 1f [email protected] 00c 00e 00N 00> 00E 00O 00F 002 00& 00* 00/ 00) 00 1f 00 1c 00 00 00 17 00\r 00 08 00 03 00 f8 ff ea ff e1 ff e1 ff e0 ff da ff d2 ff...

“Initializing” a constant containing a file in python?

python,python-3.x

A common approach in Python is to initialize a variable to None if it is not being used yet. This is a signal to the future readers of your code that you want that variable to exist, but it won't be until later that it is used. infile = None...

T_STRING error in my php code [duplicate]

php,web-crawler

I think that you got this code from C#, C++, or another similar language; this code does not work in PHP. If you have an external Java application (jar), use the exec functions instead. $url_effective = "http://www.endclothing.co.uk/checkout/cart/add/uenc/aHR0cDovL3d3dy5lbmRjbG90aGluZy5jby51ay9ldHEtaGlnaC10b3AtMS1zbmVha2VyLWVuZC1leGNsdXNpdmUtZXRxLTQtZW5kYmsuaHRtbA,,/product/$i/form_key/DcwmUqDsjy4Wj4Az/"; $crwal = exec("end-cookie.jar -w".$url_effective." -L -s -S -o"); Or something in this style....

Python Reuse a Variable in the Else Block of an If-Else Statement

python,python-3.x,if-statement,condition

When you call getSrc a second time, the value of source that was created the first time has long since gone out of scope and been garbage collected. To prevent this, try making source an attribute of the function the same way you did for has_been_called. def getSrc(): if getSrc.has_been_called...
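A minimal sketch of storing the value as a function attribute so that it survives between calls. The string assigned to getSrc.source is a placeholder for whatever the real function computes on its first call.

```python
def getSrc():
    if not getSrc.has_been_called:
        # First call: compute the value once and keep it on the function object.
        getSrc.source = "contents loaded on first call"  # placeholder value
        getSrc.has_been_called = True
    # Later calls reuse the stored value instead of a vanished local variable.
    return getSrc.source


# Attributes must exist before the first call.
getSrc.has_been_called = False
getSrc.source = None

first = getSrc()
second = getSrc()
print(first is second)
```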