Alter table or select/copy to new table with new columns

google-bigquery

If you want to use a query to copy the table, but don't want nested and repeated fields to be flattened, you can set the flattenResults parameter to false to preserve the structure of your output schema.

How to integrate Google Bigquery with c# console application

c#,integration,google-bigquery

Following your answer, I was not able to add the namespace "using Google.Apis.Authentication.OAuth2.DotNetOpenAuth;", but I managed to retrieve results from BigQuery using the code below. You need to update the Project Name, Project Id and Query. Download the Client ID (I am using the Installed Application - Other category), generate the JSON file and add...

BigQuery raises Pagination token expired on first getQueryResults

google-bigquery

It turns out this is just a bad error message. The problem is that BigQuery only supports decorators within the last 7 days. For the query used in the job mentioned above, the time range specified was 30 days in the past. The error should have said something like "invalid...

Using multiple '*' patterns when loading into BigQuery won't work

google-bigquery

From the documentation for load job configuration sourceUris parameter: [Required] The fully-qualified URIs that point to your data in Google Cloud Storage. Wildcard names are only supported when they appear at the end of the URI....
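
As a rough illustration of the rule quoted above, the following Python sketch checks a list of source URIs before a load job is submitted. The helper name `wildcard_only_at_end` and the bucket paths are hypothetical, and the check encodes only the one sentence quoted from the documentation (a wildcard is accepted only at the very end of the URI); it is not a BigQuery API call.

```python
def wildcard_only_at_end(uri: str) -> bool:
    """Return True if the URI has no '*', or exactly one '*' as its last character."""
    count = uri.count("*")
    if count == 0:
        return True
    return count == 1 and uri.endswith("*")


# Hypothetical URIs, used only to exercise the check.
uris = [
    "gs://my-bucket/logs/2015/part-*",  # single trailing wildcard
    "gs://my-bucket/logs/*/part-*",     # multiple wildcards
    "gs://my-bucket/logs/*.csv",        # wildcard not at the end
]

for uri in uris:
    print(uri, wildcard_only_at_end(uri))
```

Running a check like this client-side simply surfaces the problem before the load job is rejected.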

Can BigQuery query flattened tables and convert them into a nested data structure

google-bigquery

One way is to use GROUP_CONCAT SELECT t1.c1, group_concat(t2.c2) FROM (SELECT 'a' AS c1, 1 AS k) t1 JOIN (SELECT * FROM (SELECT 'b' AS c2, 1 AS k), (SELECT 'c' AS c2, 1 AS k), (SELECT 'd' AS c2, 1 AS k), (SELECT 'e' AS c2, 1 AS k),...

Is there any form to write to BigQuery specifying the name of destination tables dynamically?

google-bigquery,google-cloud-dataflow

Unfortunately, we don't provide an API to name the BigQuery table in a data-dependent way. Generally speaking, data-dependent BigQuery table destination(s) may be error prone. That said, we are working on improving flexibility in this area. No estimates at this time, but we hope to get this soon....

Unable to extract length of an integer column

google-bigquery

LENGTH is a function which operates on strings. If you want the string length of an integer, you could run, e.g.: SELECT LENGTH(STRING(17)); You can also see the BigQuery query reference for string functions for more information....

BigQuery command line tool - append to table using query

google-bigquery

It is a shame BigQuery doesn't support the standard SQL idioms CREATE TABLE foo INSERT a,b,c from bar where blah; INSERT foo SELECT a,b,c from baz where blech; and you have to do it their way. From the help, there is bq query --append_table ... The help for the bq...

DateTime offset in Google BigQuery

google-bigquery

If you are 5 hours behind UTC, you need to use a workaround: UTC_USEC_TO_DAY(timestamp_field - (5*60*60*1000000)). Timezone functions are on the feature request list, as documented here: https://code.google.com/p/google-bigquery/issues/detail?id=8...
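
The arithmetic behind this workaround can be sketched in Python. This assumes the timestamp is a UNIX timestamp in microseconds and that UTC_USEC_TO_DAY truncates such a value to the start of its day; the function name `local_day_start_usec` is a hypothetical stand-in, not a BigQuery function.

```python
USEC_PER_SECOND = 1000000
USEC_PER_DAY = 24 * 60 * 60 * USEC_PER_SECOND


def local_day_start_usec(utc_usec: int, hours_behind_utc: int) -> int:
    """Shift a UTC microsecond timestamp by a fixed offset, then truncate to the day."""
    offset_usec = hours_behind_utc * 60 * 60 * USEC_PER_SECOND
    shifted = utc_usec - offset_usec
    # Mirrors the assumed behaviour of UTC_USEC_TO_DAY: drop the time-of-day part.
    return shifted - shifted % USEC_PER_DAY


# 2015-01-01 03:00:00 UTC is still 2014-12-31 for someone 5 hours behind UTC.
ts = 1420081200 * USEC_PER_SECOND
print(local_day_start_usec(ts, 5))
```

A fixed offset like this does not account for daylight saving time, which is one reason proper timezone functions were requested.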

'Immediate Follow' Page Path in BigQuery

google-bigquery

For your particular use case, I'm pretty sure you can do this with much faster execution time by avoiding both JOIN and GROUP BY. Consider: SELECT [date], fullVisitorId, visitId, visitNumber, GROUP_CONCAT(REGEXP_EXTRACT(hits.page.pagePath, '^(/[^/?]*)'), ">>") WITHIN RECORD AS Sequence, FROM (TABLE_DATE_RANGE ( [XXXXXX.ga_sessions_] , TIMESTAMP('2014-06-01') , TIMESTAMP('2014-06-05') ) ) WHERE REGEXP_MATCH(hits.page.pagePath, r'^/Page[123]')...

BigQuery Basics - FROM clause while querying

sql,google-analytics,google-bigquery

Based on what I see in your query, you are just giving it aliases, as in the example below: SELECT * FROM StudentInformation AS StudentProfile; ...

IGNORE CASE query problems saving to a table and using Allow large results

google-bigquery

Yes, it is a known problem, and it has not been neglected. The code changes to fix it are (surprisingly) not trivial, but they are mostly done. Now the team is carefully looking at how to enable and deploy them. I cannot give you a timeline, but the fix to this problem...

Export Google Cloud Datastore and import to BigQuery programmatically

gae-datastore,google-bigquery,google-datastore

There isn't a simple way to do this, but you can separate out the two parts: creating appengine backups and loading them into bigquery. You can use scheduled backups to create datastore backups periodically (https://cloud.google.com/appengine/articles/scheduled_backups). You can then use Apps Script to automate the BigQuery portion (https://developers.google.com/apps-script/advanced/bigquery#load_csv_data) or use an...

Trying to find exact word match within separate table field, accounting for negative words

google-bigquery

The idea is: (1) SPLIT negative keywords into a repeated field; (2) remove negative words using the OMIT RECORD IF SOME(title CONTAINS negative) construct; (3) match full words using CONTAINS with surrounding spaces, or, to catch the beginning/end of the string, use a custom pattern with LIKE. Putting it all together using data from your example: SELECT title,...

How can I generate a BigQuery result without JSON formatting?

python,google-bigquery

I believe you want something like this: for page in paging(service, service.jobs().getQueryResults, num_retries=num_retries, **query_job['jobReference']): for row in page['rows']: # Each row is a dict with fields, 'f', containing # an array of table cells, one for each column # of the query output. Each cell is a dict # containing...
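
To make the row layout described above concrete, here is a minimal Python sketch that turns one page of results into plain lists of values. It assumes each cell stores its value under the key 'v', as in the BigQuery REST response format; `rows_to_lists` and `sample_page` are hypothetical names and data for illustration, and no API call is made.

```python
def rows_to_lists(page: dict) -> list:
    """Flatten one results page ({'rows': [{'f': [{'v': ...}, ...]}, ...]}) into value lists."""
    return [[cell.get("v") for cell in row["f"]] for row in page.get("rows", [])]


# Hand-written sample in the assumed response shape.
sample_page = {
    "rows": [
        {"f": [{"v": "hamlet"}, {"v": "5318"}]},
        {"f": [{"v": "macbeth"}, {"v": None}]},
    ]
}

for values in rows_to_lists(sample_page):
    # Print tab-separated values, with NULL cells shown as empty strings.
    print("\t".join("" if v is None else str(v) for v in values))
```

The same function could be applied to each page yielded by the paging loop in the answer.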

Big query - Concatenate strings horizontally

google-bigquery,string-concatenation

You need to use CONCAT and the trim functions SELECT CONCAT(rtrim(ltrim(first_name)),' ',rtrim(ltrim(last_name))) AS full_name FROM (SELECT 'Anna' AS first_name, ' Miller ' AS last_name), ...

Inner Joining big tables in Big Query

google-bigquery

Looking at your question, it seems like all you need is to read up a bit on the available documentation. Now, having read Jordan Tigani's book, I can tell you that when you join, the system actually sends the smaller table to every shard that handles your query. Since none of...

BigQuery - same query works when submitted from UI and reports SQL syntax error from batch

batch-processing,google-bigquery

The difference between the succeeding and the failing query appears to be that you are setting flattenResults=false when you run the query in batch. This mode has slightly different behavior with JOINs that can cause subtle issues like this one. From BigQuery docs on JOINs: BigQuery executes multiple JOIN operations...

Bigquery select distinct values

google-bigquery

SELECT cc_info FROM user WHERE date = ? GROUP BY cc_info ...

Google BigQuery asking for JOIN EACH but I'm already using it

google-bigquery

You've got a GROUP BY inside a JOIN EACH. GROUP BY hits limits with cardinality (the number of distinct values) and the final grouping is not parallelizable. This limits BigQuery's ability to do the join. If you change the GROUP BY to GROUP EACH BY, this will most likely work...

BigQuery Bug: SELECT of aliased field fails if scoped aggregation in subquery

google-bigquery

Yes, it is a bug, first query should work.

BigQuery streaming - our data is not showing up anymore

google-bigquery

The issue is under investigation, see: https://code.google.com/p/google-bigquery/issues/detail?id=263 More information in this forum https://groups.google.com/forum/#!forum/bigquery-downtime-notify...

Insert nested data into BigQuery using Golang

go,google-bigquery

Visits should be a slice of bigquery.JsonValue. I am not sure why you used TableDataInsertAllRequestRows; that should be used only once, for the payload descriptor. var visits []bigquery.JsonValue visit := make(map[string]bigquery.JsonValue) visit["duration"] = rand.Intn(100) visits = append(visits, visit) jsonRow["visit"] = visits P.S. Also make sure you have your...

Bigquery union/join error

google-analytics,google-bigquery

The "result is too large" error applies to the final result of the query, which means that the result is too large even after semijoin in WHERE is applied. This should work though if you use "Allow Large Results" setting.

Use the JOIN command with multiple conditions

sql,google-bigquery

SELECT suppliers.supplier_id, suppliers.supplier_name, orders.order_date FROM suppliers INNER JOIN orders ON suppliers.supplier_id = orders.supplier_id WHERE suppliers.order_date=orders.order_date; ...

unexpected behaviour of Google BigQuery WHERE NOT list CONTAINS string

sql,google-bigquery,contains

Here is a sample query: select * from (select 'G01N 33/55' as ipc, 'G01N 34' as not_ipc), (select 'G01N 33/55' as ipc, 'G01N 33' as not_ipc), (select 'G01N 33/55' as ipc, string(null) as not_ipc) where not ipc contains not_ipc or not_ipc is null this returns: +-----+------------+---------+---+ | Row | ipc...

How to flatten with a table wildcard in BigQuery?

google-bigquery

I believe the issue is that FLATTEN does not work on a union of tables, which is what TABLE_QUERY is eventually rewritten to if it evaluates to multiple tables. A workaround is to wrap the TABLE_QUERY in a subselect, making the FLATTEN operate over a single source, the subselect. SELECT blah FROM...

Subscriber names from github

google-bigquery

You can use the Events to get the actor login name of a repo. Activity archives for dates starting 1/1/2015 are recorded from the Events API. SELECT actor.login FROM ( TABLE_DATE_RANGE ( [githubarchive:day.events_] , TIMESTAMP('2015-01-01') , TIMESTAMP('2015-03-01') ) ) where type='WatchEvent' and repo.name = 'ptrofimov/beanstalk_console' For 2014 events SELECT actor...

Bigquery - select timestamp as human readable datetime

google-bigquery

I think I found a working solution from Bigquery reference page. Basically BigQuery stores TIMESTAMP data internally as a UNIX timestamp with microsecond precision. SELECT SEC_TO_TIMESTAMP(date) FROM ... ...

How can I pivot dataset in Google BigQuery?

sql,pivot,google-bigquery

Function NTH is applicable to REPEATED fields, where it chooses the nth repeating element (the error message can be improved). So first step would be to build REPEATED field out of NextStepID, and it can be done with NEST aggregation function. Then you can use NTH as scoped aggregation function:...

Regex QueryString Parsing for a specific in BigQuery

regex,google-app-engine,logging,google-bigquery

Assuming you have just 1 query string per record then you can do this: SELECT REGEXP_EXTRACT(protoPayload.resource, r'device_ID=(.*)$') as device_id FROM mytable The part within the parentheses will be captured and returned in the result. If device_ID isn't guaranteed to be the last parameter in the string, then use something like...
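
The behaviour of the first pattern can be tried out locally. The sketch below uses Python's `re` module as a stand-in for REGEXP_EXTRACT (BigQuery uses a different regex engine, but this simple pattern should behave the same way); `extract_device_id` and the sample resource strings are hypothetical.

```python
import re

# The pattern from the answer: capture everything after "device_ID=" up to the end.
DEVICE_ID_AT_END = re.compile(r"device_ID=(.*)$")


def extract_device_id(resource: str):
    """Return the captured group, or None when the parameter is absent."""
    match = DEVICE_ID_AT_END.search(resource)
    return match.group(1) if match else None


print(extract_device_id("/track?user=7&device_ID=abc123"))  # parameter is last
print(extract_device_id("/track?device_ID=abc123&user=7"))  # parameter is not last
print(extract_device_id("/track?user=7"))                   # parameter missing
```

In the second call the capture also swallows the trailing `&user=7`, which is why the answer cautions about the case where device_ID is not the last parameter.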

bigquery split string to chars

google-bigquery

You can use SPLIT function with empty string as delimiter, i.e. SELECT id, SPLIT(value, '') value FROM Table Please note, that SPLIT returns repeated field, and if you want flat results (wasn't clear from your question), you would use SELECT * FROM FLATTEN((SELECT id, SPLIT(value, '') value FROM Table), value)...

Error “Login Required” when trying to query Google BigQuery with Python

python,google-bigquery

Instead of: service = build('bigquery', 'v2') datasets = service.datasets() response = datasets.list(projectId=PROJECT_ID).execute(http) Try: service = build('bigquery', 'v2', http=http) datasets = service.datasets() response = datasets.list(projectId=PROJECT_ID).execute() (use the authenticated http connection when building the service)...

error when importing gz files into bigquery

google-bigquery

Inspecting the job configuration, you include a non-gzip file as the first uri, ending in .../20150426/_SUCCESS. BigQuery uses the first file to determine whether compression is enabled. Assuming this file is empty, you can remove it from your load requests to fix this. If there is data in this file,...

BigQuery error: Cannot query the cross product of repeated fields

google-analytics,google-bigquery

Depending on what kind of filtering is acceptable to you, you may be able to work around this by switching to OMIT IF from WHERE. It will give different results, but, again, perhaps such different results are acceptable. The following will remove entire hit record if (some) page inside of...

BigQuery which tool is used to produce reports online? [closed]

report,google-bigquery

Well, if your output fits in Excel and you are experienced there, go ahead with that. Other tools are: Tableau, Shufflepoint, QlikView + Demo, Bime Analytics + Demo, Jaspersoft, Metric Insights, R. Today re:dash has support for querying multiple databases, including: Redshift, Google BigQuery, PostgreSQL, MySQL, Graphite and custom scripts....

BigQuery SPLIT() and grouping by result

google-bigquery

My best guess would be that you can get an equivalent result by using a subquery. Something like : SELECT * FROM (Select NTH(2,SPLIT('FIRST-SECOND','-')) as second_part FROM [FOO.bar] limit 10) GROUP BY second_part The system returns Nth in an aggregate internally I guess...

Regexp in BigQuery

regex,google-bigquery

First you need to normalize the text to retain only valid words. The below regular expression is just a simple one, you need to match and extend to your logic. SELECT normalized, count(1) AS c FROM (SELECT label, lower(REGEXP_EXTRACT(label,r'[[:punct:]]?([[:^punct:]]*)')) AS normalized FROM (SELECT string(':Adidas') AS label), (SELECT string('Adidas') AS label),...

Google Bigquery query execution using google cloud dataflow

google-bigquery,google-cloud-dataflow

BigQueryIO currently only supports reading from a Table and not a Query or View (FAQ). One way to work around this is to create a permanent BigQuery table in your main program by issuing a query before you run your Dataflow job. After your job runs, you could delete the...

Google spreadsheet script authorisation to BigQuery

google-apps-script,google-spreadsheet,google-bigquery

Unfortunately you can only allow access to scripts running as 'you' if it is running as a web app. The only way to run it as a webapp is if the doGet()/doPost() function is called by the browser. Running the doGet() as a function runs it as a normal script....

Bigquery AllowLargeResults and setMaxResults

java,google-bigquery

If the query result is "large" (hundreds of megabytes), then you will need to use allowLargeResults regardless of whether you read it later in pages or not. Otherwise the query will fail.

unable to configure apprtc.appspot with own url

python,google-bigquery,webrtc,apprtcdemo

In order to use insertAll to stream data into a table, you must first create the table and give it the schema you will use. You should pre-create the table out of band from your streaming insert process, since the rate limits on these apis differ drastically. For scenarios where...

How to count push events on GitHub using BigQuery?

github,google-bigquery

The field is called repository_pushed_at, and you also probably meant to include it in the SELECT list, i.e. SELECT repository_pushed_at, COUNT(*) FROM [githubarchive:github.timeline] WHERE type = 'PushEvent' AND repository_name = "account/repo" GROUP BY repository_pushed_at ORDER BY repository_pushed_at DESC ...

BigQuery query without join gives join error on “not in” usage

google-bigquery

This is an SQL incompatibility in BigQuery. As a workaround, I think using just WHERE customer_id NOT IN (...) should work.

Select first row from group each by with count using Big Query

sql,google-bigquery

Let me rephrase how I understand the setup: - Devices are installed at fixed locations throughout the buildings - Clients (people) move through the building, and when they pass near a device, this event is recorded - The time when client with client_id passes device with device_id is recorded in...

BigQuery running totals

google-bigquery,window-functions,running-total

First, remove "LIMIT 30" - it will interfere with the OVER() clause. You want a ratio? Try RATIO_TO_REPORT: SELECT word, word_count, RATIO_TO_REPORT(word_count) OVER(ORDER BY word_count DESC) FROM [publicdata:samples.shakespeare] WHERE corpus = 'hamlet' AND word > 'a' You want consecutive rows with equal values to increase anyways? Decide an order for...

Running count of appearance of customer id Bigquery

google-bigquery

Window functions can help you here: Window functions enable calculations on a specific partition, or "window", of a result set. Each window function expects an OVER clause that specifies the partition, in the following syntax: OVER ( [PARTITION BY <expr>] [ORDER BY <expr>] [ROWS <expr> | RANGE <expr>] ) PARTITION...

BigQuery bq command with asterisk (*) doesn't work in Compute Engine

google-bigquery,google-compute-engine,google-cloud-platform

Apparently the bq command which is located at /usr/bin/bq has the following script: #!/bin/sh exec /usr/lib/google-cloud-sdk/bin/bq ${@} which expands the asterisk. As a current workaround I'm calling /usr/lib/google-cloud-sdk/bin/bq directly....

BigQuery - select values with max function

sql,max,google-bigquery

I've never used bigquery before but something like this should work: SELECT movieID, CASE WHEN F_rate >= M_rate THEN F_rate ELSE M_rate END max_rating, CASE WHEN F_rate > M_rate THEN 'Females Rated it Higher' WHEN F_rate < M_rate THEN 'Males Rated it Higher' ELSE 'Rated Equal' END AS who_rated_it_higher, ABS(F_rate...

Substring in Google BigQuery

date,substring,google-bigquery

Is this a string manipulation question? Then the answer would be getting the LEFT() 2 characters of that string. If this is a date manipulation question: SELECT TIMESTAMP('2014-03-03 05:00:00') 2014-03-03 05:00:00 UTC SELECT DATE(TIMESTAMP('2014-03-03 05:00:00')) 2014-03-03 SELECT DAY(TIMESTAMP('2014-03-03 05:00:00')) 3 ...

Using external .csv file in Google BigQuery

google-bigquery

I interpret your question as for every word in Shakespeare's work you want to find out if it exists in your CSV file. Once you uploaded your CSV file to BigQuery, you can use the following syntax in your SQL query: SELECT word FROM publicdata:samples.shakespeare WHERE word IN (SELECT some_word...

Get MAX from row with column name (SQL)

google-bigquery

You can use GREATEST: SELECT userid, CASE WHEN A = GREATEST(A,B,C,D,E) THEN 'A' WHEN B = GREATEST(A,B,C,D,E) THEN 'B' WHEN C = GREATEST(A,B,C,D,E) THEN 'C' WHEN D = GREATEST(A,B,C,D,E) THEN 'D' WHEN E = GREATEST(A,B,C,D,E) THEN 'E' END AS MAX FROM TableName Result: userId MAX 1 A 2 D With...

BigQuery - filtering without losing 'null' values

null,google-bigquery,contains

See this: select * from (select string(NULL) as name,'SFO' as city, 20 as sold), (select 'Nike' as name,'NYC' as city, 15 as sold), where not lower(name) contains 'nike2' or name is null returns +-----+------+------+------+---+ | Row | name | city | sold | | +-----+------+------+------+---+ | 1 | null |...

Google Bigquery says “Response too large to return” with simple select

google-bigquery

I would make a couple of modifications to this query: move the WHERE clause filters closer to the table scan, and use the JOIN EACH construct. SELECT s.flyFrom, s.to, s.typeFlight, r.price, b.price, b.affily FROM [sptest.buy] AS b INNER JOIN EACH (SELECT * FROM [sptest.search_results] WHERE saved_at > DATE('2015-06-23 00:00:00')) AS r ON b.booking_token=r.booking_token INNER JOIN...

Cannot use calculated offset in BigQuery's DATE_ADD function

google-bigquery,tableau,google-cloud-platform

I acknowledge that this is a hole in the functionality of DATE_ADD. It can be fixed, but it will take some time until the fix is rolled into production.

Using more than one field with IN ( ) for a sub-query

google-analytics,google-bigquery

Do a JOIN instead. The equivalent of: SELECT COUNT(*), stn, a.wban, FIRST(name) name, FIRST(country) country FROM [fh-bigquery:weather_gsod.gsod2014] a WHERE stn, wban IN (SELECT usaf, wban FROM [fh-bigquery:weather_gsod.stations] WHERE country='UK') GROUP BY 2, 3 ORDER BY 1 DESC Would be: SELECT COUNT(*), stn, a.wban, FIRST(name) name, FIRST(country) country FROM [fh-bigquery:weather_gsod.gsod2014] a...

The Python script configuration for a Big Query job requires a sourceUri value, but there is no sourceUri

python,python-2.7,google-bigquery

The configuration.query.tableDefinitions parameter should be optional. If you are querying only data stored in BigQuery tables, then you should be able to omit the entire tableDefinitions parameter. The sourceUris parameter should only be required if a tableDefinitions object is present. https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.query.tableDefinitions...

Is it good to call Thread.Sleep during polling Google Big Query results in ASP.NET? Alternatives?

c#,asp.net,asp.net-mvc,async-await,google-bigquery

I wouldn't do this on the server side, as one has to be careful which waiting calls to use to avoid high resource consumption under load. Your users also don't get any feedback from the page. You can improve this situation by displaying a spinning wheel, but it might be...

Extracting data using regexp_extract in Google BigQuery

sql,regex,extract,google-bigquery

It's very simple to do: select regexp_extract(input,r'he=(.{32})'); or as example: select regexp_extract('http://mpp.xyz.com/conv/v=5;m=1;t=16901;ts=20150516234355;he=5e3152eafc50ed0346df7f10095d07c4;catname=Horoscope',r'he=(.{32})') ...
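
The same pattern can be checked outside BigQuery. The following Python sketch applies it to the example URL from the answer, using the `re` module as a stand-in for regexp_extract; the helper name `extract_he` is hypothetical.

```python
import re

URL = ("http://mpp.xyz.com/conv/v=5;m=1;t=16901;ts=20150516234355;"
       "he=5e3152eafc50ed0346df7f10095d07c4;catname=Horoscope")


def extract_he(text: str):
    """Capture the 32 characters that follow 'he=', or return None if there are fewer."""
    match = re.search(r"he=(.{32})", text)
    return match.group(1) if match else None


print(extract_he(URL))
```

Because `.{32}` matches any 32 characters, the pattern relies on the value always being exactly that long; a value shorter than 32 characters at the end of the string would yield no match.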

Is there a way to determine or specify what geo region BigQuery stores data in?

google-bigquery

Note: Everything in this post should be considered a guideline and not a guarantee. When in doubt, refer to the BigQuery terms-of-service, which will spell out in more detail about what is guaranteed with respect to data location. By default, BigQuery stores your data in us-central1 and us-central2. If you...

Google BigQuery - simulate Pandas removeDuplicates() in Google BigQuery SQL

sql,data,pandas,analytics,google-bigquery

You can group by all of your columns that you want to remove duplicates from, and use FIRST() of the others. That is, removeDuplicates([col1, col3]) would translate to SELECT col1, FIRST(col2) as col2, col3 FROM table GROUP EACH BY col1, col3 Note that in BigQuery SQL, if you have more...

Exporting data from BigQuery to GCS - Partial transfer possible?

asynchronous,export,google-bigquery,google-cloud-storage,callblocking

Partial exports are possible if the job fails for some reason mid-way through execution. If the job is in the DONE state, and there are no errors in the job, then all the data has been exported. I recommend waiting a bit before polling for job done -- you can...

'TRIM' or 'PROPER' in BigQuery

google-bigquery,trim

BigQuery does have LTRIM (trims spaces from left) and RTRIM (trims spaces from right) functions. (Strings functions documentation at https://cloud.google.com/bigquery/query-reference#stringfunctions missed them, we will fix this shortly).

How do I cast dd/mm/yyyy string into date in BigQuery?

datetime,casting,google-bigquery,string-to-datetime

You can convert your dd/MM/yyyy strings into BigQuery timestamps using something like the following: SELECT TIMESTAMP(year + '-' + month + '-' + day) as output_timestamp FROM ( SELECT REGEXP_EXTRACT(input_date, '.*/([0-9]{4})$') as year, REGEXP_EXTRACT(input_date, '^([0-9]{2}).*') as day, REGEXP_EXTRACT(input_date, '.*/([0-9]{2})/.*') AS month FROM (SELECT '30/10/2015' as input_date), (SELECT '25/01/2015' as input_date)...
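
To see how the three regular expressions split the input, here is a small Python sketch that applies the same patterns and reassembles the parts in year-month-day order. It only mirrors the string handling; `ddmmyyyy_to_iso` is a hypothetical helper, and the final TIMESTAMP() conversion is left to BigQuery.

```python
import re


def ddmmyyyy_to_iso(input_date: str):
    """Rebuild a dd/MM/yyyy string as yyyy-MM-dd using the patterns from the query."""
    year = re.search(r".*/([0-9]{4})$", input_date)
    day = re.search(r"^([0-9]{2}).*", input_date)
    month = re.search(r".*/([0-9]{2})/.*", input_date)
    if not (year and day and month):
        return None  # the input does not look like dd/MM/yyyy
    return year.group(1) + "-" + month.group(1) + "-" + day.group(1)


for sample in ["30/10/2015", "25/01/2015"]:
    print(sample, "->", ddmmyyyy_to_iso(sample))
```

The patterns check only the shape of the string, so a value such as 99/99/2015 would still be reassembled and would have to be rejected by the timestamp conversion itself.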

Designing an API on top of BigQuery

google-app-engine,bigdata,google-bigquery

Based on my experience analyzing performance of similar projects in BigQuery. If you are concerned with performance only, then you don't have to change anything. BigQuery's optimizer can figure out many things, and if query uses WHERE against only few days - the performance will be good. But from billing...

Can we perform joins on tables in two different projects in BigQuery?

database,join,google-bigquery

Yes, you certainly can. You need to qualify the table name with project name, i.e. projectname:dataset.table Here is an example of my joining one of my tables against table in publicdata project: select sum(a.is_male) from (select is_male, year from [publicdata:samples.natality]) a inner join (select year from [moshap.my_years]) b on a.year...

How do you calculate a boolean aggregate over a column in BigQuery?

sql,aggregate-functions,google-bigquery

Yes, BigQuery has such aggregation functions, it uses SQL Standard names for them: EVERY (will do logical and) SOME (will do logical or) ...
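
For readers more familiar with Python, the logic of these two aggregates corresponds roughly to the built-ins `all` and `any`. The sketch below is only an analogy: the function names are hypothetical and SQL NULL handling is deliberately ignored.

```python
def every(values) -> bool:
    """Logical AND over a column of booleans (analogous to EVERY)."""
    return all(values)


def some(values) -> bool:
    """Logical OR over a column of booleans (analogous to SOME)."""
    return any(values)


column = [True, True, False]  # hypothetical boolean column
print(every(column), some(column))
```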

Value cannot be null. Parameter name: baseUri

c#,google-api,google-bigquery,google-api-dotnet-client,service-accounts

I managed to find the root cause and a solution for my problem. Actual error: the table schema was mismatched. Reason: in the BigQuery table, the field mode is REQUIRED. I was getting the table schema from the response of the query "Select * From Table-name Limit 1" because I was not able to build a new schema....

Truncate a table in GBQ

google-bigquery

BigQuery doesn't support TRUNCATE as part of a query string. The only DDL/DML verb that BQ supports is SELECT. One option is to run a job with WRITE_TRUNCATE write disposition (link is for the query job parameter, but it's supported on all job types with a destination table). This will...

Hard limit on number of tables in a BQ project

google-bigquery

There are projects with that number of distinct tables today. There is not currently a hard cap on the number of distinct tables. Some related considerations that come to mind when you're contemplating representations that use that many tables: A query (including referenced views) can currently only reference 1000 tables....

Row larger than the maximum allowed size

google-bigquery

Some of the answers here gave me an idea, so I went and tried it. It appears as if for some strange reason BQ didn't like line endings so I wrote a quick script to rewrite the original input file to use line endings. Automagically the import worked! This...

Hits per day in Google Big Query

sql,google-bigquery

You can only query data that exists in your tables; the query cannot guess which dates are missing from your table. You need to handle this problem either in your programming language, or you could join with a numbers table and generate the dates on the fly. If you know the...

Google BigQuery asking Gmail Confirmation, Best way to handle in Production Environment

google-bigquery,dev-to-production

I'm not an expert in C#, but I can see that you are using "GoogleWebAuthorizationBroker" - that's indeed a helper class that will open a web page to get the end-user authentication and authorization. Use instead an OAuth2ServiceAccount for machine2machine auth (https://developers.google.com/identity/protocols/OAuth2ServiceAccount). For sample C# code, see "where can i...

BigQuery: Using threshold with COUNT DISTINCT in WINDOW function returns error

google-bigquery,window-functions

COUNT(DISTINCT) is documented as approximation when used as aggregation function, but when it is used as analytic function - it is actually the exact implementation, so you don't need extra parameter - you will get the exact result without it.

BigQuery export with TIMESTAMP of derived tables broken?

google-bigquery

Thanks to Mosha for pointing it out in the issues list: Export to GCS in CSV format renders datetime field as empty. The issue has been resolved in the meantime....

When does a cached Big Query job expire?

google-bigquery

Directly from the docs (if you Google "App Engine BigQuery Caching") : Results are cached for approximately 24 hours and cache lifetimes are extended when a query returns a cached result. So basically, as long as you get your cached result once every 24 hours, cache should stay indefinitely....

combining two multiple bigquery SELECT FROM statements

sql,google-bigquery

You can count distinct users like this: SELECT EXACT_COUNT_DISTINCT(userId) as buyers FROM (FLATTEN([table1], user_attribute)) WHERE event_value > 0 AND event_parameters.Name = "SKU" One way to join them is to add a static scalar value and use that for join: SELECT buyers/total FROM ( SELECT EXACT_COUNT_DISTINCT(userId) AS buyers, 1 AS scalar,...

Error when I try to create different BigQuery tables at the same pipeline execution

google-bigquery,google-cloud-dataflow

Thanks for reporting this. The cause was a bug in BigQueryIO that caused the second table to occasionally not be created. This bug has now been fixed in github with this commit. The fix will be pushed to maven later this month. Sorry for the trouble!

'Allow Large Results' option in Browser Tool is not honored

google-bigquery

This doesn't work: SELECT * FROM [wikipedia_benchmark.Wiki10M] "Response too large to return." This works: SELECT * FROM [wikipedia_benchmark.Wiki10M] [x] Allow Large Results This doesn't work: SELECT * FROM [wikipedia_benchmark.Wiki10M] ORDER BY title [x] Allow Large Results "Response too large to return." The problem is that you can not use 'ORDER...

Big Query Table Last Modified Timestamp does not correspond to time of last table insertion

google-bigquery

Short answer: You are indeed looking at the correct metadata. Long answer: The last modification time includes the time of some internal compaction of data, unrelated to data change. Executing a query against your table with a decorator ending at either 1431223783125 or 1431216576000 produces the same results, just like...

Beginner GA export bigquery questions

google-analytics,google-bigquery

COUNT(DISTINCT field[, N]) is a statistical approximation. For counts less than N, it is exact. To get an exact count for large values, use count(*) on a group each by, however this may be a much slower query. See COUNT documentation at https://cloud.google.com/bigquery/query-reference....

How to get distinct values on GROUP_CONCAT using Google Big Query

distinct,google-bigquery,group-concat

Here is a solution which uses the UNIQUE scoped aggregation function to remove duplicates. Note that in order to use it, we first need to build a REPEATED field using NEST aggregation: SELECT GROUP_CONCAT(UNIQUE(ids)) WITHIN RECORD, GROUP_CONCAT(UNIQUE(products)) WITHIN RECORD FROM ( SELECT category, NEST(id) as ids, NEST(product) as products FROM (SELECT "a" as...

How to Pivot in Google BigQuery

python,pandas,google-bigquery

This is a way to do: select shipmentID, sum(IF (category='shoes', quantity, 0)) AS shoes, sum(IF (category='hats', quantity, 0)) AS hats, sum(IF (category='shirts', quantity, 0)) AS shirts, sum(IF (category='toys', quantity, 0)) AS toys, sum(IF (category='books', quantity, 0)) AS books, from (select 1 as shipmentID, 'shoes' as category, 5 as quantity), (select...
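
The conditional-sum pattern in this query can be illustrated with a small Python sketch: each row adds its quantity to the column matching its category and contributes zero elsewhere. The function name `pivot` and all sample rows other than the first are hypothetical.

```python
from collections import defaultdict

CATEGORIES = ["shoes", "hats", "shirts", "toys", "books"]


def pivot(rows):
    """Group (shipment_id, category, quantity) rows into one dict of totals per shipment."""
    totals = defaultdict(lambda: dict.fromkeys(CATEGORIES, 0))
    for shipment_id, category, quantity in rows:
        shipment = totals[shipment_id]  # creates an all-zero row on first sight
        if category in shipment:
            # Same effect as SUM(IF(category = '<name>', quantity, 0)).
            shipment[category] += quantity
    return dict(totals)


rows = [(1, "shoes", 5), (1, "hats", 2), (2, "shoes", 1), (1, "shoes", 3)]
for shipment_id, columns in pivot(rows).items():
    print(shipment_id, columns)
```

As in the SQL version, the set of output columns has to be known up front, which is the main limitation of this style of pivot.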

Error with BQ command line tool: Cannot start a job without a project id

google-bigquery

You should configure the gcloud command-line tool first: gcloud config set project 'yourProjectId' This project is billed for querying non-public data. Then you can run your query: bq query 'select count(*) from publicdata:samples.shakespeare' ...

BigQuery - Check if table already exists

google-api,export,google-bigquery,google-cloud-storage

Here is a python snippet that will tell whether a table exists: def doesTableExist(project_id, dataset_id, table_id): bq.tables().delete( projectId=project_id, datasetId=dataset_id, tableId=table_id).execute() return False Alternately, if you'd prefer not deleting the table in the process, you could try: def doesTableExist(project_id, dataset_id, table_id): try: bq.tables().get( projectId=project_id, datasetId=dataset_id, tableId=table_id).execute() return True except HttpError, err...

Append a column and its data to a BigQuery table

google-bigquery

BigQuery is append-only, so you cannot update existing rows. For new inserts you can populate the new column you added. In case you want to update the previous data, you need to recreate the table as a new one, then you will be able to add on insert time...

Job configuration for Synchronous query in Google Big Query

python,google-app-engine,google-bigquery

Normally, queries have a maximum response size. If you plan to run a query that might return larger results, you can set allowLargeResults to true in your job configuration. Jobs are objects that manage asynchronous tasks. So this is not possible in synchronous mode. Queries that return large results will...

BigQuery SPLIT() ignores empty values

google-bigquery

This is By Design behavior, and it is not specific to SPLIT function, but to REPEATED fields in general. BigQuery REPEATED fields cannot store NULLs (same behavior as in protocol buffers), therefore nothing that SPLIT does can make NULLs appear inside REPEATED fields.

BigQuery completed job returns 404 on getting query results (immediately after)

google-bigquery

There was a brief period yesterday (June 3) where a small percentage of requests to BigQuery were rejected with a 404 response. It should have cleared up by about 8pm Pacific Time. This was due to a problem with a configuration change that was caught before it rolled out widely,...

Getting A Better Understanding Of Streaming Inserts With BigQuery

google-bigquery,google-cloud-platform

My interpretation of the rules, I have to confirm with the team: If your rows are less than 1KB each, this would bring the price from $0.01 per 100,000 rows to $0.01 per 200,000 rows - an effective 50% reduction of previous pricing. If your rows are exactly 2KB each,...