Split transformation steps based on Parameter in Pentaho Data Integration

pentaho,etl,data-integration

Use "Filter Rows" step in PDI. Check the image below: Give the parameter value as "1" or "0" in the filter rows section. Also properly assign the path for both true and false conditions. For more you can check this link. Hope it helps :)...

Jitterbit 4 vs Jitterbit 5+ vs Talend vs Other ETL tools

etl,talend,jitterbit

I'm going to start this by saying I really don't have any knowledge about Jitterbit at all, so I have no real comparison. The other thing to add is that some of the things you want are available in the enterprise licences for Talend but not in the free Talend Open...

How do I get CAS to update a small subset of record properties during a partial update?

etl,endeca

I've confirmed that there's not really an explicit mechanism to do this, so I invented my own. To summarize how it works: I customized the PartialUpdate beanshell script so that, right after the last mile crawl runs, it invokes a custom-component I created called DGIDXTransformer (i.e. it extends CustomComponent). This...

How does ETL (database to database) fit into SOA?

database,architecture,soa,etl,decoupling

All of these answers are good and helpful. As I now understand it, SOA is not about implementing applications, but about Architecture (the "A"), mainly Enterprise Architecture. An enterprise's main management method is the delegation of responsibility for Services (the "S"). So if there are two different business functions in the enterprise structure with two...

Reducing data with DataStage

etl,datastage

Datastage does not do percentage-wise reductions. What you can do is use a Transformer stage or a Filter stage to filter out the data from the source based on certain conditions. But like I said, the conditions have to be very specific (for example - select only those records...

How to scale concurrent ETL tasks up to an arbitrary number in SSIS?

concurrency,ssis,scale,etl

This is controlled by the package property MaxConcurrentExecutables. The default is -1, which means the number of machine cores plus 2, and usually works well. You can also affect this by setting EngineThreads on each Data Flow Task. Here's a good summary: http://blogs.msdn.com/b/sqlperf/archive/2007/05/11/implement-parallel-execution-in-ssis.aspx...

Converting strings to dates server side in MongoDB

mongodb,etl

As suggested in the comments, if the balancer is turned off it is safe to run this kind of query directly on the shards. This is also discussed in this email thread. In my case, the query went from taking 2 days to 2 hours.

Can I prompt user for Field Mappings in SSIS package?

sql-server,ssis,etl

SSIS packages are generally headless because they typically will run as a scheduled job somewhere on a database server. That said, there are definitely ways to do this. One option that I have used is SQL Management Objects (SMO) to connect to the SQL Server Agent where the job is...

I need a SQL query or SSIS transform that checks a date

sql,ssis,etl

Before your data flow task, use an Execute SQL Task and run a query similar to this to see if your destination table is empty (initial load):

    IF (SELECT count(1) FROM [Table].[Destination]) > 0
        SELECT 0 AS TableEmpty
    ELSE
        SELECT 1 AS TableEmpty

Store the result of the query...

Import custom text format without separators [closed]

c#,sql-server,regex,ssis,etl

Instead of a DataTable I'm now writing directly to SQL Server. You need to enter a connection string and the name of the SQL table in the insert SQL. If you are really adding that many lines, I would consider using SQLCMD.EXE, which comes with SQL Server. It accepts any...

Talend Job not letting me map tMap to another tMap Component

etl,talend

To avoid looped component connections in a job, you can use tHashOutput and tHashInput to store the data of a flow and read it later. By default those components are hidden; you can enable them from the menu "file" -> "edit project settings" -> "designer" -> "palette setting"...

Add Edge Property through OrientDB ETL

etl,orient-db

In the edge transformer, use edgeFields to bind properties to edges. Example:

    "edge": {
      "class": "HAS_PET",
      "joinFieldName": "PETID",
      "lookup": "PET.ID",
      "direction": "out",
      "edgeFields": { "Jumps": "${input.Jumps}" },
      "unresolvedLinkAction": "NOTHING"
    }

Remember to remove "Jumps" from the vertex, after the edge transformer, with:

    "field": { "fieldName": "Jumps", "operation": "remove" },

...

Talend, combine tWaitForFile and tFileList

etl,talend

tWaitForFile is used to start file processing when files are present: the job should run continuously but do nothing until tWaitForFile detects a file in the directory. It's simply a file-based trigger, an alternative for those who can't buy the Enterprise edition.

Fact table organization

etl,dimensional-modeling,star-schema

The 1st structure seems to be more natural and common to me. However, the 2nd one is more flexible, because it supports adding new KPIs without changing the structure of the fact table. If different ways of accessing data actually require different structures, there is nothing wrong about having two...
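
As a rough illustration of that trade-off (the question's exact tables aren't shown in this excerpt, so these names are hypothetical): the 1st structure keeps one column per KPI, while the 2nd stores one row per KPI.

    -- Structure 1 (illustrative): one column per KPI; adding a KPI means altering the table
    CREATE TABLE FACT_SALES (
        DATE_ID   INT,
        STORE_ID  INT,
        REVENUE   DECIMAL(18,2),
        UNITS     INT
    );

    -- Structure 2 (illustrative): one row per KPI; new KPIs are just new rows
    CREATE TABLE FACT_KPI (
        DATE_ID   INT,
        STORE_ID  INT,
        KPI_ID    INT,          -- references a KPI dimension
        KPI_VALUE DECIMAL(18,2)
    );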

Job doesn't run properly when using several tBufferOutput/tBufferInput couples in it in Talend

java,buffer,etl,jobs,talend

If you aren't passing the data to a parent job and are instead keeping the data inside the same job (just multiple subjobs inside) you'd probably be better off with the tHash components. This allows you to cache some data (either in memory or temporarily to disk) and then retrieve...

OrientDB ETL edge lookup from query - how to access $input?

graph,etl,orient-db

It seems you're inserting a property "id", but that name is reserved in the Blueprints standard. You can rename it (with a "field" transformer) or set this in the Orient Loader: standardElementConstraints: false. Then I created the file /temp/datasets/charles.json with this content: [ { name: "Joe", id: 1, friends: [2,4,5], enemies: [6] }, { name: "Suzie",...

Datastage - run user defined sql query file using odbc connector

sql-server,etl,datastage

The ODBC connector does have the option of reading the SELECT statement from a file. What version of Datastage are you using? Another workaround (which I consider not the best of practices) is to put the query in a parameter set and then call this parameter set in the...

“m_db unload” command not returning any value in AbInitio

parameters,etl,graphical-programming,ab-initio

I have found a solution for this problem. I was making two mistakes: first, I was not committing the data after insertion in SQL Server; second, I was using parameter interpretation as "substitution", but I have now changed it to "shell". Any other suggestions are most welcome. :)...

Counting how many scripts in SSIS have been completed

scripting,ssis,etl,data-warehouse

Currently there is no functionality provided by SSIS to track the progress of package execution. It seems you need to write your own custom utility/application to implement this, or use a third-party one. There are a few ways to do it - using a switch called /Reporting (or /Rep) of DTEXEC at...

How to write a UDF in Tajo [closed]

etl,apache-tajo

I'm a Tajo PMC member. Thank you for your interest in Tajo. Currently, Tajo does not support Python UDFs yet, but Tajo was designed to allow multiple UDF implementations for a single function signature, so a Python UDF feature could be added to Tajo easily. For now, you should use Java-based...

What should the converted data type of the corresponding column within the Data Converter SSIS Data Flow Component be?

sql-server-2008,ms-access,ssis,etl

@vikramsinh-shinde Thanks for the webpage links. The following implementation seemed to ensure that the SSIS package ran properly; however, I hope it's the proper way to handle conversion in SSIS packages:
- varchar(5) column values in Microsoft SQL Server 2008
- extracted as string [DT_STR] that has a length of 5...

Comparing filenames in PDI

pentaho,etl,kettle,data-integration,pdi

There are various ways to achieve that result. Here's one: the "Get filenames" step lists all files within a specific folder that match a given pattern. As ${KeyDate} is already defined as a parameter, the pattern could be ${KeyDate}[^]_[0-9].csv (you can use a simpler regex, but this one will match only filenames...

Outputting a single Excel file with multiple worksheets

excel,etl,talend

You'll need to output your data into two separate tFileOutputExcel components with the second one set to append the data to the file as a different sheet. A quick example has some name and age data held against a unique id that needs to be split into two separate sheets...

DWH and ETL explained

etl,dimensional-modeling

This answer by no means should be treated as a complete definition of a data warehouse. It's only my attempt to explain the term in layman's terms. Transactional (operational, OLTP) and analytical (data warehouses) systems can both use the same RDBMS as the back-end and they may contain exactly the...

org.apache.axis2.AxisFault: Transport error: 411 Error: Length Required (CloudConnect Salesforce sample ETL SOQL validation error)

object,salesforce,etl,soql,gooddata

This is a known issue (bug) in CloudConnect. We are working hard to fix it.

Talend Normalize Flat File into Relational Database Tables

etl,talend,database-normalization

I would approach this iteratively, normalising part of the table with each step. You should be able to normalise the person data away from the address, state and zip data in one step and then normalise the state away from the address and zip data and then finally normalise the...

SSIS package data flow tasks report

reporting-services,ssis,etl

For your first question, we use user variables in SSIS to log the number of rows processed by each step along with the package name and execution id. You can then run reports on the history table, and if any of the executions have a large variance in the...
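
A minimal sketch of that logging approach (all table and column names here are illustrative, not from the original answer):

    -- History table populated from SSIS user variables after each step
    CREATE TABLE ETL_ROW_COUNT_LOG (
        EXECUTION_ID   UNIQUEIDENTIFIER,
        PACKAGE_NAME   NVARCHAR(260),
        STEP_NAME      NVARCHAR(260),
        ROWS_PROCESSED INT,
        LOGGED_AT      DATETIME DEFAULT GETDATE()
    );

    -- Spot executions whose row counts diverge from the step's historical average
    SELECT EXECUTION_ID, PACKAGE_NAME, STEP_NAME, ROWS_PROCESSED,
           AVG(ROWS_PROCESSED) OVER (PARTITION BY PACKAGE_NAME, STEP_NAME) AS AVG_ROWS
    FROM ETL_ROW_COUNT_LOG;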

staging database and etl processes

ssis,etl,data-warehouse

We use method #1. We log the execution start of the SSIS package that refreshes the staging database to a table, and log the completion or error to a separate column in the same row. Our ETL processes check the most recent row in that table to determine whether the...
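
A hedged sketch of such a status table and the check (names are illustrative, not from the original answer):

    -- One row per staging refresh; completion or error is recorded in the same row
    CREATE TABLE STAGING_REFRESH_LOG (
        REFRESH_ID   INT IDENTITY PRIMARY KEY,
        STARTED_AT   DATETIME NOT NULL,
        COMPLETED_AT DATETIME NULL,       -- NULL while still running
        ERROR_TEXT   NVARCHAR(4000) NULL  -- NULL on success
    );

    -- Downstream ETL inspects the most recent row before proceeding
    SELECT TOP 1 REFRESH_ID, COMPLETED_AT, ERROR_TEXT
    FROM STAGING_REFRESH_LOG
    ORDER BY STARTED_AT DESC;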

Using blank-line delimited records and colon-separated fields in awk

awk,etl

Method 1: We use split to split each field into two parts: the key and the value. From these, we create associative array a:

    $ awk -F'\n' -v RS= '{for (i=1;i<=NF;i++) {split($i,arr,/: /); a[arr[1]]=arr[2];} if (a["Age"]+0>40) print a["Name"];}' file
    Smith, John
    Mills, Pat

Method 2: Here, we split fields at...

Pentaho to convert tree structure data

pentaho,etl,kettle

Your CSV file contains a graph or tree definition. The output format is rich (node_id needs to be generated, parent_id needs to be resolved, level needs to be set). There are a few issues you will face when processing this kind of CSV file in Pentaho Data Integration: Data loading & processing:...

Creating a denormalized table from a normalized key-value table using 100s of joins

sql,sql-server,join,etl,olap

Think about doing this using conditional aggregation:

    SELECT s.StudentId, d.Day,
           max(case when sfv.FieldId = 1 then sfv.Value end) as NumberOfClasses,
           max(case when sfv.FieldId = 2 then sfv.Value end) as MinutesLateToSchool,
           ...
    INTO StudentDays
    FROM Students s CROSS JOIN
         Days d LEFT OUTER JOIN
         StudentFieldValues sfv ON s.StudentId = sfv.StudentId AND...

Please provide me with a mapping and proper transformation in detail

mapping,etl,informatica,informatica-powercenter

In the Expression transformation, you can do the following:

    ID (I)    - ID
    Name (I)  - Name
    v_EXP (V) - v_EXP||'('||ID||'/'||Name||')'
    o_EXP (O) - v_EXP

Then link this Expression transformation to an Aggregator transformation, which will assign '(1/Andrew)(2/john)(3/Robert)' to o_EXP. Then push it through an Expression transformation again and do the following: o_EXP...

Table loading in SIMPLE recovery model still writes to log

sql-server,sql-server-2012,etl

You can't. In simple recovery mode you can still do BEGIN TRAN then COMMIT/ROLLBACK, and more significantly each statement is transactional, so everything has to be written to the log. The thing about simple recovery mode is that the log space is re-used as soon as the transaction (or statement)...
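
A related aside (not part of the original answer): while you can't switch logging off, bulk loads in SIMPLE recovery can be minimally logged when the preconditions are met, e.g. a TABLOCK hint on a heap target. A sketch with hypothetical table names:

    -- Minimally logged insert into a heap (SQL Server 2008+, SIMPLE recovery)
    INSERT INTO dbo.TargetTable WITH (TABLOCK)
    SELECT col1, col2
    FROM dbo.StagingSource;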

Spoon inserting into postgres yields “duplicate key value violates unique constraint”

postgresql,etl,kettle

The SERIAL type just means that id_facultad is pointing to a sequence which starts at 1. Every time a value is retrieved from that sequence, the next value is incremented. If you insert your own values for id_facultad, then it becomes your responsibility to update the sequence to a next...
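
For example, a sketch using standard PostgreSQL functions (assuming the table is named facultad):

    -- Move the sequence past the highest id that was inserted manually
    SELECT setval(
        pg_get_serial_sequence('facultad', 'id_facultad'),
        (SELECT COALESCE(MAX(id_facultad), 1) FROM facultad)
    );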

Best architecture to convert JSON to SQL?

sql,json,etl

Simple strategy: Parse out of the JSON the fields that are fixed and that you know about. Put these in SQL tables. Fields that you don't recognize, leave them as JSON. If the database supports a JSON type, put it there. Otherwise store it in a big string field. Don't...
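
A hedged sketch of that table shape (names are illustrative; the JSON column type depends on your database, e.g. JSONB in PostgreSQL, a big string column elsewhere):

    CREATE TABLE events (
        id          BIGSERIAL PRIMARY KEY,
        event_type  TEXT,       -- fixed field you know about, parsed out of the JSON
        occurred_at TIMESTAMP,  -- fixed field you know about, parsed out of the JSON
        extra       JSONB       -- fields you don't recognize stay as JSON
    );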

How ETL changes Table Structure

database,etl,data-warehouse

Hope this is what you want: a dimension with geography details.

    DIM_GEOGRAPHY { PK, CITY_ID, CITY_NAME, DISTRICT_ID, DISTRICT_NAME, REGION_ID, REGION_NAME }

    FACT_TABLE { PRIMARY_KEY, CITY_ID, COST }

You can also query the same structure like this:

    SELECT DIM.DISTRICT_NAME AS 'District_Name', SUM(F.COST) AS 'Total_Cost'
    FROM FACT F
    INNER JOIN DIM_GEOGRAPHY DIM...

Solution for SSIS mapping?

ssis,logic,etl,solution

If you do have each no-match lookup output hooked to the error table, then you would end up with multiple rows. Instead you can go back to setting the looked-up value to 0 for the lookups, and before your final insert have a Conditional Split to check if...

Can I set a Parameter based on the output of a Stored Procedure in Informatica PowerCenter?

parameters,etl,informatica,informatica-powercenter

Informatica uses two kinds of objects:
- Parameters: these cannot be modified.
- Variables: these can be modified during the execution of a mapping using the SETVARIABLE() function.
You can define a variable, run the stored procedure somewhere in the mapping, connect the output of the Stored Procedure to an Expression Transformation and add...

Transforming relational databases to graph databases

postgresql,graph,neo4j,etl

Some modeling advice: a well-normalized relational model which has not yet been denormalized for performance reasons can be translated into an equivalent graph model. Graph model shapes are mostly driven by use-cases, so there will be opportunity for optimization and model evolution afterwards. A good, normalized Entity-Relationship diagram often already...

How to return non-matched rows in Pentaho Data Integration (Kettle)?

java,pentaho,lookup,etl,kettle

There is indeed a step that does this, but it doesn't do it alone. It's the Merge rows (diff) step, and it has some requirements. In your case, A is the "compare" table and B is the "reference" table. First of all, both inputs (rows from A and B in your...

SQL table up to date from IBM COGNOS TM1 cube

sql,sql-server,ibm,etl,cognos-tm1

You can export the data directly into the SQL Server table using the ODBCOutput function. Frankly I don't recommend it; it leaves you at the mercy of the connection and the workload that may be happening on the SQL Server side at the time. Exporting to a file and reimporting...

Unable to complete Oracle example for ODI Flat File to Flat File Export

database,oracle,etl,data-integration,oracle-data-integrator

You probably missed a part of the second step of that section. Click on the Overview tab and choose "In-Memory Engine: SUNOPSIS_MEMORY_ENGINE" as your staging area. Then go back on the flow tab and you should see three separated groups instead of one. Click on the Datastore of the group...

Building high volume batch data processing tool in Java

java,jdbc,etl

Based on your usage scenario I would recommend Spring Batch. It is very easy to learn and implement. At a high level it contains the following three important components. ItemReader: this component is used to read batch data from the source. There are ready-to-use implementations like JdbcCursorItemReader, HibernateCursorItemReader, etc. Item...

Kettle hangs on post data using web service step

web-services,etl,kettle

I have found the solution to this problem. The Web service step was becoming a bottleneck and therefore required some tweaks to the transformation. I followed the solution found at the following link: http://type-exit.org/adventures-with-open-source-bi/2010/06/parallel-processing-in-pentaho-kettle-jobs/ and now the job works. All you need to do is set the number of...

How can we count number of rows in Talend jobs

if-statement,etl,talend

You'll want a "Run if" connection between two components somewhere (they both have to be subjob-startable - they should have a green square background when you drop them onto the canvas) and to use the NB_LINE variable from the previous subjob's component with something like this...

How to do an aggregation transformation in a Kiba ETL script (kiba gem)?

ruby,etl,kiba-etl

Kiba author here! You can achieve that in many different ways, depending mainly on the data size and your actual needs. Here are a couple of possibilities. Aggregating using a variable in your Kiba script:

    require 'awesome_print'

    transform do |r|
      r[:amount] = BigDecimal.new(r[:amount])
      r
    end

    total_amounts = Hash.new(0)

    transform do...

Ab initio component to stop the graph if duplicate rows/records found

etl,talend,data-manipulation,ab-initio

Pass the flow to the Dedup component. In the Dedup component, select the "unique" property for the output. This will give you all the unique records. In case you have duplicate records, they will go through the dup port. You can collect those records in an intermediate file (for auditing purposes) and the...

Pentaho Dimension lookup/update

csv,pentaho,etl,kettle

The key points of the Dimension Lookup/Update step (used in Update mode) when building an SCD Type II table are: Keys - Key fields: here you define an id column from your source data (I guess it's Key from your CSV file). It is used to look up previously...

Conditional Mapping in Talend

etl,talend

Before joining your input flows, you have to reject rows with null values. I have created a mapping based on the given sample data. ...

Pull Text file to SQL server 2008 table

sql-server,sql-server-2008,etl

SQL Server 2000's DTS was replaced in later versions of SQL Server by SSIS. From the first link (on DTS in SQL Server 2008): Data Transformation Services (DTS) has been replaced by SQL Server Integration Services. You can still install DTS support on SQL Server 2008, but I would recommend...
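
If a full SSIS package is more than you need, a plain T-SQL bulk load is another route (a sketch, not from the original answer; path, table, and format are hypothetical):

    -- Load a delimited text file straight into an existing table
    BULK INSERT dbo.TargetTable
    FROM 'C:\data\input.txt'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);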

concurrent statistics gathering on Oracle 11g partitioned table

oracle,oracle11g,etl,data-warehouse,table-statistics

I managed to reach a decent compromise with this procedure:

    PROCEDURE gather_tb_partiz(
        p_tblname  IN VARCHAR2,
        p_partname IN VARCHAR2)
    IS
        v_stale all_tab_statistics.stale_stats%TYPE;
    BEGIN
        BEGIN
            SELECT stale_stats INTO v_stale
            FROM user_tab_statistics
            WHERE table_name = p_tblname
              AND object_type = 'TABLE';
        EXCEPTION
            WHEN NO_DATA_FOUND THEN
                v_stale := 'YES';
        END;
        IF v_stale = 'YES' THEN...

tELTPostgresql* usage issue

postgresql,etl,talend

This is not how to use the ELT components. They should be used to do in-database transformations on the server, such as creating a star schema table from multiple tables in the same database. This allows you to use the database to do the transformation and avoid reading the data into...

OCDM combined with ODI

oracle,etl,data-warehouse,oracle-data-integrator

It is indeed possible. OCDM is a solution using an Oracle 11g database to store the data, so ODI can definitely load it. Actually OCDM comes out-of-the-box with adapters to load the data from NCC (Oracle Communications Network Charging and Control) and BRM (Oracle Communications Billing and Revenue Management), and...

SSIS-Calling java service using Script Task

sql-server,web-services,ssis,etl,script-task

I experienced this while using SQL Server 2008 R2. I have now tried the same solution in SQL Server 2012 and it works fine without any changes.

Extracting multiple choice answers stored in single database field as integer

sql,sql-server,database,ssis,etl

Here is an example of using bitwise operators to extract what you want; then it's a matter of pivoting and processing that into what you require.

    SELECT Customer, answer,
           answer & POWER(2,0) pos1,
           answer & POWER(2,1) pos2,
           answer & POWER(2,2) pos3,
           answer & POWER(2,3) pos4,
           answer & POWER(2,4) pos5,
           answer...