DEA-C01 Sample Practice Exam Questions 2024 Updated Verified [Q63-Q82]

NEW QUESTION # 63
A CSV file around 1 TB in size is generated daily on an on-premises server. A corresponding table, internal stage, and file format have already been created in Snowflake to facilitate the data loading process. How can the process of bringing the CSV file into Snowflake be automated using the LEAST amount of operational overhead?

A. On the on-premises server, schedule a Python file that uses the Snowpark Python library. The Python script will read the CSV data into a DataFrame and generate an INSERT INTO statement that will directly load into the table. The script will bypass the need to move a file into an internal stage.
B. Create a task in Snowflake that executes once a day and runs a COPY INTO statement that references the internal stage. The internal stage will read the files directly from the on-premises server and copy the newest file into the Snowflake table.
C. On the on-premises server, schedule a SQL file to run using SnowSQL that executes a PUT to push a specific file to the internal stage. Create a task that executes once a day in Snowflake and runs a COPY INTO statement that references the internal stage. Schedule the task to start after the file lands in the internal stage.
D. On the on-premises server, schedule a SQL file to run using SnowSQL that executes a PUT to push a specific file to the internal stage. Create a pipe that runs a COPY INTO statement that references the internal stage. Snowpipe auto-ingest will automatically load the file from the internal stage when the new file lands in the internal stage.

Answer: D

Explanation:
This option automates the process of bringing the CSV file into Snowflake with the least amount of operational overhead. SnowSQL is a command-line tool that can be used to execute SQL statements and scripts on Snowflake. By scheduling a SQL file that executes a PUT command, the CSV file can be pushed from the on-premises server to the internal stage in Snowflake. Then, by creating a pipe that runs a COPY INTO statement that references the internal stage, Snowpipe can load the file from the internal stage into the table when it detects a new file in the stage. This way, there is no need to manually start or monitor a virtual warehouse or task.

NEW QUESTION # 64
A Data Engineer wants to create a new development database (DEV) as a clone of the permanent production database (PROD). There is a requirement to disable Fail-safe for all tables.
Which command will meet these requirements?

A. CREATE TRANSIENT DATABASE DEV
CLONE PROD;
B. CREATE DATABASE DEV
CLONE PROD
FAIL_SAFE = FALSE;
C. CREATE DATABASE DEV
CLONE PROD
DATA_RETENTION_TIME_IN_DAYS = 0;
D. CREATE DATABASE DEV
CLONE PROD;

Answer: A

Explanation:
This option creates a new development database (DEV) as a clone of the permanent production database (PROD) and disables Fail-safe for all tables. The CREATE TRANSIENT DATABASE command creates a transient database, which has no Fail-safe period. Fail-safe is a Snowflake feature that provides additional protection against data loss by retaining historical data for seven days beyond the Time Travel retention period. Because transient databases have no Fail-safe period, they do not incur additional storage costs for historical data beyond their Time Travel retention period. The CLONE option creates a copy of the PROD database, including its schemas, tables, views, and other objects.

NEW QUESTION # 65
A Data Engineer needs to ingest invoice data in PDF format into Snowflake so that the data can be queried and used in a forecasting solution.
..... recommended way to ingest this data?

A. Create a Java User-Defined Function (UDF) that leverages Java-based PDF parser libraries to parse PDF data into structured data
B. Use Snowpipe to ingest the files that land in an external stage into a Snowflake table
C. Create an external table on the PDF files that are stored in a stage and parse the data into structured data
D. Use a COPY INTO command to ingest the PDF files in an external stage into a Snowflake table with a VARIANT column.

Answer: A

Explanation:
The recommended way to ingest invoice data in PDF format into Snowflake is to create a Java User-Defined Function (UDF) that leverages Java-based PDF parser libraries to parse PDF data into structured data. This option allows for more flexibility and control over how the PDF data is extracted and transformed. The other options are not suitable for ingesting PDF data into Snowflake. Options B and D are incorrect because Snowpipe and the COPY INTO command can only ingest files that are in supported file formats, such as CSV, JSON, and XML. PDF is not one of these formats, so loading PDF files this way will cause errors or unexpected results.
Option C is incorrect because external tables can likewise only query files that are in supported file formats, so PDF files cannot be parsed by external tables.

NEW QUESTION # 66
A company receives a daily file that contains customer data in .xls format. The company stores the file in Amazon S3. The daily file is approximately 2 GB in size.
A data engineer concatenates the column in the file that contains customer first names and the column that contains customer last names. The data engineer needs to determine the number of distinct customers in the file.
Which solution will meet this requirement with the LEAST operational effort?

A. Create and run an Apache Spark job in Amazon EMR Serverless to calculate the number of distinct customers.
B. Create and run an Apache Spark job in an AWS Glue notebook. Configure the job to read the S3 file and calculate the number of distinct customers.
C. Use AWS Glue DataBrew to create a recipe that uses the COUNT_DISTINCT aggregate function to calculate the number of distinct customers.
D. Create an AWS Glue crawler to create an AWS Glue Data Catalog of the S3 file. Run SQL queries from Amazon Athena to calculate the number of distinct customers.

Answer: C

Explanation:
AWS Glue DataBrew is a visual data preparation tool that allows data engineers and data analysts to clean and normalize data without writing code. Using DataBrew, a data engineer can create a recipe that concatenates the customer first and last names and then applies the COUNT_DISTINCT function. This requires no complex code and can be performed through the DataBrew user interface, which represents the lowest operational effort of the four options.
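
The computation that such a recipe performs can be sketched in plain Python. This is only an illustration of the concatenate-then-count-distinct logic, not DataBrew code; the column names `first_name` and `last_name` and the sample rows are assumptions made for the example.

```python
def count_distinct_customers(rows):
    """Count distinct customers by concatenated first and last name."""
    # Join the two name columns with a separator so that ("Ann", "Lee")
    # and ("An", "nLee") do not collapse into the same key.
    full_names = {f"{row['first_name']} {row['last_name']}" for row in rows}
    return len(full_names)


# Hypothetical sample rows standing in for the daily file.
sample_rows = [
    {"first_name": "Ann", "last_name": "Lee"},
    {"first_name": "Bob", "last_name": "Ray"},
    {"first_name": "Ann", "last_name": "Lee"},  # duplicate customer
]

distinct_count = count_distinct_customers(sample_rows)
print(distinct_count)
```

The separator matters: without it, two different customers whose names happen to concatenate to the same string would be counted once.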

NEW QUESTION # 67
Which methods will trigger an action that will evaluate a DataFrame? (Select TWO)

A. DataFrame.show()
B. DataFrame.col()
C. DataFrame.collect()
D. DataFrame.random_split()
E. DataFrame.select()

Answer: A,C

Explanation:
The methods that trigger an action that evaluates a DataFrame are DataFrame.collect() and DataFrame.show(). These methods force the execution of any pending transformations on the DataFrame and return or display the results. The other options do not evaluate a DataFrame. Option B, DataFrame.col(), returns a Column object based on a column name in a DataFrame. Option D, DataFrame.random_split(), splits a DataFrame into two or more DataFrames based on random weights. Option E, DataFrame.select(), projects a set of expressions on a DataFrame and returns a new DataFrame.
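
The difference between a transformation and an action can be illustrated with a toy class. The `LazyFrame` class below is a hypothetical stand-in written for this example and is not the Snowpark API; it only mimics the idea that `select` records a pending step while `collect` and `show` run the recorded steps.

```python
class LazyFrame:
    """Toy stand-in for a lazily evaluated DataFrame (not the Snowpark API)."""

    def __init__(self, rows, plan=None, log=None):
        self._rows = rows
        self._plan = plan or []  # pending transformations
        # Shared list that records each real execution.
        self.log = log if log is not None else []

    def select(self, *columns):
        # Transformation: extend the plan and return a new frame; nothing runs.
        step = lambda rows: [{c: r[c] for c in columns} for r in rows]
        return LazyFrame(self._rows, self._plan + [step], self.log)

    def collect(self):
        # Action: run every pending step and return the resulting rows.
        self.log.append("executed")
        rows = self._rows
        for step in self._plan:
            rows = step(rows)
        return rows

    def show(self):
        # Action: evaluate the plan, then print each row.
        for row in self.collect():
            print(row)


frame = LazyFrame([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
projected = frame.select("id")  # only records the projection
runs_before = len(frame.log)
result = projected.collect()    # triggers evaluation
runs_after = len(frame.log)
```

In this sketch the execution log stays empty until `collect` is called, which mirrors why only the action methods count as evaluating the DataFrame.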

NEW QUESTION # 68
Search optimization works best to improve the performance of a query when the following conditions are true: [Select all that apply]

A. The table is frequently queried on columns other than the primary cluster key.
B. Search Query uses Sort Operations.
C. The table is not clustered.
D. Search Query uses Equality predicates (for example, <column_name> = <constant>) OR Predicates that use IN.

Answer: A,C,D

Explanation:
Materialized views, rather than search optimization, work best for queries that use sort operations. Search optimization works best to improve query performance under the remaining conditions.

NEW QUESTION # 69
A company has developed several AWS Glue extract, transform, and load (ETL) jobs to validate and transform data from Amazon S3. The ETL jobs load the data into Amazon RDS for MySQL in batches once every day. The ETL jobs use a DynamicFrame to read the S3 data.
The ETL jobs currently process all the data that is in the S3 bucket. However, the company wants the jobs to process only the daily incremental data.
Which solution will meet this requirement with the LEAST coding effort?

A. Create an ETL job that reads the S3 file status and logs the status in Amazon DynamoDB.
B. Configure the ETL jobs to delete processed objects from Amazon S3 after each run.
C. Enable job bookmarks for the ETL jobs to update the state after a run to keep track of previously processed data.
D. Enable job metrics for the ETL jobs to help keep track of processed objects in Amazon CloudWatch.

Answer: C

Explanation:
AWS Glue job bookmarks are designed to handle incremental data processing by automatically tracking the state.
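
The idea behind a bookmark can be sketched as a small state-tracking function. This is a conceptual toy, not the AWS Glue implementation: the function name, the use of a set of object keys as the persisted state, and the sample keys are all assumptions for illustration. In an actual Glue job, bookmarks are switched on with the `--job-bookmark-option job-bookmark-enable` job parameter and rely on a `transformation_ctx` being set on the DynamicFrame read.

```python
def process_incremental(object_keys, bookmark):
    """Return keys not seen in earlier runs and the updated bookmark.

    `bookmark` is a set of keys processed by previous runs; it is a toy
    stand-in for the state that a job bookmark persists between runs.
    """
    new_keys = [key for key in object_keys if key not in bookmark]
    updated_bookmark = bookmark | set(new_keys)
    return new_keys, updated_bookmark


# Day 1: the bucket holds one file and nothing has been processed yet.
day1_new, state = process_incremental(["data/2024-01-01.csv"], set())

# Day 2: the bucket holds both files, but only the new one is processed.
day2_new, state = process_incremental(
    ["data/2024-01-01.csv", "data/2024-01-02.csv"], state
)
print(day2_new)
```

Because the state is carried from one run to the next, the job code itself does not need custom logic to decide which objects are new, which is where the low coding effort comes from.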

NEW QUESTION # 70
Which callback function is required within a JavaScript User-Defined Function (UDF) for it to execute successfully?

A. finalize()
B. processRow()
C. initialize()
D. handler

Answer: B

Explanation:
The processRow() callback function is required within a JavaScript UDF for it to execute successfully. This function defines how each row of input data is processed and what output is returned. The other callback functions are optional and can be used for initialization, finalization, or error handling.

NEW QUESTION # 71
A company needs to build a data lake in AWS. The company must provide row-level data access and column-level data access to specific teams. The teams will access the data by using Amazon Athena, Amazon Redshift Spectrum, and Apache Hive from Amazon EMR.
Which solution will meet these requirements with the LEAST operational overhead?

A. Use Amazon Redshift for data lake storage. Use Redshift security policies to restrict data access by rows and columns. Provide data access by using Apache Spark and Amazon Athena federated queries.
B. Use Amazon S3 for data lake storage. Use AWS Lake Formation to restrict data access by rows and columns. Provide data access through AWS Lake Formation.
C. Use Amazon S3 for data lake storage. Use Apache Ranger through Amazon EMR to restrict data access by rows and columns. Provide data access by using Apache Pig.
D. Use Amazon S3 for data lake storage. Use S3 access policies to restrict data access by rows and columns. Provide data access through Amazon S3.

Answer: B

Explanation:
https://docs.aws.amazon.com/lake-formation/latest/dg/cbac-tutorial.html

NEW QUESTION # 72
A company is building a dashboard for thousands of Analysts. The dashboard presents the results of a few summary queries on tables that are regularly updated. The query conditions vary by topic according to what data each Analyst needs. Responsiveness of the dashboard queries is a top priority, and the data cache should be preserved.
How should the Data Engineer configure the compute resources to support this dashboard?

A. Assign queries to a multi-cluster virtual warehouse with economy auto-scaling. Allow the system to automatically start and stop clusters according to demand.
B. Assign all queries to a multi-cluster virtual warehouse set to maximized mode. Monitor to determine the smallest suitable number of clusters.
C. Create a virtual warehouse for every 250 Analysts. Monitor to determine how many of these virtual warehouses are being utilized at capacity.
D. Create a size XL virtual warehouse to support all the dashboard queries. Monitor query runtimes to determine whether the virtual warehouse should be resized.

Answer: B

Explanation:
Assigning all queries to a multi-cluster virtual warehouse set to maximized mode gives the dashboard enough compute capacity to handle thousands of concurrent queries from different Analysts. A multi-cluster virtual warehouse serves concurrent load by running several clusters of the same size. In maximized mode the minimum and maximum cluster counts are set to the same value, so every cluster starts when the warehouse starts and stays available, which preserves each cluster's data cache instead of losing it when auto-scaling shuts clusters down. By monitoring the utilization and performance of the virtual warehouse, the Data Engineer can determine the smallest number of clusters that meets the responsiveness requirement and minimizes cost.
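
How the cluster-count settings determine the warehouse mode can be sketched as a small helper. The function below is a hypothetical illustration, not part of any Snowflake client library; it assumes only that the mode follows from the `MIN_CLUSTER_COUNT` and `MAX_CLUSTER_COUNT` warehouse parameters.

```python
def warehouse_mode(min_cluster_count, max_cluster_count):
    """Classify a warehouse by its minimum and maximum cluster counts."""
    if min_cluster_count < 1 or min_cluster_count > max_cluster_count:
        raise ValueError("expected 1 <= min_cluster_count <= max_cluster_count")
    if max_cluster_count == 1:
        return "single-cluster"
    if min_cluster_count == max_cluster_count:
        # All clusters start together and stay running with the warehouse.
        return "maximized"
    # Clusters are started and stopped between the two bounds as load changes.
    return "auto-scale"


print(warehouse_mode(4, 4))
print(warehouse_mode(1, 4))
```

Finding the smallest suitable number of clusters then amounts to lowering both counts together while query responsiveness stays acceptable.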

NEW QUESTION # 73
Mark the incorrect statement about a Data Engineer using the COPY INTO <table> command to load data from files into Snowflake tables.

A. For loading data from all semi-structured supported file formats (JSON, Avro, etc.), as well as unloading data, UTF-8 is the only supported character set.
B. Both the UTF-32 and UTF-16 encoding character sets are supported for loading data from delimited files (CSV, TSV, etc.).
C. For data loading of files with semi-structured file formats (JSON, Avro, etc.), the only supported character set is UTF-16.
D. In a local environment, files are first copied ("staged") to an internal (Snowflake) stage, then loaded into a table.

Answer: C

Explanation:
For data loading of delimited files (CSV, TSV, etc.), the default character set is UTF-8. To use any other character set, you must explicitly specify the encoding to use for loading.
For semi-structured file formats (JSON, Avro, etc.), the only supported character set is UTF-8, so statement C is the incorrect one.
The rest of the statements are correct.
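
Since semi-structured files must be UTF-8, a JSON file produced in another encoding has to be converted before it is staged. The helper below is a minimal sketch of that conversion using only the Python standard library; the function name and the sample payload are assumptions for illustration, and the staging and COPY INTO steps are not shown.

```python
import json


def to_utf8_json_bytes(raw_bytes, source_encoding):
    """Re-encode JSON bytes from `source_encoding` to UTF-8."""
    text = raw_bytes.decode(source_encoding)
    json.loads(text)  # fail early if the content is not valid JSON
    return text.encode("utf-8")


# Hypothetical payload that was written as UTF-16 by an upstream system.
utf16_payload = '{"customer": "Zoë"}'.encode("utf-16")
utf8_payload = to_utf8_json_bytes(utf16_payload, "utf-16")
print(json.loads(utf8_payload))
```

For delimited files this step is unnecessary, because the source encoding can be declared in the file format instead.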

NEW QUESTION # 74
A company uses Amazon Redshift for its data warehouse. The company must automate refresh schedules for Amazon Redshift materialized views.
Which solution will meet this requirement with the LEAST effort?

A. Use Apache Airflow to refresh the materialized views.
B. Use the query editor v2 in Amazon Redshift to refresh the materialized views.
C. Use an AWS Lambda user-defined function (UDF) within Amazon Redshift to refresh the materialized views.
D. Use an AWS Glue workflow to refresh the materialized views.

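Answer: B

Explanation:
Query editor v2 in Amazon Redshift can schedule SQL statements, so a REFRESH MATERIALIZED VIEW statement can be run on a recurring schedule without building or operating additional infrastructure. AWS Lambda runs code in response to triggers without the need to provision or manage servers, but creating a UDF within Amazon Redshift that calls a Lambda function for this purpose involves writing custom code and managing permissions between Lambda and Redshift.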

NEW QUESTION # 75
A company stores data in a data lake that is in Amazon S3. Some data that the company stores in the data lake contains personally identifiable information (PII). Multiple user groups need to access the raw data. The company must ensure that user groups can access only the PII that they require.
Which solution will meet these requirements with the LEAST effort?

A. Use Amazon QuickSight to access the data. Use column-level security features in QuickSight to limit the PII that users can retrieve from Amazon S3 by using Amazon Athena. Define QuickSight access levels based on the PII access requirements of the users.
B. Create IAM roles that have different levels of granular access. Assign the IAM roles to IAM user groups. Use an identity-based policy to assign access levels to user groups at the column level.
C. Use Amazon Athena to query the data. Set up AWS Lake Formation and create data filters to establish levels of access for the company's IAM roles. Assign each user to the IAM role that matches the user's PII access requirements.
D. Build a custom query builder UI that will run Athena queries in the background to access the data. Create user groups in Amazon Cognito. Assign access levels to the user groups based on the PII access requirements of the users.

Answer: C

Explanation:
https://aws.amazon.com/blogs/big-data/anonymize-and-manage-data-in-your-data-lake-with-amazon-athena-and-aws-lake-formation/

NEW QUESTION # 76
A Data Engineer ran a stored procedure containing various transactions. During the execution, the session abruptly disconnected, preventing one transaction from committing or rolling back. The transaction was left in a detached state and created a lock on resources.
...must the Engineer take to immediately run a new transaction?

A. Call the system function SYSTEM$CANCEL_TRANSACTION.
B. Call the system function SYSTEM$ABORT_TRANSACTION.
C. Set the LOCK_TIMEOUT to FALSE in the stored procedure.
D. Set the transaction abort on error to true in the stored procedure.

Answer: B

Explanation:
The system function SYSTEM$ABORT_TRANSACTION can be used to abort a detached transaction that was left in an open state due to a session disconnect or termination. The function takes one argument: the transaction ID of the detached transaction. The function will abort the transaction and release any locks held by it. The other options are incorrect because they do not address the issue of a detached transaction. The system function SYSTEM$CANCEL_TRANSACTION can be used to cancel a running transaction, but not a detached one. The LOCK_TIMEOUT parameter can be used to set a timeout period for acquiring locks on resources, but it does not affect existing locks. The TRANSACTION_ABORT_ON_ERROR parameter can be used to control whether a transaction should abort or continue when an error occurs, but it does not affect detached transactions.

NEW QUESTION # 77
A company has used an Amazon Redshift table that is named Orders for 6 months. The company performs weekly updates and deletes on the table. The table has an interleaved sort key on a column that contains AWS Regions.
The company wants to reclaim disk space so that the company will not run out of storage space.
The company also wants to analyze the sort key column.
Which Amazon Redshift command will meet these requirements?

A. VACUUM FULL Orders
B. VACUUM SORT ONLY Orders
C. VACUUM REINDEX Orders
D. VACUUM DELETE ONLY Orders

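Answer: C

Explanation:
VACUUM REINDEX analyzes the distribution of the values in interleaved sort key columns and then performs a full vacuum operation, which also reclaims the disk space left by the weekly updates and deletes.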

NEW QUESTION # 78
To support Time Travel, which of the following SQL extensions, parameters, or commands have been implemented?

A. ONSET (time difference in seconds from the present time)
B. STATEMENT_ID (identifier for statement, e.g. query ID)
C. AT | BEFORE clause which can be specified in the CREATE ... CLONE commands.
D. OFFSET (time difference in seconds from the present time)
E. STATEMENT (identifier for statement, e.g. query ID)
F. UNDROP command for tables, schemas, and databases.

Answer: C,D,E,F

NEW QUESTION # 79
Can the same column be specified in both a Dynamic Data Masking policy signature and a row access policy signature at the same time?

A. NO
B. YES

Answer: A

NEW QUESTION # 80
Regular views do not cache data, and therefore cannot improve performance by caching?

A. TRUE
B. FALSE

Answer: A

Explanation:
Regular views do not cache data, and therefore cannot improve performance by caching.

NEW QUESTION # 81
In a data engineering pipeline, a company is using multiple applications and teams to access a shared Amazon S3 bucket. To streamline access and simplify permissions management for these different entities, which S3 feature should the company utilize?

A. Activate S3 Transfer Acceleration for the bucket to ensure fast and differentiated access for each application or team.
B. Enable multiple IAM roles, each corresponding to an application or team, granting access to the S3 bucket.
C. Use S3 Access Points to create unique endpoints with tailored permissions for each application or team.
D. Implement S3 Lifecycle policies for each application or team to manage their specific data access and retention.

Answer: C

NEW QUESTION # 82
......

The New DEA-C01 2024 Updated Verified Study Guides & Best Courses: https://examcollection.dumpsactual.com/DEA-C01-actualtests-dumps.html
